Website dataset workflow
Source: vignettes/website-dataset-workflow.Rmd
The Pandemic PACT data are publicly available for download from the programme's website. pactr provides an application programming interface (API) to the website's data download facility, allowing programmatic access to the publicly available funder tracker dataset.
Website data interface
The functions that interface with the data on the Pandemic PACT website support downloading, reading, and processing. The website-specific functionality currently available in pactr is:
Downloading the Pandemic PACT dataset from the website (stable);
Reading the Pandemic PACT dataset from the website (stable); and,
Processing the Pandemic PACT dataset from the website (experimental).
Website data workflow
Download data available from website
To download the Pandemic PACT tracker dataset available from its website, the following command can be used:
## Save the dataset from website to a temporary directory ----
pact_download_website(path = tempdir())
which will return the path to the downloaded dataset:
#> [1] "/tmp/RtmpjUmn8P/pandemic-pact-grants.csv"
Read the Pandemic PACT tracker dataset from the website
Instead of downloading, the Pandemic PACT dataset available from its website can be read into R directly as follows:
pact_read_website()
which results in the following:
#> # A tibble: 9,862 × 39
#> GrantID PubMedGrantId GrantTitleEng Abstract
#> <chr> <chr> <chr> <chr>
#> 1 C00037 170359 COVID-19: Improving th… "The cl…
#> 2 C00038 170357, 171495, 175580 Identification of biom… "The ou…
#> 3 C00040 170353, 175493 Development of a rapid… "This r…
#> 4 C00041 109434 Rapid, Low-cost Diagno… "The ou…
#> 5 C00043 170355 Rapid Research Respons… "In 201…
#> 6 C00045 170343 Development and Evalua… "Corona…
#> 7 C00046 170346, 175528 Rapid development of a… "The ou…
#> 8 C00047 170342 Preventing SARS-CoV- 2… "The SA…
#> 9 C00048 170360 Understanding, Forecas… "A new …
#> 10 C00049 170362, 175535 RIsk of environmental … "This s…
#> # ℹ 9,852 more rows
#> # ℹ 35 more variables: PublicationYearOfAward <int>,
#> # GrantEndYear <int>, ResearchInstitutionName <chr>,
#> # GrantAmountConverted <dbl>, StudySubject <chr>,
#> # Ethnicity <chr>, AgeGroups <chr>, Rurality <chr>,
#> # VulnerablePopulations <chr>, OccupationalGroups <chr>,
#> # StudyType <chr>, ClinicalTrial <chr>, Pathogen <chr>, …
Process the Pandemic PACT tracker dataset from the website
The package includes functions that process the Pandemic PACT tracker dataset into specific structures and aggregations. These can then be used to produce plots and reports similar to the outputs currently presented on the Pandemic PACT website.
For example, the following processes the Pandemic PACT tracker dataset into an aggregated structure that can be used to create a plot similar to the one presented on the website.
pact_read_website() |>
  pact_process_website() |>
  pact_process_topic_group(topic = "Disease", group = "GrantStartYear")
which produces the following output:
#> # A tibble: 170 × 3
#> GrantStartYear Disease n
#> <int> <chr> <int>
#> 1 1978 Pandemic-prone influenza 1
#> 2 1978 Severe Acute Respiratory Syndrome (SARS) 1
#> 3 1981 COVID-19 1
#> 4 1982 COVID-19 1
#> 5 1988 COVID-19 1
#> 6 1992 COVID-19 6
#> 7 1994 COVID-19 2
#> 8 1996 COVID-19 1
#> 9 1997 COVID-19 28
#> 10 1997 Zika virus disease 1
#> # ℹ 160 more rows
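To make the shape of this aggregated output easier to follow, here is a minimal base R sketch that counts grants per start year and disease on a small toy data frame. It is only an illustration of the kind of aggregation shown above, not pactr's actual implementation. The `grants` data frame and its values are hypothetical, and the real processing functions may handle details (for example, grants tagged with more than one disease) differently.

```r
## Toy stand-in for the tracker dataset ----
## (hypothetical values for illustration only, not real grants)
grants <- data.frame(
  GrantID        = c("G1", "G2", "G3", "G4"),
  GrantStartYear = c(2020L, 2020L, 2021L, 2021L),
  Disease        = c("COVID-19", "COVID-19", "COVID-19", "Zika virus disease"),
  stringsAsFactors = FALSE
)

## Count the number of grants per start year and disease ----
counts <- aggregate(
  GrantID ~ GrantStartYear + Disease, data = grants, FUN = length
)

## Rename the count column to n, matching the output shown above ----
names(counts)[names(counts) == "GrantID"] <- "n"

## Order rows by year, then by disease ----
counts <- counts[order(counts$GrantStartYear, counts$Disease), ]

counts
```

The result has one row per combination of year and disease, with `n` holding the number of grants in that combination. This is the same three-column layout as the tibble above, and it is convenient for plotting counts over time by disease.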