– Written by Dimple Patel, Clinical Programmer
Clinical programmers have been slow on the uptake in adopting R programming in the pharmaceutical industry. It’s not surprising: SAS is a robust language that quickly handles large data and has been used for ages in clinical trials. Is there a solution out there for R enthusiasts who want to integrate the language into CDISC-compliant, submission-ready, analysis-friendly outputs such as ADaM datasets? Yes!
Roche and GSK – forces at pharmaverse – teamed up to address this gap in clinical programmers and R assimilation by developing an open-source modular toolkit to create ADaM in R: {admiral}, ADaM in R Asset Library.
Programmers can find the package on the CRAN (Comprehensive R Archive Network) website. A single line of code can be written to install this special toolbox in RStudio:
The {admiral} package works well in conjunction with other R packages: dplyr, magrittr, lubridate and pharmaversesdtm. dplyr provides core data manipulation techniques for ADaM dataset creation, such as mutate and summarise to add new variables; left_join, right_join, inner_join and full_join to merge input datasets; and filter to extract rows. magrittr supports multimodality by manipulating data via the infamous “pipe” operator; specifically, magrittr’s forward pipe feeds values into right-hand side functions to structure data operations from left to right instead of from inside out. lubridate, a member of the tidyverse family of R packages, lends support with date/time conversions from SDTM to ADaM-friendly formats; calculation of duration; and imputation for missing data. Lastly, pharmaversesdtm helps develop ADaM datasets and vignettes by providing dummy SDTM datasets culled from CDISC’s pilot project. Other libraries, such as tibble (for data frame manipulation) and stringr (for character manipulation), can be installed based on the programmer’s needs.
The {admiral} package has simplified developing ADaM subject-level (ADSL) datasets and basic data structure (BDS) datasets for novice R programmers like myself. The {admiral} team has thoroughly documented and provided example vignettes to help create and handle both ADSL and BDS datasets and describe the easy-to-use functions.
For example, the ADSL vignette briefly describes how to derive a common ADSL variable – RANDDT (randomisation date) – by customising the inputs to {admiral} functions derive_vars_dt() and derive_vars_merged():
The above code is easy to use for beginners! First, the derive_vars_dt() function converts character date variables like DSSTDTC from the DS (disposition) SDTM dataset into a numeric date that starts with a prefix (DSST) such as DSSTDT. Here, DSSTDTC, the disposition start date, is converted and used to extend the DS dataset into an intermediate dataset – DS_EXT. Second, the derive_vars_merge function adds the RANDDT randomisation date to the DS_EXT dataset by filtering disposition dates to only those that are “RANDOMIZED”, linking in identifiers like STUDYID and USUBJID and naming the new variable to the ADaM-friendly RANDDT. Note, the exprs function handles matrices of values – perfect for ADaM dataset creation. The exprs function returns expressions in an unevaluated form, essentially in a tree-like object that describes how to compute the value, which reduces computational power until needed for large matrices such as USUBJID and SUBJID in the aforementioned code, both of which have 306 elements each! Finally, the %>% pipe operator feeds the RANDDT output of the function derive_vars_merged (pipe operators are read from right to left) into the ADSL dataset using the %>% pipe operator.
How does the {admiral} toolkit help make BDS creation easier? Again, like the ADSL capabilities, clearly defined functions with thorough documentation help weave new ADaM datasets such as ADVS (vital signs analysis datasets) using SDTM datasets as input.
In the above code, two existing functions – exprs and mutate – are piled on top of the {admiral} package functions to correctly map subjects’ vital signs (VS) data to an analysis-friendly dataset with corresponding parameters and analysis variables and values. First, the derive_vars_merged function is used to merge actual and planned treatment variables culled from the ADSL dataset – like TRTSDT, TRTEDT, TRT01A, TRT01P – to the VS dataset by identifiers like STUDYID and USUBJID. Then the derive_vars_merged_lookup() function uses an intermediate parameter look-up tribble that creates ADaM-compliant parameter code (PARAMCD) and parameter (PARAM) variables to accurately map every SDTM vital sign test code (VSTESTCD). This function mimics clunky SQL merging capabilities easily. Finally, the mutate() function adds two new columns that map numeric vital sign findings (VSSTRESN) and character findings (VSSTRESC) to ADaM-ready analysis value variables AVAL and AVALC, respectively.
The {admiral} package further supports specialised ADaM programming with three extension packages: admiralonco, admiralophtha and admiralvaccine. admiralonco features many functions and descriptive vignettes that support the creation of datasets for oncology trials such as tumour response analysis dataset (ADRS) and time-to-event dataset (ADTTE). For example, one admiralonco function can calculate the best overall response (BOR) parameter, an important oncology trial endpoint featured in ADRS datasets. BOR itself depends on overall responses such as complete response (CR), partial response (PR), progressive disease (PD) and stable disease (SD). admiralonco boasts functions and vignettes which help create ADRS under both iRECIST and RECIST 1.1 guidelines. RECIST 1.1 objectively measures tumour response to treatment, and iRECIST is a modified version of RECIST 1.1 to address tumour response to immunotherapeutic agents. For example, iRECIST can have unconfirmed progressive disease (iUPD) before iCR, iPR and iSD, but RECIST 1.1 criteria precludes progressive disease before CR, PR and SD. Other handy admiralonco functions can calculate other ADRS variables, like measurable disease at baseline.
admiralophtha features an innovative function – derive_var_afeye() – which yields the affected eye for ADOE – the ophthalmology exam analysis dataset – based on a few input parameters such as laterality of the eye where information was collected. admiralophtha also equips programmers with an easily customised function that converts best corrected visual acuity (BCVA) results into logMAR units and another that further sorts the BCVA data into analysis-friendly, therapeutic area-specific Snellen categories.
In addition to admiralophtha and admiralonco, admiralvaccine helps programmers create specialised datasets such as the immunogenecity dataset (ADIS) and the analysis clinical events dataset (ADCE). Powerful functions such as derive_basetype_records() and derive_var_base determine baseline records, variables and base category for subjects who have completed immunogenicity specimen assessments (IS). admiralvaccine also features derive_fever_records() for vaccine clinical trials; this special function determines records in which subjects have experienced a fever based on the SDTM vital signs (VS) dataset. Moreover, other innovative functions such as derive_vars_crits() derive criteria evaluation analysis flags based on simple inputs and a defined criterion.
What are some limitations of this otherwise versatile toolkit? The {admiral} package doesn’t build a single ADaM dataset with a single click or jack-of-all-trades umbrella function, so programmers still need to ensure compliance although the package follows CDISC standards. Also, when working with efficacy endpoints, programmers may need to judge and test packages’ functions against their own company standards. And although {admiral} has tested data from several pharmaceutical companies, programmers may need to use dummy data to properly address this limitation. The {admiral} creators provided tests for >120 existing and new functions, but the {admiral} package does not replace any company’s quality check or validation procedure.
Overall, {admiral} is a robust R toolkit that culls functions to support programmers in creating ADaM datasets for clinical trial submission. Not only are the functions robust, but also the toolkit uniquely features descriptive vignettes that help novice R programmers grasp the scope and flexibility of the {admiral} package.