– Written by Michael Rimler, Open Source Technologies Director
In 2023, I was invited to join the PHUSE Board as Open Source Technologies Director. The PHUSE Community – those interested in clinical data science and the development of new medicines and vaccines to impact global health – has demonstrated a significant interest in figuring out how open-source technology and software may be confidently deployed to support the clinical data pipeline. The decision for PHUSE to expand its Board of Directors is thus, in my opinion, directly in support of its mission of “sharing ideas, tools and standards around data, statistical and reporting technologies to advance the future of life science”. In short, the role I have had the responsibility of filling is testament to PHUSE listening and responding to its Community. As we approach the end of 2024, I’d like to take this opportunity to report back to the Community on all that PHUSE has enabled with respect to open-source technology this year.
In case you missed it, 2024 started off at the US Connect in Bethesda, where I was joined on stage by Ross Farrugia to deliver the Tuesday keynote address Open Source: The Current State of the Industry and Projections for the Future (recording).
Key takeaways from our keynote:
- Open-source technology offers us a myriad of opportunities, but it’s not free.
- We can share the burden – collaboration is key to full realisation of the value.
- You don’t have to start big – even small contributions help advance this space.
In addition, an ongoing project led through my Board role is Open Source Technology in Clinical Data Analysis (OSTCDA), a crowd-sourced effort to document answers to key questions in the open source in pharma space. Industry colleagues Katja Glass, Mike Smith and Mike Stackhouse have joined me to create a GitHub repository of key questions. Throughout the year, we have facilitated both in-person discussions (at the Connects and at the CSS) and virtual discussions (via Open Forums) to explore these questions, feeding into the current draft of the manuscript. Katja also led a discussion at the Copenhagen SDE in October. Anyone is welcome to contribute to the effort by engaging in the GitHub discussions, upvoting answers, or logging feedback through GitHub Issues. We also conducted interactive discussions at the EU Connect 2024, in Strasbourg, and plan to do so again at the US Connect 2025, in Orlando.
The PHUSE Working Group Data Visualisation & Open Source Technology (DVOST) has six active Working Group projects, including CAMIS (Comparing Analysis Method Implementations in Software). Led by Lyn Taylor and Christina Fillmore, this WG project is open-sourcing the “[demystification of] conflicting results between software and to help ease the transitions to new languages by providing comparison and comprehensive explanations”. In the project’s GitHub repository, anyone can review comparisons of statistical method implementations in SAS, R and Python, as well as help fill any gaps that remain (including additional language implementations). Many results and comparisons are already available for our Community to reference.
PHUSE Single Day Events have enabled community knowledge sharing on open-source topics. The theme of the 2024 Chennai SDE was squarely focused on Open-Source Technologies in Data Sciences and Analytics – Next Steps. Heidelberg included a Pre-SDE Workshop: Open-Source Framework for Building Interactive Visualisation to “provide guidance and hands-on exercises for using the open-source framework and building interactive visualisation of clinical trial data. The sessions delivered an insight into what could be expected at the SDE, which followed the next day”.
Content from the 19 PHUSE SDEs held around the world in 2024 includes:
- Integrating GenAI with an Open-Source Toolkit for Insights Generation to Boost Data Science Efficiency – Vincent Shen, Roche
- R and pharmaverse: The New Frontier for Today’s Statistical Programmers – Sunil Gupta, Experis
- Is the Development of Automation Tools in Both Open Source and Commercial Software the Future of Clinical Data Operations? – Abigail Steward & Ben Barnaby-Pass, Phastar
- Revolutionizing Clinical Trials: The Impact of OpenStudyBuilder on Automation and Insights – Katja Glass, Katja Glass Consulting
- Redefining Standards: Interactive R/Shiny Dashboards for FDA Submissions – Vedha Viyash, Appsilon
- PHUSE DVOST CAMIS Project – Qian Wang, MSD
- Getting Open Source R in the Mainstream for CDISC SDTM – Allwyn Dsouza & Vishvanath Kothari, Saama Technologies
- Leveraging Open-Source Technology for SAS Program/Dataset/Output Comparison – Balram Chauhan, Cognizant
- Data Anonymisation & De-Identification Using Open-Source Technology – Priya Tiwary, Abluva
- {whirl} – An Open-Source R Package for Program Logs – Aksel Thomsen, Novo Nordisk
- Navigating Open Seas: Exploring Open-Source Technology in Clinical Data Analysis – Katja Glass, Katja Glass Consulting
- Enabling Multilingual Code Understanding: Bridging the Gap in Diverse Programming Environments – Casey Higgins, Atorus Research (slides will be available here shortly.)
And, of course, the PHUSE Connects deliver further quality presentations, workshops and discussions on open-source technologies – too many to detail. I encourage you to head to the PHUSE Archive and search to your heart’s content. One specific callout from the US Connect 2024 is a workshop on Dataset-JSON, an alternative data format which is open source and which many see as a replacement for the traditional open-source XPT Transport file format. The workshop comes from the CDISC-PHUSE pilot project described on the CDISC website and the PHUSE Working Group web page. A webinar presented by the joint project is available here. Bethesda (US Connect) was fantastic, and the EU Connect (Strasbourg) this month was equally impressive!
Finally, what about pharmaverse – an initiative PHUSE has been supporting since 2023? This year, pharmaverse has seen a number of new packages added to the ecosystem, some as brand-new releases. Please check them out and engage with the product teams via the respective GitHub repositories to log issues for bugs or features you’d like added or modified. This is the true power of open source – your power!
- admiralpeds – provides a complementary (to admiral) toolbox for users to develop specifics for paediatric clinical trials
- admiralmetabolic (release expected in 2024)
- aNCA – enables users to upload their datasets and perform non-compartmental analysis (NCA) on both pre-clinical and clinical datasets, with the results being easily visualisable
- cardinal – table-generating functions to implement standard FDA Safety Tables according to the guidelines
- cards – creates CDISC analysis results datasets
- cardx – extension of the {cards} package, providing additional functions to create analysis results datasets (ARDs)
- chevron – collection of high-level functions to create standard outputs for clinical trial reporting with limited parameterisation
- gtsummary – provides an elegant and flexible way to create publication-ready analytical and summary tables using the R programming language
- rlistings – designed to create and display listings with R
- sdtm.oak – an EDC and data standard-agnostic solution that enables the pharmaceutical programming community to develop CDISC SDTM datasets in R
- tfrmtbuilder – provides a language for defining display-related metadata, which can then be used to automate and easily update output formats
In addition to the great work the Community is doing to provide valuable open-sourced and permissively licensed R-based tools to the public, the pharmaverse Council has also been hard at work. In 2024, Laura Needleman joined the Council as a DEI Champion (diversity, equity, and inclusion). Laura has been bringing great value to the Council and is well positioned to start making an impact on the wider OS community in pharma. You can meet Laura, in her own words, in this blog post. The Council has also raised awareness of the pharmaverse ecosystem at the US CDISC Interchange and at the OCS Innovation Forum (Office of Computational Sciences, FDA). If you want to know more about what the pharmaverse is, its mission, or how it works, head over to the Inside the pharmaverse blog.
I hope this summary demonstrates all the ways PHUSE is supporting its Community, enabling discussions and knowledge sharing on key questions of using open-source technology for clinical data analysis. And, with all the progress PHUSE has enabled, I want to acknowledge that it is not the only organisation in our industry tackling the challenges around open source. CDISC supports COSA (CDISC Open-Source Alliance). The R Consortium focuses on R and has facilitated successful working group activities for R-based submissions to the FDA. R/pharma (R in pharma) continues to amplify and enable learnings in this space through workshops, webinars and conferences. Our industry is evolving swiftly in this space – driven by Community efforts and enabled by industry organisations such as PHUSE. Keep up the good work, my phriends – together!