Written by Michael Rimler, Senior Director, Head of Technical Excellence and Innovation at GSK. Please note: this blog is the opinion of the author and does not represent PHUSE, nor necessarily the opinion of the author's employer.
A very hot topic lies at the intersection of data analytics and pharmaceutical drug development: the use of open-source software to generate analyses used in regulatory submissions to support increased availability of medications to patients in the market (INDs, NDAs, sNDAs, BLAs, etc.). From my vantage point, by far the most activity relates to the use of the R programming language to deliver conventional submission components such as data packages (SDTM and ADaM data models) and static analyses (tables, listings, figures). The experts active in the area also believe that open-source languages such as R offer additional opportunities to reimagine how sponsor companies present their clinical trial data to regulatory authorities to enable faster, more thorough and more robust review of the data for both safety and efficacy – ultimately to bring life-changing medicines safely to patients more quickly and at lower cost.
The fundamental question is quite simple: can we use R for a regulatory submission?
I believe that the answer is just as simple: yes. However, the answer to how has proven to be exceedingly complex.
To start, let’s be clear on the challenge: transforming an organisation from conventional delivery of submission data packages to one which uses R, in part or in full, for regulatory submissions. Yes, it’s about using open-source software – software which has elements of community development – software which is not developed with a direct commercial interest – software which has its source code available for public review (and scrutiny) – software which is typically freely licensed for wide use, reuse and modification. But the challenge is also about transformation – enabling the people in the organisation to develop the skills necessary to use new tools. In some respects, it’s like moving people from the typewriter to the word processor. Or from the telephone to instant messaging. Or from desktops in our cubicles to mobile apps in our pyjamas. It’s saying: hey, we have new tools – you need to be able to use them because they are our future.
This means that the challenge is not just technical; there’s also a human element. Hence, there are three fundamental components which need to be in place for successful implementation: people, tools and platform. The people (who) must use tools (how) in a platform (where) to transform our data from collection to submission so that a regulatory agency can assure patients that an approved medicine is safe and effective. To do that, the reviewing authority must have confidence that the data supporting an application are correct. If the results are correct, then they are true: no other result is possible without some error having been committed. Therefore, correct results are not only accurate, but also reproducible and traceable from source to report.
At its core, it shouldn’t matter if we generate the results with highly automated computational systems which produce heavy audit trails on the data processing or with pen and paper after manually transcribing a thousand case report forms. All of the complexity with using R for regulatory submissions comes in proving that our results are correct. Can we demonstrate that we’ve collected the data with integrity, monitored and cleaned the data with integrity, processed the data with integrity, and all with computational tools that do what they say they do? Because, if we trust the inputs and we trust the process, then we will trust that the outputs are correct.
Once we’ve chosen our process (SAS, R, Excel, calculator, pen and paper), everything else, literally everything else, is to either demonstrate that any human action in the process is without error (quality control and quality assurance) or any machine action in the process works as intended (testing and validation). We accomplish this by demonstrating the accuracy, reproducibility and traceability of the data which is transformed through that process. We aim to establish trust with a reviewing authority that our data is correct and can be confidently used to assess the safety and efficacy of a particular medicine for a particular indication within a particular population.
However, although the FDA issued its Statistical Software Clarifying Statement a while ago, the answer to what constitutes sufficient demonstration of accuracy, reproducibility and traceability remains vague and, in my opinion, rightfully so. I propose that perhaps it is not the responsibility of a regulatory agency to instruct sponsor companies what they should do with respect to the use of analytical software for data analysis. Rather, it is the responsibility of a regulatory agency to evaluate the results submitted; not only the results themselves, but also the integrity of those results – the correctness of those results. And, it is our responsibility as sponsor applicants to determine what we believe is sufficient demonstration of accuracy, reproducibility and traceability. We are indeed invited, at least by the FDA, to “consult with FDA review teams and especially with FDA statisticians regarding the choice and suitability of statistical software packages at an early stage in the product development process”.
So, can we use R for a regulatory submission? Absolutely, but it is our responsibility to ensure that our results are correct and assure the reviewing regulatory agency that they can trust our data handling process has integrity. Only then can the data and results adequately speak to the safety and efficacy of the medication for patients.
Find Out More
If you would like to find out more on this topic, then access the PHUSE Education resources. The vision of PHUSE Education is to create a PHUSE roadmap to education, which covers the broad bandwidth of knowledge a clinical data scientist needs to have to be successful in their job.
The Regulatory Environment Cluster aims to summarise the key inflection points with regulators, provide an understanding of how regulatory bodies use submitted data to support drug approval and provide intelligence on emerging approaches that could influence the way clinical trial data is submitted in the near future.