Why Is PHUSE CAMIS Vitally Important to Statistical Medical Research?

– Written By CAMIS Project team, within the Data Visualisation & Open Source Technologies Working Group

CAMIS (Comparing Analysis Method Implementations in Software) is a PHUSE DVOST Working Group (WG) which, in collaboration with the PSI AIMS SIG, is building an open-source repository that provides vital information on how statistical methodology is applied in software including SAS, R and Python.

Industry reliance on SAS software has resulted in a dependence on SAS software default methods, and a lack of understanding about exactly how SAS implements analysis methods. Statistical analysis plans (SAPs) are often written without full specification of the method, relying on the SAS default being used for methods not fully specified. This may not be an issue if all analysts use SAS. However, when CROs, sponsors and regulators are trying to replicate each other’s analysis, it’s vital to specify how each method will be implemented so it can be replicated using any software without ambiguity.

The CAMIS WG found that when the same methods are used, replication is often easy, but finding out exactly which method each software uses can be time-consuming. A simple example is using the Kaplan-Meier method to estimate the median time to death with a 95% confidence interval (CI). We need to specify how we intend to calculate the CI. The SAS default (log-log method) is not the same as the R default (log method), and often we do not specify in the SAP which we will use!
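To make the difference concrete, here is a minimal sketch in plain Python (not the SAS or R implementation) that computes Kaplan-Meier estimates with Greenwood standard errors and then builds pointwise 95% CIs using either the log or the log-log transformation. The function name `km_with_ci` and the small data set are hypothetical, chosen only for illustration. CIs for the median are usually derived from these pointwise limits, so the choice of transformation carries through to the reported median CI.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def km_with_ci(times, events, conf_type):
    """Kaplan-Meier estimate with pointwise 95% CIs at each event time.

    conf_type is "log" or "log-log"; each row is (time, S, lower, upper).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, greenwood = 1.0, 0.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j, deaths = i, 0
        while j < len(data) and data[j][0] == t:  # group tied times
            deaths += data[j][1]
            j += 1
        # If everyone still at risk dies, S(t) = 0 and both intervals are
        # undefined, so no row is emitted for that time.
        if 0 < deaths < at_risk:
            surv *= 1 - deaths / at_risk
            greenwood += deaths / (at_risk * (at_risk - deaths))
            se = math.sqrt(greenwood)  # standard error of log S(t)
            if conf_type == "log":
                lower = surv * math.exp(-Z95 * se)
                upper = min(1.0, surv * math.exp(Z95 * se))
            elif conf_type == "log-log":
                # z times the standard error on the log(-log S(t)) scale
                w = Z95 * se / abs(math.log(surv))
                lower = surv ** math.exp(w)
                upper = surv ** math.exp(-w)
            else:
                raise ValueError("conf_type must be 'log' or 'log-log'")
            rows.append((t, surv, lower, upper))
        at_risk -= j - i
        i = j
    return rows


# Hypothetical data: 1 = death, 0 = censored
km_times = [6, 7, 10, 15, 19, 25, 30, 34]
km_events = [1, 0, 1, 1, 0, 1, 0, 0]
for method in ("log", "log-log"):
    for t, s, lo, hi in km_with_ci(km_times, km_events, method):
        print(f"{method:8s} t={t:>2} S={s:.3f} CI=({lo:.3f}, {hi:.3f})")
```

The survival estimates are identical for both methods; only the interval limits change, which is exactly why the SAP needs to name the transformation.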

Another example is Cox proportional hazards modelling. We should always state if we are using the SAS default method for tied observation times (Breslow), the R default method (Efron) or another method! Even logistic regression modelling, which is commonly used to obtain odds ratios and CIs, requires the analyst to specify if we are using SAS default (Wald CIs) or R default (profile likelihood CIs). A lack of clear specification of methods can result in an inability to reproduce the same results in different software, especially as not all options are available in all software (e.g. SAS cannot calculate profile likelihood CIs using proc logistic).
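As an illustration of the tied-times point, the sketch below evaluates the Cox log partial likelihood for a single covariate under the Breslow and Efron approximations and maximises each with a simple ternary search. It is a teaching sketch under stated assumptions, not how SAS or R fit the model: the function names `cox_loglik` and `fit_beta`, the search bounds and the data are all hypothetical.

```python
import math


def cox_loglik(beta, times, events, x, ties):
    """Log partial likelihood for one covariate; ties is 'breslow' or 'efron'."""
    ll = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Risk set: everyone whose observed time is >= t
        risk = sum(math.exp(beta * xj) for tj, xj in zip(times, x) if tj >= t)
        dead = [xj for tj, ej, xj in zip(times, events, x) if tj == t and ej == 1]
        d = len(dead)
        dead_sum = sum(math.exp(beta * xj) for xj in dead)
        ll += beta * sum(dead)
        for k in range(d):
            # Efron removes a fraction k/d of the tied deaths from the
            # denominator; Breslow leaves the full risk set every time.
            frac = k / d if ties == "efron" else 0.0
            ll -= math.log(risk - frac * dead_sum)
    return ll


def fit_beta(times, events, x, ties, lo=-10.0, hi=10.0):
    """Maximise the concave log partial likelihood by ternary search."""
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cox_loglik(m1, times, events, x, ties) < cox_loglik(m2, times, events, x, ties):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Hypothetical data with tied event times (1 = event, 0 = censored)
cox_times = [1, 1, 2, 2, 3, 4, 4, 5, 6, 6]
cox_events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
cox_group = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]

b_breslow = fit_beta(cox_times, cox_events, cox_group, "breslow")
b_efron = fit_beta(cox_times, cox_events, cox_group, "efron")
print(f"Breslow: log HR = {b_breslow:.4f}, HR = {math.exp(b_breslow):.3f}")
print(f"Efron:   log HR = {b_efron:.4f}, HR = {math.exp(b_efron):.3f}")
```

With no tied event times the two approximations reduce to the same likelihood, so the discrepancy only appears when ties are present, which is common in trial data recorded in whole days or visits.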

Other examples of replication issues include a lack of clarity in SAS documentation for proc rmstreg, where it states a default tau=maximum event time will be used, but it implements the maximum of the event or censoring time. CAMIS helps users ensure correct model specification to avoid common pitfalls when software documentation is incorrect or unclear.
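The effect of the two candidate defaults for tau can be sketched with a hand-rolled restricted mean survival time (RMST), computed as the area under the Kaplan-Meier curve up to tau. This is a minimal illustration, not proc rmstreg: the helper names `km_steps` and `rmst` and the data are hypothetical, and the example assumes the last observation is censored so that the two definitions of tau differ.

```python
def km_steps(times, events):
    """Return [(event_time, S(event_time))] for the Kaplan-Meier curve."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j, deaths = i, 0
        while j < len(data) and data[j][0] == t:
            deaths += data[j][1]
            j += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= j - i
        i = j
    return steps


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area, prev_t, surv = 0.0, 0.0, 1.0
    for t, s in km_steps(times, events):
        if t >= tau:
            break
        area += surv * (t - prev_t)
        prev_t, surv = t, s
    return area + surv * (tau - prev_t)


# Hypothetical data: 1 = event, 0 = censored
rm_times = [2, 4, 5, 7, 9, 12]
rm_events = [1, 1, 0, 1, 0, 0]

tau_event = max(t for t, e in zip(rm_times, rm_events) if e == 1)  # last event time
tau_any = max(rm_times)  # last event or censoring time

rmst_event = rmst(rm_times, rm_events, tau_event)
rmst_any = rmst(rm_times, rm_events, tau_any)
print(f"tau = {tau_event}: RMST = {rmst_event:.4f}")
print(f"tau = {tau_any}: RMST = {rmst_any:.4f}")
```

Because the RMST depends directly on tau, stating tau explicitly in the SAP and in the code avoids relying on whichever default the software actually applies.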

CAMIS also identified a bug in the R {RBesT} package for a specific use case of 0% or 100% responders, resulting in a lack of replication. We successfully worked with the authors, and the package was fixed and re-released on CRAN as v1.8-0. For confidence intervals for proportions, no single package covered all the common methods, so analysts had to rely on {DescTools}, a miscellaneous basic statistics package unlikely to be acceptable in GxP environments. CAMIS highlighted this issue, and work has begun on a new package to provide this missing R functionality.
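To show why the choice of method matters for proportions, the sketch below implements three widely used intervals (Wald, Wilson and Agresti-Coull) using only the Python standard library. The selection of methods and the function names are assumptions made for illustration; this is not the API of {DescTools} or of any planned package.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wald_ci(x, n, z=Z95):
    """Wald (normal approximation) interval, truncated to [0, 1]."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_ci(x, n, z=Z95):
    """Wilson score interval without continuity correction."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def agresti_coull_ci(x, n, z=Z95):
    """Agresti-Coull interval: Wald formula on adjusted counts."""
    n_adj = n + z**2
    p_adj = (x + z**2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


# Hypothetical examples: 15/50 responders, and the 0% responder edge case
for x, n in [(15, 50), (0, 20)]:
    for name, method in [("Wald", wald_ci), ("Wilson", wilson_ci),
                         ("Agresti-Coull", agresti_coull_ci)]:
        lo, hi = method(x, n)
        print(f"{x}/{n} {name:14s} ({lo:.4f}, {hi:.4f})")
```

With 0 responders the Wald interval collapses to a single point while the other two do not, which is one reason the method should be named explicitly rather than left to a software default.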

CAMIS applies a variety of use case studies, in an attempt to replicate results across software. This provides the reader with 10 advantages!

1. An easy-to-read guide for common medical research analysis in R, SAS and Python.
2. A comparison of results across the software.
3. Rationale for why differences exist (default options vs available options).
4. Common mistakes when interpreting the documentation, which helps the user correctly implement methods in programming code.
5. Details regarding background methodology (including use of continuity corrections or convergence methods), which can be highly technical statistically and often not well documented.
6. Identifies bugs in the software, gives advice on workarounds or avoidance, or works with authors to fix bugs and improve the quality of the software.
7. Identifies missing requirements in R and collaborates to author new packages to fill these GxP needs.
8. Provides reassurance and confidence in the software being used through cross-software replication.
9. Highlights alternative open-source trusted macros (for example, if procedures are not available in SAS, there may be SAS macros published that can be used).
10. Provides time-saving trustworthy guidance to avoid duplication of effort within the medical research industry.

Topics within the repository include basic statistics, general linear models, generalised linear models, multiple imputation, survival modelling, non-parametric analysis, categorical data analysis, repeated measures, sample size and machine learning.

The CAMIS repository is a vital resource for medical statisticians and programmers, continually growing in content with your support. For additional insights into the importance of open-source skills, check out the PSI AIMS SIG blog post "Why Open-Source Skills are Important" on the PSI AIMS SIG website. See "CAMIS - A PHUSE DVOST Working Group" for the CAMIS repository and for how to get involved.

Posted on 26th June 2025

Categories: Working Groups News
