Written by the CAMIS Project team, within the Data Visualisation & Open Source Technologies Working Group
A recent CAMIS contribution explored the standard Tobit model for a virology endpoint (viral load) with a lower detection limit.
Tobit regression, a censored regression model, estimates linear relationships between independent variables and a dependent variable that is either left- or right-censored at a specific known value. The standard Tobit model assumes a normally distributed endpoint.
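To make the censoring idea concrete, here is a minimal sketch of the log-likelihood for a left-censored normal (Tobit) model with a single covariate. It is an illustration only, not the code used in the CAMIS comparison: the function name tobit_loglik, its arguments, and the example values are all hypothetical. Uncensored records contribute the normal density; censored records contribute the probability of falling at or below the censoring value.

```python
from math import log
from statistics import NormalDist


def tobit_loglik(y, x, censored, beta0, beta1, sigma):
    """Log-likelihood of a left-censored normal (Tobit) model.

    y        : observed value, or the censoring value when censored
    x        : covariate (e.g. a 0/1 group indicator)
    censored : True when the true value is only known to be <= y
    (All names are hypothetical, chosen for this illustration.)
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    total = 0.0
    for yi, xi, ci in zip(y, x, censored):
        z = (yi - (beta0 + beta1 * xi)) / sigma
        if ci:
            # Censored record: P(Y <= censoring value)
            total += log(std_normal.cdf(z))
        else:
            # Uncensored record: normal density at the observed value
            total += log(std_normal.pdf(z) / sigma)
    return total


# Made-up values for illustration only (not the CAMIS data)
y = [8.0, 9.1, 10.4, 8.0, 11.2]
x = [0, 0, 1, 1, 1]
censored = [True, False, False, True, False]
print(tobit_loglik(y, x, censored, beta0=9.0, beta1=1.0, sigma=1.5))
```

Maximising this function over beta0, beta1 and sigma gives the maximum likelihood estimates; beta1 then plays the role of the estimated difference between groups.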
In this CAMIS contribution, the standard Tobit model was applied to an endpoint with a lower limit: viral load, a virology endpoint with a lower limit of detection (e.g. 100 copies/mL). For a sample below that limit, one knows only that the result is <100 copies/mL; the exact value is not known.
The data consisted of two equally sized groups (n=10 in each group), left-censored at the lower limit of detection, a value of 8.0. In Group A, 4/10 records were censored; in Group B, 1/10. The implementations of Tobit regression in R and SAS were compared (link to full comparison on the CAMIS website: R vs SAS Tobit Regression).
In SAS, the LIFEREG procedure was used. Its MODEL statement requires the response in a specific form, namely "(lower, upper)". For a left-censored record the lower value is missing, so the upper value is treated as the left-censoring value. The data and other model specifications are also given as input. The output provides an estimate of the difference between the groups, along with the p-value, standard errors, confidence limits, and model fit statistics (full details can be found here: psiaims.github.io/CAMIS/SAS/tobit regression SAS.html).
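The "(lower, upper)" structure can be illustrated with a short Python sketch. This is not SAS code, and the helper name to_lower_upper is hypothetical; it only shows how each record might be encoded, with None standing in for a SAS missing value. An uncensored record has equal lower and upper values, while a left-censored record has a missing lower value and the censoring value as upper.

```python
from typing import Optional, Tuple


def to_lower_upper(value: float, below_limit: bool) -> Tuple[Optional[float], float]:
    """Encode one result as a (lower, upper) pair.

    value       : the observed value, or the detection limit when censored
    below_limit : True when the result is only known to lie at or below value
    None stands in for a SAS missing value (an assumption of this sketch).
    """
    if below_limit:
        # Left-censored: lower is missing, upper holds the censoring value
        return (None, value)
    # Uncensored: lower and upper are the same observed value
    return (value, value)


# Made-up records for illustration only (not the CAMIS data)
records = [(8.0, True), (9.3, False), (10.1, False)]
for value, below_limit in records:
    print(to_lower_upper(value, below_limit))
```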
In R, the censReg, survival and VGAM packages were explored. Similarly to SAS, each package takes the given data and model specification as input and provides an estimate of the difference between the groups as output. P-values, standard deviation for CIs, and model fit statistics are also provided (full details can be found here: psiaims.github.io/CAMIS/R/tobit regression.html).
The censReg() function (from the censReg package) and the survreg() function (from the survival package) gave results matching SAS LIFEREG; in these cases, estimation is by maximum likelihood. The vglm() function in VGAM showed slight numerical differences due to a different estimation technique: VGAM fits vector generalised linear and additive models, which are estimated using an iteratively reweighted least squares (IRLS) algorithm.
The standard Tobit regression results matched between R and SAS when a normally distributed endpoint was assumed. The comparison also highlighted the flexibility of Tobit regression implementations across software, and the importance of being aware of the default and available options in each: SAS LIFEREG and R’s survival package both offer a choice of distributional assumptions.
CAMIS (Comparing Analysis Method Implementations in Software) is a PHUSE DVOST Working Group (WG) collaboration with PSI AIMS SIG. The CAMIS open-source repository aims to provide essential information about the application of statistical methodology in software including SAS, R and Python. A lack of clear specification of methods can result in an inability to reproduce the same results in different software, especially as not all options are available in all software, and documentation can be unclear.
By documenting the differences found in a repository, we aim to reduce the time spent across the community when multiple people investigate the same issues. Please help us build a high-quality, easy-to-read and comprehensive repository. With your support, the repository will continue to grow in content and be a vital source for medical statisticians and programmers. See CAMIS - A PHUSE DVOST Working Group for the repository or how to Get Involved.