In multivariable analysis, it is common to have a mix of binary, categorical (ordinal or unordered) and continuous variables that may influence an outcome. While TG6 (Evaluating Diagnostic Tests and Prediction Models) considers the situation where the main task is to predict the outcome as accurately as possible, the main focus of TG2 is to identify influential variables and gain insight into their individual and joint relationships with the outcome. Two of the main (interrelated) challenges are the selection of variables for inclusion in a multivariable explanatory model and the choice of functional forms for continuous variables (Harrell 2001 [1], Sauerbrei et al. 2007 [2]).
In practice, multivariable models are usually built through a combination of
There is a consensus that all of the many suggested model-building strategies have weaknesses (Miller 2002 [3]), but opinions on the relative advantages and disadvantages of particular strategies differ considerably.
The effects of continuous predictors are typically modeled in one of two ways. The first is to categorize them, which raises issues such as the number of categories, the choice of cutpoint values, the implausibility of the resulting step-function relationships, local biases, loss of power, and invalid inference when cutpoints are data-dependent (Greenland 1995 [4]). The second is to assume a linear relationship with the outcome, possibly after a simple transformation (e.g. logarithmic or quadratic). Often, however, the reasons for choosing such conventional representations of continuous variables are not discussed, and the validity of the underlying assumptions is not assessed.
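The contrast between these conventional representations can be illustrated with a small simulation. The following is a minimal sketch, not an analysis from the cited literature: the data are simulated, the true relationship (logarithmic) is an assumption chosen for illustration, and the helper name `rss_of_fit` is hypothetical. It fits the same outcome using a median split, a linear term, and a log-transformed term, and compares residual sums of squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumption for illustration): the outcome depends on log(x).
n = 500
x = rng.uniform(1.0, 20.0, size=n)
y = 2.0 * np.log(x) + rng.normal(0.0, 0.5, size=n)


def rss_of_fit(design, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


ones = np.ones(n)

# (a) Dichotomize at the median: a two-level step function.
high = (x > np.median(x)).astype(float)
rss_step = rss_of_fit(np.column_stack([ones, high]), y)

# (b) Assume a linear effect of x.
rss_linear = rss_of_fit(np.column_stack([ones, x]), y)

# (c) Use the transformation that matches the simulated truth.
rss_log = rss_of_fit(np.column_stack([ones, np.log(x)]), y)

print(f"step: {rss_step:.1f}  linear: {rss_linear:.1f}  log: {rss_log:.1f}")
```

In this particular setup the median split fits worst and the log term fits best, but the ranking depends entirely on the simulated truth; in real data the functional form is unknown, which is the difficulty the paragraph above describes.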
To address these limitations, statisticians have developed flexible modeling techniques based on various types of smoothers, including fractional polynomials (Royston and Altman 1994 [5], Royston and Sauerbrei 2008 [6]) and several 'flavours' of splines. The latter include restricted regression splines (Boer 2001 [7], Harrell 2001 [1]), penalized regression splines (Wood 2006 [8]) and smoothing splines (Hastie and Tibshirani 1990 [9]). For multivariable analysis, these smoothers have been incorporated into generalized additive models.
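To make the idea of a fractional polynomial concrete, here is a minimal sketch of selecting a first-degree fractional polynomial (FP1) by least squares. It uses the conventional power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where power 0 denotes log(x). The simulated data and the function names `fp_transform` and `best_fp1` are assumptions made for illustration; the full procedures in the cited references also cover second-degree polynomials and formal tests between candidate functions, which are omitted here.

```python
import numpy as np

# Conventional FP power set; power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Fractional-polynomial transform of a strictly positive variable."""
    return np.log(x) if power == 0 else x ** power


def best_fp1(x, y):
    """Return (power, rss) of the best-fitting first-degree FP under OLS."""
    results = {}
    for p in FP_POWERS:
        design = np.column_stack([np.ones_like(x), fp_transform(x, p)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        results[p] = float(resid @ resid)
    best = min(results, key=results.get)
    return best, results[best]


# Simulated data (assumption for illustration): the true power is -1.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 20.0, size=500)
y = 3.0 / x + rng.normal(0.0, 0.05, size=500)

power, rss = best_fp1(x, y)
print(power, round(rss, 3))
```

Choosing the power by residual sum of squares alone ignores the extra flexibility gained by searching over eight candidates, so this sketch should not be read as a complete selection procedure.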
Various examples illustrate that such smoothers can yield new insight into the role of continuous variables (Abrahamowicz et al. 1997 [10], Royston and Sauerbrei 2008 [6]). However, further practical guidance is urgently needed, which will require extended investigation of analytical properties and systematic comparisons between alternative methods.
TG2 will start with a comprehensive review of the methodological, medical and econometric literature to
Part (c) may lead to new comparative simulation studies and provide building blocks for evaluation of new techniques by simulation.
We aim to develop consensus-based tentative recommendations, initially for level 2 expertise, under some simplifying assumptions about the data structure. Recommendations will address accuracy, efficiency, transportability, ease of implementation and interpretability in a wide range of applications (Sauerbrei et al. 2007 [2]). Furthermore, we aim to develop systematic guidance for using splines in applications, similar to existing guidelines for fractional polynomials (Royston and Sauerbrei 2008 [6]). Longer-term goals include evaluating, and making recommendations for, computationally intensive variable selection algorithms that incorporate shrinkage and resampling techniques, and collaborating with other TGs to account for such complexities as missing data, measurement errors, time-varying confounding, or issues specific to modeling continuous predictors in survival analyses (Abrahamowicz and MacKenzie 2006 [12]).