Missing data are ubiquitous in observational studies, and the simple solution of restricting the analysis to the subset with complete records will often result in bias and loss of power. The seriousness of these issues for the resulting inferences depends on both the mechanism causing the missing data and the form of the substantive question and associated model. The methodological literature on the analysis of partially observed data has grown substantially over the last twenty years (e.g. Fitzmaurice et al. 2014 [1], Little and Rubin 2002 [2], and references therein), such that it may be hard for analysts to identify appropriate (but not unduly complex) methods for their setting. Our aim is to draw on both the existing advice (e.g. National Research Council (U.S.) 2010 [3], Sterne et al. 2009 [4], Carpenter et al. 2012 [5], Carpenter and Kenward 2013 [6]) and the expertise of the TG1 members to provide practical guidance that will lead to appropriate analysis in standard observational settings, while giving principles that can inform analysis plans for less common substantive models.
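As a rough illustration of why the missing data mechanism matters for a complete records analysis, the following minimal simulation sketch may help. It is not part of the group's guidance. The function name `simulate`, the mechanism labels "MCAR" (missingness unrelated to the data) and "MAR" (missingness depending on a fully observed covariate), the sample size and the observation probabilities are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=1):
    """Return (full-data mean of y, complete-records mean of y, number observed).

    Hypothetical setup: x ~ N(0, 1) is always observed, y = x + noise has
    true mean 0, and y is sometimes missing.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        full.append(y)
        if mechanism == "MCAR":
            # Assumed: every y has the same chance of being observed.
            p_obs = 0.5
        else:
            # Assumed "MAR": y is more likely to be observed when x is positive.
            p_obs = 0.8 if x > 0 else 0.2
        if rng.random() < p_obs:
            observed.append(y)
    return statistics.mean(full), statistics.mean(observed), len(observed)


mcar_full, mcar_cc, mcar_n = simulate("MCAR")
mar_full, mar_cc, mar_n = simulate("MAR")

print(f"MCAR: full-data mean {mcar_full:.3f}, complete-records mean {mcar_cc:.3f}, n observed {mcar_n}")
print(f"MAR:  full-data mean {mar_full:.3f}, complete-records mean {mar_cc:.3f}, n observed {mar_n}")
```

In this sketch both mechanisms discard roughly half of the records, which costs precision, but only the covariate-dependent mechanism shifts the complete records mean of y away from its true value of zero. Whether a given substantive analysis is affected in the same way depends on the model being fitted, which is the point made above.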
To achieve this aim, the topic group will describe a set of principles for the analysis of partially observed observational data and illustrate their application in settings ranging from simple summaries of single variables, through regression models and models for hierarchical and longitudinal data, to models that adjust for time-varying confounding.
Specifically, we aim to:
In particular, we will delineate how the various methods relate to each other and when they are likely to give similar answers.
Since the data at hand cannot definitively identify the missing data mechanism, exploring the robustness of inferences to departures from the primary assumption about the missing data mechanism is important in many applications. We will discuss how to frame assumptions for such sensitivity analyses, and practical approaches to analyses under these assumptions.
Consistent with the STRATOS initiative, the group will not focus narrowly on missing data (which is itself an extreme form of coarsened data, a class that also includes measurement error), but will place its guidance firmly in the context of appropriate design and statistical methods for the inferential question at hand. Rather than providing a recipe book, our aim is to foster understanding of the key principles, so that methods can be chosen and applied with confidence. In the course of this work, key areas requiring further research will inevitably emerge, and we anticipate that scoping their nature will also be a useful contribution to future research in this area.