Topic Group 1: Missing data

Chairs: James Carpenter, Kate Lee
Members: Melanie Bell, Els Goetghebeur, Joe Hogan, Rod Little, Andrea Rotnitzky, Kate Tilling, Ian White

Missing data are ubiquitous in observational studies, and the simple solution of restricting the analysis to the subset with complete records will often result in bias and loss of power. The seriousness of these issues for the resulting inferences depends on both the mechanism causing the missing data and the form of the substantive question and associated model. The methodological literature on the analysis of partially observed data has grown substantially over the last twenty years (e.g. Fitzmaurice et al. 2014 [1], Little and Rubin 2002 [2] and references therein), such that it may be hard for analysts to identify appropriate (but not unduly complex) methods for their setting. Our aim is to draw on both the existing advice (e.g. National Research Council (U.S.) 2010 [3], Sterne et al. 2009 [4], Carpenter et al. 2012 [5], Carpenter and Kenward 2013 [6]) and the expertise of the TG1 members to provide practical guidance that will lead to appropriate analysis in standard observational settings, while giving principles that can inform analysis plans for less common substantive models.

To achieve this aim, the topic group will describe a set of principles for the analysis of partially observed observational data and illustrate their application in settings ranging from simple summaries of single variables, through regression models and models for hierarchical and longitudinal data, to models that adjust for time-varying confounding.

Specifically, we aim to:

  1. assist analysts in understanding the nature of the additional assumptions inherent in the analysis of partially observed data;
  2. describe, in a range of settings, the implications of these assumptions for analyses that restrict to the subset of complete records;
  3. detail the range of methods available for improving on a complete records analysis, including the EM and related algorithms, multiple imputation, inverse probability and doubly robust methods; and
  4. provide guidance on the utility and pitfalls of each approach, bearing in mind the importance of software availability for most applied researchers.

In particular, we will delineate how the various methods relate to each other and when they are likely to give similar answers.
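As a minimal sketch of the kind of comparison described above, the following Python simulation contrasts a complete records mean with two simple corrections. Everything in it is an illustrative assumption rather than part of the group's guidance: the data-generating model, the observation probabilities (0.9 and 0.3), and the function names are hypothetical. The outcome Y is missing at random given a fully observed binary covariate X, so the complete records mean is biased, while inverse probability weighting and stratum-mean imputation both recover the true mean of 2. With a saturated model in a single binary covariate, the two corrections coincide algebraically, which is one simple case of different methods giving the same answer.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate (x, y) pairs where y is missing at random given x.

    Assumed model (illustrative only): X ~ Bernoulli(0.5),
    Y = 1 + 2X + N(0, 1), so the true mean of Y is 2.
    Y is observed with probability 0.9 if X = 0 and 0.3 if X = 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        p_obs = 0.3 if x == 1 else 0.9
        observed = rng.random() < p_obs
        rows.append((x, y if observed else None))
    return rows


def complete_records_mean(rows):
    """Mean of Y using only records where Y is observed."""
    ys = [y for _, y in rows if y is not None]
    return sum(ys) / len(ys)


def stratum_summaries(rows):
    """For each x: (stratum size, number observed, observed mean of Y)."""
    out = {}
    for x in (0, 1):
        n_x = sum(1 for xi, _ in rows if xi == x)
        ys = [y for xi, y in rows if xi == x and y is not None]
        out[x] = (n_x, len(ys), sum(ys) / len(ys))
    return out


def ipw_mean(rows):
    """Weight each observed Y by 1 / estimated P(observed | x)."""
    s = stratum_summaries(rows)
    p_hat = {x: s[x][1] / s[x][0] for x in s}
    num = sum(y / p_hat[x] for x, y in rows if y is not None)
    den = sum(1.0 / p_hat[x] for x, y in rows if y is not None)
    return num / den


def imputation_mean(rows):
    """Replace each missing Y by the observed mean in its x stratum.

    This targets the mean only; it is not multiple imputation and
    gives no valid standard error.
    """
    s = stratum_summaries(rows)
    filled = [y if y is not None else s[x][2] for x, y in rows]
    return sum(filled) / len(filled)


if __name__ == "__main__":
    data = simulate()
    print("complete records:", complete_records_mean(data))
    print("IPW:             ", ipw_mean(data))
    print("imputation:      ", imputation_mean(data))
```

The sketch deliberately estimates only a mean. In a regression of Y on X, the same missingness mechanism would not bias the complete records estimate of the X coefficient, which illustrates why the form of the substantive model matters as much as the mechanism.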

Since the data at hand cannot definitively identify the missing data mechanism, exploring the robustness of inferences to departures from the primary assumption about the missing data mechanism is important in many applications. We will discuss how to frame assumptions for such sensitivity analyses, and practical approaches to analyses under these assumptions.
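One common way to frame such a sensitivity analysis is a delta adjustment: assume the unobserved values differ on average from the observed ones by a fixed shift, and see how the estimate changes as that shift varies. The Python sketch below is an assumption-laden illustration and not a method prescribed here. It has no covariates, its primary assumption is that missing and observed values share the same mean, and the function name `delta_adjusted_mean` and the parameter `delta` are hypothetical.

```python
def delta_adjusted_mean(values, delta):
    """Overall mean if missing values average (observed mean + delta).

    `values` is a list of numbers with None marking a missing value.
    delta = 0 corresponds to the primary assumption used in this
    sketch (missing and observed values have the same mean) and
    returns the complete records mean.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values")
    obs_mean = sum(observed) / len(observed)
    frac_missing = 1.0 - len(observed) / len(values)
    # Observed part contributes obs_mean; missing part contributes
    # obs_mean + delta, weighted by the fraction missing.
    return obs_mean + frac_missing * delta


if __name__ == "__main__":
    y = [2.0, 4.0, None, 6.0, None]  # made-up example data
    for delta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(delta, delta_adjusted_mean(y, delta))
```

Reporting the estimate over a plausible range of `delta` values, chosen with subject-matter input, shows how far the primary assumption must fail before the conclusions change.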

Consistent with the STRATOS initiative, the group will not focus narrowly on missing data (which are themselves but an extreme form of coarsened data, of which measurement error is another example), but will place its guidance firmly in the context of appropriate design and statistical methods for the inferential question at hand. Rather than providing a recipe book, our aim is to foster understanding of the key principles, so that methods can be chosen and applied with confidence. Our work will inevitably reveal key areas requiring further research, and we anticipate that scoping their nature will also be a useful contribution to future research in this area.

  1. Fitzmaurice GM; Kenward MG; Molenberghs G; Tsiatis AA; Verbeke G. Handbook of Missing Data. CRC Press: New York, 2014.
  2. Little RJ; Rubin DB. Statistical Analysis with Missing Data. John Wiley and Sons Ltd: Chichester, 2002.
  3. National Research Council (U.S.). The prevention and treatment of missing data in clinical trials. National Academies Press: Washington, D.C., 2010.
  4. Sterne JAC; White IR; Carlin JB; Spratt M; Royston P; Kenward MG; Wood AM; Carpenter JR. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ (Clinical research ed.) 2009; 338: b2393.
  5. Carpenter JR; Kenward MG; Goldstein H. Statistical modelling of partially observed data using multiple imputation: principles and practice. 15-23. (Eds. Y. Tu and D. Greenwood). Springer: New York, 2012.
  6. Carpenter JR; Kenward MG. Multiple Imputation and its Application. John Wiley & Sons Ltd: Chichester, 2013.