Topic Group 3: Initial data analysis

Chairs: Marianne Huebner, Saskia le Cessie, Werner Vach
Members: Dianne Cook, Heike Hofmann, Lara Lusa, Carsten Oliver Schmidt

Homepage: Topic Group 3

The main aim of initial data analysis (IDA) is to provide reliable knowledge about the data that enables responsible statistical analysis and interpretation (Schmidt et al. 2018 [1]).

IDA consists of all steps performed on the data of a study between the end of data collection/entry and the start of the statistical analyses that address the research questions. Shortcomings in these first steps may lead to inappropriate statistical methods or incorrect conclusions (Huebner et al. 2016 [2]).

Our topic group promotes IDA as a highly structured step in the data analysis process. To this end, we are developing a framework for IDA and tools that facilitate the IDA process.


The following steps are an integral part of IDA (Huebner et al. 2018 [3]):

  1. Metadata setup aimed at systematically gathering all background information required to properly conduct the subsequent IDA steps. Beyond technical metadata such as labels or plausibility limits, this covers conceptual metadata, which combines information from the study protocol, secondary information sources and the actual conduct of the study.
  2. Data cleaning aimed at identifying and correcting errors in the data, using the metadata to make the procedure efficient.
  3. Data screening aimed at understanding the properties and quality of the data that may affect subsequent analysis and interpretation. The focus is on data properties.
  4. Initial data reporting aimed at informing all potential collaborators about all relevant insights gained in the previous steps. It should provide all the information needed to properly conduct the intended analyses.
  5. Refining and updating the analysis plan, which translates the relevant findings from data screening into corresponding adaptations of the analysis plan.
  6. Reporting IDA in research papers, a final step that ensures transparency about key findings and actions that affected the analysis and interpretation of results.
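The short Python sketch below shows how technical metadata (step 1) might drive data cleaning (step 2), data screening (step 3) and a minimal initial data report (step 4). It is not a tool produced by the topic group. The VariableMeta structure, the example variables age and sbp, their plausibility limits and all function names are hypothetical, and only the standard library is used. In line with step 2, the sketch flags implausible values instead of silently correcting them, because any correction should be a documented decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableMeta:
    """Technical metadata for one variable (hypothetical structure)."""
    label: str
    lower: Optional[float] = None  # lower plausibility limit
    upper: Optional[float] = None  # upper plausibility limit


# Hypothetical metadata for two example variables.
METADATA = {
    "age": VariableMeta("Age at baseline (years)", lower=18, upper=100),
    "sbp": VariableMeta("Systolic blood pressure (mmHg)", lower=60, upper=260),
}


def flag_implausible(records, metadata):
    """Data cleaning: list (row, variable, value) outside plausibility limits.

    Values are only flagged, not changed; corrections are decided separately.
    """
    flags = []
    for i, row in enumerate(records):
        for var, meta in metadata.items():
            value = row.get(var)
            if value is None:
                continue  # missing values are summarised during screening
            if meta.lower is not None and value < meta.lower:
                flags.append((i, var, value))
            elif meta.upper is not None and value > meta.upper:
                flags.append((i, var, value))
    return flags


def missing_proportions(records, metadata):
    """Data screening: proportion of missing (None) values per variable."""
    n = len(records)
    return {
        var: sum(row.get(var) is None for row in records) / n if n else 0.0
        for var in metadata
    }


def initial_report(records, metadata):
    """Initial data reporting: plain-text summary of the findings above."""
    lines = [f"n = {len(records)} records"]
    for var, share in missing_proportions(records, metadata).items():
        lines.append(f"{metadata[var].label}: {share:.0%} missing")
    for i, var, value in flag_implausible(records, metadata):
        lines.append(f"row {i}: {var} = {value} outside plausibility limits")
    return "\n".join(lines)


if __name__ == "__main__":
    data = [
        {"age": 45, "sbp": 130},
        {"age": 17, "sbp": None},
        {"age": 63, "sbp": 300},
        {"age": None, "sbp": 118},
    ]
    print(initial_report(data, METADATA))
```

Keeping the limits in a metadata structure rather than hard-coding them in each check lets the same checks be reused across variables; conceptual metadata from the study protocol would still be needed to decide which checks are meaningful.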