Implementing statistical analyses

  • WMO
  • Non-WMO

The necessary techniques and methods for statistical analyses are described in the Statistical Analysis Plan (SAP).

To enhance your statistical knowledge, the methodologists and statisticians of the EDS department have developed an e-learning on Practical Biostatistics. Additionally, the website of EpidM offers various training resources and refresher courses on statistical methods and techniques for medical-scientific research.

A Biostatistics Wiki is also available, containing short and accessible explanations of commonly used statistical techniques in medical-scientific research.

Organizing statistical analyses

Practical suggestions for conducting statistical analyses in an organized and reproducible manner:

  • It is extremely important to retain the final, cleaned dataset in its original form. Therefore, create a working copy first, and save the original dataset as a read-only file with a new name in a separate folder. Use the working copy for all further statistical analyses;
  • Always use a syntax file (in SPSS) or a script (in R) when performing data operations (e.g. recoding, merging variables) and statistical analyses. This keeps your work systematic and organized, and enables you to replicate your analyses later on or have them repeated and checked by other researchers. Cumbersome tasks (e.g. recoding) can also be repeated easily and efficiently by copying from previous syntax files;
  • Add descriptions and comments to the syntax in simple, clear language. This will clarify the purpose and result of each operation, making the process easier to understand, especially for others;
  • Save syntax files with meaningful file names (.sps files in SPSS). Similarly, save corresponding output files (.spv files in SPSS) with descriptive names that reflect the results of the analyses.
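
The first suggestion above (archive the original, work on a copy) can also be scripted, so that it is done the same way for every project. The following is a minimal sketch in Python, assuming the cleaned dataset is a single file; the function name, folder names and file name suffixes are hypothetical and only serve as an illustration.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def protect_and_copy(cleaned_file, archive_dir, work_dir):
    """Archive the cleaned dataset read-only and create a working copy."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Working copy first: all further analyses use this file.
    working = work_dir / f"{cleaned_file.stem}_working{cleaned_file.suffix}"
    shutil.copy2(cleaned_file, working)

    # Save the original under a new name in a separate folder ...
    archived = archive_dir / f"{cleaned_file.stem}_original{cleaned_file.suffix}"
    shutil.copy2(cleaned_file, archived)
    # ... and drop write permission so it cannot be overwritten by accident.
    archived.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return archived, working


# Demonstration in a scratch folder with a hypothetical two-row dataset.
scratch = Path(tempfile.mkdtemp())
cleaned = scratch / "study_cleaned.csv"
cleaned.write_text("id,age\n1,34\n2,51\n")
archived, working = protect_and_copy(
    cleaned, scratch / "original", scratch / "analysis"
)
```

Because the archived file is read-only, running the function a second time on the same dataset is expected to fail instead of silently replacing the archived original.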

Data analyses documentation

Clear documentation is important for reproducibility and efficiency in data analysis. This can be done by creating a log file for relevant analyses:

  • Start the log file with the research question to be answered and the date of the analysis. Conclude with a (provisional) answer to the question.
  • Use the log file (e.g. SPSS syntax) to document your analyses (e.g. for an article). This allows for easy retrieval and reproduction by you and others.
  • Always include the name and location of the data file (e.g. ‘get file’ in SPSS), so you know which data file is related to which analysis and where it is saved.
  • Log files should include the code for all statistical analyses performed. Structure the code logically, separating variable definitions from analyses (e.g. first all variable definitions, then the analyses for table 1, then table 2, etc.).
  • Annotate your log files (e.g. * followed by text in an SPSS syntax). Annotations help document your analysis process and facilitate the reuse of your code.
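
The elements listed above can be collected in a fixed template that opens every log file. The sketch below is one possible way to generate such a template in Python; the function name, the example research question, the file path and the section titles are hypothetical. The leading asterisk mirrors the annotation style mentioned above and may need adapting to the software you use.

```python
from datetime import date


def log_header(question, data_file, analysis_date):
    """Return the opening lines of an analysis log as one string."""
    lines = [
        f"* Research question: {question}",
        f"* Date of analysis: {analysis_date.isoformat()}",
        f"* Data file: {data_file}",
        "* Provisional answer: (fill in after the analyses)",
        "",
        "* Section 1: variable definitions",
        "",
        "* Section 2: analyses for table 1",
        "",
        "* Section 3: analyses for table 2",
    ]
    return "\n".join(lines)


# Hypothetical example values, for illustration only.
header = log_header(
    question="Is age associated with the questionnaire score?",
    data_file="analysis/study_cleaned_working.sav",
    analysis_date=date(2024, 1, 15),
)
print(header)
```

Keeping variable definitions and the analyses for each table in separate, labelled sections makes it easier to find and rerun the code behind a specific result.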

Handling missing data

Missing data is a common problem in research. How you handle it depends on the amount and type of missing data (e.g. single items, a full questionnaire, a measurement wave) and on the reasons for the missing data (e.g. missing at random or not).

Software packages like SPSS, Stata or SAS typically exclude cases with missing values from the analyses, which reduces the sample size and increases standard errors. As a result, statistical power decreases: the chance of correctly rejecting the null hypothesis of no effect, when an effect truly exists, becomes smaller.
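
The loss of precision can be made concrete with the standard error of a mean, which equals the standard deviation divided by the square root of the sample size. The sketch below uses hypothetical records and an assumed standard deviation. It shows only the effect of the smaller sample and deliberately ignores any bias.

```python
import math

# Hypothetical records (age, systolic blood pressure); None marks a missing value.
records = [
    (34, 118.0), (51, None), (47, 131.0), (None, 125.0),
    (62, 140.0), (29, None), (55, 136.0), (40, 122.0),
]

# Listwise deletion: a record is dropped as soon as one value is missing.
complete = [r for r in records if None not in r]
n_all, n_complete = len(records), len(complete)


def se_of_mean(sd, n):
    """Standard error of a mean for standard deviation sd and sample size n."""
    return sd / math.sqrt(n)


# With the same standard deviation, a smaller n gives a larger standard error.
sd = 10.0  # assumed value, for illustration only
se_all = se_of_mean(sd, n_all)
se_complete = se_of_mean(sd, n_complete)
inflation = se_complete / se_all  # equals sqrt(n_all / n_complete)
print(f"{n_complete} of {n_all} records remain; SE is {inflation:.2f} times larger")
```

Note that three incomplete records are removed even though each of them still contains one observed value, which is information a complete-case analysis discards.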

Excluding cases with missing data can also introduce bias in effect estimates if the characteristics of responders and non-responders differ. Especially when the non-responder group is large, the analysed sample may no longer resemble the original sample or the study population.

Therefore, always inspect the missing data before starting any analysis. Never exclude cases without understanding the reasons for the missing values, and consider alternative methods for handling missing data.
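
A first inspection could count the missing values per variable and compare an observed characteristic between cases with and without a missing outcome. The sketch below does this in Python for a small hypothetical dataset in which an empty field marks a missing value; the variable names and values are invented for illustration.

```python
import csv
import io
from statistics import mean

# Hypothetical data; an empty field marks a missing value.
raw = """id,age,sex,score
1,34,F,12
2,51,M,
3,47,F,15
4,62,M,
5,29,F,9
6,58,M,
"""
rows = list(csv.DictReader(io.StringIO(raw)))


def is_missing(value):
    """Treat absent or blank fields as missing."""
    return value is None or value.strip() == ""


# Step 1: number of missing values per variable.
missing_per_variable = {
    var: sum(is_missing(row[var]) for row in rows) for var in rows[0]
}

# Step 2: compare age between cases with and without a missing score.
age_missing = [int(r["age"]) for r in rows if is_missing(r["score"])]
age_observed = [int(r["age"]) for r in rows if not is_missing(r["score"])]
mean_age_missing = mean(age_missing)
mean_age_observed = mean(age_observed)

print(missing_per_variable)
print(f"mean age, score missing: {mean_age_missing:.1f}")
print(f"mean age, score observed: {mean_age_observed:.1f}")
```

In this invented example the cases with a missing score are clearly older, which would suggest that the missing values are related to age and that simply excluding those cases could bias the results.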