Objective: To examine how variation in clinical coding systems and the number of conditions included under different study criteria influence estimates of multimorbidity prevalence in a nationally representative adult population in England.

Methods and analysis: We conducted a cross-sectional analysis of anonymised records from 7.2 million adults in the Clinical Practice Research Datalink, linked to Hospital Episode Statistics, covering the period from 1987 to 2020. Adults were included if they had at least two recorded health conditions. Multimorbidity was defined as ≥2 health conditions selected from a list of 54 conditions. Prevalence was estimated separately using general practice (GP) data, hospital data and combined sources, and stratified by age, sex, ethnicity and deprivation. A stepwise inclusion approach assessed the impact of expanding the number of conditions included under different study criteria. Gradient boosting (XGBoost) with Shapley Additive Explanations (SHAP) values identified predictors of multimorbidity recorded only in GP data. The direction of each association was examined using Pearson correlations.

Results: Multimorbidity prevalence was 92.3% using GP data, 63.2% using hospital data and 100% when both sources were combined. Prevalence increased consistently as more conditions were included under different study criteria and was always higher in GP data than in hospital data. Discrepancies between the two sources were most pronounced among younger adults and ethnic minority groups. GP-only coding was associated with younger age, female sex, shorter hospital stays, absence of Accident & Emergency use, no palliative care coding and lower deprivation.

Conclusion: Estimates of multimorbidity prevalence are highly sensitive to both the clinical coding system used and the number of conditions included under different study criteria. Standardised approaches to condition selection and the integration of data sources are essential to ensure accurate measurement and equitable representation.
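The prevalence definition used in the methods (≥2 conditions drawn from a fixed list, evaluated per data source and again after combining sources) can be illustrated with a minimal Python sketch. This is not the study's code: the function names `prevalence` and `combine`, the toy records and the condition names are all hypothetical, and the stepwise loop only approximates the idea of expanding the condition list.

```python
from typing import Dict, Iterable, Optional, Set

Records = Dict[str, Set[str]]  # person id -> set of recorded conditions


def prevalence(records: Records,
               condition_list: Optional[Iterable[str]] = None,
               threshold: int = 2) -> float:
    """Share of people with >= `threshold` conditions from `condition_list`.

    If `condition_list` is None, every recorded condition counts.
    """
    if not records:
        return 0.0
    allowed = None if condition_list is None else set(condition_list)
    hits = 0
    for conditions in records.values():
        counted = conditions if allowed is None else conditions & allowed
        if len(counted) >= threshold:
            hits += 1
    return hits / len(records)


def combine(gp: Records, hospital: Records) -> Records:
    """Union of conditions per person across both sources."""
    people = set(gp) | set(hospital)
    return {p: gp.get(p, set()) | hospital.get(p, set()) for p in people}


# Hypothetical toy data, not drawn from the study.
gp = {
    "a": {"asthma", "depression"},
    "b": {"hypertension"},
    "c": {"diabetes", "ckd", "copd"},
}
hospital = {
    "a": {"asthma"},
    "b": {"heart failure"},
    "c": {"diabetes", "ckd"},
}

gp_prev = prevalence(gp)
hosp_prev = prevalence(hospital)
combined_prev = prevalence(combine(gp, hospital))

# Stepwise inclusion: prevalence as the condition list grows (hypothetical order).
ordered = ["asthma", "diabetes", "ckd", "depression", "copd"]
stepwise = [prevalence(gp, ordered[:k]) for k in range(1, len(ordered) + 1)]
```

In this toy example the combined estimate exceeds either single-source estimate because person "b" has one condition in each source, which mirrors why estimates depend on the source and the condition list.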
Journal article
BMJ
2026-01-01T00:00:00+00:00
4
e004120 - e004120