Case Study
EuroSCORE II is a commonly used scoring rule for estimating the risk of in-hospital mortality in patients undergoing major cardiac surgery. It was developed using data from 16,828 adult patients from 43 countries. Predictors include patient characteristics (e.g. age, gender), cardiac-related factors (e.g. recent MI) and surgery-related factors (e.g. surgery on the thoracic aorta). Guida et al. (2014) undertook a systematic review to identify articles assessing the performance of EuroSCORE II in predicting perioperative mortality in cardiac surgery. The review identified 24 eligible validation studies, of which 22 were included in the main analysis.
In this case study, we summarize the results from these 22 studies, as well as the results from the split-sample validation contained within the original development article of EuroSCORE II. We will use the metamisc package to derive summary estimates for the discrimination and calibration performance of EuroSCORE II, to evaluate the presence of between-study heterogeneity, and to identify its potential sources. A step-by-step tutorial is provided by Debray et al. (2017).
The data from all 23 validation studies, available in the metamisc package as the EuroSCORE dataset, are summarized below:
Study | Patients (n) | Mortality (%) | c-index |
---|---|---|---|
Nashef | 5553 | 4.18 | 0.8095 |
Biancari | 1027 | 2.73 | 0.8670 |
Di Dedda | 1090 | 3.76 | 0.8100 |
Chalmers | 5576 | 3.43 | 0.7900 |
Grant | 23740 | 3.14 | 0.8080 |
Carneo | 3798 | 5.66 | 0.8500 |
Kunt | 428 | 7.94 | 0.7200 |
Kirmani | 15497 | 3.53 | 0.8180 |
Howell | 933 | 9.65 | 0.6700 |
Wang | 11170 | 2.02 | 0.7200 |
Borde | 498 | 1.61 | 0.7200 |
Qadir | 2004 | 3.79 | 0.8400 |
Spiliopoulos | 216 | 6.48 | 0.7700 |
Wendt | 1066 | 4.22 | 0.7200 |
Laurent | 314 | 5.73 | 0.7700 |
Wang | 818 | 1.59 | 0.6420 |
Nishida | 461 | 7.16 | 0.7697 |
Barilli | 12201 | 1.72 | 0.8000 |
Barilli | 1670 | 7.49 | 0.8200 |
Paparella | 6191 | 4.85 | 0.8300 |
Carosella | 250 | 3.60 | 0.7600 |
Borracci | 503 | 4.17 | 0.8560 |
Osnabrugge | 50588 | 2.12 | 0.7700 |
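The calibration analysis works with event counts rather than percentages, so it can help to recover the approximate number of observed deaths from the table. The following is a minimal Python sketch, not part of metamisc: the helper name `approx_events` is hypothetical, only the first three rows of the table are copied in, and because the mortality percentages are rounded the reconstructed counts are approximations.

```python
# First three rows of the table above: (study, patients, mortality in %).
rows = [
    ("Nashef", 5553, 4.18),
    ("Biancari", 1027, 2.73),
    ("Di Dedda", 1090, 3.76),
]


def approx_events(n, mortality_pct):
    """Approximate number of observed deaths: O = n * mortality / 100.

    Hypothetical helper; the result is rounded to the nearest whole event
    because the reported percentage is itself rounded.
    """
    return round(n * mortality_pct / 100)


# A list (not a dict) is used because study names repeat in the full table.
observed = [(study, approx_events(n, pct)) for study, n, pct in rows]

for study, events in observed:
    print(f"{study}: about {events} observed deaths")
```

Where a validation study reports the exact number of deaths, that figure should be preferred over a count reconstructed in this way.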
Meta-analysis of calibration performance
Calibration refers to the accuracy of a model's predicted risk probabilities, and indicates the extent to which expected outcomes (predicted from the model) and observed outcomes agree. Summarising estimates of calibration performance is challenging because calibration plots are often not presented, and because studies tend to report different types of summary statistics for calibration. For example, in the case study, calibration was assessed using the Hosmer-Lemeshow test, calibration plots, or by comparing the observed mortality to the predicted EuroSCORE II (either overall or for groups of patients).
Within each validation study, we can compare the total number of observed events (O) with the total number of expected (predicted) events (E) by deriving their ratio O:E. The total O:E ratio provides a rough indication of the overall model calibration (across the entire range of predicted risks). It describes whether more (O:E > 1) or fewer (O:E < 1) events occurred than expected based on the model. Whilst the O:E ratio itself was not explicitly reported in all studies, it can be calculated from other reported information, for example as O:E = O / E when the total numbers of observed and expected events are available.
The O:E ratio can also be derived from the observed mortality risk Po and the predicted mortality risk Pe, as O:E = Po / Pe.
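Both routes to the O:E ratio can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the metamisc implementation: the function names are hypothetical, and the study figures (1,000 patients, 50 observed and 40 expected deaths) are invented purely to show that the two calculations agree.

```python
def oe_from_counts(observed, expected):
    """Total O:E ratio from observed (O) and expected (E) event counts."""
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return observed / expected


def oe_from_risks(po, pe):
    """Total O:E ratio from observed (Po) and predicted (Pe) mortality risk."""
    if not 0 < pe <= 1:
        raise ValueError("predicted risk must lie in (0, 1]")
    return po / pe


# Hypothetical study, NOT taken from the EuroSCORE II data.
n, observed, expected = 1000, 50, 40

oe_counts = oe_from_counts(observed, expected)
# Po = O / n and Pe = E / n, so the sample size cancels in the ratio.
oe_risks = oe_from_risks(observed / n, expected / n)

print(f"O:E from counts: {oe_counts:.3f}")
print(f"O:E from risks:  {oe_risks:.3f}")
```

In this invented example the ratio is above 1, meaning more deaths occurred than the model predicted. The two functions agree because Po and Pe are simply O and E divided by the same sample size.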
It is recommended to first transform extracted O:E ratios to the log (natural logarithm) scale before applying a meta-analysis.
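The log transformation can be sketched as follows. The example assumes that the standard error of ln(O:E) is approximated by sqrt((1 - Po) / O), with Po = O / n; this is one commonly used approximation, and other approximations may be more suitable depending on what a study reports. The function names and the input figures are hypothetical and not part of metamisc.

```python
import math


def log_oe(observed, expected, n):
    """Return ln(O:E) and an approximate standard error.

    Assumption: SE(ln(O:E)) is approximated by sqrt((1 - Po) / O),
    where Po = O / n is the observed event risk.
    """
    if observed <= 0 or expected <= 0 or n < observed:
        raise ValueError("need 0 < observed <= n and expected > 0")
    po = observed / n
    return math.log(observed / expected), math.sqrt((1 - po) / observed)


def back_transform(log_estimate, se, z=1.96):
    """Map an estimate and its 95% interval from the log scale back to O:E."""
    lower = math.exp(log_estimate - z * se)
    upper = math.exp(log_estimate + z * se)
    return math.exp(log_estimate), lower, upper


# Hypothetical study: 50 observed and 40 expected deaths in 1,000 patients.
log_est, se = log_oe(observed=50, expected=40, n=1000)
estimate, lower, upper = back_transform(log_est, se)

print(f"ln(O:E) = {log_est:.3f} (SE {se:.3f})")
print(f"O:E = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Working on the log scale keeps the confidence limits positive after back-transformation and makes the interval symmetric around the estimate in multiplicative terms, which is why pooling is usually done on ln(O:E) rather than on O:E itself.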