OBJECTIVE: The SART ‘Patient Predictor’ (SARTPP) is the only IVF prediction model and online calculator developed using national United States data. In this external validation, we compare the performance of the updated SARTPP with that of the originally published Luke model for predicting live birth in the first cycle of IVF.
DESIGN: External validation by retrospective analysis of IVF cycles in our university-based infertility clinic.
MATERIALS AND METHODS: With IRB approval, patient age, BMI, parity, infertility diagnoses, and live birth outcome (>22 weeks, ≥300 grams) were extracted from clinic SARTCORS data. All first, fresh, autologous IVF cycles from 2012 to 2016 were included. The resulting sample exceeds the minimum of 100 events and 100 nonevents for validation (1). Derivation of the Luke logistic regression model has been described previously (2). To generate individual SARTPP probabilities, two authors entered each patient's variables into the calculator available at https://www.sartcorsonline.com/Predictor/Patient. Model performance was analyzed with SAS 9.4 and GraphPad Prism 7.03.
RESULTS: Live birth occurred in 229 of 498 cycles (46.0%). Predicted individual probabilities were higher with the Luke model (range 2.5-59.5%, median 49.6%) than with the SARTPP (range 3-52%, median 43%). Discrimination, measured by the area under the ROC curve (AUC), was slightly greater for the SARTPP, 0.628 (95% CI 0.580-0.677), than for the Luke model, 0.618 (95% CI 0.569-0.667). The Luke model showed excellent calibration, with a Hosmer-Lemeshow p-value of 0.99, whereas the SARTPP p-value of 0.11 was closer to rejecting the null hypothesis of good fit between predicted and observed outcomes. Table 1 shows each model's differences between expected and observed live births; the SARTPP underestimated outcomes by more than 5% in 6 of 10 deciles. The net reclassification index for the SARTPP relative to the Luke model was -2.1%.
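The AUC reported above can be read as a concordance proportion: among all pairs made of one cycle with a live birth and one cycle without, the share in which the live birth cycle received the higher predicted probability. With 229 live births and 269 cycles without live birth, that is 229 x 269 = 61,601 pairs. The following is a minimal sketch of that calculation, not the SAS procedure used in the study; the function name `concordance_auc` and the convention of counting tied predictions as half are assumptions made for illustration.

```python
def concordance_auc(outcomes, predicted):
    """Concordance estimate of the AUC (illustrative sketch).

    outcomes:  1 for live birth, 0 for no live birth.
    predicted: predicted probability of live birth for each cycle.
    Ties between an event and a nonevent count as half (an assumption).
    """
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    nonevents = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")

    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0   # live birth ranked higher
            elif p_event == p_nonevent:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(nonevents))


# Toy data, not study data: four event/nonevent pairs, one of them tied.
toy_auc = concordance_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.4])
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect ranking, which is why values near 0.62 are described as modest discrimination.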
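Table 1. Observed and expected live births by decile of predicted probability.
| Decile of Predicted Probability | Luke Model: Observed Live Births | Luke Model: Expected Live Births | Luke Model: Observed Proportion - Predicted Probability | SART Patient Predictor: Observed Live Births | SART Patient Predictor: Expected Live Births | SART Patient Predictor: Observed Proportion - Predicted Probability |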
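The columns of Table 1 and the Hosmer-Lemeshow statistic can be derived from the same decile grouping. The sketch below is illustrative only and is not the authors' SAS code: the helper names, the equal-size grouping after sorting by predicted probability, and the use of a chi-square reference distribution with (groups - 2) degrees of freedom are assumptions, and statistical packages may group tied predictions differently.

```python
import math


def decile_calibration(outcomes, predicted, groups=10):
    """Observed vs. expected live births per group of predicted probability."""
    # Sort by predicted probability only (stable sort keeps tie order).
    pairs = sorted(zip(predicted, outcomes), key=lambda t: t[0])
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed live births
        expected = sum(p for p, _ in chunk)   # sum of predicted probabilities
        rows.append({
            "n": size,
            "observed": observed,
            "expected": expected,
            # observed proportion minus mean predicted probability
            "difference": observed / size - expected / size,
        })
    return rows


def hosmer_lemeshow_statistic(rows):
    """Sum over groups of (O - E)^2 / (n * pbar * (1 - pbar))."""
    stat = 0.0
    for r in rows:
        pbar = r["expected"] / r["n"]
        if not 0.0 < pbar < 1.0:
            raise ValueError("mean predicted probability must be in (0, 1)")
        stat += (r["observed"] - r["expected"]) ** 2 / (r["n"] * pbar * (1 - pbar))
    return stat


def chi2_sf_even_df(x, df):
    """Chi-square upper tail probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy data, not study data: 20 cycles, all predicted 0.5, half with live birth.
toy_rows = decile_calibration([1, 0] * 10, [0.5] * 20)
toy_stat = hosmer_lemeshow_statistic(toy_rows)
toy_p = chi2_sf_even_df(toy_stat, df=len(toy_rows) - 2)
```

A positive `difference` in a row means the model underestimated live birth in that group, which is the direction of error reported for the SARTPP. A large p-value indicates no detected lack of fit.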
CONCLUSIONS: Use of the SARTPP is better informed by an understanding of its discrimination (the proportion of live birth/no live birth pairs in which the cycle resulting in live birth was assigned the higher predicted probability) and its calibration (agreement between expected and observed outcomes within deciles of predicted probability). Like other validated IVF prediction models, both models studied have modest discrimination, with the SARTPP having a slightly higher AUC. This gain in discrimination comes at the expense of inferior calibration, which is clinically more relevant (3). In our external validation, the SARTPP underestimated IVF success and provided no gain in predictive performance according to net reclassification. We therefore favor the Luke model.
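Net reclassification can be illustrated with a short sketch. The abstract does not state which form of the index was computed, so the category-free (continuous) version is assumed here, and the function name `continuous_nri` is hypothetical. The index adds the net proportion of live birth cycles whose predicted probability moved up under the new model to the net proportion of cycles without live birth whose predicted probability moved down.

```python
def continuous_nri(outcomes, p_reference, p_new):
    """Category-free net reclassification index (illustrative sketch).

    outcomes:    1 for live birth, 0 for no live birth.
    p_reference: predicted probabilities from the reference model.
    p_new:       predicted probabilities from the model being compared.
    """
    n_event = sum(outcomes)
    n_nonevent = len(outcomes) - n_event
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one nonevent")

    up_event = down_event = up_nonevent = down_nonevent = 0
    for y, p_ref, p in zip(outcomes, p_reference, p_new):
        if p > p_ref:            # new model predicts a higher probability
            if y == 1:
                up_event += 1
            else:
                up_nonevent += 1
        elif p < p_ref:          # new model predicts a lower probability
            if y == 1:
                down_event += 1
            else:
                down_nonevent += 1

    # Events should move up; nonevents should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents


# Toy data, not study data.
toy_outcomes = [1, 1, 0, 0]
toy_reference = [0.5, 0.5, 0.5, 0.5]
nri_no_gain = continuous_nri(toy_outcomes, toy_reference, [0.6, 0.4, 0.4, 0.6])
nri_best = continuous_nri(toy_outcomes, toy_reference, [0.6, 0.6, 0.4, 0.4])
```

Under this definition the index ranges from -2 to 2, and a value at or below zero means the compared model did not, on balance, move predictions in the correct direction.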