External validation of the parsimonious EuroLung risk models: analysis of the Brazilian Lung Cancer Registry

Paula Duarte D’Ambrosio1, Ricardo Mingarini Terra1, Alessandro Brunelli2, Leticia Leone Lauricella1, Carolina Adan Cavadas1, Jaqueline Schaparini Fonini1, Jefferson Luiz Gross3, Federico Enrique Garcia Cipriano4, Fabio May da Silva5, Paulo Manuel Pêgo-Fernandes1

Open Access

Peer-Reviewed
Original Article

ABSTRACT

Objective: The purpose of this study was to assess performance in the Brazilian Lung Cancer Registry Database by using the parsimonious EuroLung risk models for morbidity and mortality. Methods: The EuroLung1 and EuroLung2 models were tested and evaluated through calibration (calibration plot, Brier score, and the Hosmer-Lemeshow test) and discrimination (ROC AUCs), in a national multicenter registry of 1,031 patients undergoing anatomic lung resection. Results: The evaluation of performance in Brazilian health care facilities utilizing risk-adjustment models, specifically EuroLung1 and EuroLung2, revealed substantial miscalibration, as evidenced by calibration plots and Hosmer-Lemeshow tests in both models. In terms of calibration, EuroLung1 exhibited a calibration plot with overlapping points, characterized by a slope of 1.11 and a Brier score of 0.15; the Hosmer-Lemeshow test yielded a statistically significant p-value of 0.015; and the corresponding ROC AUC was 0.678 (95% CI: 0.636-0.721). The EuroLung2 model displayed better calibration, featuring fewer overlapping points in the calibration plot, with a slope of 1.22, with acceptable discrimination, as indicated by a ROC AUC of 0.756 (95% CI: 0.670-0.842). Both models failed to accurately predict morbidity and mortality outcomes in this specific health care context. Conclusions: Discrepancies between the EuroLung model predictions and outcomes in Brazil underscore the need for model refinement and for a probe into inefficiencies in the Brazilian health care system. (Plataforma Brasil identifier: 16424413.2.1001.0065. [https://plataformabrasil.saude.gov.br/])

Keywords: Quality of health care; Models, statistical; Public health; Morbidity; Lung neoplasms.

INTRODUCTION

In a managed care system, the assessment of care quality within surgical units is crucial. Quality is an abstract concept often measured through various indicators.(1) In thoracic surgery, outcome measures are the main quality indicators. Evaluating the performance of health care providers requires adjusting outcomes for different case mixes across institutions.(2)

To facilitate equitable comparative audits, the European Society of Thoracic Surgeons (ESTS) Database Committee developed risk-adjustment models for morbidity and mortality from a dataset of nearly 50,000 patients.(3) These models were simplified into the parsimonious EuroLung1 and EuroLung2 versions in 2019.(4) Those versions have shown excellent discrimination in European cohorts and are applicable for risk-adjusted performance audits, aiding in quality improvement.

The Brazilian Lung Cancer Registry, a multicenter prospective database, collects data from thoracic procedures at health care facilities in Brazil, supporting quality management. Predictive models like the parsimonious EuroLung risk models facilitate the initial quality assessment and subsequent improvements. Although these models have shown validity in Europe,(4) they have been shown to have limited discrimination capacity when applied to patients in Canada and Japan.(5,6) To our knowledge, there have been no studies evaluating their applicability in Latin America. Such an evaluation is crucial because of disparities among these populations, including variations in socioeconomic factors and challenges related to diagnosing lung cancer and initiating treatment, often due to barriers to health care access.

The primary objective of this study was to assess the performance of thoracic surgery facilities in Brazil by using the parsimonious EuroLung1 and EuroLung2 risk models within the Brazilian Lung Cancer Registry. A secondary objective was to test the external validity of the parsimonious EuroLung risk models in the Brazilian context.

METHODS

Ethics statement

This study was approved by the local institutional review board (Registration no. 16424413.2.1001.0065). The requirement for informed consent was waived because only anonymized data were used.

Modeling cohort - parsimonious EuroLung1 and EuroLung2 models

In 2017, the ESTS Database Committee published the first models for the prediction of risk after anatomical lung resection (EuroLung1 for cardiopulmonary morbidity and EuroLung2 for 30-day mortality), based on data from approximately 50,000 patients.(3) A recent update described models that are more parsimonious.(4) The parsimonious EuroLung models contain five variables for morbidity and six variables for mortality. The two models share four variables associated with both morbidity and mortality (age, sex, predicted postoperative FEV1 [ppoFEV1], and thoracotomy), and each includes variables specific to its outcome: extended resection for morbidity; and BMI and pneumonectomy for mortality.(4)

Cardiopulmonary complications listed in the ESTS database were included as outcome variables.(7) Mortality was defined as any death within 30 days after operation or surgical death occurring at any time during the same hospital stay. Extended resection(3) consisted of chest wall involvement; Pancoast tumors; resection of the atrium, superior vena cava, aorta, diaphragm, or vertebra; bronchial sleeve resection; pleuropneumonectomy; sleeve pneumonectomies; and intrapericardial pneumonectomy.

Aggregate EuroLung2 model

Similar to what was done in the original EuroLung study,(7) we tested the aggregate version of the EuroLung2 model for use as a simple risk stratification tool. Using ROC analysis, we found the best cutoff values associated with mortality to be as follows(8): age > 70 years; ppoFEV1 < 70%; and BMI < 18.5 kg/m2.(8) A score of 1 point was assigned to the variables with the smallest odds ratios in logistic regression (age > 70 years and ppoFEV1 < 70%), and the four other variables were weighted proportionally(4): 2.5 points for male sex, BMI < 18.5 kg/m2, and thoracotomy; and 3 points for pneumonectomy.(4) Patients were grouped into seven risk classes to evaluate incremental risk of mortality.(4)
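The point assignment described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `Patient` record and its field names are hypothetical, the 2.5 points are assumed to apply to each of the three listed variables separately, and the cutoffs that separate the risk classes are not stated in the text, so the sketch stops at the total score.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative, not Registry fields."""
    age: float            # years
    male: bool
    bmi: float            # kg/m2
    ppo_fev1: float       # predicted postoperative FEV1, % of predicted
    thoracotomy: bool
    pneumonectomy: bool


def aggregate_eurolung2_score(p: Patient) -> float:
    """Sum the points listed in the text for every risk factor present."""
    score = 0.0
    if p.age > 70:
        score += 1.0      # smallest odds ratio: 1 point
    if p.ppo_fev1 < 70:
        score += 1.0      # smallest odds ratio: 1 point
    if p.male:
        score += 2.5      # assumed 2.5 points for each of these three
    if p.bmi < 18.5:
        score += 2.5
    if p.thoracotomy:
        score += 2.5
    if p.pneumonectomy:
        score += 3.0
    return score


# Hypothetical patient: 74-year-old man, ppoFEV1 62%, minimally invasive lobectomy.
example = Patient(age=74, male=True, bmi=23.0, ppo_fev1=62.0,
                  thoracotomy=False, pneumonectomy=False)
print(aggregate_eurolung2_score(example))  # 1 + 1 + 2.5 = 4.5
```

Under these assumptions the score ranges from 0 (no risk factor present) to 12.5 (all six present).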

Performance evaluation

This study evaluates performance in Brazilian health care facilities using the EuroLung1 and EuroLung2 risk-adjustment models.(4) We used a validation cohort from the nationwide multicenter registry known as the Brazilian Lung Cancer Registry, a prospective, comprehensive database of patients who have undergone surgical treatment for lung cancer. It involves 12 institutions across five Brazilian states that have provided data on patients treated between December of 2009 and December of 2022. Our sample comprised 1,031 lung cancer patients who underwent anatomic lung resection during that period, representing 46.25% of all anatomic lung resections cataloged in the Registry. We excluded patients for whom any values pertaining to pivotal variables were missing.

The definitions of variables were derived from the ESTS standardization document.(7) The goal is to use both risk models as instruments of internal auditing and quality control in the local context.

Statistical analysis

To test the parsimonious EuroLung1 and EuroLung2 scores, we used the published coefficients for both scores(8) to assess calibration and discrimination.(9,12,13) The logit of the EuroLung1 model was as follows:
−2.852 + 0.021 × age + 0.472 × male − 0.015 × ppoFEV1 + 0.662 × thoracotomy + 0.324 × extended resection

The logit of the EuroLung2 model was as follows:
−6.350 + 0.047 × age + 0.889 × male − 0.055 × BMI − 0.010 × ppoFEV1 + 0.892 × thoracotomy + 0.983 × pneumonectomy
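As a worked illustration of how the two logits translate into predicted risks, the following Python sketch applies the coefficients given above and converts the logit to a probability with the inverse logit function. The variable coding is an assumption, since it is not stated in the text: age in years, BMI in kg/m2, ppoFEV1 as a percentage of the predicted value, and the categorical variables coded 1 when present and 0 when absent. The function names are illustrative.

```python
import math


def eurolung1_logit(age, male, ppo_fev1, thoracotomy, extended_resection):
    """Logit of cardiopulmonary morbidity (EuroLung1), coefficients from the text."""
    return (-2.852 + 0.021 * age + 0.472 * male - 0.015 * ppo_fev1
            + 0.662 * thoracotomy + 0.324 * extended_resection)


def eurolung2_logit(age, male, bmi, ppo_fev1, thoracotomy, pneumonectomy):
    """Logit of 30-day mortality (EuroLung2), coefficients from the text."""
    return (-6.350 + 0.047 * age + 0.889 * male - 0.055 * bmi
            - 0.010 * ppo_fev1 + 0.892 * thoracotomy + 0.983 * pneumonectomy)


def logit_to_probability(logit):
    """Inverse logit: p = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient: 65-year-old man, BMI 25 kg/m2, ppoFEV1 80%,
# minimally invasive approach, no extended resection, no pneumonectomy.
morbidity_risk = logit_to_probability(eurolung1_logit(65, 1, 80, 0, 0))
mortality_risk = logit_to_probability(eurolung2_logit(65, 1, 25, 80, 0, 0))
print(f"Predicted morbidity: {morbidity_risk:.1%}")
print(f"Predicted mortality: {mortality_risk:.1%}")
```

For this hypothetical patient, the logits are −2.215 (EuroLung1) and −4.581 (EuroLung2), which correspond to predicted risks of roughly 10% and 1%, respectively.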

In our assessment of calibration, we employed calibration plots, the Brier score, and the Hosmer-Lemeshow test. The calibration plot displays the relationship between observed frequencies and predicted probabilities.(8,10,11) The Brier score quantifies the overall disparity between the predicted probability of an event (such as postoperative death) and the actual occurrence of that event.(8,10,11) The Hosmer-Lemeshow test divides the study cohort into deciles based on predicted values and compares the observed rates with the expected rates.(8,10,11) Model discrimination was characterized by the ROC AUC.(8,10,11)
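The three numerical measures named above can be sketched in plain Python as follows. This is a minimal illustration and not the R or Stata code used in the study. It assumes predicted probabilities strictly between 0 and 1 and at least as many patients as groups. Because the text does not state the degrees of freedom used for the Hosmer-Lemeshow p-value, they are left as a parameter, and the helper `chi2_sf_even_df` (a hypothetical name) covers only even degrees of freedom, for which the chi-square tail probability has a closed form.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def roc_auc(probs, outcomes):
    """Probability that a random event case is ranked above a random non-event case."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5   # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_statistic(probs, outcomes, groups=10):
    """Chi-square statistic comparing observed and expected events per risk group."""
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Toy data (hypothetical): four patients, two of whom had the event.
toy_probs = [0.1, 0.4, 0.35, 0.8]
toy_outcomes = [0, 0, 1, 1]
print(brier_score(toy_probs, toy_outcomes))
print(roc_auc(toy_probs, toy_outcomes))
```

A lower Brier score indicates predictions closer to the observed outcomes, an AUC of 0.5 corresponds to no discrimination, and a small Hosmer-Lemeshow p-value indicates a mismatch between observed and expected event counts.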

In order to investigate the linear association between the levels of the score for the variable “risk class” and patient mortality (aggregate EuroLung2 model), the Mantel-Haenszel chi-square test (MH χ2) was applied to the data.
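For readers unfamiliar with the test, the Mantel-Haenszel chi-square for linear association can be computed as (N − 1) × r2, where r is the Pearson correlation between the ordinal class score and the binary outcome; it has 1 degree of freedom. The sketch below works from grouped counts. The function name, the equally spaced class scores, and the counts in the example are hypothetical and are not Registry data.

```python
import math


def mantel_haenszel_trend(scores, totals, events):
    """Return (chi-square, p-value) for a linear trend in event rates across classes.

    scores: numeric score assigned to each risk class
    totals: number of patients in each class
    events: number of deaths in each class
    """
    n = sum(totals)
    mean_x = sum(s * t for s, t in zip(scores, totals)) / n
    mean_y = sum(events) / n
    # Sums of squares and cross-products for grouped binary data.
    sxx = sum(t * (s - mean_x) ** 2 for s, t in zip(scores, totals))
    syy = n * mean_y * (1.0 - mean_y)
    sxy = sum((s - mean_x) * (e - t * mean_y)
              for s, t, e in zip(scores, totals, events))
    chi2 = (n - 1) * sxy ** 2 / (sxx * syy)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical example: three risk classes with rising mortality.
chi2, p = mantel_haenszel_trend(scores=[1, 2, 3],
                                totals=[100, 100, 100],
                                events=[5, 10, 20])
print(f"MH chi-square = {chi2:.2f}, p = {p:.4f}")
```

When mortality is identical across classes, the correlation is zero and the statistic is 0. The larger the statistic, the stronger the evidence of a monotonic increase or decrease in mortality across risk classes.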

Continuous variables are expressed as median and interquartile range, whereas categorical covariates are expressed as absolute counts and percentages. The 95% confidence intervals are also presented.

Analyses for model development and validation were performed using R software, version 3.3.3 (R Core Team, 2017), and Stata software, version 15.0 (StataCorp, College Station, TX, USA). Values of p < 0.05 were considered statistically significant.

RESULTS

Among 1,210 patients who underwent lung resection and were characterized in our database, critical data were missing for 179, and the remaining 1,031 patients were included in further analyses. The characteristics of the included patients are shown in Table 1. Major cardiopulmonary complications occurred in 196 patients (19.0%), and 46 patients (3.8%) died in the hospital or within the first 30 days after the procedure. The observed morbidity rate was higher than that predicted by the EuroLung1 model (19.0% vs. 13.1%). Likewise, the observed mortality rate was higher than that predicted by the EuroLung2 model (3.8% vs. 1.5%). The observed and predicted outcomes in the validation dataset from the EuroLung1 and EuroLung2 models are shown in Tables 2 and 3, respectively.

For the EuroLung1 model, the calibration plot shows some overlap, indicating a lack of perfect calibration. The slope of 1.11 departs from the ideal value of 1, indicating that the predicted probabilities do not scale exactly with the observed risk (Figure 1). The Brier score of 0.15 indicates moderate calibration performance, and the p-value of 0.015 from the Hosmer-Lemeshow test suggests that the model is not well calibrated. In addition, the AUC for the EuroLung1 model was 0.678 (95% CI: 0.636-0.721), indicating weak discrimination performance (Figure 2).
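One common way to obtain a calibration slope such as the one reported above is to regress the observed binary outcome on the logit of the predicted probability in a logistic model; an intercept of 0 and a slope of 1 correspond to ideal calibration. The text does not specify how the slope was computed, so the following Python sketch, which fits the two-parameter model by Newton-Raphson iteration, is an assumption about the method. The function name and the toy data are hypothetical.

```python
import math


def calibration_intercept_slope(probs, outcomes, iterations=25):
    """Fit logit(P(y = 1)) = a + b * logit(predicted probability) by Newton-Raphson.

    Requires predicted probabilities strictly between 0 and 1 that take at least
    two distinct values, and outcomes that are not perfectly separated.
    """
    x = [math.log(p / (1.0 - p)) for p in probs]
    a, b = 0.0, 1.0   # start at ideal calibration
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient with respect to the intercept
            g1 += (yi - mu) * xi     # gradient with respect to the slope
            h00 += w                 # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Toy data (hypothetical): predictions of 10% and 90% where the observed
# event rates are 20% and 80%, i.e., predictions that are too extreme.
toy_probs = [0.1] * 10 + [0.9] * 10
toy_outcomes = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
intercept, slope = calibration_intercept_slope(toy_probs, toy_outcomes)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In this toy example the fitted slope is ln(4)/ln(9), approximately 0.63, because the predictions are more extreme than the observed event rates; if the observed rates matched the predictions exactly, the function would return an intercept of 0 and a slope of 1.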

For the EuroLung2 model, the calibration plot shows less overlap, indicating better calibration (i.e., improved alignment between predicted probabilities and observed outcomes) than that of the EuroLung1 model. The calibration slope was 1.22 (Figure 2). The Brier score of 0.03 indicates good calibration performance, although the Hosmer-Lemeshow test suggested that the model is not well calibrated, given the p-value of 0.044. The EuroLung2 model had acceptable discrimination, as demonstrated by an AUC of 0.756 (95% CI: 0.670-0.842).

Patients were grouped into five risk classes showing incremental risk of mortality, as can be seen in Table 4. There was a statistically significant linear association (MH χ2 = 6.530; p < 0.05) between the score of the aggregate EuroLung2 model and patient mortality. The patients in the lowest risk class had a 3.4% mortality rate, whereas those in the highest risk class had a 28.2% mortality rate. It is noteworthy that the 9.5-12.0 score category was removed from this analysis because it comprised only five cases, which is not sufficient for a reliable estimate of the risk of death.

DISCUSSION

The external validation assessment of the parsimonious EuroLung1 and EuroLung2 models reveals miscalibration in both. In addition, performance assessments of Brazilian health care facilities using these risk-adjustment models indicate higher observed morbidity and mortality rates in the Brazilian Lung Cancer Registry than those predicted by the models. The miscalibration observed in both models indicates the limitations of applying them directly to the Brazilian population without appropriate adjustments, and it emphasizes the need for recalibration or development of locally tailored models to enhance accuracy and improve clinical decision-making. These findings could also point to underperformance of health care facilities in Brazil.

The EuroLung risk models represent recent advancements in population-based tools for predicting cardiopulmonary morbidity and mortality following anatomic lung resection, and they require external validation across diverse populations before they can be considered generalizable.(2,3) However, such validation is often hindered by population-specific discrepancies.(8) In the Brazilian cohort, the EuroLung2 model demonstrated acceptable discrimination, as evidenced by its higher AUC. However, discrepancies in both models probably stem from the exclusion of variables from the ESTS model that are vital in the Brazilian context, such as racial and social factors, along with caseload variations. This observation is consistent with the findings of a study conducted in Japan,(6) which highlighted the predictive limitations of the EuroLung models for morbidity and mortality due to notable baseline differences from the European population.(6) Such omissions might contribute substantially to the observed underperformance of Brazilian health care facilities. In addition, the discrepancy between the observed and predicted morbidity rates can be attributed to patient-specific factors, including pre-existing comorbidities, socioeconomic conditions, and the disease stage at the time of diagnosis.(9,12-14) In Brazil, a middle-income country, the absence of adequate education regarding disease prevention often results in patients presenting to the health care system with advanced, symptomatic disease,(15) in contrast to their counterparts in high-income countries. Notably, Knorst et al.(16) reported a historical cohort study in which the time from the onset of initial symptoms to the diagnosis of lung cancer in a university hospital in the southern region of Brazil exceeded 20 weeks, whereas the Standing Medical Advisory Committee recommendation is that the interval between symptom onset and treatment should be no longer than 6-8 weeks.

The discrepancy in mortality may be linked to systemic factors, including access to health care services for prevention, timely diagnosis, and treatment.(17) In Brazil, over 75% of patients depend exclusively on the Brazilian Unified Health Care System. Despite its goal of providing universal care, the system faces significant challenges related to accessibility, diagnostic delays, treatment availability, and substantial disparities among cancer care facilities concerning diagnostic and treatment technologies.(18,19) For example, Lista et al.(20) found that almost 80% of the initial treatments given to patients with lung cancer in Brazil did not take that diagnosis into consideration; only 6.8% of patients received a lung cancer diagnosis within 30 days after experiencing symptoms. Another study conducted among the Brazilian population revealed that 10-18% of lung cancer patients, regardless of their disease stage, did not undergo any cancer treatment because their poor clinical condition rendered them unable to withstand the risks associated with treatment.(21)

Lung cancer remains a pressing public health concern in Brazil, and, in response to this challenge, the country has implemented a series of public policies aimed at improving surgical treatment outcomes. Over the past decade, Brazil has made significant strides in this area, with initiatives focused on expanding access to early detection, enhancing surgical techniques, and ensuring equitable care for all patients. In addition, strong public health measures have led to notable reductions in tobacco consumption in Brazil, setting a valuable precedent for other low- and middle-income countries. National research has revealed a nearly 50% reduction in smoking prevalence, aligning with a corresponding decrease in tobacco-related fatalities.(22) These policies, coupled with efforts to reduce health care disparities, have the potential to transform lung cancer surgery in Brazil, ultimately leading to better patient outcomes.

Another reason for the underperformance of Brazilian health care facilities may be related to surgical skills. Therefore, we will examine the data in a more granular manner to gain a deeper understanding of the aspects of surgical care quality at the facilities that could be associated with these outcomes. Subsequently, we will design actions aimed at improving the factors identified. Overall, these findings highlight the complex interplay between patient-specific and systemic factors that influence the calibration and performance of risk models in a diverse health care landscape such as that of Brazil. Further research and tailored interventions are essential to bridge these disparities and improve the quality of lung cancer care in the country.

The present study relied on data from the Brazilian Lung Cancer Registry, a prospective multicenter database. The main limitation of the study is the size of the sample, which was small in comparison with the original population from which the models were generated. In addition, the study may simply be underpowered to assess the calibration and discrimination of the risk models. The fact that 46% of the cases were excluded from analysis because key values were missing raises concerns about the validity of our findings. This significant data gap suggests a potential bias, given that less than half of the facilities contributed meaningful data, limiting the comprehensiveness and reliability of the analysis. Furthermore, the Brazilian Lung Cancer Registry includes 12 institutions in five Brazilian states and does not represent the entire country. However, it is important to note that it stands as the only database related to the surgical treatment of lung cancer in Brazil. Therefore, the findings should be interpreted within the context of the studied population. Moreover, our database initially included mostly patients from the public health care sector, only later including those from the private sector. In the present study, no analyses were carried out separating patients by sector.

The disparities between the EuroLung model predictions and Brazilian patient outcomes highlight the need for model adjustments and signal potential underperformance within the health care system in Brazil, underscoring the importance of investigating contributing factors. The EuroLung2 model showed promising performance in terms of discrimination in the Brazilian cohort, indicating its potential utility. Considering additional variables and exploring machine learning analytics may further enhance the performance of surgical risk prediction models.

Meeting presentation

This abstract was presented as a poster at the 31st Annual Conference of the ESTS, in Milan, Italy.

ACKNOWLEDGMENTS

We would like to thank all the participants of the Brazilian Lung Cancer Registry.

AUTHOR CONTRIBUTIONS

PDD: conceptualization; data curation; formal analysis; methodology; and writing. RMT: conceptualization; methodology; supervision; and writing – review & editing. AB: writing – original draft; and writing – review & editing. LLL: conceptualization; data curation; formal analysis; methodology; and writing. CAC: writing – original draft; and writing – review & editing. JSF: writing – original draft; and writing – review & editing. JLG: writing – original draft; and writing – review & editing. FEGC: writing – original draft; and writing – review & editing. FMS: writing – original draft; and writing – review & editing. PMP-F: conceptualization; methodology; supervision; and writing – review & editing.

CONFLICTS OF INTEREST

None declared.

REFERENCES

1.           Shahian DM, Edwards FH, Ferraris VA, Haan CK, Rich JB, Normand SL, et al. Quality measurement in adult cardiac surgery: part 1--Conceptual framework and measure selection. Ann Thorac Surg. 2007;83(4 Suppl):S3-S12. https://doi.org/10.1016/j.athoracsur.2007.01.053
2.           Brunelli A, Rocco G. The comparison of performance between thoracic surgical units. Thorac Surg Clin. 2007;17(3):413-424. https://doi.org/10.1016/j.thorsurg.2007.07.006
3.           Brunelli A, Salati M, Rocco G, Varela G, Van Raemdonck D, Decaluwe H, et al. European risk models for morbidity (EuroLung1) and mortality (EuroLung2) to predict outcome following anatomic lung resections: an analysis from the European Society of Thoracic Surgeons database [published correction appears in Eur J Cardiothorac Surg. 2017 Jun 1;51(6):1212. doi: 10.1093/ejcts/ezx155]. Eur J Cardiothorac Surg. 2017;51(3):490-497. https://doi.org/10.1093/ejcts/ezx155
4.           Brunelli A, Cicconi S, Decaluwe H, Szanto Z, Falcoz PE. Parsimonious Eurolung risk models to predict cardiopulmonary morbidity and mortality following anatomic lung resections: an updated analysis from the European Society of Thoracic Surgeons database. Eur J Cardiothorac Surg. 2020;57(3):455-461. https://doi.org/10.1093/ejcts/ezz272
5.           Pompili C, Shargall Y, Decaluwe H, Moons J, Chari M, Brunelli A. Risk-adjusted performance evaluation in three academic thoracic surgery units using the Eurolung risk models. Eur J Cardiothorac Surg. 2018;54(1):122-126. https://doi.org/10.1093/ejcts/ezx483
6.           Nagoya A, Kanzaki R, Kanou T, Ose N, Funaki S, Minami M, et al. Validation of Eurolung risk models in a Japanese population: a retrospective single-centre analysis of 612 cases. Interact Cardiovasc Thorac Surg. 2019;29(5):722-728. https://doi.org/10.1093/icvts/ivz171
7.           Fernandez FG, Falcoz PE, Kozower BD, Salati M, Wright CD, Brunelli A. The Society of Thoracic Surgeons and the European Society of Thoracic Surgeons general thoracic surgery databases: joint standardization of variable definitions and terminology. Ann Thorac Surg. 2015;99(1):368-376. https://doi.org/10.1016/j.athoracsur.2014.05.104
8.           Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer; 2009.
9.           Youlden et al. The International Epidemiology of Lung Cancer: Geographical Distribution and Secular Trends. J Thorac Oncol. 2008;3(8). https://doi.org/10.1097/JTO.0b013e31818020eb
10.        Steyerberg E, Vickers A, Cook N, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-138. https://doi.org/10.1097/EDE.0b013e3181c30fb2
11.        Royston P, Altman D. External validation of a cox prognostic model: principles and methods. BMC Med Res Methodol. 2013;13:33. https://doi.org/10.1186/1471-2288-13-33
12.        Redondo-Sánchez D, Petrova D, Rodríguez-Barranco M, Fernández-Navarro P, Jiménez-Moleón JJ, Sánchez MJ. Socio-Economic Inequalities in Lung Cancer Outcomes: An Overview of Systematic Reviews. Cancers (Basel). 2022;14(2):398. https://doi.org/10.3390/cancers14020398
13.        Butler CA, Darragh KM, Currie GP, Anderson WJ. Variation in lung cancer survival rates between countries: do differences in data reporting contribute?. Respir Med. 2006;100(9):1642-1646. https://doi.org/10.1016/j.rmed.2005.12.006
14.        Soares MS, Coltro LM, Leite PHC, Costa PB, Lauricella LL, et al. Evolution of the surgical treatment of lung cancer at a tertiary referral center in Brazil, 2011-2018. J Bras Pneumol. 2020;47(1):e20190426. https://doi.org/10.36416/1806-3756/e20190426
15.        de Sá VK, Coelho JC, Capelozzi VL, de Azevedo SJ. Lung cancer in Brazil: epidemiology and treatment challenges. Lung Cancer (Auckl). 2016;7:141-148. https://doi.org/10.2147/LCTT.S93604
16.        Knorst MM, Dienstmann R, Fagundes LP. Delay in the diagnosis and in the surgical treatment of lung cancer [Article in Portuguese] J Pneumol. 2003;29(6):358-364. https://doi.org/10.1590/S0102-35862003000600007
17.        Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209-249. https://doi.org/10.3322/caac.21660
18.        Grabois MF, Oliveira EX, Sá Carvalho M. Access to pediatric cancer care in Brazil mapping origin-destination flows [Article in Portuguese]. Rev Saude Publica. 2013;47(2):368-378. https://doi.org/10.1590/S0034-8910.2013047004305
19.        Ferreira CG. Lung cancer in developing countries: access to molecular testing. Am Soc Clin Oncol Educ Book. 2013;327-331. https://doi.org/10.1200/EdBook_AM.2013.33.327
20.        Lista M, Bes FC, Pereira JR, Ikari FK, Nikaedo SM. Excessiva demora no diagnóstico clínico do câncer de pulmão Depende do médico, do paciente ou do sistema? Arq Med Hosp Fac Cienc Med St Casa Sao Paulo. 2008;53(1):6-9.
21.        Costa GJ, Mello MJG, Bergmann A, Ferreira CG, Thuler LCS. Tumor-node-metastasis staging and treatment patterns of 73,167 patients with lung cancer in Brazil. J Bras Pneumol. 2020;46(1):e20180251. https://doi.org/10.1590/1806-3713/e20180251
22.        Levy D, de Almeida LM, Szklo A. The Brazil SimSmoke policy simulation model: the effect of strong tobacco control policies on smoking prevalence and smoking-attributable deaths in a middle income nation. PLoS Med. 2012;9(11):e1001336. https://doi.org/10.1371/journal.pmed.1001336
23.        Erridge SC, Møller H, Price A, Brewster D. International comparisons of survival from lung cancer: pitfalls and warnings. Nat Clin Pract Oncol. 2007;4(10):570-577. https://doi.org/10.1038/ncponc0932
