Detection of patients with COVID-19 by the emergency medical services in Lombardy through an operator-based interview and machine learning models
  1. Stefano Spina1,2,
  2. Lorenzo Gianquintieri1,3,
  3. Francesco Marrazzo1,2,
  4. Maurizio Migliari1,2,
  5. Giuseppe Maria Sechi1,
  6. Maurizio Migliori1,
  7. Andrea Pagliosa1,
  8. Rodolfo Bonora1,
  9. Thomas Langer2,4,
  10. Enrico Gianluca Caiani3,
  11. Roberto Fumagalli1,2,4
  12. AREU 118 EMS Network Collaborators
    1. SOREU, Agenzia Regionale Emergenza Urgenza (AREU), Milano, Italy
    2. Department of Anesthesia, Critical Care and Pain Medicine, ASST Grande Ospedale Metropolitano Niguarda, Milano, Italy
    3. Electronics, Information and Biomedical Engineering Department, Politecnico di Milano, Milano, Italy
    4. Department of Medicine and Surgery, University of Milan-Bicocca, Monza, Italy

    Correspondence to Professor Roberto Fumagalli, University of Milan-Bicocca Faculty of Medicine and Surgery, Monza, Lombardia, Italy; roberto.fumagalli{at}unimib.it

    Abstract

    Background The regional emergency medical service (EMS) in Lombardy (Italy) developed clinical algorithms based on operator-based interviews to detect patients with COVID-19 and refer them to the most appropriate hospitals. Machine learning (ML)-based models using additional clinical and geospatial epidemiological data may improve the identification of infected patients and guide EMS in detecting COVID-19 cases before confirmation with SARS-CoV-2 reverse transcriptase PCR (rtPCR).

    Methods This was an observational, retrospective cohort study using data from October 2020 to July 2021 (training set) and October 2021 to December 2021 (validation set) from patients who underwent a SARS-CoV-2 rtPCR test within 7 days of an EMS call. The performance of an operator-based interview using close contact history and signs/symptoms of COVID-19 was assessed in the training set for its ability to determine which patients had an rtPCR in the 7 days before or after the call. The interview accuracy was compared with four supervised ML models to predict positivity for SARS-CoV-2 within 7 days using readily available prehospital data retrieved from both training and validation sets.

    Results The training set included 264 976 patients, median age 74 years (IQR 55–84). Test characteristics of the operator-based interview for the detection of COVID-19-positive patients were: sensitivity 85.5%, specificity 58.7%, positive predictive value (PPV) 37.5% and negative predictive value (NPV) 93.3%. Contact history, fever and cough showed the highest association with SARS-CoV-2 infection. In the validation set (103 336 patients, median age 73 (IQR 50–84)), the best-performing ML model had an AUC of 0.85 (95% CI 0.84 to 0.86), sensitivity 91.4% (95% CI 0.91 to 0.92), specificity 44.2% (95% CI 0.44 to 0.45) and accuracy 85% (95% CI 0.84 to 0.85). PPV and NPV were 13.3% (95% CI 0.13 to 0.14) and 98.2% (95% CI 0.98 to 0.98), respectively. Contact history, fever, call geographical distribution and cough were the most important variables in determining the outcome.

    Conclusion ML-based models might help EMS identify patients with SARS-CoV-2 infection and guide the allocation of hospital resources based on prespecified criteria.

    • COVID-19
    • emergency ambulance systems
    • machine learning
    • pre-hospital care

    Data availability statement

    Data are available on reasonable request. Data on the SARS-CoV-2 positivity rate in the Lombardy region are available in a public, open-access repository (data source: Protezione Civile repository, https://github.com/pcm-dpc/COVID-19). Data on the machine learning models implemented in the study will be made available on reasonable request.

    http://creativecommons.org/licenses/by-nc/4.0/

    This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

    What is already known on this topic

    • There have been several risk tools created to determine either the presence of SARS-CoV-2 or its likely course; however, there is little information about the identification of these patients in the prehospital phase of care.

    • The use of machine learning (ML) algorithms in the prehospital context has been limited to specific conditions, such as the recognition of out-of-hospital cardiac arrest and predicting the need for critical care resources.

    What this study adds

    • Using retrospective data from operator-based telephone interviews conducted by emergency medical services, several variables were sensitive for identifying patients who later tested positive for SARS-CoV-2.

    • However, an ML model based on contact history, clinical parameters, geographical data and local epidemiology had greater sensitivity in detecting SARS-CoV-2 infection.

    How this study might affect research, practice or policy

    • ML models may guide emergency medical services in detecting COVID-19 cases before confirmation with SARS-CoV-2 rtPCR results and could be useful in other pandemic outbreaks to allow appropriate isolation and referral to dedicated hospital resources.

    Introduction

    The SARS-CoV-2 pandemic has been spreading worldwide over the last 4 years and the continued emergence of new viral variants has put a strain on public health systems.1 As with other prehospital providers, the emergency medical service (EMS) in the Lombardy region (Italy) was challenged by a remarkable increase in calls directed to its public safety answering points (PSAP) since the first COVID-19 outbreak.2 Detecting COVID-19 cases has been crucial to directing these patients to dedicated hospital resources while guaranteeing other routine EMS activity.

    A strategic plan based on multiple clinical algorithms was implemented by the Agenzia Regionale Emergenza Urgenza (AREU) to manage the escalating volume of calls, control ambulance allocations and ultimately avoid EMS collapse.3

    Both operators and healthcare professionals working in the PSAP use these algorithms to identify such individuals based on the severity of their signs and symptoms, as well as to identify early indicators of new local viral outbreaks in the Lombardy region.4 5

    However, it remains uncertain which signs and symptoms obtained at the prehospital level are most predictive of SARS-CoV-2 infection, and no large-scale analysis has assessed the accuracy of EMS-collected clinical data in detecting SARS-CoV-2 infection as confirmed by reverse transcriptase PCR (rtPCR). Machine learning (ML) models have shown promise in predictive tasks during the COVID-19 outbreak, but their use has been limited to ED cohorts rather than the prehospital setting.6 7

    In this study, we assessed the performance of the clinical algorithms currently used in our PSAP (ie, the operator-based interview) to identify patients who will test positive on SARS-CoV-2 rtPCR. We also aimed to develop an ML model to predict SARS-CoV-2 rtPCR positivity based on clinical and geospatial data obtained during the PSAP call and provided by the ambulance report on the scene. We evaluated these models: (1) as a screening test to detect positive patients and (2) as a decision support tool to guide patient allocation in a real-world scenario. We hypothesised that an ML model could achieve better performance than clinical interviews in detecting cases of COVID-19 in the prehospital setting.

    Methods

    Study design and participants

    This observational, retrospective cohort study included all patients managed by the AREU EMS in Lombardy, Italy from October 2020 to July 2021 (training set) and from October 2021 to December 2021 (validation set). Patients who received assistance from the regional EMS were included if a SARS-CoV-2 rtPCR result was available within 7 days before or after ED admission, regardless of whether their index EMS call was for COVID-19-related symptoms or not.

    Setting

    The AREU is responsible for the EMS in the Lombardy region (Italy), covering a population of almost 10 million people in an area of about 24 000 km². A primary-level PSAP is the first recipient of 1-1-2 phone calls from citizens asking for police, fire or medical assistance (ie, the equivalent of 9-1-1 or 9-9-9 systems used in other countries). When medical assistance is required, callers are redirected to a secondary-level PSAP (PSAP-2), which manages all regional EMS resources.3

    During pandemic surges, hospitals were designated by the regional public health authorities for COVID-19 treatment through a hub-and-spoke model based on severity.8 Patients screened as non-COVID-19 were allocated elsewhere (such as trauma or stroke centres), however, each hospital had a COVID-19 and non-COVID-19 pathway in its ED, based on SARS-CoV-2 testing results.

    Data sources

    The following variables were retrieved and analysed:

    • General: unique identifiers for ambulance mission and for individual patients, date, administrative area where the event occurred, caller’s Global Positioning System coordinates, classification of the cause of the event requiring intervention, gender and age of the patient, admitting ED.

    • Operator-based interview: binary answers to questions asked by operators at the PSAP-2: close contact with a person who tested positive for SARS-CoV-2, shortness of breath (reported by the caller or audible during the call), presence of fever, vomiting, diarrhoea, cough and/or other cold-like symptoms, ageusia/anosmia, asthenia and/or diffuse pain. The caller was considered a suspected case of COVID-19 if she/he reported one or more of these signs and symptoms.

    • Clinical parameters, retrieved by nurses or physicians from on-scene ambulance reports: mental status (Alert, Verbal, Pain, Unresponsive (AVPU) score), RR, oxygen saturation in room air (SpO2), respiratory quality (normal or distress), HR, systolic and diastolic BP and temperature.

    • Daily report of SARS-CoV-2 positivity rate in the Lombardy region, computed as an average over the previous 5 days (data source: Protezione Civile repository, https://github.com/pcm-dpc/COVID-19).

    • SARS-CoV-2 rtPCR testing: positive or negative result ±7 days from the EMS call.9

    Machine learning model development

    The initial dataset was preprocessed by removing records with missing and outlier values (detected by the z-index method with a threshold set to five), and deleting the variable ‘RR’, as it was reported in only half of the records. Categorical variables were converted into dummy numerical values, and all variables were scaled in the 0–1 range.7
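
    The preprocessing steps described above can be sketched as follows. This is an illustrative sketch only: the column names and toy data are hypothetical, and the example call uses a much lower outlier threshold than the study's value of five so that the tiny sample actually drops a row.

```python
# Illustrative preprocessing sketch: drop missing values, remove z-score
# outliers, one-hot encode categoricals and min-max scale to [0, 1].
# Column names and data are hypothetical, not the study's dataset.
import pandas as pd


def preprocess(df: pd.DataFrame, z_threshold: float = 5.0) -> pd.DataFrame:
    df = df.dropna()
    num = df.select_dtypes(include="number")
    # Keep only rows whose numeric values all lie within |z| < threshold
    # (assumes no numeric column is constant, or std would be zero).
    z = (num - num.mean()) / num.std(ddof=0)
    df = df[(z.abs() < z_threshold).all(axis=1)]
    # Categorical variables -> dummy numerical values
    df = pd.get_dummies(df, drop_first=True, dtype=float)
    # Scale every variable to the 0-1 range
    return (df - df.min()) / (df.max() - df.min())


example = pd.DataFrame({
    "age": [30, 45, 60, 75, 500],          # 500 is an implausible outlier
    "spo2": [98, 96, 95, 94, 97],
    "fever": ["yes", "no", "yes", "no", "no"],
})
# Threshold lowered to 1.5 purely so this 5-row toy example drops the outlier;
# the study used a threshold of five on a far larger dataset.
clean = preprocess(example, z_threshold=1.5)
```

    After the call, `clean` contains four rows with every column scaled to the 0–1 range and the categorical `fever` column replaced by a numeric dummy.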

    We implemented four supervised learning models to predict the positivity for SARS-CoV-2 on the rtPCR test (ie, the gold standard), the target variable for all models (figure 1). Results were evaluated with a 10-fold cross-validation protocol: the entire available dataset was divided into 10 subsets, and each of them was used once to validate a model trained on the other 9 subsets, with a final evaluation based on the distribution of the metrics across the different iterations.7 For each model, we tested four different algorithms (ie, logistic regression, random forest classifier, support vector machine and Gaussian Naïve Bayes). The different explanatory variables were included in the models as follows:

    • Model 1: age, gender and variables retrieved by the operator-based interview.

    • Model 2: as for model 1, plus clinical parameters retrieved by healthcare professionals from the on-scene ambulance.

    • Model 3: variables in model 1, plus the current SARS-CoV-2 epidemiology in the Lombardy region, and the geographical distribution of EMS calls for respiratory and infectious diseases in the previous 7 days.4

    • Model 4: all variables used in models 1–3.
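
    The cross-validation protocol described above, comparing the four algorithm families, can be sketched with scikit-learn. The synthetic dataset (feature count, class balance) is an assumption standing in for the real prehospital records; only the protocol itself mirrors the text.

```python
# Sketch of the 10-fold cross-validation comparison of the four supervised
# algorithms named in the text, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Imbalanced synthetic data (~20% positives, loosely mimicking prevalence)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.8], random_state=0)

algorithms = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support_vector_machine": SVC(),
    "gaussian_naive_bayes": GaussianNB(),
}

# Each algorithm is trained on 9 folds and validated on the held-out fold,
# 10 times; the distribution of fold AUCs drives the final evaluation.
results = {name: cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
           for name, clf in algorithms.items()}
for name, scores in results.items():
    print(f"{name}: AUC {scores.mean():.3f} ± {scores.std():.3f}")
```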

    Figure 1

    Model description. The performance of the operator-based interview in detecting patients with COVID-19 is evaluated by matching the results of the available SARS-CoV-2 rtPCR (box A). The machine learning models are implemented considering different combinations of the variables of the operator-based interview, the clinical parameters provided by on-scene ambulances, the local epidemiology and the distribution of EMS calls in the previous 7 days. The ultimate goal of the models is to detect cases of COVID-19. An additional model is also tested in two scenarios that could be used to guide the decision to refer patients to the proper hospital destination, based on prespecified criteria (box B). The explanatory variables included in each model are reported in the table (box C). rtPCR, reverse transcriptase PCR; AREU, Agenzia Regionale Emergenza Urgenza; EMS, emergency medical service; PSAP-2, secondary-level public safety answering point; SpO2, pulse oximeter oxygen saturation.

    Additional information regarding each model development is reported in online supplemental methods and online supplemental figure 1.

    A further ML model was developed to simulate a real-world application. Specifically, the model was implemented to support EMS decision-making in allocating patients to the appropriate hospital, based on specific criteria such as the patient’s clinical condition and her/his SARS-CoV-2 positivity. Here, an iterative procedure was implemented using historical data, repeating the whole process on every week of records for a total of 38 cycles. Additional information regarding this model development is provided in online supplemental methods and online supplemental figure 2.

    The first analysis included all patients in the low prevalence period (online supplemental table 1), assuming that positive patients should be allocated to hub hospitals and negative patients to non-hub hospitals. A second analysis applied the same allocation rule only to patients presenting with severe features (ie, SpO2 <94% or RR >30) at the EMS call in the high prevalence period (online supplemental table 1). The two analyses thus assessed the model’s capability to direct patients to the appropriate hospital, which was the ultimate outcome of the model.

    Statistical analysis

    Continuous variables were expressed as median (IQR) and categorical variables as count (n) and percentage (%). Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated to quantify the performance of the variables collected through the operator-based interview as compared with the rtPCR gold standard. To study the variability of the operator-based interview performance across the different phases of the pandemic, the dataset was divided into quartiles. The official daily number of positive patients in the Lombardy region was retrieved for the entire period and filtered with a 7-day moving average, with each day assigned to one of the four quartiles accordingly. Four datasets were thus obtained, each reporting the records that occurred on days belonging to the same quartile of SARS-CoV-2 prevalence in the territory (online supplemental table 1 and online supplemental figure 3).
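
    The smoothing-and-binning step described above can be sketched with pandas. The daily counts here are simulated, not the official Protezione Civile figures; only the 7-day moving average and quartile assignment follow the text.

```python
# Sketch of the prevalence-quartile assignment: smooth the daily positive
# count with a 7-day moving average, then bin each day into one of four
# quartiles. Daily counts are simulated stand-ins for the official data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2020-10-01", "2021-07-31", freq="D")
# Simulated epidemic curve: Poisson noise modulated by a slow wave
daily_positive = pd.Series(
    rng.poisson(2000, len(days)) * (1 + np.sin(np.arange(len(days)) / 30)),
    index=days,
)

# 7-day moving average, then quartile label per day
smoothed = daily_positive.rolling(window=7, min_periods=1).mean()
quartile = pd.qcut(smoothed, q=4, labels=["Q1", "Q2", "Q3", "Q4"])
```

    Each EMS record would then inherit the quartile label of its call date, yielding the four sub-datasets used for the per-quartile interview performance analysis.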

    To assess the importance of clinical variables, univariate and multivariate logistic regression models were implemented including predictor variables retrieved by the operator-based interview (alone) and with the on-scene ambulance report (ie, clinical parameters). The OR, 95% CI and C-statistics were calculated. A two-sided p value of <0.05 was considered statistically significant.

    To assess the performance of the ML model in the training set, receiver operating characteristic curves were plotted and the area under the curve (AUC) was calculated. Sensitivity, specificity, PPV, NPV and accuracy were also calculated for the ML-based model at a fixed cut-off with a sensitivity target threshold of 90% (95% CIs were estimated with the Clopper-Pearson method, considering the median values across five cycles of 10-fold cross-validation).10 In order to assess the contribution of each variable to our models, Shapley additive explanations (SHAP) were applied.11 This method builds on the game-theory approach to explain the results of ML models.
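
    The fixed cut-off evaluation can be sketched as follows: pick the probability threshold at which the ROC curve first reaches the 90% sensitivity target, then attach a Clopper-Pearson (exact binomial) CI. The data and classifier here are synthetic stand-ins.

```python
# Sketch: choose a classification threshold meeting a 90% sensitivity
# target, then compute an exact Clopper-Pearson CI for the sensitivity.
# Synthetic data; the real models used prehospital records.
import numpy as np
from scipy.stats import beta
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=10,
                           weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = proba.predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, proba)
cut = thresholds[np.argmax(tpr >= 0.90)]   # first threshold hitting target
pred = proba >= cut

tp = int((pred & (y_te == 1)).sum())
fn = int((~pred & (y_te == 1)).sum())
sens = tp / (tp + fn)


def clopper_pearson(k, n, alpha=0.05):
    """Exact binomial CI for a proportion k/n via the beta distribution."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi


lo, hi = clopper_pearson(tp, tp + fn)
```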

    The performance of each ML model was additionally tested on a validation dataset, independent of the training set. A detailed assessment of the real-world simulation model is described in online supplemental methods and online supplemental figure 2.

    Data were first collected in regionally developed software for computer-aided dispatch (Emma, V.6.8.5, Beta80 Group, Milan, Italy) and exported using SAS Web Report Studio V.4.4 M4 (SAS Institute, Cary, North Carolina, USA). Data analysis and model implementation were performed with Python (V.3.9); the libraries used are listed in the online supplemental methods. Quartile distribution was computed with MATLAB (V.2018b). Call distribution in the Lombardy region was mapped with QGIS (V.3.4.6). All model scripts used in the analysis are publicly available on GitHub (https://github.com/LGpolimi/Detection-of-patients-with-COVID-19/tree/master/env/COVID_DIAGNOSIS_MODEL).

    Results

    Baseline characteristics

    The AREU managed 684 481 ambulance dispatches from October 2020 to July 2021 (training set), of which 549 755 resulted in transport to a regional hospital. Of these, 264 976 (48.2%) patients had a SARS-CoV-2 rtPCR test performed within 7 days before (n=40 731, 15.4%) or after (n=224 245, 84.6%) their EMS call and were included in the training set. Median age was 74 (IQR 55–84) years, 127 215 (48%) were female and 59 526 (22.5%) tested positive.

    The validation set included 238 387 ambulance dispatches from October 2021 to December 2021, of which 191 838 resulted in transport to a regional hospital. A SARS-CoV-2 rtPCR test result was available for 103 336 patients, and 8253 (8%) were positive.

    The population characteristics of training and validation sets are reported in table 1. The distribution of positive cases in the Lombardy region during the study periods is reported in online supplemental figure 4. Overall, the prevalence in the study period ranged from 73 cases/100 000 to 2528 cases/100 000 population.

    Table 1

    Population characteristics

    Operator-based interview

    The operator-based interview is based on binary answers to questions asked by receiver technicians at the PSAP-2, investigating signs and symptoms related to SARS-CoV-2 infection. The caller was considered a suspected case of COVID-19 by the PSAP-2 operator if they reported one or more of the signs and symptoms detailed in the ‘Methods’ section. The sensitivity and specificity of the interview in the whole training set were 85.5% and 58.7%, respectively. The PPV and NPV were 37.5% and 93.3% and accuracy 0.65 (table 2).

    Table 2

    Operator-based interview performance

    Importance of clinical variables retrieved by operators and EMS

    To assess the importance of clinical variables, univariate and multivariate logistic regression models were implemented, including predictor variables from the operator-based interview alone or combined with the clinical parameters from the on-scene ambulance report. Complete results are reported in table 3. When both sets of variables were included in the analysis, close contact, fever, cough and SpO2 <94% showed the strongest association with SARS-CoV-2 infection. The C-index of the model based on the operator-based interview alone was 0.79; the logistic regression model including all variables (ie, operator-based interview plus clinical parameters) had a C-index of 0.83.

    Table 3

    Logistic regression analysis

    Machine learning models

    The best-performing algorithm for all models, in both training and validation sets, was the random forest (table 4). Complete metrics for the different ML algorithms tested in the training set are reported in online supplemental table 2.

    Table 4

    Detailed metrics of different machine learning (random forest) models in the training and validation sets

    The performance of the ML models was lower in the validation set, especially for model 1. Model 4 had the highest AUC in both the training (0.94, 95% CI 0.93 to 0.95) and validation (0.85, 95% CI 0.84 to 0.86) sets (figure 2, left; table 4). The importance of each explanatory variable in the model output is reported as a SHAP value and represented graphically in figure 2 (right). Briefly, close contact and fever were the most relevant variables in determining the outcome in all four models. Other important variables were cough and age in model 1, and SpO2 and cough in model 2. Caller geographical distribution was the third most important variable in both models 3 and 4.

    Figure 2

    Performance of the machine learning models. Left: receiver operating characteristic (ROC) curves of the four models implemented, compared with the performance of the operator-based interview (*). The box within each graph reports the AUC with its 95% CI for the random forest classifier. Black line, training set; blue line, validation set. Right: graphical representation of the contribution of each explanatory variable to the prediction of SARS-CoV-2 positivity within each model, according to SHAP. The impact on the model is reported as the SHAP value. Each line represents a variable and each dot a single record. The importance of a variable within the specific model decreases from top to bottom. A dot is red if the value of the variable is high and blue if it is low. AUC, area under the curve; DBP, diastolic blood pressure; SBP, systolic BP; SHAP, Shapley additive explanations; SpO2, pulse oximeter oxygen saturation.

    A further ML model was developed to test the ability to refer patients to the hub hospitals based on arbitrary criteria.

    In the first simulation (panel A, figure 3), in the low prevalence period (n=96 984), positive patients would be directed to hub hospitals and negative patients to non-hub hospitals. Based on actual data, the goal was achieved in 61.6% (n=59 724) of patients, while it would have been achieved in 81.8% (n=79 386) in an ML-based scenario.

    Figure 3

    Real-world scenario simulation. Upper panel: a first simulation was performed in the low prevalence period (n=96 984). Here, positive patients would be directed to hub hospitals and negative patients to non-hub hospitals. Based on actual data (ie, actual scenario), the goal was achieved in 61.6% (n=59 724) of patients, while it would have been achieved in 81.8% (n=79 386) in a machine learning-based scenario. Box C, n=49 198 (50.7%). Box D, n=30 188 (31.1%). Box E, n=7072 (7.3%). Box F, n=10 526 (10.8%). Lower panel: a second simulation was performed in patients presenting with severe features at EMS calls in the high prevalence period (n=37 230). Here, positive patients presenting with severe features would be directed to hub hospitals and negative patients presenting with severe features to non-hub hospitals. The goal was achieved in 50.6% (n=18 850) of patients in the actual scenario and would have been achieved in 74.4% (n=27 688) in a machine learning-based scenario. Box C, n=13 134 (35.3%). Box D, n=14 554 (39.1%). Box E, n=3826 (10.3%). Box F, n=5716 (15.3%).

    In the second simulation (panel B, figure 3), among patients presenting with severe features in the high prevalence period (n=37 230), positive patients would be directed to hub hospitals and negative patients to non-hub hospitals. Based on actual data, the goal was achieved in 50.6% (n=18 850) of cases and would have been achieved in 74.4% (n=27 688) in an ML-based scenario. Complete data from both simulations are provided in online supplemental figures 5–8 and online supplemental table 3.

    Discussion

    This cohort study, conducted in one of the European areas most affected by the pandemic, investigated the association of prehospital demographic data and clinical features among patients with rtPCR-confirmed infection who called EMS.12 13

    Similar to other studies, close contact with a known case, cough and fever were the most predictive of COVID-19.14 15 The presence of altered consciousness, vomiting, diarrhoea and haemodynamic instability was associated with a reduced risk of infection, suggesting that aetiologies other than COVID-19 were responsible for the symptoms for which the patient was seeking care.16 17 A clinical algorithm using the variables obtained by an EMS operator had a sensitivity of >80% but low specificity, which is reasonable for a screening test in the prehospital setting.18 However, an ML model using additional clinical and epidemiological data, all available to EMS in the prehospital setting, detected cases with greater sensitivity and specificity.

    The implementation of ML models to guide clinical decisions has gained interest recently, especially in hospital settings.6 During the pandemic, studies focused on early COVID-19 detection and prediction of disease progression.19–24 Canas et al estimated the probability of an individual being infected with SARS-CoV-2 based on self-reported symptoms and found that a hierarchical Gaussian process model trained on 3 days of symptoms had an AUC of 0.80 (95% CI 0.80 to 0.81), comparable to our models.24 Soltan et al developed a tool (CURIAL-Lab) to screen for SARS-CoV-2 infection in the ED, with an AUC of 0.84–0.85 (95% CI 0.81 to 0.89) in validation cohorts; however, their model relies on full blood count values along with vital signs and is not applicable in the prehospital setting.22

    The use of ML algorithms in the EMS context has been limited to specific conditions, such as the recognition of cardiac arrest and the need for critical care resources.25–27 We developed an ML model that showed promise in helping EMS detect COVID-19 cases. The integration of contact history, signs and symptoms, and clinical parameters collected by ambulance personnel, together with the geographical call distribution and the current number of positive cases in a specific area, produced a model that more accurately predicted COVID-19 positivity by combining clinical data with the up-to-date viral distribution in a specific territory. We added the explanatory variables to our models incrementally. The first two models (ie, models 1 and 2) include variables commonly retrieved by PSAP worldwide and might be applicable to other settings; the other models also include variables derived from local epidemiology and the geospatial distribution of EMS calls, hence leveraging information sharing between EMS and local public health authorities. The study also highlights that the weight of each variable changed across the analyses performed: when only interview and clinical variables were considered, close contact, fever and cough showed the strongest association with positivity; when the same variables were included in the ML models, close contact and fever remained the strongest, but call geographical distribution and local epidemiology also played a significant role, improving the models’ ability to detect positive cases.

    Although the impact of the pandemic is declining, similar calamities might occur in the future, and ML-based models might be adapted and applied in the EMS setting to other events. It may be crucial for public health authorities to estimate the extent and spread of a pandemic disease, especially in the early phases when its course is unpredictable. EMS manages calls and patients one step before hospital care; if ML algorithms were integrated into the out-of-hospital data process, EMS might provide public health authorities with early clues of disease spread.4 On the other hand, given the differences between health systems, it would be essential to have algorithms flexible enough to adapt to prespecified criteria, for instance, to allocate patients to different hospitals in a network. For this reason, we simulated the application of an ML model to test its utility in referring patients to hospital resources with different characteristics (ie, hub vs non-hub hospitals). We found that the algorithm could ‘correct’ the hospital destination for a significant proportion of patients. For instance, in a high prevalence scenario, it may be desirable to limit access to hub hospitals to positive patients with severe features, and >20% of patients were correctly redirected by the ML algorithm. Therefore, although our models do not predict individual clinical severity or outcome, they might be useful at a prehospital level for operational or public health reasons.

    This study included a large number of patients managed by a regional EMS, linking out-of-hospital clinical presentation with the result of the gold standard rtPCR test performed within a close time frame and retrieved from an official database provided directly by the regional public health authorities. Moreover, most variables included in the analysis are relatively simple, precise and commonly retrieved by other EMS; thus, the information provided by our study could be relevant to, and applied by, other services worldwide. The signs, symptoms and clinical parameters were screened and recorded precisely and contemporaneously by trained personnel using the same software. The size of the dataset allowed for consistent analysis, enabling the application of a 10-fold cross-validation protocol. Finally, the ML models maintained good performance (AUC >0.8) on a large, independent validation dataset, suggesting stable application of our models in the setting of different viral variants presenting with different clinical and epidemiological characteristics.

    Our study has some limitations. First, we included in the analysis patients whose rtPCR test was done within 7 days of their EMS call. Therefore, patients whose tests were performed outside this time frame have been excluded. As most studies assume a median incubation period of up to 5–7 days, it is unlikely that this timeframe might significantly impact the performance of the models implemented in the study.28 Second, the EMS in Lombardy is part of a two-level PSAP system, where the PSAP-2 dispatches ambulances in the regional territory and allocates them to different hospitals, which have different characteristics and resources. Thus, the applicability of our model might be challenged in areas with very different EMS and hospital systems. However, we tried to overcome such limitation by including in our models variables commonly retrieved by EMS worldwide. Third, our analysis does not consider the different viral variants that have been shown to impact viral shedding, contagiousness, transmissibility and clinical severity. Fourth, the analysis does not include the vaccination status of either single patients or the general population. However, as the training and the validation sets are temporally independent, it could be hypothesised that the patient profiles were different, especially with respect to different viral variants and vaccination status. Performance in the validation cohort was good, with an AUC >0.8 in most models. Fifth, we acknowledge that an rtPCR result was unavailable in about half of the subjects included in the study period. However, the risk of verification bias is low as all patients underwent an rtPCR test once admitted to the ED regardless of the reason for calling EMS. Moreover, RR was not included in model development due to the high proportion of missing data. Given that respiratory symptoms were a key feature of COVID-19, this may have impacted model performance. 
Finally, the estimated improvement in hospital destination (hub vs non-hub) does not account for operational components of the real-world scenario, such as the crowding level of different facilities and the urgency of interventions, which could have affected decisions about the actual hospital destination.

    Conclusions

    An operator-based interview exploring the signs and symptoms most commonly associated with COVID-19 showed a sensitivity >80% for detecting patients with COVID-19. An ML model integrating clinical variables, geographical information and current local epidemiology showed the best performance in detecting cases. When tested in a real-world scenario, such as the determination of hospital destination, the model can guide EMS in referring a substantial percentage of patients to the appropriate hospital resources, based on prespecified allocation criteria.

    Data availability statement

    Data are available on reasonable request. Data on the SARS-CoV-2 positivity rate in the territory of the Lombardy region are available in a public, open-access repository (data source: Protezione Civile repository, https://github.com/pcm-dpc/COVID-19). Data on the machine learning models implemented in the study will be made available on reasonable request.

    Ethics statements

    Patient consent for publication

    Ethics approval

    The study was approved by the institutional review board of Milano Area 2 (approval ID: 598_2021). Due to the retrospective nature of the study, the requirement for written informed consent was waived.

    Acknowledgments

    The authors are deeply grateful to all the technicians, nurses and physicians working in the emergency medical system in Lombardy (Italy). The authors thank David Campeau for his critical revision of the manuscript and for his support in further improving the study.

    References

    Supplementary materials

    Footnotes

    • Handling editor Shammi L Ramlakhan

    • Twitter @ste_spi

    • SS, LG and FM contributed equally.

    • Collaborators AREU 118 EMS Network Collaborators: Matteo Caresani, Rainiero Rizzini, Fabrizio Canevari, Alessandra Sforza, Dario Franchi, MD, Stefano Alberti, MD, Antonella Brancaglione, MD, Paola Manzoni, MD, Fabio Sangalli, MD, Simone Redaelli, MD, Eleonora Brioschi.

    • Contributors SS, LG, FM, MM (Maurizio Migliari) and TL contributed to the study concept and design. AP, RB and MM (Maurizio Migliori) contributed to data acquisition. SS, LG and FM contributed to the initial drafting of the manuscript. All authors contributed to data interpretation and critical revision of the manuscript. RF, EGC and GMS contributed to study supervision. TL, MM (Maurizio Migliari), RF, EGC and GMS contributed to the discussion of the results and reviewed the data and the final manuscript. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication. RF acts as guarantor for the study.

    • Funding This study was funded by 'Development of geospatial-temporal predictive models for air quality and pandemic risk', Department of Electronics, Information and Bioengineering, Politecnico di Milano (Milan, Italy) (ID 80724), and by the Department of Anaesthesia, Critical Care, and Pain Medicine, School of Medicine and Surgery, University of Milano-Bicocca (Milan, Italy) (ID 2016cont0078).

    • Competing interests None declared.

    • Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

    • Provenance and peer review Not commissioned; externally peer reviewed.

    • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.