Objective To compare the prognostic performance of the predisposition, infection, response and organ failure (PIRO) score with the traditional sepsis category and clinical judgement in high-risk and low-risk Dutch emergency department (ED) sepsis populations.
Methods Prospective study in ED patients with severe sepsis and septic shock (high-risk cohort), or suspected infection (low-risk cohort). Outcome: 28-day mortality. Prognostic performance of PIRO, sepsis category and clinical judgement were assessed with Cox regression analysis with correction for quality of ED treatment and disposition. Illness severity measures were divided into four groups with the lowest illness severity as reference category; discrimination was quantified by receiver operator characteristics with area under the curve (AUC) analysis.
Results Death occurred in 72/323 (22%, high-risk) and 23/385 (6%, low-risk) patients. For the low-risk cohort, corrected HRs (95% CI) for categories 2–4 were 2.0 (0.4 to 11.9), 4.3 (0.8 to 24.7) and 17.8 (2.8 to 113.0: PIRO); 0.5 (0.05 to 5.4), 2.1 (0.2 to 21.8) and 7.5 (0.6 to 92.9: sepsis category). Patients discharged home (category 1) all survived. HRs were 4.5 (0.5 to 39.1) and 13.6 (4.3 to 43.5) for clinical judgement categories 3–4. Prognostic performance was consistently better in the low-risk than in the high-risk cohort. For PIRO AUCs were 0.68 (0.61 to 0.74; high-risk) and 0.83 (0.75 to 0.91; low-risk); for sepsis category AUCs were 0.50 (0.42 to 0.57; high-risk) and 0.73 (0.61 to 0.86; low-risk); for clinical judgement AUCs were 0.69 (0.60 to 0.78; high-risk) and 0.84 (0.73 to 0.96; low-risk).
Conclusions The accuracy and discriminative performance of the PIRO score and clinical judgement are similar, but better than the sepsis category. Prognostic performance of illness severity scores is less in high-risk cohorts, while in high-risk populations a risk stratification tool would be most useful.
- clinical assessment
- emergency care systems, emergency departments
- infectious diseases
- risk management
Risk stratification of emergency department (ED) patients with a suspected infection is important for guidance of initial treatment and the decision to admit patients to an intensive care unit (ICU) or ward, both affecting patient outcome.1–5 A risk stratification tool should predict relevant clinical outcomes and guide treatment and disposition decisions for the individual patient. Illness severity scores have the potential to serve this purpose, although this is much debated.4–6
The ‘predisposition, infection, response, organ failure (PIRO) score’ has recently been developed for risk stratification of ED patients with sepsis,7 and is based on the PIRO classification which was proposed to replace the rather one-dimensional systemic inflammatory response syndrome (SIRS) concept, which lacks specificity and clinical utility.8 However, it is unclear whether the PIRO score is better than SIRS or even the clinician's risk estimate.9–11 The PIRO score has been developed in low-risk populations (mortality of ∼5%) with a large fraction of patients with sepsis without organ failure,7 whereas for most of these patients clinical judgement will probably be adequate. An erroneous decision probably has the largest effect in high-risk populations, as it was recently shown that 13% of severely septic patients have an unexpected transfer from ward to ICU, associated with increased mortality and hospital length of stay.4 Thus, proper risk assessment seems to be most needed in high-risk patients, while at the same time it has been suggested that in these patients prognostic performance of PIRO might be limited.4 ,12 Interestingly, the O component of PIRO, representing potentially reversible organ dysfunction, seemed to have similar prognostic performance to that of the total PIRO score, but is much simpler.7
Before the SIRS concept and the already existing mortality in ED sepsis (MEDS) score13–15 are abandoned and the PIRO staging system is introduced in EDs of various healthcare systems, its value compared with MEDS and SIRS should be better characterised, particularly in high-risk populations. More importantly, it is unknown how the prognostic performance of PIRO relates to the clinician's risk estimate and whether a relatively simple organ failure score is equally useful as a risk stratification tool as the total PIRO score.
The purpose of this study was therefore twofold. First, to directly compare the prognostic performance of the PIRO score with the MEDS score, traditional sepsis category and clinical judgement, in a high-risk and low-risk population of Dutch ED patients with suspected infection. Second, to investigate which aspects of the PIRO staging system would be most useful for guidance of treatment and disposition decisions in the ED.
Study design and setting
This was an observational study conducted in a high-risk and low-risk cohort. Data were taken from an existing high-risk cohort in which subjects were prospectively enrolled between 1 November 2007 and 1 April 2011 at the Leiden University Medical Centre (LUMC: tertiary care centre with ∼30 000 ED visits annually) and Medical Centre Haaglanden, Westeinde (urban hospital with ∼50 000 ED patients annually), as part of a quality improvement programme. Patients for the low-risk cohort were prospectively enrolled in a separate time period between 1 June 2011 and 24 April 2012, so that both cohorts were independent. The study was approved by the medical ethics committee of the LUMC. This study met the criteria for exemption from obtaining informed consent.
Selection of participants
Low-risk cohort: All consecutive ED patients ≥17 years with suspected infection and triage category17 yellow, orange and red were included by the triage nurse or the nurse/physician caring for the patient. Triage categories blue and green were excluded because many very-low-risk patients were expected in these categories for whom risk stratification is not likely to be important (eg, patients with a simple pharyngitis). Any sign that triggered the triage nurse and treating physician to suspect an infection was suitable (eg, fever, coughing, erythema). Patients who appeared to have no infection according to the final hospital discharge letter were excluded (eg, pulmonary embolus, autoimmune disorders presenting with fever).
In the high-risk cohort, data were collected as described previously.4 For the low-risk cohort data were collected similarly: the treating nurse put a patient sticker on a registration form if the patient met the inclusion criteria. All nurses/physicians were informed about the data that had to be collected by means of oral presentations, posters and flyers in the ED and the registration form which contained the study protocol. Demographic and laboratory variables, vital signs, time to antibiotics, amount of fluid and outcome variables were prospectively registered in the digital hospital information system Chipsoft Ezis (Chipsoft, Amsterdam). A medical student subsequently transferred data from the electronic hospital information system (including the electronic patient file) to a PASW file (PASW V.20.0, IBM, New York, USA) to calculate the number of Surviving Sepsis Campaign (SSC) goals attained4 ,16 and all measures of illness severity.
Illness severity was described in four ways (see online supplementary data for definitions). First, by sepsis category, similar to the SSC.1 ,16 Second, by the PIRO and MEDS scores, as described previously.7 ,13 The PIRO and MEDS scores are not used in the participating hospitals and were calculated retrospectively, so the treating physicians were not aware of the score at the time of ED presentation. Physicians were, however, asked to screen for organ failure in patients with a suspected infection as part of an SSC-based sepsis screening.4 In a similar way to the validation of the APACHE (Acute Physiology And Chronic Health Evaluation) score, missing values were counted as 0 in the calculations of the PIRO and MEDS scores.18
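The handling of missing values described above can be illustrated with a minimal sketch. The component names and point values below are hypothetical placeholders, not the published PIRO or MEDS definitions (those are in the online supplement); only the rule that a missing component contributes 0 points is taken from the text.

```python
from typing import Mapping, Optional


def total_score(component_points: Mapping[str, Optional[int]]) -> int:
    """Sum the points of all score components.

    A component whose value is missing (None) is counted as 0,
    mirroring the rule described for the PIRO and MEDS calculations.
    """
    return sum(0 if points is None else points
               for points in component_points.values())


# Hypothetical patient: names and point values are illustrative only.
patient = {"age": 1, "respiratory_rate": 3, "lactate": None, "platelets": None}
print(total_score(patient))
```

Counting missing values as 0 assumes that an unmeasured variable was normal, which can only lower a patient's score, never raise it.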
Because, to the best of our knowledge, no validated scoring system exists to quantify clinical judgement, we a priori chose to use the disposition decision of the treating physician in the ED as a proxy measure for clinical judgement, as described previously.4 Quantification of clinical judgement by prospective assessment of the ED physician's risk estimate carries the risk of the Hawthorne effect—that is, the extra attention leads to more aggressive ED management by the treating physician,19 which would affect our study results.
Four severity categories were defined: category 1 represented low-risk patients who were discharged home; category 2 represented ward admissions; category 3 represented ward admissions after ICU consultation in the ED—the ED physician deemed ICU admission necessary but the intensivist thought that a ward admission would be adequate; category 4 comprised patients who were admitted to the ICU—the ED and ICU physician considered the patient to be very ill and at high risk. In the high-risk cohort, clinical judgement was quantified in a similar way with the exception that category 1 did not exist because no patients were sent home, so three illness severity categories were defined accordingly.
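As an illustration, the four clinical judgement categories can be written as a simple mapping from the disposition decision. This is a sketch only: the function name and the input fields (`disposition`, `icu_consulted`) are assumptions introduced here and are not taken from the study database.

```python
from enum import IntEnum


class ClinicalJudgement(IntEnum):
    DISCHARGED_HOME = 1          # low-risk patients sent home
    WARD = 2                     # ward admission
    WARD_AFTER_ICU_CONSULT = 3   # ward admission after ICU consultation in the ED
    ICU = 4                      # ICU admission


def judgement_category(disposition: str, icu_consulted: bool = False) -> ClinicalJudgement:
    """Map an ED disposition decision to a clinical judgement category.

    `disposition` is assumed to be one of "home", "ward" or "icu".
    """
    if disposition == "home":
        return ClinicalJudgement.DISCHARGED_HOME
    if disposition == "icu":
        return ClinicalJudgement.ICU
    if disposition == "ward":
        if icu_consulted:
            return ClinicalJudgement.WARD_AFTER_ICU_CONSULT
        return ClinicalJudgement.WARD
    raise ValueError(f"unknown disposition: {disposition!r}")


print(judgement_category("ward", icu_consulted=True).name)
```

In the high-risk cohort the "home" branch would never be reached, which leaves three categories, as described above.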
A patient was considered to have a ‘do not resuscitate’ (DNR) status if existing medical files already stated that the patient had a DNR code or when it was decided at the time of ED presentation or during hospital admission. The final hospital diagnosis was derived from the hospital discharge letter.
The primary outcome measures were in-hospital and 28-day mortality. A secondary outcome measure was hospital length of stay. Patients were followed up for 28 days or until hospital discharge if the patient was admitted for longer than 28 days. A medical student collected all hospital discharge letters (for the in-hospital mortality) and reports of the outpatient visits after the admission (for the 28-day mortality).
Data were presented as mean (SD) if normally distributed and median (IQR) if right-skewed. Descriptive categorical and continuous data were analysed using χ2 tests and Student t tests, respectively. PIRO and MEDS scores were divided into four categories to enable direct comparison with the four sepsis categories (suspected infection and <2 SIRS criteria, sepsis, severe sepsis, septic shock) and clinical judgement, with the lowest used as reference. A log-rank test was used to compare survival at 28 days for all illness severity measures, so that mortality and also hospital length of stay were taken into account in the prognostic properties of the various illness severity scores. HRs and 95% CIs were calculated with Cox regression analysis: first, HRs were calculated for the PIRO, MEDS, sepsis and clinical judgement categories (crude model). Subsequently, these HRs were corrected for ED treatment (the number of SSC goals of the resuscitation bundle that were achieved in each individual patient4 ,16) and disposition to ward or ICU, because both were expected to affect in-hospital mortality (corrected model). Clinical judgement was not corrected for disposition because disposition was used to quantify the clinical risk estimate, causing collinearity by definition. However, the HRs for clinical judgement were still shown.
Binary logistic regression was used in a similar way to assess the association between the illness severity categories and in-hospital mortality. Calibration was tested with the Hosmer and Lemeshow test. Following the general rule of thumb of about 10 events per covariate, we needed 20–30 events to be able to correct the association between illness severity category and mortality for quality of ED treatment and disposition.
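The events-per-covariate rule of thumb can be made concrete with a small calculation. The sketch below assumes the conventional threshold of 10 events per covariate and uses the death counts reported in the abstract (72 in the high-risk cohort, 23 in the low-risk cohort); the function name is introduced here for illustration.

```python
def max_covariates(n_events: int, events_per_covariate: int = 10) -> int:
    """Largest number of covariates supported by the number of events."""
    return n_events // events_per_covariate


# Deaths per cohort as reported in the abstract.
deaths = {"high-risk": 72, "low-risk": 23}
for cohort, n_events in deaths.items():
    print(cohort, max_covariates(n_events))
```

Under this assumption the low-risk cohort supports two correction variables, which matches the two that were used (ED treatment and disposition).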
Discriminative performance was assessed using receiver operator characteristics with area under the curve (AUC) analysis, which measures the ability of a score to discriminate between survival and mortality. The AUC of clinical judgement was reported with and without DNR patients because many DNR patients were expected to be admitted to a ward, regardless of illness severity. The AUCs of the illness severity scores were not expected to be affected by DNR status. AUCs were compared using the method of Obuchowski.20 Briefly, the covariance term in the formula for the Z statistic was calculated using Kendall's τ, as described previously.21 The covariance equals 0 when independent data are compared. p Values were multiplied by three to correct for three comparisons of AUCs among illness severity groups (Bonferroni's method). A p value of <0.05 was considered significant. All data were analysed using PASW statistics (SPSS V.20.0, IBM, New York, USA).
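For two independent cohorts the covariance term is 0, so the Z statistic reduces to the difference in AUCs divided by the square root of the summed variances. The sketch below shows that special case together with a Bonferroni multiplication. It is an approximation under stated assumptions, not the authors' computation: the SEs are back-calculated from the 95% CIs in the abstract, and the text does not state whether the Bonferroni factor was applied to between-cohort comparisons, so the output need not reproduce the reported p values.

```python
import math


def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * z_crit)


def compare_independent_aucs(auc1: float, se1: float,
                             auc2: float, se2: float,
                             n_comparisons: int = 1) -> tuple[float, float]:
    """Z test for two AUCs from independent samples (covariance = 0).

    Returns the Z statistic and the two-sided p value, multiplied by
    `n_comparisons` (Bonferroni) and capped at 1.
    """
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, min(1.0, p_two_sided * n_comparisons)


# PIRO AUCs and 95% CIs as reported in the abstract.
se_low = se_from_ci(0.75, 0.91)    # low-risk cohort, AUC 0.83
se_high = se_from_ci(0.61, 0.74)   # high-risk cohort, AUC 0.68
z, p_adj = compare_independent_aucs(0.83, se_low, 0.68, se_high, n_comparisons=3)
print(f"Z = {z:.2f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

When two scores are compared within the same cohort the data are correlated, and the covariance term (estimated via Kendall's τ in the method described above) must be subtracted inside the square root; that case is not covered by this sketch.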
Patient characteristics and inclusion
Three hundred and twenty-three patients were included in the high-risk cohort. In the low-risk cohort, 405 patients were initially eligible but 20 patients were excluded because in retrospect they appeared to have no infection (pulmonary embolism, anaemia, toxic fever caused by chemotherapy and epilepsy), leaving 385 patients for analysis. Patient characteristics are shown in tables 1 and 2.
Prognostic performance of illness severity scores and clinical judgement for relevant clinical outcomes
In figure 1 it is shown that observed in-hospital mortality increases with increasing PIRO, MEDS, sepsis and clinical judgement category. For in-hospital mortality, the Hosmer–Lemeshow tests (see figure legend) indicated fair to good model calibration for all measures of illness severity, although the low-risk cohort seemed to have the best goodness of fit. The results of the logistic regression for 28-day mortality were similar to in-hospital mortality and were therefore not shown separately.
Similarly, increases in illness severity category were associated with increasing hazard for 28-day mortality (figure 2), except for sepsis category, which had no significant association with 28-day mortality, once corrected for ED treatment and disposition. The lowest category of clinical judgement (patients who were discharged home) was the perfect predictor of survival (no patients died). All HRs decreased when they were corrected for ED treatment and disposition.
Discriminative performance of PIRO, MEDS, sepsis and clinical judgement category in the low-risk cohort was similar for prediction of 28-day mortality, as is shown in figure 3 (p>0.05).
Figure 4 shows that the discriminative performance in the low-risk cohort was better than in the high-risk cohort (p=0.036, 0.089, 0.001 and 0.002 for PIRO, MEDS, sepsis and clinical judgement category, respectively). PIRO, MEDS and clinical judgement had larger AUCs than the sepsis category in the high-risk cohort (p<0.001 for all comparisons). In the low-risk cohort, there was a similar trend but this was not significant (p>0.05 for all comparisons).
Discriminative performance of clinical judgement in the high-risk cohort was better when DNR patients were excluded (p<0.001). In the low-risk cohort, the AUCs of clinical judgement with and without DNR patients were similar (p=0.288).
Finally, to assess which components of the PIRO score had the best discriminative properties, AUCs of the separate scores are depicted for both the high and low-risk cohorts in figure 5. In the high-risk cohort only the predisposition and organ failure score had an AUC that was significantly different from 0.5. In the low-risk cohort, all individual components of the PIRO score were significantly different from 0.5 and tended to have larger AUCs than the high-risk cohort, with the organ failure component being significantly larger than that of the high-risk cohort (p<0.001).
This study has several limitations.
First, we used a proxy measure for clinical judgement because no validated scoring systems exist. However, we found a clear association between clinical judgement category and the hazard of death within 28 days, suggesting that disposition decision of the ED physician is a reasonable construct for clinical risk estimate.
Second, the use of the SSC screening tool for inclusion of patients in the high-risk cohort and an SSC-based sepsis protocol in the low-risk cohort stimulated clinicians to look for organ failure in patients with a suspected infection, but they were not obliged to do so. Thus, ‘clinical judgement’ in this study is a protocol-supported judgement, reducing the difference between the prognostic performance of PIRO, MEDS and clinical judgement, because elements of the PIRO score are incorporated in the SSC protocol. It would be unethical, however, not to have some kind of sepsis protocol. Even without a sepsis protocol, physicians are probably influenced by the numerous publications about illness severity published in the last decades,6 ,7 ,12–15 so they would probably include organ failure in their disposition decision.
Finally, clinical judgement in the later collected low-risk cohort might have improved over time because of the aforementioned factors. Because the purpose of this study was not to compare clinical judgement between the cohorts, the conclusion of our study is not affected by a possible improvement of clinical judgement.
The main finding of this study is that the prognostic performance of PIRO is similar to MEDS and clinical judgement, but better than the traditional sepsis staging system.22 The distinction between severe sepsis and septic shock is not useful for prediction of mortality. In the high-risk cohort, prognostic properties of PIRO are worse than those in the low-risk cohort, while in high-risk populations illness severity scores are needed, because clinical judgement perfectly identifies low-risk patients who can be safely discharged home. The organ failure component of PIRO seems to be its most useful component.
Prognostic performance of illness severity scores and clinical judgement for relevant clinical outcomes
PIRO, MEDS and sepsis and clinical judgement category are associated with in-hospital mortality, as in previous American studies,1 ,10 ,11 but a new finding of this study is that they are also associated with 28-day mortality taking the time to death into account, as was shown by the survival analyses (figure 2). It was shown that the total PIRO score has better prognostic properties than the traditional sepsis staging system,22 confirming that PIRO is a useful concept,7 also in Dutch ED patients with a suspected infection. In high-risk patients, the distinction between severe sepsis and septic shock (AUC=0.5) is of no use for mortality prediction. However, the total PIRO score has no additional value over the already existing MEDS score and clinical judgement. So, should the PIRO score be used as a risk stratification tool for the individual ED patient, or should it be used as an instrument to characterise ED populations for research purposes?
PIRO score as a risk stratification tool for ED patients with suspected infection
High risk versus low risk
The PIRO score has been validated in relatively low-risk American ED populations.7 In high-risk populations, in particular, risk stratification should be helpful in guiding initial treatment and disposition to ward or ICU, because in these populations the consequences of inadequate ED management are detrimental.4 Unfortunately, direct comparison of the prognostic performance of PIRO between a high-risk and low-risk population showed that PIRO performs less well in a high-risk population, similar to previous studies of PIRO and MEDS.12 ,23 The lower prognostic performance is explained by the more aggressive initial treatment and disposition (see tables 1 and 2) in high-risk patients, which affects outcome and also results in a rapid change of the potentially reversible response and organ failure components of PIRO, decreasing the prognostic accuracy of PIRO. The decrease in HRs after correction for the quality of ED treatment and disposition to ward or ICU supports this hypothesis. In low-risk patients the initial illness severity score is influenced less by ED management, and prognostic performance is therefore better. Interestingly, although the prognostic performance of the initial PIRO score decreases with aggressive ED management, the score does exactly what it is supposed to do—namely, alert the ED physician to treat ill patients aggressively and admit the sickest patients to the ICU. However, this is not ‘captured’ in an AUC or Hosmer–Lemeshow test, statistical measures that are generally used (and were used in this study) to measure the quality of an illness severity score for the ED. Incorporation of the response to ED treatment is expected to improve the prognostic performance (as measured by AUC and the Hosmer–Lemeshow statistic).
Organ failure score
Treatment effect depends on predicted mortality,24 but its initiation should preferably be based on pathophysiological changes that can still be reversed by an intervention and not on predicted mortality itself.5 The non-modifiable predisposition and infection variables could help the decision by the ED physician to admit a patient to a ward or ICU. For example, in a 90-year-old nursing home resident with chronic obstructive pulmonary disease and pneumonia but no acute organ failure, the PIRO score will be high but ICU admission will not have any additional value because mechanical ventilation, vasopressors and central veno-venous haemodialysis are not indicated. On the other hand, a 30-year-old patient with septic shock, or other signs of potentially reversible organ failure, will probably benefit from intensive care treatment.5 Interestingly, the organ failure score performs as well as the total PIRO score in both the high-risk and low-risk cohort and much better than the other individual components (figure 5), corresponding with previous studies.7 ,25 It has the great advantage that it is much simpler and therefore has more chance of being successfully implemented in clinical practice. In table 2 it can be seen that only the non-survivors had signs of organ failure, while organ failure was always absent in survivors.
Risk stratification by the ED physician compared with illness severity scores
The physician's risk estimate has similar prognostic performance to PIRO and MEDS and can be used to safely discharge ED patients home. If the lowest categories of PIRO, MEDS and sepsis had been used for the decision to discharge the patient home, 1–3% of the ED patients with a suspected infection would have died outside the hospital (figure 1). However, the AUCs of ∼0.68 and 0.83 in the high-risk and low-risk cohorts, respectively, indicate that clinical judgement is not perfect and that there is still room for improvement.
On the one hand, clinical judgement is expected to have better prognostic performance because it incorporates the patient's response to treatment, as discussed above and does not depend on cut-off values. Thus, an ED physician will interpret a systolic blood pressure of 91 mm Hg as hypotension, but in the PIRO score the same amount of points will be assigned as to a patient with a systolic blood pressure of 120 mm Hg. On the other hand, concerns that are valuable for clinical practice but not related to illness severity, such as social circumstances and the inability to take oral medication,9–11 would have led to hospital admission and consequent decrease of prognostic performance of clinical judgement in this study.
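The cut-off effect described above can be shown with a short sketch of a threshold-based scoring rule. The cut-off of 90 mm Hg and the point value are hypothetical placeholders chosen for illustration; the published PIRO definition is in the online supplement and may differ.

```python
def sbp_points(sbp_mm_hg: float, cutoff: float = 90.0, points_below: int = 2) -> int:
    """Assign points for systolic blood pressure using a single cut-off.

    Both `cutoff` and `points_below` are illustrative assumptions.
    """
    return points_below if sbp_mm_hg <= cutoff else 0


for sbp in (85, 91, 120):
    print(sbp, sbp_points(sbp))
```

Under such a rule 91 mm Hg and 120 mm Hg receive the same number of points, whereas a physician would weigh the two values differently.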
Our findings correspond with the findings of Sinuff et al,11 who found that the discriminative value of risk prediction by ICU physicians had an AUC of 0.85, better than the risk prediction by various illness severity scores.
One great advantage of the PIRO and MEDS scores is that they provide an objective way of measuring and reporting illness severity in research, which will always be difficult for clinical judgement.
In conclusion, the PIRO score has better prognostic performance than the conventional SIRS concept but adds little to clinical judgement and the already existing MEDS score for risk stratification of the individual patient with sepsis, especially in a high-risk population where a risk stratification tool would be most useful. The individual components of the PIRO staging system, especially the organ failure score, are most useful for guidance of initial treatment and disposition of ED patients with a suspected infection. Future research should focus on the use of the separate components of the PIRO classification in clinical decision-making, the change of the acute physiology components in the PIRO score and the willingness of doctors to incorporate the PIRO staging system in their clinical practice.
We are grateful to all the nurses, staff members, senior house officers and residents who were involved in patient inclusion.
Contributors BDG devised and designed the study, collected data, contributed to the analyses and edited the manuscript. BDG takes full responsibility for the study as a whole. JL collected data, carried out the analyses and wrote the manuscript. ERJTdD, AV collected data.
Competing interests None.
Ethics approval Ethics committee of Leiden University Medical Centre.
Provenance and peer review Not commissioned; externally peer reviewed.