Identifying, treating and prognosticating for patients who attend the Emergency Department with infective symptoms is part of the bread and butter of an emergency physician’s practice. It’s important that we accurately and swiftly identify patients who will develop, or are at risk of developing, septic shock. In the UK, mortality from severe sepsis is 18.2-21%1.
Compare this to the widely accepted medical emergency of a STEMI, which has a mortality of 7.3% at one year2.
Sepsis is a problem I am not convinced we have solved. The new Sepsis 3 guidelines have been discussed on St.Emlyn’s (http://www.stemlynsblog.org/sepsis-16/)3.
They aim to redefine sepsis, moving away from the ‘SIRS’ criteria and introducing ‘SOFA’ and ‘qSOFA’. Under the Sequential [Sepsis-related] Organ Failure Assessment (SOFA) score, a patient whose score increases by 2 or more points has an overall mortality rate of 10%, which identifies this cohort as high-risk patients. In order to use this scoring system we need some test results, which is not always a luxury we have in the Emergency Department at the beginning of a patient’s journey. So, enter the quick SOFA score (qSOFA), a bedside test assessing respiratory rate, blood pressure and mental state. Patients scoring 2 or more with this system are predicted to be at high risk of a bad outcome. So how do these scoring systems hold up against each other? Are any of them sensitive or specific at identifying patients who are at high risk of inpatient mortality?
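To make the bedside nature of qSOFA concrete, here is a minimal sketch of how the score could be tallied. The thresholds used (respiratory rate ≥ 22/min, systolic blood pressure ≤ 100 mmHg, GCS < 15 as a marker of altered mentation) come from the published Sepsis 3 qSOFA criteria rather than from this post, and the function names are purely illustrative.

```python
def qsofa_score(respiratory_rate, systolic_bp, gcs):
    """Return a qSOFA score from 0 to 3, one point per criterion met.

    Thresholds are the published Sepsis 3 qSOFA criteria (an assumption
    here, as the post itself does not list them).
    """
    points = 0
    if respiratory_rate >= 22:  # breaths per minute
        points += 1
    if systolic_bp <= 100:      # mmHg
        points += 1
    if gcs < 15:                # altered mentation
        points += 1
    return points


def is_high_risk(score, cutoff=2):
    """A score at or above the cut-off flags the patient as high risk."""
    return score >= cutoff


# Hypothetical patient: tachypnoeic and hypotensive, but alert.
example = qsofa_score(respiratory_rate=24, systolic_bp=95, gcs=15)
print(example, is_high_risk(example))
```

Nothing here needs a blood test, which is the whole appeal of qSOFA at the front door.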
This paper from JAMA puts SIRS, qSOFA and SOFA to the test. It’s a large, multi-centre, international retrospective cohort study including 184,875 patients 4.
What did they do and who was studied?
This study looked at adult patients admitted to intensive care with a suspected infection. Its primary aim was to identify the effect that an increase in SIRS, SOFA and qSOFA score of 2 or more points (within the first 24 hours of admission) had on in-hospital mortality and length of ITU stay. The researchers studied data from the Australian and New Zealand Intensive Care Society (ANZICS) database, looking at admissions to intensive care units in New Zealand and Australia over a 5-year period for any patient who had an admission code of suspected infection. Using this database, SIRS, qSOFA and SOFA scores from the first 24 hours of ITU admission were calculated, as well as the outcomes of in-hospital mortality or an ITU admission lasting over 3 days. The SOFA and qSOFA scores are looking to identify the cohort of patients who are going to go on and develop organ failure, so is designing a trial that looks at a different outcome (mortality and length of ITU stay) a fair test of the accuracy of these scoring systems?
All patients were assumed to have a score of 0 prior to entering the ITU; therefore a calculated score of 2 or more at the time of data collection was taken to be an increase. Sepsis 3 does state that all patients should be assumed to have a SOFA score of 0 as a baseline, unless they have a known pre-existing organ dysfunction. However, I feel this presents a difficulty when comparing SOFA, SIRS and qSOFA. I am unconvinced that a patient being admitted to an intensive care unit will have a baseline SIRS score of 0. Patients unwell enough to require ITU admission will usually have deranged physiological parameters. So is this data truly comparing a rise in score of 2 or more points at the time of admission to ITU, or is it looking at patients who have a score of 2 or more points?
In this study there was a mortality rate of 18.7%, with a statistically significant difference between the mean age of the survivors (61.4) and the non-survivors (69.2), as well as a significantly higher mean APACHE III score on admission in the non-survivor group (91.8 vs 56.3).
There was varying mortality depending on source of admission; 37.4% of patients were admitted directly from the Emergency Department. This group had a mortality rate of 17%, compared to patients admitted from the ward (accounting for 26.2% of admissions), who had a mortality rate of 27.9%. There is no breakdown of scoring systems for these patients, but does this mean patients arriving in ITU from the ward are arriving sicker, or have we not predicted early enough in their journey that these patients would develop sepsis-related organ failure?
The diagnosis coded on ITU admission has also been detailed; the commonest diagnoses on admission were bacterial pneumonia (17.7%) and sepsis from unknown source (17.2%). It is important to highlight that pneumonia has 4 separate codes (bacterial, viral, parasitic and pneumonia); if these were clustered together they would account for 25% of admissions. The diagnosis associated with the highest mortality (37%) was parasitic pneumonia; however, this only accounted for 0.3% of admissions. This is followed by a mortality rate of 34% in septic shock with an unspecified source, which accounted for 16% of admissions. Before we have even looked at the scoring systems, we have already identified several factors, such as age and diagnosis, that could potentially help us identify patients who are at high risk of mortality. I am sure these are things we already consider when reviewing patients, but it’s important that we don’t rely on scoring systems alone, and remember they are a useful tool but not the only one we should have in our toolkit.
Were they limited by the data recorded?
Previously on St.Emlyn’s we have looked at the potential problems a retrospective study can fall into. We currently have a podcast series running on Critical Appraisal Nuggets (CAN) to highlight key topics to help you appraise papers yourself. If you haven’t listened to these then you can do so by clicking this link 5.
The data collected in the ANZICS database are expansive, but the database was not designed for calculating SOFA, qSOFA and SIRS scores. As a result, not all patients included in the study could have a score calculated for each scoring system. SIRS was the score that could be calculated for the fewest patients: 19010 patients were excluded because data were missing for all components of the score.
The main data missing for accurate calculations was the level of inotropic support a patient was receiving. This is needed to calculate the cardiovascular component of the SOFA score. The method states that where a patient had missing data for one component of a scoring system, a score of 0 was entered for that component. A patient was only excluded from a particular scoring system if every data component for that system was missing. We are unable to see how many patients have been under-scored, a factor that could potentially impact the results shown on the ROC curves. I would be interested in seeing a worst-case analysis (i.e. assigning the highest scores for unknown values) as well as a best case. If patients have been under-scored, the scoring systems may appear to be less sensitive than they are.
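The best-case and worst-case idea can be sketched in a few lines. This is an illustration only and not the authors' code: the patient is hypothetical, the helper name is made up, and the only outside fact assumed is that SOFA has six organ-system components, each scored 0 to 4.

```python
def total_score(components, max_per_component, missing="best"):
    """Sum component scores, where None marks a missing component.

    missing="best"  imputes 0 (the approach described in the paper's method)
    missing="worst" imputes the maximum possible component score
    Returns None when every component is missing (patient excluded).
    """
    if all(value is None for value in components.values()):
        return None
    total = 0
    for value in components.values():
        if value is None:
            value = 0 if missing == "best" else max_per_component
        total += value
    return total


# Hypothetical patient with no recorded inotrope data, so the
# cardiovascular component cannot be scored.
patient = {
    "respiratory": 1,
    "coagulation": 0,
    "liver": 0,
    "cardiovascular": None,
    "cns": 0,
    "renal": 0,
}

best = total_score(patient, max_per_component=4, missing="best")
worst = total_score(patient, max_per_component=4, missing="worst")
print(best, worst)  # 1 5
```

In this made-up example the same patient sits below the cut-off of 2 in the best case and well above it in the worst case, which is exactly the kind of misclassification we cannot quantify from the paper.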
What about the scores on the doors?
Not all patients enrolled in the study could have each score calculated; therefore the number of patients varies for each group. The percentages stated below are percentages of that subgroup, not of the overall study population of 184,875 patients.
1. SIRS – 86.7% of patients had a SIRS score of ≥ 2, of these 19.9% died
2. SOFA – 90.1% of patients had a SOFA score of ≥ 2, of these 20.2% died
3. qSOFA – 54.4% of patients had a qSOFA score of ≥ 2, of these 22.8% died
Mortality was significantly lower in patients scoring < 2 for each system. Notably with the SOFA score, mortality was only 4% in patients scoring less than 2.
When an ITU stay of ≥ 3 days is combined with in-hospital mortality as the outcome, the ROC curve still shows SOFA as the superior scoring system, with qSOFA and SIRS very closely matched. It’s important to remember that a ROC curve is a good graphical representation of data: the area under the curve reflects how well the test discriminates between patients who have the outcome and those who do not. Put another way, the closer the curve is to the left-hand border and top of the graph, the more accurate the test.
These ROC curves do not allow us to see the exact sensitivity and specificity for mortality and ITU stay for each test at the cut-off of a score ≥ 2. So, while we can look at the graph and say SOFA appears to perform with greater sensitivity and specificity, the exact numbers at the cut-off used in this trial are not known.
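For anyone who wants a feel for what the area under a ROC curve is measuring, it can be read as the chance that a randomly chosen non-survivor has a higher score than a randomly chosen survivor, with ties counting as half. The sketch below uses a handful of invented scores, not data from the paper, and the function name is illustrative.

```python
def area_under_roc(scores_died, scores_survived):
    """Pairwise estimate of the area under the ROC curve.

    Compares every non-survivor with every survivor: a higher score in the
    non-survivor counts as 1, a tie counts as 0.5, a lower score counts as 0.
    """
    wins = 0.0
    for died in scores_died:
        for survived in scores_survived:
            if died > survived:
                wins += 1.0
            elif died == survived:
                wins += 0.5
    return wins / (len(scores_died) * len(scores_survived))


# Invented scores purely for illustration.
died = [3, 2, 2, 1]
survived = [0, 1, 1, 2]

print(area_under_roc(died, survived))  # 0.8125
```

An area of 0.5 means the score does no better than a coin toss and 1.0 means perfect separation. Note that this single number summarises every possible cut-off at once, which is why it cannot tell us the sensitivity and specificity at one particular threshold.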
From very careful reading of the paper we think we have been able to extract the data needed to calculate the sensitivity and specificity for in-hospital mortality of a score ≥ 2 with SIRS, qSOFA and SOFA. As this data was a little challenging to put together, we have included the tables needed to work out specificity and sensitivity for each score.
|SIRS score||Has disease (non-survivors)||No disease (survivors)||Sensitivity||Specificity|
|Positive result (≥2)||31,648||127,062||93.0%||14.7%|
|Negative result (<2)||2,387||21,877|
|SOFA score||Has disease (non-survivors)||No disease (survivors)||Sensitivity||Specificity|
|Positive result (≥2)||33,365||131,738||97.7%||11.7%|
|Negative result (<2)||793||17,435|
|qSOFA score||Has disease (non-survivors)||No disease (survivors)||Sensitivity||Specificity|
|Positive result (≥2)||22,758||76,853||66.8%||48.4%|
|Negative result (<2)||11,332||72,125|
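If you would like to check the arithmetic yourself, the calculation from the 2x2 tables above can be sketched as follows. The counts are taken straight from the tables; the variable and function names are just illustrative. Sensitivity is the proportion of non-survivors with a positive result, and specificity is the proportion of survivors with a negative result.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of non-survivors who scored >= 2."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of survivors who scored < 2."""
    return true_neg / (true_neg + false_pos)


# Counts copied from the three tables above.
tables = {
    "SIRS": {"tp": 31648, "fp": 127062, "fn": 2387, "tn": 21877},
    "SOFA": {"tp": 33365, "fp": 131738, "fn": 793, "tn": 17435},
    "qSOFA": {"tp": 22758, "fp": 76853, "fn": 11332, "tn": 72125},
}

results = {}
for name, c in tables.items():
    sens = round(100 * sensitivity(c["tp"], c["fn"]), 1)
    spec = round(100 * specificity(c["tn"], c["fp"]), 1)
    results[name] = (sens, spec)
    print(f"{name}: sensitivity {sens}%, specificity {spec}%")
```

Both figures describe in-hospital mortality at the cut-off of a score ≥ 2 only; a different cut-off would give a different pair of numbers.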
While the sensitivity of SOFA and SIRS is high, with SOFA outperforming SIRS, their specificity is very low, leaving questions regarding the use of these scores. The sensitivity and specificity of qSOFA sit closer to the 50% mark, which calls into question its usefulness in prognosticating for patients.
Is it fair to compare SOFA, qSOFA and SIRS?
Quick bedside tests such as qSOFA and SIRS are potentially more applicable in environments in which we don’t have access to lots of information such as biochemical markers, which is often the case early in a patient’s journey. Are these scoring systems perhaps more helpful in detecting patients who may require a higher level of care, in a setting that is relatively information poor? Would it be appropriate to validate these scores in an environment such as an Emergency Department, where they are more likely to be implemented?
What does this all mean?
This is a very large study, looking into a vital area of medicine. I believe it shows there is still work to be done to convince me that these scoring systems help us to accurately prognosticate for patients with suspected infections. We certainly have systems that show good sensitivity; however, their specificity leaves many questions.