A core skill in emergency medicine is the identification and classification of risk in the patients we see. For many conditions (chest pain, for example) we have derived, validated and refined scoring systems over many years, but with COVID-19 we have faced the challenge of developing risk stratification tools at the same time as dealing with the patients arriving at the door. In the early phase of the pandemic several risk stratification tools were used based upon mortality data, but they were not systematically derived or validated. It’s therefore good to see the first output from the PRIEST study in the UK.
PRIEST is similar to the RECOVERY trial in that it grew out of a hibernated study (the PAINTED study), originally set up to deal with the next flu pandemic, and it has been delivered by the same team.
Currently the study is unpublished and not yet peer-reviewed, but it is available on the medRxiv preprint server. The abstract is below, but as it is open access we strongly encourage you to read the full paper yourself.
What type of study is this?
This is a prospective observational cohort study: patients with a specific condition (in this case suspected COVID-19) were identified and then followed up to see what happened to them.
What did they do?
Patients attending emergency departments in the UK with suspected COVID-19 were identified and data collected on likely predictive signs and symptoms. These were then used to test established risk scores and to derive one specific to COVID-19 patients.
The PRIEST study split the data set into derivation and then validation sets to compare and contrast established scores and newly derived ones.
The primary outcome was death or the need for respiratory, cardiovascular or renal organ support.
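For readers less familiar with the derivation/validation approach, a rough sketch of that workflow is shown below. This is purely illustrative, with simulated data, made-up predictor names and a simple logistic model; it is not the authors’ actual analysis.

```python
# Purely illustrative: a derivation/validation split in the spirit of PRIEST,
# using simulated data and made-up predictor names. Not the authors' code.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 2000

# Simulated ED attendances with a few hypothetical predictors.
patients = pd.DataFrame({
    "age": rng.integers(18, 95, n),
    "resp_rate": rng.normal(20, 6, n).clip(8, 50),
    "spo2": rng.normal(95, 4, n).clip(70, 100),
    "systolic_bp": rng.normal(125, 20, n).clip(60, 220),
})
# Simulated composite outcome (death or organ support), for illustration only.
logit = 0.04 * patients["age"] + 0.10 * patients["resp_rate"] - 0.15 * patients["spo2"] + 8
outcome = rng.random(n) < 1 / (1 + np.exp(-logit))

# Split into a derivation (model-building) set and a separate validation set.
X_deriv, X_valid, y_deriv, y_valid = train_test_split(
    patients, outcome, test_size=0.45, random_state=0
)

# Derive a simple model on one part of the data...
model = LogisticRegression(max_iter=1000).fit(X_deriv, y_deriv)
# ...then judge its performance only on patients it has never seen.
print("Validation set accuracy:", model.score(X_valid, y_valid))
```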
What did they find?
Data was collected from 22,445 patients attending 70 emergency departments in the UK, with 20,892 patients available for analysis. 11,773 patients were used to derive a COVID-19 risk stratification tool, which was then validated in a further 9,118 cases. Overall, 22.4% of patients died.
They used ROC curves to evaluate which tools best risk stratify patients in the ED population. Specifically, they looked at the CURB-65, PMEWS, NEWS2, SFAHP and WHO tools.
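As a reminder of how one of these tools turns bedside observations into a number, here is a minimal CURB-65 calculator. The thresholds are the standard published CURB-65 criteria; the function itself is ours, for illustration only, and is not taken from the PRIEST analysis.

```python
# Minimal CURB-65 calculator for illustration; thresholds are the standard
# published criteria, not anything specific to the PRIEST analysis.
def curb65(confusion: bool, urea_mmol_l: float, resp_rate: int,
           systolic_bp: int, diastolic_bp: int, age: int) -> int:
    """Return the CURB-65 score (0-5): one point per criterion met."""
    score = 0
    score += int(confusion)                                # new confusion
    score += int(urea_mmol_l > 7)                          # urea > 7 mmol/L
    score += int(resp_rate >= 30)                          # respiratory rate >= 30/min
    score += int(systolic_bp < 90 or diastolic_bp <= 60)   # hypotension
    score += int(age >= 65)                                # age >= 65 years
    return score

# Example: 72-year-old, RR 32, BP 110/70, urea 6.5, not confused -> CURB-65 of 2
print(curb65(False, 6.5, 32, 110, 70, 72))
```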
Of the 22,445 patients assessed, 13,997 (67%) were admitted, of whom 6,251 (31.2%) tested positive for COVID-19.
In terms of predicting the primary outcome, and as measured by the c-statistic (equivalent to the area under the ROC curve), the results were: CURB-65 0.75; CRB-65 0.70; PMEWS 0.77; NEWS2 (score) 0.77; NEWS2 (rule) 0.69; SFAHP (6-point) 0.70; SFAHP (7-point) 0.68; and the WHO algorithm 0.61.
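If you want a feel for what the c-statistic represents, it is the probability that a randomly chosen patient who had the outcome was given a higher score than one who did not. A toy calculation (made-up numbers, not PRIEST data) might look like this:

```python
# Illustrative only: the c-statistic is the area under the ROC curve.
# Toy numbers below, not PRIEST data.
from sklearn.metrics import roc_auc_score

# Hypothetical triage scores and outcomes (1 = death/organ support, 0 = neither).
scores  = [1, 2, 0, 3, 4, 1, 2, 5, 0, 2]
outcome = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1]

print("c-statistic:", roc_auc_score(outcome, scores))
```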
As with many tools there was a balance to be drawn between sensitivity and specificity. PMEWS, WHO criteria and NEWS2 (at a lower threshold) had high sensitivity, but poor specificity. CURB-65, PMEWS and NEWS2 were reasonable prediction tools for adverse outcomes in suspected COVID-19, and predicted death without organ support better than receipt of organ support.
| Score | c-statistic | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| CURB-65 | 0.75 | 71 | 70 |
| CRB-65 | 0.70 | 86 | 48 |
| PMEWS | 0.77 | 96 | 31 |
| NEWS2 (score) | 0.77 | 77 | 64 |
| NEWS2 (rule) | 0.69 | 83 | 55 |
| SFAHP (6-point) | 0.70 | 74 | 66 |
| SFAHP (7-point) | 0.68 | 88 | 49 |
| WHO algorithm | 0.61 | 95 | 27 |
There is no perfect score here. Although scores such as CURB-65 have reasonable performance on the c-statistic, they offer a blend of average sensitivity and specificity and are therefore (arguably) less useful in practice. In practice we are usually looking for scores that either rule in or rule out diagnoses. None of the scores has high specificity, so none can perform as a rule-in test. PMEWS and the WHO algorithm have good sensitivity, but at the expense of very poor specificity.
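That trade-off is easiest to see when a continuous score is dichotomised at different thresholds. The toy simulation below (invented numbers, not PRIEST results) shows sensitivity rising and specificity falling as the threshold drops:

```python
# Illustrative sketch of the sensitivity/specificity trade-off when a score is
# dichotomised at different thresholds. Simulated data only, not PRIEST results.
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical scores: patients with the adverse outcome tend to score higher.
score_pos = rng.normal(6, 2, 200)   # patients who died / needed organ support
score_neg = rng.normal(3, 2, 800)   # patients who did not

for threshold in (2, 4, 6):
    sens = np.mean(score_pos >= threshold)   # true positives / all with outcome
    spec = np.mean(score_neg < threshold)    # true negatives / all without outcome
    print(f"threshold >= {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")

# Lowering the threshold raises sensitivity (fewer missed sick patients)
# but lowers specificity (more well patients flagged), and vice versa.
```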
There were some differences between the scores depending on whether they were predicting death or the need for organ support. The scores perhaps varied because receipt of organ support also reflects triage decisions about potentially futile admissions to critical care.
What about the ISARIC 4C score?
The ISARIC study of in-patients also derived a risk stratification tool to predict mortality, but I’ve had concerns about using the 4C score in the emergency department as it was derived only amongst patients admitted to hospital. From an EBM perspective it’s important that any tool is derived and then validated in the population in which it is intended to be used. In that respect PRIEST is a better approach to answering the ED clinician’s questions than ISARIC. However, what I would love to see (if possible) is a re-analysis of the patients in this study using the ISARIC 4C criteria.
It’s likely that the 4C score will produce similar results to PRIEST, and 4C arguably has the benefit of producing a more bespoke mortality risk for each score, but I would prefer that it was validated in an ED population before it is widely adopted. However, like much evidence in the COVID age, that check may not happen, and the score seems to be being adopted without ED validation. You can read and listen to more about our concerns that EBM principles have been lost in the references below.
You can read more about the ISARIC score on St Emlyn’s and on REBEL EM.
Other questions about this study.
The authors’ critique in the paper is reasonable. Data was obtained pragmatically from clinical records and reporting forms. The scores were not applied prospectively to patients, so we cannot really know what their impact might be in practice.
It’s also interesting to see that only a minority of patients actually tested positive for COVID-19. That may appear to be a problem for the study, but I suspect it’s less important than it first appears. At the time of the study rapid testing was unavailable, so the cohort pragmatically represents the sort of clinical decisions that were being made at the time. Similarly, the testing procedures were less than perfect, so false negatives and positives were likely. In that regard the tools were evaluated in the population in which they would be used, but as testing improves and near-patient testing allows us to identify patients who definitely have COVID-19, the performance of these scores will likely change.
Final thoughts.
This is a really useful study as it is the largest study of prediction tools in an ED population. As it stands, none of the tools assessed here is likely to be useful to the emergency clinician unless they are looking for a highly sensitive test at the expense of low specificity. Further work comparing these scores with the ISARIC 4C score, and in a more defined group of patients confirmed by rapid COVID-19 testing, would be useful.
References
- Prognostic accuracy of emergency department triage tools for adults with suspected COVID-19: The PRIEST observational cohort study https://www.medrxiv.org/content/10.1101/2020.09.02.20185892v1
- Dexamethasone, COVID-19 and the RECOVERY trial. St Emlyn’s https://www.stemlynsblog.org/dexamethasone-covid-19-and-the-recovery-trial-st-emlyns/
- The UK hibernated pandemic influenza research portfolio: triggered for COVID-19 https://www.thelancet.com/journals/laninf/article/PIIS1473-3099%2820%2930398-4/fulltext
- The ISARIC WHO Clinical Characterization Protocol: Risk Stratification of Patients Admitted to Hospital with COVID-19 https://rebelem.com/the-isaric-who-clinical-characterization-protocol-risk-stratification-of-patients-admitted-to-hospital-with-covid-19/
- ISARIC prediction tool https://isaric4c.net/outputs/4c_score/
- Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score https://www.bmj.com/content/370/bmj.m3339
- JC: ISARIC. Possibly The Best COVID-19 Risk Prediction Tool To Date https://www.stemlynsblog.org/jc-isaric-possibly-the-best-covid-19-risk-prediction-tool-to-date/
- Evidence-based medicine and COVID-19: what to believe and when to change https://emj.bmj.com/content/37/9/572
- SGEM XTRA: EBM AND THE CHANGINGMAN https://thesgem.com/2020/07/sgem-xtra-ebm-and-the-changingman/
Thanks for the great summary. I guess the main issue is, no matter how many patients are in a study, can you really glean much if you’re using an outcome of ‘death’ when the clinical decision may largely be ‘can I send this patient home?’ In the end, predicting a future outcome like death in a heterogeneous population (frail >90-year-olds vs 20-40-year-olds previously fit and well), even if they all have the same problem, is very difficult, and the same old variables come up (NEWS, age, co-morbidity/performance status) that are part of any assessment anyway.
Hi Luke and thanks for your interest.
We agreed that what EM practitioners would want to know is whether a patient could go home safely – which is why we picked death OR need for organ support (HDU or ICU level cardiac, respiratory or renal) as our outcome. You don’t want to discharge a patient who is/will become big sick and need critical care to survive. And this is a fundamental problem with using scores which predict 30-day mortality to say that patients don’t need to be in hospital…(but that’s a different soapbox).
Kirsty (DOI PRIEST team)
Thanks for the interesting summary. I wonder if the abstract in the blog is different to the study that is being appraised?
Double checked and it’s the right one. Nature abstracts are a bit odd in comparison to our usual fare.
So there are 2 PRIEST preprints of relevance here:
https://www.medrxiv.org/content/10.1101/2020.09.01.20185793v2 (assessing the performance of existing tools).
https://www.medrxiv.org/content/10.1101/2020.10.12.20209809v2 (developing & validating a de novo tool).
Both are worth a read (but I would say that).
Hello Simon,
Thank you for the review of the PRIEST study paper; it is well balanced.
Happy to say that the derivation/validation paper is now published in PLOS one: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0245840
Ben (PRIEST study manager)