JC: Do you see the light? Serum neurofilament light chain for prognostication following OOHCA

I’ve been meaning to do a blog on this paper for some time now. Paul Young, whose opinion I have a lot of time for, seemed clearly excited about its release. And in December at the ICSSOA 2018 Celia Bradford selected it as one of her top 5 neuro papers of the year. It also got a cheeky nod from Alasdair Proudfoot in the top 5 cardiovascular papers. Evidence enough that people are sitting up and taking notice.

This is an extraordinary piece of work which is well worth reading in full. It may prove transformative. https://t.co/xVVGIxPdWd
— Paul Young (@DogICUma) November 2, 2018

This attention is not surprising when the paper concludes that “(serum neurofilament light chain) performs better (at predicting neurologic outcome after cardiac arrest) than other biochemical, clinical, neuroimaging and electrophysiological methods.” This is quite the claim. And it is a claim regarding an area that we all struggle with. Neuroprognostication following cardiac arrest has changed over recent years; we all want to be clear about who has any chance of a good outcome, and who doesn’t, but this is not black and white. What does a ‘good outcome’ mean to you, and what does it mean to your patient? What does a ‘trial of therapy’ mean? What duration of time is enough to be confident in your prognosis of a poor outcome, in those failing to improve? And are you sure this is a good use of the bed/staff/morale/equipment when the chances of survival are down to the low single digits?

Follow this link to the paper

Any paper like this which seems to offer us a dichotomous objective test to inform this type of tricky decision making is always going to be very attractive to clinicians. Our moral compass can point straight and true if we withdraw life sustaining treatment because the test told us to do it. As such, we have an immediate issue that we want this test to be a good one. That affects the way we read the paper. There is a risk of focussing effect, anchoring and/or confirmation bias.

Luckily St Emlyns is on hand to break it down for you. Although let’s remember, as a neurointensivist, I am likely to be biased in my appraisal. But I’ll try not to be.

As always, we would recommend downloading this paper and reading it in full. I don’t think this is FOAMed in the classic sense, but this really is one worth the hassle of dropping your librarian an email. Other summaries can be found here if you already think I’ve been going on for too long.

Tell me about the paper.

This is essentially an international multicentre prospective observational cohort study of 717 patients presenting with ROSC after cardiac arrest, who had serum biomarkers collected at several intervals and underwent protocolised neuroprognostication if still comatose at 108 hours post arrest. The authors then describe follow up data at 6 months using the Cerebral Performance Category (CPC) scale, dichotomised into good (CPC 1 or 2) and bad (CPC 3, 4 or 5) outcomes. Prognostic test characteristics are then described, such as the sensitivity, specificity and PPV of this test to predict a bad outcome.

What on earth is serum neurofilament light chain (NFL)?

That’s a good question. Took me right back. Light chains are proteins produced by plasma cells. NFL appears to be a major protein component of the neuroaxonal cytoskeleton, providing structural support for axons and regulating diameter, which influences conduction velocity. In axonal damage, these proteins are released into CSF and subsequently peripheral blood. Early pilot work started looking at NFL levels in CSF through lumbar puncture. This current work centres on the theory that serial LPs are inconvenient and introduce risk, therefore serum NFL could offer a more practical and reproducible way of assessing axonal injury after cardiac arrest.

It is worth highlighting that NFL appears to be a general biomarker for axonal injury – work is ongoing to look at clinical relevance in multiple other disease states, including multiple sclerosis. A troponin of the brain, if you will. And troponin, as we know, can be released in a variety of myocardial injury states depending on the underlying pathological process.

Is this a new study?

No. Let’s be clear about that from the outset. This is a subset of patients from the previously published randomised controlled TTM trial, who had serum samples sent to a biobank. These patients were recruited between 2010 and 2013. With 6 months follow up, one wonders why it took 4 years to write up and publish these data. The TTM lead authors are on the paper, so clearly have approved the analysis. I couldn’t find any mention of this study in the prespecified analysis plan for the TTM trial, but that may be an issue to do with tech development?

Who were the patients?

TTM recruited out of hospital cardiac arrest patients who remained unconscious on arrival to hospital. Their specific inclusion and exclusion criteria can be found here – it is of specific note that TTM excluded the following patients:

1. Obvious or suspected pregnancy

2. Known bleeding diathesis (medically induced coagulopathy (e.g. warfarin, clopidogrel) does not exclude the patient).

3. Suspected or confirmed acute intracranial bleeding

4. Suspected or confirmed acute stroke

5. Unwitnessed cardiac arrest with initial rhythm asystole

6. Known limitations in therapy and Do Not Resuscitate-order

7. Known disease making 180 days survival unlikely

8. Known pre-arrest Cerebral Performance Category 3 or 4

9. >4 hours (240 minutes) from ROSC to screening

10. Temp <30C on admission

11. Systolic blood pressure < 80 mm Hg in spite of fluid loading/vasopressor and/or inotropic medication/intra aortic balloon pump

As such, any results from this subgroup analysis are not generalisable to the populations listed above. This is a particularly depressing point for those of us working in neuro centres, as it seems that anyone with a significant brain injury of any kind would have been excluded. As such, we don’t have any idea about what serum NFL tells us in these situations.

360 patients (50.2%) had a poor neurologic outcome at 6 months as per their prespecified definition.

What did they do?

All patients recruited to the biobank arm of the study were supposed to have serum NFL samples sent at 24, 48 and 72h post ROSC. Of 963 recruited to TTM, 819 were engaged in the biobank arm, 782 patients were deemed eligible for this study and then for some reason only 717 samples were available for NFL analysis. The authors report 65 exclusions for sample problems or missing outcome data, but don’t really tell us what happened to the other 181 patients.
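The attrition is easier to see as plain arithmetic. This is only a sketch using the four figures quoted above; the variable names are mine and nothing here comes from the paper's own flow diagram.

```python
# Patient flow using only the figures quoted in the text above.
recruited_to_ttm = 963
in_biobank_arm = 819
eligible_for_study = 782
analysed_for_nfl = 717

# Drop-off at each step of the funnel.
not_in_biobank = recruited_to_ttm - in_biobank_arm
biobank_but_not_eligible = in_biobank_arm - eligible_for_study
excluded_after_eligibility = eligible_for_study - analysed_for_nfl

# The gap between TTM recruitment and the eligible group.
lost_before_eligibility = not_in_biobank + biobank_but_not_eligible

percent_analysed = 100 * analysed_for_nfl / recruited_to_ttm

print(f"Lost before eligibility: {lost_before_eligibility}")
print(f"Excluded after eligibility: {excluded_after_eligibility}")
print(f"Analysed as % of TTM recruits: {percent_analysed:.1f}")
```

So the 65 exclusions are accounted for, the 181 are the patients who never reached the eligible group, and roughly three quarters of the original trial population made it into the NFL analysis.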

They compared different levels of serum NFL against 6 month outcome as planned, and then also against other tools used for prognostication. These other tools included currently available biomarkers, neuroimaging, electroencephalography and somatosensory evoked potentials.

Outcome was determined by CPC score as noted, but it is worth considering the neuroprognostication protocol within the TTM trial so we can be clear about how this was done. Essentially, this supported initial targeted temperature management with mandatory sedation (at either 33 or 36 degrees C depending on randomisation) for 28 hours, followed by cautious rewarming at 0.5 degrees C per hour until back to 37, tapering/cessation of sedation at 36h and then fever control (<37.5 degrees C) for a total of 72h post ROSC. At this stage, a blinded physician performed a clinical assessment of those patients who remained unconscious and a decision on WLST was taken clinically, but supported by a variety of criteria as listed in the supplementary appendix. Ongoing care was at the discretion of the clinical treating team.

How did they plan to analyse the data?

The authors wanted to look at specificity primarily, with the rationale that a false positive test in this situation (a raised biomarker predicting futility when actually the patient might have survived) would be the worst possible result. In other words, they wanted to rule in a poor outcome (SpIN) – ‘if the biomarker is high, you are likely to die. If the biomarker is low, then onwards we go’. I suppose this is fair enough, but then they decided to report cut points that would provide specificities between 95% and 100%. As soon as we drop from 100% here, we are starting to see false positives. Bearing in mind that this test is taken early on in the disease process, this makes me anxious.

There was also no sample size calculation and there was lots of talk of comparison and regression models in this section. However, we should bear in mind that these analyses could only be performed in patients who had the additional tests (which was always clinician directed and not 100%). So the additional analyses are only hypothesis generating really.

What were the results?

First off, not everyone had the relevant serum NFL levels and so data is reported on all patients but with >5% of cases missing 2 timepoint measurements. But the first headline is that serum NFL was significantly higher at all three intervals in those patients with a poor outcome, compared to those with a good one. And by some stretch; well into 4 digits in the poor outcome group, compared to 2 in the good. This significant effect remained after adjustment for target temperature, age and sex. The rise in serum NFL also seemed to correlate with the severity of outcome, such that a higher CPC score was associated with a higher NFL. Face validity – check.

The authors then looked at prognostic performance and report an area under the receiver operating characteristic curve (AUC) of 0.94. This is very high indeed and implies an excellent diagnostic test. These data were compared to other recognised and established biomarkers and other well used tests in neuroprognostication such as neuroimaging, somatosensory evoked potentials and clinical examination of brainstem reflexes. Their headline was of serum NFL outperforming all other investigations – the AUC values knock the other biomarkers out of the park, and NFL appeared to have improved sensitivity over all other adjunctive tests. For the latter comparisons they used matched specificity cut points, which is a little confusing – this implies that they were selecting bespoke cut points to match the specificity of CT for example, then reporting on differences in other test characteristics. This is a bit naughty – I think they could instead have chosen a suggested cut point and then compared its specificity with that of the other tests.

The authors also looked at the additional value of serum NFL when added to features such as clinical information and bedside neurological testing. The AUC values increased quite remarkably but it was a little unclear as to the timepoints of these assessments.

Wow – sounds too good to be true?

Well, these are certainly exciting results. Table 2 in particular highlights potential cut off values mapped to diagnostic test characteristic data and reveals just how useful this test could potentially be. For instance, at 24 hours after ROSC, a cut off level of 12,317 pg/mL for serum NFL resulted in 100% specificity for prediction of a poor outcome, with 0 false positive results in 693 patients. In this large cohort, this corresponds to the families of 178 patients being informed within 24h that the outcome looks poor and 0 families being told this in error. Of the remaining 500 odd patients, 164 with an NFL lower than the cut point go on to have a bad outcome, but 351 with a low value have a good outcome. At 48 and 72 hours the sensitivity just keeps getting better when the authors maintain a cut point producing 100% specificity.
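If you want to check the 24 hour figures yourself, they drop neatly into a 2x2 table. The sketch below uses only the counts quoted in the paragraph above (178, 0, 164 and 351); the `TwoByTwo` class is my own illustration and not anything from the paper. 'Test positive' here means an NFL above the cut off, and 'condition positive' means a poor outcome at 6 months.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for a dichotomous test against a dichotomous outcome."""
    tp: int  # NFL above cut off, poor outcome
    fp: int  # NFL above cut off, good outcome
    fn: int  # NFL below cut off, poor outcome
    tn: int  # NFL below cut off, good outcome

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Counts at 24 hours as quoted in the text.
table_24h = TwoByTwo(tp=178, fp=0, fn=164, tn=351)

print(f"n = {table_24h.total}")
print(f"Sensitivity = {table_24h.sensitivity:.2f}")
print(f"Specificity = {table_24h.specificity:.2f}")
print(f"PPV = {table_24h.ppv:.2f}")
print(f"NPV = {table_24h.npv:.2f}")
```

The point this makes is that perfect specificity at 24 hours comes with a sensitivity of only around a half: a high value rules in a poor outcome, but a value below the cut off is far from reassuring.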

When they look at cut points with any less than 100% specificity, then of course you start to get false positives. Even then however, the figures remain interesting. If you enjoy likelihood ratios, then you can convert the figures provided in table 2 – for example, serum NFL at 24h with a cut point of 641 pg/mL and specificity of 99% would correspond to a positive likelihood ratio of >50 and a negative likelihood ratio of <0.5. These really are excellent likelihood ratios for a prognostic test.
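The conversion itself is just two divisions. Here is a minimal sketch: the 99% specificity is the figure quoted above, but the sensitivity of 0.60 is a purely hypothetical value I have chosen for illustration, since the matching sensitivity is not quoted here. The function name is also my own.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test.

    LR+ is undefined when specificity is exactly 1, so that case is rejected.
    """
    if not 0 < specificity < 1:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


specificity = 0.99          # quoted in the text
assumed_sensitivity = 0.60  # hypothetical, for illustration only

lr_pos, lr_neg = likelihood_ratios(assumed_sensitivity, specificity)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Worth noting that at 99% specificity, any sensitivity just over 50% is enough to give a positive likelihood ratio above 50 and a negative likelihood ratio below 0.5, which is why the figures above look the way they do.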

There must be a kicker though?

There is. Well there are 2 really.

Firstly, this ‘test’ is one of many in critical care which has a high risk of being a self fulfilling prophecy. Although a specificity of 99% would usually be deemed excellent, what that means in this cohort is that if we rely on the test individually, we will tell 4 families (out of 693) that their loved one has little chance of a good outcome incorrectly. This will often lead to discussions on futility, preference and withdrawal of life sustaining treatment. Thus, we have very little opportunity to correct our mistake. These patients do not re-present, or ask for second opinions. And worse still, we will never know that we made a mistake.
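For anyone wondering where the 4 comes from, it is a back of the envelope calculation. The sketch below assumes the 351 good outcome patients in the 24 hour cohort of 693 (the figure quoted earlier in this post) and applies the 99% specificity to them; the variable names are mine.

```python
# False positives can only arise among patients who go on to have a good outcome.
good_outcome_patients = 351  # from the 24 hour cohort of 693 quoted earlier
specificity = 0.99

expected_false_positives = (1 - specificity) * good_outcome_patients
families_misinformed = round(expected_false_positives)

print(f"Expected false positives: {expected_false_positives:.2f}")
print(f"Families told of a poor outcome in error: {families_misinformed}")
```

About 3.5 patients, which in the real world means 4 families.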

Second, the authors of this paper do not propose a cut point for NFL. Their data presents prognostic information based on the outcomes from this dataset, but at no point do they validate this externally or suggest a figure that could be used in clinical practice. The implication is that a really high NFL is a bad prognostic sign. But how high is really high? How high does it have to be before you are certain of futility?

Any other concerns?

Yes, a few minor ones. We have already mentioned the decline in participants from TTM recruitment, to biobanking, to analysis. This study is presented under the banner of the TTM trial, but it is not the same cohort overall. Are the results therefore generalisable?

In addition, 2 of the lead authors are cofounders of the tech. We should be thinking about this like pharmaceutical research really. Often industry gets away with minimal declarations if it is not drug related. However, the same biases and concerns exist. Although this study is presented clearly and methodically, the authors have pecuniary interest in the success of the biomarker.

Lastly, the powering is interesting here. They could have powered this on test characteristic data. They chose instead to try and derive a cut point. There is also a bit of logistic regression in there, some bootstrapping and mention of AUROC data. This is a relatively small group of patients to do all this on, which screams the need for further independent validation studies in my eyes.

And the take home?

Like everyone else, I wanted this to be the test that absolves me of personal accountability and the need for careful examination, ancillary testing and balanced judgement in this cohort. I do not think it is that test.

Personally I think the authors overstate the conclusions. Although this test appears exciting, we need to be absolutely explicit about the fact it has no role at present for anyone with any kind of brain injury, and that the authors have not as yet externally validated the results or defined a cut point for practical clinical use.

Certainly, if these results are validated then in an uncomplicated post cardiac arrest patient, high serum NFL values at 24/48/72 hours will add to my clinical impression of a likely poor outcome. But I will certainly use this only alongside my standard ESICM endorsed pathway of multimodality prognostication. And I will be careful in my discussions with families.

If we think there is any chance whatsoever of functional recovery in these patients, then we owe it to them to be cautious, objective and gentle in our prognostication. Often this involves several days anyway, as care is delivered, sedation cleared, clinical examination repeated and end of life discussions commenced. How much does a biomarker add to this pathway? Potentially a reasonable amount of supporting information. But I do not think this is an overt game changer and I would never rely on a biomarker in isolation for these patients.

Maybe that’s just me. What about you?

Cheers

Dan @ RCEMProf


Peter Tagmose Thomsen

March 27, 2019 at 7:14 am

Thanks for the post!

Though it’s not typical FOAMed, this area has great relevance when you are standing in the CPR-scenarios

I’ve done a systematic review (unpublished) of the literature on this in 2016-2017, and I agree with you. This is not a test that will make all other tests redundant, as several studies on e.g. SSEP and certain EEG patterns have shown similar FPR values (more or less consistently)

The most thorough review in the TTM era – the Sandroni et al review from 2014 (which was adapted for ERC 2015) – stated that for a single test to be consistently “good enough”, 5 studies from 3 different teams with FPR 95% CI <10 must be done (in my opinion this has not happened yet – and I say opinion, as the studies are quite heterogeneous and therefore hard to interpret). In the ERC version they go on to say "low 95% CI" instead.

With an FPR of 2% (1 - specificity), this is at the better end of the spectrum of tests, but not better than SSEP and EEG (which also in some studies have similar FPRs)
The conclusion is therefore the same – as you state. This test goes into the pile of multimodality as recommended by the ERC and ILCOR. However, it looks like this biomarker might beat NSE and S100B.

Sadly I do not have access to this paper, but I would be curious as to what they have tested it against as the gold standard. The problem with self-fulfilling prophecy is (one of) the main problems in these studies

I don't think there will ever be a single test with low enough FPR – and as you mention, it will always be a conversation with the families

All the best
Peter Tagmose Thomsen, Doctor in Emergency Medicine / Neurology, Denmark/Sweden