JC: Pain Scales in the Paediatric ED

I will confess a geeky interest in paediatric pain management (I’ve spoken and written about it before at St Emlyns), so when I spotted this paper published online ahead of print a few months ago I flagged it for a closer look. When I finally got around to diving into the article properly I was a little surprised by what I read there – but don’t take my word for it, head on over to the EMJ¹ and check the paper out for yourself.

Pain Scales? What is this paper about?

Measuring paediatric pain is pretty tricky. Pain is a subjective and multifactorial experience, particularly for children for whom anxiety, fear, tiredness, hunger and other non-nociceptive factors may have a profound influence. Part of our role in the Emergency Department is to provide symptomatic relief for our patients alongside diagnosis (or at least exclusion of serious and significant pathology) and it can be helpful to measure how well we are relieving pain so we can determine the need for additional interventions. However, many adults struggle to give their pain a number – not unreasonably – so there are a few alternatives used commonly for the sequential assessment of pain in children.

The Royal College of Emergency Medicine has a paediatric pain best practice guideline² and it is the pain scale used therein that is central to the premise of this publication.

The authors looked at a convenience sample of children presenting to a single ED with an upper limb injury (who had not received any analgesia prior to attendance), then measured a single point pain score in a variety of different ways by different parties (treating doctor, triage nurse and the patient themselves) and looked for agreement in the pain scales. This is definitely a relevant research question because we can easily assume that being able to accurately assess the severity of patients’ pain will make us better at treating it – no?

So what did they actually do?

The manuscript is quite short so you get the feeling there are a few details the authors could have shared with us to make it a little clearer, but it sounds like essentially the patients were divided into two groups (age 0-8 and 8-16) and pain scores were obtained.

For the 0-8 year olds, both doctor and nurse provided pain scores using the Wong-Baker FACES Score and the Behavioural Pattern Score, both of which you can see in the RCEM guideline above. The child did not provide a pain assessment – more on this later.

For the older age group, doctor and nurse scored the patient’s pain using the behaviour score while the child used the FACES Score and ladder score (i.e. a number from 0-10). The authors then compared the healthcare practitioners’ scores to one another and to the child’s scores.

What did they find?

For the under 8s, there was reasonable agreement between the doctor and nurse.

For the FACES Score, there was 84.6% agreement, giving a Kappa score of 0.778. As you will no doubt remember from that wonderful FRCEM critical appraisal exam (!) the Kappa score, or Cohen’s Kappa, is a measure of interobserver reliability for a particular observation that corrects the raw percentage agreement for the agreement you would expect by chance alone. It can technically range from -1 to 1, but in practice it usually falls between 0 (no better than chance) and 1 (perfect agreement) (more on Kappa here³). It is important in studies where there are several researchers measuring the same thing; it is used to ensure that if one researcher writes “5”, the value would be considered by other researchers to be “5” – if there is significant disagreement this should prompt consideration of an alternative observation. Kappa is usually used as part of a good quality study to demonstrate that the measurements used in statistical analysis are reliable, so it is quite unusual to see it as an outcome measure as it is used here.
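If you want to see how percentage agreement turns into a Kappa value, here is a minimal Python sketch. The ten doctor and nurse scores are invented for illustration and are not data from the paper; the function name `cohens_kappa` and the four category labels are my own assumptions too.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two raters' category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same, non-empty set of patients")
    n = len(rater_a)

    # Raw proportion of patients on whom the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from how often each rater used each category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category throughout: kappa is undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical scores for ten children (NOT taken from the paper).
doctor = ["mild", "mild", "moderate", "severe", "moderate",
          "mild", "none", "moderate", "severe", "mild"]
nurse = ["mild", "moderate", "moderate", "severe", "moderate",
         "mild", "none", "mild", "severe", "mild"]

observed, kappa = cohens_kappa(doctor, nurse)
print(f"agreement {observed:.1%}, kappa {kappa:.3f}")  # agreement 80.0%, kappa 0.714
```

In this made-up example the raters agree on 8 of 10 children, but because they would be expected to agree on 30% by chance alone, the Kappa (0.714) is lower than the raw 80% agreement suggests. This is the plain unweighted Kappa; I don't know from the manuscript whether the authors used a weighted version.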

For the Behaviour Score, the agreement wasn’t as good – 69.2% with a Kappa score of 0.573. This is suggested to represent “moderate” agreement between observers but it does also suggest a significant amount of disagreement. Worth thinking about.

Then in the over 8s group, the doctor and nurse continue to show good agreement with each other – 82.4% with a Kappa score of 0.729. That’s “substantial”, apparently. However, neither the doctor’s nor the nurse’s score agreed anywhere near as well with the child’s score, irrespective of whether the child was using the ladder or FACES Score. The best agreement was between the doctor and the child using the FACES Score – 59.3% (Kappa 0.385, “fair”) and the worst between nurse and child using the ladder score – 40.7% (Kappa 0.182, “none to slight”). Child ladder score comparisons with doctors and nurses were both pretty rubbish, so the authors suggest that the FACES Score is more reliable and that the ladder score can be omitted. This strikes me as a little odd. Were the children scoring their pain in different categories when they used the FACES Score and the ladder – as in, was a child saying 2/10 severity (mild pain) but selecting the “moderate pain” face? Were the children demonstrating poor intraobserver reliability (isn’t that the only way these results could have occurred?), and if so, doesn’t that have significant ramifications for the validity of this paper as a whole?
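The descriptive labels quoted above (“moderate”, “substantial”, “fair”, “none to slight”) appear to follow the conventional interpretation bands summarised in the McHugh paper³. As a rough sketch, assuming those bands, the Kappa values reported in this post map onto labels as follows. The function name `interpret_kappa` and the dictionary keys are my own, for illustration only.

```python
def interpret_kappa(kappa):
    """Map a kappa value onto the conventional descriptive bands (assumed cut-offs)."""
    if kappa <= 0:
        return "no agreement"
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values as quoted in this post.
reported = {
    "under 8s, doctor vs nurse, FACES": 0.778,
    "under 8s, doctor vs nurse, behaviour": 0.573,
    "over 8s, doctor vs nurse, behaviour": 0.729,
    "over 8s, doctor vs child, FACES": 0.385,
    "over 8s, nurse vs child, ladder": 0.182,
}

for comparison, kappa in reported.items():
    print(f"{comparison}: {kappa} ({interpret_kappa(kappa)})")
```

These cut-offs are a convention rather than a law of nature; McHugh argues that they are too lenient for healthcare research, which is worth bearing in mind when a Kappa of 0.573 is described as “moderate”.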

Issues and challenges with this paper

First off, I was perplexed by the omission of pain scores from the 0-8 group. Of course pain scoring requires an element of abstract thinking, something which most preschool children will be unable to do (children are very literal in this age group) and obviously non-verbal children can’t contribute a score – but 8 years old seems like a strange cutoff: I have certainly asked younger children than 8 to rate their pain and I haven’t been concerned about the reliability of their assessment.

Then there’s a frustrating lack of information about the direction of difference between child and healthcare professional scores. I don’t quite read the results in the same way as the authors do – I read that professionals tend to agree on their assessments of a child’s pain but the child’s view is often different. I really want to know whether children are scoring their pain consistently higher or lower than healthcare professionals, or if the scores are all over the place.
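To make that concrete, this is the sort of summary I would have liked to see. It is a minimal sketch using invented paired scores, with pain categories coded 0 = none, 1 = mild, 2 = moderate, 3 = severe (my own coding, not the paper’s), and a hypothetical helper called `direction_of_disagreement`.

```python
from statistics import mean


def direction_of_disagreement(child_scores, clinician_scores):
    """Summarise whether children score higher or lower than the clinician."""
    if len(child_scores) != len(clinician_scores) or not child_scores:
        raise ValueError("Scores must be paired and non-empty")
    # Positive difference: the child rated their pain higher than the clinician did.
    diffs = [child - clinician for child, clinician in zip(child_scores, clinician_scores)]
    return {
        "mean_difference": mean(diffs),
        "child_higher": sum(d > 0 for d in diffs),
        "equal": sum(d == 0 for d in diffs),
        "child_lower": sum(d < 0 for d in diffs),
    }


# Hypothetical paired category scores for eight children (NOT from the paper).
child = [2, 1, 3, 1, 2, 3, 1, 2]
clinician = [1, 1, 2, 1, 1, 2, 2, 1]

summary = direction_of_disagreement(child, clinician)
print(summary)
```

In this invented example the children score higher than the clinician five times, the same twice and lower once, which would point to systematic under-scoring by the clinician rather than random scatter. Kappa alone cannot distinguish those two situations.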

Without this information, I don’t think I can agree with the authors’ conclusion that the ladder scale – only provided by patients themselves – is unreliable. This is suggesting that children themselves are less reliable at assessing their pain than people who are not experiencing it! Can you imagine suggesting the same of adult patients? I have a feeling some of my patients would tell you where to go if you tried to suggest your assessment of their pain was more accurate than their own. There’s a certain paternalistic arrogance about the dismissal of patient pain scores as “unreliable” that makes alarm bells ring in my head.

This might read as a criticism of the authors – if it does I must immediately apologise as that is certainly not my intention. But I have a grave concern that this simply adds to a culture of excluding children from decisions about their own care; disempowering them and removing their voice from conversations about the healthcare they are receiving. There’s good evidence we do a lot of that already – and also good evidence that children often want to be included in consultations and decisions made about their care. There’s a lot more I can say on this particular topic but much of it will be feeding into my talk at the smaccMINI workshop in Berlin so I’m going to leave this rant here – more to come in later blog posts though.

Thirdly, this was a single snapshot of pain scores – and that is important because as much as an initial score might guide the nature of treatment we reach for, we are particularly interested in responses to treatment; that is, how much a particular intervention reduces the pain score, if at all. This paper does not address that and I can’t help but wonder if a post-intervention score (to give a treatment effect measurement) would be a more useful measure to us – would doctor, nurse and patient all agree that pain had improved, even if the starting point was different?

Lastly, my gut feeling is to ask how we know a test is validated. It has to be reliable (with both inter-rater and intra-rater agreement) and valid. This paper is only showing us half of one of those markers, so it’s a bit much to completely write off the assessment.

Also, it would be nice to know that the researchers were trained in the assessments, especially the two doctors participating, as they are the least likely people to be using the scales on a regular basis (that may be a bit cynical of me). It would also be good to see the overall inter-rater agreement, as it’s unclear why it was split between the nurses and the doctors.

So what’s the bottom line?

While doctors and nurses seem to reach reasonable agreement on pain scores using the FACES Score for under 8s and the behaviour score for over 8s, I’m really not sure what the value of these scores is at all when they seem to disagree with what the patients themselves think. Lots more questions coming out of this paper for me than answers.

1. James F, Edwards R, James N, Dyer R, Goodwin V. The Royal College of Emergency Medicine composite pain scale for children: level of inter-rater agreement. Emerg Med J. March 2017:emermed-2015-205517. doi:10.1136/emermed-2015-205517

2. The College of Emergency Medicine. Management of Paediatric Pain – Best Practice Guideline. Royal College of Emergency Medicine. https://secure.rcem.ac.uk/code/document.asp?ID=4682. Published July 2013. Accessed May 25, 2017.

3. McHugh M. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276-282.