I often use information from diagnostic tools to help me with my research. As with any measure, I need to consider how well these tools perform when making sense of the results and their implications. But how do we measure the performance of a diagnostic tool? Usually we focus on trying to answer two questions: does the tool measure what it was intended to measure (validity), and does it do so consistently (reliability)? In this post I hope to give you a bit of a crash course in some of the ways we evaluate the validity of diagnostic tools in psychology.
Indicators of validity
Typically, when we evaluate the validity of a diagnostic tool we examine its sensitivity, specificity, positive predictive value and negative predictive value. In plain English, we look to see whether the test accurately identifies the people who have the construct of interest, that is, the thing we are interested in measuring (sensitivity), and accurately identifies the people who do not have it (specificity). But it’s not enough just to consider the sensitivity and specificity of a diagnostic tool; we also try to consider the likelihood that someone classified as having the construct of interest really does have it (positive predictive value), and that someone classified as not having it really does not (negative predictive value). These concepts may appear deceptively similar. In fact, having had some difficulty articulating how sensitivity and specificity differ from positive and negative predictive value myself, I realised that I needed to revisit these concepts, and so lucky you get to read about it!
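If you prefer formulas, the four indicators can be written in terms of the four possible outcomes of a test. The symbols TP, FN, FP and TN (true positives, false negatives, false positives and true negatives) are shorthand I am introducing here for illustration:

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN} &
\text{Specificity} &= \frac{TN}{TN + FP} \\[6pt]
\text{Positive predictive value} &= \frac{TP}{TP + FP} &
\text{Negative predictive value} &= \frac{TN}{TN + FN}
\end{align*}
```

Notice that sensitivity and specificity divide by the number of people who truly have (or do not have) the condition, while the predictive values divide by the number of people the test classified one way or the other.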
So what do sensitivity, specificity and negative and positive predictive value actually tell us? Sensitivity and specificity tell us about how a test performs. In the context of autism diagnosis, they give some indication of how useful a diagnostic tool might be in helping clinicians identify whether someone has an autism spectrum disorder, that is, how good the tool is at ‘capturing’ the people with autism and ‘capturing’ the people without autism. In contrast, negative and positive predictive value inform us about the likely proportion of correct classifications, that is, the likelihood that someone the tool classifies as having (or not having) an autism spectrum disorder truly does (or does not) have one. Still confused? Bear with me. Positive and negative predictive values tell us about the likelihood that a diagnostic outcome is correct by taking prevalence into account. Prevalence can reflect the proportion of people with the disorder in the study sample or in the broader community. For instance, in Australia, it is estimated that 1 in 160 children aged between 6 and 12 years have an autism spectrum disorder (Australian Advisory Board on Autism Spectrum Disorders, 2007).
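To see why prevalence matters so much for the predictive values, here is a minimal Python sketch. The function name `predictive_values` and the 80% sensitivity and specificity figures are made up purely for illustration; only the 1 in 160 prevalence comes from the estimate quoted above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test used in a population with the given prevalence."""
    # Expected share of the population falling in each of the four outcomes.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical tool with 80% sensitivity and 80% specificity (illustrative only),
# used in the broader community where prevalence is about 1 in 160.
ppv_community, npv_community = predictive_values(0.80, 0.80, 1 / 160)

# The same hypothetical tool in a study sample where half the people have the disorder.
ppv_sample, npv_sample = predictive_values(0.80, 0.80, 0.50)

print(f"Community (1 in 160): PPV = {ppv_community:.1%}, NPV = {npv_community:.1%}")
print(f"Sample (1 in 2):      PPV = {ppv_sample:.1%}, NPV = {npv_sample:.1%}")
```

With these made-up figures, the positive predictive value is 80% in the balanced sample but only roughly 2.5% in the community, even though the sensitivity and specificity of the tool have not changed at all.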
Putting it into practice
Hopefully this worked example will tie all these together and give you an idea about how they can be applied. Imagine for me that 50 people have volunteered to help me with my research and that each person has been administered a diagnostic measure. I know that 25 of these people actually have an autism spectrum disorder. The table below displays the results of this study. The ‘test result’ rows reflect the diagnostic outcome supported by the measure, while the ‘diagnostic status’ columns indicate the true diagnosis of participants.
                          Diagnostic status: autism    Diagnostic status: no autism
Test result: autism                  17                             10
Test result: no autism                8                             15
To calculate sensitivity, we take the number of people correctly classified as having autism (17) and divide it by the total number of people who have autism (17 + 8). Similarly, to calculate specificity, we take the number of people correctly classified as not having autism (15) and divide it by the total number of people who do not have autism (10 + 15). To get a balanced picture, we can also calculate the tool’s positive and negative predictive values. To calculate positive predictive value, we take the number of people correctly classified as having autism (17) and divide it by the sum of correct and incorrect classifications of autism (17 + 10). Likewise, to calculate the negative predictive value, we take the number of people correctly classified as not having autism (15) and divide it by the sum of correct and incorrect classifications of not having autism (15 + 8).
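If you would rather let a computer do the arithmetic, the same four calculations can be sketched in a few lines of Python. The counts are the ones from the worked example; the variable names are just my own labels.

```python
# Counts from the worked example: 50 volunteers, 25 of whom have autism.
true_positives = 17   # have autism, test says autism
false_negatives = 8   # have autism, test says no autism
false_positives = 10  # do not have autism, test says autism
true_negatives = 15   # do not have autism, test says no autism

# Sensitivity and specificity divide by the true diagnostic status totals.
sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

# Predictive values divide by the test result totals.
ppv = true_positives / (true_positives + false_positives)
npv = true_negatives / (true_negatives + false_negatives)

for name, value in [
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Positive predictive value", ppv),
    ("Negative predictive value", npv),
]:
    print(f"{name}: {value:.1%}")
```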
These calculations leave us with the following results:
Sensitivity = 17 / 25 = 68%
Specificity = 15 / 25 = 60%
Positive predictive value = 17 / 27 ≈ 63%
Negative predictive value = 15 / 23 ≈ 65%
We now have a lot of stats, but what do they mean? The sensitivity and specificity of the tool indicate that it is better at recognising people with autism than people without autism, i.e. it ‘missed’ fewer people with autism (8 of 25) than people without autism (10 of 25). However, the positive and negative predictive values indicate that, although the tool misses some of the people who do not have autism, a result of ‘no autism’ is correct slightly more often than a result of ‘autism’. To put this another way, using this diagnostic tool results in slightly more false positives (10) than false negatives (8). Ideally, we want a diagnostic tool with adequate sensitivity, specificity, positive predictive value and negative predictive value. Benchmarks for what counts as adequate vary by field, because the implications of false negatives and false positives also differ. For example, Glascoe (2005) recommends that when screening for developmental disorders we should aim to use tools with a sensitivity between 70% and 80% and a specificity of 80% or more.
There are many other aspects to consider when establishing the validity of a diagnostic tool, and other aspects of its performance too. However, hopefully this post has given you some insight into the process. If you think that I have made an error in my explanations, by all means let me know!