Category Archives: statistics

Data analysis: Discrimination indices vs chi-squares and principal components vs factor analyses

*This post has been gathering dust for the better part of a month, so I figured I should send it on its way.

I’m more of a words person than a numbers person. Show me a graph and you’ve lost me. Explain it in words, however, and I’m much more likely to understand. Right now I’m working on answering two questions in my research: which data analyses would be best, and how do I actually run them? Given that words are more my thing, I’ve decided to blog about figuring this out. What you’re in for with this post, then, is a little bit about the process of research, how we go about answering these sorts of questions, and some information about statistical techniques.

Decision making

Depending on your background, you may not have spent much time thinking about what researchers do with the data they collect. You may even be wondering why I don’t already know which analyses I will run or how to run them. Well, to use the ‘technical term,’ the number of number crunching techniques available to analyse my data is mind-boggling! Newbie researchers tend to have a general understanding of a range of techniques but are most familiar with the handful of analyses they typically use. When you cast the net further to the statistical gurus and established researchers, the number of analyses they are familiar with grows dramatically. I’m definitely in the newbie researcher category. Looking into what other people have done and chatting to my supervisors has already highlighted a few techniques that might be better than those I’d originally planned to use, and the need to refresh and familiarise myself with these techniques before diving in. So here goes…


If this post is going to be at all intelligible, you need to meet my data. I have dichotomous categorical data for two groups of participants. In English, this means that for each participant in each group, I have information about whether they’ve endorsed each item from various measures. Still with me? Good.

What I’m interested in is, first, identifying the items that are endorsed more by one of the groups and, second, what these ‘good’ items might be tapping into. So in statistical terms, I need to work out the proportion of people in each group who have endorsed each item, and the inter-relationships between these items.

Investigating proportions: Discrimination indices vs. Chi-squares

Several analyses handle proportions well, including discrimination indices and chi-square analyses. Chances are, you haven’t heard of discrimination indices before, so I’ll start with those.

Discrimination indices represent proportional differences in item endorsement between groups. That is, a discrimination index represents the proportion of people in Group B who voted yes rather than no for a particular item, subtracted from the proportion of people in Group A who did the same. We interpret these indices by looking at their size and polarity:

  • Small indices suggest that the proportion of people who endorsed the item was comparable between groups, while large indices indicate that one group endorsed the item more than the other.
  • Positive indices tell us that Group A endorsed the item more, while negative indices tell us that Group B endorsed the item more.

Here’s a worked example to make this clearer:

I have 10 participants in each of Groups A and B. Each participant is asked ‘Do you like mint choc-chip ice cream?’ Eight people in Group A have very good taste and say ‘yes,’ but only three people in Group B say ‘yes’.

To calculate how good the mint choc chip question is in discriminating between the two groups I subtract the proportion of people who like this flavour in Group B  (.30) from the proportion of people who like this flavour in Group A (.80) which gives me a  discrimination index of .5.

The positive polarity of the index tells me that Group A endorsed the item more than Group B, but does a score of .5 mean this is a good item? The general consensus in the literature (Allison, Auyeung, & Baron-Cohen, 2012; Gillis, Callahan, & Romanczyk, 2011) is that good items have indices from .3 to .7. So, it seems that mint choc-chip ice cream preference is good at discriminating between Groups A and B.
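The arithmetic in the ice cream example can be sketched in a few lines of Python. This is only an illustration: the function name `discrimination_index` and the `in_recommended_range` check are my own labels, not something taken from the cited papers, and the counts are the made-up ones from the example above.

```python
def discrimination_index(yes_a, n_a, yes_b, n_b):
    """Proportion endorsing the item in Group A minus the proportion in Group B."""
    return yes_a / n_a - yes_b / n_b


# Mint choc-chip example: 8 of 10 in Group A say yes, 3 of 10 in Group B say yes.
index = discrimination_index(yes_a=8, n_a=10, yes_b=3, n_b=10)

# The .3 to .7 rule of thumb applies to the size of the index, so use abs().
in_recommended_range = 0.3 <= abs(index) <= 0.7

print(round(index, 2))        # 0.5
print(in_recommended_range)   # True
```

Swapping the two groups would give -.5: same size, opposite polarity.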

But why is an index of .3 to .7 considered good? Do indices in this range reflect significantly different proportions? From what I can gather, these indices were originally designed to evaluate ability-based test items, and items falling above or below this range were judged to be too hard or too easy. So it doesn’t appear that this range has anything to do with statistically significant differences in proportions. And, within the context of my research, this recommended range doesn’t have much meaning. So maybe chi-square analyses would be better for my study?

Chi-squares. If you’ve taken an introductory statistics course, you’ve probably heard of them. A quick refresher from Pallant (2009) reminds me that chi-square tests of independence allow you to determine whether the proportions on a categorical variable differ significantly between groups, and provide you with a phi-coefficient for effect size. In other words, chi-squares help us to work out how confident we can be that differences in proportions between two groups are not simply coincidental. Likewise, phi-coefficients help us to work out how big these group differences are so we can decide whether these differences have practical meaning. For example, if there was a significant difference in mint choc-chip preferences between groups, but the effect size was minuscule, ice cream preferences might not be the best way to discriminate between the two groups.
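To make this concrete, here is a minimal sketch of a chi-square test of independence on the made-up ice cream counts (8 yes / 2 no in Group A, 3 yes / 7 no in Group B). It uses the standard shortcut formula for a 2x2 table and applies no continuity correction, which is an assumption on my part; a statistics package may report a corrected value as well. The function name `chi_square_2x2` is just a label for this example.

```python
import math


def chi_square_2x2(yes_a, no_a, yes_b, no_b):
    """Pearson chi-square (no continuity correction), p-value and phi for a 2x2 table."""
    n = yes_a + no_a + yes_b + no_b
    numerator = n * (yes_a * no_b - no_a * yes_b) ** 2
    denominator = (yes_a + no_a) * (yes_b + no_b) * (yes_a + yes_b) * (no_a + no_b)
    chi2 = numerator / denominator
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Phi is the effect size for a 2x2 table (unsigned here).
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi


chi2, p_value, phi = chi_square_2x2(yes_a=8, no_a=2, yes_b=3, no_b=7)
print(round(chi2, 2))   # 5.05
print(p_value < 0.05)   # True
print(round(phi, 2))    # 0.5
```

With only 20 people some of the expected cell counts fall below 5, so in practice an exact test might be a safer choice for a table this small.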

Investigating what the ‘good items’ are tapping into: Factor vs principal components analysis

Factor and principal components analyses look at the inter-relationships between variables so that they can be condensed into unique groups of variables (Pallant, 2009; Tabachnick & Fidell, 2007). These sorts of analyses are useful when trying to make sense of a larger construct, e.g. ice cream preferences, and so we often use them to identify subscales within a test. For example, a factor analysis of an ice cream preferences test might produce three factors that we could use to create sorbet, chocolate-based and fruit-based flavour subscales; the items that correspond to each factor are used to help work out what each factor represents.
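Both kinds of analysis start from the correlations between items, so it can help to see what that starting point looks like. The sketch below builds an inter-item correlation matrix for four yes/no items answered by eight imaginary participants. The data are invented so that items 1 and 2 go together and items 3 and 4 go together; it stops short of extracting components or factors, which is the job of a statistics package.

```python
import math

# Rows are participants, columns are items 1-4 (1 = endorsed, 0 = not endorsed).
# Hypothetical data, chosen so that two clusters of items show up clearly.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]


def correlation_matrix(rows):
    """Pearson correlations between columns (equal to phi for 0/1 items)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(k)]
    corr = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            cov = sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
            corr[i][j] = cov / (sds[i] * sds[j])
    return corr


corr = correlation_matrix(responses)
for row in corr:
    print([round(value, 2) for value in row])
```

In the printed matrix, items 1 and 2 correlate .5 with each other, as do items 3 and 4, while the correlations across the two pairs are zero. That block pattern is what a principal components or factor analysis would condense into two groups of items.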

But how do you decide whether to use a principal components analysis or a factor analysis? Generally, we use principal components analyses when there is no pre-existing rationale for how variables should inter-relate (Tabachnick & Fidell, 2007). In contrast, we use factor analyses when there is such a rationale for how the items should relate and we only want to take into account the shared rather than independent contributions of these items in explaining differences in the overall variable, i.e. ice cream preference (Tabachnick & Fidell, 2007). Yes, that last bit about shared contributions is still rather Greek to me too, so don’t quote me on that. Thankfully, for my study I think it’s pretty clear I need to use a principal components analysis.


The jury is still somewhat out on whether I should use discrimination indices or chi-squares, and principal components or factor analyses. However, I think I now have enough information to ask the right questions and clear up the last few points I need to know to make this decision.


Allison, C., Auyeung, B., & Baron-Cohen, S. (2012). Toward brief “red flags” for autism screening: The Short Autism Spectrum Quotient and the Short Quantitative Checklist in 1,000 cases and 3,000 controls. Journal of the American Academy of Child & Adolescent Psychiatry, 51(2), 202-212. 

Gillis, J. M., Callahan, E. H., & Romanczyk, R. G. (2011). Assessment of social behavior in children with autism: The development of the Behavioral Assessment of Social Interactions in Young Children. Research in Autism Spectrum Disorders, 5(1), 351-360. doi: 10.1016/j.rasd.2010.04.019

Pallant, J. (2009). SPSS survival manual: A step by step guide to data analysis using SPSS (3rd ed.). Crows Nest, NSW: Allen & Unwin.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson Education, Inc. / Allyn and Bacon.



January 2, 2013 · 11:44 am

A Crash Course in Evaluating Diagnostic Tools: Validity

I often use information from diagnostic tools to help me with my research. As with any measure, I need to consider how well these tools perform when making sense of the results and their implications. But how do we measure the performance of a diagnostic tool? Usually we focus on trying to answer two questions: does the tool measure what it was intended to (validity), and does it do so consistently (reliability)? In this post I hope to give you a bit of a crash course about some of the ways we evaluate the validity of diagnostic tools in psychology.

Indicators of validity

Typically, when we evaluate the validity of a diagnostic tool we examine its sensitivity, specificity, positive predictive value and negative predictive value. In plain English, we look to see whether the test accurately identifies the people who have the construct of interest, that is, the thing we are interested in measuring (sensitivity), and accurately identifies the people who do not have it (specificity). But it’s not enough just to consider the sensitivity and specificity of a diagnostic tool; we also try to consider the likelihood that someone will be accurately classified as either having the construct of interest (positive predictive value) or not (negative predictive value). These concepts may appear deceptively similar. In fact, having had some difficulty articulating how sensitivity and specificity differ from positive and negative predictive value myself, I realised that I needed to revisit these concepts, and so lucky you get to read about it!

So what do sensitivity, specificity and negative and positive predictive value actually tell us? Sensitivity and specificity tell us about how a test performs. In the context of autism diagnosis, they give some indication of how useful a diagnostic tool might be in helping clinicians identify whether someone has an autism spectrum disorder: how good it is at ‘capturing’ the people with autism and ‘capturing’ the people without autism. In contrast, negative and positive predictive value inform us about the likely proportion of correct classifications, that is, the likelihood that someone the tool identifies as having or not having an autism spectrum disorder truly does or does not have one. Still confused? Bear with me. Positive and negative predictive values tell us about the likelihood that a diagnostic outcome is correct by taking prevalence into account. Prevalence can reflect the number of people with the disorder in the study sample or in the broader community. For instance, in Australia, it is estimated that 1 in 160 children between 6 and 12 years have an autism spectrum disorder (Australian Advisory Board on Autism Spectrum Disorders, 2007).
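To see why prevalence matters so much, here is a small sketch that works out positive predictive value from sensitivity, specificity and prevalence. The sensitivity and specificity of 80% are purely hypothetical numbers picked for illustration; only the 1 in 160 prevalence figure comes from the text above.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive test results that are true positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical tool: 80% sensitivity and 80% specificity (illustrative values only).
ppv_study_sample = positive_predictive_value(0.80, 0.80, prevalence=0.5)
ppv_community = positive_predictive_value(0.80, 0.80, prevalence=1 / 160)

print(f"{ppv_study_sample:.0%}")  # 80%
print(f"{ppv_community:.1%}")     # 2.5%
```

The same tool looks very different in a study sample where half the participants have the disorder than it does in the broader community, where most positive results would be false alarms.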

Putting it into practice

Hopefully this worked example will tie all of these concepts together and give you an idea about how they can be applied. Imagine for me that 50 people have volunteered to help me with my research and that each person has been administered a diagnostic measure. I know that 25 of these people actually have an autism spectrum disorder. The table below displays the results of this study. The ‘test result’ rows reflect the diagnostic outcome supported by the measure, while the ‘diagnostic status’ columns indicate the true diagnosis of participants.

                          Diagnostic status
  Test result             Autism        No autism
  Autism                  17            10
  No autism               8             15

To calculate sensitivity we take the number of people correctly classified as having autism (17) and divide it by the total number of people who have autism (17 + 8). Similarly, to calculate specificity, we take the number of people correctly classified as not having autism (15) and divide it by the total number of people who do not have autism (10 + 15). To get a balanced picture, we can also calculate the tool’s positive and negative predictive value. To calculate positive predictive value, we take the number of people correctly classified as having autism (17) and divide it by the sum of correct and incorrect classifications of autism (17 + 10). Likewise, to calculate the negative predictive value, we take the number of people correctly classified as not having autism (15) and divide it by the sum of correct and incorrect classifications of not having autism (15 + 8).

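These calculations leave us with the following results:

  Sensitivity: 17 / 25 = 68%
  Specificity: 15 / 25 = 60%
  Positive predictive value: 17 / 27 = 63%
  Negative predictive value: 15 / 23 = 65%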
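For anyone who would rather check the arithmetic in code, the four calculations described above can be sketched like this. The counts are the ones from the worked example (17 and 15 correct classifications, 8 missed cases, 10 false alarms); the function and argument names are just labels chosen for this sketch.

```python
def diagnostic_metrics(true_pos, false_neg, false_pos, true_neg):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "positive predictive value": true_pos / (true_pos + false_pos),
        "negative predictive value": true_neg / (true_neg + false_neg),
    }


metrics = diagnostic_metrics(true_pos=17, false_neg=8, false_pos=10, true_neg=15)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
# sensitivity: 68%
# specificity: 60%
# positive predictive value: 63%
# negative predictive value: 65%
```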

We now have a lot of stats, but what do they mean? The sensitivity and specificity of the tool indicate that it is better at recognising people with autism than people without autism, i.e. it ‘missed’ fewer people with autism than it did people without autism. However, the positive and negative predictive values for this measure indicate that, though this tool may miss some of the people who do not have autism, when it does classify someone as not having autism it is correct slightly more often than when it classifies someone as having autism. To put this another way, using this diagnostic tool results in slightly more false positives than false negatives. Ideally, we want a diagnostic tool with adequate sensitivity, specificity, positive predictive value and negative predictive value. Benchmarks for adequate sensitivity, specificity, positive predictive value and negative predictive value vary by field because the implications of false negatives and false positives also differ. For example, Glascoe (2005) recommends that when screening for developmental disorders we should aim to use tools with a sensitivity of 70-80% and a specificity of 80% or more.

There are many other aspects to consider when establishing the validity of a diagnostic tool and other aspects of its performance too. However, hopefully this post has given you some insight into the process. If you think that I have made an error in my explanations by all means let me know!


Filed under analyses, Clinical Phd, Practice, Research, statistics

Ten precious weeks: Ethics, participant recruitment, exam results and a looming deadline

I’ve written and re-written the first line to this post three times. Why? I don’t even know where to begin. So much has happened in the last few weeks. The first cab off the rank on my Honours news highway is ethics submission. That particular process led me through even more twists and turns than I mentioned in my last post. If I could go back in time and give myself a tip, it would be to expect that it would take longer than anticipated. Regardless, I now have ethics approval for my study! Subsequently, I have entered the participant recruitment phase like all the other Honours students. This means that I am now checking my email with ridiculous frequency, just in case someone has contacted me wanting to volunteer. Sadly, repeatedly hitting your email refresh button will not make the recruitment phase go any faster…

Ever wondered about the inner workings of an Honours student’s mind at this stage of the year? Wonder no more:

Why isn’t anyone signing up? How can I get more participants? What if I don’t get enough participants? How many people do I really need? What if it takes me ages to get volunteers and I end up with no data?

The above is a decent cross-section of discussions with fellow Honours students over the last week. You might have noticed the recurring theme, a desperate wish for more participants, and quickly! To be honest, I think we all need to relax a bit. Yes, we need to actively search for volunteers, but at the same time volunteers are just that, volunteers. There is only so much you can do to let them know about your study and then the rest is up to them. Let’s see how zen I am about this next week though…

I think this growing anxiety over recruitment is because time is galloping away. My thesis is due in TEN WEEKS. In this time I am aiming to (read: must) have collected data from thirty participants, entered it into SPSS, analysed it, written and edited my introduction, method, results, discussion, references and acknowledgements. And of course binding and submission. Piece of cake I say with tongue firmly in cheek! I know that I will make it happen because I must. Life is nothing without a challenge.

While the last few weeks have been eventful, they have been equally surprising. At the end of last semester I sat several exams. It was with trepidation that I made my way upstairs to the noticeboard to find my marks. As usual, it took me three attempts to locate my student ID among the others, not to mention those all-important grades. I am not exaggerating when I say that I saw my marks and laughed in astonishment. I honestly could not believe it. I had thought I had done well in one exam, but not that well! And as for the essay exam I had been worried about, I had also earned a good grade. Finding out my results was such a morale boost. I now have a fighting chance in the competitive entry process to postgraduate psychology. I ‘just’ have to defend this chance by throwing my all into the rest of the assessment tasks and my thesis!

Until next time, thanks for reading and good luck with your studies : )


Filed under ethics, Honours year, Research, statistics