Category Archives: data

Data analysis: Discrimination indices vs chi-squares and principal components vs factor analyses

*This post has been gathering dust for the better part of a month, so I figured I should send it on its way.

I’m more of a words person than a numbers person. Show me a graph and you’ve lost me. Explain it in words, however, and I’m much more likely to understand. Right now I’m working on answering two questions in my research: which data analyses would be best, and how do I actually run them? Given that words are more my thing, I’ve decided to blog about figuring this out. What you’re in for with this post, then, is a little bit about the process of research, how we go about answering these sorts of questions, and some information about statistical techniques.

Decision making

Depending on your background, you may not have spent much time thinking about what researchers do with the data they collect. You may even be wondering why I don’t already know which analyses I will run or how to run them. Well, to use the ‘technical term,’ the number of number crunching techniques available to analyse my data is mind-boggling! Newbie researchers tend to have a general understanding of a range of techniques but are most familiar with the handful of analyses they typically use. When you cast the net further to the statistical gurus and established researchers, the number of analyses they are familiar with grows dramatically. I’m definitely in the newbie researcher category. Looking into what other people have done and chatting to my supervisors has already highlighted a few techniques that might be better than those I’d originally planned to use, as well as the need to refresh and familiarise myself with these techniques before diving in. So here goes…


If this post is going to be at all intelligible, you need to meet my data. I have dichotomous categorical data for two groups of participants. In English, this means that for each group, I have information about whether they’ve endorsed each item from various measures. Still with me? Good.

What I’m interested in is, first, identifying the items that are endorsed more by one of the groups and, second, working out what these ‘good’ items might be tapping into. So in statistical terms, I need to work out the proportion of people in each group who have endorsed each item, and the inter-relationships between these items.

Investigating proportions: Discrimination indices vs. Chi-squares

Several analyses handle proportions well, including discrimination indices and chi-square analyses. Chances are, you haven’t heard of discrimination indices before, so I’ll start with those.

Discrimination indices represent proportional differences in item endorsement between groups. That is, a discrimination index is the proportion of people in Group B who voted yes rather than no for a particular item, subtracted from the proportion of people in Group A who voted yes for that item. We interpret these indices by looking at their size and polarity:

  • Small indices suggest that the proportion of people who endorsed the item was comparable between groups, while large indices indicate one group endorsed the item more than the other.
  • Positive indices tell us that Group A endorsed the item more, while negative indices tell us that Group B endorsed the item more.

Here’s a worked example to make this clearer:

I have 10 participants in each of Groups A and B. Each participant is asked ‘Do you like mint choc-chip ice cream?’ Eight people in Group A have very good taste and say ‘yes,’ but only three people in Group B say ‘yes’.

To calculate how good the mint choc-chip question is at discriminating between the two groups, I subtract the proportion of people who like this flavour in Group B (.30) from the proportion of people who like this flavour in Group A (.80), which gives me a discrimination index of .5.

The positive polarity of the index tells me that Group A endorsed the item more than Group B, but does a score of .5 mean this is a good item? The general consensus in the literature (Gillis, Callahan, & Romanczyk, 2011; Allison, Auyeung, & Baron-Cohen, 2012) is that good items have indices from .3 to .7. So, it seems that mint choc-chip ice cream preference is good at discriminating between Groups A and B.
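For anyone who would rather see the arithmetic spelled out, here is a minimal Python sketch of the worked example above. The function names are my own invention, and applying the .3 to .7 rule of thumb to the size of the index (ignoring its sign) is an assumption on my part rather than something the cited papers spell out.

```python
def discrimination_index(yes_a, n_a, yes_b, n_b):
    """Proportion endorsing the item in Group A minus the proportion in Group B."""
    return yes_a / n_a - yes_b / n_b


def describe(index, low=0.3, high=0.7):
    """Report which group endorsed the item more and whether the index's
    size falls in the .3 to .7 range (assumption: the range is applied
    to the absolute value of the index)."""
    if index > 0:
        favoured = "Group A"
    elif index < 0:
        favoured = "Group B"
    else:
        favoured = "neither group"
    return favoured, low <= abs(index) <= high


# Mint choc-chip example: 8 of 10 in Group A and 3 of 10 in Group B said yes.
index = discrimination_index(yes_a=8, n_a=10, yes_b=3, n_b=10)
favoured, in_range = describe(index)
print(f"index = {index:.2f}, endorsed more by {favoured}, within .3 to .7: {in_range}")
```

Swapping the two groups' counts flips the sign of the index but not its size, which is why the sketch checks the range on the absolute value.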

But why is an index of .3 to .7 considered good? Do indices in this range reflect significantly different proportions? From what I can gather, these indices were originally designed to evaluate ability-based test items, and items falling above or below this range were judged to be too hard or too easy. So it doesn’t appear that this range has anything to do with statistically significant differences in proportions. And, within the context of my research, this recommended range doesn’t have much meaning. So maybe chi-square analyses would be better for my study?

Chi-squares. If you’ve taken an introductory statistics course, you’ve probably heard of them. A quick refresher from Pallant (2009) reminds me that chi-square tests of independence allow you to determine whether the proportions on a categorical variable differ significantly between groups, and provide you with a phi coefficient as a measure of effect size. In other words, chi-squares help us to work out how confident we can be that differences in proportions between two groups are not simply coincidental. Likewise, phi coefficients help us to work out how big these group differences are, so we can decide whether they have practical meaning. For example, if there was a significant difference in mint choc-chip preferences between groups, but the effect size was minuscule, ice cream preferences might not be the best way to discriminate between the two groups.
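To make this concrete, here is a rough sketch of a chi-square test of independence for a 2 x 2 table, run on the mint choc-chip numbers from earlier. It uses only Python's standard library and computes the plain (uncorrected) Pearson chi-square; the function name is hypothetical, and statistical packages may report slightly different values if they apply a continuity correction to 2 x 2 tables.

```python
import math


def chi_square_2x2(yes_a, no_a, yes_b, no_b):
    """Uncorrected Pearson chi-square for a 2 x 2 table.

    Returns the chi-square statistic, its p-value (1 degree of freedom)
    and the size of the phi coefficient.
    """
    observed = [[yes_a, no_a], [yes_b, no_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect in this cell if group and answer were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Phi is the square root of chi-square over the sample size (magnitude only).
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi


# Group A: 8 yes, 2 no. Group B: 3 yes, 7 no.
chi2, p_value, phi = chi_square_2x2(8, 2, 3, 7)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}, phi = {phi:.2f}")
```

With only 10 people per group, some expected cell counts here fall just below 5, so treat the p-value from this toy example with caution.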

Investigating what the ‘good items’ are tapping into: Factor vs principal components analysis

Factor and principal components analyses look at the inter-relationships between variables so that they can be condensed into unique groups of variables (Pallant, 2009; Tabachnick & Fidell, 2007). These sorts of analyses are useful when trying to make sense of a larger construct (e.g. ice cream preferences), so we often use them to identify subscales within a test. For example, a factor analysis of an ice cream preferences test might produce three factors that we could use to create sorbet, chocolate-based and fruit-based flavour subscales; the items that correspond to each factor are used to help work out what each factor represents.

But how do you decide whether to use a principal components analysis or a factor analysis? Generally, we use principal components analyses when there is no pre-existing rationale for how variables should inter-relate (Tabachnick & Fidell, 2007). In contrast, we use factor analyses when there is such a rationale for how the items should relate and we only want to take into account the shared, rather than independent, contributions of these items in explaining differences in the overall variable, i.e. ice cream preference (Tabachnick & Fidell, 2007). Yes, that last bit about shared contributions is still rather Greek to me too, so don’t quote me on that. Thankfully, for my study I think it’s pretty clear I need to use a principal components analysis.
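To get a feel for what a principal components analysis does, here is a toy sketch in plain Python. It builds the correlation matrix between dichotomous items (for 0/1 data the Pearson correlation is the phi coefficient) and then pulls out the first component using power iteration, a simple numerical shortcut I have swapped in for illustration; it is not how statistical packages run the analysis. The data and function names are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores.
    Assumes neither item is answered identically by everyone."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows):
    """rows: one list of 0/1 item responses per participant."""
    items = list(zip(*rows))
    return [[pearson(a, b) for b in items] for a in items]


def first_component(matrix, iterations=200):
    """Largest eigenvalue and its loadings via power iteration.
    Assumes the starting vector of 1s is not orthogonal to the answer."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    product = [sum(r * x for r, x in zip(row, v)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, product))
    return eigenvalue, v


# Made-up responses to three items: chocolate, choc-chip, lemon sorbet.
responses = [
    [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1],
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1],
]
matrix = correlation_matrix(responses)
eigenvalue, loadings = first_component(matrix)
share = eigenvalue / len(matrix)  # proportion of total variance explained
print(f"eigenvalue = {eigenvalue:.2f}, share of variance = {share:.2f}")
print("loadings:", [round(x, 2) for x in loadings])
```

In this made-up data the two chocolate items always go together and the sorbet item is unrelated to them, so the first component loads on the chocolate items only.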


The jury is still somewhat out on whether I should use discrimination indices or chi-squares, and principal components or factor analyses. However, I think I now have enough information to ask the right questions and clear up the last few points I need to make this decision.


Allison, C., Auyeung, B., & Baron-Cohen, S. (2012). Toward brief “red flags” for autism screening: The Short Autism Spectrum Quotient and the Short Quantitative Checklist in 1,000 cases and 3,000 controls. Journal of the American Academy of Child & Adolescent Psychiatry, 51(2), 202-212. 

Gillis, J. M., Callahan, E. H., & Romanczyk, R. G. (2011). Assessment of social behavior in children with autism: The development of the Behavioral Assessment of Social Interactions in Young Children. Research in Autism Spectrum Disorders, 5(1), 351-360. doi: 10.1016/j.rasd.2010.04.019

Pallant, J. (2009). SPSS survival manual: A step by step guide to data analysis using SPSS (3rd ed.). Crows Nest, NSW: Allen & Unwin.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson Education / Allyn and Bacon.



January 2, 2013 · 11:44 am

Data collection 101

Today at 4 o’clock I thanked my final volunteer for participating in my study. I have officially finished data collection! I still can’t quite believe it but it certainly feels good to have reached this milestone. I wanted to celebrate with a hot chocolate, but by that time everything was closed so I had a celebratory Turkish delight at home instead.

Now is as good a time as any to reflect on what the experience has taught me. Firstly, data collection was a lesson in adapting to the unexpected. One particularly memorable experience was opening the door to a room I had booked for my study, to be greeted by fifteen people balefully staring back at me. After a hasty retreat I was able to find another room. Secondly, I learned that you can never be too organised. I carried a folder with me filled with spare study materials, which, labelled with the name of my study, doubled as a sign. This certainly paid off. I had arranged to meet my participants at a landmark on campus. The only problem was, I had no idea what any of my participants looked like, and the place I had chosen was quite a popular meeting point. I resorted to conspicuously displaying my improvised sign and asking anyone in the vicinity if they were participating in my study. It worked quite well, though on one occasion I was approached by someone who, after some initial confusion on both our parts, turned out to be a curious stranger. Lastly, I learned a lesson or three about data entry. If you need to reverse code something, TRIPLE CHECK you have recoded everything you need to. Double checking is not enough, believe me. I also found it quite useful to keep multiple copies of my data, along with a codebook to make sure the 1s and 0s I’d entered in SPSS meant more to me than binary code.
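For what it’s worth, here is a small Python sketch of the kind of check I mean when I say triple check your reverse coding. It is purely illustrative: the item names and the codebook are hypothetical, and it assumes every item is scored 0 or 1.

```python
# Hypothetical codebook: item name -> True if the item is reverse-scored.
CODEBOOK = {"item_1": False, "item_2": True, "item_3": False}


def reverse_code(responses, codebook):
    """Return a copy of one participant's 0/1 responses with the
    reverse-scored items flipped (0 becomes 1, 1 becomes 0)."""
    recoded = {}
    for item, value in responses.items():
        if value not in (0, 1):
            raise ValueError(f"{item}: expected 0 or 1, got {value!r}")
        recoded[item] = 1 - value if codebook[item] else value
    return recoded


raw = {"item_1": 1, "item_2": 1, "item_3": 0}
recoded = reverse_code(raw, CODEBOOK)

# Check 1: only the items flagged in the codebook have changed.
changed = {item for item in raw if raw[item] != recoded[item]}
flagged = {item for item, is_reversed in CODEBOOK.items() if is_reversed}
print("changed items match codebook:", changed == flagged)

# Check 2: recoding a second time should give back the raw data.
print("round trip restores raw data:", reverse_code(recoded, CODEBOOK) == raw)
```

Keeping the raw responses untouched and recoding into a copy means a mistake in the codebook can be fixed without re-entering anything.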

Tomorrow I am taking the day to ‘regroup.’ I want to have a clear plan of where I am headed with my analyses and discussion. It is, after all, a very significant day today: one month until my thesis is due.


Filed under A day in the life, analyses, data, Honours year, Research, running a study

A few days make all the difference: Participants at last!

Maybe you had your fingers crossed for me because a few days after I last posted, participants began steadily trickling in. I was so happy to get my first wave of volunteers that I felt like dancing around. I couldn’t settle down to work again for the rest of the day.

The process of data collection has been a learning curve: figuring out what I need to keep a record of, making sure I have covered everything in my research protocol, and managing the other little administrative tasks. Speaking of administrative tasks, with all the questionnaires I have been printing I must be responsible for the demise of at least one tree. The actual assessments have been a rewarding experience. Some of the measures I’m using need to be administered by a registered psychologist. It has been great to sit in and see these take place and to interact with the people participating. Also, it feels good seeing what is described in the literature come to life.

Needless to say, having data is also very exciting. I couldn’t resist having a go at analysing it, despite having a minute sample size (I think at the time N = 5)! I have a slightly larger sample now and there are a few interesting and at times bizarre things cropping up but it is still too early to draw any firm conclusions. This hasn’t stopped me reanalysing my data every time I get a few more participants though…

The blog statistics tell me that quite a few people have wandered over to this page from around the world. I’d like to pose you a question: what brought you here? An interest in psychology? Research? Or did you find this blog by accident? : ) Speaking of finding blogs, I have a few under ‘blogs I read’ that I find interesting. Take a look if you get a chance, and thanks for reading!


Filed under data, running a study, thesis