This post and accompanying video are part two in a series of four by Aaron Dewald, Associate Director, Center for Innovation in Legal Education, University of Utah College of Law. Watch his second video, and then read his article below regarding how to calculate item difficulty.
Item difficulty is a simple yet powerful tool for looking at how our students are answering the questions. Simply put, it’s how difficult a particular question was for our students, based on how many got it right and how many got it wrong. To calculate it, divide the number of students who got the item correct by the total number of students who answered the item. The result is the proportion of students who answered it correctly. So, if 78 out of 100 students answered a particular question correctly, the difficulty is said to be .78.
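If you’d rather let a script do the division, here is a minimal sketch in Python. The function name item_difficulty and its parameter names are my own illustrative choices, not something from a particular testing tool.

```python
def item_difficulty(num_correct, num_answered):
    """Return the proportion of students who answered an item correctly.

    num_correct and num_answered are hypothetical parameter names:
    the count of correct responses and the count of all responses.
    """
    if num_answered <= 0:
        raise ValueError("at least one student must have answered the item")
    if not 0 <= num_correct <= num_answered:
        raise ValueError("num_correct must be between 0 and num_answered")
    return num_correct / num_answered


# The example from the text: 78 of 100 students answered correctly.
print(item_difficulty(78, 100))  # 0.78
```

Note that the denominator is the number of students who answered the item, so students who skipped it are not counted here.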
You’ll see a range of numbers from 0, which means the question was super hard and no one got it right, to 1, which means it was very easy and everyone got it correct. Generally speaking, we want our item difficulty to be somewhere in a range of .4 to .6, but it really depends on how we’re using the test (Crocker & Algina, 2008). If we’re looking for a normal distribution of scores, a bell curve, then generally we’ll want our difficulties in this range. I say generally because you’ll never hear anyone commit to a hard and fast number. But since we’re just starting out, let’s aim for this range of scores. If we’re really low, like below .3, then the question is too hard. If we’re really high, like over .8, then the question is too easy. These questions won’t add much to the test’s reliability and should be used sparingly.
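The rule of thumb above could be sketched as a small Python helper. Treat the cutoffs as adjustable defaults rather than fixed rules; the function name classify_difficulty, the parameter names, and the label strings are assumptions I made up for illustration.

```python
def classify_difficulty(p, target=(0.4, 0.6), too_hard_below=0.3, too_easy_above=0.8):
    """Label an item difficulty using the rough rule of thumb from the text.

    The default cutoffs mirror the numbers mentioned above; they are
    starting points, not hard and fast values.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("item difficulty must be between 0 and 1")
    if p < too_hard_below:
        return "too hard"
    if p > too_easy_above:
        return "too easy"
    if target[0] <= p <= target[1]:
        return "in target range"
    # Between the flags and the target range: not ideal, but not flagged.
    return "outside target range, not flagged"


for p in (0.2, 0.5, 0.78, 0.95):
    print(p, classify_difficulty(p))
```

Because the cutoffs are parameters, you can loosen or tighten them depending on how the test is being used.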
Let’s clarify one thing though – item difficulty has a lot loaded into it, right? I mean… it could honestly be a difficult (or easy) question. But it could also be a horribly written or convoluted question. It could have horrible distractors. So it’s a good first indicator that a question deserves a closer look, but unless you’ve critically analyzed the question, you’ll want to look at other indicators to see how it is truly performing.
Examples of Item Difficulty
Ok. This is all great, but let’s look at some examples, and talk about them. Here, we have the results for five questions on a quiz. The first column on the left represents the question number. The second column shows the number of people who correctly answered the question. Column three is the total number of people who tried to answer the question. The final column is the number of correct answers divided by the total responses – the question’s item difficulty.
Using our general rule of thumb (but remember, not a hard and fast rule), questions one and three are great questions. They’re in our range of .4 to .6. Question two isn’t a bad question either. It’s just over our rule of thumb, but we’re comfortable adding questions up to .8, if we need to. Question four is insanely difficult and probably needs to be looked at more closely, if not removed entirely. Question five is a very, very easy question and could probably use a check if it’s not a mastery/building confidence style question.
Stay tuned for the next videos/articles in the coming weeks. Subscribe above to have everything delivered straight to your inbox, or, register here for an upcoming Twitter Chat hosted by Aaron himself!
About the Author
Aaron is currently a Ph.D. candidate in learning science, which gives him a unique perspective on technology use in pedagogical situations. Aaron received his B.S. in Information Systems from North Dakota State University in 2001, and his M.Ed. in Instructional Design and Educational Technology from the University of Utah in 2010.