A method of compensation for random results in multichoice vocabulary size tests.

Multichoice tests are often used for vocabulary size tests and are helpful to teachers and students. However, at this time[1], most are subject to a flaw which makes them extremely inaccurate when students have low vocabularies[2].

This can be verified by going to any of the following sites and entering answers at random. In each you will be informed that you have a vocabulary of several thousand words.

http://my.vocabularysize.com/H

https://www.lextutor.ca/tests/levels/recognition/1_14k/

https://www.arealme.com/vocabulary-size-test/en/

You will also get the same results from the paper-based test at https://www.victoria.ac.nz/lals/about/staff/publications/paul-nation/Vocabulary-Size-Test-14000.pdf

The reason for the flaw is that the significance of incorrect answers has not been taken into consideration.

The significance of incorrect answers

Incorrect answers may occur for several reasons but, whatever the reason, an incorrect answer has the same effect as a random answer: it tells us only that the student did not enter the correct answer.

Consider a well-constructed[3] multichoice test with 100 questions, each with 4 options.

If we enter random answers for all questions, our score will be very close to 25, which is, in fact, the “Random Level” (R).[4]
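This is easy to check with a quick simulation. The following is an illustrative sketch only (none of the tests above provide such code):

    import random

    # Simulate random guessing on a well-constructed 100-question, 4-option test.
    # Each guess has a 1-in-4 chance of matching the correct option.
    trials = 10000
    total = 0
    for _ in range(trials):
        total += sum(random.randrange(4) == 0 for _ in range(100))
    print(total / trials)  # consistently very close to 25, the Random Level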

Therefore, if a student completes the test and the score is very close to R, we can conclude that the student has demonstrated no knowledge of the subject. Their score should be 0. So, the correct calculation to obtain their score is:

initial score (25) – random score (25) = 0

This calculation is valid when the student score is very low. At the other extreme, if a student completes the test and obtains a score of 100, they are showing perfect knowledge, so no adjustment for the Random Level should be made. Therefore the correct calculation to obtain their score is:

initial score (100) – random score (0) = 100

So, between an initial score of 25 (corrected to 0) and a score of 100 (“corrected” to 100), the amount to be deducted to allow for random answers has decreased from 25 to 0.[5] [6]

We can graph it like this:

[Figure: the deduction for random answers, falling in a straight line from 25 at an original score of 25 to 0 at an original score of 100.]

In relation to the above diagram, it will be noted that, while the range of student scores appears to be 0 to 100, the actual range over which a student demonstrates knowledge is 25 to 100.[7] [8]

So, what is required is to convert the range 25 to 100 to the range 0 to 100. In this way, a student with no knowledge will not be credited with correct answers for 25 of the questions, and so with a vocabulary size they do not have.

By making the range conversion described, the correct number of words will result: not 25 × 100 = 2,500, but 0 × 100 = 0, as is correct.
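Expressed in code, the conversion is a simple linear rescaling. This is a minimal sketch under the assumptions of the example above (100 questions, Random Level 25); the function name is mine:

    # Map the demonstrated range (25..100) onto the reported range (0..100).
    def rescale(original, n_questions=100, random_level=25):
        return (original - random_level) * n_questions / (n_questions - random_level)

    print(rescale(25))   # 0.0   -> no demonstrated knowledge, 0 words
    print(rescale(100))  # 100.0 -> a perfect score is unchanged

The deduction-based derivation in the next section arrives at this same conversion.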

Developing the formula

There is a simple formula which will convert scores.

First I will demonstrate a simplified version of the formula, applied to the 100-question, 4-option test we have already looked at.

We have seen that to get the correct score we must first subtract an amount which allows for random answers. How do we obtain that amount?

Let us call the original student score O.
Let us call the Random Level R.
Let us call the amount we deduct for the random effect E.

In this example we already know that R = 25.

As stated before, E decreases from 25 to 0 as O increases from 25 to 100. For this test it can be calculated as E = 25 - (25 * (O - 25)/75), where 75 is the width of the range (100 - 25) over which knowledge is demonstrated.

The table below shows the calculation for a range of original scores. The final column has been converted to a vocabulary size figure, assuming that each correct answer represents 100 words as, for example, in the Paul Nation 14,000 word test. Thus, in this test, the vocabulary size will be in the range from 0 to 10,000.

 

Original score (O) | Deduction E = 25 - (25 * (O - 25)/75) | Corrected score (O - E) | Word score (100 words per correct answer)

25  | 25 - (25 * (25 - 25)/75) = 25  | 25 - 25 = 0    | 0 * 100 = 0
35  | 25 - (25 * (35 - 25)/75) = 22  | 35 - 22 = 13   | 13 * 100 = 1300
45  | 25 - (25 * (45 - 25)/75) = 18  | 45 - 18 = 27   | 27 * 100 = 2700
55  | 25 - (25 * (55 - 25)/75) = 15  | 55 - 15 = 40   | 40 * 100 = 4000
70  | 25 - (25 * (70 - 25)/75) = 10  | 70 - 10 = 60   | 60 * 100 = 6000
90  | 25 - (25 * (90 - 25)/75) = 3   | 90 - 3 = 87    | 87 * 100 = 8700
95  | 25 - (25 * (95 - 25)/75) = 2   | 95 - 2 = 93    | 93 * 100 = 9300
100 | 25 - (25 * (100 - 25)/75) = 0  | 100 - 0 = 100  | 100 * 100 = 10000

(E is rounded to the nearest whole number.)
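The table can be reproduced in a few lines (a sketch under the same assumptions: 100 questions, 4 options, 100 words per correct answer, with E rounded to a whole number as in the table):

    # Reproduce the worked table: deduction E, corrected score, word score.
    for o in (25, 35, 45, 55, 70, 90, 95, 100):
        e = round(25 - 25 * (o - 25) / 75)  # deduction for the random effect
        corrected = o - e
        print(o, e, corrected, corrected * 100)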

 

It can be seen that the flaw, whereby low scores produce spuriously high word counts, has been resolved.

The general equation

As stated above, we have so far demonstrated the correction on a single test configuration: 100 questions, each with 4 options, and each correct answer representing 100 words.

Now we must add new variables to our equation to make it general for any multichoice test. The full list of variables will now be:

O = Original student score

R = Random Level

N = Number of questions

M = Number of options in each multichoice question

W = Number of words represented by each correct answer

V = Vocabulary size in words, as measured by this test

The formula is:

V = (O - (N/M - (N/M * ((O - N/M) / (N - N/M))))) * W

Since R = N/M, this is equivalent to V = (O - R) * N / (N - R) * W: the Random Level is subtracted from the raw score, and the result is scaled back up to the full range of the test.
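As a minimal sketch, using the variable names defined above (the function itself is mine, not from any of the tests cited):

    # General correction: o = raw score, n = questions, m = options per
    # question, w = words per correct answer. Returns V, the vocabulary size.
    def corrected_vocab(o, n, m, w):
        r = n / m                      # Random Level R = N/M
        e = r - r * (o - r) / (n - r)  # deduction for the random effect
        return (o - e) * w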

Let us now look at the use of this formula in two scenarios. One is a 200-question, 5-option test where each correct answer represents 50 words. The other is the Paul Nation 14,000 word test, used in some online assessments and commonly used in classrooms.

 

Table of representative results for N = 200, M = 5, W = 50

O   | (O - (N/M - (N/M * ((O - N/M) / (N - N/M))))) * W                | V
40  | (40 - (200/5 - (200/5 * ((40 - 200/5) / (200 - 200/5))))) * 50   | 0
60  | (60 - (200/5 - (200/5 * ((60 - 200/5) / (200 - 200/5))))) * 50   | 1250
100 | (100 - (200/5 - (200/5 * ((100 - 200/5) / (200 - 200/5))))) * 50 | 3750
125 | (125 - (200/5 - (200/5 * ((125 - 200/5) / (200 - 200/5))))) * 50 | 5312
150 | (150 - (200/5 - (200/5 * ((150 - 200/5) / (200 - 200/5))))) * 50 | 6875
180 | (180 - (200/5 - (200/5 * ((180 - 200/5) / (200 - 200/5))))) * 50 | 8750
190 | (190 - (200/5 - (200/5 * ((190 - 200/5) / (200 - 200/5))))) * 50 | 9375
195 | (195 - (200/5 - (200/5 * ((195 - 200/5) / (200 - 200/5))))) * 50 | 9687
200 | (200 - (200/5 - (200/5 * ((200 - 200/5) / (200 - 200/5))))) * 50 | 10000

 

 

Table of representative results for N = 140, M = 4, W = 100 (Paul Nation 14,000)

O   | (O - (N/M - (N/M * ((O - N/M) / (N - N/M))))) * W                 | V
35  | (35 - (140/4 - (140/4 * ((35 - 140/4) / (140 - 140/4))))) * 100   | 0
40  | (40 - (140/4 - (140/4 * ((40 - 140/4) / (140 - 140/4))))) * 100   | 667
45  | (45 - (140/4 - (140/4 * ((45 - 140/4) / (140 - 140/4))))) * 100   | 1333
55  | (55 - (140/4 - (140/4 * ((55 - 140/4) / (140 - 140/4))))) * 100   | 2666
65  | (65 - (140/4 - (140/4 * ((65 - 140/4) / (140 - 140/4))))) * 100   | 4000
85  | (85 - (140/4 - (140/4 * ((85 - 140/4) / (140 - 140/4))))) * 100   | 6667
105 | (105 - (140/4 - (140/4 * ((105 - 140/4) / (140 - 140/4))))) * 100 | 9333
125 | (125 - (140/4 - (140/4 * ((125 - 140/4) / (140 - 140/4))))) * 100 | 12000
135 | (135 - (140/4 - (140/4 * ((135 - 140/4) / (140 - 140/4))))) * 100 | 13333
140 | (140 - (140/4 - (140/4 * ((140 - 140/4) / (140 - 140/4))))) * 100 | 14000
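Representative rows of both tables can be checked with the sketch function from the previous section (results may differ from the tables by 1 because the tables round intermediate values):

    # Closed-form equivalent of the sketch given earlier.
    def corrected_vocab(o, n, m, w):
        r = n / m
        return (o - r) * n / (n - r) * w

    print(corrected_vocab(125, 200, 5, 50))   # 5312.5   (table: 5312)
    print(corrected_vocab(55, 140, 4, 100))   # ~2666.7  (table: 2666)
    print(corrected_vocab(140, 140, 4, 100))  # 14000.0  (table: 14000)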

 

Points to note

·         When a student scores at the random point, the formula correctly reports knowledge of 0 words.

·         When a student achieves the maximum score, the formula correctly reports the maximum vocabulary size over the range being tested.

·         Between these extremes the corrected score increases linearly.
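The first two points can be verified directly (a quick check using the closed form of the formula):

    # Endpoint checks: the random point maps to 0 words, the maximum to N * W.
    def v(o, n, m, w):
        r = n / m
        return (o - r) * n / (n - r) * w

    assert v(40, 200, 5, 50) == 0       # O = R -> 0 words
    assert v(200, 200, 5, 50) == 10000  # O = N -> maximum of the range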

Discussion

There has been discussion about the usefulness of VSTs (Vocabulary Size Tests). This document is concerned only with how to remove a flaw from multichoice VST results. However, a comment or two is in order.

I began teaching ESL some 16 years ago. It quickly became apparent to me that one of the major difficulties students faced was limited vocabulary size.

It also became apparent that, from a teacher’s point of view, knowledge of vocabulary size was valuable in several ways. It was a very useful indicator of how best to help the student. For example, when assigning reading materials such as graded readers, it became easier to choose the appropriate text for each student. It was useful in helping a student decide which course options they were more likely to succeed in. I came to realise, too, that many teachers were unaware of the very low vocabularies some of their students were struggling with and, when I was able to provide a reasonably accurate figure, those teachers became more motivated to give help at the appropriate level.

Furthermore, my colleagues and I observed over the years that students needed to score at least 6,000 on the (corrected) Paul Nation 14,000 multichoice test in order to have a chance of succeeding at senior high school levels. If, as so often happened, a 15-, 16- or 17-year-old student tested as having a vocabulary of 2,000-3,000 words, we knew they had a very serious problem: they simply did not have the time to build their vocabulary enough to function properly at the senior levels of high school. Often they, their parents, or even their teachers did not know why they were failing, and we were able to explain both the cause and what would be required to remedy the situation.

Final word

How did the flaw which I have written about here come about? It seems highly unlikely that leading academics and programmers around the world would not have identified this issue during the design and testing phases. But somehow it was overlooked.

I suggest that the problem arose mainly from our normal experience of multichoice tests. They work well in identifying how much knowledge a student has in relation to other students, and for that purpose the exact score, even if inflated by the random effect, isn’t particularly significant.

However, VSTs use multichoice tests not so much to rank students as to obtain an absolute value: a certain number of words. So, if the original score is wrong, so is the number of words, and not in a trivial way, but by hundreds of percent.

By way of example, suppose I have two students to whom I give the Paul Nation 14,000 VST. One student (A) scores 4,000 words (uncorrected), while the other (B) scores 5,500 (uncorrected). It is clear that B has more vocabulary knowledge than A. However, uncorrected, both appear to have reasonably good learner vocabularies of 4,000 or more, and this may result in both students being treated in a similar way. For example, they may be put into the same class and given the same reading material and assignments.

However, after the scores have been corrected, to 667 (A) and 2,666 (B) respectively, it is apparent that student B knows nearly four times as many words as student A, a very substantial advantage in the early stages of learning a language. It would thus be much less appropriate to treat them in a similar way: A is very much a beginner while B is moving into intermediate territory.

It can be seen that the random effect, though apparently trivial, is not so when multichoice tests are used for vocabulary size testing at low levels.

Teachers who use multichoice tests of vocabulary size should therefore always ensure that the results are corrected.

To make this easier to do, I have prepared a web page which enables a table of corrected values to be created for any multichoice VST. It is available at:

https://www.mrbrook.net.nz/CalcVocab/multichoicecomp/MultichoiceCompensation.html
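For readers who prefer a script, a table like the one that page produces can be generated along these lines (a sketch only; it simply applies the general formula at intervals across the scoring range):

    # Print a table of corrected vocabulary sizes for a multichoice VST.
    # n = questions, m = options per question, w = words per correct answer.
    def correction_table(n, m, w, step=5):
        r = n / m
        for o in range(int(r), n + 1, step):
            v = (o - r) * n / (n - r) * w
            print(f"raw score {o:>3}  ->  {round(v):>6} words")

    correction_table(140, 4, 100)  # the Paul Nation 14,000 configuration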

 

Comments and suggestions welcomed, of course.

 

Jim Brook

February 2018.

 



[1] It is hoped that, within a reasonable amount of time, test writers will create tests which correct the flaw described here.

[2] This flaw, as discussed later, has a much smaller effect when the vocabulary is relatively high.

[3] A well-constructed test would have features such as: the same number of options for each question, and options which are all plausible.

[4] The random point is calculated as R = N/M, where N is the number of questions and M is the number of choices given for each question.

[5] Of course, in this case, the actual midpoint of 25 and 100 is 62.5 and, not being a whole number, such an original score will not occur.

[6] It is interesting to consider the significance of an original score of 0. This could only be obtained by a student who had sufficient knowledge to enter an incorrect answer for every question. This degree of knowledge would be comparable to that of a student who obtained an original score of 100!

[7] A student could, by chance, present a score of slightly less than 25. In this case the laws of probability come into play: a 4¹ (1 in 4) chance of a score of 24, a 4² (1 in 16) chance of a score of 23, etc. Ignoring this effect, the range does begin at 25.

[8] Of course, the starting point of the range will not always be 25. In a 100-item test with 5 options per question, the starting point will be 20; in a 200-question test with 4 options it will be 50. The general formula for calculating the random point is R = N/M, where N = number of questions and M = number of choices available in each question.