
I see two arguments in favour of retaining extensive automated testing: one that I consider invalid, and one that is somewhat valid.

The invalid argument is the classic “reliability and validity” argument from educational measurement and test theory: automated tests are said to be a fair judge of a student’s ability, whereas the kind of assessment needed for the types of learning described above would be subjective and unreliable. For now I won’t dispute the second part of this argument, but there is a fundamental, and rarely discussed, problem with the claim that fairness arises from the reliability of automated assessment.

Educational measurement, if it is to be valid, needs to meet the requirements of “scientific”, or true, measurement. Scientific measurement requires that the underlying attribute being measured (in this case a student’s ability in a particular area) is quantitative (like length) and not merely qualitative (like colour). Showing that an attribute is quantitative is not simply a matter of assigning numerals to things; it requires a scientific study of whether the underlying attribute has the “structure” that a quantity must have.

For something like length this is easy to establish, as we can both compare and add lengths. For other attributes (such as density, or potentially educational abilities), we cannot add objects or people together, but we can potentially order them. The discovery of conjoint measurement provides a method of testing such ordered structures to see whether they are also quantitative.
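To give a flavour of what such a test involves, here is a minimal sketch in Python, using purely hypothetical score tables, of the “double cancellation” condition from conjoint measurement (Luce and Tukey, 1964). Double cancellation is a necessary condition for a table of ordered scores to admit an additive, and hence quantitative, representation; a single violation rules one out.

from itertools import permutations

def satisfies_double_cancellation(scores):
    # scores[a][x] is the observed score for row-level a (e.g. a student
    # factor) and column-level x (e.g. an item factor).
    # Double cancellation: whenever (a, y) >= (b, x) and (b, z) >= (c, y),
    # then (a, z) >= (c, x) must also hold. A violation means no additive
    # representation scores[a][x] ~ u(a) + v(x) can exist.
    rows, cols = range(len(scores)), range(len(scores[0]))
    for a, b, c in permutations(rows, 3):
        for x, y, z in permutations(cols, 3):
            if scores[a][y] >= scores[b][x] and scores[b][z] >= scores[c][y]:
                if not scores[a][z] >= scores[c][x]:
                    return False
    return True

# Hypothetical 3x3 tables (rows: one factor, columns: another).
additive = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]       # consistent with u(a) + v(x)
interacting = [[0, 1, 0], [1, 2, 2], [2, 1, 1]]    # violates cancellation
print(satisfies_double_cancellation(additive))     # True
print(satisfies_double_cancellation(interacting))  # False

Passing this test does not by itself prove that the attribute is quantitative (further cancellation conditions must also be checked), but a single failure is decisive evidence against it.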

So if one applies the rigorous requirements of scientific measurement to educational scores, what do we find? Well, when I last looked into this field deeply*, there was no robust evidence that educational measurement is quantitative. If this is the case, then adding scores together in education does not arrive at a meaningful result: creating an “overall” score is invalid (as the sketch below illustrates), because the numerals being added together are not based on a demonstrably quantitative attribute. And if this is the case, then we don’t actually have fairness, as the reliability and validity that we appear to have are built on a false foundation.

*For a detailed version of this argument, see Dalziel (1998).
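The problem with “overall” scores can be shown with a small example, again with hypothetical numbers: if item scores carry only ordinal information, then any order-preserving relabelling of the scale is equally legitimate, yet different relabellings can reverse the ranking of students by total score.

# Hypothetical ordinal item scores (1 < 2 < 3) for two students.
student_a = [3, 1]
student_b = [2, 2]

# Two equally legitimate order-preserving relabellings of the scale.
for relabel in ({1: 1, 2: 2, 3: 10}, {1: 1, 2: 5, 3: 6}):
    total_a = sum(relabel[s] for s in student_a)
    total_b = sum(relabel[s] for s in student_b)
    print(total_a, total_b)  # first relabelling: 11 vs 4; second: 7 vs 10

Unless the attribute is demonstrably quantitative, which student “scores higher overall” depends on an arbitrary choice of numerals rather than on anything in the students themselves.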

If automated testing produces scores that are not real measurements but spurious numerals, and given that automated testing has such a great impact on the way students learn (and on how teachers teach), then I believe there is an argument for a fundamental change in the way education is conducted in the US (and elsewhere). If automated testing is rejected, and the types of learning described above are valued, then the alternative approach to education could look more like typical Learning Design sequences.

The second, somewhat valid defence of extensive automated testing is that any alternative would involve enormous human effort on the part of educators. Conducting rich assessment with feedback and dialogue for each individual student would take a great deal of time, and educators are already incredibly busy, so it is hard to see where this time could come from.

Source: OpenStax, The impact of open source software on education. OpenStax CNX. Mar 30, 2009. Download for free at http://cnx.org/content/col10431/1.7