Writing Exam Questions III

October 2, 2021   Martin Jones
Weighting and marking exam questions for both taker and marker.

Reliable exams require good weighting - the allocation of marks - and reasonable marking schemes. This, the third post in a series on writing exam questions, provides suggestions for both these areas.

If you prefer, go straight to the checklist to get an overview.

As with the other posts in the series, this represents my personal experience and not the results of a formal study.

Weighting and Marking

Each exam should be an opportunity for the taker to demonstrate their competence. Clear communication of question weighting lets them choose to answer to their strengths, especially when time is a factor.

It is also worth considering the role of the marker. A marker’s job is to look for proof of the exam taker’s competence through the evidence of the exam answers. It’s not adversarial: no points are awarded to a marker for being harsh, and people with mean streaks should not mark. A marking scheme helps a marker remain consistent and thus improves the reliability of the exam. However, marking schemes cannot cover every possibility, so they’re just guides; markers should be permitted to award marks for relevant answers that the scheme does not cover.

W1: Marks are Stated

The marks available for a question must be clearly written with the question. They should be visually distinct from the main body of the text. For example, [2] is clearer than (2 marks), especially when breaking a question down.

The start of the exam should state how many questions there are, the total number of marks, and the duration of the exam. Exam takers feel more comfortable knowing this information up front, even though it changes nothing practical about taking the exam.

W2: Marks are Broken Down

Break down marks to make their allocation clear.

  • Building your C++ project causes the following error:

    /usr/bin/ld: /tmp/ccuLawBt.o: in function `main':
    x.cpp:(.text+0x10): undefined reference to `f()'
    collect2: error: ld returned 1 exit status


    State a likely cause of this error. Describe one possible fix. [3]
    • wrong

In the example above, it is unclear how many points are awarded to stating the likely cause compared to describing a possible fix. In fact, it could be read that there are three marks for describing a possible fix, and none for stating a likely cause.

  • Building your C++ project causes the following error:

    /usr/bin/ld: /tmp/ccuLawBt.o: in function `main':
    x.cpp:(.text+0x10): undefined reference to `f()'
    collect2: error: ld returned 1 exit status


    State a likely cause of this error [1]. Describe one possible fix [2].
    • correct
Now the allocation of marks is clear, and we have an idea of relative weighting.

W3: Marks are Proportionate to Effort Only

The marks should represent the amount of effort expected in an answer. They will normally correlate with the directives: in the question above, ‘describe’ expects more effort than ‘state’.

Exam writers mistakenly break this rule when they assign more weight to a question because it is a more important topic. The intent is noble - improve exam validity. Alas, it reduces reliability by reducing the utility of the marks for an exam taker.

An important topic should be strongly represented. Do so by either having more parts to the question or writing more than one question for it. If this makes the exam too long, then consider whether it is more valid to have focus or breadth.

W4: Write a Perfect Answer

Write a perfect answer for each question. After doing so, you may find that the marks need to be adjusted. You may even wish to start with this before thinking about marks.

Here is what I would expect for the question above:

No translation unit contains a definition of the function f [1]. Find (or create) the .cpp that corresponds to the .h containing the declaration of f [1]. Add the definition of f into that .cpp [1].
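
As a rough sketch of what that answer describes - the file names f.h and f.cpp are assumptions for illustration, since the question shows only the linker error:

    // f.h - contains the declaration of f that x.cpp already sees
    void f();

    // f.cpp - the translation unit that was missing; adding the definition
    // here resolves the undefined reference
    #include "f.h"

    void f() {
        // ... implementation ...
    }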

Writing a perfect answer should confirm that both the question and its mark allocation are reasonable. It also shows that, for full marks, a good answer includes not just what to do, but where to do it.

M1: A Marking Scheme is Just a Guide

The marking scheme is only a guide and should not be followed unquestioningly. Consider the question above, and its ‘correct’ answer. It’s not the only correct answer. If a taker puts the definition in file f.cpp, then that might be considered correct, or partially correct, depending on the level of the course.

A taker might provide a completely different answer, e.g. that the declaration in a .h does not match the definition in the .cpp, giving us accidental overloading. The question deliberately uses a function that takes no parameters to make it less likely that a taker would answer this way, but it is still a possible reason.
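
A minimal sketch of that alternative scenario - the int/long signatures are invented for illustration, since the mismatch needs a parameter to exist:

    // f.h
    void f(int);     // the declaration that callers see

    // f.cpp
    #include "f.h"

    void f(long) { } // the definition accidentally takes a long, creating a
                     // second overload; f(int) remains undefined, so any
                     // call such as f(42) still fails at link time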

M2: Allow Half-marks, but Don’t Require Them

A marker should be free to award half marks where it makes sense. The marking scheme itself should avoid assigning half marks. Doing so is burdensome for the marker and could be a sign that marks are not being allocated for effort. Here is an extreme example:

  • Function overload resolution ranks conversions of parameters to determine which function best matches a function call. Complete the following table to say which parameter type is preferable for the given argument. Use either number 1 or 2 to indicate the parameter, or write 'neither' if neither is considered preferable. The first two rows have been completed as examples, and x has been introduced as:

    int x;

    Argument | Param 1    | Param 2 | Answer
    7        | int        | int&    | 1
    x        | int        | int&    | neither
    x        | const int& | int&    |
    x        | int&&      | int&    |
    7        | int&&      | int&    |
    x        | const int  | int&    |
    'A'      | int        | double  |
    5.0      | int        | float   |

Marking Scheme
Argument | Param 1    | Param 2 | Answer  | Marks
7        | int        | int&    | 1       | 0
x        | int        | int&    | neither | 0
x        | const int& | int&    | 2       | 0.25
x        | int&&      | int&    | 2       | 0.25
7        | int&&      | int&    | 1       | 0.25
x        | const int  | int&    | neither | 0.25
'A'      | int        | double  | 1       | 0.25
5.0      | int        | float   | neither | 0.25

Award 0.25 marks for each correct answer, rounding to the nearest half mark.

  • wrong

This question is difficult to mark, and the marks are not clear to the exam taker. Imagine how much harder it would be if the answers were not one of three possibilities! The rounding instruction makes it slightly less reliable as well.
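
Should you want to verify rows like these, a small program (my own sketch, not part of the question) makes the compiler’s choice visible:

    #include <iostream>

    void f(int&&) { std::cout << "int&&\n"; } // Param 1 of rows four and five
    void f(int&)  { std::cout << "int&\n"; }  // Param 2

    int main() {
        int x = 0;
        f(7); // prints int&&: an rvalue reference binds to the prvalue 7
        f(x); // prints int&: an rvalue reference cannot bind to the lvalue x
    }

Swapping in the other parameter pairs from the table reproduces each row, with the ‘neither’ cases failing to compile as ambiguous calls.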

M3: Consider the Marker

When writing questions, please consider the marker. Questions that are difficult to mark, such as the one above, make it harder to stay focused and unbiased. Most exam takers may not believe it, but taking is more fun than marking.

W5: No Giveaway Marks

Giveaway marks come from questions that are easily guessed. Consider the following Scala question:

  • Consider the following code:

    trait GEMControlled {
      def logGEMMessage(s: String)
    }

    class WaferShim extends GEMControlled {
    }


    State what happens when we compile this file [1]. Explain, with a code example, how to fix this error [2].
    • wrong

Regardless of whether you know Scala, can you guess what happens when we try to compile that code? It fails, of course. Why would a question ask about something as mundane as compilation succeeding?

A giveaway mark, such as this, makes the exam less reliable. We could ask for the details of the error message, but it’s unfair to expect exam takers to remember actual error messages.

Consider the improvement below:

  • Consider the following code:

    trait GEMControlled {
      def logGEMMessage(s: String)
    }

    class WaferShim extends GEMControlled {
    }


    Compilation fails and the following text is included in the error message: '... class WaferShim needs to be abstract ...'.

State why the compiler says WaferShim should be abstract [1]. Explain, with a code example, how to fix this error without making WaferShim abstract [2].
    • correct
We avoid the giveaway by literally telling the taker that compilation fails. We even provide the error message, or at least part of it.

We can then get into why it’s a compile error - a far more interesting question. The second part of the question makes it clear that there is a solution that’s not acceptable: an exam taker could legitimately answer the earlier version just by marking WaferShim abstract. A sketch of an acceptable fix follows.
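
For completeness, here is what the perfect answer’s code example might look like (the method body is invented for illustration):

    trait GEMControlled {
      def logGEMMessage(s: String)
    }

    class WaferShim extends GEMControlled {
      // implementing the abstract method satisfies the trait, so
      // WaferShim no longer needs to be abstract
      def logGEMMessage(s: String) = println(s)
    }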

W6: Do not Reward Mirroring Negatives

This is a specific case of a giveaway mark but can be a difficult one to fix. Consider the question below:

  • You have a choice of deploying your project as either a web application or a desktop application. Describe two advantages and disadvantages of each method [4].
    • wrong
Updates are trivial to deliver to web applications, but not to desktop applications. Does writing that count as two marks? If you state one side, you automatically have the other.

Rather than try to rewrite the question, I suggest eliminating it. It’s a recall-based question and, as such, isn’t valid, so its reliability is moot.

M4: Each Paper is Individual

Marking papers can be frustrating, but the marker must remember that each paper was written by an individual. The moment I find myself thinking something like, ‘it’s the tenth time I’ve seen this mistake, why do you keep repeating it’, I have to take a break. A marker will see the same mistake again and again, but they must remember it is the first time this taker is making it.

M5: Mark Each Question in a Random Order Across Papers

I recommend marking the same question across all papers, and doing so in a random order each time. Marking the same question across all papers leads to greater consistency for that question.

A marker’s temperament is likely to change while marking a question; it would be unrealistic to think otherwise. Even judges can’t keep an even temperament, and that’s their very important job. Marking in a random order distributes any fluctuations fairly.

M6: Be Forgiving of Non-Technical English

Errors in English should be ignored if the meaning remains clear and correct. This is the case regardless of whether the taker has English as their primary language.

If an error changes the meaning of an answer, however, and we cannot see the taker’s intention, then I suggest not giving the marks. As markers we should be looking for the taker to demonstrate competence, but they have to do so clearly. It’s the marker’s job to award marks, not to find them.

Technical terms and phrases relevant to the question that are used incorrectly should not be marked as correct.

M7: No Arbitrarily Specific Marks

The marking scheme should provide freedom to the markers to determine what is correct. This is where tying the question back to a learning point can be useful.

If, however, the marking scheme contains very specific marks, it may turn into a game of ‘can the taker guess what the exam writer was thinking’.

You can mitigate this by having someone other than the question writer assign the marks and write the marking scheme.

M8: Reasonable Marking Schemes can be Shown to Takers

Always imagine that you will be showing the marking scheme to takers at some point after the exam. Maybe in a group review or an individual performance review. This will help everyone writing the mark scheme to ensure it is reasonable and fair. Those arbitrarily specific marks we mentioned above can’t be allowed to remain.

I have been in the situation of performing a review with a failing student, only to look at their answer and see that it was perfectly reasonable and that it was the marking scheme that was not. No one expects exam writers to be infallible, but they must strive for it.

I have also been in the rather uncomfortable position of watching a fellow instructor run a review session for an audience of takers with a mark scheme that was not as reasonable as it should have been. He compounded this by clinging to the marks provided by the scheme rather than stressing that it is merely a guide and that its answers were not the only ones. His reputation and the team’s reputation took a hit that day.