What Skating Judging can Learn from Workplace Performance Evaluation

At every Winter Olympics, it seems, there are complaints about figure skating judging. Occasionally those complaints lead to something more – as in 2002, when a second gold medal was awarded in the pairs event because of alleged bias in the judging. But usually the complaints are along the lines of “The judging was unfair because my favourite skater lost”, or “The judging was unfair because I didn’t understand it” – that second one often coming from sportswriters and commentators who don’t regularly follow figure skating, or who can’t be bothered to learn how the judging system works.

At the 2014 Sochi Olympics, there were complaints about the judging in every one of the figure skating events, including allegations of fixed results in at least two of the events. The purpose of this post isn’t to argue about those results. Instead, I want to look at the judging system itself, and analyze it using the model of an effective workplace performance evaluation system. I’m using this model for two reasons: 1) I teach courses in human resource management, so it’s an evaluation model that I’m familiar with, and 2) figure skating officials could learn a lot from the principles underlying this model and from its applications.

The current international figure skating judging system, called the ISU Judging System, the Code of Points, or the International Judging System (IJS), dates from 2004. It was implemented by the International Skating Union, figure skating’s international governing body, in response to the 2002 judging scandals. Under the previous judging system, called the ordinal system or the 6.0 system, a panel of judges ranked skaters’ performances on two criteria (presentation and technical merit/required elements), using a scale of marks with a maximum of 6.0. The IJS likewise uses a panel of judges, along with a referee, but it adds a technical specialist (the “caller”) and an assistant caller. The caller, using the skater’s planned program sheet as a guide, assigns a level of difficulty, from 1 to 4, to the technical content of each element in a program as it is performed. Elements get different amounts of points for each level of technical difficulty. The judges rate the quality of the performance of the element – the “grade of execution” – on a scale from -3 to +3. The average “grade of execution” rating for each element is added to or subtracted from the points for the technical difficulty of the element, and the resulting numbers are added together to get the technical part of the score. There are also five program components, each with an assigned value, and the judges mark these on a scale of 0 to 10. The average of the judges’ marks for each component is multiplied by the component’s value, and the total of those five calculations is the program component score. The overall score for the program is the sum of the program component score and the technical score. (The ISU’s more formal and detailed explanation of the IJS is here.)
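For readers who find arithmetic easier to follow in code, here is a minimal sketch of the calculation as I have just described it. It assumes a plain average of the judges' marks and does not model any of the finer points of the official rules. The function names, point values, component factors, and marks are all made up for illustration; none of them come from an ISU document or a real protocol sheet.

```python
from statistics import mean


def element_score(base_value, goe_marks):
    """Points for the called level of difficulty plus the average
    grade of execution (each judge's mark runs from -3 to +3)."""
    return base_value + mean(goe_marks)


def technical_score(elements):
    """Sum of the element scores. `elements` is a list of
    (base_value, goe_marks) pairs, one pair per element."""
    return sum(element_score(base, goes) for base, goes in elements)


def component_score(components):
    """Sum over the program components of (assigned value x average mark).
    `components` is a list of (value, marks) pairs, marks from 0 to 10."""
    return sum(value * mean(marks) for value, marks in components)


def total_score(elements, components):
    """Overall score: technical score plus program component score."""
    return technical_score(elements) + component_score(components)


# Hypothetical program: two elements and five components, three judges.
example_elements = [
    (6.0, [1, 0, 2]),     # well-executed element, positive marks
    (5.1, [-2, -1, -3]),  # flawed element, negative marks
]
example_components = [
    (1.6, [7.0, 7.5, 8.0]),
    (1.6, [7.0, 7.0, 7.0]),
    (1.6, [6.5, 7.0, 7.5]),
    (1.6, [7.5, 7.5, 7.5]),
    (1.6, [7.0, 8.0, 7.5]),
]

print(round(total_score(example_elements, example_components), 2))
```

The point of laying it out this way is that every number in the final score can be traced back to either the caller's decision (the base value) or the judges' marks, which is what makes the published score sheets readable after the fact.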

Now obviously there are some differences between evaluating an employee’s job performance at work and judging a skater’s performance at a skating competition. For example:

A skating competition is not a workplace, and skaters are not employees. So the relationship between the evaluators and the evaluatees in skating is not quite the same as it is in a work organization.
In a workplace, an evaluator should observe the employee at work regularly enough to have an informed opinion on the quality of their performance. In skating, the panel of judges isn’t the same at every competition, and the competitors aren’t the same either. So a skating judge evaluating a skater’s performance is usually working more from their knowledge of the sport, the judging system, and the event’s requirements than they are from their knowledge of a specific skater.
A skating competition has to have a winner, so a system to judge skating has to produce a ranking of all the performances. A performance evaluation in the workplace isn’t usually structured to produce a ranking; it usually focuses more on documenting the individual employee’s performance and giving feedback to the employee.
The information from workplace performance evaluations is often used as a basis for other decisions affecting employees, such as promotions or pay raises. The results of some skating competitions might determine which skaters qualify to participate in another competition, but that use of the outcomes isn’t an explicit goal of the judging system.
A workplace performance evaluation usually isn’t made public. Scores from international skating events, including the details of the points awarded by the judges, are always published.
Judges in figure skating competitions are doing their evaluations a lot more quickly than performance evaluators in workplaces.

However, I would argue that there are more similarities than there are differences between the two systems. In both systems, trained evaluators use pre-determined criteria to evaluate performance. There are multiple evaluation criteria, each designed to address a specific component of the performance. The criteria are accompanied by written standards specifying the requirements to achieve different levels of complexity. The criteria are associated with a range of numerical values used to differentiate between less than perfect, perfect, and exceptional performance. And both systems, ideally, give specific and detailed feedback to the evaluatees, so that they can identify areas for improvement and make changes to their performance in the future.

An example of IJS marking: the long program marks for Japan’s Kanako Murakami at the 2013 World Championships. Note that on element 1 (triple lutz jump) and element 6 (triple loop jump) the judges’ marks range from -2 to 1, while the marks for almost all the other elements are either all positive or all negative. (credit: isuresults.org)

So in that context, let’s look at some of the characteristics of an effective workplace performance evaluation system, and see how the IJS measures up against those characteristics.

Validity. The criteria used in a job performance evaluation should be relevant to the performance and to the job being evaluated; should be designed to accommodate any performance variations that are beyond the control of the person being evaluated (e.g. sales volumes may not be a fair measure of a salesperson’s performance in a geographic region with a poor economy); and should be broad enough to cover all the aspects of the job. The IJS is relatively strong in this area, since it has clearly stated and extensive evaluation criteria. But the IJS system gives higher points to some moves, like this spin, that may not be physically possible for all skaters to accomplish. And a points-based system may also encourage skaters to include high-scoring elements in their programs that don’t really fit the program or its music.

Reliability. In an effective workplace performance evaluation system, the same criteria used by similarly trained evaluators to assess similar performances should produce comparable results across time. Reliability is a little tricky to achieve when evaluating skating, because skaters don’t perform their programs in competition as regularly as most employees perform their job tasks, and skaters don’t perform their programs in the exact same way every time. However, even with those considerations, the application of the IJS does have some reliability problems. As shown in the example depicted above, there can be variations in marks that are wide enough to call into question either the training of the evaluators or the interpretation of the rating criteria.
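One rough way to spot this kind of disagreement on a score sheet is to look at the gap between the highest and lowest mark each element received. The sketch below does that. The marks are invented for illustration (they are not taken from any real protocol), and the threshold of 3 is my own assumption, not an ISU rule.

```python
def mark_spread(marks):
    """Gap between the highest and lowest judge's mark for one element."""
    return max(marks) - min(marks)


def flag_disagreement(panel_marks, threshold=3):
    """Return the element numbers whose marks differ by at least
    `threshold` points across the panel (threshold is an assumption)."""
    return [
        element
        for element, marks in panel_marks.items()
        if mark_spread(marks) >= threshold
    ]


# Hypothetical grade-of-execution marks from a five-judge panel.
hypothetical_marks = {
    1: [-2, -1, 0, 1, -1],  # judges split between negative and positive
    2: [1, 1, 2, 1, 1],     # judges broadly agree
    3: [-1, -2, -1, -1, -2],
}

print(flag_disagreement(hypothetical_marks))
```

A wide spread doesn't prove that anyone judged badly, but an element where trained judges watching the same jump disagree by three points on a seven-point scale is exactly where I would start asking questions about training or about the wording of the criteria.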

Appropriately structured performance measures. Measures of workplace performance should be clearly worded so that everyone using the system understands what is being measured, and how it is being measured. Evaluation criteria such as “has a good attitude” are not good measures of employee performance, because they can mean different things to different evaluators. The IJS has many measurement criteria, with detailed explanations of each one, but I would say that there are wording problems with some of them – for example, it’s not completely clear how the program components of “interpretation” and “performance” are different from each other. Another example of unclear criteria is that each fall gets a one-point deduction, but it is not explicitly stated whether the fall’s effect on the program should also be reflected in the scoring of other criteria, such as skating skills or performance/execution. And some qualities of a skating program, such as musical interpretation, are difficult to measure numerically because they’re intangible and subjective. The IJS criteria address some of these qualities, but there can be very different assessments of these qualities based on the observer’s own preferences or frame of reference.

Trained raters/evaluators. In an effective workplace performance evaluation, the raters or evaluators understand the purposes of the system and what its output will be used for, and are trained to have the same understanding of how to use the system’s components. This ensures consistency in how the system is applied, and reduces the potential of evaluator bias. Skating judges need many years of experience at local, regional and national levels before they are even eligible to be considered for international-level judging – and once they are identified as eligible, they have to go through a lot of training and practice judging before they can actually judge an international event. Judges also have to keep up with clarifications of the rules throughout each competitive season, in addition to having their own judging at each competition evaluated (this is true for technical specialists as well). So the IJS system puts a high priority on having trained raters and users.

However, while training can reduce rater bias, it can’t eliminate it. And since skating is a relatively small community – with, for example, retired skaters moving into roles as coaches, officials, judges or administrators – it’s hard to reduce the perception of skating judging as potentially biased, because there are so many formal and informal connections among the skating community. The IJS is probably stronger than other marking systems in avoiding some forms of bias, because it’s a more objective and quantitative system than others. But the perception of the potential for bias is reinforced by the ISU’s unwillingness to deal effectively and decisively with judges caught cheating.

Appeal mechanisms. An effective workplace performance evaluation system needs an appeal process to resolve disputes over its use or its outcomes. A visible, available, and transparent appeal system creates confidence in the performance evaluation system and in its administrators. The opportunities for appeals in the IJS system, however, are limited. An appeal of a competition’s results has to be in writing, and it has to be submitted “immediately” after the event concludes. In some ways, this restriction is understandable given the compressed schedule of a competitive skating event. However, it also eliminates the possibility of an appeal if problems or evidence of problems do not emerge until some time after an event is over. The ISU’s procedure for dealing with an appeal is also not well defined. So the lack of a viable or understandable appeal process is definitely a weakness in the IJS system.

But beyond the mechanics of an evaluation system, there’s another extremely important factor in the effectiveness of the system – how effective and fair the system is perceived to be. In workplaces, the perception of fairness is as important as the actual fairness in building confidence in the organization and in the evaluation system. There’s a large amount of workplace research on what’s called procedural justice (how fair the process is) and distributive justice (how fair the outcome is). The two kinds of justice are related, but the research on both shows that very often, even if an organization’s members are dissatisfied with an outcome, they will still accept it if they perceive that there was a fair procedure leading to that outcome.

To be blunt, while the IJS has contributed to making judging processes fairer – or at least to making their operations more visible – the ISU is terrible at managing perceptions of unfairness in its procedures and outcomes. Here, for example, is the ISU’s formal response to the criticisms of the judging of the ladies’ figure skating event at the recent Olympics. As you can see, this statement addresses none of the specific allegations of biased or inaccurate judging. It simply repeats how the judging system operates, and expresses the ISU’s confidence in the system – without saying why it has that confidence, or how or if the system worked to counteract whatever allegedly went wrong with the judging. Sadly, this response is typical of how the ISU responds to criticisms of judging – and here is where the ISU could really learn something from workplace performance evaluation systems. Criticisms of evaluation processes or outcomes don’t just disappear because the organization ignores them, and if criticism is routinely ignored, over time that undermines the credibility of the evaluation system and of the organization as a whole. The IJS has helped the ISU, and competitive skating as a whole, regain some legitimacy since 2002 – but the ISU risks losing that legitimacy by continually downplaying any and all criticism of skating judging. And that’s the type of context within which even the best-designed evaluation system can’t be effective.

All About Work

News & Views on Work and Organizations
