The (Mis)Use of Teaching Evaluations

Student evaluations of teaching (SETs) are standard practice in almost every Canadian university and college. These are in-class or online questionnaires that students fill out anonymously to rate and comment on the instructor and the course, with the results passed along to the instructor and, usually, to their supervisor.

But although SETs are standard practice, they’re also controversial. SETs can provide instructors with valuable feedback that they can then use to improve the course or their teaching – the so-called “formative” purpose of such evaluations. But SETs are also often used by universities and colleges as a measure of the quality of the instructor’s teaching – the so-called “summative” purpose. Using SETs for summative purposes can be a problem because there are many factors beyond the instructor’s control – such as the difficulty of the course material, the class schedule, the timing and content of the evaluation itself, and even the instructor’s gender or race – that can unduly influence students’ ratings. That is why faculty members and unions at several Canadian post-secondary institutions have pushed back against SETs being part of the evidence used to decide whether to promote instructors or give them tenure.

A few months ago, an arbitrator issued a decision in a grievance at Toronto’s Ryerson University over the use of SETs in the tenure and promotion process. The dispute that led to the arbitration was between the university and the union representing Ryerson’s faculty members, and it concerned how Ryerson was using the results of SETs. The union wanted the results of instructors’ SETs to be given less importance in, or removed entirely from, the university’s decisions on promotion or tenure. The university’s view was that SETs were an important part of those decisions, because including SETs meant that students’ perspectives were considered in the decision process. It says something about the complexity and importance of this issue that the university and the faculty union started formally discussing the use of SETs as long ago as 2003, with the grievance being filed in 2009.

The arbitrator stated in his decision that high-quality teaching should be expected from instructors, and that students’ satisfaction with their educational experience “is central to the University’s mission”. But he also found that “the most meaningful aspects of teaching performance and effectiveness cannot be assessed by SETs. Insofar as assessing teaching effectiveness is concerned – especially in the context of tenure and promotion – SETs are imperfect at best and downright biased and unreliable at worst.”

An example of a student evaluation of teaching (SET) form, from Brandeis University.

Here are some of the problems that the arbitrator identified with SETs and how they are used:

  • The SET questionnaire can include questions that students may not have sufficient knowledge to answer, such as whether the subject matter in the course is up to date.
  • SET questionnaires may have widely ranging response rates, which makes the results difficult to compare accurately across semesters or across courses. Also, if the response rate is low, it’s impossible to tell whether the ratings from the students who respond actually represent the opinions of all the students enrolled in the course.
  • Students doing well in a class generally rate instructors more positively than students doing poorly, which suggests that the rating has more to do with the students’ feelings about their grade than with the quality of the instructor’s teaching.
  • Universities and colleges often calculate averages of individual instructors’ SET scores and then compare the averages across departments, schools, faculties, or the entire institution. But these averages don’t reflect that SETs usually have several different components (e.g. course material, in-class teaching, instructors’ availability or helpfulness outside class time), and also don’t reflect the content of any written comments provided by students. Thus, comparisons of instructors’ average SET scores may not be fair to the instructors, because averages don’t accurately represent the outcomes of those SETs.
  • If instructors know that SET scores will be used for performance assessment, they may change their behaviour in ways that will maximize their SET scores – but these changes may not necessarily result in better teaching or a better learning experience for the students.
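
The point about averages can be shown with a little arithmetic. The sketch below is only an illustration: the two instructors, the three component names, and the 1-to-5 scores are all made up, and real SET forms will differ. It shows how two very different sets of ratings can collapse into the same overall average.

```python
from statistics import mean

# Hypothetical 1-5 ratings on three SET components. The component names and
# the numbers are invented for illustration, not taken from any real form.
instructor_a = {"course_material": 4.0, "in_class_teaching": 4.0, "availability": 4.0}
instructor_b = {"course_material": 5.0, "in_class_teaching": 2.0, "availability": 5.0}


def overall_average(component_scores):
    """Collapse the per-component scores into a single number."""
    return mean(component_scores.values())


def component_spread(component_scores):
    """Gap between the highest and the lowest component score."""
    values = component_scores.values()
    return max(values) - min(values)


for name, scores in (("A", instructor_a), ("B", instructor_b)):
    print(name, overall_average(scores), component_spread(scores))
```

Both instructors end up with the same overall average, yet instructor B’s ratings for in-class teaching are far lower than for the other two components. A comparison based only on the average would treat the two as identical.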

The arbitrator suggested that other methods of assessing teaching effectiveness, such as the instructor providing a teaching dossier of the materials they use in class, or the instructor’s teaching being observed in class by another instructor, were more accurate ways to assess the quality of an instructor’s teaching. The tenure and promotion process at Ryerson includes both of these methods.

The arbitrator ruled that the collective agreement between Ryerson and the faculty union must be amended to remove SETs from the faculty tenure and promotion process. He also ruled that the faculty union and the university must collaborate on improving the presentation and use of SET data, and that online SETs must be discontinued for face-to-face courses taught by probationary (pre-tenure) faculty.

(As part of the submissions in the case, the Ontario Council of University Faculty Associations commissioned two reports on the effectiveness of SETs. Both are excellent reads for anyone interested in the issue of SETs and how they are used.)

This decision has the potential to affect how SETs are used at other universities and colleges, in Canada or elsewhere. Some commentators have criticized this decision as meaning that students will no longer be able to evaluate their instructors, but that’s not true. The decision says that SETs have to be carefully designed so as to accurately capture students’ perceptions, and that SETs have to be used in ways that acknowledge their limitations. Student evaluations can provide important feedback for instructors, but they can also produce noisy data with irrelevant information that can have very serious consequences. This decision may be a significant incentive for post-secondary institutions to consider more meaningful ways to assess instructors’ teaching abilities.

One comment

  1. The ruling sounds both fair and based in common sense. While it’s good to give the students a voice and ensure the quality of their education, it can be misguided to use those evaluation scores to determine professors’ compensation. I have a college-professor friend who points out that “easy” and “fun” courses routinely get higher scores at her university than more rigorous classes such as biology or neuroscience, regardless of who the instructor is. I’ll be eager to see if this development in Canadian education works its way down across the southern border.
