The sound of metal chair legs scraping across a wood floor echoes through the empty gym as the coach flings a chair, screaming, “We were screwed!” I scurry away, fearing for my safety, at the end of a gymnastics meet. Good grief, I think to myself. Gymnastics is supposed to be a refined sport. I judged fairly.
In judged sports such as figure skating, diving and gymnastics, the public often perceives judges as being influenced by leotard and team color or nationality, the order of competitors, and the difficulty of the performance. On social media, comments about gymnastics judging by fans, parents, coaches and even gymnasts themselves have proliferated: “Judges look only at the leotard color and team!” “The last competitor up will get the highest score!” “Gymnasts with the hardest skills always win!” “Judges are biased!”
I have judged women’s gymnastics for four decades; my experience is that judges are not purposefully unfair. And yet, what does the research say? Are women’s gymnastics judges biased?
A research study published in 2019 examined how to detect bias and cheating in international gymnastics competitions held from 2013 to 2016. In this study, “Judging the Judges: Evaluating the performance of International Gymnastics Judges,” Swiss researchers Hugues Mercier of the Université de Neuchâtel and Sandro Heiniger of the Universität St. Gallen statistically analyzed scores for outliers that fell significantly far from the control score. The control score was defined as the median (middle) score of all judges who viewed a given routine.
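To make the idea of a control score concrete, here is a minimal Python sketch. It is a simplified illustration, not Mercier and Heiniger’s actual model, which is considerably more sophisticated: it takes the median of one panel’s scores as the control score, measures each judge’s distance from it, and flags any judge who falls beyond a threshold. The panel scores and the 0.3-point threshold are invented for illustration.

```python
from statistics import median


def control_score(scores):
    """Return the control score for one routine: the median of all judges' scores."""
    return median(scores)


def deviations(scores):
    """Return each judge's signed distance from the control score, rounded to hundredths."""
    control = control_score(scores)
    return [round(score - control, 2) for score in scores]


def flag_outliers(scores, threshold=0.3):
    """Return the positions of judges whose distance from the control score exceeds threshold.

    The 0.3-point threshold is a hypothetical choice for illustration only,
    not a value taken from the study.
    """
    return [i for i, d in enumerate(deviations(scores)) if abs(d) > threshold]


# Hypothetical five-judge panel for a single routine (invented scores).
panel = [9.5, 9.6, 9.55, 9.1, 9.5]
print(control_score(panel))  # 9.5
print(deviations(panel))     # [0.0, 0.1, 0.05, -0.4, 0.0]
print(flag_outliers(panel))  # [3]
```

In this made-up panel, the fourth judge’s 9.1 sits 0.4 below the 9.5 control score and would be flagged. A flag only says the score is unusual; it does not say which judge is right.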
The study found significant differences between the best and the worst judges in how close their scores came to the control score. The researchers suggested that their mathematical model could be used to determine which of the best judges should officiate at important competitions. International judges are aware that they are constantly being monitored, but the study did not indicate whether using this model reduced judges’ national bias.
In discussing the limitations of this model, the authors noted that a judge whose score is out of range (we sometimes call this being “out to lunch”) may in fact be correct while the rest of the panel is incorrect. I was pleased to see this caveat; I remember a few times on a four-judge panel when one judge caught an error that the rest of us had missed.
I chatted with a fellow judge about the concern that we “judge by leotard.” She laughed. “Teams change the colors of their leotards so often that, honestly, do you think we know which team has the pink, blue, or purple leotard this year?” Keep in mind that we primarily judge collegiate and developmental-level gymnastics (amateur athletes ages 6 to 18), not the international competitions studied in this research. So I do not put much weight on this study’s results for the meets we judge; researchers need to study non-international gymnastics competitions to assess whether judges there are biased by team or leotard color.
A coach once said to me, “Robin, I really like how you judge. You’re not biased, and you give good scores no matter what team the athlete is from.” I laughed and replied, “Well, you’re wrong. I’m VERY biased. I’m biased toward good gymnastics.”
Last year, while I was judging floor exercise at a meet, an out-of-state team competed. I didn’t know the name of the team or the coaches, but why would that matter? The gymnasts’ dance and musical expression were exquisite, and they tumbled high and beautifully, with pointed toes and legs squeezed together on all skills. Not every athlete from that team outscored the others in the meet, but the artistry, rhythm and dynamics of those routines were impeccable. So, when athletes from that team scored well, was it because we judges were judging by leotard? I don’t think so. We were judging based on good gymnastics!
Standing tall and confident, the young gymnast walked onto the floor exercise mat. Moving to the compulsory music (the same music and routine performed by every athlete), she stayed on high toe with every step. Her leaps split beyond 180°, and her back handsprings were done with legs squeezed together and toes pointed; this performance was beautiful. She ended with a back salto, a no-handed back flip, that rose a foot above her head. This amazing routine came from the FIRST competitor at a beginning-level state meet one year. The other judge and I awarded her performance a 9.8 out of 10.0, an almost perfect score for a developmental-level gymnast. She won.
According to a January 2017 article by Robin Kramer of the School of Psychology, University of Lincoln, our psychological biases mean that order matters when we judge items in sequence. The first and last items are remembered best, and they are also judged more positively. Researchers have identified this effect in a variety of situations, including ratings of students’ essays, perceived attractiveness, and the scoring of Olympic gymnasts.
Overall, order bias results in scores that escalate throughout a competition. Gymnastics coaches are aware of this and will often “stack” the lineup to build to the highest score with the last competitor. Other studies suggest it is better to go later in any judged event.
When I judge developmental gymnastics, the order of competition is random, so the likelihood of order bias is minimal. The potential for order bias is seen primarily at the collegiate level and only occasionally at developmental-level meets. However, as a judge, I KNOW about this research; I am always trying to spot when the coaches are trying to trick me.
At our pre-competition judges’ meetings, we always discuss judging consistently regardless of the gymnasts’ order, and we do our best to mitigate order bias. At these meetings, I often refer to my experience of awarding the top score of the day to the first competitor. As a judge, I believe it is imperative that the first competitor can earn the top score.
Another form of bias that has been studied in women’s gymnastics judges is difficulty bias. In 2020, Kurt W. Rotthoff of Seton Hall University wrote an article, “Revisiting difficulty bias, and other forms of bias, in elite level gymnastics.” His study found that athletes who attempt more difficult tasks are awarded artificially higher scores. It used data from the 2009 World Gymnastics Championships, where the competition order was set randomly, so the effect of difficulty could be separated from the effect of order.
The study found that difficulty bias exists: gymnasts who perform more difficult routines receive higher scores, and those who perform less difficult routines receive lower ones.
Personally, I have seen this bias in judges. I THINK that I am immune to it, but that is probably wishful thinking on my part.
“BOO! BOO!” The audience reacted because they didn’t like the 9.95 I awarded after they saw the other judge’s perfect 10.0. A gymnast had just completed a floor exercise routine that included a full-twisting double back (one twist and two flips), one of the most difficult tumbling skills performed at the time. However, her legs were noticeably apart on the skill, so I took a deduction for that error. My score probably should have been a 9.9… So yes, I have seen difficulty bias while judging and know that it can influence judges.
So, Judges Are Biased. Is Robot Judging a Panacea?
Researcher Robin Kramer suggests that the best way to remove human bias is computer analysis, which has already been used in synchronized diving, tennis, cricket, and several other sports.
Is robot judging, which uses computerized artificial intelligence to score routines, the answer? A robot judging system developed by the Japanese information technology company Fujitsu was tested at the 2019 World Gymnastics Championships and was slated to assist with scoring at the Tokyo 2020 Olympic Games (now delayed until 2021 due to the coronavirus pandemic).
Robot judging could eliminate inherent bias, but there are still issues to be worked out. One worry is the security of the computer systems; hackers could get in and sway results. Another concern, typical whenever technology advances, is that judges may lose their jobs.
Many believe that combining robot judging with human judges may be the best route forward; humans are currently better able to assess the artistic components of gymnastics than robots are. Who knows? In the future, could robots be programmed to recognize beauty and artistry?
I believe that in about 10 years, once the cost of the technology decreases, robot judging will be the new normal in gymnastics. I also think coaches and athletes will not like the lower scores, since a robot judge can catch errors that a human judge might miss.
However, I do not expect to still be judging in 10 years, so I watch with interest, but somewhat dispassionately, as the testing of robot judging moves forward. I think gymnastics judges need to be aware that it is on its way and be prepared for their role to change.
Yes, research finds that women’s gymnastics judges are biased in various ways. In the future, as technology is tested and improved, robot judging may be the way to mitigate judges’ biases.
Twice in my judging career a coach has thrown a chair across the gym and yelled about the scores. This was decades ago, and those coaches have long since retired. But it got your attention, which was the point of opening the essay with that event.
This essay was written for a class with a length limit of four to six double-spaced pages. The assignment was to weave together personal experience and research about a particular topic. There is an incredible amount of research out there about gymnastics judges and bias. USA Gymnastics is currently partnering with credible researcher(s) to study the role of implicit bias in judging. So stay tuned; there is more to learn about this topic.
Finally, I have edited out identifying information about gymnasts, locations, teams, coaches, and other judges. If I have missed anything, please let me know and I will revise further.