How well do Google image results represent reality? Comparing occupational census data against web-based search results
Images are a powerful medium for reflecting reality and also identifying bias. For example, how well do Google image search results for gender-neutral professions match the reality of the workforce? Oliver Duke-Williams of UK Data Service Census Support digs into 2011 census data and Google image search results to find out more.
Much has been written about Sir Tim Hunt’s remarks at the World Conference of Science Journalists in Seoul earlier this month. The debate has developed in a number of directions, including a discussion about the gender representation in images returned by Google’s image search, with a specific example being made of the male-dominated results when using the search term ‘professor’. Writing in The Guardian, Dame Athene Donald observed:
If you think that doesn’t matter, imagine you are a 12-year-old girl trying to get a sense of what the adult professional world is like. If the only images that appear against the search term of “professor” are either elderly white males or cartoons of men in white coats with sticking-up hair, as a girl you are hardly likely to think it is the sort of career aspiration you should be considering.
The representation of ‘professor’ is of course problematic in a number of ways: as well as being shown as male, professors are also shown as stereotypically balding and bespectacled. Similarly stereotype-driven images are de rigueur in children’s literature, as documented by Professor Melissa Terras. A natural response to this observation is to wonder what the gender representation of other jobs looks like through the prism of Google Images. Are they similarly one-sided? For example, although the Women’s World Cup is under way at the time of writing, searching for ‘footballer’ returns an entirely male set of results. As in the case of professors, this would not encourage a girl to think that football is a sport for all.
The results of the 2011 Census (for England and Wales) were used to identify a number of different occupations accounting for significant employment. Results are given using the SOC2010 classification of occupations. This is a hierarchical classification, broken down into increasingly detailed job descriptions. Two elements of this were used. Firstly, the top-level classifications were used, as counts by sex are available at this level. Secondly, the more detailed third level was used, although only a count of total persons is published at this level.
In each case, a gender-neutral search term was used based on the occupation description, and then the numbers of males and females shown in the images returned were manually counted, using the somewhat arbitrary (but consistently applied) metric of those results which appeared on screen with no scrolling. Cartoon and stylised images were included where gender was obvious; where more than one person was included in an image, all who were in focus and relevant were included. Persons in images were not counted if their gender was not obvious. In a number of cases, it was necessary to select the one person who matched the description: thus, when searching for ‘carer’ it was generally the case that each image depicted both the giver and the receiver of care, and only the giver of care was counted.
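The counting rule described above can be sketched in code. This is only an illustration in Python, not the procedure actually used (the counts were made by hand); the function name `gender_share` and the labels 'F', 'M' and 'unclear' are hypothetical choices made for the example.

```python
from collections import Counter


def gender_share(labels):
    """Return female and male shares for one search term.

    `labels` holds one entry per person visible on the first screen of
    results: 'F', 'M', or 'unclear'. People whose gender is not obvious
    are left out of the denominator, as in the manual count.
    """
    counts = Counter(labels)
    counted = counts["F"] + counts["M"]
    if counted == 0:
        raise ValueError("no people with an obvious gender were counted")
    return {
        "female": counts["F"] / counted,
        "male": counts["M"] / counted,
        "n": counted,
    }


# Made-up labels for illustration only, not data from the article.
example = ["F", "M", "M", "unclear", "F", "M"]
print(gender_share(example))
```

Leaving unclear cases out of the denominator keeps the two shares summing to one, but it also makes the already small sample smaller.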
Table 1 shows the results for the top level of the SOC coding, giving both the numbers employed as reported in the census, and the gender split as determined by assessing the Google image results. The SOC labels here are rather broad, and inevitably in representing this as a single search term a significant degree of generalisation is required. A suitable search term for the final ‘elementary occupations’ group was not found. The results are quite interesting: in most cases the gender balance is in the right direction, whilst not being close enough to suggest that this is an accurate mirror of society. The one category that is clearly wrong is ‘Administrative and secretarial occupations’, which is probably an artefact of the search term used.
There are some strong gender biases in the occupations shown in Table 1 – both in the census data and in the Google images data – perhaps reflecting long-run biases in recruitment or perceived job status.
We can also look at more detailed SOC2010 classified observations from the census, although in this case we do not know the gender balance of those actually employed in these positions. Google searches were made for the top 20 or so jobs (by number of persons employed), omitting those for which no suitable search term was found. Many of the jobs are shown with a strong gender bias. For some jobs, there are separate breakdowns of employment that allow us to find employment by gender. Thus, the search term ‘teacher’ gave results that were 83% female; the most recent data published by the Department for Education for the school workforce in England shows that 74% of teachers were female.
It should of course be remembered that the Google results are for a small set of images (those displayed on the first screenful of results) – one different image could alter the results quite easily. There are a small number of images in each case, typically portraying 20-40 people.
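To give a rough sense of how fragile these percentages are, the sketch below shows how far a share moves when a single counted person changes from one gender to the other, and computes a Wilson score interval for an observed share. It is an illustration under stated assumptions: the count of 25 women out of 30 people is hypothetical (chosen to give roughly the 83% reported for ‘teacher’), and the interval treats the first screen of results as if it were a random sample, which it is not.

```python
import math


def one_person_shift(n):
    """Percentage-point change in a share when one of n people is recounted."""
    return 100.0 / n


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# The article reports 20-40 people per search term.
for n in (20, 30, 40):
    print(n, "people: one recount moves the share by",
          round(one_person_shift(n), 1), "points")

# Hypothetical count: 25 of 30 people judged to be female.
low, high = wilson_interval(25, 30)
print("approximate 95% interval:", round(low, 2), "to", round(high, 2))
```

Under that assumed count, the interval is wide enough to include the 74% figure from the Department for Education, so a gap of this size on its own is weak evidence of over-representation.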
An outlier is ‘civil servant’ – a term used because ‘government worker’ as a search term largely returned caricatures etc. depicting government workers as idle – which is shown to be strongly male dominated, whereas in reality civil service employees are 56% male for full-time employees, or 53% female when both full-time and part-time employees are counted.
It could be argued that many of these sets of results seem to emphasise gender difference: traditionally male roles and female roles are both represented in a more stereotypical way than is actually the case. It is left as an exercise for the reader, but a working hypothesis might be that in Google image searches, women tend to be under-represented in high-status jobs and over-represented in lower-status jobs. You might also want to consider representation by ethnic group in this way.
Finally, we return to questions about how academia and science might be presented in image search results. Another set of searches was carried out for some academic / science related job titles (Table 3). There are strong biases in the results. We learn that both lecturers and professors are largely male, with apparent problems for women who would hope to make a career progression. Scientists are mostly male, whereas lab assistants are mostly female. Nobody in any of the images for any of these jobs appeared to be crying. Finally, a search was made for ‘doctor’, and again the results were overwhelmingly male, although clearly doctors are medical rather than academic.
Compared with data from HESA, these Google results significantly under-represent female employment in higher education (in the most recent data, 22% of professors and 45% of all academic staff are female).
We are however left with a dilemma. Would we like Google results to represent actual employment, or the employment pattern that might exist in a less prejudiced society?
Oliver Duke-Williams is Co-Director of the Census & Administrative data LongitudinaL Studies Hub (CALLS-Hub) [calls.ac.uk] and is a co-investigator in the Centre for Longitudinal Study Information and User Support (CeLSIUS) [ucl.ac.uk/celsius] project which facilitates access to the ONS Longitudinal Study. He is also a co-investigator in Census Support, a value added service of the UK Data Service, where he leads on access to flow data [wicid.ukdataservice.ac.uk], and a co-investigator in the Administrative Data Research Centre – England [adrn.ac.uk/about/research-centre-england]. Oliver tweets @oliver_dw