Researchers find that labels in computer vision datasets poorly capture racial diversity
Datasets are a primary driver of progress in computer vision, and many computer vision applications require datasets that include human faces. These datasets often have labels denoting racial identity, expressed as a category assigned to faces. But historically, little attention has been paid to the validity, construction, and stability of these categories. Race is an abstract, fuzzy notion, and highly consistent representations of a racial group across datasets could be indicative of stereotyping.
Northeastern University researchers sought to study these face labels in the context of racial categories and fair AI. In a paper, they argue that labels are unreliable as indicators of identity because some labels are more consistently defined than others, and because datasets appear to “systematically” encode stereotypes of racial categories.
Their timely research comes after Deborah Raji and coauthor Genevieve Fried published a pivotal study examining facial recognition datasets compiled over 43 years. They found that researchers, driven by the exploding data requirements of machine learning, gradually abandoned asking for people’s consent, leading them to unintentionally include photos of minors, use racist and sexist labels, and accept inconsistent quality and lighting.
Racial labels are used in computer vision with no definition or with only loose and nebulous definitions, the coauthors observe from the datasets they analyzed (FairFace, BFW, RFW, and LAOFIW). There are myriad systems of racial classification and terminology, some of debatable coherence, with one dataset grouping together “people with ancestral origins in Sub-Saharan Africa, India, Bangladesh, Bhutan, among others.” Other datasets use labels that could be considered offensive, like “Mongoloid.”
Moreover, a number of computer vision datasets use the label “Indian/South Asian,” which the researchers point to as an example of the pitfalls of racial categories. If the “Indian” label refers only to the country of India, it’s arbitrary in the sense that the borders of India represent the partitioning of a colonial empire on political grounds. Indeed, racial labels largely correspond with geographic regions, including populations with a range of languages, cultures, separation in space and time, and phenotypes. Labels like “South Asian” should include populations in Northeast India, who might exhibit traits more common in East Asia, but ethnic groups span racial lines and labels can fractionalize them, placing some members in one racial category and others in a different category.
“The often employed, standard set of racial categories — e.g., ‘Asian,’ ‘Black,’ ‘White,’ ‘South Asian’ — is, at a glance, incapable of representing a substantial number of humans,” the coauthors wrote. “It obviously excludes indigenous peoples of the Americas, and it is unclear where the hundreds of millions of people who live in the Near East, Middle East, or North Africa should be placed. One can consider extending the number of racial categories used, but racial categories will always be incapable of expressing multiracial individuals, or racially ambiguous individuals. National origin or ethnic origin can be utilized, but the borders of countries are often the results of historical circumstance and don’t reflect differences in appearance, and many countries are not racially homogeneous.”
Equally problematic, the researchers found that faces in the datasets they analyzed were systematically the subject of racial disagreements among annotators. All datasets seemed to include and recognize a very specific type of person as Black, a stereotype, while having more expansive (and less consistent) definitions for other racial categories. Furthermore, the consistency of racial perception varied across ethnic groups, with Filipinos in one dataset being seen less consistently as Asian compared with Koreans, for example.
“It is possible to explain some of the results purely probabilistically – blonde hair is relatively uncommon outside of Northern Europe, so blond hair is a strong signal of being from Northern Europe, and thus, belonging to the White category. But if the datasets are biased towards images collected from individuals in the U.S., then East Africans may not be included in the datasets, which results in high disagreement on the racial label to assign to Ethiopians relative to the low disagreement on the Black racial category in general,” the coauthors explained.
These racial labeling biases could be reproduced and amplified if left unaddressed, the coauthors warn, and could take on validity with dangerous consequences when divorced from cultural context. Indeed, numerous studies (including the landmark Gender Shades work by Joy Buolamwini, Dr. Timnit Gebru, Dr. Helen Raynham, and Raji) and VentureBeat’s own analyses of public benchmark data have shown facial recognition algorithms are susceptible to various biases. One frequent confounder is technology and techniques that favor lighter skin, which include everything from sepia-tinged film to low-contrast digital cameras. These prejudices can be encoded in algorithms such that their performance on darker-skinned people falls short of that on those with lighter skin.
“A dataset can have equal amounts of individuals across racial categories, but exclude ethnicities or individuals who don’t fit into stereotypes,” they wrote. “It is tempting to believe fairness can be purely mathematical and independent of the categories used to construct groups, but measuring the fairness of systems in practice, or understanding the impact of computer vision in relation to the physical world, necessarily requires references to groups which exist in the real world, however loosely.”