Showing 1 - 10 of 20 for search: '"Zhao, Dora"'
Machine learning (ML) datasets, often perceived as neutral, inherently encapsulate abstract and disputed social constructs. Dataset curators frequently employ value-laden terms such as diversity, bias, and quality to characterize datasets. Despite th…
External link:
http://arxiv.org/abs/2407.08188
Authors:
Hirota, Yusuke, Andrews, Jerone T. A., Zhao, Dora, Papakyriakopoulos, Orestis, Modas, Apostolos, Nakashima, Yuta, Xiang, Alice
We tackle societal bias in image-text datasets by removing spurious correlations between protected groups and image attributes. Traditional methods only target labeled attributes, ignoring biases from unlabeled ones. Using text-guided inpainting mode…
External link:
http://arxiv.org/abs/2407.03623
Authors:
Zhao, Dora, Scheuerman, Morgan Klaus, Chitre, Pooja, Andrews, Jerone T. A., Panagiotidou, Georgia, Walker, Shawn, Pine, Kathleen H., Xiang, Alice
Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of…
External link:
http://arxiv.org/abs/2406.06407
Authors:
Papakyriakopoulos, Orestis, Choi, Anna Seo Gyeong, Andrews, Jerone, Bourke, Rebecca, Thong, William, Zhao, Dora, Xiang, Alice, Koenecke, Allison
Speech datasets are crucial for training Speech Language Technologies (SLT); however, the lack of diversity of the underlying training data can lead to serious limitations in building equitable and robust SLT products, especially along dimensions of…
External link:
http://arxiv.org/abs/2305.04672
Authors:
Andrews, Jerone T. A., Zhao, Dora, Thong, William, Modas, Apostolos, Papakyriakopoulos, Orestis, Xiang, Alice
Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed through nonconsensual web scraping lack crucial metadata for comprehensi…
External link:
http://arxiv.org/abs/2302.03629
Authors:
Ramaswamy, Vikram V., Lin, Sing Yu, Zhao, Dora, Adcock, Aaron B., van der Maaten, Laurens, Ghadiyaram, Deepti, Russakovsky, Olga
Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, a…
External link:
http://arxiv.org/abs/2301.02560
As computer vision systems become more widely deployed, there is increasing concern from both the research community and the public that these systems are not only reproducing but amplifying harmful social biases. The phenomenon of bias amplification…
External link:
http://arxiv.org/abs/2210.11924
As teenage use of social media platforms continues to proliferate, so do concerns about teenage privacy and safety online. Prior work has established that privacy on networked publics, such as social media, is complex, requiring users to navigate not…
External link:
http://arxiv.org/abs/2208.02796
Authors:
Meister, Nicole, Zhao, Dora, Wang, Angelina, Ramaswamy, Vikram V., Fong, Ruth, Russakovsky, Olga
Gender biases are known to exist within large-scale visual datasets and can be reflected or even amplified in downstream models. Many prior works have proposed methods for mitigating gender biases, often by attempting to remove gender expression info…
External link:
http://arxiv.org/abs/2206.09191
Image captioning is an important task for benchmarking visual reasoning and for enabling accessibility for people with vision impairments. However, as in many machine learning settings, social biases can influence image captioning in undesirable ways…
External link:
http://arxiv.org/abs/2106.08503