“Cleaning” descriptive metadata is a frequent task in digital library work, often enabled by scripting or OpenRefine. But what about when the issue at hand isn’t an odd schema, trailing whitespace, or inconsistent capitalization, but pervasive racial or gender bias in the descriptive language itself? Currently, work to remediate such bias tends to be highly manual, reliant on individual judgment and prioritization despite the systemic nature of the problem.
This talk will explore what using programming to identify and address such biases might look like, and will argue that seriously considering this approach is essential to equitably publishing digital collections at scale. I’ll discuss precedents and challenges for such work, and share two small experiments in Python: one aided by Wikidata to replace outdated LCSH terms for Indigenous people in the U.S. with currently preferred terminology, and another using natural language processing to identify records where women are named as Mrs. [Husband’s First Name] [Husband’s Last Name].
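To give a flavor of the first experiment, here is a minimal sketch of how Wikidata can mediate between an LCSH heading and a currently preferred label. The `PREFERRED` mapping and the `sh00000000` identifier are illustrative placeholders, not the actual data from the experiment; the SPARQL query uses Wikidata's real P244 property (Library of Congress authority ID).

```python
import urllib.parse

# Hypothetical mapping from outdated LCSH headings to currently preferred
# terms. In the experiment described above, entries like this would be
# looked up from Wikidata rather than hard-coded.
PREFERRED = {
    "Indians of North America": "Indigenous peoples of North America",
}

def build_wikidata_query_url(lcsh_id):
    """Build a Wikidata Query Service URL that asks for the English label
    of the item whose LCSH authority ID (property P244) matches lcsh_id."""
    sparql = f'''
    SELECT ?itemLabel WHERE {{
      ?item wdt:P244 "{lcsh_id}" .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}'''
    return ("https://query.wikidata.org/sparql?"
            + urllib.parse.urlencode({"query": sparql, "format": "json"}))

def update_subjects(subjects, mapping=PREFERRED):
    """Replace any heading found in the mapping; leave others unchanged."""
    return [mapping.get(s, s) for s in subjects]
```

A batch run would fetch each record's subject headings, call `update_subjects`, and log every substitution for human review before anything is written back.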
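For the second experiment, a regular expression can serve as a rough first pass before bringing in full NLP: flag any "Mrs." followed by two capitalized words, a pattern that in historical metadata usually encodes the husband's name. This is a simplified stand-in for the NER-based approach, not the experiment's actual code.

```python
import re

# Matches "Mrs." (period optional) followed by two capitalized words,
# e.g. "Mrs. John Smith" in "Portrait of Mrs. John Smith, 1923".
MRS_PATTERN = re.compile(r"\bMrs\.?\s+([A-Z][a-z]+)\s+([A-Z][a-z]+)\b")

def flag_husband_name_forms(text):
    """Return matched 'Mrs. First Last' strings for human review."""
    return [m.group(0) for m in MRS_PATTERN.finditer(text)]
```

The limitation is obvious: the pattern cannot tell "Mrs. John Smith" (husband's name) from "Mrs. Jane Smith" (the woman's own name), which is where named-entity recognition and gendered-name data earn their keep, and why matches should be queued for review rather than auto-corrected.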