Machine Learning and Metadata with the Charles Teenie Harris Archive

In July 2018, the project team (co-led by an archivist and a creative technologist) conducted a one-week intensive dedicated to scripting and testing experimental code, with the goal of documenting the limitations, capabilities, and costs of machine learning, text parsing, computer vision, and crowdsourcing technologies in making a meaningful contribution to archival metadata. This project is specifically designed to evaluate the viability of using technology to solve the problem of “dirty” data in the Charles Teenie Harris item-level catalog records. Charles “Teenie” Harris (1908-1998) was a photographer for The Pittsburgh Courier, one of the most influential black newspapers of the 20th century. In a career spanning more than four decades, Harris captured the events and everyday experience of African American life in Pittsburgh in a collection of almost 80,000 images. This presentation will cover the successes, challenges, and future opportunities (both archival and administrative) of this work, and show how automation of metadata creation and cleaning can be applied to photography-based collections.

Speaker(s)

11:20 AM
15 minutes