Natural Language Processing for Discovery of Born-Digital Records

How do we move from discussing new technologies to actually implementing them? This presentation will cover several applications of natural language processing (NLP) for improving the discovery of born-digital records in special collections and archives, focusing on two NLP-centered projects at the North Carolina State University Special Collections Research Center. The first is the implementation of out-of-the-box software tools that enable browsing collections by named entities in the reading room. The second is the integration of named entity recognition and topic modeling into born-digital processing workflows to improve and automate aggregate description of collections. The presentation evaluates the success of these projects and describes how these tools might be extended for further practical applications. It is framed in terms of working solutions that add value to the researcher experience in the library, and it works to fill the gap between theoretical discussion of these tools and their realized application.

Speaker(s)

Emily Higgs

February 20th

01:30 PM

15 minutes