Open Discovery for Open Data: An Open Source Index for Institutional Research Data

Many libraries currently support two institutional repository systems, one for publications and one for research data, even though there are nearly a thousand data repositories in the United States. To do so, we either increase spending by purchasing data repository solutions from vendors, or duplicate work by building, customizing, and managing individual instances of data repository software. Especially for small and midsized institutions, this strains already limited resources. This poster suggests a potential solution: a centralized metadata store for datasets produced by an institution’s researchers. With funding from the Institute of Museum and Library Services, we have created a prototype for an open source institutional research data index (IRDI) that promotes discovery of existing datasets housed in third-party repositories. Google Dataset Search has recently come onto the scene and sparked our imaginations about what is possible for research data discovery. IRDI complements Google Dataset Search, SHARE, DataMed, and other research data indexes with a three-pronged focus. First, IRDI promotes discovery of institution-specific research datasets, allowing institutions to showcase research data as a scholarly product and a driver of institutional reputation. Second, IRDI provides newly generated descriptive metadata for individual datasets, gleaned through topic mining of scholarly profile sources like ORCID and Google Scholar Profiles. Third, IRDI content is optimized for discovery by commercial search engines. IRDI is one step toward a community-driven, community-owned index for academic institutional research data. Such an index would not only increase discovery, reuse, and citation of open research data, but also give libraries an easy-to-implement, open source, library-built system.
This presentation will demonstrate a prototype of IRDI, discuss challenges and opportunities, request feedback from the Code4Lib community, and generate discussion around open source data discovery tools.

Presenter(s): Sara Mannheimer, Jason A. Clark, and James Espeland, Montana State University

03:30 PM