Electronic Theses and Dissertations (ETDs) encapsulate significant research findings and innovative ideas but often have limited visibility and accessibility, particularly in regions and disciplines with restricted digital reach. This workshop introduces an LLM-based application that uses a Retrieval-Augmented Generation (RAG) architecture to address these challenges. The application uses LLMs to translate and standardize ETD metadata and content into a user’s native language, then stores the results in a unified vector database that serves as the knowledge source for retrieval. Information retrieved from this database is supplied to the LLMs, which generate comprehensive responses, so that search can be tailored to local or remote ETD collections. This approach improves the indexing and discoverability of ETDs and makes them accessible across linguistic boundaries. During the workshop, we will present the system's components in detail, illustrating the program workflow and the interaction between the query, retrieval, and response generation phases. Participants will learn how to integrate these technologies into their digital library systems and repositories, adapting them to various institutional needs to enhance the global visibility and utility of their ETD collections.
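The query, retrieval, and response generation phases described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the workshop's implementation: the sample records, the `embed`, `retrieve`, and `build_prompt` helpers, and the bag-of-words "embedding" are all hypothetical stand-ins. A real system would presumably use an embedding model and a vector database, and would send the final prompt to an LLM rather than printing it.

```python
import math
import re
from collections import Counter

# Hypothetical ETD records, assumed to be already translated and standardized.
RECORDS = [
    {"id": "etd-1", "title": "Deep learning for crop disease detection",
     "abstract": "Convolutional networks classify leaf images to detect crop disease."},
    {"id": "etd-2", "title": "Water quality monitoring in rural rivers",
     "abstract": "Sensor networks measure river water quality in rural regions."},
    {"id": "etd-3", "title": "Metadata quality in digital libraries",
     "abstract": "A study of metadata standardization for theses and dissertations."},
]


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# "Vector database": each record stored alongside its vector.
INDEX = [(r, embed(r["title"] + " " + r["abstract"])) for r in RECORDS]


def retrieve(query, k=2):
    """Retrieval phase: return the k records most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [record for record, _ in ranked[:k]]


def build_prompt(query, records):
    """Generation phase input: retrieved records become the LLM's context."""
    context = "\n".join(
        f"[{r['id']}] {r['title']}: {r['abstract']}" for r in records
    )
    return (
        "Answer using only the ETD records below.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


query = "metadata standardization for dissertations"
prompt = build_prompt(query, retrieve(query, k=2))
print(prompt)  # In a real system this prompt would be sent to an LLM.
```

Keeping retrieval separate from prompt construction means the toy index could be swapped for an actual vector store without touching the generation step.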
ETD 2024 Workshop 2
Learning Objectives
By the end of this workshop, participants will be able to:

Target Audience
This workshop is designed for:

Workshop Format
The workshop will employ a mixed format, combining lectures, interactive discussions, and hands-on sessions:

Technical Requirements

Estimated Duration
Total duration: 4 hours

Workshop Outline
1. Introduction and Overview (10 minutes)
2. Lecture: LLMs and RAG Architecture Basics (40 minutes)
3. Break (10 minutes)
4. Lecture: ETD Metadata Standardization and Translation (40 minutes)
5. Break (10 minutes)
6. Hands-on Session: Implementing LLM Solutions (90 minutes)
7. Break (10 minutes)
8. Discussion: Challenges and Opportunities (30 minutes)

Biography of Workshop Leaders

Yinlin Chen holds a Ph.D. in Computer Science and Applications from Virginia Tech, and an M.S. and a B.S. in Computer Science from National Tsing Hua University, Taiwan. He is an Assistant Director of the Center for Digital Research & Scholarship and an Assistant Professor at the Virginia Tech Libraries. His professional interests include Digital Libraries, Machine Learning, Artificial Intelligence, and Cloud Computing.

William A. Ingram is an Associate Professor at Virginia Tech and serves as Associate Dean and Executive Director for Information Technologies in the University Libraries. He holds a B.A. in Cognitive Science from the University of Virginia and an M.S. in Library and Information Science from the University of Illinois at Urbana-Champaign. Ingram's research focuses on digital libraries and information retrieval, particularly applying machine learning and AI to improve access to digital collections. He is also instrumental in organizing workshops on AI for libraries and cultural heritage organizations, with an emphasis on ethics and bias mitigation.

Edward A. Fox is a Professor of Computer Science at Virginia Tech, where he directs the Digital Library Research Laboratory. Since 1983 he has taught courses on digital libraries, information retrieval, multimedia/hypertext/information access, and related topics. He is a Fellow of ACM, IEEE, AIIA, and AAIA. His degrees are from MIT (B.S.) and Cornell University (M.S., Ph.D.). He serves as Executive Director and Chairman of the Board of the Networked Digital Library of Theses and Dissertations (NDLTD). He collaborates with Yinlin Chen and William Ingram on IMLS grants related to the topics of this workshop.