Princeton University Library is moving to a new model for research data ingestion, dissemination, discovery, and preservation. This model replaces a monolithic legacy application that is no longer sustainable, and the new system takes the form of the Princeton Data Commons (PDC) ecosystem. Our team has taken a decoupled approach to supporting the research data lifecycle, from ingestion and publication to preservation and discovery, through two new, lean, service-focused applications: PDC Describe (for ingestion/storage) and PDC Discovery (for dissemination/discovery). We have also adopted a migration model for moving existing research data objects out of the old system and into the new one, using a migration/re-description workflow that allows legacy content to be enhanced so that it receives the full benefits of the new system, including more suitable metadata schemas, improved file storage/serving, and better support for researcher identities and stable, unique identifiers. This presentation will detail the design and development of PDC Describe and Discovery, how they work with existing data, and what doors this new workflow model opens for future enhancements to research data support.
In this presentation, I’ll walk through a real-world scenario of resolving a bug by writing tests. This involves refactoring to improve the state of the code while also capturing the expected behavior of the fix. From this, we’ll generalize an approach for fixing bugs within an application and provide guidance on how to submit those changes upstream.
Do you need detailed information about the digital collections in your repository? While your repository’s dashboard interface might provide some of these answers, it is often not the best way to get detailed answers to complex questions in an exportable format.
At Tufts University, we have a history of querying Solr within our Hyrax repository environment. In this presentation, we will show how we have used this data to aid in annual reporting and to make decisions about feature improvement requests. We will also highlight a lightweight Ruby application that uses Solr queries to locate works and their filesets as they exit their embargo periods, ensuring that the appropriate access restrictions are applied.
While querying Solr is a common and necessary skill for developers, we intend to show that it is also approachable and useful for service managers, and we look forward to a conversation about other ways it might help.
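To make the embargo-tracking idea concrete, here is a minimal sketch in Python (not the Tufts application, which is written in Ruby) that builds a Solr select request for works whose embargo release date falls within an upcoming window. The core URL and the field name `embargo_release_date_dtsi` are assumptions; check the field names your own Hyrax index actually uses.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumptions: a local Solr core named "hyrax" and the Hyrax-style field name
# below. Adjust both to match your own repository's Solr schema.
SOLR_SELECT = "http://localhost:8983/solr/hyrax/select"
EMBARGO_FIELD = "embargo_release_date_dtsi"


def embargo_window_params(start: date, days: int, rows: int = 100) -> dict:
    """Solr parameters for documents whose embargo ends between start and start + days (inclusive)."""
    end = start + timedelta(days=days)
    return {
        "q": "*:*",
        # A filter query restricts results without affecting relevance scoring.
        "fq": f"{EMBARGO_FIELD}:[{start.isoformat()}T00:00:00Z TO {end.isoformat()}T00:00:00Z]",
        "fl": f"id,{EMBARGO_FIELD}",
        "rows": rows,
        "wt": "json",
    }


def embargo_window_url(start: date, days: int, rows: int = 100) -> str:
    """Full select URL, with parameters percent-encoded."""
    return f"{SOLR_SELECT}?{urlencode(embargo_window_params(start, days, rows))}"


# Example: works leaving embargo in the 30 days from 1 October 2023.
print(embargo_window_url(date(2023, 10, 1), 30))
```

Fetching the URL with any HTTP client returns JSON whose `response.docs` list holds the matching documents. Putting the date range in a filter query (`fq`) keeps it separate from the main query, which also lets Solr cache it.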
This presentation proposes and details a set of strategies for automating the administration of GitHub organizations, teams, and repositories using the GitHub API and the GitHub command-line utility. Ideally, this work could be aligned with the proposed Hyrax work cycles and Developer Congresses.
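One way to approach automated administration is a desired-state file that a script turns into `gh api` calls. The sketch below assumes that design; the organization, team slugs, and repository names are hypothetical, and it only covers granting teams access to repositories through the REST endpoint `PUT /orgs/{org}/teams/{team_slug}/repos/{owner}/{repo}`. By default it prints the commands instead of running them.

```python
import subprocess

# Hypothetical desired state: organization, team slugs, repositories and the
# permission each team should hold (pull, triage, push, maintain, or admin).
DESIRED = {
    "org": "example-org",
    "teams": {
        "core-maintainers": {"permission": "maintain", "repos": ["app-one", "app-two"]},
        "docs-writers": {"permission": "push", "repos": ["docs-site"]},
    },
}


def team_repo_commands(state: dict) -> list[list[str]]:
    """Translate the desired state into `gh api` calls, one per team/repository pair."""
    org = state["org"]
    commands = []
    for slug, spec in state["teams"].items():
        for repo in spec["repos"]:
            commands.append([
                "gh", "api", "--method", "PUT",
                f"/orgs/{org}/teams/{slug}/repos/{org}/{repo}",
                "-f", f"permission={spec['permission']}",
            ])
    return commands


def run_commands(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run_commands(team_repo_commands(DESIRED))
```

Keeping the desired state in version control means changes to team access can be reviewed like any other pull request.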
We know that many of our users appreciate content packaged as PDF. When we started using OCR, we knew that PDFs with a text layer were one way we wanted to deliver the results. But delivering PDFs is not very common in our community of software practice, and we found there was a lot of domain knowledge to acquire and a broad landscape of tools and options to navigate. I will describe in detail the automated pipeline we ended up building, primarily from open source Unix command-line tools, to create multi-page PDFs with text layers from high-resolution digitized images, along with the choices and tradeoffs we made.
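The abstract does not name the specific tools, so the following is only a sketch under an assumed toolchain: Tesseract OCRs each page image into a single-page PDF with a text layer (`tesseract IMAGE OUTPUTBASE pdf`), and qpdf concatenates the pages. The paths are hypothetical, and page order relies on zero-padded filenames sorting correctly.

```python
import subprocess
from pathlib import Path


def ocr_page_cmd(image: Path, workdir: Path) -> list[str]:
    # `tesseract IMAGE OUTPUTBASE pdf` writes OUTPUTBASE.pdf: the page image
    # with an invisible, searchable text layer on top.
    return ["tesseract", str(image), str(workdir / image.stem), "pdf"]


def merge_cmd(page_pdfs: list[Path], output: Path) -> list[str]:
    # qpdf starts from an empty document and appends every page of each input.
    return ["qpdf", "--empty", "--pages", *(str(p) for p in page_pdfs), "--", str(output)]


def build_pdf(images: list[Path], workdir: Path, output: Path, run: bool = False) -> list[list[str]]:
    """Return (and optionally run) the per-page OCR commands followed by the merge."""
    pages = sorted(images)  # assumes zero-padded names such as p001.tif, p002.tif
    commands = [ocr_page_cmd(img, workdir) for img in pages]
    commands.append(merge_cmd([workdir / f"{img.stem}.pdf" for img in pages], output))
    if run:
        workdir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Generating the commands separately from running them makes the pipeline easy to test and to parallelize per page; a production pipeline might also downsample the high-resolution images before they are embedded in the final PDF.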
This presentation will share a Fedora community update and discuss how we are charting a path forward by reflecting on the challenges we have faced in the past. Not everything works the first time around, and the Fedora community has shown its resilience in navigating uncertainty. We will also discuss some of the great successes we have seen recently as our Fedora 6.x user base grows and adoption continues to increase. We will talk about newly developed features and integrations for Fedora 6 and how to take advantage of them for robust digital preservation. Lastly, we will share our future plans and community roadmap, both shaped by a series of strategic listening sessions held throughout the summer. These sessions provided unique insight into how our community is using the software and how we can focus our support to meet the needs of many.
Metadata changes have historically been a costly development activity within the Samvera community. Over the years, evolving community efforts have sought to tame this problem by providing mechanisms for metadata configuration. This presentation will cover Project Surfliner’s implementation of the Machine-readable Metadata Modeling (M3) Specification (now nearly feature-complete, with some necessary extensions), focusing on how it supports our metadata experts in managing different schemas for our adopter campuses and in coordinating metadata mappings between applications in our growing digital library architecture. We’ll also cover how our work fits into the technical landscape of configurable metadata in other community projects.
Fewer and fewer of our services run on a single server. Between on-premises network resources and full-blown cloud services such as AWS or GCP, there are many options for deploying an application. This panel will explore current practices and discuss what is and is not working, covering topics such as monitoring, logging, deployment processes, resiliency, and training.
Problem: For the American Archive of Public Broadcasting, migrating data from an old metadata management system (AMS1) to a Hyrax-like system (AMS2) became too slow after more than 50K assets. To date, we have tried replacing batch ingests with Bulkrax, improving the templates that build the system, conducting speed tests, and adding AWS resources. The work has been a collaboration between SoftServ and GBH on Bulkrax and the data model, the Picard Maneuver, and an ID rewrite.
Results: Performance improved with Bulkrax, and we reran only the failures instead of the whole ingest; the Picard Maneuver did not work. However, a problem with IDs resurfaced, and we are running a script to rebalance the tree. The unbalanced tree of IDs is the smoking gun. We are currently trying to ingest metadata into a Postgres-backed app and hope the speed improves once the tree is balanced.
Executive Director, GBH Archive, WGBH Educational Foundation (GBH)
Karen Cariani is the David O. Ives Executive Director of the GBH Archives and GBH Project Director for the American Archive of Public Broadcasting, a collaboration with the Library of Congress to preserve and provide centralized online access to content created by public media...
Northwestern University Libraries have begun exploring new ways to interact with our collections using the natural language capabilities of OpenAI's GPT family of large language models (LLMs). We're building a chat interface that answers queries by pairing an LLM's conversational ability with digital collections data under our control, stored in a vector database. This allows users to interact with digital collections using natural language while mitigating the incorrect or fabricated responses often referred to as "hallucinations" in current discourse. Throughout our prototyping, we have dedicated effort to exploring ways to increase transparency and safety for our users. We are eager to share our experiences with the Samvera Community and look forward to hearing about other initiatives currently underway!
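The pattern described here, retrieving relevant records from a vector store and handing them to the model as context, is commonly called retrieval-augmented generation. The sketch below shows only the retrieval and prompt-assembly steps in plain Python, with tiny hand-made vectors standing in for real embeddings and a list standing in for the vector database. It is not Northwestern's implementation, and the record IDs and texts are hypothetical.

```python
import math

# Hypothetical records: (id, text, embedding). Real embeddings come from an
# embedding model and have hundreds of dimensions; these are 3-D toys.
INDEX = [
    ("work-1", "Photograph of a city street, 1920s", [0.9, 0.1, 0.0]),
    ("work-2", "Oral history interview about jazz musicians", [0.1, 0.9, 0.2]),
    ("work-3", "Hand-drawn map of a university campus", [0.7, 0.0, 0.7]),
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, index, k=2):
    """Return the k records most similar to the query (what a vector DB does at scale)."""
    return sorted(index, key=lambda rec: cosine(query_vec, rec[2]), reverse=True)[:k]


def build_prompt(question: str, hits) -> str:
    """Ground the model in retrieved records and ask it to cite their IDs."""
    context = "\n".join(f"[{rec_id}] {text}" for rec_id, text, _ in hits)
    return (
        "Answer using only the collection records below and cite their IDs. "
        "If the records do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


query_vec = [1.0, 0.0, 0.2]  # stand-in for the embedded user question
hits = top_k(query_vec, INDEX)
print(build_prompt("Do you have photos of old street scenes?", hits))
```

Instructing the model to answer only from the retrieved records, and to cite their IDs, is one common way to reduce hallucinations and make each answer traceable back to collection items.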
Avalon Media System, a Samvera application for audio/video content, was originally written using ActiveFedora and Fedora 3 over a decade ago and migrated to Fedora 4 six years ago. Avalon has been mature and stable for years, with over 20 installations at different institutions, but the time has come to upgrade to Fedora 6 for resilience, performance, and sustainability. While Valkyrie seems a natural choice since it already supports Fedora 6, the cost of switching is high. This presentation will detail our experiments with getting Avalon to run on Fedora 6 using ActiveFedora, along with a discussion of the pros and cons of sticking with ActiveFedora.
The COAR Notify Initiative will enable open-access repositories across the world to easily connect with and exploit value-adding services. To this end, COAR has developed the Notify Protocol, an asynchronous messaging protocol based on W3C Linked Data Notifications and Activity Streams 2.0. The Notify Initiative has a significant grant from Arcadia, a charitable foundation. Using this funding, we are working with some repositories and services to establish reference implementations demonstrating key use cases (e.g. peer review for preprints, linking publications to related research data, etc.). The team is also starting to work with repository platform providers and communities, supporting them in "Notify-enabling" their platforms. We would like to invite the Samvera community to participate in this exciting initiative; we will outline its goals and benefits and explain how you can get involved.
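Linked Data Notifications work by POSTing a JSON-LD document to a receiver's inbox. The sketch below builds a bare Activity Streams 2.0 `Offer` and the corresponding LDN request using only the Python standard library. The URLs are hypothetical, and the Notify Protocol layers its own vocabulary and required properties on top of plain AS2; those are omitted here and should be taken from the protocol documentation.

```python
import json
import uuid
from urllib.request import Request

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def make_offer(actor_id: str, actor_inbox: str, target_id: str, target_inbox: str, object_id: str) -> dict:
    """A bare Activity Streams 2.0 Offer from a repository to a service (hypothetical IDs)."""
    return {
        "@context": [AS2_CONTEXT],
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": "Offer",
        "actor": {"id": actor_id, "type": "Service"},
        "origin": {"id": actor_id, "inbox": actor_inbox, "type": "Service"},
        "target": {"id": target_id, "inbox": target_inbox, "type": "Service"},
        "object": {"id": object_id},
    }


def ldn_request(notification: dict) -> Request:
    """LDN delivery: POST the JSON-LD notification to the target's inbox."""
    body = json.dumps(notification).encode("utf-8")
    return Request(
        notification["target"]["inbox"],
        data=body,
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


offer = make_offer(
    "https://repository.example.org",
    "https://repository.example.org/inbox",
    "https://review.example.org",
    "https://review.example.org/inbox",
    "https://repository.example.org/works/123",
)
req = ldn_request(offer)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the notification; under LDN, a receiver that accepts it responds with `201 Created` or `202 Accepted`.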
Paul Walk is Founder and Director of Antleaf, a digital consultancy that delivers technical consultancy, management, and development services in the general domain of research data management.
Wednesday October 25, 2023 2:05pm - 2:30pm EDT
Ballroom D
The Samvera and Fedora communities have a long-standing relationship of collaboration and mutual support. There is now growing interest across both communities in upgrading Hyrax instances to be backed by Fedora 6.x at institutions of varying sizes. But migrations are costly and resource-intensive, and when the pathways forward are challenging, there is always a risk that people will be left behind.
This presentation will look at the collaborative work being done by stakeholders who straddle both communities to support institutions on Hyrax that want to take advantage of the updated digital preservation capabilities offered by Fedora 6. We will discuss how the need came to our attention and talk through the collaborative efforts these stakeholders have undertaken to serve users across both communities. Throughout the process, our focus has been entirely on building strength through listening, open dialogue, and sharing resources and experience to bring all users forward.
Making Oregon Digital more accessible was part of the minimum viable product for our recent migration to Hyrax 3. Meeting this goal required the team to acquire an understanding of the Web Content Accessibility Guidelines (WCAG 2.0) and the skills to test the UI and implement solutions. Integrating this work into our normal workflows for the first time proved challenging, from assigning existing personnel and recruiting outside expertise to documenting problems and timing dedicated sprints. We did many things right and a few things wrong, learned from the experience, and are prepared to move forward with accessibility foremost in mind for future maintenance and development.
Ruhr University Bochum (Germany), working with Antleaf and Cottage Labs, is developing a Research Data Management System based on Samvera Hyrax 3, using remote S3-based object data storage. This presentation will focus on the technical challenges involved in adapting Hyrax, covering in particular:
1. The use of Hyrax to support university research staff in managing, preserving and publishing their data
2. The use of remote (S3) object storage instead of Fedora for storing all content
3. The use of external authentication (Shibboleth and ORCID) for all users
4. Creating new "work types" to cater for different types of relatively complex, nested research data models
5. The challenge of accommodating relatively complex, multi-stage review workflows involving several different users in different roles
We are interested in comparing our experiences with others in the Samvera community, and so will be very open to discussion on any of these - or related - topics.
Join this session to learn about the outcomes of the Hyku for Consortia: Removing Barriers to Adoption Project, funded by an IMLS National Leadership Grant for Libraries. Now at the close of the project, the team has reduced roadblocks for institutions and communities to adopt Samvera Hyku as a low-cost, multi-tenant repository solution. This presentation will report on community feedback, user experience research, technical developments, the toolkit for repository collaboration, and a gap assessment for future work.
Ramp is a general-purpose, IIIF-powered audio/video media player built as a series of React components. Formerly known as the IIIF React Media Player, it includes interactive components for media playback, structural metadata, and transcript presentation. It will be the primary media player for the Avalon Media System moving forward, and the Avalon team is currently adding new components for supplemental files and playlists. The goal of this work is to enable Ramp to provide a full-featured item display using data from a IIIF Presentation 3.0 manifest.
This presentation will provide an overview of Ramp and its constituent components as well as an update on current and future work as the Avalon team adds functionality to meet common use cases.
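Building a full item display from a manifest depends on how IIIF Presentation 3.0 represents time-based media: a Canvas with a `duration`, painted by an Annotation whose body is the audio or video file. The function below is a minimal sketch that builds a single-canvas manifest of that shape; the URLs are hypothetical, and a fuller manifest would add properties such as `structures` (Ranges) for structural navigation.

```python
import json


def av_manifest(base: str, label: str, media_url: str, media_format: str, duration: float) -> dict:
    """Minimal IIIF Presentation 3.0 manifest with one time-based Canvas."""
    canvas_id = f"{base}/canvas/1"
    media_type = "Sound" if media_format.startswith("audio/") else "Video"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/manifest",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "duration": duration,  # seconds; this is what makes the Canvas time-based
            "items": [{
                "id": f"{canvas_id}/page/1",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/1/annotation/1",
                    "type": "Annotation",
                    "motivation": "painting",  # the media is "painted" onto the Canvas
                    "body": {
                        "id": media_url,
                        "type": media_type,
                        "format": media_format,
                        "duration": duration,
                    },
                    "target": canvas_id,
                }],
            }],
        }],
    }


manifest = av_manifest(
    "https://media.example.org/iiif/item-1",
    "Interview, part 1",
    "https://media.example.org/files/item-1.mp4",
    "video/mp4",
    125.5,
)
print(json.dumps(manifest, indent=2))
```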
Head, Digital Media Software Development, Indiana University Libraries
Emily Lynema is the Head of Digital Media Software Development at Indiana University Libraries. Her team develops systems that support access to digitized audiovisual materials, including the large volume of content digitized by the University’s Media Digitization and Preservation...
Wednesday October 25, 2023 3:25pm - 3:50pm EDT
Ballroom E