Attending this event?
Learn more about Samvera Connect 2023 on the conference wiki page, including hotel reservations and information about fun things to do in Philly!

Wednesday, October 25 • 10:10am - 10:35am
Creating PDFs with OCR text layers from digitized content

We know that many of our users appreciate content packaged as PDF. When we started looking into OCR, we knew that a PDF with a text layer was one way we wanted to deliver the results. But delivering PDFs is not very common in our community of software practice, and we found there was a lot of domain knowledge to acquire and a landscape of tools and options to sort through. I will provide specific details of the automated pipeline we ended up building to create multi-page PDFs with text layers from high-resolution digitized images, primarily using open source Unix command-line tools, along with the choices and tradeoffs we made.


Jonathan Rochkind

Software Developer, Science History Institute

Wednesday October 25, 2023 10:10am - 10:35am EDT
Ballroom C