Chesterton Digital Library
How do you take more than 6,000 Word documents, put them online, and make them searchable?
The Challenge
The Society of G.K. Chesterton had a goal: to make the complete works of the 20th-century author G.K. Chesterton available to the world. Volunteers had scoured the world finding and cataloging his works. They visited libraries and national archives, tracking down the original documents, scanning them, and typing them up as Word documents. The Society had amassed thousands of Word documents and gigabytes of source images on a hard drive, but didn't know how to get them out to the world.
They had reached the limits of their knowledge and resources. They were stuck wondering: How do we put these works online? How do we make them searchable? How can we do this within our budget? How do we allow volunteers to proofread them?
The Solution
Manalive Software partnered with the Society to identify the goals of the project and an iterative way to achieve them. First, we identified a subset of the works and scripted a way to convert each one from a Word document into a Markdown file. This conversion allowed volunteers to proofread the documents online. It also allowed us to track who made what changes to a work.
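As a rough illustration of the conversion step, the sketch below reads a .docx file using only Python's standard library and emits simple Markdown. It is not the script used on the project. The function name docx_to_markdown is hypothetical, it assumes the source files are in .docx format (older .doc files would need a different tool), and it handles only plain paragraphs and Word's built-in Title and heading styles.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements inside a .docx archive.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_markdown(docx_file):
    """Convert a .docx file (path or file-like object) to simple Markdown.

    Hypothetical helper: keeps paragraph text and maps "Title" and
    "HeadingN" paragraph styles to Markdown headings. Everything else
    (tables, images, inline formatting) is ignored.
    """
    # A .docx file is a zip archive; the main text lives in word/document.xml.
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    blocks = []
    for para in root.iter(W + "p"):
        # Join the text of every run in the paragraph.
        text = "".join(node.text or "" for node in para.iter(W + "t")).strip()
        if not text:
            continue  # skip empty paragraphs

        # Look up the paragraph style, if one is set.
        style_name = ""
        props = para.find(W + "pPr")
        if props is not None:
            style = props.find(W + "pStyle")
            if style is not None:
                style_name = style.get(W + "val", "")

        if style_name == "Title":
            text = "# " + text
        elif style_name.startswith("Heading") and style_name[7:].isdigit():
            level = min(int(style_name[7:]), 6)  # Markdown has six heading levels
            text = "#" * level + " " + text

        blocks.append(text)

    # Separate blocks with blank lines, as Markdown expects.
    return "\n\n".join(blocks) + "\n"
```

Plain-text output like this is what makes line-by-line proofreading and change tracking practical, since each correction shows up as a small, readable difference in the file.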
Next, we took that first set of works and put them online. This was our first win! We made hundreds of Chesterton's works accessible to the public for the first time. With this win under our belts, we continued to convert and publish additional batches of works.
Now that the raw material was there, we turned our focus to adding search. For this task, we extracted the text from the Markdown files and pushed it into an Elasticsearch database. Elasticsearch is a database that is particularly good at searching through large volumes of text quickly. We released this functionality to much appreciation, as it was the killer feature that many users had wanted for years.
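The indexing step can be sketched roughly as follows, again in Python with only the standard library. This is an illustration under stated assumptions, not the project's actual pipeline: the function names, the index name "works", and the field names "title" and "body" are all hypothetical. The sketch strips common Markdown syntax, builds a newline-delimited request body in the format Elasticsearch's bulk API expects, and builds a basic full-text match query.

```python
import json
import re


def markdown_to_text(markdown):
    """Strip common Markdown syntax, leaving plain text for indexing."""
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", markdown)  # images -> alt text
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)       # links -> link label
    # Remove heading and block-quote markers at the start of a line.
    text = re.sub(r"^[ ]{0,3}(#{1,6}\s+|>\s?)", "", text, flags=re.M)
    text = re.sub(r"[*`]+", "", text)                          # emphasis and code marks
    text = re.sub(r"(?<!\w)_+|_+(?!\w)", "", text)             # underscores around words
    return re.sub(r"\s+", " ", text).strip()                   # collapse whitespace


def build_bulk_body(works, index_name="works"):
    """Build a bulk-API body: one action line plus one document line per work.

    `works` is a list of dicts with hypothetical keys "id", "title",
    and "markdown".
    """
    lines = []
    for work in works:
        action = {"index": {"_index": index_name, "_id": work["id"]}}
        document = {"title": work["title"], "body": markdown_to_text(work["markdown"])}
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def build_search_query(terms):
    """Build a full-text query on the body field with highlighted matches."""
    return {
        "query": {"match": {"body": terms}},
        "highlight": {"fields": {"body": {}}},
    }
```

The string returned by build_bulk_body could be sent in a POST request to the cluster's _bulk endpoint with the Content-Type header set to application/x-ndjson, and the dictionary from build_search_query could be sent as JSON to the index's _search endpoint.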
The Outcome
We've put 6,000+ of Chesterton's works online and made them searchable. The Society is incorporating these works into their study guides for small groups. The Chesterton Digital Library (CDL) has driven traffic and donations to the Society.
We chose technologies and platforms that fit the Society's budget. Delivering static content and using a hosted database keeps monthly maintenance costs to a minimum.