At the GLBT Historical Society we’re diligently digitizing more than 1,500 issues of the Bay Area Reporter, the San Francisco-based weekly newspaper that’s been serving the LGBT community since 1971. Thanks to a generous grant from the Bob Ross Foundation, we purchased a shiny new scanner that could accommodate newspaper spreads, and we set about digitizing the paper to the specifications put forth by the National Digital Newspaper Program (NDNP) and the California Digital Newspaper Collection (CDNC). When the project is complete, we’ll have created a publicly-accessible, full-text-searchable collection of over three decades worth of LGBT and California history, written week by week.
The software that accompanied our scanning hardware appeared well-suited for the task, with image processing capabilities like deskewing, optical character recognition (OCR), and image format conversion baked in. However, in practice we quickly realized that while this software worked well for small projects and one-off scans, it was not sufficient for the large-scale effort before us, mainly because it did not allow us to shift this processor-intensive work to off hours. We set out to construct a digital workflow using free, open-source tools that would replicate these image-processing tasks, but could churn through large batches of newspaper scans at night and over the weekend, freeing up precious work hours for staff, interns, and volunteers to move quickly from one newspaper issue to the next. Continue reading Using open source tools in a newspaper digitization workflow
The Linked Jazz project has derived most of the social relationships in its dataset from the transcripts of oral histories given by jazz musicians. One question we began to ask some time ago is: what other jazz historical material in digital form would be a good source of additional relationship data? One answer to that question is digitized photographs, specifically those with good-quality metadata.
Tulane University has a rich collection of historical photographs of jazz musicians living and performing in New Orleans and around the world. The Hogan Jazz Archive Photography Collection and Ralston Crawford Collection of Jazz Photography are two such collections, and we received two tab-delimited text files from Tulane, exported from their CONTENTdm system.
Some numbers: in this set we have 1,787 images, at least 681 unique individuals, and more than 2,700 depictions. Depiction is the FOAF term that we later used as a predicate in our triples from this dataset. One group photograph might depict several individuals, and one individual might be depicted in several photographs. People depicted in the same photograph can be said to “know” each other in some way.
In this post, I’ll describe the process we used to first standardize and reconcile the photograph metadata, and then describe the photographs and the people and relationships depicted using RDF triples. Continue reading Connecting musicians through the photo archive
I, for one, welcome our new robot overlords.
The ARChive of Contemporary Music website features many image galleries depicting items from the collection, including great album and book covers, 45-rpm adaptors, punk flyers and more. Since the launch of the site in May 2014, web traffic to the galleries has been relatively low, about a third of the number of users that hit the homepage. The ARC’s social media posts also have relatively low reach and low engagement (e.g., average interaction per tweet = 1).
As an ARC employee and the developer of the ARC website, I thought that by repurposing interesting, fun, and quirky digital content in the context of social media, perhaps we could better engage followers, attract new users, and drive new traffic to the site, potentially attracting new donors to the non-profit archive.
This was my idea when I was dreaming up a final project for LIS 664 – Programming for Cultural Heritage. By the end of the semester, I had written some Python scripts that, in conjunction with free web services, allowed me to put this idea to the test. Continue reading Tumblr Image Bot: A friendly social media robot