Using open source tools in a newspaper digitization workflow

At the GLBT Historical Society we’re diligently digitizing more than 1,500 issues of the Bay Area Reporter, the San Francisco-based weekly newspaper that’s been serving the LGBT community since 1971. Thanks to a generous grant from the Bob Ross Foundation, we purchased a shiny new scanner that could accommodate newspaper spreads, and we set about digitizing the paper to the specifications put forth by the National Digital Newspaper Program (NDNP) and the California Digital Newspaper Collection (CDNC). When the project is complete, we’ll have created a publicly accessible, full-text-searchable collection of over three decades’ worth of LGBT and California history, written week by week.

The software that accompanied our scanning hardware appeared well-suited for the task, with image processing capabilities like deskewing, optical character recognition (OCR), and image format conversion baked in. However, in practice we quickly realized that while this software worked well for small projects and one-off scans, it was not sufficient for the large-scale effort before us, mainly because it did not allow us to shift this processor-intensive work to off hours. We set out to construct a digital workflow using free, open-source tools that would replicate these image-processing tasks, but could churn through large batches of newspaper scans at night and over the weekend, freeing up precious work hours for staff, interns, and volunteers to move quickly from one newspaper issue to the next.
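To make the off-hours idea concrete, here is a minimal sketch of a resumable batch runner in Python. It is not the workflow we actually built: the folder layout, the `.done` marker files, and the `process` callable (standing in for whatever deskew, OCR, and conversion tools get wired in) are all assumptions for illustration.

```python
from pathlib import Path
from typing import Callable


def pending_scans(scan_dir: Path, done_dir: Path, suffix: str = ".tif") -> list[Path]:
    """Return scans in scan_dir that have no matching marker file in done_dir."""
    # Assumption: a finished scan is recorded as "<name>.done" in done_dir.
    finished = {p.stem for p in done_dir.glob("*.done")}
    return sorted(p for p in scan_dir.glob(f"*{suffix}") if p.stem not in finished)


def run_batch(scan_dir: Path, done_dir: Path, process: Callable[[Path], None]) -> list[str]:
    """Process every pending scan and leave a marker so reruns skip finished work."""
    done_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for scan in pending_scans(scan_dir, done_dir):
        process(scan)  # hypothetical step: deskew, OCR, format conversion
        (done_dir / f"{scan.stem}.done").touch()
        processed.append(scan.name)
    return processed
```

A script like this could be started by a scheduler in the evening. Because each finished scan leaves a marker, an interrupted run picks up where it stopped the next night.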

Digital collections workflows at CHS

The California Historical Society recently added four collections of historical photographs to its digital library, including images of Los Angeles at the turn of the 20th century, and photos taken by a 15-year-old Alice Burr of volunteer infantrymen mustering in San Francisco during the Spanish-American War. These collections and more are available at

Perhaps more importantly, we’ve established new guidelines and workflows for our digital collections that help streamline time-consuming processes like cataloging digital objects at the item level and creating robust MODS records, preparing digital objects for ingestion in our Islandora DAMS, and making collection- or system-wide changes to objects’ descriptive metadata. Our GitHub account is a growing public repository of our digital tools and documentation of these workflows.

If you love these blues: A Mike Bloomfield discography

I had the great privilege of compiling the discography at the back of the newly published Michael Bloomfield: The Rise and Fall of an American Guitar Hero, a totally revised and expanded edition of Ed Ward’s 1983 biography of the Chicago-born blues guitarist, from Chicago Review Press.

There have been a few nice write-ups of the book, from Rolling Stone among others. And a blurb on the back cover from Douglas Brinkley reads, in part: “The discography alone is worth the price of admission. Highly recommended!”

It was thanks to Ed Berger, of the Institute of Jazz Studies at Rutgers University, that I got the gig.

Fender Rhodes refurb project

Several years back I decided to give my dad’s 1975 Fender Rhodes Mark I Stage 73 a bit of love and care. It was already in great shape; it didn’t leave the house much, if at all, was cosmetically beautiful, and all 73 pickups still worked. But the action was very sluggish and the tone fairly dead. I started reading up on refurbishment projects and with some help from Rhodes forums and Vintage Vibe soon had the piano completely apart in the basement, with new hardware [see here and here] and the Miracle Mod ready to be installed.

When the Rhodes was back together, the action was indeed quicker, and after some amateur attempts at voicing, the tone was brighter, with a nice twinkly upper register and bass with a little bark. I also had a pro tuner work on it, a guy from the East Village who specialized in electric pianos, and who I picked up from the Staten Island Ferry and brought to my apartment. He used his vintage stroboscopic tuner, and the old thing ended up sounding pretty great.

Here’s a briefly annotated look at the project:

Visualizing jazz discography

As jazz music evolved alongside sound recording technology and the record industry, so too did the study and cataloging of sound recordings, or discography. From the early discographies of Charles Delaunay through the work of Brian Rust, Tom Lord, and many others, jazz discographers have published thousands upon thousands of pages of highly structured data about jazz records and jazz musicians.

The free database software BRIAN (in honor of Brian Rust), by Steve Albin, allows users to compile their own discographies in the Rust style and easily output this information as HTML. By web-scraping and parsing this data, we can visualize musicians’ performance and recording careers, and better understand the professional relationships of working musicians.
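As a rough illustration of the "professional relationships" idea, the sketch below counts how often pairs of musicians appear on the same session. It assumes the scraped HTML has already been reduced to a list of sessions, each a list of personnel names. That input shape and the function name are hypothetical, not BRIAN's actual output format.

```python
from collections import Counter
from itertools import combinations


def collaboration_counts(sessions):
    """Count how many sessions each pair of musicians shares."""
    pairs = Counter()
    for personnel in sessions:
        # Sort and de-duplicate so (a, b) and (b, a) count as the same pair.
        for a, b in combinations(sorted(set(personnel)), 2):
            pairs[(a, b)] += 1
    return pairs
```

The resulting counts could serve as edge weights in a network graph, with musicians as nodes and shared sessions as the strength of each connection.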

Sounds of the street

Over at Tumblr I have an ongoing project I call Audio Litter. When I see a discarded CD, cassette, pair of earbuds, or other audio carrier or listening device, I snap a photo and post it. Simple. Most of the photos I’ve taken so far have been on my walk between my apartment in the St. George neighborhood of Staten Island and the SI Ferry terminal.

I was inspired by the changing character of our audio litter — more cheap earbuds, fewer CDs — and I had been thinking along the same lines as Atlantic writer Adrienne LaFrance who last year filed a piece on deteriorating CDs:

Disc drives are disappearing from newer models of laptops and cars. Many of the places we used to buy CDs—Tower Records, Sam Goody, Borders—are gone. The memory of jogging with a Discman in hand seems absurd now, but that rain-slicker-yellow Sony Sports model was once top of the line. Even the iPod that replaced it feels like a brick compared with its slim successors.

Yes, the ubiquity of a once dominant media is again receding. Like most of the technology we leave behind, CDs are being forgotten slowly. Eventually, even the fragments disappear. No more metallic shards of broken discs glinting from the gutter. No more old strands of tape cassette tangled in tree branches like tinsel. We stop using old formats little by little. They stop working. We stop replacing them. And, before long, they’re gone.

But once I started keeping my eyes peeled for this musical trash, I was surprised at the number of CDs and even cassettes I found on sidewalks, streets, and medians. Of course, in my neighborhood there are a lot of cars, and older cars do still have players for these things. That would explain the shattered Belkin cassette adapter I found last December. But I expect to make fewer discoveries like these in the years ahead.

Tumblr Image Bot: A friendly social media robot

I, for one, welcome our new robot overlords.

The ARChive of Contemporary Music website features many image galleries depicting items from the collection, including great album and book covers, 45-rpm adaptors, punk flyers and more. Since the launch of the site in May 2014, web traffic to the galleries has been relatively low, about a third of the number of users that hit the homepage. The ARC’s social media posts also have relatively low reach and low engagement (e.g., average interaction per tweet = 1).

As an ARC employee and the developer of the ARC website, I thought that by repurposing interesting, fun, and quirky digital content in the context of social media, perhaps we could better engage followers, attract new users, and drive new traffic to the site, potentially attracting new donors to the non-profit archive.

This was my idea when I was dreaming up a final project for LIS 664 – Programming for Cultural Heritage. By the end of the semester, I had written some Python scripts that, in conjunction with free web services, allowed me to put this idea to the test.

Digitizing 1980s TV ads from VHS

This was a practice attempt with at-home VHS digitization using a composite-to-USB converter and an iMac. Why commercials? Why not? Thirty-second spots were easier to handle than three- to four-minute music videos or half-hour shows, and they’re beautiful pop-cultural snapshots. The tape I started with happened to be from 1984. There’s a local New York City spot for Wheel of Fortune, a fun Pepto Bismol ad, and a strange Quasar spot with pulsating alien eggs and a Martin Sheen voiceover, along with a few other gems. Click through for the playlist.