I, for one, welcome our new robot overlords.
The ARChive of Contemporary Music website features many image galleries depicting items from the collection, including great album and book covers, 45-rpm adaptors, punk flyers and more. Since the launch of the site in May 2014, web traffic to the galleries has been relatively low: about a third as many users visit them as hit the homepage. The ARC’s social media posts also have low reach and low engagement (e.g., an average of one interaction per tweet).
As an ARC employee and the developer of the ARC website, I thought that repurposing interesting, fun, and quirky digital content for social media might better engage followers, attract new users, and drive new traffic to the site, and perhaps bring new donors to the non-profit archive.
This was my idea when I was dreaming up a final project for LIS 664 – Programming for Cultural Heritage. By the end of the semester, I had written some Python scripts that, in conjunction with free web services, allowed me to put this idea to the test.
I also wrote a blog post over at the ARC site about how I am using this tool for ARC content.
Methods: To efficiently capture and repost this content, I wrote two complementary Python scripts. The first scrapes (using Beautiful Soup) image URLs and metadata from the ARC photo gallery pages, which are generated by the NextGEN Gallery plugin for WordPress. This data is written out to a JSON file.
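To make the first step concrete, here is a minimal sketch of the scrape-and-save idea. It is not the script from the repository: it swaps in Python's standard-library `html.parser` for Beautiful Soup so it runs without extra packages, and the sample markup, the `ngg-gallery-thumbnail` class, the example.org URLs, and the record fields (`url`, `caption`, `posted`) are all assumptions made for illustration.

```python
import json
from html.parser import HTMLParser

# Assumed markup: a link to the full-size image wrapping a thumbnail <img>.
# Real gallery pages will differ, so treat this as a placeholder.
SAMPLE_HTML = """
<div class="ngg-gallery-thumbnail">
  <a href="https://example.org/gallery/cover-01.jpg" title="Album cover, 1967">
    <img src="https://example.org/gallery/thumbs/cover-01.jpg" alt="cover-01">
  </a>
</div>
<div class="ngg-gallery-thumbnail">
  <a href="https://example.org/gallery/flyer-02.jpg" title="Punk flyer">
    <img src="https://example.org/gallery/thumbs/flyer-02.jpg" alt="flyer-02">
  </a>
</div>
<a href="https://example.org/about">About</a>
"""

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


class GalleryParser(HTMLParser):
    """Collect one record per link that points at an image file."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        if href.lower().endswith(IMAGE_EXTENSIONS):
            self.records.append({
                "url": href,
                "caption": attributes.get("title") or "",
                "posted": False,  # nothing has been sent to Tumblr yet
            })


def scrape_gallery(html):
    """Return a list of image records found in one page of HTML."""
    parser = GalleryParser()
    parser.feed(html)
    return parser.records


records = scrape_gallery(SAMPLE_HTML)
# Serialize the records; json.dump(records, file_handle) would write them to disk.
gallery_json = json.dumps(records, indent=2)
```

Starting every record with `posted` set to false means the posting step only has to flip one value per image later on.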
The second script uses pytumblr, a Python Tumblr API client, to build and send a user-determined number of randomly selected photo posts to Tumblr, along with appropriate caption text and tags. It then updates the JSON file, setting a true/false value on each image to record whether it has been posted.
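The select-post-and-mark logic can be sketched roughly as follows. This is an illustration under assumptions, not the repository code: the function name `post_random_images`, the record fields, and the example.org URLs are hypothetical, and the Tumblr call is replaced by a plain callable so the sketch runs offline.

```python
import random


def post_random_images(records, count, post_photo, rng=random):
    """Post up to `count` not-yet-posted records and mark each as posted."""
    unposted = [r for r in records if not r["posted"]]
    chosen = rng.sample(unposted, min(count, len(unposted)))
    for record in chosen:
        post_photo(record)
        record["posted"] = True  # flag is set only after the post call returns
    return chosen


# Stand-in for the Tumblr call. With pytumblr, the callable might wrap
# something like:
#   client.create_photo(blog_name, state="queue", source=record["url"],
#                       caption=record["caption"], tags=record["tags"])
sent = []

records = [
    {"url": "https://example.org/a.jpg", "caption": "A", "tags": ["arc"], "posted": True},
    {"url": "https://example.org/b.jpg", "caption": "B", "tags": ["arc"], "posted": False},
    {"url": "https://example.org/c.jpg", "caption": "C", "tags": ["arc"], "posted": False},
    {"url": "https://example.org/d.jpg", "caption": "D", "tags": ["arc"], "posted": False},
]

# A seeded generator keeps the example repeatable; omit it for real randomness.
chosen = post_random_images(records, 2, sent.append, random.Random(0))
```

Filtering on the `posted` flag before sampling is what keeps an image from being sent twice across runs, provided the updated records are written back to the JSON file afterwards.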
Why Tumblr? It’s free, many themes support a photo-gallery style layout, users can queue and schedule up to 300 posts for publication, and it can serve as a social media hub – using a service like IFTTT.com, Tumblr photo posts can trigger parallel photo posts on Twitter and Facebook.
Installing pytumblr was a challenge. After some troubleshooting, I realized it was the OAuth2 module, which is a dependency of pytumblr, that was causing the installation to fail. Eventually I was able to install and run pytumblr using Python 2.7.
After writing the scripts specifically with the ARChive of Contemporary Music in mind, I went back and moved all ARC-specific data into a separate settings.py file. This leaves the Tumblr-post code clean and generic – in theory, someone else could use this code as their own Tumblr bot, repurposing their own set of images. Generalizing the web scraper script was much more difficult, however, because this kind of image scraping is so context-dependent.
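A small sketch of what that separation buys: the generic code reads everything site-specific from a settings object. Here a `SimpleNamespace` stands in for a separate settings.py module, and every setting name and value (`BLOG_NAME`, `DEFAULT_TAGS`, `CAPTION_TEMPLATE`, `POST_STATE`), along with the `build_post` helper, is hypothetical rather than taken from the actual project.

```python
from types import SimpleNamespace

# Stand-in for a separate settings.py module; all names and values are hypothetical.
settings = SimpleNamespace(
    BLOG_NAME="example-archive.tumblr.com",
    DEFAULT_TAGS=["archive", "music"],
    CAPTION_TEMPLATE='{caption} <a href="{page}">See the full gallery</a>',
    POST_STATE="queue",
)


def build_post(record, settings):
    """Turn one scraped record into keyword arguments for a photo post."""
    return {
        "state": settings.POST_STATE,
        "source": record["url"],
        "caption": settings.CAPTION_TEMPLATE.format(
            caption=record["caption"], page=record["page"]
        ),
        # Concatenation builds a new list, so the shared defaults are not mutated.
        "tags": settings.DEFAULT_TAGS + record.get("tags", []),
    }


post = build_post(
    {
        "url": "https://example.org/gallery/cover-01.jpg",
        "page": "https://example.org/gallery/",
        "caption": "Album cover, 1967",
        "tags": ["album covers"],
    },
    settings,
)
```

Someone reusing the bot would then edit only the settings values, never `build_post` itself.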
The GitHub repository for this project contains all relevant scripts and data files, except for the actual settings file I use for ARC content, which includes confidential API keys. It also includes a README written in Markdown, a more detailed description in a TXT file, and a PDF of the slides I used when presenting this project to the class.
Ideas for future development of this tool include:
- Continue “abstracting” the code for the web scraper
- Update the README.md documentation accordingly
- Continue analyzing ARChive web traffic and social media engagement
- Systematically change variables in social media posts (time of day, caption text, tags/hashtags) and observe effect on engagement and site traffic
- Set up web scraping “profiles” for other popular image-gallery generators, e.g., CONTENTdm, Omeka, Flickr, Tumblr itself