As jazz music evolved alongside sound recording technology and the record industry, so too did the study and cataloging of sound recordings, or discography. From the early discographies of Charles Delaunay through the work of Brian Rust, Tom Lord, and many others, jazz discographers have published thousands upon thousands of pages of highly structured data about jazz records and jazz musicians.
The free database software BRIAN (in honor of Brian Rust), by Steve Albin, allows users to compile their own discographies in the Rust style and easily output this information as HTML. By web-scraping and parsing this data, we can visualize musicians’ performance and recording careers, and better understand the professional relationships of working musicians.
Abbey Lincoln (1930-2010) was a jazz singer, songwriter, and actress active from the 1950s through the 2000s. An Abbey Lincoln discography compiled by Michael Fitzgerald is available at jazzdiscography.com, home to many BRIAN-generated discographies.
Below is a snippet of the discography that includes two recording sessions for the 1960 LP We Insist! Max Roach’s Freedom Now Suite. These are good examples of the typical structure of a BRIAN-generated discography. General information about the session is at the top, followed by a list of all musicians on the session (the collective personnel) with their instrument(s) in parentheses. Below that are the performances; each is given a letter to indicate the order of performance in the studio (not to be confused with the order in which the recorded performances might appear on an LP or CD). Release, or issue, information is next: record label, format, catalogue number, title, and year of a published recording. This is followed by personnel exceptions, which specify the tracks a musician played on if they did not play on all of them. Finally, there are general notes.
Looking at the HTML, we see some CSS class names, like “SessCollPers” (highlighted), that we can use as hooks for our web scraper.
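As a rough illustration of using a class name as a hook, the sketch below collects the text of every element carrying the class "SessCollPers" and splits it into name and instrument pairs, using only Python's standard library. The sample markup, the `<div>` wrapper, the semicolon separator, and the names `ClassTextCollector` and `parse_personnel` are all assumptions made for illustration; the real page may be structured differently.

```python
from html.parser import HTMLParser
import re

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.chunks = []    # text pieces of the element currently being read
        self.results = []   # one cleaned-up string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1              # entering a matching element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:             # leaving the matching element
            text = " ".join("".join(self.chunks).split())
            self.results.append(text)
            self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def parse_personnel(text):
    """Split 'Name (instrument); Name (instrument)' into (name, instrument) pairs."""
    pairs = re.findall(r"([^();]+?)\s*\(([^)]*)\)", text)
    return [(name.strip(), instrument.strip()) for name, instrument in pairs]


# Made-up sample markup, not a transcription of the real page.
sample = '<div class="SessCollPers">Abbey Lincoln (vocal); Max Roach (drums)</div>'
collector = ClassTextCollector("SessCollPers")
collector.feed(sample)
personnel = [parse_personnel(text) for text in collector.results]
```

A third-party parser such as Beautiful Soup would make the class lookup shorter; the standard-library version is used here only to keep the sketch free of dependencies.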
Collecting the data
I wrote two Python scripts to collect the necessary network data from the HTML discography, parse it, and output it as CSV files formatted for Gephi. The first script scraped the HTML page for data about each recording session, including date, location, record label, works performed, and most importantly, personnel.
The second script created a Gephi edge table by looping through each personnel list and creating a row for each relationship between two musicians. The meat of this script is shown below.
These lines look for the name of our central player (Abbey Lincoln in this case) in the list of session personnel to confirm that this person indeed played on a particular session. The script then loops through each player and creates a relationship with every other player on the session, as long as the reciprocal relationship has not already been defined. It adds “undirected” as the Gephi edge “type” and (optionally) creates edge attributes from the session date and location information.
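The logic described above can be sketched roughly as follows. This is not the original script: the session dictionaries, the `Date` and `Location` attribute columns, and the function names are assumptions, and the musician names in the usage example are placeholders. The `Source`, `Target`, and `Type` headers are the ones Gephi recognizes when importing an edge table.

```python
import csv
import io
from itertools import combinations

FIELDS = ["Source", "Target", "Type", "Date", "Location"]


def build_edges(sessions, central="Abbey Lincoln"):
    """Return one undirected edge row per pair of musicians per session."""
    rows = []
    for session in sessions:
        personnel = sorted(set(session["personnel"]))
        if central not in personnel:
            continue  # skip sessions the central player is not listed on
        # combinations() yields each unordered pair exactly once, so the
        # reciprocal relationship (B, A) is never added after (A, B).
        for source, target in combinations(personnel, 2):
            rows.append({
                "Source": source,
                "Target": target,
                "Type": "Undirected",
                "Date": session.get("date", ""),          # optional attribute
                "Location": session.get("location", ""),  # optional attribute
            })
    return rows


def edges_to_csv(rows):
    """Serialize edge rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data for illustration only.
example_sessions = [
    {"personnel": ["Abbey Lincoln", "Musician A", "Musician B"],
     "date": "1960-01-01", "location": "Studio X"},
]
example_csv = edges_to_csv(build_edges(example_sessions))
```

Emitting one row per session, rather than merging repeated pairs up front, keeps the date and location attached to each edge; repeated pairs can be summed into weights later.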
I split this edge data into 20-year segments — 1950s-1960s, 1970s-1980s, and 1990s-2000s. This helped me illustrate Lincoln’s career arc and how her professional recording relationships changed over time.
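A split like this could be done with a small helper along these lines. It is a sketch only: it assumes each edge row has a `Date` value beginning with a four-digit year, which may not match the date format in the actual discography, and the function names are hypothetical.

```python
from collections import defaultdict

# Inclusive year ranges for the three two-decade segments.
PERIODS = [
    (1950, 1969, "1950s-1960s"),
    (1970, 1989, "1970s-1980s"),
    (1990, 2009, "1990s-2000s"),
]


def period_for_year(year):
    """Return the segment label for a year, or None if it falls outside all three."""
    for start, end, label in PERIODS:
        if start <= year <= end:
            return label
    return None


def split_by_period(edge_rows):
    """Group edge rows by segment, reading the year from the start of 'Date'."""
    buckets = defaultdict(list)
    for row in edge_rows:
        label = period_for_year(int(row["Date"][:4]))
        if label is not None:
            buckets[label].append(row)
    return dict(buckets)
```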
I also created node tables listing each musician’s main instrument, as well as the range of years in which they collaborated with Lincoln in the studio.
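One way to build such a node table is sketched below. It assumes the scraped data is available as (name, instrument, year) tuples and treats the most frequently listed instrument as the "main" one; that rule, the `FirstYear` and `LastYear` column names, and the function name are assumptions. `Id` and `Label` are the column headers Gephi expects in a node table.

```python
from collections import Counter, defaultdict


def build_nodes(appearances):
    """Build node rows from an iterable of (name, instrument, year) tuples."""
    instruments = defaultdict(Counter)  # name -> how often each instrument is listed
    years = defaultdict(list)           # name -> every year the musician appears
    for name, instrument, year in appearances:
        instruments[name][instrument] += 1
        years[name].append(year)
    return [
        {
            "Id": name,
            "Label": name,
            # Assumption: the most frequently listed instrument is the main one.
            "Instrument": instruments[name].most_common(1)[0][0],
            "FirstYear": min(years[name]),
            "LastYear": max(years[name]),
        }
        for name in sorted(years)
    ]
```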
Visualizing the data
Abbey Lincoln, 1950s-1960s
Fifty-five nodes, or just 26% of the total nodes in the network, appear in this graph. It’s the most tightly connected graph of the three subsets, with a density of 0.196. We can surmise that Lincoln was mostly recording with smaller combos during this period, and working with the same musicians on several occasions.
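For readers unfamiliar with the measure, the density of an undirected graph is the share of all possible edges that actually exist. The sketch below uses that standard definition; the function names are hypothetical. With 55 nodes there are 1,485 possible edges, so a density of 0.196 corresponds to roughly 291 distinct pairs of musicians, a figure derived here from the rounded density and not taken from the graph itself.

```python
def max_possible_edges(node_count):
    """Number of distinct unordered pairs among node_count nodes."""
    return node_count * (node_count - 1) // 2


def density(node_count, edge_count):
    """Share of possible undirected edges that are present (0.0 to 1.0)."""
    if node_count < 2:
        return 0.0
    return edge_count / max_possible_edges(node_count)


possible = max_possible_edges(55)      # 55 * 54 / 2
implied_edges = round(0.196 * possible)  # approximate, since 0.196 is rounded
```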
One of those frequent collaborators was drummer Max Roach, Lincoln’s husband from 1962 to 1970. Roach appears in the graph as the large red node closest in proximity to Lincoln; they share an edge weight of 16. Roach is connected to many of the other musicians, with a degree of 39.
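Edge weight and degree can both be derived from the per-session edge rows. The sketch below assumes that weight counts how many times a pair of musicians appears together and that degree counts a musician's distinct collaborators; the function name and the placeholder names in the usage example are hypothetical.

```python
from collections import Counter, defaultdict


def weights_and_degrees(edge_pairs):
    """Return (weights, degrees) from an iterable of (source, target) pairs.

    weights maps frozenset({a, b}) to the number of times the pair occurs;
    degrees maps each name to its number of distinct collaborators.
    """
    # frozenset makes (A, B) and (B, A) count as the same undirected edge.
    weights = Counter(frozenset(pair) for pair in edge_pairs)
    neighbors = defaultdict(set)
    for pair in weights:
        if len(pair) != 2:
            continue  # ignore self-pairs such as (A, A)
        a, b = tuple(pair)
        neighbors[a].add(b)
        neighbors[b].add(a)
    degrees = {name: len(others) for name, others in neighbors.items()}
    return weights, degrees


# Placeholder pairs for illustration only.
example_pairs = [
    ("Abbey Lincoln", "Musician A"),
    ("Musician A", "Abbey Lincoln"),
    ("Abbey Lincoln", "Musician B"),
]
example_weights, example_degrees = weights_and_degrees(example_pairs)
```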
Abbey Lincoln, 1970s-1980s
At first glance it appears that Lincoln was much less active in her recording career during the 1970s and ’80s. And indeed, with only 35 nodes, or 16% of the total, and just 6.8% of all edges, this graph shows the fewest musicians and the fewest connections of the three subsets. This graph is also less dense than the preceding one, and there are more distinct clusters, indicating fewer collaborations among members of the different combos with whom Lincoln worked.
Abbey Lincoln, 1990s-2000s
The latter years of Lincoln’s career were her most fruitful, with 10 albums recorded for Verve in this period, along with other recorded performances. She also performed with larger ensembles than in the past; at least one orchestra is identifiable as a large cluster at the bottom right.
There are 142 nodes in this graph — exactly two-thirds of the total nodes in the network — and nearly 77% of all edges, indicating that Lincoln collaborated with many more musicians during this phase of her career than she had in the past.
Bassist Michael Bowie, represented here as the large blue node at Lincoln’s 11 o’clock, was Lincoln’s most regular collaborator; the pair share an edge weight of 20, and Bowie has a degree of 47.
Putting it all together
Maintaining our color scheme for the three phases of Lincoln’s career allows us to approximate a timeline on the full network graph, with her early collaborators in red on the left and later collaborators in blue on the right. Musicians who span the different phases are represented by the appropriate secondary color. We can see some significant overlap between her mid- and late-career periods, represented by the green nodes, which highlights the arbitrariness of the cutoffs of our two-decade spans. Lincoln and the two musicians with whom she worked throughout her career, Mal Waldron and Cedar Walton, are in gray. These individuals, along with two purple nodes — heavyweights Babatunde Olatunji and Stanley Turrentine, with whom Lincoln worked in the early and late parts of her career — act as bridges through time, connecting different generations of musicians.
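The color scheme amounts to a lookup from the set of career phases a musician spans to a single color, sketched below. Red, blue, green, purple, and gray come from the description above. Yellow for the middle phase and orange for early plus middle are inferred from the secondary-color logic and are not stated anywhere in this post, so treat them as assumptions; the phase labels and function name are hypothetical.

```python
PHASE_COLORS = {
    frozenset({"early"}): "red",
    frozenset({"mid"}): "yellow",             # inferred, not stated in the post
    frozenset({"late"}): "blue",
    frozenset({"early", "mid"}): "orange",    # inferred, not stated in the post
    frozenset({"mid", "late"}): "green",
    frozenset({"early", "late"}): "purple",
    frozenset({"early", "mid", "late"}): "gray",
}


def node_color(phases):
    """Return the color for the set of phases in which a musician appears."""
    return PHASE_COLORS[frozenset(phases)]
```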
I then used the Sigma.js Exporter plugin for Gephi to create an interactive network graph. (Note: After I uploaded this exported content to a subdirectory of my WordPress installation, I had to edit my .htaccess file in order for the URL to resolve properly. This Stack Overflow answer was helpful.)
I plan to connect this project with my work on Linked Jazz by expressing in RDF the discographic data I web-scraped and parsed. To this end, I’ve begun sketching a data model and drafting some JSON-LD, making heavy use of the Music Ontology. I’ll need to reconcile the musicians’ names with a name authority file, likely VIAF, to obtain URIs, and further refine my data model.
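To give a sense of what such JSON-LD might look like, here is a minimal sketch that describes one session as a Music Ontology performance with its performers. It is not the data model under development: the choice of `mo:Performance`, `mo:performer`, and `mo:MusicArtist`, the function names, and the placeholder `example.org` URIs are all assumptions. In a reconciled dataset the artist identifiers would be replaced by authority URIs such as those from VIAF.

```python
import json

CONTEXT = {
    "mo": "http://purl.org/ontology/mo/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def slug(name):
    """Turn a display name into a simple URI-friendly token."""
    return name.lower().replace(" ", "-")


def session_to_jsonld(session_id, performer_names, base="http://example.org/"):
    """Describe one recording session as a JSON-LD document (placeholder URIs)."""
    return {
        "@context": CONTEXT,
        "@id": f"{base}session/{session_id}",
        "@type": "mo:Performance",
        "mo:performer": [
            {
                "@id": f"{base}artist/{slug(name)}",  # placeholder, not a VIAF URI
                "@type": "mo:MusicArtist",
                "foaf:name": name,
            }
            for name in performer_names
        ],
    }


example_doc = json.dumps(
    session_to_jsonld("s1", ["Abbey Lincoln", "Max Roach"]), indent=2
)
```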