Abstract
This article introduces the project A Big Data History of Music, which set out to unlock the bibliographical data held by research libraries in order to create new research opportunities for musicologists. The project cleaned and enhanced aspects of the British Library catalogues of printed and manuscript music, which are now available as open data. It also experimented with the analysis and visualization of the British Library datasets and the RISM inventories of printed and manuscript music. The article shows how quantitative analysis of these datasets can expose long-term historical trends, such as the rise and fall of music printing in 16th- and 17th-century Europe. Data analysis and visualization also facilitate research on the dissemination and canonization of specific composers (as shown by case-studies on Palestrina and Purcell) and on changing trends in genres, scoring and ethnic colourings in music (as shown by a case-study on ‘Scottish’ music).
Big Data has been defined as information that requires special processing techniques because it exists in large quantities, is highly heterogeneous, or is produced extremely quickly.1 Big Data is usually associated with major scientific endeavours such as the Large Hadron Collider or the Human Genome Project. These projects produce millions of gigabytes of data annually, to be analysed collaboratively by scientists spread over many nations. Yet humanities scholars have also been mining large datasets, such as the full-text archives produced by optical character recognition (OCR) of digitized books and other scanned documents. In 2014 the historians Jo Guldi and David Armitage called on their discipline to use quantitative analysis to understand long-term historical change—for instance, to plot the effects of climate change or the varying distribution of wealth. For Guldi and Armitage, such large-scale data analysis can help show the synchronicity and interdependence of global events, countering a focus on small case studies and microhistory.2
Literary historians have also explored the new perspectives offered by large quantities of data. Franco Moretti pioneered the technique of ‘distant reading’, in which he analysed bibliographical data such as the titles of novels and their publication details. Whereas the ‘close reading’ typically practised by literary critics focuses on a few canonized texts, Moretti sought through his quantitative analyses to gain an overview of the production of novels across the 18th and 19th centuries. He showed how political or military conflict led to an initial collapse and then a belated rise in novel writing, for instance in France after 1789 or in Milan during the wars of the late 1840s. In a study of the English novel, he showed how each genre (for example, the sentimental or the Gothic novel) was in favour for about 25 to 50 years, before being superseded by another genre.3 Moretti’s approach (not least his claim that ‘quantitative research provides a type of data that is ideally independent of interpretations’)4 is controversial and has attracted charges of positivism. In response, he and his defenders argue that ‘distant reading’ draws attention to the ‘Great Unread’, allowing the representativeness of the literary canon to be evaluated against the thousands of other novels produced in the period.5
For music historians there already exist large bibliographical datasets, including the catalogues of research libraries and the various inventories created by RISM (Répertoire International des Sources Musicales). The British Library’s catalogue of printed music, for instance, describes works by more than 100,000 composers. Such large datasets offer the possibility for a ‘distant reading’ of musical sources, directing scholarly attention away from the canonized composers that are the usual object of research and instead highlighting long-term trends. By extending the scope of musicological study in this way, we open the possibility of exploring what might be called (to adapt Moretti’s term) the ‘Great Unheard’. A further advantage of investigating bibliographical datasets is that they already possess structure, having been created according to the rules of library cataloguing; they therefore may be easier to manipulate and analyse than other large datasets available to musicologists (such as libraries of audio files).
Our project A Big Data History of Music, a collaboration between Royal Holloway and the British Library, has explored how such large bibliographical datasets may open new avenues for research into music history. The first phase of the project cleaned and enhanced various aspects of the British Library’s catalogues of printed and manuscript music. The second phase piloted techniques for analysing large datasets, in order to examine large-scale trends in music history and to use visualizations to test and develop hypotheses. This article introduces the datasets used in the project, and describes some of the results gained in the second phase of the project. It is hoped this account will whet the appetite of readers to explore the datasets and undertake similar analyses themselves.
Project datasets
Our project worked with several datasets, whose characteristics and limitations will be briefly described here. The British Library’s catalogue of printed music (search interface at http://explore.bl.uk) contains over a million records, describing publications between 1500 and the present day. The British Library has a copy of most music published in Britain, acquired as a result of legal deposit legislation, and a vast collection of material from elsewhere; however, much popular and ‘light’ music of the 20th century still has not been added to the catalogue. Catalogue entries vary markedly in their level of detail, having been accumulated over two centuries by cataloguers working to different standards. Some old records give little more than the title of the book and the place of publication, whereas the records for 16th-century anthologies have been recently upgraded to include transcriptions of title-pages and full inventories of contents. Information such as place of publication and name of publisher is recorded in the form given on the copy, so can vary enormously; thus the location ‘Lyon’ may be recorded in such variants as ‘Leon’, ‘Lions’, ‘Lugduni’ or ‘Lubduni’. The dating of much 18th- and 19th-century printed music is conjectural—often cataloguers assigned these publications to a round date such as a new decade—and therefore cannot be relied on for a year-by-year chronological analysis.
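The variant spellings described above can be reconciled with a simple lookup table. The following is a minimal sketch in Python, not the procedure used by the project: the table `PLACE_VARIANTS` and the function `normalize_place` are hypothetical names, and only the Lyon variants are taken from the catalogue examples quoted above.

```python
# Hypothetical lookup table: only the Lyon variants below come from the
# catalogue examples quoted in the text; a real table would be far longer.
PLACE_VARIANTS = {
    "lyon": "Lyon",
    "leon": "Lyon",
    "lions": "Lyon",
    "lugduni": "Lyon",
    "lubduni": "Lyon",
}


def normalize_place(raw):
    """Return a standard place name, or the trimmed input if it is unknown."""
    trimmed = raw.strip()
    # Ignore surrounding punctuation and brackets, and compare case-insensitively.
    key = trimmed.strip(".,:;[]").lower()
    return PLACE_VARIANTS.get(key, trimmed)


# Illustrative inputs only; unknown forms are passed through unchanged
# so that they can be reviewed by hand.
for form in ["Lugduni", "[Lions]", "Venetiis"]:
    print(form, "->", normalize_place(form))
```

Passing unknown forms through unchanged, rather than guessing, keeps the original catalogue wording visible for manual checking.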
Regarding the British Library’s catalogues of manuscript music, the project primarily worked with a digitized version of Augustus Hughes-Hughes’s Catalogue of manuscript music in the British Museum (London, 1906–9), which until now has not been available electronically. The dataset derived from Hughes-Hughes’s catalogue contains more than 35,000 records, each describing an individual composition in a manuscript, with details of genre and composer where known. Unlike the catalogue of printed music, information on the place of origin is rarely given. Both British Library datasets are freely available for download from www.bl.uk/bibliographic/download.html as CSV (comma-separated value) files, for users who wish to work in software such as Excel. The catalogue of printed music is also available in RDF/XML; RDF (the Resource Description Framework) enables the exchange and reuse of data on the web, giving users the opportunity to combine this dataset with other resources.
Also used in the project were RISM datasets, particularly its inventories of early printed music before 1800: RISM A/I contains about 100,000 records describing editions holding the work of a single composer; RISM B/I contains about 17,000 records for anthologies (containing works by more than one composer). As the product of an international cataloguing effort, RISM A/I and B/I have a much wider geographical scope than the British Library catalogues. Although their coverage of Eastern European and Iberian libraries is patchy, RISM A/I and B/I probably list up to 80 per cent of extant printed editions worldwide. Like the British Library’s catalogue of printed music, RISM A/I and B/I contain information on places of publication; dates of publication are given only when included on the copy, meaning much music printed after 1700 has no date allocated to it. The final dataset used was RISM A/II, which contains over 900,000 records describing manuscripts originating between c.1500 and c.1850, often catalogued to a high level of detail, with information on constituent compositions. Its geographical coverage is strongest for German- and English-speaking lands; it has relatively few contributions from French, Italian or Iberian libraries, which have preferred to catalogue their manuscript holdings in national bibliographical initiatives. Since 2012, RISM A/II has been available as open data from http://opac.rism.info, and from May 2015 all of RISM A/I and a small portion of B/I can also be consulted via this site.6
Most of the project datasets were obtained in the format of library catalogue records (MARC21), from which spreadsheets of data were exported using the tool MarcEdit (http://marcedit.reeset.net). In the initial phase of data cleaning, particular attention was given to facets such as the places and dates of publication, as these are often recorded in variant forms that can thwart automated analysis. Data cleaning and alignment were again an important part of the second phase of the project, because an excerpt of data rarely has sufficient consistency to be immediately suitable for analysis. Once a dataset has been prepared for analysis, it can be manipulated and visualized with a variety of tools, ranging from Excel spreadsheets to open-source software such as the R Project for Statistical Computing (www.r-project.org/).7 The following sections describe case studies explored in the project, showing how the analysis of large datasets allows new ways of studying music history.
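To illustrate why dates need cleaning before analysis, the sketch below pulls a four-digit year out of a free-text date field. It is an assumption-laden example rather than the project's own routine: the function name `extract_year` and the sample strings (bracketed, queried or 'circa' dates) are hypothetical, and Roman-numeral dates are deliberately left unresolved.

```python
import re

# Match a year between 1500 and 2099 that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def extract_year(raw):
    """Return the first plausible year in a date string, or None if absent."""
    match = YEAR_RE.search(raw)
    return int(match.group(1)) if match else None


# Hypothetical date strings of the kind a catalogue export might contain.
for text in ["[1705?]", "c.1790", "n.d.", "M.D.LXXXIX"]:
    print(repr(text), "->", extract_year(text))
```

A conjectural date recovered in this way is still conjectural; a fuller treatment would record a flag alongside the year so that such entries can be excluded from year-by-year analysis.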
The rise and fall of music publishing, 1500–1700
Quantitative analysis can allow musicologists to detect long-term trends, for instance involving the formation of musical markets and musical taste across centuries. Once a long-range development has been detected, it is possible to identify the individual items that contribute to this trend; such dynamic switching of focus between macro- and micro-scale is one of the most powerful aspects of Big Data analysis, although hard to capture in a journal article such as this. As an example, we analysed the rise and fall of music publishing in the 16th and 17th centuries, using data from RISM A/I and B/I. As mentioned above, the RISM datasets are reasonably comprehensive and have a high degree of chronological accuracy: typeset printed music of the 16th and 17th centuries is usually dated to a specific year on its title-page, unlike the engraved or lithographed music of later eras. Spellings of place names in the dataset required standardization, and geographic co-ordinates were added to facilitate the production of maps. The following analysis was then done in Excel using a spreadsheet of over 16,000 bibliographical entries.
Viewed decade-by-decade (illus.1), the RISM data shows the rise of European music publishing across the 16th century, albeit with a plateau in the 1570s and a brief dip in the 1590s. Music publishing reached a peak in the 1610s, during which decade approximately 1,800 editions of music were printed. Such an increase in printing constituted a paradigm shift in how composers disseminated their works in the 16th and early 17th centuries. Yet in the 1630s music printing suddenly declined, and for the rest of the 17th century the industry operated at about half its previous level of intensity, with never more than about 900 publications surviving from each decade.
European output of printed music by decade, 1500–1699. Red shading denotes anthologies (data from RISM B/I) and blue shading indicates single-composer editions (data from RISM A/I)
The red and blue shadings in illus.1 show anthologies versus single-composer editions respectively. Anthologies dominate the early years of music printing, suggesting the entrepreneurial role of publishers and editors in this emerging industry. In the 1540s anthologies still accounted for about half of all printed music, but thereafter their number remained static at approximately 200 per decade. The subsequent growth in printed music entirely comprised single-composer collections, suggesting that from the 1550s composers took more initiative in publishing their works for financial gain or as symbols of prestige and skill. This quantitative analysis supports Kate van Orden’s recent suggestion that in the early 16th century, ‘it is hard to presume … that print was a natural locus of [musicians’] authorial identity’, yet by the 1550s there was ‘a dramatic shift in the attitude of composers toward the [single-composer] book of music’.8
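The decade-by-decade tally behind such a chart is straightforward to reproduce outside a spreadsheet. The following sketch assumes that each record has already been reduced to a year and a RISM series label ('A/I' for single-composer editions, 'B/I' for anthologies); the function names and the three sample records are invented for illustration and are not RISM data.

```python
from collections import Counter


def decade_of(year):
    """Return the first year of the decade, e.g. 1547 -> 1540."""
    return (year // 10) * 10


def count_by_decade(records):
    """Count records per (decade, series); records are (year, series) pairs."""
    counts = Counter()
    for year, series in records:
        counts[(decade_of(year), series)] += 1
    return counts


def anthology_share(counts, decade):
    """Proportion of a decade's output listed in B/I, or None if no output."""
    anthologies = counts[(decade, "B/I")]
    total = anthologies + counts[(decade, "A/I")]
    return anthologies / total if total else None


# Invented sample records, for illustration only.
sample = [(1541, "A/I"), (1547, "B/I"), (1552, "A/I")]
counts = count_by_decade(sample)
print(counts[(1540, "B/I")], anthology_share(counts, 1540))
```

Keeping the series label in the key means the same counts can feed both the stacked chart and the calculation of the anthology share.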
Having observed these large-scale trends, we can examine the data in closer detail. A year-by-year analysis (illus.2) shows that the plateau in European music publishing in the 1570s can be attributed to falls in 1571/2 and 1576/7. A chart of the output of the leading printing centres (illus.3) shows that substantial falls occurred in Venice during these years. Both dips can be attributed to external factors: in 1571 the war with the Turks (culminating in the Battle of Lepanto), and in 1576/7 the plague epidemic that killed about 30 per cent of the Venetian population.9 Almost 30 years ago, Tim Carter used RISM data to chart the publishing of secular music in late 16th-century Italy, and his graphs likewise showed the temporary dips caused by these Venetian crises in 1571 and 1576/7.10 Compared to Carter’s article, the advantages of a digital analysis lie in the ease with which the data can be manipulated, drilling down to expose the individual publications produced in Venice, yet also placing Venice within Europe-wide trends for music printing.
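The dips discussed above were identified by reading charts, but a rough automated check can point to candidate years for closer inspection. The sketch below is one possible approach under stated assumptions: a year is flagged when its output falls below a chosen fraction of the average for neighbouring years. The names `yearly_counts` and `find_dips`, the window of two years and the ratio of 0.6 are all arbitrary choices for illustration, and the sample figures are invented.

```python
from collections import Counter


def yearly_counts(records, place):
    """Count records per year for one city; records are (year, city) pairs."""
    return Counter(year for year, city in records if city == place)


def find_dips(counts, start, end, window=2, ratio=0.6):
    """Flag years whose output is below `ratio` times the neighbouring average."""
    flagged = []
    for year in range(start, end + 1):
        neighbours = [
            counts[y]
            for y in range(year - window, year + window + 1)
            if y != year
        ]
        baseline = sum(neighbours) / len(neighbours)
        if baseline > 0 and counts[year] < ratio * baseline:
            flagged.append(year)
    return flagged


# Invented figures: steady output with one poor year.
sample = (
    [(1574, "Venice")] * 10 + [(1575, "Venice")] * 10
    + [(1576, "Venice")] * 3
    + [(1577, "Venice")] * 10 + [(1578, "Venice")] * 10
)
print(find_dips(yearly_counts(sample, "Venice"), 1575, 1577))
```

Any year flagged in this way still needs to be checked against the individual bibliographical entries and against external evidence such as war or plague.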
European output of printed music by year, 1500–1649. Data from RISM A/I and B/I
Annual output of printed music for six major cities, 1500–1699. Data from RISM A/I and B/I
Turning to the fate of music printing in the 17th century, a Big Data approach allows long-term trends to be plotted and thereby raises questions about the social and economic factors shaping musical life. As illus.1 shows, letterpress music printing had a distinct lifespan, with a sharp decline in the 1630s. Such a profile conforms to Fernand Braudel’s comment on the life expectancy of industries before the modern era: ‘the typical pattern of a sharp rise followed by an abrupt fall can very easily be imagined as the probable profile, in the pre-industrial economy’.11 In the case of music printing, one reason for the ‘abrupt fall’ was that movable type could not represent the complexities of virtuoso vocal or instrumental music, and it was then partly superseded by manuscript dissemination in the 17th century.
Analysis of the RISM data can also show the geographical reconfiguration of music publishing in the mid-17th century. In the previous century, Venice dominated music printing, typically producing over half of the European output of printed music in each decade. Illus.4, charting the ten most productive centres of music printing in the 1570s, shows the lagoon city’s pre-eminence even in that crisis-ridden decade. Venice’s dominance ceased in the 1630s, partly because of local reasons such as another plague outbreak in 1630–2, but also because of deeper structural changes, as the focus of the European economy shifted away from the Mediterranean to centres with closer access to the Atlantic trade such as London, Paris and Amsterdam.12 These cities began to play a major role in European music printing from the 1650s (see illus.3), although by this stage the industry had fragmented. The number of printing centres increased, yet each typically had a smaller output and served a narrower market. Illus.5 shows the ten cities with the highest output of printed music in the 1690s. No longer was a single city dominant: instead London and Paris had equal importance, with just over 150 items of printed music each. Bologna was the third most productive centre of music printing, and Venice and Amsterdam were in fourth and fifth places respectively. Such analyses highlight trends spanning two centuries, showing the reconfiguration of music printing in response to the economic and musical changes of the 17th century.
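Rankings of printing centres for a given decade, and a single city's share of that decade's output, can be derived from the same cleaned records. The sketch below assumes (year, city) pairs with standardized place names; `top_centres` and `city_share` are hypothetical helper names and the sample records are invented.

```python
from collections import Counter


def decade_counts(records, decade):
    """Count records per city for the ten years starting at `decade`."""
    return Counter(
        city for year, city in records if decade <= year < decade + 10
    )


def top_centres(records, decade, n=10):
    """Return the n most productive cities of a decade as (city, count) pairs."""
    return decade_counts(records, decade).most_common(n)


def city_share(records, decade, city):
    """Proportion of a decade's output printed in one city, or None if empty."""
    counts = decade_counts(records, decade)
    total = sum(counts.values())
    return counts[city] / total if total else None


# Invented sample records, for illustration only.
sample = [(1691, "London"), (1692, "London"), (1693, "Paris"), (1575, "Venice")]
print(top_centres(sample, 1690))
print(city_share(sample, 1690, "London"))
```

Running the same functions for successive decades gives a quick view of when one city's share falls and others rise.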
Totals of music publications from the ten main printing centres of the 1570s. Data from RISM A/I and B/I
Totals of music publications from the ten main printing centres of the 1690s. Data from RISM A/I and B/I
The previous paragraphs have used broad brush-strokes to represent complex phenomena. It might be objected that counting publications is a crude measure: surely a book historian should distinguish between large volumes containing many compositions and single-sheet songs, between first editions and reprints, and between pricey folio editions and cheap octavo books? Clearly the analysis could be nuanced in many ways. Yet the advantage of a digital analysis is that it is easy to cross-refer at all times to the master-sheet of individual bibliographical entries, and if necessary to augment this data or change the selection for analysis. Such Big Data analyses add a wider perspective to musicological study, showing how the individual sources (which are the usual object of research) contribute to broader trends, and thereby highlighting the interplay between music and its economic, political and social environments.
Mapping dissemination, reception and canonization
Analysis of bibliographical data can also illuminate the dissemination and reception of the works of specific composers, as the following case studies on Giovanni Pierluigi da Palestrina and Henry Purcell show. Spreadsheets detailing the dissemination of their works, derived from relevant entries in RISM A/I, A/II, B/I or B/II, can be imported into web-based visualization services such as Google Fusion Tables (http://google.com/fusiontables) or Palladio (http://palladio.designhumanities.org/). These can produce a map of geo-coded data, or create network diagrams that show the links between entities (for instance, between musical works and places). Such network diagrams can expose geographical or chronological trends in the circulation of music, raising questions about why certain works or genres gained importance while others remained outliers.
Illus.6 is a network diagram showing the places and decades where Palestrina’s music was published in the 16th and 17th centuries. It includes single-composer editions of Palestrina (as listed in RISM A/I) and anthologies containing works by Palestrina (as listed in RISM B/I). The diagram clarifies which locations were central or peripheral to the dissemination of his music, and shows some of the patterns in the posthumous publication of his music. The size of the nodes shows that the most important locations for the publication of Palestrina’s music (in terms of numbers of books) were Venice and Rome. Venice, as demonstrated above, had an overwhelmingly dominant position in European music printing of the 16th century; in the 1570s Palestrina favoured Scotto and Gardano as publishers, and Venetian firms offered reprints of many of Palestrina’s works initially printed in Rome. The second biggest node, Rome, was the place where the first editions of many of Palestrina’s sacred collections appeared, and as the centre of Tridentine church reform it remained important for the ongoing publication of his liturgical music.
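The data underlying a place-and-decade network of this kind amounts to a weighted edge list. The sketch below shows one way such a list might be prepared before import into a visualization service; it makes no claim about how the services themselves work internally. The function `build_network` and the sample editions are hypothetical, and node size is taken simply as the number of books printed in each place.

```python
from collections import Counter


def build_network(editions):
    """Build edge weights and node sizes from (place, year) pairs.

    Each edge links a place to a decade; its weight is the number of
    editions printed in that place during that decade.
    """
    edges = Counter()
    node_size = Counter()
    for place, year in editions:
        decade = f"{(year // 10) * 10}s"
        edges[(place, decade)] += 1
        node_size[place] += 1
    return edges, node_size


# Invented sample editions, for illustration only.
sample = [("Venice", 1572), ("Venice", 1581), ("Rome", 1575)]
edges, node_size = build_network(sample)
for (place, decade), weight in sorted(edges.items()):
    print(place, decade, weight)
```

Written out as rows of place, decade and weight, the edge list can be saved as a CSV file for upload.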
Places and decades for the publication of Palestrina’s music to 1700. Data from RISM A/I and B/I
So far, this analysis has confirmed the observations in Jane Bernstein’s 2007 article on Palestrina’s publishing strategy.13 Where the network diagram makes a distinct contribution is in clarifying the chronological and geographical extremes of the publishing of Palestrina’s music. It shows that after Palestrina’s death in 1594, his music continued to be published mainly in Catholic centres such as Rome (where single-composer editions of his hymns and offertories appeared until the 1620s, and anthologies with his music, notably Anerio’s four-voice arrangement of the Missa Papae Marcelli, appeared until the 1680s). Another centre for the posthumous printing of Palestrina’s music was Antwerp, which was forcibly re-catholicized after 1585.
Illus.6 furthermore shows the smaller publishing centres and peripheral locations where Palestrina’s music appeared. Given his reputation as an archetypal Catholic composer, it is not surprising that his music was printed in Counter-Reformation Milan and the Jesuit university town of Dillingen in Bavaria. The diagram also demonstrates the dates when Palestrina’s music was printed in Protestant locations—Nuremberg, Strasbourg and London in the 1580s, Heidelberg in the 1600s and Leipzig in the 1610s. Glimpsing such outliers can prompt an investigation into which genres and compositions travelled to Protestant locations: for instance, did madrigals in contrafacta or in wordless intabulations for lute travel better than Latin motets?
Unusually for composers of the 16th century, Palestrina’s music underwent a strong revival in the 18th and 19th centuries. The strength of this revival is indicated by the RISM A/II dataset of music manuscripts. Here the caveat must be repeated that RISM A/II is incomplete, with little coverage of French, Iberian or Italian holdings. These omissions notwithstanding, Table 1 shows the enormous increase in manuscript copies of Palestrina in the 18th and 19th centuries, an increase probably partly triggered by Fux’s veneration of Palestrina in his Gradus ad Parnassum (1725).14 Such statistics demand closer scrutiny, for instance an investigation of which of Palestrina’s compositions were copied most, or a study to see if other 16th-century composers such as Arcadelt or Lassus underwent a comparable revival. At the least, though, such quantitative analysis can open new avenues for research into reception history.
Numbers of compositions in manuscript attributed to Palestrina in RISM A/II, 1500–1900. Some dates are approximate.