releasing open data for illuminated manuscript collection records and research…
Opportunities and Possibilities
The primary focus of Open Book has been to release open data about the collections in the Fitzwilliam Museum, and to explore how we might go beyond item level data by linking to related resources and to research activity data.
A fundamental building block of the project was to establish an organisational position on open data: how we define it and what it means to us. We had to be sure that we understood open data and were prepared to release it, which meant that the issue of licensing soon reared its head. Before tackling licensing, though, we tried to formalise a description of the types of data that we deal with as a museum and, in particular, to clarify what we meant by metadata. With that distinction in place, internal advocacy for open data was easier. The result has been a layered approach: a basic set of metadata (aimed specifically at resource discovery and aggregation) is dedicated to the public domain, while more detailed data is licensed under Creative Commons Attribution-ShareAlike. The latter still counts as an open licence but recognises the curated nature of this data and the significance (in academic terms) of attributing the source.
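To make the layered idea concrete, here is a minimal Python sketch (not the Fitzwilliam's implementation) that splits one full record into a public-domain discovery layer and a CC BY-SA layer. The field names in `DISCOVERY_FIELDS`, the `split_record` helper and the sample record are all hypothetical; the post does not say which fields fall into the basic set.

```python
from dataclasses import dataclass

# Hypothetical field list: which fields form the basic discovery layer is an
# assumption for illustration, not the museum's actual selection.
DISCOVERY_FIELDS = frozenset({"identifier", "title", "object_type", "maker", "date"})

PUBLIC_DOMAIN = "Public domain dedication"
CC_BY_SA = "Creative Commons Attribution-ShareAlike"


@dataclass
class LicensedLayer:
    """A slice of a record together with the licence it is released under."""
    licence: str
    fields: dict


def split_record(record: dict) -> tuple[LicensedLayer, LicensedLayer]:
    """Split one full record into a public-domain discovery layer and a BY-SA layer."""
    basic = {k: v for k, v in record.items() if k in DISCOVERY_FIELDS}
    detailed = {k: v for k, v in record.items() if k not in DISCOVERY_FIELDS}
    if "identifier" in record:
        # Repeat the identifier in the detailed layer so the two layers can be re-linked.
        detailed = {"identifier": record["identifier"], **detailed}
    return LicensedLayer(PUBLIC_DOMAIN, basic), LicensedLayer(CC_BY_SA, detailed)


# Hypothetical sample record.
record = {
    "identifier": "example-001",
    "title": "Book of Hours (example)",
    "date": "15th century",
    "provenance_note": "Curated provenance text",
    "research_summary": "Curated research text",
}
basic, detailed = split_record(record)
print(basic.licence, sorted(basic.fields))
print(detailed.licence, sorted(detailed.fields))
```

Repeating the identifier in both layers is a design choice: it lets an aggregator that holds only the public-domain layer point back to the fuller, attributed data.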
The next question was how we were going to release this data, and I have to admit we have taken a scattergun approach. Metadata is made available for harvesting via OAI-PMH. Fuller data sets are offered via an API returning JSON responses, and via a SPARQL endpoint that publishes RDF mapped to the CIDOC-CRM. We have conceived this as a data service that complements our web presence and online catalogue. Achieving it has taken time and technical resources that would have been difficult to access without the Open Book project. The uptake and value of each channel can only be properly assessed over time. If we had to focus on a single aspect, perhaps the path with the greatest potential returns would be releasing open metadata (whether through OAI-PMH or some other data feed) to the Culture Grid and from there to Europeana. The sector context of such aggregations is likely to offer users the most benefit, in the short term at least.
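For readers unfamiliar with OAI-PMH, the sketch below shows roughly what a harvester does with a metadata feed of this kind: it requests `ListRecords` in the `oai_dc` format and follows `resumptionToken`s page by page. It uses only the Python standard library; the base URL `https://example.org/oai` and the sample page are placeholders rather than the museum's actual endpoint, and the network fetch itself is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken must be sent on its own."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[dict], Optional[str]]:
    """Extract identifiers and Dublin Core titles from one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        # Deleted records carry a header but no metadata element.
        titles = [t.text for t in dc.iterfind("dc:title", NS)] if dc is not None else []
        records.append({"identifier": identifier, "titles": titles})
    # An empty (or absent) resumptionToken marks the final page of the list.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# A hand-written one-record page standing in for a real harvest response.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example manuscript leaf</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_PAGE)
print(list_records_url("https://example.org/oai"))
print(records, token)
```

A real harvester would call `list_records_url` with each returned token until `parse_list_records` reports `None`, fetching each page with something like `urllib.request.urlopen`.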
Development of use case scenarios and our understanding of user requirements for the metadata released by Open Book have relied heavily on work carried out with the sister project Contextual Wrappers 2. In the focus group sessions for Contextual Wrappers 2 and Open Book, the concept of open data was well received by data managers and users alike, although participants recognised that there may still be management resistance within some organisations. The general approach adopted by both projects, guided discovery (providing contextual information, including collection level descriptions, to supplement item level records), was seen as positive and is already pursued by some museums. Participants highlighted its value to users unfamiliar with the subject or collection, as well as its role in creating course subject guides. They also saw potential for academic research: creating links across multiple collections, and using collection level descriptions to draw together bibliographic information and research material across a group of objects.
In addition to releasing item level metadata, Open Book has explored the modelling of data related to the outputs from ongoing research projects (including pigment analysis of illuminated manuscripts) at the Fitzwilliam Museum. An initial proposition was that the CIIM (collections information integration module), which was being deployed during the project, could be used to capture, store and eventually publish the primary research data outputs. As such, one group of potential users would have been internal, employing the CIIM to manage the research data. However, it became clear at an early stage that this wasn’t going to be possible or desirable. The real strength of the CIIM is data augmentation and publishing; to use it as a database for research outputs and to integrate it within the research workflow (particularly given that the CIIM’s user interface is at an early phase of development) wasn’t a viable option. Although the notion of ‘contexts’ within the CIIM, which augment the item data extracted from the collections database, could be used to store item-specific research data, it was apparent that research projects don’t operate according to this kind of simple one-to-one relationship. Tools and applications with which the researcher is already familiar are a more efficient means of handling the research process and better able to capture the complexities of the data.
Ultimately, what we have tried to do is encapsulate the research output in a structured form, as metadata about the project. This kind of ‘abstract’ of the research sounds (and is) quite straightforward, but it isn’t something that we have done before or have had the capacity to store and publish in a sensible way. We will continue to explore how we might vary the granularity of these abstracts, from global project metadata down to individual strands of a research project, and how they can be linked to specific context information augmenting the basic object data. We want both to publish the fact that research (which may not yet have any associated academic publication) is ongoing and to associate the relevant objects with that research, creating links between item and research metadata in the same way as there are links between items and collections. We have more work to do on assessing the potential use of this metadata, but the emphasis is on resource discovery, particularly within a cross-disciplinary environment where an activity such as pigment analysis would be of interest to researchers in disciplines outside the usual museum subject areas. This leads into what is, for us, a new area: the aggregation of research activity data. In our object-centric view of the world we are familiar with aggregating object data, but structuring research-related data, and understanding how it is or could be used in a wider academic context, is new territory for us.
In addition to any benefits to external users, there may also be internal gains. Although we are dealing with only a relatively small number of projects, one by-product of structured research activity data will be an enhanced strategic and administrative capacity to search and produce reports (e.g. research activity over the last 10 years funded by x). The value of documenting research in this way was highlighted in the focus group sessions, where it was suggested that it could help raise the profile of research within museums, provide a metric for the Research Excellence Framework and support internal advocacy.
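A report such as “research activity over the last 10 years funded by x” becomes a simple filter once the research abstracts are structured. The following is a minimal sketch under assumed field names; `ResearchAbstract`, `funded_activity`, the funders and the years are all hypothetical and do not describe the museum's actual data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResearchAbstract:
    """Hypothetical structured 'abstract' of a research project or strand."""
    title: str
    funder: str
    start_year: int
    end_year: Optional[int] = None      # None: the research is still ongoing
    object_ids: List[str] = field(default_factory=list)


def funded_activity(abstracts, funder, current_year, years=10):
    """Return abstracts funded by `funder` that were active within the last `years` years."""
    window_start = current_year - years + 1
    return [
        a for a in abstracts
        if a.funder == funder
        and a.start_year <= current_year
        and (a.end_year is None or a.end_year >= window_start)
    ]


# Hypothetical abstracts; the object identifiers link research to item records.
abstracts = [
    ResearchAbstract("Pigment analysis", "Funder X", 2008, None, ["obj-1", "obj-2"]),
    ResearchAbstract("Earlier survey", "Funder X", 1990, 1995),
    ResearchAbstract("Catalogue project", "Funder Y", 2005, 2009),
]
for a in funded_activity(abstracts, "Funder X", current_year=2011):
    print(a.title, a.object_ids)
```

Keeping `object_ids` on each abstract means the same records that drive internal reports can also serve as the item-to-research links described above.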
lessons learned about modelling metadata
At the risk of following a philosophical path which I’m not qualified to tread, there is a Neo-Platonic tenet that states something along the lines of “The soldier is more real than the army.” By the same token a museum object could be said to be more real than a collection. An object has a physical presence. A collection exists only because someone says it does.
The problem with defining what a collection means, from a museum point of view at least, is that there are many different definitions, depending on who is doing the defining and on the perceived role and value of the collection to the potential user. Apart from records that group together items which are impractical to catalogue individually, collections don’t fit well within a collections information management system and are not usually part of the normal curatorial cataloguing workflow.
Museum object records exist because museums have created them as a natural part of collections management. However, there is not the same imperative to create records at a collection level. They are more fluid, artificial constructs and don’t necessarily have a clear role within the organisation. Although published collection catalogues have always been within academic and curatorial scope, short interoperable collection level descriptions, which complement and provide context for object records, have perhaps only begun to find a real role with the advent of digital publication.
The JISC Resource Discovery programme has prompted us (and as a museum we are probably much further behind in this than libraries or archives) to draw a clearer distinction between publishing resources and publishing information about those resources. We have tended to place item records in the former category, but collection descriptions fall more naturally into the latter.
In looking at how we represent the metadata for collections during this project we began also to question whether there was value still in having a distinct entity called a ‘collection level description’. Would a generic resource description, which encompasses collection descriptions (in all their various flavours) as well as online exhibitions, digital resources, and any other published grouping of objects, be a more useful concept? And who should create these resource descriptions – curators, education staff, documentation staff?
The notion of ‘sets’, implemented in the CIIM (collections information integration module) middleware for Open Book, has given us a way to model these entities. It also addresses the modelling and integration of metadata about specialist research information generated by the museum (another part of this project). As well as facilitating the publishing of open metadata derived from item records in our collections information management system (Adlib), the CIIM acts as the primary store for metadata describing collections, research outputs, electronic resources or anything else that we wish to publish as an aid to resource discovery. As a concept the ‘set’ is generic, but we can assign a specific schema to each type of set according to its metadata content and purpose. Each set can then be associated with other sets, with item level data, and with ‘contexts’: additional data, beyond that extracted from the collections information database, which is wrapped around the object records. This has provided a very adaptable framework, helping us move beyond publishing only object records. Collections might not exist, but we can now at least capture, store, connect and publish the metadata for them.
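The set model described above can be sketched roughly as follows. This is an illustration, not the CIIM's actual data model or API: the class names, the per-type schemas in `SET_SCHEMAS` and the sample identifiers are assumptions. It shows the three ideas in the paragraph: a generic set, a schema chosen by set type, and associations with other sets, item records and contexts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical schemas: the set types come from the text, but the required
# fields for each type are assumptions for illustration.
SET_SCHEMAS: Dict[str, Set[str]] = {
    "collection": {"title", "description"},
    "research_output": {"title", "project"},
    "electronic_resource": {"title", "url"},
}


@dataclass
class Context:
    """Additional data wrapped around one object record, beyond the collections database."""
    object_id: str
    data: dict


@dataclass
class ResourceSet:
    set_id: str
    set_type: str
    metadata: dict
    item_ids: Set[str] = field(default_factory=set)
    related_set_ids: Set[str] = field(default_factory=set)
    contexts: List[Context] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Validate metadata against the schema assigned to this type of set.
        schema = SET_SCHEMAS.get(self.set_type)
        if schema is None:
            raise ValueError(f"unknown set type: {self.set_type}")
        missing = schema - set(self.metadata)
        if missing:
            raise ValueError(f"{self.set_type} set is missing {sorted(missing)}")

    def link(self, other: "ResourceSet") -> None:
        """Associate two sets in both directions."""
        self.related_set_ids.add(other.set_id)
        other.related_set_ids.add(self.set_id)

    def add_context(self, context: Context) -> None:
        """Attach a context, which also makes its object a member of the set."""
        self.item_ids.add(context.object_id)
        self.contexts.append(context)


# Hypothetical usage: a collection set linked to a research output set.
manuscripts = ResourceSet("set-1", "collection",
                          {"title": "Illuminated manuscripts", "description": "Example description"},
                          item_ids={"obj-1", "obj-2"})
pigments = ResourceSet("set-2", "research_output",
                       {"title": "Pigment analysis", "project": "Example project"})
pigments.add_context(Context("obj-1", {"note": "hypothetical analysis summary"}))
manuscripts.link(pigments)
```

Validating each set against a per-type schema keeps the ‘set’ itself generic while still enforcing different required metadata for collections, research outputs and electronic resources.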
In trying to reach a position on the rights and licensing issues related to the range of stuff that a museum deals with, one approach we have taken is to tighten up our definition of metadata. It might seem an obvious thing to do but metadata means different things to different people and organisations. The definition of “Data about data” offers plenty of scope. It is common, for example, to think of a museum object record as metadata. CHIN’s guide to museum standards observes that the most obvious example of metadata in a museum context is “…the museum catalogue record (structured data about an object in the museum’s collection).”
The key distinguishing feature about metadata for us, working within the context of the Resource Discovery strand of this JISC programme, is precisely that it is about resource discovery. It is just a tool, a means to an end, in a way in which an object record, or even a collection level description, is not. It serves a different purpose from a catalogue record. It is not intended for collections management or interpretation but as a signpost which will be of value when aggregated with signposts to other things. As such it could hold information that we wouldn’t put in a collections record. Curators might be reluctant to record the term “Impressionist” in the record of a painting by Renoir, particularly if the painting is on the margin of what may be considered his Impressionist period, but the metadata might usefully contain the term. The museum is released from committing to a definitive interpretation (a rare thing in art history) but at the same time the user is given a pointer to something that could be of relevance to them. Storing these additional data, orientated to resource discovery, within a collections information management system is not ideal – hence the “middleware” approach that we are taking in this project.
By extension, we are looking beyond object records to collection records (something that we haven’t sought to maintain before, other than the initial foray in Cornucopia) and to pointers to the other resources that we have begun to create, such as online exhibitions and educational resources (again, something that we haven’t maintained since the initial batch of MICHAEL records).
The metadata record could be a generic resource description, typed according to whether it points to object data, collection level data, online exhibitions or other resources. The key factor is evidently how well these records aggregate: how they can be identified so that they can be targeted at different user groups, and how the potential links between them are expressed. We don’t need to make object records from different sources or museums interoperable, but we do need to give the metadata enough common ground to be effective as a means of resource discovery.
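One way to picture a generic, typed resource description is the sketch below, which assumes a deliberately small common core (URI, source, type, title, audiences, links). The field choice, the `ResourceType` values and the `discover`/`linked` helpers are hypothetical; the point is that records from different museums need only agree on this core, not on their full object records, to be filtered by type or audience and followed across sources.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional, Tuple


class ResourceType(Enum):
    OBJECT = "object"
    COLLECTION = "collection"
    ONLINE_EXHIBITION = "online_exhibition"
    OTHER = "other"


@dataclass(frozen=True)
class ResourceDescription:
    """Hypothetical minimal 'common ground' shared by every contributing source."""
    uri: str
    source: str
    resource_type: ResourceType
    title: str
    audiences: FrozenSet[str] = frozenset()
    links: Tuple[str, ...] = ()          # URIs of related resource descriptions


def discover(pool: List[ResourceDescription],
             resource_type: Optional[ResourceType] = None,
             audience: Optional[str] = None) -> List[ResourceDescription]:
    """Filter an aggregated pool by resource type and/or intended audience."""
    return [
        r for r in pool
        if (resource_type is None or r.resource_type is resource_type)
        and (audience is None or audience in r.audiences)
    ]


def linked(pool: List[ResourceDescription],
           description: ResourceDescription) -> List[ResourceDescription]:
    """Resolve a description's links against the pool, regardless of source."""
    by_uri = {r.uri: r for r in pool}
    return [by_uri[u] for u in description.links if u in by_uri]


# Hypothetical descriptions contributed by two different museums.
collection = ResourceDescription(
    "urn:example:museum-a:coll-1", "Museum A", ResourceType.COLLECTION,
    "Illuminated manuscripts", frozenset({"students"}), ("urn:example:museum-b:obj-9",))
leaf = ResourceDescription(
    "urn:example:museum-b:obj-9", "Museum B", ResourceType.OBJECT,
    "Manuscript leaf", frozenset({"researchers"}))
pool = [collection, leaf]
print([r.title for r in discover(pool, audience="students")])
print([r.title for r in linked(pool, collection)])
```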
Initial draft model of the different classes of digital “stuff” that we generate and the different levels of licensing that we propose.