JISC – Open Book Project

releasing open data for illuminated manuscript collection records and research…

It’s Obvious

Part 1 – The Problem

When developing online resources for an audience using a Museum Collections Management System (CMS), a couple of things quickly become apparent:

  • re-purposing data designed for collections care and research is hard, and
  • building an online interface over an ‘internal application’ is hard.

It’s obvious, isn’t it?

It is hard because neither the data nor the functions of the application (e.g. search) were probably ever designed to serve external audiences.

Despite this, we have all been doing exactly that – building web services, from OAI-PMH data feeds to full online public access catalogues (OPACs), directly over collections management systems. Many of these CMSs have had a ‘web module’ tacked onto them at some stage in their development history, and this is what has been used to build these web services.

Experience has shown this has its limitations. We explored these issues in previous JISC projects as far back as 2002 (e.g. in the issues documents published during the ‘Harvesting the Fitzwilliam’ project), and we have been handling the practical problems of this approach ever since. Despite this, we have built OAI-PMH services, taken our OPAC through two development incarnations, and built varyingly successful ‘dynamic’ web resources based on the ‘underlying OPAC’ (e.g. a resource which combines static and OPAC-derived data).

The problem of ‘re-purposing data’ is well rehearsed. The problem of ‘re-purposing’ an application brings yet another set of issues. Briefly looking at the main iterations The Fitzwilliam Museum has gone through with its OPAC is not a bad way to draw some of these issues out.

Phase 1 OPAC (circa 2001/2) was built entirely on the vendor’s ‘cmsopac’ module, and all customisation was carried out in its own proprietary scripting language.

Phase 2 OPAC (circa 2005/6) coincided with an evolution of the vendor’s ‘cmsopac’ module, which now had XML output. This provided the opportunity to wrap the ‘cmsopac’ module with functionality developed in-house (using PHP and XSLT). In-house development is not something one embarks on lightly, but we considered it necessary to provide the web experience we aspired to. Primarily, what we achieved was:

  • to provide a search interface to the user which did its best to ‘hide’ the underlying application functionality (and its limitations)
  • to build a completely flexible presentation system (based on XSLT) above the ‘cmsopac’
  • to tinker with the ability (as unsophisticated as it is) to integrate simple related data, not held in the CMS, into OPAC results

Put another way, we had built a ‘layer’ which partly de-coupled the web functionality, on both the input and the output side, from the underlying ‘cmsopac’ application. This approach served us well for a time. Obviously, however, any limitations the ‘cmsopac’ application has are always present, because it is still at the core of the system. In time, the limitations which really began to ‘hurt’ us were its search functionality, its search performance, and our desire to integrate data from multiple sources.
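The shape of such a layer can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP and XSLT code we actually ran: the function names, the query syntax and the XML element names are all assumptions made for the example, the back-end call is stubbed out with canned XML, and a plain Python mapping stands in for the XSLT presentation step.

```python
import xml.etree.ElementTree as ET

# Hypothetical canned response standing in for the back-end's XML output.
SAMPLE_XML = """<results>
  <record><id>MS 1</id><title>Book of Hours</title></record>
  <record><id>MS 2</id><title>Psalter</title></record>
</results>"""


def build_backend_query(user_terms):
    """Input side: turn free text from the web form into back-end query syntax.

    The 'title=word AND ...' syntax is invented for this sketch.
    """
    return " AND ".join(f"title={word}" for word in user_terms.split())


def fetch_backend_xml(query):
    """Stub for the back-end call; a real layer would send the query here.

    It ignores the query and always returns the same canned XML.
    """
    return SAMPLE_XML


def present(xml_text):
    """Output side: map back-end XML onto a structure the web pages control."""
    root = ET.fromstring(xml_text)
    return [
        {"identifier": rec.findtext("id"), "label": rec.findtext("title")}
        for rec in root.findall("record")
    ]


def search(user_terms):
    """The web front end only ever calls this, never the back-end directly."""
    return present(fetch_backend_xml(build_backend_query(user_terms)))
```

The front end sees only `search()` and the structure returned by `present()`, which is the partial de-coupling described above. Everything still passes through `fetch_backend_xml()`, so whatever the back-end cannot do, or does slowly, the layer cannot fix.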

If you have read this far, I’m guessing you may be reciting my title by now – “it’s obvious” – and in its simplest version the answer is “middleware”. “Middleware” is one solution which could complete the job, meaning it would completely de-couple our CMS’s data and functionality from our OPAC. This is already being done in the museum sector, where the most common problem being tackled by the “middleware” approach is integration of the CMS with a Digital Asset Management (DAM) system. An example which comes from the same JISC programme as our project is the Bodleian’s iNQUIRE system. Knowledge Integration, a partner in Open Book, has also done work in this direction, bringing CMS and DAM data together in their CIIM system to drive the Imperial War Museum’s Collection Search.

This should come as no surprise really – “middleware”, or “fusion service” as it is called in the JISC Information Environment Architecture, has been conceptually desirable for a very long time. Obviously the Collections Trust’s Culture Grid, as an aggregator, is by definition a sophisticated “middleware” system. Today, in medium to large museums, many components of the JISC IE Architecture are migrating ‘inside’ – a small private version of that architecture inside an organisation, if you like.

“Middleware” has probably ‘come of age’ for this smaller, internal deployment and development for a number of reasons. Firstly, the entry barrier has been lowered: sophisticated, purpose-built open source components requiring less development effort have matured over the past years. This makes it possible for smaller organisations to consider “middleware” solutions in terms of the resources required to actually deploy such a system. Previously it was too ‘complicated’. More importantly, the need and the aspiration to provide better user experiences and web services simply require more flexible ‘systems’. “Middleware” becomes an obvious component choice in this new ‘system’.

The specific problems which have brought The Fitzwilliam Museum to the need for “middleware”, and which ‘Open Book’ will begin to address, are:

  • the ability to bring together collection, object catalogue and object research data
  • the ability to provide more sophisticated harvesting (OAI-PMH)
  • the ability to provide new services conforming to open linked data best practices
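The first of these, bringing data from several sources together, can be pictured with a small sketch. It is only an illustration under stated assumptions: it supposes every source can be keyed on a shared object identifier, and the source names, field names and sample values are made up for the example rather than taken from our systems.

```python
def merge_sources(*sources):
    """Combine records from several sources, keyed on a shared identifier.

    Each source is a (name, records) pair, where records maps an object
    identifier to a dict of fields. Fields are kept under the source name,
    so one source never silently overwrites another.
    """
    merged = {}
    for name, records in sources:
        for identifier, fields in records.items():
            merged.setdefault(identifier, {})[name] = dict(fields)
    return merged


# Made-up sample data: one object known to both sources, one known to only one.
catalogue = {"MS 1": {"title": "Book of Hours"}}
research = {
    "MS 1": {"note": "pigment analysis available"},
    "MS 2": {"note": "binding study available"},
}

combined = merge_sources(("catalogue", catalogue), ("research", research))
```

Here `combined["MS 1"]` carries both the catalogue and the research fields, while `combined["MS 2"]` carries research fields only. A service built on top, whether an OPAC, an OAI-PMH feed or a linked data endpoint, would read from this combined view rather than from any single source.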

In part 2 we’ll explore how that “middleware” ‘fits into the picture’…

