JISC – Open Book Project

releasing open data for illuminated manuscript collection records and research…

Monthly Archives: March 2012

It’s Obvious #2

Part 2 – What’s “Middleware”?

I like the beginning of the Wikipedia article for Middleware (I’m dropping the Middleware quotes now in the name of efficiency, too many keystrokes…). It spends most of the first paragraph saying what Middleware is not. But I’m not here to bore you with definitions, so let’s keep it simple…

Figure 1

Basically, Middleware does something, to something, to produce something. My other jocular way to describe it is ‘Slurp, Stir and Serve’…
Figure 1 will be awfully familiar to anyone with exposure to traditional ‘Computing 101’ – computers take input, process it, and create output. At its simplest level, so does Middleware.
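For the more hands-on reader, here is a minimal sketch of that input-process-output idea in PHP (one of the technologies we use in-house). It is only an illustration – the file name and the <object>/<title> element names are invented:

    <?php
    // Input ('slurp'): read some raw data – here just a local export file.
    $raw = file_get_contents('collection-export.xml');

    // Process ('stir'): reshape the raw export into something web-friendly.
    $xml = simplexml_load_string($raw);
    $titles = array();
    foreach ($xml->object as $object) {
        $titles[] = (string) $object->title;
    }

    // Output ('serve'): hand the result to whoever asked for it.
    header('Content-Type: application/json');
    echo json_encode($titles);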

But let’s quickly get more concrete. Figure 2 is our ‘problem scenario’ from part 1 – using a Collections Management System (CollMS) and its data to feed directly to your web users.

Figure 2

Next, Figure 3 puts Middleware in the picture. Immediately the opportunity exists to change what you present to your web audience – because you’re ‘driving’ your web presence from a ‘different system and data-set’ – more formally, you have de-coupled your web functionality from your Collections Management System (CollMS). Most importantly, your Middleware (chosen well) will allow you to design for your web users (and services) rather than be constrained by the functionality of your CollMS.

Figure 3

Now I’ve lulled you into a false sense of simplicity, let me expand…

Figure 4

In Figure 4 we see a little more detail of how the Middleware part of this looks.

At a conceptual level it’s worth noting that the Middleware ‘box’ was itself described as an input-process-output ‘box’ at its highest level. And inside the Middleware box, to accomplish that, I’ve shown that we’ll need at least two more input-process-output ‘chains’ to achieve our purpose.
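To make that a little more concrete, here is a rough sketch, in PHP, of the two chains Figure 4 implies – one ‘slurps’ from the CollMS export into a local datastore, the other serves web requests from that datastore. The file names, table and field names are all invented for illustration, and a real Middleware would of course be rather more involved:

    <?php
    // A throwaway datastore for the example – SQLite via PDO keeps it self-contained.
    $db = new PDO('sqlite:middleware.db');
    $db->exec('CREATE TABLE IF NOT EXISTS objects (id TEXT PRIMARY KEY, title TEXT)');

    // Chain 1 – 'slurp': input = the CollMS export, process = reshape it, output = the datastore.
    function slurp($exportFile, PDO $db) {
        $xml = simplexml_load_file($exportFile);
        $insert = $db->prepare('INSERT OR REPLACE INTO objects (id, title) VALUES (?, ?)');
        foreach ($xml->object as $object) {
            $insert->execute(array((string) $object['id'], (string) $object->title));
        }
    }

    // Chain 2 – 'serve': input = a web request, process = query the datastore, output = a response.
    function serve($query, PDO $db) {
        $select = $db->prepare('SELECT id, title FROM objects WHERE title LIKE ?');
        $select->execute(array('%' . $query . '%'));
        return json_encode($select->fetchAll(PDO::FETCH_ASSOC));
    }

    // The slurp chain might run on a schedule; the serve chain runs per web request.
    slurp('collms-export.xml', $db);
    echo serve(isset($_GET['q']) ? $_GET['q'] : '', $db);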

This kind of ‘model’ is valuable in its simplicity – it shows the potential benefit of breaking everything down into input-process-output ‘tasks’. And, wherever you see a ‘process’ box you can be pretty sure it has its own input-process-output ‘tasks’ inside it. So, as the infamous saying goes, in our ‘model’ it really is just ‘turtles all the way down’.

So far, so obvious…

More important is that this approach (Figure 4) has some appealing characteristics from a functional and developmental point of view:

  • it’s technology agnostic. It is not reliant on the system (in our case, CollMS) it’s ‘slurping from’. It’s possible to use a different piece of technology for each of the components (the datastore, each of the processes) – this gives you the potential to choose ‘best of breed’ technology for each component. Or, more pragmatically, to choose technology your organisation has skills in – e.g. the ‘Microsoft shop’ can use .NET stuff, while University environments like ours can leverage our skills in open source/standard technologies like PHP, XSLT, JSON etc.
  • it’s modular. Modular is good – it allows us to break problems into contained packets which humans can deal with. Through modularisation the development of such a system can be broken into simpler, smaller problems – or ‘black boxes’ (there’s a small code sketch after this list). Modularity should also aid sustainability – replacing one piece in a modular system won’t break the rest, if designed properly.
  • it’s loosely coupled. What’s good about loose coupling is exactly that it’s not like its opposite – a monolithic system. Our Figure 2 shows a simplified monolithic system, where if the CollMS ‘goes down’ everything is down. Conversely, in the loosely coupled example, Figure 3, our web users won’t even notice when the CollMS is down because they are being ‘served’ by the Middleware. Loose coupling is most important as a requirement for modularisation – together they enable the ‘black box’ development approach.
  • it can be distributed. Modularisation and loose coupling means that different components of the system can ‘live’ on different machines, even in different localities if desired. In the technical world this characteristic provides us an easier path to deal with issues such as continuity, recoverability & scalability.
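A small illustration of what ‘modular’ and ‘loosely coupled’ buy you in practice. All of the names below (ObjectStore and the two implementations) are invented for the example; the point is simply that the serving side neither knows nor cares what sits behind the interface, so one component can be swapped without breaking the rest:

    <?php
    // The contract between modules – the only thing the rest of the system depends on.
    interface ObjectStore {
        public function find($query);   // returns an array of matching records
    }

    // One implementation, backed by an SQLite datastore...
    class SqliteObjectStore implements ObjectStore {
        private $db;
        public function __construct(PDO $db) { $this->db = $db; }
        public function find($query) {
            $stmt = $this->db->prepare('SELECT * FROM objects WHERE title LIKE ?');
            $stmt->execute(array('%' . $query . '%'));
            return $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
    }

    // ...and another which could replace it without touching any calling code.
    class InMemoryObjectStore implements ObjectStore {
        private $records;
        public function __construct(array $records) { $this->records = $records; }
        public function find($query) {
            return array_values(array_filter($this->records, function ($record) use ($query) {
                return stripos($record['title'], $query) !== false;
            }));
        }
    }

    // The serving side only ever sees the interface – the 'black box' in practice.
    function searchAsJson(ObjectStore $store, $query) {
        return json_encode($store->find($query));
    }

    echo searchAsJson(new InMemoryObjectStore(array(
        array('id' => 'example-1', 'title' => 'Illuminated initial'),
    )), 'illuminated');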

A couple of notes before I finish this week’s instalment.

Presentation. I have completely ignored, for simplicity, that generally there is a whole separate layer, or module, between the Middleware and the web user called the presentation layer. It’s crucial and you should be aware it exists, but it would have messed up my simple pictures…

Integration. Another crucial characteristic of Middleware which I’ve chosen not to detail this week. But it’s not a great stretch to see that if you can ‘slurp’ from one data source, you can do it from many. This is key to our “Open Book” project – to show how to bring together (integrate) multiple data sources, and to document the issues in doing so.
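To give a flavour of the integration idea, here is a toy version of it – merging a CollMS export with a separate research dataset on a shared object identifier. Every file name and field name here is made up for the example:

    <?php
    // Source 1: the CollMS export (XML), keyed on the object identifier.
    $objects = array();
    $xml = simplexml_load_file('collms-export.xml');
    foreach ($xml->object as $object) {
        $id = (string) $object['id'];
        $objects[$id] = array('id' => $id, 'title' => (string) $object->title);
    }

    // Source 2: a separate research dataset (JSON), using the same identifiers.
    $research = json_decode(file_get_contents('research-notes.json'), true);

    // 'Stir': merge the two on the shared identifier.
    foreach ($research as $note) {
        if (isset($objects[$note['object_id']])) {
            $objects[$note['object_id']]['research_notes'][] = $note['text'];
        }
    }

    // 'Serve': one integrated record set, ready for the datastore or the web.
    echo json_encode(array_values($objects));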

One final point to ponder – when you take the concepts and characteristics outlined here and multiply them many fold, you have a simple conceptual picture of what the web, and in particular the web of data, looks like. A loosely coupled, distributed system of modules, all able to ‘slurp, stir and serve’. You can plug these modules together, integrate different data sets, create new modules… you can create new modules which just serve a processing purpose and let others use them… and so on and so on…

In part 3 we’ll look at what ‘Open Book’ will ‘build’ as we move ahead in constructing The Fitzwilliam Museum’s ‘explore service(s)’…

P.S. When I decided I wanted to illustrate this blog, my heart sank – I have never really ‘got on’ with the likes of Visio and MS Office Draw… anyway, I found a freeware drawing package, yEd, that is great and made ‘knocking out’ the figures for this post pretty straightforward.

It’s Obvious

Part 1 – The Problem

When developing online resources for an audience using a Museum Collections Management System (CMS), a couple of things quickly become apparent:

  • re-purposing data designed for collections care and research is hard, and
  • building an online interface over an ‘internal application’ is hard.

It’s obvious isn’t it?

It is hard because neither the data nor the functions of the application (e.g. search) were probably ever designed to serve external audiences.

Despite this we have all been doing exactly this – building web services, from OAI-PMH data feeds to full online public access catalogues (OPACs), directly over collections management systems. Many of these CMSs have had a ‘web module’ tacked onto them at some stage in their development history, and this is what has been used to build these web services.
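(A quick aside for anyone who hasn’t met OAI-PMH: a ‘data feed’ here just means answering simple HTTP requests with XML. Seen from the consuming side, a minimal harvest can look like the sketch below – the endpoint URL is invented, and oai_dc is the Dublin Core format every OAI-PMH repository is required to support.)

    <?php
    // Ask the repository for its records in the mandatory Dublin Core format.
    $url = 'http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc';
    $response = simplexml_load_file($url);

    // Register the OAI-PMH namespace so XPath can pick out the record identifiers.
    $response->registerXPathNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
    foreach ($response->xpath('//oai:record/oai:header/oai:identifier') as $identifier) {
        echo (string) $identifier, "\n";
    }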

Experience has shown this has its limitations. We explored these issues in previous JISC projects as far back as 2002 (e.g. the issues documents published during the ‘Harvesting the Fitzwilliam’ project), and we have been handling the practical problems of this approach ever since. Despite this, we have built OAI-PMH services, taken our OPAC through two development incarnations, and built varyingly successful ‘dynamic’ web resources based on the ‘underlying OPAC’ (e.g. a resource which combines static and OPAC-derived data).

The problem of ‘re-purposing data’ is well rehearsed.  The problem of ‘re-purposing’ an application brings yet another set of issues. Briefly looking at the main iterations The Fitzwilliam Museum has gone through with its OPAC is not a bad way to draw some of these issues out.

Phase 1 OPAC (circa 2001/2) was built entirely on the vendor’s ‘cmsopac’ module and all customisation was carried out in its own proprietary scripting language.

Phase 2 OPAC (circa 2005/6) coincided with an evolution of the vendor’s ‘cmsopac’ module, which now had XML output. This provided the opportunity to ‘wrapper’ the ‘cmsopac’ module with in-house developed functionality (using PHP and XSLT technologies). Now, in-house development is not something one embarks on lightly, but it was considered necessary to provide the web experience we aspired to. Primarily what we achieved was:

  • to provide a search interface to the user which did its best to ‘hide’ the underlying application functionality (and its limitations)
  • to build a completely flexible presentation system (based on XSLT) above the ‘cmsopac’
  • to tinker with the ability (unsophisticated as it is) to integrate simple related data, not held in the CMS, into OPAC results

Put another way, we had built a ‘layer’ which partly de-coupled the web functionality, both on the input and output side, from the underlying ‘cmsopac’ application. This approach served us well for a time. Obviously, however, any limitations the ‘cmsopac’ application has are always present because it is still at the core of the system. In time, the limitations which really began to ‘hurt’ us were its search functionality, its search performance, and our desire to do data integration from multiple sources.
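To give a flavour of what the PHP/XSLT side of a ‘wrapper’ like this involves, here is an illustrative sketch (not our actual code, and the file names are invented – the real thing has rather more plumbing, caching and error handling):

    <?php
    // Load the XML the 'cmsopac' module returns for a search...
    $results = new DOMDocument();
    $results->load('cmsopac-results.xml');

    // ...and the stylesheet that turns it into our own HTML.
    $stylesheet = new DOMDocument();
    $stylesheet->load('opac-presentation.xsl');

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    // The raw 'cmsopac' output never reaches the user; only our transform of it does.
    echo $processor->transformToXML($results);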

If you have read this far I’m guessing you may be reciting my title by now – “it’s obvious” – and in its simplest version it is – “middleware”. “Middleware” is one solution which could complete the job – meaning it would completely de-couple our CMS’s data and functionality from our OPAC.  This is already being done in the museum sector – the most common problem being tackled by the “middleware” approach is integration of the CMS with a Digital Asset Management (DAM) system. An example of this which comes from the same JISC programme as our project is the Bodleian’s iNQUIRE system.  Knowledge Integration, a partner in Open Book, has also done work in this direction – bringing CMS and DAM data together in their CIIM system to drive the Imperial War Museum’s Collection Search.

This should come as no surprise really – “middleware”, or “fusion service” as it is called in the JISC Information Environment Architecture, has been conceptually desirable for a very long time. Obviously the Collections Trust’s Culture Grid, as an aggregator, is by definition a sophisticated “middleware” system. Today, in medium to large museums, many components of the JISC IE Architecture are migrating ‘inside’ – a small private version of that architecture inside an organisation, if you like.

“Middleware” has probably ‘come of age’ for this smaller, internal deployment and development for a number of reasons. Firstly, the entry barrier has been lowered – sophisticated, purpose-built open source components requiring less development effort have matured over recent years. This makes it possible for smaller organisations to consider “middleware” solutions in terms of the resources required to actually deploy such a system; previously it was simply too ‘complicated’. More importantly, the need, and the aspiration, to provide better user experiences and web services simply requires more flexible ‘systems’. “Middleware” becomes an obvious component choice in this new ‘system’.

The specific problems which have brought The Fitzwilliam Museum to the need for “middleware”, and which ‘Open Book’ will begin to address, are:

  • the ability to bring together collection, object catalogue and object research data
  • the ability to provide more sophisticated harvesting (OAI-PMH)
  • the ability to provide new services conforming to linked open data best practices

In part 2 we’ll explore how that “middleware” ‘fits into the picture’…