releasing open data for illuminated manuscript collection records and research…
Part 2 – What’s “Middleware”
I like the beginning of the Wikpedia article for Middleware (I’m dropping the Middleware quotes now in the name of efficiency, too many keystrokes…). It spends most of the first paragraph saying what Middleware is not. But I’m not here to bore you with definitions, lets keep it simple…
Basically, Middleware does something, to something, to produce something. My other jocular way to describe it is ‘Slurp, Stir and Serve’…
Figure 1 is awfully familiar to anyone with exposure to traditional ‘Computing 101’ – computers take input, process it, and create output. At its simplest level so does Middleware.
But let’s quickly get more concrete. Figure 2 is our ‘problem scenario’ from part 1 – using a Collections Management System (CollMS) and its data to feed directly to your web users.
Next, Figure 3 puts Middleware in the picture. Immediately the opportunity exists to change what you present to your web audience – because you’re ‘driving’ your web presence from a ‘different system and data-set’ – more formally you have de-coupled your web functionality from your Collections Management System (CollMS). Most importantly your Middleware will (be chosen to and) allow you to design for your web users (and services) rather than be constrained by the functionality of your CollMS.
Now I’ve lulled you in a false sense of simplicity, let me expand…
In Figure 4 we see a little more detail of how the Middleware part of this looks.
At a conceptual level it’s worth noting that the Middleware ‘box’ was itself described as an input-process-output ‘box’ at its highest level. And inside the Middleware box, to accomplish that, I’ve shown that we’ll need at least two more input-process-output ‘chains’ to achieve our purpose.
This kind of ‘model’ is valuable in its simplicity – it shows the potential benefit of breaking everything down to a input-process-output ‘tasks’. And, wherever you see a ‘process’ box you can be pretty sure it has its own input-process-output ‘tasks’ inside it. So as the infamous saying goes, in our ‘model’, it really is just ‘turtles all the way down’…
So far, so obvious…
More important is that this approach (Figure 4) has some appealing charateristics from a functional and developmental point of view:
A couple of notes before I finish this weeks instalment.
Presentation. I have completely ignored, for simplicity, that generally there is a whole separate layer, or module, between the Middleware and the web user called the presentation layer. It’s crucial, you should be aware it exists, but it would have messed up my simple pictures..
Integration. Another crucial characteristic of Middleware which I’ve chosen to not detail this week. But it’s not a great stretch to see that if you can ‘slurp’ from one data source, you can do it from many. This is a key in our “Open Book” project – to show how, and document issues, in bringing together (integrating) multiple data sources.
One final point to ponder – when you take the concepts and characteristics outlined here, multiply them many fold, you have a simple conceptual picture of what the web, and in particular the web of data, looks like. A loosely coupled, distributed system, of modules, which are all able to ‘slurp, stir and serve’. You can plug these modules together, integrate different data sets, create a new modules… you can create new modules which just serve a processing purpose and let others use them… and so on and so on…
In part 3 we’ll look what ‘Open Book’ will ‘build’ as we move ahead in constructing The Fitzwilliam Museum’s ‘explore service(s)’…
P.S. When I decided I wanted to illustrate this blog, my heart sank – I have never really ‘got on’ with the likes of Visio and MS Office Draw… anyway I found a freeware drawing package, yEd, that is great. and made ‘knocking out’ the figures for this post pretty straight forward.
Part 1 – The Problem
When developing online resources for an audience using a Museum Collections Management System (CMS) a couple of things become quickly apparent:
It’s obvious isn’t it?
It is hard because neither the data, nor the functions of the application (i.e. search) , were probably ever designed to serve external audiences.
Despite this we have all been doing exactly this – building web services, from OAI-PMH data feeds, to full online public access catalogues (OPACs), directly over collections management systems. Many of these CMS’s have had a ‘web module’ tacked onto them at some stage in their development history and this is what has been used to build these web services.
Experience has shown this has its limitations. We have explored these issues in previous JISC projects as far back 2002 (i.e. issues documents published during the ‘Harvesting the Fitzwilliam’ project). We have been handling the practical problems of this approach ever since. Despite this, we have built OAI-PMH services, have taken our OPAC through two development incarnations, and built varyingly successful ‘dynamic’ web resources based on the ‘underlying OPAC’ (e.g. this is a resource which combines static and OPAC derived data) .
The problem of ‘re-purposing data’ is well rehearsed. The problem of ‘re-purposing’ an application brings yet another set of issues. Briefly looking at the main iterations The Fitzwilliam Museum has gone through with its OPAC is not a bad way to draw some of these issues out.
Phase 1 OPAC (circa 2001/2) was built entirely on the vendors ‘cmsopac’ module and all customisation was carried out in its own proprietary scripting language.
Phase 2 OPAC (circa 2005/6) coincided with an evolution of the vendors ‘cmsopac’ module which now had XML output. This provided the opportunity to ‘wrapper’ the ‘cmsopac’ module with in-house developed functionality (using PHP and XSLT technologies). Now, in-house development is not something one embarks on lightly. In-house development was considered necessary though to be able to provide the web experience we aspired to. Primarily what we achieved was:
Put another way we had built a ‘layer’ which partly de-coupled the web functionality, both on the input and output side, from the underlying ‘cmsopac’ application. This approach served us well for a time. Obviously, however, any limitations the ‘cmsopac’ application has are always present because it is still at the core of the system. In time, the limitations which really began to ‘hurt’ us were: its search functionality, search performance, and our desire to do data integration from multiple sources..
If you have read this far I’m guessing you may be reciting my title by now – “it’s obvious” – and in its simplest version it is – “middleware”. “Middleware” is one solution which could complete the job – meaning it would completely de-couple our CMS’s data and functionality from our OPAC. This is already being done in the museum sector – the most common problem being tackled by the “middleware” approach is integration of the CMS with a Digital Asset Management (DAM) system. An example of this which comes from the same JISC programme as our project is the Bodleian’s iNQUIRE system. Knowledge Integration, a partner in Open Book, has also done work in this direction – bringing CMS and DAM data together in their CIIM system to drive the Imperial War Museum’s Collection Search.
This should come as no surprise really – “middleware”, or “fusion service” as it is called in the JISC Information Environment Architecture, has been conceptually desirable for a very long time. Obviously the Collection Trust’s Culture Grid, as an aggregator, is by definition a sophisticated “middleware” system. Today, in medium to large museums, many components of the JISC IE Architecture are migrating ‘inside’ – a small private version of that architecture inside an organisation if you like.
“Middleware” has probably ‘come of age’ for this smaller, internal, deployment and development for a number of reasons. Firstly the entry barrier has been lowered – sophisticated purpose built open source components requiring less development effort have matured over the past years. This makes it possible for smaller organisations to consider “middleware” solutions from the perspective of the required resources to actually deploy such a system. Previously it was too ‘complicated’. More importantly the need, and aspirations, to provide better user experiences and web services simply require more flexible ‘systems’ to be achievable. “Middleware” becomes an obvious component choice in this new ‘system’.
The specific problems which have brought The Fitzwilliam Museum to the need for “middleware”, and what ‘Open Book’ will begin to address are:
In part 2 we’ll explore how that “middleware” ‘fits into the picture’…