releasing open data for illuminated manuscript collection records and research…
Learning technical lessons
August 23, 2012
As I alluded to in my previous post, we feel we have only just scratched the surface of what is possible on the foundations that ‘Open Book’ has laid. This is hardly surprising, as the project was pretty ‘rapid fire’, achieving much proof-of-concept work over a very short project life of 7 months. We believe it has been a valuable demonstrator of short, sufficiently funded development that brings together a small group of organisations, each contributing its specialist experience to the mix.
One thing which is not immediately apparent is that to deploy the type of technical architecture we wanted for Open Book (diagram below) you need plenty of flexibility in your infrastructure. The Fitzwilliam Museum’s investment in a virtualised server infrastructure in 2011 has been an important enabler for ‘Open Book’: deploying the new servers we required was trivial for us today, compared to our old infrastructure where server = physical box. The project required a mix of technologies, and being flexible enough to just deploy whatever was needed, relatively quickly, meant we didn’t get bogged down in infrastructure matters.
Another less-than-obvious thought is that when seeking to create secondary stores and build services over them, one needs to make very important decisions about just how structured or unstructured the data needs to be for each service.
Maybe only a very small subset of your data is required, which can be mapped into further consolidated fields (e.g. any number of fields from the source(s) may all end up concatenated into a ‘global keyword field’ in the secondary data store). This can create incredibly small and fast indexes which are quite suitable for the purpose at hand. On the other hand, where you want very fine granularity in your output services you will have to apply the more traditional, highly designed schemas, structures and cross-linkages. This is very resource intensive, but it will be reflected in the quality of the service you provide (e.g. a high-value triple-store will have had an equally high level of effort expended on structuring and creating it).
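To make the ‘global keyword field’ idea a little more concrete, here is a minimal Python sketch of collapsing several source fields into one keyword field for a secondary store. It is not the code used in Open Book: the function name, the field names and the sample record are all hypothetical, chosen purely for illustration.

```python
def build_keyword_record(source_record, keyword_fields, id_field="id"):
    """Collapse several source fields into a single lowercase keyword field.

    Hypothetical helper for illustration only; the field names are assumptions,
    not the schema used by the project.
    """
    parts = []
    for field in keyword_fields:
        value = source_record.get(field)
        if value is None:
            # Missing or empty source fields are simply skipped.
            continue
        if isinstance(value, (list, tuple)):
            # Multi-valued fields contribute each non-empty value.
            parts.extend(str(v) for v in value if v)
        elif value:
            parts.append(str(value))
    return {"id": source_record[id_field], "keywords": " ".join(parts).lower()}


# A made-up source record with more structure than a keyword search needs.
record = {
    "id": "MS-1",
    "title": "Book of Hours",
    "artist": ["Master A", "Master B"],
    "date": None,
    "location": "Gallery 3",
}

slim = build_keyword_record(record, ["title", "artist", "date"])
print(slim)
```

The resulting record carries only an identifier and one searchable string, which is what keeps such an index small; the trade-off is that a service built on it can no longer tell a title from an artist.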
Our collaboration during Open Book often came down to ‘modelling’: not schemas or field definitions, but rather looking at the data we were trying to represent or use and seeing whether, instead of solving just a single specific problem, we could ‘generalise’ the problem up a level (create a model). Knowledge Integration’s CIIM concepts of ‘sets’ and ‘contexts’ are very good examples of this, and we enjoyed participating in further refinement of those concepts within the CIIM.
Designing URIs for open data services is not trivial, but if you hunt about there is a fair bit of prior knowledge and activity in this area. One very useful resource is the government’s open data URI guidelines, and a useful forum for these matters in the museum context is the Museums and the Machine Processable Web site.
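As a small illustration of the kind of decision involved, the Python sketch below builds URIs that separate the identifier for a thing from the document describing it, a split that public-sector URI guidance commonly recommends. The base domain, the `/id/` and `/doc/` path segments as used here, the helper names and the sample reference are all assumptions for the example, not the scheme adopted by Open Book.

```python
from urllib.parse import quote

# Hypothetical base domain, used only for this example.
BASE = "http://data.example.org"


def slug(text):
    """Lowercase a label and join its alphanumeric runs with hyphens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else "-" for ch in text.strip())
    return "-".join(part for part in cleaned.split("-") if part)


def identifier_uri(concept, reference):
    """URI naming the thing itself (e.g. a manuscript)."""
    return f"{BASE}/id/{slug(concept)}/{quote(str(reference), safe='')}"


def document_uri(concept, reference, fmt=None):
    """URI for a document about the thing, optionally in a given format."""
    uri = f"{BASE}/doc/{slug(concept)}/{quote(str(reference), safe='')}"
    return f"{uri}.{fmt}" if fmt else uri


print(identifier_uri("Manuscript", "MS 251"))
print(document_uri("Manuscript", "MS 251", "json"))
```

Keeping the pattern in one place like this makes it easier to stay consistent across services, and percent-encoding the reference avoids surprises when local identifiers contain spaces or slashes.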
There is bound to be much more that could be said, but instead I’ll invite commenters to ask specific questions if they wish, and we’ll try to answer them in the context of our project.