Tuesday, 14 October 2014

Reuse of Digitised Content (2): Here's One I Made Earlier, or, It's Lolly Time

Following on from my previous post in which I bemoan how hard it is to reuse digitised content as a source for creating something, I reuse a digitised image of an item in the National Library of Scotland, discovering how tricky it is to reuse images of "orphan works", but producing something that, well, I like!

 

After a few months of exploring digitised collections looking for A Thing to Make and Do, something caught my eye. Ironically, I found it whilst flicking through a print catalogue of an exhibition I hadn't had the chance to attend: Going to the pictures: Scotland at the cinema, which had run at the National Library of Scotland* in the summer of 2012. A quick google showed it had been digitised at least in low resolution, appearing on the website: 
A 1960s "lantern" interval slide tempting patrons 
to buy an ice lolly, used at the Odeon Cinema, 
Eglinton Toll, Glasgow. Image used here 
with permission from the Scottish Screen Archive,
National Library of Scotland. [source page]

Look at that! How cheerful is it? And right up my street. I kept going back to it and going... ahhhhh! But was it digitised in high enough resolution, and could I get permission to do anything with it, given it is quite clearly still in copyright?

The folks at the Scottish Screen Archive, and the Intellectual Property Officer at the National Library of Scotland, couldn't have been more helpful. Yes, they had previously digitised it at high resolution (all 69MB of it), and I could get permission to use it for my own use (and to feature the image(s) here on my blog) for the princely sum of ten of your British Pounds for the license. I also contacted the Odeon: their records don't go that far back for design so they cannot prove they own copyright, but they gave me permission to use it if they do, with the caveat that a copyright owner, whom they cannot speak for, may come forward at some future date (and hey, stranger things have happened once you put things into the blogosphere: if anyone knows anything about the illustrator, please get in touch). This lantern slide is officially an "orphan work", then. This means it isn't in the public domain, and, under the terms of the license agreed, I can't reuse the high resolution image provided by the SSA willy-nilly (such as making a pattern for anyone to use with it, or giving away the source files, or putting it up on a third party website such as Spoonflower). But it means I can use it for personal use. I'll come back to that later, but let's crack on.

Getting My Make On

The process of turning this into something was straightforward. Once I had the high res file, I spent a few hours tidying up the image, removing some scratches and marks from the slide: this is a fragile, opaque, archival item, and it's no wonder that, close up, there were some marks that may detract from print quality. It's a line to walk, though: you don't want to make it too cleaned up. It still wants to look original.
Before and after, with a bit of cleaning up in PhotoShop.

This resulted in a cleaned up version of the lantern slide, ready to go:
It's not a huge difference from the original (and I haven't put the full resolution file that I have up here, as I'm not allowed to), but it just makes the whole thing a bit fresher for printing.

Then it was just a case of more PhotoShop jiggery pokery, measuring up, tiling, choosing my printer (I went with BagsofLove, a UK company which seems to offer quite a range of printing: people online say that if you order from Spoonflower, a company based in the USA, import duty can really make the costs mount up for shipping to the UK).
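The measuring up and tiling is mostly arithmetic. As a rough sketch only (the 90cm scarf, the 15cm tile and the 300dpi figure below are illustrative assumptions, not the actual print specification for this scarf), the sums look something like this:

```python
import math

CM_PER_INCH = 2.54


def cm_to_pixels(length_cm, dpi):
    """Convert a physical length in centimetres to whole pixels at a print resolution."""
    return round(length_cm / CM_PER_INCH * dpi)


def tiling_plan(fabric_w_cm, fabric_h_cm, tile_w_cm, tile_h_cm, dpi=300):
    """Work out how many repeats of a tile cover the fabric, plus pixel sizes."""
    # Round up so that the repeats always cover the whole piece of fabric.
    cols = math.ceil(fabric_w_cm / tile_w_cm)
    rows = math.ceil(fabric_h_cm / tile_h_cm)
    return {
        "cols": cols,
        "rows": rows,
        "tile_px": (cm_to_pixels(tile_w_cm, dpi), cm_to_pixels(tile_h_cm, dpi)),
        "canvas_px": (cm_to_pixels(fabric_w_cm, dpi), cm_to_pixels(fabric_h_cm, dpi)),
    }


# Hypothetical example: a 90cm square scarf with a 15cm square repeat.
plan = tiling_plan(90, 90, 15, 15)
print(plan)
```

Under those made-up numbers you would need a six by six grid of repeats, and a canvas well over ten thousand pixels on each side, which goes some way to explaining why a low resolution web image is no use as a starting point.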

The Big Reveal



Ta da! A pure silk scarf, with repeating motif. Cute, huh? BagsofLove offer silk printing plus hemming, given a lot of people want silk scarves to test patterns. I got quite a large one made, and the whole thing cost £100 all in, ready to wear.

And here I am wearing it! While we are talking about copyright, etc, this photo was taken by my 6 year old who has given me permission to post it here (which also might explain why all the scarf is cropped out on the left! but you get the drift).

Thoughts On the Process


Do I like the resulting item? Well, I chuckled when it came in the post, so yeah. I do feel as if I've made it - a few hours navigating licensing issues, about 5 hours total in PhotoShop, a few hours choosing where to get it made and what to get it made into, so it feels like I've had to invest time (and some brain effort, in working out tiling sizes, etc, and what I actually wanted size wise: this was a significant investment in time and cash, so it's good to get it right). It's already made me think about the next digital printing project, which means the whole thing must've been fun. Working, as I do, with so much digital data, it's nice to actually have a product at the end of the day. Going with silk was expensive, and there are cheaper options available, but I've got a high quality item (that would probably cost around the same on the high street - I'm not going to make a fortune if I choose to sell these on Etsy, unless I go for a cheaper supplier!).

The one frustration I have is that I can't share the files with anyone, and I can't say, if you like it, here it is, get it printed up yourself, and I can't, at the moment, stick it up on Etsy for sale even if I wanted to, due to the orphan works copyright restrictions. I talked at length with the NLS's Intellectual Property Officer, and we walked through why it's just not legal, at the moment, for them to allow someone else to "publish" something that is in their collection and still in copyright without getting the holder's permission, and I understand this - although it doesn't mean I'm not frustrated by that. (You could get a license from the NLS yourself, if you wanted to use it for personal use.)

But of course, the law on the licensing of orphan works in the UK is changing very soon. The upcoming orphan works licensing scheme (coming into force on the 29th October 2014) will allow that a person can obtain a license for commercial or non-commercial use of an orphan work on payment of a nominal fee and demonstration of a ‘diligent search’. (There's a PDF summary of this new scheme over at the Intellectual Property Office's website, with more on diligent search here). At time of writing, there is very little up there about how the process will work, or what the "nominal" fee would be (one person's nominal is another person's how-bleedin'-much?) but that's one to watch. Come the end of October, I'll start a blog post chasing this image through the Orphan Works Licensing Scheme: who knows, within a few months, you may be able to make some It's Lolly Time! merchandise yourself, should you care to.

It's been a fun journey, chasing something from idea to conception to manipulation to production. I've learned a lot about how we are delivering digital content to end users in the gallery, library, archive and museum sector, and also how frustrating it can be at times. But look, I've eventually ended up with a bespoke thing that I love, just for me. And once I've published this blog post, I'm going to start wearing the scarf that I made, just in time for winter-a-comin' in.

*One final thing to say: eagle-eyed regular readers may know that I'm currently serving on the board of the National Library of Scotland, but I applied to use this image from my civilian, non-work, unidentifiable email account, so as not to get any special treatment in the process of licensing. It has to be said though, that being on the board was the reason I was flicking through past catalogues of their exhibitions in the first place! And I'm personally glad I found something in the NLS collections that so tickled me: a little bit of Scotland to remind me of where I'm from, and an emotional attachment to a piece of digitised cultural heritage.

Monday, 6 October 2014

Reuse of Digitised Content (1): So you want to reuse digital heritage content in a creative context? Good luck with that.

Although there is a lot of digitised cultural heritage content online, it is still incredibly difficult to source good material to reuse in creative projects. What can institutions do to help people who want to invest their time in making and creating using digitised historical items as source material?

 

 The Garden of Earthly Delights, repurposed over at Etsy

Over the last few months I have become increasingly obsessed with creative reuse of digitised cultural heritage content. We live at a time when most galleries, libraries, archives and museums are digitising collections and putting them up online to increase access, with some (such as the Rijksmuseum, LACMA, The British Library, and the Internet Archive) releasing content with open licensing actively encouraging reuse. We also live at a time where it has become increasingly easy to take digital content, repurpose it, mash it up, produce new material, and make physical items (with many commercial photographic services offering no end of digital printing possibilities, and cheaper global manufacturing opportunities at scale being assisted with internet technologies). What relationship does digitisation of cultural and heritage content have to the maker movement? Where are all the people looking at online image collections like Europeana or the book images from the Internet Archive and going... fantastic! Cousin Henry would love a teatowel of that: I'll make some xmas presents based on that lot!

I'm not the only person interested in this: The British Library is currently tracking their Public Domain Reuse in the Wild, looking to see where the 1 million images they released into the public domain, and on Flickr, end up being used. At the moment, they manually maintain a list of creative projects showing what people have got up to with their content. And people are using digitised stuff: pop over to a commercial fabric printing service like Spoonflower and you can see people grabbing creative commons images off Wikipedia and providing the means to print them on a whole range of materials for creative reuse. At Spoonflower, people are remixing images, providing opportunities for creative projects, designing and playing with available heritage content, using it as a design source and inspiration, although many don't quote the source of the hopefully out of copyright images used as a basis for fabric design. Pop over to Etsy, and you can see (as the illustration above shows) high res images of historical art and culture turned into coasters, corsets, bangles, pillows, phone cases, jewellery, etc - and mashed up and remixed into further creations, all of which are for sale (although, again, where they got the source images from isn't usually made clear, and there are obvious copyright infringements happening in some cases). But overall, I'm left wondering why more use isn't made of online digital collections - and why we haven't seen the "maker's revolution" where everyone is walking around going "this old thing? I cobbled it together from public domain images on wikimedia and had a tailor on Etsy run it up for me!" - or even seen more commercial companies start to use this content as the basis for their home and fashion collections on the high street.
There are now funding programs and efforts to try and help the exchange between the "multiple sub-sectors of the creative industries and the public infrastructure of museums, galleries, libraries, orchestras, theatres and the like" and funds for "collaboration between arts and humanities researchers and creative companies" etc etc - in this new "impact" world, allowing reuse of your content will probably score huge brownie points - but what can institutions be doing off their own back to make sure the digitised content they spent so much time creating is used, and reused, further?

I was really impressed, at DH2014, to see Quinn Dombrowski have an entire wardrobe made with fabric designed using heritage content images in the public domain, and this inspired me to think: I should have a go at this. I should find something which is digitised and online, that I like, that I can access, that I can repurpose, and make something that I want and will use from it. What larks! But the rest of this blog post is an expression of sheer frustration at the current state of play of delivering digitised content online, for people who want to take digitised content, and reuse, and repurpose it.

Before I get started: let me make clear that I'm entirely supportive of folks like the Rijksmuseum, LACMA, The British Library, and the Internet Archive making their out of copyright images freely available for folks to use. It's absolutely the right thing to do, and I'm not going to start railing against them (there are, of course, many institutions who haven't made their digitised content available and they deserve railing against). But with that caveat in place, let's broach some frustrations of someone looking through digitised heritage content, wanting to get a decent image of something they want, to reuse in a way that they would like (whether or not that involves paying for the privilege - this isn't just about getting stuff for free, it's about getting it at all). It isn't pretty.

1. So much stuff, such poor interfaces. 

Yay! so much stuff online! Europeana now has over 30 million items online from 2000 institutions! Flickr Commons has a tonne of stuff online! Flickr is now being used, independently of the commons, to host tens of millions of digital cultural heritage objects, by thousands of institutions! But for a user, browsing through this stuff, it is nigh on impossible to navigate or search Flickr in any meaningful way, and sift through this, simply because Flickr's interface is so poor (and often the content isn't tagged very well, so isn't very findable). What if institutions don't use Flickr? Don't get me started on content management systems, and their "user friendly" interfaces, such as Aquabrowser, or Digitool: shudder. Unless you know exactly what you are looking for, it's incredibly difficult for a user to browse and view content - and there is a lot of dross out there to sift through. Finding decent images that are interesting from a design perspective is a time consuming, utterly frustrating task. I speak from a few months of chuck-my-computer-across-the-room frustration in trying to navigate (mostly unsuccessfully) what the cultural heritage sector has spent millions of pounds putting online.

Suggestion: Institutions should use a little resource to get folk with any sort of graphic or design background to help sort through the thousands or millions of images and present to their users a curated collection of a few hundred really good things which are ripe for using. Heck, put together some downloadable packs of images of art, logos, boats, trains, etc. Here are 10 great images of witches you may like to play with! At the moment you are making users work too hard to sort through the digital haystack to find the interesting, usable needle. No wonder much of the content isn't used - people simply can't find it, or they walk away from your rubbish interface before finding that digitisation diamond.

2. The shackles of Copyright, part 1: aesthetic.

The copyright free images which are put online free to use are out of copyright (duh) which means they are from a particular time period: generally pre-1920s (depending on the country's copyright laws). There's a lot of stuff up there, but an incredible amount of it is Victoriana, which has a particular aesthetic. This is great if you are into Steampunk (check out the first few pages of the Internet Archive book images Flickr stream and you'll see what I mean) but... having scrolled through oodles of this stuff, it just doesn't float my boat. I'm into mid-20th-century design, so that puts me into an entirely different category of user: one who is going to have to sort out permission for reuse for items still in copyright, if the institution hasn't sorted out copyright before publishing online. B*gger. This isn't going to be as easy as it first appeared for me, then.

Suggestion: Institutions should cherry pick a few in-copyright items that are really very reusable, and preemptively clear copyright under various licenses. Here are 10 fabulous 1950s illustrations which we have arranged for you to use under a creative commons license! (There is some of this stuff up on Flickr Commons, but it is in the minority). I understand the resources which are required for this, but really, institutions could be leading the way in making images of selected in-copyright items available and usable for people, to encourage uptake and creativity. Or - at least - make processes for chasing copyright clearance a bit clearer to users. Information on that is very sketchy, to say the least, and it's often impossible to even find out who in the institution to email about rights clearances.

3. The shackles of Copyright, part 2: cowardice.

Let's put aside the wonderful work of those who are bravely making their collections available for reuse, and arranging licensing for folks to do so, and address the majority of institutions who don't do this. Say you think... I'd like to make some of my own stationery! I know, I'll pop over to Europeana, and grab some cool images of old envelopes, and print up some notecards with those on (not to sell! just for my own use!). There are 6563 images labelled "envelope" currently in Europeana. The licensing for these - what you can and can't reuse - is incredibly confusing. Only 60 of these items have been put into the public domain. I have no issue with institutions wanting attribution when their images are reused - of course not - and you can do that with 592 images (although... how are you going to provide attribution on fabric or a cushion or a corset or a bracelet, etc). My beef is with the quarter of these digitised items which allow access but no further reuse of the images. Seriously, why not? What are you scared of? That someone is going to pop over to Photobox (other commercial photo printers are available) and make up some notelets? That someone will make a corset out of those images and sell them on Etsy? Quite frankly, if your stuff is out of copyright, and if you don't have the nous or can't afford to employ a graphic designer to turn your images of envelopes into going commercial concerns, good luck to anyone who can. I don't get why you would put images of old stuff online and say to the users "You can't use it. At all". What are you afraid of? (I also presume here that people won't use digital images when they don't have permission to do so. Which is nonsense. People will take it and use it anyway).

Oh yeah, you are saying, but copyright is complex, envelopes are manuscripts, manuscripts never go out of copyright, blah blah, till the cows come home. But just let people reuse digital content, and good luck to them. Seriously, what is the worst that could happen? That something archival takes off and becomes another "keep calm and carry on" meme? But really - wouldn't your institution love to be the source of one of those, in perpetuity?

Yes, I did find a really good image of an envelope I wanted to use on some notecards, but couldn't get permission to do so (hence choosing it as an example). I'll address licensing and paying for image licenses in another blog post (I'm not averse to that either. At the end of the day, just let me reuse that cool image, even if I have to pay license costs to do so).

All over the world, institutions are digitising cultural heritage content and putting it online with restrictive licensing which means that users cannot do anything at all with it (at least not without jumping through lots of begging hoops, or using it illegally). Not use it on a blog post. Not print it on a home made birthday card. Not make their granny a key ring with it on. Not make a scholar who is an expert in this field a mug with it printed on for their retirement present. This seems absolutely bonkers to me - and a complete waste of limited resources in the sector. What "access" do you think you are actually providing, if it's only of the "look but don't touch" variety?

Suggestion: if you aren't going to monetise it yourself, just make it available for others to reuse, with a generous license. Go on!

4. Image quality

All I want is a clear, 300dpi (or higher) image of the digitised item. It's no use saying "this is in the public domain!" if you only provide 72dpi: you can't do anything with that, except stick it up on another webpage. Just give me a reasonably high resolution image, and let me go and play with it. Cheers! So, so much of the "public domain" material is quite low resolution, which stops people from using the images for creative purposes. Maybe that was your plan all along (ha ha! we'll put this online but only at low resolution! that'll thwart those corset makers!) but seriously, 300dpi. Let folk have at it.
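To put some numbers on why resolution matters: how big you can print an image depends on how many pixels it has, divided by the dots per inch the printer needs. Here is a minimal sketch of that arithmetic; the 1024 by 768 pixel image is a made-up example of a typical web-sized download, not a figure from any particular collection.

```python
CM_PER_INCH = 2.54


def max_print_size_cm(width_px, height_px, dpi=300):
    """Largest print size (width, height) in cm for an image at a given print resolution."""
    width_cm = width_px / dpi * CM_PER_INCH
    height_cm = height_px / dpi * CM_PER_INCH
    return (round(width_cm, 1), round(height_cm, 1))


# Hypothetical web-sized image, as often offered for "public domain" items.
web_image = (1024, 768)

at_print_quality = max_print_size_cm(*web_image, dpi=300)
at_screen_quality = max_print_size_cm(*web_image, dpi=72)

print("Printed at 300dpi:", at_print_quality, "cm")
print("Spread out at 72dpi:", at_screen_quality, "cm")
```

In other words, an image that fills a screen comfortably only stretches to something smaller than a postcard once you ask for print quality.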

One other point: if you are using algorithms to crop lots of stuff before sticking it up on Flickr, please make sure that it works, and isn't cropping things too tightly. I understand that it's all about efficiency and storage capacity - you don't want to be storing tens of millions of blank pixels and paying for hosting for empty content - but if you crop things too closely, it's just unusable. Another reason I stopped looking for images in the Internet Archive Book Images Flickr pool was that all the ones I wanted were shaved off. I know! I'll make a montage of ye olde fruit and veg! except this apple is cut off at the bottom, these carrots are missing part of their top, this apple is sliced right through, as are these peaches. Thanks for offering to give me all this stuff free, but it's unusable for creative purposes unless you give me a whole illustration, not one that has been chopped off around the edges.

Suggestion: 300dpi, at least. Cheers, love. 

5. Checking the maker privilege

It's worth just remembering that you may be making some content freely available, but it's still actually quite costly for people to do anything creative with it where digital printing is concerned, especially in small print runs, making individual items, etc. It takes significant investment of time and resources to take an archival tiff and turn it into, say, a cushion (or a corset). I'm not really sure what I'm trying to say here in making that point (isn't that what ranty blog posts are for?)... perhaps it offsets the feeling that institutions are giving this stuff away for nothing: people reusing digital images are putting in significant time and often money to turn them into something else. It becomes co-creation, rather than mere duplication. Or something. It's certainly not an activity that is available to those without the skills to do image manipulation (despite many publication features being available on these commercial digital image printing websites: if you want to do anything that deviates from very simple printing, it still takes time and effort to set up). It still takes skill and resources and sometimes training and probably talent to make something nice and that people will want from something someone else has digitised, and it often takes a huge amount of time. It certainly surprised me how long the selection and preparation of items takes before you get to the stage of sending something to the print shop. So let's all proceed in a realm of mutual respect and adoration, yeah? Love the provision of high quality digital heritage imaging online: love the people who have the sewing chops to make the corsets. (There are also ethical considerations if people start sending high resolution images of items to be made into products in "cheaper" international production contexts, but I'm not sure realistically how that can be broached by image licensing).

Suggestion: Wonderful things can happen when individuals work with institutional digitised content! sometimes.

Conclusion

Overall, here is what institutions can do if they want people to really use digitised content:
  • Put out of copyright material in the public domain to encourage reuse. Go on! what are you scared of?
  • Provide 300dpi images as a minimum. 
  • Curate small collections of really good stuff for people to reuse. Present them in downloadable "get all the images at once" bundles, with related documentation about usage rights, how to cite, etc.
  • Think carefully about the user interface you have invested in. Have you actually tried to use it? Does it work? Can people browse and find stuff? Really?
  • Make sure the image quality is good before putting it online. Don't chop bits off illustrations.
  • Make rights clearer. Give guidance for rights clearance for in-copyright material, and perhaps provide small collections with pre-cleared rights, to allow some 20th Century Materials to be reusable.
What do we want! Curated bundles of 300dpi images of cultural heritage content, freely and easily available with clear licensing and attribution guidelines! When do we want that? Yesteryear!

So what about me, and my task? Did I find something that I like, that I can access, that I can repurpose, and make something that I want and will use from it? After a few months trawling digitised collections online, I eventually stumbled across something which I adore, which got sent off to the print shop last week. I'll be waiting by the postbox over the next few days, in the hope that my investment in time and resources has paid off: I can't wait to see it IRL. But that, my friends, is for another blog post. And in the meantime, I leave you with this conclusion: institutions can be doing so, so much more to help those wanting to use digitised content creatively.




Wednesday, 1 October 2014

Want to be taken seriously as a scholar in the humanities? Publish a monograph

(This is the unedited version of a piece published yesterday over at Guardian Higher Ed.)

A decade ago, in my first year as lecturer in a Humanities department, an eminent Professor helped me secure a book contract with a top university press for my recently completed doctoral thesis. Another senior colleague stopped me in the corridor: “This is very rare,” she said. “And this is what gets you ahead in this game.” The book itself is a lovely object, of which I’m still very proud (it took me four years of doctoral research, plus another two years of preparation). It only sold a few hundred copies: enough to make the press happy, and to give me annual royalties of a fiver. There is an ebook, comparable in price to the physical version, but no Open Access version. Despite little proof that it is well read, it has been cited just enough to give me another elusive point on the dreaded H-index. We don't write Humanities monographs for riches, we may do for an attempt at academic fame, but the career kickback for me was rapid promotion. In the Humanities, the monograph’s the thing.

Today, the Humanities publishing landscape is, of course, changing alongside every other. We must work through the potentials and issues that digital technologies bring. With digital publishing comes the uncoupling of content from print: why should those six years of work (or more) result in only a physical book that sits on a few shelves? Why can’t the content be made available freely online via Open Access? Isn’t this the great ethical stance: making knowledge available to all? Won’t opening up access to the detailed, considered arguments held within Humanities monographs do wonders for the reputation and impact of subject areas whose contribution to society is often under-rated?

Research councils are prescribing Open Access requirements for outputs which will be submittable in the next REF, and there are now nods towards monographs being included in those requirements at some elusive point in the future. The Humanities’ dependency on the monograph for the shaping and sharing of scholarship means that scholars, and publishers, should be paying attention. How will small-print runs of expensive books fare in this new “content should be available for free” marketplace? How will production costs be recouped? Predatory models are already emerging, with established presses offering Open Access monographs alongside the print version for an all inclusive £10,000 charge to offset a presumed (but not proven) fall in revenue: out of the reach for most individual academics, or many institutions. I certainly couldn't have afforded those costs, a junior academic fresh out of the doctoral pod, with student debt hanging around my neck.

The latest JISC survey on the attitudes of academics in the Humanities and Social Sciences to Open Access monograph publishing makes an interesting contribution to this debate, showing how central single author monographs still are to the Humanities, and how important the physical – rather than digital – copies are. People still like to read, and in many cases buy, them. The survey suggests monographs are fairly easy to access even in physical form (inter-library loan, anyone?). Open Access is welcomed, and is seen to increase readership, but the physical object is still central to the consideration of the monograph: something which should allay fears of publishers wondering how any change in the REF requirement will affect their bottom line.  The most difficult problem seems to be securing a book contract in the first place, whether that has an Open Access option or not: the survey clearly shows that ECRs need help and guidance to do so.

Will I publish another monograph without an associated Open Access version? No, but getting published in the first place is the important thing. What advice do I have for early career researchers looking to publish their doctoral thesis, especially if they had the chance to do so with a strong, established academic publisher? The monograph is still the thing: anyone who wants to be taken seriously as a scholar in the Humanities should work towards having one. Open Access requirements are on the horizon, so broach them with the publisher. Don't accept £10,000 costs. Brandish this survey, say People Still Buy Books. Ask for help from those further along the academic path to help you navigate the pre-contract stage. Even with the changing publishing environment, some things stay the same: the importance of the physical single author monograph, and the importance of academic patronage.

Tuesday, 27 May 2014

Inaugural Lecture: A Decade in Digital Humanities

This is the crux of what I planned to say - or hoped to say! - at my professorial inaugural lecture at UCL on the 27th May 2014. I'm not one for reading off a script though, so may have deviated, hesitated, or expanded on the night. A video of my talk on the night is now available. No, I haven't watched it myself!



I decided to call my inaugural lecture "A Decade in Digital Humanities" for three reasons.
1. The term Digital Humanities has been commonly used to describe the application of computational methods in the arts and humanities for 10 years, since the publication, in 2004, of the Companion to Digital Humanities. "Digital Humanities" was quickly picked up by the academic community as a catch-all, big tent name for a range of activities in computing, the arts, and culture.  A decade on from the publication of this text, I thought it would be useful to reflect on the growth, spread, and changes that had occurred in our discipline, and my place within them.

2. This year sees me in my 10th year of being in an academic post. I joined UCL in August 2003, my first academic post after obtaining my doctorate, and since then have worked my way up the ranks from probationary lecturer, to senior lecturer, to reader, and now full professor. The professorial lecture gives me a rare chance to pause and look behind me to see what the body of work built up over this time represents, and what it means to be undertaking research in this area.

3. You'll have to wait for later in the lecture to see the third reason...

Who here would be comfortable defining what is meant by the term Digital Humanities? In this, the week of UCL Festival of the Arts, celebrating all things to do with the Arts and Humanities, let's go back to first principles. In UCLDH and 4Humanities' award winning infographic "The Humanities Matter" we defined the humanities as "academic disciplines that seek to understand and interpret the human experience, from individuals to entire cultures, engaging in the discovery, preservation, and communication of the past and present record to enable a deeper understanding of contemporary society." It stands to reason, then, that the Digital Humanities are computational methods that are trying to understand what it means to be human, in both our past and present society. But it may be easier if I give some brief examples to demonstrate the kind of work we Digital Humanists get up to.

One of the easiest things we can do with computers is count things. For data to be computationally manipulated, it has to be in numeric form. If we can get text into a computational form, we can easily count and manipulate the language, showing trends across time. For example, if we take a million words of conference abstracts from my discipline from the ALLC/ACH conference across various years, we can easily see how mentions of one technology (XML) become more popular, while another (SGML) is in decline. Much of the work in DH is in manipulating and processing and analysing text - our iOS app Textal is just part of that trajectory. Much of my work, though, has been in digital images, starting with developing systems to try and read damaged documents from Hadrian's Wall, and more recently working on multispectral and 3D manipulation of damaged texts. We've also worked with museums on large scale 3D capture of cultural and heritage objects. The important thing about all of this is that as well as implementation, we're also interested in use and usage of these technologies, and what impact they have on those working in culture and heritage, and the ability to study the past and present human record. We often innovate new systems, or adopt concepts and apply them to humanities projects, such as the crowdsourcing of Jeremy Bentham's handwriting by volunteers, or working with visitors to the Grant Museum of Zoology at UCL to encourage debate about zoological collections. We build, we test, we reflect back on what using these technologies means for the humanities, giving recommendations which can be useful across the sector. From these projects, it's difficult to pin down what Digital Humanities actually is, but that sums up the difficulty of our discipline's title: it encourages thinking about computational methods in the arts and humanities, and then into culture and heritage, in as broad a sense as possible.
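As a rough illustration of the kind of counting described above, here is a minimal Python sketch that tracks how often terms appear per year in a set of abstracts. The function name `term_trend`, the per-10,000-words normalisation, and the toy abstracts are all assumptions made up for this example; they are not the actual ALLC/ACH data or the method used to analyse it.

```python
import re
from collections import Counter


def term_trend(abstracts_by_year, terms):
    """Return {term: {year: occurrences per 10,000 words}}.

    abstracts_by_year maps a year to a list of abstract texts.
    Normalising by the total word count keeps years with more
    abstracts from dominating the comparison.
    """
    trend = {term: {} for term in terms}
    for year, texts in sorted(abstracts_by_year.items()):
        # Crude tokenisation: lowercase runs of letters only.
        words = [w for text in texts for w in re.findall(r"[a-z]+", text.lower())]
        counts = Counter(words)
        total = len(words) or 1  # avoid dividing by zero for an empty year
        for term in terms:
            trend[term][year] = 10000 * counts[term.lower()] / total
    return trend


# Invented toy data, purely for illustration.
sample = {
    1996: ["We encode the corpus in SGML.", "SGML markup for medieval texts."],
    2004: [
        "An XML edition of the letters.",
        "From SGML to XML: migrating an archive.",
        "XML and TEI.",
    ],
}

trend = term_trend(sample, ["xml", "sgml"])
for term, by_year in trend.items():
    print(term, {year: round(rate) for year, rate in by_year.items()})
```

On a real corpus the tokenisation would need more care (hyphenation, acronyms, multi-word terms), but the principle is the same: turn the text into counts, then compare the counts over time.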

What made Digital Humanities spring, fully formed like Athena from the Head of Zeus, as an academic field in 2004? Was it because that was the first time quantifiable methods had been used in the Arts and Humanities? (remember - all computational methods require quantification). Well, of course that is nonsense. When you look back across the history of Humanities scholarship, quantifiable methods have been used in the Arts and Humanities since the birth of Universities. If we think of the book as technology, from its inception scholars took it to pieces to see under the hood: concordances and indexes of works were manually created, such as this "Concordance or table made after the order of the alphabet" from 1579 which lists how many times concepts such as "abomination" appear in the New Testament. Or the work of Joseph Scaliger who in the early 1600s plotted the different periods in time in which different civilizations must have existed, through quantifiable methods. Or the work of August Schleicher in the 1850s who showed, by quantifiable methods, that the languages of Europe must have had a common historical root. All of these texts are available from UCL Library, none of which I have to leave my sofa to see because YAY! Digitisation! Changing humanities scholarship! - but the point is that quantifiable methods are part of established methods in the humanities, and have been for as long as the Humanities have existed. So when I undertook my first project at UCL, looking at whether we could use the high performance computing facilities at UCL to analyse historical census data - this is part of a quantifiable humanities academic tradition which harks back 500 years, just at a grander scale.

So what made Digital Humanities spring, fully formed like Athena from the Head of Zeus, as an academic field in 2004? Perhaps in 2004, this was the first time people had used computational techniques in the arts and humanities? But of course, that is nonsense too. When you look back at the history of computing - and not even digital computing, but the very first computer - the very first computer programmer, Ada Lovelace, hints at the possibilities for art, music, and understanding human knowledge and culture in her earliest writings. She understood that there was something more to the mathematical calculations afforded by this machine than science, and they called her a madwoman for it. Well, this madwoman has a (yet unproven) theory that if you look at the history of the first 100 electronic programmable computers in the 1950s, 1960s and 1970s across the world, you will see humanists eyeing them up and asking "how can I use, or develop this tool for use, in my research"? It's certainly true of Father Busa, working with IBM in the 1950s on the concordance of the works of Thomas Aquinas (counting, indexing, and manipulating words, as part of the historical trajectory of humanities methods stretching back 500 years, just a change in scale...) but also of Roy Wisbey, in Cambridge, who set up the Literary and Linguistic Computing Centre there in the 1960s. When the first computers arrived at UCL, the artists from the Slade School of Fine Art were over there like a shot to establish the Experimental and Computing Department. We should also mention Susan Hockey, who led various initiatives in text encoding, text analysis, and digital libraries.
Susan, incidentally, gave me my first academic job here at UCL in 2003: UCL had included a Digital Resources in the Humanities module course as part of its MA offering for librarians and archivists in the School of Library, Archive and Information Studies (now the Department of Information Studies) from 2000, under Susan's auspices. But the point is, considering how best to use computing in the arts and humanities is not something which started in the 21st Century,  nor 2004, and Humanists have been looking at available tools, and how best to use them, since computation began. So when we undertook one of the latest projects at UCLDH, which came from looking at an iPhone, thinking "how can I use, or develop this tool for use, in my research in the Humanities" and developed an iOS app for text analysis, this is part of a longer trajectory of considering available computational tools, and how they may be appropriated, adopted, and adapted for our means in the humanities, just at a grander scale, as processing technologies increase in speed.

So why Digital Humanities, in 2004? Firstly, the coalescing of interested scholars into an identifiable field is an understandable academic response to societal changes. The speed of computing rises, the price of computing plummets, the information available on the internet (and the possibility to create new information) increases, use and usage of internet technologies has become commonplace. Remember, it's up to Humanities scholars to look at the past and present record to enable a deeper understanding of contemporary society: quite frankly, it would be more alarming if an academic movement hadn't emerged looking at what using computational methods could do for our understanding of human society, both past and present, and how best we can grab the technical opportunities which fly by and appropriate them for our means, to inform both ourselves and others about the prospects of using computing in this area. The discipline of Digital Humanities is inevitable, and would have appeared whatever the title it was given.

Secondly, Digital Humanities is a handy, all inclusive, modern title which rebrands all the various work which has gone before it, such as Humanities Computing, Computing and the Humanities, Cultural Heritage Informatics, Humanities Advanced Technology... DH has a ring to it, and boy, what a rebranding it was. We tend to call it "Big Tent Digital Humanities" meaning: roll up! roll up! everyone using any computational method in any aspects of the arts and humanities is welcome! but really, Big Wave Digital Humanities may be more appropriate, as we countenance the sudden swell, dissipation, and speed of the activities of the discipline. Taking a peek at the mention of Digital Humanities on Google Ngrams we can see its sudden growth, and the fact that it is now used as a proper noun, with Capital Letters (although remember that this, counting words, is part of a long tradition of humanities scholarship, Google simply have more books to include in their count). We can see how DH has trended over time, appearing in headlines in the media. Many, many textbooks in DH appear, some of which I am responsible for myself. Journals appear, such as Digital Humanities Quarterly (of which I'm one of the general editors), and the ALLC/ACH conference renames itself Digital Humanities (this year, for my sins, I'm the Program Chair for DH2014 which will be held in Lausanne, Switzerland. We have seen over 700 proposals from more than 2000 vying for a space to present). There are many more DH conference presentations and workshop slots, worldwide, year on year. In 2010, I gathered together all the available evidence I could on DH in an infographic called Quantifying Digital Humanities, showing that there were 114 DH Centres in 24 countries. Today, not even four full years later, there are 195 DH Centres in 27 countries.
Those knowing how long it takes to set up a research centre know that this is phenomenal growth in the university and GLAM sector, and that institutional support must be strong, behind each and every one of these.

UCL Centre for Digital Humanities is one of these recently founded centres. We officially launched four years ago to the week of this lecture, in the same lecture hall where this lecture is being presented. We don't talk about the launch much - it's not often I'm part of something at work which ends up featured in the political pages of the newspapers - but you'll have to google that to find out more (YAY! digital media! the internet never forgets!) but in those four years since launch we've undertaken a phenomenal number of projects, covering many aspects of Humanities and Arts research, and considered Digital Humanities in its broadest sense. This isn't all me - there is an amazing team who are part of the Centre, and we've won various awards for our academic projects and collaborations, published many books, papers, and book chapters, and been part of successful funding bids from research councils worth tens of millions of pounds. One wonders what makes a Digital Humanities Centre attractive to universities that don't have one. Nope, I can't see what makes that level of activity attractive, at all.

So what proportion of Humanities scholars are now digital humanists? Back in 2005, participants in the Summit on Digital Tools in the Humanities at the University of Virginia estimated that "only about six percent of humanist scholars go beyond general purpose information technology and use digital resources and more complex digital tools in their scholarship" (p.4 of this PDF). By 2012, N. Katherine Hayles, in her chapter "How we think: transforming power and digital technologies" in David M. Berry's edited text "Understanding Digital Humanities", estimates that 10 per cent of Humanists are now digital humanists (p.59). Now, in 2014, a forthcoming study from Ithaka S+R (with the working title of Sustaining the Digital Humanities: Institutional Strategies beyond the Start-up Phase) includes surveys of faculty at four American universities. In the departments surveyed at each institution, nearly 50% of faculty members indicated they have "created or managed" digital resources. Granted, the departments were chosen by campus staff (often at the library) who felt there was some significant activity taking place there. The percentage of these "creators" was consistent across all universities (Brown, Columbia, University of Wisconsin, Indiana University), and most of the creators also felt that their creation was intended for public use (not just their own research aims), and would require ongoing development in the future.

50% of humanists are involved in digital activity, are digital humanists. How can this possibly be? And how can we conceptualise what it means to be a digital humanist, amongst this spread of activity and range of available technology: is creating or managing digital resources the same as being a digital humanist? At a time where (nearly) every library catalogue is digitised and available online, and (nearly) every book manuscript written on a word processor, and many historical documents digitised and available for consulting from your own sofa, does that make everybody working in the humanities a digital humanist? How can I begin to conceptualise my contribution, and my place, and where my work sits within Big Wave Digital Humanities?

I find it useful here to turn to Rogers' Innovation Adoption Curve, a sociological model that looks at how technology spreads through society. This is a bell curve, and right at the start of adoption of technology are a few innovators, experimenting with (and developing) new technology. These innovators sometimes persuade a larger number of early adopters to take up the new technology on offer, and only once a sufficient mass of users is achieved does the technology "cross the chasm" and become used by the majority of individuals in a society (who are split into an early majority and a late majority). Finally, we have adoption by the "laggards", who are slow in taking up technologies, but do so if they have permeated throughout society. (Hard not to think, here, of my elderly grandmother who recently got her first mobile phone). Now, this model is useful as we can plot along it some of the technologies which are available to a humanist. Things like word processing, and searching for references online, and even looking up the digitised texts which I showed at the start of this lecture: even the technologically laggard humanists can do it now, and although these technologies are changing scholarship, it's a question of scale (better! faster! more!) rather than of approach or technique, for the main. Technically facilitated tasks like updating websites, using and updating wikis, using social media: even the late majority of humanists can do it now. Online tools are available, such as Voyant, which allow you to do text analysis, and manipulate texts to see the underlying patterns: so the early majority of humanists can use these tools should they want to.
But the most difficult, intellectual work of applying technology in the humanities still occurs before the chasm has been crossed, in the phase of innovation, and early adoption, where we are looking at the technologies that cross our path and saying "how can I use, or develop this tool for use, in my research?", much like those in the 1950s or 1960s who were coming across university mainframes and asking how best to apply that in the literary and linguistic arena. It's important to note, of course, that this wave of technology keeps on coming at us, and the place of where technology sits along the curve changes: 20 years ago, had you been making a website for your humanities project, you would have been an innovator, rather than a late majority, and the same holds for word processing 40 years ago. The technology keeps coming: we have to respond to this, innovate, adopt, and see what is useful or useable for, or used by, the majority of people in our discipline.

Now (and this is the most contentious thing I'm going to say in my whole lecture, for those attending who are dyed-in-the-wool Digital Humanists) one of the problems that we have as a movement is that we tend to get caught up and fixated upon a certain technological solution. For example, every DH program I've come across teaches XML - that technology which took over from SGML in the conference abstracts - as the best practice way to encode text. And there's no doubt that XML provides the framework with which we can both explore theoretically what it means to describe texts computationally, in such a way that they retain the information in their printed or manuscript form, whilst also providing the means to build and test prototypes. But XML as a technological standard has been around for 16 years, and technology moves on, but DH doesn't seem to be doing so. In many ways, DH's relationship to XML is similar to the AI community's relationship with LISP: the means of computational expression in the language or format suit the questions which need to be asked by the field, so there is no need to use other technologies which come on stream, which may be more efficient from a computational point of view, as we explore what it means to work with our question in this computational way. And that's ok, but we shouldn't be blind to the fact that, hey! technology is advancing all the time and, also, XML is not a technology that crossed the chasm: it may be in use for technical systems, but it's not one that you see a lot of the general populace using. This, in turn, means that DH has permanently hitched its wagon to an aging technology, which is hard to explain to others, including other non-XML humanists, whilst other things are happening in the technological world around us. Just something we have to watch out for, when building teaching programs, or looking at the scope of outputs in our field. We don't want to be left behind as the digital in digital humanities rolls on without us.


I find it useful to plot my research on the Innovation Curve, to see where what I am doing sits. So, the work on counting terms across a corpus - very much sits in the early majority, given the availability of tools to do so. But the work on building an iPhone app to do so - very much innovation: it took a lot of pure programming in a relatively new space to achieve it. The work in image processing I do is either innovation (we are publishing here in pure computer/engineering science venues, as well as in humanities venues, which I'm very proud of), or we adopt technologies our academic colleagues in the engineering sciences have generated and roll them out to a humanities or heritage application. Our work on user studies is something completely different though: here we are generally looking at how the majority of people are using an extant text, or (in the case of something like Transcribe Bentham, or QRator) we are conducting reception studies, where we innovate and build a technology, launch it, and study its uptake across the whole cycle. We can see, then, a range of DH activity across the innovation cycle, but the majority of the work I do is certainly at the start of the innovation curve. Is this where DH sits? I like to think so, but more to the point, I'm confident it's where I sit best, when doing DH.

I need here to show you another curve, though. This time, the Gartner Hype Cycle, which looks at how technologies are launched, mature, and are applied (so people know when to invest). The premise of this is that when technologies are first triggered, everyone thinks they are going to be the Next Big Thing, and so they reach "the peak of inflated expectations", before crashing down into a "trough of disillusionment" when those adopting them realise they aren't that great at all. It's hard work to get technologies up the "slope of enlightenment" where useful, useable applications are found, and few technologies make it to the "plateau of productivity" where they become profitable. It's a useful curve - this year's predictions show Big Data right at the top of the peak, which chimes in with media coverage of how it will solve everything, for example. So where would I put DH as a movement, if I had to, on this curve?

I'd put it at the top. At the top of the Peak of Inflated Expectations. We've got a lot of pressure on us to prove our johnny-come-lately benefit to the world of academia, to demonstrate our worth, to show that the investment made in us over the past few years is worth it (whilst also bringing in further investments in research funding, to meet institutional expectations). After a peak, comes a crash, and we have to be prepared for the tide to turn and the backlash to begin, after the years of media hype and raised expectations. So how do we get to the plateau of productivity of Digital Humanities?

First, I would argue that we have to understand our lineage: that the current manifestation of DH is a logical progression of quantifiable methods used in the humanities for the past 500 years. That the current manifestation of DH is a logical progression of humans wondering what the potential is for applying computational methods to humanities problems, which has been going on in the digital space for the past 60 years. These combined trajectories aren't going away, and despite what funding cuts and media backlash may come at us, it is the role of the digital humanist to understand and investigate how computers can be used to question what it means to be human, and the human record, in both our past and present society. Secure in our mission, we can carry on whatever the storm throws at us.

Second, I would argue we have to ignore naysayers who are unsure about this new Digital Humanities lark (and believe me, there are plenty, even in my own department) and just do good work. The way to demonstrate our worth is to demonstrate our worth through doing good work. We have to keep asking questions about computational methods, computational processes, and the potentials that they offer humanities scholars, as well as the pitfalls, to explore this changing information environment from the humanities viewpoint. It's not just about building websites, or putting information online, it's about innovating and adopting, and questioning while we build about the ramifications of doing this, the impact on the humanities, the issues using technology raises, and the answers it provides that you couldn't otherwise generate, to do good work in Digital Humanities. I realise this is very Calvinist of me - you can take the lass out of Scotland - but I do see that we have to be engaging with theories and questions of what it means to be doing this work in this way, as well as updating a website or creating a digital file. A continuation of what it means to be a humanities scholar, in the digital space.

I'm not one for looking back, and despite the title, I deliberately didn't want this inaugural to be a survey of all the projects I have undertaken over the past ten years - then I did this, then I talked to that person, then I visited there - but when I look back over the variety and range of projects, publications, and outputs that I've worked on, either on my own, or as part of a team (there's a lot of teamwork that has gone on here) I'm firstly surprised at how much of it there is and the range of topics we've covered, and the opportunities we've pounced on. I see a body of work which explores various aspects of what it means to be applying digital technologies in the humanities space, and facilitates both those in engineering science and those in the humanities to explore issues which are important to them. I've learnt things along the way about the nature of interdisciplinary work, the nature of teams, the nature of the academic publishing and peer review process, the nature of the grant funding process, but I've written about that elsewhere. There are things, also, that I am proud of that are physical rather than purely digital: over the last few years I'm most proud of building the UCL Multi-Modal digitisation suite, which is a shared space between the UCL Library Services, UCL Faculty of Arts and Humanities, and UCL Faculty of Engineering Science, contributing to the infrastructure of UCL in a collaborative endeavour. But what I see here, as a common thread, is that the work I do tends to sit right at the beginning of the technology adoption cycle, aiding and abetting the application of technology within the arts, humanities, and heritage, and I'm comfortable with that. There's a strength in knowing your place, and your remit, and what you do best.


So the third reason for calling my talk "A Decade in Digital Humanities" is that I didn't say which decade we were talking about, and it is time also to look towards the future, and what the next ten years hold for both DH, as the field turns into a teenager, and for me, as I go into my next decade here at UCL. I'm not one for crystal balls, so I'll keep my scrying brief. I see an inevitable fragmentation of the DH community and DH focus - it was never conceived of as a homogenous entity anyway, and it is the nature of waves and swells that they will dissipate. We'll see (we are already seeing) more focussed groups of scholarly work around, say, Geographical Information Systems and literature, as people specialise and work on specific technologies and specific methods. The technology will keep coming, and it's up to individual humanities scholars to respond to what is appropriate to their research question: the effects of DH scholarship will continue to ripple out across the humanities as technologies go along the adoption cycle, and certain aspects of digital research will just become normal for humanities scholars, as time goes on. But I do see that there will always be a place, right at the start of the technology innovation uptake curve, for specialists in Digital Humanities to sit, watching out for these changing and emerging technologies, setting up pilot projects to experiment with different aspects of these technologies, feeding back recommendations and the potential ramifications for other humanities and engineering scholars and those within the wider cultural and heritage sector, and exploring what it means to be doing humanities research in that area. I'm happy to remain there, and I see that this will remain my place working with other humanists, and engineers and computer scientists, over the next decade.
I'm delighted to be a co-investigator on the doctoral training centre for Science and Engineering in the Arts Heritage and Archaeology, which is the EPSRC's largest ever investment in Heritage Science, and for the next 8 years we'll be training up a range of doctoral students in this cross section of the arts, heritage, humanities, and engineering and conservation science. (Perhaps what I really do is Heritage Science, but that's another talk entirely, and DH has work to do with the Heritage Science community in future). That said, we do have work to do, in keeping an eye to making sure people know about the successes, outputs, and impacts of DH work. Given the expectations foisted upon us, we have to learn to be more vocal about our objectives, our remit, and our results. It's our job to be thinking what it means to use digital technologies in humanities research, and just research, full stop. As a result, our insights can benefit a range of other fields, if we communicate them effectively.

Digital technologies are not going away any time soon: and although DH has had a rapid swell, it will remain essential that we investigate, use, and experiment with technologies over the coming decade. There is a new Companion to Digital Humanities coming out in late 2014, showing how the technologies used in humanities research have developed since the first edition (I'm delighted to have written a chapter on our public engagement work for it), and our field, as well as knowing where we have come from, has to understand that the technological wave on which we sail is continually on the move. I hope I've shown here that our uptake of technologies in the humanities is, and will continue to be, a moving target, and that as part of a longer trajectory of investigation into humanities methods, DH is a modern but necessary, and even inevitable, part of the Humanities, and even computational, landscape. I look forward to what adventures the next Decade in Digital Humanities holds. There is so much to do!

Now, that is where I'd normally pause and say thank you for your attention, but hey, it's my inaugural, so I'll cry if I want to. I have a few brief thanks to make - it's quite a lick to go from probationary lecturer to full prof in ten years, and so I have to thank those who have supported me. Thanks go to my family up in Scotland for all their support, and my family of my own: many of you know that in the past few years I've had three children, so biggest thanks of all go to my husband Os, aka Expert Sleepers, for his forbearance and baby juggling skillz. I've been blessed with an amazing support network of friends, who have supported me enormously over this period. My first academic supervisor was Professor Seamus Ross, who kick started my interest in this area, and his support and interest at the start of my career really set me up for the work I do today. Likewise, my PhD supervisor Professor Alan Bowman remains a fantastic mentor: thank you, Alan. My other PhD supervisor, Professor Sir Mike Brady, made me promise (when I got my doctorate in engineering) not to go near any nuclear power stations or bridges, a promise I have kept - thanks Mike. I've already mentioned that Professor Susan Hockey gave me my first academic job: but her work remains an inspiration on what is possible in computing in the arts and humanities. I work with an amazing team of people at UCLDH and I thank them for their input both for the centre and on our various projects. Special thanks go to Rudolf Ammann, our designer at large, who helped prepare the graphics for this lecture.

But in this week of UCL's Festival of the Arts and Humanities, it's good to pause and see how embedded Digital Humanities research is now throughout college, and how much we work, in the Humanities, with those around us. The projects I've shown, albeit briefly, today, are carried out in league with various other faculties (UCLDH reports to both the Arts and Humanities and Engineering Faculties here). Colleagues come from a range of different departments including not only those across the Arts Faculty, but the Bartlett Centre for Advanced Spatial Analysis (in the UCL Bartlett Faculty of the Built Environment), and across the UCL Faculty of Engineering (I have joint projects with Medical Physics, Computer Science, and Civil, Environmental, and Geomatic Engineering). We are dependent on input from both our colleagues in UCL Library Services, and UCL Museums and Collections, and work very closely with items in all the collections across college. The success of DH at UCL is then dependent on the institutional context we have here. Digital Humanities is now embedded into college life at UCL, and in this week of the Festival of the Arts, my final thanks go to UCL as a community for its institutional support in encouraging us to ride the DH wave: for without being at UCL, my decade in digital humanities would have been completely different.











Saturday, 24 May 2014

Roy Wisbey, and Literary and Linguistic Computing, 1965 style



I recently got in touch with Professor Roy Wisbey, who set up the University of Cambridge's Linguistic Computing Centre in 1960, to invite him to my inaugural lecture. He is not able to attend (but passes on his regards to those who know him!) and he also briefly loaned me this newspaper article, from 24th September 1965, from the Cambridge News. A very early piece of Humanities Computing history! It's in very fragile condition - I've spliced it together here to give the whole piece in one image (and the blog stylesheet is not my friend here - will sort out later - but...) - enjoy!

The use of computers will save the scholar years of mindless drudgery! Indeed!




Friday, 16 May 2014

Siberian Digital Humanities Adventure

The Siberian Federal University
Greetings from Krasnoyarsk, Siberia, where for the past week I've been hanging out at the Siberian Federal University, the largest university in the Siberian region, which is in the top rankings in Russia. I've been giving some guest lectures on digital humanities, meeting various staff and students, and plotting with them on how to support their work and how to make connections to the wider digital humanities community.

How did I end up here? It's all down to the wonderful Inna Kizhner, who approached me nearly two years ago, in my guise then as secretary of what is now the European Association for Digital Humanities. After helping source some teaching materials, in English and Russian, for their taught courses, Inna remarked to me "no-one ever comes to Siberia..." and I immediately said "ask me!". And finally, after much preparation, here I am.


Siberian Federal University are establishing a solid Digital Humanities presence. In the Institute of Humanities they currently offer digital humanities modules at both undergraduate and postgraduate level, and also an undergraduate module in the subject area of digital history (which next year will be taught by Inna). They have a digital lab (door sign, above!) and a digitisation lab. They have a range of projects they have been working on with both researchers and students, many of them led by Maxim Rumyantsev who is now the university's deputy head, so there is positive institutional support here. These projects are mostly in the area of multimedia and digitisation. Examples include: working with the Museum of Geology of Central Siberia to create the simply stunning companion to their minerals collection (it is no easy task to capture minerals in this detail, at this quality); capturing, virtually exploring, and explaining regional heritage architecture (which is fast disappearing under new developments in this region) from the nearby town of Yeniseisk; documenting regional art shows and youth art shows; capturing high resolution images of the art contained within the Surikov Museum (life size copies of which adorn the university's walls at every turn); working with Gigapan capture methods and the State Russian Museum to create zoomable images of large art works (can you spot Pushkin?); and creating an interactive model of the Siberian Federal University campus itself. They are keen, now, to be making connections with others across the world, and I'm delighted to be helping them, and introducing them to various figures, and associations, in Digital Humanities. There is much work to be done, we have plans set out, and they are keen to make new relationships and new collaborations.

It's not all been work! I've been welcomed into colleagues' homes for meals (often meeting their families), treated at friendly restaurants (the food is wonderful), and toured round museums and supermarkets (Inna patiently put up with me pointing and exclaiming at various products we don't have in the UK, such as dried fish, and tinned horse). Today we went to the Krasnoyarsk Dam, 30km upstream from the city, on a glorious spring day which showed off this remarkable feat of engineering (which is so exceptional it features on banknotes across Russia). There is a heavy security presence, and no photos allowed, but I did manage this sneaky selfie...


It's been a fantastic trip, and I've been made very welcome here. Thanks to Inna, Maxim and Marina for their hospitality, and I look forward to further opportunities, visits and introducing anyone who wants to be introduced (if I can be of help, drop me an email and I will forward it on). I have to admit I was nervous about my trip here - but instead of stress I've found friendly connections, and much opportunity to help further establish DH in this region, and throughout Russia. Now to pack, and begin the long trip home, where my three small boys are missing their mummy on the other side of the world (and I them). до свидания!

Thursday, 15 May 2014

Digitisation's Most Wanted

What are the most commonly accessed digitised items from heritage organisations? Even asking the question leads to further understanding about the current digitisation landscape.


Have you seen this Dog? Last spotted on the Flickr account of the National Library of Wales. Dog with a Pipe in Its Mouth, Taken by P. B. Abery, 1940s.
Last month, at a meeting at the National Library of Scotland, an interesting fact flew by me. The NLS has hundreds of thousands of digitised items online, so what do you think is the most popular, and most regularly accessed and/or downloaded? (It is difficult to make the distinction regarding accessed or downloaded on most sites.) Is it the original Robert Burns material? The last letter of Mary Queen of Scots? Or any of the 86,000 maps held in this, one of the best map collections worldwide? No. It is "A grammar and dictionary of the Malay language : with a preliminary dissertation" by John Crawfurd, published in 1852. This is accessed by hundreds of people every month - mostly from Malaysia, partly because it is featured on many product pages providing definitions of Malaysian words - demonstrating the surprising reach and potential in digitising items and then making them freely available online, reaching out to a worldwide audience far beyond the geographical locale of the library itself. Wonderful.

This left me pondering... what are the other most downloaded items at major institutions in the UK? So I sent out some feelers, and here are the results, demonstrating both the hidden complexity of the question, and the relationship of digitised heritage content to the current online audience landscape.

At Cambridge University Library, the most accessed collection overall is the Newton Papers, which was the first major digitised collection launched by the Library in 2010, and promoted widely. Within that, there is one particular notebook (which Newton acquired while he was an undergraduate at Trinity College and used from about 1661 to 1665 for his lecture notes) which is the most popular, featuring heavily in the initial promotion of the collection, and also in an In Our Time special series hosted by Melvyn Bragg on Radio 4. But within that notebook there is one page that is accessed more than the others, with most of the traffic coming from Greece. Why? This page was picked up in the Greek press and pointed to on many websites, blogs, newspaper reports, and in social media as evidence that Newton knew Greek. The links that remain still direct thousands of users to view Newton's jottings from his Greek lessons at the front of the book, showing the fascinating relationship between publicity, social media, linkage, and an item which reflects national pride, to a worldwide audience.

The most downloaded items at Cambridge also reflect the rapidly changing mentions of items on social media: in April 2014, an item downloaded/accessed more than 6000 times was the Breviary of Marie de Saint Pol, which went live that month. Why the sudden notice? On the 3rd of April, one of the Cambridge colleges with thousands of followers posted a link to it on Facebook, followed by the Cambridge Digital Library Facebook and Twitter feeds on the 4th of April. Retweeted a few times, these few postings led to the thousands of views of the document, demonstrating the growing importance of using social media to tell people about newly mounted digitised content.

Over at Trinity College Library, the most accessed item from their digital collection in general is the Book of Kells, which again was their first major digitised item, heavily promoted in the press, and attracting a level of viewing that is unique due to general tourism and cultural heritage interest. The second most accessed digitised item is the surprise: a book of Lute music by William Ballet, from the 17th Century. There is much discussion of this item, and links to it online, posted by online communities of lute players, and those who blog about lutes worldwide. Interest in and demand for an item can therefore be encouraged if interested online communities hear about it, and share with their membership.

A similar tale about the importance of publicity and social media emerges from the British Museum. There are popular items about the Viking exhibition which are linked from their home page at the moment given the current exhibition, but since the 1st January 2014 until now, the most popular item accessed in the digital collection (no, wait, go on, guess.... Rosetta Stone? Vindolanda Tablets? ...) is the Landscape Alphabet by Joseph Hullmandel (no? me neither). These were discovered and shared on social media by type enthusiasts on Twitter in mid February, and promoted by the cool-hunter the Laughing Squid who has almost half a million followers on Twitter, which caused a sudden spike (I can't see the British Museum actually tweeting them out themselves on their timeline). However, the initial swell of tens of thousands of hits has since dwindled to nothing, showing the fickleness of attention that comes with the social media stream. In 2013, the single most viewed item at the British Museum was... (go on, guess!)... a lead sling bullet, viewed 42,156 times in total. Why? It was picked up on reddit, due to the sarcastic inscription "some ancient sling bullets excavated from the city of Athens, Greece were inscribed with the word "ΔΕΞΑΙ" (dexai), which translates to "catch!"" which generated a lot of online LOLs ("Halt gentlemen. Do not yet partake of the feast before us, for I must capture the image of it with instagram whereupon I shalt bequeath it to my herald upon Facebook for all to see." here) and this encouraged - and still encourages - visitors to the British Museum website: some forms of posting on social media generate the long tail of usage more than others.

Things start to get more complicated when various digital asset management systems (DAMS) come into place - often institutions have more than one database of digitised content, from different suppliers, with different licensing restrictions and requirements, and so ascertaining the most viewed single item is not a simple question. Organisations also post and share content in various different places. The National Library of Wales are looking through their DAMS to see which items are the most accessed, but immediately know that the most popular item they hold that has been posted to Flickr (with no known copyright restrictions, contributed to Flickr Commons) is the photograph at the top of this post, Dog with a Pipe in its Mouth, from the P. B. Abery Collection. Again, this is an image which has been mentioned regularly on blogs, social media, and internet chats, as well as being a featured image on the 2013 anniversary of Flickr Commons: the fact that it has no copyright restrictions encourages its reuse - and therefore traffic towards its host institution's site, if those users point back to it - online.

The libraries at Oxford University, including the Bodleian, have been digitising items for over twenty years, and so it is difficult to say what the most accessed or popular items are, due to the way the systems have been designed, implemented and integrated over the past two decades. Their most downloaded or accessed digitised book, scanned in collaboration with Google, is probably the "History of the Scott Monument, to which is prefixed a biographical sketch of Sir Walter Scott" by James Colston (published 1881) - a freely downloadable version is available from its library record (ignore the resellers offering printed versions generated from this at considerable cost on Amazon and eBay!). As far as images are concerned, the most popular at Oxford are among those listed on Early Manuscripts at Oxford University, partly because many of them have been up continuously for twenty years (legacy data for the history of downloads of specific images are not available, indicating how difficult it is to access long term data about this. Server logs get very big very quickly and so are generally periodically discarded, and it is only recently that reporting facilities such as Google Analytics have allowed a quick and easy overview of the usage of websites). Currently popular digitisation projects at the University of Oxford Libraries are the Polonsky Foundation Digitization Project, and the recently launched digitized First Folio of Shakespeare's works, but there isn't sufficient data available from all the digital collections to be able to say one way or the other which is the one most popular project, never mind item. It was also pointed out, though, that you would probably struggle just as much (if not more so) to identify which has been the most requested book in the Bodleian's collections!

This trend of databases complicating the question continues at the British Library, where their digitisation outputs and projects are made available via multiple platforms and viewers, some managed by the British Library, and others by commercial partners, with some content available for free, other content via subscription, or paying a fee per image. These are only some of the most popular different sites: https://imagesonline.bl.uk, http://www.bl.uk/treasures/treasuresinfull.html, http://www.bl.uk/manuscripts/, www.sounds.bl.uk, https://www.flickr.com/photos/britishlibrary/, http://www.britishnewspaperarchive.co.uk/, http://find.galegroup.com/bncn/, http://gdc.gale.com/products/17th-and-18th-century-burney-collection-newspapers/ and the BL module on http://www.biblioboard.com/libraries.html. In addition, there are BL digitisation partnerships with other content providers, for example http://idp.bl.uk/ and http://eap.bl.uk/. Finding out the most accessed digitised item from within this is tricky (but not impossible - they tell me they are looking into it). The fact that they cannot say immediately demonstrates the complexity of running many large databases of digitised content.

These results, from very different institutions, invite discussions on shallow versus deep engagement with digital collections. Some examples of commonly accessed material are what we would think of as part of the Canon of Digitised Content: Shakespeare, Newton, Medieval Manuscripts. Some examples of commonly accessed material here can be taken as little more than clickbait - LOL! History! - or free reference material - it's a free Malaysian Dictionary! Bonus! - but is getting people through the virtual door to digitised collections in this way, and through these items, such a bad thing? Come for the Dog with the pipe in its mouth! Stay for the genealogy, then the discussions on palaeographic method! One can also argue that some of the discussion surrounding these objects is exactly what we are trying to encourage - many of the hundreds of comments posted on the Reddit item about the British Museum sling bullet, although hilarious, show consideration of what it would mean to be human in the time of Ancient Greece, and relate their societal response to ours. Isn't that the starting place (and in some cases, the ending place) of engagement with primary historical evidence?

Asking to see Digitisation's most wanted opens up wider questions of public engagement, the impact of social networks on internet traffic to digitised collections (from highlights posted by the institution, to those identified and shared by others outside it, often quite unexpectedly), and the role of making images of primary historical sources open for others to discover, use and share. We also become aware of the complex and intertwined database systems which are in place in many large organisations undertaking digitisation and delivering digitised items to users, and the difficulties in reporting on individual items (be they physical or digital!) as a result. Digitisation's most wanted is also a rapidly moving target, dependent on publicity, and changing interest and focus over time: social media can encourage large swings and changes in popular items very quickly. The act of posing this question has led to an interesting discussion on how we think about use of digitised content, and how we can build up evidence about usage. (I'd also like to thank the organisations listed above for responding to my query so promptly!)

Have you, or any organisation you work with, been affected by the discussion in this blog post? Do you have any evidence you can contribute to the investigation? Your help is needed to catch digitisation's most wanted. Please do post your comments about your experiences below (comments are moderated so may take a few hours to appear), or email m dot terras at ucl.ac.uk for them to be integrated here. The internet is a place of busy traffic. Someone must have seen them...

Update 15/05/14: The British Library's Endangered Archives' most popular item is the St Helena Banns of Marriage, an item commonly pointed to on genealogy websites such as this and this.

Update 16/05/14:
- The National Library of Australia have a discussion of their 25 most viewed digitised newspapers, and why, here.
- The International Dunhuang Project at the British Library tell me that a redevelopment of their database and website is underway to improve reporting for them, their partners and users.
- Glasgow University Library Special Collections tell me that their most popular item is the Curious Case of Mary Toft, from 1726, who supposedly gave birth to a litter of rabbits. This was featured as a book of the month in 2009, but picked up by the social media site Mental Floss in January 2014, with that page being shared on Facebook more than 4000 times, and garnering 30,000 hits in one day alone, and has since been posted on various other social media platforms, including Reddit. Glasgow also say that there is a difficulty in measuring access counts as the content is held on various different servers, and it can be difficult to interpret Google Analytics in this case. They also point out that, from their perspective, there is a lack of benchmarks to compare usage of their items to that of other special collections.
- The National Archives tell me they point to the popular items as part of their navigation and as a result, these "most popular items" remain the most popular, in a virtuous circle. A very popular item at the moment is The Security Service: Personal (PF Series) Files KV2, which hosts the records of spies such as Mata Hari. These were embargoed until Thursday 10 April 2014, then launched with an accompanying press release, which garnered significant press coverage worldwide, driving traffic to the site. The only frequently accessed item which is not in these lists is the muster roll of HMS Victory for the Battle of Trafalgar, which is commonly referred to in military and naval history websites (although interestingly few people link through directly to the page where it can be downloaded from, so those who read about it must come to TNA's website and search themselves).

Update 19/05/14
- The Estonian Folklore Archives at the Estonian Literary Museum tell me that their most popular item is a leaflet from 1937 on how to preserve sealskins, although I can see no other webpages pointing to this item (perhaps because my Estonian search skills are weak!).
- UCLA Digital Library tell me their most viewed item is a Lyrical Map of the Concept of Los Angeles,  a 23-foot long hand-drawn and hand-lettered map of Los Angeles, using the words and images of dozens of L.A. authors, which was on display in a museum in 2011, and was featured widely on blogs  both at the time of the exhibit and since, which points people to the digital version now the display is no longer live in the museum space. Another popular item is the complete set of the 1582 Corpus Juris Canonici, the "Body of Canon Law," particularly the table of contents, which is commonly linked to from those interested in Canon Law, such as this, thus driving subject specialists to the site.
- The History of Computing in Learning and Education Virtual Museum tells me the most viewed items are the writing competition and Historic Newsletters from the People's Computer Company.
- A hack carried out at the Zurich Hackathon 2014 looked at image analytics from the US National Archives and Records Administration's contributions to Flickr Commons, looking at 200 million hits in a 3 month period and identifying the most common images: a description of that hack is here, which also gives examples of the most commonly looked at images. "There is a spike on March 24. Further analysis shows that the biggest referral on that day is Dorothy Height. Turns out this lady was featured on a Google Doodle on that day." Popular subjects (and referrer pages, generally from Wikipedia) were John F. Kennedy, World War II, Japanese American Internment, Vietnam War. A full list is available on the project page. This shows the importance of institutions linking their content from Wikipedia, and what can happen if you are featured by Google.
- There is also a useful tool in BaGLAMA which shows view counts for pages using Commons images in GLAM-related category trees.

Update 20/05/14
- The Bodleian also make the very good point that "With most browsers now defaulting to 'do not track' combined with the EU cookies legislation it is difficult to find any sort of data that one can 'stand behind' these days."
- The Jüdisches Museum Berlin's most accessed items are the Sammeldatensatz: Orden, Ehrenzeichen und Embleme von Julius Fliess (1876-1955), but they say that most accesses come from searches for "jewish emblems", and so there is a need to add "emblem" as a synonym for "symbol" to the thesaurus, to help users find what they are looking for. In this way, looking at search terms can help develop user paths through the system so they can find what they actually want.
- The University of Iowa Digital Libraries say that based on Google Analytics for the last year, the most popular item is a Dada book, and the most popular collection is Iowa Maps, but the access numbers for different objects in the database themselves are hard to count, and they'll get back to me on that. Based on recent web searches reported by the webmaster, a surprisingly high number of people find them via searches for Peter Rabbit: the digital book of which is linked through to their site from the Wikipedia page and various other websites featuring Peter Rabbit.
- The National Library of Wales tell me the most popular article on http://welshnewspapers.llgc.org.uk is a 1916 Cambria Daily Leader advert for 'blouses' and 'hosiery'. To find out more about why may take some digging, though!
- Hamlet Depot and Museums tell me that their most popular items are genealogical records, including railroad employees lists, and seniority records, and also historic pictures.
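
The Jüdisches Museum point above, about adding a synonym to the thesaurus so that searches land on the right record, can be sketched in a few lines of Python. This is only an illustration of the general idea of synonym expansion: the thesaurus entries, record titles and function names are all invented here, and nothing is known about how the museum's own catalogue software works.

```python
# Hypothetical thesaurus: each search term maps to catalogue terms it should also match.
THESAURUS = {"emblems": {"symbols"}, "symbols": {"emblems"}}


def term_variants(term, thesaurus):
    """Return the term itself plus any synonyms listed for it."""
    return {term} | thesaurus.get(term, set())


def search(query, records, thesaurus=THESAURUS):
    """Return record titles in which every query term, or one of its synonyms, appears."""
    hits = []
    for title in records:
        words = set(title.lower().split())
        if all(term_variants(term, thesaurus) & words for term in query.lower().split()):
            hits.append(title)
    return hits


# Invented record titles, for illustration only.
records = ["Jewish symbols on a medal", "Railway employee lists"]
with_synonyms = search("jewish emblems", records)
without_synonyms = search("jewish emblems", records, thesaurus={})
```

With the synonym in place the query finds the "symbols" record; with an empty thesaurus the same query finds nothing, which is the gap that looking at search terms reveals.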

Update 22/05/14
- The New Zealand Electronic Text Collection tell me that reference works are their most used, including A Grammar and Dictionary of the Samoan Language, with English and Samoan vocabulary (which is linked to from thousands of different sources about New Zealand culture, and discussions on translation), New Zealand in the First World War (which is linked to from various history and genealogy sites) and The Official History of New Zealand in the Second World War (which is also popularly linked to online, including in reminiscing personal postings from soldiers who served, talking about the war on social media).
- The University of Otago Library provided me with a very detailed overview of the issues they face (thanks!). They are in the process of developing a repository to manage all of the digital collections that they want to curate, and the pilot will be live by November, but for the moment, they have a variety of different sites on which you can see digitised material, showing again the complex relationship of databases and content which many institutions have. For example, they have OUR Heritage, which is a window across some collections. Some records are pulled from OUR Heritage and displayed via Special Collections Online Exhibitions. There is also Hocken Collections, who had their reader access collection digitised and made available online. They track this via Google Analytics, and also by watching their own server stats: and these do not in any way match up. Google does not capture when someone goes directly to a file, so Analytics reports just a fraction of the over a million hits in the past year that they can track on their server. They digitise on request, and respond to community demand, and are trying to prioritise the digitisation process. From Google Analytics, the most heavily used collections are the History of the University and Botanical charts (which belong to the Department of Botany at Otago, and some are still used in the labs; they digitised these, provided a copy for their use and deposited the originals in Hocken Collections). The most popular items are “Key plan to Mr G.B. Shaw’s picture of Dunedin in 1851”, which is mentioned on various genealogical sites online; a painting, “Sangro, a rosary of olive trees, landscape of windswept manuka.”, which appears linked from some other major federated collections online; and a printed map of Rome, “Mappa della campagna Romana del 1547”, which is a commonly consulted map (there are various copies of it in libraries worldwide), so those searching online to see it must find the freely available copy here.
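
Since Otago's server stats catch the direct file requests that Analytics misses, here is a rough sketch of how "most requested" counts might be pulled from raw server logs. It assumes the logs are in the widely used Common Log Format; the sample lines, file paths and function name are invented for illustration and do not come from any of the institutions mentioned.

```python
import re
from collections import Counter

# Picks the method, path and status out of a Common Log Format line, e.g.
# 203.0.113.5 - - [10/May/2014:13:55:36 +0000] "GET /maps/rome.jpg HTTP/1.1" 200 2326
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def most_requested(lines, top_n=3):
    """Count successful GET requests per path and return the top_n most common."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("method") == "GET" and match.group("status") == "200":
            counts[match.group("path")] += 1
    return counts.most_common(top_n)


# Hypothetical log lines, invented for illustration only.
sample_log = [
    '203.0.113.5 - - [10/May/2014:13:55:36 +0000] "GET /maps/rome-1547.jpg HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/May/2014:13:56:01 +0000] "GET /maps/rome-1547.jpg HTTP/1.1" 200 2326',
    '198.51.100.2 - - [10/May/2014:14:02:11 +0000] "GET /charts/botany-01.jpg HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/May/2014:14:05:43 +0000] "GET /maps/rome-1547.jpg HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/May/2014:14:06:10 +0000] "GET /charts/botany-01.jpg HTTP/1.1" 200 5120',
    '192.0.2.44 - - [10/May/2014:14:09:30 +0000] "GET /missing.jpg HTTP/1.1" 404 209',
    '192.0.2.44 - - [10/May/2014:14:10:02 +0000] "POST /search HTTP/1.1" 200 812',
]
top_items = most_requested(sample_log, top_n=2)
```

Counting only successful GET requests is a deliberate simplification: real logs would also need crawlers and repeat visits filtered out before the numbers could be compared with anything Analytics reports.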