Tuesday, 27 May 2014

Inaugural Lecture: A Decade in Digital Humanities

This is the crux of what I planned to say - or hoped to say! - at my professorial inaugural lecture at UCL on the 27th May 2014. I'm not one for reading off a script, though, so I may have deviated, hesitated, or expanded on the night. A video of my talk is now available. No, I haven't watched it myself!



I decided to call my inaugural lecture "A Decade in Digital Humanities" for three reasons.
1. The term Digital Humanities has been commonly used to describe the application of computational methods in the arts and humanities for ten years, since the publication, in 2004, of A Companion to Digital Humanities. "Digital Humanities" was quickly picked up by the academic community as a catch-all, big-tent name for a range of activities in computing, the arts, and culture. A decade on from the publication of this text, I thought it would be useful to reflect on the growth, spread, and changes that have occurred in our discipline, and on my place within them.

2. This year sees me in my 10th year of being in an academic post. I joined UCL in August 2003, my first academic post after obtaining my doctorate, and since then I have worked my way up the ranks from probationary lecturer to senior lecturer, to reader, and now to full professor. The professorial lecture gives me a rare chance to pause and look back at what the body of work built up over this time represents, and what it means to be undertaking research in this area.

3. You'll have to wait for later in the lecture to see the third reason...

Who here would be comfortable defining what is meant by the term Digital Humanities? In this, the week of the UCL Festival of the Arts, celebrating all things to do with the Arts and Humanities, let's go back to first principles. In UCLDH and 4Humanities' award-winning infographic "The Humanities Matter", we defined the humanities as "academic disciplines that seek to understand and interpret the human experience, from individuals to entire cultures, engaging in the discovery, preservation, and communication of the past and present record to enable a deeper understanding of contemporary society." It stands to reason, then, that the Digital Humanities use computational methods to try to understand what it means to be human, in both our past and present society. But it may be easier if I give some brief examples of the kind of work we Digital Humanists get up to.

One of the easiest things we can do with computers is count things. For data to be computationally manipulated, it has to be in numeric form. If we can get text into a computational form, we can easily count and manipulate the language, showing trends across time. For example, if we take a million words of abstracts from my discipline's ALLC/ACH conference across various years, we can easily see how mentions of one technology (XML) become more popular, while mentions of another (SGML) decline. Much of the work in DH is in manipulating, processing, and analysing text, and our iOS app Textal is just part of that trajectory. Much of my work, though, has been in digital images, starting with developing systems to try to read damaged documents from Hadrian's Wall, and more recently working on multispectral imaging and 3D manipulation of damaged texts. We've also worked with museums on large-scale 3D capture of cultural and heritage objects. The important thing about all of this is that, as well as implementation, we're also interested in the use and usage of these technologies, the impact they have on those working in culture and heritage, and on the ability to study the past and present human record. We often innovate new systems, or adopt concepts and apply them to humanities projects, such as the crowdsourcing of transcriptions of Jeremy Bentham's handwriting by volunteers, or working with visitors to the Grant Museum of Zoology at UCL to encourage debate about zoological collections. We build, we test, and we reflect on what using these technologies means for the humanities, giving recommendations which can be useful across the sector. From these projects, it's difficult to pin down what Digital Humanities actually is, but that sums up the difficulty of our discipline's title: it encourages thinking about computational methods in the arts and humanities, and then into culture and heritage, in as broad a sense as possible.
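To make the kind of counting described above concrete, here is a minimal Python sketch that tallies mentions of "XML" and "SGML" per year. It assumes a tiny, invented set of abstracts and a hypothetical helper, `count_terms_by_year`, with deliberately crude tokenisation; it is not the ALLC/ACH corpus, nor the method behind the original analysis.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (year, abstract text) pairs. These texts are
# invented for illustration; they stand in for a real corpus of abstracts.
abstracts = [
    (1996, "We encode the manuscript in SGML using the TEI guidelines."),
    (1996, "An SGML edition of the letters, with SGML markup for names."),
    (2000, "Migrating our SGML files to XML for web delivery."),
    (2004, "An XML edition, transformed with XSLT; the XML is TEI P4."),
]

def count_terms_by_year(docs, terms):
    """Return {year: Counter} of case-insensitive whole-word term counts."""
    wanted = {t.lower() for t in terms}
    counts = defaultdict(Counter)
    for year, text in docs:
        # Crude tokenisation: runs of letters and digits, lower-cased,
        # so "XSLT" is its own token and is not counted as "XML".
        for token in re.findall(r"[A-Za-z0-9]+", text.lower()):
            if token in wanted:
                counts[year][token] += 1
    return counts

by_year = count_terms_by_year(abstracts, ["XML", "SGML"])
for year in sorted(by_year):
    print(year, dict(by_year[year]))
# 1996 {'sgml': 3}
# 2000 {'sgml': 1, 'xml': 1}
# 2004 {'xml': 2}
```

A real study over a million words would also normalise each count by the total number of words per year (for example, mentions per million words), so that years with more abstracts don't look artificially "popular".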

What made Digital Humanities spring, fully formed like Athena from the head of Zeus, as an academic field in 2004? Was it because that was the first time quantifiable methods had been used in the Arts and Humanities? (Remember: all computational methods require quantification.) Well, of course that is nonsense. When you look back across the history of Humanities scholarship, quantifiable methods have been used in the Arts and Humanities since the birth of universities. If we think of the book as technology, from its inception scholars took it to pieces to see under the hood: concordances and indexes of works were created by hand, such as this "Concordance or table made after the order of the alphabet" from 1579, which lists how many times concepts such as "abomination" appear in the New Testament. Or take the work of Joseph Scaliger, who in the early 1600s used quantifiable methods to plot the different periods in which different civilizations must have existed. Or the work of August Schleicher in the 1850s, who showed, by quantifiable methods, that the languages of Europe must have had a common historical root. All of these texts are available from UCL Library, and I don't have to leave my sofa to see any of them because YAY! Digitisation! Changing humanities scholarship! But the point is that quantifiable methods are part of established practice in the humanities, and have been for as long as the Humanities have existed. So when I undertook my first project at UCL, looking at whether we could use UCL's high performance computing facilities to analyse historical census data, this was part of a quantifiable humanities tradition which harks back 500 years, just at a grander scale.

So what made Digital Humanities spring, fully formed like Athena from the head of Zeus, as an academic field in 2004? Perhaps 2004 was the first time people had used computational techniques in the arts and humanities? But of course, that is nonsense too. When you look back at the history of computing (and not even digital computing, but the very first computer, Charles Babbage's Analytical Engine), the very first computer programmer, Ada Lovelace, hints at the possibilities for art, music, and understanding human knowledge and culture in her earliest writings. She understood that there was something more to the mathematical calculations afforded by this machine than science, and they called her a madwoman for it. Well, this madwoman has an (as yet unproven) theory that if you look at the history of the first 100 electronic programmable computers in the 1950s, 1960s and 1970s across the world, you will see humanists eyeing them up and asking "how can I use, or develop this tool for use, in my research?" It's certainly true of Father Busa, working with IBM in the 1950s on the concordance of the works of Thomas Aquinas (counting, indexing, and manipulating words, as part of the historical trajectory of humanities methods stretching back 500 years, just a change in scale...), but also of Roy Wisbey in Cambridge, who set up the Literary and Linguistic Computing Centre there in the 1960s. When the first computers arrived at UCL, the artists from the Slade School of Fine Art were over there like a shot to establish the Experimental and Computing Department. We should also mention Susan Hockey, who led various initiatives in text encoding, text analysis, and digital libraries.
Susan, incidentally, gave me my first academic job here at UCL in 2003: under Susan's auspices, UCL had included a Digital Resources in the Humanities module as part of its MA offering for librarians and archivists in the School of Library, Archive and Information Studies (now the Department of Information Studies) since 2000. But the point is that considering how best to use computing in the arts and humanities is not something which started in the 21st century, nor in 2004: humanists have been looking at available tools, and how best to use them, since computation began. So when we undertook one of the latest projects at UCLDH, which came from looking at an iPhone and thinking "how can I use, or develop this tool for use, in my research in the Humanities?", and developed an iOS app for text analysis, this was part of a longer trajectory of considering available computational tools, and how they may be appropriated, adopted, and adapted for our purposes in the humanities, just at a grander scale as processing technologies increase in speed.

So why Digital Humanities, in 2004? Firstly, the coalescing of interested scholars into an identifiable field is an understandable academic response to societal changes. The speed of computing rises, the price of computing plummets, the information available on the internet (and the possibility of creating new information) increases, and use and usage of internet technologies have become commonplace. Remember, it's up to Humanities scholars to look at the past and present record to enable a deeper understanding of contemporary society: quite frankly, it would be more alarming if an academic movement hadn't emerged looking at what computational methods could do for our understanding of human society, both past and present, and at how best we can grab the technical opportunities which fly by and appropriate them for our purposes, to inform both ourselves and others about the prospects of using computing in this area. The discipline of Digital Humanities was inevitable, and would have appeared whatever title it was given.

Secondly, Digital Humanities is a handy, all-inclusive, modern title which rebrands all the various work which has gone before it, such as Humanities Computing, Computing and the Humanities, Cultural Heritage Informatics, Humanities Advanced Technology... DH has a ring to it, and boy, what a rebranding it was. We tend to call it "Big Tent Digital Humanities", meaning: roll up! roll up! everyone using any computational method in any aspect of the arts and humanities is welcome! But really, "Big Wave Digital Humanities" may be more appropriate, as we countenance the sudden swell, dissipation, and speed of the activities of the discipline. Taking a peek at mentions of Digital Humanities on Google Ngrams, we can see its sudden growth, and the fact that it is now used as a proper noun, with Capital Letters (although remember that this, counting words, is part of a long tradition of humanities scholarship: Google simply have more books to include in their count). We can see how DH has trended over time, appearing in headlines in the media. Many, many textbooks in DH have appeared, some of which I am responsible for myself. Journals have appeared, such as Digital Humanities Quarterly (of which I'm one of the general editors), and the ALLC/ACH conference has renamed itself Digital Humanities (this year, for my sins, I'm the Program Chair for DH2014, which will be held in Lausanne, Switzerland; we have received over 700 proposals from more than 2,000 authors vying for a space to present). There are more DH conference presentations and workshop slots worldwide, year on year. In 2010, I gathered together all the available evidence I could on DH in an infographic called Quantifying Digital Humanities, showing that there were 114 DH centres in 24 countries. Today, not even four full years later, there are 195 DH centres in 27 countries.
Anyone who knows how long it takes to set up a research centre will recognise this as phenomenal growth in the university and GLAM (galleries, libraries, archives, and museums) sector, and will know that strong institutional support must lie behind each and every one of these.

UCL Centre for Digital Humanities is one of these recently founded centres. We officially launched four years ago to the week of this lecture, in the same lecture hall where this lecture is being presented. We don't talk about the launch much (it's not often I'm part of something at work which ends up featured in the political pages of the newspapers; you'll have to google that to find out more: YAY! digital media! the internet never forgets!), but in the four years since launch we've undertaken a phenomenal number of projects, covering many aspects of Humanities and Arts research, and considered Digital Humanities in its broadest sense. This isn't all me: there is an amazing team who are part of the Centre, and we've won various awards for our academic projects and collaborations, published many books, papers, and book chapters, and been part of successful funding bids to research councils worth tens of millions of pounds. One wonders what makes a Digital Humanities Centre attractive to universities that don't have one. Nope, I can't see what makes that level of activity attractive, at all.

So what proportion of Humanities scholars are now digital humanists? Back in 2005, participants in the Summit on Digital Tools in the Humanities at the University of Virginia estimated that "only about six percent of humanist scholars go beyond general purpose information technology and use digital resources and more complex digital tools in their scholarship" (p. 4 of this PDF). By 2012, N. Katherine Hayles, in her chapter "How We Think: Transforming Power and Digital Technologies" in David M. Berry's edited volume Understanding Digital Humanities, estimated that 10 per cent of humanists were digital humanists (p. 59). Now, in 2014, a forthcoming study from Ithaka S+R (with the working title Sustaining the Digital Humanities: Institutional Strategies beyond the Start-up Phase) includes surveys of faculty at four American universities. In the departments surveyed at each institution, nearly 50% of faculty members indicated they have "created or managed" digital resources. Granted, the departments were chosen by campus staff (often at the library) who felt there was some significant activity taking place there. The percentage of these "creators" was consistent across all four universities (Brown, Columbia, the University of Wisconsin, and Indiana University), and most of the creators also felt that their creation was intended for public use (not just for their own research aims), and would require ongoing development in the future.

50% of humanists involved in digital activity, and so, apparently, digital humanists: how can this possibly be? And how can we conceptualise what it means to be a digital humanist, amongst this spread of activity and range of available technology? Is creating or managing digital resources the same as being a digital humanist? At a time when (nearly) every library catalogue is digitised and available online, (nearly) every book manuscript is written on a word processor, and many historical documents are digitised and available for consulting from your own sofa, does that make everybody working in the humanities a digital humanist? How can I begin to conceptualise my contribution, my place, and where my work sits within Big Wave Digital Humanities?

I find it useful here to turn to Rogers' Innovation Adoption Curve, a sociological model that looks at how technology spreads through society. This is a bell curve: right at the start of the adoption of a technology are a few innovators, experimenting with (and developing) new technology. These innovators sometimes persuade a larger number of early adopters to take up the new technology on offer, and only once a sufficient mass of users is achieved does the technology "cross the chasm" (in Geoffrey Moore's phrase) and become used by the majority of individuals in a society (who are split into an early majority and a late majority). Finally, we have adoption by the "laggards", who are slow in taking up technologies, but do so once they have permeated throughout society. (Hard not to think, here, of my elderly grandmother, who recently got her first mobile phone.) Now, this model is useful as we can plot along it some of the technologies which are available to a humanist. Things like word processing, searching for references online, and even looking up the digitised texts which I showed at the start of this lecture: even the technologically laggard humanists can do these now, and although these technologies are changing scholarship, for the most part it's a question of scale (better! faster! more!) rather than of approach or technique. Technically facilitated tasks like updating websites, using and updating wikis, and using social media: even the late majority of humanists can do these now. Online tools such as Voyant are available which allow you to do text analysis and manipulate texts to see the underlying patterns, so the early majority of humanists can use these tools should they want to.
But the most difficult, intellectual work of applying technology in the humanities still occurs before the chasm has been crossed, in the phases of innovation and early adoption, where we are looking at the technologies that cross our path and saying "how can I use, or develop this tool for use, in my research?", much like those in the 1950s or 1960s who were coming across university mainframes and asking how best to apply them in the literary and linguistic arena. It's important to note, of course, that this wave of technology keeps on coming at us, and the place where a given technology sits along the curve changes: 20 years ago, had you been making a website for your humanities project, you would have been an innovator rather than one of the late majority, and the same holds for word processing 40 years ago. The technology keeps coming: we have to respond to this, innovate, adopt, and see what is useful or usable for, or used by, the majority of people in our discipline.
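The bell curve described above can be sketched numerically. In Rogers' classic scheme, the adopter categories are marked off by standard deviations from the mean time of adoption, assuming adoption times are normally distributed. The Python below, with hypothetical names (`normal_cdf`, `adopter_shares`), computes the share of a population falling in each category from that assumption alone; it is a stylised model, not data about how many humanists use any given tool.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Rogers' adopter categories, bounded by standard deviations (z-scores)
# from the mean time of adoption; earlier adopters have negative z.
CATEGORIES = [
    ("innovators", float("-inf"), -2.0),
    ("early adopters", -2.0, -1.0),
    ("early majority", -1.0, 0.0),
    ("late majority", 0.0, 1.0),
    ("laggards", 1.0, float("inf")),
]

def adopter_shares():
    """Return {category: share of population} under a normal adoption curve."""
    return {name: normal_cdf(hi) - normal_cdf(lo) for name, lo, hi in CATEGORIES}

for name, share in adopter_shares().items():
    print(f"{name:15s} {share:6.1%}")
```

This gives roughly 2.3%, 13.6%, 34.1%, 34.1% and 15.9%, which Rogers rounds to 2.5%, 13.5%, 34%, 34% and 16%. On this picture, the chasm in Moore's sense sits at the boundary between early adopters and the early majority, after roughly the first 16% of a population has adopted.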

Now (and this is the most contentious thing I'm going to say in my whole lecture, for those attending who are dyed-in-the-wool Digital Humanists), one of the problems that we have as a movement is that we tend to get caught up in, and fixated upon, a certain technological solution. For example, every DH program I've come across teaches XML (that technology which took over from SGML in the conference abstracts) as the best-practice way to encode text. And there's no doubt that XML provides a framework within which we can both explore theoretically what it means to describe texts computationally, in such a way that they retain the information in their printed or manuscript form, and build and test prototypes. But XML as a technological standard has been around for 16 years, and technology moves on, but DH doesn't seem to be doing so. In many ways, DH's relationship to XML is similar to the AI community's relationship with LISP: the means of computational expression in the language or format suits the questions the field needs to ask, so there is no need to use other technologies which come on stream, which may be more efficient from a computational point of view, as we explore what it means to work with our questions in this computational way. And that's OK, but we shouldn't be blind to the fact that, hey! technology is advancing all the time and, also, XML is not a technology that crossed the chasm: it may be in use in technical systems, but it's not one that you see a lot of the general populace using. This, in turn, means that DH has permanently hitched its wagon to an ageing technology, which is hard to explain to others, including non-XML humanists, whilst other things are happening in the technological world around us. Just something we have to watch out for when building teaching programs, or looking at the scope of outputs in our field. We don't want to be left behind as the digital in digital humanities rolls on without us.


I find it useful to plot my research on the Innovation Curve, to see where my work sits. The work on counting terms across a corpus very much sits in the early majority, given the availability of tools to do so. But the work on building an iPhone app to do so was very much innovation: it took a lot of pure programming in a relatively new space to achieve it. The work in image processing I do is either innovation (here we are publishing in pure computer science and engineering venues, as well as in humanities venues, which I'm very proud of), or adoption, where we take technologies our academic colleagues in the engineering sciences have generated and roll them out to a humanities or heritage application. Our work on user studies is something completely different, though: here we are generally looking at how the majority of people are using an extant text, or (in the case of something like Transcribe Bentham, or QRator) we are conducting reception studies, where we innovate and build a technology, launch it, and study its uptake across the whole cycle. We can see, then, a range of DH activity across the innovation cycle, but the majority of the work I do is certainly at the start of the innovation curve. Is this where DH sits? I like to think so, but more to the point, I'm confident it's where I sit best when doing DH.

I need here to show you another curve, though: the Gartner Hype Cycle, which looks at how technologies are launched, mature, and are applied (so that people know when to invest). The premise is that when technologies are first triggered, everyone thinks they are going to be the Next Big Thing, and so they reach "the peak of inflated expectations", before crashing down into a "trough of disillusionment" when those adopting them realise they aren't that great at all. It's hard work to get technologies up the "slope of enlightenment", where useful, usable applications are found, and few technologies make it to the "plateau of productivity", where they become profitable. It's a useful curve: this year's predictions show Big Data right at the top of the peak, which chimes with media coverage of how it will solve everything, for example. So where, if I had to, would I put DH as a movement on this curve?

I'd put it at the top: at the top of the Peak of Inflated Expectations. We've got a lot of pressure on us to prove our johnny-come-lately benefit to the world of academia, to demonstrate our worth, to show that the investment made in us over the past few years is worth it (whilst also bringing in further investment in research funding, to meet institutional expectations). After a peak comes a crash, and we have to be prepared for the tide to turn and the backlash to begin, after the years of media hype and raised expectations. So how do we get Digital Humanities to the plateau of productivity?

First, I would argue that we have to understand our lineage: that the current manifestation of DH is a logical progression of the quantifiable methods used in the humanities for the past 500 years, and a logical progression of humans wondering about the potential for applying computational methods to humanities problems, which has been going on in the digital space for the past 60 years. These combined trajectories aren't going away, and whatever funding cuts and media backlash may come at us, it is the role of the digital humanist to understand and investigate how computers can be used to question what it means to be human, and the human record, in both our past and present society. Secure in our mission, we can carry on whatever the storm throws at us.

Second, I would argue that we have to ignore naysayers who are unsure about this new Digital Humanities lark (and believe me, there are plenty, even in my own department) and just do good work. The way to demonstrate our worth is through doing good work. We have to keep asking questions about computational methods, computational processes, and the potential they offer humanities scholars, as well as the pitfalls, to explore this changing information environment from the humanities viewpoint. It's not just about building websites, or putting information online: it's about innovating and adopting, and questioning while we build the ramifications of doing this, the impact on the humanities, the issues using technology raises, and the answers it provides that you couldn't otherwise generate. I realise this is very Calvinist of me (you can take the lass out of Scotland...), but I do see that we have to be engaging with theories and questions of what it means to be doing this work in this way, as well as updating a website or creating a digital file. A continuation of what it means to be a humanities scholar, in the digital space.

I'm not one for looking back, and despite the title, I deliberately didn't want this inaugural to be a survey of all the projects I have undertaken over the past ten years (then I did this, then I talked to that person, then I visited there), but when I look back over the variety and range of projects, publications, and outputs that I've worked on, either on my own or as part of a team (there's a lot of teamwork that has gone on here), I'm surprised at how much of it there is, the range of topics we've covered, and the opportunities we've pounced on. I see a body of work which explores various aspects of what it means to apply digital technologies in the humanities space, and which helps both those in engineering science and those in the humanities to explore issues which are important to them. I've learnt things along the way about the nature of interdisciplinary work, of teams, of the academic publishing and peer review process, and of the grant funding process, but I've written about that elsewhere. There are also things I am proud of that are physical rather than purely digital: over the last few years, I'm most proud of building the UCL Multi-Modal Digitisation Suite, a shared space between UCL Library Services, the UCL Faculty of Arts and Humanities, and the UCL Faculty of Engineering Science, contributing to the infrastructure of UCL in a collaborative endeavour. But what I see here, as a common thread, is that the work I do tends to sit right at the beginning of the technology adoption cycle, aiding and abetting the application of technology within the arts, humanities, and heritage, and I'm comfortable with that. There's a strength in knowing your place, your remit, and what you do best.


So the third reason for calling my talk "A Decade in Digital Humanities" is that I didn't say which decade we were talking about, and it is also time to look towards the future, and what the next ten years hold both for DH, as the field turns into a teenager, and for me, as I go into my next decade here at UCL. I'm not one for crystal balls, so I'll keep my scrying brief. I see an inevitable fragmentation of the DH community and DH focus: it was never conceived of as a homogeneous entity anyway, and it is the nature of waves and swells that they dissipate. We'll see (we are already seeing) more focussed groups of scholarly work around, say, Geographical Information Systems and literature, as people specialise and work on specific technologies and specific methods. The technology will keep coming, and it's up to individual humanities scholars to respond to what is appropriate to their research question: the effects of DH scholarship will continue to ripple out across the humanities as technologies move along the adoption cycle, and certain aspects of digital research will simply become normal for humanities scholars as time goes on. But I do see that there will always be a place, right at the start of the technology adoption curve, for specialists in Digital Humanities to sit, watching out for these changing and emerging technologies, setting up pilot projects to experiment with different aspects of these technologies, feeding back recommendations and the potential ramifications to other humanities and engineering scholars and those within the wider cultural and heritage sector, and exploring what it means to be doing humanities research in that area. I'm happy to remain there, and I see that this will remain my place, working with other humanists, engineers, and computer scientists, over the next decade.
I'm delighted to be a co-investigator on the doctoral training centre for Science and Engineering in Arts, Heritage and Archaeology, which is the EPSRC's largest ever investment in Heritage Science, and for the next 8 years we'll be training up a range of doctoral students in this cross-section of the arts, heritage, humanities, and engineering and conservation science. (Perhaps what I really do is Heritage Science, but that's another talk entirely, and DH has work to do with the Heritage Science community in future.) That said, we do have work to do in making sure people know about the successes, outputs, and impacts of DH work. Given the expectations foisted upon us, we have to learn to be more vocal about our objectives, our remit, and our results. It's our job to be thinking about what it means to use digital technologies in humanities research, and in research, full stop. Our insights can then benefit a range of other fields, if we communicate them effectively.

Digital technologies are not going away any time soon, and although DH has had a rapid swell, it will remain essential that we investigate, use, and experiment with technologies over the coming decade. There is a new Companion to Digital Humanities coming out in late 2014, showing how the technologies used in humanities research have developed since the first edition (I'm delighted to have written a chapter on our public engagement work for it), and our field, as well as knowing where it has come from, has to understand that the technological wave on which we sail is continually on the move. I hope I've shown here that our uptake of technologies in the humanities is, and will continue to be, a moving target, and that, as part of a longer trajectory of investigation into humanities methods, DH is a modern but necessary, and even inevitable, part of the Humanities, and even computational, landscape. I look forward to whatever adventures the next Decade in Digital Humanities holds. There is so much to do!

Now, that is where I'd normally pause and say thank you for your attention, but hey, it's my inaugural, so I'll cry if I want to. I have a few brief thanks to make: it's quite a lick to go from probationary lecturer to full prof in ten years, and so I have to thank those who have supported me. Thanks go to my family up in Scotland for all their support, and to my own family: many of you know that in the past few years I've had three children, so the biggest thanks of all go to my husband Os, aka Expert Sleepers, for his forbearance and baby-juggling skillz. I've been blessed with an amazing network of friends, who have supported me enormously over this period. My first academic supervisor was Professor Seamus Ross, who kick-started my interest in this area, and his support and interest at the start of my career really set me up for the work I do today. Likewise, my PhD supervisor Professor Alan Bowman remains a fantastic mentor: thank you, Alan. My other PhD supervisor, Professor Sir Mike Brady, made me promise (when I got my doctorate in engineering) not to go near any nuclear power stations or bridges, a promise I have kept: thanks, Mike. I've already mentioned that Professor Susan Hockey gave me my first academic job, but her work also remains an inspiration on what is possible in computing in the arts and humanities. I work with an amazing team of people at UCLDH, and I thank them for their input both to the centre and to our various projects. Special thanks go to Rudolf Ammann, our designer at large, who helped prepare the graphics for this lecture.

But in this week of UCL's Festival of the Arts and Humanities, it's good to pause and see how embedded Digital Humanities research now is throughout college, and how much we work, in the Humanities, with those around us. The projects I've shown today, albeit briefly, are carried out in league with various other faculties (UCLDH reports to both the Arts and Humanities and Engineering Faculties here). Colleagues come from a range of different departments, including not only those across the Arts Faculty, but the Bartlett Centre for Advanced Spatial Analysis (in the UCL Bartlett Faculty of the Built Environment), and across the UCL Faculty of Engineering (I have joint projects with Medical Physics, Computer Science, and Civil, Environmental, and Geomatic Engineering). We are dependent on input from our colleagues in both UCL Library Services and UCL Museums and Collections, and work very closely with items in all the collections across college. The success of DH at UCL is therefore dependent on the institutional context we have here. Digital Humanities is now embedded into college life at UCL, and in this week of the Festival of the Arts, my final thanks go to UCL as a community for its institutional support in encouraging us to ride the DH wave: without being at UCL, my decade in digital humanities would have been completely different.











Saturday, 24 May 2014

Roy Wisbey, and Literary and Linguistic Computing, 1965 style



I recently got in touch with Professor Roy Wisbey, who set up the University of Cambridge's Linguistic Computing Centre in 1960, to invite him to my inaugural lecture. He is not able to attend (but sends his regards to those who know him!), and he also briefly loaned me this newspaper article from the Cambridge News of 24th September 1965. A very early piece of Humanities Computing history! It's in very fragile condition - I've spliced it together here to give the whole piece in one image (and the blog stylesheet is not my friend here - will sort out later - but...) - enjoy!

The use of computers will save the scholar years of mindless drudgery! Indeed!




Friday, 16 May 2014

Siberian Digital Humanities Adventure

The Siberian Federal University
Greetings from Krasnoyarsk, Siberia, where for the past week I've been hanging out at the Siberian Federal University, the largest university in the Siberian region and one that ranks highly in Russia. I've been giving some guest lectures on digital humanities, meeting various staff and students, and plotting with them on how to support their work and how to make connections to the wider digital humanities community.

How did I end up here? It's all down to the wonderful Inna Kizhner, who approached me nearly two years ago, in my then guise as secretary of what is now the European Association for Digital Humanities. After I helped source some teaching materials, in English and Russian, for their taught courses, Inna remarked to me "no-one ever comes to Siberia..." and I immediately said "ask me!". And finally, after much preparation, here I am.


Siberian Federal University are establishing a solid Digital Humanities presence. In the Institute of Humanities they currently offer digital humanities modules at both undergraduate and postgraduate level, and also an undergraduate module in the subject area of digital history (which next year will be taught by Inna). They have a digital lab (door sign, above!) and a digitisation lab. They have a range of projects they have been working on with both researchers and students, many of them led by Maxim Rumyantsev, who is now the university's deputy head, so there is positive institutional support here. These projects are mostly in the area of multimedia and digitisation. For example: working with the Museum of Geology of Central Siberia to create the simply stunning companion to their minerals collection (it is no easy task to capture minerals in this detail, at this quality); capturing, virtually exploring, and explaining regional heritage architecture (which is fast disappearing under new developments in this region) from the nearby town of Yeniseisk; documenting regional art shows and youth art shows; capturing high resolution images of the art contained within the Surikov Museum (life size copies of which adorn the university's walls at every turn); working with Gigapan capture methods and the State Russian Museum to create zoomable images of large art works (can you spot Pushkin?); and creating an interactive model of the Siberian Federal University campus itself. They are now keen to make connections with others across the world, and I'm delighted to be helping them and introducing them to various figures and associations in Digital Humanities. There is much work to be done, we have plans set out, and they are keen to make new relationships and new collaborations.

It's not all been work! I've been welcomed into colleagues' homes for meals (often meeting their families), treated at friendly restaurants (the food is wonderful), and toured round museums and supermarkets (Inna patiently put up with me pointing and exclaiming at various products we don't have in the UK, such as dried fish and tinned horse). Today we went to the Krasnoyarsk Dam, 30km upstream from the city, on a glorious spring day which showed off this remarkable feat of engineering (which is so exceptional it features on banknotes across Russia). There is a heavy security presence, and no photos are allowed, but I did manage this sneaky selfie...


It's been a fantastic trip, and I've been made very welcome here. Thanks to Inna, Maxim and Marina for their hospitality. I look forward to further opportunities and visits, and to introducing anyone who wants to be introduced (if I can be of help, drop me an email and I will forward it on). I have to admit I was nervous about my trip here - but instead of stress I've found friendly connections, and much opportunity to help further establish DH in this region, and throughout Russia. Now to pack, and begin the long trip home, where my three small boys are missing their mummy on the other side of the world (and I them). до свидания (goodbye)!

Thursday, 15 May 2014

Digitisation's Most Wanted

What are the most commonly accessed digitised items from heritage organisations? Even asking the question leads to further understanding about the current digitisation landscape.


Have you seen this Dog? Last spotted on the Flickr account of the National Library of Wales. Dog with a Pipe in Its Mouth, Taken by P. B. Abery, 1940s.
Last month, at a meeting at the National Library of Scotland, an interesting fact flew by me. The NLS has hundreds of thousands of digitised items online, so what do you think is the most popular, and most regularly accessed and/or downloaded? (It is difficult to distinguish between accessed and downloaded on most sites.) Is it the original Robert Burns material? The last letter of Mary Queen of Scots? Or any of the 86,000 maps held in this, one of the best map collections worldwide? No. It is "A grammar and dictionary of the Malay language : with a preliminary dissertation" by John Crawfurd, published in 1852. This is accessed by hundreds of people every month, mostly from Malaysia, partly because it is featured on many product pages providing definitions of Malay words. This demonstrates the surprising reach and potential of digitising items and then making them freely available online, reaching a worldwide audience far beyond the geographical locale of the library itself. Wonderful.

This left me pondering... what are the other most downloaded items at major institutions in the UK? So I sent out some feelers, and here are the results, demonstrating both the hidden complexity of the question, and the relationship of digitised heritage content to the current online audience landscape.

At Cambridge University Library, the most accessed collection overall is the Newton Papers, which was the first major digitised collection launched by the Library in 2010, and promoted widely. Within that, there is one particular notebook (which Newton acquired while he was an undergraduate at Trinity College and used from about 1661 to 1665 for his lecture notes) which is the most popular, featuring heavily in the initial promotion of the collection, and also in an In Our Time special series hosted by Melvyn Bragg on Radio 4. But within that notebook there is one page that is accessed more than the others, with most of the traffic coming from Greece. Why? This page was picked up in the Greek press and pointed to on many websites, blogs, newspaper reports, and in social media as evidence that Newton knew Greek. The links that remain still direct thousands of users to view Newton's jottings from his Greek lessons at the front of the book, showing the fascinating relationship between publicity, social media, linkage, and an item which reflects national pride, for a worldwide audience.

The most downloaded items at Cambridge also reflect the rapidly changing mentions of items on social media: in April 2014, an item downloaded/accessed more than 6000 times was the Breviary of Marie de Saint Pol, which went live that month. Why the sudden notice? On the 3rd of April, one of the Cambridge colleges with thousands of followers posted a link to it on Facebook, followed by the Cambridge Digital Library Facebook and Twitter feeds on the 4th of April. Retweeted a few times, these few postings led to thousands of views of the document, demonstrating the growing importance of using social media to tell people about newly mounted digitised content.

Over at Trinity College Library, the most accessed item from their digital collection in general is the Book of Kells, which again was their first major digitised item, heavily promoted in the press, and attracting a level of viewing that is unique due to general tourism and cultural heritage interest. The second most accessed digitised item is the surprise: a book of lute music by William Ballet, from the 17th Century. There is much discussion of this item, and links to it online, posted by online communities of lute players, and those who blog about lutes worldwide. Interest and demand in an item can therefore be encouraged if interested online communities hear about it, and share it with their membership.

A similar tale about the importance of publicity and social media emerges from the British Museum. There are popular items about the Viking exhibition which are linked from their home page at the moment, given the current exhibition. But from 1st January 2014 until now, the most popular item accessed in the digital collection (no, wait, go on, guess.... Rosetta stone? Vindolanda Tablets? ...) is the Landscape Alphabet by Joseph Hulmandell (no? me neither). These were discovered and shared on social media by type enthusiasts on Twitter in mid February. They were then promoted by the cool-hunter the Laughing Squid, who has almost half a million followers on Twitter, which caused a sudden spike (I can't see the British Museum actually tweeting them out themselves on their timeline). However, the initial swell of tens of thousands of hits has since dwindled to nothing, showing the fickleness of attention that comes with the social media stream. In 2013, the single most viewed item at the British Museum was... (go on, guess!)... a lead sling bullet, viewed 42,156 times in total. Why? It was picked up on Reddit, due to its sarcastic inscription ("some ancient sling bullets excavated from the city of Athens, Greece were inscribed with the word 'ΔΕΞΑΙ' (dexai), which translates to 'catch!'"). This generated a lot of online LOLs ("Halt gentlemen. Do not yet partake of the feast before us, for I must capture the image of it with instagram whereupon I shalt bequeath it to my herald upon Facebook for all to see." here), and it encouraged - and still encourages - visitors to the British Museum website: some forms of posting on social media generate the long tail of usage more than others.

Things start to get more complicated when various digital asset management systems (DAMS) come into play. Institutions often have more than one database of digitised content, from different suppliers, with different licensing restrictions and requirements, so ascertaining the most viewed single item is not a simple question. Organisations also post and share content in various different places. The National Library of Wales are looking through their DAMS to see which items are the most accessed, but they already know that the most popular item they have posted to Flickr (with no known copyright restrictions, contributed to Flickr Commons) is the photograph at the top of this post, Dog with a Pipe in its Mouth, from the P. B. Abery Collection. Again, this is an image which has been mentioned regularly on blogs, social media, and internet chats, as well as being a featured image on the 2013 anniversary of Flickr Commons. The fact that it has no copyright restrictions encourages its reuse online, and therefore drives traffic towards its host institution's site, if those users point back to it.

The libraries at Oxford University, including the Bodleian, have been digitising items for over twenty years, and so it is difficult to say what the most accessed or popular items are, due to the way the systems have been designed, implemented and integrated over the past two decades. Their most downloaded or accessed digitised book, scanned in collaboration with Google, is probably the "History of the Scott Monument, to which is prefixed a biographical sketch of Sir Walter Scott" by James Colston (published 1881). A freely downloadable version is available from its library record (ignore the resellers offering printed versions generated from this at considerable cost on Amazon and eBay!). As far as images are concerned, the most popular at Oxford are among those listed on Early Manuscripts at Oxford University, partly because many of them have been up continuously for twenty years. (Legacy data for the history of downloads of specific images are not available, indicating how difficult it is to access long term data about this. Server logs get very big very quickly and so are generally periodically discarded, and it is only recently that reporting facilities such as Google Analytics have allowed a quick and easy overview of the usage of websites.) Currently popular digitisation projects at the University of Oxford Libraries are the Polonsky Foundation Digitization Project and the recently launched digitized First Folio of Shakespeare's works. However, there isn't sufficient data available from all the digital collections to say which is the single most popular project, never mind item. It was also pointed out, though, that you would probably struggle just as much (if not more so) to identify which has been the most requested book in the Bodleian's collections!

This trend of databases complicating the question continues at the British Library, where their digitisation outputs and projects are made available via multiple platforms and viewers, some managed by the British Library, and others by commercial partners, with some content available for free, other content via subscription, or paying a fee per image. These are only some of the most popular different sites: https://imagesonline.bl.uk, http://www.bl.uk/treasures/treasuresinfull.html, http://www.bl.uk/manuscripts/, www.sounds.bl.uk, https://www.flickr.com/photos/britishlibrary/, http://www.britishnewspaperarchive.co.uk/, http://find.galegroup.com/bncn/, http://gdc.gale.com/products/17th-and-18th-century-burney-collection-newspapers/ and the BL module on http://www.biblioboard.com/libraries.html. In addition, there are BL digitisation partnerships with other content providers, for example http://idp.bl.uk/ and http://eap.bl.uk/. Finding out the most accessed digitised item from within this is tricky (but not impossible - they tell me they are looking into it). The fact that they cannot say immediately demonstrates the complexity of running many large databases of digitised content.

These results, from very different institutions, invite discussions on shallow versus deep engagement with digital collections. Some examples of commonly accessed material are what we would think of as part of the Canon of Digitised Content: Shakespeare, Newton, Medieval Manuscripts. Other examples of commonly accessed material here can be taken as little more than clickbait - LOL! History! - or free reference material - it's a free Malay dictionary! Bonus! - but is getting people through the virtual door to digitised collections in this way, and through these items, such a bad thing? Come for the Dog with the pipe in its mouth! Stay for the genealogy, then the discussions on palaeographic method! One can also argue that some of the discussion surrounding these objects is exactly what we are trying to encourage: many of the hundreds of comments posted on the Reddit item about the British Museum sling bullet, although hilarious, show consideration of what it would mean to be human in the time of Ancient Greece, and relate that society's responses to our own. Isn't that the starting place (and in some cases, the ending place) of engagement with primary historical evidence?

Asking to see Digitisation's most wanted opens up wider questions of public engagement, the impact of social networks on internet traffic to digitised collections (from highlights posted by the institution, to those identified and shared by others outside it, often quite unexpectedly), and the role of making images of primary historical sources open for others to discover, use and share. We also become aware of the complex and intertwined database systems which are in place in many large organisations undertaking digitisation and delivering digitised items to users, and the difficulties in reporting on individual items (be they physical or digital!) as a result. Digitisation's most wanted is also a rapidly moving target, dependent on publicity, and changing interest and focus over time: social media can encourage large swings and changes in popular items very quickly. The act of posing this question has led to an interesting discussion on how we think about use of digitised content, and how we can build up evidence about usage. (I'd also like to thank the organisations listed above for responding to my query so promptly!)

Have you, or any organisation you work with, been affected by the discussion in this blog post? Do you have any evidence you can contribute to the investigation? Your help is needed to catch digitisation's most wanted. Please do post your comments about your experiences below (comments are moderated so may take a few hours to appear), or email m dot terras at ucl.ac.uk for them to be integrated here. The internet is a place of busy traffic. Someone must have seen them...

Update 15/05/14: The British Library's Endangered Archives' most popular item is the St Helena Banns of Marriage, an item commonly pointed to on genealogy websites such as this and this.

Update 16/05/14:
-The National Library of Australia have a discussion of their 25 most viewed digitised newspapers, and why, here.
- The International Dunhuang Project at the British Library tell me that a redevelopment of their database and website is underway to improve reporting for them, their partners and users.
- Glasgow University Library Special Collections tell me that their most popular item is the Curious Case of Mary Toft, from 1726, who supposedly gave birth to a litter of rabbits.  This was featured as a book of the month in 2009, but picked up by the social media site Mental Floss in January 2014, with that page being shared on facebook more than 4000 times, and garnering 30,000 hits in one day alone, and has since been posted on various other social media platforms, including Reddit.  Glasgow also say that there is a difficulty in measuring access counts as the content is held on various different servers, and it can be difficult to interpret Google Analytics in this case. They also point out that, from their perspective, there is a lack of benchmarks to compare usage of their items to that of other special collections.
- The National Archives tell me they point to the popular items as part of their navigation and, as a result, these "most popular items" remain the most popular, in a virtuous circle. A very popular item at the moment is The Security Service: Personal (PF Series) Files KV2, which hosts the records of spies such as Mata Hari. These were embargoed until Thursday 10 April 2014, then launched with an accompanying press release, which garnered significant press coverage worldwide, driving traffic to the site. The only frequently accessed item which is not in these lists is the muster roll of HMS Victory for the Battle of Trafalgar, which is commonly referred to in military and naval history websites (although, interestingly, few people link directly to the page from which it can be downloaded, so those who read about it must come to TNA's website and search for it themselves).

Update 19/05/14
- The Estonian Folklore Archives at the Estonian Literary Museum tell me that their most popular item is a leaflet from 1937 on how to preserve sealskins, although I can see no other webpages pointing to this item (perhaps because my Estonian search skills are weak!).
- UCLA Digital Library tell me their most viewed item is a Lyrical Map of the Concept of Los Angeles, a 23-foot long hand-drawn and hand-lettered map of Los Angeles, using the words and images of dozens of L.A. authors, which was on display in a museum in 2011, and was featured widely on blogs both at the time of the exhibit and since, which points people to the digital version now the display is no longer live in the museum space. Another popular item is the complete set of the 1582 Corpus Juris Canonici, the "Body of Canon Law," particularly the table of contents, which is commonly linked to from those interested in Canon Law, such as this, thus driving subject specialists to the site.
- The History of Computing in Learning and Education Virtual Museum tells me the most viewed items are the writing competition and Historic Newsletters from the People's Computer Company.
- A hack carried out at the Zurich Hackathon 2014 looked at image analytics from the US National Archives and Records Administration's contributions to Flickr Commons, analysing 200 million hits in a three-month period and identifying the most commonly viewed images: a description of that hack is here, which also gives examples of the most commonly looked at images. "There is a spike on March 24. Further analysis shows that the biggest referral on that day is Dorothy Height. Turns out this lady was featured on a Google Doodle on that day." Popular subjects (and referrer pages, generally from Wikipedia) were John F. Kennedy, World War II, Japanese American Internment, and the Vietnam War. A full list is available on the project page. This shows the importance of institutions linking their content from Wikipedia, and what can happen if you are featured by Google.
- There is also a useful tool in BaGLAMA which shows view counts for pages using Commons images in GLAM-related category trees.

Update 20/05/14
- The Bodleian also make the very good point that "With most browsers now defaulting to 'do not track' combined with the EU cookies legislation it is difficult to find any sort of data that one can 'stand behind' these days."
- The Jüdisches Museum Berlin's most accessed items are the Sammeldatensatz: Orden, Ehrenzeichen und Embleme von Julius Fliess (1876-1955) (a collective record of the orders, decorations and emblems of Julius Fliess), but they say that most accesses come from searches for "jewish emblems", and so there is a need to add "emblem" as a synonym for "symbol" in their thesaurus, to help users find what they are looking for. In this way, looking at search terms can help develop user paths through the system so users can find what they actually want.
- The University of Iowa Digital Libraries say that, based on Google Analytics for the last year, the most popular item is a Dada book, and the most popular collection is Iowa Maps, but the access numbers for different objects in the database itself are hard to count, and they'll get back to me on that. Based on recent web searches reported by their webmaster, a surprisingly high number of people find them via searches for Peter Rabbit: the digital book is linked through to their site from the Wikipedia page and various other websites featuring Peter Rabbit.
- The National Library of Wales tell me the most popular article on http://welshnewspapers.llgc.org.uk is a 1916 Cambria Daily Leader advert for 'blouses' and 'hosiery'. To find out more about why may take some digging, though!
- Hamlet Depot and Museums tell me that their most popular items are genealogical records, including railroad employees lists, and seniority records, and also historic pictures.

Update 22/05/14
- The New Zealand Electronic Text Collection tell me that reference works are their most used, including A Grammar and Dictionary of the Samoan Language, with English and Samoan vocabulary (which is linked to from thousands of different sources about New Zealand culture, and discussions on translation), New Zealand in the First World War (which is linked to from various history and genealogy sites) and The Official History of New Zealand in the Second World War (which is also popularly linked to online, including in reminiscing personal postings from soldiers who served, talking about the war on social media).
- The University of Otago Library provided me with a very detailed overview of the issues they face (thanks!). They are in the process of developing a repository to manage all of the digital collections they want to curate, and the pilot will be live by November. For the moment, though, they have a variety of different sites on which you can see digitised material, showing again the complex relationship of databases and content which many institutions have. For example, they have OUR Heritage, which is a window across some collections. Some records are pulled from OUR Heritage and displayed via Special Collections Online Exhibitions. There are also the Hocken Collections, whose reader access collection was digitised and made available online. They track usage via Google Analytics, and also by watching their own server stats: and these do not in any way match up. Google Analytics does not capture when someone goes directly to a file, so it reports just a fraction of the more than a million hits in the past year that they can track on their server. They digitise on request, respond to community demand, and are trying to prioritise the digitisation process. According to Google Analytics, the most heavily used collections are the History of the University and the Botanical charts (which belong to the Department of Botany at Otago, and some of which are still used in the labs; they digitised these, provided a copy for the department's use, and deposited the originals in the Hocken Collections). The most popular items are “Key plan to Mr G.B. Shaw’s picture of Dunedin in 1851”, which is mentioned on various genealogical sites online; a painting, “Sangro, a rosary of olive trees, landscape of windswept manuka.”, which appears linked from some other major federated collections online; and a printed map of Rome, “Mappa della campagna Romana del 1547”, which is a commonly consulted map (there are various copies of it in libraries worldwide), so those searching online to see it must find the freely available copy here.

Thursday, 24 April 2014

Inaugural Preparations

So, my inaugural lecture is coming up in a few weeks, and I'm starting to write it now, nervously... The event has already sold out, but will be streamed online live, and there is also another lecture theatre at UCL in which it will be shown live (The Terras Terrace?). Dr Rudolf Ammann, UCLDH's designer at large, has kindly provided some visuals for me... here's the promotional flyer.

I plan to write the lecture out long hand once it is done, and of course, you will be the first to know about it (after I've given it...)




Thursday, 27 February 2014

Making it Free, Making it Open - Transcribe Bentham, publications, and unexpected benefits

A few years ago I made a commitment to Open Access - in an attempt to reach a wider audience for my academic work, and to tell people about research as it was happening (not three or four years later, once it was locked behind a journal paywall). I'm really pleased to have something new to talk about once again, and this time I can share it with you before it even comes out in print. Allied to this are a few spin-offs from the project in question: Transcribe Bentham, a double award-winning collaborative transcription initiative which aims to make the work of the philosopher and reformer Jeremy Bentham (1748–1832) more widely available by digitising his unpublished manuscripts and making the images available through a platform known as the ‘Transcription Desk’. There, you can access the material and, just as importantly, transcribe it, to help the work of UCL’s Bentham Project, and further improve access to, and searchability of, this enormously important collection of historical and philosophical material.
First, the article: a pre-publication version of a paper which will be published in April in a special issue of the International Journal of Humanities and Arts Computing, from Edinburgh University Press. In it, Tim Causer and I talk about crowdsourcing transcriptions of Bentham's writings, the impact of Transcribe Bentham on the work of the Bentham Project, and the use of volunteers to help us with tasks traditionally associated with lone academic researchers. We give particular examples of new Bentham material transcribed by volunteers dealing with the subjects of political economy, animal welfare, and convict transportation and the history of early New South Wales, which has further clarified and widened our understanding of certain aspects of Bentham’s thought. You can go and get it here:
 Causer, T. and Terras, M. M. (2014) "Crowdsourcing Bentham: beyond the traditional boundaries of academic history". International Journal of Humanities and Arts Computing, 8 (1) (In press). Link to PDF version in UCL Repository.
I'm pleased it is up there quickly, openly, and free for all to see. It's one of the aims of the Transcribe Bentham project, in which I am only a small cog, to make Bentham's writings better known, more accessible, and searchable over the long term. Allied to that is the ethos of involving a wider group of society in contributing to the project - this is about "co-creation" (as it gets called in Gallery, Library, Archive, and Museum (GLAM) circles) rather than academic broadcast. It would make no sense for us to take the product of something developed through online crowdsourcing, and lock it back in the academic ivory tower, given we asked for help to understand and find the material in the first place. We're finding our way with how to credit transcribers as we go (some of them are named in the article above, and we did ask their permission to do so), and how to carry out crowdsourcing in as ethical a way as possible (something which is also of concern to others figuring out crowdsourcing in GLAM as they go). All in all, open access here is part of the Transcribe Bentham product: make it free, make it open.

And future doors line up ahead of us to walk through. This week we passed 7000 manuscripts transcribed via the Transcription Desk, and a few months ago we passed the mark of 3 million words of transcribed material. So we now have a body of digital material with which to work, and make available, and to a certain extent play with. We're pursuing various research aims here - from a Digital Humanities side, a Bentham studies side, a Library side, and a Publishing side. We're working on making canonical versions of all images and transcribed texts available online. Students in the UCL Centre for Publishing are (quite literally) cooking up plans from what has been found in the previously untranscribed Bentham material, unearthed via Transcribe Bentham. What else can we do with this material?

And other doors open. I've talked before about reuse of the code behind Transcribe Bentham - now in use by the Public Record Office of Victoria - and parts of it (the Transcription Desk bar, since you ask) have since been used in the Letters of 1916 transcription project, too. We're also in talks with other collections who are thinking of doing crowdsourcing, and who may use the Transcription Desk: watch this space. Again, this is part of the same trajectory: make it free, make it available.

And other doors open. The development of systems to read handwritten material (more advanced than Optical Character Recognition, which to date has really only succeeded on clean, printed material) depends on having datasets of images of handwritten texts, plus checked, validated transcripts of their content in a useful format, to train and test systems and algorithms. Transcribe Bentham is pleased to be part of the Transcriptorium project (as am I!), looking into Handwritten Text Recognition (HTR) technologies, and a set of 433 pages of Bentham's manuscripts plus the crowdsourced transcriptions are this year making up the "ICFHR 2014 Handwritten Text Recognition on the tranScriptorium Dataset" - to evaluate and test the current algorithms on Handwritten Text Recognition. How great is that? Did any of us sitting round the table first discussing crowdsourcing and Bentham back in 2009 ever expect that we (and our transcribers) would be creating a benchmark dataset with which to train handwriting recognition technologies? No. It is wonderful.

Create. Involve. Research. Make it available. Some of this by planning, some of this by happy accident. I now see the Open Access ethos underpinning all of this, and driving forward the direction of my research into the use of computing in culture, heritage, and the humanities. So, enjoy the article. We had access to some cool stuff, did some cool stuff, and found out some cool stuff, you know - and we made it freely available.



Wednesday, 5 February 2014

Male, Mad and Muddleheaded: Academics in Children's Picture Books

Academics in children's picture books tend to be elderly men who work in science and are called Professor SomethingDumb. Why does this matter?


Like many academics, I love books. Like many book-loving parents, I'm keen to share that love with my young children. Two years ago, I chanced upon two different professors in children's books in quick succession. Wouldn't it be a fun project, I thought, to see how academics, and universities, appear in children's illustrated books? This would serve both as an excuse to buy more books (we do live in a golden age of second-hand books, cheaply delivered to your front door) and as a way to explain to my kids - now five and a half, and twins of three - what Mummy Actually Does.

It turns out it's hard to search for just children's books, and picture books, in library catalogues, but I combed through various electronic library resources, as well as Amazon, eBay, LibraryThing, and Abe, to dig up source material. I began to obsessively search the bookshelves of kids' books in friends' houses, and in doctors', dentists' and hospital waiting rooms, whilst also keeping on the lookout on our regular visits to our local library: academics often appear in books without being named in the title, so don't turn up easily via electronic searches. I parked my finds on a dedicated Tumblr, which was shared on social media, and friends, family members, and total strangers tweeted, facebooked, and emailed me to suggest additions. People sidled up to me after invited guest lectures to whisper "I have a good professor for you..." Two years on, I've no doubt still not found all of the possible candidates, but new finds are becoming less frequent. 101 books (or individual books from a series*), 108 academics, and a few specific mentions of university architecture and systems later, it's time to look at the results of this survey of the representation of academics and academia in children's picture books.

What are academics in children's books like?

The 108 academics found consist of 76 Professors, 21 Academic Doctors, 2 Students, 2 Lecturers, 1 Assistant Professor, 1 Child, 1 Astronomer, 1 Geographer, 1 Medical Doctor who undertakes research, 1 Researcher, and 1 Lab Assistant. In general, the Academic Doctors tend to be crazy, mad, evil egotists ("It's Dr Frankensteiner - the maddest mad scientist on mercury!"), whilst the Professors tend to be kindly but baffled, obsessive egg-heads who don't quite function normally.

The academics are mostly (old, white) males. Out of the 108 found, only 9 are female: 90% of the identified academics are male, 8% are female, and 2% have no identifiable gender (there are therefore far fewer women in this cohort than in reality, where it is estimated that one third of senior research posts are occupied by women). They are also nearly all Caucasian: only two of those identified are people of colour - one Professor, and one child who is so smart he is called The Prof - and both are male. This is scarily close to the recent statistic that only 0.4% of the UK professoriate are black. 43% of those found in this corpus are elderly men, and 33% are middle-aged (27% male and 6% female); there are no elderly female professors, as they are all middle-aged or younger. Women are so lacking that the denouement of one whodunnit/solve-the-mystery/choose-your-own-adventure book for slightly older children is that the professor they have been talking about was actually a woman, and you didn't see that coming, did you? Ha!

The earliest published academic in a children's book found was in 1922 (although it's probable that the real craze for featuring baffled old men came after the success of Professor Branestawm, which was a major international bestseller, first published in 1933, and not out of print since). The first woman Professor found is the amazing Professor Puffendorf - billed as "the world's greatest scientist" - published in 1992, 70 years after the first male professor appears in a children's book. Seventy years! (Although it is frustrating that the book really isn't about her, but about what her jealous, male lab assistant gets up to in her lab when she goes off to a conference. More Puffendorf next time, please.) There is also a more recent phenomenon of using a Professor as a framing device to lend some gravitas to a book's subject, where the professor does not otherwise appear within the text, so it's impossible to say if they are male or female. Male Professors have appeared much more frequently in children's books over the past ten years: women not so much.


What areas do these fictional academics work in? (There is an entirely different genre of children's books covering the lives of real academics - but that's for another obsessive-compulsive mini research project.) Here we identify the subject areas of the 108 academics:

Most of the identified academics work in science, engineering and technology subjects. 31% work in some area of generic "science", 10% work in biology, a few in maths, paleontology, geography, and zoology, and lone academics in rocket science, veterinary science, astronomy, computing, medical research and oceanography. There is one prof who is a homeopath, and I wasn't sure whether to put them in STEM or Fiction, so I plumped for STEM as they seemed to be trying to see if homeopathy worked (I like to presume all the academics here have proper qualifications, but who knows if fictional characters can buy professorships online these days). Subjects classed as Fictional were serpentology, dragonology, and magic. Arts, Humanities and Social Science subjects identified are archaeology (6% of the total), linguistics, psychology, arts and theatre. 27% of those with an academic title make no reference to what area they are supposed to work in: they are generally just trying to take over the world. Just out of interest, the female academics' subject areas are serpentology, maths, paleontology, ecology, and generic science (three of them), with two further unknown subjects, so it's not as if the women are doing the "soft" subjects in children's books, when they actually appear.

Not all of the academics featured are human: 74% are human, 19% are animals, 4% are aliens, 2% are unknown, and 1% are vegetable. There are no discernible trends regarding the animals chosen to represent wisdom - it's not as if they are all owls - with three mice, three dogs, two toads, a kingfisher, a gorilla, a woodpecker, a pig, a crow, an owl, a dumbo octopus, a mole, a bumblebee, a shark, a cockroach, and a wooden bird. If you spot any defining similarities there, let me know.

There are some other fun trends to note. 46% of the humans featured are bald (higher than the average percentage?) - no women are bald. 35% have very big, messy hair: it seems that if you are in academia, you should be a bit dishevelled in general. 45% have white hair - but none of the women have white hair. 13% have ginger hair (higher than the average percentage?). 37% have moustaches, and 16% have beards (higher than the average percentage?) - but no women have facial hair. What they wear is also interesting:
Lab coats, suits (but not if you are female!) or safari suits (but not if you are female!) are the academic uniform du jour.

The names given to the academics are telling, with the majority being less than complimentary: Professor Dinglebat, Professor P. Brain, Professor Blabbermouth, Professor Bumblebrain, Professor Muddlehead, Professor Hogwash, Professor Bumble, Professor Dumkopf, Professor Nutter, and two different Professor Potts. There is the odd professor with a name that alludes to intelligence: Professor I.Q, Professor Inkling, Professor Wiseman, but those are in the minority.

What types of book are they featured in? 82% of the 101 books are fiction stories, and the theme of the stories tends to be "academic is out of touch with how the world works, with hilarious consequences" in the case of professors, or "is evil and wants to take over the world, but is thwarted by our plucky hero (never heroine)" in the case of doctors. 7% of the books are factual, using a fictional academic to explain how science or experiments work, and 1% are cookbooks. The remainder, 10%, are a curious genre I have called "tall tales", where the fictional academic character is brought in to lend gravitas and explain something, but the explanations are either fictional or bordering on fiction. It's a curious blend of science and fiction: they are not traditional stories, but work in a way which subverts the traditional children's science book, injecting fiction into the process (not very successfully, in most cases).

What can we draw from this? If you are going to be a fictional human academic in a children's book, you are most likely to be an elderly man with big white hair, who wears a lab coat, has facial hair, works in science, and is called Professor SomethingDumb or Dr CrazyPants, featuring in a story about how you bumble around causing some type of chaos. Close your eyes and think of a Professor. Is this what you see? Or this? (One wonders how much well-circulated images of Einstein have percolated into the subconscious of writers and publishers to emerge as the obvious representation of an academic in children's illustrated books.)


Universities in Children's Picture Books

What about the universities themselves? They don't feature as often as the academics associated with them - children's books seldom focus on an institution that lies so far in the reader's future, although some characters plan well in advance. Lectures, when depicted, are invariably boring and impenetrable. University buildings are like castle schools for grown-ups, or the site of secret underground lairs, or the best holiday park ever. There are a couple of sweet kids' books from the USA that attempt to describe the campus and rituals of specific actual colleges - Baylor University and Boston College. But in general, the children's books revolve around the characters, rather than the fact they are in a university, per se.

Why is this relevant? 

Obviously, this has been a bit of a fun project. Given the lengths I went to in gathering this corpus of children's books, it is unlikely that any individual child would happen across all of the books noted. It's actually interesting to think how few children's picture or illustrated books feature academics or academia (at time of writing, Amazon lists 1.3 million books in its children's section, and 101* different books (or book series) were identified in this project). While no doubt there are other books out there not on the list, this has been a darn good crack at finding as many as possible, and not only in the English language. Professors and academic Doctors in children's books are a useful device on occasion, but really are not terribly frequent in the scheme of things.

That said, the difference in how women and men are represented, and the underrepresentation of anyone who is not white in children's books about academia, is shocking, especially given that almost all scientific fields are still dominated by men, that women are frequently discriminated against, and that although 46% of all PhD graduates in the EU are female, only one third of senior research posts are occupied by women. At a time when researchers are asking if available toys can influence later career choice, can the same be said about books? At a time when it is becoming the parents' job to encourage girls into science and technology - and to educate all children about science and engineering careers - does the absence of anyone but old white men as academics in children's books reinforce the idea that no one else can make a contribution? At a time when the leaky pipeline of academia shows that women are leaving in droves at every level of the academic ladder, should we be worried that there are no female academics in children's books above middle age? Laugh at this analysis if you will, but sociological analysis of other children's books has shown that
there is a hidden language or code inscribed in children’s books, which teaches kids to view inequalities within the division of labor as a “natural” fact of life  – that is, as a reflection of the inherent characteristics of the workers themselves.  Young readers learn (without realizing it, of course) that some... are simply better equipped to hold manual or service jobs, while other[s]... ought to be professionals. Once this code is acquired by pre-school children... it becomes exceedingly difficult to unlearn.  As adults, then, we are already predisposed to accept the hierarchical, caste-based system of labor that characterizes the... workplace. [link]
Another analysis of 6,000 children's books published between 1900 and 2000 suggests that the gender disparity, and the lack of women characters, sends children a message that "women and girls occupy a less important role in society than men or boys":
The messages conveyed through representation of males and females in books contribute to children's ideas of what it means to be a boy, girl, man, or woman. The disparities we find point to the symbolic annihilation of women and girls, and particularly female animals, in 20th-century children's literature, suggesting to children that these characters are less important than their male counterparts... The disproportionate numbers of males in central roles may encourage children to accept the invisibility of women and girls and to believe they are less important than men and boys, thereby reinforcing the gender system. [link]
As for the diversity issue - in general, children's books have been shown to be stubbornly white, even though "children of all ethnicities and races need role models of all ethnicities and races. That breeds normalcy and acceptance, and it's good for everybody." [link] What we are seeing here in this corpus, then, is a microcosm of what is happening in children's literature in general, although played out alongside an ongoing debate about the involvement of women and minorities in the academy. That doesn't make it OK, mind.

There are wider nuances, though, that don't just involve headcounts of men and women, black and white. Children's perceptions of scientists have been shown to be based on various stereotypes, and the stereotypes of academics presented and promulgated in these books are the product of writers and publishers who, taken together, quite clearly don't think academics are much cop, which will percolate back to those who read the books, or have the books read to them. Academics are routinely shown as individuals obsessed with one topic who are either baffled, harmless and ineffectual, or malicious, vindictive and psychotic, and although these can be affectionate sketches ("bless! look at the clueless/psychopathic genius!") academics routinely come across as out-of-touch weirdos - and what is that teaching kids about universities? In this age of proving academic "impact", wouldn't it be good for us to be able to show we are relevant to society? That there is more to academia than science? Or for the kids' books I show my kids to have more positive and integrated representations of professors and academics? Perhaps this is not the role of kids' books, though, and I should just be telling my kids my own tales of academic derring-do.

I mean, who would spend two years gathering a corpus of kids' lit for fun, and then count how many beards the people in the books had? Weirdo. Weirdos, the lot of them.

Top Children's Picture Books Featuring Academia

Out of all of the books found in this project, there are some which have been read and read again by my boys, and some which got tossed aside as soon as they arrived. There is also one I adore, but the boys are not so interested in. If you wanted to read some children's books which feature academics and universities, you could do worse than start with the following:

1. Dr. Dog, by Babette Cole, Red Fox, London, 1994. Dr Dog is a medical doctor who also does research. It has one fantastic page where Dr Dog goes to a conference in Brazil to give a talk about bone marrow, and that one page has explained where Mummy goes when she goes in the airplane, on many occasions. Very useful. For age 2+

2. Professor Puffendorf’s Secret Potions. Robin Tzannes, Korky Paul, Oxford University Press, 1992. The most read story in our house about a Professor. Prof P goes off to a conference and her lazy lab assistant wants to steal her secret potions for himself... (I would have preferred to see more about her, though). 2+

3. Mahalia Mouse Goes to College, by John Lithgow, illustrated by Igor Oleynikov, Simon and Schuster, 2007, New York. Mahalia is a brave little mouse who wants to go to Harvard and study maths, and succeeds. Uplifting. 3+

4. The Rooftop Rocket Party, by Roland Chambers, Andersen Press, 2002. Doctor Gass is a rocket scientist who doesn't believe a little boy when he says that the water coolers on top of the New York skyline are capable of going to the moon... Delightful. 3+

5. Professor Astro Cat’s Frontiers of Space, by Dominic Walliman and Ben Newman. Flying Eye Books, 2013. This is a lovely, well-illustrated, detailed and well-written kids' introduction to astronomy, which is explained by Professor Astro Cat. Nice paper too, bibliophiles. For age 5+.

6. Professor Wormbog in Search for the Zipperump-a-Zoo (Mercer Mayer Classic Collectible: Little Monsters), by Mercer Mayer, Golden Press, 1976. Professor Wormbog is searching for the only thing he hasn't got in his zoology collection... perhaps they were right under his nose all along? 2+

7. Mungo and the Spiders from Space. By Timothy Knapman, illustrated by Adam Stower. 2007, Puffin, London. A rollicking space adventure about a little boy who gets an old book about an evil doctor... and steps into the book... 4+

8. Any of the Octonaut books, by Meomi. Now a popular TV programme, the Octonauts started off as a book series. Professor Inkling shows how he can work with others to, ya know, deliver impact in the field etc etc. The Meomi books (Harper Collins) are delightful, with lots of detail that demands rereading - start off with the Octonauts Explore the Great Big Ocean (but steer clear of the TV spin-off books published by Simon and Schuster - they aren't a patch on the illustrated books by Meomi). Much, much loved in our house. 2+

9. The Dr Xargle and Professor Xargle books (he gets promoted at some point, evidently). By Jeanne Willis and Tony Ross, various publishers. Xargle explains various things about human society, or science, to his university class of aliens, with hilarious consequences. 3+

10. Professor Twill’s Travels, written and illustrated by Bob Gumpertz. Ward Lock Limited, London, 1968. A sweet tale of Professor Twill, travelling the world to collect animals. The illustrations in this book are very much of the era - it's just beautiful. A forgotten classic. 1+

And one just for the adults: Jack Dawe and The Professors, Bedtime Stories for Technically Inclined Little Ones, 1964, illustrated by Brian Green. (By "Uncle B", no press listed). An Oxford Professor wrote down and vanity published the tales of academia he told his nieces and nephews. They are absolutely hilarious.

Happy reading. And if you find any more academics or universities in children's picture books that I don't know about... do let me know!


*There were a few characters that appear in a series of books, for example Professor Branestawm, Dr Xargle, and Professor Inkling in Octonauts. Only one book from each series was counted: if all the books from each series were included, there would be over 140 books in total. Please note, none of the spin-offs from the children's film Monsters University were included in this analysis, as we're dealing here with things that started as books, rather than spin-offs, and they would take over the corpus, and, hmmm, that deserves an analysis of its very own... uh-oh...