The other day we had a meeting at work with a former colleague (now at QMUL) to discuss general project progress. The topics covered included the somewhat complicated workflow that we're using for doing optical music recognition (OMR) on early printed music sources. It includes Aruspix, OMR software specific to mensural notation. Aruspix itself is fairly accurate in its output, but the reason why our workflow is non-trivial is that the sources we're working with are partbooks; that is, each part (or voice) of a multi-part texture is written on its own part of the page, or even on a different page. This is very different to modern score notation, in which all the parts are written in vertical alignment. In these sources, we don't even know where separate pieces begin and end, and they can actually begin in the middle of a line. The aim is to go from the double page scans ("openings") to distinct pieces with their complete and correctly aligned parts.
Anyway, our colleague from QMUL was very interested in this little part of the project and suggested that we spend the afternoon, after the style of good software engineering, formalising the workflow. So that's what we did. During the course of the conversation diagrams were drawn on the whiteboard. However (and this was really the point of this post) I made notes in Haskell. It occurred to me a few minutes into the conversation that laying out some types and the operations over those types that comprise our workflow is pretty much exactly the kind of formal specification we needed.
Here's what I typed:
```haskell
{-# LANGUAGE InstanceSigs #-}
-- InstanceSigs is needed for the type signatures written inside the instances below
module MusicalDocuments where

import Data.Maybe

-- A document comprises some number of openings (double page spreads)
data Document = Document [Opening]

-- An opening comprises one or two pages (usually two)
data Opening = Opening (Page, Maybe Page)

-- A page comprises multiple systems
data Page = Page [System]

-- Each part is the line for a particular voice
data Voice = Superius | Discantus | Tenor | Contratenor | Bassus

-- A part comprises a list of musical symbols, but it may span multiple systems
-- (including partial systems)
data Part = Part [MusicalSymbol]

-- A piece comprises some number of sections
data Piece = Piece [Section]

-- A system is a collection of staves
data System = System [Staff]

-- A staff is a list of atomic graphical symbols
data Staff = Staff [Glyph]

-- A section is a collection of parts
data Section = Section [Part]

-- These are the atomic components: MusicalSymbols are semantic and Glyphs are
-- syntactic (i.e. just image elements)
data MusicalSymbol = MusicalSymbol
data Glyph = Glyph

-- If this were real, Image would abstract over some kind of binary format
data Image = Image

-- One of the important properties we need in order to be able to construct pieces
-- from the scanned components is to be able to say when objects of some of the
-- types are strictly contiguous, i.e. this staff immediately follows that staff
class Contiguous a where
  immediatelyFollows :: a -> a -> Bool
  immediatelyPrecedes :: a -> a -> Bool
  immediatelyPrecedes a b = b `immediatelyFollows` a

instance Contiguous Staff where
  immediatelyFollows :: Staff -> Staff -> Bool
  immediatelyFollows = undefined

-- Another interesting property of this data set is that there are a number of
-- duplicate scans of openings, but nothing in the metadata that indicates this,
-- so our workflow needs to recognise duplicates
instance Eq Opening where
  (==) :: Opening -> Opening -> Bool
  (==) a b = undefined

-- Maybe it would also be useful to have equality for staves too?
instance Eq Staff where
  (==) :: Staff -> Staff -> Bool
  (==) a b = undefined

-- The following functions actually represent the workflow
collate :: [Document]
collate = undefined

scan :: Document -> [Image]
scan = undefined

split :: Image -> Opening
split = undefined

paginate :: Opening -> [Page]
paginate = undefined

omr :: Page -> [System]
omr = undefined

segment :: System -> [Staff]
segment = undefined

tokenize :: Staff -> [Glyph]
tokenize = undefined

recogniseMusicalSymbol :: Glyph -> Maybe MusicalSymbol
recogniseMusicalSymbol = undefined

-- A part is whatever symbols can be recognised from a run of glyphs
part :: [Glyph] -> Maybe Part
part gs = if null symbols then Nothing else Just $ Part symbols
  where
    symbols = mapMaybe recogniseMusicalSymbol gs

alignable :: Part -> Part -> Bool
alignable = undefined

piece :: [Part] -> Maybe Piece
piece = undefined
```
I then added the comments and implemented the `part` function later on. Looking at it now, I keep wondering whether the types of the functions really make sense, especially where a return type is a type that's just a label for a list or pair.
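One way to probe that doubt is to notice that, where a type is only a label for a list or pair, the corresponding step is little more than an unwrapping function, and the list-returning steps chain together with the list monad's `>>=`. This is a minimal sketch under stated assumptions: the `newtype`s are simplified stand-ins for the types above (here a `Glyph` just carries a label), and `systemsOf` is a hypothetical projection sitting where `omr` would go; it does no recognition at all.

```haskell
import Data.Maybe (maybeToList)

-- Simplified stand-ins for the types in the notes; a Glyph just carries a label
newtype Glyph = Glyph String deriving (Show, Eq)
newtype Staff = Staff [Glyph]
newtype System = System [Staff]
newtype Page = Page [System]
newtype Opening = Opening (Page, Maybe Page)

-- Where a type only labels a pair or list, "producing" its contents is unwrapping
paginate :: Opening -> [Page]
paginate (Opening (left, right)) = left : maybeToList right

-- Hypothetical: stands where omr would be, but only unwraps an already-built Page
systemsOf :: Page -> [System]
systemsOf (Page systems) = systems

segment :: System -> [Staff]
segment (System staves) = staves

tokenize :: Staff -> [Glyph]
tokenize (Staff glyphs) = glyphs

-- The list-returning steps compose with the list monad's (>>=),
-- which is concatMap with its arguments flipped
glyphsOfOpening :: Opening -> [Glyph]
glyphsOfOpening opening =
  paginate opening >>= systemsOf >>= segment >>= tokenize
```

Arguably, the steps that cannot be written this way (`omr`, `recogniseMusicalSymbol`, `alignable`) are the ones where the choice of types really matters.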
I haven't written much Haskell code before, and given that I've only implemented one function here, I still haven't written much Haskell code. But it seemed to be a nice way to formalise this procedure. Any criticisms (or function implementations!) welcome.
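As one illustration of how the `Contiguous` class could be put to work: given a sequence of staves, split it into maximal runs in which each staff immediately follows the previous one, which is one ingredient of assembling a part that spans several systems. This is a hypothetical sketch, not part of our workflow: `Staff` is redefined here as a page number plus a position on the page, and contiguity is simply assumed to mean "same page, next position"; in practice it would have to be inferred from the images.

```haskell
-- Hypothetical stand-in: a staff identified by its page and position on the page
data Staff = Staff {staffPage :: Int, staffIndex :: Int} deriving (Show, Eq)

class Contiguous a where
  immediatelyFollows :: a -> a -> Bool
  immediatelyPrecedes :: a -> a -> Bool
  immediatelyPrecedes a b = b `immediatelyFollows` a

-- Assumption for illustration: x immediately follows y if it is on the
-- same page and exactly one position later
instance Contiguous Staff where
  immediatelyFollows x y =
    staffPage x == staffPage y && staffIndex x == staffIndex y + 1

-- Split a list into maximal runs where each element immediately follows
-- the one before it
contiguousRuns :: Contiguous a => [a] -> [[a]]
contiguousRuns [] = []
contiguousRuns (x : xs) = go x [x] xs
  where
    -- prev: last element of the current run; acc: the current run, reversed
    go _ acc [] = [reverse acc]
    go prev acc (y : ys)
      | y `immediatelyFollows` prev = go y (y : acc) ys
      | otherwise = reverse acc : go y [y] ys
```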
I submitted my Ph.D thesis at the end of September 2013 in time for what was believed to be the AHRC deadline. It was a rather slim submission at around 44,000 words and rejoiced under the title of Understanding Information Technology Adoption in Musicology. Here's the abstract:
Since the mid 1990s, innovations and technologies have emerged which, to varying extents, allow content-based search of music corpora. These technologies and their applications are known commonly as music information retrieval (MIR). While there are a variety of stakeholders in such technologies, the academic discipline of musicology has always played an important motivating and directional role in the development of these technologies. However, despite this involvement of a small representation of the discipline in MIR, the technologies have so far failed to make any significant impact on mainstream musicology. The present thesis, carried out under a project aiming to examine just such an impact, attempts to address the question of why this has been the case by examining the histories of musicology and MIR to find their common roots and by studying musicologists themselves to gauge their level of technological sophistication. We find that some significant changes need to be made in both music information retrieval and musicology before the benefits of technology can really make themselves felt in music scholarship.
(Incidentally, the whole thing was written using org-mode, including some graphs that get automatically generated each time the text is compiled. Unfortunately I did have to cheat a little bit and typed in LaTeX rather than using proper org-mode links for the references.)
So the thing was then examined in January 2014 by an information science and user studies expert and by a musicologist. As far as it went, the defence was actually not too bad, but after defending the defensible it eventually became clear that significant portions of the thesis were just not up to scratch; not, in fact, defensible. They weren't prepared to pass it and asked me to revise and then re-submit it.
Two things seem necessary to address: 1) why did this happen? And 2) what do I do next?
I started work on this Ph.D with only quite a vague notion of what it was going to be about. The Purcell Plus project left open the possibility of the Ph.D student doing some e-Science-enabled musicological study. But I think I'd come out of undergraduate and masters study with a view of academic research that was very much text-based; the process of research (according to the me of ca. 2008) was to read lots of things and synthesise them, and the more obscure and dense the stuff read the better. The process is one of noticing generalisations amongst all these sources that haven't been remarked on before and remarking on them, preferably with a good balance of academic rigour and barefaced rhetoric. And I brought this preconception into a computing department. My first year was intended to be a training year, but I was actually already highly computer literate, with considerable programming experience and quite a bit of knowledge of at least symbolic work in computational musicology. Consequently, I didn't fully engage with learning new stuff during that first year and instead embarked on a project of attempting to be rhetorical. It wasn't until later on that I really started to understand that those around me had a completely different idea of how research can be carried out. While I was busy reading, most of my colleagues were doing experiments; they were actually finding out new stuff (or at least were attempting to) and had the potential to make an original contribution to knowledge. At this point I started to look for research methods that could be applicable to my subject matter and eventually hit upon a couple of actually quite standard social science methods. So I think that's the first thing that went wrong: I failed to take on board soon enough the new research culture that I had (or, I suppose, should have) entered.
I think I've always been someone who thrives on the acknowledgement of things I've done; I always looked forward to receiving my marks at school and as an undergraduate; and I liked finding opportunities to do clever jobs for people, especially little software development projects where there's someone to say, "that's great! Thanks for doing that." I think I quickly found that doctoral research didn't offer me this at all. My experience was very much a solitary one where no one was really aware of what I was working on. Consequently two things happened: first, I tended not to pursue things very far through lack of motivation; and second (and this was the really dangerous one), I kept finding other things to do that did give me that feedback. I always found ways to justify these (let's face it) procrastination activities, mainly that they were all Goldsmiths work, including quite a lot of undergraduate and masters level teaching (the latter actually including designing a course from scratch), some Computing Department admin work, and some development projects. Doing these kinds of activities is actually generally considered very good for doctoral students, but they're normally deliberately constrained to ensure that the student still has plenty of research time. Through my own choice, I let them take over far too much of my research time.
The final causal point to mention is the one that any experienced academic will immediately point to: supervision. I failed to take advantage of the possibilities of supervision. As my supervisor was both involved in the project of which the Ph.D was part and also worked in the same office as me, we never had the right kind of relationship to foster good progress and working habits in me. I spoke to my supervisor every day and so I didn't really push for formal supervision often enough. I can now see that it would have been better to have someone with whom I had a less familiar relationship and who had less of an interest in my work and who, as a result, would just operate and enforce the procedures of doctoral project progress. It's also possible that a more formal supervision relationship would have addressed the points above: I may have been forced to solidify ideas and to identify proper methods much sooner; I may have had more of the feedback that I needed; and I may have been more strongly discouraged from engaging in so much extra-research activity.
The purpose of all this is not to apportion blame (I have a strong sense of being responsible for everything that I do), but to state publicly something that I've been finding very hard to say: I failed my Ph.D. And (and this is the important bit) to make sure that I get on with what I need to do to pass it.
- I need disinterested supervision; I've requested assistance from the Sociology Department which should fit well with the research methods I used;
- I need to improve the reporting of the studies I carried out; this involves correcting and expanding the methods sections and also doing more analysis work;
- I need to either extend the samples of the existing studies, or carry out credible follow-up studies to improve my evidence base;
- I need to focus the research questions better and (having done so) make the conclusions directly address them.
I'm going to blog this work as it goes along. So if I stop blogging, please send me harassing emails telling me to get the f*** on with it!
My formal role on the Transforming Musicology project is Project Manager. This involves ensuring that goals are reached, objectives are met, and deliverables are, well, delivered. Two things seem to be key to these ends: maintaining a good and current overview of both high and low level project activity; and maintaining good communication across the whole project team.
Early on in the project, one of the co-investigators started an IkiWiki for us to use for various project management activities. Since then it's been my responsibility to develop this resource. Given that I've asserted that awareness of activity and communication are crucial for project management, how have I used the wiki to enable those?
We're using an IRC channel for part of our communication needs, although not all project team members are fully conversant with IRC and its idiosyncrasies. So I thought it would be useful to keep a log on the wiki. I already have a log file for the channel which is generated by dircproxy, so I started looking for ways to get an HTML version of this onto the wiki. Nothing was immediately apparent. Stuff exists for generating HTML from channel logs, so I thought about scripting something to dump some HTML somewhere accessible from the wiki, which, in turn, led me to think about automating it the proper IkiWiki way: with a plugin. And irclog was born.
It provides a directive, `[[!irclog ]]`, which pulls a channel log from a given location, uses Parse::IRCLog to parse the log for \me events, and renders those events as HTML to be included in place of the `[[!irclog ]]` in the page. The implementation involved adding dircproxy-specific parsing to Parse::IRCLog. (It would be nice to get that merged into Parse::IRCLog itself, but for now it's bundled with the plugin.) It also involved thinking up strategies for allowing the host on which the wiki is compiled to get at the channel log at compile time. I did this by allowing the `location` parameter to the `[[!irclog ]]` directive to be a string parsable by the core URI module and then implementing (well, not quite) handlers for a number of URI schemes. In fact, I've only really tested the scheme I'm actually using, `ssh`. In my case, the wiki compiling host holds an SSH key whose public part is authorised on the dircproxy host to retrieve the log file. I then have cron on the wiki compiling host rebuild the wiki periodically so that the log gets updated. (There might be a less sledgehammer-like solution to the updating problem: perhaps --rendering the page and moving the result to the
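The idea of accepting a URI as the `location` and dispatching on its scheme can be sketched as follows. This is written in Haskell rather than the plugin's Perl and does not reproduce its code: `LogSource`, `parseLocation` and the `file` handler are hypothetical, only a tiny subset of URI syntax is handled, and a real implementation would use a proper URI parser and then actually fetch the log (for `ssh`, by copying the file with the authorised key).

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical description of where a channel log can be fetched from
data LogSource
  = LocalFile FilePath         -- e.g. file:///var/log/chan.log
  | SshFile String FilePath    -- e.g. ssh://user@host/path/to/chan.log
  | UnsupportedScheme String   -- a scheme with no handler
  deriving (Show, Eq)

-- Split "scheme:rest" and dispatch on the scheme, as a table of
-- per-scheme handlers might
parseLocation :: String -> LogSource
parseLocation loc = case break (== ':') loc of
  ("file", ':' : rest) -> LocalFile (dropAuthorityMarker rest)
  ("ssh", ':' : rest) ->
    let (host, path) = break (== '/') (dropAuthorityMarker rest)
     in SshFile host path
  (scheme, ':' : _) -> UnsupportedScheme scheme
  _ -> LocalFile loc -- no scheme at all: treat it as a plain local path
  where
    -- Drop the "//" that introduces the authority part, if present
    dropAuthorityMarker s = fromMaybe s (stripPrefix "//" s)
```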
To make the plugin a bit more of an IkiWiki citizen, it allows inclusion of wikilinks by providing a text substitution feature. You can specify a `keywords` argument to the `[[!irclog ]]` directive which should contain a string formatted a bit like a Perl hash (e.g. `richard=>[[richard]]`) and which indicates that occurrences of the 'key' should be replaced by the 'value'. The replacement text could be a wikilink, thus allowing your IRC log to integrate with the rest of your wiki. The obvious usage (and the one I've implemented) is a mapping from nicks to project team members' user pages.
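The `keywords` substitution can be sketched in the same spirit. This assumes, hypothetically, that entries in the hash-like string are separated by commas and that keys are matched as plain substrings; the plugin's actual parsing rules may differ, and a real version would probably want to match nicks only at word boundaries. All the function names here are made up for illustration.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf)

-- Parse "key=>value, key2=>value2" into pairs
-- (assumption: entries are separated by commas)
parseKeywords :: String -> [(String, String)]
parseKeywords = map toPair . splitOnChar ','
  where
    toPair entry = let (k, v) = breakOn "=>" entry in (trim k, trim v)

-- Split a string on every occurrence of a single character
splitOnChar :: Char -> String -> [String]
splitOnChar c s = case break (== c) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar c rest

-- Split at the first occurrence of a separator, dropping the separator
breakOn :: String -> String -> (String, String)
breakOn sep = go []
  where
    go acc [] = (reverse acc, [])
    go acc s@(x : xs)
      | sep `isPrefixOf` s = (reverse acc, drop (length sep) s)
      | otherwise = go (x : acc) xs

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Replace every non-overlapping occurrence of key, left to right;
-- the inserted text is not rescanned for the same key
replaceAll :: String -> String -> String -> String
replaceAll _ _ [] = []
replaceAll key val s@(c : cs)
  | not (null key) && key `isPrefixOf` s = val ++ replaceAll key val (drop (length key) s)
  | otherwise = c : replaceAll key val cs

-- Apply each substitution in turn to a line of the log
applyKeywords :: [(String, String)] -> String -> String
applyKeywords pairs line = foldl (\acc (k, v) -> replaceAll k v acc) line pairs
```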
A future post may document how I'm using IkiWiki for task management...
A great deal of time has passed since I last wrote a blog post. During that time my partner and I have had a baby (who's now 20 months old) and bought a house, I've started a new job, finished my Ph.D, finished the previously mentioned new job, and started another new job.
The first new job was working for an open source consultancy firm called credativ, which is based in Rugby but which, at the time I started, had recently opened a London office. Broadly, they consult on open source software for business. In practice most of the work is using OpenERP, an open source enterprise resource planning (ERP) system written in Python. I was very critical of OpenERP when I started, but I guess this was partly because my unfamiliarity with it often left me feeling like a n00b programmer again, which was quite frustrating. By the time I finished at credativ I'd learned how to deal with this quite large software system, and I now have a better understanding of its real deficiencies: code quality in the core system is generally quite poor (although it has a decent test suite and is consequently functionally fairly sound, the code is scrappy and often quite poorly designed); the documentation is lacking and not very organised; and its authors, I find, don't have a sense of what developers who are new to the framework actually need to know. I also found that, during the course of my employment, it took a long time to gain experience of the system from a user's perspective (because I had to spend my time doing development work with it); I think earlier user experience would have helped me to understand it sooner. Apart from those things, it seems like a fairly good ERP. One other thing I learned working with it (and with business clients in general), though, is the importance of domain knowledge: OpenERP is about business applications (accounting, customer relations, sales, manufacture) and, it turns out, I don't know anything about any of these things. That makes trying to understand software designed to solve those problems doubly hard. (In all my previous programming experience, I've been working in domains that are much more familiar to me.)
As well as OpenERP, I've also learned quite a lot about the IT services industry and about having a proper job in general. Really, this was the first proper job I've ever had; I've earned money for years, but always in slightly off-the-beaten-track ways. I've found that team working skills (that great CV cliché) are actually not one of my strong points; I had to learn to ask for help with things, and to share responsibilities with my colleagues. I've learned a lot about customers. It's a very different environment, where a lot of your work is reactive; I've previously been used to long projects where the direction is largely self-determined. A lot of the work was making small changes requested by customers. In such cases it's so important to push them to articulate as clearly as possible what they are actually trying to achieve; too often customers will describe a requirement at the wrong level of detail, that is, they'll describe a change at the technical level. What's much better is if you can get them to describe the business process they are trying to implement, so you can be sure the technical change they want is appropriate or specify something better. I've learned quite a bit about managing my time and being productive. We undertook a lot of fixed-price work, where we were required to estimate the cost of the work beforehand. This involves really knowing how long things take, which is quite a skill. We also needed to be able to account for all our working time in order to manage costs and stay within project budgets. So I learned some more org-mode tricks for managing effort estimates and for keeping more detailed time logs.
My new new job is working back at Goldsmiths (where I did my Ph.D) again, with mostly the same colleagues. We're working on an AHRC-funded project called Transforming Musicology. We have partners at Queen Mary, the Centre for e-Research at Oxford, Oxford Music Faculty, and the Lancaster Institute for Contemporary Arts. The broad aim of the project can be understood as the practical follow-on from my Ph.D: how does the current culture of pervasive networked computing affect what it means to study music and how music gets studied? We're looking for evidence of people using computers to do things which we would understand as musicology, even though they may not. We're also looking at how computers can be integrated into the traditional discipline. And we're working on extending some existing tools for music and sound analysis, and developing frameworks for making music resources available on the Semantic Web. My role is as project manager. I started work at the beginning of October so we've done four days so far. It's mainly been setting up infrastructure (website, wiki, mailing list) and trying to get a good high-level picture of how the two years should progress.
I've also moved my blog from LiveJournal to here, which I manage using IkiWiki. LiveJournal is great; I just liked the idea of publishing my blog using IkiWiki, writing it in Emacs, and managing it using git. Let's see if I stick to it...
James Murdoch gave a lecture for the opening of the new Centre for Digital Humanities at UCL on Thursday. I wasn't there; it was invitation only and also I was at the Barbican listening to the LSO playing Turangalila. Fantastic!
I did, however, read the transcript of his lecture and wanted to make a few comments on content freedom.
Murdoch's view is essentially that content published online (and especially journalism) is a kind of commodity and that it must be paid for, or, in his reasonable and unaggressive terms, content producers should be allowed to "assert a fair value for their online editions."
However, as well as being pro paid-for content, he's also quite anti free content. He describes what he calls the "digital consensus": that the virtuality of the internet requires that content published on it be free, and that free and pervasive availability of content leads to a better society ("wiser, better informed and more democratic"). His references to "utopian" narratives possibly (but not explicitly) betray a dislike for the kind of hippie culture of the early days of the internet.
Similarly, he's quite critical of the British Library's intention to digitise newspaper archives and publish them online free of charge. He describes how doing this helps the BL to secure additional public funding and seems to argue that this is an unfair form of competition: the BL is getting paid to publish free content, while media companies have to compete to sell their content. He's critical of the justifications academic institutions give for publishing content online free of charge (increased access, preservation, and scholarly interest), arguing that ultimately they stand to gain financially from doing so.
"When we look over this terrain, we can see the economic pressures driving down the value of content are very powerful. Arguments over rights and wrongs seem little more than a disguise for self-interest."
He gives a brief account of the history of British copyright law, arguing that it was established to help protect the interests of content producers and that it must still play that role today. He argues that even those who do wish to publish their content free of charge stand to gain from copyright law, and that copyright incentivises content production.
Despite this concession, he later makes clear his stance on free content:
"If you want to offer your product for free, then there is nothing to stop you --- and it's a lot easier these days to do so. The only temptation you need to resist is the idea that what you want to do is what everyone else should be made to do."
Further, he argues that the future of the "creative industries" should be considered in an "economically serious way". This is probably the most revealing comment he makes betraying his attitude towards freedom of content: it's unserious; it's silly, hippie utopianism that can't stand up to the might of capitalist media imperialism. Those who are involved in free content cannot be serious about what they produce.
Finally on the subject of objecting to the free, he argues that if news producers do not charge for their content, then the only people who would be able to produce news would be "the wealthy, the amateur or the government." Of course, in some regimes state-sponsored news may be biased and even harmful, but this happens not to be the case with the BBC, which has a remit of impartiality. But even more concerning is his implied assumption that the private sector is likely to produce higher quality and less partial news content than governments, amateurs and the wealthy. In fact, private sector content producers (especially in news) rely on, and are therefore subject to the opinions of, the wealthy. These wealthy people are often responsible for how news gets reported, and even for what news gets reported, by private sector news businesses.
My main criticism, then, is of his assumption that paid-for, private sector content is necessarily better than free and/or public sector content. However, there are two other points I find interesting in this lecture.
First is his use of the term "content" to describe and generalise the published work he wants to protect. "Content" is a very digital-age notion; it implies a late twentieth century conception of knowledge capital and of knowledge work that seeks to reduce literature to information that can be quantified, homogenised, stored, and transmitted. Content (in this sense) has de-coupled arts from practice: inscriptive mechanisms (printing, recording, digital encoding) change the nature of art works from practice to text. It's this text that Murdoch obsesses over, while creative practitioners are, in fact, increasingly returning to art as practice. His failure to realise the importance of practice over content undermines his universal claims for the supremacy of paid-for content.
The other is his conception of the "humanities". At one point he effectively equates the humanities with "the creative industries". He also (when describing the importance of private sector news production) appeals to those who "really care[s] about the humanities of tomorrow" to feel the same as he does about private ownership of media. This implies a total lack of understanding of the critical (by which I mean being critical) role the humanities must play. Humanities cannot be privately sponsored and subject to bias and politicisation. Humanities must be independent, state-sponsored, and provide a voice of criticism in the world of content production, business and politics.
I went to a colloquium on e-Research on Texts and Images at the British Academy yesterday; very, very swanky. Lunch was served on triangular plates, triangular! Big chandeliers, paintings, grand staircase. Well worth investigating for post-doc fellowships one day.
There were also some good papers. Just one or two things that really stuck out for me. There seems to be quite a lot of interest in e-research now around formalising, encoding, and analysing scholarly process. The motivation seems to be that, in order to design software tools to aid scholarship, it's necessary to identify what scholarly processes are engaged in and how they may be re-figured in software manifestations. This is the same direction that my research has been taking, and relates closely to the study of tacit knowledge in which Purcell Plus is engaged.
Ségolène Tarte presented a very useful diagram in her talk explaining why this line of investigation is important. It showed a continuum of activity which started with "signal" and ended with "meaning". Running along one side of this continuum were the scholarly activities and conceptions that occur as raw primary sources are interpreted, and along the other were the computational processes which may aid these human activities. Her particular version of this continuum was describing the interpretation of images of Roman writing tablets, so the kinds of activities described included identification of marks, characters, and words, and boundary and shape detection in images. She described some of the common aspects of this process, including: oscillation of activity and understanding; dealing with noise; phase congruency; and identifying features (a term which has become burdened with assumed meaning but which should also be considered at its most general sometimes). But I'm sure the idea extends to other humanities disciplines and other kinds of "signal" or primary sources.
Similarly, Melissa Terras talked about her work on knowledge elicitation from expert papyrologists. This included various techniques (drawn from social science and clinical psychology) such as talk-aloud protocols and concept sorting. She was able to show nice graphs of how an expert's understanding of a particular source switches between different levels continuously during the process of working with it. It's this cyclical, dynamic process of coming to understand an artifact which we're attempting to capture and encode with a view to potentially providing decision support tools whose design is informed by this encoded procedure.
A few other odd notes I made. David De Roure talked about the importance of social science methods in e-Humanities. Amongst other things, he also made an interesting point that it's probably a better investment to teach scholars and researchers about understanding data (representation, manipulation, management) than it is to buy lots of expensive and powerful hardware. Annamaria Carusi said lots of interesting things which I'm annoyed with myself for not having written down properly. (There was something about warning of the non-neutrality of abstractions; interpretation as arriving at a hypothesis, and how this potentially aligns humanistic work with scientific method; and how use of technologies can make some things very easy, but at the expense of making other things very hard.)
Also, I gave a talk at Goldsmiths Spring Review Week today. It's basically a chance for Ph.D students to get together and talk about what they've been doing all year. One interesting aspect of it is that it's Ph.D students from all departments, so you have to assume that your audience are non-expert. I spoke about "Computational Approaches to Scholarly Procedure in Musicology". (See, I told you everyone is thinking the same way as me!)
I went to the second Decoding Digital Humanities meeting last week. We read a paper I had suggested on procedural literacy: understanding the ideas behind computer programming, even without being familiar with any specific language. As expected, a number of interesting points and criticisms were raised.
A suggestion was made that the paper tried to equate computer languages to human (or "natural", as computer scientists often call them) languages. I would argue that this wasn't intended by the author. He talks about communicating ideas of process through computer languages, but doesn't argue that they can be used for other kinds of communication.
This discussion did lead on to the question of whether this kind of thinking really just essentialises technology. It was argued that computer programming shouldn't be seen as a panacea for forcing focus. Although attempting to express ideas in formal languages undoubtedly does force those ideas into focus, requiring their explicit and unambiguous expression in terms which the computer can understand, should programming be seen as the only method of doing this? It's probably the case that any writing exercise will force focus of ideas.
Also raised was the criticism against digital humanities of "problematising innovation". Perhaps arguments against DH are more arguments against unsettling the status quo? What can these changes possibly have to do with our discipline? Of course, the point was made that the current stratification of disciplines in universities is quite a modern construction and is most likely subject to continual change, whether the impetus be intellectual, geographic, or economic.
It was suggested that the argument in the paper may be a classic example of the problematic role of the critic: is it valid to criticise a practice in which you yourself are not skilled? This reminded me of one recent body of scholarship I've engaged with by Harry Collins, a sociologist of science. He describes the difference between "contributory expertise" and "interactional expertise". Contributors to a discipline are those who are trained in the practices of the discipline, who conduct experiments and generate new knowledge. But Collins also argues that there's a class of expertise he calls "interactional" in which the expert has engaged with other practitioners in the discipline to a sufficient extent that he can hold conversations with them and understands all the important principles.
Perhaps procedural literacy could be a class of interactional expertise, rather than a necessarily practical engagement?
Some time ago (probably mid-November 2009) I read a short article (Joseph Raben, Introducing Issues in Humanities Computing, DHQ 1:1 Spring 2007) in the first issue of Digital Humanities Quarterly which ends with a series of questions to be asked about humanities computing: its nature, outcomes and effects. I made a note to myself to answer these questions and have finally got round to having a go. Some I have no idea how to answer, some I can give a few opinions on, and some I know I need to say a lot more about.
- Can software development, rather than conventional research, serve as a step up the promotion ladder?
- So does software development count as a valid research output? This problem can be generalised to the concerns of practice-led research. Does the scholarly community accept work such as musical compositions, painting and sculpture, biographies, digital art, and fiction as valid research outputs? There are certainly structures in place which allow scholars up to and including doctoral level in arts and humanities areas to have evidence of their practice considered as part of their research. And disciplines which include engineering components, such as computer science, often produce doctoral theses which include substantial practical components. But beyond doctoral level the accepted product of research, in the arts and humanities at least, becomes homogeneous with the mode of its communication: journal articles, conference papers, book chapters, monographs. However, humanities computing stands at an interesting intersection between a humanist discipline and an engineering/science discipline. Its broad questions are likely humanistic (to make observations about the human condition based on evidence of human activity), but its methods may be more related to computer science (development and use of software). Which of these two components of the research (findings and methods) is the more publication-worthy? I'm increasingly of the opinion that software is a means to capture and express procedure, and that procedure in scholarship (as well as in other areas) should be considered a valid object of study.
- Are there better ways to organize our information than the current search programs provide?
- How far should we trust simple information retrieval methods to tell us what is relevant and interesting? The idea of automatic relevance ranking based on keyword matching does seem a bit dry and inhuman, but it has certainly become commonplace. Computers are now relied upon to make judgements of similarity, and not just with text; there's a whole field of study which attempts to get computers to make judgements of musical similarity.
- How do we confront the trend toward English as a universal scholarly language in the face of objections, such as those from France? How far need we go in accommodating other world languages---Spanish, Russian, Chinese?
- How concerned should we be about the consequence of Web accessibility undermining the status of major research centers in or near metropolitan cities?
- I've used Access Grid, I regularly talk with colleagues in IRC channels, I use Skype and instant messaging tools and, of course, make regular use of email. I've also watched and listened to recorded lectures. But I'm not convinced that any of these things really replace the nuances of face-to-face communication, which may be vital for serious discussion and networking.
- Has the availability of the Internet as a scholarly medium enhanced the academic status of women and
- Will humanists' dependence on computer-generated data lead to a scientistic search for objective and reproducible results?
- This, of course, assumes that humanists will become dependent on computer-generated data, and that they will interact with that data via computational means. I contrast this with mere digitisation, in which artifacts of scholarly interest (such as manuscripts, printed texts and paintings) are simply transcribed onto a digital medium and made more easily accessible; the mode of interaction with such digitised artifacts is often non-computational, it's just a more convenient way of looking at them. Genuine computational interaction with artifacts, on the other hand, may well call for new understandings amongst humanist scholars, and lead to new priorities and concerns in their research. I see two such potential major changes. First, computational techniques may require that the tacit knowledge and implicit procedures that humanist scholars use become explicit and reproducible by being encoded and published in software, somewhat reminiscent of the necessarily pedantic detail used in the "methods" sections of scientists' papers. The second change relates to humanists embracing the opposite of their typical close reading paradigm, adopting "distant reading" techniques. The question of what you can do with a million books requires that a scholar knows how to deal with the quantity of information contained in such a corpus. This includes learning to generate valid statistics and to draw legitimate conclusions from them. Whether or not any of this counts as objective is another matter.
- Can we learn anything about today's resistance to new technologies from studying the reactions in the Renaissance to the introduction of printing?
- Will digital libraries make today's libraries obsolete?
- I can't imagine preferring a card index to an OPAC (online public access catalogue), and having online access to journal literature is infinitely more convenient than browsing through dusty old archives in library basements. I'm also very keen on digitisation projects as a way of opening up access to important (and maybe also seemingly not so important) artifacts to scholars. Access to information and resources from your desktop is certainly a major advantage. But we will still require institutions which foster and make use of information expertise. Catalogues are only as good as the people who design and maintain them. These are certainly the domains of expertise of libraries and, I imagine, will continue to be so. There is also the question of serendipity in library browsing; simply scanning the shelves can sometimes turn up items which would probably never have been the subject of a "relevant" keyword search.
- Are the concepts and development of artificial intelligence relevant to humanistic scholarship?
- Why ask this question? Is it because artificial intelligences may take on the status of human agents whose thoughts and actions could be argued to be in the domain of interest of humanist scholars? Is it because artificial intelligences may be able to perform the same functions, and make the same judgements and arguments, as human humanists? Or is it even that the whole artificial intelligence project (and its wider context of enquiry into the nature of human cognition and intelligence) could be the subject of a humanities study?
I went to what amounts to a digital humanities pub meeting on Tuesday. It was called Decoding Digital Humanities, held at a pub near UCL, and organised by the new Centre for Digital Humanities at UCL. There must have been about 30 participants, mostly from UCL.
They set some reading: Walter Benjamin's The Work of Art in the Age of Mechanical Reproduction and the Wikipedia article on Digital Humanities. The Benjamin essay prompted discussion of aspects of digital humanities which I've never really considered seriously: the whole area of new media art, digital literature and interactive narrative. It seems I've always considered digital humanities (like its non-digital parent) an analytic discipline rather than a synthetic one.
The Wikipedia article, owing to its broadly expository nature, seeded a discussion on the definition of digital humanities. As with many of these discussions, I was left with the impression that DH has more to do with creating digitised versions of the kinds of artifacts that are of interest to humanist scholars, although there was also discussion of the changes that digital humanities may bring about not only to the traditional humanities, but also to computer science. It was suggested that computer scientists find working with humanist data sets interesting and challenging because of their fuzziness. This point also led to an interesting question: what is a good technical term for describing this kind of qualitative data?
But the discussion never quite got to considering what the valid questions of digital humanities might be. What are the techniques that make digital humanities digital? Is digital humanities just computer-assisted humanities, easy and interactive access to publications, manuscripts and other artifacts? Or is there a new programme of scholarship which computational methods may make possible?
This blog is powered by ikiwiki.