Tag Archives: text encoding

Recent NEH/DFG Digital Humanities Awards and the Future of Autonomous Projects

The NEH (National Endowment for the Humanities) and DFG (Deutsche Forschungsgemeinschaft) have announced another round of awards for their Bilateral Digital Humanities Program. The program supports projects that contribute to developing and implementing digital infrastructures and services for humanities research. Awards are made to collaborative projects involving at least one partner based in the U.S. and at least one based in Germany.

This round’s awardees were largely focused on digitization projects, especially text encoding, which seems indicative of the field of digital humanities generally, and especially of work concerned with “ancient” languages and literatures. The goal of such projects is to create innovative (and hopefully better) ways to present texts in digital format. Part of the innovation is the ability to consider diachronic aspects of literature, especially variant traditions of ancient literature and the critical work associated with the text in question. Additionally, these projects provide ready access to literature that had previously been limited to a few (and generally quite expensive) volumes from a small group of publishers. The well-known and oft-mentioned Perseus Digital Library and the much less well-known Comprehensive Aramaic Lexicon Project provide numerous examples of the benefits of such projects. I have used a number of similar projects, including these two, during my young academic career, and I can attest to their great benefits.

There are, however, a few drawbacks that seem to accompany these projects. The most central, recurring caveat I have experienced is that development seems to stop when the grant funding runs out. While it is certainly understandable that projects cannot continue to develop without funding, the problem stems largely from the fact that these projects often stand on their own; that is, they are not part of a larger collection to which they contribute. This autonomy creates an environment in which the innovative technology developed by each individual project tends to stagnate with the project itself. The arrested development of these individual projects creates a considerable disparity between autonomous projects (especially those that focus on relatively obscure content) and projects that are either paid applications (e.g., Accordance Bible Software) or developed in collaboration with large tech companies (e.g., the Dead Sea Scrolls Digital Project, a collaborative effort between the Israel Museum and Google). I am not criticizing these latter projects; on the contrary, I have used both of these examples with great relish. Rather, I am lamenting the stagnation of many autonomous projects whose subject matter might be more obscure (relatively speaking, of course) but is vital for a number of scholars’ research.

As the process of text encoding becomes more standardized, it would be interesting to see the development of a digital library that could incorporate these autonomous projects into one central location. This might allow the continued development of autonomous projects whose dwindling funding limits the participation of their original developers. To be sure, there are obstacles to such grand collaborative work, and, ironically, this sort of project may need to begin as an autonomous project itself. However, the recent launch of the Digital Public Library of America is a substantial step toward a central digital library of varied digital materials, and it may itself be the very project I would like to see.

I congratulate the program awardees, and very much look forward to experiencing the results of their projects.

Introduction to Contextual TEI, Day 1

Cross-posted from my own site.

I’m here in Providence (can’t you see where I am?) for a three-day workshop at Brown on contextual encoding with TEI, run by the Women Writers Project and led by Julia Flanders and Syd Bauman. One of the first things I did when getting on board with digital humanities was to take part in the first iteration of THATCamp New England in 2010, and I’m glad I didn’t really have any idea who I was there with, or I would have been horribly intimidated instead of just self-conscious. One of the other attendees was Julia Flanders, who, among other things, leads the Women Writers Project at Brown. What I learned about the WWP at THATCamp was impressive, but what I have since learned (tonight, if you must know) is that it is a self-sustaining project residing at Brown. As I also know more about sustaining university-level projects than I used to, I am even more impressed. However, I have also built up my knowledge of and abilities in digital humanities, so I’m more ready to approach problems at what I would consider, were I leading a workshop such as this, an appropriate level.

It’s a strange situation for me, as I have worked with text encodings in one way or another since some time in the mid-90s, when I was in publishing and worked with QuarkXPress, though I didn’t entirely know at the time what I was doing and certainly didn’t know about the global history of text encoding, let alone SGML, TEI, and XML. In my second stab at making it in the publishing world, I learned a bit more about that space. While at HarperCollins in the late 90s, we used an in-house encoding system we called the Text Markup System, though we were also phasing it out when I was laid off. Even so, I never really associated my work in TMS with a larger world of text encoding, not even with the HTML I was teaching myself on the side. Extend that situation roughly through the next several years, and you’ll see that while I understand a lot of the basics of markup and have even paid attention to some of the questions posed about TEI and to the limitations suggested by some critics, I still have a lot to learn during this workshop.

Today was, I expect, the strongest showing for the awkwardness, as there was a good deal of scene-setting. We went through general notions of why we encode research objects and the basics of XML in the morning, then got into the basics of TEI in the early afternoon, with enough time in the later afternoon to work on our own documents. My attention was frequently and consciously divided, as much of the presentation was known material for me. Since I don’t have a research project per se (that is, my text-based research projects are whatever faculty or students bring to me), I needed to choose a work that would be appropriate for a workshop on contextual encoding. With some advice from Yale post-doc Natalia Cecire, I settled on Émile Zola’s Le Ventre de Paris, and I haven’t regretted it. Among her many other helpful suggestions was Jean Toomer’s novel Cane, which has the benefit of some site-specificity in the good ol’ US of A as well as multiple text formats for juicy encoding goodness. However, what I might call my research interests include continually examining digital humanities tools, practices, and constructions from a multilingual or plurilingual point of view, so I went with the Zola and grabbed the text file from Project Gutenberg. My recollection is of having read it years and years ago, but I can’t recall with any further precision, so this process is also about getting reacquainted with the story.

After discovering and then applying Matthew Jockers’ Python text-to-TEI formatter for Gutenberg content (I knew learning some Python would come in handy one day!), I dumped our friend Émile into le ventre de oXygen and spent some time figuring out what I care about in this text and how to encode it. Since we are dealing in context, I decided to start with marking up all specified locations and all people. So far, I’ve been able to geocode everything I’ve found, but I’m still at a fairly generic and introductory point in the text. Even so, while I say I’ve been able to geocode what I’ve found, it hasn’t been entirely straightforward how to then encode it. For instance, there’s an early mention of the Pont de Neuilly. Reading a little too closely, which is not to say doing a close reading of course, I wasn’t sure whether it was the bridge of the same name currently located in Neuilly-sur-Seine or some other one that may have been eliminated. Even so, it wasn’t as simple to reference with a GeoNames page as Paris was. The latter got a placeName element and a ref attribute pointing to a GeoNames page URI, but for the former I had to bludgeon GeoNames into giving me an OpenStreetMap page based on the lat-long. I played around with using something different for the rue de Longchamp, ending up with a nesting of place, location, and geo, with location having a sibling placeName that contained “rue de Longchamp.” In a very small way, it’s an editorial decision to assert that Zola meant the intersection of rue de Longchamp and Avenue Charles de Gaulle, not least because Zola never met Charles de Gaulle. But that’s what I’m hoping to dig deeper into over these three days: these editorial decisions, how they can be made manifest as a result of encoding choices, and how they can prove useful in scholarship for Yale researchers and student-researchers.
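For a concrete sense of those two approaches, here is a minimal sketch of the kind of TEI markup I am describing; the GeoNames URI and the coordinates below are illustrative stand-ins rather than my exact values.

<!-- Paris: a placeName pointing at a GeoNames record through a ref attribute -->
<placeName ref="http://www.geonames.org/2988507/">Paris</placeName>

<!-- The rue de Longchamp experiment: a place whose placeName has a sibling
     location, with the lat-long tucked inside a geo element -->
<place>
  <placeName>rue de Longchamp</placeName>
  <location>
    <geo>48.885 2.258</geo>
  </location>
</place>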