
Introduction to Contextual TEI, Day 1

Cross-posted from my own site.

I’m here in Providence (can’t you see where I am?) for a three-day workshop at Brown on contextual encoding with TEI, run by the Women Writers Project and led by Julia Flanders and Syd Bauman. One of the first things I did when getting on board with digital humanities was to take part in the first iteration of THATCamp New England in 2010, and I’m glad I didn’t really have any idea who I was there with, or I would have been horribly intimidated instead of just self-conscious. One of the other attendees was Julia Flanders, who among other things leads the Women Writers Project at Brown. What I learned about the WWP at THATCamp was impressive, but what I have since learned (tonight, if you must know) is that it is a self-sustaining project residing at Brown. As I also know more about sustaining university-level projects than I used to, I am even more impressed. However, I have also built up my knowledge of and abilities in digital humanities, so I’m also more ready to approach problems at what I would consider, were I leading a workshop such as this, an appropriate level.

It’s a strange situation for me, as I have worked with text encodings in one way or another since some time in the mid-90s, when I was in publishing and worked with QuarkXPress, though I didn’t entirely know at that time what I was doing and certainly didn’t know about the global history of text encoding, let alone SGML, TEI, and XML. In my second stab at making it in the publishing world, I learned a bit more about that space. While I was at HarperCollins in the late 90s, we used an in-house encoding system that we called Text Markup System, though we were also phasing it out when I was laid off. Even so, I never really associated my work in TMS with a larger world of text encoding, not even with the HTML that I was teaching myself on the side. Extend that situation roughly through the next several years, and you’ll see that while I understand a lot of the basics of markup and have even paid attention to some of the questions posed about TEI and to the limitations suggested by some critics, I still have a lot to learn during this workshop.

Today was, I expect, the strongest showing for the awkwardness, as there was a good deal of scene-setting. We went through general notions of why we encode research objects and the basics of XML in the morning, then got into the basics of TEI in the early afternoon, with enough time in the later afternoon to work on our own documents. My attention was frequently, and consciously, divided, as much of the presentation was known material for me. Since I don’t have a research project per se (that is, my text-based research projects are whatever faculty or students bring to me), I needed to choose a work that would be appropriate for a workshop on contextual encoding. With some advice from Yale post-doc Natalia Cecire, I settled on Émile Zola’s Le Ventre de Paris, and I haven’t regretted it. Among her many other helpful suggestions was Jean Toomer’s novel Cane, which had the benefit of some site-specificity in the good ol’ US of A as well as that of multiple text formats for juicy encoding goodness. However, what I might call my research interests include continually examining digital humanities tools, practices, and constructions from a multilingual or plurilingual point of view, so I went with the Zola and grabbed the text file from Project Gutenberg. My recollection is of having read it years and years ago, but I can’t recall with any further precision, so this process is also about getting reacquainted with this story.

After discovering and then applying Matthew Jockers’ Python text-to-TEI formatter for Gutenberg content (I knew learning some Python would come in handy one day!), I dumped our friend Émile into le ventre de oXygen and spent some time figuring out what I care about in this text and how to encode it. Since we are dealing in context, I decided to start with marking up all specified locations and all people. So far, I’ve been able to geocode everything I’ve found, but I’m still at a fairly generic and introductory point in the text. Even so, while I say I’ve been able to geocode what I’ve found, it hasn’t been entirely straightforward how to then encode it. For instance, there’s an early mention of the Pont de Neuilly. Reading a little too closely, which is not to say doing a close reading of course, I wasn’t sure whether it was the bridge of the same name currently located in Neuilly-sur-Seine or some other one that may have been eliminated. Either way, it wasn’t as simple to reference with a GeoNames page as Paris was. The latter got a placeName element and a ref attribute with a GeoNames page URI, but for the former I had to bludgeon GeoNames into giving me an OpenStreetMap page based on the lat-long. I played around with using something different for the rue de Longchamp, ending up with a nesting of place, location, and geo, with location having a sibling placeName that contained “rue de Longchamp.” In a very small way, it’s an editorial decision to assert that Zola meant the intersection of rue de Longchamp and Avenue Charles de Gaulle, not least because Zola never met Charles de Gaulle. But that’s what I’m hoping to go deeper into over these three days: these editorial decisions, how they can be made manifest as a result of the encoding choices, and how they can prove useful in scholarship for Yale researchers and student-researchers.
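For anyone who wants to see the two shapes side by side, here is a rough sketch in Python of the structures described above: an inline placeName carrying a ref, and a place whose children are a placeName and a location wrapping a geo. This is not the markup from my actual file. The helper names are made up for illustration, the GeoNames URI is a placeholder rather than a real identifier, and the coordinates are only a ballpark guess at Neuilly-sur-Seine.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; registering it as the default keeps the output unprefixed.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return a namespace-qualified TEI tag name (hypothetical helper)."""
    return f"{{{TEI_NS}}}{tag}"


def place_name_with_ref(name, ref_uri):
    """Build an inline <placeName ref="..."> pointing at a gazetteer page."""
    el = ET.Element(tei("placeName"), {"ref": ref_uri})
    el.text = name
    return el


def place_with_geo(name, lat, lon):
    """Build <place> with sibling children <placeName> and <location>/<geo>."""
    place = ET.Element(tei("place"))
    place_name = ET.SubElement(place, tei("placeName"))
    place_name.text = name
    location = ET.SubElement(place, tei("location"))
    geo = ET.SubElement(location, tei("geo"))
    # <geo> holds latitude and longitude separated by whitespace.
    geo.text = f"{lat} {lon}"
    return place


# Placeholder URI: substitute the real GeoNames identifier for Paris.
paris = place_name_with_ref("Paris", "https://www.geonames.org/GEONAMES_ID/")

# Illustrative coordinates only; the real point is an editorial decision.
longchamp = place_with_geo("rue de Longchamp", 48.88, 2.27)

print(ET.tostring(paris, encoding="unicode"))
print(ET.tostring(longchamp, encoding="unicode"))
```

The second form is wordier, but it gives the editorial claim (this street, at this point on the map) an element of its own instead of leaving it implied by a link.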