Digital Hermeneutics. A New Approach to Wittgenstein’s Nachlass

Herbert Hrachovec, Dieter Köhler

Digital Hermeneutics. A New Approach to Wittgenstein’s Nachlass, in: Wittgenstein and the Future of Philosophy. A Reassessment after 50 Years (ed. by R. Haller and K. Puhl). Vienna 2002, pp. 151-159

The problems facing posthumous editions of literary or philosophical writings are fairly well understood. Decisions that are, in ordinary practice, the prerogative of the actual authors are taken over by scholars guided by different obligations and interests. A common feature of such editions is, for example, their derivative character: they aim at supplementing and completing a deceased writer’s oeuvre. Occasionally controversies over editorial policies arise, but they seldom touch upon matters of low-level, physical presentation like page formats or index layout. It does, in fact, sound odd even to mention such details in the present context. Yet such concerns carry considerable weight when one turns to electronic publications of literary remains like Wittgenstein’s Nachlass.

The principal reason for the required change in attitude is the introduction of an additional technological layer on top of the time-honored print procedures. Even though the rendering of written/printed pages on a monitor is designed to implement as many familiar features as possible, the procedure of digital encoding of alpha-numeric tokens raises a number of hitherto unforeseen problems and prospects. To mention but two: computer systems differ significantly in the use of resources mapping the letters of some alphabet onto screen representations; but, given such a mapping, their power of syntactic transformation far exceeds anything feasible in the world of printed books. Standardization as well as manipulative power are relevant for electronic scholarly editions that aim at producing cross-platform tools for digital research on a given corpus. This is not the place to discuss the various technical, commercial and ideological issues arising for computer-assisted philology. The present paper will rather restrict itself to sketching some salient ideas of the theory of text encoding and to presenting a modest hermeneutico-philological project built upon the most recent standards of text-based electronic content management.

Although we shall cover only a small sector of the field, some important prospects will present themselves. Work on the "Bergen Electronic Edition" (BEE) can be regarded as a case study trying to determine the course future collaborative digital scholarship is going to take. Much of this future development is bound to turn on the close relation between technical details of software construction and matters of scholarly content. Many conventions that used to be taken for granted in the age of print are not applicable to computer-assisted research or are, at least, in need of thorough re-examination. Inquiring beneath the surface of the virtual pages produced by word processors is a first step towards enhancing the awareness of the digital procedures underlying screen renditions of manuscripts and typescripts. Such research will, furthermore, increase the ability of scholars to actively control their means of production in the digital age.

I.

The digital transcriptions underlying the BEE are based upon one particular markup language, called "Multi Element Code System" (MECS). MECS is a powerful instrument well adapted to the needs of sophisticated transcriptions following elaborate methods of philology. It contains features absent in other major markup languages, in particular the ability to straightforwardly encode overlapping textual features. The benefits of MECS are, unfortunately, somewhat diminished by the narrow scope of its application. Its main use is in the Wittgenstein transcriptions, and the only available software tools have been developed in the Bergen context. Outside the small community of Wittgenstein philologists MECS is barely known at all. One of the drawbacks of this situation is that projects based upon comparatively exotic data formats tend to be less attractive as well as more expensive, since data standardization, software implementation and the training of end users require additional effort.

To understand the present situation one has to remind oneself that scholarly digitization was in its infancy even 10 years ago. At the time work on the digital transcriptions of the Wittgenstein Nachlass was initiated, some standards for philological text encoding were available, but none of them was firmly established. The most prominent proposal was for the adoption of the so-called "Standard Generalized Markup Language" (SGML), which was made an ISO standard in 1986 and served as the most advanced syntax for the encoding of meta-textual information until the late 90s. There were, however, some serious obstacles inhibiting its widespread adoption as a philological tool. Its complexities were difficult to handle, and the implementation of conforming software proved to be hard and expensive. Extensive training of its users was required. Against this background, sticking to an encoding format tailor-made for the problem at hand was a reasonable decision. But things have changed. Fortunately, a new data format, a simplified descendant of SGML suitable for sophisticated digital transcriptions, has recently been established. Called "Extensible Markup Language" (XML), it has become increasingly popular since its first appearance in early 1998. The goals of XML are described in the relevant specification as follows:

1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.

Each XML document consists of at least two ingredients: "character data" representing the text content of a document and "markup" functioning as structural marker, linking mechanism, lexical annotation etc. The encoding scheme for character data used by XML is based on the "Unicode" standard, which currently defines code points for more than 50,000 characters. For future extensions Unicode still has room for over a million further characters. The Unicode Consortium established to develop, extend and promote the Unicode Standard includes many leading IT companies and organizations. All these characteristics turn Unicode into a promising candidate for a long term standard which is about to resolve, or at least greatly reduce, the current confusion and lack of interoperability caused by incompatible character encoding formats.

XML’s key issue is to capture the meta-information needed to process a given textual resource in a machine-readable way. This is achieved by a formal syntax which provides rules governing the construction of possible markup. A set of such rules is called a Document Type Definition (DTD), which contains for example the general conventions delimiting meta-information code or determining the hierarchical order of the envisaged textual structure as well as collateral data ("attributes") linked to particular nodes of the tree-like construct imposed onto texts by XML. It is important to notice that a DTD does not equip its "elements" with meaning, i.e. the setup of the structural "tree" does not imply one fixed interpretation for its constituents. What XML in general offers is a formal model to construct special purpose markup languages by laying down DTDs on the basis of which particular elements can be assigned a meaning by convention.

One crucial feature of XML is that it allows users to separate indications of the internal structure of a text from provisions regarding its layout. In that case, the XML document describing the content of a text uses only so-called "logical markup", which, for example, designates paragraphs, headlines, quotations, names, and units of measurement, but does not specify how to render these textual features. Instead, this task is left to so-called "style sheets", which contain instructions for a rendering engine or some other output module. This kind of separation makes it easy to apply different style sheets to the same XML document, as well as to apply the same style sheet to different documents (which have to use the same set of markup elements). Within this framework, adjusting the form of the output to the requirements of different devices, such as color screens or black ink on white paper, is made easy. Likewise, it facilitates the task of maintaining a consistent layout for individual documents in a corpus.
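
The separation of logical markup from layout can be illustrated by a minimal sketch in Python. The element names (`letter`, `p`, `name`) and the two style tables are invented for this illustration and do not belong to any DTD of the project; actual style sheets would normally be written in a dedicated style sheet language rather than as Python tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical logical markup: it names textual features, not their appearance.
SOURCE = "<letter><p>Dear <name>Russell</name>, thanks for your note.</p></letter>"

# Two stand-ins for style sheets: tables from element name to the strings
# placed before and after the content of that element.
PLAIN_STYLE = {"name": ("", ""), "p": ("", "\n")}
EMPHASIS_STYLE = {"name": ("*", "*"), "p": ("", "\n")}


def render(element, style):
    """Render an element and its descendants according to a style table."""
    before, after = style.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(render(child, style))
        parts.append(child.tail or "")  # text following the child element
    parts.append(after)
    return "".join(parts)


root = ET.fromstring(SOURCE)
plain = render(root, PLAIN_STYLE)
emphasized = render(root, EMPHASIS_STYLE)
print(plain, end="")
print(emphasized, end="")
```

The same document yields two different renditions, while the encoded text itself remains untouched; a further output device would only require a further style table.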

Using logical markup provides further advantages, such as context-aware queries. It is also possible to merge portions of different documents matching a specific pattern of markup elements into one file, or to rearrange a text based on the content of a certain type of markup element. Examples of the latter procedure include sorting the letters of a correspondence by date as opposed to sorting them by recipient. Such mergers or rearrangements of texts can be triggered by formal procedures, which simplifies automatic indexing, cataloging and standard queries on growing text corpora. Since the result usually is still an XML document, it can in turn be combined with other XML data. Thus XML recommends itself as a universal standard for meta-data (data about data) storage and data exchange.
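
The rearrangement of a correspondence may be sketched as follows. The sample file, its element and attribute names, and all dates and recipients are made up for the purpose of illustration; the sketch merely assumes that each letter carries its date (in year-month-day form) and its recipient as attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical correspondence; identifiers, dates and recipients are invented.
CORPUS = """
<correspondence>
  <letter id="L1" date="2001-03-10" recipient="B">first letter</letter>
  <letter id="L2" date="2001-01-15" recipient="A">second letter</letter>
  <letter id="L3" date="2001-02-20" recipient="B">third letter</letter>
</correspondence>
"""


def rearranged(root, key):
    """Return a new <correspondence> element, letters sorted by attribute `key`."""
    result = ET.Element(root.tag)
    result.extend(sorted(root.findall("letter"), key=lambda e: e.get(key)))
    return result


root = ET.fromstring(CORPUS.strip())
by_date = rearranged(root, "date")
by_recipient = rearranged(root, "recipient")

ids_by_date = [letter.get("id") for letter in by_date]
ids_by_recipient = [letter.get("id") for letter in by_recipient]
print(ids_by_date)
print(ids_by_recipient)

# The result is again an XML element and can be serialized or processed further.
print(ET.tostring(by_date, encoding="unicode"))
```

Both arrangements are derived from the same source by a purely formal procedure, so neither has to be stored or maintained separately.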

Given these benefits of a newly available and nevertheless already widely accepted standard, we took another look at the Bergen situation and proposed an XML-based approach to any further philological work on the archive’s sources. (This assessment was shared by the participants of a conference on future directions of Nachlass research in Bergen, December 2001.) This project takes the XML standard as a guideline for its work on the Wittgenstein transcriptions, enriching the available material with innovative features like hyperlinks and intertextual commentary supported by the digital apparatus. We aim at constructing a rich set of meta-data, and in particular at developing XML-DTDs and computer programs for describing the structure of the manuscripts, for maintaining commentaries and annotations, and for indicating cross-references and other intertextual relations.

We do not focus on the transcriptions of the Wittgenstein manuscripts themselves, since they are not available in XML at the moment. But even though we cannot speak for the Wittgenstein Archive it is, in view of the consensus reported, to be expected that future developments surrounding the BEE will likewise choose an XML approach. In this case a seamless integration of the Nachlass transcription and the meta-data produced with our tools will be possible, e.g. the restriction of a keyword search in the transcription to a set of paragraphs connected by meta-data cross-references. In the meantime our XML meta-data are linked to the Nachlass transcription by reference pointers that do not require a specific format of the documents they point to.

II.

The construction of the reference pointers just mentioned is perhaps the most innovative detail of our project. It demonstrates how deficiencies of current hypertext systems can be overcome by referring back to strategies that have been used to solve similar problems in traditional media. To approach the problem, let us investigate what it means to refer to a text passage.

One may characterize a page of a Wittgenstein manuscript and the traces of ink it displays as a material entity. But its textual content is of another kind. Let us call it an "abstract entity" in contrast. A transcription of the manuscript is mainly concerned with bringing out relevant features of this abstract entity and fitting it out in new clothes. Typical questions in this more abstract context would be whether something counts as a sign in contrast to an ink blot, which kind of sign it is, and how the larger units of meaning in a text are to be constructed. Such questions call for a gradual transition from the physical evidence to the relatively abstract, from things about which consensus can be established more easily to the more controversial issues. In that sense it is usually acknowledged that an edition is a certain kind of interpretation, albeit a comparatively elementary one, providing the foundation for further, more specific and more questionable interpretations. This gradual progression from primary sources to secondary literature fits into the framework elaborated earlier, since this procedure can also be described in terms of an increasing enrichment and structural modeling of data by meta-data.

Notice that such a process usually does not freeze into a steady and hierarchical state. It rather forms a holistic as well as dynamic web of readings influencing each other in various directions, often described as "a hermeneutical circle". Cases in point are editorial decisions based on the meaning of a text. Detailed examinations of the "ink-spots" exhibited in a manuscript may, on the other hand, decide controversies about a passage’s correct interpretation. Due to the holistic nature of these couplings of texts, our understanding of one and the same passage may vary with modifications made to a text not even directly linked to the passage under investigation. For this reason it is, for example, often very difficult to reconstruct the message contained in a private correspondence, even though it was entirely clear to the original recipients.

This much is familiar from classical scholarship. In adapting these insights to the schematism of data and meta-data required by digital philology we are faced with a difficulty, though. We need connections between data and meta-data which are robust. But linking data with other data results in a web of relations which in our case is neither static nor necessarily consistent: New interpretations and new links can be added; interpretations usually differ. There was no immediate need to reflect on this dilemma when preparing an edition in classical scholarship since the holistic nature of its exertions was for all practical purposes confined to printed material, obviating the need to confront oneself with the dynamics of a global on-line communication setting.

With regard to robustness an open hypertext system like the World Wide Web introduces a number of problems for a project intending to use its resources: (i) Current internet links are quite inflexible. Basically they are directions on how to copy a sequence of bits, which is stored at a certain place, to another location. If a document is moved from one location to another, the links to that document are broken though it will itself still be available elsewhere on the internet. (ii) Which of several possible reference systems is best depends on the purpose one is pursuing. Normally there is no single solution equally appropriate to all major cases of use. Text-critical analyses will usually require references to manuscript pages. If, on the other hand, one analyzes a line of thought presented by the text, relying on chapters and paragraphs will in all likelihood be more appropriate. Moreover, the numbering of paragraphs may depend on whether an analysis is based on a diplomatic edition or on a "last hand" version which skips crossed out paragraphs and carries out changes of sequence.

To overcome problem (i), the inflexibility of hypertext links, we use an extended conception of linking analogous to traditional reference systems used since classical antiquity. As mentioned before, a text has to be treated as an abstract entity. In that sense a textual reference, e.g. "Wittgenstein: Ms 115, p. 12", does refer to a text, but not simply to the physical manuscript. The entity referred to exists in several forms: as an original, as a facsimile, as a diplomatic edition, etc., and this entity itself is distinguished from but nevertheless related to units such as comments on the given passage, quotations, and allusions. A textual reference to a primary source is a guide to resolve the various tokens of what we consider to be the same abstract text. In using traditional media it guides us when we consult a library catalog for the shelfmarks of possible editions; after managing to get a copy it helps us to find the page where the referred passage resides; if we are getting interested in finding related material, we can fall back on a well established infrastructure of subject catalogs, bibliographies, indices of cross-references, journal reviews, etc. These tools benefit from more or less sophisticated bibliographic standards, and together they carry the weight of frequent modifications necessitated by cooperative, interactive research. Following this example we are about to explore a framework for building an equally efficient infrastructure for net-based philology.

To implement the required references in an open hypertext system we make use of so-called "Uniform Resource Names" (URNs). A URN is, roughly speaking, a character sequence which follows a specific syntax and is assigned a fixed meaning by a certain authoritative procedure. So we defined a system of URNs standing for the paragraphs of a Wittgenstein manuscript. Other URNs were used to characterize secondary sources as belonging to this-or-that type of text, for example as being of type "structure outline". The roles of the links between the documents were also characterized in terms of URNs, allowing us to differentiate, for instance, between direct and indirect quotations. This strategy can be expanded even further by constructing sets of relations or qualifying second-order relations between relations, e.g. when considering the relations "facsimile of" and "transcription of" as being subclasses of the class "edition of".
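
A minimal sketch in Python may make the idea of typed links more concrete. It relies only on the general shape of a URN, `urn:<namespace>:<specific part>`; the namespace `x-example` and all concrete names below are hypothetical and are not the identifiers actually used in the project.

```python
from dataclasses import dataclass


def parse_urn(urn):
    """Split a URN of the general form urn:<namespace>:<specific part>."""
    scheme, namespace, specific = urn.split(":", 2)
    if scheme.lower() != "urn" or not namespace or not specific:
        raise ValueError(f"not a URN: {urn!r}")
    return namespace, specific


@dataclass(frozen=True)
class Link:
    source: str  # URN of the referring document
    role: str    # URN naming the kind of relation
    target: str  # URN of the passage referred to


# Hypothetical hierarchy of relations: each role points to its more general role.
BROADER = {
    "urn:x-example:role:facsimile-of": "urn:x-example:role:edition-of",
    "urn:x-example:role:transcription-of": "urn:x-example:role:edition-of",
}


def is_a(role, general):
    """True if `role` equals `general` or is a (transitive) subclass of it."""
    while role is not None:
        if role == general:
            return True
        role = BROADER.get(role)
    return False


link = Link(
    source="urn:x-example:transcription:t1",
    role="urn:x-example:role:transcription-of",
    target="urn:x-example:ms115:par12",
)
print(parse_urn(link.target))
print(is_a(link.role, "urn:x-example:role:edition-of"))
```

Because the roles are themselves named by URNs, a query for all "editions of" a paragraph can pick up facsimiles and transcriptions alike without knowing these more specific relations in advance.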

The linking of documents by means of URNs is not static, but is performed dynamically. This is done by so-called software agents which, in their simplest form, are computer programs maintaining an index of URNs plus a list of the locations of instances of the designated documents. In other words: such an agent is an index which resolves dynamic links into static ones by pointing to the location where the data to be displayed is to be found. More complex software agents can be constructed which interact with simple ones via the internet in order to search for newly available relevant documents, or which use artificial intelligence to find interesting relations between existing documents, so that distributed information about a specific topic can be identified by computers using URNs. A next step might consist in condensing the results into a dossier for further work. Since software agents traversing such a net of qualified cross-references rely mainly on formal criteria, standard software for such tasks can be developed. This way the software we are using on Wittgenstein texts can easily be adapted for employment with other digitized texts.
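
The simplest kind of agent described here, an index from URNs to locations, can be sketched in a few lines of Python. The class name `Resolver`, its methods, the URN and the addresses are assumptions made for this illustration; they do not describe the project's actual software.

```python
class Resolver:
    """A minimal agent: an index from URNs to known locations of instances."""

    def __init__(self):
        self._index = {}

    def register(self, urn, location):
        """Record that an instance of the designated text is found at `location`."""
        locations = self._index.setdefault(urn, [])
        if location not in locations:
            locations.append(location)

    def unregister(self, urn, location):
        """Forget a location that is no longer valid."""
        locations = self._index.get(urn, [])
        if location in locations:
            locations.remove(location)

    def resolve(self, urn):
        """Turn the dynamic reference into a list of static locations."""
        return list(self._index.get(urn, []))


PARAGRAPH = "urn:x-example:ms115:par12"  # hypothetical URN

resolver = Resolver()
resolver.register(PARAGRAPH, "http://old.example.org/ms115.html#p12")

# The document moves: only the index is updated, not the references to the URN.
resolver.unregister(PARAGRAPH, "http://old.example.org/ms115.html#p12")
resolver.register(PARAGRAPH, "http://new.example.org/ms115.xml#p12")

locations = resolver.resolve(PARAGRAPH)
print(locations)
```

Commentaries and cross-references that cite the URN remain valid after the move, since the location is looked up only at the moment the text is requested.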

Problem (ii), namely the difference of reference systems according to different epistemological agendas, can likewise be approached by means of URNs. For that purpose one must define, e.g. in a separate electronic document, a mapping between the reference systems in question. Given this correlation a software agent can automatically perform a great number of operations upon (a set of) parallel instances of text items.
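
One such mapping, between the paragraph numbering of a diplomatic edition and that of a "last hand" version, might be sketched as follows. The sketch covers only the skipping of crossed-out paragraphs and leaves changes of sequence aside; the function name and the sample data are invented for illustration.

```python
def last_hand_numbering(diplomatic):
    """Map diplomatic paragraph numbers to 'last hand' paragraph numbers.

    `diplomatic` is a list of (number, crossed_out) pairs in manuscript order.
    Crossed-out paragraphs have no counterpart and are mapped to None.
    """
    mapping = {}
    counter = 0
    for number, crossed_out in diplomatic:
        if crossed_out:
            mapping[number] = None
        else:
            counter += 1
            mapping[number] = counter
    return mapping


# Hypothetical manuscript: the second paragraph has been crossed out.
DIPLOMATIC = [(1, False), (2, True), (3, False), (4, False)]

mapping = last_hand_numbering(DIPLOMATIC)
# The inverse direction: from 'last hand' numbers back to diplomatic numbers.
inverse = {new: old for old, new in mapping.items() if new is not None}
print(mapping)
print(inverse)
```

Stored as a separate document, a correlation of this kind would allow a reference made in one system to be followed automatically in an edition that uses the other.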

III.

We have, up to now, mainly dealt with general issues of digital philology. How does this touch upon Wittgenstein scholarship in a narrower sense? In order to explore the possible interplay between the mechanisms indicated and a more traditional exegesis of Wittgenstein’s philosophical writings we are in the course of developing a prototype program that will employ the reference system described above for the purpose of a closer reading of manuscript 115, the HTML source of which has been made available by the Trustees. This enterprise is markedly different from conventional commentaries on Wittgenstein’s work that have been focussed on the Tractatus and the Investigations and made only occasional reference to the Nachlass material. It is, of course, entirely legitimate to assume that Wittgenstein edited his notes in view of an eventual book and that by following his reasoning laid out in almost ready-for-publication typescripts one gets closest to his doctrine. It has, however, often been pointed out that it is not by accident that the Investigations did not get published after all and that the very notion of a Wittgenstein "doctrine" misses some of the most important characteristics of his way of doing philosophy. Approaching the issue from the other end by taking a closer look at the collection of notes assembled in one of his working manuscripts, is an instructive supplementary strategy.

When examining Ms 115 one is put right into the middle of an argument concerning pictures that is continued from the previous volume, Ms 114. Tentative ideas are put down in paragraphs that are heavily re-worked (and occasionally crossed out in their entirety), with subsequent paragraphs picking up their line of thought. No immediate order is discernible in the 117 pages making up the first half of the volume. At this stage of his philosophical endeavor Wittgenstein is apparently struggling to become clear about a number of interconnected issues: the nature of pictures, of consciousness, memory and representation in general. His philosophical motives and argumentative moves are frequently quite surprising, judged from his later, more mature outlook. In dealing with this material the opportunities afforded by the digital machinery are particularly helpful. Two basic reference schemes are introduced, providing initial orientation about the dynamics of Wittgenstein’s writing. One tree structure allows users to call up any of the manuscript’s paragraphs in a separate window; the second tree presents a preliminary structural analysis of the sequence of paragraphs, modeling Wittgenstein’s philosophical explorations according to an interpretation of the manuscript. The segments of text referred to by this analytical "table of contents" are, again, presented in a separate window and can be compared with the paragraph sequence arranged in simple linear order. A running commentary, succinctly discussing the philosophical ideas developed by Wittgenstein, is included.

The initial version of the prototype program will consist of just those simple ingredients: segments of text, accessible in multiple windows, under the guidance of a dynamic content analysis which is accompanied by a fairly short exegesis of the underlying text. This is only a beginning, though. It is quite easy to provide a whole number of "views" onto the underlying collection of remarks, and all of them can be elaborated upon according to the required context. Several scholars may cooperate and compete for the best fit of their analysis. And the scope of possible, modularized textual input is not restricted to one’s hard disk. The URNs we have been discussing allow the seamless integration of external sources: supplementary Wittgenstein manuscripts, alternative structures, additional information etc. We are, in fact, in the process of establishing a working group that is to test and further develop the program announced here. The software development, led by one of us (Dieter Köhler), is done in Delphi, a "rapid application development tool" from Borland Inc. The computer program is freely available, including source code, for various MS-Windows platforms. For the future, it is intended to additionally provide a version under the terms of the "GNU General Public License" (GPL) based on Kylix, the Delphi equivalent for Linux.

We have given an account of some of the characteristic difficulties facing digital philological work and briefly introduced our ongoing project to design software supporting text critical and hermeneutic approaches to Wittgenstein’s Nachlass. The printed page does not support presentation of the prototype in action; we run against limits of the traditional medium that can only be overcome by actually installing and running the program, which you are welcome to download from http://wittgenstein.philo.at. Updates on the status of the project are also available from this site.

Computers help to analyze systematic, historical and philological questions which could not be dealt with until recently because of the overwhelming amount of sources, cross-references and supplementary data. The most familiar use of digital editions is, to the present day, their employment on local computers, where they figure as a cheap alternative to printed editions, offering comfortable searching facilities as an additional bonus. An approach using XML and URNs might, in the not too distant future, place digital editions at the center of a net-based infrastructure of academic communication. Even though we are just beginning and still need a lot of further standards, tools and content, the more we get, the more synergetic power is likely to develop. One simple piece of information, when fed into the system, might interrelate with other data in new and interesting ways and provide insights unavailable before.
