The Association for Computers and the Humanities (ACH)
The Association for Computational Linguistics (ACL)
The Association for Literary and Linguistic Computing (ALLC)

Guidelines for Electronic Text Encoding and Interchange

Edited by C. M. Sperberg-McQueen and Lou Burnard

TEI P3
Text Encoding Initiative
Chicago, Oxford

Copyright (c) 1990, 1992, 1993, 1994 ACH, ACL, ALLC
16 May 1994
Revised Reprint, Oxford, May 1999

In memoriam Donald E. Walker 22 November 1928 - 26 November 1993

Introductory Note (May 1999)

No work of the size and complexity of the TEI Guidelines could reasonably be expected to be error-free on publication, nor to remain long uncorrected. It has however taken rather longer than might have been anticipated to complete production of the present corrected reprint of the first edition, for which we present our apologies, both to the many individuals and institutions whose enthusiastic adoption and promotion of the TEI encoding scheme have ensured its continued survival in the rapidly changing world of digital scholarship, and also to the many helpfully critical users whose assiduous uncovering and reporting of our errors have made possible the present revision.

At its first meeting in Bergen, in June 1996, the TEI Technical Review Committee (TRC) approved the setting up of a small working committee to oversee the production of a revised edition of the TEI Guidelines, to include corrections of as many as possible of the 'corrigible errors' notified to the editors since publication of the first edition in May 1994, the bulk of which are summarized in a TEI working paper (TEI EDW67, available from the TEI website).

During the spring of 1997, this TRC Core Subcommittee reviewed nearly 200 comments and proposals which the editors had collected from public debate and discussion over the preceding two years, and provided invaluable technical guidance in disposition of them. We are glad to take this opportunity of expressing our thanks to this subcommittee, whose members were Elli Mylonas, Dominic Dunlop, and David T. Barnard.

The work of making the corrections and regenerating the text proceeded rather fitfully during 1998 and 1999, largely because of increasing demands on the editors' time from their other responsibilities. With the establishment of the new TEI Consortium, it is to be hoped that maintenance of the Guidelines will be placed on a more secure footing. Some specific areas in which we anticipate future revisions being carried out are listed below.

Typographic corrections made

examples of TEI markup throughout the text were all checked against the relevant DTD fragment and an embarrassingly large number of tagging errors corrected;
various minor typographic and spelling errors were corrected;
the corrigible errors listed in working paper TEI EDW67 were all corrected: some of these required specific changes to the DTD which are listed in the next section.

Specific changes in the DTD

A major goal of this revision was to avoid changes which might invalidate existing data, even where existing constructs seemed erroneous in retrospect. To that end, wherever changes have been made in content models for existing elements, they have as far as possible been made so that the DTD will now accept a superset of what was previously legal. Only one new element (ab) has been added.

Where possible, a few content models have been changed in such a way as to facilitate conversion to XML, but XML compatibility is not a goal of this revision.

Brief details of all changes made in the DTD follow:

Several changes were made in class membership, in order to correct unreachability problems. Specifically:
    elements geogName, persName, and placeName were added to the m.data class; geogName and placeName were removed from the m.placepart class;
    the elements addSpan, delSpan, and gap were added to the m.Edit class;
    a new class m.editIncl was defined, with members addSpan, delSpan, and gap; this class was then added to the global inclusion class m.globIncl along with anchor (erroneously a member of the m.Seg class, from which it is now removed), m.metadata and m.refsys.
added name element to m.addrPart class;
added dateLine to m.divtop and m.divbot classes;
added epilogue and castList to m.dramafront class;
added divGen to m.front class;
added u element to a.declaring class;
defined new class m.fmchunk (front matter chunk), comprising argument, byline, docAuthor, docDate, docEdition, docImprint, docTitle, epigraph, head, and titlePart for use in simplification of the content model for front element;
defined new element ab (anonymous block), and added it to the m.chunk class;
corrected an error whereby global attributes were not properly defined for elements specifying a non-default value for any of the a.global attributes; elements affected include foreign, hi, del, pb, lb, cb, language, anchor, and when;
changed content models to permit empty list and empty availability elements;
changed content model for series element to permit #PCDATA;
changed content model for setting element to permit date element as a direct child;
added a key attribute to the distance element, for consistency with other elements in its class;
changed content model for orgName element to make it more consistent with e.g. persName;
changed content model for opener element to include argument, byline, and epigraph;
changed content models for app, rdgGrp, and wit elements;
revised attributes on hand element.

Finally, a number of content models were changed with a view to easing the creation of an XML-compatible version of the Guidelines. Specifically:
    removed ampersand connectors from cit, respStmt, publicationStmt, and graph;
    changed the mixed content models for sense, re, persName, placeName, geogName, dateStruct, timeStruct, and dateLine to make them XML conformant.
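
The effect of removing an ampersand connector can be pictured with a small DTD fragment. This is only a sketch: the element names demoAnd, demoSeq, demoAny, first, and second are invented for illustration and are not the actual declarations of cit, respStmt, publicationStmt, or graph, and the fragment does not claim to show which rewriting the editors chose for those elements. The declarations are written in SGML syntax; only the content models differ.

```sgml
<!-- Hypothetical declarations, for illustration only -->

<!-- SGML only: first and second must both occur, in either order -->
<!ELEMENT demoAnd - - (first & second) >

<!-- No ampersand: both orders spelled out, accepting exactly the same content -->
<!ELEMENT demoSeq - - ((first, second) | (second, first)) >

<!-- No ampersand: a repeatable choice, accepting a superset of the above -->
<!ELEMENT demoAny - - (first | second)+ >
```

The second form keeps the original constraint but grows quickly as members are added; the third is simpler and, in keeping with the goal stated above, accepts everything the ampersand form did, at the cost of also accepting content in which a member is repeated or missing.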
Outstanding errors

A small number of other known problems remain uncorrected in this version and are briefly listed below. Please watch the TEI mailing list for announcements of their correction.

elements of class inter don't always behave as they should (e.g. one cannot insert a table before anything else in a div);
some mixed-content problems consequent on the definition of specialPara need to be addressed systematically; in particular, the treatment of list items or notes which contain several paragraphs continues to surprise many users: no white space is allowed between the paragraphs;
the resp attributes on editorial elements are not consistently defined;
the discussions of DTD invocation, and the DTD itself, all use system identifiers instead of formal public identifiers.

Our next priority however will be the production of a fully XML-compliant version of the TEI DTD, work on which is already well advanced.

C. M. Sperberg-McQueen and Lou Burnard, May 1999

Note

These Guidelines are the result of over five years' effort by members of the research and academic community within the framework of an international cooperative project called the Text Encoding Initiative (TEI), established in 1987 under the joint sponsorship of the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing.

The impetus for the project came from the humanities computing community, which sought a common encoding scheme for complex textual structures in order to reduce the diversity of existing encoding practices, simplify processing by machine, and encourage the sharing of electronic texts. It soon became apparent that a sufficiently flexible scheme could provide solutions for text encoding problems generally. The scope of the TEI was therefore broadened to meet the varied encoding requirements of any discipline or application. Thus, the TEI became the only systematized attempt to develop a fully general text encoding model and set of encoding conventions based upon it, suitable for processing and analysis of any type of text, in any language, and intended to serve the increasing range of existing (and potential) applications and use.

What is published here is a major milestone in this effort. It provides a single, coherent framework for all kinds of text encoding which is hardware-, software- and application-independent. Within this framework, it specifies encoding conventions for a number of key text types and features. The ongoing work of the TEI is to extend the scheme presented here to cover additional text types and features, as well as to continue to refine its encoding recommendations on the basis of extensive experience with their actual application and use.

We therefore offer these Guidelines to the user community for use in the same spirit of active collaboration and cooperation with which they have so far been developed. The TEI is committed to actively supporting the widespread and large-scale use of the Guidelines which, with the publication of this volume, is now for the first time possible. In addition, we anticipate that users of the TEI Guidelines will in some instances adapt and extend them as necessary to suit particular needs; we invite such users to engage in the further development of the Guidelines by working with us as they do so.

Like any standard which is actually used, these Guidelines do not represent a static finished work, but rather one which will evolve over time with the active involvement of its community of users. We invite and encourage the participation of the user community in this process, in order to ensure that the TEI Guidelines become and remain useful in all sorts of work with machine-readable texts.

This document was made possible in part by financial support from the U.S. National Endowment for the Humanities, an independent federal agency; Directorate General XIII of the Commission of the European Communities; the Andrew W. Mellon Foundation; and the Social Science and Humanities Research Council of Canada. Direct and indirect support has also been received from the University of Illinois at Chicago, the Oxford University Computing Services, the University of Arizona, the University of Oslo, Queen's University (Kingston, Ont.), and Ohio State University.

The production of this document has been greatly facilitated by the willingness of many software vendors to provide us with evaluation versions of their products. Most parts of this text have been processed at some time by almost every currently available SGML-aware software system. In particular, we gratefully acknowledge the assistance of the following vendors: Berger-Levrault AIS s.a. (for Balise); E2S n.v. (for E2S Advanced SGML Editor); Electronic Book Technology (for DynaText); SEMA Group and Yard Software (for Mark-It and Write-It); Software Exoterica (for CheckMark and Xtran); SoftQuad, Inc. (for Author/Editor and RulesBuilder); WordPerfect Corporation (for Intellitag); Xerox Corporation (for Ventura Publisher).

Details of the software actually used to produce the current document are given in the colophon at the end of the work.

Acknowledgments

Many people have given of their time, energy, expertise, and support in the creation of this document; it is unfortunately not possible to thank them all adequately. Below are listed those who have served as formal members of the TEI's Work Groups and Working Committees during its six-year history; others not so officially enfranchised also contributed much to the quality of the result.

The editors take this opportunity to acknowledge our debt to those who have patiently endured and corrected our misunderstandings of their work; we hope that they will feel the wait has not been in vain. For any errors and inconsistencies remaining, we must accept responsibility; any virtue in what is here presented, we gladly ascribe to the energies of the keen intellects listed below.

C. M. Sperberg-McQueen and Lou Burnard

TEI Working Committees (1990-1993)

Not all members listed were able to serve throughout the development of the Guidelines.

Committee on Text Documentation

Chair: Dominik Wujastyk (Wellcome Institute for the History of Medicine)

Members 1990-1992: J. D. Byrum (Library of Congress); Marianne Gaunt (Rutgers University); Richard Giordano (Manchester University); Barbara Ann Kipfer (Independent Consultant); Hans Jørgen Marker (Danish Data Archive, Odense); Marcia Taylor (University of Essex).

Committee on Text Representation

Chair: Stig Johansson (University of Oslo)

Members 1990-1992: Roberto Cencioni (Commission of the European Communities); David R. Chesnutt (University of South Carolina); Robin C. Cover (Dallas Theological Seminary); Steven J. DeRose (Electronic Book Technology Inc); David G. Durand (Boston University); Susan M. Hockey (Oxford University Computing Service); Claus Huitfeldt (University of Bergen); Francisco Marcos-Marin (University Madrid); Elli Mylonas (Harvard University); Wilhelm Ott (University of Tübingen); Allen H. Renear (Brown University); Manfred Thaller (Max-Planck-Institut für Geschichte, Göttingen)

Committee on Text Analysis and Interpretation

Chair: D. Terence Langendoen (University of Arizona)

Members 1990-1992: Robert Amsler (Bell Communications Research); Stephen Anderson (Johns Hopkins University); Branimir Boguraev (IBM T. J. Watson Research Center); Nicoletta Calzolari (University of Pisa); Robert Ingria (Bolt Beranek Newman Inc); Winfried Lenders (University of Bonn); Mitch Marcus (University of Pennsylvania); Nelleke Oostdijk (University of Nijmegen); William Poser (Stanford University); Beatrice Santorini (University of Pennsylvania); Gary Simons (Summer Institute of Linguistics); Antonio Zampolli (University of Pisa).

Committee on Metalanguage and Syntax

Chair: David T. Barnard (Queen's University)

Members 1990-1992: David G. Durand (Boston University); Jean-Pierre Gaspart (Associated Consultants and Software Engineers sa/nv); Nancy M. Ide (Vassar College); Lynne A. Price (Software Exoterica / Xerox PARC); Frank Tompa (University of Waterloo); Giovanni Battista Varile (Commission of the European Communities).

In addition, the two TEI editors served ex officio on each committee.

Following publication of the first draft of the TEI Guidelines (P1) in November 1990, a number of specialist work groups were charged with responsibility for drafting revisions and extensions, which, together with material already presented in P1, constitute the basis of the present work.

In addition, many members of the work groups listed below met on three occasions to review the emerging proposals in detail as members of the TEI Technical Review Committee. These meetings, held in Myrdal, Norway (December 1991), Chicago (June 1992), and Oxford (March 1993), were largely responsible for the technical content and organization of the present work. Attendants at these meetings are starred in the list below.

Chair: Harry Gaylord* (University of Groningen); Syun Tutiya* (Chiba University).

Advisory Board

Members of the TEI Advisory Board during the lifetime of the project are listed below, grouped under the name of the organization represented.

Steering Committee Membership

Members of the Steering Committee of the TEI during the preparation of this work were:

1987-1993: Robert A. Amsler (Bell Communications Research);
1987-1993: Donald E. Walker (Bell Communications Research);
1993- : Susan Armstrong-Warwick (University of Geneva);
1994- : Judith Klavans (Columbia University).
1987- : Nancy M. Ide (Vassar College);
1987-1994: C. M. Sperberg-McQueen (University of Illinois at Chicago);
1994- : David Barnard (Queen's University).
1987- : Susan M. Hockey (Center for Electronic Texts in the Humanities);
1987- : Antonio Zampolli (University of Pisa).

Changes from TEI P1 to TEI P3

This list gives a partial indication of the major changes from Versions 1 and 2 of these Guidelines (issued by the TEI as drafts under the document numbers TEI P1 and TEI P2, the latter released in chapters between March 1992 and the end of 1993) to the current text.

Chapter : this chapter corresponds to chapter 1 of TEI P1 and TEI P2; it has been reorganized, revised, and expanded, and a new section explaining the notational conventions of this document has been added.

Chapter : this is a slightly revised version of chapter 2 of TEI P1 and P2. Brief discussions of parameter entities and marked sections have been added, but no other changes of substance have been made.

Chapter : this chapter was introduced in TEI P2; the lists of classes and important parameter entities have been updated, and some declarations have been reordered; no other changes have been made.

Chapter : this chapter corresponds to some material in chapter 3 of TEI P1, but presents it in what is hoped to be a more accessible form. No substantive changes have been made since its publication as part of TEI P2.

Chapter : this is a revised and much expanded version of chapter 4 of TEI P1. The overall structure of the TEI header has been retained, but most of the elements have been renamed to match a new set of naming conventions. The encoding.declarations element of TEI P1 has been split into the encodingDesc and profileDesc elements, the former concentrating on the process by which the electronic text has been encoded, the latter on the non-bibliographic characteristics of the text itself. A number of specialized declarations have been added to both these sections of the header, in order to allow the formal specification of important information about the text and its encoding.

Chapter : this chapter corresponds to sections 5.3 to 5.6, portions of 5.7, and 5.8 of the first public draft of these Guidelines (TEI P1). Changes made to this material in this version include: The individual sections have been reordered and reorganized. Highlighting and quotation marks are treated together. The tags for names and dates have been revised, and a separate additional tag set has been provided for detailed analysis of names and dates (chapter ). The tags for simple editorial interventions have been revised; the new set includes several complementary pairs of elements, so that the encoder is consistently given the choice of recording the original text, or an editorial modification of it, as data content, and the other as an optional attribute value. The tags for bibliographic references have been renamed (from citn to bibl, etc.) and a new form (biblFull), corresponding to the structure of the TEI header, has been added. The treatment of canonical reference systems has been thoroughly revised and the discussion is now supplemented by discussions in chapter , and chapter .

Chapter : this chapter corresponds to section 5.2 of TEI P1. Changes made to this material in this version include: The theoretical discussion of alternative methods of constructing a tag set for overall text structure has been suppressed. The tags for elements of a title-page have been renamed. The specialized tags for divisions of front matter and back matter (foreword, acknowledgements, dedication, colophon, etc.) have been deleted; like those of the text body, these elements may be tagged with generic div elements. In addition to numbered div elements, the current draft also allows for un-numbered generic divs. The treatment of collections and anthologies is explicitly discussed, building on section 7.2 of TEI P1, and the group element is introduced to deal with them.

Chapter , chapter , and chapter : these correspond to the subparts of section 7.3 of TEI P1, but have been completely redesigned and rewritten from scratch.

Chapter : this chapter first appeared in TEI P2; it has been revised here to match changes in the overall design of the Guidelines since its publication. Most importantly, this tag set now uses the default text-structure elements described in chapter , and the methods for handling overlap and other time-specific information have been revised to make use of the techniques described in chapter .

Chapter : the tag set presented in this chapter is a complete revision of that described in section 7.4 of TEI P1, and the chapter itself was entirely rewritten from scratch.

Chapter : this chapter was first published in December, 1993, as part of TEI P2. Since that publication, it has been revised slightly for the sake of consistency with the rest of the Guidelines and with the work of Technical Committee 37 of the International Organization for Standardization (ISO) on ISO DIS 12 200.

Chapter : this chapter corresponds to sections 5.7 (Links and Cross References) and sections 6.2.3 through 6.2.5 (Alignment of Multiple Analyses, etc.) of TEI P1. Changes made to this material in this version include: The xref element of P1 has been split into the two elements ptr and xptr, of which the former is used to point at IDs within the document and the latter for pointing outside the document or for pointing at passages without IDs in the current document. The elements ref and xref have been added to provide pointer elements which can accept character content, for cases in which the pointing phrase of the source text cannot be reconstructed algorithmically. The extended pointer syntax has been substantially revised and systematized; the syntax and semantics of extended pointers have been defined more precisely. The unit and level elements defined by P1 for implicit alignment of multiple levels of analysis have been dropped; in their stead, the revised feature structure elements should be used. These are defined in chapter . The elements alignment, al.map, al.ptr, al.list, al.range, defined in P1 for explicit alignment of multiple texts or analyses, have been replaced by the link, linkGrp, corresp and correspGrp elements. Because the link and corresp elements can point at multiple targets, there is no need for special alignment pointers, alignment lists, or alignment range elements. The elements link and corresp may be used in connection with xptr to align elements in external entities or passages which do not bear SGML identifiers. Since the publication of this chapter in TEI P2, it has been revised, the section on alternation has been added, new examples have been introduced, and the extended pointer syntax has been revised. The extended pointer syntax is now also used to specify canonical reference systems, as well as in the xptr and xref elements.
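
As a rough illustration of the division of labour between ptr and xptr described above, the following sketch contrasts the two. It assumes the attribute names target (on ptr) and doc and from (on xptr), and an extended pointer of the form id (...); the identifiers p12 and sec2 and the entity name otherdoc are invented for the example. The chapter on linking remains the authority for the actual syntax.

```sgml
<!-- ptr: points at an element in the same document by its ID -->
<p id="p12">A paragraph that can be pointed at.</p>
<p>See <ptr target="p12">.</p>

<!-- xptr: points into another document; "otherdoc" is a hypothetical
     entity name, and the from value is an extended pointer -->
<p>See also <xptr doc="otherdoc" from="id (sec2)">.</p>
```

Both elements are empty: they mark the place from which the reference is made but carry no text of their own, which is why ref and xref exist for cases where the referring phrase must be kept as content.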

Chapter : the bulk of this chapter is new, though some parts of its substance derive from chapter 6 of TEI P1. The global ana and inst attributes have been added, to simplify the notation for simple forms of alignment between text and analyses; the elements span and interp have been introduced, to simplify the specification of analyses which do not require the structural rigor of feature structures.

Chapter : this chapter derives from sections 6.2.1 and 6.3 of TEI P1. In its broad outlines, the feature structure notation introduced there is retained. The most important changes include these: Some elements have been renamed. Feature names and feature structure names are now represented as attribute values, rather than as embedded subelements. The treatment of Boolean logic has been substantially changed.

Chapter : this chapter was introduced in TEI P2; its wording has been revised slightly since then, but the tags described remain the same.

Chapter : this chapter presents new material on the use of the core tags for editorial intervention, and on specialized problems in the transcription of primary source material, especially manuscripts.

Chapter : this chapter is a substantial revision of section 5.10 (Critical Apparatus) of TEI P1. The major changes include the following: The single end-point attachment method of encoding critical apparatus has been dropped. A new method of apparatus encoding, the location-referenced method, has been introduced to simplify the transcription of existing critical editions. The problem of subvariation is treated more explicitly. The witList element has been introduced for the purpose of identifying all the witnesses whose readings are recorded in the apparatus. The treatment of detailed information about a particular reading in a particular witness (in the witDetail element) has been changed somewhat.

Chapter : this chapter is new in this version.

Chapter : this chapter replaces section 5.9 of TEI P1; it provides a small but usable tag set for tables, and describes in much more detail the process of including graphical information (figures, illustrations, etc.) in TEI-encoded texts.

Chapter : this chapter builds on section 7.2 of TEI P1, but its contents are largely new. The tag set described here provides much fuller methods for documenting text type, subject area, and demographic characteristics of speakers, listeners, authors, etc. associated with the texts of a corpus.

Chapter : this chapter was introduced in TEI P2; it has been slightly revised since.

Chapter : this chapter derives from the writing system declaration described in chapter 3 (Characters and Character Sets) of TEI P1. The structure of the WSD has been changed slightly, and the chapter now gives an explicit account of the semantics of specifying base character sets, entity sets, or WSDs, and of modifying them using the exceptions element.

Chapter : this chapter is new in this version of these Guidelines.

Chapter : this chapter first appeared in TEI P2; it has not changed substantially since.

Chapter , chapter , chapter , chapter , and chapter : these chapters are all new in the current version of these Guidelines (though the mechanisms of modifying the TEI DTDs described in chapter remain the same as those described in chapter 8 of TEI P1). The definition of conformance provided in this version of these Guidelines differs from that of TEI P1 primarily in making more explicit the nature of the requirement that extensions to the tag set be documented, in specifying the nature of the DTD modifications allowed in TEI-conformant documents, and in completely divorcing the issue of TEI-conformance from that of the character sets used in the document.

The alphabetical reference list of classes, entities, and elements was introduced in TEI P2; in this version, slightly fuller information is given. For element classes, lists of members are given which include members of all subclasses, and the declarations of the a-dot and m-dot parameter entities for the class are reproduced. The files in which entities and elements are declared are also given.

Chapter : this chapter appeared in TEI P2 and has not been revised for this version of these Guidelines.