The Association for Computers and the Humanities (ACH)
The Association for Computational Linguistics (ACL)
The Association for Literary and Linguistic Computing (ALLC)
Guidelines for Electronic
Text Encoding and Interchange
Edited by C. M. Sperberg-McQueen
and Lou Burnard
TEI P3
Text Encoding Initiative
Chicago, Oxford
Copyright (c) 1990, 1992, 1993, 1994 ACH, ACL, ALLC
16 May 1994
Revised Reprint, Oxford, May 1999
In memoriam
Donald E. Walker
22 November 1928 - 26 November 1993
Introductory Note (May 1999)
No work of the size and complexity of the TEI
Guidelines could reasonably be expected to be error-free on
publication, nor to remain long uncorrected. It has, however, taken rather longer
than might have been anticipated to complete production of the present
corrected reprint of the first edition, for which we present our apologies,
both to the many individuals and institutions whose enthusiastic adoption and
promotion of the TEI encoding scheme have ensured its continued survival in the
rapidly changing world of digital scholarship, and also to the many helpfully
critical users whose assiduous uncovering and reporting of our errors have made
possible the present revision.
At its first meeting in Bergen, in June 1996, the TEI Technical Review
Committee (TRC) approved the setting up of a small working committee to oversee the
production of a revised edition of the TEI
Guidelines, to include corrections of as many as possible of the
'corrigible errors' notified to the editors since publication
of the first edition in May 1994, the bulk of which are summarized in a TEI
working paper (TEI EDW67, available from the TEI website).
During the spring of 1997, this TRC Core Subcommittee reviewed nearly 200
comments and proposals which the editors had collected from public debate and
discussion over the preceding two years, and provided invaluable technical
guidance in disposition of them. We are glad to take this opportunity of
expressing our thanks to this subcommittee, whose members were Elli Mylonas,
Dominic Dunlop, and David T. Barnard.
The work of making the corrections and regenerating the text proceeded
rather fitfully during 1998 and 1999, largely because of increasing demands on
the editors' time from their other responsibilities. With the
establishment of the new TEI Consortium, it is to be hoped that maintenance of the
Guidelines will be placed on a more secure footing. Some specific areas in which we
anticipate future revisions being carried out are listed below.
Typographic corrections made
examples of TEI markup throughout the text were all checked against the
relevant DTD fragment, and an embarrassingly large number of tagging errors
were corrected;
various minor typographic and spelling errors were corrected;
the corrigible errors listed in working paper TEI EDW67 were all
corrected: some of these required specific changes to the DTD, which are listed
in the next section.
Specific changes in the DTD
A major goal of this revision was to avoid changes which might invalidate
existing data, even where existing constructs seemed erroneous in retrospect.
To that end, wherever changes have been made in content models for existing
elements, they have as far as possible been made so that the DTD will now
accept a superset of what was previously legal. Only one new element
(ab) has been added.
Where possible, a few content models have been changed in such a way as to
facilitate conversion to XML, but XML compatibility is not a goal
of this revision.
Brief details of all changes made in the DTD follow:
Several changes were made in class membership, in order to correct
unreachability problems. Specifically:
the elements geogName, persName, and
placeName were added to the m.data class; geogName and placeName were removed from the
m.placepart class;
the elements addSpan, delSpan, and gap were
added to the m.Edit class; a new class m.editIncl was defined, with members
addSpan, delSpan, and gap; this class was then added
to the global inclusion class
m.globIncl along with
anchor (erroneously a member of the
m.Seg class, from
which it is now removed), m.metadata, and m.refsys;
added the name element to the m.addrPart class;
added dateLine to the m.divtop and m.divbot classes;
added epilogue and castList to the m.dramafront class;
added divGen to the m.front class;
added the u element to the a.declaring class;
defined a new class m.fmchunk (front matter chunk),
comprising argument, byline, docAuthor,
docDate, docEdition, docImprint, docTitle,
epigraph, head, and titlePart, for use in
simplifying the content model for the front element;
defined a new element ab (anonymous block), and added it to the
m.chunk class;
corrected an error whereby global attributes were not properly defined
for elements specifying a non-default value for any of the
a.global attributes: elements affected include foreign,
hi, del, pb, lb, cb,
language, anchor, and when;
changed content models to permit empty list and empty
availability elements;
changed the content model for the series element to permit #PCDATA;
changed the content model for the setting element to permit the
date element as a direct child;
added a key attribute to the distance element,
for consistency with other elements in its class;
changed the content model for the orgName element to make it more
consistent with, e.g., persName;
changed the content model for the opener element to include
argument, byline, and epigraph;
changed the content models for the app, rdgGrp, and
wit elements;
revised the attributes on the hand element.
Finally, a number of content models were changed with a view to
easing the creation of an XML-compatible version of the
Guidelines. Specifically:
removed ampersand connectors from cit,
respStmt, publicationStmt, and graph;
changed the mixed content models for sense, re, persName,
placeName, geogName, dateStruct, timeStruct, and dateLine
to make them XML-conformant.
Outstanding errors
A small number of other known problems remain uncorrected in this version
and are briefly listed below. Please watch the TEI mailing list for
announcements of their correction.
elements of class inter do not always behave
as they should (e.g. one cannot insert a table before
anything else in a div);
some mixed-content problems consequent on the definition of
specialPara need to be addressed systematically;
in particular, the treatment of list items or notes which contain
several paragraphs continues to surprise many users: no white
space is allowed between the paragraphs;
the resp attributes on editorial elements are not
consistently defined;
the discussions of DTD invocation, and the DTD itself, all use
system identifiers instead of formal public identifiers.
Our next priority, however, will be the production of a fully XML-compliant version
of the TEI DTD, work on which is already well advanced.
C. M. Sperberg-McQueen and Lou Burnard, May 1999
Note
These Guidelines are the result of over five years' effort by
members of the research and academic community within the
framework of an international cooperative project called the Text
Encoding Initiative (TEI), established in 1987 under the joint
sponsorship of the Association for Computers and the Humanities,
the Association for Computational Linguistics, and the Association
for Literary and Linguistic Computing.
The impetus for the project came from the humanities computing
community, which sought a common encoding scheme for complex textual
structures in order to reduce the diversity of existing encoding
practices, simplify processing by machine, and encourage the sharing of
electronic texts. It soon became apparent that a sufficiently flexible
scheme could provide solutions for text encoding problems generally. The
scope of the TEI was therefore broadened to meet the varied encoding
requirements of any discipline or application. Thus, the TEI became the
only systematized attempt to develop a fully general text encoding model
and set of encoding conventions based upon it, suitable for processing
and analysis of any type of text, in any language, and intended to serve
the increasing range of existing (and potential) applications and use.
What is published here is a major milestone in this effort. It
provides a single, coherent framework for all kinds of text encoding
which is hardware-, software- and application-independent. Within this
framework, it specifies encoding conventions for a number of key text
types and features. The ongoing work of the TEI is to extend the
scheme presented here to cover additional text types and features, as
well as to continue to refine its encoding recommendations on the
basis of extensive experience with their actual application and use.
We therefore offer these Guidelines to the user community for use in
the same spirit of active collaboration and cooperation with which
they have so far been developed. The TEI is committed to actively
supporting the widespread and large-scale use of the Guidelines
which, with the publication of this volume, is now for the first time
possible. In addition, we anticipate that users of the TEI Guidelines
will in some instances adapt and extend them as necessary to suit
particular needs; we invite such users to engage in the further
development of the Guidelines by working with us as they do so.
Like any standard which is actually used, these Guidelines do not
represent a static finished work, but rather one which will evolve
over time with the active involvement of its community of users. We
invite and encourage the participation of the user community in
this process, in order to ensure that the TEI Guidelines become and
remain useful in all sorts of work with machine-readable texts.
This document was made possible in part by financial support from
the U.S. National Endowment for the Humanities, an independent federal
agency; Directorate General XIII of the Commission of the European
Communities; the Andrew W. Mellon Foundation; and the Social Science
and Humanities Research Council of Canada. Direct and indirect support
has also been received from the University of Illinois at Chicago,
the Oxford University Computing Services, the University of Arizona,
the University of Oslo, Queen's University (Kingston, Ont.),
and Ohio State University.
The production of this document has been greatly facilitated by the
willingness of many software vendors to provide us with evaluation
versions of their products. Most parts of this text have been processed
at some time by almost every currently available SGML-aware software
system. In particular, we gratefully acknowledge the
assistance of the following vendors:
Berger-Levrault AIS s.a. (for Balise);
E2S n.v. (for E2S Advanced SGML Editor);
Electronic Book Technology (for DynaText);
SEMA Group and Yard Software (for Mark-It and Write-It);
Software Exoterica (for CheckMark and Xtran);
SoftQuad, Inc. (for Author/Editor and RulesBuilder);
WordPerfect Corporation (for Intellitag);
Xerox Corporation (for Ventura Publisher).
Details of the software actually used to produce the current document
are given in the colophon at the end of the work.
Acknowledgments
Many people have given of their time, energy, expertise, and support
in the creation of this document; it is unfortunately not possible to
thank them all adequately. Below are listed those who have served as
formal members of the TEI's Work Groups and Working Committees during
its six-year history; others not so officially enfranchised also
contributed much to the quality of the result.
The editors take this opportunity to acknowledge our debt to those
who have patiently endured and corrected our misunderstandings of their
work; we hope that they will feel the wait has not been in vain. For
any errors and inconsistencies remaining, we must accept responsibility;
any virtue in what is here presented, we gladly ascribe to the energies
of the keen intellects listed below.
C. M. Sperberg-McQueen and Lou Burnard
TEI Working Committees (1990-1993)
Not all members listed were able to serve throughout the development
of the Guidelines.
Committee on Text Documentation:
Chair: Dominik Wujastyk (Wellcome Institute for the History of
Medicine)
Members 1990-1992: J. D. Byrum (Library of Congress);
Marianne Gaunt (Rutgers University);
Richard Giordano (Manchester University);
Barbara Ann Kipfer (Independent Consultant);
Hans Jørgen Marker (Danish Data Archive, Odense);
Marcia Taylor (University of Essex).
Committee on Text Representation
Chair: Stig Johansson (University of Oslo)
Members 1990-1992: Roberto Cencioni (Commission of the European
Communities);
David R. Chesnutt (University of South Carolina);
Robin C. Cover (Dallas Theological Seminary);
Steven J. DeRose (Electronic Book Technology Inc);
David G. Durand (Boston University);
Susan M. Hockey (Oxford University Computing Service);
Claus Huitfeldt (University of Bergen);
Francisco Marcos-Marin (University Madrid);
Elli Mylonas (Harvard University);
Wilhelm Ott (University of Tübingen);
Allen H. Renear (Brown University);
Manfred Thaller (Max-Planck-Institut für Geschichte,
Göttingen)
Committee on Text Analysis and Interpretation
Chair: D. Terence Langendoen (University of Arizona)
Members 1990-1992:
Robert Amsler (Bell Communications Research);
Stephen Anderson (Johns Hopkins University);
Branimir Boguraev (IBM T. J. Watson Research Center);
Nicoletta Calzolari (University of Pisa);
Robert Ingria (Bolt Beranek Newman Inc);
Winfried Lenders (University of Bonn);
Mitch Marcus (University of Pennsylvania);
Nelleke Oostdijk (University of Nijmegen);
William Poser (Stanford University);
Beatrice Santorini (University of Pennsylvania);
Gary Simons (Summer Institute of Linguistics);
Antonio Zampolli (University of Pisa).
Committee on Metalanguage and Syntax
Chair: David T. Barnard (Queen's University);
David G. Durand (Boston University);
Jean-Pierre Gaspart (Associated Consultants and
Software Engineers sa/nv);
Nancy M. Ide (Vassar College);
Lynne A. Price (Software Exoterica / Xerox PARC);
Frank Tompa (University of Waterloo);
Giovanni Battista Varile (Commission of the European Communities).
In addition, the two TEI editors served ex officio on each
committee.
Following publication of the first draft of the TEI Guidelines (P1)
in November 1990, a number of specialist work groups were charged with
responsibility for drafting revisions and extensions, which, together
with material already presented in P1, constitute the basis of the
present work.
In addition, many members of the work groups listed below met on
three occasions to review the emerging proposals in detail as members
of the TEI Technical Review Committee. These meetings, held in Myrdal
Norway (December 1991), Chicago (June 1992), and Oxford (March 1993),
were largely responsible for the technical content and organization of
the present work. Attendees at these meetings are marked with an asterisk
in the list below.
Chair: Harry Gaylord* (University of Groningen);
Syun Tutiya* (Chiba University).
Advisory Board
Members of the TEI Advisory Board during the lifetime of the
project are listed below, grouped under the name of the organization
represented.
American Anthropological Association:
Chad McDaniel (University of Maryland).
American Historical Association:
Elizabeth A. R. Brown (Brooklyn College, CUNY).
American Philological Association:
Jocelyn Penny Small (Rutgers University).
American Philosophical Association:
Allen Renear (Brown University).
American Society for Information Science:
Clifford A. Lynch (University of California).
Association for Computing Machinery, Special Interest Group for
Information Retrieval:
1989-93: Scott Deerwester (University of Chicago); 1993- :
Martha Evens (Illinois Institute of Technology).
Association for Documentary Editing:
David Chesnutt (University of South Carolina).
Association for History and Computing:
1989-91: Manfred Thaller (Max-Planck-Institut für
Geschichte, Göttingen); 1991- : Daniel Greenstein (Glasgow
University).
Association Internationale Bible et Informatique:
1989-93: Wilhelm Ott (University of Tübingen); 1993- :
Winfried Bader (University of Tübingen).
Canadian Linguistic Association:
Anne-Maria di Sciullo (Université du Québec
à Montréal).
Dictionary Society of North America:
Barbara Ann Kipfer (independent consultant).
AAP Electronic Publishing Special Interest Group:
1989-92: Betsy Kiser (OCLC); 1992- :
Deborah Bendig and Andrea Keyhani (OCLC).
International Federation of Library Associations and
Institutions:
J. D. Byrum Jr. (The Library of Congress).
Linguistic Society of America:
Stephen Anderson (The Johns Hopkins University).
Modern Language Association:
Randall Jones (Brigham Young University) and
Ian Lancashire (University of Toronto).
Steering Committee Membership
Members of the Steering Committee of the TEI during the preparation
of this work were:
Association for Computational Linguistics:
1987-1993: Robert A. Amsler (Bell Communications Research);
1987-1993: Donald E. Walker (Bell Communications Research);
1993- : Susan Armstrong-Warwick (University of Geneva);
1994- : Judith Klavans (Columbia University).
Association for Computers and the Humanities:
1987- : Nancy M. Ide (Vassar College);
1987-1994: C. M. Sperberg-McQueen (University of Illinois at
Chicago);
1994- : David Barnard (Queen's University).
Association for Literary and Linguistic Computing:
1987- : Susan M. Hockey (Center for Electronic Texts in the
Humanities);
1987- : Antonio Zampolli (University of Pisa).
Changes from TEI P1 to TEI P3
This list gives a partial indication of the major changes from
Versions 1 and 2 of these Guidelines (issued by the TEI as drafts
under the document numbers TEI P1 and TEI P2, the latter released in
chapters between March 1992 and the end of 1993) to the current text.
Chapter : this chapter corresponds to chapter 1 of
TEI P1 and TEI P2; it has been reorganized, revised, and expanded, and a
new section explaining the notational conventions of this document has
been added.
Chapter : this is a slightly revised version of
chapter 2 of TEI P1 and P2. Brief discussions of parameter entities and
marked sections have been added, but no other changes of substance have
been made.
Chapter : this chapter was introduced in TEI P2;
the lists of classes and important parameter entities have been updated,
and some declarations have been reordered; no other changes have been
made.
Chapter : this chapter corresponds to some material
in chapter 3 of TEI P1, but presents it in what is hoped to be a more
accessible form. No substantive changes have been made since its
publication as part of TEI P2.
Chapter : this is a revised and much expanded
version of chapter 4 of TEI P1. The overall structure of the TEI
header has been retained, but most of the elements have been renamed to
match a new set of naming conventions. The
encoding.declarations element of TEI P1 has been split into the
encodingDesc and profileDesc elements, the former
concentrating on the process by which the electronic text has been
encoded, the latter on the non-bibliographic
characteristics of the text itself. A number
of specialized declarations have been added to both these sections of
the header, in order to allow the formal specification of important
information about the text and its encoding.
Chapter : this chapter corresponds to sections 5.3
to 5.6, portions of 5.7, and 5.8 of the first public draft of these
Guidelines (TEI P1). Changes made to this material in this version
include:
The individual sections have been reordered and reorganized.
Highlighting and quotation marks are treated together.
The tags for names and dates have been revised, and a separate
additional tag set has been provided for detailed analysis of names
and dates (chapter ).
The tags for simple editorial interventions have been revised;
the new set includes several complementary pairs of elements, so that
the encoder is consistently given the choice of recording the original
text, or an editorial modification of it, as data content, and the other
as an optional attribute value.
The tags for bibliographic references have been renamed (from
citn to bibl, etc.) and a new form
(biblFull), corresponding to the structure of the TEI
header, has been added.
The treatment of canonical reference systems has been thoroughly
revised and the discussion is now supplemented by discussions in
chapter , and chapter .
Chapter : this chapter corresponds to section
5.2 of TEI P1. Changes
made to this material in this version include:
The theoretical discussion of alternative methods of constructing
a tag set for overall text structure has been suppressed.
The tags for elements of a title page have been renamed.
The specialized tags for divisions of front matter and back matter
(foreword, acknowledgements, dedication,
colophon, etc.) have been deleted; like those of the text body,
these elements may be tagged with generic div elements.
In addition to numbered div elements, the current draft
also allows for un-numbered generic divs.
The treatment of collections and anthologies is explicitly
discussed, building on section 7.2 of TEI P1, and the group
element is introduced to deal with them.
Chapter , chapter , and chapter : these correspond to the subparts of section 7.3 of TEI P1,
but have been completely redesigned and rewritten from scratch.
Chapter : this chapter first appeared in TEI P2;
it has been revised here to match changes in the overall design of the
Guidelines since its publication. Most importantly, this tag set now
uses the default text-structure elements described in chapter , and the methods for handling overlap and other time-specific
information have been revised to make use of the techniques
described in chapter .
Chapter : the tag set presented in this chapter is
a complete revision of that described in section 7.4 of TEI P1, and the
chapter itself was entirely rewritten from scratch.
Chapter : this chapter was first published in
December, 1993, as part of TEI P2. Since that publication, it has been
revised slightly for the sake of consistency with the rest of the
Guidelines and with the work of Technical Committee 37 of the
International Organization for Standardization (ISO) on ISO DIS 12 200.
Chapter : this chapter corresponds to sections 5.7
(Links and Cross References) and sections 6.2.3 through 6.2.5
(Alignment of Multiple Analyses, etc.) of TEI P1. Changes made
to this material in this version include:
The xref element of P1 has been split into the two
elements ptr and xptr, of which the former is used to
point at IDs within the document and the latter for pointing outside the
document or for pointing at passages without IDs in the current
document.
The elements ref and xref have been added to
provide pointer elements which can accept character content, for
cases in which the pointing phrase of the source text cannot be
reconstructed algorithmically.
The extended pointer syntax has been
substantially revised and systematized; the syntax and semantics
of extended pointers have been defined more precisely.
The unit and level elements defined by P1 for
implicit alignment of multiple levels of analysis have been dropped;
in their stead, the revised feature structure elements should be used.
These are defined in chapter .
The elements alignment, al.map, al.ptr,
al.list, al.range, defined in P1 for explicit
alignment of multiple texts or analyses, have been replaced by the
link, linkGrp, corresp and
correspGrp elements. Because the link and
corresp elements can point at multiple targets, there is no
need for special alignment pointers, alignment lists, or alignment range
elements.
The elements link and corresp may be used
in connection with xptr to align elements in external entities
or passages which do not bear SGML identifiers.
Since the publication of this chapter in TEI P2, it has been revised,
the section on alternation has been added, new examples have been
introduced, and the extended pointer syntax has been revised. The
extended pointer syntax is now also used to specify canonical reference
systems, as well as in the xptr and xref elements.
Chapter : the bulk of this chapter is new, though
some parts of its substance derive from chapter 6 of TEI P1. The global
ana and inst attributes have been added, to
simplify the notation for simple forms of alignment between text and
analyses; the elements span and interp have been
introduced, to simplify the specification of analyses which do not
require the structural rigor of feature structures.
Chapter : this chapter derives from sections 6.2.1
and 6.3 of TEI P1. In its broad outlines, the feature structure
notation introduced there is retained. The most important changes
include these:
Some elements have been renamed.
Feature names and feature structure names are now represented as
attribute values, rather than as embedded subelements.
The treatment of Boolean logic has been substantially changed.
Chapter : this chapter was introduced in TEI P2; its
wording has been revised slightly since then, but the tags described
remain the same.
Chapter : this chapter presents new material on the
use of the core tags for editorial intervention, and on specialized
problems in the transcription of primary source material, especially
manuscripts.
Chapter : this chapter is a substantial revision of
section 5.10 (Critical Apparatus) of TEI P1. The major changes
include the following:
The single end-point attachment method of encoding critical
apparatus has been dropped.
A new method of apparatus encoding, the location-referenced
method, has been introduced to simplify the transcription of existing
critical editions.
The problem of subvariation is treated more explicitly.
The witList element has been introduced for the purpose
of identifying all the witnesses whose readings are recorded in the
apparatus.
The treatment of detailed information about a particular
reading in a particular witness (in the witDetail element) has
been changed somewhat.
Chapter : this chapter is new in this version.
Chapter : this chapter replaces section 5.9 of TEI
P1; it provides a small but usable tag set for tables, and describes in
much more detail the process of including graphical information
(figures, illustrations, etc.) in TEI-encoded texts.
Chapter : this chapter builds on section 7.2 of TEI
P1, but its contents are largely new. The tag set described here
provides much fuller methods for documenting text type, subject area,
and demographic characteristics of speakers, listeners, authors, etc.
associated with the texts of a corpus.
Chapter : this chapter was introduced in TEI P2; it
has been slightly revised since.
Chapter : this chapter derives from the writing
system declaration (WSD) described in chapter 3 (Characters and Character
Sets) of TEI P1. The structure of the WSD has been changed
slightly, and the chapter now gives an explicit account of the semantics
of specifying base character sets, entity sets, or WSDs, and of
modifying them using the exceptions element.
Chapter : this chapter is new in this version of
these Guidelines.
Chapter : this chapter first appeared in TEI P2; it
has not changed substantially since.
Chapter ,
chapter ,
chapter ,
chapter , and
chapter : these chapters are all new in the current
version of these Guidelines (though the mechanisms of modifying
the TEI DTDs described in chapter remain the same as
those described in chapter 8 of TEI P1). The definition of conformance
provided in this version of these Guidelines differs from that of TEI P1
primarily in making more explicit the nature of the requirement that
extensions to the tag set be documented, in specifying the nature of the
DTD modifications allowed in TEI-conformant documents, and in
completely divorcing the issue of TEI-conformance from that of the
character sets used in the document.
The alphabetical reference list of classes, entities, and elements
was introduced in TEI P2; in this version, slightly fuller information
is given. For element classes, lists of members are given which include
members of all subclasses, and the declarations of the a-dot and m-dot
parameter entities for the class are reproduced. The files in which
entities and elements are declared are also given.
Chapter : this chapter appeared in TEI P2 and has
not been revised for this version of these Guidelines.