Info file: mule,    -*-Text-*-
produced by `texinfo-format-buffer'
from file `mule.texi'
using `texinfmt.el' version 2.32 of 19 November 1993.


This file documents the internal structure of Mule (MULtilingual
Enhancement to GNU Emacs).



File: mule, Node: Top, Next: Overview, Up: (dir)

Mule specific information
*************************

This file documents Mule specific information.

We are very sorry for the poor structure of this file.  Currently, this
is just a collection of several documents that were previously located
under the `doc' directory.

* Menu:

* Overview::            Overview of the internal structure of Mule
* Character::           Internal representation of multilingual text
* Coding-system::       Encoding of text while reading/writing
* Syntax::              Extended syntax and character category
* Font::                Font handling while displaying text on X
* CCL: (CCL).           Code Conversion Language
* Terminology: (terminology).
* Languages: (languages).  Language specific tips.
* X's font: (XFONT).    X's FONT usage for novice users
* R2L: (R2L).           Right-to-left writing
* EGG: (egg).           Japanese/Chinese inputting methods using Wnn/cWnn
* Quail: (quail).       Input methods for multilingual text
* Keyboard Translation: (kbd-trans).
* ISO2022: (ISO2022).   ISO2022 encoding mechanism
* m2ps: (m2ps).         Convert multilingual text to PostScript


File: mule, Node: Overview, Next: Character, Prev: Top, Up: Top

Overview
========

To handle multilingual text, Mule extends GNU Emacs in several ways: it
uses a special internal representation for multilingual text, converts
text to and from external representations when reading and writing, and
uses a special font selection mechanism to display multilingual text
under the X Window System.


File: mule, Node: Character, Next: Coding-system, Prev: Overview, Up: Top

Character
=========

* Menu:

* Character type::
* Buffer and string::
* Character object::
* GLYPH::
* Functions::
* Character set::


File: mule, Node: Character type, Next: Buffer and string, Up: Character

Character type
==============

There are six types of characters.  `Type N-M' means that a character
whose original code is N bytes long is represented by M bytes within
Mule.

*Type 1-1*
     ASCII characters
*Type 1-2*
     Characters in one-byte official character-set (e.g. ISO8859-1,
     Latin-1)
*Type 1-3*
     Characters in one-byte private character-set
*Type 2-3*
     Characters in two-byte official character-set (e.g. JISX0208,
     Japanese)
*Type 2-4*
     Characters in two-byte private character-sets
*Type N*
     Composite characters of variable length

*Note Character set:: for predefined character sets.


File: mule, Node: Buffer and string, Next: Character object, Prev: Character type, Up: Character

Buffer and string
=================

*Type 1-1*
     1-byte 'C' [C <= 0x7F] (same as the original representation)
*Type 1-2*
     2-byte sequence 'LC1 C1', where
     	LC1 = leading character for the character-set,
     		0x81..0x8F -- 15 sets
     	C1 = 0x80 | (original byte for the character),
     		0xA0 <= C1 <= 0xFF
*Type 1-3*
     3-byte sequence 'LCPRV1 LC12 C1', where
     	LCPRV1 = 0x9A (for one column) or 0x9B (for two column)
     	LC12 = extended leading character,
     		0xA0 <= LC12 <=0xDF (if LCPRV1 = 0x9A) -- 64 sets
     		0xE0 <= LC12 <=0xEF (if LCPRV1 = 0x9B) -- 16 sets
     	C1 = same as above
*Type 2-3*
     3-byte sequence 'LC2 C21 C22', where
     	LC2 = leading character for the character-set,
     		0x90 <= LC2 <= 0x99 -- 10 sets
     	C21 = 0x80 | (original first byte for the character),
     	C22 = 0x80 | (original second byte for the character),
     		0xA0 <= C21,C22 <= 0xFF
*Type 2-4*
     4-byte sequence 'LCPRV2 LC22 C21 C22', where
     	LCPRV2 = 0x9C (for one column) or 0x9D (for two columns)
     	LC22 = extended leading character,
     		0xF0 <= LC22 <=0xF4 (if LCPRV2 = 0x9C) -- 5 sets
     		0xF5 <= LC22 <=0xFE (if LCPRV2 = 0x9D) -- 10 sets
     	C21, C22 = same as above
*Type N*
     n-byte sequence 'LCCMP LCN1 C11 ... LCN2 C21 ... LCNn Cn1 ...', where
     	all component characters 'LCN1 C11 ...', 'LCN2 C21 ...', ...,
     	'LCNn Cn1 ...' are displayed in the same column,
     	LCCMP = 0x80,
     	LCN1 .. LCNn = leading character + 0x20, or 0xA0 for ASCII

Here is an example of text that mixes these types (each 0x?? stands for
the actual binary byte):

"Here comes the Latin-1 character n with tilde '0x81 0xF1' and here comes
the Japanese Hiragana A '0x92 0xA4 0xA2'."
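
As an illustration only, the layout above can be written as a small C
routine that builds the internal byte sequence of one character.  The
function name `mule_encode' and the convention that a leading char of 0
means ASCII are assumptions of this sketch, not part of Mule; the Type
2-4 prefixes follow the LC22 ranges listed above, and composite
characters (Type N) are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not Mule's own C API): write the internal byte
   sequence for one character into OUT (room for 4 bytes) and return its
   length, or 0 for unsupported leading chars.  LC is the leading char
   or extended leading char; 0 stands for ASCII in this sketch.  B1 and
   B2 are the original code bytes; B2 is ignored for 1-byte sets. */
size_t mule_encode(unsigned lc, unsigned b1, unsigned b2, unsigned char *out)
{
    if (lc == 0) {                        /* Type 1-1: ASCII as is */
        assert(b1 <= 0x7F);
        out[0] = (unsigned char)b1;
        return 1;
    }
    if (lc >= 0x81 && lc <= 0x8F) {       /* Type 1-2: LC1 C1 */
        out[0] = (unsigned char)lc;
        out[1] = (unsigned char)(0x80 | b1);
        return 2;
    }
    if (lc >= 0x90 && lc <= 0x99) {       /* Type 2-3: LC2 C21 C22 */
        out[0] = (unsigned char)lc;
        out[1] = (unsigned char)(0x80 | b1);
        out[2] = (unsigned char)(0x80 | b2);
        return 3;
    }
    if (lc >= 0xA0 && lc <= 0xEF) {       /* Type 1-3: LCPRV1 LC12 C1 */
        out[0] = (unsigned char)(lc <= 0xDF ? 0x9A : 0x9B);
        out[1] = (unsigned char)lc;
        out[2] = (unsigned char)(0x80 | b1);
        return 3;
    }
    if (lc >= 0xF0 && lc <= 0xFE) {       /* Type 2-4: LCPRV2 LC22 C21 C22 */
        out[0] = (unsigned char)(lc <= 0xF4 ? 0x9C : 0x9D);
        out[1] = (unsigned char)lc;
        out[2] = (unsigned char)(0x80 | b1);
        out[3] = (unsigned char)(0x80 | b2);
        return 4;
    }
    return 0;                             /* composite or invalid */
}
```

For example, encoding the JIS X0208 code 0x2422 (Hiragana A) with the
leading char 0x92 yields 0x92 0xA4 0xA2, the bytes used in the example
sentence.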


File: mule, Node: Character object, Next: GLYPH, Prev: Buffer and string, Up: Character

Character object
================

Emacs Lisp treats a character object as an 8-bit integer (a value less
than 256).  Mule extends character objects to 19 bits.

The bit fields are divided into 3 parts:
     f1(5bits):f2(7bits):f3(7bits)

*Type 1-1: C [C <= 7F] (same as character code itself)*
     	0:00:00-7F
*Type 1-2: (((LC1 & 0x7F) + 0x10) << 7) | (C1 & 0x7F)*
     	0:11-1F:20-7F (f1=0, f2=(LC1&0x7F)+0x10, f3=C1&0x7F)
*Type 1-3: (((LC12 & 0x7F) + 0x10) << 7) | (C1 & 0x7F)*
     	0:30-7F:20-7F (f1=0, f2=(LC12&0x7F)+0x10, f3=C1&0x7F)
*Type 2-3: ((LC2 - 0x8F) << 14) | ((C21 & 0x7F) << 7) | (C22 & 0x7F)*
     	01-0A:20-7F:20-7F (f1=LC2-0x8F, f2=C21&0x7F, f3=C22&0x7F)
*Type 2-4: ((LC22 - 0xE0) << 14) | ((C21 & 0x7F) << 7) | (C22 & 0x7F)*
     	10-1E:20-7F:20-7F (f1=LC22-0xE0, f2=C21&0x7F, f3=C22&0x7F)
*Type N:*
     	1F:00-7F:00-7F

For instance, when the Lisp reader reads '?' followed by the Type 1-2
character '0x81 0xF1', it returns 2289 [= 0x8F1 = (((0x81 & 0x7F) + 0x10)
<< 7) | (0xF1 & 0x7F)].

In the above table, several blocks are not defined.  Those are used
internally to represent incomplete characters.

  0:01-12:00 leading-char only or invalid char
  0:20-5F:00 LCPRV11/LCPRV12 + LC21 of Type 1-3
  1-8:20-7F:00 LC2 + C21 of Type 2-3
  9-E:00:00 LCPRV21/LCPRV22 + LC22 of Type 2-4
  9-E:20-7F:00 LCPRV21/LCPRV22 + LC22 + C21 of Type 2-4
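
The formulas in the first table can be sketched in C as follows.  This
is a hypothetical helper (`mule_char_code' is not a Mule function) that
reads the internal bytes of one character and returns its 19-bit code;
composite characters and the incomplete forms just listed are not
handled and yield -1.

```c
#include <assert.h>

/* Compute the 19-bit character object f1(5):f2(7):f3(7) for the
   character whose internal bytes start at P.  Returns -1 for LCCMP
   (Type N) and for bytes that cannot start a character. */
long mule_char_code(const unsigned char *p)
{
    unsigned lead = p[0];

    assert(p != 0);
    if (lead <= 0x7F)                      /* Type 1-1 */
        return (long)lead;
    if (lead >= 0x81 && lead <= 0x8F)      /* Type 1-2: f2 = (LC1 & 0x7F) + 0x10 */
        return ((long)((lead & 0x7F) + 0x10) << 7) | (long)(p[1] & 0x7F);
    if (lead >= 0x90 && lead <= 0x99)      /* Type 2-3: f1 = LC2 - 0x8F */
        return ((long)(lead - 0x8F) << 14) | (long)((p[1] & 0x7F) << 7)
               | (long)(p[2] & 0x7F);
    if (lead == 0x9A || lead == 0x9B)      /* Type 1-3: f2 = (LC12 & 0x7F) + 0x10 */
        return ((long)((p[1] & 0x7F) + 0x10) << 7) | (long)(p[2] & 0x7F);
    if (lead == 0x9C || lead == 0x9D)      /* Type 2-4: f1 = LC22 - 0xE0 */
        return ((long)(p[1] - 0xE0) << 14) | (long)((p[2] & 0x7F) << 7)
               | (long)(p[3] & 0x7F);
    return -1;
}
```

With this sketch, the bytes 0x92 0xA4 0xA2 give 53794, the value of ?あ
used in the examples of the Functions node.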


File: mule, Node: GLYPH, Next: Functions, Prev: Character object, Up: Character

GLYPH
=====

The original definition of GLYPH is (FACE-ID << 8 | CHAR).  Since Mule
requires 19 bits for CHAR, however, the definition is changed to
(FACE-ID << 19 | CHAR).  As a result, only 2048 (= 2^11) different faces
can be used.
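
A minimal sketch of this packing in C, assuming a `long' of at least 32
bits; the macro and function names are hypothetical and only the bit
layout comes from the definition above.

```c
#include <assert.h>

#define MULE_CHAR_BITS 19                          /* width of CHAR */
#define MULE_FACE_BITS 11                          /* bits left for FACE-ID */
#define MULE_CHAR_MASK ((1L << MULE_CHAR_BITS) - 1)

/* Pack FACE-ID and CHAR as (FACE-ID << 19 | CHAR). */
static long mule_make_glyph(long face_id, long ch)
{
    assert(face_id >= 0 && face_id < (1L << MULE_FACE_BITS));
    assert(ch >= 0 && ch <= MULE_CHAR_MASK);
    return (face_id << MULE_CHAR_BITS) | ch;
}

/* Unpack the two fields again. */
static long mule_glyph_char(long g) { return g & MULE_CHAR_MASK; }
static long mule_glyph_face(long g) { return g >> MULE_CHAR_BITS; }
```

The largest glyph, face 2047 with a 19-bit character, stays below 2^30,
so it fits in a 32-bit signed value.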


File: mule, Node: Functions, Next: Character set, Prev: GLYPH, Up: Character

Functions
=========

To handle multilingual characters, we extended or added the following
functions:

In editfns.c ...

char-to-string: Convert arg CHAR to a string containing that character.
If CHAR is a multilingual character, the returned string contains its
multi-byte representation.

Example:
	(char-to-string ?A) => "A"
	(char-to-string ?あ) => "あ"
	(char-to-string 53794) => "あ"

string-to-char: Convert arg STRING to a character, the first character
of that string.

Example:
	(string-to-char "ABあい") => 65 (== ?A)
	(string-to-char "あい") => 53794 (== ?あ)

sref: Return the character in STRING at index INDEX.  INDEX starts at 0
and counts bytes.  If INDEX does not point to a character boundary, -1
is returned.

Example:
	(sref "ABあい" 1) => 66 (== ?B)
	(sref "ABあい" 2) => 53794 (== ?あ)
	(sref "ABあい" 3) => -1 (non character boundary)
	(sref "ABあい" 5) => 53796 (== ?い)

sset: Store the character CHAR into STRING at index INDEX.  INDEX should
point to a character occupying the same number of bytes as CHAR.  If it
does not, nil is returned; otherwise CHAR is returned.

Example:
	(setq s "ABあい")
	(sset s 1 ?C) => ?C (s == "ACあい")
	(sset s 2 ?う) => ?う (s == "ACうい")
	(sset s 2 ?A) => ?A (s == "ACA\244\246い")
	(sset s 8 ?A) => nil (out of range)

following-char: Return the character following point, as a number.  If
mc-flag of the current buffer is non-nil, the returned character may be
a multi-byte character.

Example: If cursor is at 'あ' of buffer "..Aあ..",
	(following-char) => 53794 (== ?あ)
	(let ((mc-flag nil))
	 (following-char t)) => 146 (== leading char of ?あ)

preceding-char: Return the character preceding point, as a number.  If
mc-flag of the current buffer is non-nil, the returned character may be
a multi-byte character.

Example: If cursor is at 'A' of buffer "..あA..",
	(preceding-char) => 53794 (== ?あ)
	(let ((mc-flag nil))
	  (preceding-char t)) => 162 (== last byte of ?あ)

char-after: First arg, POS, is a number.  Return the character in the
current buffer at position POS.  If POS is out of range, the value is
nil.  If mc-flag of the current buffer is non-nil, the returned
character may be a multi-byte character.

The functions `insert' and `insert-char' also work correctly with
multilingual characters.

	(insert ?あ) -- inserts "あ" at point.

buffer-substring: Return the contents of part of the current buffer as a
string.  The two arguments specify the start and end, as character
numbers.  If mc-flag of the current buffer is non-nil, the region may be
widened to meet character boundaries.

Example: If a buffer starts with the contents like "あいう..."
	(buffer-substring 1 2) => "あ"
	(buffer-substring 1 3) => "あ"
	(buffer-substring 2 4) => "あい"

Other functions that operate on a region also widen it automatically.

subst-char-in-region: From START to END, replace FROMCHAR with TOCHAR
each time it occurs.  If optional arg NOUNDO is non-nil, don't record
this change for undo and don't mark the buffer as really changed.  It
also works with multilingual characters, but only if the substitution
does not change the length of the buffer.

Example:
	(subst-char-in-region 1 10 ?a ?b) => possible
	(subst-char-in-region 1 10 ?あ ?い) => possible
	(subst-char-in-region 1 10 ?a ?あ) => impossible

In the functions `message' and `format', %c works with multilingual
characters.

	(message "%c" ?あ) -- shows "あ" in echo area.

In mule.c ...

make-character: Make a multi-byte character from LEADING-CHAR and
optional args ARG1 and ARG2.

Example:
	(make-character lc-jp ?\244 ?\242) => 53794 (== ?あ)

char-component: Return a component of multi-byte character CHAR.  The
second arg IDX indicates which component is returned:
 0: leading character or extended leading character,
 1: first byte of the character code,
 2: second byte of the character code.
If the character does not have the component, 0 is returned.

Example:
	(char-component ?あ 0) => 146 (== lc-jp)
	(char-component ?あ 1) => 164
	(char-component ?あ 2) => 162
	(char-component ?A 1) => 0
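
`make-character' and `char-component' are inverse operations on the
19-bit layout described in the Character object node.  The following C
sketch mirrors them; the function names are hypothetical, ARG1 and ARG2
are taken as code bytes as in the examples above, and composite
characters are not handled.

```c
#include <assert.h>

/* Hypothetical counterpart of `make-character'.  Only the low 7 bits
   of A1 and A2 are used, so 7-bit or 8-bit bytes both work.  Returns
   -1 for leading chars this sketch does not cover. */
long mule_make_character(unsigned lc, unsigned a1, unsigned a2)
{
    if ((lc >= 0x81 && lc <= 0x8F) || (lc >= 0xA0 && lc <= 0xEF))
        /* Type 1-2 or 1-3: f2 = (LC & 0x7F) + 0x10, f3 = code byte */
        return ((long)((lc & 0x7F) + 0x10) << 7) | (long)(a1 & 0x7F);
    if (lc >= 0x90 && lc <= 0x99)          /* Type 2-3: f1 = LC2 - 0x8F */
        return ((long)(lc - 0x8F) << 14) | (long)((a1 & 0x7F) << 7)
               | (long)(a2 & 0x7F);
    if (lc >= 0xF0 && lc <= 0xFE)          /* Type 2-4: f1 = LC22 - 0xE0 */
        return ((long)(lc - 0xE0) << 14) | (long)((a1 & 0x7F) << 7)
               | (long)(a2 & 0x7F);
    return -1;
}

/* Hypothetical counterpart of `char-component'.  IDX 0 gives the
   (extended) leading char, 1 the first code byte, 2 the second; 0 is
   returned when C has no such component.  Composite (f1 == 0x1F) and
   incomplete characters are not handled. */
unsigned mule_char_component(long c, int idx)
{
    unsigned f1 = (unsigned)(c >> 14) & 0x1F;
    unsigned f2 = (unsigned)(c >> 7) & 0x7F;
    unsigned f3 = (unsigned)c & 0x7F;
    unsigned lc;

    assert(c >= 0 && idx >= 0 && idx <= 2);
    if (c < 0x80 || f1 == 0x1F)            /* ASCII or composite: none */
        return 0;
    if (f1 == 0) {                         /* Type 1-2 or 1-3: one code byte */
        lc = (f2 - 0x10) | 0x80;
        return idx == 0 ? lc : idx == 1 ? (f3 | 0x80) : 0;
    }
    lc = (f1 <= 0x0A) ? f1 + 0x8F : f1 + 0xE0;   /* Type 2-3 or 2-4 */
    return idx == 0 ? lc : idx == 1 ? (f2 | 0x80) : (f3 | 0x80);
}
```

Round-tripping lc-jp (146) with the bytes 164 and 162 through these two
functions reproduces the values in the examples above.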

char-leading-char: Return leading character of CHAR.  If CHAR is not a
multi-byte code, 0 is returned.

Example:
	(char-leading-char ?あ) => 146 (== lc-jp)
	(char-leading-char ?A) => 0

char-bytes: Return the number of bytes CHAR will occupy in a buffer.
You can also query a character set by providing its leading character
as CHAR.

Example:
	(char-bytes ?あ) => 3
	(char-bytes ?A) => 1
	(char-bytes lc-jp) => 3

char-width: Return the number of columns CHAR will occupy when
displayed.  You can also query a character set by providing its leading
character as CHAR.

Example:
	(char-width ?あ) => 2
	(char-width ?A) => 1
	(char-width lc-jp) => 2

chars-in-string: Return number of characters in STRING.  Each
multilingual character is also counted as one.

Example:
	(chars-in-string "ABあい") => 4

char-boundary-p: Return non-nil if POS is at a character boundary.
The value is:
 0: if POS is at an ASCII character or at the end of the range,
 1: if POS is at the leading char of a 2-byte character,
 2: if POS is at the leading char of a 3-byte character.
If POS is out of range or not at a character boundary, nil is returned.
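
The counting done by `chars-in-string', and the boundary test behind
`sref' and `char-boundary-p', can be sketched in C by stepping through
the internal bytes using the length implied by each leading byte.  This
is an illustrative sketch with hypothetical names: it returns a plain
true/false value rather than the 0/1/2 codes of `char-boundary-p', and
it does not handle composite characters.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by the character whose leading byte is LEAD, per the
   Buffer and string layout; 0 for LCCMP and other bytes that this
   sketch cannot use as the start of a character. */
static size_t mule_bytes_by_lead(unsigned char lead)
{
    if (lead <= 0x7F) return 1;                  /* Type 1-1 */
    if (lead >= 0x81 && lead <= 0x8F) return 2;  /* Type 1-2 */
    if (lead >= 0x90 && lead <= 0x9B) return 3;  /* Type 2-3 and 1-3 */
    if (lead == 0x9C || lead == 0x9D) return 4;  /* Type 2-4 */
    return 0;
}

/* Count characters in S[0..LEN); -1 if a sequence is not understood. */
long mule_chars_in_string(const unsigned char *s, size_t len)
{
    long count = 0;
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        size_t n = mule_bytes_by_lead(s[i]);
        if (n == 0 || i + n > len)
            return -1;
        i += n;
        count++;
    }
    return count;
}

/* Nonzero if byte offset POS starts a character or is the end of S. */
int mule_char_boundary_p(const unsigned char *s, size_t len, size_t pos)
{
    size_t i = 0;

    while (i < pos && i < len) {
        size_t n = mule_bytes_by_lead(s[i]);
        if (n == 0)
            return 0;
        i += n;
    }
    return i == pos;
}
```

For the string "ABあい" (8 bytes), this gives 4 characters, and byte
offsets 0, 1, 2, 5 and 8 are boundaries while 3 and 4 are not, matching
the `sref' examples.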


File: mule, Node: Character set, Prev: Functions, Up: Character

Character set
=============

A character set is an ordered set of characters such as ASCII, the right
half of ISO8859-1, or JIS X0208.  Mule identifies a character set by the
leading-char uniquely assigned to it.

Each character-set is characterized by the following attributes:
  1. Code length in bytes: 1-byte or 2-byte
     	ISO8859-1, Right half of JISX0201 (Japanese Katakana) -- 1-byte
     	GB2312-1980 (Chinese), JISX0208 (Japanese) -- 2-byte
  2. Columns occupied on a screen: 1-column or 2-column
     	ISO8859-1, Right half of JISX0201 (Japanese Katakana) -- 1-column
     	GB2312-1980 (Chinese), JISX0208 (Japanese) -- 2-column
  3. Type: 94-char-set, 96-char-set, 94x94-char-set, or 96x96-char-set
  4. Graphic set: GL or GR
  5. Final character: one of '0' through '~'
  6. Displaying direction: left-to-right or right-to-left
  7. Leading character: assigned by the system, one per character set

Items 3 through 5 are ISO2022 notions.
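
For reference, the ISO2022 designation escape sequence for a character
set follows from its type, the graphic register it is designated to,
and its final character.  The C sketch below is hypothetical (the
function is not part of Mule); it uses the standard ISO2022
intermediate characters, the short form ESC $ F for the final
characters '@', 'A' and 'B' (see the SHORT flag in the Coding-system
node), and Mule's unofficial ESC , F for a 96-charset designated to G0
(described in the ISO2022 restriction node).

```c
#include <assert.h>
#include <stddef.h>

/* Write the sequence that designates a character set to register
   G<reg> into OUT (room for 4 bytes) and return its length, or 0 when
   this sketch knows no sequence for the combination.  TYPE follows
   `new-character-set': 0 = 94, 1 = 96, 2 = 94x94, 3 = 96x96. */
size_t iso2022_designation(int type, int reg, char final, int short_form,
                           char *out)
{
    /* Intermediate characters for G0..G3. */
    static const char i94[4] = {'(', ')', '*', '+'};
    /* ',' for G0 is Mule's unofficial use; the others are standard. */
    static const char i96[4] = {',', '-', '.', '/'};
    size_t n = 0;

    assert(type >= 0 && type <= 3 && reg >= 0 && reg <= 3);
    out[n++] = '\033';                          /* ESC */
    if (type >= 2) {                            /* 2-byte sets: ESC $ ... */
        out[n++] = '$';
        if (type == 2 && reg == 0 && short_form
            && (final == '@' || final == 'A' || final == 'B')) {
            out[n++] = final;                   /* short form ESC $ F */
            return n;
        }
        if (type == 3 && reg == 0)
            return 0;                           /* 96x96 to G0: not covered */
    }
    out[n++] = (type == 0 || type == 2) ? i94[reg] : i96[reg];
    out[n++] = final;
    return n;
}
```

With these rules, JISX0208 (TYPE 2, final 'B') designated to G0 gives
"ESC $ B" when the short form is allowed and "ESC $ ( B" otherwise, and
ISO8859-1 (TYPE 1, final 'A') designated to G1 gives "ESC - A".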

Character sets are defined by calling the function `new-character-set'.

     --- mule.c ---------------------------------------------------------
     DEFUN ("new-character-set", Fnew_character_set, Snew_character_set, 8, MANY, 0,
       "Define new character set of LEADING-CHAR (1st arg).\n\
     Rest of args are:\n\
      BYTE: 1, 2, or 3\n\
      COLUMNS: 1 or 2\n\
      TYPE: 0 (94 chars), 1 (96 chars), 2 (94x94 chars), or 3 (96x96 chars)\n\
      GRAPHIC: 0 (use g0 on output) or 1 (use g1 on output)\n\
      FINAL: final character of ISO escape sequence\n\
      DIRECTION: 0 (left-to-right) or 1 (right-to-left)\n\
      DOC: short description string.\n\
     If LEADING-CHAR >= 0xA0, it is regarded as extended leading-char\n\
     and BYTE and COLUMNS args are ignored.")
     ------------------------------------------------------------

The system pre-defines the following character-sets.

     --- mule.el ---------------------------------------------------------
     (defconst *predefined-character-set*
       (list
        ;; (cons lc '(bytes width type graphic final direction doc))
        ;; (cons lc-ascii '(0 1 0 0 ?B 0 "ASCII" "ISO8859-1")) ;; predefined in C
        (cons lc-ltn1 '(1 1 1 1 ?A 0 "Latin-1" "ISO8859-1"))
        (cons lc-ltn2 '(1 1 1 1 ?B 0 "Latin-2" "ISO8859-2"))
        (cons lc-ltn3 '(1 1 1 1 ?C 0 "Latin-3" "ISO8859-3"))
        (cons lc-ltn4 '(1 1 1 1 ?D 0 "Latin-4" "ISO8859-4"))
        (cons lc-thai '(1 1 1 1 ?T 0 "Thai" "TIS620"))
        (cons lc-grk '(1 1 1 1 ?F 0 "Greek" "ISO8859-7"))
        (cons lc-arb '(1 1 1 1 ?G 1 "Arabic" "ISO8859-6"))
        (cons lc-hbw '(1 1 1 1 ?H 1 "Hebrew" "ISO8859-8"))
        (cons lc-kana '(1 1 0 1 ?I 0 "Japanese Katakana" "JISX0201.1976"))
        (cons lc-roman '(1 1 0 0 ?J 0 "Japanese Roman" "JISX0201.1976"))
        (cons lc-crl '(1 1 1 1 ?L 0 "Cyrillic" "ISO8859-5"))
        (cons lc-ltn5 '(1 1 1 1 ?M 0 "Latin-5" "ISO8859-9"))
        (cons lc-jpold '(2 2 2 0 ?@ 0 "Japanese Old" "JISX0208.1978"))
        (cons lc-cn '(2 2 2 0 ?A 0 "Chinese" "GB2312"))
        (cons lc-jp '(2 2 2 0 ?B 0 "Japanese" "JISX0208.\\(1983\\|1990\\)"))
        (cons lc-kr '(2 2 2 0 ?C 0 "Korean" "KSC5601"))
        (cons lc-jp2 '(2 2 2 0 ?D 0 "Japanese Supplement" "JISX0212"))
        (cons lc-cns1 '(2 2 2 0 ?G 0 "CNS Plane1" "CNS11643.1"))
        (cons lc-cns2 '(2 2 2 0 ?H 0 "CNS Plane2" "CNS11643.2"))
        (cons lc-big5-1 '(2 2 2 0 ?0 0 "Big5 Level 1" "Big5"))
        (cons lc-big5-2 '(2 2 2 0 ?1 0 "Big5 Level 2" "Big5"))))

     (let ((c *predefined-character-set*)
           lc data)
       (while c
         (setq lc (car (car c))
     	  data (cdr (car c)))
         (apply 'new-character-set lc data)
         (setq c (cdr c))))

In addition, the following private character sets are predefined.

     --- mule-config.el -----------------------------------------
     ;; REGISTRATION OF PRIVATE CHARACTER SETS

     ;; PinYin-ZhuYin
     (setq lc-sisheng
           (new-private-character-set 1 1 0 0 ?0 0 "PinYin-ZhuYin" "sisheng_cwnn"))

     (setq lc-ascr2l
           (new-private-character-set 1 1 0 0 ?B 1 "Right-to-Left ASCII" "ISO8859-1")) 

     ;; Vietnamese VISCII with two tables.
     (setq lc-vn-1
           (new-private-character-set 1 1 1 1 ?1 0 "VISCII lower" "VISCII1.1"))
     (setq lc-vn-2
           (new-private-character-set 1 1 1 1 ?2 0 "VISCII upper" "VISCII1.1"))

     ;; Three character sets for Arabic
     (setq lc-arb0
           (new-private-character-set
            1 1 0 0 ?2 0 "Arabic digit" "MuleArabic-0"))
     (setq lc-arb1
           (new-private-character-set
            1 1 0 0 ?3 1 "1-column Arabic" "MuleArabic-1"))
     (setq lc-arb2
           (new-private-character-set
            1 2 0 0 ?4 1 "2-column Arabic" "MuleArabic-2"))

     ;; for Mule IPA
     (setq lc-ipa0
           (new-private-character-set 1 1 1 1 ?0 0 "IPA for Mule" "MuleIPA"))
     ------------------------------------------------------------


File: mule, Node: Coding-system, Next: Syntax, Prev: Character, Up: Top

Coding-system
=============

A `coding-system' is a method of encoding several character-sets.  It is
represented by a symbol that has the properties 'coding-system and
'eol-type.

You can specify different coding-systems for file I/O, process I/O,
output to the terminal (if not running on X), and input from the
keyboard (if not running on X).


* Menu:

* Structure::   Structure of coding-system
	  o Property 'coding-system
	  o Property 'eol-type
	  o Property 'post-read-conversion
	  o Property 'pre-write-conversion
* Creation::   How to create coding-system?
* Predefined coding-system::
* Automatic conversion::
	  o Category of coding-system
	  o How automatic conversion works?
	  o Priority of category
* Mode-line::   How coding-system is shown in mode-line?
* ISO2022 restriction::
* Big5::        Special treatment of Big5


File: mule, Node: Structure, Next: Creation, Prev: Coding-system, Up: Coding-system

Structure of coding-system
==========================


Property 'coding-system
-----------------------

The value of the property 'coding-system is either a vector
       [ TYPE MNEMONIC DOCUMENT DUMMY FLAGS ]
or another coding-system.  The elements of the vector are:
       TYPE:	nil: no conversion, t: automatic conversion,
     	0: Internal, 1: Shift-JIS, 2: ISO2022, 3: Big5, 4: CCL.
       MNEMONIC: a character shown in the mode-line to indicate the coding-system.
       DOCUMENT: a documentation string describing the coding-system.
       DUMMY: always nil (for backward compatibility).
       FLAGS (optional): more precise information about the coding-system.
         If TYPE is 2 (ISO2022), FLAGS should be a list of:
           LC-G0, LC-G1, LC-G2, LC-G3:
     	the leading character of the charset initially designated to the
     	graphic register G0..G3; nil means the register is not designated
     	initially, lc-invalid means it can never be designated to, and
     	if (- leading-char) is specified, the charset is designated on output,
           SHORT: non-nil - allow short forms such as "ESC $ B",
     	nil - always use "ESC $ ( B",
           ASCII-EOL: non-nil - designate ASCII to G0 at end of line on output,
           ASCII-CNTL: non-nil - designate ASCII to G0 at control codes on output,
           SEVEN: non-nil - use 7-bit environment on output,
           LOCK-SHIFT: non-nil - use locking-shift (SO/SI) instead of single-shift
     	or designation by escape sequence,
           USE-ROMAN: non-nil - designate JISX0201-1976 Roman instead of ASCII,
           USE-OLDJIS: non-nil - designate JISX0208-1978 instead of JISX0208-1983,
           NO-ISO6429: non-nil - don't use ISO6429's direction specification.
       If TYPE is 3 (Big5), FLAGS `t' means Big5-ETen, `nil' means Big5-HKU.
       If TYPE is 4 (CCL), FLAGS should be a cons of CCL programs
         for encoding and decoding.  See the documentation of CCL for details.


Property 'eol-type
------------------

The value of the property 'eol-type is:
  nil: no conversion for end-of-line type
  1: LF
  2: CRLF
  3: CR
  vector of length 3: automatic detection of end-of-line type.
	1st element: coding-system of eol-type LF
	2nd element: coding-system of eol-type CRLF
	3rd element: coding-system of eol-type CR
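
Automatic detection of the end-of-line type might look like the
following sketch, which inspects the first line terminator in a chunk
of text.  This is an assumption about how such detection can work, not
Mule's actual algorithm, and the function and constant names are
hypothetical.  A detected value of 1, 2 or 3, minus one, indexes the
three-element vector above.

```c
#include <assert.h>
#include <stddef.h>

/* eol-type values as listed above; 0 stands for "not yet decided". */
enum { EOL_UNDECIDED = 0, EOL_LF = 1, EOL_CRLF = 2, EOL_CR = 3 };

/* Guess the end-of-line type from the first terminator in BUF.  A CR
   in the last byte is left undecided, because the LF completing a
   CRLF may arrive with the next chunk. */
int detect_eol_type(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            return EOL_LF;
        if (buf[i] == '\r') {
            if (i + 1 == len)
                return EOL_UNDECIDED;
            return buf[i + 1] == '\n' ? EOL_CRLF : EOL_CR;
        }
    }
    return EOL_UNDECIDED;
}
```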


Property 'post-read-conversion
------------------------------

The value of the property 'post-read-conversion is a function to convert
text just read into a buffer.  When the function is called, the text
has already been converted according to the 'coding-system and
'eol-type properties of the coding-system.  The function is called with
the region (START and END) of the inserted text as its arguments.


Property 'pre-write-conversion
------------------------------

The value of the property 'pre-write-conversion is a function to convert
text just before it is written out.  After the function is called, the
text is converted according to the 'coding-system and 'eol-type
properties of the coding-system.  The function is called with the
region (START and END) of the text as its arguments.


File: mule, Node: Creation, Next: Predefined coding-system, Prev: Structure, Up: Coding-system

How to create coding-system?
============================

Mule provides a function `make-coding-system' to create a coding-system.

FUNCTION make-coding-system: NAME TYPE MNEMONIC DOC &optional EOL-TYPE FLAGS

Register symbol NAME as a coding-system whose 'coding-system property is
a vector [ TYPE MNEMONIC DOC nil FLAGS ] and 'eol-type property is
EOL-TYPE.  If `t' is specified as EOL-TYPE, the value of 'eol-type
property is a vector of generated coding-systems whose 'eol-type
properties are 1 (LF), 2 (CRLF), and 3 (CR).  The names of generated
coding-systems are NAMEunix, NAMEdos, and NAMEmac respectively.

To make just an alias of an existing coding-system, call the function
`copy-coding-system'.

FUNCTION copy-coding-system: ORIGINAL ALIAS

Make the same coding-system as ORIGINAL and name it ALIAS.  If 'eol-type
property of ORIGINAL is a vector, coding-systems ALIASunix, ALIASdos,
and ALIASmac are generated, and 'eol-type property of ALIAS becomes a
vector of them.


File: mule, Node: Predefined coding-system, Next: Automatic conversion, Prev: Creation, Up: Coding-system

Predefined coding-system
========================

In the file lisp/mule.el, the following coding-systems are predefined.

     ----- lisp/mule.el -----------------------------------------
     (make-coding-system
      '*noconv* nil
      ?= "No conversion.")

     (make-coding-system
      '*autoconv* t
      ?+ "Automatic conversion." t)

     (make-coding-system
      '*internal* 0
      ?= "Internal coding-system used in a buffer.")

     (make-coding-system
      '*sjis* 1
      ?S "Coding-system of Shift-JIS used in Japan." t)

     (make-coding-system
      '*iso-2022-jp* 2
      ?J "Coding-system used for communication with mail and news in Japan."
      t
      (list lc-ascii lc-invalid lc-invalid lc-invalid
            'short 'ascii-eol 'ascii-cntl 'seven))
     (copy-coding-system '*iso-2022-jp* '*junet*)

     (make-coding-system
      '*oldjis* 2
      ?J "Coding-system used for old jis terminal."
      t
      (list lc-ascii lc-invalid lc-invalid lc-invalid
            'short 'ascii-eol 'ascii-cntl 'seven nil 'use-roman 'use-oldjis))

     (make-coding-system
      '*ctext* 2
      ?X "Coding-system used in X as Compound Text Encoding."
      t
      (list lc-ascii lc-ltn1 lc-invalid lc-invalid
            nil 'ascii-eol))

     (make-coding-system
      '*euc-japan* 2
      ?E "Coding-system of Japanese EUC (Extended Unix Code)."
      t
      (list lc-ascii lc-jp lc-kana lc-jp2
            'short 'ascii-eol 'ascii-cntl))

     (make-coding-system
      '*euc-korea* 2
      ?K "Coding-system of Korean EUC (Extended Unix Code)."
      t
      (list lc-ascii lc-kr lc-invalid lc-invalid
            nil 'ascii-eol 'ascii-cntl))
     ;; 93.12.16 by K.Handa
     (copy-coding-system '*euc-korea* '*euc-kr*)

     (make-coding-system
      '*iso-2022-kr* 2
      ?k "Coding-System used for communication with mail in Korea."
      nil
      (list lc-ascii (- lc-kr) lc-invalid lc-invalid
            nil 'ascii-eol 'ascii-cntl 'seven 'lock-shift))
     (copy-coding-system '*iso-2022-kr* '*korean-mail*)

     (make-coding-system
      '*euc-china* 2
      ?C "Coding-system of Chinese EUC (Extended Unix Code)."
      t
      (list lc-ascii lc-cn lc-invalid lc-invalid
            nil 'ascii-eol 'ascii-cntl))

     (make-coding-system
      '*iso-2022-ss2-8* 2
      ?I "ISO-2022 coding system using SS2 for 96-charset in 8-bit code."
      t
      (list lc-ascii lc-invalid nil lc-invalid
            nil 'ascii-eol 'ascii-cntl))

     (make-coding-system
      '*iso-2022-ss2-7* 2
      ?I "ISO-2022 coding system using SS2 for 96-charset in 7-bit code."
      t
      (list lc-ascii lc-invalid nil lc-invalid
            'short 'ascii-eol 'ascii-cntl 'seven))

     (make-coding-system
      '*iso-2022-lock* 2
      ?i "ISO-2022 coding system using Locking-Shift for 96-charset."
      t
      (list lc-ascii nil lc-invalid lc-invalid
            nil 'ascii-eol 'ascii-cntl 'seven
            'lock-shift))			;93.12.1 by H.Minamino

     (make-coding-system
      '*big5-eten* 3
      ?B "Coding-system of BIG5-ETen."
      t t)

     (make-coding-system
      '*big5-hku* 3
      ?B "Coding-system of BIG5-HKU."
      t nil)
     ------------------------------------------------------------


File: mule, Node: Automatic conversion, Next: Mode-line, Prev: Predefined coding-system, Up: Coding-system

Automatic conversion
====================


Category of coding-system
-------------------------

Mule can detect the coding-system of text automatically.  However, what
Mule actually detects is not a coding-system itself but a category of
coding-systems.  A category is also represented by a symbol, whose value
should be an actual coding-system.

There are eight categories:
*coding-category-internal*
     	coding-system used in a buffer
*coding-category-sjis*
     	Shift-JIS
*coding-category-iso-7*
     	ISO2022 variation with the following features:
     	  o no locking shift or single shift
     	  o only G0 is used
*coding-category-iso-8-1*
     	ISO2022 variation with the following features:
     	  o no locking shift
     	  o designation sequences are allowed only for G0 and G1
     	  o G1 is used only for 1-byte character sets
*coding-category-iso-8-2*
     	ISO2022 variation with the following features:
     	  o no locking shift
     	  o designation sequences are allowed only for G0 and G1
     	  o G1 is used only for 2-byte character sets
*coding-category-iso-else*
     	ISO2022 variation that satisfies none of the above.
*coding-category-big5*
     	Big5 (ETen or HKU)
*coding-category-bin*
     	Any other coding-system that uses the MSB (8th bit).

The values of these symbols are pre-defined as follows:

     ----- lisp/mule.el -----------------------------------------
     (defvar *coding-category-internal* '*internal*)
     (defvar *coding-category-sjis* '*sjis*)
     (defvar *coding-category-iso-7* '*junet*)
     (defvar *coding-category-iso-8-1* '*ctext*)
     (defvar *coding-category-iso-8-2* '*euc-japan*)
     (defvar *coding-category-iso-else* '*iso-2022-ss2-7*)
     (defvar *coding-category-big5* '*big5-eten*)
     (defvar *coding-category-bin* '*noconv*)
     ------------------------------------------------------------

However, some of them are overridden in language-specific files such as
japanese.el and chinese.el.


How automatic conversion works?
-------------------------------

When the coding-system `*autoconv*' is specified for reading text (this
is the default), Mule tries to detect the category of coding-system in
which the text is encoded.  If an appropriate category is found, Mule
converts the text according to the coding-system bound to that
category.  If the 'eol-type property of that coding-system is a vector
of coding-systems and Mule detects the end-of-line type (LF, CRLF, or
CR) of the text, the corresponding coding-system in the vector is used.

Automatic conversion occurs both when reading from files and when
receiving input from a process.  In the latter case, if a coding-system
is found, the output-coding-system of the process is also set to the
found coding-system.


Priority of category
--------------------

If more than one category is found, the category with the highest
priority is selected.

The priority of categories is predefined as follows:

     ----- lisp/mule.el -----------------------------------------
     (set-coding-priority
      '(*coding-category-iso-8-2*
        *coding-category-sjis*
        *coding-category-iso-8-1*
        *coding-category-big5*
        *coding-category-iso-7*
        *coding-category-iso-else*
        *coding-category-bin*
        *coding-category-internal*))
     ------------------------------------------------------------

The function `set-coding-priority' puts a 'priority property on each
element of its argument, numbered from 0 to 7 (a smaller number means a
higher priority).  Some language-specific files may override this
priority.
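
Selecting a category by priority can be sketched as below.  The
representation of the detector's result as a bit mask indexed by
priority is an assumption made for this illustration, and the names are
hypothetical; only the default order comes from the call above.

```c
#include <assert.h>
#include <stddef.h>

/* The default order given to `set-coding-priority' above; the array
   index is the 'priority property (0 = highest). */
static const char *const category_by_priority[8] = {
    "*coding-category-iso-8-2*", "*coding-category-sjis*",
    "*coding-category-iso-8-1*", "*coding-category-big5*",
    "*coding-category-iso-7*",   "*coding-category-iso-else*",
    "*coding-category-bin*",     "*coding-category-internal*",
};

/* FOUND has bit P set when the category of priority P matched the
   text.  Returns the matching category with the highest priority, or
   NULL when nothing matched. */
const char *select_category(unsigned found)
{
    assert(found < (1u << 8));
    for (unsigned p = 0; p < 8; p++)
        if (found & (1u << p))
            return category_by_priority[p];
    return NULL;
}
```

For instance, text that could be either Shift-JIS (priority 1) or Big5
(priority 3) is assigned *coding-category-sjis* under the default order.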


File: mule, Node: Mode-line, Next: ISO2022 restriction, Prev: Automatic conversion, Up: Coding-system

How coding-system is shown in mode-line?
========================================

Each coding-system has a unique one-character mnemonic.  By default, the
mnemonic of the `file-coding-system' of a buffer is shown at the left of
the buffer's mode-line.  The mnemonic is followed by another mnemonic
that shows the eol-type of the coding-system, defined as follows:
	".": LF
	":": CRLF
	"'": CR
	"_": not yet decided
	"-": nil (for coding-system of nil, *noconv*, or *internal*)

So, the usual appearance of the mode-line for a buffer visiting a file
(*junet* encoding on a Unix system) is:

     	    +-- mnemonic of file-coding-system
     	    |+-- mnemonic of eol-type
     	    VV
     	[--]J.:----Mule: filename

The leftmost bracketed field is the indicator for the input method.
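
The two-character coding indicator can be sketched in C as follows; the
enum and function names are hypothetical, and only the mnemonic mapping
comes from the list above.

```c
#include <assert.h>

/* Hypothetical summary of a coding-system's 'eol-type for display. */
enum eol_shown { SHOW_NIL, SHOW_LF, SHOW_CRLF, SHOW_CR, SHOW_UNDECIDED };

/* Mnemonic of the eol-type, following the list above. */
char eol_mnemonic(enum eol_shown e)
{
    switch (e) {
    case SHOW_LF:        return '.';
    case SHOW_CRLF:      return ':';
    case SHOW_CR:        return '\'';
    case SHOW_UNDECIDED: return '_';
    case SHOW_NIL:       return '-';
    }
    assert(!"unknown eol type");
    return '?';
}

/* Fill OUT with the coding mnemonic followed by the eol mnemonic, for
   example "J." for *junet* with LF line ends, and terminate it. */
void mode_line_coding(char mnemonic, enum eol_shown e, char out[3])
{
    out[0] = mnemonic;
    out[1] = eol_mnemonic(e);
    out[2] = '\0';
}
```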

When a buffer is attached to a process, the coding-systems for the
input and output of the process are also shown, as follows:

     	    +-- mnemonic of file-coding-system
     	    |+-- mnemonic of eol-type of file-coding-system
     	    ||+-- mnemonic of input-coding-system of a process
     	    |||+-- mnemonic of eol-type of input-coding-system
     	    ||||+-- mnemonic of output-coding-system of a process
     	    |||||+-- mnemonic of eol-type of output-coding-system
     	    VVVVVV
     	[--]+_+.--:--**-Mule: *shell*

This means that Mule is now communicating with the shell using the
coding-systems *autoconv*unix ("+.") for input and nil ("--") for output.


File: mule, Node: ISO2022 restriction, Next: Big5, Prev: Mode-line, Up: Coding-system

ISO2022 restriction
===================

For encoding text with a Type 2 (ISO2022) coding-system, we have the
following restrictions:

Locking-Shift:
     SI and SO are used only when encoding with a coding-system whose
     LOCK-SHIFT and SEVEN are both t.

Single-Shift:
     Use SS2 and SS3 (if SEVEN is nil) or ESC N and ESC O (if SEVEN is
     t).

Invocation:
     G0 is always invoked to GL, G1 to GR (but only if SEVEN is nil).
     G2 and G3 are invoked to GL by the Single-Shifts SS2 and SS3.

Unofficial use of ESC sequence for designation:
     If SEVEN is t, LOCK-SHIFT is nil, and designation to G2 and G3 is
     prohibited, we designate all character sets to G0 (and hence invoke
     them to GL).  To designate a 96-charset to G0, we use "ESC , <F>".
     For instance, to designate ISO8859-1 to G0, we use "ESC , A".

Unofficial use of ESC sequence for composite characters:
     To indicate the start and end of a composite character, we use ESC 0
     (start) and ESC 1 (end).

Text direction specifier of ISO6429:
     We use ISO6429's ESC sequence "ESC [ 2 ]" to change text direction
     to right-to-left, and "ESC [ 0 ]" to revert it to left-to-right.
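
The control sequences named in this list can be collected in a small C
sketch.  The byte values of SS2 (0x8E), SS3 (0x8F), SO (0x0E) and SI
(0x0F) are the standard ISO2022 ones; the function names and the -1
convention are hypothetical, and the flags are the SEVEN and LOCK-SHIFT
flags described (as output options) in the Structure node.

```c
#include <assert.h>
#include <stddef.h>

/* Write the single-shift that invokes G2 or G3 for one character and
   return its length: SS2/SS3 in an 8-bit environment, ESC N / ESC O
   when SEVEN is set. */
size_t iso2022_single_shift(int g, int seven, unsigned char *out)
{
    assert(g == 2 || g == 3);
    if (seven) {
        out[0] = 0x1B;                          /* ESC */
        out[1] = (unsigned char)((g == 2) ? 'N' : 'O');
        return 2;
    }
    out[0] = (unsigned char)((g == 2) ? 0x8E : 0x8F);   /* SS2 / SS3 */
    return 1;
}

/* Locking shift into GL: SO (0x0E) invokes G1, SI (0x0F) returns to
   G0.  Following the rule above, it is used only when LOCK-SHIFT and
   SEVEN are both set; otherwise -1 is returned. */
int iso2022_locking_shift(int g, int lock_shift, int seven)
{
    assert(g == 0 || g == 1);
    if (!(lock_shift && seven))
        return -1;
    return (g == 1) ? 0x0E : 0x0F;
}

/* Escape sequences Mule uses for text direction and composition. */
static const char SDS_RIGHT_TO_LEFT[] = "\033[2]";
static const char SDS_LEFT_TO_RIGHT[] = "\033[0]";
static const char COMPOSITE_START[] = "\033" "0";
static const char COMPOSITE_END[] = "\033" "1";
```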


File: mule, Node: Big5, Prev: ISO2022 restriction, Up: Coding-system

Special treatment of Big5
=========================

As far as I know, there are several different codes called Big5.  The
most famous ones are Big5-ETen and Big5-HKU-form2.  Both of them use
the code range 0xA140 - 0xFEFE (in each row, second bytes 0x7F - 0xA0
are skipped) and contain more than 13000 characters.  Since a single
94x94 character set can hold at most 8836 characters, it is impossible
to treat either of them as a single character-set in the current Mule
system.  So, Mule treats them in a rather irregular manner, as described
below:

  1. Mule does not treats them as a different character set, but as the
     same character set called Big5.
     	Caution!! Big5 is a different character set from GB.

  2. Mule divides Big5 into two sub-character-sets,
          0xa140 - 0xc67e (Level 1)
          0xc6a1 - 0xfefe (Level 2)
     and allocates two leading-chars, lc-big5-1 and lc-big5-2, to them.
     (See character.txt)

  3. Usually, each leading-char (or character set) has a unique character
     category.  But lc-big5-1 and lc-big5-2 share the same character
     category, with mnemonic 't'.  So the regular expression "\\ct"
     matches any Big5 (Level 1 or Level 2) character.  (See syntax.txt)

  4. If you specify an ISO2022-type coding-system for output, Mule
     encodes Big5 characters using the unofficial final characters '0'
     (for Level 1) and '1' (for Level 2).

  5. You can use either ETen or HKU fonts to display Big5 text.  Mule
     determines which kind of font is in use by checking whether the
     font has a character at code point 0xC6A1.  If it does, the font is
     HKU; otherwise it is ETen.


File: mule, Node: Syntax, Next: Font, Prev: Coding-system, Up: Top

Syntax and Category of character
================================


Syntax
------

Mule can define the syntax of any multi-byte character with
`modify-syntax-entry'.

The first argument of `modify-syntax-entry' should be one of the following:
  1. an ASCII character
  2. a multi-byte character
  3. the leading character of a multi-byte character
  4. a partially defined character returned by:

          `(make-character leading-char arg)'

There is a restriction on the matching character specified in the second
argument.  If the first argument specifies a multi-byte character or the
leading character of a multi-byte character, the matching character must
have the same leading character.  If the character is a 2-byte code, its
first byte must also be the same as the first byte of the first
argument.


Category
--------

Like syntax, categories define characteristics of characters.  The
differences are:
  1. Each character can have more than one category.
  2. Users can define new categories as they wish.
     	Example: See japanese.el
  3. `char-category' returns all category mnemonics of a character as a
     string.
  4. In regular expressions, \cm matches a character that has category m
     and \Cm a character that does not (m being any category mnemonic),
     just as \sm and \Sm do for syntax classes.
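Mule's real category tables live inside Emacs; the following Python toy
model only illustrates the main difference from syntax, namely a set of
mnemonics per character.  All names are hypothetical; 't' is the Big5
mnemonic mentioned earlier, while 'x' is invented for the example.

```python
# Hypothetical category table: character -> set of category mnemonics.
# A syntax table gives each character exactly one class; here a character
# may carry several categories at once.
category_table = {}


def modify_category(char, mnemonic):
    """Give CHAR the category MNEMONIC (hypothetical helper)."""
    category_table.setdefault(char, set()).add(mnemonic)


def char_category(char):
    """All mnemonics of CHAR as one string (sorted in this sketch)."""
    return "".join(sorted(category_table.get(char, ())))


def matches_c(char, mnemonic):
    """Model of \\cm: CHAR has category MNEMONIC."""
    return mnemonic in category_table.get(char, ())


def matches_C(char, mnemonic):
    """Model of \\Cm: CHAR does not have category MNEMONIC."""
    return not matches_c(char, mnemonic)


def search_category(text, mnemonic):
    """Index of the first character of TEXT matching \\cm, or -1."""
    for i, ch in enumerate(text):
        if matches_c(ch, mnemonic):
            return i
    return -1
```

After `modify_category("\u4e2d", "t")' and
`modify_category("\u4e2d", "x")', that character belongs to both
categories at once, which a syntax table could not express.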


File: mule, Node: Font, Next: CCL, Prev: Syntax, Up: Top

Font
====

A FONTSET is a set of fonts which have the same height and style.  A
fontset should ideally contain enough fonts to display characters of
various character sets.

Mule uses fontsets instead of fonts.  You can specify a fontset anywhere
you can specify a font.  You can still specify a font, in which case Mule
searches for a fontset which includes the font and uses it.

Like a font, a fontset is specified by a string giving its name.

* Menu:

* Initial fontsets::	Fontsets which Mule has at startup time.
* Specify fontset::     How to specify a fontset?
* Manage fontset::      How to create or modify a fontset?


File: mule, Node: Initial fontsets, Next: Specify fontset, Up: Font


"default-fontset"
-----------------

Mule automatically creates a fontset named "default-fontset" at startup
time.  Each font in this fontset is specified by a very generic name such
as "-*-fixed-medium-r-*--16-*-iso8859-1" for ASCII and
"-*-fixed-medium-r-*--*-jisx0208.1983-*" for JISX0208 (Kanji).  These
values are defined in `lisp/term/x-win.el'.

If no other fontsets are specified by X resources, "default-fontset" is
used for the first frame of Mule.

In most cases, this is enough; you probably do not need any other
fontsets.


X's resource
------------

Mule also creates the fontsets specified in the X resource "fontSetList"
(class "FontSetList").  Its value is a comma-separated list of fontset
names.

     *FontSetList: 16,24

The actual contents of each fontset are specified by the resource
"fontSet-xxx" (class "FontSet-xxx"), where "xxx" is the name of the
corresponding fontset.  The value of this resource is a comma-separated
list of font names.

     *FontSet-16: -etl-fixed-medium-r-*--24-*-iso8859-1

A font name must not contain the wildcard `*' or `?' in its
CHARSET_REGISTRY field, because Mule recognizes the character set of each
font from this field.  As a result, you do not have to care about the
order of font names.

For instance,

     *FontSet-16:\
             -etl-fixed-medium-r-*--16-*-iso8859-1,\
             -ming-fixed-medium-r-*--*-*-jisx0208.1983-*

is enough to tell Mule that the fontset "16" contains an ASCII font and a
JISX0208 font.  Note that the second name has just a wildcard `*' in its
PIXEL_SIZE field.  Since Mule tries to open a font with the same
PIXEL_SIZE as the ASCII font of the same fontset, you had better not
specify an actual PIXEL_SIZE value for any font except the ASCII font.

For character sets whose fonts are not listed in a fontset's
specification, the corresponding font names in "default-fontset" are
used.

The first fontset in FontSetList is used for the first frame of Mule.
If you want to use "default-fontset" while specifying other fontsets in
the resource, put "default-fontset" first in the value:

     *FontSetList: default-fontset,16,24

In this case, you do not need the resource
"FontSet-default-fontset".
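The way a fontset is assembled from these resources can be modelled in
Python.  This is a sketch, not Mule's C code, and every function name is
hypothetical.  It assumes the X resource manager has already joined the
backslash-newline continuations, and, following the text, it keys each
font by its CHARSET_REGISTRY field (taken as the second-to-last
"-"-separated field, so abbreviated patterns still work).

```python
def registry_of(font_name):
    """Return the CHARSET_REGISTRY field of an XLFD-style font name.

    The field is the second-to-last '-'-separated field, so abbreviated
    patterns such as "-etl-fixed-medium-r-*--16-*-iso8859-1" (where one
    '*' spans several fields) are handled too.
    """
    fields = font_name.strip().split("-")
    if len(fields) < 3:
        raise ValueError("not an XLFD-style name: " + font_name)
    registry = fields[-2]
    if not registry or "*" in registry or "?" in registry:
        raise ValueError("CHARSET_REGISTRY must be explicit: " + font_name)
    return registry


def parse_fontset(value):
    """Map each font in a FontSet-xxx value to its CHARSET_REGISTRY.

    Because fonts are keyed by registry, their order in VALUE does not
    matter.  Whitespace left by line continuations is stripped.
    """
    names = [n.strip() for n in value.split(",") if n.strip()]
    return {registry_of(n): n for n in names}


def build_fontset(value, default_fontset):
    """Start from DEFAULT_FONTSET and override the fonts listed in VALUE."""
    fontset = dict(default_fontset)
    fontset.update(parse_fontset(value))
    return fontset


DEFAULT_FONTSET = parse_fontset(
    "-*-fixed-medium-r-*--16-*-iso8859-1,"
    "-*-fixed-medium-r-*--*-jisx0208.1983-*")
```

With a value listing only an ISO8859-1 font, `build_fontset' keeps the
JISX0208 font of DEFAULT_FONTSET, matching the fallback rule described
above.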


File: mule, Node: Specify fontset, Next: Manage fontset, Prev: Initial fontsets, Up: Font

How to specify a fontset?
=========================

You can specify a fontset anywhere you can specify a font.

To change the fontset used for the first frame of Mule:

  1. The command line argument "-fn xxx" or "-font xxx"

     If this argument exists, Mule looks for a fontset in the following
     order:
       1. A fontset whose name is "xxx".
       2. A fontset which contains the ASCII font "xxx".
       3. Otherwise, a new fontset "xxx" containing the ASCII font "xxx"
          is created.

  2. In your ~/.emacs,

          (setq default-frame-alist
                (cons '(font . "xxx") default-frame-alist))
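The search order for "-fn xxx" can be modelled as follows.  This is a
Python sketch, not Mule's code: fontsets are plain dictionaries, the
ASCII font is stored under a hypothetical "ascii" key, and the way Mule
fills the other character sets of a newly created fontset is not
modelled.

```python
ASCII = "ascii"  # hypothetical key for the ASCII font inside a fontset


def find_fontset(xxx, fontsets):
    """Resolve a "-fn xxx" argument to (fontset name, fontset).

    FONTSETS maps fontset names to dicts of {charset: font name}.
    """
    # 1. A fontset whose name is "xxx".
    if xxx in fontsets:
        return xxx, fontsets[xxx]
    # 2. A fontset which contains the ASCII font "xxx".
    for name, fontset in fontsets.items():
        if fontset.get(ASCII) == xxx:
            return name, fontset
    # 3. Otherwise create a new fontset "xxx" whose ASCII font is "xxx".
    fontsets[xxx] = {ASCII: xxx}
    return xxx, fontsets[xxx]
```

Because step 1 is tried first, a fontset name always wins over a font
name that happens to be spelled the same way.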


To change a fontset after Mule has started:

  1. By the command

          M-x set-default-fontset<CR>xxx<CR>

  2. By Ctl-Mouse-3



File: mule, Node: Manage fontset, Prev: Specify fontset, Up: Font

How to create or modify a fontset?
==================================

You can create a new fontset by `new-fontset' and modify an existing
fontset by `set-fontset-font'.

You can get a list of the fontsets created so far by `fontset-list'.

You can check whether a fontset has already been created by `fontsetp'.



Tag table:
Node: Top242
Node: Overview1420
Node: Character1782
Node: Character type1994
Node: Buffer and string2667
Node: Character object4476
Node: GLYPH5894
Node: Functions6204
Node: Character set11496
Node: Coding-system16447
Node: Structure17347
Node: Creation20410
Node: Predefined coding-system21483
Node: Automatic conversion24828
Node: Mode-line28449
Node: ISO2022 restriction30011
Node: Big531239
Node: Syntax32862
Node: Font34161
Node: Initial fontsets34837
Node: Specify fontset37052
Node: Manage fontset37828

End tag table