Info file: mule,    -*-Text-*-
produced by `texinfo-format-buffer'
from file `mule.texi'
using `texinfmt.el' version 2.32 of 19 November 1993.

This file documents the internal structure of Mule (MULtilingual
Enhancement to GNU Emacs).


File: mule, Node: Top, Next: Overview, Up: (dir)

Mule specific information
*************************

This file documents Mule-specific information.

We are very sorry for the poor structure of this file.  Currently, it
is just a collection of several documents that were previously located
under the `doc' directory.

* Menu:

* Overview::            Overview of the internal structure of Mule
* Character::           Internal representation of multilingual text
* Coding-system::       Encoding of text while reading/writing
* Syntax::              Extended syntax and character category
* Font::                Font handling while displaying text on X
* CCL: (CCL).           Code Conversion Language
* Terminology: (terminology).
* Languages: (languages).  Language specific tips.
* X's font: (XFONT).    X's FONT usage for novice users
* R2L: (R2L).           Right-to-left writing
* EGG: (egg).           Japanese/Chinese input methods using Wnn/cWnn
* Quail: (quail).       Input methods for multilingual text
* Keyboard Translation: (kbd-trans).
* ISO2022: (ISO2022).   ISO2022 encoding mechanism
* m2ps: (m2ps).         Convert multilingual text to PostScript


File: mule, Node: Overview, Next: Character, Prev: Top, Up: Top

Overview
========

To handle multilingual text, Mule extends GNU Emacs in many respects.
Mule uses a special internal representation of multilingual text,
converts text to and from external representations on reading and
writing, and uses a special font selection mechanism to display
multilingual text on the X Window System.


File: mule, Node: Character, Next: Coding-system, Prev: Overview, Up: Top

Character
=========

* Menu:

* Character type::
* Buffer and string::
* Character object::
* GLYPH::
* Functions::
* Character set::


File: mule, Node: Character type, Next: Buffer and string, Up: Character

Character type
==============

There are 6 types of character.
`Type N-M' means that an original N-byte code is represented by M
bytes within Mule.

*Type 1-1*
     ASCII characters
*Type 1-2*
     Characters in a one-byte official character-set (e.g. ISO8859-1,
     Latin-1)
*Type 1-3*
     Characters in a one-byte private character-set
*Type 2-3*
     Characters in a two-byte official character-set (e.g. JISX0208,
     Japanese)
*Type 2-4*
     Characters in a two-byte private character-set
*Type N*
     Composite characters of variable length

*Note Character set:: for predefined character sets.


File: mule, Node: Buffer and string, Next: Character object, Prev: Character type, Up: Character

Buffer and string
=================

*Type 1-1*
     1-byte 'C' [C <= 0x7F] (same as the original representation)

*Type 1-2*
     2-byte sequence 'LC1 C1', where
          LC1 = leading character for the character-set,
                0x81 <= LC1 <= 0x8F -- 15 sets
          C1  = 0x80 | (original byte for the character),
                0xA0 <= C1 <= 0xFF

*Type 1-3*
     3-byte sequence 'LCPRV1 LC12 C1', where
          LCPRV1 = 0x9A (for one column) or 0x9B (for two column)
          LC12   = extended leading character,
                   0xA0 <= LC12 <= 0xDF (if LCPRV1 = 0x9A) -- 64 sets
                   0xE0 <= LC12 <= 0xEF (if LCPRV1 = 0x9B) -- 16 sets
          C1     = same as above

*Type 2-3*
     3-byte sequence 'LC2 C21 C22', where
          LC2 = leading character for the character-set,
                0x90 <= LC2 <= 0x99 -- 10 sets
          C21 = 0x80 | (original first byte for the character),
          C22 = 0x80 | (original second byte for the character),
                0xA0 <= C21,C22 <= 0xFF

*Type 2-4*
     4-byte sequence 'LCPRV2 LC22 C21 C22', where
          LCPRV2 = 0x9C (for one column) or 0x9D (for two column)
          LC22   = extended leading character,
                   0xF0 <= LC22 <= 0xF4 (if LCPRV2 = 0x9C) -- 5 sets
                   0xF5 <= LC22 <= 0xFE (if LCPRV2 = 0x9D) -- 10 sets
          C21, C22 = same as above

*Type N*
     n-byte sequence 'LCCMP LCN1 C11 ... LCN2 C21 ... LCNn Cn1 ...'
     All characters 'LCN1 C11 ... LCN2 C21 ... LCNn Cn1 ...' are
     displayed in the same column.
          LCCMP = 0x80
          LCN1 .. LCNn = leading character + 0x20, but, for ASCII, 0xA0

Here is an example of text with a mixture of these types (the real
binary code appears in place of each 0x??):

     "Here comes Latin-1 character of n with ~ '0x81 0xF1' and
     here comes Japanese Hiragana '0x92 0xA4 0xA2'."


File: mule, Node: Character object, Next: GLYPH, Prev: Buffer and string, Up: Character

Character object
================

Emacs Lisp treats a character object as an integer of value less than
256 (8 bits).  Mule extends the character object to 19 bits, divided
into 3 fields:

     f1 (5 bits) : f2 (7 bits) : f3 (7 bits)

*Type 1-1: C [C <= 0x7F] (same as the character code itself)*
          0:00:00-7F
*Type 1-2: (((LC1 & 0x7F) + 0x10) << 7) | (C1 & 0x7F)*
          0:11-1F:20-7F (f1=0, f2=(LC1&0x7F)+0x10, f3=C1&0x7F)
*Type 1-3: (((LC12 & 0x7F) + 0x10) << 7) | (C1 & 0x7F)*
          0:30-7F:20-7F (f1=0, f2=(LC12&0x7F)+0x10, f3=C1&0x7F)
*Type 2-3: ((LC2 - 0x8F) << 14) | ((C21 & 0x7F) << 7) | (C22 & 0x7F)*
          01-0A:20-7F:20-7F (f1=LC2-0x8F, f2=C21&0x7F, f3=C22&0x7F)
*Type 2-4: ((LC22 - 0xE0) << 14) | ((C21 & 0x7F) << 7) | (C22 & 0x7F)*
          10-1E:20-7F:20-7F (f1=LC22-0xE0, f2=C21&0x7F, f3=C22&0x7F)
*Type N:*
          1F:00-7F:00-7F

For instance, if `?' is followed by the Type 1-2 character
'0x81 0xF1', 2289 [= 0x8F1 = (((0x81 & 0x7F) + 0x10) << 7) |
(0xF1 & 0x7F)] is returned.

Several blocks are not defined in the above table.  They are used
internally to represent incomplete characters.

     0:01-12:00      leading-char only or invalid char
     0:20-5F:00      LCPRV11/LCPRV12 + LC12 of Type 1-3
     1-8:20-7F:00    LC2 + C21 of Type 2-3
     9-E:00:00       LCPRV21/LCPRV22 + LC22 of Type 2-4
     9-E:20-7F:00    LCPRV21/LCPRV22 + LC22 + C21 of Type 2-4


File: mule, Node: GLYPH, Next: Functions, Prev: Character object, Up: Character

GLYPH
=====

The original definition of GLYPH is (FACE-ID << 8 | CHAR).  Since Mule
requires 19 bits for CHAR, however, the definition is changed to
(FACE-ID << 19 | CHAR).  So, we can use only 2048 (= 2^11) different
faces.


File: mule, Node: Functions, Next: Character set, Prev: GLYPH, Up: Character

Functions
=========

To handle multilingual characters, we extended or added the following
functions:

In editfns.c ...
char-to-string:
     Convert arg CHAR to a string containing that character.
     If CHAR < 0, it is considered a multilingual character, and the
     correct string is returned.
     Example:
          (char-to-string ?A) => "A"
          (char-to-string ?あ) => "あ"
          (char-to-string 53794) => "あ"

string-to-char:
     Convert arg STRING to a character, the first character of that
     string.
     Example:
          (string-to-char "ABあい") => 65 (== ?A)
          (string-to-char "あい") => 53794 (== ?あ)

sref:
     Return the character in STRING at index INDEX.  INDEX starts at 0.
     If INDEX does not point to a character boundary, -1 is returned.
     Example:
          (sref "ABあい" 1) => 66 (== ?B)
          (sref "ABあい" 2) => 53794 (== ?あ)
          (sref "ABあい" 3) => -1 (not a character boundary)
          (sref "ABあい" 5) => 53796 (== ?い)

sset:
     Store into STRING at index INDEX the character CHAR.
     INDEX should point to a character occupying the same number of
     bytes as CHAR.  If not, returns nil, else returns CHAR.
     Example:
          (setq s "ABあい")
          (sset s 1 ?C) => ?C (s == "ACあい")
          (sset s 2 ?う) => ?う (s == "ACうい")
          (sset s 2 ?A) => ?A (s == "ACA\244\246い")
          (sset s 8 ?A) => nil (out of range)

following-char:
     Return the character following point, as a number.
     If mc-flag of the current buffer is not nil, the returned
     character may be a multi-byte character.
     Example: if the cursor is at 'あ' of buffer "..Aあ..",
          (following-char) => 53794 (== ?あ)
          (let ((mc-flag nil)) (following-char t))
               => 146 (== leading char of ?あ)

preceding-char:
     Return the character preceding point, as a number.
     If mc-flag of the current buffer is not nil, the returned
     character may be a multi-byte character.
     Example: if the cursor is at 'A' of buffer "..あA..",
          (preceding-char) => 53794 (== ?あ)
          (let ((mc-flag nil)) (preceding-char t))
               => 162 (== last byte of ?あ)

char-after:
     First arg, POS, is a number.  Return the character in the current
     buffer at position POS.  If POS is out of range, the value is nil.
     If mc-flag of the current buffer is not nil, the returned
     character may be a multi-byte character.
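The byte-index behaviour of `sref' can be modelled outside Mule.  The
following Python sketch is not Mule code: it assumes the buffer
representation and the Type 2-3 formula given in the nodes `Buffer and
string' and `Character object', covers only Type 1-1 and Type 2-3
characters, and uses made-up helper names (`char_bytes', `decode_at',
`sref').  It returns -1 for any index that is not a character boundary,
including an index past the end, which the real function may treat
differently.

```python
# Sketch only: models Type 1-1 (ASCII) and Type 2-3 (two-byte official
# character sets) as described in this file.  All names are hypothetical.

def char_bytes(lead: int) -> int:
    """Return the byte length of the character that starts with LEAD."""
    if lead <= 0x7F:
        return 1                      # Type 1-1: ASCII
    if 0x90 <= lead <= 0x99:
        return 3                      # Type 2-3: LC2 C21 C22
    raise ValueError("leading byte not covered by this sketch")

def decode_at(s: bytes, i: int) -> int:
    """Return the character object for the character starting at byte I."""
    if char_bytes(s[i]) == 1:
        return s[i]
    lc2, c21, c22 = s[i], s[i + 1], s[i + 2]
    return ((lc2 - 0x8F) << 14) | ((c21 & 0x7F) << 7) | (c22 & 0x7F)

def sref(s: bytes, index: int) -> int:
    """Model of `sref': -1 when INDEX is not at a character boundary."""
    i = 0
    while i < len(s):
        if i == index:
            return decode_at(s, i)
        i += char_bytes(s[i])         # step over one whole character
    return -1                         # inside a character, or past the end

# "ABあい" in the internal representation (lc-jp = 0x92 = 146).
S = b"AB" + bytes([0x92, 0xA4, 0xA2, 0x92, 0xA4, 0xA4])
print([sref(S, k) for k in (1, 2, 3, 5)])   # [66, 53794, -1, 53796]
```

The printed values agree with the `sref' examples above: index 3 falls
on the second byte of the three-byte sequence for あ, so it is not a
character boundary.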
The functions `insert' and `insert-char' also work correctly with
multilingual characters.
          (insert ?あ) -- inserts "あ" at point.

buffer-substring:
     Return the contents of part of the current buffer as a string.
     The two arguments specify the start and end, as character numbers.
     If mc-flag of the current buffer is non-nil, the region may be
     widened to meet character boundaries.
     Example: if a buffer starts with contents like "あいう...",
          (buffer-substring 1 2) => "あ"
          (buffer-substring 1 3) => "あ"
          (buffer-substring 2 4) => "あい"
     Other functions which deal with a region also widen the range
     automatically.

subst-char-in-region:
     From START to END, replace FROMCHAR with TOCHAR each time it
     occurs.  If optional arg NOUNDO is non-nil, don't record this
     change for undo and don't mark the buffer as really changed.
     It also works with multilingual characters, but only if the
     substitution doesn't alter the length of the buffer.
     Example:
          (subst-char-in-region 1 10 ?a ?b) => possible
          (subst-char-in-region 1 10 ?あ ?い) => possible
          (subst-char-in-region 1 10 ?a ?あ) => impossible

In the functions `message' and `format', %c works with multilingual
characters.
          (message "%c" ?あ) -- shows "あ" in the echo area.

In mule.c ...

make-character:
     Make a multi-byte character from LEADING-CHAR and optional args
     ARG1 and ARG2.
     Example:
          (make-character lc-jp ?\244 ?\242) => 53794 (== ?あ)

char-component:
     Return a component of multi-byte character CHAR.
     Second arg IDX indicates which component should be returned, as
     follows.
          0: leading character or extended leading character,
          1: first byte of the character code,
          2: second byte of the character code.
     If the character does not have the component, 0 is returned.
     Example:
          (char-component ?あ 0) => 146 (== lc-jp)
          (char-component ?あ 1) => 164
          (char-component ?あ 2) => 162
          (char-component ?A 1) => 0

char-leading-char:
     Return the leading character of CHAR.
     If CHAR is not a multi-byte code, 0 is returned.
     Example:
          (char-leading-char ?あ) => 146 (== lc-jp)
          (char-leading-char ?A) => 0

char-bytes:
     Return the number of bytes CHAR will occupy in a buffer.
     You can ask about a whole character set by providing its leading
     character as CHAR.
     Example:
          (char-bytes ?あ) => 3
          (char-bytes ?A) => 1
          (char-bytes lc-jp) => 3

char-width:
     Return the number of columns CHAR will occupy when displayed.
     You can ask about a whole character set by providing its leading
     character as CHAR.
     Example:
          (char-width ?あ) => 2
          (char-width ?A) => 1
          (char-width lc-jp) => 2

chars-in-string:
     Return the number of characters in STRING.
     Each multilingual character is counted as one.
     Example:
          (chars-in-string "ABあい") => 4

char-boundary-p:
     Return a non-nil value if POS is at a character boundary.
     The value is:
          0: if POS is at an ASCII character or at the end of range,
          1: if POS is at the leading char of a 2-byte character,
          2: if POS is at the leading char of a 3-byte character.
     If POS is out of range or not at a character boundary, nil is
     returned.


File: mule, Node: Character set, Prev: Functions, Up: Character

Character set
=============

A character set is a set of ordered characters such as ASCII, the
right half of ISO8859-1, JIS X0208, etc.  Mule identifies a character
set by a leading-char assigned uniquely to each set.

Each character-set is characterized by the following attributes:

  1. Byte length of code: 1-byte or 2-byte
          ISO8859-1, right half of JISX0201 (Japanese Katakana) -- 1-byte
          GB2312-1980 (Chinese), JISX0208 (Japanese) -- 2-byte
  2. Columns occupied on a screen: 1-column or 2-column
          ISO8859-1, right half of JISX0201 (Japanese Katakana) -- 1-column
          GB2312-1980 (Chinese), JISX0208 (Japanese) -- 2-column
  3. Type: 94-char-set, 96-char-set, 94x94-char-set, or 96x96-char-set
  4. Graphic set: GL or GR
  5. Final character: one of '0' thru '~'
  6. Displaying direction: left-to-right or right-to-left
  7. Leading character: the system assigns them one by one.

Attributes 3 thru 5 are ISO2022 notions.

Character-sets are defined by calls to the function
`new-character-set'.
--- mule.c ---------------------------------------------------------
DEFUN ("new-character-set", Fnew_character_set, Snew_character_set,
       8, MANY, 0,
 "Define new character set of LEADING-CHAR (1st arg).\n\
Rest of args are:\n\
 BYTE: 1, 2, or 3\n\
 COLUMNS: 1 or 2\n\
 TYPE: 0 (94 chars), 1 (96 chars), 2 (94x94 chars), or 3 (96x96 chars)\n\
 GRAPHIC: 0 (use g0 on output) or 1 (use g1 on output)\n\
 FINAL: final character of ISO escape sequence\n\
 DIRECTION: 0 (left-to-right) or 1 (right-to-left)\n\
 DOC: short description string.\n\
If LEADING-CHAR >= 0xA0, it is regarded as extended leading-char\n\
 and BYTE and COLUMNS args are ignored.")
------------------------------------------------------------

The system pre-defines the following character-sets.

--- mule.el ---------------------------------------------------------
(defconst *predefined-character-set*
  (list
   ;; (cons lc '(bytes width type graphic final direction doc))
   ;; (cons lc-ascii '(0 1 0 0 ?B 0 "ASCII" "ISO8859-1")) ;; predefined in C
   (cons lc-ltn1  '(1 1 1 1 ?A 0 "Latin-1" "ISO8859-1"))
   (cons lc-ltn2  '(1 1 1 1 ?B 0 "Latin-2" "ISO8859-2"))
   (cons lc-ltn3  '(1 1 1 1 ?C 0 "Latin-3" "ISO8859-3"))
   (cons lc-ltn4  '(1 1 1 1 ?D 0 "Latin-4" "ISO8859-4"))
   (cons lc-thai  '(1 1 1 1 ?T 0 "Thai" "TIS620"))
   (cons lc-grk   '(1 1 1 1 ?F 0 "Greek" "ISO8859-7"))
   (cons lc-arb   '(1 1 1 1 ?G 1 "Arabic" "ISO8859-6"))
   (cons lc-hbw   '(1 1 1 1 ?H 1 "Hebrew" "ISO8859-8"))
   (cons lc-kana  '(1 1 0 1 ?I 0 "Japanese Katakana" "JISX0201.1976"))
   (cons lc-roman '(1 1 0 0 ?J 0 "Japanese Roman" "JISX0201.1976"))
   (cons lc-crl   '(1 1 1 1 ?L 0 "Cyrillic" "ISO8859-5"))
   (cons lc-ltn5  '(1 1 1 1 ?M 0 "Latin-5" "ISO8859-9"))
   (cons lc-jpold '(2 2 2 0 ?@ 0 "Japanese Old" "JISX0208.1978"))
   (cons lc-cn    '(2 2 2 0 ?A 0 "Chinese" "GB2312"))
   (cons lc-jp    '(2 2 2 0 ?B 0 "Japanese" "JISX0208.\\(1983\\|1990\\)"))
   (cons lc-kr    '(2 2 2 0 ?C 0 "Korean" "KSC5601"))
   (cons lc-jp2   '(2 2 2 0 ?D 0 "Japanese Supplement" "JISX0212"))
   (cons lc-cns1  '(2 2 2 0 ?G 0 "CNS Plane1" "CNS11643.1"))
   (cons lc-cns2  '(2 2 2 0 ?H 0 "CNS Plane2" "CNS11643.2"))
   (cons lc-big5-1 '(2 2 2 0 ?0 0 "Big5 Level 1" "Big5"))
   (cons lc-big5-2 '(2 2 2 0 ?1 0 "Big5 Level 2" "Big5"))))

(let ((c *predefined-character-set*)
      lc data)
  (while c
    (setq lc (car (car c)) data (cdr (car c)))
    (apply 'new-character-set lc data)
    (setq c (cdr c))))
------------------------------------------------------------

In addition, the following private character sets are predefined.

--- mule-config.el -----------------------------------------
;; REGISTRATION OF PRIVATE CHARACTER SETS
;; PinYin-ZhuYin
(setq lc-sisheng
      (new-private-character-set 1 1 0 0 ?0 0 "PinYin-ZhuYin" "sisheng_cwnn"))
(setq lc-ascr2l
      (new-private-character-set 1 1 0 0 ?B 1 "Right-to-Left ASCII" "ISO8859-1"))
;; Vietnamese VISCII with two tables.
(setq lc-vn-1
      (new-private-character-set 1 1 1 1 ?1 0 "VISCII lower" "VISCII1.1"))
(setq lc-vn-2
      (new-private-character-set 1 1 1 1 ?2 0 "VISCII upper" "VISCII1.1"))
;; Three character sets for Arabic
(setq lc-arb0
      (new-private-character-set 1 1 0 0 ?2 0 "Arabic digit" "MuleArabic-0"))
(setq lc-arb1
      (new-private-character-set 1 1 0 0 ?3 1 "1-column Arabic" "MuleArabic-1"))
(setq lc-arb2
      (new-private-character-set 1 2 0 0 ?4 1 "2-column Arabic" "MuleArabic-2"))
;; for Mule IPA
(setq lc-ipa0
      (new-private-character-set 1 1 1 1 ?0 0 "IPA for Mule" "MuleIPA"))
------------------------------------------------------------


File: mule, Node: Coding-system, Next: Syntax, Prev: Character, Up: Top

Coding-system
=============

A `coding-system' is a method for encoding several character-sets.  It
is represented by a symbol which has the properties 'coding-system and
'eol-type.
You can specify a different coding-system for file I/O, process I/O,
output to a terminal (if not running on X), and input from a keyboard
(if not running on X).

* Menu:

* Structure::   Structure of coding-system
                  o Property 'coding-system
                  o Property 'eol-type
                  o Property 'post-read-conversion
                  o Property 'pre-write-conversion
* Creation::    How to create coding-system?
* Predefined coding-system::
* Automatic conversion::
                  o Category of coding-system
                  o How automatic conversion works?
                  o Priority of category
* Mode-line::   How coding-system is shown in mode-line?
* ISO2022 restriction::
* Big5::        Special treatment of Big5


File: mule, Node: Structure, Next: Creation, Prev: Coding-system, Up: Coding-system

Structure of coding-system
==========================

Property 'coding-system
-----------------------

The value of the property 'coding-system is either a vector:
     [ TYPE MNEMONIC DOCUMENT DUMMY FLAGS ]
or another coding-system.

The contents of the vector are:

  TYPE: nil: no conversion, t: automatic conversion,
        0: Internal, 1: Shift-JIS, 2: ISO2022, 3: Big5, 4: CCL.
  MNEMONIC: a character shown in the mode-line to indicate the
        coding-system.
  DOCUMENT: a description of the coding-system.
  DUMMY: always nil (for backward compatibility)
  FLAGS (optional): more precise information about the coding-system.

  If TYPE is 2 (ISO2022), FLAGS should be a list of:
     LC-G0, LC-G1, LC-G2, LC-G3:
          Leading character of the charset initially designated to
          the G? graphic set;
          nil means G? is not designated initially,
          lc-invalid means G? can never be designated to,
          if (- leading-char) is specified, it is designated on output,
     SHORT: non-nil - allow forms such as "ESC $ B",
          nil - always "ESC $ ( B",
     ASCII-EOL: non-nil - designate ASCII to g0 at end of line on
          output,
     ASCII-CNTL: non-nil - designate ASCII to g0 at control codes on
          output,
     SEVEN: non-nil - use 7-bit environment on output,
     LOCK-SHIFT: non-nil - use locking-shift (SO/SI) instead of
          single-shift or designation by escape sequence,
     USE-ROMAN: non-nil - designate JISX0201-1976-Roman instead of
          ASCII,
     USE-OLDJIS: non-nil - designate JISX0208-1978 instead of
          JISX0208-1983,
     NO-ISO6429: non-nil - don't use ISO6429's direction specification.
  If TYPE is 3 (Big5), FLAGS `t' means Big5-ETen and `nil' means
     Big5-HKU.
  If TYPE is 4 (CCL), FLAGS should be a cons of CCL programs for
     encoding and decoding.  See the documentation of CCL for more
     detail.

Property 'eol-type
------------------

The value of the property 'eol-type is:

     nil: no conversion for end-of-line type
     1: LF
     2: CRLF
     3: CR
     vector of length 3: automatic detection of end-of-line type.
          1st element: coding-system of eol-type LF
          2nd element: coding-system of eol-type CRLF
          3rd element: coding-system of eol-type CR

Property 'post-read-conversion
------------------------------

The value of the property 'post-read-conversion is a function to
convert some text just read into a buffer.  When the function is
called, the text has already been converted according to the
'coding-system and 'eol-type properties of the coding-system.  The
arguments of the function are the bounds (START and END) of the
inserted text.

Property 'pre-write-conversion
------------------------------

The value of the property 'pre-write-conversion is a function to
convert some text just before writing it out.  After the function is
called, the text is converted according to the 'coding-system and
'eol-type properties of the coding-system.  The arguments of the
function are the bounds (START and END) of the text.

File: mule, Node: Creation, Next: Predefined coding-system, Prev: Structure, Up: Coding-system

How to create coding-system?
============================

Mule provides a function `make-coding-system' to create a
coding-system.

FUNCTION make-coding-system:
          NAME TYPE MNEMONIC DOC &optional EOL-TYPE FLAGS
     Register symbol NAME as a coding-system whose 'coding-system
     property is the vector [ TYPE MNEMONIC DOC nil FLAGS ] and whose
     'eol-type property is EOL-TYPE.

     If `t' is specified as EOL-TYPE, the value of the 'eol-type
     property is a vector of generated coding-systems whose 'eol-type
     properties are 1 (LF), 2 (CRLF), and 3 (CR).  The names of the
     generated coding-systems are NAMEunix, NAMEdos, and NAMEmac
     respectively.

To just make an alias of some coding-system, call the function
`copy-coding-system'.

FUNCTION copy-coding-system: ORIGINAL ALIAS
     Make the same coding-system as ORIGINAL and name it ALIAS.
     If the 'eol-type property of ORIGINAL is a vector, coding-systems
     ALIASunix, ALIASdos, and ALIASmac are generated, and the
     'eol-type property of ALIAS becomes a vector of them.


File: mule, Node: Predefined coding-system, Next: Automatic conversion, Prev: Creation, Up: Coding-system

Predefined coding-system
========================

The following coding-systems are predefined in the file lisp/mule.el.

----- lisp/mule.el -----------------------------------------
(make-coding-system
 '*noconv* nil
 ?= "No conversion.")

(make-coding-system
 '*autoconv* t
 ?+ "Automatic conversion."
 t)

(make-coding-system
 '*internal* 0
 ?= "Internal coding-system used in a buffer.")

(make-coding-system
 '*sjis* 1
 ?S "Coding-system of Shift-JIS used in Japan."
 t)

(make-coding-system
 '*iso-2022-jp* 2
 ?J "Coding-system used for communication with mail and news in Japan."
 t
 (list lc-ascii lc-invalid lc-invalid lc-invalid
       'short 'ascii-eol 'ascii-cntl 'seven))

(copy-coding-system '*iso-2022-jp* '*junet*)

(make-coding-system
 '*oldjis* 2
 ?J "Coding-system used for old jis terminal."
 t
 (list lc-ascii lc-invalid lc-invalid lc-invalid
       'short 'ascii-eol 'ascii-cntl 'seven nil 'use-roman 'use-oldjis))

(make-coding-system
 '*ctext* 2
 ?X "Coding-system used in X as Compound Text Encoding."
 t
 (list lc-ascii lc-ltn1 lc-invalid lc-invalid
       nil 'ascii-eol))

(make-coding-system
 '*euc-japan* 2
 ?E "Coding-system of Japanese EUC (Extended Unix Code)."
 t
 (list lc-ascii lc-jp lc-kana lc-jp2
       'short 'ascii-eol 'ascii-cntl))

(make-coding-system
 '*euc-korea* 2
 ?K "Coding-system of Korean EUC (Extended Unix Code)."
 t
 (list lc-ascii lc-kr lc-invalid lc-invalid
       nil 'ascii-eol 'ascii-cntl))
;; 93.12.16 by K.Handa
(copy-coding-system '*euc-korea* '*euc-kr*)

(make-coding-system
 '*iso-2022-kr* 2
 ?k "Coding-System used for communication with mail in Korea."
 nil
 (list lc-ascii (- lc-kr) lc-invalid lc-invalid
       nil 'ascii-eol 'ascii-cntl 'seven 'lock-shift))
(copy-coding-system '*iso-2022-kr* '*korean-mail*)

(make-coding-system
 '*euc-china* 2
 ?C "Coding-system of Chinese EUC (Extended Unix Code)."
 t
 (list lc-ascii lc-cn lc-invalid lc-invalid
       nil 'ascii-eol 'ascii-cntl))

(make-coding-system
 '*iso-2022-ss2-8* 2
 ?I "ISO-2022 coding system using SS2 for 96-charset in 8-bit code."
 t
 (list lc-ascii lc-invalid nil lc-invalid
       nil 'ascii-eol 'ascii-cntl))

(make-coding-system
 '*iso-2022-ss2-7* 2
 ?I "ISO-2022 coding system using SS2 for 96-charset in 7-bit code."
 t
 (list lc-ascii lc-invalid nil lc-invalid
       'short 'ascii-eol 'ascii-cntl 'seven))

(make-coding-system
 '*iso-2022-lock* 2
 ?i "ISO-2022 coding system using Locking-Shift for 96-charset."
 t
 (list lc-ascii nil lc-invalid lc-invalid
       nil 'ascii-eol 'ascii-cntl 'seven
       'lock-shift))                    ;93.12.1 by H.Minamino

(make-coding-system
 '*big5-eten* 3
 ?B "Coding-system of BIG5-ETen."
 t t)

(make-coding-system
 '*big5-hku* 3
 ?B "Coding-system of BIG5-HKU."
 t nil)
------------------------------------------------------------


File: mule, Node: Automatic conversion, Next: Mode-line, Prev: Predefined coding-system, Up: Coding-system

Automatic conversion
====================

Category of coding-system
-------------------------

Mule has a facility to detect the coding-system of text automatically.
However, what Mule actually detects is not a coding-system itself but
a category of coding-system.  A category is also represented by a
symbol, and its value should be an actual coding-system.

There are eight categories:

*coding-category-internal*:
     coding-system used in a buffer
*coding-category-sjis*
     Shift-JIS
*coding-category-iso-7*
     ISO2022 variation with the following features:
       o no locking shift, no single shift
       o only G0 is used
*coding-category-iso-8-1*
     ISO2022 variation with the following features:
       o no locking shift
       o designation sequence is allowed only for G0 and G1
       o G1 is used only for 1-byte character sets
*coding-category-iso-8-2*
     ISO2022 variation with the following features:
       o no locking shift
       o designation sequence is allowed only for G0 and G1
       o G1 is used only for 2-byte character sets
*coding-category-iso-else*
     ISO2022 variation which doesn't satisfy any of the above.
*coding-category-big5*
     Big5 (ETen or HKU)
*coding-category-bin*
     Any other coding-system which uses the MSB.

The values of these symbols are pre-defined as follows:

----- lisp/mule.el -----------------------------------------
(defvar *coding-category-internal* '*internal*)
(defvar *coding-category-sjis* '*sjis*)
(defvar *coding-category-iso-7* '*junet*)
(defvar *coding-category-iso-8-1* '*ctext*)
(defvar *coding-category-iso-8-2* '*euc-japan*)
(defvar *coding-category-iso-else* '*iso-2022-ss2-7*)
(defvar *coding-category-big5* '*big5-eten*)
(defvar *coding-category-bin* '*noconv*)
------------------------------------------------------------

but some of them are overridden in language-specific files such as
japanese.el, chinese.el, etc.

How automatic conversion works?
-------------------------------

When the coding-system `*autoconv*' is specified on reading text (this
is the default), Mule tries to detect the category of coding-system by
which the text is encoded.  If an appropriate category is found, it
converts the text according to the coding-system bound to that
category.  If the 'eol-type property of the coding-system is a vector
of coding-systems and Mule detects the end-of-line type (LF, CRLF, or
CR) of the text, the corresponding coding-system of the vector is
used.

Automatic conversion occurs both on reading from files and on input
from a process.  In the latter case, if some coding-system is found,
the output-coding-system of the process is also set to the found
coding-system.

Priority of category
--------------------

If more than one category is found, the category of the highest
priority is selected.  The priority of categories is pre-defined as
follows:

----- lisp/mule.el -----------------------------------------
(set-coding-priority
 '(*coding-category-iso-8-2*
   *coding-category-sjis*
   *coding-category-iso-8-1*
   *coding-category-big5*
   *coding-category-iso-7*
   *coding-category-iso-else*
   *coding-category-bin*
   *coding-category-internal*))
------------------------------------------------------------

The function `set-coding-priority' puts a 'priority property on each
element of the argument, numbered from 0 to 7 (a smaller number means
a higher priority).  Some language-specific files may override this
priority.


File: mule, Node: Mode-line, Next: ISO2022 restriction, Prev: Automatic conversion, Up: Coding-system

How coding-system is shown in mode-line?
========================================

Each coding-system has a unique mnemonic (one character).  By default,
the mnemonic of the `file-coding-system' of a buffer is shown at the
left of the buffer's mode-line.  In addition, the mnemonic is followed
by another mnemonic that shows the eol-type of the coding-system.
This mnemonic is defined as follows:

     ".": LF
     ":": CRLF
     "'": CR
     "_": not yet decided
     "-": nil (for coding-system of nil, *noconv*, or *internal*)

So, the usual appearance of the mode-line for a buffer which is
visiting a file (*junet* encoding on a Unix system) is:

         +-- mnemonic of file-coding-system
         |+-- mnemonic of eol-type
         VV
     [--]J.:----Mule: filename

The leftmost bracket is the indicator for the input method.

When a buffer is attached to some process, the coding-systems for
input and output of the process are also shown, as follows:

         +-- mnemonic of file-coding-system
         |+-- mnemonic of eol-type of file-coding-system
         ||+-- mnemonic of input-coding-system of a process
         |||+-- mnemonic of eol-type of input-coding-system
         ||||+-- mnemonic of output-coding-system of a process
         |||||+-- mnemonic of eol-type of output-coding-system
         VVVVVV
     [--]+_+.--:--**-Mule: *shell*

This means that Mule is now communicating with the shell using the
coding-systems *autoconv*unix ("+.") for input and nil ("--") for
output.


File: mule, Node: ISO2022 restriction, Next: Big5, Prev: Mode-line, Up: Coding-system

ISO2022 restriction
===================

For coding-systems of TYPE 2 (ISO2022), we have the following
restrictions:

Locking-Shift:
     Use SI and SO only when decoding with a coding-system whose
     LOCK-SHIFT and SEVEN are both t.

Single-Shift:
     Use SS2 and SS3 (if SEVEN is nil) or ESC N and ESC O (if SEVEN
     is t).

Invocation:
     G0 is always invoked to GL, G1 to GR (but only if SEVEN is nil).
     G2 and G3 are invoked to GL by the Single-Shifts SS2 and SS3.

Unofficial use of ESC sequence for designation:
     If SEVEN is t, LOCK-SHIFT is nil, and designation to G2 and G3
     is prohibited, we must designate all character sets to G0 (and
     hence invoke them to GL).  To designate a 96 char-set to G0, we
     use "ESC ," followed by the final character.  For instance, to
     designate ISO8859-1 to G0, we use "ESC , A".

Unofficial use of ESC sequence for composite character:
     To indicate the start and end of a composite character, we use
     ESC 0 (start) and ESC 1 (end).
Text direction specifier of ISO6429:
     We use ISO6429's ESC sequence "ESC [ 2 ]" to change the text
     direction to right-to-left, and "ESC [ 0 ]" to revert it to
     left-to-right.


File: mule, Node: Big5, Prev: ISO2022 restriction, Up: Coding-system

Special treatment of Big5
=========================

As far as I know, there are several different codes called Big5.  The
best known are Big5-ETen and Big5-HKU-form2.  Since both of them use
the code range 0xa140 - 0xfefe (in each row, columns (second byte)
0x7f - 0xa0 are skipped) and the number of characters is more than
13000, it is impossible to treat either of them as a single
character-set in the current Mule system.

So, Mule treats them in a quite irregular manner, as described below:

  1. Mule does not treat them as different character sets, but as the
     same character set, called Big5.  Caution!!  Big5 is a different
     character set from GB.

  2. Mule divides Big5 into two sub-character-sets:
          0xa140 - 0xc67e (Level 1)
          0xc6a1 - 0xfefe (Level 2)
     and allocates the two leading-chars lc-big5-1 and lc-big5-2 to
     them.  (See character.txt)

  3. Usually, each leading-char (or character-set) has a unique
     character category.  But lc-big5-1 and lc-big5-2 have the same
     character category, of mnemonic 't'.  So, the regular expression
     "\\ct" matches any Big5 (Level 1 and Level 2) character.
     (See syntax.txt)

  4. If you specify an ISO2022 type coding-system on output, Mule
     converts Big5 code using the unofficial final-characters '0'
     (for Level 1) and '1' (for Level 2).

  5. You can use fonts of either ETen or HKU for displaying Big5
     code.  Mule judges which font is used by examining the existence
     of the character whose code point is 0xC6A1.  If it exists, the
     font is HKU; otherwise the font is ETen.


File: mule, Node: Syntax, Next: Font, Prev: Coding-system, Up: Top

Syntax and Category of character
================================

Syntax
------

Mule can define the syntax of all multi-byte characters with
`modify-syntax-entry'.  The first argument of `modify-syntax-entry'
should be one of the following:

  1. ASCII character
  2. multi-byte character
  3. leading character of multi-byte character
  4. partially defined character returned by:
          `(make-character leading-char arg)'

There is a restriction on specifying a matching character in the
second argument.  If the first argument specifies a multi-byte
character or the leading char of a multi-byte character, the matching
character should have the same leading character.  If the character is
a 2-byte code, its first byte should also be the same as the first
byte of the first argument.

Category
--------

Like syntax, category also defines characteristics of characters.
The differences are:

  1. Each character can have more than one category.
  2. Users can define new categories as they wish.
          Example: See japanese.el
  3. `char-category' returns all mnemonics of the character as a
     string.
  4. For regular expression search, you can use \cm or \Cm (any
     mnemonic may appear in place of 'm') instead of \sm and \Sm.


File: mule, Node: Font, Next: CCL, Prev: Syntax, Up: Top

Font
====

A FONTSET is a set of fonts which have the same height and style.  A
fontset should preferably contain enough fonts to display characters
of various character sets.

Mule uses a fontset instead of a font.  You can specify a fontset at
any place where you can specify a font.  You can still specify a font,
in which case a fontset which includes the font is searched for and
used.

Like a font, a fontset is identified by a string specifying its name.

* Menu:

* Initial fontsets::    Fontsets which Mule has at startup time.
* Specify fontset::     How to specify a fontset?
* Manage fontset::      How to create or modify a fontset?


File: mule, Node: Initial fontsets, Next: Specify fontset, Up: Font

"default-fontset"
-----------------

Mule automatically creates a fontset named "default-fontset" at
startup time.  Each font in this fontset is specified by a very
generic name such as "-*-fixed-medium-r-*--16-*-iso8859-1" for ASCII
and "-*-fixed-medium-r-*--*-jisx0208.1983-*" for JISX0208 (Kanji).
These values are defined in `lisp/term/x-win.el'.

If no other fontset is specified by X's resource, "default-fontset"
is used for the first frame of Mule.

In most cases, this is enough.  You probably don't need any other
fontsets.

X's resource
------------

Mule also creates the fontsets specified in X's resource
"fontSetList (class FontSetList)".  The value is a comma separated
list of fontset names.

     *FontSetList: 16,24

The actual contents of each fontset are specified by "fontSet-xxx
(class FontSet-xxx)", where "xxx" is the name of the corresponding
fontset.  The value of this resource is a comma separated list of
font names.

     *FontSet-16: -etl-fixed-medium-r-*--24-*-iso8859-1

A font name should not contain the wildcards `*' or `?' in the
CHARSET_REGISTRY field, because the character set of the font is
recognized by this field.  This means that you don't have to care
about the order of font names.  For instance,

     *FontSet-16:\
         -etl-fixed-medium-r-*--16-*-iso8859-1\
         -ming-fixed-medium-r-*--*-*-jisx0208.1983-*

is enough to tell Mule that the fontset "16" contains an ASCII font
and a JISX0208 font.  Please note that the second name has only a
wildcard in the PIXEL_SIZE field.  Since Mule tries to open a font of
the same PIXEL_SIZE as the ASCII font of the same fontset, you had
better not specify an actual value in the PIXEL_SIZE field except for
the ASCII font.

For fonts not listed in the specification of a fontset, the
corresponding font names in "default-fontset" are used.

The first fontset in FontSetList is used for the first frame of Mule.
If you want to use "default-fontset" while specifying other fontsets
in the resource, please put "default-fontset" first in the value.

     *FontSetList: default-fontset,16,24

In this case, you don't need the resource "FontSet-default-fontset".


File: mule, Node: Specify fontset, Next: Manage fontset, Prev: Initial fontsets, Up: Font

How to specify a fontset?
=========================

You can specify a fontset at any place where you can specify a font.
To change the fontset used for the first frame of Mule:

  1. Command line arguments "-fn xxx" or "-font xxx".
     If this argument exists, a fontset is searched for in the
     following order:
       1. A fontset whose name is "xxx".
       2. A fontset which contains ASCII font "xxx".
       3. Create a new fontset "xxx" which contains ASCII font "xxx".
  2. In your ~/.emacs,
          (setcdr (assoc 'font default-frame-alist) "xxx")

To change a fontset after Mule has started:

  1. By the command M-x set-default-fontset RET xxx
  2. By Ctl-Mouse-3


File: mule, Node: Manage fontset, Prev: Specify fontset, Up: Font

How to create or modify a fontset?
==================================

You can create a new fontset with `new-fontset' and modify an existing
fontset with `set-fontset-font'.

You can get a list of the fontsets currently created with
`fontset-list'.

You can check whether a fontset has already been created with
`fontsetp'.


Tag table:
Node: Top242
Node: Overview1420
Node: Character1782
Node: Character type1994
Node: Buffer and string2667
Node: Character object4476
Node: GLYPH5894
Node: Functions6204
Node: Character set11496
Node: Coding-system16447
Node: Structure17347
Node: Creation20410
Node: Predefined coding-system21483
Node: Automatic conversion24828
Node: Mode-line28449
Node: ISO2022 restriction30011
Node: Big531239
Node: Syntax32862
Node: Font34161
Node: Initial fontsets34837
Node: Specify fontset37052
Node: Manage fontset37828

End tag table