comparison src/coding.c @ 88485:e92f62c0073e
Doc fixes.
(Fdefine_coding_system_alias): Use names, not symbols, in
coding-system-alist.
| author | Dave Love <fx@gnu.org> |
|---|---|
| date | Mon, 13 May 2002 17:50:19 +0000 |
| parents | 5f974cbba7b3 |
| children | d2b9e0d4c2f6 |
comparison
| 88484:3667d64a1787 | 88485:e92f62c0073e |
|---|---|
| 44 /*** 0. General comments *** | 44 /*** 0. General comments *** |
| 45 | 45 |
| 46 | 46 |
| 47 CODING SYSTEM | 47 CODING SYSTEM |
| 48 | 48 |
| 49 Coding system is an object for a encoding mechanism that contains | 49 A coding system is an object for an encoding mechanism that contains |
| 50 information about how to convert byte sequence to character | 50 information about how to convert byte sequences to character |
| 51 sequences and vice versa. When we say "decode", it means converting | 51 sequences and vice versa. When we say "decode", it means converting |
| 52 a byte sequence of a specific coding system into a character | 52 a byte sequence of a specific coding system into a character |
| 53 sequence that is represented by Emacs' internal coding system | 53 sequence that is represented by Emacs' internal coding system |
| 54 `emacs-utf-8', and when we say "encode", it means converting a | 54 `emacs-utf-8', and when we say "encode", it means converting a |
| 55 character sequence of emacs-utf-8 to a byte sequence of a specific | 55 character sequence of emacs-utf-8 to a byte sequence of a specific |
| 56 coding system. | 56 coding system. |
| 57 | 57 |
| 58 In Emacs Lisp, a coding system is represented by a Lisp symbol. In | 58 In Emacs Lisp, a coding system is represented by a Lisp symbol. In |
| 59 C level, a coding system is represented by a vector of attributes | 59 C level, a coding system is represented by a vector of attributes |
| 60 stored in the hash table Vcharset_hash_table. The conversion from a | 60 stored in the hash table Vcharset_hash_table. The conversion from |
| 61 coding system symbol to attributes vector is done by looking up | 61 coding system symbol to attributes vector is done by looking up |
| 62 Vcharset_hash_table by the symbol. | 62 Vcharset_hash_table by the symbol. |
| 63 | 63 |
| 64 Coding systems are classified into the following types depending on | 64 Coding systems are classified into the following types depending on |
| 65 the mechanism of encoding. Here's a brief descrition about type. | 65 the encoding mechanism. Here's a brief description of the types. |
| 66 | 66 |
| 67 o UTF-8 | 67 o UTF-8 |
| 68 | 68 |
| 69 o UTF-16 | 69 o UTF-16 |
| 70 | 70 |
| 71 o Charset-base coding system | 71 o Charset-base coding system |
| 72 | 72 |
| 73 A coding system defined by one or more (coded) character sets. | 73 A coding system defined by one or more (coded) character sets. |
| 74 Decoding and encoding are done by code converter defined for each | 74 Decoding and encoding are done by a code converter defined for each |
| 75 character set. | 75 character set. |
| 76 | 76 |
| 77 o Old Emacs' internal format (emacs-mule) | 77 o Old Emacs internal format (emacs-mule) |
| 78 | 78 |
| 79 The coding system adopted by an old versions of Emacs (20 and 21). | 79 The coding system adopted by old versions of Emacs (20 and 21). |
| 80 | 80 |
| 81 o ISO2022-base coding system | 81 o ISO2022-base coding system |
| 82 | 82 |
| 83 The most famous coding system for multiple character sets. X's | 83 The most famous coding system for multiple character sets. X's |
| 84 Compound Text, various EUCs (Extended Unix Code), and coding systems | 84 Compound Text, various EUCs (Extended Unix Code), and coding systems |
| 99 lowercase), we mean the coding system, and when we write "Big5" | 99 lowercase), we mean the coding system, and when we write "Big5" |
| 100 (capitalized), we mean the character set. | 100 (capitalized), we mean the character set. |
| 101 | 101 |
| 102 o CCL | 102 o CCL |
| 103 | 103 |
| 104 If a user wants to decode/encode a text encoded in a coding system | 104 If a user wants to decode/encode text encoded in a coding system |
| 105 not listed above, he can supply a decoder and an encoder for it in | 105 not listed above, he can supply a decoder and an encoder for it in |
| 106 CCL (Code Conversion Language) programs. Emacs executes the CCL | 106 CCL (Code Conversion Language) programs. Emacs executes the CCL |
| 107 program while decoding/encoding. | 107 program while decoding/encoding. |
| 108 | 108 |
| 109 o Raw-text | 109 o Raw-text |
| 110 | 110 |
| 111 A coding system for a text containing raw eight-bit data. Emacs | 111 A coding system for a text containing raw eight-bit data. Emacs |
| 112 treat each byte of source text as a character (except for | 112 treats each byte of source text as a character (except for |
| 113 end-of-line conversion). | 113 end-of-line conversion). |
| 114 | 114 |
| 115 o No-conversion | 115 o No-conversion |
| 116 | 116 |
| 117 Like raw text, but don't do end-of-line conversion. | 117 Like raw text, but don't do end-of-line conversion. |
| 118 | 118 |
| 119 | 119 |
| 120 END-OF-LINE FORMAT | 120 END-OF-LINE FORMAT |
| 121 | 121 |
| 122 How end-of-line of a text is encoded depends on a system. For | 122 How text end-of-line is encoded depends on operating system. For |
| 123 instance, Unix's format is just one byte of LF (line-feed) code, | 123 instance, Unix's format is just one byte of LF (line-feed) code, |
| 124 whereas DOS's format is two-byte sequence of `carriage-return' and | 124 whereas DOS's format is two-byte sequence of `carriage-return' and |
| 125 `line-feed' codes. MacOS's format is usually one byte of | 125 `line-feed' codes. MacOS's format is usually one byte of |
| 126 `carriage-return'. | 126 `carriage-return'. |
| 127 | 127 |
| 128 Since text characters encoding and end-of-line encoding are | 128 Since text character encoding and end-of-line encoding are |
| 129 independent, any coding system described above can take any format | 129 independent, any coding system described above can take any format |
| 130 of end-of-line (except for no-conversion). | 130 of end-of-line (except for no-conversion). |
| 131 | 131 |
| 132 STRUCT CODING_SYSTEM | 132 STRUCT CODING_SYSTEM |
| 133 | 133 |
| 134 Before using a coding system for code conversion (i.e. decoding and | 134 Before using a coding system for code conversion (i.e. decoding and |
| 135 encoding), we setup a structure of type `struct coding_system'. | 135 encoding), we setup a structure of type `struct coding_system'. |
| 136 This structure keeps various information about a specific code | 136 This structure keeps various information about a specific code |
| 137 conversion (e.g. the location of source and destination data). | 137 conversion (e.g. the location of source and destination data). |
| 138 | 138 |
| 139 */ | 139 */ |
| 140 | 140 |
| 141 /* COMMON MACROS */ | 141 /* COMMON MACROS */ |
| 142 | 142 |
| 7626 | 7626 |
| 7627 ASET (spec, 2, subsidiaries); | 7627 ASET (spec, 2, subsidiaries); |
| 7628 } | 7628 } |
| 7629 | 7629 |
| 7630 Fputhash (alias, spec, Vcoding_system_hash_table); | 7630 Fputhash (alias, spec, Vcoding_system_hash_table); |
| 7631 Vcoding_system_alist = Fcons (Fcons (alias, Qnil), Vcoding_system_alist); | 7631 Vcoding_system_alist = Fcons (Fcons (Fsymbol_name (alias), Qnil), |
| | 7632 Vcoding_system_alist); |
| 7632 | 7633 |
| 7633 return Qnil; | 7634 return Qnil; |
| 7634 } | 7635 } |
| 7635 | 7636 |
| 7636 DEFUN ("coding-system-base", Fcoding_system_base, Scoding_system_base, | 7637 DEFUN ("coding-system-base", Fcoding_system_base, Scoding_system_base, |
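
The END-OF-LINE FORMAT comment in the hunk above says that end-of-line encoding is independent of character encoding. As a rough illustration only, the following standalone C sketch converts between the three formats the comment names (LF, CR LF, CR). The names `eol_format`, `decode_eol`, and `encode_eol` are hypothetical and chosen for this example; they are not the identifiers or the implementation used in coding.c.

```c
#include <stddef.h>

/* Hypothetical names for this sketch; not taken from coding.c.  */
enum eol_format { EOL_UNIX, EOL_DOS, EOL_MAC };

/* Decode SRC (SRC_LEN bytes) into DST, turning each external
   end-of-line sequence into a single LF.  DST must have room for
   SRC_LEN bytes, because decoding never grows the text.  Returns
   the number of bytes written to DST.  */
size_t
decode_eol (const char *src, size_t src_len, char *dst, enum eol_format fmt)
{
  size_t i, n = 0;

  for (i = 0; i < src_len; i++)
    {
      char c = src[i];

      if (fmt == EOL_DOS && c == '\r'
          && i + 1 < src_len && src[i + 1] == '\n')
        {
          /* CR LF collapses to one LF; skip the LF we just consumed.  */
          dst[n++] = '\n';
          i++;
        }
      else if (fmt == EOL_MAC && c == '\r')
        dst[n++] = '\n';
      else
        dst[n++] = c;
    }
  return n;
}

/* Encode SRC (SRC_LEN bytes) into DST, turning each LF into the
   external end-of-line sequence for FMT.  DST must have room for
   2 * SRC_LEN bytes, the worst case for EOL_DOS.  Returns the
   number of bytes written to DST.  */
size_t
encode_eol (const char *src, size_t src_len, char *dst, enum eol_format fmt)
{
  size_t i, n = 0;

  for (i = 0; i < src_len; i++)
    {
      if (src[i] != '\n' || fmt == EOL_UNIX)
        dst[n++] = src[i];
      else if (fmt == EOL_DOS)
        {
          dst[n++] = '\r';
          dst[n++] = '\n';
        }
      else
        /* EOL_MAC: a lone CR marks the end of a line.  */
        dst[n++] = '\r';
    }
  return n;
}
```

Because the sketch works on bytes before or after any character decoding, the same two functions could in principle be paired with any byte-to-character converter, which is the independence the comment describes. The no-conversion type would simply skip this step.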
