comparison src/coding.c @ 35053:e3e1ff3616fa

Commentary changes. (detect_eol_type_in_2_octet_form): Declare arg big_endian_p.
author Dave Love <fx@gnu.org>
date Thu, 04 Jan 2001 17:35:26 +0000
parents 8cd5e6ad71a2
children 36de5bf9969c
35052:07b5f5fdb0ce 35053:e3e1ff3616fa
35 */ 35 */
36 36
37 /*** 0. General comments ***/ 37 /*** 0. General comments ***/
38 38
39 39
40 /*** GENERAL NOTE on CODING SYSTEM *** 40 /*** GENERAL NOTE on CODING SYSTEMS ***
41 41
42 Coding system is an encoding mechanism of one or more character 42 A coding system is an encoding mechanism for one or more character
43 sets. Here's a list of coding systems which Emacs can handle. When 43 sets. Here's a list of coding systems which Emacs can handle. When
44 we say "decode", it means converting some other coding system to 44 we say "decode", it means converting some other coding system to
45 Emacs' internal format (emacs-internal), and when we say "encode", 45 Emacs' internal format (emacs-mule), and when we say "encode",
46 it means converting the coding system emacs-mule to some other 46 it means converting the coding system emacs-mule to some other
47 coding system. 47 coding system.
48 48
49 0. Emacs' internal format (emacs-mule) 49 0. Emacs' internal format (emacs-mule)
50 50
51 Emacs itself holds a multi-lingual character in a buffer and a string 51 Emacs itself holds a multi-lingual character in buffers and strings
52 in a special format. Details are described in section 2. 52 in a special format. Details are described in section 2.
53 53
54 1. ISO2022 54 1. ISO2022
55 55
56 The most famous coding system for multiple character sets. X's 56 The most famous coding system for multiple character sets. X's
64 JISX0208. Widely used for PC's in Japan. Details are described in 64 JISX0208. Widely used for PC's in Japan. Details are described in
65 section 4. 65 section 4.
66 66
67 3. BIG5 67 3. BIG5
68 68
69 A coding system to encode character sets: ASCII and Big5. Widely 69 A coding system to encode the character sets ASCII and Big5. Widely
70 used by Chinese (mainly in Taiwan and Hong Kong). Details are 70 used for Chinese (mainly in Taiwan and Hong Kong). Details are
71 described in section 4. In this file, when we write "BIG5" 71 described in section 4. In this file, when we write "BIG5"
72 (all uppercase), we mean the coding system, and when we write 72 (all uppercase), we mean the coding system, and when we write
73 "Big5" (capitalized), we mean the character set. 73 "Big5" (capitalized), we mean the character set.
74 74
75 4. Raw text 75 4. Raw text
76 76
77 A coding system for a text containing random 8-bit code. Emacs does 77 A coding system for text containing random 8-bit code. Emacs does
78 no code conversion on such a text except for end-of-line format. 78 no code conversion on such text except for end-of-line format.
79 79
80 5. Other 80 5. Other
81 81
82 If a user wants to read/write a text encoded in a coding system not 82 If a user wants to read/write text encoded in a coding system not
83 listed above, he can supply a decoder and an encoder for it in CCL 83 listed above, he can supply a decoder and an encoder for it as CCL
84 (Code Conversion Language) programs. Emacs executes the CCL program 84 (Code Conversion Language) programs. Emacs executes the CCL program
85 while reading/writing. 85 while reading/writing.
86 86
87 Emacs represents a coding system by a Lisp symbol that has a property 87 Emacs represents a coding system by a Lisp symbol that has a property
88 `coding-system'. But, before actually using the coding system, the 88 `coding-system'. But, before actually using the coding system, the
91 91
92 */ 92 */
93 93
94 /*** GENERAL NOTES on END-OF-LINE FORMAT *** 94 /*** GENERAL NOTES on END-OF-LINE FORMAT ***
95 95
96 How end-of-line of a text is encoded depends on a system. For 96 How end-of-line of text is encoded depends on the operating system.
97 instance, Unix's format is just one byte of `line-feed' code, 97 For instance, Unix's format is just one byte of `line-feed' code,
98 whereas DOS's format is two-byte sequence of `carriage-return' and 98 whereas DOS's format is two-byte sequence of `carriage-return' and
99 `line-feed' codes. MacOS's format is usually one byte of 99 `line-feed' codes. MacOS's format is usually one byte of
100 `carriage-return'. 100 `carriage-return'.
101 101
102 Since text characters encoding and end-of-line encoding are 102 Since text character encoding and end-of-line encoding are
103 independent, any coding system described above can take 103 independent, any coding system described above can have any
104 any format of end-of-line. So, Emacs has information of format of 104 end-of-line format. So Emacs has information about end-of-line
105 end-of-line in each coding-system. See section 6 for more details. 105 format in each coding-system. See section 6 for more details.
106 106
107 */ 107 */
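
As a rough illustration of the end-of-line note above, the following C sketch classifies the first line ending found in a byte buffer. The names `eol_kind` and `first_eol_kind` are hypothetical and do not come from coding.c; the real detection code in Emacs is more involved than this.

```c
#include <stddef.h>

/* Hypothetical names for illustration; these are not the
   CODING_EOL_XXX constants used in coding.c.  */
enum eol_kind { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR };

/* Return the kind of the first end-of-line found in the LEN bytes
   at BUF, or EOL_NONE if there is none.  */
static enum eol_kind
first_eol_kind (const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      if (buf[i] == '\n')
        return EOL_LF;          /* Unix: a lone line-feed */
      if (buf[i] == '\r')
        /* DOS: carriage-return followed by line-feed;
           otherwise a lone carriage-return.  */
        return (i + 1 < len && buf[i + 1] == '\n') ? EOL_CRLF : EOL_CR;
    }
  return EOL_NONE;
}
```

Because the check looks only at the bytes 0x0A and 0x0D, it is independent of how the other characters are encoded, which is the point the note makes. It would not work as written for two-octet forms such as UTF-16.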
108 108
109 /*** GENERAL NOTES on `detect_coding_XXX ()' functions *** 109 /*** GENERAL NOTES on `detect_coding_XXX ()' functions ***
110 110
111 These functions check if a text between SRC and SRC_END is encoded 111 These functions check if a text between SRC and SRC_END is encoded
112 in the coding system category XXX. Each returns an integer value in 112 in the coding system category XXX. Each returns an integer value in
113 which appropriate flag bits for the category XXX is set. The flag 113 which appropriate flag bits for the category XXX are set. The flag
114 bits are defined in macros CODING_CATEGORY_MASK_XXX. Below is the 114 bits are defined in macros CODING_CATEGORY_MASK_XXX. Below is the
115 template of these functions. If MULTIBYTEP is nonzero, 8-bit codes 115 template for these functions. If MULTIBYTEP is nonzero, 8-bit codes
116 of the range 0x80..0x9F are in multibyte form. */ 116 of the range 0x80..0x9F are in multibyte form. */
117 #if 0 117 #if 0
118 int 118 int
119 detect_coding_emacs_mule (src, src_end, multibytep) 119 detect_coding_emacs_mule (src, src_end, multibytep)
120 unsigned char *src, *src_end; 120 unsigned char *src, *src_end;
129 These functions decode SRC_BYTES length of unibyte text at SOURCE 129 These functions decode SRC_BYTES length of unibyte text at SOURCE
130 encoded in CODING to Emacs' internal format. The resulting 130 encoded in CODING to Emacs' internal format. The resulting
131 multibyte text goes to a place pointed to by DESTINATION, the length 131 multibyte text goes to a place pointed to by DESTINATION, the length
132 of which should not exceed DST_BYTES. 132 of which should not exceed DST_BYTES.
133 133
134 These functions set the information of original and decoded texts in 134 These functions set the information about original and decoded texts
135 the members produced, produced_char, consumed, and consumed_char of 135 in the members `produced', `produced_char', `consumed', and
136 the structure *CODING. They also set the member result to one of 136 `consumed_char' of the structure *CODING. They also set the member
137 CODING_FINISH_XXX indicating how the decoding finished. 137 `result' to one of CODING_FINISH_XXX indicating how the decoding
138 138 finished.
139 DST_BYTES zero means that source area and destination area are 139
140 DST_BYTES zero means that the source area and destination area are
140 overlapped, which means that we can produce a decoded text until it 141 overlapped, which means that we can produce a decoded text until it
141 reaches at the head of not-yet-decoded source text. 142 reaches the head of the not-yet-decoded source text.
142 143
143 Below is a template of these functions. */ 144 Below is a template for these functions. */
144 #if 0 145 #if 0
145 static void 146 static void
146 decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes) 147 decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
147 struct coding_system *coding; 148 struct coding_system *coding;
148 unsigned char *source, *destination; 149 unsigned char *source, *destination;
152 } 153 }
153 #endif 154 #endif
154 155
155 /*** GENERAL NOTES on `encode_coding_XXX ()' functions *** 156 /*** GENERAL NOTES on `encode_coding_XXX ()' functions ***
156 157
157 These functions encode SRC_BYTES length text at SOURCE of Emacs' 158 These functions encode SRC_BYTES length text at SOURCE from Emacs'
158 internal multibyte format to CODING. The resulting unibyte text 159 internal multibyte format to CODING. The resulting unibyte text
159 goes to a place pointed to by DESTINATION, the length of which 160 goes to a place pointed to by DESTINATION, the length of which
160 should not exceed DST_BYTES. 161 should not exceed DST_BYTES.
161 162
162 These functions set the information of original and encoded texts in 163 These functions set the information about original and encoded texts
163 the members produced, produced_char, consumed, and consumed_char of 164 in the members `produced', `produced_char', `consumed', and
164 the structure *CODING. They also set the member result to one of 165 `consumed_char' of the structure *CODING. They also set the member
165 CODING_FINISH_XXX indicating how the encoding finished. 166 `result' to one of CODING_FINISH_XXX indicating how the encoding
166 167 finished.
167 DST_BYTES zero means that source area and destination area are 168
168 overlapped, which means that we can produce a encoded text until it 169 DST_BYTES zero means that the source area and destination area are
169 reaches at the head of not-yet-encoded source text. 170 overlapped, which means that we can produce encoded text until it
170 171 reaches at the head of the not-yet-encoded source text.
171 Below is a template of these functions. */ 172
173 Below is a template for these functions. */
172 #if 0 174 #if 0
173 static void 175 static void
174 encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes) 176 encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
175 struct coding_system *coding; 177 struct coding_system *coding;
176 unsigned char *source, *destination; 178 unsigned char *source, *destination;
258 260
259 261
260 /* Produce a multibyte form of characater C to `dst'. Jump to 262 /* Produce a multibyte form of characater C to `dst'. Jump to
261 `label_end_of_loop' if there's not enough space at `dst'. 263 `label_end_of_loop' if there's not enough space at `dst'.
262 264
263 If we are now in the middle of composition sequence, the decoded 265 If we are now in the middle of a composition sequence, the decoded
264 character may be ALTCHAR (for the current composition). In that 266 character may be ALTCHAR (for the current composition). In that
265 case, the character goes to coding->cmp_data->data instead of 267 case, the character goes to coding->cmp_data->data instead of
266 `dst'. 268 `dst'.
267 269
268 This macro is used in decoding routines. */ 270 This macro is used in decoding routines. */
1123 1125
1124 /*** 3. ISO2022 handlers ***/ 1126 /*** 3. ISO2022 handlers ***/
1125 1127
1126 /* The following note describes the coding system ISO2022 briefly. 1128 /* The following note describes the coding system ISO2022 briefly.
1127 Since the intention of this note is to help understand the 1129 Since the intention of this note is to help understand the
1128 functions in this file, some parts are NOT ACCURATE or OVERLY 1130 functions in this file, some parts are NOT ACCURATE or are OVERLY
1129 SIMPLIFIED. For thorough understanding, please refer to the 1131 SIMPLIFIED. For thorough understanding, please refer to the
1130 original document of ISO2022. 1132 original document of ISO2022. This is equivalent to the standard
1133 ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).
1131 1134
1132 ISO2022 provides many mechanisms to encode several character sets 1135 ISO2022 provides many mechanisms to encode several character sets
1133 in 7-bit and 8-bit environments. For 7-bite environments, all text 1136 in 7-bit and 8-bit environments. For 7-bit environments, all text
1134 is encoded using bytes less than 128. This may make the encoded 1137 is encoded using bytes less than 128. This may make the encoded
1135 text a little bit longer, but the text passes more easily through 1138 text a little bit longer, but the text passes more easily through
1136 several gateways, some of which strip off MSB (Most Signigant Bit). 1139 several types of gateway, some of which strip off the MSB (Most
1137 1140 Signigant Bit).
1138 There are two kinds of character sets: control character set and 1141
1139 graphic character set. The former contains control characters such 1142 There are two kinds of character sets: control character sets and
1143 graphic character sets. The former contain control characters such
1140 as `newline' and `escape' to provide control functions (control 1144 as `newline' and `escape' to provide control functions (control
1141 functions are also provided by escape sequences). The latter 1145 functions are also provided by escape sequences). The latter
1142 contains graphic characters such as 'A' and '-'. Emacs recognizes 1146 contain graphic characters such as 'A' and '-'. Emacs recognizes
1143 two control character sets and many graphic character sets. 1147 two control character sets and many graphic character sets.
1144 1148
1145 Graphic character sets are classified into one of the following 1149 Graphic character sets are classified into one of the following
1146 four classes, according to the number of bytes (DIMENSION) and 1150 four classes, according to the number of bytes (DIMENSION) and
1147 number of characters in one dimension (CHARS) of the set: 1151 number of characters in one dimension (CHARS) of the set:
1149 - DIMENSION1_CHARS96 1153 - DIMENSION1_CHARS96
1150 - DIMENSION2_CHARS94 1154 - DIMENSION2_CHARS94
1151 - DIMENSION2_CHARS96 1155 - DIMENSION2_CHARS96
1152 1156
1153 In addition, each character set is assigned an identification tag, 1157 In addition, each character set is assigned an identification tag,
1154 unique for each set, called "final character" (denoted as <F> 1158 unique for each set, called the "final character" (denoted as <F>
1155 hereafter). The <F> of each character set is decided by ECMA(*) 1159 hereafter). The <F> of each character set is decided by ECMA(*)
1156 when it is registered in ISO. The code range of <F> is 0x30..0x7F 1160 when it is registered in ISO. The code range of <F> is 0x30..0x7F
1157 (0x30..0x3F are for private use only). 1161 (0x30..0x3F are for private use only).
1158 1162
1159 Note (*): ECMA = European Computer Manufacturers Association 1163 Note (*): ECMA = European Computer Manufacturers Association
1160 1164
1161 Here are examples of graphic character set [NAME(<F>)]: 1165 Here are examples of graphic character sets [NAME(<F>)]:
1162 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ... 1166 o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
1163 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ... 1167 o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
1164 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ... 1168 o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
1165 o DIMENSION2_CHARS96 -- none for the moment 1169 o DIMENSION2_CHARS96 -- none for the moment
1166 1170
1249 7-bit environment, non-locking-shift, and non-single-shift. 1253 7-bit environment, non-locking-shift, and non-single-shift.
1250 1254
1251 Note (**): If <F> is '@', 'A', or 'B', the intermediate character 1255 Note (**): If <F> is '@', 'A', or 'B', the intermediate character
1252 '(' can be omitted. We refer to this as "short-form" hereafter. 1256 '(' can be omitted. We refer to this as "short-form" hereafter.
1253 1257
1254 Now you may notice that there are a lot of ways for encoding the 1258 Now you may notice that there are a lot of ways of encoding the
1255 same multilingual text in ISO2022. Actually, there exist many 1259 same multilingual text in ISO2022. Actually, there exist many
1256 coding systems such as Compound Text (used in X11's inter client 1260 coding systems such as Compound Text (used in X11's inter client
1257 communication, ISO-2022-JP (used in Japanese internet), ISO-2022-KR 1261 communication, ISO-2022-JP (used in Japanese internet), ISO-2022-KR
1258 (used in Korean internet), EUC (Extended UNIX Code, used in Asian 1262 (used in Korean internet), EUC (Extended UNIX Code, used in Asian
1259 localized platforms), and all of these are variants of ISO2022. 1263 localized platforms), and all of these are variants of ISO2022.
1275 o ESC '1' -- end composition 1279 o ESC '1' -- end composition
1276 o ESC '2' -- start rule-base composition (*) 1280 o ESC '2' -- start rule-base composition (*)
1277 o ESC '3' -- start relative composition with alternate chars (**) 1281 o ESC '3' -- start relative composition with alternate chars (**)
1278 o ESC '4' -- start rule-base composition with alternate chars (**) 1282 o ESC '4' -- start rule-base composition with alternate chars (**)
1279 Since these are not standard escape sequences of any ISO standard, 1283 Since these are not standard escape sequences of any ISO standard,
1280 the use of them for these meaning is restricted to Emacs only. 1284 the use of them with these meanings is restricted to Emacs only.
1281 1285
1282 (*) This form is used only in Emacs 20.5 and the older versions, 1286 (*) This form is used only in Emacs 20.5 and older versions,
1283 but the newer versions can safely decode it. 1287 but the newer versions can safely decode it.
1284 (**) This form is used only in Emacs 21.1 and the newer versions, 1288 (**) This form is used only in Emacs 21.1 and newer versions,
1285 and the older versions can't decode it. 1289 and the older versions can't decode it.
1286 1290
1287 Here's a list of examples usages of these composition escape 1291 Here's a list of example usages of these composition escape
1288 sequences (categorized by `enum composition_method'). 1292 sequences (categorized by `enum composition_method').
1289 1293
1290 COMPOSITION_RELATIVE: 1294 COMPOSITION_RELATIVE:
1291 ESC 0 CHAR [ CHAR ] ESC 1 1295 ESC 0 CHAR [ CHAR ] ESC 1
1292 COMPOSITOIN_WITH_RULE: 1296 COMPOSITOIN_WITH_RULE:
1309 1313
1310 #define SHIFT_OUT_OK(idx) \ 1314 #define SHIFT_OUT_OK(idx) \
1311 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0) 1315 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
1312 1316
1313 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". 1317 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
1314 Check if a text is encoded in ISO2022. If it is, returns an 1318 Check if a text is encoded in ISO2022. If it is, return an
1315 integer in which appropriate flag bits any of: 1319 integer in which appropriate flag bits any of:
1316 CODING_CATEGORY_MASK_ISO_7 1320 CODING_CATEGORY_MASK_ISO_7
1317 CODING_CATEGORY_MASK_ISO_7_TIGHT 1321 CODING_CATEGORY_MASK_ISO_7_TIGHT
1318 CODING_CATEGORY_MASK_ISO_8_1 1322 CODING_CATEGORY_MASK_ISO_8_1
1319 CODING_CATEGORY_MASK_ISO_8_2 1323 CODING_CATEGORY_MASK_ISO_8_2
2038 2042
2039 /* ISO2022 encoding stuff. */ 2043 /* ISO2022 encoding stuff. */
2040 2044
2041 /* 2045 /*
2042 It is not enough to say just "ISO2022" on encoding, we have to 2046 It is not enough to say just "ISO2022" on encoding, we have to
2043 specify more details. In Emacs, each coding system of ISO2022 2047 specify more details. In Emacs, each ISO2022 coding system
2044 variant has the following specifications: 2048 variant has the following specifications:
2045 1. Initial designation to G0 thru G3. 2049 1. Initial designation to G0 thru G3.
2046 2. Allows short-form designation? 2050 2. Allows short-form designation?
2047 3. ASCII should be designated to G0 before control characters? 2051 3. ASCII should be designated to G0 before control characters?
2048 4. ASCII should be designated to G0 at end of line? 2052 4. ASCII should be designated to G0 at end of line?
2633 } 2637 }
2634 2638
2635 2639
2636 /*** 4. SJIS and BIG5 handlers ***/ 2640 /*** 4. SJIS and BIG5 handlers ***/
2637 2641
2638 /* Although SJIS and BIG5 are not ISO's coding system, they are used 2642 /* Although SJIS and BIG5 are not ISO coding systems, they are used
2639 quite widely. So, for the moment, Emacs supports them in the bare 2643 quite widely. So, for the moment, Emacs supports them in the bare
2640 C code. But, in the future, they may be supported only by CCL. */ 2644 C code. But, in the future, they may be supported only by CCL. */
2641 2645
2642 /* SJIS is a coding system encoding three character sets: ASCII, right 2646 /* SJIS is a coding system encoding three character sets: ASCII, right
2643 half of JISX0201-Kana, and JISX0208. An ASCII character is encoded 2647 half of JISX0201-Kana, and JISX0208. An ASCII character is encoded
2644 as is. A character of charset katakana-jisx0201 is encoded by 2648 as is. A character of charset katakana-jisx0201 is encoded by
2645 "position-code + 0x80". A character of charset japanese-jisx0208 2649 "position-code + 0x80". A character of charset japanese-jisx0208
2646 is encoded in 2-byte but two position-codes are divided and shifted 2650 is encoded in 2-byte but two position-codes are divided and shifted
2647 so that it fit in the range below. 2651 so that it fits in the range below.
2648 2652
2649 --- CODE RANGE of SJIS --- 2653 --- CODE RANGE of SJIS ---
2650 (character set) (range) 2654 (character set) (range)
2651 ASCII 0x00 .. 0x7F 2655 ASCII 0x00 .. 0x7F
2652 KATAKANA-JISX0201 0xA0 .. 0xDF 2656 KATAKANA-JISX0201 0xA0 .. 0xDF
2656 2660
2657 */ 2661 */
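
To make the "divided and shifted" step in the SJIS comment above concrete, here is a small C sketch of the usual JISX0208 to SJIS byte arithmetic. The function name `jis_to_sjis` and its packed return value are assumptions made for this example; coding.c has its own macros for the conversion, which are not shown in this comparison.

```c
/* Convert the two JISX0208 position-code bytes J1 and J2 (each in
   0x21..0x7E) to SJIS.  The result packs the first SJIS byte in
   bits 8..15 and the second in bits 0..7.  Hypothetical helper for
   illustration only.  */
static unsigned int
jis_to_sjis (unsigned int j1, unsigned int j2)
{
  unsigned int row = j1 - 0x21;          /* 0..93 */
  unsigned int s1 = (row >> 1) + 0x81;   /* two JIS rows per lead byte */
  unsigned int s2;

  if (s1 > 0x9F)
    s1 += 0x40;   /* skip 0xA0..0xDF, the KATAKANA-JISX0201 range */

  if (row & 1)
    s2 = j2 + 0x7E;                  /* second row of the pair */
  else
    s2 = j2 + 0x1F + (j2 >= 0x60);   /* first row; skip 0x7F */

  return (s1 << 8) | s2;
}
```

For example, under these assumptions the JISX0208 code 0x2422 maps to the SJIS code 0x82A0.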
2658 2662
2659 /* BIG5 is a coding system encoding two character sets: ASCII and 2663 /* BIG5 is a coding system encoding two character sets: ASCII and
2660 Big5. An ASCII character is encoded as is. Big5 is a two-byte 2664 Big5. An ASCII character is encoded as is. Big5 is a two-byte
2661 character set and is encoded in two-byte. 2665 character set and is encoded in two bytes.
2662 2666
2663 --- CODE RANGE of BIG5 --- 2667 --- CODE RANGE of BIG5 ---
2664 (character set) (range) 2668 (character set) (range)
2665 ASCII 0x00 .. 0x7F 2669 ASCII 0x00 .. 0x7F
2666 Big5 (1st byte) 0xA1 .. 0xFE 2670 Big5 (1st byte) 0xA1 .. 0xFE
3308 } 3312 }
3309 3313
3310 3314
3311 /*** 7. C library functions ***/ 3315 /*** 7. C library functions ***/
3312 3316
3313 /* In Emacs Lisp, coding system is represented by a Lisp symbol which 3317 /* In Emacs Lisp, a coding system is represented by a Lisp symbol which
3314 has a property `coding-system'. The value of this property is a 3318 has a property `coding-system'. The value of this property is a
3315 vector of length 5 (called as coding-vector). Among elements of 3319 vector of length 5 (called the coding-vector). Among elements of
3316 this vector, the first (element[0]) and the fifth (element[4]) 3320 this vector, the first (element[0]) and the fifth (element[4])
3317 carry important information for decoding/encoding. Before 3321 carry important information for decoding/encoding. Before
3318 decoding/encoding, this information should be set in fields of a 3322 decoding/encoding, this information should be set in fields of a
3319 structure of type `coding_system'. 3323 structure of type `coding_system'.
3320 3324
3321 A value of property `coding-system' can be a symbol of another 3325 The value of the property `coding-system' can be a symbol of another
3322 subsidiary coding-system. In that case, Emacs gets coding-vector 3326 subsidiary coding-system. In that case, Emacs gets coding-vector
3323 from that symbol. 3327 from that symbol.
3324 3328
3325 `element[0]' contains information to be set in `coding->type'. The 3329 `element[0]' contains information to be set in `coding->type'. The
3326 value and its meaning is as follows: 3330 value and its meaning is as follows:
3360 If `coding->type' is `coding_type_big5', element[4] is t to denote 3364 If `coding->type' is `coding_type_big5', element[4] is t to denote
3361 BIG5-ETen or nil to denote BIG5-HKU. 3365 BIG5-ETen or nil to denote BIG5-HKU.
3362 3366
3363 If `coding->type' takes the other value, element[4] is ignored. 3367 If `coding->type' takes the other value, element[4] is ignored.
3364 3368
3365 Emacs Lisp's coding system also carries information about format of 3369 Emacs Lisp's coding systems also carry information about format of
3366 end-of-line in a value of property `eol-type'. If the value is 3370 end-of-line in a value of property `eol-type'. If the value is
3367 integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2 3371 integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
3368 means CODING_EOL_CR. If it is not integer, it should be a vector 3372 means CODING_EOL_CR. If it is not integer, it should be a vector
3369 of subsidiary coding systems of which property `eol-type' has one 3373 of subsidiary coding systems of which property `eol-type' has one
3370 of above values. 3374 of the above values.
3371 3375
3372 */ 3376 */
3373 3377
3374 /* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL 3378 /* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
3375 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING 3379 and set it in CODING. If CODING_SYSTEM_SYMBOL is invalid, CODING
3893 The category for a coding system not categorized in any of the 3897 The category for a coding system not categorized in any of the
3894 above. Assigned the coding-system (Lisp symbol) 3898 above. Assigned the coding-system (Lisp symbol)
3895 `no-conversion' by default. 3899 `no-conversion' by default.
3896 3900
3897 Each of them is a Lisp symbol and the value is an actual 3901 Each of them is a Lisp symbol and the value is an actual
3898 `coding-system's (this is also a Lisp symbol) assigned by a user. 3902 `coding-system' (this is also a Lisp symbol) assigned by a user.
3899 What Emacs does actually is to detect a category of coding system. 3903 What Emacs does actually is to detect a category of coding system.
3900 Then, it uses a `coding-system' assigned to it. If Emacs can't 3904 Then, it uses a `coding-system' assigned to it. If Emacs can't
3901 decide only one possible category, it selects a category of the 3905 decide a single possible category, it selects a category of the
3902 highest priority. Priorities of categories are also specified by a 3906 highest priority. Priorities of categories are also specified by a
3903 user in a Lisp variable `coding-category-list'. 3907 user in a Lisp variable `coding-category-list'.
3904 3908
3905 */ 3909 */
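
The priority rule described above can be sketched in C as follows. This assumes, purely for illustration, that each category has a small integer index and that bit I of a mask stands for category I; the name `pick_category` and this layout are hypothetical and are not the data structures used in coding.c.

```c
/* Return the first category index in PRIORITY (highest priority
   first, N entries) whose bit is set in MASK, or -1 if MASK names
   none of them.  Hypothetical helper for illustration only.  */
static int
pick_category (unsigned int mask, const int *priority, int n)
{
  int i;

  for (i = 0; i < n; i++)
    if (mask & (1u << priority[i]))
      return priority[i];
  return -1;
}
```

When detection leaves several candidate categories in the mask, the order of the priority list alone decides the result, which mirrors the role the note gives to `coding-category-list'.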
3906 3910
4186 utf-16-le. */ 4190 utf-16-le. */
4187 4191
4188 static int 4192 static int
4189 detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p) 4193 detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
4190 unsigned char *source; 4194 unsigned char *source;
4191 int src_bytes, *skip; 4195 int src_bytes, *skip, big_endian_p;
4192 { 4196 {
4193 unsigned char *src = source, *src_end = src + src_bytes; 4197 unsigned char *src = source, *src_end = src + src_bytes;
4194 unsigned int c1, c2; 4198 unsigned int c1, c2;
4195 int total = 0; /* How many end-of-lines are found so far. */ 4199 int total = 0; /* How many end-of-lines are found so far. */
4196 int eol_type = CODING_EOL_UNDECIDED; 4200 int eol_type = CODING_EOL_UNDECIDED;
6404 return make_number (coding.produced_char); 6408 return make_number (coding.produced_char);
6405 } 6409 }
6406 6410
6407 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, 6411 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
6408 3, 3, "r\nzCoding system: ", 6412 3, 3, "r\nzCoding system: ",
6409 "Decode the current region by specified coding system.\n\ 6413 "Decode the current region from the specified coding system.\n\
6410 When called from a program, takes three arguments:\n\ 6414 When called from a program, takes three arguments:\n\
6411 START, END, and CODING-SYSTEM. START and END are buffer positions.\n\ 6415 START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
6412 This function sets `last-coding-system-used' to the precise coding system\n\ 6416 This function sets `last-coding-system-used' to the precise coding system\n\
6413 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\ 6417 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
6414 not fully specified.)\n\ 6418 not fully specified.)\n\
6419 return code_convert_region1 (start, end, coding_system, 0); 6423 return code_convert_region1 (start, end, coding_system, 0);
6420 } 6424 }
6421 6425
6422 DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region, 6426 DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
6423 3, 3, "r\nzCoding system: ", 6427 3, 3, "r\nzCoding system: ",
6424 "Encode the current region by specified coding system.\n\ 6428 "Encode the current region into the specified coding system.\n\
6425 When called from a program, takes three arguments:\n\ 6429 When called from a program, takes three arguments:\n\
6426 START, END, and CODING-SYSTEM. START and END are buffer positions.\n\ 6430 START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
6427 This function sets `last-coding-system-used' to the precise coding system\n\ 6431 This function sets `last-coding-system-used' to the precise coding system\n\
6428 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\ 6432 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
6429 not fully specified.)\n\ 6433 not fully specified.)\n\