comparison src/coding.c @ 20718:c600dea3b06b
(Vselect_safe_coding_system_function): New variable.
(coding_category_table): This variable deleted.
(Vcoding_category_table): New variable.
(coding_category_name): Add "coding-category-iso-7-tight".
(detect_coding_iso2022): Check the mask
CODING_FLAG_ISO_DESIGNATION in CODING->FLAGS. Check a new coding
category coding-category-iso-7-tight.
(DECODE_DESIGNATION): Decode only designations that CODING can
handle.
(check_composing_code): New function.
(decode_coding_iso2022): Decode only characters that CODING can
handle.
(encode_coding_iso2022): Before and after encoding composite
characters, reset designation and invocation status.
(detect_coding_sjis): Delete unnecessary check.
(detect_coding_big5): Likewise.
(encode_designation_at_bol): Check the validity of requested
designation register.
(setup_coding_system): Set requested designation registers for
non-supported charsets to
CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION. Set mask
CODING_FLAG_ISO_DESIGNATION in CODING->FLAGS. Code tuned for
no-conversion and undecided.
(detect_coding): Adjusted for the new variable
Vcoding_category_table.
(syms_of_coding): Initialize Vcoding_category_table and staticpro
it. Register select-safe-coding-system as a Lisp variable.
(DECODE_CHARACTER_ASCII): Update coding->produced_char.
(DECODE_CHARACTER_DIMENSION1): Likewise.
(Qraw_text, Qcoding_category): New variables.
(syms_of_coding): Intern and staticpro them.
(coding_system_table): New variable.
(CHARSET_OK, SHIFT_OUT_OK): New macros.
(detect_coding_iso2022): Detection algorithm improved.
(decode_coding_iso2022): Arg CONSUMED deleted, and the meaning of
return value changed. Update members produced, produced_char,
consumed, consumed_char of the struct *coding. Pay attention to
CODING_MODE_INHIBIT_INCONSISTENT_EOL.
(encode_coding_iso2022): Likewise.
(decode_coding_sjis_big5, encode_coding_sjis_big5): Likewise.
(decode_eol, encode_eol): Likewise.
(ENCODE_ISO_CHARACTER): Update coding->consumed_char.
(DECODE_SJIS_BIG5_CHARACTER): Update coding->produced_char.
(ENCODE_SJIS_BIG5_CHARACTER): Update coding->consumed_char.
(detect_coding_mask): New args PRIORITIES and SKIP.
(detect_coding): Adjusted for the change of detect_coding_mask.
Update coding->heading_ascii.
(detect_eol_type): New arg SKIP.
(detect_eol): Adjusted for the change of detect_eol_type.
(ccl_coding_driver): New function.
(decode_coding): Arg CONSUMED deleted, and the meaning of return
value changed. Update members produced, produced_char, consumed,
consumed_char of the struct *coding.
(encode_coding): Likewise.
(shrink_decoding_region, shrink_encoding_region): New functions.
(code_convert_region, code_convert_string): Completely rewritten.
(detect_coding_system): ...
(Fdetect_coding_string): New function.
(Fdecode_coding_region, Fencode_coding_region): Adjusted for the
change of code_convert_region.
(Fdecode_coding_string, Fencode_coding_string): Adjusted for the
change of code_convert_string.
(Fupdate_iso_coding_systems): New function.
(init_coding_once): Initialize coding_system_table.
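
The entries above remove the CONSUMED out-parameter from the decode and encode functions. Progress is now reported through four members of struct coding_system (produced, produced_char, consumed, consumed_char), and the return value becomes a CODING_FINISH_XXX status. The following is a minimal sketch of that calling convention. It assumes a stripped-down struct and two hypothetical status codes (FINISH_NORMAL, FINISH_INSUFFICIENT_DST) in place of the real definitions. It copies bytes unchanged instead of really decoding, and it ignores the DST_BYTES == 0 overlap case described in the notes in the diff.

```c
/* Hypothetical, simplified stand-ins for the CODING_FINISH_XXX codes
   and for the progress members of struct coding_system.  */
enum { FINISH_NORMAL, FINISH_INSUFFICIENT_DST };

struct coding_sketch
{
  int produced, produced_char;  /* bytes / chars written to DESTINATION */
  int consumed, consumed_char;  /* bytes / chars read from SOURCE */
};

/* Copy bytes from SOURCE to DESTINATION unchanged (a stand-in for real
   decoding), never writing more than DST_BYTES.  Progress is stored in
   *CODING; the return value only says how the conversion finished.  */
int
decode_sketch (struct coding_sketch *coding,
               const unsigned char *source, unsigned char *destination,
               int src_bytes, int dst_bytes)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  unsigned char *dst = destination, *dst_end = destination + dst_bytes;
  int result = FINISH_NORMAL;

  while (src < src_end)
    {
      if (dst >= dst_end)
        {
          /* Out of room: stop and let the caller enlarge the buffer.  */
          result = FINISH_INSUFFICIENT_DST;
          break;
        }
      *dst++ = *src++;
    }

  /* One byte per character in this pass-through sketch.  */
  coding->consumed = coding->consumed_char = (int) (src - source);
  coding->produced = coding->produced_char = (int) (dst - destination);
  return result;
}
```

Suppose the function gets a 5-byte source and a 4-byte destination. It returns FINISH_INSUFFICIENT_DST, with consumed and produced both 4. The caller can read from *CODING exactly how far conversion got, instead of inferring it from the return value.
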
| author | Kenichi Handa <handa@m17n.org> |
|---|---|
| date | Thu, 22 Jan 1998 01:26:45 +0000 |
| parents | ed9ed828415e |
| children | 13d0a6194de7 |
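
In the diff below, detect_coding_iso2022 keeps two masks. MASK holds the categories not yet ruled out, and MASK_FOUND collects the categories for which some positive evidence was seen. The function returns `mask & mask_found` instead of `mask`. As I read it, one consequence is that pure ASCII input no longer reports every ISO category. The following sketch models only that two-mask idea. It uses two hypothetical categories (CAT_7BIT, CAT_8BIT) and deliberately crude rules: any ESC counts as 7-bit evidence, and any byte >= 0xA0 counts as 8-bit evidence. It is not the real detection logic.

```c
/* Hypothetical category bits for this sketch only.  */
#define CAT_7BIT 0x1
#define CAT_8BIT 0x2

int
detect_sketch (const unsigned char *src, const unsigned char *src_end)
{
  int mask = CAT_7BIT | CAT_8BIT;  /* categories not yet ruled out */
  int mask_found = 0;              /* categories with positive evidence */

  while (mask && src < src_end)
    {
      int c = *src++;

      if (c == 0x1B)               /* ESC: evidence for a 7-bit code */
        {
          mask &= ~CAT_8BIT;
          mask_found |= CAT_7BIT;
        }
      else if (c >= 0xA0)          /* high graphic byte: 8-bit evidence */
        {
          mask &= ~CAT_7BIT;
          mask_found |= CAT_8BIT;
        }
      else if (c >= 0x80)          /* C1 control: reject outright here */
        return 0;
      /* Plain ASCII proves nothing either way.  */
    }

  /* A category survives only if it was never ruled out AND was seen.  */
  return mask & mask_found;
}
```

With these rules, "abc" yields 0 and ESC $ B yields CAT_7BIT. The bytes 0xA4 0xA2 yield CAT_8BIT. ESC followed by 0xA4 yields 0, because each category is ruled out after being found.
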
| 20717:19463997fbc6 | 20718:c600dea3b06b |
|---|---|
| 77 If a user wants to read/write a text encoded in a coding system not | 77 If a user wants to read/write a text encoded in a coding system not |
| 78 listed above, he can supply a decoder and an encoder for it in CCL | 78 listed above, he can supply a decoder and an encoder for it in CCL |
| 79 (Code Conversion Language) programs. Emacs executes the CCL program | 79 (Code Conversion Language) programs. Emacs executes the CCL program |
| 80 while reading/writing. | 80 while reading/writing. |
| 81 | 81 |
| 82 Emacs represents a coding-system by a Lisp symbol that has a property | 82 Emacs represents a coding system by a Lisp symbol that has a property |
| 83 `coding-system'. But, before actually using the coding-system, the | 83 `coding-system'. But, before actually using the coding system, the |
| 84 information about it is set in a structure of type `struct | 84 information about it is set in a structure of type `struct |
| 85 coding_system' for rapid processing. See section 6 for more details. | 85 coding_system' for rapid processing. See section 6 for more details. |
| 86 | 86 |
| 87 */ | 87 */ |
| 88 | 88 |
| 89 /*** GENERAL NOTES on END-OF-LINE FORMAT *** | 89 /*** GENERAL NOTES on END-OF-LINE FORMAT *** |
| 90 | 90 |
| 91 How end-of-line of a text is encoded depends on a system. For | 91 How end-of-line of a text is encoded depends on a system. For |
| 92 instance, Unix's format is just one byte of `line-feed' code, | 92 instance, Unix's format is just one byte of `line-feed' code, |
| 93 whereas DOS's format is two-byte sequence of `carriage-return' and | 93 whereas DOS's format is two-byte sequence of `carriage-return' and |
| 94 `line-feed' codes. MacOS's format is one byte of `carriage-return'. | 94 `line-feed' codes. MacOS's format is usually one byte of |
| 95 `carriage-return'. | |
| 95 | 96 |
| 96 Since text characters encoding and end-of-line encoding are | 97 Since text characters encoding and end-of-line encoding are |
| 97 independent, any coding system described above can take | 98 independent, any coding system described above can take |
| 98 any format of end-of-line. So, Emacs has information of format of | 99 any format of end-of-line. So, Emacs has information of format of |
| 99 end-of-line in each coding-system. See section 6 for more details. | 100 end-of-line in each coding-system. See section 6 for more details. |
| 118 | 119 |
| 119 /*** GENERAL NOTES on `decode_coding_XXX ()' functions *** | 120 /*** GENERAL NOTES on `decode_coding_XXX ()' functions *** |
| 120 | 121 |
| 121 These functions decode SRC_BYTES length text at SOURCE encoded in | 122 These functions decode SRC_BYTES length text at SOURCE encoded in |
| 122 CODING to Emacs' internal format (emacs-mule). The resulting text | 123 CODING to Emacs' internal format (emacs-mule). The resulting text |
| 123 goes to a place pointed to by DESTINATION, the length of which should | 124 goes to a place pointed to by DESTINATION, the length of which |
| 124 not exceed DST_BYTES. The number of bytes actually processed is | 125 should not exceed DST_BYTES. These functions set the information of |
| 125 returned as *CONSUMED. The return value is the length of the decoded | 126 original and decoded texts in the members produced, produced_char, |
| 126 text. Below is a template of these functions. */ | 127 consumed, and consumed_char of the structure *CODING. |
| 128 | |
| 129 The return value is an integer (CODING_FINISH_XXX) indicating how | |
| 130 the decoding finished. | |
| 131 | |
| 132 DST_BYTES zero means that source area and destination area are | |
| 133 overlapped, which means that we can produce a decoded text until it | |
| 134 reaches at the head of not-yet-decoded source text. | |
| 135 | |
| 136 Below is a template of these functions. */ | |
| 127 #if 0 | 137 #if 0 |
| 128 decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) | 138 decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes) |
| 129 struct coding_system *coding; | 139 struct coding_system *coding; |
| 130 unsigned char *source, *destination; | 140 unsigned char *source, *destination; |
| 131 int src_bytes, dst_bytes; | 141 int src_bytes, dst_bytes; |
| 132 int *consumed; | |
| 133 { | 142 { |
| 134 ... | 143 ... |
| 135 } | 144 } |
| 136 #endif | 145 #endif |
| 137 | 146 |
| 138 /*** GENERAL NOTES on `encode_coding_XXX ()' functions *** | 147 /*** GENERAL NOTES on `encode_coding_XXX ()' functions *** |
| 139 | 148 |
| 140 These functions encode SRC_BYTES length text at SOURCE of Emacs' | 149 These functions encode SRC_BYTES length text at SOURCE of Emacs' |
| 141 internal format (emacs-mule) to CODING. The resulting text goes to | 150 internal format (emacs-mule) to CODING. The resulting text goes to |
| 142 a place pointed to by DESTINATION, the length of which should not | 151 a place pointed to by DESTINATION, the length of which should not |
| 143 exceed DST_BYTES. The number of bytes actually processed is | 152 exceed DST_BYTES. These functions set the information of |
| 144 returned as *CONSUMED. The return value is the length of the | 153 original and encoded texts in the members produced, produced_char, |
| 145 encoded text. Below is a template of these functions. */ | 154 consumed, and consumed_char of the structure *CODING. |
| 155 | |
| 156 The return value is an integer (CODING_FINISH_XXX) indicating how | |
| 157 the encoding finished. | |
| 158 | |
| 159 DST_BYTES zero means that source area and destination area are | |
| 160 overlapped, which means that we can produce a decoded text until it | |
| 161 reaches at the head of not-yet-decoded source text. | |
| 162 | |
| 163 Below is a template of these functions. */ | |
| 146 #if 0 | 164 #if 0 |
| 147 encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) | 165 encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes) |
| 148 struct coding_system *coding; | 166 struct coding_system *coding; |
| 149 unsigned char *source, *destination; | 167 unsigned char *source, *destination; |
| 150 int src_bytes, dst_bytes; | 168 int src_bytes, dst_bytes; |
| 151 int *consumed; | |
| 152 { | 169 { |
| 153 ... | 170 ... |
| 154 } | 171 } |
| 155 #endif | 172 #endif |
| 156 | 173 |
| 198 #define DECODE_CHARACTER_ASCII(c) \ | 215 #define DECODE_CHARACTER_ASCII(c) \ |
| 199 do { \ | 216 do { \ |
| 200 if (COMPOSING_P (coding->composing)) \ | 217 if (COMPOSING_P (coding->composing)) \ |
| 201 *dst++ = 0xA0, *dst++ = (c) | 0x80; \ | 218 *dst++ = 0xA0, *dst++ = (c) | 0x80; \ |
| 202 else \ | 219 else \ |
| 203 *dst++ = (c); \ | 220 { \ |
| 221 *dst++ = (c); \ | |
| 222 coding->produced_char++; \ | |
| 223 } \ | |
| 204 } while (0) | 224 } while (0) |
| 205 | 225 |
| 206 /* Decode one DIMENSION1 character whose charset is CHARSET and whose | 226 /* Decode one DIMENSION1 character whose charset is CHARSET and whose |
| 207 position-code is C. */ | 227 position-code is C. */ |
| 208 | 228 |
| 210 do { \ | 230 do { \ |
| 211 unsigned char leading_code = CHARSET_LEADING_CODE_BASE (charset); \ | 231 unsigned char leading_code = CHARSET_LEADING_CODE_BASE (charset); \ |
| 212 if (COMPOSING_P (coding->composing)) \ | 232 if (COMPOSING_P (coding->composing)) \ |
| 213 *dst++ = leading_code + 0x20; \ | 233 *dst++ = leading_code + 0x20; \ |
| 214 else \ | 234 else \ |
| 215 *dst++ = leading_code; \ | 235 { \ |
| 236 *dst++ = leading_code; \ | |
| 237 coding->produced_char++; \ | |
| 238 } \ | |
| 216 if (leading_code = CHARSET_LEADING_CODE_EXT (charset)) \ | 239 if (leading_code = CHARSET_LEADING_CODE_EXT (charset)) \ |
| 217 *dst++ = leading_code; \ | 240 *dst++ = leading_code; \ |
| 218 *dst++ = (c) | 0x80; \ | 241 *dst++ = (c) | 0x80; \ |
| 219 } while (0) | 242 } while (0) |
| 220 | 243 |
| 258 extern Lisp_Object Qinsert_file_contents, Qwrite_region; | 281 extern Lisp_Object Qinsert_file_contents, Qwrite_region; |
| 259 Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument; | 282 Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument; |
| 260 Lisp_Object Qstart_process, Qopen_network_stream; | 283 Lisp_Object Qstart_process, Qopen_network_stream; |
| 261 Lisp_Object Qtarget_idx; | 284 Lisp_Object Qtarget_idx; |
| 262 | 285 |
| 286 Lisp_Object Vselect_safe_coding_system_function; | |
| 287 | |
| 263 /* Mnemonic character of each format of end-of-line. */ | 288 /* Mnemonic character of each format of end-of-line. */ |
| 264 int eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac; | 289 int eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac; |
| 265 /* Mnemonic character to indicate format of end-of-line is not yet | 290 /* Mnemonic character to indicate format of end-of-line is not yet |
| 266 decided. */ | 291 decided. */ |
| 267 int eol_mnemonic_undecided; | 292 int eol_mnemonic_undecided; |
| 274 | 299 |
| 275 Lisp_Object Vcoding_system_list, Vcoding_system_alist; | 300 Lisp_Object Vcoding_system_list, Vcoding_system_alist; |
| 276 | 301 |
| 277 Lisp_Object Qcoding_system_p, Qcoding_system_error; | 302 Lisp_Object Qcoding_system_p, Qcoding_system_error; |
| 278 | 303 |
| 279 /* Coding system emacs-mule is for converting only end-of-line format. */ | 304 /* Coding system emacs-mule and raw-text are for converting only |
| 280 Lisp_Object Qemacs_mule; | 305 end-of-line format. */ |
| 306 Lisp_Object Qemacs_mule, Qraw_text; | |
| 281 | 307 |
| 282 /* Coding-systems are handed between Emacs Lisp programs and C internal | 308 /* Coding-systems are handed between Emacs Lisp programs and C internal |
| 283 routines by the following three variables. */ | 309 routines by the following three variables. */ |
| 284 /* Coding-system for reading files and receiving data from process. */ | 310 /* Coding-system for reading files and receiving data from process. */ |
| 285 Lisp_Object Vcoding_system_for_read; | 311 Lisp_Object Vcoding_system_for_read; |
| 309 Lisp_Object Vprocess_coding_system_alist; | 335 Lisp_Object Vprocess_coding_system_alist; |
| 310 Lisp_Object Vnetwork_coding_system_alist; | 336 Lisp_Object Vnetwork_coding_system_alist; |
| 311 | 337 |
| 312 #endif /* emacs */ | 338 #endif /* emacs */ |
| 313 | 339 |
| 314 Lisp_Object Qcoding_category_index; | 340 Lisp_Object Qcoding_category, Qcoding_category_index; |
| 315 | 341 |
| 316 /* List of symbols `coding-category-xxx' ordered by priority. */ | 342 /* List of symbols `coding-category-xxx' ordered by priority. */ |
| 317 Lisp_Object Vcoding_category_list; | 343 Lisp_Object Vcoding_category_list; |
| 318 | 344 |
| 319 /* Table of coding-systems currently assigned to each coding-category. */ | 345 /* Table of coding categories (Lisp symbols). */ |
| 320 Lisp_Object coding_category_table[CODING_CATEGORY_IDX_MAX]; | 346 Lisp_Object Vcoding_category_table; |
| 321 | 347 |
| 322 /* Table of names of symbol for each coding-category. */ | 348 /* Table of names of symbol for each coding-category. */ |
| 323 char *coding_category_name[CODING_CATEGORY_IDX_MAX] = { | 349 char *coding_category_name[CODING_CATEGORY_IDX_MAX] = { |
| 324 "coding-category-emacs-mule", | 350 "coding-category-emacs-mule", |
| 325 "coding-category-sjis", | 351 "coding-category-sjis", |
| 326 "coding-category-iso-7", | 352 "coding-category-iso-7", |
| 353 "coding-category-iso-7-tight", | |
| 327 "coding-category-iso-8-1", | 354 "coding-category-iso-8-1", |
| 328 "coding-category-iso-8-2", | 355 "coding-category-iso-8-2", |
| 329 "coding-category-iso-7-else", | 356 "coding-category-iso-7-else", |
| 330 "coding-category-iso-8-else", | 357 "coding-category-iso-8-else", |
| 331 "coding-category-big5", | 358 "coding-category-big5", |
| 332 "coding-category-raw-text", | 359 "coding-category-raw-text", |
| 333 "coding-category-binary" | 360 "coding-category-binary" |
| 334 }; | 361 }; |
| 362 | |
| 363 /* Table pointers to coding systems corresponding to each coding | |
| 364 categories. */ | |
| 365 struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX]; | |
| 335 | 366 |
| 336 /* Flag to tell if we look up unification table on character code | 367 /* Flag to tell if we look up unification table on character code |
| 337 conversion. */ | 368 conversion. */ |
| 338 Lisp_Object Venable_character_unification; | 369 Lisp_Object Venable_character_unification; |
| 339 /* Standard unification table to look up on decoding (reading). */ | 370 /* Standard unification table to look up on decoding (reading). */ |
| 397 return 0; \ | 428 return 0; \ |
| 398 } while (0) | 429 } while (0) |
| 399 | 430 |
| 400 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | 431 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". |
| 401 Check if a text is encoded in Emacs' internal format. If it is, | 432 Check if a text is encoded in Emacs' internal format. If it is, |
| 402 return CODING_CATEGORY_MASK_EMASC_MULE, else return 0. */ | 433 return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */ |
| 403 | 434 |
| 404 int | 435 int |
| 405 detect_coding_emacs_mule (src, src_end) | 436 detect_coding_emacs_mule (src, src_end) |
| 406 unsigned char *src, *src_end; | 437 unsigned char *src, *src_end; |
| 407 { | 438 { |
| 607 Since these are not standard escape sequences of any ISO, the use | 638 Since these are not standard escape sequences of any ISO, the use |
| 608 of them for these meaning is restricted to Emacs only. */ | 639 of them for these meaning is restricted to Emacs only. */ |
| 609 | 640 |
| 610 enum iso_code_class_type iso_code_class[256]; | 641 enum iso_code_class_type iso_code_class[256]; |
| 611 | 642 |
| 643 #define CHARSET_OK(idx, charset) \ | |
| 644 (CODING_SPEC_ISO_REQUESTED_DESIGNATION \ | |
| 645 (coding_system_table[idx], charset) \ | |
| 646 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION) | |
| 647 | |
| 648 #define SHIFT_OUT_OK(idx) \ | |
| 649 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0) | |
| 650 | |
| 612 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | 651 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". |
| 613 Check if a text is encoded in ISO2022. If it is, returns an | 652 Check if a text is encoded in ISO2022. If it is, returns an |
| 614 integer in which appropriate flag bits any of: | 653 integer in which appropriate flag bits any of: |
| 615 CODING_CATEGORY_MASK_ISO_7 | 654 CODING_CATEGORY_MASK_ISO_7 |
| 655 CODING_CATEGORY_MASK_ISO_7_TIGHT | |
| 616 CODING_CATEGORY_MASK_ISO_8_1 | 656 CODING_CATEGORY_MASK_ISO_8_1 |
| 617 CODING_CATEGORY_MASK_ISO_8_2 | 657 CODING_CATEGORY_MASK_ISO_8_2 |
| 618 CODING_CATEGORY_MASK_ISO_7_ELSE | 658 CODING_CATEGORY_MASK_ISO_7_ELSE |
| 619 CODING_CATEGORY_MASK_ISO_8_ELSE | 659 CODING_CATEGORY_MASK_ISO_8_ELSE |
| 620 are set. If a code which should never appear in ISO2022 is found, | 660 are set. If a code which should never appear in ISO2022 is found, |
| 622 | 662 |
| 623 int | 663 int |
| 624 detect_coding_iso2022 (src, src_end) | 664 detect_coding_iso2022 (src, src_end) |
| 625 unsigned char *src, *src_end; | 665 unsigned char *src, *src_end; |
| 626 { | 666 { |
| 627 int mask = (CODING_CATEGORY_MASK_ISO_7 | 667 int mask = CODING_CATEGORY_MASK_ISO; |
| 628 | CODING_CATEGORY_MASK_ISO_8_1 | 668 int mask_found = 0; |
| 629 | CODING_CATEGORY_MASK_ISO_8_2 | 669 int reg[4], shift_out = 0; |
| 630 | CODING_CATEGORY_MASK_ISO_7_ELSE | 670 int c, c1, i, charset; |
| 631 | CODING_CATEGORY_MASK_ISO_8_ELSE | 671 |
| 632 ); | 672 reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1; |
| 633 int g1 = 0; /* 1 iff designating to G1. */ | |
| 634 int c, i; | |
| 635 struct coding_system coding_iso_8_1, coding_iso_8_2; | |
| 636 | |
| 637 /* Coding systems of these categories may accept latin extra codes. */ | |
| 638 setup_coding_system | |
| 639 (XSYMBOL (coding_category_table[CODING_CATEGORY_IDX_ISO_8_1])->value, | |
| 640 &coding_iso_8_1); | |
| 641 setup_coding_system | |
| 642 (XSYMBOL (coding_category_table[CODING_CATEGORY_IDX_ISO_8_2])->value, | |
| 643 &coding_iso_8_2); | |
| 644 | |
| 645 while (mask && src < src_end) | 673 while (mask && src < src_end) |
| 646 { | 674 { |
| 647 c = *src++; | 675 c = *src++; |
| 648 switch (c) | 676 switch (c) |
| 649 { | 677 { |
| 650 case ISO_CODE_ESC: | 678 case ISO_CODE_ESC: |
| 651 if (src >= src_end) | 679 if (src >= src_end) |
| 652 break; | 680 break; |
| 653 c = *src++; | 681 c = *src++; |
| 654 if ((c >= '(' && c <= '/')) | 682 if (c >= '(' && c <= '/') |
| 655 { | 683 { |
| 656 /* Designation sequence for a charset of dimension 1. */ | 684 /* Designation sequence for a charset of dimension 1. */ |
| 657 if (src >= src_end) | 685 if (src >= src_end) |
| 658 break; | 686 break; |
| 659 c = *src++; | 687 c1 = *src++; |
| 660 if (c < ' ' || c >= 0x80) | 688 if (c1 < ' ' || c1 >= 0x80 |
| 661 /* Invalid designation sequence. */ | 689 || (charset = iso_charset_table[0][c >= ','][c1]) < 0) |
| 662 return 0; | 690 /* Invalid designation sequence. Just ignore. */ |
| 691 break; | |
| 692 reg[(c - '(') % 4] = charset; | |
| 663 } | 693 } |
| 664 else if (c == '$') | 694 else if (c == '$') |
| 665 { | 695 { |
| 666 /* Designation sequence for a charset of dimension 2. */ | 696 /* Designation sequence for a charset of dimension 2. */ |
| 667 if (src >= src_end) | 697 if (src >= src_end) |
| 668 break; | 698 break; |
| 669 c = *src++; | 699 c = *src++; |
| 670 if (c >= '@' && c <= 'B') | 700 if (c >= '@' && c <= 'B') |
| 671 /* Designation for JISX0208.1978, GB2312, or JISX0208. */ | 701 /* Designation for JISX0208.1978, GB2312, or JISX0208. */ |
| 672 ; | 702 reg[0] = charset = iso_charset_table[1][0][c]; |
| 673 else if (c >= '(' && c <= '/') | 703 else if (c >= '(' && c <= '/') |
| 674 { | 704 { |
| 675 if (src >= src_end) | 705 if (src >= src_end) |
| 676 break; | 706 break; |
| 677 c = *src++; | 707 c1 = *src++; |
| 678 if (c < ' ' || c >= 0x80) | 708 if (c1 < ' ' || c1 >= 0x80 |
| 679 /* Invalid designation sequence. */ | 709 || (charset = iso_charset_table[1][c >= ','][c1]) < 0) |
| 680 return 0; | 710 /* Invalid designation sequence. Just ignore. */ |
| 711 break; | |
| 712 reg[(c - '(') % 4] = charset; | |
| 681 } | 713 } |
| 682 else | 714 else |
| 683 /* Invalid designation sequence. */ | 715 /* Invalid designation sequence. Just ignore. */ |
| 684 return 0; | 716 break; |
| 685 } | 717 } |
| 686 else if (c == 'N' || c == 'O' || c == 'n' || c == 'o') | 718 else if (c == 'N' || c == 'n') |
| 687 /* Locking shift. */ | 719 { |
| 688 mask &= (CODING_CATEGORY_MASK_ISO_7_ELSE | 720 if (shift_out == 0 |
| 689 | CODING_CATEGORY_MASK_ISO_8_ELSE); | 721 && (reg[1] >= 0 |
| 722 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE) | |
| 723 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE))) | |
| 724 { | |
| 725 /* Locking shift out. */ | |
| 726 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT; | |
| 727 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT; | |
| 728 shift_out = 1; | |
| 729 } | |
| 730 break; | |
| 731 } | |
| 732 else if (c == 'O' || c == 'o') | |
| 733 { | |
| 734 if (shift_out == 1) | |
| 735 { | |
| 736 /* Locking shift in. */ | |
| 737 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT; | |
| 738 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT; | |
| 739 shift_out = 0; | |
| 740 } | |
| 741 break; | |
| 742 } | |
| 690 else if (c == '0' || c == '1' || c == '2') | 743 else if (c == '0' || c == '1' || c == '2') |
| 691 /* Start/end composition. */ | 744 /* Start/end composition. Just ignore. */ |
| 692 ; | 745 break; |
| 693 else | 746 else |
| 694 /* Invalid escape sequence. */ | 747 /* Invalid escape sequence. Just ignore. */ |
| 695 return 0; | 748 break; |
| 749 | |
| 750 /* We found a valid designation sequence for CHARSET. */ | |
| 751 mask &= ~CODING_CATEGORY_MASK_ISO_8BIT; | |
| 752 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset)) | |
| 753 mask_found |= CODING_CATEGORY_MASK_ISO_7; | |
| 754 else | |
| 755 mask &= ~CODING_CATEGORY_MASK_ISO_7; | |
| 756 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset)) | |
| 757 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT; | |
| 758 else | |
| 759 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT; | |
| 760 if (! CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset)) | |
| 761 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE; | |
| 762 if (! CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset)) | |
| 763 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE; | |
| 696 break; | 764 break; |
| 697 | 765 |
| 698 case ISO_CODE_SO: | 766 case ISO_CODE_SO: |
| 699 mask &= (CODING_CATEGORY_MASK_ISO_7_ELSE | 767 if (shift_out == 0 |
| 700 | CODING_CATEGORY_MASK_ISO_8_ELSE); | 768 && (reg[1] >= 0 |
| 769 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE) | |
| 770 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE))) | |
| 771 { | |
| 772 /* Locking shift out. */ | |
| 773 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT; | |
| 774 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT; | |
| 775 } | |
| 701 break; | 776 break; |
| 702 | 777 |
| 778 case ISO_CODE_SI: | |
| 779 if (shift_out == 1) | |
| 780 { | |
| 781 /* Locking shift in. */ | |
| 782 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT; | |
| 783 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT; | |
| 784 } | |
| 785 break; | |
| 786 | |
| 703 case ISO_CODE_CSI: | 787 case ISO_CODE_CSI: |
| 704 case ISO_CODE_SS2: | 788 case ISO_CODE_SS2: |
| 705 case ISO_CODE_SS3: | 789 case ISO_CODE_SS3: |
| 706 { | 790 { |
| 707 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE; | 791 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE; |
| 708 | 792 |
| 709 if (c != ISO_CODE_CSI) | 793 if (c != ISO_CODE_CSI) |
| 710 { | 794 { |
| 711 if (coding_iso_8_1.flags & CODING_FLAG_ISO_SINGLE_SHIFT) | 795 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags |
| 796 & CODING_FLAG_ISO_SINGLE_SHIFT) | |
| 712 newmask |= CODING_CATEGORY_MASK_ISO_8_1; | 797 newmask |= CODING_CATEGORY_MASK_ISO_8_1; |
| 713 if (coding_iso_8_2.flags & CODING_FLAG_ISO_SINGLE_SHIFT) | 798 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags |
| 799 & CODING_FLAG_ISO_SINGLE_SHIFT) | |
| 714 newmask |= CODING_CATEGORY_MASK_ISO_8_2; | 800 newmask |= CODING_CATEGORY_MASK_ISO_8_2; |
| 715 } | 801 } |
| 716 if (VECTORP (Vlatin_extra_code_table) | 802 if (VECTORP (Vlatin_extra_code_table) |
| 717 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) | 803 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) |
| 718 { | 804 { |
| 719 if (coding_iso_8_1.flags & CODING_FLAG_ISO_LATIN_EXTRA) | 805 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags |
| 806 & CODING_FLAG_ISO_LATIN_EXTRA) | |
| 720 newmask |= CODING_CATEGORY_MASK_ISO_8_1; | 807 newmask |= CODING_CATEGORY_MASK_ISO_8_1; |
| 721 if (coding_iso_8_2.flags & CODING_FLAG_ISO_LATIN_EXTRA) | 808 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags |
| 809 & CODING_FLAG_ISO_LATIN_EXTRA) | |
| 722 newmask |= CODING_CATEGORY_MASK_ISO_8_2; | 810 newmask |= CODING_CATEGORY_MASK_ISO_8_2; |
| 723 } | 811 } |
| 724 mask &= newmask; | 812 mask &= newmask; |
| 813 mask_found |= newmask; | |
| 725 } | 814 } |
| 726 break; | 815 break; |
| 727 | 816 |
| 728 default: | 817 default: |
| 729 if (c < 0x80) | 818 if (c < 0x80) |
| 733 if (VECTORP (Vlatin_extra_code_table) | 822 if (VECTORP (Vlatin_extra_code_table) |
| 734 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) | 823 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) |
| 735 { | 824 { |
| 736 int newmask = 0; | 825 int newmask = 0; |
| 737 | 826 |
| 738 if (coding_iso_8_1.flags & CODING_FLAG_ISO_LATIN_EXTRA) | 827 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags |
| 828 & CODING_FLAG_ISO_LATIN_EXTRA) | |
| 739 newmask |= CODING_CATEGORY_MASK_ISO_8_1; | 829 newmask |= CODING_CATEGORY_MASK_ISO_8_1; |
| 740 if (coding_iso_8_2.flags & CODING_FLAG_ISO_LATIN_EXTRA) | 830 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags |
| 831 & CODING_FLAG_ISO_LATIN_EXTRA) | |
| 741 newmask |= CODING_CATEGORY_MASK_ISO_8_2; | 832 newmask |= CODING_CATEGORY_MASK_ISO_8_2; |
| 742 mask &= newmask; | 833 mask &= newmask; |
| 834 mask_found |= newmask; | |
| 743 } | 835 } |
| 744 else | 836 else |
| 745 return 0; | 837 return 0; |
| 746 } | 838 } |
| 747 else | 839 else |
| 748 { | 840 { |
| 749 unsigned char *src_begin = src; | 841 unsigned char *src_begin = src; |
| 750 | 842 |
| 751 mask &= ~(CODING_CATEGORY_MASK_ISO_7 | 843 mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT |
| 752 | CODING_CATEGORY_MASK_ISO_7_ELSE); | 844 | CODING_CATEGORY_MASK_ISO_7_ELSE); |
| 845 mask_found |= CODING_CATEGORY_MASK_ISO_8_1; | |
| 753 while (src < src_end && *src >= 0xA0) | 846 while (src < src_end && *src >= 0xA0) |
| 754 src++; | 847 src++; |
| 755 if ((src - src_begin - 1) & 1 && src < src_end) | 848 if ((src - src_begin - 1) & 1 && src < src_end) |
| 756 mask &= ~CODING_CATEGORY_MASK_ISO_8_2; | 849 mask &= ~CODING_CATEGORY_MASK_ISO_8_2; |
| 850 else | |
| 851 mask_found |= CODING_CATEGORY_MASK_ISO_8_2; | |
| 757 } | 852 } |
| 758 break; | 853 break; |
| 759 } | 854 } |
| 760 } | 855 } |
| 761 | 856 |
| 762 return mask; | 857 return (mask & mask_found); |
| 763 } | 858 } |
| 764 | 859 |
| 765 /* Decode a character of which charset is CHARSET and the 1st position | 860 /* Decode a character of which charset is CHARSET and the 1st position |
| 766 code is C1. If dimension of CHARSET is 2, the 2nd position code is | 861 code is C1. If dimension of CHARSET is 2, the 2nd position code is |
| 767 fetched from SRC and set to C2. If CHARSET is negative, it means | 862 fetched from SRC and set to C2. If CHARSET is negative, it means |
| 806 /* To tell a composition rule follows. */ \ | 901 /* To tell a composition rule follows. */ \ |
| 807 coding->composing = COMPOSING_WITH_RULE_RULE; \ | 902 coding->composing = COMPOSING_WITH_RULE_RULE; \ |
| 808 } while (0) | 903 } while (0) |
| 809 | 904 |
| 810 /* Set designation state into CODING. */ | 905 /* Set designation state into CODING. */ |
| 811 #define DECODE_DESIGNATION(reg, dimension, chars, final_char) \ | 906 #define DECODE_DESIGNATION(reg, dimension, chars, final_char) \ |
| 812 do { \ | 907 do { \ |
| 813 int charset = ISO_CHARSET_TABLE (make_number (dimension), \ | 908 int charset = ISO_CHARSET_TABLE (make_number (dimension), \ |
| 814 make_number (chars), \ | 909 make_number (chars), \ |
| 815 make_number (final_char)); \ | 910 make_number (final_char)); \ |
| 816 if (charset >= 0) \ | 911 if (charset >= 0 \ |
| 817 { \ | 912 && CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg) \ |
| 818 if (coding->direction == 1 \ | 913 { \ |
| 819 && CHARSET_REVERSE_CHARSET (charset) >= 0) \ | 914 if (coding->spec.iso2022.last_invalid_designation_register == 0 \ |
| 820 charset = CHARSET_REVERSE_CHARSET (charset); \ | 915 && reg == 0 \ |
| 821 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \ | 916 && charset == CHARSET_ASCII) \ |
| 822 } \ | 917 { \ |
| 918 /* We should insert this designation sequence as is so \ | |
| 919 that it is surely written back to a file. */ \ | |
| 920 coding->spec.iso2022.last_invalid_designation_register = -1; \ | |
| 921 goto label_invalid_code; \ | |
| 922 } \ | |
| 923 coding->spec.iso2022.last_invalid_designation_register = -1; \ | |
| 924 if ((coding->mode & CODING_MODE_DIRECTION) \ | |
| 925 && CHARSET_REVERSE_CHARSET (charset) >= 0) \ | |
| 926 charset = CHARSET_REVERSE_CHARSET (charset); \ | |
| 927 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \ | |
| 928 } \ | |
| 929 else \ | |
| 930 { \ | |
| 931 coding->spec.iso2022.last_invalid_designation_register = reg; \ | |
| 932 goto label_invalid_code; \ | |
| 933 } \ | |
| 823 } while (0) | 934 } while (0) |
| 824 | 935 |
| 936 /* Check if the current composing sequence contains only valid codes. | |
| 937 If the composing sequence doesn't end before SRC_END, return -1. | |
| 938 Else, if it contains only valid codes, return 0. | |
| 939 Else return the length of the composing sequence. */ | |
| 940 | |
| 941 int check_composing_code (coding, src, src_end) | |
| 942 struct coding_system *coding; | |
| 943 unsigned char *src, *src_end; | |
| 944 { | |
| 945 unsigned char *src_start = src; | |
| 946 int invalid_code_found = 0; | |
| 947 int charset, c, c1, dim; | |
| 948 | |
| 949 while (src < src_end) | |
| 950 { | |
| 951 if (*src++ != ISO_CODE_ESC) continue; | |
| 952 if (src >= src_end) break; | |
| 953 if ((c = *src++) == '1') /* end of compsition */ | |
| 954 return (invalid_code_found ? src - src_start : 0); | |
| 955 if (src + 2 >= src_end) break; | |
| 956 if (!coding->flags & CODING_FLAG_ISO_DESIGNATION) | |
| 957 invalid_code_found = 1; | |
| 958 else | |
| 959 { | |
| 960 dim = 0; | |
| 961 if (c == '$') | |
| 962 { | |
| 963 dim = 1; | |
| 964 c = (*src >= '@' && *src <= 'B') ? '(' : *src++; | |
| 965 } | |
| 966 if (c >= '(' && c <= '/') | |
| 967 { | |
| 968 c1 = *src++; | |
| 969 if ((c1 < ' ' || c1 >= 0x80) | |
| 970 || (charset = iso_charset_table[dim][c >= ','][c1]) < 0 | |
| 971 || (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | |
| 972 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)) | |
| 973 invalid_code_found = 1; | |
| 974 } | |
| 975 else | |
| 976 invalid_code_found = 1; | |
| 977 } | |
| 978 } | |
| 979 return ((coding->mode & CODING_MODE_LAST_BLOCK) ? src_end - src_start : -1); | |
| 980 } | |
| 981 | |
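check_composing_code scans ahead from ESC 0 or ESC 2 to the closing ESC 1 before any composing starts. Below is a minimal sketch of the same scan over a plain byte range. The names scan_composing, designation_ok_fn and ascii_only are hypothetical. The callback stands in for the iso_charset_table lookup and the CODING_SPEC_ISO_REQUESTED_DESIGNATION check.

The sketch writes the flag test as !(flags & mask). Because ! binds more tightly than &, the unparenthesized form on new line 956, !coding->flags & CODING_FLAG_ISO_DESIGNATION, evaluates (!coding->flags) & CODING_FLAG_ISO_DESIGNATION. That expression is 0 whenever any flag is set, so the parenthesized form appears to be what was intended.

```c
#include <stddef.h>

#define ESC 0x1B

/* Hypothetical stand-in for "is this designation usable by CODING?".  */
typedef int (*designation_ok_fn) (int dim2, unsigned char intermediate,
                                  unsigned char final);

/* Example predicate: only ESC ( B (ASCII into G0) is supported.  */
static int
ascii_only (int dim2, unsigned char intermediate, unsigned char final)
{
  return !dim2 && intermediate == '(' && final == 'B';
}

/* Scan a composing sequence starting just after ESC 0 or ESC 2.
   Return -1 if ESC 1 is not found before END (and this is not the
   last block), 0 if every designation inside is acceptable, or else
   the number of bytes scanned through the closing ESC 1.  */
long
scan_composing (const unsigned char *src, const unsigned char *end,
                unsigned flags, unsigned designation_flag,
                designation_ok_fn ok, int last_block)
{
  const unsigned char *start = src;
  int invalid = 0;

  while (src < end)
    {
      int c;

      if (*src++ != ESC)
        continue;
      if (src >= end)
        break;
      if ((c = *src++) == '1')          /* end of composition */
        return invalid ? (long) (src - start) : 0;
      if (end - src <= 2)               /* need intermediate + final */
        break;
      if (!(flags & designation_flag))  /* parenthesized on purpose */
        invalid = 1;
      else
        {
          int dim2 = 0;

          if (c == '$')
            {
              dim2 = 1;
              /* Short form ESC $ @/A/B has no intermediate byte.  */
              c = (*src >= '@' && *src <= 'B') ? '(' : *src++;
            }
          if (c >= '(' && c <= '/')
            {
              unsigned char final = *src++;

              if (final < ' ' || final >= 0x80
                  || !ok (dim2, (unsigned char) c, final))
                invalid = 1;
            }
          else
            invalid = 1;
        }
    }
  return last_block ? (long) (end - start) : -1;
}
```

When last_block is set, an unterminated sequence reports its whole length instead of -1. This mirrors the CODING_MODE_LAST_BLOCK branch.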
| 825 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */ | 982 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */ |
| 826 | 983 |
| 827 int | 984 int |
| 828 decode_coding_iso2022 (coding, source, destination, | 985 decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes) |
| 829 src_bytes, dst_bytes, consumed) | |
| 830 struct coding_system *coding; | 986 struct coding_system *coding; |
| 831 unsigned char *source, *destination; | 987 unsigned char *source, *destination; |
| 832 int src_bytes, dst_bytes; | 988 int src_bytes, dst_bytes; |
| 833 int *consumed; | |
| 834 { | 989 { |
| 835 unsigned char *src = source; | 990 unsigned char *src = source; |
| 836 unsigned char *src_end = source + src_bytes; | 991 unsigned char *src_end = source + src_bytes; |
| 837 unsigned char *dst = destination; | 992 unsigned char *dst = destination; |
| 838 unsigned char *dst_end = destination + dst_bytes; | 993 unsigned char *dst_end = destination + dst_bytes; |
| 843 int charset; | 998 int charset; |
| 844 /* Charsets invoked to graphic plane 0 and 1 respectively. */ | 999 /* Charsets invoked to graphic plane 0 and 1 respectively. */ |
| 845 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1000 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 846 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); | 1001 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); |
| 847 Lisp_Object unification_table | 1002 Lisp_Object unification_table |
| 848 = coding->character_unification_table_for_decode; | 1003 = coding->character_unification_table_for_decode; |
| 1004 int result = CODING_FINISH_NORMAL; | |
| 849 | 1005 |
| 850 if (!NILP (Venable_character_unification) && NILP (unification_table)) | 1006 if (!NILP (Venable_character_unification) && NILP (unification_table)) |
| 851 unification_table = Vstandard_character_unification_table_for_decode; | 1007 unification_table = Vstandard_character_unification_table_for_decode; |
| 852 | 1008 |
| 853 while (src < src_end && dst < adjusted_dst_end) | 1009 coding->produced_char = 0; |
| 1010 while (src < src_end && (dst_bytes | |
| 1011 ? (dst < adjusted_dst_end) | |
| 1012 : (dst < src - 6))) | |
| 854 { | 1013 { |
| 855 /* SRC_BASE remembers the start position in source in each loop. | 1014 /* SRC_BASE remembers the start position in source in each loop. |
| 856 The loop will be exited when there's not enough source text | 1015 The loop will be exited when there's not enough source text |
| 857 to analyze long escape sequence or 2-byte code (within macros | 1016 to analyze long escape sequence or 2-byte code (within macros |
| 858 ONE_MORE_BYTE or TWO_MORE_BYTES). In that case, SRC is reset | 1017 ONE_MORE_BYTE or TWO_MORE_BYTES). In that case, SRC is reset |
| 866 if (!coding->composing | 1025 if (!coding->composing |
| 867 && (charset0 < 0 || CHARSET_CHARS (charset0) == 94)) | 1026 && (charset0 < 0 || CHARSET_CHARS (charset0) == 94)) |
| 868 { | 1027 { |
| 869 /* This is SPACE or DEL. */ | 1028 /* This is SPACE or DEL. */ |
| 870 *dst++ = c1; | 1029 *dst++ = c1; |
| 1030 coding->produced_char++; | |
| 871 break; | 1031 break; |
| 872 } | 1032 } |
| 873 /* This is a graphic character; fall through ... */ | 1033 /* This is a graphic character; fall through ... */ |
| 874 | 1034 |
| 875 case ISO_graphic_plane_0: | 1035 case ISO_graphic_plane_0: |
| 882 else | 1042 else |
| 883 DECODE_ISO_CHARACTER (charset0, c1); | 1043 DECODE_ISO_CHARACTER (charset0, c1); |
| 884 break; | 1044 break; |
| 885 | 1045 |
| 886 case ISO_0xA0_or_0xFF: | 1046 case ISO_0xA0_or_0xFF: |
| 887 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94) | 1047 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94 |
| 1048 || coding->flags & CODING_FLAG_ISO_SEVEN_BITS) | |
| 888 { | 1049 { |
| 889 /* Invalid code. */ | 1050 /* Invalid code. */ |
| 890 *dst++ = c1; | 1051 *dst++ = c1; |
| 1052 coding->produced_char++; | |
| 891 break; | 1053 break; |
| 892 } | 1054 } |
| 893 /* This is a graphic character; fall through ... */ | 1055 /* This is a graphic character; fall through ... */ |
| 894 | 1056 |
| 895 case ISO_graphic_plane_1: | 1057 case ISO_graphic_plane_1: |
| 896 DECODE_ISO_CHARACTER (charset1, c1); | 1058 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) |
| 1059 { | |
| 1060 /* Invalid code. */ | |
| 1061 *dst++ = c1; | |
| 1062 coding->produced_char++; | |
| 1063 } | |
| 1064 else | |
| 1065 DECODE_ISO_CHARACTER (charset1, c1); | |
| 897 break; | 1066 break; |
| 898 | 1067 |
| 899 case ISO_control_code: | 1068 case ISO_control_code: |
| 900 /* All ISO2022 control characters in this class have the | 1069 /* All ISO2022 control characters in this class have the |
| 901 same representation in Emacs internal format. */ | 1070 same representation in Emacs internal format. */ |
| 1071 if (c1 == '\n' | |
| 1072 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | |
| 1073 && (coding->eol_type == CODING_EOL_CR | |
| 1074 || coding->eol_type == CODING_EOL_CRLF)) | |
| 1075 { | |
| 1076 result = CODING_FINISH_INCONSISTENT_EOL; | |
| 1077 goto label_end_of_loop_2; | |
| 1078 } | |
| 902 *dst++ = c1; | 1079 *dst++ = c1; |
| 1080 coding->produced_char++; | |
| 903 break; | 1081 break; |
| 904 | 1082 |
| 905 case ISO_carriage_return: | 1083 case ISO_carriage_return: |
| 906 if (coding->eol_type == CODING_EOL_CR) | 1084 if (coding->eol_type == CODING_EOL_CR) |
| 907 { | 1085 *dst++ = '\n'; |
| 908 *dst++ = '\n'; | |
| 909 } | |
| 910 else if (coding->eol_type == CODING_EOL_CRLF) | 1086 else if (coding->eol_type == CODING_EOL_CRLF) |
| 911 { | 1087 { |
| 912 ONE_MORE_BYTE (c1); | 1088 ONE_MORE_BYTE (c1); |
| 913 if (c1 == ISO_CODE_LF) | 1089 if (c1 == ISO_CODE_LF) |
| 914 *dst++ = '\n'; | 1090 *dst++ = '\n'; |
| 915 else | 1091 else |
| 916 { | 1092 { |
| 1093 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | |
| 1094 { | |
| 1095 result = CODING_FINISH_INCONSISTENT_EOL; | |
| 1096 goto label_end_of_loop_2; | |
| 1097 } | |
| 917 src--; | 1098 src--; |
| 918 *dst++ = c1; | 1099 *dst++ = '\r'; |
| 919 } | 1100 } |
| 920 } | 1101 } |
| 921 else | 1102 else |
| 922 { | 1103 *dst++ = c1; |
| 923 *dst++ = c1; | 1104 coding->produced_char++; |
| 924 } | |
| 925 break; | 1105 break; |
| 926 | 1106 |
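The control-code and carriage-return cases decode end-of-line codes. When CODING_MODE_INHIBIT_INCONSISTENT_EOL is set, they stop with CODING_FINISH_INCONSISTENT_EOL instead of guessing. For a lone CR in CRLF text, the new column emits '\r'; the old column wrote the following byte instead.

The sketch below isolates that logic over a byte array. The enum and function names are hypothetical, and every non-EOL byte is copied unchanged. DST is assumed to hold at least SRC_LEN bytes, which is safe because this step never lengthens the text.

```c
#include <stddef.h>

enum eol_type { EOL_LF, EOL_CRLF, EOL_CR };
enum eol_result { EOL_OK, EOL_NEED_MORE_SRC, EOL_INCONSISTENT };

enum eol_result
decode_eol_sketch (const unsigned char *src, size_t src_len,
                   unsigned char *dst, enum eol_type eol, int inhibit,
                   size_t *consumed, size_t *produced)
{
  size_t i = 0, o = 0;
  enum eol_result result = EOL_OK;

  while (i < src_len)
    {
      size_t base = i;                  /* rewind point, like SRC_BASE */
      unsigned char c = src[i++];

      if (c == '\n' && inhibit && (eol == EOL_CR || eol == EOL_CRLF))
        {
          result = EOL_INCONSISTENT;    /* bare LF in CR or CRLF text */
          i = base;
          break;
        }
      if (c == '\r' && eol == EOL_CR)
        dst[o++] = '\n';
      else if (c == '\r' && eol == EOL_CRLF)
        {
          if (i >= src_len)
            {
              result = EOL_NEED_MORE_SRC;  /* CR or CRLF? not known yet */
              i = base;
              break;
            }
          if (src[i] == '\n')
            {
              i++;
              dst[o++] = '\n';
            }
          else if (inhibit)
            {
              result = EOL_INCONSISTENT;   /* lone CR in CRLF text */
              i = base;
              break;
            }
          else
            dst[o++] = '\r';               /* keep the lone CR itself */
        }
      else
        dst[o++] = c;
    }
  *consumed = i;
  *produced = o;
  return result;
}
```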
| 927 case ISO_shift_out: | 1107 case ISO_shift_out: |
| 928 if (CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0) | 1108 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT) |
| 929 goto label_invalid_escape_sequence; | 1109 || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0) |
| 1110 goto label_invalid_code; | |
| 930 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; | 1111 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; |
| 931 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1112 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 932 break; | 1113 break; |
| 933 | 1114 |
| 934 case ISO_shift_in: | 1115 case ISO_shift_in: |
| 1116 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)) | |
| 1117 goto label_invalid_code; | |
| 935 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; | 1118 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; |
| 936 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1119 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 937 break; | 1120 break; |
| 938 | 1121 |
| 939 case ISO_single_shift_2_7: | 1122 case ISO_single_shift_2_7: |
| 940 case ISO_single_shift_2: | 1123 case ISO_single_shift_2: |
| 1124 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) | |
| 1125 goto label_invalid_code; | |
| 941 /* SS2 is handled as an escape sequence of ESC 'N' */ | 1126 /* SS2 is handled as an escape sequence of ESC 'N' */ |
| 942 c1 = 'N'; | 1127 c1 = 'N'; |
| 943 goto label_escape_sequence; | 1128 goto label_escape_sequence; |
| 944 | 1129 |
| 945 case ISO_single_shift_3: | 1130 case ISO_single_shift_3: |
| 1131 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) | |
| 1132 goto label_invalid_code; | |
| 946 /* SS3 is handled as an escape sequence of ESC 'O' */ | 1133 /* SS3 is handled as an escape sequence of ESC 'O' */ |
| 947 c1 = 'O'; | 1134 c1 = 'O'; |
| 948 goto label_escape_sequence; | 1135 goto label_escape_sequence; |
| 949 | 1136 |
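In the new column, SO, SI and the single shifts are accepted only when the coding system's flags allow them (CODING_FLAG_ISO_LOCKING_SHIFT, CODING_FLAG_ISO_SINGLE_SHIFT). Except for SI, the target register must also hold a charset. Otherwise control passes to label_invalid_code.

The sketch below applies those checks to a small state struct. The struct, the flag values and the function names are all hypothetical.

```c
#include <stdbool.h>

#define FLAG_LOCKING_SHIFT 0x1u  /* hypothetical, cf. CODING_FLAG_ISO_LOCKING_SHIFT */
#define FLAG_SINGLE_SHIFT  0x2u  /* hypothetical, cf. CODING_FLAG_ISO_SINGLE_SHIFT */

struct iso_state
{
  int designation[4];   /* charset id in G0..G3, or -1 if none */
  int gl;               /* register currently invoked into GL */
  unsigned flags;
};

/* Apply SO (0x0E, G1 into GL) or SI (0x0F, G0 into GL); CODE is
   assumed to be one of the two.  Return false when locking shifts
   are not allowed or G1 is empty, so the caller can pass the byte
   through as an invalid code.  */
bool
apply_locking_shift (struct iso_state *s, unsigned char code)
{
  int reg = (code == 0x0E) ? 1 : 0;

  if (!(s->flags & FLAG_LOCKING_SHIFT))
    return false;
  if (reg == 1 && s->designation[1] < 0)
    return false;
  s->gl = reg;
  return true;
}

/* Charset for the character after SS2 (REG 2) or SS3 (REG 3), or -1
   if single shifts are not allowed or the register is empty.  */
int
single_shift_charset (const struct iso_state *s, int reg)
{
  if (!(s->flags & FLAG_SINGLE_SHIFT))
    return -1;
  return s->designation[reg];
}
```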
| 950 case ISO_control_sequence_introducer: | 1137 case ISO_control_sequence_introducer: |
| 961 switch (c1) | 1148 switch (c1) |
| 962 { | 1149 { |
| 963 case '&': /* revision of following character set */ | 1150 case '&': /* revision of following character set */ |
| 964 ONE_MORE_BYTE (c1); | 1151 ONE_MORE_BYTE (c1); |
| 965 if (!(c1 >= '@' && c1 <= '~')) | 1152 if (!(c1 >= '@' && c1 <= '~')) |
| 966 goto label_invalid_escape_sequence; | 1153 goto label_invalid_code; |
| 967 ONE_MORE_BYTE (c1); | 1154 ONE_MORE_BYTE (c1); |
| 968 if (c1 != ISO_CODE_ESC) | 1155 if (c1 != ISO_CODE_ESC) |
| 969 goto label_invalid_escape_sequence; | 1156 goto label_invalid_code; |
| 970 ONE_MORE_BYTE (c1); | 1157 ONE_MORE_BYTE (c1); |
| 971 goto label_escape_sequence; | 1158 goto label_escape_sequence; |
| 972 | 1159 |
| 973 case '$': /* designation of 2-byte character set */ | 1160 case '$': /* designation of 2-byte character set */ |
| 1161 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION)) | |
| 1162 goto label_invalid_code; | |
| 974 ONE_MORE_BYTE (c1); | 1163 ONE_MORE_BYTE (c1); |
| 975 if (c1 >= '@' && c1 <= 'B') | 1164 if (c1 >= '@' && c1 <= 'B') |
| 976 { /* designation of JISX0208.1978, GB2312.1980, | 1165 { /* designation of JISX0208.1978, GB2312.1980, |
| 977 or JISX0208.1980 */ | 1166 or JISX0208.1980 */ |
| 978 DECODE_DESIGNATION (0, 2, 94, c1); | 1167 DECODE_DESIGNATION (0, 2, 94, c1); |
| 986 { /* designation of DIMENSION2_CHARS96 character set */ | 1175 { /* designation of DIMENSION2_CHARS96 character set */ |
| 987 ONE_MORE_BYTE (c2); | 1176 ONE_MORE_BYTE (c2); |
| 988 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2); | 1177 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2); |
| 989 } | 1178 } |
| 990 else | 1179 else |
| 991 goto label_invalid_escape_sequence; | 1180 goto label_invalid_code; |
| 992 break; | 1181 break; |
| 993 | 1182 |
| 994 case 'n': /* invocation of locking-shift-2 */ | 1183 case 'n': /* invocation of locking-shift-2 */ |
| 995 if (CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) | 1184 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT) |
| 996 goto label_invalid_escape_sequence; | 1185 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) |
| 1186 goto label_invalid_code; | |
| 997 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; | 1187 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; |
| 998 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1188 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 999 break; | 1189 break; |
| 1000 | 1190 |
| 1001 case 'o': /* invocation of locking-shift-3 */ | 1191 case 'o': /* invocation of locking-shift-3 */ |
| 1002 if (CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) | 1192 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT) |
| 1003 goto label_invalid_escape_sequence; | 1193 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) |
| 1194 goto label_invalid_code; | |
| 1004 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; | 1195 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; |
| 1005 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1196 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 1006 break; | 1197 break; |
| 1007 | 1198 |
| 1008 case 'N': /* invocation of single-shift-2 */ | 1199 case 'N': /* invocation of single-shift-2 */ |
| 1009 if (CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) | 1200 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT) |
| 1010 goto label_invalid_escape_sequence; | 1201 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) |
| 1202 goto label_invalid_code; | |
| 1011 ONE_MORE_BYTE (c1); | 1203 ONE_MORE_BYTE (c1); |
| 1012 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2); | 1204 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2); |
| 1013 DECODE_ISO_CHARACTER (charset, c1); | 1205 DECODE_ISO_CHARACTER (charset, c1); |
| 1014 break; | 1206 break; |
| 1015 | 1207 |
| 1016 case 'O': /* invocation of single-shift-3 */ | 1208 case 'O': /* invocation of single-shift-3 */ |
| 1017 if (CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) | 1209 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT) |
| 1018 goto label_invalid_escape_sequence; | 1210 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) |
| 1211 goto label_invalid_code; | |
| 1019 ONE_MORE_BYTE (c1); | 1212 ONE_MORE_BYTE (c1); |
| 1020 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3); | 1213 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3); |
| 1021 DECODE_ISO_CHARACTER (charset, c1); | 1214 DECODE_ISO_CHARACTER (charset, c1); |
| 1022 break; | 1215 break; |
| 1023 | 1216 |
| 1024 case '0': /* start composing without embedded rules */ | 1217 case '0': case '2': /* start composing */ |
| 1025 coding->composing = COMPOSING_NO_RULE_HEAD; | 1218 /* Before processing composing, we must be sure that all |
| 1219 characters being composed are supported by CODING. | |
| 1220 If not, we must give up composing and insert the | |
| 1221 bunch of codes for composing as is without decoding. */ | |
| 1222 { | |
| 1223 int result1; | |
| 1224 | |
| 1225 result1 = check_composing_code (coding, src, src_end); | |
| 1226 if (result1 == 0) | |
| 1227 coding->composing = (c1 == '0' | |
| 1228 ? COMPOSING_NO_RULE_HEAD | |
| 1229 : COMPOSING_WITH_RULE_HEAD); | |
| 1230 else if (result1 > 0) | |
| 1231 { | |
| 1232 if (result1 + 2 < (dst_bytes ? dst_end : src_base) - dst) | |
| 1233 { | |
| 1234 bcopy (src_base, dst, result1 + 2); | |
| 1235 src += result1; | |
| 1236 dst += result1 + 2; | |
| 1237 coding->produced_char += result1 + 2; | |
| 1238 } | |
| 1239 else | |
| 1240 { | |
| 1241 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 1242 goto label_end_of_loop_2; | |
| 1243 } | |
| 1244 } | |
| 1245 else | |
| 1246 goto label_end_of_loop; | |
| 1247 } | |
| 1026 break; | 1248 break; |
| 1027 | 1249 |
| 1028 case '1': /* end composing */ | 1250 case '1': /* end composing */ |
| 1029 coding->composing = COMPOSING_NO; | 1251 coding->composing = COMPOSING_NO; |
| 1252 coding->produced_char++; | |
| 1030 break; | 1253 break; |
| 1031 | 1254 |
| 1032 case '2': /* start composing with embedded rules */ | |
| 1033 coding->composing = COMPOSING_WITH_RULE_HEAD; | |
| 1034 break; | |
| 1035 | |
| 1036 case '[': /* specification of direction */ | 1255 case '[': /* specification of direction */ |
| 1256 if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION) | |
| 1257 goto label_invalid_code; | |
| 1037 /* For the moment, nested direction is not supported. | 1258 /* For the moment, nested direction is not supported. |
| 1038 So, the value of `coding->direction' is 0 or 1: 0 | 1259 So, `coding->mode & CODING_MODE_DIRECTION' zero means |
| 1039 means left-to-right, 1 means right-to-left. */ | 1260 left-to-right, and nonzero means right-to-left. */ |
| 1040 ONE_MORE_BYTE (c1); | 1261 ONE_MORE_BYTE (c1); |
| 1041 switch (c1) | 1262 switch (c1) |
| 1042 { | 1263 { |
| 1043 case ']': /* end of the current direction */ | 1264 case ']': /* end of the current direction */ |
| 1044 coding->direction = 0; | 1265 coding->mode &= ~CODING_MODE_DIRECTION; |
| 1045 | 1266 |
| 1046 case '0': /* end of the current direction */ | 1267 case '0': /* end of the current direction */ |
| 1047 case '1': /* start of left-to-right direction */ | 1268 case '1': /* start of left-to-right direction */ |
| 1048 ONE_MORE_BYTE (c1); | 1269 ONE_MORE_BYTE (c1); |
| 1049 if (c1 == ']') | 1270 if (c1 == ']') |
| 1050 coding->direction = 0; | 1271 coding->mode &= ~CODING_MODE_DIRECTION; |
| 1051 else | 1272 else |
| 1052 goto label_invalid_escape_sequence; | 1273 goto label_invalid_code; |
| 1053 break; | 1274 break; |
| 1054 | 1275 |
| 1055 case '2': /* start of right-to-left direction */ | 1276 case '2': /* start of right-to-left direction */ |
| 1056 ONE_MORE_BYTE (c1); | 1277 ONE_MORE_BYTE (c1); |
| 1057 if (c1 == ']') | 1278 if (c1 == ']') |
| 1058 coding->direction= 1; | 1279 coding->mode |= CODING_MODE_DIRECTION; |
| 1059 else | 1280 else |
| 1060 goto label_invalid_escape_sequence; | 1281 goto label_invalid_code; |
| 1061 break; | 1282 break; |
| 1062 | 1283 |
| 1063 default: | 1284 default: |
| 1064 goto label_invalid_escape_sequence; | 1285 goto label_invalid_code; |
| 1065 } | 1286 } |
| 1066 break; | 1287 break; |
| 1067 | 1288 |
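ESC [ followed by 0, 1 or 2 and then ] appears to follow the SDS (start directed string) control of ECMA-48. Only one level is supported, so a single mode bit is enough. Below is a hypothetical sketch of the parse, in which MODE_DIRECTION is an invented stand-in for CODING_MODE_DIRECTION.

One difference is deliberate. In the listing, case ']' has no break and falls into case '0', so ESC [ ] reads one more byte and then requires a second ]. The sketch stops after the first ], which is what the comment "end of the current direction" seems to describe.

```c
#include <stddef.h>

#define MODE_DIRECTION 0x1u  /* hypothetical stand-in for CODING_MODE_DIRECTION */

/* Parse the bytes following ESC [.  Return the number of bytes used,
   0 if more input is needed, or -1 if they are not a direction
   specification.  Clear or set MODE_DIRECTION in *MODE.  */
int
parse_direction (const unsigned char *p, size_t len, unsigned *mode)
{
  if (len == 0)
    return 0;
  if (p[0] == ']')                      /* ESC [ ] ends the direction */
    {
      *mode &= ~MODE_DIRECTION;
      return 1;
    }
  if (p[0] < '0' || p[0] > '2')
    return -1;
  if (len < 2)
    return 0;
  if (p[1] != ']')
    return -1;
  if (p[0] == '2')                      /* ESC [ 2 ] starts right-to-left */
    *mode |= MODE_DIRECTION;
  else                                  /* ESC [ 0 ] ends, ESC [ 1 ] starts LTR */
    *mode &= ~MODE_DIRECTION;
  return 2;
}
```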
| 1068 default: | 1289 default: |
| 1290 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION)) | |
| 1291 goto label_invalid_code; | |
| 1069 if (c1 >= 0x28 && c1 <= 0x2B) | 1292 if (c1 >= 0x28 && c1 <= 0x2B) |
| 1070 { /* designation of DIMENSION1_CHARS94 character set */ | 1293 { /* designation of DIMENSION1_CHARS94 character set */ |
| 1071 ONE_MORE_BYTE (c2); | 1294 ONE_MORE_BYTE (c2); |
| 1072 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2); | 1295 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2); |
| 1073 } | 1296 } |
| 1076 ONE_MORE_BYTE (c2); | 1299 ONE_MORE_BYTE (c2); |
| 1077 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2); | 1300 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2); |
| 1078 } | 1301 } |
| 1079 else | 1302 else |
| 1080 { | 1303 { |
| 1081 goto label_invalid_escape_sequence; | 1304 goto label_invalid_code; |
| 1082 } | 1305 } |
| 1083 } | 1306 } |
| 1084 /* We must update these variables now. */ | 1307 /* We must update these variables now. */ |
| 1085 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); | 1308 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); |
| 1086 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); | 1309 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); |
| 1087 break; | 1310 break; |
| 1088 | 1311 |
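The escape-sequence cases above turn designations into DECODE_DESIGNATION calls parameterized by register, dimension, chars and final byte. The following hypothetical sketch performs only that classification step:

- 0x28 to 0x2B select G0 to G3 for 94-character sets.
- The c1 - 0x2C offset selects the register for 96-character sets.
- ESC $ followed by @, A or B is the short form for G0.

The exact range tests for the 96-character form and the ESC $ ( F form fall in lines this comparison does not show, so those ranges are assumptions. The sketch does no charset table lookup and no check against CODING.

```c
#include <stddef.h>

struct designation
{
  int reg;              /* 0..3 for G0..G3 */
  int dim;              /* bytes per character: 1 or 2 */
  int chars;            /* 94 or 96 */
  unsigned char final;  /* final byte naming the charset */
};

/* Parse the bytes following ESC.  Return the number of bytes used,
   0 if more input is needed, or -1 if not a designation.  */
int
parse_designation (const unsigned char *p, size_t len, struct designation *d)
{
  size_t i = 0;
  int dim = 1, reg, chars;
  unsigned char c;

  if (len == 0)
    return 0;
  c = p[i++];
  if (c == '$')
    {
      dim = 2;
      if (i >= len)
        return 0;
      c = p[i++];
      if (c >= '@' && c <= 'B')
        {
          /* Short form ESC $ @/A/B: a 94x94 charset into G0.  */
          d->reg = 0, d->dim = 2, d->chars = 94, d->final = c;
          return (int) i;
        }
    }
  if (c >= 0x28 && c <= 0x2B)
    reg = c - 0x28, chars = 94;
  else if (c >= 0x2C && c <= 0x2F)      /* assumed range, see above */
    reg = c - 0x2C, chars = 96;
  else
    return -1;
  if (i >= len)
    return 0;
  d->reg = reg, d->dim = dim, d->chars = chars, d->final = p[i++];
  return (int) i;
}
```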
| 1089 label_invalid_escape_sequence: | 1312 label_invalid_code: |
| 1090 { | 1313 coding->produced_char += src - src_base; |
| 1091 int length = src - src_base; | 1314 while (src_base < src) |
| 1092 | 1315 *dst++ = *src_base++; |
| 1093 bcopy (src_base, dst, length); | |
| 1094 dst += length; | |
| 1095 } | |
| 1096 } | 1316 } |
| 1097 continue; | 1317 continue; |
| 1098 | 1318 |
| 1099 label_end_of_loop: | 1319 label_end_of_loop: |
| 1100 coding->carryover_size = src - src_base; | 1320 result = CODING_FINISH_INSUFFICIENT_SRC; |
| 1101 bcopy (src_base, coding->carryover, coding->carryover_size); | 1321 label_end_of_loop_2: |
| 1102 src = src_base; | 1322 src = src_base; |
| 1103 break; | 1323 break; |
| 1104 } | 1324 } |
| 1325 | |
| 1326 if (result == CODING_FINISH_NORMAL | |
| 1327 && src < src_end) | |
| 1328 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 1105 | 1329 |
| 1106 /* If this is the last block of the text to be decoded, we had | 1330 /* If this is the last block of the text to be decoded, we had |
| 1107 better just flush out all remaining codes in the text although | 1331 better just flush out all remaining codes in the text although |
| 1108 they are not valid characters. */ | 1332 they are not valid characters. */ |
| 1109 if (coding->last_block) | 1333 if (coding->mode & CODING_MODE_LAST_BLOCK) |
| 1110 { | 1334 { |
| 1111 bcopy (src, dst, src_end - src); | 1335 bcopy (src, dst, src_end - src); |
| 1112 dst += (src_end - src); | 1336 dst += (src_end - src); |
| 1113 src = src_end; | 1337 src = src_end; |
| 1114 } | 1338 } |
| 1115 *consumed = src - source; | 1339 coding->consumed = coding->consumed_char = src - source; |
| 1116 return dst - destination; | 1340 coding->produced = dst - destination; |
| 1341 return result; | |
| 1117 } | 1342 } |
| 1118 | 1343 |
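With the CONSUMED argument gone, decode_coding_iso2022 reports progress through coding->consumed and coding->produced, and explains why it stopped through its return value. A normal exit that leaves source unread becomes CODING_FINISH_INSUFFICIENT_DST (new lines 1326-1328). A caller can therefore retry with more room, or hand the leftover bytes to the next block.

The sketch below shows one such caller around a toy converter. Everything in it is hypothetical and is not how Emacs's own callers are written. That includes the enum, struct conv, toy_convert, convert_all, and the rule that a byte of 0x80 or more starts a two-byte character.

```c
#include <stdlib.h>
#include <string.h>

enum finish { FINISH_NORMAL, FINISH_INSUFFICIENT_SRC, FINISH_INSUFFICIENT_DST };

struct conv { size_t consumed, produced; };

/* Toy converter: copy whole characters only, and say why it stopped.  */
static enum finish
toy_convert (const unsigned char *src, size_t src_len,
             unsigned char *dst, size_t dst_len, struct conv *cv)
{
  size_t i = 0, o = 0;
  enum finish result = FINISH_NORMAL;

  while (i < src_len)
    {
      size_t n = src[i] < 0x80 ? 1 : 2;

      if (i + n > src_len)
        {
          result = FINISH_INSUFFICIENT_SRC;  /* character cut off */
          break;
        }
      if (o + n > dst_len)
        break;                               /* out of room */
      memcpy (dst + o, src + i, n);
      i += n, o += n;
    }
  /* Same rule as the listing: normal exit with source left over.  */
  if (result == FINISH_NORMAL && i < src_len)
    result = FINISH_INSUFFICIENT_DST;
  cv->consumed = i;
  cv->produced = o;
  return result;
}

/* Convert SRC, doubling the output buffer on FINISH_INSUFFICIENT_DST.
   Return a malloc'd buffer (caller frees) or NULL on allocation
   failure; *LEFT_OVER counts trailing bytes of an incomplete char.  */
unsigned char *
convert_all (const unsigned char *src, size_t src_len,
             size_t *out_len, size_t *left_over)
{
  size_t cap = 4, len = 0, pos = 0;
  unsigned char *buf = malloc (cap);
  struct conv cv;

  if (!buf)
    return NULL;
  while (toy_convert (src + pos, src_len - pos, buf + len, cap - len, &cv)
         == FINISH_INSUFFICIENT_DST)
    {
      unsigned char *bigger;

      pos += cv.consumed, len += cv.produced;
      bigger = realloc (buf, cap * 2);
      if (!bigger)
        {
          free (buf);
          return NULL;
        }
      buf = bigger, cap *= 2;
    }
  pos += cv.consumed, len += cv.produced;
  *out_len = len;
  *left_over = src_len - pos;
  return buf;
}
```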
| 1119 /* ISO2022 encoding stuff. */ | 1344 /* ISO2022 encoding stuff. */ |
| 1120 | 1345 |
| 1121 /* | 1346 /* |
| 1122 It is not enough to say just "ISO2022" on encoding, we have to | 1347 It is not enough to say just "ISO2022" on encoding, we have to |
| 1123 specify more details. In Emacs, each coding-system of ISO2022 | 1348 specify more details. In Emacs, each coding system of ISO2022 |
| 1124 variant has the following specifications: | 1349 variant has the following specifications: |
| 1125 1. Initial designation to G0 thru G3. | 1350 1. Initial designation to G0 thru G3. |
| 1126 2. Allows short-form designation? | 1351 2. Allows short-form designation? |
| 1127 3. ASCII should be designated to G0 before control characters? | 1352 3. ASCII should be designated to G0 before control characters? |
| 1128 4. ASCII should be designated to G0 at end of line? | 1353 4. ASCII should be designated to G0 at end of line? |
| 1327 charset_alt = charset; \ | 1552 charset_alt = charset; \ |
| 1328 if (CHARSET_DIMENSION (charset_alt) == 1) \ | 1553 if (CHARSET_DIMENSION (charset_alt) == 1) \ |
| 1329 ENCODE_ISO_CHARACTER_DIMENSION1 (charset_alt, c1); \ | 1554 ENCODE_ISO_CHARACTER_DIMENSION1 (charset_alt, c1); \ |
| 1330 else \ | 1555 else \ |
| 1331 ENCODE_ISO_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ | 1556 ENCODE_ISO_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ |
| 1557 if (! COMPOSING_P (coding->composing)) \ | |
| 1558 coding->consumed_char++; \ | |
| 1332 } while (0) | 1559 } while (0) |
| 1333 | 1560 |
| 1334 /* Produce designation and invocation codes at a place pointed by DST | 1561 /* Produce designation and invocation codes at a place pointed by DST |
| 1335 to use CHARSET. The element `spec.iso2022' of *CODING is updated. | 1562 to use CHARSET. The element `spec.iso2022' of *CODING is updated. |
| 1336 Return new DST. */ | 1563 Return new DST. */ |
| 1429 ENCODE_DESIGNATION \ | 1656 ENCODE_DESIGNATION \ |
| 1430 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \ | 1657 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \ |
| 1431 } while (0) | 1658 } while (0) |
| 1432 | 1659 |
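The bytes that designation and invocation produce can be sketched directly. The helper names below are hypothetical, and the helpers do no validation; for example, nothing stops a 96-character set from being designated to G0. The short form ESC $ @/A/B is emitted only for G0, and only the SO/SI locking shifts for G1/G0 are covered.

```c
#define ESC 0x1B
#define SO  0x0E   /* locking shift 1: invoke G1 into GL */
#define SI  0x0F   /* locking shift 0: invoke G0 into GL */

/* Write the designation of a charset with final byte FINAL, DIM (1 or 2)
   bytes per character and CHARS (94 or 96), to register REG (0..3).
   SHORT_FORM permits ESC $ @/A/B for G0.  Return the new DST.  */
unsigned char *
emit_designation (unsigned char *dst, int reg, int dim, int chars,
                  unsigned char final, int short_form)
{
  *dst++ = ESC;
  if (dim == 2)
    *dst++ = '$';
  if (!(dim == 2 && reg == 0 && chars == 94 && short_form
        && final >= '@' && final <= 'B'))
    *dst++ = (unsigned char) ((chars == 94 ? 0x28 : 0x2C) + reg);
  *dst++ = final;
  return dst;
}

/* Invoke register REG (0 or 1) into GL unless it is already there;
   *GL_REG tracks the current invocation.  Return the new DST.  */
unsigned char *
emit_invocation (unsigned char *dst, int reg, int *gl_reg)
{
  if (*gl_reg != reg)
    {
      *dst++ = reg == 1 ? SO : SI;
      *gl_reg = reg;
    }
  return dst;
}
```

For example, designating JIS X 0208 (final byte B) to G0 yields ESC $ B when the short form is allowed, and ESC $ ( B otherwise.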
| 1433 /* Produce designation sequences of charsets in the line started from | 1660 /* Produce designation sequences of charsets in the line started from |
| 1434 *SRC to a place pointed by DSTP. | 1661 SRC to a place pointed by *DSTP, and update DSTP. |
| 1435 | 1662 |
| 1436 If the current block ends before any end-of-line, we may fail to | 1663 If the current block ends before any end-of-line, we may fail to |
| 1437 find all the necessary *designations. */ | 1664 find all the necessary designations. */ |
| 1665 | |
| 1438 encode_designation_at_bol (coding, table, src, src_end, dstp) | 1666 encode_designation_at_bol (coding, table, src, src_end, dstp) |
| 1439 struct coding_system *coding; | 1667 struct coding_system *coding; |
| 1440 Lisp_Object table; | 1668 Lisp_Object table; |
| 1441 unsigned char *src, *src_end, **dstp; | 1669 unsigned char *src, *src_end, **dstp; |
| 1442 { | 1670 { |
| 1463 if ((c_alt = unify_char (table, -1, charset, c1, c2)) >= 0) | 1691 if ((c_alt = unify_char (table, -1, charset, c1, c2)) >= 0) |
| 1464 charset = CHAR_CHARSET (c_alt); | 1692 charset = CHAR_CHARSET (c_alt); |
| 1465 } | 1693 } |
| 1466 | 1694 |
| 1467 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset); | 1695 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset); |
| 1468 if (r[reg] < 0) | 1696 if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0) |
| 1469 { | 1697 { |
| 1470 found++; | 1698 found++; |
| 1471 r[reg] = charset; | 1699 r[reg] = charset; |
| 1472 } | 1700 } |
| 1473 | 1701 |
| 1485 } | 1713 } |
| 1486 | 1714 |
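The new column guards r[reg] with reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION. As a result, a charset for which the coding system has no requested register is skipped instead of being used as an array index.

Below is a sketch of that collection step. Charsets are reduced to small integer ids, and the sentinel value -1 is a hypothetical choice, since the real constant's value is not shown here. The function name is also hypothetical.

```c
#define NO_REQUESTED_DESIGNATION (-1)   /* hypothetical sentinel */

/* For the charsets used on a line (in order of appearance), record in
   R the first one requested for each of G0..G3.  REQUESTED maps a
   charset id to its register or the sentinel.  Return how many
   registers were filled.  */
int
collect_bol_designations (const int *charsets, int n,
                          const int *requested, int r[4])
{
  int found = 0;

  for (int reg = 0; reg < 4; reg++)
    r[reg] = -1;
  for (int i = 0; i < n && found < 4; i++)
    {
      int reg = requested[charsets[i]];

      /* Without the sentinel test, r[-1] would be read.  */
      if (reg != NO_REQUESTED_DESIGNATION && r[reg] < 0)
        {
          found++;
          r[reg] = charsets[i];
        }
    }
  return found;
}
```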
| 1487 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */ | 1715 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */ |
| 1488 | 1716 |
| 1489 int | 1717 int |
| 1490 encode_coding_iso2022 (coding, source, destination, | 1718 encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes) |
| 1491 src_bytes, dst_bytes, consumed) | |
| 1492 struct coding_system *coding; | 1719 struct coding_system *coding; |
| 1493 unsigned char *source, *destination; | 1720 unsigned char *source, *destination; |
| 1494 int src_bytes, dst_bytes; | 1721 int src_bytes, dst_bytes; |
| 1495 int *consumed; | |
| 1496 { | 1722 { |
| 1497 unsigned char *src = source; | 1723 unsigned char *src = source; |
| 1498 unsigned char *src_end = source + src_bytes; | 1724 unsigned char *src_end = source + src_bytes; |
| 1499 unsigned char *dst = destination; | 1725 unsigned char *dst = destination; |
| 1500 unsigned char *dst_end = destination + dst_bytes; | 1726 unsigned char *dst_end = destination + dst_bytes; |
| 1502 from DST_END to assure overflow checking is necessary only at the | 1728 from DST_END to assure overflow checking is necessary only at the |
| 1503 head of loop. */ | 1729 head of loop. */ |
| 1504 unsigned char *adjusted_dst_end = dst_end - 19; | 1730 unsigned char *adjusted_dst_end = dst_end - 19; |
| 1505 Lisp_Object unification_table | 1731 Lisp_Object unification_table |
| 1506 = coding->character_unification_table_for_encode; | 1732 = coding->character_unification_table_for_encode; |
| 1733 int result = CODING_FINISH_NORMAL; | |
| 1507 | 1734 |
| 1508 if (!NILP (Venable_character_unification) && NILP (unification_table)) | 1735 if (!NILP (Venable_character_unification) && NILP (unification_table)) |
| 1509 unification_table = Vstandard_character_unification_table_for_encode; | 1736 unification_table = Vstandard_character_unification_table_for_encode; |
| 1510 | 1737 |
| 1511 while (src < src_end && dst < adjusted_dst_end) | 1738 coding->consumed_char = 0; |
| 1739 while (src < src_end && (dst_bytes | |
| 1740 ? (dst < adjusted_dst_end) | |
| 1741 : (dst < src - 19))) | |
| 1512 { | 1742 { |
| 1513 /* SRC_BASE remembers the start position in source in each loop. | 1743 /* SRC_BASE remembers the start position in source in each loop. |
| 1514 The loop will be exited when there's not enough source text | 1744 The loop will be exited when there's not enough source text |
| 1515 to analyze multi-byte codes (within macros ONE_MORE_BYTE, | 1745 to analyze multi-byte codes (within macros ONE_MORE_BYTE, |
| 1516 TWO_MORE_BYTES, and THREE_MORE_BYTES). In that case, SRC is | 1746 TWO_MORE_BYTES, and THREE_MORE_BYTES). In that case, SRC is |
| 1527 CODING_SPEC_ISO_BOL (coding) = 0; | 1757 CODING_SPEC_ISO_BOL (coding) = 0; |
| 1528 } | 1758 } |
| 1529 | 1759 |
| 1530 c1 = *src++; | 1760 c1 = *src++; |
| 1531 /* If we are seeing a component of a composite character, we are | 1761 /* If we are seeing a component of a composite character, we are |
| 1532 seeing a leading-code specially encoded for composition, or a | 1762 seeing a leading-code encoded irregularly for composition, or |
| 1533 composition rule if composing with rule. We must set C1 | 1763 a composition rule if composing with rule. We must set C1 to |
| 1534 to a normal leading-code or an ASCII code. If we are not at | 1764 a normal leading-code or an ASCII code. If we are not seeing |
| 1535 a composed character, we must reset the composition state. */ | 1765 a composite character, we must reset composition, |
| 1766 designation, and invocation states. */ | |
| 1536 if (COMPOSING_P (coding->composing)) | 1767 if (COMPOSING_P (coding->composing)) |
| 1537 { | 1768 { |
| 1538 if (c1 < 0xA0) | 1769 if (c1 < 0xA0) |
| 1539 { | 1770 { |
| 1540 /* We are not in a composite character any longer. */ | 1771 /* We are not in a composite character any longer. */ |
| 1541 coding->composing = COMPOSING_NO; | 1772 coding->composing = COMPOSING_NO; |
| 1773 ENCODE_RESET_PLANE_AND_REGISTER; | |
| 1542 ENCODE_COMPOSITION_END; | 1774 ENCODE_COMPOSITION_END; |
| 1543 } | 1775 } |
| 1544 else | 1776 else |
| 1545 { | 1777 { |
| 1546 if (coding->composing == COMPOSING_WITH_RULE_RULE) | 1778 if (coding->composing == COMPOSING_WITH_RULE_RULE) |
| 1573 | 1805 |
| 1574 case EMACS_control_code: | 1806 case EMACS_control_code: |
| 1575 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) | 1807 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) |
| 1576 ENCODE_RESET_PLANE_AND_REGISTER; | 1808 ENCODE_RESET_PLANE_AND_REGISTER; |
| 1577 *dst++ = c1; | 1809 *dst++ = c1; |
| 1810 coding->consumed_char++; | |
| 1578 break; | 1811 break; |
| 1579 | 1812 |
| 1580 case EMACS_carriage_return_code: | 1813 case EMACS_carriage_return_code: |
| 1581 if (!coding->selective) | 1814 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY)) |
| 1582 { | 1815 { |
| 1583 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) | 1816 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) |
| 1584 ENCODE_RESET_PLANE_AND_REGISTER; | 1817 ENCODE_RESET_PLANE_AND_REGISTER; |
| 1585 *dst++ = c1; | 1818 *dst++ = c1; |
| 1819 coding->consumed_char++; | |
| 1586 break; | 1820 break; |
| 1587 } | 1821 } |
| 1588 /* fall through to treat '\r' as '\n' ... */ | 1822 /* fall through to treat '\r' as '\n' ... */ |
| 1589 | 1823 |
| 1590 case EMACS_linefeed_code: | 1824 case EMACS_linefeed_code: |
| 1600 else if (coding->eol_type == CODING_EOL_CRLF) | 1834 else if (coding->eol_type == CODING_EOL_CRLF) |
| 1601 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF; | 1835 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF; |
| 1602 else | 1836 else |
| 1603 *dst++ = ISO_CODE_CR; | 1837 *dst++ = ISO_CODE_CR; |
| 1604 CODING_SPEC_ISO_BOL (coding) = 1; | 1838 CODING_SPEC_ISO_BOL (coding) = 1; |
| 1839 coding->consumed_char++; | |
| 1605 break; | 1840 break; |
| 1606 | 1841 |
| 1607 case EMACS_leading_code_2: | 1842 case EMACS_leading_code_2: |
| 1608 ONE_MORE_BYTE (c2); | 1843 ONE_MORE_BYTE (c2); |
| 1609 if (c2 < 0xA0) | 1844 if (c2 < 0xA0) |
| 1610 { | 1845 { |
| 1611 /* invalid sequence */ | 1846 /* invalid sequence */ |
| 1612 *dst++ = c1; | 1847 *dst++ = c1; |
| 1613 *dst++ = c2; | 1848 *dst++ = c2; |
| 1849 coding->consumed_char += 2; | |
| 1614 } | 1850 } |
| 1615 else | 1851 else |
| 1616 ENCODE_ISO_CHARACTER (c1, c2, /* dummy */ c3); | 1852 ENCODE_ISO_CHARACTER (c1, c2, /* dummy */ c3); |
| 1617 break; | 1853 break; |
| 1618 | 1854 |
| 1622 { | 1858 { |
| 1623 /* invalid sequence */ | 1859 /* invalid sequence */ |
| 1624 *dst++ = c1; | 1860 *dst++ = c1; |
| 1625 *dst++ = c2; | 1861 *dst++ = c2; |
| 1626 *dst++ = c3; | 1862 *dst++ = c3; |
| 1863 coding->consumed_char += 3; | |
| 1627 } | 1864 } |
| 1628 else if (c1 < LEADING_CODE_PRIVATE_11) | 1865 else if (c1 < LEADING_CODE_PRIVATE_11) |
| 1629 ENCODE_ISO_CHARACTER (c1, c2, c3); | 1866 ENCODE_ISO_CHARACTER (c1, c2, c3); |
| 1630 else | 1867 else |
| 1631 ENCODE_ISO_CHARACTER (c2, c3, /* dummy */ c4); | 1868 ENCODE_ISO_CHARACTER (c2, c3, /* dummy */ c4); |
| 1638 /* invalid sequence */ | 1875 /* invalid sequence */ |
| 1639 *dst++ = c1; | 1876 *dst++ = c1; |
| 1640 *dst++ = c2; | 1877 *dst++ = c2; |
| 1641 *dst++ = c3; | 1878 *dst++ = c3; |
| 1642 *dst++ = c4; | 1879 *dst++ = c4; |
| 1880 coding->consumed_char += 4; | |
| 1643 } | 1881 } |
| 1644 else | 1882 else |
| 1645 ENCODE_ISO_CHARACTER (c2, c3, c4); | 1883 ENCODE_ISO_CHARACTER (c2, c3, c4); |
| 1646 break; | 1884 break; |
| 1647 | 1885 |
| 1650 if (c2 < 0xA0) | 1888 if (c2 < 0xA0) |
| 1651 { | 1889 { |
| 1652 /* invalid sequence */ | 1890 /* invalid sequence */ |
| 1653 *dst++ = c1; | 1891 *dst++ = c1; |
| 1654 *dst++ = c2; | 1892 *dst++ = c2; |
| 1893 coding->consumed_char += 2; | |
| 1655 } | 1894 } |
| 1656 else if (c2 == 0xFF) | 1895 else if (c2 == 0xFF) |
| 1657 { | 1896 { |
| 1897 ENCODE_RESET_PLANE_AND_REGISTER; | |
| 1658 coding->composing = COMPOSING_WITH_RULE_HEAD; | 1898 coding->composing = COMPOSING_WITH_RULE_HEAD; |
| 1659 ENCODE_COMPOSITION_WITH_RULE_START; | 1899 ENCODE_COMPOSITION_WITH_RULE_START; |
| 1900 coding->consumed_char++; | |
| 1660 } | 1901 } |
| 1661 else | 1902 else |
| 1662 { | 1903 { |
| 1904 ENCODE_RESET_PLANE_AND_REGISTER; | |
| 1663 /* Rewind one byte because it is a character code of | 1905 /* Rewind one byte because it is a character code of |
| 1664 composition elements. */ | 1906 composition elements. */ |
| 1665 src--; | 1907 src--; |
| 1666 coding->composing = COMPOSING_NO_RULE_HEAD; | 1908 coding->composing = COMPOSING_NO_RULE_HEAD; |
| 1667 ENCODE_COMPOSITION_NO_RULE_START; | 1909 ENCODE_COMPOSITION_NO_RULE_START; |
| 1910 coding->consumed_char++; | |
| 1668 } | 1911 } |
| 1669 break; | 1912 break; |
| 1670 | 1913 |
| 1671 case EMACS_invalid_code: | 1914 case EMACS_invalid_code: |
| 1672 *dst++ = c1; | 1915 *dst++ = c1; |
| 1916 coding->consumed_char++; | |
| 1673 break; | 1917 break; |
| 1674 } | 1918 } |
| 1675 continue; | 1919 continue; |
| 1676 label_end_of_loop: | 1920 label_end_of_loop: |
| 1677 /* We reach here because the source date ends not at character | 1921 result = CODING_FINISH_INSUFFICIENT_SRC; |
| 1678 boundary. */ | 1922 src = src_base; |
| 1679 coding->carryover_size = src_end - src_base; | |
| 1680 bcopy (src_base, coding->carryover, coding->carryover_size); | |
| 1681 src = src_end; | |
| 1682 break; | 1923 break; |
| 1683 } | 1924 } |
| 1684 | 1925 |
| 1926 if (result == CODING_FINISH_NORMAL | |
| 1927 && src < src_end) | |
| 1928 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 1929 | |
| 1685 /* If this is the last block of the text to be encoded, we must | 1930 /* If this is the last block of the text to be encoded, we must |
| 1686 reset graphic planes and registers to the initial state. */ | 1931 reset graphic planes and registers to the initial state, and |
| 1687 if (src >= src_end && coding->last_block) | 1932 flush out the carryover if any. */ |
| 1688 { | 1933 if (coding->mode & CODING_MODE_LAST_BLOCK) |
| 1689 ENCODE_RESET_PLANE_AND_REGISTER; | 1934 ENCODE_RESET_PLANE_AND_REGISTER; |
| 1690 if (coding->carryover_size > 0 | 1935 |
| 1691 && coding->carryover_size < (dst_end - dst)) | 1936 coding->consumed = src - source; |
| 1692 { | 1937 coding->produced = coding->produced_char = dst - destination; |
| 1693 bcopy (coding->carryover, dst, coding->carryover_size); | 1938 return result; |
| 1694 dst += coding->carryover_size; | |
| 1695 coding->carryover_size = 0; | |
| 1696 } | |
| 1697 } | |
| 1698 *consumed = src - source; | |
| 1699 return dst - destination; | |
| 1700 } | 1939 } |
| 1701 | 1940 |
| 1702 | 1941 |
| 1703 /*** 4. SJIS and BIG5 handlers ***/ | 1942 /*** 4. SJIS and BIG5 handlers ***/ |
| 1704 | 1943 |
| 1785 DECODE_CHARACTER_ASCII (c1); \ | 2024 DECODE_CHARACTER_ASCII (c1); \ |
| 1786 else if (CHARSET_DIMENSION (charset_alt) == 1) \ | 2025 else if (CHARSET_DIMENSION (charset_alt) == 1) \ |
| 1787 DECODE_CHARACTER_DIMENSION1 (charset_alt, c1); \ | 2026 DECODE_CHARACTER_DIMENSION1 (charset_alt, c1); \ |
| 1788 else \ | 2027 else \ |
| 1789 DECODE_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ | 2028 DECODE_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ |
| 2029 coding->produced_char++; \ | |
| 1790 } while (0) | 2030 } while (0) |
| 1791 | 2031 |
| 1792 #define ENCODE_SJIS_BIG5_CHARACTER(charset, c1, c2) \ | 2032 #define ENCODE_SJIS_BIG5_CHARACTER(charset, c1, c2) \ |
| 1793 do { \ | 2033 do { \ |
| 1794 int c_alt, charset_alt; \ | 2034 int c_alt, charset_alt; \ |
| 1827 *dst++ = b1, *dst++ = b2; \ | 2067 *dst++ = b1, *dst++ = b2; \ |
| 1828 } \ | 2068 } \ |
| 1829 else \ | 2069 else \ |
| 1830 *dst++ = charset_alt, *dst++ = c1, *dst++ = c2; \ | 2070 *dst++ = charset_alt, *dst++ = c1, *dst++ = c2; \ |
| 1831 } \ | 2071 } \ |
| 2072 coding->consumed_char++; \ | |
| 1832 } while (0); | 2073 } while (0); |
| 1833 | 2074 |
| 1834 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". | 2075 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". |
| 1835 Check if a text is encoded in SJIS. If it is, return | 2076 Check if a text is encoded in SJIS. If it is, return |
| 1836 CODING_CATEGORY_MASK_SJIS, else return 0. */ | 2077 CODING_CATEGORY_MASK_SJIS, else return 0. */ |
| 1842 unsigned char c; | 2083 unsigned char c; |
| 1843 | 2084 |
| 1844 while (src < src_end) | 2085 while (src < src_end) |
| 1845 { | 2086 { |
| 1846 c = *src++; | 2087 c = *src++; |
| 1847 if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) | |
| 1848 return 0; | |
| 1849 if ((c >= 0x80 && c < 0xA0) || c >= 0xE0) | 2088 if ((c >= 0x80 && c < 0xA0) || c >= 0xE0) |
| 1850 { | 2089 { |
| 1851 if (src < src_end && *src++ < 0x40) | 2090 if (src < src_end && *src++ < 0x40) |
| 1852 return 0; | 2091 return 0; |
| 1853 } | 2092 } |
| 1866 unsigned char c; | 2105 unsigned char c; |
| 1867 | 2106 |
| 1868 while (src < src_end) | 2107 while (src < src_end) |
| 1869 { | 2108 { |
| 1870 c = *src++; | 2109 c = *src++; |
| 1871 if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) | |
| 1872 return 0; | |
| 1873 if (c >= 0xA1) | 2110 if (c >= 0xA1) |
| 1874 { | 2111 { |
| 1875 if (src >= src_end) | 2112 if (src >= src_end) |
| 1876 break; | 2113 break; |
| 1877 c = *src++; | 2114 c = *src++; |
| 1885 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". | 2122 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". |
| 1886 If SJIS_P is 1, decode SJIS text, else decode BIG5 test. */ | 2123 If SJIS_P is 1, decode SJIS text, else decode BIG5 test. */ |
| 1887 | 2124 |
| 1888 int | 2125 int |
| 1889 decode_coding_sjis_big5 (coding, source, destination, | 2126 decode_coding_sjis_big5 (coding, source, destination, |
| 1890 src_bytes, dst_bytes, consumed, sjis_p) | 2127 src_bytes, dst_bytes, sjis_p) |
| 1891 struct coding_system *coding; | 2128 struct coding_system *coding; |
| 1892 unsigned char *source, *destination; | 2129 unsigned char *source, *destination; |
| 1893 int src_bytes, dst_bytes; | 2130 int src_bytes, dst_bytes; |
| 1894 int *consumed; | |
| 1895 int sjis_p; | 2131 int sjis_p; |
| 1896 { | 2132 { |
| 1897 unsigned char *src = source; | 2133 unsigned char *src = source; |
| 1898 unsigned char *src_end = source + src_bytes; | 2134 unsigned char *src_end = source + src_bytes; |
| 1899 unsigned char *dst = destination; | 2135 unsigned char *dst = destination; |
| 1902 from DST_END to assure overflow checking is necessary only at the | 2138 from DST_END to assure overflow checking is necessary only at the |
| 1903 head of loop. */ | 2139 head of loop. */ |
| 1904 unsigned char *adjusted_dst_end = dst_end - 3; | 2140 unsigned char *adjusted_dst_end = dst_end - 3; |
| 1905 Lisp_Object unification_table | 2141 Lisp_Object unification_table |
| 1906 = coding->character_unification_table_for_decode; | 2142 = coding->character_unification_table_for_decode; |
| 2143 int result = CODING_FINISH_NORMAL; | |
| 1907 | 2144 |
| 1908 if (!NILP (Venable_character_unification) && NILP (unification_table)) | 2145 if (!NILP (Venable_character_unification) && NILP (unification_table)) |
| 1909 unification_table = Vstandard_character_unification_table_for_decode; | 2146 unification_table = Vstandard_character_unification_table_for_decode; |
| 1910 | 2147 |
| 1911 while (src < src_end && dst < adjusted_dst_end) | 2148 coding->produced_char = 0; |
| 2149 while (src < src_end && (dst_bytes | |
| 2150 ? (dst < adjusted_dst_end) | |
| 2151 : (dst < src - 3))) | |
| 1912 { | 2152 { |
| 1913 /* SRC_BASE remembers the start position in source in each loop. | 2153 /* SRC_BASE remembers the start position in source in each loop. |
| 1914 The loop will be exited when there's not enough source text | 2154 The loop will be exited when there's not enough source text |
| 1915 to analyze two-byte character (within macro ONE_MORE_BYTE). | 2155 to analyze two-byte character (within macro ONE_MORE_BYTE). |
| 1916 In that case, SRC is reset to SRC_BASE before exiting. */ | 2156 In that case, SRC is reset to SRC_BASE before exiting. */ |
| 1917 unsigned char *src_base = src; | 2157 unsigned char *src_base = src; |
| 1918 unsigned char c1 = *src++, c2, c3, c4; | 2158 unsigned char c1 = *src++, c2, c3, c4; |
| 1919 | 2159 |
| 1920 if (c1 == '\r') | 2160 if (c1 < 0x20) |
| 1921 { | 2161 { |
| 1922 if (coding->eol_type == CODING_EOL_CRLF) | 2162 if (c1 == '\r') |
| 1923 { | 2163 { |
| 1924 ONE_MORE_BYTE (c2); | 2164 if (coding->eol_type == CODING_EOL_CRLF) |
| 1925 if (c2 == '\n') | 2165 { |
| 1926 *dst++ = c2; | 2166 ONE_MORE_BYTE (c2); |
| 2167 if (c2 == '\n') | |
| 2168 *dst++ = c2; | |
| 2169 else if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | |
| 2170 { | |
| 2171 result = CODING_FINISH_INCONSISTENT_EOL; | |
| 2172 goto label_end_of_loop_2; | |
| 2173 } | |
| 2174 else | |
| 2175 /* To process C2 again, SRC is subtracted by 1. */ | |
| 2176 *dst++ = c1, src--; | |
| 2177 } | |
| 2178 else if (coding->eol_type == CODING_EOL_CR) | |
| 2179 *dst++ = '\n'; | |
| 1927 else | 2180 else |
| 1928 /* To process C2 again, SRC is subtracted by 1. */ | 2181 *dst++ = c1; |
| 1929 *dst++ = c1, src--; | |
| 1930 } | 2182 } |
| 1931 else if (coding->eol_type == CODING_EOL_CR) | 2183 else if (c1 == '\n' |
| 1932 *dst++ = '\n'; | 2184 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) |
| 2185 && (coding->eol_type == CODING_EOL_CR | |
| 2186 || coding->eol_type == CODING_EOL_CRLF)) | |
| 2187 { | |
| 2188 result = CODING_FINISH_INCONSISTENT_EOL; | |
| 2189 goto label_end_of_loop_2; | |
| 2190 } | |
| 1933 else | 2191 else |
| 1934 *dst++ = c1; | 2192 *dst++ = c1; |
| 1935 } | 2193 coding->produced_char++; |
| 1936 else if (c1 < 0x20) | 2194 } |
| 1937 *dst++ = c1; | |
| 1938 else if (c1 < 0x80) | 2195 else if (c1 < 0x80) |
| 1939 DECODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2); | 2196 DECODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2); |
| 1940 else if (c1 < 0xA0 || c1 >= 0xE0) | 2197 else if (c1 < 0xA0 || c1 >= 0xE0) |
| 1941 { | 2198 { |
| 1942 /* SJIS -> JISX0208, BIG5 -> Big5 (only if 0xE0 <= c1 < 0xFF) */ | 2199 /* SJIS -> JISX0208, BIG5 -> Big5 (only if 0xE0 <= c1 < 0xFF) */ |
| 1953 ONE_MORE_BYTE (c2); | 2210 ONE_MORE_BYTE (c2); |
| 1954 DECODE_BIG5 (c1, c2, charset, c3, c4); | 2211 DECODE_BIG5 (c1, c2, charset, c3, c4); |
| 1955 DECODE_SJIS_BIG5_CHARACTER (charset, c3, c4); | 2212 DECODE_SJIS_BIG5_CHARACTER (charset, c3, c4); |
| 1956 } | 2213 } |
| 1957 else /* Invalid code */ | 2214 else /* Invalid code */ |
| 1958 *dst++ = c1; | 2215 { |
| 2216 *dst++ = c1; | |
| 2217 coding->produced_char++; | |
| 2218 } | |
| 1959 } | 2219 } |
| 1960 else | 2220 else |
| 1961 { | 2221 { |
| 1962 /* SJIS -> JISX0201-Kana, BIG5 -> Big5 */ | 2222 /* SJIS -> JISX0201-Kana, BIG5 -> Big5 */ |
| 1963 if (sjis_p) | 2223 if (sjis_p) |
| 1964 DECODE_SJIS_BIG5_CHARACTER (charset_katakana_jisx0201, c1, /* dummy */ c2); | 2224 DECODE_SJIS_BIG5_CHARACTER (charset_katakana_jisx0201, c1, |
| 2225 /* dummy */ c2); | |
| 1965 else | 2226 else |
| 1966 { | 2227 { |
| 1967 int charset; | 2228 int charset; |
| 1968 | 2229 |
| 1969 ONE_MORE_BYTE (c2); | 2230 ONE_MORE_BYTE (c2); |
| 1972 } | 2233 } |
| 1973 } | 2234 } |
| 1974 continue; | 2235 continue; |
| 1975 | 2236 |
| 1976 label_end_of_loop: | 2237 label_end_of_loop: |
| 1977 coding->carryover_size = src - src_base; | 2238 result = CODING_FINISH_INSUFFICIENT_SRC; |
| 1978 bcopy (src_base, coding->carryover, coding->carryover_size); | 2239 label_end_of_loop_2: |
| 1979 src = src_base; | 2240 src = src_base; |
| 1980 break; | 2241 break; |
| 1981 } | 2242 } |
| 1982 | 2243 |
| 1983 *consumed = src - source; | 2244 if (result == CODING_FINISH_NORMAL |
| 1984 return dst - destination; | 2245 && src < src_end) |
| 2246 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 2247 | |
| 2248 coding->consumed = coding->consumed_char = src - source; | |
| 2249 coding->produced = dst - destination; | |
| 2250 return result; | |
| 1985 } | 2251 } |
| 1986 | 2252 |
| 1987 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". | 2253 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". |
| 1988 This function can encode `charset_ascii', `charset_katakana_jisx0201', | 2254 This function can encode `charset_ascii', `charset_katakana_jisx0201', |
| 1989 `charset_jisx0208', `charset_big5_1', and `charset_big5-2'. We are | 2255 `charset_jisx0208', `charset_big5_1', and `charset_big5-2'. We are |
| 1992 charsets are produced without any encoding. If SJIS_P is 1, encode | 2258 charsets are produced without any encoding. If SJIS_P is 1, encode |
| 1993 SJIS text, else encode BIG5 text. */ | 2259 SJIS text, else encode BIG5 text. */ |
| 1994 | 2260 |
| 1995 int | 2261 int |
| 1996 encode_coding_sjis_big5 (coding, source, destination, | 2262 encode_coding_sjis_big5 (coding, source, destination, |
| 1997 src_bytes, dst_bytes, consumed, sjis_p) | 2263 src_bytes, dst_bytes, sjis_p) |
| 1998 struct coding_system *coding; | 2264 struct coding_system *coding; |
| 1999 unsigned char *source, *destination; | 2265 unsigned char *source, *destination; |
| 2000 int src_bytes, dst_bytes; | 2266 int src_bytes, dst_bytes; |
| 2001 int *consumed; | |
| 2002 int sjis_p; | 2267 int sjis_p; |
| 2003 { | 2268 { |
| 2004 unsigned char *src = source; | 2269 unsigned char *src = source; |
| 2005 unsigned char *src_end = source + src_bytes; | 2270 unsigned char *src_end = source + src_bytes; |
| 2006 unsigned char *dst = destination; | 2271 unsigned char *dst = destination; |
| 2009 from DST_END to assure overflow checking is necessary only at the | 2274 from DST_END to assure overflow checking is necessary only at the |
| 2010 head of loop. */ | 2275 head of loop. */ |
| 2011 unsigned char *adjusted_dst_end = dst_end - 1; | 2276 unsigned char *adjusted_dst_end = dst_end - 1; |
| 2012 Lisp_Object unification_table | 2277 Lisp_Object unification_table |
| 2013 = coding->character_unification_table_for_encode; | 2278 = coding->character_unification_table_for_encode; |
| 2279 int result = CODING_FINISH_NORMAL; | |
| 2014 | 2280 |
| 2015 if (!NILP (Venable_character_unification) && NILP (unification_table)) | 2281 if (!NILP (Venable_character_unification) && NILP (unification_table)) |
| 2016 unification_table = Vstandard_character_unification_table_for_encode; | 2282 unification_table = Vstandard_character_unification_table_for_encode; |
| 2017 | 2283 |
| 2018 while (src < src_end && dst < adjusted_dst_end) | 2284 coding->consumed_char = 0; |
| 2285 while (src < src_end && (dst_bytes | |
| 2286 ? (dst < adjusted_dst_end) | |
| 2287 : (dst < src - 1))) | |
| 2019 { | 2288 { |
| 2020 /* SRC_BASE remembers the start position in source in each loop. | 2289 /* SRC_BASE remembers the start position in source in each loop. |
| 2021 The loop will be exited when there's not enough source text | 2290 The loop will be exited when there's not enough source text |
| 2022 to analyze multi-byte codes (within macros ONE_MORE_BYTE and | 2291 to analyze multi-byte codes (within macros ONE_MORE_BYTE and |
| 2023 TWO_MORE_BYTES). In that case, SRC is reset to SRC_BASE | 2292 TWO_MORE_BYTES). In that case, SRC is reset to SRC_BASE |
| 2044 ENCODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2); | 2313 ENCODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2); |
| 2045 break; | 2314 break; |
| 2046 | 2315 |
| 2047 case EMACS_control_code: | 2316 case EMACS_control_code: |
| 2048 *dst++ = c1; | 2317 *dst++ = c1; |
| 2318 coding->consumed_char++; | |
| 2049 break; | 2319 break; |
| 2050 | 2320 |
| 2051 case EMACS_carriage_return_code: | 2321 case EMACS_carriage_return_code: |
| 2052 if (!coding->selective) | 2322 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY)) |
| 2053 { | 2323 { |
| 2054 *dst++ = c1; | 2324 *dst++ = c1; |
| 2325 coding->consumed_char++; | |
| 2055 break; | 2326 break; |
| 2056 } | 2327 } |
| 2057 /* fall down to treat '\r' as '\n' ... */ | 2328 /* fall down to treat '\r' as '\n' ... */ |
| 2058 | 2329 |
| 2059 case EMACS_linefeed_code: | 2330 case EMACS_linefeed_code: |
| 2062 *dst++ = '\n'; | 2333 *dst++ = '\n'; |
| 2063 else if (coding->eol_type == CODING_EOL_CRLF) | 2334 else if (coding->eol_type == CODING_EOL_CRLF) |
| 2064 *dst++ = '\r', *dst++ = '\n'; | 2335 *dst++ = '\r', *dst++ = '\n'; |
| 2065 else | 2336 else |
| 2066 *dst++ = '\r'; | 2337 *dst++ = '\r'; |
| 2338 coding->consumed_char++; | |
| 2067 break; | 2339 break; |
| 2068 | 2340 |
| 2069 case EMACS_leading_code_2: | 2341 case EMACS_leading_code_2: |
| 2070 ONE_MORE_BYTE (c2); | 2342 ONE_MORE_BYTE (c2); |
| 2071 ENCODE_SJIS_BIG5_CHARACTER (c1, c2, /* dummy */ c3); | 2343 ENCODE_SJIS_BIG5_CHARACTER (c1, c2, /* dummy */ c3); |
| 2085 coding->composing = 1; | 2357 coding->composing = 1; |
| 2086 break; | 2358 break; |
| 2087 | 2359 |
| 2088 default: /* i.e. case EMACS_invalid_code: */ | 2360 default: /* i.e. case EMACS_invalid_code: */ |
| 2089 *dst++ = c1; | 2361 *dst++ = c1; |
| 2362 coding->consumed_char++; | |
| 2090 } | 2363 } |
| 2091 continue; | 2364 continue; |
| 2092 | 2365 |
| 2093 label_end_of_loop: | 2366 label_end_of_loop: |
| 2094 coding->carryover_size = src_end - src_base; | 2367 result = CODING_FINISH_INSUFFICIENT_SRC; |
| 2095 bcopy (src_base, coding->carryover, coding->carryover_size); | 2368 src = src_base; |
| 2096 src = src_end; | |
| 2097 break; | 2369 break; |
| 2098 } | 2370 } |
| 2099 | 2371 |
| 2100 *consumed = src - source; | 2372 if (result == CODING_FINISH_NORMAL |
| 2101 return dst - destination; | 2373 && src < src_end) |
| 2374 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 2375 coding->consumed = src - source; | |
| 2376 coding->produced = coding->produced_char = dst - destination; | |
| 2377 return result; | |
| 2102 } | 2378 } |
| 2103 | 2379 |
| 2104 | 2380 |
| 2105 /*** 5. End-of-line handlers ***/ | 2381 /*** 5. End-of-line handlers ***/ |
| 2106 | 2382 |
| 2107 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". | 2383 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". |
| 2108 This function is called only when `coding->eol_type' is | 2384 This function is called only when `coding->eol_type' is |
| 2109 CODING_EOL_CRLF or CODING_EOL_CR. */ | 2385 CODING_EOL_CRLF or CODING_EOL_CR. */ |
| 2110 | 2386 |
| 2111 decode_eol (coding, source, destination, src_bytes, dst_bytes, consumed) | 2387 decode_eol (coding, source, destination, src_bytes, dst_bytes) |
| 2112 struct coding_system *coding; | 2388 struct coding_system *coding; |
| 2113 unsigned char *source, *destination; | 2389 unsigned char *source, *destination; |
| 2114 int src_bytes, dst_bytes; | 2390 int src_bytes, dst_bytes; |
| 2115 int *consumed; | |
| 2116 { | 2391 { |
| 2117 unsigned char *src = source; | 2392 unsigned char *src = source; |
| 2118 unsigned char *src_end = source + src_bytes; | 2393 unsigned char *src_end = source + src_bytes; |
| 2119 unsigned char *dst = destination; | 2394 unsigned char *dst = destination; |
| 2120 unsigned char *dst_end = destination + dst_bytes; | 2395 unsigned char *dst_end = destination + dst_bytes; |
| 2121 int produced; | 2396 int result = CODING_FINISH_NORMAL; |
| 2397 | |
| 2398 if (src_bytes <= 0) | |
| 2399 return result; | |
| 2122 | 2400 |
| 2123 switch (coding->eol_type) | 2401 switch (coding->eol_type) |
| 2124 { | 2402 { |
| 2125 case CODING_EOL_CRLF: | 2403 case CODING_EOL_CRLF: |
| 2126 { | 2404 { |
| 2127 /* Since the maximum bytes produced by each loop is 2, we | 2405 /* Since the maximum bytes produced by each loop is 2, we |
| 2128 subtract 1 from DST_END to assure overflow checking is | 2406 subtract 1 from DST_END to assure overflow checking is |
| 2129 necessary only at the head of loop. */ | 2407 necessary only at the head of loop. */ |
| 2130 unsigned char *adjusted_dst_end = dst_end - 1; | 2408 unsigned char *adjusted_dst_end = dst_end - 1; |
| 2131 | 2409 |
| 2132 while (src < src_end && dst < adjusted_dst_end) | 2410 while (src < src_end && (dst_bytes |
| 2411 ? (dst < adjusted_dst_end) | |
| 2412 : (dst < src - 1))) | |
| 2133 { | 2413 { |
| 2134 unsigned char *src_base = src; | 2414 unsigned char *src_base = src; |
| 2135 unsigned char c = *src++; | 2415 unsigned char c = *src++; |
| 2136 if (c == '\r') | 2416 if (c == '\r') |
| 2137 { | 2417 { |
| 2138 ONE_MORE_BYTE (c); | 2418 ONE_MORE_BYTE (c); |
| 2139 if (c != '\n') | 2419 if (c != '\n') |
| 2140 *dst++ = '\r'; | 2420 { |
| 2421 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) | |
| 2422 { | |
| 2423 result = CODING_FINISH_INCONSISTENT_EOL; | |
| 2424 goto label_end_of_loop_2; | |
| 2425 } | |
| 2426 *dst++ = '\r'; | |
| 2427 } | |
| 2141 *dst++ = c; | 2428 *dst++ = c; |
| 2429 } | |
| 2430 else if (c == '\n' | |
| 2431 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)) | |
| 2432 { | |
| 2433 result = CODING_FINISH_INCONSISTENT_EOL; | |
| 2434 goto label_end_of_loop_2; | |
| 2142 } | 2435 } |
| 2143 else | 2436 else |
| 2144 *dst++ = c; | 2437 *dst++ = c; |
| 2145 continue; | 2438 continue; |
| 2146 | 2439 |
| 2147 label_end_of_loop: | 2440 label_end_of_loop: |
| 2148 coding->carryover_size = src - src_base; | 2441 result = CODING_FINISH_INSUFFICIENT_SRC; |
| 2149 bcopy (src_base, coding->carryover, coding->carryover_size); | 2442 label_end_of_loop_2: |
| 2150 src = src_base; | 2443 src = src_base; |
| 2151 break; | 2444 break; |
| 2152 } | 2445 } |
| 2153 *consumed = src - source; | 2446 if (result == CODING_FINISH_NORMAL |
| 2154 produced = dst - destination; | 2447 && src < src_end) |
| 2155 break; | 2448 result = CODING_FINISH_INSUFFICIENT_DST; |
| 2156 } | 2449 } |
| 2450 break; | |
| 2157 | 2451 |
| 2158 case CODING_EOL_CR: | 2452 case CODING_EOL_CR: |
| 2159 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | 2453 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL) |
| 2160 bcopy (source, destination, produced); | 2454 { |
| 2161 dst_end = destination + produced; | 2455 while (src < src_end) if (*src++ == '\n') break; |
| 2162 while (dst < dst_end) | 2456 if (*--src == '\n') |
| 2163 if (*dst++ == '\r') dst[-1] = '\n'; | 2457 { |
| 2164 *consumed = produced; | 2458 src_bytes = src - source; |
| 2459 result = CODING_FINISH_INCONSISTENT_EOL; | |
| 2460 } | |
| 2461 } | |
| 2462 if (dst_bytes && src_bytes > dst_bytes) | |
| 2463 { | |
| 2464 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 2465 src_bytes = dst_bytes; | |
| 2466 } | |
| 2467 if (dst_bytes) | |
| 2468 bcopy (source, destination, src_bytes); | |
| 2469 else | |
| 2470 safe_bcopy (source, destination, src_bytes); | |
| 2471 src = source + src_bytes; | |
| 2472 while (src_bytes--) if (*dst++ == '\r') dst[-1] = '\n'; | |
| 2165 break; | 2473 break; |
| 2166 | 2474 |
| 2167 default: /* i.e. case: CODING_EOL_LF */ | 2475 default: /* i.e. case: CODING_EOL_LF */ |
| 2168 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | 2476 if (dst_bytes && src_bytes > dst_bytes) |
| 2169 bcopy (source, destination, produced); | 2477 { |
| 2170 *consumed = produced; | 2478 result = CODING_FINISH_INSUFFICIENT_DST; |
| 2479 src_bytes = dst_bytes; | |
| 2480 } | |
| 2481 if (dst_bytes) | |
| 2482 bcopy (source, destination, src_bytes); | |
| 2483 else | |
| 2484 safe_bcopy (source, destination, src_bytes); | |
| 2485 src += src_bytes; | |
| 2486 dst += dst_bytes; | |
| 2171 break; | 2487 break; |
| 2172 } | 2488 } |
| 2173 | 2489 |
| 2174 return produced; | 2490 coding->consumed = coding->consumed_char = src - source; |
| 2491 coding->produced = coding->produced_char = dst - destination; | |
| 2492 return result; | |
| 2175 } | 2493 } |
| 2176 | 2494 |
| 2177 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode | 2495 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode |
| 2178 format of end-of-line according to `coding->eol_type'. If | 2496 format of end-of-line according to `coding->eol_type'. If |
| 2179 `coding->selective' is 1, code '\r' in source text also means | 2497 `coding->mode & CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code |
| 2180 end-of-line. */ | 2498 '\r' in source text also means end-of-line. */ |
| 2181 | 2499 |
| 2182 encode_eol (coding, source, destination, src_bytes, dst_bytes, consumed) | 2500 encode_eol (coding, source, destination, src_bytes, dst_bytes) |
| 2183 struct coding_system *coding; | 2501 struct coding_system *coding; |
| 2184 unsigned char *source, *destination; | 2502 unsigned char *source, *destination; |
| 2185 int src_bytes, dst_bytes; | 2503 int src_bytes, dst_bytes; |
| 2186 int *consumed; | |
| 2187 { | 2504 { |
| 2188 unsigned char *src = source; | 2505 unsigned char *src = source; |
| 2189 unsigned char *dst = destination; | 2506 unsigned char *dst = destination; |
| 2190 int produced; | 2507 int result = CODING_FINISH_NORMAL; |
| 2191 | 2508 |
| 2192 if (src_bytes <= 0) | 2509 if (coding->eol_type == CODING_EOL_CRLF) |
| 2193 return 0; | 2510 { |
| 2194 | 2511 unsigned char c; |
| 2195 switch (coding->eol_type) | 2512 unsigned char *src_end = source + src_bytes; |
| 2196 { | 2513 unsigned char *dst_end = destination + dst_bytes; |
| 2197 case CODING_EOL_LF: | 2514 /* Since the maximum bytes produced by each loop is 2, we |
| 2198 case CODING_EOL_UNDECIDED: | 2515 subtract 1 from DST_END to assure overflow checking is |
| 2199 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | 2516 necessary only at the head of loop. */ |
| 2200 bcopy (source, destination, produced); | 2517 unsigned char *adjusted_dst_end = dst_end - 1; |
| 2201 if (coding->selective) | 2518 |
| 2202 { | 2519 while (src < src_end && (dst_bytes |
| 2203 int i = produced; | 2520 ? (dst < adjusted_dst_end) |
| 2204 while (i--) | 2521 : (dst < src - 1))) |
| 2522 { | |
| 2523 c = *src++; | |
| 2524 if (c == '\n' | |
| 2525 || (c == '\r' && (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))) | |
| 2526 *dst++ = '\r', *dst++ = '\n'; | |
| 2527 else | |
| 2528 *dst++ = c; | |
| 2529 } | |
| 2530 if (src < src_end) | |
| 2531 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 2532 } | |
| 2533 else | |
| 2534 { | |
| 2535 if (dst_bytes && src_bytes > dst_bytes) | |
| 2536 { | |
| 2537 src_bytes = dst_bytes; | |
| 2538 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 2539 } | |
| 2540 if (dst_bytes) | |
| 2541 bcopy (source, destination, src_bytes); | |
| 2542 else | |
| 2543 safe_bcopy (source, destination, src_bytes); | |
| 2544 if (coding->eol_type == CODING_EOL_CRLF) | |
| 2545 { | |
| 2546 while (src_bytes--) | |
| 2547 if (*dst++ == '\n') dst[-1] = '\r'; | |
| 2548 } | |
| 2549 else if (coding->mode & CODING_MODE_SELECTIVE_DISPLAY) | |
| 2550 { | |
| 2551 while (src_bytes--) | |
| 2205 if (*dst++ == '\r') dst[-1] = '\n'; | 2552 if (*dst++ == '\r') dst[-1] = '\n'; |
| 2206 } | 2553 } |
| 2207 *consumed = produced; | 2554 src += src_bytes; |
| 2208 | 2555 dst += src_bytes; |
| 2209 case CODING_EOL_CRLF: | 2556 } |
| 2210 { | 2557 |
| 2211 unsigned char c; | 2558 coding->consumed = coding->consumed_char = src - source; |
| 2212 unsigned char *src_end = source + src_bytes; | 2559 coding->produced = coding->produced_char = dst - destination; |
| 2213 unsigned char *dst_end = destination + dst_bytes; | 2560 return result; |
| 2214 /* Since the maximum bytes produced by each loop is 2, we | |
| 2215 subtract 1 from DST_END to assure overflow checking is | |
| 2216 necessary only at the head of loop. */ | |
| 2217 unsigned char *adjusted_dst_end = dst_end - 1; | |
| 2218 | |
| 2219 while (src < src_end && dst < adjusted_dst_end) | |
| 2220 { | |
| 2221 c = *src++; | |
| 2222 if (c == '\n' || (c == '\r' && coding->selective)) | |
| 2223 *dst++ = '\r', *dst++ = '\n'; | |
| 2224 else | |
| 2225 *dst++ = c; | |
| 2226 } | |
| 2227 produced = dst - destination; | |
| 2228 *consumed = src - source; | |
| 2229 break; | |
| 2230 } | |
| 2231 | |
| 2232 default: /* i.e. case CODING_EOL_CR: */ | |
| 2233 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | |
| 2234 bcopy (source, destination, produced); | |
| 2235 { | |
| 2236 int i = produced; | |
| 2237 while (i--) | |
| 2238 if (*dst++ == '\n') dst[-1] = '\r'; | |
| 2239 } | |
| 2240 *consumed = produced; | |
| 2241 } | |
| 2242 | |
| 2243 return produced; | |
| 2244 } | 2561 } |
| 2245 | 2562 |
| 2246 | 2563 |
| 2247 /*** 6. C library functions ***/ | 2564 /*** 6. C library functions ***/ |
| 2248 | 2565 |
| 2315 int | 2632 int |
| 2316 setup_coding_system (coding_system, coding) | 2633 setup_coding_system (coding_system, coding) |
| 2317 Lisp_Object coding_system; | 2634 Lisp_Object coding_system; |
| 2318 struct coding_system *coding; | 2635 struct coding_system *coding; |
| 2319 { | 2636 { |
| 2320 Lisp_Object coding_spec, plist, type, eol_type; | 2637 Lisp_Object coding_spec, coding_type, eol_type, plist; |
| 2321 Lisp_Object val; | 2638 Lisp_Object val; |
| 2322 int i; | 2639 int i; |
| 2323 | 2640 |
| 2324 /* At first, set several fields to default values. */ | 2641 /* Initialize some fields required for all kinds of coding systems. */ |
| 2325 coding->last_block = 0; | 2642 coding->symbol = coding_system; |
| 2326 coding->selective = 0; | 2643 coding->common_flags = 0; |
| 2327 coding->composing = 0; | 2644 coding->mode = 0; |
| 2328 coding->direction = 0; | 2645 coding->heading_ascii = -1; |
| 2329 coding->carryover_size = 0; | |
| 2330 coding->post_read_conversion = coding->pre_write_conversion = Qnil; | 2646 coding->post_read_conversion = coding->pre_write_conversion = Qnil; |
| 2331 coding->character_unification_table_for_decode = Qnil; | |
| 2332 coding->character_unification_table_for_encode = Qnil; | |
| 2333 | |
| 2334 coding->symbol = coding_system; | |
| 2335 eol_type = Qnil; | |
| 2336 | |
| 2337 /* Get values of property `coding-system' and `eol-type'. | |
| 2338 Also get values of coding system properties: | |
| 2339 `post-read-conversion', `pre-write-conversion', | |
| 2340 `character-unification-table-for-decode', | |
| 2341 `character-unification-table-for-encode'. */ | |
| 2342 coding_spec = Fget (coding_system, Qcoding_system); | 2647 coding_spec = Fget (coding_system, Qcoding_system); |
| 2343 if (!VECTORP (coding_spec) | 2648 if (!VECTORP (coding_spec) |
| 2344 || XVECTOR (coding_spec)->size != 5 | 2649 || XVECTOR (coding_spec)->size != 5 |
| 2345 || !CONSP (XVECTOR (coding_spec)->contents[3])) | 2650 || !CONSP (XVECTOR (coding_spec)->contents[3])) |
| 2346 goto label_invalid_coding_system; | 2651 goto label_invalid_coding_system; |
| 2347 if (!inhibit_eol_conversion) | 2652 |
| 2348 eol_type = Fget (coding_system, Qeol_type); | 2653 eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type); |
| 2349 | 2654 if (VECTORP (eol_type)) |
| 2655 { | |
| 2656 coding->eol_type = CODING_EOL_UNDECIDED; | |
| 2657 coding->common_flags = CODING_REQUIRE_DETECTION_MASK; | |
| 2658 } | |
| 2659 else if (XFASTINT (eol_type) == 1) | |
| 2660 { | |
| 2661 coding->eol_type = CODING_EOL_CRLF; | |
| 2662 coding->common_flags | |
| 2663 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | |
| 2664 } | |
| 2665 else if (XFASTINT (eol_type) == 2) | |
| 2666 { | |
| 2667 coding->eol_type = CODING_EOL_CR; | |
| 2668 coding->common_flags | |
| 2669 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | |
| 2670 } | |
| 2671 else | |
| 2672 coding->eol_type = CODING_EOL_LF; | |
| 2673 | |
| 2674 coding_type = XVECTOR (coding_spec)->contents[0]; | |
| 2675 /* Try short cut. */ | |
| 2676 if (SYMBOLP (coding_type)) | |
| 2677 { | |
| 2678 if (EQ (coding_type, Qt)) | |
| 2679 { | |
| 2680 coding->type = coding_type_undecided; | |
| 2681 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK; | |
| 2682 } | |
| 2683 else | |
| 2684 coding->type = coding_type_no_conversion; | |
| 2685 return 0; | |
| 2686 } | |
| 2687 | |
| 2688 /* Initialize remaining fields. */ | |
| 2689 coding->composing = 0; | |
| 2690 coding->character_unification_table_for_decode = Qnil; | |
| 2691 coding->character_unification_table_for_encode = Qnil; | |
| 2692 | |
| 2693 /* Get values of coding system properties: | |
| 2694 `post-read-conversion', `pre-write-conversion', | |
| 2695 `character-unification-table-for-decode', | |
| 2696 `character-unification-table-for-encode'. */ | |
| 2350 plist = XVECTOR (coding_spec)->contents[3]; | 2697 plist = XVECTOR (coding_spec)->contents[3]; |
| 2351 coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion); | 2698 coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion); |
| 2352 coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion); | 2699 coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion); |
| 2353 val = Fplist_get (plist, Qcharacter_unification_table_for_decode); | 2700 val = Fplist_get (plist, Qcharacter_unification_table_for_decode); |
| 2354 if (SYMBOLP (val)) | 2701 if (SYMBOLP (val)) |
| 2358 val = Fplist_get (plist, Qcharacter_unification_table_for_encode); | 2705 val = Fplist_get (plist, Qcharacter_unification_table_for_encode); |
| 2359 if (SYMBOLP (val)) | 2706 if (SYMBOLP (val)) |
| 2360 val = Fget (val, Qcharacter_unification_table_for_encode); | 2707 val = Fget (val, Qcharacter_unification_table_for_encode); |
| 2361 coding->character_unification_table_for_encode | 2708 coding->character_unification_table_for_encode |
| 2362 = CHAR_TABLE_P (val) ? val : Qnil; | 2709 = CHAR_TABLE_P (val) ? val : Qnil; |
| 2710 val = Fplist_get (plist, Qcoding_category); | |
| 2711 if (!NILP (val)) | |
| 2712 { | |
| 2713 val = Fget (val, Qcoding_category_index); | |
| 2714 if (INTEGERP (val)) | |
| 2715 coding->category_idx = XINT (val); | |
| 2716 else | |
| 2717 goto label_invalid_coding_system; | |
| 2718 } | |
| 2719 else | |
| 2720 goto label_invalid_coding_system; | |
| 2363 | 2721 |
| 2364 val = Fplist_get (plist, Qsafe_charsets); | 2722 val = Fplist_get (plist, Qsafe_charsets); |
| 2365 if (EQ (val, Qt)) | 2723 if (EQ (val, Qt)) |
| 2366 { | 2724 { |
| 2367 for (i = 0; i <= MAX_CHARSET; i++) | 2725 for (i = 0; i <= MAX_CHARSET; i++) |
| 2376 coding->safe_charsets[i] = 1; | 2734 coding->safe_charsets[i] = 1; |
| 2377 val = XCONS (val)->cdr; | 2735 val = XCONS (val)->cdr; |
| 2378 } | 2736 } |
| 2379 } | 2737 } |
| 2380 | 2738 |
| 2381 if (VECTORP (eol_type)) | 2739 switch (XFASTINT (coding_type)) |
| 2382 { | |
| 2383 coding->eol_type = CODING_EOL_UNDECIDED; | |
| 2384 coding->common_flags = CODING_REQUIRE_DETECTION_MASK; | |
| 2385 } | |
| 2386 else if (XFASTINT (eol_type) == 1) | |
| 2387 { | |
| 2388 coding->eol_type = CODING_EOL_CRLF; | |
| 2389 coding->common_flags | |
| 2390 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | |
| 2391 } | |
| 2392 else if (XFASTINT (eol_type) == 2) | |
| 2393 { | |
| 2394 coding->eol_type = CODING_EOL_CR; | |
| 2395 coding->common_flags | |
| 2396 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | |
| 2397 } | |
| 2398 else | |
| 2399 { | |
| 2400 coding->eol_type = CODING_EOL_LF; | |
| 2401 coding->common_flags = 0; | |
| 2402 } | |
| 2403 | |
| 2404 type = XVECTOR (coding_spec)->contents[0]; | |
| 2405 switch (XFASTINT (type)) | |
| 2406 { | 2740 { |
| 2407 case 0: | 2741 case 0: |
| 2408 coding->type = coding_type_emacs_mule; | 2742 coding->type = coding_type_emacs_mule; |
| 2409 if (!NILP (coding->post_read_conversion)) | 2743 if (!NILP (coding->post_read_conversion)) |
| 2410 coding->common_flags |= CODING_REQUIRE_DECODING_MASK; | 2744 coding->common_flags |= CODING_REQUIRE_DECODING_MASK; |
| 2423 coding->common_flags | 2757 coding->common_flags |
| 2424 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; | 2758 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; |
| 2425 { | 2759 { |
| 2426 Lisp_Object val, temp; | 2760 Lisp_Object val, temp; |
| 2427 Lisp_Object *flags; | 2761 Lisp_Object *flags; |
| 2428 int i, charset, default_reg_bits = 0; | 2762 int i, charset, reg_bits = 0; |
| 2429 | 2763 |
| 2430 val = XVECTOR (coding_spec)->contents[4]; | 2764 val = XVECTOR (coding_spec)->contents[4]; |
| 2431 | 2765 |
| 2432 if (!VECTORP (val) || XVECTOR (val)->size != 32) | 2766 if (!VECTORP (val) || XVECTOR (val)->size != 32) |
| 2433 goto label_invalid_coding_system; | 2767 goto label_invalid_coding_system; |
| 2478 t: designate nothing to REG initially, but can be used | 2812 t: designate nothing to REG initially, but can be used |
| 2479 by any charsets, | 2813 by any charsets, |
| 2480 list of integer, nil, or t: designate the first | 2814 list of integer, nil, or t: designate the first |
| 2481 element (if integer) to REG initially, the remaining | 2815 element (if integer) to REG initially, the remaining |
| 2482 elements (if integer) is designated to REG on request, | 2816 elements (if integer) is designated to REG on request, |
| 2483 if an element is t, REG can be used by any charset, | 2817 if an element is t, REG can be used by any charsets, |
| 2484 nil: REG is never used. */ | 2818 nil: REG is never used. */ |
| 2485 for (charset = 0; charset <= MAX_CHARSET; charset++) | 2819 for (charset = 0; charset <= MAX_CHARSET; charset++) |
| 2486 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | 2820 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) |
| 2487 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION; | 2821 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION; |
| 2488 for (i = 0; i < 4; i++) | 2822 for (i = 0; i < 4; i++) |
| 2495 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i; | 2829 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i; |
| 2496 } | 2830 } |
| 2497 else if (EQ (flags[i], Qt)) | 2831 else if (EQ (flags[i], Qt)) |
| 2498 { | 2832 { |
| 2499 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1; | 2833 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1; |
| 2500 default_reg_bits |= 1 << i; | 2834 reg_bits |= 1 << i; |
| 2835 coding->flags |= CODING_FLAG_ISO_DESIGNATION; | |
| 2501 } | 2836 } |
| 2502 else if (CONSP (flags[i])) | 2837 else if (CONSP (flags[i])) |
| 2503 { | 2838 { |
| 2504 Lisp_Object tail = flags[i]; | 2839 Lisp_Object tail = flags[i]; |
| 2505 | 2840 |
| 2841 coding->flags |= CODING_FLAG_ISO_DESIGNATION; | |
| 2506 if (INTEGERP (XCONS (tail)->car) | 2842 if (INTEGERP (XCONS (tail)->car) |
| 2507 && (charset = XINT (XCONS (tail)->car), | 2843 && (charset = XINT (XCONS (tail)->car), |
| 2508 CHARSET_VALID_P (charset)) | 2844 CHARSET_VALID_P (charset)) |
| 2509 || (charset = get_charset_id (XCONS (tail)->car)) >= 0) | 2845 || (charset = get_charset_id (XCONS (tail)->car)) >= 0) |
| 2510 { | 2846 { |
| 2521 CHARSET_VALID_P (charset)) | 2857 CHARSET_VALID_P (charset)) |
| 2522 || (charset = get_charset_id (XCONS (tail)->car)) >= 0) | 2858 || (charset = get_charset_id (XCONS (tail)->car)) >= 0) |
| 2523 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | 2859 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) |
| 2524 = i; | 2860 = i; |
| 2525 else if (EQ (XCONS (tail)->car, Qt)) | 2861 else if (EQ (XCONS (tail)->car, Qt)) |
| 2526 default_reg_bits |= 1 << i; | 2862 reg_bits |= 1 << i; |
| 2527 tail = XCONS (tail)->cdr; | 2863 tail = XCONS (tail)->cdr; |
| 2528 } | 2864 } |
| 2529 } | 2865 } |
| 2530 else | 2866 else |
| 2531 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1; | 2867 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1; |
| 2532 | 2868 |
| 2533 CODING_SPEC_ISO_DESIGNATION (coding, i) | 2869 CODING_SPEC_ISO_DESIGNATION (coding, i) |
| 2534 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i); | 2870 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i); |
| 2535 } | 2871 } |
| 2536 | 2872 |
| 2537 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)) | 2873 if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)) |
| 2538 { | 2874 { |
| 2539 /* REG 1 can be used only by locking shift in 7-bit env. */ | 2875 /* REG 1 can be used only by locking shift in 7-bit env. */ |
| 2540 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) | 2876 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) |
| 2541 default_reg_bits &= ~2; | 2877 reg_bits &= ~2; |
| 2542 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) | 2878 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) |
| 2543 /* Without any shifting, only REG 0 and 1 can be used. */ | 2879 /* Without any shifting, only REG 0 and 1 can be used. */ |
| 2544 default_reg_bits &= 3; | 2880 reg_bits &= 3; |
| 2545 } | 2881 } |
| 2546 | 2882 |
| 2547 for (charset = 0; charset <= MAX_CHARSET; charset++) | 2883 if (reg_bits) |
| 2548 if (CHARSET_VALID_P (charset) | 2884 for (charset = 0; charset <= MAX_CHARSET; charset++) |
| 2549 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | |
| 2550 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)) | |
| 2551 { | 2885 { |
| 2552 /* We have not yet decided where to designate CHARSET. */ | 2886 if (CHARSET_VALID_P (charset)) |
| 2553 int reg_bits = default_reg_bits; | 2887 { |
| 2554 | 2888 /* There exist some default graphic registers to be |
| 2555 if (CHARSET_CHARS (charset) == 96) | 2889 used CHARSET. */ |
| 2556 /* A charset of CHARS96 can't be designated to REG 0. */ | 2890 |
| 2557 reg_bits &= ~1; | 2891 /* We had better avoid designating a charset of |
| 2558 | 2892 CHARS96 to REG 0 as far as possible. */ |
| 2559 if (reg_bits) | 2893 if (CHARSET_CHARS (charset) == 96) |
| 2560 /* There exist some default graphic register. */ | 2894 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) |
| 2561 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | 2895 = (reg_bits & 2 |
| 2562 = (reg_bits & 1 | 2896 ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0))); |
| 2563 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3))); | 2897 else |
| 2564 else | 2898 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) |
| 2565 /* We anyway have to designate CHARSET to somewhere. */ | 2899 = (reg_bits & 1 |
| 2566 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) | 2900 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3))); |
| 2567 = (CHARSET_CHARS (charset) == 94 | 2901 } |
| 2568 ? 0 | |
| 2569 : ((coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT | |
| 2570 || ! coding->flags & CODING_FLAG_ISO_SEVEN_BITS) | |
| 2571 ? 1 | |
| 2572 : (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT | |
| 2573 ? 2 : 0))); | |
| 2574 } | 2902 } |
| 2575 } | 2903 } |
| 2576 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK; | 2904 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK; |
| 2905 coding->spec.iso2022.last_invalid_designation_register = -1; | |
| 2577 break; | 2906 break; |
| 2578 | 2907 |
| 2579 case 3: | 2908 case 3: |
| 2580 coding->type = coding_type_big5; | 2909 coding->type = coding_type_big5; |
| 2581 coding->common_flags | 2910 coding->common_flags |
| 2608 case 5: | 2937 case 5: |
| 2609 coding->type = coding_type_raw_text; | 2938 coding->type = coding_type_raw_text; |
| 2610 break; | 2939 break; |
| 2611 | 2940 |
| 2612 default: | 2941 default: |
| 2613 if (EQ (type, Qt)) | 2942 goto label_invalid_coding_system; |
| 2614 { | |
| 2615 coding->type = coding_type_undecided; | |
| 2616 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK; | |
| 2617 } | |
| 2618 else | |
| 2619 coding->type = coding_type_no_conversion; | |
| 2620 break; | |
| 2621 } | 2943 } |
| 2622 return 0; | 2944 return 0; |
| 2623 | 2945 |
| 2624 label_invalid_coding_system: | 2946 label_invalid_coding_system: |
| 2625 coding->type = coding_type_no_conversion; | 2947 coding->type = coding_type_no_conversion; |
| 2948 coding->category_idx = CODING_CATEGORY_IDX_BINARY; | |
| 2626 coding->common_flags = 0; | 2949 coding->common_flags = 0; |
| 2627 coding->eol_type = CODING_EOL_LF; | 2950 coding->eol_type = CODING_EOL_LF; |
| 2628 coding->symbol = coding->pre_write_conversion = coding->post_read_conversion | 2951 coding->pre_write_conversion = coding->post_read_conversion = Qnil; |
| 2629 = Qnil; | |
| 2630 return -1; | 2952 return -1; |
| 2631 } | 2953 } |
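The new register logic in `setup_coding_system` works in two steps. First it narrows `reg_bits`, where bit I means graphic register G I may be used by default. Then it chooses a register for a charset, trying G1 before G0 for 96-character sets and G0 first for 94-character sets. Below is a minimal standalone sketch of those two steps. The `SKETCH_*` flag constants are hypothetical stand-ins for `CODING_FLAG_ISO_LOCKING_SHIFT`, `CODING_FLAG_ISO_SEVEN_BITS` and `CODING_FLAG_ISO_SINGLE_SHIFT`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the CODING_FLAG_ISO_* bits in coding.h. */
enum {
  SKETCH_LOCKING_SHIFT = 1,
  SKETCH_SEVEN_BITS = 2,
  SKETCH_SINGLE_SHIFT = 4
};

/* Narrow REG_BITS (bit I = register G I usable by default) the way
   setup_coding_system does when locking shift is not available.  */
static int
sketch_usable_reg_bits (int reg_bits, int flags)
{
  if (reg_bits && !(flags & SKETCH_LOCKING_SHIFT))
    {
      if (flags & SKETCH_SEVEN_BITS)
        reg_bits &= ~2;         /* G1 needs locking shift in 7-bit env. */
      if (!(flags & SKETCH_SINGLE_SHIFT))
        reg_bits &= 3;          /* Without any shifting, only G0/G1.  */
    }
  return reg_bits;
}

/* Pick a register for a charset of CHARS (94 or 96) characters.  */
static int
sketch_requested_register (int reg_bits, int chars)
{
  assert (reg_bits != 0);
  if (chars == 96)
    /* Avoid G0 for a 96-character set unless nothing else is left.  */
    return reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0));
  return reg_bits & 1 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3));
}
```

In a 7-bit environment with neither locking nor single shifts, only G0 remains. In that case a 96-character charset falls back to G0 as the last resort.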
| 2632 | 2954 |
| 2633 /* Emacs has a mechanism to automatically detect a coding system if it | 2955 /* Emacs has a mechanism to automatically detect a coding system if it |
| 2634 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But, | 2956 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But, |
| 2650 | 2972 |
| 2651 o coding-category-iso-7 | 2973 o coding-category-iso-7 |
| 2652 | 2974 |
| 2653 The category for a coding system which has the same code range | 2975 The category for a coding system which has the same code range |
| 2654 as ISO2022 of 7-bit environment. This doesn't use any locking | 2976 as ISO2022 of 7-bit environment. This doesn't use any locking |
| 2655 shift and single shift functions. Assigned the coding-system | 2977 shift and single shift functions. This can encode/decode all |
| 2656 (Lisp symbol) `iso-2022-7bit' by default. | 2978 charsets. Assigned the coding-system (Lisp symbol) |
| 2979 `iso-2022-7bit' by default. | |
| 2980 | |
| 2981 o coding-category-iso-7-tight | |
| 2982 | |
| 2983 Same as coding-category-iso-7 except that this can | |
| 2984 encode/decode only the specified charsets. | |
| 2657 | 2985 |
| 2658 o coding-category-iso-8-1 | 2986 o coding-category-iso-8-1 |
| 2659 | 2987 |
| 2660 The category for a coding system which has the same code range | 2988 The category for a coding system which has the same code range |
| 2661 as ISO2022 of 8-bit environment and graphic plane 1 used only | 2989 as ISO2022 of 8-bit environment and graphic plane 1 used only |
| 2705 highest priority. Priorities of categories are also specified by a | 3033 highest priority. Priorities of categories are also specified by a |
| 2706 user in a Lisp variable `coding-category-list'. | 3034 user in a Lisp variable `coding-category-list'. |
| 2707 | 3035 |
| 2708 */ | 3036 */ |
| 2709 | 3037 |
| 2710 /* Detect how a text of length SRC_BYTES pointed by SRC is encoded. | 3038 /* Detect how a text of length SRC_BYTES pointed by SOURCE is encoded. |
| 2711 If it detects possible coding systems, return an integer in which | 3039 If it detects possible coding systems, return an integer in which |
| 2712 appropriate flag bits are set. Flag bits are defined by macros | 3040 appropriate flag bits are set. Flag bits are defined by macros |
| 2713 CODING_CATEGORY_MASK_XXX in `coding.h'. */ | 3041 CODING_CATEGORY_MASK_XXX in `coding.h'. |
| 2714 | 3042 |
| 2715 int | 3043 How many ASCII characters are at the head is returned as *SKIP. */ |
| 2716 detect_coding_mask (src, src_bytes) | 3044 |
| 2717 unsigned char *src; | 3045 static int |
| 2718 int src_bytes; | 3046 detect_coding_mask (source, src_bytes, priorities, skip) |
| 3047 unsigned char *source; | |
| 3048 int src_bytes, *priorities, *skip; | |
| 2719 { | 3049 { |
| 2720 register unsigned char c; | 3050 register unsigned char c; |
| 2721 unsigned char *src_end = src + src_bytes; | 3051 unsigned char *src = source, *src_end = source + src_bytes; |
| 2722 int mask; | 3052 unsigned int mask = (CODING_CATEGORY_MASK_ISO_7BIT |
| 3053 | CODING_CATEGORY_MASK_ISO_SHIFT); | |
| 3054 int i; | |
| 2723 | 3055 |
| 2724 /* At first, skip all ASCII characters and control characters except | 3056 /* At first, skip all ASCII characters and control characters except |
| 2725 for three ISO2022 specific control characters. */ | 3057 for three ISO2022 specific control characters. */ |
| 2726 label_loop_detect_coding: | 3058 label_loop_detect_coding: |
| 2727 while (src < src_end) | 3059 while (src < src_end) |
| 2728 { | 3060 { |
| 2729 c = *src; | 3061 c = *src; |
| 2730 if (c >= 0x80 | 3062 if (c >= 0x80 |
| 2731 || (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)) | 3063 || ((mask & CODING_CATEGORY_MASK_ISO_7BIT) |
| 3064 && c == ISO_CODE_ESC) | |
| 3065 || ((mask & CODING_CATEGORY_MASK_ISO_SHIFT) | |
| 3066 && (c == ISO_CODE_SI || c == ISO_CODE_SO))) | |
| 2732 break; | 3067 break; |
| 2733 src++; | 3068 src++; |
| 2734 } | 3069 } |
| 3070 *skip = src - source; | |
| 2735 | 3071 |
| 2736 if (src >= src_end) | 3072 if (src >= src_end) |
| 2737 /* We found nothing other than ASCII. There's nothing to do. */ | 3073 /* We found nothing other than ASCII. There's nothing to do. */ |
| 2738 return CODING_CATEGORY_MASK_ANY; | 3074 return 0; |
| 2739 | 3075 |
| 2740 /* The text seems to be encoded in some multilingual coding system. | 3076 /* The text seems to be encoded in some multilingual coding system. |
| 2741 Now, try to find in which coding system the text is encoded. */ | 3077 Now, try to find in which coding system the text is encoded. */ |
| 2742 if (c < 0x80) | 3078 if (c < 0x80) |
| 2743 { | 3079 { |
| 2744 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */ | 3080 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */ |
| 2745 /* C is an ISO2022 specific control code of C0. */ | 3081 /* C is an ISO2022 specific control code of C0. */ |
| 2746 mask = detect_coding_iso2022 (src, src_end); | 3082 mask = detect_coding_iso2022 (src, src_end); |
| 2747 src++; | |
| 2748 if (mask == 0) | 3083 if (mask == 0) |
| 2749 /* No valid ISO2022 code follows C. Try again. */ | 3084 { |
| 2750 goto label_loop_detect_coding; | 3085 /* No valid ISO2022 code follows C. Try again. */ |
| 2751 mask |= CODING_CATEGORY_MASK_RAW_TEXT; | 3086 src++; |
| 2752 } | 3087 mask = (c != ISO_CODE_ESC |
| 2753 else if (c < 0xA0) | 3088 ? CODING_CATEGORY_MASK_ISO_7BIT |
| 2754 { | 3089 : CODING_CATEGORY_MASK_ISO_SHIFT); |
| 2755 /* If C is a special latin extra code, | 3090 goto label_loop_detect_coding; |
| 2756 or is an ISO2022 specific control code of C1 (SS2 or SS3), | 3091 } |
| 2757 or is an ISO2022 control-sequence-introducer (CSI), | 3092 if (priorities) |
| 2758 we should also consider the possibility of ISO2022 codings. */ | 3093 goto label_return_highest_only; |
| 2759 if ((VECTORP (Vlatin_extra_code_table) | 3094 } |
| 2760 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) | 3095 else |
| 2761 || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3) | 3096 { |
| 2762 || (c == ISO_CODE_CSI | 3097 int try; |
| 2763 && (src < src_end | 3098 |
| 2764 && (*src == ']' | 3099 if (c < 0xA0) |
| 2765 || (src + 1 < src_end | 3100 { |
| 2766 && src[1] == ']' | 3101 /* C is the first byte of SJIS character code, |
| 2767 && (*src == '0' || *src == '1' || *src == '2')))))) | 3102 or a leading-code of Emacs' internal format (emacs-mule). */ |
| 2768 mask = (detect_coding_iso2022 (src, src_end) | 3103 try = CODING_CATEGORY_MASK_SJIS | CODING_CATEGORY_MASK_EMACS_MULE; |
| 2769 | detect_coding_sjis (src, src_end) | 3104 |
| 2770 | detect_coding_emacs_mule (src, src_end) | 3105 /* Or, if C is a special latin extra code, |
| 2771 | CODING_CATEGORY_MASK_RAW_TEXT); | 3106 or is an ISO2022 specific control code of C1 (SS2 or SS3), |
| 2772 | 3107 or is an ISO2022 control-sequence-introducer (CSI), |
| 3108 we should also consider the possibility of ISO2022 codings. */ | |
| 3109 if ((VECTORP (Vlatin_extra_code_table) | |
| 3110 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) | |
| 3111 || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3) | |
| 3112 || (c == ISO_CODE_CSI | |
| 3113 && (src < src_end | |
| 3114 && (*src == ']' | |
| 3115 || ((*src == '0' || *src == '1' || *src == '2') | |
| 3116 && src + 1 < src_end | |
| 3117 && src[1] == ']'))))) | |
| 3118 try |= (CODING_CATEGORY_MASK_ISO_8_ELSE | |
| 3119 | CODING_CATEGORY_MASK_ISO_8BIT); | |
| 3120 } | |
| 2773 else | 3121 else |
| 2774 /* C is the first byte of SJIS character code, | 3122 /* C is a character of ISO2022 in graphic plane right, |
| 2775 or a leading-code of Emacs' internal format (emacs-mule). */ | 3123 or a SJIS's 1-byte character code (i.e. JISX0201), |
| 2776 mask = (detect_coding_sjis (src, src_end) | 3124 or the first byte of BIG5's 2-byte code. */ |
| 2777 | detect_coding_emacs_mule (src, src_end) | 3125 try = (CODING_CATEGORY_MASK_ISO_8_ELSE |
| 2778 | CODING_CATEGORY_MASK_RAW_TEXT); | 3126 | CODING_CATEGORY_MASK_ISO_8BIT |
| 2779 } | 3127 | CODING_CATEGORY_MASK_SJIS |
| 2780 else | 3128 | CODING_CATEGORY_MASK_BIG5); |
| 2781 /* C is a character of ISO2022 in graphic plane right, | 3129 |
| 2782 or a SJIS's 1-byte character code (i.e. JISX0201), | 3130 mask = 0; |
| 2783 or the first byte of BIG5's 2-byte code. */ | 3131 if (priorities) |
| 2784 mask = (detect_coding_iso2022 (src, src_end) | 3132 { |
| 2785 | detect_coding_sjis (src, src_end) | 3133 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) |
| 2786 | detect_coding_big5 (src, src_end) | 3134 { |
| 2787 | CODING_CATEGORY_MASK_RAW_TEXT); | 3135 priorities[i] &= try; |
| 2788 | 3136 if (priorities[i] & CODING_CATEGORY_MASK_ISO) |
| 2789 return mask; | 3137 mask = detect_coding_iso2022 (src, src_end); |
| 3138 else if (priorities[i] & CODING_CATEGORY_MASK_SJIS) | |
| 3139 mask = detect_coding_sjis (src, src_end); | |
| 3140 else if (priorities[i] & CODING_CATEGORY_MASK_BIG5) | |
| 3141 mask = detect_coding_big5 (src, src_end); | |
| 3142 else if (priorities[i] & CODING_CATEGORY_MASK_EMACS_MULE) | |
| 3143 mask = detect_coding_emacs_mule (src, src_end); | |
| 3144 if (mask) | |
| 3145 goto label_return_highest_only; | |
| 3146 } | |
| 3147 return CODING_CATEGORY_MASK_RAW_TEXT; | |
| 3148 } | |
| 3149 if (try & CODING_CATEGORY_MASK_ISO) | |
| 3150 mask |= detect_coding_iso2022 (src, src_end); | |
| 3151 if (try & CODING_CATEGORY_MASK_SJIS) | |
| 3152 mask |= detect_coding_sjis (src, src_end); | |
| 3153 if (try & CODING_CATEGORY_MASK_BIG5) | |
| 3154 mask |= detect_coding_big5 (src, src_end); | |
| 3155 if (try & CODING_CATEGORY_MASK_EMACS_MULE) | |
| 3156 mask |= detect_coding_emacs_mule (src, src_end); | |
| 3157 } | |
| 3158 return (mask | CODING_CATEGORY_MASK_RAW_TEXT); | |
| 3159 | |
| 3160 label_return_highest_only: | |
| 3161 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) | |
| 3162 { | |
| 3163 if (mask & priorities[i]) | |
| 3164 return priorities[i]; | |
| 3165 } | |
| 3166 return CODING_CATEGORY_MASK_RAW_TEXT; | |
| 2790 } | 3167 } |
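`detect_coding_mask` now receives a `priorities` array built by `detect_coding` from `coding-category-list`, with one single-bit category mask per entry. At `label_return_highest_only` it returns the first entry whose bit is also set in the detected mask. `detect_coding` then converts that single bit back into an index with a shift loop. The following is a standalone sketch of those two steps. The function names are hypothetical, and `fallback` stands in for `CODING_CATEGORY_MASK_RAW_TEXT`. Unlike the original, the index helper returns -1 for an empty mask; the original substitutes `CODING_CATEGORY_IDX_RAW_TEXT` instead.

```c
#include <assert.h>
#include <stddef.h>

/* Return the first entry of PRIORITIES (highest priority first, each a
   single category bit) that is also set in MASK, else FALLBACK.  */
static unsigned int
sketch_highest_priority (unsigned int mask, const unsigned int *priorities,
                         size_t n, unsigned int fallback)
{
  assert (priorities != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (mask & priorities[i])
      return priorities[i];
  return fallback;
}

/* Convert a single-bit MASK to its bit index; -1 if MASK is zero.  */
static int
sketch_mask_to_index (unsigned int mask)
{
  int idx = 0;

  if (!mask)
    return -1;
  while (!(mask & 1))
    mask >>= 1, idx++;
  return idx;
}
```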
| 2791 | 3168 |
| 2792 /* Detect how a text of length SRC_BYTES pointed by SRC is encoded. | 3169 /* Detect how a text of length SRC_BYTES pointed by SRC is encoded. |
| 2793 The information of the detected coding system is set in CODING. */ | 3170 The information of the detected coding system is set in CODING. */ |
| 2794 | 3171 |
| 2796 detect_coding (coding, src, src_bytes) | 3173 detect_coding (coding, src, src_bytes) |
| 2797 struct coding_system *coding; | 3174 struct coding_system *coding; |
| 2798 unsigned char *src; | 3175 unsigned char *src; |
| 2799 int src_bytes; | 3176 int src_bytes; |
| 2800 { | 3177 { |
| 2801 int mask = detect_coding_mask (src, src_bytes); | 3178 unsigned int idx; |
| 2802 int idx; | 3179 int skip, mask, i; |
| 3180 int priorities[CODING_CATEGORY_IDX_MAX]; | |
| 2803 Lisp_Object val = Vcoding_category_list; | 3181 Lisp_Object val = Vcoding_category_list; |
| 2804 | 3182 |
| 2805 if (mask == CODING_CATEGORY_MASK_ANY) | 3183 i = 0; |
| 2806 /* We found nothing other than ASCII. There's nothing to do. */ | 3184 while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX) |
| 2807 return; | 3185 { |
| 2808 | 3186 if (! SYMBOLP (XCONS (val)->car)) |
| 2809 /* We found some plausible coding systems. Let's use a coding | 3187 break; |
| 2810 system of the highest priority. */ | 3188 idx = XFASTINT (Fget (XCONS (val)->car, Qcoding_category_index)); |
| 2811 | 3189 if (idx >= CODING_CATEGORY_IDX_MAX) |
| 2812 if (CONSP (val)) | 3190 break; |
| 2813 while (!NILP (val)) | 3191 priorities[i++] = (1 << idx); |
| 2814 { | 3192 val = XCONS (val)->cdr; |
| 2815 idx = XFASTINT (Fget (XCONS (val)->car, Qcoding_category_index)); | 3193 } |
| 2816 if ((idx < CODING_CATEGORY_IDX_MAX) && (mask & (1 << idx))) | 3194 /* If coding-category-list is valid and contains all coding |
| 2817 break; | 3195 categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not, |
| 2818 val = XCONS (val)->cdr; | 3196 the following code saves Emacs from craching. */ |
| 2819 } | 3197 while (i < CODING_CATEGORY_IDX_MAX) |
| 2820 else | 3198 priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT; |
| 2821 val = Qnil; | 3199 |
| 2822 | 3200 mask = detect_coding_mask (src, src_bytes, priorities, &skip); |
| 2823 if (NILP (val)) | 3201 coding->heading_ascii = skip; |
| 2824 { | 3202 |
| 2825 /* For unknown reason, `Vcoding_category_list' contains none of | 3203 if (!mask) return; |
| 2826 found categories. Let's use any of them. */ | 3204 |
| 2827 for (idx = 0; idx < CODING_CATEGORY_IDX_MAX; idx++) | 3205 /* We found a single coding system of the highest priority in MASK. */ |
| 2828 if (mask & (1 << idx)) | 3206 idx = 0; |
| 2829 break; | 3207 while (mask && ! (mask & 1)) mask >>= 1, idx++; |
| 2830 } | 3208 if (! mask) |
| 2831 setup_coding_system (XSYMBOL (coding_category_table[idx])->value, coding); | 3209 idx = CODING_CATEGORY_IDX_RAW_TEXT; |
| 2832 } | 3210 |
| 2833 | 3211 val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[idx])->value; |
| 2834 /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC | 3212 |
| 2835 is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF, | 3213 if (coding->eol_type != CODING_EOL_UNDECIDED) |
| 2836 CODING_EOL_CR, and CODING_EOL_UNDECIDED. */ | 3214 { |
| 3215 Lisp_Object tmp = Fget (val, Qeol_type); | |
| 3216 | |
| 3217 if (VECTORP (tmp)) | |
| 3218 val = XVECTOR (tmp)->contents[coding->eol_type]; | |
| 3219 } | |
| 3220 setup_coding_system (val, coding); | |
| 3221 /* Set this again because setup_coding_system reset this member. */ | |
| 3222 coding->heading_ascii = skip; | |
| 3223 } | |
| 3224 | |
| 3225 /* Detect how end-of-line of a text of length SRC_BYTES pointed by | |
| 3226 SOURCE is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF, | |
| 3227 CODING_EOL_CR, and CODING_EOL_UNDECIDED. | |
| 3228 | |
| 3229 How many non-eol characters are at the head is returned as *SKIP. */ | |
| 2837 | 3230 |
| 2838 #define MAX_EOL_CHECK_COUNT 3 | 3231 #define MAX_EOL_CHECK_COUNT 3 |
| 2839 | 3232 |
| 2840 int | 3233 static int |
| 2841 detect_eol_type (src, src_bytes) | 3234 detect_eol_type (source, src_bytes, skip) |
| 2842 unsigned char *src; | 3235 unsigned char *source; |
| 2843 int src_bytes; | 3236 int src_bytes, *skip; |
| 2844 { | 3237 { |
| 2845 unsigned char *src_end = src + src_bytes; | 3238 unsigned char *src = source, *src_end = src + src_bytes; |
| 2846 unsigned char c; | 3239 unsigned char c; |
| 2847 int total = 0; /* How many end-of-lines are found so far. */ | 3240 int total = 0; /* How many end-of-lines are found so far. */ |
| 2848 int eol_type = CODING_EOL_UNDECIDED; | 3241 int eol_type = CODING_EOL_UNDECIDED; |
| 2849 int this_eol_type; | 3242 int this_eol_type; |
| 2850 | 3243 |
| 3244 *skip = 0; | |
| 3245 | |
| 2851 while (src < src_end && total < MAX_EOL_CHECK_COUNT) | 3246 while (src < src_end && total < MAX_EOL_CHECK_COUNT) |
| 2852 { | 3247 { |
| 2853 c = *src++; | 3248 c = *src++; |
| 2854 if (c == '\n' || c == '\r') | 3249 if (c == '\n' || c == '\r') |
| 2855 { | 3250 { |
| 3251 if (*skip == 0) | |
| 3252 *skip = src - 1 - source; | |
| 2856 total++; | 3253 total++; |
| 2857 if (c == '\n') | 3254 if (c == '\n') |
| 2858 this_eol_type = CODING_EOL_LF; | 3255 this_eol_type = CODING_EOL_LF; |
| 2859 else if (src >= src_end || *src != '\n') | 3256 else if (src >= src_end || *src != '\n') |
| 2860 this_eol_type = CODING_EOL_CR; | 3257 this_eol_type = CODING_EOL_CR; |
| 2863 | 3260 |
| 2864 if (eol_type == CODING_EOL_UNDECIDED) | 3261 if (eol_type == CODING_EOL_UNDECIDED) |
| 2865 /* This is the first end-of-line. */ | 3262 /* This is the first end-of-line. */ |
| 2866 eol_type = this_eol_type; | 3263 eol_type = this_eol_type; |
| 2867 else if (eol_type != this_eol_type) | 3264 else if (eol_type != this_eol_type) |
| 2868 /* The found type is different from what found before. | 3265 { |
| 2869 Let's notice the caller about this inconsistency. */ | 3266 /* The found type is different from what found before. */ |
| 2870 return CODING_EOL_INCONSISTENT; | 3267 eol_type = CODING_EOL_INCONSISTENT; |
| 2871 } | 3268 break; |
| 2872 } | 3269 } |
| 2873 | 3270 } |
| 3271 } | |
| 3272 | |
| 3273 if (*skip == 0) | |
| 3274 *skip = src_end - source; | |
| 2874 return eol_type; | 3275 return eol_type; |
| 2875 } | 3276 } |
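`detect_eol_type` now also reports through `*SKIP` how many bytes come before the first line end. The sketch below reimplements the scan as a standalone function, using hypothetical `SKETCH_*` names in place of the `CODING_EOL_*` constants. The CR LF branch is an assumption, because those lines are elided in this view. The sketch treats CR followed by LF as one CRLF line end and consumes the LF.

There is one deliberate difference from the original. The sketch records the first line end with an explicit flag, whereas the original tests `*skip == 0`. With that test, a line end at offset 0 is not remembered, and a later line end overwrites it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_EOL_* constants in coding.h. */
enum sketch_eol
{
  SKETCH_EOL_UNDECIDED,
  SKETCH_EOL_LF,
  SKETCH_EOL_CRLF,
  SKETCH_EOL_CR,
  SKETCH_EOL_INCONSISTENT
};

#define SKETCH_MAX_EOL_CHECK_COUNT 3

/* Classify up to SKETCH_MAX_EOL_CHECK_COUNT line ends in SRC[0..LEN).
   *SKIP receives the offset of the first CR or LF, or LEN if none.  */
static enum sketch_eol
sketch_detect_eol (const unsigned char *src, size_t len, size_t *skip)
{
  enum sketch_eol eol = SKETCH_EOL_UNDECIDED, this_eol;
  size_t i = 0;
  int total = 0, found = 0;

  assert (skip != NULL && (src != NULL || len == 0));
  *skip = len;
  while (i < len && total < SKETCH_MAX_EOL_CHECK_COUNT)
    {
      unsigned char c = src[i++];

      if (c != '\n' && c != '\r')
        continue;
      if (!found)
        *skip = i - 1, found = 1;
      total++;
      if (c == '\n')
        this_eol = SKETCH_EOL_LF;
      else if (i >= len || src[i] != '\n')
        this_eol = SKETCH_EOL_CR;
      else
        this_eol = SKETCH_EOL_CRLF, i++;   /* Consume the LF of CR LF.  */

      if (eol == SKETCH_EOL_UNDECIDED)
        eol = this_eol;                    /* First end-of-line.  */
      else if (eol != this_eol)
        return SKETCH_EOL_INCONSISTENT;
    }
  return eol;
}
```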
| 2876 | 3277 |
| 2877 /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC | 3278 /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC |
| 2878 is encoded. If it detects an appropriate format of end-of-line, it | 3279 is encoded. If it detects an appropriate format of end-of-line, it |
| 2883 struct coding_system *coding; | 3284 struct coding_system *coding; |
| 2884 unsigned char *src; | 3285 unsigned char *src; |
| 2885 int src_bytes; | 3286 int src_bytes; |
| 2886 { | 3287 { |
| 2887 Lisp_Object val; | 3288 Lisp_Object val; |
| 2888 int eol_type = detect_eol_type (src, src_bytes); | 3289 int skip; |
| 3290 int eol_type = detect_eol_type (src, src_bytes, &skip); | |
| 3291 | |
| 3292 if (coding->heading_ascii > skip) | |
| 3293 coding->heading_ascii = skip; | |
| 3294 else | |
| 3295 skip = coding->heading_ascii; | |
| 2889 | 3296 |
| 2890 if (eol_type == CODING_EOL_UNDECIDED) | 3297 if (eol_type == CODING_EOL_UNDECIDED) |
| 2891 /* We found no end-of-line in the source text. */ | |
| 2892 return; | 3298 return; |
| 2893 | |
| 2894 if (eol_type == CODING_EOL_INCONSISTENT) | 3299 if (eol_type == CODING_EOL_INCONSISTENT) |
| 2895 { | 3300 { |
| 2896 #if 0 | 3301 #if 0 |
| 2897 /* This code is suppressed until we find a better way to | 3302 /* This code is suppressed until we find a better way to |
| 2898 distinguish raw text file and binary file. */ | 3303 distinguish raw text file and binary file. */ |
| 2909 eol_type = CODING_EOL_LF; | 3314 eol_type = CODING_EOL_LF; |
| 2910 } | 3315 } |
| 2911 | 3316 |
| 2912 val = Fget (coding->symbol, Qeol_type); | 3317 val = Fget (coding->symbol, Qeol_type); |
| 2913 if (VECTORP (val) && XVECTOR (val)->size == 3) | 3318 if (VECTORP (val) && XVECTOR (val)->size == 3) |
| 2914 setup_coding_system (XVECTOR (val)->contents[eol_type], coding); | 3319 { |
| 3320 setup_coding_system (XVECTOR (val)->contents[eol_type], coding); | |
| 3321 coding->heading_ascii = skip; | |
| 3322 } | |
| 3323 } | |
| 3324 | |
| 3325 #define CONVERSION_BUFFER_EXTRA_ROOM 256 | |
| 3326 | |
| 3327 #define DECODING_BUFFER_MAG(coding) \ | |
| 3328 (coding->type == coding_type_iso2022 \ | |
| 3329 ? 3 \ | |
| 3330 : ((coding->type == coding_type_sjis || coding->type == coding_type_big5) \ | |
| 3331 ? 2 \ | |
| 3332 : (coding->type == coding_type_raw_text \ | |
| 3333 ? 1 \ | |
| 3334 : (coding->type == coding_type_ccl \ | |
| 3335 ? coding->spec.ccl.decoder.buf_magnification \ | |
| 3336 : 2)))) | |
| 3337 | |
| 3338 /* Return maximum size (bytes) of a buffer enough for decoding | |
| 3339 SRC_BYTES of text encoded in CODING. */ | |
| 3340 | |
| 3341 int | |
| 3342 decoding_buffer_size (coding, src_bytes) | |
| 3343 struct coding_system *coding; | |
| 3344 int src_bytes; | |
| 3345 { | |
| 3346 return (src_bytes * DECODING_BUFFER_MAG (coding) | |
| 3347 + CONVERSION_BUFFER_EXTRA_ROOM); | |
| 3348 } | |
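`DECODING_BUFFER_MAG` gives a worst-case growth factor for each coding type. `decoding_buffer_size` multiplies that factor by the input length and adds `CONVERSION_BUFFER_EXTRA_ROOM`. For example, 1000 bytes of ISO 2022 input get 1000 * 3 + 256 = 3256 bytes. The sketch below restates the macro as a switch. The enum and the `ccl_mag` parameter are hypothetical stand-ins for `coding->type` and `coding->spec.ccl.decoder.buf_magnification`.

```c
#include <assert.h>

/* Hypothetical enumeration of the coding types DECODING_BUFFER_MAG tests. */
enum sketch_type { SK_ISO2022, SK_SJIS, SK_BIG5, SK_RAW_TEXT, SK_CCL, SK_OTHER };

#define SKETCH_EXTRA_ROOM 256   /* Mirrors CONVERSION_BUFFER_EXTRA_ROOM.  */

static int
sketch_decoding_buffer_size (enum sketch_type type, int ccl_mag, int src_bytes)
{
  int mag;

  assert (src_bytes >= 0);
  switch (type)
    {
    case SK_ISO2022:  mag = 3; break;
    case SK_SJIS:
    case SK_BIG5:     mag = 2; break;
    case SK_RAW_TEXT: mag = 1; break;
    case SK_CCL:      mag = ccl_mag; break;
    default:          mag = 2; break;   /* Every other coding type.  */
    }
  return src_bytes * mag + SKETCH_EXTRA_ROOM;
}
```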
| 3349 | |
| 3350 /* Return maximum size (bytes) of a buffer enough for encoding | |
| 3351 SRC_BYTES of text to CODING. */ | |
| 3352 | |
| 3353 int | |
| 3354 encoding_buffer_size (coding, src_bytes) | |
| 3355 struct coding_system *coding; | |
| 3356 int src_bytes; | |
| 3357 { | |
| 3358 int magnification; | |
| 3359 | |
| 3360 if (coding->type == coding_type_ccl) | |
| 3361 magnification = coding->spec.ccl.encoder.buf_magnification; | |
| 3362 else | |
| 3363 magnification = 3; | |
| 3364 | |
| 3365 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM); | |
| 3366 } | |
| 3367 | |
| 3368 #ifndef MINIMUM_CONVERSION_BUFFER_SIZE | |
| 3369 #define MINIMUM_CONVERSION_BUFFER_SIZE 1024 | |
| 3370 #endif | |
| 3371 | |
| 3372 char *conversion_buffer; | |
| 3373 int conversion_buffer_size; | |
| 3374 | |
| 3375 /* Return a pointer to a SIZE bytes of buffer to be used for encoding | |
| 3376 or decoding. Sufficient memory is allocated automatically. If we | |
| 3377 run out of memory, return NULL. */ | |
| 3378 | |
| 3379 char * | |
| 3380 get_conversion_buffer (size) | |
| 3381 int size; | |
| 3382 { | |
| 3383 if (size > conversion_buffer_size) | |
| 3384 { | |
| 3385 char *buf; | |
| 3386 int real_size = conversion_buffer_size * 2; | |
| 3387 | |
| 3388 while (real_size < size) real_size *= 2; | |
| 3389 buf = (char *) xmalloc (real_size); | |
| 3390 xfree (conversion_buffer); | |
| 3391 conversion_buffer = buf; | |
| 3392 conversion_buffer_size = real_size; | |
| 3393 } | |
| 3394 return conversion_buffer; | |
| 3395 } | |
| 3396 | |
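`get_conversion_buffer` reuses one scratch buffer and grows it by repeated doubling, so a run of growing requests triggers only a logarithmic number of reallocations. Below is a minimal sketch of the same policy in standard C. `get_buffer`, `MIN_BUFFER_SIZE`, and the use of `malloc`/`free` in place of Emacs's `xmalloc`/`xfree` are assumptions for illustration. The sketch also falls back to a minimum size when nothing has been allocated yet, because doubling a size of 0 never terminates; the original presumably relies on `conversion_buffer_size` being initialized elsewhere, for example from `MINIMUM_CONVERSION_BUFFER_SIZE`.

```c
#include <stdlib.h>

#define MIN_BUFFER_SIZE 1024   /* stands in for MINIMUM_CONVERSION_BUFFER_SIZE */

static char *buffer = NULL;     /* shared scratch buffer (hypothetical) */
static size_t buffer_size = 0;  /* its current capacity in bytes */

/* Return a buffer of at least SIZE bytes, growing it by doubling.
   The old contents are discarded on growth.  On allocation failure,
   return NULL and keep the old buffer. */
static char *
get_buffer (size_t size)
{
  if (buffer == NULL || size > buffer_size)
    {
      size_t real_size = buffer_size ? buffer_size * 2 : MIN_BUFFER_SIZE;
      char *buf;

      while (real_size < size)
        real_size *= 2;
      buf = malloc (real_size);
      if (buf == NULL)
        return NULL;
      free (buffer);            /* allocate first, then free, as above */
      buffer = buf;
      buffer_size = real_size;
    }
  return buffer;
}
```

Discarding the old contents is fine for a pure scratch buffer; if the data had to survive growth, `realloc` would be the natural choice instead.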
| 3397 int | |
| 3398 ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep) | |
| 3399 struct coding_system *coding; | |
| 3400 unsigned char *source, *destination; | |
| 3401 int src_bytes, dst_bytes, encodep; | |
| 3402 { | |
| 3403 struct ccl_program *ccl | |
| 3404 = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder; | |
| 3405 int result; | |
| 3406 | |
| 3407 coding->produced = ccl_driver (ccl, source, destination, | |
| 3408 src_bytes, dst_bytes, &(coding->consumed)); | |
| 3409 if (encodep) | |
| 3410 { | |
| 3411 coding->produced_char = coding->produced; | |
| 3412 coding->consumed_char | |
| 3413 = multibyte_chars_in_text (source, coding->consumed); | |
| 3414 } | |
| 3415 else | |
| 3416 { | |
| 3417 coding->produced_char | |
| 3418 = multibyte_chars_in_text (destination, coding->produced); | |
| 3419 coding->consumed_char = coding->consumed; | |
| 3420 } | |
| 3421 switch (ccl->status) | |
| 3422 { | |
| 3423 case CCL_STAT_SUSPEND_BY_SRC: | |
| 3424 result = CODING_FINISH_INSUFFICIENT_SRC; | |
| 3425 break; | |
| 3426 case CCL_STAT_SUSPEND_BY_DST: | |
| 3427 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 3428 break; | |
| 3429 default: | |
| 3430 result = CODING_FINISH_NORMAL; | |
| 3431 break; | |
| 3432 } | |
| 3433 return result; | |
| 2915 } | 3434 } |
| 2916 | 3435 |
| 2917 /* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before | 3436 /* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before |
| 2918 decoding, it may detect coding system and format of end-of-line if | 3437 decoding, it may detect coding system and format of end-of-line if |
| 2919 those are not yet decided. */ | 3438 those are not yet decided. */ |
| 2920 | 3439 |
| 2921 int | 3440 int |
| 2922 decode_coding (coding, source, destination, src_bytes, dst_bytes, consumed) | 3441 decode_coding (coding, source, destination, src_bytes, dst_bytes) |
| 2923 struct coding_system *coding; | 3442 struct coding_system *coding; |
| 2924 unsigned char *source, *destination; | 3443 unsigned char *source, *destination; |
| 2925 int src_bytes, dst_bytes; | 3444 int src_bytes, dst_bytes; |
| 2926 int *consumed; | 3445 { |
| 2927 { | 3446 int result; |
| 2928 int produced; | |
| 2929 | 3447 |
| 2930 if (src_bytes <= 0) | 3448 if (src_bytes <= 0) |
| 2931 { | 3449 { |
| 2932 *consumed = 0; | 3450 coding->produced = coding->produced_char = 0; |
| 2933 return 0; | 3451 coding->consumed = coding->consumed_char = 0; |
| 3452 return CODING_FINISH_NORMAL; | |
| 2934 } | 3453 } |
| 2935 | 3454 |
| 2936 if (coding->type == coding_type_undecided) | 3455 if (coding->type == coding_type_undecided) |
| 2937 detect_coding (coding, source, src_bytes); | 3456 detect_coding (coding, source, src_bytes); |
| 2938 | 3457 |
| 2939 if (coding->eol_type == CODING_EOL_UNDECIDED) | 3458 if (coding->eol_type == CODING_EOL_UNDECIDED) |
| 2940 detect_eol (coding, source, src_bytes); | 3459 detect_eol (coding, source, src_bytes); |
| 2941 | 3460 |
| 2942 coding->carryover_size = 0; | |
| 2943 switch (coding->type) | 3461 switch (coding->type) |
| 2944 { | 3462 { |
| 2945 case coding_type_no_conversion: | |
| 2946 label_no_conversion: | |
| 2947 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | |
| 2948 bcopy (source, destination, produced); | |
| 2949 *consumed = produced; | |
| 2950 break; | |
| 2951 | |
| 2952 case coding_type_emacs_mule: | 3463 case coding_type_emacs_mule: |
| 2953 case coding_type_undecided: | 3464 case coding_type_undecided: |
| 2954 case coding_type_raw_text: | 3465 case coding_type_raw_text: |
| 2955 if (coding->eol_type == CODING_EOL_LF | 3466 if (coding->eol_type == CODING_EOL_LF |
| 2956 || coding->eol_type == CODING_EOL_UNDECIDED) | 3467 || coding->eol_type == CODING_EOL_UNDECIDED) |
| 2957 goto label_no_conversion; | 3468 goto label_no_conversion; |
| 2958 produced = decode_eol (coding, source, destination, | 3469 result = decode_eol (coding, source, destination, src_bytes, dst_bytes); |
| 2959 src_bytes, dst_bytes, consumed); | |
| 2960 break; | 3470 break; |
| 2961 | 3471 |
| 2962 case coding_type_sjis: | 3472 case coding_type_sjis: |
| 2963 produced = decode_coding_sjis_big5 (coding, source, destination, | 3473 result = decode_coding_sjis_big5 (coding, source, destination, |
| 2964 src_bytes, dst_bytes, consumed, | 3474 src_bytes, dst_bytes, 1); |
| 2965 1); | |
| 2966 break; | 3475 break; |
| 2967 | 3476 |
| 2968 case coding_type_iso2022: | 3477 case coding_type_iso2022: |
| 2969 produced = decode_coding_iso2022 (coding, source, destination, | 3478 result = decode_coding_iso2022 (coding, source, destination, |
| 2970 src_bytes, dst_bytes, consumed); | 3479 src_bytes, dst_bytes); |
| 2971 break; | 3480 break; |
| 2972 | 3481 |
| 2973 case coding_type_big5: | 3482 case coding_type_big5: |
| 2974 produced = decode_coding_sjis_big5 (coding, source, destination, | 3483 result = decode_coding_sjis_big5 (coding, source, destination, |
| 2975 src_bytes, dst_bytes, consumed, | 3484 src_bytes, dst_bytes, 0); |
| 2976 0); | |
| 2977 break; | 3485 break; |
| 2978 | 3486 |
| 2979 case coding_type_ccl: | 3487 case coding_type_ccl: |
| 2980 produced = ccl_driver (&coding->spec.ccl.decoder, source, destination, | 3488 result = ccl_coding_driver (coding, source, destination, |
| 2981 src_bytes, dst_bytes, consumed); | 3489 src_bytes, dst_bytes, 0); |
| 2982 break; | 3490 break; |
| 2983 } | 3491 |
| 2984 | 3492 default: /* i.e. case coding_type_no_conversion: */ |
| 2985 return produced; | 3493 label_no_conversion: |
| 3494 if (dst_bytes && src_bytes > dst_bytes) | |
| 3495 { | |
| 3496 coding->produced = dst_bytes; | |
| 3497 result = CODING_FINISH_INSUFFICIENT_DST; | |
| 3498 } | |
| 3499 else | |
| 3500 { | |
| 3501 coding->produced = src_bytes; | |
| 3502 result = CODING_FINISH_NORMAL; | |
| 3503 } | |
| 3504 if (dst_bytes) | |
| 3505 bcopy (source, destination, coding->produced); | |
| 3506 else | |
| 3507 safe_bcopy (source, destination, coding->produced); | |
| 3508 coding->consumed | |
| 3509 = coding->consumed_char = coding->produced_char = coding->produced; | |
| 3510 break; | |
| 3511 } | |
| 3512 | |
| 3513 return result; | |
| 2986 } | 3514 } |
| 2987 | 3515 |
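In the new `decode_coding`, the no-conversion branch copies at most `dst_bytes` bytes and reports `CODING_FINISH_INSUFFICIENT_DST` when the destination is too small. A `dst_bytes` of 0 means the caller guarantees enough room, and `safe_bcopy` is used then, presumably because such a caller (like `code_convert_region`, which converts from the end of the gap into its start) may pass overlapping regions. The sketch below restates that branch in standard C with `memcpy`/`memmove`. `copy_without_conversion`, `struct progress`, and the `enum finish` values are hypothetical stand-ins for the Emacs struct fields and `CODING_FINISH_*` codes.

```c
#include <string.h>

/* Hypothetical stand-ins for CODING_FINISH_* and the coding struct. */
enum finish { FINISH_NORMAL, FINISH_INSUFFICIENT_DST };
struct progress { int produced, consumed; };

/* Copy SOURCE to DESTINATION unchanged.  DST_BYTES == 0 means "room
   is guaranteed, but the regions may overlap", so memmove is used. */
static enum finish
copy_without_conversion (struct progress *p, const unsigned char *source,
                         unsigned char *destination,
                         int src_bytes, int dst_bytes)
{
  enum finish result;

  if (dst_bytes && src_bytes > dst_bytes)
    {
      p->produced = dst_bytes;          /* fill what fits */
      result = FINISH_INSUFFICIENT_DST;
    }
  else
    {
      p->produced = src_bytes;
      result = FINISH_NORMAL;
    }
  if (dst_bytes)
    memcpy (destination, source, p->produced);
  else
    memmove (destination, source, p->produced);
  p->consumed = p->produced;            /* one byte in, one byte out */
  return result;
}
```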
| 2988 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". */ | 3516 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". */ |
| 2989 | 3517 |
| 2990 int | 3518 int |
| 2991 encode_coding (coding, source, destination, src_bytes, dst_bytes, consumed) | 3519 encode_coding (coding, source, destination, src_bytes, dst_bytes) |
| 2992 struct coding_system *coding; | 3520 struct coding_system *coding; |
| 2993 unsigned char *source, *destination; | 3521 unsigned char *source, *destination; |
| 2994 int src_bytes, dst_bytes; | 3522 int src_bytes, dst_bytes; |
| 2995 int *consumed; | 3523 { |
| 2996 { | 3524 int result; |
| 2997 int produced; | 3525 |
| 3526 if (src_bytes <= 0) | |
| 3527 { | |
| 3528 coding->produced = coding->produced_char = 0; | |
| 3529 coding->consumed = coding->consumed_char = 0; | |
| 3530 return CODING_FINISH_NORMAL; | |
| 3531 } | |
| 2998 | 3532 |
| 2999 switch (coding->type) | 3533 switch (coding->type) |
| 3000 { | 3534 { |
| 3001 case coding_type_no_conversion: | |
| 3002 label_no_conversion: | |
| 3003 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; | |
| 3004 if (produced > 0) | |
| 3005 { | |
| 3006 bcopy (source, destination, produced); | |
| 3007 if (coding->selective) | |
| 3008 { | |
| 3009 unsigned char *p = destination, *pend = destination + produced; | |
| 3010 while (p < pend) | |
| 3011 if (*p++ == '\015') p[-1] = '\n'; | |
| 3012 } | |
| 3013 } | |
| 3014 *consumed = produced; | |
| 3015 break; | |
| 3016 | |
| 3017 case coding_type_emacs_mule: | 3535 case coding_type_emacs_mule: |
| 3018 case coding_type_undecided: | 3536 case coding_type_undecided: |
| 3019 case coding_type_raw_text: | 3537 case coding_type_raw_text: |
| 3020 if (coding->eol_type == CODING_EOL_LF | 3538 if (coding->eol_type == CODING_EOL_LF |
| 3021 || coding->eol_type == CODING_EOL_UNDECIDED) | 3539 || coding->eol_type == CODING_EOL_UNDECIDED) |
| 3022 goto label_no_conversion; | 3540 goto label_no_conversion; |
| 3023 produced = encode_eol (coding, source, destination, | 3541 result = encode_eol (coding, source, destination, src_bytes, dst_bytes); |
| 3024 src_bytes, dst_bytes, consumed); | |
| 3025 break; | 3542 break; |
| 3026 | 3543 |
| 3027 case coding_type_sjis: | 3544 case coding_type_sjis: |
| 3028 produced = encode_coding_sjis_big5 (coding, source, destination, | 3545 result = encode_coding_sjis_big5 (coding, source, destination, |
| 3029 src_bytes, dst_bytes, consumed, | 3546 src_bytes, dst_bytes, 1); |
| 3030 1); | |
| 3031 break; | 3547 break; |
| 3032 | 3548 |
| 3033 case coding_type_iso2022: | 3549 case coding_type_iso2022: |
| 3034 produced = encode_coding_iso2022 (coding, source, destination, | 3550 result = encode_coding_iso2022 (coding, source, destination, |
| 3035 src_bytes, dst_bytes, consumed); | 3551 src_bytes, dst_bytes); |
| 3036 break; | 3552 break; |
| 3037 | 3553 |
| 3038 case coding_type_big5: | 3554 case coding_type_big5: |
| 3039 produced = encode_coding_sjis_big5 (coding, source, destination, | 3555 result = encode_coding_sjis_big5 (coding, source, destination, |
| 3040 src_bytes, dst_bytes, consumed, | 3556 src_bytes, dst_bytes, 0); |
| 3041 0); | |
| 3042 break; | 3557 break; |
| 3043 | 3558 |
| 3044 case coding_type_ccl: | 3559 case coding_type_ccl: |
| 3045 produced = ccl_driver (&coding->spec.ccl.encoder, source, destination, | 3560 result = ccl_coding_driver (coding, source, destination, |
| 3046 src_bytes, dst_bytes, consumed); | 3561 src_bytes, dst_bytes, 1); |
| 3047 break; | 3562 break; |
| 3048 } | 3563 |
| 3049 | 3564 default: /* i.e. case coding_type_no_conversion: */ |
| 3050 return produced; | 3565 label_no_conversion: |
| 3051 } | 3566 if (dst_bytes && src_bytes > dst_bytes) |
| 3052 | 3567 { |
| 3053 #define CONVERSION_BUFFER_EXTRA_ROOM 256 | 3568 coding->produced = dst_bytes; |
| 3054 | 3569 result = CODING_FINISH_INSUFFICIENT_DST; |
| 3055 /* Return the maximum size (in bytes) of a buffer large enough for decoding | 3570 } |
| 3056 SRC_BYTES of text encoded in CODING. */ | 3571 else |
| 3572 { | |
| 3573 coding->produced = src_bytes; | |
| 3574 result = CODING_FINISH_NORMAL; | |
| 3575 } | |
| 3576 if (dst_bytes) | |
| 3577 bcopy (source, destination, coding->produced); | |
| 3578 else | |
| 3579 safe_bcopy (source, destination, coding->produced); | |
| 3580 if (coding->mode & CODING_MODE_SELECTIVE_DISPLAY) | |
| 3581 { | |
| 3582 unsigned char *p = destination, *pend = p + coding->produced; | |
| 3583 while (p < pend) | |
| 3584 if (*p++ == '\015') p[-1] = '\n'; | |
| 3585 } | |
| 3586 coding->consumed | |
| 3587 = coding->consumed_char = coding->produced_char = coding->produced; | |
| 3588 break; | |
| 3589 } | |
| 3590 | |
| 3591 return result; | |
| 3592 } | |
| 3593 | |
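When `CODING_MODE_SELECTIVE_DISPLAY` is set, the no-conversion branch of `encode_coding` rewrites every CR (octal 015) in the output as a newline. Under Emacs selective display, a CR in the buffer hides the rest of a line, and such CRs are written out as ordinary newlines. A minimal standalone sketch of that in-place pass; `cr_to_lf` is a hypothetical name.

```c
/* Replace every CR in BUF[0..LEN) with LF, in place. */
static void
cr_to_lf (unsigned char *buf, int len)
{
  unsigned char *p = buf, *pend = buf + len;

  while (p < pend)
    {
      if (*p == '\r')
        *p = '\n';
      p++;
    }
}
```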
| 3594 /* Scan text in the region between *BEG and *END, skip characters | |
| 3595 which we don't have to decode by coding system CODING at the head | |
| 3596 and tail, then set *BEG and *END to the region of the text we | |
| 3597 actually have to convert. | |
| 3598 | |
| 3599 If STR is not NULL, *BEG and *END are indices into STR. */ | |
| 3600 | |
| 3601 static void | |
| 3602 shrink_decoding_region (beg, end, coding, str) | |
| 3603 int *beg, *end; | |
| 3604 struct coding_system *coding; | |
| 3605 unsigned char *str; | |
| 3606 { | |
| 3607 unsigned char *begp_orig, *begp, *endp_orig, *endp; | |
| 3608 int eol_conversion; | |
| 3609 | |
| 3610 if (coding->type == coding_type_ccl | |
| 3611 || coding->type == coding_type_undecided | |
| 3612 || !NILP (coding->post_read_conversion)) | |
| 3613 { | |
| 3614 /* We can't skip any data. */ | |
| 3615 return; | |
| 3616 } | |
| 3617 else if (coding->type == coding_type_no_conversion) | |
| 3618 { | |
| 3619 /* We need no conversion. */ | |
| 3620 *beg = *end; | |
| 3621 return; | |
| 3622 } | |
| 3623 | |
| 3624 if (coding->heading_ascii >= 0) | |
| 3625 /* Detection routine has already found how much we can skip at the | |
| 3626 head. */ | |
| 3627 *beg += coding->heading_ascii; | |
| 3628 | |
| 3629 if (str) | |
| 3630 { | |
| 3631 begp_orig = begp = str + *beg; | |
| 3632 endp_orig = endp = str + *end; | |
| 3633 } | |
| 3634 else | |
| 3635 { | |
| 3636 move_gap (*beg); | |
| 3637 begp_orig = begp = GAP_END_ADDR; | |
| 3638 endp_orig = endp = begp + *end - *beg; | |
| 3639 } | |
| 3640 | |
| 3641 eol_conversion = (coding->eol_type != CODING_EOL_LF); | |
| 3642 | |
| 3643 switch (coding->type) | |
| 3644 { | |
| 3645 case coding_type_emacs_mule: | |
| 3646 case coding_type_raw_text: | |
| 3647 if (eol_conversion) | |
| 3648 { | |
| 3649 if (coding->heading_ascii < 0) | |
| 3650 while (begp < endp && *begp != '\r') begp++; | |
| 3651 while (begp < endp && *(endp - 1) != '\r') endp--; | |
| 3652 } | |
| 3653 else | |
| 3654 begp = endp; | |
| 3655 break; | |
| 3656 | |
| 3657 case coding_type_sjis: | |
| 3658 case coding_type_big5: | |
| 3659 /* We can skip all ASCII characters at the head. */ | |
| 3660 if (coding->heading_ascii < 0) | |
| 3661 { | |
| 3662 if (eol_conversion) | |
| 3663 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++; | |
| 3664 else | |
| 3665 while (begp < endp && *begp < 0x80) begp++; | |
| 3666 } | |
| 3667 /* We can skip all ASCII characters at the tail except for the | |
| 3668 second byte of SJIS or BIG5 code. */ | |
| 3669 if (eol_conversion) | |
| 3670 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--; | |
| 3671 else | |
| 3672 while (begp < endp && endp[-1] < 0x80) endp--; | |
| 3673 if (begp < endp && endp < endp_orig && endp[-1] >= 0x80) | |
| 3674 endp++; | |
| 3675 break; | |
| 3676 | |
| 3677 default: /* i.e. case coding_type_iso2022: */ | |
| 3678 if (coding->heading_ascii < 0) | |
| 3679 { | |
| 3680 unsigned char c; | |
| 3681 | |
| 3682 /* We can skip all ASCII characters at the head except for a | |
| 3683 few control codes. */ | |
| 3684 while (begp < endp && (c = *begp) < 0x80 | |
| 3685 && c != ISO_CODE_CR && c != ISO_CODE_SO | |
| 3686 && c != ISO_CODE_SI && c != ISO_CODE_ESC | |
| 3687 && (!eol_conversion || c != ISO_CODE_LF)) | |
| 3688 begp++; | |
| 3689 } | |
| 3690 switch (coding->category_idx) | |
| 3691 { | |
| 3692 case CODING_CATEGORY_IDX_ISO_8_1: | |
| 3693 case CODING_CATEGORY_IDX_ISO_8_2: | |
| 3694 /* We can skip all ASCII characters at the tail. */ | |
| 3695 if (eol_conversion) | |
| 3696 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--; | |
| 3697 else | |
| 3698 while (begp < endp && endp[-1] < 0x80) endp--; | |
| 3699 break; | |
| 3700 | |
| 3701 case CODING_CATEGORY_IDX_ISO_7: | |
| 3702 case CODING_CATEGORY_IDX_ISO_7_TIGHT: | |
| 3703 /* We can skip all characters at the tail except for ESC and | |
| 3704 the two bytes that follow it. */ | |
| 3705 if (eol_conversion) | |
| 3706 while (begp < endp && endp[-1] != ISO_CODE_ESC && endp[-1] != '\n') | |
| 3707 endp--; | |
| 3708 else | |
| 3709 while (begp < endp && endp[-1] != ISO_CODE_ESC) | |
| 3710 endp--; | |
| 3711 if (begp < endp && endp[-1] == ISO_CODE_ESC) | |
| 3712 { | |
| 3713 if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B') | |
| 3714 /* This is an ASCII designation sequence. We can | |
| 3715 surely skip the tail. */ | |
| 3716 endp += 2; | |
| 3717 else | |
| 3718 /* Hmmm, we can't skip the tail. */ | |
| 3719 endp = endp_orig; | |
| 3720 } | |
| 3721 } | |
| 3722 } | |
| 3723 *beg += begp - begp_orig; | |
| 3724 *end += endp - endp_orig; | |
| 3725 return; | |
| 3726 } | |
| 3727 | |
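For SJIS and Big5 without EOL conversion, `shrink_decoding_region` trims leading and trailing bytes below 0x80, which decode to themselves. It then re-includes one tail byte when the byte before it is >= 0x80, because that byte may be the second half of a two-byte character (keeping it when it is plain ASCII is harmless). A minimal sketch of just that branch over a plain byte array; `shrink_sjis_region` is a hypothetical name, and `*beg`/`*end` are byte offsets into `str`.

```c
/* Narrow [*BEG, *END), byte offsets into STR, to the span that needs
   SJIS/Big5 decoding when no EOL conversion is requested. */
static void
shrink_sjis_region (const unsigned char *str, int *beg, int *end)
{
  const unsigned char *begp = str + *beg;
  const unsigned char *endp = str + *end;
  const unsigned char *endp_orig = endp;

  while (begp < endp && *begp < 0x80)     /* skip leading ASCII */
    begp++;
  while (begp < endp && endp[-1] < 0x80)  /* skip trailing ASCII */
    endp--;
  if (begp < endp && endp < endp_orig && endp[-1] >= 0x80)
    endp++;                               /* keep a possible 2nd byte */
  *beg = (int) (begp - str);
  *end = (int) (endp - str);
}
```

With the bytes `a b 0x82 0x41 c`, the region shrinks to offsets [2, 4): the lead byte 0x82 and its possible trail byte 0x41.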
| 3728 /* Like shrink_decoding_region but for encoding. */ | |
| 3729 | |
| 3730 static void | |
| 3731 shrink_encoding_region (beg, end, coding, str) | |
| 3732 int *beg, *end; | |
| 3733 struct coding_system *coding; | |
| 3734 unsigned char *str; | |
| 3735 { | |
| 3736 unsigned char *begp_orig, *begp, *endp_orig, *endp; | |
| 3737 int eol_conversion; | |
| 3738 | |
| 3739 if (coding->type == coding_type_ccl) | |
| 3740 /* We can't skip any data. */ | |
| 3741 return; | |
| 3742 else if (coding->type == coding_type_no_conversion) | |
| 3743 { | |
| 3744 /* We need no conversion. */ | |
| 3745 *beg = *end; | |
| 3746 return; | |
| 3747 } | |
| 3748 | |
| 3749 if (str) | |
| 3750 { | |
| 3751 begp_orig = begp = str + *beg; | |
| 3752 endp_orig = endp = str + *end; | |
| 3753 } | |
| 3754 else | |
| 3755 { | |
| 3756 move_gap (*beg); | |
| 3757 begp_orig = begp = GAP_END_ADDR; | |
| 3758 endp_orig = endp = begp + *end - *beg; | |
| 3759 } | |
| 3760 | |
| 3761 eol_conversion = (coding->eol_type == CODING_EOL_CR | |
| 3762 || coding->eol_type == CODING_EOL_CRLF); | |
| 3763 | |
| 3764 /* Here, we don't have to check coding->pre_write_conversion because | |
| 3765 the caller is expected to have handled it already. */ | |
| 3766 switch (coding->type) | |
| 3767 { | |
| 3768 case coding_type_undecided: | |
| 3769 case coding_type_emacs_mule: | |
| 3770 case coding_type_raw_text: | |
| 3771 if (eol_conversion) | |
| 3772 { | |
| 3773 while (begp < endp && *begp != '\n') begp++; | |
| 3774 while (begp < endp && endp[-1] != '\n') endp--; | |
| 3775 } | |
| 3776 else | |
| 3777 begp = endp; | |
| 3778 break; | |
| 3779 | |
| 3780 case coding_type_iso2022: | |
| 3781 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL) | |
| 3782 { | |
| 3783 unsigned char *bol = begp; | |
| 3784 while (begp < endp && *begp < 0x80) | |
| 3785 { | |
| 3786 begp++; | |
| 3787 if (begp[-1] == '\n') | |
| 3788 bol = begp; | |
| 3789 } | |
| 3790 begp = bol; | |
| 3791 goto label_skip_tail; | |
| 3792 } | |
| 3793 /* fall through ... */ | |
| 3794 | |
| 3795 default: | |
| 3796 /* We can skip all ASCII characters at the head and tail. */ | |
| 3797 if (eol_conversion) | |
| 3798 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++; | |
| 3799 else | |
| 3800 while (begp < endp && *begp < 0x80) begp++; | |
| 3801 label_skip_tail: | |
| 3802 if (eol_conversion) | |
| 3803 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--; | |
| 3804 else | |
| 3805 while (begp < endp && *(endp - 1) < 0x80) endp--; | |
| 3806 break; | |
| 3807 } | |
| 3808 | |
| 3809 *beg += begp - begp_orig; | |
| 3810 *end += endp - endp_orig; | |
| 3811 return; | |
| 3812 } | |
| 3813 | |
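When an ISO 2022 coding system sets `CODING_FLAG_ISO_DESIGNATE_AT_BOL`, designations are emitted at the beginning of a line. So `shrink_encoding_region` may skip leading ASCII only up to the start of the line that holds the first non-ASCII byte. A minimal sketch of that head computation; `skip_ascii_head_to_bol` is a hypothetical name, and the tail is trimmed separately, as at `label_skip_tail`.

```c
/* Return the offset of the start of the line containing the first
   byte >= 0x80 in STR[0..LEN), scanning only the leading ASCII run.
   If all bytes are ASCII, return the start of the last line. */
static int
skip_ascii_head_to_bol (const unsigned char *str, int len)
{
  int i, bol = 0;

  for (i = 0; i < len && str[i] < 0x80; i++)
    if (str[i] == '\n')
      bol = i + 1;              /* a new line starts after each LF */
  return bol;
}
```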
| 3814 /* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the | |
| 3815 text from FROM to TO by coding system CODING, and return the number of | |
| 3816 characters in the resulting text. | |
| 3817 | |
| 3818 If ADJUST is nonzero, we do various things as if the original text | |
| 3819 were deleted and new text were inserted. See the comments in | |
| 3820 replace_range (insdel.c) for details. | |
| 3821 | |
| 3822 ADJUST nonzero also means that post-read-conversion or | |
| 3823 pre-write-conversion functions (if any) should be processed. */ | |
| 3057 | 3824 |
| 3058 int | 3825 int |
| 3059 decoding_buffer_size (coding, src_bytes) | 3826 code_convert_region (from, to, coding, encodep, adjust) |
| 3827 int from, to, encodep, adjust; | |
| 3060 struct coding_system *coding; | 3828 struct coding_system *coding; |
| 3061 int src_bytes; | 3829 { |
| 3062 { | 3830 int len = to - from, require, inserted, inserted_byte; |
| 3063 int magnification; | 3831 int from_byte, to_byte, len_byte; |
| 3064 | 3832 int from_byte_orig, to_byte_orig; |
| 3065 if (coding->type == coding_type_iso2022) | 3833 Lisp_Object saved_coding_symbol = Qnil; |
| 3066 magnification = 3; | 3834 |
| 3067 else if (coding->type == coding_type_ccl) | 3835 if (adjust) |
| 3068 magnification = coding->spec.ccl.decoder.buf_magnification; | 3836 { |
| 3837 prepare_to_modify_buffer (from, to, &from); | |
| 3838 to = from + len; | |
| 3839 } | |
| 3840 from_byte = CHAR_TO_BYTE (from); to_byte = CHAR_TO_BYTE (to); | |
| 3841 len_byte = to_byte - from_byte; | |
| 3842 | |
| 3843 if (! encodep && CODING_REQUIRE_DETECTION (coding)) | |
| 3844 { | |
| 3845 /* We must detect the encoding of the text and the eol format. Even | |
| 3846 if the detection routines can't decide the encoding, we should not | |
| 3847 leave it undecided, because the deeper decoding routine | |
| 3848 (decode_coding) would then try to detect the encodings in vain. */ | |
| 3849 | |
| 3850 if (from < GPT && to > GPT) | |
| 3851 move_gap_both (from, from_byte); | |
| 3852 if (coding->type == coding_type_undecided) | |
| 3853 { | |
| 3854 detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte); | |
| 3855 if (coding->type == coding_type_undecided) | |
| 3856 coding->type = coding_type_emacs_mule; | |
| 3857 } | |
| 3858 if (coding->eol_type == CODING_EOL_UNDECIDED) | |
| 3859 { | |
| 3860 saved_coding_symbol = coding->symbol; | |
| 3861 detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte); | |
| 3862 if (coding->eol_type == CODING_EOL_UNDECIDED) | |
| 3863 coding->eol_type = CODING_EOL_LF; | |
| 3864 /* We had better recover the original eol format if we | |
| 3865 encounter an inconsistent eol format while decoding. */ | |
| 3866 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL; | |
| 3867 } | |
| 3868 } | |
| 3869 | |
| 3870 if (encodep | |
| 3871 ? ! CODING_REQUIRE_ENCODING (coding) | |
| 3872 : ! CODING_REQUIRE_DECODING (coding)) | |
| 3873 return len; | |
| 3874 | |
| 3875 /* Now we convert the text. */ | |
| 3876 | |
| 3877 /* For encoding, we must process pre-write-conversion in advance. */ | |
| 3878 if (encodep | |
| 3879 && adjust | |
| 3880 && ! NILP (coding->pre_write_conversion) | |
| 3881 && SYMBOLP (coding->pre_write_conversion) | |
| 3882 && ! NILP (Ffboundp (coding->pre_write_conversion))) | |
| 3883 { | |
| 3884 /* The function in pre-write-conversion puts the new text in a new | |
| 3885 buffer. */ | |
| 3886 struct buffer *prev = current_buffer, *new; | |
| 3887 | |
| 3888 call2 (coding->pre_write_conversion, make_number (from), make_number (to)); | |
| 3889 if (current_buffer != prev) | |
| 3890 { | |
| 3891 len = ZV - BEGV; | |
| 3892 new = current_buffer; | |
| 3893 set_buffer_internal_1 (prev); | |
| 3894 del_range (from, to); | |
| 3895 insert_from_buffer (new, BEG, len, 0); | |
| 3896 to = from + len; | |
| 3897 to_byte = CHAR_TO_BYTE (to); | |
| 3898 len_byte = to_byte - from_byte; | |
| 3899 } | |
| 3900 } | |
| 3901 | |
| 3902 /* Try to skip the leading and trailing ASCII characters. */ | |
| 3903 from_byte_orig = from_byte; to_byte_orig = to_byte; | |
| 3904 if (encodep) | |
| 3905 shrink_encoding_region (&from_byte, &to_byte, coding, NULL); | |
| 3069 else | 3906 else |
| 3070 magnification = 2; | 3907 shrink_decoding_region (&from_byte, &to_byte, coding, NULL); |
| 3071 | 3908 if (from_byte == to_byte) |
| 3072 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM); | 3909 return len; |
| 3071 | 3910 /* Here, the region excluded by shrinking contains only ASCII characters. */ |
| 3074 | 3911 from += (from_byte - from_byte_orig); |
| 3075 /* Return the maximum size (in bytes) of a buffer large enough for encoding | 3912 to += (to_byte - to_byte_orig); |
| 3076 SRC_BYTES of text to CODING. */ | 3913 len = to - from; |
| 3077 | 3914 len_byte = to_byte - from_byte; |
| 3078 int | 3915 |
| 3079 encoding_buffer_size (coding, src_bytes) | 3916 /* For conversion, we must put the gap before the text to be converted |
| 3917 and also make the gap larger for efficient conversion. The | |
| 3918 required gap size starts from 2000, which is the magic number used | |
| 3919 in make_gap. After the first batch of conversion, it is | |
| 3920 increased if we find that it is not enough. */ | |
| 3921 require = 2000; | |
| 3922 | |
| 3923 if (GAP_SIZE < require) | |
| 3924 make_gap (require - GAP_SIZE); | |
| 3925 move_gap_both (from, from_byte); | |
| 3926 | |
| 3927 if (adjust) | |
| 3928 adjust_before_replace (from, from_byte, to, to_byte); | |
| 3929 | |
| 3930 if (GPT - BEG < beg_unchanged) | |
| 3931 beg_unchanged = GPT - BEG; | |
| 3932 if (Z - GPT < end_unchanged) | |
| 3933 end_unchanged = Z - GPT; | |
| 3934 | |
| 3935 inserted = inserted_byte = 0; | |
| 3936 for (;;) | |
| 3937 { | |
| 3938 int result, diff_char, diff_byte; | |
| 3939 | |
| 3940 /* The buffer memory is changed from: | |
| 3941 +--------+converted-text+------------+-----original-text-----+---+ | |
| 3942 |<-from->|<--inserted-->|<-GAP_SIZE->|<---------len--------->|---| */ | |
| 3943 | |
| 3944 if (encodep) | |
| 3945 result = encode_coding (coding, GAP_END_ADDR, GPT_ADDR, len_byte, 0); | |
| 3946 else | |
| 3947 result = decode_coding (coding, GAP_END_ADDR, GPT_ADDR, len_byte, 0); | |
| 3948 /* to: | |
| 3949 +--------+-------converted-text--------+--+---original-text--+---+ | |
| 3950 |<-from->|<----(inserted+produced)---->|--|<-(len-consumed)->|---| */ | |
| 3951 | |
| 3952 diff_char = coding->produced_char - coding->consumed_char; | |
| 3953 diff_byte = coding->produced - coding->consumed; | |
| 3954 | |
| 3955 GAP_SIZE -= diff_byte; | |
| 3956 ZV += diff_char; ZV_BYTE += diff_byte; | |
| 3957 Z += diff_char; Z_BYTE += diff_byte; | |
| 3958 GPT += coding->produced_char; GPT_BYTE += coding->produced; | |
| 3959 | |
| 3960 inserted += coding->produced_char; | |
| 3961 inserted_byte += coding->produced; | |
| 3962 len -= coding->consumed_char; | |
| 3963 len_byte -= coding->consumed; | |
| 3964 | |
| 3965 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL) | |
| 3966 { | |
| 3967 unsigned char *p = GPT_ADDR - inserted_byte, *pend = GPT_ADDR; | |
| 3968 | |
| 3969 /* Encode LFs back to the original eol format (CR or CRLF). */ | |
| 3970 if (coding->eol_type == CODING_EOL_CR) | |
| 3971 { | |
| 3972 while (p < pend) if (*p++ == '\n') p[-1] = '\r'; | |
| 3973 } | |
| 3974 else | |
| 3975 { | |
| 3976 unsigned char *p2 = p; | |
| 3977 int count = 0; | |
| 3978 | |
| 3979 while (p2 < pend) if (*p2++ == '\n') count++; | |
| 3980 if (GAP_SIZE < count) | |
| 3981 make_gap (count - GAP_SIZE); | |
| 3982 p2 = GPT_ADDR + count; | |
| 3983 while (p < pend) | |
| 3984 { | |
| 3985 *--p2 = *--pend; | |
| 3986 if (*pend == '\n') *--p2 = '\r'; | |
| 3987 } | |
| 3988 GPT += count; GAP_SIZE -= count; ZV += count; Z += count; | |
| 3989 ZV_BYTE += count; Z_BYTE += count; | |
| 3990 coding->produced += count; | |
| 3991 coding->produced_char += count; | |
| 3992 inserted += count; | |
| 3993 inserted_byte += count; | |
| 3994 } | |
| 3995 | |
| 3996 /* Suppress eol-format conversion in the further conversion. */ | |
| 3997 coding->eol_type = CODING_EOL_LF; | |
| 3998 | |
| 3999 /* Restore the original symbol. */ | |
| 4000 coding->symbol = saved_coding_symbol; | |
| 4001 } | |
| 4002 if (len_byte <= 0) | |
| 4003 break; | |
| 4004 if (result == CODING_FINISH_INSUFFICIENT_SRC) | |
| 4005 { | |
| 4006 /* The source text ends in invalid codes. Let's just | |
| 4007 make them valid buffer contents, and finish conversion. */ | |
| 4008 inserted += len; | |
| 4009 inserted_byte += len_byte; | |
| 4010 break; | |
| 4011 } | |
| 4012 if (inserted == coding->produced_char) | |
| 4013 /* We have just done the first batch of conversion. Let's | |
| 4014 reconsider the required gap size now. | |
| 4015 | |
| 4016 We have converted CONSUMED bytes into PRODUCED bytes. To | |
| 4017 convert the remaining LEN bytes, we may need REQUIRE bytes | |
| 4018 of gap, where: | |
| 4019 REQUIRE + LEN = (LEN * PRODUCED / CONSUMED) | |
| 4020 REQUIRE = LEN * (PRODUCED - CONSUMED) / CONSUMED | |
| 4021 = LEN * DIFF / CONSUMED | |
| 4022 Here, we are sure that DIFF is positive. */ | |
| 4023 require = len_byte * diff_byte / coding->consumed; | |
| 4024 if (GAP_SIZE < require) | |
| 4025 make_gap (require - GAP_SIZE); | |
| 4026 } | |
| 4027 if (GAP_SIZE > 0) *GPT_ADDR = 0; /* Put an anchor. */ | |
| 4028 | |
| 4029 if (adjust) | |
| 4030 { | |
| 4031 adjust_after_replace (from, from_byte, to, to_byte, | |
| 4032 inserted, inserted_byte); | |
| 4033 | |
| 4034 if (! encodep && ! NILP (coding->post_read_conversion)) | |
| 4035 { | |
| 4036 Lisp_Object val; | |
| 4037 int orig_inserted = inserted, pos = PT; | |
| 4038 | |
| 4039 temp_set_point_both (current_buffer, from, from_byte); | |
| 4040 val = call1 (coding->post_read_conversion, make_number (inserted)); | |
| 4041 if (! NILP (val)) | |
| 4042 { | |
| 4043 CHECK_NUMBER (val, 0); | |
| 4044 inserted = XFASTINT (val); | |
| 4045 } | |
| 4046 if (pos >= from + orig_inserted) | |
| 4047 temp_set_point (current_buffer, pos + (inserted - orig_inserted)); | |
| 4048 } | |
| 4049 } | |
| 4050 | |
| 4051 return ((from_byte - from_byte_orig) + inserted + (to_byte_orig - to_byte)); | |
| 4052 } | |
| 4053 | |
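Two techniques from `code_convert_region` can be isolated. The first is the gap estimate after the first batch, REQUIRE = LEN * (PRODUCED - CONSUMED) / CONSUMED. The second is the backward, in-place expansion of LF into CR LF used to restore a CRLF eol format. Copying from the end toward the start means no byte is overwritten before it has been moved. The sketch below is standalone C, not Emacs code. `estimate_required_gap` and `lf_to_crlf_in_place` are hypothetical names, and the guard against a non-growing or empty first batch is an addition of the sketch.

```c
/* Extra gap needed for the remaining LEN bytes, extrapolated from a
   first batch that turned CONSUMED bytes into PRODUCED bytes. */
static long
estimate_required_gap (long len, long produced, long consumed)
{
  if (consumed <= 0 || produced <= consumed)
    return 0;                   /* the text did not grow */
  return len * (produced - consumed) / consumed;
}

/* Expand each LF in BUF[0..LEN) into CR LF in place.  BUF must have
   room for LEN plus the number of LFs.  Returns the new length. */
static int
lf_to_crlf_in_place (unsigned char *buf, int len)
{
  int count = 0, i;
  unsigned char *src, *dst;

  for (i = 0; i < len; i++)     /* each LF needs one extra byte */
    if (buf[i] == '\n')
      count++;
  src = buf + len;
  dst = buf + len + count;
  while (src > buf)             /* walk backward: dst stays >= src */
    {
      *--dst = *--src;
      if (*src == '\n')
        *--dst = '\r';
    }
  return len + count;
}
```

For instance, if the first batch turned 200 bytes into 300, the remaining 1000 bytes are expected to need 1000 * 100 / 200 = 500 extra bytes of gap.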
| 4054 Lisp_Object | |
| 4055 code_convert_string (str, coding, encodep, nocopy) | |
| 4056 Lisp_Object str; | |
| 3080 struct coding_system *coding; | 4057 struct coding_system *coding; |
| 3081 int src_bytes; | 4058 int encodep, nocopy; |
| 3082 { | 4059 { |
| 3083 int magnification; | 4060 int len; |
| 3084 | 4061 char *buf; |
| 3085 if (coding->type == coding_type_ccl) | 4062 int from = 0, to = XSTRING (str)->size, to_byte = XSTRING (str)->size_byte; |
| 3086 magnification = coding->spec.ccl.encoder.buf_magnification; | 4063 struct gcpro gcpro1; |
| 4064 Lisp_Object saved_coding_symbol = Qnil; | |
| 4065 int result; | |
| 4066 | |
| 4067 if (encodep && !NILP (coding->pre_write_conversion) | |
| 4068 || !encodep && !NILP (coding->post_read_conversion)) | |
| 4069 { | |
| 4070 /* Since we have to call Lisp functions that assume the target text | |
| 4071 is in a buffer, we set up a temporary buffer and call | |
| 4072 code_convert_region. */ | |
| 4073 int count = specpdl_ptr - specpdl; | |
| 4074 struct buffer *prev = current_buffer; | |
| 4075 | |
| 4076 record_unwind_protect (Fset_buffer, Fcurrent_buffer ()); | |
| 4077 temp_output_buffer_setup (" *code-converting-work*"); | |
| 4078 set_buffer_internal (XBUFFER (Vstandard_output)); | |
| 4079 if (encodep) | |
| 4080 insert_from_string (str, 0, 0, to, to_byte, 0); | |
| 4081 else | |
| 4082 { | |
| 4083 /* We must insert the contents of STR as is without | |
| 4084 unibyte<->multibyte conversion. */ | |
| 4085 current_buffer->enable_multibyte_characters = Qnil; | |
| 4086 insert_from_string (str, 0, 0, to_byte, to_byte, 0); | |
| 4087 current_buffer->enable_multibyte_characters = Qt; | |
| 4088 } | |
| 4089 code_convert_region (BEGV, ZV, coding, encodep, 1); | |
| 4090 if (encodep) | |
| 4091 /* We must return the buffer contents as unibyte string. */ | |
| 4092 current_buffer->enable_multibyte_characters = Qnil; | |
| 4093 str = make_buffer_string (BEGV, ZV, 0); | |
| 4094 set_buffer_internal (prev); | |
| 4095 return unbind_to (count, str); | |
| 4096 } | |
| 4097 | |
| 4098 if (! encodep && CODING_REQUIRE_DETECTION (coding)) | |
| 4099 { | |
| 4100 /* See the comments in code_convert_region. */ | |
| 4101 if (coding->type == coding_type_undecided) | |
| 4102 { | |
| 4103 detect_coding (coding, XSTRING (str)->data, to_byte); | |
| 4104 if (coding->type == coding_type_undecided) | |
| 4105 coding->type = coding_type_emacs_mule; | |
| 4106 } | |
| 4107 if (coding->eol_type == CODING_EOL_UNDECIDED) | |
| 4108 { | |
| 4109 saved_coding_symbol = coding->symbol; | |
| 4110 detect_eol (coding, XSTRING (str)->data, to_byte); | |
| 4111 if (coding->eol_type == CODING_EOL_UNDECIDED) | |
| 4112 coding->eol_type = CODING_EOL_LF; | |
| 4113 /* We had better recover the original eol format if we | |
| 4114 encounter an inconsistent eol format while decoding. */ | |
| 4115 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL; | |
| 4116 } | |
| 4117 } | |
| 4118 | |
| 4119 if (encodep | |
| 4120 ? ! CODING_REQUIRE_ENCODING (coding) | |
| 4121 : ! CODING_REQUIRE_DECODING (coding)) | |
| 4122 from = to_byte; | |
| 3087 else | 4123 else |
| 3088 magnification = 3; | 4124 { |
| 3089 | 4125 /* Try to skip the leading and trailing ASCII characters. */ |
| 3090 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM); | 4126 if (encodep) |
| 3091 } | 4127 shrink_encoding_region (&from, &to_byte, coding, XSTRING (str)->data); |
| 3092 | 4128 else |
| 3093 #ifndef MINIMUM_CONVERSION_BUFFER_SIZE | 4129 shrink_decoding_region (&from, &to_byte, coding, XSTRING (str)->data); |
| 3094 #define MINIMUM_CONVERSION_BUFFER_SIZE 1024 | 4130 } |
| 3095 #endif | 4131 if (from == to_byte) |
| 3096 | 4132 return (nocopy ? str : Fcopy_sequence (str)); |
| 3097 char *conversion_buffer; | 4133 |
| 3098 int conversion_buffer_size; | 4134 if (encodep) |
| 3099 | 4135 len = encoding_buffer_size (coding, to_byte - from); |
| 3100 /* Return a pointer to a buffer of SIZE bytes to be used for encoding | 4136 else |
| 3101 or decoding. Sufficient memory is allocated automatically. If we | 4137 len = decoding_buffer_size (coding, to_byte - from); |
| 3102 run out of memory, return NULL. */ | 4138 len += from + XSTRING (str)->size_byte - to_byte; |
| 3103 | 4139 GCPRO1 (str); |
| 3104 char * | 4140 buf = get_conversion_buffer (len); |
| 3105 get_conversion_buffer (size) | 4141 UNGCPRO; |
| 3106 int size; | 4142 |
| 3107 { | 4143 if (from > 0) |
| 3108 if (size > conversion_buffer_size) | 4144 bcopy (XSTRING (str)->data, buf, from); |
| 3109 { | 4145 result = (encodep |
| 3110 char *buf; | 4146 ? encode_coding (coding, XSTRING (str)->data + from, |
| 3111 int real_size = conversion_buffer_size * 2; | 4147 buf + from, to_byte - from, len) |
| 3112 | 4148 : decode_coding (coding, XSTRING (str)->data + from, |
| 3113 while (real_size < size) real_size *= 2; | 4149 buf + from, to_byte - from, len)); |
| 3114 buf = (char *) xmalloc (real_size); | 4150 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL) |
| 3115 xfree (conversion_buffer); | 4151 { |
| 3116 conversion_buffer = buf; | 4152 /* We simply try to decode the whole string again, but without |
| 3117 conversion_buffer_size = real_size; | 4153 eol-conversion this time. */ |
| 3118 } | 4154 coding->eol_type = CODING_EOL_LF; |
| 3119 return conversion_buffer; | 4155 coding->symbol = saved_coding_symbol; |
| 4156 return code_convert_string (str, coding, encodep, nocopy); | |
| 4157 } | |
| 4158 | |
| 4159 bcopy (XSTRING (str)->data + to_byte, buf + from + coding->produced, | |
| 4160 XSTRING (str)->size_byte - to_byte); | |
| 4161 | |
| 4162 len = from + XSTRING (str)->size_byte - to_byte; | |
| 4163 if (encodep) | |
| 4164 str = make_unibyte_string (buf, len + coding->produced); | |
| 4165 else | |
| 4166 str = make_multibyte_string (buf, len + coding->produced_char, | |
| 4167 len + coding->produced); | |
| 4168 return str; | |
| 3120 } | 4169 } |
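The new `code_convert_string` sets `CODING_MODE_INHIBIT_INCONSISTENT_EOL` and, when `decode_coding` returns `CODING_FINISH_INCONSISTENT_EOL`, retries with `coding->eol_type = CODING_EOL_LF`. A minimal sketch of that fallback idea, assuming a CRLF decoder that gives up on the first bare LF; `decode_crlf`, `decode_string_eol` and the `EOL_FINISH_*` codes are hypothetical names, not part of Emacs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, loosely modelled on CODING_FINISH_*. */
enum { EOL_FINISH_NORMAL = 0, EOL_FINISH_INCONSISTENT = 1 };

/* Convert CRLF to LF from SRC (N bytes) into DST (at least N bytes).
   When STRICT is set, stop at the first bare LF and report it. */
int decode_crlf (const unsigned char *src, size_t n, unsigned char *dst,
                 size_t *produced, int strict)
{
  size_t i = 0, o = 0;

  while (i < n)
    {
      if (src[i] == '\r' && i + 1 < n && src[i + 1] == '\n')
        {
          dst[o++] = '\n';          /* CRLF collapses to LF */
          i += 2;
        }
      else
        {
          if (src[i] == '\n' && strict)
            {
              *produced = o;
              return EOL_FINISH_INCONSISTENT;
            }
          dst[o++] = src[i++];
        }
    }
  *produced = o;
  return EOL_FINISH_NORMAL;
}

/* Decode with CRLF conversion; if the EOL format turns out to be
   inconsistent, redo the whole text without EOL conversion. */
size_t decode_string_eol (const unsigned char *src, size_t n,
                          unsigned char *dst)
{
  size_t produced;

  if (decode_crlf (src, n, dst, &produced, 1) == EOL_FINISH_INCONSISTENT)
    {
      memcpy (dst, src, n);         /* behave as if eol_type were LF */
      produced = n;
    }
  return produced;
}
```

With `"a\r\nb\r\n"` the result is `"a\nb\n"`; with `"a\r\nb\n"` the bare LF triggers the fallback and the input comes back unchanged, which is the behaviour the retry in `code_convert_string` aims for.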
| 3121 | 4170 |
| 3122 | 4171 |
| 3123 #ifdef emacs | 4172 #ifdef emacs |
| 3124 /*** 7. Emacs Lisp library functions ***/ | 4173 /*** 7. Emacs Lisp library functions ***/ |
| 3185 return coding_system; | 4234 return coding_system; |
| 3186 while (1) | 4235 while (1) |
| 3187 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil)); | 4236 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil)); |
| 3188 } | 4237 } |
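The old-side `get_conversion_buffer` keeps one shared buffer and grows it by doubling until it holds SIZE bytes, discarding the previous contents. A small sketch of the same growth policy; `conv_buf`, `conv_buf_size`, `get_conv_buf` and `MIN_CONV_BUF_SIZE` are hypothetical stand-ins for `conversion_buffer`, `conversion_buffer_size`, `get_conversion_buffer` and `MINIMUM_CONVERSION_BUFFER_SIZE`, and plain `malloc`/`free` replace Emacs's `xmalloc`/`xfree`.

```c
#include <stdlib.h>

#define MIN_CONV_BUF_SIZE 1024   /* mirrors MINIMUM_CONVERSION_BUFFER_SIZE */

/* Hypothetical shared buffer and its current capacity. */
static char *conv_buf;
static size_t conv_buf_size;

/* Return a buffer of at least SIZE bytes, growing it geometrically.
   Old contents are not preserved.  Returns NULL if malloc fails. */
char *get_conv_buf (size_t size)
{
  if (size > conv_buf_size)
    {
      /* Start from the minimum when empty: doubling 0 never grows. */
      size_t real_size = conv_buf_size ? conv_buf_size * 2 : MIN_CONV_BUF_SIZE;
      char *buf;

      while (real_size < size)
        real_size *= 2;
      buf = malloc (real_size);
      if (!buf)
        return NULL;
      free (conv_buf);
      conv_buf = buf;
      conv_buf_size = real_size;
    }
  return conv_buf;
}
```

Doubling keeps the number of reallocations logarithmic in the largest request, at the cost of holding on to the peak size for the rest of the session.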
| 3189 | 4238 |
| 3190 DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region, | 4239 Lisp_Object |
| 3191 2, 2, 0, | 4240 detect_coding_system (src, src_bytes, highest) |
| 3192 "Detect coding system of the text in the region between START and END.\n\ | 4241 unsigned char *src; |
| 3193 Return a list of possible coding systems ordered by priority.\n\ | 4242 int src_bytes, highest; |
| 3194 If only ASCII characters are found, it returns `undecided'\n\ | |
| 3195 or its subsidiary coding system according to a detected end-of-line format.") | |
| 3196 (b, e) | |
| 3197 Lisp_Object b, e; | |
| 3198 { | 4243 { |
| 3199 int coding_mask, eol_type; | 4244 int coding_mask, eol_type; |
| 3200 Lisp_Object val; | 4245 Lisp_Object val, tmp; |
| 3201 int beg, end; | 4246 int dummy; |
| 3202 int beg_byte, end_byte; | 4247 |
| 3203 | 4248 coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy); |
| 3204 validate_region (&b, &e); | 4249 eol_type = detect_eol_type (src, src_bytes, &dummy); |
| 3205 beg = XINT (b), end = XINT (e); | 4250 if (eol_type == CODING_EOL_INCONSISTENT) |
| 3206 beg_byte = CHAR_TO_BYTE (beg); | 4251 eol_type = CODING_EOL_UNDECIDED; |
| 3207 end_byte = CHAR_TO_BYTE (end); | 4252 |
| 3208 | 4253 if (!coding_mask) |
| 3209 if (beg < GPT && end >= GPT) | |
| 3210 move_gap_both (end, end_byte); | |
| 3211 | |
| 3212 coding_mask = detect_coding_mask (BYTE_POS_ADDR (beg_byte), | |
| 3213 end_byte - beg_byte); | |
| 3214 eol_type = detect_eol_type (BYTE_POS_ADDR (beg_byte), end_byte - beg_byte); | |
| 3215 | |
| 3216 if (coding_mask == CODING_CATEGORY_MASK_ANY) | |
| 3217 { | 4254 { |
| 3218 val = Qundecided; | 4255 val = Qundecided; |
| 3219 if (eol_type != CODING_EOL_UNDECIDED | 4256 if (eol_type != CODING_EOL_UNDECIDED) |
| 3220 && eol_type != CODING_EOL_INCONSISTENT) | |
| 3221 { | 4257 { |
| 3222 Lisp_Object val2; | 4258 Lisp_Object val2; |
| 3223 val2 = Fget (Qundecided, Qeol_type); | 4259 val2 = Fget (Qundecided, Qeol_type); |
| 3224 if (VECTORP (val2)) | 4260 if (VECTORP (val2)) |
| 3225 val = XVECTOR (val2)->contents[eol_type]; | 4261 val = XVECTOR (val2)->contents[eol_type]; |
| 3226 } | 4262 } |
| 3227 } | 4263 return val; |
| 3228 else | 4264 } |
| 3229 { | 4265 |
| 3230 Lisp_Object val2; | 4266 /* At first, gather possible coding systems in VAL. */ |
| 3231 | 4267 val = Qnil; |
| 3232 /* At first, gather possible coding-systems in VAL in a reverse | 4268 for (tmp = Vcoding_category_list; !NILP (tmp); tmp = XCONS (tmp)->cdr) |
| 3233 order. */ | 4269 { |
| 3234 val = Qnil; | 4270 int idx |
| 3235 for (val2 = Vcoding_category_list; | 4271 = XFASTINT (Fget (XCONS (tmp)->car, Qcoding_category_index)); |
| 3236 !NILP (val2); | 4272 if (coding_mask & (1 << idx)) |
| 3237 val2 = XCONS (val2)->cdr) | 4273 { |
| 3238 { | 4274 val = Fcons (Fsymbol_value (XCONS (tmp)->car), val); |
| 3239 int idx | 4275 if (highest) |
| 3240 = XFASTINT (Fget (XCONS (val2)->car, Qcoding_category_index)); | 4276 break; |
| 3241 if (coding_mask & (1 << idx)) | 4277 } |
| 3242 { | 4278 } |
| 3243 #if 0 | 4279 if (!highest) |
| 3244 /* This code is suppressed until we find a better way to | 4280 val = Fnreverse (val); |
| 3245 distinguish raw text file and binary file. */ | 4281 |
| 3246 | 4282 /* Then, replace the elements with their subsidiary coding systems. */ |
| 3247 if (idx == CODING_CATEGORY_IDX_RAW_TEXT | 4283 for (tmp = val; !NILP (tmp); tmp = XCONS (tmp)->cdr) |
| 3248 && eol_type == CODING_EOL_INCONSISTENT) | 4284 { |
| 3249 val = Fcons (Qno_conversion, val); | 4285 if (eol_type != CODING_EOL_UNDECIDED) |
| 3250 else | 4286 { |
| 3251 #endif /* 0 */ | 4287 Lisp_Object eol; |
| 3252 val = Fcons (Fsymbol_value (XCONS (val2)->car), val); | 4288 eol = Fget (XCONS (tmp)->car, Qeol_type); |
| 3253 } | 4289 if (VECTORP (eol)) |
| 3254 } | 4290 XCONS (tmp)->car = XVECTOR (eol)->contents[eol_type]; |
| 3255 | 4291 } |
| 3256 /* Then, change the order of the list, while getting subsidiary | 4292 } |
| 3257 coding-systems. */ | 4293 return (highest ? XCONS (val)->car : val); |
| 3258 val2 = val; | 4294 } |
| 3259 val = Qnil; | 4295 |
| 3260 if (eol_type == CODING_EOL_INCONSISTENT) | 4296 DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region, |
| 3261 eol_type = CODING_EOL_UNDECIDED; | 4297 2, 3, 0, |
| 3262 for (; !NILP (val2); val2 = XCONS (val2)->cdr) | 4298 "Detect coding system of the text in the region between START and END.\n\ |
| 3263 { | 4299 Return a list of possible coding systems ordered by priority.\n\ |
| 3264 if (eol_type == CODING_EOL_UNDECIDED) | 4300 \n\ |
| 3265 val = Fcons (XCONS (val2)->car, val); | 4301 If only ASCII characters are found, it returns `undecided'\n\ |
| 3266 else | 4302 or its subsidiary coding system according to a detected end-of-line format.\n\ |
| 3267 { | 4303 \n\ |
| 3268 Lisp_Object val3; | 4304 If optional argument HIGHEST is non-nil, return the coding system of\n\ |
| 3269 val3 = Fget (XCONS (val2)->car, Qeol_type); | 4305 highest priority.") |
| 3270 if (VECTORP (val3)) | 4306 (start, end, highest) |
| 3271 val = Fcons (XVECTOR (val3)->contents[eol_type], val); | 4307 Lisp_Object start, end, highest; |
| 3272 else | 4308 { |
| 3273 val = Fcons (XCONS (val2)->car, val); | 4309 int from, to; |
| 3274 } | 4310 int from_byte, to_byte; |
| 3275 } | 4311 |
| 3276 } | 4312 CHECK_NUMBER_COERCE_MARKER (start, 0); |
| 3277 | 4313 CHECK_NUMBER_COERCE_MARKER (end, 1); |
| 3278 return val; | 4314 |
| 3279 } | 4315 validate_region (&start, &end); |
| 3280 | 4316 from = XINT (start), to = XINT (end); |
| 3281 /* Scan text in the region between *BEGP and *ENDP, skip characters | 4317 from_byte = CHAR_TO_BYTE (from); |
| 3282 which we never have to encode to (iff ENCODEP is 1) or decode from | 4318 to_byte = CHAR_TO_BYTE (to); |
| 3283 coding system CODING at the head and tail, then set BEGP and ENDP | 4319 |
| 3284 to the addresses of start and end of the text we actually convert. */ | 4320 if (from < GPT && to >= GPT) |
| 3285 | 4321 move_gap_both (to, to_byte); |
| 3286 void | 4322 |
| 3287 shrink_conversion_area (begp, endp, coding, encodep) | 4323 return detect_coding_system (BYTE_POS_ADDR (from_byte), |
| 3288 unsigned char **begp, **endp; | 4324 to_byte - from_byte, |
| 3289 struct coding_system *coding; | 4325 !NILP (highest)); |
| 3290 int encodep; | 4326 } |
| 3291 { | 4327 |
| 3292 register unsigned char *beg_addr = *begp, *end_addr = *endp; | 4328 DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string, |
| 3293 | 4329 1, 2, 0, |
| 3294 if (coding->eol_type != CODING_EOL_LF | 4330 "Detect coding system of the text in STRING.\n\ |
| 3295 && coding->eol_type != CODING_EOL_UNDECIDED) | 4331 Return a list of possible coding systems ordered by priority.\n\ |
| 3296 /* Since we anyway have to convert end-of-line format, it is not | 4332 \n\ |
| 3297 worth skipping at most 100 bytes or so. */ | 4333 If only ASCII characters are found, it returns `undecided'\n\ |
| 3298 return; | 4334 or its subsidiary coding system according to a detected end-of-line format.\n\ |
| 3299 | 4335 \n\ |
| 3300 if (encodep) /* for encoding */ | 4336 If optional argument HIGHEST is non-nil, return the coding system of\n\ |
| 3301 { | 4337 highest priority.") |
| 3302 switch (coding->type) | 4338 (string, highest) |
| 3303 { | 4339 Lisp_Object string, highest; |
| 3304 case coding_type_no_conversion: | 4340 { |
| 3305 case coding_type_emacs_mule: | 4341 CHECK_STRING (string, 0); |
| 3306 case coding_type_undecided: | 4342 |
| 3307 case coding_type_raw_text: | 4343 return detect_coding_system (XSTRING (string)->data, |
| 3308 /* We need no conversion. */ | 4344 XSTRING (string)->size_byte, |
| 3309 *begp = *endp; | 4345 !NILP (highest)); |
| 3310 return; | |
| 3311 case coding_type_ccl: | |
| 3312 /* We can't skip any data. */ | |
| 3313 return; | |
| 3314 case coding_type_iso2022: | |
| 3315 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL) | |
| 3316 { | |
| 3317 unsigned char *bol = beg_addr; | |
| 3318 while (beg_addr < end_addr && *beg_addr < 0x80) | |
| 3319 { | |
| 3320 beg_addr++; | |
| 3321 if (*(beg_addr - 1) == '\n') | |
| 3322 bol = beg_addr; | |
| 3323 } | |
| 3324 beg_addr = bol; | |
| 3325 goto label_skip_tail; | |
| 3326 } | |
| 3327 /* fall through ... */ | |
| 3328 default: | |
| 3329 /* We can skip all ASCII characters at the head and tail. */ | |
| 3330 while (beg_addr < end_addr && *beg_addr < 0x80) beg_addr++; | |
| 3331 label_skip_tail: | |
| 3332 while (beg_addr < end_addr && *(end_addr - 1) < 0x80) end_addr--; | |
| 3333 break; | |
| 3334 } | |
| 3335 } | |
| 3336 else /* for decoding */ | |
| 3337 { | |
| 3338 switch (coding->type) | |
| 3339 { | |
| 3340 case coding_type_no_conversion: | |
| 3341 /* We need no conversion. */ | |
| 3342 *begp = *endp; | |
| 3343 return; | |
| 3344 case coding_type_emacs_mule: | |
| 3345 case coding_type_raw_text: | |
| 3346 if (coding->eol_type == CODING_EOL_LF) | |
| 3347 { | |
| 3348 /* We need no conversion. */ | |
| 3349 *begp = *endp; | |
| 3350 return; | |
| 3351 } | |
| 3352 /* We can skip all but carriage-return. */ | |
| 3353 while (beg_addr < end_addr && *beg_addr != '\r') beg_addr++; | |
| 3354 while (beg_addr < end_addr && *(end_addr - 1) != '\r') end_addr--; | |
| 3355 break; | |
| 3356 case coding_type_sjis: | |
| 3357 case coding_type_big5: | |
| 3358 /* We can skip all ASCII characters at the head. */ | |
| 3359 while (beg_addr < end_addr && *beg_addr < 0x80) beg_addr++; | |
| 3360 /* We can skip all ASCII characters at the tail except for | |
| 3361 the second byte of SJIS or BIG5 code. */ | |
| 3362 while (beg_addr < end_addr && *(end_addr - 1) < 0x80) end_addr--; | |
| 3363 if (end_addr != *endp) | |
| 3364 end_addr++; | |
| 3365 break; | |
| 3366 case coding_type_ccl: | |
| 3367 /* We can't skip any data. */ | |
| 3368 return; | |
| 3369 default: /* i.e. case coding_type_iso2022: */ | |
| 3370 { | |
| 3371 unsigned char c; | |
| 3372 | |
| 3373 /* We can skip all ASCII characters except for a few | |
| 3374 control codes at the head. */ | |
| 3375 while (beg_addr < end_addr && (c = *beg_addr) < 0x80 | |
| 3376 && c != ISO_CODE_CR && c != ISO_CODE_SO | |
| 3377 && c != ISO_CODE_SI && c != ISO_CODE_ESC) | |
| 3378 beg_addr++; | |
| 3379 } | |
| 3380 break; | |
| 3381 } | |
| 3382 } | |
| 3383 *begp = beg_addr; | |
| 3384 *endp = end_addr; | |
| 3385 return; | |
| 3386 } | |
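For `coding_type_sjis` and `coding_type_big5`, the old `shrink_conversion_area` skips ASCII bytes at the head, skips ASCII bytes at the tail, and then keeps one byte back, because a byte below 0x80 may be the second byte of a two-byte character. A sketch of just that case; `shrink_sjis_region` is a hypothetical name and the function works on a plain byte range rather than buffer positions.

```c
#include <stddef.h>

/* Narrow [*BEG, *END) to the bytes an SJIS/Big5 decoder must see.
   Leading ASCII bytes decode to themselves.  Trailing ASCII bytes do
   too, except that the byte right after the last non-ASCII byte might
   be the (ASCII-range) trail byte of a two-byte character, so it is
   conservatively kept. */
void shrink_sjis_region (const unsigned char **beg,
                         const unsigned char **end)
{
  const unsigned char *b = *beg, *e = *end;

  while (b < e && *b < 0x80)
    b++;
  while (b < e && e[-1] < 0x80)
    e--;
  if (e != *end)
    e++;                      /* keep a possible trail byte */
  *beg = b;
  *end = e;
}
```

For the bytes `ab 82 A0 cd` the range shrinks to `82 A0 c`: the `c` is kept even though `82 A0` is already a complete character, since the scan does not look back to check. An all-ASCII input shrinks to an empty range, meaning no conversion is needed.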
| 3387 | |
| 3388 /* Encode into or decode from (according to ENCODEP) coding system CODING | |
| 3389 the text between char positions B and E. */ | |
| 3390 | |
| 3391 Lisp_Object | |
| 3392 code_convert_region (b, e, coding, encodep) | |
| 3393 Lisp_Object b, e; | |
| 3394 struct coding_system *coding; | |
| 3395 int encodep; | |
| 3396 { | |
| 3397 int beg, end, len, consumed, produced; | |
| 3398 char *buf; | |
| 3399 unsigned char *begp, *endp; | |
| 3400 int opoint = PT, opoint_byte = PT_BYTE; | |
| 3401 int beg_byte, end_byte, len_byte; | |
| 3402 int zv_before = ZV; | |
| 3403 int zv_byte_before = ZV_BYTE; | |
| 3404 | |
| 3405 validate_region (&b, &e); | |
| 3406 beg = XINT (b), end = XINT (e); | |
| 3407 beg_byte = CHAR_TO_BYTE (beg); | |
| 3408 end_byte = CHAR_TO_BYTE (end); | |
| 3409 | |
| 3410 if (beg < GPT && end >= GPT) | |
| 3411 move_gap_both (end, end_byte); | |
| 3412 | |
| 3413 if (encodep && !NILP (coding->pre_write_conversion)) | |
| 3414 { | |
| 3415 /* We must call a pre-conversion function which may put a new | |
| 3416 text to be converted in a new buffer. */ | |
| 3417 struct buffer *old = current_buffer, *new; | |
| 3418 | |
| 3419 TEMP_SET_PT_BOTH (beg, beg_byte); | |
| 3420 call2 (coding->pre_write_conversion, b, e); | |
| 3421 if (old != current_buffer) | |
| 3422 { | |
| 3423 /* Replace the original text by the text just generated. */ | |
| 3424 len = ZV - BEGV; | |
| 3425 len_byte = ZV_BYTE - BEGV_BYTE; | |
| 3426 new = current_buffer; | |
| 3427 set_buffer_internal (old); | |
| 3428 del_range_both (beg, end, beg_byte, end_byte, 1); | |
| 3429 insert_from_buffer (new, 1, len, 0); | |
| 3430 end = beg + len; | |
| 3431 end_byte = len_byte; | |
| 3432 } | |
| 3433 } | |
| 3434 | |
| 3435 /* We may be able to shrink the conversion region. */ | |
| 3436 begp = BYTE_POS_ADDR (beg_byte); | |
| 3437 endp = begp + (end_byte - beg_byte); | |
| 3438 shrink_conversion_area (&begp, &endp, coding, encodep); | |
| 3439 | |
| 3440 if (begp == endp) | |
| 3441 /* We need no conversion. */ | |
| 3442 len = end - beg; | |
| 3443 else | |
| 3444 { | |
| 3445 int shrunk_beg_byte, shrunk_end_byte; | |
| 3446 int shrunk_beg; | |
| 3447 int shrunk_len_byte; | |
| 3448 int new_len_byte; | |
| 3449 int buflen; | |
| 3450 | |
| 3451 shrunk_beg_byte = PTR_BYTE_POS (begp); | |
| 3452 shrunk_beg = BYTE_TO_CHAR (shrunk_beg_byte); | |
| 3453 shrunk_end_byte = PTR_BYTE_POS (endp); | |
| 3454 shrunk_len_byte = shrunk_end_byte - shrunk_beg_byte; | |
| 3455 | |
| 3456 if (encodep) | |
| 3457 buflen = encoding_buffer_size (coding, shrunk_len_byte); | |
| 3458 else | |
| 3459 buflen = decoding_buffer_size (coding, shrunk_len_byte); | |
| 3460 buf = get_conversion_buffer (buflen); | |
| 3461 | |
| 3462 coding->last_block = 1; | |
| 3463 produced = (encodep | |
| 3464 ? encode_coding (coding, begp, buf, shrunk_len_byte, buflen, | |
| 3465 &consumed) | |
| 3466 : decode_coding (coding, begp, buf, shrunk_len_byte, buflen, | |
| 3467 &consumed)); | |
| 3468 | |
| 3469 TEMP_SET_PT_BOTH (shrunk_beg, shrunk_beg_byte); | |
| 3470 | |
| 3471 /* We let the number of characters in the result | |
| 3472 be computed in accord with enable-multibyte-characters | |
| 3473 even when encoding. Otherwise the buffer contents | |
| 3474 will be inconsistent. */ | |
| 3475 insert (buf, produced); | |
| 3476 | |
| 3477 del_range_byte (PT_BYTE, PT_BYTE + shrunk_len_byte, 1); | |
| 3478 | |
| 3479 if (opoint >= end) | |
| 3480 { | |
| 3481 opoint += ZV - zv_before; | |
| 3482 opoint_byte += ZV_BYTE - zv_byte_before; | |
| 3483 } | |
| 3484 else if (opoint > beg) | |
| 3485 { | |
| 3486 opoint = beg; | |
| 3487 opoint_byte = beg_byte; | |
| 3488 } | |
| 3489 TEMP_SET_PT_BOTH (opoint, opoint_byte); | |
| 3490 | |
| 3491 end += ZV - zv_before; | |
| 3492 } | |
| 3493 | |
| 3494 if (!encodep && !NILP (coding->post_read_conversion)) | |
| 3495 { | |
| 3496 Lisp_Object insval; | |
| 3497 | |
| 3498 /* We must call a post-conversion function which may alter | |
| 3499 the text just converted. */ | |
| 3500 zv_before = ZV; | |
| 3501 zv_byte_before = ZV_BYTE; | |
| 3502 | |
| 3503 TEMP_SET_PT_BOTH (beg, beg_byte); | |
| 3504 insval = call1 (coding->post_read_conversion, make_number (end - beg)); | |
| 3505 CHECK_NUMBER (insval, 0); | |
| 3506 | |
| 3507 if (opoint >= beg + ZV - zv_before) | |
| 3508 { | |
| 3509 opoint += ZV - zv_before; | |
| 3510 opoint_byte += ZV_BYTE - zv_byte_before; | |
| 3511 } | |
| 3512 else if (opoint > beg) | |
| 3513 { | |
| 3514 opoint = beg; | |
| 3515 opoint_byte = beg_byte; | |
| 3516 } | |
| 3517 TEMP_SET_PT_BOTH (opoint, opoint_byte); | |
| 3518 len = XINT (insval); | |
| 3519 } | |
| 3520 | |
| 3521 return make_number (len); | |
| 3522 } | 4346 } |
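The new `detect_coding_system` walks `Vcoding_category_list` in priority order, keeps each category whose bit `1 << idx` is set in the detected mask, and stops at the first hit when HIGHEST is non-nil. A sketch of that selection over a plain array; the `CAT_*` indices and `select_categories` are hypothetical, and the array is filled in priority order directly instead of being consed in reverse and then passed to `Fnreverse`.

```c
#include <stddef.h>

/* Hypothetical category indices standing in for the
   coding-category-index property of each coding-category-* symbol. */
enum { CAT_EMACS_MULE, CAT_SJIS, CAT_ISO_7, CAT_ISO_8_1,
       CAT_BIG5, CAT_RAW_TEXT, CAT_MAX };

/* Scan PRIORITY (N category indices, highest priority first) and copy
   into OUT those whose bit is set in MASK.  With HIGHEST set, stop
   after the first match.  Returns the number of entries written. */
size_t select_categories (unsigned mask, const int *priority, size_t n,
                          int highest, int *out)
{
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (mask & (1u << priority[i]))
      {
        out[count++] = priority[i];
        if (highest)
          break;
      }
  return count;
}
```

A zero mask yields an empty selection; in the Emacs code that case is handled before the loop by returning `undecided` (or its EOL variant).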
| 3523 | 4347 |
| 3524 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, | 4348 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, |
| 3525 3, 3, "r\nzCoding system: ", | 4349 3, 3, "r\nzCoding system: ", |
| 3526 "Decode current region by specified coding system.\n\ | 4350 "Decode the current region by specified coding system.\n\ |
| 3527 When called from a program, takes three arguments:\n\ | 4351 When called from a program, takes three arguments:\n\ |
| 3528 START, END, and CODING-SYSTEM. START END are buffer positions.\n\ | 4352 START, END, and CODING-SYSTEM. START and END are buffer positions.\n\ |
| 3529 Return length of decoded text.") | 4353 Return length of decoded text.") |
| 3530 (b, e, coding_system) | 4354 (start, end, coding_system) |
| 3531 Lisp_Object b, e, coding_system; | 4355 Lisp_Object start, end, coding_system; |
| 3532 { | 4356 { |
| 3533 struct coding_system coding; | 4357 struct coding_system coding; |
| 3534 | 4358 int from, to; |
| 3535 CHECK_NUMBER_COERCE_MARKER (b, 0); | 4359 |
| 3536 CHECK_NUMBER_COERCE_MARKER (e, 1); | 4360 CHECK_NUMBER_COERCE_MARKER (start, 0); |
| 4361 CHECK_NUMBER_COERCE_MARKER (end, 1); | |
| 3537 CHECK_SYMBOL (coding_system, 2); | 4362 CHECK_SYMBOL (coding_system, 2); |
| 3538 | 4363 |
| 4364 validate_region (&start, &end); | |
| 4365 from = XFASTINT (start); | |
| 4366 to = XFASTINT (end); | |
| 4367 | |
| 3539 if (NILP (coding_system)) | 4368 if (NILP (coding_system)) |
| 3540 return make_number (XFASTINT (e) - XFASTINT (b)); | 4369 return make_number (to - from); |
| 4370 | |
| 3541 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) | 4371 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) |
| 3542 error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); | 4372 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data); |
| 3543 | 4373 |
| 3544 return code_convert_region (b, e, &coding, 0); | 4374 coding.mode |= CODING_MODE_LAST_BLOCK; |
| 4375 return code_convert_region (from, to, &coding, 0, 1); | |
| 3545 } | 4376 } |
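After replacing the converted text, the old `code_convert_region` restores point: a point at or after the end of the region moves by the change in buffer size (`ZV - zv_before`), a point inside the region snaps to its beginning, and a point at or before the beginning stays put. A sketch of that first adjustment as a pure function; `adjust_point` is a hypothetical name and DELTA stands for `ZV - zv_before`. The later adjustment after `post_read_conversion` uses a slightly different threshold and is not covered here.

```c
/* New position of point after the text in [BEG, END) was replaced and
   the buffer changed size by DELTA characters (DELTA may be negative). */
int adjust_point (int opoint, int beg, int end, int delta)
{
  if (opoint >= end)
    return opoint + delta;    /* after the region: shift with the text */
  if (opoint > beg)
    return beg;               /* inside the region: snap to its start */
  return opoint;              /* at or before the region: unchanged */
}
```

Snapping to the start avoids leaving point at an offset that no longer corresponds to the same character once the region has been re-encoded.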
| 3546 | 4377 |
| 3547 DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region, | 4378 DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region, |
| 3548 3, 3, "r\nzCoding system: ", | 4379 3, 3, "r\nzCoding system: ", |
| 3549 "Encode current region by specified coding system.\n\ | 4380 "Encode the current region by specified coding system.\n\ |
| 3550 When called from a program, takes three arguments:\n\ | 4381 When called from a program, takes three arguments:\n\ |
| 3551 START, END, and CODING-SYSTEM. START END are buffer positions.\n\ | 4382 START, END, and CODING-SYSTEM. START and END are buffer positions.\n\ |
| 3552 Return length of encoded text.") | 4383 Return length of encoded text.") |
| 3553 (b, e, coding_system) | 4384 (start, end, coding_system) |
| 3554 Lisp_Object b, e, coding_system; | 4385 Lisp_Object start, end, coding_system; |
| 3555 { | 4386 { |
| 3556 struct coding_system coding; | 4387 struct coding_system coding; |
| 3557 | 4388 int from, to; |
| 3558 CHECK_NUMBER_COERCE_MARKER (b, 0); | 4389 |
| 3559 CHECK_NUMBER_COERCE_MARKER (e, 1); | 4390 CHECK_NUMBER_COERCE_MARKER (start, 0); |
| 4391 CHECK_NUMBER_COERCE_MARKER (end, 1); | |
| 3560 CHECK_SYMBOL (coding_system, 2); | 4392 CHECK_SYMBOL (coding_system, 2); |
| 3561 | 4393 |
| 4394 validate_region (&start, &end); | |
| 4395 from = XFASTINT (start); | |
| 4396 to = XFASTINT (end); | |
| 4397 | |
| 3562 if (NILP (coding_system)) | 4398 if (NILP (coding_system)) |
| 3563 return make_number (XFASTINT (e) - XFASTINT (b)); | 4399 return make_number (to - from); |
| 4400 | |
| 3564 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) | 4401 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) |
| 3565 error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); | 4402 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data); |
| 3566 | 4403 |
| 3567 return code_convert_region (b, e, &coding, 1); | 4404 coding.mode |= CODING_MODE_LAST_BLOCK; |
| 3568 } | 4405 return code_convert_region (from, to, &coding, 1, 1); |
| 3569 | |
| 3570 /* Encode or decode (according to ENCODEP) the text of string STR | |
| 3571 using coding CODING. If NOCOPY is nil, we never return STR | |
| 3572 itself, but always a copy. If NOCOPY is non-nil, we return STR | |
| 3573 if no change is needed. */ | |
| 3574 | |
| 3575 Lisp_Object | |
| 3576 code_convert_string (str, coding, encodep, nocopy) | |
| 3577 Lisp_Object str, nocopy; | |
| 3578 struct coding_system *coding; | |
| 3579 int encodep; | |
| 3580 { | |
| 3581 int len, consumed, produced; | |
| 3582 char *buf; | |
| 3583 unsigned char *begp, *endp; | |
| 3584 int head_skip, tail_skip; | |
| 3585 struct gcpro gcpro1; | |
| 3586 | |
| 3587 if (encodep && !NILP (coding->pre_write_conversion) | |
| 3588 || !encodep && !NILP (coding->post_read_conversion)) | |
| 3589 { | |
| 3590 /* Since we have to call Lisp functions which assume target text | |
| 3591 is in a buffer, after setting a temporary buffer, call | |
| 3592 code_convert_region. */ | |
| 3593 int count = specpdl_ptr - specpdl; | |
| 3594 int len = XSTRING (str)->size_byte; | |
| 3595 Lisp_Object result; | |
| 3596 struct buffer *old = current_buffer; | |
| 3597 | |
| 3598 record_unwind_protect (Fset_buffer, Fcurrent_buffer ()); | |
| 3599 temp_output_buffer_setup (" *code-converting-work*"); | |
| 3600 set_buffer_internal (XBUFFER (Vstandard_output)); | |
| 3601 insert_from_string (str, 0, 0, XSTRING (str)->size, len, 0); | |
| 3602 code_convert_region (make_number (BEGV), make_number (ZV), | |
| 3603 coding, encodep); | |
| 3604 result = make_buffer_string (BEGV, ZV, 0); | |
| 3605 set_buffer_internal (old); | |
| 3606 return unbind_to (count, result); | |
| 3607 } | |
| 3608 | |
| 3609 /* We may be able to shrink the conversion region. */ | |
| 3610 begp = XSTRING (str)->data; | |
| 3611 endp = begp + XSTRING (str)->size_byte; | |
| 3612 shrink_conversion_area (&begp, &endp, coding, encodep); | |
| 3613 | |
| 3614 if (begp == endp) | |
| 3615 /* We need no conversion. */ | |
| 3616 return (NILP (nocopy) ? Fcopy_sequence (str) : str); | |
| 3617 | |
| 3618 /* We assume that head_skip and tail_skip count single-byte characters. */ | |
| 3619 head_skip = begp - XSTRING (str)->data; | |
| 3620 tail_skip = XSTRING (str)->size_byte - head_skip - (endp - begp); | |
| 3621 | |
| 3622 GCPRO1 (str); | |
| 3623 | |
| 3624 if (encodep) | |
| 3625 len = encoding_buffer_size (coding, endp - begp); | |
| 3626 else | |
| 3627 len = decoding_buffer_size (coding, endp - begp); | |
| 3628 buf = get_conversion_buffer (len + head_skip + tail_skip); | |
| 3629 | |
| 3630 bcopy (XSTRING (str)->data, buf, head_skip); | |
| 3631 coding->last_block = 1; | |
| 3632 produced = (encodep | |
| 3633 ? encode_coding (coding, XSTRING (str)->data + head_skip, | |
| 3634 buf + head_skip, endp - begp, len, &consumed) | |
| 3635 : decode_coding (coding, XSTRING (str)->data + head_skip, | |
| 3636 buf + head_skip, endp - begp, len, &consumed)); | |
| 3637 bcopy (XSTRING (str)->data + head_skip + (endp - begp), | |
| 3638 buf + head_skip + produced, | |
| 3639 tail_skip); | |
| 3640 | |
| 3641 UNGCPRO; | |
| 3642 | |
| 3643 if (encodep) | |
| 3644 /* When encoding, the result is all single-byte characters. */ | |
| 3645 return make_unibyte_string (buf, head_skip + produced + tail_skip); | |
| 3646 | |
| 3647 /* When decoding, count properly the number of chars in the string. */ | |
| 3648 return make_string (buf, head_skip + produced + tail_skip); | |
| 3649 } | 4406 } |
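Both versions of `code_convert_string` convert only the middle of the string: the unchanged head (`head_skip` or `from`) is copied first, the converter writes `produced` bytes after it, and the unchanged tail is appended at `head + produced`. A sketch of that splice; `toy_convert` is a hypothetical stand-in for `encode_coding`/`decode_coding` that simply turns every non-ASCII byte into `?`.

```c
#include <stddef.h>
#include <string.h>

/* Toy converter: every non-ASCII byte becomes '?'.
   Returns the number of bytes produced. */
size_t toy_convert (const unsigned char *src, size_t n, unsigned char *dst)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] < 0x80 ? src[i] : '?';
  return n;
}

/* Convert STR[0..LEN) into BUF, running the converter only on the
   middle part [HEAD, LEN - TAIL); head and tail are copied as is.
   Returns the total length of the result. */
size_t convert_with_skip (const unsigned char *str, size_t len,
                          size_t head, size_t tail, unsigned char *buf)
{
  size_t produced;

  memcpy (buf, str, head);
  produced = toy_convert (str + head, len - head - tail, buf + head);
  memcpy (buf + head + produced, str + len - tail, tail);
  return head + produced + tail;
}
```

The tail must be placed after the converted bytes rather than at its original offset, because a real converter can produce more or fewer bytes than it consumed.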
| 3650 | 4407 |
| 3651 DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string, | 4408 DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string, |
| 3652 2, 3, 0, | 4409 2, 3, 0, |
| 3653 "Decode STRING which is encoded in CODING-SYSTEM, and return the result.\n\ | 4410 "Decode STRING which is encoded in CODING-SYSTEM, and return the result.\n\ |
| 3661 CHECK_STRING (string, 0); | 4418 CHECK_STRING (string, 0); |
| 3662 CHECK_SYMBOL (coding_system, 1); | 4419 CHECK_SYMBOL (coding_system, 1); |
| 3663 | 4420 |
| 3664 if (NILP (coding_system)) | 4421 if (NILP (coding_system)) |
| 3665 return (NILP (nocopy) ? Fcopy_sequence (string) : string); | 4422 return (NILP (nocopy) ? Fcopy_sequence (string) : string); |
| 4423 | |
| 3666 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) | 4424 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) |
| 3667 error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); | 4425 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data); |
| 3668 | 4426 |
| 3669 return code_convert_string (string, &coding, 0, nocopy); | 4427 coding.mode |= CODING_MODE_LAST_BLOCK; |
| 4428 return code_convert_string (string, &coding, 0, !NILP (nocopy)); | |
| 3670 } | 4429 } |
| 3671 | 4430 |
| 3672 DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string, | 4431 DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string, |
| 3673 2, 3, 0, | 4432 2, 3, 0, |
| 3674 "Encode STRING to CODING-SYSTEM, and return the result.\n\ | 4433 "Encode STRING to CODING-SYSTEM, and return the result.\n\ |
| 3682 CHECK_STRING (string, 0); | 4441 CHECK_STRING (string, 0); |
| 3683 CHECK_SYMBOL (coding_system, 1); | 4442 CHECK_SYMBOL (coding_system, 1); |
| 3684 | 4443 |
| 3685 if (NILP (coding_system)) | 4444 if (NILP (coding_system)) |
| 3686 return (NILP (nocopy) ? Fcopy_sequence (string) : string); | 4445 return (NILP (nocopy) ? Fcopy_sequence (string) : string); |
| 4446 | |
| 3687 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) | 4447 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) |
| 3688 error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); | 4448 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data); |
| 3689 | 4449 |
| 3690 return code_convert_string (string, &coding, 1, nocopy); | 4450 coding.mode |= CODING_MODE_LAST_BLOCK; |
| 4451 return code_convert_string (string, &coding, 1, !NILP (nocopy)); | |
| 3691 } | 4452 } |
| 3692 | 4453 |
| 3693 DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0, | 4454 DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0, |
| 3694 "Decode a JISX0208 character of shift-jis encoding.\n\ | 4455 "Decode a JISX0208 character of shift-jis encoding.\n\ |
| 3695 CODE is the character code in SJIS.\n\ | 4456 CODE is the character code in SJIS.\n\ |
| 3706 XSETFASTINT (val, MAKE_NON_ASCII_CHAR (charset_jisx0208, c1, c2)); | 4467 XSETFASTINT (val, MAKE_NON_ASCII_CHAR (charset_jisx0208, c1, c2)); |
| 3707 return val; | 4468 return val; |
| 3708 } | 4469 } |
| 3709 | 4470 |
| 3710 DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0, | 4471 DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0, |
| 3711 "Encode a JISX0208 character CHAR to SJIS coding-system.\n\ | 4472 "Encode a JISX0208 character CHAR to SJIS coding system.\n\ |
| 3712 Return the corresponding character code in SJIS.") | 4473 Return the corresponding character code in SJIS.") |
| 3713 (ch) | 4474 (ch) |
| 3714 Lisp_Object ch; | 4475 Lisp_Object ch; |
| 3715 { | 4476 { |
| 3716 int charset, c1, c2, s1, s2; | 4477 int charset, c1, c2, s1, s2; |
| 3727 XSETFASTINT (val, 0); | 4488 XSETFASTINT (val, 0); |
| 3728 return val; | 4489 return val; |
| 3729 } | 4490 } |
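`decode-sjis-char` and `encode-sjis-char` map between a JIS X 0208 code (two bytes in 0x21..0x7E) and its Shift_JIS form. The standard arithmetic is sketched below; it does not claim to reproduce the `DECODE_SJIS`/`ENCODE_SJIS` macros used by Emacs (not shown in this hunk), and `jis_to_sjis`/`sjis_to_jis` are hypothetical names.

```c
/* JIS X 0208 (row J1, cell J2, each 0x21..0x7E) -> Shift_JIS bytes. */
void jis_to_sjis (int j1, int j2, int *s1, int *s2)
{
  /* Two JIS rows share one lead byte; lead bytes jump from 0x9F to
     0xE0, leaving 0xA0..0xDF for half-width katakana. */
  *s1 = ((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0);
  if (j1 & 1)
    /* Odd rows: trail bytes 0x40..0x7E and 0x80..0x9E (0x7F skipped). */
    *s2 = j2 + (j2 <= 0x5F ? 0x1F : 0x20);
  else
    /* Even rows: trail bytes 0x9F..0xFC. */
    *s2 = j2 + 0x7E;
}

/* Shift_JIS bytes -> JIS X 0208 row and cell. */
void sjis_to_jis (int s1, int s2, int *j1, int *j2)
{
  int pair = s1 - (s1 <= 0x9F ? 0x70 : 0xB0);   /* == (j1 + 1) >> 1 */

  if (s2 >= 0x9F)
    {
      *j1 = pair * 2;                           /* even row */
      *j2 = s2 - 0x7E;
    }
  else
    {
      *j1 = pair * 2 - 1;                       /* odd row */
      *j2 = s2 - (s2 >= 0x80 ? 0x20 : 0x1F);
    }
}
```

For example, JIS 0x2422 (hiragana A) becomes Shift_JIS 0x82 0xA0, and JIS 0x2121 (ideographic space) becomes 0x81 0x40.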
| 3730 | 4491 |
| 3731 DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0, | 4492 DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0, |
| 3732 "Decode a Big5 character CODE of BIG5 coding-system.\n\ | 4493 "Decode a Big5 character CODE of BIG5 coding system.\n\ |
| 3733 CODE is the character code in BIG5.\n\ | 4494 CODE is the character code in BIG5.\n\ |
| 3734 Return the corresponding character.") | 4495 Return the corresponding character.") |
| 3735 (code) | 4496 (code) |
| 3736 Lisp_Object code; | 4497 Lisp_Object code; |
| 3737 { | 4498 { |
| 3745 XSETFASTINT (val, MAKE_NON_ASCII_CHAR (charset, c1, c2)); | 4506 XSETFASTINT (val, MAKE_NON_ASCII_CHAR (charset, c1, c2)); |
| 3746 return val; | 4507 return val; |
| 3747 } | 4508 } |
| 3748 | 4509 |
| 3749 DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0, | 4510 DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0, |
| 3750 "Encode the Big5 character CHAR to BIG5 coding-system.\n\ | 4511 "Encode the Big5 character CHAR to BIG5 coding system.\n\ |
| 3751 Return the corresponding character code in Big5.") | 4512 Return the corresponding character code in Big5.") |
| 3752 (ch) | 4513 (ch) |
| 3753 Lisp_Object ch; | 4514 Lisp_Object ch; |
| 3754 { | 4515 { |
| 3755 int charset, c1, c2, b1, b2; | 4516 int charset, c1, c2, b1, b2; |
| 3913 } | 4674 } |
| 3914 } | 4675 } |
| 3915 return Qnil; | 4676 return Qnil; |
| 3916 } | 4677 } |
| 3917 | 4678 |
| 4679 DEFUN ("update-iso-coding-systems", Fupdate_iso_coding_systems, | |
| 4680 Supdate_iso_coding_systems, 0, 0, 0, | |
| 4681 "Update internal database for ISO2022 based coding systems.\n\ | |
| 4682 When values of the following coding categories are changed, you must\n\ | |
| 4683 call this function:\n\ | |
| 4684 coding-category-iso-7, coding-category-iso-7-tight,\n\ | |
| 4685 coding-category-iso-8-1, coding-category-iso-8-2,\n\ | |
| 4686 coding-category-iso-7-else, coding-category-iso-8-else") | |
| 4687 () | |
| 4688 { | |
| 4689 int i; | |
| 4690 | |
| 4691 for (i = CODING_CATEGORY_IDX_ISO_7; i <= CODING_CATEGORY_IDX_ISO_8_ELSE; | |
| 4692 i++) | |
| 4693 { | |
| 4694 if (! coding_system_table[i]) | |
| 4695 coding_system_table[i] | |
| 4696 = (struct coding_system *) xmalloc (sizeof (struct coding_system)); | |
| 4697 setup_coding_system | |
| 4698 (XSYMBOL (XVECTOR (Vcoding_category_table)->contents[i])->value, | |
| 4699 coding_system_table[i]); | |
| 4700 } | |
| 4701 return Qnil; | |
| 4702 } | |
| 4703 | |
| 3918 #endif /* emacs */ | 4704 #endif /* emacs */ |
| 3919 | 4705 |
| 3920 | 4706 |
| 3921 /*** 8. Post-amble ***/ | 4707 /*** 8. Post-amble ***/ |
| 3922 | 4708 |
| 3965 | 4751 |
| 3966 setup_coding_system (Qnil, &keyboard_coding); | 4752 setup_coding_system (Qnil, &keyboard_coding); |
| 3967 setup_coding_system (Qnil, &terminal_coding); | 4753 setup_coding_system (Qnil, &terminal_coding); |
| 3968 setup_coding_system (Qnil, &safe_terminal_coding); | 4754 setup_coding_system (Qnil, &safe_terminal_coding); |
| 3969 | 4755 |
| 4756 bzero (coding_system_table, sizeof coding_system_table); | |
| 4757 | |
| 3970 #if defined (MSDOS) || defined (WINDOWSNT) | 4758 #if defined (MSDOS) || defined (WINDOWSNT) |
| 3971 system_eol_type = CODING_EOL_CRLF; | 4759 system_eol_type = CODING_EOL_CRLF; |
| 3972 #else | 4760 #else |
| 3973 system_eol_type = CODING_EOL_LF; | 4761 system_eol_type = CODING_EOL_LF; |
| 3974 #endif | 4762 #endif |
| 4040 Fput (Qcoding_system_error, Qerror_conditions, | 4828 Fput (Qcoding_system_error, Qerror_conditions, |
| 4041 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil))); | 4829 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil))); |
| 4042 Fput (Qcoding_system_error, Qerror_message, | 4830 Fput (Qcoding_system_error, Qerror_message, |
| 4043 build_string ("Invalid coding system")); | 4831 build_string ("Invalid coding system")); |
| 4044 | 4832 |
| 4833 Qcoding_category = intern ("coding-category"); | |
| 4834 staticpro (&Qcoding_category); | |
| 4045 Qcoding_category_index = intern ("coding-category-index"); | 4835 Qcoding_category_index = intern ("coding-category-index"); |
| 4046 staticpro (&Qcoding_category_index); | 4836 staticpro (&Qcoding_category_index); |
| 4047 | 4837 |
| 4838 Vcoding_category_table | |
| 4839 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil); | |
| 4840 staticpro (&Vcoding_category_table); | |
| 4048 { | 4841 { |
| 4049 int i; | 4842 int i; |
| 4050 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) | 4843 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) |
| 4051 { | 4844 { |
| 4052 coding_category_table[i] = intern (coding_category_name[i]); | 4845 XVECTOR (Vcoding_category_table)->contents[i] |
| 4053 staticpro (&coding_category_table[i]); | 4846 = intern (coding_category_name[i]); |
| 4054 Fput (coding_category_table[i], Qcoding_category_index, | 4847 Fput (XVECTOR (Vcoding_category_table)->contents[i], |
| 4055 make_number (i)); | 4848 Qcoding_category_index, make_number (i)); |
| 4056 } | 4849 } |
| 4057 } | 4850 } |
| 4058 | 4851 |
| 4059 Qcharacter_unification_table = intern ("character-unification-table"); | 4852 Qcharacter_unification_table = intern ("character-unification-table"); |
| 4060 staticpro (&Qcharacter_unification_table); | 4853 staticpro (&Qcharacter_unification_table); |
| 4072 Qsafe_charsets = intern ("safe-charsets"); | 4865 Qsafe_charsets = intern ("safe-charsets"); |
| 4073 staticpro (&Qsafe_charsets); | 4866 staticpro (&Qsafe_charsets); |
| 4074 | 4867 |
| 4075 Qemacs_mule = intern ("emacs-mule"); | 4868 Qemacs_mule = intern ("emacs-mule"); |
| 4076 staticpro (&Qemacs_mule); | 4869 staticpro (&Qemacs_mule); |
| 4870 | |
| 4871 Qraw_text = intern ("raw-text"); | |
| 4872 staticpro (&Qraw_text); | |
| 4077 | 4873 |
| 4078 defsubr (&Scoding_system_p); | 4874 defsubr (&Scoding_system_p); |
| 4079 defsubr (&Sread_coding_system); | 4875 defsubr (&Sread_coding_system); |
| 4080 defsubr (&Sread_non_nil_coding_system); | 4876 defsubr (&Sread_non_nil_coding_system); |
| 4081 defsubr (&Scheck_coding_system); | 4877 defsubr (&Scheck_coding_system); |
| 4082 defsubr (&Sdetect_coding_region); | 4878 defsubr (&Sdetect_coding_region); |
| 4879 defsubr (&Sdetect_coding_string); | |
| 4083 defsubr (&Sdecode_coding_region); | 4880 defsubr (&Sdecode_coding_region); |
| 4084 defsubr (&Sencode_coding_region); | 4881 defsubr (&Sencode_coding_region); |
| 4085 defsubr (&Sdecode_coding_string); | 4882 defsubr (&Sdecode_coding_string); |
| 4086 defsubr (&Sencode_coding_string); | 4883 defsubr (&Sencode_coding_string); |
| 4087 defsubr (&Sdecode_sjis_char); | 4884 defsubr (&Sdecode_sjis_char); |
| 4092 defsubr (&Sset_safe_terminal_coding_system_internal); | 4889 defsubr (&Sset_safe_terminal_coding_system_internal); |
| 4093 defsubr (&Sterminal_coding_system); | 4890 defsubr (&Sterminal_coding_system); |
| 4094 defsubr (&Sset_keyboard_coding_system_internal); | 4891 defsubr (&Sset_keyboard_coding_system_internal); |
| 4095 defsubr (&Skeyboard_coding_system); | 4892 defsubr (&Skeyboard_coding_system); |
| 4096 defsubr (&Sfind_operation_coding_system); | 4893 defsubr (&Sfind_operation_coding_system); |
| 4894 defsubr (&Supdate_iso_coding_systems); | |
| 4097 | 4895 |
| 4098 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list, | 4896 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list, |
| 4099 "List of coding systems.\n\ | 4897 "List of coding systems.\n\ |
| 4100 \n\ | 4898 \n\ |
| 4101 Do not alter the value of this variable manually. This variable should be\n\ | 4899 Do not alter the value of this variable manually. This variable should be\n\ |
| 4119 int i; | 4917 int i; |
| 4120 | 4918 |
| 4121 Vcoding_category_list = Qnil; | 4919 Vcoding_category_list = Qnil; |
| 4122 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--) | 4920 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--) |
| 4123 Vcoding_category_list | 4921 Vcoding_category_list |
| 4124 = Fcons (coding_category_table[i], Vcoding_category_list); | 4922 = Fcons (XVECTOR (Vcoding_category_table)->contents[i], |
| 4923 Vcoding_category_list); | |
| 4125 } | 4924 } |
| 4126 | 4925 |
| 4127 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read, | 4926 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read, |
| 4128 "Specify the coding system for read operations.\n\ | 4927 "Specify the coding system for read operations.\n\ |
| 4129 It is useful to bind this variable with `let', but do not set it globally.\n\ | 4928 It is useful to bind this variable with `let', but do not set it globally.\n\ |
| 4247 a coding system of ISO 2022 variant which has a flag\n\ | 5046 a coding system of ISO 2022 variant which has a flag\n\ |
| 4248 `accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file\n\ | 5047 `accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file\n\ |
| 4249 or reading output of a subprocess.\n\ | 5048 or reading output of a subprocess.\n\ |
| 4250 Only the 128th through 159th elements have a meaning."); | 5049 Only the 128th through 159th elements have a meaning."); |
| 4251 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil); | 5050 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil); |
| 5051 | |
| 5052 DEFVAR_LISP ("select-safe-coding-system-function", | |
| 5053 &Vselect_safe_coding_system_function, | |
| 5054 "Function to call to select safe coding system for encoding a text.\n\ | |
| 5055 \n\ | |
| 5056 If set, this function is called to force a user to select a proper\n\ | |
| 5057 coding system which can encode the text in the case that a default\n\ | |
| 5058 coding system used in each operation can't encode the text.\n\ | |
| 5059 \n\ | |
| 5060 The default value is `select-safe-coding-system' (which see)."); | |
| 5061 Vselect_safe_coding_system_function = Qnil; | |
| 5062 | |
| 4252 } | 5063 } |
| 4253 | 5064 |
| 4254 #endif /* emacs */ | 5065 #endif /* emacs */ |
