comparison src/coding.c @ 20718:c600dea3b06b

(Vselect_safe_coding_system_function): New variable.
(coding_category_table): This variable deleted.
(Vcoding_category_table): New variable.
(coding_category_name): Add "coding-category-iso-7-tight".
(detect_coding_iso2022): Check the mask CODING_FLAG_ISO_DESIGNATION in CODING->FLAGS. Check a new coding category coding-category-iso-7-tight.
(DECODE_DESIGNATION): Decode only such designations that CODING can handle.
(check_composing_code): New function.
(decode_coding_iso2022): Decode only such characters that CODING can handle.
(encode_coding_iso2022): Before and after encoding composite characters, reset designation and invocation status.
(detect_coding_sjis): Delete unnecessary check.
(detect_coding_big5): Likewise.
(encode_designation_at_bol): Check the validity of requested designation register.
(setup_coding_system): Set requested designation registers for non-supported charsets to CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION. Set mask CODING_FLAG_ISO_DESIGNATION in CODING->FLAGS. Code tuned for no-conversion and undecided.
(detect_coding): Adjusted for the new variable Vcoding_category_table.
(syms_of_coding): Initialize Vcoding_category_table and staticpro it. Register select-safe-coding-system as a Lisp variable.
(DECODE_CHARACTER_ASCII): Update coding->produced_char.
(DECODE_CHARACTER_DIMENSION1): Likewise.
(Qraw_text, Qcoding_category): New variables.
(syms_of_coding): Intern and staticpro them.
(coding_system_table): New variable.
(CHARSET_OK, SHIFT_OUT_OK): New macros.
(detect_coding_iso2022): Detection algorithm improved.
(decode_coding_iso2022): Arg CONSUMED deleted, and the meaning of return value changed. Update members produced, produced_char, consumed, consumed_char of the struct *coding. Pay attention to CODING_MODE_INHIBIT_INCONSISTENT_EOL.
(encode_coding_iso2022): Likewise.
(decode_coding_sjis_big5, encode_coding_sjis_big5): Likewise.
(decode_eol, encode_eol): Likewise.
(ENCODE_ISO_CHARACTER): Update coding->consumed_char.
(DECODE_SJIS_BIG5_CHARACTER): Update coding->produced_char.
(ENCODE_SJIS_BIG5_CHARACTER): Update coding->consumed_char.
(detect_coding(detect_coding(detect_ITIES and SKIP.
(detect_coding): Adjusted for the change of detect_coding_mask. Update coding->heading_ascii.
(detect_eol_type): New arg SKIP.
(detect_eol): Adjusted for the change of detect_eol_type.
(ccl_coding_driver): New function.
(decode_coding): Arg CONSUMED deleted, and the meaning of return value changed. Update members produced, produced_char, consumed, consumed_char of the struct *coding.
(encode_coding): Likewise.
(shrink_decoding_region, shrink_encoding_region): New functions.
(code_convert_region, code_convert_string): Completely rewritten.
(detect_coding_sy(detect_coding_sy(detect_coding_sy(detect_coding_sy(detect_codiT.
(Fdetect_coding_string): New function.
(Fdecode_coding_region, Fencode_coding_region): Adjusted for the change of code_convert_region.
(Fdecode_coding_string, Fencode_coding_string): Adjusted for the change of code_convert_string.
(Fupdate_iso_coding_systems): New function.
(init_coding_once): Initialize coding_system_table.
author Kenichi Handa <handa@m17n.org>
date Thu, 22 Jan 1998 01:26:45 +0000
parents ed9ed828415e
children 13d0a6194de7
20717:19463997fbc6 20718:c600dea3b06b
77 If a user wants to read/write a text encoded in a coding system not 77 If a user wants to read/write a text encoded in a coding system not
78 listed above, he can supply a decoder and an encoder for it in CCL 78 listed above, he can supply a decoder and an encoder for it in CCL
79 (Code Conversion Language) programs. Emacs executes the CCL program 79 (Code Conversion Language) programs. Emacs executes the CCL program
80 while reading/writing. 80 while reading/writing.
81 81
82 Emacs represents a coding-system by a Lisp symbol that has a property 82 Emacs represents a coding system by a Lisp symbol that has a property
83 `coding-system'. But, before actually using the coding-system, the 83 `coding-system'. But, before actually using the coding system, the
84 information about it is set in a structure of type `struct 84 information about it is set in a structure of type `struct
85 coding_system' for rapid processing. See section 6 for more details. 85 coding_system' for rapid processing. See section 6 for more details.
86 86
87 */ 87 */
88 88
89 /*** GENERAL NOTES on END-OF-LINE FORMAT *** 89 /*** GENERAL NOTES on END-OF-LINE FORMAT ***
90 90
91 How end-of-line of a text is encoded depends on a system. For 91 How end-of-line of a text is encoded depends on a system. For
92 instance, Unix's format is just one byte of `line-feed' code, 92 instance, Unix's format is just one byte of `line-feed' code,
93 whereas DOS's format is two-byte sequence of `carriage-return' and 93 whereas DOS's format is two-byte sequence of `carriage-return' and
94 `line-feed' codes. MacOS's format is one byte of `carriage-return'. 94 `line-feed' codes. MacOS's format is usually one byte of
95 `carriage-return'.
95 96
96 Since text characters encoding and end-of-line encoding are 97 Since text characters encoding and end-of-line encoding are
97 independent, any coding system described above can take 98 independent, any coding system described above can take
98 any format of end-of-line. So, Emacs has information of format of 99 any format of end-of-line. So, Emacs has information of format of
99 end-of-line in each coding-system. See section 6 for more details. 100 end-of-line in each coding-system. See section 6 for more details.
118 119
119 /*** GENERAL NOTES on `decode_coding_XXX ()' functions *** 120 /*** GENERAL NOTES on `decode_coding_XXX ()' functions ***
120 121
121 These functions decode SRC_BYTES length text at SOURCE encoded in 122 These functions decode SRC_BYTES length text at SOURCE encoded in
122 CODING to Emacs' internal format (emacs-mule). The resulting text 123 CODING to Emacs' internal format (emacs-mule). The resulting text
123 goes to a place pointed to by DESTINATION, the length of which should 124 goes to a place pointed to by DESTINATION, the length of which
124 not exceed DST_BYTES. The number of bytes actually processed is 125 should not exceed DST_BYTES. These functions set the information of
125 returned as *CONSUMED. The return value is the length of the decoded 126 original and decoded texts in the members produced, produced_char,
126 text. Below is a template of these functions. */ 127 consumed, and consumed_char of the structure *CODING.
128
129 The return value is an integer (CODING_FINISH_XXX) indicating how
130 the decoding finished.
131
132 DST_BYTES zero means that source area and destination area are
133 overlapped, which means that we can produce a decoded text until it
134 reaches the head of not-yet-decoded source text.
135
136 Below is a template of these functions. */
127 #if 0 137 #if 0
128 decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) 138 decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
129 struct coding_system *coding; 139 struct coding_system *coding;
130 unsigned char *source, *destination; 140 unsigned char *source, *destination;
131 int src_bytes, dst_bytes; 141 int src_bytes, dst_bytes;
132 int *consumed;
133 { 142 {
134 ... 143 ...
135 } 144 }
136 #endif 145 #endif
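The revised convention in the note above (a finish status as the return value, with byte and character counts stored in the coding structure) can be sketched as follows. This is only an illustration under stated assumptions, not code from coding.c: `struct demo_coding`, the `DEMO_FINISH_*` values, and `demo_decode_ascii` are hypothetical names, and the "decoder" simply copies bytes, treating one byte as one character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_FINISH_XXX values. */
enum demo_finish
{
  DEMO_FINISH_NORMAL,           /* all source bytes were consumed */
  DEMO_FINISH_INSUFFICIENT_DST  /* the destination filled up first */
};

/* Hypothetical stand-in for the result members of struct coding_system. */
struct demo_coding
{
  int produced;       /* bytes written to the destination */
  int produced_char;  /* characters written to the destination */
  int consumed;       /* bytes read from the source */
  int consumed_char;  /* characters read from the source */
};

/* Copy bytes from SOURCE to DESTINATION, at most DST_BYTES of them.
   The counts are reported through *CODING; the return value only
   says how the conversion finished.  */
int
demo_decode_ascii (struct demo_coding *coding,
                   const unsigned char *source, unsigned char *destination,
                   int src_bytes, int dst_bytes)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  unsigned char *dst = destination, *dst_end = destination + dst_bytes;
  int result = DEMO_FINISH_NORMAL;

  assert (coding != NULL && src_bytes >= 0 && dst_bytes >= 0);
  while (src < src_end)
    {
      if (dst >= dst_end)
        {
          result = DEMO_FINISH_INSUFFICIENT_DST;
          break;
        }
      *dst++ = *src++;
    }
  /* One byte is one character in this simplified sketch.  */
  coding->consumed = coding->consumed_char = (int) (src - source);
  coding->produced = coding->produced_char = (int) (dst - destination);
  return result;
}
```

A caller would inspect the return value first and then read the counts from the structure, rather than receiving the consumed length through an extra pointer argument.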
137 146
138 /*** GENERAL NOTES on `encode_coding_XXX ()' functions *** 147 /*** GENERAL NOTES on `encode_coding_XXX ()' functions ***
139 148
140 These functions encode SRC_BYTES length text at SOURCE of Emacs' 149 These functions encode SRC_BYTES length text at SOURCE of Emacs'
141 internal format (emacs-mule) to CODING. The resulting text goes to 150 internal format (emacs-mule) to CODING. The resulting text goes to
142 a place pointed to by DESTINATION, the length of which should not 151 a place pointed to by DESTINATION, the length of which should not
143 exceed DST_BYTES. The number of bytes actually processed is 152 exceed DST_BYTES. These functions set the information of
144 returned as *CONSUMED. The return value is the length of the 153 original and encoded texts in the members produced, produced_char,
145 encoded text. Below is a template of these functions. */ 154 consumed, and consumed_char of the structure *CODING.
155
156 The return value is an integer (CODING_FINISH_XXX) indicating how
157 the encoding finished.
158
159 DST_BYTES zero means that source area and destination area are
160 overlapped, which means that we can produce an encoded text until it
161 reaches the head of not-yet-encoded source text.
162
163 Below is a template of these functions. */
146 #if 0 164 #if 0
147 encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes, consumed) 165 encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
148 struct coding_system *coding; 166 struct coding_system *coding;
149 unsigned char *source, *destination; 167 unsigned char *source, *destination;
150 int src_bytes, dst_bytes; 168 int src_bytes, dst_bytes;
151 int *consumed;
152 { 169 {
153 ... 170 ...
154 } 171 }
155 #endif 172 #endif
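The overlapping case mentioned in the note (DST_BYTES zero) relies on the write position staying behind the read position. A minimal sketch of that idea, assuming a conversion whose output never grows (CR LF collapsed to LF), is shown below. The function name `demo_crlf_to_lf_in_place` is hypothetical, and the real routines in this change keep an additional safety margin in the loop condition because a converted character may be longer than its source.

```c
#include <assert.h>
#include <stddef.h>

/* Convert CR LF pairs in BUF (LEN bytes) to single LF bytes, writing
   the result over the same buffer.  Return the new length.  */
size_t
demo_crlf_to_lf_in_place (unsigned char *buf, size_t len)
{
  unsigned char *src = buf, *src_end = buf + len;
  unsigned char *dst = buf;

  while (src < src_end)
    {
      unsigned char c = *src++;

      if (c == '\r' && src < src_end && *src == '\n')
        c = *src++;             /* collapse CR LF into LF */
      /* At least one byte has been read since the last write, so the
         write position is strictly behind the unread source text.  */
      assert (dst < src);
      *dst++ = c;
    }
  return (size_t) (dst - buf);
}
```

Because every iteration consumes at least as many bytes as it produces, the not-yet-converted text is never overwritten.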
156 173
198 #define DECODE_CHARACTER_ASCII(c) \ 215 #define DECODE_CHARACTER_ASCII(c) \
199 do { \ 216 do { \
200 if (COMPOSING_P (coding->composing)) \ 217 if (COMPOSING_P (coding->composing)) \
201 *dst++ = 0xA0, *dst++ = (c) | 0x80; \ 218 *dst++ = 0xA0, *dst++ = (c) | 0x80; \
202 else \ 219 else \
203 *dst++ = (c); \ 220 { \
221 *dst++ = (c); \
222 coding->produced_char++; \
223 } \
204 } while (0) 224 } while (0)
205 225
206 /* Decode one DIMENSION1 character whose charset is CHARSET and whose 226 /* Decode one DIMENSION1 character whose charset is CHARSET and whose
207 position-code is C. */ 227 position-code is C. */
208 228
210 do { \ 230 do { \
211 unsigned char leading_code = CHARSET_LEADING_CODE_BASE (charset); \ 231 unsigned char leading_code = CHARSET_LEADING_CODE_BASE (charset); \
212 if (COMPOSING_P (coding->composing)) \ 232 if (COMPOSING_P (coding->composing)) \
213 *dst++ = leading_code + 0x20; \ 233 *dst++ = leading_code + 0x20; \
214 else \ 234 else \
215 *dst++ = leading_code; \ 235 { \
236 *dst++ = leading_code; \
237 coding->produced_char++; \
238 } \
216 if (leading_code = CHARSET_LEADING_CODE_EXT (charset)) \ 239 if (leading_code = CHARSET_LEADING_CODE_EXT (charset)) \
217 *dst++ = leading_code; \ 240 *dst++ = leading_code; \
218 *dst++ = (c) | 0x80; \ 241 *dst++ = (c) | 0x80; \
219 } while (0) 242 } while (0)
220 243
258 extern Lisp_Object Qinsert_file_contents, Qwrite_region; 281 extern Lisp_Object Qinsert_file_contents, Qwrite_region;
259 Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument; 282 Lisp_Object Qcall_process, Qcall_process_region, Qprocess_argument;
260 Lisp_Object Qstart_process, Qopen_network_stream; 283 Lisp_Object Qstart_process, Qopen_network_stream;
261 Lisp_Object Qtarget_idx; 284 Lisp_Object Qtarget_idx;
262 285
286 Lisp_Object Vselect_safe_coding_system_function;
287
263 /* Mnemonic character of each format of end-of-line. */ 288 /* Mnemonic character of each format of end-of-line. */
264 int eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac; 289 int eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
265 /* Mnemonic character to indicate format of end-of-line is not yet 290 /* Mnemonic character to indicate format of end-of-line is not yet
266 decided. */ 291 decided. */
267 int eol_mnemonic_undecided; 292 int eol_mnemonic_undecided;
274 299
275 Lisp_Object Vcoding_system_list, Vcoding_system_alist; 300 Lisp_Object Vcoding_system_list, Vcoding_system_alist;
276 301
277 Lisp_Object Qcoding_system_p, Qcoding_system_error; 302 Lisp_Object Qcoding_system_p, Qcoding_system_error;
278 303
279 /* Coding system emacs-mule is for converting only end-of-line format. */ 304 /* Coding systems emacs-mule and raw-text are for converting only
280 Lisp_Object Qemacs_mule; 305 end-of-line format. */
306 Lisp_Object Qemacs_mule, Qraw_text;
281 307
282 /* Coding-systems are handed between Emacs Lisp programs and C internal 308 /* Coding-systems are handed between Emacs Lisp programs and C internal
283 routines by the following three variables. */ 309 routines by the following three variables. */
284 /* Coding-system for reading files and receiving data from process. */ 310 /* Coding-system for reading files and receiving data from process. */
285 Lisp_Object Vcoding_system_for_read; 311 Lisp_Object Vcoding_system_for_read;
309 Lisp_Object Vprocess_coding_system_alist; 335 Lisp_Object Vprocess_coding_system_alist;
310 Lisp_Object Vnetwork_coding_system_alist; 336 Lisp_Object Vnetwork_coding_system_alist;
311 337
312 #endif /* emacs */ 338 #endif /* emacs */
313 339
314 Lisp_Object Qcoding_category_index; 340 Lisp_Object Qcoding_category, Qcoding_category_index;
315 341
316 /* List of symbols `coding-category-xxx' ordered by priority. */ 342 /* List of symbols `coding-category-xxx' ordered by priority. */
317 Lisp_Object Vcoding_category_list; 343 Lisp_Object Vcoding_category_list;
318 344
319 /* Table of coding-systems currently assigned to each coding-category. */ 345 /* Table of coding categories (Lisp symbols). */
320 Lisp_Object coding_category_table[CODING_CATEGORY_IDX_MAX]; 346 Lisp_Object Vcoding_category_table;
321 347
322 /* Table of names of symbol for each coding-category. */ 348 /* Table of names of symbol for each coding-category. */
323 char *coding_category_name[CODING_CATEGORY_IDX_MAX] = { 349 char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
324 "coding-category-emacs-mule", 350 "coding-category-emacs-mule",
325 "coding-category-sjis", 351 "coding-category-sjis",
326 "coding-category-iso-7", 352 "coding-category-iso-7",
353 "coding-category-iso-7-tight",
327 "coding-category-iso-8-1", 354 "coding-category-iso-8-1",
328 "coding-category-iso-8-2", 355 "coding-category-iso-8-2",
329 "coding-category-iso-7-else", 356 "coding-category-iso-7-else",
330 "coding-category-iso-8-else", 357 "coding-category-iso-8-else",
331 "coding-category-big5", 358 "coding-category-big5",
332 "coding-category-raw-text", 359 "coding-category-raw-text",
333 "coding-category-binary" 360 "coding-category-binary"
334 }; 361 };
362
363 /* Table of pointers to coding systems corresponding to each coding
364 category. */
365 struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];
335 366
336 /* Flag to tell if we look up unification table on character code 367 /* Flag to tell if we look up unification table on character code
337 conversion. */ 368 conversion. */
338 Lisp_Object Venable_character_unification; 369 Lisp_Object Venable_character_unification;
339 /* Standard unification table to look up on decoding (reading). */ 370 /* Standard unification table to look up on decoding (reading). */
397 return 0; \ 428 return 0; \
398 } while (0) 429 } while (0)
399 430
400 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". 431 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
401 Check if a text is encoded in Emacs' internal format. If it is, 432 Check if a text is encoded in Emacs' internal format. If it is,
402 return CODING_CATEGORY_MASK_EMASC_MULE, else return 0. */ 433 return CODING_CATEGORY_MASK_EMACS_MULE, else return 0. */
403 434
404 int 435 int
405 detect_coding_emacs_mule (src, src_end) 436 detect_coding_emacs_mule (src, src_end)
406 unsigned char *src, *src_end; 437 unsigned char *src, *src_end;
407 { 438 {
607 Since these are not standard escape sequences of any ISO, the use 638 Since these are not standard escape sequences of any ISO, the use
608 of them for these meaning is restricted to Emacs only. */ 639 of them for these meaning is restricted to Emacs only. */
609 640
610 enum iso_code_class_type iso_code_class[256]; 641 enum iso_code_class_type iso_code_class[256];
611 642
643 #define CHARSET_OK(idx, charset) \
644 (CODING_SPEC_ISO_REQUESTED_DESIGNATION \
645 (coding_system_table[idx], charset) \
646 != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION)
647
648 #define SHIFT_OUT_OK(idx) \
649 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
650
612 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". 651 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
613 Check if a text is encoded in ISO2022. If it is, returns an 652 Check if a text is encoded in ISO2022. If it is, returns an
614 integer in which appropriate flag bits any of: 653 integer in which appropriate flag bits any of:
615 CODING_CATEGORY_MASK_ISO_7 654 CODING_CATEGORY_MASK_ISO_7
655 CODING_CATEGORY_MASK_ISO_7_TIGHT
616 CODING_CATEGORY_MASK_ISO_8_1 656 CODING_CATEGORY_MASK_ISO_8_1
617 CODING_CATEGORY_MASK_ISO_8_2 657 CODING_CATEGORY_MASK_ISO_8_2
618 CODING_CATEGORY_MASK_ISO_7_ELSE 658 CODING_CATEGORY_MASK_ISO_7_ELSE
619 CODING_CATEGORY_MASK_ISO_8_ELSE 659 CODING_CATEGORY_MASK_ISO_8_ELSE
620 are set. If a code which should never appear in ISO2022 is found, 660 are set. If a code which should never appear in ISO2022 is found,
622 662
623 int 663 int
624 detect_coding_iso2022 (src, src_end) 664 detect_coding_iso2022 (src, src_end)
625 unsigned char *src, *src_end; 665 unsigned char *src, *src_end;
626 { 666 {
627 int mask = (CODING_CATEGORY_MASK_ISO_7 667 int mask = CODING_CATEGORY_MASK_ISO;
628 | CODING_CATEGORY_MASK_ISO_8_1 668 int mask_found = 0;
629 | CODING_CATEGORY_MASK_ISO_8_2 669 int reg[4], shift_out = 0;
630 | CODING_CATEGORY_MASK_ISO_7_ELSE 670 int c, c1, i, charset;
631 | CODING_CATEGORY_MASK_ISO_8_ELSE 671
632 ); 672 reg[0] = CHARSET_ASCII, reg[1] = reg[2] = reg[3] = -1;
633 int g1 = 0; /* 1 iff designating to G1. */
634 int c, i;
635 struct coding_system coding_iso_8_1, coding_iso_8_2;
636
637 /* Coding systems of these categories may accept latin extra codes. */
638 setup_coding_system
639 (XSYMBOL (coding_category_table[CODING_CATEGORY_IDX_ISO_8_1])->value,
640 &coding_iso_8_1);
641 setup_coding_system
642 (XSYMBOL (coding_category_table[CODING_CATEGORY_IDX_ISO_8_2])->value,
643 &coding_iso_8_2);
644
645 while (mask && src < src_end) 673 while (mask && src < src_end)
646 { 674 {
647 c = *src++; 675 c = *src++;
648 switch (c) 676 switch (c)
649 { 677 {
650 case ISO_CODE_ESC: 678 case ISO_CODE_ESC:
651 if (src >= src_end) 679 if (src >= src_end)
652 break; 680 break;
653 c = *src++; 681 c = *src++;
654 if ((c >= '(' && c <= '/')) 682 if (c >= '(' && c <= '/')
655 { 683 {
656 /* Designation sequence for a charset of dimension 1. */ 684 /* Designation sequence for a charset of dimension 1. */
657 if (src >= src_end) 685 if (src >= src_end)
658 break; 686 break;
659 c = *src++; 687 c1 = *src++;
660 if (c < ' ' || c >= 0x80) 688 if (c1 < ' ' || c1 >= 0x80
661 /* Invalid designation sequence. */ 689 || (charset = iso_charset_table[0][c >= ','][c1]) < 0)
662 return 0; 690 /* Invalid designation sequence. Just ignore. */
691 break;
692 reg[(c - '(') % 4] = charset;
663 } 693 }
664 else if (c == '$') 694 else if (c == '$')
665 { 695 {
666 /* Designation sequence for a charset of dimension 2. */ 696 /* Designation sequence for a charset of dimension 2. */
667 if (src >= src_end) 697 if (src >= src_end)
668 break; 698 break;
669 c = *src++; 699 c = *src++;
670 if (c >= '@' && c <= 'B') 700 if (c >= '@' && c <= 'B')
671 /* Designation for JISX0208.1978, GB2312, or JISX0208. */ 701 /* Designation for JISX0208.1978, GB2312, or JISX0208. */
672 ; 702 reg[0] = charset = iso_charset_table[1][0][c];
673 else if (c >= '(' && c <= '/') 703 else if (c >= '(' && c <= '/')
674 { 704 {
675 if (src >= src_end) 705 if (src >= src_end)
676 break; 706 break;
677 c = *src++; 707 c1 = *src++;
678 if (c < ' ' || c >= 0x80) 708 if (c1 < ' ' || c1 >= 0x80
679 /* Invalid designation sequence. */ 709 || (charset = iso_charset_table[1][c >= ','][c1]) < 0)
680 return 0; 710 /* Invalid designation sequence. Just ignore. */
711 break;
712 reg[(c - '(') % 4] = charset;
681 } 713 }
682 else 714 else
683 /* Invalid designation sequence. */ 715 /* Invalid designation sequence. Just ignore. */
684 return 0; 716 break;
685 } 717 }
686 else if (c == 'N' || c == 'O' || c == 'n' || c == 'o') 718 else if (c == 'N' || c == 'n')
687 /* Locking shift. */ 719 {
688 mask &= (CODING_CATEGORY_MASK_ISO_7_ELSE 720 if (shift_out == 0
689 | CODING_CATEGORY_MASK_ISO_8_ELSE); 721 && (reg[1] >= 0
722 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
723 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
724 {
725 /* Locking shift out. */
726 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
727 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
728 shift_out = 1;
729 }
730 break;
731 }
732 else if (c == 'O' || c == 'o')
733 {
734 if (shift_out == 1)
735 {
736 /* Locking shift in. */
737 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
738 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
739 shift_out = 0;
740 }
741 break;
742 }
690 else if (c == '0' || c == '1' || c == '2') 743 else if (c == '0' || c == '1' || c == '2')
691 /* Start/end composition. */ 744 /* Start/end composition. Just ignore. */
692 ; 745 break;
693 else 746 else
694 /* Invalid escape sequence. */ 747 /* Invalid escape sequence. Just ignore. */
695 return 0; 748 break;
749
750 /* We found a valid designation sequence for CHARSET. */
751 mask &= ~CODING_CATEGORY_MASK_ISO_8BIT;
752 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7, charset))
753 mask_found |= CODING_CATEGORY_MASK_ISO_7;
754 else
755 mask &= ~CODING_CATEGORY_MASK_ISO_7;
756 if (CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_TIGHT, charset))
757 mask_found |= CODING_CATEGORY_MASK_ISO_7_TIGHT;
758 else
759 mask &= ~CODING_CATEGORY_MASK_ISO_7_TIGHT;
760 if (! CHARSET_OK (CODING_CATEGORY_IDX_ISO_7_ELSE, charset))
761 mask &= ~CODING_CATEGORY_MASK_ISO_7_ELSE;
762 if (! CHARSET_OK (CODING_CATEGORY_IDX_ISO_8_ELSE, charset))
763 mask &= ~CODING_CATEGORY_MASK_ISO_8_ELSE;
696 break; 764 break;
697 765
698 case ISO_CODE_SO: 766 case ISO_CODE_SO:
699 mask &= (CODING_CATEGORY_MASK_ISO_7_ELSE 767 if (shift_out == 0
700 | CODING_CATEGORY_MASK_ISO_8_ELSE); 768 && (reg[1] >= 0
769 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_7_ELSE)
770 || SHIFT_OUT_OK (CODING_CATEGORY_IDX_ISO_8_ELSE)))
771 {
772 /* Locking shift out. */
773 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
774 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
775 }
701 break; 776 break;
702 777
778 case ISO_CODE_SI:
779 if (shift_out == 1)
780 {
781 /* Locking shift in. */
782 mask &= ~CODING_CATEGORY_MASK_ISO_7BIT;
783 mask_found |= CODING_CATEGORY_MASK_ISO_SHIFT;
784 }
785 break;
786
703 case ISO_CODE_CSI: 787 case ISO_CODE_CSI:
704 case ISO_CODE_SS2: 788 case ISO_CODE_SS2:
705 case ISO_CODE_SS3: 789 case ISO_CODE_SS3:
706 { 790 {
707 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE; 791 int newmask = CODING_CATEGORY_MASK_ISO_8_ELSE;
708 792
709 if (c != ISO_CODE_CSI) 793 if (c != ISO_CODE_CSI)
710 { 794 {
711 if (coding_iso_8_1.flags & CODING_FLAG_ISO_SINGLE_SHIFT) 795 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
796 & CODING_FLAG_ISO_SINGLE_SHIFT)
712 newmask |= CODING_CATEGORY_MASK_ISO_8_1; 797 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
713 if (coding_iso_8_2.flags & CODING_FLAG_ISO_SINGLE_SHIFT) 798 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
799 & CODING_FLAG_ISO_SINGLE_SHIFT)
714 newmask |= CODING_CATEGORY_MASK_ISO_8_2; 800 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
715 } 801 }
716 if (VECTORP (Vlatin_extra_code_table) 802 if (VECTORP (Vlatin_extra_code_table)
717 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) 803 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
718 { 804 {
719 if (coding_iso_8_1.flags & CODING_FLAG_ISO_LATIN_EXTRA) 805 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
806 & CODING_FLAG_ISO_LATIN_EXTRA)
720 newmask |= CODING_CATEGORY_MASK_ISO_8_1; 807 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
721 if (coding_iso_8_2.flags & CODING_FLAG_ISO_LATIN_EXTRA) 808 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
809 & CODING_FLAG_ISO_LATIN_EXTRA)
722 newmask |= CODING_CATEGORY_MASK_ISO_8_2; 810 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
723 } 811 }
724 mask &= newmask; 812 mask &= newmask;
813 mask_found |= newmask;
725 } 814 }
726 break; 815 break;
727 816
728 default: 817 default:
729 if (c < 0x80) 818 if (c < 0x80)
733 if (VECTORP (Vlatin_extra_code_table) 822 if (VECTORP (Vlatin_extra_code_table)
734 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) 823 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
735 { 824 {
736 int newmask = 0; 825 int newmask = 0;
737 826
738 if (coding_iso_8_1.flags & CODING_FLAG_ISO_LATIN_EXTRA) 827 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_1]->flags
828 & CODING_FLAG_ISO_LATIN_EXTRA)
739 newmask |= CODING_CATEGORY_MASK_ISO_8_1; 829 newmask |= CODING_CATEGORY_MASK_ISO_8_1;
740 if (coding_iso_8_2.flags & CODING_FLAG_ISO_LATIN_EXTRA) 830 if (coding_system_table[CODING_CATEGORY_IDX_ISO_8_2]->flags
831 & CODING_FLAG_ISO_LATIN_EXTRA)
741 newmask |= CODING_CATEGORY_MASK_ISO_8_2; 832 newmask |= CODING_CATEGORY_MASK_ISO_8_2;
742 mask &= newmask; 833 mask &= newmask;
834 mask_found |= newmask;
743 } 835 }
744 else 836 else
745 return 0; 837 return 0;
746 } 838 }
747 else 839 else
748 { 840 {
749 unsigned char *src_begin = src; 841 unsigned char *src_begin = src;
750 842
751 mask &= ~(CODING_CATEGORY_MASK_ISO_7 843 mask &= ~(CODING_CATEGORY_MASK_ISO_7BIT
752 | CODING_CATEGORY_MASK_ISO_7_ELSE); 844 | CODING_CATEGORY_MASK_ISO_7_ELSE);
845 mask_found |= CODING_CATEGORY_MASK_ISO_8_1;
753 while (src < src_end && *src >= 0xA0) 846 while (src < src_end && *src >= 0xA0)
754 src++; 847 src++;
755 if ((src - src_begin - 1) & 1 && src < src_end) 848 if ((src - src_begin - 1) & 1 && src < src_end)
756 mask &= ~CODING_CATEGORY_MASK_ISO_8_2; 849 mask &= ~CODING_CATEGORY_MASK_ISO_8_2;
850 else
851 mask_found |= CODING_CATEGORY_MASK_ISO_8_2;
757 } 852 }
758 break; 853 break;
759 } 854 }
760 } 855 }
761 856
762 return mask; 857 return (mask & mask_found);
763 } 858 }
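The revised detector keeps two bit sets: `mask` holds the categories not yet ruled out, `mask_found` holds the categories for which positive evidence was seen, and only their intersection is returned. A much simplified sketch of that pattern follows. The category bits `DEMO_MASK_7BIT` and `DEMO_MASK_8BIT` and the function `demo_detect` are hypothetical, and the byte classification is far cruder than the ISO 2022 analysis in the function above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MASK_7BIT 0x01  /* hypothetical: 7-bit encoding using escapes */
#define DEMO_MASK_8BIT 0x02  /* hypothetical: 8-bit encoding */
#define DEMO_MASK_ALL  (DEMO_MASK_7BIT | DEMO_MASK_8BIT)

/* Return the categories that are both still possible and positively
   indicated by the LEN bytes at SRC, or 0 if there are none.  */
int
demo_detect (const unsigned char *src, size_t len)
{
  int mask = DEMO_MASK_ALL;  /* categories not yet ruled out */
  int mask_found = 0;        /* categories with positive evidence */
  size_t i;

  assert (src != NULL || len == 0);
  for (i = 0; mask && i < len; i++)
    {
      unsigned char c = src[i];

      if (c == 0x1B)         /* ESC: evidence for the 7-bit category */
        {
          mask &= ~DEMO_MASK_8BIT;
          mask_found |= DEMO_MASK_7BIT;
        }
      else if (c >= 0xA0)    /* high byte: evidence for the 8-bit category */
        {
          mask &= ~DEMO_MASK_7BIT;
          mask_found |= DEMO_MASK_8BIT;
        }
      else if (c >= 0x80)    /* 0x80..0x9F: valid in neither category here */
        return 0;
    }
  return mask & mask_found;
}
```

With this arrangement, plain ASCII input yields 0 because nothing was positively found, instead of matching every category that merely was not excluded.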
764 859
765 /* Decode a character of which charset is CHARSET and the 1st position 860 /* Decode a character of which charset is CHARSET and the 1st position
766 code is C1. If dimension of CHARSET is 2, the 2nd position code is 861 code is C1. If dimension of CHARSET is 2, the 2nd position code is
767 fetched from SRC and set to C2. If CHARSET is negative, it means 862 fetched from SRC and set to C2. If CHARSET is negative, it means
806 /* To tell a composition rule follows. */ \ 901 /* To tell a composition rule follows. */ \
807 coding->composing = COMPOSING_WITH_RULE_RULE; \ 902 coding->composing = COMPOSING_WITH_RULE_RULE; \
808 } while (0) 903 } while (0)
809 904
810 /* Set designation state into CODING. */ 905 /* Set designation state into CODING. */
811 #define DECODE_DESIGNATION(reg, dimension, chars, final_char) \ 906 #define DECODE_DESIGNATION(reg, dimension, chars, final_char) \
812 do { \ 907 do { \
813 int charset = ISO_CHARSET_TABLE (make_number (dimension), \ 908 int charset = ISO_CHARSET_TABLE (make_number (dimension), \
814 make_number (chars), \ 909 make_number (chars), \
815 make_number (final_char)); \ 910 make_number (final_char)); \
816 if (charset >= 0) \ 911 if (charset >= 0 \
817 { \ 912 && CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) == reg) \
818 if (coding->direction == 1 \ 913 { \
819 && CHARSET_REVERSE_CHARSET (charset) >= 0) \ 914 if (coding->spec.iso2022.last_invalid_designation_register == 0 \
820 charset = CHARSET_REVERSE_CHARSET (charset); \ 915 && reg == 0 \
821 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \ 916 && charset == CHARSET_ASCII) \
822 } \ 917 { \
918 /* We should insert this designation sequence as is so \
919 that it is surely written back to a file. */ \
920 coding->spec.iso2022.last_invalid_designation_register = -1; \
921 goto label_invalid_code; \
922 } \
923 coding->spec.iso2022.last_invalid_designation_register = -1; \
924 if ((coding->mode & CODING_MODE_DIRECTION) \
925 && CHARSET_REVERSE_CHARSET (charset) >= 0) \
926 charset = CHARSET_REVERSE_CHARSET (charset); \
927 CODING_SPEC_ISO_DESIGNATION (coding, reg) = charset; \
928 } \
929 else \
930 { \
931 coding->spec.iso2022.last_invalid_designation_register = reg; \
932 goto label_invalid_code; \
933 } \
823 } while (0) 934 } while (0)
824 935
936 /* Check if the current composing sequence contains only valid codes.
937 If the composing sequence doesn't end before SRC_END, return -1.
938 Else, if it contains only valid codes, return 0.
939 Else return the length of the composing sequence. */
940
941 int check_composing_code (coding, src, src_end)
942 struct coding_system *coding;
943 unsigned char *src, *src_end;
944 {
945 unsigned char *src_start = src;
946 int invalid_code_found = 0;
947 int charset, c, c1, dim;
948
949 while (src < src_end)
950 {
951 if (*src++ != ISO_CODE_ESC) continue;
952 if (src >= src_end) break;
953 if ((c = *src++) == '1') /* end of composition */
954 return (invalid_code_found ? src - src_start : 0);
955 if (src + 2 >= src_end) break;
956 if (!(coding->flags & CODING_FLAG_ISO_DESIGNATION))
957 invalid_code_found = 1;
958 else
959 {
960 dim = 0;
961 if (c == '$')
962 {
963 dim = 1;
964 c = (*src >= '@' && *src <= 'B') ? '(' : *src++;
965 }
966 if (c >= '(' && c <= '/')
967 {
968 c1 = *src++;
969 if ((c1 < ' ' || c1 >= 0x80)
970 || (charset = iso_charset_table[dim][c >= ','][c1]) < 0
971 || (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
972 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
973 invalid_code_found = 1;
974 }
975 else
976 invalid_code_found = 1;
977 }
978 }
979 return ((coding->mode & CODING_MODE_LAST_BLOCK) ? src_end - src_start : -1);
980 }
981
825 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */ 982 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". */
826 983
827 int 984 int
828 decode_coding_iso2022 (coding, source, destination, 985 decode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
829 src_bytes, dst_bytes, consumed)
830 struct coding_system *coding; 986 struct coding_system *coding;
831 unsigned char *source, *destination; 987 unsigned char *source, *destination;
832 int src_bytes, dst_bytes; 988 int src_bytes, dst_bytes;
833 int *consumed;
834 { 989 {
835 unsigned char *src = source; 990 unsigned char *src = source;
836 unsigned char *src_end = source + src_bytes; 991 unsigned char *src_end = source + src_bytes;
837 unsigned char *dst = destination; 992 unsigned char *dst = destination;
838 unsigned char *dst_end = destination + dst_bytes; 993 unsigned char *dst_end = destination + dst_bytes;
843 int charset; 998 int charset;
844 /* Charsets invoked to graphic plane 0 and 1 respectively. */ 999 /* Charsets invoked to graphic plane 0 and 1 respectively. */
845 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); 1000 int charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
846 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); 1001 int charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
847 Lisp_Object unification_table 1002 Lisp_Object unification_table
848 = coding->character_unification_table_for_decode; 1003 = coding->character_unification_table_for_decode;
1004 int result = CODING_FINISH_NORMAL;
849 1005
850 if (!NILP (Venable_character_unification) && NILP (unification_table)) 1006 if (!NILP (Venable_character_unification) && NILP (unification_table))
851 unification_table = Vstandard_character_unification_table_for_decode; 1007 unification_table = Vstandard_character_unification_table_for_decode;
852 1008
853 while (src < src_end && dst < adjusted_dst_end) 1009 coding->produced_char = 0;
1010 while (src < src_end && (dst_bytes
1011 ? (dst < adjusted_dst_end)
1012 : (dst < src - 6)))
854 { 1013 {
855 /* SRC_BASE remembers the start position in source in each loop. 1014 /* SRC_BASE remembers the start position in source in each loop.
856 The loop will be exited when there's not enough source text 1015 The loop will be exited when there's not enough source text
857 to analyze long escape sequence or 2-byte code (within macros 1016 to analyze long escape sequence or 2-byte code (within macros
858 ONE_MORE_BYTE or TWO_MORE_BYTES). In that case, SRC is reset 1017 ONE_MORE_BYTE or TWO_MORE_BYTES). In that case, SRC is reset
866 if (!coding->composing 1025 if (!coding->composing
867 && (charset0 < 0 || CHARSET_CHARS (charset0) == 94)) 1026 && (charset0 < 0 || CHARSET_CHARS (charset0) == 94))
868 { 1027 {
869 /* This is SPACE or DEL. */ 1028 /* This is SPACE or DEL. */
870 *dst++ = c1; 1029 *dst++ = c1;
1030 coding->produced_char++;
871 break; 1031 break;
872 } 1032 }
873 /* This is a graphic character, we fall down ... */ 1033 /* This is a graphic character, we fall down ... */
874 1034
875 case ISO_graphic_plane_0: 1035 case ISO_graphic_plane_0:
882 else 1042 else
883 DECODE_ISO_CHARACTER (charset0, c1); 1043 DECODE_ISO_CHARACTER (charset0, c1);
884 break; 1044 break;
885 1045
886 case ISO_0xA0_or_0xFF: 1046 case ISO_0xA0_or_0xFF:
887 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94) 1047 if (charset1 < 0 || CHARSET_CHARS (charset1) == 94
1048 || coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
888 { 1049 {
889 /* Invalid code. */ 1050 /* Invalid code. */
890 *dst++ = c1; 1051 *dst++ = c1;
1052 coding->produced_char++;
891 break; 1053 break;
892 } 1054 }
893 /* This is a graphic character, we fall down ... */ 1055 /* This is a graphic character, we fall down ... */
894 1056
895 case ISO_graphic_plane_1: 1057 case ISO_graphic_plane_1:
896 DECODE_ISO_CHARACTER (charset1, c1); 1058 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
1059 {
1060 /* Invalid code. */
1061 *dst++ = c1;
1062 coding->produced_char++;
1063 }
1064 else
1065 DECODE_ISO_CHARACTER (charset1, c1);
897 break; 1066 break;
898 1067
899 case ISO_control_code: 1068 case ISO_control_code:
900 /* All ISO2022 control characters in this class have the 1069 /* All ISO2022 control characters in this class have the
901 same representation in Emacs internal format. */ 1070 same representation in Emacs internal format. */
1071 if (c1 == '\n'
1072 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1073 && (coding->eol_type == CODING_EOL_CR
1074 || coding->eol_type == CODING_EOL_CRLF))
1075 {
1076 result = CODING_FINISH_INCONSISTENT_EOL;
1077 goto label_end_of_loop_2;
1078 }
902 *dst++ = c1; 1079 *dst++ = c1;
1080 coding->produced_char++;
903 break; 1081 break;
904 1082
905 case ISO_carriage_return: 1083 case ISO_carriage_return:
906 if (coding->eol_type == CODING_EOL_CR) 1084 if (coding->eol_type == CODING_EOL_CR)
907 { 1085 *dst++ = '\n';
908 *dst++ = '\n';
909 }
910 else if (coding->eol_type == CODING_EOL_CRLF) 1086 else if (coding->eol_type == CODING_EOL_CRLF)
911 { 1087 {
912 ONE_MORE_BYTE (c1); 1088 ONE_MORE_BYTE (c1);
913 if (c1 == ISO_CODE_LF) 1089 if (c1 == ISO_CODE_LF)
914 *dst++ = '\n'; 1090 *dst++ = '\n';
915 else 1091 else
916 { 1092 {
1093 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
1094 {
1095 result = CODING_FINISH_INCONSISTENT_EOL;
1096 goto label_end_of_loop_2;
1097 }
917 src--; 1098 src--;
918 *dst++ = c1; 1099 *dst++ = '\r';
919 } 1100 }
920 } 1101 }
921 else 1102 else
922 { 1103 *dst++ = c1;
923 *dst++ = c1; 1104 coding->produced_char++;
924 }
925 break; 1105 break;
926 1106
927 case ISO_shift_out: 1107 case ISO_shift_out:
928 if (CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0) 1108 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
929 goto label_invalid_escape_sequence; 1109 || CODING_SPEC_ISO_DESIGNATION (coding, 1) < 0)
1110 goto label_invalid_code;
930 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1; 1111 CODING_SPEC_ISO_INVOCATION (coding, 0) = 1;
931 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); 1112 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
932 break; 1113 break;
933 1114
934 case ISO_shift_in: 1115 case ISO_shift_in:
1116 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
1117 goto label_invalid_code;
935 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0; 1118 CODING_SPEC_ISO_INVOCATION (coding, 0) = 0;
936 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); 1119 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
937 break; 1120 break;
938 1121
939 case ISO_single_shift_2_7: 1122 case ISO_single_shift_2_7:
940 case ISO_single_shift_2: 1123 case ISO_single_shift_2:
1124 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1125 goto label_invalid_code;
941 /* SS2 is handled as an escape sequence of ESC 'N' */ 1126 /* SS2 is handled as an escape sequence of ESC 'N' */
942 c1 = 'N'; 1127 c1 = 'N';
943 goto label_escape_sequence; 1128 goto label_escape_sequence;
944 1129
945 case ISO_single_shift_3: 1130 case ISO_single_shift_3:
1131 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
1132 goto label_invalid_code;
946 /* SS3 is handled as an escape sequence of ESC 'O' */ 1133 /* SS3 is handled as an escape sequence of ESC 'O' */
947 c1 = 'O'; 1134 c1 = 'O';
948 goto label_escape_sequence; 1135 goto label_escape_sequence;
949 1136
950 case ISO_control_sequence_introducer: 1137 case ISO_control_sequence_introducer:
961 switch (c1) 1148 switch (c1)
962 { 1149 {
963 case '&': /* revision of following character set */ 1150 case '&': /* revision of following character set */
964 ONE_MORE_BYTE (c1); 1151 ONE_MORE_BYTE (c1);
965 if (!(c1 >= '@' && c1 <= '~')) 1152 if (!(c1 >= '@' && c1 <= '~'))
966 goto label_invalid_escape_sequence; 1153 goto label_invalid_code;
967 ONE_MORE_BYTE (c1); 1154 ONE_MORE_BYTE (c1);
968 if (c1 != ISO_CODE_ESC) 1155 if (c1 != ISO_CODE_ESC)
969 goto label_invalid_escape_sequence; 1156 goto label_invalid_code;
970 ONE_MORE_BYTE (c1); 1157 ONE_MORE_BYTE (c1);
971 goto label_escape_sequence; 1158 goto label_escape_sequence;
972 1159
973 case '$': /* designation of 2-byte character set */ 1160 case '$': /* designation of 2-byte character set */
1161 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1162 goto label_invalid_code;
974 ONE_MORE_BYTE (c1); 1163 ONE_MORE_BYTE (c1);
975 if (c1 >= '@' && c1 <= 'B') 1164 if (c1 >= '@' && c1 <= 'B')
976 { /* designation of JISX0208.1978, GB2312.1980, 1165 { /* designation of JISX0208.1978, GB2312.1980,
977 or JISX0208.1980 */ 1166 or JISX0208.1980 */
978 DECODE_DESIGNATION (0, 2, 94, c1); 1167 DECODE_DESIGNATION (0, 2, 94, c1);
986 { /* designation of DIMENSION2_CHARS96 character set */ 1175 { /* designation of DIMENSION2_CHARS96 character set */
987 ONE_MORE_BYTE (c2); 1176 ONE_MORE_BYTE (c2);
988 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2); 1177 DECODE_DESIGNATION (c1 - 0x2C, 2, 96, c2);
989 } 1178 }
990 else 1179 else
991 goto label_invalid_escape_sequence; 1180 goto label_invalid_code;
992 break; 1181 break;
993 1182
994 case 'n': /* invocation of locking-shift-2 */ 1183 case 'n': /* invocation of locking-shift-2 */
995 if (CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) 1184 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
996 goto label_invalid_escape_sequence; 1185 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1186 goto label_invalid_code;
997 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2; 1187 CODING_SPEC_ISO_INVOCATION (coding, 0) = 2;
998 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); 1188 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
999 break; 1189 break;
1000 1190
1001 case 'o': /* invocation of locking-shift-3 */ 1191 case 'o': /* invocation of locking-shift-3 */
1002 if (CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) 1192 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)
1003 goto label_invalid_escape_sequence; 1193 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1194 goto label_invalid_code;
1004 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3; 1195 CODING_SPEC_ISO_INVOCATION (coding, 0) = 3;
1005 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); 1196 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1006 break; 1197 break;
1007 1198
1008 case 'N': /* invocation of single-shift-2 */ 1199 case 'N': /* invocation of single-shift-2 */
1009 if (CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0) 1200 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1010 goto label_invalid_escape_sequence; 1201 || CODING_SPEC_ISO_DESIGNATION (coding, 2) < 0)
1202 goto label_invalid_code;
1011 ONE_MORE_BYTE (c1); 1203 ONE_MORE_BYTE (c1);
1012 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2); 1204 charset = CODING_SPEC_ISO_DESIGNATION (coding, 2);
1013 DECODE_ISO_CHARACTER (charset, c1); 1205 DECODE_ISO_CHARACTER (charset, c1);
1014 break; 1206 break;
1015 1207
1016 case 'O': /* invocation of single-shift-3 */ 1208 case 'O': /* invocation of single-shift-3 */
1017 if (CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0) 1209 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)
1018 goto label_invalid_escape_sequence; 1210 || CODING_SPEC_ISO_DESIGNATION (coding, 3) < 0)
1211 goto label_invalid_code;
1019 ONE_MORE_BYTE (c1); 1212 ONE_MORE_BYTE (c1);
1020 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3); 1213 charset = CODING_SPEC_ISO_DESIGNATION (coding, 3);
1021 DECODE_ISO_CHARACTER (charset, c1); 1214 DECODE_ISO_CHARACTER (charset, c1);
1022 break; 1215 break;
1023 1216
1024 case '0': /* start composing without embedded rules */ 1217 case '0': case '2': /* start composing */
1025 coding->composing = COMPOSING_NO_RULE_HEAD; 1218 /* Before processing composing, we must be sure that all
1219 characters being composed are supported by CODING.
1220 If not, we must give up composing and insert the
1221 bunch of codes for composing as is without decoding. */
1222 {
1223 int result1;
1224
1225 result1 = check_composing_code (coding, src, src_end);
1226 if (result1 == 0)
1227 coding->composing = (c1 == '0'
1228 ? COMPOSING_NO_RULE_HEAD
1229 : COMPOSING_WITH_RULE_HEAD);
1230 else if (result1 > 0)
1231 {
1232 if (result1 + 2 < (dst_bytes ? dst_end : src_base) - dst)
1233 {
1234 bcopy (src_base, dst, result1 + 2);
1235 src += result1;
1236 dst += result1 + 2;
1237 coding->produced_char += result1 + 2;
1238 }
1239 else
1240 {
1241 result = CODING_FINISH_INSUFFICIENT_DST;
1242 goto label_end_of_loop_2;
1243 }
1244 }
1245 else
1246 goto label_end_of_loop;
1247 }
1026 break; 1248 break;
1027 1249
1028 case '1': /* end composing */ 1250 case '1': /* end composing */
1029 coding->composing = COMPOSING_NO; 1251 coding->composing = COMPOSING_NO;
1252 coding->produced_char++;
1030 break; 1253 break;
1031 1254
1032 case '2': /* start composing with embedded rules */
1033 coding->composing = COMPOSING_WITH_RULE_HEAD;
1034 break;
1035
1036 case '[': /* specification of direction */ 1255 case '[': /* specification of direction */
1256 if (coding->flags & CODING_FLAG_ISO_NO_DIRECTION)
1257 goto label_invalid_code;
1037 /* For the moment, nested direction is not supported. 1258 /* For the moment, nested direction is not supported.
1038 So, the value of `coding->direction' is 0 or 1: 0 1259 So, `coding->mode & CODING_MODE_DIRECTION' zero means
1039 means left-to-right, 1 means right-to-left. */ 1260 left-to-right, and nonzero means right-to-left. */
1040 ONE_MORE_BYTE (c1); 1261 ONE_MORE_BYTE (c1);
1041 switch (c1) 1262 switch (c1)
1042 { 1263 {
1043 case ']': /* end of the current direction */ 1264 case ']': /* end of the current direction */
1044 coding->direction = 0; 1265 coding->mode &= ~CODING_MODE_DIRECTION;
1045 1266
1046 case '0': /* end of the current direction */ 1267 case '0': /* end of the current direction */
1047 case '1': /* start of left-to-right direction */ 1268 case '1': /* start of left-to-right direction */
1048 ONE_MORE_BYTE (c1); 1269 ONE_MORE_BYTE (c1);
1049 if (c1 == ']') 1270 if (c1 == ']')
1050 coding->direction = 0; 1271 coding->mode &= ~CODING_MODE_DIRECTION;
1051 else 1272 else
1052 goto label_invalid_escape_sequence; 1273 goto label_invalid_code;
1053 break; 1274 break;
1054 1275
1055 case '2': /* start of right-to-left direction */ 1276 case '2': /* start of right-to-left direction */
1056 ONE_MORE_BYTE (c1); 1277 ONE_MORE_BYTE (c1);
1057 if (c1 == ']') 1278 if (c1 == ']')
1058 coding->direction= 1; 1279 coding->mode |= CODING_MODE_DIRECTION;
1059 else 1280 else
1060 goto label_invalid_escape_sequence; 1281 goto label_invalid_code;
1061 break; 1282 break;
1062 1283
1063 default: 1284 default:
1064 goto label_invalid_escape_sequence; 1285 goto label_invalid_code;
1065 } 1286 }
1066 break; 1287 break;
1067 1288
1068 default: 1289 default:
1290 if (! (coding->flags & CODING_FLAG_ISO_DESIGNATION))
1291 goto label_invalid_code;
1069 if (c1 >= 0x28 && c1 <= 0x2B) 1292 if (c1 >= 0x28 && c1 <= 0x2B)
1070 { /* designation of DIMENSION1_CHARS94 character set */ 1293 { /* designation of DIMENSION1_CHARS94 character set */
1071 ONE_MORE_BYTE (c2); 1294 ONE_MORE_BYTE (c2);
1072 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2); 1295 DECODE_DESIGNATION (c1 - 0x28, 1, 94, c2);
1073 } 1296 }
1076 ONE_MORE_BYTE (c2); 1299 ONE_MORE_BYTE (c2);
1077 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2); 1300 DECODE_DESIGNATION (c1 - 0x2C, 1, 96, c2);
1078 } 1301 }
1079 else 1302 else
1080 { 1303 {
1081 goto label_invalid_escape_sequence; 1304 goto label_invalid_code;
1082 } 1305 }
1083 } 1306 }
1084 /* We must update these variables now. */ 1307 /* We must update these variables now. */
1085 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0); 1308 charset0 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 0);
1086 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1); 1309 charset1 = CODING_SPEC_ISO_PLANE_CHARSET (coding, 1);
1087 break; 1310 break;
1088 1311
1089 label_invalid_escape_sequence: 1312 label_invalid_code:
1090 { 1313 coding->produced_char += src - src_base;
1091 int length = src - src_base; 1314 while (src_base < src)
1092 1315 *dst++ = *src_base++;
1093 bcopy (src_base, dst, length);
1094 dst += length;
1095 }
1096 } 1316 }
1097 continue; 1317 continue;
1098 1318
1099 label_end_of_loop: 1319 label_end_of_loop:
1100 coding->carryover_size = src - src_base; 1320 result = CODING_FINISH_INSUFFICIENT_SRC;
1101 bcopy (src_base, coding->carryover, coding->carryover_size); 1321 label_end_of_loop_2:
1102 src = src_base; 1322 src = src_base;
1103 break; 1323 break;
1104 } 1324 }
1325
1326 if (result == CODING_FINISH_NORMAL
1327 && src < src_end)
1328 result = CODING_FINISH_INSUFFICIENT_DST;
1105 1329
1106 /* If this is the last block of the text to be decoded, we had 1330 /* If this is the last block of the text to be decoded, we had
1107 better just flush out all remaining codes in the text although 1331 better just flush out all remaining codes in the text although
1108 they are not valid characters. */ 1332 they are not valid characters. */
1109 if (coding->last_block) 1333 if (coding->mode & CODING_MODE_LAST_BLOCK)
1110 { 1334 {
1111 bcopy (src, dst, src_end - src); 1335 bcopy (src, dst, src_end - src);
1112 dst += (src_end - src); 1336 dst += (src_end - src);
1113 src = src_end; 1337 src = src_end;
1114 } 1338 }
1115 *consumed = src - source; 1339 coding->consumed = coding->consumed_char = src - source;
1116 return dst - destination; 1340 coding->produced = dst - destination;
1341 return result;
1117 } 1342 }
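
The new calling convention in the hunk above (a status code as the return value, byte counts stored in *CODING) can be illustrated with a much smaller decoder. The following is only a sketch under stated assumptions: every `toy_` and `TOY_` name is hypothetical, the enum values are not the ones from coding.h, and the function handles nothing but CRLF-to-LF conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_FINISH_* codes named in the
   diff; the numeric values here are assumptions.  */
enum toy_finish
{
  TOY_FINISH_NORMAL,
  TOY_FINISH_INSUFFICIENT_SRC,
  TOY_FINISH_INSUFFICIENT_DST,
  TOY_FINISH_INCONSISTENT_EOL
};

struct toy_coding
{
  int inhibit_inconsistent_eol; /* models CODING_MODE_INHIBIT_INCONSISTENT_EOL */
  size_t consumed, produced;    /* byte counts reported back to the caller */
};

/* Decode CRLF-terminated text into LF-terminated text.  The status is
   returned; the counts are stored in *CODING.  */
static enum toy_finish
toy_decode_crlf (struct toy_coding *coding,
                 const unsigned char *source, unsigned char *destination,
                 size_t src_bytes, size_t dst_bytes)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  unsigned char *dst = destination, *dst_end = destination + dst_bytes;
  enum toy_finish result = TOY_FINISH_NORMAL;

  assert (coding != NULL && source != NULL && destination != NULL);
  while (src < src_end)
    {
      /* SRC_BASE is the rewind point, as in the real loop.  */
      const unsigned char *src_base = src;
      unsigned char c = *src++;

      if (dst >= dst_end)
        {
          result = TOY_FINISH_INSUFFICIENT_DST;
          src = src_base;
          break;
        }
      if (c == '\r')
        {
          if (src == src_end)
            {
              /* Cannot tell yet whether an LF follows.  */
              result = TOY_FINISH_INSUFFICIENT_SRC;
              src = src_base;
              break;
            }
          if (*src == '\n')
            src++, c = '\n';
          else if (coding->inhibit_inconsistent_eol)
            {
              result = TOY_FINISH_INCONSISTENT_EOL;
              src = src_base;
              break;
            }
        }
      else if (c == '\n' && coding->inhibit_inconsistent_eol)
        {
          /* A bare LF contradicts the CRLF convention.  */
          result = TOY_FINISH_INCONSISTENT_EOL;
          src = src_base;
          break;
        }
      *dst++ = c;
    }
  coding->consumed = (size_t) (src - source);
  coding->produced = (size_t) (dst - destination);
  return result;
}
```

A caller of such a function would presumably inspect the returned code, then resume from `source + coding->consumed` once more input or more output space is available.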
1118 1343
1119 /* ISO2022 encoding stuff. */ 1344 /* ISO2022 encoding stuff. */
1120 1345
1121 /* 1346 /*
1122 It is not enough to say just "ISO2022" on encoding, we have to 1347 It is not enough to say just "ISO2022" on encoding, we have to
1123 specify more details. In Emacs, each coding-system of ISO2022 1348 specify more details. In Emacs, each coding system of ISO2022
1124 variant has the following specifications: 1349 variant has the following specifications:
1125 1. Initial designation to G0 thru G3. 1350 1. Initial designation to G0 thru G3.
1126 2. Allows short-form designation? 1351 2. Allows short-form designation?
1127 3. ASCII should be designated to G0 before control characters? 1352 3. ASCII should be designated to G0 before control characters?
1128 4. ASCII should be designated to G0 at end of line? 1353 4. ASCII should be designated to G0 at end of line?
1327 charset_alt = charset; \ 1552 charset_alt = charset; \
1328 if (CHARSET_DIMENSION (charset_alt) == 1) \ 1553 if (CHARSET_DIMENSION (charset_alt) == 1) \
1329 ENCODE_ISO_CHARACTER_DIMENSION1 (charset_alt, c1); \ 1554 ENCODE_ISO_CHARACTER_DIMENSION1 (charset_alt, c1); \
1330 else \ 1555 else \
1331 ENCODE_ISO_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ 1556 ENCODE_ISO_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \
1557 if (! COMPOSING_P (coding->composing)) \
1558 coding->consumed_char++; \
1332 } while (0) 1559 } while (0)
1333 1560
1334 /* Produce designation and invocation codes at a place pointed by DST 1561 /* Produce designation and invocation codes at a place pointed by DST
1335 to use CHARSET. The element `spec.iso2022' of *CODING is updated. 1562 to use CHARSET. The element `spec.iso2022' of *CODING is updated.
1336 Return new DST. */ 1563 Return new DST. */
1429 ENCODE_DESIGNATION \ 1656 ENCODE_DESIGNATION \
1430 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \ 1657 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, reg), reg, coding); \
1431 } while (0) 1658 } while (0)
1432 1659
1433 /* Produce designation sequences of charsets in the line started from 1660 /* Produce designation sequences of charsets in the line started from
1434 *SRC to a place pointed by DSTP. 1661 SRC to a place pointed by *DSTP, and update DSTP.
1435 1662
1436 If the current block ends before any end-of-line, we may fail to 1663 If the current block ends before any end-of-line, we may fail to
1437 find all the necessary *designations. */ 1664 find all the necessary designations. */
1665
1438 encode_designation_at_bol (coding, table, src, src_end, dstp) 1666 encode_designation_at_bol (coding, table, src, src_end, dstp)
1439 struct coding_system *coding; 1667 struct coding_system *coding;
1440 Lisp_Object table; 1668 Lisp_Object table;
1441 unsigned char *src, *src_end, **dstp; 1669 unsigned char *src, *src_end, **dstp;
1442 { 1670 {
1463 if ((c_alt = unify_char (table, -1, charset, c1, c2)) >= 0) 1691 if ((c_alt = unify_char (table, -1, charset, c1, c2)) >= 0)
1464 charset = CHAR_CHARSET (c_alt); 1692 charset = CHAR_CHARSET (c_alt);
1465 } 1693 }
1466 1694
1467 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset); 1695 reg = CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset);
1468 if (r[reg] < 0) 1696 if (reg != CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION && r[reg] < 0)
1469 { 1697 {
1470 found++; 1698 found++;
1471 r[reg] = charset; 1699 r[reg] = charset;
1472 } 1700 }
1473 1701
1485 } 1713 }
1486 1714
1487 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */ 1715 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
1488 1716
1489 int 1717 int
1490 encode_coding_iso2022 (coding, source, destination, 1718 encode_coding_iso2022 (coding, source, destination, src_bytes, dst_bytes)
1491 src_bytes, dst_bytes, consumed)
1492 struct coding_system *coding; 1719 struct coding_system *coding;
1493 unsigned char *source, *destination; 1720 unsigned char *source, *destination;
1494 int src_bytes, dst_bytes; 1721 int src_bytes, dst_bytes;
1495 int *consumed;
1496 { 1722 {
1497 unsigned char *src = source; 1723 unsigned char *src = source;
1498 unsigned char *src_end = source + src_bytes; 1724 unsigned char *src_end = source + src_bytes;
1499 unsigned char *dst = destination; 1725 unsigned char *dst = destination;
1500 unsigned char *dst_end = destination + dst_bytes; 1726 unsigned char *dst_end = destination + dst_bytes;
1502 from DST_END to assure overflow checking is necessary only at the 1728 from DST_END to assure overflow checking is necessary only at the
1503 head of loop. */ 1729 head of loop. */
1504 unsigned char *adjusted_dst_end = dst_end - 19; 1730 unsigned char *adjusted_dst_end = dst_end - 19;
1505 Lisp_Object unification_table 1731 Lisp_Object unification_table
1506 = coding->character_unification_table_for_encode; 1732 = coding->character_unification_table_for_encode;
1733 int result = CODING_FINISH_NORMAL;
1507 1734
1508 if (!NILP (Venable_character_unification) && NILP (unification_table)) 1735 if (!NILP (Venable_character_unification) && NILP (unification_table))
1509 unification_table = Vstandard_character_unification_table_for_encode; 1736 unification_table = Vstandard_character_unification_table_for_encode;
1510 1737
1511 while (src < src_end && dst < adjusted_dst_end) 1738 coding->consumed_char = 0;
1739 while (src < src_end && (dst_bytes
1740 ? (dst < adjusted_dst_end)
1741 : (dst < src - 19)))
1512 { 1742 {
1513 /* SRC_BASE remembers the start position in source in each loop. 1743 /* SRC_BASE remembers the start position in source in each loop.
1514 The loop will be exited when there's not enough source text 1744 The loop will be exited when there's not enough source text
1515 to analyze multi-byte codes (within macros ONE_MORE_BYTE, 1745 to analyze multi-byte codes (within macros ONE_MORE_BYTE,
1516 TWO_MORE_BYTES, and THREE_MORE_BYTES). In that case, SRC is 1746 TWO_MORE_BYTES, and THREE_MORE_BYTES). In that case, SRC is
1527 CODING_SPEC_ISO_BOL (coding) = 0; 1757 CODING_SPEC_ISO_BOL (coding) = 0;
1528 } 1758 }
1529 1759
1530 c1 = *src++; 1760 c1 = *src++;
1531 /* If we are seeing a component of a composite character, we are 1761 /* If we are seeing a component of a composite character, we are
1532 seeing a leading-code specially encoded for composition, or a 1762 seeing a leading-code encoded irregularly for composition, or
1533 composition rule if composing with rule. We must set C1 1763 a composition rule if composing with rule. We must set C1 to
1534 to a normal leading-code or an ASCII code. If we are not at 1764 a normal leading-code or an ASCII code. If we are not seeing
1535 a composed character, we must reset the composition state. */ 1765 a composite character, we must reset composition,
1766 designation, and invocation states. */
1536 if (COMPOSING_P (coding->composing)) 1767 if (COMPOSING_P (coding->composing))
1537 { 1768 {
1538 if (c1 < 0xA0) 1769 if (c1 < 0xA0)
1539 { 1770 {
1540 /* We are not in a composite character any longer. */ 1771 /* We are not in a composite character any longer. */
1541 coding->composing = COMPOSING_NO; 1772 coding->composing = COMPOSING_NO;
1773 ENCODE_RESET_PLANE_AND_REGISTER;
1542 ENCODE_COMPOSITION_END; 1774 ENCODE_COMPOSITION_END;
1543 } 1775 }
1544 else 1776 else
1545 { 1777 {
1546 if (coding->composing == COMPOSING_WITH_RULE_RULE) 1778 if (coding->composing == COMPOSING_WITH_RULE_RULE)
1573 1805
1574 case EMACS_control_code: 1806 case EMACS_control_code:
1575 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) 1807 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
1576 ENCODE_RESET_PLANE_AND_REGISTER; 1808 ENCODE_RESET_PLANE_AND_REGISTER;
1577 *dst++ = c1; 1809 *dst++ = c1;
1810 coding->consumed_char++;
1578 break; 1811 break;
1579 1812
1580 case EMACS_carriage_return_code: 1813 case EMACS_carriage_return_code:
1581 if (!coding->selective) 1814 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
1582 { 1815 {
1583 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL) 1816 if (coding->flags & CODING_FLAG_ISO_RESET_AT_CNTL)
1584 ENCODE_RESET_PLANE_AND_REGISTER; 1817 ENCODE_RESET_PLANE_AND_REGISTER;
1585 *dst++ = c1; 1818 *dst++ = c1;
1819 coding->consumed_char++;
1586 break; 1820 break;
1587 } 1821 }
1588 /* fall down to treat '\r' as '\n' ... */ 1822 /* fall down to treat '\r' as '\n' ... */
1589 1823
1590 case EMACS_linefeed_code: 1824 case EMACS_linefeed_code:
1600 else if (coding->eol_type == CODING_EOL_CRLF) 1834 else if (coding->eol_type == CODING_EOL_CRLF)
1601 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF; 1835 *dst++ = ISO_CODE_CR, *dst++ = ISO_CODE_LF;
1602 else 1836 else
1603 *dst++ = ISO_CODE_CR; 1837 *dst++ = ISO_CODE_CR;
1604 CODING_SPEC_ISO_BOL (coding) = 1; 1838 CODING_SPEC_ISO_BOL (coding) = 1;
1839 coding->consumed_char++;
1605 break; 1840 break;
1606 1841
1607 case EMACS_leading_code_2: 1842 case EMACS_leading_code_2:
1608 ONE_MORE_BYTE (c2); 1843 ONE_MORE_BYTE (c2);
1609 if (c2 < 0xA0) 1844 if (c2 < 0xA0)
1610 { 1845 {
1611 /* invalid sequence */ 1846 /* invalid sequence */
1612 *dst++ = c1; 1847 *dst++ = c1;
1613 *dst++ = c2; 1848 *dst++ = c2;
1849 coding->consumed_char += 2;
1614 } 1850 }
1615 else 1851 else
1616 ENCODE_ISO_CHARACTER (c1, c2, /* dummy */ c3); 1852 ENCODE_ISO_CHARACTER (c1, c2, /* dummy */ c3);
1617 break; 1853 break;
1618 1854
1622 { 1858 {
1623 /* invalid sequence */ 1859 /* invalid sequence */
1624 *dst++ = c1; 1860 *dst++ = c1;
1625 *dst++ = c2; 1861 *dst++ = c2;
1626 *dst++ = c3; 1862 *dst++ = c3;
1863 coding->consumed_char += 3;
1627 } 1864 }
1628 else if (c1 < LEADING_CODE_PRIVATE_11) 1865 else if (c1 < LEADING_CODE_PRIVATE_11)
1629 ENCODE_ISO_CHARACTER (c1, c2, c3); 1866 ENCODE_ISO_CHARACTER (c1, c2, c3);
1630 else 1867 else
1631 ENCODE_ISO_CHARACTER (c2, c3, /* dummy */ c4); 1868 ENCODE_ISO_CHARACTER (c2, c3, /* dummy */ c4);
1638 /* invalid sequence */ 1875 /* invalid sequence */
1639 *dst++ = c1; 1876 *dst++ = c1;
1640 *dst++ = c2; 1877 *dst++ = c2;
1641 *dst++ = c3; 1878 *dst++ = c3;
1642 *dst++ = c4; 1879 *dst++ = c4;
1880 coding->consumed_char += 4;
1643 } 1881 }
1644 else 1882 else
1645 ENCODE_ISO_CHARACTER (c2, c3, c4); 1883 ENCODE_ISO_CHARACTER (c2, c3, c4);
1646 break; 1884 break;
1647 1885
1650 if (c2 < 0xA0) 1888 if (c2 < 0xA0)
1651 { 1889 {
1652 /* invalid sequence */ 1890 /* invalid sequence */
1653 *dst++ = c1; 1891 *dst++ = c1;
1654 *dst++ = c2; 1892 *dst++ = c2;
1893 coding->consumed_char += 2;
1655 } 1894 }
1656 else if (c2 == 0xFF) 1895 else if (c2 == 0xFF)
1657 { 1896 {
1897 ENCODE_RESET_PLANE_AND_REGISTER;
1658 coding->composing = COMPOSING_WITH_RULE_HEAD; 1898 coding->composing = COMPOSING_WITH_RULE_HEAD;
1659 ENCODE_COMPOSITION_WITH_RULE_START; 1899 ENCODE_COMPOSITION_WITH_RULE_START;
1900 coding->consumed_char++;
1660 } 1901 }
1661 else 1902 else
1662 { 1903 {
1904 ENCODE_RESET_PLANE_AND_REGISTER;
1663 /* Rewind one byte because it is a character code of 1905 /* Rewind one byte because it is a character code of
1664 composition elements. */ 1906 composition elements. */
1665 src--; 1907 src--;
1666 coding->composing = COMPOSING_NO_RULE_HEAD; 1908 coding->composing = COMPOSING_NO_RULE_HEAD;
1667 ENCODE_COMPOSITION_NO_RULE_START; 1909 ENCODE_COMPOSITION_NO_RULE_START;
1910 coding->consumed_char++;
1668 } 1911 }
1669 break; 1912 break;
1670 1913
1671 case EMACS_invalid_code: 1914 case EMACS_invalid_code:
1672 *dst++ = c1; 1915 *dst++ = c1;
1916 coding->consumed_char++;
1673 break; 1917 break;
1674 } 1918 }
1675 continue; 1919 continue;
1676 label_end_of_loop: 1920 label_end_of_loop:
1677 /* We reach here because the source data ends not at character 1921 result = CODING_FINISH_INSUFFICIENT_SRC;
1678 boundary. */ 1922 src = src_base;
1679 coding->carryover_size = src_end - src_base;
1680 bcopy (src_base, coding->carryover, coding->carryover_size);
1681 src = src_end;
1682 break; 1923 break;
1683 } 1924 }
1684 1925
1926 if (result == CODING_FINISH_NORMAL
1927 && src < src_end)
1928 result = CODING_FINISH_INSUFFICIENT_DST;
1929
1685 /* If this is the last block of the text to be encoded, we must 1930 /* If this is the last block of the text to be encoded, we must
1686 reset graphic planes and registers to the initial state. */ 1931 reset graphic planes and registers to the initial state, and
1687 if (src >= src_end && coding->last_block) 1932 flush out the carryover if any. */
1688 { 1933 if (coding->mode & CODING_MODE_LAST_BLOCK)
1689 ENCODE_RESET_PLANE_AND_REGISTER; 1934 ENCODE_RESET_PLANE_AND_REGISTER;
1690 if (coding->carryover_size > 0 1935
1691 && coding->carryover_size < (dst_end - dst)) 1936 coding->consumed = src - source;
1692 { 1937 coding->produced = coding->produced_char = dst - destination;
1693 bcopy (coding->carryover, dst, coding->carryover_size); 1938 return result;
1694 dst += coding->carryover_size;
1695 coding->carryover_size = 0;
1696 }
1697 }
1698 *consumed = src - source;
1699 return dst - destination;
1700 } 1939 }
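
Both rewritten loops above replace `dst < adjusted_dst_end` with a test that depends on DST_BYTES. One plausible reading, stated here as an assumption rather than taken from the diff, is that a zero DST_BYTES means source and destination share one buffer, so the writer must stay a safety margin behind the reader. A minimal sketch of that guard follows; `toy_room_for_one_more` is a hypothetical helper, and it compares pointer differences rather than forming `src - margin` directly.

```c
#include <stddef.h>

/* Return 1 if the next loop iteration may write at DST, else 0.
   MARGIN plays the role of the constants 6, 19 and 3 in the diff.
   With a nonzero DST_BYTES the destination is a separate buffer ending
   at DST_END; with a zero DST_BYTES, DST is assumed to trail SRC inside
   the same buffer.  */
static int
toy_room_for_one_more (const unsigned char *src, const unsigned char *dst,
                       const unsigned char *dst_end, size_t dst_bytes,
                       ptrdiff_t margin)
{
  if (dst_bytes)
    return (dst_end - dst) > margin;  /* same as dst < dst_end - margin */
  return (src - dst) > margin;        /* same as dst < src - margin */
}
```

Writing the test as a difference avoids computing a pointer that may lie before the start of the buffer when SRC is still near its beginning.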
1701 1940
1702 1941
1703 /*** 4. SJIS and BIG5 handlers ***/ 1942 /*** 4. SJIS and BIG5 handlers ***/
1704 1943
1785 DECODE_CHARACTER_ASCII (c1); \ 2024 DECODE_CHARACTER_ASCII (c1); \
1786 else if (CHARSET_DIMENSION (charset_alt) == 1) \ 2025 else if (CHARSET_DIMENSION (charset_alt) == 1) \
1787 DECODE_CHARACTER_DIMENSION1 (charset_alt, c1); \ 2026 DECODE_CHARACTER_DIMENSION1 (charset_alt, c1); \
1788 else \ 2027 else \
1789 DECODE_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \ 2028 DECODE_CHARACTER_DIMENSION2 (charset_alt, c1, c2); \
2029 coding->produced_char++; \
1790 } while (0) 2030 } while (0)
1791 2031
1792 #define ENCODE_SJIS_BIG5_CHARACTER(charset, c1, c2) \ 2032 #define ENCODE_SJIS_BIG5_CHARACTER(charset, c1, c2) \
1793 do { \ 2033 do { \
1794 int c_alt, charset_alt; \ 2034 int c_alt, charset_alt; \
1827 *dst++ = b1, *dst++ = b2; \ 2067 *dst++ = b1, *dst++ = b2; \
1828 } \ 2068 } \
1829 else \ 2069 else \
1830 *dst++ = charset_alt, *dst++ = c1, *dst++ = c2; \ 2070 *dst++ = charset_alt, *dst++ = c1, *dst++ = c2; \
1831 } \ 2071 } \
2072 coding->consumed_char++; \
1832 } while (0); 2073 } while (0);
1833 2074
1834 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions". 2075 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
1835 Check if a text is encoded in SJIS. If it is, return 2076 Check if a text is encoded in SJIS. If it is, return
1836 CODING_CATEGORY_MASK_SJIS, else return 0. */ 2077 CODING_CATEGORY_MASK_SJIS, else return 0. */
1842 unsigned char c; 2083 unsigned char c;
1843 2084
1844 while (src < src_end) 2085 while (src < src_end)
1845 { 2086 {
1846 c = *src++; 2087 c = *src++;
1847 if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
1848 return 0;
1849 if ((c >= 0x80 && c < 0xA0) || c >= 0xE0) 2088 if ((c >= 0x80 && c < 0xA0) || c >= 0xE0)
1850 { 2089 {
1851 if (src < src_end && *src++ < 0x40) 2090 if (src < src_end && *src++ < 0x40)
1852 return 0; 2091 return 0;
1853 } 2092 }
1866 unsigned char c; 2105 unsigned char c;
1867 2106
1868 while (src < src_end) 2107 while (src < src_end)
1869 { 2108 {
1870 c = *src++; 2109 c = *src++;
1871 if (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)
1872 return 0;
1873 if (c >= 0xA1) 2110 if (c >= 0xA1)
1874 { 2111 {
1875 if (src >= src_end) 2112 if (src >= src_end)
1876 break; 2113 break;
1877 c = *src++; 2114 c = *src++;
1885 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". 2122 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
1886 If SJIS_P is 1, decode SJIS text, else decode BIG5 text. */ 2123 If SJIS_P is 1, decode SJIS text, else decode BIG5 text. */
1887 2124
1888 int 2125 int
1889 decode_coding_sjis_big5 (coding, source, destination, 2126 decode_coding_sjis_big5 (coding, source, destination,
1890 src_bytes, dst_bytes, consumed, sjis_p) 2127 src_bytes, dst_bytes, sjis_p)
1891 struct coding_system *coding; 2128 struct coding_system *coding;
1892 unsigned char *source, *destination; 2129 unsigned char *source, *destination;
1893 int src_bytes, dst_bytes; 2130 int src_bytes, dst_bytes;
1894 int *consumed;
1895 int sjis_p; 2131 int sjis_p;
1896 { 2132 {
1897 unsigned char *src = source; 2133 unsigned char *src = source;
1898 unsigned char *src_end = source + src_bytes; 2134 unsigned char *src_end = source + src_bytes;
1899 unsigned char *dst = destination; 2135 unsigned char *dst = destination;
1902 from DST_END to assure overflow checking is necessary only at the 2138 from DST_END to assure overflow checking is necessary only at the
1903 head of loop. */ 2139 head of loop. */
1904 unsigned char *adjusted_dst_end = dst_end - 3; 2140 unsigned char *adjusted_dst_end = dst_end - 3;
1905 Lisp_Object unification_table 2141 Lisp_Object unification_table
1906 = coding->character_unification_table_for_decode; 2142 = coding->character_unification_table_for_decode;
2143 int result = CODING_FINISH_NORMAL;
1907 2144
1908 if (!NILP (Venable_character_unification) && NILP (unification_table)) 2145 if (!NILP (Venable_character_unification) && NILP (unification_table))
1909 unification_table = Vstandard_character_unification_table_for_decode; 2146 unification_table = Vstandard_character_unification_table_for_decode;
1910 2147
1911 while (src < src_end && dst < adjusted_dst_end) 2148 coding->produced_char = 0;
2149 while (src < src_end && (dst_bytes
2150 ? (dst < adjusted_dst_end)
2151 : (dst < src - 3)))
1912 { 2152 {
1913 /* SRC_BASE remembers the start position in source in each loop. 2153 /* SRC_BASE remembers the start position in source in each loop.
1914 The loop will be exited when there's not enough source text 2154 The loop will be exited when there's not enough source text
1915 to analyze two-byte character (within macro ONE_MORE_BYTE). 2155 to analyze two-byte character (within macro ONE_MORE_BYTE).
1916 In that case, SRC is reset to SRC_BASE before exiting. */ 2156 In that case, SRC is reset to SRC_BASE before exiting. */
1917 unsigned char *src_base = src; 2157 unsigned char *src_base = src;
1918 unsigned char c1 = *src++, c2, c3, c4; 2158 unsigned char c1 = *src++, c2, c3, c4;
1919 2159
1920 if (c1 == '\r') 2160 if (c1 < 0x20)
1921 { 2161 {
1922 if (coding->eol_type == CODING_EOL_CRLF) 2162 if (c1 == '\r')
1923 { 2163 {
1924 ONE_MORE_BYTE (c2); 2164 if (coding->eol_type == CODING_EOL_CRLF)
1925 if (c2 == '\n') 2165 {
1926 *dst++ = c2; 2166 ONE_MORE_BYTE (c2);
2167 if (c2 == '\n')
2168 *dst++ = c2;
2169 else if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2170 {
2171 result = CODING_FINISH_INCONSISTENT_EOL;
2172 goto label_end_of_loop_2;
2173 }
2174 else
2175 /* To process C2 again, SRC is subtracted by 1. */
2176 *dst++ = c1, src--;
2177 }
2178 else if (coding->eol_type == CODING_EOL_CR)
2179 *dst++ = '\n';
1927 else 2180 else
1928 /* To process C2 again, SRC is subtracted by 1. */ 2181 *dst++ = c1;
1929 *dst++ = c1, src--;
1930 } 2182 }
1931 else if (coding->eol_type == CODING_EOL_CR) 2183 else if (c1 == '\n'
1932 *dst++ = '\n'; 2184 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2185 && (coding->eol_type == CODING_EOL_CR
2186 || coding->eol_type == CODING_EOL_CRLF))
2187 {
2188 result = CODING_FINISH_INCONSISTENT_EOL;
2189 goto label_end_of_loop_2;
2190 }
1933 else 2191 else
1934 *dst++ = c1; 2192 *dst++ = c1;
1935 } 2193 coding->produced_char++;
1936 else if (c1 < 0x20) 2194 }
1937 *dst++ = c1;
1938 else if (c1 < 0x80) 2195 else if (c1 < 0x80)
1939 DECODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2); 2196 DECODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2);
1940 else if (c1 < 0xA0 || c1 >= 0xE0) 2197 else if (c1 < 0xA0 || c1 >= 0xE0)
1941 { 2198 {
1942 /* SJIS -> JISX0208, BIG5 -> Big5 (only if 0xE0 <= c1 < 0xFF) */ 2199 /* SJIS -> JISX0208, BIG5 -> Big5 (only if 0xE0 <= c1 < 0xFF) */
1953 ONE_MORE_BYTE (c2); 2210 ONE_MORE_BYTE (c2);
1954 DECODE_BIG5 (c1, c2, charset, c3, c4); 2211 DECODE_BIG5 (c1, c2, charset, c3, c4);
1955 DECODE_SJIS_BIG5_CHARACTER (charset, c3, c4); 2212 DECODE_SJIS_BIG5_CHARACTER (charset, c3, c4);
1956 } 2213 }
1957 else /* Invalid code */ 2214 else /* Invalid code */
1958 *dst++ = c1; 2215 {
2216 *dst++ = c1;
2217 coding->produced_char++;
2218 }
1959 } 2219 }
1960 else 2220 else
1961 { 2221 {
1962 /* SJIS -> JISX0201-Kana, BIG5 -> Big5 */ 2222 /* SJIS -> JISX0201-Kana, BIG5 -> Big5 */
1963 if (sjis_p) 2223 if (sjis_p)
1964 DECODE_SJIS_BIG5_CHARACTER (charset_katakana_jisx0201, c1, /* dummy */ c2); 2224 DECODE_SJIS_BIG5_CHARACTER (charset_katakana_jisx0201, c1,
2225 /* dummy */ c2);
1965 else 2226 else
1966 { 2227 {
1967 int charset; 2228 int charset;
1968 2229
1969 ONE_MORE_BYTE (c2); 2230 ONE_MORE_BYTE (c2);
1972 } 2233 }
1973 } 2234 }
1974 continue; 2235 continue;
1975 2236
1976 label_end_of_loop: 2237 label_end_of_loop:
1977 coding->carryover_size = src - src_base; 2238 result = CODING_FINISH_INSUFFICIENT_SRC;
1978 bcopy (src_base, coding->carryover, coding->carryover_size); 2239 label_end_of_loop_2:
1979 src = src_base; 2240 src = src_base;
1980 break; 2241 break;
1981 } 2242 }
1982 2243
1983 *consumed = src - source; 2244 if (result == CODING_FINISH_NORMAL
1984 return dst - destination; 2245 && src < src_end)
2246 result = CODING_FINISH_INSUFFICIENT_DST;
2247
2248 coding->consumed = coding->consumed_char = src - source;
2249 coding->produced = dst - destination;
2250 return result;
1985 } 2251 }
1986 2252
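The new calling convention visible in decode_coding_sjis_big5 (status code as the return value, byte counts stored in the coding struct, SRC rewound to SRC_BASE when a multi-byte sequence is cut off) can be illustrated with a much smaller loop. This is only a sketch under stated assumptions: `sketch_finish`, `sketch_coding`, and `sketch_copy_pairs` are hypothetical names that do not exist in coding.c, the enum values are arbitrary, and the "decoding" merely copies one-byte and two-byte sequences unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_FINISH_* codes; values are assumptions.  */
enum sketch_finish
{
  SKETCH_FINISH_NORMAL,
  SKETCH_FINISH_INSUFFICIENT_SRC,
  SKETCH_FINISH_INSUFFICIENT_DST
};

/* Only the two counters this sketch needs, not the real struct coding_system.  */
struct sketch_coding
{
  int consumed;
  int produced;
};

/* Copy bytes below 0x80 singly and any other byte together with its
   trailing byte.  Returns a status code; counts go into *CODING.  */
int
sketch_copy_pairs (struct sketch_coding *coding,
                   const unsigned char *source, unsigned char *destination,
                   int src_bytes, int dst_bytes)
{
  const unsigned char *src = source;
  const unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  unsigned char *dst_end = destination + dst_bytes;
  int result = SKETCH_FINISH_NORMAL;

  assert (coding != NULL && source != NULL && destination != NULL);

  /* Each iteration writes at most 2 bytes, so checking for 2 free
     bytes at the head of the loop is sufficient.  */
  while (src < src_end && dst_end - dst >= 2)
    {
      const unsigned char *src_base = src;
      unsigned char c1 = *src++;

      if (c1 < 0x80)
        *dst++ = c1;
      else
        {
          if (src >= src_end)
            {
              /* Trailing byte missing: rewind so the caller can retry
                 once more source text is available.  */
              src = src_base;
              result = SKETCH_FINISH_INSUFFICIENT_SRC;
              break;
            }
          *dst++ = c1;
          *dst++ = *src++;
        }
    }

  /* Leaving the loop with source left over and no other cause means
     the destination ran out.  */
  if (result == SKETCH_FINISH_NORMAL && src < src_end)
    result = SKETCH_FINISH_INSUFFICIENT_DST;

  coding->consumed = (int) (src - source);
  coding->produced = (int) (dst - destination);
  return result;
}
```

A caller would inspect the returned code and then read `consumed` and `produced` from the struct to decide how much input to resubmit. The sketch leaves out the in-place mode that the real functions select with a zero DST_BYTES.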
1987 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". 2253 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions".
1988 This function can encode `charset_ascii', `charset_katakana_jisx0201', 2254 This function can encode `charset_ascii', `charset_katakana_jisx0201',
1989 `charset_jisx0208', `charset_big5_1', and `charset_big5_2'. We are 2255 `charset_jisx0208', `charset_big5_1', and `charset_big5_2'. We are
1992 charsets are produced without any encoding. If SJIS_P is 1, encode 2258 charsets are produced without any encoding. If SJIS_P is 1, encode
1993 SJIS text, else encode BIG5 text. */ 2259 SJIS text, else encode BIG5 text. */
1994 2260
1995 int 2261 int
1996 encode_coding_sjis_big5 (coding, source, destination, 2262 encode_coding_sjis_big5 (coding, source, destination,
1997 src_bytes, dst_bytes, consumed, sjis_p) 2263 src_bytes, dst_bytes, sjis_p)
1998 struct coding_system *coding; 2264 struct coding_system *coding;
1999 unsigned char *source, *destination; 2265 unsigned char *source, *destination;
2000 int src_bytes, dst_bytes; 2266 int src_bytes, dst_bytes;
2001 int *consumed;
2002 int sjis_p; 2267 int sjis_p;
2003 { 2268 {
2004 unsigned char *src = source; 2269 unsigned char *src = source;
2005 unsigned char *src_end = source + src_bytes; 2270 unsigned char *src_end = source + src_bytes;
2006 unsigned char *dst = destination; 2271 unsigned char *dst = destination;
2009 from DST_END to assure overflow checking is necessary only at the 2274 from DST_END to assure overflow checking is necessary only at the
2010 head of loop. */ 2275 head of loop. */
2011 unsigned char *adjusted_dst_end = dst_end - 1; 2276 unsigned char *adjusted_dst_end = dst_end - 1;
2012 Lisp_Object unification_table 2277 Lisp_Object unification_table
2013 = coding->character_unification_table_for_encode; 2278 = coding->character_unification_table_for_encode;
2279 int result = CODING_FINISH_NORMAL;
2014 2280
2015 if (!NILP (Venable_character_unification) && NILP (unification_table)) 2281 if (!NILP (Venable_character_unification) && NILP (unification_table))
2016 unification_table = Vstandard_character_unification_table_for_encode; 2282 unification_table = Vstandard_character_unification_table_for_encode;
2017 2283
2018 while (src < src_end && dst < adjusted_dst_end) 2284 coding->consumed_char = 0;
2285 while (src < src_end && (dst_bytes
2286 ? (dst < adjusted_dst_end)
2287 : (dst < src - 1)))
2019 { 2288 {
2020 /* SRC_BASE remembers the start position in source in each loop. 2289 /* SRC_BASE remembers the start position in source in each loop.
2021 The loop will be exited when there's not enough source text 2290 The loop will be exited when there's not enough source text
2022 to analyze multi-byte codes (within macros ONE_MORE_BYTE and 2291 to analyze multi-byte codes (within macros ONE_MORE_BYTE and
2023 TWO_MORE_BYTES). In that case, SRC is reset to SRC_BASE 2292 TWO_MORE_BYTES). In that case, SRC is reset to SRC_BASE
2044 ENCODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2); 2313 ENCODE_SJIS_BIG5_CHARACTER (charset_ascii, c1, /* dummy */ c2);
2045 break; 2314 break;
2046 2315
2047 case EMACS_control_code: 2316 case EMACS_control_code:
2048 *dst++ = c1; 2317 *dst++ = c1;
2318 coding->consumed_char++;
2049 break; 2319 break;
2050 2320
2051 case EMACS_carriage_return_code: 2321 case EMACS_carriage_return_code:
2052 if (!coding->selective) 2322 if (! (coding->mode & CODING_MODE_SELECTIVE_DISPLAY))
2053 { 2323 {
2054 *dst++ = c1; 2324 *dst++ = c1;
2325 coding->consumed_char++;
2055 break; 2326 break;
2056 } 2327 }
2057 /* fall down to treat '\r' as '\n' ... */ 2328 /* fall down to treat '\r' as '\n' ... */
2058 2329
2059 case EMACS_linefeed_code: 2330 case EMACS_linefeed_code:
2062 *dst++ = '\n'; 2333 *dst++ = '\n';
2063 else if (coding->eol_type == CODING_EOL_CRLF) 2334 else if (coding->eol_type == CODING_EOL_CRLF)
2064 *dst++ = '\r', *dst++ = '\n'; 2335 *dst++ = '\r', *dst++ = '\n';
2065 else 2336 else
2066 *dst++ = '\r'; 2337 *dst++ = '\r';
2338 coding->consumed_char++;
2067 break; 2339 break;
2068 2340
2069 case EMACS_leading_code_2: 2341 case EMACS_leading_code_2:
2070 ONE_MORE_BYTE (c2); 2342 ONE_MORE_BYTE (c2);
2071 ENCODE_SJIS_BIG5_CHARACTER (c1, c2, /* dummy */ c3); 2343 ENCODE_SJIS_BIG5_CHARACTER (c1, c2, /* dummy */ c3);
2085 coding->composing = 1; 2357 coding->composing = 1;
2086 break; 2358 break;
2087 2359
2088 default: /* i.e. case EMACS_invalid_code: */ 2360 default: /* i.e. case EMACS_invalid_code: */
2089 *dst++ = c1; 2361 *dst++ = c1;
2362 coding->consumed_char++;
2090 } 2363 }
2091 continue; 2364 continue;
2092 2365
2093 label_end_of_loop: 2366 label_end_of_loop:
2094 coding->carryover_size = src_end - src_base; 2367 result = CODING_FINISH_INSUFFICIENT_SRC;
2095 bcopy (src_base, coding->carryover, coding->carryover_size); 2368 src = src_base;
2096 src = src_end;
2097 break; 2369 break;
2098 } 2370 }
2099 2371
2100 *consumed = src - source; 2372 if (result == CODING_FINISH_NORMAL
2101 return dst - destination; 2373 && src < src_end)
2374 result = CODING_FINISH_INSUFFICIENT_DST;
2375 coding->consumed = src - source;
2376 coding->produced = coding->produced_char = dst - destination;
2377 return result;
2102 } 2378 }
2103 2379
2104 2380
2105 /*** 5. End-of-line handlers ***/ 2381 /*** 5. End-of-line handlers ***/
2106 2382
2107 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions". 2383 /* See the above "GENERAL NOTES on `decode_coding_XXX ()' functions".
2108 This function is called only when `coding->eol_type' is 2384 This function is called only when `coding->eol_type' is
2109 CODING_EOL_CRLF or CODING_EOL_CR. */ 2385 CODING_EOL_CRLF or CODING_EOL_CR. */
2110 2386
2111 decode_eol (coding, source, destination, src_bytes, dst_bytes, consumed) 2387 decode_eol (coding, source, destination, src_bytes, dst_bytes)
2112 struct coding_system *coding; 2388 struct coding_system *coding;
2113 unsigned char *source, *destination; 2389 unsigned char *source, *destination;
2114 int src_bytes, dst_bytes; 2390 int src_bytes, dst_bytes;
2115 int *consumed;
2116 { 2391 {
2117 unsigned char *src = source; 2392 unsigned char *src = source;
2118 unsigned char *src_end = source + src_bytes; 2393 unsigned char *src_end = source + src_bytes;
2119 unsigned char *dst = destination; 2394 unsigned char *dst = destination;
2120 unsigned char *dst_end = destination + dst_bytes; 2395 unsigned char *dst_end = destination + dst_bytes;
2121 int produced; 2396 int result = CODING_FINISH_NORMAL;
2397
2398 if (src_bytes <= 0)
2399 return result;
2122 2400
2123 switch (coding->eol_type) 2401 switch (coding->eol_type)
2124 { 2402 {
2125 case CODING_EOL_CRLF: 2403 case CODING_EOL_CRLF:
2126 { 2404 {
2127 /* Since the maximum bytes produced by each loop is 2, we 2405 /* Since the maximum bytes produced by each loop is 2, we
2128 subtract 1 from DST_END to assure overflow checking is 2406 subtract 1 from DST_END to assure overflow checking is
2129 necessary only at the head of loop. */ 2407 necessary only at the head of loop. */
2130 unsigned char *adjusted_dst_end = dst_end - 1; 2408 unsigned char *adjusted_dst_end = dst_end - 1;
2131 2409
2132 while (src < src_end && dst < adjusted_dst_end) 2410 while (src < src_end && (dst_bytes
2411 ? (dst < adjusted_dst_end)
2412 : (dst < src - 1)))
2133 { 2413 {
2134 unsigned char *src_base = src; 2414 unsigned char *src_base = src;
2135 unsigned char c = *src++; 2415 unsigned char c = *src++;
2136 if (c == '\r') 2416 if (c == '\r')
2137 { 2417 {
2138 ONE_MORE_BYTE (c); 2418 ONE_MORE_BYTE (c);
2139 if (c != '\n') 2419 if (c != '\n')
2140 *dst++ = '\r'; 2420 {
2421 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2422 {
2423 result = CODING_FINISH_INCONSISTENT_EOL;
2424 goto label_end_of_loop_2;
2425 }
2426 *dst++ = '\r';
2427 }
2141 *dst++ = c; 2428 *dst++ = c;
2429 }
2430 else if (c == '\n'
2431 && (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL))
2432 {
2433 result = CODING_FINISH_INCONSISTENT_EOL;
2434 goto label_end_of_loop_2;
2142 } 2435 }
2143 else 2436 else
2144 *dst++ = c; 2437 *dst++ = c;
2145 continue; 2438 continue;
2146 2439
2147 label_end_of_loop: 2440 label_end_of_loop:
2148 coding->carryover_size = src - src_base; 2441 result = CODING_FINISH_INSUFFICIENT_SRC;
2149 bcopy (src_base, coding->carryover, coding->carryover_size); 2442 label_end_of_loop_2:
2150 src = src_base; 2443 src = src_base;
2151 break; 2444 break;
2152 } 2445 }
2153 *consumed = src - source; 2446 if (result == CODING_FINISH_NORMAL
2154 produced = dst - destination; 2447 && src < src_end)
2155 break; 2448 result = CODING_FINISH_INSUFFICIENT_DST;
2156 } 2449 }
2450 break;
2157 2451
2158 case CODING_EOL_CR: 2452 case CODING_EOL_CR:
2159 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; 2453 if (coding->mode & CODING_MODE_INHIBIT_INCONSISTENT_EOL)
2160 bcopy (source, destination, produced); 2454 {
2161 dst_end = destination + produced; 2455 while (src < src_end) if (*src++ == '\n') break;
2162 while (dst < dst_end) 2456 if (*--src == '\n')
2163 if (*dst++ == '\r') dst[-1] = '\n'; 2457 {
2164 *consumed = produced; 2458 src_bytes = src - source;
2459 result = CODING_FINISH_INCONSISTENT_EOL;
2460 }
2461 }
2462 if (dst_bytes && src_bytes > dst_bytes)
2463 {
2464 result = CODING_FINISH_INSUFFICIENT_DST;
2465 src_bytes = dst_bytes;
2466 }
2467 if (dst_bytes)
2468 bcopy (source, destination, src_bytes);
2469 else
2470 safe_bcopy (source, destination, src_bytes);
2471 src = source + src_bytes;
2472 while (src_bytes--) if (*dst++ == '\r') dst[-1] = '\n';
2165 break; 2473 break;
2166 2474
2167 default: /* i.e. case: CODING_EOL_LF */ 2475 default: /* i.e. case: CODING_EOL_LF */
2168 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; 2476 if (dst_bytes && src_bytes > dst_bytes)
2169 bcopy (source, destination, produced); 2477 {
2170 *consumed = produced; 2478 result = CODING_FINISH_INSUFFICIENT_DST;
2479 src_bytes = dst_bytes;
2480 }
2481 if (dst_bytes)
2482 bcopy (source, destination, src_bytes);
2483 else
2484 safe_bcopy (source, destination, src_bytes);
2485 src += src_bytes;
2486 dst += src_bytes;
2171 break; 2487 break;
2172 } 2488 }
2173 2489
2174 return produced; 2490 coding->consumed = coding->consumed_char = src - source;
2491 coding->produced = coding->produced_char = dst - destination;
2492 return result;
2175 } 2493 }
2176 2494
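The CRLF branch of decode_eol, including the new handling of CODING_MODE_INHIBIT_INCONSISTENT_EOL, can be sketched in isolation as follows. The names `sketch_eol_finish` and `sketch_decode_crlf` are hypothetical, the `strict` argument stands in for the mode bit, and the sketch assumes a separate destination buffer at least SRC_BYTES long, so it does no destination overflow checking and omits the in-place mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_FINISH_* codes; values are assumptions.  */
enum sketch_eol_finish
{
  SKETCH_EOL_NORMAL,
  SKETCH_EOL_INSUFFICIENT_SRC,
  SKETCH_EOL_INCONSISTENT
};

/* Convert CRLF to LF.  If STRICT is nonzero, stop at the first lone CR
   or lone LF and report it instead of passing it through.  */
int
sketch_decode_crlf (const unsigned char *source, unsigned char *destination,
                    int src_bytes, int strict, int *consumed, int *produced)
{
  const unsigned char *src = source;
  const unsigned char *src_end = source + src_bytes;
  unsigned char *dst = destination;
  int result = SKETCH_EOL_NORMAL;

  assert (source != NULL && destination != NULL);
  assert (consumed != NULL && produced != NULL);

  while (src < src_end)
    {
      const unsigned char *src_base = src;
      unsigned char c = *src++;

      if (c == '\r')
        {
          if (src >= src_end)
            {
              /* Cannot tell yet whether an LF follows.  */
              src = src_base;
              result = SKETCH_EOL_INSUFFICIENT_SRC;
              break;
            }
          c = *src++;
          if (c != '\n')
            {
              if (strict)
                {
                  src = src_base;
                  result = SKETCH_EOL_INCONSISTENT;
                  break;
                }
              *dst++ = '\r';
            }
          *dst++ = c;
        }
      else if (c == '\n' && strict)
        {
          /* A bare LF contradicts the CRLF convention.  */
          src = src_base;
          result = SKETCH_EOL_INCONSISTENT;
          break;
        }
      else
        *dst++ = c;
    }

  *consumed = (int) (src - source);
  *produced = (int) (dst - destination);
  return result;
}
```

In both early-exit cases the source pointer is reset to the start of the offending sequence, so the reported consumed count covers only text that was fully converted.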
2177 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode 2495 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". Encode
2178 format of end-of-line according to `coding->eol_type'. If 2496 format of end-of-line according to `coding->eol_type'. If
2179 `coding->selective' is 1, code '\r' in source text also means 2497 `coding->mode & CODING_MODE_SELECTIVE_DISPLAY' is nonzero, code
2180 end-of-line. */ 2498 '\r' in source text also means end-of-line. */
2181 2499
2182 encode_eol (coding, source, destination, src_bytes, dst_bytes, consumed) 2500 encode_eol (coding, source, destination, src_bytes, dst_bytes)
2183 struct coding_system *coding; 2501 struct coding_system *coding;
2184 unsigned char *source, *destination; 2502 unsigned char *source, *destination;
2185 int src_bytes, dst_bytes; 2503 int src_bytes, dst_bytes;
2186 int *consumed;
2187 { 2504 {
2188 unsigned char *src = source; 2505 unsigned char *src = source;
2189 unsigned char *dst = destination; 2506 unsigned char *dst = destination;
2190 int produced; 2507 int result = CODING_FINISH_NORMAL;
2191 2508
2192 if (src_bytes <= 0) 2509 if (coding->eol_type == CODING_EOL_CRLF)
2193 return 0; 2510 {
2194 2511 unsigned char c;
2195 switch (coding->eol_type) 2512 unsigned char *src_end = source + src_bytes;
2196 { 2513 unsigned char *dst_end = destination + dst_bytes;
2197 case CODING_EOL_LF: 2514 /* Since the maximum bytes produced by each loop is 2, we
2198 case CODING_EOL_UNDECIDED: 2515 subtract 1 from DST_END to assure overflow checking is
2199 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes; 2516 necessary only at the head of loop. */
2200 bcopy (source, destination, produced); 2517 unsigned char *adjusted_dst_end = dst_end - 1;
2201 if (coding->selective) 2518
2202 { 2519 while (src < src_end && (dst_bytes
2203 int i = produced; 2520 ? (dst < adjusted_dst_end)
2204 while (i--) 2521 : (dst < src - 1)))
2522 {
2523 c = *src++;
2524 if (c == '\n'
2525 || (c == '\r' && (coding->mode & CODING_MODE_SELECTIVE_DISPLAY)))
2526 *dst++ = '\r', *dst++ = '\n';
2527 else
2528 *dst++ = c;
2529 }
2530 if (src < src_end)
2531 result = CODING_FINISH_INSUFFICIENT_DST;
2532 }
2533 else
2534 {
2535 if (dst_bytes && src_bytes > dst_bytes)
2536 {
2537 src_bytes = dst_bytes;
2538 result = CODING_FINISH_INSUFFICIENT_DST;
2539 }
2540 if (dst_bytes)
2541 bcopy (source, destination, src_bytes);
2542 else
2543 safe_bcopy (source, destination, src_bytes);
2544 if (coding->eol_type == CODING_EOL_CR)
2545 {
2546 while (src_bytes--)
2547 if (*dst++ == '\n') dst[-1] = '\r';
2548 }
2549 else if (coding->mode & CODING_MODE_SELECTIVE_DISPLAY)
2550 {
2551 while (src_bytes--)
2205 if (*dst++ == '\r') dst[-1] = '\n'; 2552 if (*dst++ == '\r') dst[-1] = '\n';
2206 } 2553 }
2207 *consumed = produced; 2554 src += src_bytes;
2208 2555 dst += src_bytes;
2209 case CODING_EOL_CRLF: 2556 }
2210 { 2557
2211 unsigned char c; 2558 coding->consumed = coding->consumed_char = src - source;
2212 unsigned char *src_end = source + src_bytes; 2559 coding->produced = coding->produced_char = dst - destination;
2213 unsigned char *dst_end = destination + dst_bytes; 2560 return result;
2214 /* Since the maximum bytes produced by each loop is 2, we
2215 subtract 1 from DST_END to assure overflow checking is
2216 necessary only at the head of loop. */
2217 unsigned char *adjusted_dst_end = dst_end - 1;
2218
2219 while (src < src_end && dst < adjusted_dst_end)
2220 {
2221 c = *src++;
2222 if (c == '\n' || (c == '\r' && coding->selective))
2223 *dst++ = '\r', *dst++ = '\n';
2224 else
2225 *dst++ = c;
2226 }
2227 produced = dst - destination;
2228 *consumed = src - source;
2229 break;
2230 }
2231
2232 default: /* i.e. case CODING_EOL_CR: */
2233 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes;
2234 bcopy (source, destination, produced);
2235 {
2236 int i = produced;
2237 while (i--)
2238 if (*dst++ == '\n') dst[-1] = '\r';
2239 }
2240 *consumed = produced;
2241 }
2242
2243 return produced;
2244 } 2561 }
2245 2562
2246 2563
2247 /*** 6. C library functions ***/ 2564 /*** 6. C library functions ***/
2248 2565
2315 int 2632 int
2316 setup_coding_system (coding_system, coding) 2633 setup_coding_system (coding_system, coding)
2317 Lisp_Object coding_system; 2634 Lisp_Object coding_system;
2318 struct coding_system *coding; 2635 struct coding_system *coding;
2319 { 2636 {
2320 Lisp_Object coding_spec, plist, type, eol_type; 2637 Lisp_Object coding_spec, coding_type, eol_type, plist;
2321 Lisp_Object val; 2638 Lisp_Object val;
2322 int i; 2639 int i;
2323 2640
2324 /* At first, set several fields to default values. */ 2641 /* Initialize some fields required for all kinds of coding systems. */
2325 coding->last_block = 0; 2642 coding->symbol = coding_system;
2326 coding->selective = 0; 2643 coding->common_flags = 0;
2327 coding->composing = 0; 2644 coding->mode = 0;
2328 coding->direction = 0; 2645 coding->heading_ascii = -1;
2329 coding->carryover_size = 0;
2330 coding->post_read_conversion = coding->pre_write_conversion = Qnil; 2646 coding->post_read_conversion = coding->pre_write_conversion = Qnil;
2331 coding->character_unification_table_for_decode = Qnil;
2332 coding->character_unification_table_for_encode = Qnil;
2333
2334 coding->symbol = coding_system;
2335 eol_type = Qnil;
2336
2337 /* Get values of property `coding-system' and `eol-type'.
2338 Also get values of coding system properties:
2339 `post-read-conversion', `pre-write-conversion',
2340 `character-unification-table-for-decode',
2341 `character-unification-table-for-encode'. */
2342 coding_spec = Fget (coding_system, Qcoding_system); 2647 coding_spec = Fget (coding_system, Qcoding_system);
2343 if (!VECTORP (coding_spec) 2648 if (!VECTORP (coding_spec)
2344 || XVECTOR (coding_spec)->size != 5 2649 || XVECTOR (coding_spec)->size != 5
2345 || !CONSP (XVECTOR (coding_spec)->contents[3])) 2650 || !CONSP (XVECTOR (coding_spec)->contents[3]))
2346 goto label_invalid_coding_system; 2651 goto label_invalid_coding_system;
2347 if (!inhibit_eol_conversion) 2652
2348 eol_type = Fget (coding_system, Qeol_type); 2653 eol_type = inhibit_eol_conversion ? Qnil : Fget (coding_system, Qeol_type);
2349 2654 if (VECTORP (eol_type))
2655 {
2656 coding->eol_type = CODING_EOL_UNDECIDED;
2657 coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
2658 }
2659 else if (XFASTINT (eol_type) == 1)
2660 {
2661 coding->eol_type = CODING_EOL_CRLF;
2662 coding->common_flags
2663 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2664 }
2665 else if (XFASTINT (eol_type) == 2)
2666 {
2667 coding->eol_type = CODING_EOL_CR;
2668 coding->common_flags
2669 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2670 }
2671 else
2672 coding->eol_type = CODING_EOL_LF;
2673
2674 coding_type = XVECTOR (coding_spec)->contents[0];
2675 /* Try short cut. */
2676 if (SYMBOLP (coding_type))
2677 {
2678 if (EQ (coding_type, Qt))
2679 {
2680 coding->type = coding_type_undecided;
2681 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
2682 }
2683 else
2684 coding->type = coding_type_no_conversion;
2685 return 0;
2686 }
2687
2688 /* Initialize remaining fields. */
2689 coding->composing = 0;
2690 coding->character_unification_table_for_decode = Qnil;
2691 coding->character_unification_table_for_encode = Qnil;
2692
2693 /* Get values of coding system properties:
2694 `post-read-conversion', `pre-write-conversion',
2695 `character-unification-table-for-decode',
2696 `character-unification-table-for-encode'. */
2350 plist = XVECTOR (coding_spec)->contents[3]; 2697 plist = XVECTOR (coding_spec)->contents[3];
2351 coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion); 2698 coding->post_read_conversion = Fplist_get (plist, Qpost_read_conversion);
2352 coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion); 2699 coding->pre_write_conversion = Fplist_get (plist, Qpre_write_conversion);
2353 val = Fplist_get (plist, Qcharacter_unification_table_for_decode); 2700 val = Fplist_get (plist, Qcharacter_unification_table_for_decode);
2354 if (SYMBOLP (val)) 2701 if (SYMBOLP (val))
2358 val = Fplist_get (plist, Qcharacter_unification_table_for_encode); 2705 val = Fplist_get (plist, Qcharacter_unification_table_for_encode);
2359 if (SYMBOLP (val)) 2706 if (SYMBOLP (val))
2360 val = Fget (val, Qcharacter_unification_table_for_encode); 2707 val = Fget (val, Qcharacter_unification_table_for_encode);
2361 coding->character_unification_table_for_encode 2708 coding->character_unification_table_for_encode
2362 = CHAR_TABLE_P (val) ? val : Qnil; 2709 = CHAR_TABLE_P (val) ? val : Qnil;
2710 val = Fplist_get (plist, Qcoding_category);
2711 if (!NILP (val))
2712 {
2713 val = Fget (val, Qcoding_category_index);
2714 if (INTEGERP (val))
2715 coding->category_idx = XINT (val);
2716 else
2717 goto label_invalid_coding_system;
2718 }
2719 else
2720 goto label_invalid_coding_system;
2363 2721
2364 val = Fplist_get (plist, Qsafe_charsets); 2722 val = Fplist_get (plist, Qsafe_charsets);
2365 if (EQ (val, Qt)) 2723 if (EQ (val, Qt))
2366 { 2724 {
2367 for (i = 0; i <= MAX_CHARSET; i++) 2725 for (i = 0; i <= MAX_CHARSET; i++)
2376 coding->safe_charsets[i] = 1; 2734 coding->safe_charsets[i] = 1;
2377 val = XCONS (val)->cdr; 2735 val = XCONS (val)->cdr;
2378 } 2736 }
2379 } 2737 }
2380 2738
2381 if (VECTORP (eol_type)) 2739 switch (XFASTINT (coding_type))
2382 {
2383 coding->eol_type = CODING_EOL_UNDECIDED;
2384 coding->common_flags = CODING_REQUIRE_DETECTION_MASK;
2385 }
2386 else if (XFASTINT (eol_type) == 1)
2387 {
2388 coding->eol_type = CODING_EOL_CRLF;
2389 coding->common_flags
2390 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2391 }
2392 else if (XFASTINT (eol_type) == 2)
2393 {
2394 coding->eol_type = CODING_EOL_CR;
2395 coding->common_flags
2396 = CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2397 }
2398 else
2399 {
2400 coding->eol_type = CODING_EOL_LF;
2401 coding->common_flags = 0;
2402 }
2403
2404 type = XVECTOR (coding_spec)->contents[0];
2405 switch (XFASTINT (type))
2406 { 2740 {
2407 case 0: 2741 case 0:
2408 coding->type = coding_type_emacs_mule; 2742 coding->type = coding_type_emacs_mule;
2409 if (!NILP (coding->post_read_conversion)) 2743 if (!NILP (coding->post_read_conversion))
2410 coding->common_flags |= CODING_REQUIRE_DECODING_MASK; 2744 coding->common_flags |= CODING_REQUIRE_DECODING_MASK;
2423 coding->common_flags 2757 coding->common_flags
2424 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK; 2758 |= CODING_REQUIRE_DECODING_MASK | CODING_REQUIRE_ENCODING_MASK;
2425 { 2759 {
2426 Lisp_Object val, temp; 2760 Lisp_Object val, temp;
2427 Lisp_Object *flags; 2761 Lisp_Object *flags;
2428 int i, charset, default_reg_bits = 0; 2762 int i, charset, reg_bits = 0;
2429 2763
2430 val = XVECTOR (coding_spec)->contents[4]; 2764 val = XVECTOR (coding_spec)->contents[4];
2431 2765
2432 if (!VECTORP (val) || XVECTOR (val)->size != 32) 2766 if (!VECTORP (val) || XVECTOR (val)->size != 32)
2433 goto label_invalid_coding_system; 2767 goto label_invalid_coding_system;
2478 t: designate nothing to REG initially, but can be used 2812 t: designate nothing to REG initially, but can be used
2479 by any charsets, 2813 by any charsets,
2480 list of integer, nil, or t: designate the first 2814 list of integer, nil, or t: designate the first
2481 element (if integer) to REG initially, the remaining 2815 element (if integer) to REG initially, the remaining
2482 elements (if integer) is designated to REG on request, 2816 elements (if integer) is designated to REG on request,
2483 if an element is t, REG can be used by any charset, 2817 if an element is t, REG can be used by any charsets,
2484 nil: REG is never used. */ 2818 nil: REG is never used. */
2485 for (charset = 0; charset <= MAX_CHARSET; charset++) 2819 for (charset = 0; charset <= MAX_CHARSET; charset++)
2486 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) 2820 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
2487 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION; 2821 = CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION;
2488 for (i = 0; i < 4; i++) 2822 for (i = 0; i < 4; i++)
2495 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i; 2829 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) = i;
2496 } 2830 }
2497 else if (EQ (flags[i], Qt)) 2831 else if (EQ (flags[i], Qt))
2498 { 2832 {
2499 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1; 2833 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
2500 default_reg_bits |= 1 << i; 2834 reg_bits |= 1 << i;
2835 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
2501 } 2836 }
2502 else if (CONSP (flags[i])) 2837 else if (CONSP (flags[i]))
2503 { 2838 {
2504 Lisp_Object tail = flags[i]; 2839 Lisp_Object tail = flags[i];
2505 2840
2841 coding->flags |= CODING_FLAG_ISO_DESIGNATION;
2506 if (INTEGERP (XCONS (tail)->car) 2842 if (INTEGERP (XCONS (tail)->car)
2507 && (charset = XINT (XCONS (tail)->car), 2843 && (charset = XINT (XCONS (tail)->car),
2508 CHARSET_VALID_P (charset)) 2844 CHARSET_VALID_P (charset))
2509 || (charset = get_charset_id (XCONS (tail)->car)) >= 0) 2845 || (charset = get_charset_id (XCONS (tail)->car)) >= 0)
2510 { 2846 {
2521 CHARSET_VALID_P (charset)) 2857 CHARSET_VALID_P (charset))
2522 || (charset = get_charset_id (XCONS (tail)->car)) >= 0) 2858 || (charset = get_charset_id (XCONS (tail)->car)) >= 0)
2523 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) 2859 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
2524 = i; 2860 = i;
2525 else if (EQ (XCONS (tail)->car, Qt)) 2861 else if (EQ (XCONS (tail)->car, Qt))
2526 default_reg_bits |= 1 << i; 2862 reg_bits |= 1 << i;
2527 tail = XCONS (tail)->cdr; 2863 tail = XCONS (tail)->cdr;
2528 } 2864 }
2529 } 2865 }
2530 else 2866 else
2531 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1; 2867 CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i) = -1;
2532 2868
2533 CODING_SPEC_ISO_DESIGNATION (coding, i) 2869 CODING_SPEC_ISO_DESIGNATION (coding, i)
2534 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i); 2870 = CODING_SPEC_ISO_INITIAL_DESIGNATION (coding, i);
2535 } 2871 }
2536 2872
2537 if (! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT)) 2873 if (reg_bits && ! (coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT))
2538 { 2874 {
2539 /* REG 1 can be used only by locking shift in 7-bit env. */ 2875 /* REG 1 can be used only by locking shift in 7-bit env. */
2540 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS) 2876 if (coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
2541 default_reg_bits &= ~2; 2877 reg_bits &= ~2;
2542 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT)) 2878 if (! (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT))
2543 /* Without any shifting, only REG 0 and 1 can be used. */ 2879 /* Without any shifting, only REG 0 and 1 can be used. */
2544 default_reg_bits &= 3; 2880 reg_bits &= 3;
2545 } 2881 }
2546 2882
2547 for (charset = 0; charset <= MAX_CHARSET; charset++) 2883 if (reg_bits)
2548 if (CHARSET_VALID_P (charset) 2884 for (charset = 0; charset <= MAX_CHARSET; charset++)
2549 && (CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
2550 == CODING_SPEC_ISO_NO_REQUESTED_DESIGNATION))
2551 { 2885 {
2552 /* We have not yet decided where to designate CHARSET. */ 2886 if (CHARSET_VALID_P (charset))
2553 int reg_bits = default_reg_bits; 2887 {
2554 2888 /* There exist some default graphic registers to be
2555 if (CHARSET_CHARS (charset) == 96) 2889 used by CHARSET. */
2556 /* A charset of CHARS96 can't be designated to REG 0. */ 2890
2557 reg_bits &= ~1; 2891 /* We had better avoid designating a charset of
2558 2892 CHARS96 to REG 0 as far as possible. */
2559 if (reg_bits) 2893 if (CHARSET_CHARS (charset) == 96)
2560 /* There exist some default graphic register. */ 2894 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
2561 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) 2895 = (reg_bits & 2
2562 = (reg_bits & 1 2896 ? 1 : (reg_bits & 4 ? 2 : (reg_bits & 8 ? 3 : 0)));
2563 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3))); 2897 else
2564 else 2898 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset)
2565 /* We anyway have to designate CHARSET to somewhere. */ 2899 = (reg_bits & 1
2566 CODING_SPEC_ISO_REQUESTED_DESIGNATION (coding, charset) 2900 ? 0 : (reg_bits & 2 ? 1 : (reg_bits & 4 ? 2 : 3)));
2567 = (CHARSET_CHARS (charset) == 94 2901 }
2568 ? 0
2569 : ((coding->flags & CODING_FLAG_ISO_LOCKING_SHIFT
2570 || ! coding->flags & CODING_FLAG_ISO_SEVEN_BITS)
2571 ? 1
2572 : (coding->flags & CODING_FLAG_ISO_SINGLE_SHIFT
2573 ? 2 : 0)));
2574 } 2902 }
2575 } 2903 }
2576 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK; 2904 coding->common_flags |= CODING_REQUIRE_FLUSHING_MASK;
2905 coding->spec.iso2022.last_invalid_designation_register = -1;
2577 break; 2906 break;
2578 2907
2579 case 3: 2908 case 3:
2580 coding->type = coding_type_big5; 2909 coding->type = coding_type_big5;
2581 coding->common_flags 2910 coding->common_flags
2608 case 5: 2937 case 5:
2609 coding->type = coding_type_raw_text; 2938 coding->type = coding_type_raw_text;
2610 break; 2939 break;
2611 2940
2612 default: 2941 default:
2613 if (EQ (type, Qt)) 2942 goto label_invalid_coding_system;
2614 {
2615 coding->type = coding_type_undecided;
2616 coding->common_flags |= CODING_REQUIRE_DETECTION_MASK;
2617 }
2618 else
2619 coding->type = coding_type_no_conversion;
2620 break;
2621 } 2943 }
2622 return 0; 2944 return 0;
2623 2945
2624 label_invalid_coding_system: 2946 label_invalid_coding_system:
2625 coding->type = coding_type_no_conversion; 2947 coding->type = coding_type_no_conversion;
2948 coding->category_idx = CODING_CATEGORY_IDX_BINARY;
2626 coding->common_flags = 0; 2949 coding->common_flags = 0;
2627 coding->eol_type = CODING_EOL_LF; 2950 coding->eol_type = CODING_EOL_LF;
2628 coding->symbol = coding->pre_write_conversion = coding->post_read_conversion 2951 coding->pre_write_conversion = coding->post_read_conversion = Qnil;
2629 = Qnil;
2630 return -1; 2952 return -1;
2631 } 2953 }
2632 2954
2633 /* Emacs has a mechanism to automatically detect a coding system if it 2955 /* Emacs has a mechanism to automatically detect a coding system if it
2634 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But, 2956 is one of Emacs' internal format, ISO2022, SJIS, and BIG5. But,
2650 2972
2651 o coding-category-iso-7 2973 o coding-category-iso-7
2652 2974
2653 The category for a coding system which has the same code range 2975 The category for a coding system which has the same code range
2654 as ISO2022 of 7-bit environment. This doesn't use any locking 2976 as ISO2022 of 7-bit environment. This doesn't use any locking
2655 shift and single shift functions. Assigned the coding-system 2977 shift and single shift functions. This can encode/decode all
2656 (Lisp symbol) `iso-2022-7bit' by default. 2978 charsets. Assigned the coding-system (Lisp symbol)
2979 `iso-2022-7bit' by default.
2980
2981 o coding-category-iso-7-tight
2982
2983 Same as coding-category-iso-7 except that this can
2984 encode/decode only the specified charsets.
2657 2985
2658 o coding-category-iso-8-1 2986 o coding-category-iso-8-1
2659 2987
2660 The category for a coding system which has the same code range 2988 The category for a coding system which has the same code range
2661 as ISO2022 of 8-bit environment and graphic plane 1 used only 2989 as ISO2022 of 8-bit environment and graphic plane 1 used only
2705 highest priority. Priorities of categories are also specified by a 3033 highest priority. Priorities of categories are also specified by a
2706 user in a Lisp variable `coding-category-list'. 3034 user in a Lisp variable `coding-category-list'.
2707 3035
2708 */ 3036 */
2709 3037
2710 /* Detect how a text of length SRC_BYTES pointed by SRC is encoded. 3038 /* Detect how a text of length SRC_BYTES pointed by SOURCE is encoded.
2711 If it detects possible coding systems, return an integer in which 3039 If it detects possible coding systems, return an integer in which
2712 appropriate flag bits are set. Flag bits are defined by macros 3040 appropriate flag bits are set. Flag bits are defined by macros
2713 CODING_CATEGORY_MASK_XXX in `coding.h'. */ 3041 CODING_CATEGORY_MASK_XXX in `coding.h'.
2714 3042
2715 int 3043 How many ASCII characters are at the head is returned as *SKIP. */
2716 detect_coding_mask (src, src_bytes) 3044
2717 unsigned char *src; 3045 static int
2718 int src_bytes; 3046 detect_coding_mask (source, src_bytes, priorities, skip)
3047 unsigned char *source;
3048 int src_bytes, *priorities, *skip;
2719 { 3049 {
2720 register unsigned char c; 3050 register unsigned char c;
2721 unsigned char *src_end = src + src_bytes; 3051 unsigned char *src = source, *src_end = source + src_bytes;
2722 int mask; 3052 unsigned int mask = (CODING_CATEGORY_MASK_ISO_7BIT
3053 | CODING_CATEGORY_MASK_ISO_SHIFT);
3054 int i;
2723 3055
2724 /* At first, skip all ASCII characters and control characters except 3056 /* At first, skip all ASCII characters and control characters except
2725 for three ISO2022 specific control characters. */ 3057 for three ISO2022 specific control characters. */
2726 label_loop_detect_coding: 3058 label_loop_detect_coding:
2727 while (src < src_end) 3059 while (src < src_end)
2728 { 3060 {
2729 c = *src; 3061 c = *src;
2730 if (c >= 0x80 3062 if (c >= 0x80
2731 || (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO)) 3063 || ((mask & CODING_CATEGORY_MASK_ISO_7BIT)
3064 && c == ISO_CODE_ESC)
3065 || ((mask & CODING_CATEGORY_MASK_ISO_SHIFT)
3066 && (c == ISO_CODE_SI || c == ISO_CODE_SO)))
2732 break; 3067 break;
2733 src++; 3068 src++;
2734 } 3069 }
3070 *skip = src - source;
2735 3071
2736 if (src >= src_end) 3072 if (src >= src_end)
2737 /* We found nothing other than ASCII. There's nothing to do. */ 3073 /* We found nothing other than ASCII. There's nothing to do. */
2738 return CODING_CATEGORY_MASK_ANY; 3074 return 0;
2739 3075
2740 /* The text seems to be encoded in some multilingual coding system. 3076 /* The text seems to be encoded in some multilingual coding system.
2741 Now, try to find in which coding system the text is encoded. */ 3077 Now, try to find in which coding system the text is encoded. */
2742 if (c < 0x80) 3078 if (c < 0x80)
2743 { 3079 {
2744 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */ 3080 /* i.e. (c == ISO_CODE_ESC || c == ISO_CODE_SI || c == ISO_CODE_SO) */
2745 /* C is an ISO2022 specific control code of C0. */ 3081 /* C is an ISO2022 specific control code of C0. */
2746 mask = detect_coding_iso2022 (src, src_end); 3082 mask = detect_coding_iso2022 (src, src_end);
2747 src++;
2748 if (mask == 0) 3083 if (mask == 0)
2749 /* No valid ISO2022 code follows C. Try again. */ 3084 {
2750 goto label_loop_detect_coding; 3085 /* No valid ISO2022 code follows C. Try again. */
2751 mask |= CODING_CATEGORY_MASK_RAW_TEXT; 3086 src++;
2752 } 3087 mask = (c != ISO_CODE_ESC
2753 else if (c < 0xA0) 3088 ? CODING_CATEGORY_MASK_ISO_7BIT
2754 { 3089 : CODING_CATEGORY_MASK_ISO_SHIFT);
2755 /* If C is a special latin extra code, 3090 goto label_loop_detect_coding;
2756 or is an ISO2022 specific control code of C1 (SS2 or SS3), 3091 }
2757 or is an ISO2022 control-sequence-introducer (CSI), 3092 if (priorities)
2758 we should also consider the possibility of ISO2022 codings. */ 3093 goto label_return_highest_only;
2759 if ((VECTORP (Vlatin_extra_code_table) 3094 }
2760 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c])) 3095 else
2761 || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3) 3096 {
2762 || (c == ISO_CODE_CSI 3097 int try;
2763 && (src < src_end 3098
2764 && (*src == ']' 3099 if (c < 0xA0)
2765 || (src + 1 < src_end 3100 {
2766 && src[1] == ']' 3101 /* C is the first byte of SJIS character code,
2767 && (*src == '0' || *src == '1' || *src == '2')))))) 3102 or a leading-code of Emacs' internal format (emacs-mule). */
2768 mask = (detect_coding_iso2022 (src, src_end) 3103 try = CODING_CATEGORY_MASK_SJIS | CODING_CATEGORY_MASK_EMACS_MULE;
2769 | detect_coding_sjis (src, src_end) 3104
2770 | detect_coding_emacs_mule (src, src_end) 3105 /* Or, if C is a special latin extra code,
2771 | CODING_CATEGORY_MASK_RAW_TEXT); 3106 or is an ISO2022 specific control code of C1 (SS2 or SS3),
2772 3107 or is an ISO2022 control-sequence-introducer (CSI),
3108 we should also consider the possibility of ISO2022 codings. */
3109 if ((VECTORP (Vlatin_extra_code_table)
3110 && !NILP (XVECTOR (Vlatin_extra_code_table)->contents[c]))
3111 || (c == ISO_CODE_SS2 || c == ISO_CODE_SS3)
3112 || (c == ISO_CODE_CSI
3113 && (src < src_end
3114 && (*src == ']'
3115 || ((*src == '0' || *src == '1' || *src == '2')
3116 && src + 1 < src_end
3117 && src[1] == ']')))))
3118 try |= (CODING_CATEGORY_MASK_ISO_8_ELSE
3119 | CODING_CATEGORY_MASK_ISO_8BIT);
3120 }
2773 else 3121 else
2774 /* C is the first byte of SJIS character code, 3122 /* C is a character of ISO2022 in graphic plane right,
2775 or a leading-code of Emacs' internal format (emacs-mule). */ 3123 or a SJIS's 1-byte character code (i.e. JISX0201),
2776 mask = (detect_coding_sjis (src, src_end) 3124 or the first byte of BIG5's 2-byte code. */
2777 | detect_coding_emacs_mule (src, src_end) 3125 try = (CODING_CATEGORY_MASK_ISO_8_ELSE
2778 | CODING_CATEGORY_MASK_RAW_TEXT); 3126 | CODING_CATEGORY_MASK_ISO_8BIT
2779 } 3127 | CODING_CATEGORY_MASK_SJIS
2780 else 3128 | CODING_CATEGORY_MASK_BIG5);
2781 /* C is a character of ISO2022 in graphic plane right, 3129
2782 or a SJIS's 1-byte character code (i.e. JISX0201), 3130 mask = 0;
2783 or the first byte of BIG5's 2-byte code. */ 3131 if (priorities)
2784 mask = (detect_coding_iso2022 (src, src_end) 3132 {
2785 | detect_coding_sjis (src, src_end) 3133 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
2786 | detect_coding_big5 (src, src_end) 3134 {
2787 | CODING_CATEGORY_MASK_RAW_TEXT); 3135 priorities[i] &= try;
2788 3136 if (priorities[i] & CODING_CATEGORY_MASK_ISO)
2789 return mask; 3137 mask = detect_coding_iso2022 (src, src_end);
3138 else if (priorities[i] & CODING_CATEGORY_MASK_SJIS)
3139 mask = detect_coding_sjis (src, src_end);
3140 else if (priorities[i] & CODING_CATEGORY_MASK_BIG5)
3141 mask = detect_coding_big5 (src, src_end);
3142 else if (priorities[i] & CODING_CATEGORY_MASK_EMACS_MULE)
3143 mask = detect_coding_emacs_mule (src, src_end);
3144 if (mask)
3145 goto label_return_highest_only;
3146 }
3147 return CODING_CATEGORY_MASK_RAW_TEXT;
3148 }
3149 if (try & CODING_CATEGORY_MASK_ISO)
3150 mask |= detect_coding_iso2022 (src, src_end);
3151 if (try & CODING_CATEGORY_MASK_SJIS)
3152 mask |= detect_coding_sjis (src, src_end);
3153 if (try & CODING_CATEGORY_MASK_BIG5)
3154 mask |= detect_coding_big5 (src, src_end);
3155 if (try & CODING_CATEGORY_MASK_EMACS_MULE)
3156 mask |= detect_coding_emacs_mule (src, src_end);
3157 }
3158 return (mask | CODING_CATEGORY_MASK_RAW_TEXT);
3159
3160 label_return_highest_only:
3161 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
3162 {
3163 if (mask & priorities[i])
3164 return priorities[i];
3165 }
3166 return CODING_CATEGORY_MASK_RAW_TEXT;
2790 } 3167 }
2791 3168
2792 /* Detect how a text of length SRC_BYTES pointed by SRC is encoded. 3169 /* Detect how a text of length SRC_BYTES pointed by SRC is encoded.
2793 The information of the detected coding system is set in CODING. */ 3170 The information of the detected coding system is set in CODING. */
2794 3171
2796 detect_coding (coding, src, src_bytes) 3173 detect_coding (coding, src, src_bytes)
2797 struct coding_system *coding; 3174 struct coding_system *coding;
2798 unsigned char *src; 3175 unsigned char *src;
2799 int src_bytes; 3176 int src_bytes;
2800 { 3177 {
2801 int mask = detect_coding_mask (src, src_bytes); 3178 unsigned int idx;
2802 int idx; 3179 int skip, mask, i;
3180 int priorities[CODING_CATEGORY_IDX_MAX];
2803 Lisp_Object val = Vcoding_category_list; 3181 Lisp_Object val = Vcoding_category_list;
2804 3182
2805 if (mask == CODING_CATEGORY_MASK_ANY) 3183 i = 0;
2806 /* We found nothing other than ASCII. There's nothing to do. */ 3184 while (CONSP (val) && i < CODING_CATEGORY_IDX_MAX)
2807 return; 3185 {
2808 3186 if (! SYMBOLP (XCONS (val)->car))
2809 /* We found some plausible coding systems. Let's use a coding 3187 break;
2810 system of the highest priority. */ 3188 idx = XFASTINT (Fget (XCONS (val)->car, Qcoding_category_index));
2811 3189 if (idx >= CODING_CATEGORY_IDX_MAX)
2812 if (CONSP (val)) 3190 break;
2813 while (!NILP (val)) 3191 priorities[i++] = (1 << idx);
2814 { 3192 val = XCONS (val)->cdr;
2815 idx = XFASTINT (Fget (XCONS (val)->car, Qcoding_category_index)); 3193 }
2816 if ((idx < CODING_CATEGORY_IDX_MAX) && (mask & (1 << idx))) 3194 /* If coding-category-list is valid and contains all coding
2817 break; 3195 categories, `i' should be CODING_CATEGORY_IDX_MAX now. If not,
2818 val = XCONS (val)->cdr; 3196 the following code saves Emacs from crashing. */
2819 } 3197 while (i < CODING_CATEGORY_IDX_MAX)
2820 else 3198 priorities[i++] = CODING_CATEGORY_MASK_RAW_TEXT;
2821 val = Qnil; 3199
2822 3200 mask = detect_coding_mask (src, src_bytes, priorities, &skip);
2823 if (NILP (val)) 3201 coding->heading_ascii = skip;
2824 { 3202
2825 /* For unknown reason, `Vcoding_category_list' contains none of 3203 if (!mask) return;
2826 found categories. Let's use any of them. */ 3204
2827 for (idx = 0; idx < CODING_CATEGORY_IDX_MAX; idx++) 3205 /* We found a single coding system of the highest priority in MASK. */
2828 if (mask & (1 << idx)) 3206 idx = 0;
2829 break; 3207 while (mask && ! (mask & 1)) mask >>= 1, idx++;
2830 } 3208 if (! mask)
2831 setup_coding_system (XSYMBOL (coding_category_table[idx])->value, coding); 3209 idx = CODING_CATEGORY_IDX_RAW_TEXT;
2832 } 3210
2833 3211 val = XSYMBOL (XVECTOR (Vcoding_category_table)->contents[idx])->value;
2834 /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC 3212
2835 is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF, 3213 if (coding->eol_type != CODING_EOL_UNDECIDED)
2836 CODING_EOL_CR, and CODING_EOL_UNDECIDED. */ 3214 {
3215 Lisp_Object tmp = Fget (val, Qeol_type);
3216
3217 if (VECTORP (tmp))
3218 val = XVECTOR (tmp)->contents[coding->eol_type];
3219 }
3220 setup_coding_system (val, coding);
3221 /* Set this again because setup_coding_system reset this member. */
3222 coding->heading_ascii = skip;
3223 }
3224
3225 /* Detect how end-of-line of a text of length SRC_BYTES pointed by
3226 SOURCE is encoded. Return one of CODING_EOL_LF, CODING_EOL_CRLF,
3227 CODING_EOL_CR, and CODING_EOL_UNDECIDED.
3228
3229 How many non-eol characters are at the head is returned as *SKIP. */
2837 3230
2838 #define MAX_EOL_CHECK_COUNT 3 3231 #define MAX_EOL_CHECK_COUNT 3
2839 3232
2840 int 3233 static int
2841 detect_eol_type (src, src_bytes) 3234 detect_eol_type (source, src_bytes, skip)
2842 unsigned char *src; 3235 unsigned char *source;
2843 int src_bytes; 3236 int src_bytes, *skip;
2844 { 3237 {
2845 unsigned char *src_end = src + src_bytes; 3238 unsigned char *src = source, *src_end = src + src_bytes;
2846 unsigned char c; 3239 unsigned char c;
2847 int total = 0; /* How many end-of-lines are found so far. */ 3240 int total = 0; /* How many end-of-lines are found so far. */
2848 int eol_type = CODING_EOL_UNDECIDED; 3241 int eol_type = CODING_EOL_UNDECIDED;
2849 int this_eol_type; 3242 int this_eol_type;
2850 3243
3244 *skip = 0;
3245
2851 while (src < src_end && total < MAX_EOL_CHECK_COUNT) 3246 while (src < src_end && total < MAX_EOL_CHECK_COUNT)
2852 { 3247 {
2853 c = *src++; 3248 c = *src++;
2854 if (c == '\n' || c == '\r') 3249 if (c == '\n' || c == '\r')
2855 { 3250 {
3251 if (*skip == 0)
3252 *skip = src - 1 - source;
2856 total++; 3253 total++;
2857 if (c == '\n') 3254 if (c == '\n')
2858 this_eol_type = CODING_EOL_LF; 3255 this_eol_type = CODING_EOL_LF;
2859 else if (src >= src_end || *src != '\n') 3256 else if (src >= src_end || *src != '\n')
2860 this_eol_type = CODING_EOL_CR; 3257 this_eol_type = CODING_EOL_CR;
2863 3260
2864 if (eol_type == CODING_EOL_UNDECIDED) 3261 if (eol_type == CODING_EOL_UNDECIDED)
2865 /* This is the first end-of-line. */ 3262 /* This is the first end-of-line. */
2866 eol_type = this_eol_type; 3263 eol_type = this_eol_type;
2867 else if (eol_type != this_eol_type) 3264 else if (eol_type != this_eol_type)
2868 /* The found type is different from what found before. 3265 {
2869 Let's notice the caller about this inconsistency. */ 3266 /* The found type is different from what found before. */
2870 return CODING_EOL_INCONSISTENT; 3267 eol_type = CODING_EOL_INCONSISTENT;
2871 } 3268 break;
2872 } 3269 }
2873 3270 }
3271 }
3272
3273 if (*skip == 0)
3274 *skip = src_end - source;
2874 return eol_type; 3275 return eol_type;
2875 } 3276 }
2876 3277
2877 /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC 3278 /* Detect how end-of-line of a text of length SRC_BYTES pointed by SRC
2878 is encoded. If it detects an appropriate format of end-of-line, it 3279 is encoded. If it detects an appropriate format of end-of-line, it
2883 struct coding_system *coding; 3284 struct coding_system *coding;
2884 unsigned char *src; 3285 unsigned char *src;
2885 int src_bytes; 3286 int src_bytes;
2886 { 3287 {
2887 Lisp_Object val; 3288 Lisp_Object val;
2888 int eol_type = detect_eol_type (src, src_bytes); 3289 int skip;
3290 int eol_type = detect_eol_type (src, src_bytes, &skip);
3291
3292 if (coding->heading_ascii > skip)
3293 coding->heading_ascii = skip;
3294 else
3295 skip = coding->heading_ascii;
2889 3296
2890 if (eol_type == CODING_EOL_UNDECIDED) 3297 if (eol_type == CODING_EOL_UNDECIDED)
2891 /* We found no end-of-line in the source text. */
2892 return; 3298 return;
2893
2894 if (eol_type == CODING_EOL_INCONSISTENT) 3299 if (eol_type == CODING_EOL_INCONSISTENT)
2895 { 3300 {
2896 #if 0 3301 #if 0
2897 /* This code is suppressed until we find a better way to 3302 /* This code is suppressed until we find a better way to
2898 distinguish raw text file and binary file. */ 3303 distinguish raw text file and binary file. */
2909 eol_type = CODING_EOL_LF; 3314 eol_type = CODING_EOL_LF;
2910 } 3315 }
2911 3316
2912 val = Fget (coding->symbol, Qeol_type); 3317 val = Fget (coding->symbol, Qeol_type);
2913 if (VECTORP (val) && XVECTOR (val)->size == 3) 3318 if (VECTORP (val) && XVECTOR (val)->size == 3)
2914 setup_coding_system (XVECTOR (val)->contents[eol_type], coding); 3319 {
3320 setup_coding_system (XVECTOR (val)->contents[eol_type], coding);
3321 coding->heading_ascii = skip;
3322 }
3323 }
3324
3325 #define CONVERSION_BUFFER_EXTRA_ROOM 256
3326
3327 #define DECODING_BUFFER_MAG(coding) \
3328 (coding->type == coding_type_iso2022 \
3329 ? 3 \
3330 : ((coding->type == coding_type_sjis || coding->type == coding_type_big5) \
3331 ? 2 \
3332 : (coding->type == coding_type_raw_text \
3333 ? 1 \
3334 : (coding->type == coding_type_ccl \
3335 ? coding->spec.ccl.decoder.buf_magnification \
3336 : 2))))
3337
3338 /* Return maximum size (bytes) of a buffer enough for decoding
3339 SRC_BYTES of text encoded in CODING. */
3340
3341 int
3342 decoding_buffer_size (coding, src_bytes)
3343 struct coding_system *coding;
3344 int src_bytes;
3345 {
3346 return (src_bytes * DECODING_BUFFER_MAG (coding)
3347 + CONVERSION_BUFFER_EXTRA_ROOM);
3348 }
3349
3350 /* Return maximum size (bytes) of a buffer enough for encoding
3351 SRC_BYTES of text to CODING. */
3352
3353 int
3354 encoding_buffer_size (coding, src_bytes)
3355 struct coding_system *coding;
3356 int src_bytes;
3357 {
3358 int magnification;
3359
3360 if (coding->type == coding_type_ccl)
3361 magnification = coding->spec.ccl.encoder.buf_magnification;
3362 else
3363 magnification = 3;
3364
3365 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM);
3366 }
3367
3368 #ifndef MINIMUM_CONVERSION_BUFFER_SIZE
3369 #define MINIMUM_CONVERSION_BUFFER_SIZE 1024
3370 #endif
3371
3372 char *conversion_buffer;
3373 int conversion_buffer_size;
3374
3375 /* Return a pointer to a buffer of SIZE bytes to be used for encoding
3376 or decoding. Sufficient memory is allocated automatically. If we
3377 run out of memory, return NULL. */
3378
3379 char *
3380 get_conversion_buffer (size)
3381 int size;
3382 {
3383 if (size > conversion_buffer_size)
3384 {
3385 char *buf;
3386 int real_size = conversion_buffer_size * 2;
3387
3388 while (real_size < size) real_size *= 2;
3389 buf = (char *) xmalloc (real_size);
3390 xfree (conversion_buffer);
3391 conversion_buffer = buf;
3392 conversion_buffer_size = real_size;
3393 }
3394 return conversion_buffer;
3395 }
3396
3397 int
3398 ccl_coding_driver (coding, source, destination, src_bytes, dst_bytes, encodep)
3399 struct coding_system *coding;
3400 unsigned char *source, *destination;
3401 int src_bytes, dst_bytes, encodep;
3402 {
3403 struct ccl_program *ccl
3404 = encodep ? &coding->spec.ccl.encoder : &coding->spec.ccl.decoder;
3405 int result;
3406
3407 coding->produced = ccl_driver (ccl, source, destination,
3408 src_bytes, dst_bytes, &(coding->consumed));
3409 if (encodep)
3410 {
3411 coding->produced_char = coding->produced;
3412 coding->consumed_char
3413 = multibyte_chars_in_text (source, coding->consumed);
3414 }
3415 else
3416 {
3417 coding->produced_char
3418 = multibyte_chars_in_text (destination, coding->produced);
3419 coding->consumed_char = coding->consumed;
3420 }
3421 switch (ccl->status)
3422 {
3423 case CCL_STAT_SUSPEND_BY_SRC:
3424 result = CODING_FINISH_INSUFFICIENT_SRC;
3425 break;
3426 case CCL_STAT_SUSPEND_BY_DST:
3427 result = CODING_FINISH_INSUFFICIENT_DST;
3428 break;
3429 default:
3430 result = CODING_FINISH_NORMAL;
3431 break;
3432 }
3433 return result;
2915 } 3434 }
2916 3435
2917 /* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before 3436 /* See "GENERAL NOTES about `decode_coding_XXX ()' functions". Before
2918 decoding, it may detect coding system and format of end-of-line if 3437 decoding, it may detect coding system and format of end-of-line if
2919 those are not yet decided. */ 3438 those are not yet decided. */
2920 3439
2921 int 3440 int
2922 decode_coding (coding, source, destination, src_bytes, dst_bytes, consumed) 3441 decode_coding (coding, source, destination, src_bytes, dst_bytes)
2923 struct coding_system *coding; 3442 struct coding_system *coding;
2924 unsigned char *source, *destination; 3443 unsigned char *source, *destination;
2925 int src_bytes, dst_bytes; 3444 int src_bytes, dst_bytes;
2926 int *consumed; 3445 {
2927 { 3446 int result;
2928 int produced;
2929 3447
2930 if (src_bytes <= 0) 3448 if (src_bytes <= 0)
2931 { 3449 {
2932 *consumed = 0; 3450 coding->produced = coding->produced_char = 0;
2933 return 0; 3451 coding->consumed = coding->consumed_char = 0;
3452 return CODING_FINISH_NORMAL;
2934 } 3453 }
2935 3454
2936 if (coding->type == coding_type_undecided) 3455 if (coding->type == coding_type_undecided)
2937 detect_coding (coding, source, src_bytes); 3456 detect_coding (coding, source, src_bytes);
2938 3457
2939 if (coding->eol_type == CODING_EOL_UNDECIDED) 3458 if (coding->eol_type == CODING_EOL_UNDECIDED)
2940 detect_eol (coding, source, src_bytes); 3459 detect_eol (coding, source, src_bytes);
2941 3460
2942 coding->carryover_size = 0;
2943 switch (coding->type) 3461 switch (coding->type)
2944 { 3462 {
2945 case coding_type_no_conversion:
2946 label_no_conversion:
2947 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes;
2948 bcopy (source, destination, produced);
2949 *consumed = produced;
2950 break;
2951
2952 case coding_type_emacs_mule: 3463 case coding_type_emacs_mule:
2953 case coding_type_undecided: 3464 case coding_type_undecided:
2954 case coding_type_raw_text: 3465 case coding_type_raw_text:
2955 if (coding->eol_type == CODING_EOL_LF 3466 if (coding->eol_type == CODING_EOL_LF
2956 || coding->eol_type == CODING_EOL_UNDECIDED) 3467 || coding->eol_type == CODING_EOL_UNDECIDED)
2957 goto label_no_conversion; 3468 goto label_no_conversion;
2958 produced = decode_eol (coding, source, destination, 3469 result = decode_eol (coding, source, destination, src_bytes, dst_bytes);
2959 src_bytes, dst_bytes, consumed);
2960 break; 3470 break;
2961 3471
2962 case coding_type_sjis: 3472 case coding_type_sjis:
2963 produced = decode_coding_sjis_big5 (coding, source, destination, 3473 result = decode_coding_sjis_big5 (coding, source, destination,
2964 src_bytes, dst_bytes, consumed, 3474 src_bytes, dst_bytes, 1);
2965 1);
2966 break; 3475 break;
2967 3476
2968 case coding_type_iso2022: 3477 case coding_type_iso2022:
2969 produced = decode_coding_iso2022 (coding, source, destination, 3478 result = decode_coding_iso2022 (coding, source, destination,
2970 src_bytes, dst_bytes, consumed); 3479 src_bytes, dst_bytes);
2971 break; 3480 break;
2972 3481
2973 case coding_type_big5: 3482 case coding_type_big5:
2974 produced = decode_coding_sjis_big5 (coding, source, destination, 3483 result = decode_coding_sjis_big5 (coding, source, destination,
2975 src_bytes, dst_bytes, consumed, 3484 src_bytes, dst_bytes, 0);
2976 0);
2977 break; 3485 break;
2978 3486
2979 case coding_type_ccl: 3487 case coding_type_ccl:
2980 produced = ccl_driver (&coding->spec.ccl.decoder, source, destination, 3488 result = ccl_coding_driver (coding, source, destination,
2981 src_bytes, dst_bytes, consumed); 3489 src_bytes, dst_bytes, 0);
2982 break; 3490 break;
2983 } 3491
2984 3492 default: /* i.e. case coding_type_no_conversion: */
2985 return produced; 3493 label_no_conversion:
3494 if (dst_bytes && src_bytes > dst_bytes)
3495 {
3496 coding->produced = dst_bytes;
3497 result = CODING_FINISH_INSUFFICIENT_DST;
3498 }
3499 else
3500 {
3501 coding->produced = src_bytes;
3502 result = CODING_FINISH_NORMAL;
3503 }
3504 if (dst_bytes)
3505 bcopy (source, destination, coding->produced);
3506 else
3507 safe_bcopy (source, destination, coding->produced);
3508 coding->consumed
3509 = coding->consumed_char = coding->produced_char = coding->produced;
3510 break;
3511 }
3512
3513 return result;
2986 } 3514 }
2987 3515
2988 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". */ 3516 /* See "GENERAL NOTES about `encode_coding_XXX ()' functions". */
2989 3517
2990 int 3518 int
2991 encode_coding (coding, source, destination, src_bytes, dst_bytes, consumed) 3519 encode_coding (coding, source, destination, src_bytes, dst_bytes)
2992 struct coding_system *coding; 3520 struct coding_system *coding;
2993 unsigned char *source, *destination; 3521 unsigned char *source, *destination;
2994 int src_bytes, dst_bytes; 3522 int src_bytes, dst_bytes;
2995 int *consumed; 3523 {
2996 { 3524 int result;
2997 int produced; 3525
3526 if (src_bytes <= 0)
3527 {
3528 coding->produced = coding->produced_char = 0;
3529 coding->consumed = coding->consumed_char = 0;
3530 return CODING_FINISH_NORMAL;
3531 }
2998 3532
2999 switch (coding->type) 3533 switch (coding->type)
3000 { 3534 {
3001 case coding_type_no_conversion:
3002 label_no_conversion:
3003 produced = (src_bytes > dst_bytes) ? dst_bytes : src_bytes;
3004 if (produced > 0)
3005 {
3006 bcopy (source, destination, produced);
3007 if (coding->selective)
3008 {
3009 unsigned char *p = destination, *pend = destination + produced;
3010 while (p < pend)
3011 if (*p++ == '\015') p[-1] = '\n';
3012 }
3013 }
3014 *consumed = produced;
3015 break;
3016
3017 case coding_type_emacs_mule: 3535 case coding_type_emacs_mule:
3018 case coding_type_undecided: 3536 case coding_type_undecided:
3019 case coding_type_raw_text: 3537 case coding_type_raw_text:
3020 if (coding->eol_type == CODING_EOL_LF 3538 if (coding->eol_type == CODING_EOL_LF
3021 || coding->eol_type == CODING_EOL_UNDECIDED) 3539 || coding->eol_type == CODING_EOL_UNDECIDED)
3022 goto label_no_conversion; 3540 goto label_no_conversion;
3023 produced = encode_eol (coding, source, destination, 3541 result = encode_eol (coding, source, destination, src_bytes, dst_bytes);
3024 src_bytes, dst_bytes, consumed);
3025 break; 3542 break;
3026 3543
3027 case coding_type_sjis: 3544 case coding_type_sjis:
3028 produced = encode_coding_sjis_big5 (coding, source, destination, 3545 result = encode_coding_sjis_big5 (coding, source, destination,
3029 src_bytes, dst_bytes, consumed, 3546 src_bytes, dst_bytes, 1);
3030 1);
3031 break; 3547 break;
3032 3548
3033 case coding_type_iso2022: 3549 case coding_type_iso2022:
3034 produced = encode_coding_iso2022 (coding, source, destination, 3550 result = encode_coding_iso2022 (coding, source, destination,
3035 src_bytes, dst_bytes, consumed); 3551 src_bytes, dst_bytes);
3036 break; 3552 break;
3037 3553
3038 case coding_type_big5: 3554 case coding_type_big5:
3039 produced = encode_coding_sjis_big5 (coding, source, destination, 3555 result = encode_coding_sjis_big5 (coding, source, destination,
3040 src_bytes, dst_bytes, consumed, 3556 src_bytes, dst_bytes, 0);
3041 0);
3042 break; 3557 break;
3043 3558
3044 case coding_type_ccl: 3559 case coding_type_ccl:
3045 produced = ccl_driver (&coding->spec.ccl.encoder, source, destination, 3560 result = ccl_coding_driver (coding, source, destination,
3046 src_bytes, dst_bytes, consumed); 3561 src_bytes, dst_bytes, 1);
3047 break; 3562 break;
3048 } 3563
3049 3564 default: /* i.e. case coding_type_no_conversion: */
3050 return produced; 3565 label_no_conversion:
3051 } 3566 if (dst_bytes && src_bytes > dst_bytes)
3052 3567 {
3053 #define CONVERSION_BUFFER_EXTRA_ROOM 256 3568 coding->produced = dst_bytes;
3054 3569 result = CODING_FINISH_INSUFFICIENT_DST;
3055 /* Return maximum size (bytes) of a buffer enough for decoding 3570 }
3056 SRC_BYTES of text encoded in CODING. */ 3571 else
3572 {
3573 coding->produced = src_bytes;
3574 result = CODING_FINISH_NORMAL;
3575 }
3576 if (dst_bytes)
3577 bcopy (source, destination, coding->produced);
3578 else
3579 safe_bcopy (source, destination, coding->produced);
3580 if (coding->mode & CODING_MODE_SELECTIVE_DISPLAY)
3581 {
3582 unsigned char *p = destination, *pend = p + coding->produced;
3583 while (p < pend)
3584 if (*p++ == '\015') p[-1] = '\n';
3585 }
3586 coding->consumed
3587 = coding->consumed_char = coding->produced_char = coding->produced;
3588 break;
3589 }
3590
3591 return result;
3592 }
3593
3594 /* Scan text in the region between *BEG and *END, skip characters
3595 which we don't have to decode by coding system CODING at the head
3596 and tail, then set *BEG and *END to the region of the text we
3597 actually have to convert.
3598
3599 If STR is not NULL, *BEG and *END are indices into STR. */
3600
3601 static void
3602 shrink_decoding_region (beg, end, coding, str)
3603 int *beg, *end;
3604 struct coding_system *coding;
3605 unsigned char *str;
3606 {
3607 unsigned char *begp_orig, *begp, *endp_orig, *endp;
3608 int eol_conversion;
3609
3610 if (coding->type == coding_type_ccl
3611 || coding->type == coding_type_undecided
3612 || !NILP (coding->post_read_conversion))
3613 {
3614 /* We can't skip any data. */
3615 return;
3616 }
3617 else if (coding->type == coding_type_no_conversion)
3618 {
3619 /* We need no conversion. */
3620 *beg = *end;
3621 return;
3622 }
3623
3624 if (coding->heading_ascii >= 0)
3625 /* Detection routine has already found how much we can skip at the
3626 head. */
3627 *beg += coding->heading_ascii;
3628
3629 if (str)
3630 {
3631 begp_orig = begp = str + *beg;
3632 endp_orig = endp = str + *end;
3633 }
3634 else
3635 {
3636 move_gap (*beg);
3637 begp_orig = begp = GAP_END_ADDR;
3638 endp_orig = endp = begp + *end - *beg;
3639 }
3640
3641 eol_conversion = (coding->eol_type != CODING_EOL_LF);
3642
3643 switch (coding->type)
3644 {
3645 case coding_type_emacs_mule:
3646 case coding_type_raw_text:
3647 if (eol_conversion)
3648 {
3649 if (coding->heading_ascii < 0)
3650 while (begp < endp && *begp != '\r') begp++;
3651 while (begp < endp && *(endp - 1) != '\r') endp--;
3652 }
3653 else
3654 begp = endp;
3655 break;
3656
3657 case coding_type_sjis:
3658 case coding_type_big5:
3659 /* We can skip all ASCII characters at the head. */
3660 if (coding->heading_ascii < 0)
3661 {
3662 if (eol_conversion)
3663 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++;
3664 else
3665 while (begp < endp && *begp < 0x80) begp++;
3666 }
3667 /* We can skip all ASCII characters at the tail except for the
3668 second byte of SJIS or BIG5 code. */
3669 if (eol_conversion)
3670 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
3671 else
3672 while (begp < endp && endp[-1] < 0x80) endp--;
3673 if (begp < endp && endp < endp_orig && endp[-1] >= 0x80)
3674 endp++;
3675 break;
3676
3677 default: /* i.e. case coding_type_iso2022: */
3678 if (coding->heading_ascii < 0)
3679 {
3680 unsigned char c;
3681
3682 /* We can skip all ASCII characters at the head except for a
3683 few control codes. */
3684 while (begp < endp && (c = *begp) < 0x80
3685 && c != ISO_CODE_CR && c != ISO_CODE_SO
3686 && c != ISO_CODE_SI && c != ISO_CODE_ESC
3687 && (!eol_conversion || c != ISO_CODE_LF))
3688 begp++;
3689 }
3690 switch (coding->category_idx)
3691 {
3692 case CODING_CATEGORY_IDX_ISO_8_1:
3693 case CODING_CATEGORY_IDX_ISO_8_2:
3694 /* We can skip all ASCII characters at the tail. */
3695 if (eol_conversion)
3696 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
3697 else
3698 while (begp < endp && endp[-1] < 0x80) endp--;
3699 break;
3700
3701 case CODING_CATEGORY_IDX_ISO_7:
3702 case CODING_CATEGORY_IDX_ISO_7_TIGHT:
3703 /* We can skip all characters at the tail except for ESC and
3704 the following 2 bytes at the tail. */
3705 if (eol_conversion)
3706 while (begp < endp && endp[-1] != ISO_CODE_ESC && endp[-1] != '\n')
3707 endp--;
3708 else
3709 while (begp < endp && endp[-1] != ISO_CODE_ESC)
3710 endp--;
3711 if (begp < endp && endp[-1] == ISO_CODE_ESC)
3712 {
3713 if (endp + 1 < endp_orig && endp[0] == '(' && endp[1] == 'B')
3714 /* This is an ASCII designation sequence. We can
3715 surely skip the tail. */
3716 endp += 2;
3717 else
3718 /* Hmmm, we can't skip the tail. */
3719 endp = endp_orig;
3720 }
3721 }
3722 }
3723 *beg += begp - begp_orig;
3724 *end += endp - endp_orig;
3725 return;
3726 }
3727
3728 /* Like shrink_decoding_region but for encoding. */
3729
3730 static void
3731 shrink_encoding_region (beg, end, coding, str)
3732 int *beg, *end;
3733 struct coding_system *coding;
3734 unsigned char *str;
3735 {
3736 unsigned char *begp_orig, *begp, *endp_orig, *endp;
3737 int eol_conversion;
3738
3739 if (coding->type == coding_type_ccl)
3740 /* We can't skip any data. */
3741 return;
3742 else if (coding->type == coding_type_no_conversion)
3743 {
3744 /* We need no conversion. */
3745 *beg = *end;
3746 return;
3747 }
3748
3749 if (str)
3750 {
3751 begp_orig = begp = str + *beg;
3752 endp_orig = endp = str + *end;
3753 }
3754 else
3755 {
3756 move_gap (*beg);
3757 begp_orig = begp = GAP_END_ADDR;
3758 endp_orig = endp = begp + *end - *beg;
3759 }
3760
3761 eol_conversion = (coding->eol_type == CODING_EOL_CR
3762 || coding->eol_type == CODING_EOL_CRLF);
3763
3764 /* Here, we don't have to check coding->pre_write_conversion because
3765 the caller is expected to have handled it already. */
3766 switch (coding->type)
3767 {
3768 case coding_type_undecided:
3769 case coding_type_emacs_mule:
3770 case coding_type_raw_text:
3771 if (eol_conversion)
3772 {
3773 while (begp < endp && *begp != '\n') begp++;
3774 while (begp < endp && endp[-1] != '\n') endp--;
3775 }
3776 else
3777 begp = endp;
3778 break;
3779
3780 case coding_type_iso2022:
3781 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL)
3782 {
3783 unsigned char *bol = begp;
3784 while (begp < endp && *begp < 0x80)
3785 {
3786 begp++;
3787 if (begp[-1] == '\n')
3788 bol = begp;
3789 }
3790 begp = bol;
3791 goto label_skip_tail;
3792 }
3793 /* fall down ... */
3794
3795 default:
3796 /* We can skip all ASCII characters at the head and tail. */
3797 if (eol_conversion)
3798 while (begp < endp && *begp < 0x80 && *begp != '\n') begp++;
3799 else
3800 while (begp < endp && *begp < 0x80) begp++;
3801 label_skip_tail:
3802 if (eol_conversion)
3803 while (begp < endp && endp[-1] < 0x80 && endp[-1] != '\n') endp--;
3804 else
3805 while (begp < endp && *(endp - 1) < 0x80) endp--;
3806 break;
3807 }
3808
3809 *beg += begp - begp_orig;
3810 *end += endp - endp_orig;
3811 return;
3812 }
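
The head-and-tail skipping done by the default case of shrink_encoding_region can be sketched in isolation as below. This is only an illustrative sketch, not code from coding.c: the helper name skip_ascii_head_and_tail and its size_t offset interface are assumptions made for the example, and it ignores the gap handling and the per-coding-type special cases.

```c
#include <stddef.h>

/* Hypothetical helper (not part of coding.c): narrow the byte range
   [*beg, *end) of BUF so that it excludes leading and trailing ASCII
   bytes, which conversion would copy unchanged.  If EOL_CONVERSION is
   nonzero, a '\n' must stay inside the range because it still needs
   to be converted.  */
static void
skip_ascii_head_and_tail (const unsigned char *buf, size_t *beg, size_t *end,
                          int eol_conversion)
{
  const unsigned char *begp = buf + *beg, *endp = buf + *end;

  /* Skip ASCII at the head.  */
  while (begp < endp && *begp < 0x80
         && !(eol_conversion && *begp == '\n'))
    begp++;
  /* Skip ASCII at the tail.  */
  while (begp < endp && endp[-1] < 0x80
         && !(eol_conversion && endp[-1] == '\n'))
    endp--;

  *beg = (size_t) (begp - buf);
  *end = (size_t) (endp - buf);
}
```

When the whole range is ASCII and no eol conversion is requested, the two offsets meet, which corresponds to the `from_byte == to_byte` early return in code_convert_region.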
3813
3814 /* Decode (if ENCODEP is zero) or encode (if ENCODEP is nonzero) the
3815 text from FROM to TO by coding system CODING, and return number of
3816 characters in the resulting text.
3817
3818 If ADJUST is nonzero, we do various things as if the original text
3819 is deleted and a new text is inserted. See the comments in
3820 replace_range (insdel.c) to know what we are doing.
3821
3822 ADJUST nonzero also means that post-read-conversion or
3823 pre-write-conversion functions (if any) should be processed. */
3057 3824
3058 int 3825 int
3059 decoding_buffer_size (coding, src_bytes) 3826 code_convert_region (from, to, coding, encodep, adjust)
3827 int from, to, encodep, adjust;
3060 struct coding_system *coding; 3828 struct coding_system *coding;
3061 int src_bytes; 3829 {
3062 { 3830 int len = to - from, require, inserted, inserted_byte;
3063 int magnification; 3831 int from_byte, to_byte, len_byte;
3064 3832 int from_byte_orig, to_byte_orig;
3065 if (coding->type == coding_type_iso2022) 3833 Lisp_Object saved_coding_symbol = Qnil;
3066 magnification = 3; 3834
3067 else if (coding->type == coding_type_ccl) 3835 if (adjust)
3068 magnification = coding->spec.ccl.decoder.buf_magnification; 3836 {
3837 prepare_to_modify_buffer (from, to, &from);
3838 to = from + len;
3839 }
3840 from_byte = CHAR_TO_BYTE (from); to_byte = CHAR_TO_BYTE (to);
3841 len_byte = to_byte - from_byte;
3842
3843 if (! encodep && CODING_REQUIRE_DETECTION (coding))
3844 {
3845 /* We must detect the encoding of text and eol. Even if the detection
3846 routines can't decide the encoding, we should not leave them
3847 undecided, because otherwise the deeper decoding routine (decode_coding)
3848 would try to detect the encodings again in vain. */
3849
3850 if (from < GPT && to > GPT)
3851 move_gap_both (from, from_byte);
3852 if (coding->type == coding_type_undecided)
3853 {
3854 detect_coding (coding, BYTE_POS_ADDR (from_byte), len_byte);
3855 if (coding->type == coding_type_undecided)
3856 coding->type = coding_type_emacs_mule;
3857 }
3858 if (coding->eol_type == CODING_EOL_UNDECIDED)
3859 {
3860 saved_coding_symbol = coding->symbol;
3861 detect_eol (coding, BYTE_POS_ADDR (from_byte), len_byte);
3862 if (coding->eol_type == CODING_EOL_UNDECIDED)
3863 coding->eol_type = CODING_EOL_LF;
3864 /* We had better recover the original eol format if we
3865 encounter an inconsistent eol format while decoding. */
3866 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
3867 }
3868 }
3869
3870 if (encodep
3871 ? ! CODING_REQUIRE_ENCODING (coding)
3872 : ! CODING_REQUIRE_DECODING (coding))
3873 return len;
3874
3875 /* Now we convert the text. */
3876
3877 /* For encoding, we must process pre-write-conversion in advance. */
3878 if (encodep
3879 && adjust
3880 && ! NILP (coding->pre_write_conversion)
3881 && SYMBOLP (coding->pre_write_conversion)
3882 && ! NILP (Ffboundp (coding->pre_write_conversion)))
3883 {
3884 /* The function in pre-write-conversion may put new text in a new
3885 buffer. */
3886 struct buffer *prev = current_buffer, *new;
3887
3888 call2 (coding->pre_write_conversion, from, to);
3889 if (current_buffer != prev)
3890 {
3891 len = ZV - BEGV;
3892 new = current_buffer;
3893 set_buffer_internal_1 (prev);
3894 del_range (from, to);
3895 insert_from_buffer (new, BEG, len, 0);
3896 to = from + len;
3897 to_byte = CHAR_TO_BYTE (to);
3898 len_byte = to_byte - from_byte;
3899 }
3900 }
3901
3902 /* Try to skip the leading and trailing ASCII characters. */
3903 from_byte_orig = from_byte; to_byte_orig = to_byte;
3904 if (encodep)
3905 shrink_encoding_region (&from_byte, &to_byte, coding, NULL);
3069 else 3906 else
3070 magnification = 2; 3907 shrink_decoding_region (&from_byte, &to_byte, coding, NULL);
3071 3908 if (from_byte == to_byte)
3072 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM); 3909 return len;
3073 } 3910 /* Here, the excluded region by shrinking contains only ASCIIs. */
3074 3911 from += (from_byte - from_byte_orig);
3075 /* Return maximum size (bytes) of a buffer enough for encoding 3912 to += (to_byte - to_byte_orig);
3076 SRC_BYTES of text to CODING. */ 3913 len = to - from;
3077 3914 len_byte = to_byte - from_byte;
3078 int 3915
3079 encoding_buffer_size (coding, src_bytes) 3916 /* For conversion, we must put the gap before the text to be decoded
3917 in addition to making the gap larger for efficient decoding. The
3918 required gap size starts from 2000, which is the magic number used
3919 in make_gap. But, after one batch of conversion, it will be
3920 incremented if we find that it is not enough. */
3921 require = 2000;
3922
3923 if (GAP_SIZE < require)
3924 make_gap (require - GAP_SIZE);
3925 move_gap_both (from, from_byte);
3926
3927 if (adjust)
3928 adjust_before_replace (from, from_byte, to, to_byte);
3929
3930 if (GPT - BEG < beg_unchanged)
3931 beg_unchanged = GPT - BEG;
3932 if (Z - GPT < end_unchanged)
3933 end_unchanged = Z - GPT;
3934
3935 inserted = inserted_byte = 0;
3936 for (;;)
3937 {
3938 int result, diff_char, diff_byte;
3939
3940 /* The buffer memory is changed from:
3941 +--------+converted-text+------------+-----original-text-----+---+
3942 |<-from->|<--inserted-->|<-GAP_SIZE->|<---------len--------->|---| */
3943
3944 if (encodep)
3945 result = encode_coding (coding, GAP_END_ADDR, GPT_ADDR, len_byte, 0);
3946 else
3947 result = decode_coding (coding, GAP_END_ADDR, GPT_ADDR, len_byte, 0);
3948 /* to:
3949 +--------+-------converted-text--------+--+---original-text--+---+
3950 |<-from->|<----(inserted+produced)---->|--|<-(len-consumed)->|---| */
3951
3952 diff_char = coding->produced_char - coding->consumed_char;
3953 diff_byte = coding->produced - coding->consumed;
3954
3955 GAP_SIZE -= diff_byte;
3956 ZV += diff_char; ZV_BYTE += diff_byte;
3957 Z += diff_char; Z_BYTE += diff_byte;
3958 GPT += coding->produced_char; GPT_BYTE += coding->produced;
3959
3960 inserted += coding->produced_char;
3961 inserted_byte += coding->produced;
3962 len -= coding->consumed_char;
3963 len_byte -= coding->consumed;
3964
3965 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
3966 {
3967 unsigned char *p = GPT_ADDR - inserted_byte, *pend = GPT_ADDR;
3968
3969 /* Encode LFs back to the original eol format (CR or CRLF). */
3970 if (coding->eol_type == CODING_EOL_CR)
3971 {
3972 while (p < pend) if (*p++ == '\n') p[-1] = '\r';
3973 }
3974 else
3975 {
3976 unsigned char *p2 = p;
3977 int count = 0;
3978
3979 while (p2 < pend) if (*p2++ == '\n') count++;
3980 if (GAP_SIZE < count)
3981 make_gap (count - GAP_SIZE);
3982 p2 = GPT_ADDR + count;
3983 while (p < pend)
3984 {
3985 *--p2 = *--pend;
3986 if (*pend == '\n') *--p2 = '\r';
3987 }
3988 GPT += count; GAP_SIZE -= count; ZV += count; Z += count;
3989 ZV_BYTE += count; Z_BYTE += count;
3990 coding->produced += count;
3991 coding->produced_char += count;
3992 inserted += count;
3993 inserted_byte += count;
3994 }
3995
3996 /* Suppress eol-format conversion in the further conversion. */
3997 coding->eol_type = CODING_EOL_LF;
3998
3999 /* Restore the original symbol. */
4000 coding->symbol = saved_coding_symbol;
4001 }
4002 if (len_byte <= 0)
4003 break;
4004 if (result == CODING_FINISH_INSUFFICIENT_SRC)
4005 {
4006 /* The source text ends in invalid codes. Let's just
4007 make them valid buffer contents, and finish conversion. */
4008 inserted += len;
4009 inserted_byte += len_byte;
4010 break;
4011 }
4012 if (inserted == coding->produced_char)
4013 /* We have just done the first batch of conversion. Let's
4014 reconsider the required gap size now.
4015
4016 We have converted CONSUMED bytes into PRODUCED bytes. To
4017 convert the remaining LEN bytes, we may need REQUIRE bytes
4018 of gap, where:
4019 REQUIRE + LEN = (LEN * PRODUCED / CONSUMED)
4020 REQUIRE = LEN * (PRODUCED - CONSUMED) / CONSUMED
4021 = LEN * DIFF / CONSUMED
4022 Here, we are sure that DIFF is positive. */
4023 require = len_byte * diff_byte / coding->consumed;
4024 if (GAP_SIZE < require)
4025 make_gap (require - GAP_SIZE);
4026 }
4027 if (GAP_SIZE > 0) *GPT_ADDR = 0; /* Put an anchor. */
4028
4029 if (adjust)
4030 {
4031 adjust_after_replace (from, from_byte, to, to_byte,
4032 inserted, inserted_byte);
4033
4034 if (! encodep && ! NILP (coding->post_read_conversion))
4035 {
4036 Lisp_Object val;
4037 int orig_inserted = inserted, pos = PT;
4038
4039 temp_set_point_both (current_buffer, from, from_byte);
4040 val = call1 (coding->post_read_conversion, make_number (inserted));
4041 if (! NILP (val))
4042 {
4043 CHECK_NUMBER (val, 0);
4044 inserted = XFASTINT (val);
4045 }
4046 if (pos >= from + orig_inserted)
4047 temp_set_point (current_buffer, pos + (inserted - orig_inserted));
4048 }
4049 }
4050
4051 return ((from_byte - from_byte_orig) + inserted + (to_byte_orig - to_byte));
4052 }
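
The gap-size estimate described in the comment inside the conversion loop (REQUIRE = LEN * DIFF / CONSUMED) can be checked with a small standalone function. This is a sketch under assumptions: estimate_required_gap is a hypothetical name, it uses plain long arithmetic, and it returns 0 when the first batch did not grow the text, a guard that the original loop does not spell out.

```c
/* Hypothetical helper (not part of coding.c): after a first batch that
   turned CONSUMED source bytes into PRODUCED result bytes, estimate how
   many extra gap bytes the remaining REMAINING_BYTES of source may need,
   assuming the same growth ratio holds for the rest of the text.  */
static long
estimate_required_gap (long remaining_bytes, long produced, long consumed)
{
  long diff = produced - consumed;

  /* No growth observed (or nothing consumed): no extra room needed.  */
  if (consumed <= 0 || diff <= 0)
    return 0;
  return remaining_bytes * diff / consumed;
}
```

For instance, if the first batch consumed 1000 bytes and produced 1500, and 3000 bytes remain, the estimate is 3000 * 500 / 1000 = 1500 extra bytes of gap.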
4053
4054 Lisp_Object
4055 code_convert_string (str, coding, encodep, nocopy)
4056 Lisp_Object str;
3080 struct coding_system *coding; 4057 struct coding_system *coding;
3081 int src_bytes; 4058 int encodep, nocopy;
3082 { 4059 {
3083 int magnification; 4060 int len;
3084 4061 char *buf;
3085 if (coding->type == coding_type_ccl) 4062 int from = 0, to = XSTRING (str)->size, to_byte = XSTRING (str)->size_byte;
3086 magnification = coding->spec.ccl.encoder.buf_magnification; 4063 struct gcpro gcpro1;
4064 Lisp_Object saved_coding_symbol = Qnil;
4065 int result;
4066
4067 if (encodep && !NILP (coding->pre_write_conversion)
4068 || !encodep && !NILP (coding->post_read_conversion))
4069 {
4070 /* Since we have to call Lisp functions which assume target text
4071 is in a buffer, after setting a temporary buffer, call
4072 code_convert_region. */
4073 int count = specpdl_ptr - specpdl;
4074 struct buffer *prev = current_buffer;
4075
4076 record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
4077 temp_output_buffer_setup (" *code-converting-work*");
4078 set_buffer_internal (XBUFFER (Vstandard_output));
4079 if (encodep)
4080 insert_from_string (str, 0, 0, to, to_byte, 0);
4081 else
4082 {
4083 /* We must insert the contents of STR as is without
4084 unibyte<->multibyte conversion. */
4085 current_buffer->enable_multibyte_characters = Qnil;
4086 insert_from_string (str, 0, 0, to_byte, to_byte, 0);
4087 current_buffer->enable_multibyte_characters = Qt;
4088 }
4089 code_convert_region (BEGV, ZV, coding, encodep, 1);
4090 if (encodep)
4091 /* We must return the buffer contents as unibyte string. */
4092 current_buffer->enable_multibyte_characters = Qnil;
4093 str = make_buffer_string (BEGV, ZV, 0);
4094 set_buffer_internal (prev);
4095 return unbind_to (count, str);
4096 }
4097
4098 if (! encodep && CODING_REQUIRE_DETECTION (coding))
4099 {
4100 /* See the comments in code_convert_region. */
4101 if (coding->type == coding_type_undecided)
4102 {
4103 detect_coding (coding, XSTRING (str)->data, to_byte);
4104 if (coding->type == coding_type_undecided)
4105 coding->type = coding_type_emacs_mule;
4106 }
4107 if (coding->eol_type == CODING_EOL_UNDECIDED)
4108 {
4109 saved_coding_symbol = coding->symbol;
4110 detect_eol (coding, XSTRING (str)->data, to_byte);
4111 if (coding->eol_type == CODING_EOL_UNDECIDED)
4112 coding->eol_type = CODING_EOL_LF;
4113 /* We had better recover the original eol format if we
4114 encounter an inconsistent eol format while decoding. */
4115 coding->mode |= CODING_MODE_INHIBIT_INCONSISTENT_EOL;
4116 }
4117 }
4118
4119 if (encodep
4120 ? ! CODING_REQUIRE_ENCODING (coding)
4121 : ! CODING_REQUIRE_DECODING (coding))
4122 from = to_byte;
3087 else 4123 else
3088 magnification = 3; 4124 {
3089 4125 /* Try to skip the leading and trailing ASCII characters. */
3090 return (src_bytes * magnification + CONVERSION_BUFFER_EXTRA_ROOM); 4126 if (encodep)
3091 } 4127 shrink_encoding_region (&from, &to_byte, coding, XSTRING (str)->data);
3092 4128 else
3093 #ifndef MINIMUM_CONVERSION_BUFFER_SIZE 4129 shrink_decoding_region (&from, &to_byte, coding, XSTRING (str)->data);
3094 #define MINIMUM_CONVERSION_BUFFER_SIZE 1024 4130 }
3095 #endif 4131 if (from == to_byte)
3096 4132 return (nocopy ? str : Fcopy_sequence (str));
3097 char *conversion_buffer; 4133
3098 int conversion_buffer_size; 4134 if (encodep)
3099 4135 len = encoding_buffer_size (coding, to_byte - from);
3100 /* Return a pointer to a SIZE bytes of buffer to be used for encoding 4136 else
3101 or decoding. Sufficient memory is allocated automatically. If we 4137 len = decoding_buffer_size (coding, to_byte - from);
3102 run out of memory, return NULL. */ 4138 len += from + XSTRING (str)->size_byte - to_byte;
3103 4139 GCPRO1 (str);
3104 char * 4140 buf = get_conversion_buffer (len);
3105 get_conversion_buffer (size) 4141 UNGCPRO;
3106 int size; 4142
3107 { 4143 if (from > 0)
3108 if (size > conversion_buffer_size) 4144 bcopy (XSTRING (str)->data, buf, from);
3109 { 4145 result = (encodep
3110 char *buf; 4146 ? encode_coding (coding, XSTRING (str)->data + from,
3111 int real_size = conversion_buffer_size * 2; 4147 buf + from, to_byte - from, len)
3112 4148 : decode_coding (coding, XSTRING (str)->data + from,
3113 while (real_size < size) real_size *= 2; 4149 buf + from, to_byte - from, len));
3114 buf = (char *) xmalloc (real_size); 4150 if (! encodep && result == CODING_FINISH_INCONSISTENT_EOL)
3115 xfree (conversion_buffer); 4151 {
3116 conversion_buffer = buf; 4152 /* We simply try to decode the whole string again but without
3117 conversion_buffer_size = real_size; 4153 eol-conversion this time. */
3118 } 4154 coding->eol_type = CODING_EOL_LF;
3119 return conversion_buffer; 4155 coding->symbol = saved_coding_symbol;
4156 return code_convert_string (str, coding, encodep, nocopy);
4157 }
4158
4159 bcopy (XSTRING (str)->data + to_byte, buf + from + coding->produced,
4160 XSTRING (str)->size_byte - to_byte);
4161
4162 len = from + XSTRING (str)->size_byte - to_byte;
4163 if (encodep)
4164 str = make_unibyte_string (buf, len + coding->produced);
4165 else
4166 str = make_multibyte_string (buf, len + coding->produced_char,
4167 len + coding->produced);
4168 return str;
3120 } 4169 }
3121 4170
3122 4171
3123 #ifdef emacs 4172 #ifdef emacs
3124 /*** 7. Emacs Lisp library functions ***/ 4173 /*** 7. Emacs Lisp library functions ***/
3185 return coding_system; 4234 return coding_system;
3186 while (1) 4235 while (1)
3187 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil)); 4236 Fsignal (Qcoding_system_error, Fcons (coding_system, Qnil));
3188 } 4237 }
3189 4238
3190 DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region, 4239 Lisp_Object
3191 2, 2, 0, 4240 detect_coding_system (src, src_bytes, highest)
3192 "Detect coding system of the text in the region between START and END.\n\ 4241 unsigned char *src;
3193 Return a list of possible coding systems ordered by priority.\n\ 4242 int src_bytes, highest;
3194 If only ASCII characters are found, it returns `undecided'\n\
3195 or its subsidiary coding system according to a detected end-of-line format.")
3196 (b, e)
3197 Lisp_Object b, e;
3198 { 4243 {
3199 int coding_mask, eol_type; 4244 int coding_mask, eol_type;
3200 Lisp_Object val; 4245 Lisp_Object val, tmp;
3201 int beg, end; 4246 int dummy;
3202 int beg_byte, end_byte; 4247
3203 4248 coding_mask = detect_coding_mask (src, src_bytes, NULL, &dummy);
3204 validate_region (&b, &e); 4249 eol_type = detect_eol_type (src, src_bytes, &dummy);
3205 beg = XINT (b), end = XINT (e); 4250 if (eol_type == CODING_EOL_INCONSISTENT)
3206 beg_byte = CHAR_TO_BYTE (beg); 4251 eol_type = CODING_EOL_UNDECIDED;
3207 end_byte = CHAR_TO_BYTE (end); 4252
3208 4253 if (!coding_mask)
3209 if (beg < GPT && end >= GPT)
3210 move_gap_both (end, end_byte);
3211
3212 coding_mask = detect_coding_mask (BYTE_POS_ADDR (beg_byte),
3213 end_byte - beg_byte);
3214 eol_type = detect_eol_type (BYTE_POS_ADDR (beg_byte), end_byte - beg_byte);
3215
3216 if (coding_mask == CODING_CATEGORY_MASK_ANY)
3217 { 4254 {
3218 val = Qundecided; 4255 val = Qundecided;
3219 if (eol_type != CODING_EOL_UNDECIDED 4256 if (eol_type != CODING_EOL_UNDECIDED)
3220 && eol_type != CODING_EOL_INCONSISTENT)
3221 { 4257 {
3222 Lisp_Object val2; 4258 Lisp_Object val2;
3223 val2 = Fget (Qundecided, Qeol_type); 4259 val2 = Fget (Qundecided, Qeol_type);
3224 if (VECTORP (val2)) 4260 if (VECTORP (val2))
3225 val = XVECTOR (val2)->contents[eol_type]; 4261 val = XVECTOR (val2)->contents[eol_type];
3226 } 4262 }
3227 } 4263 return val;
3228 else 4264 }
3229 { 4265
3230 Lisp_Object val2; 4266 /* At first, gather possible coding systems in VAL. */
3231 4267 val = Qnil;
3232 /* At first, gather possible coding-systems in VAL in a reverse 4268 for (tmp = Vcoding_category_list; !NILP (tmp); tmp = XCONS (tmp)->cdr)
3233 order. */ 4269 {
3234 val = Qnil; 4270 int idx
3235 for (val2 = Vcoding_category_list; 4271 = XFASTINT (Fget (XCONS (tmp)->car, Qcoding_category_index));
3236 !NILP (val2); 4272 if (coding_mask & (1 << idx))
3237 val2 = XCONS (val2)->cdr) 4273 {
3238 { 4274 val = Fcons (Fsymbol_value (XCONS (tmp)->car), val);
3239 int idx 4275 if (highest)
3240 = XFASTINT (Fget (XCONS (val2)->car, Qcoding_category_index)); 4276 break;
3241 if (coding_mask & (1 << idx)) 4277 }
3242 { 4278 }
3243 #if 0 4279 if (!highest)
3244 /* This code is suppressed until we find a better way to 4280 val = Fnreverse (val);
3245 distinguish raw text file and binary file. */ 4281
3246 4282 /* Then, substitute the elements by subsidiary coding systems. */
3247 if (idx == CODING_CATEGORY_IDX_RAW_TEXT 4283 for (tmp = val; !NILP (tmp); tmp = XCONS (tmp)->cdr)
3248 && eol_type == CODING_EOL_INCONSISTENT) 4284 {
3249 val = Fcons (Qno_conversion, val); 4285 if (eol_type != CODING_EOL_UNDECIDED)
3250 else 4286 {
3251 #endif /* 0 */ 4287 Lisp_Object eol;
3252 val = Fcons (Fsymbol_value (XCONS (val2)->car), val); 4288 eol = Fget (XCONS (tmp)->car, Qeol_type);
3253 } 4289 if (VECTORP (eol))
3254 } 4290 XCONS (tmp)->car = XVECTOR (eol)->contents[eol_type];
3255 4291 }
3256 /* Then, change the order of the list, while getting subsidiary 4292 }
3257 coding-systems. */ 4293 return (highest ? XCONS (val)->car : val);
3258 val2 = val; 4294 }
3259 val = Qnil; 4295
3260 if (eol_type == CODING_EOL_INCONSISTENT) 4296 DEFUN ("detect-coding-region", Fdetect_coding_region, Sdetect_coding_region,
3261 eol_type == CODING_EOL_UNDECIDED; 4297 2, 3, 0,
3262 for (; !NILP (val2); val2 = XCONS (val2)->cdr) 4298 "Detect coding system of the text in the region between START and END.\n\
3263 { 4299 Return a list of possible coding systems ordered by priority.\n\
3264 if (eol_type == CODING_EOL_UNDECIDED) 4300 \n\
3265 val = Fcons (XCONS (val2)->car, val); 4301 If only ASCII characters are found, it returns `undecided'\n\
3266 else 4302 or its subsidiary coding system according to a detected end-of-line format.\n\
3267 { 4303 \n\
3268 Lisp_Object val3; 4304 If optional argument HIGHEST is non-nil, return the coding system of\n\
3269 val3 = Fget (XCONS (val2)->car, Qeol_type); 4305 highest priority.")
3270 if (VECTORP (val3)) 4306 (start, end, highest)
3271 val = Fcons (XVECTOR (val3)->contents[eol_type], val); 4307 Lisp_Object start, end, highest;
3272 else 4308 {
3273 val = Fcons (XCONS (val2)->car, val); 4309 int from, to;
3274 } 4310 int from_byte, to_byte;
3275 } 4311
3276 } 4312 CHECK_NUMBER_COERCE_MARKER (start, 0);
3277 4313 CHECK_NUMBER_COERCE_MARKER (end, 1);
3278 return val; 4314
3279 } 4315 validate_region (&start, &end);
3280 4316 from = XINT (start), to = XINT (end);
3281 /* Scan text in the region between *BEGP and *ENDP, skip characters 4317 from_byte = CHAR_TO_BYTE (from);
3282 which we never have to encode to (iff ENCODEP is 1) or decode from 4318 to_byte = CHAR_TO_BYTE (to);
3283 coding system CODING at the head and tail, then set BEGP and ENDP 4319
3284 to the addresses of start and end of the text we actually convert. */ 4320 if (from < GPT && to >= GPT)
3285 4321 move_gap_both (to, to_byte);
3286 void 4322
3287 shrink_conversion_area (begp, endp, coding, encodep) 4323 return detect_coding_system (BYTE_POS_ADDR (from_byte),
3288 unsigned char **begp, **endp; 4324 to_byte - from_byte,
3289 struct coding_system *coding; 4325 !NILP (highest));
3290 int encodep; 4326 }
3291 { 4327
3292 register unsigned char *beg_addr = *begp, *end_addr = *endp; 4328 DEFUN ("detect-coding-string", Fdetect_coding_string, Sdetect_coding_string,
3293 4329 1, 2, 0,
3294 if (coding->eol_type != CODING_EOL_LF 4330 "Detect coding system of the text in STRING.\n\
3295 && coding->eol_type != CODING_EOL_UNDECIDED) 4331 Return a list of possible coding systems ordered by priority.\n\
3296 /* Since we anyway have to convert end-of-line format, it is not 4332 \n\
3297 worth skipping at most 100 bytes or so. */ 4333 If only ASCII characters are found, it returns `undecided'\n\
3298 return; 4334 or its subsidiary coding system according to a detected end-of-line format.\n\
3299 4335 \n\
3300 if (encodep) /* for encoding */ 4336 If optional argument HIGHEST is non-nil, return the coding system of\n\
3301 { 4337 highest priority.")
3302 switch (coding->type) 4338 (string, highest)
3303 { 4339 Lisp_Object string, highest;
3304 case coding_type_no_conversion: 4340 {
3305 case coding_type_emacs_mule: 4341 CHECK_STRING (string, 0);
3306 case coding_type_undecided: 4342
3307 case coding_type_raw_text: 4343 return detect_coding_system (XSTRING (string)->data,
3308 /* We need no conversion. */ 4344 XSTRING (string)->size_byte,
3309 *begp = *endp; 4345 !NILP (highest));
3310 return;
3311 case coding_type_ccl:
3312 /* We can't skip any data. */
3313 return;
3314 case coding_type_iso2022:
3315 if (coding->flags & CODING_FLAG_ISO_DESIGNATE_AT_BOL)
3316 {
3317 unsigned char *bol = beg_addr;
3318 while (beg_addr < end_addr && *beg_addr < 0x80)
3319 {
3320 beg_addr++;
3321 if (*(beg_addr - 1) == '\n')
3322 bol = beg_addr;
3323 }
3324 beg_addr = bol;
3325 goto label_skip_tail;
3326 }
3327 /* fall down ... */
3328 default:
3329 /* We can skip all ASCII characters at the head and tail. */
3330 while (beg_addr < end_addr && *beg_addr < 0x80) beg_addr++;
3331 label_skip_tail:
3332 while (beg_addr < end_addr && *(end_addr - 1) < 0x80) end_addr--;
3333 break;
3334 }
3335 }
3336 else /* for decoding */
3337 {
3338 switch (coding->type)
3339 {
3340 case coding_type_no_conversion:
3341 /* We need no conversion. */
3342 *begp = *endp;
3343 return;
3344 case coding_type_emacs_mule:
3345 case coding_type_raw_text:
3346 if (coding->eol_type == CODING_EOL_LF)
3347 {
3348 /* We need no conversion. */
3349 *begp = *endp;
3350 return;
3351 }
3352 /* We can skip all but carriage-return. */
3353 while (beg_addr < end_addr && *beg_addr != '\r') beg_addr++;
3354 while (beg_addr < end_addr && *(end_addr - 1) != '\r') end_addr--;
3355 break;
3356 case coding_type_sjis:
3357 case coding_type_big5:
3358 /* We can skip all ASCII characters at the head. */
3359 while (beg_addr < end_addr && *beg_addr < 0x80) beg_addr++;
3360 /* We can skip all ASCII characters at the tail except for
3361 the second byte of SJIS or BIG5 code. */
3362 while (beg_addr < end_addr && *(end_addr - 1) < 0x80) end_addr--;
3363 if (end_addr != *endp)
3364 end_addr++;
3365 break;
3366 case coding_type_ccl:
3367 /* We can't skip any data. */
3368 return;
3369 default: /* i.e. case coding_type_iso2022: */
3370 {
3371 unsigned char c;
3372
3373 /* We can skip all ASCII characters except for a few
3374 control codes at the head. */
3375 while (beg_addr < end_addr && (c = *beg_addr) < 0x80
3376 && c != ISO_CODE_CR && c != ISO_CODE_SO
3377 && c != ISO_CODE_SI && c != ISO_CODE_ESC)
3378 beg_addr++;
3379 }
3380 break;
3381 }
3382 }
3383 *begp = beg_addr;
3384 *endp = end_addr;
3385 return;
3386 }
3387
3388 /* Encode into or decode from (according to ENCODEP) coding system CODING
3389 the text between char positions B and E. */
3390
3391 Lisp_Object
3392 code_convert_region (b, e, coding, encodep)
3393 Lisp_Object b, e;
3394 struct coding_system *coding;
3395 int encodep;
3396 {
3397 int beg, end, len, consumed, produced;
3398 char *buf;
3399 unsigned char *begp, *endp;
3400 int opoint = PT, opoint_byte = PT_BYTE;
3401 int beg_byte, end_byte, len_byte;
3402 int zv_before = ZV;
3403 int zv_byte_before = ZV_BYTE;
3404
3405 validate_region (&b, &e);
3406 beg = XINT (b), end = XINT (e);
3407 beg_byte = CHAR_TO_BYTE (beg);
3408 end_byte = CHAR_TO_BYTE (end);
3409
3410 if (beg < GPT && end >= GPT)
3411 move_gap_both (end, end_byte);
3412
3413 if (encodep && !NILP (coding->pre_write_conversion))
3414 {
3415 /* We must call a pre-conversion function which may put a new
3416 text to be converted in a new buffer. */
3417 struct buffer *old = current_buffer, *new;
3418
3419 TEMP_SET_PT_BOTH (beg, beg_byte);
3420 call2 (coding->pre_write_conversion, b, e);
3421 if (old != current_buffer)
3422 {
3423 /* Replace the original text by the text just generated. */
3424 len = ZV - BEGV;
3425 len_byte = ZV_BYTE - BEGV_BYTE;
3426 new = current_buffer;
3427 set_buffer_internal (old);
3428 del_range_both (beg, end, beg_byte, end_byte, 1);
3429 insert_from_buffer (new, 1, len, 0);
3430 end = beg + len;
3431 end_byte = len_byte;
3432 }
3433 }
3434
3435 /* We may be able to shrink the conversion region. */
3436 begp = BYTE_POS_ADDR (beg_byte);
3437 endp = begp + (end_byte - beg_byte);
3438 shrink_conversion_area (&begp, &endp, coding, encodep);
3439
3440 if (begp == endp)
3441 /* We need no conversion. */
3442 len = end - beg;
3443 else
3444 {
3445 int shrunk_beg_byte, shrunk_end_byte;
3446 int shrunk_beg;
3447 int shrunk_len_byte;
3448 int new_len_byte;
3449 int buflen;
3450
3451 shrunk_beg_byte = PTR_BYTE_POS (begp);
3452 shrunk_beg = BYTE_TO_CHAR (shrunk_beg_byte);
3453 shrunk_end_byte = PTR_BYTE_POS (endp);
3454 shrunk_len_byte = shrunk_end_byte - shrunk_beg_byte;
3455
3456 if (encodep)
3457 buflen = encoding_buffer_size (coding, shrunk_len_byte);
3458 else
3459 buflen = decoding_buffer_size (coding, shrunk_len_byte);
3460 buf = get_conversion_buffer (buflen);
3461
3462 coding->last_block = 1;
3463 produced = (encodep
3464 ? encode_coding (coding, begp, buf, shrunk_len_byte, buflen,
3465 &consumed)
3466 : decode_coding (coding, begp, buf, shrunk_len_byte, buflen,
3467 &consumed));
3468
3469 TEMP_SET_PT_BOTH (shrunk_beg, shrunk_beg_byte);
3470
3471 /* We let the number of characters in the result
3472 be computed in accord with enable-multibyte-characters
3473 even when encoding. Otherwise the buffer contents
3474 will be inconsistent. */
3475 insert (buf, produced);
3476
3477 del_range_byte (PT_BYTE, PT_BYTE + shrunk_len_byte, 1);
3478
3479 if (opoint >= end)
3480 {
3481 opoint += ZV - zv_before;
3482 opoint_byte += ZV_BYTE - zv_byte_before;
3483 }
3484 else if (opoint > beg)
3485 {
3486 opoint = beg;
3487 opoint_byte = beg_byte;
3488 }
3489 TEMP_SET_PT_BOTH (opoint, opoint_byte);
3490
3491 end += ZV - zv_before;
3492 }
3493
3494 if (!encodep && !NILP (coding->post_read_conversion))
3495 {
3496 Lisp_Object insval;
3497
3498 /* We must call a post-conversion function which may alter
3499 the text just converted. */
3500 zv_before = ZV;
3501 zv_byte_before = ZV_BYTE;
3502
3503 TEMP_SET_PT_BOTH (beg, beg_byte);
3504 insval = call1 (coding->post_read_conversion, make_number (end - beg));
3505 CHECK_NUMBER (insval, 0);
3506
3507 if (opoint >= beg + ZV - zv_before)
3508 {
3509 opoint += ZV - zv_before;
3510 opoint_byte += ZV_BYTE - zv_byte_before;
3511 }
3512 else if (opoint > beg)
3513 {
3514 opoint = beg;
3515 opoint_byte = beg_byte;
3516 }
3517 TEMP_SET_PT_BOTH (opoint, opoint_byte);
3518 len = XINT (insval);
3519 }
3520
3521 return make_number (len);
3522 } 4346 }
3523 4347
3524 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region, 4348 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
3525 3, 3, "r\nzCoding system: ", 4349 3, 3, "r\nzCoding system: ",
3526 "Decode current region by specified coding system.\n\ 4350 "Decode the current region by specified coding system.\n\
3527 When called from a program, takes three arguments:\n\ 4351 When called from a program, takes three arguments:\n\
3528 START, END, and CODING-SYSTEM. START END are buffer positions.\n\ 4352 START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
3529 Return length of decoded text.") 4353 Return length of decoded text.")
3530 (b, e, coding_system) 4354 (start, end, coding_system)
3531 Lisp_Object b, e, coding_system; 4355 Lisp_Object start, end, coding_system;
3532 { 4356 {
3533 struct coding_system coding; 4357 struct coding_system coding;
3534 4358 int from, to;
3535 CHECK_NUMBER_COERCE_MARKER (b, 0); 4359
3536 CHECK_NUMBER_COERCE_MARKER (e, 1); 4360 CHECK_NUMBER_COERCE_MARKER (start, 0);
4361 CHECK_NUMBER_COERCE_MARKER (end, 1);
3537 CHECK_SYMBOL (coding_system, 2); 4362 CHECK_SYMBOL (coding_system, 2);
3538 4363
4364 validate_region (&start, &end);
4365 from = XFASTINT (start);
4366 to = XFASTINT (end);
4367
3539 if (NILP (coding_system)) 4368 if (NILP (coding_system))
3540 return make_number (XFASTINT (e) - XFASTINT (b)); 4369 return make_number (to - from);
4370
3541 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) 4371 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
3542 error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); 4372 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
3543 4373
3544 return code_convert_region (b, e, &coding, 0); 4374 coding.mode |= CODING_MODE_LAST_BLOCK;
4375 return code_convert_region (from, to, &coding, 0, 1);
3545 } 4376 }
3546 4377
3547 DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region, 4378 DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
3548 3, 3, "r\nzCoding system: ", 4379 3, 3, "r\nzCoding system: ",
3549 "Encode current region by specified coding system.\n\ 4380 "Encode the current region by specified coding system.\n\
3550 When called from a program, takes three arguments:\n\ 4381 When called from a program, takes three arguments:\n\
3551 START, END, and CODING-SYSTEM. START END are buffer positions.\n\ 4382 START, END, and CODING-SYSTEM. START and END are buffer positions.\n\
3552 Return length of encoded text.") 4383 Return length of encoded text.")
3553 (b, e, coding_system) 4384 (start, end, coding_system)
3554 Lisp_Object b, e, coding_system; 4385 Lisp_Object start, end, coding_system;
3555 { 4386 {
3556 struct coding_system coding; 4387 struct coding_system coding;
3557 4388 int from, to;
3558 CHECK_NUMBER_COERCE_MARKER (b, 0); 4389
3559 CHECK_NUMBER_COERCE_MARKER (e, 1); 4390 CHECK_NUMBER_COERCE_MARKER (start, 0);
4391 CHECK_NUMBER_COERCE_MARKER (end, 1);
3560 CHECK_SYMBOL (coding_system, 2); 4392 CHECK_SYMBOL (coding_system, 2);
3561 4393
4394 validate_region (&start, &end);
4395 from = XFASTINT (start);
4396 to = XFASTINT (end);
4397
3562 if (NILP (coding_system)) 4398 if (NILP (coding_system))
3563 return make_number (XFASTINT (e) - XFASTINT (b)); 4399 return make_number (to - from);
4400
3564 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) 4401 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
3565 error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); 4402 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
3566 4403
3567 return code_convert_region (b, e, &coding, 1); 4404 coding.mode |= CODING_MODE_LAST_BLOCK;
3568 } 4405 return code_convert_region (from, to, &coding, 1, 1);
3569
3570 /* Encode or decode (according to ENCODEP) the text of string STR
3571 using coding CODING. If NOCOPY is nil, we never return STR
3572 itself, but always a copy. If NOCOPY is non-nil, we return STR
3573 if no change is needed. */
3574
3575 Lisp_Object
3576 code_convert_string (str, coding, encodep, nocopy)
3577 Lisp_Object str, nocopy;
3578 struct coding_system *coding;
3579 int encodep;
3580 {
3581 int len, consumed, produced;
3582 char *buf;
3583 unsigned char *begp, *endp;
3584 int head_skip, tail_skip;
3585 struct gcpro gcpro1;
3586
3587 if (encodep && !NILP (coding->pre_write_conversion)
3588 || !encodep && !NILP (coding->post_read_conversion))
3589 {
3590 /* Since we have to call Lisp functions which assume target text
3591 is in a buffer, after setting a temporary buffer, call
3592 code_convert_region. */
3593 int count = specpdl_ptr - specpdl;
3594 int len = XSTRING (str)->size_byte;
3595 Lisp_Object result;
3596 struct buffer *old = current_buffer;
3597
3598 record_unwind_protect (Fset_buffer, Fcurrent_buffer ());
3599 temp_output_buffer_setup (" *code-converting-work*");
3600 set_buffer_internal (XBUFFER (Vstandard_output));
3601 insert_from_string (str, 0, 0, XSTRING (str)->size, len, 0);
3602 code_convert_region (make_number (BEGV), make_number (ZV),
3603 coding, encodep);
3604 result = make_buffer_string (BEGV, ZV, 0);
3605 set_buffer_internal (old);
3606 return unbind_to (count, result);
3607 }
3608
3609 /* We may be able to shrink the conversion region. */
3610 begp = XSTRING (str)->data;
3611 endp = begp + XSTRING (str)->size_byte;
3612 shrink_conversion_area (&begp, &endp, coding, encodep);
3613
3614 if (begp == endp)
3615 /* We need no conversion. */
3616 return (NILP (nocopy) ? Fcopy_sequence (str) : str);
3617
3618 /* We assume that head_skip and tail_skip count single-byte characters. */
3619 head_skip = begp - XSTRING (str)->data;
3620 tail_skip = XSTRING (str)->size_byte - head_skip - (endp - begp);
3621
3622 GCPRO1 (str);
3623
3624 if (encodep)
3625 len = encoding_buffer_size (coding, endp - begp);
3626 else
3627 len = decoding_buffer_size (coding, endp - begp);
3628 buf = get_conversion_buffer (len + head_skip + tail_skip);
3629
3630 bcopy (XSTRING (str)->data, buf, head_skip);
3631 coding->last_block = 1;
3632 produced = (encodep
3633 ? encode_coding (coding, XSTRING (str)->data + head_skip,
3634 buf + head_skip, endp - begp, len, &consumed)
3635 : decode_coding (coding, XSTRING (str)->data + head_skip,
3636 buf + head_skip, endp - begp, len, &consumed));
3637 bcopy (XSTRING (str)->data + head_skip + (endp - begp),
3638 buf + head_skip + produced,
3639 tail_skip);
3640
3641 UNGCPRO;
3642
3643 if (encodep)
3644 /* When encoding, the result is all single-byte characters. */
3645 return make_unibyte_string (buf, head_skip + produced + tail_skip);
3646
3647 /* When decoding, count properly the number of chars in the string. */
3648 return make_string (buf, head_skip + produced + tail_skip);
3649 } 4406 }
3650 4407
3651 DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string, 4408 DEFUN ("decode-coding-string", Fdecode_coding_string, Sdecode_coding_string,
3652 2, 3, 0, 4409 2, 3, 0,
3653 "Decode STRING which is encoded in CODING-SYSTEM, and return the result.\n\ 4410 "Decode STRING which is encoded in CODING-SYSTEM, and return the result.\n\
3661 CHECK_STRING (string, 0); 4418 CHECK_STRING (string, 0);
3662 CHECK_SYMBOL (coding_system, 1); 4419 CHECK_SYMBOL (coding_system, 1);
3663 4420
3664 if (NILP (coding_system)) 4421 if (NILP (coding_system))
3665 return (NILP (nocopy) ? Fcopy_sequence (string) : string); 4422 return (NILP (nocopy) ? Fcopy_sequence (string) : string);
4423
3666 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) 4424 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
3667 error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); 4425 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
3668 4426
3669 return code_convert_string (string, &coding, 0, nocopy); 4427 coding.mode |= CODING_MODE_LAST_BLOCK;
4428 return code_convert_string (string, &coding, 0, !NILP (nocopy));
3670 } 4429 }
3671 4430
3672 DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string, 4431 DEFUN ("encode-coding-string", Fencode_coding_string, Sencode_coding_string,
3673 2, 3, 0, 4432 2, 3, 0,
3674 "Encode STRING to CODING-SYSTEM, and return the result.\n\ 4433 "Encode STRING to CODING-SYSTEM, and return the result.\n\
3682 CHECK_STRING (string, 0); 4441 CHECK_STRING (string, 0);
3683 CHECK_SYMBOL (coding_system, 1); 4442 CHECK_SYMBOL (coding_system, 1);
3684 4443
3685 if (NILP (coding_system)) 4444 if (NILP (coding_system))
3686 return (NILP (nocopy) ? Fcopy_sequence (string) : string); 4445 return (NILP (nocopy) ? Fcopy_sequence (string) : string);
4446
3687 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0) 4447 if (setup_coding_system (Fcheck_coding_system (coding_system), &coding) < 0)
3688 error ("Invalid coding-system: %s", XSYMBOL (coding_system)->name->data); 4448 error ("Invalid coding system: %s", XSYMBOL (coding_system)->name->data);
3689 4449
3690 return code_convert_string (string, &coding, 1, nocopy); 4450 coding.mode |= CODING_MODE_LAST_BLOCK;
4451 return code_convert_string (string, &coding, 1, !NILP (nocopy));
3691 } 4452 }
3692 4453
3693 DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0, 4454 DEFUN ("decode-sjis-char", Fdecode_sjis_char, Sdecode_sjis_char, 1, 1, 0,
3694 "Decode a JISX0208 character of shift-jis encoding.\n\ 4455 "Decode a JISX0208 character of shift-jis encoding.\n\
3695 CODE is the character code in SJIS.\n\ 4456 CODE is the character code in SJIS.\n\
3706 XSETFASTINT (val, MAKE_NON_ASCII_CHAR (charset_jisx0208, c1, c2)); 4467 XSETFASTINT (val, MAKE_NON_ASCII_CHAR (charset_jisx0208, c1, c2));
3707 return val; 4468 return val;
3708 } 4469 }
3709 4470
3710 DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0, 4471 DEFUN ("encode-sjis-char", Fencode_sjis_char, Sencode_sjis_char, 1, 1, 0,
3711 "Encode a JISX0208 character CHAR to SJIS coding-system.\n\ 4472 "Encode a JISX0208 character CHAR to SJIS coding system.\n\
3712 Return the corresponding character code in SJIS.") 4473 Return the corresponding character code in SJIS.")
3713 (ch) 4474 (ch)
3714 Lisp_Object ch; 4475 Lisp_Object ch;
3715 { 4476 {
3716 int charset, c1, c2, s1, s2; 4477 int charset, c1, c2, s1, s2;
3727 XSETFASTINT (val, 0); 4488 XSETFASTINT (val, 0);
3728 return val; 4489 return val;
3729 } 4490 }
3730 4491
3731 DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0, 4492 DEFUN ("decode-big5-char", Fdecode_big5_char, Sdecode_big5_char, 1, 1, 0,
3732 "Decode a Big5 character CODE of BIG5 coding-system.\n\ 4493 "Decode a Big5 character CODE of BIG5 coding system.\n\
3733 CODE is the character code in BIG5.\n\ 4494 CODE is the character code in BIG5.\n\
3734 Return the corresponding character.") 4495 Return the corresponding character.")
3735 (code) 4496 (code)
3736 Lisp_Object code; 4497 Lisp_Object code;
3737 { 4498 {
3745 XSETFASTINT (val, MAKE_NON_ASCII_CHAR (charset, c1, c2)); 4506 XSETFASTINT (val, MAKE_NON_ASCII_CHAR (charset, c1, c2));
3746 return val; 4507 return val;
3747 } 4508 }
3748 4509
3749 DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0, 4510 DEFUN ("encode-big5-char", Fencode_big5_char, Sencode_big5_char, 1, 1, 0,
3750 "Encode the Big5 character CHAR to BIG5 coding-system.\n\ 4511 "Encode the Big5 character CHAR to BIG5 coding system.\n\
3751 Return the corresponding character code in Big5.") 4512 Return the corresponding character code in Big5.")
3752 (ch) 4513 (ch)
3753 Lisp_Object ch; 4514 Lisp_Object ch;
3754 { 4515 {
3755 int charset, c1, c2, b1, b2; 4516 int charset, c1, c2, b1, b2;
3913 } 4674 }
3914 } 4675 }
3915 return Qnil; 4676 return Qnil;
3916 } 4677 }
3917 4678
4679 DEFUN ("update-iso-coding-systems", Fupdate_iso_coding_systems,
4680 Supdate_iso_coding_systems, 0, 0, 0,
4681 "Update internal database for ISO2022 based coding systems.\n\
4682 When values of the following coding categories are changed, you must\n\
4683 call this function:\n\
4684 coding-category-iso-7, coding-category-iso-7-tight,\n\
4685 coding-category-iso-8-1, coding-category-iso-8-2,\n\
4686 coding-category-iso-7-else, coding-category-iso-8-else")
4687 ()
4688 {
4689 int i;
4690
4691 for (i = CODING_CATEGORY_IDX_ISO_7; i <= CODING_CATEGORY_IDX_ISO_8_ELSE;
4692 i++)
4693 {
4694 if (! coding_system_table[i])
4695 coding_system_table[i]
4696 = (struct coding_system *) xmalloc (sizeof (struct coding_system));
4697 setup_coding_system
4698 (XSYMBOL (XVECTOR (Vcoding_category_table)->contents[i])->value,
4699 coding_system_table[i]);
4700 }
4701 return Qnil;
4702 }
4703
3918 #endif /* emacs */ 4704 #endif /* emacs */
3919 4705
3920 4706
3921 /*** 8. Post-amble ***/ 4707 /*** 8. Post-amble ***/
3922 4708
3965 4751
3966 setup_coding_system (Qnil, &keyboard_coding); 4752 setup_coding_system (Qnil, &keyboard_coding);
3967 setup_coding_system (Qnil, &terminal_coding); 4753 setup_coding_system (Qnil, &terminal_coding);
3968 setup_coding_system (Qnil, &safe_terminal_coding); 4754 setup_coding_system (Qnil, &safe_terminal_coding);
3969 4755
4756 bzero (coding_system_table, sizeof coding_system_table);
4757
3970 #if defined (MSDOS) || defined (WINDOWSNT) 4758 #if defined (MSDOS) || defined (WINDOWSNT)
3971 system_eol_type = CODING_EOL_CRLF; 4759 system_eol_type = CODING_EOL_CRLF;
3972 #else 4760 #else
3973 system_eol_type = CODING_EOL_LF; 4761 system_eol_type = CODING_EOL_LF;
3974 #endif 4762 #endif
4040 Fput (Qcoding_system_error, Qerror_conditions, 4828 Fput (Qcoding_system_error, Qerror_conditions,
4041 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil))); 4829 Fcons (Qcoding_system_error, Fcons (Qerror, Qnil)));
4042 Fput (Qcoding_system_error, Qerror_message, 4830 Fput (Qcoding_system_error, Qerror_message,
4043 build_string ("Invalid coding system")); 4831 build_string ("Invalid coding system"));
4044 4832
4833 Qcoding_category = intern ("coding-category");
4834 staticpro (&Qcoding_category);
4045 Qcoding_category_index = intern ("coding-category-index"); 4835 Qcoding_category_index = intern ("coding-category-index");
4046 staticpro (&Qcoding_category_index); 4836 staticpro (&Qcoding_category_index);
4047 4837
4838 Vcoding_category_table
4839 = Fmake_vector (make_number (CODING_CATEGORY_IDX_MAX), Qnil);
4840 staticpro (&Vcoding_category_table);
4048 { 4841 {
4049 int i; 4842 int i;
4050 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++) 4843 for (i = 0; i < CODING_CATEGORY_IDX_MAX; i++)
4051 { 4844 {
4052 coding_category_table[i] = intern (coding_category_name[i]); 4845 XVECTOR (Vcoding_category_table)->contents[i]
4053 staticpro (&coding_category_table[i]); 4846 = intern (coding_category_name[i]);
4054 Fput (coding_category_table[i], Qcoding_category_index, 4847 Fput (XVECTOR (Vcoding_category_table)->contents[i],
4055 make_number (i)); 4848 Qcoding_category_index, make_number (i));
4056 } 4849 }
4057 } 4850 }
4058 4851
4059 Qcharacter_unification_table = intern ("character-unification-table"); 4852 Qcharacter_unification_table = intern ("character-unification-table");
4060 staticpro (&Qcharacter_unification_table); 4853 staticpro (&Qcharacter_unification_table);
4072 Qsafe_charsets = intern ("safe-charsets"); 4865 Qsafe_charsets = intern ("safe-charsets");
4073 staticpro (&Qsafe_charsets); 4866 staticpro (&Qsafe_charsets);
4074 4867
4075 Qemacs_mule = intern ("emacs-mule"); 4868 Qemacs_mule = intern ("emacs-mule");
4076 staticpro (&Qemacs_mule); 4869 staticpro (&Qemacs_mule);
4870
4871 Qraw_text = intern ("raw-text");
4872 staticpro (&Qraw_text);
4077 4873
4078 defsubr (&Scoding_system_p); 4874 defsubr (&Scoding_system_p);
4079 defsubr (&Sread_coding_system); 4875 defsubr (&Sread_coding_system);
4080 defsubr (&Sread_non_nil_coding_system); 4876 defsubr (&Sread_non_nil_coding_system);
4081 defsubr (&Scheck_coding_system); 4877 defsubr (&Scheck_coding_system);
4082 defsubr (&Sdetect_coding_region); 4878 defsubr (&Sdetect_coding_region);
4879 defsubr (&Sdetect_coding_string);
4083 defsubr (&Sdecode_coding_region); 4880 defsubr (&Sdecode_coding_region);
4084 defsubr (&Sencode_coding_region); 4881 defsubr (&Sencode_coding_region);
4085 defsubr (&Sdecode_coding_string); 4882 defsubr (&Sdecode_coding_string);
4086 defsubr (&Sencode_coding_string); 4883 defsubr (&Sencode_coding_string);
4087 defsubr (&Sdecode_sjis_char); 4884 defsubr (&Sdecode_sjis_char);
4092 defsubr (&Sset_safe_terminal_coding_system_internal); 4889 defsubr (&Sset_safe_terminal_coding_system_internal);
4093 defsubr (&Sterminal_coding_system); 4890 defsubr (&Sterminal_coding_system);
4094 defsubr (&Sset_keyboard_coding_system_internal); 4891 defsubr (&Sset_keyboard_coding_system_internal);
4095 defsubr (&Skeyboard_coding_system); 4892 defsubr (&Skeyboard_coding_system);
4096 defsubr (&Sfind_operation_coding_system); 4893 defsubr (&Sfind_operation_coding_system);
4894 defsubr (&Supdate_iso_coding_systems);
4097 4895
4098 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list, 4896 DEFVAR_LISP ("coding-system-list", &Vcoding_system_list,
4099 "List of coding systems.\n\ 4897 "List of coding systems.\n\
4100 \n\ 4898 \n\
4101 Do not alter the value of this variable manually. This variable should be\n\ 4899 Do not alter the value of this variable manually. This variable should be\n\
4119 int i; 4917 int i;
4120 4918
4121 Vcoding_category_list = Qnil; 4919 Vcoding_category_list = Qnil;
4122 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--) 4920 for (i = CODING_CATEGORY_IDX_MAX - 1; i >= 0; i--)
4123 Vcoding_category_list 4921 Vcoding_category_list
4124 = Fcons (coding_category_table[i], Vcoding_category_list); 4922 = Fcons (XVECTOR (Vcoding_category_table)->contents[i],
4923 Vcoding_category_list);
4125 } 4924 }
4126 4925
4127 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read, 4926 DEFVAR_LISP ("coding-system-for-read", &Vcoding_system_for_read,
4128 "Specify the coding system for read operations.\n\ 4927 "Specify the coding system for read operations.\n\
4129 It is useful to bind this variable with `let', but do not set it globally.\n\ 4928 It is useful to bind this variable with `let', but do not set it globally.\n\
4247 a coding system of ISO 2022 variant which has a flag\n\ 5046 a coding system of ISO 2022 variant which has a flag\n\
4248 `accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file\n\ 5047 `accept-latin-extra-code' t (e.g. iso-latin-1) on reading a file\n\
4249 or reading output of a subprocess.\n\ 5048 or reading output of a subprocess.\n\
4250 Only 128th through 159th elements have a meaning."); 5049 Only 128th through 159th elements have a meaning.");
4251 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil); 5050 Vlatin_extra_code_table = Fmake_vector (make_number (256), Qnil);
5051
5052 DEFVAR_LISP ("select-safe-coding-system-function",
5053 &Vselect_safe_coding_system_function,
5054 "Function to call to select safe coding system for encoding a text.\n\
5055 \n\
5056 If set, this function is called to force a user to select a proper\n\
5057 coding system which can encode the text in the case that a default\n\
5058 coding system used in each operation can't encode the text.\n\
5059 \n\
5060 The default value is `select-safe-coding-system' (which see).");
5061 Vselect_safe_coding_system_function = Qnil;
5062
4252 } 5063 }
4253 5064
4254 #endif /* emacs */ 5065 #endif /* emacs */