emacs: src/coding.c comparison

comparison src/coding.c @ 35053:e3e1ff3616fa

Commentary changes. (detect_eol_type_in_2_octet_form): Declare arg big_endian_p.

author	Dave Love <fx@gnu.org>
date	Thu, 04 Jan 2001 17:35:26 +0000
parents	8cd5e6ad71a2
children	36de5bf9969c

-:07b5f5fdb0ce
+:e3e1ff3616fa
 */
 /*** 0. General comments ***/
-/*** GENERAL NOTE on CODING SYSTEM ***
+/*** GENERAL NOTE on CODING SYSTEMS ***
-Coding system is an encoding mechanism of one or more character
+A coding system is an encoding mechanism for one or more character
 sets.  Here's a list of coding systems which Emacs can handle.  When
 we say "decode", it means converting some other coding system to
-Emacs' internal format (emacs-internal), and when we say "encode",
+Emacs' internal format (emacs-mule), and when we say "encode",
 it means converting the coding system emacs-mule to some other
 coding system.
 0. Emacs' internal format (emacs-mule)
-Emacs itself holds a multi-lingual character in a buffer and a string
+Emacs itself holds a multi-lingual character in buffers and strings
 in a special format.  Details are described in section 2.
 1. ISO2022
 The most famous coding system for multiple character sets.  X's
 JISX0208.  Widely used for PC's in Japan.  Details are described in
 section 4.
 3. BIG5
-A coding system to encode character sets: ASCII and Big5.  Widely
+A coding system to encode the character sets ASCII and Big5.  Widely
-used by Chinese (mainly in Taiwan and Hong Kong).  Details are
+used for Chinese (mainly in Taiwan and Hong Kong).  Details are
 described in section 4.  In this file, when we write "BIG5"
 (all uppercase), we mean the coding system, and when we write
 "Big5" (capitalized), we mean the character set.
 4. Raw text
-A coding system for a text containing random 8-bit code.  Emacs does
+A coding system for text containing random 8-bit code.  Emacs does
-no code conversion on such a text except for end-of-line format.
+no code conversion on such text except for end-of-line format.
 5. Other
-If a user wants to read/write a text encoded in a coding system not
+If a user wants to read/write text encoded in a coding system not
-listed above, he can supply a decoder and an encoder for it in CCL
+listed above, he can supply a decoder and an encoder for it as CCL
 (Code Conversion Language) programs.  Emacs executes the CCL program
 while reading/writing.
 Emacs represents a coding system by a Lisp symbol that has a property
 `coding-system'.  But, before actually using the coding system, the
 */
 /*** GENERAL NOTES on END-OF-LINE FORMAT ***
-How end-of-line of a text is encoded depends on a system.  For
+How end-of-line of text is encoded depends on the operating system.
-instance, Unix's format is just one byte of `line-feed' code,
+For instance, Unix's format is just one byte of `line-feed' code,
 whereas DOS's format is two-byte sequence of `carriage-return' and
 `line-feed' codes.  MacOS's format is usually one byte of
 `carriage-return'.
-Since text characters encoding and end-of-line encoding are
+Since text character encoding and end-of-line encoding are
-independent, any coding system described above can take
+independent, any coding system described above can have any
-any format of end-of-line.  So, Emacs has information of format of
+end-of-line format.  So Emacs has information about end-of-line
-end-of-line in each coding-system.  See section 6 for more details.
+format in each coding-system.  See section 6 for more details.
 */
 /*** GENERAL NOTES on `detect_coding_XXX ()' functions ***
 These functions check if a text between SRC and SRC_END is encoded
 in the coding system category XXX.  Each returns an integer value in
-which appropriate flag bits for the category XXX is set.  The flag
+which appropriate flag bits for the category XXX are set.  The flag
 bits are defined in macros CODING_CATEGORY_MASK_XXX.  Below is the
-template of these functions.  If MULTIBYTEP is nonzero, 8-bit codes
+template for these functions.  If MULTIBYTEP is nonzero, 8-bit codes
 of the range 0x80..0x9F are in multibyte form.  */
 #if 0
 int
 detect_coding_emacs_mule (src, src_end, multibytep)
 unsigned char *src, *src_end;
 These functions decode SRC_BYTES length of unibyte text at SOURCE
 encoded in CODING to Emacs' internal format.  The resulting
 multibyte text goes to a place pointed to by DESTINATION, the length
 of which should not exceed DST_BYTES.
-These functions set the information of original and decoded texts in
+These functions set the information about original and decoded texts
-the members produced, produced_char, consumed, and consumed_char of
+in the members `produced', `produced_char', `consumed', and
-the structure *CODING.  They also set the member result to one of
+`consumed_char' of the structure *CODING.  They also set the member
-CODING_FINISH_XXX indicating how the decoding finished.
+`result' to one of CODING_FINISH_XXX indicating how the decoding
+finished.
-DST_BYTES zero means that source area and destination area are
+DST_BYTES zero means that the source area and destination area are
 overlapped, which means that we can produce a decoded text until it
-reaches at the head of not-yet-decoded source text.
+reaches the head of the not-yet-decoded source text.
-Below is a template of these functions.  */
+Below is a template for these functions.  */
 #if 0
 static void
 decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
 struct coding_system *coding;
 unsigned char *source, *destination;
 }
 #endif
 /*** GENERAL NOTES on `encode_coding_XXX ()' functions ***
-These functions encode SRC_BYTES length text at SOURCE of Emacs'
+These functions encode SRC_BYTES length text at SOURCE from Emacs'
 internal multibyte format to CODING.  The resulting unibyte text
 goes to a place pointed to by DESTINATION, the length of which
 should not exceed DST_BYTES.
-These functions set the information of original and encoded texts in
+These functions set the information about original and encoded texts
-the members produced, produced_char, consumed, and consumed_char of
+in the members `produced', `produced_char', `consumed', and
-the structure *CODING.  They also set the member result to one of
+`consumed_char' of the structure *CODING.  They also set the member
-CODING_FINISH_XXX indicating how the encoding finished.
+`result' to one of CODING_FINISH_XXX indicating how the encoding
+finished.
-DST_BYTES zero means that source area and destination area are
-overlapped, which means that we can produce a encoded text until it
+DST_BYTES zero means that the source area and destination area are
-reaches at the head of not-yet-encoded source text.
+overlapped, which means that we can produce encoded text until it
+reaches at the head of the not-yet-encoded source text.
-Below is a template of these functions.  */
+Below is a template for these functions.  */
 #if 0
 static void
 encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
 struct coding_system *coding;
 unsigned char *source, *destination;
 /* Produce a multibyte form of characater C to `dst'.  Jump to
 `label_end_of_loop' if there's not enough space at `dst'.
-If we are now in the middle of composition sequence, the decoded
+If we are now in the middle of a composition sequence, the decoded
 character may be ALTCHAR (for the current composition).  In that
 case, the character goes to coding->cmp_data->data instead of
 `dst'.
 This macro is used in decoding routines.  */
 /*** 3. ISO2022 handlers ***/
 /* The following note describes the coding system ISO2022 briefly.
 Since the intention of this note is to help understand the
-functions in this file, some parts are NOT ACCURATE or OVERLY
+functions in this file, some parts are NOT ACCURATE or are OVERLY
 SIMPLIFIED.  For thorough understanding, please refer to the
-original document of ISO2022.
+original document of ISO2022.  This is equivalent to the standard
+ECMA-35, obtainable from <URL:http://www.ecma.ch/> (*).
 ISO2022 provides many mechanisms to encode several character sets
-in 7-bit and 8-bit environments.  For 7-bite environments, all text
+in 7-bit and 8-bit environments.  For 7-bit environments, all text
 is encoded using bytes less than 128.  This may make the encoded
 text a little bit longer, but the text passes more easily through
-several gateways, some of which strip off MSB (Most Signigant Bit).
+several types of gateway, some of which strip off the MSB (Most
+Signigant Bit).
-There are two kinds of character sets: control character set and
-graphic character set.  The former contains control characters such
+There are two kinds of character sets: control character sets and
+graphic character sets.  The former contain control characters such
 as `newline' and `escape' to provide control functions (control
 functions are also provided by escape sequences).  The latter
-contains graphic characters such as 'A' and '-'.  Emacs recognizes
+contain graphic characters such as 'A' and '-'.  Emacs recognizes
 two control character sets and many graphic character sets.
 Graphic character sets are classified into one of the following
 four classes, according to the number of bytes (DIMENSION) and
 number of characters in one dimension (CHARS) of the set:
 - DIMENSION1_CHARS96
 - DIMENSION2_CHARS94
 - DIMENSION2_CHARS96
 In addition, each character set is assigned an identification tag,
-unique for each set, called "final character" (denoted as <F>
+unique for each set, called the "final character" (denoted as <F>
 hereafter).  The <F> of each character set is decided by ECMA(*)
 when it is registered in ISO.  The code range of <F> is 0x30..0x7F
 (0x30..0x3F are for private use only).
 Note (*): ECMA = European Computer Manufacturers Association
-Here are examples of graphic character set [NAME(<F>)]:
+Here are examples of graphic character sets [NAME(<F>)]:
 	o DIMENSION1_CHARS94 -- ASCII('B'), right-half-of-JISX0201('I'), ...
 	o DIMENSION1_CHARS96 -- right-half-of-ISO8859-1('A'), ...
 	o DIMENSION2_CHARS94 -- GB2312('A'), JISX0208('B'), ...
 	o DIMENSION2_CHARS96 -- none for the moment
 7-bit environment, non-locking-shift, and non-single-shift.
 Note (**): If <F> is '@', 'A', or 'B', the intermediate character
 '(' can be omitted.  We refer to this as "short-form" hereafter.
-Now you may notice that there are a lot of ways for encoding the
+Now you may notice that there are a lot of ways of encoding the
 same multilingual text in ISO2022.  Actually, there exist many
 coding systems such as Compound Text (used in X11's inter client
 communication, ISO-2022-JP (used in Japanese internet), ISO-2022-KR
 (used in Korean internet), EUC (Extended UNIX Code, used in Asian
 localized platforms), and all of these are variants of ISO2022.
 	o ESC '1' -- end composition
 	o ESC '2' -- start rule-base composition (*)
 	o ESC '3' -- start relative composition with alternate chars  (**)
 	o ESC '4' -- start rule-base composition with alternate chars  (**)
 Since these are not standard escape sequences of any ISO standard,
-the use of them for these meaning is restricted to Emacs only.
+the use of them with these meanings is restricted to Emacs only.
-(*) This form is used only in Emacs 20.5 and the older versions,
+(*) This form is used only in Emacs 20.5 and older versions,
 but the newer versions can safely decode it.
-(**) This form is used only in Emacs 21.1 and the newer versions,
+(**) This form is used only in Emacs 21.1 and newer versions,
 and the older versions can't decode it.
-Here's a list of examples usages of these composition escape
+Here's a list of example usages of these composition escape
 sequences (categorized by `enum composition_method').
 COMPOSITION_RELATIVE:
 	ESC 0 CHAR [ CHAR ] ESC 1
 COMPOSITOIN_WITH_RULE:
 #define SHIFT_OUT_OK(idx) \
 (CODING_SPEC_ISO_INITIAL_DESIGNATION (coding_system_table[idx], 1) >= 0)
 /* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
-Check if a text is encoded in ISO2022.  If it is, returns an
+Check if a text is encoded in ISO2022.  If it is, return an
 integer in which appropriate flag bits any of:
 	CODING_CATEGORY_MASK_ISO_7
 	CODING_CATEGORY_MASK_ISO_7_TIGHT
 	CODING_CATEGORY_MASK_ISO_8_1
 	CODING_CATEGORY_MASK_ISO_8_2
 /* ISO2022 encoding stuff.  */
 /*
 It is not enough to say just "ISO2022" on encoding, we have to
-specify more details.  In Emacs, each coding system of ISO2022
+specify more details.  In Emacs, each ISO2022 coding system
 variant has the following specifications:
 	1. Initial designation to G0 thru G3.
 	2. Allows short-form designation?
 	3. ASCII should be designated to G0 before control characters?
 	4. ASCII should be designated to G0 at end of line?
 }
 /*** 4. SJIS and BIG5 handlers ***/
-/* Although SJIS and BIG5 are not ISO's coding system, they are used
+/* Although SJIS and BIG5 are not ISO coding systems, they are used
 quite widely.  So, for the moment, Emacs supports them in the bare
 C code.  But, in the future, they may be supported only by CCL.  */
 /* SJIS is a coding system encoding three character sets: ASCII, right
 half of JISX0201-Kana, and JISX0208.  An ASCII character is encoded
 as is.  A character of charset katakana-jisx0201 is encoded by
 "position-code + 0x80".  A character of charset japanese-jisx0208
 is encoded in 2-byte but two position-codes are divided and shifted
-so that it fit in the range below.
+so that it fits in the range below.
 --- CODE RANGE of SJIS ---
 (character set)	(range)
 ASCII		0x00 .. 0x7F
 KATAKANA-JISX0201	0xA0 .. 0xDF
 */
 /* BIG5 is a coding system encoding two character sets: ASCII and
 Big5.  An ASCII character is encoded as is.  Big5 is a two-byte
-character set and is encoded in two-byte.
+character set and is encoded in two bytes.
 --- CODE RANGE of BIG5 ---
 (character set)	(range)
 ASCII		0x00 .. 0x7F
 Big5 (1st byte)	0xA1 .. 0xFE
 }
 /*** 7. C library functions ***/
-/* In Emacs Lisp, coding system is represented by a Lisp symbol which
+/* In Emacs Lisp, a coding system is represented by a Lisp symbol which
 has a property `coding-system'.  The value of this property is a
-vector of length 5 (called as coding-vector).  Among elements of
+vector of length 5 (called the coding-vector).  Among elements of
 this vector, the first (element[0]) and the fifth (element[4])
 carry important information for decoding/encoding.  Before
 decoding/encoding, this information should be set in fields of a
 structure of type `coding_system'.
-A value of property `coding-system' can be a symbol of another
+The value of the property `coding-system' can be a symbol of another
 subsidiary coding-system.  In that case, Emacs gets coding-vector
 from that symbol.
 `element[0]' contains information to be set in `coding->type'.  The
 value and its meaning is as follows:
 If `coding->type' is `coding_type_big5', element[4] is t to denote
 BIG5-ETen or nil to denote BIG5-HKU.
 If `coding->type' takes the other value, element[4] is ignored.
-Emacs Lisp's coding system also carries information about format of
+Emacs Lisp's coding systems also carry information about format of
 end-of-line in a value of property `eol-type'.  If the value is
 integer, 0 means CODING_EOL_LF, 1 means CODING_EOL_CRLF, and 2
 means CODING_EOL_CR.  If it is not integer, it should be a vector
 of subsidiary coding systems of which property `eol-type' has one
-of above values.
+of the above values.
 */
 /* Extract information for decoding/encoding from CODING_SYSTEM_SYMBOL
 and set it in CODING.  If CODING_SYSTEM_SYMBOL is invalid, CODING
 	The category for a coding system not categorized in any of the
 	above.  Assigned the coding-system (Lisp symbol)
 	`no-conversion' by default.
 Each of them is a Lisp symbol and the value is an actual
-`coding-system's (this is also a Lisp symbol) assigned by a user.
+`coding-system' (this is also a Lisp symbol) assigned by a user.
 What Emacs does actually is to detect a category of coding system.
 Then, it uses a `coding-system' assigned to it.  If Emacs can't
-decide only one possible category, it selects a category of the
+decide a single possible category, it selects a category of the
 highest priority.  Priorities of categories are also specified by a
 user in a Lisp variable `coding-category-list'.
 */
 utf-16-le.  */
 static int
 detect_eol_type_in_2_octet_form (source, src_bytes, skip, big_endian_p)
 unsigned char *source;
-int src_bytes, *skip;
+int src_bytes, *skip, big_endian_p;
 {
 unsigned char *src = source, *src_end = src + src_bytes;
 unsigned int c1, c2;
 int total = 0;		/* How many end-of-lines are found so far.  */
 int eol_type = CODING_EOL_UNDECIDED;
 return make_number (coding.produced_char);
 }
 DEFUN ("decode-coding-region", Fdecode_coding_region, Sdecode_coding_region,
 3, 3, "r\nzCoding system: ",
-"Decode the current region by specified coding system.\n\
+"Decode the current region from the specified coding system.\n\
 When called from a program, takes three arguments:\n\
 START, END, and CODING-SYSTEM.  START and END are buffer positions.\n\
 This function sets `last-coding-system-used' to the precise coding system\n\
 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
 not fully specified.)\n\
 return code_convert_region1 (start, end, coding_system, 0);
 }
 DEFUN ("encode-coding-region", Fencode_coding_region, Sencode_coding_region,
 3, 3, "r\nzCoding system: ",
-"Encode the current region by specified coding system.\n\
+"Encode the current region into the specified coding system.\n\
 When called from a program, takes three arguments:\n\
 START, END, and CODING-SYSTEM.  START and END are buffer positions.\n\
 This function sets `last-coding-system-used' to the precise coding system\n\
 used (which may be different from CODING-SYSTEM if CODING-SYSTEM is\n\
 not fully specified.)\n\
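
Note on the one code change in this comparison: detect_eol_type_in_2_octet_form
now declares big_endian_p explicitly.  In a K&R-style definition, a parameter
that is not declared defaults to int under C89 rules, so the added declaration
most likely makes the type explicit rather than changing behavior.  The body of
the function is not shown above, so the following is only a minimal sketch of
how such a byte-order flag could drive end-of-line detection over 2-octet
code units.  Every name prefixed with sketch_ or SKETCH_ is hypothetical, the
UNDECIDED and INCONSISTENT values are assumptions, and only the LF/CRLF/CR
values 0/1/2 come from the commentary in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CODING_EOL_* constants.  Only the
   values 0, 1 and 2 are taken from the commentary in the diff; the
   other two are assumptions made for this sketch.  */
enum sketch_eol_type
{
  SKETCH_EOL_UNDECIDED = -1,
  SKETCH_EOL_LF = 0,
  SKETCH_EOL_CRLF = 1,
  SKETCH_EOL_CR = 2,
  SKETCH_EOL_INCONSISTENT = 3
};

/* Read one 2-octet code unit at P in the requested byte order.  */
static unsigned int
sketch_read_unit (const unsigned char *p, int big_endian_p)
{
  return big_endian_p ? (((unsigned int) p[0] << 8) | p[1])
                      : (((unsigned int) p[1] << 8) | p[0]);
}

/* Scan SRC_BYTES bytes at SOURCE as 2-octet code units and report
   which end-of-line convention they use.  A trailing odd byte is
   ignored.  A CR in the last code unit is reported as CR, although
   it could be the first half of a CRLF cut off by the buffer end.  */
enum sketch_eol_type
sketch_detect_eol_2_octet (const unsigned char *source, size_t src_bytes,
                           int big_endian_p)
{
  enum sketch_eol_type eol_type = SKETCH_EOL_UNDECIDED;
  size_t i = 0;

  assert (source != NULL || src_bytes == 0);

  while (i + 1 < src_bytes)
    {
      unsigned int c = sketch_read_unit (source + i, big_endian_p);
      enum sketch_eol_type this_eol;

      i += 2;
      if (c == 0x000A)
        this_eol = SKETCH_EOL_LF;
      else if (c == 0x000D)
        {
          if (i + 1 < src_bytes
              && sketch_read_unit (source + i, big_endian_p) == 0x000A)
            {
              this_eol = SKETCH_EOL_CRLF;
              i += 2;           /* Consume the LF half as well.  */
            }
          else
            this_eol = SKETCH_EOL_CR;
        }
      else
        continue;               /* Not an end-of-line code unit.  */

      if (eol_type == SKETCH_EOL_UNDECIDED)
        eol_type = this_eol;
      else if (eol_type != this_eol)
        return SKETCH_EOL_INCONSISTENT;
    }
  return eol_type;
}
```

Reading the same bytes with the wrong byte order turns 0x000A into 0x0A00, so
no end-of-line is recognized at all; that is why the flag has to be passed in
rather than guessed inside the scan.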
