From: Chong Yidong
Subject: [Emacs-diffs] /srv/bzr/emacs/emacs-24 r110769: Clarify documentation about escape sequences in strings.
Date: Sat, 03 Nov 2012 19:02:43 +0800
User-agent: Bazaar (2.5.0)
------------------------------------------------------------
revno: 110769
committer: Chong Yidong <address@hidden>
branch nick: emacs-24
timestamp: Sat 2012-11-03 19:02:43 +0800
message:
Clarify documentation about escape sequences in strings.
* objects.texi (General Escape Syntax): Clarify the explanation of
escape sequences.
(Non-ASCII in Strings): Clarify when a string is unibyte vs
multibyte. Hex escapes do not automatically make a string multibyte.
modified:
doc/lispref/ChangeLog
doc/lispref/objects.texi
=== modified file 'doc/lispref/ChangeLog'
--- a/doc/lispref/ChangeLog 2012-11-03 10:47:03 +0000
+++ b/doc/lispref/ChangeLog 2012-11-03 11:02:43 +0000
@@ -1,3 +1,11 @@
+2012-11-03 Chong Yidong <address@hidden>
+
+ * objects.texi (General Escape Syntax): Clarify the explanation of
+ escape sequences.
+ (Non-ASCII in Strings): Clarify when a string is unibyte vs
+ multibyte. Hex escapes do not automatically make a string
+ multibyte.
+
2012-11-03 Martin Rudalics <address@hidden>
* windows.texi (Switching Buffers): Document option
=== modified file 'doc/lispref/objects.texi'
--- a/doc/lispref/objects.texi 2012-05-27 01:34:14 +0000
+++ b/doc/lispref/objects.texi 2012-11-03 11:02:43 +0000
@@ -351,51 +351,48 @@
control characters, Emacs provides several types of escape syntax that
you can use to specify non-@acronym{ASCII} text characters.
-@cindex unicode character escape
- You can specify characters by their Unicode values.
address@hidden@var{nnnn}} represents a character that maps to the Unicode
-code point @address@hidden (by convention, Unicode code points are
-given in hexadecimal). There is a slightly different syntax for
-specifying characters with code points higher than
address@hidden@var{ffff}}: @address@hidden represents the character
-whose code point is @address@hidden The Unicode Standard only
-defines code points up to @address@hidden, so if you specify a
-code point higher than that, Emacs signals an error.
-
- This peculiar and inconvenient syntax was adopted for compatibility
-with other programming languages. Unlike some other languages, Emacs
-Lisp supports this syntax only in character literals and strings.
-
@cindex @samp{\} in character constant
@cindex backslash in character constants
-@cindex octal character code
- The most general read syntax for a character represents the
-character code in either octal or hex. To use octal, write a question
-mark followed by a backslash and the octal character code (up to three
-octal digits); thus, @samp{?\101} for the character @kbd{A},
address@hidden for the character @kbd{C-a}, and @code{?\002} for the
-character @kbd{C-b}. Although this syntax can represent any
-@acronym{ASCII} character, it is preferred only when the precise octal
-value is more important than the @acronym{ASCII} representation.
-
-@example
-@group
-?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10
-?\101 @result{} 65 ?A @result{} 65
-@end group
-@end example
-
- To use hex, write a question mark followed by a backslash, @samp{x},
-and the hexadecimal character code. You can use any number of hex
-digits, so you can represent any character code in this way.
-Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
-character @kbd{C-a}, and @code{?\xe0} for the Latin-1 character
+@cindex unicode character escape
+ Firstly, you can specify characters by their Unicode values.
address@hidden@var{nnnn}} represents a character with Unicode code point
address@hidden@var{nnnn}}, where @var{nnnn} is (by convention) a hexadecimal
+number with exactly four digits. The backslash indicates that the
+subsequent characters form an escape sequence, and the @samp{u}
+specifies a Unicode escape sequence.
+
+ There is a slightly different syntax for specifying Unicode
+characters with code points higher than @address@hidden:
address@hidden@var{nnnnnn}} represents the character with code point
address@hidden@var{nnnnnn}}, where @var{nnnnnn} is a six-digit hexadecimal
+number. The Unicode Standard only defines code points up to
address@hidden@var{10ffff}}, so if you specify a code point higher than
+that, Emacs signals an error.
+
+ Secondly, you can specify characters by their hexadecimal character
+codes. A hexadecimal escape sequence consists of a backslash,
address@hidden, and the hexadecimal character code. Thus, @samp{?\x41} is
+the character @kbd{A}, @samp{?\x1} is the character @kbd{C-a}, and
address@hidden is the character
@iftex
@samp{@`a}
@end iftex
@ifnottex
@samp{a} with grave accent.
@end ifnottex
+You can use any number of hex digits, so you can represent any
+character code in this way.
+
+@cindex octal character code
+ Thirdly, you can specify characters by their character code in
+octal. An octal escape sequence consists of a backslash followed by
+up to three octal digits; thus, @samp{?\101} for the character
+@kbd{A}, @samp{?\001} for the character @kbd{C-a}, and @code{?\002}
+for the character @kbd{C-b}. Only characters up to octal code 777 can
+be specified this way.
+
+ These escape sequences may also be used in strings. @xref{Non-ASCII
+in Strings}.
@node Ctl-Char Syntax
@subsubsection Control-Character Syntax
@@ -1026,40 +1023,53 @@
@node Non-ASCII in Strings
@subsubsection Non-@acronym{ASCII} Characters in Strings
- You can include a non-@acronym{ASCII} international character in a
-string constant by writing it literally. There are two text
-representations for non-@acronym{ASCII} characters in Emacs strings
-(and in buffers): unibyte and multibyte (@pxref{Text
-Representations}). If the string constant is read from a multibyte
-source, such as a multibyte buffer or string, or a file that would be
-visited as multibyte, then Emacs reads the non-@acronym{ASCII}
-character as a multibyte character and automatically makes the string
-a multibyte string. If the string constant is read from a unibyte
-source, then Emacs reads the non-@acronym{ASCII} character as unibyte,
-and makes the string unibyte.
-
- Instead of writing a non-@acronym{ASCII} character literally into a
-multibyte string, you can write it as its character code using a hex
-escape, @address@hidden, with as many digits as necessary.
-(Multibyte non-@acronym{ASCII} character codes are all greater than
-256.) You can also specify a character in a multibyte string using
-the @samp{\u} or @samp{\U} Unicode escape syntax (@pxref{General
-Escape Syntax}). In either case, any character which is not a valid
-hex digit terminates the construct. If the next character in the
-string could be interpreted as a hex digit, write @address@hidden }}
-(backslash and space) to terminate the hex escape---for example,
+ There are two text representations for non-@acronym{ASCII}
+characters in Emacs strings: multibyte and unibyte (@pxref{Text
+Representations}). Roughly speaking, unibyte strings store raw bytes,
+while multibyte strings store human-readable text. Each character in
+a unibyte string is a byte, i.e.@: its value is between 0 and 255. By
+contrast, each character in a multibyte string may have a value
+between 0 and 4194303 (@pxref{Character Type}). In both cases,
+characters above 127 are non-@acronym{ASCII}.
+
+ You can include a non-@acronym{ASCII} character in a string constant
+by writing it literally. If the string constant is read from a
+multibyte source, such as a multibyte buffer or string, or a file that
+would be visited as multibyte, then Emacs reads each
+non-@acronym{ASCII} character as a multibyte character and
+automatically makes the string a multibyte string. If the string
+constant is read from a unibyte source, then Emacs reads the
+non-@acronym{ASCII} character as unibyte, and makes the string
+unibyte.
+
+ Instead of writing a character literally into a multibyte string,
+you can write it as its character code using an escape sequence.
+@xref{General Escape Syntax}, for details about escape sequences.
+
+ If you use any Unicode-style escape sequence @samp{\uNNNN} or
address@hidden in a string constant (even for an @acronym{ASCII}
+character), Emacs automatically assumes that it is multibyte.
+
+ You can also use hexadecimal escape sequences (@address@hidden) and
+octal escape sequences (@address@hidden) in string constants.
address@hidden beware:} If a string constant contains hexadecimal or
+octal escape sequences, and these escape sequences all specify unibyte
+characters (i.e.@: less than 256), and there are no other literal
+non-@acronym{ASCII} characters or Unicode-style escape sequences in
+the string, then Emacs automatically assumes that it is a unibyte
+string. That is to say, it assumes that all non-@acronym{ASCII}
+characters occurring in the string are 8-bit raw bytes.
+
+ In hexadecimal and octal escape sequences, the escaped character
+code may contain any number of digits, so the first subsequent
+character which is not a valid hexadecimal or octal digit terminates
+the escape sequence. If the next character in a string could be
+interpreted as a hexadecimal or octal digit, write @address@hidden }}
+(backslash and space) to terminate the escape sequence. For example,
@address@hidden }} represents one character, @samp{a} with grave
accent. @address@hidden }} in a string constant is just like
backslash-newline; it does not contribute any character to the string,
-but it does terminate the preceding hex escape. Using any hex escape
-in a string (even for an @acronym{ASCII} character) automatically
-forces the string to be multibyte.
-
- You can represent a unibyte non-@acronym{ASCII} character with its
-character code, which must be in the range from 128 (0200 octal) to
-255 (0377 octal). If you write all such character codes in octal and
-the string contains no other characters forcing it to be multibyte,
-this produces a unibyte string.
+but it does terminate any preceding hex escape.
@node Nonprinting Characters
@subsubsection Nonprinting Characters in Strings
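
The behaviour this patch documents can be checked interactively. The forms below are a minimal sketch and not part of the commit. They assume an Emacs 24 or later session and use only the standard functions `multibyte-string-p' and `length'. The values in the comments are what the revised text implies each form should return.

```elisp
;; Character literals: hex, octal and Unicode escapes for the same code.
(list ?\x41 ?\101 ?\u0041)      ; each element should be 65 (the character A)

;; A string whose only non-ASCII content is a hex or octal escape
;; below 256 is read as a unibyte string (a raw byte).
(multibyte-string-p "\xe0")     ; expected: nil
(multibyte-string-p "\340")     ; expected: nil

;; A Unicode escape for a non-ASCII character makes the string multibyte.
(multibyte-string-p "\u00e0")   ; expected: t

;; Backslash-space terminates a hex escape without adding a character,
;; so the "b" and "c" that follow are ordinary characters.
(length "\xe0\ bc")             ; expected: 3
;; Without the terminator, "bc" is absorbed into the escape as hex digits.
(length "\xe0bc")               ; expected: 1
```

The last two forms show why the terminator matters: a hex escape accepts any number of digits, so any following character that happens to be a valid hex digit becomes part of the character code.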