documentation bug: Mule and MSDOS
From: dirk janssen
Subject: documentation bug: Mule and MSDOS
Date: Tue, 27 Mar 2001 19:03:34 +0200
This bug report will be sent to the Free Software Foundation,
not to your local site managers!!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.
In GNU Emacs 20.6.1 (i386-suse-linux, X toolkit)
of Sat Mar 11 2000 on Hahn
configured using `configure --with-gcc --with-pop --with-system-malloc
--prefix=/usr --exec-prefix=/usr --infodir=/usr/share/info
--mandir=/usr/share/man --sharedstatedir=/var/state --libexecdir=/usr/lib
--with-x --with-x-toolkit=lucid --x-includes=/usr/X11R6/include
--x-libraries=/usr/X11R6/lib i386-suse-linux'
Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:
The documentation on reading MS-DOS/Windows files with
`European' characters on a Unix box is unclear. I am a Lisp
programmer, but it still took me several hours to find the (very simple)
solution to this problem.
Here is a backtrack of my mental states :-)
1. I assumed I had to convert the buffer *after* it was read in.
2. I could find info on disabling multibyte, but not much on enabling
it.
3. The MULE docs do not mention code pages at all; one has to go to the
Emacs on DOS section. That section then lists the `dos-codepage-setup'
command, which is not available to me.
4. The other command, `codepage-setup', does not change the display at
all, not even when I then choose the new encoding in the
problematic buffer. Hence, I have no way to check what I am doing.
Scope of the problem:
Although code pages are a completely broken way to `support'
international characters, they are in common use. Plain text files
generated on Windows use a code page, not ISO Latin.
Emacs should support them better, especially since all the machinery
is already there.
Suggestions:
1. Make the MULE doc more `hands-on'. Currently, it tells me a whole
lot about various options and possibilities, but too little about how
I put it to use.
2. In the mule docs, insert a section on `Reading international files
from MS-DOS or Windows (codepages)'. Suggestion:
-------------------------------------------------------------
Applications on the MS-DOS and Windows platform commonly write files
that are not in any ISO encoding, but use a so-called `code page'. Emacs
has no way to determine the code page from the file, because
different code pages use the same numbers to represent things.
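To make the ambiguity concrete, here is a small Python sketch (Python
is used purely for illustration; it is not part of Emacs, and the
helper name `decode_with' is made up for this example). It decodes one
and the same byte under three encodings and gets three different
characters:

```python
# One byte, value 233 (0xE9). Nothing in the byte itself says which
# encoding it belongs to.
raw = bytes([0xE9])

def decode_with(data, encodings):
    """Decode the same bytes once under each candidate encoding."""
    return {name: data.decode(name) for name in encodings}

readings = decode_with(raw, ["cp437", "cp850", "latin-1"])
for name, text in readings.items():
    # Print the code point rather than the glyph, so the output does
    # not depend on the terminal's own encoding.
    print("%-8s -> U+%04X" % (name, ord(text)))
```

The three results are a Greek capital theta (cp437), a capital U with
acute (cp850), and a small e with acute (ISO Latin-1), which is why the
reader of the file has to be told the code page.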
To read these files, tell Emacs which code page was used to
encode them when opening the file. Since the code page cannot be
recovered from the file, note it down when the file is saved.
Western European MS-DOS systems commonly use code page 850, which
contains the ISO Latin-1 characters at different code positions.
When opening the file, Emacs will convert the text to its internal
format and editing will proceed as usual. Upon saving, the file will
be converted back to its code page encoding.
Opening a file with a code page takes three steps:
M-x codepage-setup
    Extend Emacs's built-in encodings with one for the specified code
    page. Normal encodings are available automatically; code pages
    have to be set up first with this command. The command asks for
    the number of the code page, e.g. 850.
C-x RET c
    The encoding prefix. Use this to specify the code page to use
    with the next command (which will be `open file'). For each code
    page set up above, three encodings are created, one for each of
    the Unix, DOS, and Mac end-of-line conventions. For code page 850,
    these are named `cp850-unix', `cp850-dos', and `cp850-mac'.
    Usually your DOS files will follow the DOS end-of-line
    convention, so specify `cp850-dos' (substituting the correct code
    page number for your file).
C-x C-f
    Open the file using the code page. It will automatically be saved
    using the same code page.
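For comparison, the same read-edit-save cycle can be sketched outside
Emacs in Python. This is only an illustration under assumptions: the
file is taken to be in code page 850 with DOS line ends, and the helper
names `read_dos_text' and `write_dos_text' are invented for this
example.

```python
import os
import tempfile

CODE_PAGE = "cp850"  # assumption: the file was written under code page 850

def read_dos_text(path, code_page=CODE_PAGE):
    """Decode a file with the given code page, keeping its CRLF line ends."""
    # newline="" switches off newline translation, so "\r\n" survives.
    with open(path, encoding=code_page, newline="") as f:
        return f.read()

def write_dos_text(path, text, code_page=CODE_PAGE):
    """Encode text back to the same code page without touching line ends."""
    with open(path, "w", encoding=code_page, newline="") as f:
        f.write(text)

# A sample file as a DOS program might have written it:
# 0x82 is e-acute and 0x8B is i-diaeresis in code page 850.
original = b"caf\x82 na\x8bve\r\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(original)

text = read_dos_text(path)    # ordinary Unicode text while "editing"
write_dos_text(path, text)    # saved again in code page 850
with open(path, "rb") as f:
    saved = f.read()
```

As with `cp850-dos' in Emacs, both the code page and the end-of-line
convention are stated explicitly, and the bytes written back are
identical to the bytes that were read.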
While editing a file encoded with a code page, the mode line will show
something like `-D:--'. The `D' stands for Dos code page, and there
are two characters before the `:' to show that multibyte support is
on.
If your buffer contains escape sequences of the form `\213' and the
mode line shows only one character before the colon, you have read in
the file without specifying the code page. Close the file and read it
in again using the procedure above.
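Such an escape is simply the byte value written in octal. The Python
sketch below imitates that display and then decodes the same bytes
with code page 850; the function name `show_undecoded' is hypothetical,
and the rendering only approximates what Emacs shows:

```python
def show_undecoded(data):
    """Render bytes roughly as an undecoded buffer shows them:
    ASCII bytes as-is, bytes from 0x80 up as backslash plus three
    octal digits."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)

raw = b"na\x8bve"                 # 0x8B is 213 in octal
escaped = show_undecoded(raw)     # what you see without a code page
decoded = raw.decode("cp850")     # what you see once cp850 is specified
```

Here `escaped' contains the literal text `\213' in the middle of the
word, while `decoded' contains the intended i with diaeresis.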
---------------------------------------------------------------
I know this repeats some information that is also available elsewhere
in the MULE docs, and some notes and links would be useful. But the
perspective of someone trying to convert the odd `broken platform'
file is VERY different from that of someone trying to use Emacs for
Korean in her/his daily life. Therefore a problem-oriented info section
seems warranted to me.
Virtually yours,
Dirk
Dirk Janssen
University of Leipzig, Germany