Re: eight-bit char handling in emacs-unicode
From: Stefan Monnier
Subject: Re: eight-bit char handling in emacs-unicode
Date: 18 Nov 2003 22:05:39 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
> I see. Apart from the design itself, I agree that it's difficult to
> introduce a new type. But, when I discussed with Richard about the
> Character type object a few years ago, he was not that negative provided
> that it gives a sure improvement.
Sounds about right to me: we have one free tag that we could use for chars
(and that I currently use to boost the max buffer size from 256MB to 512MB
in my local code).
But it needs to pay for itself.
> Then, we can't use string-make-unibyte for the current case
> because, in emacs-unicode, (concat '(?a 192)) returns a
> multibyte string whose second element is A-grave, not an
> eight-bit-char. Am I missing something?
Well, obviously we need to make it accept this case (i.e. accept both the
latin-1 192 and the eight-bit-char 192). I'm sure there'll be other issues.
I haven't had much time to think about it and you're obviously better
placed to foresee potential problems.
>> To do what your string-make-unibyte does you should use
>> `encode-coding-string' where the coding system is passed explicitly.
> Those are conceptually different things (I remember a
> similar discussion we had a while ago).
> encode-coding-string does:
> char-sequence --CCS-set--> (CCS/codepoint-pair)-sequence
> --CES--> encoded-byte-sequence
> string-make-unibyte does:
> char-sequence --CCS--> code-point-sequence
> --concat--> code-point-sequence
> These two yield the same result only when CCS supports all
> chars in "char-sequence" and CES is stateless
> (e.g. iso-latin-1) and .
You lost me here (I'm a poor soul who doesn't know much outside of the
latin-1 world).
I thought that string-make-unibyte only behaves meaningfully for
"normal 8bit coding-systems" such as latin-1.
>> I've changed my Emacs so that string-make-unibyte does the above
>> (i.e. signals an error if it encounters a non-byte char) and it works fairly
>> well, except for the few places where the elisp code is sloppy and needs to
>> be fixed.
> How did you change it? string-make-unibyte internally uses
> the function copy_text. Did you change it? But, then, each
> time you copy a multibyte string into a unibyte buffer, you
> should get an error.
Of course: it's an error. A unibyte buffer cannot represent multibyte
chars, so you need to encode them first (into a unibyte string).
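To illustrate the distinction, here is a minimal sketch in Python rather
than Emacs Lisp. It is not code from this thread and does not reproduce
Emacs internals; the names `strict_unibyte` and `encode_explicit` are
hypothetical. It models a strict conversion that accepts only chars that
fit in one byte and signals an error otherwise, next to an explicit
encoding step where the coding system is named by the caller.

```python
def strict_unibyte(text: str) -> bytes:
    """Map each char to the byte with the same code point.

    Hypothetical model of a strict conversion: any char above 0xFF
    is treated as an error instead of being silently truncated.
    """
    for index, ch in enumerate(text):
        if ord(ch) > 0xFF:
            raise ValueError(f"non-byte char {ch!r} at index {index}")
    return bytes(ord(ch) for ch in text)


def encode_explicit(text: str, coding: str = "latin-1") -> bytes:
    """Encode with an explicitly named coding system."""
    return text.encode(coding)


sample = "a\u00c0"  # 'a' followed by A-grave (code point 192)

# For a plain 8-bit coding system such as latin-1 the two agree.
same = strict_unibyte(sample) == encode_explicit(sample, "latin-1")

# A char outside the byte range is rejected by the strict conversion,
# but can still be encoded once a suitable coding system is given.
kanji = "\u65e5"
try:
    strict_unibyte(kanji)
    rejected = False
except ValueError:
    rejected = True
encoded_kanji = encode_explicit(kanji, "iso2022_jp")
```

In this model the two paths coincide only for the simple 8-bit case; for
anything else the caller has to pick a coding system explicitly.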
Now to tell you the truth, my change had to accept a few (not so) special
cases and it took a bit of fiddling to make the code lenient enough to
accept elisp code I didn't feel like "fixing". I can't remember the details
off-hand, but I remember having problems with regexp matching functions
where multibyte regexps are used in unibyte buffers.
-- Stefan