Re: mbcel module for Gnulib?, incomplete multibyte sequences

diffutils-devel



From:	Paul Eggert
Subject:	Re: mbcel module for Gnulib?, incomplete multibyte sequences
Date:	Fri, 21 Jul 2023 18:14:02 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 2023-07-21 17:33, Bruno Haible wrote:

> It gets this info from mbrtoc32, which on most platforms gets this info
> from mbrtowc. This multibyte scanner knows when the bytes it has seen
> so far constitute
>    - a complete character, or
>    - an invalid character, or
>    - an incomplete character (i.e. if additional bytes may lead to a
>      complete character).

Ah, I had thought that the idea was to treat all the bytes of a byte sequence from 10646-1 [1] R.2 Table 1 as a single invalid "character" (i.e., not a real character) if the byte sequence is not valid UTF-8. That's what Kuhn seems to be suggesting in [2].

But what you're saying is something different, that could be implemented by calling mbrtoc32.

For example, as I understand it, the byte sequence F4 90 80 80, which I had thought you were saying would be treated as a single byte sequence [F4 90 80 80] because that's in R.2 Table 1, would instead be treated as [F4 90] [80] [80], because [F4 90] is not an incomplete character (additional bytes cannot lead to a complete character).
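A minimal C sketch of the incremental reading described above, assuming UTF-8 and the well-formed byte ranges of the Unicode Standard (no overlongs, no surrogates, nothing above U+10FFFF). It is hand-written for illustration and is not mbrtoc32, glibc, or Gnulib code; the names `classify` and `segment` are hypothetical. A unit ends as soon as its bytes are known to be a complete or an invalid character, which models the split of F4 90 80 80 asked about here.

```c
#include <assert.h>
#include <stddef.h>

/* The three outcomes the quoted message attributes to mbrtoc32.
   (Hypothetical names, for illustration only.)  */
enum verdict { COMPLETE, INVALID, INCOMPLETE };

/* Classify the N >= 1 bytes at S as the start of one UTF-8 character.
   *USED is set to the number of bytes examined to reach the verdict.  */
enum verdict
classify (unsigned char const *s, size_t n, size_t *used)
{
  unsigned char b = s[0];
  size_t need;                          /* length of a valid sequence */
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */

  if (b < 0x80)
    need = 1;
  else if (0xC2 <= b && b <= 0xDF)
    need = 2;
  else if (0xE0 <= b && b <= 0xEF)
    {
      need = 3;
      if (b == 0xE0) lo = 0xA0;         /* reject overlong forms */
      if (b == 0xED) hi = 0x9F;         /* reject surrogates */
    }
  else if (0xF0 <= b && b <= 0xF4)
    {
      need = 4;
      if (b == 0xF0) lo = 0x90;         /* reject overlong forms */
      if (b == 0xF4) hi = 0x8F;         /* reject values > U+10FFFF */
    }
  else
    {
      *used = 1;                        /* 80..C1 and F5..FF never lead */
      return INVALID;
    }

  for (size_t i = 1; i < need; i++)
    {
      if (i == n)
        {
          *used = n;
          return INCOMPLETE;            /* more bytes could still complete it */
        }
      unsigned char c = s[i];
      unsigned char l = i == 1 ? lo : 0x80;
      unsigned char h = i == 1 ? hi : 0xBF;
      if (c < l || h < c)
        {
          *used = i + 1;                /* this byte rules out any completion */
          return INVALID;
        }
    }
  *used = need;
  return COMPLETE;
}

/* Split the LEN bytes at S into units, ending each unit at the first
   byte that makes it complete or invalid (or at end of input).
   Stores up to MAX unit lengths in UNITS and returns how many.  */
size_t
segment (unsigned char const *s, size_t len, size_t *units, size_t max)
{
  size_t count = 0;
  for (size_t pos = 0; pos < len && count < max; )
    {
      size_t used;
      classify (s + pos, len - pos, &used);
      assert (used >= 1);               /* guarantees forward progress */
      units[count++] = used;
      pos += used;
    }
  return count;
}
```

Fed F4 90 80 80, `segment` reports unit lengths 2, 1, 1, i.e. [F4 90] [80] [80]: F4 alone is incomplete, and 90 falls outside the 80..8F range allowed after F4, so [F4 90] is already known to be invalid. By contrast, F4 8F BF BF (U+10FFFF) comes out as a single complete 4-byte unit.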


Is this right?

[1]: https://www.cl.cam.ac.uk/~mgk25/ucs/ISO-10646-UTF-8.html
[2]: https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt


Current Thread


Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert, 2023/07/17
- Re: mbcel module for Gnulib?, incomplete multibyte sequences, Bruno Haible, 2023/07/20
  - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert, 2023/07/21
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Bruno Haible, 2023/07/21
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert <=
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Bruno Haible, 2023/07/24
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert, 2023/07/25
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert, 2023/07/22
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Bruno Haible, 2023/07/24
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert, 2023/07/24
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Bruno Haible, 2023/07/24
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Bruno Haible, 2023/07/24
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert, 2023/07/27
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert, 2023/07/28
    - Re: mbcel module for Gnulib?, incomplete multibyte sequences, Paul Eggert, 2023/07/26
