Re: new module c-strstr
From: Paul Eggert
Subject: Re: new module c-strstr
Date: Fri, 18 Aug 2006 16:47:28 -0700
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
Bruno Haible <address@hidden> writes:
> Therefore most of our "c-*" modules should better be called
> "ascii-*" or "unibyte-*".
But both ASCII and other unibyte locales might say that some bytes are
encoding errors. So none of these names are exactly right. I guess
c-* is as good a name as any.
>> I think this claim isn't true for some weird non-ASCII encoding
>> schemes like DBCS-Host.
>
> Are these used as locale encodings? Many of these so-called DBCS encodings
> are stateful and therefore not usable as locale encodings.
Some are stateful, some not. As I understand it, the former are more
common, but I have practical experience only with the latter. They
are used as locale encodings in C environments. I'd expect COBOL to
be similar, but I don't know about it.
> Non-nearly-ASCII-compatible encodings don't appear in the world where GNU
> programs are deployed.
This is true for GNU programs that deal with encodings. My guess is
that most people who use GNU software use --disable-nls and the like
when they run in non-ASCII environments, and don't bother to file bug
reports because they don't expect much help from us. That being said,
GNU make and GCC are used on OS/390, as are Python and Perl.
People have also ported other GNU tools such as M4. (Admittedly it is
an uphill battle...)
> But it's important to know that c_strstr (s, "x") is not safe and
> c_strstr (s, "123") is also not safe. The programmer needs to have the
> precise criteria.
I don't quite follow this. c_strstr (S, "x") is safe in all cases; it
never has undefined behavior. It's true that the result might not
be the same as strstr (S, "x"), but that's the point of having
c_strstr, right? So I would change this:
> /* The functions defined in this file assume a nearly ASCII compatible
> character set. */
to
/* The functions defined in this file act on null-terminated byte
strings, without regard to locale. */
and this:
> This function is safe to be called, even in a multibyte locale, if NEEDLE
> ...
to this:
> This function is safe to be called, even in all known multibyte locales
> derived from ASCII, if NEEDLE ...
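
To illustrate the distinction drawn above between "never has undefined
behavior" and "gives the same answer as strstr", here is a minimal
sketch. It is not the gnulib code: bytewise_strstr and match_offset
are hypothetical names made up for this example, and the Shift_JIS
byte pair is just one well-known case of a multibyte character whose
trailing byte is also the code for an ASCII character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a locale-independent substring search.
   It compares raw bytes only and never consults the locale.  */
static const char *
bytewise_strstr (const char *haystack, const char *needle)
{
  size_t nlen = strlen (needle);
  if (nlen == 0)
    return haystack;
  for (; *haystack != '\0'; haystack++)
    if (strncmp (haystack, needle, nlen) == 0)
      return haystack;
  return NULL;
}

/* In Shift_JIS the katakana "so" is the two bytes 0x83 0x5C, and
   0x5C on its own is the code for the ASCII backslash.  */
static const char sjis_so[] = "\x83\x5C";

/* Return the byte offset of the first match, or -1 if there is none.  */
static long
match_offset (const char *haystack, const char *needle)
{
  const char *p = bytewise_strstr (haystack, needle);
  return p ? (long) (p - haystack) : -1;
}
```

With these definitions, match_offset (sjis_so, "\\") is 1: the search
is well defined, but it reports a match in the middle of a two-byte
character, which a locale-aware search running in a Shift_JIS locale
would be expected not to report.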