
Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c


From: Bruno Haible
Subject: Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c
Date: Tue, 28 Jan 2003 14:37:32 +0100 (CET)

Paul Eggert writes:

> > The module's functions, except c_isascii, should also work on EBCDIC
> > hosts
> 
> Can't c_isascii also be made to work on EBCDIC hosts?  It would
> be equivalent to (c) == 'A' || (c) == 'B' || ..., where you run
> through all the ASCII characters.

This wouldn't gain much. Programs that use isascii or c_isascii
often also do some bit fiddling on the characters, and would need
changes for EBCDIC awareness anyway.

Besides that, how would I denote e.g. #\End-Of-Transmission in a
portable way, so that it evaluates to 0x04 on ASCII hosts and 0x37 on
EBCDIC hosts?

> > #if ('0' <= '9') \
> >     && ('0' + 1 == '1') && ('1' + 1 == '2') && ('2' + 1 == '3') \
> >     && ('3' + 1 == '4') && ('4' + 1 == '5') && ('5' + 1 == '6') \
> >     && ('6' + 1 == '7') && ('7' + 1 == '8') && ('8' + 1 == '9')
> > #define C_CTYPE_CONSECUTIVE_DIGITS 1
> 
> C89 and later require that the digits must have consecutive codes, so
> C_CTYPE_CONSECUTIVE_DIGITS must be 1 on any platform of interest to
> gnulib.  All K&R C platforms have consecutive digits too.  So you can
> remove this definition and replace C_CTYPE_CONSECUTIVE_DIGITS with 1.

Thanks for the tip! Done.
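
For illustration, a minimal sketch of what the consecutive-digits
guarantee allows. The names sketch_isdigit and sketch_digit_value are
hypothetical and are not taken from c-ctype.h; the only assumption is
the C89 rule quoted above.

```c
/* Hypothetical helpers, not the actual c-ctype code.
   C89 guarantees that '0'..'9' have consecutive, increasing codes,
   so a plain range check is portable to ASCII and EBCDIC alike.  */
static int
sketch_isdigit (int c)
{
  return c >= '0' && c <= '9';
}

/* Return the numeric value of a digit character, or -1 if C is not
   a digit.  The subtraction relies on the same guarantee.  */
static int
sketch_digit_value (int c)
{
  return sketch_isdigit (c) ? c - '0' : -1;
}
```

No preprocessor test is needed for this case, which is why the
C_CTYPE_CONSECUTIVE_DIGITS symbol can be replaced by 1.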

> > #if (' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
> >     && ('%' == 37) && ('&' == 38) && ('\'' == 39) && ('(' == 40) \
> >     && (')' == 41) && ('*' == 42) && ('+' == 43) && (',' == 44) \
> >     && ('-' == 45) && ('.' == 46) && ('/' == 47) && ('0' == 48) \
> >     && ('1' == 49) && ('2' == 50) && ('3' == 51) && ('4' == 52) \
> >     && ('5' == 53) && ('6' == 54) && ('7' == 55) && ('8' == 56) \
> >     && ('9' == 57) && (':' == 58) && (';' == 59) && ('<' == 60) \
> >     && ('=' == 61) && ('>' == 62) && ('?' == 63) && ('A' == 65) \
> >     && ('B' == 66) && ('C' == 67) && ('D' == 68) && ('E' == 69) \
> 
> This can be simplified by writing ('A' == 65) &&
> C_CTYPE_CONSECUTIVE_UPPERCASE.  Similarly for 'a' and '0'.
> 
> If you're being this complete, shouldn't you also need to check for
> every character in the portable executable character set, e.g. '\n',
> '\t'?

Well, the c_* functions have the same implementation whether '\n' ==
10 or '\n' == 13 (Mac). So why should I exclude the Mac?

> Also, shouldn't you test for characters like '$' and '@', which
> are in ASCII (aka ISO 646 IRV:1991) but are not in the portable
> C character set?

One purpose of this C_CTYPE_ASCII test is to exclude ISO646-JP and
SHIFT_JIS. I need to test '$' for this. Even though '$' is not in the
portable character set, so many programs use it that I'd say it's safe
to use.

> I got confused by this comment because there are several revisions of
> ISO 646, and some of them are not the same as ASCII.

I meant ISO646-JP.

> Also, the symbol
> is defined to 1 for compatible character sets like Latin 1, which are
> pure extensions to ASCII.  Assuming that I understand correctly what
> you're trying to do, please replace this with:
> 
> /* The character set is compatible with ASCII (ISO 646 IRV:1991).  */
> #define C_CTYPE_ASCII 1

Even if the C locale of a system actually uses Latin-1 (AmigaOS is
one such system), the functions here will use ASCII.

> 
> > #define c_isascii(c) \
> >   ({ int __c = (c); \
> >      ((__c & ~0x7f) == 0); \
> >    })
> 
> This isn't correct for a signed-magnitude host

Such hosts have not existed for more than 20 years.
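
As a rough sketch of the point under discussion, here is the mask test
written as an ordinary function, next to a range comparison that does
not depend on how negative integers are represented. Both function
names are hypothetical; neither is the module's actual definition.

```c
/* Hypothetical sketch, not the actual c-ctype code.
   Mask test: true when no bit above the low seven is set.  On a
   two's-complement host every negative value has high bits set,
   so negative arguments are rejected.  */
static int
sketch_isascii_mask (int c)
{
  return (c & ~0x7f) == 0;
}

/* Range test: gives the same answer on two's-complement hosts and
   does not rely on the bit pattern of negative numbers.  */
static int
sketch_isascii_range (int c)
{
  return c >= 0 && c <= 0x7f;
}
```

On two's-complement hosts the two variants agree for every int value,
so the choice between them is a matter of taste.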

> Wouldn't it be more readable, and more debuggable, to use inline
> functions instead of macros?

More debuggable?? Even in the most recent versions of gcc and gdb,
inline functions are a plague when debugging: you see the line number
of the inline function's code, and in the stack trace you don't see
the inline function's caller any more. For debugging it is best to use
-O0, and in this case "c-ctype.h" will use the external functions, not
the macros.

> > #if C_CTYPE_CONSECUTIVE_DIGITS \
> >     && C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
> > #define c_isalnum(c) \
> >   ({ int __c = (c); \
> >      ((__c >= '0' && __c <= '9') \
> >       || (__c >= 'A' && __c <= 'Z') \
> >       || (__c >= 'a' && __c <= 'z')); \
> >    })
> > #endif
> 
> If speed is important, and if we're tuning for ASCII, wouldn't it be
> better to use
> ('A' <= (__c & ~('a' - 'A')) && (__c & ~('a' - 'A')) <= 'Z')?
> That will make the code shorter, and faster on the average.

Yes, thanks for the nice optimization!
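
A minimal sketch of the case-folding trick, assuming an ASCII host
where 'a' - 'A' is 32. The function names are hypothetical. Since &
binds less tightly than <= in C, the masked value is computed into a
variable first, which also avoids evaluating it twice.

```c
/* Hypothetical sketch, ASCII only; not the actual c-ctype code.
   Clearing the 0x20 bit maps 'a'..'z' onto 'A'..'Z' and leaves
   'A'..'Z' unchanged, so one range check covers both cases.  */
static int
sketch_isalpha (int c)
{
  int folded = c & ~('a' - 'A');
  return 'A' <= folded && folded <= 'Z';
}

/* Alphanumeric test built from the digit range and the folded
   letter range: two range checks instead of three.  */
static int
sketch_isalnum (int c)
{
  return (c >= '0' && c <= '9') || sketch_isalpha (c);
}
```

The neighbours of the letter ranges ('@', '[', '`', '{') fold to '@'
or '[', which lie just outside 'A'..'Z', so they are still rejected.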

> >   switch (c)
> >     {
> >     case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
> >     case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
> >     case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
> >     etc.
> 
> Rather than have all these switch tables for the non-ASCII case, how
> about having a single function that returns the ASCII code for each
> ASCII char, and returns (say) EOF for any non-ASCII char, and then
> convert the char to ASCII with that function and then use the existing
> ASCII-only code on the result?

This would make the code less elegant: If I can define c_isalnum etc.
abstractly without referring to ASCII, then an implementation which
doesn't need to refer to ASCII is more beautiful.

Like in mathematics: you can define the term "finite dimensional
vector space over R" abstractly without referring to R^n, therefore
defining it using an isomorphism to R^n is considered less beautiful.
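
To make the "abstract" definition concrete, here is a sketch of a
switch-based test that never mentions a character code. The name
sketch_isupper is hypothetical and the body only illustrates the
approach; it is not copied from c-ctype.c.

```c
/* Hypothetical sketch, not the actual c-ctype code.
   The set of uppercase letters is spelled out as character
   constants, so the function makes no assumption about their
   numeric codes and works on ASCII and EBCDIC hosts alike.  */
static int
sketch_isupper (int c)
{
  switch (c)
    {
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
    case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
    case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
    case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
    case 'Y': case 'Z':
      return 1;
    default:
      return 0;
    }
}
```

The alternative proposed in the quoted text would first map each
character to its ASCII code and then reuse the ASCII-only range checks.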

> That will make the code easier to read and less error-prone.

I disagree here.

Bruno



