Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c


From: Paul Eggert
Subject: Re: [Bug-gnulib] addition: c-ctype.h, c-ctype.c
Date: 27 Jan 2003 14:03:36 -0800

Bruno Haible <address@hidden> writes:

> The module's functions, except c_isascii, should also work on EBCDIC
> hosts

Can't c_isascii also be made to work on EBCDIC hosts?  It would
be equivalent to (c) == 'A' || (c) == 'B' || ..., where you run
through all the ASCII characters.

> #if ('0' <= '9') \
>     && ('0' + 1 == '1') && ('1' + 1 == '2') && ('2' + 1 == '3') \
>     && ('3' + 1 == '4') && ('4' + 1 == '5') && ('5' + 1 == '6') \
>     && ('6' + 1 == '7') && ('7' + 1 == '8') && ('8' + 1 == '9')
> #define C_CTYPE_CONSECUTIVE_DIGITS 1

C89 and later require the digits to have consecutive codes, so
C_CTYPE_CONSECUTIVE_DIGITS must be 1 on any platform of interest to
gnulib.  All K&R C platforms have consecutive digits too.  So you can
remove this definition and replace C_CTYPE_CONSECUTIVE_DIGITS with 1.
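
With that guarantee the digit test needs no feature macro at all.  A
minimal sketch, written as an inline function; the name sketch_isdigit
is made up here so that it does not clash with the module's own
c_isdigit:

```c
#include <stdbool.h>

/* Hypothetical name; the module's own function is c_isdigit.  */
static inline bool
sketch_isdigit (int c)
{
  /* C89 guarantees that '0'..'9' are consecutive and increasing,
     so a plain range test is portable, EBCDIC included.  */
  return '0' <= c && c <= '9';
}
```

Any value outside the range, including a negative one such as EOF,
simply yields false.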

> #if (' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
>     && ('%' == 37) && ('&' == 38) && ('\'' == 39) && ('(' == 40) \
>     && (')' == 41) && ('*' == 42) && ('+' == 43) && (',' == 44) \
>     && ('-' == 45) && ('.' == 46) && ('/' == 47) && ('0' == 48) \
>     && ('1' == 49) && ('2' == 50) && ('3' == 51) && ('4' == 52) \
>     && ('5' == 53) && ('6' == 54) && ('7' == 55) && ('8' == 56) \
>     && ('9' == 57) && (':' == 58) && (';' == 59) && ('<' == 60) \
>     && ('=' == 61) && ('>' == 62) && ('?' == 63) && ('A' == 65) \
>     && ('B' == 66) && ('C' == 67) && ('D' == 68) && ('E' == 69) \

This can be simplified by writing ('A' == 65) &&
C_CTYPE_CONSECUTIVE_UPPERCASE.  Similarly for 'a' and '0'.

If you're being this complete, shouldn't you also check every
character in the portable execution character set, e.g. '\n', '\t'?
Also, shouldn't you test for characters like '$' and '@', which are in
ASCII (aka ISO 646 IRV:1991) but are not in the portable C character
set?
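
Roughly, the shortened test could look like the sketch below.  The
SKETCH_ names are hypothetical stand-ins for the macros in the quoted
header, and the punctuation comparisons from the quoted #if are left
out for brevity; they would be and-ed in as before.  The digits need
only one comparison because the standard already makes them
consecutive.

```c
/* Hypothetical stand-ins for the consecutive-range macros.  */
#if ('A' + 1 == 'B') && ('B' + 1 == 'C') && ('C' + 1 == 'D') \
    && ('D' + 1 == 'E') && ('E' + 1 == 'F') && ('F' + 1 == 'G') \
    && ('G' + 1 == 'H') && ('H' + 1 == 'I') && ('I' + 1 == 'J') \
    && ('J' + 1 == 'K') && ('K' + 1 == 'L') && ('L' + 1 == 'M') \
    && ('M' + 1 == 'N') && ('N' + 1 == 'O') && ('O' + 1 == 'P') \
    && ('P' + 1 == 'Q') && ('Q' + 1 == 'R') && ('R' + 1 == 'S') \
    && ('S' + 1 == 'T') && ('T' + 1 == 'U') && ('U' + 1 == 'V') \
    && ('V' + 1 == 'W') && ('W' + 1 == 'X') && ('X' + 1 == 'Y') \
    && ('Y' + 1 == 'Z')
# define SKETCH_CONSECUTIVE_UPPERCASE 1
#else
# define SKETCH_CONSECUTIVE_UPPERCASE 0
#endif

#if ('a' + 1 == 'b') && ('b' + 1 == 'c') && ('c' + 1 == 'd') \
    && ('d' + 1 == 'e') && ('e' + 1 == 'f') && ('f' + 1 == 'g') \
    && ('g' + 1 == 'h') && ('h' + 1 == 'i') && ('i' + 1 == 'j') \
    && ('j' + 1 == 'k') && ('k' + 1 == 'l') && ('l' + 1 == 'm') \
    && ('m' + 1 == 'n') && ('n' + 1 == 'o') && ('o' + 1 == 'p') \
    && ('p' + 1 == 'q') && ('q' + 1 == 'r') && ('r' + 1 == 's') \
    && ('s' + 1 == 't') && ('t' + 1 == 'u') && ('u' + 1 == 'v') \
    && ('v' + 1 == 'w') && ('w' + 1 == 'x') && ('x' + 1 == 'y') \
    && ('y' + 1 == 'z')
# define SKETCH_CONSECUTIVE_LOWERCASE 1
#else
# define SKETCH_CONSECUTIVE_LOWERCASE 0
#endif

/* One anchor per range, plus the control characters that have C
   escapes and the ASCII characters outside the portable C set.  */
#if ('A' == 65) && SKETCH_CONSECUTIVE_UPPERCASE \
    && ('a' == 97) && SKETCH_CONSECUTIVE_LOWERCASE \
    && ('0' == 48) \
    && ('\t' == 9) && ('\n' == 10) && ('\v' == 11) && ('\f' == 12) \
    && ('\r' == 13) && ('$' == 36) && ('@' == 64) && ('`' == 96)
# define SKETCH_ASCII 1
#else
# define SKETCH_ASCII 0
#endif
```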

> /* The character set is ISO-646, not EBCDIC. */
> #define C_CTYPE_ASCII 1

I got confused by this comment because there are several revisions of
ISO 646, and some of them are not the same as ASCII.  Also, the symbol
is defined to 1 for compatible character sets like Latin 1, which are
pure extensions to ASCII.  Assuming that I understand correctly what
you're trying to do, please replace this with:

/* The character set is compatible with ASCII (ISO 646 IRV:1991).  */
#define C_CTYPE_ASCII 1


> #define c_isascii(c) \
>   ({ int __c = (c); \
>      ((__c & ~0x7f) == 0); \
>    })

This isn't correct for a signed-magnitude host where EOF is -1,
because on such hosts EOF & ~0x7f evaluates to -0, which has a
distinct representation from 0 but which equals 0.  We don't want
c_isascii (EOF) to yield 1.  Better is:

#define c_isascii(c) \
  ({ int __c = (c); \
     unsigned __uc = __c; \
     (__uc <= 0x7f); \
   })

There are similar problems with the c_isascii function and with
the c_iscntrl macro and function.
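
For c_iscntrl, the same unsigned trick might look like the sketch
below.  It assumes an ASCII-compatible host (C_CTYPE_ASCII is 1), where
the control characters are 0x00..0x1f and 0x7f; the name
sketch_iscntrl is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical name; assumes an ASCII-compatible character set.  */
static inline bool
sketch_iscntrl (int c)
{
  /* Conversion to unsigned is modular, so a negative value such as
     EOF becomes a large number on any integer representation.  */
  unsigned uc = c;
  return uc <= 0x1f || uc == 0x7f;
}
```

Since no bit masking is done on a signed value, the -0 issue cannot
arise here.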

Wouldn't it be more readable, and more debuggable, to use inline
functions instead of macros?  Something like this for c_isascii:

static inline bool
c_isascii (int c)
{
  unsigned uc = c;
  return uc <= 0x7f;
}

These days one can assume inline functions for developers using GCC.


> #if C_CTYPE_CONSECUTIVE_DIGITS \
>     && C_CTYPE_CONSECUTIVE_UPPERCASE && C_CTYPE_CONSECUTIVE_LOWERCASE
> #define c_isalnum(c) \
>   ({ int __c = (c); \
>      ((__c >= '0' && __c <= '9') \
>       || (__c >= 'A' && __c <= 'Z') \
>       || (__c >= 'a' && __c <= 'z')); \
>    })
> #endif

If speed is important, and if we're tuning for ASCII, wouldn't it be
better to use
('A' <= (__c & ~('a' - 'A')) && (__c & ~('a' - 'A')) <= 'Z')
for the alphabetic part?  That will make the code shorter, and faster
on the average.
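
As inline functions, the idea might look like the following sketch.
Note that & binds more loosely than <=, so the masked value needs its
own parentheses or, as here, a temporary.  The code assumes ASCII,
where 'a' - 'A' is 0x20 and clearing that bit maps lowercase onto
uppercase; the sketch_ names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names; valid only on ASCII-compatible hosts.  */
static inline bool
sketch_isalpha (int c)
{
  /* Clear the case bit, then do a single range test.  */
  int folded = c & ~('a' - 'A');
  return 'A' <= folded && folded <= 'Z';
}

static inline bool
sketch_isalnum (int c)
{
  return ('0' <= c && c <= '9') || sketch_isalpha (c);
}
```

The neighbours of the letter ranges ('@', '[', '`', '{') fold to values
just outside 'A'..'Z', so they are still rejected.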


>   switch (c)
>     {
>     case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
>     case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
>     case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
>     etc.

Rather than have all these switch tables for the non-ASCII case, how
about having a single function that returns the ASCII code for each
ASCII char, and returns (say) EOF for any non-ASCII char?  You could
then convert the char to ASCII with that function and use the existing
ASCII-only code on the result.  That will make the code easier to read
and less error-prone.  It would also solve the c_isascii problem.
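
A rough sketch of such a conversion function follows.  The names
sketch_to_ascii and sketch_isupper are made up for illustration.  The
table covers the 95 printable ASCII characters and the control
characters that have C escapes; the remaining control codes have no
portable spelling in C source and are left out, so this is not a
complete implementation.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>    /* EOF */
#include <string.h>

/* Hypothetical helper: return the ASCII code of the host character C,
   or EOF if C is not an ASCII character that the sketch knows.  */
static int
sketch_to_ascii (int c)
{
  /* The printable ASCII characters in ASCII order, codes 32..126.  */
  static char const printable[] =
    " !\"#$%&'()*+,-./0123456789:;<=>?@"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
    "abcdefghijklmnopqrstuvwxyz{|}~";
  char const *p;

  if (c < 0 || UCHAR_MAX < c)
    return EOF;

  switch (c)
    {
    case '\0': return 0;
    case '\a': return 7;
    case '\b': return 8;
    case '\t': return 9;
    case '\n': return 10;
    case '\v': return 11;
    case '\f': return 12;
    case '\r': return 13;
    }

  /* C is nonzero here, so strchr cannot match the terminator.  */
  p = strchr (printable, c);
  return p ? 32 + (int) (p - printable) : EOF;
}

/* The ASCII-only range test, applied to the converted code.  */
static bool
sketch_isupper (int c)
{
  int a = sketch_to_ascii (c);
  return 65 <= a && a <= 90;
}
```

Once the table covered the remaining control characters, c_isascii
would reduce to comparing the result against EOF.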



