bug-gnulib

Re: parse_duration()


From: Bruno Haible
Subject: Re: parse_duration()
Date: Wed, 5 Nov 2008 12:06:29 +0100
User-agent: KMail/1.5.4

Bruce Korb wrote:
> > But it should be documented.
> 
> In the new .h file.

Good!

The documentation should also mention the return value convention of the
function: how does the caller distinguish
  - a successful parse,
  - an out-of-range time_t result,
  - a syntactically invalid input?
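One possible convention is sketched below. It is an assumption for illustration and is not taken from the actual parse_duration code. The function returns a sentinel, (time_t) -1, which can never be a valid non-negative duration, and sets errno to EINVAL for a syntax error or ERANGE for overflow. The names parse_seconds, BAD_TIME and TIME_T_MAX are hypothetical. The parser accepts only a plain string of decimal digits, and TIME_T_MAX assumes time_t is a signed integer type, as it is on common POSIX systems.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical error sentinel; valid durations are never negative.  */
#define BAD_TIME ((time_t) -1)

/* Hypothetical: assumes time_t is a signed integer type without
   padding bits.  */
#define TIME_T_MAX \
  ((time_t) (((uintmax_t) 1 << (sizeof (time_t) * CHAR_BIT - 1)) - 1))

/* Hypothetical parser for a plain count of seconds such as "90".
   Returns the value on success.  On failure returns BAD_TIME and sets
   errno to EINVAL (empty input or a non-digit) or ERANGE (the value
   does not fit in time_t).  */
time_t
parse_seconds (const char *s)
{
  time_t res = 0;

  if (*s == '\0')
    {
      errno = EINVAL;
      return BAD_TIME;
    }
  for (; *s != '\0'; s++)
    {
      if (*s < '0' || *s > '9')
        {
          errno = EINVAL;
          return BAD_TIME;
        }
      int digit = *s - '0';
      /* Would res * 10 + digit exceed TIME_T_MAX?  */
      if (res > (TIME_T_MAX - digit) / 10)
        {
          errno = ERANGE;
          return BAD_TIME;
        }
      res = res * 10 + digit;
    }
  return res;
}
```

A caller compares the result against BAD_TIME first and only then inspects errno to tell the two failure kinds apart. Because the digits are checked left to right, an overlong number followed by garbage reports ERANGE rather than EINVAL; a real implementation might prefer to validate the syntax first.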

Can you also write the module description, or do you want me to do it for you?

> > Do you really mean that the result is locale dependent? Namely, if the user
> > uses U+00A0 (NO-BREAK SPACE) as a separator, he will get a parse success in
> > ISO-8859-1 locales but a parse failure in UTF-8 locales. - The gnulib module
> > 'c-ctype' contains functions that don't have this problem.
> 
> I'm not an internationalization expert.  I'll take a look when I have some 
> time.

The background is: Some characters, such as U+00A0, are single-byte in
some locales on some systems (0xA0 in ISO-8859-1 on Linux libc5) but
multi-byte in other locales:
  $ LC_ALL=de_DE.UTF-8 `which printf` '\u00A0' | od -t x1
  0000000 c2 a0
  0000002
<ctype.h> functions operate on a single byte. In the ISO-8859-1 locale:
  isspace (0xA0) -> true
In the UTF-8 locale:
  isspace (0xC2) -> false
  isspace (0xA0) -> false
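The byte-versus-character distinction can be illustrated by decoding one whole multibyte character before classifying it. The sketch below uses the standard C functions mbrtowc and iswspace, not gnulib's mbiter/mbchar modules (whose interfaces are not shown here), and skip_mb_spaces is a hypothetical name. In a UTF-8 locale the bytes 0xC2 0xA0 reach iswspace as the single wide character U+00A0. Whether that character then counts as white space is decided by the locale's classification tables, which differ between C libraries.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return a pointer to the first character of S
   that is not white space in the current LC_CTYPE locale.  Each step
   decodes a complete multibyte character, so a UTF-8 sequence such as
   0xC2 0xA0 is classified as one wide character, not as two bytes.  */
const char *
skip_mb_spaces (const char *s)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);   /* initial conversion state */
  size_t len = strlen (s);

  while (len > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, s, len, &state);
      /* Stop on an invalid or truncated sequence, or on a NUL.  */
      if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
        break;
      if (!iswspace ((wint_t) wc))
        break;
      s += n;
      len -= n;
    }
  return s;
}
```

The program has to call setlocale (LC_ALL, "") for the user's locale to take effect; until then it runs in the "C" locale, where only the ASCII white-space characters are skipped.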

The workaround: decide whether you want locale-dependent or
locale-independent parsing.
  - If locale dependent, use the mbiter & mbchar modules to handle byte
    sequences such as 0xC2 0xA0 as a single multibyte character.
  - If locale independent, use "c-ctype.h" instead of <ctype.h>.
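For the locale-independent route, the point of c-ctype is that its classification ignores the current locale. The stand-in below, ascii_isspace, is a hypothetical helper written to mirror what c_isspace from "c-ctype.h" is meant to do (accept exactly the white-space characters of the "C" locale); in real gnulib code one would call c_isspace itself. Bytes such as 0xA0 and 0xC2 are never white space here, in any locale, so ISO-8859-1 and UTF-8 users get the same result.

```c
#include <stdbool.h>

/* Hypothetical stand-in for gnulib's c_isspace: true only for the six
   white-space characters of the "C" locale, regardless of the current
   locale.  */
static bool
ascii_isspace (int c)
{
  switch (c)
    {
    case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
      return true;
    default:
      return false;
    }
}

/* Skip leading white space byte by byte, independently of the locale.
   The cast avoids passing a negative char value to the classifier.  */
const char *
skip_c_spaces (const char *s)
{
  while (ascii_isspace ((unsigned char) *s))
    s++;
  return s;
}
```

With this approach a U+00A0 separator is rejected consistently in every locale, which is simpler to document than a result that depends on the user's encoding.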

> >> parse_YMD(char const * pz)
> >> {
> >>   time_t res = 0, val;
> >>   char * ps = strchr(pz, 'Y');
> > 
> > 'ps' should be declared as 'char const *', because 'pz' has const.
> 
> The input to the main function (parse_duration) is const, the rest are
> now non-const not because they've magically become writable, but because
> littering the code with casts because various functions return non-const
> pointers is a big nuisance.

Huh? You don't need a cast to convert from 'char *' to 'char const *'. The
prototype of strchr was defined by ANSI C to be

   extern char *strchr(const char *, int);

so that it can be used in both of these situations without warnings:

   const char *pz = ...;
   const char *ps = strchr (pz, c);

and

   char *pz = ...;
   char *ps = strchr (pz, c);

Really, marking pointers as pointing to 'const' not only helps the reader
to understand that no dirty tricks are being played on the string, but
also helps avoid bugs caused by typos: if someone writes (*ps = 'x')
when he means (*ps == 'x'), the compiler rejects the assignment.
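As a small illustration of the const point, here is a hypothetical helper (count_char is not part of the parse_duration code) that walks a const string with strchr. The result of strchr is assigned to a 'const char *' without any cast, and an accidental store through ps would not compile.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the occurrences of C in the string PZ.  */
size_t
count_char (const char *pz, int c)
{
  size_t n = 0;

  /* strchr would find the terminating NUL itself, and stepping past it
     in the loop below would run off the end of the string.  */
  if (c == '\0')
    return 0;

  /* strchr returns 'char *', which converts implicitly to
     'const char *': no cast is needed.  */
  for (const char *ps = strchr (pz, c); ps != NULL; ps = strchr (ps + 1, c))
    {
      /* A typo such as (*ps = 'x') in place of (*ps == 'x') would be
         rejected here, because *ps is read-only.  */
      n++;
    }
  return n;
}
```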

Bruno




