bug-gnulib

Re: [bug-gnulib] ISSLASH on Woe32


From: Bruno Haible
Subject: Re: [bug-gnulib] ISSLASH on Woe32
Date: Thu, 28 Apr 2005 16:56:36 +0200
User-agent: KMail/1.5

Paul Eggert wrote:
> Why would gnulib itself need to care
> about the difference between (2) and (4)?  Either way, gnulib can
> easily look for '/' and '\' in path names.  Isn't it up to the
> supplier of the underlying system-call implementation, and/or the
> gnulib user, to decide whether (2) or (4) is in use?  In other words,
> can't gnulib itself be agnostic about (2) versus (4)?

Let's take an example contained in gnulib. (You can find many more examples
that are half contained in gnulib and half in coreutils/findutils/...)
Take localcharset.c. (Forget for a moment that the code is currently not
used on Woe32, for different reasons.)

The code currently is essentially

      dir = relocate (LIBDIR);

      /* Concatenate dir and base into freshly allocated file_name.  */
      {
        size_t dir_len = strlen (dir);
        size_t base_len = strlen (base);
        int add_slash = (dir_len > 0 && !ISSLASH (dir[dir_len - 1]));
        file_name = (char *) malloc (dir_len + add_slash + base_len + 1);
        if (file_name != NULL)
          {
            memcpy (file_name, dir, dir_len);
            if (add_slash)
              file_name[dir_len] = DIRECTORY_SEPARATOR;
            memcpy (file_name + dir_len + add_slash, base, base_len + 1);
          }
      }

      fp = fopen (file_name, "r");
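
For readers who want to try the concatenation step in isolation, here is a
minimal sketch of it as a standalone function. The helper name
concat_dir_base and the two macro definitions are assumptions made for
illustration (a Woe32-style build where both '/' and '\' count as slashes);
they are not copied from localcharset.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed Woe32-style definitions, for illustration only.  */
#define DIRECTORY_SEPARATOR '\\'
#define ISSLASH(C) ((C) == '/' || (C) == '\\')

/* Hypothetical helper: return a freshly allocated "dir<sep>base",
   or NULL if malloc fails.  The caller frees the result.  */
static char *
concat_dir_base (const char *dir, const char *base)
{
  assert (dir != NULL && base != NULL);

  size_t dir_len = strlen (dir);
  size_t base_len = strlen (base);
  /* Add a separator only if dir is nonempty and does not end in one.  */
  int add_slash = (dir_len > 0 && !ISSLASH (dir[dir_len - 1]));
  char *file_name = malloc (dir_len + add_slash + base_len + 1);

  if (file_name != NULL)
    {
      memcpy (file_name, dir, dir_len);
      if (add_slash)
        file_name[dir_len] = DIRECTORY_SEPARATOR;
      /* base_len + 1 also copies the terminating NUL.  */
      memcpy (file_name + dir_len + add_slash, base, base_len + 1);
    }
  return file_name;
}
```

Note that ISSLASH inspects a single byte here, which is exactly why the
encoding of dir matters for the rest of the discussion.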

In approach (2) LIBDIR will be a UTF-8 encoded pathname. The ISSLASH
operation will therefore work correctly. However, fopen() expects a
string in locale encoding, not in UTF-8 encoding. Therefore we have
to replace the last line with

      char *real_file_name = u8_conv_to_locale (file_name);
      fp = fopen (real_file_name, "r");
      free (real_file_name);

Or, alternatively, replace the whole set of libc functions dealing with
pathnames with wrappers that take a UTF-8 string:

      fp = u8_fopen (file_name, "r");

Whereas in approach (4), we can leave the code as it is.
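
To make the cost of approach (2) concrete, here is a rough sketch of what
such a wrapper could look like on a POSIX system with iconv. The names
sketch_u8_conv_to_locale and sketch_u8_fopen are hypothetical stand-ins for
the u8_conv_to_locale and u8_fopen mentioned above, not existing gnulib
functions; a native Woe32 version would presumably go through the wide-char
API instead. It assumes the program has already called setlocale() and that
four output bytes per input byte are enough.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: convert a UTF-8 string to the locale encoding.
   Returns a malloc'd string, or NULL on failure.  */
static char *
sketch_u8_conv_to_locale (const char *u8)
{
  assert (u8 != NULL);

  iconv_t cd = iconv_open (nl_langinfo (CODESET), "UTF-8");
  if (cd == (iconv_t) -1)
    return NULL;

  size_t in_left = strlen (u8);
  size_t out_size = 4 * in_left + 1;   /* assumed upper bound */
  char *out = malloc (out_size);
  if (out != NULL)
    {
      char *in_ptr = (char *) u8;      /* iconv does not modify the input */
      char *out_ptr = out;
      size_t out_left = out_size - 1;

      if (iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t) -1
          /* Flush any pending shift sequence of a stateful encoding.  */
          || iconv (cd, NULL, NULL, &out_ptr, &out_left) == (size_t) -1)
        {
          free (out);
          out = NULL;
        }
      else
        *out_ptr = '\0';
    }
  iconv_close (cd);
  return out;
}

/* Hypothetical fopen wrapper taking a UTF-8 file name.  */
static FILE *
sketch_u8_fopen (const char *u8_file_name, const char *mode)
{
  char *real_file_name = sketch_u8_conv_to_locale (u8_file_name);
  if (real_file_name == NULL)
    return NULL;

  FILE *fp = fopen (real_file_name, mode);
  int saved_errno = errno;             /* free() may clobber errno */
  free (real_file_name);
  errno = saved_errno;
  return fp;
}
```

Every libc entry point that takes a pathname (open, stat, opendir, rename,
and so on) would need a similar wrapper, which is the burden approach (4)
avoids.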

> For example, EUC-JP is also safe.  Or perhaps you're not
> mentioning this because Microsoft doesn't support EUC-JP?  (I'm not
> familiar with their support for various encodings.)

I'm not familiar with it either. But the most comprehensive charset aliases
table
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?rev=1.115
shows that EUC-JP is unknown as a CP<nnn> encoding, whereas UTF-8 is known as
CP1208 and as CP65001. It is also mentioned in
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_17si.asp
Therefore I think it's not possible to recommend an EUC-JP encoded locale
to Windows users.
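
As an illustration of what "safe" means in this exchange, the following
sketch counts the bytes that a byte-wise ISSLASH scan would treat as
separators. The helper count_slash_bytes and the test strings are made up
for this example: each string is the one-character name 表 followed by a
genuine backslash and "x". In Shift_JIS the second byte of 表 is 0x5C, so a
byte-wise scan sees a separator that is not there; in EUC-JP and UTF-8 every
byte of a multibyte character has the high bit set, so no false match can
occur.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed Woe32-style definition, for illustration only.  */
#define ISSLASH(C) ((C) == '/' || (C) == '\\')

/* Hypothetical helper: number of bytes in S for which ISSLASH is true.  */
static size_t
count_slash_bytes (const char *s)
{
  assert (s != NULL);

  size_t count = 0;
  for (size_t i = 0; s[i] != '\0'; i++)
    if (ISSLASH (s[i]))
      count++;
  return count;
}

/* "表" + '\' + "x" in three encodings.  */
static const char name_sjis[]  = "\x95\x5C" "\\x";      /* Shift_JIS */
static const char name_eucjp[] = "\xC9\xBD" "\\x";      /* EUC-JP */
static const char name_utf8[]  = "\xE8\xA1\xA8" "\\x";  /* UTF-8 */
```

With these strings, count_slash_bytes returns 2 for name_sjis (one false
match plus the real separator) and 1 for both name_eucjp and name_utf8.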

Bruno




