Re: [Monotone-devel] Re: Tests passing!


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: Tests passing!
Date: Fri, 2 Mar 2007 01:53:09 -0800
User-agent: Mutt/1.5.13 (2006-08-11)

On Fri, Mar 02, 2007 at 12:08:03AM +0100, Lapo Luchini wrote:
> Nathaniel Smith wrote:
> > On Thu, Mar 01, 2007 at 12:12:32PM +0100, Lapo Luchini wrote:
> >> [AC_CACHE_CHECK([if iconv supports //IGNORE//TRANSLIT],
> > Seriously, there is no such thing as //IGNORE//TRANSLIT.  Anywhere.
> 
> What do you mean "there is no such thing"? 0_o
> At least in GNU iconv there *sure* is!
>
[code snipped]
>
> to me, this accepts "*//IGNORE//TRANSLIT" and does just as if "*" was
> selected, but with flags "transliterate" and "discard_ilseq" activated.

Hmm, you're quite right.  I hadn't realized that GNU libiconv and GNU
libc iconv were two totally different codebases -- I read the glibc
code earlier when Ulrich made his comments, and his comments are
accurate for that code.

So my current understanding:

libiconv:
  //IGNORE,TRANSLIT or //TRANSLIT,IGNORE are like specifying nothing
  at all, you get no transliteration.

  //IGNORE//TRANSLIT and //TRANSLIT//IGNORE are identical, and both do
  what you'd want.  AFAICT, this means transliterating when possible,
  and ignoring (not inserting question marks!) otherwise.

old glibc:
  //IGNORE,TRANSLIT or //TRANSLIT,IGNORE are like specifying nothing
  at all, you get no transliteration.

  //IGNORE//TRANSLIT is treated like //IGNORE, and //TRANSLIT//IGNORE
  is treated like //TRANSLIT

modern glibc:
  //IGNORE,TRANSLIT or //TRANSLIT,IGNORE do what you'd want (AFAICT,
  not as sure on this one).  I think they're both identical to
  //TRANSLIT, in practice (which does a fallback "transliterate
  everything to question mark" thing).

  //IGNORE//TRANSLIT is treated like //IGNORE, and //TRANSLIT//IGNORE
  is treated like //TRANSLIT

everywhere else:
  probably none of this stuff works at all, and conceivably even
  sticking //foo on the end of your charset will cause one of those
  "unrecognized charset" errors?
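
One way to check the "everywhere else" guess on a particular system is
to ask iconv_open directly.  The following is only a rough sketch:
suffix_accepted is a made-up helper name, the 128-byte buffer is an
arbitrary choice, and UTF-8 as the source charset is an assumption.
Note that a successful iconv_open only shows that the name was
accepted, not that the suffix has any effect on conversion.

```c
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if iconv_open accepts
   "<charset><suffix>" as a target charset name, and 0 if it rejects
   the name or the name does not fit in the local buffer. */
static int suffix_accepted(const char *charset, const char *suffix)
{
    char name[128];
    int n = snprintf(name, sizeof name, "%s%s", charset, suffix);
    if (n < 0 || (size_t)n >= sizeof name)
        return 0;

    /* Assumption: UTF-8 is a known source charset on this system. */
    iconv_t cd = iconv_open(name, "UTF-8");
    if (cd == (iconv_t)-1)
        return 0;

    iconv_close(cd);
    return 1;
}
```

Calling it with suffixes such as "//TRANSLIT", "//IGNORE//TRANSLIT" and
"//foo" would show which names a given implementation rejects outright.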

I guess we always want question marks, we're going to have to insert
them by hand in at least some cases anyway, and we'd rather not deal
with all of this insanity.  So maybe we should consider:
  -- try opening our iconv handle with //TRANSLIT
  -- if that fails, try opening it the normal way (to account for any
     systems that just don't know //TRANSLIT)
  -- when we actually process bytes using this iconv handle, do
     poor-man's-TRANSLIT handling -- whenever iconv says EILSEQ or
     EINVAL, then dump a question mark to output, advance the input,
     and try again.

This seems like it might be the minimal necessary-and-sufficient code?
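
The three steps above could be sketched roughly as follows.  This is
not code from monotone: open_converter and convert_lossy are
hypothetical names, the fixed-size name buffer is arbitrary, and
writing a literal '?' byte assumes the target charset is
ASCII-compatible.  It also skips exactly one input byte per error, so
a multi-byte character that cannot be converted may turn into several
question marks.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Try "<to>//TRANSLIT" first; fall back to the plain name for systems
   that do not know the suffix.  Returns (iconv_t)-1 on failure. */
static iconv_t open_converter(const char *to, const char *from)
{
    char name[128];
    int n = snprintf(name, sizeof name, "%s//TRANSLIT", to);
    if (n > 0 && (size_t)n < sizeof name) {
        iconv_t cd = iconv_open(name, from);
        if (cd != (iconv_t)-1)
            return cd;
    }
    return iconv_open(to, from);
}

/* Poor-man's TRANSLIT: convert in[0..inlen) into out, writing '?' and
   skipping one input byte whenever iconv reports EILSEQ or EINVAL.
   Returns the number of bytes written (excluding the trailing NUL),
   or (size_t)-1 if out is too small or another error occurs. */
static size_t convert_lossy(iconv_t cd, const char *in, size_t inlen,
                            char *out, size_t outcap)
{
    assert(cd != (iconv_t)-1 && outcap > 0);
    char *src = (char *)in;       /* iconv wants a non-const pointer */
    char *dst = out;
    size_t outleft = outcap - 1;  /* keep one byte for the NUL */

    while (inlen > 0) {
        if (iconv(cd, &src, &inlen, &dst, &outleft) != (size_t)-1)
            break;                /* everything was consumed */
        if (errno != EILSEQ && errno != EINVAL)
            return (size_t)-1;    /* E2BIG or something unexpected */
        if (outleft == 0)
            return (size_t)-1;
        *dst++ = '?';             /* assumes ASCII-compatible output */
        outleft--;
        src++;                    /* advance the input and try again */
        inlen--;
    }

    /* Flush any pending shift sequence and reset the handle. */
    if (iconv(cd, NULL, NULL, &dst, &outleft) == (size_t)-1)
        return (size_t)-1;
    *dst = '\0';
    return (size_t)(dst - out);
}
```

A caller would open the handle once with open_converter("ASCII",
"UTF-8") or similar, then run each string through convert_lossy.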

-- Nathaniel

-- 
Eternity is very long, especially towards the end.
  -- Woody Allen
