bug-grep

bug#16581: suggested code simplification in dfa.c


From: arnold
Subject: bug#16581: suggested code simplification in dfa.c
Date: Thu, 30 Jan 2014 08:51:34 -0700
User-agent: Heirloom mailx 12.4 7/29/08

Paul Eggert <address@hidden> wrote:

> Aaron Crane wrote:
> > I'd expect U+01C8 LATIN CAPITAL LETTER L WITH SMALL
> > LETTER J ("Lj", roughly) to be U+01C7 LATIN CAPITAL LETTER LJ ("LJ")
> > under towupper(), and U+01C9 LATIN SMALL LETTER LJ ("lj") under
> > towlower().
>
> Ouch, thanks, I hadn't considered that.  So my idea was all wrong.  But 
> this means the current code is all wrong too.  I'll take a look at it. I 
> hope I don't regret picking up this thread....

This seems to be a weird (and very much corner) case: wc != towlower(wc)
and wc != towupper(wc).  It can only be an issue if doing case folding,
and there are only a few spots in the code that deal with case folding
when compiling the dfa.
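
To make the corner case concrete, here is a minimal sketch in C. The helper names differs_from_both_cases and case_variants are hypothetical and do not come from dfa.c; they only illustrate the test described above and one naive way to gather the case variants of a single character. Whether U+01C8 actually triggers the test depends on the locale's towlower() and towupper() tables, so no result is claimed for it here.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: true if WC differs from both its lowercase and
   its uppercase mapping, which is the corner case discussed above.  */
static bool
differs_from_both_cases (wint_t wc)
{
  return wc != towlower (wc) && wc != towupper (wc);
}

/* Hypothetical helper: store the distinct members of
   {WC, towlower (WC), towupper (WC)} in OUT, in that order,
   and return how many there are (1 to 3).  */
static int
case_variants (wint_t wc, wint_t out[3])
{
  wint_t cand[3] = { wc, towlower (wc), towupper (wc) };
  int n = 0;

  for (int i = 0; i < 3; i++)
    {
      bool seen = false;
      for (int j = 0; j < n; j++)
        if (out[j] == cand[i])
          seen = true;
      if (!seen)
        out[n++] = cand[i];
    }
  return n;
}
```

For an ordinary letter such as L'a', case_variants yields two entries; a character for which differs_from_both_cases is true would yield three. Note that this one-step lookup is probably not enough for full case folding: starting from U+01C7, towlower() is expected to give U+01C9 and towupper() gives U+01C7 back, so the titlecase form U+01C8 would never be found this way.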

I suggest starting with the XOR changes for unibyte locales - they seem
(to me) to be good no matter what. And then separately try to deal with
the multibyte case.

And just to increase the need for Aspirin, any idea how regex handles
this case?  I would not be surprised if the code there also doesn't
catch this.  Wheeeeeeeee!  :-)

Arnold




