bug-grep

bug#16812: Eszett handling


From: Paul Eggert
Subject: bug#16812: Eszett handling
Date: Sat, 08 Mar 2014 10:52:49 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

'grep' conforms to its specification, even though it is not as useful as it could be when searching German text. The situation with 'ß'/'SS' is different from the situation with 'lj'/'Lj'/'LJ', because in the latter case 'grep' is dealing only with individual characters, each of which maps to a single character in the other case. Uppercasing 'ß', by contrast, yields the two-character string 'SS'.
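A short sketch of the distinction, using Python's str case methods rather than anything in grep. These methods apply Unicode's full case mappings. The helpers upper_expands and describe are hypothetical names for illustration. Each character of 'lj' changes case on its own, while 'ß' expands when uppercased:

```python
# Sketch (not grep code): contrast the two cases from the message using
# Python's str case methods, which apply Unicode's full case mappings.

def upper_expands(s: str) -> bool:
    """True if uppercasing some character of s yields more than one character."""
    return any(len(ch.upper()) > 1 for ch in s)

def describe(s: str) -> str:
    """Show s with its uppercase and titlecase forms and their lengths."""
    up, title = s.upper(), s.title()
    return f"{s!r} (len {len(s)}) -> upper {up!r} (len {len(up)}), title {title!r}"

for sample in ["lj", "Lj", "LJ", "\u00df"]:  # "\u00df" is 'ß'
    print(describe(sample), "| expands:", upper_expands(sample))
```

For the 'lj' variants, the strings keep their length under every case change. Only 'ß' changes length, and that is the case a character-by-character matcher cannot express.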

There is a related issue with 'ß' versus the recently introduced capital sharp S 'ẞ'. With 'grep --ignore-case' in the current Savannah git master, these two characters do not match each other. This is an unfortunate consequence of how the glibc regex code works: it uppercases both the pattern and the data before comparing them, but in the standard German locale uppercasing leaves 'ß' unchanged. So 'ß' and 'ẞ' still differ after uppercasing.
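The comparison strategy described above can be modelled in a few lines. This is a hedged sketch, not glibc's regex code. It assumes a one-to-one per-character mapping, as a C function like towupper() must use because it returns a single character. It approximates that mapping by falling back to the unchanged character whenever Python's full mapping would expand it. That approximation agrees with Unicode's simple mappings for 'ß' and 'ẞ'. The names simple_upper, simple_lower and equal_ignoring_case are hypothetical.

```python
# Sketch (not glibc code): model "uppercase both pattern and data, then compare"
# with a one-to-one per-character mapping, as C's towupper() must be.

def simple_upper(ch: str) -> str:
    """Approximate a one-to-one uppercase mapping for a single character.

    Python's str.upper() uses full mappings ('ß' -> 'SS'); a per-character C
    function cannot return two characters, so when the full mapping expands we
    keep the character unchanged (an approximation, correct for 'ß' and 'ẞ').
    """
    up = ch.upper()
    return up if len(up) == 1 else ch

def simple_lower(ch: str) -> str:
    """The same one-to-one approximation, for lowercasing."""
    low = ch.lower()
    return low if len(low) == 1 else ch

def equal_ignoring_case(a: str, b: str, fold) -> bool:
    """Compare a and b character by character after applying fold to each."""
    return len(a) == len(b) and all(fold(x) == fold(y) for x, y in zip(a, b))

SMALL_SHARP_S = "\u00df"    # 'ß'
CAPITAL_SHARP_S = "\u1e9e"  # 'ẞ'

print("uppercase both:", equal_ignoring_case(SMALL_SHARP_S, CAPITAL_SHARP_S, simple_upper))
print("lowercase both:", equal_ignoring_case(SMALL_SHARP_S, CAPITAL_SHARP_S, simple_lower))
print("full case folding:", SMALL_SHARP_S.casefold() == CAPITAL_SHARP_S.casefold())
```

Uppercasing both sides leaves 'ß' and 'ẞ' distinct, while lowercasing both sides would map 'ẞ' to 'ß'. Neither per-character scheme can match 'ß' against 'SS', because that requires a one-to-many mapping such as the full case folding behind str.casefold().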

I'll leave this bug open, as it is an awkward situation. Fixing it would require changing the glibc regex code, which is a big deal: the change would have performance implications for many programs. So I'm not optimistic about a fix any time soon.