bug-grep

From: Paul Eggert
Subject: bug#16919: [PATCH] fix mismatch between dfa and regex for treatment of titlecase
Date: Sun, 02 Mar 2014 12:37:09 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

Norihiro Tanaka wrote:

> I found a difference between the dfa and regex (glibc) treatment of titlecase.

Thanks for bringing this up, but I'm afraid the regex code appears to be buggy in this area. It does the match by converting both the pattern and the text to uppercase, and then trying to match the uppercased versions. That is incorrect for an example like the following, which uses the back-reference '\(\)\1' to force grep to use the regex code instead of dfa:

echo 'ς' | grep -i '\(\)\1σ'

This should output nothing, because terminal sigma (ς, U+03C2) is not the same character as lowercase sigma (σ, U+03C3), even when case is ignored. But since both characters have the same uppercase counterpart, capital sigma (Σ, U+03A3), grep incorrectly outputs the line containing the terminal sigma. The dfa code gets it right.
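To make the failure concrete, here is a minimal sketch contrasting two per-character ways of "ignoring case". The first follows the uppercase-both-sides approach described above for regex. The second, stricter one accepts a text character only if it is the pattern character itself or one of its direct upper/lowercase mappings. The helper names are hypothetical, and neither function is the actual glibc regex or grep dfa code; the second is only an illustration of a comparison that keeps ς and σ distinct.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT
# the glibc regex or grep dfa implementations. Comparison is per single
# character, ignoring multi-character mappings such as German sharp s.

FINAL_SIGMA = "\u03c2"    # ς  GREEK SMALL LETTER FINAL SIGMA
SMALL_SIGMA = "\u03c3"    # σ  GREEK SMALL LETTER SIGMA
CAPITAL_SIGMA = "\u03a3"  # Σ  GREEK CAPITAL LETTER SIGMA


def match_by_uppercasing(pattern_char: str, text_char: str) -> bool:
    """Uppercase both sides, then compare (the approach criticized above)."""
    return pattern_char.upper() == text_char.upper()


def case_variants(c: str) -> set:
    """The character plus its direct uppercase and lowercase mappings."""
    return {c, c.upper(), c.lower()}


def match_by_variants(pattern_char: str, text_char: str) -> bool:
    """Stricter: text_char must be a direct case variant of pattern_char."""
    return text_char in case_variants(pattern_char)


if __name__ == "__main__":
    # Both sigmas uppercase to Σ, so the first strategy conflates them.
    print(match_by_uppercasing(SMALL_SIGMA, FINAL_SIGMA))  # True
    # σ's variants are {σ, Σ}; ς is not among them.
    print(match_by_variants(SMALL_SIGMA, FINAL_SIGMA))     # False
    # σ still matches Σ when case is ignored.
    print(match_by_variants(SMALL_SIGMA, CAPITAL_SIGMA))   # True
```

The point of the contrast is that uppercasing is many-to-one: once ς and σ have both become Σ, the information that they were different characters is gone, so any match performed on the uppercased strings cannot tell them apart.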

POSIX is muddy in this area, unfortunately, but I don't see any interpretation whereby ς and σ should match when case is ignored.




