bug#27681: grep: Combining Mark-Nonspacing are classified as [:punct:]

bug-grep

From:	Santiago
Subject:	bug#27681: grep: Combining Mark-Nonspacing are classified as [:punct:]
Date:	Thu, 13 Jul 2017 15:21:40 +0200

Hi,

I would like to forward the issue below, reported by Panu Kalliokoski
in 2012 (better late than never!). I think the correct category is
Mark-nonspacing, though I am not very familiar with Unicode.

It still occurs in grep 3.1, here with the U+0301 combining acute accent:

 $ echo árbol | grep -o '[[:alpha:]]*'
 a
 rbol
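
To double-check which Unicode category is involved, here is a minimal
sketch, assuming Python 3 and its standard unicodedata module (not part of
grep; the helper name describe is only illustrative). It lists the general
category of each character in the decomposed spelling of the word:

```python
import unicodedata

# "árbol" spelled with a decomposed accent: "a" followed by U+0301.
word = "a\u0301rbol"

def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]

# Print one line per character so the category of the accent is visible.
for code_point, category, name in describe(word):
    print(code_point, category, name)
```

The line for U+0301 shows category Mn (Mark, Nonspacing), while the
surrounding letters show Ll.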

Cheers,

 -- Santiago

On Mon, 05 Mar 2012 13:08:43 +0200 "Panu A. Kalliokoski" <address@hidden> wrote:
> Package: grep
> Version: 2.6.3-3
> Severity: normal
> 
> 
> It seems that grep misclassifies combining letters (unicode class Lm) as
> punctuation, when they should be letters.  For instance:
> 
> $ echo d̪ʌ̀lì | grep -o '[[:alpha:]]*'
> d
> ʌ
> li
> 
> As a consequence, combining accents are not seen as "word-constituent":
> 
> $ echo d̪ʌ̀lì | grep -o '\w*'
> d
> ʌ
> li
> 
> This also causes false positives on word-boundary conditions, such as
> the one below:
> 
> $ echo d̪ʌ̀lì | grep -w ʌ
> d̪ʌ̀lì
> 
> I suggest that combining letters should be part of [:alpha:] instead of
> [:punct:].
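
The behaviour suggested in the report can be sketched outside grep. The
following is only an illustration, assuming Python 3; the function names
and the sample string are hypothetical, and the sample assumes the word in
the report is spelled with decomposed combining marks (U+032A and U+0300).
It treats both letters (categories L*) and combining marks (categories M*)
as word constituents, which approximates the proposed [:alpha:] rather
than \w, since digits and underscore are left out:

```python
import unicodedata

def is_word_constituent(ch):
    """Treat letters (L*) and combining marks (M*) as part of a word."""
    return unicodedata.category(ch)[0] in ("L", "M")

def split_words(text):
    """Split text into maximal runs of word-constituent characters."""
    words, current = [], []
    for ch in text:
        if is_word_constituent(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

# Assumed decomposed spelling of the word from the report.
sample = "d\u032a\u028c\u0300li\u0300"
print(split_words(sample))
```

With this classification the sample stays in one piece instead of being
split at each combining mark. Note that this sketch would also accept a
mark with no preceding base letter, which a real implementation might
want to handle differently.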

Current Thread

bug#27681: grep: Combining Mark-Nonspacing are classified as [:punct:], Santiago <=
- bug#27681: grep: Combining Mark-Nonspacing are classified as [:punct:], Paul Eggert, 2017/07/13
  - bug#27681: grep: Combining Mark-Nonspacing are classified as [:punct:], Santiago, 2017/07/17
