bug-gnu-utils

Bug report, incorrect handling of regular expression with range


From: Tomasz Żok
Subject: Bug report, incorrect handling of regular expression with range
Date: Thu, 18 Jun 2009 23:50:03 +0200

Hello,
I wanted to do a simple thing: count and print how many lowercase
letters there are in each line. My first approach was this:
 {
  print gsub(/[a-z]/, "x")
}
Unfortunately, it does not work: this AWK script counts both lowercase
and uppercase letters. If I use:
 {
  print gsub(/[[:lower:]]/, "x")
}
Or:
 {
  print gsub(/[qwertyuiopasdfghjklzxcvbnm]/, "x")
}
Then the output is correct.
So my guess is that the error is somewhere in the handling of ranges in
regular expressions. Because the interval [a-z] is contiguous in terms of
ASCII codes, there is no way the uppercase letters could "incidentally" be
treated as part of [a-z].
Quick summary:
- I am using gawk 3.1.6 on an x86_64 Arch Linux machine
- /[a-z]/ matches incorrectly
- /[[:lower:]]/ or /[qwertyuiopasdfghjklzxcvbnm]/ matches correctly
- test instance (one input line):
    Asss XXY cAA b
  /[a-z]/ returns 11
  /[[:lower:]]/ returns 5
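
For anyone who wants to reproduce this, here is a rough shell sketch that runs
the three patterns against the test line. The variable names and the printf
pipeline are only my illustration and not part of the original scripts, and it
assumes an awk binary is available on the PATH. The count printed for [a-z]
may differ from mine depending on the awk version and environment.

```shell
#!/bin/sh
# Test line from the report.
line='Asss XXY cAA b'

# gsub() returns the number of substitutions it made, so printing its
# return value gives the number of characters matched by each pattern.
range_count=$(printf '%s\n' "$line" | awk '{ print gsub(/[a-z]/, "x") }')
class_count=$(printf '%s\n' "$line" | awk '{ print gsub(/[[:lower:]]/, "x") }')
list_count=$(printf '%s\n' "$line" | awk '{ print gsub(/[qwertyuiopasdfghjklzxcvbnm]/, "x") }')

# Show the three counts side by side for comparison.
printf 'range [a-z]:     %s\n' "$range_count"
printf 'class [:lower:]: %s\n' "$class_count"
printf 'explicit list:   %s\n' "$list_count"
```

The test line contains 5 lowercase and 6 uppercase letters, so any count
above 5 means that uppercase letters were matched as well.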
Best regards,
Tomasz Żok

