Bug report, incorrect handling of regular expression with range
From: Tomasz Żok
Date: Thu, 18 Jun 2009 23:50:03 +0200
Hello,
I wanted to achieve a simple thing: count and print how many lowercase
letters there are in each line. My first approach was this:
{
print gsub(/[a-z]/, "x")
}
Unfortunately, it does not work: this AWK script counts both lowercase
and uppercase letters. If I use:
{
print gsub(/[[:lower:]]/, "x")
}
Or:
{
print gsub(/[qwertyuiopasdfghjklzxcvbnm]/, "x")
}
Then the output is correct.
So my guess is that the error is somewhere in the handling of range
expressions inside a regular expression. Because the interval [a-z] is
contiguous in terms of ASCII codes, there is no way the uppercase letters
could "incidentally" be treated as part of [a-z].
Quick brief:
- I am using gawk 3.1.6 on an x86_64 Arch Linux machine
- /[a-z]/ matches incorrectly
- /[[:lower:]]/ or /[qwertyuiopasdfghjklzxcvbnm]/ matches correctly
- test instance (input line and the counts printed for it):
    Asss XXY cAA b
    /[a-z]/ returns 11
    /[[:lower:]]/ returns 5
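
One thing that may be worth checking (an assumption, not something verified in this report) is whether the locale changes how the range is interpreted: in non-C locales a range such as [a-z] can follow the locale's collation order rather than ASCII order. The sketch below reruns the three patterns on the test line with the locale forced to C. It assumes a POSIX shell and an awk that supports POSIX character classes. The `count_matches` helper and the `LC_ALL=C` setting are illustrative additions, not part of the original scripts.

```shell
#!/bin/sh
# Test line taken from the report.
line='Asss XXY cAA b'

# count_matches PATTERN: print how many substrings of $line match PATTERN.
# LC_ALL=C is an assumption added for this sketch; it forces byte-order
# ranges, so [a-z] should cover only the ASCII lowercase letters.
count_matches() {
    printf '%s\n' "$line" | LC_ALL=C awk -v re="$1" '{ print gsub(re, "x") }'
}

count_matches '[a-z]'
count_matches '[[:lower:]]'
count_matches '[qwertyuiopasdfghjklzxcvbnm]'
```

If all three calls print 5 under the C locale while /[a-z]/ still gives 11 under the default locale, the difference would point at locale collation rather than at the range code itself.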
Best regards,
Tomasz Żok