Hello there,
I would like to report a bug with GNU Awk version 3.1.4.
Earlier today I ran into a problem when using gawk with the --re-interval option.
The input is a file called lab3.data with the following contents:
Mike Harrington:(510) 548-1278:250:100:175
Christian Dobbins:(408) 538-2358:155:90:201
Susan Dalsass:(206) 654-6279:250:60:50
Archie McNichol:(206) 548-1348:250:100:175
Jody Savage:(206) 548-1278:15:188:150
Guy Quigley:(916) 343-6410:250:100:175
Dan Savage:(406) 298-7744:450:300:275
Nancy McNeil:(206) 548-1278:250:80:75
John Goldenrod:(916) 348-4278:250:100:175
Chet Main:(510) 548-5258:50:95:135
Tom Savage:(408) 926-3456:250:168:200
Elizabeth Stachelin:(916) 440-1763:175:75:300
A question in my textbook reads:
Print all first names containing only four characters.
The obvious solution is:
awk -F'[ :]' --re-interval '$1 ~ /^[A-Z][a-z]{3}$/{print $1}' lab3.data
which prints:
Mike
Jody
John
Chet
While working toward that answer, however, I came across this variation:
awk -F'[ :]' --re-interval '$1 ~ /^[A-Z]{4}$/{print $1}' lab3.data
For some reason, this also prints the same four names. That should not happen, since the regular expression in the second command asks for exactly four uppercase letters rather than one uppercase letter followed by three lowercase ones.
Other lengths behave inconsistently as well. With five letters, the uppercase-only pattern (^[A-Z]{5}$) does not return the five-letter names; I have to use one uppercase and four lowercase letters (^[A-Z][a-z]{4}$) to get them. With three letters, the uppercase-only pattern (^[A-Z]{3}$) misses one of the names (Dan), while one uppercase and two lowercase letters (^[A-Z][a-z]{2}$) finds it.
My instructor and I spent over half an hour on this, experimenting with the line endings to rule out Windows (CRLF) line endings and restructuring the file, and still could not find an explanation. We concluded that it is probably a bug, and he suggested I send you an e-mail.
Thanks in advance,
-Christo Mavrick, John Abbott College Student