First, comments: I would like to see the sequence "\#" treated differently. Instead of "As soon as awk sees the ‘#’ that starts a comment, it ignores everything on the rest of the line", I would like the rule to be "As soon as awk sees the ‘#’ that starts a comment, it ignores the # and everything on the rest of the line", so that a "\" immediately before the comment starter would be treated as a line continuation. This would make it possible to put comments in the middle of statements. Today the sequence "\#" gives the error "backslash not last character on line", so the change would not break working code.
Second, and this is the real reason I'm writing: I'd like to see consecutive constant regular expressions concatenated. So instead of writing:
    x() {
        for p in "$@" ; do
            if [[ $p == ' ' || ${p:0:1} == "#" ]]; then continue; fi
            echo -n $p
        done
    }
    ...
    ...
    echo "(123) xxx - yyy" | gawk '// {
        match ($0,/'$(x ' ' \
            '[[:space:]]*'        '# optional leading spaces ' \
            '(\([[:digit:]]+\))'  '# digits in parens ' \
            '[[:space:]]*'        '# blah blah blah... ' \
            '([^-[:space:]]+)'    ' ' \
            '[[:space:]]*'        ' ' \
            '-'                   ' ' \
            '[[:space:]]*'        ' ' \
            '([^[:space:]]+)'     ' ' \
            '[[:space:]]*'        ' ' \
        )'/, result)
        print "Number: " result[1],  # show match result
              "Prefix: " result[2],
              "Suffix "result[3]
    }'
An alternative to my first suggestion, though only for this particular case, would be to ignore newlines that follow constant regular expressions. I'm not proficient enough in awk to know whether this change would have other (negative) effects.