Nick Dokos: texi2dvi egrep regexp
From: Nick Dokos
Subject: Nick Dokos: texi2dvi egrep regexp
Date: Fri, 08 Oct 2010 14:26:01 -0400
[Try again: Misspelt the mailing list name the first time]
------- Forwarded Message
Date: Fri, 08 Oct 2010 13:57:43 -0400
From: Nick Dokos <address@hidden>
To: address@hidden
cc: address@hidden, "Eric S. Fraga" <address@hidden>,
Suvayu Ali <address@hidden>
Subject: texi2dvi egrep regexp
There was a discussion about some problems with the egrep regexp
that texi2dvi uses back in March 2010 in the thread entitled
texi2dvi: locale-dependent error in egrep [A-z]
(see http://lists.gnu.org/archive/html/bug-texinfo/2010-03/msg00031.html
and following).
Has anything come of that? The reason I am asking is that recently emacs
org-mode tried to switch to texi2dvi for org->pdf exporting and several
people have reported this problem. The underlying reason seems to be
that recent versions of egrep check range expressions more strictly:
e.g. Fedora 13 uses grep version 2.6.3 and egrep fails the range check.
OTOH, Ubuntu 10.04 uses grep version 2.5.4: egrep does not fail there.
The egrep manual page says:
Within a bracket expression, a range expression consists of two
characters separated by a hyphen. It matches any single
character that sorts between the two characters, inclusive, using
the locale's collating sequence and character set. For example,
in the default C locale, [a-d] is equivalent to [abcd]. Many
locales sort characters in dictionary order, and in these locales
[a-d] is typically not equivalent to [abcd]; it might be
equivalent to [aBbCcDd], for example. To obtain the traditional
interpretation of bracket expressions, you can use the C locale
by setting the LC_ALL environment variable to the value C.
Finally, certain named classes of characters are predefined
within bracket expressions, as follows. Their names are self
explanatory, and they are [:alnum:], [:alpha:], [:cntrl:],
[:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:],
[:upper:], and [:xdigit:]. For example, [[:alnum:]] means
[0-9A-Za-z], except the latter form depends upon the C locale and
the ASCII character encoding, whereas the former is independent
of locale and character set. (Note that the brackets in these
class names are part of the symbolic names, and must be included
in addition to the brackets delimiting the bracket expression.)
Most meta-characters lose their special meaning inside bracket
expressions. To include a literal ] place it first in the list.
Similarly, to include a literal ^ place it anywhere but first.
Finally, to include a literal - place it last.
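
To illustrate the LC_ALL remark in the excerpt above, here is a small
sketch. It assumes a POSIX grep and uses grep -E in place of egrep (the
two are equivalent, but newer GNU grep releases warn about the egrep
name). The sample letters are made up for the illustration.

```shell
# Force the C locale so that [a-d] means exactly a, b, c, d,
# whatever locale the calling environment has set.
matched=$(printf '%s\n' a B c D e | LC_ALL=C grep -E '^[a-d]$' | tr '\n' ' ')

# Only the lowercase letters a and c survive the filter.
echo "$matched"
```

Without the LC_ALL=C prefix the result depends on the locale's collating
sequence, which is exactly the portability problem described here.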
Given that, would it make sense to replace the egrep invocation in
texi2dvi with
egrep '^(/|[[:alpha:]]:/)'
which would be valid under any locale? It does not include the
ASCII characters between 'Z' and 'a', which (I was surprised to find
out from Eli's response) could be drive letters, but as Eli also
points out, those are probably never used nowadays.
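
A quick way to try the idea is sketched below. The class name is
written inside a bracket expression, i.e. [[:alpha:]], as the manual
excerpt requires. The helper name is_absolute and the sample paths are
hypothetical, chosen only for this illustration; grep -E stands in for
egrep.

```shell
# Hypothetical helper: succeeds when the argument starts with "/" or
# with a single letter followed by ":/" (a DOS-style drive prefix).
is_absolute() {
  printf '%s\n' "$1" | grep -E -q '^(/|[[:alpha:]]:/)'
}

# '[:/odd' starts with '[', one of the ASCII characters between 'Z'
# and 'a'; a range like [A-z] would accept it, [[:alpha:]] does not.
for p in /usr/share/texmf c:/texmf Z:/tex relative/path '[:/odd'; do
  if is_absolute "$p"; then
    echo "absolute:     $p"
  else
    echo "not absolute: $p"
  fi
done
```

Because [[:alpha:]] is a named class rather than a range, no LC_ALL
setting is needed for the pattern to be accepted.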
Thanks,
Nick
------- End of Forwarded Message