help-gnu-emacs

Regular expressions for Unicode general categories


From: Derick Eddington
Subject: Regular expressions for Unicode general categories
Date: Sun, 07 Dec 2008 12:47:13 -0800

Hello,

I am making an Emacs regular expression for matching R6RS Scheme
"identifiers" (part of the syntax highlighting of a major mode I'm
making), and it needs to match characters based on their Unicode
general categories.  Emacs regular expressions do not seem to provide
a way to do that directly.  (I'm using Emacs 23.0.60.1, and I couldn't
find anything about this in the Info docs, on emacswiki.org, or in
this list's archives.)  So I computed regular expression character
sets for the needed general categories (using
`get-char-code-property') and placed them at the appropriate
positions in the larger regular expression.
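To make the "computed character sets" step concrete, here is a minimal
sketch of the idea in Python rather than Emacs Lisp.  It assumes
Python's `unicodedata.category` as a stand-in for
`get-char-code-property', and the function name `category_ranges` is
hypothetical.  It collapses every code point in the requested general
categories into inclusive ranges, which is roughly the shape a
regexp character set would take.

```python
import unicodedata

def category_ranges(categories, start=0, stop=0x110000):
    """Return inclusive (lo, hi) code point ranges in [start, stop)
    whose Unicode general category is in `categories`."""
    ranges = []
    lo = None  # start of the run currently being collected, if any
    for cp in range(start, stop):
        if unicodedata.category(chr(cp)) in categories:
            if lo is None:
                lo = cp
        elif lo is not None:
            ranges.append((lo, cp - 1))
            lo = None
    if lo is not None:  # close a run that reaches the end of the scan
        ranges.append((lo, stop - 1))
    return ranges

# Number of separate ranges needed for just two letter categories
# over the whole code space (depends on the Unicode version in use).
letter_range_count = len(category_ranges({"Lu", "Ll"}))
```

Counting the ranges per category this way gives a rough sense of how
large the resulting character sets become once several categories are
combined.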

My problem is I can't use it because I get this error: 
  Error during redisplay: (invalid-regexp Regular expression too big) 
which is understandable, because the general category character sets
are huge and several of them are used.  I suspect they would have
been too inefficient anyway.

So, what can I do?  If Emacs regular expressions' backslash construct
`\cC' supported Unicode general categories, or if there were some
other construct which did, I think that would do it nicely.  Is that
planned, or should I resort to doing more manual parsing, or something
else?
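For comparison, the "manual parsing" option could look something like
the following sketch, again in Python with `unicodedata` standing in
for `get-char-code-property'.  It tests each character's general
category as it scans instead of building giant character sets.  The
category lists reflect my reading of the R6RS identifier grammar and
should be checked against the report; inline hex escapes and the
peculiar identifiers (`+', `-', `...', `->...') are deliberately left
out, and all names here are hypothetical.

```python
import unicodedata

# ASCII characters R6RS allows in addition to letters (assumed lists).
SPECIAL_INITIAL = set("!$%&*/:<=>?^_~")
SPECIAL_SUBSEQUENT = set("+-.@")

# General categories allowed for non-ASCII characters (assumed lists).
CONSTITUENT_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Nl", "No",
                          "Pd", "Pc", "Po", "Sc", "Sm", "Sk", "So", "Co"}
SUBSEQUENT_CATEGORIES = {"Nd", "Mc", "Me"}

def is_initial(ch):
    """True if `ch` may start an identifier."""
    if ord(ch) < 128:
        return ch.isalpha() or ch in SPECIAL_INITIAL
    return unicodedata.category(ch) in CONSTITUENT_CATEGORIES

def is_subsequent(ch):
    """True if `ch` may continue an identifier."""
    return (is_initial(ch)
            or ch in SPECIAL_SUBSEQUENT
            or unicodedata.category(ch) in SUBSEQUENT_CATEGORIES)

def scan_identifier(text, pos=0):
    """Return the end index of the identifier starting at `pos`,
    or None if no identifier starts there."""
    if pos >= len(text) or not is_initial(text[pos]):
        return None
    end = pos + 1
    while end < len(text) and is_subsequent(text[end]):
        end += 1
    return end
```

A scanner of this shape does one category lookup per character, so
its cost grows with the length of the text being scanned rather than
with the size of the category tables.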

JTMI, the reason identifiers need to be recognized using their
complete lexical specification is that I'm also highlighting numbers,
whose lexical syntax overlaps with that of identifiers.  Identifiers
therefore need to be fontified first so that they're not partially
fontified as numbers.
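The ordering issue can be illustrated with a small sketch, assuming
heavily simplified ASCII-only patterns (the real R6RS grammars for
identifiers and numbers are much larger, and a real matcher would
also check token delimiters).  Identifiers claim their text first,
and a number is highlighted only where nothing has been claimed yet,
similar to how font-lock keyword entries without an override flag
leave already-fontified text alone.  The names `IDENT`, `NUMBER` and
`highlight_spans` are hypothetical.

```python
import re

# Simplified, ASCII-only stand-ins for the real lexical syntax.
IDENT = re.compile(r"[A-Za-z!$%&*/:<=>?^_~][A-Za-z0-9!$%&*/:<=>?^_~+\-.@]*")
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")

def highlight_spans(text):
    """Return sorted (start, end, kind) spans, identifiers taking priority."""
    spans = []
    claimed = [False] * len(text)
    # Patterns are tried in priority order: identifiers before numbers.
    for kind, pattern in (("identifier", IDENT), ("number", NUMBER)):
        for m in pattern.finditer(text):
            # Skip any match that overlaps text an earlier pattern claimed.
            if not any(claimed[m.start():m.end()]):
                spans.append((m.start(), m.end(), kind))
                for i in range(m.start(), m.end()):
                    claimed[i] = True
    return sorted(spans)

spans = highlight_spans("(+ x10 42)")
```

With the order reversed, the `10' inside `x10' would be claimed as a
number before the identifier pattern ever saw it.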

Thank you for any help,

-- 
: Derick