help-gnu-emacs

Re: How to grok a complicated regex?


From: Marcin Borkowski
Subject: Re: How to grok a complicated regex?
Date: Sat, 14 Mar 2015 00:16:50 +0100

On 2015-03-13, at 23:46, Emanuel Berg <embe8573@student.uu.se> wrote:

> Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:
>
>> so I have this monstrosity [note: I know, there are
>> much worse ones, too!]:
>>
>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>>
>> (it's in the org-latex--script-size function in
>> ox-latex.el, if you're curious).
>>
>> I'm not asking “what does this match” – I can read
>> it myself. But it takes considerable effort.
>
> I dare say most people (even programmers) cannot read
> that, so if you can, that's great. As a math

Really?  It's not /that/ difficult.  You only need enough coffee (or
tea, in my case), time and motivation.  You don’t need to be a genius,
or even to have an IQ higher than, say, 90 or so.  It's not really
/difficult/.  Intimidating, yes.  Boring, possibly.  Laborious (and
mechanical), yes.  But not /difficult/.
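Because the decoding is mechanical, a reading of the regex can be checked mechanically too: write the same pattern in `rx` notation and compare the two on sample strings. A minimal sketch; `rx` ships with Emacs, but the `my-` names are made up for this example, and the comments give one reading of the pattern, not Org's documentation of it.

```elisp
(require 'rx)

(defconst my-script-regexp-original
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
  "The regexp as quoted in the thread.")

(defconst my-script-regexp-rx
  (rx string-start
      ;; Optional opening math delimiter: \( or \[, or one or more $.
      (opt (or (seq "\\" (any "([")) (+ "$")))
      ;; Group 1: the shortest run of non-newline characters
      ;; that still lets the rest of the pattern match.
      (group (*? nonl))
      ;; Optional closing math delimiter: \) or \], or one or more $.
      (opt (or (seq "\\" (any "])")) (+ "$")))
      string-end)
  "Hypothetical rx rendering of the same pattern.")

(defun my-script-body (re s)
  "Return group 1 of regexp RE matched against string S, or nil."
  (when (string-match re s)
    (match-string 1 s)))
```

For example, `(my-script-body my-script-regexp-rx "$$y$$")` returns `"y"`, and the original string regexp gives the same answer.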

> professional you are of course aware of the discipline
> called automata theory that deals with such things.

Well, as an analyst working in metric fixed point theory, that's just
it.  I'm /aware/ of automata theory – (almost) nothing more. ;-)

> Perhaps relational algebra might help, too, if the data
> in the sets are strings. But automata theory should
> help even more.
>
> Also, remember you don't have to understand those
> expressions. Often they are set up incrementally. They
> only need to be correct. The computer understands them
> - the programmer only understands the purpose, and the
> latest version. Kind of risky, perhaps not what a math
> person would find appealing, but I've constructed many
> that way so I know that method works.

That reminds me of the von Neumann quote: “In mathematics, you don’t
/understand/ things – you just /get used/ to them.”

>> Are you aware of any tools that might help to
>> understand such regexen?
>
> I have seen tools with which you can construct such
> expressions and they output figures, states,
> transitions, and so on. I wonder how complex an
> expression they can deal with. But if you get the
> basics right, it should be just basic building blocks
> that stick together and from there on the sky is the
> limit.
>
> Instead the problem is, as I see it: will those
> figures, balls and arrows, tagged with preconditions,
> postconditions, everything you can think of, will that
> actually be *clearer*?

As we both point out, I’m not talking about changing the representation,
but about making the existing one (which I agree is not /that/ bad) more
comprehensible.  Font lock, grouping and unescaping backslashes would be
definitely helpful.

OTOH, I can imagine that some kind of diagram might be helpful for
some people.  The point is, in the end you have to read/write these
regexen in their normal form anyway, so why not train yourself to
understand their “default” representation instead of adding the burden
of translating between representations?

> If I were to do it (which, thank God, I am not), my
> answer would be *no*. The only way I could do it would
> instead be the opposite. Train the brain with such
> expressions - exactly as they are - day in, day out,
> until they are second nature.
>
> Example: a C++ OO project with classes and everything.
> Silly inheritance and interfaces. Some people would
> consider those pretty darn difficult to understand.
> But to the seasoned C++ programmer (no exaggeration
> here, a few years of focused training is enough) those
> programs are clear. For those guys, giving up writing
> C++ code and instead using some other representation
> (be it graphical or not) would in one stroke cripple
> their skills.
>
> So no, I think that representation is the best there
> is. To translate it back and forth would not only be

I’m not sure whether it’s the best – but it’s a standard (more or less,
Emacs’ regexen are not really “standard” by today’s, well, standards –
but hardly anything about Emacs is “standard” or “typical”, so who
cares;-)).
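To make the remark about Emacs regexen concrete: in Emacs syntax the grouping, alternation and interval operators are the backslashed forms, while a bare `(`, `|` or `{` is literal. That is where much of the backslash noise in the regex above comes from, and every backslash is doubled again inside a Lisp string. A few illustrative calls, with the expected return values in the comments:

```elisp
;; Bare parentheses, braces and | are literal characters.
(string-match "(a)" "x(a)")      ; 1, literal "(a)"
(string-match "a|b" "a|b")       ; 0, literal "a|b"
(string-match "a{2}" "a{2}")     ; 0, literal braces

;; The backslashed forms are the operators.
(string-match "\\(a\\)" "xa")    ; 1, \( \) is a capture group
(string-match "a\\{2\\}" "baa")  ; 1, \{2\} is an interval: "aa"

;; \` anchors at the start of the string only; ^ also matches after a newline.
(string-match "\\`a" "b\na")     ; nil
(string-match "^a" "b\na")       ; 2
```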

> very difficult to do - and even if possible, which of

I disagree.  I don’t think that such a translator would be a difficult
one to write.
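As a rough sketch of how little such a translator needs, here is a text-only one rather than a typeset diagram: a hypothetical `my-regexp-outline` that works on the regex as the engine sees it (so the Lisp-string backslash doubling is gone) and puts every group opener, `\|` and group closer on its own line, indented by nesting depth. It assumes well-formed input; bracket expressions are copied whole, and other backslash constructs such as `\{2\}` or `\1` are copied through unchanged.

```elisp
(defun my--regexp-outline-break (depth)
  "Start a new line indented to DEPTH in the current buffer.
If the current line holds only indentation, reuse it instead."
  (if (save-excursion
        (beginning-of-line)
        (looking-at-p "[ \t]*$"))
      (delete-region (line-beginning-position) (point))
    (insert "\n"))
  (insert (make-string (* 2 (max depth 0)) ?\s)))

(defun my--regexp-bracket-end (re i)
  "Return the index just past the bracket expression starting at I in RE."
  (let ((j (1+ i))
        (len (length re)))
    ;; A leading ^ negates; a ] right after [ or [^ is a literal member.
    (when (and (< j len) (eq (aref re j) ?^)) (setq j (1+ j)))
    (when (and (< j len) (eq (aref re j) ?\])) (setq j (1+ j)))
    (while (and (< j len) (not (eq (aref re j) ?\])))
      (if (and (eq (aref re j) ?\[)
               (< (1+ j) len)
               (eq (aref re (1+ j)) ?:))
          ;; Skip a character class such as [:space:] as one unit.
          (let ((k (string-match-p ":]" re (+ j 2))))
            (setq j (if k (+ k 2) (1+ j))))
        (setq j (1+ j))))
    (min (1+ j) len)))

(defun my-regexp-outline (re)
  "Return an indented, one-construct-per-line outline of regexp RE."
  (with-temp-buffer
    (let ((i 0)
          (len (length re))
          (depth 0))
      (while (< i len)
        (let ((c (aref re i)))
          (cond
           ((eq c ?\[)
            ;; Copy a whole bracket expression verbatim.
            (let ((end (my--regexp-bracket-end re i)))
              (insert (substring re i end))
              (setq i end)))
           ((and (eq c ?\\) (< (1+ i) len))
            (let ((next (aref re (1+ i))))
              (cond
               ((eq next ?\()
                ;; "\(" or, for shy/explicitly numbered groups, "\(?:" / "\(?N:".
                (let ((end (+ i 2)))
                  (when (and (< end len) (eq (aref re end) ?\?))
                    (let ((colon (string-match-p ":" re end)))
                      (setq end (if colon (1+ colon) len))))
                  (my--regexp-outline-break depth)
                  (insert (substring re i end))
                  (setq depth (1+ depth)
                        i end)
                  (my--regexp-outline-break depth)))
               ((eq next ?|)
                ;; An alternative: back out one level, then re-indent.
                (my--regexp-outline-break (1- depth))
                (insert "\\|")
                (my--regexp-outline-break depth)
                (setq i (+ i 2)))
               ((eq next ?\))
                (setq depth (max 0 (1- depth)))
                (my--regexp-outline-break depth)
                (insert "\\)")
                (setq i (+ i 2)))
               (t
                ;; Any other escape (\` \' \$ \\ \{ ...) is copied as is.
                (insert (substring re i (+ i 2)))
                (setq i (+ i 2))))))
           (t
            (insert c)
            (setq i (1+ i))))))
      (buffer-string))))
```

Applied to the regex from `org-latex--script-size`, it prints `\(?:`, `\\[([]`, `\|`, `\$+` and `\)?` on separate lines, with the group contents indented by two spaces.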

If only I were a student again, with plenty of spare time, I might
take up the challenge and try to write one in TeX, so that some TeX
macro, given an (Emacs) regex, would produce a nicely typeset diagram.

Wow, what a nice project for a bachelor’s thesis.  Wait a minute.
Ohboyohboyohboy.  I have to put this in my faculty’s database of
potential topics.  Poor students... ;-)

(BTW, I did once write a poor man’s parser in pure TeX; since there was
no regex engine written in TeX back then (now there is one!), I had to
craft a simple automaton myself.  Not extremely pleasant work...)

> course it is, because a representation is just one
> representation of I don't know how many possible ones - I
> don't see the end result being any clearer: on the
> contrary, most likely.
>
> What I would do - try to get it more readable by using
> classes, string classes (do they exist?), and even
> more advanced constructs if necessary - as in this
> simple example:
>
>     (defconst stop-char-default "\\([[:punct:]]\\|[[:space:]][[:alnum:]]\\)")
>
> How do you define those? Can you identify any which
> aren't there, but could/should be?
>
> Example: say there is a class called "delimiters"
> which contains [, (, {, <, >, }, ), and ]. Can you
> split that up into "opening-delimiters" and closing
> ditto?
>
> Second, exactly what you mentioned - the font lock issue -
> work on that.
>
> You do know, of course, of
>
>     font-lock-regexp-grouping-construct
>     font-lock-regexp-grouping-backslash
>
> Are there more of those, that you can identify, and
> add?

There could be quite a few.  (As Alexis pointed out, a tool I was
writing about seems to exist – if it’s not satisfactory, I could think
about extending it somehow.  Not very probable, though – I’m too busy
now.  If only someone would pay me for goofing around and playing
with Emacs hacks...)
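Emanuel's suggestion of named classes, with the delimiters split into opening and closing ones, can be sketched with `rx-define`, which needs Emacs 27.1 or later. The names `my-opening-delimiter`, `my-closing-delimiter` and `my-delimited-word` are hypothetical, not part of any existing library.

```elisp
;; Requires Emacs 27.1 or later for `rx-define'.
(require 'rx)

;; Hypothetical named classes for the two halves of "delimiters".
(rx-define my-opening-delimiter (any "([{<"))
(rx-define my-closing-delimiter (any ")]}>"))

;; Named classes compose like built-in ones such as `alnum'.
(defconst my-delimited-word
  (rx my-opening-delimiter (group (+ alnum)) my-closing-delimiter)
  "Match a word between any opening and any closing delimiter.")
```

With this, `(string-match my-delimited-word "see (foo) here")` returns 4 and `(match-string 1 "see (foo) here")` is `"foo"`. The two classes are independent, so mismatched pairs such as `<bar]` match as well; checking that delimiters actually pair up is beyond what a character class can express.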

Thanks for your input, and best regards!

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University


