
Re: [O] [OT]: Search for missing :END:


From: Nick Dokos
Subject: Re: [O] [OT]: Search for missing :END:
Date: Mon, 21 Nov 2011 16:38:22 -0500

Markus Heller <address@hidden> wrote:

> Hello all,
> 
> I have an OT request that can hopefully be answered by emacs gurus in
> less than a minute:
> 
> I'm looking for an emacs search expression that finds :PROPERTIES:
> *without* a matching :END: ...
> 

If you mean a regexp, you are wasting your time[fn:1]. Regexps are
powerful, but their range of applicability is limited to regular
languages and even then, you have to worry about their efficiency. The
above *is* a regular language: if P stands for :PROPERTIES: and E stands
for :END:, then the regexp is

    ^([^EP]*P[^EP]*E)*[^EP]*$

In words, the stuff inside the parens says: 0 or more "other" things
(non-P and non-E), followed by a P, followed by 0 or more "other"
things, followed by an E. You can then have 0 or more of the
parenthesized things, followed by 0 or more trailing "other" things;
the anchors make the match cover the whole input. This will succeed on
well-formed "sentences" and fail on others. But it might have to
backtrack over the inner [^EP]*
matches and then the outer matches, and rescan arbitrarily long
stretches, which in the worst case, can turn your search into an
exponentially slow descent into the abyss. You might be able to write
non-greedy regexps that might behave better in this case. In most cases,
you'd end up with a horrendous-looking regexp: good luck trying to
understand it next week. That's my biggest problem with complicated regexps.
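
To see a regexp of this shape at work on the toy alphabet, here is a
minimal sketch. It assumes each input string is a "sentence" in which
the letter P stands for :PROPERTIES: and E for :END:; the function name
well_formed is made up for illustration. The pattern is anchored with ^
and $ and allows trailing "other" characters, so grep has to match the
whole string:

```shell
#!/bin/sh
# Toy check: P stands for :PROPERTIES:, E for :END:, anything else is "other".
# well_formed is a hypothetical helper name, not something from org or grep.
well_formed() {
    # ^...$ forces the whole line to match; the trailing [^EP]* allows
    # "other" characters after the last E.
    printf '%s\n' "$1" | grep -Eq '^([^EP]*P[^EP]*E)*[^EP]*$'
}

for s in 'aPbEc' 'PEPE' 'aPbPcE' 'aEb' 'aPb'; do
    if well_formed "$s"; then
        echo "$s: ok"
    else
        echo "$s: malformed"
    fi
done
```

The first two strings are reported as ok and the last three as
malformed. Note that this only says yes or no; it does not tell you
*where* the problem is, which is what you actually want to know.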

However, a change of tool will simplify the problem enormously. E.g. here's
a simple algorithm that can be used for this kind of problem:  start a
nesting depth at 0 - when you see a P, increment the nesting depth by 1;
when you see an E, decrement it by 1. If the nesting depth ever becomes
something other than 0 or 1, you got a problem - also, if at EOF, the
nesting depth is not 0, you got a problem. Easy variations of this will
check well-formedness even when nesting *is* allowed.

You can easily write such a program in any language you are familiar
with (it does not have to be elisp, although you *can* write it in
elisp - personally, I'd use awk).
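
A minimal sketch of the depth-counting algorithm above, run directly on
the org file so that awk's NR gives real line numbers. The function name
check_drawers and its optional second argument (maximum allowed depth,
default 1) are assumptions for illustration. It also assumes that every
:END: in the file closes a :PROPERTIES: line:

```shell
#!/bin/sh
# check_drawers FILE [MAXDEPTH]  (hypothetical helper, not part of org)
# Prints "LINE: message" at the first imbalance and returns 1;
# prints nothing and returns 0 if the file is balanced.
# Assumes every :END: in FILE closes a :PROPERTIES: line.
check_drawers() {
    awk -v max="${2:-1}" '
    /:PROPERTIES:/ { d++; start[d] = NR
                     if (d > max) { print NR ": :PROPERTIES: at depth " d; bad = 1; exit 1 } }
    /:END:/        { d--
                     if (d < 0) { print NR ": :END: without :PROPERTIES:"; bad = 1; exit 1 } }
    # exit still runs END, so only report here if nothing was reported yet.
    END            { if (!bad && d > 0) { print start[d] ": :PROPERTIES: never closed"; exit 1 } }
    ' "$1"
}
```

Usage would be something like

    check_drawers foo.org        # no nesting allowed
    check_drawers foo.org 3      # allow nesting up to depth 3

The second form is one of the "easy variations" for the case where
nesting is allowed.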

But assuming that you are getting some error from org, you don't know
where the problem is and you are trying to find it, it will be simpler
to just use egrep (i.e. grep -E):

    grep -E -n ':PROPERTIES:|:END:' foo.org

will pick out the relevant lines, so all you have to do is scan the
output by eye and spot any irregularity (consecutive :PROPERTIES: or
consecutive :END: lines). Even if you have hundreds of them, that's
*easy* for humans to do.[fn:2]

Or, if you prefer, you can write trivial validation programs to operate
on the output, e.g.:

        grep -E -n ':PROPERTIES:|:END:' foo.org | tee foo.out | grep PROP | wc -l
        grep END foo.out | wc -l

(the counts had better be the same).

or

        grep -E -n ':PROPERTIES:|:END:' foo.org | ./foo.awk

where foo.awk implements the nesting depth algorithm above - something
like this:

--8<---------------cut here---------------start------------->8---
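#! /bin/bash

# Reads "grep -n" output (LINENO:text) on stdin. -F: makes $1 the line
# number. Prints the offending line number and the depth d.
awk -F: '
BEGIN          { d = 0; bad = 0 }
/:PROPERTIES:/ { d++; last = $1; if (d > 1) { print $1, d; bad = 1; exit 1 } }
/:END:/        { d--; if (d < 0) { print $1, d; bad = 1; exit 1 } }
# exit still runs END, so skip the report if one was already printed.
END            { if (!bad && d != 0) { print last, d; exit 1 } }'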
--8<---------------cut here---------------end--------------->8---


Even on Windoze, you can probably do all this stuff with cygwin.

Nick

> Thanks and Cheers and sorry for the OT ...
> 
> Markus
> 
>

Footnotes:

[fn:1] In the (approximate) words of Jamie Zawinski: "You have a
       problem. You think 'I know, let me use regexps to solve it'. Now
       you have two problems."

[fn:2] Of course, if you have formatted your file perversely or done
other naughty things, this might not work. The point is that although
this is not foolproof, it should deal with the vast majority of
"reasonable" files out there.


