emacs-devel

Re: syntax-propertize-function vs indentation lexer


From: Stefan Monnier
Subject: Re: syntax-propertize-function vs indentation lexer
Date: Fri, 31 May 2013 09:23:37 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux)

>> Sounds expensive.  How does it cope with large buffers?

> Not clear yet - I'm still getting the Ada grammar right. 

> The parser is actually generalized LALR, which spawns parallel parsers
> for grammar conflicts and ambiguities. So it can be very slow when the
> grammar has too many conflicts or is ambiguous - running 64 parsers in
> parallel is a lot slower than running 1 :). But it works well when the
> conflict can be resolved in a few tokens, and is much easier than
> reconstructing the grammar to eliminate the conflict.

Aha!  So the parser is a separate executable written in some other
language than Elisp, I guess?  In that case parsing speed should not be
a serious concern (except for the possible explosion of parallel parsers).

That said, using it for indentation does make speed a real concern:
in many cases one does "edit+reindent", so if you put a "full reparse"
between the two, the reparse needs to be close to instantaneous.

> Such a file would never be accepted in any project I have worked on, in
> any source language, unless it was generated from some other source.

I agree that 1MB is very unusual, but emacs/src/xdisp.c is pretty damn
close to 1MB.  And I've seen files of several hundred KB several times.

So I agree that "fast enough at 1MB" implies "fast enough", but if speed
is a problem for 1MB, then it might also be a problem for a real file on
a real machine (maybe also because that machine is slower than yours).

> Of course, it should be possible to open such a file in any case, so
> perhaps I'll need an explicit limit to disable parsing on large files.

Making sure C-g can be used should be sufficient.

> But any discussion of parser speed is premature at this point.

Fair enough.

>>> So I need to call
>>> (syntax-propertize (point-max))
>>> in ada-mode
>> 
>> I wouldn't put it in ada-mode, no.  Instead, I'd put it closer to the
>> code that actually needs those properties to be applied. E.g. I'd
>> either put it in the LALR parser code (if that code needs the syntax
>> properties) or in the indentation code.
> There may be other code, completely independent of the parser, that
> relies on syntax; imenu, for example.

Then imenu should call syntax-propertize as well.

> I'm also using the cached data for navigation (moving from 'if' to
> 'then' to 'elsif' to 'end if' etc); that is logically independent of
> indentation (but not of the parser, of course).

Then navigation should also call syntax-propertize (indeed smie's sexp
navigation also calls syntax-propertize for the same reason).
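
To illustrate the point about navigation, here is a minimal sketch of a
backward-motion command that applies the syntax properties itself before
using the low-level syntax functions.  The name my-backward-token-start is
hypothetical (it is not part of ada-mode, wisi or SMIE).  For backward
motion, all the text that can be scanned lies before point, so point is a
safe END argument; the call returns immediately if the properties have
already been applied that far.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-backward-token-start ()
  "Skip comments and whitespace backward, then move back over one sexp.
Return the new value of point."
  (interactive)
  ;; Everything scanned below lies before point, so propertizing up to
  ;; point is enough for the syntax-table properties to be in place.
  (syntax-propertize (point))
  ;; Standard idiom: a large negative count skips all preceding
  ;; whitespace and comments.
  (forward-comment (- (point-max)))
  (backward-sexp 1)
  (point))
```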

> Since the parser is asynchronous from the indentation, it would have to
> go in the parser (actually lexer) code. wisi-forward-token would be a
> logical place. But what would be the right guess for 'end'? The first
> step in wisi-forward-token is forward-comment, which can skip quite large
> portions of the buffer. 

I have the same problem in SMIE navigation, indeed.  For backward
navigation, that's not a problem, but for forward navigation, I don't
have a good answer.  Luckily, SMIE mostly cares about backward
navigation since that's what's needed for indentation, but currently
forward navigation can bump into parse bugs because syntax-propertize
has not been called on the text being considered.

> How does syntax.el take care of this? The only function on
> after-change-functions by default is jit-lock-after-change. And that's
> only there if font-lock is on.

syntax.el adds its cache-flushing function (syntax-ppss-flush-cache) to
before-change-functions, not to after-change-functions.

> Syntax properties are closely tied to the text (they are an extension of
> the syntax table), and used by several independent functions, and thus
> should be kept consistent with the text as much as possible. So
> syntax-propertize should be run whenever the text changes.

That's doing the work eagerly, whereas syntax-propertize is designed to
do the work lazily.  In your case, since you very often need to look at
syntax properties and often need them to be correct up to point-max,
laziness probably doesn't buy you much.  But with the eager approach you
may suffer from performance problems.
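
The lazy scheme can be sketched as follows, assuming a mode that keeps its
own parse cache.  All the my-cache-* names are hypothetical, and the
"reparse" is only a placeholder: the change hook merely records how far the
cache is still valid, and the real work happens when some caller asks for
data up to a given position.

```elisp
;; Hypothetical names throughout; a sketch of lazy invalidation only.
(defvar-local my-cache-valid-end 1
  "Position before which the cached parse data is still valid.")

(defun my-cache-flush (beg &rest _ignored)
  "Invalidate cached data after BEG.  Cheap: nothing is reparsed here."
  (setq my-cache-valid-end (min my-cache-valid-end beg)))

(defun my-cache-ensure (pos)
  "Lazily bring the cache up to date through POS; return the valid end."
  (when (< my-cache-valid-end pos)
    ;; Apply the syntax properties only as far as they are needed.
    (syntax-propertize pos)
    ;; A real mode would run its lexer/parser from `my-cache-valid-end'
    ;; to POS here; this sketch only records the new boundary.
    (setq my-cache-valid-end pos))
  my-cache-valid-end)

(defun my-cache-setup ()
  "Install the flush function buffer-locally."
  (add-hook 'before-change-functions #'my-cache-flush nil t))
```

Indentation or navigation code would then call (my-cache-ensure (point))
just before consulting the cache, rather than reparsing on every change.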

> Another design choice would be to have all the low-level functions that
> rely on syntax (forward-comment, forward-word, etc) call
> syntax-propertize. That would certainly be more transparent, and is
> consistent with what you are advocating.

Right, that's the direction I'm headed.

> But that runs into the 'reasonable guess for end' problem; I think the
> language mode is the best place to resolve that problem. A language
> hook to provide the guess would be reasonable, but that hook could be
> expensive (since it reduces parser time, which is even more
> expensive), and thus should not be called more often than necessary
> (certainly not for every call of forward-comment).

I think a better option for forward navigation is to parse "up to 1KB
ahead" and constantly check whether we reached that parse limit, in
which case we push it 1KB further.
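
One way this could look, as a rough sketch rather than a worked-out
design: propertize a chunk ahead, scan, and if the scan ran past the
propertized text, extend the limit and redo the scan from the starting
position.  The names my-propertize-chunk and my-forward-comment-chunked
are hypothetical, and restarting from the start is a simplification that
trades some rescanning for correctness.

```elisp
;; Hypothetical names; the chunk size is the "1KB" figure from above.
(defconst my-propertize-chunk 1024
  "Number of characters to propertize ahead of the scan.")

(defun my-forward-comment-chunked ()
  "Skip forward over whitespace and comments; return the new point.
Syntax properties are applied one chunk at a time ahead of the scan."
  (let ((start (point))
        (limit (point))
        (done nil))
    (while (not done)
      ;; Push the limit one chunk further and apply properties up to it.
      (setq limit (min (point-max) (+ limit my-propertize-chunk)))
      (syntax-propertize limit)
      ;; Redo the scan from the start, now with more text propertized.
      (goto-char start)
      (forward-comment (point-max))
      (if (or (< (point) limit) (= limit (point-max)))
          ;; The scan stayed within propertized text: the result is safe.
          (setq done t)
        ;; The scan reached unpropertized text: cover it and retry.
        (setq limit (max limit (point)))))
    (point)))
```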

> I think you are actually advocating for a third choice; any code that
> depends on low-level syntax functions must be aware of
> syntax-propertize, and call it appropriately. That makes sense.

Right.

> It would help if the doc string for parse-partial-sexp mentioned
> syntax-propertize and syntax-ppss-flush-cache; then I would have been
> aware of this issue sooner.

I consider parse-partial-sexp to be mostly an internal detail of
syntax-ppss nowadays.
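
For example, a typical query goes through syntax-ppss and never calls
parse-partial-sexp directly.  The helper name my-in-comment-p below is
hypothetical; the sketch only shows the calling convention.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-in-comment-p (&optional pos)
  "Return non-nil if POS (default: point) is inside a comment."
  ;; `syntax-ppss' may move point when POS is given, hence the
  ;; `save-excursion'.  Element 4 of the returned state is non-nil
  ;; inside a comment.
  (save-excursion
    (nth 4 (syntax-ppss pos))))
```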


        Stefan


