emacs-devel

Re: access to parser stack in SMIE


From: Stephen Leake
Subject: Re: access to parser stack in SMIE
Date: Sun, 07 Oct 2012 19:18:07 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (windows-nt)

Stefan Monnier <address@hidden> writes:

>>> Actually, in some cases, it can be made to work: to disambiguate "begin"
>>> setup a loop that calls smie-backward-sexp repeatedly (starting from the
>>> position just before the "begin", of course) checking after each call
>>> whether the result lets us decide which of the two begins we're
>>> dealing with.
>> In the Ada use case I presented (package ... begin), that ends up going
>> all the way back to package at the beginning of the buffer, encountering
>> more "begin" tokens on the way; it's much more efficient to start there
>> in the first place.
>
> In your example, we might indeed end up scanning the buffer 3 times
> (once for the normal scan, once to disambiguate package's `begin', and
> one more time (in various chunks) to disambiguate the `begin's of the
> nested functions).

Yes. Which is why the cache is critical. And improving the cache by
storing the stack at each keyword would be even better.
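
To illustrate the idea of storing the stack at each keyword, here is a rough sketch; it is not SMIE's actual cache. It assumes the parser stack is an ordinary list and that keyword positions are integers. The names `my-stack-cache`, `my-stack-cache-store` and `my-stack-cache-nearest` are hypothetical, and invalidating entries after a buffer edit is left out.

```elisp
;; Hypothetical cache: buffer position of a keyword -> copy of the
;; parser stack as it was when that keyword was reached.
(defvar my-stack-cache (make-hash-table :test 'eql))

(defun my-stack-cache-store (pos stack)
  "Remember a copy of STACK for the keyword at POS."
  (puthash pos (copy-sequence stack) my-stack-cache))

(defun my-stack-cache-nearest (pos)
  "Return (CACHED-POS . STACK) for the last cached keyword at or before POS.
Return nil if nothing is cached at or before POS."
  (let ((best nil))
    (maphash (lambda (cached-pos stack)
               (when (and (<= cached-pos pos)
                          (or (null best) (> cached-pos (car best))))
                 (setq best (cons cached-pos stack))))
             my-stack-cache)
    best))
```

A forward parse resumed from the returned position would then only have to cover the text between that keyword and point.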

> But I wonder now: can a "begin" that comes right after a "function
> ... end;" be a begin-open?  

package Package_1 is 

    function Function_1 return Integer
    is begin
       return 1;
    end;

begin

That isn't "begin-open". But how can you be sure that's the case you
have? You can't just stop at the "end"; consider:

package Package_1 is 

    function Function_1 return Integer
    is begin
       begin
         ...
       end;

       begin -- begin-open
         ...
       end;

       return 1;
    end;

begin -- begin-body
    
If you scan backward, and get to "is" (refined to "is-subprogram_body"),
you can stop; the flavor of "is" gives you the flavor of "begin". But
that means crossing an un-refined "begin", possibly many. That's the
problem. You could recurse; if you hit a "begin", start another scan
backwards, and eventually you'll hit "is" and unwind. Maybe that would
work. But it's messier and less general than the forward parse
mechanism, and the other tokens that need forward parse would need their
own variations. Too much ad-hoc code.

>> Switching to another parsing technology just for the forward full-parse
>> means supporting a second grammar; that's a lot of work. It might be
>> simpler to switch to only forward full-parse (which I think is what you
>> are suggesting).
>
> Yes, if you end up doing a full forward parse in most cases anyway,
> there's little point doing extra work to support backward parsing.

At the moment, it's the other way around; there are only a few cases
that need full forward parse, and using smie for that works well enough.
So it's the LALR parser that's extra work.

>> If I switch to only using forward full-parse, I'd have to do things
>> differently in the indentation rules as well. The key information they
>> need is the location of the statement start for every indentation
>> point. So the forward parse could store the relevant statement start
>> (and maybe other stuff) with every token.
>
> Indeed.

I've gotten a trivial semantic grammar implemented, but it's not
returning any tags. It's frustrating (as were my first few days with
SMIE, come to think of it :).

>> Hmm.  The actual change to smie I'm asking for is very small (if we
>> leave out the cache); perhaps you could put that in, with a large "use
>> at your own risk" comment?
>
> Indeed, just exposing the stack is not that bad, and lets you solve your
> problem.  But it's kind of ugly.  Maybe I could instead provide
> a function that lets you query the particular part of the stack that
> you're interested in (that would make it easier to adapt to a new
> format of the stack, for example).
>
> Could you describe which part of the stack you need to know?

The top few tokens, which might be most of the stack. Here's the code
that examines the stack:

(defconst ada-indent-pre-begin-tokens
  '("declare-open"
    "declare-label"
    "is-subprogram_body"
    "is-package"
    "is-task_body"
    "is-entry_body"))

        ;; Parsing from beginning of buffer; examine stack
        (let ((stack smie--levels)
              stack-token
              (token nil))
          ;; Pop stack entries until one decides the flavor of "begin".
          (while (null token)
            ;; Look up the token whose grammar entry matches this stack entry.
            (setq stack-token (nth 0 (rassoc (pop stack) ada-indent-grammar)))
            (cond
             ;; ";" from an intervening declaration: keep looking.
             ((equal stack-token ";") nil)
             ((member stack-token ada-indent-pre-begin-tokens)
              (setq token "begin-body"))
             (t
              (setq token "begin-open"))))

          (if (>= (point) ada-indent-refine-forward-to)
              (throw 'ada-indent-refine-all-quit token)
            (throw 'local-quit token)))

In this case:

    function Function_1 return Integer
    is 
       Local_1 : integer;
       Local_2 : integer;
    begin

we see ";" from the local variables, then "is-subprogram_body", and we
stop.

For a package-level "begin", we see a ";" for each intervening
declaration, then "is-package", and stop. 

I guess you could provide a function that scans the stack, skipping ";",
and returning the next token. That would work in this particular case,
but I'm not sure about the other cases I need.
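
A minimal sketch of such a function, assuming the stack has already been mapped to a list of token strings (top of stack first), as the `rassoc' lookup above does one entry at a time. The name `my-stack-next-token' and its SKIP argument are hypothetical, not part of SMIE:

```elisp
(defun my-stack-next-token (tokens &optional skip)
  "Return the first element of TOKENS that is not in SKIP.
TOKENS is a list of token strings, top of stack first.
SKIP defaults to (\";\").  Return nil if every token is skipped."
  (let ((skip (or skip '(";")))
        (result nil))
    (while (and tokens (null result))
      (let ((tok (car tokens)))
        (unless (member tok skip)
          (setq result tok)))
      (setq tokens (cdr tokens)))
    result))

;; The local-variable case above: two ";" and then "is-subprogram_body".
(my-stack-next-token '(";" ";" "is-subprogram_body"))
;; => "is-subprogram_body"
```

The caller would then test the result against ada-indent-pre-begin-tokens to choose between "begin-body" and "begin-open".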

-- 
-- Stephe
