access to parser stack in SMIE


From: Stephen Leake
Subject: access to parser stack in SMIE
Date: Sat, 06 Oct 2012 00:21:39 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (windows-nt)

I hit a major snag in the new Emacs Ada mode indentation engine, which I
resolved by exposing the stack that SMIE uses in smie-next-sexp. See the
patch below; it was made against smie.el from Emacs 24.2.1. It replaces
the local variable `levels' (which holds the stack) with a global
variable `smie--levels', so that ada-indent-forward-token can see the
stack contents.

The problem I had is that it is not possible to refine the Ada "begin"
keyword using only local information. "begin" is used in two ways: as
the _start_ of a block, and as the _divider_ between declarations and
statements in a block:

function F1 is
  <declarations>
begin -- divider (refined to "begin-body")
  <statements>
  begin -- block start (refined to "begin-open")
    <statements>
  end;
end;

These two uses must be different tokens in SMIE; "begin-open" is an
opener, while "begin-body" is neither opener nor closer.

The only way to figure out which role "begin" is playing is to parse
from the start of the compilation unit. Consider a package body:

package body Pack_1 is

  <declarations>

  function F1 is
    <declarations>
  begin -- divider
    <statements>
    begin -- block start
      <statements>
    end;
    <statements>
    begin -- block start
      <statements>
    end;
  end;

  begin -- divider
    <statements>
  end;

Here I've deliberately got the indentation wrong at the end, to
emphasize the ambiguity.

If we just look back or forward a few keywords from each "begin", we
can't tell which role it is playing. In particular, the package level
"begin" just looks like it follows a bunch of statements/declarations.
We must go all the way back to "package" to sort it out.

Once all the tokens are properly disambiguated, SMIE can traverse
correctly from the package-level "begin" back to "package". But we can't
rely on that traversal while disambiguating "begin"; that would be
circular.

The solution I found is to deliberately call `smie-forward-sexp'
starting at the beginning of the buffer. Then in
ada-indent-forward-token, when a "begin" is encountered, examine the
parser stack. We only have to look at a few tokens on the stack to
determine whether we've got a "divider" use or a "start" use.
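
As a rough illustration of the stack check (not the actual ada-indent
code), the sketch below uses a toy grammar in the same (TOKEN LEFT RIGHT)
shape as `smie-grammar'. The `demo-' names, the token set, the numeric
levels, and the refinement rule itself are all assumptions made for the
example; the real rule has to look at more of the stack.

```elisp
;; Toy grammar: (TOKEN LEFT-LEVEL RIGHT-LEVEL).  Tokens and levels are
;; invented for illustration only.
(defvar demo-grammar
  '(("package-is"  nil 5)
    ("function-is" nil 10)
    ("begin-body"  11  12)
    ("begin-open"  nil 20)
    ("end"         30  nil)))

(defun demo-stack-tokens (stack grammar)
  "Map each level pair on STACK back to a token name found in GRAMMAR.
Entries with no matching token map to nil."
  (mapcar (lambda (levels) (car (rassoc levels grammar))) stack))

(defun demo-refine-begin (stack grammar)
  "Return the refined token for a \"begin\" seen while STACK is pending.
Simplified rule: the \"begin\" is a divider if the innermost pending
token opens a declarative region, and a block start otherwise."
  (let ((top (car (demo-stack-tokens stack grammar))))
    (if (member top '("function-is" "package-is"))
        "begin-body"
      "begin-open")))

;; Innermost pending token is "function-is": this "begin" is a divider.
(demo-refine-begin '((nil 10)) demo-grammar)
;; Innermost pending token is "begin-body": this "begin" starts a block.
(demo-refine-begin '((11 12) (nil 10)) demo-grammar)
```

The stack is a list with the most recently pushed entry first, which is
why only the first few elements need to be inspected.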

Of course, parsing forward from the start of the buffer for each "begin"
is painfully slow (I tried that), so we also need a cache. I store the
refined keyword as a text property on each keyword;
ada-indent-forward-token and ada-indent-backward-token check for that
text property before calling the refine function.

The cache is invalidated when an edit occurs before a keyword that has
cached information. This is handled by storing the maximum valid cache
position in `ada-indent-cache-max'. It is moved forward when a keyword
is refined, and moved backward by `ada-indent-after-change' when the
buffer is changed.
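
The caching scheme can be sketched roughly as follows. This is a minimal
illustration under assumed names (`demo-cache-max', the `demo-refined'
text property, and the three functions are hypothetical), not the
ada-indent implementation.

```elisp
(defvar demo-cache-max 0
  "Positions before this one have trusted cached refinements.")
(make-variable-buffer-local 'demo-cache-max)

(defun demo-cache-get (pos)
  "Return the refined token cached at POS, or nil if absent or stale."
  (and (< pos demo-cache-max)
       (get-text-property pos 'demo-refined)))

(defun demo-cache-put (beg end token)
  "Cache TOKEN as the refinement of the keyword between BEG and END."
  ;; Avoid marking the buffer modified or running change hooks, which
  ;; would otherwise immediately invalidate the entry just stored.
  (with-silent-modifications
    (put-text-property beg end 'demo-refined token))
  (setq demo-cache-max (max demo-cache-max end)))

(defun demo-after-change (beg _end _len)
  "Distrust every cached refinement at or after BEG."
  (setq demo-cache-max (min demo-cache-max beg)))
```

A mode would install the last function with
(add-hook 'after-change-functions #'demo-after-change nil t). Stale
properties are left in place and simply ignored until the keyword is
refined again. A fuller version would also need to handle an edit that
lands inside a cached keyword.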

I use the cache for all keywords, since the mechanism is simple, and it
does provide some speed-up (some of the ada-indent-refine-* functions
are quite lengthy).

The full code for my current version is at
http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0
That includes the patched version of smie.el. At the moment, only the
function ada-indent-refine-begin uses the "parse from the beginning"
approach, but there are other keywords that need it; I plan to factor
that out soon.


I doubt that we really want to make the parser stack directly accessible
to smie clients; a copy would be safer. I implemented it this way
because it was the smallest change that let me test the idea. On the
other hand, there may be times when we'd like to add a token to the
stack, to handle broken source code, for example.
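
One possible shape for the safer, copy-based access is sketched here.
The accessor name `demo-smie-parser-stack' is hypothetical, and the
sketch only assumes that the stack is a list of level lists, as in the
patch below.

```elisp
;; `smie--levels' is introduced by the patch below; it is declared here
;; only so that the sketch loads on its own.
(defvar smie--levels nil)

(defun demo-smie-parser-stack ()
  "Return a deep copy of the current parser stack.
Callers may modify the result without affecting the parser."
  (copy-tree smie--levels))
```

`copy-tree' is used rather than `copy-sequence' so that the individual
level lists are copied too, not just the outer list.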

Also, the name of the global variable could be better; perhaps
`smie-parser-stack'.

It might also make sense to incorporate the refined keyword cache
mechanism into smie.

-- 
-- Stephe

--- smie.el
+++ smie.el
@@ -672,6 +672,9 @@ it should move backward to the beginning
   ;; has to be careful to distinguish those different cases.
   (eq (smie-op-left toklevels) (smie-op-right toklevels)))
 
+(defvar smie--levels nil
+  "Parser token stack lambda-bound in smie-next-sexp; `next-token' can examine stack to help refine.")
+
 (defun smie-next-sexp (next-token next-sexp op-forw op-back halfsexp)
   "Skip over one sexp.
 NEXT-TOKEN is a function of no argument that moves forward by one
@@ -691,7 +694,7 @@ Possible return values:
   (nil POS TOKEN): we skipped over a paren-like pair.
   nil: we skipped over an identifier, matched parentheses, ..."
   (catch 'return
-    (let ((levels
+    (let ((smie--levels
            (if (stringp halfsexp)
                (prog1 (list (cdr (assoc halfsexp smie-grammar)))
                  (setq halfsexp nil)))))
@@ -718,32 +721,32 @@ Possible return values:
               ;; A token like a paren-close.
               (assert (numberp     ; Otherwise, why mention it in smie-grammar.
                        (funcall op-forw toklevels)))
-              (push toklevels levels))
+              (push toklevels smie--levels))
              (t
-              (while (and levels (< (funcall op-back toklevels)
-                                    (funcall op-forw (car levels))))
-                (setq levels (cdr levels)))
+              (while (and smie--levels (< (funcall op-back toklevels)
+                                    (funcall op-forw (car smie--levels))))
+                (setq smie--levels (cdr smie--levels)))
               (cond
-               ((null levels)
+               ((null smie--levels)
                 (if (and halfsexp (numberp (funcall op-forw toklevels)))
-                    (push toklevels levels)
+                    (push toklevels smie--levels)
                   (throw 'return
                          (prog1 (list (or (car toklevels) t) (point) token)
                            (goto-char pos)))))
                (t
-                (let ((lastlevels levels))
-                  (if (and levels (= (funcall op-back toklevels)
-                                     (funcall op-forw (car levels))))
-                      (setq levels (cdr levels)))
+                (let ((lastlevels smie--levels))
+                  (if (and smie--levels (= (funcall op-back toklevels)
+                                     (funcall op-forw (car smie--levels))))
+                      (setq smie--levels (cdr smie--levels)))
                   ;; We may have found a match for the previously pending
                   ;; operator.  Is this the end?
                   (cond
                    ;; Keep looking as long as we haven't matched the
                    ;; topmost operator.
-                   (levels
+                   (smie--levels
                     (cond
                      ((numberp (funcall op-forw toklevels))
-                      (push toklevels levels))
+                      (push toklevels smie--levels))
                      ;; FIXME: For some languages, we can express the grammar
                      ;; OK, but next-sexp doesn't stop where we'd want it to.
                      ;; E.g. in SML, we'd want to stop right in front of
@@ -754,8 +757,8 @@ Possible return values:
                      ;;
                      ;; ((and (functionp (cadr (funcall op-forw toklevels)))
                      ;;       (funcall (cadr (funcall op-forw toklevels))
-                     ;;                levels))
-                     ;;  (setq levels nil))
+                     ;;                smie--levels))
+                     ;;  (setq smie--levels nil))
                      ))
                    ;; We matched the topmost operator.  If the new operator
                    ;; is the last in the corresponding BNF rule, we're done.
@@ -766,7 +769,7 @@ Possible return values:
                    ;; and is not associative, it's one of the inner operators
                    ;; (like the "in" in "let .. in .. end"), so keep looking.
                    ((not (smie--associative-p toklevels))
-                    (push toklevels levels))
+                    (push toklevels smie--levels))
                    ;; The new operator is associative.  Two cases:
                    ;; - it's really just an associative operator (like + or ;)
                    ;;   in which case we should have stopped right before.
@@ -778,8 +781,8 @@ Possible return values:
                    ;; - it's an associative operator within a larger construct
                    ;;   (e.g. an "elsif"), so we should just ignore it and keep
                    ;;   looking for the closing element.
-                   (t (setq levels lastlevels))))))))
-            levels)
+                   (t (setq smie--levels lastlevels))))))))
+            smie--levels)
         (setq halfsexp nil)))))


