emacs-devel

Re: SMIE


From: Stefan Monnier
Subject: Re: SMIE
Date: Thu, 10 Jul 2014 09:32:15 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

> In general I’m having a hard time connecting the dots between the BNF
> grammar table creation, the smie-rules (i.e. :before, :after, etc.),
> tokenization, indentation, and so forth, and how it all comes together
> to make this indentation machine work.

The tokenizer turns the text (sequence of chars) into a sequence of
"tokens", which should be thought of as "words".  This step gets rid of
things like comments and (syntactically insignificant) whitespace.
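
As an illustration, here is a minimal sketch of a tokenizer pair for a
language where ":=" should come out as one token.  The names
`my-smie-forward-token' and `my-smie-backward-token' are hypothetical.
They skip whitespace and comments with `forward-comment' and otherwise
fall back on SMIE's `smie-default-forward-token' and
`smie-default-backward-token', which group characters by syntax class.
Whether those defaults alone would return ":=" whole depends on the
mode's syntax table.

```elisp
(require 'smie)

(defun my-smie-forward-token ()
  "Hypothetical tokenizer: return the next token, treating \":=\" as one token."
  ;; Skip whitespace and comments before reading the token.
  (forward-comment (point-max))
  (if (looking-at ":=")
      (progn (goto-char (match-end 0)) ":=")
    ;; Anything else: use SMIE's syntax-class based default.
    (smie-default-forward-token)))

(defun my-smie-backward-token ()
  "Hypothetical tokenizer: return the previous token, treating \":=\" as one token."
  (forward-comment (- (point)))
  (if (looking-back ":=" (max (point-min) (- (point) 2)))
      (progn (goto-char (match-beginning 0)) ":=")
    (smie-default-backward-token)))
```

A mode would hand these to `smie-setup' through its :forward-token and
:backward-token keywords.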

The BNF then describes which of those tokens are special (people usually
call them "keywords" if they look like human words and "operators" if
they look like math entities) and how they relate to each other to
provide structure.

These two together are sufficient to get C-M-f and C-M-b to do something
useful such as skip from "begin" to the matching "end" and vice-versa.
C-M-f/C-M-b also do something useful when called right before/after an
"infix" keyword: they skip over the corresponding right/left element.
E.g. C-M-b starting from right after "then" should jump to the matching
"if" (it may jump to either just before it or just after it, depending
on the particular way the BNF is specified), and C-M-f from right before
"+" should jump over the whole expression that's being added.

It's generally very useful to debug the BNF and tokenizer using C-M-f
and C-M-b: if these don't jump to the right place, then the indentation
rules won't work well anyway.  Note that C-M-f and C-M-b may
sometimes stop before and sometimes after the "destination keyword",
depending on details of the BNF grammar, but usually either choice is
fine in the sense that it's "close enough" that the indentation rules
can then be made to work.
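
For that kind of debugging a throwaway buffer is often enough.  The
sketch below assumes the BEGIN/END grammar used as an example further
down (repeated here so the snippet stands alone), SMIE's default
tokenizer, and `ignore' as the rules function, so no indentation rule
is involved.  The helper name `my-smie-check-forward-sexp' is
hypothetical.  It reports where `forward-sexp' (what C-M-f runs once
`smie-setup' has installed SMIE's `forward-sexp-function') stops when
started right before the first token.

```elisp
(require 'smie)

(defun my-smie-check-forward-sexp (text)
  "Hypothetical helper: return where C-M-f stops from the start of TEXT."
  (with-temp-buffer
    (insert text)
    ;; Build the precedence table from a small BNF and install it,
    ;; with a rules function that never returns a rule.
    (smie-setup (smie-prec2->grammar
                 (smie-bnf->prec2 '((id)
                                    (exp ("BEGIN" exp "END")
                                         (id ":=" exp)))))
                #'ignore)
    (goto-char (point-min))
    (forward-sexp 1)                    ; the command bound to C-M-f
    (point)))
```

For "BEGIN a b c END" it should stop right after the END, and with
nested pairs such as "BEGIN a BEGIN b END c END x" right after the
outer END.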

The indentation algorithm then works by looking at the immediately
surrounding tokens (the one before and the one after), asking the
rules-function what the indentation rule is for those tokens, and, when
needed, jumping to the "parent" with (more or less) C-M-b.
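
A minimal sketch of such a rules function, written in the
`(pcase (cons kind token) ...)' style SMIE rules functions commonly
use.  The names `my-indent-basic' and `my-smie-rules', and the rules
themselves, are hypothetical choices for the BEGIN/END grammar used as
an example below.  `smie-rule-hanging-p' and `smie-rule-parent' are
SMIE helpers that are only meaningful while SMIE is evaluating a rule.

```elisp
(require 'smie)

(defvar my-indent-basic 4
  "Hypothetical basic indentation step for this sketch.")

(defun my-smie-rules (kind token)
  "Hypothetical rules function for the BEGIN/END grammar.
KIND is the kind of query (:elem, :before, :after, ...) and TOKEN is
the token (or, for :elem, the element) it is about."
  (pcase (cons kind token)
    ;; Default step for elements nested inside a construct.
    (`(:elem . basic) my-indent-basic)
    ;; Right-hand side of an assignment continued on the next line.
    (`(:after . ":=") my-indent-basic)
    ;; A "hanging" BEGIN (last on its line but not first) is indented
    ;; relative to its parent token.
    (`(:before . "BEGIN")
     (if (smie-rule-hanging-p) (smie-rule-parent)))))
```

A mode would pass `my-smie-rules' as the RULES-FUNCTION argument of
`smie-setup'.  Returning nil means "no specific rule here" and leaves
the decision to SMIE's defaults.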

> there’s that. I’m slowly starting to pare things away. The bit you
> wrote about :list-intro is interesting. When you say that it sees two
> or more concatenated expressions, how does that tie in to the BNF
> grammar definitions?

When you have "exp1 exp2 exp3", there is no keyword/operator involved
(tho keyword/operators may be involved inside exp[123], of course), so
the BNF definition doesn't say anything about it.

When the BNF says ("BEGIN" exp "END"), SMIE interprets it as "we can
have any number of `exp' or any other non-special token between BEGIN
and END".  And that is true of all BNF rules in SMIE (SMIE always
accepts any number of non-special tokens anywhere between the special
tokens, and any repetition).

IOW it doesn't tie in to the BNF grammar definition at all, because this
is imposed implicitly for all grammars by SMIE's underlying algorithm.
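
Where this shows up in the rules function is the :list-intro query.
SMIE calls the rules function with :list-intro and a token, and a
non-nil answer says that the token is followed by a list of expressions
not separated by any token, rather than by a single expression.  A
minimal sketch for the BEGIN/END grammar below.  The function name is
hypothetical, and a real rules function would combine this clause with
its other rules.

```elisp
(defun my-smie-list-intro-rules (kind token)
  "Hypothetical rules function that only answers :list-intro queries."
  (pcase (cons kind token)
    ;; In "BEGIN a b c END", BEGIN is followed by juxtaposed expressions.
    (`(:list-intro . "BEGIN") t)))
```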

E.g. the grammar

   (defvar my-smie-bnf '((id)
                         (exp ("BEGIN" exp "END")
                              (id ":=" exp))))

will happily consider "a b c := BEGIN d e END" as a valid `exp'.
Actually, it will also accept "BEGIN a b c END := BEGIN d e END",
because as soon as you have such a "parenthesis-like" element, SMIE will
also accept it anywhere (it is a really dumb parsing algorithm which
pays very little attention to the context).

> One final question. In the case of e.g.
> > Can't resolve the precedence cycle: .do < else:. < .do
> What does the placement of the dots (left of "do", right of "else:") mean?

SMIE reduces the BNF grammar into a simple table of precedences.
Each token gets assigned 2 precedences: a left-precedence and
a right-precedence.  So the "." indicates which precedence is affected:
the above says that there's a cycle between the left precedence of "do"
and the right precedence of "else:".
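
To look at those two precedences directly, you can build the table
from a BNF and look a token up.  The following minimal sketch reuses the
BEGIN/END grammar from above.  The names `my-smie-grammar' and
`my-smie-precedences' are hypothetical, and the sketch relies on each
table entry having the form (TOKEN LEFT-LEVEL RIGHT-LEVEL), as
documented for the variable `smie-grammar'.  A number means the token
binds on that side.  An opener like BEGIN has a non-number left level,
and a closer like END has a non-number right level.

```elisp
(require 'smie)

(defvar my-smie-grammar
  (smie-prec2->grammar
   (smie-bnf->prec2 '((id)
                      (exp ("BEGIN" exp "END")
                           (id ":=" exp)))))
  "Hypothetical precedence table built from a small BNF.")

(defun my-smie-precedences (token)
  "Return (LEFT . RIGHT), the two precedences of TOKEN in `my-smie-grammar'.
A non-number (a list or nil) means TOKEN does not bind on that side."
  (let ((entry (assoc token my-smie-grammar)))
    (cons (nth 1 entry) (nth 2 entry))))
```

In the notation of the error message, ".:=" would name the car of
(my-smie-precedences ":=") and ":=." its cdr.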

A good way to think about those < and > is as "parentheses":

The rule "else:. < .do" means that SMIE thinks (because of some part of
the BNF grammar) that if the code looks like:

   ... else: .. .. do ...

it should be parsed as

   ... else: ..(.. do ...

rather than as

   ... else: ..).. do ...

And the rule ".do < else:." (or better said "else:. > .do") says just
the reverse:

   ... else: .. .. do ...

it should be parsed as

   ... else: ..).. do ...

rather than as

   ... else: ..(.. do ...

So, SMIE doesn't know which is true.  Sadly, the code I wrote does not
keep track of where those inequalities come from, so it's unable to
point you to the particular part of the BNF rules which made it think
that "else:. < .do" or that ".do < else:.".
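
The reason a cycle is fatal is that SMIE has to give each of these left
and right precedences a number that satisfies every relation it derived
from the BNF.  ".do < else:. < .do" would require a number smaller than
itself.  The following minimal sketch detects such a cycle with a
depth-first search, using the dotted notation of the error message.
This is not SMIE's own code, and the function names are hypothetical.

```elisp
(require 'cl-lib)

(defun my-prec--visit (node constraints state)
  "Depth-first search from NODE; non-nil if it reaches a cycle.
STATE maps each node to nil (unseen), `visiting' or `done'."
  (pcase (gethash node state)
    ('visiting t)                       ; back on the current path: cycle
    ('done nil)                         ; already fully explored
    (_
     (puthash node 'visiting state)
     (prog1 (cl-some (lambda (c)
                       (and (equal (car c) node)
                            (my-prec--visit (cdr c) constraints state)))
                     constraints)
       (puthash node 'done state)))))

(defun my-prec-cycle-p (constraints)
  "Hypothetical check: non-nil if CONSTRAINTS contain a cycle.
CONSTRAINTS is a list of (A . B) pairs meaning precedence A < precedence
B, where names like \".do\" stand for the left precedence of \"do\" and
\"else:.\" for the right precedence of \"else:\"."
  (let ((state (make-hash-table :test #'equal)))
    (cl-some (lambda (c) (my-prec--visit (car c) constraints state))
             constraints)))
```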

> As regarding inclusion in GNU ELPA, I'm just a caretaker for the
> project on behalf of the Elixir-lang people, but as it's already in
> MELPA I'm sure it's fine.

Inclusion into GNU ELPA is slightly different in that it's a half-way
inclusion in Emacs: on the technical side, it means that there's a Git
branch of the code kept alongside Emacs's repository (and to which
Emacs maintainers can write) from which the GNU ELPA package gets built,
and on the legal side it means that the code has to have the same
copyright status as Emacs, which concretely means that all non-trivial
contributors need to have signed some copyright paperwork.


        Stefan


