qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v3 11/14] qapi: add qapi2texi script
Date: Thu, 10 Nov 2016 17:37:52 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Marc-André Lureau <address@hidden> writes:

> As the name suggests, the qapi2texi script converts JSON QAPI
> description into a texi file suitable for different target
> formats (info/man/txt/pdf/html...).
>
> It parses the following kind of blocks:
>
> Free-form:
>
>   ##
>   # = Section
>   # == Subsection
>   #
>   # Some text foo with *emphasis*
>   # 1. with a list
>   # 2. like that
>   #
>   # And some code:
>   # | $ echo foo
>   # | -> do this
>   # | <- get that
>   #
>   ##
>
> Symbol:
>
>   ##
>   # @symbol:
>   #
>   # Symbol body ditto ergo sum. Foo bar
>   # baz ding.
>   #
>   # @arg: foo
>   # @arg: #optional foo
>   #
>   # Returns: returns bla bla
>   #          Or bla blah
>   #
>   # Since: version
>   # Notes: notes, comments can have
>   #        - itemized list
>   #        - like this
>   #
>   # Example:
>   #
>   # -> { "execute": "quit" }
>   # <- { "return": {} }
>   #
>   ##

The following discussion belongs to review of qapi-code-gen.txt, but the
example here is more abstract and nicely visible, so let me use it.

The "Symbol" kind of block as shown above is for QMP commands with
arguments defined inline.

It also serves for QMP events: just omit the return value.

For struct types, the terminology is a bit off ("argument" instead of
"member"), but the basic structure less the return value still works.
No surprise, because a command is essentially a triple (name, arguments
type, return value type), where the arguments type is a struct type (but
see below for non-struct arguments).

>From there, it's a small step to using it for enum types.

But how to do union types is non-obvious.  In their general form, they
have common members, one of them the tag, and variant members depending
on the tag value.  How do we want such types be documented?

Alternate types can be documented like structs, I think.

What about QMP commands with arguments of the form 'data': 'TypeName'?
We typically replace the whole @arg: part by something like

    # For the arguments, see the documentation of BlockdevSnapshotSync.

Note that with 'boxed': true, TypeName could even be union or alternate.

My point is: the documentation needs work.  Whether we should do the
within this series or on top is debatable.

> That's roughly following the following BNF grammar:

EBNF, actually.

>
> api_comment = "##\n" comment "##\n"
> comment = freeform_comment | symbol_comment
> freeform_comment = { "#" text "\n" }

As we'll see below, your code actually requires at least one space after
"#" unless text is empty.

> symbol_comment = "#" "@" name ":\n" { freeform | member | meta }

Your code actually requires exactly one space between "#" and "@".  It
happily accepts additional text between : and newline.

freeform is undefined.  I guess you mean freeform_comment.

With that corrected, the grammar is ambiguous.  Your intent is obvious
enough, though.  Just not to a computer.

> member = "#" '@' name ':' [ text ] freeform_comment

Again, your code requires exactly one space after "#".

> meta = "#" ( "Returns:", "Since:", "Note:", "Notes:", "Example:", "Examples:" 
> ) [ text ] freeform_comment

Likewise.

> text = free-text markdown-like, "#optional" for members

Here, the grammar switches from EBNF to prose.

As is, the grammar can serve as semi-formal user documentation.  It
can't quite serve as specification for the parser.  As we'll see below,
the code processing doc strings is not a parser constructed from this
grammar.

Why am I so hung up on constructing parsers systematically from
grammars?  When you do that, you can derive the accepted language from
the *grammar*.  Generally *much* easier than from ad hoc parser code.
Moreover, a (sufficiently simple) grammar leads to a natural internal
representation, namely an abstract syntax tree.

But let's continue.

> Thanks to the following json expressions, the documentation is enhanced
> with extra information about the type of arguments and return value
> expected.

What exactly do you want to convey with this sentence?

> Signed-off-by: Marc-André Lureau <address@hidden>
> ---
>  scripts/qapi.py        | 175 ++++++++++++++++++++++++++-
>  scripts/qapi2texi.py   | 316 
> +++++++++++++++++++++++++++++++++++++++++++++++++
>  docs/qapi-code-gen.txt |  44 +++++--
>  3 files changed, 524 insertions(+), 11 deletions(-)
>  create mode 100755 scripts/qapi2texi.py
>
> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index 21bc32f..ed52ee4 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py

Warning: loads of comments ahead.  Doesn't mean there are loads of
problems!  I need to write to think clearly, and I'm sharing my thinking
to hopefully help us converge.

> @@ -122,6 +122,103 @@ class QAPIExprError(Exception):
>              "%s:%d: %s" % (self.info['file'], self.info['line'], self.msg)
>  
>  
> +class QAPIDoc(object):
> +    def __init__(self, parser):
> +        self.parser = parser
> +        self.symbol = None
> +        self.body = []
> +        # args is {'arg': 'doc', ...}

Rather terse, but I think I get what you mean.

> +        self.args = OrderedDict()
> +        # meta is [(Since/Notes/Examples/Returns:, 'doc'), ...]

More so.

> +        self.meta = []
> +        # the current section to populate, array of [dict, key, comment...]

Now you've reached my limit :)

"array of [...]" sounds like a list of lists.  But you probably mean
just a list.

We'll see below that dict is either self.args or self.meta, and key is a
key in dict.

> +        self.section = None
> +        self.expr_elem = None
> +
> +    def get_body(self):
> +        return "\n".join(self.body)

Not worth a doc string?  The existing code doesn't have any, but since
you provide them for other public methods of this class...

> +
> +    def has_meta(self, name):
> +        """Returns True if the doc has a meta section 'name'"""
> +        return next((True for i in self.meta if i[0] == name), False)

Less clever, but probably more obvious to readers lacking a degree in
Python generators:

           for i in self.meta:
               if i[0] == name:
                   return True
           return False

I'm not against using more advanced language features, but when the
"advanced" code needs more tokens than the "retarded" one...

> +
> +    def append(self, line):
> +        """Adds a # comment line, to be parsed and added in a section"""
> +        line = line[1:]

As we'll see below, @line is the full comment token text including the
initial '#', but not including the final '\n'.  First thing we do with
it is strip the initial '#'.  Makes me wonder why we pass it in.  It's
not wrong, of course.

Note that comments don't need to be full lines, they just happen to be
used that way most of the time.  Admittedly contrived counter-example:

    { 'enum': 'Empty', 'data': [] } ##
                                    # @XY:
                                    ##
    { 'enum': 'XY', 'data': ['x', 'y'] }

This makes the parser call doc.append("# @XY:").  The argument "# @XY:"
isn't a full line in the source.  It becomes a line in the documentation
text QAPIDoc wraps.

> +        if len(line) == 0:
> +            self._append_section(line)
> +            return

Special case for comment '#'.  Note that '# ' is processed as a normal
case.

> +
> +        if line[0] != ' ':
> +            raise QAPISchemaError(self.parser, "missing space after #")
> +
> +        line = line[1:]
> +        # take the first word out
> +        name = line.split(' ', 1)[0]

For a comment '# ', @name becomes '', because ''.split(' ', 1) returns
[''].  Works.

For a comment '#  Since:' (note two spaces), name becomes '', because
.split() returns ['', 'Since:'].  Thus, the keywords are only recognized
exactly after '# '.  Is this a good idea?

> +        if name.startswith("@") and name.endswith(":"):
> +            line = line[len(name):]
> +            name = name[1:-1]
> +            if self.symbol is None:
> +                # the first is the symbol this APIDoc object documents

Suggest

                   # The first @NAME: is the symbol being documented

> +                if len(self.body):
> +                    raise QAPISchemaError(self.parser, "symbol must come 
> first")

You and I know what 'symbol' means here, but the user should not need to
know.  Suggest something like "'@%s:' must come first".  I'm not
entirely happy with that either, but I don't have better ideas right
now.

If the parser was constructed from your grammar, we'd deal with the
first @NAME: not coming first differently.  According to the grammar, a
doc comment block either starts with '##\n# @' (a symbol block), or it
doesn't (a free-form block).  If the first line isn't a @NAME: line, but
later lines are, the parser decides it's a free-from block right there.
Any @NAME: lines it finds later would be parsed as free-form line.  If
we want to reject such lines in free-form blocks, the easiest way would
be a semantic check.

> +                self.symbol = name

Here's how we can tell what kind of block we're working on:

if .symbol is not None:
    this is a symbol block
elif .body is not None:
    this is a free-form block
else
    we don't know, yet

> +            else:
> +                # else an arg

Suggest
                   # Subsequent @NAME: are arguments and such

> +                self._start_args_section(name)

The remainder of the line after @NAME: is left in @line, and will be
stuffed into the current section or else the body after the conditional.
That's what the grammar specifies for argument NAME, in rule member.
It's not what it specifies for symbol NAME, in rule symbol_comment.

In particular, when nothing follows symbol NAME, we still stuff '' into
the body.  For instance,

    ##
    # @symbol:
    # 
    # lorem ipsum
    ##

produces self.body = ['', '', 'lorem ipsum'], which .get_body()
transforms into '\n\nlorem ipsum'.  Extra newline.  I guess it's all the
same to Texinfo in the end.

> +        elif self.symbol and name in (
> +                "Returns:", "Since:",
> +                # those are often singular or plural
> +                "Note:", "Notes:",
> +                "Example:", "Examples:"):
> +            # new "meta" section
> +            line = line[len(name):]
> +            self._start_meta_section(name[:-1])
> +
> +        self._append_section(line)
> +
> +    def _start_args_section(self, name):
> +        self.end_section()
> +        if self.args.has_key(name):
> +            raise QAPISchemaError(self.parser, "'%s' arg duplicated" % name)

Where's self.args[name] added?  Oh, I see, it's hidden in end_section().
Could it be added here instead?

> +        self.section = [self.args, name]
> +
> +    def _start_meta_section(self, name):
> +        self.end_section()
> +        if name in ("Returns", "Since") and self.has_meta(name):
> +            raise QAPISchemaError(self.parser, "'%s' section duplicated" % 
> name)

For the implementation, it's a section, but for the user, it's a line.
Suggest "Duplicate '%s'".

Comment on _start_args_section() applies.

> +        self.section = [self.meta, name]
> +
> +    def _append_section(self, line):
> +        """Add a comment to the current section, or the comment body"""

_append_to_section() or _section_append() would be clearer.  We're not
appending a section, we're appending *to* a section.  Or to the body,
which isn't a section.  Hmm.

> +        if self.section:
> +            name = self.section[1]
> +            if not name.startswith("Example"):
> +                # an empty line ends the section, except with Example
> +                if len(self.section) > 2 and len(line) == 0:
> +                    self.end_section()
> +                    return
> +                # Example is verbatim
> +                line = line.strip()

The comment is confusing, because it explains what we're not doing
elsewhere.  Comments should explain what we're doing here.

> +            if len(line) > 0:
> +                self.section.append(line)
> +        else:
> +            self.body.append(line.strip())
> +
> +    def end_section(self):
> +        if self.section is not None:

I prefer the more laconic "if self.section:", especially when
self.section can't have a false value other than None anyway.  Even more
I prefer making the fact that this does nothing when we don't have a
section immediately obvious:

           if not self.section:
               return
           do the work...

> +            target = self.section[0]
> +            name = self.section[1]
> +            if len(self.section) < 3:
> +                raise QAPISchemaError(self.parser, "Empty doc section")
> +            doc = "\n".join(self.section[2:])
> +            if isinstance(target, dict):
> +                target[name] = doc
> +            else:
> +                target.append((name, doc))
> +            self.section = None

Until the next _start_FOO_section(), self.section remains None.
Therefore, any text between here and the next section will be appended
to self.body.  This is going to mangle input that doesn't quite match
expectations.  Example:

    ##
    # @frobnicate:
    #
    # Frobnicate some frobs.
    #
    # @frobs: bla, bla, explain @frobs,
    #         bla.
    #
    #         Beware of the tiger!
    #
    #         Explain @frobs some more, bla, bla.
    #
    # @harder: frobnicate harder.
    ##
    { 'command': 'frobnicate',
      'data': { 'frobs': 'any', '*harder': 'bool' } }

Results in a QAPIDoc with

    .symbol = 'frobnicate'
    .args = { 'frobs': 'frobs bla, bla, explain,\nbla',
              'harder': 'frobnicate harder.' }
    .body = ['', '', 'Frobnicate some frobs.', '',
             'Beware of the tiger!', '',
             'Explain some more, bla, bla.', '']

and the following .texi:

    @deftypefn Command {} frobnicate @
    {('frobs': @var{any}, ['harder': @var{bool}])}

    @table @var
    @item frobs
    bla, bla, explain @var{frobs},
    bla.
    @item harder
    frobnicate harder.
    @end table


    Frobnicate some frobs.

    Beware of the tiger!

    Explain @var{frobs} some more, bla, bla.

The only way to detect the mistake is to read the generated
documentation attentively.  We can't rely on everybody doing that during
development.  Even when you do, the mangled output gives you no clue on
the mistake you made, unless you know how the doc generator works.

The generator could detect the mistake at the syntactical or at the
semantic level.

More general question: can you think of a legitimate reason for the
generator emitting documentation in a different order than it was
written?

If no, we should pick an internal representation that can represent the
input we have / envisage to have faithfully, then reject input it can't
represent faithfully.

> +
> +
>  class QAPISchemaParser(object):
>  
>      def __init__(self, fp, previously_included=[], incl_info=None):
> @@ -137,9 +234,15 @@ class QAPISchemaParser(object):

Let's see how parsing works.

>          self.line = 1
>          self.line_pos = 0
>          self.exprs = []
> +        self.docs = []
>          self.accept()
>  
>          while self.tok is not None:
> +            if self.tok == '#' and self.val.startswith('##'):
> +                doc = self.get_doc()
> +                self.docs.append(doc)
> +                continue
> +

Top-level doc comments get accumulated in list self.docs.

>              expr_info = {'file': fname, 'line': self.line,
>                           'parent': self.incl_info}
>              expr = self.get_expr(False)
> @@ -160,6 +263,7 @@ class QAPISchemaParser(object):
>                          raise QAPIExprError(expr_info, "Inclusion loop for 
> %s"
>                                              % include)
>                      inf = inf['parent']
> +
>                  # skip multiple include of the same file
>                  if incl_abs_fname in previously_included:
>                      continue
> @@ -171,12 +275,40 @@ class QAPISchemaParser(object):
>                  exprs_include = QAPISchemaParser(fobj, previously_included,
>                                                   expr_info)
>                  self.exprs.extend(exprs_include.exprs)
> +                self.docs.extend(exprs_include.docs)

The include's list of docs are spliced in.

>              else:
>                  expr_elem = {'expr': expr,
>                               'info': expr_info}
> +                if len(self.docs) > 0:
> +                    self.docs[-1].expr_elem = expr_elem
>                  self.exprs.append(expr_elem)

A non-include expression gets tied to the last doc comment.  This isn't
quite right.  Example:

    ##
    # @Empty:
    ##
    { 'enum': 'Empty', 'data': [] }

    { 'enum': 'XY', 'data': ['x', 'y'] }

The doc comment's expr_elem gets first set to the first enum expression,
then overwritten with the second one.

Less serious, but still somewhat weird:

    ##
    # Lorem ipsum
    ##

    # @Empty:
    { 'enum': 'Empty', 'data': [] }

The first comment block is a doc comment.  The second one isn't, and
gets ignored.  The first one's expr_elem gets set to the enum
expression.

The quick fix is to set .expr_elem only when it's still None.

The thorough fix is to integrate doc comments deeper into the syntax.

>  
> -    def accept(self):
> +    def get_doc(self):

Here, we know self.tok == '#', and self.val is thus the comment line
less the newline (see accept()).

> +        if self.val != '##':
> +            raise QAPISchemaError(self, "Doc comment not starting with '##'")

It starts with '##' alright, it just happens not to end there.  What
about "Junk after '##'"?  Or, if that's too laconic, perhaps "Junk after
'##' at start of documentation comment".

> +
> +        doc = QAPIDoc(self)
> +        self.accept(False)
> +        while self.tok == '#':
> +            if self.val.startswith('##'):
> +                # ## ends doc

"# ##" looks awkward.  Suggest

                   # End of doc comment

> +                if self.val != '##':
> +                    raise QAPISchemaError(self, "non-empty '##' line %s"
> +                                          % self.val)

Let's make the message similar to whatever message we emit for junk
after the initial ##.

> +                self.accept()
> +                doc.end_section()
> +                return doc
> +            else:
> +                doc.append(self.val)
> +            self.accept(False)
> +

Here, self.tok is whatever follows the doc comment.

> +        if self.val != '##':
> +            raise QAPISchemaError(self, "Doc comment not finishing with 
> '##'")

This check is incorrect.  For instance, input

    ##
    '## gotcha'

yields

    /dev/stdin:2:1: Doc comment not finishing with '##'

because self.tok == "'" and self.val == "## gotcha".

Either fail unconditionally here (then the final ## is required), or do
nothing (then it's optional).  Your commit message suggests the former.

> +
> +        doc.end_section()
> +        return doc
> +
> +    def accept(self, skip_comment=True):
>          while True:
>              self.tok = self.src[self.cursor]
>              self.pos = self.cursor
> @@ -184,7 +316,13 @@ class QAPISchemaParser(object):
>              self.val = None
>  
>              if self.tok == '#':
> +                if self.src[self.cursor] == '#':
> +                    # ## starts a doc comment

Suggest

                       # Start of doc comment

> +                    skip_comment = False
>                  self.cursor = self.src.find('\n', self.cursor)
> +                self.val = self.src[self.pos:self.cursor]

Puts the complete line less the newline into self.val.  I wonder whether
we should strip the initial '#' as well.

> +                if not skip_comment:
> +                    return

Make that

                   if not skip_comment:
                       self.val = self.src[self.pos:self.cursor]
                       return

>              elif self.tok in "{}:,[]":
>                  return
>              elif self.tok == "'":

Comment tokens are thrown away as before, except when the parser asks
for them by passing skip_comment=False, or when the comment token starts
with ##.  The parser asks while parsing a doc comment, in get_doc().

This is a backchannel from the parser to the lexer.  I'd rather avoid
such lexer hacks, but I guess we can address that on top.

A comment starting with ## inside an expression is now a syntax error.
For instance, input

    {
    ##

yields

    /dev/stdin:2:1: Expected string or "}"

Rather unfriendly error message, but we can fix that on top.

> @@ -779,6 +917,41 @@ def check_exprs(exprs):
>  
>      return exprs
>  
> +def check_docs(docs):
> +    for doc in docs:
> +        expr_elem = doc.expr_elem
> +        if not expr_elem:
> +            continue

A symbol block without an expr_elem should be an error, shouldn't it?

> +
> +        expr = expr_elem['expr']
> +        for i in ('enum', 'union', 'alternate', 'struct', 'command', 
> 'event'):
> +            if i in expr:
> +                meta = i
> +                break

We're working with expression trees here, not the QAPISchemaEntity
objects.  Okay as long as the work is sufficiently trivial.

> +
> +        info = expr_elem['info']
> +        name = expr[meta]
> +        if doc.symbol != name:
> +            raise QAPIExprError(info,
> +                                "Documentation symbol mismatch '%s' != '%s'"
> +                                % (doc.symbol, name))

Cryptic.  Suggest "Definition of '%s' follows documentation for '%s'" %
(name, doc.symbol).

> +        if not 'command' in expr and doc.has_meta('Returns'):
> +            raise QAPIExprError(info, "Invalid return documentation")
> +
> +        doc_args = set(doc.args.keys())

doc_args is the set of documented argument names.  Actually member names
when expr defines a type, but let's ignore that for now.

> +        if meta == 'union':
> +            data = expr.get('base', [])
> +        else:
> +            data = expr.get('data', [])
> +        if isinstance(data, dict):
> +            data = data.keys()
> +        args = set([k.strip('*') for k in data])

Suggest to rename k to name.

To see what args is, we first need to figure out what data is.

If expr defines an enum, expr['data'] is the list of of member names.
args is the set of member names.

If expr defines a struct type, expr['data'] is the OrderedDict mapping
local member names with optional '*' prefix to types.  args is the set
of local member names.  "Local", because additional members may be
inherited from a base type.

If expr defines an alternate type, likewise, except there should be no
'*' prefixes and no base type.

If expr defines a simple union type, expr['data'] is the OrderedDict
mapping tag value names to types.  The tag member is implicit.
expr['base'] should not exist.  args is therefore the empty set.

If expr defines a flat union type, data is the OrderedDict mapping tag
value names to types.  expr['base'] must exist, and is either a
dictionary or the name of a struct type.  If it's a dictionary, args is
the set of common members defined inline.  If it's a struct type name,
args is the set of characters in the type name.  Oops.  Test case: union
UserDefFlatUnion in qapi-schema-test.json.

Aside: covering all cases correctly is not trivial enough for expression
trees, I'm afraid.  Class QAPISchema has to do it.  Can we avoid
duplicating its logic here?  I'm not sure; the QAPISchemaEntity might
turn out to be just differently inconvenient.

If expr defines a command, data is either the OrderedDict mapping
argument names to types, or the name of the arguments type.  If it's a
dictionary, args is the set of argument names.  If it's a type name,
args is the set of characters in the type name.  Oops again.  Test case:
command user_def_cmd0 in qapi-schema-test.json.

If expr defines an event, likewise.  No test case in
qapi-schema-test.json.  Hole in the test coverage.

Conclusion: args is meant to be the set of arguments / members actually
defined.  For unions, that does *not* include variant members.

> +        if meta == 'alternate' or \
> +           (meta == 'union' and not expr.get('discriminator')):
> +            args.add('type')

For alternate and simple union types, add 'type' to the set of arguments
actually defined.

Recall that the set contains the member names for alternates, and
nothing for simple unions.

> +        if not doc_args.issubset(args):

Could use the <= operator: if not doc_args <= args.  Matter of taste.

> +            raise QAPIExprError(info, "Members documentation is not a subset 
> of"
> +                                " API %r > %r" % (list(doc_args), 
> list(args)))

Errors out if the doc string documents arguments that aren't actually
defined as far as we know.

"As far as we know", because we second-guess what's actually defined
based on the expression tree, and don't get all the cases right.  The
actual actual definitions are determined by class QAPISchema, and can be
found in its internal representation.  Hmm.

Failing to document an argument is not an error.  It probably should be,
at least in the longer run.

>  
>  #
>  # Schema compiler frontend
> diff --git a/scripts/qapi2texi.py b/scripts/qapi2texi.py
> new file mode 100755
> index 0000000..7e2440c
> --- /dev/null
> +++ b/scripts/qapi2texi.py

Considering the length of this review so far, I'm only skimming
generator part for now.

> @@ -0,0 +1,316 @@
> +#!/usr/bin/env python
> +# QAPI texi generator
> +#
> +# This work is licensed under the terms of the GNU LGPL, version 2+.
> +# See the COPYING file in the top-level directory.
> +"""This script produces the documentation of a qapi schema in texinfo 
> format"""
> +import re
> +import sys
> +
> +from qapi import *
> +
> +COMMAND_FMT = """
> address@hidden {type} {{{ret}}} {name} @
> +{{{args}}}
> +
> +{body}
> +
> address@hidden deftypefn
> +
> +""".format
> +
> +ENUM_FMT = """
> address@hidden Enum {name}
> +
> +{body}
> +
> address@hidden deftp
> +
> +""".format
> +
> +STRUCT_FMT = """
> address@hidden {type} {name} @
> +{{{attrs}}}
> +
> +{body}
> +
> address@hidden deftp
> +
> +""".format
> +
> +EXAMPLE_FMT = """@example
> +{code}
> address@hidden example
> +""".format
> +
> +
> +def subst_strong(doc):
> +    """Replaces *foo* by @strong{foo}"""
> +    return re.sub(r'\*([^_\n]+)\*', r'@emph{\1}', doc)
> +
> +
> +def subst_emph(doc):
> +    """Replaces _foo_ by @emph{foo}"""
> +    return re.sub(r'\s_([^_\n]+)_\s', r' @emph{\1} ', doc)
> +
> +
> +def subst_vars(doc):
> +    """Replaces @var by @var{var}"""
> +    return re.sub(r'@([\w-]+)', r'@var{\1}', doc)
> +
> +
> +def subst_braces(doc):
> +    """Replaces {} with @{ @}"""
> +    return doc.replace("{", "@{").replace("}", "@}")
> +
> +
> +def texi_example(doc):
> +    """Format @example"""
> +    doc = subst_braces(doc).strip('\n')
> +    return EXAMPLE_FMT(code=doc)
> +
> +
> +def texi_comment(doc):
> +    """
> +    Format a comment
> +
> +    Lines starting with:
> +    - |: generates an @example
> +    - =: generates @section
> +    - ==: generates @subsection
> +    - 1. or 1): generates an @enumerate @item
> +    - o/*/-: generates an @itemize list
> +    """
> +    lines = []
> +    doc = subst_braces(doc)
> +    doc = subst_vars(doc)
> +    doc = subst_emph(doc)
> +    doc = subst_strong(doc)
> +    inlist = ""
> +    lastempty = False
> +    for line in doc.split('\n'):
> +        empty = line == ""
> +
> +        if line.startswith("| "):
> +            line = EXAMPLE_FMT(code=line[1:])
> +        elif line.startswith("= "):
> +            line = "@section " + line[1:]
> +        elif line.startswith("== "):
> +            line = "@subsection " + line[2:]
> +        elif re.match("^([0-9]*[.)]) ", line):
> +            if not inlist:
> +                lines.append("@enumerate")
> +                inlist = "enumerate"
> +            line = line[line.find(" ")+1:]
> +            lines.append("@item")
> +        elif re.match("^[o*-] ", line):
> +            if not inlist:
> +                lines.append("@itemize %s" % {'o': "@bullet",
> +                                              '*': "@minus",
> +                                              '-': ""}[line[0]])
> +                inlist = "itemize"
> +            lines.append("@item")
> +            line = line[2:]
> +        elif lastempty and inlist:
> +            lines.append("@end %s\n" % inlist)
> +            inlist = ""
> +
> +        lastempty = empty
> +        lines.append(line)
> +
> +    if inlist:
> +        lines.append("@end %s\n" % inlist)
> +    return "\n".join(lines)
> +
> +
> +def texi_args(expr):
> +    """
> +    Format the functions/structure/events.. arguments/members
> +    """
> +    data = expr["data"] if "data" in expr else {}
> +    if isinstance(data, str):
> +        args = data
> +    else:
> +        arg_list = []
> +        for name, typ in data.iteritems():
> +            # optional arg
> +            if name.startswith("*"):
> +                name = name[1:]
> +                arg_list.append("['%s': @var{%s}]" % (name, typ))
> +            # regular arg
> +            else:
> +                arg_list.append("'%s': @var{%s}" % (name, typ))
> +        args = ", ".join(arg_list)
> +    return args
> +
> +def section_order(section):
> +    return {"Returns": 0,
> +            "Note": 1,
> +            "Notes": 1,
> +            "Since": 2,
> +            "Example": 3,
> +            "Examples": 3}[section]
> +
> +def texi_body(doc, arg="@var"):
> +    """
> +    Format the body of a symbol documentation:
> +    - a table of arguments
> +    - followed by "Returns/Notes/Since/Example" sections
> +    """
> +    body = "@table %s\n" % arg
> +    for arg, desc in doc.args.iteritems():
> +        if desc.startswith("#optional"):
> +            desc = desc[10:]
> +            arg += "*"
> +        elif desc.endswith("#optional"):
> +            desc = desc[:-10]
> +            arg += "*"
> +        body += "@item %s\n%s\n" % (arg, texi_comment(desc))
> +    body += "@end table\n"
> +    body += texi_comment(doc.get_body())
> +
> +    meta = sorted(doc.meta, key=lambda i: section_order(i[0]))
> +    for m in meta:
> +        key, doc = m
> +        func = texi_comment
> +        if key.startswith("Example"):
> +            func = texi_example
> +
> +        body += "address@hidden address@hidden quotation" % \
> +                (key, func(doc))
> +    return body
> +
> +
> +def texi_alternate(expr, doc):
> +    """
> +    Format an alternate to texi
> +    """
> +    args = texi_args(expr)
> +    body = texi_body(doc)
> +    return STRUCT_FMT(type="Alternate",
> +                      name=doc.symbol,
> +                      attrs="[ " + args + " ]",
> +                      body=body)
> +
> +
> +def texi_union(expr, doc):
> +    """
> +    Format an union to texi
> +    """
> +    args = texi_args(expr)
> +    body = texi_body(doc)
> +    return STRUCT_FMT(type="Union",
> +                      name=doc.symbol,
> +                      attrs="[ " + args + " ]",
> +                      body=body)
> +
> +
> +def texi_enum(_, doc):
> +    """
> +    Format an enum to texi
> +    """
> +    body = texi_body(doc, "@samp")
> +    return ENUM_FMT(name=doc.symbol,
> +                    body=body)
> +
> +
> +def texi_struct(expr, doc):
> +    """
> +    Format a struct to texi
> +    """
> +    args = texi_args(expr)
> +    body = texi_body(doc)
> +    return STRUCT_FMT(type="Struct",
> +                      name=doc.symbol,
> +                      attrs="@{ " + args + " @}",
> +                      body=body)
> +
> +
> +def texi_command(expr, doc):
> +    """
> +    Format a command to texi
> +    """
> +    args = texi_args(expr)
> +    ret = expr["returns"] if "returns" in expr else ""
> +    body = texi_body(doc)
> +    return COMMAND_FMT(type="Command",
> +                       name=doc.symbol,
> +                       ret=ret,
> +                       args="(" + args + ")",
> +                       body=body)
> +
> +
> +def texi_event(expr, doc):
> +    """
> +    Format an event to texi
> +    """
> +    args = texi_args(expr)
> +    body = texi_body(doc)
> +    return COMMAND_FMT(type="Event",
> +                       name=doc.symbol,
> +                       ret="",
> +                       args="(" + args + ")",
> +                       body=body)
> +
> +
> +def texi(docs):
> +    """
> +    Convert QAPI schema expressions to texi documentation
> +    """
> +    res = []
> +    for doc in docs:
> +        try:
> +            expr_elem = doc.expr_elem
> +            if expr_elem is None:
> +                res.append(texi_body(doc))
> +                continue
> +
> +            expr = expr_elem['expr']
> +            (kind, _) = expr.items()[0]
> +
> +            fmt = {"command": texi_command,
> +                   "struct": texi_struct,
> +                   "enum": texi_enum,
> +                   "union": texi_union,
> +                   "alternate": texi_alternate,
> +                   "event": texi_event}
> +            try:
> +                fmt = fmt[kind]
> +            except KeyError:
> +                raise ValueError("Unknown expression kind '%s'" % kind)
> +            res.append(fmt(expr, doc))
> +        except:
> +            print >>sys.stderr, "error at @%s" % qapi
> +            raise

What errors do you expect here?  I'm asking because catching exceptions
indiscrimatingly feels problematic.

> +
> +    return '\n'.join(res)
> +
> +
> +def parse_schema(fname):
> +    """
> +    Parse the given schema file and return the exprs
> +    """
> +    try:
> +        schema = QAPISchemaParser(open(fname, "r"))
> +        check_exprs(schema.exprs)
> +        check_docs(schema.docs)
> +        return schema.docs

You don't actually "return the exprs", you return a list of QAPIDoc.

Either return schema and let the caller retrieve its .docs, or rename
the function to parse_docs().  In either case, fix up the doc string.

> +    except (QAPISchemaError, QAPIExprError), err:
> +        print >>sys.stderr, err
> +        exit(1)

The function duplicates parts of QAPISchema.__init__().  The parts it
doesn't duplicate would be useful here, because without them, you might
be working on a broken schema.

Recommend to drop the function, and ...

> +
> +
> +def main(argv):
> +    """
> +    Takes schema argument, prints result to stdout
> +    """
> +    if len(argv) != 2:
> +        print >>sys.stderr, "%s: need exactly 1 argument: SCHEMA" % argv[0]
> +        sys.exit(1)
> +
> +    docs = parse_schema(argv[1])

... instead do

       schema = QAPISchema(argv[1])
       docs = schema.docs

with the obvious patch to make docs an attribute of schema:

diff --git a/scripts/qapi.py b/scripts/qapi.py
index ed52ee4..b17e7de 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -1424,7 +1424,9 @@ class QAPISchemaEvent(QAPISchemaEntity):
 class QAPISchema(object):
     def __init__(self, fname):
         try:
-            self.exprs = check_exprs(QAPISchemaParser(open(fname, "r")).exprs)
+            parser = QAPISchemaParser(open(fname, "r"))
+            self.exprs = check_exprs(parser.exprs)
+            self.docs = parser.docs
             self._entity_dict = {}
             self._predefining = True
             self._def_predefineds()

You don't use the common parse_command_line().  Probably the right
choice right now, as reusing it would require modifications (we have no
use for -c and -h at least), and wouldn't buy us much.

> +    print texi(docs)
> +
> +
> +if __name__ == "__main__":
> +    main(sys.argv)

Ignorant question: why is this __name__ == "__main__" raindance better
than the stupid way the qapi-{commands,event,introspect,types,visit}.py
work?

> diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
> index 2841c51..d82e251 100644
> --- a/docs/qapi-code-gen.txt
> +++ b/docs/qapi-code-gen.txt
> @@ -45,16 +45,13 @@ QAPI parser does not).  At present, there is no place 
> where a QAPI
>  schema requires the use of JSON numbers or null.
>  
>  Comments are allowed; anything between an unquoted # and the following
> -newline is ignored.  Although there is not yet a documentation
> -generator, a form of stylized comments has developed for consistently
> -documenting details about an expression and when it was added to the
> -schema.  The documentation is delimited between two lines of ##, then
> -the first line names the expression, an optional overview is provided,
> -then individual documentation about each member of 'data' is provided,
> -and finally, a 'Since: x.y.z' tag lists the release that introduced
> -the expression.  Optional members are tagged with the phrase
> -'#optional', often with their default value; and extensions added
> -after the expression was first released are also given a '(since
> +newline is ignored.  The documentation is delimited between two lines
> +of ##, then the first line names the expression, an optional overview
> +is provided, then individual documentation about each member of 'data'
> +is provided, and finally, a 'Since: x.y.z' tag lists the release that
> +introduced the expression.  Optional members are tagged with the
> +phrase '#optional', often with their default value; and extensions
> +added after the expression was first released are also given a '(since
>  x.y.z)' comment.  For example:
>  
>      ##
> @@ -73,12 +70,39 @@ x.y.z)' comment.  For example:
>      #           (Since 2.0)
>      #
>      # Since: 0.14.0
> +    #
> +    # Notes: You can also make a list:
> +    #        - with items
> +    #        - like this
> +    #
> +    # Example:
> +    #
> +    # -> { "execute": ... }
> +    # <- { "return": ... }

Err, 'BlockStats' is a *type*, not a command.

> +    #
>      ##
>      { 'struct': 'BlockStats',
>        'data': {'*device': 'str', 'stats': 'BlockDeviceStats',
>                 '*parent': 'BlockStats',
>                 '*backing': 'BlockStats'} }
>  
> +It's also possible to create documentation sections, such as:
> +
> +    ##
> +    # = Section
> +    # == Subsection
> +    #
> +    # Some text foo with *emphasis*
> +    # 1. with a list
> +    # 2. like that
> +    #
> +    # And some code:
> +    # | $ echo foo
> +    # | -> do this
> +    # | <- get that
> +    #
> +    ##
> +

This suggests various kinds of markup, including section headers,
emphasis, numbered lists.  But what exactly is recognized?

>  The schema sets up a series of types, as well as commands and events
>  that will use those types.  Forward references are allowed: the parser
>  scans in two passes, where the first pass learns all type names, and

As discussed in my review of the cover letter, we have some additional
documentation work to do.

Also missing: tests.  A first cut should have a negative test for each
error qapi2texi.py can report, plus a modest suite of positive tests.  A
negative test consists of tests/qapi-schema/NAME.{json,out,err,exit}.
Positive tests can be added to tests/qapi-schema/qapi-schema-test.json,
or to a separate test schema, in case that's more convenient.

To get the most mileage out of positive tests, test-qapi.py should be
extended to show the resulting QAPIDoc.  Sketch appended.

diff --git a/tests/qapi-schema/test-qapi.py b/tests/qapi-schema/test-qapi.py
index ef74e2c..93cc709 100644
--- a/tests/qapi-schema/test-qapi.py
+++ b/tests/qapi-schema/test-qapi.py
@@ -55,3 +55,13 @@ class QAPISchemaTestVisitor(QAPISchemaVisitor):
 
 schema = QAPISchema(sys.argv[1])
 schema.visit(QAPISchemaTestVisitor())
+
+for doc in schema.docs:
+    expr_elem = doc.expr_elem
+    expr_info = expr_elem and expr_elem['info']
+    print 'doc %s %s' % (doc.symbol, expr_info)
+    for arg, text in doc.args.iteritems():
+        print '    arg=%s %s' % (arg, text)
+    for key, text in doc.meta:
+        print '    meta %s %s' % (key, text)
+    print '    body=%s' % doc.body



reply via email to

[Prev in Thread] Current Thread [Next in Thread]