[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-arm] [PATCH 01/23] scripts: Add decodetree.py
From: |
Peter Maydell |
Subject: |
Re: [Qemu-arm] [PATCH 01/23] scripts: Add decodetree.py |
Date: |
Fri, 12 Jan 2018 11:57:44 +0000 |
On 18 December 2017 at 17:45, Richard Henderson
<address@hidden> wrote:
> To be used to decode ARM SVE, but could be used for any 32-bit RISC.
> It would need additional work to extend to insn sizes other than 32-bit.
>
> Signed-off-by: Richard Henderson <address@hidden>
I have some comments here (mostly about the syntax, error checking
and generated code, rather than the script itself), but overall I
like this approach and I think we can drop the "RFC" from the patchset.
> ---
> scripts/decodetree.py | 984
> ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 984 insertions(+)
> create mode 100755 scripts/decodetree.py
>
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> new file mode 100755
> index 0000000000..acb0243915
> --- /dev/null
> +++ b/scripts/decodetree.py
> @@ -0,0 +1,984 @@
> +#!/usr/bin/env python
I asked on #qemu-devel for some review from people who are more
familiar with Python than I am. One of the suggestions (from
Marc-André Lureau) was to run pycodestyle on this and fix the
(mostly coding style nits) reported by it. (pycodestyle may
be called 'pep8' on older distros.)
> +#
> +# Generate a decoding tree from a specification file.
> +#
> +# The tree is built from instruction "patterns". A pattern may represent
> +# a single architectural instruction or a group of same, depending on what
> +# is convenient for further processing.
> +#
> +# Each pattern has "fixedbits" & "fixedmask", the combination of which
> +# describes the condition under which the pattern is matched:
> +#
> +# (insn & fixedmask) == fixedbits
> +#
> +# Each pattern may have "fields", which are extracted from the insn and
> +# passed along to the translator. Examples of such are registers,
> +# immediates, and sub-opcodes.
> +#
> +# In support of patterns, one may declare fields, argument sets, and
> +# formats, each of which may be re-used to simplify further definitions.
> +#
> +## Field syntax:
> +#
> +# field_def := '%' identifier ( unnamed_field )+ ( !function=identifier )?
> +# unnamed_field := number ':' ( 's' ) number
> +#
> +# For unnamed_field, the first number is the least-significant bit position
> of
> +# the field and the second number is the length of the field. If the 's' is
> +# present, the field is considered signed. If multiple unnamed_fields are
> +# present, they are concatenated. In this way one can define disjoint
> fields.
This syntax lets you specify that fields other than the first one in
a concatenated set are signed, like
10:5 | 3:s5
That doesn't seem to me like it's meaningful. Shouldn't the signedness
or otherwise be a property of the whole extracted field, rather than
an individual component of it? (In practice creating a signed combined
value is implemented by doing the most-significant component as sextract,
obviously.)
> +#
> +# If !function is specified, the concatenated result is passed through the
> +# named function, taking and returning an integral value.
> +#
> +# FIXME: the fields of the structure into which this result will be stored
> +# is restricted to "int". Which means that we cannot expand 64-bit items.
> +#
> +# Field examples:
> +#
> +# %disp 0:s16 -- sextract(i, 0, 16)
> +# %imm9 16:6 10:3 -- extract(i, 16, 6) << 3 | extract(i, 10, 3)
> +# %disp12 0:s1 1:1 2:10 -- sextract(i, 0, 1) << 11
> +# | extract(i, 1, 1) << 10
> +# | extract(i, 2, 10)
> +# %shimm8 5:s8 13:1 !function=expand_shimm8
> +# -- expand_shimm8(sextract(i, 5, 8) << 1
> +# | extract(i, 13, 1))
Do we syntax-check for accidentally specifying a field-def whose
components overlap (eg "3:5 0:5")? I think we should, but I didn't
see a check in a quick scan through the parsing code.
> +#
> +## Argument set syntax:
> +#
> +# args_def := '&' identifier ( args_elt )+
> +# args_elt := identifier
> +#
> +# Each args_elt defines an argument within the argument set.
> +# Each argument set will be rendered as a C structure "arg_$name"
> +# with each of the fields being one of the member arguments.
> +#
> +# Argument set examples:
> +#
> +# ®3 ra rb rc
> +# &loadstore reg base offset
> +#
> +## Format syntax:
> +#
> +# fmt_def := '@' identifier ( fmt_elt )+
> +# fmt_elt := fixedbit_elt | field_elt | field_ref | args_ref
> +# fixedbit_elt := [01.]+
> +# field_elt := identifier ':' 's'? number
> +# field_ref := '%' identifier | identifier '=' '%' identifier
> +# args_ref := '&' identifier
> +#
> +# Defining a format is a handy way to avoid replicating groups of fields
> +# across many instruction patterns.
> +#
> +# A fixedbit_elt describes a contiguous sequence of bits that must
> +# be 1, 0, or "." for don't care.
> +#
> +# A field_elt describes a simple field only given a width; the position of
> +# the field is implied by its position with respect to other fixedbit_elt
> +# and field_elt.
> +#
> +# If any fixedbit_elt or field_elt appear then all 32 bits must be defined.
> +# Padding with a fixedbit_elt of all '.' is an easy way to accomplish that.
What is a format that doesn't specify the full 32 bits useful for?
Do you have an example of one?
> +#
> +# A field_ref incorporates a field by reference. This is the only way to
> +# add a complex field to a format. A field may be renamed in the process
> +# via assignment to another identifier. This is intended to allow the
> +# same argument set be used with disjoint named fields.
> +#
> +# A single args_ref may specify an argument set to use for the format.
> +# The set of fields in the format must be a subset of the arguments in
> +# the argument set. If an argument set is not specified, one will be
> +# inferred from the set of fields.
> +#
> +# It is recommended, but not required, that all field_ref and args_ref
> +# appear at the end of the line, not interleaving with fixedbit_elf or
> +# field_elt.
> +#
> +# Format examples:
> +#
> +# @opr ...... ra:5 rb:5 ... 0 ....... rc:5
> +# @opi ...... ra:5 lit:8 1 ....... rc:5
> +#
> +## Pattern syntax:
> +#
> +# pat_def := identifier ( pat_elt )+
> +# pat_elt := fixedbit_elt | field_elt | field_ref
> +# | args_ref | fmt_ref | const_elt
> +# fmt_ref := '@' identifier
> +# const_elt := identifier '=' number
> +#
> +# The fixedbit_elt and field_elt specifiers are unchanged from formats.
> +# A pattern that does not specify a named format will have one inferred
> +# from a referenced argument set (if present) and the set of fields.
> +#
> +# A const_elt allows a argument to be set to a constant value. This may
> +# come in handy when fields overlap between patterns and one has to
> +# include the values in the fixedbit_elt instead.
> +#
> +# The decoder will call a translator function for each pattern matched.
> +#
> +# Pattern examples:
> +#
> +# addl_r 010000 ..... ..... .... 0000000 ..... @opr
> +# addl_i 010000 ..... ..... .... 0000000 ..... @opi
> +#
> +# which will, in part, invoke
> +#
> +# trans_addl_r(ctx, &arg_opr, insn)
> +# and
> +# trans_addl_i(ctx, &arg_opi, insn)
> +#
I notice in the generated code that all the trans_FOO functions
are global, not file-local. That seems like it's going to lead
to name clashes down the line, especially if/when we ever get
to supporting multiple different target architectures in a single
QEMU binary.
Also from the generated code, "arg_incdec2_pred" &c don't follow
our coding style preference for CamelCase for typedef names. On
the other hand it's not immediately obvious how best to pick
a camelcase approach for them...
> +if sys.version_info >= (3, 0):
> + re_fullmatch = re.fullmatch
> +else:
> + def re_fullmatch(pat, str):
> + return re.match('^' + pat + '$', str)
> +
> +def output_autogen():
> + output('/* This file is autogenerated. */\n\n')
"autogenerated by decodetree.py" might assist some future
person in tracking down how it got generated.
> +
> +def parse_generic(lineno, is_format, name, toks):
> + """Parse one instruction format from TOKS at LINENO"""
> + global fields
> + global arguments
> + global formats
> + global patterns
> + global re_ident
> +
> + fixedmask = 0
> + fixedbits = 0
> + width = 0
> + flds = {}
> + arg = None
> + fmt = None
> + for t in toks:
> + # '&Foo' gives a format an explcit argument set.
"explicit"
> +
> +def main():
> + global arguments
> + global formats
> + global patterns
> + global translate_prefix
> + global output_file
> +
> + h_file = None
> + c_file = None
> + decode_function = 'decode'
> +
> + long_opts = [ 'decode=', 'translate=', 'header=', 'output=' ]
> + try:
> + (opts, args) = getopt.getopt(sys.argv[1:], 'h:o:', long_opts)
> + except getopt.GetoptError as err:
> + error(0, err)
> + for o, a in opts:
> + if o in ('-h', '--header'):
> + h_file = a
> + elif o in ('-o', '--output'):
> + c_file = a
> + elif o == '--decode':
> + decode_function = a
> + elif o == '--translate':
> + translate_prefix = a
> + else:
> + assert False, 'unhandled option'
A --help would be helpful (as would documenting the command
line syntax in the comment at the top of the file).
thanks
-- PMM