[RFC] Array types bounded by predicate

poke-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[RFC] Array types bounded by predicate

From:	Jose E. Marchesi
Subject:	[RFC] Array types bounded by predicate
Date:	Mon, 16 Sep 2024 21:26:50 +0200
User-agent:	Gnus/5.13 (Gnus v5.13)

Hello people!

First of all, I will abuse your patience with a little recap.  Currently
the Poke language supports three kind of array types:

1. "Unbounded" array types, like

    int<32>[]

   When mapped, values get added to the array until either EOF happens
   or a constraint expression fails.  Suppose for example:

   (poke) type Printable_Char = struct { uint<8> c : c >= 32 && c <= 126; }
   (poke) Printable_Char[] @ 0#B

   The resulting array will contain all printable chars from the
   beginning of the IO space to either the end or to where a
   non-printable char is found.  The first non-printable char (the
   Printable_Char whose mapping results in E_constraint) is not included
   in the array.

   Constructing an unbounded array results in an empty array:

   (poke) Printable_Char[]()
   []

2. Array types bounded by number of elements, like

     int<32>[10]

   Where the index can be any Poke expression that evaluates to an
   integral value.

   When mapped, exactly that number of values are read from the IO space
   to conform the elements of the new array.  If an array of the
   specified numer of elements cannot be mapped, i.e. if the bound is
   not satisfied, then an exception gets raised.

   Constructing an array bounded by number of elements results in an
   array with that number of elements:

   (poke) int<32>[5]()
   [0,0,0,0,0]
   (poke) int<32>[5](100)
   [100,100,100,100,100]

3. Array types bounded by size, like

     int<32>[16#B]

   Where the index can be any Poke expression that evaluates to an
   offset value, which is the size of the whole array, not of each
   element.

   When mapped, an exact number values are read from the IO space to
   conform the elements of the new array, so the total size of the array
   equals the size bound.  If no exact number of values can be mapped to
   satisfy the bound an exception is raised.

   Constructing an array bounded by size results in an array with the
   number of elements that satisfy the bound:

   (big:poke) int<32>[16#B]()
   [0,0,0,0]

This works well.

However, consider the following two real-life situations:


a) The ASN1 BER encoding specify variable byte sequences, which are
   basically streams of bytes that end with two consecutive zero bytes.

   This is how the asn1-ber.pk pickle implements these sequences:

   type BER_Variable_Contents =
     struct
     {
       type Datum =
         union
         {
           uint<8>[2] pair : pair[1] != 0UB;
           uint<8> single : single != 0UB;
         };

       Datum[] data;
       uint<8>[2] end : end == [0UB,0UB];

       method get_bytes = uint<8>[]:
       {
         var bytes = uint<8>[]();
         for (d in data)
           try bytes += d.pair;
           catch if E_elem { bytes += [d.single]; }
         return bytes;
       }
     };

   As you can see, the trick is to use an unbounded array of `Data'
   values, each of which is either a pair of bytes the second of which
   is not zero, or a single non-zero byte.  The finalizing two zero
   bytes follow.  This approach has several problems: it is difficult to
   understand at first sight, it is bulky, inefficient and it requires
   an additional method (get_bytes) to construct the desired array from
   the rather more convoluted underlying data structure.

b) The JoJo Diff format also uses variable sequences of bytes, but this
   time each sequence is finalized by one of several possible escapse
   sequences of two bytes, which are themselves _not_ part of the
   resulting sequence.

   This is how the jojodiff.pk pickle implements these sequences:

     type Jojo_Datum =
       union
       {
         uint<8>[2] escaped_escape : escaped_escape == [JOJO_ESC, JOJO_ESC];
         uint<8>[2] pair : pair[0] == JOJO_ESC
             && !(pair[1] in [JOJO_MOD, JOJO_INS, JOJO_DEL, JOJO_EQL, 
JOJO_BKT]);
         uint<8> value : value != JOJO_ESC;
       };


    type Jojo_Bytes =
      struct
      {
        Jojo_Datum[] data : data'length > 0;

        method get_count = Jojo_Offset:
        {
          var len = 0UL#B;

          for (d in data)
            len += !(d.pair ?! E_elem) ? 2#B : 1#B;
          return len;
        }
      };

   This uses a similar technique than the BER case.  Again, this is
   cumbersome, not trivial to understand, and again it builds convoluted
   data structures that have to be turned into the expected simple form
   of a sequence of elements by a method.

There are many more similar such cases.  In order to better support
these, I am proposing to a fourth kind of array type:

4. Array types bounded by predicate, like

     int<32>[lambda (int[] a, int e) int: { return e != 0; }]

   Where the index is any expression that evaluates to a closure with
   prototype:

     (ELEMTYPE[],ELEMTYPE)int<32>

   The predicate gets called for each value that is to be appended to
   the array when mapping or constructing an array value.   The first
   argument is the so-far mapped/constructed array.  The first time the
   predicate gets called this array is empty.  The second argument is
   the value that is a candidate to be added as an element.

   If the predicate returns a positive number the candidate value gets
   added to the array and the construction/mapping process continues
   with a new candidate value.

   If the predicate returns 0 the candidate value gets added to the
   array and the construction/mapping process is stopped.

   If the predicate returns -N the candidate value gets added to the
   array, N elements get trimmed from the right of the array, and the
   construction/mapping process is stopped.

The previous two data structures can now be expressed in a much better
way.  A BER variable byte sequence becomes:

  fun ber_is_byte_array = (uint<8>[] arr, uint<8> elem) int<64>:
  {
    if (arr'length > 0 && arr[arr'length-1] == 0UB && elem == 0UB)
      /* The trailer zero bytes are part of the array.  */
      return 0;
    else
      /* Continue mapping/constructing.  */
      return 1;
  }

  type BER_Variable_Contents = uint<8>[ber_is_byte_array];

Whereas the Jojo Diff variable sequence of bytes becomes something like:

  fun jojo_is_bytes = (uint<8>[] arr, uint<8> elem) int<64>:
  {
    if (arr'length > 1
        && [arr[arr'length - 1],elem] in [[JOJO_ESC, JOJO_INS],
                                          [JOJO_ESC, JOJO_MOD],
                                          ...])
      {
        /* The escape sequence is _not_ part of the
           resulting array.  */
        return -2;
      }
    else
     /* Continue the mapping/construction.  */
     return 1;
  }

  type Jojo_Bytes = uint<8>[jojo_is_bytes];

An interesting aspect of this is that nothing prevents the predicate
functions to alter the array mapped/constructed so far, do all sort of
stuff based on global variables, do their own exploratory mapping, and
other crazy stuff... guaranteed fun and sheer power 8-)

No additional syntax is required, and I think it fits well with the
already used notion of "the bounding varies depending on the kind of
expression one specifies in the [XXX] part of the array type specifier".

Comments, thoughts?

[Prev in Thread]

Current Thread

[Next in Thread]

[RFC] Array types bounded by predicate, Jose E. Marchesi <=
- Re: [RFC] Array types bounded by predicate, Jose E. Marchesi, 2024/09/16

Prev by Date: [COMMITTED] poke: add field "optioness" info to .info type
Next by Date: Re: [RFC] Array types bounded by predicate
Previous by thread: [COMMITTED] poke: add field "optioness" info to .info type
Next by thread: Re: [RFC] Array types bounded by predicate
Index(es):
- Date
- Thread