[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[RFC] Array types bounded by predicate
From: |
Jose E. Marchesi |
Subject: |
[RFC] Array types bounded by predicate |
Date: |
Mon, 16 Sep 2024 21:26:50 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) |
Hello people!
First of all, I will abuse your patience with a little recap. Currently
the Poke language supports three kind of array types:
1. "Unbounded" array types, like
int<32>[]
When mapped, values get added to the array until either EOF happens
or a constraint expression fails. Suppose for example:
(poke) type Printable_Char = struct { uint<8> c : c >= 32 && c <= 126; }
(poke) Printable_Char[] @ 0#B
The resulting array will contain all printable chars from the
beginning of the IO space to either the end or to where a
non-printable char is found. The first non-printable char (the
Printable_Char whose mapping results in E_constraint) is not included
in the array.
Constructing an unbounded array results in an empty array:
(poke) Printable_Char[]()
[]
2. Array types bounded by number of elements, like
int<32>[10]
Where the index can be any Poke expression that evaluates to an
integral value.
When mapped, exactly that number of values are read from the IO space
to conform the elements of the new array. If an array of the
specified numer of elements cannot be mapped, i.e. if the bound is
not satisfied, then an exception gets raised.
Constructing an array bounded by number of elements results in an
array with that number of elements:
(poke) int<32>[5]()
[0,0,0,0,0]
(poke) int<32>[5](100)
[100,100,100,100,100]
3. Array types bounded by size, like
int<32>[16#B]
Where the index can be any Poke expression that evaluates to an
offset value, which is the size of the whole array, not of each
element.
When mapped, an exact number values are read from the IO space to
conform the elements of the new array, so the total size of the array
equals the size bound. If no exact number of values can be mapped to
satisfy the bound an exception is raised.
Constructing an array bounded by size results in an array with the
number of elements that satisfy the bound:
(big:poke) int<32>[16#B]()
[0,0,0,0]
This works well.
However, consider the following two real-life situations:
a) The ASN1 BER encoding specify variable byte sequences, which are
basically streams of bytes that end with two consecutive zero bytes.
This is how the asn1-ber.pk pickle implements these sequences:
type BER_Variable_Contents =
struct
{
type Datum =
union
{
uint<8>[2] pair : pair[1] != 0UB;
uint<8> single : single != 0UB;
};
Datum[] data;
uint<8>[2] end : end == [0UB,0UB];
method get_bytes = uint<8>[]:
{
var bytes = uint<8>[]();
for (d in data)
try bytes += d.pair;
catch if E_elem { bytes += [d.single]; }
return bytes;
}
};
As you can see, the trick is to use an unbounded array of `Data'
values, each of which is either a pair of bytes the second of which
is not zero, or a single non-zero byte. The finalizing two zero
bytes follow. This approach has several problems: it is difficult to
understand at first sight, it is bulky, inefficient and it requires
an additional method (get_bytes) to construct the desired array from
the rather more convoluted underlying data structure.
b) The JoJo Diff format also uses variable sequences of bytes, but this
time each sequence is finalized by one of several possible escapse
sequences of two bytes, which are themselves _not_ part of the
resulting sequence.
This is how the jojodiff.pk pickle implements these sequences:
type Jojo_Datum =
union
{
uint<8>[2] escaped_escape : escaped_escape == [JOJO_ESC, JOJO_ESC];
uint<8>[2] pair : pair[0] == JOJO_ESC
&& !(pair[1] in [JOJO_MOD, JOJO_INS, JOJO_DEL, JOJO_EQL,
JOJO_BKT]);
uint<8> value : value != JOJO_ESC;
};
type Jojo_Bytes =
struct
{
Jojo_Datum[] data : data'length > 0;
method get_count = Jojo_Offset:
{
var len = 0UL#B;
for (d in data)
len += !(d.pair ?! E_elem) ? 2#B : 1#B;
return len;
}
};
This uses a similar technique than the BER case. Again, this is
cumbersome, not trivial to understand, and again it builds convoluted
data structures that have to be turned into the expected simple form
of a sequence of elements by a method.
There are many more similar such cases. In order to better support
these, I am proposing to a fourth kind of array type:
4. Array types bounded by predicate, like
int<32>[lambda (int[] a, int e) int: { return e != 0; }]
Where the index is any expression that evaluates to a closure with
prototype:
(ELEMTYPE[],ELEMTYPE)int<32>
The predicate gets called for each value that is to be appended to
the array when mapping or constructing an array value. The first
argument is the so-far mapped/constructed array. The first time the
predicate gets called this array is empty. The second argument is
the value that is a candidate to be added as an element.
If the predicate returns a positive number the candidate value gets
added to the array and the construction/mapping process continues
with a new candidate value.
If the predicate returns 0 the candidate value gets added to the
array and the construction/mapping process is stopped.
If the predicate returns -N the candidate value gets added to the
array, N elements get trimmed from the right of the array, and the
construction/mapping process is stopped.
The previous two data structures can now be expressed in a much better
way. A BER variable byte sequence becomes:
fun ber_is_byte_array = (uint<8>[] arr, uint<8> elem) int<64>:
{
if (arr'length > 0 && arr[arr'length-1] == 0UB && elem == 0UB)
/* The trailer zero bytes are part of the array. */
return 0;
else
/* Continue mapping/constructing. */
return 1;
}
type BER_Variable_Contents = uint<8>[ber_is_byte_array];
Whereas the Jojo Diff variable sequence of bytes becomes something like:
fun jojo_is_bytes = (uint<8>[] arr, uint<8> elem) int<64>:
{
if (arr'length > 1
&& [arr[arr'length - 1],elem] in [[JOJO_ESC, JOJO_INS],
[JOJO_ESC, JOJO_MOD],
...])
{
/* The escape sequence is _not_ part of the
resulting array. */
return -2;
}
else
/* Continue the mapping/construction. */
return 1;
}
type Jojo_Bytes = uint<8>[jojo_is_bytes];
An interesting aspect of this is that nothing prevents the predicate
functions to alter the array mapped/constructed so far, do all sort of
stuff based on global variables, do their own exploratory mapping, and
other crazy stuff... guaranteed fun and sheer power 8-)
No additional syntax is required, and I think it fits well with the
already used notion of "the bounding varies depending on the kind of
expression one specifies in the [XXX] part of the array type specifier".
Comments, thoughts?
- [RFC] Array types bounded by predicate,
Jose E. Marchesi <=