Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
From: Mohammad-Reza Nabipoor
Subject: Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
Date: Mon, 20 Feb 2023 02:48:15 +0100
Hi Jose.
On Fri, Feb 17, 2023 at 12:19:51PM +0100, Jose E. Marchesi wrote:
>
> >
> > What about having a new compile-time type for matched entities.
> > Both useful in regular expression matching for strings and array of
> > characters.
> >
> > Something like this:
> >
> > ```poke
> > var m1 = "Hello pokers!" ~ /[hH]ello/,
> >     m2 = [0x00UB, 0x11UB, 0x22UB] ~ /\x11\x22/;
> >
> > if (m1)
> >   {
> >     printf "matched at index %v and offset %v\n", m1.index_begin,
> >            m1.offset_begin;
> >     assert ("Hello pokers!"[m1.index_begin:m1.index_end] == "Hello");
> >   }
> > else
> >   {
> >     assert (m1.index_begin ?! E_elem);
> >     assert (m1.offset_begin ?! E_elem);
> >   }
> > ```
> >
> > We can use other fields to give access to sub-groups.
> >
> > We can take an approach similar to the `Exception` struct, but for `Matched`.
> > The compiler can cast it to boolean when necessary.
>
> The idea is interesting. But I don't like the part of changing the
> semantics of `if' like this: it is not orthogonal.
>
> Note that the syntactic construction that uses Exception only works with
> exceptions:
>
> try STMT; catch if EXCEPTION { ... }
>
> If we could come up with a syntactic construction for regular expression
> matching, then it would be better IMO.
>
>
What about this syntax:
```poke
var matched_p = "Hello pokers!" ~? /[hH]ello/,
    matchinfo = "Hello pokers!" ~ /[hH]ello/;
assert (matched_p isa int<32>);
assert (matchinfo isa Matched);
if (matchinfo.matched_p) { ... }
```
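For comparison only, the split between a truth-value test (`~?`) and a full match-information value (`~`) can be modelled with Python's `re` module. This is a sketch under assumptions: `Matched`, `match_info` and `matched_p` are hypothetical names that mirror the fields proposed in this thread, and since Python has no counterpart to raising `E_elem` on a failed match, absent positions are represented as `None`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Matched:
    """Hypothetical analogue of the proposed poke `Matched` value."""
    matched_p: bool
    index_begin: Optional[int] = None  # None stands in for raising E_elem
    index_end: Optional[int] = None
    # subgroups[0] is the whole match, subgroups[i] is the i-th group.
    subgroups: List[Tuple[int, int]] = field(default_factory=list)


def match_info(subject: str, pattern: str) -> Matched:
    """Analogue of `subject ~ /pattern/`: returns full match information."""
    m = re.search(pattern, subject)
    if m is None:
        return Matched(matched_p=False)
    groups = [m.span(i) for i in range(m.re.groups + 1)]
    return Matched(True, m.start(), m.end(), groups)


def matched_p(subject: str, pattern: str) -> bool:
    """Analogue of `subject ~? /pattern/`: returns only a truth value."""
    return re.search(pattern, subject) is not None


info = match_info("Hello pokers!", r"[hH]ello")
assert matched_p("Hello pokers!", r"[hH]ello")
assert "Hello pokers!"[info.index_begin:info.index_end] == "Hello"
```

Keeping the boolean form separate means callers that only need a yes/no answer never build the subgroup list, which is one possible motivation for having two operators instead of an implicit cast in `if`.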
Now let's talk about regexp searching!
```poke
var sr10 = /[nN]eedle/ $ "... needle in a haystack ...",
    sr11 = /[nN]eedle/ $ (byte[] @ 10#B);
```
We can also translate struct patterns like `{ S | a == 0, b < 0, c == 15 }`
into a regexp pattern plus a set of constraints that the regexp alone cannot
express (like `b < 0`). Consider:
```poke
set_endian (ENDIAN_LITTLE);
type S = struct
  {
    int<8> a;
    int<32> b;
    int<9> c;
  };
var search_results = { S | a == 0, b < 0, c == 15 } $ (byte[] @ 0#B),
    sr2 = { S | a == 0, b < 0, c == 15 }
          $ [0xaaUB, 0x55UB, 0x00UB, 0xffUB, 0xffUB, 0xffUB, 0xffUB, 0x0fUB, 0UB];
// [0x00UB, 0xffUB, 0xffUB, 0xffUB, 0xffUB, 0x0fUB, 0UB] is the encoding of
// S {a=0UB, b=-1, c=15}.
```
That can be translated to something like this:
```poke
var search_results = lambda (byte[] bytes) SearchResult:
  {
    var tmp = open ("*__something*"),
        res = SearchResult {};

    try
      {
        var s = /\x00\(....\)\x0f\x00/ $ bytes,
            sub = s.subgroups;
        // sub[0] is the whole match; sub[1] holds the bytes of `b`.
        var b = sub[1].index_begin,
            e = sub[1].index_end;

        // Copy the bytes of `b` into the scratch IOS and decode them.
        byte[e - b] @ tmp : 0#B = bytes[b:e];
        if ((int<32> @ tmp : 0#B) < 0)
          {
            // Found! Fill in `res` ...
          }
      }
    catch (Exception ex)
      {
        close (tmp);
        raise ex;
      }
    close (tmp);
    return res;
  } (byte[] @ 0#B);
```
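The same two-step translation (regexp for the fixed fields, explicit check for the rest) can be tried outside poke. The following Python sketch is an illustration under assumptions, not part of the patch: `search_s` and `S_PATTERN` are hypothetical names, the pattern is hard-coded for `{ S | a == 0, b < 0, c == 15 }` in little endian, and, like the regexp above, it treats the last byte of the 9-bit `c` as a whole `\x00`. The lookahead lets overlapping candidates be reported, and `re.DOTALL` makes `.` also match a 0x0a byte.

```python
import re
from typing import List, Tuple

# Hypothetical regexp for { S | a == 0, b < 0, c == 15 } (little endian):
#   a == 0  -> b"\x00"
#   b       -> captured as 4 arbitrary bytes
#   c == 15 -> b"\x0f\x00"
# The zero-width lookahead reports overlapping candidates;
# DOTALL lets "." match the byte 0x0a as well.
S_PATTERN = re.compile(rb"(?=\x00(.{4})\x0f\x00)", re.DOTALL)


def search_s(data: bytes) -> List[Tuple[int, int]]:
    """Return (byte offset, b) for every candidate that also has b < 0."""
    results = []
    for m in S_PATTERN.finditer(data):
        b = int.from_bytes(m.group(1), "little", signed=True)
        if b < 0:  # the constraint the regexp cannot express
            results.append((m.start(), b))
    return results


data = bytes([0xAA, 0x55, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0x0F, 0x00])
print(search_s(data))  # [(2, -1)]
```

On the `sr2` bytes from the example, the only candidate starts at offset 2 and decodes to `b = -1`, which satisfies `b < 0`.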
I guess relying on a regexp library could improve search performance.
This just came to my mind. We can discuss more :)
Regards,
Mohammad-Reza