Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
From: Jose E. Marchesi
Subject: Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
Date: Mon, 20 Feb 2023 14:43:47 +0100
User-agent: Gnus/5.13 (Gnus v5.13)

> Hi Jose.
>
> On Fri, Feb 17, 2023 at 12:19:51PM +0100, Jose E. Marchesi wrote:
>>
>> >
>> > What about having a new compile-time type for matched entities?
>> > It would be useful in regular expression matching for both strings
>> > and arrays of characters.
>> >
>> > Something like this:
>> >
>> > ```poke
>> > var m1 = "Hello pokers!" ~ /[hH]ello/,
>> >     m2 = [0x00UB, 0x11UB, 0x22UB] ~ /\x11\x22/;
>> >
>> > if (m1)
>> >   {
>> >     printf "matched at index %v and offset %v\n", m1.index_begin,
>> >            m1.offset_begin;
>> >     assert ("Hello pokers!"[m1.index_begin:m1.index_end] == "Hello");
>> >   }
>> > else
>> >   {
>> >     assert (m1.index_begin ?! E_elem);
>> >     assert (m1.offset_begin ?! E_elem);
>> >   }
>> > ```
>> >
>> > We can use other fields to give access to sub-groups.
>> >
>> > We can take an approach similar to the `Exception` struct, but for
>> > `Matched`. The compiler can cast it to a boolean when necessary.
>>
>> The idea is interesting. But I don't like the part of changing the
>> semantics of `if' like this: it is not orthogonal.
>>
>> Note that the syntactic construction that uses Exception only works with
>> exceptions:
>>
>> try STMT; catch if EXCEPTION { ... }
>>
>> If we could come up with a syntactic construction for regular
>> expression matching, then it would be better IMO.
>>
>>
>
>
> What about this syntax:
>
> ```poke
> var matched_p = "Hello pokers!" ~? /[hH]ello/,
> matchinfo = "Hello pokers!" ~ /[hH]ello/;
>
> assert (matched_p isa int<32>);
> assert (matchinfo isa Matched);
>
> if (matchinfo.matched_p) { ... }
> ```

Hmm... that has the disadvantage of having to match twice.

It seems to me we could make use of exceptions by having ~ return a
Match struct and raise an E_nomatch exception when there is no match.
Then we can use the normal ?! operator, try-until and try-catch to
check for the no-match case.
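
To make the intended semantics concrete, here is a rough model in Python
rather than Poke, since the ~ operator does not exist yet. The names
`NoMatch`, `Match`, `match` and `describe` are made up for illustration:
`NoMatch` stands in for E_nomatch, and the field names follow the
index_begin/index_end fields suggested earlier in the thread. It assumes
~ finds the first match anywhere in the string; an anchored match would
work the same way.

```python
import re
from dataclasses import dataclass


class NoMatch(Exception):
    """Stands in for the proposed E_nomatch exception (hypothetical name)."""


@dataclass
class Match:
    """Stands in for the proposed Match struct; fields are assumptions."""
    index_begin: int
    index_end: int


def match(pattern: str, subject: str) -> Match:
    """Model of `subject ~ /pattern/`: one matching pass, raise on failure."""
    m = re.search(pattern, subject)
    if m is None:
        raise NoMatch(pattern)
    return Match(m.start(), m.end())


def describe(subject: str) -> str:
    """Model of a try-catch around the match: no separate boolean test."""
    try:
        m = match(r"[hH]ello", subject)
    except NoMatch:
        return "no match"
    return subject[m.index_begin:m.index_end]
```

The point is that the subject is scanned only once: the caller either
gets the match information or handles the exception, so there is no need
for a second operator that returns just a boolean.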
>
> Now let's talk about regexp searching!
>
> ```poke
> var sr10 = /[nN]eedle/ $ "... needle in a haystack ...",
> sr11 = /[nN]eedle/ $ (byte[] @ 10#B);
> ```
>
> We can also translate struct patterns like `{ S | a == 0, b < 0, c == 15 }`
> to a regexp pattern and a bunch of constraints. Consider:
>
> ```poke
> set_endian (ENDIAN_LITTLE);
>
> type S = struct
> {
> int<8> a;
> int<32> b;
> int<9> c;
> };
>
> var search_results = { S | a == 0, b < 0, c == 15 } $ (byte[] @ 0#B),
> sr2 = { S | a == 0, b < 0, c == 15 }
> $ [0xaaUB, 0x55UB,
> 0x00UB,0xffUB,0xffUB,0xffUB,0xffUB,0x0fUB,0UB];
>
> // [0x00UB,0xffUB,0xffUB,0xffUB,0xffUB,0x0fUB,0UB] is the encoding of
> // S {a=0UB, b=-1, c=15}.
> ```
>
> That can be translated to something like this:
>
> ```poke
>
> var search_results = lambda (byte[] bytes) SearchResult:
>   {
>     var tmp = open ("*__something*"),
>         res = SearchResult {};
>
>     try {
>       var s = /\x00\(....\)\x0f\x00/ $ bytes,
>           sub = s.subgroups;
>
>       // sub[0] is the whole match.
>
>       var b = sub[1].offset_begin,
>           e = sub[1].offset_end;
>
>       byte[e-b] @ tmp : 0#B = bytes[b:e];
>       if ((int<32> @ tmp : 0#B) < 0)
>         {
>           // Found! fill in the `res` ...
>         }
>     } catch (Exception ex) {
>       close (tmp);
>       raise ex;
>     }
>
>     close (tmp);
>     return res;
>   } (byte[] @ 0#B);
> ```
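
The two-step idea quoted above (a byte-level regexp as a prefilter, then
the remaining constraints checked on the captured group) can be modelled
like this. Again this is Python, not Poke, and only a sketch: `PATTERN`
and `search` are made-up names, the pattern is the one from the quoted
translation, and the field `b` is decoded as a little-endian signed
32-bit integer as in the quoted example.

```python
import re
import struct

# Byte pattern from the quoted translation: a == 0, four bytes of b
# (captured), then the two bytes that encode c == 15.
# re.DOTALL lets "." match any byte, including 0x0a.
PATTERN = re.compile(rb"\x00(....)\x0f\x00", re.DOTALL)


def search(data: bytes) -> list[int]:
    """Return the start offsets of candidates that also satisfy b < 0."""
    hits = []
    for m in PATTERN.finditer(data):
        # Decode the captured bytes as a little-endian int<32>.
        (b,) = struct.unpack("<i", m.group(1))
        if b < 0:  # the constraint the regexp cannot express
            hits.append(m.start())
    return hits
```

Note that `finditer` only reports non-overlapping matches, so a real
implementation would have to decide how overlapping candidates are
handled.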
>
>
> I guess using a regexp library may improve the searching performance.
> This just came to my mind. We can discuss more :)
>
>
> Regards,
> Mohammad-Reza