poke-devel

Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression


From: Mohammad-Reza Nabipoor
Subject: Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
Date: Mon, 20 Feb 2023 02:48:15 +0100

Hi Jose.

On Fri, Feb 17, 2023 at 12:19:51PM +0100, Jose E. Marchesi wrote:
> 
> >
> > What about having a new compile-time type for matched entities.
> > Both useful in regular expression matching for strings and array of
> > characters.
> >
> > Something like this:
> >
> > ```poke
> > var m = "Hello pokers!" ~ /[hH]ello/,
> >     m2 = [0x00UB, 0x11UB, 0x22UB] ~ /\x11\x22/;
> >
> > if (m)
> >   {
> >     printf "matched at index %v and offset %v\n", m.index_begin, m.offset_begin;
> >     assert ("Hello pokers!"[m.index_begin:m.index_end] == "Hello");
> >   }
> > else
> >   {
> >     assert (m.index_begin ?! E_elem);
> >     assert (m.offset_begin ?! E_elem);
> >   }
> > ```
> >
> > We can use other fields to give access to sub-groups.
> >
> > We can take an approach similar to `Exception` struct.  But for `Matched`.
> > Compiler can cast it to boolean when necessary.
> 
> The idea is interesting.  But I don't like the part of changing the
> semantics of `if' like this: it is not orthogonal.
> 
> Note that the syntactic construction that uses Exception only works with
> exceptions:
> 
>   try STMT; catch if EXCEPTION { ... }
> 
> If we could come up with a syntactic construction for regular expression
> matching, then it would be better IMO.
> 
> 


What about this syntax:

```poke
var matched_p = "Hello pokers!" ~? /[hH]ello/,
    matchinfo = "Hello pokers!" ~ /[hH]ello/;

assert (matched_p isa int<32>);
assert (matchinfo isa Matched);

if (matchinfo.matched_p) { ... }
```
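
To make the split between the two operators concrete, here is a rough Python sketch of the same idea (Python only because the proposed operators do not exist in poke yet).  `match_info` plays the role of `~` and `match_p` the role of `~?`.  The function names, the use of `None` for absent indices, and the choice of an unanchored search are assumptions made for illustration; the thread does not settle any of them.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Matched:
    """Hypothetical stand-in for the proposed `Matched` type."""
    matched_p: int               # 1 if the pattern matched, 0 otherwise
    index_begin: Optional[int]   # start index of the match, None if no match
    index_end: Optional[int]     # end index (exclusive), None if no match


def match_info(subject: str, pattern: str) -> Matched:
    """Analogue of `subject ~ /pattern/`: always returns a Matched record."""
    m = re.search(pattern, subject)  # assumption: unanchored search
    if m is None:
        return Matched(0, None, None)
    return Matched(1, m.start(), m.end())


def match_p(subject: str, pattern: str) -> int:
    """Analogue of `subject ~? /pattern/`: returns only a 0/1 integer."""
    return match_info(subject, pattern).matched_p
```

With this shape `if` keeps its usual meaning: it tests an integer (`matched_p`), and nothing has to be cast to a boolean implicitly.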


Now let's talk about regexp searching!

```poke
var sr10 = /[nN]eedle/ $ "... needle in a haystack ...",
    sr11 = /[nN]eedle/ $ (byte[] @ 10#B);
```

We can also translate struct patterns like `{ S | a == 0, b < 0, c == 15 }`
to a regexp pattern and a bunch of constraints.  Consider:

```poke
set_endian (ENDIAN_LITTLE);

type S = struct
  {
    int<8> a;
    int<32> b;
    int<9> c;
  };

var search_results = { S | a == 0, b < 0, c == 15 } $ (byte[] @ 0#B),
    sr2 = { S | a == 0, b < 0, c == 15 }
              $ [0xaaUB, 0x55UB, 0x00UB,0xffUB,0xffUB,0xffUB,0xffUB,0x0fUB,0UB];

// [0x00UB,0xffUB,0xffUB,0xffUB,0xffUB,0x0fUB,0UB] is the encoding of
// S {a=0UB, b=-1, c=15}.
```

That can be translated to something like this:

```poke

var search_results = lambda (byte[] bytes) SearchResult:
  {
    var tmp = open ("*__something*"),
        res = SearchResult {};

    try {
      var s = /\x00\(....\)\x0f\x00/ $ bytes,
          sub = s.subgroups;

      // sub[0] is the whole match.

      var b = sub[1].offset_begin,
          e = sub[1].offset_end;

      byte[e-b] @ tmp : 0#B = bytes[b:e];
      if ((int<32> @ tmp : 0#B) < 0)
      {
        // Found! fill in the `res` ...
      }
    } catch (Exception ex) {
      close (tmp);
      raise ex;
    }

    close (tmp);
    return res;
  } (byte[] @ 0#B);
```
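
The same two-phase idea (a regexp finds candidates, then the remaining constraints are checked on the decoded field) can be tried out today in Python.  This is only a sketch under assumptions: the name `search_struct` and the `(offset, b)` result shape are made up for illustration, and the `c == 15` constraint is hard-coded as the two bytes `\x0f\x00`, as in the regexp above.

```python
import re
import struct
from typing import List, Tuple

# Literal bytes for `a == 0` and `c == 15`, with the four bytes of `b`
# captured.  The lookahead makes every overlapping candidate visible, and
# DOTALL lets `.` match the byte 0x0a as well.
CANDIDATE = re.compile(rb"(?=\x00(.{4})\x0f\x00)", re.DOTALL)


def search_struct(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, b) for every candidate whose `b` satisfies b < 0."""
    results = []
    for m in CANDIDATE.finditer(data):
        (b,) = struct.unpack("<i", m.group(1))  # little-endian int<32>
        if b < 0:  # the constraint a plain regexp cannot express
            results.append((m.start(), b))
    return results
```

For the byte array used for `sr2` above, this reports a single hit at offset 2 with `b == -1`.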


I guess using a regexp library may improve the searching performance.
This just came to my mind.  We can discuss more :)


Regards,
Mohammad-Reza


