Re: [Chicken-users] irregex and callbacks


From: Alex Shinn
Subject: Re: [Chicken-users] irregex and callbacks
Date: Thu, 2 Oct 2014 11:18:55 +0900

On Thu, Oct 2, 2014 at 8:22 AM, Andy Bennett <address@hidden> wrote:
> Hi,
>
> I am trying to use the browscap.org database to do HTTP User Agent
> Classification.
>
> This database consists of a (large) number of regexes and data about the
> browser should the user agent string match that regex.
>
> What I want to do is compile all the regexes together and be able to add
> annotations such that I can match a UA string against this regex and get
> back an idea of which pattern matched so that I can look up the
> appropriate data.
>
> i.e. I have a data structure keyed by "pattern" and I want my input
> to be something that matches that pattern rather than the pattern itself.
>
> It seems that for this I need "Callbacks" but I don't really need full
> callback support: I don't necessarily need to call an actual procedure
> and I don't need to replace anything: I'm not doing a search/replace,
> just a match. "All" I really need is to be able to annotate the FSM node
> that matched with a little bit of data that I can get back.


You could use submatch info and check which submatch matched.
This would keep the matching as a single regexp, but you'd then
need a linear scan to see which submatch succeeded.

(define (irregex-merge-vector vec)
  (irregex `(or ,@(map (lambda (x) `(=> alt ,x)) (vector->list vec)))))

(define ua-vec ...)
(define all-ua-rx (irregex-merge-vector ua-vec))

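(define (maybe-match-ua ua)
  (cond
    ((irregex-match all-ua-rx ua)
     => (lambda (m)
          ;; Submatch 0 is the whole match, so element i of ua-vec is
          ;; submatch i+1 (assuming the patterns have no submatches of their own).
          (vector-ref ua-vec
                      (- (irregex-match-numeric-index 'match-ua m '(alt)) 1))))
    (else
      #f)))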

although I believe irregex-match-numeric-index is not exported.
It's worth having a utility for this idiom.
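
One possible shape for such a utility, as a rough sketch rather than anything irregex ships: it uses numbered `$` submatches instead of the named `(=> alt ...)` ones and relies only on `irregex-match-num-submatches` and `irregex-match-substring`, which I believe are exported. The names `merge-patterns`, `matched-alternative`, and `match-index` are hypothetical, the patterns are assumed to be SREs with no submatches of their own, and the import line assumes CHICKEN 5.

```scheme
;; Sketch only: the helper names here are hypothetical, not part of irregex.
(import (chicken irregex))   ; CHICKEN 5; on CHICKEN 4 try (use irregex)

;; Wrap each SRE pattern in its own numbered submatch and join them with `or`.
;; Assumes the patterns contain no submatches of their own, so that
;; submatch i+1 corresponds to element i of the vector.
(define (merge-patterns vec)
  (irregex `(or ,@(map (lambda (x) `($ ,x)) (vector->list vec)))))

;; Linear scan for the first submatch that took part in the match.
;; Returns a 0-based index into the pattern vector, or #f if none did.
(define (matched-alternative m)
  (let ((n (irregex-match-num-submatches m)))
    (let loop ((i 1))
      (cond ((> i n) #f)
            ((irregex-match-substring m i) (- i 1))
            (else (loop (+ i 1)))))))

;; Match STR against the merged regexp and report which pattern matched.
(define (match-index rx str)
  (cond ((irregex-match rx str) => matched-alternative)
        (else #f)))

;; Example patterns (made up for illustration).
(define example-patterns
  (vector '(: "Mozilla" (* any))
          '(: "curl/" (+ numeric))
          '(: "Wget" (* any))))

(define example-rx (merge-patterns example-patterns))

(match-index example-rx "curl/7")
```

The returned index can then be used with `vector-ref` on a parallel vector of browser data. The scan is still linear in the number of patterns, but it only inspects match data instead of re-running a regexp per pattern.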

-- 
Alex



> Is this something that would be easy to add to irregex or can anyone
> suggest any other alternative implementations that I might consider?
>
> The PHP library that uses this browscap database (apparently) just does
> a linear search by trying to match each regex in turn but I'd rather
> keep that approach as a last resort.
>
> Thanks for your help and any tips you can offer.
>
> Regards,
> @ndy
>
> --
> address@hidden
> http://www.ashurst.eu.org/
> 0x7EBA75FF


