From: | Juergen Sauermann |
Subject: | Re: [Bug-apl] Suggestion for Quad-RE |
Date: | Thu, 12 Oct 2017 14:55:57 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux i686; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 |
Hi Elias,

see below.

/// Jürgen

On 10/12/2017 09:13 AM, Elias Mårtenson wrote:
Not exactly. It is true that libpcre returns a list of matches in terms of the position of each match in the subject string B. However, any two matches are either disjoint or one match is contained in the other. This containment relation defines a partial order between the matches, which is most conveniently described by a tree. In that tree one RE, say RE1, is a child of another RE, RE2, if the substring of B corresponding to RE1 is contained in the substring of B that corresponds to RE2.

The question is then: shall ⎕RE simply return the array of matches (which is what your implementation did), or shall ⎕RE return the matches as a tree? This is the same question as: shall the tree be represented as a simple vector of nodes (corresponding to an APL vector of some kind), or shall it be represented as a recursive node-properties + children structure (corresponding to a nested APL value)?

The vector of nodes and the nested APL value are equivalent in describing the tree. However, converting the nested tree structure to a vector of nodes is much simpler (in APL) than the other way around, because converting a node vector to the tree involves a lot of comparisons which are quite lightweight but extremely ugly in APL. That is why I decided to return the tree and not the vector of nodes.

Now, to have an option that drops the first element means to have an option that returns the nodes of the result tree except its root node. Although technically possible, this sounds very arbitrary to me. It may suit a particular use case, but it does not, IMHO, deserve a special flag. I could also create a use case where it makes sense that only every second node of the tree is returned, for example when matching some name=value pairs where I am only interested in the values and not the names.
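To illustrate the asymmetry between the two representations described above, here is a minimal sketch in Python (not APL, and not the actual ⎕RE implementation). The function names `build_tree` and `flatten` and the `(start, end)` span layout are hypothetical, chosen only for this example. It assumes, as stated above, that any two matches are either disjoint or nested.

```python
# Hypothetical sketch: names and data layout are illustrative only and are
# not taken from the ⎕RE implementation or from libpcre.

def build_tree(spans):
    """Turn a flat list of (start, end) spans into a containment tree.

    Assumes any two spans are either disjoint or nested.
    Each node is a tuple (span, children).
    """
    # Sort by start ascending and by end descending, so that an enclosing
    # span always comes before the spans it contains.
    ordered = sorted(spans, key=lambda s: (s[0], -s[1]))
    roots = []
    stack = []  # chain of ancestors that may still contain later spans
    for span in ordered:
        node = (span, [])
        # Drop ancestors that do not contain this span (the comparisons
        # that make this direction the awkward one).
        while stack and not (stack[-1][0][0] <= span[0]
                             and span[1] <= stack[-1][0][1]):
            stack.pop()
        if stack:
            stack[-1][1].append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


def flatten(nodes):
    """Turn the tree back into a flat node vector (depth-first order)."""
    result = []
    for span, children in nodes:
        result.append(span)
        result.extend(flatten(children))
    return result
```

Going from the tree to the node vector needs no comparisons at all, while the other direction needs the sorting and containment tests. In this sketch, dropping the root (the equivalent of 1↓ on the node vector) would be `flatten(tree)[1:]`.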
I am not entirely against a flag that goes in that direction, but I believe that flag should determine whether the tree is returned (default) or the node vector of the tree (if the flag is given). Unfortunately that flag, even though it is far more consistent with the structure of the ⎕RE result than 1↓, does not remove the need for your 1↓, because the result would still contain the top-level match (= the root of the tree).

Not necessarily. It could also be a boundary condition of your match that you only want to be satisfied, no matter how. REs like [A-Z][a-z][0-9] are often used that way.

Not sure if that should be so, but I am not too familiar with libpcre2 either. I would naively expect that an RE of the form A|B would return either a match for A or a match for B, but not both. man pcre2pattern says:

Vertical bar characters are used to separate alternative patterns. For example, the pattern gilbert|sullivan matches either "gilbert" or "sullivan". Any number of alternatives may appear, and an empty alternative is permitted (matching the empty string). The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used.

My understanding of this is that, for example, B is ignored if A matches. That implies that the matching of B is not even performed, so "" (for no match) would be incorrect because B could match as well.
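The left-to-right behaviour quoted from the man page can be tried out with a small sketch. It uses Python's `re` module rather than libpcre2, on the assumption that both engines handle ordered alternation the same way in this simple case. The helper name `alternation_groups` is made up for the example.

```python
import re

def alternation_groups(pattern, subject):
    """Return (whole match, capture groups) of the first match, or None.

    Uses Python's re module as a stand-in for libpcre2 (an assumption);
    a group belonging to an alternative that was not used is None.
    """
    m = re.search(pattern, subject)
    if m is None:
        return None
    return m.group(0), m.groups()

# The first alternative succeeds, so the second one is never tried here,
# although "ab" would have matched at the same position.
whole, groups = alternation_groups(r"(a)|(ab)", "ab")
print(whole, groups)
```

In this sketch the unused alternative is reported as `None` and not as an empty string, which keeps "was not tried" distinguishable from "matched the empty string".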