txr-users

Re: [Txr-users] matching blank lines


From: Kaz Kylheku
Subject: Re: [Txr-users] matching blank lines
Date: Wed, 24 Feb 2010 14:40:48 -0800

On Wed, Feb 24, 2010 at 2:15 PM, Daniel Lyons <address@hidden> wrote:
> On Wed, Feb 24, 2010 at 11:52:58AM -0800, Kaz Kylheku wrote:
>> > Thanks for txr! This is an immensely helpful program and I'm looking
>> > forward to using it in all sorts of situations where awk/sed/etc. are
>> > unpleasant.
>>
>> This program sucks now, but I have some ideas about improving it. :)
>
> I think it has a lot of potential. I am always looking for new
> utilities for processing text. I often have to convert data from
> obscure formats to CSV and general purpose languages often make it
> wordier than it has to be. Conversely, using sed and awk often
> produces unmaintainable code overly dependent on the shell for extra
> help. Txr isn't perfect but it does do a good job of straddling these
> two domains. Plus, with my Prolog background, I appreciate the
> unification-like behavior and the function semantics.
>
>> I see that you are trying to collect the code/price pairs like
>> A and 47.50 using a freeform collect.
>>
>> Unfortunately, in investigating this problem, I've found
>> that freeform is buggy. :(
>>
>> It looks like you might be expecting @(freeform "\n\n") to stop
>> when it sees two newlines, but in fact "\n\n" simply means
>> that the character sequence "\n\n" will separate the lines
>> when they are considered to be a big virtual string.
>
> I guess I'm having trouble visualizing the difference. The two sound
> almost the same to me.

"\n\n" simply specifies that the remaining lines of input are glued
together into one giant string, with "\n\n" appended after each line.
You can use any string, so if the remaining input is

  hello
  world

then @(freeform "$") will cause it to be treated as "hello$world$".
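
To make the gluing concrete, here is a rough Python sketch of the behavior described above. It is not txr code and not how txr implements freeform; the function name freeform_join is made up for illustration. It only shows that the string argument is appended after each line rather than acting as a stopping condition.

```python
def freeform_join(lines, terminator="\n"):
    """Glue lines into one string, appending the terminator after each line.

    Hypothetical helper: it imitates the joining described for @(freeform),
    it is not part of txr.
    """
    return "".join(line + terminator for line in lines)


# The two-line input from the example, glued with "$".
print(freeform_join(["hello", "world"], "$"))  # prints hello$world$
```

With "\n\n" as the argument, the same input would become "hello\n\nworld\n\n", so a match against the big string sees a doubled newline after every line, not only at blank lines.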

> So, the method you're describing will _eventually_ work?

Yes; I'm treating this as the highest-priority problem,
so ``eventually'' should be ``soon''.  I will be sure to add
regression test cases to exercise freeform.


> Is this how you'd approach the problem with txr?

I might, but it seems like your pairs of items exhibit
line-oriented regularity. It doesn't look like the situation
requires free-form matching across multiple lines;
this looks like it could be solved by a line-oriented collect.

E.g., consider this very simplified query:

@(collect)
@a @b @c
@(collect)
@/ */@x @y
@(until)

@(end)
@(end)

It collects a triplet of items separated by spaces,
followed by a nested collect of pairs from separate lines.
The nested collect stops at a blank line, thanks to the @(until).

If we run it on this input:

1 2 3
  a 1
  b 2

4 5 6
  c 6
  d 7

We get:

a[0]="1"
a[1]="4"
b[0]="2"
b[1]="5"
c[0]="3"
c[1]="6"
x_0[0]="a"
x_1[0]="b"
x_0[1]="c"
x_1[1]="d"
y_0[0]="1"
y_1[0]="2"
y_0[1]="6"
y_1[1]="7"
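
For comparison, the control flow of that query can be approximated in Python. This is only a sketch under assumptions: the regular expressions stand in for the @a @b @c and @/ */@x @y lines and are stricter than txr's own matching, and the names HEADER, PAIR, SAMPLE and collect_records are invented for the example.

```python
import re

# Approximations of the two pattern lines in the query (assumed, not txr's rules).
HEADER = re.compile(r"^(\S+) (\S+) (\S+)$")   # stands in for: @a @b @c
PAIR = re.compile(r"^ *(\S+) (\S+)$")         # stands in for: @/ */@x @y

SAMPLE = ["1 2 3", "  a 1", "  b 2", "", "4 5 6", "  c 6", "  d 7"]


def collect_records(lines):
    """Outer collect: find header lines; inner collect: gather pairs until a blank line."""
    records = []
    i = 0
    while i < len(lines):
        header = HEADER.match(lines[i])
        if header is None:
            i += 1          # a collect skips lines that do not match
            continue
        i += 1
        pairs = []
        # Inner collect; the blank line plays the role of the @(until) clause.
        while i < len(lines) and lines[i] != "":
            pair = PAIR.match(lines[i])
            if pair is not None:
                pairs.append(pair.groups())
            i += 1
        records.append((header.groups(), pairs))
    return records


for (a, b, c), pairs in collect_records(SAMPLE):
    print(a, b, c, pairs)
```

Each record holds one a/b/c triplet together with its list of x/y pairs, which is the same grouping the bindings above express with nested indices.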

Maybe this can be a recipe for a workaround for you.

Cheers!



