Re: [bug-gawk] 4.7 Defining Fields by Content
From: Aharon Robbins
Subject: Re: [bug-gawk] 4.7 Defining Fields by Content
Date: Mon, 21 Mar 2016 06:25:09 +0200
User-agent: Heirloom mailx 12.5 6/20/10
The cited RFC allows embedded newlines in fields; I think they have to be
inside quotes but am not sure.
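
For illustration only, here is a minimal sketch of how a quote-aware CSV reader treats such a record. It uses Python's standard csv module rather than gawk, the sample data is made up, and escape_controls is a hypothetical helper showing the "convert to escaped codes" pass suggested in the quoted message below. It is not a statement of what any gawk extension does.

```python
import csv
import io

# Made-up sample data: the second field of the first data record holds a
# line break and a doubled double quote, both inside a quoted field.
data = 'id,comment\r\n1,"first line\r\nsecond ""quoted"" line"\r\n2,plain\r\n'

# newline="" stops the text layer from translating line endings, so the
# csv module sees the embedded line break exactly as written.
rows = list(csv.reader(io.StringIO(data, newline="")))


def escape_controls(field):
    """Hypothetical helper: rewrite raw control characters as escape codes.

    The backslash is escaped first so the result stays unambiguous.
    """
    return (field.replace("\\", "\\\\")
                 .replace("\r", "\\r")
                 .replace("\n", "\\n")
                 .replace("\t", "\\t"))


# After this pass every record fits on one physical line again, which is
# what a line-oriented tool expects.
flattened = [[escape_controls(f) for f in row] for row in rows]

for row in flattened:
    print(row)
```

The point of the sketch is that the three physical lines of the first data record come back as a single record with two fields, which splitting on line ends alone cannot do.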
Arnold
> Date: Tue, 15 Mar 2016 08:09:54 +1000
> From: Miriam English <address@hidden>
> To: address@hidden
> Subject: Re: [bug-gawk] 4.7 Defining Fields by Content
>
> Is it "normal" for csv files to have embedded linefeeds? All the csv
> files I've seen with special characters inside their fields have them
> written as escaped codes (such as \t, \n, \f, and so on) which are
> replaced with the actual characters on use. If raw control characters do
> exist inside fields of csv files then wouldn't a pass through to convert
> them to escaped codes solve that problem?
>
> Cheers,
>
> - Miriam
>
> Andrew J. Schorr wrote:
> > On Mon, Mar 14, 2016 at 09:40:14AM +0100, Marco Coletti wrote:
> >> This is just short of what is needed to correctly parse RFC 4180
> >> formatted data, in that it does not account for double quotes
> >> appearing as part of a field.
> >
> > But even with the enhanced FPAT you propose, unless I'm confused,
> > it still won't work with records containing embedded linefeed
> > characters. We have discussed in the past developing a CSV
> > input parser extension, but nobody has implemented it yet.
> > If you'd like to develop it, we would welcome the contribution
> > of such an extension, possibly for the gawkextlib project if not
> > appropriate for inclusion in mainline gawk.
> >
> > Regards,
> > Andy
> --
> As artists, it would be a hell of a lot easier if our audiences were
> more tolerant of our penchant for boring them.
> - Cory Doctorow