Re: [bug-gawk] 4.7 Defining Fields by Content


From: Aharon Robbins
Subject: Re: [bug-gawk] 4.7 Defining Fields by Content
Date: Mon, 21 Mar 2016 06:25:09 +0200
User-agent: Heirloom mailx 12.5 6/20/10

The cited RFC (RFC 4180) allows embedded newlines in fields; I think they
have to be inside double quotes but am not sure.
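
For reference, RFC 4180 does say that fields containing line breaks,
double quotes, or commas should be enclosed in double quotes, and that a
double quote inside a field is written as two double quotes. The sketch
below is only an illustration in Python (not gawk), using the standard
csv module; the sample data and the helper name parse_csv are made up
for this example.

```python
import csv
import io

# Made-up sample: the second field of record 1 contains a raw CRLF inside
# double quotes, plus doubled quotes ("") that stand for literal quotes.
data = 'id,comment\r\n1,"line one\r\nline two with ""quotes"""\r\n2,plain\r\n'

def parse_csv(text):
    # newline="" hands line breaks to the csv module unmodified, so a
    # break inside a quoted field stays part of that field.
    return list(csv.reader(io.StringIO(text, newline="")))

rows = parse_csv(data)
for row in rows:
    print(row)
```

The point of the sketch is that a reader splitting records on newlines
alone would see four lines here, while a quote-aware parser sees three
records.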

Arnold

> Date: Tue, 15 Mar 2016 08:09:54 +1000
> From: Miriam English <address@hidden>
> To: address@hidden
> Subject: Re: [bug-gawk] 4.7 Defining Fields by Content
>
> Is it "normal" for CSV files to have embedded linefeeds? All the CSV
> files I've seen with special characters inside their fields have them
> written as escaped codes (such as \t, \n, \f, and so on) which are
> replaced with the actual characters on use. If raw control characters do
> exist inside fields of CSV files, then wouldn't a preprocessing pass to
> convert them to escaped codes solve that problem?
>
> Cheers,
>
>       - Miriam
>
> Andrew J. Schorr wrote:
> > On Mon, Mar 14, 2016 at 09:40:14AM +0100, Marco Coletti wrote:
> >> This is just short of what is needed to correctly parse RFC 4180
> >> formatted data, in that it does not account for double quotes
> >> appearing as part of a field.
> >
> > But even with the enhanced FPAT you propose, unless I'm confused,
> > it still won't work with records containing embedded linefeed
> > characters. We have discussed in the past developing a CSV
> > input parser extension, but nobody has implemented it yet.
> > If you'd like to develop it, we would welcome the contribution
> > of such an extension, possibly for the gawkextlib project if not
> > appropriate for inclusion in mainline gawk.
> >
> > Regards,
> > Andy
> -- 
> As artists, it would be a hell of a lot easier if our audiences were
> more tolerant of our penchant for boring them.
>    - Cory Doctorow
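
The conversion pass suggested in the quoted message could look roughly
like the sketch below. It is a hedged illustration in Python, not an
existing gawk feature or extension: the escape table and the names
escape_field and flatten_csv are assumptions made for this example. Note
that the pass itself needs a quote-aware CSV parser to tell a line break
inside a field from a record separator, so it moves the parsing problem
out of awk rather than removing it.

```python
import csv
import io

# Hypothetical escape table. The backslash is escaped as well, so that the
# conversion can later be undone without ambiguity.
ESCAPES = {"\\": "\\\\", "\r": "\\r", "\n": "\\n", "\t": "\\t", "\f": "\\f"}

def escape_field(field):
    # Map each character independently, so an escaped backslash is never
    # re-processed by a later substitution.
    return "".join(ESCAPES.get(ch, ch) for ch in field)

def flatten_csv(text):
    """Re-emit CSV text with control characters inside fields escaped."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        writer.writerow([escape_field(f) for f in row])
    return out.getvalue()

sample = 'a,"two\nlines",c\n'
print(flatten_csv(sample), end="")
```

After such a pass every record occupies exactly one line, so a
line-oriented tool can read the result, provided the consumer converts
the escapes back when it uses the field values.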
