Re: locale for scanf backed out

octave-maintainers


From:	Rik
Subject:	Re: locale for scanf backed out
Date:	Tue, 26 Feb 2013 14:07:36 -0800

On 02/26/2013 01:03 AM, Pascal Dupuis wrote:
> 2013/2/26 Rik <address@hidden>:
>> 2/25/13
>>
>> All,
>>
>> I backed out the locale changeset for scanf.  Unfortunately, due to issues
>> in the build system you will need to manually remove
>> libinterp/interpfcn/file-io.cc-tst before tests will pass.
>>
>> --Rik
> Rik,
>
> I find this a bit unfair. There was an agreement that supporting
> locale on printf is not a good idea, nor is implementing something like
> the National Instruments extension to force the use of a decimal point.
I wouldn't have done it if I thought it unfair.  As Jordi has suggested,
the locale patch was primarily to help you resolve issues with CSV.  You
know how to build Octave and navigate Mercurial, so I would simply maintain
your patch for yourself locally until a better solution can be found.

Octave is really dedicated to matrix processing and not text processing.
Things like locale and character encoding, and even regular expressions,
are just not handled as well as in a more dedicated text processing language
like Perl or Python.
>
> OTOH, I can assure you that changing the locale used in a CSV file is
> anything but easy. CSV files contain strings and numbers intermixed
> with separators. What about separators, points, and commas inside
> strings? What about a string delimiter inside a string, like 'O''Reilly'?
> Some broken programs even quote every number!
It is really complicated, which is why I wouldn't bother to re-invent the
wheel.  Perl programmers have been around a long time and have dealt with
most of these oddities.  I would rely on the vast archive of code out
there, in particular the CPAN module Text::CSV.
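
As a small illustration of the oddities an existing CSV parser already
handles, here is a minimal sketch using Python's standard csv module (picked
only as an example of a dedicated text-processing tool; Text::CSV has
comparable options).  The sample line and the single-quote string delimiter
are assumptions made up for this example.

```python
import csv
import io

# Hypothetical sample line: a doubled quote inside a string, a separator
# inside a string, a quoted number, and a bare number.
sample = "'O''Reilly','Smith, John','3.14',42\n"

# quotechar and doublequote tell the parser how strings are delimited and
# how a delimiter inside a string is escaped.
reader = csv.reader(io.StringIO(sample), delimiter=",", quotechar="'",
                    doublequote=True)
fields = next(reader)
print(fields)
```

Every field comes back as a plain string with the quoting removed, so the
numeric conversion can be applied per field afterwards.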

>
> The only sane way I found in the database package was
> 1) use strsplit() to decompose a line into fields
> 2) apply scanf() to each field. If it fails, try to detect a
> date-time pattern.
>
> As you can see, half the hard work is done inside strsplit, and we
> have the power of Octave to store each substring in a cell. Doing it
> from some external program means writing and debugging a full strsplit
> replacement.
Attached is a rough draft for a script which converts csv files between
fr_BE.utf8 and the 'C' locale.  It can be customized to your particular
data format pretty easily by configuring the options to Text::CSV.
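
For readers without the attachments, the general idea can be sketched in a
few lines.  This is not the attached cvtcsv.pl, just a rough Python
equivalent under stated assumptions: the input uses ';' as the field
separator and ',' as the decimal mark (a common convention for fr_BE data),
strings are double-quoted, and the function name to_c_locale is hypothetical.

```python
import csv
import io
import re

# A field is treated as numeric only if the whole field looks like an
# optionally signed number with a decimal comma, e.g. "3,14" or "-0,5e3".
DECIMAL_COMMA = re.compile(r"[+-]?\d+,\d+(?:[eE][+-]?\d+)?")

def to_c_locale(text, in_delim=";", out_delim=","):
    """Rewrite decimal commas as decimal points, field by field."""
    src = csv.reader(io.StringIO(text), delimiter=in_delim)
    out = io.StringIO()
    dst = csv.writer(out, delimiter=out_delim, lineterminator="\n")
    for row in src:
        # Only fields that match the numeric pattern are touched; strings
        # containing commas are left alone and re-quoted by the writer.
        dst.writerow([f.replace(",", ".") if DECIMAL_COMMA.fullmatch(f) else f
                      for f in row])
    return out.getvalue()

print(to_c_locale('"Dupont, Jean";3,14;42\n'))
```

One caveat: the reader discards the quoting, so a quoted string that happens
to look like "3,14" is converted as well.  That is convenient for programs
that quote every number, but wrong if such a field was really meant as text.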

--Rik

cvtcsv.pl
Description: Perl program

locale_tst.pl
Description: Perl program

Current Thread

locale for scanf backed out, Rik, 2013/02/25
- Re: locale for scanf backed out, Pascal Dupuis, 2013/02/26
  - Re: locale for scanf backed out, Jordi Gutiérrez Hermoso, 2013/02/26
  - Re: locale for scanf backed out, Philip Nienhuis, 2013/02/26
  - Re: locale for scanf backed out, Rik <=
