
From: Jérémie Roquet
Subject: [Bug-datamash] Unable to use comma as field separator when the locale uses it as a decimal separator too
Date: Wed, 12 Sep 2018 17:42:09 +0200

Hi,

(First of all, as this is my first message on this mailing list, I'd like
to thank everyone involved in datamash, because it is a wonderful
tool.)

Because datamash uses the locale to determine the decimal separator,
its behavior becomes inconsistent when the field separator and the
decimal separator are the same character.

For example:

  $ printf '1,2\n' | LC_NUMERIC=fr_FR.UTF-8 datamash -t, sum 1
  datamash: invalid numeric value in line 1 field 1: '1'
  $ printf '1,2\n' | LC_NUMERIC=fr_FR.UTF-8 datamash -t, sum 2
  2

whereas:

  $ printf '1,2\n' | LC_NUMERIC=en_US.UTF-8 datamash -t, sum 1
  1
  $ printf '1,2\n' | LC_NUMERIC=en_US.UTF-8 datamash -t, sum 2
  2

In my opinion, this is surprising because:
 - when working on “raw” tabular data, you don't expect “smart” handling
   of anything (this would have been less surprising from Microsoft
   Excel, because it implements escaping of separators in formats like
   CSV);
 - parsing works for the last column, but not for the others;
 - the error message reports a well-formed numeric value, not an
   obviously invalid one.
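
The three observations above are consistent with one possible mechanism: the line is split on the field separator first, and a strtod-style parser is then run on the unterminated line buffer, so it reads past the end of the field whenever the next character is also the decimal separator. This is an assumption, not something confirmed from the datamash sources. The Python sketch below only models that hypothesis; `parse_prefix` and `sum_field` are hypothetical names, not datamash functions.

```python
def parse_prefix(buf, start, decimal_sep):
    """Consume the longest numeric prefix of buf[start:], strtod-style.

    Returns (value, index just past the last consumed character).
    Simplified: unsigned digits with one optional decimal separator.
    """
    i = start
    while i < len(buf) and buf[i].isdigit():
        i += 1
    if i < len(buf) and buf[i] == decimal_sep:
        i += 1
        while i < len(buf) and buf[i].isdigit():
            i += 1
    text = buf[start:i].replace(decimal_sep, ".")
    if text in ("", "."):
        raise ValueError("no digits found")
    return float(text), i


def sum_field(line, field_sep, decimal_sep, field_no):
    """Return the numeric value of one field (1-based) of a single line."""
    # Record each field as a (start, end) span; the line is not copied,
    # so the parser can see the characters that follow the field.
    spans, start = [], 0
    for pos, ch in enumerate(line):
        if ch == field_sep:
            spans.append((start, pos))
            start = pos + 1
    spans.append((start, len(line)))

    start, end = spans[field_no - 1]
    value, stop = parse_prefix(line, start, decimal_sep)
    if stop != end:
        # Assumed check: the parser must stop exactly at the field's end.
        raise ValueError(
            f"invalid numeric value in line 1 field {field_no}: "
            f"'{line[start:end]}'"
        )
    return value


if __name__ == "__main__":
    for sep in (",", "."):
        for field in (1, 2):
            try:
                print(repr(sep), field, sum_field("1,2", ",", sep, field))
            except ValueError as exc:
                print(repr(sep), field, exc)
```

Under these assumptions, a decimal separator of `,` makes field 1 fail with a message that quotes the harmless-looking `'1'`, while field 2 parses because nothing follows it. Passing `.` as the decimal separator models a locale-independent parser, and both fields parse.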

Assaf, you previously noted that “other GNU utilities seem to ignore
current locale when parsing input and always accept decimal point” [1],
and I'd argue that doing the same would be the best default for
datamash.

What do you think?

Thanks and best regards,

[1] https://lists.gnu.org/archive/html/bug-datamash/2016-08/msg00001.html

-- 
Jérémie


