octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #50224] csv2cell() returns error when linefeed


From: ralf
Subject: [Octave-bug-tracker] [bug #50224] csv2cell() returns error when linefeed(s) are present in text fields
Date: Sat, 4 Feb 2017 15:53:13 +0000 (UTC)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0

Follow-up Comment #4, bug #50224 (project octave):

I am not sure, what the standard is on csv-files, that Octave needs to
support. 

In csv-files,
* numbers are either all quoted or all unquoted
* text strings are either all quoted or unquoted
* number formats may equal the format of IP addresses, which should not be
read-in as numbers, but as text strings that preserve the meaningful position
of the IP adresses' dots '.'
* spreadsheet row separators and linefeeds within spreadsheet cells share the
same separator "\n" 
* column separators may equal the fractional part separators of numbers

It seems that the original spreadsheet can only then be recovered properly, if
the csv-file format was chosen carefully.

One way to preserve original spreadsheet content is to format the csv-file
* (A) to embrace text in quotes '"'
* (B) to separate spreadsheet columns by ';'

(A) allows to distinguish between QUOTED text cells and UNQUOTED number cells
in situations where it is not clear, whether cell content is numerical or not
(e.g. IP addresses vs. numbers)
(A) allows to distinguish between spreadsheet row ends and text-cell-internal
linefeeds both encoded by the same newline character "\n"
(B) forces number cells to remain UNQUOTED (as an option in LibreOffice)
making sure that the QUOTED text strings can be distinguished from UNQUOTED
number strings

csv-files that fail to comply with (A) or (B), I think, cannot fully be
recovered by any software read-in, including Octave's csv2cell(), because
important information is lost during the encoding to csv.

*So, I guess, if csv2cell() is supposed to support the case, where text cells
are NOT embraced by quotes, then a fix wont be possible for the lack of
information within the chosen csv-format.*

In other words, the chosen csv-format determines, whether there is a way to
read-in text-cell-internal linefeeds unambiguously / correctly or not.

Chose quotes to surround your text cells in the csv-output and then correct
Octave-read-in of text-cell-internal linefeeds becomes feasible!

For this case of quoted, potentially linefeed-containing spreadsheet text
cells in the csv-file, I wrote hhcsv2cell(), see attached file hhcsv2cell.m
below.

hhcsv2cell() adapts to fractional part separators in numbers, that may vary
according to the chosen number format, usually either ',' or '.' as in
"123,456" or "123.456".

Code example for csv using column separator ';', quotation '"' and fractional
part separator ',' (see attached cde.csv and cde.ods):

[S,St] = hhcsv2cell( 'cde.csv', cSep=';', tSep='"', fSep=',' );


Note, that csv-files may embrace numbers in quotes for the case that the
column separator, e.g. ',', was chosen to equal the fractional part separator
in numbers. hhcsv2cell() does not convert quote-embraced number strings, e.g.
"123,456" to numerical values, but does so for unquoted 123,456. Chose ';' as
column separator in LibreOffice's csv output for numerical spreadsheet cells
to be formatted as unquoted numerical text strings in the csv-file. 

To summarize, in LibreOffice's csv output:
* good choice: You chose ';' as column separator cSep to have numerical
spreadsheet cells be output as UNQUOTED numbers in the csv-file, which ARE
converted to numbers by hhcsv2cell(). 
* bad choice:  You chose ',' as column separator cSep to have LibreOffice
QUOTE all numbers in the csv-file, which then NONE ARE converted to numbers by
[S,St]=hhcsv2cell() making its outputs S and St identical.
* good choice: You chose '"' as text separator tSep so hhcsv2cell() is able to
identify numerical spreadsheet cells and convert them to numbers in S (with St
encoding all cells as character strings including numerical cells)

If your choice of fSep fails to match the fractional part separators of
numbers in the csv-file, then number conversion may fail placing nan in S
whereever it applies.
 



(file #39662, file #39663, file #39664)
    _______________________________________________________

Additional Item Attachment:

File name: hhcsv2cell.m                   Size:7 KB
File name: cde.csv                        Size:0 KB
File name: cde.ods                        Size:9 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50224>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]