octave-maintainers

Re: Moving spreadsheet I/O to core Octave


From: Philip Nienhuis
Subject: Re: Moving spreadsheet I/O to core Octave
Date: Fri, 8 May 2015 14:12:08 -0700 (PDT)

John W. Eaton wrote
> I'm moving this topic here from the help list.

It's a bit unfortunate that I, as the package maintainer involved, will be
away until May 19, but in the time left before I leave I'll respond briefly.


> On 6 May 2015 at 15:14:45 -0700 (PDT), Philip Nienhuis <address@hidden> wrote:
> 
>  > JWE only wants the spreadsheet I/O to go into core,
>  > and I'm not sure if that can be done completely.
>  > Some functionality can go over, some will have to
>  > remain in io due to licensing incompatibilities,
>  > too experimental or fragile code, or other limitations.
>  > As a result, smaller or larger parts of the io package
>  > will survive. I think it'll happen with 4.2.0 at the
>  > earliest. With 4.0.0 on the doorstep I think there's
>  > no priority to pick this up.
> 
> If there are licensing issues for core Octave, then I suspect there are 
> licensing issues for a package as well.  Since I must have missed this 
> in the past, what exactly are the licensing problems?

Invoking proprietary programs (e.g., MS-Excel) through the COM/ActiveX
interface (which is supplied by the windows package).
When I was about to introduce the spreadsheet I/O into the io package, I
asked about this on the then octave-dev ML. The opinion I got back was that
it probably wasn't a problem, as xlsread etc. didn't depend on Excel (there
were already other ways to do spreadsheet I/O) but merely invoked it as a
system library.
I can't find the post in question in the archives (it is dated 4 Dec 2009),
although I have it in my own mail archive.

Later on there were some more discussions on this subject on the
octave-maintainers ML.

BTW the reasons to invoke Excel itself were (1) Matlab compatibility and (2)
speed.
(2) is no longer an issue these days, but the various Matlab idiosyncrasies
(e.g., dates, data ranges) make for some interesting confusion.


> Also, I don't like this argument of things being too experimental or 
> whatever.

My opinion is that large parts can go over, but some parts had better remain
as add-ons in a largely weeded-out io package. A good example is invoking
LibreOffice: fragile code on the upstream side, but it offers unique options
and file formats.


> I think the xlsread/write functions belong in core Octave. 
> Waiting until they are perfect before including them means they will 
> never be included.  If we just put them in core, a lot more people will 
> test them, look at them, and offer improvements.  That has always been 
> my experience.  Look at the history of the GUI, QtHandles, the Java 
> interface, and most recently, the audioread/write functions.  These all 
> received much more attention and improved more rapidly after being 
> included in core Octave.  I expect the same for xlsread/write.
> I have a patch that is nearly ready for doing this job, but I don't 
> think it is appropriate for 4.0 but it could certainly go on the default 
> branch now.



Personally I think xlsread/-write/-finfo plus their ods counterparts are
quite mature and, judging by the download numbers of the io package, must
have had quite a bit of testing in the 5+ years they have been in the io
package. But before moving them into core more or less as-is, it could be
worthwhile to prepare a few things in advance.

Below is a mailing list text that I had prepared to post about spreadsheet
I/O for the release of the 4.0.0 Windows binary. The points it mentions are
equally valid for when xlsread & friends are moved to core.

========================================================

"
The io package will be included in the soon-to-be-expected Windows binary
installer of Octave.
I was wondering how much of its spreadsheet I/O support we would like to
offer. The point is that some external dependencies might be needed, i.e.,
Java class libs (.jar files). 
(While it is possible to invoke MS-Excel directly through the windows
package, I think we shouldn't make Octave users depend on proprietary SW; if
they already have it: fine, it can be used; if not: almost equally fine, the
binary can offer alternatives.)

(In a sense this post is also a preparation for when xlsread/xlswrite will
be moved from the io package to core Octave, as the same issues will need to
be addressed then.)


If just the io package is included w/o dependencies, the following
spreadsheet file formats can be read and written (I've indicated
relative/subjective speeds of the current io package code compared to
Matlab's xlsread/xlswrite):
- .ods (OpenOffice / LibreOffice Calc spreadsheet format): fairly slow I/O
- .xlsx (MS-Excel 2007+ format, a.k.a. OOXML): very fast I/O
- .gnumeric: fast I/O

For older Excel formats, i.e., the still ubiquitous .xls formats (BIFF8 and
the older BIFF5, the latter being Matlab's "BASIC" format), and for .sxc (the
legacy OpenOffice.org format), external SW is required in the form of
Java-based libraries. Octave's native-code I/O for .ods is fairly slow; much
faster .ods I/O is offered by Java-based dependencies.
In a later post I'll explain exactly which candidate dependencies there are;
all of them have GPL or Apache licenses.
FYI, Fedora, for example, already includes some of those Java class libs.
IOW, including such Java-based dependencies doesn't look very alien to Octave.

So the question is whether we should include such Java-based spreadsheet I/O
libraries in the Windows binary, or let users install those libraries
themselves.
Opinions?


Some people would even like to be able to process .csv files with xlsread,
because that would let them read/write mixed-type (numeric + text) .csv
files, which csvread() cannot do. That would boil down to invoking io's
csv2cell() and cell2csv() functions behind the scenes.
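A minimal sketch of what such mixed-type reading and writing involves, written in Python purely for illustration; it is not io's csv2cell()/cell2csv() code and ignores their options. It assumes a simple rule: a field that parses as a number becomes a float, anything else stays text. The helper names csv_to_cells() and cells_to_csv() are hypothetical.

```python
import csv
from io import StringIO


def parse_field(field):
    """Return a float for numeric-looking fields, otherwise the text itself.

    Note: Python's float() also accepts 'nan', 'inf' and surrounding spaces,
    so such text fields would become numbers under this simple rule.
    """
    try:
        return float(field)
    except ValueError:
        return field


def csv_to_cells(text, delimiter=","):
    """Read mixed numeric/text CSV into rows of floats and strings (hypothetical)."""
    reader = csv.reader(StringIO(text), delimiter=delimiter)
    return [[parse_field(field) for field in row] for row in reader]


def cells_to_csv(rows, delimiter=","):
    """Write rows of floats and strings back to CSV text (hypothetical)."""
    buf = StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)  # fields containing the delimiter get quoted
    return buf.getvalue()


cells = csv_to_cells('name,value\nalpha,1.5\n"beta, gamma",2\n')
print(cells)
print(cells_to_csv(cells), end="")
```

Note that a round trip is not byte-exact under this rule: the field "2" is read as the float 2.0 and written back as "2.0".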

If we do include Java support libs (I'd be happy either way), I'm somewhat
inclined to suggest the following Java-based options:

- JExcelApi for the .xls format (BIFF5 and BIFF8). See jexcelapi.sf.net
JExcelApi comprises just one .jar file; it is the only Java-based option
that can also read the older BIFF5 format (Matlab's "BASIC" format). Its
license is GPL 2 / LGPL 2.
It does have some issues:
* (Technical) one can't mix read and write operations on an open file (using
xlsopen ... xls2oct ... oct2xls ... xlsclose) as all other "interfaces"
allow, but I'd expect that to be no problem for most Windows users, as
they'll probably use xlsread / xlswrite, which do just one read OR one write
operation at a time;
* I wonder if it is still actively maintained. The last code update is from
2009, and the number of posts on its mailing list (Yahoo) is steadily
diminishing.
Yet JExcelApi Just Works.

Alternatives:
-------------
- OpenXLS (openxls.sf.net). The fastest .xls I/O, LGPL-3, yet de facto
abandoned, and it needs a file from the Google Web Toolkit (gwt-servlet.jar;
Apache 2.0 license). Its .xls support is good but its .xlsx (OOXML) support
is buggy; we don't need the OOXML support anyway, as that is covered natively
in the io package.

- Apache POI. Actively maintained, with many options including a formula
validator and a formula evaluator; Apache license.

- jOpenDocument for .ods, as it can also read the old .sxc format. See
www.jopendocument.org
Because of the very complicated structure of .ods files, the native code
that Octave uses cannot be as optimized as, e.g., its OOXML code can be
(OOXML, while complicated, is much more "predictable").
AFAIK jOpenDocument is by far the fastest option for .ods, and it is
actively maintained and GPL-ed.
Alternative (just the one I'm aware of):
------------
odfdom (ODF Toolkit), https://incubator.apache.org/odftoolkit/. IMO not a
good option: slow code, an unstable API, a project that may not be dead but
is close to a coma, and the latest release, like a few earlier ones, turns
out to depend on some undocumented external Java functions that are not
included (=> lousy release management IMO).
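As a compact summary of the format/library pairings listed above, here is a small Python sketch (Python only for illustration; the io package itself is Octave code). The NATIVE set, the JAVA_OPTIONS table and the backends_for() helper are hypothetical names; the table holds only the pairings stated in this post and is not how the io package actually chooses an interface.

```python
# Hypothetical summary of the options named in this post; not io package code.

# Formats the io package reads/writes natively, i.e. without Java libraries.
NATIVE = {".ods", ".xlsx", ".gnumeric"}

# Java-based libraries mentioned per format. List order follows the post
# (suggested option first, then alternatives); it is not a benchmark.
JAVA_OPTIONS = {
    ".xls": ["JExcelApi", "OpenXLS", "Apache POI"],
    ".sxc": ["jOpenDocument"],
    ".ods": ["jOpenDocument", "odfdom"],
}


def backends_for(extension):
    """Return candidate back-ends for an extension such as '.xls'.

    Native support, if any, comes first, followed by the Java options.
    """
    ext = extension.lower()
    candidates = []
    if ext in NATIVE:
        candidates.append("native")
    candidates.extend(JAVA_OPTIONS.get(ext, []))
    return candidates


for ext in (".xlsx", ".xls", ".ods", ".sxc"):
    print(ext, backends_for(ext))
```

The LibreOffice/OpenOffice route is left out of the table, since the formats it adds depend on what the user has installed.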

For completeness:
For users who have LibreOffice or OpenOffice installed, the io package will
(on Windows) automatically try to load the required Java-based support SW,
*IF* that support SW was installed with LibreOffice/OpenOffice.org (it is an
install-time option that users may poorly understand).
If successful, a plethora of file formats becomes available, at least for
reading; for writing I've only added a few "filters".
"

Any opinions?

Philip




--
View this message in context: 
http://octave.1599824.n4.nabble.com/Moving-spreadsheet-I-O-to-core-Octave-tp4670310p4670313.html
Sent from the Octave - Maintainers mailing list archive at Nabble.com.


