Re: xlsread in Octave 3.6.4

On Sat, Sep 14, 2013 at 12:47 AM, Philip Nienhuis <address@hidden> wrote:

Markus Bergholz wrote:
:
<snip>
:

>>>> Markus Bergholz wrote
>>>> > I haven't follow this thread and it's issue, but
i've wrote a
>>>> xlsxread
>>>> > function whitch don't need java.
>>>> > but it's very very rudimentary, works just with
linux and is a
>>>> quick&dirty
>>>> > write-down.

<snip>

Okey, some good and some bad news.
first i change tactics how i read the values. now i read first the
dimension and creat a corresponding matrix full of NaN.

Can't you just do "cell(nrows, ncolumns)" ?

I deem it best to parse the <dimenstion ref="A1:B2"\> part.

In this way i get no difficulties if the table won't start in A1.

Furthermore, if there are missing columns or rows because they are empty, you're not able to count the correct size anymore.

Or i have a error in reasoning :)

in the second step i read all numeric values (no t="s", no <f> etc) and
their position (e.g. B2). the third step is to transform B2 in the
octave index style (2,2) (atm limited from A1 to ZZZ9999)

How do you do that?

I've divided the characters and numbers with regexp and use the function __colnr in line 55.

In the io package (in the /private directory) there's a script I use for almost the same thing (parse_sp_range.m). I wrote that kludge 4 years ago just to be able to parse user-input spreadsheet ranges into row and column numbers, so there was no need to optimize for speed.

If needed it can be speeded up significantly by using base 26 number system for the column indices - on the web I found some examples at the time.

and copy the
value in the matrix full of NaN (vectorize, no loops. it's a small
regexp inferno and one time i use cellfun (there is much space for
improvements and code cleanup i guess).
This works very stable (http://p.osuv.de/index.php/Dlu1/) but it
increase run time and memory usage!
E.g. for excel2007.xlsx it was "Elapsed time is 0.078457 seconds" and is
now "Elapsed time is 0.17204 seconds".
But the huge example.xlsx (1234x1234) feels now like it uses java again!
:'-( :D

Yeah... I was more or less expecting that. Hopefully the slow-down can be kept within limits.
Once you got something working you can use the profiler to optimize the code where possible. (Hmmm if the optimizer is already in 3.6.4...)

FYI, while doing some searching myself I hit several comments in some regexp mailing lists that trying to tackle XML with regexp is the wrong way to proceed - the common advice is to invoke an XML parser.

So atm i lost one goal (speed/runtime) and reach one goal (robustness)
at the same time. but hey, it's still java-free 8-)
next week i'll going to test some more sheets at work (huge, folding etc).

In the meantime your work inspired me to wipe the dust off the old ODS pieces I had (for OpenOffice.org/LibreOffice). So, after having spent a few evenings I now have "native Octave" ODS reading capability in my development io package (now at 1.3.4-alpha). It can even recover spreadsheet formulas as text (could be useful for further processing and re-entering them in the spreadsheet).
I needed to create some basic XML functions for it (reading nodes, getting attribute values etc); they may also be useful for OOXML. But these script functions need speeding up as well.
I've attached them, perhaps you can use them?

Much like your xlsx scripts, reliably reading ODS with native Octave language is still a bit slow. Lots slower than e.g., jOpenDocument, but a little bit faster than ODF Toolkit (odfdom) - but OK the latter is a bit of a dinosaur IMO. When it comes to bigger spreadsheets nothing can beat UNO/Java (invoking LibreOffice through Java) - once LO itself is in memory, that is.

But OK, we now have an outlook for a Java- and ActiveX-free reading of ODS. Same for .xlsx.
Nice!

This weekend I'll sent you a few test Excel files with empty row/columns, merged cells, dates, times, booleans, formulas, erroneous formulas and -consequently- error values in the cached value attributes, etc. I'm curious as to how robust your xlsx thing handles those.

Philip

I've got a 3.7.6+ build on Arch Linux too (but i switched continuous) , so i'll going to try the profiler next days and take a look at your attached files and the parse_sp_range.m

And yes, I've been thinking about xmlread after the negative speed effects - and i've been thinking about rewrite it in C. But step-by-step ;)

--
icq: 167498924
XMPP|Jabber: address@hidden

From:	Markus Bergholz
Subject:	Re: xlsread in Octave 3.6.4
Date:	Sat, 14 Sep 2013 01:31:21 +0200