Re: xlsread in Octave 3.6.4

octave-maintainers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xlsread in Octave 3.6.4

From:	Philip Nienhuis
Subject:	Re: xlsread in Octave 3.6.4
Date:	Mon, 30 Sep 2013 18:42:41 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Hey Philip, hope everything is going well so far.

I've take a look at io source on svn. Are you finished already?
Referring to __OCT_spsh_open__.m  line 51
"## Read file contents. For some reason fgetl needs to be called twice"
In OOXML you have to do this twice because the file hast just two lines.
The first line includes just this: "<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>"
All relevant datas are in the second line. Because of this I triggered
fgetl two times.


Yeah I found out this is valid for most XML files.

Do you just figured out in __OCT_spsh_open__.m whitch sheet should be
opened (given by name or index)?
So you need after  __OCT_spsh_open__.m
lastintf = "OCT";
xls.filename = filename;
? If yes, I can split-out this part easily from xlsxread.
Furthermore, what's up with this?
"## FIXME find a class that doesn't display as one looooong string"


That's only for ODS.

ODS, once unzipped, has all of its sheets & formatting in one big filecalled "content.xml".To avoid time-consuming re-reading and re-parsing these xml contentsagain and again for each sheet, I decided to just put the entire textstring + pointers to the sheets in the ods file pointer struct.

In unzipped OOXML however each sheet is in a separate file; so for .xlsxit makes sense to just store the location of the tmp dir containing theunzipped files.

Back to .ods: later I'll try if just storing the file position pointersand using fseeks etc. isn't too much of a performance penalty.That'll save memory, although reading really big spreadsheet filesprobably isn't daily routine (even at my work where we spend lots oftime digging around in large databases which are still mostly in Exceland .mat files).


<snip>

I've found a second mark for xlsx in __OCT_getusedrange__.m: line 43 to
50.


Yeah that's meant "for you".

I guess just extracting the used range from xlsxread is not possible as
you do for ods.


You mean: the really occupied cell range, not the formatted range.

Yeah, maybe for later. Although the latter would do for the moment. Aswe saw earlier it is kept somewhere in the top section of one of theunzipped files.Note that COM/ActiveX (i.e., Excel behind-the-scenes), and thereforeMatlab as well, has the very same issue. So it isn't all that bad.

__readnum is reading the value and the position of the value. The used
range results from that. So reading just the position in
__OCT_getusedrange__.m is possible, but this information must not be
lost.
But maybe you do this in __ods_get_sheet_dims__.m?


__ods_get_sheet_dims__ is only meant for ODS.

It was written a few years ago for older versions of jOpenDocument whenthose still had no methods for determining the used range. AsjOpenDOcument reads ods files as a long text string into the Java heapmemory, it was easy to copy it to the Octave world and apply Octave'sstring functions. Already then I saw it as a first step to reading .odsw/o Java. The unzip part has long been in the way until you sorted that out.

A shedule would be nice. Which file/function comes first, which
information goes in this file/function and which information have to go
out to the next...


See below.

Furthermore I've take a (short) look on xmlread.
No, it's not just "bad documented", it is simple not documented.


Let's say "not" is a subset of "bad" (or the other way round).

There *is* a help string (just try "help xmlread") so I don't agree itisn't documented at all :-)

Last night it occurred to me that the various Octave/Octave-Forgemailing lists from around the time Laurent committed these files mightcontain a clue.

Trying to use it with OOXML just give an error. "bad declaration" (It
stumbles over the first line (<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>).
By the way, Matlab isn't able to read it too with xmlread :)

It may be (but I know really little about XML) that a .dtd file(Document Type Description) is needed.The first line should be skipped in all cases. That is easily fixed inthe source of xmlread.



As to what I think is needed for xlsx in the io package (1.2.4):
-----------------------------
xlsread.m:
Nothing (apart from the help text header, I'll do that)

xlsopen.m, xls2oct.m, xlsclose.m:

Extra stanzas for OCT interface, plus header text. A'la similar partsfor odsopen/ods2oct/odsclose

I could do these too, it's not much work.

private/ functions:
-------------------
__OCT_spsh_open__.m

Unzipping the xlsx file, storing the unzipped dir pointer in the filepointer struct, perhaps also the sheet file names. Most is already inplace (at the top), there's a section at the bottom for xlsx.


__OCT_xlsx2oct__.m
New file. Contains most of your xlsxread.

__OCT_getusedrange__.m

Needs a little stanza where the stored used range is to be assessed.Easy, use the <dimension ref="<range>" /> from the top of the worksheetfiles; you could use getxmlnode.m/getxmlattv.m to get the value,parse_sp_range to translate it to row/column numbers.


__OCT_spsh_close__.m

A stanza for xlsx. Just wiping the tmpdir and returning an empty structis all that is needed.


__OCT_spsh_info__.m

To be assessed later. AFAICS it could be used w/o change if you set upthe file pointer struct (output of __OCT_spsh_open__.m) similar to whatis done for .ods

FYI I have plans to include reading gnumeric files and add them to theOCT interface as well. gnumeric also has XML files.



On a related note:
Markus wrote:
> Am 2013-09-18 22:34, schrieb Philip Nienhuis:
<snip>
>> * do you want to add more options, like
>> > specifying a range to read
>
> Range-read would be nice but it has the lowest priority for me.

It doesn't look too hard to get a long way here. After all you caneasily strip a subsection of all rows in a worksheet and apply'xlsxread' to it. One might read to many columns but that's a matter ofstripping those away.



Philip

[Prev in Thread]

Current Thread

[Next in Thread]

Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], (continued)
- Re: xlsread in Octave 3.6.4, Philip Nienhuis <=
  - Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/30
    - Re: xlsread in Octave 3.6.4, Markus, 2013/09/30
    - Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/30

Prev by Date: Please ship an AppData file upstream
Next by Date: Re: xlsread in Octave 3.6.4
Previous by thread: Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4]
Next by thread: Re: xlsread in Octave 3.6.4
Index(es):
- Date
- Thread