octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xlsread in Octave 3.6.4


From: Philip Nienhuis
Subject: Re: xlsread in Octave 3.6.4
Date: Wed, 04 Sep 2013 22:07:50 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Markus Bergholz wrote:



On Tue, Sep 3, 2013 at 11:42 PM, Philip Nienhuis <address@hidden
<mailto:address@hidden>> wrote:

    <moved from help-octave to octave-maintainers ML>

    Markus Bergholz wrote:

        On Mon, Sep 2, 2013 at 11:38 AM, Markus Bergholz
        <address@hidden <mailto:address@hidden>
        <mailto:address@hidden <mailto:address@hidden>>> wrote:
             On Mon, Sep 2, 2013 at 12:10 AM, Markus Bergholz
        <address@hidden <mailto:address@hidden>
        <mailto:address@hidden <mailto:address@hidden>>> wrote:
                 On Sun, Sep 1, 2013 at 11:42 PM, PhilipNienhuis
        <address@hidden <mailto:address@hidden>
        <mailto:address@hidden <mailto:address@hidden>>__>
        wrote:
                     Markus Bergholz wrote


    <snip>


         >>>>     Markus Bergholz wrote
         >>>> > I haven't follow this thread and it's issue, but
                     i've wrote a
         >>>>     xlsxread
         >>>> > function whitch don't need java.
         >>>> > but it's very very rudimentary, works just with
                     linux and is a
         >>>>     quick&dirty
         >>>> > write-down.
         >>>> > furthermore, you have to remove the string-analyse
                     part, if your
         >>>>     sheet
         >>>> > don't contain strings.
         >>>> > but maybe it helps someone else or someone want to
                     improve it or
         >>>>     someone
         >>>> > rewrite it in c/c++ as oct file, to get it even
                     faster than
         >>>>     matlab (for me
         >>>> > it's still faster than the java stuff atm).


    <snip, see thread on help-octave ML>


             i've made a few quick and dirty changes, change to gpl
        licence and
             commit the broken range part too.

        https://github.com/markuman/__xlsxread
        <https://github.com/markuman/xlsxread>

             it's now plattform indepentend and - once again - faster
        than before
             (~58 seconds). now it's nearly twice as fast as matlab
        (~110 seconds).
             enough time to waste it for ranges, strings etc in future.



        here comes version 0.6 - https://github.com/markuman/__xlsxread
        <https://github.com/markuman/xlsxread>

        * strings and calculations are now replaced with NaN (without
        any speed
        losses!)
            * testet with a excel 2007 and excel for mac 2011 file
        (example files
        are added)
        * it's using now nested functions. this should be easier to
        ingetrate it
        in octave-io

        ranges and empty columns still don't work!


    Good work Markus.

    Anyway, sorry to come up with a few more potential gotchas:


i know it's not finished ;)


    - Interesting would be if your code properly handles merged and
    hidden cells. I don't know what they look like in raw OOXML.

    - Does OOXML have repeated-rows and repeated-columns "folding"?
    E.g., ODS1.2 has the table:TableNumberRowsRepeated and
    table:__TableNumberColumnsRepeated tags.


Yes, I think we are talking about the same. The last time I've take a
look at it is 3 month ago. That's the next step.
And it would be helpful if some others can commit some example files for
this situation. That's all i got
https://github.com/markuman/xlsxread/tree/master/example

Oh, I have several specially cooked-up spreadsheet files lying around for testing (of course I have), but only a few in OOXML format. But translating from BIFF8 (Excel'97) or ODS to OOXML is easily done.

These days I have a test script in the io package that is used to test basic functionality of spreadsheet I/O for all supported interfaces (xls, ods). I started adding more complicated test cases to it earlier this year, incl. merged cells, but that is still unfinished.

In io-1.2.x there's a io_xls_testscript.m that can invoke interface "A" to create a spreadsheet file and invokes interface "B" to read it back (I made that for OpenXLS as its write support is currently disabled). Shouldn't be hard to amend it for testing your version.

    It would be really good to have a Java-free (and ActiveX-free)
    spreadsheet reading capability in Octave, even if only a basic one.

    Sergei suggested a Perl-based solution; but Perl would still be a
    dependency, not all systems have Perl installed (e.g., Windows).


So this is obsolete now.

Well...

1. I myself like choice, and I'd like to offer users some choice too, so I'm not opposed to adding it in if it is not too much work.

2. I think/hope it'll give me a chance to learn a little bit of Perl.

3. Perl offers write support as well (according to Sergei; I didn't check).

4. Many Linux distros have Perl installed in the base system.

==> So I keep this option on my TODO list; in the "sometime, somewhere" section, that is.

    You've made a first try for OOXML; I have a basis for decoding ODS
    lying around, it doesn't work at all yet but might not need undue
    amounts of attention.
    You made the vital piece: unzipping the spreadsheet file to disk.

    For inclusion in the OF io package (in a later stage, first try to
    get your version robust and fail-safe) I'd suggest to see how the
    various "interfaces" are built and called in the OF io package.


I was guessing that this was the message from the "ToDo" and "ChangeLog"
part from the README.md file.
* do the cell folding (e.g. empty columns) part
* make it more robust (there may be >100 disregarded xlsx variation)
* cleanup the code and fit it for the io package
but next week i'm on travel so there won't be any updates in the next ~2
weeks from me.

There's absolutely no hurry. I'd rather see a well-tested and robust version than a sloppy Q&D job that provokes heaps of bug reports.
Take your time.

For the time being, just keep building & testing it as a stand-alone self-contained xlsread.m
When it's time I'll help integrating it in the OF io package.

<snip>
    Oh BTW another idea (that I explored in 2009 but couldn't get to
    work at the time):
    There is a binary (compiled) xmlread function, currently it is in
    the io package. Maybe with a proper "template" it could just read
    the worksheets into a struct in RAM, faster than regexp can decode
    it. The missing piece is the "template". (sorry for my lack of XML
    proficiency & lingo) Unfortunately that xmlread is very tersely, if
    not badly documented.
    However there are xml toolboxes around that could be gotten to work
    in Octave.


I'll take a look at it.

Again, no hurry.

But at the moment i'm happy with this solution. It's easy and fast to
kill several needs (like t="s" and formulars like <f>) with just one
regexp command - even if it's nested.

Yep.

Thanks,

Philip


reply via email to

[Prev in Thread] Current Thread [Next in Thread]