Re: reading text data with textscan annoyingly slow
From: PhilipNienhuis
Subject: Re: reading text data with textscan annoyingly slow
Date: Wed, 2 Nov 2011 12:52:20 -0700 (PDT)
bpabbott wrote:
>
> On Nov 2, 2011, at 6:40 AM, MarcelK wrote:
>
>> http://octave.1599824.n4.nabble.com/file/n3972499/example1.ncf
>> example1.ncf
>>
>> Hi,
>>
>> I'm using Octave 3.2.4 on Windows XP (i686-pc-mingw32).
>> I'm also using GUIOctave 1.5.3 as a frontend.
>>
>> I'm facing some problems reading data from a .ncf text file.
>> I've attached an example of such a file (I hope that worked).
>> It takes about 60 seconds to read a single ncf file.
>> In Matlab, however, it takes less than a second.
>>
>> Here's my code I use to read the data in:
>>
>>
>> function [Date1,headlines,nummatrix] = ncfread (filename)
>>
>> fid=fopen(filename,'r');
>>
>> %# read data headers
>> headerdata=fgets(fid);
>> index=findstr(headerdata,'}');
>> ncols=length(index);
>> headlines={};
>> headlines(1)=headerdata(1:index(1));
>> for mm=2:ncols
>> headlines(mm)=headerdata(index(mm-1)+1:index(mm));
>> endfor
>>
>> textformat=['%s %s',repmat('%f',1,ncols-2)];
>>
>> datacell=textscan(fid,textformat);
>>
>> Date1=datacell{1,1}{1};
>>
>>
>> timedata=datacell{2};
>>
>> fclose(fid);
>>
>> %# generate time vector (time in hours)
>> timestring=char(timedata);
>> %# preallocate one entry per row (size(datacell,1) is 1, since datacell is a row cell)
>> t=zeros(size(timestring,1),1);
>> for jj=1:size(timestring,1)
>> tstruct=strptime(timestring(jj,:),'%R');
>> t(jj)=tstruct.hour+tstruct.min/60;
>> endfor
>>
>> %# conversion cell>matrix
>> nummatrix=zeros(length(datacell{1}),size(datacell,2));
>> nummatrix(:,2)=t;
>>
>> for ii=3:size(nummatrix,2)
>> nummatrix(:,ii)=datacell{ii};
>> endfor
>>
>> nummatrix(:,1)=[];
>>
>> endfunction
>>
>>
>> My way of converting the "time string" (e.g. '10:00') to time in hours
>> (e.g. 10.00) seems quite complicated to me, is there maybe a better way
>> to
>> achieve this?
>>
>> Thanks in advance,
>>
>> Marcel
>
> Octave's textscan() is currently implemented as an m-file, while Matlab's
> has been written in c++. I expect large differences in speed. The
> developers are planning to implement Octave's textscan() in c++ as well.
> I'm optimistic the result will be very fast.
>
> Even so, I am able to run your script in about 1 sec.
>
> tic (); ncfread ('example1.ncf'); toc()
> Elapsed time is 1 seconds.
>
> I'm running the developer's sources on MacOS, so it is possible that
> Octave's textread() has been improved or the slow performance is due to
> some problem between Octave and Windows.
>
> I don't have an older copy of Octave to try, nor do I have a windows
> machine to work with.
>
> Anyone else?
>
There was a similar complaint some months ago about the string/text file
reading functions (I think in the bug tracker).
Rik found out that (IIRC) strtrim() was the culprit. After replacing it,
execution times were much better.
I doubt that textscan.m/textread.m/strread.m from the development sources
will work with 3.2.4, so an Octave upgrade is needed in your case anyway.
You can try the 3.4.3 zip (7z) files (see
https://mailman.cae.wisc.edu/pipermail/octave-maintainers/2011-October/025505.html);
on my box these work with GUIOctave as well (but you need to explicitly set
gnuplot as the graphics backend using "graphics_toolkit gnuplot").
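
As for the time conversion asked about at the end of the original post, the
per-row strptime() loop can probably be replaced by a single sscanf() call.
The following is only a sketch under some assumptions: the sample cell
array is made up and stands in for datacell{2}, and every entry is assumed
to look like 'HH:MM'. It has not been run against example1.ncf.

```octave
% Hypothetical sample input standing in for datacell{2}
timedata = {'10:00'; '10:30'; '23:45'};

% Pad every row with a trailing blank, then flatten row by row into
% one long string: '10:00 10:30 23:45 '
timestring = char (timedata);
padded = [timestring, repmat(' ', size (timestring, 1), 1)];
flat = padded.';
flat = flat(:).';

% A single sscanf call parses all hour/minute pairs; reshape puts the
% hours in row 1 and the minutes in row 2
hm = reshape (sscanf (flat, '%d:%d'), 2, []);

% Time in hours, as a column vector
t = (hm(1,:) + hm(2,:) / 60).';
```

With the sample input above, t holds the times in hours as one column, which
can be assigned to nummatrix(:,2) just like the result of the original loop.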
Philip