Re: new function: textscan.m

octave-maintainers

From:	Ben Abbott
Subject:	Re: new function: textscan.m
Date:	Sun, 24 Oct 2010 01:05:18 +0800

On Oct 23, 2010, at 5:21 PM, Liam Groener wrote:

> On Oct 22, 2010, at 10:34 PM, Ben Abbott wrote:
> 
>> On Oct 23, 2010, at 10:14 AM, Liam Groener wrote:
>> 
>>> On Oct 22, 2010, at 6:46 PM, Ben Abbott wrote:
>>> 
>>>> On Oct 23, 2010, at 8:53 AM, John W. Eaton wrote:
>>>> 
>>>>> On 23-Oct-2010, Ben Abbott wrote:
>>>>> 
>>>>> | I've made an attempt to implement the missing function textscan.m
>>>>> | 
>>>>> | If there are no suggestions for improvement, I'll commit.
>>>>> 
>>>>> +  if (nargin > 2 && isnumeric (varargin{1}))
>>>>> +    N = varargin{1};
>>>>> 
>>>>> I think it would help to quickly understand what N is if you used
>>>>> nlines or similar instead of N.  Also, we generally try to avoid
>>>>> uppercase variable names in Octave.
>>>>> 
>>>>> +  if ((! strcmp (class (fid), "double") || fid < 0) && ! ischar (fid))
>>>>> +    error ("textscan: first input argument must be a valid file id, or string.");
>>>>> +  endif
>>>>> +
>>>>> +  if (! ischar (formatstr) && ! isempty (formatstr))
>>>>> +    error ("textscan: second input must be a format specification.");
>>>>> +  endif
>>>>> 
>>>>> Maybe I'm just slow, but I have a harder time understanding negative
>>>>> conditions like the ones above.  Instead of checking the conditions
>>>>> that lead to errors, I find it simpler to write and easier to
>>>>> understand code later if I test the conditions for success instead.
>>>>> For example, instead of the above, I would write something like
>>>>> 
>>>>> if (isa (fid, "double") && fid > 0 || ischar (fid))
>>>>>   if (ischar (formatstr) || isempty (formatstr))
>>>>>     ## ... code to do the real work here ...
>>>>>   else
>>>>>     error ("textscan: second input must be a format specification");
>>>>>   endif
>>>>> else
>>>>>   error ("textscan: expecting first argument to be a file id or character string");
>>>>> endif
>>>>> 
>>>>> Is that condition on formatstr correct?  Is it OK for it to be empty
>>>>> if it is not a character string?
>>>>> 
>>>>> Note also that isa is probably better than class+strcmp.  But what
>>>>> happens if fid is a matrix?  Should we check for that?  Should we
>>>>> maybe have a is_valid_file_id function?  Maybe that would also be
>>>>> useful in other places too.
>>>>> 
>>>>> jwe
>>>> 
>>> Hi Ben,
>>> 
>>> I thought that, in Matlab, N is the number of times that the format string 
>>> is repeated (as in textread), not the number of lines to be read. Did you 
>>> intend to make this change? (Or am I all wet?)
>>> Liam
>> 
>> I have never used textscan before this week. It would be wise to be skeptical 
>> of my understanding of how Matlab's version works.
>> 
>> Can you provide me an example that illustrates the difference between 
>> repeating the format string, and reading the number of lines?
>> 
>> Ben
>> 
> Well, I haven't used textscan either. (I don't have Matlab.) I got my 
> impressions of how textscan works from a Matlab book. I modified the example 
> script I sent you the other day as follows:
> 
> B = [30 40 60 70 80];
> fid = fopen('myoutput','w');
> fprintf(fid,'%g miles %g kilometers\n',[B;8*B/5]);
> fclose(fid);
> 
> [a,b,c,d] = textread('myoutput','%f %s',2)
> 
> fid=fopen('myoutput','r');
> C = textscan(fid,'%f %s',2);
> C{1}
> C{2}
> C{3}
> C{4}
> fclose(fid);
> 
> From my understanding, both the textread and textscan parts of this script 
> should give more or less the same output. Note that the textread part, at 
> least, reads all five lines of the file, with four values per line, with N=2.
> 
> Liam G.
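
To make the two readings of N concrete, here is a minimal sketch that does not call textscan or textread at all. It only splits a two-line sample into whitespace-separated tokens and shows what "apply the format '%f %s' twice" would pick up versus "read the first field of two lines". The variable names (by_repeat_nums, by_line_nums, and so on) are made up for illustration, and which of the two Matlab actually does is not something this sketch settles.

```octave
## Sample text in the same shape as Liam's file, shortened to two lines:
## "30 miles 48 kilometers\n40 miles 64 kilometers\n"
txt = sprintf ("%g miles %g kilometers\n", [30 40; 48 64]);

## Split into whitespace-separated tokens (8 tokens, 4 per line).
tokens = regexp (txt, '\S+', 'match');

## Interpretation 1: N = 2 means "apply '%f %s' two times".
## Each application consumes two tokens, so only the first line is used.
nrep = 2;
used = tokens(1 : 2*nrep);
by_repeat_nums = str2double (used(1:2:end)).';   # numeric fields
by_repeat_strs = used(2:2:end).';                # string fields

## Interpretation 2: N = 2 means "read two lines".
## Take the first numeric field of each of the first two lines.
per_line = reshape (tokens, 4, []);              # one column per line
by_line_nums = str2double (per_line(1, 1:2)).';
```

Under the first reading the numeric column is [30; 48] with strings {'miles'; 'kilometers'}; under the second the first numeric field of each line is [30; 40].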

I found an example on the MathWorks website that does not work with the 
current implementation.

fid = fopen ('grades.txt', 'w');
fprintf (fid, '%s\n', 'Student_ID  | Test1  | Test2  | Test3');
fprintf (fid, '%s\n', '   1           91.5     89.2     77.3');
fprintf (fid, '%s\n', '   2           88.0     67.8     91.0');
fprintf (fid, '%s\n', '   3           76.3     78.1     92.5');
fprintf (fid, '%s\n', '   4           96.4     81.2     84.6');
fclose (fid);

fid = fopen ('grades.txt');
C_text = textscan (fid, '%s', 4, 'delimiter', '|');
C_data0 = textscan (fid, '%d %f %f %f');
frewind (fid);
C_text = textscan (fid, '%s', 4, 'delimiter', '|');
C_data1 = textscan (fid, '%d %f %f %f', 'CollectOutput', 1);
fclose (fid);

The proper result is ...

C_text = {'Student_ID', 'Test1', 'Test2', 'Test3'};
C_data0 = {[1;2;3;4], [91.5;88.0;76.3;96.4], [89.2;67.8;78.1;81.2], [77.3;91.0;92.5;84.6]};
C_data1 = {[1;2;3;4], [[91.5;88.0;76.3;96.4], [89.2;67.8;78.1;81.2], [77.3;91.0;92.5;84.6]]};

I'll have to give some thought to how to handle this. If anyone has some 
advice, it would be appreciated.
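
For the 'CollectOutput' part, one possible approach is a post-processing step that concatenates runs of adjacent columns sharing the same class. The sketch below is only that step, under two assumptions: the per-column results are already available in a cell array (called cols here, a made-up name), and '%d' yields int32 while '%f' yields double. It is not taken from the current textscan.m.

```octave
## Assumed per-column results for the format '%d %f %f %f'.
cols = {int32([1;2;3;4]), [91.5;88.0;76.3;96.4], ...
        [89.2;67.8;78.1;81.2], [77.3;91.0;92.5;84.6]};

collected = {};
k = 1;
while (k <= numel (cols))
  ## Extend j while the next column has the same class as column k.
  j = k;
  while (j < numel (cols) && strcmp (class (cols{j+1}), class (cols{k})))
    j++;
  endwhile
  ## Concatenate the run k..j side by side into one array.
  collected{end+1} = [cols{k:j}];
  k = j + 1;
endwhile
```

With the data above, collected has two elements: the int32 ID column on its own, and a 4x3 double matrix holding the three test scores, which matches the shape of C_data1.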

Ben
