Re: [ePiX-devel] Proposed data pruning function


From: Marcus D. Hanwell
Subject: Re: [ePiX-devel] Proposed data pruning function
Date: Sun, 13 May 2007 17:01:17 +0100
User-agent: Thunderbird 2.0.0.0 (X11/20070424)

Andrew D. Hwang wrote:
> On Fri, 6 Apr 2007, Marcus D. Hanwell wrote:
>
>> I have been working on a data pruning function today and have come up
>> with a prototype. Not sure if it is still a little sloppy or if
>> anyone else had any ideas on what they would like from it. For me it
>> goes along with my improved data_file read function - read in my data
>> files and select the part of the graph we are interested in.
>>
> Sorry for the long delay in replying. I finally had a chance to try
> out your tokenizing code, which seems to work well, though I did have
> a few minutes of confusion when previously-working files gave
> out-of-range errors; turns out I was using TAB-separated data... :)
>
> Would it be acceptable to make TAB the default separator? In any case,
> I'll fix the data_file::write function so it uses m_delim to separate
> columns. At present, TAB is hard-coded in data_file::write, but read
> and write should probably act consistently.
I nearly chose TAB, as that is what I use (as do nearly all the files I
read), but then got the impression you had previously preferred SPACE as
the delimiter (I am not sure why now). It would be no problem at all to
change the default to TAB - I think this is consistent with other
applications anyway.
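
To illustrate why the choice of default matters, here is a minimal sketch of splitting one line on a single-character delimiter. It is not the ePiX tokenizer: split_row is a hypothetical helper, and only the idea of a configurable delimiter (m_delim in the discussion above) comes from this thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of ePiX: split one line of a data file
// into numeric fields using a single-character delimiter (TAB by default).
std::vector<double> split_row(const std::string& line, char delim = '\t')
{
  std::vector<double> fields;
  std::istringstream in(line);
  std::string token;

  while (std::getline(in, token, delim))
  {
    if (token.empty())
      continue;                       // skip empty fields from repeated delimiters
    fields.push_back(std::stod(token)); // throws if the token is not numeric
  }
  return fields;
}
```

With this sketch, a SPACE-separated line read with the TAB default comes back as a single field, because std::stod stops at the first space. Later code that expects more columns then runs out of range, which is the kind of confusion described above.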
>
> I'd also like to change some of the warning messages, so they print
> the function that called them, or the name of the file that contains
> errors.
That sounds like a great idea. I will update the messages as I come
across them while working on the code.
>
>> [snip data pruning]
>>
>> It defaults to pruning on column 1 and would delete all points
>> outside the range 2.0 < x < 20.0 in this particular case. Below is
>> my proposed code for the function.
>>
>> Thanks,
>>
>> Marcus
>>
>>  void prune(double min, double max, unsigned int col = 1);
>>
>>  void data_file::prune(double min, double max, unsigned int col)
>>  {
>>    // Erase rows where the data is outside of the specified range
>>    std::vector<std::vector<double>::iterator> iter(m_data.size());
>>    for (unsigned int i = 0; i < m_data.size(); i++)
>>      iter.at(i) = m_data.at(i).begin();
>>
>>    while (iter.at(0) != m_data.at(0).end())
>>    {
>>      if ( *iter.at(col-1) < min || *iter.at(col-1) > max )
>>      {
>>        for (unsigned int j = 0; j < m_data.size(); j++)
>>          iter.at(j) = m_data.at(j).erase(iter.at(j));
>>      }
>>      else
>>      {
>>        for (unsigned int j = 0; j < m_data.size(); j++)
>>          iter.at(j)++;
>>      }
>>    }
>>  }
>>
> This looks nice!
>
> Contrary to my email a couple of weeks back, your prune function
> doesn't overlap the proposed functionality of the "selection" class,
> which was intended to facilitate "masking" certain parts of plots
> without altering the actual data. However, perhaps it makes sense to
> provide pruning as part of a general scheme that allows columns to be
> removed according to criteria?
>
> As a rough outline, there'd be a data_clip class providing a
> bool-valued operator of double. User code would look something like this:
>
> // cull rows if column1 is outside [2, 20]
> data_clip my_clip(2.0, 20, 1);
> data.prune(my_clip);
> data.plot(...)
>
> data_file::prune would be implemented just like your code, except:
>
> while (iter.at(0) != m_data.at(0).end())
>   {
>     if ( my_clip(*iter.at(col-1)) )
>       ...
>   }
>
This looks like a useful improvement whilst maintaining the core
functionality I have been using.
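
A rough sketch of how pruning against an arbitrary criterion might look, assuming the data are stored as a vector of columns as in the prototype. The names columns_t and prune_rows are hypothetical stand-ins, and std::function is used only for brevity; none of this is the ePiX implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Column-major table: columns[c][r] is row r of column c (stand-in for m_data).
typedef std::vector<std::vector<double> > columns_t;

// Remove every row for which reject(entry in column col) returns true.
// col is 1-based, as in the prune prototype.
void prune_rows(columns_t& columns,
                const std::function<bool(double)>& reject,
                std::size_t col = 1)
{
  assert(col >= 1 && col <= columns.size());
  const std::size_t rows = columns[col - 1].size();

  // Decide first, while the test column is still unmodified.
  std::vector<bool> drop(rows);
  for (std::size_t r = 0; r < rows; ++r)
    drop[r] = reject(columns[col - 1][r]);

  // Then compact each column in place, keeping surviving rows in order.
  for (std::size_t c = 0; c < columns.size(); ++c)
  {
    std::vector<double>& column = columns[c];
    assert(column.size() == rows);    // all columns must have equal length

    std::size_t kept = 0;
    for (std::size_t r = 0; r < rows; ++r)
      if (!drop[r])
        column[kept++] = column[r];
    column.resize(kept);
  }
}
```

Marking rows first and compacting afterwards avoids erasing from the middle of each vector once per rejected row, and it sidesteps any question of iterator validity after erase.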
> Potential advantages include the ability to prune data after applying
> a function to the "test column" (for example, to perform "log
> pruning") or the ability to remove data *inside* an interval:
>
> // removes rows if log(column2) is outside [-1,1]
> data_clip clip2(-1, 1, 2, log);
>
> // removes rows if log(column2) is inside [-1,1]
> data_clip clip3(-1, 1, 2, log, true); // last arg reverses criterion
>
>
> If the extra flexibility seems worthwhile, that leaves the question of
> implementing. Two (simple-minded) ideas are:
>
> 1. Have the data_clip constructor accept a bool-valued function of
>   double, and let data_clip be essentially a wrapper.
>
> 2. Have the constructor accept two doubles (clip bounds), a column index,
>   a double-valued function of double (applied to the data before testing
>   to see if the result is in bounds), and a bool (for logical reversal).
>
> I sort of prefer the second approach, but this design does limit
> future flexibility, as it hard-codes data that constitute a selection
> criterion.
>
> What do you think?
Personally, I think the second solution looks best. I am glad my work
hasn't overlapped too much. I have been using my prune function quite a
bit recently and it has been very useful to me. I will add it shortly,
once I have double-checked it for bugs and ensured there are no CVS
conflicts.
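
For concreteness, the second design could be sketched roughly as below. The argument order follows the examples quoted above (bounds, column, function, reversal flag); the member names, the column() accessor, and the convention that operator() returns true for rows to be removed are assumptions, not existing ePiX code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical sketch of the second design; not the ePiX implementation.
class data_clip
{
public:
  typedef double (*transform_t)(double);

  // min, max: clip bounds; col: 1-based test column;
  // f: optional function applied before testing; reverse: invert the test.
  data_clip(double min, double max, std::size_t col = 1,
            transform_t f = 0, bool reverse = false)
    : m_min(min), m_max(max), m_col(col), m_f(f), m_reverse(reverse)
  {
    assert(min <= max);
  }

  std::size_t column() const { return m_col; }

  // True if a row whose test-column entry is x should be removed.
  bool operator()(double x) const
  {
    const double y = m_f ? m_f(x) : x;            // apply transform if given
    const bool outside = (y < m_min || y > m_max); // endpoints count as inside
    return m_reverse ? !outside : outside;
  }

private:
  double m_min, m_max;
  std::size_t m_col;
  transform_t m_f;
  bool m_reverse;
};
```

With this sketch, data_clip(-1, 1, 2, std::log) rejects rows where log(column2) lies outside [-1, 1], and passing true as the last argument rejects the rows inside instead. Note that a NaN result (for example, the log of a negative entry) compares false against both bounds, so such rows count as inside.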
> Finally (regarding *adding* columns), it would be convenient to have a
> function that inserts a column of data into a data_file; I'll try to
> put something together in the near future.
That certainly sounds like a useful addition. I am using ePiX quite
heavily right now doing a lot of data plotting so that side should get a
lot of testing from me. I held back on adding the prune stuff but will
add it shortly as I have done quite a bit of testing on it now.

Thanks,

Marcus



