Re: [ePiX-devel] Proposed data pruning function
From: Andrew D. Hwang
Date: Sun, 29 Apr 2007 13:35:56 -0400 (EDT)
On Fri, 6 Apr 2007, Marcus D. Hanwell wrote:
I have been working on a data pruning function today and have come up
with a prototype. I'm not sure whether it is still a little sloppy, or
whether anyone else has ideas about what they would like from it. For me
it goes along with my improved data_file read function: read in my data
files and select the part of the graph we are interested in.
Hi Marcus,
Sorry for the long delay in replying. I finally had a chance to try out
your tokenizing code, which seems to work well, though I did have a few
minutes of confusion when previously working files gave out-of-range
errors; it turns out I was using TAB-separated data... :)
Would it be acceptable to make TAB the default separator? In any case,
I'll fix the data_file::write function so it uses m_delim to separate
columns. At present, TAB is hard-coded in data_file::write, but read and
write should probably act consistently.
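To illustrate why read and write should share one separator, here is a minimal sketch, assuming a hypothetical pair of free functions `split_line` and `join_row` (not ePiX functions) that both take the delimiter as a parameter defaulting to TAB, standing in for m_delim:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: split one line on delim and convert each field.
// std::stod throws std::invalid_argument on a non-numeric field.
std::vector<double> split_line(const std::string& line, char delim = '\t')
{
  std::vector<double> values;
  std::istringstream input(line);
  std::string field;
  while (std::getline(input, field, delim))
    {
      if (!field.empty())             // skip empty fields (repeated delimiters)
        values.push_back(std::stod(field));
    }
  return values;
}

// Hypothetical writer: same default delimiter, so output reads back unchanged.
std::string join_row(const std::vector<double>& row, char delim = '\t')
{
  std::ostringstream output;
  for (std::size_t i = 0; i < row.size(); ++i)
    {
      if (i > 0)
        output << delim;              // separator only between columns
      output << row[i];
    }
  return output.str();
}
```

With matching delimiters a row round-trips; reading TAB-separated text with a comma delimiter collapses each line to a single column (std::stod stops at the first TAB), which produces the same kind of too-few-columns surprise described above.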
I'd also like to change some of the warning messages so that they print
the name of the calling function, or the name of the file that contains
errors.
[snip data pruning]
It defaults to pruning on column 1 and, in this particular case, would
delete all rows outside the range 2.0 <= x <= 20.0. Below is my proposed
code for the function.
Thanks,
Marcus
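void prune(double min, double max, unsigned int col = 1);  // in class data_file

// Erase rows whose entry in column col (1-based) lies outside [min, max].
// m_data holds the columns, so the same row is erased from each of them;
// .at() throws std::out_of_range if m_data is empty or col is invalid.
void data_file::prune(double min, double max, unsigned int col)
{
  // One iterator per column, all pointing at the current row
  std::vector<std::vector<double>::iterator> iter(m_data.size());
  for (unsigned int i = 0; i < m_data.size(); i++)
    iter.at(i) = m_data.at(i).begin();

  while (iter.at(0) != m_data.at(0).end())
    {
      if ( *iter.at(col-1) < min || *iter.at(col-1) > max )
        {
          // erase() invalidates its argument; continue from the
          // iterator it returns, which points at the next row
          for (unsigned int j = 0; j < m_data.size(); j++)
            iter.at(j) = m_data.at(j).erase(iter.at(j));
        }
      else
        {
          for (unsigned int j = 0; j < m_data.size(); j++)
            ++iter.at(j);
        }
    }
}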
This looks nice!
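Because std::vector::erase moves every later element down by one, the loop above can take time quadratic in the number of rows when many rows are removed. A possible single-pass alternative, sketched as a free function on a hypothetical column-major `columns` type (mirroring how the code above indexes m_data), records which rows to keep and then compacts each column once:

```cpp
#include <cstddef>
#include <vector>

using columns = std::vector<std::vector<double>>;  // data[c][r]: column c, row r

// Keep only rows whose entry in column col (1-based) lies in [min, max].
// All columns are assumed to have the same number of rows.
void prune_columns(columns& data, double min, double max, unsigned int col = 1)
{
  if (data.empty() || col == 0 || col > data.size())
    return;                                   // nothing sensible to do

  // Pass 1: decide the fate of each row from the test column.
  const std::vector<double>& test = data[col - 1];
  std::vector<bool> keep(test.size());
  for (std::size_t r = 0; r < test.size(); ++r)
    keep[r] = !(test[r] < min || test[r] > max);

  // Pass 2: slide kept entries forward in every column, then truncate.
  for (std::vector<double>& column : data)
    {
      std::size_t out = 0;                    // next slot to fill
      for (std::size_t r = 0; r < column.size() && r < keep.size(); ++r)
        if (keep[r])
          column[out++] = column[r];
      column.resize(out);
    }
}
```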
Contrary to my email a couple of weeks back, your prune function doesn't
overlap the proposed functionality of the "selection" class, which was
intended to facilitate "masking" certain parts of plots without altering
the actual data. However, perhaps it makes sense to provide pruning as
part of a general scheme that allows rows to be removed according to
arbitrary criteria?
As a rough outline, there'd be a data_clip class providing a bool-valued
operator() that takes a double. User code would look something like this:
  // cull rows if column 1 is outside [2, 20]
  data_clip my_clip(2.0, 20.0, 1);
  data.prune(my_clip);
  data.plot(...);
data_file::prune would be implemented just like your code, except:
  while (iter.at(0) != m_data.at(0).end())
    {
      if ( my_clip(*iter.at(col-1)) )
        ...
    }
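Along these lines, the pruning loop can accept any bool-valued callable (a data_clip object qualifies if it defines `bool operator()(double) const`). The sketch below is a hypothetical free-function template `prune_if` on column-major data, with the column passed explicitly to keep it self-contained; it also uses the iterator returned by erase():

```cpp
#include <cstddef>
#include <vector>

// Remove every row for which clip(entry in column col) is true.
// col is 1-based; all columns are assumed to have the same length.
template <typename Clip>
void prune_if(std::vector<std::vector<double>>& data, unsigned int col, Clip clip)
{
  if (data.empty() || col == 0 || col > data.size())
    return;

  // One iterator per column, all at the current row
  std::vector<std::vector<double>::iterator> iter(data.size());
  for (std::size_t i = 0; i < data.size(); ++i)
    iter[i] = data[i].begin();

  while (iter[0] != data[0].end())
    {
      if (clip(*iter[col - 1]))
        for (std::size_t j = 0; j < data.size(); ++j)
          iter[j] = data[j].erase(iter[j]);   // erase returns the next row
      else
        for (std::size_t j = 0; j < data.size(); ++j)
          ++iter[j];
    }
}
```

For example, `prune_if(d, 1, [](double x) { return x < 2.0 || x > 20.0; });` reproduces the original prune(2.0, 20.0) behavior.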
Potential advantages include the ability to prune data after applying a
function to the "test column" (for example, to perform "log pruning") or
the ability to remove data *inside* an interval:
  // removes rows if log(column 2) is outside [-1, 1]
  data_clip clip2(-1, 1, 2, log);
  // removes rows if log(column 2) is inside [-1, 1]
  data_clip clip3(-1, 1, 2, log, true); // last arg reverses criterion
If the extra flexibility seems worthwhile, that leaves the question of
implementation. Two (simple-minded) ideas are:
1. Have the data_clip constructor accept a bool-valued function of double,
and let data_clip be essentially a wrapper.
2. Have the constructor accept two doubles (clip bounds), a column index,
a double-valued function of double (applied to the data before testing
to see if the result is in bounds), and a bool (for logical reversal).
I sort of prefer the second approach, but this design does limit future
flexibility, as it hard-codes data that constitute a selection criterion.
What do you think?
Finally (regarding *adding* columns), it would be convenient to have a
function that inserts a column of data into a data_file; I'll try to put
something together in the near future.
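Since the pruning code treats m_data as a vector of columns, inserting a column could be as simple as a checked std::vector::insert. The sketch below assumes a hypothetical free function `insert_column` with a 1-based position (matching the column numbering used above) and requires the new column to have as many rows as the existing ones:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using columns = std::vector<std::vector<double>>;  // data[c][r]: column c, row r

// Insert new_col so that it becomes column pos (1-based);
// pos == data.size() + 1 appends it after the last column.
void insert_column(columns& data, const std::vector<double>& new_col,
                   std::size_t pos)
{
  if (pos == 0 || pos > data.size() + 1)
    throw std::out_of_range("insert_column: bad column index");
  if (!data.empty() && new_col.size() != data.front().size())
    throw std::invalid_argument("insert_column: row count mismatch");
  data.insert(data.begin() + static_cast<columns::difference_type>(pos - 1),
              new_col);
}
```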
Best,
Andy
Andrew D. Hwang address@hidden
Department of Math and CS http://mathcs.holycross.edu/~ahwang
College of the Holy Cross (508) 793-2458 (Office: 320 Swords)
Worcester, MA, 01610-2395 (508) 793-3530 (fax)