epix-devel

Re: [ePiX-devel] Proposed ata pruning function


From: Andrew D. Hwang
Subject: Re: [ePiX-devel] Proposed ata pruning function
Date: Sun, 29 Apr 2007 13:35:56 -0400 (EDT)

On Fri, 6 Apr 2007, Marcus D. Hanwell wrote:

I have been working on a data pruning function today and have come up with a prototype. Not sure if it is still a little sloppy or if anyone else had any ideas on what they would like from it. For me it goes along with my improved data_file read function - read in my data files and select the part of the graph we are interested in.

Hi Marcus,

Sorry for the long delay in replying. I finally had a chance to try out your tokenizing code, which seems to work well, though I did have a few minutes of confusion when previously-working files gave out-of-range errors; turns out I was using TAB-separated data... :)

Would it be acceptable to make TAB the default separator? In any case, I'll fix the data_file::write function so it uses m_delim to separate columns. At present, TAB is hard-coded in data_file::write, but read and write should probably act consistently.
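As an illustration of the read/write consistency mentioned above, here is a minimal sketch of a writer that takes its separator as a parameter instead of hard-coding TAB. It is not the ePiX implementation: the free function write_columns and the typedef columns are hypothetical stand-ins for data_file::write and its m_data member, and the sketch assumes every column has the same length.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the data held by data_file: one vector per column.
typedef std::vector<std::vector<double> > columns;

// Write one row per line, separating entries with delim. Inside
// data_file::write, delim would be the m_delim member (assumption).
void write_columns(std::ostream& out, const columns& data,
                   const std::string& delim = "\t")
{
  if (data.empty())
    return;

  // Assumes all columns are as long as the first one.
  for (std::size_t row = 0; row < data.at(0).size(); ++row)
  {
    for (std::size_t col = 0; col < data.size(); ++col)
    {
      if (col > 0)
        out << delim;          // separator between columns, not after the last
      out << data.at(col).at(row);
    }
    out << '\n';
  }
}
```

With the same delimiter passed to both reader and writer, a file written by the library can be read back without changing any settings.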

I'd also like to change some of the warning messages, so they print the function that called them, or the name of the file that contains errors.

[snip data pruning]

It defaults to pruning on column 1 and would delete all points outside the
range 2.0 < x < 20.0 in this particular case. Below is my proposed code for
the function.

Thanks,

Marcus

 void prune(double min, double max, unsigned int col = 1);

 void data_file::prune(double min, double max, unsigned int col)
 {
   // Erase rows where the data is outside of the specified range
   std::vector<std::vector<double>::iterator> iter(m_data.size());
   for (unsigned int i = 0; i < m_data.size(); i++)
     iter.at(i) = m_data.at(i).begin();

   while (iter.at(0) != m_data.at(0).end())
   {
     if ( *iter.at(col-1) < min || *iter.at(col-1) > max )
     {
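       // vector::erase invalidates the iterator passed to it; keep the
       // iterator it returns, which points at the next element
       for (unsigned int j = 0; j < m_data.size(); j++)
         iter.at(j) = m_data.at(j).erase(iter.at(j));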
     }
     else
     {
       for (unsigned int j = 0; j < m_data.size(); j++)
         iter.at(j)++;
     }
   }
 }

This looks nice!

Contrary to my email a couple of weeks back, your prune function doesn't overlap the proposed functionality of the "selection" class, which was intended to facilitate "masking" certain parts of plots without altering the actual data. However, perhaps it makes sense to provide pruning as part of a general scheme that allows columns to be removed according to criteria?

As a rough outline, there'd be a data_clip class providing a bool-valued operator() that takes a double and returns true when the row should be removed. User code would look something like this:

// cull rows if column1 is outside [2, 20]
data_clip my_clip(2.0, 20, 1);
data.prune(my_clip);
data.plot(...);

data_file::prune would be implemented just like your code, except:

while (iter.at(0) != m_data.at(0).end())
  {
    if ( my_clip(*iter.at(col-1)) )
      ...
  }

Potential advantages include the ability to prune data after applying a function to the "test column" (for example, to perform "log pruning") or
the ability to remove data *inside* an interval:

// removes rows if log(column2) is outside [-1,1]
data_clip clip2(-1, 1, 2, log);

// removes rows if log(column2) is inside [-1,1]
data_clip clip3(-1, 1, 2, log, true); // last arg reverses criterion
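To make the proposal concrete, here is a minimal C++11 sketch of such a data_clip together with a pruning routine that uses it. None of this exists in ePiX: the member names, the column() accessor, and the free function prune (standing in for data_file::prune) are assumptions, and the sketch assumes all columns have equal length. It builds a mask first and then compacts each column, rather than erasing element by element.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch of the proposed data_clip; not part of ePiX.
class data_clip
{
public:
  data_clip(double min, double max, unsigned int col = 1,
            std::function<double(double)> f = [](double x) { return x; },
            bool reverse = false)
    : m_min(min), m_max(max), m_col(col), m_f(f), m_reverse(reverse) { }

  // 1-based index of the column to test
  unsigned int column() const { return m_col; }

  // Returns true if a row whose test entry is x should be removed.
  bool operator()(double x) const
  {
    const double y = m_f(x);
    const bool outside = (y < m_min || y > m_max);
    return m_reverse ? !outside : outside;
  }

private:
  double m_min, m_max;
  unsigned int m_col;
  std::function<double(double)> m_f;
  bool m_reverse;
};

// Stand-in for data_file::prune(const data_clip&); data holds one vector
// per column, all assumed to have the same length.
void prune(std::vector<std::vector<double> >& data, const data_clip& clip)
{
  if (data.empty())
    return;

  // Decide which rows to cull before modifying anything.
  const std::vector<double>& test = data.at(clip.column() - 1);
  std::vector<bool> cull(test.size());
  for (std::size_t i = 0; i < test.size(); ++i)
    cull[i] = clip(test[i]);

  // Compact every column, keeping only the rows that were not culled.
  for (auto& column : data)
  {
    std::size_t kept = 0;
    for (std::size_t i = 0; i < column.size(); ++i)
      if (!cull[i])
        column[kept++] = column[i];
    column.resize(kept);
  }
}
```

Because log is overloaded in C++, passing it by name to a std::function parameter is ambiguous; in this sketch a lambda such as [](double x) { return std::log(x); } would be passed instead.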


If the extra flexibility seems worthwhile, that leaves the question of implementation. Two (simple-minded) ideas are:

1. Have the data_clip constructor accept a bool-valued function of double,
  and let data_clip be essentially a wrapper.

2. Have the constructor accept two doubles (clip bounds), a column index,
  a double-valued function of double (applied to the data before testing
  to see if the result is in bounds), and a bool (for logical reversal).
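For comparison, the first idea might look roughly like the following C++11 sketch, in which the class only stores a caller-supplied predicate and a column index. The names predicate_clip and outside are hypothetical, chosen for this illustration; the helper shows that an interval test like the one in the second idea can still be expressed as an ordinary predicate.

```cpp
#include <functional>

// Hypothetical wrapper for idea 1: the selection criterion is whatever
// bool-valued function of double the caller supplies.
class predicate_clip
{
public:
  predicate_clip(std::function<bool(double)> pred, unsigned int col = 1)
    : m_pred(pred), m_col(col) { }

  // 1-based index of the column to test
  unsigned int column() const { return m_col; }

  // Returns true if a row whose test entry is x should be removed.
  bool operator()(double x) const { return m_pred(x); }

private:
  std::function<bool(double)> m_pred;
  unsigned int m_col;
};

// Helper (assumption) that builds the "outside [min, max]" criterion
// as a predicate usable with predicate_clip.
std::function<bool(double)> outside(double min, double max)
{
  return [min, max](double x) { return x < min || x > max; };
}
```

A call such as predicate_clip(outside(2.0, 20.0), 1) would then reproduce the original pruning criterion, while arbitrary criteria remain possible at the cost of a less self-describing constructor.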

I sort of prefer the second approach, but this design does limit future flexibility, as it hard-codes data that constitute a selection criterion.

What do you think?


Finally (regarding *adding* columns), it would be convenient to have a function that inserts a column of data into a data_file; I'll try to put something together in the near future.
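One possible shape for such a function is sketched below in C++11. It is only a guess at an interface, not code from ePiX: insert_column is a hypothetical free function operating on the column-wise storage discussed in this thread, it uses 1-based column indices to match prune, and it rejects a new column whose length differs from the existing ones.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Insert column so that it becomes column number col (1-based).
// data holds one vector per column; existing columns are assumed to
// share a common length.
void insert_column(std::vector<std::vector<double> >& data,
                   const std::vector<double>& column, std::size_t col)
{
  if (col < 1 || col > data.size() + 1)
    throw std::out_of_range("insert_column: column index out of range");

  if (!data.empty() && column.size() != data.front().size())
    throw std::invalid_argument("insert_column: column length mismatch");

  data.insert(data.begin() + static_cast<std::ptrdiff_t>(col - 1), column);
}
```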

Best,
Andy

Andrew D. Hwang                 address@hidden
Department of Math and CS       http://mathcs.holycross.edu/~ahwang
College of the Holy Cross       (508) 793-2458 (Office: 320 Swords)
Worcester, MA, 01610-2395       (508) 793-3530 (fax)



