From: David Rosenberg
Subject: Re: Use R to manage results from GNU Parallel
Date: Sun, 5 Jan 2014 00:55:47 -0500

Your idea requires the user to make sure the output is tab-separated (\t).
Maybe we could have an option that indicates the splitting character.
The default would be none = don't split:

> load_parallel_results(file,split="\t")
  myvar1 myvar2 V1          V2
1 1      A      Hello       1
2 1      A      Bye         2
3 1      A      Wow         3
4 2      A      Interesting 9
5 1      B      NewYork     3

> load_parallel_results(file)
  myvar1 myvar2 stdout                         stderr
1 1      A      "Hello\t1\nBye\t2\nWow\t3\n"   ""
2 2      A      "Interesting\t9\n"             ""
3 1      B      "NewYork\t3\n"                 ""

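
To make the split option concrete, here is a rough sketch of how the expansion could work. The helper name expand_stdout and its arguments are hypothetical (they are not part of any existing function); it assumes the unsplit result is a data frame with character columns stdout and stderr, and that every output line has the same number of fields. The new columns stay character; converting them to numbers is left out.

```r
# Hypothetical helper: expand the stdout column into one row per output line.
# Assumes `results` has character columns named `stdout` and `stderr`.
expand_stdout <- function(results, split = NULL) {
  if (is.null(split)) return(results)  # default: don't split
  # Argument columns are everything except the captured output
  args <- results[, setdiff(names(results), c("stdout", "stderr")), drop = FALSE]
  pieces <- lapply(seq_len(nrow(results)), function(i) {
    lines <- strsplit(results$stdout[i], "\n", fixed = TRUE)[[1]]
    lines <- lines[nzchar(lines)]
    if (length(lines) == 0) return(NULL)  # job produced no output
    # One matrix row per output line, one column per field
    fields <- do.call(rbind, strsplit(lines, split, fixed = TRUE))
    cbind(args[rep(i, length(lines)), , drop = FALSE],
          as.data.frame(fields, stringsAsFactors = FALSE))
  })
  pieces <- Filter(Negate(is.null), pieces)
  if (length(pieces) == 0) return(args[0, , drop = FALSE])
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# The unsplit table from the example above, built by hand
results <- data.frame(
  myvar1 = c(1, 2, 1),
  myvar2 = c("A", "A", "B"),
  stdout = c("Hello\t1\nBye\t2\nWow\t3\n", "Interesting\t9\n", "NewYork\t3\n"),
  stderr = c("", "", ""),
  stringsAsFactors = FALSE
)
expanded <- expand_stdout(results, split = "\t")
```

With split = NULL the table comes back untouched, which matches the proposed default.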
I am also somewhat concerned that the current function loads all
stdout/stderr files - even if they are never used. It would be better
if that could be done lazily - see
http://stackoverflow.com/questions/20923089/r-store-functions-in-a-data-frame
I believe I would prefer returning a data structure from which you
could select the relevant records based on the arguments. Once you
have the records you want, you can ask to have the stdout/stderr read
in and possibly expanded as rows. This would scale to much bigger
stdout/stderr and many more jobs.
Maybe the trivial solution is simply to return a table of the args plus
the filenames of stdout/stderr, and then have a function that turns
that table into the contents of those files, which you can run either
immediately or after you have selected the relevant rows.
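
A minimal sketch of that two-step idea, under stated assumptions: the index table has columns stdout_file and stderr_file holding paths, and read_outputs is a hypothetical name for the function that reads them. The file names below are made up for the demonstration and do not reflect the directory layout GNU Parallel actually writes.

```r
# Hypothetical helper: add stdout/stderr columns by reading the files
# named in the stdout_file/stderr_file columns of `index`.
read_outputs <- function(index) {
  slurp <- function(path) {
    size <- file.info(path)$size
    if (is.na(size) || size == 0) return("")  # missing or empty file
    readChar(path, nchars = size, useBytes = TRUE)
  }
  index$stdout <- vapply(index$stdout_file, slurp, character(1), USE.NAMES = FALSE)
  index$stderr <- vapply(index$stderr_file, slurp, character(1), USE.NAMES = FALSE)
  index
}

# Demonstration data: three fake job outputs in a temporary directory
dir <- tempfile("results")
dir.create(dir)
index <- data.frame(
  myvar1 = c(1, 2, 1),
  myvar2 = c("A", "A", "B"),
  stdout_file = file.path(dir, paste0("job", 1:3, ".out")),
  stderr_file = file.path(dir, paste0("job", 1:3, ".err")),
  stringsAsFactors = FALSE
)
contents <- c("Hello\t1\nBye\t2\nWow\t3\n", "Interesting\t9\n", "NewYork\t3\n")
for (i in 1:3) {
  writeChar(contents[i], index$stdout_file[i], eos = NULL)
  file.create(index$stderr_file[i])
}

# Select rows on the arguments first, then read only those files
selected <- index[index$myvar2 == "A", ]
loaded <- read_outputs(selected)
```

Only the two files belonging to the selected rows are opened; the third job's output is never touched.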