Re: Use R to manage results from GNU Parallel
From: Ole Tange
Subject: Re: Use R to manage results from GNU Parallel
Date: Sun, 5 Jan 2014 16:38:08 +0100
On Sun, Jan 5, 2014 at 2:13 PM, David Rosenberg <david.davidr@gmail.com> wrote:
>> But I would appreciate help with:
>>
>> load_parallel_results_split_on_newline(filenametable)
I have this working now. See below.
>> load_parallel_results_split_to_columns(filenametable)
>
> I'm happy to write these, though I'm limited on time. Could you write
> a generator for test data?
parallel --results my/results/dir --header : echo FOO={foo} BAR={bar}';'seq {bar} :::: <(echo foo; seq 1000) <(echo bar; seq 10)
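If running GNU Parallel is inconvenient while testing the loaders, a rough R equivalent can write a similar tree directly. This is a sketch under assumptions: it assumes `--results` with `--header :` lays results out as `dir/foo/<value>/bar/<value>/stdout` (plus `stderr`), which is what the path splitting in the loader code below relies on; `make_fake_results` is a hypothetical helper, and it creates far fewer jobs than the 10000 the command above runs.

```r
## Hypothetical helper: mimic the tree that
## `parallel --results DIR --header : ...` is assumed to create,
## i.e. DIR/foo/<foo>/bar/<bar>/stdout and .../stderr
make_fake_results <- function(resdir, foo = 1:3, bar = 1:2) {
  for (f in foo) {
    for (b in bar) {
      jobdir <- file.path(resdir, "foo", f, "bar", b)
      dir.create(jobdir, recursive = TRUE, showWarnings = FALSE)
      ## Same text as: echo FOO={foo} BAR={bar}; seq {bar}
      out <- paste0(sprintf("FOO=%d BAR=%d\n", f, b),
                    paste0(seq_len(b), "\n", collapse = ""))
      ## Written as raw bytes so lines end in "\n" on every platform
      writeBin(charToRaw(out), file.path(jobdir, "stdout"))
      ## Empty stderr, as for a job that printed nothing to stderr
      file.create(file.path(jobdir, "stderr"))
    }
  }
  invisible(resdir)
}

resdir <- file.path(tempdir(), "results")
make_fake_results(resdir)
```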
> R has limited options for reading data with a non-newline record separator
> character. My first approach here would be to pipe the data through tr or
> sed to swap the desired record separator character with "\n", so that we can
> read things into R with the usual commands. I'm assuming we're on a POSIX
> system, or something where we can do that. Otherwise, I think we'd have to
> read each file as a giant string (as you're doing for 'raw'), and then parse
> things ourselves, which I'd suspect would be much slower.
I do not like the idea of shelling out simply to read a file. If we
are talking about tons of small files, then spawning a shell for each
one will slow things down tremendously.
I read that anything you can do on a connection (i.e. R's filehandle)
you can also do on a string using textConnection. So I would suggest
we make an efficient raw reader, then use
a=gsub(newlinesep,"\n",a,fixed=TRUE) to turn the custom record
separator into newlines (and likewise a column separator into tabs),
and finally use R's built-in reader on a textConnection.
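A minimal sketch of that idea on a single string, assuming (purely for illustration) that records end in `@` and fields are separated by `,`. It uses `gsub()` because `sub()` replaces only the first match, and `fixed = TRUE` so the separators are matched literally rather than as regular expressions.

```r
## Hypothetical raw output of one job: records end in "@", fields are ","
a <- "1,alpha@2,beta@3,gamma@"
recordsep <- "@"
fieldsep <- ","
## Replace every occurrence of the custom separators with newline and tab
a <- gsub(recordsep, "\n", a, fixed = TRUE)
a <- gsub(fieldsep, "\t", a, fixed = TRUE)
## Parse the string with R's built-in reader through a textConnection
con <- textConnection(a)
parsed <- read.table(con, sep = "\t", header = FALSE,
                     col.names = c("id", "name"), stringsAsFactors = FALSE)
close(con)
```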
> BTW, for 'raw', it might be worth comparing the performance of using
> readLines, followed by collapsing the newlines, to the following approach:
>
> readChar(fileName, file.info(fileName)$size)
Good call. It is way easier to read, so even if the performance is the
same I would still use it.
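A quick check of the two approaches on a made-up temporary file. One behavioural difference worth knowing: `paste(readLines(f), collapse = "\n")` drops the final newline, while `readChar()` returns the file content exactly as stored. Timing both with `system.time()` over many real result files would answer the performance question.

```r
## Made-up test file shaped like a job's stdout, written as raw bytes
## so line endings are "\n" on every platform
f <- tempfile()
writeBin(charToRaw("FOO=1 BAR=2\n1\n2\n"), f)

## Approach 1: read lines, then glue them back together with "\n"
via_readlines <- paste(readLines(f), collapse = "\n")

## Approach 2: read the whole file in one call; useBytes = TRUE makes
## nchars count bytes, matching file.info()$size
via_readchar <- readChar(f, file.info(f)$size, useBytes = TRUE)

## readChar() keeps the trailing newline that readLines() strips
same <- identical(via_readchar, paste0(via_readlines, "\n"))
unlink(f)
```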
/Ole
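load_parallel_results_filenames <- function(resdir) {
  ## Find files called .../stdout and .../stderr (anchored patterns, so
  ## other file names merely containing "stdout" do not match)
  stdoutnames <- list.files(path = resdir, pattern = "^stdout$", recursive = TRUE)
  stderrnames <- list.files(path = resdir, pattern = "^stderr$", recursive = TRUE)
  if (length(stdoutnames) == 0) {
    ## Return empty data frame if no files found
    return(data.frame())
  }
  ## Each path looks like name1/value1/name2/value2/.../stdout
  m <- matrix(unlist(strsplit(stdoutnames, "/", fixed = TRUE)),
              nrow = length(stdoutnames), byrow = TRUE)
  ## Keep the value columns (even positions); drop = FALSE keeps a
  ## matrix even when there is only one job
  filenametable <- m[, c(FALSE, TRUE), drop = FALSE]
  ## Append the full stdout and stderr file names
  filenametable <- cbind(filenametable,
                         file.path(resdir, stdoutnames),
                         file.path(resdir, stderrnames))
  ## Column names: the argument names (odd positions, which end with
  ## "stdout"), followed by "stderr"
  colnames(filenametable) <-
    c(strsplit(stdoutnames[1], "/", fixed = TRUE)[[1]][c(TRUE, FALSE)], "stderr")
  filenametable
}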
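The column selection in load_parallel_results_filenames relies on R silently recycling a logical index: `c(TRUE, FALSE)` picks elements 1, 3, 5, ... and `c(FALSE, TRUE)` picks 2, 4, .... A small demonstration on one path of the assumed `name/value/.../stdout` form:

```r
parts <- strsplit("foo/7/bar/3/stdout", "/", fixed = TRUE)[[1]]
## parts is c("foo", "7", "bar", "3", "stdout")
names_part <- parts[c(TRUE, FALSE)]   # odd positions: "foo" "bar" "stdout"
values_part <- parts[c(FALSE, TRUE)]  # even positions: "7" "3"
```

The odd positions include the trailing "stdout", which is why that string ends up as the name of the stdout column.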
load_parallel_results_raw <- function(filenametable) {
  ## Read a whole file into one string. file.info()$size is in bytes, so
  ## useBytes = TRUE makes readChar() count bytes too. Empty files (e.g.
  ## most stderr files) are mapped to "" explicitly so that every job
  ## yields exactly one string
  read_whole_file <- function(filename) {
    size <- file.info(filename)$size
    if (is.na(size) || size == 0) return("")
    readChar(filename, size, useBytes = TRUE)
  }
  ## Replace the filenames in columns stdout and stderr with file contents
  filenametable[, "stdout"] <- vapply(filenametable[, "stdout"], read_whole_file,
                                      character(1), USE.NAMES = FALSE)
  filenametable[, "stderr"] <- vapply(filenametable[, "stderr"], read_whole_file,
                                      character(1), USE.NAMES = FALSE)
  filenametable
}
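load_parallel_results_split_on_newline <- function(filenametable) {
  raw <- load_parallel_results_raw(filenametable)
  ## The argument columns are every column except stdout and stderr
  argcols <- setdiff(colnames(raw), c("stdout", "stderr"))
  ## Split each job's stdout into lines (a single trailing newline
  ## does not produce an extra empty line)
  lines <- strsplit(raw[, "stdout"], "\n", fixed = TRUE)
  nlines <- vapply(lines, length, integer(1))
  ## One output row per line: repeat the job's argument values per line
  result <- cbind(raw[rep(seq_len(nrow(raw)), nlines), argcols, drop = FALSE],
                  stdout = unlist(lines, use.names = FALSE))
  rownames(result) <- NULL
  result
}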
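The same textConnection idea could back load_parallel_results_split_to_columns, which the thread is still asking for help with. A rough sketch under stated assumptions: the helper name `split_raw_to_columns` and its `sep` parameter are hypothetical, its input is a character matrix shaped like the result of load_parallel_results_raw, and every job is assumed to print the same number of tab-separated columns.

```r
## Hypothetical helper, not part of the original code: parse every job's
## stdout with read.table() and prepend that job's argument values.
## Assumes `raw` has one column per argument plus "stdout" and "stderr"
## columns holding file contents, and that all jobs print equally many columns.
split_raw_to_columns <- function(raw, sep = "\t") {
  argcols <- setdiff(colnames(raw), c("stdout", "stderr"))
  parts <- lapply(seq_len(nrow(raw)), function(i) {
    out <- raw[i, "stdout"]
    if (!nzchar(out)) return(NULL)  # job printed nothing
    con <- textConnection(out)
    on.exit(close(con))
    ## quote = "" and comment.char = "" treat the output literally
    cols <- read.table(con, sep = sep, header = FALSE, quote = "",
                       comment.char = "", colClasses = "character")
    ## Repeat this job's argument values once per parsed row
    args <- as.data.frame(raw[i, argcols, drop = FALSE],
                          stringsAsFactors = FALSE)
    cbind(args[rep(1, nrow(cols)), , drop = FALSE], cols)
  })
  parts <- Filter(Negate(is.null), parts)
  if (length(parts) == 0) return(data.frame())
  result <- do.call(rbind, parts)
  rownames(result) <- NULL
  result
}

## Small made-up input in the shape of load_parallel_results_raw()'s result
raw <- cbind(foo = c("1", "2"), bar = c("2", "1"),
             stdout = c("a\tb\nc\td\n", "e\tf\n"),
             stderr = c("", ""))
res <- split_raw_to_columns(raw)
```

Returning a data frame rather than a matrix lets the parsed columns later be converted to numbers individually, e.g. with `type.convert()`.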