Re: sqlmaster "nowait" and "append" functionality?


From: Ole Tange
Subject: Re: sqlmaster "nowait" and "append" functionality?
Date: Wed, 7 Dec 2016 12:34:21 +0100

On Wed, Dec 7, 2016 at 6:46 AM, Andy Loftus <aloftus@gmail.com> wrote:
> On Tue, Dec 6, 2016 at 4:54 PM Ole Tange <ole@tange.dk> wrote:
>> On Tue, Dec 6, 2016 at 6:10 PM, Andy Loftus <aloftus@gmail.com> wrote:
:
>> > 1. populate the database then exit immediately without waiting?
:
>> Hmmm... maybe we should change, so you need to add '--wait' if you
>> want '--sqlmaster' to wait. Seems like a reasonable change.
>
> Even better!  That was my first assumption about how it worked.  Took me a
> little bit of time to figure out why --sqlmaster never exited, haha :)

$? would of course be wrong without --wait.

It should probably be 0 unless something broke.
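
As a rough illustration of what $? means today when GNU Parallel does
wait for its jobs: the exit status counts the failed jobs. The helper
name count_failed is made up for this sketch.

```shell
# Hypothetical helper: run one job per argument, each job exiting with
# the status given as its argument, then print parallel's exit status.
count_failed() {
  parallel 'exit {}' ::: "$@"
  echo "$?"
}
```

Running "count_failed 0 1 0" should print 1, since one of the three
jobs fails. A --sqlmaster that exits before any job has run has no
such count to report.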

>> > On a related note, why does the sqlworker command require the exact same
>> > input as the sqlmaster?  Shouldn't it be sufficient that all necessary
>> > information is stored in the database and then the sqlworker can just
>> > pull tasks from the database?
>>
>> --sqlmaster only inserts the values - not the command. The problem
>> comes with replacement strings like {%}. You will never know in
>> advance which job slot a job will be run as:
>>
>>     parallel echo {%} ::: {a..j}
>>
>> So --sqlmaster cannot store the actual command to run.
>>
>> It _could_ be changed so that --sqlmaster stores the template command
>> into the command column, and --sqlworker fetches it from there, and
>> replaces the command column with the actual command run when done.
>>
>> It would, however, be a fairly big change of GNU Parallel: The
>> assumption has always been that the template command remains the same
>> for the whole run, and quite a bit of optimization depends on this.
>>
>> On a similar note: What would you expect the table should look like
>> when you run:
>>
>> # These do not work - but what would you expect them to do to the table?
>> parallel --sqlandworker $DBURL -X echo {%}: {} ::: {1..10}
>> parallel --sqlandworker $DBURL -N3 echo {%}: {} ::: {1..10}
>> parallel --sqlandworker $DBURL echo {%} '{= $_=total_jobs() =}' ::: {1..10}
>
> In general, I think the best approach is "keep it simple" and try to retain
> as much flexibility as possible by putting the template in the command
> column.

When you submit a patch for this, please make sure the non-sql case
does not get slower. As mentioned, a lot of the optimization depends
on the command template not changing. And as a user I would find it
very confusing if the first entry in the command column were applied
to all commands, especially if I had appended some commands with a
different template command:

parallel --sqlmaster $DBURL echo {%} ::: {1..10}
parallel --sqlmaster $DBURL echo {} ::: {1..10}
parallel --sqlworker $DBURL

> Some things could be replaced, such as {#}, but contexts that don't
> make sense until runtime, like {%}, would have to be inserted verbatim.

I recommend you do not do that: Today all replacement strings are
replaced internally with {=perl=}-expressions and are then simply
treated as any other {=perl=}-expression. Treating some of them
differently will no doubt be much harder for you to implement.
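
For reference, the equivalence can be seen from the command line. The
sketch below assumes the default --rpl definitions listed in the man
page ({.} as s:\.[^/.]+$:: and {#} as $_=$job->seq()); the function
names are made up for illustration.

```shell
# Shorthand replacement strings: job sequence number and the input
# with its extension removed.
with_shorthand() {
  parallel -k echo '{#} {.}' ::: "$@"
}

# The same replacements spelled out as {= perl =} expressions,
# following the default --rpl definitions in the man page.
with_perl_expr() {
  parallel -k echo '{= $_=$job->seq() =} {= s:\.[^/.]+$:: =}' ::: "$@"
}
```

Both functions should print the same lines for the same arguments,
e.g. for "a.txt b.tar.gz". {%} is likewise defined as $_=$job->slot(),
which only has a value once a job has been given a slot.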

> Regarding the dynamic calculation of "total_jobs", several options come to
> mind. First is to leave it dynamic and each job run by a sqlworker will
> calculate it at runtime, perhaps by looking up the number of rows in the
> table.

Today total_jobs() simply reads all jobs, counts them, and pushes them
to the internal stack. In other words it does not push them back onto
the sql-server. But it might be doable to do a special case for this,
so if $sql is set, it should use 'SELECT count(*) FROM $DBURL'.
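
A minimal sketch of how such a count query could be built, assuming
the table name is the last path component of the DBURL (as in
sqlite3:///tmp/parallel.db/jobs). The helper names and the example
DBURL are made up, and this is not how GNU Parallel does it
internally.

```shell
# Hypothetical helper: extract the table name from a DBURL, assuming
# it is the last path component.
dburl_table() {
  url=${1%%\?*}                # drop an optional ?query part
  printf '%s\n' "${url##*/}"   # keep what follows the last /
}

# Hypothetical helper: build the count query for that table.
count_query() {
  printf 'SELECT count(*) FROM %s;\n' "$(dburl_table "$1")"
}

count_query "sqlite3:///tmp/parallel.db/jobs"
# prints: SELECT count(*) FROM jobs;
```

The resulting query would then be run against the database the DBURL
points to, giving the number of rows, i.e. the number of jobs.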

> As your examples point out, they don't work, but do they need to?

I really like a tool that does not surprise me: If I add an option
that seems completely unrelated, then I will be surprised if things
suddenly do not work.

-X/--xargs/-m is probably reasonable not to support, but -n/-N/-l/-L
ought to be supported.
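
To make concrete why -N needs some thought for a table that holds the
input values: with -N3 a single job consumes up to three values. A
small sketch without any sql options (the function name is made up;
-k only keeps the output in input order):

```shell
# Hypothetical helper: run echo with up to three arguments per job.
group_by_three() {
  parallel -k -N3 echo '{}' ::: "$@"
}
```

Called with the values 1 to 10 this should print four lines: "1 2 3",
"4 5 6", "7 8 9" and "10", i.e. four jobs for ten input values.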

>> > 2. append new tasks to an existing database?

I believe doing the +DBURL for append, the --wait and maybe the
total_jobs() will be fairly easy to do, but I am going to leave the
rest to you. I have created tickets to track progress.

https://savannah.gnu.org/bugs/index.php?49789
https://savannah.gnu.org/bugs/index.php?49787
https://savannah.gnu.org/bugs/index.php?49786
https://savannah.gnu.org/bugs/index.php?49785


/Ole


