On 01/02/11 22:13, Todd Jackson wrote:
On Tue, Jan 4, 2011 at 2:07 AM, John Collins (personal) <address@hidden> wrote:
On 04/01/11 04:27, Todd Jackson wrote:
I had something weird happen
with my gnubatch installation today: all of the jobs
disappeared from the gbch-q display and stopped
running. If I look in /usr/local/var/gnubatch the
SPXXXXXXXX SOXXXXXXXX and ERXXXXXXXX files are still
there. Is there any way to get the system to
recognize these jobs again without having to resubmit
them? Any ideas as to what could have triggered this
problem? I'm running gnubatch 1.2 on CentOS 5.5,
x86_64, and I've been running gnubatch for a few
months now without having a problem like this.
The last thing that I'm aware of that was being done
with gnubatch was that someone had just attempted to
submit a job with the following command:
gbch-r -r minutes:20 -A- -a --userName\ swg -a --userPass\ xxxxxxxx -a --textMap\ /usr/local/etc/swg_10.0_text_map.xml.gz -a --dailySize\ 10000000 -a --genUsers True -a 192.168.46.98 192.168.46.99 -h Summary\ Daily\ Performance /usr/local/bin/genswg.py
The person who was attempting to submit this job got a
few syntax errors when they issued this command, and
right after that we noticed that the gnubatch
scheduler had no jobs queued.
Thanks in advance.
-- Todd
Was there anything in the log file (in the spool directory)
saying that the scheduler had terminated?
Was there perhaps something else running that trampled on
the shared memory? I've seen this a good few times. The
trouble is that you create the key and get a handle number,
and whilst it is usually consecutive to the last handle
number, it often isn't. Some software just assumes it is, so
it can grab "your" shared memory instead of its own, with
unfortunate effects.
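
One way to look into the shared-memory theory on Linux is to list the System V segments with `ipcs -m` and see which ones belong to the scheduler's account. The helper below is only a sketch: it assumes the util-linux column layout (key, shmid, owner, perms, bytes, nattch), and the function name `shm_by_owner` and the account name `gnubatch` are illustrative assumptions, not something stated in this thread.

```shell
#!/bin/sh
# Filter `ipcs -m` output (Linux util-linux layout assumed:
# key shmid owner perms bytes nattch [status]) down to one owner's segments.
# Prints: shmid key bytes nattch
shm_by_owner() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2, $1, $5, $6 }'
}

# Example (the account name "gnubatch" is an assumption; use whichever
# user owns the scheduler on your system):
#   ipcs -m | shm_by_owner gnubatch
```

Comparing this listing from a healthy system with one taken after the jobs vanish might show whether a segment disappeared or changed. `ipcs -m -p` additionally reports the creator and last-operator PIDs, which could help identify another program touching the segment.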
It's really a bit hard from here to work out.
--
John Collins address@hidden
Phone: +44 (0)1707 883174 Mobile: +44 (0)7958 387247
Work Phone: +44 (0)1707 886110
3 Mandeville Rise, Welwyn Garden City, Herts, AL8 7JT,
UK
I had what seems like the exact same issue again with
gnubatch losing all of the jobs from the queue. I looked at the
joblog and did not see anything in there that looked out of the
ordinary, just a bunch of jobs that were running. One thing
that I did notice this time is that the gbch-q display was not
showing any queued jobs at all, but the jobs were still running,
at least they were until I did a gbch-quit and gbch-start. Not
sure if that is any help at all, but since this has happened to
me twice now in the past month I'm wondering if there is any way
to rescue the queued jobs after this has happened. Any
suggestions would be appreciated.
-- Todd
This is all very strange; I've never heard of this happening before.
You could try running "gbch-cjlist" to dump off the jobs.
I must get the XML job queues and saved jobs bit running.
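
Since the running jobs stopped after the gbch-quit and gbch-start, it may be worth taking a copy of the spool directory before restarting anything, so the job files are still available for a later recovery attempt. This is a rough sketch using only standard tools: the function name `backup_spool` and the archive path are hypothetical, and the arguments to gbch-cjlist are deliberately not shown because they are not given in this thread (check its manual page).

```shell
#!/bin/sh
# Archive a spool directory into a gzip-compressed tarball.
# $1 = spool directory, $2 = output archive path
backup_spool() {
    spool_dir=$1
    archive=$2
    tar -czf "$archive" -C "$(dirname "$spool_dir")" "$(basename "$spool_dir")"
}

# Example (archive path is hypothetical):
#   backup_spool /usr/local/var/gnubatch /root/gnubatch-spool.tar.gz
```

A copy taken while the scheduler is still running may catch files mid-update, so treat it as a safety net rather than a guaranteed consistent snapshot.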
--
John Collins address@hidden Xi Software Ltd www.xisl.com
Phone: +44 (0)1707 886110
Home Phone: +44 (0)1707 883174
Mobile: +44 (0)7958 387247 (address@hidden)
Trading Address 3 Mandeville Rise, Welwyn Garden City,
Herts, AL8 7JT, UK
Registered in England Company Number 01977148 VAT GB 403
9239 64 R/O: 2 Mill Road, Haverhill, Suffolk, CB9 8BD