sks-devel

Re: [Sks-devel] Adding DB_INIT_LOCK to sks-keyserver (revisited)


From: Jeff Johnson
Subject: Re: [Sks-devel] Adding DB_INIT_LOCK to sks-keyserver (revisited)
Date: Fri, 26 Feb 2010 10:43:02 -0500

On Feb 26, 2010, at 4:25 AM, Kim Minh Kaplan wrote:

> Jeff Johnson writes:
> 
>> On Feb 24, 2010, at 6:01 PM, Kim Minh Kaplan wrote:
>> 
>>> Jeff Johnson:
>>> 
>>>> The PTree deadlock is easily reproduced, and (with db_stat) a
>>>> detailed deadlock diagnosis could be attempted.
>>> 
>>> How would that be?
>>> 
>> 
>> Deadlock diagnosis is described in chapter 11 here:
>> 
>>      http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/index.html
> 
> Ah, my question was too vague: how would you easily reproduce the
> deadlock?  Do you have a procedure that I can follow and be sure to hit
> this deadlock?  If so please describe it, I'll see if I can do anything
> about it.
> 

Ah, got it.

From the 3 deadlocks I've seen, I'd guess that any moderately large key load
(>500 keys) is likely to encounter a partition tree deadlock if DB_INIT_LOCK
is enabled.

(untried, will test) Rebuilding the partition tree might also reproduce it.
OTOH, that might be a different code path; I'm still sorting out the
implementation.
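For reference, the db_stat lock diagnosis mentioned earlier can be driven
along these lines (a sketch only; the environment path below is an
assumption, point it at wherever your PTree environment actually lives):

```shell
#!/bin/sh
# Sketch: inspect Berkeley DB lock state for an SKS PTree environment.
# ENV_DIR is an assumption; adjust it to your installation.
ENV_DIR=/var/lib/sks/PTree

# Bail out quietly if the Berkeley DB utilities are not installed,
# or if the environment directory is not present on this machine.
command -v db_stat >/dev/null 2>&1 || { echo "db_stat not found"; exit 0; }
[ -d "$ENV_DIR" ] || { echo "no such environment: $ENV_DIR"; exit 0; }

db_stat -h "$ENV_DIR" -CA   # full lock region dump: lockers, objects, waiters
db_stat -h "$ENV_DIR" -Cl   # just the lockers
db_stat -h "$ENV_DIR" -Co   # just the locked objects
```

Running it while the daemon is wedged should show which lockers are
waiting on which page locks.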


>>> As part of recovering from the deadlock, some of the indices were
>>> damaged (checked using db_verify). The fixup for that is
>>>     db_dump ... | db_load ...
>>> However, db_dump has no guarantee of preserving all data; it's just
>>> "best effort". In my case, the data loss showed up with a
>>> DB_PAGE_NOTFOUND, which (because it was easiest) led to a full reload
>>> of the database from a dump.
> 
> Does that mean that db_recover did not work?  That's what I would use.
> 

I believe I did db_recover but can't swear to it. I will try to confirm
once I catch a few more deadlocks.

Meanwhile, db_recover is only as good as the logs that are present, often
only back to the last hourly checkpoint. If logs are being automatically
removed, damage to data in the secondary indices earlier than the last
checkpoint cannot be repaired without db_dump/db_load, afaict.
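Concretely, the repair ladder runs from least to most drastic, something
like this (a sketch; the environment path and database file name are
assumptions, not the actual sks layout, and -r is db_dump's best-effort
salvage mode):

```shell
#!/bin/sh
# Sketch of the repair ladder: recover, verify, then salvage if needed.
# DB_DIR and DB_FILE are assumptions; adjust to your installation.
DB_DIR=/var/lib/sks/KDB
DB_FILE=key

# Bail out quietly if the tools or the environment are absent here.
command -v db_recover >/dev/null 2>&1 || { echo "db tools not found"; exit 0; }
[ -d "$DB_DIR" ] || { echo "no such environment: $DB_DIR"; exit 0; }

# 1. Run recovery: replays whatever logs are still present,
#    i.e. typically only back to the last checkpoint.
db_recover -h "$DB_DIR" -v

# 2. Check whether the indices are actually intact afterwards.
if db_verify -h "$DB_DIR" "$DB_FILE"; then
    echo "$DB_FILE verifies clean"
else
    # 3. Last resort: salvage what db_dump can still read and reload.
    #    With -r this is "best effort", not a guaranteed-complete copy.
    db_dump -r -h "$DB_DIR" "$DB_FILE" | db_load -h "$DB_DIR" "$DB_FILE.rebuilt"
fi
```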

If the secondary indices get out of sync with the primary key store, there's
a class of lookup-failure "weirdness" that forces recreating the key
database from a dump.

With DB->associate given DB_TRUNCATE (and/or DB_CREATE after removal of a
file), Berkeley DB will do a sequential traversal of the primary store,
passing each primary record through a callback, from which the secondary
store(s) can be recreated. That guarantees consistent/reliable
secondary -> primary mappings without recreating the entire database.
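In C that rebuild looks roughly like this (a sketch, untested against sks
itself: the file names, the key layout, and the extract_keyid callback are
assumptions for illustration, not the actual sks schema):

```c
/* Sketch: rebuild a damaged secondary index from the primary store.
 * Compile with: cc rebuild.c -ldb
 * File names and the secondary-key extraction are assumptions. */
#include <stdlib.h>
#include <string.h>
#include <db.h>

/* Callback: derive the secondary key from each primary record.
 * Here we pretend the first 8 bytes of the data are a key id. */
static int
extract_keyid(DB *secondary, const DBT *pkey, const DBT *pdata, DBT *skey)
{
    (void)secondary; (void)pkey;
    if (pdata->size < 8)
        return DB_DONOTINDEX;   /* skip records with no usable key id */
    memset(skey, 0, sizeof(*skey));
    skey->data = pdata->data;   /* point into the primary record */
    skey->size = 8;
    return 0;
}

int
main(void)
{
    DB *primary, *secondary;
    int ret;

    if (db_create(&primary, NULL, 0) != 0 ||
        db_create(&secondary, NULL, 0) != 0)
        return EXIT_FAILURE;

    if (primary->open(primary, NULL, "key.db", NULL,
                      DB_BTREE, DB_CREATE, 0644) != 0)
        return EXIT_FAILURE;

    /* DB_TRUNCATE discards the damaged secondary on open... */
    if (secondary->open(secondary, NULL, "keyid.db", NULL,
                        DB_BTREE, DB_CREATE | DB_TRUNCATE, 0644) != 0)
        return EXIT_FAILURE;

    /* ...and DB_CREATE here makes associate() walk the whole primary,
     * feeding every record through the callback to repopulate it. */
    ret = primary->associate(primary, NULL, secondary,
                             extract_keyid, DB_CREATE);

    secondary->close(secondary, 0);
    primary->close(primary, 0);
    return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Pointing skey->data into pdata is the usual pattern for these callbacks:
the memory only has to stay valid until the callback returns.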

There's also a check during access that detects secondary -> primary
lookups that have gone AWOL for any reason.

And please forgive my professional interest in Berkeley DB pathologies.
Recreating from a dump after weird corner-case failures is probably
adequate for the existing sks keyserver daemons.

hth

73 de Jeff

> -- 
> Kim Minh




