emacs-devel

Re: [PATCH 2/2] nnmaildir: Use a 'num' file, instead of a directory


From: Ted Zlatanov
Subject: Re: [PATCH 2/2] nnmaildir: Use a 'num' file, instead of a directory
Date: Thu, 12 Aug 2010 19:15:24 -0500
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

On Sat, 26 Jun 2010 13:33:42 -0400 address@hidden (Paul Jarc) wrote: 

>> I have recently experienced slow nnmaildir performance

PJ> I suspect the performance problem is with the nov/ directory, rather
PJ> than num/.  With the current nov/ structure, it's quick and easy to
PJ> map from filenames to article numbers, but I think we usually need the
PJ> inverse operation.  As it is, to map from an article number to a
PJ> filename, we need to read the contents of all the nov/ files.  That's
PJ> done just once and the results are cached, so it's not too horrific,
PJ> but we still need to check timestamps to see if the files have
PJ> changed, and it takes a lot of memory for large groups.

PJ> It's been a while since I looked at it, though--there may be some
PJ> operations where we do need to go from the filename to the number.  So
PJ> then it might be useful to add hard links so each nov/ file could be
PJ> accessed by either its article number or filename.  The filename would
PJ> also have to be added to the contents of those files somehow.
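The following is a minimal Python sketch of the cost Paul describes (nnmaildir itself is Emacs Lisp). It builds the number-to-filename map by reading every nov/ file once, then caches the result. It assumes, hypothetically, that each nov/ file is named after its message and stores the article number on its first line. The real nnmaildir file format differs. `NovIndex` and `filename_for` are illustrative names. The staleness check uses the directory's mtime, which catches files that were added or removed but not files rewritten in place.

```python
import os
from dataclasses import dataclass, field

# Hypothetical layout: nov/<FILENAME> whose first line is the article number.
# This only models the lookup cost; it is not nnmaildir's real format.

@dataclass
class NovIndex:
    nov_dir: str
    dir_mtime: float = -1.0
    num_to_name: dict = field(default_factory=dict)

    def _rebuild(self) -> None:
        """Read every nov/ file once to build the inverse (number -> name) map."""
        mapping = {}
        for name in os.listdir(self.nov_dir):
            with open(os.path.join(self.nov_dir, name)) as f:
                mapping[int(f.readline())] = name
        self.num_to_name = mapping

    def filename_for(self, num: int):
        """Return the filename for article NUM, or None if it is unknown."""
        # Capture the mtime before rebuilding, so a change made during the
        # scan forces another rebuild on the next call.
        mtime = os.stat(self.nov_dir).st_mtime
        if mtime != self.dir_mtime:
            self._rebuild()
            self.dir_mtime = mtime
        return self.num_to_name.get(num)
```

Even with the cache, the first lookup in a group reads every file, and the map holds one entry per article. This is the memory cost Paul mentions for large groups.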

Thanks for explaining, Paul.  I wanted to respond to you and John
carefully, so it took me a while.  Sorry about that.

I looked at the nnmaildir code.  Keeping in mind that the majority of Gnus
users don't need concurrent access to their Maildirs, I have a proposal.

Regarding John's patch, I think it's good to avoid creating many extra
files.  Inodes can be expensive, and many filesystems handle directories
containing many files poorly.  But it should be a user option on the
nnmaildir backend (called `make-concurrent', for instance), not a
complete switchover as it is now.  Even so, a `num' file seems
superfluous: if we know concurrent access is not an issue, we can keep a
single database.  We also don't have to worry about users going back and
forth between concurrent and non-concurrent access.  If they do, we can
complain loudly and maybe provide a slow bidirectional conversion
function.
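Here is a minimal sketch of the "complain loudly" behavior, written in Python for illustration. On first use, a group records which layout it uses. A later request for the other layout raises an error instead of silently mixing formats. The marker file `.nnmaildir-layout`, the `check_layout` function, and `LayoutMismatch` are all hypothetical. The `concurrent` flag stands in for the proposed `make-concurrent' option.

```python
import os

# Hypothetical marker file recording which on-disk layout a group uses.
MARKER = ".nnmaildir-layout"


class LayoutMismatch(Exception):
    """Raised when a group's stored layout differs from the requested one."""


def check_layout(group_dir: str, concurrent: bool) -> None:
    """Record the layout on first use; refuse to switch layouts later."""
    wanted = "concurrent" if concurrent else "single-db"
    path = os.path.join(group_dir, MARKER)
    if not os.path.exists(path):
        with open(path, "w") as f:
            f.write(wanted + "\n")
        return
    with open(path) as f:
        found = f.read().strip()
    if found != wanted:
        raise LayoutMismatch(
            f"{group_dir} uses the {found} layout but {wanted} was requested; "
            "convert the group before switching"
        )
```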

Regarding the NOV database in .nnmaildir/nov/MESSAGE-ID, the goal is to
map each MESSAGE-ID to the article number N currently stored inside its
file.  Links would also burden the filesystem, and IMO they are not a
good improvement, since scanning the directory repeatedly is expensive.
I think the current strategy should be kept as is and turned on only if
the user asks for concurrency (as above).

The non-concurrent alternative should be to keep a single NOV and num
database in memory for the active group and flush it to disk as needed.
The database can be as simple as one line at the beginning for the
version, followed by the NOV vectors in order, one per line.  Appending
is trivial (read the last line to get the maximum article number, then
append a line), and rewriting the file is only necessary when deleting
an article.  I think this would speed up nnmaildir operations
significantly.
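The single-file layout can be sketched in Python as follows. The sketch assumes each NOV line is tab-separated with the article number as its first field, as in standard NOV overview lines. It also assumes a hypothetical version header `nnmaildir-nov-db 1`. The function names are illustrative. The sketch works directly on the file and leaves out the in-memory caching and deferred flushing described above.

```python
import os

VERSION_LINE = "nnmaildir-nov-db 1"  # hypothetical version header


def last_line(path: str) -> str:
    """Return the final line of a file by scanning backwards from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        pos = end - 1  # skip the trailing newline
        while pos > 0:
            f.seek(pos - 1)
            if f.read(1) == b"\n":
                break
            pos -= 1
        start = max(pos, 0)
        f.seek(start)
        return f.read(end - start).decode().rstrip("\n")


def append_article(db_path: str, nov_fields: list) -> int:
    """Append one NOV line numbered one past the last line's number."""
    if not os.path.exists(db_path):
        with open(db_path, "w") as f:
            f.write(VERSION_LINE + "\n")
    last = last_line(db_path)
    num = 1 if last == VERSION_LINE else int(last.split("\t", 1)[0]) + 1
    with open(db_path, "a") as f:
        f.write("\t".join([str(num), *nov_fields]) + "\n")
    return num


def load(db_path: str) -> dict:
    """Read the whole database into memory: article number -> NOV line."""
    with open(db_path) as f:
        header, *rows = f.read().splitlines()
    if header != VERSION_LINE:
        raise ValueError(f"unsupported database version: {header!r}")
    return {int(r.split("\t", 1)[0]): r for r in rows}


def delete_article(db_path: str, num: int) -> None:
    """Rewrite the file without article NUM (the only non-append write)."""
    with open(db_path) as f:
        header, *rows = f.read().splitlines()
    kept = [r for r in rows if int(r.split("\t", 1)[0]) != num]
    tmp = db_path + ".tmp"
    with open(tmp, "w") as f:
        f.write("\n".join([header, *kept]) + "\n")
    os.replace(tmp, db_path)  # atomic rename on POSIX filesystems
```

One caveat applies to "read the last line to get the max number." If the highest-numbered article has been deleted, the next append reuses its number. A real design would probably persist a high-water mark, for example in the version line.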

I'd like to know your opinion, since you wrote so much of nnmaildir.el
and have experience supporting it.  Based on what I've heard on the Gnus
mailing lists over the last 8 years, I am certain that concurrent access
is not an issue for the majority of Gnus users today.  But do you think
the current concurrent system can be improved significantly, rather than
doing what I propose?  Should we look at different storage, maybe SQLite,
Berkeley DB, or sparse files, for those databases?  Is there anything in
the Emacs core that can help us (hence the CC to emacs-devel), or can
anything be added to Emacs to that end?
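With SQLite, a single table could index both filename-to-number and number-to-filename lookups. The second of these is the direction Paul says currently requires reading every nov/ file. Below is a minimal sketch using Python's standard `sqlite3` module. The schema and function names are hypothetical. `AUTOINCREMENT` is used so that SQLite never reassigns the number of a deleted article.

```python
import sqlite3


def open_group_db(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical per-group article database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        " num INTEGER PRIMARY KEY AUTOINCREMENT,"  # article number, never reused
        " filename TEXT NOT NULL UNIQUE,"          # UNIQUE also creates an index
        " nov TEXT NOT NULL)"
    )
    return conn


def add_article(conn: sqlite3.Connection, filename: str, nov: str) -> int:
    """Insert an article and return the number SQLite assigned to it."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO articles (filename, nov) VALUES (?, ?)", (filename, nov)
        )
    return cur.lastrowid


def num_for_filename(conn: sqlite3.Connection, filename: str):
    row = conn.execute(
        "SELECT num FROM articles WHERE filename = ?", (filename,)
    ).fetchone()
    return row[0] if row else None


def filename_for_num(conn: sqlite3.Connection, num: int):
    row = conn.execute(
        "SELECT filename FROM articles WHERE num = ?", (num,)
    ).fetchone()
    return row[0] if row else None


def remove_article(conn: sqlite3.Connection, num: int) -> None:
    with conn:
        conn.execute("DELETE FROM articles WHERE num = ?", (num,))
```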

Thanks for your help
Ted
