nmh-workers

Re: [Nmh-workers] new command lacks lock


From: Ken Hornstein
Subject: Re: [Nmh-workers] new command lacks lock
Date: Thu, 17 Oct 2013 09:57:35 -0400

>A script that was using mark(1) was sluggish on a local-disk folder with
>approx. 6,500 emails and many extra files after rmm, refile, etc.
>Deleting those, or using the script on a directory with less inodes, was
>a lot snappier.

It would be nice to understand what was so sluggish; a system call trace
would be interesting.  Sadly, I think we're limited by the operating
system in many cases; some Unix filesystems simply don't behave well
when dealing with a lot of files in a single directory.  Of course, if we
can do better I'm always open to that.

>I've also been caught out by mark's behaviour in the past, e.g.
>
>    $ ls -A
>    $ touch {1..5}
>    $ mark -s lp -a all
>    $ ls -A
>    1  2  3  4  5  .mh_sequences
>    $ cat .mh_sequences
>    lp: 1-5
>    $ rm 2
>    $ cat .mh_sequences
>    lp: 1-5
>    $ mark -s lp -d 4
>    $ cat .mh_sequences
>    lp: 1 3 5
>    $ ls -A
>    1  3  4  5  .mh_sequences
>    $

So it's important to understand what is happening here.  The real culprit
is folder_read().  Specifically, folder_read() does the following things:

- Calls readdir() on the directory to build a 'struct msgs' data structure
  that contains all of the information about a folder.  Specifically, there
  is a bit vector that contains information about each message within a
  folder; whether or not it exists, if it's selected, and what sequences
  it is in.
- Calls seq_read() to read the sequences and set the necessary bits in
  the message structure.  However ... and this is the key point ... if the
  message doesn't exist, then it never gets added to the sequence bit vector.
  (The exception here is the "cur" sequence, which has special handling).
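
To make the two steps above concrete, here is a minimal Python model of
the behaviour.  It is not nmh's C code: the names model_folder_read()
and model_seq_save() are made up for illustration, the "cur" special
case is ignored, and sequences are written back as plain lists of
numbers without collapsing ranges.  The demo() function replays the
quoted shell session.

```python
import os
import tempfile


def parse_sequence(spec):
    """Expand a spec such as "1-5" or "1 3 5" into a set of ints."""
    numbers = set()
    for token in spec.split():
        first, _, last = token.partition("-")
        numbers.update(range(int(first), int(last or first) + 1))
    return numbers


def model_folder_read(folder):
    """Hypothetical stand-in for folder_read() followed by seq_read()."""
    # Step 1: scan the directory; only all-digit names count as messages.
    existing = {
        int(name)
        for name in os.listdir(folder)
        if name.isascii() and name.isdigit()
    }
    # Step 2: read the sequences, keeping only messages that exist.
    sequences = {}
    seq_path = os.path.join(folder, ".mh_sequences")
    if os.path.exists(seq_path):
        with open(seq_path) as handle:
            for line in handle:
                name, sep, spec = line.partition(":")
                if sep:
                    sequences[name.strip()] = parse_sequence(spec) & existing
    return existing, sequences


def model_seq_save(folder, sequences):
    """Hypothetical stand-in for rewriting the sequences file."""
    with open(os.path.join(folder, ".mh_sequences"), "w") as handle:
        for name, numbers in sequences.items():
            if numbers:
                members = " ".join(str(n) for n in sorted(numbers))
                handle.write(f"{name}: {members}\n")


def demo():
    """Replay the quoted session and return the final .mh_sequences text."""
    with tempfile.TemporaryDirectory() as folder:
        for number in range(1, 6):                      # touch {1..5}
            open(os.path.join(folder, str(number)), "w").close()
        with open(os.path.join(folder, ".mh_sequences"), "w") as handle:
            handle.write("lp: 1-5\n")                   # mark -s lp -a all
        os.remove(os.path.join(folder, "2"))            # rm 2

        _, sequences = model_folder_read(folder)        # message 2 vanishes here
        sequences["lp"].discard(4)                      # mark -s lp -d 4
        model_seq_save(folder, sequences)

        with open(os.path.join(folder, ".mh_sequences")) as handle:
            return handle.read()
```

In this model demo() returns "lp: 1 3 5" followed by a newline.  Message
2 disappears when the folder is read, not when message 4 is removed from
the sequence, so any command that rewrites the file would have the same
effect.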

So it's not mark specifically that does this; it's any command which
ends up rewriting the sequence file (which in practice is most of them).

It seems that this behavior is generally what you want: if the
sequences are out of whack and refer to messages that don't exist,
they are silently and automatically cleaned up.  I think history shows
that this was the correct decision; stale message references in the
sequences file have not been a problem for either MH or nmh.

Changing this would be hard.  It wouldn't be bad for messages between
the low and high message numbers, but messages outside that range
would be hard to deal with: the data structures are all arrays that
start at the low message number and contain only enough elements to
reach the high message number, so handling THAT case would involve a
bunch of reallocations and copying.  In addition, allowing nonexistent
messages to stick around in sequences would mean creating a new
command to clean up the sequences file.
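
A rough Python sketch of why the out-of-range case is the awkward one,
assuming only what is described above: per-message state lives in an
array indexed by (message number - low message number).  The class and
method names are hypothetical and are not taken from the nmh sources.

```python
class MsgStats:
    """Per-message sequence membership for messages lowmsg..hghmsg."""

    def __init__(self, lowmsg, hghmsg):
        self.lowmsg = lowmsg
        self.hghmsg = hghmsg
        # One slot per message number in the range, nothing outside it.
        self.slots = [set() for _ in range(hghmsg - lowmsg + 1)]

    def add(self, msgnum, sequence):
        """Record that msgnum is in sequence; fails outside the range."""
        if not self.lowmsg <= msgnum <= self.hghmsg:
            raise IndexError(f"message {msgnum} is outside the array")
        self.slots[msgnum - self.lowmsg].add(sequence)

    def grow(self, lowmsg, hghmsg):
        """Allocate a wider array and copy every old slot into place."""
        lowmsg = min(lowmsg, self.lowmsg)
        hghmsg = max(hghmsg, self.hghmsg)
        slots = [set() for _ in range(hghmsg - lowmsg + 1)]
        for offset, members in enumerate(self.slots):
            slots[self.lowmsg + offset - lowmsg] = members
        self.lowmsg, self.hghmsg, self.slots = lowmsg, hghmsg, slots
```

A message number between lowmsg and hghmsg always has a slot, whether or
not the file exists, so keeping its sequence membership would be cheap.
A stale reference to, say, message 25 in a folder spanning 10-20 has no
slot at all until grow() reallocates and copies.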

In short: changing this behavior would be a lot of effort for what I
must conclude is an extreme corner case encountered by someone who
should know better :-)

--Ken


