Re: [Nmh-workers] nmh internals: full MIME integration
From: Ralph Corderoy
Subject: Re: [Nmh-workers] nmh internals: full MIME integration
Date: Sat, 26 Jul 2014 20:07:12 +0100
Hi Ken,
> > If we're having lazy evaluation of MIME parts, which is good, can it
> > also cover the headers? `pick --list-id <address@hidden>' isn't
> > concerned with decoding Subject and all those Received headers. It
> > may not sound like much, but we have folders with tens of thousands
> > of emails. get_header() could note minimal details of each header
> > it comes across whilst searching for the List-ID but not bother too
> > much about their contents.
>
> I wasn't actually thinking of decoding the headers for things like
> MIME content, at least upon read (I assume you're talking about RFC
> 2047 encoding
No, less than that. I'm hoping this change will also improve searching
for split-line headers.
$ grep -A 1 '^foo:' `mhpath .`
foo: bar
xyzzy
$ pick --foo 'bar xyzzy' .
pick: no messages match specification
$ pick --foo 'bar xyzzy' .
1 hit
$
pick may have changed a bit since the above version, but I still
shouldn't have to care how much whitespace continuation lines are
indented. Shouldn't pick be matching against a logical view of a single
line, with `CRLF WS*' becoming a single space?
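To make that "logical view" concrete, here is a minimal sketch in C of the unfolding step. It assumes the header's value has already been isolated as a NUL-terminated string; unfold_header() is a hypothetical name, not an existing nmh function, and it only does the `CRLF WS*' to single space collapse suggested above, nothing more (whitespace before the line break is left alone).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of nmh: copy src to dst, replacing each
 * line break (LF or CRLF) plus any spaces/tabs that follow it with one
 * space.  A line break at the very end of src is dropped.  Returns the
 * length written, excluding the terminating NUL; output is truncated
 * if dst is too small. */
size_t unfold_header(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && dstlen > 0);

    while (*src != '\0' && n + 1 < dstlen) {
        if (*src == '\n' || (src[0] == '\r' && src[1] == '\n')) {
            src += (*src == '\r') ? 2 : 1;      /* step over the line break */
            while (*src == ' ' || *src == '\t') /* and the indentation */
                src++;
            if (*src == '\0')
                break;                          /* trailing break: no space */
            dst[n++] = ' ';
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

With that, a value of "bar", a line break, some indentation and "xyzzy" comes out as "bar xyzzy" however deep the indentation was, so the pattern would only ever need the one space.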
> Okay, I guess I could see that. The normal case would be to decode
> the contents completely
Yep, to UTF-8 single lines?
> > the kind of overhead that would be nice to see done only on demand.
>
> I'm still skeptical that you'd even notice (it isn't 1988 anymore!),
> but I think if the API was well designed it should be easy to
> implement.
Well, you might be thinking the 2047-decoding might not make a lot of
difference, whereas I'm thinking a block can be read into a page-aligned
buffer that has an \n beyond it as a sentinel, then check for
/foo[ \t]*:/i, ignore any non-foo headers, hunt for the next \n and repeat
if it's not the sentinel, else read another block and try again. Stop
if no more blocks or \n\n. The detail's a bit more complex but there's
no allocation and copying for headers seen along the way; they'll be
found when they're looked for in turn. The file's blocks aren't being
modified so no copy-on-write's occurring.
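A rough sketch of that scan in C, for one block only. It is an illustration under assumptions, not nmh code: find_header() is a made-up name, the caller is assumed to have left one writable byte after the block for the sentinel, and the "read another block and try again" part is left out, so a header that straddles a block boundary isn't handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not nmh code.  buf holds len bytes of a message
 * and buf[len] must be writable; it is set to '\n' as a sentinel so the
 * newline hunt needs no bounds check.  Returns a pointer just past the
 * colon of the first header called `name` (case-insensitive, optional
 * spaces/tabs before the colon), or NULL if the header section or the
 * block ends first.  Nothing is allocated or copied. */
const char *find_header(char *buf, size_t len, const char *name)
{
    size_t namelen = strlen(name);
    char *p = buf;
    char *end = buf + len;

    assert(namelen > 0);
    *end = '\n';                        /* the sentinel */

    while (p < end) {
        if (*p == '\n' || (p[0] == '\r' && p[1] == '\n'))
            return NULL;                /* blank line: end of headers */

        if ((size_t)(end - p) > namelen) {
            size_t i = 0;
            while (i < namelen &&
                   tolower((unsigned char)p[i]) ==
                   tolower((unsigned char)name[i]))
                i++;
            if (i == namelen) {
                const char *q = p + namelen;
                while (*q == ' ' || *q == '\t')  /* stops at sentinel */
                    q++;
                if (*q == ':')
                    return q + 1;
            }
        }

        /* Not the wanted header, or a continuation line: hunt for the
         * next newline.  The sentinel guarantees the loop terminates. */
        while (*p != '\n')
            p++;
        p++;                            /* first byte of the next line */
    }
    return NULL;                        /* ran off the end of the block */
}
```

Continuation lines start with whitespace, so they never match the name and are skipped like any other uninteresting line. A NULL return covers both "no such header" and "block exhausted"; a real version would need to tell those apart to know whether to read the next block.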
I agree modern hardware is quick; this is on about 22,500 emails.
$ LC_ALL=C \time -v perl -e 'for (<[0-9]*>) {sysopen F, $_, 0 and sysread F, $b, 4096 or die}'
Command being timed: "perl -e for (<[0-9]*>) {sysopen F, $_, 0 and sysread F, $b, 4096 or die}"
User time (seconds): 0.40
System time (seconds): 0.52
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.93
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 24112
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1688
Voluntary context switches: 1
Involuntary context switches: 19
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
$
It would be nice if a simple pick didn't add much to that roughly
one-second 100%-CPU-utilisation wall-clock time. :-) Running pick
tends to be an iterative process where the query is honed.
Cheers, Ralph.