nmh-workers

Re: [Nmh-workers] Understanding nmh (aka. What's the goal)


From: Valdis . Kletnieks
Subject: Re: [Nmh-workers] Understanding nmh (aka. What's the goal)
Date: Thu, 02 Dec 2010 03:10:25 -0500

On Wed, 01 Dec 2010 21:39:37 PST, Jon Steinhart said:

> One of the big pieces that's needed is a modern mail parser.  As per earlier
> emails, I think that this is complex enough that it's a job for lex and yacc.
> 
> A big thing that someone could do to help me with this would be to collect all
> of the various grammar into a single document.  I'm willing to write the code
> for it, but I'm not a complete rfc junkie and find the whole thing hard to
> read.  If some of you could slog through the rfcs and collect this stuff we
> could make some real progress.

There are several ways to go here.  The actual grammar is (mostly) in RFC5322,
except for the MIME headers, which are mostly simple enough that a simple
ad-crock parser should be able to deal with them: just "Fieldname: [tag=value]*"
for the most part.  Parsing the tag/value pairs is easy - the semantics are a
pain because they're often context-sensitive (ignore this tag unless this other
tag doesn't say 'inline', etc...).
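
To give a feel for how small the "easy" half is, here is a rough Python sketch of
such an ad hoc parser.  It is only an illustration, not nmh code: the names
parse_params and split_unquoted are made up for this example, and it assumes the
header has already been unfolded onto one line.  It ignores RFC822-style
comments and RFC2231 continuations, and it does none of the context-sensitive
semantics mentioned above.

```python
import re


def split_unquoted(text, sep):
    """Split text on sep, ignoring any sep inside a double-quoted string."""
    pieces, current = [], []
    in_quotes = escaped = False
    for ch in text:
        if escaped:                      # character after a backslash is literal
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    return pieces


def parse_params(header_line):
    """Turn 'Name: value; tag=value; ...' into (name, value, params)."""
    name, _, body = header_line.partition(":")
    parts = split_unquoted(body, ";")
    params = {}
    for part in parts[1:]:
        key, sep, val = part.partition("=")
        if not sep:
            continue                     # tolerate junk such as a stray ';'
        val = val.strip()
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            val = re.sub(r"\\(.)", r"\1", val[1:-1])   # drop quotes and escapes
        params[key.strip().lower()] = val
    return name.strip(), parts[0].strip(), params
```

Something like parse_params('Content-Disposition: attachment; filename="a;b.txt"')
would hand back the disposition and a dict of tags; deciding what those tags
*mean* in context is still the hard part.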

Large chunks of the grammar are there only for crufty corner cases (if anybody
is interested, read section 3.1.4 of RFC822 for an example of its awesomeness).

Just because I remembered seeing it before, here's an RFC822 address
validator, done as one Perl regexp:

http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Yes, you really want to use lex/yacc to build a parser instead. :)

And then there's the question of what to do when certain other common MUAs and
MTAs manage to ignore the RFCs and produce something ugly - although the biggest
offender is still the various poorly written spamware out there.  But since no
spam filter is 100% effective, we *do* have to be robust in the face of crap.

Unfortunately, parsers created from a BNF or similar tend to be a tad
brittle when recovering from syntactically incorrect input (anybody ever
had a missing ) or } leave an error message 500+ lines away from the
actual error?).
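
One possible way to limit the damage, sketched here in Python purely as an
illustration, is to resynchronize at header-line boundaries before any real
grammar is applied, so that one mangled line cannot poison the fields after it.
The function name split_headers and the choice to collect bad lines in a
separate list are assumptions made for this example, not anything nmh does
today.

```python
def split_headers(raw):
    """Return (headers, rejects); a bad line is set aside and parsing goes on."""
    headers, rejects = [], []
    last_good = False                    # was the previous field line accepted?
    for line in raw.splitlines():
        if line == "":
            break                        # blank line ends the header block
        if line[0] in " \t":             # folded continuation of the previous line
            if last_good:
                name, value = headers[-1]
                headers[-1] = (name, value + " " + line.strip())
            else:
                rejects.append(line)     # continuation of something already bad
            continue
        name, sep, value = line.partition(":")
        name = name.strip()
        if not sep or not name or any(c.isspace() for c in name):
            rejects.append(line)         # not "Fieldname: ...", skip just this line
            last_good = False
            continue
        headers.append((name, value.strip()))
        last_good = True
    return headers, rejects
```

Each (name, value) pair can then go to a stricter per-field parser, and a
failure there stays confined to that one field instead of surfacing 500 lines
later.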
