Re: [Nmh-workers] Understanding nmh (aka. What's the goal)
From: Valdis . Kletnieks
Subject: Re: [Nmh-workers] Understanding nmh (aka. What's the goal)
Date: Thu, 02 Dec 2010 03:10:25 -0500
On Wed, 01 Dec 2010 21:39:37 PST, Jon Steinhart said:
> One of the big pieces that's needed is a modern mail parser. As per earlier
> emails, I think that this is complex enough that it's a job for lex and yacc.
>
> A big thing that someone could do to help me with this would be to collect all
> of the various grammar into a single document. I'm willing to write the code
> for it, but I'm not a complete rfc junkie and find the whole thing hard to
> read. If some of you could slog through the rfcs and collect this stuff we
> could make some real progress.
There are several ways to go here. The actual grammar is (mostly) in RFC5322,
except for the MIME headers, which are mostly simple enough that a simple
ad-crock parser should be able to deal with them: just "Fieldname: [tag=value]*"
for the most part. Parsing the tag/value pairs is easy - the semantics are a
pain because they're often context-sensitive (ignore this tag unless this other
tag doesn't say 'inline', etc...).
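
To make the "Fieldname: [tag=value]*" point concrete, here is a minimal sketch of such an ad hoc parser in Python. The function name parse_mime_header and the regular expressions are made up for illustration (they are not nmh code), and the sketch assumes an already-unfolded header line with no comments and no RFC 2231 continuations. It only splits out the pairs; deciding which tags matter in which context is left to the caller.

```python
import re

# Hypothetical helper, for illustration only; not part of nmh.
# A MIME "token": any run of characters that are not whitespace or tspecials.
_TOKEN = r'[^\s()<>@,;:\\"/\[\]?=]+'

# One ";tag=value" pair, where value is a quoted string or a bare token.
_PARAM = re.compile(
    r';\s*(' + _TOKEN + r')\s*=\s*(?:"((?:[^"\\]|\\.)*)"|(' + _TOKEN + r'))'
)


def parse_mime_header(line):
    """Split 'Fieldname: value; tag=value; ...' into (name, value, params)."""
    name, _, body = line.partition(":")
    value, sep, rest = body.partition(";")
    params = {}
    for m in _PARAM.finditer(sep + rest):
        key = m.group(1).lower()  # tag names are case-insensitive
        if m.group(2) is not None:
            # Quoted string: drop the backslash from quoted pairs.
            val = re.sub(r"\\(.)", r"\1", m.group(2))
        else:
            val = m.group(3)
        params[key] = val
    return name.strip().lower(), value.strip().lower(), params


if __name__ == "__main__":
    print(parse_mime_header('Content-Type: text/plain; charset="us-ascii"; format=flowed'))
```

Anything the pattern does not recognize is silently skipped rather than rejected, which is a deliberate choice for this sketch: it trades strictness for tolerance of sloppy input.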
Large chunks of the grammar are there only for crufty corner cases (if anybody
is interested, read section 3.1.4 of RFC822 for an example of its awesomeness).
Just because I remembered seeing it before, here's an RFC822 address
validator, done as one Perl regexp:
http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
Yes, you really want to use lex/yacc to build a parser instead. :)
And then there's the question of what to do when certain other common MUAs and
MTAs manage to ignore the RFCs and produce something ugly - although the biggest
offender is still the various poorly written spamware out there. But since no
spam filter is 100% effective, we *do* have to be robust in the face of crap.
Unfortunately, parsers created from a BNF or similar tend to be a tad
brittle when recovering from syntactically incorrect input (anybody ever
had a missing ) or } leave an error message 500+ lines away from the
actual error?).
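
As a toy illustration of that last complaint (not a mail parser, and not anything generated by lex/yacc), the Python sketch below checks bracket balance and returns two line numbers: where the problem was finally noticed, and where the offending opener actually sits. The function name find_unclosed is hypothetical. A strict parser typically reports only the first number.

```python
# Hypothetical helper, for illustration only.
PAIRS = {")": "(", "}": "{"}


def find_unclosed(text):
    """Return (detected_line, opener_line) for the first bracket error, or None.

    detected_line is where a strict parser would notice the problem;
    opener_line is where the unmatched opener is (None if there is no opener).
    """
    lines = text.splitlines()
    stack = []  # (opener character, line number where it appeared)
    for lineno, line in enumerate(lines, start=1):
        for ch in line:
            if ch in "({":
                stack.append((ch, lineno))
            elif ch in PAIRS:
                if not stack or stack[-1][0] != PAIRS[ch]:
                    opener_line = stack[-1][1] if stack else None
                    return (lineno, opener_line)
                stack.pop()
    if stack:
        # Ran out of input with an opener still pending.
        return (len(lines), stack[-1][1])
    return None


if __name__ == "__main__":
    # An unclosed "(" on line 1, followed by 500 unrelated lines.
    sample = "f(x\n" + "y\n" * 500 + "z"
    print(find_unclosed(sample))
```

With the sample input, the error is only noticed at end of input, 501 lines after the opener that caused it. Keeping the opener's position around is one cheap way to make the diagnostic point at the real culprit.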