nmh-workers

[Nmh-workers] RFC2047 section 5 and other MIME issues for the new scan


From: Jon Steinhart
Subject: [Nmh-workers] RFC2047 section 5 and other MIME issues for the new scan
Date: Sun, 14 Nov 2010 09:45:22 -0800

Any opinions out there on the right way to handle this on the decoding end of
things?

The RFC2047 section 5 rules seem pretty dumb to me.  Maybe I'm missing
something, but they create a lot of work, possibly because they predate
Unicode and therefore can't handle the notion of a word that contains
multiple character sets.

In keeping with the RFC spirit, it seems that these rules must be followed
when encoding messages.  My preference, again in keeping with RFC spirit,
is to be more relaxed about decoding.

For example, it seems silly to complain about cases like =?...?==?...?=
instead of the required =?...?= =?...?=, as it's actually more work to catch
this as an error case.  Also, the decoding behavior in the first case is not
defined.  Does the second =?...?= get treated as literal text because,
technically, it is not an encoded word when there's no space between it and
the previous encoded word?  That doesn't make a lot of sense to me.

My preference is to say that we'll treat any =?...?= as an encoded word
wherever it appears and that we'll decode it.  It appears that the authors of
RFC2047 expect that everything will be parsed into tokens and examined before
looking for encoded words.
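
A rough sketch of the relaxed decoding described above, written in Python
for brevity rather than in nmh's C.  The function names are made up for
illustration and are not nmh code.  It assumes that any =?charset?enc?text?=
is decoded wherever it appears, that whitespace between two adjacent encoded
words is dropped as RFC2047 asks, and that a word which cannot be decoded is
left as literal text.

```python
import base64
import binascii
import re

# Matches =?charset?encoding?text?= anywhere in the field body, with no
# requirement that it be delimited by whitespace.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def decode_word(charset, encoding, text):
    """Return the decoded text, or None if the word cannot be decoded."""
    charset = charset.split("*", 1)[0]  # drop an RFC2231 language suffix
    try:
        if encoding.lower() == "b":
            raw = base64.b64decode(text + "=" * (-len(text) % 4))
        else:
            # header=True makes "_" decode to a space, as the Q encoding needs
            raw = binascii.a2b_qp(text, header=True)
        return raw.decode(charset)
    except (ValueError, LookupError):
        return None


def decode_header_body(body):
    """Decode every encoded word in an unfolded header field body."""
    out = []
    pos = 0
    prev_was_encoded = False
    for m in ENCODED_WORD.finditer(body):
        gap = body[pos:m.start()]
        decoded = decode_word(*m.groups())
        is_encoded = decoded is not None
        # Drop whitespace that only separates two encoded words.
        if not (prev_was_encoded and is_encoded and gap.strip() == ""):
            out.append(gap)
        out.append(decoded if is_encoded else m.group(0))
        prev_was_encoded = is_encoded
        pos = m.end()
    out.append(body[pos:])
    return "".join(out)
```

With this approach the two forms =?...?==?...?= and =?...?= =?...?= decode
to the same text, so no error case is needed for the missing space.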

My current plan for the new scan code is to:

 1.  Read a header field name.

 2.  Read a header field body if the header field is used by the format,
     unfolding folded lines in the process.

 3.  Look for encoded words and decode them creating a UTF-8 version of the
     header field body.

 4.  Break up the header field body into parameters if it is a MIME header.
     Apply the RFC2231 rules.
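
Steps 1, 2 and 4 above could look roughly like the following Python sketch.
All names are hypothetical and this is not the planned nmh implementation.
It is simplified: each RFC2231 segment is decoded on its own, so a multibyte
character split across two segments is not handled, and a parameter given in
both plain and extended form is not disambiguated.

```python
import re
from urllib.parse import unquote_to_bytes

PARAM = re.compile(r';\s*([^\s=;]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;]*)')
PARAM_NAME = re.compile(r"^([^*]+)(?:\*(\d+))?(\*)?$")


def split_field(line):
    """Step 1: split 'Name: body' into a lowercased name and the raw body."""
    name, _, body = line.partition(":")
    return name.strip().lower(), body.strip()


def unfold(raw):
    """Step 2: remove line breaks that are followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw)


def decode_bytes(data, charset):
    try:
        return data.decode(charset, "replace")
    except LookupError:  # unknown charset name
        return data.decode("latin-1")


def combine_rfc2231(raw):
    """Join name*0, name*1, ... and percent-decode name* style values."""
    grouped = {}
    for name, val in raw.items():
        m = PARAM_NAME.match(name)
        if m is None:
            continue
        base, index, extended = m.groups()
        grouped.setdefault(base, []).append((int(index or 0), bool(extended), val))
    params = {}
    for base, segments in grouped.items():
        segments.sort(key=lambda s: s[0])
        charset = "us-ascii"
        parts = []
        for position, (_, extended, val) in enumerate(segments):
            if not extended:
                parts.append(val)
                continue
            # Only the first extended segment carries charset'language'.
            if position == 0 and val.count("'") >= 2:
                charset, _lang, val = val.split("'", 2)
            parts.append(decode_bytes(unquote_to_bytes(val), charset or "us-ascii"))
        params[base] = "".join(parts)
    return params


def parse_params(body):
    """Step 4: split a MIME header body into its value and parameters."""
    value, _, rest = body.partition(";")
    raw = {}
    for name, val in PARAM.findall(";" + rest):
        val = val.strip()
        if len(val) >= 2 and val.startswith('"') and val.endswith('"'):
            val = re.sub(r"\\(.)", r"\1", val[1:-1])  # undo quoted-pairs
        raw[name.lower()] = val
    return value.strip(), combine_rfc2231(raw)
```

For example, parse_params applied to the body

    attachment; filename*0*=UTF-8''caf%C3%A9; filename*1="-menu.jpg"

yields the value "attachment" and a single "filename" parameter holding the
joined, decoded file name.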

Separate from the above, I plan to add new components to the profile.  I think
that we need the ability to specify additional format strings on a per MIME
type basis.  So, for example, we may want to format the file name and size
for a JPEG image.
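
One way the per-type lookup might work, sketched in Python.  Everything here
is an assumption for illustration: the "scan-format-..." component names do
not exist in nmh, the wildcard and default fallbacks are just one possible
policy, and Python's str.format stands in for real mh-format strings.

```python
# Hypothetical profile entries; these are NOT real nmh profile components.
PROFILE = {
    "scan-format-image/jpeg": "{filename} ({size} bytes)",
    "scan-format-image/*": "image: {filename}",
    "scan-format-default": "{type}",
}


def format_for(mime_type, profile=PROFILE):
    """Pick the most specific format string for a MIME type."""
    mime_type = mime_type.lower()
    major = mime_type.split("/", 1)[0]
    candidates = (
        "scan-format-" + mime_type,     # exact type/subtype
        "scan-format-" + major + "/*",  # any subtype of this type
        "scan-format-default",          # catch-all
    )
    for key in candidates:
        if key in profile:
            return profile[key]
    return "{type}"


def render_part(mime_type, **fields):
    """Format one message part using the string chosen for its type."""
    return format_for(mime_type).format(type=mime_type, **fields)
```

Here render_part("image/jpeg", filename="cat.jpg", size=48213) would use the
JPEG-specific string, while other image types fall back to the "image/*"
entry.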

Let me know what you think.  I'd prefer to know before I do the work :)

Jon


