[Nmh-workers] The curse of m_getfld()
From: Ken Hornstein
Subject: [Nmh-workers] The curse of m_getfld()
Date: Wed, 25 Jan 2012 22:40:58 -0500
I started looking back at the list of things I wanted to do that I
posted here at the beginning of December, and I realize now that
we're in pretty good shape on them! We have a new release out the
door, we've migrated to Automake and cleaned up our Autoconf setup
a lot, we are now packaged with MacPorts and we have our own spec
file (thanks to David Levine!).
I would note that it sure would help out if people would try out the new
release if you haven't already ... it should literally only take you a
few minutes. If it doesn't, then we've done something wrong, so please
let us know about it.
But on to the main show ... the last one on my list is "better MIME/charset
handling", and I've been thinking about that and looking at things. And
I've realized that to do it right, where everything is UTF-8 internally,
means that we need to get data off of disk in UTF-8 as soon as possible.
And it seems that while all roads lead to Rome, all data in nmh goes
through m_getfld() at some point. And that's where the fun begins ...
the function is LITERALLY cursed! :-)
Okay, the speed concerns mattered a lot back on a VAX; I think everyone
agrees that nowadays it's not a big deal. I'm not worried about making
it slower; I'm more thinking about adapting things to the modern reality
of MIME messages and different character sets.
Here's my thinking: the bulk of MIME parsing and translation _has_
to happen in m_getfld(). So in the New World Order, m_getfld()
reads in a message off of disk, translates anything necessary into
UTF-8, does things like RFC-2047 header parsing, base-64 &
quoted-printable decoding, and returns UTF-8 strings to the calling
functions.
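
To make the header side of this concrete, here is a minimal sketch of two of
the steps named above: decoding the payload of an RFC 2047 "Q" encoded-word
into raw bytes, and then converting ISO-8859-1 bytes to UTF-8. The function
names decode_q() and latin1_to_utf8() are hypothetical and are not part of
nmh; a real implementation would also need to handle "B" (base64) words and
arbitrary charsets, most likely through iconv.

```c
#include <stddef.h>
#include <string.h>

/* Return the value of a hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical helper: decode the payload of an RFC 2047 "Q" encoded-word
 * (the text between the third and fourth '?') into raw bytes.
 * '_' becomes a space and "=XX" becomes the byte 0xXX; a malformed '='
 * escape is copied through unchanged.
 * Returns the number of bytes written, or -1 if out is too small.
 */
static int decode_q(const char *in, size_t inlen,
                    unsigned char *out, size_t outsz)
{
    size_t i = 0, n = 0;

    while (i < inlen) {
        unsigned char c = (unsigned char)in[i];

        if (n >= outsz)
            return -1;
        if (c == '_') {
            out[n++] = ' ';
            i++;
        } else if (c == '=' && i + 2 < inlen &&
                   hexval(in[i + 1]) >= 0 && hexval(in[i + 2]) >= 0) {
            out[n++] = (unsigned char)(hexval(in[i + 1]) * 16 +
                                       hexval(in[i + 2]));
            i += 3;
        } else {
            out[n++] = c;
            i++;
        }
    }
    return (int)n;
}

/*
 * Hypothetical helper: convert ISO-8859-1 bytes to a NUL-terminated
 * UTF-8 string. Bytes below 0x80 map to themselves; every other byte
 * becomes a two-byte UTF-8 sequence.
 * Returns the UTF-8 length (excluding the NUL), or -1 if out is too small.
 */
static int latin1_to_utf8(const unsigned char *in, size_t inlen,
                          char *out, size_t outsz)
{
    size_t i, n = 0;

    if (outsz == 0)
        return -1;
    for (i = 0; i < inlen; i++) {
        if (in[i] < 0x80) {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = (char)in[i];
        } else {
            if (n + 2 >= outsz)
                return -1;
            out[n++] = (char)(0xC0 | (in[i] >> 6));
            out[n++] = (char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

For a header containing =?ISO-8859-1?Q?Andr=E9_Pirard?=, the caller would
pass the payload "Andr=E9_Pirard" to decode_q() and feed the resulting bytes
to latin1_to_utf8(), so everything above m_getfld() only ever sees UTF-8.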
This is relatively straightforward for headers; what to do about
messages with multiple MIME parts is a little trickier. But I think
we could take the routines in mhparse.c and adapt them so m_getfld()
returns a sequence of BODY parts, with an extra returned structure
that has the MIME information about the part in question.
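
As a rough illustration of what that extra structure might carry, here is a
sketch. Every name in it (struct part_info, enum part_encoding,
part_needs_transcoding()) is made up for this example and does not come from
mhparse.c; it only shows the sort of per-part information a caller would
need alongside each BODY chunk.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical: the Content-Transfer-Encoding of a part. */
enum part_encoding {
    ENC_7BIT,
    ENC_8BIT,
    ENC_BASE64,
    ENC_QUOTED_PRINTABLE
};

/* Hypothetical: MIME information returned alongside each BODY part. */
struct part_info {
    const char *type;             /* e.g. "text" */
    const char *subtype;          /* e.g. "plain" */
    const char *charset;          /* charset parameter, NULL if absent */
    enum part_encoding encoding;  /* how the body is encoded on disk */
    int depth;                    /* nesting level inside multiparts */
};

/*
 * Return 1 if the part's text must be transcoded to become UTF-8,
 * 0 otherwise. Non-text parts are never transcoded. A missing charset
 * is treated as the MIME default of US-ASCII, which is already valid
 * UTF-8.
 */
static int part_needs_transcoding(const struct part_info *p)
{
    if (strcasecmp(p->type, "text") != 0)
        return 0;
    if (p->charset == NULL)
        return 0;
    return strcasecmp(p->charset, "utf-8") != 0 &&
           strcasecmp(p->charset, "us-ascii") != 0;
}
```

The idea would be that m_getfld() undoes the transfer encoding and does any
transcoding itself, and hands back the structure so that callers can still
tell what the part originally was.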
One of the major wrinkles is ... well, m_getfld() is a complicated hot
mess. I know some of the people here have been inside of it; if they
wanted to impart some public knowledge here about it, I for one sure
would appreciate it.
A few other things:
- I know work on nmh tends to be bursty ... and in my case, that's definitely
true. I think I am going to have to work on other things soon, and I
don't know if I'll get a chance to get to the MIME work.
- Given the above ... do people think there is value in rolling another
release soon-ish? We've got a number of bug fixes, a new build system,
significant improvements to the format strings that let people (in
some cases) select headers based on message contents, and if I get
the repl work done then that means one of my long-outstanding beefs
about replies will be solved (not the biggest one, sadly, but we can't
have everything).
- I know Paul Vixie was asking about putting most of nmh in a shared library,
and I think I've done 70% of that work with the Automake migration.
If someone wanted to take that over the finish line, I think that
would be great. A quick glance at the Libtool manual suggests that it
shouldn't be hard.
--Ken