gzz-dev

Re: [Gzz] PEG pts_content_types


From: Benja Fallenstein
Subject: Re: [Gzz] PEG pts_content_types
Date: Fri, 22 Nov 2002 19:08:12 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020913 Debian/1.1-1

Tuomas Lukka wrote:

On Fri, Nov 22, 2002 at 12:17:30PM +0100, Benja Fallenstein wrote:
Hmm. This isn't really possible: in an email client, we should simply support as much as we can (ideally all official IANA-registered charsets, practically all that our Java supports). -- But you're right; we want to be sure we can read in the future what we can read now.

Well, then you should say exactly this in the SPEC...


Ok.

I discussed this with Marc this morning and we thought it probably makes sense to convert everything to Unicode when receiving it. The problem is what to do when we receive a mail in a wrong character coding (which happens) -- we *must* be able to re-construct it the way we received it, so that a human expert can look at it and reconstruct it the way it was originally. Ideas?

Because of this, it might be better to leave it as is; not mangling received
data is a pretty good principle ;)


We discussed more today and I came up with what I think is a good way. Firstly, we treat email somewhat like diffs: To add a new email, we

1. Add it to Storm verbatim as a message/rfc822 block.
2. Try to externalize the different body parts into their own blocks, referencing them through message/external-body, converting their Content-Transfer-Encoding to binary, and for text/xxx blocks the charset to UTF-8 and the linebreaks to \n. (If the conversion does not work for some block, we'll probably just leave that body part in the message/rfc822 block and interpret it as US-ASCII.)
3. Try to reconstruct the original, verbatim message/rfc822 block from these blocks. (The id must match, which means the bytes must be exactly the same.)
4. *If that worked*, delete the verbatim block. (If it didn't work, keep the verbatim block for future use when extracting the email from Storm, repairing a character coding etc. This is expected to be a relatively unusual exception.)

That way, we have the convenient access as normal UTF-8 blocks, but also the guarantee that we can go back to the original message format, byte-for-byte, if we need to.
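
To make the round-trip check concrete, here is a rough Python sketch of the four steps, not actual Gzz or Storm code. Everything in it is an assumption made for illustration: the in-memory `store` dict standing in for a Storm pool, SHA-1 as the block id, the function names, and the simplification to a single-part message whose charset is passed in by the caller. It does not parse MIME or emit real message/external-body references.

```python
import hashlib

# Hypothetical in-memory stand-in for a Storm pool: block id -> block bytes.
store = {}

def block_id(data: bytes) -> str:
    # Assumption: the id is a content hash, so equal ids mean equal bytes.
    return hashlib.sha1(data).hexdigest()

def put(data: bytes) -> str:
    bid = block_id(data)
    store[bid] = data
    return bid

def externalize(raw: bytes, charset: str):
    """Split off the body and normalize it to UTF-8 with \\n linebreaks."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator")
    text = body.decode(charset)  # raises if the declared charset is wrong
    return head + sep, text.replace("\r\n", "\n").encode("utf-8")

def reconstruct(head: bytes, utf8_body: bytes, charset: str) -> bytes:
    """Invert externalize(): back to the wire charset and CRLF linebreaks."""
    text = utf8_body.decode("utf-8").replace("\n", "\r\n")
    return head + text.encode(charset)

def add_email(raw: bytes, charset: str) -> dict:
    verbatim = put(raw)                               # step 1
    try:
        head, body = externalize(raw, charset)        # step 2
    except (ValueError, LookupError):
        return {"verbatim": verbatim}                 # conversion failed
    refs = {"head": put(head), "body": put(body)}
    rebuilt = reconstruct(head, body, charset)        # step 3
    if block_id(rebuilt) == verbatim:
        del store[verbatim]                           # step 4: safe to drop
    else:
        refs["verbatim"] = verbatim                   # keep the original bytes
    return refs
```

In this sketch a body containing a bare \n, for example, does not survive the round trip (it comes back as \r\n), so the ids differ and the verbatim block is kept.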

Then, we only need PermanentTextScroll to support:
- text/plain with charset UTF-8
- message/rfc822, which is interpreted as US-ASCII

Alternatively, we can have a subclass of PTS, MailTextScroll, which would use message/rfc822 instead of text/plain. I think I like that.

BTW, while your point is correct by itself, the reference to what I'm always saying isn't really ;-). The Persistency Commitment does not apply to applications of Storm in the same way: The point with Storm is that if I create a block now, I can *keep* it eternally without great fuss. In 50 years I may not have an application that can *interpret* such blocks, but I can still naturally keep them and move them around and back them up and retrieve them from old backups (possibly consolidating multiple backups by adding together the valid blocks found in any of them). Now if I want to read those blocks again, I can write an application that can read them, because the data isn't lost. So app persistence isn't as important as persistence on the Storm level.

Well, for me app persistence, especially for email apps, is every bit as
vital as storm block persistence.


I would say almost, but not quite as absolute... because if one version of the email client doesn't read everything correctly, I know I can still go back to the old one or wait for the next version to come out... or I might even approve of a scheme that converts my mails to a different in-Storm format if that conversion works really well.

Also, I think for the moment I want to rapid-prototype the email system and then see if we need big improvements. I want to make sure we can always dump an mbox; re-reading that into a different in-Storm format may be ok if there are good reasons to switch to a different one.

Of course. I fully agree. OTOH, email is one such basic function that at
least for the blocks that store received emails, it probably would make
sense to do the persistency commitment.


Hmm, I do agree, but thinking about this, I feel we shouldn't commit ourselves to something at this point... I'd say prototype this, then, before moving to using it as our own main email system, give it an overhaul, spec it well, and then decide whether to apply the commitment.

Hmm... an amusing idea: we apply the persistency commitment selectively to the important things...


Yes...
- Benja




