gzz-dev

[Gzz] Re: Content types in the URI?


From: Benja Fallenstein
Subject: [Gzz] Re: Content types in the URI?
Date: Sun, 23 Mar 2003 15:55:16 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030319 Debian/1.3-3


Hi Gordon, hi Justin,

I've been mulling over this problem some more, trying to look at it from different angles, but my position essentially hasn't changed. I think we won't agree on this point; I also think, though, that this won't do great harm: our systems should still be able to interoperate without problems (see below).

Gordon Mohr wrote:
>> But the point about URIs is that they do not need context to identify a resource. It would be nice to be able to use our URNs in the same context as an HTTP URL, for example in an <img> tag or an <a href>.
>>
>> Basically, the idea about URIs is that you can use many different URI schemes in the same context, because they do not depend on a context, no?

> Yep. So just do it without any type-labelling.
> HTTP URLs don't include content-types. The URL...
>
>     http://foobar.com/smiley
>
> ...could be a GIF or HTML or an executable. It's only the extra
> (outside-the-URI) context provided over HTTP that sets its type --

Yes, but the point is that HTTP maps this URI to a content type plus a body, with the same 'trust level'. If I make a link to http://foobar.com/smiley, I expect that following that link will give a correct body *and* a correct content type. Both are in the hands of the same person (the server operator).

> and I believe most browsers will even be tolerant of many kinds of mistyping by the server, once they see the data themselves, and coerce the file into its intended type.

Yes, but I'm sure the developers of those browsers will tell you they'd rather be served correct content types by all servers. :)

In my opinion, content type guessing is a kludgy workaround, because it is hard to do, error-prone, and hard to extend. Why do all those operating systems determine the content type of a file based on its extension, rather than 'just' looking at it and guessing its type?
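To illustrate why guessing is fragile, here is a minimal magic-number sniffer. It is only a sketch: the helper name `guess_content_type` and its five-entry table are illustrative assumptions, not part of Storm or of any Bitzi tool.

```python
# Hypothetical, deliberately tiny table of leading-byte signatures.
# Real sniffers (e.g. the libmagic database) carry thousands of entries.
MAGIC_NUMBERS = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def guess_content_type(data: bytes) -> str:
    """Guess a MIME type from the first bytes; fall back to octet-stream."""
    for prefix, mime in MAGIC_NUMBERS:
        if data.startswith(prefix):
            return mime
    # HTML, plain text, CSV, source code etc. have no signature at all.
    return "application/octet-stream"
```

Text formats such as HTML have no signature, so this sketch can only report `application/octet-stream` for them, and a plain-text note that happens to begin with `%PDF-` is misread as a PDF. Every new format also means another table entry, which is the "hard to extend" problem.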

>> The data URL scheme (RFC 2397). (It also puts the data in the URL; my opinion is that content type plus data isn't so different from content type plus cryptographic hash...)

> Aha. I forgot about that one.
> There might be a good reason you have to specify the type-interpretation,
> but so far I haven't seen a specific case where it's necessary. This
> ought to work (in a properly extended browser)...
>
>  <img src="urn:sha1:BLAH">
>
> ...even without advance knowledge of the format of the bitstream, as
> long as it turns out to be a recognizable image format.

I'm thinking that we probably won't be able to reach an agreement here. I believe that providing a content type is essential for making developers' lives easier; you don't think so.

(BTW, I think the analogy with the data: scheme runs deep. Why not guess the type of the content in a data: URI, instead of putting the type in the URI?)
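To make the data: analogy concrete, here is a minimal sketch that splits an RFC 2397 URI into its media type and its bytes. The helper name `parse_data_uri` is illustrative; only the URI syntax and the defaults come from the RFC. The media type travels inside the URI, right next to the data, just as option 1 below would put the type next to the hash.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_uri(uri: str) -> tuple[str, bytes]:
    """Split an RFC 2397 data: URI into (media type, decoded bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URI has no comma separating header and data")
    is_base64 = header.endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    if not header:
        # RFC 2397: an omitted media type defaults to text/plain;charset=US-ASCII
        media_type = "text/plain;charset=US-ASCII"
    elif header.startswith(";"):
        # Shorthand: "text/plain" may be omitted while a charset is still given
        media_type = "text/plain" + header
    else:
        media_type = header
    body = base64.b64decode(payload) if is_base64 else unquote_to_bytes(payload)
    return media_type, body
```

For example, `parse_data_uri("data:image/gif;base64,R0lGODlh")` returns `("image/gif", b"GIF89a")`: the client learns the type from the URI without inspecting the bytes.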

To summarize the options we've discussed:

1. Give the content type in the URI. You don't like that.
2. Make an indirect reference: the URI points to a hashed block containing a content type and the hash of the block with the actual data. This means we wouldn't use the same hash for, e.g., an MP3 as you do, since we'd use the hash of the block *referring to* that MP3. I don't think you liked that.
3. Guess the content type. I don't like that, on the grounds that it makes life harder for client developers, and that it may necessitate specifying the content type in the context (e.g. for digital signatures, to ensure we know the correct interpretation of the bytes that were signed).
4. Get the content type through an out-of-band mechanism, like the Bitzi database. I don't like that, on the grounds that it requires an Internet connection, and it doesn't have the same 'trust level' as having the content type in the URI or in hashed data.
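A minimal sketch of option 2, assuming a made-up header-block format (`Content-Type:` and `Data:` lines) and an in-memory dictionary as the block store. It uses the base32 `urn:sha1:` form from Gordon's `<img>` example instead of a full bitprint (bitprints also include a Tiger tree hash, which Python's standard library does not provide). None of these names come from Storm itself.

```python
import base64
import hashlib


def sha1_urn(data: bytes) -> str:
    """Base32-encoded SHA-1 in urn:sha1: form."""
    return "urn:sha1:" + base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


def make_reference_block(content_type: str, data: bytes) -> bytes:
    """Hypothetical header block naming the type and the hash of the data block."""
    return f"Content-Type: {content_type}\nData: {sha1_urn(data)}\n".encode("ascii")


def resolve(uri: str, store: dict[str, bytes]) -> tuple[str, bytes]:
    """Follow the indirection: fetch the header block, then the data block it names."""
    header = store[uri].decode("ascii")
    fields = dict(line.split(": ", 1) for line in header.splitlines())
    return fields["Content-Type"], store[fields["Data"]]


mp3 = b"ID3\x03\x00 stand-in bytes, not a real MP3"
ref = make_reference_block("audio/mpeg", mp3)
store = {sha1_urn(mp3): mp3, sha1_urn(ref): ref}

direct_id = sha1_urn(mp3)    # the identifier a bitprint-style system would use
indirect_id = sha1_urn(ref)  # the identifier option 2 would put in the URI
content_type, body = resolve(indirect_id, store)
```

The two identifiers differ, which is the interoperability cost of option 2: the MP3 is referred to by the hash of its reference block rather than by its own hash.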

I still think content type in URI is the best alternative for us.

Now, if we cannot agree on this, can our system (Storm) still interoperate with the ones you are developing?

I should think so. Given a Storm URI, we can easily create a bitprint from it by stripping the content type. This means we can find Storm blocks and metadata about them using bitprint-based systems. Given a bitprint, we can generate a Storm URI using either of the methods you proposed: adding the content type from the Bitzi database, or guessing it from the content. (We could also use application/octet-stream if, for some reason, we don't know what kind of data it is.)
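The two conversions can be sketched as follows. The `urn:x-storm:<content-type>,<hash>` layout is a hypothetical stand-in invented for this example, not necessarily Storm's real URI syntax, and a base32 SHA-1 `urn:sha1:` identifier stands in for a full bitprint.

```python
import base64
import hashlib

# Hypothetical URI layouts, for illustration only.
STORM_PREFIX = "urn:x-storm:"
SHA1_PREFIX = "urn:sha1:"


def sha1_base32(data: bytes) -> str:
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


def storm_to_hash_urn(storm_uri: str) -> str:
    """Strip the content type, keeping only the hash part."""
    if not storm_uri.startswith(STORM_PREFIX):
        raise ValueError("not a Storm URI: " + storm_uri)
    _content_type, sep, digest = storm_uri[len(STORM_PREFIX):].rpartition(",")
    if not sep:
        raise ValueError("Storm URI lacks a content type: " + storm_uri)
    return SHA1_PREFIX + digest


def hash_urn_to_storm(hash_urn: str,
                      content_type: str = "application/octet-stream") -> str:
    """Add a content type (looked up, guessed, or the octet-stream default)."""
    if not hash_urn.startswith(SHA1_PREFIX):
        raise ValueError("not a urn:sha1 URI: " + hash_urn)
    return f"{STORM_PREFIX}{content_type},{hash_urn[len(SHA1_PREFIX):]}"
```

A base32 digest never contains a comma, so splitting at the last comma recovers the hash even if the content type itself were to contain one.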

I would also be entirely comfortable with building the Storm storage layer so that data is looked up by bitprint. For lookup, the content type doesn't matter, after all. It would only be used at the higher levels building on Storm, to interpret the data that the lower levels have retrieved. Thus, a Storm system could be used to look up bitprints, and a bitprint-based system could be used to look up Storm blocks.
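A minimal sketch of that layering, assuming an in-memory dictionary as the storage layer and base32 SHA-1 as the key in place of a full bitprint; `HashKeyedStore` and `open_typed` are hypothetical names. Two URIs that differ only in their content type resolve to the same stored block.

```python
import base64
import hashlib


def sha1_base32(data: bytes) -> str:
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


class HashKeyedStore:
    """Lower layer: stores and finds blocks by hash alone; knows nothing of types."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = sha1_base32(data)
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]


def open_typed(store: HashKeyedStore, content_type: str, key: str) -> tuple[str, bytes]:
    """Higher layer: the content type is applied only after the bytes are retrieved."""
    return content_type, store.get(key)


store = HashKeyedStore()
key = store.put(b"<p>hi</p>")
as_html = open_typed(store, "text/html", key)
as_text = open_typed(store, "text/plain", key)
```

Both calls return the same bytes; only the interpretation attached at the higher layer differs.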

So, given all this, I do not see great harm if Storm proceeds in the way that seems most natural to me, and you proceed in the way that seems most natural to you. Right?

- Benja

P.S. Gordon, is there a public domain Java version of the new bitprint calculator, including source, already? Thanks, -b




