classpath

Re: java.net.URI implementation


From: Giannis Georgalis
Subject: Re: java.net.URI implementation
Date: 10 Feb 2003 23:39:01 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

Stephen Crawley <address@hidden> writes:

> While the complete URI grammar looks complex, a URI string typically
> doesn't need to be fully parsed.  You only need to fully parse the
> components that are requested. 

I think you are wrong about this: the URI parser should accept *only*
valid URIs, not merely valid components within a URI. But I cannot be
sure until I run some tests against Sun's SDK. If what you said were
the case, then the regex-based patch that was already submitted would
be fine.
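
A minimal sketch of the kind of black-box probe this would involve,
assuming it is run on the JDK under test. The class name UriProbe and
the sample strings are made up for illustration; only java.net.URI and
java.net.URISyntaxException come from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: reports whether URI(String) accepts a given input.
final class UriProbe {

    /** Returns true if the URI(String) constructor accepts the input. */
    static boolean accepts(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative inputs only: some valid, some with a bad component.
        String[] samples = {
            "http://www.gnu.org/software/classpath/",
            "mailto:someone@example.org",
            "../relative/path",
            "http://example.org/a b",   // unescaped space in the path
            "1http://example.org/",     // scheme starting with a digit
            "http://example.org/%zz",   // malformed escape sequence
        };
        for (String s : samples) {
            System.out.println((accepts(s) ? "accepted: " : "rejected: ") + s);
        }
    }
}
```

If the constructor rejects strings whose only defect lies in a component
that is never requested, that would suggest the whole string is validated
up front rather than lazily.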

> Note that the JDK 1.4 spec for the URI(String) constructor states
> that its parsing is more relaxed than the BNF in RFC 2396.  How relaxed
> it is can only be determined by black box testing against the JDK 1.4
> implementation.  If I was doing this, my first step would be to build
> some extensive Mauve test cases ...

I'm consulting the JDK 1.4.1 API documentation, which does not state
that the URI parser's grammar is more relaxed. On the contrary, the
assertions made in the URI(String) documentation cover some
implications of the RFC in question that are not captured in the BNF
grammar.

> I'd recommend hand building a pure Java parser. That way, the Classpath
> build process doesn't depend on an external parser or lexer generator,
> and the source code will be easier to understand.

We could include the generated files (as Brian also noted) and avoid
the exotic dependencies. However, I was disappointed to find out
that JLex and JFlex do not support parsing from a string (flex
supports it). That radically changed my plans... and on second
thought, native code with flex would be going too far.

> A hand-built parser for grammar as simple as this should be easy to
> implement / maintain.  Especially considering Sun's documented deviations
> from the RFC grammar, and possible undocumented deviations.

Yes, given the above, I'm now thinking of implementing a
hand-written parser (if I don't get a better suggestion). As for the
"possible undocumented deviations", they would be bugs (or features
;-)). I think we shouldn't rely on these at all.
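
To give an idea of the scale of a hand-written approach, here is a rough
sketch that scans only the scheme component, following the RFC 2396 rule
scheme = alpha *( alpha | digit | "+" | "-" | "." ). The class and
method names (SchemeScanner, scanScheme) are hypothetical, and nothing
here reflects Sun's implementation.

```java
// Hypothetical sketch: hand-written scanner for the scheme part of a URI.
final class SchemeScanner {

    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isSchemeChar(char c) {
        return isAlpha(c) || (c >= '0' && c <= '9')
                || c == '+' || c == '-' || c == '.';
    }

    /**
     * Returns the index of the ':' that terminates a valid scheme,
     * or -1 if the input does not begin with a scheme.
     */
    static int scanScheme(String s) {
        // A scheme must start with a letter.
        if (s.length() == 0 || !isAlpha(s.charAt(0))) {
            return -1;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ':') {
                return i;          // end of scheme
            }
            if (!isSchemeChar(c)) {
                return -1;         // e.g. '/' before any ':' means no scheme
            }
        }
        return -1;                 // ran out of input without seeing ':'
    }
}
```

The remaining components (authority, path, query, fragment) could be
handled in the same character-by-character style, which keeps the build
free of lexer generators.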

> Finally, the chance that the RFC URI syntax will change radically is pretty
> small, IMO.

You are probably right ... but don't forget M$, they'll want to make
their very own extensions to URI, Sun will have to follow them,
etc. ;-)

-- 
 Object-oriented programming is an exceptionally bad 
idea which could only have originated in California.
    - Edsger Dijkstra (attributed)
