Re: java.net.URI implementation


From: Stephen Crawley
Subject: Re: java.net.URI implementation
Date: Tue, 11 Feb 2003 10:24:19 +1000

> Stephen Crawley <address@hidden> writes:
> 
> > While the complete URI grammar looks complex, a URI string typically
> > doesn't need to be fully parsed.  You only need to fully parse the
> > components that are requested. 
> 
> I think you are wrong in this: the URI parser should accept *only*
> valid URIs, not valid components within a URI. But I cannot be
> sure until I run some tests against Sun's SDK. If what you said was
> the case, then the regex-based, already submitted patch would be fine.

Here's an example.  The getRawPath() method returns the path part of
the URI with escaping in place.  The getPath() method decodes the
path and returns that, throwing an exception if the encoding is wrong.
This suggests to me that URI(String) should not attempt to parse the
escape sequences.
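
To make the getRawPath()/getPath() distinction concrete, here is a
minimal sketch against the standard java.net.URI API.  The class name
UriPathDemo, its helper methods and the example.org URI are made up for
illustration.  It only shows the well-formed case; how a particular
implementation treats a malformed escape is not demonstrated here and
would need to be tested.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; only java.net.URI itself is real API.
class UriPathDemo {
    // Parse with the URI(String) constructor, wrapping the checked exception.
    static URI parse(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Path exactly as written in the input, escapes left in place.
    static String rawPath(String s) {
        return parse(s).getRawPath();
    }

    // Path with percent-escapes decoded.
    static String decodedPath(String s) {
        return parse(s).getPath();
    }

    public static void main(String[] args) {
        String s = "http://example.org/docs/a%20b.html";
        System.out.println(rawPath(s));      // prints /docs/a%20b.html
        System.out.println(decodedPath(s));  // prints /docs/a b.html
    }
}
```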

> > Note that the JDK 1.4 spec for the URI(String) constructor states
> > that its parsing is more relaxed than the BNF in RFC 2396.  How relaxed
> > it is can only be determined by black box testing against the JDK 1.4
> > implementation.  If I was doing this, my first step would be to build
> > some extensive Mauve test cases ...
> 
> I'm consulting JDK 1.4.1 API documentation, which does not state that
> URI parser's grammar is more relaxed. On the contrary the assertions
> made in URI(String) cover some implications within the RFC in
> question which are not depicted in the BNF grammar.

I think we are mostly saying the same thing; i.e. the BNF in RFC 2396 is
not complete.  However, the Sun people who wrote the javadoc seem to be
implying that the RFC 2396 spec is (at least) ambiguous on the points in
which URI(String) "deviates".  Also, doesn't the last deviation allow
URI(String) to handle URIs that contain unescaped Unicode in some
components?  Isn't this a substantive extension (relaxation) of RFC
2396?
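
The Unicode point can be checked with a few lines.  This is a sketch
that assumes the behaviour documented for java.net.URI in Sun's JDK,
where characters in the "other" category are accepted unescaped and
toASCIIString() percent-encodes them as UTF-8; another implementation
may differ until it is tested.  The class name UriUnicodeDemo and the
sample URI are invented for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; only java.net.URI itself is real API.
class UriUnicodeDemo {
    // Parse with the URI(String) constructor, wrapping the checked exception.
    static URI parse(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is outside the RFC 2396 character set.
        URI u = parse("http://example.org/caf\u00e9");

        // The raw path keeps the character as written.
        System.out.println(u.getRawPath().equals("/caf\u00e9"));  // prints true

        // The ASCII form escapes it as the UTF-8 octets C3 A9.
        System.out.println(u.toASCIIString());  // prints http://example.org/caf%C3%A9
    }
}
```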

> > I'd recommend hand building a pure Java parser. That way, the Classpath
> > build process doesn't depend on an external parser or lexer generator,
> > and the source code will be easier to understand.
> 
> We could include the generated files (as Brian also noted) and avoid
> the exotic dependencies.

For the record, including generated code in Classpath without
integrating the tools that generate it would present maintenance problems.
You DO need the tools if you are going to change the parser ... unless
you are mad enough to try to hand-patch the parser tables.  Obviously, if
the grammar you are trying to implement is sufficiently complex, these
issues would be minor compared with the difficulty of implementing an
efficient parser by hand.
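
For a sense of scale, here is a rough sketch of what a hand-built
splitter with on-demand decoding might look like.  It is NOT the
Classpath or Sun implementation: the class LazyUriSketch and all of its
methods are hypothetical, it does no validation of the individual
components, and it follows the behaviour described earlier in this
message (escapes are only examined when getPath() is called), which is
an assumption rather than verified JDK behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: splits a URI string into raw components up front
// and decodes percent-escapes only when a decoded component is requested.
class LazyUriSketch {
    private final String scheme;      // null if absent
    private final String authority;   // null if absent
    private final String rawPath;     // never null, may be empty
    private final String rawQuery;    // null if absent
    private final String rawFragment; // null if absent

    LazyUriSketch(String s) {
        String rest = s;
        int hash = rest.indexOf('#');
        rawFragment = hash >= 0 ? rest.substring(hash + 1) : null;
        if (hash >= 0) rest = rest.substring(0, hash);

        int q = rest.indexOf('?');
        rawQuery = q >= 0 ? rest.substring(q + 1) : null;
        if (q >= 0) rest = rest.substring(0, q);

        // A scheme is the text before the first ':' that precedes any '/'.
        int colon = rest.indexOf(':');
        int slash = rest.indexOf('/');
        if (colon > 0 && (slash < 0 || colon < slash)) {
            scheme = rest.substring(0, colon);
            rest = rest.substring(colon + 1);
        } else {
            scheme = null;
        }

        // An authority follows "//" and runs to the next '/' or the end.
        if (rest.startsWith("//")) {
            int end = rest.indexOf('/', 2);
            if (end < 0) end = rest.length();
            authority = rest.substring(2, end);
            rest = rest.substring(end);
        } else {
            authority = null;
        }
        rawPath = rest;
    }

    String getScheme() { return scheme; }
    String getAuthority() { return authority; }
    String getRawPath() { return rawPath; }
    String getRawQuery() { return rawQuery; }
    String getRawFragment() { return rawFragment; }

    // Decoding happens here, not in the constructor.
    String getPath() { return decode(rawPath); }

    // Decode runs of %XX escapes as UTF-8; reject malformed escapes.
    private static String decode(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            if (raw.charAt(i) != '%') {
                out.append(raw.charAt(i++));
                continue;
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            while (i < raw.length() && raw.charAt(i) == '%') {
                if (i + 2 >= raw.length()) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int hi = Character.digit(raw.charAt(i + 1), 16);
                int lo = Character.digit(raw.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Malformed escape at index " + i);
                }
                bytes.write((hi << 4) | lo);
                i += 3;
            }
            out.append(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
        }
        return out.toString();
    }
}
```

With this sketch, new LazyUriSketch("http://example.org/%zz") constructs
without complaint and only getPath() fails, which is the split of
responsibilities argued for above.  Whether the real URI(String) is that
lenient is exactly the sort of thing the black box tests would settle.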
  
> As for the "possible undocumented deviations", they would be bugs (or 
> features ;-)). I think we shouldn't rely on these at all.

I disagree.  According to Sun, in cases where the implementation and
javadoc disagree, the former represents the conformance point.  Each
place where Classpath doesn't conform to the JDK behaviour represents a 
potential problem for someone trying to port Java applications between
the Sun and Classpath implementations of the JRE.

-- Steve





