classpath

Re: java.net.URI implementation


From: Dalibor Topic
Subject: Re: java.net.URI implementation
Date: Mon, 10 Feb 2003 14:46:58 -0800 (PST)

Hi Giannis,

--- Giannis Georgalis <address@hidden> wrote:
> Hello,
> 
> After a discussion I had with Michael Koch, I decided to implement
> the java.net.URI class. I found in the classpath mail archives a
> patch submitted by Mr. Topic (I think) in which he
yes that was me.

> implemented part of the URI class using:
>   /**
>    * Regular expression for parsing URIs.
>    *
>    * Taken from RFC 2396, Appendix B.
>    * This expression doesn't parse IPv6 addresses.
>    */
>   private static final String URI_REGEXP =
>     "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?";
> 
> Apart from the fact that this expression cannot parse IPv6
> addresses, it cannot be considered a substitute for a URI parser,
> as it can only break up the parts of a *valid* URI.

I doubt that adding basic IPv6 parsing to the regexp
would pose significant problems.
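A minimal sketch of the kind of post-processing this could take, assuming the RFC 2732 syntax for bracketed IPv6 literals (e.g. `[::1]:8080`). The Appendix B regexp already passes such an authority through whole in group 4, since `[^/?#]*` accepts brackets and colons, so the remaining work is splitting user info, host and port without being misled by the colons inside the brackets. The class and method names are hypothetical, and the bracket check only restricts characters; it does not validate the address structure.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuthoritySplit {
    // Hypothetical helper pattern: [userinfo@]host[:port], where host is either
    // an RFC 2732 bracketed literal (characters checked, structure not checked)
    // or a run of characters containing no ':' and no brackets.
    private static final Pattern SERVER = Pattern.compile(
        "^(?:([^@]*)@)?(\\[[0-9A-Fa-f:.]+\\]|[^:\\[\\]]*)(?::(\\d*))?$");

    /** Returns {userinfo, host, port} (absent parts null), or null if it does not fit. */
    public static String[] split(String authority) {
        Matcher m = SERVER.matcher(authority);
        if (!m.matches()) {
            return null; // e.g. an unbracketed IPv6 address such as "a:b:c"
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("user@[::1]:8080");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
    }
}
```

Keeping the brackets in the host string mirrors how the authority appears in the URI; stripping them is a one-line follow-up if callers want the bare address.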

> For example, the URI "http://1333.2123.232323.0.9.9~84.1" is not
> valid, but can be parsed by this regexp.
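For reference, here is how the quoted Appendix B regexp splits that URI when used with `java.util.regex` (a sketch; the class and method names are hypothetical). It only separates components, which is exactly the point under discussion: nothing in it checks whether the authority is a well-formed host.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UriSplit {
    // The RFC 2396, Appendix B regexp quoted above. Group numbers follow the
    // RFC: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static final Pattern URI_REGEXP = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent parts are null. */
    public static String[] components(String uri) {
        Matcher m = URI_REGEXP.matcher(uri);
        if (!m.matches()) {
            return null; // only possible if a line terminator follows the first '#'
        }
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }

    public static void main(String[] args) {
        String[] c = components("http://1333.2123.232323.0.9.9~84.1");
        String[] names = { "Scheme", "Authority", "Path", "Query", "Fragment" };
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + c[i]);
        }
    }
}
```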

You are mixing things up here. That's a valid URI.
Sun's JDK 1.4.1_01 on Linux prints the following for a
trivial test program:

/usr/lib/j2sdk1.4.1_01/bin/java test "http://1333.2123.232323.0.9.9~84.1"
http://1333.2123.232323.0.9.9~84.1
Authority: 1333.2123.232323.0.9.9~84.1
Fragment: null
Host: null
Path: 
Port: -1
Query: null
Scheme: http
SchemeSpecificPart: //1333.2123.232323.0.9.9~84.1
UserInfo: null

this is the test program I used:
address@hidden:~> cat test.java 
import java.net.*;

public class test {
        public static void main (String [] args) {
                try {
                        URI u = new URI(args[0]);
                        printURI(u);
                }
                catch(Exception e) {
                        e.printStackTrace();
                }
        }

        public static void printURI(URI u) {
                System.out.println(u);
                System.out.println("Authority: " + u.getRawAuthority());
                System.out.println("Fragment: " + u.getRawFragment());
                System.out.println("Host: " + u.getHost());
                System.out.println("Path: " + u.getRawPath());
                System.out.println("Port: " + u.getPort());
                System.out.println("Query: " + u.getRawQuery());
                System.out.println("Scheme: " + u.getScheme());
                System.out.println("SchemeSpecificPart: " + u.getRawSchemeSpecificPart());
                System.out.println("UserInfo: " + u.getRawUserInfo());
        }
}
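The `Host: null` line in that output is the JDK treating the authority as registry-based: `java.net.URI` first tries a server-based parse (user info, host, port), and when that fails on characters that are still legal in a registry name (such as `~`), it keeps the authority as registry-based rather than rejecting the URI. `URI.parseServerAuthority()` forces the server-based interpretation, so a small sketch like the following (class and method names hypothetical) can tell the two cases apart:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ServerAuthority {
    /**
     * False only if an authority is present and cannot be parsed as
     * server-based (user info, host, port). URI.create throws
     * IllegalArgumentException if the string is not a URI at all.
     */
    public static boolean isServerBased(String s) {
        URI u = URI.create(s);
        try {
            u.parseServerAuthority(); // returns this URI, or throws if not server-based
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "http://1333.2123.232323.0.9.9~84.1";
        System.out.println("Host: " + URI.create(s).getHost());
        System.out.println("Server-based: " + isServerBased(s));
    }
}
```

So the URI is syntactically valid, but its authority is not a valid host name, which may be the distinction behind the original objection.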

Here's my question for you, as you've said you've read
the URI RFCs: which section of the URI RFC does the
URI you considered not valid violate?

> After some digging in various RFCs I have written a (complete)
> grammar (in BNF) for parsing URIs (I'll append the grammar at the end
> of this message).

That's nice. But it's overkill. 

You can achieve the same effect by using the regexp to
separate URI components and doing some post-processing
(preferably using simple regexps) on the generated
Strings to ensure they contain only allowed
characters, to get the port number of hierarchical
URIs etc.
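A rough sketch of what such post-processing could look like, assuming the component strings come from the Appendix B regexp groups (2, 5, 7 and 9). The character sets are transcribed from the RFC 2396 grammar (`scheme`; `pchar` plus `;` and `/` for paths; `uric` for query and fragment). The class and method names are hypothetical, and only characters and `%XX` escapes are checked, not the full path structure.

```java
import java.util.regex.Pattern;

public class UriChecks {
    // RFC 2396: scheme = alpha *( alpha | digit | "+" | "-" | "." )
    private static final Pattern SCHEME = Pattern.compile("[A-Za-z][A-Za-z0-9+.\\-]*");
    // Path characters: pchar plus ";" (params) and "/" (segment separator);
    // a '%' must start a two-hex-digit escape. Segment structure is not checked.
    private static final Pattern PATH =
        Pattern.compile("(?:[A-Za-z0-9\\-_.!~*'():@&=+$,;/]|%[0-9A-Fa-f]{2})*");
    // query and fragment are both *uric (reserved | unreserved | escaped).
    private static final Pattern URIC =
        Pattern.compile("(?:[A-Za-z0-9\\-_.!~*'();/?:@&=+$,]|%[0-9A-Fa-f]{2})*");

    /** Checks components from the Appendix B regexp; path is never null, others may be. */
    public static boolean valid(String scheme, String path, String query, String fragment) {
        return (scheme == null || SCHEME.matcher(scheme).matches())
            && PATH.matcher(path).matches()
            && (query == null || URIC.matcher(query).matches())
            && (fragment == null || URIC.matcher(fragment).matches());
    }

    public static void main(String[] args) {
        System.out.println(valid("http", "/a%20b", "x=1", null));
        System.out.println(valid("http", "/a b", null, null));
    }
}
```

The port number would be one more small regexp over the authority group, along the lines of `:(\d*)$` for hosts without IPv6 brackets.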

I could have implemented URI parsing using a parser
generator, but it seemed to me like the wrong solution
to the problem: instead of a simple regexp and 20 lines
of code, you get a compile-time dependency on a parser
generator, x lines for the grammar, and y lines for the
generated code. I think your grammar alone is bigger
than my parsing code.

> So the URI parser can be implemented in either native (C code) or
> Java. Implementing it in Java will be quite hard and difficult to
> maintain and keep up with potential URI changes. On the other hand,
> if it is implemented in C, it will be *very* easy to implement and
> maintain, as I'll use flex, and maximum parsing speed will be
> achieved. Additionally, given that the URI grammar is very simple,
> bison (yacc) is not needed. It would be easy to implement the URI
> parser in Java if JLex is used (that's another option I'm
> considering).

I don't understand how implementing the URI parser in
C would somehow magically make it easier to maintain
than implementing it in Java. There are parser
generators for Java, too, as you already know. Sounds
to me like you're comparing apples and oranges.

That being said, feel free to reimplement URI parsing
from scratch. I can understand your enthusiasm.
Programming parsers can be fun, and writing grammars is a
nice pastime as well.

I would humbly propose using my code and fixing its
shortcomings, but I can't force anyone to use it ;) I
know full well that it is not a full implementation
of java.net.URI (and I think I've stated that in the
mail accompanying the patch), but it is a good
starting point, in my opinion. It's certainly good
enough to run Saxon 7.3 on kaffe ;)

best regards,
dalibor topic
