classpath

Re: HTTPURLConnection.connect() buffers its entire input.


From: Chris Burdess
Subject: Re: HTTPURLConnection.connect() buffers its entire input.
Date: Fri, 9 Sep 2005 09:13:48 +0100

David Daney wrote:
> It seems that the current implementation of HTTPURLConnection.connect() buffers the entire response before returning.
>
> Is that a correct analysis?

Yes.

> This can be problematic if the content is larger than the heap. It is even worse than that, as it makes a copy of the content, so the content can be at most half as large as the heap.
>
> Does anyone know the rationale behind doing it this way?

Our implementation uses the inetlib HTTP client in order to leverage numerous HTTP features such as chunked and compressed transfer-codings, TLS, and HTTP 1.1.

The design of the inetlib HTTP client is based on callbacks. You register a listener to receive notification of HTTP response data, rather than pulling the data yourself. This leaves the client in proper control of the stream and permits correct handling of HTTP persistent connections (reuse of the same TCP connection for multiple HTTP requests).
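To make the push model concrete, here is a minimal Java sketch. The BodyListener interface and the PushClient class are hypothetical names invented for this illustration; they are not the inetlib API. The point is only that the client drives the stream and the caller reacts to what it is handed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical listener interface for illustration; NOT the inetlib API.
interface BodyListener {
    void onData(byte[] buf, int off, int len);
    void onComplete();
}

// Stand-in for a push-style client: it owns the stream and decides
// when to hand data to the listener.
class PushClient {
    private final byte[][] chunks;

    PushClient(byte[]... chunks) {
        this.chunks = chunks;
    }

    void dispatch(BodyListener listener) {
        for (byte[] chunk : chunks) {
            listener.onData(chunk, 0, chunk.length);
        }
        listener.onComplete();
    }
}

class CallbackDemo {
    // Collects everything the client pushes into an in-memory buffer.
    static String collect(PushClient client) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        boolean[] done = { false };
        client.dispatch(new BodyListener() {
            public void onData(byte[] buf, int off, int len) {
                sink.write(buf, off, len);
            }
            public void onComplete() {
                done[0] = true;
            }
        });
        if (!done[0]) {
            throw new IllegalStateException("response did not complete");
        }
        // toByteArray() copies the buffer, so the body briefly exists twice.
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PushClient client = new PushClient(
            "Hello, ".getBytes(StandardCharsets.UTF_8),
            "world".getBytes(StandardCharsets.UTF_8));
        System.out.println(collect(client));
    }
}
```

Note that a caller who wants the whole body as one array ends up with both the growing buffer and its copy in memory at the same time, which is the kind of cost raised in the question.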

The URLConnection API, by contrast, is pull-based. We therefore either have to buffer the entire response before returning, or use multiple threads, a pipe, and a much more complex implementation to manage resource cleanup. Also note that with HTTP/1.1 chunked encoding, headers (trailers) can follow the response body, which is not something most naive developers will expect. This means that with a non-buffered implementation you could have:

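  connection.getHeaderField("My-Header"); // null: trailer not received yet
  InputStream in = connection.getInputStream();
  // read from in until read() returns -1
  connection.getHeaderField("My-Header"); // non-null: trailer has now arrived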

In practice I haven't seen this in many servers, but it is still a possibility.
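To show where such a trailing header comes from on the wire, here is a small self-contained Java sketch that decodes a chunked body by hand. It is a simplified illustration, not the inetlib or Classpath code: the class name, the sample payload, and the minimal error handling are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative decoder for the chunked transfer-coding; simplified, no limits or validation.
class ChunkedTrailerDemo {
    // Filled in only after the terminating zero-length chunk has been read.
    final Map<String, String> trailers = new LinkedHashMap<>();

    // Reads one line up to LF, dropping CR characters.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }

    byte[] readBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');        // ignore chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) break;                    // last chunk
            byte[] chunk = new byte[size];
            int got = 0;
            while (got < size) {
                int n = in.read(chunk, got, size - got);
                if (n < 0) throw new IOException("truncated chunk");
                got += n;
            }
            body.write(chunk, 0, size);
            readLine(in);                            // CRLF that ends the chunk data
        }
        String line;
        while (!(line = readLine(in)).isEmpty()) {   // trailer fields, then a blank line
            int colon = line.indexOf(':');
            if (colon > 0) {
                trailers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return body.toByteArray();
    }

    // Returns { trailer value before reading, decoded body, trailer value after reading }.
    static String[] run(String wire) {
        ChunkedTrailerDemo demo = new ChunkedTrailerDemo();
        String before = demo.trailers.get("My-Header");
        try {
            byte[] body = demo.readBody(
                new ByteArrayInputStream(wire.getBytes(StandardCharsets.ISO_8859_1)));
            return new String[] {
                before,
                new String(body, StandardCharsets.ISO_8859_1),
                demo.trailers.get("My-Header")
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = run("5\r\nhello\r\n6\r\n world\r\n0\r\nMy-Header: done\r\n\r\n");
        System.out.println("before: " + r[0]);
        System.out.println("body:   " + r[1]);
        System.out.println("after:  " + r[2]);
    }
}
```

The trailer map stays empty until the zero-length chunk has been consumed, which mirrors the null-then-non-null behaviour in the snippet above.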

Tom Tromey and I have discussed the possibility of this non-buffered implementation and of a hybrid model which uses a heuristic based on the content length to decide which of these implementations to use, but we haven't really had time to thrash it all out yet.
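For what it is worth, such a hybrid might look roughly like the Java sketch below. Everything here is an assumption for illustration: the Producer interface stands in for whatever pushes the response body, and the 64 KiB threshold is an arbitrary placeholder rather than a value anyone has proposed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

class HybridBody {
    // Hypothetical threshold; a real implementation would need to tune this.
    static final int BUFFER_LIMIT = 64 * 1024;

    // Hypothetical stand-in for the push-style source of the response body.
    interface Producer {
        void writeTo(OutputStream out) throws IOException;
    }

    // contentLength < 0 means "unknown", for example a chunked response.
    static InputStream open(long contentLength, Producer producer) throws IOException {
        if (contentLength >= 0 && contentLength <= BUFFER_LIMIT) {
            // Small, known size: buffer the whole body before returning.
            ByteArrayOutputStream buf = new ByteArrayOutputStream((int) contentLength);
            producer.writeTo(buf);
            return new ByteArrayInputStream(buf.toByteArray());
        }
        // Large or unknown size: pump through a pipe on a second thread.
        PipedInputStream in = new PipedInputStream(8192);
        PipedOutputStream out = new PipedOutputStream(in);
        Thread pump = new Thread(() -> {
            try (OutputStream o = out) {
                producer.writeTo(o);
            } catch (IOException e) {
                // Reader closed the pipe or the source failed. This sketch just stops;
                // the reader then sees a plain end of stream, so a real implementation
                // would have to report the failure to it.
            }
        }, "http-body-pump");
        pump.setDaemon(true);
        pump.start();
        return in;
    }

    // Drains the stream and reports which strategy was used and how many bytes arrived.
    static String describe(long contentLength, byte[] data) {
        try (InputStream in = open(contentLength, out -> out.write(data))) {
            long total = 0;
            byte[] tmp = new byte[4096];
            for (int r; (r = in.read(tmp)) != -1; ) {
                total += r;
            }
            return in.getClass().getSimpleName() + ":" + total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(10, new byte[10]));
        System.out.println(describe(-1, new byte[200000]));
    }
}
```

The piped branch keeps memory use bounded by the pipe size, but it is also where the extra thread and the cleanup questions mentioned earlier come in.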

If you are dealing with streaming servers or with very large responses, you probably shouldn't be using the URLConnection API in any case - consider using the inetlib client directly as it will be more efficient.
--
Chris Burdess
  "They that can give up essential liberty to obtain a little safety
  deserve neither liberty nor safety." - Benjamin Franklin




