classpath

Re: HTTPURLConnection.connect() buffers its entire input.


From: Nic Ferrier
Subject: Re: HTTPURLConnection.connect() buffers its entire input.
Date: Fri, 09 Sep 2005 10:46:12 +0100

Chris Burdess <address@hidden> writes:

> David Daney wrote:
>> It seems the current implementation of HTTPURLConnection.connect() 
>> buffers the entire response before returning.
>>
>> Is that a correct analysis?
>
> Yes.
>
>> This can be problematic if the content is larger than the heap. It 
>> is even worse than that: because it makes a copy of the content, the 
>> content can only be about half as large as the heap.
>>
>> Does anyone know the rationale behind doing it this way?
>
> Our implementation uses the inetlib HTTP client in order to leverage 
> numerous HTTP features such as chunked and compressed transfer-codings, 
> TLS, and HTTP 1.1.
>
> The design of the inetlib HTTP client is based on callbacks. You 
> register a listener to receive notification of HTTP response data, 
> rather than pulling the data yourself. This leaves the client in proper 
> control of the stream and permits correct handling of HTTP persistent 
> connections (reuse of the same TCP connection for multiple HTTP 
> requests).
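
To make the push/pull mismatch concrete, here is a minimal Java sketch of a callback-style client. The names (PushClientSketch, ResponseListener, BufferingListener, deliverBody) are hypothetical and are not the inetlib API; it assumes a body with a known Content-Length. It shows why the client can safely reuse the connection (it reads exactly the body and nothing more) and why buffering is the simplest way to expose the result through a pull-style getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical push-style client; illustrative names, not the inetlib API. */
final class PushClientSketch {

    /** Hypothetical callback interface for receiving response body data. */
    interface ResponseListener {
        void onData(byte[] buf, int off, int len);
        void onComplete();
    }

    /**
     * Reads exactly contentLength bytes from the connection and pushes them
     * to the listener. The client never hands out the socket stream, so it
     * knows where this response ends and can reuse the connection afterwards.
     */
    static void deliverBody(InputStream conn, long contentLength,
                            ResponseListener listener) throws IOException {
        byte[] buf = new byte[4096];
        long remaining = contentLength;
        while (remaining > 0) {
            int n = conn.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                throw new IOException("connection closed before end of body");
            }
            listener.onData(buf, 0, n);
            remaining -= n;
        }
        listener.onComplete();
    }

    /** The simplest adapter to a pull API: accumulate the whole body in memory. */
    static final class BufferingListener implements ResponseListener {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        private boolean complete;

        @Override
        public void onData(byte[] buf, int off, int len) {
            sink.write(buf, off, len);
        }

        @Override
        public void onComplete() {
            complete = true;
        }

        /** Returns a pull-style stream over the buffered body. */
        InputStream getInputStream() {
            if (!complete) {
                throw new IllegalStateException("response not complete");
            }
            // toByteArray() copies the buffer, so the body is briefly held twice.
            return new ByteArrayInputStream(sink.toByteArray());
        }
    }
}
```

Because toByteArray() copies the accumulated bytes, the body is held in memory twice at the moment of the copy; that is one way a "half the heap" limit like the one described above can arise.
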
>
> The design of the URLConnection API is pull-based. Therefore we either 
> have to buffer an entire response before returning, or use multiple 
> threads, a pipe, and a much more complex implementation to manage 
> cleanup of resources. Also note that with HTTP 1.1 chunked encoding, 
> you can have headers after the response body, which is not something 
> that most naive developers will expect. This means that in the 
> non-buffered implementation you could have
>
>    connection.getHeaderField("My-Header"); // null
>    InputStream in = connection.getInputStream();
>    while (in.read() != -1) { } // read the body to the end
>    connection.getHeaderField("My-Header"); // non-null
>
> In practice I haven't seen this in many servers, but it is still a 
> possibility.
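
The trailer case can be illustrated with a small decoder for the HTTP/1.1 chunked transfer-coding (RFC 2616, section 3.6.1). This is a simplified sketch: the class name is hypothetical, chunk extensions are skipped, and trailer names are stored case-sensitively for brevity even though HTTP header names are case-insensitive. The point is that trailer fields only become available after the zero-length last chunk has been read, i.e. after the body has been consumed to the end.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified decoder for the chunked transfer-coding, including trailers. */
final class ChunkedDecoderSketch {
    final ByteArrayOutputStream body = new ByteArrayOutputStream();
    final Map<String, String> trailers = new LinkedHashMap<>();

    /** Reads one CRLF-terminated line (without the CRLF), bytes as ISO-8859-1. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("expected LF after CR");
                }
                return sb.toString();
            }
            sb.append((char) c);
        }
        throw new IOException("unexpected end of stream");
    }

    /** Decodes the whole chunked body; trailers are known only at the end. */
    void decode(InputStream in) throws IOException {
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // ignore chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break; // last-chunk: the body is complete
            }
            byte[] chunk = in.readNBytes(size);
            if (chunk.length != size) {
                throw new IOException("truncated chunk");
            }
            body.write(chunk, 0, size);
            if (!readLine(in).isEmpty()) {
                throw new IOException("missing CRLF after chunk data");
            }
        }
        // Trailer section: header fields sent after the body, ended by an empty line.
        for (String line = readLine(in); !line.isEmpty(); line = readLine(in)) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IOException("malformed trailer field: " + line);
            }
            trailers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
    }
}
```
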
>
> Tom Tromey and I have discussed the possibility of this non-buffered 
> implementation and of a hybrid model which uses a heuristic based on 
> the content length to decide which of these implementations to use, but 
> we haven't really had time to thrash it all out yet.
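
A hybrid of the kind described above might look roughly like the following. This is a sketch under stated assumptions, not the planned Classpath design: the class name, the 64 KiB threshold, and the use of a PipedInputStream fed by a daemon thread are all illustrative. Small bodies with a known Content-Length are buffered; large or unknown-length ones are streamed through a pipe.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

/** Chooses between buffering and streaming a response body (hypothetical heuristic). */
final class HybridBodySketch {
    /** Illustrative threshold only; a real value would need measurement. */
    static final long BUFFER_LIMIT = 64 * 1024;

    /**
     * Returns a pull-style stream over the body read from conn.
     * contentLength is the Content-Length value, or -1 if unknown (e.g. chunked).
     */
    static InputStream open(InputStream conn, long contentLength) throws IOException {
        if (contentLength >= 0 && contentLength <= BUFFER_LIMIT) {
            // Small, known size: buffer it all up front.
            byte[] body = conn.readNBytes((int) contentLength);
            if (body.length != contentLength) {
                throw new IOException("connection closed before end of body");
            }
            return new ByteArrayInputStream(body);
        }
        // Large or unknown size: a background thread copies the body into a
        // pipe that the caller reads from as the data arrives.
        PipedInputStream reader = new PipedInputStream(8192);
        PipedOutputStream writer = new PipedOutputStream(reader);
        Thread pump = new Thread(() -> {
            byte[] buf = new byte[8192];
            long remaining = contentLength < 0 ? Long.MAX_VALUE : contentLength;
            try (PipedOutputStream out = writer) {
                int n;
                while (remaining > 0
                        && (n = conn.read(buf, 0, (int) Math.min(buf.length, remaining))) != -1) {
                    out.write(buf, 0, n);
                    remaining -= n;
                }
            } catch (IOException e) {
                // Reader closed early, or the connection failed. Here the
                // stream simply ends; real code would have to report the error.
            }
        }, "http-body-pump");
        pump.setDaemon(true);
        pump.start();
        return reader;
    }
}
```

The catch block is where the extra complexity shows up: in this sketch a connection error just ends the stream early, whereas a real implementation would have to propagate the failure to the reader and close or return the connection correctly.
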
>
> If you are dealing with streaming servers or with very large responses, 
> you probably shouldn't be using the URLConnection API in any case - 
> consider using the inetlib client directly as it will be more 
> efficient.

I have spoken to Chris before about my own HTTP library, which uses
non-blocking I/O. It would solve this problem, but it would also
require another thread (for the selector).
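
For comparison, the non-blocking approach relies on one selector thread that multiplexes connections and pushes incoming data to callbacks. The following is a minimal java.nio sketch, not my library's API: the class name and the callback shape (a Consumer<ByteBuffer> plus an end-of-stream Runnable) are assumptions. Registrations are queued and handed to the selector thread via wakeup(), so register() can be called from any thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** One thread multiplexes many non-blocking channels and pushes data to callbacks. */
final class SelectorLoopSketch implements Runnable {
    /** Per-channel callbacks, stored as the SelectionKey attachment. */
    private static final class Handler {
        final Consumer<ByteBuffer> onData;
        final Runnable onEof;
        Handler(Consumer<ByteBuffer> onData, Runnable onEof) {
            this.onData = onData;
            this.onEof = onEof;
        }
    }

    private final Selector selector;
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    SelectorLoopSketch() throws IOException {
        selector = Selector.open();
    }

    /** Queues a registration; the selector thread performs it on its next pass. */
    <C extends SelectableChannel & ReadableByteChannel> void register(
            C channel, Consumer<ByteBuffer> onData, Runnable onEof) throws IOException {
        channel.configureBlocking(false);
        pending.add(() -> {
            try {
                channel.register(selector, SelectionKey.OP_READ, new Handler(onData, onEof));
            } catch (ClosedChannelException e) {
                onEof.run();
            }
        });
        selector.wakeup(); // make a blocked select() return so the queue is drained
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Runnable r;
                while ((r = pending.poll()) != null) {
                    r.run();
                }
                selector.select(); // also returns on wakeup() or interrupt
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    Handler h = (Handler) key.attachment();
                    ReadableByteChannel ch = (ReadableByteChannel) key.channel();
                    buf.clear();
                    int n;
                    try {
                        n = ch.read(buf);
                    } catch (IOException e) {
                        n = -1; // treat a read error as end of stream in this sketch
                    }
                    if (n == -1) {
                        key.cancel();
                        ch.close();
                        h.onEof.run();
                    } else if (n > 0) {
                        buf.flip();
                        h.onData.accept(buf); // buffer is reused: consume before returning
                    }
                }
            }
        } catch (IOException e) {
            // A selector or close failure ends the loop in this sketch.
        } finally {
            try {
                selector.close();
            } catch (IOException ignored) {
                // nothing more to do
            }
        }
    }
}
```

Callbacks run on the selector thread and receive a reused buffer, so they must consume or copy the data before returning; a pull-style URLConnection built on top of this would still need some hand-off (such as a pipe) to the caller's thread.
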

It also lacks HTTP 1.1 features such as pipelining, though I will add
them if I get the time.


Nic



