bug-wget

[Bug-wget] persistence with multiple hostnames


From: Ryan Rawdon
Subject: [Bug-wget] persistence with multiple hostnames
Date: Tue, 17 Apr 2012 12:16:09 -0400

I was speaking with Micah on IRC today about a behavior in wget that differs
from curl and most, if not all, browsers.

Generally, HTTP clients do not use a given persistent connection for more than
one hostname. This is why tricks such as spreading static content across
multiple name-based vhosts on the same IP address work: they encourage more
parallel fetching of a page's static elements.
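
As an illustration of the per-hostname policy described above, here is a
minimal Python sketch of a persistent-connection cache keyed by (hostname,
port). The class and field names are hypothetical, no real sockets are opened,
and this is not how curl or any particular browser actually implements its
pool; it only shows why two names on one IP address end up on separate
connections under such a policy.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Stand-in for an open TCP connection (hypothetical, no real socket)."""
    peer_ip: str
    port: int
    opened_for: str  # hostname the connection was originally opened for


@dataclass
class HostKeyedPool:
    """Persistent-connection cache keyed by (hostname, port).

    A connection opened for one hostname is never handed out for another,
    even when both names resolve to the same IP address.
    """
    idle: dict = field(default_factory=dict)

    def release(self, conn: Connection) -> None:
        # Park the connection under the hostname it was opened for.
        self.idle[(conn.opened_for.lower(), conn.port)] = conn

    def acquire(self, host: str, port: int,
                resolved_ip: str) -> tuple[Connection, bool]:
        """Return (connection, reused?) for host:port."""
        conn = self.idle.pop((host.lower(), port), None)
        if conn is not None:
            return conn, True
        # No idle connection for this exact hostname: open a new one.
        return Connection(resolved_ip, port, host), False


pool = HostKeyedPool()
first, _ = pool.acquire("soldat.pl", 80, "2607:fd50:1:91b0::50:1d8")
pool.release(first)
second, reused = pool.acquire("soldat.thd.vg", 80, "2607:fd50:1:91b0::50:1d8")
print(reused)  # False: same IP address, but a different hostname
```

Keying on the name rather than the address costs an extra TCP handshake, but
it keeps each server-side connection tied to a single vhost.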

However, wget appears to reuse persistent connections across multiple hostnames
(see below). In the case below, a connection is opened to soldat.pl, which
responds with a 302 redirect to a new hostname. Wget resolves the new hostname,
finds the same address among the results, and decides to reuse the existing
connection to that IP address.
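
The decision described above can be sketched as an address-based reuse check.
This is a hypothetical Python illustration that is merely consistent with the
log output below; the function name and arguments are assumptions, and it is
not taken from wget's source.

```python
import ipaddress


def can_reuse(conn_host: str, conn_peer_ip: str,
              new_host: str, new_host_addrs: list[str]) -> bool:
    """Decide whether an open persistent connection may serve new_host.

    Hypothetical policy: the same hostname always qualifies; otherwise the
    connection is reused if any address new_host resolves to equals the
    peer address of the open connection. Hostnames are otherwise ignored.
    """
    if new_host.lower() == conn_host.lower():
        return True
    peer = ipaddress.ip_address(conn_peer_ip)
    # ip_address() normalizes textual forms, so an expanded and a compressed
    # spelling of the same IPv6 address compare equal.
    return any(ipaddress.ip_address(addr) == peer for addr in new_host_addrs)


# Addresses from the log below: soldat.thd.vg resolves to the same IPv6
# address that the soldat.pl connection is already talking to.
print(can_reuse("soldat.pl", "2607:fd50:1:91b0::50:1d8",
                "soldat.thd.vg",
                ["2607:fd50:1:91b0::50:1d8", "192.168.152.5"]))  # True
```

A host-keyed policy would return False for the same inputs, because the two
names differ.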

The RFC does not appear to address the reuse of persistent connections with
regard to hostname, so the behavior is permissible (and fine from a protocol
standpoint, since Host is specified with each request).
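
To make the protocol point concrete: on a single persistent connection, each
HTTP/1.1 request carries its own Host header, so the server can route every
request to the right vhost. A minimal Python sketch that only builds the
request bytes (the helper name is hypothetical, and only a bare GET with no
other headers is shown):

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given virtual host.

    The Host header, not the TCP connection, tells the server which vhost
    the request is for.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: keep-alive",
        "",  # these two empty entries produce the blank line
        "",  # that terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


# Both requests could be written, one after the other, to the same socket;
# only the Host header differs.
first = build_request("soldat.pl")
second = build_request("soldat.thd.vg")
```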

The problem stems from the use of privilege separation between virtual hosts.
In the case below, before I fixed it today, wget was receiving a 403 on the
second request because the user that owned this fd on the server side did not
have privileges to access the content for the soldat.thd.vg vhost.
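
A toy Python model of the server-side failure described above, assuming a
setup where the worker handling a connection drops privileges to the owner of
the first vhost it serves and cannot switch users afterwards. The
vhost-to-user mapping and the user names are hypothetical, and real
privilege-separation setups differ in detail; the status codes only model the
permission check (redirect handling is omitted).

```python
# Hypothetical mapping of vhost -> system user that owns its content.
VHOST_OWNER = {
    "soldat.pl": "soldat_pl",
    "www.soldat.pl": "soldat_pl",
    "soldat.thd.vg": "thd",
}


class PrivSepConnection:
    """Toy model of one server-side connection under privilege separation."""

    def __init__(self) -> None:
        self.user = None  # set on the first request, fixed afterwards

    def handle(self, host: str) -> int:
        owner = VHOST_OWNER.get(host.lower())
        if owner is None:
            return 404
        if self.user is None:
            # Privileges are dropped for the lifetime of the connection.
            self.user = owner
        # 200 = content readable by this worker; 403 = wrong user.
        return 200 if owner == self.user else 403


conn = PrivSepConnection()
print(conn.handle("www.soldat.pl"))   # 200: worker now runs as soldat_pl
print(conn.handle("soldat.thd.vg"))   # 403: reused connection, wrong user
print(PrivSepConnection().handle("soldat.thd.vg"))  # 200: fresh connection
```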

This behavior is probably reproducible with any page fetched by wget that
302-redirects between two privilege-separated vhosts on the same server, or
when scraping a page with elements from two or more hosts on the same IP
address.

Since this behavior appears to be permitted by the RFC, this is more a
discussion of whether it is intended behavior in wget, a bug, or an
opportunity to behave more like curl and everyday GUI browsers.

Micah took a quick look over the source (or was already familiar with it), and
it sounds like there may be checks in place that should have prevented this;
however, I did not look to confirm.

nova-dhcp-host111:tmp ryan$ wget http://soldat.pl
--2012-04-17 11:57:25--  http://soldat.pl/
Resolving soldat.pl (soldat.pl)... 2607:fd50:1:91b0::50:1d8, 192.168.152.5
Connecting to soldat.pl (soldat.pl)|2607:fd50:1:91b0::50:1d8|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/ [following]
--2012-04-17 11:57:25--  http://soldat.thd.vg/
Resolving soldat.thd.vg (soldat.thd.vg)... 2607:fd50:1:91b0::50:1d8, 192.168.152.5
Reusing existing connection to soldat.pl:80.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/en/ [following]
--2012-04-17 11:57:26--  http://soldat.thd.vg/en/
Reusing existing connection to soldat.pl:80.
HTTP request sent, awaiting response... 200 OK
Cookie coming from soldat.thd.vg attempted to set domain to soldat.thd.vg
Cookie coming from soldat.thd.vg attempted to set domain to soldat.thd.vg
Cookie coming from soldat.thd.vg attempted to set domain to soldat.thd.vg
Length: unspecified [text/html]

Here is the original report from a user, which shows the 403:


address@hidden:~$ wget www.soldat.pl
--2012-04-17 11:50:29--  http://www.soldat.pl/
Resolving www.soldat.pl... 67.23.118.186, 2607:fd50:1:91b0::50:1d8
Connecting to www.soldat.pl|67.23.118.186|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/ [following]
--2012-04-17 11:50:29--  http://soldat.thd.vg/
Resolving soldat.thd.vg... 67.23.118.186, 2607:fd50:1:91b0::50:1d8
Reusing existing connection to www.soldat.pl:80.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-17 11:50:29 ERROR 403: Forbidden.

address@hidden:~$ wget -6 www.soldat.pl
--2012-04-17 11:50:39--  http://www.soldat.pl/
Resolving www.soldat.pl... 2607:fd50:1:91b0::50:1d8
Connecting to www.soldat.pl|2607:fd50:1:91b0::50:1d8|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/ [following]
--2012-04-17 11:50:39--  http://soldat.thd.vg/
Resolving soldat.thd.vg... 2607:fd50:1:91b0::50:1d8
Reusing existing connection to www.soldat.pl:80.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-17 11:50:39 ERROR 403: Forbidden.






