From: anonymous
Subject: [Bug-wget] [bug #50935] TEXTHTML not properly set if page is already downloaded
Date: Wed, 3 May 2017 20:08:06 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0
URL: <http://savannah.gnu.org/bugs/?50935>
Summary: TEXTHTML not properly set if page is already downloaded
Project: GNU Wget
Submitted by: None
Submitted on: Thu 04 May 2017 12:08:05 AM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: trunk
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
Running (for example):
wget -xH -nc 'https://news.ycombinator.com/item?id=14245538'
wget -pH -nc 'https://news.ycombinator.com/item?id=14245538'
results in wget not scanning the already-downloaded HTML file for links on
the second run. This happens because wget saves the file without an .html
suffix and then looks only at the file name's extension to decide whether it
is an HTML file (this check is even marked "#### Bogusness alert." in the
source). One possible fix is to inspect the file contents instead, for example
by checking whether the file starts with a "<!DOCTYPE html" declaration or an
"<html>" tag.
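
The content check suggested above could look roughly like the sketch below.
This is not wget's code: the function names buffer_looks_like_html and
file_looks_like_html, the 512-byte read size, and the choice to skip a UTF-8
byte order mark and leading whitespace are all assumptions made for
illustration. It matches the prefix "<html" rather than the literal "<html>"
so that tags carrying attributes are also accepted.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive test that buf (len bytes) starts with an ASCII prefix. */
int
starts_with_nocase (const char *buf, size_t len, const char *prefix)
{
  size_t n = strlen (prefix);
  size_t i;

  if (len < n)
    return 0;
  for (i = 0; i < n; i++)
    if (tolower ((unsigned char) buf[i]) != tolower ((unsigned char) prefix[i]))
      return 0;
  return 1;
}

/* Hypothetical helper: guess whether the leading bytes of a file are HTML. */
int
buffer_looks_like_html (const char *buf, size_t len)
{
  /* Skip a UTF-8 byte order mark, if present (an assumption of this sketch). */
  if (len >= 3 && memcmp (buf, "\xEF\xBB\xBF", 3) == 0)
    {
      buf += 3;
      len -= 3;
    }

  /* Skip leading whitespace before the first markup. */
  while (len > 0 && isspace ((unsigned char) *buf))
    {
      buf++;
      len--;
    }

  return starts_with_nocase (buf, len, "<!DOCTYPE html")
         || starts_with_nocase (buf, len, "<html");
}

/* Hypothetical helper: apply the check to the start of a file on disk.
   Returns 0 if the file cannot be opened. */
int
file_looks_like_html (const char *path)
{
  char head[512];
  size_t got;
  FILE *fp = fopen (path, "rb");

  if (!fp)
    return 0;
  got = fread (head, 1, sizeof head, fp);
  fclose (fp);
  return buffer_looks_like_html (head, got);
}
```

A check like this would miss HTML files that begin with a comment or an XML
declaration, so it would most likely be used in addition to the existing
suffix test rather than in place of it.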
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?50935>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/