[Wget-dev] wget2 | Incompatible Behavior: -p (--page-requisites) and -np
From: Tsukasa OI
Subject: [Wget-dev] wget2 | Incompatible Behavior: -p (--page-requisites) and -np (--no-parent) (#379)
Date: Thu, 03 May 2018 10:13:27 +0000
New Issue was created.
Issue 379: https://gitlab.com/gnuwget/wget2/issues/379
Author: Tsukasa OI
Assignee:
Wget1 and Wget2 behave differently when all of the following hold:
1. Both `-p` and `-np` are given
2. A page requisite (images/CSS etc.) and the original page are on the same
host (they share the same domain)
3. The page requisite exists outside the directory that contains the original
page (HTML file)
This difference particularly affects recursive downloading. For instance,
consider a website (`http://example.com/`) with the following files:
* `/style.css`: global style for the website
* `/category/index.html`: local page index (refers to `/style.css` and links to
`/category/page.html`)
* `/category/page.html`: local page (also refers to `/style.css`)
`wget -r -l 0 -p -np http://example.com/category/index.html` downloads all
three files, but `wget2 -p -r -l 0 -p -np
http://example.com/category/index.html` doesn't download the global `style.css`.
This is a simple example, but the website I want to crawl is far more complex
(which makes `--accept-regex` and `--reject-regex` nearly unusable).
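The difference on the example site can be modeled with a small Python sketch. This is not code from either Wget; `Link`, `is_below_parent`, `should_fetch` and the `exempt_requisites` switch are hypothetical names used only to illustrate the two behaviors described in this report, and the model only covers same-host links.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit
import posixpath


@dataclass(frozen=True)
class Link:
    """A link found in a page (hypothetical type for this sketch)."""
    url: str
    inline: bool  # True for page requisites (CSS, images), False for hyperlinks


def is_below_parent(start_url: str, url: str) -> bool:
    """True if url is on the same host and inside the start URL's directory."""
    start, target = urlsplit(start_url), urlsplit(url)
    if start.netloc.lower() != target.netloc.lower():
        return False
    parent_dir = posixpath.dirname(start.path)
    if not parent_dir.endswith("/"):
        parent_dir += "/"
    return target.path.startswith(parent_dir)


def should_fetch(start_url: str, link: Link, *, page_requisites: bool,
                 no_parent: bool, exempt_requisites: bool) -> bool:
    """Model of the -np decision for a single link.

    exempt_requisites=True models the Wget1 behavior reported here
    (requisites bypass -np when -p is given); False models the
    reported Wget2 behavior (-np applies to every link).
    """
    if not no_parent:
        return True
    if exempt_requisites and page_requisites and link.inline:
        return True
    return is_below_parent(start_url, link.url)


START = "http://example.com/category/index.html"
LINKS = [
    Link("http://example.com/style.css", inline=True),
    Link("http://example.com/category/page.html", inline=False),
]

for exempt in (True, False):
    fetched = [
        link.url for link in LINKS
        if should_fetch(START, link, page_requisites=True,
                        no_parent=True, exempt_requisites=exempt)
    ]
    print("exempt_requisites =", exempt, "->", fetched)
```

With `exempt_requisites=True` both links are kept, matching the three-file result reported for `wget`; with `False` the global `style.css` is dropped, matching the result reported for `wget2`.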
This behavior is consistent in _some_ way (it works just like `-H`
[`--span-hosts`]), but not being able to retrieve page requisites in a
recursive download is undesirable for me (and in general).
I think it can be resolved by using `link_inline` somehow, but I'm not sure:
1. Whether using `link_inline` can fix the issue
2. Whether changing Wget2 to behave just like Wget1 is a good idea (is
there any better behavior than that of Wget1 [and current Wget2]? Can we have a
command-line option?)
...partly because I first saw the source code of wget (1 and 2) today.