[Bug-wget] Need community input/comments
From: Tim Rühsen
Subject: [Bug-wget] Need community input/comments
Date: Thu, 13 Dec 2018 15:09:09 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1
Hi everybody,
could you please post some opinions / ideas / comments on how to handle
file naming in certain cases?
This is an issue from Wget2 development (see
https://gitlab.com/gnuwget/wget2/issues/415), but we have had similar
issues in wget1.x in recent years.
I copy here my comment from today, 13.12.2018, which tries to sum it up:
(Not mentioned: ambiguous file naming can corrupt files when using
-k/--convert-links.)
PROBLEM
During --recursive downloads, we don't generate unique file names (e.g.
ending with .1, .2 etc), at least not when a directory structure is
created locally.
That said, we have a file naming synchronisation issue between
server (GET request) and client (local file naming). The following three
GET requests can return three different contents, while we only have a
single file name for them:
GET /foo
GET /foo/
GET /foo/index.html
The first one is saved as a file foo. For the second one we have to
create a directory foo to save index.html in. So we have to rename the
existing file foo to something else (to what?). Then the third GET
request would be saved into foo/index.html, which already exists. Which
one should be renamed, and how? The naming should be unambiguous so that
two recursive downloads always generate the same file structure. In
other words: the order of the three downloads should have no influence
on the file naming.
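To make the clash concrete, here is a small Python sketch. It is only a
simplified model of the mapping described above, not Wget's actual code;
the names naive_local_path and find_conflicts are made up for this
illustration, and index.html as the default name for directory requests
is taken from the description above.

```python
from collections import defaultdict


def naive_local_path(url_path, index_name="index.html"):
    """Simplified model of the naming described above (not Wget's code)."""
    rel = url_path.lstrip("/")
    if url_path.endswith("/"):
        # A directory request is stored under the default index name.
        return rel + index_name
    return rel


def find_conflicts(url_paths):
    """Return (same_file, file_vs_dir) conflicts among the local names."""
    by_local = defaultdict(list)
    for url_path in url_paths:
        by_local[naive_local_path(url_path)].append(url_path)

    # Several URLs that end up in the very same local file.
    same_file = {name: urls for name, urls in by_local.items() if len(urls) > 1}

    # A name that must be a regular file and a directory at the same time.
    file_vs_dir = sorted(
        name for name in by_local
        if any(other.startswith(name + "/") for other in by_local)
    )
    return same_file, file_vs_dir


same_file, file_vs_dir = find_conflicts(["/foo", "/foo/", "/foo/index.html"])
print("same local file:", same_file)
print("file and directory at once:", file_vs_dir)
```

In this model, /foo/ and /foo/index.html compete for foo/index.html,
while /foo needs foo to be a regular file although the other two need it
to be a directory.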
(one possible) SOLUTION
(GET /foo/index.html) tries to create the directory foo. If foo already
exists as a file: move the file away, create dir foo and move the file
to foo/.directory_noslash. Then save/overwrite the response content as
foo/index.html.
(GET /foo/) tries to create the directory foo. If foo already exists as
a file: move the file away, create dir foo and move the file to
foo/.directory_noslash. Then save/overwrite the response content as
foo/.directory_slash.
(GET /foo) tries to save foo as a file. If foo already exists as a file:
overwrite it. If foo already exists as a directory: save/overwrite the
response content as foo/.directory_noslash.
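The three rules can be tried out with a minimal Python sketch. It is an
illustration under assumptions, not Wget2 code: the function names
(ensure_dir, save_response, snapshot, download_all) and the temporary
suffix .tmp-moved are hypothetical, and only .directory_slash and
.directory_noslash come from the proposal itself. The sketch assumes a
single-threaded download and ignores error handling.

```python
import itertools
import os
import tempfile

SLASH_NAME = ".directory_slash"      # name from the proposal
NOSLASH_NAME = ".directory_noslash"  # name from the proposal


def ensure_dir(path):
    """Create directory `path`; a file in the way is moved inside it."""
    if os.path.isdir(path):
        return
    parent = os.path.dirname(path)
    if parent and not os.path.isdir(parent):
        ensure_dir(parent)
    if os.path.isfile(path):
        aside = path + ".tmp-moved"  # hypothetical temporary name
        os.rename(path, aside)       # move the file away
        os.mkdir(path)               # create the directory
        os.rename(aside, os.path.join(path, NOSLASH_NAME))
    else:
        os.mkdir(path)


def save_response(root, url_path, content):
    """Store `content` for GET `url_path` below `root`; return the path."""
    rel = url_path.lstrip("/")
    if url_path.endswith("/"):
        # GET /foo/ is stored as foo/.directory_slash
        target_dir = os.path.join(root, rel.rstrip("/"))
        ensure_dir(target_dir)
        target = os.path.join(target_dir, SLASH_NAME)
    else:
        # GET /foo or GET /foo/index.html
        target = os.path.join(root, rel)
        ensure_dir(os.path.dirname(target))
        if os.path.isdir(target):
            # foo is already a directory: use foo/.directory_noslash
            target = os.path.join(target, NOSLASH_NAME)
    with open(target, "wb") as fh:   # save/overwrite
        fh.write(content)
    return target


def snapshot(root):
    """Map every file below `root` (relative, '/'-separated) to its bytes."""
    tree = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                key = os.path.relpath(full, root).replace(os.sep, "/")
                tree[key] = fh.read()
    return tree


def download_all(responses):
    """Apply save_response in the given order inside a fresh temp dir."""
    with tempfile.TemporaryDirectory() as root:
        for url_path, body in responses:
            save_response(root, url_path, body)
        return snapshot(root)


RESPONSES = [("/foo", b"A"), ("/foo/", b"B"), ("/foo/index.html", b"C")]
trees = [download_all(order) for order in itertools.permutations(RESPONSES)]
print("all orders give the same tree:", all(t == trees[0] for t in trees))
print(sorted(trees[0]))
```

When all three URLs are fetched, every download order ends with the same
three files below foo/. Note that foo stays a plain file if only /foo is
ever requested, so the layout still depends on which URLs are part of the
download, just not on their order.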
This is not 100% Wget1.x compatible but allows precise local linking
with -k. And I remember some long-standing issues that would be solved
with such an approach as well.
The names .directory_slash and .directory_noslash can be made
configurable, just in case there are websites using these names.
Regards, Tim