| From: | Luuk n/a (@Luuk34) |
| Subject: | wget2 | ... not followed (disallowed by robots.txt) (#653) |
| Date: | Sat, 27 Jan 2024 11:24:33 +0000 |
Luuk n/a created an issue: https://gitlab.com/gnuwget/wget2/-/issues/653

A download started using:

wget2.exe --no-parent -r --wait 5 --random-wait https://ghostscript.readthedocs.io/en/gs10.02.0/

produces some lines ending in `not followed (disallowed by robots.txt)`:

```
URL 'https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-docx.svg' not followed (disallowed by robots.txt)
Adding URL: https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-odt.svg
URL 'https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-odt.svg' not followed (disallowed by robots.txt)
Adding URL: https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-xlsx.svg
URL 'https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-xlsx.svg' not followed (disallowed by robots.txt)
Adding URL: https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-pptx.svg
URL 'https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-pptx.svg' not followed (disallowed by robots.txt)
Adding URL: https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-txt.svg
URL 'https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-txt.svg' not followed (disallowed by robots.txt)
```

Downloading a single file from the output above succeeds:

```
wget2 https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-txt.svg
[0] Downloading 'https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-txt.svg' ...
Saving 'icon-txt.svg'
HTTP response 200 [https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-txt.svg]
```

The downloaded `robots.txt` file looks like this:

```
D:\TEMP\gs2>dir robots.txt /s/b
D:\TEMP\gs2\ghostscript.readthedocs.io\robots.txt

D:\TEMP\gs2>type ghostscript.readthedocs.io\robots.txt
User-agent: *
Disallow: # Allow everything
Sitemap: https://ghostscript.readthedocs.io/sitemap.xml
```
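
For comparison, here is a minimal sketch of how another robots.txt parser treats the file quoted in this report. It uses Python's standard `urllib.robotparser` and is not wget2's own code. The `is_allowed` helper and the `"wget2"` user-agent token are assumptions made for illustration. An empty `Disallow:` value (here followed by a comment) is conventionally read as "nothing is disallowed", so the icon URL should be reported as fetchable.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the report.
ROBOTS_TXT = """\
User-agent: *
Disallow: # Allow everything
Sitemap: https://ghostscript.readthedocs.io/sitemap.xml
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "wget2") -> bool:
    """Hypothetical helper: parse robots.txt text and check one URL.

    The default user-agent token is an assumption; the real string sent
    by wget2 may differ, but the file only has a wildcard group anyway.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines; it strips '#' comments itself.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://ghostscript.readthedocs.io/en/gs10.02.0/_images/icon-txt.svg"
    print("allowed" if is_allowed(ROBOTS_TXT, url) else "disallowed")
```

If this parser allows the URL while wget2 refuses it, one plausible explanation (not confirmed here) is that the trailing comment on the `Disallow:` line is being read as part of the path.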