
Re: [Bug-wget] can't reject robots.txt in recursive mode


From: Giuseppe Scrivano
Subject: Re: [Bug-wget] can't reject robots.txt in recursive mode
Date: Wed, 06 Aug 2014 15:38:43 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Ilya Basin <address@hidden> writes:

> Here's my script to download IBM javadocs:
>
> (
>     rm -rf wget-test
>     mkdir wget-test
>     cd wget-test
>
>     starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
>     wget -d -r -R robots.txt --page-requisites -nH --cut-dirs=5 --no-parent \
>         "$starturl" 2>&1 | tee wget.log
> )
>
> Regardless of the '-R' option, wget downloads robots.txt and refuses to
> follow links starting with "/support/knowledgecenter/api/".

No need for a workaround: wget fetches robots.txt through its
robots-exclusion support, not through the recursive link-following that
'-R' filters, so the rejection rule never applies to it.  You should be
able to achieve the behavior you want with "-e robots=off", as documented.
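
For example, your script could drop the '-R robots.txt' rule and disable
robots processing instead (a minimal sketch reusing the flags and URL
from your script):

(
    rm -rf wget-test
    mkdir wget-test
    cd wget-test

    starturl="http://www-01.ibm.com/support/knowledgecenter/api/content/SSZLC2_7.0.0/com.ibm.commerce.api.doc/allclasses-noframe.html"
    # -e robots=off applies the wgetrc command "robots = off", so wget
    # neither downloads robots.txt nor uses it to prune the recursion.
    wget -e robots=off -d -r --page-requisites -nH --cut-dirs=5 \
        --no-parent "$starturl" 2>&1 | tee wget.log
)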

Regards,
Giuseppe


