Re: [Bug-wget] [bug #20398] Save a list of the links that were not follo
From: Tim Rühsen
Subject: Re: [Bug-wget] [bug #20398] Save a list of the links that were not followed
Date: Thu, 07 May 2015 20:59:59 +0200
User-agent: KMail/4.14.2 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )

Hi Jookia,

If you want us to include your patch (and it is of course welcome),
you will need to sign a copyright assignment.
Please email the following information to address@hidden with a CC
to address@hidden, address@hidden and address@hidden, and we
will send you the assignment form for your past and future changes.
Please use your full legal name (in ASCII characters) as the subject
line of the message.
----------------------------------------------------------------------
REQUEST: SEND FORM FOR PAST AND FUTURE CHANGES
[What is the name of the program or package you're contributing to?]
[Did you copy any files or text written by someone else in these changes?
Even if that material is free software, we need to know about it.]
[Do you have an employer who might have a basis to claim to own
your changes? Do you attend a school which might make such a claim?]
[For the copyright registration, what country are you a citizen of?]
[What year were you born?]
[Please write your email address here.]
[Please write your postal address here.]
[Which files have you changed so far, and which new files have you written
so far?]
On Thursday, 7 May 2015, 15:58:53, Jookia wrote:
> Follow-up Comment #5, bug #20398 (project wget):
>
> I've found myself in need of this feature. I'm trying to download a website
> recursively without pulling in every single ad and its HTML. I'd like to be
> able to find out which URLs were rejected, why, and information about the
> domains (host, port, etc.)
>
> I've patched my copy of Wget to dump all of this into a CSV file, which I
> can then work through with standard tools to get my desired results:
>
>
>
> % grep "DOMAIN" rejected.csv | head -1
> DOMAIN,http://c0059637.cdn1.cloudfiles.rackspacecloud.com/flowplayer-3.2.6.min.js,SCHEME_HTTP,c0059637.cdn1.cloudfiles.rackspacecloud.com,80,flowplayer-3.2.6.min.js,(null),(null),(null),http://redated/,SCHEME_HTTP,redacted,80,,(null),(null),(null)
> % grep "DOMAIN" rejected.csv | cut -d"," -f4 | sort | uniq
> 0.gravatar.com
> 1.gravatar.com
> c0059637.cdn1.cloudfiles.rackspacecloud.com
> lh3.googleusercontent.com
> lh4.googleusercontent.com
> lh5.googleusercontent.com
> lh6.googleusercontent.com
>
>
> I've included a patch made in a few hours that does this.
>
> (file #33955)
> _______________________________________________________
>
> Additional Item Attachment:
>
> File name: 0001-rejected-log-Add-option-to-dump-URL-rejections-to-a-.patch
> Size: 14 KB
>
>
> _______________________________________________________
>
> Reply to this item at:
>
> <http://savannah.gnu.org/bugs/?20398>
>
> _______________________________________________
> Message sent via/by Savannah
> http://savannah.gnu.org/
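
For anyone who would rather post-process such a file with a script than with
grep and cut, here is a minimal sketch in Python. It is not part of the patch.
It assumes, going only by the sample line quoted above, that the first field
is the rejection reason and the fourth field is the host of the rejected URL;
the function name and its parameters are made up for illustration. Unlike
grep "DOMAIN", it matches the reason field only, not the whole line.

```python
import csv
import io
from collections import Counter


def rejected_hosts(lines, reason="DOMAIN", host_column=3):
    """Count the hosts of rejected URLs for one rejection reason.

    Assumed layout (inferred from the sample line, not from the patch):
    field 0 is the reason, field `host_column` is the rejected URL's host.
    """
    counts = Counter()
    for row in csv.reader(lines):
        # Skip rows that are too short to carry a host field.
        if len(row) > host_column and row[0] == reason:
            counts[row[host_column]] += 1
    return counts


if __name__ == "__main__":
    # The single sample line from the quoted message, used as input.
    sample = io.StringIO(
        "DOMAIN,http://c0059637.cdn1.cloudfiles.rackspacecloud.com/"
        "flowplayer-3.2.6.min.js,SCHEME_HTTP,"
        "c0059637.cdn1.cloudfiles.rackspacecloud.com,80,"
        "flowplayer-3.2.6.min.js,(null),(null),(null),"
        "http://redated/,SCHEME_HTTP,redacted,80,,(null),(null),(null)\n"
    )
    for host, n in sorted(rejected_hosts(sample).items()):
        print(host, n)
```

To run it against a real file, pass an open file object instead of the sample,
for example rejected_hosts(open("rejected.csv", newline="")). The counts also
show how often each host was rejected, which sort | uniq alone does not.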