Re: [Bug-wget] download that require login/password form submit


From: Fernando Cassia
Subject: Re: [Bug-wget] download that require login/password form submit
Date: Fri, 9 Apr 2010 14:24:37 -0300

On Fri, Apr 9, 2010 at 12:14 PM, Keisial <address@hidden> wrote:
> Voytek Eymont wrote:
>> Micah,
>>
>> thanks !!!!!!
>> I'm logging in OK.
>>
>> On the next step I run something like:
>>
>> wget --load-cookies=my-cookies.txt --save-cookies=my-cookies.txt
>> --keep-session-cookies
>> http://www.domain.tld/main.htm?_template=advanced&_module=active_list
>>
>> That fails until I put " " around the URL, like so:
>>
>> wget --load-cookies=my-cookies.txt --save-cookies=my-cookies.txt
>> --keep-session-cookies
>> "http://www.domain.tld/main.htm?_template=advanced&_module=active_list"
>>
>> Or should I use some '%' escape in place of the &? Or just put " " around
>> the URL?
>>
>
> Just surround it with double " " or single ' ' quotes.
> If the & is not quoted, your shell thinks you want to run wget in the
> background (with the URL cut off at the &) and then assign active_list
> to a shell variable called _module. If there weren't an =, it would try
> to run a program called _module, which would give you an error message
> you could notice.
>
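To make the quoting point concrete, here is a minimal sketch that never
touches the network. show_args is a hypothetical helper (not part of wget)
that prints each argument it receives in angle brackets, so you can see
what a command would actually be handed.

```shell
# Hypothetical helper: print every argument on its own line, in <...>.
show_args() {
    printf '<%s>\n' "$@"
}

url='http://www.domain.tld/main.htm?_template=advanced&_module=active_list'

# Quoted: the whole URL arrives as a single argument.
show_args "$url"

# Unquoted: run in a child sh so the stray & cannot affect this shell.
# The child starts printf in the background with the URL cut at the &,
# then treats _module=active_list as a plain variable assignment.
unquoted=$(sh -c 'printf "<%s>\n" http://www.domain.tld/main.htm?_template=advanced&_module=active_list; wait')
echo "$unquoted"
```

The same applies to wget itself: quote the URL and the & is passed through
untouched.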
>
>> Next question: the resulting file has lots and lots of bumpf like
>> space.gif galore, etc.
>>
>> How do I turn it into plain text as much as possible? Is there a wget
>> function for that, or something else?
>>
> Remove anything between < and >, then unescape the entities. That should
> give you quite clean text with a minimal effort.

Use grep and sed. Grep and sed are your friends.

http://www.tech-recipes.com/rx/330/remove-html-tags-from-a-file/
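As a rough sketch of the "strip the tags, then unescape the entities" idea
with sed and grep: html_to_text below is a hypothetical helper name, it
only decodes a handful of common entities, and it assumes each tag opens
and closes on the same line, so treat it as a starting point rather than a
real HTML parser.

```shell
# Hypothetical helper: read HTML on stdin, write rough plain text on stdout.
html_to_text() {
    # 1. Drop anything between < and > (single-line tags only).
    # 2. Decode a few common entities; &amp; goes last so that text such as
    #    "&amp;lt;" is not decoded twice.
    # 3. Drop lines left empty or blank after the tags are removed.
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g' |
    grep -v '^[[:space:]]*$'
}

# Example input; the <img> line becomes empty and is filtered out.
printf '%s\n' '<p>Active &amp; <b>pending</b> items</p>' '<img src="space.gif">' | html_to_text
```

On a saved page you would run it as: html_to_text < main.htm > main.txt
(the file names here are placeholders).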

FC



