Re: [Bug-wget] downloading links in a dynamic site


From: Vinh Nguyen
Subject: Re: [Bug-wget] downloading links in a dynamic site
Date: Mon, 26 Jul 2010 13:51:24 -0700

On Mon, Jul 26, 2010 at 11:18 AM, Keisial <address@hidden> wrote:
>  Vinh Nguyen wrote:
>> Dear list,
>>
>> My goal is to download some pdf files from a dynamic site (not sure on
>> the terminology).  For example, I would execute:
>>
>> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=0
>>
>> and would get my 10 pdf files.  On the page I can click a "Next" link
>> (to have more files), and I execute:
>>
>> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=10
>>
>> However, the downloaded files are identical to the previous.  I tried
>> the cookies setting and referer setting:
>>
>> wget -U firefox --cookies=on --keep-session-cookies
>> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=0
>> wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
>> --cookies=on --load-cookies=cookie.txt --keep-session-cookies
>> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=10
>>
>> but the results again are identical.  Any suggestions?
>>
>> Thanks.
>> Vinh
>
> Look at the page source how they are generating the urls.
> Maybe they are using some ugly javascript, although that discards
> the benefit of paging...


Thanks for your response, Keisial.  I looked at the source and, of
course, there is JavaScript.  However, I couldn't tie it to anything
that generates links.  These are the links I click on:

<td>32 Chapters</td><td width="100%" align="right"><span
class="paginationDisabled">First</span>&nbsp;|&nbsp;<b>1-10</b>&nbsp;|&nbsp;<a
href="/content/w2016h/?sortorder=asc&amp;p_o=10">11-20</a>&nbsp;|&nbsp;<a
href="/content/w2016h/?sortorder=asc&amp;p_o=20">21-30</a>&nbsp;|&nbsp;<a
href="/content/w2016h/?sortorder=asc&amp;p_o=30">31-32</a>&nbsp;|&nbsp;<a
href="/content/w2016h/?sortorder=asc&amp;p_o=10">Next</a></td>

That's what is displayed in the source.  Also, when I manually enter
the URL, changing p_o to 10, 20, or 30, I get the right page, so I
don't think it's a JavaScript issue.  What else could it be besides
the referer and cookies?

Vinh
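
One thing worth ruling out besides referer and cookies is shell quoting. The archive does not show whether the URL was quoted when the commands were run, so this is only an assumption. An unquoted `&` ends the command in a POSIX shell, which means wget would see the same URL (without `p_o`) on every run. The sketch below uses `printf` in place of wget to show what the program actually receives. `site.com` and the offsets 0, 10, 20, 30 are the placeholders from the thread.

```shell
#!/bin/sh
# Sketch only: site.com and the p_o offsets are the placeholders used in the thread.

# Unquoted URL: the shell splits the line at '&', runs the first part in the
# background, and treats "p_o=10" as a separate variable assignment.
unquoted=$(sh -c 'printf "%s\n" http://site.com/?sortorder=asc&p_o=10; wait')

# Quoted URL: the whole string, including "&p_o=10", reaches the program.
quoted=$(sh -c 'printf "%s\n" "http://site.com/?sortorder=asc&p_o=10"')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Build one URL per results page (offsets taken from the pagination links in
# the page source). Each could then be passed to wget inside double quotes, e.g.
#   wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*' "$url"
urls=""
for offset in 0 10 20 30; do
    url="http://site.com/?sortorder=asc&p_o=${offset}"
    urls="${urls}${url}
"
done
printf '%s' "$urls"
```

If the unquoted form was what ran, every invocation fetched the first page, which would match the identical downloads described above. If the URLs were already quoted, the cause lies elsewhere.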


