librefm-discuss

Re: [Librefm-discuss] Re: lastscrape.py


From: Gordon Haverland
Subject: Re: [Librefm-discuss] Re: lastscrape.py
Date: Tue, 2 Feb 2010 12:28:15 -0700
User-agent: KMail/1.12.4 (Linux/2.6.26; KDE/4.3.4; i686; ; )

On February 1, 2010, Matt Lee wrote:
> On 02/01/2010 11:29 PM, Seth Woodworth wrote:
> > I would suggest, when possible, using the Html5lib parser and
> > using the traverser from BeautifulSoup.  The author himself
> > suggests[1] this in any case of BS-3.1.0 or 3.0.8 behaving
> > poorly.
> >
> > I have been doing work with python, BeautifulSoup and
> > Html5Lib lately, and I've been collecting and slowly
> > improving python scripts (like this) to liberate data from
> > websites like Reddit or the Ubuntu forums. I would love to
> > get involved with the lastscrape.py script.
> 
> http://bugs.libre.fm/wiki/LastToLibre is the new way to do
>  this.
> 
> Last.fm has an API now, for people like us ;)

Well, it took about 6 hours to download, with probably half a dozen 
restarts needed.  I decided to overlap the pages (if it failed at 
page N, I restarted at page N-1).  During those 6 hours, the total 
number of pages went up by 1 (to 6905 pages).  So, I guess I am 
going to have to clean this up a little.  (Not today.)
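
For what it's worth, the overlap-and-clean-up idea could be sketched roughly like this. This is only an illustration, not what lastscrape.py actually does: the record layout (timestamp, artist, track) and the names restart_page and merge_runs are made up for the example. Since the page count grew during the download, entries may have shifted between pages, so the sketch drops duplicates by record content rather than by page number.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout (an assumption): (unix_timestamp, artist, track).
Scrobble = Tuple[int, str, str]


def restart_page(failed_page: int, overlap: int = 1) -> int:
    """Return the page to resume from after a failure at `failed_page`.

    Backs up by `overlap` pages, but never below page 1.
    """
    return max(1, failed_page - overlap)


def merge_runs(runs: Iterable[Iterable[Scrobble]]) -> List[Scrobble]:
    """Concatenate partial downloads, keeping the first copy of each record.

    Order of first appearance is preserved.
    """
    seen = set()
    merged = []
    for run in runs:
        for record in run:
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged


if __name__ == "__main__":
    first_run = [(3, "Artist A", "Track X"), (2, "Artist B", "Track Y")]
    # The restarted run overlaps the first one by one record.
    second_run = [(2, "Artist B", "Track Y"), (1, "Artist C", "Track Z")]
    print(restart_page(10))
    print(merge_runs([first_run, second_run]))
```

One caveat: two genuinely separate plays with identical timestamp, artist and track would be collapsed into one, so the key would need more fields if that matters.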

Do you require me to upload this to libre.fm in pieces, or can it 
be just one big file?

Gord



