Re: [Pan-devel] Re: Re: Want to fix memory consumption issues


From: Calin A. Culianu
Subject: Re: [Pan-devel] Re: Re: Want to fix memory consumption issues
Date: Fri, 4 Jun 2004 10:49:25 -0500 (EST)


On Fri, 4 Jun 2004, Duncan wrote:

> Calin A. Culianu posted
> <address@hidden>, excerpted
> below,  on Mon, 31 May 2004 16:53:54 -0500:
> 
> 
> > 
> > On Mon, 31 May 2004, Duncan wrote:
> > 
> >> I'm not a source groker myself, but the consensus has always been []
> > 
> > Well struct Article eats [] about 300-400 bytes of data per header.  1
> > million headers is already 400 megabytes.. so I would say struct Article
> > isn't helping matters any..
> > 
> > 
> >> The trouble as I understand it is that PAN uses them as they were never
> >> intended to be used, forcing them to scale to entry-counts they were
> >> never intended to handle.  The problem is exacerbated by PAN using what
> >> is primarily a GUI widget that happens to have minor data-widget
> >> capabilities, as a data-widget that happens to be a GUI-widget as well.
> > 
> > Hmm.. so all headers are loaded into the header pane widget, on TOP of
> > being simultaneously stored in a big linked-list of struct Article?!?!
> > Eeek!!
> 
> I DID say /as/ /I/ /understand/ /it/.  Maybe that's what was changed when
> PAN went from bogging down at ~ 200k overviews to ~ 1M overviews..

Cool. So if that is the case, it should be easier to slap a DB
backend in place, since the GUI already seems to read header
information into widgets piecemeal, which is what one wants.
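A minimal sketch of what reading headers "in a piecemeal fashion" from a DB backend could look like. It uses Python's built-in sqlite3 module purely for illustration. The `article` table, its columns, and the helper names are hypothetical and are not Pan's. The technique is keyset pagination: each page resumes after the last (posted, id) key already shown, so the GUI holds one page of rows at a time rather than the whole group.

```python
import sqlite3


def make_demo_db(n):
    """Build an in-memory demo table; this schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, subject TEXT, posted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO article (subject, posted) VALUES (?, ?)",
        ((f"subject {i}", 1000 + i) for i in range(n)),
    )
    # An index on the sort key can let SQLite return rows in order
    # without sorting the whole table for every page.
    conn.execute("CREATE INDEX article_posted_id ON article(posted, id)")
    return conn


def fetch_page(conn, after=None, page_size=100):
    """Return up to page_size rows in (posted, id) order, starting after the
    (posted, id) key `after`, or from the beginning if `after` is None."""
    if after is None:
        cur = conn.execute(
            "SELECT id, subject, posted FROM article ORDER BY posted, id LIMIT ?",
            (page_size,),
        )
    else:
        posted, art_id = after
        cur = conn.execute(
            "SELECT id, subject, posted FROM article "
            "WHERE posted > ? OR (posted = ? AND id > ?) "
            "ORDER BY posted, id LIMIT ?",
            (posted, posted, art_id, page_size),
        )
    return cur.fetchall()


def iter_pages(conn, page_size=100):
    """Yield successive pages until the table is exhausted."""
    after = None
    while True:
        page = fetch_page(conn, after, page_size)
        if not page:
            return
        yield page
        last_id, _, last_posted = page[-1]
        after = (last_posted, last_id)
```

For example, reading a 250-row table with `page_size=100` returns pages of 100, 100 and 50 rows. Keying on the last row seen, instead of using OFFSET, keeps later pages from stepping through all the earlier rows again.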


> 
> > I too have been playing lately with sqlite and trying to design some
> > reasonable tables to handle the needs of pan.  I have a good amount of
> > database programming experience too :).
> 
> Cool!  =:^)  Pan needs it.
> 
> > I have a lot of professional DB experience.  I am working out now how to
> > best design the schema, so that it is quick to extract information such
> > as parent/child relationships between articles for threading, and so
> > that sorting on any field is quick too.
> > 
> >> Look in the archives for the previous discussion, and go from there,
> >> would be my suggestion.
> >> 
> >> 
> > Okay, it would be worthwhile to see what other people's thoughts were
> > on the db design -- perhaps someone has already worked out a pretty good
> > way to organize the data, so I don't reinvent the wheel.
> > 
> > I will look into it some more and see what I come up with.
> 
> You may already be ahead of the work that had been done, but as I said,
> see the archives for a bit of discussion.  Charles had previously

I did, and the discussion is vague on details. :(

> mentioned that he had done a bit of preliminary work on it.  I expect it
> was just that, rather preliminary, and I'm guessing he didn't do any
> testing or anything as you are already doing.  However, if you could get

Yeah, I have gotten it to the point where, on my 1.5 GHz Athlon-XP running
Linux, loading 1 million headers into the db (as if one were downloading
headers) runs at roughly 1 megabyte per second. This is with almost
no tuning, just a basic schema that isn't even really well normalized.
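For comparison, here is a sketch of that kind of bulk load. It is again written in Python's sqlite3 for brevity and assumes a hypothetical flat `header` table. The technique it shows is batching. In autocommit mode SQLite commits, and syncs to disk, after every INSERT, so grouping many inserts into one transaction is usually the cheapest first speedup. The default `batch_size` of 10,000 is an arbitrary illustrative value, not a measured one.

```python
import sqlite3

# Hypothetical flat schema, in the spirit of "a basic schema which isn't
# really well normalized".
SCHEMA = """
CREATE TABLE IF NOT EXISTS header (
    message_id TEXT PRIMARY KEY,
    subject    TEXT NOT NULL,
    author     TEXT NOT NULL,
    posted     INTEGER NOT NULL,
    bytes      INTEGER NOT NULL
)
"""

INSERT_SQL = (
    "INSERT OR IGNORE INTO header (message_id, subject, author, posted, bytes) "
    "VALUES (?, ?, ?, ?, ?)"
)


def load_headers(conn, headers, batch_size=10_000):
    """Insert header tuples in batches, one transaction per batch.
    Returns the number of tuples submitted (duplicate message ids are ignored)."""
    conn.execute(SCHEMA)
    submitted = 0
    batch = []

    def flush():
        # `with conn:` commits on success and rolls back if an exception escapes.
        with conn:
            conn.executemany(INSERT_SQL, batch)

    for h in headers:
        batch.append(h)
        if len(batch) >= batch_size:
            flush()
            submitted += len(batch)
            batch.clear()
    if batch:
        flush()
        submitted += len(batch)
    return submitted
```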
  
For most users this would be fine (it already exceeds most people's
bandwidth), but we can do better. I am going to normalize the schema
further to reduce disk access times, which may also increase
throughput.
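One plausible first normalization step, sketched under assumptions: the table and column names are hypothetical, and Python's sqlite3 is used only for illustration. Author strings, which repeat heavily across a group, move into their own table, and each article row stores a small integer instead. Subjects could be handled the same way. Smaller rows mean more rows fit on each database page, which is one way normalization can cut disk reads.

```python
import sqlite3

SCHEMA = """
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE article (
    id         INTEGER PRIMARY KEY,
    message_id TEXT NOT NULL UNIQUE,
    subject    TEXT NOT NULL,
    author_id  INTEGER NOT NULL REFERENCES author(id),
    posted     INTEGER NOT NULL
);
"""


def get_author_id(conn, name, cache):
    """Return the id for `name`, inserting it the first time it is seen.
    `cache` is a plain dict that saves a SELECT for authors already looked up."""
    if name not in cache:
        conn.execute("INSERT OR IGNORE INTO author (name) VALUES (?)", (name,))
        (cache[name],) = conn.execute(
            "SELECT id FROM author WHERE name = ?", (name,)
        ).fetchone()
    return cache[name]


def add_article(conn, message_id, subject, author, posted, cache):
    # Store the author's integer id rather than repeating the author string.
    conn.execute(
        "INSERT INTO article (message_id, subject, author_id, posted) "
        "VALUES (?, ?, ?, ?)",
        (message_id, subject, get_author_id(conn, author, cache), posted),
    )


def articles_with_authors(conn):
    """Reassemble the flat view with a join, ordered by posting time."""
    return conn.execute(
        "SELECT a.message_id, a.subject, au.name, a.posted "
        "FROM article a JOIN author au ON au.id = a.author_id "
        "ORDER BY a.posted"
    ).fetchall()
```

Reading the flat view back now needs a join, which is the usual trade-off: smaller rows in exchange for slightly more work per query.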

After that, the next step would be figuring out what kind of data pan
needs. I already have some idea: it basically needs sorted lists of
articles (sortable on a variety of fields) that may or may not be
threaded, where threading either breaks a thread when the subject
changes or keeps the thread together regardless of subject. These data
needs really do define how the schema should look, since retrieving
this information from the db should be as fast as possible.
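A sketch of the two threading modes just described. It assumes a hypothetical `article` table in which each reply stores its parent's id (in practice derived from the References header). `thread_roots` maps every article to the root of its thread. With `break_on_subject_change=True`, a reply whose subject, ignoring leading "Re:" prefixes and case, differs from its parent's starts a new thread. `thread_members` shows the alternative of letting SQLite walk a thread with a recursive query (`WITH RECURSIVE` requires SQLite 3.8.3 or later).

```python
import re
import sqlite3

SCHEMA = """
CREATE TABLE article (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER,              -- NULL for thread starters
    subject   TEXT NOT NULL,
    posted    INTEGER NOT NULL
);
CREATE INDEX article_parent ON article(parent_id);
"""

_RE_PREFIX = re.compile(r"^(\s*re\s*:\s*)+", re.IGNORECASE)


def base_subject(subject):
    """Drop leading 'Re:' prefixes and case so a reply matches its original."""
    return _RE_PREFIX.sub("", subject).strip().lower()


def thread_roots(conn, break_on_subject_change=False):
    """Map each article id to the id of the root of its thread."""
    info = {
        aid: (parent, subj)
        for aid, parent, subj in conn.execute(
            "SELECT id, parent_id, subject FROM article"
        )
    }
    roots = {}

    def root_of(aid):
        path, seen, cur = [], set(), aid
        while cur not in roots:
            parent, subj = info[cur]
            starts_thread = (
                parent is None
                or parent not in info  # parent expired or never fetched
                or cur in seen  # defensive: malformed data with a cycle
                or (
                    break_on_subject_change
                    and base_subject(subj) != base_subject(info[parent][1])
                )
            )
            if starts_thread:
                roots[cur] = cur
                break
            seen.add(cur)
            path.append(cur)
            cur = parent
        # Memoize the root for every article walked on the way up.
        for node in path:
            roots[node] = roots[cur]
        return roots[cur]

    return {aid: root_of(aid) for aid in info}


def thread_members(conn, root_id):
    """Ids of root_id and all its descendants via parent links (keep-together mode).
    UNION (not UNION ALL) discards repeats, so bad cyclic data cannot loop forever."""
    rows = conn.execute(
        "WITH RECURSIVE thread(id) AS ("
        "  SELECT ?"
        "  UNION"
        "  SELECT a.id FROM article a JOIN thread t ON a.parent_id = t.id"
        ") SELECT id FROM thread ORDER BY id",
        (root_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

For replies 1 → 2 → 3 → 4 where article 3 changes the subject, keep-together mode puts all four under root 1. Break mode gives 3 and 4 their own thread rooted at 3.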

Since I am impatient, I see it all as an iterative process: first I
design a schema and try to work it into pan; if it doesn't quite fit
pan's needs, I go back and tweak things. :)

SQLite is really fast, by the way. With almost no tuning it was
performing better than I would have imagined! I guess it is pretty
lightweight and agile since it doesn't have the big
scalability/concurrency needs of things like Oracle.


> what he's already done, it's possible you can figure out better where he
> was going with it, so "y'all get pulling in the same direction from the
> get-go," so to speak.  =:^)

Yeah, I think that it would be helpful to do that. I will see about
getting a hold of him/her..

> 
> (I'm taking a bit longer to reply now than I might normally, due to
> working on Gentoo as a dual boot, taking me away from my already
> functioning Mandrake, so..)

It's okay. I myself am busy with my real job, so I can't really work
on this until the weekend anyway. :)

-Calin




