
Re: [Pan-devel] Re: Ya know Pan2 is really single threaded, right?


From: Calin A. Culianu
Subject: Re: [Pan-devel] Re: Ya know Pan2 is really single threaded, right?
Date: Sun, 18 Mar 2007 15:43:01 -0400 (EDT)



On Sun, 18 Mar 2007, Duncan wrote:

"Calin A. Culianu" <address@hidden> posted
address@hidden, excerpted below, on
Sat, 17 Mar 2007 20:44:27 -0400:

I have been reading the code and trying to add a worker thread pool
model to the code.

I can see now that it introduces a lot of complexity and potential bugs
to the code -- already the code assumes that everything runs in 1
thread.

But I will do my best to avoid race conditions by locking shared data as
best I can, and to only introduce threading where it will actually
benefit the user (better network utilization, not stalling on uudecode,
etc).

FWIW, pan has used at least three threading models since I discovered it
back with 0.11.x (the old GNOME 1.x code).  I'm not sure when it went
multi-threaded in the first place, but at one point anyway, pan was I
think trying to handle all the downloading in multiple threads on its
own.  Then Charles switched to using gnet for the network handling, and
it's apparently reentrant on its own (the non-coder is showing here, I'm
vague on the details), thus sparing him worrying so much about the
threading details on the networking side -- he only had to worry about
the local stuff.  That worked quite well (in terms of keeping up the
network utilization).

I'm not sure what led him to dump gnet in the rewrite, unless it was the
C to C++ jump, or simply trying to keep dependencies down, but from my
read between the lines, he judged the maintenance costs of trying to keep
threaded code bug-free without the library more hassle than it was worth.

Actually, as I read more of the code I realize it's true -- right now the pan code is *very* readable and maintainable (well, at least compared to the last time I looked at its code, which was in 2003, and back then it was all C and not as pretty due to its complexity).

I think this is a very strong point -- multithreading brings the risk of reduced stability and increased complexity for the developer.


From what I've observed and from comments, yours and others', it's the
decoding and saving that's the problem at this point, not so much the
network stuff, except due to the decoding and saving bottleneck stalling
things long enough to fill the buffer and therefore stall the download as
well.  Therefore, if it were me, I'd try multi-threading that only,
first, creating a pool of decode-and-save threads and having the
otherwise single-threaded pan hand off to them.  Then possibly put the UI
in a (single) separate thread to keep it more responsive.  If I'm reading
Charles correctly, I think his thought is that the networking isn't the
issue, and as such he doesn't need to multithread it, and can therefore
avoid gnet, his previous partial solution, as a dependency.
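
A minimal sketch of the hand-off described above, in Python rather than Pan's own C++: the "network" step stays on the main thread and each finished article is submitted to a small pool of decode-and-save workers. The names `fake_download`, `decode_and_save`, and `run` are hypothetical and are not taken from the Pan source; the uuencoding is done with the standard `binascii` module purely for illustration.

```python
import binascii
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fake_download(payload):
    """Hypothetical stand-in for the network fetch.

    Returns the payload as uuencoded lines (at most 45 raw bytes per line).
    """
    return [binascii.b2a_uu(payload[i:i + 45]) for i in range(0, len(payload), 45)]


def decode_and_save(name, encoded_lines, out_dir):
    """Decode uuencoded lines and write the result to disk.

    Runs on a worker thread, so the caller is not blocked by the disk write.
    """
    data = b"".join(binascii.a2b_uu(line) for line in encoded_lines)
    path = os.path.join(out_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


def run(articles, workers=2):
    """Download on the calling thread, decode and save on pool threads."""
    out_dir = tempfile.mkdtemp()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for name, payload in articles:
            lines = fake_download(payload)  # "network" step, single-threaded
            futures.append(pool.submit(decode_and_save, name, lines, out_dir))
        # Collect results; any exception raised in a worker surfaces here.
        return [f.result() for f in futures]
```

The only data crossing the thread boundary is the list of encoded lines for one article, which the main thread does not touch again after submitting it, so this sketch needs no explicit locking. A real client would also have to report progress and errors back to the UI, which is left out here.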

I think you are right. It would be a lot of work to multi-thread all of pan's network operations -- and I am not sure, like you say, that it would really buy pan2 that much in terms of performance. You are right, already the performance is pretty decent on the download (just not on the decode!).

I was reading the code and yes, I think just handing off the decode to another thread is feasible. I think I know how to do this elegantly. There are a few hitches, but I have ideas about how to solve them without making the code too unreadable.

Wish me luck! I am going to implement something today hopefully -- I just hope I am not missing something and that my idea actually works.


I'm running pan on a dual Opteron here, 8 gigs of memory, and my news
volumes on 4-way SATA RAID-6 (so basically two-way striped, the bus being
sufficient, the disk I/O shouldn't be a bottleneck), and more and more
folks are getting dual-core and will soon be going quad core.  What's
frustrating here is seeing the network stalling due to either CPU or I/O
wait times, on a single thread, when I've a second CPU idling and all
those local memory and RAID resources.  At significantly less than 10
Mbps, the network should be the bottleneck on a setup like that.  I
assume you are seeing some of the same issues, but being a coder with
multithreading experience, are even more frustrated than I am with it.

Yes, it is frustrating, isn't it? On your setup the stalls are probably tinier -- I would imagine that with 4-way SATA RAID-6, decoding a 50MB attachment takes 1-2 seconds. On my system it takes 20 seconds, IIRC, because of the slow hard drive I am using (an external drive on USB 2.0, and for some reason my USB enclosure makes it slower). That is significant considering I get 10 megabit from my ISP -- 20 seconds ends up being a significant portion of the download time for that 50MB, which effectively reduces my net download rate to well below 10 megabit.
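
The arithmetic behind that estimate can be worked through as a rough sketch. It assumes 1 MB = 8 megabits, a link that is fully used while downloading, and no protocol or encoding overhead; the function name `effective_rate_mbit` is made up for this example.

```python
def effective_rate_mbit(size_mb, link_mbit, stall_s):
    """Net rate in megabit/s when each download is followed by a blocking stall.

    Assumes 1 MB = 8 megabits and ignores protocol and encoding overhead.
    """
    size_mbit = size_mb * 8
    download_s = size_mbit / link_mbit  # time spent actually transferring
    return size_mbit / (download_s + stall_s)


# 50 MB on a 10 megabit link: 40 s of transfer plus a 20 s decode stall.
slow_disk = effective_rate_mbit(50, 10, 20)    # 400 / 60, about 6.7 megabit/s
fast_disk = effective_rate_mbit(50, 10, 1.5)   # 400 / 41.5, about 9.6 megabit/s
```

Under these assumptions the 20-second stall costs roughly a third of the link's capacity, while a 1-2 second stall costs only a few percent.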

If the decode/save were to happen in another thread, this stall would disappear, as the I/O-bound decode could safely happen in the background and would hopefully not contend for resources with the download thread (since presumably they rely on mainly non-overlapping hardware resources -- one is heavily network-bound and the other is heavily disk-bound).
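
As a rough timing model of that overlap (not a measurement): assume every article takes the same time to download and to decode, a single background decode thread, and an unbounded queue between the two so the download never waits. The function names are hypothetical.

```python
def serial_total(n, download_s, decode_s):
    """Total time when each decode blocks the next download."""
    return n * (download_s + decode_s)


def overlapped_total(n, download_s, decode_s):
    """Total time when one background thread decodes while downloads continue.

    Assumes an unbounded hand-off queue, so downloading never stalls.
    """
    download_done = 0.0
    decode_done = 0.0
    for _ in range(n):
        download_done += download_s
        # A decode starts once its article has arrived and the thread is free.
        decode_done = max(download_done, decode_done) + decode_s
    return decode_done


# Ten 50 MB articles at 40 s download and 20 s decode each:
# serial_total(10, 40, 20) is 600 s, overlapped_total(10, 40, 20) is 420 s.
```

In this model only the final decode is left exposed when decoding is faster than downloading. If decoding is the slower stage, the total is bounded by the disk instead and the queue grows, so a real implementation would need some limit on pending work.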


I have experience with multithreaded code so hopefully I can do this
somewhat elegantly and in a bug-free manner.

Likely he didn't make the code multithreaded because his initial pass
was to just get Pan2 working, and perhaps he intended to go in later and
add multithreading?

I don't think that was the issue, as this was a rewrite in part to get
things right from the beginning.  Thus, if he thought multithreading was
necessary to get it right, I believe he'd have written it with that in
mind, but you say the assumptions are single thread, so it's difficult to
believe he found multithreading a necessary or realistic likelihood, or
he would have made that sort of assumption from the beginning, even if
the initial implementation strictly serialized it.  To do otherwise would
have nullified one of the big reasons for the rewrite, to get it right
from the beginning, this time, knowing what he knows now about what works
and what doesn't.

Fair enough. I can see now that compared to previous iterations of pan's code that I briefly skimmed, pan is much more elegant and maintainable to my eyes, which is usually the hallmark of having gone through several iterations.

Anyway, Duncan, thanks very much for your thoughtful comments. I think your suggestion to just make the UUDecode phase happen in another thread, so that pan2 can resume its downloading, is a pretty good one, and it is the approach I think I will take.

-Calin




