
Re: [Pan-users] .nzb download seems to be going to /dev/null


From: Duncan
Subject: Re: [Pan-users] .nzb download seems to be going to /dev/null
Date: Mon, 26 Sep 2011 01:53:59 +0000 (UTC)
User-agent: Pan/0.135 (Tomorrow I'll Wake Up and Scald Myself with Tea; GIT db9cd97 branch-master)

Lacrocivious Acrophosist posted on Sun, 25 Sep 2011 19:50:33 +0000 as
excerpted:

> Graham Lawrence <address@hidden> writes:
> 
> 
>> Duncan, I'm using Pan 0.133, and thank you for the very detailed
>> response.  I hope you just pasted most of it and didn't have to type it
>> all, because since I ran
>> 
>> strace -feopen pan 2>&1 | grep -v 'icons\|cursors' | grep /home/g >
>> pan.debug
>> 
>> I think I know what the problem is, and it has nothing to do with Pan.
>> strace was very consistent in its output. After the initial preamble
>> for each task, it generated nothing but this pattern
>> 
>> [pid  5641] open("/home/g/.pan2/article-cache/part20of78.2Wn&address@hidden", O_RDONLY) = -1 ENOENT (No such file or directory)
>> 
>> and it does this for every part in every task.  There is nothing in
>> /home/g/.pan2/article-cache/ whose name even begins with p.
>> 
>> My neighbor recommended newsgroups to me and offered to share his
>> account with me if I would split the cost with him.  The first time I
>> used it all was well, except it posted a warning to the effect that
>> such downloads could only be made to a single computer, which I
>> dismissed at the time as an aberration, I was only downloading to one
>> computer.  This time around is my second use, and now the penny has
>> dropped.  My neighbor's computer is the single computer referred to,
>> and it has blocked downloading to mine.
>> 
>> I am very sorry to have wasted your time on this.

FWIW, that was NOT a waste of time! =:^)  If you note, I didn't really 
post any solutions, because I didn't know what the problem was.  The 
entire goal was diagnostics, and it seems we've diagnosed the problem 
(tho see below), so regardless where it ended up being, the post was 
anything BUT a waste of time. =:^)

Meanwhile, however, that trace confirmed a suspicion of mine that it was 
related to the cache.  You may be right about the root of the issue 
being server access restrictions, or not.  I've seen very similar issues 
when pan had permissions issues, when the cache involved a bad symlink to 
a directory in an unmounted filesystem[1], etc, which is why I 
immediately suspected a caching issue of some sort.  A caching issue of 
some sort was confirmed for sure, but we do NOT yet know for sure what's 
causing it.

The most recent such situation here was when I tried out the new binary 
posting code in HMueller's experimental git tree.  Here's how it happened.

Some time ago, the pan of the day was noted to have what was arguably a 
security issue: attachments that were posted as executable, pan was 
saving as executable as well.  If someone clicked them and they WERE a 
virus or the like, they'd try to run.  The list discussion decided there 
wasn't a good reason for that, and pan has for some time now removed the 
executable bit, if set, when saving a file (tho there was a regression at 
some point, for a version or two).

IIRC it was me that pointed out that pan does follow the umask it 
inherits from its environment, and as such, before the bug was fixed, a 
user could set the umask in pan's environment to something like 0137, and 
pan would behave accordingly, stripping executable bits entirely (plus 
stripping the writable bit for group and not allowing access at all for 
other).

The problem with that, as you will likely have guessed if you understand 
UNIX file permissions, was with directories.  As long as all the 
directories pan needed were pre-created and permissions set 
appropriately, allowing directory entry (the same bit that's the 
executable bit on files), all was fine, since it didn't need to actually 
create the dir.

But if one of pan's dirs didn't exist, with the 0137 umask I had set, it 
would create the dir fine, but couldn't actually enter it to work with 
the files it wanted to put there!
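A minimal sketch of that failure mode, for anyone who wants to see it for 
themselves (throwaway mktemp path; the 0137 umask is the one from the 
story above):

```shell
# Reproduce the umask 0137 directory trap in a scratch location.
umask 0137
dir=$(mktemp -du)        # pick an unused scratch name, don't create it yet
mkdir "$dir"             # created mode 0777 & ~0137 = 0640: no search bit
stat -c '%a' "$dir"      # prints 640
( cd "$dir" )            # fails for a regular user: Permission denied
chmod u+x "$dir"         # the fix: restore the owner's search/execute bit
rmdir "$dir"
```

Once the search bit is restored, the directory behaves normally again.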

Well, I've been using pan for years, and this didn't bother me as long as 
pan kept the same directory structure, since it was already created.  
But, the new binary posting code used a new posting-cache directory as 
scratch-space for encoding, etc, so guess what problem I ran into as soon 
as I tried actually using the new code?  Right, it created the dir, but 
couldn't reach into it, and I got *VERY* puzzling errors... until I asked 
on-list about it, and someone's answer wasn't right on, but was 
sufficiently close to jog my memory of setting the umask.  Upon 
investigation, sure enough, that was my problem!

That happened only probably a couple months ago (I could check the 
archive to see when I posted the thread asking about it, if I /really/ 
needed the date, but it's not that important), so it's relatively fresh 
in my memory.

So you see, I've had a bit of experience with strace -feopen pan myself, 
and I sort of recognized the symptoms of cache error, but most folks 
won't run into that sort of issue as they won't have anything like as 
complex a pan setup as I do, so I was somewhat doubtful of my instincts.  
But sure enough, straced ENOENT errors on what should be cached files 
confirms it.  Now we just need to figure out for sure what the problem is.

So, before you go labeling it a server restriction and giving up, do 
double-check your cache dir permissions.  If for whatever reason the 
executable/dir-entry bit is turned off for the permission level pan runs 
at on your system (most likely, it runs as your normal user), that would 
do it.  Turn it back on for all of pan's directories.
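As a hedged sketch (assuming the default ~/.pan2 location; adjust if you 
point PAN_HOME elsewhere), something like this would list any of pan's 
directories missing the owner's search bit, then restore it:

```shell
# List pan directories the owner can't enter (no u+x bit), then fix them.
# ~/.pan2 is pan's default home; adjust for a custom PAN_HOME.
find ~/.pan2 -type d ! -perm -u+x -print
find ~/.pan2 -type d -exec chmod u+x {} +
```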

Similarly, check SELinux permissions and the like if you run that, and 
user quotas for that partition, if you have them.  Also ensure that your 
cache isn't a dead symlink if you're using a symlink in that path, that 
the appropriate partitions are mounted, and that they're NOT mounted 
read-only or some such.
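Checking for the dead-symlink case is a one-liner; a sketch, again 
assuming the default cache location:

```shell
# A symlink that exists but whose target doesn't is "dead"/dangling:
# -L tests for a symlink, -e (which follows it) tests the target.
cache=~/.pan2/article-cache
if [ -L "$cache" ] && [ ! -e "$cache" ]; then
    echo "dead symlink: $cache -> $(readlink "$cache")"
fi
```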

Meanwhile, there's another bit of info that may be helpful.  It's not yet 
in the latest release (0.135), let alone 0.133, but someone just a week 
or so ago requested a -v (verbose) switch for pan, such that when it's 
downloading from nzbs, it prints to STDOUT the actual files it's 
downloading.  HMueller has been on the ball and AFAIK already implemented 
the request, but it's only in his git repo, ATM.  Having that output 
would be very useful indeed in this sort of case, and may well have 
eliminated the need for the whole thread.  But, whether you want to 
hassle the whole live git repo compile-from-source thing is another 
question entirely.  Presumably not, if you're still on 0.133, but the 
feature is actually available now, along with binary posting, auto-
actions based on score (optionally automatically mark-read or delete low 
scored posts, cache or download-and-save high-scored posts), if you're 
willing to jump thru the necessary hoops.

Finally, I've no idea what sort of news account you and your neighbor 
split, but it sounds like it could be a monthly-pay, unlimited-per-month 
deal, which is why they are so particular about multiple access.

FWIW, unless one or both of you download *HUGE* amounts, it may be 
worthwhile at least considering block accounts.  With these you purchase 
X gigs of data for Y money (dollars/euro/yen/whatever), and can use it 
until it runs out.  No monthly charges.  No expiration (unless of course 
you lose track of the login info or the news provider goes belly up).

One of the interesting things about these sorts of accounts, besides not 
having to hassle the monthly payments if you're not using that much, is 
that it's actually in the provider's interest to make it easy for you to 
use it up, so you have to purchase more.  As such, they don't tend to 
have NEARLY the restrictions that some of the others do, and you'd very 
likely be able to login from separate IPs at the same time, as long as 
both had valid login info, of course, because all they care about is the 
bandwidth you use, and the sooner you use it up, the sooner you have to 
buy more.

There are two providers I know of that offer this.  Astraweb.com is one.  
Blocknews.net is the other.

Astraweb has 25 GB for (US) $10, or 180 GB for $25.  Header downloads, 
etc, are NOT counted toward the block.

Blocknews has blocks ranging from 5 GB for $2.75 to the Astraweb-
comparable 25 GB for $8.50 and 200 GB for $21.59, to 500 GB for $51.49, 
1024 GB for $91.39, and a massive 3072 GB (3 TiB) for $239.99!  Headers 
*ARE* counted, but traffic is discounted 10% to allow for them.

Thus, if you tend to grab headers for use with other providers or do a 
LOT of header downloading compared to bodies, Astraweb would be better, 
but if you minimize your header downloads, between that, the 10% traffic 
discount, and blocknews' lower per-gig at the higher end (<7.82 cents/gig 
for the 3 TiB pkg, 10-11 cents a gig for the 200 and 500 GB pkgs, just 
under 9 cents a gig for the 1 TiB), you would be better off there.
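Those per-gig figures are easy to re-derive; a quick loop over the quoted 
Blocknews prices (the 2011 prices from above, surely stale by now):

```shell
# cents per GB = price * 100 / gigabytes, for each quoted block.
# e.g. the 3072 GB block comes out just under 7.82 cents/GB.
for pkg in 25:8.50 200:21.59 500:51.49 1024:91.39 3072:239.99; do
    gb=${pkg%%:*}; usd=${pkg#*:}
    awk -v g="$gb" -v p="$usd" \
        'BEGIN { printf "%5d GB at $%s: %.2f cents/GB\n", g, p, p * 100 / g }'
done
```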

Astraweb doesn't make their server list public, but Blocknews has two 
server farms: IAD (Washington DC area), US, and the Amsterdam area, 
Netherlands.  FWIW, the Amsterdam area is home to MANY European news 
providers, apparently due to a rather friendlier legal situation for news 
there than most other locations, Europe or North America.

Consider that those prices are unexpiring blocks, and talk to your friend 
about how much both of you download.  Given the prices, unless you're 
downloading > 25 gigs/mo, it's very likely to be cheaper getting the 
blocks, and you'll probably more or less break-even thru a hundred gigs 
or so.  If you're paying for giganews now, you'd probably be saving even 
more with the block accounts, unless their prices have come down 
substantially, recently, but giganews /is/ widely acknowledged as the 
gold standard of news providers and thus gets away with charging more for 
it.  Whether it's worth it could be argued either way, but some people 
definitely consider it so.

Of course, if you're downloading half a TiB or more a month, the 
unlimited monthly accounts are likely well worth it.  But, that's a *LOT* 
of traffic for an individual, and if you're doing that, the account has 
likely already been flagged for TOS-abuse-watch.

As they say, YMMV, but if I can save you a bit and decrease your chances 
of being TOSed at the same time...


> Speaking only for myself, I can say that Duncan definitely did not waste
> his time. That strace tutorial is going in my 'how to do stuff I could
> never remember without a cheat-sheet' file ;-)
> 
> Once again, Duncan has taught me to do things for which I previously had
> only 'nodding knowledge'. Thanks Duncan.

If I'd known you were going to do that, I might have thrown in a 
paragraph or two dealing with the other -eXXXX options.  FWIW, -efile can 
be useful, giving all file actions not just file-opens, but that gets a 
bit more difficult to read as well, since the opens list the file names 
but the other file actions generally don't; they use the file descriptor 
numbers (the result of the open, = 6 in my example) instead.  But that 
lets you 
see how long the file is open and what other files are opened before it's 
closed (tho if the file number isn't increasing, that means the same 
number is being reused repeatedly to open and close many files in 
sequence, and that's visible from the opens only), what the app actually 
reads/writes to the file and seek behavior within the file, etc.
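A hedged illustration on a harmless command (modern strace spells the 
filter -e trace=file; the older -efile shorthand is the same thing), with 
the trace sent to a file so it doesn't tangle with the command's own 
output:

```shell
# Capture only file-related syscalls of `ls`; -o keeps the trace
# separate from ls's own stdout.
strace -e trace=file -o /tmp/ls.trace ls /etc >/dev/null
head -n 3 /tmp/ls.trace    # execve/openat lines showing full paths
```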

The other system call classes are memory management and the like; not 
really as useful to non-programmers as the file actions tend to be.  
Well, the network class of calls can be interesting, but by far the most 
often used here is -efile, 
and within it, -eopen suffices MOST of the time, since generally what I'm 
after is something to do with files, since they're exposed enough for the 
information to be useful to me as a user and admin, even if I'm not a 
programmer (unless you include shell scripting in "programmer"; to me, 
and I believe to most, it's more of a sysadmin-type skill, tho it can 
often sort of do the job of a program, for one skilled enough at it).

Meanwhile, I too generally have to work out grep's OR behavior, each 
time. (I know to enclose it in '' to keep the shell from interfering, but 
I always seem to forget whether I still have to back-slash-escape the | 
ors as \|, or not.  But at least I know enough to try it one way and, if 
it doesn't work, try it the other, without having to go to the manual 
each time.)  Otherwise, I'll often simply pipe 
a bunch of grep -v single-term-commands together using shell pipes, since 
I know how they work.  And it took me awhile to remember the 2>&1 bit as 
well, but I've apparently done it enough now that it's beginning to sink 
in. =:^)
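For the cheat-sheet, both spellings side by side (GNU grep; plain grep 
uses basic regexps where the | must be escaped, grep -E uses extended 
ones where it must not):

```shell
# Filter out lines matching either pattern, two equivalent ways.
printf 'icons\ncursors\nkeep\n' | grep -v 'icons\|cursors'    # BRE: \|
printf 'icons\ncursors\nkeep\n' | grep -vE 'icons|cursors'    # ERE: |
# both print only: keep
```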

But if it makes a useful cheat-sheet that's far simpler than the manpages 
while hitting all of the most-used bits, thus well demonstrating the 
80/20 rule[2], have at it! =:^)

---
[1]  I have a dedicated cache partition for my binary pan instance (I run 
multiple pan instances, each with its own config, using the PAN_HOME var 
to point each at its config with a wrapper script).  The rather long-
winded explanation can be found in the list archives, probably several 
times as I believe I've posted it more than once over the years.

[2]  http://en.wikipedia.org/wiki/80/20_rule

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



