Re: [Pan-users] .nzb download seems to be going to /dev/null
From: Duncan
Subject: Re: [Pan-users] .nzb download seems to be going to /dev/null
Date: Mon, 26 Sep 2011 01:53:59 +0000 (UTC)
User-agent: Pan/0.135 (Tomorrow I'll Wake Up and Scald Myself with Tea; GIT db9cd97 branch-master)
Lacrocivious Acrophosist posted on Sun, 25 Sep 2011 19:50:33 +0000 as
excerpted:
> Graham Lawrence <address@hidden> writes:
>
>
>> Duncan, I'm using Pan 0.133, and thank you for the very detailed
>> response. I hope you just pasted most of it and didn't have to type it
>> all, because since I ran
>>
>> strace -feopen pan 2>&1 | grep -v 'icons\|cursors' | grep /home/g >
>> pan.debug
>>
>> I think I know what the problem is, and it has nothing to do with Pan.
>> strace was very consistent in its output. After the initial preamble
>> for each task, it generated nothing but this pattern
>>
>> [pid 5641] open("/home/g/.pan2/article-cache/part20of78.2Wn&address@hidden",
>> O_RDONLY) = -1 ENOENT (No such file or directory)
>>
>> and it does this for every part in every task. There is nothing in
>> /home/g/.pan2/article-cache/ whose name even begins with p.
>>
>> My neighbor recommended newsgroups to me and offered to share his
>> account with me if I would split the cost with him. The first time I
>> used it all was well, except it posted a warning to the effect that
>> such downloads could only be made to a single computer, which I
>> dismissed at the time as an aberration, I was only downloading to one
>> computer. This time around is my second use, and now the penny has
>> dropped. My neighbor's computer is the single computer referred to,
>> and it has blocked downloading to mine.
>>
>> I am very sorry to have wasted your time on this.
FWIW, that was NOT a waste of time! =:^) If you note, I didn't really
post any solutions, because I didn't know what the problem was. The
entire goal was diagnostics, and it seems we've diagnosed the problem
(tho see below), so regardless of where it ended up being, the post was
anything BUT a waste of time. =:^)
Meanwhile, however, that trace confirmed a suspicion of mine that it was
related to the cache. You may be right about the root of the issue being
server access restrictions, or not. I've seen very similar issues when pan
had permissions issues, when the cache involved a bad symlink to a directory
in an unmounted filesystem[1], etc, which is why I immediately suspected a
caching issue of some sort. That some sort of caching issue is involved has
been confirmed, but we do NOT yet know for sure what's causing it.
The most recent such situation here was when I tried out the new binary
posting code in HMueller's experimental git tree. Here's how it happened.
Some time ago, the pan of the time was noted to have what was arguably a
security issue: attachments that had been posted as executable, pan was
saving as executable as well. If someone clicked them and they WERE a
virus or the like, they'd try to run. The list discussion decided there
wasn't a good reason for that, and pan has for some time now removed the
executable bit, if set, when saving a file (tho there was a regression at
some point, for a version or two).
IIRC it was me that pointed out that pan does follow the umask it
inherits from its environment, and as such, before the bug was fixed, a
user could set the umask in pan's environment to something like 0137, and
pan would behave accordingly, stripping executable bits entirely (plus
stripping the writable bit for group and not allowing access at all for
other).
The problem with that, as you will likely have guessed if you understand
UNIX file permissions, was with directories. As long as all the
directories pan needed were pre-created and permissions set
appropriately, allowing directory entry (the same bit that's the
executable bit on files), all was fine, since it didn't need to actually
create the dir.
But if one of pan's dirs didn't exist, with the 0137 umask I had set, it
would create the dir fine, but couldn't actually enter it to work with
the files it wanted to put there!
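To illustrate (a minimal sketch; the names are just examples), with a 0137
umask a newly created directory comes out mode 0640, readable but not
enterable:

  umask 0137
  touch somefile    # 0666 & ~0137 = 0640, i.e. -rw-r-----
  mkdir somedir     # 0777 & ~0137 = 0640, i.e. drw-r-----: no dir-entry (x) bit
  cd somedir        # fails with permission denied, since the x bit is off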
Well, I've been using pan for years, and this didn't bother me as long as
pan kept the same directory structure, since it was already created.
But, the new binary posting code used a new posting-cache directory as
scratch-space for encoding, etc, so guess what problem I ran into as soon
as I tried actually using the new code? Right, it created the dir, but
couldn't reach into it, and I got *VERY* puzzling errors... until I asked
on-list about it. Someone's answer wasn't spot-on, but it was sufficiently
close to jog my memory of having set the umask. Upon investigation, sure
enough, that was my problem!
That happened only probably a couple months ago (I could check the
archive to see when I posted the thread asking about it, if I /really/
needed the date, but it's not that important), so it's relatively fresh
in my memory.
So you see, I've had a bit of experience with strace -feopen pan myself,
and I sort of recognized the symptoms of a cache error, but most folks
won't run into that sort of issue, as they won't have anything like as
complex a pan setup as I do, so I was somewhat doubtful of my instincts.
But sure enough, straced ENOENT errors on what should be cached files
confirm it. Now we just need to figure out for sure what the problem is.
So, before you label it a server restriction and give up, do double-check
your cache dir permissions. If for whatever reason the executable/dir-entry
bit is turned off at the permission level pan runs at on your system
(likely, it runs as your normal user), that would do it. Turn it back on
for all of pan's directories.
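Something like this (assuming the /home/g/.pan2 location from your trace)
will show, and if needed restore, the user-access bits:

  ls -ld ~/.pan2 ~/.pan2/article-cache
  # if the owner's x (dir-entry) bit is missing, put it back:
  chmod u+rwx ~/.pan2 ~/.pan2/article-cache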
Similarly, check SELinux permissions and the like if you run that, and
user quotas for that partition, if you have them. Also ensure that your
cache isn't a dead symlink (if you're using a symlink in that path), that
the appropriate partitions are mounted, and that they're NOT mounted
read-only or some such.
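A few quick checks along those lines (hedged a bit, since the exact tools
depend on your distro):

  ls -ldZ ~/.pan2/article-cache      # -Z shows the SELinux context, where SELinux is in use
  quota -s                           # per-user quota usage, if quotas are enabled
  readlink -f ~/.pan2/article-cache  # resolves any symlinks in the path; shouldn't dangle
  findmnt -T ~/.pan2/article-cache   # the containing filesystem and whether it's mounted ro or rw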
Meanwhile, there's another bit of info that may be helpful. It's not yet
in the latest release (0.135), let alone 0.133, but someone just a week
or so ago requested a -v (verbose) switch for pan, such that when it's
downloading from nzbs, it prints to STDOUT the actual files it's
downloading. HMueller has been on the ball and AFAIK already implemented
the request, but it's only in his git repo, ATM. Having that output
would be very useful indeed in this sort of case, and may well have
eliminated the need for the whole thread. But, whether you want to hassle
with the whole live-git-repo, compile-from-source thing is another
question entirely. Presumably not, if you're still on 0.133, but the
feature is actually available now, along with binary posting and
score-based auto-actions (optionally automatically mark-read or delete
low-scored posts, cache or download-and-save high-scored ones), if you're
willing to jump thru the necessary hoops.
Finally, I've no idea what sort of news account you and your neighbor
split, but it sounds like it could be a monthly-pay, unlimited per-month,
deal, which is why they are so particular about multiple access.
FWIW, unless one or both of you download *HUGE* amounts, it may be
worthwhile at least considering block accounts. With these you purchase
X gigs of data for Y money (dollars/euro/yen/whatever), and can use it
until it runs out. No monthly charges. No expiration (unless of course
you lose track of the login info or the news provider goes belly up).
One of the interesting things about these sorts of accounts, besides not
having to hassle with the monthly payments if you're not using that much,
is that it's actually in the provider's interest to make it easy for you
to use the block up, so you have to purchase more. As such, they don't
tend to have NEARLY the restrictions that some of the others do, and you'd
very likely be able to log in from separate IPs at the same time (as long
as both had valid login info, of course), because all they care about is
the bandwidth you use, and the sooner you use it up, the sooner you have
to buy more.
There are two providers I know of that offer this. Astraweb.com is one.
Blocknews.net is the other.
Astraweb has 25 GB for (US) $10, or 180 GB for $25. Header downloads,
etc, are NOT counted toward the block.
Blocknews has blocks ranging from 5 GB for $2.75, through the
Astraweb-comparable 25 GB for $8.50 and 200 GB for $21.59, to 500 GB for
$51.49, 1024 GB for $91.39, and a massive 3072 GB (3 TiB) for $239.99!
Headers *ARE* counted, but traffic is discounted 10% to allow for them.
Thus, if you tend to grab headers for use with other providers, or do a
LOT of header downloading compared to bodies, Astraweb would be better.
But if you minimize your header downloads, then between that, the 10%
traffic discount, and Blocknews' lower per-gig pricing at the higher end
(<7.82 cents/gig for the 3 TiB pkg, 10-11 cents a gig for the 200 and
500 GB pkgs, just under 9 cents a gig for the 1 TiB), you would be better
off there.
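(Those per-gig figures are just the block price divided by its size:
$239.99 / 3072 GB ≈ 7.8 cents/gig for the 3 TiB block, $91.39 / 1024 GB ≈
8.9 cents/gig for the 1 TiB, and $21.59 / 200 GB ≈ 10.8 cents/gig, before
figuring in the 10% header allowance.)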
Astraweb doesn't make their server list public, but Blocknews has two
server farms: IAD (Washington DC area), US, and the Amsterdam area,
Netherlands. FWIW, the Amsterdam area is home to MANY European news
providers, apparently due to a rather friendlier legal situation for news
there than in most other locations, Europe or North America.
Consider that those prices are for unexpiring blocks, and talk to your
neighbor about how much both of you download. Given the prices, unless
you're downloading > 25 gigs/mo, it's very likely to be cheaper getting
the blocks, and you'll probably more or less break even thru a hundred
gigs or so. If you're paying for giganews now, you'd probably be saving
even more with the block accounts, unless their prices have come down
substantially recently, but giganews /is/ widely acknowledged as the gold
standard of news providers and thus gets away with charging more for it.
Whether it's worth it could be argued either way, but some people
definitely consider it so.
Of course, if you're downloading half a TiB or more a month, the
unlimited monthly accounts are likely well worth it. But, that's a *LOT*
of traffic for an individual, and if you're doing that, the account has
likely already been flagged for TOS-abuse-watch.
As they say, YMMV, but if I can save you a bit and decrease your chances
of being TOSed at the same time...
> Speaking only for myself, I can say that Duncan definitely did not waste
> his time. That strace tutorial is going in my 'how to do stuff I could
> never remember without a cheat-sheet' file ;-)
>
> Once again, Duncan has taught me to do things for which I previously had
> only 'nodding knowledge'. Thanks Duncan.
If I'd known you were going to do that, I might have thrown in a paragraph
or two dealing with the other -eXXXX options. FWIW, -efile can be useful:
it traces all the syscalls that take a file name (stat, unlink, rename,
and so on), not just the file-opens. To also see how long a file stays
open, what the app actually reads/writes to it, and its seek behavior
within it, add the descriptor class too (-e trace=file,desc). That output
gets a bit more difficult to read, since the opens list the file names but
the descriptor-based calls generally don't; they use the file numbers (the
result of the open, = 6 in my example) instead. But it lets you see how
long the file is open and what other files are opened before it's closed
(tho if the file number isn't increasing, that means the same number is
being reused repeatedly to open and close many files in sequence, and
that's visible from the opens alone).
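A sketch of what the combined trace might look like (the path and buffer
contents here are made up, just to show the shape of strace's output):

  strace -e trace=file,desc pan 2>&1 | less

  open("/home/g/.pan2/article-cache/part01of78.example", O_RDONLY) = 6
  read(6, "=ybegin part=1 total=78 "..., 4096) = 4096
  close(6)                                = 0

Note that the original grep /home/g filter would hide the read/close
lines, since they carry only the file number, not the path.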
The other syscall classes cover memory management and the like; not really
as useful to non-programmers as the file actions tend to be. Well, the
network class of calls can be interesting too, but by far the most oft
used here is -efile, and within it, -eopen suffices MOST of the time,
since generally what I'm after is something to do with files; they're
exposed enough for the information to be useful to me as a user and admin,
even if I'm not a programmer. (Unless you include shell scripting in
"programmer", that is; to me it's more a sysadmin-type skill, and I
believe most would agree, tho it can often sort of do the job of a
program, for one skilled enough at it.)
Meanwhile, I too generally have to work out grep's OR behavior each time.
(I know to enclose the pattern in '' to keep the shell from interfering,
but I always seem to forget whether I still have to backslash-escape the |
ORs as \| or not. At least I know enough to try it one way and, if it
doesn't work, try it the other, without having to go to the manual each
time.) Otherwise, I'll often simply pipe a bunch of single-pattern grep -v
commands together using shell pipes, since I know how they work. And it
took me awhile to remember the 2>&1 bit as well, but I've apparently done
it enough now that it's beginning to sink in. =:^)
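For the cheat-sheet, the equivalent forms (same patterns as the command
earlier in the thread):

  grep -v 'icons\|cursors'                 # basic regexps: the | must be escaped
  grep -Ev 'icons|cursors'                 # extended regexps (-E): a plain | works
  ... | grep -v icons | grep -v cursors    # or just chain single-pattern greps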
But if it makes a useful cheat-sheet that's far simpler than the manpages
while hitting all of the most-used bits, thus well demonstrating the
80/20 rule[2], have at it! =:^)
---
[1] I have a dedicated cache partition for my binary pan instance (I run
multiple pan instances, each with its own config, using the PAN_HOME var
to point each at its config with a wrapper script). The rather long-
winded explanation can be found in the list archives, probably several
times as I believe I've posted it more than once over the years.
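For the curious, such a wrapper can be as small as this (the directory
name is just an example):

  #!/bin/sh
  # run a pan instance that keeps its config and cache under ~/.pan2-binaries
  export PAN_HOME="$HOME/.pan2-binaries"
  exec pan "$@"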
[2] http://en.wikipedia.org/wiki/80/20_rule
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman