[Pan-users] Re: Pan or server problem?
From: Duncan
Subject: [Pan-users] Re: Pan or server problem?
Date: Wed, 11 Apr 2007 14:20:16 +0000 (UTC)
User-agent: Pan/0.126 (Demon Sweat)
"Rinaldi J. Montessi" <address@hidden>
posted address@hidden, excerpted below, on Wed, 11 Apr
2007 02:22:08 -0400:
> 8 205.152.152.46 (205.152.152.46) 34.860 ms 30.335 ms 29.538 ms
> 9 205.152.152.6 (205.152.152.6) 33.518 ms 30.661 ms 30.502 ms
> 10 * * *
> 11 * * *
> 12 * * *
> 13 * * *
> 14 * * *
> 15 * * *
> 16 * * *
> 17 * * *
> 18 * * *
> 19 * * *
> 20 * * *
> 21 * * *
> 22 * * *
> 23 * * *
> 24 * 66.21.240.205 (66.21.240.205) 30.012 ms !A *
> 25 * * *
The traceroute manpage doesn't specifically note what the !A is, but it
has some interesting notes about !<letter> notation and skipped hops.
BTW, tcptraceroute is useful for this sort of thing, too. It sends TCP
SYN packets to a selected port (80/http by default, but we'd select
119/nntp here), raising the TTL one hop at a time. Routers along the way
answer with the usual ICMP time-exceeded. If the port is closed, the
destination immediately sends a RESET. If it's open, the destination
service host will of course normally respond with SYN ACK, in which case
the tracing end kernel sends the reset. Due to the technique it uses, of
course, it requires a privileged user (aka root) to run.
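If root isn't handy, a plain TCP connect at least answers the "is the port
open" half of the question, tho it says nothing about the hops in between.
A minimal sketch, assuming Python is available; port_is_open is a made-up
helper name, and this is an ordinary connect(), not what tcptraceroute does
internally:

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a full TCP connection to host:port succeeds.

    This completes the three-way handshake with an ordinary connect(),
    so it needs no special privileges. A refused connection, a timeout,
    or a name lookup failure all count as "not open".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (hostname taken from the trace in this thread):
#   port_is_open("newsgroups.bellsouth.net", 119)
```

A True here with a failing news session would point at the server's
connection accounting rather than at the route.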
Here's the past-Cox part of the route that a tcptraceroute
newsgroups.bellsouth.net 119 shows from here:
10 * axr01aep-so-1-1-2.bellsouth.net (65.83.236.185) 84.210 ms 85.229 ms
11 ixc01aep-pos-4-0.bellsouth.net (65.83.237.99) 87.418 ms 86.879 ms 86.830 ms
12 205.152.152.42 85.017 ms * 86.340 ms
13 205.152.152.94 143.494 ms 85.189 ms 86.084 ms
14 * 66.21.240.205 96.508 ms 97.215 ms
15 bignews.bellsouth.net (216.77.188.18) [open] 102.067 ms 99.325 ms *
Note the [open] notation on the last hop, which is our target, indicating
the port is open. After the two 205 IPs, it shows only a single further
router hop before it hits the server itself. The !A may mean rate
limited or the like. Obviously, the router at your hop 24 should be your
hop 10, only it's not showing up there in the regular traceroute, and
bignews (aka newsgroups) should be your hop 11.
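For anyone comparing several of these traces, picking out the [open] hop
can be scripted. A rough sketch, assuming output laid out like the trace
above (hop number first, an optional hostname, a dotted-quad IP, and
[open] on the hop where the port answered); parse_hops and first_open_hop
are hypothetical helper names, not part of tcptraceroute:

```python
import re

# A few lines copied from the trace above.
SAMPLE = """\
12 205.152.152.42 85.017 ms * 86.340 ms
13 205.152.152.94 143.494 ms 85.189 ms 86.084 ms
14 * 66.21.240.205 96.508 ms 97.215 ms
15 bignews.bellsouth.net (216.77.188.18) [open] 102.067 ms 99.325 ms *
"""

# Dotted-quad IPv4 address; timings such as "85.017" have too few groups.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def parse_hops(text):
    """Return one dict per hop line: hop number, first IP seen, open flag."""
    hops = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip blank lines and anything that isn't a hop
        ips = IP_RE.findall(line)
        hops.append({
            "hop": int(fields[0]),
            "ip": ips[0] if ips else None,  # None for an all-"*" hop
            "open": "[open]" in fields,
        })
    return hops


def first_open_hop(text):
    """Return the first hop marked [open], or None if there isn't one."""
    for hop in parse_hops(text):
        if hop["open"]:
            return hop
    return None


print(first_open_hop(SAMPLE))
```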
I'm not sure what's going on, but TCP TIME_WAIT indicates a TCP
connection that's closed but not yet fully torn down. The side that
closed first holds the socket for a while after sending its final ACK,
in case that ACK gets lost and the other side has to resend its close.
These entries eventually time out, but it can take a while.
Assuming you tried that telnet session when everything from your end was
closed (nothing in netstat), it's indicating the other end isn't allowing
the connection, saying all ten allowed connections are in use. It may
not be dropping them as it should.
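The "nothing in netstat" check can be made a bit less eyeball-dependent
by tallying connection states per remote port. A small sketch, assuming
the column layout of Linux net-tools "netstat -tn" (proto, two queue
columns, local address, foreign address, state); the sample lines use
made-up documentation addresses, not anything from this thread, and
count_states is a hypothetical helper name:

```python
from collections import Counter

# Illustrative input only; the addresses are reserved documentation ranges.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:40112        198.51.100.7:119        TIME_WAIT
tcp        0      0 192.0.2.10:40113        198.51.100.7:119        TIME_WAIT
tcp        0      0 192.0.2.10:40200        198.51.100.7:119        ESTABLISHED
tcp        0      0 192.0.2.10:51000        203.0.113.5:80          ESTABLISHED
"""


def count_states(netstat_text, remote_port=119):
    """Count TCP states for connections whose foreign port is remote_port."""
    counts = Counter()
    suffix = ":%d" % remote_port
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect at least: proto, recv-q, send-q, local, foreign, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if foreign.endswith(suffix):
            counts[state] += 1
    return counts


print(count_states(SAMPLE))
```

An empty result before the telnet test would back up the assumption that
nothing on the local end is still holding a connection.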
Note that Tornado is a Highwinds product, and they are known for "stuck
connection" issues in certain configurations. Here, Cox outsourced news
to Highwinds Media October-ish of last year, and we've been fighting this
sort of thing since then... except that here they allow only four
connections.
In the configuration here, it's partly because they have two newsserver
farms set up to authenticate against the same (separate) authentication
server. In some cases the individual news front-ends that serve the
actual posts will experience issues and issue a TCP RESET, which SHOULD
say the connection's dead and can be reestablished. Only for whatever
reason, the front-end doesn't send the info to the auth-server, which
then thinks the RESET connections are still alive and kicking, and thus
won't authenticate any further connections.
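To make the accounting failure concrete, here's a toy model of it. This
is purely an illustration of the mechanism as described above, not
Highwinds code; the AuthServer and FrontEnd classes, their methods, and
the reports_resets switch are all invented for the sketch:

```python
class AuthServer:
    """Counts the sessions it believes are active, up to a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.active = 0

    def login(self):
        if self.active >= self.limit:
            return False  # "all allowed connections are in use"
        self.active += 1
        return True

    def logout(self):
        self.active = max(0, self.active - 1)


class FrontEnd:
    """Serves connections; may or may not tell the auth server about resets."""

    def __init__(self, auth, reports_resets):
        self.auth = auth
        self.reports_resets = reports_resets

    def connect(self):
        return self.auth.login()

    def reset(self):
        # The connection is dead either way; the only question is whether
        # the auth server ever hears about it.
        if self.reports_resets:
            self.auth.logout()


def can_connect_after_resets(limit, reports_resets):
    """Open and reset `limit` connections, then try one more."""
    front = FrontEnd(AuthServer(limit), reports_resets)
    for _ in range(limit):
        front.connect()
        front.reset()
    return front.connect()


print(can_connect_after_resets(4, reports_resets=True))   # True
print(can_connect_after_resets(4, reports_resets=False))  # False
```

With the resets unreported, the client has no live connections at all,
yet the auth side still counts a full allotment and refuses the next one.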
That you immediately get your full allotment of half-closed TCP sessions
seems to indicate a very similar problem on bellsouth, but if so, you
shouldn't be the only one seeing it. Not everyone will necessarily see
it, tho. Folks with good enough quality connections to seldom see resets
never seem to have the issue, or at least not to the same degree, and
possibly there's something else involved as well.
Previous to October-ish of last year, Cox ran its own servers, using
Highwinds software on Sun hardware. (The big companies often go with
Highwinds, as it's one of the few commercial products available that can
run on the "Big Iron" necessary to serve a big ISP. The alternative is
apparently running certainly tens and possibly hundreds of PC server
level machines, with open source or small commercial level products, but
big corporations being what they are, they like to go big and
centralized, so...) Cox's own servers had various problems (mainly
completion and retention), but reliable authentication wasn't one of
them. That aspect just worked.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman