From: Andy Bennett
Subject: [Chicken-users] http-client gets stuck in scheduler when reusing connections
Date: Wed, 24 Jun 2015 16:06:16 +0100
User-agent: RoundCube Webmail/0.5.1

Hi,

I've been using CHICKEN to speak to HBase via the Stargate REST API.
I've managed to build a binding with the rest-bind egg and it works.

However, it's very slow, yet it doesn't saturate CPU or IO bandwidth.

I have a benchmark where it requests 124 cells from HBase using the scanner API. This takes about 5 seconds, of which less than 0.2 seconds are actually spent doing anything at all:

-----
#: 124
0.18s CPU time, 0.02s GC time (major), 157977/130286 mutations (total/tracked), 4/273 GCs (major/minor)
0.32user 0.02system 0:05.20elapsed 6%CPU (0avgtext+0avgdata 24096maxresident)k
0inputs+32outputs (0major+3514minor)pagefaults 0swaps
-----

There are some more benchmarks here: http://paste.call-cc.org/paste?id=b46e6a3905ae611f2dcce3c3214e7d20384fa8ed

The timings are almost the same for compiled code as they are via csi.

I've tried attaching the debugger to the process and I always catch it in __poll_nocancel, so I suspect that it's getting stuck in the scheduler.

If I tell the HTTP request to use HTTP/1.0 rather than HTTP/1.1 then it uses a new HTTP connection for each request and goes significantly faster (but still only gets up to about 17% CPU rather than 0.4%):

-----
$ time csi -s extractor.scm
#: 124
0.112s CPU time, 0.02s GC time (major), 33088/5388 mutations (total/tracked), 5/218 GCs (major/minor)

real    0m0.790s
user    0m0.260s
sys     0m0.012s
-----

As you can see from the numbers above, it's still wasting a considerable amount of time.



I did a benchmark using curl as well in order to rule out the other end of the REST API:


Reusing a single connection:

-----
$ time seq 1 124 | sed s#.*#http://localhost:8080/GridSearch/scanner/14351513400672ee69118# |xargs -n 124 curl > /dev/null 2>&1

real    0m0.079s
user    0m0.024s
sys     0m0.004s
-----


Using a connection per request:

-----
$ time seq 1 124 | sed s#.*#http://localhost:8080/GridSearch/scanner/14351514443804a6fb774# |xargs -n 1 curl > /dev/null 2>&1

real    0m1.472s
user    0m0.472s
sys     0m0.036s
-----
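
The difference between the two curl runs above comes down to how xargs batches its arguments: -n 124 hands all the URLs to a single curl process, while -n 1 starts a fresh curl, and so a fresh connection, for every URL. Here is a minimal sketch that makes the batching visible without touching the network. Note that echo stands in for curl and the URL is a placeholder; both are substitutions for illustration and not part of the benchmark itself.

```shell
#!/bin/sh
# count_invocations N BATCH
# Feed N identical placeholder URLs to xargs and count how many times the
# command is started. echo stands in for curl (an illustrative substitution):
# each echo invocation prints exactly one line, so wc -l counts invocations.
count_invocations() {
    n=$1
    batch=$2
    seq 1 "$n" \
        | sed 's#.*#http://localhost:8080/placeholder#' \
        | xargs -n "$batch" echo \
        | wc -l \
        | tr -d ' '
}

count_invocations 124 124   # one process receives all 124 URLs
count_invocations 124 1     # 124 processes, one URL each
```

So the "single connection" run pays process start-up and connection set-up once, whereas the "connection per request" run pays both 124 times.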



I did a bit more profiling and http-client spends almost all of its time in intarweb's read-response procedure which, in turn, spends all its time in its own safe-read-line procedure. Swapping safe-read-line for read-line doesn't change anything.

My timings show that things get stuck in read-line for 38 or 40ms per call. For 124 calls that's about 4.7 to 5.0 seconds, which accounts for most of the roughly 5-second run time.
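
For anyone wanting to check that arithmetic, here is a small sketch. The function name stall_total_ms is made up for this example; the inputs are the call count and the per-call stall times quoted above.

```shell
#!/bin/sh
# stall_total_ms CALLS PER_CALL_MS
# Total time spent stalled, in milliseconds (hypothetical helper name).
stall_total_ms() {
    echo $(( $1 * $2 ))
}

stall_total_ms 124 38   # lower bound: 4712 ms
stall_total_ms 124 40   # upper bound: 4960 ms
```

Both bounds sit just under the 5.20 seconds of elapsed time in the first benchmark, which is why the per-call stall looks like the whole story.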


I've looked briefly into ports.scm and library.scm in the CHICKEN source but didn't have much luck understanding what was going on. make-input-port's read-char procedure appears to call (read) whilst inside the scope of a lambda called read, which has lots of arguments, so I'm clearly missing something.



Any help with improving this situation would be gratefully appreciated.


Thanks! :-)




--
address@hidden
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF


