Re: [lwip-users] TCP bandwidth limited by rate of ACKs


From: Mason
Subject: Re: [lwip-users] TCP bandwidth limited by rate of ACKs
Date: Mon, 17 Oct 2011 16:55:14 +0200
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20110928 Firefox/7.0.1 SeaMonkey/2.4.1

Simon wrote:

> Mason wrote:
> 
>> Perhaps I could keep the RxTask, and enable LWIP_TCPIP_CORE_LOCKING_INPUT
>> which would take tcpip_thread out of the equation? Thus, LOCK_TCPIP_CORE
>> would be called from task context, which is fine.
> 
> I guess that would also be OK. Provided you would get DMA RX working, 
> you'd then get off with 1 memcpy (to socket application buffers) and 2 
> task switches (1 from ISR to RxThread, 1 from RxThread to application
> thread).
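
(For reference, my understanding of those two options -- corrections
welcome:)

  #define LWIP_TCPIP_CORE_LOCKING       1  /* socket/netconn calls lock the core
                                              mutex and run in the calling thread */
  #define LWIP_TCPIP_CORE_LOCKING_INPUT 1  /* netif->input() also processes the
                                              packet in the calling thread (here
                                              RxTask) instead of posting a message
                                              to tcpip_thread */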

I enabled LWIP_TCPIP_CORE_LOCKING and LWIP_TCPIP_CORE_LOCKING_INPUT, and
used custom PBUF_REF pbufs in rx_task:

  struct pbuf *pbuf = pbuf_alloced_custom(PBUF_RAW, buflen, PBUF_REF,
                                          &curr->pbuf, buf, MAX_FRAME_SIZE);
  mynetif->input(pbuf, mynetif);
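
For completeness, the descriptor and free callback behind those two lines
look roughly like this (just a sketch: rx_desc, rx_free and
stxmac_dma_rx_give_back are illustrative names, not the real driver
symbols, and LWIP_SUPPORT_CUSTOM_PBUF has to be enabled).

  struct rx_desc {
    struct pbuf_custom pbuf;  /* first member, so a pbuf pointer can be */
    u8_t *buf;                /* cast back to the enclosing descriptor  */
  };

  /* Installed as curr->pbuf.custom_free_function before handing the pbuf
   * to the stack. lwIP invokes it when the last reference to the pbuf is
   * dropped, so the DMA buffer can be recycled instead of being returned
   * to a pool. */
  static void rx_free(struct pbuf *p)
  {
    struct rx_desc *curr = (struct rx_desc *)p;
    stxmac_dma_rx_give_back(curr->buf);  /* illustrative driver recycle call */
  }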

The STB can now process 2.4 million packets in 5 minutes, which
corresponds to 97.8 Mbps.

OS21 profile analysis for main.out
(16 bytes per bucket, sampled at 1000 Hz, system wide profile)

 301.212 seconds (100.00%) :  Total duration

Task breakdown:
 134.588 seconds ( 44.68%) :  RxTask
 124.072 seconds ( 41.19%) :  rxapp
   9.318 seconds (  3.09%) :  Root Task
   1.069 seconds (  0.35%) :  Idle Task
   0.100 seconds (  0.03%) :  tcpip_thread

Interrupt level breakdown:
  31.071 seconds ( 10.32%) :  Interrupt level 8
   0.645 seconds (  0.21%) :  Interrupt level 13
   0.047 seconds (  0.02%) :  Interrupt level 15

Symbolic breakdown:
 117.961 seconds ( 39.16%) :  _md_kernel_intr_mask
  24.310 seconds (  8.07%) :  memcpy
  14.622 seconds (  4.85%) :  stxmac_dma_isr
   8.378 seconds (  2.78%) :  tcp_input
   7.489 seconds (  2.49%) :  _md_timer_system_time
   5.851 seconds (  1.94%) :  _st40_kernel_interrupt_handler600
   5.588 seconds (  1.86%) :  tcp_receive
   5.235 seconds (  1.74%) :  stxmac_dma_rx_start
   3.654 seconds (  1.21%) :  stxmac_dma_tx_start


I'm trying to optimize this a bit further, first by attacking the memcpy
into rxapp's receive buffer.

Up to this point, I'd been using the BSD socket API in rxapp
with the following code (error-handling removed).

  int sock = socket(PF_INET, SOCK_STREAM, 0);
  struct sockaddr_in addr = { 0, AF_INET, htons(44044), };
  bind(sock, (struct sockaddr *)&addr, sizeof addr);
  listen(sock, 10);
#define BUF_LEN (4*1460)
  u8_t *buf = malloc(BUF_LEN);
  while ( 1 )
  {
    struct sockaddr_in from = { 0 };
    socklen_t len = sizeof from;
    int sock2 = accept(sock, (struct sockaddr *)&from, &len);
    while ( 1 )
    {
      int res = read(sock2, buf, BUF_LEN);
      if (res <= 0) break;
    }
    close(sock2);
  }

I tried using the Netconn API because I thought it didn't
force a memcpy (is that correct?), but I can't get it to work.

( http://lwip.wikia.com/wiki/Netconn_API )
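
(What I had in mind is the zero-copy access pattern from that page,
roughly as below; consume() is only a placeholder for the application's
processing, not a real function.)

  struct netbuf *nb;
  while (netconn_recv(conn, &nb) == ERR_OK)
  {
    void *data;
    u16_t len;
    /* walk the pbuf chain in place: no copy into an application buffer */
    do {
      netbuf_data(nb, &data, &len);
      consume(data, len);
    } while (netbuf_next(nb) >= 0);
    netbuf_delete(nb);  /* hand the pbufs back to the stack */
  }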

The rxapp thread hangs after receiving a few hundred packets. Here is
the code I've actually written:

  struct netconn *conn = netconn_new( NETCONN_TCP );
  netconn_bind(conn, IP_ADDR_ANY, 44044);
  netconn_listen(conn);
  while ( 1 )
  {
    struct netconn *foo;
    netconn_accept(conn, &foo);
    while ( 1 )
    {
      struct netbuf *buf;
      err_t err = netconn_recv(foo, &buf);
      if (err != ERR_OK) { printf("recv=%d\n", err); break; }
    }
    netconn_delete(foo);
  }

Is something wrong with this call sequence?
Or should it work as-is?

-- 
Regards.


