[lwip-devel] Curious struct packing issue - is it GCC?


From: Bill Auerbach
Subject: [lwip-devel] Curious struct packing issue - is it GCC?
Date: Fri, 24 Apr 2009 16:00:40 -0400

I'm trying to inline SMEMCPY. I see it's used only with lengths of 4, 6, 18, 20 and 28. By inlining it with byte copies, I see over a 5% increase in outbound bandwidth (that's all I'm focused on right now). As has come up this week, calling memcpy to copy 4 or 6 bytes is very silly (IMO).

I thought I would copy u16_t units to halve the number of copies, since everything either is u32_t aligned on my processor (NIOS II GCC, as mentioned) or *should* be u16_t aligned for IP-related items. Everything *is* u16_t aligned except one item: hwaddr in struct netif. struct netif does *not* have packing around its definition but, curiously, it *does* include packed struct members. Have I found a GCC problem, in that it carried the packed override of the included struct members through the remainder of the netif struct?

If I delete hwaddr_len from struct netif and then replace its only 2 uses in dhcp.c (netif->hwaddr_len) with ETHARP_HWADDR_LEN, the remainder of netif is properly aligned.

Since we use ETHARP_HWADDR_LEN everywhere else in lwIP, why keep hwaddr_len in the netif, especially if it's used only twice, in DHCP?
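One way to see what the compiler is doing is to check member offsets with offsetof. The sketch below uses hypothetical stand-in structs (demo_netif_u8, demo_netif_none, demo_netif_u32 and DEMO_HWADDR_LEN are invented for illustration; this is not the real struct netif). It rests on one assumption worth checking against the real header: if hwaddr_len is a u8_t sitting directly in front of a u8_t array, the array needs only byte alignment and would land on an odd offset with any compiler, with no packing involved.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  u8_t;
typedef uint16_t u16_t;
typedef uint32_t u32_t;

#define DEMO_HWADDR_LEN 6  /* hypothetical stand-in for ETHARP_HWADDR_LEN */

/* Stand-in layout: a u8_t length directly before the address bytes. */
struct demo_netif_u8 {
  void *state;
  u8_t  hwaddr_len;
  u8_t  hwaddr[DEMO_HWADDR_LEN];
  u16_t mtu;
};

/* Same layout with the length field removed. */
struct demo_netif_none {
  void *state;
  u8_t  hwaddr[DEMO_HWADDR_LEN];
  u16_t mtu;
};

/* Same layout with the length field widened to u32_t. */
struct demo_netif_u32 {
  void *state;
  u32_t hwaddr_len;
  u8_t  hwaddr[DEMO_HWADDR_LEN];
  u16_t mtu;
};

/* Returns 1 if a member at this byte offset can be read as u16_t words. */
static int hwaddr_is_u16_aligned(size_t offset)
{
  return (offset % sizeof(u16_t)) == 0;
}
```

Calling hwaddr_is_u16_aligned(offsetof(struct netif, hwaddr)) on the real struct, before and after the change, would show whether the same thing is happening there.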

Also, in icmp.c:

  SMEMCPY((u8_t *)q->payload + sizeof(struct icmp_dur_hdr), p->payload,
          IP_HLEN + ICMP_DEST_UNREACH_DATASIZE);

I think this can be changed to MEMCPY since I don't think any compiler will inline a 28-byte copy. This call occurs only in the icmp_dest_unreach function.

If we do this, SMEMCPY is used only with 4, 6, 18 and 20 bytes. The following macro does well if anyone wants to experiment with it. With GCC, only the code needed to copy the 2, 3, 9 or 10 u16_t values is generated. (Interestingly, MSC++ 2003 doesn't eliminate the unreachable code!)

#define SMEMCPY(d,s,l)\
        (((l) > 1) ? * ((u16_t *) (d)+0) = * ((u16_t *) (s)+0),\
        ((l) > 3) ? * ((u16_t *) (d)+1) = * ((u16_t *) (s)+1),\
        ((l) > 5) ? * ((u16_t *) (d)+2) = * ((u16_t *) (s)+2),\
        ((l) > 7) ? * ((u16_t *) (d)+3) = * ((u16_t *) (s)+3),\
        ((l) > 9) ? * ((u16_t *) (d)+4) = * ((u16_t *) (s)+4),\
        ((l) > 11) ? * ((u16_t *) (d)+5) = * ((u16_t *) (s)+5),\
        ((l) > 13) ? * ((u16_t *) (d)+6) = * ((u16_t *) (s)+6),\
        ((l) > 15) ? * ((u16_t *) (d)+7) = * ((u16_t *) (s)+7),\
        ((l) > 17) ? * ((u16_t *) (d)+8) = * ((u16_t *) (s)+8),\
        ((l) > 19) ? * ((u16_t *) (d)+9) = * ((u16_t *) (s)+9)\
        : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0 : 0)

This is tested with DHCP, UDP and TCP on a processor where it fails if alignment isn't observed. That is, it works only after I removed hwaddr_len from netif (or you can simply make it a u32_t).
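For anyone who wants to try the idea outside lwIP first, here is a minimal sketch of the same u16_t-at-a-time copy written as an inline function, with a small check against memcpy. The names smemcpy16 and smemcpy16_matches_memcpy are hypothetical and not part of lwIP. It assumes both pointers are u16_t aligned and that the length is even and at most 20; whether the compiler unrolls the loop for a constant length depends on the compiler and optimization level.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t u16_t;

/* Hypothetical helper: copy len bytes as u16_t words.
 * Assumes dst and src are 2-byte aligned and len is even. */
static inline void smemcpy16(void *dst, const void *src, unsigned len)
{
  u16_t *d = (u16_t *)dst;
  const u16_t *s = (const u16_t *)src;
  unsigned i;

  for (i = 0; i < len / 2; i++) {
    d[i] = s[i];
  }
}

/* Returns 1 if smemcpy16 and memcpy produce identical buffers for len bytes.
 * The buffers are u16_t arrays, so they are suitably aligned by construction. */
static int smemcpy16_matches_memcpy(unsigned len)
{
  u16_t src[10], got[10], want[10];
  unsigned i;

  for (i = 0; i < 10; i++) {
    src[i] = (u16_t)(0x1100u + i);
  }
  memset(got, 0, sizeof(got));
  memset(want, 0, sizeof(want));

  smemcpy16(got, src, len);
  memcpy(want, src, len);

  return memcmp(got, want, sizeof(got)) == 0;
}
```

Running the check for 4, 6, 18 and 20 covers the lengths SMEMCPY would see after the icmp.c change; it says nothing about alignment of the real lwIP structures, which still has to be verified on the target.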

Bill

