Indeed, we don't use the heap, and we can't use PBUF_ROM or PBUF_REF, because some packets are sent internally by LwIP, such as ARP requests/replies, ICMP replies, TCP SYN, TCP ACK... Those frames are allocated using PBUF_RAM, and if PBUF_RAM uses pools, I have to copy from the pools (data memory) to the frame memory before sending the frame to the HW MAC, which means a waste of memory and power.
I went over the LwIP code, and the only memory error check I saw is for the case where the p->payload pointer is lower than the pbuf pointer. Since our data memory addresses are always lower than the frame memory addresses, we will not hit this issue, because p->payload will always be greater than the pbuf pointer.
On a separate note, even if I used LwIP without any change, some frames are not contiguous and "the first payload byte cannot be calculated from struct pbuf" unless we check the data type. For example, UDP packets have 20 bytes reserved for a TCP header but use only 8 of them. Therefore p->payload is not equal to p + sizeof(struct pbuf); instead, p->payload = p + sizeof(struct pbuf) + 12.
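To make the arithmetic concrete, here is a rough sketch. The struct, the union, and the demo_ names are stand-ins I made up for illustration; they are not the real LwIP definitions, and only the numbers 20 and 8 come from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pbuf; not the real LwIP layout. */
struct demo_pbuf {
    uint8_t *payload;
    uint16_t len;
};

/* Header sizes taken from the UDP example in the text. */
enum { DEMO_TRANSPORT_HLEN = 20, DEMO_UDP_HLEN = 8 };

/* One block holding the descriptor followed by buffer space. */
union demo_block {
    struct demo_pbuf hdr;
    uint8_t bytes[128];
};

/* Lay out a UDP frame: room for 20 header bytes is reserved, 8 are used. */
static struct demo_pbuf *demo_make_udp(union demo_block *blk, uint16_t data_len)
{
    struct demo_pbuf *p = &blk->hdr;
    assert((size_t)data_len + sizeof(*p) + DEMO_TRANSPORT_HLEN <= sizeof(blk->bytes));
    p->payload = blk->bytes + sizeof(*p) + (DEMO_TRANSPORT_HLEN - DEMO_UDP_HLEN);
    p->len = (uint16_t)(DEMO_UDP_HLEN + data_len);
    return p;
}

/* Unused bytes between the end of the descriptor and the first payload byte. */
static size_t demo_payload_gap(const union demo_block *blk)
{
    const uint8_t *after_struct = blk->bytes + sizeof(blk->hdr);
    assert(blk->hdr.payload >= after_struct);
    return (size_t)(blk->hdr.payload - after_struct);
}
```

For a UDP frame built this way the gap is 12 bytes, so the payload has to be read from p->payload and cannot be derived from the address of the descriptor alone.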
That is what inspired the method I mentioned above. There is no problem with having the pbuf structure in a memory section that is far from the payload, as long as we respect the condition p < p->payload.
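A minimal sketch of that condition, assuming a simplified descriptor (demo_pbuf, demo_attach and demo_payload_above_pbuf are hypothetical names, not LwIP functions). The addresses are compared as uintptr_t because the descriptor and the frame live in different memory regions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pbuf; not the real LwIP layout. */
struct demo_pbuf {
    uint8_t *payload;
    uint16_t len;
};

/* Point a descriptor (data memory) at a frame buffer (frame memory). */
static void demo_attach(struct demo_pbuf *p, uint8_t *frame, uint16_t len)
{
    assert(p != NULL && frame != NULL);
    p->payload = frame;
    p->len = len;
}

/* True when the payload lies at or above the end of the descriptor,
 * i.e. the ordering the text relies on. */
static bool demo_payload_above_pbuf(const struct demo_pbuf *p)
{
    uintptr_t end_of_struct = (uintptr_t)p + sizeof(*p);
    return (uintptr_t)p->payload >= end_of_struct;
}
```

The check only looks at ordering, so it holds no matter how far apart the two regions are, as long as the frame memory sits at higher addresses than the data memory.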
In addition, LwIP doesn't account for retransmission at the MAC layer (if 802.11 is used). LwIP calls low_level_output() and then directly calls pbuf_free(), which deallocates the frame. That's why I have to add a minor modification to the pbuf_free() function.
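The lifetime problem can be sketched with a simple reference count. This is only a model with made-up demo_ names, not my actual change and not LwIP code; LwIP's own pbuf_ref() follows the same counting idea, but whether it is enough for this setup is an assumption I have not verified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame descriptor with a reference count. */
struct demo_frame {
    unsigned ref;
    bool released;
};

/* Take one more reference on the frame. */
static void demo_ref(struct demo_frame *f)
{
    assert(f != NULL);
    f->ref++;
}

/* Drop one reference; returns true only when the frame is really released. */
static bool demo_free(struct demo_frame *f)
{
    assert(f != NULL && f->ref > 0);
    if (--f->ref == 0) {
        f->released = true;
        return true;
    }
    return false;
}

/* Driver output hook: hold the frame while the MAC may still retransmit it. */
static void demo_low_level_output(struct demo_frame *f)
{
    demo_ref(f);
    /* ...hand the frame to the MAC here... */
}

/* Called when the MAC reports that the transmission has finished. */
static void demo_tx_done(struct demo_frame *f)
{
    demo_free(f);
}
```

With this model, the free that the stack issues right after low_level_output() only drops its own reference, and the frame is released once the MAC signals completion.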
Thanks,
Amena