Embedded Design Handbook

ID 683689
Date 8/28/2023

7.4.4.2.3. Using Faster Packet Memory

You can increase the performance of the NicheStack networking stack by using fast, low-latency memory for storing Ethernet packets. This section describes this optimization and explains how it works.

Background

The NicheStack networking stack uses a queue of memory buffers to assemble outgoing packets and to receive incoming ones. To send a packet, the stack removes a free buffer from the queue, assembles the packet data in it, and passes the buffer’s memory location to the Ethernet device driver. To receive a packet, the Ethernet device driver removes a free buffer, loads it with the received packet, and passes it back to the networking stack for processing. The NicheStack networking stack allows you to specify where its queue of buffer memory is located and how this memory allocation is implemented.
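The send and receive paths described above both draw from the same pool of free buffers. The following is a minimal sketch of that pattern in C. It is not NicheStack's actual implementation: the names pkt_buf, buf_queue, bq_put, and bq_get, and the 64-byte payload size, are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical packet buffer: a link field plus payload storage. */
typedef struct pkt_buf {
    struct pkt_buf *next;      /* next buffer in the free queue */
    size_t          len;       /* bytes of valid packet data */
    unsigned char   data[64];  /* payload area (size chosen arbitrarily) */
} pkt_buf;

/* Simple FIFO of free buffers. */
typedef struct {
    pkt_buf *head;
    pkt_buf *tail;
    size_t   count;
} buf_queue;

/* Return a buffer to the tail of the free queue. */
static void bq_put(buf_queue *q, pkt_buf *b)
{
    b->next = NULL;
    b->len = 0;
    if (q->tail != NULL)
        q->tail->next = b;
    else
        q->head = b;
    q->tail = b;
    q->count++;
}

/* Remove a free buffer from the head of the queue; NULL if none are left. */
static pkt_buf *bq_get(buf_queue *q)
{
    pkt_buf *b = q->head;
    if (b == NULL)
        return NULL;
    q->head = b->next;
    if (q->head == NULL)
        q->tail = NULL;
    b->next = NULL;
    q->count--;
    return b;
}
```

In this model, the stack would call bq_get() before assembling an outgoing packet, and the driver would call it before storing a received frame. Every packet is written and read in these buffers, so the speed of the memory that holds them affects both paths.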

By default, the Intel version of the NicheStack networking stack allocates this pool of buffer memory using a series of calloc() function calls that draw on the system’s heap memory. Depending on the design of the system and where the Nios® II system memory is located, this allocation method can affect overall system performance. For example, if your Nios® II processor’s heap segment is in high-latency or slow memory, this allocation method might degrade performance.

Additionally, if the Ethernet device uses direct memory access (DMA) hardware to move packets, so that the Nios® II processor is not directly involved in transmitting or receiving the packet data, the buffer memory must reside in an uncached region. Without caching, performance degrades further, because the Nios® II processor’s data cache cannot offset the latency of the slow memory.
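To illustrate what "uncached" can mean at the address level: on Nios® II cores that implement bit-31 cache bypass (cores without an MMU), a data access whose address has bit 31 set skips the data cache. The sketch below models only that address arithmetic on plain integers, so it can be compiled anywhere. The helper names are hypothetical. In a HAL-based project, the routines alt_uncached_malloc() and alt_remap_uncached(), declared in sys/alt_cache.h, serve this purpose, and are preferable to hand-written address manipulation.

```c
#include <stdint.h>

/* Bit 31 selects cache bypass on Nios II cores with bit-31 cache bypass.
 * This is an assumption about the target core; check your configuration. */
#define CACHE_BYPASS_BIT UINT32_C(0x80000000)

/* Map an address to its cache-bypassing alias. */
static uint32_t to_uncached_addr(uint32_t addr)
{
    return addr | CACHE_BYPASS_BIT;
}

/* Map an address back to its cached alias. */
static uint32_t to_cached_addr(uint32_t addr)
{
    return addr & ~CACHE_BYPASS_BIT;
}

/* Nonzero if the address refers to the cache-bypassing alias. */
static int is_uncached_addr(uint32_t addr)
{
    return (addr & CACHE_BYPASS_BIT) != 0;
}
```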

The solution is to use the fastest memory possible for the networking stack’s buffer memory, preferably a separate memory that the Nios® II processor does not use for program execution.

Solution

The ipport.h file defines a series of macros for allocating and deallocating big and little networking buffers. The macro names begin with BB_ (for “big buffer”) and LB_ (for “little buffer”). The following block shows these macros as defined for Triple Speed Ethernet device driver support.

#define BB_ALLOC(size) ncpalloc(size)
#define BB_FREE(ptr) ncpfree(ptr)
#define LB_ALLOC(size) ncpalloc(size)
#define LB_FREE(ptr) ncpfree(ptr)

You can use these macros to allocate and deallocate memory any way you choose. The Nios® II Ethernet acceleration design example redefines these macros to allocate memory from on-chip memory (a fast memory structure inside the FPGA). This faster memory improves performance to a degree that depends on the system. For detailed performance figures, refer to the readme.doc file included in the design example.
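One way to point these macros at a dedicated memory is a small bump allocator. The sketch below is an illustration under stated assumptions, not the code from the design example. The array fast_pkt_pool is a static array standing in for the on-chip memory (a real system would have to place it in the on-chip region, for example through a linker section), and fast_pkt_alloc(), fast_pkt_free(), FAST_PKT_POOL_BYTES, and FAST_PKT_ALIGN are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define FAST_PKT_POOL_BYTES 4096u  /* hypothetical pool size */
#define FAST_PKT_ALIGN      4u     /* keep every buffer 32-bit aligned */

/* Stand-in for the on-chip packet memory. Static storage is zero-initialized,
 * so buffers start out cleared, as they would with calloc(). */
static uint32_t fast_pkt_pool[FAST_PKT_POOL_BYTES / sizeof(uint32_t)];
static size_t   fast_pkt_used;     /* bytes handed out so far */

/* Hand out the next aligned chunk of the pool, or NULL if it does not fit. */
static void *fast_pkt_alloc(size_t size)
{
    /* Round the request up to a multiple of FAST_PKT_ALIGN. */
    size_t rounded = (size + FAST_PKT_ALIGN - 1u) & ~(size_t)(FAST_PKT_ALIGN - 1u);

    /* Reject wrapped-around sizes and requests that exceed the remaining space. */
    if (rounded < size || rounded > FAST_PKT_POOL_BYTES - fast_pkt_used)
        return NULL;

    void *p = (unsigned char *)fast_pkt_pool + fast_pkt_used;
    fast_pkt_used += rounded;
    return p;
}

/* A bump allocator cannot reclaim memory, so freeing is a no-op. */
static void fast_pkt_free(void *ptr)
{
    (void)ptr;
}

/* In ipport.h, definitions like these would replace the existing ones. */
#define BB_ALLOC(size) fast_pkt_alloc(size)
#define BB_FREE(ptr)   fast_pkt_free(ptr)
#define LB_ALLOC(size) fast_pkt_alloc(size)
#define LB_FREE(ptr)   fast_pkt_free(ptr)
```

The no-op free keeps the sketch short, but it is only adequate when buffers are allocated once and kept for the lifetime of the system. The sketch also does not address cacheability, which matters when DMA hardware accesses the buffers.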

The Intel version of NicheStack does not use the BB_FREE() or LB_FREE() macros. Therefore, any memory obtained through BB_ALLOC() and LB_ALLOC() is allocated at run time and is never freed.
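Because nothing is ever freed, the total demand on the fast memory is fixed by the number and size of the buffers, and can be checked up front. The sketch below shows that arithmetic. All of the values and names are hypothetical placeholders, not NicheStack defaults. Substitute the buffer counts and sizes configured in your build and the actual size of your on-chip memory. The calculation ignores any per-buffer bookkeeping or alignment padding.

```c
#include <stddef.h>

/* Hypothetical buffer configuration; replace with the values from your build. */
enum {
    EX_NUM_BIG_BUFS  = 30,
    EX_BIG_BUF_BYTES = 1536,
    EX_NUM_LIL_BUFS  = 30,
    EX_LIL_BUF_BYTES = 128,
    EX_ONCHIP_BYTES  = 65536   /* hypothetical size of the on-chip packet memory */
};

/* Bytes needed when every buffer is allocated once and never freed. */
static size_t pkt_pool_bytes(size_t num_big, size_t big_bytes,
                             size_t num_lil, size_t lil_bytes)
{
    return num_big * big_bytes + num_lil * lil_bytes;
}

/* Returns 1 if the whole pool fits in the available memory, 0 otherwise. */
static int pkt_pool_fits(size_t needed, size_t available)
{
    return needed <= available;
}
```

With these placeholder values, pkt_pool_bytes() returns 30 × 1536 + 30 × 128 = 49920 bytes, which fits in the assumed 65536-byte memory with room to spare.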

Using fast, low-latency memory for NicheStack’s packet storage can improve the overall performance of the system.