Embedded Design Handbook

ID 683689
Date 8/28/2023
Public

7.1.1.1. Customizing and Accelerating FPGA Designs

FPGA-based designs give you the flexibility to modify your design easily and to experiment to find the best balance between hardware and software implementation. In a discrete microcontroller-based design process, you must choose the processor resources (cache size and built-in peripherals, for example) before you reach the final design stages, and you may be forced to make these decisions before you know your final processor requirements. If you implement some or all of your system's critical design components in an FPGA, you can easily redesign your system as your final product needs become clear. If you use the Nios® II processor, you can experiment to find the balance of processor resources that best fits your system. Platform Designer facilitates this flexibility by allowing you to add and modify system components and regenerate your project easily.

To experiment with performance and resource utilization tradeoffs, the following hardware optimization techniques are available:

  • Processor Performance—You can increase the performance of the Nios® II processor in the following ways:
    • Computational Efficiency—Selecting the most computationally efficient Nios® II processor core is the quickest way to improve overall application performance. The following Nios® II processor cores are available, in decreasing order of performance (see footnote 26):
      • Nios® II/f—optimized for speed
      • Nios® II/e—conserves on-chip resources at the expense of speed
    • Memory Bandwidth—Using low-latency, high-speed memory decreases the time the processor needs to fetch instructions and move data. Additionally, increasing the processor’s arbitration share of the memory increases the processor’s performance, because the Nios® II processor can perform more transactions to the memory before another Avalon® master port can assume control of the memory.
    • Instruction and Data Caches—Adding instruction and data caches is an effective way to decrease the amount of time the Nios® II processor spends waiting on memory accesses, especially in systems that have slow memories, such as SDRAM or double data rate (DDR) SDRAM. In general, the larger the cache size selected for the Nios® II processor, the greater the performance improvement.
    • Tightly-coupled Memories—Tightly-coupled memory is fast on-chip memory that bypasses the cache and has guaranteed low latency. Tightly-coupled memory gives the best memory access performance. You assign code and data to tightly-coupled memory partitions in the same way as other memory sections.
    • Hardware Multipliers—The Nios® II processor provides the following hardware multiplier options:
      • DSP Block—Includes DSP block multipliers available on the target device. This option is available only on Intel FPGAs that have DSP Blocks.
      • Embedded Multipliers—Includes dedicated embedded multipliers available on the target device. This option is available only on Intel FPGAs that have embedded multipliers.
      • Logic Elements—Includes hardware multipliers built from logic element (LE) resources.
      • None—Does not include multiply hardware. In this case, multiply operations are emulated in software.
    • Optional Branch Prediction—The Nios® II processor performs dynamic and static branch prediction to minimize the cycle penalty associated with taken branches.
  • Clock Frequency—Increasing the speed of the processor’s clock results in more instructions being executed per unit of time. To gain the best performance possible, ensure that the processor’s execution memory is in the same clock domain as the processor, to avoid the use of clock-crossing FIFO buffers.

    One of the easiest ways to increase the operational clock frequency of the processor and memory peripherals is to use a FIFO bridge IP core to isolate the slower peripherals of the system. With a bridge peripheral, for example, you can connect the processor, memory, and an Ethernet device on one side of the bridge, and connect all of the peripherals that are not performance critical on the other side of the bridge.
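
As an illustration of the tightly-coupled memory option in the list above, the following minimal C sketch shows one way to assign data and code to named memory sections with the GCC section attribute. The section names .tcm_data and .tcm_code, the FIR_TAPS constant, and the fir_sample routine are hypothetical and exist only for this example; a real project would use the section names that its generated linker script defines for the tightly-coupled memories. The attributes are compiled out on non-Nios II builds so that the sketch also compiles on a host machine.

```c
#include <stdint.h>

/*
 * Assumption: ".tcm_data" and ".tcm_code" are hypothetical section names.
 * Substitute the section names that your generated linker script defines
 * for your tightly-coupled memories.
 */
#if defined(__NIOS2__)
#define IN_TCM_DATA __attribute__((section(".tcm_data")))
#define IN_TCM_CODE __attribute__((section(".tcm_code")))
#else
/* Host build: no tightly-coupled memory, so the macros expand to nothing. */
#define IN_TCM_DATA
#define IN_TCM_CODE
#endif

#define FIR_TAPS 4

/* Filter coefficients, requested to live in the data tightly-coupled memory. */
int16_t fir_coeffs[FIR_TAPS] IN_TCM_DATA = { 1, 2, 2, 1 };

/* Inner-loop routine, requested to live in the instruction tightly-coupled memory. */
IN_TCM_CODE int32_t fir_sample(const int16_t history[FIR_TAPS])
{
    int32_t acc = 0;
    for (int i = 0; i < FIR_TAPS; i++) {
        /* Multiply-accumulate over the most recent FIR_TAPS samples. */
        acc += (int32_t)fir_coeffs[i] * history[i];
    }
    return acc;
}
```

Only the small, frequently executed routine and its working data are placed in the fast memory here, on the assumption that tightly-coupled memory is a limited on-chip resource; the rest of the application can stay in cached external memory.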

Similarly, if you implement your system in an FPGA, you can experiment with the best balance of hardware and software resource usage. If you find you have a software bottleneck in some part of your application, you can consider accelerating the relevant algorithm by implementing it in hardware instead of software. Platform Designer facilitates experimenting with the balance of software and hardware implementation. You can even design custom hardware accelerators for specific system tasks.

To help you solve system performance issues, the following acceleration methodologies are available:

  • Custom peripherals
  • Custom instructions

The method of acceleration you choose depends on the operation you wish to accelerate. To accelerate streaming operations on large amounts of data, a custom peripheral may be a good solution. Hardware interfaces (such as implementations of the Ethernet or serial peripheral interface (SPI) protocol) may also be implemented efficiently as custom peripherals. The current floating-point custom instruction is a good example of the type of operation that is typically best accelerated using a custom instruction.
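
To make the custom instruction option more concrete, the following C sketch shows how software might call a small combinational operation (a 32-bit bit reversal, chosen only for illustration) through a custom instruction, with a software fallback that computes the same result. It assumes the Nios II GCC built-in __builtin_custom_ini, which takes a constant opcode index and one integer operand and returns an integer. The opcode index BITREV_CI_N and the function names are hypothetical; in a real system the index is assigned when the custom instruction is added to the processor in Platform Designer.

```c
#include <stdint.h>

/*
 * Assumption: hypothetical opcode index for a bit-reversal custom
 * instruction. In a real system, use the value generated for your design.
 */
#define BITREV_CI_N 0

/* Software reference: reverse the bit order of a 32-bit word. */
uint32_t bitrev32_sw(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* swap adjacent bits  */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* swap bit pairs      */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* swap nibbles        */
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  /* swap bytes          */
    return (x >> 16) | (x << 16);                             /* swap 16-bit halves  */
}

/* Use the custom instruction on Nios II builds, the software path elsewhere. */
uint32_t bitrev32(uint32_t x)
{
#if defined(__NIOS2__)
    return (uint32_t)__builtin_custom_ini(BITREV_CI_N, (int)x);
#else
    return bitrev32_sw(x);
#endif
}
```

Keeping the software version alongside the accelerated call gives you a reference for checking the hardware result, and lets you compare the two implementations when you experiment with the hardware and software balance.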

For information about hardware acceleration, refer to the "Hardware Acceleration and Coprocessing" chapter of the Embedded Design Handbook.

For information about custom instructions, refer to the Nios® II Custom Instruction User Guide.

For information about creating custom peripherals, refer to the "Creating Platform Designer Components" chapter in the Intel® Quartus® Prime Handbook Volume 1: Design and Synthesis.

Footnote 26: The Nios® II/s core is available only with Nios® II Classic.