Intel® High Level Synthesis Compiler Standard Edition: Best Practices Guide

ID 683259
Date 12/18/2019
Public
Document Table of Contents

4.2.2. Unroll Loops

When a loop is unrolled, each iteration of the loop is replicated in hardware and executes simultaneously if the iterations are independent. Unrolling loops trades an increase in FPGA area use for a reduction in the latency of your component.
Consider the following basic loop with three stages and three iterations. Each stage represents the operations that occur in the loop within one clock cycle.
Figure 8. Basic loop with three stages and three iterations


If each stage of this loop takes one clock cycle to execute, then this loop has a latency of nine cycles.
The following figure shows the loop from Figure 8 unrolled three times.
Figure 9. Unrolled loop with three stages and three iterations


Three iterations of the loop can now be completed in only three clock cycles, but three times as many hardware resources are required.

You can control how the compiler unrolls a loop with the #pragma unroll directive, but this directive works only if the compiler knows the trip count for the loop in advance or if you specify the unroll factor. In addition to replicating the hardware, the compiler also reschedules the circuit such that each operation runs as soon as the inputs for the operation are ready.

For an example of using the #pragma unroll directive, see the best_practices/resource_sharing_filter tutorial.