4.2.2. Unroll Loops

Intel® High Level Synthesis Compiler Standard Edition: Best Practices Guide

ID 683259

Date 12/18/2019

Version 19.1

When a loop is unrolled, the hardware for each iteration of the loop is replicated, and the iterations execute simultaneously if they are independent. Unrolling loops trades an increase in FPGA area use for a reduction in the latency of your component.

Consider the following basic loop with three stages and three iterations. Each stage represents the operations that occur in the loop within one clock cycle.

Figure 8. Basic loop with three stages and three iterations

If each stage of this loop takes one clock cycle to execute and the iterations run one after another, then this loop has a latency of nine cycles (three stages × three iterations).

The following figure shows the loop from Figure 8 unrolled three times.

Figure 9. Unrolled loop with three stages and three iterations

Three iterations of the loop can now be completed in only three clock cycles, but three times as many hardware resources are required.
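The arithmetic behind Figures 8 and 9 can be sketched as a small cycle-count model. This is an illustration only, not compiler output: the function names are hypothetical, and the model assumes one clock cycle per stage, independent iterations, and no overlap between iterations of the rolled loop.

```cpp
// Simplified cycle-count model of the loop in Figures 8 and 9.
// Assumptions (for illustration only): each stage takes one clock cycle,
// iterations are independent, and rolled iterations do not overlap.

// Rolled loop: iterations run one after another.
constexpr int rolled_latency(int stages, int iterations) {
    return stages * iterations;
}

// Fully unrolled loop: all iterations run side by side, so the latency
// is that of a single iteration, regardless of the iteration count.
constexpr int fully_unrolled_latency(int stages) {
    return stages;
}

// Fully unrolled loop: one copy of the loop-body hardware per iteration.
constexpr int fully_unrolled_copies(int iterations) {
    return iterations;
}
```

For three stages and three iterations, rolled_latency(3, 3) gives nine cycles, fully_unrolled_latency(3) gives three cycles, and fully_unrolled_copies(3) gives three copies of the loop-body hardware.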

You can control how the compiler unrolls a loop with the #pragma unroll directive. This directive works only if the compiler knows the trip count of the loop in advance or if you specify an unroll factor. In addition to replicating the hardware, the compiler reschedules the circuit so that each operation runs as soon as its inputs are ready.
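The two cases described above might look like the following minimal sketch. The function names, array size, and unroll factor are hypothetical and chosen only for illustration. The code is written as plain C++ so that it also compiles outside the HLS flow, where a compiler that does not recognize the directive may simply ignore it; in an actual design, these functions would be part of a component.

```cpp
// Hypothetical example: function names, sizes, and the unroll factor
// are illustrative and not taken from the guide.
constexpr int kSize = 4;

// Case 1: the trip count (kSize) is known at compile time, so the loop
// can be fully unrolled without specifying an unroll factor.
void scale_fixed(const int (&in)[kSize], int (&out)[kSize], int gain) {
    #pragma unroll
    for (int i = 0; i < kSize; ++i) {
        out[i] = in[i] * gain;  // iterations are independent of each other
    }
}

// Case 2: the trip count (n) is not known at compile time, so an explicit
// unroll factor is given to request two copies of the loop body.
void scale_variable(const int* in, int* out, int n, int gain) {
    #pragma unroll 2
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * gain;
    }
}
```

Unrolling changes only how the loop is implemented, not what it computes, so both functions return the same results with or without the directive. An elementwise operation is used here because its iterations do not depend on one another, which is the situation in which the replicated hardware can run simultaneously.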

For an example of using the #pragma unroll directive, see the best_practices/resource_sharing_filter tutorial.
