Friday, February 19, 2010

FPGA: Sacrificing resource sharing for better timing

In any system design, making the best use of the available resources is one of the most important concerns. In FPGA-based system design, those resources are the PLLs (Phase-Locked Loops), RIOs (RocketIOs), DCMs (Digital Clock Managers), BUFGs (Global Buffers), FFs (Flip-Flops), LUTs (Look-Up Tables) and dedicated block RAMs.

Numerous vendors supply FPGAs, including Xilinx, Altera and Lattice, and each of them offers a wide range of devices. A suitable FPGA should be selected by matching the resources your design requires against the resources available in the device.

Once the synthesisable code describing the system has been written, generating an FPGA build comes down to two major requirements. First, the design must fit in the selected device. Second, the design must meet its timing constraints (setup time, hold time and clock period constraints).

Most of the tools used to make FPGA builds these days try to optimise the design so that it fits in the device using as few resources as possible. One of the techniques used for this optimisation is resource sharing.

Resource sharing is a technique where multiple logic blocks of the designed system share a common logic block. Let us take the example of a system that can add and subtract two input bits. The system can thus be divided into two major blocks: an adder and a subtractor. Adding the two bits gives one bit each of Sum and Carry, while subtracting them gives one bit each of Difference and Borrow. If we write out the truth table, we see that the Sum and Difference outputs have the same logic. If we make an FPGA build from code describing this logic, the tool will infer only one logic block to generate both Difference and Sum, i.e. A XOR B (where A and B are the input bits). The tool will also try to place this block at a location in the FPGA that is equidistant from the Sum and Difference outputs.
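The truth table mentioned above can be checked with a few lines of Python. This is only a minimal sketch: the function names `half_adder`, `half_subtractor` and `truth_table` are made up for illustration, and the subtractor is assumed to compute A - B.

```python
from itertools import product


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def half_subtractor(a, b):
    """Return (difference, borrow) for A - B on two input bits."""
    # Assumption: the subtractor computes A - B, so a borrow is
    # needed only when A is 0 and B is 1.
    return a ^ b, (1 - a) & b


def truth_table():
    """List (A, B, Sum, Carry, Difference, Borrow) for every input pair."""
    rows = []
    for a, b in product((0, 1), repeat=2):
        s, carry = half_adder(a, b)
        d, borrow = half_subtractor(a, b)
        rows.append((a, b, s, carry, d, borrow))
    return rows


if __name__ == "__main__":
    print("A B | Sum Carry | Diff Borrow")
    for a, b, s, carry, d, borrow in truth_table():
        print(f"{a} {b} |  {s}    {carry}   |  {d}     {borrow}")
```

In every row the Sum and Difference columns are identical, which is why a single A XOR B block can serve both outputs. Carry and Borrow differ, so those two outputs still need their own logic.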




This sharing of the A XOR B block by the adder and the subtractor is known as resource sharing. It leads to more economical use of resources (FFs and LUTs).

Referring to the figure above, if two A XOR B blocks were generated instead, one each for Sum and Difference, the tool would try to place them as follows:



When we are designing the system for a lower speed (a slower clock), resource sharing is the best way to go. But when we are working at higher speeds, where timing is of utmost importance when placing the blocks in the FPGA, we have to decide whether to use resource sharing or not. If the timing constraints for all the clocks are met and the design fits in the device, then all is well. If fitting issues arise, we may have to enable resource sharing (the tools enable this feature by default). If the design fits comfortably but has problems meeting the timing constraints, we may have to disable the resource sharing option. This means using up more resources, but we are rewarded with better timing.

Compare the adder plus subtractor system in figure 1.1 (with resource sharing) and figure 1.2 (without resource sharing). Figure 1.1 clearly uses fewer resources. But when we calculate the total path delay from A and B to the outputs Sum and Difference in the two designs, we have to conclude that the design without resource sharing achieves better timing, because each A XOR B block can be placed close to the output it drives instead of midway between the two.

Now consider the adder plus subtractor design as part of a bigger design, where the inputs A and B are the outputs of flip-flops and the outputs Sum and Difference are the inputs of other flip-flops. Using the design shown in figure 1.2 would lead to better setup-time margins and a higher maximum clock speed for the design.
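To put rough numbers on the flip-flop to flip-flop case, here is a small Python sketch of the usual setup check: the clock period must cover the clock-to-Q delay, the logic delay, the routing delay and the setup time of the capturing flip-flop. Every delay value below is hypothetical and chosen only for illustration, clock skew is ignored, and the shorter routing delay for the duplicated case is an assumption that follows the argument above. Real figures come from the timing report of your tool.

```python
def min_clock_period_ns(t_clk_to_q, t_logic, t_routing, t_setup):
    """Smallest clock period (ns) at which a flip-flop to flip-flop path works."""
    return t_clk_to_q + t_logic + t_routing + t_setup


def max_frequency_mhz(period_ns):
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


def setup_slack_ns(clock_period_ns, required_period_ns):
    """Positive slack means the setup constraint is met."""
    return clock_period_ns - required_period_ns


# Hypothetical delays in ns (not taken from any datasheet).
T_CLK_TO_Q = 0.5          # source flip-flop clock-to-output delay
T_XOR_LUT = 0.5           # delay through the A XOR B LUT
T_SETUP = 0.25            # setup time of the destination flip-flop
ROUTING_SHARED = 1.5      # one XOR block placed midway between both outputs
ROUTING_DUPLICATED = 0.75 # each XOR block placed next to its own output

shared_period = min_clock_period_ns(
    T_CLK_TO_Q, T_XOR_LUT, ROUTING_SHARED, T_SETUP)
duplicated_period = min_clock_period_ns(
    T_CLK_TO_Q, T_XOR_LUT, ROUTING_DUPLICATED, T_SETUP)

if __name__ == "__main__":
    target_period = 2.5  # hypothetical 400 MHz clock constraint
    for name, period in (("shared", shared_period),
                         ("duplicated", duplicated_period)):
        print(f"{name}: min period {period} ns, "
              f"fmax {max_frequency_mhz(period):.1f} MHz, "
              f"slack {setup_slack_ns(target_period, period)} ns")
```

With these made-up numbers the shared design misses the 2.5 ns constraint while the duplicated design meets it with margin. The absolute values mean nothing; the point is that the routing term is the one that resource sharing makes worse.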

In this way, sacrificing resource sharing can lead to better timing.
