Abstract:
Hardware benchmarking of candidates competing in cryptographic contests, such as
SHA-3 and CAESAR, is essential for ranking their suitability for standardization.
Because the number of candidates keeps growing, a large amount of time is needed to
design the datapath and controller of each candidate and to convert them to hardware
description language (HDL) code. A further difficulty is developing an HDL testbench for verification.
High-Level Synthesis (HLS), based on the newly developed Xilinx Vivado HLS tool, offers a
potential solution to the aforementioned problems. Therefore, in the first part of this thesis
we investigate the following hypothesis: the ranking of candidate algorithms in cryptographic
contests in terms of their performance in modern FPGAs and All-Programmable SoCs will
remain the same regardless of whether the HDL implementations are developed manually
or generated automatically using HLS tools. To assess the validity of this
approach, four Round 2 SHA-3 candidates are implemented using Vivado HLS and compared
with existing RTL implementations. Our results indicate that the ranking of the evaluated
candidates in terms of four major performance metrics (frequency, throughput, area, and
throughput-to-area ratio) is unchanged for all tested candidates.
One of the most important performance metrics is throughput, which
depends strongly on the algorithm, hardware architecture, coding style, and
tool options. The maximum throughput is calculated from the maximum clock
frequency supported by each algorithm. A common way of determining the maximum clock
frequency is static timing analysis provided by CAD toolsets such as Xilinx ISE, Xilinx
Vivado, and Altera Quartus Prime. Finding the actual maximum clock frequency using
static timing analysis is not a trivial task, especially in the Xilinx Vivado environment; it
is extremely time-consuming and tedious. Therefore, in the second part of this thesis,
we describe Minerva, an automated hardware benchmarking tool that finds the
maximum frequency based on static timing analysis. It can be configured to target either
Throughput or Throughput/Area as the optimization criterion and to search through a specified
number of optimization strategies. The tool determines the requested clock frequency
that leads to the maximum value of the optimization target. We evaluated 20 Round 2 CAESAR
candidates in terms of frequency and frequency-to-area ratio. Compared with binary
search, the Minerva frequency search improved results by up to 37% in terms of
throughput-to-area ratio and by up to 24% in terms of throughput.
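
For illustration, the binary-search baseline mentioned above can be sketched as
follows. This is only a minimal sketch and not Minerva's own search: the callback
meets_timing is a hypothetical stand-in for a complete tool run followed by static
timing analysis, and the sketch assumes that timing closure is monotonic in the
requested frequency, which real tools do not guarantee.

```python
from typing import Callable


def binary_search_max_freq(
    meets_timing: Callable[[float], bool],
    lo_mhz: float,
    hi_mhz: float,
    tol_mhz: float = 1.0,
) -> float:
    """Return the highest requested frequency (MHz) found to meet timing.

    Assumption: if a frequency meets timing, every lower one does too.
    """
    if not meets_timing(lo_mhz):
        raise ValueError("lower bound does not meet timing")
    if meets_timing(hi_mhz):
        return hi_mhz
    # Invariant: lo_mhz meets timing, hi_mhz does not.
    while hi_mhz - lo_mhz > tol_mhz:
        mid = (lo_mhz + hi_mhz) / 2.0
        if meets_timing(mid):
            lo_mhz = mid
        else:
            hi_mhz = mid
    return lo_mhz


def fake_tool_run(freq_mhz: float) -> bool:
    """Hypothetical stand-in for a tool run; the threshold is made up."""
    return freq_mhz <= 237.5


best = binary_search_max_freq(fake_tool_run, 50.0, 500.0, tol_mhz=1.0)
```

Each call to meets_timing corresponds to one full implementation run, so the
tolerance directly trades search time against the precision of the result.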
In the third part of the thesis, we develop a universal testbed capable
of measuring the maximum clock frequency experimentally on a prototyping board. We
target cryptographic hardware cores, such as implementations of SHA-3 candidates.
Our testbed is built on a Zynq platform and takes advantage of software/hardware
co-design and the Advanced eXtensible Interface (AXI). We measured the maximum clock frequency
and the execution time of 12 Round 2 SHA-3 candidates experimentally on the ZedBoard
and compared the results with the frequencies reported by Xilinx Vivado. Our results indicate
that, depending on the characteristics of each algorithm, the experimentally measured
frequency may be either much higher than or the same as the frequency reported by the
tools using static timing analysis.