Commit abc4ccc1 authored by mmrahorovic

[notebooks]: ensure that the new (renamed) attribute is addressed correctly

parent ea7518e2
%% Cell type:markdown id: tags:
# End-to-End FINN Flow for a Simple Convolutional Net
-----------------------------------------------------------------
In this notebook, we will go through the FINN steps needed to take a binarized convolutional network all the way down to a heterogeneous streaming dataflow accelerator running on the FPGA.
It's recommended to go through the simpler [end-to-end notebook for a fully connected network](tfc_end2end_example.ipynb) first, since many steps here are very similar and we will focus on what is done differently for convolutions.
This notebook is quite lengthy, and some of the cells (involving Vivado synthesis) may take up to an hour to finish running. To let you save and resume your progress, we will save the intermediate ONNX models that are generated in the various steps to disk, so that you can jump back directly to where you left off.
%% Cell type:markdown id: tags:
## Quick Introduction to the CNV-w1a1 Network
The particular quantized neural network (QNN) we will be targeting in this notebook is referred to as CNV-w1a1 and it classifies 32x32 RGB images into one of ten CIFAR-10 classes. All weights and activations in this network are quantized to bipolar values (either -1 or +1), with the exception of the input (which is RGB with 8 bits per channel) and the final output (which consists of 32-bit numbers). It first appeared in the original [FINN paper](https://arxiv.org/abs/1612.07119) from ISFPGA'17 under the name CNV, as a variant of the binarized convolutional network from the [BinaryNet paper](https://arxiv.org/abs/1602.02830), in turn inspired by the VGG-11 topology which was the runner-up in the 2014 [ImageNet Large Scale Visual Recognition Challenge](http://www.image-net.org/challenges/LSVRC/).
You'll have a chance to interactively examine the layers that make up the network in Netron in a moment, so that's enough about the network for now.
%% Cell type:markdown id: tags:
## Quick Recap of the End-to-End Flow
The FINN compiler comes with many *transformations* that modify the ONNX representation of the network according to certain patterns. This notebook will demonstrate a *possible* sequence of such transformations to take a particular trained network all the way down to hardware, as shown in the figure below.
%% Cell type:markdown id: tags:
![](finn-design-flow-example.svg)
%% Cell type:markdown id: tags:
The white fields show the state of the network representation in the respective step. The colored fields represent the transformations that are applied to the network to achieve a certain result. The diagram is divided into 5 sections, each represented by a different color and comprising several flow steps. The flow starts in the top left corner with Brevitas export (green section), followed by the preparation of the network (blue section) for Vivado HLS synthesis and Vivado IPI stitching (orange section), and finally building a PYNQ overlay bitfile and testing it on a PYNQ board (yellow section).
There is an additional section for functional verification (red section) on the left side of the diagram, which we will not cover in this notebook. For details, please take a look at the verification notebook, which you can find [here](tfc_end2end_verification.ipynb).
We will use the helper function `showInNetron` to show the ONNX model at the current transformation step. The Netron displays are interactive, but they only work when running the notebook actively and not on GitHub (i.e. if you are viewing this on GitHub you'll only see blank squares).
%% Cell type:code id: tags:
``` python
from finn.util.basic import make_build_dir
from finn.util.visualization import showInNetron
import os
build_dir = os.environ["FINN_BUILD_DIR"]
```
%% Cell type:markdown id: tags:
## 1. Brevitas Export, FINN Import and Tidy-Up
Similar to what we did in the TFC-w1a1 end-to-end notebook, we will start by exporting the [pretrained CNV-w1a1 network](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq) to ONNX, importing that into FINN and running the "tidy-up" transformations to have a first look at the topology.
%% Cell type:code id: tags:
``` python
import onnx
from finn.util.test import get_test_model_trained
import brevitas.onnx as bo
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.fold_constants import FoldConstants
from qonnx.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames, RemoveStaticGraphInputs
cnv = get_test_model_trained("CNV", 1, 1)
bo.export_finn_onnx(cnv, (1, 3, 32, 32), build_dir + "/end2end_cnv_w1a1_export.onnx")
model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_export.onnx")
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(RemoveStaticGraphInputs())
model.save(build_dir + "/end2end_cnv_w1a1_tidy.onnx")
```
%% Cell type:markdown id: tags:
Now that the model is exported, let's have a look at its layer structure with Netron. Remember that the visualization below is interactive; you can click on the individual nodes and view the layer attributes, trained weights and so on.
%% Cell type:code id: tags:
``` python
showInNetron(build_dir+"/end2end_cnv_w1a1_tidy.onnx")
```
%% Cell type:markdown id: tags:
You can see that the network is composed of a repeating convolution-convolution-maxpool layer pattern to extract features using 3x3 convolution kernels (with binarized weights), followed by fully connected layers acting as the classifier. Also notice the initial `MultiThreshold` layer at the beginning of the network, which quantizes the float inputs to 8-bit ones.
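%% Cell type:markdown id: tags:
As a quick sanity check (a small sketch, not part of the original flow), we can also inspect the graph programmatically and count how often each ONNX op type occurs in the tidied model:
%% Cell type:code id: tags:
``` python
from collections import Counter
# count op types in the tidied model; we expect Conv, MaxPool, MultiThreshold and MatMul nodes
tidy_model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_tidy.onnx")
print(Counter(node.op_type for node in tidy_model.graph.node))
```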
%% Cell type:markdown id: tags:
### Adding Pre- and Postprocessing <a id='prepost'></a>
Preprocessing and postprocessing steps can be added directly in the ONNX graph. In this case, the preprocessing step divides the input `uint8` data by 255 so the inputs to the CNV-w1a1 network are bounded between [0, 1]. The postprocessing step takes the output of the network and returns the index (0-9) of the image category with the highest probability (top-1).
%% Cell type:code id: tags:
``` python
from finn.util.pytorch import ToTensor
from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.core.datatype import DataType
model = ModelWrapper(build_dir+"/end2end_cnv_w1a1_tidy.onnx")
global_inp_name = model.graph.input[0].name
ishape = model.get_tensor_shape(global_inp_name)
# preprocessing: torchvision's ToTensor divides uint8 inputs by 255
totensor_pyt = ToTensor()
chkpt_preproc_name = build_dir+"/end2end_cnv_w1a1_preproc.onnx"
bo.export_finn_onnx(totensor_pyt, ishape, chkpt_preproc_name)
# join preprocessing and core model
pre_model = ModelWrapper(chkpt_preproc_name)
model = model.transform(MergeONNXModels(pre_model))
# add input quantization annotation: UINT8 for all BNN-PYNQ models
global_inp_name = model.graph.input[0].name
model.set_tensor_datatype(global_inp_name, DataType["UINT8"])
```
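%% Cell type:markdown id: tags:
If you want to double-check that the annotation took effect (an optional sketch, not part of the original flow), you can query the FINN datatype of the global input tensor:
%% Cell type:code id: tags:
``` python
# the global input should now be annotated as UINT8
print(model.get_tensor_datatype(model.graph.input[0].name))
```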
%% Cell type:code id: tags:
``` python
from qonnx.transformation.insert_topk import InsertTopK
from qonnx.transformation.infer_datatypes import InferDataTypes
# postprocessing: insert Top-1 node at the end
model = model.transform(InsertTopK(k=1))
chkpt_name = build_dir+"/end2end_cnv_w1a1_pre_post.onnx"
# tidy-up again
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())
model = model.transform(RemoveStaticGraphInputs())
model.save(chkpt_name)
```
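%% Cell type:markdown id: tags:
For intuition, here is what the inserted top-1 postprocessing computes, expressed as a small plain-numpy sketch (illustration only, not part of the flow):
%% Cell type:code id: tags:
``` python
import numpy as np
# ten made-up class scores; the TopK (k=1) node returns the index of the largest one
logits = np.array([0.1, 2.3, -0.5, 4.2, 0.0, 1.1, -1.2, 0.7, 3.3, 0.2])
print(np.argsort(logits)[::-1][:1])  # -> [3], the top-1 class index
```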
%% Cell type:code id: tags:
``` python
showInNetron(build_dir+"/end2end_cnv_w1a1_pre_post.onnx")
```
%% Cell type:markdown id: tags:
## 2. How FINN Implements Convolutions: Lowering and Streamlining
In FINN, we implement convolutions with the *lowering* approach: we convert them to matrix-matrix multiply operations, where one of the matrices is generated by sliding a window over the input image. You can read more about the sliding window operator and how convolution lowering works [in this notebook](https://github.com/maltanar/qnn-inference-examples/blob/master/3-convolutional-binarized-gtsrb.ipynb). The streaming dataflow architecture we will end up with is going to look something like this figure from the [FINN-R paper](https://arxiv.org/abs/1809.04570):
![](cnv-mp-fc.png)
Note how the convolution layer looks very similar to the fully connected one in terms of the matrix-vector-threshold unit (MVTU), but now the MVTU is preceded by a sliding window unit that produces the matrix from the input image. All of these building blocks, including the `MaxPool` layer you see in this figure, exist as templated Vivado HLS C++ functions in [finn-hlslib](https://github.com/Xilinx/finn-hlslib).
To target this kind of hardware architecture with our network we'll apply a convolution lowering transformation, in addition to streamlining. You may recall the *streamlining transformation* that we applied to the TFC-w1a1 network, which is a series of mathematical simplifications that allow us to get rid of floating point scaling operations by implementing few-bit activations as thresholding operations.
**The current implementation of streamlining is highly network-specific and may not work for your network if its topology is very different from the example network here. We hope to rectify this in future releases.**
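%% Cell type:markdown id: tags:
To make the lowering idea concrete, here is a minimal plain-numpy sketch (illustration only, not FINN code): every 3x3 window of the input becomes one row of a matrix, so the convolution turns into a matrix multiply with the flattened kernel weights.
%% Cell type:code id: tags:
``` python
import numpy as np
ifm = np.arange(16, dtype=np.float32).reshape(1, 4, 4)  # 1 channel, 4x4 input image
k = 3                                                   # 3x3 kernel, stride 1, no padding
rows = []
for y in range(ifm.shape[1] - k + 1):
    for x in range(ifm.shape[2] - k + 1):
        rows.append(ifm[:, y:y+k, x:x+k].flatten())     # one sliding window per output pixel
im2col_matrix = np.stack(rows)                          # shape (4, 9): one row per output pixel
weights = np.ones((9, 1), dtype=np.float32)             # one flattened 3x3 kernel (dummy values)
out = im2col_matrix @ weights                           # equivalent to the 2D convolution output
print(im2col_matrix.shape, out.flatten())
```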
%% Cell type:code id: tags:
``` python
from finn.transformation.streamline import Streamline
from qonnx.transformation.lower_convs_to_matmul import LowerConvsToMatMul
from qonnx.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
import finn.transformation.streamline.absorb as absorb
from finn.transformation.streamline.reorder import MakeMaxPoolNHWC, MoveScalarLinearPastInvariants
from qonnx.transformation.infer_data_layouts import InferDataLayouts
from qonnx.transformation.general import RemoveUnusedTensors
model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_pre_post.onnx")
model = model.transform(MoveScalarLinearPastInvariants())
model = model.transform(Streamline())
model = model.transform(LowerConvsToMatMul())
model = model.transform(MakeMaxPoolNHWC())
model = model.transform(absorb.AbsorbTransposeIntoMultiThreshold())
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(Streamline())
# absorb final add-mul nodes into TopK
model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())
model = model.transform(InferDataLayouts())
model = model.transform(RemoveUnusedTensors())
model.save(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")
```
%% Cell type:markdown id: tags:
We won't go into too much detail about what happens in each transformation and why they are called in this particular order (feel free to visualize the intermediate steps with Netron yourself if you are curious), but here is a brief summary:
* `Streamline` moves floating point scaling and addition operations closer to the input of the nearest thresholding activation and absorbs them into thresholds
* `LowerConvsToMatMul` converts ONNX `Conv` nodes into sequences of `Im2Col, MatMul` nodes as discussed above. `Im2Col` is a custom FINN ONNX high-level node type that implements the sliding window operator.
* `MakeMaxPoolNHWC` and `AbsorbTransposeIntoMultiThreshold` convert the *data layout* of the network into the NHWC data layout that finn-hlslib primitives use. NHWC means the tensor dimensions are ordered as `(N : batch, H : height, W : width, C : channels)` (assuming 2D images). The ONNX standard ops normally use the NCHW layout, but the ONNX intermediate representation itself does not dictate any data layout.
* You may recall `ConvertBipolarMatMulToXnorPopcount` from the TFC-w1a1 example, which is needed to implement bipolar-by-bipolar (w1a1) networks correctly using finn-hlslib.
Let's visualize the streamlined and lowered network with Netron. Observe how all the `Conv` nodes have turned into pairs of `Im2Col, MatMul` nodes, and how many nodes, including `BatchNorm, Mul, Add` nodes, have disappeared and been replaced with `MultiThreshold` nodes.
%% Cell type:code id: tags:
``` python
showInNetron(build_dir+"/end2end_cnv_w1a1_streamlined.onnx")
```
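%% Cell type:markdown id: tags:
The same observation can be made programmatically (an optional sketch): after lowering there should be no `Conv` nodes left in the streamlined model.
%% Cell type:code id: tags:
``` python
streamlined_model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")
# all Conv nodes should have been lowered to Im2Col + MatMul pairs
assert len(streamlined_model.get_nodes_by_op_type("Conv")) == 0
print(sorted(set(node.op_type for node in streamlined_model.graph.node)))
```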
%% Cell type:markdown id: tags:
## 3. Partitioning, Conversion to HLS Layers and Folding
The next steps will be (again) very similar to what we did for the TFC-w1a1 network. We'll first convert the layers that we can put into the FPGA into their HLS equivalents and separate them out into a *dataflow partition*:
%% Cell type:code id: tags:
``` python
import finn.transformation.fpgadataflow.convert_to_hls_layers as to_hls
from finn.transformation.fpgadataflow.create_dataflow_partition import (
    CreateDataflowPartition,
)
from finn.transformation.move_reshape import RemoveCNVtoFCFlatten
from qonnx.custom_op.registry import getCustomOp
from qonnx.transformation.infer_data_layouts import InferDataLayouts
# choose the memory mode for the MVTU units, decoupled or const
mem_mode = "decoupled"
model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")
model = model.transform(to_hls.InferBinaryMatrixVectorActivation(mem_mode))
model = model.transform(to_hls.InferQuantizedMatrixVectorActivation(mem_mode))
# TopK to LabelSelect
model = model.transform(to_hls.InferLabelSelectLayer())
# input quantization (if any) to standalone thresholding
model = model.transform(to_hls.InferThresholdingLayer())
model = model.transform(to_hls.InferConvInpGen())
model = model.transform(to_hls.InferStreamingMaxPool())
# get rid of Reshape(-1, 1) operation between hlslib nodes
model = model.transform(RemoveCNVtoFCFlatten())
# get rid of Transpose -> Transpose identity seq
model = model.transform(absorb.AbsorbConsecutiveTransposes())
# infer tensor data layouts
model = model.transform(InferDataLayouts())
parent_model = model.transform(CreateDataflowPartition())
parent_model.save(build_dir + "/end2end_cnv_w1a1_dataflow_parent.onnx")
sdp_node = parent_model.get_nodes_by_op_type("StreamingDataflowPartition")[0]
sdp_node = getCustomOp(sdp_node)
dataflow_model_filename = sdp_node.get_nodeattr("model")
# save the dataflow partition with a different name for easier access
dataflow_model = ModelWrapper(dataflow_model_filename)
dataflow_model.save(build_dir + "/end2end_cnv_w1a1_dataflow_model.onnx")
```
%% Cell type:markdown id: tags:
Notice the additional `RemoveCNVtoFCFlatten` transformation that was not used for TFC-w1a1. In the last Netron visualization you may have noticed a `Reshape` operation towards the end of the network, where the convolutional part of the network ends and the fully-connected layers start. That `Reshape` is essentially a tensor flattening operation, which we can remove for the purposes of hardware implementation. We can examine the contents of the dataflow partition with Netron, and observe the `ConvolutionInputGenerator`, `MatrixVectorActivation` and `StreamingMaxPool_Batch` nodes that implement the sliding window, matrix multiply and maxpool operations in hlslib. *Note that the MatrixVectorActivation instances following the ConvolutionInputGenerator nodes are really implementing the convolutions, despite the name. The final three MatrixVectorActivation instances implement actual FC layers.*
%% Cell type:code id: tags:
``` python
showInNetron(build_dir + "/end2end_cnv_w1a1_dataflow_parent.onnx")
```
%% Cell type:markdown id: tags:
Note that pretty much everything has gone into the `StreamingDataflowPartition` node; the only operation remaining is to apply a `Transpose` to obtain NHWC input from an NCHW input (the ONNX default).
%% Cell type:code id: tags:
``` python
showInNetron(build_dir + "/end2end_cnv_w1a1_dataflow_model.onnx")
```
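%% Cell type:markdown id: tags:
To see the sequence of HLS building blocks without opening Netron (an optional sketch), we can simply list the nodes of the dataflow model:
%% Cell type:code id: tags:
``` python
dataflow_model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_dataflow_model.onnx")
# print every layer's op type and name in execution order
for node in dataflow_model.graph.node:
    print("%-30s %s" % (node.op_type, node.name))
```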
%% Cell type:markdown id: tags:
Now we have to set the *folding factors* for certain layers to adjust the performance of our accelerator, similar to the TFC-w1a1 example. We'll also set the desired FIFO depths around those layers, which are important to achieve full throughput in the accelerator.
%% Cell type:code id: tags:
``` python
model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_dataflow_model.onnx")
fc_layers = model.get_nodes_by_op_type("MatrixVectorActivation")
# each tuple is (PE, SIMD, in_fifo_depths) for a layer
folding = [
    (16, 3, [128]),
    (32, 32, [128]),
    (16, 32, [128]),
    (16, 32, [128]),
    (4, 32, [81]),
    (1, 32, [2]),
    (1, 4, [2]),
    (1, 8, [128]),
    (5, 1, [3]),
]
for fcl, (pe, simd, ififodepth) in zip(fc_layers, folding):
    fcl_inst = getCustomOp(fcl)
    fcl_inst.set_nodeattr("PE", pe)
    fcl_inst.set_nodeattr("SIMD", simd)
    fcl_inst.set_nodeattr("inFIFODepths", ififodepth)
# use same SIMD values for the sliding window operators
swg_layers = model.get_nodes_by_op_type("ConvolutionInputGenerator")
for i in range(len(swg_layers)):
    swg_inst = getCustomOp(swg_layers[i])
    simd = folding[i][1]
    swg_inst.set_nodeattr("SIMD", simd)
model = model.transform(GiveUniqueNodeNames())
model.save(build_dir + "/end2end_cnv_w1a1_folded.onnx")
```
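%% Cell type:markdown id: tags:
As a rough sanity check of the chosen folding (an optional sketch; this assumes the `MatrixVectorActivation` nodes expose `MW` and `MH` matrix-size attributes, which may differ between FINN versions), we can estimate the number of cycles each layer needs per input as the matrix size divided by PE*SIMD:
%% Cell type:code id: tags:
``` python
for fcl in model.get_nodes_by_op_type("MatrixVectorActivation"):
    inst = getCustomOp(fcl)
    mw, mh = inst.get_nodeattr("MW"), inst.get_nodeattr("MH")
    pe, simd = inst.get_nodeattr("PE"), inst.get_nodeattr("SIMD")
    # total matrix elements divided by the number of parallel MACs (PE * SIMD)
    print("%s: PE=%d SIMD=%d, ~%d cycles per input" % (fcl.name, pe, simd, (mw * mh) // (pe * simd)))
```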
%% Cell type:markdown id: tags:
Below we visualize in Netron to observe the `StreamingDataWidthConverter` and `StreamingFIFO` nodes that have been inserted into the graph, as well as the folding factors in the `PE` and `SIMD` attributes of each `MatrixVectorActivation`.
%% Cell type:code id: tags:
``` python
showInNetron(build_dir + "/end2end_cnv_w1a1_folded.onnx")
```
%% Cell type:markdown id: tags:
Our network is now ready and we can start with the hardware generation.
%% Cell type:markdown id: tags:
## 4. Hardware Generation
From this point onward, the steps we have to follow do not depend on the particular network and will be exactly the same as in the TFC-w1a1 example. The `ZynqBuild` transformation below launches the bitfile generation, **which may take about 30 minutes depending on your host computer**. For more details about what's going on in this step, please consult the [TFC end-to-end notebook](tfc_end2end_example.ipynb) or the appropriate section in the [FINN documentation](https://finn.readthedocs.io/en/latest/hw_build.html).
%% Cell type:code id: tags:
``` python
test_pynq_board = "Pynq-Z1"
target_clk_ns = 10
from finn.transformation.fpgadataflow.make_zynq_proj import ZynqBuild
model = ModelWrapper(build_dir+"/end2end_cnv_w1a1_folded.onnx")
model = model.transform(ZynqBuild(platform = test_pynq_board, period_ns = target_clk_ns))
```
%% Cell type:markdown id: tags:
After the `ZynqBuild` we run one additional transformation to generate a PYNQ driver for the accelerator.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_pynq_driver import MakePYNQDriver
model = model.transform(MakePYNQDriver("zynq-iodma"))
```
%% Cell type:code id: tags:
``` python
model.save(build_dir + "/end2end_cnv_w1a1_synth.onnx")
```
%% Cell type:markdown id: tags:
## 5. Deployment and Remote Execution
Now that we're done with the hardware generation, we can copy the necessary files onto our PYNQ board.
**Make sure you've [set up the SSH keys for your PYNQ board](https://finn-dev.readthedocs.io/en/latest/getting_started.html#pynq-board-first-time-setup) before executing this step.**
%% Cell type:code id: tags:
``` python
import os
# set up the following values according to your own environment
# FINN will use ssh to deploy and run the generated accelerator
ip = "192.168.2.99"
username = os.getenv("PYNQ_USERNAME", "xilinx")
password = os.getenv("PYNQ_PASSWORD", "xilinx")
port = os.getenv("PYNQ_PORT", 22)
target_dir = os.getenv("PYNQ_TARGET_DIR", "/home/xilinx/finn_cnv_end2end_example")
# set up ssh options to only allow publickey authentication
options = "-o PreferredAuthentications=publickey -o PasswordAuthentication=no"
# test access to PYNQ board
! ssh {options} {username}@{ip} -p {port} cat /var/run/motd.dynamic
```
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_deployment import DeployToPYNQ
model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_synth.onnx")
model = model.transform(DeployToPYNQ(ip, port, username, password, target_dir))
model.save(build_dir + "/end2end_cnv_w1a1_pynq_deploy.onnx")
```
%% Cell type:code id: tags:
``` python
target_dir_pynq = target_dir + "/" + model.get_metadata_prop("pynq_deployment_dir").split("/")[-1]
target_dir_pynq
```
%% Cell type:code id: tags:
``` python
! ssh {options} {username}@{ip} -p {port} 'ls -l {target_dir_pynq}'
```
%% Cell type:markdown id: tags:
We only have two more steps to be able to remotely execute the deployed bitfile with some test data from the CIFAR-10 dataset. Let's load up some test data that comes bundled with FINN -- *and before you ask, that's supposed to be a cat (CIFAR-10 class number 3)*.
%% Cell type:code id: tags:
``` python
import pkg_resources as pk
import matplotlib.pyplot as plt
import numpy as np
fn = pk.resource_filename("finn.qnn-data", "cifar10/cifar10-test-data-class3.npz")
x = np.load(fn)["arr_0"]
x = x.reshape(3, 32, 32).transpose(1, 2, 0)
plt.imshow(x)
```
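%% Cell type:markdown id: tags:
After the transpose the image is in HWC order, which matches the NHWC layout the accelerator expects (the batch dimension is added via the reshape in the execution cell further below). A quick check:
%% Cell type:code id: tags:
``` python
# expect (32, 32, 3): height, width, channels
print(x.shape, x.dtype)
```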
%% Cell type:markdown id: tags:
Recall that we partitioned our original network into a parent graph that contained the non-synthesizable nodes and a child graph that contained the bulk of the network, which we turned into a bitfile. The only operator left outside the FPGA partition was a `Transpose` to convert NCHW images into NHWC ones. Thus, we can skip the execution in the parent as long as we ensure our image has the expected data layout, which we have done above.
%% Cell type:code id: tags:
``` python
import numpy as np
from finn.core.onnx_exec import execute_onnx
model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_pynq_deploy.onnx")
iname = model.graph.input[0].name
oname = model.graph.output[0].name
ishape = model.get_tensor_shape(iname)
input_dict = {iname: x.astype(np.float32).reshape(ishape)}
ret = execute_onnx(model, input_dict, True)
```
%% Cell type:code id: tags:
``` python
ret[oname]
```
%% Cell type:markdown id: tags:
We see that the network correctly predicts this as class 3 ("cat").
%% Cell type:markdown id: tags:
### Validating the Accuracy on a PYNQ Board <a id='validation'></a>
All the command line prompts here are meant to be executed with `sudo` on the PYNQ board, so we'll use a workaround (`echo password | sudo -S command`) to get that working from this notebook running on the host computer.
**Ensure that your PYNQ board has a working internet connection for the next steps, since there is some downloading involved.**
To validate the accuracy, we first need to install the [`dataset-loading`](https://github.com/fbcotter/dataset_loading) Python package on the PYNQ board. This will give us a convenient way of downloading and accessing the CIFAR-10 dataset.
Command to execute on PYNQ:
```pip3 install git+https://github.com/fbcotter/dataset_loading.git@0.0.4#egg=dataset_loading```
%% Cell type:code id: tags:
``` python
! ssh {options} -t {username}@{ip} -p {port} 'echo {password} | sudo -S pip3 install git+https://github.com/fbcotter/dataset_loading.git@0.0.4#egg=dataset_loading'
```
%% Cell type:markdown id: tags:
We can now use the `validate.py` script that was generated together with the driver to measure top-1 accuracy on the CIFAR-10 dataset.
Command to execute on PYNQ:
`python3.6 validate.py --dataset cifar10 --batchsize 1000`
%% Cell type:code id: tags:
``` python
! ssh {options} -t {username}@{ip} -p {port} 'cd {target_dir_pynq}; echo {password} | sudo -S python3.6 validate.py --dataset cifar10 --batchsize 1000'
```
%% Cell type:markdown id: tags:
We see that the final top-1 accuracy is 84.19%, which is very close to the 84.22% reported on the [BNN-PYNQ accuracy table in Brevitas](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq).
%% Cell type:markdown id: tags:
# FINN - End-to-End Flow
-----------------------------------------------------------------
In this notebook, we will show how to take a simple, binarized, fully-connected network trained on the MNIST data set and take it all the way down to a customized bitfile running on a PYNQ board.
This notebook is quite lengthy, and some of the cells (involving Vivado synthesis) may take up to an hour to finish running. To let you save and resume your progress, we will save the intermediate ONNX models that are generated in the various steps to disk, so that you can jump back directly to where you left off.
%% Cell type:markdown id: tags:
## Overview
The FINN compiler comes with many *transformations* that modify the ONNX representation of the network according to certain patterns. This notebook will demonstrate a *possible* sequence of such transformations to take a particular trained network all the way down to hardware, as shown in the figure below.
%% Cell type:markdown id: tags:
![](finn-design-flow-example.svg)
%% Cell type:markdown id: tags:
The white fields show the state of the network representation in the respective step. The colored fields represent the transformations that are applied to the network to achieve a certain result. The diagram is divided into 5 sections, each represented by a different color and comprising several flow steps. The flow starts in the top left corner with Brevitas export (green section), followed by the preparation of the network (blue section) for Vivado HLS synthesis and Vivado IPI stitching (orange section), and finally building a PYNQ overlay bitfile and testing it on a PYNQ board (yellow section).
There is an additional section for functional verification (red section) on the right side of the diagram, which we will not cover in this notebook. For details, please take a look at the verification notebook, which you can find [here](tfc_end2end_verification.ipynb).
This Jupyter notebook is organized based on the sections described above. We will use the following helper functions: `showSrc` to show source code of FINN library calls, and `showInNetron` to show the ONNX model at the current transformation step. The Netron displays are interactive, but they only work when running the notebook actively and not on GitHub (i.e. if you are viewing this on GitHub you'll only see blank squares).
%% Cell type:code id: tags:
``` python
from finn.util.visualization import showSrc, showInNetron
from finn.util.basic import make_build_dir
import os
build_dir = os.environ["FINN_BUILD_DIR"]
```
%% Cell type:markdown id: tags:
## Outline
-------------
1. [Brevitas export](#brev_exp)
2. [Network preparation](#nw_prep)
3. [Hardware build](#vivado)
4. [PYNQ deployment](#hw_test)
%% Cell type:markdown id: tags:
## 1. Brevitas export <a id='brev_exp'></a>
FINN expects an ONNX model as input. This can be a model trained with [Brevitas](https://github.com/Xilinx/brevitas). Brevitas is a PyTorch library for quantization-aware training and the FINN Docker image comes with several [example Brevitas networks](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq). To show the FINN end-to-end flow, we'll use the TFC-w1a1 model as example network.
First a few things have to be imported. Then the model can be loaded with the pretrained weights.
%% Cell type:code id: tags:
``` python
import onnx
from finn.util.test import get_test_model_trained
import brevitas.onnx as bo
tfc = get_test_model_trained("TFC", 1, 1)
bo.export_finn_onnx(tfc, (1, 1, 28, 28), build_dir+"/tfc_w1_a1.onnx"); # semicolon added to suppress log
```
%% Cell type:markdown id: tags:
The model has now been exported, loaded with the pretrained weights and saved under the name "tfc_w1_a1.onnx".
To visualize the exported model, Netron can be used. Netron is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on the individual nodes and view the properties.
%% Cell type:code id: tags:
``` python
showInNetron(build_dir+"/tfc_w1_a1.onnx")
```
%% Cell type:markdown id: tags:
Now that we have the model in .onnx format, we can work with it using FINN. For that, `ModelWrapper` is used. It is a wrapper around the ONNX model which provides several helper functions to make it easier to work with the model. `ModelWrapper` is imported from the [QONNX repo](https://github.com/fastmachinelearning/qonnx); this repository contains several functionalities that are used in FINN.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
from qonnx.core.modelwrapper import ModelWrapper from qonnx.core.modelwrapper import ModelWrapper
model = ModelWrapper(build_dir+"/tfc_w1_a1.onnx") model = ModelWrapper(build_dir+"/tfc_w1_a1.onnx")
``` ```
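%% Cell type:markdown id: tags:
To get a feel for these helper functions, the snippet below (a minimal illustrative sketch, not part of the original flow) queries a few basic properties of the freshly loaded model, such as the input tensor name, its shape and its FINN datatype annotation.
%% Cell type:code id: tags:
``` python
# a few ModelWrapper helpers in action (illustrative only)
inp_name = model.graph.input[0].name
print("Input tensor name:", inp_name)
print("Input tensor shape:", model.get_tensor_shape(inp_name))
print("Input FINN datatype:", model.get_tensor_datatype(inp_name))
print("Number of nodes in the graph:", len(model.graph.node))
```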
%% Cell type:markdown id: tags:
Now the model is prepared and could be simulated using Python. How this works is described in the Jupyter notebook about verification and can be found [here](tfc_end2end_verification.ipynb#simpy).
The model can now also be processed in different ways. FINN is built around analysis and transformation passes, which can be applied to the model. An analysis pass extracts specific information about the model and returns it to the user in the form of a dictionary. A transformation pass changes the model and returns the changed model back to the FINN flow.
Since the goal in this notebook is to process the model to such an extent that a bitstream can be generated from it, the focus is on the transformations that are necessary for this. In the next section these are discussed in more detail.
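As a small illustration of the analysis-pass idea (a sketch written for this notebook, not a built-in FINN pass), the cell below defines a function that counts how many nodes of each op type are present and runs it through `model.analysis`, which simply calls the function and returns its dictionary.
%% Cell type:code id: tags:
``` python
# minimal custom analysis pass: count nodes per op type (illustrative only)
def count_ops(model):
    counts = {}
    for node in model.graph.node:
        counts[node.op_type] = counts.get(node.op_type, 0) + 1
    return counts

model.analysis(count_ops)
```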
%% Cell type:markdown id: tags:
## 2. Network preparation <a id='nw_prep'></a>
* [FINN-style Dataflow Architectures](#dataflow_arch)
* [Tidy-up transformations](#basic_trafo)
* [Streamlining](#streamline)
* [Conversion to HLS layers](#hls_layers)
* [Creating a Dataflow Partition](#dataflow_partition)
* [Folding: Adjusting the Parallelism](#folding)
In this section, we will put the network through a series of transformations that puts it in a form that can be stitched together to form a FINN-style dataflow architecture, yielding a high-performance, high-efficiency FPGA accelerator.
%% Cell type:markdown id: tags:
### FINN-style Dataflow Architectures <a id='dataflow_arch'></a>
We start with a quick recap of FINN-style dataflow architectures. The key idea in such architectures is to parallelize across layers as well as within layers by dedicating a proportionate amount of compute resources to each layer, as illustrated in the figure below taken from the [FINN-R paper](https://arxiv.org/pdf/1809.04570.pdf):
![](finn-hw-arch.png)
In practice, the compute arrays are instantiated by function calls to optimized Vivado HLS building blocks from the [finn-hlslib](https://github.com/Xilinx/finn-hlslib) library. As these function calls can only handle certain patterns/cases, we need to transform the network into an appropriate form so that we can replace network layers with these function calls, which is the goal of the network preparation process.
%% Cell type:markdown id: tags:
### Tidy-up transformations <a id='basic_trafo'></a>
This section deals with some basic transformations, which are applied to the model like a kind of "tidy-up" to make it easier to process. They do not appear in the diagram above, but they are applied in many steps in the FINN flow to postprocess the model after a transformation and/or to prepare it for the next transformation.
%% Cell type:markdown id: tags:
These transformations are:
* GiveUniqueNodeNames
* GiveReadableTensorNames
* InferShapes
* InferDataTypes
* FoldConstants
* RemoveStaticGraphInputs
%% Cell type:markdown id: tags:
In the first two transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames`) the nodes in the graph are first given unique (by enumeration) names, then the tensors are given human-readable names (based on the node names). The following two transformations (`InferShapes`, `InferDataTypes`) derive the shapes and data types of the tensors from the model properties and set them in the `ValueInfo` of the model. These transformations can almost always be applied without negative effects and do not affect the structure of the graph, ensuring that all the information needed is available.
The next listed transformation is `FoldConstants`, which performs constant folding. It identifies a node with constant inputs and determines its output. The result is then set as a constant-only input for the following node and the old node is removed. Although this transformation changes the structure of the model, it is almost always desired and can be applied to any model. Finally, we have `RemoveStaticGraphInputs` to remove any top-level graph inputs that already have ONNX initializers associated with them.
%% Cell type:markdown id: tags:
These transformations can be imported and applied as follows.
%% Cell type:code id: tags:
``` python
from qonnx.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames, RemoveStaticGraphInputs
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.infer_datatypes import InferDataTypes
from qonnx.transformation.fold_constants import FoldConstants
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())
model = model.transform(RemoveStaticGraphInputs())
model.save(build_dir+"/tfc_w1_a1_tidy.onnx")
```
%% Cell type:markdown id: tags:
The result of these transformations can be viewed with Netron after the model has been saved again. By clicking on the individual nodes, it can now be seen, for example, that each node has been given a name. Also, the whole upper area of the graph has been constant-folded away, so that the first node is now "Reshape".
%% Cell type:code id: tags:
``` python
showInNetron(build_dir+"/tfc_w1_a1_tidy.onnx")
```
%% Cell type:markdown id: tags:
### Adding Pre- and Postprocessing <a id='prepost'></a>
In many cases, it's common to apply some preprocessing to the raw data in a machine learning framework prior to training. For image classification networks, this may include conversion of raw 8-bit RGB values into floating point values between 0 and 1. Similarly, at the output of the network some postprocessing may be performed during deployment, such as extracting the indices of the classifications with the largest value (top-K indices).
In FINN, we can bake some of these pre/postprocessing operations into the graph, and in some cases these can be highly beneficial for performance by allowing our accelerator to directly consume raw data instead of going through CPU preprocessing.
We'll demonstrate this for our small image classification network as follows. Brevitas preprocesses BNN-PYNQ network inputs with `torchvision.transforms.ToTensor()` [prior to training](https://github.com/Xilinx/brevitas/blob/master/src/brevitas_examples/bnn_pynq/trainer.py#L104), which converts 8-bit RGB values into floats between 0 and 1 by dividing the input by 255. We can achieve the same effect in FINN by exporting a single-node ONNX graph for division by 255 (which already exists as `finn.util.pytorch.ToTensor`) and merging this with our original model. Finally, we're going to mark our input tensor as 8-bit to let FINN know which level of precision to use.
%% Cell type:code id: tags:
``` python
from finn.util.pytorch import ToTensor
from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.core.datatype import DataType
model = ModelWrapper(build_dir+"/tfc_w1_a1_tidy.onnx")
global_inp_name = model.graph.input[0].name
ishape = model.get_tensor_shape(global_inp_name)
# preprocessing: torchvision's ToTensor divides uint8 inputs by 255
totensor_pyt = ToTensor()
chkpt_preproc_name = build_dir+"/tfc_w1_a1_preproc.onnx"
bo.export_finn_onnx(totensor_pyt, ishape, chkpt_preproc_name)
# join preprocessing and core model
pre_model = ModelWrapper(chkpt_preproc_name)
model = model.transform(MergeONNXModels(pre_model))
# add input quantization annotation: UINT8 for all BNN-PYNQ models
global_inp_name = model.graph.input[0].name
model.set_tensor_datatype(global_inp_name, DataType["UINT8"])
model.save(build_dir+"/tfc_w1_a1_with_preproc.onnx")
showInNetron(build_dir+"/tfc_w1_a1_with_preproc.onnx")
```
%% Cell type:markdown id: tags:
You can observe two changes in the graph above: a `Div` node has appeared in the beginning to perform the input preprocessing, and the `global_in` tensor now has a quantization annotation to mark it as an unsigned 8-bit value.
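If you prefer to check this programmatically rather than in Netron, the short sketch below (illustrative only) reads back the datatype annotation of the global input from the saved model.
%% Cell type:code id: tags:
``` python
# quick programmatic check of the input quantization annotation (illustrative only)
model_check = ModelWrapper(build_dir+"/tfc_w1_a1_with_preproc.onnx")
print(model_check.get_tensor_datatype(model_check.graph.input[0].name))
```
%% Cell type:markdown id: tags: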
For the postprocessing we'll insert a TopK node for k=1 at the end of our graph. This will extract the index (class number) for the largest-valued output.
%% Cell type:code id: tags:
``` python
from qonnx.transformation.insert_topk import InsertTopK
# postprocessing: insert Top-1 node at the end
model = model.transform(InsertTopK(k=1))
chkpt_name = build_dir+"/tfc_w1_a1_pre_post.onnx"
# tidy-up again
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())
model = model.transform(RemoveStaticGraphInputs())
model.save(chkpt_name)
showInNetron(build_dir+"/tfc_w1_a1_pre_post.onnx")
```
%% Cell type:markdown id: tags:
Notice the `TopK` node that has appeared at the end of the network. With our pre- and postprocessing in place, we can move on to the next step in the flow, which is streamlining.
%% Cell type:markdown id: tags:
### Streamlining <a id='streamline'></a>
Streamlining is a transformation containing several sub-transformations. The goal of streamlining is to eliminate floating point operations by moving them around, then collapsing them into one operation and, in the last step, transforming them into multi-thresholding nodes. For more information on the theoretical background of this, see [this paper](https://arxiv.org/pdf/1709.04060).
Let's have a look at which sub-transformations `Streamline` consists of:
%% Cell type:code id: tags:
``` python
from finn.transformation.streamline import Streamline
showSrc(Streamline)
```
%% Cell type:markdown id: tags:
As can be seen, several transformations are involved in the streamlining transformation. There are move and collapse transformations. In the last step the operations are transformed into multithresholds. The involved transformations can be viewed in detail [here](https://github.com/Xilinx/finn/tree/master/src/finn/transformation/streamline). After each transformation, three of the tidy-up transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames` and `InferDataTypes`) are applied to the model.
After streamlining the network looks as follows:
%% Cell type:code id: tags:
``` python
from finn.transformation.streamline.reorder import MoveScalarLinearPastInvariants
import finn.transformation.streamline.absorb as absorb
model = ModelWrapper(build_dir+"/tfc_w1_a1_pre_post.onnx")
# move initial Mul (from preproc) past the Reshape
model = model.transform(MoveScalarLinearPastInvariants())
# streamline
model = model.transform(Streamline())
model.save(build_dir+"/tfc_w1_a1_streamlined.onnx")
showInNetron(build_dir+"/tfc_w1_a1_streamlined.onnx")
```
%% Cell type:markdown id: tags:
You can see that the network has been simplified considerably compared to the previous step -- a lot of nodes have disappeared between the `MatMul` layers.
**The current implementation of streamlining is highly network-specific and may not work for your network if its topology is very different than the example network here. We hope to rectify this in future releases.**
Our example network is a quantized network with 1-bit bipolar (-1, +1 values) precision, and we want FINN to implement the matrix multiplications as XNOR-popcount operations [as described in the original FINN paper](https://arxiv.org/pdf/1612.07119). For this reason, after streamlining, the resulting bipolar matrix multiplications are converted into xnorpopcount operations. This transformation produces operations that are again collapsed and converted into thresholds. This procedure is shown below.
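To see why this rewrite is valid, the short numpy sketch below (purely illustrative) checks the underlying identity: for bipolar vectors the dot product equals `2 * popcount(XNOR(x, y)) - N`, where `x` and `y` are the {0,1} encodings of the two {-1,+1} vectors and `N` is their length. The actual FINN conversion follows in the next cell.
%% Cell type:code id: tags:
``` python
import numpy as np
# two random bipolar vectors of length N
N = 16
a = np.random.choice([-1, 1], size=N)
b = np.random.choice([-1, 1], size=N)
# map {-1,+1} -> {0,1}
x = (a + 1) // 2
y = (b + 1) // 2
# XNOR is 1 where the bits agree; popcount counts the agreements
popcount = np.sum(x == y)
assert np.dot(a, b) == 2 * popcount - N
print("bipolar dot product:", np.dot(a, b), "== 2*popcount - N:", 2 * popcount - N)
```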
%% Cell type:code id: tags:
``` python
from qonnx.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
from finn.transformation.streamline.round_thresholds import RoundAndClipThresholds
from qonnx.transformation.infer_data_layouts import InferDataLayouts
from qonnx.transformation.general import RemoveUnusedTensors
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(absorb.AbsorbAddIntoMultiThreshold())
model = model.transform(absorb.AbsorbMulIntoMultiThreshold())
# absorb final add-mul nodes into TopK
model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())
model = model.transform(RoundAndClipThresholds())
# bit of tidy-up
model = model.transform(InferDataLayouts())
model = model.transform(RemoveUnusedTensors())
model.save(build_dir+"/tfc_w1a1_ready_for_hls_conversion.onnx")
showInNetron(build_dir+"/tfc_w1a1_ready_for_hls_conversion.onnx")
```
%% Cell type:markdown id: tags:
Observe the pairs of `XnorPopcountMatMul` and `MultiThreshold` layers following each other -- this is the particular pattern that the next step will be looking for in order to convert them to HLS layers.
%% Cell type:markdown id: tags:
### Conversion to HLS layers <a id='hls_layers'></a>
Converts the nodes to HLS layers that correspond to the functions in the [finn-hlslib library](https://finn-hlslib.readthedocs.io/en/latest/). In our case this transformation converts pairs of binary XnorPopcountMatMul layers to MatrixVectorActivation layers. Any immediately following MultiThreshold layers will also be absorbed into the MVTU.
Below is the code for the transformation, and the network is visualized using Netron to show the new structure with `MatrixVectorActivation` nodes, which will correspond to a function call from the [finn-hlslib](https://finn-hlslib.readthedocs.io/en/latest/library/matrixvector.html) library.
%% Cell type:markdown id: tags:
**Note:** The transformation `to_hls.InferBinaryMatrixVectorActivation` receives the string "decoupled" as argument, which indicates the `mem_mode` for the weights. In FINN there are different options to set the way the weights are stored and accessed. For details please have a look at the [FINN readthedocs website](https://finn.readthedocs.io/) under Internals.
%% Cell type:code id: tags:
``` python
import finn.transformation.fpgadataflow.convert_to_hls_layers as to_hls
model = ModelWrapper(build_dir+"/tfc_w1a1_ready_for_hls_conversion.onnx")
model = model.transform(to_hls.InferBinaryMatrixVectorActivation("decoupled"))
# TopK to LabelSelect
model = model.transform(to_hls.InferLabelSelectLayer())
# input quantization (if any) to standalone thresholding
model = model.transform(to_hls.InferThresholdingLayer())
model.save(build_dir+"/tfc_w1_a1_hls_layers.onnx")
showInNetron(build_dir+"/tfc_w1_a1_hls_layers.onnx")
```
%% Cell type:markdown id: tags:
Each MatrixVectorActivation node has two attributes that specify the degree of folding, PE and SIMD. In all nodes the values for these attributes are set to 1 by default, which corresponds to maximum folding (time multiplexing) and thus minimum performance. We will shortly cover how these can be adjusted, but first we want to separate the HLS layers from the non-HLS layers in this network.
%% Cell type:markdown id: tags:
### Creating a Dataflow Partition <a id='dataflow_partition'></a>
In the graph above, you can see that there is a mixture of FINN HLS layers (MatrixVectorActivation and Thresholding_Batch) with one regular ONNX layer (Reshape). To create a bitstream, FINN needs a model with only HLS layers. In order to achieve this, we will use the `CreateDataflowPartition` transformation to create a "dataflow partition" in this graph, separating out the HLS layers into another model, and replacing them with a placeholder layer called StreamingDataflowPartition.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.create_dataflow_partition import CreateDataflowPartition
model = ModelWrapper(build_dir+"/tfc_w1_a1_hls_layers.onnx")
parent_model = model.transform(CreateDataflowPartition())
parent_model.save(build_dir+"/tfc_w1_a1_dataflow_parent.onnx")
showInNetron(build_dir+"/tfc_w1_a1_dataflow_parent.onnx")
```
%% Cell type:markdown id: tags:
We can see that the `MatrixVectorActivation` instances and the `Thresholding_Batch` in the beginning have all been replaced with a single `StreamingDataflowPartition`, which has an attribute `model` that points to the extracted, HLS dataflow-only graph:
%% Cell type:code id: tags:
``` python
from qonnx.custom_op.registry import getCustomOp
sdp_node = parent_model.get_nodes_by_op_type("StreamingDataflowPartition")[0]
sdp_node = getCustomOp(sdp_node)
dataflow_model_filename = sdp_node.get_nodeattr("model")
showInNetron(dataflow_model_filename)
```
%% Cell type:markdown id: tags:
We can see all the extracted `MatrixVectorActivation` instances and the `Thresholding_Batch` have been moved to the child (dataflow) model. We will load the child model with `ModelWrapper` and continue working on it.
%% Cell type:code id: tags:
``` python
model = ModelWrapper(dataflow_model_filename)
```
%% Cell type:markdown id: tags:
### Folding: Adjusting the Parallelism <a id='folding'></a>
*Folding* in FINN describes how much a layer is time-multiplexed in terms of execution resources. There are several *folding factors* for each layer, controlled by the PE (parallelization over outputs) and SIMD (parallelization over inputs) parameters as described by the original [FINN paper](https://arxiv.org/pdf/1612.07119). The higher the PE and SIMD values are set, the faster the generated accelerator will run, and the more FPGA resources it will consume.
Since the folding parameters are node attributes, they can be easily accessed and changed using a helper function of the `ModelWrapper`. But first we take a closer look at one of the nodes that implement a MatrixVectorActivation operation. This is where the Netron visualization helps us: in the diagram above we can see that the model contains four MatrixVectorActivation nodes. So as an example we extract the second node of the graph.
%% Cell type:markdown id: tags:
We can use the higher-level [HLSCustomOp](https://github.com/Xilinx/finn/blob/main/src/finn/custom_op/fpgadataflow/__init__.py) wrappers for this node. These wrappers provide easy access to specific properties of these nodes, such as the folding factors (PE and SIMD). Let's have a look at which node attributes are defined by the CustomOp wrapper, and adjust the SIMD and PE attributes.
%% Cell type:code id: tags:
``` python
fc0 = model.graph.node[1]
fc0w = getCustomOp(fc0)
print("CustomOp wrapper is of class " + fc0w.__class__.__name__)
fc0w.get_nodeattr_types()
```
%% Cell type:markdown id: tags:
We can see that the PE and SIMD are listed as node attributes, as well as the depths of the FIFOs that will be inserted between consecutive layers, and all can be adjusted using `set_nodeattr` subject to certain constraints. There are also a lot of additional attributes that can be set for this node type.
**In this notebook we are setting the folding factors and FIFO depths manually, but in a future version we will support determining the folding factors given an FPGA resource budget according to the analytical model from the [FINN-R paper](https://arxiv.org/pdf/1809.04570).**
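As a rough rule of thumb (an informal sketch, not an exact FINN performance model), a MatrixVectorActivation layer with an MW x MH weight matrix needs on the order of (MW / SIMD) * (MH / PE) clock cycles per input sample, so the chosen PE and SIMD directly trade off latency against resource usage. The cell below evaluates this estimate for a hypothetical layer; the actual folding configuration used in this notebook follows in the next cell.
%% Cell type:code id: tags:
``` python
# rough cycles-per-sample estimate for a hypothetical MVAU layer (illustrative only)
MW, MH = 256, 128   # hypothetical input/output dimensions of the layer
for pe, simd in [(1, 1), (8, 8), (32, 16)]:
    est_cycles = (MW // simd) * (MH // pe)
    print(f"PE={pe:3d} SIMD={simd:3d} -> ~{est_cycles} cycles per input sample")
```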
%% Cell type:code id: tags:
``` python
fc_layers = model.get_nodes_by_op_type("MatrixVectorActivation")
# (PE, SIMD, in_fifo_depth, out_fifo_depth, ramstyle) for each layer
config = [
    (16, 49, [16], [64], "block"),
    (8, 8, [64], [64], "auto"),
    (8, 8, [64], [64], "auto"),
    (10, 8, [64], [10], "distributed"),
]
for fcl, (pe, simd, ififo, ofifo, ramstyle) in zip(fc_layers, config):
    fcl_inst = getCustomOp(fcl)
    fcl_inst.set_nodeattr("PE", pe)
    fcl_inst.set_nodeattr("SIMD", simd)
    fcl_inst.set_nodeattr("inFIFODepths", ififo)
    fcl_inst.set_nodeattr("outFIFODepths", ofifo)
    fcl_inst.set_nodeattr("ram_style", ramstyle)
# set parallelism for input quantizer to be same as first layer's SIMD
inp_qnt_node = model.get_nodes_by_op_type("Thresholding_Batch")[0]
inp_qnt = getCustomOp(inp_qnt_node)
inp_qnt.set_nodeattr("PE", 49)
```
%% Cell type:markdown id: tags:
We are setting PE and SIMD so that each layer has a total folding of 16.
%% Cell type:markdown id: tags:
Besides PE and SIMD three other node attributes are set. `ram_style` specifies how the weights are to be stored (BRAM, LUTRAM, and so on). It can be selected explicitly or, with the option `auto`, you can let Vivado decide.
`inFIFODepths` and `outFIFODepths` specify the FIFO depths that the node needs from the surrounding FIFOs. These attributes are used in the `InsertFIFO` transformation to insert the appropriate FIFOs between the nodes, which will be automatically called as part of the hardware build process.
In previous versions of FINN we had to call transformations to insert data width converters, FIFOs and `TLastMarker` manually at this step. This is no longer needed, as all this is taken care of by the `ZynqBuild` or `VitisBuild` transformations.
%% Cell type:code id: tags:
``` python
model.save(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
showInNetron(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
```
%% Cell type:markdown id: tags:
This completes the network preparation, and the network can be passed on to the next stage, the hardware build (Vivado HLS and IPI), which is described below.
%% Cell type:markdown id: tags:
## 3. Hardware Build <a id='vivado'></a>
We're finally ready to start generating hardware from our network. Depending on whether you want to target a Zynq or Alveo platform, FINN offers two transformations to build the accelerator, integrate it into an appropriate shell and build a bitfile. These are `ZynqBuild` and `VitisBuild` for Zynq and Alveo, respectively. In this notebook we'll demonstrate the `ZynqBuild` as these boards are more common and it's much faster to complete bitfile generation for the smaller FPGAs found on them.
As we will be dealing with FPGA synthesis tools in these tasks, we'll define two helper variables that describe the Xilinx FPGA part name and the PYNQ board name that we are targeting.
%% Cell type:code id: tags:
``` python
# print the names of the supported PYNQ boards
from finn.util.basic import pynq_part_map
print(pynq_part_map.keys())
```
%% Cell type:code id: tags:
``` python
# change this if you have a different PYNQ board, see list above
pynq_board = "Pynq-Z1"
fpga_part = pynq_part_map[pynq_board]
target_clk_ns = 10
```
%% Cell type:markdown id: tags:
In previous versions of FINN, we had to manually go through several steps to generate HLS code, stitch IP, create a PYNQ project and run synthesis. All these steps are now performed by the `ZynqBuild` transform (or the `VitisBuild` transform for Alveo). **As this involves calling HLS synthesis and Vivado synthesis, this transformation will run for some time (up to half an hour depending on your PC).**
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_zynq_proj import ZynqBuild
model = ModelWrapper(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
model = model.transform(ZynqBuild(platform = pynq_board, period_ns = target_clk_ns))
```
%% Cell type:markdown id: tags:
After the `ZynqBuild` we run one additional transformation to generate a PYNQ driver for the accelerator.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_pynq_driver import MakePYNQDriver
model = model.transform(MakePYNQDriver("zynq-iodma"))
```
%% Cell type:code id: tags:
``` python
model.save(build_dir + "/tfc_w1_a1_post_synthesis.onnx")
```
%% Cell type:markdown id: tags:
### Examining the generated outputs <a id='gen_outputs'></a>
Let's start by viewing the post-synthesis model in Netron:
%% Cell type:code id: tags:
``` python
showInNetron(build_dir + "/tfc_w1_a1_post_synthesis.onnx")
```
%% Cell type:markdown id: tags:
We can see that our sequence of HLS layers has been replaced with `StreamingDataflowPartition`s, each of which points to a different ONNX file. You can open a Netron session for each of them to view their contents. Here, the first and last partitions contain only an `IODMA` node, which was inserted automatically to move data between DRAM and the accelerator. Let's take a closer look at the middle partition, which contains all our layers:
%% Cell type:code id: tags:
``` python
model = ModelWrapper(build_dir + "/tfc_w1_a1_post_synthesis.onnx")
sdp_node_middle = getCustomOp(model.graph.node[1])
postsynth_layers = sdp_node_middle.get_nodeattr("model")
showInNetron(postsynth_layers)
```
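%% Cell type:markdown id: tags:
If you'd like to see at a glance where each partition's ONNX file is located, a short loop like the one below (illustrative only) prints the `model` attribute of every `StreamingDataflowPartition` node.
%% Cell type:code id: tags:
``` python
# list every partition and the ONNX file it points to (illustrative only)
for sdp in model.get_nodes_by_op_type("StreamingDataflowPartition"):
    print(sdp.name, "->", getCustomOp(sdp).get_nodeattr("model"))
```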
%% Cell type:markdown id: tags:
We can see that `StreamingFIFO` and `StreamingDataWidthConverter` instances have been automatically inserted into the graph prior to hardware build. Transformations like `ZynqBuild` use the `metadata_props` of the model to put in additional metadata information relevant to the results of the transformation. Let's examine the metadata for the current graph containing all layers:
%% Cell type:code id: tags:
``` python
model = ModelWrapper(postsynth_layers)
model.model.metadata_props
```
%% Cell type:markdown id: tags:
Here we see that a Vivado project was built to create what we call the `stitched IP`, where all the IP blocks implementing various layers will be stitched together. You can view this stitched block design in Vivado, or [here](StreamingDataflowPartition_1.pdf) as an exported PDF.
%% Cell type:markdown id: tags:
Moving back to the top-level model, recall that `ZynqBuild` will create a Vivado project and synthesize it, so it will be creating metadata entries related to the paths and files that were created:
%% Cell type:code id: tags:
``` python
model = ModelWrapper(build_dir + "/tfc_w1_a1_post_synthesis.onnx")
model.model.metadata_props
```
%% Cell type:markdown id: tags:
Here, we can see the directories that were created for the PYNQ driver (`pynq_driver_dir`) and the Vivado synthesis project (`vivado_pynq_proj`), as well as the locations of the bitfile, hardware handoff file and synthesis report.
%% Cell type:code id: tags:
``` python
! ls {model.get_metadata_prop("vivado_pynq_proj")}
```
%% Cell type:markdown id: tags:
Feel free to examine the generated Vivado project to get a feel for how the system-level integration is performed for the FINN-generated "stitched IP", which appears as `StreamingDataflowPartition_1` in the top-level block design -- you can see it as a block diagram exported to PDF [here](top.pdf).
%% Cell type:markdown id: tags:
## 4. PYNQ deployment <a id='hw_test'></a>
* [Deployment and Remote Execution](#deploy)
* [Validation on PYNQ Board](#validation)
* [Throughput Test on PYNQ Board](#throughput)
We are almost done preparing our hardware design. We'll now put it in a form suitable for use as a PYNQ overlay, synthesize and deploy it.
%% Cell type:markdown id: tags:
### Deployment and Remote Execution <a id='deploy'></a>
We'll now use the `DeployToPYNQ` transformation to create a deployment folder with the bitfile and driver file(s), and copy that to the PYNQ board. You can change the default IP address, username, password and target folder for the PYNQ below.
**Make sure you've [set up the SSH keys for your PYNQ board](https://finn-dev.readthedocs.io/en/latest/getting_started.html#pynq-board-first-time-setup) before executing this step.**
%% Cell type:code id: tags:
``` python
import os
# set up the following values according to your own environment
# FINN will use ssh to deploy and run the generated accelerator
ip = "192.168.2.99"
username = os.getenv("PYNQ_USERNAME", "xilinx")
password = os.getenv("PYNQ_PASSWORD", "xilinx")
port = os.getenv("PYNQ_PORT", 22)
target_dir = os.getenv("PYNQ_TARGET_DIR", "/home/xilinx/finn_tfc_end2end_example")
# set up ssh options to only allow publickey authentication
options = "-o PreferredAuthentications=publickey -o PasswordAuthentication=no"
# test access to PYNQ board
! ssh {options} {username}@{ip} -p {port} cat /var/run/motd.dynamic
```
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_deployment import DeployToPYNQ
model = model.transform(DeployToPYNQ(ip, port, username, password, target_dir))
model.save(build_dir + "/tfc_w1_a1_pynq_deploy.onnx")
```
%% Cell type:markdown id: tags:
Let's verify that the remote access credentials are saved in the model metadata, and that the deployment folder has been successfully copied to the board:
%% Cell type:code id: tags:
``` python
model.model.metadata_props
```
%% Cell type:code id: tags:
``` python
target_dir_pynq = target_dir + "/" + model.get_metadata_prop("pynq_deployment_dir").split("/")[-1]
target_dir_pynq
```
%% Cell type:code id: tags:
``` python
! ssh {options} {username}@{ip} -p {port} 'ls -l {target_dir_pynq}'
```
%% Cell type:markdown id: tags:
We only have two more steps to be able to remotely execute the deployed bitfile with some test data from the MNIST dataset. Let's load up some test data that comes bundled with FINN.
%% Cell type:code id: tags:
``` python
from pkgutil import get_data
import onnx
import onnx.numpy_helper as nph
import matplotlib.pyplot as plt
raw_i = get_data("qonnx.data", "onnx/mnist-conv/test_data_set_0/input_0.pb")
x = nph.to_array(onnx.load_tensor_from_string(raw_i))
plt.imshow(x.reshape(28,28), cmap='gray')
```
%% Cell type:code id: tags:
``` python
model = ModelWrapper(build_dir + "/tfc_w1_a1_pynq_deploy.onnx")
iname = model.graph.input[0].name
oname = parent_model.graph.output[0].name
ishape = model.get_tensor_shape(iname)
print("Expected network input shape is " + str(ishape))
```
%% Cell type:markdown id: tags:
Finally, we can call `execute_onnx` on the graph, which will internally invoke remote execution with the bitfile, grab the results and return them as a numpy array. You may recall that one "reshape" node was left out of the StreamingDataflowPartition. We'll perform that reshape manually with a numpy call when passing in the input; everything else in the network ended up inside the StreamingDataflowPartition, so that is all we need to do.
%% Cell type:code id: tags:
``` python
import numpy as np
from finn.core.onnx_exec import execute_onnx
input_dict = {iname: x.reshape(ishape)}
ret = execute_onnx(model, input_dict)
```
%% Cell type:code id: tags:
``` python
ret[oname]
```
%% Cell type:markdown id: tags:
We see that the network correctly predicts this as a digit 2.
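%% Cell type:markdown id: tags:
The raw output is a vector of ten scores, one per digit class. As a minimal sketch (this cell is an addition for illustration, reusing the `ret` and `oname` variables from above), the predicted class is simply the index of the highest score:
%% Cell type:code id: tags:
``` python
import numpy as np
# the accelerator returns one score per class; the highest-scoring index is the prediction
predicted_digit = np.argmax(ret[oname])
print("Predicted digit: " + str(predicted_digit))
```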
%% Cell type:markdown id: tags:
### Validating the Accuracy on a PYNQ Board <a id='validation'></a>
All the commands below are meant to be executed with `sudo` on the PYNQ board, so we'll use a workaround (`echo password | sudo -S command`) to run them from this notebook on the host computer.
**Ensure that your PYNQ board has a working internet connection for the next steps, since there is some downloading involved.**
To validate the accuracy, we first need to install the [`dataset-loading`](https://github.com/fbcotter/dataset_loading) Python package on the PYNQ board. This will give us a convenient way of downloading and accessing the MNIST dataset.
Command to execute on PYNQ:
```sudo pip3 install git+https://github.com/fbcotter/dataset_loading.git@0.0.4#egg=dataset_loading```
%% Cell type:code id: tags:
``` python
! ssh {options} -t {username}@{ip} -p {port} 'echo {password} | sudo -S pip3 install git+https://github.com/fbcotter/dataset_loading.git@0.0.4#egg=dataset_loading'
```
%% Cell type:markdown id: tags:
We can now use the `validate.py` script that was generated together with the driver to measure top-1 accuracy on the MNIST dataset.
Command to execute on PYNQ:
`python3.6 validate.py --dataset mnist --batchsize 1000`
%% Cell type:code id: tags:
``` python
! ssh {options} -t {username}@{ip} -p {port} 'cd {target_dir_pynq}; echo {password} | sudo -S python3.6 validate.py --dataset mnist --batchsize 1000'
```
%% Cell type:markdown id: tags:
We see that the final top-1 accuracy is 92.96%, which is very close to the 93.17% reported on the [BNN-PYNQ accuracy table in Brevitas](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq).
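%% Cell type:markdown id: tags:
For reference, top-1 accuracy is simply the fraction of images whose highest-scoring class matches the ground-truth label. The sketch below illustrates that computation with small hypothetical `scores` and `labels` arrays; it is not the actual `validate.py` implementation:
%% Cell type:code id: tags:
``` python
import numpy as np
# hypothetical example: one row of class scores per image, one ground-truth label per image
scores = np.array([[0.1, 2.3, -0.5],
                   [1.7, 0.2, 0.9]])
labels = np.array([1, 0])
# predicted class = argmax over scores; top-1 accuracy = fraction of correct predictions
top1_accuracy = np.mean(np.argmax(scores, axis=1) == labels)
print("Top-1 accuracy: " + str(top1_accuracy))
```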
%% Cell type:markdown id: tags:
### Throughput Test on PYNQ Board <a id='throughput'></a>
In addition to the functional verification, FINN also offers the possibility to measure the network performance directly on the PYNQ board. This can be done using the function `throughput_test_remote`. In the next section we import the function and execute it.
First we load the deployed model again and pass it to the function. The function returns the metrics of the network as a dictionary.
%% Cell type:code id: tags:
``` python
from finn.core.throughput_test import throughput_test_remote
model = ModelWrapper(build_dir + "/tfc_w1_a1_pynq_deploy.onnx")
res = throughput_test_remote(model, 10000)
print("Network metrics:")
for key in res:
    print(str(key) + ": " + str(res[key]))
```
%% Cell type:markdown id: tags:
Together with the folding values we can evaluate the performance of our accelerator. Each layer has a total folding factor of 64, and because the network is fully pipelined, it follows that `II = 64`. II is the initiation interval and indicates how many clock cycles are needed to process one input.
%% Cell type:code id: tags:
``` python
II = 64
# frequency in MHz
f_MHz = 100
# expected throughput in MFPS
expected_throughput = f_MHz / II
# measured throughput (FPS) from throughput test, converted to MFPS
measured_throughput = res["throughput[images/s]"] * 0.000001
# performance
print("We reach approximately " + str(round((measured_throughput / expected_throughput)*100)) + "% of the ideal performance.")
```
%% Cell type:markdown id: tags:
The measured values were recorded with a batch size of 10000 and at a frequency of 100 MHz. We will be improving the efficiency of the generated accelerator examples in the coming FINN releases.
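%% Cell type:markdown id: tags:
As a back-of-the-envelope check (assuming the ideal `II = 64` cycles per image at 100 MHz from above, and ignoring any data movement overhead), the 10000-image batch used in the throughput test cannot complete faster than:
%% Cell type:code id: tags:
``` python
# ideal lower bound on batch runtime: batch_size * II cycles at f_clk
II = 64              # cycles per image (total folding factor)
f_clk_hz = 100e6     # clock frequency in Hz
batch_size = 10000
ideal_runtime_s = batch_size * II / f_clk_hz
ideal_mfps = (batch_size / ideal_runtime_s) * 1e-6
print("Ideal batch runtime: %.4f s, i.e. %.4f M images/s" % (ideal_runtime_s, ideal_mfps))
```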