Commit 2a754f75 authored by Yaman Umuroglu

[Notebook] import onnx prior to torch for segfault workaround
%% Cell type:markdown id: tags:
# FINN - End-to-End Flow
-----------------------------------------------------------------
This notebook gives an overview of the end-to-end flow of FINN: from loading an ONNX model exported from Brevitas, through the numerous transformations in FINN, up to the generation of a bitstream that can be used to program an FPGA.
We'll use the following showSrc function to print the source code of function calls in the Jupyter notebook.
%% Cell type:code id: tags:
``` python
import inspect
import netron
from finn.util.basic import make_build_dir

def showSrc(what):
    print("".join(inspect.getsourcelines(what)[0]))

build_dir = "/workspace/finn"
```
%% Cell type:markdown id: tags:
## Overview
The notebook is based on the following diagram.
%% Cell type:markdown id: tags:
![](finn-design-flow-example.svg)
%% Cell type:markdown id: tags:
The diagram visualizes the end-to-end flow of FINN. The cylinder-like fields show the state of the network representation in the respective step. The rectangular fields represent the transformations that are applied to the network to achieve a certain result. The diagram is divided into 5 blocks, each of which includes several flow steps. The flow starts in the top left corner with the Brevitas export (pink block), followed by the preparation of the network (grey block) for Vivado HLS and Vivado synthesis (yellow block). There is also a section for testing and verification in software (green block) and for the hardware test on the PYNQ board (red block).
The diagram leads to the following outline for this Jupyter notebook.
%% Cell type:markdown id: tags:
## Outline
-------------
1. [Brevitas export](#brev_exp)
2. [Network preparation](#nw_prep)
* Basic transformations
* Streamlining
* Conversion to HLS layers
* Creating a dataflow partition
* Folding
3. [Vivado HLS and Vivado synthesis](#vivado)
* HLS IP per layer
* Creation of stitched design
* PYNQ shell project
* Synthesis, place and route
4. [Hardware Test](#hw_test)
5. [Simulation & Emulation flows for functional verification](#sim)
* Simulation using Python
* Simulation (npysim) using C++
* Emulation (rtlsim) using PyVerilator
%% Cell type:markdown id: tags:
## 1. Brevitas export <a id='brev_exp'></a>
FINN expects an ONNX model as input. This can be a model trained with [Brevitas](https://github.com/Xilinx/brevitas). Brevitas is a PyTorch library for quantization-aware training, and the FINN Docker image comes with several [example Brevitas networks](https://github.com/maltanar/brevitas_cnv_lfc). To show the FINN end-to-end flow, we'll use the TFC-w1a1 model as the example network. The Brevitas export is only briefly described here; for details see Jupyter notebook [3-FINN-Brevitas-network-import](3-FINN-Brevitas-network-import.ipynb).
First a few things have to be imported. Then the model can be loaded with the pretrained weights.
%% Cell type:code id: tags:
``` python
# note: onnx is imported before brevitas/torch as a segfault workaround
import onnx
from finn.util.test import get_fc_model_trained
import brevitas.onnx as bo

tfc = get_fc_model_trained("TFC", 1, 1)
bo.export_finn_onnx(tfc, (1, 1, 28, 28), build_dir + "/tfc_w1_a1.onnx")
```
%% Output
/workspace/brevitas_cnv_lfc/training_scripts/models/TFC.py:73: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
x = 2.0 * x - torch.tensor([1.0])
%% Cell type:markdown id: tags:
The model was now exported, loaded with the pretrained weights and saved under the name "tfc_w1_a1.onnx".
To visualize the exported model, Netron can be used. Netron is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on the individual nodes and view the properties.
%% Cell type:code id: tags:
``` python
netron.start(build_dir+"/tfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Serving '/workspace/finn/tfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
Now that we have the model in .onnx format, we can work with it using FINN. For that, FINN's `ModelWrapper` is used. It is a wrapper around the ONNX model that provides several helper functions to make it easier to work with the model. For details see Jupyter notebook [2-FINN-ModelWrapper](2-FINN-ModelWrapper.ipynb).
%% Cell type:code id: tags:
``` python
from finn.core.modelwrapper import ModelWrapper
model = ModelWrapper(build_dir+"/tfc_w1_a1.onnx")
```
%% Cell type:markdown id: tags:
Now the model is prepared and could be simulated using Python. How this works is described in the subsection [Simulation using Python](#simpy) in the section about *Simulation & Emulation flows for functional verification*.
The model can now also be processed in different ways. FINN is built around analysis and transformation passes that can be applied to the model. An analysis pass extracts specific information about the model and returns it to the user in the form of a dictionary. For more details see [4-FINN-HowToAnalysisPass](4-FINN-HowToAnalysisPass.ipynb). A transformation pass changes the model and returns the changed model back to the FINN flow; for more information about transformation passes see notebook [5-FINN-HowToTransformationPass](5-FINN-HowToTransformationPass.ipynb).
Since the goal in this notebook is to process the model to the point where a bitstream can be generated from it, the focus is on the transformations that are necessary for this. These are discussed in more detail in the next sections.
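As a small illustration of an analysis pass (a minimal sketch, not one of FINN's built-in passes), the function below counts how often each op_type occurs in the graph and returns the counts as a dictionary:
%% Cell type:code id: tags:
``` python
def count_op_types(model):
    # analysis passes take a model and return a dictionary of results
    counts = {}
    for node in model.graph.node:
        counts[node.op_type] = counts.get(node.op_type, 0) + 1
    return counts

model.analysis(count_op_types)
```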
%% Cell type:markdown id: tags:
## 2. Network preparation <a id='nw_prep'></a>
* [Basic transformations](#basic_trafo)
* [Streamlining](#streamline)
* [Conversion to HLS layers](#hls_layers)
* [Creating a Dataflow Partition](#dataflow_partition)
* [Folding](#folding)
%% Cell type:markdown id: tags:
### Basic transformations <a id='basic_trafo'></a>
This section deals with the basic transformations, which are applied to the model as a kind of cleanup. They do not appear in the diagram above, but they are applied at many points in the FINN flow to postprocess the model after a transformation and/or to prepare it for the next one.
%% Cell type:markdown id: tags:
The basic transformations are:
* GiveUniqueNodeNames
* GiveReadableTensorNames
* InferShapes
* InferDataTypes
* FoldConstants
%% Cell type:markdown id: tags:
These transformations work like a cleanup. The first two transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames`) give the nodes and tensors unique and readable names. The following two transformations (`InferShapes`, `InferDataTypes`) derive the shapes and data types of the tensors from the model properties and set them in the `ValueInfo` of the model. These transformations can normally always be applied; they do not affect the structure of the graph, but ensure that all the required information is available.
The last listed transformation is `FoldConstants`. It identifies a node with constant inputs and computes its output. The result is then set as a constant-only input for the following node and the old node is removed. Although this transformation changes the structure of the model, it is usually always desired and can be applied to any model. A small toy example follows after the next code cell.
%% Cell type:markdown id: tags:
The transformations can be imported and applied as follows.
%% Cell type:code id: tags:
``` python
from finn.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames
from finn.transformation.infer_shapes import InferShapes
from finn.transformation.infer_datatypes import InferDataTypes
from finn.transformation.fold_constants import FoldConstants

model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())

# save model under a different name for section "Simulation using Python"
model.save(build_dir + "/tfc_w1_a1_after_brevitas_export.onnx")
```
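%% Cell type:markdown id: tags:
As a small demonstration of `FoldConstants` (a toy sketch on a hypothetical graph, not part of the TFC flow), the first Add node below has only initializer inputs, so its output can be precomputed and the node removed:
%% Cell type:code id: tags:
``` python
from onnx import TensorProto, helper

# hypothetical toy graph: an Add over two initializers feeds a second Add
# that also consumes the dynamic graph input "inp"
add_const = helper.make_node("Add", ["c0", "c1"], ["c_sum"])
add_dyn = helper.make_node("Add", ["c_sum", "inp"], ["out"])
graph = helper.make_graph(
    [add_const, add_dyn], "fold_demo",
    inputs=[helper.make_tensor_value_info("inp", TensorProto.FLOAT, [2])],
    outputs=[helper.make_tensor_value_info("out", TensorProto.FLOAT, [2])],
    initializer=[
        helper.make_tensor("c0", TensorProto.FLOAT, [2], [1.0, 2.0]),
        helper.make_tensor("c1", TensorProto.FLOAT, [2], [3.0, 4.0]),
    ],
)
demo = ModelWrapper(helper.make_model(graph))
demo = demo.transform(InferShapes())
demo = demo.transform(FoldConstants())
print([n.op_type for n in demo.graph.node])  # only the dynamic Add remains
print(demo.get_initializer("c_sum"))         # precomputed constant: [4. 6.]
```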
%% Cell type:markdown id: tags:
The result of these transformations can be viewed with netron after the model has been saved again. By clicking on the individual nodes, it can now be seen, for example, that each node has been given a name. The whole upper area of the graph has also been folded away by `FoldConstants`, so that the first node is now "Reshape".
%% Cell type:code id: tags:
``` python
model.save(build_dir+"/tfc_w1_a1.onnx")
netron.start(build_dir+"/tfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
### Streamlining <a id='streamline'></a>
Streamlining is a transformation containing several sub-transformations. The goal of streamlining is to eliminate floating point operations by moving them around, then collapsing them into one operation and, in the last step, transforming them into multithresholding nodes. For the theoretical background see arXiv:1709.04060.
The streamlining transformation is shown below.
%% Cell type:code id: tags:
``` python
from finn.transformation.streamline import Streamline
showSrc(Streamline)
```
%% Output
class Streamline(Transformation):
    """Apply the streamlining transform, see arXiv:1709.04060."""

    def apply(self, model):
        streamline_transformations = [
            ConvertSubToAdd(),
            BatchNormToAffine(),
            ConvertSignToThres(),
            MoveScalarAddPastMatMul(),
            MoveScalarMulPastMatMul(),
            MoveAddPastMul(),
            CollapseRepeatedAdd(),
            CollapseRepeatedMul(),
            AbsorbAddIntoMultiThreshold(),
            FactorOutMulSignMagnitude(),
            AbsorbMulIntoMultiThreshold(),
            Absorb1BitMulIntoMatMul(),
            RoundAndClipThresholds(),
        ]
        for trn in streamline_transformations:
            model = model.transform(trn)
            model = model.transform(GiveUniqueNodeNames())
            model = model.transform(GiveReadableTensorNames())
            model = model.transform(InferDataTypes())
        return (model, False)
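%% Cell type:markdown id: tags:
The move transformations listed above rely on simple algebraic identities; for example, `MoveAddPastMul` uses the fact that $(x + a) \cdot m = x \cdot m + (a \cdot m)$, so an addition can be pushed past a multiplication by folding the constants. A quick numeric check of this rewrite (a sketch with made-up values):
%% Cell type:code id: tags:
``` python
import numpy as np

x = np.random.rand(4).astype(np.float32)  # stand-in for a dynamic tensor
a, m = 0.5, 3.0                           # stand-ins for the Add and Mul constants
before = (x + a) * m                      # Add followed by Mul
after = x * m + (a * m)                   # Mul followed by Add, constants folded
assert np.allclose(before, after)
```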
%% Cell type:markdown id: tags:
As can be seen, several transformations are involved in the streamlining transformation. There are move and collapse transformations, and in the last step the operations are transformed into multithresholds. The involved transformations can be viewed in detail [here](https://github.com/Xilinx/finn/tree/dev/src/finn/transformation/streamline). After each transformation, three of the basic transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames` and `InferDataTypes`) are applied to the model as cleanup.
After streamlining, the network looks as follows.
%% Cell type:code id: tags:
``` python
model = model.transform(Streamline())
model.save(build_dir+"/tfc_w1_a1.onnx")
netron.start(build_dir+"/tfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
Our example network is a quantized network with 1-bit precision. For this reason, after streamlining, the resulting bipolar matrix multiplications are converted into xnorpopcount operations. This transformation produces operations that are once more collapsed and converted into thresholds. This procedure is shown below.
In this state the model can still be simulated with Python, even though it no longer contains only standard ONNX nodes. For details, see section [Simulation using Python](#simpy).
After these finishing transformations, the nodes can be converted to HLS layers for further processing.
%% Cell type:code id: tags:
``` python
from finn.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
import finn.transformation.streamline.absorb as absorb
from finn.transformation.streamline.round_thresholds import RoundAndClipThresholds
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(absorb.AbsorbAddIntoMultiThreshold())
model = model.transform(absorb.AbsorbMulIntoMultiThreshold())
model = model.transform(RoundAndClipThresholds())
```
%% Cell type:markdown id: tags:
### Conversion to HLS layers <a id='hls_layers'></a>
This transformation converts the nodes to HLS layers that correspond to the functions in the [finn-hlslib](https://finn-hlslib.readthedocs.io/en/latest/) library. In our case it converts pairs of binary XnorPopcountMatMul layers to StreamingFCLayer_Batch layers. Any immediately following MultiThreshold layers will also be absorbed into the MVTU (Matrix-Vector-Threshold Unit).
Below is the code for the transformation, and the network is visualized using netron to show the new structure, which now corresponds to the finn-hlslib library.
%% Cell type:code id: tags:
``` python
import finn.transformation.fpgadataflow.convert_to_hls_layers as to_hls
model = model.transform(to_hls.InferBinaryStreamingFCLayer())
model.save(build_dir+"/tfc_w1_a1.onnx")
netron.start(build_dir+"/tfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
Each StreamingFCLayer_Batch node has two attributes that specify the degree of folding, PE and SIMD. In all nodes these attributes are set to 1 by default, which corresponds to maximum folding (the slowest but smallest implementation). The user can now adjust the folding as desired. This is described in the next section.
At this point the model can also be simulated using C++. The exact procedure is described in section [Simulation (npysim) using C++](#npysim).
%% Cell type:markdown id: tags:
### Creating a Dataflow Partition <a id='dataflow_partition'></a>
In the graph above, you can see that there is a mixture of FINN HLS layers (StreamingFCLayer_Batch) with regular ONNX layers (Reshape, Mul, Add). To create a bitstream, FINN needs a model with only HLS layers. In order to achieve this, we will use the `CreateDataflowPartition` transformation to create a "dataflow partition" in this graph, separating out the HLS layers into another model, and replacing them with a placeholder layer called StreamingDataflowPartition:
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.create_dataflow_partition import CreateDataflowPartition
parent_model = model.transform(CreateDataflowPartition())
parent_model.save(build_dir+"/tfc_w1_a1_dataflow_parent.onnx")
netron.start(build_dir+"/tfc_w1_a1_dataflow_parent.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1_dataflow_parent.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
We can see that the StreamingFCLayer instances have all been replaced with a single `StreamingDataflowPartition`, which has an attribute `model` that points to the extracted, HLS dataflow-only graph:
%% Cell type:code id: tags:
``` python
from finn.custom_op.registry import getCustomOp
sdp_node = getCustomOp(parent_model.graph.node[2])
dataflow_model_filename = sdp_node.get_nodeattr("model")
netron.start(dataflow_model_filename, port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/tmp/finn_maltanar_22115/dataflow_partition_9vof1ltc/df_model.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
We can see that all the extracted `StreamingFCLayer_Batch` instances have been moved to the child (dataflow) model. We will load the child model with `ModelWrapper` and continue working on it.
%% Cell type:code id: tags:
``` python
model = ModelWrapper(dataflow_model_filename)
```
%% Cell type:markdown id: tags:
### Folding and TLastMarker Insertion <a id='folding'></a>
Since the folding parameters are node attributes, they can easily be accessed and changed using a helper function of the `ModelWrapper`. But first we have to extract the nodes which are StreamingFCLayer_Batch operations. This is where netron helps us: in the above diagram we can see that the first four nodes are StreamingFCLayer_Batch. Through the `print`s we can check that the extracted nodes all have the op_type "StreamingFCLayer_Batch". For more details on how to work with ONNX models, see Jupyter notebook [1-FINN-HowToWorkWithONNX](1-FINN-HowToWorkWithONNX.ipynb).
%% Cell type:code id: tags:
``` python
fc0 = model.graph.node[0]
fc1 = model.graph.node[1]
fc2 = model.graph.node[2]
fc3 = model.graph.node[3]
print("fc0 has the op_type: " + str(fc0.op_type))
print("fc1 has the op_type: " + str(fc1.op_type))
print("fc2 has the op_type: " + str(fc2.op_type))
print("fc3 has the op_type: " + str(fc3.op_type))
```
%% Output
fc0 has the op_type: StreamingFCLayer_Batch
fc1 has the op_type: StreamingFCLayer_Batch
fc2 has the op_type: StreamingFCLayer_Batch
fc3 has the op_type: StreamingFCLayer_Batch
%% Cell type:markdown id: tags:
We can use the higher-level [HLSCustomOp](https://github.com/Xilinx/finn/blob/dev/src/finn/custom_op/fpgadataflow/__init__.py) wrappers for these nodes. These wrappers provide easy access to specific properties of these nodes, such as the folding factors (PE and SIMD). For more details about custom ops, see Jupyter notebook [7-FINN-CustomOps](7-FINN-CustomOps.ipynb). Let's have a look at which node attributes are defined by the CustomOp wrapper, and adjust the SIMD and PE attributes.
%% Cell type:code id: tags:
``` python
fc0w = getCustomOp(fc0)
fc1w = getCustomOp(fc1)
fc2w = getCustomOp(fc2)
fc3w = getCustomOp(fc3)
print("CustomOp wrapper is of class " + fc0w.__class__.__name__)
fc0w.get_nodeattr_types()
```
%% Output
CustomOp wrapper is of class StreamingFCLayer_Batch
{'PE': ('i', True, 0),
'SIMD': ('i', True, 0),
'MW': ('i', True, 0),
'MH': ('i', True, 0),
'resType': ('s', True, ''),
'ActVal': ('i', False, 0),
'inputDataType': ('s', True, ''),
'weightDataType': ('s', True, ''),
'outputDataType': ('s', True, ''),
'binaryXnorMode': ('i', False, 0),
'noActivation': ('i', False, 0),
'backend': ('s', True, 'fpgadataflow'),
'code_gen_dir_npysim': ('s', False, ''),
'code_gen_dir_ipgen': ('s', False, ''),
'executable_path': ('s', False, ''),
'ipgen_path': ('s', False, ''),
'sim_mode': ('s', False, ''),
'sim_cycles': ('i', False, 0)}
%% Cell type:code id: tags:
``` python
# SIMD controls the folding over the input vector
# PE controls the folding over the output vector
fc0w.set_nodeattr("SIMD", 16)
fc0w.set_nodeattr("PE", 16)
fc1w.set_nodeattr("SIMD", 16)
fc1w.set_nodeattr("PE", 16)
fc2w.set_nodeattr("SIMD", 16)
fc2w.set_nodeattr("PE", 16)
fc3w.set_nodeattr("SIMD", 16)
fc3w.set_nodeattr("PE", 10)
```
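%% Cell type:markdown id: tags:
As a rough sanity check of the chosen folding factors, we can estimate each layer's depth (a sketch, assuming the usual FINN folding semantics where a StreamingFCLayer needs about (MW/SIMD)·(MH/PE) cycles per input sample). Layers with similar estimated cycle counts avoid a single layer becoming the bottleneck.
%% Cell type:code id: tags:
``` python
# estimated cycles per input sample: (MW / SIMD) * (MH / PE)
for fcw in [fc0w, fc1w, fc2w, fc3w]:
    mw, mh = fcw.get_nodeattr("MW"), fcw.get_nodeattr("MH")
    simd, pe = fcw.get_nodeattr("SIMD"), fcw.get_nodeattr("PE")
    print("%s: ~%d cycles" % (fcw.onnx_node.name, (mw / simd) * (mh / pe)))
```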
%% Cell type:markdown id: tags:
Finally, we will run the `InsertTLastMarker` transformation to get a `TLastMarker` node at the output of this graph, which is necessary to run the DMA engines correctly.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.insert_tlastmarker import InsertTLastMarker
model = model.transform(InsertTLastMarker())
model.save(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
```
%% Cell type:markdown id: tags:
This completes the network preparation and the network can be passed on to the next block, *Vivado HLS and Vivado synthesis*, which is described below.
%% Cell type:markdown id: tags:
## 3. Vivado HLS and Vivado synthesis <a id='vivado'></a>
* [Generating HLS Code](#hls_per_layer)
* [Synthesizing HLS to IP Blocks](#hls_synth)
* [IP Stitching](#ip_stitching)
* [Inserting the IP into a PYNQ Shell](#pynq_shell)
* [Synthesis, place and route](#synth_pl_ro)
As we will be performing FPGA synthesis in these tasks, we'll define a few helper variables: the Xilinx FPGA part name, the PYNQ board name that we are targeting, and the target clock period (in nanoseconds).
%% Cell type:code id: tags:
``` python
fpga_part = "xczu3eg-sbva484-1-e"
pynq_board = "Ultra96"
target_clk_ns = 5
```
%% Cell type:markdown id: tags:
### Generating HLS Code <a id='hls_per_layer'></a>
This section deals with the generation of an IP block for each layer. These blocks can later be stitched together into a block design that corresponds to the complete model. Converting each layer into a separate IP block gives good transparency: we can check the functionality of every IP block and compare it with the behaviour of the corresponding ONNX node. The emulation of such an IP block is performed with PyVerilator and is described in detail in section [Emulation (rtlsim) using PyVerilator](#rtlsim).
%% Cell type:markdown id: tags:
Two transformations are required to generate HLS IP blocks for each layer:
* `CodeGen_ipgen` which generates the HLS C++ code for the node and a tcl-script which starts the HLS synthesis and exports the design as IP.
* `HLSSynth_IPGen` which passes the tcl-script to Vivado HLS and thus performs the actual IP generation.
We start off by giving unique node names using the basic transformation `GiveUniqueNodeNames`, and then proceed with the HLS C++ code generation with `CodeGen_ipgen`.
%% Cell type:code id: tags:
``` python
model = model.transform(GiveUniqueNodeNames())
from finn.transformation.fpgadataflow.codegen_ipgen import CodeGen_ipgen
model = model.transform(CodeGen_ipgen(fpga_part, target_clk_ns))
```
%% Cell type:markdown id: tags:
### Synthesizing HLS to IP Blocks <a id='hls_synth'></a>
Now that we have generated the HLS code for each layer, we can call the `HLSSynth_IPGen` transformation to convert the generated HLS into Vivado IP blocks. **As this involves calling HLS synthesis, this transformation will run for some time (several minutes).**
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.hlssynth_ipgen import HLSSynth_IPGen
model = model.transform(HLSSynth_IPGen())
```
%% Cell type:markdown id: tags:
Each `StreamingFCLayer_Batch` node now has new attributes which can be examined more closely with netron.
%% Cell type:code id: tags:
``` python
model.save(build_dir+"/tfc_w1_a1_ipgen.onnx")
netron.start(build_dir+"/tfc_w1_a1_ipgen.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1_ipgen.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="800"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
There are two additional attributes:
* `code_gen_dir_ipgen` which contains the directory path where all the files generated by the ipgen transformations are stored
* `ipgen_path` which contains the path to the project directory in which the generated IP block is stored
We can further investigate which files are produced by taking a look into this directory, for example for the first StreamingFCLayer_Batch node:
%% Cell type:code id: tags:
``` python
fc0w = getCustomOp(model.graph.node[0])
code_gen_dir = fc0w.get_nodeattr("code_gen_dir_ipgen")
!ls {code_gen_dir}
```
%% Output
hls_syn_StreamingFCLayer_Batch_0.tcl thresh.h
ipgen.sh top_StreamingFCLayer_Batch_0.cpp
params.h vivado_hls.log
project_StreamingFCLayer_Batch_0
%% Cell type:markdown id: tags:
Directory *project_StreamingFCLayer_Batch_0* contains the project created by Vivado HLS into which the IP block is exported, along with other files generated by Vivado HLS. If we compare this with the above netron visualization of the network, it is exactly the name of the folder stored in the node attribute `ipgen_path`. The .cpp code that is passed to Vivado HLS can be found in *top_StreamingFCLayer_Batch_0.cpp*. The files *params.h* and *thresh.h* belong to it as well; they contain the values for the weights and thresholds. *vivado_hls.log* is the log file from Vivado HLS. Besides these files, the folder contains *ipgen.sh* and *hls_syn_StreamingFCLayer_Batch_0.tcl*. First we take a look at *ipgen.sh*.
%% Cell type:code id: tags:
``` python
shell_script = code_gen_dir + "/ipgen.sh"
!cat {shell_script}
```
%% Output
#!/bin/bash
cd /tmp/finn_maltanar_22115/code_gen_ipgen_StreamingFCLayer_Batch_bwxffr0g
vivado_hls /tmp/finn_maltanar_22115/code_gen_ipgen_StreamingFCLayer_Batch_bwxffr0g/hls_syn_StreamingFCLayer_Batch_0.tcl
cd /workspace/finn
%% Cell type:markdown id: tags:
The script consists only of two framing `cd` commands and a command that passes the tcl script to *vivado_hls*. The working directory has to be changed so that the files are created in the correct folder; afterwards it is changed back to the original directory.
Below is the tcl script which is passed to *vivado_hls*.
%% Cell type:code id: tags:
``` python
tcl_script = code_gen_dir + "/hls_syn_StreamingFCLayer_Batch_0.tcl"
!cat {tcl_script}
```
%% Output
set config_proj_name project_StreamingFCLayer_Batch_0
puts "HLS project: $config_proj_name"
set config_hwsrcdir "/tmp/finn_maltanar_22115/code_gen_ipgen_StreamingFCLayer_Batch_bwxffr0g"
puts "HW source dir: $config_hwsrcdir"
set config_proj_part "xczu3eg-sbva484-1-e"
set config_bnnlibdir "/workspace/finn-hlslib"
set config_toplevelfxn "StreamingFCLayer_Batch_0"
set config_clkperiod 5
open_project $config_proj_name
add_files $config_hwsrcdir/top_StreamingFCLayer_Batch_0.cpp -cflags "-std=c++0x -I$config_bnnlibdir"
set_top $config_toplevelfxn
open_solution sol1
set_part $config_proj_part
config_interface -m_axi_addr64
create_clock -period $config_clkperiod -name default
csynth_design
export_design -format ip_catalog
exit 0
%% Cell type:markdown id: tags:
In the first part of the script the project is configured; for example, the FPGA part and the clock period are set. Then the project is opened and the source files are added. The toplevel function is set and, after creating a clock, the design is synthesized with `csynth_design` and then exported as an IP block.
Now that all IP blocks are in place, they can be stitched together to create an IP design that matches the ONNX model. This is covered in the next section.
%% Cell type:code id: tags:
``` python
# save model under a different name for section "Emulation (rtlsim) using PyVerilator"
model.save(build_dir + "/tfc_w1_a1_after_hls_ip_per_layer.onnx")
```
%% Cell type:markdown id: tags:
### IP Stitching <a id='ip_stitching'></a>
We now have IP blocks for each of our layers, and will stitch them together into a larger IP that implements the whole network using the `CodeGen_ipstitch` transformation. Bear in mind that this transformation can only be applied to a graph containing exclusively HLS nodes that have already been through the `HLSSynth_IPGen` transformation, which is the last step we performed. **This invokes Vivado and may take a few minutes to run.**
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.codegen_ipstitch import CodeGen_ipstitch
model = model.transform(CodeGen_ipstitch(fpga_part))
```
%% Cell type:markdown id: tags:
If you examine the nodes themselves in the transformed model you won't see a difference, because the IP stitching adds model-level metadata to the graph. This metadata can be accessed through `.model.metadata_props`, via the `get_metadata_prop` function in `ModelWrapper`, or by clicking on the global input/output tensors in Netron.
%% Cell type:code id: tags:
``` python
model.model.metadata_props
```
%% Output
[key: "vivado_stitch_proj"
value: "/tmp/finn_maltanar_22115/vivado_stitch_proj_nfte0nh0"
, key: "vivado_stitch_vlnv"
value: "xilinx_finn:finn:finn_design:1.0"
]
%% Cell type:code id: tags:
``` python
model.get_metadata_prop("vivado_stitch_proj")
```
%% Output
'/tmp/finn_maltanar_22115/vivado_stitch_proj_nfte0nh0'
%% Cell type:markdown id: tags:
If you navigate to the folder above (remember that the /tmp/finn_xxx folder is mounted on the host as well as inside Docker) you can open the Vivado project (.xpr) file there using Vivado, and view the following stitched IP block design:
%% Cell type:markdown id: tags:
![](stitched_ip.png)
%% Cell type:markdown id: tags:
### Inserting the IP into a PYNQ Shell <a id='pynq_shell'></a>
We are almost done preparing our hardware design. To deploy our accelerator on a PYNQ platform, it needs to be put inside an appropriate *shell* that bridges it with the interfaces that the underlying system exposes. FINN makes it easy to create a PYNQ-compatible overlay by inserting the stitched IP into an appropriate PYNQ shell with the `MakePYNQProject` transformation; afterwards we can view the created PYNQ shell project directory using the `metadata_props`. **This invokes Vivado and may take a few minutes to run.**
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_pynq_proj import MakePYNQProject
model = model.transform(MakePYNQProject(pynq_board))
model.model.metadata_props
```
%% Output
[key: "vivado_stitch_proj"
value: "/tmp/finn_maltanar_22115/vivado_stitch_proj_nfte0nh0"
, key: "vivado_stitch_vlnv"
value: "xilinx_finn:finn:finn_design:1.0"
, key: "vivado_pynq_proj"
value: "/tmp/finn_maltanar_22115/vivado_pynq_proj_bj_z4tm0"
]
%% Cell type:code id: tags:
``` python
! ls {model.get_metadata_prop("vivado_pynq_proj")}
```
%% Output
ip_config.tcl resizer.hw resizer.srcs vivado.jou
make_project.sh resizer.ip_user_files resizer.xpr vivado.log
resizer.cache resizer.sim synth_project.sh vivado_pid24853.str
%% Cell type:markdown id: tags:
If we open the created Vivado project (.xpr) under the `vivado_pynq_proj` directory above, we can see the system-level block design as below, with the FINN-generated part of the design highlighted. Various other components, such as the DMA engine and data width converters, have also been instantiated.
![](pynq_shell_project.png)
%% Cell type:markdown id: tags:
### Synthesis, place and route <a id='synth_pl_ro'></a>
%% Cell type:markdown id: tags:
We are now ready for the final hardware generation step, which is synthesis, place and route to generate an FPGA bitfile. This can be done by either running the `synth_project.sh` script in the generated Vivado PYNQ project directory inside Docker, or by executing the `SynthPYNQProject` transformation. **This step involves launching Vivado for synthesis and may take a few hours.**
%% Cell type:code id: tags:
``` python
model.save(build_dir + "/tfc_w1_a1_pre_synthesis.onnx")
from finn.transformation.fpgadataflow.synth_pynq_proj import SynthPYNQProject
model = model.transform(SynthPYNQProject())
model.model.metadata_props
```
%% Output
[key: "vivado_stitch_proj"
value: "/tmp/finn_maltanar_22115/vivado_stitch_proj_nfte0nh0"
, key: "vivado_stitch_vlnv"
value: "xilinx_finn:finn:finn_design:1.0"
, key: "vivado_pynq_proj"
value: "/tmp/finn_maltanar_22115/vivado_pynq_proj_bj_z4tm0"
, key: "vivado_pynq_bitfile"
value: "/tmp/finn_maltanar_22115/vivado_pynq_proj_bj_z4tm0/resizer.bit"
]
%% Cell type:code id: tags:
``` python
model.save(build_dir + "/tfc_w1_a1_post_synthesis.onnx")
```
%% Cell type:markdown id: tags:
## 4. Hardware test <a id='hw_test'></a>
To run the accelerator on the PYNQ board, a Python driver is required. The `MakePYNQDriver` transformation generates this driver and records its location in the new `pynq_driver_dir` metadata property, as can be seen in the output below.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_pynq_driver import MakePYNQDriver

model = ModelWrapper(build_dir + "/tfc_w1_a1_post_synthesis.onnx")
model = model.transform(MakePYNQDriver())
model.model.metadata_props
```
%% Output
[key: "vivado_stitch_proj"
value: "/tmp/finn_maltanar_22115/vivado_stitch_proj_nfte0nh0"
, key: "vivado_stitch_vlnv"
value: "xilinx_finn:finn:finn_design:1.0"
, key: "vivado_pynq_proj"
value: "/tmp/finn_maltanar_22115/vivado_pynq_proj_bj_z4tm0"
, key: "vivado_pynq_bitfile"
value: "/tmp/finn_maltanar_22115/vivado_pynq_proj_bj_z4tm0/resizer.bit"
, key: "pynq_driver_dir"
value: "/tmp/finn_maltanar_22115/pynq_driver_63xiuej8"
]
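%% Cell type:markdown id: tags:
The new `pynq_driver_dir` metadata property points to the folder containing the generated Python driver. Its contents can be listed like the other build artifacts (the exact files depend on the FINN version):
%% Cell type:code id: tags:
``` python
driver_dir = model.get_metadata_prop("pynq_driver_dir")
! ls {driver_dir}
```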
%% Cell type:markdown id: tags:
## 5. Simulation & Emulation flows for functional verification <a id='sim'></a>
* [Simulation using Python](#simpy)
* [Simulation (npysim) using C++](#npysim)
* [Emulation (rtlsim) using PyVerilator](#rtlsim)
%% Cell type:markdown id: tags:
### Simulation using Python <a id='simpy'></a>
If an ONNX model consists of [standard ONNX nodes](https://github.com/onnx/onnx/blob/master/docs/Operators.md) and/or FINN custom operations that do not belong to the fpgadataflow backend (`backend` $\neq$ "fpgadataflow"), this model can be checked for functionality using Python. General information about FINN custom op nodes can be found in Jupyter notebook [7-FINN-CustomOps](7-FINN-CustomOps.ipynb).
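A minimal sketch of such a check (using only attributes already shown in this notebook): go over the graph and report which nodes are executable in Python and which belong to the fpgadataflow backend and therefore need the npysim or rtlsim flows described below.
%% Cell type:code id: tags:
``` python
model_check = ModelWrapper(build_dir + "/tfc_w1_a1_after_brevitas_export.onnx")
for node in model_check.graph.node:
    # fpgadataflow nodes carry a string attribute backend == "fpgadataflow"
    backends = [a.s.decode("utf-8") for a in node.attribute if a.name == "backend"]
    if "fpgadataflow" in backends:
        print(node.name + " (" + node.op_type + "): needs npysim/rtlsim")
    else:
        print(node.name + " (" + node.op_type + "): executable in Python")
```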
%% Cell type:markdown id: tags:
To simulate an ONNX node, [ONNX Runtime](https://github.com/microsoft/onnxruntime) is used. ONNX Runtime is an open source tool developed by Microsoft to run standard ONNX nodes. For the FINN custom op nodes, execution functions are defined. The following is an example of the execution function of an XNOR-popcount node.
%% Cell type:code id: tags:
``` python
from finn.custom_op.xnorpopcount import xnorpopcountmatmul
showSrc(xnorpopcountmatmul)
```
%% Output
def xnorpopcountmatmul(inp0, inp1):
    # extract the operand shapes
    (M, K0) = inp0.shape
    (K1, N) = inp1.shape
    # make sure shapes are compatible with matmul
    assert K0 == K1
    K = K0
    # we simulate XNOR-popcount matrix multiplication as a regular bipolar
    # matrix multiplication followed by some post processing
    # first, convert binary inputs to bipolar
    inp0_bipolar = 2.0 * inp0 - 1.0
    inp1_bipolar = 2.0 * inp1 - 1.0
    # call regular numpy matrix multiplication
    out = np.matmul(inp0_bipolar, inp1_bipolar)
    # XNOR-popcount does not produce the regular dot product result --
    # it returns the number of +1s after XNOR. let P be the number of +1s
    # and N be the number of -1s. XNOR-popcount returns P, whereas the
    # regular dot product result from numpy is P-N, so we need to apply
    # some correction.
    # out = P-N
    # K = P+N
    # out + K = 2P, so P = (out + K)/2
    return (out + K) * 0.5
%% Cell type:markdown id: tags:
The function describes the node's behaviour in Python and can thus calculate the result of the node.
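As a quick illustration (made-up 0/1 operands), the result equals the number of bit positions where the two vectors agree:
%% Cell type:code id: tags:
``` python
import numpy as np

a = np.asarray([[1, 0, 1, 1]], dtype=np.float32)
b = np.asarray([[1], [0], [0], [1]], dtype=np.float32)
# the bits agree at positions 0, 1 and 3, so the XNOR-popcount result is 3
print(xnorpopcountmatmul(a, b))  # [[3.]]
```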
%% Cell type:markdown id: tags:
This execution function and onnxruntime are used when `execute_onnx` from [onnx_exec](https://github.com/Xilinx/finn/blob/dev/src/finn/core/onnx_exec.py) is applied to the model. The model is then simulated node by node and the results are stored in a context dictionary, which contains the values of each tensor at the end of the execution. To get the result, only the output tensor has to be extracted.
The procedure is shown below. We take the model after the Brevitas export and the basic network transformations and generate an input tensor to pass to the execution function. The input tensor is generated from the Brevitas example inputs.
%% Cell type:code id: tags:
``` python
from pkgutil import get_data
import onnx
import onnx.numpy_helper as np_helper
raw_i = get_data("finn", "data/onnx/mnist-conv/test_data_set_0/input_0.pb")
input_tensor = onnx.load_tensor_from_string(raw_i)
input_dict = {"global_in": np_helper.to_array(input_tensor)}
model_for_sim = ModelWrapper(build_dir+"/tfc_w1_a1_after_brevitas_export.onnx")
```
%% Cell type:code id: tags:
``` python
import finn.core.onnx_exec as oxe
output_dict = oxe.execute_onnx(model_for_sim, input_dict)
pysim_output = output_dict[list(output_dict.keys())[0]]
pysim_output
```
%% Output
array([[ 3.3252678 , -2.5652065 , 9.215742 , -1.4251148 , 1.4251148 ,
-3.3727715 , 0.28502294, -0.5700459 , 7.07807 , -1.2826033 ]],
dtype=float32)
%% Cell type:markdown id: tags:
This result can now be compared with a theoretical calculated value for verification. This is not done in this notebook, but there are some tests in the FINN repository that demonstrate such a procedure. They can be found [here](https://github.com/Xilinx/finn/tree/dev/tests).
%% Cell type:markdown id: tags:
### Simulation (npysim) using C++ <a id='npysim'></a>
When dealing with HLS custom op nodes in FINN, simulation using Python is no longer sufficient. After the nodes have been converted to HLS layers, simulation using C++ can be used instead. To do this, the input tensor is stored in an .npy file, and C++ code is generated that reads the values from the .npy file, streams them to the corresponding finn-hlslib function and writes the result to a new .npy file. This in turn can be read in Python and processed further in the FINN flow. For this example the model after the conversion to HLS layers is used.
%% Cell type:code id: tags:
``` python
model_for_npysim = ModelWrapper(build_dir+"/tfc_w1_a1_after_conv_to_hls.onnx")
```
%% Cell type:markdown id: tags:
To generate the code for this simulation and to generate the executable, two transformations are used:
* `CodeGen_npysim` which generates the C++ code for the corresponding HLS layer
* `Compile` which compiles the C++ code and stores the path to the executable
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.codegen_npysim import CodeGen_npysim
from finn.transformation.fpgadataflow.compile import Compile
model_for_npysim = model_for_npysim.transform(CodeGen_npysim())
model_for_npysim = model_for_npysim.transform(Compile())
```
%% Cell type:markdown id: tags:
When we take a look at the model using netron, we can see that the transformations introduced new attributes.
%% Cell type:code id: tags:
``` python
model_for_npysim.save(build_dir+"/tfc_w1_a1_after_conv_to_hls.onnx")
netron.start(build_dir+"/tfc_w1_a1_after_conv_to_hls.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'tfc_w1_a1_after_conv_to_hls.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
The following node attributes have been added:
* `code_gen_dir_npysim` indicates the directory where the files for the simulation using C++ are stored
* `executable_path` specifies the path to the executable
We now take a closer look into the files that were generated:
%% Cell type:code id: tags:
``` python
fc0 = model_for_npysim.graph.node[2]
fc0w = getCustomOp(fc0)
code_gen_dir = fc0w.get_nodeattr("code_gen_dir_npysim")
!ls {code_gen_dir}
```
%% Output
compile.sh execute_StreamingFCLayer_Batch.cpp node_model params.h thresh.h
%% Cell type:markdown id: tags:
Besides the .cpp file, the folder contains .h files with the weights and thresholds. The shell script *compile.sh* contains the compile command, and *node_model* is the executable produced by the compilation. Comparing this with the `executable_path` node attribute shows that the attribute points exactly to *node_model*.
To simulate the model, the *simulation mode* (`sim_mode`) must be set to "npysim". This is done using the transformation `SetSimMode`.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.set_sim_mode import SetSimMode
model_for_npysim = model_for_npysim.transform(SetSimMode("npysim"))
```
%% Cell type:markdown id: tags:
Now the model can be executed using `execute_onnx`. The function reads the `sim_mode` attribute and writes the input into the correct directory as a .npy file. To be able to read this in C++, there is an additional .hpp file ([npy2apintstream.hpp](https://github.com/Xilinx/finn/blob/dev/src/finn/data/cpp/npy2apintstream.hpp)) in FINN, which uses [cnpy](https://github.com/rogersce/cnpy) to read .npy files and convert them into streams, or to read a stream and write it into an .npy file. cnpy is a helper library to read and write .npy and .npz formats in C++.
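The Python side of this exchange is plain .npy I/O; a minimal round-trip sketch (independent of FINN, with a hypothetical file path):
%% Cell type:code id: tags:
``` python
import numpy as np

# the npysim flow hands tensors to the compiled node model as .npy files
# and reads the result back the same way
x = np.asarray([[1.0, 0.0, 1.0]], dtype=np.float32)
np.save("/tmp/demo_input.npy", x)
print(np.load("/tmp/demo_input.npy"))
```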
%% Cell type:code id: tags:
``` python
output_dict = oxe.execute_onnx(model_for_npysim, input_dict)
npysim_output = output_dict[list(output_dict.keys())[0]]
npysim_output
```
%% Output
array([[ 3.3252678 , -2.5652065 , 9.215742 , -1.4251148 , 1.4251148 ,
-3.3727715 , 0.28502294, -0.5700459 , 7.07807 , -1.2826033 ]],
dtype=float32)
%% Cell type:markdown id: tags:
This result can now be compared with the previous simulation using Python.
%% Cell type:code id: tags:
``` python
if (pysim_output == npysim_output).all():
    print("Results are the same")
else:
    raise Exception("Results are not the same")
```
%% Output
Results are the same
%% Cell type:markdown id: tags:
### Emulation (rtlsim) using PyVerilator <a id='rtlsim'></a>
The emulation using [PyVerilator](https://github.com/maltanar/pyverilator) can be done after IP blocks have been generated from the corresponding HLS layers. PyVerilator is a tool which makes it possible to simulate Verilog files using Verilator via a Python interface.
In the first step we load the model after IP blocks have been created from the layers and set `sim_mode` to "rtlsim".
%% Cell type:code id: tags:
``` python
model_for_rtlsim = ModelWrapper(build_dir+"/tfc_w1_a1_after_hls_ip_per_layer.onnx")
model_for_rtlsim = model_for_rtlsim.transform(SetSimMode("rtlsim"))
```
%% Cell type:markdown id: tags:
Because the necessary files for the emulation were already generated in sections [Generating HLS Code](#hls_per_layer) and [Synthesizing HLS to IP Blocks](#hls_synth), the model can be executed directly in the next step.
%% Cell type:code id: tags:
``` python
model_for_rtlsim.save(build_dir+"/tfc_w1_a1_after_hls_ip_per_layer.onnx")
netron.start(build_dir+"/tfc_w1_a1_after_hls_ip_per_layer.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'tfc_w1_a1_after_hls_ip_per_layer.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:code id: tags:
``` python
output_dict = oxe.execute_onnx(model_for_rtlsim, input_dict)
rtlsim_output = output_dict[list(output_dict.keys())[0]]
rtlsim_output
```
%% Output
MultiThreshold
StreamingFCLayer_Batch
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-12-96db733648ff> in <module>
----> 1 output_dict = oxe.execute_onnx(model_for_rtlsim, input_dict)
2 rtlsim_output = output_dict[list(output_dict.keys())[0]]
3 rtlsim_output
/workspace/finn/src/finn/core/onnx_exec.py in execute_onnx(model, input_dict, return_full_exec_context)
118 # topologically sorted
119 for node in graph.node:
--> 120 execute_node(node, execution_context, graph)
121 if return_full_exec_context:
122 return execution_context
/workspace/finn/src/finn/core/onnx_exec.py in execute_node(node, context, graph)
41 if node.domain == "finn":
42
---> 43 ex_cu_node.execute_custom_node(node, context, graph)
44
45 else:
/workspace/finn/src/finn/core/execute_custom_node.py in execute_custom_node(node, context, graph)
10 inst = registry.custom_op[op_type](node)
11 print(op_type)
---> 12 inst.execute_node(context, graph)
13 except KeyError:
14 # exception if op_type is not supported
/workspace/finn/src/finn/custom_op/fpgadataflow/streamingfclayer_batch.py in execute_node(self, context, graph)
496 raise Exception(
497 """Found no verilog files for this node,
--> 498 did you run the codegen_ipgen transformation?"""
499 )
500
Exception: Found no verilog files for this node,
did you run the codegen_ipgen transformation?
%% Cell type:code id: tags:
``` python
```