%% Cell type:markdown id: tags:
# FINN - End-to-End Flow
-----------------------------------------------------------------
This notebook gives an overview of the end-to-end flow of FINN: from loading an ONNX model exported from Brevitas, through the numerous transformations in FINN, up to the generation of a bitstream that can be loaded onto an FPGA.
We'll use the following `showSrc` function to print the source code of function calls in the Jupyter notebook.
%% Cell type:code id: tags:
``` python
import inspect
def showSrc(what):
    print("".join(inspect.getsourcelines(what)[0]))
```
%% Cell type:markdown id: tags:
## Overview
The notebook is based on the following diagram.
%% Cell type:markdown id: tags:
![](finn-design-flow-example.svg)
%% Cell type:markdown id: tags:
The diagram visualizes the end-to-end flow of FINN. The cylinder-like fields show the state of the network representation in the respective step, while the rectangular fields represent the transformations that are applied to the network to achieve a certain result. The diagram is divided into 5 blocks, each of which includes several flow steps. The flow starts in the top left corner with the Brevitas export (pink block), followed by the preparation of the network (grey block) for Vivado HLS and Vivado synthesis (yellow block). There is also a section for testing and verification in software (green block) and the hardware test on the PYNQ board (red block).
The diagram leads to the following outline for this Jupyter notebook.
%% Cell type:markdown id: tags:
## Outline
-------------
1. [Brevitas export](#brev_exp)
2. [Network preparation](#nw_prep)
* Basic transformations
* Streamlining
* Conversion to HLS layers
* Folding
3. [Vivado HLS and Vivado synthesis](#vivado)
* HLS IP per layer
* Creation of stitched design
* PYNQ shell project
* Synthesis, place and route
4. [Hardware Test](#hw_test)
5. [Simulation & Emulation flows for functional verification](#sim)
* Simulation using Python
* Simulation (npysim) using C++
* Emulation (rtlsim) using PyVerilator
%% Cell type:markdown id: tags:
## 1. Brevitas export <a id='brev_exp'></a>
FINN expects an ONNX model as input. This can be a model trained with [Brevitas](https://github.com/Xilinx/brevitas). Brevitas is a PyTorch library for quantization-aware training, and the FINN Docker image comes with several [example Brevitas networks](https://github.com/maltanar/brevitas_cnv_lfc). To show the FINN end-to-end flow, we'll use the LFC-w1a1 model as the example network. The Brevitas export is only briefly described here; for details see Jupyter notebook [3-FINN-Brevitas-network-import](3-FINN-Brevitas-network-import.ipynb).
First a few things have to be imported. Then the model can be loaded with the pretrained weights.
%% Cell type:code id: tags:
``` python
import torch
import brevitas.onnx as bo
from models.LFC import LFC
lfc = LFC(weight_bit_width=1, act_bit_width=1, in_bit_width=1)
trained_lfc_checkpoint = ("/workspace/brevitas_cnv_lfc/pretrained_models/LFC_1W1A/checkpoints/best.tar")
checkpoint = torch.load(trained_lfc_checkpoint, map_location="cpu")
lfc.load_state_dict(checkpoint["state_dict"])
bo.export_finn_onnx(lfc, (1, 1, 28, 28), "lfc_w1_a1.onnx")
```
%% Output
/workspace/brevitas_cnv_lfc/training_scripts/models/LFC.py:73: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
x = 2.0 * x - torch.tensor([1.0])
%% Cell type:markdown id: tags:
The model has now been exported, loaded with the pretrained weights and saved under the name "lfc_w1_a1.onnx".
To visualize the exported model, Netron can be used. Netron is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on the individual nodes and view the properties.
%% Cell type:code id: tags:
``` python
import netron
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
Now that we have the model in .onnx format, we can work with it using FINN. For this, the FINN `ModelWrapper` is used. It is a wrapper around the ONNX model which provides several helper functions to make it easier to work with the model. For details see Jupyter notebook [2-FINN-ModelWrapper](2-FINN-ModelWrapper.ipynb).
%% Cell type:code id: tags:
``` python
from finn.core.modelwrapper import ModelWrapper
model = ModelWrapper("lfc_w1_a1.onnx")
```
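%% Cell type:markdown id: tags:
As a quick sanity check, we can already inspect a few properties of the wrapped model. This is a minimal sketch using only standard ONNX protobuf fields; note that the tensors still carry their raw export names at this point, since the renaming transformations have not yet been applied.
%% Cell type:code id: tags:
``` python
# Minimal sketch: inspect the wrapped model via standard ONNX fields.
print("Number of nodes:", len(model.graph.node))
print("Input tensor name:", model.graph.input[0].name)
```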
%% Cell type:markdown id: tags:
Now the model is prepared and could be simulated using Python. How this works is described in the subsection [Simulation using Python](#simpy) of the section on *Simulation & Emulation flows for functional verification*.
The model can now also be processed in different ways. FINN is built around analysis and transformation passes, which can be applied to the model. An analysis pass extracts specific information about the model and returns it to the user in the form of a dictionary; for more details see [4-FINN-HowToAnalysisPass](4-FINN-HowToAnalysisPass.ipynb). A transformation pass changes the model and returns the changed model back to the FINN flow; for more information about transformation passes see notebook [5-FINN-HowToTransformationPass](5-FINN-HowToTransformationPass.ipynb). A tiny example of an analysis pass is sketched below.
Since the goal in this notebook is to process the model to such an extent that a bitstream can be generated from it, the focus is on the transformations that are necessary for this. These are discussed in more detail in the next section.
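%% Cell type:markdown id: tags:
To make the analysis-pass concept concrete, here is a minimal sketch (not part of FINN) of a pass that counts how often each op_type occurs in the graph and returns the result as a dictionary:
%% Cell type:code id: tags:
``` python
# Minimal sketch of an analysis pass: take a (wrapped) model and
# return the extracted information as a dictionary.
def count_op_types(model):
    counts = {}
    for node in model.graph.node:
        counts[node.op_type] = counts.get(node.op_type, 0) + 1
    return {"op_type_counts": counts}

count_op_types(model)
```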
%% Cell type:markdown id: tags:
## 2. Network preparation <a id='nw_prep'></a>
* [Basic transformations](#basic_trafo)
* [Streamlining](#streamline)
* [Conversion to HLS layers](#hls_layers)
* [Folding](#folding)
%% Cell type:markdown id: tags:
### Basic transformations <a id='basic_trafo'></a>
This section deals with the basic transformations, which are applied to the model as a kind of clean-up. They do not appear in the diagram above, but they are applied at many points in the FINN flow to postprocess the model after a transformation and/or to prepare it for the next one.
%% Cell type:markdown id: tags:
The basic transformations are:
* GiveUniqueNodeNames
* GiveReadableTensorNames
* InferShapes
* InferDataTypes
* FoldConstants
%% Cell type:markdown id: tags:
These transformations work like a clean-up. The first two transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames`) give the nodes and tensors unique and readable names. The following two transformations (`InferShapes`, `InferDataTypes`) derive the shapes and data types of the tensors from the model properties and set them in the `ValueInfo` of the model. These transformations can normally always be applied; they do not affect the structure of the graph, but ensure that all the required information is available.
The last listed transformation is `FoldConstants`. It identifies nodes whose inputs are all constants, computes their outputs, sets the results as constant inputs of the following nodes and removes the original nodes. Although this transformation changes the structure of the model, it is usually desired and can be applied to any model.
%% Cell type:markdown id: tags:
The transformations can be imported and applied as follows.
%% Cell type:code id: tags:
``` python
from finn.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames
from finn.transformation.infer_shapes import InferShapes
from finn.transformation.infer_datatypes import InferDataTypes
from finn.transformation.fold_constants import FoldConstants
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())
# save model with other name for section "Simulation using Python"
model.save("lfc_w1_a1_after_brevitas_export.onnx")
```
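%% Cell type:markdown id: tags:
As a minimal check (a sketch, not part of the FINN flow), we can confirm directly in Python that the nodes have received readable names:
%% Cell type:code id: tags:
``` python
# The first few nodes should now carry unique, readable names.
print([n.name for n in model.graph.node[:4]])
```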
%% Cell type:markdown id: tags:
The result of these transformations can also be viewed with netron after the model has been saved again. By clicking on the individual nodes, it can now be seen, for example, that each node has been given a name. The whole upper area has also been folded away by constant folding, so that the first node is now "Reshape".
%% Cell type:code id: tags:
``` python
model.save("lfc_w1_a1.onnx")
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
### Streamlining <a id='streamline'></a>
Streamlining is a transformation containing several sub-transformations. The goal of streamlining is to eliminate floating point operations by moving them around, collapsing them into one operation and, in the last step, transforming them into multithresholding nodes. For the theoretical background see [arXiv:1709.04060](https://arxiv.org/abs/1709.04060).
The streamlining transformation is shown below.
%% Cell type:code id: tags:
``` python
from finn.transformation.streamline import Streamline
showSrc(Streamline)
```
%% Output
class Streamline(Transformation):
    """Apply the streamlining transform, see arXiv:1709.04060."""

    def apply(self, model):
        streamline_transformations = [
            ConvertSubToAdd(),
            BatchNormToAffine(),
            ConvertSignToThres(),
            MoveScalarAddPastMatMul(),
            MoveScalarMulPastMatMul(),
            MoveAddPastMul(),
            CollapseRepeatedAdd(),
            CollapseRepeatedMul(),
            AbsorbAddIntoMultiThreshold(),
            FactorOutMulSignMagnitude(),
            AbsorbMulIntoMultiThreshold(),
            Absorb1BitMulIntoMatMul(),
            RoundAndClipThresholds(),
        ]
        for trn in streamline_transformations:
            model = model.transform(trn)
            model = model.transform(GiveUniqueNodeNames())
            model = model.transform(GiveReadableTensorNames())
            model = model.transform(InferDataTypes())
        return (model, False)
%% Cell type:markdown id: tags:
As can be seen, several transformations are involved in the streamlining transformation. There are move and collapse transformations, and in the last step the operations are transformed into multithresholds. The involved transformations can be viewed in detail [here](https://github.com/Xilinx/finn/tree/dev/src/finn/transformation/streamline). After each transformation, three of the basic transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames` and `InferDataTypes`) are applied to the model as clean-up. To build some intuition for the "move" transformations, the small numpy check below illustrates the identity that `MoveScalarMulPastMatMul` exploits.
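%% Cell type:code id: tags:
``` python
import numpy as np

# Minimal sketch (not FINN code): a scalar Mul commutes with MatMul,
# i.e. (a * x) @ W == a * (x @ W), so the Mul can be moved past the
# MatMul and later collapsed or absorbed into a MultiThreshold.
a = 0.5
x = np.random.rand(1, 4)
W = np.random.rand(4, 3)
assert np.allclose((a * x) @ W, a * (x @ W))
```
%% Cell type:markdown id: tags:
After streamlining, the network looks as follows.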
%% Cell type:code id: tags:
``` python
model = model.transform(Streamline())
model.save("lfc_w1_a1.onnx")
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
Our example network is a quantized network with 1-bit precision. For this reason, after streamlining, the resulting bipolar matrix multiplications are converted into XNOR-popcount operations. This transformation produces operations that are again collapsed and converted into thresholds. This procedure is shown below.
In this state the model can still be simulated with Python, even though it no longer contains only standard ONNX nodes. For details, see section [Simulation using Python](#simpy).
After these finishing transformations, the nodes can be converted to HLS layers for further processing.
%% Cell type:code id: tags:
``` python
from finn.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
import finn.transformation.streamline.absorb as absorb
from finn.transformation.streamline.round_thresholds import RoundAndClipThresholds
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(absorb.AbsorbAddIntoMultiThreshold())
model = model.transform(absorb.AbsorbMulIntoMultiThreshold())
model = model.transform(RoundAndClipThresholds())
```
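%% Cell type:markdown id: tags:
As a quick check (a sketch, not part of the FINN flow), we can list which op_types are now present in the graph; after the conversion we expect to see XnorPopcountMatMul and MultiThreshold nodes among them:
%% Cell type:code id: tags:
``` python
# List the distinct op_types in the transformed graph.
print(sorted(set(n.op_type for n in model.graph.node)))
```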
%% Cell type:markdown id: tags:
### Conversion to HLS layers <a id='hls_layers'></a>
This transformation converts the nodes to HLS layers that correspond to the functions in the [finn-hlslib](https://finn-hlslib.readthedocs.io/en/latest/) library. In our case it converts pairs of binary XnorPopcountMatMul layers to StreamingFCLayer_Batch layers. Any immediately following MultiThreshold layers will also be absorbed into the MVTU (Matrix-Vector-Threshold Unit).
Below is the code for the transformation, and the network is visualized using netron to show the new structure, which now corresponds to the finn-hlslib library.
%% Cell type:code id: tags:
``` python
import finn.transformation.fpgadataflow.convert_to_hls_layers as to_hls
model = model.transform(to_hls.InferBinaryStreamingFCLayer())
model.save("lfc_w1_a1.onnx")
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
Each StreamingFCLayer_Batch node has two attributes that specify the degree of folding, PE and SIMD. In all nodes these attributes are set to 1 by default, which corresponds to maximum folding. The user can now adjust the folding as desired; this is described in the next section.
At this point the model can also be simulated using C++. The exact procedure is described in section [Simulation (npysim) using C++](#npysim).
%% Cell type:markdown id: tags:
### Folding <a id='folding'></a>
Since the folding parameters are node attributes, they can be easily accessed and changed using a helper function of the `ModelWrapper`. But first we have to extract the nodes which are StreamingFCLayer_Batch operations. This is where netron helps us: in the above diagram we can see that the third to sixth nodes are StreamingFCLayer_Batch nodes. Through the `print` statements we can check whether the extracted nodes all have the op_type "StreamingFCLayer_Batch". For more details on how to work with an ONNX model, see Jupyter notebook [1-FINN-HowToWorkWithONNX](1-FINN-HowToWorkWithONNX.ipynb).
%% Cell type:code id: tags:
``` python
fc0 = model.graph.node[2]
fc1 = model.graph.node[3]
fc2 = model.graph.node[4]
fc3 = model.graph.node[5]
print("fc0 has the op_type: " + str(fc0.op_type))
print("fc1 has the op_type: " + str(fc1.op_type))
print("fc2 has the op_type: " + str(fc2.op_type))
print("fc3 has the op_type: " + str(fc3.op_type))
```
%% Output
fc0 has the op_type: StreamingFCLayer_Batch
fc1 has the op_type: StreamingFCLayer_Batch
fc2 has the op_type: StreamingFCLayer_Batch
fc3 has the op_type: StreamingFCLayer_Batch
%% Cell type:markdown id: tags:
Now we can use the [HLSCustomOp](https://github.com/Xilinx/finn/blob/dev/src/finn/custom_op/fpgadataflow/__init__.py) class to create a [StreamingFCLayer_Batch](https://github.com/Xilinx/finn/blob/dev/src/finn/custom_op/fpgadataflow/streamingfclayer_batch.py) object for each node to set PE and SIMD. This procedure is identical for each node. For more details about custom ops, see Jupyter notebook [7-FINN-CustomOps](7-FINN-CustomOps.ipynb).
%% Cell type:code id: tags:
``` python
from finn.custom_op.fpgadataflow.streamingfclayer_batch import StreamingFCLayer_Batch
fc0w = StreamingFCLayer_Batch(fc0)
fc0w.set_nodeattr("SIMD", 784)
fc0w.set_nodeattr("PE", 32)
fc1w = StreamingFCLayer_Batch(fc1)
fc1w.set_nodeattr("SIMD", 1024)
fc1w.set_nodeattr("PE", 32)
fc2w = StreamingFCLayer_Batch(fc2)
fc2w.set_nodeattr("SIMD", 1024)
fc2w.set_nodeattr("PE", 32)
fc3w = StreamingFCLayer_Batch(fc3)
fc3w.set_nodeattr("SIMD", 1024)
fc3w.set_nodeattr("PE", 10)
```
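%% Cell type:markdown id: tags:
As a rough back-of-the-envelope sketch (an assumption based on the usual fully-connected folding relation, not a FINN function), the number of clock cycles such a layer needs per input grows with the folding: a layer with MW inputs and MH outputs needs roughly (MW / SIMD) * (MH / PE) cycles.
%% Cell type:code id: tags:
``` python
# Hypothetical helper to estimate cycles per input from the folding
# factors, assuming cycles ~ (MW / SIMD) * (MH / PE).
def approx_cycles(mw, mh, simd, pe):
    return (mw // simd) * (mh // pe)

print("fc0:", approx_cycles(784, 1024, 784, 32))    # expect 32
print("fc1:", approx_cycles(1024, 1024, 1024, 32))  # expect 32
print("fc2:", approx_cycles(1024, 1024, 1024, 32))  # expect 32
print("fc3:", approx_cycles(1024, 10, 1024, 10))    # expect 1
```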
%% Cell type:markdown id: tags:
This completes the network preparation, and the network can be passed on to the next block, *Vivado HLS and Vivado synthesis*, which is described below.
%% Cell type:markdown id: tags:
## 3. Vivado HLS and Vivado synthesis <a id='vivado'></a>
* [HLS IP per layer](#hls_per_layer)
* [Creation of stitched design](#stitched_design)
* [PYNQ shell project](#pynq_shell)
* [Synthesis, place and route](#synth_pl_ro)
%% Cell type:markdown id: tags:
### HLS IP per layer <a id='hls_per_layer'></a>
This section deals with the generation of an IP block from each of the different layers. These can then be stitched together into a block design that corresponds to the complete model. Converting each layer into its own IP block provides good transparency: we can check the functionality of each IP block and compare it with the behaviour of the corresponding ONNX node. The emulation of such an IP block is performed with PyVerilator and is described in detail in section [Emulation (rtlsim) using PyVerilator](#rtlsim).
%% Cell type:markdown id: tags:
Two transformations are required to generate HLS IP blocks for each layer:
* `CodeGen_ipgen` which generates the C++ code for the node and a tcl-script which starts the HLS synthesis and exports the design as IP.
* `HLSSynth_IPGen` which passes the tcl-script to Vivado and thus performs the actual IP generation.
First the basic transformation `GiveUniqueNodeNames` is applied, then the two transformations necessary for IP block creation are performed. This will take some time, since Vivado is called, HLS synthesis is performed for each StreamingFCLayer_Batch node in the design, and each node is exported as an IP block. `CodeGen_ipgen` takes as arguments the FPGA part as a string and the clock period in ns as an integer.
%% Cell type:code id: tags:
``` python
model = model.transform(GiveUniqueNodeNames())
from finn.transformation.fpgadataflow.codegen_ipgen import CodeGen_ipgen
from finn.transformation.fpgadataflow.hlssynth_ipgen import HLSSynth_IPGen
model = model.transform(CodeGen_ipgen("xc7z020clg400-1", 5))
model = model.transform(HLSSynth_IPGen())
```
%% Cell type:markdown id: tags:
Each StreamingFCLayer_Batch node now has new attributes which can be examined more closely with netron.
%% Cell type:code id: tags:
``` python
model.save("lfc_w1_a1.onnx")
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
There are two additional attributes:
* `code_gen_dir_ipgen` which contains the directory path where all the files generated by the ipgen transformations are stored
* `ipgen_path` which contains the path to the project directory in which the generated IP block is stored
We can further investigate which files are produced by taking a look into this directory, for example for the first StreamingFCLayer_Batch node.
%% Cell type:code id: tags:
``` python
fc0 = model.graph.node[2]
fc0w = StreamingFCLayer_Batch(fc0)
code_gen_dir = fc0w.get_nodeattr("code_gen_dir_ipgen")
!ls {code_gen_dir}
```
%% Output
hls_syn_StreamingFCLayer_Batch_0.tcl thresh.h
ipgen.sh top_StreamingFCLayer_Batch_0.cpp
params.h vivado_hls.log
project_StreamingFCLayer_Batch_0
%% Cell type:markdown id: tags:
The directory *project_StreamingFCLayer_Batch_0* contains the project created by Vivado HLS into which the IP block is exported, along with other files generated by Vivado HLS. If we compare it to the above netron visualization of the network, this is exactly the folder name stored in the node attribute `ipgen_path`. The .cpp code that is passed to Vivado HLS can be found in the file *top_StreamingFCLayer_Batch_0.cpp*. The files *params.h* and *thresh.h* belong to it as well; they contain the values for the weights and thresholds. *vivado_hls.log* is the log file from Vivado HLS. Besides these files, the folder contains *ipgen.sh* and *hls_syn_StreamingFCLayer_Batch_0.tcl*. First we take a look at *ipgen.sh*.
%% Cell type:code id: tags:
``` python
shell_script = code_gen_dir + "/ipgen.sh"
!cat {shell_script}
```
%% Output
#!/bin/bash
cd /tmp/code_gen_ipgen_StreamingFCLayer_Batch_hkd3xuxq
vivado_hls /tmp/code_gen_ipgen_StreamingFCLayer_Batch_hkd3xuxq/hls_syn_StreamingFCLayer_Batch_0.tcl
cd /workspace/finn
%% Cell type:markdown id: tags:
The script consists only of two framing `cd` commands and a command that passes the tcl script to *vivado_hls*. The working directory has to be changed so that the files are created in the correct folder; afterwards it is changed back to the original directory.
Below is the tcl script which is passed to *vivado_hls*.
%% Cell type:code id: tags:
``` python
tcl_script = code_gen_dir + "/hls_syn_StreamingFCLayer_Batch_0.tcl"
!cat {tcl_script}
```
%% Output
set config_proj_name project_StreamingFCLayer_Batch_0
puts "HLS project: $config_proj_name"
set config_hwsrcdir "/tmp/code_gen_ipgen_StreamingFCLayer_Batch_hkd3xuxq"
puts "HW source dir: $config_hwsrcdir"
set config_proj_part "xc7z020clg400-1"
set config_bnnlibdir "/workspace/finn-hlslib"
set config_toplevelfxn "StreamingFCLayer_Batch_0"
set config_clkperiod 5
open_project $config_proj_name
add_files $config_hwsrcdir/top_StreamingFCLayer_Batch_0.cpp -cflags "-std=c++0x -I$config_bnnlibdir"
set_top $config_toplevelfxn
open_solution sol1
set_part $config_proj_part
config_interface -m_axi_addr64
create_clock -period $config_clkperiod -name default
csynth_design
export_design -format ip_catalog
exit 0
%% Cell type:markdown id: tags:
In the first part of the script the project is configured; for example, the FPGA part and the clock period are set. Then the project is opened and the files are added. The top-level function is set and, after creating a clock, the design is first synthesized with `csynth_design` and then exported as an IP block.
Now that all IP blocks are in place, they can be stitched together to create an IP design that matches the ONNX model. This is covered in the next section.
%% Cell type:markdown id: tags:
### Creation of stitched design <a id='stitched_design'></a>
The goal of this transformation is the creation of a Vivado IP Block Design project from all the generated IPs of a model. All nodes in the model must have the fpgadataflow backend attribute, and the CodeGen_ipgen transformation must have been previously run on the model. The resulting block design is also packaged as IP. A small sanity check of this precondition is sketched below.
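%% Cell type:markdown id: tags:
The following minimal sketch (not a FINN function) checks the precondition by reading the `backend` attribute directly from the ONNX protobuf of each node; nodes without the attribute print an empty list:
%% Cell type:code id: tags:
``` python
# Verify that the nodes carry backend == "fpgadataflow" before
# attempting to stitch the generated IP blocks together.
for n in model.graph.node:
    backend = [a.s.decode("utf-8") for a in n.attribute if a.name == "backend"]
    print(n.name, backend)
```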
%% Cell type:markdown id: tags:
### PYNQ shell project <a id='pynq_shell'></a>
%% Cell type:markdown id: tags:
### Synthesis, place and route <a id='synth_pl_ro'></a>
%% Cell type:markdown id: tags:
## 4. Hardware test <a id='hw_test'></a>
%% Cell type:markdown id: tags:
## 5. Simulation & Emulation flows for functional verification <a id='sim'></a>
* [Simulation using Python](#simpy)
* [Simulation (npysim) using C++](#npysim)
* [Emulation (rtlsim) using PyVerilator](#rtlsim)
%% Cell type:markdown id: tags:
### Simulation using Python <a id='simpy'></a>
If an ONNX model consists of [standard ONNX nodes](https://github.com/onnx/onnx/blob/master/docs/Operators.md) and/or FINN custom operations that do not belong to the fpgadataflow backend (`backend` $\neq$ "fpgadataflow"), the model can be checked for functionality using Python. General information about FINN custom op nodes can be found in Jupyter notebook [7-FINN-CustomOps](7-FINN-CustomOps.ipynb).
%% Cell type:markdown id: tags:
To simulate an ONNX node, [ONNX Runtime](https://github.com/microsoft/onnxruntime) is used. ONNX Runtime is an open source tool developed by Microsoft to run standard ONNX nodes. For the FINN custom op nodes, execution functions are defined. The following is an example: the execution function of an XNOR-popcount node.
%% Cell type:code id: tags:
``` python
from finn.custom_op.xnorpopcount import xnorpopcountmatmul
showSrc(xnorpopcountmatmul)
```
%% Output
def xnorpopcountmatmul(inp0, inp1):
    # extract the operand shapes
    (M, K0) = inp0.shape
    (K1, N) = inp1.shape
    # make sure shapes are compatible with matmul
    assert K0 == K1
    K = K0
    # we simulate XNOR-popcount matrix multiplication as a regular bipolar
    # matrix multiplication followed by some post processing
    # first, convert binary inputs to bipolar
    inp0_bipolar = 2.0 * inp0 - 1.0
    inp1_bipolar = 2.0 * inp1 - 1.0
    # call regular numpy matrix multiplication
    out = np.matmul(inp0_bipolar, inp1_bipolar)
    # XNOR-popcount does not produce the regular dot product result --
    # it returns the number of +1s after XNOR. let P be the number of +1s
    # and N be the number of -1s. XNOR-popcount returns P, whereas the
    # regular dot product result from numpy is P-N, so we need to apply
    # some correction.
    # out = P-N
    # K = P+N
    # out + K = 2P, so P = (out + K)/2
    return (out + K) * 0.5
%% Cell type:markdown id: tags:
The function contains a description of the node's behaviour in Python and can thus calculate the result of the node. A quick numerical sanity check of the identity derived in the comments is sketched below (not part of FINN).
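%% Cell type:code id: tags:
``` python
import numpy as np

# Sanity check: for binary {0, 1} inputs, XNOR-popcount counts the
# positions where both operands agree. Here positions 0 and 3 agree,
# so the expected result is [[2.]].
inp0 = np.array([[1.0, 0.0, 1.0, 1.0]])
inp1 = np.array([[1.0], [1.0], [0.0], [1.0]])
print(xnorpopcountmatmul(inp0, inp1))
```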
%% Cell type:markdown id: tags:
This execution function and onnxruntime are used when `execute_onnx` from [onnx_exec](https://github.com/Xilinx/finn/blob/dev/src/finn/core/onnx_exec.py) is applied to the model. The model is then simulated node by node and the results are stored in a context dictionary, which contains the values of each tensor at the end of the execution. To get the result, only the output tensor has to be extracted.
The procedure is shown below. We take the model after the Brevitas export and the basic network transformations and generate an input tensor to pass to the execution function. The input tensor is generated from the Brevitas example inputs.
%% Cell type:code id: tags:
``` python
from pkgutil import get_data
import onnx
import onnx.numpy_helper as np_helper
raw_i = get_data("finn", "data/onnx/mnist-conv/test_data_set_0/input_0.pb")
input_tensor = onnx.load_tensor_from_string(raw_i)
input_dict = {"global_in": np_helper.to_array(input_tensor)}
model_for_sim = ModelWrapper("lfc_w1_a1_after_brevitas_export.onnx")
```
%% Cell type:code id: tags:
``` python
import finn.core.onnx_exec as oxe
output_dict = oxe.execute_onnx(model_for_sim, input_dict)
output_dict
```
%% Output
{'global_out': array([[ 3.3252678 , -2.5652065 , 9.215742 , -1.4251148 , 1.4251148 ,
-3.3727715 , 0.28502294, -0.5700459 , 7.07807 , -1.2826033 ]],
dtype=float32)}
%% Cell type:markdown id: tags:
This result can now be compared with a reference value for verification. This is not done in this notebook, but there are some tests in the FINN repository that demonstrate such a procedure; they can be found [here](https://github.com/Xilinx/finn/tree/dev/tests). A minimal first check is sketched below.
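%% Cell type:code id: tags:
``` python
import numpy as np

# Minimal sketch (not a full verification): the predicted class is the
# argmax of the output vector. A proper check would compare against the
# Brevitas model's output on the same input.
print("Predicted class:", np.argmax(output_dict["global_out"]))
```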
%% Cell type:markdown id: tags:
### Simulation (npysim) using C++ <a id='npysim'></a>
%% Cell type:markdown id: tags:
### Emulation (rtlsim) using PyVerilator <a id='rtlsim'></a>
%% Cell type:code id: tags:
``` python
```