%% Cell type:markdown id: tags:
# FINN - End-to-End Flow
-----------------------------------------------------------------
This notebook gives an overview of the end-to-end flow of FINN: from loading an ONNX model exported from Brevitas, through the numerous transformations applied in FINN, up to the generation of a bitstream that can be used to program an FPGA.
We'll use the following `showSrc` function to print the source code of function calls in this Jupyter notebook.
%% Cell type:code id: tags:
``` python
import inspect

def showSrc(what):
    # print the source code of the given function or class
    print("".join(inspect.getsourcelines(what)[0]))
```
%% Cell type:markdown id: tags:
## Overview
The notebook is based on the following diagram.
%% Cell type:markdown id: tags:
![](finn-design-flow-example.svg)
%% Cell type:markdown id: tags:
The diagram visualizes the end-to-end flow of FINN. The cylinder-like fields show the state of the network representation at the respective step, while the rectangular fields represent the transformations applied to the network to achieve a certain result. The diagram is divided into five blocks, each of which includes several flow steps. The flow starts in the top left corner with the Brevitas export (pink block), followed by the preparation of the network (grey block) for Vivado HLS and Vivado synthesis (yellow block). There is also a section for testing and verification in software (green block) and for the hardware test on the PYNQ board (red block).
The diagram leads to the following outline for this Jupyter notebook.
%% Cell type:markdown id: tags:
## Outline
-------------
1. [Brevitas export](#brev_exp)
2. [Network preparation](#nw_prep)
* Basic transformations
* Streamlining
* Conversion to HLS layers
* Folding
3. [Vivado HLS and Vivado synthesis](#vivado)
* HLS IP per layer
* Creation of stitched design
* PYNQ shell project
* Synthesis, place and route
4. [Hardware Test](#hw_test)
5. [Simulation & Emulation flows for functional verification](#sim)
* Simulation using Python
* Simulation (npysim) using C++
* Emulation (rtlsim) using PyVerilator
%% Cell type:markdown id: tags:
## 1. Brevitas export <a id='brev_exp'></a>
FINN expects an ONNX model as input. This can be a model trained with [Brevitas](https://github.com/Xilinx/brevitas). Brevitas is a PyTorch library for quantization-aware training, and the FINN Docker image comes with several [example Brevitas networks](https://github.com/maltanar/brevitas_cnv_lfc). To show the FINN end-to-end flow, we'll use the LFC-w1a1 model as the example network. The Brevitas export is only briefly described here; for details see Jupyter notebook [3-FINN-Brevitas-network-import](3-FINN-Brevitas-network-import.ipynb).
First, a few things have to be imported. Then the model can be loaded with the pretrained weights.
%% Cell type:code id: tags:
``` python
import torch
import brevitas.onnx as bo
from models.LFC import LFC
lfc = LFC(weight_bit_width=1, act_bit_width=1, in_bit_width=1)
trained_lfc_checkpoint = ("/workspace/brevitas_cnv_lfc/pretrained_models/LFC_1W1A/checkpoints/best.tar")
checkpoint = torch.load(trained_lfc_checkpoint, map_location="cpu")
lfc.load_state_dict(checkpoint["state_dict"])
bo.export_finn_onnx(lfc, (1, 1, 28, 28), "lfc_w1_a1.onnx")
```
%% Output
/workspace/brevitas_cnv_lfc/training_scripts/models/LFC.py:73: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
x = 2.0 * x - torch.tensor([1.0])
%% Cell type:markdown id: tags:
The model has now been exported, loaded with the pretrained weights and saved under the name "lfc_w1_a1.onnx".
To visualize the exported model, Netron can be used. Netron is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on individual nodes and view their properties.
%% Cell type:code id: tags:
``` python
import netron
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
Now that we have the model in .onnx format, we can work with it using FINN. For this, the FINN `ModelWrapper` is used. It is a wrapper around the ONNX model which provides several helper functions to make it easier to work with the model. For details see Jupyter notebook [2-FINN-ModelWrapper](2-FINN-ModelWrapper.ipynb).
%% Cell type:code id: tags:
``` python
from finn.core.modelwrapper import ModelWrapper
model = ModelWrapper("lfc_w1_a1.onnx")
```
%% Cell type:markdown id: tags:
Now the model is prepared and can be processed in different ways. FINN is built around analysis and transformation passes, which can be applied to the model. An analysis pass extracts specific information about the model and returns it to the user in the form of a dictionary; for more details see [4-FINN-HowToAnalysisPass](4-FINN-HowToAnalysisPass.ipynb). A transformation pass changes the model and returns the changed model back to the FINN flow; for more information about transformation passes see notebook [5-FINN-HowToTransformationPass](5-FINN-HowToTransformationPass.ipynb).
Since the goal of this notebook is to process the model to the point where a bitstream can be generated from it, the focus is on the transformations necessary for that. These are discussed in more detail in the next section.
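As a small illustration of the analysis-pass concept (a sketch we define here, not part of FINN), the following pass counts how often each operation type occurs in the graph and returns the counts as a dictionary:
%% Cell type:code id: tags:
``` python
def count_op_types(model):
    # analysis pass sketch: map each op_type in the graph to its count
    counts = {}
    for node in model.graph.node:
        counts[node.op_type] = counts.get(node.op_type, 0) + 1
    return counts

# ModelWrapper.analysis runs the pass on the model and returns its dictionary
print(model.analysis(count_op_types))
```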
%% Cell type:markdown id: tags:
## 2. Network preparation <a id='nw_prep'></a>
* Basic transformations
* Streamlining
* Conversion to HLS layers
* Folding
%% Cell type:markdown id: tags:
### Basic transformations
This section deals with the basic transformations, which act as a kind of clean-up. They do not appear in the diagram above, but they are applied at many points in the FINN flow to postprocess the model after a transformation and/or to prepare it for the next one.
%% Cell type:markdown id: tags:
The basic transformations are:
* GiveUniqueNodeNames
* GiveReadableTensorNames
* InferShapes
* InferDataTypes
* FoldConstants
%% Cell type:markdown id: tags:
These transformations work like a clean-up. The first two (`GiveUniqueNodeNames`, `GiveReadableTensorNames`) give the nodes and tensors unique, readable names. The next two (`InferShapes`, `InferDataTypes`) derive the shapes and data types of the tensors from the model properties and set them in the `ValueInfo` of the model. These four transformations can normally always be applied: they do not affect the structure of the graph, but ensure that all the information needed later is available.
The last listed transformation is `FoldConstants`. It identifies nodes whose inputs are all constant, precomputes their outputs, sets the results as constant inputs of the following nodes and removes the old nodes. Although this transformation changes the structure of the model, it is usually always desired and can be applied to any model.
%% Cell type:markdown id: tags:
The transformations can be imported and applied as follows.
%% Cell type:code id: tags:
``` python
from finn.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames
from finn.transformation.infer_shapes import InferShapes
from finn.transformation.infer_datatypes import InferDataTypes
from finn.transformation.fold_constants import FoldConstants
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())
```
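%% Cell type:markdown id: tags:
As a quick sanity check (an optional sketch, not a required flow step), we can confirm that shape and datatype annotations are now in place, e.g. for the global input tensor:
%% Cell type:code id: tags:
``` python
# after InferShapes/InferDataTypes every tensor should carry a shape
# and a FINN datatype annotation
in_name = model.graph.input[0].name
print("input tensor:", in_name)
print("shape:", model.get_tensor_shape(in_name))
print("datatype:", model.get_tensor_datatype(in_name))
```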
%% Cell type:markdown id: tags:
The result of these transformations can be viewed with Netron after the model has been saved again. By clicking on the individual nodes, it can now be seen, for example, that each node has been given a name. Constant folding has also collapsed the whole upper area of the graph, so that the first node is now "Reshape".
%% Cell type:code id: tags:
``` python
model.save("lfc_w1_a1.onnx")
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
%% Cell type:markdown id: tags:
### Streamlining
Streamlining is a transformation containing several sub-transformations. The goal of streamlining is to eliminate floating-point operations by moving them around, then collapsing them into one operation and finally transforming them into multi-thresholding nodes. For the theoretical background see [arXiv:1709.04060](https://arxiv.org/abs/1709.04060).
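As a small illustration of the "move and collapse" idea: a scalar multiplication can be moved past a matrix multiplication, since $(\alpha x)W = \alpha(xW)$ for a scalar $\alpha$. Moving all such operations in the same direction lets successive scalar operations meet, so that they can be collapsed into a single one and eventually absorbed into the thresholds of a `MultiThreshold` node.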
In the following the streamlining transformation is shown.
%% Cell type:code id: tags:
``` python
from finn.transformation.streamline import Streamline
showSrc(Streamline)
```
%% Output
class Streamline(Transformation):
    """Apply the streamlining transform, see arXiv:1709.04060."""

    def apply(self, model):
        streamline_transformations = [
            ConvertSubToAdd(),
            BatchNormToAffine(),
            ConvertSignToThres(),
            MoveScalarAddPastMatMul(),
            MoveScalarMulPastMatMul(),
            MoveAddPastMul(),
            CollapseRepeatedAdd(),
            CollapseRepeatedMul(),
            AbsorbAddIntoMultiThreshold(),
            FactorOutMulSignMagnitude(),
            AbsorbMulIntoMultiThreshold(),
            Absorb1BitMulIntoMatMul(),
            RoundAndClipThresholds(),
        ]
        for trn in streamline_transformations:
            model = model.transform(trn)
            model = model.transform(GiveUniqueNodeNames())
            model = model.transform(GiveReadableTensorNames())
            model = model.transform(InferDataTypes())
        return (model, False)
%% Cell type:markdown id: tags:
As can be seen, several transformations are involved in the streamlining transformation: there are move and collapse transformations, and in the last step the operations are transformed into multithresholds. The involved transformations can be viewed in detail [here](https://github.com/Xilinx/finn/tree/dev/src/finn/transformation/streamline). After each transformation, three of the basic transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames` and `InferDataTypes`) are applied to the model as clean-up.
After streamlining, the network looks as follows.
%% Cell type:code id: tags:
``` python
model = model.transform(Streamline())
model.save("lfc_w1_a1.onnx")
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081
%% Cell type:code id: tags:
``` python
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>
```
%% Output
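%% Cell type:markdown id: tags:
We can also reuse the `count_op_types` analysis pass sketched earlier to inspect the effect of streamlining: most of the floating-point `Mul` and `Add` operations should have disappeared, absorbed into `MultiThreshold` nodes.
%% Cell type:code id: tags:
``` python
# compare against the op-type counts printed before streamlining
print(model.analysis(count_op_types))
```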
%% Cell type:markdown id: tags:
Our example network is a quantized network with 1-bit precision. For this reason, after streamlining, the resulting bipolar matrix multiplications are converted into xnorpopcount operations. This transformation produces operations that are again collapsed and converted into thresholds. This procedure is shown below. After these transformations, the nodes can be converted to HLS layers for further processing.
%% Cell type:code id: tags:
``` python
from finn.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
import finn.transformation.streamline.absorb as absorb
from finn.transformation.streamline.round_thresholds import RoundAndClipThresholds
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(absorb.AbsorbAddIntoMultiThreshold())
model = model.transform(absorb.AbsorbMulIntoMultiThreshold())
model = model.transform(RoundAndClipThresholds())
```
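%% Cell type:markdown id: tags:
As a small numeric aside (plain NumPy, not FINN code), the identity exploited by `ConvertBipolarMatMulToXnorPopcount` can be checked directly: for bipolar vectors $x, w \in \{-1,+1\}^K$ the dot product satisfies $x \cdot w = 2\,\mathrm{popcount}(\mathrm{XNOR}(\hat{x}, \hat{w})) - K$, where $\hat{x} = (x+1)/2$ is the $\{0,1\}$ encoding.
%% Cell type:code id: tags:
``` python
import numpy as np

# check x . w == 2 * popcount(XNOR(x01, w01)) - K on random bipolar vectors
K = 8
rng = np.random.default_rng(42)
x = rng.choice([-1, 1], size=K)
w = rng.choice([-1, 1], size=K)
x01 = (x + 1) // 2  # {0,1} encoding of the bipolar values
w01 = (w + 1) // 2
xnor = (x01 == w01).astype(np.int64)  # 1 where the encoded bits agree
assert x @ w == 2 * int(xnor.sum()) - K
print(x @ w, "==", 2 * int(xnor.sum()) - K)
```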
%% Cell type:markdown id: tags:
### Conversion to HLS layers
%% Cell type:markdown id: tags:
### Folding
%% Cell type:markdown id: tags:
## 3. Vivado HLS and Vivado synthesis <a id='vivado'></a>
* HLS IP per layer
* Creation of stitched design
* PYNQ shell project
* Synthesis, place and route
%% Cell type:markdown id: tags:
### HLS IP per layer
%% Cell type:markdown id: tags:
### Creation of stitched design
%% Cell type:markdown id: tags:
### PYNQ shell project
%% Cell type:markdown id: tags:
### Synthesis, place and route
%% Cell type:markdown id: tags:
## 4. Hardware test <a id='hw_test'></a>
%% Cell type:markdown id: tags:
## 5. Simulation & Emulation flows for functional verification <a id='sim'></a>
* Simulation using Python
* Simulation (npysim) using C++
* Emulation (rtlsim) using PyVerilator
%% Cell type:markdown id: tags:
### Simulation using Python
%% Cell type:markdown id: tags:
### Simulation (npysim) using C++
%% Cell type:markdown id: tags:
### Emulation (rtlsim) using PyVerilator
%% Cell type:code id: tags:
``` python
```