Commit 5529ff59 authored by Yaman Umuroglu

[Doc] change dev links to master

parent 9c7d03fc
@@ -25,7 +25,7 @@ What is FINN?
More FINN Resources
===================
- * `List of publications <https://github.com/Xilinx/finn/blob/dev/docs/publications.md>`_
+ * `List of publications <https://github.com/Xilinx/finn/blob/master/docs/publications.md>`_
* `Roadmap <https://github.com/Xilinx/finn/projects/1>`_
@@ -13,19 +13,19 @@ Basics
The notebooks in this folder should give a basic insight into FINN, how to get started and the basic concepts.
- * `0_getting_started <https://github.com/Xilinx/finn/blob/dev/notebooks/basics/0_getting_started.ipynb>`_
+ * `0_getting_started <https://github.com/Xilinx/finn/blob/master/notebooks/basics/0_getting_started.ipynb>`_
* This notebook corresponds to the chapter :ref:`getting_started` and gives an overview of how to start working with FINN.
- * `1_how_to_work_with_onnx <https://github.com/Xilinx/finn/blob/dev/notebooks/basics/1_how_to_work_with_onnx.ipynb>`_
+ * `1_how_to_work_with_onnx <https://github.com/Xilinx/finn/blob/master/notebooks/basics/1_how_to_work_with_onnx.ipynb>`_
* This notebook helps you learn how to create and manipulate a simple ONNX model, also using FINN.
- * `2_modelwrapper <https://github.com/Xilinx/finn/blob/dev/notebooks/basics/2_modelwrapper.ipynb>`_
+ * `2_modelwrapper <https://github.com/Xilinx/finn/blob/master/notebooks/basics/2_modelwrapper.ipynb>`_
* This notebook corresponds to the section :ref:`modelwrapper` in the chapter about internals.
- * `3_brevitas_network_import <https://github.com/Xilinx/finn/blob/dev/notebooks/basics/3_brevitas_network_import.ipynb>`_
+ * `3_brevitas_network_import <https://github.com/Xilinx/finn/blob/master/notebooks/basics/3_brevitas_network_import.ipynb>`_
* This notebook shows how to import a brevitas network and prepare it for the FINN flow.
@@ -34,19 +34,19 @@ Internals
The notebooks in this folder are more developer oriented. They should help you to get familiar with the principles in FINN and how to add new content regarding these concepts.
- * `0_custom_analysis_pass <https://github.com/Xilinx/finn/blob/dev/notebooks/internals/0_custom_analysis_pass.ipynb>`_
+ * `0_custom_analysis_pass <https://github.com/Xilinx/finn/blob/master/notebooks/internals/0_custom_analysis_pass.ipynb>`_
* This notebook explains what an analysis pass is and how to write one for FINN.
- * `1_custom_transformation_pass <https://github.com/Xilinx/finn/blob/dev/notebooks/internals/1_custom_transformation_pass.ipynb>`_
+ * `1_custom_transformation_pass <https://github.com/Xilinx/finn/blob/master/notebooks/internals/1_custom_transformation_pass.ipynb>`_
* This notebook explains what a transformation pass is and how to write one for FINN.
- * `2_custom_op <https://github.com/Xilinx/finn/blob/dev/notebooks/internals/2_custom_op.ipynb>`_
+ * `2_custom_op <https://github.com/Xilinx/finn/blob/master/notebooks/internals/2_custom_op.ipynb>`_
* This notebook explains what a custom operation/node is and how to create one for FINN.
- * `3_verify_hls_custom_op <https://github.com/Xilinx/finn/blob/dev/notebooks/internals/3_verify_hls_custom_op.ipynb>`_
+ * `3_verify_hls_custom_op <https://github.com/Xilinx/finn/blob/master/notebooks/internals/3_verify_hls_custom_op.ipynb>`_
* This notebook shows the functional verification flow for HLS custom operations/nodes.
@@ -55,6 +55,4 @@ End-to-End Flow
This notebook shows the FINN end-to-end flow step by step, using a simple, binarized, fully-connected network trained on the MNIST data set as an example. It starts with the Brevitas export and takes this particular network all the way down to hardware by using a specific sequence of transformations.
- * `tfc_end2end_example <https://github.com/Xilinx/finn/blob/dev/notebooks/end2end_example/tfc_end2end_example.ipynb>`_
+ * `tfc_end2end_example <https://github.com/Xilinx/finn/blob/master/notebooks/end2end_example/tfc_end2end_example.ipynb>`_
%% Cell type:markdown id: tags:
# FINN Basics
## What is FINN?
'FINN' is colloquially used to refer to two separate but highly related things:
* The [FINN project](https://xilinx.github.io/finn/), which includes tools for training quantized neural networks such as [Brevitas](https://github.com/Xilinx/brevitas), the FINN compiler, and the [finn-hlslib](https://github.com/Xilinx/finn-hlslib) Vivado HLS library of FPGA components for QNNs.
* This repository, referred to as the *FINN compiler*, which is the centerpiece of the FINN project.
## How to use the FINN compiler?
The FINN compiler should not be thought of as a single pushbutton tool that does everything for you, but rather as a collection of scripts/tools that help you convert a QNN into a custom FPGA accelerator that performs high-performance inference. We do provide several examples of taking trained networks all the way down to FPGA bitfiles, but if you want to do this for custom networks you will have to write your own Python scripts that call the appropriate FINN compiler functions to process your design correctly, or add new functions as required.
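For illustration, such a script could look like the minimal sketch below. The file names are placeholders and the exact sequence of transformations depends on your network; the end-to-end example notebook shows a complete flow.
%% Cell type:code id: tags:
``` python
# Minimal sketch (placeholder file names): load an exported ONNX model and apply
# a couple of FINN transformations, as a typical custom script would.
from finn.core.modelwrapper import ModelWrapper
from finn.transformation.infer_shapes import InferShapes
from finn.transformation.fold_constants import FoldConstants

model = ModelWrapper("/workspace/finn/my_network.onnx")  # hypothetical exported model
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model.save("/workspace/finn/my_network_prepared.onnx")
```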
%% Cell type:markdown id: tags:
## Requirements
* Ubuntu 18.04
* Docker
* A working Vivado installation
* A `VIVADO_PATH` environment variable pointing to the Vivado installation directory (e.g. the directory where settings64.sh is located)
## Running FINN with Docker
We use Docker extensively for developing and deploying FINN. If you are not familiar with Docker, there are many excellent [online resources]( https://docker-curriculum.com/) to get started. There is a Dockerfile in the root of the repository, as well as a `run-docker.sh` script that can be launched in the following modes:
### Getting an interactive shell for development or experimentation
Simply running `sh run-docker.sh` without any additional arguments will clone the dependency repos, create a Docker container and give you a terminal that you can use for development or experimentation.
*Important:* the Docker container is spawned with the `--rm` option, so make sure that any important files you create inside the container are either in the /workspace/finn folder (which is mounted from the host computer) or otherwise backed up.
*Develop from host, run inside container:* The FINN repository directory will be mounted from the host, so that you can use a text editor on your host computer to develop and the changes will be reflected directly inside the container.
### Running the Jupyter notebooks
```sh run-docker.sh notebook```
This will launch the Jupyter notebook server inside a Docker container, and print a link on the terminal that you can open in your browser to run the FINN notebooks or create new ones. The link will look something like this (the token you get will be different):
`http://127.0.0.1:8888/?token=f5c6bd32ae93ec103a88152214baedff4ce1850d81065bfc`
The `run-docker.sh` script forwards ports 8888 for Jupyter and 8081 for Netron, and launches the notebook server with appropriate arguments.
### Running the test suite
FINN comes with a set of tests which you can easily launch in Docker as follows:
```sh run-docker.sh test```
Note that some of the tests involve extra compilation and the entire test suite may take some time to complete.
%% Cell type:markdown id: tags:
## Intermediate Representation: FINN-ONNX
FINN uses [ONNX](https://onnx.ai) as an intermediate representation (IR) for neural networks. As such, almost every component inside FINN uses ONNX and its [Python API](https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md), so you may want to familiarize yourself with how ONNX represents DNNs. Specifically, the [ONNX protobuf description](https://github.com/onnx/onnx/blob/master/onnx/onnx.proto) (or its [human-readable documentation](https://github.com/onnx/onnx/blob/master/docs/IR.md)) and the [operator schemas](https://github.com/onnx/onnx/blob/master/docs/Operators.md) are useful as reference documents.
FINN uses ONNX in a specific way that we refer to as FINN-ONNX, and not all ONNX graphs are supported by FINN (and vice versa). Here is a list of key points to keep in mind:
- * *Custom quantization annotations but data stored as float.* ONNX does not support datatypes smaller than 8-bit integers, whereas in FINN we are interested in smaller integers down to ternary and bipolar. To make this work, FINN uses the `quantization_annotation` field in ONNX to annotate tensors with their [FINN DataType](https://github.com/Xilinx/finn/blob/dev/src/finn/core/datatype.py) information. However, all tensors are expected to use single-precision floating point (float32) storage in FINN. This means we store even a 1-bit value as floating point for the purposes of representation. The FINN compiler flow is responsible for eventually producing a packed representation for the target hardware, where the 1-bit is actually stored as 1-bit.
+ * *Custom quantization annotations but data stored as float.* ONNX does not support datatypes smaller than 8-bit integers, whereas in FINN we are interested in smaller integers down to ternary and bipolar. To make this work, FINN uses the `quantization_annotation` field in ONNX to annotate tensors with their [FINN DataType](https://github.com/Xilinx/finn/blob/master/src/finn/core/datatype.py) information. However, all tensors are expected to use single-precision floating point (float32) storage in FINN. This means we store even a 1-bit value as floating point for the purposes of representation. The FINN compiler flow is responsible for eventually producing a packed representation for the target hardware, where the 1-bit is actually stored as 1-bit.
* *Custom operations/nodes.* FINN uses many custom operations (`op_type` in ONNX NodeProto) that are not defined in the ONNX operator schema. These custom nodes are marked with `domain="finn"` in the protobuf to identify them as such. These nodes can represent specific operations that we need for low-bit networks, or operations that are specific to a particular hardware backend.
- * *Custom ONNX execution flow* To verify correct operation of FINN-ONNX graphs, FINN provides its own [ONNX execution flow](https://github.com/Xilinx/finn/blob/dev/src/finn/core/onnx_exec.py). This flow supports the standard set of ONNX operations as well as the custom FINN operations. *Important:* this execution flow is *only* meant for checking the correctness of models after applying transformations, and *not* for high performance inference.
+ * *Custom ONNX execution flow* To verify correct operation of FINN-ONNX graphs, FINN provides its own [ONNX execution flow](https://github.com/Xilinx/finn/blob/master/src/finn/core/onnx_exec.py). This flow supports the standard set of ONNX operations as well as the custom FINN operations. *Important:* this execution flow is *only* meant for checking the correctness of models after applying transformations, and *not* for high performance inference.
- * *ModelWrapper* FINN provides a [`ModelWrapper`](https://github.com/Xilinx/finn/blob/dev/src/finn/core/modelwrapper.py) class as a thin wrapper around ONNX to make it easier to analyze and manipulate ONNX graphs. This wrapper provides many helper functions, while still giving full access to the ONNX protobuf representation.
+ * *ModelWrapper* FINN provides a [`ModelWrapper`](https://github.com/Xilinx/finn/blob/master/src/finn/core/modelwrapper.py) class as a thin wrapper around ONNX to make it easier to analyze and manipulate ONNX graphs. This wrapper provides many helper functions, while still giving full access to the ONNX protobuf representation.
[Netron](https://lutzroeder.github.io/netron/) is very useful for visualizing ONNX models, including FINN-ONNX models.
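To illustrate the DataType annotations and the `ModelWrapper` helpers described above, here is a small hedged sketch; the model path is a placeholder, and the `set_tensor_datatype`/`get_tensor_datatype` helpers are assumed to be available on `ModelWrapper`.
%% Cell type:code id: tags:
``` python
# Hedged sketch: annotate a tensor with a FINN DataType. The annotation only
# records the "true" datatype; the tensor data itself is still stored as float32.
from finn.core.datatype import DataType
from finn.core.modelwrapper import ModelWrapper

model = ModelWrapper("/workspace/finn/my_network.onnx")  # placeholder path
inp_name = model.graph.input[0].name
model.set_tensor_datatype(inp_name, DataType.BIPOLAR)
print(model.get_tensor_datatype(inp_name))
```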
%% Cell type:markdown id: tags:
## More FINN Resources
- * **[List of publications](https://github.com/Xilinx/finn/blob/dev/docs/publications.md)**
+ * **[List of publications](https://github.com/Xilinx/finn/blob/master/docs/publications.md)**
* **[Roadmap](https://github.com/Xilinx/finn/projects/1)**
- * **[Status of example networks](https://github.com/Xilinx/finn/blob/dev/docs/example-networks.md)**
+ * **[Status of example networks](https://github.com/Xilinx/finn/blob/master/docs/example-networks.md)**
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# FINN - End-to-End Flow
-----------------------------------------------------------------
In this notebook, we will show how to take a simple, binarized, fully-connected network trained on the MNIST data set all the way down to a customized bitfile running on a PYNQ board.
This notebook is quite lengthy, and some of the cells (involving Vivado synthesis) may take up to an hour to finish running. To let you save and resume your progress, we will save the intermediate ONNX models that are generated in the various steps to disk, so that you can jump back directly to where you left off.
%% Cell type:markdown id: tags:
## Overview
The FINN compiler comes with many *transformations* that modify the ONNX representation of the network according to certain patterns. This notebook will demonstrate a *possible* sequence of such transformations to take a particular trained network all the way down to hardware, as shown in the figure below.
%% Cell type:markdown id: tags:
![](finn-design-flow-example.svg)
%% Cell type:markdown id: tags:
The cylinder-like fields show the state of the network representation in the respective step. The rectangular fields represent the transformations that are applied to the network to achieve a certain result. The diagram is divided into 5 blocks, each of which includes several flow steps. The flow starts in the top left corner with the Brevitas export (purple block), followed by the preparation of the network (grey block) for Vivado HLS synthesis and Vivado IPI stitching (yellow block), and finally building a PYNQ overlay bitfile and testing it on a PYNQ board (pink block).
There is an additional section for functional verification (green block), which we will not cover in this notebook.
This Jupyter notebook is organized based on the sections described above. We will use two helper functions: `showSrc`, to show the source code of FINN library calls, and `showInNetron`, to show the ONNX model at the current transformation step. The Netron displays are interactive, but they only work when running the notebook actively and not on GitHub (i.e. if you are viewing this on GitHub you'll only see blank squares).
%% Cell type:code id: tags:
``` python
import inspect
import netron
from finn.util.basic import make_build_dir
from IPython.display import IFrame
def showSrc(what):
print("".join(inspect.getsourcelines(what)[0]))
def showInNetron(model_filename):
netron.start(model_filename, port=8081, host="0.0.0.0")
return IFrame(src="http://0.0.0.0:8081/", width="100%", height=400)
build_dir = "/workspace/finn"
```
%% Cell type:markdown id: tags:
## Outline
-------------
1. [Brevitas export](#brev_exp)
2. [Network preparation](#nw_prep)
* Basic transformations
* Streamlining
* Conversion to HLS layers
* Folding
3. [Vivado HLS and Vivado IPI](#vivado)
* HLS IP per layer
* Creation of stitched design
4. [Synthesize, Deploy and Test on PYNQ](#hw_test)
* PYNQ shell project
* Synthesis, place and route
* Driver generation
* Deployment and remote execution
%% Cell type:markdown id: tags:
## 1. Brevitas export <a id='brev_exp'></a>
FINN expects an ONNX model as input. This can be a model trained with [Brevitas](https://github.com/Xilinx/brevitas). Brevitas is a PyTorch library for quantization-aware training and the FINN Docker image comes with several [example Brevitas networks](https://github.com/maltanar/brevitas_cnv_lfc). To show the FINN end-to-end flow, we'll use the TFC-w1a1 model as example network.
First a few things have to be imported. Then the model can be loaded with the pretrained weights.
%% Cell type:code id: tags:
``` python
import onnx
from finn.util.test import get_test_model_trained
import brevitas.onnx as bo
tfc = get_test_model_trained("TFC", 1, 1)
bo.export_finn_onnx(tfc, (1, 1, 28, 28), build_dir+"/tfc_w1_a1.onnx")
```
%% Cell type:markdown id: tags:
The model has now been exported, loaded with the pretrained weights and saved under the name "tfc_w1_a1.onnx".
To visualize the exported model, Netron can be used. Netron is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on the individual nodes and view the properties.
%% Cell type:code id: tags:
``` python
showInNetron(build_dir+"/tfc_w1_a1.onnx")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f186ccfbe10>
%% Cell type:markdown id: tags:
Now that we have the model in .onnx format, we can work with it using FINN. For this, the FINN `ModelWrapper` is used. It is a wrapper around the ONNX model which provides several helper functions to make it easier to work with the model.
%% Cell type:code id: tags:
``` python
from finn.core.modelwrapper import ModelWrapper
model = ModelWrapper(build_dir+"/tfc_w1_a1.onnx")
```
%% Cell type:markdown id: tags:
Now the model is prepared and could be simulated using Python. How this works is described in subsection [Simulation using Python](#simpy) in the section about *Simulation & Emulation flows for functional verification*.
The model can now also be processed in different ways. The core principle of FINN is the use of analysis and transformation passes, which can be applied to the model. An analysis pass extracts specific information about the model and returns it to the user in the form of a dictionary. A transformation pass changes the model and returns the changed model back to the FINN flow.
Since the goal in this notebook is to process the model to such an extent that a bitstream can be generated from it, the focus is on the transformations that are necessary for this. In the next section these are discussed in more detail.
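As a small illustration of the analysis pass concept, the hedged sketch below counts how many nodes of each op_type the graph contains; `model.analysis` is expected to call the given function on the underlying model and return its dictionary result (the custom analysis pass notebook covers this in detail).
%% Cell type:code id: tags:
``` python
# Hedged sketch of a tiny analysis pass: return a dictionary mapping each
# op_type found in the graph to the number of nodes with that op_type.
def count_op_types(model):
    counts = {}
    for node in model.graph.node:
        counts[node.op_type] = counts.get(node.op_type, 0) + 1
    return counts

print(model.analysis(count_op_types))
```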
%% Cell type:markdown id: tags:
## 2. Network preparation <a id='nw_prep'></a>
* [Tidy-up transformations](#basic_trafo)
* [Streamlining](#streamline)
* [Conversion to HLS layers](#hls_layers)
* [Folding](#folding)
In this section, we will put the network through a series of transformations that puts it in a form that can be stitched together to form a FINN-style dataflow architecture, yielding a high-performance, high-efficiency FPGA accelerator.
%% Cell type:markdown id: tags:
### FINN-style Dataflow Architectures
We start with a quick recap of FINN-style dataflow architectures. The key idea in such architectures is to parallelize across layers as well as within layers by dedicating a proportionate amount of compute resources to each layer, as illustrated in the figure below taken from the [FINN-R paper](https://arxiv.org/pdf/1809.04570.pdf):
![](finn-hw-arch.png)
In practice, the compute arrays are instantiated by function calls to optimized Vivado HLS building blocks from the [finn-hlslib](https://github.com/Xilinx/finn-hlslib) library. As these function calls can only handle certain patterns/cases, we need to transform the network into an appropriate form so that we can replace network layers with these function calls, which is the goal of the network preparation process.
%% Cell type:markdown id: tags:
### Tidy-up transformations <a id='basic_trafo'></a>
This section deals with some basic transformations, which are applied to the model like a kind of "tidy-up" to make it easier to be processed. They do not appear in the diagram above, but they are applied in many steps in the FINN flow to postprocess the model after a transformation and/or to prepare it for the next transformation.
%% Cell type:markdown id: tags:
These transformations are:
* GiveUniqueNodeNames
* GiveReadableTensorNames
* InferShapes
* InferDataTypes
* FoldConstants
%% Cell type:markdown id: tags:
In the first two transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames`) the nodes in the graph are first given unique (by enumeration) names, then the tensors are given human-readable names (based on the node names). The following two transformations (`InferShapes`, `InferDataTypes`) derive the shapes and data types of the tensors from the model properties and set them in the `ValueInfo` of the model. These transformations can almost always be applied without negative effects and do not affect the structure of the graph, ensuring that all the information needed is available.
The last listed transformation is `FoldConstants`, which performs constant folding. It identifies a node with constant inputs and determines its output. The result is then set as a constant-only input for the following node and the old node is removed. Although this transformation changes the structure of the model, it is almost always desired and can be applied to any model.
%% Cell type:markdown id: tags:
These transformations can be imported and applied as follows.
%% Cell type:code id: tags:
``` python
from finn.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames
from finn.transformation.infer_shapes import InferShapes
from finn.transformation.infer_datatypes import InferDataTypes
from finn.transformation.fold_constants import FoldConstants
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())
model.save(build_dir+"/tfc_w1_a1_tidy.onnx")
```
%% Cell type:markdown id: tags:
The result of these transformations can be viewed with Netron after the model has been saved again. By clicking on the individual nodes, it can now be seen, for example, that each node has been given a name. Also, the whole upper area has been folded away by constant folding, so that the first node is now "Reshape".
%% Cell type:code id: tags:
``` python
showInNetron(build_dir+"/tfc_w1_a1_tidy.onnx")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1_tidy.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f186e386240>
%% Cell type:markdown id: tags:
### Streamlining <a id='streamline'></a>
Streamlining is a transformation containing several sub-transformations. The goal of streamlining is to eliminate floating point operations by moving them around, collapsing them into one operation and, in the last step, transforming them into multi-thresholding nodes. For more information on the theoretical background of this, see [this paper](https://arxiv.org/pdf/1709.04060).
Let's have a look at which sub-transformations `Streamline` consists of:
%% Cell type:code id: tags:
``` python
from finn.transformation.streamline import Streamline
showSrc(Streamline)
```
%% Output
class Streamline(Transformation):
"""Apply the streamlining transform, see arXiv:1709.04060."""
def apply(self, model):
streamline_transformations = [
ConvertSubToAdd(),
BatchNormToAffine(),
ConvertSignToThres(),
MoveAddPastMul(),
MoveScalarAddPastMatMul(),
MoveScalarMulPastMatMul(),
MoveAddPastMul(),
CollapseRepeatedAdd(),
CollapseRepeatedMul(),
AbsorbAddIntoMultiThreshold(),
FactorOutMulSignMagnitude(),
AbsorbMulIntoMultiThreshold(),
Absorb1BitMulIntoMatMul(),
RoundAndClipThresholds(),
]
for trn in streamline_transformations:
model = model.transform(trn)
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())
return (model, False)
%% Cell type:markdown id: tags:
As can be seen, several transformations are involved in the streamlining transformation. There are move and collapse transformations. In the last step the operations are transformed into multithresholds. The involved transformations can be viewed in detail [here](https://github.com/Xilinx/finn/tree/dev/src/finn/transformation/streamline). After each transformation, three of the tidy-up transformations (`GiveUniqueNodeNames`, `GiveReadableTensorNames` and `InferDataTypes`) are applied to the model.
After streamlining the network looks as follows:
%% Cell type:code id: tags:
``` python
model = ModelWrapper(build_dir+"/tfc_w1_a1_tidy.onnx")
model = model.transform(Streamline())
model.save(build_dir+"/tfc_w1_a1_streamlined.onnx")
showInNetron(build_dir+"/tfc_w1_a1_streamlined.onnx")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1_streamlined.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f186cd470b8>
%% Cell type:markdown id: tags:
You can see that the network has been simplified considerably compared to the previous step -- a lot of nodes have disappeared between the `MatMul` layers, and the `Sign` nodes have been replaced with `MultiThreshold` nodes.
**The current implementation of streamlining is highly network-specific and may not work for your network if its topology is very different than the example network here. We hope to rectify this in future releases.**
Our example network is a quantized network with 1-bit bipolar (-1, +1 values) precision, and we want FINN to implement its matrix multiplications as XNOR-popcount operations [as described in the original FINN paper](https://arxiv.org/pdf/1612.07119). For this reason, after streamlining, the resulting bipolar matrix multiplications are converted into xnorpopcount operations. This transformation produces operations that are again collapsed and converted into thresholds. This procedure is shown below.
%% Cell type:code id: tags:
``` python
from finn.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
import finn.transformation.streamline.absorb as absorb
from finn.transformation.streamline.round_thresholds import RoundAndClipThresholds
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(absorb.AbsorbAddIntoMultiThreshold())
model = model.transform(absorb.AbsorbMulIntoMultiThreshold())
model = model.transform(RoundAndClipThresholds())
model.save(build_dir+"/tfc_w1a1_ready_for_hls_conversion.onnx")
showInNetron(build_dir+"/tfc_w1a1_ready_for_hls_conversion.onnx")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1a1_ready_for_hls_conversion.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f17f04bbc18>
%% Cell type:markdown id: tags:
Observe the pairs of `XnorPopcountMatMul` and `MultiThreshold` layers following each other -- this is the particular pattern that the next step will be looking for in order to convert them to HLS layers.
%% Cell type:markdown id: tags:
### Conversion to HLS layers <a id='hls_layers'></a>
This transformation converts the nodes to HLS layers that correspond to the functions in the [finn-hlslib library](https://finn-hlslib.readthedocs.io/en/latest/). In our case it converts pairs of binary XnorPopcountMatMul layers to StreamingFCLayer_Batch layers. Any immediately following MultiThreshold layers will also be absorbed into the MVTU.
Below is the code for the transformation, and the network is then visualized using Netron to show the new structure with `StreamingFCLayer_Batch` nodes, each of which will correspond to a function call from the [finn-hlslib](https://finn-hlslib.readthedocs.io/en/latest/library/fclayer.html#_CPPv4I_j_j_j_j000_i_i000E22StreamingFCLayer_BatchvRN3hls6streamI7ap_uintI9InStreamWEEERN3hls6streamI7ap_uintI10OutStreamWEEERK2TWRK2TAKjRK1R) library.
%% Cell type:code id: tags:
``` python
import finn.transformation.fpgadataflow.convert_to_hls_layers as to_hls
model = ModelWrapper(build_dir+"/tfc_w1a1_ready_for_hls_conversion.onnx")
model = model.transform(to_hls.InferBinaryStreamingFCLayer())
model.save(build_dir+"/tfc_w1_a1_hls_layers.onnx")
showInNetron(build_dir+"/tfc_w1_a1_hls_layers.onnx")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1_hls_layers.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f1868061eb8>
%% Cell type:markdown id: tags:
Each StreamingFCLayer_Batch node has two attributes that specify the degree of folding, PE and SIMD. In all nodes the values for these attributes are set to 1 by default, which corresponds to maximum folding (time multiplexing) and thus minimum performance. We will shortly cover how these can be adjusted, but first we want to separate the HLS layers from the non-HLS layers in this network.
%% Cell type:markdown id: tags:
### Creating a Dataflow Partition <a id='dataflow_partition'></a>
In the graph above, you can see that there is a mixture of FINN HLS layers (StreamingFCLayer_Batch) with regular ONNX layers (Reshape, Mul, Add). To create a bitstream, FINN needs a model with only HLS layers. In order to achieve this, we will use the `CreateDataflowPartition` transformation to create a "dataflow partition" in this graph, separating out the HLS layers into another model, and replacing them with a placeholder layer called StreamingDataflowPartition:
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.create_dataflow_partition import CreateDataflowPartition
model = ModelWrapper(build_dir+"/tfc_w1_a1_hls_layers.onnx")
parent_model = model.transform(CreateDataflowPartition())
parent_model.save(build_dir+"/tfc_w1_a1_dataflow_parent.onnx")
showInNetron(build_dir+"/tfc_w1_a1_dataflow_parent.onnx")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1_dataflow_parent.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f186cc55e48>
%% Cell type:markdown id: tags:
We can see that the StreamingFCLayer instances have all been replaced with a single `StreamingDataflowPartition`, which has an attribute `model` that points to the extracted, HLS dataflow-only graph:
%% Cell type:code id: tags:
``` python
from finn.custom_op.registry import getCustomOp
sdp_node = getCustomOp(parent_model.graph.node[2])
dataflow_model_filename = sdp_node.get_nodeattr("model")
showInNetron(dataflow_model_filename)
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/tmp/finn_maltanar/dataflow_partition_h1c4i5gn/df_model.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f17f04c70f0>
%% Cell type:markdown id: tags:
We can see all the extracted `StreamingFCLayer` instances have been moved to the child (dataflow) model. We will load the child model with `ModelWrapper` and continue working on it.
%% Cell type:code id: tags:
``` python
model = ModelWrapper(dataflow_model_filename)
```
%% Cell type:markdown id: tags:
### Folding and TLastMarker Insertion <a id='folding'></a>
*Folding* in FINN describes how much a layer is time-multiplexed in terms of execution resources. There are several *folding factors* for each layer, controlled by the PE (parallelization over outputs) and SIMD (parallelization over inputs) parameters as described by the original [FINN paper](https://arxiv.org/pdf/1612.07119). The higher the PE and SIMD values are set, the faster the generated accelerator will run, and the more FPGA resources it will consume.
Since the folding parameters are node attributes, they can be easily accessed and changed using a helper function of the `ModelWrapper`. But first we have to extract the nodes which are StreamingFCLayer_Batch operations. This is where the Netron visualization helps us, in the above diagram we can see that the first four nodes are StreamingFCLayer_Batch. Through the `print`s we can check if the extracted nodes all have the op_type "StreamingFCLayer_Batch".
%% Cell type:code id: tags:
``` python
fc0 = model.graph.node[0]
fc1 = model.graph.node[1]
fc2 = model.graph.node[2]
fc3 = model.graph.node[3]
print("fc0 has the op_type: " + str(fc0.op_type))
print("fc1 has the op_type: " + str(fc1.op_type))
print("fc2 has the op_type: " + str(fc2.op_type))
print("fc3 has the op_type: " + str(fc3.op_type))
```
%% Output
fc0 has the op_type: StreamingFCLayer_Batch
fc1 has the op_type: StreamingFCLayer_Batch
fc2 has the op_type: StreamingFCLayer_Batch
fc3 has the op_type: StreamingFCLayer_Batch
%% Cell type:markdown id: tags:
- We can use the higher-level [HLSCustomOp](https://github.com/Xilinx/finn/blob/dev/src/finn/custom_op/fpgadataflow/__init__.py) wrappers for these nodes. These wrappers provide easy access to specific properties of these nodes, such as the folding factors (PE and SIMD). Let's have a look at which node attributes are defined by the CustomOp wrapper, and adjust the SIMD and PE attributes.
+ We can use the higher-level [HLSCustomOp](https://github.com/Xilinx/finn/blob/master/src/finn/custom_op/fpgadataflow/__init__.py) wrappers for these nodes. These wrappers provide easy access to specific properties of these nodes, such as the folding factors (PE and SIMD). Let's have a look at which node attributes are defined by the CustomOp wrapper, and adjust the SIMD and PE attributes.
%% Cell type:code id: tags:
``` python
fc0w = getCustomOp(fc0)
fc1w = getCustomOp(fc1)
fc2w = getCustomOp(fc2)
fc3w = getCustomOp(fc3)
print("CustomOp wrapper is of class " + fc0w.__class__.__name__)
fc0w.get_nodeattr_types()
```
%% Output
CustomOp wrapper is of class StreamingFCLayer_Batch
{'PE': ('i', True, 0),
'SIMD': ('i', True, 0),
'MW': ('i', True, 0),
'MH': ('i', True, 0),
'resType': ('s', True, ''),
'ActVal': ('i', False, 0),
'inputDataType': ('s', True, ''),
'weightDataType': ('s', True, ''),
'outputDataType': ('s', True, ''),
'binaryXnorMode': ('i', False, 0),
'noActivation': ('i', False, 0),
'inFIFODepth': ('i', False, 0),
'outFIFODepth': ('i', False, 0),
'backend': ('s', True, 'fpgadataflow'),
'code_gen_dir_npysim': ('s', False, ''),
'code_gen_dir_ipgen': ('s', False, ''),
'executable_path': ('s', False, ''),
'ipgen_path': ('s', False, ''),
'exec_mode': ('s', False, ''),
'sim_cycles': ('i', False, 0),
'rtlsim_trace': ('s', False, '')}
%% Cell type:markdown id: tags:
We can see that the PE and SIMD are listed as node attributes, as well as the depths of the FIFOs that will be inserted between consecutive layers, and all can be adjusted using `set_nodeattr` subject to certain constraints.
**In this notebook we are setting the folding factors and FIFO depths manually, but in a future version we will support determining the folding factors given an FPGA resource budget according to the analytical model from the [FINN-R paper](https://arxiv.org/pdf/1809.04570).**
%% Cell type:code id: tags:
``` python
# SIMD controls the folding over the input vector
# PE controls the folding over the output vector
fc0w.set_nodeattr("inFIFODepth", 50)
fc0w.set_nodeattr("SIMD", 16)
fc0w.set_nodeattr("PE", 16)
fc0w.set_nodeattr("outFIFODepth", 4)
fc1w.set_nodeattr("SIMD", 16)
fc1w.set_nodeattr("PE", 16)
fc1w.set_nodeattr("outFIFODepth", 4)
fc2w.set_nodeattr("SIMD", 16)
fc2w.set_nodeattr("PE", 16)
fc2w.set_nodeattr("outFIFODepth", 4)
fc3w.set_nodeattr("SIMD", 16)
fc3w.set_nodeattr("PE", 10)
fc3w.set_nodeattr("outFIFODepth", 50)
```
%% Cell type:markdown id: tags:
Finally, we will run the `InsertTLastMarker` transformation to get a `TLastMarker` node at the output of this graph, which is necessary to run the DMA engines correctly.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.insert_tlastmarker import InsertTLastMarker
model = model.transform(InsertTLastMarker())
model.save(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
showInNetron(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1_set_folding_factors.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f1868061d30>
%% Cell type:markdown id: tags:
This completes the network preparation and the network can be passed on to the next block *Vivado HLS and Vivado synthesis*, which is described below.
%% Cell type:markdown id: tags:
## 3. Vivado HLS and Vivado IPI <a id='vivado'></a>
* [Generating HLS Code](#hls_per_layer)
* [Synthesizing HLS to IP Blocks](#hls_synth)
* [IP Stitching](#ip_stitching)
As we will be dealing with FPGA synthesis tools in these tasks, we'll define two helper variables that describe the Xilinx FPGA part name and the PYNQ board name that we are targeting.
%% Cell type:code id: tags:
``` python
# print the names of the supported PYNQ boards
from finn.util.basic import pynq_part_map
print(pynq_part_map.keys())
```
%% Output
dict_keys(['Ultra96', 'Pynq-Z1'])
%% Cell type:code id: tags:
``` python
# change this if you have a different PYNQ board, see list above
pynq_board = "Ultra96"
fpga_part = pynq_part_map[pynq_board]
target_clk_ns = 5
```
%% Cell type:markdown id: tags:
### Generating HLS Code <a id='hls_per_layer'></a>
This section deals with the generation of an IP block for each of the different layers. These can then be stitched together into a block design that corresponds to the complete model. Converting each layer into a separate IP block provides good transparency: we can check the functionality of each IP block and compare it with the behaviour of the corresponding ONNX node.
%% Cell type:markdown id: tags:
Two transformations are required to generate HLS IP blocks for each layer:
* `CodeGen_ipgen` which generates the HLS C++ code for the node and a tcl-script which starts the HLS synthesis and exports the design as IP.
* `HLSSynth_IPGen` which passes the tcl-script to Vivado HLS and thus performs the actual IP generation.
We start off by giving unique node names using the basic transformation `GiveUniqueNodeNames`, and then proceed with the HLS C++ code generation with `CodeGen_ipgen`.
%% Cell type:code id: tags:
``` python
model = ModelWrapper(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
model = model.transform(GiveUniqueNodeNames())
from finn.transformation.fpgadataflow.codegen_ipgen import CodeGen_ipgen
model = model.transform(CodeGen_ipgen(fpga_part, target_clk_ns))
```
%% Cell type:markdown id: tags:
### Synthesizing HLS to IP Blocks <a id='hls_synth'></a>
Now that we have generated the HLS code for each layer, we can call the `HLSSynth_IPGen` transformation to convert the generated HLS into Vivado IP blocks. **As this involves calling HLS synthesis, this transformation will run for some time (several minutes).**
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.hlssynth_ipgen import HLSSynth_IPGen
model = model.transform(HLSSynth_IPGen())
model.save(build_dir+"/tfc_w1_a1_ipgen.onnx")
```
%% Cell type:markdown id: tags:
Each `StreamingFCLayer_Batch` node now has new attributes which can be examined more closely with netron.
%% Cell type:code id: tags:
``` python
showInNetron(build_dir+"/tfc_w1_a1_ipgen.onnx")
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/workspace/finn/tfc_w1_a1_ipgen.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f17f04c9470>
%% Cell type:markdown id: tags:
There are two additional attributes:
* `code_gen_dir_ipgen` which contains the directory path where all the files generated by the ipgen transformations are stored
* `ipgen_path` which contains the path to the project directory in which the generated IP block is stored
We can further investigate which files are produced by taking a look into this directory, for example for the first StreamingFCLayer_Batch node.
%% Cell type:code id: tags:
``` python
fc0w = getCustomOp(model.graph.node[0])
code_gen_dir = fc0w.get_nodeattr("code_gen_dir_ipgen")
!ls {code_gen_dir}
```
%% Output
hls_syn_StreamingFCLayer_Batch_0.tcl thresh.h
ipgen.sh top_StreamingFCLayer_Batch_0.cpp
params.h vivado_hls.log
project_StreamingFCLayer_Batch_0
%% Cell type:markdown id: tags:
The directory *project_StreamingFCLayer_Batch_0* contains the project created by Vivado HLS into which the IP block is exported, along with other files generated by Vivado HLS. If we compare it to the above Netron visualization of the network, this is exactly the name of the folder stored in the node attribute `ipgen_path`. The .cpp code that is passed to Vivado HLS can be found in the file *top_StreamingFCLayer_Batch_0.cpp*. The files *params.h* and *thresh.h* belong to it as well; they contain the values for the weights and thresholds. *vivado_hls.log* is the log file from Vivado HLS. Besides these files, the folder contains *ipgen.sh* and *hls_syn_StreamingFCLayer_Batch_0.tcl*. First we take a look at *ipgen.sh*.
%% Cell type:code id: tags:
``` python
shell_script = code_gen_dir + "/ipgen.sh"
!cat {shell_script}
```
%% Output
#!/bin/bash
cd /tmp/finn_maltanar/code_gen_ipgen_StreamingFCLayer_Batch_5f0hmok_
vivado_hls /tmp/finn_maltanar/code_gen_ipgen_StreamingFCLayer_Batch_5f0hmok_/hls_syn_StreamingFCLayer_Batch_0.tcl
cd /workspace/finn
%% Cell type:markdown id: tags:
The script consists only of two framing `cd` commands and a command that passes the tcl script to *vivado_hls*. The working directory is changed so that the files are created in the correct folder, and is then changed back to the original directory.
Below is the tcl script which is passed to *vivado_hls*.
%% Cell type:code id: tags:
``` python
tcl_script = code_gen_dir + "/hls_syn_StreamingFCLayer_Batch_0.tcl"
!cat {tcl_script}
```
%% Output
set config_proj_name project_StreamingFCLayer_Batch_0
puts "HLS project: $config_proj_name"
set config_hwsrcdir "/tmp/finn_maltanar/code_gen_ipgen_StreamingFCLayer_Batch_5f0hmok_"
puts "HW source dir: $config_hwsrcdir"
set config_proj_part "xczu3eg-sbva484-1-e"
set config_bnnlibdir "/workspace/finn-hlslib"
set config_toplevelfxn "StreamingFCLayer_Batch_0"
set config_clkperiod 5
open_project $config_proj_name
add_files $config_hwsrcdir/top_StreamingFCLayer_Batch_0.cpp -cflags "-std=c++0x -I$config_bnnlibdir"
set_top $config_toplevelfxn
open_solution sol1
set_part $config_proj_part
config_interface -m_axi_addr64
config_rtl -auto_prefix
create_clock -period $config_clkperiod -name default
csynth_design
export_design -format ip_catalog
exit 0
%% Cell type:markdown id: tags:
In the first part of the script the project is configured. For example the FPGA part and the clock are set. Then the project is opened and the files are added. The toplevel function is set and after creating a clock, the design is first synthesized with `csynth` and then exported as an IP block.
Now that all IP blocks are in place, they can be stitched together to create an IP design that matches the ONNX model. This is covered in the next section.
%% Cell type:markdown id: tags:
### IP Stitching <a id='ip_stitching'></a>
We now have IP blocks for each of our layers, and will stitch them together into a larger IP that implements the whole network using the `CodeGen_ipstitch` transformation. Bear in mind that this transformation can only be applied on a graph that only contains HLS nodes that already have been through the `HLSSynth_IPGen` transformation, which is the last step we performed. Prior to calling IP stitching, we'll also use the `ReplaceVerilogRelPaths` transformation to convert any relative `$readmemh` paths in the generated IP blocks to absolute ones, which prevents errors later on. **This step invokes Vivado and may take a few minutes to run.**
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.codegen_ipstitch import CodeGen_ipstitch
from finn.transformation.fpgadataflow.replace_verilog_relpaths import ReplaceVerilogRelPaths
model = ModelWrapper(build_dir+"/tfc_w1_a1_ipgen.onnx")
model = model.transform(ReplaceVerilogRelPaths())
model = model.transform(CodeGen_ipstitch(fpga_part))
```
%% Cell type:markdown id: tags:
If you examine the nodes themselves on the transformed model you won't see a difference, because the IP stitching adds model-level metadata to the graph. This can be accessed using the `.model.metadata_props` attribute, the `get_metadata_prop` function in `ModelWrapper`, or by clicking on the global input/output tensors in Netron.
%% Cell type:code id: tags:
``` python
model.model.metadata_props
```
%% Output
[key: "vivado_stitch_proj"
value: "/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo"
, key: "vivado_stitch_vlnv"
value: "xilinx_finn:finn:finn_design:1.0"
, key: "wrapper_filename"
value: "/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo/finn_vivado_stitch_proj.srcs/sources_1/bd/finn_design/hdl/finn_design_wrapper.v"
]
%% Cell type:code id: tags:
``` python
model.get_metadata_prop("vivado_stitch_proj")
```
%% Output
'/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo'
%% Cell type:markdown id: tags:
If you navigate to the folder above (remember the /tmp/finn_xxx folder is mounted on the host as well as inside Docker) you can open the Vivado project (.xpr) file there using Vivado, and view the following stitched IP block design:
%% Cell type:markdown id: tags:
![](stitched_ip.png)
%% Cell type:code id: tags:
``` python
model.save(build_dir+"/tfc_w1_a1_ipstitch.onnx")
```
%% Cell type:markdown id: tags:
At this point, you could take the generated stitched IP and integrate it into your own project using Vivado IP Integrator if desired. Here, we will continue the tutorial by assuming that we want a stand-alone deployment of this accelerator on a PYNQ board.
%% Cell type:markdown id: tags:
## 4. Synthesize, Deploy and Test on PYNQ <a id='hw_test'></a>
* [Inserting the IP into a PYNQ Overlay Shell](#pynq_shell)
* [Synthesis, place and route](#synth_pl_ro)
* [Driver Generation](#driver_gen)
* [Deployment and Remote Execution](#deploy)
We are almost done preparing our hardware design. We'll now put it in a form suitable for use as a PYNQ overlay, synthesize and deploy it.
%% Cell type:markdown id: tags:
### Inserting the IP into a PYNQ Overlay Shell <a id='pynq_shell'></a>
To deploy our accelerator on a PYNQ platform, it needs to be put inside an appropriate *shell* that bridges it with the interfaces that the underlying system exposes. FINN makes it easy to create a PYNQ-compatible overlay by inserting the stitched IP into an appropriate PYNQ shell with the `MakePYNQProject` transformation; we can then view the created PYNQ shell project directory using the `metadata_props`. **This invokes Vivado and may take a few minutes to run.**
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_pynq_proj import MakePYNQProject
model = ModelWrapper(build_dir+"/tfc_w1_a1_ipstitch.onnx")
model = model.transform(MakePYNQProject(pynq_board))
model.model.metadata_props
```
%% Output
[key: "vivado_stitch_proj"
value: "/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo"
, key: "vivado_stitch_vlnv"
value: "xilinx_finn:finn:finn_design:1.0"
, key: "wrapper_filename"
value: "/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo/finn_vivado_stitch_proj.srcs/sources_1/bd/finn_design/hdl/finn_design_wrapper.v"
, key: "vivado_pynq_proj"
value: "/tmp/finn_maltanar/vivado_pynq_proj_hq9mfroo"
]
%% Cell type:code id: tags:
``` python
! ls {model.get_metadata_prop("vivado_pynq_proj")}
```
%% Output
ip_config.tcl resizer.cache resizer.ip_user_files resizer.xpr
make_project.sh resizer.hw resizer.srcs synth_project.sh
%% Cell type:markdown id: tags:
If we open the created Vivado project (.xpr) under the `vivado_pynq_proj` directory above, we can see the system-level block design as below, with the FINN-generated part of the design highlighted. Various other components, such as the DMA engine and data width converters, have also been instantiated.
![](pynq_shell_project.png)
%% Cell type:code id: tags:
``` python
model.save(build_dir + "/tfc_w1_a1_pynq_project.onnx")
```
%% Cell type:markdown id: tags:
### Synthesis, place and route <a id='synth_pl_ro'></a>
%% Cell type:markdown id: tags:
We are now ready for the final hardware generation step, which is synthesis, place and route to generate an FPGA bitfile. This can be done by either running the `synth_project.sh` script in the generated Vivado PYNQ project directory inside Docker, or by executing the `SynthPYNQProject` transformation. **This step involves launching Vivado for synthesis and may take a few hours.**
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.synth_pynq_proj import SynthPYNQProject
model = ModelWrapper(build_dir + "/tfc_w1_a1_pynq_project.onnx")
model = model.transform(SynthPYNQProject())
model.model.metadata_props
```
%% Output
[key: "vivado_stitch_proj"
value: "/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo"
, key: "vivado_stitch_vlnv"
value: "xilinx_finn:finn:finn_design:1.0"
, key: "wrapper_filename"
value: "/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo/finn_vivado_stitch_proj.srcs/sources_1/bd/finn_design/hdl/finn_design_wrapper.v"
, key: "vivado_pynq_proj"
value: "/tmp/finn_maltanar/vivado_pynq_proj_hq9mfroo"
, key: "vivado_pynq_bitfile"
value: "/tmp/finn_maltanar/vivado_pynq_proj_hq9mfroo/resizer.bit"
]
%% Cell type:code id: tags:
``` python
model.save(build_dir + "/tfc_w1_a1_post_synthesis.onnx")
```
%% Cell type:markdown id: tags:
### Driver generation <a id='driver_gen'></a>
Now that we have synthesized a bitfile for our network, we will generate some Python code for PYNQ that will act as the driver for this bitfile, package everything into a deployment folder and copy that to our PYNQ board.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_pynq_driver import MakePYNQDriver
model = ModelWrapper(build_dir + "/tfc_w1_a1_post_synthesis.onnx")
model = model.transform(MakePYNQDriver())
```
%% Cell type:markdown id: tags:
The generated driver is placed in a folder that is indicated by the `pynq_driver_dir` top-level metadata. We can examine the generated PYNQ Python driver code as follows:
%% Cell type:code id: tags:
``` python
driver_dir = model.get_metadata_prop("pynq_driver_dir")
! cat {driver_dir}/driver.py
```
%% Output
from pynq import Overlay
import numpy as np
from pynq import allocate
from finn.util.data_packing import (
finnpy_to_packed_bytearray,
packed_bytearray_to_finnpy
)
from finn.core.datatype import DataType
bitfile_path = "resizer.bit"
ol = Overlay(bitfile_path)
dma=ol.axi_dma_0
# declare input/output types and shapes for the accelerator
# input FINN DataType
idt = DataType.BINARY
# normal, folded and packed input shapes
ishape_normal = (1, 784)
ishape_folded = (1, 49, 16)
ishape_packed = (1, 49, 2)
# output FINN DataType
odt = DataType.UINT32
# normal, folded and packed output shapes
oshape_normal = (1, 10)
oshape_folded = (1, 1, 10)
oshape_packed = (1, 1, 40)
# load desired input .npy file
ibuf_normal = np.load("input.npy")
# ensure that shape is as expected
assert ibuf_normal.shape == ishape_normal
# convert to folded form
ibuf_folded = ibuf_normal.reshape(ishape_folded)
# pack the input buffer, reversing both SIMD dim and endianness
ibuf_packed = finnpy_to_packed_bytearray(
ibuf_folded, idt, reverse_endian=True, reverse_inner=True
)
# allocate a PYNQ buffer for the packed input buffer
ibuf_packed_device = allocate(shape=ishape_packed, dtype=np.uint8)
# copy the packed data into the PYNQ buffer
# TODO optimization: pack directly into the PYNQ buffer?
np.copyto(ibuf_packed_device, ibuf_packed)
# allocate a PYNQ buffer for the returned packed output buffer
obuf_packed = allocate(shape=oshape_packed, dtype=np.uint8)
# set up the DMA and wait until all transfers complete
dma.sendchannel.transfer(ibuf_packed_device)
dma.recvchannel.transfer(obuf_packed)
dma.sendchannel.wait()
dma.recvchannel.wait()
# unpack the packed output buffer from accelerator
obuf_folded = packed_bytearray_to_finnpy(
obuf_packed, odt, oshape_folded, reverse_endian=True, reverse_inner=True
)
# convert to normal reshape and save
obuf_normal = obuf_folded.reshape(oshape_normal)
np.save("output.npy", obuf_normal)
%% Cell type:markdown id: tags:
We can see that the generated driver contains the expected input/output shapes. It expects a file called `input.npy` to be provided prior to execution, which is read in, packed into the format that the accelerator expects, run through the accelerator, and the results are written to an `output.npy` file. You can build your own applications around the accelerator by modifying the driver; for example, to run the driver stand-alone on the board you would first have to provide a suitable `input.npy`, as sketched below. Alternatively, you can use the remote execution capabilities that FINN provides just to check if it is working, which will be our next step.
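A hedged sketch of preparing such a file, using a random placeholder instead of a real MNIST digit and assuming the (1, 784) BINARY input shape from the generated driver above:
%% Cell type:code id: tags:
``` python
# Hedged sketch: create a placeholder input.npy matching the driver's expected
# shape (1, 784) and BINARY datatype ({0, 1} values stored as float).
import numpy as np

img = (np.random.rand(28, 28) > 0.5).astype(np.float32)  # stand-in for a binarized MNIST digit
ibuf_normal = img.reshape(1, 784)
np.save("input.npy", ibuf_normal)
```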
%% Cell type:markdown id: tags:
### Deployment and Remote Execution <a id='deploy'></a>
We'll now use the `DeployToPYNQ` transformation to create a deployment folder with the bitfile and driver file(s), and copy that to the PYNQ board. You can change the default IP address, username, password and target folder for the PYNQ board below.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.make_deployment import DeployToPYNQ
ip = "192.168.3.1"
username = "xilinx"
password = "xilinx"
target_dir = "/home/xilinx/finn_tfc_end2end_example"
model = model.transform(DeployToPYNQ(ip, username, password, target_dir))
model.save(build_dir + "/tfc_w1_a1_pynq_deploy.onnx")
```
%% Cell type:markdown id: tags:
Let's verify that the remote access credentials are saved in the model metadata, and that the deployment folder has been successfully copied to the board:
%% Cell type:code id: tags:
``` python
model.model.metadata_props
```
%% Output
[key: "vivado_stitch_proj"
value: "/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo"
, key: "vivado_stitch_vlnv"
value: "xilinx_finn:finn:finn_design:1.0"
, key: "wrapper_filename"
value: "/tmp/finn_maltanar/vivado_stitch_proj_oo2lpoeo/finn_vivado_stitch_proj.srcs/sources_1/bd/finn_design/hdl/finn_design_wrapper.v"
, key: "vivado_pynq_proj"
value: "/tmp/finn_maltanar/vivado_pynq_proj_hq9mfroo"
, key: "vivado_pynq_bitfile"
value: "/tmp/finn_maltanar/vivado_pynq_proj_hq9mfroo/resizer.bit"
, key: "pynq_driver_dir"
value: "/tmp/finn_maltanar/pynq_driver_25t8u9sd"
, key: "pynq_ip"
value: "192.168.3.1"
, key: "pynq_username"
value: "xilinx"
, key: "pynq_password"
value: "xilinx"
, key: "pynq_target_dir"
value: "/home/xilinx/finn_tfc_end2end_example"
, key: "pynq_deployment_dir"
value: "/tmp/finn_maltanar/pynq_deployment_mpyziv7h"
, key: "pynq_deploy_dir"
value: "/tmp/finn_maltanar/pynq_deployment_mpyziv7h"
, key: "exec_mode"
value: "remote_pynq"
]
%% Cell type:code id: tags:
``` python
! sshpass -p {password} ssh {username}@{ip} 'ls -l {target_dir}/*'
```
%% Output
/home/xilinx/finn_tfc_end2end_example/pynq_deployment_1oyo7x66:
total 5820
-rw-r--r-- 1 xilinx xilinx 1934 Feb 13 13:36 driver.py
drwxr-xr-x 4 xilinx xilinx 4096 Feb 13 13:36 finn
-rw-r--r-- 1 xilinx xilinx 3264 Feb 13 14:24 input.npy
-rw-r--r-- 1 root root 120 Feb 13 14:24 output.npy
-rw-r--r-- 1 xilinx xilinx 5568787 Feb 13 13:36 resizer.bit
-rw-r--r-- 1 xilinx xilinx 368173 Feb 13 13:36 resizer.hwh
-rw-r--r-- 1 root root 32 Feb 13 14:24 sds_trace_data.dat
/home/xilinx/finn_tfc_end2end_example/pynq_deployment_mpyziv7h:
total 5808
-rw-r--r-- 1 xilinx xilinx 1934 Feb 28 16:09 driver.py
drwxr-xr-x 4 xilinx xilinx 4096 Feb 28 16:09 finn
-rw-r--r-- 1 xilinx xilinx 5568787 Feb 28 16:09 resizer.bit
-rw-r--r-- 1 xilinx xilinx 368173 Feb 28 16:09 resizer.hwh
%% Cell type:markdown id: tags:
We only have two more steps to be able to remotely execute the deployed bitfile with some test data from the MNIST dataset. Let's load up some test data that comes bundled with FINN.
%% Cell type:code id: tags:
``` python
from pkgutil import get_data
import onnx.numpy_helper as nph
import matplotlib.pyplot as plt
raw_i = get_data("finn", "data/onnx/mnist-conv/test_data_set_0/input_0.pb")
x = nph.to_array(onnx.load_tensor_from_string(raw_i))
plt.imshow(x.reshape(28,28), cmap='gray')
```
%% Output
<matplotlib.image.AxesImage at 0x7f17e0a82e10>
%% Cell type:markdown id: tags:
Recall that we partitioned our original network into a parent graph that contained the non-synthesizable nodes and a child graph that contained the bulk of the network, which we turned into a bitfile. We'll load up the parent graph and modify the `StreamingDataflowPartition` node so that it points to the deployed ONNX graph.
%% Cell type:code id: tags:
``` python
parent_model = ModelWrapper(build_dir+"/tfc_w1_a1_dataflow_parent.onnx")
sdp_node = parent_model.graph.node[2]
remote_exec_model = build_dir + "/tfc_w1_a1_pynq_deploy.onnx"
getCustomOp(sdp_node).set_nodeattr("model", remote_exec_model)
parent_model.save(build_dir+"/tfc_w1_a1_dataflow_parent_with_remote_bitfile_exec.onnx")
```
%% Cell type:markdown id: tags:
Finally, we can call `execute_onnx` on the parent graph, which will internally call remote execution with the bitfile once the `StreamingDataflowPartition` node is reached, grab the results, then continue executing the last portion of the network.
%% Cell type:code id: tags:
``` python
import numpy as np
from finn.core.onnx_exec import execute_onnx
iname = parent_model.graph.input[0].name
oname = parent_model.graph.output[0].name
ishape = parent_model.get_tensor_shape(iname)
input_dict = {iname: x.reshape(ishape)}
ret = execute_onnx(parent_model, input_dict, True)
```
%% Cell type:markdown id: tags:
We'll pass the output of the network through a softmax function to interpret it as probabilities, and plot the per-class probabilities as a bar chart.
%% Cell type:code id: tags:
``` python
def softmax(x):
"""Compute softmax values for each sets of scores in x."""
e_x = np.exp(x - np.max(x))
return e_x / e_x.sum()
logits = ret[oname].flatten()
prob = softmax(logits)
plt.bar(np.arange(10), prob)
```
%% Output
<BarContainer object of 10 artists>
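%% Cell type:markdown id: tags:
As a small sanity check (a minimal sketch reusing `prob` from the cell above), we can also print the predicted class index directly:
%% Cell type:code id: tags:
``` python
# the class with the highest probability; for this test image we expect digit 2
predicted_class = int(np.argmax(prob))
print("Predicted class: %d (probability %.3f)" % (predicted_class, prob[predicted_class]))
```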
%% Cell type:markdown id: tags:
We see that the network correctly predicts this as a digit 2 with high probability. This concludes our tutorial on how to take a simple fully-connected BNN all the way down to hardware with FINN, and execute it remotely on a PYNQ board.
%% Cell type:code id: tags:
``` python
```
......
%% Cell type:markdown id: tags:
# FINN - CustomOps
-----------------------------------------------------------------
<font size="3">This notebook should give a more detailed insight into FINN custom operation nodes. </font>
%% Cell type:markdown id: tags:
<font size="3">The following `showSrc` function is used to print the source code of functions within this Jupyter notebook: </font>
%% Cell type:code id: tags:
``` python
import inspect
def showSrc(what):
print("".join(inspect.getsourcelines(what)[0]))
```
%% Cell type:markdown id: tags:
<font size="3">FINN uses many custom operations (`op_type` in ONNX NodeProto) that are not defined in the ONNX operator schema. These custom nodes are marked with `domain="finn"` in the protobuf to identify them as such. These nodes can represent specific operations that we need for low-bit networks, or operations that are specific to a particular hardware backend.
A very abstract version of a custom op node representing a streaming fc layer is shown below. </font>
%% Cell type:markdown id: tags:
## Outline
---------------------------
* <font size="3">Basic FINN-ONNX node</font>
* <font size="3">CustomOp class</font>
* <font size="3">HLS FINN-ONNX node</font>
* <font size="3">HLSCustomOp class</font>
%% Cell type:markdown id: tags:
## Basic FINN-ONNX node
<font size="3">To create a FINN-ONNX node you can use the ONNX helper function, because a FINN-ONNX node is an ordinary ONNX NodeProto that simply carries several additional attributes. The procedure is shown below with an example for a MultiThreshold node. </font>
`multithreshold_node = helper.make_node(
"MultiThreshold",
["v", "thresholds"],
["out"],
domain="finn",
out_scale=2.0,
out_bias=-1.0,
out_dtype="",
)`
%% Cell type:markdown id: tags:
<font size="3">The `helper.make_node` function takes the op_type as its first argument; in this case it is *MultiThreshold*. Then the inputs and outputs are passed. Besides the data input, the MultiThreshold node has an additional input for the threshold values.
The next attribute (`domain`) specifies that this is a FINN-ONNX node. It must be set to `"finn"` so that the functions that work with FINN-ONNX nodes can directly recognize the node as a CustomOp. The attributes `out_scale` and `out_bias` are special MultiThreshold attributes that manipulate the output values, and `out_dtype` contains the output data type.
**Note**: each FINN-ONNX node has its own special attributes, which must be set correctly to ensure proper processing.</font>
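%% Cell type:markdown id: tags:
<font size="3">The snippet above is only illustrative; a minimal runnable version (with made-up attribute values) could look like this: </font>
%% Cell type:code id: tags:
``` python
from onnx import helper

# runnable version of the snippet above; the attribute values are illustrative only
multithreshold_node = helper.make_node(
    "MultiThreshold",        # op_type of the FINN custom op
    ["v", "thresholds"],     # data input and threshold values
    ["out"],                 # output tensor
    domain="finn",           # marks the node as a FINN-ONNX custom op
    out_scale=2.0,           # scaling applied to the output
    out_bias=-1.0,           # bias applied to the output
    out_dtype="BIPOLAR",     # FINN DataType of the output (illustrative)
)
print(multithreshold_node)
```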
%% Cell type:markdown id: tags:
## CustomOp class
<font size="3">Custom ops are represented in FINN by an ONNX node on the one hand and by a CustomOp class on the other. The class allows easier access to the node attributes and introduces special custom op functions. See below for the base CustomOp class.</font>
%% Cell type:code id: tags:
``` python
from finn.custom_op import CustomOp
showSrc(CustomOp)
```
%% Output
class CustomOp(ABC):
"""CustomOp class all custom op nodes are based on. Contains different functions
every custom node should have. Some as abstract methods, these have to be filled when
writing a new custom op node."""
def __init__(self, onnx_node):
super().__init__()
self.onnx_node = onnx_node
def get_nodeattr(self, name):
"""Get a node attribute by name. Data is stored inside the ONNX node's
AttributeProto container. Attribute must be part of get_nodeattr_types.
Default value is returned if attribute is not set."""
try:
(dtype, req, def_val) = self.get_nodeattr_types()[name]
attr = get_by_name(self.onnx_node.attribute, name)
if attr is not None:
# dtype indicates which ONNX Attribute member to use
# (such as i, f, s...)
ret = attr.__getattribute__(dtype)
if dtype == "s":
# decode string attributes
ret = ret.decode("utf-8")
return ret
else:
# not set, return default value
return def_val
except KeyError:
raise AttributeError("Op has no such attribute: " + name)
def set_nodeattr(self, name, value):
"""Set a node attribute by name. Data is stored inside the ONNX node's
AttributeProto container. Attribute must be part of get_nodeattr_types."""
try:
(dtype, req, def_val) = self.get_nodeattr_types()[name]
attr = get_by_name(self.onnx_node.attribute, name)
if attr is not None:
# dtype indicates which ONNX Attribute member to use
# (such as i, f, s...)
if dtype == "s":
# encode string attributes
value = value.encode("utf-8")
attr.__setattr__(dtype, value)
else:
# not set, create and insert AttributeProto
attr_proto = helper.make_attribute(name, value)
self.onnx_node.attribute.append(attr_proto)
except KeyError:
raise AttributeError("Op has no such attribute: " + name)
@abstractmethod
def get_nodeattr_types(self):
"""Returns a dict of permitted attributes for node, where:
returned_dict[attribute_name] = (dtype, require, default_value)
- dtype indicates which member of the ONNX AttributeProto
will be utilized
- require indicates whether this attribute is required
- default_val indicates the default value that will be used if the
attribute is not set
"""
pass
@abstractmethod
def make_shape_compatible_op(self):
"""Returns a standard ONNX op which is compatible with this CustomOp
for performing shape inference."""
pass
@abstractmethod
def infer_node_datatype(self, model):
"""Set the DataType annotations corresponding to the outputs of this
node."""
pass
@abstractmethod
def execute_node(self, context, graph):
"""Execute this CustomOp instance, given the execution context and
ONNX graph."""
pass
@abstractmethod
def verify_node(self):
"""Verifies that all attributes the node needs are there and
that particular attributes are set correctly. Also checks if
the number of inputs is equal to the expected number."""
pass
%% Cell type:markdown id: tags:
<font size="3">When instantiating the class, the ONNX node is passed so that all attributes of the node can be accessed within the class. This is accompanied by the functions `get_nodeattr()` and `set_nodeattr()`, which each instance of this class has. Furthermore, five abstract methods are declared; these are described in more detail in the code comments and are illustrated for the MultiThreshold node in the following. </font>
%% Cell type:code id: tags:
``` python
from finn.custom_op.multithreshold import MultiThreshold
showSrc(MultiThreshold)
```
%% Output
class MultiThreshold(CustomOp):
"""Class that corresponds to a multithresholding node."""
def get_nodeattr_types(self):
return {
"out_dtype": ("s", True, ""),
"out_scale": ("f", False, 1.0),
"out_bias": ("f", False, 0.0),
}
def make_shape_compatible_op(self):
node = self.onnx_node
return helper.make_node("Relu", [node.input[0]], [node.output[0]])
def infer_node_datatype(self, model):
node = self.onnx_node
odt = self.get_nodeattr("out_dtype")
model.set_tensor_datatype(node.output[0], DataType[odt])
def execute_node(self, context, graph):
node = self.onnx_node
# save inputs
v = context[node.input[0]]
thresholds = context[node.input[1]]
# retrieve attributes if output scaling is used
out_scale = self.get_nodeattr("out_scale")
out_bias = self.get_nodeattr("out_bias")
# calculate output
output = multithreshold(v, thresholds, out_scale, out_bias)
# setting context according to output
context[node.output[0]] = output
def verify_node(self):
info_messages = []
# verify number of attributes
num_of_attr = 3
if len(self.onnx_node.attribute) == num_of_attr:
info_messages.append("The number of attributes is correct")
else:
info_messages.append(
"""The number of attributes is incorrect,
{} should have {} attributes""".format(
self.onnx_node.op_type, num_of_attr
)
)
# verify that "domain" is set to "finn"
domain_value = self.onnx_node.domain
if domain_value == "finn":
info_messages.append("Attribute domain is set correctly")
else:
info_messages.append('Attribute domain should be set to "finn"')
# verify that all necessary attributes exist
try:
self.get_nodeattr("out_scale")
self.get_nodeattr("out_bias")
self.get_nodeattr("out_dtype")
info_messages.append("All necessary attributes exist")
except Exception:
info_messages.append(
"""The necessary attributes do not exist.
MultiThreshold needs the following attributes:
out_scale, out_bias, out_dtype"""
)
# verify the number of inputs
if len(self.onnx_node.input) == 2:
info_messages.append("The number of inputs is correct")
else:
info_messages.append(
"""MultiThreshold needs 2 inputs
(data input and threshold values)"""
)
return info_messages
%% Cell type:markdown id: tags:
<font size="3"> `get_nodeattr_types`: returns a dict of the permitted attributes for the node. For each of the special MultiThreshold attributes it contains a triple with the following values (a small usage example follows the list below): </font>
* <font size="3">`dtype`: indicates which member of the ONNX AttributeProto will be utilized </font>
* <font size="3">`require`: indicates whether this attribute is required </font>
* <font size="3">`default_value`: indicates the default value that will be used if the attribute is not set </font>
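%% Cell type:markdown id: tags:
<font size="3">As a small sketch (assuming the `multithreshold_node` from the runnable example earlier in this notebook), attribute access through the CustomOp class could look like this: </font>
%% Cell type:code id: tags:
``` python
from finn.custom_op.multithreshold import MultiThreshold

# wrap the ONNX node in its CustomOp class to access the attributes
mt_inst = MultiThreshold(multithreshold_node)
# get_nodeattr reads the value from the node's AttributeProto container
# (or returns the default from get_nodeattr_types() if the attribute is unset)
print(mt_inst.get_nodeattr("out_scale"))
# set_nodeattr updates (or creates) the corresponding AttributeProto entry
mt_inst.set_nodeattr("out_bias", 0.5)
print(mt_inst.get_nodeattr("out_bias"))
```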
%% Cell type:markdown id: tags:
<font size="3">`make_shape_compatible_op`: To use the FINN flow, the transformation pass [infer_shapes](https://github.com/Xilinx/finn/blob/master/src/finn/transformation/infer_shapes.py) is applied to the graphs in various places. In order for this transformation to be applied to CustomOps, they must first be converted to standard ONNX nodes with the same shape behavior, i.e. nodes where the relationship between input and output shape is the same.
This is what `make_shape_compatible_op` does. Since the output shape of a MultiThreshold node is the same as its input shape, it can be replaced by a `"Relu"` node from the standard ONNX node library.</font>
%% Cell type:markdown id: tags:
<font size="3">`infer_node_datatype`: sets the output tensor data type accordingly to the attribute `out_dtype` </font>
%% Cell type:markdown id: tags:
<font size="3">`execute_node`: This function allows execution of the node; depending on the CustomOp, different functionality has to be implemented. In the case of the MultiThreshold node, the input values and the thresholds are first extracted, the attributes for the output scaling are retrieved, and the output is then calculated with the help of a separate function. For more details regarding this function please take a look at the code [here](https://github.com/Xilinx/finn/blob/master/src/finn/custom_op/multithreshold.py). </font>
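%% Cell type:markdown id: tags:
<font size="3">To give an intuition for what that separate function computes, here is a small numpy sketch of the basic idea (not FINN's actual implementation): each output value counts how many thresholds the corresponding input value has reached, and `out_scale`/`out_bias` are applied afterwards. </font>
%% Cell type:code id: tags:
``` python
import numpy as np

def multithreshold_sketch(v, thresholds, out_scale=1.0, out_bias=0.0):
    """Toy multithresholding: count per channel how many thresholds each value
    has reached, then apply scale and bias. A sketch, not FINN's implementation."""
    # v: (batch, channels), thresholds: (channels, num_thresholds)
    counts = (v[:, :, np.newaxis] >= thresholds[np.newaxis, :, :]).sum(axis=-1)
    return out_scale * counts + out_bias

v = np.array([[-0.5, 0.3, 1.2]])
thresholds = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
# with out_scale=2.0 and out_bias=-1.0 the counts {0, 1, 2} map to {-1, 1, 3}
print(multithreshold_sketch(v, thresholds, out_scale=2.0, out_bias=-1.0))
```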
%% Cell type:markdown id: tags:
<font size="3">FINN has a subset of CustomOps that correspond to the [finn-hls](https://finn-hlslib.readthedocs.io/en/latest/) library. In the next part of the Jupyter notebook these are described in more detail. </font>
%% Cell type:markdown id: tags:
## HLS FINN-ONNX node
<font size="3">The creation of an HLS FINN-ONNX node looks very similar to the creation of a basic FINN-ONNX node. But three new attributes are introduced that are necessary to enable the processing of HLS FINN-ONNX nodes in FINN.</font>
`FCLayer_node = helper.make_node(
"StreamingFCLayer_Batch",
node_inp_list,
node_outp_list,
domain="finn",
backend="fpgadataflow",
code_gen_dir="",
executable_path="",
resType="ap_resource_lut()",
MW=mw,
MH=mh,
SIMD=simd,
PE=pe,
inputDataType=<FINN DataType>,
weightDataType=<FINN DataType>,
outputDataType=<FINN DataType>,
ActVal=actval,
binaryXnorMode=<0/1>,
noActivation=<0/1>
)`
%% Cell type:markdown id: tags:
<font size="3">`"StreamingFCLayer_Batch"` describes the op_type; then the inputs and outputs are declared. So far this is just like building a default ONNX node without additional attributes. But since this is a FINN custom op node, the attribute `domain="finn"` must be set. The streaming fc layer is a custom op from the [finn-hls](https://finn-hlslib.readthedocs.io/en/latest/) library, and this information is recorded in the node using the `backend` attribute. To execute a custom op from the [finn-hls](https://finn-hlslib.readthedocs.io/en/latest/) library, the corresponding c++ code must be created and an executable must be produced. Where the generated code is stored is specified in the `code_gen_dir` attribute, and `executable_path` specifies the path to the produced executable. In addition to the data types of the input and output tensors, the node also contains various other attributes resulting from the parameters of the corresponding [finn-hls](https://finn-hlslib.readthedocs.io/en/latest/) library function. More detailed information can be found in the documentation of [finn-hlslib](https://finn-hlslib.readthedocs.io/en/latest/).</font>
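%% Cell type:markdown id: tags:
<font size="3">Filling the template above with small, made-up values gives a runnable sketch. Note that a real graph would also need the weight/threshold initializers and tensor annotations, so the dimensions and data types below are illustrative only. </font>
%% Cell type:code id: tags:
``` python
from onnx import helper

# illustrative toy dimensions: 4 inputs, 4 outputs, full folding
mw = mh = 4
simd = pe = 4
FCLayer_node = helper.make_node(
    "StreamingFCLayer_Batch",
    ["inp", "weights", "thresh"],   # data input, weights, thresholds
    ["outp"],
    domain="finn",                  # FINN custom op
    backend="fpgadataflow",         # node belongs to the finn-hls backend
    code_gen_dir="",                # filled in by the code generation transformation
    executable_path="",             # filled in by the compilation transformation
    resType="ap_resource_lut()",
    MW=mw,
    MH=mh,
    SIMD=simd,
    PE=pe,
    inputDataType="BIPOLAR",        # illustrative FINN DataType names
    weightDataType="BIPOLAR",
    outputDataType="BIPOLAR",
    ActVal=0,
    binaryXnorMode=1,
    noActivation=0,
)
print(FCLayer_node)
```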
%% Cell type:markdown id: tags:
## HLSCustomOp class
<font size="3">If the node corresponds to a function from the [finn-hls](https://finn-hlslib.readthedocs.io/en/latest/) library, another class is used, which is derived from the CustomOp class:</font>
%% Cell type:code id: tags:
``` python
from finn.custom_op.fpgadataflow import HLSCustomOp
showSrc(HLSCustomOp)
```
%% Output
class HLSCustomOp(CustomOp):
"""HLSCustomOp class all custom ops that correspond to a finn-hlslib
function are based on. Contains different functions every fpgadataflow
custom node should have. Some as abstract methods, these have to be filled
when writing a new fpgadataflow custom op node."""
def __init__(self, onnx_node):
super().__init__(onnx_node)
self.code_gen_dict = {}
# getting templates from templates.py
# template for single node execution
self.docompute_template = templates.docompute_template
# templates for single node ip generation
# cpp file
self.ipgen_template = templates.ipgen_template
# tcl script
self.ipgentcl_template = templates.ipgentcl_template
def get_nodeattr_types(self):
return {
"backend": ("s", True, "fpgadataflow"),
"code_gen_dir_npysim": ("s", False, ""),
"code_gen_dir_ipgen": ("s", False, ""),
"executable_path": ("s", False, ""),
"ipgen_path": ("s", False, ""),
"exec_mode": ("s", False, ""),
"sim_cycles": ("i", False, 0),
"rtlsim_trace": ("s", False, ""),
}
def node_res_estimation(self):
"""Returns summarized resource estimation of BRAMs and LUTs
of the node."""
resources = []
resources.append("BRAMs: " + str(self.bram_estimation()))
resources.append("LUTs: " + str(self.lut_estimation()))
return resources
def bram_estimation(self):
"""Function for BRAM resource estimation, is member function of
HLSCustomOp class but has to be filled by every node"""
return 0
def lut_estimation(self):
"""Function for LUT resource estimation, is member function of
HLSCustomOp class but has to be filled by every node"""
return 0
def code_generation_ipgen(self, model, fpgapart, clk):
"""Generates c++ code and tcl script for ip generation."""
node = self.onnx_node
# generate top cpp file for ip generation
path = self.get_nodeattr("code_gen_dir_ipgen")
self.generate_params(model, path)
self.global_includes()
self.defines("ipgen")
self.blackboxfunction()
self.pragmas()
self.docompute()
template = self.ipgen_template
for key in self.code_gen_dict:
# transform list into long string separated by '\n'
code_gen_line = "\n".join(self.code_gen_dict[key])
template = template.replace(key, code_gen_line)
code_gen_dir = self.get_nodeattr("code_gen_dir_ipgen")
f = open(os.path.join(code_gen_dir, "top_{}.cpp".format(node.name)), "w")
f.write(template)
f.close()
self.code_gen_dict.clear()
# generate tcl script for ip generation
self.code_gen_dict["$PROJECTNAME$"] = ["project_{}".format(node.name)]
self.code_gen_dict["$HWSRCDIR$"] = [code_gen_dir]
self.code_gen_dict["$FPGAPART$"] = [fpgapart]
self.code_gen_dict["$FINNHLSLIBDIR$"] = ["/workspace/finn-hlslib"]
self.code_gen_dict["$TOPFXN$"] = [node.name]
self.code_gen_dict["$CLKPERIOD$"] = [str(clk)]
template = self.ipgentcl_template
for key in self.code_gen_dict:
# transform list into long string separated by '\n'
code_gen_line = "\n".join(self.code_gen_dict[key])
template = template.replace(key, code_gen_line)
code_gen_dir = self.get_nodeattr("code_gen_dir_ipgen")
f = open(os.path.join(code_gen_dir, "hls_syn_{}.tcl".format(node.name)), "w")
f.write(template)
f.close()
self.code_gen_dict.clear()
def ipgen_singlenode_code(self):
"""Builds the bash script for ip generation using the IPGenBuilder from
finn.util.fpgadataflow."""
node = self.onnx_node
code_gen_dir = self.get_nodeattr("code_gen_dir_ipgen")
builder = IPGenBuilder()
builder.append_tcl(code_gen_dir + "/hls_syn_{}.tcl".format(node.name))
builder.set_ipgen_path(code_gen_dir + "/project_{}".format(node.name))
builder.build(code_gen_dir)
self.set_nodeattr("ipgen_path", builder.ipgen_path)
def code_generation_npysim(self, model):
"""Generates c++ code for simulation (npysim)."""
node = self.onnx_node
path = self.get_nodeattr("code_gen_dir_npysim")
self.generate_params(model, path)
self.global_includes()
self.defines("npysim")
self.read_npy_data()
self.strm_decl()
self.docompute()
self.dataoutstrm()
self.save_as_npy()
template = self.docompute_template
for key in self.code_gen_dict:
# transform list into long string separated by '\n'
code_gen_line = "\n".join(self.code_gen_dict[key])
template = template.replace(key, code_gen_line)
code_gen_dir = self.get_nodeattr("code_gen_dir_npysim")
f = open(os.path.join(code_gen_dir, "execute_{}.cpp".format(node.op_type)), "w")
f.write(template)
f.close()
self.code_gen_dict.clear()
def compile_singlenode_code(self):
"""Builds the bash script for compilation using the CppBuilder from
finn.util.basic and executes the script to produce the executable."""
code_gen_dir = self.get_nodeattr("code_gen_dir_npysim")
builder = CppBuilder()
# to enable additional debug features please uncommand the next line
# builder.append_includes("-DDEBUG")
builder.append_includes("-I/workspace/finn/src/finn/data/cpp")
builder.append_includes("-I/workspace/cnpy/")
builder.append_includes("-I/workspace/finn-hlslib")
builder.append_includes("-I{}/include".format(os.environ["VIVADO_PATH"]))
builder.append_includes("--std=c++11")
builder.append_sources(code_gen_dir + "/*.cpp")
builder.append_sources("/workspace/cnpy/cnpy.cpp")
builder.append_includes("-lz")
builder.set_executable_path(code_gen_dir + "/node_model")
builder.build(code_gen_dir)
self.set_nodeattr("executable_path", builder.executable_path)
def dynamic_input_to_npy(self, context, count):
"""Saves input (given context) into .npy files.
Count indicates the number of inputs that have to be saved."""
node = self.onnx_node
code_gen_dir = self.get_nodeattr("code_gen_dir_npysim")
if code_gen_dir == "":
raise Exception(
"""
Found no codegen dir for this node, did you run the codegen_npysim transformation?
"""
)
# create a npy file for each input of the node (in_ind is input index)
# assuming dynamic inputs start from 0
for in_ind in range(count):
current_input_name = node.input[in_ind]
np.save(
os.path.join(code_gen_dir, "input_{}.npy".format(in_ind)),
context[current_input_name],
)
def npy_to_dynamic_output(self, context):
"""Reads the output from a .npy file and saves it at the right place in
the context dictionary."""
# TODO support multi-output nodes as needed
node = self.onnx_node
code_gen_dir = self.get_nodeattr("code_gen_dir_npysim")
output = np.load("{}/output.npy".format(code_gen_dir))
context[node.output[0]] = output
def exec_precompiled_singlenode_model(self):
"""Executes precompiled executable."""
executable_path = self.get_nodeattr("executable_path")
if executable_path == "":
raise Exception(
"""
Found no executable for this node, did you run the codegen and
compilation transformations?
"""
)
process_execute = subprocess.Popen(executable_path, stdout=subprocess.PIPE)
process_execute.communicate()
def reset_rtlsim(self, sim):
"""Sets reset input in pyverilator to zero, toggles the clock and set it
back to one"""
sim.io.ap_rst_n = 0
sim.io.ap_clk = 1
sim.io.ap_clk = 0
sim.io.ap_rst_n = 1
def toggle_clk(self, sim):
"""Toggles the clock input in pyverilator once."""
sim.io.ap_clk = 1
sim.io.ap_clk = 0
def rtlsim(self, sim, inp):
"""Runs the pyverilator simulation by passing the input values to the simulation,
toggle the clock and observing the execution time. Function contains also an
observation loop that can abort the simulation if no output value is produced
after 100 cycles."""
trace_file = self.get_nodeattr("rtlsim_trace")
if trace_file != "":
if trace_file == "default":
trace_file = self.onnx_node.name + ".vcd"
sim.start_vcd_trace(trace_file)
inputs = inp
outputs = []
sim.io.out_V_V_TREADY = 1
# observe if output is completely calculated
# observation_count will contain the number of cycles the calculation ran
num_out_values = self.get_number_output_values()
output_observed = False
observation_count = 0
# avoid infinite looping of simulation by aborting when there is no change in
# output values after 100 cycles
no_change_count = 0
old_outputs = outputs
liveness_threshold = pyverilate_get_liveness_threshold_cycles()
while not (output_observed):
sim.io.in0_V_V_TVALID = 1 if len(inputs) > 0 else 0
sim.io.in0_V_V_TDATA = inputs[0] if len(inputs) > 0 else 0
if sim.io.in0_V_V_TREADY == 1 and sim.io.in0_V_V_TVALID == 1:
inputs = inputs[1:]
if sim.io.out_V_V_TVALID == 1 and sim.io.out_V_V_TREADY == 1:
outputs = outputs + [sim.io.out_V_V_TDATA]
sim.io.ap_clk = 1
sim.io.ap_clk = 0
observation_count = observation_count + 1
no_change_count = no_change_count + 1
if len(outputs) == num_out_values:
self.set_nodeattr("sim_cycles", observation_count)
output_observed = True
if no_change_count == liveness_threshold:
if old_outputs == outputs:
if trace_file != "":
sim.flush_vcd_trace()
sim.stop_vcd_trace()
raise Exception(
"Error in simulation! Takes too long to produce output. "
"Consider setting the LIVENESS_THRESHOLD env.var. to a "
"larger value."
)
else:
no_change_count = 0
old_outputs = outputs
if trace_file != "":
sim.flush_vcd_trace()
sim.stop_vcd_trace()
return outputs
def execute_node(self, context, graph):
"""Executes single node using npysim or rtlsim."""
mode = self.get_nodeattr("exec_mode")
if mode == "npysim":
# save input(s)
self.dynamic_input_to_npy(context, 1)
# execute the precompiled model
self.exec_precompiled_singlenode_model()
# load output npy file
self.npy_to_dynamic_output(context)
elif mode == "rtlsim":
pass
else:
raise Exception(
"""Invalid value for attribute exec_mode! Is currently set to: {}
has to be set to one of the following value ("npysim", "rtlsim")""".format(
mode
)
)
def generate_params(self, model, path):
"""Function to generate parameters (i.e. weights and thresholds),
is member function of HLSCustomOp class but has to be filled
by every node."""
pass
@abstractmethod
def get_number_output_values(self):
"""Function to get the number of expected output values,
is member function of HLSCustomOp class but has to be filled
by every node."""
pass
@abstractmethod
def global_includes(self):
"""Function to set the global includes for c++ code that has to be generated
for npysim or rtlsim, is member function of HLSCustomOp class but has to
be filled by every node."""
pass
@abstractmethod
def defines(self, var):
"""Function to set the define commands for c++ code that has to be generated
for npysim or rtlsim, is member function of HLSCustomOp class but has to
be filled by every node.
var: makes it possible to reuse the function for different c++ code generation.
I.e. if set to "ipgen" in StreamingFCLayer_Batch additional PRAGMA defines are
added."""
pass
@abstractmethod
def read_npy_data(self):
"""Function to generate the commands for reading data from .npy file in c++,
is member function of HLSCustomOp class but has to be filled by every node."""
pass
@abstractmethod
def strm_decl(self):
"""Function to generate the commands for the stream declaration in c++,
is member function of HLSCustomOp class but has to be filled
by every node."""
pass
@abstractmethod
def docompute(self):
"""Function to generate the commands for the computational part of the
c++ code, is member function of HLSCustomOp class but has to be filled
by every node."""
pass
@abstractmethod
def dataoutstrm(self):
"""Function to generate the commands for reading out data from c++ and convert
into npy format, is member function of HLSCustomOp class but has to be filled
by every node."""
pass
@abstractmethod
def save_as_npy(self):
"""Function to generate the commands for saving data in .npy file in c++,
is member function of HLSCustomOp class but has to be filled by every node."""
pass
@abstractmethod
def blackboxfunction(self):
"""Function to generate a blackbock function in c++ from which an IP block
will be generated, is member function of HLSCustomOp class but has to be filled
by every node."""
pass
@abstractmethod
def pragmas(self):
"""Function to generate the pragma commands in c++, is member function of
HLSCustomOp class but has to be filled by every node."""
pass
def get_folded_input_shape(self):
"""Returns folded input shape (according to synapse folding), if implemented."""
raise Exception("get_folded_input_shape not implemented for this op")
def get_folded_output_shape(self):
"""Returns folded output shape (according to neuron folding), if implemented."""
raise Exception("get_folded_output_shape not implemented for this op")
def get_instream_width(self):
"""Returns input stream width, if implemented."""
raise Exception("get_instream_width not implemented for this op")
def get_outstream_width(self):
"""Returns output stream width, if implemented."""
raise Exception("get_outstream_width not implemented for this op")
%% Cell type:markdown id: tags:
<font size="3">When creating an instance of this class, a template is introduced which forms the layout for the c++ code used to execute the node. It has some general constructs, like the inclusion of bnn-library.h, which contains the references to the finn-hls library, and of cnpy.h and npy2apintstream.hpp, which support the transfer of python numpy arrays in c++. The idea of this template is to replace the variables marked with `$ $` with c++ calls during code generation. Then the template can be written into a .cpp file and compiled.
**`get_nodeattr_types()`**: each instance of the HLSCustomOp class must have the code generation directory attributes (`code_gen_dir_npysim`, `code_gen_dir_ipgen`) and `executable_path`, since c++ code must be generated and compiled into an executable in order to execute these nodes.
</font>
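%% Cell type:markdown id: tags:
<font size="3">The replacement mechanism itself is plain string substitution; the toy sketch below (with a made-up mini template, not FINN's actual one) shows how the entries of `code_gen_dict` are joined and substituted for the `$ $` placeholders: </font>
%% Cell type:code id: tags:
``` python
# toy template with $ $ placeholders, mimicking how the real templates are filled
template = """$GLOBALS$

int main() {
    $DOCOMPUTE$
    return 0;
}
"""
code_gen_dict = {
    "$GLOBALS$": ['#include "bnn-library.h"'],
    "$DOCOMPUTE$": ["// computation call inserted by docompute()"],
}
for key in code_gen_dict:
    # each entry is a list of code lines, joined into one string before substitution
    template = template.replace(key, "\n".join(code_gen_dict[key]))
print(template)
```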
%% Cell type:markdown id: tags:
<font size="3">**`code_generation(model)`**: all functions required for code generation are called, the `$ $` variables in the template are replaced accordingly, and the result is written into a .cpp file. Almost all of these subfunctions are implemented as abstract methods in the class, so they are completely customized for each custom op node. A special function is `generate_params()`. This is not implemented as an abstract method but as a regular function that by default contains only `pass`. This is because some custom op nodes do not have parameters that need to be generated, and in this way the function is simply skipped for them. For a streaming fc layer node, on the other hand, parameter generation is necessary. What such parameter generation can look like is described in more detail later in this notebook.
</font>
%% Cell type:markdown id: tags:
<font size="3">**`compile_singlenode_code()`**: To compile the generated code, the compile command must be built. This is done in this function. It creates an instance of the `CppBuilder()` class and assembles the various components of the command. The `.build` function creates the executable and then sets the corresponding node attribute. `CppBuilder` is a utility class from `finn.util.basic`; a more detailed description can be found in the Jupyter notebook [FINN-CodeGenerationAndCompilation](FINN-CodeGenerationAndCompilation.ipynb).
</font>
%% Cell type:markdown id: tags:
<font size="3">**`dynamic_input_to_npy(context, count)`**: creates a .npy file for all inputs of the node. These files will be stored in the directory specified by code_gen_dir. The argument `count` must be used to specify the number of inputs. `context` contains the values for the inputs.</font>
%% Cell type:markdown id: tags:
<font size="3">**`npy_to_dynamic_output(context)`**: reads the output values and sets `context` dictionary accordingly. When executing the c++ executable of the node, the output values are written to a .npy file. </font>
%% Cell type:markdown id: tags:
<font size="3">**`exec_precompiled_singlenode_model()`**: executes precompiled executable which is specified in `executable_path`</font>
%% Cell type:markdown id: tags:
<font size="3">**`execute_node(context,graph)`**: calls first `dynamic_input_to_npy()`, then executes the executable using `exec_precompiled_singlenode_model()` and at the end reads the output .npy file with `npy_to_dynamic_output`</font>
%% Cell type:markdown id: tags:
#### Generate Parameters
<font size="3">Parameters have to be generated for specific types of HLSCustomOps. For example, if the node is a streaming fc layer, there are weights and threshold (activation) values, which are written to separate .h files and added to the template using `#include`. For the streaming fc layer the parameter generation looks like this:
</font>
%% Cell type:code id: tags:
``` python
from finn.custom_op.fpgadataflow.streamingfclayer_batch import StreamingFCLayer_Batch
showSrc(StreamingFCLayer_Batch.generate_params)
```
%% Output
def generate_params(self, model, path):
"""Saves weights into params.h and if existing thresholds into thresh.h."""
code_gen_dir = path
# weights
weights = model.get_initializer(self.onnx_node.input[1])
# convert weights into hlslib-compatible format
weight_tensor = self.get_hls_compatible_weight_tensor(weights)
export_wdt = self.get_weight_datatype()
# we have converted bipolar weights to binary for export,
# so use it as such for weight generation
if self.get_weight_datatype() == DataType.BIPOLAR:
export_wdt = DataType.BINARY
weight_hls_code = numpy_to_hls_code(
weight_tensor, export_wdt, "weights", True, True
)
# write weights into params.h
# code_gen_dir = self.get_nodeattr("code_gen_dir_npysim")
f_weights = open("{}/params.h".format(code_gen_dir), "w")
if export_wdt.bitwidth() != 1:
f_weights.write(
"static FixedPointWeights<{},{},{},{}> weights = ".format(
self.get_nodeattr("SIMD"),
export_wdt.get_hls_datatype_str(),
self.get_nodeattr("PE"),
self.calc_wmem(),
)
)
else:
f_weights.write(
"static BinaryWeights<{},{},{}> weights = ".format(
self.get_nodeattr("SIMD"), self.get_nodeattr("PE"), self.calc_wmem()
)
)
f_weights.write(weight_hls_code)
f_weights.close()
# thresholds
if len(self.onnx_node.input) > 2:
thresholds = model.get_initializer(self.onnx_node.input[2])
if thresholds is not None:
threshold_tensor = self.get_hls_compatible_threshold_tensor(thresholds)
tdt = DataType.INT32
# use UINT32 threshold export for bipolar times bipolar
inp_is_bipolar = self.get_input_datatype() == DataType.BIPOLAR
wt_is_bipolar = self.get_weight_datatype() == DataType.BIPOLAR
# reinterpret inp/wt as bipolar if bin_xnor_mode is iset
inp_is_binary = self.get_input_datatype() == DataType.BINARY
wt_is_binary = self.get_weight_datatype() == DataType.BINARY
bin_xnor_mode = self.get_nodeattr("binaryXnorMode") == 1
inp_is_bipolar = inp_is_bipolar or (inp_is_binary and bin_xnor_mode)
wt_is_bipolar = wt_is_bipolar or (wt_is_binary and bin_xnor_mode)
if inp_is_bipolar and wt_is_bipolar:
tdt = DataType.UINT32
thresholds_hls_code = numpy_to_hls_code(
threshold_tensor, tdt, "thresholds", False, True
)
# write thresholds into thresh.h
# code_gen_dir = self.get_nodeattr("code_gen_dir_npysim")
f_thresh = open("{}/thresh.h".format(code_gen_dir), "w")
tdt_hls = tdt.get_hls_datatype_str()
# use binary to export bipolar activations
export_odt = self.get_output_datatype()
if self.get_output_datatype() == DataType.BIPOLAR:
export_odt = DataType.BINARY
odt_hls = export_odt.get_hls_datatype_str()
f_thresh.write(
"static ThresholdsActivation<{},{},{},{},{},{},{}> threshs \
= ".format(
self.calc_tmem(),
self.get_nodeattr("PE"),
threshold_tensor.shape[-1],
tdt_hls,
odt_hls,
self.get_nodeattr("ActVal"),
"std::less_equal<%s>" % tdt_hls,
)
)
f_thresh.write(thresholds_hls_code)
f_thresh.close()
%% Cell type:markdown id: tags:
<font size="3">First, the weight values are extracted with `get_initializer()` using the ModelWrapper. At this point it is assumed that the second input of the streaming fc layer specifies the weights. After a few manipulations the weights are written to `params.h`. If there are threshold values, they are prepared analogously and written to `thresh.h`. </font>
%% Cell type:code id: tags:
``` python
```
......