Commit b9165b7a authored by auphelia's avatar auphelia

[Notebooks] Update tfc_end2end_verification notebook

parent c718ece3
%% Cell type:markdown id: tags:
# FINN - Functional Verification of End-to-End Flow
-----------------------------------------------------------------
**Important: This notebook depends on the tfc_end2end_example notebook, because we are using models that are available at intermediate steps in the end-to-end flow. So please make sure the needed .onnx files are generated to run this notebook.**
In this notebook, we will show how to take the intermediate results of the end-to-end tfc example and verify their functionality with different methods. The following picture shows the part of the end-to-end flow that covers the *Simulation & Emulation Flows*. Besides the methods in this notebook, there is another one that is covered in the Jupyter notebook [tfc_end2end_example](tfc_end2end_example.ipynb): remote execution. Remote execution allows functional verification directly on the PYNQ board; for details please have a look at the mentioned Jupyter notebook.
%% Cell type:markdown id: tags:
<img src="verification.png" alt="Drawing" style="width: 500px;"/>
%% Cell type:markdown id: tags:
We will use the following helper functions, `showSrc` to show source code of FINN library calls and `showInNetron` to show the ONNX model at the current transformation step. The Netron displays are interactive, but they only work when running the notebook actively and not on GitHub (i.e. if you are viewing this on GitHub you'll only see blank squares).
%% Cell type:code id: tags:
``` python
from finn.util.basic import make_build_dir
from finn.util.visualization import showSrc, showInNetron
import os
build_dir = os.environ["FINN_ROOT"]
```
%% Cell type:markdown id: tags:
To verify the simulations, a "golden" output is calculated as a reference. This is calculated directly from the Brevitas model using PyTorch, by running some example data from the MNIST dataset through the trained model.
%% Cell type:code id: tags:
``` python
from pkgutil import get_data
import onnx
import onnx.numpy_helper as nph
import torch
from finn.util.test import get_test_model_trained
fc = get_test_model_trained("TFC", 1, 1)
raw_i = get_data("finn.data", "onnx/mnist-conv/test_data_set_0/input_0.pb")
raw_i = get_data("qonnx.data", "onnx/mnist-conv/test_data_set_0/input_0.pb")
input_tensor = onnx.load_tensor_from_string(raw_i)
input_brevitas = torch.from_numpy(nph.to_array(input_tensor)).float()
output_golden = fc.forward(input_brevitas).detach().numpy()
output_golden
```
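%% Cell type:markdown id: tags:
The comparisons later in this notebook do not compare the raw logits, but the index of the maximum value in the golden output, i.e. the predicted class. As an optional sanity check (not part of the original flow), this index can be computed directly:
%% Cell type:code id: tags:
``` python
import numpy as np

# index of the largest logit in the golden reference = expected class prediction
expected_class = np.argmax(output_golden[0])
expected_class
```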
%% Cell type:markdown id: tags:
## Simulation using Python <a id='simpy'></a>
If an ONNX model consists of [standard ONNX](https://github.com/onnx/onnx/blob/master/docs/Operators.md) nodes and/or FINN custom operations that do not belong to the fpgadataflow domain (`backend` $\neq$ `fpgadataflow`), the model can be checked for functionality using Python.
To simulate a standard ONNX node, [onnxruntime](https://github.com/microsoft/onnxruntime) is used. onnxruntime is an open-source tool developed by Microsoft to run standard ONNX nodes. For the FINN custom op nodes, execution functions are defined. The following is an example of the execution function of an XNOR popcount node.
%% Cell type:code id: tags:
``` python
from qonnx.custom_op.general.xnorpopcount import xnorpopcountmatmul
showSrc(xnorpopcountmatmul)
```
%% Cell type:markdown id: tags:
The function contains a description of the behaviour in Python and can thus calculate the result of the node.
This execution function and onnxruntime is used when `execute_onnx` from `onnx_exec` is applied to the model. The model is then simulated node by node and the result is stored in a context dictionary, which contains the values of each tensor at the end of the execution. To get the result, only the output tensor has to be extracted.
The procedure is shown below. We take the model right before the nodes are converted into HLS layers and generate an input tensor to pass to the execution function. The input tensor is generated from the Brevitas example inputs.
%% Cell type:code id: tags:
``` python
import numpy as np
from qonnx.core.modelwrapper import ModelWrapper
input_dict = {"global_in": nph.to_array(input_tensor)}
model_for_sim = ModelWrapper(build_dir+"/tfc_w1a1_ready_for_hls_conversion.onnx")
```
%% Cell type:code id: tags:
``` python
import finn.core.onnx_exec as oxe
output_dict = oxe.execute_onnx(model_for_sim, input_dict, return_full_exec_context=False)
output_pysim = output_dict[list(output_dict.keys())[0]]
if np.isclose(output_pysim, np.where(output_golden[0]==np.amax(output_golden[0])), atol=1e-3).all():
    print("Results are the same!")
else:
    print("The results are not the same!")
```
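%% Cell type:markdown id: tags:
If we want to inspect intermediate tensors as well, `execute_onnx` can be called with `return_full_exec_context=True`. The optional sketch below lists the tensor names in the execution context; the value of any intermediate tensor can then be looked up by name.
%% Cell type:code id: tags:
``` python
# full execution context: a dict mapping every tensor name to its computed value
full_context = oxe.execute_onnx(model_for_sim, input_dict, return_full_exec_context=True)
print(list(full_context.keys()))
```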
%% Cell type:markdown id: tags:
The result is compared with the theoretical "golden" value for verification.
%% Cell type:markdown id: tags:
## Simulation (cppsim) using C++
When dealing with HLS custom op nodes in FINN, the simulation using Python is no longer sufficient. After the nodes have been converted to HLS layers, the simulation using C++ can be used. To do this, the input tensor is stored in a .npy file and C++ code is generated that reads the values from the .npy array, streams them to the corresponding finn-hlslib function and writes the result to a new .npy file. This in turn can be read in Python and processed in the FINN flow. For this example the model after setting the folding factors in the HLS layers is used. Please be aware that this is not the full model, but the dataflow partition, so at the end of this section we have to integrate the model back into the parent model before executing it.
%% Cell type:code id: tags:
``` python
model_for_cppsim = ModelWrapper(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
```
%% Cell type:markdown id: tags:
To generate the code for this simulation and to generate the executable, two transformations are used:
* `PrepareCppSim` which generates the C++ code for the corresponding HLS layer
* `CompileCppSim` which compiles the C++ code and stores the path to the executable
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.prepare_cppsim import PrepareCppSim
from finn.transformation.fpgadataflow.compile_cppsim import CompileCppSim
from qonnx.transformation.general import GiveUniqueNodeNames
model_for_cppsim = model_for_cppsim.transform(GiveUniqueNodeNames())
model_for_cppsim = model_for_cppsim.transform(PrepareCppSim())
model_for_cppsim = model_for_cppsim.transform(CompileCppSim())
```
%% Cell type:markdown id: tags:
When we take a look at the model using Netron, we can see that the transformations introduced new attributes.
%% Cell type:code id: tags:
``` python
model_for_cppsim.save(build_dir+"/tfc_w1_a1_for_cppsim.onnx")
showInNetron(build_dir+"/tfc_w1_a1_for_cppsim.onnx")
```
%% Cell type:markdown id: tags:
The following node attributes have been added:
* `code_gen_dir_cppsim` indicates the directory where the files for the simulation using C++ are stored
* `executable_path` specifies the path to the executable
We now take a closer look at the files that were generated:
%% Cell type:code id: tags:
``` python
from qonnx.custom_op.registry import getCustomOp
fc0 = model_for_cppsim.graph.node[1]
fc0w = getCustomOp(fc0)
code_gen_dir = fc0w.get_nodeattr("code_gen_dir_cppsim")
!ls {code_gen_dir}
```
%% Cell type:markdown id: tags:
Besides the .cpp file, the folder contains .h files with the weights and thresholds. The shell script contains the compile command and *node_model* is the executable generated by compilation. Comparing this with the `executable_path` node attribute, it can be seen that it specifies exactly the path to *node_model*.
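%% Cell type:markdown id: tags:
As an optional check, we can also read the `executable_path` attribute directly and confirm that it points to the *node_model* executable inside `code_gen_dir`:
%% Cell type:code id: tags:
``` python
# the executable_path node attribute should point to the node_model binary
fc0w.get_nodeattr("executable_path")
```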
%% Cell type:markdown id: tags:
To simulate the model, the execution mode (`exec_mode`) must be set to "cppsim". This is done using the transformation `SetExecMode`.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.set_exec_mode import SetExecMode
model_for_cppsim = model_for_cppsim.transform(SetExecMode("cppsim"))
model_for_cppsim.save(build_dir+"/tfc_w1_a1_for_cppsim.onnx")
```
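%% Cell type:markdown id: tags:
Since `SetExecMode` stores the execution mode as a node attribute of the HLS layers, we can (optionally) read it back with `get_nodeattr` to confirm the setting, just like the attributes above:
%% Cell type:code id: tags:
``` python
# confirm that SetExecMode("cppsim") updated the exec_mode node attribute
getCustomOp(model_for_cppsim.graph.node[1]).get_nodeattr("exec_mode")
```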
%% Cell type:markdown id: tags:
Before the model can be executed using `execute_onnx`, we integrate the child model into the parent model. The function then reads the `exec_mode` and writes the input into the correct directory in a .npy file. To be able to read this in C++, there is an additional .hpp file ([npy2apintstream.hpp](https://github.com/Xilinx/finn/blob/main/src/finn/qnn-data/cpp/npy2apintstream.hpp)) in FINN, which uses cnpy to read .npy files and convert them into streams, or to read a stream and write it into a .npy file. [cnpy](https://github.com/rogersce/cnpy) is a helper to read and write .npy and .npz formats in C++.
The result is again compared to the "golden" output.
%% Cell type:code id: tags:
``` python
parent_model = ModelWrapper(build_dir+"/tfc_w1_a1_dataflow_parent.onnx")
sdp_node = parent_model.graph.node[1]
child_model = build_dir + "/tfc_w1_a1_for_cppsim.onnx"
getCustomOp(sdp_node).set_nodeattr("model", child_model)
output_dict = oxe.execute_onnx(parent_model, input_dict)
output_cppsim = output_dict[list(output_dict.keys())[0]]
if np.isclose(output_cppsim, np.where(output_golden[0]==np.amax(output_golden[0])), atol=1e-3).all():
    print("Results are the same!")
else:
    print("The results are not the same!")
```
%% Cell type:markdown id: tags:
## Emulation (rtlsim) using PyVerilator
The emulation using [PyVerilator](https://github.com/maltanar/pyverilator) can be done after IP blocks are generated from the corresponding HLS layers. PyVerilator is a tool which makes it possible to simulate Verilog files using Verilator via a Python interface.
We have two ways to use rtlsim: one is to run the model node-by-node as with the simulation methods above, but if the model is in the form of the dataflow partition, the part of the graph that consists only of HLS nodes can also be executed as a whole.
%% Cell type:markdown id: tags:
Because at the point where we want to grab and verify the model, the model is already in split form (a parent graph consisting of non-HLS layers and a child graph consisting only of HLS layers), we first have to reference the child graph within the parent graph. This is done using the node attribute `model` of the `StreamingDataflowPartition` node.
First we show the procedure for a child graph whose individual layers have their own IP blocks, then the procedure for a child graph that already has a stitched IP.
%% Cell type:markdown id: tags:
### Emulation of model node-by-node
The child model is loaded and the `exec_mode` for each node is set. To prepare the node-by-node emulation, the transformation `PrepareRTLSim` is applied to the child model. With this transformation the emulation files are created for each node and can be used directly when calling `execute_onnx()`. After the transformation each node has a new node attribute `rtlsim_so`, which contains the path to the corresponding emulation files. The child model is then saved in a new .onnx file so that it can be referenced in the parent model.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.prepare_rtlsim import PrepareRTLSim
from finn.transformation.fpgadataflow.prepare_ip import PrepareIP
from finn.transformation.fpgadataflow.hlssynth_ip import HLSSynthIP
test_fpga_part = "xc7z020clg400-1"
target_clk_ns = 10
child_model = ModelWrapper(build_dir + "/tfc_w1_a1_set_folding_factors.onnx")
child_model = child_model.transform(GiveUniqueNodeNames())
child_model = child_model.transform(PrepareIP(test_fpga_part, target_clk_ns))
child_model = child_model.transform(HLSSynthIP())
child_model = child_model.transform(SetExecMode("rtlsim"))
child_model = child_model.transform(PrepareRTLSim())
child_model.save(build_dir + "/tfc_w1_a1_dataflow_child.onnx")
```
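%% Cell type:markdown id: tags:
As mentioned above, each node now carries an `rtlsim_so` attribute pointing to its emulation files. As an optional check, we can read it for the first node of the child model:
%% Cell type:code id: tags:
``` python
# path to the compiled emulation library of the first HLS node in the child model
getCustomOp(child_model.graph.node[0]).get_nodeattr("rtlsim_so")
```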
%% Cell type:markdown id: tags:
The next step is to load the parent model and set the node attribute `model` in the `StreamingDataflowPartition` node (`sdp_node`). Afterwards the `exec_mode` is set in each node of the parent model and the model can be executed.
%% Cell type:code id: tags:
``` python
# parent model
model_for_rtlsim = ModelWrapper(build_dir + "/tfc_w1_a1_dataflow_parent.onnx")
# reference child model
sdp_node = getCustomOp(model_for_rtlsim.graph.node[1])
sdp_node.set_nodeattr("model", build_dir + "/tfc_w1_a1_dataflow_child.onnx")
model_for_rtlsim = model_for_rtlsim.transform(SetExecMode("rtlsim"))
```
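%% Cell type:markdown id: tags:
Optionally, we can read the `model` attribute back from the `StreamingDataflowPartition` node to confirm that it now points to the rtlsim-ready child model:
%% Cell type:code id: tags:
``` python
# the StreamingDataflowPartition node should now reference the prepared child model
sdp_node.get_nodeattr("model")
```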
%% Cell type:markdown id: tags:
Because the necessary files for the emulation were already generated in the Jupyter notebook [tfc_end2end_example](tfc_end2end_example.ipynb), the model can be executed directly in the next step.
%% Cell type:code id: tags:
``` python
output_dict = oxe.execute_onnx(model_for_rtlsim, input_dict)
output_rtlsim = output_dict[list(output_dict.keys())[0]]
if np.isclose(output_rtlsim, np.where(output_golden[0]==np.amax(output_golden[0])), atol=1e-3).all():
    print("Results are the same!")
else:
    print("The results are not the same!")
```
%% Cell type:markdown id: tags:
### Emulation of stitched IP
Here we use the same procedure. First the child model is loaded, but in contrast to the node-by-node emulation, the metadata property `exec_mode` is set to "rtlsim" for the whole child model. When the model is integrated and executed in the last step, the Verilog files of the stitched IP of the child model are used.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.insert_dwc import InsertDWC
from finn.transformation.fpgadataflow.insert_fifo import InsertFIFO
from finn.transformation.fpgadataflow.create_stitched_ip import CreateStitchedIP
child_model = ModelWrapper(build_dir + "/tfc_w1_a1_dataflow_child.onnx")
child_model = child_model.transform(InsertDWC())
child_model = child_model.transform(InsertFIFO())
child_model = child_model.transform(GiveUniqueNodeNames())
child_model = child_model.transform(PrepareIP(test_fpga_part, target_clk_ns))
child_model = child_model.transform(HLSSynthIP())
child_model = child_model.transform(CreateStitchedIP(test_fpga_part, target_clk_ns))
child_model = child_model.transform(PrepareRTLSim())
child_model.set_metadata_prop("exec_mode","rtlsim")
child_model.save(build_dir + "/tfc_w1_a1_dataflow_child.onnx");
```
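%% Cell type:markdown id: tags:
In contrast to the node-by-node emulation, `exec_mode` is now a metadata property of the whole child model. As an optional check it can be read back with `get_metadata_prop`:
%% Cell type:code id: tags:
``` python
# the stitched-IP child model carries exec_mode as a model-level metadata property
child_model.get_metadata_prop("exec_mode")
```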
%% Cell type:code id: tags:
``` python
# parent model
model_for_rtlsim = ModelWrapper(build_dir + "/tfc_w1_a1_dataflow_parent.onnx")
# reference child model
sdp_node = getCustomOp(model_for_rtlsim.graph.node[1])
sdp_node.set_nodeattr("model", build_dir + "/tfc_w1_a1_dataflow_child.onnx")
```
%% Cell type:code id: tags:
``` python
output_dict = oxe.execute_onnx(model_for_rtlsim, input_dict)
output_rtlsim = output_dict[list(output_dict.keys())[0]]
```
%% Cell type:code id: tags:
``` python
if np.isclose(output_rtlsim, np.where(output_golden[0]==np.amax(output_golden[0])), atol=1e-3).all():
    print("Results are the same!")
else:
    print("The results are not the same!")
```