Commit dec3aa75 authored by auphelia's avatar auphelia

[Notebook] Update and rerun basics Jupyter notebooks

parent 87b1bd45
%% Cell type:markdown id: tags:
# FINN - How to work with ONNX
This notebook should give an overview of ONNX ProtoBuf, help to create and manipulate an ONNX model and use FINN functions to work with it. There may be overlaps to notebook [ModelWrapper](2_modelwrapper.ipynb), but this notebook will give an overview about the handling of ONNX models in FINN.
This notebook should give an overview of ONNX ProtoBuf, help to create and manipulate an ONNX model and use FINN functions to work with it.
%% Cell type:markdown id: tags:
## Outline
* #### How to create a simple model
* #### How to create a simple ONNX model
* #### How to manipulate an ONNX model
%% Cell type:markdown id: tags:
### How to create a simple model
### How to create a simple ONNX model
To explain how to create an ONNX model a simple example with mathematical operations is used. All nodes are from the [standard operations library of ONNX](https://github.com/onnx/onnx/blob/master/docs/Operators.md).
First ONNX is imported, then the helper function can be used to make a node.
%% Cell type:code id: tags:
``` python
import onnx
Add1_node = onnx.helper.make_node(
'Add',
inputs=['in1', 'in2'],
outputs=['sum1'],
name='Add1'
)
```
%% Cell type:markdown id: tags:
The first attribute of the node is the operation type. In this case it is `'Add'`, so it is an adder node. Then the names of the input and output tensors are passed to the node, and finally the node itself is given a name.
For this example we want two other adder nodes, one abs node and the output shall be rounded so one round node is needed.
%% Cell type:code id: tags:
``` python
Add2_node = onnx.helper.make_node(
'Add',
inputs=['sum1', 'in3'],
outputs=['sum2'],
name='Add2',
)
Add3_node = onnx.helper.make_node(
'Add',
inputs=['abs1', 'abs1'],
outputs=['sum3'],
name='Add3',
)
Abs_node = onnx.helper.make_node(
'Abs',
inputs=['sum2'],
outputs=['abs1'],
name='Abs'
)
Round_node = onnx.helper.make_node(
'Round',
inputs=['sum3'],
outputs=['out1'],
name='Round',
)
```
%% Cell type:markdown id: tags:
The names of the inputs and outputs of the nodes give already an idea of the structure of the resulting graph. In order to integrate the nodes into a graph environment, the inputs and outputs of the graph have to be specified first. In ONNX all data edges are processed as tensors. So with the helper function tensor value infos are created for the input and output tensors of the graph. Float from ONNX is used as data type.
The names of the inputs and outputs of the nodes already give an idea of the structure of the resulting graph. In order to integrate the nodes into a graph environment, the inputs and outputs of the graph have to be specified first. In ONNX all data edges are processed as tensors. So tensor value infos are created for the input and output tensors of the graph with the ONNX helper function. Float from ONNX is used as data type.
%% Cell type:code id: tags:
``` python
in1 = onnx.helper.make_tensor_value_info("in1", onnx.TensorProto.FLOAT, [4, 4])
in2 = onnx.helper.make_tensor_value_info("in2", onnx.TensorProto.FLOAT, [4, 4])
in3 = onnx.helper.make_tensor_value_info("in3", onnx.TensorProto.FLOAT, [4, 4])
out1 = onnx.helper.make_tensor_value_info("out1", onnx.TensorProto.FLOAT, [4, 4])
```
%% Cell type:markdown id: tags:
Now the graph can be built. First all nodes are passed. Note that the order matters: the nodes must be listed so that every node comes after the nodes it depends on. This means Add2 must not be listed before Add1, because Add2 depends on the result of Add1. A name is then assigned to the graph. This is followed by the inputs and outputs.
`value_info` of the graph contains the remaining tensors within the graph. When creating the nodes we have already defined names for the inner data edges and now these are assigned tensors of the datatype float and a certain shape.
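The ordering requirement can be checked mechanically. The following is a minimal sketch in plain Python, not part of ONNX or FINN: the nodes are represented as hypothetical `(name, inputs, outputs)` tuples standing in for the real node objects, and `is_topologically_sorted` is an illustrative helper name. In a real model, initializers would also count as available tensors.

```python
def is_topologically_sorted(nodes, graph_inputs):
    """Return True if every node input is available before the node runs.

    `nodes` is a list of (name, inputs, outputs) tuples, a plain-Python
    stand-in for ONNX node objects (an assumption of this sketch).
    """
    available = set(graph_inputs)
    for _name, inputs, outputs in nodes:
        # A node may only read graph inputs or outputs of earlier nodes.
        if any(tensor not in available for tensor in inputs):
            return False
        available.update(outputs)
    return True


nodes = [
    ("Add1", ["in1", "in2"], ["sum1"]),
    ("Add2", ["sum1", "in3"], ["sum2"]),
    ("Abs", ["sum2"], ["abs1"]),
    ("Add3", ["abs1", "abs1"], ["sum3"]),
    ("Round", ["sum3"], ["out1"]),
]
graph_inputs = ["in1", "in2", "in3"]

ok = is_topologically_sorted(nodes, graph_inputs)
# Swapping Add1 and Add2 breaks the ordering, because Add2 needs sum1.
swapped = [nodes[1], nodes[0]] + nodes[2:]
bad = is_topologically_sorted(swapped, graph_inputs)
print(ok, bad)
```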
%% Cell type:code id: tags:
``` python
graph = onnx.helper.make_graph(
nodes=[
Add1_node,
Add2_node,
Abs_node,
Add3_node,
Round_node,
],
name="simple_graph",
inputs=[in1, in2, in3],
outputs=[out1],
value_info=[
onnx.helper.make_tensor_value_info("sum1", onnx.TensorProto.FLOAT, [4, 4]),
onnx.helper.make_tensor_value_info("sum2", onnx.TensorProto.FLOAT, [4, 4]),
onnx.helper.make_tensor_value_info("abs1", onnx.TensorProto.FLOAT, [4, 4]),
onnx.helper.make_tensor_value_info("sum3", onnx.TensorProto.FLOAT, [4, 4]),
],
)
```
%% Cell type:markdown id: tags:
**Important**: In our example, the shape of the tensors does not change during the calculation. This is not always the case. So you have to make sure that you specify the shape correctly.
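One common way for shapes to differ between inputs and outputs is broadcasting: in recent ONNX opsets, `Add` follows NumPy-style broadcasting rules. The sketch below computes the resulting shape in plain Python; `broadcast_shape` is a hypothetical helper written for illustration, not an ONNX or FINN function.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Compute the NumPy-style broadcast shape of two shapes."""
    result = []
    # Align the shapes from the trailing dimension; missing dims count as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} do not broadcast")
        result.append(max(a, b))
    return list(reversed(result))


same = broadcast_shape([4, 4], [4, 4])   # shape is unchanged, as in the example graph
row = broadcast_shape([4, 4], [4])       # a vector is added to every row
outer = broadcast_shape([4, 1], [1, 3])  # the output is larger than either input
print(same, row, outer)
```

In the last case the entry in `value_info` would have to use the shape `[4, 3]`, which matches neither input.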
Now a model can be created from the graph and saved using the `.save` function. The model is saved in .onnx format and can be reloaded with `onnx.load()`. This also means that you can easily share your own model in .onnx format with others.
%% Cell type:code id: tags:
``` python
onnx_model = onnx.helper.make_model(graph, producer_name="simple-model")
onnx.save(onnx_model, 'simple_model.onnx')
onnx.save(onnx_model, '/tmp/simple_model.onnx')
```
%% Cell type:markdown id: tags:
To visualize the created model, [netron](https://github.com/lutzroeder/netron) can be used. Netron is a visualizer for neural network, deep learning and machine learning models.
To visualize the created model, [netron](https://github.com/lutzroeder/netron) can be used. Netron is a visualizer for neural network, deep learning and machine learning models. FINN provides a utility function for visualization with netron, which we import and use in the following.
%% Cell type:code id: tags:
``` python
from finn.util.visualization import showInNetron
```
%% Cell type:code id: tags:
``` python
showInNetron('simple_model.onnx')
showInNetron('/tmp/simple_model.onnx')
```
%% Output
Serving 'simple_model.onnx' at http://0.0.0.0:8081
Serving '/tmp/simple_model.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7fb9303c7b38>
<IPython.lib.display.IFrame at 0x7fcdfc956b70>
%% Cell type:markdown id: tags:
Netron also allows you to interactively explore the model. If you click on a node, the node attributes will be displayed.
In order to test the resulting model, a function is first written in Python that calculates the expected output. Because numpy arrays are to be used, numpy is imported first.
%% Cell type:code id: tags:
``` python
import numpy as np

def expected_output(in1, in2, in3):
    sum1 = np.add(in1, in2)
    sum2 = np.add(sum1, in3)
    abs1 = np.absolute(sum2)
    sum3 = np.add(abs1, abs1)
    return np.round(sum3)
```
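One detail the reference function relies on: `np.round` rounds halfway cases to the nearest even value, and the ONNX `Round` operator is specified the same way. Python's built-in `round` follows the same rule, so the behaviour can be illustrated without NumPy. The helper `round_half_away` below is a hypothetical contrast case, not something used in the notebook.

```python
import math


def round_half_away(value):
    """Round halfway cases away from zero (shown only for contrast)."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


halfway_values = [0.5, 1.5, 2.5, -0.5, -1.5]
half_to_even = [round(v) for v in halfway_values]
half_away = [round_half_away(v) for v in halfway_values]
print(half_to_even)  # halfway cases go to the nearest even integer
print(half_away)     # halfway cases go away from zero
```

A reference that rounded half away from zero would disagree with the model whenever `sum3` lands exactly on a halfway value.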
%% Cell type:markdown id: tags:
Then the values for the three inputs are calculated. Random numbers are used.
%% Cell type:code id: tags:
``` python
in1_values = np.asarray(np.random.uniform(low=-5, high=5, size=(4,4)), dtype=np.float32)
in2_values = np.asarray(np.random.uniform(low=-5, high=5, size=(4,4)), dtype=np.float32)
in3_values = np.asarray(np.random.uniform(low=-5, high=5, size=(4,4)), dtype=np.float32)
```
%% Cell type:markdown id: tags:
We can easily pass the values to the function we just wrote to calculate the expected result. For the created model the inputs must be summarized in a dictionary, which is then passed on to the model.
%% Cell type:code id: tags:
``` python
input_dict = {}
input_dict["in1"] = in1_values
input_dict["in2"] = in2_values
input_dict["in3"] = in3_values
```
%% Cell type:markdown id: tags:
To run the model and calculate the output, [onnxruntime](https://github.com/microsoft/onnxruntime) can be used. ONNX Runtime is a performance-focused complete scoring engine for Open Neural Network Exchange (ONNX) models from Microsoft. The `.InferenceSession` function is used to create a session of the model and `.run` is used to execute the model.
To run the model and calculate the output, [onnxruntime](https://github.com/microsoft/onnxruntime) can be used. ONNX Runtime is a performance-focused complete scoring engine for ONNX models from Microsoft. The `.InferenceSession` function is used to create a session of the model and `.run` is used to execute the model.
%% Cell type:code id: tags:
``` python
import onnxruntime as rt
sess = rt.InferenceSession(onnx_model.SerializeToString())
output = sess.run(None, input_dict)
```
%% Cell type:markdown id: tags:
The input values are also transferred to the reference function. Now the output of the execution of the model can be compared with that of the reference.
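An exact comparison is workable here because the final `Round` node yields integer-valued results. For models with non-rounded floating-point outputs, a tolerance-based comparison is usually more robust. The following plain-Python sketch illustrates the difference; `all_close` is a hypothetical helper that assumes two equally shaped 2-D lists.

```python
import math


def all_close(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Element-wise tolerance comparison of two equally shaped 2-D lists."""
    return all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for row_a, row_e in zip(actual, expected)
        for a, e in zip(row_a, row_e)
    )


expected = [[0.3, 1.0], [2.0, 3.0]]
actual = [[0.1 + 0.2, 1.0], [2.0, 3.0]]  # 0.1 + 0.2 is not exactly 0.3

exact_match = actual == expected
close_match = all_close(actual, expected)
print(exact_match, close_match)
```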
%% Cell type:code id: tags:
``` python
ref_output = expected_output(in1_values, in2_values, in3_values)
print("The output of the ONNX model is: \n{}".format(output[0]))
print("\nThe output of the reference function is: \n{}".format(ref_output))
if (output[0] == ref_output).all():
    print("\nThe results are the same!")
else:
    raise Exception("Something went wrong, the output of the model doesn't match the expected output!")
```
%% Output
The output of the ONNX model is:
[[ 1. 16. 3. 10.]
[ 5. 17. 17. 13.]
[ 3. 11. 10. 17.]
[ 9. 2. 4. 8.]]
[[22. 13. 21. 8.]
[ 0. 8. 11. 1.]
[ 3. 12. 8. 2.]
[ 0. 6. 1. 4.]]
The output of the reference function is:
[[ 1. 16. 3. 10.]
[ 5. 17. 17. 13.]
[ 3. 11. 10. 17.]
[ 9. 2. 4. 8.]]
[[22. 13. 21. 8.]
[ 0. 8. 11. 1.]
[ 3. 12. 8. 2.]
[ 0. 6. 1. 4.]]
The results are the same!
%% Cell type:markdown id: tags:
Now that we have verified that the model works as we expected it to, we can continue working with the graph.
%% Cell type:markdown id: tags:
### How to manipulate an ONNX model
In the model there are two successive adder nodes. An adder node in ONNX can only add two inputs, but there is also the [**sum**](https://github.com/onnx/onnx/blob/master/docs/Operators.md#Sum) node, which can process more than two inputs. So it would be a reasonable change of the graph to combine the two successive adder nodes to one sum node.
%% Cell type:markdown id: tags:
In the following we assume that we do not know the appearance of the model, so we first try to identify whether there are two consecutive adders in the graph and then convert them into a sum node.
Here we make use of FINN. FINN provides a thin wrapper around the model which provides several additional helper functions to manipulate the graph. The code can be found [here](https://github.com/Xilinx/finn/blob/master/src/finn/core/modelwrapper.py) and you can find a more detailed description in the notebook [ModelWrapper](2_modelwrapper.ipynb).
Here we make use of FINN. FINN provides a thin wrapper around the model which provides several additional helper functions to manipulate the graph. The code can be found [here](https://github.com/Xilinx/finn/blob/master/src/finn/core/modelwrapper.py).
%% Cell type:code id: tags:
``` python
from finn.core.modelwrapper import ModelWrapper
finn_model = ModelWrapper(onnx_model)
```
%% Cell type:markdown id: tags:
As explained in the previous section, it is important that the nodes are listed in the correct order. If a new node has to be inserted or an old node has to be replaced, it is important to do that in the appropriate position. The following function serves this purpose. It returns a dictionary, which contains the node name as key and the respective node index as value.
%% Cell type:code id: tags:
``` python
def get_node_id(model):
    node_index = {}
    node_ind = 0
    for node in model.graph.node:
        node_index[node.name] = node_ind
        node_ind += 1
    return node_index
```
%% Cell type:markdown id: tags:
The function scans the list of nodes and stores a run index (`node_ind`) as node index in the dictionary for every node name.
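The same mapping can be written more compactly with `enumerate`. This is a minimal sketch using `SimpleNamespace` objects as stand-ins for graph nodes; `get_node_index` is an illustrative name, not a FINN function.

```python
from types import SimpleNamespace


def get_node_index(nodes):
    """Map each node name to its position in the node list."""
    return {node.name: index for index, node in enumerate(nodes)}


# Stand-in objects with a `name` attribute, mimicking graph nodes.
nodes = [SimpleNamespace(name=n) for n in ["Add1", "Add2", "Abs", "Add3", "Round"]]
node_index = get_node_index(nodes)
print(node_index)
```

Both variants assume that every node has a unique, non-empty name. Node names are optional in ONNX, so unnamed nodes would all collide under the empty-string key.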
Another helper function is being implemented that searches for adder nodes in the graph and returns the found nodes. This is needed to determine if and which adder nodes are in the given model.
%% Cell type:code id: tags:
``` python
def identify_adder_nodes(model):
    add_nodes = []
    for node in model.graph.node:
        if node.op_type == "Add":
            add_nodes.append(node)
    return add_nodes
```
%% Cell type:markdown id: tags:
The function iterates over all nodes of the model and if the operation type is `"Add"` the node will be stored in `add_nodes`. At the end `add_nodes` is returned.
If we apply this to our model, three nodes should be returned.
%% Cell type:code id: tags:
``` python
add_nodes = identify_adder_nodes(finn_model)
for node in add_nodes:
    print("Found adder node: {}".format(node.name))
```
%% Output
Found adder node: Add1
Found adder node: Add2
Found adder node: Add3
%% Cell type:markdown id: tags:
Among other helper functions, `ModelWrapper` offers two functions that can help to determine the preceding and succeeding node of a node. However, these functions do not take a node as input; instead they determine the consumer or producer of a tensor. We write two functions that use these helper functions to determine the previous and the next node of a node.
%% Cell type:code id: tags:
``` python
def find_predecessor(model, node):
    predecessors = []
    for i in range(len(node.input)):
        producer = model.find_producer(node.input[i])
        predecessors.append(producer)
    return predecessors


def find_successor(model, node):
    successors = []
    for i in range(len(node.output)):
        consumer = model.find_consumer(node.output[i])
        successors.append(consumer)
    return successors
```
%% Cell type:markdown id: tags:
The first function uses `find_producer` from `ModelWrapper` to create a list of the producers of the inputs of the given node. So the returned list is indirectly filled with the predecessors of the node. The second function works in a similar way, `find_consumer` from `ModelWrapper` is used to find the consumers of the output tensors of the node and so a list with the successors can be created.
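To make the tensor-centric view concrete, the sketch below builds producer and consumer lookups for the example graph in plain Python. It illustrates the idea only and is not FINN's implementation: the nodes are `SimpleNamespace` stand-ins, and `build_tensor_maps` and `make_node` are hypothetical helper names. In this sketch a tensor without a producer (a graph input) maps to `None`, and a tensor without consumers (a graph output) maps to an empty list.

```python
from types import SimpleNamespace


def make_node(name, op_type, inputs, outputs):
    """Create a stand-in node with the attributes used in this notebook."""
    return SimpleNamespace(name=name, op_type=op_type, input=inputs, output=outputs)


def build_tensor_maps(nodes):
    """Build tensor-name lookups for the producer and the consumers."""
    producer = {}
    consumers = {}
    for node in nodes:
        for tensor in node.output:
            producer[tensor] = node
        # dict.fromkeys removes duplicates (Add3 reads abs1 twice).
        for tensor in dict.fromkeys(node.input):
            consumers.setdefault(tensor, []).append(node)
    return producer, consumers


nodes = [
    make_node("Add1", "Add", ["in1", "in2"], ["sum1"]),
    make_node("Add2", "Add", ["sum1", "in3"], ["sum2"]),
    make_node("Abs", "Abs", ["sum2"], ["abs1"]),
    make_node("Add3", "Add", ["abs1", "abs1"], ["sum3"]),
    make_node("Round", "Round", ["sum3"], ["out1"]),
]
producer, consumers = build_tensor_maps(nodes)

# Predecessors of Add2: Add1 produces sum1, in3 is a graph input.
add2_predecessors = [producer.get(t) for t in nodes[1].input]
predecessor_names = [p.name if p is not None else None for p in add2_predecessors]
abs1_consumer_names = [n.name for n in consumers.get("abs1", [])]
out1_consumers = consumers.get("out1", [])
print(predecessor_names, abs1_consumer_names, out1_consumers)
```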
%% Cell type:code id: tags:
``` python
def adder_pair(model, node):
    adder_pairs = []
    node_pair = []
    successor_list = find_successor(model, node)
    for successor in successor_list:
        # find_consumer returns None for tensors without a consumer
        if successor is not None and successor.op_type == "Add":
            node_pair.append(node)
            node_pair.append(successor)
            adder_pairs.append(node_pair)
            node_pair = []
    return adder_pairs
```
%% Cell type:markdown id: tags:
The function gets a node and the model as input. Two empty lists are created: `adder_pairs` collects the adder node pairs that are returned as the result of the function, and `node_pair` is a temporary list for the pair currently being built. Then the function `find_successor` is used to return all of the successors of the node. If one of the successors is an adder node, the node is saved in `node_pair` together with the successive adder node and put in the list `adder_pairs`. Then the temporary list is cleared and can be filled with the next adder node pair. Since it is theoretically possible for an adder node to have more than one subsequent adder node, a list of lists is created. This list of the node with all its successive adder nodes is returned.
So now we can find out which adder node has an adder node as successor. Since the model is known, one adder pair (Add1+Add2) should be found when applying the function to the previously determined adder node list (`add_nodes`).
%% Cell type:code id: tags:
``` python
for node in add_nodes:
    add_pairs = adder_pair(finn_model, node)
    if len(add_pairs) != 0:
        for i in range(len(add_pairs)):
            substitute_pair = add_pairs[i]
            print("Found following pair that could be replaced by a sum node:")
            for node_pair in add_pairs:
                for node in node_pair:
                    print(node.name)
```
%% Output
Found following pair that could be replaced by a sum node:
Add1
Add2
%% Cell type:markdown id: tags:
Now that the pair to be replaced has been identified (`substitute_pair`), a sum node can be instantiated and inserted into the graph at the correct position.
First of all, the inputs must be determined. For this, the inputs of both adder nodes are used, except for the input that corresponds to the output of the other adder node.
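The selection rule can be sketched on plain lists of tensor names. The function name `fused_inputs` is hypothetical and the tensor names are those of the example graph. Note that this kind of fusion is only safe under an assumption that holds in the example: the intermediate tensor (`sum1`) is consumed only by the second adder and is not a graph output.

```python
def fused_inputs(first_inputs, first_outputs, second_inputs):
    """Collect the external inputs of two chained nodes.

    Tensors produced by the first node are internal to the pair and are
    therefore dropped from the second node's inputs.
    """
    internal = set(first_outputs)
    return list(first_inputs) + [t for t in second_inputs if t not in internal]


# Add1: (in1, in2) -> sum1, Add2: (sum1, in3) -> sum2
inputs = fused_inputs(["in1", "in2"], ["sum1"], ["sum1", "in3"])
print(inputs)
```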
%% Cell type:code id: tags:
``` python
input_list = []
for i in range(len(substitute_pair)):
    if i == 0:
        for j in range(len(substitute_pair[i].input)):
            if substitute_pair[i].input[j] != substitute_pair[i+1].output[0]:
                input_list.append(substitute_pair[i].input[j])
    else:
        for j in range(len(substitute_pair[i].input)):
            if substitute_pair[i].input[j] != substitute_pair[i-1].output[0]:
                input_list.append(substitute_pair[i].input[j])
print("The new node gets the following inputs: \n{}".format(input_list))
```
%% Output
The new node gets the following inputs:
['in1', 'in2', 'in3']
%% Cell type:markdown id: tags:
The output of the sum node matches the output of the second adder node and can therefore be taken over directly.
%% Cell type:code id: tags:
``` python
sum_output = substitute_pair[1].output[0]
```
%% Cell type:markdown id: tags:
The sum node can be created with this information.
%% Cell type:code id: tags:
``` python
Sum_node = onnx.helper.make_node(
'Sum',
inputs=input_list,
outputs=[sum_output],
name="Sum"
)
```
%% Cell type:markdown id: tags:
The node can now be inserted into the graph and the old nodes are removed.
%% Cell type:code id: tags:
``` python
node_ids = get_node_id(finn_model)
node_ind = node_ids[substitute_pair[0].name]
graph.node.insert(node_ind, Sum_node)
for node in substitute_pair:
    graph.node.remove(node)
```
%% Cell type:markdown id: tags:
To insert the node in the right place, the index of the first node of `substitute_pair` is used as the node index for the sum node, which is embedded into the graph using `.insert`. Then the two elements in `substitute_pair` are deleted using `.remove`. `.insert` and `.remove` are methods of the repeated `node` field of the ONNX graph, which is a ProtoBuf message.
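The insert-then-remove pattern can be tried out on an ordinary Python list. This is a minimal sketch in which node names stand in for the node objects; `replace_nodes` is a hypothetical helper, and it assumes that placing the new node at the position of the first removed node keeps the list in a valid order (true for the Add1/Add2 pair).

```python
def replace_nodes(node_list, old_names, new_node):
    """Return a new list with `old_names` replaced by a single `new_node`.

    The new node takes the position of the first removed node.
    """
    positions = [i for i, name in enumerate(node_list) if name in old_names]
    if not positions:
        raise ValueError("none of the nodes to replace were found")
    insert_at = positions[0]
    kept = [name for name in node_list if name not in old_names]
    kept.insert(insert_at, new_node)
    return kept


original = ["Add1", "Add2", "Abs", "Add3", "Round"]
updated = replace_nodes(original, {"Add1", "Add2"}, "Sum")
print(updated)
```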
%% Cell type:markdown id: tags:
The new graph is saved as ONNX model and can be visualized with Netron.
%% Cell type:code id: tags:
``` python
onnx_model1 = onnx.helper.make_model(graph, producer_name="simple-model1")
onnx.save(onnx_model1, 'simple_model1.onnx')
onnx.save(onnx_model1, '/tmp/simple_model1.onnx')
```
%% Cell type:code id: tags:
``` python
showInNetron('simple_model1.onnx')
showInNetron('/tmp/simple_model1.onnx')
```
%% Output
Stopping http://0.0.0.0:8081
Serving 'simple_model1.onnx' at http://0.0.0.0:8081
Serving '/tmp/simple_model1.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7fb93018f9e8>
<IPython.lib.display.IFrame at 0x7fcdfc130cc0>
%% Cell type:markdown id: tags:
Through the visualization it can already be seen that the insertion was successful, but it is still to be checked whether the result remains the same. Therefore the result of the reference function written in the previous section is used and the new model with the input values is simulated. At this point onnxruntime can be used again. The simulation is analogous to the one of the first model in the previous section.
%% Cell type:code id: tags:
``` python
sess = rt.InferenceSession(onnx_model1.SerializeToString())
output = sess.run(None, input_dict)
```
%% Cell type:code id: tags:
``` python
print("The output of the manipulated ONNX model is: \n{}".format(output[0]))
print("\nThe output of the reference function is: \n{}".format(ref_output))
if (output[0] == ref_output).all():
    print("\nThe results are the same!")
else:
    raise Exception("Something went wrong, the output of the model doesn't match the expected output!")
```
%% Output
The output of the manipulated ONNX model is:
[[ 1. 16. 3. 10.]
[ 5. 17. 17. 13.]
[ 3. 11. 10. 17.]
[ 9. 2. 4. 8.]]
[[22. 13. 21. 8.]
[ 0. 8. 11. 1.]
[ 3. 12. 8. 2.]
[ 0. 6. 1. 4.]]
The output of the reference function is:
[[ 1. 16. 3. 10.]
[ 5. 17. 17. 13.]
[ 3. 11. 10. 17.]
[ 9. 2. 4. 8.]]
[[22. 13. 21. 8.]
[ 0. 8. 11. 1.]
[ 3. 12. 8. 2.]
[ 0. 6. 1. 4.]]
The results are the same!
......
%% Cell type:markdown id: tags:
# Importing Brevitas networks into FINN
In this notebook we'll go through an example of how to import a Brevitas-trained QNN into FINN. The steps will be as follows:
1. Load up the trained PyTorch model
2. Call Brevitas FINN-ONNX export and visualize with Netron
3. Import into FINN and call cleanup transformations
We'll use the following showSrc function to print the source code for function calls in the Jupyter notebook:
We'll use the following utility functions to print the source code for function calls (`showSrc()`) and to visualize a network using netron (`showInNetron()`) in the Jupyter notebook:
%% Cell type:code id: tags:
``` python
import onnx
from finn.util.visualization import showSrc, showInNetron
```
%% Cell type:markdown id: tags:
## 1. Load up the trained PyTorch model
The FINN Docker image comes with several [example Brevitas networks](https://github.com/maltanar/brevitas_cnv_lfc), and we'll use the LFC-w1a1 model as the example network here. This is a binarized fully connected network trained on the MNIST dataset. Let's start by looking at what the PyTorch network definition looks like:
%% Cell type:code id: tags:
``` python
from models.LFC import LFC
showSrc(LFC)
```
%% Output
class LFC(Module):

    def __init__(self, num_classes=10, weight_bit_width=None, act_bit_width=None,
                 in_bit_width=None, in_ch=1, in_features=(28, 28), device="cpu"):
        super(LFC, self).__init__()
        self.device = device
        weight_quant_type = get_quant_type(weight_bit_width)
        act_quant_type = get_quant_type(act_bit_width)
        in_quant_type = get_quant_type(in_bit_width)
        stats_op = get_stats_op(weight_quant_type)
        self.features = ModuleList()
        self.features.append(get_act_quant(in_bit_width, in_quant_type))
        self.features.append(Dropout(p=IN_DROPOUT))
        in_features = reduce(mul, in_features)
        for out_features in FC_OUT_FEATURES:
            self.features.append(get_quant_linear(in_features=in_features,
                                                  out_features=out_features,
                                                  per_out_ch_scaling=INTERMEDIATE_FC_PER_OUT_CH_SCALING,
                                                  bit_width=weight_bit_width,
                                                  quant_type=weight_quant_type,
                                                  stats_op=stats_op))
            in_features = out_features
            self.features.append(BatchNorm1d(num_features=in_features))
            self.features.append(get_act_quant(act_bit_width, act_quant_type))
            self.features.append(Dropout(p=HIDDEN_DROPOUT))
        self.features.append(get_quant_linear(in_features=in_features,
                                              out_features=num_classes,
                                              per_out_ch_scaling=LAST_FC_PER_OUT_CH_SCALING,
                                              bit_width=weight_bit_width,
                                              quant_type=weight_quant_type,
                                              stats_op=stats_op))
        self.features.append(BatchNorm1d(num_features=num_classes))
        for m in self.modules():
            if isinstance(m, QuantLinear):
                torch.nn.init.uniform_(m.weight.data, -1, 1)

    def clip_weights(self, min_val, max_val):
        for mod in self.features:
            if isinstance(mod, QuantLinear):
                mod.weight.data.clamp_(min_val, max_val)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = 2.0 * x - torch.tensor([1.0]).to(self.device)
        for mod in self.features:
            x = mod(x)
        return x
%% Cell type:markdown id: tags:
We can see that the network topology is constructed using a few helper functions that generate the quantized linear layers and quantized activations. The bitwidth of the layers is actually parametrized in the constructor, so let's instantiate a 1-bit weights and activations version of this network. We also have pretrained weights for this network, which we will load into the model.
%% Cell type:code id: tags:
``` python
import torch
trained_lfc_w1a1_checkpoint = "/workspace/brevitas_cnv_lfc/pretrained_models/LFC_1W1A/checkpoints/best.tar"
lfc = LFC(weight_bit_width=1, act_bit_width=1, in_bit_width=1).eval()
checkpoint = torch.load(trained_lfc_w1a1_checkpoint, map_location="cpu")
lfc.load_state_dict(checkpoint["state_dict"])
lfc
```
%% Output
LFC(
(features): ModuleList(
(0): QuantHardTanh(
(act_quant_proxy): ActivationQuantProxy(
(fused_activation_quant_proxy): FusedActivationQuantProxy(
(activation_impl): Identity()
(tensor_quant): ClampedBinaryQuant(
(scaling_impl): StandaloneScaling(
(restrict_value): RestrictValue(
(forward_impl): Sequential(
(0): PowerOfTwo()
(1): ClampMin()
)
)
)
)
)
)
)
(1): Dropout(p=0.2)
(2): QuantLinear(
in_features=784, out_features=1024, bias=False
(weight_reg): WeightReg()
(weight_quant): WeightQuantProxy(
(tensor_quant): BinaryQuant(
(scaling_impl): StandaloneScaling(
(restrict_value): RestrictValue(
(forward_impl): Sequential(
(0): PowerOfTwo()
(1): Identity()
)
)
)
)
)
(bias_quant): BiasQuantProxy()
)
(3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(4): QuantHardTanh(
(act_quant_proxy): ActivationQuantProxy(
(fused_activation_quant_proxy): FusedActivationQuantProxy(
(activation_impl): Identity()
(tensor_quant): ClampedBinaryQuant(
(scaling_impl): StandaloneScaling(
(restrict_value): RestrictValue(
(forward_impl): Sequential(
(0): PowerOfTwo()
(1): ClampMin()
)
)
)
)
)
)
)
(5): Dropout(p=0.2)
(6): QuantLinear(
in_features=1024, out_features=1024, bias=False
(weight_reg): WeightReg()
(weight_quant): WeightQuantProxy(
(tensor_quant): BinaryQuant(
(scaling_impl): StandaloneScaling(
(restrict_value): RestrictValue(
(forward_impl): Sequential(
(0): PowerOfTwo()
(1): Identity()
)
)
)
)
)
(bias_quant): BiasQuantProxy()
)
(7): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): QuantHardTanh(
(act_quant_proxy): ActivationQuantProxy(
(fused_activation_quant_proxy): FusedActivationQuantProxy(
(activation_impl): Identity()
(tensor_quant): ClampedBinaryQuant(
(scaling_impl): StandaloneScaling(
(restrict_value): RestrictValue(
(forward_impl): Sequential(
(0): PowerOfTwo()
(1): ClampMin()
)
)
)
)
)
)
)
(9): Dropout(p=0.2)
(10): QuantLinear(
in_features=1024, out_features=1024, bias=False
(weight_reg): WeightReg()
(weight_quant): WeightQuantProxy(
(tensor_quant): BinaryQuant(
(scaling_impl): StandaloneScaling(
(restrict_value): RestrictValue(
(forward_impl): Sequential(
(0): PowerOfTwo()
(1): Identity()
)
)
)
)
)
(bias_quant): BiasQuantProxy()
)
(11): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(12): QuantHardTanh(
(act_quant_proxy): ActivationQuantProxy(
(fused_activation_quant_proxy): FusedActivationQuantProxy(
(activation_impl): Identity()
(tensor_quant): ClampedBinaryQuant(
(scaling_impl): StandaloneScaling(
(restrict_value): RestrictValue(
(forward_impl): Sequential(
(0): PowerOfTwo()
(1): ClampMin()
)
)
)
)
)
)
)
(13): Dropout(p=0.2)
(14): QuantLinear(
in_features=1024, out_features=10, bias=False
(weight_reg): WeightReg()
(weight_quant): WeightQuantProxy(
(tensor_quant): BinaryQuant(
(scaling_impl): StandaloneScaling(
(restrict_value): RestrictValue(
(forward_impl): Sequential(
(0): PowerOfTwo()
(1): Identity()
)
)
)
)
)
(bias_quant): BiasQuantProxy()
)
(15): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
%% Cell type:markdown id: tags:
We have now instantiated our trained PyTorch network. Let's try to run an example MNIST image through the network using PyTorch.
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
from pkgutil import get_data
import onnx
import onnx.numpy_helper as nph
raw_i = get_data("finn", "data/onnx/mnist-conv/test_data_set_0/input_0.pb")
input_tensor = onnx.load_tensor_from_string(raw_i)
input_tensor_npy = nph.to_array(input_tensor)
input_tensor_pyt = torch.from_numpy(input_tensor_npy).float()
imgplot = plt.imshow(input_tensor_npy.reshape(28,28), cmap='gray')
```
%% Output
%% Cell type:code id: tags:
``` python
from torch.nn.functional import softmax
# do forward pass in PyTorch/Brevitas
produced = lfc.forward(input_tensor_pyt).detach()
probabilities = softmax(produced, dim=-1).flatten()
probabilities
```
%% Output
tensor([0.0602, 0.0147, 0.5844, 0.0445, 0.0270, 0.0185, 0.0595, 0.0082, 0.1689,
0.0141])
%% Cell type:code id: tags:
``` python
import numpy as np
objects = [str(x) for x in range(10)]
y_pos = np.arange(len(objects))
plt.bar(y_pos, probabilities, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Predicted Probability')
plt.title('LFC-w1a1 Predictions for Image')
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
## 2. Call Brevitas FINN-ONNX export and visualize with Netron
Brevitas comes with built-in FINN-ONNX export functionality. This is similar to the regular ONNX export capabilities of PyTorch, with a few differences:
1. The weight quantization logic is not exported as part of the graph; rather, the quantized weights themselves are exported.
2. Special quantization annotations are used to preserve the low-bit quantization information. ONNX (at the time of writing) supports 8-bit quantization as the minimum bitwidth, whereas FINN-ONNX quantization annotations can go down to binary/bipolar quantization.
3. Low-bit quantized activation functions are exported as MultiThreshold operators.
It's actually quite straightforward to export ONNX from our Brevitas model as follows:
%% Cell type:code id: tags:
``` python
import brevitas.onnx as bo
export_onnx_path = "/tmp/LFCW1A1.onnx"
input_shape = (1, 1, 28, 28)
bo.export_finn_onnx(lfc, input_shape, export_onnx_path)
```
%% Output
/workspace/brevitas_cnv_lfc/training_scripts/models/LFC.py:85: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
x = 2.0 * x - torch.tensor([1.0]).to(self.device)
%% Cell type:markdown id: tags:
Let's examine what the exported ONNX model looks like. For this, we will use the Netron visualizer:
%% Cell type:code id: tags:
``` python
showInNetron('/tmp/LFCW1A1.onnx')
```
%% Output
Serving '/tmp/LFCW1A1.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f86cdb6e5f8>
<IPython.lib.display.IFrame at 0x7f3d330b6ac8>
%% Cell type:markdown id: tags:
When running this notebook in the FINN Docker container, you should be able to see an interactive visualization of the imported network above, and click on individual nodes to inspect their parameters. If you look at any of the MatMul nodes, you should be able to see that the weights are all {-1, +1} values, and the activations are Sign functions.
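To illustrate what bipolar values mean numerically, here is a small plain-Python sketch of sign-based binarization and a dot product over {-1, +1} values. It is an illustration only, not Brevitas or FINN code: the helper names are hypothetical, and mapping zero to +1 is an assumption of this sketch.

```python
def bipolar_sign(value):
    """Map a real value to {-1, +1}; zero maps to +1 in this sketch."""
    return 1.0 if value >= 0 else -1.0


def bipolar_dot(activations, weights):
    """Dot product of two bipolar vectors of equal length."""
    return sum(a * w for a, w in zip(activations, weights))


raw_activations = [0.7, -0.2, 0.0, -1.3]
activations = [bipolar_sign(v) for v in raw_activations]
weights = [1.0, 1.0, -1.0, -1.0]

dot = bipolar_dot(activations, weights)
# For bipolar vectors the dot product equals 2 * (number of matches) - length.
matches = sum(1 for a, w in zip(activations, weights) if a == w)
print(activations, dot, 2 * matches - len(weights))
```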
%% Cell type:markdown id: tags:
## 3. Import into FINN and call cleanup transformations
We will now import this ONNX model into FINN using the ModelWrapper, and examine some of the graph attributes from Python.
%% Cell type:code id: tags:
``` python
from finn.core.modelwrapper import ModelWrapper
model = ModelWrapper(export_onnx_path)
model.graph.node[9]
```
%% Output
input: "37"
input: "38"
output: "40"
op_type: "MatMul"
%% Cell type:markdown id: tags:
The ModelWrapper exposes a range of other useful functions as well. For instance, by convention the second input of the MatMul node will be a pre-initialized weight tensor, which we can view using the following:
%% Cell type:code id: tags:
``` python
model.get_initializer(model.graph.node[9].input[1])
```
%% Output
array([[-1., -1., -1., ..., -1., -1., 1.],
[-1., 1., -1., ..., 1., -1., -1.],
[ 1., -1., 1., ..., -1., -1., -1.],
...,
[ 1., 1., -1., ..., 1., 1., 1.],
[-1., -1., 1., ..., 1., 1., -1.],
[ 1., 1., -1., ..., 1., -1., -1.]], dtype=float32)
%% Cell type:markdown id: tags:
We can also examine the quantization annotations and shapes of various tensors using the convenience functions provided by ModelWrapper.
%% Cell type:code id: tags:
``` python
model.get_tensor_datatype(model.graph.node[9].input[1])
```
%% Output
<DataType.BIPOLAR: 8>
%% Cell type:code id: tags:
``` python
model.get_tensor_shape(model.graph.node[9].input[1])
```
%% Output
[784, 1024]
%% Cell type:markdown id: tags:
If we want to operate further on this model in FINN, it is a good idea to execute certain "cleanup" transformations on this graph. Here, we will run shape inference and constant folding on this graph, and visualize the resulting graph in Netron again.
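As a rough illustration of what constant folding does, the sketch below evaluates every node whose inputs are all known constants and keeps only the nodes that depend on runtime data. It is a toy model of the idea, not FINN's `FoldConstants` transformation: the graph is a list of hypothetical `(op, inputs, output)` tuples on scalars, assumed to be topologically sorted.

```python
def fold_constants(nodes, constants):
    """Evaluate constant-only nodes ahead of time.

    Returns the nodes that still need to run and the dictionary of known
    constant tensors (including the newly folded ones).
    """
    ops = {"Add": lambda a, b: a + b, "Mul": lambda a, b: a * b}
    known = dict(constants)
    remaining = []
    for op, inputs, output in nodes:
        if all(name in known for name in inputs):
            # Every input is constant, so the result is constant as well.
            known[output] = ops[op](*(known[name] for name in inputs))
        else:
            remaining.append((op, inputs, output))
    return remaining, known


nodes = [
    ("Mul", ["two", "three"], "six"),  # constant-only: can be folded
    ("Add", ["x", "six"], "y"),        # depends on the runtime input x
]
remaining, known = fold_constants(nodes, {"two": 2.0, "three": 3.0})
print(remaining, known["six"])
```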
%% Cell type:code id: tags:
``` python
from finn.transformation.fold_constants import FoldConstants
from finn.transformation.infer_shapes import InferShapes
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
export_onnx_path_transformed = "/tmp/LFCW1A1-clean.onnx"
model.save(export_onnx_path_transformed)
```
%% Cell type:code id: tags:
``` python
showInNetron('/tmp/LFCW1A1-clean.onnx')
```
%% Output
Stopping http://0.0.0.0:8081
Serving '/tmp/LFCW1A1-clean.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f86cdb6ec18>
<IPython.lib.display.IFrame at 0x7f3d3380aef0>
%% Cell type:markdown id: tags:
We can see that the resulting graph has become smaller and simpler. Specifically, the input reshaping is now a single Reshape node instead of the Shape -> Gather -> Unsqueeze -> Concat -> Reshape sequence. We can now use the internal ONNX execution capabilities of FINN to ensure that we still get the same output from this model as we did with PyTorch.
%% Cell type:code id: tags:
``` python
import finn.core.onnx_exec as oxe
input_dict = {"0": nph.to_array(input_tensor)}
output_dict = oxe.execute_onnx(model, input_dict)
produced_finn = output_dict[list(output_dict.keys())[0]]
produced_finn
```
%% Output
array([[-1.5095654 , -2.915617 , 0.764004 , -1.8118242 , -2.308991 ,
-2.6900144 , -1.520713 , -3.4965858 , -0.47711682, -2.9628415 ]],
dtype=float32)
%% Cell type:code id: tags:
``` python
np.isclose(produced, produced_finn).all()
```
%% Output
True
%% Cell type:markdown id: tags:
We have successfully verified that the transformed and cleaned-up FINN graph still produces the same output, and can now use this model for further processing in FINN.
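As an additional sanity check, the raw FINN output printed above can be passed through a softmax to recover the class probabilities that PyTorch reported earlier. The sketch below uses only the standard library, and the logits are copied from the FINN output shown above; the `softmax` helper is written here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Values copied from the FINN execution output above.
finn_logits = [-1.5095654, -2.915617, 0.764004, -1.8118242, -2.308991,
               -2.6900144, -1.520713, -3.4965858, -0.47711682, -2.9628415]
probs = softmax(finn_logits)
predicted_digit = max(range(len(probs)), key=probs.__getitem__)
print(predicted_digit, round(probs[predicted_digit], 4))
```

The largest probability belongs to digit 2 and is close to the 0.5844 reported by the PyTorch forward pass.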
......