%% Cell type:markdown id: tags:
# FINN - Functional Verification of End-to-End Flow
-----------------------------------------------------------------
**Important: This notebook depends on the tfc_end2end_example notebook, because we are using models that are available at intermediate steps in the end-to-end flow. So please make sure the needed .onnx files are generated to run this notebook.**
In this notebook, we will show how to take the intermediate results of the end-to-end tfc example and verify their functionality with different methods. In the following picture you can see the section in the end-to-end flow about the *Simulation & Emulation Flows*. Besides the methods in this notebook, there is another one that is covered in the Jupyter notebook [tfc_end2end_example](tfc_end2end_example.ipynb): remote execution, which allows functional verification directly on the PYNQ board; for details, please have a look at the mentioned Jupyter notebook.
%% Cell type:markdown id: tags:
<img src="verification.png" alt="Drawing" style="width: 500px;"/>
%% Cell type:markdown id: tags:
We will use the following helper functions, `showSrc` to show source code of FINN library calls and `showInNetron` to show the ONNX model at the current transformation step. The Netron displays are interactive, but they only work when running the notebook actively and not on GitHub (i.e. if you are viewing this on GitHub you'll only see blank squares).
%% Cell type:code id: tags:
``` python
from finn.util.visualization import showSrc, showInNetron
build_dir = "/workspace/finn"
```
%% Cell type:markdown id: tags:
To verify the simulations, a "golden" output is calculated as a reference. This is calculated directly from the Brevitas model using PyTorch, by running some example data from the MNIST dataset through the trained model.
%% Cell type:code id: tags:
``` python
from pkgutil import get_data
import onnx
import onnx.numpy_helper as nph
import torch
from finn.util.test import get_test_model_trained
fc = get_test_model_trained("TFC", 1, 1)
raw_i = get_data("finn", "data/onnx/mnist-conv/test_data_set_0/input_0.pb")
input_tensor = onnx.load_tensor_from_string(raw_i)
input_brevitas = torch.from_numpy(nph.to_array(input_tensor)).float()
output_golden = fc.forward(input_brevitas).detach().numpy()
output_golden
```
%% Output
array([[-1.119972 , -1.7596636, 0.8423852, -1.0705007, -1.3218282,
-1.5030646, -1.4598225, -1.2803943, -1.0334575, -1.7878995]],
dtype=float32)
%% Cell type:markdown id: tags:
## Simulation using Python <a id='simpy'></a>
If an ONNX model consists of [standard ONNX](https://github.com/onnx/onnx/blob/master/docs/Operators.md) nodes and/or FINN custom operations that do not belong to the fpgadataflow backend (backend $\neq$ "fpgadataflow"), this model can be checked for functionality using Python.
To simulate a standard ONNX node, [onnxruntime](https://github.com/microsoft/onnxruntime) is used. onnxruntime is an open source tool developed by Microsoft to run standard ONNX nodes. For the FINN custom op nodes, execution functions are defined. The following is an example of the execution function of an XNOR popcount node.
%% Cell type:code id: tags:
``` python
from finn.custom_op.xnorpopcount import xnorpopcountmatmul
showSrc(xnorpopcountmatmul)
```
%% Output
def xnorpopcountmatmul(inp0, inp1):
    """Simulates XNOR-popcount matrix multiplication as a regular bipolar
    matrix multiplication followed by some post processing."""
    # extract the operand shapes
    # (M, K0) = inp0.shape
    # (K1, N) = inp1.shape
    K0 = inp0.shape[-1]
    K1 = inp1.shape[0]
    # make sure shapes are compatible with matmul
    assert K0 == K1, "Matrix shapes are not compatible with matmul."
    K = K0
    # convert binary inputs to bipolar
    inp0_bipolar = 2.0 * inp0 - 1.0
    inp1_bipolar = 2.0 * inp1 - 1.0
    # call regular numpy matrix multiplication
    out = np.matmul(inp0_bipolar, inp1_bipolar)
    # XNOR-popcount does not produce the regular dot product result --
    # it returns the number of +1s after XNOR. let P be the number of +1s
    # and N be the number of -1s. XNOR-popcount returns P, whereas the
    # regular dot product result from numpy is P-N, so we need to apply
    # some correction.
    # out = P-N
    # K = P+N
    # out + K = 2P, so P = (out + K)/2
    return (out + K) * 0.5
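%% Cell type:markdown id: tags:
To make the correction at the end of the function concrete, here is a small hand-checked example (the two binary matrices are made up for illustration): the bipolar versions of the vectors agree in 2 of their 4 positions, so XNOR-popcount should return P = 2. The regular bipolar dot product gives out = P - N = 0, which the function corrects to (out + K)/2 = (0 + 4)/2 = 2.
%% Cell type:code id: tags:
``` python
import numpy as np
from finn.custom_op.xnorpopcount import xnorpopcountmatmul

# toy binary (0/1) operands, chosen for illustration
a = np.array([[1.0, 0.0, 1.0, 1.0]])        # bipolar: [ 1, -1,  1,  1]
b = np.array([[1.0], [1.0], [0.0], [1.0]])  # bipolar: [ 1,  1, -1,  1]
# the bipolar vectors agree in 2 of 4 positions -> expect [[2.]]
print(xnorpopcountmatmul(a, b))
```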
%% Cell type:markdown id: tags:
The function contains a description of the behaviour in Python and can thus calculate the result of the node.
This execution function and onnxruntime are used when `execute_onnx` from `onnx_exec` is applied to the model. The model is then simulated node by node and the result is stored in a context dictionary, which contains the values of each tensor at the end of the execution. To get the result, only the output tensor has to be extracted.
The procedure is shown below. We take the model right before the nodes are converted into HLS layers and generate an input tensor to pass to the execution function. The input tensor is generated from the Brevitas example inputs.
%% Cell type:code id: tags:
``` python
import numpy as np
from finn.core.modelwrapper import ModelWrapper
input_dict = {"global_in": nph.to_array(input_tensor)}
model_for_sim = ModelWrapper(build_dir+"/tfc_w1a1_ready_for_hls_conversion.onnx")
```
%% Cell type:code id: tags:
``` python
import finn.core.onnx_exec as oxe
output_dict = oxe.execute_onnx(model_for_sim, input_dict)
output_pysim = output_dict[list(output_dict.keys())[0]]
if np.isclose(output_pysim, output_golden, atol=1e-3).all():
print("Results are the same!")
else:
print("The results are not the same!")
```
%% Output
Results are the same!
%% Cell type:markdown id: tags:
The result is compared with the theoretical "golden" value for verification.
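%% Cell type:markdown id: tags:
If you want to inspect the intermediate tensors from the context dictionary mentioned above, `execute_onnx` can also return the full execution context instead of just the graph outputs. The following is a small sketch; it assumes the `return_full_exec_context` keyword argument.
%% Cell type:code id: tags:
``` python
# sketch: retrieve the full execution context and list every tensor in it
full_context = oxe.execute_onnx(model_for_sim, input_dict, return_full_exec_context=True)
for tensor_name, tensor_value in full_context.items():
    print(tensor_name, tensor_value.shape if tensor_value is not None else None)
```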
%% Cell type:markdown id: tags:
## Simulation (npysim) using C++
When dealing with HLS custom op nodes in FINN, the simulation using Python is no longer sufficient. After the nodes have been converted to HLS layers, the simulation using C++ can be used. To do this, the input tensor is stored in an .npy file, and C++ code is generated that reads the values from the .npy file, streams them to the corresponding finn-hlslib function and writes the result to a new .npy file. This in turn can be read in Python and processed in the FINN flow. For this example the model after setting the folding factors in the HLS layers is used. Please be aware that this is not the full model, but the dataflow partition, so before executing at the end of this section we have to integrate the model back into the parent model.
%% Cell type:code id: tags:
``` python
model_for_npysim = ModelWrapper(build_dir+"/tfc_w1_a1_hls_layers.onnx")
model_for_npysim = ModelWrapper(build_dir+"/tfc_w1_a1_set_folding_factors.onnx")
```
%% Cell type:markdown id: tags:
To generate the code for this simulation and to generate the executable two transformations are used:
* `CodeGen_npysim` which generates the C++ code for the corresponding HLS layer
* `Compile` which compiles the C++ code and stores the path to the executable
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.codegen_npysim import CodeGen_npysim
from finn.transformation.fpgadataflow.compile import Compile
from finn.transformation.general import GiveUniqueNodeNames
model_for_npysim = model_for_npysim.transform(GiveUniqueNodeNames())
model_for_npysim = model_for_npysim.transform(CodeGen_npysim())
model_for_npysim = model_for_npysim.transform(Compile())
```
%% Cell type:markdown id: tags:
When we take a look at the model using Netron, we can see that the transformations introduced new attributes.
%% Cell type:code id: tags:
``` python
model_for_npysim.save(build_dir+"/tfc_w1_a1_for_npysim.onnx")
showInNetron(build_dir+"/tfc_w1_a1_for_npysim.onnx")
```
%% Output
Serving '/workspace/finn/tfc_w1_a1_for_npysim.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f8dfdb29c18>
%% Cell type:markdown id: tags:
The following node attributes have been added:
* `code_gen_dir_npysim` indicates the directory where the files for the simulation using C++ are stored
* `executable_path` specifies the path to the executable
Let us now take a closer look at the files that were generated:
%% Cell type:code id: tags:
``` python
from finn.custom_op.registry import getCustomOp
fc0 = model_for_npysim.graph.node[1]
fc0w = getCustomOp(fc0)
code_gen_dir = fc0w.get_nodeattr("code_gen_dir_npysim")
!ls {code_gen_dir}
```
%% Output
compile.sh memblock_0.dat thresh.h
execute_StreamingFCLayer_Batch.cpp node_model weights.npy
%% Cell type:markdown id: tags:
Besides the .cpp file, the folder contains the thresholds in a .h file and the weights as .npy and .dat files. The shell script contains the compile command and *node_model* is the executable generated by compilation. Comparing this with the `executable_path` node attribute, it can be seen that it specifies exactly the path to *node_model*.
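%% Cell type:markdown id: tags:
As a quick check, we can print the `executable_path` attribute directly (a one-line sketch using the node wrapper from above):
%% Cell type:code id: tags:
``` python
# sketch: executable_path should point at the node_model binary listed above
print(fc0w.get_nodeattr("executable_path"))
```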
%% Cell type:markdown id: tags:
To simulate the model, the execution mode (`exec_mode`) must be set to "npysim". This is done using the transformation `SetExecMode`.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.set_exec_mode import SetExecMode
model_for_npysim = model_for_npysim.transform(SetExecMode("npysim"))
model_for_npysim.save(build_dir+"/tfc_w1_a1_for_npysim.onnx")
```
%% Cell type:markdown id: tags:
Before the model can be executed using `execute_onnx`, we integrate the child model into the parent model. The function then reads the `exec_mode` and writes the input into the correct directory in a .npy file. To be able to read this in C++, there is an additional .hpp file ([npy2apintstream.hpp](https://github.com/Xilinx/finn/blob/master/src/finn/data/cpp/npy2apintstream.hpp)) in FINN, which uses cnpy to read .npy files and convert them into streams, or to read a stream and write it into an .npy file. [cnpy](https://github.com/rogersce/cnpy) is a helper to read and write .npy and .npz formats in C++.
The result is again compared to the "golden" output.
%% Cell type:code id: tags:
``` python
output_dict = oxe.execute_onnx(model_for_npysim, input_dict)
parent_model = ModelWrapper(build_dir+"/tfc_w1_a1_dataflow_parent.onnx")
sdp_node = parent_model.graph.node[2]
child_model = build_dir + "/tfc_w1_a1_for_npysim.onnx"
getCustomOp(sdp_node).set_nodeattr("model", child_model)
output_dict = oxe.execute_onnx(parent_model, input_dict)
output_npysim = output_dict[list(output_dict.keys())[0]]
if np.isclose(output_npysim, output_golden, atol=1e-3).all():
print("Results are the same!")
else:
print("The results are not the same!")
```
%% Output
Results are the same!
%% Cell type:markdown id: tags:
## Emulation (rtlsim) using PyVerilator
The emulation using [PyVerilator](https://github.com/maltanar/pyverilator) can be done after IP blocks are generated from the corresponding HLS layers. PyVerilator is a tool which makes it possible to simulate Verilog files using Verilator via a Python interface.
There are two ways to use rtlsim: one is to run the model node by node, as with the previous simulation methods; but if the model is in the form of the dataflow partition, the part of the graph that consists only of HLS nodes can also be executed as a whole.
%% Cell type:markdown id: tags:
Because at the point where we want to grab and verify the model, it is already in split form (a parent graph consisting of non-HLS layers and a child graph consisting only of HLS layers), we first have to reference the child graph within the parent graph. This is done using the node attribute `model` of the `StreamingDataflowPartition` node.
First the procedure is shown for a child graph that has IP blocks corresponding to the individual layers, then for a child graph that already has a stitched IP.
%% Cell type:markdown id: tags:
### Emulation of model node-by-node
The child model is loaded and the `exec_mode` for each node is set. To prepare the node-by-node emulation, the transformation `PrepareRTLSim` is applied to the child model. With this transformation the emulation files are created for each node and can be used directly when calling `execute_onnx()`. After the transformation each node has a new node attribute `rtlsim_so`, which contains the path to the corresponding emulation files. The model is then saved in a new .onnx file so that the changed model can be referenced in the parent model.
%% Cell type:code id: tags:
``` python
from finn.transformation.fpgadataflow.prepare_rtlsim import PrepareRTLSim
child_model = ModelWrapper(build_dir + "/tfc_w1_a1_ipgen.onnx")
child_model = child_model.transform(SetExecMode("rtlsim"))
child_model = child_model.transform(PrepareRTLSim())
child_model.save(build_dir + "/tfc_w1_a1_dataflow_child.onnx")
```
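%% Cell type:markdown id: tags:
As a sanity check, we can look at the `rtlsim_so` attribute of one of the child nodes (a minimal sketch using the attribute name mentioned above):
%% Cell type:code id: tags:
``` python
# sketch: PrepareRTLSim should have filled rtlsim_so for each node
node0 = getCustomOp(child_model.graph.node[0])
print(node0.get_nodeattr("rtlsim_so"))
```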
%% Cell type:markdown id: tags:
The next step is to load the parent model and set the node attribute `model` in the `StreamingDataflowPartition` node (`sdp_node`). Afterwards the `exec_mode` is set for each node in the parent model.
%% Cell type:code id: tags:
``` python
# parent model
model_for_rtlsim = ModelWrapper(build_dir + "/tfc_w1_a1_dataflow_parent.onnx")
# reference child model
sdp_node = getCustomOp(model_for_rtlsim.graph.node[2])
sdp_node.set_nodeattr("model", build_dir + "/tfc_w1_a1_dataflow_child.onnx")
model_for_rtlsim = model_for_rtlsim.transform(SetExecMode("rtlsim"))
```
%% Cell type:markdown id: tags:
Because the necessary files for the emulation were already generated in the Jupyter notebook [tfc_end2end_example](tfc_end2end_example.ipynb), the model can be executed directly in the next step.
%% Cell type:code id: tags:
``` python
output_dict = oxe.execute_onnx(model_for_rtlsim, input_dict)
output_rtlsim = output_dict[list(output_dict.keys())[0]]
if np.isclose(output_rtlsim, output_golden, atol=1e-3).all():
print("Results are the same!")
else:
print("The results are not the same!")
```
%% Output
Results are the same!
%% Cell type:markdown id: tags:
### Emulation of stitched IP
Here we use the same procedure. First the child model is loaded, but in contrast to the layer-by-layer emulation, the metadata property `exec_mode` is set to "rtlsim" for the whole child model. When the model is integrated and executed in the last step, the verilog files of the stitched IP of the child model are used.
%% Cell type:code id: tags:
``` python
child_model = ModelWrapper(build_dir + "/tfc_w1_a1_ipstitch.onnx")
child_model.set_metadata_prop("exec_mode","rtlsim")
child_model.save(build_dir + "/tfc_w1_a1_dataflow_child.onnx")
```
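%% Cell type:markdown id: tags:
To double-check, the metadata property can be read back from the child model (a one-line sketch using `get_metadata_prop`):
%% Cell type:code id: tags:
``` python
# sketch: read back the exec_mode metadata property we just set
print(child_model.get_metadata_prop("exec_mode"))
```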
%% Cell type:code id: tags:
``` python
# parent model
model_for_rtlsim = ModelWrapper(build_dir + "/tfc_w1_a1_dataflow_parent.onnx")
# reference child model
sdp_node = getCustomOp(model_for_rtlsim.graph.node[2])
sdp_node.set_nodeattr("model", build_dir + "/tfc_w1_a1_dataflow_child.onnx")
```
%% Cell type:code id: tags:
``` python
output_dict = oxe.execute_onnx(model_for_rtlsim, input_dict)
output_rtlsim = output_dict[list(output_dict.keys())[0]]
if np.isclose(output_rtlsim, output_golden, atol=1e-3).all():
print("Results are the same!")
else:
print("The results are not the same!")
```
%% Output
Results are the same!
%% Cell type:code id: tags:
``` python
```