diff --git a/notebooks/end2end_example/cybersecurity/3-build-accelerator-with-finn.ipynb b/notebooks/end2end_example/cybersecurity/3-build-accelerator-with-finn.ipynb index 6433b8cbe14ab7562fe8983ebaf7db47b03c6706..eba942ecc23dbf7d3a75b256c22b3e3fceb3475f 100644 --- a/notebooks/end2end_example/cybersecurity/3-build-accelerator-with-finn.ipynb +++ b/notebooks/end2end_example/cybersecurity/3-build-accelerator-with-finn.ipynb @@ -40,10 +40,10 @@ "source": [ "## Introduction to `build_dataflow` Tool <a id=\"intro_build_dataflow\"></a>\n", "\n", - "Since version 0.5b, the FINN compiler has a `build_dataflow` tool. Compared to previous versions which required setting up all the needed transformations in a Python script, it makes experimenting with dataflow architecture generation easier. The core idea is to specify the relevant build info as a configuration `dict`, which invokes all the necessary steps to make the dataflow build happen. It can be invoked either from the [command line](https://finn-dev.readthedocs.io/en/latest/command_line.html) or with a single Python function call\n", + "Since version 0.5b, the FINN compiler has a `build_dataflow` tool. Compared to previous versions which required setting up all the needed transformations in a Python script, it makes experimenting with dataflow architecture generation easier. The core idea is to specify the relevant build info as a configuration `dict`, which invokes all the necessary steps to make the dataflow build happen. 
It can be invoked either from the [command line](https://finn-dev.readthedocs.io/en/latest/command_line.html) or with a single Python function call.\n", "\n", "\n", - "In this notebook, we'll use the Python function call to invoke the builds to stay inside the Jupyter notebook, but feel free to experiment with reproducing what we do here with the `./run-docker.sh build_dataflow` and `./run-docker.sh build_custom` command-line entry points too, as documented [here](https://finn-dev.readthedocs.io/en/latest/command_line.html)." + "In this notebook, we'll use the Python function call to invoke the builds to stay inside the Jupyter notebook, but feel free to experiment with reproducing what we do here with the `./run-docker.sh build_dataflow` and `./run-docker.sh build_custom` command-line entry points too. " ] }, { @@ -71,8 +71,8 @@ " - `BITFILE` : integrate the accelerator into a shell to produce a standalone bitfile\n", " - `PYNQ_DRIVER` : generate a PYNQ Python driver that can be used to launch the accelerator\n", " - `DEPLOYMENT_PACKAGE` : create a folder with the `BITFILE` and `PYNQ_DRIVER` outputs, ready to be copied to the target FPGA platform.\n", - "* `output_dir`: the directory where the all the generated build outputs above will be written into.\n", - "* `steps`: list of predefined (or custom) build steps FINN will go through. Use `build_dataflow_config.estimate_only_dataflow_steps` to execute only the steps needed for estimation (without any synthesis), and the `build_dataflow_config.default_build_dataflow_steps` otherwise (which is the default value).\n", + "* `output_dir`: the directory where all the generated build outputs above will be written into.\n", + "* `steps`: list of predefined (or custom) build steps FINN will go through. 
Use `build_dataflow_config.estimate_only_dataflow_steps` to execute only the steps needed for estimation (without any synthesis), and the `build_dataflow_config.default_build_dataflow_steps` otherwise (which is the default value). You can find the list of default steps [here](https://finn.readthedocs.io/en/latest/source_code/finn.builder.html#finn.builder.build_dataflow_config.default_build_dataflow_steps) in the documentation.\n", "\n", "### Configuring the Board and FPGA Part <a id=\"config_fpga\"></a>\n", "\n", @@ -82,7 +82,7 @@ "\n", "### Configuring the Performance <a id=\"config_perf\"></a>\n", "\n", - "You can configure the performance (and correspondingly, the FPGA resource footprint) of the generated in two ways:\n", + "You can configure the performance (and correspondingly, the FPGA resource footprint) of the generated dataflow accelerator in two ways:\n", "\n", "1) (basic) Set a target performance and let the compiler figure out the per-node parallelization settings.\n", "\n", @@ -90,7 +90,7 @@ "\n", "This notebook only deals with the basic approach, for which you need to set up:\n", "\n", - "* `target_fps`: target inference performance in frames per second. Note that target may not be achievable due to specific layer constraints, or due to resource limitations of the FPGA.\n", + "* `target_fps`: target inference performance in frames per second. Note that target may not be achievable due to specific layer constraints, or due to resource limitations of the FPGA. \n", "* `synth_clk_period_ns`: target clock frequency (in nanoseconds) for Vivado synthesis. e.g. `synth_clk_period_ns=5.0` will target a 200 MHz clock. Note that the target clock period may not be achievable depending on the FPGA part and design complexity." 
] }, @@ -107,11 +107,38 @@ "cell_type": "code", "execution_count": 1, "metadata": {}, + "outputs": [], + "source": [ + "import finn.builder.build_dataflow as build\n", + "import finn.builder.build_dataflow_config as build_cfg\n", + "\n", + "model_file = \"cybsec-mlp-ready.onnx\"\n", + "\n", + "estimates_output_dir = \"output_estimates_only\"\n", + "\n", + "cfg_estimates = build.DataflowBuildConfig(\n", + " output_dir = estimates_output_dir,\n", + " target_fps = 1000000,\n", + " synth_clk_period_ns = 10.0,\n", + " fpga_part = \"xc7z020clg400-1\",\n", + " steps = build_cfg.estimate_only_dataflow_steps,\n", + " generate_outputs=[\n", + " build_cfg.DataflowOutputType.ESTIMATE_REPORTS,\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ + "CPU times: user 2 µs, sys: 1 µs, total: 3 µs\n", + "Wall time: 6.91 µs\n", "Building dataflow accelerator from cybsec-mlp-ready.onnx\n", "Intermediate outputs will be generated in /tmp/finn_dev_maltanar\n", "Final outputs will be generated in output_estimates_only\n", @@ -132,31 +159,14 @@ "0" ] }, - "execution_count": 1, + "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "import finn.builder.build_dataflow as build\n", - "import finn.builder.build_dataflow_config as build_cfg\n", - "\n", - "model_file = \"cybsec-mlp-ready.onnx\"\n", - "\n", - "estimates_output_dir = \"output_estimates_only\"\n", - "\n", - "cfg = build.DataflowBuildConfig(\n", - " output_dir = estimates_output_dir,\n", - " target_fps = 1000000,\n", - " synth_clk_period_ns = 10.0,\n", - " fpga_part = \"xc7z020clg400-1\",\n", - " steps = build_cfg.estimate_only_dataflow_steps,\n", - " generate_outputs=[\n", - " build_cfg.DataflowOutputType.ESTIMATE_REPORTS,\n", - " ]\n", - ")\n", - "\n", - "build.build_dataflow_cfg(model_file, cfg)" + "%%time\n", + "build.build_dataflow_cfg(model_file, cfg_estimates)" ] }, { @@ -168,7
+178,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 3, "metadata": {}, "outputs": [ { @@ -185,7 +195,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -211,7 +221,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -236,12 +246,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Since all of these reports are .json files, we can easily load them into Python for further processing. Let's define a helper function and look at the `estimate_layer_cycles.json` report." + "Since all of these reports are .json files, we can easily load them into Python for further processing. This can be useful if you are building your own design automation tools on top of FINN. Let's define a helper function and look at the `estimate_layer_cycles.json` report." ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ @@ -254,7 +264,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -266,7 +276,7 @@ " 'StreamingFCLayer_Batch_3': 64}" ] }, - "execution_count": 6, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -286,7 +296,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -319,7 +329,7 @@ " 'total': {'BRAM_18K': 36.0, 'LUT': 11360.0, 'URAM': 0.0, 'DSP': 0.0}}" ] }, - "execution_count": 7, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -334,7 +344,7 @@ "source": [ "This particular report is useful to determine whether the current configuration will fit into a particular FPGA. 
If you see that the resource requirements are too high for the FPGA you had in mind, you should consider lowering the `target_fps`.\n", "\n", - "*Note that the analytical models tend to over-estimate how much resources are needed, since they can't capture the effects of various synthesis optimizations.*" + "**Note that the analytical models tend to over-estimate how much resources are needed, since they can't capture the effects of various synthesis optimizations.**" ] }, { @@ -355,7 +365,40 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "import finn.builder.build_dataflow as build\n", + "import finn.builder.build_dataflow_config as build_cfg\n", + "import os\n", + "import shutil\n", + "\n", + "model_file = \"cybsec-mlp-ready.onnx\"\n", + "\n", + "rtlsim_output_dir = \"output_ipstitch_ooc_rtlsim\"\n", + "\n", + "#Delete previous run results if exist\n", + "if os.path.exists(rtlsim_output_dir):\n", + " shutil.rmtree(rtlsim_output_dir)\n", + " print(\"Previous run results deleted!\")\n", + "\n", + "cfg_stitched_ip = build.DataflowBuildConfig(\n", + " output_dir = rtlsim_output_dir,\n", + " target_fps = 1000000,\n", + " synth_clk_period_ns = 10.0,\n", + " fpga_part = \"xc7z020clg400-1\",\n", + " generate_outputs=[\n", + " build_cfg.DataflowOutputType.STITCHED_IP,\n", + " build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,\n", + " build_cfg.DataflowOutputType.OOC_SYNTH,\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, "metadata": {}, "outputs": [ { @@ -382,7 +425,9 @@ "Running step: step_out_of_context_synthesis [14/16]\n", "Running step: step_synthesize_bitfile [15/16]\n", "Running step: step_deployment_package [16/16]\n", - "Completed successfully\n" + "Completed successfully\n", + "CPU times: user 3.69 s, sys: 756 ms, total: 4.45 s\n", + "Wall time: 7min 11s\n" ] }, { @@ -391,66 +436,35 @@ "0" ] }, - "execution_count": 8, + "execution_count": 11, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ - "import finn.builder.build_dataflow as build\n", - "import finn.builder.build_dataflow_config as build_cfg\n", - "import os\n", - "import shutil\n", - "\n", - "model_file = \"cybsec-mlp-ready.onnx\"\n", - "\n", - "rtlsim_output_dir = \"output_ipstitch_ooc_rtlsim\"\n", - "\n", - "#Delete previous run results if exist\n", - "if os.path.exists(rtlsim_output_dir):\n", - " shutil.rmtree(rtlsim_output_dir)\n", - " print(\"Previous run results deleted!\")\n", - "\n", - "cfg = build.DataflowBuildConfig(\n", - " output_dir = rtlsim_output_dir,\n", - " target_fps = 1000000,\n", - " synth_clk_period_ns = 10.0,\n", - " fpga_part = \"xc7z020clg400-1\",\n", - " generate_outputs=[\n", - " build_cfg.DataflowOutputType.STITCHED_IP,\n", - " build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,\n", - " build_cfg.DataflowOutputType.OOC_SYNTH,\n", - " ]\n", - ")\n", - "\n", - "build.build_dataflow_cfg(model_file, cfg)" + "%%time\n", + "build.build_dataflow_cfg(model_file, cfg_stitched_ip)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Among the output products, we will find the accelerator exported as IP:" + "Why is e.g. `step_synthesize_bitfile` listed above even though we didn't ask for a bitfile in the output products? This is because we're using the default set of build steps, which includes `step_synthesize_bitfile`. Since its output product is not selected, this step will do nothing." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Among the output products, we will find the accelerator exported as a stitched IP block design:" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "all_verilog_srcs.txt\t\t ip\r\n", - "finn_vivado_stitch_proj.cache\t make_project.sh\r\n", - "finn_vivado_stitch_proj.hw\t make_project.tcl\r\n", - "finn_vivado_stitch_proj.ip_user_files vivado.jou\r\n", - "finn_vivado_stitch_proj.srcs\t vivado.log\r\n", - "finn_vivado_stitch_proj.xpr\r\n" - ] - } - ], + "outputs": [], "source": [ "! ls {rtlsim_output_dir}/stitched_ip" ] @@ -464,18 +478,9 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "estimate_layer_resources_hls.json rtlsim_performance.json\r\n", - "ooc_synth_and_timing.json\r\n" - ] - } - ], + "outputs": [], "source": [ "! ls {rtlsim_output_dir}/report" ] @@ -489,27 +494,9 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\r\n", - " \"vivado_proj_folder\": \"/tmp/finn_dev_maltanar/synth_out_of_context_ex08r7hd/results_finn_design_wrapper\",\r\n", - " \"LUT\": 7920.0,\r\n", - " \"FF\": 7327.0,\r\n", - " \"DSP\": 0.0,\r\n", - " \"BRAM\": 18.0,\r\n", - " \"WNS\": 1.565,\r\n", - " \"\": 0,\r\n", - " \"fmax_mhz\": 118.55364552459987,\r\n", - " \"estimated_throughput_fps\": 1481920.5690574984\r\n", - "}" - ] - } - ], + "outputs": [], "source": [ "! 
cat {rtlsim_output_dir}/report/ooc_synth_and_timing.json" ] @@ -523,26 +510,9 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\r\n", - " \"cycles\": 840,\r\n", - " \"runtime[ms]\": 0.008400000000000001,\r\n", - " \"throughput[images/s]\": 952380.9523809523,\r\n", - " \"DRAM_in_bandwidth[Mb/s]\": 71.42857142857142,\r\n", - " \"DRAM_out_bandwidth[Mb/s]\": 0.11904761904761903,\r\n", - " \"fclk[mhz]\": 100.0,\r\n", - " \"N\": 8,\r\n", - " \"latency_cycles\": 231\r\n", - "}" - ] - } - ], + "outputs": [], "source": [ "! cat {rtlsim_output_dir}/report/rtlsim_performance.json" ] @@ -556,65 +526,9 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\r\n", - " \"Defaults\": {},\r\n", - " \"StreamingFIFO_0\": {\r\n", - " \"ram_style\": \"auto\",\r\n", - " \"depth\": 32,\r\n", - " \"impl_style\": \"rtl\"\r\n", - " },\r\n", - " \"StreamingFCLayer_Batch_0\": {\r\n", - " \"PE\": 32,\r\n", - " \"SIMD\": 15,\r\n", - " \"ram_style\": \"auto\",\r\n", - " \"resType\": \"lut\",\r\n", - " \"mem_mode\": \"decoupled\",\r\n", - " \"runtime_writeable_weights\": 0\r\n", - " },\r\n", - " \"StreamingDataWidthConverter_Batch_0\": {\r\n", - " \"impl_style\": \"hls\"\r\n", - " },\r\n", - " \"StreamingFCLayer_Batch_1\": {\r\n", - " \"PE\": 4,\r\n", - " \"SIMD\": 16,\r\n", - " \"ram_style\": \"auto\",\r\n", - " \"resType\": \"lut\",\r\n", - " \"mem_mode\": \"decoupled\",\r\n", - " \"runtime_writeable_weights\": 0\r\n", - " },\r\n", - " \"StreamingDataWidthConverter_Batch_1\": {\r\n", - " \"impl_style\": \"hls\"\r\n", - " },\r\n", - " \"StreamingFCLayer_Batch_2\": {\r\n", - " \"PE\": 4,\r\n", - " \"SIMD\": 16,\r\n", - " \"ram_style\": \"auto\",\r\n", - " \"resType\": \"lut\",\r\n", - " \"mem_mode\": \"decoupled\",\r\n", - " 
\"runtime_writeable_weights\": 0\r\n", - " },\r\n", - " \"StreamingDataWidthConverter_Batch_2\": {\r\n", - " \"impl_style\": \"hls\"\r\n", - " },\r\n", - " \"StreamingFCLayer_Batch_3\": {\r\n", - " \"PE\": 1,\r\n", - " \"SIMD\": 1,\r\n", - " \"ram_style\": \"auto\",\r\n", - " \"resType\": \"lut\",\r\n", - " \"mem_mode\": \"decoupled\",\r\n", - " \"runtime_writeable_weights\": 0\r\n", - " }\r\n", - "}" - ] - } - ], + "outputs": [], "source": [ "! cat {rtlsim_output_dir}/final_hw_config.json" ] @@ -630,7 +544,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -662,7 +576,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -678,7 +592,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -694,7 +608,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -710,7 +624,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -726,7 +640,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": {}, "outputs": [], "source": [