Merge branch 'dev' into feature/gen_sv_tb

Commit 3baf111b authored 2 years ago by Yaman Umuroglu
--- a/README.md
+++ b/README.md
@@ -34,9 +34,12 @@ You can view the documentation on [readthedocs](https://finn.readthedocs.io) or

 ## Community

-We have a [gitter channel](https://gitter.im/xilinx-finn/community) where you can ask questions. You can use the GitHub issue tracker to report bugs, but please don't file issues to ask questions as this is better handled in the gitter channel.
+We have [GitHub discussions](https://github.com/Xilinx/finn/discussions) where you can ask questions. You can use the GitHub issue tracker to report bugs, but please don't file issues to ask questions as this is better handled in GitHub discussions.
+
+We also heartily welcome contributions to the project. Please check out the [contribution guidelines](CONTRIBUTING.md) and the [list of open issues](https://github.com/Xilinx/finn/issues). Don't hesitate to get in touch over [GitHub discussions](https://github.com/Xilinx/finn/discussions) to discuss your ideas.
+
+In the past, we also had a [Gitter channel](https://gitter.im/xilinx-finn/community). Please be aware that this is no longer maintained by us but can still be used to search for questions previous users had.

-We also heartily welcome contributions to the project, please check out the [contribution guidelines](CONTRIBUTING.md) and the [list of open issues](https://github.com/Xilinx/finn/issues). Don't hesitate to get in touch over [Gitter](https://gitter.im/xilinx-finn/community) to discuss your ideas.

 ## Citation


--- a/tutorials/fpga_flow/README.md
+++ b/tutorials/fpga_flow/README.md
+# FINN Example FPGA Flow Using MNIST Numerals
+
+This example demonstrates how to bring a FINN-compiled model into the Vivado FPGA design environment for integration into a larger FPGA application. It extends the command-line [build_dataflow](https://github.com/Xilinx/finn/tree/master/src/finn/qnn-data/build_dataflow) example, using a model that was quantized with [Brevitas](https://github.com/Xilinx/brevitas) down to single-bit weight/activation precision to classify hand-written numerals from the MNIST data set.
+
+If you are new to the command-line flow, more information can be found [here](https://finn.readthedocs.io/en/latest/command_line.html).
+
+This demo was created using Vivado 2020.1.
+
+## Compiling the Model in FINN
+
+#### Configuration
+`build.py` assembles the needed files and configures how the model is compiled when generating the "stitched IP".  The following items will need to be set appropriately for specific use cases:
+- `output_dir`: defines the directory to be created for FINN compiler output.
+- `target_fps`: desired throughput performance target for FINN compiler to achieve.
+- `mvau_wwidth_max`:  _an optional parameter_ ([described here](https://finn.readthedocs.io/en/latest/source_code/finn.builder.html#finn.builder.build_dataflow_config.DataflowBuildConfig.mvau_wwidth_max)) shown only to illustrate passing additional configuration items to the compiler.
+- `folding_config_file`: an optional parameter to pass a json file defining the layer optimizations (PE,SIMD,ramstyle, etc.) to the compiler.
+- `synth_clk_period_ns`: set the desired clock period in ns.
+- `fpga_part`: configures the IP for the target device in which the stitched IP will be implemented.  It should be the full string recognized in Vivado: \<device\>-\<package\>-\<speed_grade\>-\<temperature_grade\> (for example, `xczu3eg-sbva484-1-e` as used in `build.py`).
+- `generate_outputs`: for integration purposes, the only output needed is `STITCHED_IP`.  You might also find the `ESTIMATE_REPORTS` interesting.  Other options are documented [here](https://finn.readthedocs.io/en/latest/command_line.html#generated-outputs) and some of them (namely OOC_SYNTH, BITFILE) add substantial runtime and are not needed for this flow.
+- `stitched_ip_gen_dcp`: will generate an IP block with a synthesized design checkpoint (.dcp), which makes the design more portable across different machines but adds some runtime.
+
+
+### Running FINN Compiler
+
+Prior to running, ensure the following prerequisites have been met:
+- Install FINN and prerequisites.  The [Getting Started](https://finn.readthedocs.io/en/latest/getting_started.html#quickstart) section of the FINN documentation might be helpful for this.
+- Ensure you have the `FINN_XILINX_PATH` and `FINN_XILINX_VERSION` env variables set appropriately for your install.  For example:
+> export FINN_XILINX_PATH=/opt/Xilinx
+> export FINN_XILINX_VERSION=2020.1
+- Set the env variable for your `finn` install top directory (where you cloned the FINN compiler repo):
+> export FINN_ROOT=/home/foo/finn
+
+Then, change to `finn` install directory and invoke the build as follows:
+> cd ${FINN_ROOT}
+> ./run-docker.sh build_custom ${FINN_ROOT}/tutorials/fpga_flow/
+
+Alternatively, since the tutorials folder is already part of the FINN compiler installation, you can invoke it from within the Docker container:
+> cd ${FINN_ROOT}
+> ./run-docker.sh
+> cd tutorials/fpga_flow
+> python build.py
+
+The build should finish in about 10 minutes, and the FINN docker will close on success.
+
+```
+   ...
+   Running step: step_create_stitched_ip [11/16]
+   Running step: step_measure_rtlsim_performance [12/16]
+   Running step: step_out_of_context_synthesis [13/16]
+   Running step: step_synthesize_bitfile [14/16]
+   Running step: step_make_pynq_driver [15/16]
+   Running step: step_deployment_package [16/16]
+   Completed successfully
+   The program finished and will be restarted
+```
+
+
+### Examine the Stitched IP
+
+Navigate to the stitched IP project directory:
+
+> cd ${FINN_ROOT}/tutorials/fpga_flow/output_tfc_w1a1_fpga/stitched_ip
+
+And, open the project:
+
+> vivado finn_vivado_stitch_proj.xpr
+
+Explore the IPI board design and note the interfaces. Keep this design open in Vivado, as we'll be adding the testbench and invoking the simulation here later on.
+
+### Simulating the Stitched IP with a Verilog Test Bench
+
+The included `testbench.sv` is a very simple test to illustrate how to feed data to the compiled model.
+
+The image data is 784 bytes per frame, organized as 28x28 unsigned integer bytes.  However, due to the folding optimizations chosen, the input data is transferred to the hardware model 49 bytes at a time over 16 cycles.  Note how this matches PE=49 as selected for the first layer in `folding_config.json`.
+
+Using the following image for coordinate reference, where a byte is identified as B\<row\>\_\<column\>, we see that B0_0 is the upper leftmost byte and B27_27 is the lower rightmost byte:
+
+![Image coordinates: 0,0 is the upper left, and 27,27 is the lower right](numeral.png)
+
+Thus, the input data for the first cycle is organized as such:
+```
+  s_axis_0_tdata[391:0] = {B1_20,B1_19, ...  ,B1_0,B0_27, ...  ,B0_1,B0_0};
+```
+
+The testbench reads data from a simple text file (`data.hex`).  The included script `gen_tb_data.py` creates the test data as well as the ground truth expectations (note: using ground truth is undesirable if the intent is to validate that the HW implementation matches the trained model).  The script flips the byte order within each row so that Verilog's `$readmemh` brings B0_0 into the LSB position.
+
+To generate the test data, you'll need a Python environment with Keras installed since the Python script uses `keras.datasets` to access the MNIST data. Once you have this, you can generate the test data with the following.
+
+> cd ${FINN_ROOT}/tutorials/fpga_flow/output_tfc_w1a1_fpga/stitched_ip
+> mkdir -p finn_vivado_stitch_proj.sim/sim_1/behav/xsim
+> python ../../gen_tb_data.py finn_vivado_stitch_proj.sim/sim_1/behav/xsim/data.hex
+
+If you'd like to, you can examine what the generated .hex file with the test data looks like:
+
+> less finn_vivado_stitch_proj.sim/sim_1/behav/xsim/data.hex
+
+In Vivado, add the testbench as a simulation file by pasting the following into the Tcl Console:
+> add_files -fileset sim_1 -norecurse ../../testbench.sv
+
+
+Then, run the simulation (Flow Navigator -> Simulation -> Run Simulation).   Give the simulator a `run -all`  (click the "play" button in the simulator) to run the sim to its `$finish` conclusion.  With 20 test points run, it should report 1 mismatch due to using the ground truth as the check source:
+
+```
+ ************************************************************
+  SIM COMPLETE
+   Validated 20 data points
+   Total error count: ====>  1  <====
+```
+
+Note that this mismatch is due to the trained neural network not having perfect accuracy on the test dataset (i.e. the trained PyTorch model would have the same behavior).
+
+#### Instantiation in Mission Design
+
+There are any number of ways to bring the stitched IP into a larger design.
+
+FINN already packages the stitched IP block design as a standalone IP-XACT component, which you can find under `${FINN_ROOT}/tutorials/fpga_flow/output_tfc_w1a1_fpga/stitched_ip/ip`. You can add this to the list of IP repos and use it in your own Vivado designs. A good reference for this is [UG1119](https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_1/ug1119-vivado-creating-packaging-ip-tutorial.pdf).
+
+Keep in mind that all of the user IP repos included in the stitched IP project (from `$FINN_HOST_BUILD_DIR`, which is normally located under `/tmp/finn_dev_<username>`) also need to be brought in as IP repos to any project using the stitched IP.  It would be prudent to copy those IP repos to an appropriate archive location. Alternatively, if you don't want to copy all of the dependencies, you can ask FINN to generate the IP-XACT component with a synthesized .dcp checkpoint by passing the [stitched_ip_gen_dcp=True](https://finn-dev.readthedocs.io/en/latest/source_code/finn.builder.html#finn.builder.build_dataflow_config.DataflowBuildConfig.stitched_ip_gen_dcp) option as part of the build configuration.
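
Note on the README's input layout: the 49-bytes-per-cycle packing it describes can be sketched in plain Python. This is only an illustration under stated assumptions, not part of the commit. The helper names `pack_image` and `byte_at` and the test image are hypothetical; the sizes (28x28 bytes, 49 bytes per beat, 16 beats, B0_0 in the least significant byte) are taken from the README text.

```python
# Sketch: pack a 28x28 image into the 16 input beats described in the README.
ROWS, COLS = 28, 28
BYTES_PER_BEAT = 49  # matches PE=49 of the first layer in folding_config.json


def pack_image(image):
    """Return 16 integers, each holding 49 bytes with the earliest byte in the LSBs."""
    # Row-major order, so B0_0 comes first and B27_27 comes last.
    flat = [image[r][c] for r in range(ROWS) for c in range(COLS)]
    beats = []
    for start in range(0, len(flat), BYTES_PER_BEAT):
        word = 0
        for offset, byte in enumerate(flat[start:start + BYTES_PER_BEAT]):
            word |= (byte & 0xFF) << (8 * offset)
        beats.append(word)
    return beats


def byte_at(word, index):
    """Extract byte `index` (0 = least significant) from a packed beat."""
    return (word >> (8 * index)) & 0xFF


# Hypothetical test image: each pixel encodes its position so the layout is easy to check.
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]
beats = pack_image(image)
```

With this layout the first beat carries all of row 0 plus B1_0 through B1_20, which agrees with the `s_axis_0_tdata[391:0]` assignment shown in the README.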
--- a/tutorials/fpga_flow/build.py
+++ b/tutorials/fpga_flow/build.py
+# Copyright (c) 2022 Xilinx, Inc.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice, this
+#   list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright notice,
+#   this list of conditions and the following disclaimer in the documentation
+#   and/or other materials provided with the distribution.
+#
+# * Neither the name of Xilinx nor the names of its
+#   contributors may be used to endorse or promote products derived from
+#   this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+# This file is intended to serve as an example showing how to set up custom builds
+# using FINN. The custom build can be launched like this:
+# ./run-docker.sh build_custom /path/to/folder
+
+
+import finn.builder.build_dataflow as build
+import finn.builder.build_dataflow_config as build_cfg
+
+model_name = "tfc_w1a1"
+platform_name = "fpga"
+
+cfg = build.DataflowBuildConfig(
+    board=platform_name,
+    output_dir="output_%s_%s" % (model_name, platform_name),
+    synth_clk_period_ns=10.0,
+    folding_config_file="folding_config.json",
+    fpga_part="xczu3eg-sbva484-1-e",
+    shell_flow_type=build_cfg.ShellFlowType.VIVADO_ZYNQ,
+    stitched_ip_gen_dcp=False,
+    generate_outputs=[
+        build_cfg.DataflowOutputType.STITCHED_IP,
+        # build_cfg.DataflowOutputType.PYNQ_DRIVER,
+        # build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
+        # build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
+        # build_cfg.DataflowOutputType.OOC_SYNTH,
+        # build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
+    ],
+    verify_steps=[
+        build_cfg.VerificationStepType.TIDY_UP_PYTHON,
+        build_cfg.VerificationStepType.STREAMLINED_PYTHON,
+        build_cfg.VerificationStepType.FOLDED_HLS_CPPSIM,
+        build_cfg.VerificationStepType.STITCHED_IP_RTLSIM,
+    ],
+    save_intermediate_models=True,
+)
+model_file = "model.onnx"
+build.build_dataflow_cfg(model_file, cfg)
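
Note on `build.py`: the values below are copied from the configuration above, and the sketch only evaluates the `output_dir` expression and converts the clock period to a frequency. It assumes the build is launched from `tutorials/fpga_flow` and that the stitched IP lands in a `stitched_ip` subdirectory, as the README describes. The names `stitched_ip_dir` and `clock_mhz` are illustrative and not FINN API.

```python
from pathlib import PurePosixPath

# Values copied from build.py.
model_name = "tfc_w1a1"
platform_name = "fpga"
synth_clk_period_ns = 10.0

# Same expression build.py passes as output_dir.
output_dir = "output_%s_%s" % (model_name, platform_name)

# Assumption: the build runs from tutorials/fpga_flow, so output_dir is created there.
stitched_ip_dir = PurePosixPath("tutorials/fpga_flow") / output_dir / "stitched_ip"

# A 10 ns period corresponds to a 100 MHz synthesis clock target.
clock_mhz = 1000.0 / synth_clk_period_ns

print(stitched_ip_dir)
print(clock_mhz)
```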
--- a/tutorials/fpga_flow/expected_output.npy
+++ b/tutorials/fpga_flow/expected_output.npy
--- a/tutorials/fpga_flow/folding_config.json
+++ b/tutorials/fpga_flow/folding_config.json
+{
+  "Defaults": {},
+  "Thresholding_Batch_0": {
+    "PE": 49,
+    "ram_style": "block"
+  },
+  "StreamingFCLayer_Batch_0": {
+    "PE": 16,
+    "SIMD": 49,
+    "ram_style": "block"
+  },
+  "StreamingFCLayer_Batch_1": {
+    "PE": 8,
+    "SIMD": 8,
+    "ram_style": "auto"
+  },
+  "StreamingFCLayer_Batch_2": {
+    "PE": 8,
+    "SIMD": 8,
+    "ram_style": "auto"
+  },
+  "StreamingFCLayer_Batch_3": {
+    "PE": 10,
+    "SIMD": 8,
+    "ram_style": "distributed"
+  },
+  "LabelSelect_Batch_0": {
+    "PE": 1
+  }
+}
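
Note on `folding_config.json`: a rough way to see what these PE/SIMD values imply is to compute the fold count per layer as (inputs / SIMD) * (outputs / PE). The sketch below does that under a loudly stated assumption: the layer shapes (784-64-64-64-10) are not given anywhere in this diff and are assumed here for the TFC topology. The `layer_shapes` table and `fold_cycles` helper are hypothetical.

```python
import json

# PE/SIMD values copied from folding_config.json (ram_style omitted for brevity).
folding_config = json.loads("""
{
  "Thresholding_Batch_0": {"PE": 49},
  "StreamingFCLayer_Batch_0": {"PE": 16, "SIMD": 49},
  "StreamingFCLayer_Batch_1": {"PE": 8, "SIMD": 8},
  "StreamingFCLayer_Batch_2": {"PE": 8, "SIMD": 8},
  "StreamingFCLayer_Batch_3": {"PE": 10, "SIMD": 8}
}
""")

# ASSUMPTION: (inputs, outputs) per fully connected layer; not stated in this diff.
layer_shapes = {
    "StreamingFCLayer_Batch_0": (784, 64),
    "StreamingFCLayer_Batch_1": (64, 64),
    "StreamingFCLayer_Batch_2": (64, 64),
    "StreamingFCLayer_Batch_3": (64, 10),
}


def fold_cycles(name):
    """Cycles per frame for one layer: (inputs / SIMD) * (outputs / PE)."""
    cfg = folding_config[name]
    inputs, outputs = layer_shapes[name]
    if inputs % cfg["SIMD"] or outputs % cfg["PE"]:
        raise ValueError("SIMD and PE must divide the layer dimensions: " + name)
    return (inputs // cfg["SIMD"]) * (outputs // cfg["PE"])


cycles = {name: fold_cycles(name) for name in layer_shapes}

# The 784-byte input frame enters 49 bytes at a time, i.e. over 16 cycles.
input_beats = 784 // folding_config["Thresholding_Batch_0"]["PE"]
```

Under the assumed shapes, the first three fully connected layers each need 64 cycles per frame and the last needs 8, so no single layer dominates the pipeline.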
--- a/tutorials/fpga_flow/gen_tb_data.py
+++ b/tutorials/fpga_flow/gen_tb_data.py
+#!/usr/bin/python3
+# Copyright (c) 2022 Xilinx, Inc.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice, this
+#   list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright notice,
+#   this list of conditions and the following disclaimer in the documentation
+#   and/or other materials provided with the distribution.
+#
+# * Neither the name of Xilinx nor the names of its
+#   contributors may be used to endorse or promote products derived from
+#   this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+import sys
+from keras.datasets import mnist
+
+(train_x, train_y), (test_x, test_y) = mnist.load_data()
+print("Loaded MNIST test data successfully")
+# print('X_test:  '  + str(test_x.shape))
+
+if len(sys.argv) != 2:
+    print("Expected: gen_tb_data.py <path_to_hex_file>")
+    sys.exit(-1)
+
+file_name = sys.argv[1]
+
+with open(file_name, "w") as tb_data:
+    for i in range(20):
+        for j in range(28):
+            for k in range(27, -1, -1):
+                tb_data.write("{:02X}".format(test_x[i][j][k]))
+            tb_data.write("\n")
+        tb_data.write(
+            "ffffffffffffffffffffffffffffffffffffffffffffffffffffff{:02X}\n".format(
+                test_y[i]
+            )
+        )
+
+print("Testbench data generated at " + file_name)
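
Note on `gen_tb_data.py`: the file it writes can be checked without Keras by round-tripping a synthetic image. The sketch below mirrors the script's loop order (columns 27 down to 0, so column 0 ends up in the least significant byte) and assumes the label row is 27 `ff` padding bytes followed by the label byte. `format_block`, `parse_block` and the sample data are hypothetical helpers, not part of the commit.

```python
def format_block(image, label):
    """Return the 29 text lines written for one image: 28 pixel rows plus one label row."""
    lines = []
    for row in range(28):
        # Highest column first, so the first pixel of the row lands in the low byte.
        lines.append("".join("{:02X}".format(image[row][col]) for col in range(27, -1, -1)))
    # Assumption: 27 padding bytes of ff, then the ground-truth label in the low byte.
    lines.append("ff" * 27 + "{:02X}".format(label))
    return lines


def parse_block(lines):
    """Recover (image, label) from one 29-line block."""
    image = []
    for line in lines[:28]:
        value = int(line, 16)
        image.append([(value >> (8 * col)) & 0xFF for col in range(28)])
    label = int(lines[28], 16) & 0xFF
    return image, label


# Synthetic stand-in for one MNIST test image and its label.
sample_image = [[(7 * r + 3 * c) % 256 for c in range(28)] for r in range(28)]
sample_label = 5

block = format_block(sample_image, sample_label)
decoded_image, decoded_label = parse_block(block)
```

Reading each line as one 224-bit integer is what `$readmemh` effectively does in the testbench, which is why the low byte of the label row can be compared directly against the model output.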
--- a/tutorials/fpga_flow/input.npy
+++ b/tutorials/fpga_flow/input.npy
--- a/tutorials/fpga_flow/model.onnx
+++ b/tutorials/fpga_flow/model.onnx
--- a/tutorials/fpga_flow/numeral.png
+++ b/tutorials/fpga_flow/numeral.png
--- a/tutorials/fpga_flow/testbench.sv
+++ b/tutorials/fpga_flow/testbench.sv
+// Copyright (c) 2022 Xilinx, Inc.
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+//
+// * Redistributions of source code must retain the above copyright notice, this
+//   list of conditions and the following disclaimer.
+//
+// * Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// * Neither the name of Xilinx nor the names of its
+//   contributors may be used to endorse or promote products derived from
+//   this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+`timescale 1 ns / 1 ps
+`define HEXFILE "data.hex"
+
+parameter MAX_FL =4000;
+
+
+module tb ();
+
+logic [28*8-1:0] data [MAX_FL];
+logic [28*8-1:0] data_row;
+logic [28*28*8-1:0] img_data;
+logic [7:0] fifo [16];
+logic [3:0] rd_ptr=0;
+logic [3:0] wr_ptr=0;
+int err_count=0;
+int data_count=0;
+int i,j;
+logic [31:0] file_lines;
+
+logic ap_clk = 0;
+logic ap_rst_n = 0;
+
+logic [7:0]dout_tdata;
+logic dout_tlast;
+logic dout_tready;
+logic dout_tvalid;
+
+logic [392-1:0]din_tdata;
+logic din_tready;
+logic din_tvalid;
+
+
+
+finn_design_wrapper finn_design_wrapper (
+  .ap_clk                (ap_clk               ),//i
+  .ap_rst_n              (ap_rst_n             ),//i
+
+  .m_axis_0_tdata        (dout_tdata           ),//o
+  .m_axis_0_tready       (dout_tready          ),//i
+  .m_axis_0_tvalid       (dout_tvalid          ),//o
+
+  .s_axis_0_tdata        (din_tdata           ),//i
+  .s_axis_0_tready       (din_tready          ),//o
+  .s_axis_0_tvalid       (din_tvalid          ) //i
+);
+
+initial begin: AP_CLK
+  forever begin
+    ap_clk = #5 ~ap_clk;
+  end
+end
+
+
+initial begin
+  // Hex file formatted for Upper N bits as input data, and lower N bits as expected output data
+
+  $readmemh(`HEXFILE, data);
+  // Determine how large the file actually is
+  for (i=0; i<MAX_FL; i+=1)  if (data[i][0] !== 1'bx) file_lines = i;
+  if (file_lines[0] === {1'bx}) begin
+    $display("ERROR:  Unable to read hex file: %s",`HEXFILE);
+    $finish;
+  end
+
+
+  din_tvalid = 0;
+  din_tdata = 0;
+  dout_tready = 1;
+
+  repeat (100)  @(negedge ap_clk);
+  ap_rst_n = 1;
+  repeat (100)  @(negedge ap_clk);
+  dout_tready = 1;
+
+  repeat (10)  @(negedge ap_clk);
+  //while (~din_tready) @(negedge ap_clk);
+  @(negedge ap_clk);
+  @(negedge ap_clk);
+
+  // The hex file is formatted in 29 row blocks
+  //    The first 28 rows are the image data
+  //    The 29th row is the ground truth expected result stored in the lowest byte.
+  // Note that each row's byte-order is saved such that the high-byte is in the upper
+  // most bits, and the first byte in the lower-most bits.
+  for (j=0; j<=file_lines; j+=1) begin
+    if ((j%29) < 28) begin
+      img_data[(j%29)*28*8+:28*8] = data[j];
+    end else begin
+      // Grab the verification result on the 29th row
+      data_row = data[j];
+      //$display("wr_ptr %h, data:%h,  j=%d",wr_ptr,data[j],j);
+      fifo[wr_ptr] = data_row[7:0];
+      wr_ptr++;
+
+      // Due to folding factors, the 784 bytes of each image gets fed 49-bytes at a time
+      // over 16 cycles
+      for (i=0; i<16; i+=1) begin
+        din_tvalid = 1;
+        din_tdata = img_data[392*i+:392];
+        @(negedge ap_clk);
+        while (~din_tready)  @(negedge ap_clk);
+        din_tvalid = 0;
+        //repeat (200) @(negedge ap_clk);
+      end
+    end
+  end
+  din_tdata = 0;
+  din_tvalid = 0;
+
+  repeat (1000)  @(negedge ap_clk);
+  din_tdata = 0;
+  if (wr_ptr != rd_ptr) begin
+    $display("ERR: End-sim check: rd_ptr %h != %h wr_ptr",rd_ptr, wr_ptr);
+    err_count++;
+  end
+
+  $display("\n************************************************************ ");
+  $display("  SIM COMPLETE");
+  $display("  Validated %0d data points ",data_count);
+  $display("  Total error count: ====>  %0d  <====\n",err_count);
+  $finish;
+end
+
+
+// Check the result at each valid output from the model
+always @(posedge ap_clk) begin
+  if (dout_tvalid && ap_rst_n) begin
+    if (dout_tdata !== fifo[rd_ptr]) begin
+      $display("ERR: Data mismatch %h != %h ",dout_tdata, fifo[rd_ptr]);
+      err_count++;
+    end else begin
+      $display("CHK: Data    match %h == %h   --> %0d",dout_tdata, fifo[rd_ptr], data_count);
+    end
+    rd_ptr++;
+    data_count++;
+  end
+end
+
+endmodule
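
Note on `testbench.sv`: the checking logic (a 16-entry FIFO of expected labels with 4-bit read and write pointers) can be modelled in a few lines of Python to reason about its behaviour. This is a simplified sketch, not a cycle-accurate model: the `Scoreboard` class is hypothetical, and the labels and the single injected mismatch are made-up data rather than simulation results.

```python
class Scoreboard:
    """Software model of the testbench's expected-value FIFO and error counters."""

    DEPTH = 16  # matches `logic [7:0] fifo [16]` and the 4-bit pointers

    def __init__(self):
        self.fifo = [None] * self.DEPTH
        self.wr_ptr = 0
        self.rd_ptr = 0
        self.err_count = 0
        self.data_count = 0

    def push_expected(self, label):
        """Store the ground-truth byte, as done when the 29th row of a block is read."""
        self.fifo[self.wr_ptr] = label & 0xFF
        self.wr_ptr = (self.wr_ptr + 1) % self.DEPTH  # 4-bit pointer wraps

    def check_output(self, value):
        """Compare one valid model output against the oldest expected value."""
        if value != self.fifo[self.rd_ptr]:
            self.err_count += 1
        self.rd_ptr = (self.rd_ptr + 1) % self.DEPTH
        self.data_count += 1

    def finish(self):
        """End-of-sim check: unequal pointers mean outputs are missing or extra."""
        if self.wr_ptr != self.rd_ptr:
            self.err_count += 1
        return self.err_count


# Made-up data: 20 labels, with one output deliberately altered.
labels = [i % 10 for i in range(20)]
outputs = list(labels)
outputs[7] = (labels[7] + 1) % 10

board = Scoreboard()
for expected, produced in zip(labels, outputs):
    board.push_expected(expected)
    board.check_output(produced)
total_errors = board.finish()
```

One consequence of the wrapping pointers in this model is that more than 16 outstanding expectations would overwrite unread entries, so the check relies on outputs arriving before the FIFO fills.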