---
layout: post
title:  "Rebuilding FINN for open source"
author: "Yaman Umuroglu"
---

We're happy to announce some exciting developments in the FINN project: we're rebuilding our solution stack from the ground up to be more modular, more usable and more open-source!

## A quick retrospective

Over the past few years, the team at Xilinx Research Labs Ireland has done quite a bit of research on Quantized Neural Networks (QNNs). Starting with Binarized Neural Networks (BNNs) on FPGAs back in 2016, we've since looked at many aspects of quantized deep learning, ranging from better quantization methods and mixing quantization with pruning, to accuracy-throughput tradeoffs and recurrent topologies.

Although some demonstrators of our work have been open source for some time, we want to take things a step further. We love QNNs and the high-performance, high-efficiency dataflow accelerators we can build for them on Xilinx FPGAs, and we want you and the FPGA/ML community to be able to do the same. The (co-)design process for making this happen is actually quite involved: it starts with customizing a neural network in a machine learning framework, goes through multiple design steps involving many optimizations, HLS code generation and Vivado synthesis, and ends with an FPGA bitstream that you can deploy as part of some application. Many of those steps still require manual effort, which makes a modular, flexible solution stack to support you through the process all the more helpful. This is why we are rebuilding our FINN solution stack from the ground up to make it more modular, and we hope to build a community around it that shares our excitement about QNNs for FPGAs.

## Making FINN modular

*(Figure: the layered FINN solution stack)*

The first step towards making this happen is to define the layers of the solution stack. In many ways, this stack is inspired by the tried-and-tested frontend/backend architecture found in compiler frameworks like LLVM. It breaks the complex co-design problem down into parts, with each layer focusing on a different sub-problem and consuming the artifacts produced by the previous one. The diagram on the left illustrates this briefly, and over the next few months we hope to take a first few QNNs through all the layers of this stack to produce cool FPGA dataflow accelerators. In fact, some of these components are already available today for you to explore!
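To make the layering concrete, here is a minimal pure-Python sketch of the pass-based structure described above: each transformation consumes the model produced by the previous one, mirroring the frontend/backend layering of compilers like LLVM. All names here are illustrative placeholders, not the actual FINN API.

```python
# Hypothetical sketch of a pass-based compiler pipeline. The "model" is
# just a dict here; real compilers thread a proper IR through the passes.

def fold_constants(model):
    # e.g. pre-compute subexpressions whose inputs never change at run time
    model = dict(model)
    model["constants_folded"] = True
    return model

def streamline(model):
    # e.g. move floating-point scale factors out of the datapath
    model = dict(model)
    model["streamlined"] = True
    return model

def apply_passes(model, passes):
    """Run each transformation in order, threading the model through."""
    for p in passes:
        model = p(model)
    return model

qnn = {"name": "my-quantized-net"}
qnn = apply_passes(qnn, [fold_constants, streamline])
```

Each layer of the real stack works the same way: it takes the previous layer's artifact (a trained network, an ONNX graph, generated HLS code) and produces the input for the next.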

Let's have a look at the main parts:

  • Brevitas is a PyTorch library that lets you do quantization-aware training. It gives you a set of torch.nn building blocks to explore different forms of weight, activation and accumulator quantization schemes. You can also learn the bitwidths for different layers with backpropagation! See the Brevitas page for more information.
  • Frontend. Once you are happy with the accuracy of your quantized neural network in Brevitas, you'll be able to export it into a custom ONNX representation that FINN uses internally to represent QNNs. More details about this custom ONNX representation will be available in an upcoming blog post.
  • The FINN Compiler will then import this ONNX representation and apply several optimization steps, such as the streamlining transform, to simplify the QNN.
  • The FPGA dataflow backend will then convert the optimized QNN into a series of streaming HLS library calls. An important part of the stack is the FINN HLS library, which provides optimized Vivado HLS descriptions of several common layer types (convolutions, thresholding, pooling...) found in QNNs.
  • Synthesis. Once the HLS calls are generated, the next steps are to call Vivado HLS and Vivado to generate a bitstream for the target Xilinx FPGA. We have plans to support Vivado IPI block design code generation as well for increased agility and modularity.
  • PYNQ deployment. Finally, you will be able to use any of the supported PYNQ platforms to directly call the generated accelerator from Python and integrate it with other functionality. Since FINN-generated dataflow accelerators expose streaming interfaces, we think it will be exciting to use streaming-oriented Python frameworks such as Ray to create heterogeneous, high-performance task graphs incorporating QNNs.
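As a taste of what the layer types in the FINN HLS library compute: a quantized activation can be expressed purely as threshold comparisons, so an n-bit activation reduces to counting how many of its 2^n - 1 thresholds the accumulator value meets. A minimal pure-Python sketch of this idea (the threshold values below are made up for illustration):

```python
def multithreshold(x, thresholds):
    """Quantized activation via threshold comparisons: the output level is
    the number of thresholds that x meets or exceeds. A 2-bit unsigned
    activation needs 2**2 - 1 = 3 thresholds."""
    return sum(1 for t in sorted(thresholds) if x >= t)

# Illustrative thresholds for a 2-bit activation:
thresholds = [0, 4, 8]
levels = [multithreshold(x, thresholds) for x in [-3, 1, 5, 20]]
# -3 -> 0, 1 -> 1, 5 -> 2, 20 -> 3
```

On the FPGA this turns an expensive floating-point activation into a handful of comparators, which is part of what makes dataflow QNN accelerators so efficient.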

## Getting started

More will be available in the coming weeks and months, but if you want to get your hands dirty there's already plenty to start with! If you haven't done so already, we recommend starting with BNN-PYNQ to see what dataflow QNN accelerators look and feel like. You can also start experimenting with Brevitas to train some QNNs, or put together a streaming pipeline with the FINN HLS library. We have also created a Gitter channel to make it easier to get in touch with the community, and hope to see many of you there! :)