diff --git a/docs/finn/internals.rst b/docs/finn/internals.rst
index 7a4bc687eeb827320991f7d3f1ef8cc35e97f3da..010cdece978cde078c3df4c64177fa1c5455aa0a 100644
--- a/docs/finn/internals.rst
+++ b/docs/finn/internals.rst
@@ -16,6 +16,9 @@ Custom Quantization Annotations
 
 ONNX does not support datatypes smaller than 8-bit integers, whereas in FINN we are interested in smaller integers down to ternary and bipolar. To make this work, FINN uses the quantization_annotation field in ONNX to annotate tensors with their FINN DataType (:py:mod:`finn.core.datatype.DataType`) information. However, all tensors are expected to use single-precision floating point (float32) storage in FINN. This means we store even a 1-bit value as floating point for the purposes of representation. The FINN compiler flow is responsible for eventually producing a packed representation for the target hardware, where the 1-bit is actually stored as 1-bit.
 
+Note that FINN uses floating point tensors as a carrier data type to represent integers. Floating point arithmetic can introduce rounding errors, e.g. (int_num * float_scale) / float_scale is not always equal to int_num.
+When using the custom ONNX execution flow, FINN will attempt to sanitize any rounding errors for integer tensors. See (:py:mod:`finn.util.basic.sanitize_quant_values`) for more information.
+
 Custom Operations/Nodes
 =======================
 
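The rounding behaviour described in the added note can be sketched with plain NumPy. This is only an illustration, not FINN's implementation: `roundtrip` and `snap_to_int` are hypothetical helpers written for this example, the tolerance `tol` is an assumed value, and nothing here reflects the actual signature or logic of `finn.util.basic.sanitize_quant_values`.

```python
import numpy as np


def roundtrip(int_vals, scale):
    """Compute (int_num * float_scale) / float_scale entirely in float32."""
    x = np.asarray(int_vals, dtype=np.float32)
    s = np.float32(scale)
    return (x * s) / s


def snap_to_int(values, tol=1e-3):
    """Hypothetical sanitizer: round near-integer float32 values to integers.

    Raises if a value is further than `tol` (an assumed threshold) from the
    nearest integer, since that is unlikely to be a mere rounding error.
    """
    values = np.asarray(values, dtype=np.float32)
    rounded = np.round(values)
    if np.any(np.abs(values - rounded) > tol):
        raise ValueError("value too far from an integer to be a rounding error")
    return rounded


# 8-bit signed integer range carried in float32 tensors
ints = np.arange(-127, 128)
scales = [0.1, 0.3, 0.7, 1.0 / 3.0, 0.0123]

for s in scales:
    back = roundtrip(ints, s)
    inexact = int(np.count_nonzero(back != ints))
    fixed = snap_to_int(back)
    remaining = int(np.count_nonzero(fixed != ints))
    print(f"scale={s:.4f}: {inexact} inexact round trips, {remaining} after snapping")
```

For small integers the round-trip error is a few units in the last place of a float32 value, so rounding to the nearest integer recovers the original value exactly. The explicit tolerance check is a design choice in this sketch: it keeps genuinely non-integer data from being silently rounded.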