diff --git a/notebooks/end2end_example/cybersecurity/1-train-mlp-with-brevitas.ipynb b/notebooks/end2end_example/cybersecurity/1-train-mlp-with-brevitas.ipynb
index 3792c5704bcff3600407522b530327ef48d53f6b..80d48beeebd61d9faa81ed9e47a33b04df23de7f 100644
--- a/notebooks/end2end_example/cybersecurity/1-train-mlp-with-brevitas.ipynb
+++ b/notebooks/end2end_example/cybersecurity/1-train-mlp-with-brevitas.ipynb
@@ -33,7 +33,7 @@
     "*The task:* The goal of [*network intrusion detection*](https://ieeexplore.ieee.org/abstract/document/283931) is to identify, preferably in real time, unauthorized use, misuse, and abuse of computer systems by both system insiders and external penetrators. This may be achieved by a mix of techniques, and machine-learning (ML) based techniques are increasing in popularity. \n",
     "\n",
     "*The dataset:* Several datasets are available for use in ML-based methods for intrusion detection.\n",
-    "The [UNSW-NB15](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/) is one such dataset created by the Australian Centre for Cyber Security (ACCS) to provide a comprehensive network based data set which can reflect modern network traffic scenarios. You can find more details about the dataset on [its homepage](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/).\n",
+    "**UNSW-NB15** is one such dataset, created by the Australian Centre for Cyber Security (ACCS) to provide a comprehensive network-based dataset that reflects modern network traffic scenarios. You can find more details about the dataset on [its homepage](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/).\n",
     "\n",
     "*Performance considerations:* FPGAs are commonly used for implementing high-performance packet processing systems that still provide a degree of programmability. To avoid introducing bottlenecks on the network, the DNN implementation must be capable of detecting malicious ones at line rate, which can be millions of packets per second, and is expected to increase further as next-generation networking solutions provide increased\n",
     "throughput. This is a good reason to consider FPGA acceleration for this particular use-case."
@@ -46,9 +46,8 @@
     "## Outline\n",
     "-------------\n",
     "\n",
-    "* [Initial setup](#initial_setup)\n",
-    "* [Load the UNSW_NB15 dataset](#load_dataset) \n",
-    "* [Define the Quantized MLP model](#define_quantized_mlp)\n",
+    "* [Load the UNSW_NB15 Dataset](#load_dataset) \n",
+    "* [Define the Quantized MLP Model](#define_quantized_mlp)\n",
     "* [Define Train and Test  Methods](#train_test)\n",
     "    * [(Option 1) Train the Model from Scratch](#train_scratch)\n",
     "    * [(Option 2) Load Pre-Trained Parameters](#load_pretrained)\n",
@@ -88,7 +87,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We will create a binarized representation for the dataset by following the procedure defined by [Murovic and Trost](https://ev.fe.uni-lj.si/1-2-2019/Murovic.pdf), which we repeat briefly here:\n",
+    "We will create a binarized representation for the dataset by following the procedure defined by Murovic and Trost, which we repeat briefly here:\n",
     "\n",
     "* Original features have different formats ranging from integers, floating numbers to strings.\n",
     "* Integers, which for example represent a packet lifetime, are binarized with as many bits as to include the maximum value. \n",
@@ -405,7 +404,7 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "Training loss = 0.132480 test accuracy = 0.797989: 100%|██████████| 10/10 [00:57<00:00,  5.79s/it]\n"
+      "Training loss = 0.132480 test accuracy = 0.797989: 100%|██████████| 10/10 [00:58<00:00,  5.70s/it]\n"
      ]
     }
    ],
@@ -466,7 +465,7 @@
    "outputs": [
     {
      "data": {
-      "image/png": "\n",
+      "image/png": "\n",
       "text/plain": [
        "<Figure size 432x288 with 1 Axes>"
       ]
@@ -479,7 +478,7 @@
    ],
    "source": [
     "acc_per_epoch = [np.mean(acc_per_epoch) for acc_per_epoch in running_test_acc]\n",
-    "display_loss_plot(acc_per_epoch, title=\"Training accuracy\", ylabel=\"Accuracy [%]\")"
+    "display_loss_plot(acc_per_epoch, title=\"Test accuracy\", ylabel=\"Accuracy [%]\")"
    ]
   },
   {
@@ -567,6 +566,13 @@
     "test(model, test_quantized_loader)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Why do these parameters give better accuracy than training from scratch?** Even with the topology and quantization fixed, achieving good accuracy on a given dataset requires [*hyperparameter tuning*](https://towardsdatascience.com/hyperparameters-optimization-526348bb8e2d) and potentially a long training run. The \"training from scratch\" example above is only intended as a quick demonstration, whereas the pretrained parameters were obtained from a longer training run using the [determined.ai](https://determined.ai/) platform for hyperparameter tuning."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -659,7 +665,7 @@
    "source": [
     "Next, we'll modify the expected input/output ranges. In FINN, we prefer to work with bipolar {-1, +1} instead of binary {0, 1} values. To achieve this, we'll create a \"wrapper\" model that handles the pre/postprocessing as follows:\n",
     "\n",
-    "* on the input side, we'll pre-process by (x + 1) / 2 in order to map incoming {-1, +1} inputs to {0, 1} ones which the trained network is used to. Since we're just multiplying/adding a scalar, these operations can be *streamlined* in FINN and implemented with no extra cost.\n",
+    "* on the input side, we'll pre-process by (x + 1) / 2 in order to map incoming {-1, +1} inputs to the {0, 1} values that the trained network expects. Since we're just adding and multiplying by scalars, these operations can be [*streamlined*](https://finn.readthedocs.io/en/latest/nw_prep.html#streamlining-transformations) by FINN and implemented with no extra cost.\n",
     "\n",
     "* on the output side, we'll add a binary quantizer which maps everthing below 0 to -1 and everything above 0 to +1. This is essentially the same behavior as the sigmoid we used earlier, except the outputs are bipolar instead of binary."
    ]
@@ -748,7 +754,8 @@
    "source": [
     "# Export to FINN-ONNX <a id=\"export_finn_onnx\" ></a>\n",
     "\n",
-    "FINN expects an ONNX model as input. We'll now export our network into ONNX to be imported and used in FINN for the next notebooks. Note that the particular ONNX representation used for FINN differs from standard ONNX, you can read more about this [here](https://finn.readthedocs.io/en/latest/internals.html#intermediate-representation-finn-onnx)."
+    "\n",
+    "[ONNX](https://onnx.ai/) is an open format built to represent machine learning models, and the FINN compiler expects an ONNX model as input. We'll now export our network into ONNX to be imported and used in FINN for the next notebooks. Note that the particular ONNX representation used for FINN differs from standard ONNX; you can read more about this [here](https://finn.readthedocs.io/en/latest/internals.html#intermediate-representation-finn-onnx)."
    ]
   },
   {
@@ -833,11 +840,11 @@
    "source": [
     "## View the Exported ONNX in Netron\n",
     "\n",
-    "Let's examine the exported ONNX model with Netron. Particular things of note:\n",
+    "Let's examine the exported ONNX model with [Netron](https://github.com/lutzroeder/netron), which is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on the individual nodes and view the properties. Particular things of note:\n",
     "\n",
     "* The input tensor \"0\" is annotated with `quantization: finn_datatype: BIPOLAR`\n",
     "* The input preprocessing (x + 1) / 2 is exported as part of the network (initial `Add` and `Div` layers)\n",
-    "* We've exported the padded version; shape of the first MatMul node's weight parameter is 600x64\n",
+    "* Brevitas `QuantLinear` layers are exported to ONNX as `MatMul`. We've exported the padded version; the shape of the first `MatMul` node's weight parameter is 600x64\n",
     "* The weight parameters (second inputs) for MatMul nodes are annotated with `quantization: finn_datatype: INT2`\n",
     "* The quantized activations are exported as `MultiThreshold` nodes with `domain=finn.custom_op.general`\n",
     "* There's a final `MultiThreshold` node with threshold=0 to produce the final bipolar output (this is the `qnt_output` from `CybSecMLPForExport`"
@@ -869,7 +876,7 @@
        "        "
       ],
       "text/plain": [
-       "<IPython.lib.display.IFrame at 0x7f808a61f438>"
+       "<IPython.lib.display.IFrame at 0x7f77214fa630>"
       ]
      },
      "execution_count": 27,