%% Cell type:markdown id: tags:
# Train a Quantized MLP on UNSW-NB15 with Brevitas
%% Cell type:markdown id: tags:
<font color="red">**FPGA'21 tutorial:** We recommend clicking **Cell -> Run All** when you start reading this notebook for "latency hiding".</font>
%% Cell type:markdown id: tags:
In this notebook, we will show how to create, train and export a Multi-Layer Perceptron (MLP) with quantized weights and activations using [Brevitas](https://github.com/Xilinx/brevitas).
Specifically, the task at hand will be to label network packets as normal or suspicious (e.g. originating from an attacker, virus, malware or otherwise) by training on a quantized variant of the UNSW-NB15 dataset.
**You won't need a GPU to train the neural net.** This MLP will be small enough to train on a modern x86 CPU, so no GPU is required to follow this tutorial. Alternatively, we provide pre-trained parameters for the MLP if you want to skip the training entirely.
%% Cell type:markdown id: tags:
## A quick introduction to the task and the dataset
*The task:* The goal of [*network intrusion detection*](https://ieeexplore.ieee.org/abstract/document/283931) is to identify, preferably in real time, unauthorized use, misuse, and abuse of computer systems by both system insiders and external penetrators. This may be achieved by a mix of techniques, and machine-learning (ML) based techniques are increasing in popularity.
*The dataset:* Several datasets are available for use in ML-based methods for intrusion detection.
The **UNSW-NB15** dataset is one such dataset, created by the Australian Centre for Cyber Security (ACCS) to provide a comprehensive, network-based dataset reflecting modern network traffic scenarios. You can find more details about the dataset on [its homepage](https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/).
*Performance considerations:* FPGAs are commonly used for implementing high-performance packet processing systems that still provide a degree of programmability. To avoid introducing bottlenecks on the network, the DNN implementation must be capable of detecting malicious packets at line rate, which can be millions of packets per second and is expected to increase further as next-generation networking solutions provide higher throughput. This is a good reason to consider FPGA acceleration for this particular use case.
%% Cell type:markdown id: tags:
## Outline
-------------
* [Load the UNSW_NB15 Dataset](#load_dataset)
* [Define the Quantized MLP Model](#define_quantized_mlp)
* [Define Train and Test Methods](#train_test)
* [(Option 1) Train the Model from Scratch](#train_scratch)
* [(Option 2) Load Pre-Trained Parameters](#load_pretrained)
* [Network Surgery Before Export](#network_surgery)
* [Export to FINN-ONNX](#export_finn_onnx)
%% Cell type:code id: tags:
``` python
import onnx
import torch
```
%% Cell type:markdown id: tags:
**This is important -- always import onnx before torch**. This is a workaround for a [known bug](https://github.com/onnx/onnx/issues/2394).
%% Cell type:markdown id: tags:
## Load the UNSW_NB15 Dataset <a id='load_dataset'></a>
### Dataset Quantization <a id='dataset_qnt'></a>
The goal of this notebook is to train a Quantized Neural Network (QNN) to be later deployed as an FPGA accelerator generated by the FINN compiler. Although we can choose a variety of different precisions for the input, [Murovic and Trost](https://ev.fe.uni-lj.si/1-2-2019/Murovic.pdf) have previously shown we can actually binarize the inputs and still get good (90%+) accuracy.
%% Cell type:markdown id: tags:
We will create a binarized representation for the dataset by following the procedure defined by Murovic and Trost, which we repeat briefly here:
* Original features have different formats, ranging from integers and floating-point numbers to strings.
* Integers, which for example represent a packet lifetime, are binarized with as many bits as needed to represent the maximum value.
* Features formatted as strings (e.g. protocols) are binarized by counting the number of distinct values for each feature and encoding them in the appropriate number of bits.
* Floating-point numbers are reformatted into fixed-point representation.
* In the end, each sample is transformed into a 593-bit wide binary vector.
* All vectors are labeled as bad (0) or normal (1).
Following Murovic and Trost's open-source implementation provided as a Matlab script [here](https://github.com/TadejMurovic/BNN_Deployment/blob/master/cybersecurity_dataset_unswb15.m), we've created a [Python version](dataloader_quantized.py).
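As a rough illustration of this procedure (a minimal sketch with made-up feature values and hypothetical helper functions, not the actual code in `dataloader_quantized.py`), an integer feature and a string-valued feature could be binarized like this:

``` python
import numpy as np

# Illustrative only: binarize one integer feature and one categorical feature.
def int_to_bits(value, max_value):
    # use as many bits as needed to represent the largest value of this feature
    n_bits = int(np.ceil(np.log2(max_value + 1)))
    return [int(b) for b in np.binary_repr(value, width=n_bits)]

def category_to_bits(value, all_values):
    # encode a string feature by its index, using enough bits for all categories
    n_bits = int(np.ceil(np.log2(len(all_values))))
    return [int(b) for b in np.binary_repr(all_values.index(value), width=n_bits)]

print(int_to_bits(5, max_value=255))                    # 8 bits: [0, 0, 0, 0, 0, 1, 0, 1]
print(category_to_bits("udp", ["tcp", "udp", "icmp"]))  # 2 bits: [0, 1]
```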
<font color="red">**FPGA'21 tutorial:** Downloading the original dataset and quantizing it can take some time, so we provide a pre-quantized version for your convenience. The prequantized dataset (unsw_nb15_binarized.npz) should already be available on the AWS instance. If not, uncomment the following line to download:</font>
<font color="red">**FPGA'21 tutorial:** Downloading the original dataset and quantizing it can take some time, so we provide a download link to the pre-quantized version for your convenience. </font>
%% Cell type:code id: tags:
``` python
# download the pre-quantized UNSW-NB15 dataset from Zenodo
! wget -O unsw_nb15_binarized.npz https://zenodo.org/record/4519767/files/unsw_nb15_binarized.npz?download=1
```
%% Output
--2021-02-24 16:57:33-- https://zenodo.org/record/4519767/files/unsw_nb15_binarized.npz?download=1
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13391907 (13M) [application/octet-stream]
Saving to: 'unsw_nb15_binarized.npz'
unsw_nb15_binarized 100%[===================>] 12.77M 2.17MB/s in 8.9s
2021-02-24 16:57:44 (1.44 MB/s) - 'unsw_nb15_binarized.npz' saved [13391907/13391907]
%% Cell type:markdown id: tags:
We can extract the binarized numpy arrays from the .npz archive and wrap them as a PyTorch `TensorDataset` as follows:
%% Cell type:code id: tags:
``` python
import numpy as np
from torch.utils.data import TensorDataset
def get_preqnt_dataset(data_dir: str, train: bool):
unsw_nb15_data = np.load(data_dir + "/unsw_nb15_binarized.npz")
if train:
partition = "train"
else:
partition = "test"
part_data = unsw_nb15_data[partition].astype(np.float32)
part_data = torch.from_numpy(part_data)
part_data_in = part_data[:, :-1]
part_data_out = part_data[:, -1]
return TensorDataset(part_data_in, part_data_out)
train_quantized_dataset = get_preqnt_dataset(".", True)
test_quantized_dataset = get_preqnt_dataset(".", False)
print("Samples in each set: train = %d, test = %s" % (len(train_quantized_dataset), len(test_quantized_dataset)))
print("Shape of one input sample: " + str(train_quantized_dataset[0][0].shape))
```
%% Output
Samples in each set: train = 175341, test = 82332
Shape of one input sample: torch.Size([593])
%% Cell type:markdown id: tags:
## Set up DataLoader
We now have access to the quantized dataset. We will wrap it in a PyTorch `DataLoader` for easier access in batches.
%% Cell type:code id: tags:
``` python
from torch.utils.data import DataLoader, Dataset
batch_size = 1000
# dataset loaders
train_quantized_loader = DataLoader(train_quantized_dataset, batch_size=batch_size, shuffle=True)
test_quantized_loader = DataLoader(test_quantized_dataset, batch_size=batch_size, shuffle=False)
```
%% Cell type:code id: tags:
``` python
count = 0
for x,y in train_quantized_loader:
print("Input shape for 1 batch: " + str(x.shape))
print("Label shape for 1 batch: " + str(y.shape))
count += 1
if count == 1:
break
```
%% Output
Input shape for 1 batch: torch.Size([1000, 593])
Label shape for 1 batch: torch.Size([1000])
%% Cell type:markdown id: tags:
# Define the Quantized MLP Model <a id='define_quantized_mlp'></a>
We'll now define an MLP model that will be trained to perform inference with quantized weights and activations.
For this, we'll use the quantization-aware training (QAT) capabilities offered by [Brevitas](https://github.com/Xilinx/brevitas).
Our MLP will have four fully-connected (FC) layers in total: three hidden layers with 64 neurons each, and a final output layer with a single output, all using 2-bit weights. We'll use 2-bit quantized ReLU activation functions, and apply batch normalization between each FC layer and its activation.
In case you'd like to experiment with different quantization settings or topology parameters, we'll define all these topology settings as variables.
%% Cell type:code id: tags:
``` python
input_size = 593
hidden1 = 64
hidden2 = 64
hidden3 = 64
weight_bit_width = 2
act_bit_width = 2
num_classes = 1
```
%% Cell type:markdown id: tags:
Now we can define our MLP using the layer primitives provided by Brevitas:
%% Cell type:code id: tags:
``` python
from brevitas.nn import QuantLinear, QuantReLU
import torch.nn as nn
# Setting seeds for reproducibility
torch.manual_seed(0)
model = nn.Sequential(
QuantLinear(input_size, hidden1, bias=True, weight_bit_width=weight_bit_width),
nn.BatchNorm1d(hidden1),
nn.Dropout(0.5),
QuantReLU(bit_width=act_bit_width),
QuantLinear(hidden1, hidden2, bias=True, weight_bit_width=weight_bit_width),
nn.BatchNorm1d(hidden2),
nn.Dropout(0.5),
QuantReLU(bit_width=act_bit_width),
QuantLinear(hidden2, hidden3, bias=True, weight_bit_width=weight_bit_width),
nn.BatchNorm1d(hidden3),
nn.Dropout(0.5),
QuantReLU(bit_width=act_bit_width),
QuantLinear(hidden3, num_classes, bias=True, weight_bit_width=weight_bit_width)
)
```
%% Cell type:markdown id: tags:
Note that the MLP's output is not yet quantized. Even though we want the final output of our MLP to be a binary (0/1) value indicating the classification, we've only defined a single-neuron FC layer as the output. While training the network we'll pass that output through a sigmoid function as part of the loss criterion, which [gives better numerical stability](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html). Later on, after we're done training the network, we'll add a quantization node at the end before we export it to FINN.
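To see why, note that `BCEWithLogitsLoss` fuses the sigmoid into the loss computation. The snippet below (a standalone illustration with made-up logits and targets, not part of the training flow) checks that it matches applying the sigmoid manually followed by `BCELoss`, just computed in a more numerically stable way:

``` python
import torch
import torch.nn as nn

logits = torch.tensor([[2.5], [-1.0], [0.3]])   # raw single-neuron outputs
targets = torch.tensor([[1.0], [0.0], [1.0]])   # ground-truth labels

loss_fused = nn.BCEWithLogitsLoss()(logits, targets)
loss_manual = nn.BCELoss()(torch.sigmoid(logits), targets)
print(loss_fused.item(), loss_manual.item())    # (nearly) identical values
```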
%% Cell type:markdown id: tags:
# Define Train and Test Methods <a id='train_test'></a>
The train and test methods will use a `DataLoader`, which feeds the model with a new predefined batch of training data in each iteration, until the entire training data is fed to the model. Each repetition of this process is called an `epoch`.
%% Cell type:code id: tags:
``` python
def train(model, train_loader, optimizer, criterion):
losses = []
# ensure model is in training mode
model.train()
for i, data in enumerate(train_loader, 0):
inputs, target = data
optimizer.zero_grad()
# forward pass
output = model(inputs.float())
loss = criterion(output, target.unsqueeze(1))
# backward pass + run optimizer to update weights
loss.backward()
optimizer.step()
# keep track of loss value
losses.append(loss.data.numpy())
return losses
```
%% Cell type:code id: tags:
``` python
import torch
from sklearn.metrics import accuracy_score
def test(model, test_loader):
# ensure model is in eval mode
model.eval()
y_true = []
y_pred = []
with torch.no_grad():
for data in test_loader:
inputs, target = data
output_orig = model(inputs.float())
# run the output through sigmoid
output = torch.sigmoid(output_orig)
# compare against a threshold of 0.5 to generate 0/1
pred = (output.detach().numpy() > 0.5) * 1
target = target.float()
y_true.extend(target.tolist())
y_pred.extend(pred.reshape(-1).tolist())
return accuracy_score(y_true, y_pred)
```
%% Cell type:markdown id: tags:
# Train the QNN <a id="train_qnn"></a>
We provide two options for training below: you can opt for training the model from scratch (slower) or use a pre-trained model (faster). The first option will give more insight into how the training process works, while the second option will likely give better accuracy.
%% Cell type:markdown id: tags:
## (Option 1, slower) Train the Model from Scratch <a id="train_scratch"></a>
%% Cell type:markdown id: tags:
Before we start training our MLP we need to define some hyperparameters, as well as a helper method to plot how the loss evolves over the epochs. As mentioned earlier, we'll use a loss criterion that applies the sigmoid as part of the loss computation during training (`BCEWithLogitsLoss`). For the testing phase, we manually compute the sigmoid and threshold at 0.5.
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
num_epochs = 10
lr = 0.001
def display_loss_plot(losses, title="Training loss", xlabel="Iterations", ylabel="Loss"):
x_axis = [i for i in range(len(losses))]
plt.plot(x_axis,losses)
plt.title(title)
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.show()
```
%% Cell type:code id: tags:
``` python
# loss criterion and optimizer
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))
```
%% Cell type:code id: tags:
``` python
import numpy as np
from sklearn.metrics import accuracy_score
from tqdm import tqdm, trange
# Setting seeds for reproducibility
torch.manual_seed(0)
np.random.seed(0)
running_loss = []
running_test_acc = []
t = trange(num_epochs, desc="Training loss", leave=True)
for epoch in t:
loss_epoch = train(model, train_quantized_loader, optimizer,criterion)
test_acc = test(model, test_quantized_loader)
t.set_description("Training loss = %f test accuracy = %f" % (np.mean(loss_epoch), test_acc))
t.refresh() # to show immediately the update
running_loss.append(loss_epoch)
running_test_acc.append(test_acc)
```
%% Output
Training loss = 0.132480 test accuracy = 0.797989: 100%|██████████| 10/10 [00:58<00:00, 5.70s/it]
%% Cell type:code id: tags:
``` python
%matplotlib inline
import matplotlib.pyplot as plt
loss_per_epoch = [np.mean(loss_per_epoch) for loss_per_epoch in running_loss]
display_loss_plot(loss_per_epoch)
```
%% Output
%% Cell type:code id: tags:
``` python
acc_per_epoch = [np.mean(acc_per_epoch) for acc_per_epoch in running_test_acc]
display_loss_plot(acc_per_epoch, title="Test accuracy", ylabel="Accuracy")
```
%% Output
%% Cell type:code id: tags:
``` python
test(model, test_quantized_loader)
```
%% Output
0.7979886313948404
%% Cell type:code id: tags:
``` python
# Save the Brevitas model to disk
torch.save(model.state_dict(), "state_dict_self-trained.pth")
```
%% Cell type:markdown id: tags:
## (Option 2, faster) Load Pre-Trained Parameters <a id="load_pretrained"></a>
Instead of training from scratch, you can also use pre-trained parameters we provide here. These parameters should achieve ~91.9% test accuracy.
%% Cell type:code id: tags:
``` python
import torch
trained_state_dict = torch.load("state_dict.pth")["models_state_dict"][0]
model.load_state_dict(trained_state_dict, strict=False)
```
%% Output
IncompatibleKeys(missing_keys=[], unexpected_keys=[])
%% Cell type:code id: tags:
``` python
test(model, test_quantized_loader)
```
%% Output
0.9188772287810328
%% Cell type:markdown id: tags:
**Why do these parameters give better accuracy vs training from scratch?** Even with the topology and quantization fixed, achieving good accuracy on a given dataset requires [*hyperparameter tuning*](https://towardsdatascience.com/hyperparameters-optimization-526348bb8e2d) and potentially running training for a long time. The "training from scratch" example above is only intended as a quick example, whereas the pretrained parameters are obtained from a longer training run using the [determined.ai](https://determined.ai/) platform for hyperparameter tuning.
%% Cell type:markdown id: tags:
# Network Surgery Before Export <a id="network_surgery"></a>
Sometimes, it's desirable to make some changes to our trained network prior to export (this is known in general as "network surgery"). This depends on the model and is not generally necessary, but in this case we want to make a couple of changes to get better results with FINN.
%% Cell type:markdown id: tags:
Let's start by padding the input. Our input vectors are 593 bits wide, which will make folding (parallelization) of the first layer a bit tricky since 593 is a prime number. So we'll pad the weight matrix of the first layer with seven 0-valued columns and work with an input size of 600 instead. When using the modified network we'll similarly provide inputs padded to 600 bits; we'll sanity-check below that this leaves the layer's output unchanged.
%% Cell type:code id: tags:
``` python
from copy import deepcopy
modified_model = deepcopy(model)
W_orig = modified_model[0].weight.data.detach().numpy()
W_orig.shape
```
%% Output
(64, 593)
%% Cell type:code id: tags:
``` python
import numpy as np
# pad the second (593-sized) dimensions with 7 zeroes at the end
W_new = np.pad(W_orig, [(0,0), (0,7)])
W_new.shape
```
%% Output
(64, 600)
%% Cell type:code id: tags:
``` python
modified_model[0].weight.data = torch.from_numpy(W_new)
modified_model[0].weight.shape
```
%% Output
torch.Size([64, 600])
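%% Cell type:markdown id: tags:
As a quick sanity check (a minimal sketch using the raw floating-point weights rather than the full quantized forward pass), the seven extra zero-valued weight columns only ever multiply the seven zero-padded input elements, so the layer's pre-activation output is unchanged:
%% Cell type:code id: tags:
``` python
# random binary input, padded with 7 zeros, gives the same result with W_new
x = np.random.randint(0, 2, size=(1, 593)).astype(np.float32)
x_padded = np.pad(x, [(0, 0), (0, 7)])
assert np.allclose(x @ W_orig.T, x_padded @ W_new.T)
```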
%% Cell type:markdown id: tags:
Next, we'll modify the expected input/output ranges. In FINN, we prefer to work with bipolar {-1, +1} instead of binary {0, 1} values. To achieve this, we'll create a "wrapper" model that handles the pre/postprocessing as follows:
* on the input side, we'll pre-process with (x + 1) / 2 in order to map incoming {-1, +1} inputs to the {0, 1} range the trained network expects. Since we're just multiplying/adding a scalar, these operations can be [*streamlined*](https://finn.readthedocs.io/en/latest/nw_prep.html#streamlining-transformations) by FINN and implemented with no extra cost.
* on the output side, we'll add a binary quantizer which maps everything below 0 to -1 and everything above 0 to +1. This is essentially the same decision rule as the sigmoid-with-0.5-threshold we used earlier, except the outputs are bipolar instead of binary.
%% Cell type:code id: tags:
``` python
from brevitas.core.quant import QuantType
from brevitas.nn import QuantIdentity
class CybSecMLPForExport(nn.Module):
def __init__(self, my_pretrained_model):
super(CybSecMLPForExport, self).__init__()
self.pretrained = my_pretrained_model
self.qnt_output = QuantIdentity(quant_type=QuantType.BINARY, bit_width=1, min_val=-1.0, max_val=1.0)
def forward(self, x):
# assume x contains bipolar {-1,1} elems
# shift from {-1,1} -> {0,1} since that is the
# input range for the trained network
x = (x + torch.tensor([1.0])) / 2.0
out_original = self.pretrained(x)
out_final = self.qnt_output(out_original) # output as {-1,1}
return out_final
model_for_export = CybSecMLPForExport(modified_model)
```
%% Cell type:code id: tags:
``` python
def test_padded_bipolar(model, test_loader):
# ensure model is in eval mode
model.eval()
y_true = []
y_pred = []
with torch.no_grad():
for data in test_loader:
inputs, target = data
# pad inputs to 600 elements
input_padded = np.pad(inputs, [(0,0), (0,7)])
# convert inputs to {-1,+1}
input_scaled = 2*input_padded - 1
# run the model
output = model(torch.from_numpy(input_scaled).float())
y_pred.extend(list(output.flatten()))
# make targets bipolar {-1,+1}
expected = 2*target.float() - 1
expected = expected.detach().numpy()
y_true.extend(list(expected.flatten()))
return accuracy_score(y_true, y_pred)
```
%% Cell type:code id: tags:
``` python
test_padded_bipolar(model_for_export, test_quantized_loader)
```
%% Output
0.9188772287810328
%% Cell type:markdown id: tags:
# Export to FINN-ONNX <a id="export_finn_onnx" ></a>
[ONNX](https://onnx.ai/) is an open format built to represent machine learning models, and the FINN compiler expects an ONNX model as input. We'll now export our network into ONNX to be imported and used in FINN for the next notebooks. Note that the particular ONNX representation used for FINN differs from standard ONNX; you can read more about this [here](https://finn.readthedocs.io/en/latest/internals.html#intermediate-representation-finn-onnx).
%% Cell type:code id: tags:
``` python
import brevitas.onnx as bo
export_onnx_path = "cybsec-mlp.onnx"
input_shape = (1, 600)
bo.export_finn_onnx(model_for_export, input_shape, export_onnx_path)
print("Model saved to %s" % export_onnx_path)
```
%% Output
Model saved to cybsec-mlp.onnx
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:15: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
from ipykernel import kernelapp as app
%% Cell type:markdown id: tags:
## One final fix: input datatype
There's one more thing we'll do: we will mark the input tensor datatype as `DataType.BIPOLAR`, which will be used by the compiler later on. To do this, we'll utilize the `ModelWrapper` component from FINN, which lets us examine and manipulate the ONNX graph in an easier way.
*In the near future it will be possible to add this information to the model [while exporting](https://github.com/Xilinx/brevitas/issues/232), instead of having to add it manually.*
%% Cell type:code id: tags:
``` python
from finn.core.modelwrapper import ModelWrapper
from finn.core.datatype import DataType
finn_model = ModelWrapper(export_onnx_path)
finnonnx_in_tensor_name = finn_model.graph.input[0].name
finnonnx_model_in_shape = finn_model.get_tensor_shape(finnonnx_in_tensor_name)
finn_model.set_tensor_datatype(finnonnx_in_tensor_name, DataType.BIPOLAR)
print("Input tensor name: %s" % finnonnx_in_tensor_name)
print("Input tensor shape: %s" % str(finnonnx_model_in_shape))
print("Input tensor datatype: %s" % str(finn_model.get_tensor_datatype(finnonnx_in_tensor_name)))
ready_model_filename = "cybsec-mlp-ready.onnx"
finn_model.save(ready_model_filename)
```
%% Output
Input tensor name: 0
Input tensor shape: [1, 600]
Input tensor datatype: DataType.BIPOLAR
%% Cell type:markdown id: tags:
## View the Exported ONNX in Netron
Let's examine the exported ONNX model with [Netron](https://github.com/lutzroeder/netron), which is a visualizer for neural networks and allows interactive investigation of network properties. For example, you can click on the individual nodes and view the properties. Particular things of note:
* The input tensor "0" is annotated with `quantization: finn_datatype: BIPOLAR`
* The input preprocessing (x + 1) / 2 is exported as part of the network (initial `Add` and `Div` layers)
* Brevitas `QuantLinear` layers are exported to ONNX as `MatMul`. We've exported the padded version, so the shape of the first MatMul node's weight parameter is 600x64
* The weight parameters (second inputs) for MatMul nodes are annotated with `quantization: finn_datatype: INT2`
* The quantized activations are exported as `MultiThreshold` nodes with `domain=finn.custom_op.general`
* There's a final `MultiThreshold` node with threshold=0 to produce the final bipolar output (this is the `qnt_output` from `CybSecMLPForExport`)
%% Cell type:code id: tags:
``` python
from finn.util.visualization import showInNetron
showInNetron(ready_model_filename)
```
%% Output
Serving 'cybsec-mlp-ready.onnx' at http://0.0.0.0:8081
<IPython.lib.display.IFrame at 0x7f77214fa630>
%% Cell type:markdown id: tags:
## That's it! <a id="thats_it" ></a>
You created, trained and tested a quantized MLP that is ready to be loaded into FINN, congratulations! You can now proceed to the next notebook.
%% Cell type:code id: tags:
``` python
```