Commit cd7c4de2 authored by nstorni

MNIST dataset generation, Dataset statistics

parent 9a2a2188
@@ -46,13 +46,20 @@ cd $HOME
As frame extraction is quite slow, if you want to extract multiple videos and don't want to leave the SSH connection open, you can submit a job to the cluster that will extract the frames:
```bash
bsub -o job_logs -n 1 -R "rusage[mem=4096]" "cd $SCRATCH/data/mini_mice_dataset; pwd; $HOME/world-models/video_frame/video2frames.sh"
```
You can check the job status by typing:
```bash
bbjobs
```
Compute the normalisation statistics (mean and std):
```bash
cd $SCRATCH/data/$DATASET_NAME
module purge
module load gcc/4.8.5 python_cpu/3.7.1
python $HOME/world-models/utils/dataset_statistics.py --imagetype=RGB
```
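For intuition, the per-channel statistics the script writes out can be sketched in plain NumPy (a toy sketch with made-up data, not the script itself):

```python
import numpy as np

# Made-up batch: 4 RGB images, 8x8, values in [0, 1] (N x C x H x W).
rng = np.random.default_rng(0)
batch = rng.random((4, 3, 8, 8)).astype(np.float32)

# Per-channel mean over all images and pixels,
mean = batch.mean(axis=(0, 2, 3))
# then per-channel std around that mean (two passes, as in the script).
std = np.sqrt(((batch - mean[None, :, None, None]) ** 2).mean(axis=(0, 2, 3)))
```

These are the values a normalising transform would consume downstream.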
Move some images from the train to the test folder (not really random: it just takes all files whose name ends in 1, which should be about 10% of the train set).
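That split can be sketched in Python (the paths are hypothetical; the generation scripts in this repo use `train/nolabel` and `val/nolabel`, so adapt to your layout). It is not random: it moves every frame whose name ends in 1:

```python
import shutil
from pathlib import Path

# Hypothetical layout; adjust to where your dataset actually lives.
train = Path("train/nolabel")
val = Path("val/nolabel")
val.mkdir(parents=True, exist_ok=True)

# Frame names end in a digit, so taking every name ending in "1"
# (before the extension) peels off roughly 10% of the set.
for f in sorted(train.glob("*1.jpg")):
    shutil.move(str(f), str(val / f.name))
```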
@@ -74,29 +81,50 @@ cp world-models/template_config.json training_configs/train_config1.json
Modify train_config1.json for your training run, then you can test the configuration by starting the training on the local leonhard instance:
```bash
$HOME/world-models/start_training.sh --modeldir train_shvae.py --modelconfigdir $HOME/training_configs/train_config1.json
```
If there are no errors, interrupt the training with CTRL+C; you can now submit it to the cluster.
```bash
bsub -o job_logs -n 4 -R "rusage[ngpus_excl_p=1,mem=4096]" "$HOME/world-models/start_training.sh --modeldir trainvae.py --modelconfigdir $HOME/training_configs/train_config1.json"
```
## 4. Monitoring the job with tensorboard
You can monitor the training process with tensorboard; launch it with:
```bash
$HOME/world-models/utils/start_tensorboard.sh mini_mice_dataset
```
Forward the port used by tensorboard to view the training progress in your browser. Do this in a separate bash shell (change the port to the one tensorboard prints):
```bash
ssh -L 6993:localhost:6993 nstorni@login.leonhard.ethz.ch
```
You can then access tensorboard in your browser at localhost:6993.
The training job will save the models, samples and logs in the $SCRATCH/experiments folder.
## 5. Generate Moving MNIST dataset
You can generate a custom toy dataset of MNIST digits moving on a black background and bouncing off the borders with the following script:
```bash
$HOME/world-models/utils/generate_moving_mnist.sh --num_videos 10 --num_videos_val 300 --num_frames 10 --digits_dim 40 --frame_dim 256 --num_digits 1
```
- `--num_videos`: how many video sequences to generate
- `--num_frames`: how many frames per sequence
- `--digits_dim`: how large the digits are
- `--frame_dim`: how large the frame is
- `--num_digits`: how many digits appear in each frame
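The bouncing behaviour is plain reflection of the velocity at the frame borders; a 1-D sketch (not the generator's exact code, which tolerates a small 2 px overshoot):

```python
def step(pos, vel, lim):
    """Advance one frame; reflect the velocity when a border is hit."""
    pos += vel
    if pos < 0 or pos > lim:
        vel = -vel                    # bounce: reverse direction
        pos = max(0, min(pos, lim))   # clamp back inside the frame
    return pos, vel

# A digit starting at the left border with speed 3 in a 10-px range:
pos, vel = 0, 3
track = []
for _ in range(8):
    pos, vel = step(pos, vel, lim=10)
    track.append(pos)
# track == [3, 6, 9, 10, 7, 4, 1, 0]
```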
The script will create a folder named `DATASET_NAME=$CUSTOM_NAME"movingMNIST_""digits"$NUM_DIGITS"ddim"$DIGITS_DIM"fdim"$FRAME_DIM"v"$NUM_VIDEOS"f"$NUM_FRAMES` in `$SCRATCH/data`.
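The folder name is just the flags concatenated; the same naming scheme sketched in Python (the helper function is illustrative, not part of the repo):

```python
def dataset_name(custom_name="", num_digits=1, digits_dim=40,
                 frame_dim=256, num_videos=10, num_frames=10):
    # Mirrors the bash string concatenation in generate_moving_mnist.sh.
    return (f"{custom_name}movingMNIST_digits{num_digits}ddim{digits_dim}"
            f"fdim{frame_dim}v{num_videos}f{num_frames}")

name = dataset_name()
# name == "movingMNIST_digits1ddim40fdim256v10f10"
```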
The script already computes the normalisation statistics and puts them in the dataset folder.
For large datasets, run the script on the cluster. This probably won't work as-is, but something similar should:
```bash
ssh nstorni@login.leonhard.ethz.ch -L 6000:login.leonhard.ethz.ch:6008
bsub -o $HOME/job_logs -n 4 -R "rusage[mem=4096]" "$HOME/world-models/utils/generate_moving_mnist.sh --num_videos 10000 --num_videos_val 300 --num_frames 10 --digits_dim 40 --frame_dim 256"
```
(I used the forwarded-ports feature in Visual Studio Code; you can also use PuTTY.)
The training job will save the models, samples and logs in the $SCRATCH folder.
......
import math
import os
import sys
import numpy as np
from PIL import Image
###########################################################################################
# script to generate moving mnist video dataset (frame by frame) as described in
# [1] arXiv:1502.04681 - Unsupervised Learning of Video Representations Using LSTMs
# Srivastava et al
# by Tencia Lee
# saves in hdf5, npz, or jpg (individual frames) format
###########################################################################################
# helper functions
def arr_from_img(im, mean=0, std=1):
    '''
    Args:
        im: PIL Image
        mean: mean to subtract after scaling to [0, 1]
        std: standard deviation to divide by
    Returns:
        Image as an np.float32 array in channel x width x height format,
        normalised as ((x / 255) - mean) / std
    '''
width, height = im.size
arr = im.getdata()
c = int(np.product(arr.size) / (width * height))
return (np.asarray(arr, dtype=np.float32).reshape((height, width, c)).transpose(2, 1, 0) / 255. - mean) / std
def get_image_from_array(X, index, mean=0, std=1):
'''
Args:
X: Dataset of shape N x C x W x H
index: Index of image we want to fetch
        mean: Mean to add back
        std: Standard deviation to multiply by
Returns:
Image with dimensions H x W x C or H x W if it's a single channel image
'''
ch, w, h = X.shape[1], X.shape[2], X.shape[3]
    ret = (((X[index] * std) + mean) * 255.).reshape(ch, w, h).transpose(2, 1, 0).clip(0, 255).astype(np.uint8)
if ch == 1:
ret = ret.reshape(h, w)
return ret
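As a sanity check, normalising with `(x/255 - mean)/std` and inverting with `(x*std + mean)*255` are exact inverses up to rounding; a standalone sketch (not calling the helpers above, values made up):

```python
import numpy as np

x = np.arange(12, dtype=np.uint8).reshape(3, 4)
mean, std = 0.5, 0.25

# Forward normalisation, as in arr_from_img:
norm = (x.astype(np.float32) / 255. - mean) / std
# Inverse: multiply by std, add mean, rescale to [0, 255]:
back = ((norm * std + mean) * 255.).clip(0, 255).round().astype(np.uint8)
```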
# loads mnist from web on demand
def load_dataset(training=True):
if sys.version_info[0] == 2:
from urllib import urlretrieve
else:
from urllib.request import urlretrieve
def download(filename, source='http://yann.lecun.com/exdb/mnist/'):
print("Downloading %s" % filename)
urlretrieve(source + filename, filename)
import gzip
def load_mnist_images(filename):
if not os.path.exists(filename):
download(filename)
with gzip.open(filename, 'rb') as f:
data = np.frombuffer(f.read(), np.uint8, offset=16)
data = data.reshape(-1, 1, 28, 28).transpose(0, 1, 3, 2)
return data / np.float32(255)
if training:
return load_mnist_images('train-images-idx3-ubyte.gz')
return load_mnist_images('t10k-images-idx3-ubyte.gz')
def generate_moving_mnist(training, shape=(64, 64), num_frames=30, num_images=100, original_size=28, nums_per_image=2):
'''
Args:
training: Boolean, used to decide if downloading/generating train set or test set
shape: Shape we want for our moving images (new_width and new_height)
num_frames: Number of frames in a particular movement/animation/gif
num_images: Number of movement/animations/gif to generate
original_size: Real size of the images (eg: MNIST is 28x28)
nums_per_image: Digits per movement/animation/gif.
Returns:
        Dataset of np.uint8 type with dimensions num_images x num_frames x new_width x new_height
'''
mnist = load_dataset(training)
width, height = shape
# Get how many pixels can we move around a single image
lims = (x_lim, y_lim) = width - original_size, height - original_size
    # Create a dataset of shape num_images x num_frames x width x height
    # E.g.: 100 x 30 x 64 x 64
dataset = np.empty((num_images, num_frames, width, height), dtype=np.uint8)
for img_idx in range(num_images):
# Randomly generate direction, speed and velocity for both images
direcs = np.pi * (np.random.rand(nums_per_image) * 2 - 1)
speeds = np.random.randint(5, size=nums_per_image) + 2
veloc = np.asarray([(speed * math.cos(direc), speed * math.sin(direc)) for direc, speed in zip(direcs, speeds)])
# Get a list containing two PIL images randomly sampled from the database
mnist_images = [Image.fromarray(get_image_from_array(mnist, r, mean=0)).resize((original_size, original_size),
Image.ANTIALIAS) \
for r in np.random.randint(0, mnist.shape[0], nums_per_image)]
# Generate tuples of (x,y) i.e initial positions for nums_per_image (default : 2)
positions = np.asarray([(np.random.rand() * x_lim, np.random.rand() * y_lim) for _ in range(nums_per_image)])
        # Generate new frames for the entire sequence (num_frames of them)
for frame_idx in range(num_frames):
canvases = [Image.new('L', (width, height)) for _ in range(nums_per_image)]
canvas = np.zeros((1, width, height), dtype=np.float32)
# In canv (i.e Image object) place the image at the respective positions
# Super impose both images on the canvas (i.e empty np array)
for i, canv in enumerate(canvases):
canv.paste(mnist_images[i], tuple(positions[i].astype(int)))
canvas += arr_from_img(canv, mean=0)
# Get the next position by adding velocity
next_pos = positions + veloc
            # If the next position hits a wall (with a 2 px tolerance),
            # reflect the corresponding velocity component (change direction)
for i, pos in enumerate(next_pos):
for j, coord in enumerate(pos):
if coord < -2 or coord > lims[j] + 2:
veloc[i] = list(list(veloc[i][:j]) + [-1 * veloc[i][j]] + list(veloc[i][j + 1:]))
# Make the permanent change to position by adding updated velocity
positions = positions + veloc
# Add the canvas to the dataset array
dataset[img_idx, frame_idx] = (canvas * 255).clip(0, 255).astype(np.uint8)
return dataset
def main(training, dest, filetype='npz', frame_size=64, num_frames=30, num_images=100, original_size=28,
nums_per_image=2):
dat = generate_moving_mnist(training, shape=(frame_size, frame_size), num_frames=num_frames, num_images=num_images, \
original_size=original_size, nums_per_image=nums_per_image)
n = num_images * num_frames
if filetype == 'npz':
np.savez(dest, dat)
elif filetype == 'jpg':
print(dat.shape[0])
print(dat.shape[1])
for v in range(dat.shape[0]):
for f in range(dat.shape[1]):
# Image.fromarray(get_image_from_array(dat[v], f, mean=0)).save(os.path.join(dest, '{}_{}.jpg'.format(v,f)))
Image.fromarray(dat[v][f]).save(os.path.join(dest, '{}_{}.jpg'.format(v,f)))
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(description='Command line options')
parser.add_argument('--dest', type=str, dest='dest', default='movingmnistdata')
parser.add_argument('--filetype', type=str, dest='filetype', default="npz")
parser.add_argument('--training', type=bool, dest='training', default=True)
parser.add_argument('--frame_size', type=int, dest='frame_size', default=64)
parser.add_argument('--num_frames', type=int, dest='num_frames', default=30) # length of each sequence
parser.add_argument('--num_images', type=int, dest='num_images', default=20000) # number of sequences to generate
parser.add_argument('--original_size', type=int, dest='original_size',
default=28) # size of mnist digit within frame
parser.add_argument('--nums_per_image', type=int, dest='nums_per_image',
default=2) # number of digits in each frame
args = parser.parse_args(sys.argv[1:])
main(**{k: v for (k, v) in vars(args).items() if v is not None})
\ No newline at end of file
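One caveat with the parser above: argparse's `type=bool` does not do what it looks like, since `bool("False")` is `True` (any non-empty string is truthy). A minimal demonstration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--training', type=bool, default=True)

# Passing the literal string "False" still yields True:
args = parser.parse_args(['--training', 'False'])
# args.training == True
```

In practice the flag can only be disabled by passing an empty string, so treat `--training` as effectively always on unless the parser is changed.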
{
"exp_name" : "VAE training",
"exp_name" : "test",
"learning_rate" : 0.0003,
"batch_size" : 32,
"batch_size" : 64,
"noreload" : "True",
"reload_dir": "",
"epochs" : 20,
"samples" : "True",
"epochs" : 200,
"epochsamples": 4,
"vaesamples" : 8,
"vae_type": "Square",
"betavae":1,
"input_dim": [64,64],
"latent_dim": 10,
"latent_dim": 64,
"early_stopping": "True",
"weight_decay": "True",
"logdir" : "$SCRATCH/vae_training_outputs",
"logdir" : "$SCRATCH/experiments/vae",
"dataset_dir":"$SCRATCH/data/mini_mice_dataset",
"train_dataset_dir": "$SCRATCH/data/mini_mice_dataset/train",
"val_dataset_dir":"$SCRATCH/data/mini_mice_dataset/val",
"output_dir": ""
......
import json
from easydict import EasyDict
import os
import string
def get_dataset_statistics(json_file):
"""
Get the config from a json file
:param json_file: the path of the config file
:return: config(namespace), config(dictionary)
"""
# parse the configurations from the config json file provided
with open(json_file, 'r') as config_file:
try:
config_dict = json.load(config_file)
config = EasyDict(config_dict)
return config
except ValueError:
print("INVALID JSON file format.. Please provide a good json file")
exit(-1)
def get_config_from_json(json_file):
"""
@@ -9,7 +25,6 @@ def get_config_from_json(json_file):
:param json_file: the path of the config file
:return: config(namespace), config(dictionary)
"""
print(json_file)
# parse the configurations from the config json file provided
with open(json_file, 'r') as config_file:
try:
@@ -18,7 +33,12 @@ def get_config_from_json(json_file):
config = EasyDict(config_dict)
config.train_dataset_dir = os.path.expandvars(config.train_dataset_dir)
config.val_dataset_dir = os.path.expandvars(config.val_dataset_dir)
config.dataset_dir = os.path.expandvars(config.dataset_dir)
config.logdir = os.path.expandvars(config.logdir)
# Make experiment name legal filename:
valid_chars = "-_%s%s" % (string.ascii_letters, string.digits)
config.exp_name = ''.join(c for c in config.exp_name if c in valid_chars)
return config, config_dict
except ValueError:
print("INVALID JSON file format.. Please provide a good json file")
......
import torch
import torch.utils.data
from PIL import Image
from torchvision import transforms, datasets
import json
import argparse
parser = argparse.ArgumentParser(description='Dataset statistics')
parser.add_argument('--imagetype', type=str, default="RGB", help='What type of image: Grayscale, RGB or RGBA')
args = parser.parse_args()
# Custom loader for grayscale images
def grayscale_loader(path):
with open(path, 'rb') as f:
img = Image.open(f).convert("L")
return img
# Custom loader for images with mask in the alpha channel (RGBA)
def rgba_loader(path):
with open(path, 'rb') as f:
img = Image.open(f).convert("RGBA")
return img
dataset_dir = '.'
# Load correct dataset depending on image type.
if args.imagetype == "Grayscale":
dataset = datasets.ImageFolder(dataset_dir+'/train',loader=grayscale_loader, transform=transforms.Compose([
transforms.ToTensor()]))
if args.imagetype == "RGBA":
dataset = datasets.ImageFolder(dataset_dir+'/train',loader=rgba_loader, transform=transforms.Compose([
transforms.ToTensor()]))
if args.imagetype == "RGB":
dataset = datasets.ImageFolder(dataset_dir+'/train', transform=transforms.Compose([
transforms.ToTensor()]))
loader = torch.utils.data.DataLoader(dataset,
batch_size=32,
num_workers=8,
shuffle=False)
mean = 0.0
statistics = {
"means" : [],
"stds" : []
}
print("Computing mean")
for images, _ in loader:
batch_samples = images.size(0)
images = images.view(batch_samples, images.size(1), -1)
mean += images.mean(2).sum(0)
mean = mean / len(loader.dataset)
statistics["means"] = mean.cpu().numpy().tolist()
print(statistics["means"])
print("Computing std")
var = 0.0
image_size = 0
for images, _ in loader:
batch_samples = images.size(0)
images = images.view(batch_samples, images.size(1), -1)
var += ((images - mean.unsqueeze(1))**2).sum([0,2])
image_size = images.size(2)
std = torch.sqrt(var / (len(loader.dataset)*image_size))
statistics["stds"] = std.cpu().numpy().tolist()
print(statistics["stds"])
with open(dataset_dir+'/statistics.json', 'w') as outfile:
json.dump(statistics, outfile)
\ No newline at end of file
#!/bin/bash
# Make bash file executable chmod u+x filename.sh
echo "Loading modules"
module purge
module load gcc/4.8.5 python_cpu/3.7.1
python $HOME/world-models/utils/dataset_statistics.py
\ No newline at end of file
@@ -47,7 +47,8 @@ def load_dataset():
# generates and returns video frames in uint8 array
def generate_moving_mnist(shape=(64,64), seq_len=30, seqs=10000, num_sz=28, nums_per_image=2):
    # mnist = load_dataset()
    mnist = load_mnist_images('train-images-idx3-ubyte.gz')
width, height = shape
lims = (x_lim, y_lim) = width-num_sz, height-num_sz
dataset = np.empty((seq_len*seqs, 1, width, height), dtype=np.uint8)
......
#!/bin/bash
#Parameters
NUM_VIDEOS=1
NUM_FRAMES=10
NUM_DIGITS=1
DIGITS_DIM=45
FRAME_DIM=64
NUM_VIDEOS_VAL=1
CUSTOM_NAME=""
while [ "$1" != "" ]; do
case $1 in
--custom_name ) shift
CUSTOM_NAME=$1
;;
--num_videos ) shift
NUM_VIDEOS=$1
;;
--num_frames ) shift
NUM_FRAMES=$1
;;
--num_digits ) shift
NUM_DIGITS=$1
;;
--num_videos_val ) shift
NUM_VIDEOS_VAL=$1
;;
--digits_dim ) shift
DIGITS_DIM=$1
;;
--frame_dim ) shift
FRAME_DIM=$1
;;
-interactive ) interactive=1
;;
--help ) usage
exit
;;
* ) usage
exit 1
esac
shift
done
#Dataset name
DATASET_NAME=$CUSTOM_NAME"movingMNIST_""digits"$NUM_DIGITS"ddim"$DIGITS_DIM"fdim"$FRAME_DIM"v"$NUM_VIDEOS"f"$NUM_FRAMES
echo $DATASET_NAME
# Make bash file executable chmod u+x filename.sh
echo "Making dataset directories"
mkdir -p $SCRATCH/data/$DATASET_NAME/video
mkdir -p $SCRATCH/data/$DATASET_NAME/train/nolabel
mkdir -p $SCRATCH/data/$DATASET_NAME/val/nolabel
echo "Download dataset"
cp $SCRATCH/data/train-images-idx3-ubyte.gz $SCRATCH/data/$DATASET_NAME/.
cd $SCRATCH/data/$DATASET_NAME
echo "Loading modules"
module purge
module load gcc/4.8.5 python_cpu/3.7.1
echo "Generating train set"
python $HOME/world-models/data/moving_mnist.py --num_images $NUM_VIDEOS --num_frames $NUM_FRAMES --nums_per_image $NUM_DIGITS --filetype jpg --frame_size $FRAME_DIM --original_size $DIGITS_DIM --dest train/nolabel
echo "Generating validation set"
python $HOME/world-models/data/moving_mnist.py --num_images $NUM_VIDEOS_VAL --num_frames $NUM_FRAMES --nums_per_image $NUM_DIGITS --filetype jpg --frame_size $FRAME_DIM --original_size $DIGITS_DIM --dest val/nolabel
echo "Computing dataset statistics"
python $HOME/world-models/utils/dataset_statistics.py --imagetype=Grayscale
\ No newline at end of file
@@ -4,4 +4,4 @@ echo "Loading modules"
module purge
module load gcc/4.8.5 python_cpu/3.7.1
echo "Starting tensorboard"
tensorboard --logdir $SCRATCH/$1/tensorboard_logs --port=6993
\ No newline at end of file