
Commit b703fbc3 authored by stehess

Initial commit

parent 5e812841
# Data files and directories common in repo root
# Byte-compiled / optimized / DLL files
# Distribution / packaging
# Installer logs
# VS Studio Code
# PyCharm
# Dropbox
# Jupyter Notebook
# pyenv
# dotenv
# virtualenv
# cGOM
# cGOM
## Automating Areas Of Interest Analysis in Mobile Eye Tracking Experiments based on Machine Learning
### User Guide
1. Install the tool. To do so, clone the public git repo to your target directory from
Note that Python and pip must already be installed.
A source code editor such as Atom is very helpful for browsing the source code.
Then install all requirements:
$ pip install -r requirements.txt
2. The following folders should exist in the git directory:
- data_sets
Datasets for training and validation can be added to this folder. Each set must follow the structure of the included examples.
- gaze
Contains the text file with the fixation data. Note that it must have the same name as the corresponding video file.
- images
An empty folder that serves as the target directory for the images extracted later.
- labels
Contains a json file with the labels. This is automatically generated.
- misc
Contains functions to extract frames from a video, create masked videos, or manually label images.
- mrcnn
Contains everything responsible for mask R-CNN to work properly.
- toolbox
Contains all functions described in this work.
- videos
Contains a video from the eye tracking camera. Note that it must have the same name as the corresponding gaze file.
- weights
Contains the COCO weights, and two other files containing the weights of the partially trained agent and the fully trained agent.
3. Navigate to the misc folder and run the frame extraction script with the corresponding parameters to extract frames from a video. Note that a gaze file can be passed so that frames are taken only from fixations.
$ python --video_path ../videos/Name-of-the-video.avi --output_dir ../images --num_images 60 --gaze_path ../gaze/Name-of-the-gaze-file.txt
4. The folder images should now contain around 100 frames. Navigate into this folder and create a new folder in it called Object1_Object2. Drag and drop all images into that folder. Note that Object1 and Object2 are placeholders and should be replaced by the names of your objects of interest. You may include more objects of interest by following the same pattern, e.g. Obj1_Obj2_Obj3. This procedure is necessary to use the included labelling tool for the training images in step 5.
5. Navigate to the folder toolbox and inspect all the default configuration files. Make changes if required. Then call the function
$ python
Note that this tool currently only works on Linux or macOS. If you are using Windows, please continue with step 8.
6. A window with an image and several sliders will appear. Each slider can be set to 0, 1, or 2. Adjust the sliders so that all masks corresponding to object1 are set to 1 and all masks corresponding to object2 are set to 2.
Setting one mask to 0 disables that mask, and setting all masks to 0 moves the image to a separate unmasks folder. Note that masks must exist for all objects in an image; otherwise the image should be discarded to the unmasks folder, since incomplete masks can cause difficulties during the learning process. You should be able to label about half of all images with this method.
7. Now navigate to the data_sets folder. There should be a new subfolder called Object1_Object2. Extract the folder unmasks and place it at a desired destination.
8. Open via.html from misc and load about 90% of the images from unmasks. These images form the training set.
9. On the first frame draw a polygon around the first object. Then open Region Attributes and replace [Add New] with label. Then label the newly created polygon with either cup or pen.
10. Repeat the above process for all images. Finally click on Save as JSON under Annotation.
11. Repeat steps 9 and 10 and create a validation set with the remaining images.
12. Navigate back to data_sets and create a new folder Object1_Object2_1 in it. In Object1_Object2_1, place the two folders train and val containing the previously labeled images together with the corresponding via_region_data.json files. The resulting folder structure should look similar to that of Object1_Object2. Also check the content of the via_region_data.json files.
13. Now call the training function from the toolbox:
$ python
14. Once training is over, extract the weights file from the newly created log folder, rename it according to your preferences, and place it in the weights folder.
15. Now call the inference function:
$ python
This will generate a new folder outputs, in which you can find the results of the method.
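Before training in step 13, the layout produced in steps 11 and 12 can be verified with a few lines of Python. The following is a minimal sketch and not part of the toolbox: the function name `check_dataset` is hypothetical, and it only checks the two requirements stated in this guide (a `train` and a `val` folder, each holding a `via_region_data.json` file).

```python
from pathlib import Path


# Hypothetical helper, not part of the cGOM toolbox.
def check_dataset(dataset_dir):
    """Return a list of problems found in one dataset folder (empty if none)."""
    problems = []
    for split in ("train", "val"):
        split_dir = Path(dataset_dir) / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split}")
        elif not (split_dir / "via_region_data.json").is_file():
            problems.append(f"missing file: {split}/via_region_data.json")
    return problems
```

Calling `check_dataset("data_sets/Object1_Object2_1")` returns an empty list when both splits are complete, which makes it easy to loop over every folder in data_sets before starting a long training run.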
Please note that a lot of adjustments can be made in the configuration files making parts of the above description obsolete. However, there are some requirements:
- Image folders for labeling must contain a subfolder with the label names, as described above. The algorithm obtains the labels from the folder names.
- Dataset folders must contain a train and val folder, each containing a file named via_region_data.json.
- via_region_data.json must follow the json files generated by via.html, where each polygon must be labeled.
- If there are two datasets with the same labels, call them Object1_Object2, Object1_Object2_1, Object1_Object2_2, etc. Labels that recur across datasets, for instance pen in cup_pen and lamp_pen, are handled by the algorithm.
- Data sets from former studies are included in the training of the neural network if they are in the data_sets folder. Please remove data sets that should not be included in the training set.
- The video file and the gaze file must have the same name. Multiple video files and gaze files can be processed automatically, so each video must share its name with its gaze file.
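The naming requirement for videos and gaze files can be checked up front. The sketch below is an illustration under assumptions, not the toolbox's own code: the function name `pair_videos_with_gaze` is hypothetical, and gaze files are assumed to carry the `.txt` extension used in the example command of step 3.

```python
from pathlib import Path


# Hypothetical helper, not part of the cGOM toolbox.
def pair_videos_with_gaze(video_dir, gaze_dir):
    """Match each video to the gaze file that shares its base name."""
    # Assumption: gaze files are plain text files ending in .txt
    gaze_by_stem = {p.stem: p for p in Path(gaze_dir).glob("*.txt")}
    pairs, unmatched = [], []
    for video in sorted(Path(video_dir).iterdir()):
        if not video.is_file():
            continue
        gaze = gaze_by_stem.get(video.stem)
        if gaze is None:
            unmatched.append(video.name)
        else:
            pairs.append((video, gaze))
    return pairs, unmatched
```

Any name listed in `unmatched` points to a video without a gaze file of the same name, which would have to be renamed or removed before batch processing.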
{"BG": 0, "shot": 1, "bottle": 2}
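The mapping above reserves index 0 for the background class `BG` and numbers the object labels from 1. The README states that labels are taken from folder names such as Object1_Object2 and that numeric suffixes (`_1`, `_2`) distinguish datasets with the same labels, so a mapping of this shape could be derived roughly as follows. This is a sketch under assumptions: the function name `labels_from_folders` is hypothetical, and the toolbox's actual ordering and suffix handling may differ.

```python
# Hypothetical helper, not part of the cGOM toolbox.
def labels_from_folders(folder_names):
    """Build a label-to-index map from dataset folder names."""
    label_map = {"BG": 0}  # index 0 is reserved for the background
    for name in folder_names:
        for part in name.split("_"):
            # Assumption: purely numeric parts are dataset suffixes, not labels
            if part.isdigit():
                continue
            # A label that recurs across folders keeps its first index
            if part not in label_map:
                label_map[part] = len(label_map)
    return label_map
```

With the folder names `shot_bottle` and `shot_bottle_1`, this yields the same three entries as the file shown above.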
"""
ST: 'cGOM'
Bachmann David, Hess Stephan & Julian Wolf (SV)
pdz, ETH Zürich
This file contains all functions to extract images from video, also just from fixations.
"""
# Global imports
import cv2
import argparse
import random
import os
import numpy as np

# Local imports
from utils import read_gaze


def extract_images_from_video(args):
    # Read video
    video = cv2.VideoCapture(args.video_path)
    width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = video.get(cv2.CAP_PROP_FPS)

    # Either no gaze file is provided, then each frame is chosen at random from the entire video
    if args.gaze_path is None:
        # Calculate the probability per frame in order to end up with about args.num_images
        num_frames = video.get(cv2.CAP_PROP_FRAME_COUNT)
        p = args.num_images / num_frames

        # Write the corresponding frames
        i = 0
        while True:
            video_flag, frame = video.read()
            # Stop at the end of the video, where read() returns no frame
            if not video_flag:
                break
            if random.random() < p:
                # NOTE: the image-writing call was lost in this listing; cv2.imwrite is assumed.
                # It expects BGR data, which VideoCapture returns, so no channel flip is applied.
                cv2.imwrite(os.path.join(args.output_dir, 'image_' + str(i) + '.JPG'), frame)
                i += 1

    # If a gaze file is provided, we only pull a frame from fixations
    else:
        # Read the gaze file and make it iterable
        gaze = read_gaze(args.gaze_path, max_res=(height, width))
        p = args.num_images / len(gaze)
        gaze = iter(gaze)

        # Random frame within the first fixation
        gaze_entry = next(gaze)
        (t_start, t_end, x, y) = list(gaze_entry.values())
        rand_frame = int(np.random.uniform(t_start, t_end) * fps)

        i = 0
        f_count = 0
        while True:
            video_flag, frame = video.read()
            if not video_flag:
                break
            f_count += 1

            # If the random frame corresponds to the frame ID we inspect it
            if rand_frame == f_count:
                # However, the frame is only kept with probability p
                if random.random() < p:
                    # Same assumption as above: cv2.imwrite with the unflipped BGR frame
                    cv2.imwrite(os.path.join(args.output_dir, 'image_' + str(i) + '.JPG'), frame)
                    i += 1

                # Get next random frame; stop once all fixations have been used
                gaze_entry = next(gaze, None)
                if gaze_entry is None:
                    break
                (t_start, t_end, x, y) = list(gaze_entry.values())
                rand_frame = int(np.random.uniform(t_start, t_end) * fps)

    video.release()


if __name__ == '__main__':
    # Get config
    parser = argparse.ArgumentParser(description='Extract a number of random frames from a video')
    parser.add_argument('--video_path', required=True)
    parser.add_argument('--output_dir', required=True)
    parser.add_argument('--num_images', type=int, required=True)
    parser.add_argument('--gaze_path', default=None)
    args = parser.parse_args()

    # Extract frames
    extract_images_from_video(args)
"""
ST: 'cGOM'
Bachmann David, Hess Stephan & Julian Wolf (SV)
pdz, ETH Zürich
This file contains all functions to create a gaze video.
"""
# Global imports
import os
import sys
import warnings
import argparse
import cv2
import numpy as np

# Local imports
import utils

# Import Mask RCNN from parent directory
# (assumed: the path setup was lost in this listing; mrcnn sits one level above misc)
sys.path.append('..')
from mrcnn.config import Config
from mrcnn import model as modellib, visualize

# Suppress warnings
warnings.filterwarnings('ignore', message='Anti-aliasing will be enabled by default in skimage 0.15 to')


# Derived config class
class GazeConfig(Config):
    # Name
    NAME = "gaze"
    # Number of GPUs and images per GPU
    # (assumed: the values were lost in this listing; 1 x 1 gives the batch size of one
    # that the single-frame detect() call below requires)
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


def make_mask_gaze_video(model, args, classes):
    # Video capture
    video = cv2.VideoCapture(args.video_path)
    width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = video.get(cv2.CAP_PROP_FPS)

    # Video writer
    name = os.path.basename(args.video_path).split('.')[0]
    writer = cv2.VideoWriter(name + '_masked.avi', cv2.VideoWriter_fourcc(*'MJPG'), fps, (width, height))

    # Read the gaze file and make it iterable
    gaze = utils.read_gaze(args.gaze_path, max_res=(height, width))
    gaze = iter(gaze)

    # Get the first entry from the gaze file
    gaze_entry = next(gaze)
    (t_start, t_end, x, y) = list(gaze_entry.values())
    f_start = int(t_start * fps)
    f_end = int(t_end * fps)

    # Go through the frames of the video
    f_count = 0
    while True:
        # Read a frame and stop at the end of the video
        video_flag, frame = video.read()
        if not video_flag:
            break
        f_count += 1

        # Detect masks (the model expects RGB, OpenCV delivers BGR)
        frame = np.flip(frame, axis=2).copy()
        r = model.detect([frame], verbose=0)[0]
        masks = r['masks']
        scores = r['scores']
        class_ids = r['class_ids']
        num_masks = masks.shape[-1]

        # Apply the masks
        for i in range(num_masks):
            class_id = class_ids[i]
            color = (float(class_id == 1), float(class_id == 2), float(class_id > 2))
            frame = visualize.apply_mask(frame, masks[:, :, i], color)

        # If the frame lies within the current fixation, annotate it
        if f_start < f_count <= f_end:
            # Check where the gaze point lies
            max_score = 0.
            max_class = 0
            for i in range(num_masks):
                if masks[y, x, i] and scores[i] > max_score:
                    max_score = scores[i]
                    max_class = class_ids[i]

            # Write the class onto the frame
            label = classes[max_class]
            text = 'label: %s' % label
            cv2.putText(frame, text, (int(frame.shape[0] / 50.), int(frame.shape[0] / 50.)), cv2.FONT_HERSHEY_PLAIN,
                        frame.shape[0] / 700., (255, 0, 0))

            # Draw the circle
            cv2.circle(frame, (x, y), int(frame.shape[0] / 100.), (255, 0, 0), thickness=int(frame.shape[0] / 100.))

            # Check if we are leaving the fixation
            # Exhausting the gaze entries causes the last frame not to be written.
            if f_count == f_end:
                gaze_entry = next(gaze, 'break')
                if gaze_entry == 'break':
                    break
                (t_start, t_end, x, y) = list(gaze_entry.values())
                f_start = int(t_start * fps)
                f_end = int(t_end * fps)

        # Write the frame (convert back to BGR for OpenCV)
        writer.write(np.flip(frame, axis=2))

    # Release the video handles so the output file is finalized
    writer.release()
    video.release()


if __name__ == '__main__':
    # Args
    parser = argparse.ArgumentParser(description='Make a video containing masks, gaze point, and detections')
    parser.add_argument('--video_path', required=True)
    parser.add_argument('--gaze_path', required=True)
    parser.add_argument('--log_dir', default='./logs')
    parser.add_argument('--weights_path', default='../weights/w_pilot.h5')
    parser.add_argument('--label_dir', default='../labels')
    # The confidence is a fraction, so it must be parsed as float rather than int
    parser.add_argument('--detection_min_confidence', default=0.9, type=float)
    args = parser.parse_args()

    # Read the labels
    LABEL_DIR = os.path.join(args.label_dir, 'labels.json')
    classes = list(utils.load(LABEL_DIR).keys())

    # Configs for model
    class InferenceConfig(GazeConfig):
        NUM_CLASSES = len(classes)
        DETECTION_MIN_CONFIDENCE = args.detection_min_confidence
    config = InferenceConfig()

    # Load the model
    model = modellib.MaskRCNN(mode="inference", config=config, model_dir=args.log_dir)

    # Load weights: either the most recent checkpoint or the given file
    if args.weights_path == 'last':
        model.load_weights(model.find_last(), by_name=True)
    else:
        model.load_weights(args.weights_path, by_name=True)

    # Make the video
    make_mask_gaze_video(model, args, classes)
## [1.0.6] - June 15, 2018
* a patch from Stefan Mihaila which requires polygon shape to have at least 3 points.
* rectangles can now be resized from edges
* added POLYLINE shape
* image file list can be filtered using regular expression
* renamed methods and variable
- _via_reload_img_table : _via_reload_img_fn_list_table
- reload_img_table() : reload_img_fn_list_table()
- _via_loaded_img_table_html : _via_loaded_img_fn_list_table_html
## [1.0.5] - January 16, 2018
* (code contributions from Stefan Mihaila) via.js codebase improvement, wider web browser support (IE 10, IE 11 and Opera 12)
* added file to record contributions to VIA codebase
* removed 'localStorage.clear()' to avoid SecurityError in Safari browser (issue 85 and 108)
## [1.0.4] - October 17, 2017
* fixed polygon copy/paste/resize issue (issue 107)
## [1.0.3] - August 07, 2017
* CSV export now does not add extra comma to each line (issue 103)
## [1.0.2] - August 04, 2017
* removed free resize of ellipse from any edge (issue 100)
* fixed free resize of rectangle (issue 101)
* fixed 1-pixel bug (first set image space coordinate, then set canvas coordinate. see issue 96) for region resize and move
* press Ctrl while resizing to preserve the aspect ratio of rectangle (issue 98)
* fixed issue with CSV files containing newline character \r or \r\n (issue 102)
* top menu bar remains consistent even when the user scrolls the window
## [1.0.1] - June 11, 2017
* fixed issue 33 : Annotations cannot be imported from file of type application/
* fixed issue 96 : A major bug in how canvas coordinates are computed
## [1.0.0] - April 04, 2017
* file-attributes support added (useful for weakly supervised learning)
* spreadsheet like editor for region and file attributes
* visualization of loaded image list improved
* user annotation data cached in browser's localStorage (for data recovery on browser crash)
* zoom in/out support
* improved performance using multi-layered canvas for image and annotations
* new user interface layout (added toolbar on top navigation panel)
* added Getting Started guide and License to help menu
* CSV import/export now conforms to RFC 4180 standard
* added some basic unit tests
* added support for point regions (useful for landmark annotations)
## [1.0.0-beta] - 2017-03-15
* beta release for VIA 1.0.0
## [0.1b] - 2016-10-24
* first release of VGG image annotator
* supports following region shape: rectangle, circle, ellipse, polygon
* contains basic image region operations such as move, resize, delete
* Ctrl a/c/v to select all, copy and paste image regions
* import/export of region data from/to text file in csv,json format
* display list of loaded images
# Contributors to VIA project
We welcome all forms of contributions (code update, documentation, etc) from users.
These contributions must adhere to the existing [license](LICENSE) of VIA project.
Here is the list of current contributions to VIA project.
* Stefan Mihaila (@smihaila, 01 Feb. 2018, updates to via-1.0.5)
01. a patch from Stefan Mihaila which requires polygon shape to have at least 3 points.
* Stefan Mihaila (@smihaila, 15 Jan. 2018, updates to via-1.0.4)
01. Added "use strict";
02. Added the "var _via_current_x = 0; var _via_current_y = 0;" global vars.
03. Replaced any Set() object (_via_region_attributes, _via_file_attributes) with a standard dictionary object.
04. Replaced any Map() object (ImageMetadata.file_attributes, ImageRegion.shape_attributes and ImageRegion.region_attributes) with a standard dictionary object.
05. Made most of the switch() statements more readable or even fixing potential bugs caused by unintended "fall-through" (i.e. lack of "break") statements.
06. Added missing semi-colon (;) expression terminators.
07. Replaced any use of "for (var key of collection_name.keys()) {}" block (combined with collection_name.get(key) inside the block) with "for (var key in collection_name) {}" (combined with collection_name[key] inside the block).
08. Gave a more intuitive name to certain local var names.
09. Commented out unused local vars.
10. Removed un-necessary intermediary local vars.
11. Made certain local vars inside functions, to be more sub-scoped / to reflect their exact use.
12. Added missing "var variable_name" declarations.
13. Leverage Object.keys(collection_name).length property instead of Map.size and Set.size property.
14. Replaced "==" and "!=" with their more precise / identity operators (=== and !==).
15. Simplified some function implementations, using direct "return expression" statements.
16. Fixed spelling errors in comments, string values, variable names and function names.
Copyright (c) 2016-2018, Abhishek Dutta, Visual Geometry Group, Oxford University.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.