# Hand-eye coordination (HEC or EHC)

This README provides everything you need to use the framework for recognizing hand-eye coordination (HEC) patterns in eye tracking videos.

FYI: The code has been developed and tested on Windows 10 Education only.

## License

The code and the models in this repo are released under the [MIT License](https://gitlab.ethz.ch/pdz/3d-convnet_for_hec_recognition/-/blob/master/LICENSE).

## Installation

```bash
# Step 1: Download this repository

# Step 2: Download and install Anaconda

# Step 3: Create the environment from the .yml file
conda env create -f HEC_CNN_env.yml

# Step 4: Activate the conda environment
activate HEC_CNN_env

# Step 5: Add PyTorch
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# FYI - Save an environment with
conda env export > HEC_CNN_env.yml
```

## Citation

If you use our code in your research or wish to refer to the baseline results, please use the following BibTeX entry.

>>>>>>>> CHECK THIS CITATION BEFORE RELEASE <<<<<<<<<<<<<<<

```bibtex
@article{NonLocal2020,
  author  = {Stephan Wegner and Felix Wang and Sophokles Ktistakis and Julian Wolf and Quentin Lohmeyer and Mirko Meboldt},
  title   = {FILL IN TITLE HERE},
  journal = {PlosONE},
  year    = {2020}
}
```

## Structure of the data

**Video files:** `.avi` files named `name_base{i}.avi`

**Gaze coordinate files:** `.txt` files named `name_base{i}.txt`

*Headings in file:*

| RecordingTime [ms] | Point of Regard Binocular X [px] | Point of Regard Binocular Y [px] | Video Time [h:m:s:ms] |
| --- | --- | --- | --- |

**Labels for Mask-RCNN:** `.json` file named `labels_yps.json`

*Structure in file:*

```json
{"bg": 0, "Obj1": 1, "Obj2": 2, "Obj3": 3, "Obj4": 4}
```

*(You can have as many objects as you wish, according to your trained model.)*

**Ground truth files for 3D-ConvNet:** `.csv` files named `behaviour_ground_truth_name_base{i}.csv`

*Headings in file (exported from SMI BeGaze 3.6):*

| Frame time | Frame number | Behaviour |
| --- | --- | --- |

**Video times to cut original videos:** `.txt` file named `video_times.txt`

*Headings in file:*

| Name | start [s] | end [s] |
| --- | --- | --- |
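As a quick sanity check, the raw gaze and ground-truth files can be inspected with pandas before running the pipeline. This is only an illustrative sketch: the tab separator for the gaze export and the relative paths (taken from the project structure below) are assumptions and may need adjusting to your export settings.

```python
import pandas as pd

# Illustrative sketch - separator and paths are assumptions, adapt as needed.
name_base = 'EHC_Y_P'
i = 1

# Gaze coordinates exported as .txt (assumed tab-separated)
gaze = pd.read_csv(f'data/raw/gaze/{name_base}{i}.txt', sep='\t')
print(gaze[['RecordingTime [ms]',
            'Point of Regard Binocular X [px]',
            'Point of Regard Binocular Y [px]']].head())

# Behaviour ground truth exported from SMI BeGaze 3.6
gt = pd.read_csv(f'data/raw/ground_truth/behaviour_ground_truth_{name_base}{i}.csv')
print(gt[['Frame time', 'Frame number', 'Behaviour']].head())
```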
## Definitions in definitions.py

Open `definitions.py` in your favorite source code editor (e.g. [Atom](https://atom.io/) or [PyCharm](https://www.jetbrains.com/de-de/pycharm/)).

### Define the name base for your files (video (.avi), gaze coordinates (.txt), and behaviour ground truth (.csv))

```python
# As an example:
name_base = 'EHC_Y_P'
```

### Define which scripts you want to run

```python
# Please choose which operation(s) you want to start by choosing 0 (operation is not started)
# or 1 (operation will be started)
operation = {'2DCNN_Inference': 1,
             'extract_features': 1,
             'create_segments': 1,
             '3DCNN_train_class': 1,
             '3DCNN_predict': 1,
             'post-processing': 1
             }
```

### Define which IDs are in the training and in the test set

```python
# The rule for naming is name_base{i}, both for video and gaze coordinate files
train_val_nums = [2,3,4,5,6,7,8,9,12,13,14,15,16,17,18,19,20,21,24,25,26,27,28,29,30,31,32,33,34]
test_nums = [1,10,11,22,23]
```

### Define if you want to run training or test

```python
# Choose mode: "train" or "test"
mode = "train"
```

### For the 2D-ConvNet, the definitions are

```python
# Choose: "original" or "black" background
image_type = 'original'

# The weights work well for the paper's use case; please adapt them to your specific use case
# 'w_pen': 1.0, 'w_phone': 0.96, 'w_pillow': 1.3, 'w_smart': 1.6
class_weights = [1, 0.96, 1.3, 1.6]
```

### For the 3D-ConvNet, the definitions are

#### For inference

```python
path_to_load_classification_network = ROOT_DIR + "\\" + r"models\ThreeDCNN\classification_NN\22i_TCNN_class_acc_0.65652174.h5"
HEC_classes = ['Background', 'Guiding', 'Directing', 'Checking', 'Observing']
```

#### For training

During training, the model is saved every 5 epochs. Define the learning rate and the number of epochs for the training. The training set is divided into k folds for the training/validation split, and you can define which fold the training starts with.

```python
# Parameters for training the 3D CNN
hyperparam = hyperparam  # hyperparameters for tuning the 3D CNN
length = length          # number of hyperparameter sets

# k-fold cross validation: start and end points are defined for the validation set,
# the remaining samples are collected in the training set
starts = []  # start of the split, for k=5: 0.01, 0.21, 0.41, 0.61, 0.81
ends = []    # end of the split,   for k=5: 0.20, 0.40, 0.60, 0.80, 1.00
k = 5
step = 1 / k
a = 0
while a < k:
    start = round((1 / 100 + 10 * step * a / 10), 2)
    end = round((start + step - 1 / 100), 2)
    ends.append(end)
    starts.append(start)
    a += 1
runs = len(starts)

# learning rate for training the model
learning_rate = 1e-3
epochs = 50

# decision, which fold is the start fold (0,1,2,3,4)
start_fold = 0

# choose if undersampling should be applied (True or False)
undersampling = False

# Choose model = 'None'
model = 'None'
```

##### Restart training

You can restart the training from one of the saved models.

```python
# Define the path to the model to be retrained
path_to_load_retrain_network = ROOT_DIR + "\\" + r"models\ThreeDCNN\classification_NN\temp_training\BGtrain_05s\3DCNN_color_f1_GP_epoch_40.h5"

# choose model to re-train
model = path_to_load_retrain_network

# number of the epoch at which to restart training
if model == 'None':
    start_epoch = 0
else:
    start_epoch = 40
```

### Definitions for post-processing

```python
use_bg = False
```

## Run code

Go to the project folder and open the command line:

```bash
python main.py
```
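For example, to predict HEC classes on the held-out test set with the provided classification network (no retraining), the relevant entries in `definitions.py` could look as follows. This is only an illustrative sketch of the toggles described above; which pipeline stages you actually need depends on which intermediate data already exists.

```python
# Illustrative inference-only setup in definitions.py (sketch, adapt as needed)
mode = "test"
operation = {'2DCNN_Inference': 1,    # 1 = run this stage, 0 = skip it
             'extract_features': 1,
             'create_segments': 1,
             '3DCNN_train_class': 0,  # skip 3D-ConvNet training
             '3DCNN_predict': 1,
             'post-processing': 1
             }
```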
## Structure of the project

```
definitions.py
LICENSE
main.py
opti.py
README.md
data
|-- datasets
|   |-- dadaset_gt
|   |   |-- behaviour_ground_truth_name_base{i}.txt
|   |-- extracted_images
|   |   |-- test
|   |   |-- train_val
|   |-- filled_values_segment
|   |   |-- test
|   |   |-- train_val
|   |-- masked_videos
|   |   |-- blacked_mask_videos
|   |   |   |-- name_base{i}_black.avi
|   |   |-- labels_mask
|   |   |   |-- name_base{i}.csv
|   |   |-- original_mask_videos
|   |   |   |-- name_base{i}_masked.avi
|   |-- test_filled_values_id_label_map.csv
|   |-- test_segment_dataset.csv
|   |-- train_filled_values_id_label_map.csv
|   |-- train_segment_dataset.csv
|-- raw
|   |-- gaze
|   |   |-- name_base{i}.txt
|   |-- ground_truth
|   |   |-- behaviour_ground_truth_name_base{i}.csv
|   |-- labels
|   |   |-- labels_yps.json
|   |-- video_times
|   |   |-- video_times.txt
|   |-- videos
|   |   |-- name_base{i}.avi
logs
models
|-- ThreeDCNN
|   |-- classification_NN
|   |   |-- temp_train
|-- TwoDCNN
|   |-- mrcnn
|   |   |-- mask_rcnn_hands.h5
|   |   |-- mask_rcnn_yps.h5
reports
|-- figures
|   |-- acc
|   |-- loss
|   |-- ClassifiactionRep
|   |-- ComparisonPredTrue
|   |-- ConfusionMat
|   |-- temp_acc_loss
|-- predictions
src
|-- ThreeDCNN
|   |-- dataset_creation
|   |   |-- __init__.py
|   |   |-- create_segments.py
|   |   |-- extract_features.py
|   |   |-- utils.py
|   |-- models
|   |   |-- classifiaction_network
|   |   |   |-- __init__.py
|   |   |   |-- classifiaction_model.py
|   |   |   |-- train_model.py
|   |   |   |-- utils.py
|   |-- data_generator
|   |   |-- __init__.py
|   |   |-- ThreeDimCNN_datagenerator.py
|   |-- post_processing
|   |   |-- __init__.py
|   |   |-- post_process.py
|   |   |-- utils.py
|   |-- prediction
|   |   |-- __init__.py
|   |   |-- predict.py
|   |   |-- utils.py
|   |-- __init__.py
|-- TwoDCNN
|   |-- models
|   |   |-- 2DCNN_inference.py
|   |-- __init__.py
|   |-- makse_mask_gaze_video.py
|   |-- utils.py
|-- mrcnn
|   |-- __init__.py
|   |-- config.py
|   |-- LICENSE
|   |-- model.py
|   |-- parallel_model.py
|   |-- utils.py
|   |-- visualize.py
venv
|-- HEC_CNN_env.yml
```
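Before starting `main.py`, it can help to verify that every recording ID listed in `definitions.py` has its raw video, gaze, and ground-truth file in `data/raw`. The snippet below is a sketch (not part of the shipped code); the ID lists are abbreviated and should be copied from your `definitions.py`.

```python
import os

# Sketch of a pre-flight check (not part of the repository code):
# verify that the raw files for every recording ID exist under data/raw.
name_base = 'EHC_Y_P'
train_val_nums = [2, 3, 4, 5]   # copy the full lists from definitions.py
test_nums = [1, 10]

for i in sorted(train_val_nums + test_nums):
    for path in [
        os.path.join('data', 'raw', 'videos', f'{name_base}{i}.avi'),
        os.path.join('data', 'raw', 'gaze', f'{name_base}{i}.txt'),
        os.path.join('data', 'raw', 'ground_truth',
                     f'behaviour_ground_truth_{name_base}{i}.csv'),
    ]:
        if not os.path.isfile(path):
            print(f'Missing: {path}')
```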