Commit ac1bf7a5 authored by PhilFischer

Update README.md

parent 0ee1fd7d
The *WikiArt* dataset is used, which can be downloaded [here](https://github.com/
## Usage
### Tuning
To run one tuning session on the main model and save all logs for Tensorboard inside the directory `logs`, run
```
$ python3 main.py
```
The logs can be viewed locally by synchronizing the `logs` folder and then hosting it with Tensorboard. Each tuning session is assigned an experiment ID, which can be specified in `main.py`, and a timestamp; each run within the session is then enumerated by a run ID. The full log path is thus `logs/{experimentID}/{timestamp}/{runID}`. Additionally, inference results are produced for the best model within a session and stored in `logs/{experimentID}/{timestamp}/images`, and the logs in the experiment folder are compatible with the *HParams* plugin of Tensorboard.
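As a small illustration of the path convention above (the values `exp0` and `run1` are made up; the real IDs are set in `main.py` and enumerated per session), the full log directory can be assembled like this:

```python
from datetime import datetime
from pathlib import Path

# Hypothetical values -- the actual experiment ID is specified in main.py,
# and run IDs are enumerated within each tuning session.
experiment_id = "exp0"
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
run_id = "run1"

# Full log path: logs/{experimentID}/{timestamp}/{runID}
log_dir = Path("logs") / experiment_id / timestamp / run_id
print(log_dir)
```

Pointing Tensorboard at the top-level `logs` directory then picks up every session below it.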
## Model Architecture
The model at hand is a conditional convolutional variational autoencoder which models the conditional marginal likelihood $`p(\mathbf{x}|s)`$ of images $`\mathbf{x}`$ given style labels $`s`$.
The encoder uses convolutional feature extraction blocks on images and dense feature maps on styles, which are then concatenated, to model the conditional posterior $`q(\mathbf{z}|\mathbf{x},s)`$ of latent variables $`\mathbf{z}`$, which are assumed to be standard normally distributed $`\mathbf{z} \sim \mathcal{N}(0, \mathbf{1})`$.
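A minimal sketch of how such a latent sample is typically drawn from the encoder's posterior, using the standard reparameterization trick (NumPy stands in for the actual framework; the function name and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_posterior(mu, log_var):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1),
    so that z ~ q(z|x, s) = N(mu, diag(exp(log_var)))."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

# With mu = 0 and log_var = 0 the sample comes from the standard
# normal prior assumed for z.
z = sample_posterior(np.zeros(16), np.zeros(16))
```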
The decoder models the conditional likelihood given the latent sample $`p(\mathbf{x}|\mathbf{z},s)`$. The likelihood is assumed to be independently Gaussian $`\mathcal{N}(\boldsymbol{\mu}, \sigma^2\mathbf{1})`$, where the standard deviation $`\sigma`$ is an additional learnable parameter [[σ-VAE](https://arxiv.org/abs/2006.13202)]. This allows for reconstruction with a different style $`s'`$ by sampling from the marginal distribution $`\int q(\mathbf{z}|\mathbf{x}, s) p(\mathbf{x}|\mathbf{z}, s') \text{d}\mathbf{z}`$, as well as conditional generative modelling by using $`\int p(\mathbf{z})p(\mathbf{x}|\mathbf{z}, s') \text{d}\mathbf{z}`$ instead.
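The scaled-Gaussian likelihood can be written out directly. The sketch below (NumPy, with illustrative names and toy data) shows the mean per-pixel negative log-likelihood with one shared learnable $`\sigma`$; its optimum in closed form is $`\sigma^2`$ equal to the mean squared reconstruction error, which is what makes the learnable scale self-calibrating:

```python
import numpy as np

def gaussian_nll(x, mu, log_sigma):
    """Mean per-pixel negative log-likelihood of N(mu, sigma^2 * I),
    with a single shared log_sigma as in the sigma-VAE parameterization."""
    sigma2 = np.exp(2.0 * log_sigma)
    return np.mean(0.5 * (np.log(2.0 * np.pi * sigma2) + (x - mu) ** 2 / sigma2))

# Toy reconstruction: mu is a slightly noisy copy of x.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8, 3))
mu = x + 0.1 * rng.standard_normal(x.shape)

# Closed-form optimum: sigma^2 = mean squared error.
best_log_sigma = 0.5 * np.log(np.mean((x - mu) ** 2))
```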
![](img/architecture.png)
## TODO:
- [ ] Optimize feature extraction head with transfer learning.
- [ ] Optimize latent space rank: 1D or 2D representation.
- [ ] Optimize latent space form: [Quantized Latent Space](https://arxiv.org/abs/1711.00937).
- [x] Prevent posterior collapse: β-VAE or [σ-VAE](https://arxiv.org/abs/2006.13202).
- [x] Optimize decoder distribution: Standard Gaussian, Scaled Gaussian, Beta, Kumaraswamy, **Discrete**.