%% Cell type:markdown id: tags:
# SLT-CE-7: Model Validation
%% Cell type:markdown id: tags:
## References
[1] [Information Theoretic Model Selection for Pattern Analysis](https://proceedings.mlr.press/v27/buhmann12a/buhmann12a.pdf)
[2] [Approximate Sorting (original paper)](https://ml2.inf.ethz.ch/courses/slt/tutorials/apx-sorting-original-paper.pdf)
[3] [Tutorial on Approximate Sorting](https://ml2.inf.ethz.ch/courses/slt/tutorials/Tutorial-approximate-sorting.pdf) (and corresponding exercises)
%% Cell type:markdown id: tags:
## Introduction
<p style="background-color:#adebad;">
We return to the problem of k-means clustering, but this time the goal is to find the optimal number of clusters. Having read the reference paper [1], we will try to reproduce the experiments detailed in their section 4.1. However, instead of a Gaussian Mixture Model, we are going to use modified setups; see below. Use Deterministic Annealing to perform the clustering; you can reuse your DA algorithm from coding exercise 2.
</p>
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
import time
import sklearn as skl
from functools import reduce
```
%% Cell type:markdown id: tags:
<h2 style="background-color:#f0b375;">
Section 4.0
<span style=font-size:50%> Complete all problems in this section to get a pass on this exercise. </span>
</h2>
<p style="background-color:#adebad;">
Explain the main idea behind model validation. Write down the equations you are going to use for solving the problems and briefly explain the variables involved (check section 4.1 in [1]):
</p>
%% Cell type:markdown id: tags:
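One possible summary (hedged; check the exact notation against sections 3 and 4.1 of [1]): for a cost function $R(c, X)$ (here the k-means distortion of a clustering $c$ on data $X$), define Gibbs weights and the partition function
$$ w_\beta(c, X) = \exp\big(-\beta R(c, X)\big), \qquad Z_\beta(X) = \sum_{c\in\mathcal{C}} w_\beta(c, X), $$
and, for two datasets $X', X''$ drawn from the same source, the mutual information
$$ I_\beta = \log\left( |\mathcal{C}| \cdot \frac{\sum_{c\in\mathcal{C}} w_\beta(c, X')\, w_\beta(c, X'')}{Z_\beta(X')\, Z_\beta(X'')} \right). $$
Here $\beta = 1/T$ is the inverse temperature and $\mathcal{C}$ is the hypothesis class (all assignments of the $n$ points to $k$ clusters). The validated temperature maximizes $I_\beta$: at very high $T$ all clusterings are weighted equally and no information is transmitted, while at very low $T$ the clustering overfits the noise of a single sample.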
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
For this exercise we shall use the following data source:
$$p(\textbf{x})=\frac{1}{2}\mathcal{N}(\textbf{x} | \textbf{0}, \sigma_1^2\mathbb{I}) +
\frac{1}{2}\mathcal{N}(\textbf{x} | r\textbf{v},\sigma_2^2\mathbb{I}) $$
<p style="background-color:#adebad;">
with $\mathbf{x}, \mathbf{v} \in \mathbb{R}^d$, $\lVert \mathbf{v} \rVert=1$ and $r\in\mathbb{R}_+$.
Implement a function generating data from this distribution:
</p>
%% Cell type:code id: tags:
``` python
def generate_data(var_1, var_2, r, v, num_samples):
    """Generate data from the described model using the provided parameters.
    Args:
        var_1 (number): variance of the first cluster
        var_2 (number): variance of the second cluster
        r (number): distance r to the center of the second cluster
        v (np.ndarray): direction v of the center of the second cluster
        num_samples (number): number of points to be generated
    Returns:
        data (np.ndarray): generated points
        labels (np.ndarray): true cluster assignment for the generated points
    """
    # Placeholder return; replace with your implementation.
    return np.zeros((num_samples, v.size)), np.ones(num_samples)
```
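%% Cell type:markdown id: tags:
Not the official solution: a minimal sketch of what `generate_data` could look like, assuming the equal mixture weights of the density above. The name `generate_data_sketch` is used so it does not shadow the stub you are asked to fill in.
%% Cell type:code id: tags:
``` python
def generate_data_sketch(var_1, var_2, r, v, num_samples):
    """Hedged sketch, assuming mixture weights 1/2 as in the density above."""
    d = v.size
    v_unit = v / np.linalg.norm(v)                 # enforce ||v|| = 1
    labels = np.random.randint(0, 2, size=num_samples)
    data = np.empty((num_samples, d))
    n1 = int(np.sum(labels))
    n0 = num_samples - n1
    # Cluster 0: N(0, var_1 * I); cluster 1: N(r * v, var_2 * I).
    data[labels == 0] = np.sqrt(var_1) * np.random.randn(n0, d)
    data[labels == 1] = r * v_unit + np.sqrt(var_2) * np.random.randn(n1, d)
    return data, labels
```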
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
Generate and plot 2D points for different combinations of $\sigma_1$, $\sigma_2$ and $r$. Color the generated points according to their cluster assignment.
</p>
%% Cell type:code id: tags:
``` python
```
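%% Cell type:markdown id: tags:
Illustrative only, building on `generate_data_sketch` from above; swap in your own implementation. Three hypothetical parameter combinations are shown side by side.
%% Cell type:code id: tags:
``` python
# Scatter plots for a few (sigma_1^2, sigma_2^2, r) combinations.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (var_1, var_2, r) in zip(axes, [(1.0, 1.0, 2.0), (1.0, 1.0, 5.0), (1.0, 0.1, 2.0)]):
    data, labels = generate_data_sketch(var_1, var_2, r, np.array([1.0, 0.0]), 500)
    ax.scatter(data[:, 0], data[:, 1], c=labels, s=8, cmap="coolwarm")
    ax.set_title(f"$\\sigma_1^2={var_1}$, $\\sigma_2^2={var_2}$, $r={r}$")
plt.tight_layout()
plt.show()
```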
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
In order to reproduce the results described in section 4.1 of the reference paper [1], we need to compute the mutual information of two clusterings. Provide an implementation of the corresponding function according to its docstring. You may want to check section 3 in [1] for more details.
</p>
%% Cell type:code id: tags:
``` python
def mutual_information(X_1, X_2, y_1, y_2, t):
    """Compute the mutual information of two clusterings.
    Args:
        X_1 (np.ndarray): first dataset
        X_2 (np.ndarray): second dataset
        y_1 (np.ndarray): cluster centers for the first dataset
        y_2 (np.ndarray): cluster centers for the second dataset
        t (float): temperature T
    Returns:
        mutual_information (float): mutual information of X_1 and X_2
    """
    pass
```
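%% Cell type:markdown id: tags:
Not the official solution. A hedged sketch, under the assumption that for k-means the score of [1] factorizes over data points with Gibbs weights $w_{ik} = \exp(-\lVert \mathbf{x}_i - \mathbf{y}_k \rVert^2 / T)$, yielding a per-point contribution $\log\big(k \sum_j w^{(1)}_{ij} w^{(2)}_{ij} / (\sum_j w^{(1)}_{ij} \sum_j w^{(2)}_{ij})\big)$; verify this against section 3 of [1]. Everything is computed in log-space to avoid underflow at low temperatures.
%% Cell type:code id: tags:
``` python
def mutual_information_sketch(X_1, X_2, y_1, y_2, t):
    """Hedged sketch of the clustering MI score; verify against [1], section 3."""
    # Squared distances of every point to every center, shape (n, k).
    d1 = np.sum((X_1[:, None, :] - y_1[None, :, :]) ** 2, axis=2)
    d2 = np.sum((X_2[:, None, :] - y_2[None, :, :]) ** 2, axis=2)
    a1, a2 = -d1 / t, -d2 / t                    # log Gibbs weights

    def logsumexp_rows(a):
        # Stable log(sum(exp(a), axis=1)).
        m = a.max(axis=1)
        return m + np.log(np.sum(np.exp(a - m[:, None]), axis=1))

    k = y_1.shape[0]
    # Per point: log k + log sum_j w1*w2 - log sum_j w1 - log sum_j w2.
    per_point = (np.log(k) + logsumexp_rows(a1 + a2)
                 - logsumexp_rows(a1) - logsumexp_rows(a2))
    return float(np.mean(per_point))
```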
%% Cell type:code id: tags:
``` python
var_1 = 1
var_2 = 1
r = 1
v = np.array([1, 1])
n_samples = 100
data, labels = generate_data(var_1, var_2, r, v, n_samples)
centers = np.array([[0, 0], [r, r]])
MI = mutual_information(data, data, centers, centers, 1)
print(MI)
```
%% Cell type:markdown id: tags:
<h2 style="background-color:#f0b375;">
Section 4.5
<span style=font-size:50%> Complete all problems in this section to get a pass on this exercise. </span>
</h2>
<p style="background-color:#adebad;">
Now fix the parameters $\sigma_1=\sigma_2=\sigma$ and produce a plot of the mutual information (MI) vs. temperature for different $r$ (as in figure 2b of [1]; put all curves in one figure). Modify your Deterministic Annealing class such that it logs the MI score for every evaluated temperature $T$. Note that the mutual information score is defined for hard assignments, while deterministic annealing produces probabilistic (soft) ones.
</p>
%% Cell type:code id: tags:
``` python
```
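%% Cell type:markdown id: tags:
A simplified illustration using the sketches above, not the full pipeline: in the actual experiment the centers at each $T$ come from your DA run, whereas here they are fixed at the true means, so only the temperature inside the MI score varies. The recorded maxima also give the max-MI vs. $r$ curve asked for below.
%% Cell type:code id: tags:
``` python
# Simplified scan; assumes the sketch functions defined earlier in this notebook.
sigma_sq = 1.0
v = np.array([1.0, 0.0])
temperatures = np.logspace(-1, 2, 60)
rs = [1, 2, 4, 8]
max_mi = []
for r in rs:
    X1, _ = generate_data_sketch(sigma_sq, sigma_sq, r, v, 500)
    X2, _ = generate_data_sketch(sigma_sq, sigma_sq, r, v, 500)
    centers = np.stack([np.zeros(2), r * v])     # true means, not a DA output
    mi = [mutual_information_sketch(X1, X2, centers, centers, t)
          for t in temperatures]
    plt.semilogx(temperatures, mi, label=f"r={r}")
    max_mi.append(max(mi))                       # feeds the max-MI vs. r plot below
plt.xlabel("temperature $T$")
plt.ylabel("mutual information")
plt.legend()
plt.show()
```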
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
Additionally, plot the maximum MI vs. the distance $r$. What do you observe?
</p>
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
Investigate the behavior of the maximal MI as a function of the number of data samples $n$ and the data dimension $d$, i.e. plot several MI vs. $r$ curves for different $n$ and $d$. Organize the curves into figures such that the behavior can be observed clearly.
</p>
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
Any implementation/computational/numerical issues?
</p>
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
<h2 style="background-color:#f0b375;">
Section 5.0
<span style=font-size:50%> Complete all problems in this section to get a pass on this exercise. </span>
</h2>
<p style="background-color:#adebad;">
Use $r$ such that one cannot quite distinguish the two clusters, i.e. the maximum MI is not clearly attained at $k=2$: the two clusters overlap so much that they appear as one. Now reduce the variance $\sigma_2^2$ of the second cluster so that it creates a concentrated peak within the first cluster. Again, plot MI vs. $T$ for different $\sigma_2$, as well as the maximum MI vs. $\sigma_2$.
</p>
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
As before, also investigate the behavior of max MI vs. $\sigma_2$ for different $n,d$.
</p>
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
<h2 style="background-color:#f0b375;">
Section 5.5
<span style=font-size:50%> Complete all problems in this section to get a pass on this exercise. </span>
</h2>
<p style="background-color:#adebad;">
Now we look at a slightly more complex problem, where a mean field approximation is needed to carry out the calculation of the partition function: approximate sorting [2].
<br>Write down the mean field equations (7) and (8) in matrix form (note the typo in eq. (8), where the first sum should extend only to $k-1$). Use the notation $A^{sum}_{ij} = \sum_{k=1}^j A_{ik}$, and implement these equations (you might want to use `numpy.cumsum(..., axis=1)` to implement $A^{sum}$).
</p>
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
def field(x, q):
    """x: pairwise comparison matrix
    q: probabilistic assignment matrix
    returns: the mean fields of the MF approximation"""
    pass

def assignment(beta, field, mu):
    """beta: inverse temperature
    field: mean fields
    mu: Lagrange multipliers enforcing double stochasticity
    returns: probabilistic assignment matrix q"""
    pass
```
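%% Cell type:markdown id: tags:
The `field` function depends on the exact matrix form of your eqs. (7) and (8), so it is left open here. For `assignment`, one hedged sketch is the generic mean-field Gibbs form $q_{ik} \propto \exp(-\beta (h_{ik} + \mu_k))$ with row normalization; the sign conventions are an assumption and must be matched to your derivation.
%% Cell type:code id: tags:
``` python
def assignment_sketch(beta, field, mu):
    """Hedged sketch: q_ik ~ exp(-beta * (h_ik + mu_k)), rows normalized.
    The signs are an assumption; match them to your eqs. (7) and (8)."""
    a = -beta * (field + mu[None, :])
    a -= a.max(axis=1, keepdims=True)            # stabilize before exponentiating
    q = np.exp(a)
    return q / q.sum(axis=1, keepdims=True)
```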
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
The Lagrange parameters $\mu_k$ are chosen such that the resulting matrix $q$ is doubly stochastic (it sums to one over both rows and columns). Using this property, write down an iterative equation of the form $\mu = f(\beta, \mu, q)$ and implement it.
</p>
%% Cell type:code id: tags:
``` python
def iterative_mu(beta, q, mu):
    """beta: inverse temperature
    q: probabilistic assignment matrix
    mu: old Lagrange multipliers
    returns: updated Lagrange multipliers"""
    pass
```
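%% Cell type:markdown id: tags:
One Sinkhorn-like fixed point that is consistent with the constraint (rows of $q$ sum to one by construction, so $\mu$ only has to fix the column sums); a hedged sketch, your own derivation may differ in details.
%% Cell type:code id: tags:
``` python
def iterative_mu_sketch(beta, q, mu):
    """Hedged sketch of the fixed point: if column k is too heavy
    (sum_i q_ik > 1), mu_k grows, which shrinks q_ik via exp(-beta * mu_k);
    the update is stationary exactly when every column sums to one."""
    return mu + np.log(q.sum(axis=0)) / beta
```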
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
Implement the mean field approximation for sorting by filling out the provided class method. Note that for every iteration of the mean field, you have to run a full cycle of $\mu$ updates until convergence.
</p>
%% Cell type:code id: tags:
``` python
class MeanFieldSorting():
    def __init__(self, x, eps=1e-5):
        self.x = x
        self.eps = eps
        self.n = self.x.shape[0]

    def compute(self, beta):
        """beta: inverse temperature
        returns: q, field, mu"""
        # Placeholder return; replace with the mean field iteration.
        return np.zeros((self.n, self.n)), np.zeros((self.n, self.n)), np.zeros(self.n)
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
Implement a function that generates data as described in Sections 2.1 and 2.5 of [2] (the setup used to produce their fig. 2), returning both the pairwise comparison matrix and the true order.
</p>
%% Cell type:code id: tags:
``` python
def generate_pairwise_data(n, noise):
    """
    n: number of items to be sorted
    noise: standard deviation of the Normal noise to be added
    returns: X (pairwise comparisons), order (correct order of the items)
    """
    # Placeholder return; replace with your implementation.
    return np.zeros((n, n)), np.ones(n)
```
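%% Cell type:markdown id: tags:
We do not restate the exact construction of [2] here. A common stand-in, labeled as an assumption: give each item a latent score equal to its true rank and add antisymmetric Gaussian noise to the score differences.
%% Cell type:code id: tags:
``` python
def generate_pairwise_data_sketch(n, noise):
    """Hedged stand-in for the data model of [2], sections 2.1 and 2.5."""
    order = np.random.permutation(n)             # order[i] = true rank of item i
    s = order.astype(float)                      # latent scores follow the ranks
    eps = noise * np.random.randn(n, n)
    eps = (eps - eps.T) / np.sqrt(2)             # keep the comparisons antisymmetric
    X = s[:, None] - s[None, :] + eps            # X_ij is a noisy s_i - s_j
    return X, order
```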
%% Cell type:code id: tags:
``` python
x, order = generate_pairwise_data(10,0)
mf = MeanFieldSorting(x)
q, field, mu = mf.compute(1)
assert np.all(np.argmax(q, axis=1) == order)
```
%% Cell type:markdown id: tags:
<h2 style="background-color:#f0b375;">
Section 6.0
<span style=font-size:50%> Complete all problems in this section to get a pass on this exercise. </span>
</h2>
<p style="background-color:#adebad;">
Implement the mutual information function from eq. (15) in [2]:
</p>
%% Cell type:code id: tags:
``` python
def mutual_information(x1, x2, beta):
    """x1, x2: two realizations of the pairwise comparison matrix
    beta: inverse temperature
    returns: mutual information"""
    MF1 = MeanFieldSorting(x1)
    MF2 = MeanFieldSorting(x2)
    _, _, _ = MF1.compute(beta)
    _, _, _ = MF2.compute(beta)
    pass
```
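%% Cell type:markdown id: tags:
We cannot restate eq. (15) here, so the following is only a placeholder built by analogy with the clustering capacity: it compares the two rank distributions per item (rows of $q$ are normalized, so the marginal terms drop out). Replace its body with the actual eq. (15) from [2].
%% Cell type:code id: tags:
``` python
def mutual_information_sorting_sketch(x1, x2, beta):
    """Placeholder by analogy with the clustering score; NOT eq. (15) of [2]."""
    q1, _, _ = MeanFieldSorting(x1).compute(beta)
    q2, _, _ = MeanFieldSorting(x2).compute(beta)
    n = q1.shape[0]
    overlap = np.sum(q1 * q2, axis=1)            # agreement of the rank distributions
    return float(np.mean(np.log(n * np.maximum(overlap, 1e-300))))
```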
%% Cell type:markdown id: tags:
<p style="background-color:#adebad;">
Show, in a plot like fig. 2a, the behaviour of the optimal $\beta$ as a function of the noise level in the data distribution. Also show, for several noise levels, the mutual information as a function of $\beta$.
</p>
%% Cell type:code id: tags:
``` python
```
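%% Cell type:markdown id: tags:
An illustrative scan, assuming working implementations of the pieces above. Both comparison matrices must stem from the same underlying order for the MI to be meaningful, so the latent scores are drawn once per noise level.
%% Cell type:code id: tags:
``` python
# Illustrative scan over beta for several noise levels.
betas = np.logspace(-2, 1, 30)
noise_levels = [0.5, 1.0, 2.0, 4.0]
n_items = 20
best_beta = []
for noise in noise_levels:
    s = np.random.permutation(n_items).astype(float)   # shared true order

    def noisy_comparisons():
        eps = noise * np.random.randn(n_items, n_items)
        return s[:, None] - s[None, :] + (eps - eps.T) / np.sqrt(2)

    x1, x2 = noisy_comparisons(), noisy_comparisons()
    mi = [mutual_information_sorting_sketch(x1, x2, b) for b in betas]
    plt.semilogx(betas, mi, label=f"noise={noise}")
    best_beta.append(betas[int(np.argmax(mi))])
plt.xlabel(r"$\beta$")
plt.ylabel("mutual information")
plt.legend()
plt.show()

plt.plot(noise_levels, best_beta, "o-")
plt.xlabel("noise standard deviation")
plt.ylabel(r"optimal $\beta$")
plt.show()
```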
%% Cell type:markdown id: tags:
## Comments
We hope you found this exercise instructive.
Feel free to leave comments below; we will read them carefully.
%% Cell type:markdown id: tags: