Commit 3efe7f33 by pennap

### CE 7 published

parent b24b90a6
slt-ce-7.ipynb 0 → 100644
%% Cell type:markdown id: tags:

# SLT-CE-7: Model Validation

%% Cell type:markdown id: tags:

## References

[1] [Information Theoretic Model Selection for Pattern Analysis](https://proceedings.mlr.press/v27/buhmann12a/buhmann12a.pdf)
[2] [Approximate Sorting (original paper)](https://ml2.inf.ethz.ch/courses/slt/tutorials/apx-sorting-original-paper.pdf)
[3] [Tutorial on Approximate Sorting](https://ml2.inf.ethz.ch/courses/slt/tutorials/Tutorial-approximate-sorting.pdf) (and corresponding exercises)

%% Cell type:markdown id: tags:

## Introduction

We return to the problem of k-means clustering, but this time the goal is to find the optimal number of clusters. Having read the reference paper [1], we will try to reproduce the experiments detailed in its section 4.1. However, instead of a Gaussian Mixture Model, we are going to use the modified setups described below. Use Deterministic Annealing to perform the clustering; you can reuse your DA algorithm from coding exercise 2.

%% Cell type:code id: tags:

``` python
import numpy as np
import matplotlib.pyplot as plt
import time
import sklearn as skl
from functools import reduce
```

%% Cell type:markdown id: tags:

## Section 4.0

Complete all problems in this section to get a pass on this exercise.

Explain the main idea behind model validation. Write down the equations you are going to use for solving the problems and briefly explain the variables involved (check section 4.1 in [1]):

%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:

For this exercise we shall use the following data source:

$$p(\mathbf{x})=\frac{1}{2}\mathcal{N}(\mathbf{x} \mid \mathbf{0}, \sigma_1^2\mathbb{I}) + \frac{1}{2}\mathcal{N}(\mathbf{x} \mid r\mathbf{v}, \sigma_2^2\mathbb{I})$$

with $\mathbf{x}, \mathbf{v} \in \mathbb{R}^d$, $\lVert \mathbf{v} \rVert=1$ and $r\in\mathbb{R}_+$. Implement a function generating data from this distribution:

%% Cell type:code id: tags:

``` python
def generate_data(var_1, var_2, r, v, num_samples):
    """Generate data from the described model using the provided parameters.

    Args:
        var_1 (number): variance of the first cluster
        var_2 (number): variance of the second cluster
        r (number): distance r to the center of the second cluster
        v (np.ndarray): direction v of the center of the second cluster
        num_samples (number): number of points to be generated

    Returns:
        data (np.ndarray): generated points
        labels (np.ndarray): true cluster assignment for the generated points
    """
    d = v.size
    # each point comes from either mixture component with probability 1/2
    labels = np.random.randint(0, 2, size=num_samples)
    centers = np.stack([np.zeros(d), r * v])
    stds = np.sqrt(np.array([var_1, var_2]))
    data = centers[labels] + stds[labels, None] * np.random.randn(num_samples, d)
    return data, labels
```

%% Cell type:markdown id: tags:

Generate and plot 2D points for different combinations of $\sigma_1$, $\sigma_2$ and $r$. Color the generated points according to their cluster assignment.

%% Cell type:code id: tags: ``` python ``` %% Cell type:markdown id: tags:
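A minimal plotting sketch, with the mixture sampled inline as a stand-in for a completed `generate_data` (all parameter values below are arbitrary choices for illustration):

``` python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sigma_1, sigma_2, r = 1.0, 0.5, 3.0
v = np.array([1.0, 0.0])                      # unit direction vector
n = 500

# pick a mixture component for each point with probability 1/2 each
labels = rng.integers(0, 2, size=n)
centers = np.stack([np.zeros(2), r * v])
stds = np.array([sigma_1, sigma_2])
data = centers[labels] + stds[labels, None] * rng.standard_normal((n, 2))

plt.scatter(data[:, 0], data[:, 1], c=labels, s=8)
plt.title(f"$\\sigma_1$={sigma_1}, $\\sigma_2$={sigma_2}, r={r}")
```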

In order to reproduce the results described in section 4.1 of the reference paper [1], we need to calculate the mutual information of two clusterings. Provide an implementation of the corresponding function according to its docstring. You may want to check section 3 in [1] for more details.

%% Cell type:code id: tags:

``` python
def mutual_information(X_1, X_2, y_1, y_2, t):
    """Compute the mutual information of two clusterings.

    Args:
        X_1 (np.ndarray): first dataset
        X_2 (np.ndarray): second dataset
        y_1 (np.ndarray): cluster centers for the first dataset
        y_2 (np.ndarray): cluster centers for the second dataset
        t (float): temperature T

    Returns:
        mutual_information (float): mutual information of the two clusterings
    """
    pass
```

%% Cell type:code id: tags:

``` python
var_1 = 1
var_2 = 1
r = 1
v = np.array([1, 1])
n_samples = 100

data, labels = generate_data(var_1, var_2, r, v, n_samples)
centers = np.array([[0, 0], [r, r]])
MI = mutual_information(data, data, centers, centers, 1)
print(MI)
```

%% Cell type:markdown id: tags:
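The definition to implement is the one in section 3 of [1]. Purely as a hedged sketch of the kind of quantity involved (the Gibbs posteriors, the index alignment of the two sample sets, and the function names below are assumptions made here, not taken from [1]), a plug-in estimate can be built from the soft assignments:

``` python
import numpy as np

def gibbs_posteriors(X, Y, t):
    """p(c|x) proportional to exp(-||x - y_c||^2 / t)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / t
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def mutual_information_sketch(X_1, X_2, y_1, y_2, t):
    p1 = gibbs_posteriors(X_1, y_1, t)            # (n, k)
    p2 = gibbs_posteriors(X_2, y_2, t)            # (n, k)
    joint = p1.T @ p2 / p1.shape[0]               # empirical joint over cluster pairs
    m1 = joint.sum(axis=1, keepdims=True)         # marginal of the first clustering
    m2 = joint.sum(axis=0, keepdims=True)         # marginal of the second clustering
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (m1 @ m2)[mask])).sum())
```

At low temperature and with two well separated clusters this estimate approaches $\log 2$; at very high temperature the posteriors become uniform and it drops to zero.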

## Section 4.5

Complete all problems in this section to get a pass on this exercise.

Now fix the parameters $\sigma_1=\sigma_2=\sigma$ and produce a plot of the mutual information (MI) vs. temperature for different $r$ (as in figure 2b of [1]; put all curves in one figure). Modify your Deterministic Annealing class so that it logs the MI score for every evaluated temperature $T$. Note that the mutual information score is defined for a hard assignment, while the deterministic annealing algorithm produces a probabilistic one.
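Since the MI score expects hard assignments while DA returns assignment probabilities, one simple bridge (a choice made here, not prescribed by [1]) is to harden the soft assignments per point:

``` python
import numpy as np

# q[i, c]: DA's probability that point i belongs to cluster c (example values)
q = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.6, 0.4]])
hard = np.argmax(q, axis=1)   # most probable cluster per point -> [0, 1, 0]
```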

%% Cell type:code id: tags: ``` python ``` %% Cell type:markdown id: tags:

Additionally, plot the maximum MI vs. the distance $r$. What do you observe?

%% Cell type:code id: tags: ``` python ``` %% Cell type:markdown id: tags:

Investigate the behavior of the maximal MI as a function of the number of data samples $n$ and the data dimension $d$, i.e. plot several MI vs. $r$ curves for different $n$ and $d$. Organize the curves into figures such that the behavior can be observed clearly.

%% Cell type:code id: tags: ``` python ``` %% Cell type:markdown id: tags:

Any implementation/computational/numerical issues?

%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:

## Section 5.0

Complete all problems in this section to get a pass on this exercise.

Use an $r$ such that one cannot quite distinguish the two clusters (i.e. the maximum MI is not clearly realized with two clusters, $k=2$): the two clusters overlap so much that they appear as one cluster. Now reduce the variance $\sigma_2$ of the second cluster so that it forms a concentrated peak within the first cluster. Again, plot MI vs. $T$ for different $\sigma_2$, as well as the maximum MI vs. $\sigma_2$.

%% Cell type:code id: tags: ``` python ``` %% Cell type:markdown id: tags:

As before, also investigate the behavior of the maximum MI vs. $\sigma_2$ for different $n, d$.

%% Cell type:code id: tags: ``` python ``` %% Cell type:markdown id: tags:

## Section 5.5

Complete all problems in this section to get a pass on this exercise.

Now we look at a slightly more complex problem, where a mean field approximation is needed to carry out the calculation of the partition functions, namely approximate sorting [2].
Write down the mean field equations (7) and (8) in matrix form (note the typo in eq. 8, where the first sum should extend only to $k-1$). Use the notation $A^{sum}_{ij} = \sum_{k=1}^j A_{ik}$, and implement these equations (you might want to use numpy.cumsum(·, axis=1) to implement $A^{sum}$).
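As a small illustration of the $A^{sum}$ notation, `np.cumsum(..., axis=1)` computes exactly the row-wise partial sums:

``` python
import numpy as np

A = np.arange(1, 7).reshape(2, 3)     # [[1, 2, 3], [4, 5, 6]]
A_sum = np.cumsum(A, axis=1)          # A_sum[i, j] = sum of A[i, :j+1]
# A_sum -> [[1, 3, 6], [4, 9, 15]]
```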

%% Cell type:code id: tags:

``` python
```

%% Cell type:code id: tags:

``` python
def field(x, q):
    """x: pairwise comparison matrix
    q: probabilistic assignment matrix
    returns: the mean fields of the MF approximation"""
    pass


def assignment(beta, field, mu):
    """beta: inverse temperature
    field: mean fields
    mu: Lagrange multipliers to enforce double stochasticity
    returns: probabilistic assignment matrix q"""
    pass
```

%% Cell type:markdown id: tags:

The Lagrange parameters $\mu_k$ are chosen such that the resulting matrix $q$ is doubly stochastic (it sums to one over both rows and columns). Using this property, write down an iterative equation of the form $\mu = f(\beta, \mu, q)$ and implement it.
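For intuition only: the role of the $\mu_k$ is analogous to the classical Sinkhorn iteration, which turns a positive matrix into a doubly stochastic one by alternating row and column normalizations. The sketch below illustrates the doubly stochastic target, not the actual $\mu$-update, which you must derive from the mean field equations:

``` python
import numpy as np

def sinkhorn_normalize(M, iters=500):
    """Alternately normalize rows and columns of a positive matrix."""
    q = M.astype(float).copy()
    for _ in range(iters):
        q /= q.sum(axis=1, keepdims=True)   # make rows sum to one
        q /= q.sum(axis=0, keepdims=True)   # make columns sum to one
    return q

q = sinkhorn_normalize(np.random.default_rng(0).uniform(0.1, 1.0, (4, 4)))
```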

%% Cell type:code id: tags:

``` python
def iterative_mu(beta, q, mu):
    """beta: inverse temperature
    q: probabilistic assignment matrix
    mu: old Lagrange multiplier
    returns: updated Lagrange multiplier"""
    pass
```

%% Cell type:markdown id: tags:

Implement the mean field approximation for sorting by filling out the provided class method. Note that for every mean field iteration, you have to run a full inner cycle of iterations until convergence of $\mu$.

%% Cell type:code id: tags:

``` python
class MeanFieldSorting():
    def __init__(self, x, eps=1e-5):
        self.x = x
        self.eps = eps
        self.n = self.x.shape[0]

    def compute(self, beta):
        """beta: inverse temperature
        returns: q, field, mu"""
        return np.zeros((self.n, self.n)), np.zeros((self.n, self.n)), np.zeros(self.n)
```

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

Implement a function that generates data like that described in Sections 2.1 and 2.5 of [2] (used to generate their fig. 2), giving as output both the pairwise comparison matrix and the true order.

%% Cell type:code id: tags:

``` python
def generate_pairwise_data(n, noise):
    """n: number of items to be sorted
    noise: standard deviation of the Normal noise to be added
    returns: X (pairwise comparisons), order (correct order of items)"""
    return np.zeros((n, n)), np.ones(n)
```

%% Cell type:code id: tags:

``` python
x, order = generate_pairwise_data(10, 0)
mf = MeanFieldSorting(x)
q, field, mu = mf.compute(1)
assert np.all(np.argmax(q, axis=1) == order)
```

%%%% Output: error

AssertionError (saved notebook output: the placeholder implementations above do not pass this check yet)

%% Cell type:markdown id: tags:
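For the data model itself, follow Sections 2.1 and 2.5 of [2]. As a hedged stand-in under the assumption that each comparison is a noisy difference of latent item scores (the function name, the seed handling, and the rank encoding of `order` are choices made here, not taken from the paper):

``` python
import numpy as np

def generate_pairwise_data_sketch(n, noise, seed=0):
    rng = np.random.default_rng(seed)
    values = rng.permutation(n).astype(float)   # latent score per item
    X = values[:, None] - values[None, :]       # X[i, j] = score_i - score_j
    X = X + noise * rng.standard_normal((n, n))
    order = values.astype(int)                  # order[i] = rank of item i
    return X, order
```

With `noise=0` the ranks can be read off the row sums of `X`, since the i-th row sums to `n * score_i - sum(scores)`, which is monotone in the item's score.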

## Section 6.0

Complete all problems in this section to get a pass on this exercise.

Implement the mutual information function in eq. 15

%% Cell type:code id: tags:

``` python
def mutual_information(x1, x2, beta):
    """x1, x2: two realizations of the pairwise comparison matrix
    beta: inverse temperature
    returns: mutual information"""
    MF1 = MeanFieldSorting(x1)
    MF2 = MeanFieldSorting(x2)
    _, _, _ = MF1.compute(beta)
    _, _, _ = MF2.compute(beta)
    pass
```

%% Cell type:markdown id: tags:

Show, in a plot like fig. 2a, the behaviour of the optimal $\beta$ as a function of the noise in the data distribution. Also show, for several noise levels, the mutual information as a function of $\beta$.

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:

## Comments

We hope you found this exercise instructive. Feel free to leave comments below; we will read them carefully.

%% Cell type:markdown id: tags: