### CE 7 published

parent b24b90a6
slt-ce-7.ipynb 0 → 100644
 { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SLT-CE-7: Model Validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References \n", "\n", " [Information Theoretic Model Selection for Pattern Analysis](https://proceedings.mlr.press/v27/buhmann12a/buhmann12a.pdf)\n", "\n", " [Approximate Sorting (original paper)](https://ml2.inf.ethz.ch/courses/slt/tutorials/apx-sorting-original-paper.pdf)\n", "\n", " [Tutorial on Approximate Sorting](https://ml2.inf.ethz.ch/courses/slt/tutorials/Tutorial-approximate-sorting.pdf) (and corresponding exercises)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Introduction\n", "\n", "

\n", "We return to the problem of k-means clustering, but this time the goal is to find the optimal number of clusters. Having read the reference paper , we will try to reproduce the experiments detailed in their section 4.1. However, instead of a Gaussian Mixture Model, we are going to use modified setups, see below. Use Deterministic Annealing to perform clustering. You can reuse your DA algorithms from coding exercise 2.\n", "

" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pylab as plt\n", "import time\n", "\n", "import sklearn as skl\n", "from functools import reduce" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Section 4.0 \n", " Complete all problems in this section to get a pass on this exercise. \n", "

\n", "\n", "

\n", "Explain the main idea behind the Model Validation. Write down the equations yuo are going to use for solving the problems and shortly explain the variables that are involved (check section 4.1 in ): \n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "For this excercise we shall use the following data source:\n", "\n", "$$p(\\textbf{x})=\\frac{1}{2}\\mathcal{N}(\\textbf{x} | \\textbf{0}, \\sigma_1^2\\mathbb{I}) + \n", "\\frac{1}{2}\\mathcal{N}(\\textbf{x} | r\\textbf{v},\\sigma_2^2\\mathbb{I})$$\n", "\n", "

\n", "with $\\mathbf{x}, \\mathbf{v} \\in \\mathbb{R}^d$, $\\lVert \\mathbf{v} \\rVert=1$ and $r\\in\\mathbb{R}_+$.\n", "Implement a function generating data from this distribution:\n", "

" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def generate_data(var_1, var_2, r, v, num_samples):\n", " \"\"\"Generate data from the described model using provided parameters:\n", " Args:\n", " var_1 (number): varience of the first cluster\n", " var_2 (number): varience of the second cluster\n", " r (number): distance r to the center of the second cluster\n", " v (np.ndarray): direction v of the center of the second cluster\n", " num_samples (number): number of points to be generated\n", " \n", " Returns:\n", " data (np.ndarray): generated points\n", " labels (np.ndarray): true cluster assigment for the generated points \n", " \"\"\"\n", " \n", " return np.zeros((num_samples, v.size)), np.ones(num_samples)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Generate and plot 2D points for different combinations of $\\sigma_1$, $\\sigma_2$ and $r$. Color the generated points according to their cluster assigment.\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "In order to reproduce the results described in the reference paper , section 4.1, we need to calculate the mutual information of two clusterings. Provide an implementation of the correspondent function according to its docstring. You may want to check section 3 in  for more details.\n", "

" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "def mutual_information(X_1, X_2, y_1, y_2, t):\n", " \"\"\"Compute mutual information of two clusterings\n", " Args:\n", " x_1 (np.ndarray): first dataset\n", " x_2 (np.ndarray): second dataset\n", " y_1 (np.ndarray): cluster centers for the first dataset\n", " y_2 (np.ndarray): cluster centers for the second dataset\n", " t (float): temperature T\n", " Returns:\n", " mutual_information (float): mutual information of x_1 and x_2\n", " \"\"\"\n", " pass" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "var_1 = 1\n", "var_2 = 1\n", "r = 1\n", "v = np.array([1, 1])\n", "n_samples = 100\n", "data, labels = generate_data(var_1, var_2, r, v, n_samples)\n", "centers = np.array([[0, 0], [r, r]])\n", "MI = mutual_information(data, data, centers, centers, 1)\n", "print(MI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Section 4.5\n", " Complete all problems in this section to get a pass on this exercise. \n", "

\n", "\n", "

\n", "Now fix the parameters $\\sigma_1=\\sigma_2=\\sigma$ and produce a plot of the mutual information (MI) vs. temperature for different $r$ (as in their figure 2b; put all curves in one figure). Modify your Deterministic Annealing class such that it logs MI score for every evaluated temperature T. Note that mutual information score is defined for a hard assignment while deterministic annealing algorithm result to a probabilistic one.\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "

\n", "Additionally, plot the maximum MI vs. the distance $r$. What do you observe?\n", "

" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Investigate the behavior of the maximal MI as a function of the number of data samples $n$ and the data dimension $d$, i.e. plot several curves MI vs. $r$ for different $n$ and $d$. Try to organize the curves into figures, such that one can observe the behavior clearly.\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Any implementation/computational/numerical issues?\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Section 5.0\n", " Complete all problems in this section to get a pass on this exercise. \n", "

\n", "\n", "

\n", "Use $r$ such that one can not quite distinguish the two clusters (i.e. the maximum MI is not clearly realized with two clusters, i.e. k=2). The two clusters are overlapping so much that they appear as one cluster. Now reduce the variance $\\sigma_2$ of the second cluster so that it creates a concentrated peak within the first cluster. Again, plot MI vs. $T$ for different $\\sigma_2$ as well as the max MI vs. $\\sigma_2$.\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "As before, also investigate the behavior of max MI vs. $\\sigma_2$ for different $n,d$.\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Section 5.5\n", " Complete all problems in this section to get a pass on this exercise. \n", "

\n", "\n", "

\n", "Now we look at a slightly more complex problem, where a mean field approxiamtion is need to carry out the calucation of the partition functions, that is approximated sorting . \n", "
Write down the mean field equation (7) and (8) in matrix form (note the typo in eq. 8 where the first sum should be extended only to $k-1$). Use the notation $A^{sum}_{ij} = \\sum_{k=1}^j A_{ik}$, and implement these equations (you might want to use numpy.cumsum(, axis=1) to implement $A^{sum}$).\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "def field(x,q):\n", " \"\"\"x: pairwise comparison matrix\n", " q: probabilistic assignment matrix\n", " \n", " returns: the mean fields of MF approximations\"\"\"\n", " \n", " pass\n", "\n", "def assignment(beta, field, mu):\n", " \"\"\"beta: inverse temperature\n", " field: mean fields\n", " mu: lagrange multipliers to enforce double stochasticity\n", " \n", " returns: probabilistic assignment matrix q\"\"\"\n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "The langrange paramenters $\\mu_k$ are such that the matrix $q$ obtained is doubly stochastic (it sums to one over both rows and columns). Using this property write down an iterative equation of the form $\\mu = f(\\beta, \\mu, q)$ and implement it.\n", "

" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def iterative_mu(beta,q,mu):\n", " \"\"\"beta: inverse temperature\n", " q: probabilistic assignment matrix\n", " mu: old lagrange multiplier\n", " \n", " returns: updated lagrange multiplier\"\"\"\n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Implement the mean field approximation for sorting filling out the provided class method. Note that for every iteration of the mean field, you have to run an entire cycle of iteration until convergence of $\\mu$.\n", "

" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "class MeanFieldSorting():\n", " \n", " def __init__(self, x, eps=1e-5):\n", " self.x = x\n", " self.eps = eps\n", " self.n = self.x.shape\n", " \n", " def compute(self, beta):\n", " \"\"\"beta: inverse temperature\n", " \n", " returns: q, field, mu\"\"\"\n", " return np.zeros((self.n,self.n)), np.zeros((self.n,self.n)), np.zeros(self.n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Implement a function for generating data like the one described in Section 2.1 and 2.5 of  to generate fig. 2, giving as output both the pairwise comparison matrix and the true order.\n", "

" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "def generate_pairwise_data(n, noise):\n", " '''\n", " n: number of items to be sorted\n", " noise: standard deviation of the Normal noise to be added\n", " \n", " returns: X (pairwise comparisons), order (correct order of items) \n", " '''\n", " return np.zeros((n,n)), np.ones(n)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "ename": "AssertionError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mmf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mMeanFieldSorting\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mq\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfield\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmu\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcompute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mall\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0margmax\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mq\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0morder\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m: " ] } ], "source": [ "x, order = generate_pairwise_data(10,0)\n", "mf = MeanFieldSorting(x)\n", "q, field, mu = mf.compute(1)\n", "assert np.all(np.argmax(q, axis=1) == order)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Section 6.0\n", " Complete all problems in this section to get a pass on this exercise. \n", "

\n", "\n", "

\n", "Implement the mutual information function in eq. 15\n", "

" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "def mutual_information(x1,x2,beta):\n", " \"\"\"x1,x2: two realization of the pairwise comparison matrix\n", " beta: inverse temperature\n", " \n", " returns: mutual information\"\"\"\n", " \n", " MF1 = MeanFieldSorting(x1)\n", " MF2 = MeanFieldSorting(x2)\n", " _, _, _ = MF1.compute(beta)\n", " _, _, _ = MF2.compute(beta)\n", " \n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Show in a plot like the one in fig. 2a the behaviour of the optimal $\\beta$ as a function of the noise in the data distribution. Also show for some level of noises the mutual information as a function of beta.\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Comments\n", "\n", "We hope you found this exercise instructive.\n", "\n", "Feel free to leave comments below, we will read them carefully." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 2 }
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!