\chapter{Gradient Ascent}

Gradient Ascent is a method for searching through large search spaces. Given a point in the solution space, we find the locally most beneficial direction to move in, i.e.\ the direction that increases the score the most. In a second step we move slightly in this direction. This process is repeated until we converge to a solution.

By the helper theorems \ref{makeInt1} and \ref{makeInt2}, we can build almost entirely integer fields from fractional fields. We can therefore use Gradient Ascent on the solution space consisting of all fractional solutions. This is natural, as it allows adding and removing small amounts of crop from a pixel, which matches the small steps gradient ascent takes.

Two issues remain. The first is computing the gradient. The second is that after following the gradient we might no longer be within the valid solution space. For example, the gradient might suggest planting more than one unit of crop per pixel to achieve a higher score. While this would be beneficial to the score, it no longer represents a valid fractional solution for our problem. Thus we may need to find the closest valid solution. In the following sections we tackle these issues.

\section{Finding the closest valid solution}

Again the Birkhoff polytope offers a solution. There are methods for finding the point within the Birkhoff polytope that has minimal Euclidean distance to an arbitrary point TODO: Cite. Hence, if we could represent all valid fractional solutions within a Birkhoff polytope, we could use these methods.

The core idea is to expand the fractional solution representation slightly in order to obtain a Birkhoff polytope. Usually we have $C$ variables for every pixel, which leads to an $(X \cdot Y) \times C$ matrix.
However, for the Birkhoff polytope we need a square matrix in which all rows and all columns sum to $1$. To achieve this we ``clone'' crops. That is, if crop $c$ should be represented $n$ times within the field, we introduce $n$ crops that all behave identically to $c$ and that should each be represented once within the field. As the total number of crops to be planted equals the number of pixels, we obtain a square $(X \cdot Y) \times (X \cdot Y)$ matrix. This process should be familiar from the section on the linear programming method.

Every matrix within the corresponding Birkhoff polytope is a valid fractional solution, as every pixel contains a total amount of exactly $1$ of planted crops (rows sum to $1$) and we plant exactly one of each crop (columns sum to $1$). As some crops behave exactly the same, we can group these clones together to obtain the original fractional solution space we started with. Hence every element of our Birkhoff polytope maps to a valid fractional solution. This means we can project any $(X \cdot Y) \times (X \cdot Y)$ matrix to the closest valid fractional solution.

\section{Computing the gradient}

We assume we are given a valid fractional solution as a matrix $F$ in the Birkhoff polytope discussed above, i.e.\ an $(X \cdot Y) \times (X \cdot Y)$ matrix. Computing the gradient is now straightforward.
For every variable $p_c$, the amount of crop $c$ planted in pixel $p$, we take the partial derivative of the score function:
\begin{align*}
\frac{\partial}{\partial p_c} \sum_{n \in N(p)} \sum_{c_i \in \Cps} R(c, c_i) \cdot p_{c} \cdot n_{c_i} = \sum_{n \in N(p)} \sum_{c_i \in \Cps} R(c, c_i) \cdot n_{c_i}
\end{align*}

\section{Method}

Using these elements we now build the following algorithm:

\FloatBarrier
\begin{algorithm}
\caption{Random Gradient Ascent Algorithm}
\begin{algorithmic}
\State $x_0 \gets \text{Random initial valid field}$
\While{$x_0$ not converged}
    \State $G \gets \mathrm{gradient}(x_0)$
    \State $x_0 \gets x_0 + \text{stepsize} \cdot G$
    \State $x_0 \gets \mathrm{project}(x_0)$
\EndWhile
\State \Return $\mathrm{integerize}(x_0)$
\end{algorithmic}
\end{algorithm}
\FloatBarrier
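
The loop body of the algorithm above can be written out as a single update step. The following is only a sketch under assumptions not fixed in the text: $x^{(t)}$ denotes the field after $t$ iterations, $\eta > 0$ is a constant step size, $\mathcal{B}$ denotes the Birkhoff polytope of $(X \cdot Y) \times (X \cdot Y)$ matrices, and distances are measured entrywise in the Euclidean norm.
\begin{align*}
G^{(t)}_{p,c} &= \sum_{n \in N(p)} \sum_{c_i \in \Cps} R(c, c_i) \cdot n^{(t)}_{c_i} \\
\tilde{x}^{(t)} &= x^{(t)} + \eta \cdot G^{(t)} \\
x^{(t+1)} &= \operatorname*{arg\,min}_{B \in \mathcal{B}} \lVert B - \tilde{x}^{(t)} \rVert_2
\end{align*}
Here $n^{(t)}_{c_i}$ is the amount of crop $c_i$ planted in the neighbouring pixel $n$ under $x^{(t)}$. One possible convergence test, again an assumption, is to stop once $\lVert x^{(t+1)} - x^{(t)} \rVert_2$ falls below a chosen tolerance.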
 ... ... @@ -191,7 +191,7 @@ on large fields and requires no conditions of $R$. \node[main node] (2) [right of=1] {V}; \path[every node/.style={font=\sffamily\small}] (1) edge[bend left] node [above] {$\lambda \cdot c_1$} (2) (1) edge[bend left] node [above] {$\lambda \cdot c_1$} (2); (2) edge[bend left] node [below] {$\lambda \cdot c_2$} (1); \end{tikzpicture} \end{figure} ... ... @@ -300,8 +300,8 @@ statement with a condition on $R$ might be helpful: \node[main node, fill=black!20,] (11) [below of=10] {N}; \node[main node, fill=black!20,] (12) [below of=11] {N}; \path[every node/.style={font=\sffamily\small}] (1) edge[bend left] node [above] {$c_\alpha$} (2) \path[every node/.style={font=\sffamily\small}]; (1) edge[bend left] node [above] {$c_\alpha$} (2); (2) edge[bend left] node [below] {$c_\beta$} (1); \end{tikzpicture} \end{figure} ... ...
 ... ... @@ -33,9 +33,9 @@ their preferred crop: \node[main node] (2) [right of=hidden] {V}; \node[main node] (3) [above of=hidden] {W}; \path[every node/.style={font=\sffamily\small}] (1) edge[left] node [above] {$b$} (2) (2) edge[left] node [above] {$c$} (3) \path[every node/.style={font=\sffamily\small}]; (1) edge[left] node [above] {$b$} (2); (2) edge[left] node [above] {$c$} (3); (3) edge[right] node [above] {$a$} (1); \end{tikzpicture} \end{figure} ... ...