In this blog post, I will explore an application of probability in the realm of approximation algorithms. The material here is from Chapter 3 of in Roman Vershynin’s textbook on High Dimensional Probability.

Often, the problems that we want to solve are hard in a computational sense, i.e. we do not have algorithms to solve these problems in polynomial time. An example of such a problem is an *integer optimization problem*:

In general, finding an optimal solution in a *discrete space* is difficult, as one has to try an exponential amount of possibilities. In the above example, for general \(A_{ij}\) we would have to try all \(2^n\) possibilities for \(x_i\), which is very time consuming!

## Semidefinite Relaxations

Our strategy for solving these hard computational problems is to *relax* them into easier problems. Of course, once we make these problems simpler, the solutions we get will not be optimal for the original problem, but often we can give *guarantees* on the performance of our relaxation.

In particular, sometimes we will try to turn these hard computation problems into semidefinite programs (SDPs). Recall that an SDP can be written in the form:

\[\text{maximize} \left< A, X\right> : X\succeq 0, \left< B_i,X\right> =b_i\]where \(\left<X,Y\right>\) denotes the matrix inner product.

SDPs can be solved with convex optimization methods, i.e. in polynomial time. (A good text on convex optimization is the classic Convex Optimization.)

## MAX-CUT

Let’s illustrate this with an example. Lets say I give you a graph \(G=(V,E)\) a set of vertices and edges. It’s your job to partition the vertices into two sets \(A\) and \(B\) such that the number of edges between the two sets is maximal.

Easy enough? It turns out that in general, this problem, known as MAX-CUT is NP-hard. Let’s try to formalize it mathematically.

For a graph \(G=(V,E)\), arbitrarily label the vertices \(1,2,...n\). Then define the adjacency matrix \(A\), where \(A_{ij} = 1\) iff there is an edge between vertex i and vertex j. (By convention, \(A_{ii} = 0\)). Let us describe our partition as the variable \(x_i = \pm 1\), with the \(+1\) values denoting the first set, and \(-1\) values denoting the second set. (Our formulation will be symmetric, so it doesn’t matter if the first set is \(+1\) or \(-1\).) Then for a given labeling \(x = [x_1, x_2, ..., x_n]\), lets define:

\[\text{CUT}(G,x) = \frac{1}{2} \sum_{i,j:x_ix_j = -1} A_{ij}.\]Here, we are summing up all the edges where \(x_i\) and \(x_j\) have opposite labelings, and dividing by 2 so we don’t count edges twice.

Then we can rewrite this as:

\[\text{CUT}(G,x) = \frac{1}{4} \sum_{i,j} A_{ij}(1-x_ix_j).\]Note that here we are summing over all \(i, j\). The inner sum is equal to 0 when \(x_ix_j=1\), and equal to 2 when \(x_ix_j = -1\), so they are equivalent!

MAX-CUT is defined as:

\[\text{maximize } \text{CUT}(G,x) \text{ over all } x_i = \pm 1.\]### A 0.5-Approximation

Lets start with a simple solution to this problem: for every vertex in E, flip a fair coin. If heads, let \(x_i = 1\), and if tails, \(x_i=-1\).

How does it do? Let’s look at the *expectation* of CUT\((G,x)\).

So we expect to get a cut with size \(\frac{1}{2} \lvert E\rvert\). But \(\lvert E\rvert\) is trivially an upper-bound on \(\text{MAX-CUT}(G)\), so in this case, our algorithm gives us a solution that is \(\approx 0.5 \text{MAX-CUT}(G)\).

You might note that this is only true in *expectation* - if you do this once, you aren’t guaranteed to get a cut thats \(\ge \text{MAX-CUT}(G)\). But it turns out, you can argue that if you do this \(N\) times, eventually you’ll get a cut that gives you this guarantee with high probability.

### A 0.878-Approximation

A 0.5-approximiation really isn’t that good. Let’s see if we can try to improve it. Below, I outline a method due to Goemans and Williamson in their seminal paper in 1995. Here, we will finally use the “semidefinite relaxation” we’ve been talking about.

Recall that our MAX-CUT problem was:

\[\text{maximize} \frac{1}{4}\sum_{i,j} A_{ij} (1-x_ix_j): x_i = \pm 1.\]The issue here is that \(x_i\) are discrete, which is hard to optimize over. Instead, let us represent each \(x_i\) as an \(n\)-dimensional unit vector \(X_i\):

\[\text{SDP(G)} := \text{maximize} \frac{1}{4} \sum_{i,j} A_{ij} (1-\left< X_i,X_j \right>): \left\lVert X_i \right\rVert_2 = 1\]If we think about the matrix \(X\) where each entry \(X_{ij} = \left< X_i,X_j \right>\), it’s not too hard to see that this is a SDP.

So we can efficiently get some solution \(X^*_{i}\in\mathbb{R}^n, i=1,2...,n\). But now we have an \(n\)-dimensional vector solution for each \(x_i\) - how do we translate this back into solutions of the form \(x_i = \pm 1\)?

We will use **randomized rounding**: pick a random \(g\sim\mathcal{N}(0,I_n)\), and for each \(X_i\), let \(x_i \leftarrow \text{sign}\left<g, X_i\right>\). Essentially, we are seeing if our \(X_i\) is at an angle \(< \pi\) with the random vector \(g\).

Let’s try to evaluate the expected size of the cut we get:

\[\mathbb{E} \text{CUT}(G,x) = \frac{1}{4}\sum_{i,j}A_{ij}(1-\mathbb{E}\left[ \text{sign} \left<g, X_i \right>\text{sign} \left<g, X_j \right>\right])\]Here we use an identity:

**Lemma.** For \(g\sim\mathcal{N}(0,I_n)\), any \(u,v\in\mathbb{R}^n\) with \(\left\lVert u\right\rVert_2=\left\lVert v\right\rVert_2=1\):

Furthermore,

\[\begin{align} 1-\mathbb{E}\left[ \text{sign} \left<g, u\right>\text{sign} \left<g, v \right>\right] &= 1-\frac{2}{\pi} \arcsin\left<u,v\right>\\ &= \frac{2}{\pi} \arccos\left<u,v\right>\\ &\ge 0.878 (1-\left<u,v\right>) \end{align}\]**Proof.**

Let \(\alpha\) be the angle between \(u\) and \(v\). Then, if we consider the hyperplane to which \(u\) and \(v\) belong, a random vector “cuts” \(u\) and \(v\), giving them different sign, with probability \(\frac{\alpha}{\pi}\) (We are able to do this because of the “rotation invariance” property of g.) Then we have:

\[\begin{align}&= 1-\frac{\alpha}{\pi} - \frac{\alpha}{\pi} \\ &= 1 - 2\frac{\alpha}{\pi}\\ &= 1-\frac{2}{\pi} \arccos{\left<u,v\right>}\\ &= \frac{2}{\pi}\arcsin\left<u,v\right>.\end{align}\]because \(u, v\) are unit.

The inequality follows from the fact that:

\[1-\frac{2}{\pi}\arcsin t \ge 0.878(1-t).\]which can be numerically verified.

Applying our lemma, we have:

\[\begin{align} \mathbb{E} \text{CUT}(G,x) &= \frac{1}{4}\sum_{i,j}A_{ij}(1-\mathbb{E}\left[ \text{sign} \left<g, X_i \right>\text{sign} \left<g, X_j \right>\right]) \\ &= \frac{1}{4}\sum_{i,j} A_{ij} (1-\frac{2}{\pi} \arcsin \left<X_i, X_j \right>) \\ &\ge 0.878 \times \frac{1}{4}\sum_{i,j} A_{ij} (1- \left<X_i, X_j \right>).\\ &= 0.878 \text{ SDP}(G) \end{align}\]So on *expectation*, our randomized rounding step gives us a solution that is lower bounded by the numerical max we got from the SDP relaxation. In addition, we also have the bound that:

This can be shown by the fact that a solution for MAX-CUT can be converted into a (non-optimal) solution for SDP; therefore the optimal solution for SDP must have larger functional value.

Hence, we have found a 0.878-approximation to our MAX-CUT problem!