EECS 477 Notes
Table of Contents
- 1. Final Exam Prep
- 2. 2015-12-04 Friday
- 3. 2015-12-02 Wednesday
- 4. 2015-11-30 Monday
- 5. 2015-11-23 Monday
- 6. 2015-11-20 Friday
- 7. 2015-11-18 Wednesday
- 8. 2015-11-16 Monday
- 9. 2015-11-11 Wednesday
- 10. 2015-11-09 Monday
- 11. 2015-11-06 Friday
- 12. 2015-11-04 Wednesday
- 13. 2015-11-02 Monday
- 14. 2015-10-30 Friday
- 15. 2015-10-28 Wednesday
- 16. 2015-10-26 Monday
- 17. 2015-10-21 Wednesday
- 18. 2015-10-16 Friday
- 19. 2015-10-14 Wednesday
- 20. 2015-10-12 Monday
- 21. 2015-10-09 Friday
- 22. 2015-10-07 Wednesday
- 23. 2015-10-05 Monday
- 24. 2015-10-02 Friday
- 25. 2015-09-30 Wednesday
- 26. 2015-09-28 Monday
- 27. 2015-09-25 Friday
- 28. 2015-09-23 Wednesday
- 29. 2015-09-21 Monday
- 30. 2015-09-18 Friday
- 31. 2015-09-16 Wednesday
- 32. 2015-09-14 Monday
- 33. 2015-09-11 Friday
- 34. 2015-09-09 Wednesday
- 35. 2015-09-02 Wednesday
- 36. 2015-08-31 Monday
- 37. 2015-08-28 Friday
- 38. 2015-08-26 Wednesday
- 39. 2015-08-24 Monday
1 Final Exam Prep
As discussed on Friday, the final exam will cover topics from the second half of the class. As prep work, I will put here a general listing of topics from the second half, as well as internal links to notes about them.
Topics (most recent first):
- Paging: First Day, Second Day
- Amortized Analysis: First Day (mostly introduction), Second Day
- Max Cut: Lecture. The second day was the Wednesday before Thanksgiving; I was not present.
- Maximum Satisfiability: First Day, Second Day
- Hitting Set: First Day (definition, 2 algorithms), Second Day (repeated execution and analysis of combined algorithm).
- Game Tree Evaluation: First Day, Second Day, Third Day
- Two Person Zero Sum Games: First Day, Second Day, Third Day
- Game Theory Intro: First Day, Second Day, Third Day
- Mixed Strategies (and the penny game)
- Strategic Form
- Behavioral/Extensive Form
- MCNF (some of this was before midterm, but oh well):
- First Day (definition, high level stuff)
- Second Day (transformations, residuals, LP, unimodularity)
- Third Day (duality)
- Fourth Day (duality again)
- Fifth Day (applications to shortest paths, dynamic programming, circulation)
- (all MCNF above this point was "pre-midterm", now for "post-midterm", i.e. the algorithms)
- Sixth Day (assumptions of algorithms, SSP)
- Seventh Day (Runtime of SSP, Capacity Scaling Algorithm)
- Eighth Day (CS algorithm, plus runtime)
- Max Flow == Min Cut
2 2015-12-04 Friday
2.1 Paging Continued
- \(n\) pages in the process's working set; main memory holds \(k\) pages.
- Process makes requests for pages, if not in main memory, we need to read from disk and maybe evict a page from main memory.
- Need to come up with a strategy for evicting pages that will minimize page faults.
- If we knew all the page requests ahead of time, there would be a relatively simple algorithm that could be applied to minimize page faults.
- Problem: we don't know page requests ahead of time. So we can think of this as an adversarial game.
- When \(k < n\), an adversary can always request a page that is not currently in main memory, forcing any deterministic algorithm to fault on every request.
- So, it seems like there's not much we can do with a deterministic algorithm for the general case.
We would like to come up with an approximation algorithm and do an amortized analysis.
- \(c(G)\): number of page faults of algorithm \(G\) on sequence \(\sigma\).
- OPT: optimum page replacement strategy on \(\sigma\).
- \(c(G) \le \alpha \cdot c(OPT) + \beta \:\:\: \forall \sigma \)
- Need an online algorithm (i.e. it has no future knowledge). OPT has the advantage of future knowledge.
2.1.1 Marking Algorithm
- Associates a bit with every page in main memory. Initially, all \(k\) slots are unmarked.
- When page \(p\) is requested:
- if \(p \not\in\) main memory:
- if all pages are marked, unmark all
- evict any unmarked page
- replace it with \(p\)
- mark \(p\)
Example: \(k = 3\)
Initial main memory (none are marked):
1 2 3 Request page 4. 2 is arbitrarily chosen to be evicted
1 4* 3 Request page 2. 1 is arbitrarily chosen to be evicted
2* 4* 3 Request page 4. Nothing changes
2* 4* 3 Request page 1. 3 is the only page that can be evicted.
2* 4* 1* Request page 3. All pages must be unmarked, and then 4 is arbitrarily chosen to be evicted.
2 3* 1
Ok. Looks pretty darn simple. This marking algorithm tries to emulate LRU (Least Recently Used), but with only one bit of memory to store how recently used. Or you could think of it the other way, that M is a generalization of LRU, because M has a choice to evict "any" page, and LRU specifies that the page it evicts is the least recently used.
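To make this concrete, here is a small simulation sketch in Python (my own illustration, not from lecture; following the example above, a hit leaves the marks unchanged, though some presentations of the marking algorithm re-mark a page on every request):

```python
import random

def marking_faults(requests, k):
    """Simulate the marking algorithm with k memory slots; return # faults."""
    marked = {}                        # page -> True if marked, False otherwise
    faults = 0
    for p in requests:
        if p in marked:
            continue                   # hit: nothing changes (as in the example)
        faults += 1                    # miss: page fault
        if len(marked) == k:           # memory is full, must evict
            if all(marked.values()):   # all pages marked: unmark everything
                marked = {q: False for q in marked}
            unmarked = [q for q, m in marked.items() if not m]
            del marked[random.choice(unmarked)]   # evict any unmarked page
        marked[p] = True               # load p and mark it
    return faults
```

Feeding it the example's requests after a warm-up (`marking_faults([1, 2, 3, 4, 2, 4, 1, 3], 3)`) gives three cold-start faults plus the same four faults traced above, though the randomly chosen victims may differ.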
For the purpose of analysis, we will define a "phase". \(P_0\) starts at the beginning of the request sequence, and each \(P_i\) is a maximal subsequence of requests containing \(k\) distinct pages.
Theorem: \(c(M) \le k c(OPT) + k\)
Proof: In each phase, \(M\) incurs no more than \(k\) page faults (once a page is marked, it stays in main memory for the rest of the phase).
For the purpose of analyzing the optimal strategy, we will call the first page requested in the phase \(q\). We will define \(Q\) to be the remaining page requests of the phase, plus the next one. \(Q\) contains exactly \(k\) distinct pages that are not equal to \(q\). By the end of \(Q\), OPT will have to incur a page fault (because \(q\) is in main memory, and it's impossible to have all \(k\) of the other distinct pages in \(Q\)).
The sequence of page requests will contain some number \(m\) of complete phases, possibly followed by a partial phase. So there are \(m\) complete phases for the marking algorithm, and in the worst case there are \(m-1\) regions \(Q\). In each complete phase, the M algorithm incurs cost \(\le k\), while in each region \(Q\) the optimal incurs cost \(\ge 1\). In the final phase, the M algorithm incurs cost \(\le k\), while the optimal may or may not incur any cost.
So, we can easily see that \(c(M) \le k c(OPT) + k\)
Paging has tons of interesting "confluences" of concepts from this class:
- primal dual algorithms
- randomized algorithms
- amortized analysis
- Yao's principle
- more
Sadly, we didn't get a chance to cover it all.
2.2 Final
He will be asking questions about the second part of the course (after the midterm). You will be pretty screwed if you don't know LP anymore, but there won't be any explicit questions on the content of the first half. The exam will be open book and note (!!). Question types:
- Explain (insert some topic/algorithm/proof).
- Open book, so he may ask questions about the proof to show that you do understand the proof, and aren't just reading the proof for the first time.
- Simple problems - as in do thing x.
- Homework problems may be asked verbatim! This could be a "personalized" strategy, so you should look at what you did wrong in the homework and make sure you understand what you did wrong.
The rules about what you can bring are mostly guidelines, cause you'll be solving problems in front of (and with) him.
3 2015-12-02 Wednesday
3.1 Amortized Analysis (continued)
3.1.1 Aggregate Analysis
We were doing amortized analysis of the binary counter. Over all \(2^n\) increments, you'll get the following costs:
- Cost 1 for \(2^{n-1}\) increments
- Cost 2 for \(2^{n-2}\) increments
- Cost 3 for \(2^{n-3}\) increments
- …
- Cost \(n\) for \(2^{n-n}\) (\(=1\)) increments
So the average cost is:
\begin{align*} \text{avg. cost} = \frac{1}{2^n} \sum_{i=1}^n 2^{n-i} i \end{align*}
Actually, the cost-\(n\) action happens twice. Let's do this in terms of bits:
- Bit 1 examined \(2^n\) times
- Bit 2 examined \(2^{n-1}\) times
- …
- Bit \(n\) examined 1 (+1) times
So the total and average costs are:
\begin{align*} \text{total cost} &= 1 + \sum_{i=0}^n 2^i = 2^{n+1} - 1 + 1 = 2^{n+1} \\ \text{average cost} &= \frac{2^{n+1}}{2^n} = 2 \end{align*}
3.1.2 Accounting Method
We associate costs with actions on the data structure.
- When you increment the counter from 0, you travel to the first bit, and flip it. You leave a "candy". Your cost is 1 bit traveled plus 1 candy placed = 2.
- When you increment from 1 to 2, you travel to the first bit and flip it to zero, and eat that candy. Then, you travel to the second bit, flip it to one, and leave a candy there. Your cost is 2 travel, 1 candy placed, and 1 candy eaten, for a total of 2.
- When you increment from 2 to 3, you travel to the first bit, flip it to one, and leave a candy. Your cost is 1 travel, 1 candy, for a total of 2.
- When you increment from 3 to 4, you travel to the first bit, flip it to 0, eat the candy. Same thing at the second. Finally, at the third, you flip it to one, and leave a candy. Your cost is 3 travel, 1 candy placed, and 2 candy eaten, for a total of 2.
This pattern continues: no matter what increment you do, you will always have a cost of (at most) 2, which happens to be the same as we got above.
How does this work? The idea is that when you flip a bit to 1, you also account for the fact that later on, you will be flipping it to zero (this is the act of leaving a candy). Then, when you actually do that operation later on, you "eat the candy", or in other words you don't have to pay that cost. This is actually really cool.
3.1.3 Potential Function Method
- Data structure \(D_i\): state after the \(i\)th operation.
- \(\Phi\): maps the state of the data structure → real numbers. This is a potential function.
- \(c_i\): cost of the \(i\)th operation
The total cost of all operations on the data structure (if there are \(m\) operations) is simply:
\begin{align*} \text{total cost} = \sum_{i=1}^m c_i \end{align*}
Now, we define \(\hat{c}_i\) to be the "amortized" cost of the \(i\)th operation. We define it as:
\begin{align*} \hat{c}_i = c_i - \Phi(D_{i-1}) + \Phi(D_i) \end{align*}
The sum of the amortized costs is:
\begin{align*} \sum_{i=1}^m \hat{c}_i &= \sum_{i=1}^m (c_i - \Phi(D_{i-1}) + \Phi(D_i)) \\ &= \sum_{i=1}^m c_i - \Phi(D_0) + \Phi(D_m) \\ &\ge \sum_{i=1}^m c_i \end{align*}
The last inequality holds whenever \(\Phi(D_m) \ge \Phi(D_0)\). By convention, most papers set \(\Phi(D_0) = 0\) and ensure that \(\Phi(D_i) \ge 0\) at all times.
With this formulation, we can come up with any potential function we like (that satisfies the above condition), and then sum up the amortized costs instead of the actual costs, and get a bound on the total cost.
In the binary counter, we let the potential function be the number of 1's in the counter. We can see that the value of \(\hat{c}_i\) is always 2.
In this case, the potentials are pretty much the same as the candies in the Accounting Method. This is not always the case; you can view the potential method as more general than the accounting method, because it can be used in situations where there's no clear way to "assign candies" in strategic locations like we did earlier.
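A quick sketch to check the arithmetic (my own code, not from lecture): increment an \(n\)-bit counter, compute each operation's amortized cost with \(\Phi =\) the number of 1's, and observe it is always exactly 2.

```python
def increment(bits):
    """Increment a little-endian binary counter; return the actual cost (# flips)."""
    cost, i = 0, 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0        # flip a trailing 1 to 0: "eat a candy"
        cost += 1
        i += 1
    if i < len(bits):
        bits[i] = 1        # flip the first 0 to 1: "leave a candy"
        cost += 1
    return cost

bits, phi_prev = [0] * 8, 0
for _ in range(2 ** 8 - 1):               # stop just before overflow
    c = increment(bits)
    phi = sum(bits)                       # potential = number of 1's
    assert c - phi_prev + phi == 2        # c_i - Phi(D_{i-1}) + Phi(D_i)
    phi_prev = phi
```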
3.2 Paging Problem
This problem has to do with memory management in operating systems.
- We have space in main memory for \(k\) pages.
- We have a process that has a working set of \(n\) pages.
- The \(n-k\) pages that are not stored in main memory are stored safely on disk.
We're going to view the process as an adversary that requests pages out of its working set. If the OS has it in main memory, nothing needs to happen. Otherwise, it should be loaded from the disk into main memory. The other problem is that the OS will (probably) have to evict a page from main memory to make room for the page.
The OS would like to maximize the number of hits, or minimize the number of page faults (where it loads the page from disk).
Our worst case analysis would be simple - in the worst case, there is a page fault on every request, so we have a cost of 1 disk access for every page request.
We can see that a randomized algorithm could prevent the worst case for a process. But maybe we'd like to do an amortized analysis for a deterministic algorithm. Next time we'll talk about that.
4 2015-11-30 Monday
4.1 Homework 5 recap
Here is a nicer overview of the steps from the capacity scaling algorithm, for when you're actually stepping through it (instead of the algorithm as presented earlier, which is a bit difficult to interpret at times).
For each delta scaling phase:
- Copy over the residual network (nodes and arcs).
- Copy over the node potentials (\(\pi\)) from the previous iteration, as well as the reduced costs, upper bounds, and excesses.
- Any new edges (not in the previous residual) should be drawn and have their reduced costs computed. Chances are, some of these will have negative reduced costs.
- You do arc reversal on those arcs, which will change the excesses. This is a good time to redraw the residual (at least, this is when Dr. Liberatore redraws it).
- Select an excess vertex and a deficit vertex to do an augmentation on. You will then recompute the node potentials (\(\pi\)) and the reduced costs (\(c_{ij}^\pi\)). Also make sure to add in the reversed edges.
- Repeat this until all your excesses and deficits are less than \(\Delta\).
It seems like the other thing I missed was that you should assume that there is a path from 1 to 8, of very large cost. In his example (which he started last class), it looks like he made the cost 15. I'll have to get some input from Jeff on how he did that in class last Wednesday.
4.2 Amortized Analysis
Binary counter: suppose you have an \(n\) bit counter. Initially, it is set to all 0's. We will pretend this counter is a data structure that supports only one operation, which is increment. This does your typical increment operation (flip the bit and carry if it was 1). An increment operation has a cost defined by the number of bits you flip.
The worst case cost for an operation is going to be \(n\). However, it looks like most of your costs are going to be much smaller. Half of them will have cost 1. So amortized analysis will be a way of doing worst-case analysis that involves finding the cost over an entire sequence of operations.
5 2015-11-23 Monday
5.1 Max Cut
- Graph \(G=(V, E)\)
- Def: A cut is a partition of \(V\) into \(S\) and \(V-S\).
- Def: An edge \(e = \{u,v\}\) crosses cut \(S, V-S\) iff \(u \in S\), \(v \not\in S\).
- Let \(a_{ij}\) be the weight (≥ 0) associated with edge \(\{i,j\}\).
- The problem is to find a cut that maximizes \(\sum_{i,j\text{ crossing cut}} a_{ij}\)
This problem is NP-hard, so we'd like to find an approximation algorithm. Note a few things. First, the max cut of a bipartite graph is just the cut that separates the two sets of vertices. Also, in general, we can assume that we are working with the "complete" graph, with weight 0 on the edges that were not in the original graph.
First Claim: there is a randomized algorithm with an approximation ratio of \(\frac{1}{2}\).
This algorithm simply assigns each node to \(S\) with probability \(\frac{1}{2}\). The probability that any given edge crosses the cut is \(\frac{1}{2}\), so the expected weight of the cut is \(\frac{1}{2} \sum a_{ij}\), which is at least half of the optimum (the optimum is at most \(\sum a_{ij}\)).
Now, let's look at the integer linear program. First, our variables:
\begin{equation} x_i = \left\{\begin{array}{ll} +1 & \text{ if } i \in S \\ -1 & \text{ if } i \not\in S \\ \end{array}\right. \end{equation}
Here's the "program":
\begin{align*} \max \sum_{i < j} a_{ij} \frac{1 - x_i x_j}{2} \end{align*}
This program is a bit of a downer because it's quadratic, not linear. We're going to stick with it, but relax the integer condition on \(x\) in a very weird way.
Say that instead of each \(x_i\) being a scalar, we used \(\vec{v}_i\), a vector on the unit sphere in \(R^n\). If two vectors are opposite each other, then their dot product will be -1, and if they are the same, the dot product will be 1. So \(\vec{v}_i \cdot \vec{v}_j\) plays the role of \(x_i x_j\), with the \(\pm 1\) ("integer") constraint relaxed.
Okay. Let's now say we have \(B = (v_1, v_2, \dots, v_n)\). Let \(y = B^T B\):
\begin{equation} y = B^T B = \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_n^T \end{bmatrix} \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} v_1 \cdot v_1 & v_1 \cdot v_2 & \cdots & v_1 \cdot v_n \\ v_2 \cdot v_1 & v_2 \cdot v_2 & \cdots & v_2 \cdot v_n \\ \vdots & \vdots & \ddots & \vdots \\ v_n \cdot v_1 & v_n \cdot v_2 & \cdots & v_n \cdot v_n \\ \end{bmatrix} \end{equation}
So now our objective is \(\max \sum_{i\lt j} a_{ij} \frac{1 - y_{ij}}{2}\). I guess we need that the matrix is positive semidefinite. So I guess we have a semidefinite program. We can solve this SDP to within some \(\epsilon\) in time polynomial in \(n\) and \(\log(1/\epsilon)\). Since \(\epsilon\) is a constant here, we're good. This will give us vectors \(v_i^*\).
Our job will be to separate the vectors into two groups. Turns out we'll just select a random hyperplane. Woo randomness.
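As a sketch of that rounding step (assuming the solved SDP hands us the vectors \(v_i^*\) as columns of a matrix; the SDP solve itself is not shown):

```python
import numpy as np

def round_with_random_hyperplane(V):
    """V: d x n matrix whose columns are the unit vectors v_i* from the SDP.
    Returns the set S of the cut; the remaining vertices form the complement."""
    d, n = V.shape
    r = np.random.randn(d)                    # normal of a random hyperplane
    return {i for i in range(n) if r @ V[:, i] >= 0}
```

Vertices whose vectors land on the same side of the random hyperplane end up on the same side of the cut.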
6 2015-11-20 Friday
6.1 MAX SAT
Recall:
- "Maximum Satisfiability"
- \(n\) boolean variables, \(x_1, \dots, x_n\).
- \(p\) clauses, \(c_1, \dots, c_p\)
- \(c_i = x_1 \lor \bar{x_2} \lor x_3\) (for example)
Continuing on:
- Let \(k(i)\) be the number of literals in clause \(i\).
- Let \(S_i^+ =\) the set of \(x\) variables that are not negated.
- For the example \(c_i\) above, \(S_i^+ = \{x_1, x_3\}\)
- Let \(S_i^- =\) the set of \(x\) variables that are negated.
- For the example \(c_i\) above, \(S_i^- = \{x_2\}\)
In max sat, we want to maximize the number of clauses that are satisfied. Our initial randomized algorithm (which we applied when \(k(i) \ge k = 2\)) assigned each variable the value true with probability \(\frac{1}{2}\). This gave a \(\frac{3}{4}\) approximation algorithm, which is tough to beat. But the performance is bad when \(k=1\).
So now we'll examine an algorithm based on a linear program. Let's use the following notation:
\begin{equation} y_j = \left\{\begin{array}{ll} 1 & \text{ if } x_j = T \\ 0 & \text{ otherwise} \end{array}\right. \end{equation}
Now, here is the working ILP formulation:
\begin{align*} \max \sum_{i=1}^p z_i & \text{ s.t.} \\ \sum_{j \in S_i^+} y_j + \sum_{j \in S_i^-}(1-y_j) &\ge z_i \\ 0 \le z_i &\le 1 \\ y_j &\in \{0, 1\} \\ \end{align*}
Our \(z_i\) are variables such that:
\begin{equation} z_i = \left\{\begin{array}{ll} 1 & \text{ if } c_i = T \\ 0 & \text{ otherwise} \end{array}\right. \end{equation}
When we do the LP relaxation, \(y_j\) can range from 0 to 1. Then, our randomized algorithm will have \(Pr(x_j = T) = y_j^*\). So, the probability that a clause is satisfied is:
\begin{align*} Pr(C_i = T) &= 1 - \left(\prod_{j \in S_i^+} (1-y_j^*)\right) \left(\prod_{j \in S_i^-} y_j^*\right) \\ &\ge \left(1 - \left(1 - \frac{1}{k(i)}\right)^{k(i)}\right)z_i^* \\ &\ge \left(1 - \frac{1}{e}\right)z_i^* \end{align*}
So, the expected number of clauses satisfied is \(\sum_{i=1}^p \left(1 - \frac{1}{e}\right)z_i^*\). The best you can do is \(\sum_{i=1}^p z_i^*\), so the approximation ratio is \(\left(1 - \frac{1}{e}\right) \approx 0.63\).
This isn't as good as the original algorithm, which had an approximation ratio of 0.75. But it deals better with the cases of small \(k\). Of course, the downside is that we have to solve a linear program. Which kinda stinks. So our best algorithm is just to run both and pick the best.
OK, what can we say about the expected performance of this "best" algorithm? Well, the algorithm has a performance of \(\max\{n_1, n_2\}\), where the two \(n\) variables are the performance of the two algorithms. Expectations of max's are very hard, so we will just say that:
\begin{align*} \max\{n_1, n_2\} &\ge \frac{n_1 + n_2}{2}\\ &= \frac{1}{2} \left(\sum_{i=1}^p \left(1 - \frac{1}{2^{k(i)}}\right) + \sum_{i=1}^p \left(1 - \left(1 - \frac{1}{k(i)}\right)^{k(i)}\right)z_i^*\right) \\ &\ge \frac{1}{2} \sum_{i=1}^p z_i^* \left[\left(1 - \frac{1}{2^{k(i)}}\right) + \left(1 - \left(1 - \frac{1}{k(i)}\right)^{k(i)}\right)\right]\\ &\ge \frac{1}{2} \sum_{i=1}^p z_i^* \cdot \frac{3}{2} \\ &= \frac{3}{4} \sum_{i=1}^p z_i^* \\ \end{align*}
That step up above, where we replaced the bracketed sum with \(\frac{3}{2}\), can actually be proven easily by induction on \(k\). It's a pretty darn cool result, because it's not true in general that the expectation of a max is just the max of the respective expectations. It works out this way because the two algorithms make up for each other's weaknesses (\(k=1\)).
7 2015-11-18 Wednesday
7.1 "Math Facts"
Fact #1:
\begin{equation} 1 - x \le \left(1 - \frac{x}{k}\right)^k \le e^{-x} \:\:\: \forall x \in R, \forall k \in Z^+ \end{equation}
- in the limit as \(k\to\infty\), the right side is equality.
- when \(k=1\), the left side is equality.
Fact #2:
Let \(x_1, x_2, \dots, x_n \in [0,1]\), s.t. \(\sum_{i=1}^n x_i \ge \zeta \ge 0\). Then:
\begin{equation} \prod_{i=1}^n (1-x_i) \le \left(1 - \frac{\zeta}{n}\right)^n \end{equation}
We'll "sketch" the proof without fully doing it. Let's look at the case \(n=2\), \(\zeta=1\). So, we have:
- \(x_1 + x_2 \ge 1\)
- Or, \(x_2 \ge 1 - x_1\).
The expression we're interested in is:
\begin{align*} (1 - x_1)(1 - x_2) \le x_2(1-x_2) = f(x_2) \end{align*}
You can maximize \(f\) by looking at its derivative, which will give you \(x_2 = \frac{1}{2} = x_1\). So, plugging that back in:
\begin{align*} (1 - x_1)(1 - x_2) \le x_2(1-x_2) \le \left(1 - \frac{1}{2}\right)^2 \end{align*}
Tada! You can do this generally by induction on \(n\), but we won't.
7.2 Back to Hitting Set
Here's the ILP relaxation of the hitting set problem:
- \(\min \sum_{e \in E} c_e x_e\), (we'll call this \(c(x)\)) s.t.
- \(\sum_{e \in T_i} x_e \ge 1\)
- \(0 \le x_e \le 1\)
Earlier, we interpreted \(x_e\) as \(Pr[e\in A]\). Then, we defined a randomized algorithm to "solve" this problem. Except it didn't solve it, because its solutions weren't necessarily feasible. But hey, it performed well! So, let's look at the probability that this algorithm produces an infeasible solution. We'll start by looking at the probability that a single set \(T_i\) is "covered":
\begin{align*} Pr[T_i \cap A \ne \emptyset] &= 1 - Pr[\text{no member of } T_i \text{ is in } A] \\ &= 1 - \prod_{e \in T_i} Pr[e \not\in A] \\ &= 1 - \prod_{e \in T_i} (1 - x_e^*) \end{align*}
For each \(T_i\), we have \(\sum_{e \in T_i} x_e^* \ge 1\). Which means that we can say:
\begin{align*} Pr[T_i \cap A \ne \emptyset] &= 1 - \prod_{e \in T_i} (1 - x_e^*) \\ &\ge 1 - \left(1 - \frac{1}{|T_i |}\right)^{|T_i |} \\ &\ge 1 - \frac{1}{e} > 0 \end{align*}
Now, suppose we repeat the randomized algorithm \(t\) times, and let \(e\in A\) iff \(\exists\) one trial in which \(e\in A\). Looking at the expressions above, we know that then:
\begin{align*} Pr[T_i \cap A = \emptyset] \le \frac{1}{e^t} \le \frac{1}{cp} \end{align*}
That last bit holds if we choose \(t \in \Theta(\log p)\). Now, let's define an indicator random variable \(\gamma_i\) which is 1 when \(T_i\) is not covered. Since it's an indicator RV, its expectation is just the probability of its event, which means:
\begin{align*} E[\gamma_i ] \le \frac{1}{cp} \end{align*}
Now, we can talk about the expected number of sets not hit after \(t\) trials:
\begin{align*} E\left[\sum_{i=1}^p \gamma_i \right] \le \frac{1}{c} \end{align*}
This is true due to the linearity of expectation (it is not true that the \(\gamma_i\) are independent, but linearity doesn't require that).
So, repeating the algorithm can allow us to bound the number of sets not covered and make it arbitrarily small. But, this ability comes at a cost. Let's look at the expected cost of the new solution:
\begin{align*} E[c(A)] \le t \cdot c(x^*) \le t \cdot \text{OPTIMAL} \end{align*}
This leads to the theorem that this repeated randomized algorithm is an \(O(\log p)\)-apx algorithm, and it returns a hitting set with probability \(\ge 1 - \frac{1}{c}\).
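A minimal sketch of the repeated rounding (my own code; it assumes the LP optimum \(x^*\) has already been computed by a solver):

```python
import math, random

def repeated_rounding(elements, sets, x_star, c):
    """Round t = ceil(ln(c*p)) times, take the union of the rounded solutions."""
    p = len(sets)
    t = math.ceil(math.log(c * p))            # makes e^{-t} <= 1/(c*p)
    A = set()
    for _ in range(t):
        A |= {e for e in elements if random.random() < x_star[e]}
    uncovered = [T for T in sets if not A.intersection(T)]
    return A, uncovered                       # E[# uncovered] <= 1/c
```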
Let's summarize our algorithms for Hitting Set:
- Primal/Dual algorithm: apx \(\max |T_i |\) (for vertex cover, this is 2)
- both simple and deterministic
- Repeated Randomized Algorithm: apx \(O(\log p)\)
- also simple, but randomized and has a nonzero probability of failure.
7.3 Maximum Satisfiability
- Have \(n\) decision variables, \(x_1, x_2, \dots, x_n \in \{T, F\}\).
- Have \(c_i\), a bunch of clauses over \(x_1, \dots, x_n\).
- For example, this could be \(x_1 \lor x_2\), \(x_1 \lor \lnot x_3\), or \(\lnot x_1 \lor x_2 \lor \lnot x_3\).
- In general, a clause may only be a disjunction of variables (or their negations).
- We would like to find a truth assignment to \(x_1, x_2, \dots, x_n\) that maximizes the number of true clauses.
Let's imagine right now that each clause contains \(\ge k = 2\) literals. There is an incredibly easy approximation algorithm here: just assign \(x_i = T\) with probability \(\frac{1}{2}\). The probability of a clause not being satisfied is then at most \(\frac{1}{4}\). The expected number of clauses satisfied is at least \(\frac{3}{4}p\) (where \(p\) is the number of clauses), and you can't do better than \(p\). So this is a \(\frac{3}{4}\) apx algorithm. But it's not so good when \(k\) is 1.
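A sketch of that coin-flipping algorithm (the clause encoding is my assumption: a clause is a list of signed integers, \(+j\) for \(x_j\) and \(-j\) for \(\lnot x_j\)):

```python
import random

def random_maxsat(n, clauses):
    """Set each variable true w.p. 1/2; return the assignment and # satisfied."""
    assign = {j: random.random() < 0.5 for j in range(1, n + 1)}
    satisfied = sum(
        1 for clause in clauses
        if any((lit > 0) == assign[abs(lit)] for lit in clause)
    )
    return assign, satisfied
```

With every \(k(i) \ge 2\), each clause fails with probability \(2^{-k(i)} \le \frac{1}{4}\), so the expected count returned is at least \(\frac{3}{4}p\).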
8 2015-11-16 Monday
8.1 Hitting Set
- Given \(E\), \(c_e \ge 0 \:\: \forall e \in E\), \(T_1, T_2, \dots, T_p \subseteq E\)
- Find \(A \subseteq E\) w/ min \(c(A) = \sum_{e \in A} c_e\)
- such that \(A \cap T_i \ne \emptyset, \:\: (i=1,2,\dots,p)\)
- An example of this problem is vertex cover.
An ILP for this:
- \(x_e\) is 1 if \(e \in A\), or 0 otherwise.
- Minimize \(\sum_{e \in E} c_e x_e\) subject to covering constraints based on the \(T\) sets.
The linear relaxation would allow \(x_e\) to take any value between 0 and 1. The dual involves some variables \(y_i\) for \(i=1,2,\dots,p\). Complementary slackness conditions:
- \(y_i \left(\sum_{e \in T_i} x_e - 1\right) = 0\)
- \(x_e \left(\sum_{i: e\in T_i} y_i - c_e \right) = 0\)
- And we want our solutions to be integral.
Unfortunately, this problem is not unimodular, so we can't expect all the conditions to be satisfied. So I guess we're just going to try to enforce the bottom two conditions. Furthermore, there is something we get when we apply integrality to the second CS condition. I guess it's nice cause we know that the value of \(x\) must be 0 or 1, which makes the conditions a bit stronger.
8.2 Primal Dual Algorithm for HS
We start with an integral primal and a feasible dual, which are related by complementary slackness. In MCNF, we could start with \(x=0\) and \(\pi=0\).
- \(A \gets \emptyset\) i.e. x=0
- \(y_i \gets 0 \:\:\: (i=1,2, \dots, p)\)
- while \(\exists k: T_k \cap A = \emptyset\) (that is, while x is still infeasible)
- (comment) Change x so that it's slightly "less infeasible". (In MCNF, this was pushing flow from excess to deficit).
- (comment) Update y so as to ensure complementary slacknesss. (In MCNF, π ← π-d).
- Increase \(y_k\) until \(\exists e: \sum_{i: e\in T_i} y_i = c_e\).
- \(A \gets A \cup \{e\}\) (which means \(x_e=1\))
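A sketch of this loop in code (my own rendering; `sets` holds the \(T_i\) as element collections, `costs` maps each element to \(c_e\), and ties are broken arbitrarily):

```python
def primal_dual_hitting_set(costs, sets):
    """Return the hitting set A and the dual values y."""
    y = [0.0] * len(sets)
    slack = dict(costs)       # slack[e] = c_e minus sum of y_i over sets with e
    A = set()
    while True:
        unhit = [k for k, T in enumerate(sets) if not A.intersection(T)]
        if not unhit:
            return A, y
        k = unhit[0]
        e = min(sets[k], key=lambda f: slack[f])   # element that goes tight first
        delta = slack[e]
        y[k] += delta                              # raise y_k until e is tight
        for f in sets[k]:
            slack[f] -= delta
        A.add(e)                                   # tight constraint: x_e = 1
```

Tracing it on the rectangle-with-diagonal graph in the next subsection reproduces the two iterations described there.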
8.3 Example
We have a graph with nodes \(a,b,c,d\), which have costs \(2,3,4,5\) respectively. We have edges \((a,b), (a,c), (a,d), (b,c), (c,d)\). It's a rectangle with a diagonal. The \(y_i\) values exist for each edge, and they are all initialized to 0.
We start with the \(T_k\) associated with the edge \((a,c)\). We increase \(y_k\) (you might say instead \(y_{ac}\)) to 2, because this is the first value that will make a vertex's dual constraint tight. This makes vertex \(a\)'s constraint tight, so we can add \(a\) to the cover.
Next iteration, we see that there are still two edges (\((b,c)\) and \((c,d)\)) that haven't been covered yet. We will choose the \(T_k\) associated with \((b,c)\). When we increase \(y_{bc}\), the first value that makes a constraint tight is 2, making the constraint on \(c\) tight (since it already has 2 in its sum, and its cost is 4). So, we add \(c\) to the cover, which covers all the edges and terminates the algorithm.
8.4 Approximation
Theorem: Primal Dual is a max \(|T_i|\)-apx algorithm for HS.
Corollary: PD is a 2-apx algorithm for VC.
Proof of Theorem: The \(y_i\) values start at 0, never decrease, and remain dual feasible throughout (since \(\sum_{i: e\in T_i} y_i \le c_e\) always). \(e\) is added to \(A\) when \(\sum_{i:e\in T_i} y_i = c_e\).
\begin{align*} c(A) &= \sum_{e \in A} c_e = \sum_{e \in A} \sum_{i:e\in T_i} y_i \\ &= \sum_{i=1}^p y_i |A \cap T_i | \\ &\le \max_{1 \le i \le p} |T_i | \cdot \sum_{i=1}^p y_i \\ &\le \max_{1 \le i \le p} |T_{i}| \cdot c(y^*) \\ &\le \max_{1 \le i \le p} |T_{i}| \cdot c(x^*) \\ &\le \max_{1 \le i \le p} |T_{i}| \cdot c(x_{int}^*) \\ \end{align*}
8.5 Randomized Approximation Algorithm
LP:
- \(\min \sum_{e \in E} c_e x_e\), s.t.
- \(\sum_{e \in T_i} x_e \ge 1\)
- \(0 \le x_e \le 1\)
Randomized algorithm idea v0.1:
- Solve the linear relaxation above. This gets you an x.
- Add \(e\) to \(A\) with probability \(x_e\).
Sadly, the glaringly obvious issue here is that in general, a solution generated by this algorithm is not guaranteed to be feasible. But how would it perform? Well, it turns out that \(E[c(A)] = c(x^*) \le c(x_{int}^*)\). Which is really good. Like, potentially better than the actual solution to the ILP. Which seems problematic :P
The problem that needs to be solved is making this algorithm feasible. The approach will be to select a number \(t\) of trials, in each of which we choose whether to add \(e\) to \(A\). This will allow us to reduce the probability of infeasibility (which sadly increases the expected cost, but you gotta do what you gotta do). There will never be a point where the probability of infeasibility is 0, though.
9 2015-11-11 Wednesday
9.1 Game Tree Evaluation (cont)
Same problem setup - binary tree, alternating min/max, and 0/1 leaves. Recall from last time:
- Not proven, but intuitive: for any deterministic algorithm, I can find a labelling that will force the algorithm to visit every single leaf node.
- We viewed this as a game situation where the algorithm and the input were adversaries. In this view, a randomized algorithm could be viewed as a mixed strategy as opposed to any single deterministic (i.e. pure) strategy.
- Finally, we were attempting to quantify the expected worst case number of leaves visited, by defining it recursively. We're looking at a tree with "min" at the top of each round.
Expected visits for the max layer:
- If max evaluates to 0: then E[# leaves] = \(2W(k-1)\).
- If max evaluates to 1: then E[# leaves] \(\le \frac{3}{2} W(k-1)\).
Now we need to try to quantify \(W(k)\), so we look at the min layer (above the max layer).
- If min evaluates to 0: then at least one child max evaluates to 0.
- With at least 0.5 probability, the algorithm finds it:
- \(W(k) \le \frac{1}{2}\left[2W(k-1)\right] + \frac{1}{2}\left[2W(k-1) + \frac{3}{2}W(k-1)\right] = \frac{11}{4}W(k-1)\)
- This is \(< 3W(k-1)\)
- If min evaluates to 1: then we know both children evaluated to 1.
- \(W(k) \le 2 * \frac{3}{2} W(k-1) = 3 W(k-1)\)
In all cases, \(W(k) \le 3W(k-1) \le 3^k = 3^{\log_4 n} = n^{\log_4 3} \approx n^{0.793} = o(n)\). So by randomizing, we have improved from linear in the number of leaves to sublinear. Yay us.
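For reference, here is a sketch of the randomized pruned DFS being analyzed (my own encoding: a leaf is 0 or 1, an internal node is a `[left, right]` pair, and the root is a min node as in the analysis):

```python
import random

def evaluate(node, is_max=False):
    """Return (value, number of leaves examined) for a 0/1 min-max game tree."""
    if isinstance(node, int):
        return node, 1
    children = list(node)
    random.shuffle(children)              # randomize which subtree goes first
    value, leaves = (0 if is_max else 1), 0
    for child in children:
        v, n = evaluate(child, not is_max)
        leaves += n
        value = max(value, v) if is_max else min(value, v)
        if value == (1 if is_max else 0):
            break                         # short-circuit: prune the sibling
    return value, leaves
```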
Now, we would like to show that there is not a better randomized algorithm for this problem. To do this, we will use Yao's principle:
\begin{equation} \min_y \max_i H(i,y) \ge \min_j H(x,j) \:\:\: \forall x \end{equation}
In our case, \(H(i,y)\) is the expected number of leaves visited by randomized algorithm \(y\) on input tree \(i\). Similarly, \(H(x,j)\) is the expected number of leaves visited by deterministic algorithm \(j\) on a tree labelled by probability distribution \(x\).
What Yao's principle gives us here is that our best randomized algorithm has a lower bound determined by the minimum number of leaves expected to be visited by a deterministic algorithm on any input probability distribution \(x\). Interesting.
We will label each leaf 1 with probability \(p\) and 0 with probability \(1-p\), independently. Any choice of \(p\) gives a lower bound; the clever choice here is \(p=\frac{3-\sqrt{5}}{2}\), which satisfies \(p = (1-p)^2\).
Also, instead of doing mins and maxes, since we're guaranteed to have 0's and 1's, we know that max nodes are like or's, and min nodes are like ands. If we replace all nodes with nor's, we get an equivalent tree (viewed from the top). This is because \(\lnot(\lnot(a \lor b) \lor \lnot (c \lor d)) = (a \lor b) \land (c \lor d)\). Let's look at the truth table of a nor:
a | b | \(\lnot(a \lor b)\) | probability |
---|---|---|---|
0 | 0 | 1 | \((1-p)^2 = p\) |
0 | 1 | 0 | \(1-(1-p)^2 = 1-p\) |
1 | 0 | 0 | same |
1 | 1 | 0 | same |
Since Dr. Liberatore cleverly solved a quadratic equation, we have a nice property that each node in the whole tree is 1/0 with the exact same probability distribution: \(p\) and \(1-p\). Woah.
Ok, so now we're looking at a tree of depth two, min at the top, max at the next layer, leaves labeled left to right a,b,c,d. Let a(b,c,d) be the first (second, third, fourth) grandchild(ren) visited by a deterministic algorithm.
Let's look at cases:
Case | \(W(k)\) | Pr |
---|---|---|
\(a=0 \to b=0\) | \(2W(k-1)\) | \((1-p)^2 = p\) |
\(a=0 \to b=1 \to c=0 \to d=?\) | \(4W(k-1)\) | \((1-p)^2 p = p^2\) |
\(a=0 \to b=1 \to c=1\) | \(3W(k-1)\) | \(p^2(1-p)\) |
\(a=1 \to c=0 \to d=?\) | \(3W(k-1)\) | \(p(1-p)\) |
\(a=1 \to c=1\) | \(2W(k-1)\) | \(p \cdot p = p^2\) |
To find the expected value of \(W(k)\), we can just sum up each case times its probability, plug it into your favorite numerical/symbolic computing system, and get back an answer. Or, we could be lazy, and try to avoid that. Guess what we're going to do? Be lazy!
We know that the final recurrence will be of the form \(W(k) = c W(k-1)\). "Your book is fearless, and actually calculated the value of \(c\)." - Dr. Liberatore.
However, we see that 3 is sort of the "mean" value, and the probability of 2 is greater than the probability of 4. So we know that \(c<3\), but more than two. If you get the exact value of \(c\) as the book did, you would find that \(W(k) = \Omega(n^{0.694})\). The exponent is \(\log_4 c\), \(c \approx 2.62\). Our estimation was right on.
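Being lazy is fine, but the check is only a few lines (my own computation, using the corrected case table above):

```python
from math import log, sqrt

p = (3 - sqrt(5)) / 2           # chosen so that p = (1 - p)**2
c = (2 * p                      # a=0 -> b=0
     + 4 * p**2                 # a=0 -> b=1 -> c=0 -> d=?
     + 3 * p**2 * (1 - p)       # a=0 -> b=1 -> c=1
     + 3 * p * (1 - p)          # a=1 -> c=0 -> d=?
     + 2 * p**2)                # a=1 -> c=1
print(c, log(c, 4))             # ~2.618, exponent log_4(c) ~ 0.694
```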
This tells us that a randomized algorithm needs to examine at least that many nodes. Our randomized algorithm examines more, and it's an open problem to try to close the gap.
10 2015-11-09 Monday
10.1 Game Tree Evaluation
(continued)
Recall:
- We have a game tree, with internal nodes labeled "min" and "max" alternating layers. The leaves have payoff values of either 0 or 1.
- Some definitions:
- \(n\): number of leaves (\(=4^k\))
- \(k\): number of rounds
- \(h\): height (\(=2k\))
- Input: \(k\), labeling of leaves.
- We were doing a "game theoretic" approach to this problem, where:
- One player tries to minimize the number of node visits by designing a smarter algorithm.
- The other player tries to maximize the number of node visits by designing a devious tree.
- We realized that really any deterministic algorithm will have an input that will force it to visit every node.
- We realized that a randomized algorithm was equivalent to a "mixed" strategy, and it may present a better expected runtime
10.1.1 Pruned Depth First Search
This is essentially a depth first search where we avoid expanding branches if we already know the parent's value. Without loss of generality, any algorithm we design will be of this form. That is (according to the book): "for any algorithm that examines \(L\) leaves, there exists a pruned DFS that examines no more than \(L\) leaves." So it's a bit simpler to think of everything in pruned DFS terms anyway.
10.1.2 Game Theory Approach to Game Tree Evaluation
Question: Here's a deterministic algorithm. Show that it examines all leaves.
Answer: A labeling that forces the examination of all leaves.
We have two players:
- The Input (worst-case). Objective: maximize the number of leaves examined.
- Strategies: \(S^1 = \{(0,0,0,0), \dots, (1,1,1,1)\}\)
- Some are not useful at all (like the first and last)
- So strictly less than 16 (or 15) strategies in the case of \(k=1\).
- The Algorithm: Objective: minimize the number of leaves examined
- Strategies: \(S^2 = \{(L,L,L), \dots, (R,R,R)\}\).
- Each is an assignment of the "first" direction to each internal node.
- These are all unique and potentially useful strategies, so exactly 8.
10.1.3 Randomized Algorithms
RA's base their decisions on a random number generator.
Examples:
- for Game Tree Evaluation, choose a strategy in \(S^2\) uniformly at random, and use it.
- always use the first strategy (an RA doesn't have to use its RNG).
- choose a strategy in \(S^2\) at random, with any probability distribution over \(S^2\).
This already feels quite similar to mixed strategies!
Note: random ≠ arbitrary. In depth first search, we arbitrarily select an outgoing edge to expand. That doesn't mean we're doing it at random. An arbitrary selection is just a decision made ahead of time; an algorithm with arbitrary decisions will always run the same way, and so it may have a worst-case input.
Anyway, now that we're looking at this problem as a game, we should also note that the number of strategies of either player is exploding exponentially as a function of the number of rounds. So we can't expect to be able to use a linear program to come up with a maximized worst case mixed strategy. But we can still use game theory stuff (such as Yao's principle) to help.
10.1.4 Worst Case
Ok, our previous initial randomized algorithm was to select a strategy from \(S^2\) uniformly at random. This is equivalent to, when visiting an internal node, going L/R with probability 0.5.
Let \(W(k)\) be the worst case expected number of leaves examined by the randomized algorithm (\(k\): number of rounds).
- Let's look at a "max" node, which will finally evaluate to 0. We know that both nodes must have been evaluated, so \(W(k) = 2 W(k-1)\).
- Now let's look at a "max" node which will finally evaluate to 1. This means that at least one of the children will evaluate to 1. With probability at least 0.5, we will find it first and have \(W(k) = W(k-1)\); with probability at most 0.5, we won't, and get \(W(k) = 2W(k-1)\). So, the expected worst case is \(W(k) \le \frac{3}{2} W(k-1)\).
11 2015-11-06 Friday
11.1 Two Person Zero Sum Games (cont)
Last time, we defined TPZSGs, and saddle points, and we talked about mixed strategies. We also formulated the problem of designing a mixed strategy to maximize a player's worst case payoff:
\begin{align*} \max v, &\text{ s.t.} \\ \sum_{i=1}^n H(i,j) x_i &\ge v \:\:\: j=1,2,\dots,m \\ \sum_{i=1}^n x_i &= 1 \\ x_i &\ge 0 \:\:\: i=1,2,\dots,n \end{align*}The dual:
\begin{align*} \min w, &\text{ s.t.} \\ \sum_{j=1}^m H(i,j) y_j &\le w \:\:\: i=1,\dots,n\\ \sum_{j=1}^m y_j &= 1 \\ y_j &\ge 0 \:\:\: j=1,\dots,m\\ \end{align*}
Would you look at that! The dual is the exact same thing, except for the other player. And due to duality, we know that \(v^* = w^*\), so they will result in the same value. This is Loomis's Lemma (this guy Loomis proved it before we knew about LP duality, so good for him).
We can take the constraints of the dual and sum them up like this to get a new expression (for any \(x\)).
\begin{align*} \sum_{i=1}^n x_i \sum_{j=1}^m H(i,j) y_j^* &\le \sum_{i=1}^n x_i w^* \\ \sum_{i=1}^n \sum_{j=1}^m x_i H(i,j) y_j^* &\le w^* \end{align*}And similarly for the primal:
\begin{align*} \sum_{j=1}^m \sum_{i=1}^n x_i^* H(i,j) y_j \ge v^* \end{align*}
Combining these (and substituting in \(x^*\) and \(y^*\) respectively), we have:
\begin{equation} v^* \le \sum_{i=1}^n \sum_{j=1}^m x_i^* H(i,j) y_j^* \le w^* \end{equation}By definition, that quantity in the summation is \(H(x^*, y^*)\), and by duality, we know that \(H(x^*, y^*) = v^* = w^*\). This leads us to the inequality:
\begin{equation} H(x,y^*) \le H(x^*, y^*) \le H(x^*,y) \end{equation}
This leads to the theorem (von Neumann): every two person zero sum game has a (mixed) saddle point.
11.1.1 Yao's Principle
This is a restatement of Loomis's lemma, with a rather obvious addendum. And then we put this into two inequalities that look a bit less obvious.
\begin{align*} \max_x \min_j H(x,j) &\le \max_i H(i,y) \:\:\: \forall y\\ \min_y \max_i H(i,y) &\ge \min_j H(x,j) \:\:\: \forall x \end{align*}
And I guess that is Yao's Principle. It is useful when you want to put an upper bound on \(v^*\), the optimal payoff. You create a \(y\) that has \(\max_i H(i,y) \le b\).
11.1.2 Example Time
Game Tree Evaluation:
- Input: game tree. The game tree root is a maximizing node, its children are minimizing, and the next level are maximizing, etc. The leaves are labeled with 0 or 1. The tree has degree \(d\), and \(k\) rounds (which means that the height of the tree is \(2k\)). In our example, \(d=2\).
- Output: \(x\), the value at the root node.
We would like an algorithm to evaluate this tree, and we would like it to minimize the number of leaves visited (which is roughly equivalent to minimizing runtime).
The total number of leaves in this tree is \(4^k\).
Depending on the tree, there could be a node that will allow you to skip visiting other nodes. But with a deterministic algorithm, you can always design a worst-case input that will "hide" that information for as long as possible.
This already is similar to game theory stuff. We are one player (trying to find a worst-case input), and our algorithm is the other player (trying to terminate as quickly as possible).
12 2015-11-04 Wednesday
12.1 Two Person Zero Sum Games
Recall: a TPZSG is a game with 2 players, and the payoff of player 1 is the opposite of the payoff of player 2. We will define \(H(i,j)\) to be the payoff for player 1 of strategies \(i\), \(j\).
Definition: A saddle point is strategies \(i^*\) (first player), \(j^*\) (second player) s.t.:
- \(\forall i \in S^1\), \(\forall j \in S^2\): \(H(i, j^*) \le H(i^*, j^*) \le H(i^*, j)\)
That is, when player 1 knows player 2 is using \(j^*\), no strategy will do better than \(i^*\). And vice versa.
Example:
 | 1: | 2: | 3: |
---|---|---|---|
1: | 3 | 4 | 3 |
2: | 2 | 0 | 1 |
Saddle point: \(i^* = 1\), \(j^* = 1\), with a payoff of 3.
There is not always a saddle point!
Assume ∃ saddle point \((i^*, j^*)\):
\(\forall i: \min_j H(i,j) \le H(i,j^*) \le H(i^*,j^*)\)
\(\max_i \min_j H(i,j) \le H(i^*, j^*) \le \min_j \max_i H(i,j)\)
When there is a saddle point, these inequalities actually hold with equality! Since you would simply choose \(i^*\) and \(j^*\).
12.2 Saddle Points for Mixed Strategies
12.3 Something Else
From the point of view of player 1, I expect the worst (i.e. minimizing my payoff) pure strategy from player 2. I would like to formulate a mixed strategy to play the game that maximizes my (expected) "profit". Let \(x_1, x_2, \dots, x_n\) be the probabilities of the strategies \(i \in S^1\). If player 2 plays \(j\) (I don't know that he will!):
\(H(x, j) = \sum_{i=1}^n x_i H(i,j)\)
My worst-case payoff for any strategy that player 2 can play is:
\(\min_j H(x, j) = \min_j \sum_{i=1}^n x_i H(i,j)\)
I would like to maximize my own worst-case payoff (since I know player 2 will try to minimize my payoff). That is:
\(x^* = \arg \max_x \min_j \sum_{i=1}^n x_i H(i,j)\)
How do we find this? Linear programming. We create a constraint for every strategy \(j\) that player 2 can play: \(\sum_{i=1}^n x_i H(i,j) \ge v\). We want to maximize \(v\), our minimum payoff. We also need to ensure that \(\sum_{i=1}^n x_i = 1\) and \(x_i \ge 0 \:\: \forall i\). The overall LP:
- \(\max v\), s.t.
- \(-\sum_{i=1}^n x_i H(i,j) + v \le 0 \:\: \forall j\)
- \(\sum_{i=1}^n x_i = 1\)
- \(x_i \ge 0 \:\: \forall i\)
Dual:
- \(\min w\), s.t.
- \(-\sum_{j=1}^m H(i,j) y_j + w \ge 0 \:\: \forall i\)
- \(\sum_{j=1}^m y_j = 1\)
- \(y_j \ge 0 \:\: \forall j\)
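As a sketch, here is the primal LP handed to a generic solver (using scipy is my choice, not the lecture's; `linprog` minimizes, so we negate \(v\)):

```python
import numpy as np
from scipy.optimize import linprog

def max_min_mixed_strategy(H):
    """H: n x m payoff matrix for player 1. Returns (x*, v*)."""
    n, m = H.shape
    obj = np.zeros(n + 1)
    obj[-1] = -1.0                                # minimize -v <=> maximize v
    A_ub = np.hstack([-H.T, np.ones((m, 1))])     # -sum_i x_i H(i,j) + v <= 0
    b_ub = np.zeros(m)
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)   # sum_i x_i = 1
    b_eq = [1.0]
    bounds = [(0, None)] * n + [(None, None)]     # x_i >= 0, v free
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]
```

On the penny game's matrix \(H = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}\) this returns \(x^* = (\frac{1}{2}, \frac{1}{2})\) and \(v^* = 0\), as you'd expect.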
13 2015-11-02 Monday
13.1 Dr. Liberatore's Homework
Last time we talked about games in Behavioral (Extensive?) form, and translating them into Strategic form. We are still looking at the game tree where Player 1 has choices L and R, and then Player 2 chooses L or R after that.
- \(S^1 = \{L, R\}\)
- \(S^2 = \{(L,L), (L,R), (R,L), (R,R)\}\)
- where (x,y) means: if player 1 played L, then x. If player 1 played R, then y. (I think I missed this part last lecture!)
The probability you assign to \(S^2\) is the product of the conditional probabilities of each move. So say player 2 will move left with probability \(1/2\) when player 1 moves left, and left with probability \(3/4\) if player 1 goes right. In this case, the probability of strategy \(s^2=(L,L)\) is \(\frac{3}{8}\).
Computing the probability of each of the payoffs can be done with this setup; you just have to make sure that you add up the probabilities for all strategies that allow a certain branch to be taken.
Given a game in extensive form:
- Wlog the payoff leaves are distinct (if not, add a player who never gets to move and whose payoff is different from leaf to leaf).
\(Pr[H(l)]\). If you have a path from the root to a payoff, then the probability is:
\begin{equation} Pr[H(l)] = \prod_{e \in \text{Path root} \to l} p(e) \end{equation}
Here, \(p(e)\) is the probability that edge \(e\) is chosen. We can subdivide this into terms for each player:
\begin{equation} Pr[H(l)] = \prod_{e \in \text{Path root} \to l} p(e) = \prod_{i=1}^n \prod_{e \in \text{Path root} \to l,\: e \text{ chosen by player } i} p(e) \end{equation}
When you transform the game to strategic form, a strategy set will be the set of all possible choices for all the players. To compute the probability of getting to a certain payoff, you do the following:
\begin{equation} Pr[H(l)] = \prod_{i=1}^n Pr[\text{player }i\text{ contemplates }e_1, e_2, \dots] \end{equation}
Which evaluates to the same probability.
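In code, the strategic-form probability is just this product of conditional probabilities (a trivial sketch of the \(\frac{3}{8}\) computation above; the function name is mine):

```python
def strategy_probability(conditional_probs):
    """Probability of a strategic-form strategy: the product of the conditional
    probabilities of each of its component choices."""
    prob = 1.0
    for p in conditional_probs:
        prob *= p
    return prob

# Player 2 plays L w.p. 1/2 after L, and L w.p. 3/4 after R, so:
assert strategy_probability([0.5, 0.75]) == 0.375   # Pr[(L, L)] = 3/8
```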
13.2 Our Homework
13.2.1 MCNF and Max Flow
Can't have negative upper bounds! For the nurse problem, you had to maximize the number of unused nurses, not try to maximize the negative of the number of nurses used.
13.2.2 Successive Shortest Paths
Negative reduced costs should not happen, they indicate an error in your work. That's fine if you encounter that and recognize it, but it sounds like a lot of people got negative reduced costs.
Each iteration, you have \((c_{ij}^\pi, r_{ij})\) for each arc, and \((e_i, \pi_i)\) for each node \(i\).
- You compute the shortest path using the reduced costs. This assigns \(d_i\) values.
- You determine the shortest path and augment flow.
- You draw a new network. The \(\pi_i\)'s are updated by \(\pi_i - d_i\).
- The topology is changed due to the augmentation.
- However, the \(c_{ij}^\pi\) are always recomputed with \(c_{ij}^\pi = c_{ij} - \pi_i + \pi_j\), not using the old \(c_{ij}^\pi\)
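A sketch of that bookkeeping (my own code; `arcs` maps residual arcs \((i,j)\) to their original costs \(c_{ij}\), and Dijkstra applies because the reduced costs are nonnegative):

```python
import heapq

def shortest_distances(nodes, arcs, pi, source):
    """d_i = shortest distance from source using reduced costs as arc lengths."""
    adj = {i: [] for i in nodes}
    for (i, j), c in arcs.items():
        adj[i].append((j, c - pi[i] + pi[j]))     # reduced cost c_ij^pi
    d = {i: float("inf") for i in nodes}
    d[source] = 0
    heap = [(0, source)]
    while heap:
        dist, i = heapq.heappop(heap)
        if dist > d[i]:
            continue                              # stale heap entry
        for j, red in adj[i]:
            if dist + red < d[j]:
                d[j] = dist + red
                heapq.heappush(heap, (d[j], j))
    return d

def updated_potentials(pi, d):
    """pi <- pi - d; then recompute c_ij^pi from c_ij directly, as noted above."""
    return {i: pi[i] - d[i] for i in pi}
```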
14 2015-10-30 Friday
14.1 Games (strategic form)
14.2 Mixed Strategies
- Mixed strategy for \(i\)th player:
- \(x^i\): probability distribution over \(S^i\).
- Mixed strategy profile:
- \(\vec{x} = (x^1, x^2, \dots, x^n)\)
- Strategy profile \(\vec{s} = (s_1, s_2, \dots, s_n)\) has probability \(\vec{x}(\vec{s}) = x^1(s_1)x^2(s_2) \dots x^n(s_n)\).
- Expected payoff:
- \(H(\vec{x}) = \sum_{\vec{s}\in S^1 \times S^2 \times \dots \times S^n} \vec{x}(\vec{s}) H(\vec{s})\)
14.3 Games (Behavioral Form)
There is a tree that shows what the players can do. Say, the first level is player one, and there is a branch for each move player one can do. Then, the next level is for the next player, and at each "child node" game state, there is a branch for each move the player can make.
At the leaves of the tree you have the payoff function.
- The behavioral form is probably a bit more intuitive. It represents players' moves as sequential, whereas the strategic form doesn't impose that.
- The book has a proof that the behavioral form and strategic form are equivalent. But we won't be doing much with behavioral form, so we'll just do an example of one.
Our example of behavioral form was a binary tree with two levels, where player one would first choose between left and right, and then player two would choose between left and right as well.
We had a bit of a disagreement on how to represent player two's set of pure strategies, and its probability distribution. Dr. Liberatore represented \(S^2 = \{LL, LR, RL, RR\}\), and created a probability distribution over that. I disagree; I think that you should represent \(S^2=\{L, R\}\), and that \(x^2\) is just conditioned on \(x^1\). But, we spent a lot of time on this dispute without getting anything done, so we moved on.
14.4 Two person zero sum game
- \(n=2\)
- \(H_1(\vec{s}) = -H_2(\vec{s})\)
This means that the first player's maximization is the second player's min.
- There is/(may be) a saddle point \((i^*, j^*) \in S^1 \times S^2\).
- \(H(i,j^*) \le H(i^*,j^*) \le H(i^*,j)\), \(\forall i \in S^1, j \in S^2\)
- If player two plays \(j^*\), then player 1 wlog plays \(i^*\).
- If player one plays \(i^*\), its payoff is guaranteed to be \(\ge H(i^*, j^*)\). It could be more if player two makes a mistake.
- There doesn't have to be a saddle point, I guess. For instance, the matching pennies game doesn't have a saddle point.
- Andrew's musings: A saddle point is a [[https://en.wikipedia.org/wiki/Nash_equilibrium][Nash equilibrium]]. For pure strategies, there does not have to be a saddle point, but if you allow mixed strategies, then for every two-player, zero-sum game, there is always a saddle point (from ECON 329).
15 2015-10-28 Wednesday
15.1 Capacity Scaling (cont)
The algorithm:
- First, set a flow \(x \gets 0\), \(\pi \gets 0\)
- \(\Delta \gets 2^{\lfloor\log U\rfloor}\)
- while \(\Delta \ge 1\)
- foreach \((i,j) \in E\): if \(c_{ij}^\pi < 0\), then arc reversal
- SSP(\(\Delta\))
- \(\Delta \gets \Delta / 2\)
- The algorithm holds the loop invariant that \(c_{ij}^\pi \ge 0 \:\:\: \forall (i,j) \in E(x,\Delta)\).
- SSP(\(\Delta\)) terminates when either \(E(\Delta) = \emptyset\) or \(D(\Delta) = \emptyset\). So either \(0 \le e_i < \Delta\) or \(0 \ge e_i > -\Delta\). This means that the total excess ≤ \(n\Delta\).
- After we modify \(\Delta\), the total excess ≤ \(2n\Delta\).
- However, the arc reversals introduce a worst case total of \(2m\Delta\) excess, so at the beginning of SSP, we get a total excess ≤ \(2(n+m)\Delta\).
- At \(\Delta=1\), \(G(x,\Delta) = G(x)\), so we are solving the original residuals.
- We can think of CS as a logarithmic number of scaling phases to prepare for \(SSP(1)\) (the original problem).
- Kinda like Shell Sort!
15.1.1 Runtime
The original SSP took \(O(nUS)\). It was based on the idea that you would send (in the worst case) 1 unit of flow each time, so you would do \(nU\) augmentations, multiplied by \(S\), the shortest path time. We have changed our SSP algorithm so that it pushes \(\Delta\) units of flow at each iteration. We said above that at the beginning of SSP, the total excess ≤ \(2(n+m)\Delta\). By \(2(n+m)\) iterations, we should have excess set to 0. So, the runtime is \(O((n+m)S)\) for each SSP, and the total runtime is \(O((n+m)S\log U)\). (for the record, \(m\) is the number of edges). This means that capacity scaling is in fact a polynomial time algorithm!
15.2 Max Flow Min Cut
Suppose you have a source vertex, and a network without costs but with capacities. The maximum flow from \(s\) to \(t\) is equal to the min cut that separates \(s\), \(t\). A cut is defined as a partition of the set of vertices. The min cut is the one that minimizes the value of the edges (in this case, capacities) that span the partition. It actually really makes sense that max flow is equal to min cut, because the min cut is like the smallest "pipe width" you can send flow through.
Unfortunately, in order to solve min cut you pretty much have to solve max flow. So there isn't a huge breakthrough due to this property. It's just interesting and sometimes useful.
15.3 Game Theory!
We have a game where a professor puts a coin under a cup, and a student guesses whether it's heads or tails. If the student guesses correctly, they get a point and the professor loses one. Otherwise, the professor gets a point and the student loses one. To summarize:
Student Guesses Heads | Student Guesses Tails | |
Professor Picks Heads | Professor-1, Student+1 | Professor+1, Student-1 |
Professor Picks Tails | Professor+1, Student-1 | Professor-1, Student+1 |
Here is a theoretical definition of a game (in strategic terms):
- Set \(N=\{1,2,\dots,n\}\) players.
- Set \(S^i \) of pure strategies for the \(i^{\text{th}}\) player.
- Payoff function \(H: S^1 \times S^2 \times \dots \times S^n \to R^n \)
- A strategy profile: \((s_1, s_2, \dots, s_n)\)
- \(H_i(s_1, s_2, \dots, s_n)\) is the payoff for player \(i\)
In this coin flipping game, there are two players (player one is professor, player two is student), and \(S^1=S^2=\{h,t\}\). The table above summarizes the payoff, but I'll do it again in terms of the players and strategies again.
s2=h | s2=t | |
s1 = h | (-1, +1) | (+1,-1) |
s1 = t | (+1, -1) | (-1,+1) |
Def: A mixed strategy of player \(i\) is a probability distribution over \(S^i \).
- \(\vec{x}\): mixed strategy profile
- \(\vec{x} = (x^1, x^2, \dots, x^n)\)
- \(Pr[\vec{s}] = x^1(s_1)x^2(s_2)\dots x^n(s_n)\)
16 2015-10-26 Monday
16.1 Runtime of Successive Shortest Paths
- Worst case complexity, as a function of \(n\), is sometimes tricky.
- We talk about it in terms of "input size", with any reasonably efficient encoding.
- Some things that look polynomial (e.g. \(n^k \)) are not, because \(k\) is an input to the problem.
- Some things that don't look polynomial (e.g. \(n \log n\)) are, because they are bounded above by a polynomial function (e.g. \(n^2 \)).
- So, with that in mind, let's look at the runtime of SSP.
The runtime is \(O(n \times U \times S)\), where:
- \(n\) is the number of vertices
- \(U = \max_{i\in V} \{|b_i |\}\)
- \(S\) is the shortest path runtime
This is not polynomial! Let's look at the terms. We'll assume \(S\) is fine. To represent the graph, we need at least \(n\) bits (probably some constant multiple of \(n\), for the edges and things). So, \(n\) is an input size, and the \(n\) term is polynomial.
However, \(U\) is not an input size. It's an input number. It's completely unrelated to the storage of the graph and the input size. You could make it whatever you'd like. Because of this, you only need \(\log_2 U\) bits to store it. Thus, \(U\) is exponential in the input size. (A useful counterexample - if \(U=2^n \), then it's \(O(n2^n S)\)).
It's a pseudo polynomial time algorithm. If we represented \(U\) in unary, it would be polynomial. But that's not a reasonably efficient encoding.
16.2 Capacity Scaling
- \(\Delta = 2^{\lfloor\log U\rfloor},\dots, 2, 1\)
- \(E(\Delta) = \{i \in V: e_i \ge \Delta\}\)
- \(D(\Delta) = \{i \in V: e_i \le -\Delta\}\)
- \(G(x,\Delta) = (V, E(x,\Delta))\) - the residual network w.r.t. \(x\), w/ edges \(r_{ij} \ge \Delta\).
The idea of this algorithm is that we want to solve the biggest supplies and demands first.
SSP(Δ):
- Input: \(G(x, \Delta)\) s.t. \(c_{ij}^\pi \ge 0 \:\:\: \forall (i,j) \in E(x,\Delta)\)
- \(E(\Delta) \gets \{i \in V: e_i \ge \Delta\}\)
- \(D(\Delta) \gets \{i \in V: e_i \le -\Delta\}\)
- while \(E(\Delta), D(\Delta) \ne \emptyset\):
- choose \(k \in E(\Delta)\), \(l \in D(\Delta)\)
- find shortest path \(P: k \to l\) in \(G(x, \Delta)\) w/ \(c_{ij}^\pi \)
- send \(\Delta\) units of flow along \(P\).
- \(\pi \gets \pi - d\)
- update \(x\), \(E(\Delta)\), \(D(\Delta)\), \(G(x,\Delta)\).
Thinking about this algorithm, we see that each iteration will not necessarily remove an edge from the residual. It could just drop a node out of \(E(\Delta)\) while still keeping its excess greater than 0. So, we need the capacity scaling algorithm below:
Capacity Scaling:
- \(x \gets 0, \pi \gets 0\)
- \(\Delta \gets 2^{\lfloor\log U\rfloor}\)
- while Δ ≥ 1:
- for each \((i,j) \in G(x)\):
- if \(c_{ij}^\pi < 0\) then arc reversal
- SSP(\(\Delta\))
- \(\Delta \gets \Delta / 2\)
- Claim: \(c_{ij}^\pi \ge 0 \:\:\: \forall (i,j) \in E(x,\Delta)\) throughout SSP(\(\Delta\)).
- Proof: similar to SSP, but on \(G(x, \Delta)\) …
- Sketch: if we have an edge that is in \(E(x, \Delta)\) but not \(E(x, 2\Delta)\), we cannot guarantee the claim. But an arc reversal will fix this…
Runtime: \(O(n \times \log U \times S)\)
- How do we get that? Let's look at the total excess right after we adjust \(\Delta\). Due to the termination condition of SSP, we have:
- either \(\forall i \in V: 0 < e_i < 2\Delta\),
- or \(\forall i \in V: 0 > e_i > -2\Delta\).
- These imply:
- either: total excess \(\le 2n\Delta\)
- or: total deficit \(\ge -2n\Delta\)
17 2015-10-21 Wednesday
17.1 MCNF Algorithms
Assumptions
- All data is integral (this implies that solutions are integral as well).
- An "integral part" of why the solutions are all integral.
- \(c_{ij} \ge 0 \:\:\: \forall (i,j) \in E\)
- If you have a path that contains cycles, and all weights are non-negative, you're better off eliminating all cycles.
- \(\forall u,v \in V \:\:\: \exists P: u \to v\) (that is, there is a path from every node to every other one).
- Let \(|P| \le n - 1\). Since \(c_{ij} \ge 0\):
- cost of \(P \le n C = n \max_{(i,j)\in E} \{c_{ij}\}\)
- I missed the lead up to this, but: "An optimal solution should not send flow through \(z\) if \(\exists\) another path from \(u\) to \(v\)."
17.1.1 Successive Shortest Paths
Imagine that you have a number line of the objective value. You take a feasible flow (?), then take its dual and start improving the dual until it's feasible, and then it's optimal (?).
Definition: pseudo-flow \(x\) satisfies \(u_{ij}\) but not necessarily \(b_i\).
- EG: \(x=0\). This satisfies the upper bounds, but usually not the supplies and demands.
- We can also let \(\pi=0\) in the dual. This means that the reduced costs \(c_{ij}^\pi = c_{ij} - \pi_i + \pi_j \ge 0\), and the residual is the original network. This means that \(x=0\) and \(\pi=0\) are related by complementary slackness. So this is a starting point for SSP.
Definition: \(e_i \): excess at vertex \(i\) w.r.t. pseudo-flow \(x\):
\begin{equation} e_i = b_i + \sum_{(j,i) \in E} x_{ji} - \sum_{(i,j) \in E} x_{ij} \end{equation}
Initially (at \(x=0\)), \(e_i = b_i\).
Then, we define the:
- Excess vertices: \(E = \{i \in V: e_i > 0\}\)
- Deficit vertices: \(D = \{i \in V: e_i < 0\}\)
Assuming the instance is balanced (\(\sum_i b_i = 0\)), the excesses also always sum to 0, so \(E \ne \emptyset \leftrightarrow D \ne \emptyset\) throughout, not just initially.
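As a tiny sanity check of these definitions, here's a Python helper (the data layout, dicts for \(b\) and \(x\), is my own) that computes the excesses and the sets \(E\) and \(D\):

    def excesses(b, x):
        """e_i = b_i + inflow - outflow w.r.t. pseudo-flow x.
        b: dict node -> supply; x: dict (i, j) -> flow on arc (i, j)."""
        e = dict(b)                                  # at x = 0, e_i = b_i
        for (i, j), f in x.items():
            e[i] = e.get(i, 0) - f                   # flow leaving i
            e[j] = e.get(j, 0) + f                   # flow entering j
        E = {i for i, ei in e.items() if ei > 0}     # excess vertices
        D = {i for i, ei in e.items() if ei < 0}     # deficit vertices
        return e, E, D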
So, if you have two vertices, one in \(E\) and one in \(D\), you want to send some flow between them. How do we update the \(\pi\) values when you come up with the new \(x\)?
Claim: Let \(x\) be a pseudo-flow and \(\pi\) be node potentials, satisfying complementary slackness. We also have \(G(x)=(V, E(x))\), the residual (residuals only use upper bounds and flows, not supplies, so we can have residuals for pseudo-flows). Finally, let \(c_{ij}^\pi \) be the reduced costs for \((i,j) \in E(x)\).
- Define \(d_i\) to be the length of the shortest path from \(s \in V\) to \(i\) (\(\forall i \in V\)) in the residual, with lengths \(c_{ij}^\pi \). (\(s\) is arbitrary but fixed as some special vertex \(s \in V\)). Since \(c_{ij}^\pi \ge 0\), this is well defined.
OK, two results of all these assumptions and notations:
- \(x\), \(\pi'\) satisfy complementary slackness, where \(\pi'=\pi-d\)
- \(c_{ij}^{\pi'} = 0 \:\:\: \forall (i,j) \in\) shortest path tree
Example (initial \(\pi = 0\))
i | bi | di | π'i |
---|---|---|---|
1 | 4 | 0 | 0 |
2 | 0 | 2 | -2 |
3 | 0 | 2 | -2 |
4 | -4 | 3 | -3 |

i | j | cij | uij | cij^π | cij^π' |
---|---|---|---|---|---|
1 | 2 | 2 | 4 | 2 | 0 |
1 | 3 | 2 | 2 | 2 | 0 |
2 | 3 | 1 | 2 | 1 | 1 |
2 | 4 | 3 | 3 | 3 | 2 |
3 | 4 | 1 | 5 | 1 | 0 |
The shortest path tree is: (1,2), (1,3), (3,4). The (new) reduced costs for those edges are all 0. In this case, they are the only edges with reduced cost 0, but that's not true in general.
Proof: (I doubt we'll have time to finish this bad boy)
- Proof of the first part (\(x\), \(\pi'\) satisfy complementary slackness):
\(d_j \le d_i + c_{ij}^\pi \:\:\: \forall (i,j) \in E(x)\) (I believe this is a Bellman-Ford inequality)
\begin{align*} c_{ij}^{\pi'} &= c_{ij} - \pi_{i}' + \pi_{j}' \\ &= c_{ij} - \pi_i + d_i + \pi_j - d_j \\ &= c_{ij}^\pi + d_i - d_j \ge 0 \end{align*}
Well, that was neat.
- Proof of second part: \((i,j) \in SPT \Rightarrow c_{ij}^{\pi'} = 0\)
- If \((i,j) \in SPT\), then \(d_j = d_i + c_{ij}^\pi\).
- And therefore, when you do the computation above, you end up with \(c_{ij}^{\pi'} = c_{ij}^\pi + d_i - d_j = 0\). So that was also easier than expected.
Now we're talking about CS:
- \(\alpha_{ij} (u_{ij} - x_{ij}) = 0\)
- \(x_{ij} (c_{ij}^\pi + \alpha_{ij}) = 0\)
When you have \(c_{ij}^\pi = 0\), \(\alpha_{ij} = 0\), and so both CS conditions are satisfied (for any flow \(x\)).
You can send any amount of flow along this SPT, and it will be optimal! Our job is just to find a flow that satisfies the upper bounds.
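To tie the pieces together, here is a minimal runnable Python sketch of SSP. It is my own reconstruction, not the lecture's exact pseudocode: it repeatedly picks an excess node, runs Dijkstra on the residual with reduced-cost lengths, applies \(\pi \gets \pi - d\) (capping \(d\) at the chosen deficit node's distance, a standard trick for unreachable nodes), and augments by the bottleneck. It assumes the lecture's assumptions: integral data and \(c_{ij} \ge 0\).

    import heapq

    def ssp_min_cost_flow(n, arcs, b):
        """n: nodes 0..n-1; arcs: list of (i, j, cost, cap) with cost >= 0;
        b: supplies summing to 0. Returns (total cost, flow per input arc)."""
        # Residual as parallel arrays; arc 2k is (i, j), arc 2k+1 its reverse.
        head, cost, cap, graph = [], [], [], [[] for _ in range(n)]
        for (i, j, c, u) in arcs:
            graph[i].append(len(head)); head.append(j); cost.append(c); cap.append(u)
            graph[j].append(len(head)); head.append(i); cost.append(-c); cap.append(0)
        e = list(b)                  # excesses: e_i = b_i at x = 0
        pi = [0] * n                 # pi = 0 satisfies CS with x = 0
        total = 0
        while True:
            k = next((v for v in range(n) if e[v] > 0), None)
            if k is None:
                break                # e = 0 everywhere: feasible, hence optimal
            # Dijkstra from k with lengths c_ij - pi_i + pi_j >= 0.
            INF = float('inf')
            d = [INF] * n; d[k] = 0; prev_arc = [-1] * n
            pq = [(0, k)]
            while pq:
                dv, v = heapq.heappop(pq)
                if dv > d[v]:
                    continue
                for a in graph[v]:
                    if cap[a] <= 0:
                        continue
                    w = head[a]
                    rc = cost[a] - pi[v] + pi[w]
                    if d[v] + rc < d[w]:
                        d[w] = d[v] + rc; prev_arc[w] = a
                        heapq.heappush(pq, (d[w], w))
            l = min((v for v in range(n) if e[v] < 0 and d[v] < INF),
                    key=lambda v: d[v], default=None)
            if l is None:
                raise ValueError("no deficit node reachable: infeasible")
            dl = d[l]
            pi = [p - min(dv, dl) for p, dv in zip(pi, d)]   # pi <- pi - d
            # Bottleneck along the path k -> l, capped by excess/deficit.
            delta, v = min(e[k], -e[l]), l
            while v != k:
                a = prev_arc[v]; delta = min(delta, cap[a]); v = head[a ^ 1]
            v = l
            while v != k:
                a = prev_arc[v]
                cap[a] -= delta; cap[a ^ 1] += delta
                total += delta * cost[a]
                v = head[a ^ 1]
            e[k] -= delta; e[l] += delta
        flow = [cap[2 * t + 1] for t in range(len(arcs))]    # reverse cap = net flow
        return total, flow

Each augmentation preserves complementary slackness (the claim we just proved) while strictly decreasing the total excess, so when every \(e_i\) hits 0 the flow is feasible and hence optimal.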
18 2015-10-16 Friday
Midterm was today.
19 2015-10-14 Wednesday
Update: Problem 5 on the homework we went over on Monday did not actually have a solution that involved shortest paths. Oops!
Friday is the midterm. (I'm a bit scared). So I guess we'll do some review.
- The "Reading List" syllabus on Blackboard contains all the topics.
- There will be four questions.
- Easier than the homework (but very similar)
- Doubtful that we will have enough time to solve them all. We won't be graded on 100% completion.
- We should have enough time to "get a start" on them all and "write something meaningful".
- Doubtful that we will need to write anything in Octave.
- Open books, open notes, open electronics (w/o internet). Of course, this isn't the most helpful to you if you bring everything, because you won't have time to use it in 50 minutes.
19.1 Back to Equipment Replacement
- \(c_{ij}\): cost of operating machine from year \(i\) to year \(j\). \(1 \le i < j \le n\).
- Minimize the total cost s.t.
- At every time, the number of running machines ≥ 1.
- Last time, we got to the point where we had a graph with nodes 1 through \(n\). Each node had an arc to all the future years, with the \(c_{ij}\)'s labeled on it.
- Put a supply of 1 on 1, and a supply of -1 on n. This seems like it would solve the problem, but it doesn't allow for us to have multiple machines operating at one time.
- One solution would be to have a single "source" node that connects forward and backward with each year node. But this allows us to send machines "forward" in time with zero cost. Bad.
- My initial solution to that was to have a set of nodes 1' through n', with edges (i',1) through (i', i-1), and then (i, i'). This allows only backward flow. This does solve the problem, but with a quadratic number of arcs.
- The next improvement on that was to have each "prime" node connected backward to the previous one, instead of all of the previous regular nodes. This improved it to linear number of arcs.
- Then we realized that each node could just be directly connected to its backward neighbor with cost of zero. Oops.
19.2 Back to Uniform Cost Parallel Machines
- \(n\) jobs, length \(l_i \le T\), \(\sum_{i=1}^n l_i \le TM\), where \(T\) is the time unit, \(M\) the number of machines.
- We had a max flow solution to assign job segments into time slots, now we need to fill up the machines so that there is no overlap.
- To do this: assign the first job to the first machine. Assign the next job to the remainder of that machine's time, and put that job's remainder onto the next machine. You know it won't overlap because each segment must be ≤ T.
- Yay.
19.3 Midterm Questions
Now we talk about the midterm.
- Mostly "solving problems"
- They will be "like" homework problems. Not necessarily "same question, different numbers". But similar in spirit.
- There could be reductions to linear programs.
- There could be reductions to shortest paths
- Should probably review your linear algebra so the matrix manipulations are easier to carry out.
- (This is only slightly terrifying)
19.4 Final Exam
- 30-40 minute oral exam!!!
- Will be plenty of talk on concepts, explaining yourself, etc.
- (This is actually a lot terrifying)
20 2015-10-12 Monday
20.1 Homework 3 Review
Problem 1: when converting nodes with capacities into pairs of nodes without capacities, you assign the supplies to the input/output node by the following rule:
- If \(b_i > 0\), then assign it to the output node.
- If \(b_i < 0\), then assign it to the input node.
- Otherwise, it doesn't really matter, both nodes get 0.
Also on problem 1, make sure when you draw graphs you explain what your numbers mean. A good policy is to include a legend.
Finally, it's generally better to get rid of capacities first, because we don't have solid rules on how to get rid of lower bounds with nodes that have capacities.
For my case, I know I forgot to decrease the upper bound of \((c,d)\) when I removed the lower bound. My other error was putting the negative \(b_i\) on the output node.
Problem 4: The strategies for finding \(\pi_i\) varied. Some wrote the primal, found the dual, solved it with Octave, and used the dual solution. Others guessed and checked. Others used the \(c_{ij} - \pi_i + \pi_j \ge 0\) condition and found a feasible solution to it (not too hard given the number of parallel arcs). The other way was to pick a starting node from which all other nodes are reachable, and find the shortest path to each node. The negatives of these distances are suitable node potentials. Admittedly, in my solution I didn't show any work for the node potentials, but I used the shortest path method.
Problem 5: Here is one solution to part (a): 2 3 2 3 2. Here, Alice always loses no matter what she chooses. My solution is similar to that (1 6 7 5 8 4 9 3 10 2). This sort of solution may trick you into believing that Alice's best choice is always to take the max chip. This is not true. Consider this arrangement: 10 1000 3 4. Alice should choose 4 in order to protect the 1000, forcing Bob to open it up for her to take on the next turn. This strategy requires that she take the smaller of the two available chips.
One way to solve part b is to formulate the solution recursively, and come up with a dynamic programming algorithm. Then do as in class and recast the dynamic programming solution as a shortest path solution.
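For reference, here's a hedged sketch of that recursive formulation in Python, assuming the game is the take-from-either-end chip game described above (function and variable names are mine):

    from functools import lru_cache

    def best_margin(chips):
        """Returns (current player's total) - (opponent's total) under
        optimal play, assuming take-one-chip-from-either-end rules."""
        @lru_cache(maxsize=None)
        def solve(i, j):                            # chips[i..j] remain
            if i > j:
                return 0
            return max(chips[i] - solve(i + 1, j),  # take left end
                       chips[j] - solve(i, j - 1))  # take right end
        return solve(0, len(chips) - 1)

    print(best_margin((2, 3, 2, 3, 2)))    # 0: under these rules Alice can at best tie
    print(best_margin((10, 1000, 3, 4)))   # 991: take the 4 to protect the 1000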
Another way to do it would be to do game trees, where you represent each node as a series of decisions, and its children are the results of all the possible decisions from that node. But this has \(2^n\) nodes, which is pretty bad. There is also the problem that a shortest path solution will minimize the strategy of Bob, which doesn't really do what we want.
Dr. Liberatore's solution is to formulate each node in a graph as a list of chips, and make transitions a full round (Alice, Bob). We identified a similar issue to the previous strategy (where it minimizes Bob's strategy). Hopefully next class we'll get it figured out.
21 2015-10-09 Friday
21.1 Uniform Parallel Machines
You have a set of \(n\) jobs:
- \(s_i\): start time of job \(i\)
- \(d_i\): deadline of job \(i\)
- \(r_i\): required processing time of job \(i\) (used in the table below)
It seems like the idea is to schedule each job to a time interval on a machine so that everything happens. The claim is that this reduces to Max Flow.
Job | 1 | 2 | 3 | 4 |
---|---|---|---|---|
ri | 1.5 | 1.25 | 2.1 | 3.6 |
Interval | Jobs |
---|---|
1,3 | 2 |
3,4 | 1,2,3 |
4,5 | 1,3 |
5,7 | 3,4 |
7,9 | 4 |
We create a network with a source and a sink, nodes for the jobs, and nodes for each time interval. We create arcs from the source to each interval \([h,k]\), with capacity \(M (k - h)\). We create arcs from each job \(i\) to the sink with capacity \(r_i\). Finally, we create arcs from each interval \([h,k]\) to each job available in it, with capacity \(k - h\).
To go from a flow to a schedule: If we have \(x_{(h,k), i} > 0\), we schedule \(i\) in \([h,k]\) for \(x_{(h,k),i}\) time. We may have to do some finagling to ensure that everything fits, but we'll come back to that issue. In any case, we can also go from a schedule to a flow.
A schedule/flow is feasible if every job → sink link is at capacity.
21.2 Equipment Replacement
When you buy a machine, it operates very nicely at the beginning, but then deteriorates and you need to repair it. You'll need to repair it more and more and more as you go. At some point, it becomes cheaper to replace it with a new machine than to continue repairing it.
So, say you buy a machine at time \(i\) and sell at time \(j\).
- \(c_{ij}\): total cost of operating machine from time \(i\) to \(j\). This includes the cost of purchase, minus the sell price, plus repair prices.
At least one machine must be operating at all times. The problem is to find an optimal replacement schedule. The claim is that this reduces to MCNF. We are given a discrete set of times 1 through \(n\) to buy and sell at.
You can create a network with a node for each time. Each time \(i\) is connected to each future node \(j\) with cost \(c_{ij}\). You set the supply at \(1\) to 1, and supply at \(n\) to -1. The one problem is that this doesn't allow for more than one machine to be running at a time. So, we're trying to solve that by adding a source and sink node, but things aren't working out perfectly.
22 2015-10-07 Wednesday
22.1 Max Flow Reduction: Matrix Rounding
A government agency collects data via a census, in the following matrix \(A\):
\begin{bmatrix} 3.1 & 6.8 & 7.3 \\ 9.6 & 2.4 & 0.7 \\ 3.6 & 1.2 & 6.5 \\ \end{bmatrix}
where zip codes are along the x-axis, and tax brackets are along the y-axis. Given enough such matrices to cross-reference, a nefarious individual would be able to identify exactly which members of their community are in which tax brackets. So, the agency would like to obfuscate the data in the matrix.
Create a new matrix \(B\) such that \(b_{ij}\) is either \(\lfloor{}a_{ij}\rfloor\) or \(\lceil{}a_{ij}\rceil\). However, \(B\) should also obfuscate the sums of both the columns and rows of \(A\) in the same manner (so, \(\sum_{i}b_{ij}\) should be either \(\lfloor{}\sum_{i}a_{ij}\rfloor\) or \(\lceil{}\sum_{i}a_{ij}\rceil\), and so on, for each row and column).
A "consistent rounding" satisfies all these constraints.
Question: Given matrix \(A\), find a consistent rounding. Claim: Reduces to max flow.
See the (hopefully) included image.
Figure 1: (Hopefully) included image.
Let \(r_i\) be the sum of row i, and \(c_j\) be the sum of column j.
The idea is to have a node for each element of \(A\), as well as nodes for the column and row sums. Then, make an arc going from \(a_{ij}\) to \(c_j\), with lower bound \(\lfloor{}a_{ij}\rfloor\) and upper bound \(\lceil{}a_{ij}\rceil\). Also, make an arc going from \(r_i\) to \(a_{ij}\) with the same upper and lower bounds.
Now, make an arc going from each \(c_j\) to a sink node \(t\). Each arc will have lower bound \(\lfloor{}c_j \rfloor\) and upper bound \(\lceil{}c_{j}\rceil\). Similarly, make an arc going from a source node \(s\) to each \(r_i\), with lower bound \(\lfloor{}r_{i}\rfloor\) and upper bound \(\lceil{}r_{i}\rceil\).
This is sufficient to formulate matrix rounding as a max flow problem. Note that there is actually a second formulation, which reverses the directions of all the arcs.
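Here's a sketch of that construction in Python; the node names (tuples like ('row', i)) are my own convention, and the floor/ceil arithmetic on floating-point row/column sums deserves care in real use:

    from math import floor, ceil

    def rounding_network(A):
        """Arcs (tail, head, lower, upper) for the consistent-rounding
        max flow, following the construction described above."""
        m, n = len(A), len(A[0])
        arcs = []
        for i in range(m):
            r = sum(A[i])                                   # row sum r_i
            arcs.append(('s', ('row', i), floor(r), ceil(r)))
            for j in range(n):
                a = A[i][j]
                arcs.append((('row', i), ('cell', i, j), floor(a), ceil(a)))
                arcs.append((('cell', i, j), ('col', j), floor(a), ceil(a)))
        for j in range(n):
            c = sum(A[i][j] for i in range(m))              # column sum c_j
            arcs.append((('col', j), 't', floor(c), ceil(c)))
        return arcs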
22.2 Max Flow Reduction: Job Scheduling
The remainder of class was spent talking about the job scheduling problem, which appears to be described in full on the notes for 2015/10/09.
23 2015-10-05 Monday
23.1 Max Flow
def Max flow: G = (V, E), with \(\forall (i,j)\:u_{ij},\text{zero costs, supplies, and demands}\). One extra edge going from the sink back to the source with no upper bound, -1 cost. The optimal (min cost) solution then maximizes the flow from source to sink.
23.2 Feasible Flow Problem
def Feasible flow: Given a network with supplies and upper bounds, show that a flow exists.
Claim: Feasible flow reduces to Max flow.
Steps:
- (remove lower bounds, if needed)
- from all sinks, put edge to t with \(u_{it}=-b_{i}\)
- to all sources, put edge from s with \(u_{si}=b_{i}\)
- set supplies/demands of all nodes to zero
- connect t to s, -1 cost, no upper bound
- Solve max flow. If all upper bounds are met going into t / out of s (i.e. final supplies/demands are 0), then there is a feasible flow.
23.3 Reductions
To prove a reduction from one problem to another, show a 1-to-1 mapping between solutions to each problem.
Ex. Show feasible flow reduces to max flow. Proof (feasible flow \(\Rightarrow\) max flow):
- Let x be a feasible flow. Let \(y_{ij}=\begin{cases}x_{ij}\text{, if}(i,j)\in E \\ u_{ij}\text{, if}(i,j)\notin E \end{cases}\)
- We know the value of the max flow is \(\le \sum_i u_{si}\), and the value of the \(y\) flow is \(\sum_i y_{si} = \sum_i b_i \mathbb{1}[b_i \ge 0] = \sum_i u_{si}\), which meets that bound. So val(max flow) = val(\(y\) flow), and the \(y\) flow is optimal.
24 2015-10-02 Friday
MCNF: special cases and applications
Taking problems and mapping them to MCNF is preferable because:
- MCNF is faster than LP
- MCNF is unimodular, so you can drop integrality constraints.
- This makes it attractive. How can we solve other problems with it?
24.1 Shortest Paths
Given:
- directed graph \(G=(V,E)\)
- \(s \in V\)
- lengths \(c_{ij}\) associated with \((i,j)\in E\).
Find the shortest path from \(s\) to every \(u \in V - \{s\}\).
I feel like after EECS 340, I don't really need to copy down the graph we're using for demonstration. So, shortest paths are a special case of MCNF.
- Set costs to the lengths of the edges.
- Lower bounds 0
- Upper bounds infinite
- Set source node supply to \(|V-\{s\}|\), and all other nodes to \(-1\).
If you get a fractional solution, the flow is split between two equally short paths, so you could use either. Note that people really don't try to use MCNF to solve shortest paths; rather, they sometimes use repeated Dijkstra's algorithm to solve MCNF (instead of LP).
- \(\min \sum_{(i,j)\in E} c_{ij} x_{ij}\), s.t.
- \(\sum_{(s,j) \in E} x_{sj} - \sum_{(j,s) \in E} x_{js} = n - 1\)
- \(\sum_{(i,j) \in E} x_{ij} - \sum_{(j,i) \in E} x_{ji} = -1\), for \(i \in V-\{s\}\)
- \(x_{ij} \ge 0\), for \((i,j) \in E\)
The dual:
- \(\max (n-1) \pi_s - \sum_{i \in V-\{s\}} \pi_i\), s.t.
- \(c_{ij} - \pi_i + \pi_j \ge 0\), for \((i,j) \in E\)
- π's unrestricted.
Let's look at the objective function and make a change of variable:
\begin{equation} (n-1) \pi_s - \sum_{i \in V-\{s\}} \pi_i = n \pi_s - \sum_{i\in V} \pi_i = \sum_{i \in V} (\pi_s - \pi_i) = \sum_{i\in V} d_i \end{equation}
So, we change variable to \(d_i = \pi_s - \pi_i\). Above is the new objective function using the change of variable. Now we'll rewrite the whole thing using \(d_i\).
- \(\max \sum_{i\in V} d_i \), s.t.
- \(d_j \le d_i + c_{ij}\), for all \((i,j) \in E\)
The intuition is that \(d_j \) is the distance from the source to node \(j\). We are maximizing, which seems funny, but it makes sense when you realize that the constraints are holding the whole thing back. The constraints here are the Bellman-Ford optimality conditions, which the Bellman-Ford algorithm (and Dijkstra's, on nonnegative lengths) enforces at every step.
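Here's a short Bellman-Ford sketch in Python that computes distances satisfying exactly these conditions (names are mine):

    def bellman_ford(n, edges, s):
        """Compute d_i satisfying d_j <= d_i + c_ij for all (i, j).
        edges: (i, j, c) triples, nodes 0..n-1, source s. Assumes no
        negative cycle is reachable from s."""
        INF = float('inf')
        d = [INF] * n
        d[s] = 0
        for _ in range(n - 1):                # paths need at most n-1 arcs
            changed = False
            for (i, j, c) in edges:
                if d[i] + c < d[j]:           # condition d_j <= d_i + c_ij violated
                    d[j] = d[i] + c
                    changed = True
            if not changed:
                break                         # all optimality conditions hold
        return d                              # note: -d_i works as a node potential pi_i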
24.2 Dynamic Programming!!!!!
Dynamic programming is an algorithm design technique, not an algorithm in itself. However, we're going to make a sweeping claim here: "all dynamic programming is a special case of network flow".
Figure 2: say-whaaaat.gif (Say Whaaaaat?)
Weighted Interval Scheduling: You have \(n\) intervals \((s_i, f_i), v_i \). Find the subset of non-overlapping intervals with maximum value.
Solution: \(M(i)\) is the maximum value from scheduling intervals 1, 2, …, i. We want \(M(n)\). We do know that \(M(0) = 0\). Recursively:
\begin{equation} M(i) = \max \left\{ M(i-1), v_i + M(p(i))\right\} \end{equation}
Here, \(p(i)\) is the first predecessor interval that doesn't overlap with \(i\). The two conditions are either (1) we don't choose interval \(i\), or (2), we do choose interval \(i\). In the solution algorithm, we keep a vector of \(M(i)\) values and compute it from \(0\) up to \(n\), and that value is your optimal value.
i | 0 | 1 | 2 | … | p(i) | … | i-1 | i | … | n |
M(i) | 0 | | | … | M(p(i)) | … | M(i-1) | M(i) | … | M(n) |
Imagine that in this illustration, we have arrows going from the vector slot \(i\) back to \(i-1\) and \(p(i)\).
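A Python sketch of the table-filling version, assuming intervals are given as (s_i, f_i, v_i) triples:

    from bisect import bisect_right

    def weighted_interval_scheduling(intervals):
        """DP for the recurrence above: M[i] = max(M[i-1], v_i + M[p(i)]),
        where p(i) is the last interval finishing no later than s_i
        (intervals sorted by finish time)."""
        ivs = sorted(intervals, key=lambda t: t[1])       # sort by f_i
        finishes = [f for (_, f, _) in ivs]
        n = len(ivs)
        M = [0] * (n + 1)                                 # M[0] = 0
        for i in range(1, n + 1):
            s, f, v = ivs[i - 1]
            p = bisect_right(finishes, s, 0, i - 1)       # p(i) via binary search
            M[i] = max(M[i - 1], v + M[p])
        return M[n]

    # e.g. weighted_interval_scheduling([(0, 3, 5), (2, 5, 6), (4, 7, 5)]) == 10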
Now, we start making broad, sweeping claims! This is just a MCNF problem! Make each element in the vector a node in a graph, and put edges from \(p(i)\) to \(i\) and from \(i-1\) to \(i\). The edge \((p(i), i)\) is labeled with the (negated) gain \(-v_i\), and the edge \((i-1, i)\) with 0. You can then run MCNF (e.g. a shortest path from 0 to \(n\)) on this, and get an optimal solution. You can do this with any dynamic programming algorithm for which you have the recursive formulation, with the values stored in a tabular format like this.
So essentially, the dynamic programming and recursive/network way of thinking are pretty equivalent, and it usually depends on the person and how they've been thinking previously. But the solutions you come up with as a result may have slightly different advantages and disadvantages.
24.3 Circulation
Circulation problems are pretty much MCNF, except that all supplies are 0. Sometimes, you have lower bounds, so you are forced to send some flow. Of course, then you could eliminate the lower bounds as we've discussed, and then end up with a problem that has supplies and demands that are nonzero.
However, you can have problems with no supply/demand and no lower bounds, which have an optimal flow that does send some flow.
Example Network
i | j | cij | lij | uij |
---|---|---|---|---|
1 | 2 | -1 | 0 | 3 |
2 | 3 | +1 | 2 | |
2 | 5 | | | |
3 | 4 | | | |
3 | 1 | -2 | 4 | |
5 | 4 | | | |
Next week is a regular week. Then homework solutions and midterm review.
25 2015-09-30 Wednesday
25.1 MCNF Duality
Recall from last time:
- πi: node potential at node i ∈ V
- αij = max{0, -cij^π}, (i,j) ∈ E
- where cij^π = cij - πi + πj
Here's his network from last time:
Node | Supply | πi |
1 | 4 | 0 |
2 | 0 | -2 |
3 | 0 | -3 |
4 | -4 | -4 |
Start | End | Cost | Upper | cij^π |
1 | 2 | 2 | 4 | 0 |
1 | 3 | 2 | 2 | -1 |
2 | 3 | 1 | 2 | 0 |
2 | 4 | 3 | 3 | 1 |
3 | 4 | 1 | 5 | 0 |
We have an (optimal) flow as follows:
- 2 units go 1 → 2 → 3 → 4
- 2 units go 1 → 3 → 4
Then he wrote the residual network, with πi and cij^π copied down. cij^π can be copied for the arcs that still exist, but needs to be recalculated for each new arc. However, if a new arc is the opposite of an old one, you know that \(c_{ji}^\pi = -c_{ij}^\pi\). Here are the values for the residual network.
Node | Supply | πi |
1 | 0 | 0 |
2 | 0 | -2 |
3 | 0 | -3 |
4 | 0 | -4 |
Start | End | Cost | Upper | cij^π |
1 | 2 | 2 | 2 | 0 |
2 | 1 | -2 | 2 | 0 |
3 | 1 | -2 | 2 | 1 |
3 | 2 | -1 | 2 | 0 |
2 | 4 | 3 | 3 | 1 |
3 | 4 | 1 | 1 | 0 |
4 | 3 | -1 | 4 | 0 |
The residual network can be a multigraph. If nodes \(i\) and \(j\) are connected by both a forward and a backward arc in the original network, each arc can cause two arcs in the residual. If an edge \(e\) causes a new edge \(f\) in the same direction and \(g\) in the opposite direction, then \(c_e^\pi = c_f^\pi = - c_g^\pi\). We say \(e\) originates \(g\) and \(f\).
Ok. We're kinda caught up to where we were at the end of the last MCNF duality lecture. Last time we conjectured that:
Theorem: xij, πi, (αij) feasible:
xij, πi, (αij) optimal iff: \(c_{ij}^\pi \ge 0 \:\: \forall (i,j) \in E(x)\).
Here, \(E(x)\) is the arcs in the residual network.
Proof: (→) xij, πi optimal.
- This gives us:
- wlog \(\alpha_{ij} = \max \{0, -c_{ij}^\pi \}\)
- complementary slackness!
- (1): \(\alpha_{ij} (u_{ij} - x_{ij}) = 0\)
- (2): \(x_{ij} (c_{ij}^\pi + \alpha_{ij}) = 0\)
- \(\forall (h,k) \in E(x)\), \(\exists! (i,j) \in E\) that originates it. (\(\exists!\) means "there exists a unique".)
- Case a: xij = 0 originates (i,j) ∈ E(x)
- By contradiction, cij^π < 0, so aij > 0.
- By the complementary slackness condition (1), \(x_{ij} = u_{ij}\), which is a contradiction.
- Case b: xij = uij originates (j, i) ∈ E(x) (but not (i,j), since there is no more flow to send forward).
- By contradiction, cji^π < 0, which means cij^π > 0.
- This means that \(c_{ij}^\pi + \alpha_{ij} > 0\) (since cij^π > 0 and αij ≥ 0).
- According to complementary slackness condition (2), \(x_{ij} = 0\), which is again a contradiction!
- Case c: 0 < xij < uij originates (i,j) and (j,i):
- By contradiction, assume \(c_{ij}^\pi < 0\). By the same argument in case a, we get xij=uij, which is a contradiction.
- By contradiction, assume \(c_{ji}^\pi < 0\), By the same argument in case b, we get xij = 0, which is a contradiction.
- This completes the proof. We use the fact that xij and αij are optimal to get complementary slackness. That's the key to the proof.
Proof: (←) We know \(c_{ij}^\pi \ge 0\), \(\forall (i,j) \in E(x)\). Need to prove \(x_{ij}\) and \(\pi_i\) are optimal.
- To do this, we would want to show that for each arc (h,k) ∈ E, we have the complementary slackness conditions written above.
- Case a: xhk = 0:
- CS condition (2) follows immediately.
- Also, since \(x_{hk}=0\), we know that \((h,k) \in E(x)\), which by our initial assumption means that \(c_{hk}^\pi \ge 0\). Since \(\alpha_{hk} = \max\{0, -c_{hk}^\pi\}\), we know \(\alpha_{hk}=0\), and therefore condition (1) is also satisfied.
- Case b: xhk = uhk:
- CS condition (1) follows immediately.
- Since arc (h,k) is at capacity, we know that \((k,h)\in E(x)\). So, we know that \(c_{kh}^\pi \ge 0\). This means that in the original problem, \(c_{hk}^\pi \le 0\) (since \(c_{kh}^\pi = -c_{hk}^\pi\)).
- By our definition, \(\alpha_{hk} = \max\{0, -c_{hk}^\pi\} = -c_{hk}^\pi\), so \(c_{hk}^\pi + \alpha_{hk} = 0\), which means that condition (2) is satisfied!
- Case c: 0 < xhk < uhk:
- Now we know that both \((h,k)\) and \((k,h)\) are in \(E(x)\).
- This means \(c_{hk}^\pi, c_{kh}^\pi \ge 0\), which means \(c_{hk}^\pi = c_{kh}^\pi = \alpha_{hk} = 0\), which gives us both complementary slackness conditions!
- QED
As a recap, this proof gives us the approach of most of the MCNF algorithms. They start with a feasible flow, find the violations in the \(c_{ij}^\pi \ge 0\) condition in the residual, and update the flow to improve it.
26 2015-09-28 Monday
Looks like we're going over homework 2.
- It appears that the function numpy.invert() doesn't actually invert a matrix. The function for that is numpy.linalg.inv(). Who names a function invert(), when it doesn't actually invert a matrix? Damn.
- When you have a linear program with matrix \(A\), you can show that all BFS of that program will be integer by showing that \(A\) has full row rank!!, and then that \(A\) is unimodular.
- Turns out we needed to do all 8x8 bases, not 9x9 bases. In which case, checking all (10 choose 8) bases would be too much, so a script was necessary here.
27 2015-09-25 Friday
27.1 MCNF Duality
Assume for simplicity, ∀ (i,j) ∈ E, 0 < uij < ∞. This condition doesn't provide any algorithmic restrictions, it just makes our notation simpler.
- \(\min \sum_{(i,j)\in E} c_{ij} x_{ij} \), s.t.
- \(\sum_{(i,j)\in E} x_{ij} - \sum_{(j,i)\in E} x_{ji} = b_i\) for all \(i\)
- \(-x_{ij} \ge -u_{ij}\) for all \((i,j)\)
- \(x_{ij} \ge 0\) for all \((i,j)\)
The dual: this should be pretty automatic by now. We will use variables πi for the first set of constraints, and αij for the second.
- \(\max \sum_{i\in V} b_i \pi_i - \sum_{(i,j) \in E} u_{ij} \alpha_{i,j}\), s.t.
- \( \pi_i - \pi_j - \alpha_{ij} \le c_{ij}\), for all \((i,j)\)
- rewrite: \(c_{ij} - \pi_i + \pi_j + \alpha_{ij} \ge 0\)
- \(\alpha_{ij} \ge 0\)
- πi unrestricted
Now that we have the dual, the next step is to ask, "what does it mean"? If we figure out what the dual means, we can learn many properties about the original problem and maybe come up with more efficient algorithms for it.
Claim: If the dual has an optimal solution, then ∃ optimal solution where \(\alpha_{ij} = \max \{0, -c_{ij}^\pi \}\).
- Definition: \(c_{ij}^\pi = c_{ij} - \pi_i + \pi_j\)
- Called the reduced costs for the arc (i,j).
- But not the same as the reduced costs of its linear program (?).
- That doesn't really matter to me since I never got the reduced costs thing anyway.
Proof: We already know that \(\alpha_{ij} \ge 0\) and \(\alpha_{ij} \ge - c_{ij}^\pi\). The claim states that this holds with equality, because the objective coefficient of \(\alpha_{ij}\) is \(-u_{ij} < 0\), which means we want \(\alpha_{ij}\) as small as possible.
Theorem: (Complementary Slackness). If \(x_{ij}\) is feasible, and \(\pi_i, \alpha_{ij}\) are feasible, then they are optimal iff:
- \(\alpha_{ij}(u_{ij} - x_{ij}) = 0\)
- \(x_{ij} (c_{ij}^\pi + \alpha_{ij}) = 0\)
Define: Excess at \(i\): \(e_i = b_i - \sum_{(i,j)\in E} x_{ij} + \sum_{(j,i) \in E} x_{ji}\)
- \(e_i=0\) in any feasible solution.
Proof: (of the complementary slackness statement above):
- We write the LP complementary slackness conditions.
- \(\pi_i e_i = 0\)
- \(\alpha_{ij} (u_{ij}-x_{ij}) = 0\)
- \(x_{ij} (c_{ij}^\pi + \alpha_{ij}) = 0\)
- The first is always true (as we mentioned above with the excess)
- The other two are the complementary slackness conditions from the theorem.
Example time (not pictured):
- He gave a network and a feasible flow.
- He claims it's optimal.
- To prove it, he'll give a corresponding solution to the dual!
- \(\pi_{j}\) is called the "node potential".
- He gave us the values for \(\pi_i\) at each node. The \(\alpha_{ij}\) follow from complementary slackness, having the values of \(x_{ij}\) and \(\pi_i\).
- We then computed the values for \(c_{ij}^\pi\), and used those to compute \(\alpha_{ij}\), and showed that they satisfy complementary slackness, and are feasible.
- Then, we wrote the residual network. Recall, you do this by replacing each edge that has a flow with an edge in the same direction, same cost, and ub is \(u_{ij}-x_{ij}\). And, you add a reverse edge with negative cost and ub is \(x_{ij}\)
- Then, we mapped the reduced costs \(c_{ij}^\pi\) onto the residual network. For the new back edges \((i,j)\), their reduced costs are just \(-c_{ji}^\pi\).
- We notice that \(c_{ij}^\pi \ge 0\) for all edges in the residual network; we denote the residual edge set by \(E(x)\).
- We make the claim that \(x_{ij}\) and \(\pi_i\) are optimal iff \(c_{ij}^\pi \ge 0\) forall edges in the residual network.
- Also, we point out that if \(c_{ij}^\pi > 0\), then that arc is a binding constraint on the flow (i.e., that arc is at capacity).
28 2015-09-23 Wednesday
28.1 MCNF: Transformations
28.1.1 Removing lower bounds
- If you have an arc from i to j, with supplies bi and bj, cost cij, and bounds (ℓij, uij):
- Create a new arc with bounds (0, uij-ℓij); node i's supply becomes bi-ℓij, and node j's becomes bj+ℓij.
- Essentially, this is an equivalent problem where you disregard the required flow. The cost that's eliminated is constant, so the solution to this will be the same as the solution to the original.
28.1.2 Removing upper bounds
- Again, you have i and j, with supply bi/bj, cost cij, and upper bound uij.
- i is flowing to j, with an amount x.
- Create a new "node" that has supply -uij. Both i and j will flow into the new node. x will still flow from node i into the new node.
- The cost of the flow from i to the new node will be cij
- The cost of the flow from j into the new node will be 0.
- The supply of node i is still bi.
- The supply of j becomes bj + uij. Essentially, the flow from j to the new node will be uij - x, and when you subtract that from the supply, you get a net of bj + x at j, which matches the original setup.
Example: he's doing an example I can't draw, but functionally he's taking a MCNF problem with costs and upper bounds, and getting rid of all the upper bounds. The process is as follows (for the whole graph):
- The whole network becomes a bipartite graph. Each original node is on the left, and on the right there is a node for each arc in the original network.
- For each arc on the original, you connect both nodes on the left to the corresponding node on the right.
- You set the supply on the right side node to be the opposite of the upper bound.
- The "supplying" node's arc has the same cost.
- The "receiving" node's arc gets a cost of 0.
- The receiving node has its supply increased by the upper bound.
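Here's a small Python sketch of that whole-graph transformation (the data layout is mine):

    def remove_upper_bounds(b, arcs):
        """Sketch of the bipartite transformation above.
        b: dict node -> supply; arcs: list of (i, j, c_ij, u_ij).
        Returns (new supplies, uncapacitated arcs); the tuple (i, j) is
        reused as the name of the new node for original arc (i, j)."""
        nb = dict(b)
        new_arcs = []
        for (i, j, c, u) in arcs:
            mid = (i, j)                     # one new node per original arc
            nb[mid] = -u                     # its supply is -u_ij
            nb[j] = nb.get(j, 0) + u         # receiving node's supply rises by u_ij
            new_arcs.append((i, mid, c))     # supplying side keeps cost c_ij
            new_arcs.append((j, mid, 0))     # receiving side costs 0
        return nb, new_arcs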
28.1.3 Node splitting
Sometimes you try to model an LP as an MCNF. To make the modeling better, you may want to bound the amount flowing through a node (not just the net "supply/demand").
- Split the node into two: an input node containing all the edges going in, and an output node containing all the outgoing edges.
- Connect the nodes from the input to the output, set the cost to 0, and upper bound it by the amount of flow you'd like to allow through the terminal.
- You set the input node's demand to 0, and the output node's demand to the original.
- Congrats, you've split the node!
28.2 Residual Network
If you have a feasible flow going through an instance of an MCNF problem, you can create a new problem by changing the demands at each node to be what you currently have satisfied, and creating arcs to either send back what you've already got flowing, or to go up to the maximum flow between nodes.
This new network is called a "residual network" since it represents the residual actions you can take to change the flow. It's sort of a pivot in the LP or maybe the dual. It's an action frequently taken by algorithms.
28.3 MCNF as LP
The MCNF is formulated as:
- min cT x, s.t.
- Nx = b
- x ≥ 0
Where x is a vector of arcs, cT is a vector of costs for flows on each arc. N is the node arc incidence matrix. Its rows are the nodes, and its columns are the arcs. For arc (i,j), there is a +1 in row i, and a -1 in row j, assuming the flow is from i to j.
We will call \(A\) the maximal subset of rows of \(N\) that are linearly independent; it has full row rank. Each column of \(N\) has exactly two non-zero elements, +1 and -1; each column of \(A\) has at most two. Then, we have the problem:
- min cT x, s.t.
- Ax = b
- x ≥ 0
If \(A\) is unimodular, then ∀ integer b, the optimal solution is an integer. Which means that the solution to the ILP version of the \(A\) LP would be the same as the solution to the LP, and integer. I guess that would be pretty cool.
Definition: A is totally unimodular iff ∀ square submatrices C of A, det C ∈ {0, ± 1}.
Claim: Totally Unimodular → Unimodular. (Recall Unimodular is: ∀ bases B of A, det B ∈ {± 1}.) Proof: B is a non-singular square submatrix, so det B ∈ {0, ± 1} by total unimodularity, and non-singularity rules out 0, leaving ± 1.
So, we must prove that \(A\) is totally unimodular!
- ∀ C square submatrices k × k. We do induction on k.
- Base case: k=1. Each submatrix is 1 × 1, and either contains -1, 0, or 1, so the determinant is either -1, 0, or 1.
Induction:
- Case 1: C has a column containing all 0's → det C = 0.
- Case 2: in every column, there is a +1 and a -1. If you sum up all the rows, you get the zero row, so the matrix is singular, and det C = 0.
- Case 3: Anything else. Pick a column that has exactly one non-zero entry (± 1). Expanding along that column, det C is that entry (± 1) times the determinant of the submatrix that excludes its row and column. By the inductive hypothesis, that determinant is ∈ {0, ± 1}, so det C ∈ {0, ± 1} as well!
MCNF is unimodular!
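Since the homework apparently needed a script for this kind of thing anyway, here's a brute-force total-unimodularity checker with numpy (exponential, so only for tiny matrices like class examples):

    import numpy as np
    from itertools import combinations

    def is_totally_unimodular(A, tol=1e-9):
        """Brute force: every square submatrix must have det in {0, +1, -1}."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        for k in range(1, min(m, n) + 1):
            for rows in combinations(range(m), k):
                for cols in combinations(range(n), k):
                    d = float(np.linalg.det(A[np.ix_(rows, cols)]))
                    r = round(d)
                    if abs(d - r) > tol or r not in (-1, 0, 1):
                        return False
        return True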
29 2015-09-21 Monday
Same content as 2015-09-23.
30 2015-09-18 Friday
30.1 Integrality
When you have an ILP, it's normally NP-hard. However, if the optimal solution to the LP relaxation is integral, then you have the ILP solution in polynomial time. Hooray for you. How can you check to find out if the optimal solution is integral? Apparently, using a concept called unimodularity.
For this part of the lecture, \(A\) is a \(p \times q\) matrix, with integer values, and \(rank(A)=p\).
Definition: A is unimodular iff \(\forall B\) basis, \(\det B = \pm 1\).
Theorem: \(A\) as above. Equivalent:
- (a) \(A\) is unimodular
- (b) every basic feasible solution of \(Ax=b\), \(x\ge 0\) is integer (for integer b).
- (c) \(\forall B\) basis, \(B^{-1}\) integer.
This theorem is cool because it will apply for any objective function, and any integer \(\vec{b}\). However, unimodularity is more strict than purely figuring out whether a given problem has an integer solution that is optimal.
Proof: (a) → (b)
- A basic feasible solution is \(x=(x_B, x_L)\) s.t. \(Bx_B = b\) and \(x_L = 0\).
- For all basic feasible solutions, we have \(x_L = 0\), which is integer. What about \(x_B\)? Let's figure out the ith component of \(x_B\).
- Well, it turns out that by Cramer's Rule, \(x_i = \frac{\det B_i}{\det B}\).
- We know that \(\det B_i\) is an integer (\(B_i\) is \(B\) with its \(i\)th column replaced by \(b\); all entries are integers, so the determinant is too).
- Since \(\det B = \pm 1\), we know that \(x_i\) must be an integer. Yay (I guess).
Proof: (b) → (c)
- basis → \(B\).
- Dj: jth column of \(B^{-1}\)
- \(D_j = B^{-1} e_j\), where \(e_j\) is a vector of zeros except for index \(j\), which is 1.
- \((d_{ij}) = B^{-1}\)
- \(a_i = \left\{ \begin{array}{ll} \lceil -d_{ij} \rceil & \text{ if } d_{ij} < 0 \\ 0 & \text{ if } d_{ij} \ge 0 \end{array} \right.\)
- This gives us a vector \(\vec{a}\).
- By construction, \(D_j + a \ge 0\).
- Consider the system \(Bx = e_j + Ba\). \(x = D_j + a\) is a solution, since \(B(B^{-1} e_j + a) = e_j + Ba\).
- The right hand side \(e_j + Ba\) is an integer vector, so \(x = D_j + a\) is the basic feasible solution of a system with integer right hand side; by (b), it is integer.
- Therefore \(D_j = x - a\) is a difference of integer vectors, hence integer. Since this holds for every column \(j\), \(B^{-1}\) is integer.
Proof: (c) → (a)
- \(\det B \det B^{-1} = 1\)
- This is because \(\det B \det B^{-1} = \det (B B^{-1}) = \det I = 1\).
- So, \(\det B = \frac{1}{\det B^{-1}}\). We know that \(B\) is integer (since \(A\) is as above, integer).
- We also know by our assumption (c) that \(B^{-1}\) is integer. This means both determinants are integers, and the only values for \(\det B^{-1}\) that make this possible are -1 and 1.
- So, \(A\) must be unimodular.
30.2 Minimum Cost Network Flow
Woo?
In minimum cost network flow, I have a graph \(G=(V,E)\) which is directed. We now call vertices nodes, and edges arcs. You have a source node and a sink node, with a certain amount of flow that needs to go through the network (e.g., 3 units must go out of the source, and into the sink).
Each arc in the network from node \(i\) to node \(j\) has a cost \(c_{ij}\) associated with it. It also has an \(\ell_{ij}\), a lower bound on flow, and a \(u_{ij}\), an upper bound on what can flow through the arc. The cost of a flow on an arc is the amount of flow through the arc times the cost of the arc. All flows are nonnegative.
The goal is to put the required amount of flow through the network, while minimizing the cost. The total cost is the sum for each arc of the amount of flow times the \(c_{ij}\) for that arc.
The three next steps for this problem:
- Make it into a linear program.
- Find out whether it is unimodular.
- Figure out its dual.
31 2015-09-16 Wednesday
Intro: we're getting to the core of the class. :D
31.1 Problem Setup
Problem setup:
- min cT x, s.t.
- Ax ≥ b
- x ≥ 0
Dual:
- max bT π, s.t.
- AT π ≤ c
- π ≥ 0
31.2 Complementary Slackness Proof
Definition: x, π feasible are said to satisfy complementary slackness iff:
- \(\pi_i \left( \sum_{j=1}^q a_{ij}x_{j} - b_i \right) = 0\), i=1,2,…,p, (1)
- \(x_j \left( \sum_{i=1}^p a_{ij}\pi_i - c_j \right) = 0\), j=1,2,…,q, (2)
Theorem: x, π feasible satisfy c.s. iff x, π are optimal.
Proof: x, π feasible → weak duality.
- bT π ≤ πT A x ≤ cT x
- First, prove cs → optimal
- Assume x, π satisfy cs.
- Sum up equation (1):
- You actually get \(\pi^T A x - \pi^T b = 0\), or \(\pi^T A x = \pi^T b\)
- Sum up equation (2):
- You similarly get \(\pi^T A x = c^T x\).
- So, you have \(\pi^T b = c^T x\).
- This means that x and π are optimal.
- Next, the other way around.
- It's basically the same proof in reverse.
- \(c^T x = b^T \pi\) (by strong duality)
- Due to the weak duality inequalities, we know \(c^T x = \pi^T A x = b^T \pi\).
- Then we can take the left and right sides of the above and turn them back into summations:
- \(\sum_{i=1}^p \pi_i \left(\sum_{j=1}^q a_{ij}x_j - b_i \right) = 0\)
- (and similarly for the left side)
- Since πi ≥ 0 and the inner summation also ≥ 0, we know that each term must be equal to 0.
- So, this proves (1), and the other half of the equation similarly proves (2).
31.3 Lagrangian Relaxation
z* = min cT x, s.t. constraints
- L(π) = min cT x + πT (b - Ax), s.t.
- x ≥ 0
Assume π ≥ 0. Then z* ≥ min {cT x + πT (b - Ax) : Ax ≥ b, x ≥ 0}. This makes sense because the feasible region is the same, the cT x part is the same, and πT (b - Ax) will be ≤ 0. We can then further say that the right side is ≥ L(π), since L(π) expands the feasible region, meaning its optimum is ≤ that of the more constrained problem.
31.4 Easily Finding the Dual
We want to find the dual of every linear program, not just the form with minimization, Ax≥b. and x≥0. We could switch the problem into this form. Let's call that plan B. Let's do this instead:
min cT x s.t.
- aiT x = bi, (i∈M)
- aiT x ≥ bi, (i¬∈M)
- xj ≥ 0, (j∈N)
- xj unconstrained, (j¬∈N)
Dual:
max bT π, s.t.
- πi unconstrained, (i∈M)
- πi ≥ 0, (i¬∈M)
- AjT π ≤ cj, (j∈N)
- AjT π = cj, (j¬∈N)
Primal | Dual |
min | max |
cT x | bT π |
aiT x = bi | πi unconstrained |
aiT x ≥ bi | πi ≥ 0 |
xj ≥ 0 | ajT π ≤ cj |
xj unconstrained | AjT π = cj |
EG: min x1 + x2, s.t.
- x1 - 2x2 = 3
- x1, x2, ≥ 0
Originally, we would have transformed it into this problem: min x1 + x2, s.t.
- x1 - 2x2 ≥ 3
- -x1 + 2x2 ≥ -3
- x1, x2 ≥ 0
Then, we get the dual from the constraints: max 3π1 - 3π2, s.t.
- π1 - π2 ≤ 1
- -2π1 + 2π2 ≤ 1
- π1, π2 ≥ 0
Finally, simplify (with y = π1 - π2) to max 3y, s.t.
- y ≤ 1
- -2y ≤ 1
31.5 More Examples
min 2x1 + x2, s.t.
- x1 + 3x2 ≥ 4 (π1)
- -x1 + x2 = 7 (π2)
Dual: max 4π1 + 7π2, s.t.
- π1 - π2 ≤ 2 (x1)
- 3π1 + π2 = 1 (x2)
- π1 ≥ 0
min x1 + 2x2 - 3x3, s.t.
- x1 + x3 = 4 (π1)
- 2x1 - x2 + 2x3 ≤ 5
- -2x1 + x2 - 2x3 ≥ -5 (π2)
- 3x1 - 2x2 + 3x3 ≥ 7, (π3)
- x1, x3 ≥ 0
π1 is unconstrained, due to the equality. π2 and π3 are ≥ 0, due to the inequality. The dual: max 4π1 - 5π2 + 7π3, s.t.
- π1 - 2π2 + 3π3 ≤ 1 (x1)
- π2 - 2π3 = 2 (x2)
- π1 - 2π2 + 3π3 ≤ -3 (x3)
- π2, π3 ≥ 0
We'll do one with the knapsack problem, where we don't have a definite number of constraints/variables.
max ∑j=1q cj xj, s.t.
- ∑j=1q wj xj ≤ W
- xj ≤ 1, j=1…q
- xj ≥ 0, j=1…q
For the dual, we'll take the weight constraint and call its multiplier α; the xj ≤ 1 constraints get multipliers πj.
- min W α + ∑j=1q πj, s.t.
- wj α + πj ≥ cj, j=1…q
- α, πj ≥ 0
32 2015-09-14 Monday
32.1 Homework Stuff
Problem 3 had no solution.
In problem 7, you could find many examples of optimal solutions that are actually convex combinations of two optimal basic solutions, which are not themselves basic solutions. This is not what the question asked for. It seems like the basic feasible solutions are always integer for this problem.
32.2 Duality
Strong Duality Theorem: I have a program of the form:
- min cT x, st
- Ax ≥ b
- x ≥ 0
We also have its dual:
- max bT π, s.t.
- AT π ≤ c
- π ≥ 0
The differences between these are:
- min/max
- A becomes AT
- c and b are swapped
- x becomes π
We have weak duality, that bT π ≤ πT A x ≤ cT x.
Strong Duality Theorem: Suppose that the primal (or dual) has a finite, optimal solution. Then, so does the dual (primal), and they have the same optimal objective value.
Proof: WLOG, assume that the primal has a finite, optimal solution x* (the primal and dual can be swapped and the proof is the same). Also assume WLOG x* is a BFS. First, we'll take the primal and put it into standard form:
- min cT x, s.t.
- Ax - Is = b
- x, s ≥ 0
Let \(\tilde{x}=(x,s)\), \(\tilde{c}=(c,0)\), and \(\tilde{A}=(A,-I)\). Then we can put this in even nicer looking standard form:
- \(\min \tilde{c}^T \tilde{x}\), s.t.
- \(\tilde{A} \tilde{x} = b\)
- \(\tilde{x} \geq 0\)
When we write this in "canonical form" (I seriously need to study this part of the stuff):
- \(\min (c_L^{\tilde{\pi}})^T \tilde{x}_L + \tilde{\pi}^T b\), s.t.
- \(\tilde{x}_B + \tilde{A}\tilde{x}_L = \tilde{b}\)
- \(\tilde{x}_B, \tilde{x}_L \geq 0\)
We'll call the objective function z, so we're minimizing \(z(\tilde{x})\). \(z(x^*)=\tilde{\pi}^T b\). The next thing is to look at the reduced costs. First, we know that \(c^{\tilde{\pi}} \geq 0\), because x* is optimal. Next, we know that \(c^{\tilde{\pi}} = \tilde{c} - \tilde{A}^T \tilde{\pi} =\):
\begin{align*} c^{\tilde{\pi}} &= \tilde{c} - \tilde{A}^T \tilde{\pi} \\ &= \begin{bmatrix} c \\ 0 \end{bmatrix} - \begin{bmatrix} A^T \\ -I \end{bmatrix} \tilde{\pi} \\ &= \begin{bmatrix} c - A^T \tilde{\pi} \\ \tilde{\pi} \end{bmatrix} \end{align*}
This gives us that \(c \geq A^T \tilde{\pi}\), and \(\tilde{\pi} \geq 0\). This tells us that \(\tilde{\pi}\) is feasible in the dual. And then, we know that the objective value of \(\tilde{\pi}\) in the dual is \(b^T \tilde{\pi}\), which is z(x*). We know by the weak duality theorem that no π can have an objective value greater than this, so it is an optimal solution for the dual!
32.3 Complementary Slackness
Let x, π be feasible solutions. x, π satisfy complementary slackness (p+q equalities).
\begin{equation} \pi_i \left(\sum_{j=1}^q a_{ij} x_{j} - b_i \right) = 0 \end{equation}for i=1, 2, …, p
\begin{equation} x_j \left(\sum_{i=1}^p a_{ij} \pi_i - c_j \right) = 0 \end{equation}for j=1, 2, …, q
Essentially, each of these are the slack variables. So, if you look at the slackness in a constraint in one problem, and multiply it by the corresponding variable in its dual, you'll find that quantity is zero.
If the slackness \(s_i > 0\), then \(\pi_i = 0\). You can look at the \(\pi_i\) as a "price" for how much you'd be willing to "get rid" of the constraint. If your constraint is not even constraining you, you wouldn't care to get rid of it, and \(\pi_i\) is 0. On the flip side, if your slackness is 0, the \(\pi_i\) will tell you kinda how much you'd like to get rid of your constraint.
33 2015-09-11 Friday
LP canonical form.
\begin{align*} \min (c_L^{\pi})^T x_L + \pi^T b &\\ \text{s.t. } x_B + \bar{A} x_L &= \bar{b} \\ x_B, x_L &\geq 0 \\ \text{where } \pi^T B &= c_B^T \\ c^\pi &= c - A^T \pi \\ \end{align*}
EG:
\begin{align*} \min x_3 + x_4 + 7 &\\ \text{s.t. } x_1 + 2x_3 + 3x_4 &= 1 \\ x_2 + x_3 + 7x_4 &= 2 \\ x_1, x_2, x_3, x_4 &\geq 0 \\ \end{align*}- The basic variables are x1 and x2. You can come up with a BFS by setting x3 and x4 equal to 0, and reading off the values for the basic variables.
Thm: BFS \(\bar{x}\) is optimal iff c^π ≥ 0.
- Proof: ← last time
Proof: → (only the non-degenerate case)
∃ s s.t. \(c_s^\pi < 0\) → \(\bar{x}\) is not optimal.
Look at the $s$th column of \(\bar{A}\), or \(\bar{A}_s\).
The ith constraint is \(x_{B_i} + \bar{a}_{is} x_s + (\text{other } L \text{ terms}) = \bar{b}_i\).
If \(\bar{a}_{is} \leq 0 \: \forall i\), then \(x_s\) can be increased arbitrarily, driving the objective to \(-\infty\) (unbounded).
Assume ∃ i s.t. \(\bar{a}_{is} > 0\):
\begin{equation} \theta = \min_{i: \bar{a}_{is} > 0} \frac{\bar{b}_i}{\bar{a}_{is}} \end{equation}Since \(\bar{b}_i \ge 0\) and \(\bar{a}_{is} > 0\), we can claim \(\theta \ge 0\). However, we'll be doing the non-degenerate case, and assuming \(\theta > 0\).
More proof stuff that I really need to read about.
33.1 Duality
- Primal: min CT x s.t. Ax ≥ b, x ≥ 0.
- Dual: max bT π s.t. AT π ≤ c, π ≥ 0
Claim: dual of dual is primal.
Theorem (Weak Duality): ∀ feasible x, π, bT π ≤ cT x.
34 2015-09-09 Wednesday
34.1 Linear Programming
min cT x, s.t. Ax = b, x ≥ 0
- A p × q matrix.
- Rank(A) = p
- A = (A1, A2, A3, …, Aq)
Let \(\bar{x}\) be a feasible solution. Let \(A(\bar{x}) = \{A_i: \bar{x}_i > 0\}\). Thm: \(\bar{x}\) is an extreme point iff \(A(\bar{x})\) is a set of linearly independent vectors.
Def: (B,L) is a basis structure iff:
- (B,L) partition of {1, 2, …, q}.
- {Ai: i ∈ B} is a basis for Rp
A = (B, L), x = (xB, xL), c = (cB, cL)
EG: min x1+x2+x3+x4 s.t.
- x1 + 2x2 + 3x4 = 1
- 4x2 + x3 + 2x4 = 2
- All x ≥ 0
Rename variables x1 to y1, x2 to y3, x3 to y2, x4 to y4:
min y1 + y2 + y3 + y4, s.t.
- y1 + 2y3 + 3y4 = 1
- y2 + 4y3 + 2y4 = 2
- All y ≥ 0
The left half of A is B, and the right half is L.
\begin{equation} y = \begin{bmatrix}y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} \end{equation} \begin{equation} c = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \end{equation}The top halves of these are \(y_B\) and \(c_B\) respectively.
\begin{align*} B x_B + L x_L &= b \\ x_B + B^{-1} L x_L &= B^{-1} b \\ x_B + \bar{A} x_L &= \bar{b} \end{align*}Here we're letting \(\bar{A} = B^{-1} L\) and \(\bar{b} = B^{-1} b\).
A basic solution is one where \(\bar{x_L} = 0\), and hence \(\bar{x_B} = \bar{b}\). A basic feasible solution additionally has \(\bar{x_B} \geq 0\).
Def: Simplex multipliers corresponding to \((B,L)\):
\begin{equation} \pi^T = c_B^T B^{-1} \end{equation}Let \(\bar{x} = \begin{bmatrix}\bar{x_B} \\ \bar{x_L} \end{bmatrix}\) be BFS corresponding to (B, L). The objective function at \(\bar{x}\) is:
\begin{align*} \begin{bmatrix} c_B^T & c_L^T \end{bmatrix} \begin{bmatrix} \bar{x_B} \\ \bar{x_L} \end{bmatrix} &= c_B^T \bar{x_B} + c_L^T x_L \\ &= (\pi^T B) (B^{-1} b) + c_L^T (0) \\ &= \pi^T b \end{align*}
Def: Reduced costs corresponding to \((B,L)\) of \(A\):
\begin{equation} c^{\pi} = c - A^T \pi \end{equation} \begin{equation} c^{\pi} = \begin{bmatrix} c_B^{\pi} \\ c_L^{\pi} \end{bmatrix} = \begin{bmatrix} c_B \\ c_L \end{bmatrix} - \begin{bmatrix} B^T & L^T \end{bmatrix} \pi = \begin{bmatrix} c_B - B^T \pi \\ c_L - L^T \pi \end{bmatrix} = \begin{bmatrix} 0 \\ c_L - L^T \pi \end{bmatrix} \end{equation}I guess you can also rewrite it to \(c = c^{\pi} + A^T \pi\), but I'm not writing out the vectors and matrices again. Now he's doing more stuff with the objective function.
\begin{equation} c^T x = (c^{\pi}_L)^T x_L + \pi^T b \end{equation}Once we find the basic feasible solution, the \(\pi^T b\) is pretty much fixed, and so we just need to minimize (CLπ)T xL. Now, say we look at a non-basic (i.e. in L, not B) variable xi, and look at its reduced costs.
- If ciπ ≥ 0, we would be happy to set xi = 0 (if it's feasible).
- If ciπ < 0, we would be happy to set xi = +∞ (if it's feasible).
We can see that if ∀ i ciπ ≥ 0, then the BFS is optimal. In fact, it's also true the other way around.
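A small numpy sketch of the computation: given A, c, and the basic column indices, get the simplex multipliers and reduced costs (names are mine):

    import numpy as np

    def multipliers_and_reduced_costs(A, c, basis):
        """Given standard-form data (A, c) and the basic column indices,
        compute pi^T = c_B^T B^{-1} and c^pi = c - A^T pi."""
        B = A[:, basis]                      # basis matrix
        pi = np.linalg.solve(B.T, c[basis])  # solves B^T pi = c_B
        c_pi = c - A.T @ pi                  # reduced costs; 0 on basic columns
        return pi, c_pi

    # With an optimal basis, every entry of c_pi comes out >= 0,
    # which is exactly the optimality test stated above.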
Thm: BFS \(\bar{x}\) is optimal iff ∀ i ciπ ≥ 0.
- Proof ←: (basically what we've been talking about)
- Proof →: is a really difficult, multi-lecture proof. We'll sketch out the non-degenerate case only, θ > 0. Next time.
35 2015-09-02 Wednesday
35.1 Review of LP
- min CT x, st.
- Ax = b
- x ≥ 0
Integer LP is same, except require that x is an integer.
35.1.1 Example
Vertex Cover Problem - given an undirected graph G, find a vertex cover of minimum size. (vc = a set of vertices that cover every edge).
We are going to convert a VC problem into ILP. The graph we have is (no diagrams, sorry): V={1, 2, 3, 4}, E={(1,2), (1,3), (1,4), (2,3), (3,4)} (undirected).
Decision variables are xi= 1, if i ∈ VC, 0 otherwise. We minimize the function x1 + x2 + x3 + x4, s.t.:
- x1 + x2 ≥ 1
- x1 + x3 ≥ 1
- x1 + x4 ≥ 1
- x2 + x3 ≥ 1
- x3 + x4 ≥ 1
- x1, x2, x3, x4 ∈ {0, 1}
In case you can't tell, there is a constraint for each edge, which basically says that at least one of the vertices on the edge needs to be 1.
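For comparison, here's the linear relaxation of this exact instance solved in Python. We used Octave's glpk in class; scipy.optimize.linprog is an assumption here, swapped in for convenience:

    import numpy as np
    from scipy.optimize import linprog   # assumption: scipy available (class used Octave's glpk)

    V = [1, 2, 3, 4]
    E = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
    c = np.ones(len(V))                   # minimize x1 + x2 + x3 + x4
    A_ub = np.zeros((len(E), len(V)))     # linprog wants A_ub x <= b_ub,
    for k, (i, j) in enumerate(E):        # so negate x_i + x_j >= 1
        A_ub[k, i - 1] = A_ub[k, j - 1] = -1.0
    b_ub = -np.ones(len(E))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * len(V))
    print(res.x, res.fun)                 # LP optimum; may be fractional in general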
35.2 Reducing Vertex Cover to ILP
More generally, the vertex cover of G=(V,E) can be transformed to ILP like this:
- Min ∑i∈V xi, s.t.
- xi + xj ≥ 1 ∀ (i,j) ∈ E
- xi ∈ {0, 1} ∀ i ∈ V
When you remove the integrality constraint from an ILP, you get the linear relaxation of the problem. In the case of this problem, we get an assignment of fractional weights to vertices such that each edge has sum ≥ 1, while minimizing the total vertex weights. It's an entirely different problem, and not really something we want.
According to the Liberator, the difference between a lot of the problems dealt with in other fields and in computer science is the addition of these "integrality constraints," which makes problems much more difficult than their continuous relatives.
35.3 "Slicing" Linear Programs
When you have the constraints Ax = b, you can think of it as aiT x = bi, where aiT is a row vector of A. This is totally linear algebra, and I'm sure it'll come in useful later in the course.
35.4 Semi Definite Programming
- Def: A real matrix A is positive (semi) definite iff ∀ x ≠ 0, xT A x > 0 (respectively, ∀ x, xT A x ≥ 0).
- Thm: A is positive semidefinite iff all its eigenvalues are ≥ 0.
(note to self - go over linear algebra!)
- Def: A symmetric positive semidefinite matrix is written A ≽ 0.
- Thm: A ≽ 0 iff ∃ B s.t. A = BT B. Given A, B can be found in polynomial time. B is not necessarily square, but of course BT B will be.
- Given two matrices C, X (n by m), C ⋅ X = ∑i=1n ∑j=1m cij xij.
The problem of Semi Definite Programming is:
- minimize C ⋅ X, st:
- Ai ⋅ X = bi
- X ≽ 0
35.5 LP reduces to SDP
Claim: Linear programming is a special case of (i.e. reduces to) Semi Definite Programming.
\begin{equation} X = \begin{bmatrix} x_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & x_q \end{bmatrix} \end{equation} \begin{equation} C = \begin{bmatrix} c_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & c_q \end{bmatrix} \end{equation} \begin{equation} A_i = \begin{bmatrix} a_{i1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & a_{iq} \end{bmatrix} \end{equation}
- We wouldn't want to do this in practice, since we have more efficient algorithms for LP. But it exists.
35.6 Quadratically Constrained Quadratic Programming (QCQP)
- min xT Q x + qT x
- s.t. xT Qi x + qiT x ≤ bi, i=1,2,..,p
Both the objective function and the constraints may be quadratic.
- It seems that you can reduce QCQP also to SDP.
- I guess the way to think about it is that in SDP, X = BT B, so in the decision variables you get quadratic terms. Or something.
35.7 Back to Linear Programming
Like you could slice LP constraint matrices by rows, you can also do it by columns. Split A into columns A1, A2, …, Aq. Then, you can break the constraints into: A1 x1 + A2 x2 + … + Aq xq = b.
Back when we were looking at LP the first time, we saw the feasible region as a polygon (or polyhedron), and the vertices were the extreme points, which are the candidate solutions. These extreme points cannot be expressed as convex combination of other feasible solutions. Even more exciting, Thm: All feasible solutions are convex combinations of extreme points.
Each extreme point corresponds in some way to the column breakdown shown above, which allows us to do LP in a Linear Algebra way.
Thm: A feasible solution is an extreme point iff:
- Ai corresponding to xi > 0 are independent. That is, given a point x, look at its coordinates xi, find the ones greater than 0, and check if the Ai corresponding to them are independent.
36 2015-08-31 Monday
36.1 Linear Programming (LP)
An instance of LP:
- min ∑j=1q cj xj, subject to:
- ∑j=1q aij xj ≤ bi, for i = 1, 2, …, p
- xj ≥ 0, for j = 1, 2, …, q
The constraints define X, the feasible region. You can switch a minimization problem to a maximization problem by negating the objective function. Minimization is the "standard form". You can also define the "slack variables" in the constraints, which were covered a bit more in the EECS 440 lecture on LP. EG, diet problem:
i | Food | Energy | Protein | Calcium | Price | Max |
---|---|---|---|---|---|---|
1 | Oatmeal | 110 | 4 | 2 | 3 | 4 |
2 | Chicken | 205 | 32 | 12 | 24 | 3 |
3 | Eggs | 160 | 13 | 54 | 13 | 2 |
4 | Milk | 160 | 8 | 285 | 9 | 8 |
5 | Pie | 420 | 4 | 22 | 20 | 2 |
6 | Pork w/ beans | 260 | 14 | 80 | 19 | 2 |
 | GOALS | 2000 | 55 | 800 | min | |
Decision variables are the xi (amount of food i), so here is the problem:
- minimize, 3x1 + 24x2 + 13x3 + 9x4 + 20x5 + 19x6, subject to:
- 110x1 + 205x2 + 160x3 + 160x4 + 420x5 + 260x6 ≥ 2000
- 4x1 + 32x2 + 13x3 + 8x4 + 4x5 + 14x6 ≥ 55
- 2x1 + 12x2 + 54x3 + 285x4 + 22x5 + 80x6 ≥ 800
- 0 ≤ x1 ≤ 4
- 0 ≤ x2 ≤ 3
- 0 ≤ x3 ≤ 2
- 0 ≤ x4 ≤ 8
- 0 ≤ x5 ≤ 2
- 0 ≤ x6 ≤ 2
This isn't in standard form due to the greater than or equal to in the top 3 constraints, and the less than or equal to in the variable bounds. I guess.
What to do to get decision variables unrestricted in sign (not in std form): If you want x to be negative (or just allowed to be negative) replace it with two variables (say, y and z). Substitute x with y-z, and add the condition that y,z ≥ 0. This allows x (aka y-z) to be positive or negative, but you could add more conditions on y-z to make it how you'd like.
The graphical representation of these problems is pretty simple (when you have two variables). The constraints create a nice shaded polygon that represents your feasible region, and then you pick the vertex that maximizes the objective function.
Claim: There is always an optimal solution at an extreme point. (Careful with the wording: not every optimal solution is an extreme point, but some extreme point is always optimal.)
You can represent a LP instance in matrix form like this:
- min CT x
- s.t. Ax=b
- x ≥ 0
Where, x = (x1, x2, …, xq)T, c = (c1, c2, …, cq)T, A=(a11, a12, …, a1q; …; ap1, ap2, …, apq), b=(b1, b2, …, bp)T.
36.2 Integer Linear Programming
Same as ^, except that the x's must be integers. Since this is a more restricted problem, the solutions are no better than the LP solutions.
36.3 Mixed Integer Linear Programming
MILP. Really?
> Matrix I'd Like to Program - Andrew Mason
Only some of the decision variables need to be integral, others can be continuous.
36.4 Next Time, on Advanced Algorithms:
Vertex cover, formulated as ILP.
37 2015-08-28 Friday
37.1 Last Time:
Approximation algorithms have approximation ratio:
apx ratio = \(max_{I\in{}\mathscr{I}} {\frac{c(I)}{c^*(I)}}\)
A c-approximation algorithm has cost ≤ c × optimal cost on all instances I of the problem \(\mathscr{I}\). One example is the vertex cover problem. We covered a 2-approximation algorithm (called VCapx) that operates by repeatedly choosing an edge, adding its endpoints to the VC, and removing all incident edges from the graph.
We left off saying that today we would cover the proof that it is a 2-apx algorithm.
37.2 Proof
Theorem: VCapx is a 2-approximation algorithm.
Proof: Every edge is covered by VCapx at termination. For every edge the algorithm chooses, it adds at most two vertices to \(V'\). The optimal solution contains at least one of these two. VCapx never considers the same vertex twice (since it deletes incident edges). So, this is a 2-approximation algorithm.
Here's the actual text of his proof:
- Every edge is covered by VCapx at termination.
- ∀ edge chosen by VCapx:
- VCapx adds 2 vertices to \(V'\)
- Opt contains at least one of the two vertices
- VCapx never considers same vertex twice (by deleting incident edges)
- → chosen edges are disjoint, → \(V'\) can be partitioned by edges added by VCapx
- → 2-apx algorithm
37.3 Reduction
The pipeline of reduction:
(X, c) ∈ I → (X', c') ∈ I' → x'* → x*
If the time to translate (X,c) to (X', c') is T1, and the time to translate x'* to x* is T2, then problem I reduces to I' in time T1 + T2.
EG: Any maximization problem reduces to a minimization problem in O(1) time.
37.3.1 Optimal Message Passing
Given a graph \(G=(V,E)\) with a probability \(p_e\) (\(0 < p_e < 1\)) associated to each \(e \in E\), find a spanning tree of \(G\) that minimizes the probability of failure. (The probabilities are of failure, and independent.)
So, the probability of survival for the whole tree is \(\prod_{e \in T} (1-p_e)\).
We can reduce OMP to the Minimum Spanning Tree problem in linear time. Define the weight of an edge to be \(w_e = -\log(1-p_e)\). The cost of an MST is \[ c(T) = \sum_{e\in T} w_e = \sum_{e\in T} \log \frac{1}{1-p_e} = \log \prod_{e\in T} \frac{1}{1-p_e} = \log \frac{1}{\prod_{e\in T}(1-p_e)}. \] Since we're minimizing that logarithm, and logarithms are strictly increasing functions, we're also minimizing the inside of the logarithm. This is the same as maximizing the denominator, which happens to be the probability of survival of the tree.
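A sketch of the reduction in Octave; the edge list and the tiny Kruskal implementation are my own illustration, not from lecture:

```octave
% Hypothetical instance: each row of `edges` is [u v p_e], p_e = failure prob.
edges = [1 2 0.10; 1 3 0.50; 2 3 0.20; 3 4 0.30];
n = 4;                               % number of vertices
w = -log(1 - edges(:,3));            % the reduction: w_e = -log(1 - p_e)
[~, order] = sort(w);                % Kruskal: consider edges by weight

parent = 1:n;                        % simple union-find (no path compression)
tree = [];                           % indices of edges kept in the tree
for i = order'
  ru = edges(i,1); while parent(ru) != ru, ru = parent(ru); end
  rv = edges(i,2); while parent(rv) != rv, rv = parent(rv); end
  if ru != rv                        % edge joins two components: keep it
    parent(ru) = rv;
    tree(end+1) = i;
  end
end
survival = prod(1 - edges(tree,3))   % maximized survival probability (0.504)
```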
37.3.2 Choosing your reduction
This isn't necessarily like EECS 343 reductions, where you find the easiest reduction to do. There are entire families of problems that are special cases of one another. A problem might reduce to the simplest of these, or to the most general. The reduction to the most general problem is usually easiest, while the reduction to the simpler problem is harder; the payoff of the harder reduction is generally a faster algorithm, since simpler problems are faster to solve. It's just a wonderful world of tradeoffs here in computer science land.
37.4 GNU Octave
- Download it via your package manager, or from the GNU website if you're a Win/Mac user.
- There is a good deal of documentation on the GNU site about how to use Octave. It looks like a less powerful Python+NumPy+Matplotlib, or maybe a less powerful (open source) Mathematica.
- Octave has a glpk function for linear programming.
- First homework this afternoon, due in two weeks!
38 2015-08-26 Wednesday
38.1 Optimization Problems (cont.)
Definition: an optimization problem is a set of instances of optimization problems (iops).
Ex. Minimum Spanning Tree (MST) - find a spanning tree \(T'\) of graph \(G\) minimizing \(\sum_{e \in T'}c(e)\), where \(c(e)\) is the cost of an edge.
Ex. Vertex Cover - find \(V' \subseteq V\) s.t. \(\forall e \in E:\: e \cap V' \neq \emptyset\) (\(V'\) is the vertex cover); minimize \(\vert V' \vert\).
38.2 Maximization problems
What about maximization problems? Well, they reduce to minimization problems: \[ I_{max} = (X, f) \Rightarrow \tilde{I} = (X, -f) \:\left(\text{or sometimes } \tilde{I} = \left(X, \tfrac{1}{f}\right)\text{, when } f > 0\right) \]
\(\tilde{I}\) is now a minimization problem.
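For example: maximizing \(f(x) = x\) over \(X = [0, 1]\) becomes minimizing \(-x\) over the same \(X\); both are optimized by \(x^* = 1\).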
38.3 Approximation Algorithms
Let \(I\) be an iop with optimal solution \(x^*\) and optimal value \(c(x^*) = c^*(I)\). Let \(c(I)\) be the cost accrued by running algorithm A on \(I\). Then the approximation ratio is: \[ R = \max_{I} \left( \frac{c(I)}{c^*(I)}, \frac{c^*(I)}{c(I)} \right) \]
(This formulation allows us to use the same ratio function for max and min problems.)
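E.g. (made-up numbers): on a minimization instance where the algorithm returns cost \(c(I) = 10\) and the optimum is \(c^*(I) = 5\), the instance contributes \(\max\left(\frac{10}{5}, \frac{5}{10}\right) = 2\) to the ratio.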
If for every \(\epsilon > 0\) there exists an algorithm \(A_\epsilon\) with \(R \le 1 + \epsilon\) that runs in \(O(n^k)\) time w.r.t. input size, then this family is a polynomial time approximation scheme (PTAS).
If the runtime is also polynomial w.r.t. \(1/\epsilon\), then it is called a fully polynomial time approximation scheme (FPTAS).
def: a k-approximation algorithm is one whose result is within \(k\) times the optimal result, e.g. \(\forall G:\: |VC| \le 2|VC^*|\) for approx vertex cover.
38.4 Approx Vertex Cover
Algorithm: Given \(G = (V,E)\), with \(V' = \emptyset\) initially:
- repeat
  - choose \( e = (u,v) \in E \)
  - \(V' = V' \cup \{u, v\}\)
  - delete all edges \(d \in E\) incident to \(u\) or \(v\)
- until \( E = \emptyset \)
- return \(V'\)
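A runnable Octave sketch of this algorithm (the edge list is a made-up instance, and "choose an edge" is just "take the first remaining row"):

```octave
% 2-approximation for vertex cover; `E` rows are edges [u v] of a toy graph.
E  = [1 2; 1 3; 2 4; 3 4; 4 5];
Vc = [];                          % the cover V'
while !isempty (E)
  u = E(1,1);  v = E(1,2);        % choose an (arbitrary) edge
  Vc = [Vc, u, v];                % add both endpoints to V'
  keep = E(:,1) != u & E(:,1) != v & E(:,2) != u & E(:,2) != v;
  E = E(keep, :);                 % delete all edges incident to u or v
end
Vc                                % here: [1 2 3 4]; an optimal cover is {1,4}
```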
Proof of correctness / 2-approx algorithm:
- Every edge is covered by VC aprx at termination
- ∀ edge chosen by VC aprx:
  - VC aprx adds at most 2 vertices to \(V'\)
  - VC opt contains at least 1 of the two vertices to remain feasible
- VC aprx never considers the same vertex twice because incident edges are removed
- chosen edges are disjoint \(\Rightarrow\:V'\) can be partitioned by the edges added by VC aprx
- therefore, 2-aprx algorithm
39 2015-08-24 Monday
- 5 books, get sections from library
- 2 tests:
- Final exam, possibly oral.
- Midterm
- 6 homeworks:
- Need to know Octave
39.1 Asymptotics
Measure time complexity. Focus is on large inputs.
\(f(n) \in O(g(n))\) means, up to a constant, "\(f(n) \le g(n)\)":
\(\exists\, c > 0,\, n_0 > 0\) s.t. \(\forall n \ge n_0:\: f(n) \le c\,g(n)\)
- \(f(n) \in \Omega(g(n))\) defined: \(g(n) \in O(f(n))\)
- \(f(n) \in \Theta(g(n))\) defined: \(f(n) \in O(g(n))\) and \(f(n) \in \Omega(g(n))\)
- \(f(n) \in o(g(n))\) defined: \(\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0\)
- \(f(n) \in \omega(g(n))\) defined: \(\lim_{n\to\infty} \frac{f(n)}{g(n)} = \infty\)
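A quick worked instance of the \(O\) definition: \(3n + 5 \in O(n)\), witnessed by \(c = 4\) and \(n_0 = 5\), since \(3n + 5 \le 4n\) for all \(n \ge 5\).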
What is \(n\)? Input size. Sometimes it's a number of elements, or it could be multiple parameters (number of nodes, number of edges).
Sometimes we use the number of bits of the input. For example, consider an algorithm whose input is an integer \(k\); the number of bits is \(n=\Theta(\log k)\). If the runtime is \(O(k)\), it looks like linear time, but in the number of bits it's exponential: \(O(2^n)\). An algorithm that is polynomial in the numeric value but exponential in the input size is called pseudo-polynomial.
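As a sketch (my example, not from lecture): trial-division primality testing is the classic pseudo-polynomial algorithm, taking \(O(\sqrt{k})\) arithmetic steps in the value \(k\), i.e. \(O(2^{n/2})\) in the bit-length \(n\):

```octave
% Pseudo-polynomial sketch: runtime is polynomial in the value k,
% but exponential in the input size n = Theta(log k) bits.
function p = is_prime_naive (k)
  p = (k >= 2);
  for d = 2:floor (sqrt (k))   % O(sqrt(k)) = O(2^(n/2)) iterations
    if mod (k, d) == 0
      p = false;
      return;
    end
  end
end
```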
Formula (this will be used a lot, apparently): for \(x \in \mathbb{R}\) and \(k \in \mathbb{N}^+\) with \(x \le k\), \[ 1 - x \;\le\; \left(1-\frac{x}{k}\right)^k \;\le\; e^{-x}. \] The upper bound follows from \(1 + t \le e^t\) with \(t = -x/k\); the lower bound is Bernoulli's inequality.
39.2 Optimization Problems
Definition: An instance of an optimization problem is a pair \((X,f)\), where \(X\) is a set of feasible solutions, and \(f\) is an objective function. \(f\) maps from \(X\) to the real numbers. An optimal solution \(x^*\) is an element of \(X\) with the property that \(f(x^*) \leq f(x) \: \forall x \in X\).
For instance, for the minimum spanning tree problem on a graph, \(X\) is the set of all spanning trees, \(f\) maps each tree to the sum of its edge weights, and the optimal solutions \(x^*\) are exactly the MSTs.