Subgradient

\(g\) is a subgradient of a convex function \(f\) at \(x\in{}dom~f\) if \[f(y)\geq{}f(x)+g^\top(y-x)\quad\text{for all }y\in{}dom~f\]

Properties

  • \(f(x)+g^\top(y-x)\) is a global lower bound on \(f(y)\)
  • \(g\) defines a non-vertical supporting hyperplane to \(epi~f\) at \((x, f(x))\)
  • if \(f\) is convex and differentiable at \(x\), then \(\nabla{}f(x)\) is a subgradient of \(f\) at \(x\) (in fact the only one)

Subdifferential

the subdifferential \(\partial{}f(x)\) of \(f\) at \(x\) is the set of all subgradients: \[\partial{}f(x)=\{g|g^\top(y-x)\leq{}f(y)-f(x),\forall{}y\in{}dom~f\}\]
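For example, for \(f(x)=|x|\) on \(R\): \[\partial{}f(x)=\begin{cases}\{1\}&x>0\\ [-1,1]&x=0\\ \{-1\}&x<0\end{cases}\] every \(g\in[-1,1]\) gives a line through the origin that stays below \(|x|\).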

Monotonicity

the subdifferential of a convex function is a monotone operator: \[(u-v)^\top(x-y)\geq0\quad\text{for all }x,y\in{}dom~f,\ u\in\partial{}f(x),\ v\in\partial{}f(y)\]
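This follows by adding the two subgradient inequalities \[f(y)\geq{}f(x)+u^\top(y-x),\qquad{}f(x)\geq{}f(y)+v^\top(x-y)\] the function values cancel and the sum rearranges to \((u-v)^\top(x-y)\geq0\).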

Directional Derivative

Introduction

\[f(x+d)\approx{}f(x)+\nabla{}f(x)^\top{}d\quad\text{for small }d\] \[\text{If }\nabla{}f(x)^\top{}d<0,\ d\text{ is a descent direction}\]

Definition

The directional derivative of \(f\) at \(x\) in the direction \(y\) is \[f^{'}(x;y)=\lim_{\alpha\searrow{}0}\frac{f(x+\alpha{}y)-f(x)}{\alpha}=\lim_{t\rightarrow\infty}t\left(f(x+\tfrac{y}{t})-f(x)\right)\]

  • \(f^{'}(x;y)\) is the right derivative of \(g(\alpha)=f(x+\alpha{}y)\) at \(\alpha=0\)

Proof: Existence

\[h(\alpha)=\frac{f(x+\alpha{}y)-f(x)}{\alpha},\quad\alpha>0\] For \(0<\alpha_1\leq\alpha_2\): \[x+\alpha_1{}y=\left(1-\frac{\alpha_1}{\alpha_2}\right)x+\frac{\alpha_1}{\alpha_2}(x+\alpha_2y)\] \[f(x+\alpha_1{}y)\leq\left(1-\frac{\alpha_1}{\alpha_2}\right)f(x)+\frac{\alpha_1}{\alpha_2}f(x+\alpha_2y)\text{ (convexity)}\] Subtracting \(f(x)\) and dividing by \(\alpha_1\) gives \(h(\alpha_1)\leq{}h(\alpha_2)\), so \(h\) is nondecreasing and the limit as \(\alpha\searrow0\) exists (possibly \(-\infty\)).

Proof: Positive Homogeneity

If \(x\in\text{int dom}f\), then \(f^{'}(x;y)\) is finite for every \(y\in{}R^n\). Positive homogeneity in \(y\): for \(\lambda>0\), substituting \(\beta=\alpha\lambda\) in the limit gives \[f^{'}(x;\lambda{}y)=\lim_{\alpha\searrow{}0}\frac{f(x+\alpha\lambda{}y)-f(x)}{\alpha}=\lambda\lim_{\beta\searrow{}0}\frac{f(x+\beta{}y)-f(x)}{\beta}=\lambda{}f^{'}(x;y)\]

Directional derivative and subgradients

For convex \(f\) and \(x\in\text{int dom}f\): \[f^{'}(x;y)=\sup_{g\in\partial{}f(x)}g^\top{}y\]

  • generalizes the gradient: if \(f\) is differentiable at \(x\), then \(\partial{}f(x)=\{\nabla{}f(x)\}\) and \(f^{'}(x;y)=\nabla{}f(x)^\top{}y\)
  • implies that \(f^{'}(x;y)\) exists and is finite for all \(x\in\text{int dom}f\), all \(y\)
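For example, \(f(x)=|x|\) at \(x=0\): \(\partial{}f(0)=[-1,1]\), so \[f^{'}(0;y)=\sup_{g\in[-1,1]}gy=|y|\] which matches the right derivative of \(\alpha\mapsto|\alpha{}y|\) at \(\alpha=0\).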

Proof

\[f^{'}(x;y)=\lim_{\alpha\searrow{}0}\frac{f(x+\alpha{}y)-f(x)}{\alpha}\geq\lim_{\alpha\searrow{}0}\frac{\alpha{}g^\top{}y}{\alpha}=g^\top{}y\quad\text{for any }g\in\partial{}f(x)\] so \(f^{'}(x;y)\geq\sup_{g\in\partial{}f(x)}g^\top{}y\).

Conversely, the supremum is attained. Since \(v\mapsto{}f^{'}(x;v)\) is convex and finite on \(R^n\), it has a subgradient at \(y\): \[\text{Let }\hat{g}\in\partial_yf^{'}(x;y)\] For any \(v\) and \(\lambda>0\), using positive homogeneity, \[\lambda{}f^{'}(x;v)=f^{'}(x;\lambda{}v)\geq{}f^{'}(x;y)+\hat{g}^\top(\lambda{}v-y)\]\[\Leftrightarrow{}\lambda(f^{'}(x;v)-\hat{g}^\top{}v)\geq{}f^{'}(x;y)-\hat{g}^\top{}y\] Letting \(\lambda\rightarrow\infty\) gives \(f^{'}(x;v)\geq\hat{g}^\top{}v\) for all \(v\), hence \[f(x+v)\geq{}f(x)+f^{'}(x;v)\geq{}f(x)+\hat{g}^\top{}v\Rightarrow{}\hat{g}\in\partial{}f(x)\] Letting \(\lambda\rightarrow0\) gives \(f^{'}(x;y)\leq\hat{g}^\top{}y\), so \(f^{'}(x;y)=\hat{g}^\top{}y\leq\sup_{g\in\partial{}f(x)}g^\top{}y\).

Steepest descent direction

Steepest descent direction at \(x\in\text{int dom} f\) is \[\Delta{}x_{nsd}=\arg\min_{||y||_2\leq1}f^{'}(x;y)\] \[\text{(P) }\min_y\ f^{'}(x;y)\quad\text{s.t. }||y||_2\leq1\] \[\text{(D) }\max_g\ {-||g||_2}\quad\text{s.t. }g\in\partial{}f(x)\]
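By the dual problem (D), the steepest descent direction is \(-g^\star/||g^\star||_2\) where \(g^\star\) is the minimum-norm subgradient. A minimal numerical sketch for the illustrative case \(f(x)=||x||_1\) (the example and function names are our own; on coordinates with \(x_i=0\) the minimum-norm choice over \([-1,1]\) is \(0\)):

```python
import numpy as np

def min_norm_subgradient_l1(x):
    # Subdifferential of ||x||_1 is {sign(x_i)} if x_i != 0 and [-1, 1] if x_i == 0,
    # so the minimum-norm element puts 0 on the zero coordinates.
    return np.sign(x)

def steepest_descent_direction_l1(x):
    # Normalized steepest descent direction -g*/||g*||_2 from the dual problem (D).
    g = min_norm_subgradient_l1(x)
    norm = np.linalg.norm(g)
    if norm == 0.0:          # 0 is a subgradient => x is already a minimizer
        return np.zeros_like(x)
    return -g / norm

x = np.array([2.0, 0.0, -1.0])
print(steepest_descent_direction_l1(x))   # approx [-0.707, 0., 0.707]
```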

Proof

if \(f\) is convex, \(f(y)<f(x)\), and \(g\in\partial{}f(x)\), then for small enough \(t>0\) (any \(0<t<2(f(x)-f(y))/||g||_2^2\)): \[||x-tg-y||_2^2=||x-y||_2^2-2tg^\top(x-y)+t^2||g||_2^2\leq||x-y||_2^2-2t(f(x)-f(y))+t^2||g||_2^2<||x-y||_2^2\] where the first inequality uses the subgradient inequality \(g^\top(x-y)\geq{}f(x)-f(y)\). So for \(f^\star=f(x^\star)=\min{}f(x)\) and the iteration \(x_{k+1}=x_k-tg_k,\ g_k\in\partial{}f(x_k)\): \[\text{as long as }f(x_k)>f(x^\star),\ ||x_{k+1}-x^\star||<||x_k-x^\star||\text{ for suitable }t\]

Lemma 1

Let \(x_0\in{}int~dom~f\), then \[f \text{ is convex }\Rightarrow{}f\text{ is continuous at }x_0\]

Proof

\[g(x)=f(x_0+x)-f(x_0), g(0)=0, g\text{ is convex}, 0\in~int~dom~g\] \[\exists\alpha~s.t.~x_0+\alpha{}y_i\in~dom~f\] \[\{y_1,\dots,y_{2n}\}=\{e_1,\dots,e_n,-e_1,\dots,-e_n\}\] ......

Subgradient Method

to minimize a nondifferentiable convex function \(f\): choose \(x^{(0)}\) and repeat \[x^{(k)}=x^{(k-1)}-t_kg^{(k-1)},k=1,2,\dots\] \(g^{(k-1)}\) is any subgradient of \(f\) at \(x^{(k-1)}\)
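A minimal Python sketch of this iteration (our own illustrative code; `subgrad` returns any subgradient of \(f\) at the current point, `step` returns \(t_k\), and the best iterate is tracked since the method is not a descent method):

```python
import numpy as np

def subgradient_method(f, subgrad, x0, step, n_iters=1000):
    # x_k = x_{k-1} - t_k * g_{k-1}, where g_{k-1} is any subgradient at x_{k-1}.
    # Not a descent method, so keep the best point seen so far.
    x = np.asarray(x0, dtype=float)
    f_best, x_best = f(x), x.copy()
    for k in range(1, n_iters + 1):
        g = subgrad(x)
        x = x - step(k, x, g) * g
        fx = f(x)
        if fx < f_best:
            f_best, x_best = fx, x.copy()
    return x_best, f_best
```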

Assumptions

  • \(f\) has finite optimal value \(f^\star\), minimizer \(x^\star\)
  • \(f\) is convex, \(dom~f = R^n\)
  • \(f\) is Lipschitz continuous with constant \(G > 0\)

\[|f(x)-f(y)|\leq{}G||x-y||_2\quad\forall{}x,y\] this is equivalent to \(||g||_2\leq{}G\) for all \(x\) and \(g\in\partial{}f(x)\)
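One direction of this equivalence: if \(||g||_2\leq{}G\) for all subgradients, then for any \(x,y\) and \(g\in\partial{}f(x)\), \[f(x)-f(y)\leq{}g^\top(x-y)\leq||g||_2||x-y||_2\leq{}G||x-y||_2\] and swapping \(x\) and \(y\) gives the other sign. Conversely, if \(f\) is \(G\)-Lipschitz and \(g\in\partial{}f(x)\), taking \(y=x+g/||g||_2\) in the subgradient inequality gives \(||g||_2\leq{}f(y)-f(x)\leq{}G\).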

Analysis

  • the subgradient method is not a descent method
  • the key quantity in the analysis is the distance to the optimal set

with \(x^+=x^{(i)},x=x^{(i-1)},g=g^{(i-1)},t=t_i\): \[||x^+-x^\star||_2^2=||x-tg-x^\star||_2^2\] \[=||x-x^\star||_2^2-2tg^\top(x-x^\star)+t^2||g||_2^2\leq||x-x^\star||_2^2-2t(f(x)-f^\star)+t^2||g||_2^2\] combine inequalities for \(i=1,\dots,k\), and define \(f^{(k)}_{\text{best}}=\min_{0\leq{}i\leq{}k}f(x^{(i)})\) \[(f^{(k)}_{\text{best}}-f^\star)\leq\frac{||x^{(0)}-x^\star||_2^2+\sum_{i=1}^kt_i^2||g^{(i-1)}||_2^2}{2\sum_{i=1}^kt_i}\]
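Explicitly, summing the per-step inequalities telescopes the distance terms: \[2\sum_{i=1}^kt_i\big(f(x^{(i-1)})-f^\star\big)\leq||x^{(0)}-x^\star||_2^2-||x^{(k)}-x^\star||_2^2+\sum_{i=1}^kt_i^2||g^{(i-1)}||_2^2\] bounding each \(f(x^{(i-1)})-f^\star\) from below by \(f^{(k)}_{\text{best}}-f^\star\) and dropping \(-||x^{(k)}-x^\star||_2^2\leq0\) gives the stated bound.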

Fixed step size: \(t_i=t\)

\[f^{(k)}_{\text{best}}-f^\star\leq\frac{||x^{(0)}-x^\star||_2^2}{2kt}+\frac{G^2t}{2}\]
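This follows from the general bound with \(t_i=t\) and \(||g^{(i-1)}||_2\leq{}G\): \[\frac{||x^{(0)}-x^\star||_2^2+kt^2G^2}{2kt}=\frac{||x^{(0)}-x^\star||_2^2}{2kt}+\frac{G^2t}{2}\]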

  • does not guarantee convergence of \(f^{(k)}_{\text{best}}\) to \(f^\star\): as \(k\rightarrow\infty\) the bound only gives \(f^{(k)}_{\text{best}}-f^\star\leq{}G^2t/2\)

Fixed step length: \(t_i=s/||g^{(i-1)}||_2\)

\[f^{(k)}_{\text{best}}-f^\star\leq\frac{G||x^{(0)}-x^\star||_2^2}{2ks}+\frac{Gs}{2}\]

  • does not guarantee convergence of \(f^{(k)}_{\text{best}}\) to \(f^\star\): as \(k\rightarrow\infty\) the bound only gives \(f^{(k)}_{\text{best}}-f^\star\leq{}Gs/2\)

Diminishing step size: \(t_i\rightarrow0\), \(\sum_it_i=\infty\)

  • \(f^{(k)}_{\text{best}}\) converges to \(f^\star\)
  • the rate of convergence is not derived here
  • large \(t_i\) make fast progress at first, but smaller \(t_i\) are needed later

Optimal step size when \(f^\star\) is known

\[t_i=\frac{f(x^{(i-1)})-f^\star}{||g^{(i-1)}||_2^2}\]

  • plugging this \(t_i\) into the distance inequality gives \(||x^{(i)}-x^\star||_2^2\leq||x^{(i-1)}-x^\star||_2^2-\frac{(f(x^{(i-1)})-f^\star)^2}{||g^{(i-1)}||_2^2}\); applying it recursively (with \(||x^{(0)}-x^\star||_2\leq{}R\) and \(||g^{(i)}||_2\leq{}G\)) gives

\[f^{(k)}_{\text{best}}-f^\star\leq\frac{GR}{\sqrt{k}}\]
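A quick numerical check with the subgradient method sketch from above, on the illustrative problem \(f(x)=||x-c||_1\), so \(f^\star=0\) is known (the problem instance and names are our own):

```python
import numpy as np

np.random.seed(0)
c = np.random.randn(20)
f = lambda x: np.linalg.norm(x - c, 1)      # f* = 0, attained at x* = c
subgrad = lambda x: np.sign(x - c)          # a subgradient of f at x
f_star = 0.0

# Polyak step size t_k = (f(x_{k-1}) - f*) / ||g_{k-1}||_2^2
polyak = lambda k, x, g: (f(x) - f_star) / (np.dot(g, g) + 1e-12)

x_best, f_best = subgradient_method(f, subgrad, np.zeros(20), polyak, n_iters=500)
print(f_best)   # decreases on the order of G*R/sqrt(k)
```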