
[Notes] Convex Optimization 2015.10.28

2015-11-09 01:19 (260 views)
Proposition: Let $f : \mathbb{R} \to \mathbb{R}$ with $\mathrm{dom}\, f$ convex and $f$ twice differentiable.

Then $f$ is convex if $f''(x) \ge 0$ for all $x \in \mathrm{dom}\, f$.

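Proof: Let $z, x \in \mathrm{dom}\, f$. Then

\begin{align*}
f(z) &= f(x) + \int_x^z f'(t)\,dt \\
&= f(x) + \int_x^z \left( f'(x) + \int_x^t f''(s)\,ds \right) dt \\
&= f(x) + f'(x)(z - x) + \int_x^z \int_x^t f''(s)\,ds\,dt \\
&\ge f(x) + f'(x)(z - x) && \text{(two cases to consider)}
\end{align*}

The two cases are $z \ge x$ and $z < x$. If $z \ge x$, then $t \ge x$ on the range of integration, so the inner integral is nonnegative and so is the double integral. If $z < x$, then $t \le x$, so the inner integral is nonpositive, and integrating it from $x$ down to $z$ flips the sign again, so the double integral is still nonnegative.

QED by the first-order condition for convexity.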
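The inequality at the end of the proof can be spot-checked numerically. The sketch below is only an illustration: the test function $f(x) = e^x$ (for which $f'' = e^x \ge 0$), the helper name `first_order_gap`, and the sample points are assumptions, not part of the notes.

```python
import math

def first_order_gap(f, df, x, z):
    """Return f(z) - [f(x) + f'(x) (z - x)], which is >= 0 when f is convex."""
    return f(z) - (f(x) + df(x) * (z - x))

# Illustrative choice (an assumption): f(x) = exp(x), so f'(x) = exp(x)
# and f''(x) = exp(x) >= 0 everywhere.
f = math.exp
df = math.exp

points = [-2.0, -0.5, 0.0, 1.0, 3.0]
gaps = [first_order_gap(f, df, x, z) for x in points for z in points]

# Every gap should be nonnegative; it is exactly zero when z == x.
print(min(gaps) >= 0.0)
```

A finite check like this does not prove convexity; it only illustrates the first-order inequality that the proof establishes.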

Chain Rule: Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be differentiable at $x \in \mathrm{dom}\, f$,

let $g : \mathbb{R}^m \to \mathbb{R}^k$ be differentiable at $f(x) \in \mathrm{dom}\, g$.

If $h : \mathbb{R}^n \to \mathbb{R}^k$ is defined by $h(y) = g(f(y)) \; \forall y \in \mathbb{R}^n$, then $h$ is differentiable at $x$ and $Dh(x) = Dg(f(x)) \cdot Df(x)$.

($Df$ is an $m \times n$ matrix and $Dg$ is a $k \times m$ matrix, so $Dh$ is $k \times n$.)

This can be written as $h = g \circ f$, $D(g \circ f) = (Dg \circ f) \cdot Df$.

Example: Let $f : \mathbb{R}^m \to \mathbb{R}$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $l(x) = Ax + b$, so that $l : \mathbb{R}^n \to \mathbb{R}^m$ and $Dl(x) = A$.

$$D(f \circ l)(x) = [(Df \circ l) \cdot Dl](x) = Df(Ax + b) \cdot A = \nabla f(Ax + b)^T A$$
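As a sanity check of this formula, the sketch below compares the analytic gradient $A^T \nabla f(Ax + b)$ (the transpose of the row vector above) with central finite differences. Everything concrete here is an assumption chosen for illustration: the test function $f(y) = \sum_i y_i^2$, the matrix `A`, the vector `b` (taken in $\mathbb{R}^m$ so that $Ax + b$ is defined), and the helper names.

```python
def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def affine(A, b, x):
    """l(x) = A x + b."""
    return [yi + bi for yi, bi in zip(matvec(A, x), b)]

def f(y):
    """Illustrative test function f(y) = sum of squares (an assumption)."""
    return sum(yi * yi for yi in y)

def grad_f(y):
    """Gradient of the test function: 2 y."""
    return [2.0 * yi for yi in y]

def grad_composite(A, b, x):
    """Chain rule: gradient of f(Ax + b) is A^T grad_f(Ax + b)."""
    g = grad_f(affine(A, b, x))
    return [sum(A[i][j] * g[i] for i in range(len(A))) for j in range(len(x))]

def numeric_grad(func, x, h=1e-6):
    """Central finite-difference gradient of func at x."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return grad

A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]  # m = 3, n = 2
b = [0.5, -1.0, 2.0]                       # b lives in R^m
x = [0.3, -0.7]

analytic = grad_composite(A, b, x)
numeric = numeric_grad(lambda v: f(affine(A, b, v)), x)
max_err = max(abs(a - c) for a, c in zip(analytic, numeric))
print(max_err < 1e-5)
```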

Example: Let $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R} \to \mathbb{R}$, then

$$D(g \circ f)(x) = Dg(f(x)) \, Df(x) = g'(f(x)) \cdot \nabla f(x)^T$$

Example: Let $f : \mathbb{R}^n \to \mathbb{R}$ and let $g : \mathbb{R} \to \mathbb{R}$ be defined by $g(t) = f(x + tu)$ for some vectors $x, u$.

To compute $g'(t)$, let $h(t) = x + tu$, so $h : \mathbb{R} \to \mathbb{R}^n$, $Dh(t) = u$ and $g = f \circ h$.

So $g'(t) = ((Df \circ h) \cdot Dh)(t) = \nabla f(h(t))^T \cdot Dh(t) = \nabla f(x + tu)^T \cdot u = u^T \nabla f(x + tu)$.

To compute $g''(t)$, apply the chain rule again to $g' = (u^T \nabla f) \circ h$:

$$g''(t) = (D[(u^T \nabla f) \circ h])(t) = ([D(u^T \nabla f) \circ h] \cdot Dh)(t) = (((u^T D \nabla f) \circ h) \cdot u)(t) = u^T \nabla^2 f(h(t)) \, u$$
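The two formulas for $g'(t)$ and $g''(t)$ can be compared against finite differences of $g$ itself. This is a rough sketch: the test function $f(x) = x_0^2 x_1 + e^{x_1}$, its hand-computed gradient and Hessian, and the chosen $x$, $u$, $t$ are all assumptions made for illustration.

```python
import math

def f(p):
    """Illustrative smooth test function (an assumption, not from the notes)."""
    x0, x1 = p
    return x0 * x0 * x1 + math.exp(x1)

def grad_f(p):
    x0, x1 = p
    return [2.0 * x0 * x1, x0 * x0 + math.exp(x1)]

def hess_f(p):
    x0, x1 = p
    return [[2.0 * x1, 2.0 * x0], [2.0 * x0, math.exp(x1)]]

def line_point(x, u, t):
    """h(t) = x + t u."""
    return [xi + t * ui for xi, ui in zip(x, u)]

def g(x, u, t):
    """Restriction of f to the line through x with direction u."""
    return f(line_point(x, u, t))

def g_prime(x, u, t):
    """g'(t) = u^T grad f(x + t u)."""
    grad = grad_f(line_point(x, u, t))
    return sum(ui * gi for ui, gi in zip(u, grad))

def g_second(x, u, t):
    """g''(t) = u^T hess f(x + t u) u."""
    H = hess_f(line_point(x, u, t))
    return sum(u[i] * H[i][j] * u[j] for i in range(2) for j in range(2))

x = [1.0, 0.5]
u = [0.6, -0.8]
t = 0.3
h = 1e-4

# Central differences of g for the first and second derivative.
fd_first = (g(x, u, t + h) - g(x, u, t - h)) / (2.0 * h)
fd_second = (g(x, u, t + h) - 2.0 * g(x, u, t) + g(x, u, t - h)) / (h * h)

err_first = abs(fd_first - g_prime(x, u, t))
err_second = abs(fd_second - g_second(x, u, t))
print(err_first < 1e-6, err_second < 1e-4)
```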

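Corollary: Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable with $\mathrm{dom}\, f$ convex.

Then $f$ is convex if $\nabla^2 f(x) \succeq 0$ for all $x \in \mathrm{dom}\, f$. (Apply the one-dimensional proposition to each restriction $g(t) = f(x + tu)$, for which $g''(t) = u^T \nabla^2 f(x + tu) \, u \ge 0$.)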

Example: “log-sum-exp” $f(x) = \log(e^{x_1} + \cdots + e^{x_n})$, $f : \mathbb{R}^n \to \mathbb{R}$, $\mathrm{dom}\, f = \mathbb{R}^n$

$$\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{bmatrix} = \frac{1}{e^{x_1} + \cdots + e^{x_n}} \cdot \begin{bmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{bmatrix}$$

$$\frac{\partial}{\partial x_i} \frac{1}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot e^{x_i}$$

$$(\nabla^2 f)_{ij} = \frac{\partial}{\partial x_i} \frac{e^{x_j}}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot e^{x_i} e^{x_j} \quad (i \neq j)$$

$$(\nabla^2 f)_{ii} = \frac{\partial}{\partial x_i} \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot (e^{x_i})^2 + \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}$$

Put $z_i = e^{x_i}$, then $e^{x_1} + \cdots + e^{x_n} = \mathbf{1}^T z$, where $\mathbf{1}$ is the all-ones vector.

$$\nabla^2 f = -\left(\frac{1}{\mathbf{1}^T z} \right)^2 z z^T + \frac{1}{\mathbf{1}^T z} \cdot \text{diag}(z) = \frac{1}{\mathbf{1}^T z} \left( \text{diag}(z) - \frac{1}{\mathbf{1}^T z} z z^T \right)$$

Since $\mathbf{1}^T z > 0$, it suffices to show that $v^T (\mathbf{1}^T z \cdot \text{diag}(z) - z z^T) v \ge 0$ for every $v \in \mathbb{R}^n$ (the test vector is written $v$ to avoid a clash with the point $x$):

\begin{align*}
v^T (\mathbf{1}^T z \cdot \text{diag}(z) - z z^T) v \ge 0 &\impliedby \mathbf{1}^T z \cdot \sum_{i = 1}^n v_i^2 z_i - (z^T v)^2 \ge 0 \\
&\impliedby (z^T v)^2 \le \mathbf{1}^T z \cdot \sum_{i = 1}^n v_i^2 z_i = \lVert (\sqrt{z_1}, \cdots, \sqrt{z_n}) \rVert_2^2 \cdot \lVert (v_1 \sqrt{z_1}, \cdots, v_n \sqrt{z_n}) \rVert_2^2
\end{align*}

The last inequality is the Cauchy-Schwarz inequality applied to $a = (\sqrt{z_1}, \cdots, \sqrt{z_n})$ and $b = (v_1 \sqrt{z_1}, \cdots, v_n \sqrt{z_n})$, since $a^T b = z^T v$.
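The positive semidefiniteness of the log-sum-exp Hessian can also be probed numerically. The sketch below builds the Hessian entrywise from the formulas above and evaluates the quadratic form at random points and random test vectors (written `v` here). The sampling ranges, the seed, and the helper names are assumptions, and a random test is evidence rather than a proof.

```python
import math
import random

def lse_hessian(x):
    """Hessian of log-sum-exp: diag(z)/s - z z^T / s^2, with z_i = exp(x_i), s = sum(z)."""
    z = [math.exp(xi) for xi in x]
    s = sum(z)
    n = len(x)
    return [[(z[i] / s if i == j else 0.0) - z[i] * z[j] / (s * s)
             for j in range(n)] for i in range(n)]

def quad_form(H, v):
    """Compute v^T H v."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))

rng = random.Random(0)  # fixed seed so the run is reproducible
smallest = float("inf")
for _ in range(200):
    x = [rng.uniform(-3.0, 3.0) for _ in range(4)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    smallest = min(smallest, quad_form(lse_hessian(x), v))

# Allow a tiny negative tolerance for floating-point rounding.
print(smallest >= -1e-12)
```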

Exercise: Prove that $f(x, y) = y^2 / x$ is convex, $\mathrm{dom}\, f = \mathbb{R}_{++} \times \mathbb{R}$

$$\nabla f = \begin{bmatrix} -\frac{y^2}{x^2} \\ \frac{2y}{x} \end{bmatrix}, \quad \nabla^2 f = \begin{bmatrix} \frac{2y^2}{x^3} & -\frac{2y}{x^2} \\ -\frac{2y}{x^2} & \frac{2}{x} \end{bmatrix} = \frac{1}{x^3} \begin{bmatrix} 2y^2 & -2xy \\ -2xy & 2x^2 \end{bmatrix} = \frac{2}{x^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y & -x \end{bmatrix}$$

Since $x > 0$ on $\mathrm{dom}\, f$, this is a nonnegative multiple of a rank-one matrix $w w^T$, so $\nabla^2 f \succeq 0$ and $f$ is convex by the corollary.
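A short numerical check of this exercise, offered as a sketch only. It compares the Hessian entries $\frac{2y^2}{x^3}$, $-\frac{2y}{x^2}$, $\frac{2}{x}$ with the rank-one form $\frac{2}{x^3} w w^T$, $w = (y, -x)$, and also tests midpoint convexity at random pairs of points. The sampling ranges, the seed, and the helper names are assumptions.

```python
import random

def f(x, y):
    """f(x, y) = y^2 / x on x > 0."""
    return y * y / x

def hessian(x, y):
    """Second partial derivatives of y^2 / x."""
    return [[2.0 * y * y / x**3, -2.0 * y / x**2],
            [-2.0 * y / x**2, 2.0 / x]]

def rank_one(x, y):
    """(2 / x^3) * w w^T with w = (y, -x)."""
    w = [y, -x]
    c = 2.0 / x**3
    return [[c * w[i] * w[j] for j in range(2)] for i in range(2)]

rng = random.Random(1)  # fixed seed so the run is reproducible
max_entry_diff = 0.0
min_midpoint_gap = float("inf")
for _ in range(200):
    x1, y1 = rng.uniform(0.5, 5.0), rng.uniform(-5.0, 5.0)
    x2, y2 = rng.uniform(0.5, 5.0), rng.uniform(-5.0, 5.0)
    H, R = hessian(x1, y1), rank_one(x1, y1)
    diff = max(abs(H[i][j] - R[i][j]) for i in range(2) for j in range(2))
    max_entry_diff = max(max_entry_diff, diff)
    # Midpoint convexity: the average of the values minus the value at the average.
    gap = 0.5 * f(x1, y1) + 0.5 * f(x2, y2) - f(0.5 * (x1 + x2), 0.5 * (y1 + y2))
    min_midpoint_gap = min(min_midpoint_gap, gap)

print(max_entry_diff < 1e-9, min_midpoint_gap >= -1e-9)
```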

Proposition: Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable at $x$. Then

$$f(x + z) = f(x) + \nabla f(x)^T z + \frac{1}{2} z^T \nabla^2 f(x) z + err_x(z)$$

where $\lim_{z \to 0} \frac{\lVert err_x(z) \rVert_2}{\lVert z \rVert_2^2} = 0$.

Equivalent to: $\forall \varepsilon > 0, \, \exists r > 0$ s.t. $(\lVert z \rVert_2 \le r \implies \lVert err_x(z) \rVert_2 \le \varepsilon \cdot \lVert z \rVert_2^2)$

Proof: Let $\varepsilon > 0$. Since $\nabla f$ is differentiable at $x$, $\exists r > 0$ s.t.

$$\nabla f(x + z) = \nabla f(x) + \nabla^2 f(x) z + err_x(z)$$

where $\lVert err_x(z) \rVert_2 \le \varepsilon \cdot \lVert z \rVert_2$ for all $z$ s.t. $\lVert z \rVert_2 \le r$. (In this proof $err_x$ denotes the vector-valued remainder of the gradient expansion, not the scalar remainder in the statement.) That is,

$\lVert \nabla f(x + z) - \nabla f(x) - \nabla^2 f(x) z \rVert_2 \le \varepsilon \lVert z \rVert_2$ for all $z$ s.t. $\lVert z \rVert_2 \le r$

Let $z \ne 0$ s.t. $\lVert z \rVert_2 \le r$ and let $u = z / \lVert z \rVert_2$, $g(t) = f(x + tu)$, $t \in \mathbb{R}$.

Then $g'(t) = \nabla f(x + tu)^T u$

So, for $0 \le t \le r$,

\begin{align*}
f(x + tu) &= f(x) + \int_0^t g'(s)\,ds \\
&= f(x) + \int_0^t u^T \nabla f(x + su)\,ds \\
&= f(x) + u^T \int_0^t (\nabla f(x) + \nabla^2 f(x) su + err_x(su))\,ds \\
&= f(x) + u^T \nabla f(x) (t - 0) + u^T \nabla^2 f(x) u \int_0^t s\,ds + u^T \int_0^t err_x(su)\,ds \\
&\le f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla^2 f(x) (tu) + \lVert u \rVert_2 \int_0^t \lVert err_x(su) \rVert_2\,ds \\
&\le f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla^2 f(x) (tu) + \lVert u \rVert_2 \int_0^t \varepsilon \lVert su \rVert_2\,ds \\
&\le f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla^2 f(x) (tu) + \lVert u \rVert_2 \, \varepsilon \int_0^t t\,ds \\
&= f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla^2 f(x) (tu) + \lVert u \rVert_2 \, \varepsilon t^2 \\
&= f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla^2 f(x) (tu) + \varepsilon t^2 \\
&= f(x) + \nabla f(x)^T z + \frac{1}{2} z^T \nabla^2 f(x) z + \varepsilon \lVert z \rVert_2^2 && (t = \lVert z \rVert_2, \; tu = z)
\end{align*}

The first inequality is Cauchy-Schwarz, the second uses the bound on $err_x$, and the third uses $\lVert su \rVert_2 = s \le t$ because $\lVert u \rVert_2 = 1$. Bounding $u^T \int_0^t err_x(su)\,ds$ from below by $-\lVert u \rVert_2 \int_0^t \lVert err_x(su) \rVert_2\,ds$ gives the matching lower bound, so the scalar remainder in the statement has absolute value at most $\varepsilon \lVert z \rVert_2^2$.
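The statement of the proposition says that the scalar remainder of the second-order expansion vanishes faster than $\lVert z \rVert_2^2$. The sketch below illustrates this by shrinking $z$ along a fixed unit direction and printing the ratio of the remainder to $\lVert z \rVert_2^2$. The test function $f(x) = x_0^2 x_1 + e^{x_1}$, its hand-computed derivatives, the base point, and the direction are assumptions chosen for illustration.

```python
import math

def f(p):
    """Illustrative smooth test function (an assumption, not from the notes)."""
    x0, x1 = p
    return x0 * x0 * x1 + math.exp(x1)

def grad_f(p):
    x0, x1 = p
    return [2.0 * x0 * x1, x0 * x0 + math.exp(x1)]

def hess_f(p):
    x0, x1 = p
    return [[2.0 * x1, 2.0 * x0], [2.0 * x0, math.exp(x1)]]

def taylor_remainder(x, z):
    """f(x + z) minus its second-order Taylor polynomial around x."""
    shifted = [xi + zi for xi, zi in zip(x, z)]
    grad, H = grad_f(x), hess_f(x)
    linear = sum(gi * zi for gi, zi in zip(grad, z))
    quadratic = 0.5 * sum(z[i] * H[i][j] * z[j] for i in range(2) for j in range(2))
    return f(shifted) - (f(x) + linear + quadratic)

x = [1.0, 0.5]
d = [0.6, -0.8]  # unit direction

ratios = []
for s in (1e-1, 1e-2, 1e-3):
    z = [s * di for di in d]
    norm_sq = sum(zi * zi for zi in z)
    ratios.append(abs(taylor_remainder(x, z)) / norm_sq)

# The ratio |err_x(z)| / ||z||^2 should decrease as z shrinks.
print(ratios[0] > ratios[1] > ratios[2])
```

For this particular function the remainder is dominated by a cubic term, so the ratio falls roughly in proportion to $\lVert z \rVert_2$.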