[Notes] Convex Optimization 2015.10.28
2015-11-09 01:19
Proposition: Let $f : \mathbb{R} \to \mathbb{R}$ with $dom \, f$ convex and $f$ twice differentiable.
Then $f$ is convex if $f''(x) \ge 0$ for all $x \in dom \, f$.
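Proof: Let $z, x \in dom \, f$. Using $f'(t) = f'(x) + \int_x^t f''(s)\,ds$, we get
$$\begin{align*}
f(z) =& f(x) + \int _x^z f'(t)dt \\
=& f(x) + \int _x^z \left(f'(x) + \int _x^t f''(s)ds \right)dt \\
=& f(x) + f'(x)(z - x) + \int _x^z \int _x^t f''(s)ds dt \\
\ge & f(x) + f'(x)(z - x) & \text{(two cases to consider)}\\
\end{align*}$$
The double integral is nonnegative in both cases. If $z \ge x$, then $t \ge x$ on the outer range, so the inner integral is $\ge 0$. If $z < x$, then $t \le x$, so $h(t) = \int_x^t f''(s)\,ds \le 0$, and $\int_x^z h(t)\,dt = -\int_z^x h(t)\,dt \ge 0$.
QED by the first-order condition for convexity ($f$ is convex iff $f(z) \ge f(x) + f'(x)(z - x)$ for all $x, z \in dom \, f$).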
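The following is a quick numerical sanity check of the identity used in the proof. It is not a substitute for the proof. The sketch uses a hypothetical test function $f(x) = x^4$, whose second derivative $12x^2$ is nonnegative. It compares $f(z) - f(x) - f'(x)(z - x)$ with a midpoint-rule approximation of the double integral. One of the test pairs has $z < x$, so the second case is covered too. NumPy is assumed to be available, and the helper names are illustrative.

```python
import numpy as np

# Hypothetical test function f(x) = x^4, with f''(x) = 12 x^2 >= 0.
def f(x):
    return x ** 4

def fprime(x):
    return 4 * x ** 3

def fsecond(x):
    return 12 * x ** 2

def first_order_gap(x, z):
    # f(z) - f(x) - f'(x)(z - x); the proof shows this equals the double integral.
    return f(z) - f(x) - fprime(x) * (z - x)

def double_integral(x, z, n=400):
    # Midpoint-rule approximation of int_x^z int_x^t f''(s) ds dt.
    # Step sizes are signed, so the case z < x is handled as well.
    h = (z - x) / n
    total = 0.0
    for i in range(n):
        t = x + (i + 0.5) * h
        k = (t - x) / n                      # signed inner step
        s = x + (np.arange(n) + 0.5) * k     # inner midpoints between x and t
        total += np.sum(fsecond(s)) * k * h
    return total

for x, z in [(1.0, 3.0), (1.0, -2.0), (-0.5, 0.7)]:
    print(x, z, first_order_gap(x, z), double_integral(x, z))
```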
Chain Rule: Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be differentiable at $x \in dom \, f$,
and let $g : \mathbb{R}^m \to \mathbb{R}^k$ be differentiable at $f(x) \in dom \, g$.
If $h : \mathbb{R}^n \to \mathbb{R}^k$ is defined by $h(y) = g(f(y)) \; \forall y \in \mathbb{R}^n$, then $h$ is differentiable at $x$ and $Dh(x) = Dg(f(x)) \cdot Df(x)$
($Df(x)$ is an $m \times n$ matrix, $Dg(f(x))$ is a $k \times m$ matrix).
This can be written as $h = g \circ f$, $D(g \circ f) = (Dg \circ f) \cdot Df$.
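Here is a small sketch that checks the chain rule numerically, assuming NumPy. The maps `f` ($\mathbb{R}^2 \to \mathbb{R}^3$) and `g` ($\mathbb{R}^3 \to \mathbb{R}^2$) are hypothetical examples with hand-computed Jacobians. The sketch compares the product $Dg(f(x)) \cdot Df(x)$ with a central-difference Jacobian of $g \circ f$.

```python
import numpy as np

# Hypothetical maps f : R^2 -> R^3 and g : R^3 -> R^2.
def f(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def Df(x):
    # 3 x 2 Jacobian of f (m x n with m = 3, n = 2).
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0, 2 * x[1]]])

def g(y):
    return np.array([y[0] + y[1] * y[2], np.exp(y[0])])

def Dg(y):
    # 2 x 3 Jacobian of g (k x m with k = 2, m = 3).
    return np.array([[1.0, y[2], y[1]],
                     [np.exp(y[0]), 0.0, 0.0]])

def numerical_jacobian(F, x, eps=1e-6):
    # Central differences, one column per input coordinate.
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((F(x + e) - F(x - e)) / (2 * eps))
    return np.column_stack(cols)

x0 = np.array([0.3, -1.2])
chain = Dg(f(x0)) @ Df(x0)   # D(g o f)(x0) = Dg(f(x0)) . Df(x0), a 2 x 2 matrix
numeric = numerical_jacobian(lambda x: g(f(x)), x0)
print(np.max(np.abs(chain - numeric)))
```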
Example: Let $f : \mathbb{R}^m \to \mathbb{R}$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $l(x) = Ax + b$. Then
$$D(f \circ l)(x) = [(Df \circ l) \cdot Dl](x) = Df(Ax + b) \cdot A = \nabla f(Ax + b)^T A$$
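The affine case can be made concrete with a short sketch, assuming NumPy. It uses a hypothetical $f(y) = \sum_i e^{y_i}$, so that $\nabla f(y) = (e^{y_1}, \dots, e^{y_m})$, together with a random $A$ and $b$. The gradient of $f \circ l$ is the transpose of $\nabla f(Ax + b)^T A$, that is, $A^T \nabla f(Ax + b)$. The sketch checks this against central differences.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)          # b lives in R^m, like Ax

def f(y):
    # Hypothetical test function f : R^m -> R.
    return np.sum(np.exp(y))

def grad_f(y):
    return np.exp(y)

def grad_f_affine(x):
    # Transpose of D(f o l)(x) = grad f(Ax + b)^T A.
    return A.T @ grad_f(A @ x + b)

def numerical_grad(F, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        out[j] = (F(x + e) - F(x - e)) / (2 * eps)
    return out

x0 = np.array([0.5, -0.25])
print(grad_f_affine(x0), numerical_grad(lambda x: f(A @ x + b), x0))
```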
Example: Let $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R} \to \mathbb{R}$, then
$$D(g \circ f)(x) = Dg(f(x))Df(x) = g'(f(x)) \cdot \nabla f(x)^T$$
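The same kind of check works for a scalar outer function, again assuming NumPy. The choices $f(x) = \lVert x \rVert_2^2$ and $g(t) = e^t$ are hypothetical. With them, the formula predicts $\nabla (g \circ f)(x) = e^{\lVert x \rVert_2^2} \cdot 2x$.

```python
import numpy as np

def f(x):
    # Hypothetical inner function f(x) = ||x||_2^2, with grad f(x) = 2x.
    return float(x @ x)

def grad_f(x):
    return 2.0 * x

def g(t):
    # Hypothetical outer function g(t) = e^t, with g'(t) = e^t.
    return np.exp(t)

def g_prime(t):
    return np.exp(t)

def grad_composition(x):
    # Transpose of D(g o f)(x) = g'(f(x)) * grad f(x)^T.
    return g_prime(f(x)) * grad_f(x)

def numerical_grad(F, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        out[j] = (F(x + e) - F(x - e)) / (2 * eps)
    return out

x0 = np.array([0.3, -0.4])
print(grad_composition(x0), numerical_grad(lambda x: g(f(x)), x0))
```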
Example: Let $f : \mathbb{R}^n \to \mathbb{R}$, and let $g : \mathbb{R} \to \mathbb{R}$ be defined by $g(t) = f(x + tu)$ for some vectors $x, u \in \mathbb{R}^n$.
To compute $g'(t)$, let $h(t) = x + tu$, so $h : \mathbb{R} \to \mathbb{R}^n$ and $g = f \circ h$.
So $g'(t) = ((Df \circ h) \cdot Dh)(t) = \nabla f(h(t))^T \cdot Dh(t) = \nabla f(x + tu)^T \cdot u = u^T \nabla f(x + tu)$.
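The following sketch checks $g'(t) = u^T \nabla f(x + tu)$ against a central difference of $g$, assuming NumPy. The test function $f(x) = x_1^2 x_2 + \sin x_2$, the point $x$ and the direction $u$ are hypothetical choices. The code indexes from 0, so `x[0]` and `x[1]` stand for $x_1$ and $x_2$.

```python
import numpy as np

# Hypothetical test function f(x) = x1^2 x2 + sin(x2) and its gradient.
def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

def grad_f(x):
    return np.array([2 * x[0] * x[1], x[0] ** 2 + np.cos(x[1])])

x = np.array([1.0, 0.5])
u = np.array([0.6, -0.8])

def g(t):
    # Restriction of f to the line through x in direction u.
    return f(x + t * u)

def g_prime(t):
    # Chain rule: g'(t) = u^T grad f(x + t u).
    return u @ grad_f(x + t * u)

t0, eps = 0.3, 1e-6
numeric = (g(t0 + eps) - g(t0 - eps)) / (2 * eps)
print(g_prime(t0), numeric)
```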
To compute $g''(t)$, note that $g' = (u^T \nabla f) \circ h$, so
$$g''(t) = (D[(u^T \nabla f) \circ h])(t) = ([D(u^T \nabla f) \circ h] \cdot Dh)(t) = (((u^T D \nabla f) \circ h) \cdot u)(t) = u^T \nabla ^2 f(h(t)) \, u$$
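This sketch uses the hypothetical $f(x) = x_1^2 x_2 + \sin x_2$, whose Hessian is $\begin{bmatrix} 2x_2 & 2x_1 \\ 2x_1 & -\sin x_2 \end{bmatrix}$. It compares $g''(t) = u^T \nabla^2 f(x + tu)\, u$ with a second-order central difference of $g$. NumPy is assumed, and the point and direction are arbitrary.

```python
import numpy as np

# Hypothetical test function f(x) = x1^2 x2 + sin(x2) and its Hessian.
def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

def hess_f(x):
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], -np.sin(x[1])]])

x = np.array([1.0, 0.5])
u = np.array([0.6, -0.8])

def g(t):
    return f(x + t * u)

def g_second(t):
    # g''(t) = u^T hess f(x + t u) u.
    return u @ hess_f(x + t * u) @ u

t0, eps = 0.3, 1e-4
numeric = (g(t0 + eps) - 2 * g(t0) + g(t0 - eps)) / eps ** 2
print(g_second(t0), numeric)
```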
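Corollary: Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable, with $dom \, f$ convex.
Then $f$ is convex if $\nabla ^2 f(x) \succeq 0$ for all $x \in dom \, f$. (Indeed, $f$ is convex iff $g(t) = f(x + tu)$ is convex for every $x \in dom \, f$ and every $u$. By the example above, $g''(t) = u^T \nabla^2 f(x + tu) \, u \ge 0$, so the one-dimensional proposition applies.)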
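The corollary suggests a cheap numerical test: evaluate the Hessian at sampled points and look at its smallest eigenvalue. The sketch below does this, assuming NumPy. It is only a heuristic. A single negative eigenvalue shows that $f$ is not convex, but passing on finitely many samples does not prove $\nabla^2 f \succeq 0$ everywhere. Both test functions are hypothetical. The first, $e^{x_1} + e^{x_2} + (x_1 - x_2)^2$, is convex. The second, $x_1^2 x_2 + \sin x_2$, is not.

```python
import numpy as np

def hessian_psd_on_samples(hess, points, tol=1e-10):
    # Returns False as soon as some sampled Hessian has a clearly negative
    # eigenvalue; True only means no counterexample was found.
    for p in points:
        if np.linalg.eigvalsh(hess(p)).min() < -tol:
            return False
    return True

def hess_convex(x):
    # Hessian of f(x) = exp(x1) + exp(x2) + (x1 - x2)^2.
    return np.diag(np.exp(x)) + np.array([[2.0, -2.0], [-2.0, 2.0]])

def hess_nonconvex(x):
    # Hessian of f(x) = x1^2 x2 + sin(x2).
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], -np.sin(x[1])]])

rng = np.random.default_rng(1)
samples = rng.uniform(-2, 2, size=(200, 2))
print(hessian_psd_on_samples(hess_convex, samples))
print(hessian_psd_on_samples(hess_nonconvex, samples))
```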
Example: “log-sum-exp” $f(x) = \log(e^{x_1} + \cdots + e^{x_n})$, $f : \mathbb{R}^n \to \mathbb{R}$, $dom \, f = \mathbb{R}^n$
$$\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{bmatrix} = \frac{1}{e^{x_1} + \cdots + e^{x_n}} \cdot \begin{bmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{bmatrix}$$
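The gradient above is the vector with entries $e^{x_i} / \sum_j e^{x_j}$. The sketch below computes it, assuming NumPy, and checks it against central differences. Subtracting $\max_i x_i$ before exponentiating is an implementation choice added here (it is not in the notes) to avoid overflow. The shift cancels exactly in both $f$ and $\nabla f$.

```python
import numpy as np

def log_sum_exp(x):
    # f(x) = log(sum_i e^{x_i}), computed as c + log(sum_i e^{x_i - c}) with c = max(x).
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def grad_log_sum_exp(x):
    # grad f(x) = (e^{x_1}, ..., e^{x_n}) / (e^{x_1} + ... + e^{x_n}).
    w = np.exp(x - np.max(x))
    return w / np.sum(w)

x0 = np.array([0.2, -1.0, 3.0])
eps = 1e-6
numeric = np.array([(log_sum_exp(x0 + eps * e) - log_sum_exp(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(grad_log_sum_exp(x0), numeric)
```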
Using
$$\frac{\partial}{\partial x_i} \frac{1}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot e^{x_i},$$
the entries of the Hessian are
$$(\nabla ^2 f)_{ij, \, i \neq j} = \frac{\partial}{\partial x_i} \frac{e^{x_j}}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot e^{x_i} e^{x_j}$$
$$(\nabla ^2 f)_{ii} = \frac{\partial}{\partial x_i} \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot (e^{x_i})^2 + \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}$$
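Put $z_i = e^{x_i}$ and let $\mathbf{1}$ denote the all-ones vector. Then $e^{x_1} + \cdots + e^{x_n} = \mathbf{1}^T z$ and
$$\nabla ^2 f = -\left(\frac{1}{\mathbf{1}^T z} \right)^2 z z^T + \frac{1}{\mathbf{1}^T z} \cdot \text{diag}(z) = \frac{1}{\mathbf{1}^T z} \left( \text{diag}(z) - \frac{1}{\mathbf{1}^T z} z z^T \right) = \frac{1}{(\mathbf{1}^T z)^2} \left( \mathbf{1}^T z \cdot \text{diag}(z) - z z^T \right)$$
To show $\nabla^2 f \succeq 0$, it suffices to show the following for every $v \in \mathbb{R}^n$ (we write $v$ for the test vector to avoid a clash with the point $x$):
$$\begin{align*}
v^T (\mathbf{1}^T z \cdot \text{diag}(z) - z z^T)v \ge 0 \impliedby & \mathbf{1}^T z \cdot \sum_{i = 1}^n v_i^2 \cdot z_i - (z^T v)^2 \ge 0 \\
\impliedby & (z^T v)^2 \le \mathbf{1}^T z \cdot \sum_{i = 1}^n v_i^2 z_i = \lVert (\sqrt{z_1}, \cdots, \sqrt{z_n}) \rVert _2^2 \cdot \lVert (v_1 \sqrt{z_1}, \cdots, v_n \sqrt{z_n}) \rVert _2^2 \\
\end{align*}$$
The last inequality is the Cauchy-Schwarz inequality, since $z^T v = \sum_{i=1}^n \sqrt{z_i} \cdot (v_i \sqrt{z_i})$ is the inner product of these two vectors. Hence $\nabla^2 f \succeq 0$, and $f$ is convex.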
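The sketch below numerically checks the Hessian formula and the quadratic-form inequality, assuming NumPy. Here $\mathbf{1}$ is the all-ones vector. Rescaling $z$ by a positive constant leaves $\frac{1}{\mathbf{1}^T z}\left(\text{diag}(z) - \frac{1}{\mathbf{1}^T z} z z^T\right)$ unchanged, so the sketch uses max-shifted exponentials for the Hessian. The test point and the random vectors are arbitrary.

```python
import numpy as np

def hess_log_sum_exp(x):
    # With z_i = e^{x_i}: hess f = (diag(z) - z z^T / (1^T z)) / (1^T z).
    # Scaling z by a positive constant does not change this matrix.
    z = np.exp(x - np.max(x))
    s = np.sum(z)
    return (np.diag(z) - np.outer(z, z) / s) / s

def grad_log_sum_exp(x):
    w = np.exp(x - np.max(x))
    return w / np.sum(w)

x0 = np.array([0.2, -1.0, 3.0])
H = hess_log_sum_exp(x0)

# Finite-difference Hessian: column j is the derivative of the gradient along e_j.
eps = 1e-6
H_numeric = np.column_stack([(grad_log_sum_exp(x0 + eps * e) - grad_log_sum_exp(x0 - eps * e)) / (2 * eps)
                             for e in np.eye(3)])

# Quadratic form v^T (1^T z diag(z) - z z^T) v for random v (nonnegative by Cauchy-Schwarz).
rng = np.random.default_rng(2)
z = np.exp(x0)
M = np.sum(z) * np.diag(z) - np.outer(z, z)
forms = [v @ M @ v for v in rng.standard_normal((100, 3))]
print(np.linalg.eigvalsh(H), min(forms))
```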
Exercise: Prove that $f(x, y) = y^2 / x$ is convex, $dom \, f = \mathbb{R}_{++} \times \mathbb{R}$.
$$\nabla f = \begin{bmatrix} -\frac{y^2}{x^2} \\ \frac{2y}{x} \end{bmatrix}, \quad \nabla ^2 f = \begin{bmatrix} \frac{2y^2}{x^3} & -\frac{2y}{x^2} \\ -\frac{2y}{x^2} & \frac{2}{x} \end{bmatrix} = \frac{1}{x^3} \begin{bmatrix} 2y^2 & -2xy \\ -2xy & 2x^2 \end{bmatrix} = \frac{2}{x^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y & -x \end{bmatrix}$$
Since $x > 0$ on $dom \, f$, we have $\frac{2}{x^3} > 0$. Also, $w w^T \succeq 0$ for $w = (y, -x)^T$, so $\nabla^2 f \succeq 0$. Since $dom \, f$ is convex, the corollary gives that $f$ is convex.
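The following sketch assumes NumPy. It confirms the Hessian entries and the rank-one factorization at a few arbitrary points of $dom \, f$. It also runs a finite-difference check that $\partial^2 f / \partial x^2 = 2y^2/x^3$ is positive; this value equals $8$ at $(x, y) = (1, 2)$.

```python
import numpy as np

def f(x, y):
    return y ** 2 / x

def hess_quad_over_lin(x, y):
    # Hessian of f(x, y) = y^2 / x on x > 0, entry by entry.
    return np.array([[2 * y ** 2 / x ** 3, -2 * y / x ** 2],
                     [-2 * y / x ** 2, 2 / x]])

def hess_factored(x, y):
    # The same matrix written as (2 / x^3) * w w^T with w = (y, -x).
    w = np.array([y, -x])
    return (2 / x ** 3) * np.outer(w, w)

pts = [(0.5, -1.0), (1.0, 2.0), (3.0, 0.25)]

# Second central difference in x at (1, 2); should be close to 2 * 2^2 / 1^3 = 8.
eps = 1e-4
x0, y0 = 1.0, 2.0
fxx_numeric = (f(x0 + eps, y0) - 2 * f(x0, y0) + f(x0 - eps, y0)) / eps ** 2
print(fxx_numeric)
```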
Proposition: Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable at $x$. Then
$$f(x + z) = f(x) + \nabla f(x)^T z + \frac{1}{2} z^T \nabla ^2 f(x) z + err_x(z)$$
where $\lim_{z \to 0} \frac{\lVert err_x(z) \rVert _2}{\lVert z \rVert _2^2} = 0$.
Equivalently: $\forall \varepsilon \gt 0, \, \exists r \gt 0, \, s.t. \, (\lVert z \rVert _2 \le r \implies \lVert err_x(z) \rVert _2 \le \varepsilon \cdot \lVert z \rVert _2^2)$.
Proof: Let $\varepsilon \gt 0$. Since $\nabla f$ is differentiable at $x$ with derivative $\nabla^2 f(x)$, there exists $r \gt 0$ s.t.
$$\nabla f(x + z) = \nabla f(x) + \nabla ^2 f(x) z + err_x(z)$$
Here, in this proof only, $err_x(z) \in \mathbb{R}^n$ denotes the error of the first-order expansion of $\nabla f$. It satisfies $\lVert err_x(z) \rVert _2 \le \varepsilon \cdot \lVert z \rVert _2$ for all $z$ s.t. $\lVert z \rVert _2 \le r$, that is,
$$\lVert \nabla f(x + z) - \nabla f(x) - \nabla ^2 f(x) z \rVert _2 \le \varepsilon \lVert z \rVert _2 \quad \text{for all } z \text{ s.t. } \lVert z \rVert _2 \le r.$$
Let $z \neq 0$ s.t. $\lVert z \rVert _2 \le r$. Let $u = z / \lVert z \rVert _2$, so that $\lVert u \rVert _2 = 1$, and let $g(t) = f(x + tu)$ for $t \in \mathbb{R}$.
Then $g'(t) = \nabla f(x + tu)^T u$.
So, for $0 \le t \le r$ (so that $\lVert su \rVert _2 = s \le r$ for all $s \in [0, t]$),
$$\begin{align*}
f(x + tu) = & f(x) + \int _0^t g'(s)ds \\
=& f(x) + \int _0^t u^T \nabla f(x + su)ds \\
=& f(x) + u^T \int _0^t (\nabla f(x) + \nabla ^2 f(x)su + err_x(su))ds \\
=& f(x) + u^T \nabla f(x) (t - 0) + u^T \nabla ^2 f(x) u \int _0^t s ds + u^T \int _0^t err_x(su)ds \\
\le & f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \lVert u \rVert _2 \int _0^t \lVert err_x(su) \rVert _2 ds \\
\le & f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \lVert u \rVert _2 \int _0^t \varepsilon \lVert su \rVert _2 ds \\
\le & f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \lVert u \rVert _2 \varepsilon \int _0^t t ds \\
=& f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \lVert u \rVert _2 \varepsilon t^2 \\
=& f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \varepsilon t^2 \\
=& f(x) + \nabla f(x)^T z + \frac{1}{2} z^T \nabla ^2 f(x) z + \varepsilon \lVert z \rVert _2^2 & (t = \lVert z \rVert _2, \; tu = z) \\
\end{align*}$$
The first inequality uses Cauchy-Schwarz, $u^T w \le \lVert u \rVert _2 \lVert w \rVert _2$. The third uses $\lVert su \rVert _2 = s \le t$. Bounding $u^T \int_0^t err_x(su)\,ds$ from below by $-\lVert u \rVert _2 \int_0^t \lVert err_x(su) \rVert _2\,ds$ in the same way gives the matching lower bound. Hence $\lvert f(x + z) - f(x) - \nabla f(x)^T z - \frac{1}{2} z^T \nabla ^2 f(x) z \rvert \le \varepsilon \lVert z \rVert _2^2$ whenever $\lVert z \rVert _2 \le r$, which is the claim.
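Here is a numerical illustration of the proposition, assuming NumPy and a hypothetical test function $f(x) = x_1^2 x_2 + \sin x_2$. The ratio $\lvert err_x(z) \rvert / \lVert z \rVert_2^2$ should shrink as $z \to 0$ along a fixed unit direction. For this smooth $f$, the leading remainder term is cubic, so the ratio shrinks roughly linearly in $\lVert z \rVert_2$.

```python
import numpy as np

# Hypothetical test function f(x) = x1^2 x2 + sin(x2) with gradient and Hessian.
def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

def grad_f(x):
    return np.array([2 * x[0] * x[1], x[0] ** 2 + np.cos(x[1])])

def hess_f(x):
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], -np.sin(x[1])]])

def taylor_err(x, z):
    # err_x(z) = f(x + z) - f(x) - grad f(x)^T z - (1/2) z^T hess f(x) z.
    return f(x + z) - f(x) - grad_f(x) @ z - 0.5 * z @ hess_f(x) @ z

x = np.array([1.0, 0.5])
d = np.array([0.6, -0.8])            # unit direction, so ||s * d||_2 = s
scales = [1e-1, 1e-2, 1e-3]
ratios = [abs(taylor_err(x, s * d)) / s ** 2 for s in scales]
print(ratios)
```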