Desirable Properties of Objective Functions
When applying iterative optimization algorithms, one observes that some objective functions are easier to optimize than others. In this section, we discuss some desirable properties of objective functions that make them easier to optimize.
Continuity and Differentiability
A function \(f(x)\) is said to be continuous at \(c\in D\) if the limit \(\lim_{x\to c}f(x)\) exists and equals \(f(c)\). Intuitively, this means that the function does not have any jumps or holes at \(c\).
There are several levels of differentiability. A function \(f(x)\) is said to be differentiable at \(c\in D\) if the limit \(\lim_{x\to c}\frac{f(x)-f(c)}{x-c}\) exists. It is said to be continuously differentiable if its derivative is itself a continuous function. For example, \(f(x)=|x|\) is continuous everywhere but not differentiable at \(x=0\).
A stronger notion than continuity is Lipschitz continuity, which requires that there exists a constant \(L\) such that $$ |f(x)-f(y)|\leq L\|x-y\| $$ for all \(x,y\in D\). This means that the function value cannot change arbitrarily fast.
A desirable property of objective functions is that they are continuous and at least twice differentiable, or, in set notation, \(f\in C^2\). In addition, the gradient should be Lipschitz continuous, i.e. $$ \|\nabla f(x)-\nabla f(y)\|\leq L\|x-y\| $$ for all \(x,y\in D\).
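To make the gradient-Lipschitz condition concrete, here is a minimal numerical sketch. The quadratic \(f(x)=\frac{1}{2}x^{\top}Qx\) and the matrix \(Q\) are illustrative choices made for this sketch (not part of the text above); for this choice the smallest valid constant \(L\) is the spectral norm of \(Q\).

```python
import numpy as np

# Minimal sketch (illustrative example): empirically probe the gradient-Lipschitz
# constant of the quadratic f(x) = 0.5 * x^T Q x, whose gradient is Q x.
rng = np.random.default_rng(0)
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])           # symmetric positive definite (assumed example)

def grad_f(x):
    return Q @ x

L_true = np.linalg.norm(Q, 2)        # spectral norm of Q = smallest valid L here

ratios = []
for _ in range(10_000):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    ratios.append(np.linalg.norm(grad_f(x) - grad_f(y)) / np.linalg.norm(x - y))

print(f"largest observed ratio: {max(ratios):.4f}, spectral norm of Q: {L_true:.4f}")
# The observed ratios never exceed L_true, consistent with
# ||grad f(x) - grad f(y)|| <= L ||x - y||.
```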
For a continuously differentiable function \(f(x)\), the convergence measured by the difference between the function value at the \(k\)-th iterate and the optimal value \(f(x^{*})\) yields the same convergence order as the convergence measured by the difference between the \(k\)-th iterate and the optimal solution \(x^{*}\), i.e. \(|f(x_{k+1})-f(x^{*})|\propto \|x_{k+1}-x^{*}\|\). This is because we can approximate the function value by its first-order Taylor expansion: $$ \begin{align*} |f(x_{k+1}) - f(x^{*})| &= |f(x^{*} + (x_{k+1}-x^{*})) - f(x^{*})| \\ &\approx |f(x^{*}) + \langle \nabla f(x^{*}), (x_{k+1}-x^{*}) \rangle - f(x^{*})| \\ &= |\langle \nabla f(x^{*}), (x_{k+1}-x^{*}) \rangle| \\ &\leq \|\nabla f(x^{*})\|\,\|x_{k+1}-x^{*}\| \\ &\propto \|x_{k+1}-x^{*}\| \end{align*} $$
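The following sketch illustrates this equivalence of convergence orders numerically. The quadratic objective, the matrix \(Q\), the starting point, and the step size are assumptions made for the sketch; gradient descent then drives both gaps to zero geometrically, i.e. with the same order of convergence.

```python
import numpy as np

# Minimal sketch (assumed example): gradient descent on f(x) = 0.5 * x^T Q x,
# whose minimizer is x* = 0, tracking both convergence measures per iteration.
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
x_star = np.zeros(2)

def f(x):
    return 0.5 * x @ Q @ x

def grad_f(x):
    return Q @ x

x = np.array([5.0, -3.0])
step = 0.2                              # < 2 / lambda_max(Q), so the iteration converges
for k in range(10):
    f_gap = abs(f(x) - f(x_star))
    x_gap = np.linalg.norm(x - x_star)
    print(f"k={k:2d}  |f(x_k)-f(x*)| = {f_gap:.3e}   ||x_k-x*|| = {x_gap:.3e}")
    x = x - step * grad_f(x)
```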
Convexity
A continuously differentiable function \(f(x)\) is said to be convex if $$ f(y)\geq f(x) + \langle \nabla f(x), (y-x) \rangle $$ for all \(x,y\in D\). This means that the function value always lies on or above its first-order Taylor approximation.
Note: Convexity can also be defined for functions that are not differentiable.
If the objective function has a Lipschitz continuous gradient in addition to being convex, it can be shown that the function has a quadratic upper bound: $$ f(y)\leq f(x) + \langle \nabla f(x), (y-x) \rangle + \frac{L}{2}\|y - x\|^2 $$ for all \(x,y\in D\). The proof is rather technical and beyond the scope of this lecture.
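Both inequalities can be checked numerically. The sketch below uses the softplus function \(f(x)=\log(1+e^{x})\) as an illustrative example (this choice, and the constant \(L=1/4\), are assumptions of the sketch, not taken from the text): its gradient is the sigmoid, which is Lipschitz with \(L=1/4\), so \(f(y)\) is sandwiched between the first-order lower bound and the quadratic upper bound.

```python
import numpy as np

# Minimal sketch (assumed example): verify the convexity lower bound and the
# quadratic upper bound for the L-smooth convex function f(x) = log(1 + exp(x)).
rng = np.random.default_rng(1)

def f(x):
    return np.log1p(np.exp(x))

def grad_f(x):
    return 1.0 / (1.0 + np.exp(-x))      # sigmoid = derivative of softplus

L = 0.25                                  # Lipschitz constant of the sigmoid

for _ in range(5):
    x, y = rng.uniform(-5, 5, size=2)
    lower = f(x) + grad_f(x) * (y - x)                # first-order (convexity) lower bound
    upper = lower + 0.5 * L * (y - x) ** 2            # quadratic upper bound
    assert lower - 1e-12 <= f(y) <= upper + 1e-12
    print(f"{lower:.4f} <= f(y) = {f(y):.4f} <= {upper:.4f}")
```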
A continuously differentiable function \(f(x)\) is said to be strongly convex if there exists a constant \(\mu > 0\) such that $$ g(x) := f(x) - \frac{\mu}{2}\|x\|^2 $$ is convex. Intuitively, this means that the function \(f(x)\) must be "more convex" than a quadratic function.
Applying the convexity definition to the function \(g(x)\) above, we obtain $$ g(y)\geq g(x) + \langle \nabla g(x), (y-x) \rangle \quad \forall x,y\in D \\ \Rightarrow f(y) \geq f(x) + \langle \nabla f(x), (y-x) \rangle + \frac{\mu}{2}\|y - x\|^2 \quad \forall x,y\in D $$ where the second line follows by substituting \(\nabla g(x) = \nabla f(x) - \mu x\) and collecting the quadratic terms into \(\frac{\mu}{2}\|y-x\|^2\). This means that there is a quadratic lower bound for the function value of \(f(x)\).
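As a quick numerical check of this lower bound, consider the quadratic \(f(x)=\frac{1}{2}x^{\top}Qx\), which is \(\mu\)-strongly convex with \(\mu=\lambda_{\min}(Q)\). The function and the matrix \(Q\) below are assumptions made for this sketch, not part of the text above.

```python
import numpy as np

# Minimal sketch (assumed example): check the strong convexity lower bound
# f(y) >= f(x) + <grad f(x), y - x> + (mu/2) * ||y - x||^2 at random point pairs.
rng = np.random.default_rng(2)
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
mu = np.linalg.eigvalsh(Q).min()          # smallest eigenvalue of Q

def f(x):
    return 0.5 * x @ Q @ x

def grad_f(x):
    return Q @ x

for _ in range(5):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    lower = f(x) + grad_f(x) @ (y - x) + 0.5 * mu * np.linalg.norm(y - x) ** 2
    assert f(y) >= lower - 1e-12
    print(f"f(y) = {f(y):.4f} >= quadratic lower bound = {lower:.4f}")
```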