Note
This page was generated from docs/notebooks/inversion/L_curve.ipynb.
L-curve criterion#
By P.C. Hansen
Diffinition#
Practically, in order to solve inversion problems \(Ax=b\) \((A\in\mathbb{R}^{m\times n}, x\in\mathbb{R}^n, b\in\mathbb{R}^m)\), we need to consider the similar form of least-squeares equations as follows:
\(\lambda\in\mathbb{R}\) is a real regularization parameter that must be chosen by the user. The vector \(b\) is the given data and the vector \(x_0\in\mathbb{R}^n\) is a priori estimate of \(x\) which is set to zero when no a priori information is available. \(||Ax - b||_2\) is the residual term and \(||L(x - x_0)||_2\) is the regularization term, where \(L\in\mathbb{R}^{n \times n}\) is an operator matrix like a laplacian one. \(\lambda\) is the parameter which decides the contribution to these terms. And L-curve criterion is the method to balance between these contribution and optmize the inversion solution.
L-curve is precisely following points curve:
L-curve criterion is based on the fact that the optimal reguralization paramter is acheived when the L-curve point:
lies on this corner.
In order to obtain the strict \(\lambda\) on the L-curve “corner” which is defined by the mathmatical method, we can calculate the L-curve carvature \(\kappa\) given by following equation:
Using the singular value decomposion (SVD) fomula, \(x_\lambda\) is described as follows:
where,
Here, \(r\leq\min(m, n)\) is the numerical rank of \(A\), so each SVD matrix has demensions like \(U\in \mathbb{R}^{m\times r}\), \(\Sigma\in\mathbb{R}^{r\times r}\), \(V\in\mathbb{R}^{n\times r}\), respectively. A priori estimate \(x_0\) is assumed to be 0.
Also \(\rho, \eta,\) and \(\eta^\prime\) is described by using SVD as follows:
As can be seen from the above, \(v_i\) or \(\tilde{v}_i\) are not used in the calculation of \(\rho, \eta,\) and \(\eta^\prime\). They affect only the inversion solution \(x_\lambda\). Therefore, \(\tilde{V}\) is often called the “inverted solution basis” or “reconstruction basis”.
Example ill-posed linear operator equation and applying L-curve criterion into this problem.#
As a famouse ill-posed linear equation, Fredholm integral equation is often used:
Here, we think the following situation as above equation form:
And, the true solution \(x_\text{t}(t)\) is assumed as follows:
let us define these function as follows:
[1]:
import numpy as np
from matplotlib import pyplot as plt
from cherab.phix.inversion import Lcurve
plt.rcParams["figure.dpi"] = 150
def kernel(s: np.ndarray, t: np.ndarray):
"""Kernel of Fredholm integral equation of the first kind."""
u = np.pi * (np.sin(s) + np.sin(t))
if u == 0:
return np.cos(s) + np.cos(t)
else:
return (np.cos(s) + np.cos(t)) * (np.sin(u) / u) ** 2
# true solution
def x_t_func(t):
return 2.0 * np.exp(-6.0 * (t - 0.8) ** 2) + np.exp(-2.0 * (t + 0.5) ** 2)
Let’s descritize the above integral equation. \(s\) and \(t\) are devided to 100 points at even intervals. \(x\) is a 1D vector data \((100, )\) and \(A\) is a \(100\times 100\) matrix. \(A\) is defined using karnel function. When discretizing the integral, the trapezoidal approximation is applied.
[2]:
# set valiables
s = np.linspace(-np.pi * 0.5, np.pi * 0.5, num=100)
t = np.linspace(-np.pi * 0.5, np.pi * 0.5, num=100)
x_t = x_t_func(t)
# Operater matrix: A
A = np.zeros((s.size, t.size))
A = np.array([[kernel(i, j) for j in t] for i in s])
# trapezoidal rule
A[:, 0] *= 0.5
A[:, -1] *= 0.5
A *= t[1] - t[0]
# simpson rule -- option
# A[:, 1:-1:2] *= 4.0
# A[:, 2:-2:2] *= 2.0
# A *= (t[1] - t[0]) / 3.0
print(f"condition number of A is {np.linalg.cond(A)}")
condition number of A is 1.3162238461768632e+19
Then excute singular value decomposition of \(A\)
[3]:
# SVD using the numpy module
u, sigma, vh = np.linalg.svd(A, full_matrices=False)
The measured data \(b\) contain both white noise \(\bar{b}\) and truly converted \(b_0\) from \(Ax_\text{t}\).
Descritized linear equation is as follows:
The noise variance is \(1.0 \times 10^{-4}\).
[4]:
b_0 = A.dot(x_t)
rng = np.random.default_rng()
b_noise = rng.normal(0, 1.0e-4, b_0.size)
b = b_0 + b_noise
In term of regularization, let us think tikhonov regularization, that is, \(L = I\) and \(x_0 = 0\).
And as an optimization method, let us use L-curve method as described above.
Lcurve is defined in cherab.phix.inversion module.
Let us solve the inverse problem.
[6]:
bounds = (-20.0, 2.0) # bounds of log10 of regularization parameter
sol, status = lcurve.solve(bounds=bounds, disp=False)
print(status)
message: ['requested number of basinhopping iterations completed successfully']
success: True
fun: -373.25031303000344
x: [-8.201e+00]
nit: 100
minimization_failures: 0
nfev: 2550
njev: 1275
lowest_optimization_result: message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
success: True
status: 0
fun: -373.25031303000344
x: [-8.201e+00]
nit: 4
jac: [ 6.168e-03]
nfev: 16
njev: 8
hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
Plot measured data w/ or w/o noise#
The noise level is so mute that there is no clear difference between those. However this causes to arise the ill-posed problem due to the kernel function.
[7]:
plt.plot(s, b_0)
plt.plot(s, b)
plt.legend(["w/o noise", "w/ noise"])
plt.xlabel("s")
plt.ylabel("b(s)");
Plot L-curve#
[8]:
lcurve.plot_L_curve(bounds=bounds, n_beta=500, scatter_plot=5, scatter_annotate=True)
[8]:
(<Figure size 960x720 with 1 Axes>,
<AxesSubplot:xlabel='Residual norm', ylabel='Regularization norm'>)
Compare true solution \(x_\text{t}\) with estimated \(x_\lambda\)#
[9]:
lambdas = [1.0e-11, lcurve.lambda_opt, 1.0]
fig, axes = plt.subplots(1, 3, figsize=(10, 3))
fig.tight_layout(pad=-2.0)
labels = [f"$\\lambda =$ {i:.2e}" for i in lambdas]
i = 0
for ax, beta, label in zip(axes, lambdas, labels):
ax.plot(t, x_t, "--", label="$x_{true}$")
ax.plot(t, lcurve.inverted_solution(beta=beta), label="$x_\\lambda$")
ax.set_xlim(t.min(), t.max())
ax.set_ylim(0, x_t.max() * 1.1)
ax.set_xlabel("$t$")
ax.set_title(label)
ax.tick_params(direction="in", labelsize=10, which="both", top=True, right=True)
if i < 1:
ax.legend()
else:
ax.set_yticklabels([])
i += 1
plot L-curve curvature#
[10]:
_, ax = lcurve.plot_curvature(bounds=(-10, -6), n_beta=500)
ax.set_title("$\\lambda_{} = ${:.2e}".format("{opt}", lcurve.lambda_opt));
Plot \(\|Ax_\lambda - b\|\), \(\|x_\lambda\|\), and curvature#
[11]:
fig, ax1 = plt.subplots(dpi=150)
fig.subplots_adjust(right=0.85)
ax2 = ax1.twinx()
ax3 = ax1.twinx()
ax3.spines.right.set_position(("axes", 1.2))
# calculation of the values
lambdas = np.logspace(-15, 0, num=500)
rhos = [lcurve.residual_norm(beta) for beta in lambdas]
etas = [lcurve.regularization_norm(beta) for beta in lambdas]
kappa = [lcurve.curvature(beta) for beta in lambdas]
# plot lines
(p1,) = ax1.loglog(lambdas, rhos, color="C0")
(p2,) = ax2.loglog(lambdas, etas, color="C1")
(p3,) = ax3.semilogx(lambdas, kappa, color="C2")
# set axes properties
ax1.set(xlim=(lambdas[0], lambdas[-1]), xlabel=r"$\lambda$", ylabel=r"$||Ax_\lambda - b||$")
ax2.set(ylabel=r"$||x_\lambda||$")
ax3.set(ylabel="curvature of L-curve")
ax1.yaxis.label.set_color(p1.get_color())
ax2.yaxis.label.set_color(p2.get_color())
ax3.yaxis.label.set_color(p3.get_color())
ax1.tick_params(axis="y", colors=p1.get_color())
ax2.tick_params(axis="y", colors=p2.get_color())
ax3.tick_params(axis="y", colors=p3.get_color())
ax3.spines["left"].set_color(p1.get_color())
ax2.spines["right"].set_color(p2.get_color())
ax3.spines["right"].set_color(p3.get_color())
check the error of solutions#
The relative error is defined as follows:
[12]:
def error(values, true):
return np.linalg.norm(true - values) / np.linalg.norm(true)
# set regularization parameters
lambdas = np.logspace(-10, -6, num=500)
# calculate errors
errors = np.asarray([error(lcurve.inverted_solution(beta), x_t) for beta in lambdas])
lambda_min = lambdas[errors.argmin()]
error_min = errors.min()
error_opt = error(sol, x_t)
# calculate curvatures
kappa = np.asarray([lcurve.curvature(beta) for beta in lambdas])
# create figure
fig, ax1 = plt.subplots(dpi=150)
ax2 = ax1.twinx()
# plot errors and curvatures
(p1,) = ax1.loglog(lambdas, errors, color="C0")
(p2,) = ax2.semilogx(lambdas, kappa, color="C1")
# plot minimum error vertical line and point
ax1.vlines(lambda_min, 0.01, 1, color="r", linestyle="--", linewidth=0.75)
ax1.scatter(lambda_min, error_min, color="r", marker="o", s=10, zorder=2)
ax1.text(
lambda_min,
1.5e-2,
"$\\lambda_{min}$",
color="r",
horizontalalignment="left",
verticalalignment="center",
)
# plot maximum curvature vertical line and point
ax1.vlines(lcurve.lambda_opt, 0.01, 1, color="g", linestyle="--", linewidth=0.75)
ax2.scatter(lcurve.lambda_opt, lcurve.curvature(lcurve.lambda_opt), color="g", marker="o", s=10, zorder=2)
ax1.text(
lcurve.lambda_opt,
1.5e-2,
"$\\lambda_{opt}$",
color="g",
horizontalalignment="left",
verticalalignment="center",
)
# set axes
ax1.set(
xlim=(lambdas[0], lambdas[-1]), ylim=(0.01, 1), xlabel=r"$\lambda$", ylabel="Relative errors"
)
ax2.set(ylabel="curvature of L-curve")
ax1.yaxis.label.set_color(p1.get_color())
ax2.yaxis.label.set_color(p2.get_color())
ax1.tick_params(axis="y", colors=p1.get_color())
ax2.tick_params(axis="y", colors=p2.get_color())
ax2.spines["left"].set_color(p1.get_color())
ax2.spines["right"].set_color(p2.get_color())
ax1.set_title(
"$e(\\lambda_{})$ = {:.2%}, $e(\\lambda_{}) = ${:.2%}".format(
"{opt}", error_opt, "{min}", error_min
)
);
Compare minimum error solution with Lcurve-optimized \(x_\lambda\) one#
[13]:
fig, ax = plt.subplots()
ax.plot(t, x_t, "k--", label="$x_{true}$", linewidth=1.0)
ax.plot(t, lcurve.inverted_solution(lambda_min), label="$\\lambda_{min}$")
ax.plot(t, sol, label="$\\lambda_{opt}$")
ax.set_xlabel("$t$")
ax.set_xlim(t.min(), t.max())
ax.set_ylim(0, x_t.max() * 1.1)
ax.legend()
[13]:
<matplotlib.legend.Legend at 0x7f6ca9645940>
GCV criterion#
Diffinition#
Generalized Cross-Validation’s idea is that the best modell for the measurements is the one that best predicts each measurement as a function of the others. The GCV estimate of \(\lambda\) is chosen as follows:
where
Using SVD components, \(GCV(\lambda)\) is replaced as follows:
[14]:
message: ['requested number of basinhopping iterations completed successfully']
success: True
fun: 2.68607600493509e-09
x: [-2.000e+01]
nit: 100
minimization_failures: 0
nfev: 310
njev: 155
lowest_optimization_result: message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
success: True
status: 0
fun: 2.68607600493509e-09
x: [-2.000e+01]
nit: 1
jac: [ 1.118e-10]
nfev: 4
njev: 2
hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>
Plot error function and GCV#
The results of the GCV optimization shows that GCV does not work for this inversion problem.
[15]:
_, ax = gcv.plot_gcv(bounds=bounds, n_beta=500)
ax.set_title("$\\lambda_{} = ${:.2e}".format("{opt}", gcv.lambda_opt));