Research Report
A New Choice Rule for Regularization Parameters in Tikhonov Regularization
by Kazufumi Ito, Bangti Jin, Jun Zou
CUHK-2008-07 (362), September 2008
Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong
Fax: (852) 2603 5154. Email: dept@math.cuhk.edu.hk. URL: http://www.cuhk.edu.hk
A New Choice Rule for Regularization Parameters in Tikhonov Regularization

Kazufumi Ito∗  Bangti Jin†  Jun Zou‡

August 26, 2008
Abstract. This paper proposes and analyzes a novel rule for choosing the regularization parameters in Tikhonov regularization for inverse problems, which does not necessarily require knowledge of the exact noise level. The new choice rule is derived by drawing ideas from Bayesian statistical analysis. The existence of solutions to the regularization parameter equation is shown, and some variational characterizations of the feasible parameters are provided. With such feasible regularization parameters, we establish a posteriori error estimates of the approximate solutions to the inverse problem under consideration. An iterative algorithm is suggested for the efficient numerical realization of the choice rule, and it is shown to have a practically desirable monotonic convergence. Numerical experiments for both mildly and severely ill-posed benchmark inverse problems, with various regularizing functionals of Tikhonov type, e.g. L²-L², L²-L¹ and L¹-TV, are presented and demonstrate the effectiveness and robustness of the new choice rule.

Key Words: regularization parameter, a posteriori error estimate, Tikhonov regularization, inverse problem
1 Introduction

Inverse problems arise in real-world applications whenever one attempts to infer physical laws or parameters from imprecise and indirect observational data. This work considers linear inverse problems of the general form

    Kx = y^δ,    (1)

where x ∈ X and y^δ ∈ Y refer to the unknown parameters and the observational data, respectively. The spaces X and Y are Banach spaces, with respective norms ‖·‖_X and ‖·‖_Y, and X is reflexive. The forward operator K : dom(K) ⊂ X → Y is linear and bounded. System (1) can represent a wide variety of inverse problems arising in diverse industrial and engineering applications, e.g. computerized tomography [23], parameter identification [4], image processing [10] and inverse heat transfer [6]. The observational data y^δ is a noisy version of the exact data y = Kx⁺, and its noise level is often measured by the upper bound σ₀ in the following inequality

    φ(x⁺, y^δ) ≤ σ₀²,    (2)

where the functional φ(x, y^δ) : X × Y → R⁺ measures the proximity of the model output Kx to the data y^δ. We use the notation σ₀² in (2) in place of the more commonly used δ² to maintain its clear statistical interpretation as the variance of the data noise.
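The statistical reading of (2) can be checked numerically. The sketch below is ours, not the paper's: it takes φ, for illustration only, as the mean squared misfit, and shows that for additive i.i.d. Gaussian noise the misfit of the exact solution concentrates near the variance σ₀².

```python
import numpy as np

# Illustration only (our normalization, not the paper's): with additive
# i.i.d. Gaussian noise of variance sigma0^2 and phi taken as the MEAN
# squared misfit, phi(x+, y^delta) concentrates near sigma0^2 for large n.
rng = np.random.default_rng(0)
n, sigma0 = 100_000, 0.1
y_exact = np.ones(n)                       # stands in for K x+
y_delta = y_exact + sigma0 * rng.standard_normal(n)

phi = np.mean((y_exact - y_delta) ** 2)    # mean squared misfit
# phi is close to sigma0^2 = 0.01, up to statistical fluctuation
```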
∗ Center for Research in Scientific Computation & Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA. (kito@math.ncsu.edu)
† Universität Bremen, FB 3 Mathematik und Informatik, Zentrum für Technomathematik, Postfach 330 440, 28344 Bremen, Germany. (kimbtsing@yahoo.com.cn)
‡ Department of Mathematics, The Chinese University of Hong Kong, Shatin N.T., Hong Kong, P.R. China. The work of this author was substantially supported by Hong Kong RGC grants (Project 404105 and Project 404606). (zou@math.cuhk.edu.hk)
Inverse problems are generally ill-posed in the sense of Hadamard, i.e. a solution may not exist or may be nonunique, and, more severely, a small perturbation in the data may cause an enormous deviation of the solution. Therefore, the mathematical analysis and numerical solution of inverse problems are very challenging. The standard procedure for numerically treating inverse problems is regularization, which goes back to Tikhonov's inaugural work [28]. Regularization techniques mitigate the ill-posedness by incorporating a priori information about the solution, e.g. boundedness, smoothness and positivity [28, 11]. The celebrated Tikhonov regularization transforms the solution of system (1) into the minimization of the Tikhonov functional J_η defined by

    J_η(x) = φ(x, y^δ) + η ψ(x),    (3)

and takes its minimizer x_η as an approximate solution, where η is often called the regularization parameter, balancing the data-fitting term φ(x, y^δ) against the a priori information encoded in the regularization term ψ(x). Some commonly used data fidelity functionals include ‖Kx − y^δ‖²_{L²} [28], ‖Kx − y^δ‖_{L¹} [24] and ∫(Kx − y^δ log Kx) [2], and regularization functionals include ‖x‖^ν_{Lν} [24], ‖x‖²_{Hm} [28] and |x|_{TV} [10]. Traditionally, Tikhonov regularization considers only L² data fitting in conjunction with L² or H^m regularization, referred to as the L²-L² functional hereafter, which statistically corresponds to additive Gaussian noise and a smoothness prior, respectively. However, other nonconventional and nonstandard functionals have also received considerable recent attention, e.g. statistically motivated data fitting and feature-promoting (e.g. edge, sparsity and texture) regularization.
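For the classical L²-L² case, the minimizer of (3) is available in closed form via the normal equations. The sketch below (function name ours) illustrates this special case only; the paper treats general φ and ψ.

```python
import numpy as np

# Closed-form minimizer of the L2-L2 Tikhonov functional
#   J_eta(x) = ||Kx - y||_2^2 + eta ||x||_2^2,
# namely x_eta = (K^T K + eta I)^(-1) K^T y. A sketch for this special
# case only; the paper's setting allows general phi and psi.
def tikhonov_l2l2(K, y, eta):
    m = K.shape[1]
    return np.linalg.solve(K.T @ K + eta * np.eye(m), K.T @ y)

# tiny usage example on a mildly ill-conditioned matrix
K = np.array([[1.0, 0.0], [0.0, 1e-3]])
y = np.array([1.0, 1.0])
x_eta = tikhonov_l2l2(K, y, 1e-4)   # regularization damps the small singular value
```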
The regularization parameter η determines the tradeoff between data fidelity and a priori information, and it plays an indispensable role in designing a stable inverse reconstruction process and obtaining a practically acceptable inverse solution. To be more precise, the inverse solution is overwhelmed by the prior knowledge if η is too large, which often leads to undesirable effects, e.g. over-smoothing; conversely, it may be unstable and plagued with spurious and nonphysical details if η is too small. Therefore, its selection constitutes one of the major inconveniences and difficulties in applying existing regularization techniques, and is crucial to the success of a regularization method. A number of choice rules have been proposed in the literature, e.g. the discrepancy principle [21, 25, 32], the unbiased predictive risk estimator (UPRE) method [30], the quasi-optimality criterion [29], generalized cross-validation (GCV) [13], and the L-curve criterion [15]. The discrepancy principle is mathematically rigorous; however, it requires an accurate estimate of the exact noise level σ₀, and an inaccurate estimate can severely deteriorate the inverse solution [11, 21]. The UPRE method was originally developed for model selection in linear regression, and was later adapted for choosing the regularization parameter [30]. However, its application requires an estimate of the data noise like the discrepancy principle, and the minimization of the UPRE curve is tricky since the curve may be very flat over a broad scale [30], which is also the case for the GCV curve [16]. The latter three rules do not require any a priori knowledge of the noise level σ₀, and thus are fully data-driven. These methods have been very popular in the engineering community since their inception and have delivered satisfactory performance for numerous practical inverse problems [16]. However, these methods are heuristic in nature, and cannot be analyzed in the framework of deterministic inverse theory [11]. Nonetheless, their mathematical underpinnings might be laid down in the context of statistical inverse theory, e.g. the semidiscrete semistochastic linear data model [30], though such analysis is seldom carried out for general regularization formulations.
Another principled framework for selecting the regularization parameter is Bayesian inference [7, 12]. Thompson and Kay [27] and Archer and Titterington [1] investigated this framework in the context of image restoration, and proposed and numerically evaluated several choice rules by considering various point estimates, e.g. the maximum likelihood estimate and the maximum a posteriori estimate, of the posterior probability density function and their approximations. However, these were application-oriented papers comparing different methods, with neither mathematical analysis nor algorithmic description. Motivated by the hierarchical modeling of the Bayesian paradigm [12], the authors [19] recently proposed an augmented Tikhonov functional which determines the regularization parameter and the noise level along with the inverse solution for finite-dimensional linear inverse problems.
In this paper, we investigate Tikhonov regularization in a general setting, with a general data-fitting term φ(x, y^δ) and regularization term ψ(x) in (3), and propose a new choice rule for finding a reasonable regularization parameter η. The derivation of the parameter choice rule from the point of view of hierarchical Bayesian inference is detailed in Section 2. As we will see, the new rule preserves an important advantage of some existing heuristic rules in that it does not require knowledge of the noise level either. But for this new rule, solid theoretical justifications can be developed; in particular, a posteriori error estimates are established. In addition, an iterative algorithm of monotone type is developed for an efficient realization of the choice rule in practice, and it enjoys fast and steady convergence.
The newly proposed choice rule has several further distinctions in comparison with existing heuristic choice rules. Various nonconvergence results have been established for the L-curve criterion [14, 30]: the variation of the regularization parameter is unduly large in the case of low noise levels, and the existence of a corner is not ensured. The theoretical understanding of the quasi-optimality criterion is very limited despite its popularity [3]. The GCV enjoys solid statistical justifications [31, 11]; however, the existence of a minimum is not guaranteed. Moreover, in the L-curve criterion, numerically locating the corner from discrete sampling points is highly nontrivial. The GCV curve is often very flat and numerically difficult to minimize, and it sometimes requires tight bounds on the regularization parameter in order to work robustly. For functionals other than the L²-L² type, all three existing methods require computing the inverse solution at many discrete sampling points, and are thus computationally very expensive. The newly proposed choice rule essentially eliminates these computational inconveniences through an efficient and monotonically convergent iterative algorithm, while at the same time it can be justified mathematically, as is done in Sections 3 and 4. Moreover, the new choice rule applies straightforwardly to Tikhonov regularization of very general type, e.g. L¹-TV, whereas other rules have been numerically validated and theoretically studied mostly for functionals of L²-L² type.
We conclude this section with a general remark on heuristic choice rules. A well-known theorem of Bakushinskii [11] states that no deterministic convergence theory can exist for choice rules that do not make use of the exact noise level. In particular, the inverse solution does not necessarily converge to the exact solution as the noise level diminishes to zero. Therefore, we reiterate that no choice rule, in particular a heuristic one, for choosing the regularization parameter in ill-posed problems should be considered a “black-box routine”. One can always construct examples where heuristic choice rules perform poorly.
The rest of the paper is structured as follows. In Section 2, we derive the new choice rule within the Bayesian paradigm. Section 3 shows the existence of solutions to the regularization parameter equation, and derives some a posteriori error estimates. Section 4 proposes an iterative algorithm for efficient numerical computation, and establishes the monotone convergence of the algorithm. Section 5 presents numerical results for several benchmark linear inverse problems to illustrate relevant features of the proposed method. We conclude and indicate directions of future research in Section 6.
2 Derivation of the new choice rule

In this section, we motivate our new deterministic choice rule by drawing some ideas from nondeterministic Bayesian inference [12, 19], which has been used for a different purpose in the statistical community. The choice rule will then be rigorously analyzed and justified in the framework of deterministic inverse theory in the subsequent sections.

For ease of exposition, we derive the new choice rule by considering the following finite-dimensional linear inverse problem

    Kx = y^δ,    (4)

with K ∈ R^{n×m}, x ∈ R^m and y^δ ∈ R^n. One principled approach to providing solutions to this problem is Bayesian inference [12, 19]. The cornerstone of Bayesian inference is Bayes' rule

    p(x|y^δ) ∝ p(y^δ|x) p(x),

where the density functions p(x) and p(y^δ|x) are known as the prior and the likelihood function, and reflect the prior knowledge and the contribution of the data, respectively.
Here we have dropped the normalizing constant since it plays only an immaterial role in our subsequent developments. There are thus two building blocks in Bayesian inference, i.e. p(y^δ|x) and p(x), that are to be modeled. Assume that additive i.i.d. Gaussian random variables with mean zero and variance σ² account for the measurement errors contaminating the exact data; then the likelihood function p(y^δ|x, τ), with τ = 1/σ², is given by

    p(y^δ|x, τ) ∝ τ^{n/2} exp(−(τ/2)‖Kx − y^δ‖²₂).

Bayesian inference encodes the a priori information about the unknown x, available before collecting the data, in the prior density function p(x|λ), and this is often achieved with the help of the versatile tool of Markov random fields, which in its simplest form can be written as

    p(x|λ) ∝ λ^{m/2} exp(−(λ/2)‖Lx‖²₂),

where the matrix L ∈ R^{p×m} encapsulates the structure of interactions between neighboring sites, and typically corresponds to some discretized differential operator. The scale parameter λ dictates the strength of the interaction. Unfortunately, the scale parameter λ and the inverse variance τ are often nontrivial to assign and calibrate despite their critical role in the statistical modeling. The Bayesian paradigm resolves this difficulty flexibly through hierarchical modeling. The underlying idea is to regard them as unknowns and to let the data determine these parameters. More precisely, they are also modeled as random variables, each with its own prior. We follow the standard statistical practice of adopting conjugate priors for both λ and τ [12], i.e.

    p(λ) ∝ λ^{α₀−1} e^{−β₀λ}  and  p(τ) ∝ τ^{α₁−1} e^{−β₁τ},

where (α₀, β₀) and (α₁, β₁) are the parameter pairs for the prior distributions of λ and τ, respectively. By combining these densities via Bayes' rule, we arrive at the complete Bayesian solution, i.e. the posterior probability density function (PPDF) p(x, λ, τ|y^δ), to the inverse problem (4):

    p(x, λ, τ|y^δ) ∝ p(y^δ|x, τ) · p(x|λ) · p(λ) · p(τ)
                  ∝ τ^{n/2} exp(−(τ/2)‖Kx − y^δ‖²₂) · λ^{m/2} exp(−(λ/2)‖Lx‖²₂) · λ^{α₀−1} e^{−β₀λ} · τ^{α₁−1} e^{−β₁τ}.
The PPDF encapsulates complete information about the unknown x and the parameters λ and τ. The maximum a posteriori estimate remains the most popular Bayesian estimate, and it selects (x, λ, τ)_map as the most probable triple given the observational data y^δ. More precisely, it proceeds as follows:

    (x, λ, τ)_map = arg max_{(x,λ,τ)} p(x, λ, τ|y^δ) = arg min_{(x,λ,τ)} J(x, λ, τ),

where the functional J(x, λ, τ) is defined by

    J(x, λ, τ) = (τ/2)‖Kx − y^δ‖²₂ + (λ/2)‖Lx‖²₂ + β₀λ − (m/2 + α₀ − 1) ln λ + β₁τ − (n/2 + α₁ − 1) ln τ.
Slightly abusing the notations α₀, β₀, α₁ and β₁, the formal limit as m, n → ∞ suggests a new functional of continuous form

    J(x, λ, τ) = (τ/2)‖Kx − y^δ‖²_{L²} + (λ/2)‖Lx‖²_{L²} + β₀λ − (1/2 + α₀) ln λ + β₁τ − (1/2 + α₁) ln τ,

where the operators K and L are continuous analogues of the matrices K and L, respectively. Upon letting α₀′ = 1/2 + α₀ and α₁′ = 1/2 + α₁, we arrive at

    J(x, λ, τ) = (τ/2)‖Kx − y^δ‖² + (λ/2)‖Lx‖² + β₀λ − α₀′ ln λ + β₁τ − α₁′ ln τ.
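The finite-dimensional MAP objective can be evaluated directly; the sketch below is our illustration (with the exact coefficients m/2 + α₀ − 1 and n/2 + α₁ − 1, before the notational abuse above), not the paper's implementation.

```python
import numpy as np

# Negative log-posterior J(x, lam, tau) of the hierarchical model in its
# finite-dimensional form, as derived above. Purely illustrative.
def map_objective(x, lam, tau, K, L, y, alpha0, beta0, alpha1, beta1):
    n, m = K.shape
    fit = 0.5 * tau * np.sum((K @ x - y) ** 2)        # (tau/2)||Kx - y||^2
    reg = 0.5 * lam * np.sum((L @ x) ** 2)            # (lam/2)||Lx||^2
    lam_terms = beta0 * lam - (m / 2 + alpha0 - 1) * np.log(lam)
    tau_terms = beta1 * tau - (n / 2 + alpha1 - 1) * np.log(tau)
    return fit + reg + lam_terms + tau_terms
```

Minimizing this jointly over (x, λ, τ) is exactly the MAP problem stated above.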
This naturally motivates the following generalized Tikhonov (g-Tikhonov for short) functional

    J(x, λ, τ) = τ φ(x, y^δ) + λ ψ(x) + β₀λ − α₀′ ln λ + β₁τ − α₁′ ln τ,    (5)

defined for (x, λ, τ) ∈ X × R⁺ × R⁺. This extends the Tikhonov functional (3), but it will never be utilized to solve the inverse problem (1). The functional J(x, λ, τ) is introduced only in the hope of helping to construct an adaptive algorithm for selecting a reasonable regularization parameter η in (3), which remains our solver of interest for (1).

We now derive the algorithm for adaptively updating the parameter η by making use of the optimality system of the functional J(x, λ, τ), and by detecting the noise level σ₀ in a fully data-driven manner. As we will see, the parameter η and the noise level σ(η) are connected with the parameters λ and τ in (5) by the relations η := λτ⁻¹ and σ²(η) = τ⁻¹. Noting that we are considering a general setting, the functional might be nonsmooth and nonconvex, so we resort to optimality in a generalized sense.
Definition 2.1 An element (x*, λ*, τ*) ∈ X × R⁺ × R⁺ is called a critical point of the functional (5) if it satisfies the following generalized optimality system:

    x* = arg min_{x∈X} { φ(x, y^δ) + λ*(τ*)⁻¹ ψ(x) },
    ψ(x*) + β₀ − α₀′/λ* = 0,    (6)
    φ(x*, y^δ) + β₁ − α₁′/τ* = 0.
Note that the solution x* coincides with the Tikhonov solution x_{η*} of (3) with η* := λ*/τ*. Numerical experiments indicate that the estimate σ²(η*) = 1/τ* = (φ(x_{η*}, y^δ) + β₁)/α₁′ represents an excellent approximation to the exact variance σ₀² for the choice α₁ = β₁ ≈ 0, like the highly applauded GCV estimate of the variance [31].
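The structure of the optimality system (6) suggests a simple alternating iteration: minimize over x with the current η = λ/τ, then update λ and τ from the two scalar equations. The sketch below specializes, for illustration only, to φ = ½‖Kx − y^δ‖² and ψ = ½‖x‖²; it conveys the structure of (6), not the algorithm developed in Section 4.

```python
import numpy as np

# Alternating iteration for the critical-point system (6), specialized to
# phi = 1/2 ||Kx - y||^2 and psi = 1/2 ||x||^2 (our illustrative choice).
def alternating_critical_point(K, y, alpha0p, beta0, alpha1p, beta1,
                               eta=1.0, iters=20):
    m = K.shape[1]
    for _ in range(iters):
        # x-step: minimize phi + eta * psi (closed form for this phi, psi)
        x = np.linalg.solve(K.T @ K + eta * np.eye(m), K.T @ y)
        phi = 0.5 * np.sum((K @ x - y) ** 2)
        psi = 0.5 * np.sum(x ** 2)
        lam = alpha0p / (psi + beta0)   # from psi(x*) + beta0 - alpha0'/lam = 0
        tau = alpha1p / (phi + beta1)   # from phi(x*) + beta1 - alpha1'/tau = 0
        eta = lam / tau                 # eta* = lam*/tau*
    return x, eta, 1.0 / tau           # 1/tau* estimates the noise variance
```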
From the optimality system (6), the automatically determined regularization parameter η* satisfies

    η* := λ* · (τ*)⁻¹ = (α₀′/α₁′) · (φ(x_{η*}, y^δ) + β₁)/(ψ(x_{η*}) + β₀).    (7)

Under the premise that the estimate σ²(η*) accurately approximates the exact variance σ₀², the defining relation (7) implies that the regularization parameter η* satisfies the inequality η* ≤ (α₀′/β₀) σ₀². Experimentally, we have also observed that the value of the scale parameter λ* := α₀′/(ψ(x_{η*}) + β₀) is almost independent of the noise level σ₀ for fixed α₀′ and β₀, and thus, empirically speaking, η* is of the order O(σ₀²). However, deterministic inverse theory [11] requires a regularization parameter choice rule η̃(σ₀) satisfying

    lim_{σ₀→0} η̃(σ₀) = 0  and  lim_{σ₀→0} σ₀²/η̃(σ₀) = 0    (8)

in order to yield a valid regularizing scheme, i.e. the inverse solution converges to the exact one as the noise level σ₀ diminishes to zero. Therefore, the g-Tikhonov method (5) is bound to under-regularize the inverse problem (1) in the case of low noise levels, i.e. the regularization parameter η* is too small. Numerical findings also corroborate this assertion, evidenced by under-regularization in the case of low noise levels. One promising approach to remedy the difficulty is to rescale α₀′ as σ₀^{−d} with 0 < d < 2 as σ₀ tends to zero, in order to ensure the consistency conditions dictated by equation (8). Therefore, it seems natural to adaptively update α₀′ using the automatically determined noise level σ²(η*).
Our new choice rule derives from equation (7) and the preceding arguments. Abusing the notation α₀′ by identifying α₀′ with its rescaling by the automatically determined σ²(η*), it consists of choosing the regularization parameter η* by the rule

    η* = α₀′/(ψ(x_{η*}) + β₀) · (φ(x_{η*}, y^δ)/α₁′)^{−d} · φ(x_{η*}, y^δ)/α₁′
       = α φ(x_{η*}, y^δ)^{1−d}/(ψ(x_{η*}) + β₀),    0 < d < 1,

where α = α₀′/(α₁′)^{1−d} is a constant. Here we have dropped the constant β₁, since it has only a marginal practical impact on the solution procedure as long as its value is small. The rationale for invoking the exponent (1 − d) is to adaptively update the parameter α₀′ using the automatically detected noise level so that α₀′ ~ O(σ₀^{−2d}) as the noise level σ₀ decreases to zero, in the hope of verifying the consistency conditions dictated in equation (8). This choice rule is plausible provided that the estimate σ²(η*) agrees reasonably well with the exact noise level σ₀². In brief, we have arrived at the desired choice rule, which selects the regularization parameter according to the following nonlinear equation in η, with 0 < d < 1:

    η (ψ(x_η) + β₀) = α φ(x_η, y^δ)^{1−d},    (9)

for which we shall propose an effective iterative algorithm that converges monotonically.
We emphasize that the newly proposed choice rule (9) can also be regarded as an adaptive strategy for updating the parameter α₀′ in the g-Tikhonov functional. The specialization of the choice rule (9) to the L²-L² functional might also be used as a systematic strategy for adapting the parameter ν of the fixed point algorithm proposed in [5], which numerically implements a local minimum criterion [26, 11] for choosing the regularization parameter.
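Equation (9) is a scalar root-finding problem in η once x_η can be computed. The sketch below (ours) solves it by log-scale bisection for the L²-L² functional, assuming the root is bracketed by the endpoints; the paper's Section 4 instead develops a monotonically convergent fixed-point algorithm.

```python
import numpy as np

# Hedged sketch: solve the parameter equation (9),
#   eta * (psi(x_eta) + beta0) = alpha * phi(x_eta, y)^(1 - d),
# by log-scale bisection, with phi = ||Kx - y||_2^2 and psi = ||x||_2^2.
# Illustration only; not the monotone algorithm of Section 4.
def solve_choice_rule(K, y, alpha, beta0, d, lo=1e-12, hi=1e4, iters=100):
    m = K.shape[1]

    def residual(eta):
        x = np.linalg.solve(K.T @ K + eta * np.eye(m), K.T @ y)
        phi = np.sum((K @ x - y) ** 2)
        psi = np.sum(x ** 2)
        return eta * (psi + beta0) - alpha * phi ** (1 - d)

    # assumes residual(lo) < 0 < residual(hi), i.e. a root is bracketed
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)
```

Note that (9) may admit several solutions; bisection merely returns one root inside the initial bracket, which is part of why the analysis of Section 3 and the monotone iteration of Section 4 are needed.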
Selection of the parameters β_0, α and d in (9). Before proceeding to the analysis of the new choice rule based on (9), we give some practical guidelines on choosing the parameters β_0, α and d. The parameter β_0 plays only an insignificant role as long as its value is kept sufficiently small, so that it is dominated by the term ψ(x_η). In practice, we have observed that the numerical results are virtually identical for β_0 varying over a wide range, e.g. [1×10^{-10}, 1×10^{-3}]. Numerical experiments indicate that for the finite-dimensional inverse problem (4), small values of α'_0 and α'_1 work well for inverse problems with 5% relative noise in the data. Therefore, a value of α'_0 ≈ m/2 and α'_1 ≈ n/2 suffices in this case, which consequently indicates that α'_0 = 1/2 and α'_1 = 1/2 should suffice for its continuous analog if m ≈ n, i.e. the ratio α'_0/α'_1 should maintain order 1 in equation (7). With these experimental observations in mind, the constant α in (9) should be of order one, but scaled appropriately by the magnitude of the data to account for the rescaling φ(x_{η^*}, y^δ)^{-d}. This can be roughly achieved by rescaling its value by max_i |y_i|^{2d} and max_i |y_i|^d in the case of L^2 and L^1 data-fitting, respectively. The optimal value of the exponent d depends on the source condition verified by the exact solution x^+ (see Theorems 3.4 and 3.5), and typically we choose its value in the range [1/3, 1/2]. These guidelines on the selection of the parameters β_0, α and d are simple and easy to realize in numerical implementations, and have worked very well for all five benchmark problems, ranging from mildly to severely ill-posed inverse problems; see Section 5 for details.
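As a concrete illustration of these guidelines, the following Python sketch packages the rescaling of α by the data magnitude. The function name, the base value alpha_base = 0.1 and the toy data vector are our own illustrative choices, not prescriptions from the text; only the rescaling by max_i |y_i|^{2d} (L^2 fitting) or max_i |y_i|^d (L^1 fitting) and the ranges for β_0 and d come from the guidelines above.

```python
import numpy as np

def choose_parameters(y, fitting="L2", d=1.0 / 3.0, beta0=1e-4, alpha_base=0.1):
    """Return (alpha, beta0, d) following the guidelines above.

    alpha_base and the function itself are illustrative assumptions; only the
    data-magnitude rescaling of alpha is taken from the text.
    """
    ymax = np.max(np.abs(y))
    if fitting == "L2":
        alpha = alpha_base * ymax ** (2 * d)   # L2 data-fitting rescaling
    else:
        alpha = alpha_base * ymax ** d         # L1 data-fitting rescaling
    return alpha, beta0, d

y = np.array([0.5, -2.0, 1.5])                 # toy data vector
alpha, beta0, d = choose_parameters(y)
```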
3 Existence and error estimates

This section shows the existence of solutions to the regularization parameter equation (9), and derives a posteriori error estimates of the Tikhonov solution x_{η^*} in (3). We make the following assumptions on the functionals φ(x, y^δ) and ψ(x).
Assumption 3.1 Assume that the nonnegative functionals φ(x, y^δ) and ψ(x) satisfy:

(a) For any η > 0, the functional J_η(x) defined in (3) is coercive on X, i.e. J_η(x) → +∞ as ‖x‖_X → ∞.

(b) The functionals φ(x, y^δ) and ψ(x) are weakly lower semi-continuous, i.e.

    φ(x, y^δ) ≤ lim inf_{n→∞} φ(x_n, y^δ)  and  ψ(x) ≤ lim inf_{n→∞} ψ(x_n)

for any sequence {x_n}_n ⊂ X converging weakly to x.

(c) There exists an x̃ such that ψ(x̃) = 0.
Assumptions 3.1(a) and (b) are standard for ensuring the existence of a minimizer to the Tikhonov functional J_η [11].

Note that the minimizers x_η of the Tikhonov functional J_η in (3) might be nonunique; thus the functions φ(x_η, y^δ) and ψ(x_η) might be multi-valued. We will need the next lemma on the monotonicity of these functions with respect to η.
Lemma 3.1 Let x_{η_1} and x_{η_2} be minimizers of the Tikhonov functional J_η(x) in (3) with the regularization parameters η_1 and η_2, respectively. Then we have

    (ψ(x_{η_1}) − ψ(x_{η_2}))(η_1 − η_2) ≤ 0,
    (φ(x_{η_1}, y^δ) − φ(x_{η_2}, y^δ))(η_1 − η_2) ≥ 0.

Proof. By the minimizing properties of x_{η_1} and x_{η_2}, we have

    φ(x_{η_1}, y^δ) + η_1 ψ(x_{η_1}) ≤ φ(x_{η_2}, y^δ) + η_1 ψ(x_{η_2}),
    φ(x_{η_2}, y^δ) + η_2 ψ(x_{η_2}) ≤ φ(x_{η_1}, y^δ) + η_2 ψ(x_{η_1}).

Adding these two inequalities gives the first assertion, and the second can be derived analogously. □
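Lemma 3.1 can be observed numerically in the L^2-L^2 case, where the minimizer has the closed form x_η = (K^T K + ηI)^{-1} K^T y^δ. The toy problem below is our own illustration, not one of the paper's benchmark examples.

```python
import numpy as np

# Toy L2-L2 problem: phi(x, y) = ||Kx - y||_2^2 and psi(x) = ||x||_2^2, for
# which the Tikhonov minimizer has the closed form
# x_eta = (K^T K + eta I)^{-1} K^T y.
rng = np.random.default_rng(0)
K = rng.standard_normal((20, 10))
y = rng.standard_normal(20)

def tikhonov(eta):
    return np.linalg.solve(K.T @ K + eta * np.eye(10), K.T @ y)

etas = np.logspace(-4, 2, 30)
phi = [float(np.linalg.norm(K @ tikhonov(e) - y) ** 2) for e in etas]  # data fit
psi = [float(np.linalg.norm(tikhonov(e)) ** 2) for e in etas]          # penalty

# Lemma 3.1: phi(x_eta, y) is non-decreasing and psi(x_eta) is non-increasing.
assert all(a <= b + 1e-12 for a, b in zip(phi, phi[1:]))
assert all(a >= b - 1e-12 for a, b in zip(psi, psi[1:]))
```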
The minimizer x_η of the Tikhonov functional J_η in (3) is a nonlinear function of the regularization parameter η; therefore equation (9) is a nonlinear equation in η. Assisted by Lemma 3.1, we are now ready to give an existence result for equation (9).

Theorem 3.1 Assume that the functions φ(x_η, y^δ) and ψ(x_η) are continuous with respect to η. Then there exists at least one positive solution to equation (9) if lim_{η→0^+} φ(x_η, y^δ) > 0.
Proof. Define

    f(η) = η(ψ(x_η) + β_0) − α φ(x_η, y^δ)^{1−d},  (10)

then the nonnegativity of the functionals φ(x, y^δ) and ψ(x) and Lemma 3.1 imply that

    f(η) ≥ β_0 η − α φ(x_∞, y^δ)^{1−d},

from which we derive that

    lim_{η→∞} f(η) = +∞.

Note that by Lemma 3.1, φ(x_η, y^δ) is monotonically decreasing with the decrease of η, and by Assumption 3.1 it is bounded from below. Therefore, the following limiting process makes sense, and

    lim_{η→0^+} f(η) = lim_{η→0^+} −α φ(x_η, y^δ)^{1−d} < 0,

by the assumption that lim_{η→0^+} φ(x_η, y^δ) is positive. By the continuity of the functions φ(x_η, y^δ) and ψ(x_η) with respect to η, we conclude that there exists at least one positive solution to equation (9). □
Remark 3.1 The existence of a solution follows also from the convergence of the fixed point algorithm; see Theorem 4.1. Also, the existence of a positive solution can be ensured for a relaxation of equation (9):

    η(ψ(x_η) + β_0) = α (φ(x_η, y^δ) + β_1)^{1−d},   0 < d < 1,

where β_1 acts as a relaxation parameter and is usually taken to be much smaller than the magnitude of φ(x_η, y^δ).
Remark 3.2 The proposed choice rule (9) also generalizes the zero-crossing method for the L^2-L^2 functional, which seeks the solution of the nonlinear equation

    −φ(x_{η^*}, y^δ) + η^* ψ(x_{η^*}) = 0.

It is obtained by setting d = 0 and α = 1 in equation (9). The zero-crossing method is popular in the biomedical engineering community; for some analysis of the method, we refer to [20].
Theorem 3.1 relies crucially on the continuity of the functions φ(x_η, y^δ) and ψ(x_η) with respect to the regularization parameter η. Lemma 3.1 indicates that the functions are monotone, and thus differentiable almost everywhere. The following theorem gives one sufficient condition for the continuity.

Theorem 3.2 Suppose that the functional J_η has a unique minimizer for every η > 0. Then the functions φ(x_η, y^δ) and ψ(x_η) are continuous with respect to η.
Proof. Fix η^* > 0 and let x_{η^*} be the unique minimizer of J_{η^*}. Let {η_j}_j ⊂ R^+ converge to η^*, and consider the sequence of minimizers {x_{η_j}}_j. Observe that, since ψ(x̃) = 0 by Assumption 3.1(c),

    φ(x_{η_j}, y^δ) + η_j ψ(x_{η_j}) ≤ φ(x̃, y^δ) + η_j ψ(x̃) = φ(x̃, y^δ).

This implies that the sequences {φ(x_{η_j}, y^δ)}_j and {ψ(x_{η_j})}_j are uniformly bounded. By Assumption 3.1(a), the sequence {x_{η_j}}_j is uniformly bounded. Therefore, there exists a subsequence of {x_{η_j}}_j, also denoted by {x_{η_j}}_j, such that x_{η_j} → x^* weakly. By Assumption 3.1(b) on the weak lower semi-continuity of the functionals, we have

    φ(x^*, y^δ) ≤ lim inf_{j→∞} φ(x_{η_j}, y^δ)  and  ψ(x^*) ≤ lim inf_{j→∞} ψ(x_{η_j}).  (11)

Hence, we arrive at

    J_{η^*}(x^*) = φ(x^*, y^δ) + η^* ψ(x^*) ≤ lim inf_{j→∞} φ(x_{η_j}, y^δ) + lim inf_{j→∞} η_j ψ(x_{η_j}) ≤ lim inf_{j→∞} J_{η_j}(x_{η_j}).

Next we show that J_{η^*}(x_{η^*}) ≥ lim sup_{j→∞} J_{η_j}(x_{η_j}). To see this,

    lim sup_{j→∞} J_{η_j}(x_{η_j}) ≤ lim sup_{j→∞} J_{η_j}(x_{η^*}) = lim_{j→∞} J_{η_j}(x_{η^*}) = J_{η^*}(x_{η^*}),

by the fact that x_{η_j} is a minimizer of J_{η_j}. Consequently,

    lim sup_{j→∞} J_{η_j}(x_{η_j}) ≤ J_{η^*}(x_{η^*}) ≤ J_{η^*}(x^*) ≤ lim inf_{j→∞} J_{η_j}(x_{η_j}).  (12)

We thus see that x^* is a minimizer of J_{η^*}, and by the uniqueness of the minimizer of J_{η^*} we deduce that x^* = x_{η^*}, and the whole sequence {x_{η_j}}_j converges weakly to x_{η^*}. Consequently, the function J_η(x_η) is continuous with respect to η. Next we show that ψ(x_{η_j}) → ψ(x_{η^*}), for which it suffices to show that

    lim sup_{j→∞} ψ(x_{η_j}) ≤ ψ(x_{η^*}).

Assume that this does not hold. Then there exists a constant c such that c := lim sup_{j→∞} ψ(x_{η_j}) > ψ(x_{η^*}), and there exists a subsequence of {x_{η_j}}_j, denoted by {x_n}_n, such that

    x_n → x_{η^*} weakly,  and  ψ(x_n) → c.

As a consequence of (12), we have

    lim_{n→∞} φ(x_n, y^δ) = φ(x_{η^*}, y^δ) + η^* ψ(x_{η^*}) − lim_{n→∞} η_n ψ(x_n)
                          = φ(x_{η^*}, y^δ) + η^* (ψ(x_{η^*}) − c) < φ(x_{η^*}, y^δ).

This is in contradiction with (11). Therefore, we have lim sup_{j→∞} ψ(x_{η_j}) ≤ ψ(x_{η^*}), and the function ψ(x_η) is continuous with respect to η. The continuity of φ(x_η, y^δ) follows from the continuity of the functions J_η(x_η) and ψ(x_η). □
The following corollary is a direct consequence of Theorem 3.2, and is also of independent interest.

Corollary 3.1 The functional value J_η(x_η) is always continuous with respect to η. The multi-valued functions φ(x_η, y^δ) and ψ(x_η) share the same discontinuity set, which is at most of countable cardinality.

Proof. The continuity of the functional follows directly from the proof of Theorem 3.2, and it consequently implies that φ(x_η, y^δ) and ψ(x_η) share the same discontinuity set. The fact that the discontinuity set is countable follows from the monotonicity of φ(x_η, y^δ) and ψ(x_η); see Lemma 3.1. □
Remark 3.3 The continuity result also holds for non-reflexive spaces, e.g. the BV space. For a proof for the L^2-TV formulation, we refer to reference [9]. However, the uniqueness assumption is necessary in general, and one counterexample is the L^1-TV formulation [9]. Theorem 3.2 remains valid in the presence of a convex constraint set C or nonlinear operators K.
We are now going to establish some variational characterization of the regularization parameter choice rule (9). For this, we introduce a functional G by

    G(x) = { (1/d) φ(x, y^δ)^d + α ln(β_0 + ψ(x)),   0 < d < 1,
           { ln φ(x, y^δ) + α ln(β_0 + ψ(x)),        d = 0.

Clearly, the existence of a minimizer to the functional G follows directly from Assumption 3.1, as it is bounded from below, coercive and weakly lower semi-continuous. Under the premise that the functionals φ(x, y^δ) and ψ(x) are differentiable, a critical point x^* of the functional G satisfies

    φ(x^*, y^δ)^{d−1} φ'(x^*, y^δ) + (α/(ψ(x^*) + β_0)) ψ'(x^*) = 0.

Setting η^* = α φ(x^*, y^δ)^{1−d}/(ψ(x^*) + β_0) gives

    φ'(x^*, y^δ) + η^* ψ'(x^*) = 0,

i.e. x^* is a critical point of the functional J_{η^*}. Furthermore, if the functional J_{η^*} is convex, then x^* is also a minimizer of the functional J_{η^*} and x^* = x_{η^*}. The next theorem summarizes this observation.
Theorem 3.3 If the functional J_η is convex and the functionals φ(x, y^δ) and ψ(x) are differentiable, then a solution x_{η^*} computed by the choice rule (9) is a critical point of the functional G.

Theorem 3.3 and the existence of a minimizer to the functional G ensure the existence of a solution to the strategy (9). The functional G provides a variational characterization of the regularization parameter choice rule (9), while the strategy (9) implements the functional G via the optimality condition. There might exist better strategies for numerically implementing the functional G; however, this is beyond the scope of the present study.
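This variational characterization can be observed numerically. The following Python sketch uses a toy L^2-L^2 setup of our own (closed-form minimizers x_η, arbitrary parameter values): it evaluates G(x_η) along a grid of η and checks that the grid minimizer of η ↦ G(x_η) sits exactly where the residual f(η) = η(ψ(x_η) + β_0) − α φ(x_η, y^δ)^{1−d} of equation (9) changes sign.

```python
import numpy as np

# Toy L2-L2 problem: phi(x, y) = ||Kx - y||^2, psi(x) = ||x||^2, for which the
# Tikhonov minimizer x_eta is available in closed form. All values below are
# illustrative choices of ours.
rng = np.random.default_rng(1)
K = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
alpha, beta0, d = 1.0, 1e-4, 1.0 / 3.0

def x_of(eta):
    return np.linalg.solve(K.T @ K + eta * np.eye(10), K.T @ y)

def phi(x):
    return float(np.linalg.norm(K @ x - y) ** 2)

def psi(x):
    return float(np.linalg.norm(x) ** 2)

etas = np.logspace(-6, 7, 600)
G = [phi(x_of(e)) ** d / d + alpha * np.log(beta0 + psi(x_of(e))) for e in etas]
# Residual of the parameter equation (9): f(eta) changes sign at a solution.
f = [e * (psi(x_of(e)) + beta0) - alpha * phi(x_of(e)) ** (1 - d) for e in etas]

i = int(np.argmin(G))
# The grid minimizer of eta -> G(x_eta) brackets a sign change of f.
assert 0 < i < len(etas) - 1 and f[i - 1] < 0 < f[i + 1]
```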
To offer further theoretical justification of the choice rule (9), we will derive a posteriori error estimates, i.e. bounds on the error between the inverse solution x_{η^*} and the exact solution x^+ to (1). We will consider functionals of L^2-ψ type with ψ being convex, and discuss the two cases ψ(x) = ‖x‖_2^2 and ψ(x) being a general convex function separately, due to the inherent differences between them.
We first specialize to Tikhonov regularization in Hilbert spaces [11], with the norm denoted by ‖·‖. Let x_{η^*} be a solution of the Tikhonov functional in (3) with φ(x, y^δ) = ‖Kx − y^δ‖_2^2, ψ(x) = ‖x‖_2^2 and with η^* chosen by equation (9). To this end, we adopt the general framework of reference [11]. Let g_η(t) = 1/(η + t) and r_η(t) = 1 − t g_η(t) = η/(η + t), then define G(η) by G(η) := sup{|g_η(t)| : t ∈ [0, ‖K‖^2]} = 1/η, and let ω_μ : (0, ‖K‖^2) → R be such that for all γ ∈ (0, γ_0) and t ∈ [0, ‖K‖^2], t^μ |r_γ(t)| ≤ ω_μ(γ) holds. Then for 0 < μ ≤ 1, we have ω_μ(η) = η^μ. Moreover, define the source sets X_{μ,ρ} by X_{μ,ρ} := {x ∈ X : x = (K^*K)^μ w, ‖w‖ ≤ ρ}. With these preliminaries, we are ready to state one of our main results on a posteriori error estimates.
Theorem 3.4 Let x^+ be the minimum-norm solution to Kx = y, and assume that x^+ ∈ X_{μ,ρ} for some 0 < μ ≤ 1. Let δ_* := ‖y^δ − Kx_{η^*}‖ and d = 2μ/(2μ + 1). Then we have

    ‖x^+ − x_{η^*}‖ ≤ c ( ρ^{1/(2μ+1)} + (δ/δ_*) √((ψ(x_{η^*}) + β_0)/α) ) max{δ, δ_*}^{2μ/(2μ+1)}.  (13)
Proof. We decompose the error x^+ − x_η into

    x^+ − x_η = r_η(K^*K) x^+ + g_η(K^*K) K^* (y − y^δ).

Introducing the source representer w with x^+ = (K^*K)^μ w, the interpolation inequality gives

    ‖r_η(K^*K) x^+‖ = ‖r_η(K^*K)(K^*K)^μ w‖
                    ≤ ‖(K^*K)^{1/2+μ} r_η(K^*K) w‖^{2μ/(2μ+1)} ‖r_η(K^*K) w‖^{1/(2μ+1)}
                    = ‖r_η(KK^*) Kx^+‖^{2μ/(2μ+1)} ‖r_η(K^*K) w‖^{1/(2μ+1)}
                    ≤ c (‖r_η(KK^*) y^δ‖ + ‖r_η(KK^*)(y^δ − y)‖)^{2μ/(2μ+1)} ‖w‖^{1/(2μ+1)},

where the constant c depends only on the maximum of r_η over [0, ‖K‖^2]. By noting the relation

    r_{η^*}(KK^*) y^δ = y^δ − Kx_{η^*},

we obtain

    ‖r_{η^*}(K^*K) x^+‖ ≤ c (δ_* + cδ)^{2μ/(2μ+1)} ρ^{1/(2μ+1)} ≤ c_1 max{δ, δ_*}^{2μ/(2μ+1)} ρ^{1/(2μ+1)}.
It remains to estimate the term ‖g_{η^*}(K^*K) K^* (y^δ − y)‖. The standard estimate (see Theorem 4.2 of [11]) yields

    ‖g_{η^*}(K^*K) K^* (y^δ − y)‖ ≤ c δ/√η^*.

However, by equation (9), we have

    1/√η^* = (δ_*^d/δ_*) √((ψ(x_{η^*}) + β_0)/α).

Therefore, we derive that

    ‖g_{η^*}(K^*K) K^* (y^δ − y)‖ ≤ c (δ/δ_*) √((ψ(x_{η^*}) + β_0)/α) δ_*^d ≤ c (δ/δ_*) √((ψ(x_{η^*}) + β_0)/α) max{δ, δ_*}^d.

Combining these two estimates and taking into account that d = 2μ/(2μ + 1), we arrive at the desired a posteriori error estimate. □
Remark 3.4 The error bound (13) states that the approximation obtained from the proposed rule is order-optimal provided that δ_* is of the order of δ. However, to this end, the exponent d must be chosen according to the sourcewise parameter μ. The knowledge of δ_* enables an a posteriori check: if δ_* ≪ δ, then one should be cautious about the chosen parameter, since the prefactor δ/δ_* is very large; if δ_* ≫ δ, the situation is not critical and the magnitude of δ_* essentially determines the error. Numerically, the prefactor λ^* = (ψ(x_{η^*}) + β_0)/α remains almost constant as the noise level σ_0^2 varies.
Next we consider functionals of the type L^2-ψ with ψ(x) being convex. The convergence rate analysis for inverse problems in Banach spaces is fundamentally different from that in Hilbert spaces [8]. We will use an interesting new distance function, the generalized Bregman distance (cf. [8]), to measure the a posteriori error. To this end, we need the concept of a ψ-minimizing solution.

Definition 3.1 An element x^+ ∈ X is called a ψ-minimizing solution of (1) if Kx^+ = y and

    ψ(x^+) ≤ ψ(x)   ∀x ∈ X such that Kx = y.
Let us denote the subdifferential of ψ(x) at x^+ by ∂ψ(x^+), i.e.

    ∂ψ(x^+) = {q ∈ X^* : ψ(x) ≥ ψ(x^+) + ⟨q, x − x^+⟩, ∀x ∈ X},

and define the generalized Bregman distance D_ψ(x, x^+) by

    D_ψ(x, x^+) := { ψ(x) − ψ(x^+) − ⟨q, x − x^+⟩ : q ∈ ∂ψ(x^+) }.

One can verify that if ψ(x) = ‖x‖^2, then the generalized Bregman distance d(x_{η^*}, x^+) reduces to the familiar formula

    d(x_{η^*}, x^+) = ‖x_{η^*} − x^+‖^2.
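A quick numerical check of this reduction (our own illustration): for ψ(x) = ‖x‖_2^2 the subdifferential at x^+ contains only the gradient 2x^+, and the Bregman distance collapses to ‖x − x^+‖_2^2.

```python
import numpy as np

# For psi(x) = ||x||_2^2 the (generalized) Bregman distance reduces to
# ||x - x_plus||_2^2, since the only subgradient at x_plus is 2 * x_plus.
rng = np.random.default_rng(4)
x = rng.standard_normal(7)
x_plus = rng.standard_normal(7)
bregman = x @ x - x_plus @ x_plus - 2.0 * x_plus @ (x - x_plus)
assert abs(bregman - np.sum((x - x_plus) ** 2)) < 1e-12
```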
Now we are ready to present another a posteriori error estimate.

Theorem 3.5 Let x^+ be a ψ-minimizing solution to equation (1), and assume that the following source condition holds: there exists a w ∈ Y such that

    K^* w ∈ ∂ψ(x^+).

Let δ_* = ‖Kx_{η^*} − y^δ‖ and d = 1/2. Then for each x_{η^*} that solves equation (9), there exists d ∈ D_ψ(x_{η^*}, x^+) such that

    d(x_{η^*}, x^+) ≤ ( (δ/δ_*) (ψ(x_{η^*}) + β_0)/α + (α/(ψ(x_{η^*}) + β_0)) ‖w‖^2 ) max{δ, δ_*}.
Proof. Let

    d(x_{η^*}, x^+) = ψ(x_{η^*}) − ψ(x^+) − ⟨K^* w, x_{η^*} − x^+⟩ ∈ D_ψ(x_{η^*}, x^+).

By the minimizing property of x_{η^*}, Kx^+ = y and ‖y − y^δ‖ = δ, we have

    (1/2) ‖Kx_{η^*} − y^δ‖_2^2 + η^* ψ(x_{η^*}) ≤ δ^2/2 + η^* ψ(x^+),

i.e.

    (1/2) ‖Kx_{η^*} − y^δ‖_2^2 + η^* d(x_{η^*}, x^+) + η^* [⟨w, Kx_{η^*} − y^δ⟩ + ⟨w, y^δ − y⟩] ≤ δ^2/2.

Adding (η^{*2}/2) ‖w‖^2 to both sides of the inequality and utilizing the Cauchy-Schwarz inequality yield

    (1/2) ‖Kx_{η^*} − y^δ + η^* w‖_2^2 + η^* d(x_{η^*}, x^+) ≤ δ^2/2 + (η^{*2}/2) ‖w‖^2 + η^* ⟨w, y − y^δ⟩
                                                            ≤ δ^2 + ‖w‖^2 η^{*2}.

Therefore, we derive that

    d(x_{η^*}, x^+) ≤ δ^2/η^* + ‖w‖^2 η^*,

which combined with equation (9) yields the desired estimate. □
4 Numerical algorithm for both the Tikhonov solution and the regularization parameter

The new choice rule requires solving the nonlinear regularization parameter equation (9) for the regularization parameter η in order to find the Tikhonov solution x_η through the functional J_η in (3). A direct numerical treatment of (9) seems difficult. Motivated by the strict biconvexity structure of the g-Tikhonov functional J(x, λ, τ), i.e. it is strictly convex in x (respectively in (λ, τ)) for fixed (λ, τ) (respectively x), we propose the following iterative algorithm for the efficient numerical realization of the proposed choice rule (9), along with the Tikhonov solution x_η through the functional J_η in (3).

Algorithm I. Choose an initial guess η_0 > 0, and set k = 0. Find (x_k, η_k) for k ≥ 1 as follows:
(i) Solve for x_{k+1} by the Tikhonov regularization method

    x_{k+1} = arg min_x { φ(x, y^δ) + η_k ψ(x) }.

(ii) Update the regularization parameter η_{k+1} by

    η_{k+1} = α φ(x_{k+1}, y^δ)^{1−d}/(ψ(x_{k+1}) + β_0).

(iii) Check the stopping criterion. If not converged, set k = k + 1 and repeat from Step (i).
Before embarking on the convergence analysis of the algorithm, we mention that we have not specified the solver for the Tikhonov regularization problem in Step (i). The problem per se may be approximately solved with an iterative algorithm, e.g. the conjugate gradient method or the iterative reweighted least-squares method. Numerically, we have found that this does not much affect the steady convergence of the algorithm, so long as the problem is solved with reasonable accuracy.
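For the L^2-L^2 case, where Step (i) has the closed-form solution x_{k+1} = (K^T K + η_k I)^{-1} K^T y^δ, Algorithm I can be sketched in a few lines of Python. The toy problem and parameter values below are our own illustrative choices; the final assertion checks the monotone convergence of {η_k}_k established in Lemma 4.1.

```python
import numpy as np

# Toy problem: 30 noisy observations of a 15-dimensional unknown (ours).
rng = np.random.default_rng(2)
K = rng.standard_normal((30, 15))
y_delta = K @ rng.standard_normal(15) + 0.05 * rng.standard_normal(30)

alpha, beta0, d = 1.0, 1e-4, 1.0 / 3.0
eta = 1e-8                                   # initial guess eta_0
history = [eta]
for k in range(100):
    # Step (i): Tikhonov minimizer for the current eta (closed form).
    x = np.linalg.solve(K.T @ K + eta * np.eye(15), K.T @ y_delta)
    phi = float(np.linalg.norm(K @ x - y_delta) ** 2)
    psi = float(np.linalg.norm(x) ** 2)
    # Step (ii): parameter update.
    eta_new = alpha * phi ** (1 - d) / (psi + beta0)
    history.append(eta_new)
    # Step (iii): stop on the relative change of eta.
    if abs(eta_new - eta) <= 1e-14 * eta_new:
        eta = eta_new
        break
    eta = eta_new

diffs = np.diff(np.array(history))
# Lemma 4.1: the sequence eta_k converges monotonically.
assert np.all(diffs >= -1e-12) or np.all(diffs <= 1e-12)
```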
The following lemma provides an interesting and practically very important observation on the monotonicity of the sequence {η_k}_k of regularization parameters generated by Algorithm I; this monotonicity is key to the demonstration of the convergence of the algorithm.

Lemma 4.1 For any initial guess η_0, the sequence {η_k}_k generated by Algorithm I converges monotonically.
Proof. By the definition of η_k, we have

    η_k := α φ(x_k, y^δ)^{1−d}/(ψ(x_k) + β_0).

Therefore,

    η_k − η_{k−1} = α φ(x_k, y^δ)^{1−d}/(ψ(x_k) + β_0) − α φ(x_{k−1}, y^δ)^{1−d}/(ψ(x_{k−1}) + β_0) = (α/D_k) [I + β_0 II],  (14)

where the denominator D_k is defined as

    D_k = (ψ(x_{k−1}) + β_0)(ψ(x_k) + β_0).

The terms I and II in the square bracket of equation (14) are respectively given by

    I := φ(x_k, y^δ)^{1−d} ψ(x_{k−1}) − φ(x_{k−1}, y^δ)^{1−d} ψ(x_k)
       = φ(x_k, y^δ)^{1−d} (ψ(x_{k−1}) − ψ(x_k)) + ψ(x_k) (φ(x_k, y^δ)^{1−d} − φ(x_{k−1}, y^δ)^{1−d}),
    II := φ(x_k, y^δ)^{1−d} − φ(x_{k−1}, y^δ)^{1−d}.

We assume that η_{k−1} ≠ η_{k−2}, since otherwise the assertion is trivial. Lemma 3.1 indicates that each term has the same sign as η_{k−1} − η_{k−2}, and thus the sequence {η_k}_k is monotone. Next we show that the sequence {η_k}_k is bounded. A trivial lower bound is zero. Now by the minimizing property of x_k, we deduce

    φ(x_k, y^δ) + η_{k−1} ψ(x_k) ≤ φ(x̃, y^δ) + η_{k−1} ψ(x̃),

where x̃ ∈ X satisfies ψ(x̃) = 0 by Assumption 3.1(c). Consequently,

    φ(x_k, y^δ) ≤ φ(x̃, y^δ).

Therefore, the definition of η_k gives

    η_k = α φ(x_k, y^δ)^{1−d}/(ψ(x_k) + β_0) ≤ (α/β_0) φ(x̃, y^δ)^{1−d},

i.e. the sequence {η_k}_k is uniformly bounded, which combined with the monotonicity yields the desired convergence. □
Lemma 4.2 Assume that the functionals φ(x, y^δ) and ψ(x) are differentiable, and let F(η) = φ(x_η, y^δ) + η ψ(x_η). Then the asymptotic convergence rate r^* of the algorithm is dictated by

    r^* := lim_{k→∞} (η^* − η_{k+1})/(η^* − η_k) = (−η^* F''(η^*)/(ψ(x_{η^*}) + β_0)) [(1 − d) α φ(x_{η^*}, y^δ)^{−d} + 1].

Proof. Differentiating F(η) with respect to η gives

    F'(η) = ⟨φ'(x_η, y^δ), dx_η/dη⟩ + ψ(x_η) + η ⟨ψ'(x_η), dx_η/dη⟩,

which, taking into account the optimality condition for x_η, gives

    ψ(x_η) = F'(η)  and  φ(x_η, y^δ) = F(η) − η F'(η).

The asymptotic convergence rate r^* of the algorithm is dictated by

    r^* := lim_{k→∞} (η^* − η_{k+1})/(η^* − η_k) = d/dη [α φ(x_η, y^δ)^{1−d}/(ψ(x_η) + β_0)]|_{η=η^*}
        = α d/dη [(F(η) − η F'(η))^{1−d}/(F'(η) + β_0)]|_{η=η^*}
        = α [F(η^*) − η^* F'(η^*)]^{−d} F''(η^*) [−(1 − d) η^* (F'(η^*) + β_0) − (F(η^*) − η^* F'(η^*))]/(F'(η^*) + β_0)^2
        = −η^* F''(η^*) [(1 − d) η^* (ψ(x_{η^*}) + β_0) + φ(x_{η^*}, y^δ)]/((ψ(x_{η^*}) + β_0) φ(x_{η^*}, y^δ))
        = (−η^* F''(η^*)/(ψ(x_{η^*}) + β_0)) [(1 − d) α φ(x_{η^*}, y^δ)^{−d} + 1].

This establishes the lemma. □

Remark 4.1 For the special case d = 0, the expression of the rate r^* in Lemma 4.2 simplifies to

    r^* = (1 + α) (−η^* F''(η^*))/(ψ(x_{η^*}) + β_0).

The established monotone convergence of the sequence {η_k}_k implies that r^* ≤ 1; however, a precise estimate of the rate r^* is still missing. Nonetheless, fast convergence is always observed numerically.
Definition 4.1 [22] A functional ψ(x) is said to have the H-property on the space X if any sequence {x_n}_n ⊂ X weakly converging to a limit x_0 ∈ X and converging to x_0 in functional value, i.e. ψ(x_n) → ψ(x_0), converges strongly to x_0 in X.

This property is also known as the Efimov-Stechkin condition or the Kadec-Klee property in the literature. Norms and semi-norms on Hilbert spaces, and norms on the spaces L^p(Ω) and the Sobolev spaces W^{m,p}(Ω) with 1 < p < ∞ and m ≥ 1, satisfy the H-property.
Assisted by Lemma 4.1, we are now ready to prove the convergence of Algorithm I.

Theorem 4.1 Assume that η^* > 0. Then every subsequence of the sequence {(x_k, η_k)}_k generated by Algorithm I has a subsequence converging weakly to a solution (x^*, η^*) of equation (9), and the convergence of the sequence {η_k}_k is monotonic. If the minimizer of J_{η^*}(x) is unique, the whole sequence converges weakly. Moreover, if the functional ψ(x) satisfies the H-property, the weak convergence is actually strong.
Proof. Lemma 4.1 shows that there exists some η^* such that

    lim_{k→∞} η_k = η^* > 0.

By Lemma 3.1 and the monotonicity of the sequence {η_k}_k, we deduce that the sequences {φ(x_k, y^δ)}_k and {ψ(x_k)}_k are monotonic. By η^* > 0 and Assumption 3.1, we observe that

    0 ≤ φ(x_k, y^δ) ≤ φ(x̃, y^δ),   0 ≤ ψ(x_k) ≤ max{ψ(x_{η_0}), ψ(x_{η^*})}.

Therefore, the sequences {φ(x_k, y^δ)}_k and {ψ(x_k)}_k are monotonically convergent. By Assumption 3.1(a), the sequence {x_k}_k is uniformly bounded, and there exist a subsequence of {x_k}_k, also denoted by {x_k}_k, and some x^* ∈ X such that

    x_k → x^* weakly.

The minimizing property of x_k gives

    φ(x_k, y^δ) + η_{k−1} ψ(x_k) ≤ φ(x, y^δ) + η_{k−1} ψ(x),   ∀x ∈ X.

Letting k tend to ∞, we have

    φ(x^*, y^δ) + η^* ψ(x^*) ≤ φ(x, y^δ) + η^* ψ(x),   ∀x ∈ X,

i.e. x^* is a minimizer of the Tikhonov functional J_{η^*}. Therefore, the element (x^*, η^*) satisfies equation (9). Now if the minimizer of the functional J_{η^*} is unique, the whole sequence {x_k}_k converges weakly to x^*. Recall the monotone convergence

    lim_{k→∞} ψ(x_k) = c^*

for some constant c^*. Next we show that c^* = ψ(x^*). By the lower semi-continuity of ψ(x) we have

    ψ(x^*) ≤ lim inf_{k→∞} ψ(x_k) = lim_{k→∞} ψ(x_k) = c^*.

Assume that c^* > ψ(x^*). Then by the continuity of the functional value J_η(x_η) with respect to η (see Corollary 3.1), we have φ(x^*, y^δ) > lim_{k→∞} φ(x_k, y^δ), which is in contradiction with the lower semi-continuity of the functional φ(x, y^δ). Therefore, we deduce that

    lim_{k→∞} ψ(x_k) = ψ(x^*),

which together with the H-property of ψ(x) on the space X implies the desired strong convergence. □
Remark 4.2 In the numerical algorithm, the quantity σ^2(η) can also be computed as

    σ^2(η_k) = φ(x_k, y^δ)/α_1,

which estimates the variance σ_0^2 of the data noise, analogous to the highly applauded generalized cross-validation [31]. By Lemmas 3.1 and 4.1, the sequence {σ^2(η_k)}_k converges monotonically. One distinction of the estimate σ^2(η) is that it changes very mildly during the iteration, especially for severely ill-posed inverse problems.
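A sketch of this variance estimate for L^2 data-fitting. Here we read α_1 as the number m of data points after the norm rescaling mentioned in Section 5; this reading, the toy problem and all parameter values are our own assumptions, so that σ^2(η_k) = ‖Kx_k − y^δ‖_2^2/m.

```python
import numpy as np

# Toy problem with known noise variance sigma0^2 (our own setup). K is scaled
# by 1/sqrt(m) so that vector norms are comparable across dimensions.
rng = np.random.default_rng(5)
m, n, sigma0 = 200, 20, 0.1
K = rng.standard_normal((m, n)) / np.sqrt(m)
y_delta = K @ rng.standard_normal(n) + sigma0 * rng.standard_normal(m)

alpha, beta0, d = 1.0, 1e-4, 1.0 / 3.0
eta = 1e-8
sigma2 = []
for k in range(50):
    x = np.linalg.solve(K.T @ K + eta * np.eye(n), K.T @ y_delta)
    phi = float(np.linalg.norm(K @ x - y_delta) ** 2)
    psi = float(np.linalg.norm(x) ** 2)
    sigma2.append(phi / m)                     # variance estimate sigma^2(eta_k)
    eta = alpha * phi ** (1 - d) / (psi + beta0)

# The estimate converges monotonically and tracks sigma0^2 up to a modest factor.
assert np.all(np.diff(sigma2) >= -1e-12)
assert 0.2 * sigma0 ** 2 < sigma2[-1] < 5 * sigma0 ** 2
```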
Table 1: Numerical examples.

example  description         ill-posedness  Cond(K)     noise      program   φ-ψ
1        Shaw's problem      severe         1.94×10^19  Gaussian   shaw      L^2-L^2
2        gravity surveying   severe         9.74×10^18  Gaussian   gravity   L^2-H^2 with C
3        differentiation     mild           1.22×10^4   Gaussian   deriv2    L^2-TV
4        Phillips's problem  mild           2.64×10^6   Gaussian   phillips  L^2-L^2
5        deblurring          severe         2.62×10^12  impulsive  deblur    L^1-TV
5 Numerical experiments and discussions<br />
This section presents the numerical results <strong>for</strong> five benchmark <strong>in</strong>verse problems, which are adapted from<br />
Hansen's popular MATLAB package Regularization Tools [17] and range from mild to severe ill-posedness, to illustrate salient features of the proposed rule. These are Fredholm (or Volterra) integral equations of the first kind with kernel k(s, t) and solution x(t). The discretized linear system takes the form Kx = y^δ and is of size 100 × 100. The regularizing functional is referred to as φ-ψ type; e.g. L¹-TV denotes the one with L¹ data fitting and TV regularization. Table 1 summarizes the major features of these examples, e.g. the degree of ill-posedness, where Cond(K) denotes the condition number of the matrix K; the relevant MATLAB programs are taken from the package. Let ε be the relative noise level; we consider five noise levels, i.e. ε ∈ {5 × 10⁻², 5 × 10⁻³, 5 × 10⁻⁴, 5 × 10⁻⁵, 5 × 10⁻⁶}, graphically differentiated by distinct colors. Unless otherwise specified, the initial guess for the regularization parameter η is η₀ = 1.0 × 10⁻⁸, and the values for the parameter pair (α, β₀) and the constant d are taken to be (0.1, 1 × 10⁻⁴) and 1/3, respectively. The value for α follows from the rule of thumb that α₀′ ≈ 1 works well for full norms in the case of the g-Tikhonov method; it is subsequently scaled by maxᵢ|yᵢ|^{2d} to compensate for the effect of this component. Vector norms are rescaled so that the estimate σ²(η) is directly comparable with the variance σ₀². The nonsmooth minimization problems arising from the L²-TV, L²-L¹ and L¹-TV formulations are solved by the iteratively reweighted least-squares method [18].
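The reweighted least-squares idea can be sketched as follows. This is a minimal generic variant in Python, not the exact semismooth-Newton solver of [18]; the function name, the forward-difference matrix D and the smoothing floor are our illustrative choices:

```python
import numpy as np

def irls_l2_tv(K, y, eta, n_iter=30, floor=1e-6):
    """Minimize 0.5*||K x - y||^2 + eta*||D x||_1 by iteratively
    reweighted least squares: the |.|-term is majorized by a weighted
    quadratic with weights 1/max(|(D x)_i|, floor), refreshed each sweep."""
    n = K.shape[1]
    D = np.diff(np.eye(n), axis=0)            # forward differences: (Dx)_i = x_{i+1} - x_i
    x = np.linalg.lstsq(K, y, rcond=None)[0]  # least-squares initial guess
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(D @ x), floor)
        x = np.linalg.solve(K.T @ K + eta * D.T @ (w[:, None] * D), K.T @ y)
    return x
```

For the L²-L¹ functional one would replace D by the identity; for L¹ data fitting the residual term is reweighted in the same fashion.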
We term the newly proposed choice rule the g-Tikhonov rule to emphasize its intimate connection with the g-Tikhonov functional, and compare it with three other popular heuristic choice rules: the quasi-optimality (QO) criterion, generalized cross-validation (GCV) and the L-curve (LC) criterion [16]. The quasi-optimality criterion requires the differentiability of the inverse solution x_η with respect to η and thus may be unsuitable for nonsmooth functionals, e.g. L²-L¹, while the GCV is not directly amenable to problems with constraints or nonsmooth functionals due to the lack of an explicit formula for computing the 'effective' degrees of freedom of the residual. As for the L-curve, the existence of a 'corner' is in general not guaranteed; moreover, for regularizing functionals other than the L²-L² type, the L-curve must be sampled at discrete points, and numerically locating the corner from discrete sample points is highly nontrivial.
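For the standard L²-L² functional, by contrast, the GCV function of [13] does have an explicit SVD representation; a minimal sketch (the function name and the sampling grid etas are illustrative choices of ours):

```python
import numpy as np

def gcv_curve(K, y, etas):
    """GCV function for min ||K x - y||^2 + eta*||x||^2, via the SVD of K:
    filter factors f_i = s_i^2/(s_i^2 + eta),
    G(eta) = ||(I - A(eta)) y||^2 / trace(I - A(eta))^2."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    beta = U.T @ y          # coefficients of y in the left singular basis
    n = len(y)
    G = np.empty(len(etas))
    for j, eta in enumerate(etas):
        f = s**2 / (s**2 + eta)
        G[j] = np.sum(((1.0 - f) * beta)**2) / (n - np.sum(f))**2
    return G
```

The flatness of G near its minimizer, visible later in Figure 2(c), is precisely what makes its numerical minimization fragile.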
5.1 Case 1: L²-L²
Example 1 (Shaw's problem [17]). The functions k and x are given by k(s, t) = (cos s + cos t)² (sin u / u)² with u(s, t) = π(sin s + sin t) and x(t) = 2e^{−6(t−4/5)²} + e^{−2(t+1/2)²}, respectively, and the integration interval is [−π/2, π/2]. The data is contaminated by additive Gaussian noise, i.e.

    yᵢ^δ = yᵢ + max_{1≤i≤100}{|yᵢ|} ε ξᵢ,   1 ≤ i ≤ 100,

where ξᵢ is a standard Gaussian random variable, and ε refers to the relative noise level. The variance σ₀² is related to the noise level ε by σ₀ = max_{1≤i≤100}{|yᵢ|} ε. The parameter α is taken to be α = 1.
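For reference, a discretization of this example together with the noise model might look as follows; this is a simplified midpoint-rule stand-in for the shaw routine of Regularization Tools (the quadrature and function names are our choices):

```python
import numpy as np

def shaw(n=100):
    """Midpoint-rule discretization of Example 1:
    k(s,t) = (cos s + cos t)^2 (sin u / u)^2, u = pi*(sin s + sin t),
    on [-pi/2, pi/2]; np.sinc handles the removable singularity u = 0."""
    h = np.pi / n
    s = -np.pi / 2 + (np.arange(n) + 0.5) * h
    S, T = np.meshgrid(s, s, indexing="ij")
    U = np.pi * (np.sin(S) + np.sin(T))
    K = h * (np.cos(S) + np.cos(T))**2 * np.sinc(U / np.pi)**2
    x = 2 * np.exp(-6 * (s - 0.8)**2) + np.exp(-2 * (s + 0.5)**2)
    return K, x, K @ x

def add_noise(y, eps, rng):
    """y_i^delta = y_i + max_i|y_i|*eps*xi_i, so sigma_0 = max_i|y_i|*eps."""
    sigma0 = np.max(np.abs(y)) * eps
    return y + sigma0 * rng.standard_normal(y.shape), sigma0
```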
The automatically determined value of the regularization parameter η depends on the realization of the random noise, and is thus itself random. For Example 1, the probability density p(η) is estimated from 1000 samples with a kernel density estimation technique on a logarithmic scale [19], and it is shown in
Figure 1: Density of (a) η, (b) e, and (c) σ² for Example 1.
Table 2: Numerical results for Example 1.

ε     σ₀²      σ²_GCV   σ²_GT    η_QO     η_LC     η_GCV    η_GT     λ_GT    e_QO     e_LC     e_GCV    e_GT
5e-6  3.34e-10 2.48e-10 3.24e-10 2.87e-8  5.81e-10 2.30e-8  4.74e-7  1.01e0  3.26e-2  3.67e-2  3.28e-2  3.32e-2
5e-5  3.31e-8  2.51e-8  2.60e-8  9.87e-6  6.27e-8  9.97e-7  8.83e-6  1.01e0  4.55e-2  5.60e-2  4.09e-2  4.53e-2
5e-4  3.31e-6  2.52e-6  3.20e-6  3.68e-5  3.02e-6  1.70e-5  2.20e-4  1.01e0  5.33e-2  9.38e-2  5.86e-2  5.80e-2
5e-3  3.31e-4  2.54e-4  2.72e-4  1.08e-2  1.53e-4  4.58e-4  4.34e-3  1.03e0  1.52e-1  7.48e-2  8.02e-2  1.35e-1
5e-2  3.31e-2  2.52e-2  2.99e-2  1.02e-2  7.44e-3  1.16e-2  1.08e-1  1.13e0  1.60e-1  1.58e-1  1.61e-1  1.88e-1
Figure 1(a). Here the dash-dotted, dotted, dashed and solid curves refer to the results given by η_QO, η_LC, η_GCV and η_GT, respectively. For medium noise levels, all methods except the GCV work very well; however, the variations of η_QO and η_LC are larger than that of η_GT. The GCV fails for about 10% of the samples, signified by η_GCV taking very small values, e.g. 1 × 10⁻²⁰. This phenomenon occurs irrespective of the noise level, and is attributed to the fact that the GCV curve is very flat [16], see e.g. Figure 2(c). On average, η_LC decays to zero faster than η_QO and η_GT as the noise level σ₀² tends to zero. Thus for low noise levels η_LC often takes very small values and spans a broad range, which renders its solution plagued with spurious oscillations, see e.g. the red dotted curve in Figure 1(b). This observation concurs with previous theoretical and numerical results of Hanke [14], which suggest that the L-curve criterion may suffer from nonconvergence in the case of smooth solutions. The quasi-optimality criterion also fails occasionally, as indicated by the long tail, despite its overall robustness. Therefore, the newly proposed choice rule is more robust than the other three. The inverse solution x_{η*} is also random. We utilize the accuracy error e defined below as the error metric:

    e = ‖x_{η*} − x‖_{L²}.

The probability density p(e) of the accuracy error e is shown in Figure 1(b). The accuracy errors e_QO and e_GT are very similar despite the apparent discrepancies in the regularization parameters, whereas e_LC and e_GCV vary very broadly, especially at low noise levels, although the variation of e_LC is much milder than that of e_GCV. The estimates σ²_GCV and σ²_GT are practically identical, see Figure 1(c), which qualifies σ²_GT as an estimator of the variance. Interestingly, σ²_GCV can slightly under-estimate the noise level σ₀² compared with σ²_GT due to the exceedingly small regularization parameters chosen by the GCV.
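The densities above can be reproduced with a standard Gaussian kernel density estimator applied to log₁₀ of the samples; a minimal sketch, where Silverman's rule of thumb for the bandwidth is our assumption (the estimator of [19] may differ):

```python
import numpy as np

def kde_log10(samples, grid):
    """Gaussian KDE of z = log10(samples), evaluated at log10(grid);
    bandwidth h from Silverman's rule of thumb."""
    z = np.log10(samples)
    h = 1.06 * np.std(z) * len(z) ** (-0.2)
    d = (np.log10(grid)[:, None] - z[None, :]) / h
    return np.exp(-0.5 * d * d).sum(axis=1) / (len(z) * h * np.sqrt(2 * np.pi))
```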
Figure 2: (a) convergence of σ², λ, η and e, (b) solution, and (c) GCV curve for Example 1 with ε = 5%.
Figure 3: Density of (a) η, (b) e, and (c) σ² for Example 2.
Next we investigate the convergence of Algorithm I for a particular realization of the noise. The numerical results are summarized in Figure 2 and Table 2. The algorithm converges within five iterations, and thus merits a fast convergence. Moreover, the convergence is rather steady, and a few extra iterations would not deteriorate the inverse solution. The estimate σ²(η) changes very little during the iteration process, and a striking convergence within one iteration is observed, concurring with previous numerical findings for severely ill-posed problems [19]. The convergence of the estimate σ²(η) is monotone, substantiating the remark after Theorem 4.1. The estimate σ²_GCV also approximates σ₀² reasonably well, but it is less accurate than σ²_GT, see Table 2. The prefactor λ remains almost unchanged as the noise level varies, see Table 2, and thus η_GT is indeed proportional to φ(x, y^δ)^{1−d} ≈ σ₀^{2(1−d)} = σ₀^{4/3}. The numerical solution remains accurate and stable for ε up to 5%, see Figure 2(b).
5.2 Case 2: L²-H² with constraint
Example 2 (1D gravity surveying [17] with nonnegativity constraint). The functions k and x are given by k(s, t) = (1/4)(1/16 + (s − t)²)^{−3/2} and x(t) = sin(πt) + (1/2) sin(2πt), respectively, and the integration interval is [0, 1]. The constrained optimization problems are solved by the built-in MATLAB function quadprog.
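Outside MATLAB, the same constrained quadratic problem can be sketched with a simple projected-gradient iteration; this is a stand-in for quadprog, with the step size, iteration count and function name as our illustrative choices:

```python
import numpy as np

def nonneg_tikhonov(K, y, eta, L=None, n_iter=2000):
    """Solve min_x ||K x - y||^2 + eta*||L x||^2 subject to x >= 0
    by projected gradient descent with a fixed 1/Lipschitz step."""
    n = K.shape[1]
    L = np.eye(n) if L is None else L
    H = 2.0 * (K.T @ K + eta * L.T @ L)   # Hessian of the quadratic objective
    b = 2.0 * K.T @ y
    step = 1.0 / np.linalg.norm(H, 2)     # spectral norm = Lipschitz constant
    x = np.zeros(n)
    for _ in range(n_iter):
        x = np.maximum(0.0, x - step * (H @ x - b))  # gradient step + projection
    return x
```

For the L²-H² functional, L would be a discrete second-derivative matrix instead of the identity.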
The presence of the constraint rules out the use of the quasi-optimality criterion and the GCV, and it can also distort the shape of the L-curve so greatly that a corner does not appear at all, e.g. in the case of the L²-L² functional. There does exist a distinct corner on the curve for the L²-H² functional, see Figure 4(c); however, it is numerically difficult to locate due to the lack of monotonicity and the discrete nature of the sampling points. This causes the frequent failure of the MATLAB functions corner and l_corner provided by the package, and visual inspection is required. The inconvenience persists for the remaining examples, and thus we do not investigate the statistical performance of the L-curve criterion via the relevant probability densities. The results for the L-curve criterion are obtained by manually locating the corner.
The presence of constraints, however, poses no difficulty to the proposed rule. Analogously to Example 1, η_GT and e_GT are narrowly distributed, see Figures 3(a) and (b), which clearly illustrates the excellent scalability of the rule with respect to the noise level. The estimate σ²_GT peaks around the exact variance σ₀², and always retains the correct magnitude, see Figure 3(c). Typical numerical results for Example 2 with ε = 5% are presented in Figure 4. A fast and steady convergence of the algorithm within five iterations is again observed, see Figure 4(a), and similar convergence behavior is observed for other noise levels. In contrast, in order for the L-curve to be representative, many points on the curve must be sampled, which effectively diminishes its computational efficiency. The numerical solution is in good agreement with the exact one, see Figure 4(b) and Table 3. The accuracy error e_GT improves steadily as the noise level ε decreases, and it compares very favorably with e_LC, for which a nonconvergence phenomenon is observed. The prefactor λ changes very little as the noise level ε varies, and thus η_GT decays at a rate commensurate with σ_GT^{4/3}, see also Figure 3(a).
Table 3: Numerical results for Example 2.

ε     σ₀²      σ²_GT    η_LC     η_GT     λ_GT    e_LC     e_GT
5e-6  1.14e-9  8.62e-10 7.20e-12 3.72e-10 4.11e-4 1.71e-1  4.84e-4
5e-5  1.14e-7  8.67e-8  7.91e-11 8.06e-9  4.12e-4 1.11e-1  7.83e-4
5e-4  1.14e-5  8.57e-6  9.54e-9  1.74e-7  4.15e-4 6.86e-2  2.49e-2
5e-3  1.14e-3  8.58e-4  5.18e-7  3.81e-6  4.24e-4 1.65e-2  7.20e-3
5e-2  1.14e-1  8.69e-2  6.25e-5  1.04e-4  5.32e-4 1.10e-1  4.12e-2
Figure 4: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 2 with ε = 5%.
5.3 Case 3: L²-TV
Example 3 (numerical differentiation, adapted from deriv2 [16]). The functions k and x are given by

    k(s, t) = s(t − 1) for s < t and k(s, t) = t(s − 1) for s ≥ t,
    x(t) = 1 for 1/3 < t ≤ 2/3 and x(t) = 0 otherwise,

respectively, and the integration interval is [0, 1]. The constant α is taken to be 5 × 10⁻³, and η₀ is 1 × 10⁻⁶. The exact solution x is piecewise constant, and thus TV regularization is suitable.
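A midpoint-rule discretization of this kernel can be sketched as follows; this is a simplified stand-in for the deriv2 routine (the quadrature and function name are our choices):

```python
import numpy as np

def deriv2_like(n=100):
    """Midpoint-rule discretization of Example 3:
    k(s,t) = s(t-1) for s < t and t(s-1) for s >= t, on [0,1];
    x(t) = 1 on (1/3, 2/3] and 0 otherwise."""
    h = 1.0 / n
    t = (np.arange(n) + 0.5) * h
    S, T = np.meshgrid(t, t, indexing="ij")
    K = h * np.where(S < T, S * (T - 1.0), T * (S - 1.0))
    x = np.where((t > 1.0 / 3.0) & (t <= 2.0 / 3.0), 1.0, 0.0)
    return K, x, K @ x
```

The kernel is the (sign-flipped) Green's function of the second derivative with homogeneous Dirichlet conditions, whose singular values decay only algebraically, hence the mild ill-posedness.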
The regularization parameter η_GT is narrowly distributed, and the accuracy error e_GT is mostly comparable with the noise level, see Figures 5(a) and (b), respectively. The sample mean of the estimate σ²_GT(η) agrees very well with the exact value σ₀², see Figure 5(c). For instance, in the case ε = 5%, the mean 1.23 × 10⁻⁵ almost coincides with the exact value. Typical numerical results for Example 3 are summarized in Figure 6 and Table 4. The L-curve has only an ambiguous corner, and the ambiguity persists even for very low noise levels, e.g. ε = 5 × 10⁻⁶. Nevertheless, the regularization parameters chosen by the two rules are comparable, and the numerical results are practically identical, see Table 4. Algorithm I converges within five iterations with an empirical asymptotic convergence rate r* < 0.15 for all five noise levels; moreover, it tends to accelerate as σ₀² decreases, e.g. r* ≈ 0.05 for ε = 5 × 10⁻⁶. Therefore, the algorithm is computationally very efficient. The reconstructed profile remains accurate and stable for ε up to 5%, see Figure 6(b) and Table 4. Note that it exhibits the typical
Figure 5: Density of (a) η, (b) e and (c) σ² for Example 3.
Table 4: Numerical results for Example 3.

ε     σ₀²      σ²_GT    η_LC     η_GT     λ_GT    e_LC     e_GT
5e-6  1.19e-13 8.21e-14 6.43e-12 4.70e-10 2.49e-1 1.26e-3  5.77e-4
5e-5  1.19e-11 8.79e-12 3.59e-9  1.06e-8  2.49e-1 5.01e-3  9.38e-3
5e-4  1.19e-9  8.65e-10 1.26e-7  2.25e-7  2.48e-1 4.34e-2  4.93e-2
5e-3  1.19e-7  8.60e-8  2.01e-6  4.78e-6  2.45e-1 8.68e-2  8.08e-2
5e-2  1.19e-5  8.89e-6  7.05e-5  1.08e-4  2.51e-1 1.05e-1  9.69e-2
Figure 6: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 3 with ε = 5%.
staircases of TV regularization.
5.4 Case 4: L²-L¹
Table 5: Numerical results for Example 4.

ε     σ₀²      σ²_GT    η_LC     η_GT     λ_GT    e_LC     e_GT
5e-6  5.73e-12 3.59e-12 2.85e-9  3.90e-8  1.66e0  4.31e-4  5.64e-4
5e-5  5.73e-10 3.81e-10 1.00e-8  8.74e-7  1.66e0  1.05e-2  6.70e-3
5e-4  5.73e-8  3.94e-8  1.87e-7  1.92e-5  1.66e0  7.08e-2  3.12e-2
5e-3  5.73e-6  4.10e-6  2.85e-5  4.25e-4  1.66e0  1.88e-1  5.91e-2
5e-2  5.73e-4  4.27e-4  2.85e-3  9.45e-3  1.67e0  1.51e-1  6.88e-2
Figure 8: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 4 with ε = 5%.
all five noise levels, and thus η_GT scales as σ_GT^{4/3}. The solution shows the feature of sparsity-promoting L¹ regularization: the locations of all three small spikes are perfectly detected, and the retrieved magnitudes are reasonably accurate.
5.5 Case 5: L¹-TV
Example 5 (Deblurring a 1D image). The functions k and x are given by k(s, t) = (1/(4√(2π))) e^{−(s−t)²/32} χ_{|s−t|<
Table 6: Numerical results for Example 5.

ε     q    σ₀²      σ²_GT    η_LC     η_GT     λ_GT    e_LC     e_GT
1e-4  30%  2.18e-5  1.65e-5  5.34e-2  2.02e-2  4.97e0  4.03e-7  2.96e-7
1e-3  30%  2.18e-4  1.64e-4  8.11e-2  6.38e-2  4.97e0  5.93e-7  4.68e-7
1e-2  30%  2.18e-3  1.64e-3  4.33e-1  2.02e-1  4.97e0  8.18e-5  2.88e-6
1e-1  30%  2.18e-2  1.64e-2  1.00e0   6.38e-1  4.97e0  2.93e-3  5.62e-4
1e0   30%  2.18e-1  1.65e-1  1.52e0   2.02e0   4.97e0  9.42e-3  1.63e-2
1e-4  50%  3.63e-5  3.27e-5  1.23e-1  2.84e-2  4.97e0  1.74e-4  3.77e-4
1e-3  50%  3.63e-4  3.27e-4  1.23e-1  9.00e-2  4.97e0  1.03e-3  1.20e-3
1e-2  50%  3.63e-3  3.28e-3  6.58e-1  2.85e-1  4.97e0  1.68e-2  8.58e-3
1e-1  50%  3.63e-2  3.28e-2  6.58e-1  9.01e-1  4.97e0  2.69e-2  3.53e-2
1e0   50%  3.63e-1  3.27e-1  1.00e0   2.85e0   4.97e0  3.81e-2  6.39e-2
Figure 10: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 5 with ε = 1 and q = 50%.
e.g. [2, 6] in the case ε = 1 and q = 50%, see Figure 9(a), where the solid and dashed curves refer to q = 50% and q = 30%, respectively. However, the accuracy error e varies broadly, albeit mostly remaining very small. This is inherent to the L¹-TV functional [9], for which η plays the role of a characteristic parameter: at some specific values the profile of the solution undergoes a sudden transition. The mean of the estimate σ²_GT approximates the exact noise level σ₀² excellently, e.g. 3.62 × 10⁻² in the case ε = 1 × 10⁻¹ and q = 50%, which coincides with the exact noise level, see Table 6. Therefore, the estimate σ²_GT is a valid variance estimate also for non-Gaussian noise. Note that the estimate σ²_GT for q = 30% is generally less accurate than that for q = 50% due to the fewer effective samples available, as indicated by the looser probability bound. Interestingly, the numerical results of the former are far more accurate than those of the latter, usually by one order of magnitude, which clearly illustrates the profound effect of the corruption percentage q on the inverse solution.
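The role of the corruption percentage q can be made concrete with a generic impulsive-noise sketch; the exact noise model of Example 5 is described elsewhere in the paper, so this routine, its name, and the uniform-impulse choice are assumptions of ours for illustration only:

```python
import numpy as np

def add_impulsive_noise(y, eps, q, rng):
    """Corrupt a random fraction q of the entries of y by uniform impulses
    of magnitude up to eps*max|y_i| (a generic impulsive-noise model,
    not necessarily the one used in Example 5)."""
    yd = y.copy()
    hit = rng.random(y.shape) < q          # each entry corrupted with prob. q
    amp = eps * np.max(np.abs(y))
    yd[hit] += amp * rng.uniform(-1.0, 1.0, size=int(hit.sum()))
    return yd
```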
Exemplary numerical results for Example 5 with ε = 1 and q = 50% noise in the data are presented in Figure 10. The L-curve does not exhibit the typical L-shape, and there is no distinct corner, see Figure 10(c), which renders it ambiguous in practical use. This concurs with previous theoretical and numerical observations in the context of image denoising [9]. Moreover, the curve lacks concavity and monotonicity due to the limited precision of the solver for the optimization problem, and many points cluster around the 'corner', which renders its accurate numerical location very challenging. In practice, the discontinuity points, as indicated by the small steps along the curve, may serve to select a regularization parameter, and here the lower step is selected. The results presented in Table 6 for the L-curve criterion are obtained with this rule of thumb. The numerical results are accurate, and thus this ad hoc rule seems viable. A steady and fast convergence of the algorithm is also observed, and, taking into account the large amount of noise in the data, the converged profile approximates the exact one excellently.
6 Concluding remarks
This paper proposes and analyzes a new rule for choosing the regularization parameter in Tikhonov regularization. The existence of solutions to the regularization parameter equation is shown, a variational characterization is provided, and a posteriori error estimates of the inverse solution are derived. An effective iterative algorithm is suggested, and its monotone convergence is investigated. Results for five regularizing formulations, i.e. L²-L², L²-H² with constraint, L²-TV, L²-L¹ and L¹-TV, on several benchmark examples are presented to illustrate its numerical features. The numerical results indicate that the proposed rule is competitive with existing heuristic choice rules, e.g. the L-curve criterion and generalized cross-validation, and that the algorithm merits a fast and steady convergence. Unlike most existing choice rules, a solid mathematical justification is established for the new rule.
There are several avenues for future research, which are currently under investigation. First, for large-scale inverse problems, iterative regularization methods, e.g. the Landweber method, the conjugate gradient method and the generalized minimal residual method, are of immense practical interest, and it is of great practical significance to develop analogous choice rules for them. Secondly, some applications require multiple regularization terms in order to preserve several distinct features simultaneously, and consequently call for multiple regularization parameters; extending the idea to this context would be interesting.
Acknowledgements
The authors would like to thank Professor Per Christian Hansen for his MATLAB package Regularization Tools, with which some of our numerical experiments were conducted.
References
[1] Archer G, Titterington DM. On some Bayesian/regularization methods for image restoration. IEEE Transactions on Image Processing 1995;4(7): 989–995.
[2] Aubert G, Aujol JF. A variational approach to remove multiplicative noise. SIAM Journal on Applied Mathematics 2008;68(4): 925–946.
[3] Bauer F, Kindermann S. The quasi-optimality criterion for classical inverse problems. Inverse Problems 2008;24(3): 035002.
[4] Banks HT, Kunisch K. Estimation Techniques for Distributed Parameter Systems. Birkhäuser: Boston; 1989.
[5] Bazán FSV. Fixed point iterations in determining the Tikhonov regularization parameter. Inverse Problems 2008;24(3): 035001.
[6] Beck JV, Blackwell B, Clair CRS. Inverse Heat Conduction: Ill-Posed Problems. Wiley: New York; 1985.
[7] Besag J. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B (Methodological) 1986;48(3): 259–302.
[8] Burger M, Osher S. Convergence rates of convex variational regularization. Inverse Problems 2004;20(5): 1411–1421.
[9] Chan TF, Esedoḡlu S. Aspects of total variation regularized L¹ function approximation. SIAM Journal on Applied Mathematics 2005;65(5): 1817–1837.
[10] Chan TF, Shen J. Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. SIAM: Philadelphia; 2005.
[11] Engl HW, Hanke M, Neubauer A. Regularization of Inverse Problems. Kluwer: Dordrecht; 1996.
[12] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis (2nd ed.). Chapman & Hall: Boca Raton; 2004.
[13] Golub GH, Heath MT, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 1979;21(2): 215–223.
[14] Hanke M. Limitations of the L-curve method in ill-posed problems. BIT 1996;36(2): 287–301.
[15] Hansen PC. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review 1992;34(4): 561–580.
[16] Hansen PC. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM: Philadelphia; 1998.
[17] Hansen PC. Regularization Tools version 4.0 for Matlab 7.3. Numerical Algorithms 2007;46(2): 189–194. Software available at: http://www2.imm.dtu.dk/∼pch/Regutools/index.html.
[18] Hintermüller M, Ito K, Kunisch K. The primal-dual active set strategy as a semismooth Newton method. SIAM Journal on Optimization 2003;13(3): 865–888.
[19] Jin B, Zou J. A Bayesian inference approach to the ill-posed Cauchy problem of steady-state heat conduction. International Journal for Numerical Methods in Engineering, in press. Available at: http://doi.wiley.com/10.1002/nme.2350.
[20] Johnston PR, Gulrajani RM. An analysis of the zero-crossing method for choosing regularization parameters. SIAM Journal on Scientific Computing 2002;24(2): 428–442.
[21] Kunisch K, Zou J. Iterative choices of regularization parameters in linear inverse problems. Inverse Problems 1998;14(5): 1247–1264.
[22] Leonov AS. Regularization of ill-posed problems in Sobolev space W¹₁. Journal of Inverse and Ill-Posed Problems 2005;13(6): 595–619.
[23] Natterer F. The Mathematics of Computerized Tomography. Teubner: Stuttgart; 1986.
[24] Nikolova M. Minimizers of cost-functions involving non-smooth data-fidelity terms: application to the processing of outliers. SIAM Journal on Numerical Analysis 2002;40(3): 965–994.
[25] Morozov VA. On the solution of functional equations by the method of regularization. Soviet Mathematics Doklady 1966;7(3): 414–417.
[26] Regińska T. A regularization parameter in discrete ill-posed problems. SIAM Journal on Scientific Computing 1996;17(3): 740–749.
[27] Thompson AM, Kay J. On some choices of regularization parameter in image restoration. Inverse Problems 1993;9(6): 749–761.
[28] Tikhonov AN, Arsenin VY. Solutions of Ill-posed Problems. Winston & Sons: Washington; 1977.
[29] Tikhonov AN, Glasko V, Kriksin Y. On the question of quasi-optimal choice of a regularized approximation. Soviet Mathematics Doklady 1979;20(5): 1036–1040.
[30] Vogel CR. Computational Methods for Inverse Problems. SIAM: Philadelphia; 2002.
[31] Wahba G. Spline Models for Observational Data. SIAM: Philadelphia; 1990.
[32] Xie JL, Zou J. An improved model function method for choosing regularization parameters in linear inverse problems. Inverse Problems 2002;18(3): 631–643.