Research Report

A New Choice Rule for Regularization Parameters in Tikhonov Regularization

by

Kazufumi Ito, Bangti Jin, Jun Zou

CUHK-2008-07 (362)

September 2008

Department of Mathematics
The Chinese University of Hong Kong
Shatin, Hong Kong
Fax: (852) 2603 5154
Email: dept@math.cuhk.edu.hk
URL: http://www.cuhk.edu.hk



A New Choice Rule for Regularization Parameters in Tikhonov Regularization

Kazufumi Ito∗  Bangti Jin†  Jun Zou‡

August 26, 2008

Abstract

This paper proposes and analyzes a novel rule for choosing the regularization parameters in Tikhonov regularization for inverse problems, not necessarily requiring knowledge of the exact noise level. The new choice rule is derived by drawing ideas from Bayesian statistical analysis. The existence of solutions to the regularization parameter equation is shown, and some variational characterizations of the feasible parameters are provided. With such feasible regularization parameters, we establish a posteriori error estimates of the approximate solutions to the inverse problem under consideration. An iterative algorithm is suggested for the efficient numerical realization of the choice rule, and it is shown to exhibit a practically desirable monotonic convergence. Numerical experiments for both mildly and severely ill-posed benchmark inverse problems with various regularizing functionals of Tikhonov type, e.g. $L^2$-$L^2$, $L^2$-$L^1$ and $L^1$-TV, are presented, demonstrating the effectiveness and robustness of the new choice rule.

Key Words: regularization parameter, a posteriori error estimate, Tikhonov regularization, inverse problem

1 Introduction

Inverse problems arise in real-world applications whenever one attempts to infer physical laws or parameters from imprecise and indirect observational data. This work considers linear inverse problems of the general form
$$Kx = y^\delta, \qquad (1)$$
where $x \in \mathcal{X}$ and $y^\delta \in \mathcal{Y}$ refer to the unknown parameters and the observational data, respectively. The spaces $\mathcal{X}$ and $\mathcal{Y}$ are Banach spaces, with norms denoted by $\|\cdot\|_{\mathcal{X}}$ and $\|\cdot\|_{\mathcal{Y}}$, and $\mathcal{X}$ is reflexive. The forward operator $K : \mathrm{dom}(K) \subset \mathcal{X} \to \mathcal{Y}$ is linear and bounded. System (1) can represent a wide variety of inverse problems arising in diverse industrial and engineering applications, e.g. computerized tomography [23], parameter identification [4], image processing [10] and inverse heat transfer [6]. The observational data $y^\delta$ is a noisy version of the exact data $y = Kx^+$, and its noise level is often measured by the upper bound $\sigma_0$ in the inequality
$$\phi(x^+, y^\delta) \le \sigma_0^2, \qquad (2)$$
where the functional $\phi(x, y^\delta) : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^+$ measures the proximity of the model output $Kx$ to the data $y^\delta$. We use the notation $\sigma_0^2$ in (2) in place of the more commonly used $\delta^2$ to maintain its clear statistical interpretation as the variance of the data noise.

∗ Center for Research in Scientific Computation & Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA. (kito@math.ncsu.edu)
† Universität Bremen, FB 3 Mathematik und Informatik, Zentrum für Technomathematik, Postfach 330 440, 28344 Bremen, Germany. (kimbtsing@yahoo.com.cn)
‡ Department of Mathematics, The Chinese University of Hong Kong, Shatin N.T., Hong Kong, P.R. China. The work of this author was substantially supported by Hong Kong RGC grants (Project 404105 and Project 404606). (zou@math.cuhk.edu.hk)


Inverse problems are generally ill-posed in the sense of Hadamard, i.e. a solution may fail to exist or to be unique, and, more severely, a small perturbation in the data may cause an enormous deviation of the solution. Therefore, the mathematical analysis and numerical solution of inverse problems are very challenging. The standard procedure for numerically treating inverse problems is regularization, going back to Tikhonov's inaugural work [28]. Regularization techniques mitigate the ill-posedness by incorporating a priori information about the solution, e.g. boundedness, smoothness and positivity [28, 11]. The celebrated Tikhonov regularization transforms the solution of system (1) into the minimization of the Tikhonov functional $J_\eta$ defined by
$$J_\eta(x) = \phi(x, y^\delta) + \eta\,\psi(x), \qquad (3)$$
and takes its minimizer $x_\eta$ as an approximate solution, where $\eta$ is often called the regularization parameter, compromising between the data-fitting term $\phi(x, y^\delta)$ and the a priori information encoded in the regularization term $\psi(x)$. Some commonly used data-fidelity functionals include $\|Kx - y^\delta\|_{L^2}^2$ [28], $\|Kx - y^\delta\|_{L^1}$ [24] and $\int (Kx - y^\delta \log Kx)$ [2], and regularization functionals include $\|x\|_{L^\nu}^\nu$ [24], $\|x\|_{H^m}^2$ [28] and $|x|_{TV}$ [10]. Traditionally, Tikhonov regularization considers only $L^2$ data fitting in conjunction with $L^2$ or $H^m$ regularization, referred to as the $L^2$-$L^2$ functional hereafter, which statistically corresponds to additive Gaussian noise and a smoothness prior, respectively. However, other nonconventional and nonstandard functionals have also received considerable recent attention, e.g. statistically motivated data fitting and feature-promoting (e.g. edge, sparsity and texture) regularization.

The regularization parameter $\eta$ determines the tradeoff between data fidelity and a priori information, and it plays an indispensable role in designing a stable inverse reconstruction process and obtaining a practically acceptable inverse solution. To be more precise, the inverse solution is overwhelmed by the prior knowledge if $\eta$ is too large, which often leads to undesirable effects, e.g. over-smoothing; conversely, it may be unstable and plagued with spurious and nonphysical details if $\eta$ is too small. Therefore, its selection constitutes one of the major inconveniences and difficulties in applying existing regularization techniques, and is crucial to the success of a regularization method. A number of choice rules have been proposed in the literature, e.g. the discrepancy principle [21, 25, 32], the unbiased predictive risk estimator (UPRE) method [30], the quasi-optimality criterion [29], generalized cross-validation (GCV) [13], and the L-curve criterion [15]. The discrepancy principle is mathematically rigorous; however, it requires an accurate estimate of the exact noise level $\sigma_0$, and inaccuracy in this estimate can severely deteriorate the inverse solution [11, 21]. The UPRE method was originally developed for model selection in linear regression, and was later adapted for choosing the regularization parameter [30]. However, its application requires an estimate of the data noise like the discrepancy principle, and the minimization of the UPRE curve is tricky since it may be very flat over a broad scale [30], which is also the case for the GCV curve [16]. The latter three rules do not require any a priori knowledge of the noise level $\sigma_0$, and thus are fully data-driven. These methods have been very popular in the engineering community since their inception and have delivered satisfactory performance for numerous practical inverse problems [16]. However, these methods are heuristic in nature, and cannot be analyzed in the framework of deterministic inverse theory [11]. Nonetheless, their mathematical underpinnings might be laid down in the context of statistical inverse theory, e.g. the semidiscrete semistochastic linear data model [30], though such analysis is seldom carried out for general regularization formulations.

Another principled framework for selecting the regularization parameter is Bayesian inference [7, 12]. Thompson and Kay [27] and Archer and Titterington [1] investigated this framework in the context of image restoration, and proposed and numerically evaluated several choice rules by considering various point estimates of the posterior probability density function and their approximations, e.g. the maximum likelihood and maximum a posteriori estimates. However, these were application-oriented papers comparing different methods, with neither mathematical analysis nor algorithmic description. Motivated by the hierarchical modeling of the Bayesian paradigm [12], the authors [19] recently proposed an augmented Tikhonov functional which determines the regularization parameter and the noise level along with the inverse solution for finite-dimensional linear inverse problems.

In this paper, we investigate Tikhonov regularization in a general setting, with a general data-fitting term $\phi(x, y^\delta)$ and regularization term $\psi(x)$ in (3), and propose a new choice rule for finding a reasonable regularization parameter $\eta$. The derivation of the parameter choice rule from the point of view of hierarchical Bayesian inference is detailed in Section 2. As we will see, the new rule preserves an important advantage of some existing heuristic rules in that it does not require knowledge of the noise level either. But for this new rule, solid theoretical justifications can be developed; in particular, a posteriori error estimates are established. In addition, an iterative algorithm of monotone type is developed for an efficient practical realization of the rule, and it enjoys fast and steady convergence.

The newly proposed choice rule has several further distinctions in comparison with existing heuristic choice rules. Various nonconvergence results have been established for the L-curve criterion [14, 30]: the variation of the regularization parameter is unduly large in the case of low noise levels, and the existence of a corner is not ensured. The theoretical understanding of the quasi-optimality criterion is very limited despite its popularity [3]. GCV enjoys solid statistical justifications [31, 11]; however, the existence of a minimum is not guaranteed. Moreover, in the L-curve criterion, numerically locating the corner from discrete sampling points is highly nontrivial. The GCV curve is often very flat and numerically difficult to minimize, and it sometimes requires tight bounds on the regularization parameter in order to work robustly. For functionals other than the $L^2$-$L^2$ type, all three existing methods require computing the inverse solution at many discrete sampling points, and are thus computationally very expensive. The newly proposed choice rule essentially eliminates these computational inconveniences through the efficient and monotonically convergent iterative algorithm, while at the same time it can be justified mathematically, as is done in Sections 3 and 4. Moreover, the new choice rule applies straightforwardly to Tikhonov regularization of very general type, e.g. $L^1$-TV, whereas other rules are numerically validated and theoretically analyzed mostly for functionals of $L^2$-$L^2$ type.

We conclude this section with a general remark on heuristic choice rules. A well-known theorem of Bakushinskii [11] states that no deterministic convergence theory can exist for choice rules that disregard the exact noise level. In particular, the inverse solution does not necessarily converge to the exact solution as the noise level diminishes to zero. Therefore, we reiterate that no choice rule, in particular no heuristic one, for choosing the regularization parameter in ill-posed problems should be treated as a "black-box routine": one can always construct examples where heuristic choice rules perform poorly.

The rest of the paper is structured as follows. In Section 2, we derive the new choice rule within the Bayesian paradigm. Section 3 shows the existence of solutions to the regularization parameter equation, and derives some a posteriori error estimates. Section 4 proposes an iterative algorithm for efficient numerical computation, and establishes its monotone convergence. Section 5 presents numerical results for several benchmark linear inverse problems to illustrate relevant features of the proposed method. We conclude and indicate directions of future research in Section 6.

2 Derivation of the new choice rule

In this section, we motivate our new deterministic choice rule by drawing some ideas from nondeterministic Bayesian inference [12, 19], which has been used for a different purpose in the statistical community. The choice rule will then be rigorously analyzed and justified in the framework of deterministic inverse theory in the subsequent sections.

For ease of exposition, we derive the new choice rule by considering the finite-dimensional linear inverse problem
$$Kx = y^\delta, \qquad (4)$$
with $K \in \mathbb{R}^{n \times m}$, $x \in \mathbb{R}^m$ and $y^\delta \in \mathbb{R}^n$. One principled approach to this problem is Bayesian inference [12, 19]. The cornerstone of Bayesian inference is Bayes' rule
$$p(x|y^\delta) \propto p(y^\delta|x)\,p(x),$$
where the probability density function $p(x)$ and the conditional probability density function $p(y^\delta|x)$ are known as the prior and the likelihood function, and reflect the prior knowledge and the contribution of the data, respectively.



We have dropped the normalizing constant since it plays only an immaterial role in our subsequent developments. There are thus two building blocks of Bayesian inference to be modeled, i.e. $p(y^\delta|x)$ and $p(x)$. Assume that additive i.i.d. Gaussian random variables with mean zero and variance $\sigma^2$ account for the measurement errors contaminating the exact data; then the likelihood function $p(y^\delta|x, \tau)$, with $\tau = 1/\sigma^2$, is given by
$$p(y^\delta|x, \tau) \propto \tau^{\frac{n}{2}} \exp\left(-\frac{\tau}{2}\|Kx - y^\delta\|_2^2\right).$$

Bayesian inference encodes the a priori information about the unknown $x$, available before collecting the data, in the prior density function $p(x|\lambda)$; this is often achieved with the help of the versatile tool of Markov random fields, which in its simplest form can be written as
$$p(x|\lambda) \propto \lambda^{\frac{m}{2}} \exp\left(-\frac{\lambda}{2}\|Lx\|_2^2\right),$$

where the matrix $L \in \mathbb{R}^{p \times m}$ encapsulates the structure of interactions between neighboring sites, and typically corresponds to some discretized differential operator. The scale parameter $\lambda$ dictates the strength of the interaction. Unfortunately, the scale parameter $\lambda$ and the inverse variance $\tau$ are often nontrivial to assign and calibrate despite their critical role in the statistical modeling. The Bayesian paradigm resolves this difficulty flexibly through hierarchical modeling. The underlying idea is to regard them as unknowns and to let the data determine these parameters. More precisely, they are also modeled as random variables with their own priors. We follow the standard statistical practice of adopting conjugate priors for both $\lambda$ and $\tau$ [12], i.e.
$$p(\lambda) \propto \lambda^{\alpha_0 - 1} e^{-\beta_0 \lambda} \quad \text{and} \quad p(\tau) \propto \tau^{\alpha_1 - 1} e^{-\beta_1 \tau},$$

where $(\alpha_0, \beta_0)$ and $(\alpha_1, \beta_1)$ are the parameter pairs for the prior distributions of $\lambda$ and $\tau$, respectively. By combining these densities via Bayes' rule, we arrive at the complete Bayesian solution to the inverse problem (4), i.e. the posterior probability density function (PPDF) $p(x, \lambda, \tau|y^\delta)$:
$$p(x, \lambda, \tau|y^\delta) \propto p(y^\delta|x, \tau)\cdot p(x|\lambda)\cdot p(\lambda)\cdot p(\tau) \propto \tau^{\frac{n}{2}} \exp\left(-\frac{\tau}{2}\|Kx - y^\delta\|_2^2\right)\cdot \lambda^{\frac{m}{2}} \exp\left(-\frac{\lambda}{2}\|Lx\|_2^2\right)\cdot \lambda^{\alpha_0-1} e^{-\beta_0\lambda}\cdot \tau^{\alpha_1-1} e^{-\beta_1\tau}.$$

The PPDF encapsulates the complete information about the unknown $x$ and the parameters $\lambda$ and $\tau$. The maximum a posteriori estimate remains the most popular Bayesian estimate, and it selects $(x, \lambda, \tau)_{\mathrm{map}}$ as the most probable triple given the observational data $y^\delta$. More precisely, it proceeds as follows:
$$(x, \lambda, \tau)_{\mathrm{map}} = \arg\max_{(x,\lambda,\tau)} p(x, \lambda, \tau|y^\delta) = \arg\min_{(x,\lambda,\tau)} \mathcal{J}(x, \lambda, \tau),$$
where the functional $\mathcal{J}(x, \lambda, \tau)$ is defined by
$$\mathcal{J}(x, \lambda, \tau) = \frac{\tau}{2}\|Kx - y^\delta\|_2^2 + \frac{\lambda}{2}\|Lx\|_2^2 + \beta_0\lambda - \left(\frac{m}{2} + \alpha_0 - 1\right)\ln\lambda + \beta_1\tau - \left(\frac{n}{2} + \alpha_1 - 1\right)\ln\tau.$$

Abusing the notations $\alpha_0$, $\beta_0$, $\alpha_1$ and $\beta_1$ slightly, its formal limit as $m, n \to \infty$ suggests a new functional of continuous form
$$\mathcal{J}(x, \lambda, \tau) = \frac{\tau}{2}\|Kx - y^\delta\|_{L^2}^2 + \frac{\lambda}{2}\|Lx\|_{L^2}^2 + \beta_0\lambda - \left(\frac{1}{2} + \alpha_0\right)\ln\lambda + \beta_1\tau - \left(\frac{1}{2} + \alpha_1\right)\ln\tau,$$
where the operators $K$ and $L$ are continuous analogues of the matrices $K$ and $L$, respectively. Upon letting $\alpha_0' = \frac{1}{2} + \alpha_0$ and $\alpha_1' = \frac{1}{2} + \alpha_1$, we arrive at
$$\mathcal{J}(x, \lambda, \tau) = \frac{\tau}{2}\|Kx - y^\delta\|_2^2 + \frac{\lambda}{2}\|Lx\|_2^2 + \beta_0\lambda - \alpha_0'\ln\lambda + \beta_1\tau - \alpha_1'\ln\tau.$$



This naturally motivates the following generalized Tikhonov (g-Tikhonov, for short) functional
$$\mathcal{J}(x, \lambda, \tau) = \tau\,\phi(x, y^\delta) + \lambda\,\psi(x) + \beta_0\lambda - \alpha_0'\ln\lambda + \beta_1\tau - \alpha_1'\ln\tau \qquad (5)$$
defined for $(x, \lambda, \tau) \in \mathcal{X} \times \mathbb{R}^+ \times \mathbb{R}^+$. This extends the Tikhonov functional (3), but it will never be utilized to solve the inverse problem (1) directly. The functional $\mathcal{J}(x, \lambda, \tau)$ is introduced only to help construct an adaptive algorithm for selecting a reasonable regularization parameter $\eta$ in (3), which remains our solver of interest for (1).

We now derive the algorithm for adaptively updating the parameter $\eta$ by making use of the optimality system of the functional $\mathcal{J}(x, \lambda, \tau)$, detecting the noise level $\sigma_0$ in a fully data-driven manner. As we will see, the parameter $\eta$ and the noise level estimate $\sigma(\eta)$ are connected with the parameters $\lambda$ and $\tau$ in (5) by the relations $\eta := \lambda\tau^{-1}$ and $\sigma^2(\eta) = \tau^{-1}$. Since we are considering a general setting, the functional might be nonsmooth and nonconvex, so we resort to optimality in a generalized sense.

Definition 2.1 An element $(x^*, \lambda^*, \tau^*) \in \mathcal{X} \times \mathbb{R}^+ \times \mathbb{R}^+$ is called a critical point of the functional (5) if it satisfies the following generalized optimality system:
$$x^* = \arg\min_{x \in \mathcal{X}}\left\{\phi(x, y^\delta) + \lambda^*(\tau^*)^{-1}\psi(x)\right\},$$
$$\psi(x^*) + \beta_0 - \alpha_0'\,\frac{1}{\lambda^*} = 0, \qquad (6)$$
$$\phi(x^*, y^\delta) + \beta_1 - \alpha_1'\,\frac{1}{\tau^*} = 0.$$
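For smooth functionals, the last two equations in (6) are simply the stationarity conditions of (5) in $\lambda$ and $\tau$; a short check (ours, immediate from (5)):
$$\frac{\partial\mathcal{J}}{\partial\lambda} = \psi(x) + \beta_0 - \frac{\alpha_0'}{\lambda} = 0, \qquad \frac{\partial\mathcal{J}}{\partial\tau} = \phi(x, y^\delta) + \beta_1 - \frac{\alpha_1'}{\tau} = 0,$$
which rearrange to $\lambda = \alpha_0'/(\psi(x) + \beta_0)$ and $\tau = \alpha_1'/(\phi(x, y^\delta) + \beta_1)$.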

Note that the solution $x^*$ coincides with the Tikhonov solution $x_{\eta^*}$ of (3) with $\eta^* := \lambda^*/\tau^*$. Numerical experiments indicate that the estimate
$$\sigma^2(\eta^*) = \frac{1}{\tau^*} = \frac{\phi(x_{\eta^*}, y^\delta) + \beta_1}{\alpha_1'}$$
represents an excellent approximation to the exact variance $\sigma_0^2$ for the choice $\alpha_1 = \beta_1 \approx 0$, much like the highly applauded GCV estimate of the variance [31].

From the optimality system (6), the automatically determined regularization parameter $\eta^*$ satisfies
$$\eta^* := \lambda^* \cdot (\tau^*)^{-1} = \frac{\alpha_0'}{\alpha_1'} \cdot \frac{\phi(x_{\eta^*}, y^\delta) + \beta_1}{\psi(x_{\eta^*}) + \beta_0}. \qquad (7)$$
Under the premise that the estimate $\sigma^2(\eta^*)$ accurately approximates the exact variance $\sigma_0^2$, the defining relation (7) implies that the regularization parameter $\eta^*$ verifies the inequality $\eta^* \le \frac{\alpha_0'}{\beta_0}\,\sigma_0^2$. Experimentally, we have also observed that the value of the scale parameter $\lambda^* := \frac{\alpha_0'}{\psi(x_{\eta^*}) + \beta_0}$ is almost independent of the noise level $\sigma_0$ for fixed $\alpha_0'$ and $\beta_0$; thus, empirically speaking, $\eta^*$ is of the order $O(\sigma_0^2)$. However, deterministic inverse theory [11] requires a regularization parameter choice rule $\tilde\eta(\sigma_0)$ verifying
$$\lim_{\sigma_0 \to 0} \tilde\eta(\sigma_0) = 0 \quad \text{and} \quad \lim_{\sigma_0 \to 0} \frac{\sigma_0^2}{\tilde\eta(\sigma_0)} = 0 \qquad (8)$$
in order to yield a valid regularizing scheme, i.e. one whose inverse solution converges to the exact one as the noise level $\sigma_0$ diminishes to zero. Therefore, the g-Tikhonov method (5) is bound to under-regularize the inverse problem (1) in the case of low noise levels, i.e. the regularization parameter $\eta^*$ is too small. Numerical findings corroborate this assertion, evidenced by under-regularization at low noise levels. One promising approach to remedy this difficulty is to rescale $\alpha_0'$ as $\sigma_0^{-d}$ with $0 < d < 2$ as $\sigma_0$ tends to zero, in order to ensure the consistency conditions dictated by equation (8). It therefore seems natural to adaptively update $\alpha_0'$ using the automatically determined noise level $\sigma^2(\eta^*)$.

Our new choice rule derives from equation (7) and the preceding arguments. Upon abusing the notation $\alpha_0'$ by identifying $\alpha_0'$ with its rescaling by the automatically determined $\sigma^2(\eta^*)$, it consists of choosing the regularization parameter $\eta^*$ by the rule
$$\eta^* = \frac{\alpha_0'}{\psi(x_{\eta^*}) + \beta_0} \cdot \left(\frac{\phi(x_{\eta^*}, y^\delta)}{\alpha_1'}\right)^{-d} \cdot \frac{\phi(x_{\eta^*}, y^\delta)}{\alpha_1'} = \alpha\,\frac{\phi(x_{\eta^*}, y^\delta)^{1-d}}{\psi(x_{\eta^*}) + \beta_0}, \qquad 0 < d < 1,$$
where $\alpha = \frac{\alpha_0'}{(\alpha_1')^{1-d}}$ is some constant. Here we have dropped the constant $\beta_1$, since it has only a marginal practical impact on the solution procedure so long as its value is small. The rationale for invoking the exponent $(1-d)$ is to adaptively update the parameter $\alpha_0'$ using the automatically detected noise level so that $\alpha_0' \sim O(\sigma_0^{-2d})$ as the noise level $\sigma_0$ decreases to zero, in the hope of verifying the consistency conditions dictated by equation (8). This choice rule is plausible provided that the estimate $\sigma^2(\eta^*)$ agrees reasonably well with the exact noise level $\sigma_0^2$. In brief, we have arrived at the desired choice rule, which selects the regularization parameter according to the following nonlinear equation in $\eta$ with $0 < d < 1$:
$$\eta\,(\psi(x_\eta) + \beta_0) = \alpha\,\phi(x_\eta, y^\delta)^{1-d}, \qquad (9)$$
for which we shall propose an effective iterative algorithm that converges monotonically.
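To see, at least formally, why (9) restores the consistency conditions (8), the following verification (ours, under the heuristic $\phi(x_\eta, y^\delta) \approx \sigma_0^2$ and $\psi(x_\eta) + \beta_0 \approx \mathrm{const}$ used above) may help:
$$\eta^* = \frac{\alpha\,\phi(x_{\eta^*}, y^\delta)^{1-d}}{\psi(x_{\eta^*}) + \beta_0} \sim \sigma_0^{2(1-d)} \to 0 \quad \text{and} \quad \frac{\sigma_0^2}{\eta^*} \sim \sigma_0^{2d} \to 0 \quad \text{as } \sigma_0 \to 0,$$
and both limits hold precisely when $0 < d < 1$.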

We emphasize that the newly proposed choice rule (9) can also be regarded as an adaptive strategy for updating the parameter $\alpha_0'$ in the g-Tikhonov functional. The specialization of the choice rule (9) to the $L^2$-$L^2$ functional might also be used as a systematic strategy for adapting the parameter $\nu$ of a fixed-point algorithm proposed in [5], which numerically implements a local minimum criterion [26, 11] for choosing the regularization parameter.

Selection of the parameters $\beta_0$, $\alpha$ and $d$ in (9). Before proceeding to the analysis of the new choice rule based on (9), we give some practical guidelines on choosing the parameters $\beta_0$, $\alpha$ and $d$. The parameter $\beta_0$ plays only an insignificant role as long as its value is kept sufficiently small so that the term $\psi(x_\eta)$ dominates. In practice, we have observed that the numerical results are practically identical for $\beta_0$ varying over a wide range, e.g. $[1\times 10^{-10}, 1\times 10^{-3}]$. Numerical experiments indicate that for the finite-dimensional inverse problem (4), small values of $\alpha_0$ and $\alpha_1$ work well for inverse problems with 5% relative noise in the data. Therefore, values of $\alpha_0' \approx \frac{m}{2}$ and $\alpha_1' \approx \frac{n}{2}$ suffice in this case, which consequently indicates that $\alpha_0' = \frac{1}{2}$ and $\alpha_1' = \frac{1}{2}$ should suffice for the continuous analogue if $m \approx n$, i.e. the ratio $\alpha_0'/\alpha_1'$ should maintain the order $1$ in equation (7). With these experimental observations in mind, the constant $\alpha$ in (9) should be of order one, but scaled appropriately by the magnitude of the data to account for the rescaling $\phi(x_{\eta^*}, y^\delta)^{-d}$. This can be roughly achieved by rescaling its value by $\max_i |y_i|^{2d}$ and $\max_i |y_i|^{d}$ in the case of $L^2$ and $L^1$ data fitting, respectively. The optimal value of the exponent $d$ depends on the source condition verified by the exact solution $x^+$ (see Theorems 3.4 and 3.5), and typically we choose its value in the range $[\frac{1}{3}, \frac{1}{2}]$. These guidelines on the selection of the parameters $\beta_0$, $\alpha$ and $d$ are simple and easy to realize in numerical implementations, and have worked very well for all five benchmark problems, ranging from mildly to severely ill-posed inverse problems; see Section 5 for details.
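As a concrete illustration of these guidelines, the following minimal Python sketch assembles default values of $\beta_0$, $\alpha$ and $d$ for $L^2$ data fitting; the helper name and the specific defaults ($\beta_0 = 10^{-8}$, $d = 1/2$) are our illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def default_parameters(y, d=0.5):
    """Assemble (alpha, beta0, d) for the choice rule (9), L2 data fitting.

    Follows the guidelines above: beta0 small (anywhere in [1e-10, 1e-3]
    behaves similarly), d in [1/3, 1/2] depending on the source condition,
    and alpha of order one rescaled by max_i |y_i|**(2*d) to account for
    the factor phi(x, y)**(-d)."""
    beta0 = 1e-8
    alpha = np.max(np.abs(y)) ** (2 * d)  # for L1 fitting use exponent d instead
    return alpha, beta0, d
```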

3 Existence and error estimates

This section shows the existence of solutions to the regularization parameter equation (9), and derives a posteriori error estimates for the Tikhonov solution $x_{\eta^*}$ of (3). We make the following assumptions on the functionals $\phi(x, y^\delta)$ and $\psi(x)$.

Assumption 3.1 Assume that the nonnegative functionals $\phi(x, y^\delta)$ and $\psi(x)$ satisfy:

(a) For any $\eta > 0$, the functional $J_\eta(x)$ defined in (3) is coercive on $\mathcal{X}$, i.e. $J_\eta(x) \to +\infty$ as $\|x\|_{\mathcal{X}} \to \infty$.

(b) The functionals $\phi(x, y^\delta)$ and $\psi(x)$ are weakly lower semicontinuous, i.e.
$$\phi(x, y^\delta) \le \liminf_{n\to\infty} \phi(x_n, y^\delta) \quad \text{and} \quad \psi(x) \le \liminf_{n\to\infty} \psi(x_n)$$
for any sequence $\{x_n\}_n \subset \mathcal{X}$ converging weakly to $x$.

(c) There exists an $\tilde{x}$ such that $\psi(\tilde{x}) = 0$.

Assumptions 3.1(a) and (b) are standard for ensuring the existence of a minimizer of the Tikhonov functional $J_\eta$ [11].

Note that the minimizers $x_\eta$ of the Tikhonov functional $J_\eta$ in (3) might be nonunique; thus the functions $\phi(x_\eta, y^\delta)$ and $\psi(x_\eta)$ might be multi-valued. We will need the next lemma on the monotonicity of these functions with respect to $\eta$.

Lemma 3.1 Let $x_{\eta_1}$ and $x_{\eta_2}$ be minimizers of the Tikhonov functional $J_\eta(x)$ in (3) with the regularization parameters $\eta_1$ and $\eta_2$, respectively. Then we have
$$(\psi(x_{\eta_1}) - \psi(x_{\eta_2}))(\eta_1 - \eta_2) \le 0,$$
$$(\phi(x_{\eta_1}, y^\delta) - \phi(x_{\eta_2}, y^\delta))(\eta_1 - \eta_2) \ge 0.$$

Proof. By the minimizing properties of $x_{\eta_1}$ and $x_{\eta_2}$, we have
$$\phi(x_{\eta_1}, y^\delta) + \eta_1\psi(x_{\eta_1}) \le \phi(x_{\eta_2}, y^\delta) + \eta_1\psi(x_{\eta_2}),$$
$$\phi(x_{\eta_2}, y^\delta) + \eta_2\psi(x_{\eta_2}) \le \phi(x_{\eta_1}, y^\delta) + \eta_2\psi(x_{\eta_1}).$$
Adding these two inequalities gives the first assertion, and the second can be derived analogously. □

The minimizer $x_\eta$ of the Tikhonov functional $J_\eta$ in (3) is a nonlinear function of the regularization parameter $\eta$; therefore equation (9) is a nonlinear equation in $\eta$. Assisted by Lemma 3.1, we are now ready to give an existence result for equation (9).

Theorem 3.1 Assume that the functions $\phi(x_\eta, y^\delta)$ and $\psi(x_\eta)$ are continuous with respect to $\eta$. Then there exists at least one positive solution to equation (9) if $\lim_{\eta\to 0^+} \phi(x_\eta, y^\delta) > 0$.

Proof. Define
$$f(\eta) = \eta\,(\psi(x_\eta) + \beta_0) - \alpha\,\phi(x_\eta, y^\delta)^{1-d}; \qquad (10)$$
then the nonnegativity of the functionals $\phi(x, y^\delta)$ and $\psi(x)$ and Lemma 3.1 imply that
$$f(\eta) \ge \beta_0\,\eta - \alpha\,\phi(x_\infty, y^\delta)^{1-d},$$
from which we derive that
$$\lim_{\eta\to\infty} f(\eta) = +\infty.$$
Note that by Lemma 3.1, $\phi(x_\eta, y^\delta)$ decreases monotonically as $\eta$ decreases, and by Assumption 3.1 it is bounded from below. Therefore, the following limiting process makes sense, and
$$\lim_{\eta\to 0^+} f(\eta) = \lim_{\eta\to 0^+}\left(-\alpha\,\phi(x_\eta, y^\delta)^{1-d}\right) < 0$$
by the assumption that $\lim_{\eta\to 0^+} \phi(x_\eta, y^\delta)$ is positive. By the continuity of the functions $\phi(x_\eta, y^\delta)$ and $\psi(x_\eta)$ with respect to $\eta$, we conclude that there exists at least one positive solution to equation (9). □

Remark 3.1 The existence of a solution also follows from the convergence of the fixed point algorithm; see Theorem 4.1. Moreover, the existence of a positive solution can be ensured for the following relaxation of equation (9):
$$\eta\,(\psi(x_\eta) + \beta_0) = \alpha\,(\phi(x_\eta, y^\delta) + \beta_1)^{1-d}, \qquad 0 < d < 1,$$
where $\beta_1$ acts as a relaxation parameter and is usually taken much smaller than the magnitude of $\phi(x_\eta, y^\delta)$.



Remark 3.2 The proposed choice rule (9) also generalizes the zero-crossing method for the $L^2$-$L^2$ functional, which seeks the solution of the nonlinear equation
$$-\phi(x_{\eta^*}, y^\delta) + \eta^*\,\psi(x_{\eta^*}) = 0.$$
It is obtained by setting $d = 0$ and $\alpha = 1$ in equation (9). The zero-crossing method is popular in the biomedical engineering community; for some analysis of the method, we refer to [20].

Theorem 3.1 relies crucially on the continuity of the functions $\phi(x_\eta, y^\delta)$ and $\psi(x_\eta)$ with respect to the regularization parameter $\eta$. Lemma 3.1 indicates that these functions are monotone, and thus differentiable almost everywhere. The following theorem gives one sufficient condition for continuity.

Theorem 3.2 Suppose that the functional $J_\eta$ has a unique minimizer for every $\eta > 0$. Then the functions $\phi(x_\eta, y^\delta)$ and $\psi(x_\eta)$ are continuous with respect to $\eta$.

Proof. Fix $\eta^* > 0$ and let $x_{\eta^*}$ be the unique minimizer of $J_{\eta^*}$. Let $\{\eta_j\}_j \subset \mathbb{R}^+$ converge to $\eta^*$, and consider the sequence of minimizers $\{x_{\eta_j}\}_j$. Observe that
$$\phi(x_{\eta_j}, y^\delta) + \eta_j\psi(x_{\eta_j}) \le \phi(\tilde{x}, y^\delta) + \eta_j\psi(\tilde{x}) = \phi(\tilde{x}, y^\delta).$$
This implies that the sequences $\{\phi(x_{\eta_j}, y^\delta)\}_j$ and $\{\psi(x_{\eta_j})\}_j$ are uniformly bounded. By Assumption 3.1(a), the sequence $\{x_{\eta_j}\}_j$ is uniformly bounded. Therefore, there exists a subsequence of $\{x_{\eta_j}\}_j$, also denoted by $\{x_{\eta_j}\}_j$, such that $x_{\eta_j} \to x^*$ weakly. By Assumption 3.1(b) on the weak lower semicontinuity of the functionals, we have
$$\phi(x^*, y^\delta) \le \liminf_{j\to\infty} \phi(x_{\eta_j}, y^\delta) \quad \text{and} \quad \psi(x^*) \le \liminf_{j\to\infty} \psi(x_{\eta_j}). \qquad (11)$$
Hence, we arrive at
$$J_{\eta^*}(x^*) = \phi(x^*, y^\delta) + \eta^*\psi(x^*) \le \liminf_{j\to\infty} \phi(x_{\eta_j}, y^\delta) + \liminf_{j\to\infty} \eta_j\psi(x_{\eta_j}) \le \liminf_{j\to\infty} J_{\eta_j}(x_{\eta_j}).$$
Next we show $J_{\eta^*}(x_{\eta^*}) \ge \limsup_{j\to\infty} J_{\eta_j}(x_{\eta_j})$. To see this,
$$\limsup_{j\to\infty} J_{\eta_j}(x_{\eta_j}) \le \limsup_{j\to\infty} J_{\eta_j}(x_{\eta^*}) = \lim_{j\to\infty} J_{\eta_j}(x_{\eta^*}) = J_{\eta^*}(x_{\eta^*})$$
by the fact that $x_{\eta_j}$ is a minimizer of $J_{\eta_j}$. Consequently,
$$\limsup_{j\to\infty} J_{\eta_j}(x_{\eta_j}) \le J_{\eta^*}(x_{\eta^*}) \le J_{\eta^*}(x^*) \le \liminf_{j\to\infty} J_{\eta_j}(x_{\eta_j}). \qquad (12)$$
We thus see that $x^*$ is a minimizer of $J_{\eta^*}$, and by the uniqueness of the minimizer of $J_{\eta^*}$ we deduce that $x^* = x_{\eta^*}$, so the whole sequence $\{x_{\eta_j}\}_j$ converges weakly to $x_{\eta^*}$. Consequently, the function $J_\eta(x_\eta)$ is continuous with respect to $\eta$. Next we show that $\psi(x_{\eta_j}) \to \psi(x_{\eta^*})$, for which it suffices to show that
$$\limsup_{j\to\infty} \psi(x_{\eta_j}) \le \psi(x_{\eta^*}).$$
Assume that this does not hold. Then there exists a constant $c := \limsup_{j\to\infty} \psi(x_{\eta_j}) > \psi(x_{\eta^*})$, and there exists a subsequence of $\{x_{\eta_j}\}_j$, denoted by $\{x_n\}_n$, such that
$$x_n \to x_{\eta^*} \text{ weakly}, \quad \text{and} \quad \psi(x_n) \to c.$$
As a consequence of (12), we have
$$\lim_{n\to\infty} \phi(x_n, y^\delta) = \phi(x_{\eta^*}, y^\delta) + \eta^*\psi(x_{\eta^*}) - \lim_{n\to\infty} \eta_n\psi(x_n) = \phi(x_{\eta^*}, y^\delta) + \eta^*(\psi(x_{\eta^*}) - c) < \phi(x_{\eta^*}, y^\delta).$$
This contradicts (11). Therefore $\limsup_{j\to\infty} \psi(x_{\eta_j}) \le \psi(x_{\eta^*})$, and the function $\psi(x_\eta)$ is continuous with respect to $\eta$. The continuity of $\phi(x_\eta, y^\delta)$ then follows from the continuity of $J_\eta(x_\eta)$ and $\psi(x_\eta)$. □

The following corollary is a direct consequence of Theorem 3.2, and is also of independent interest.


Corollary 3.1 The functional value $J_\eta(x_\eta)$ is always continuous with respect to $\eta$. The multi-valued functions $\phi(x_\eta, y^\delta)$ and $\psi(x_\eta)$ share the same discontinuity set, which is at most countable.

Proof. The continuity of the functional value follows directly from the proof of Theorem 3.2, and it consequently implies that $\phi(x_\eta, y^\delta)$ and $\psi(x_\eta)$ share the same discontinuity set. The fact that the discontinuity set is countable follows from the monotonicity of $\phi(x_\eta, y^\delta)$ and $\psi(x_\eta)$; see Lemma 3.1. □

Remark 3.3 The continuity result also holds for non-reflexive spaces, e.g. the BV space. For a proof for the $L^2$-TV formulation, we refer to reference [9]. However, the uniqueness assumption is necessary in general; one counterexample is the $L^1$-TV formulation [9]. Theorem 3.2 remains valid in the presence of a convex constraint set $C$ or a nonlinear operator $K$.

We now establish a variational characterization of the regularization parameter choice rule (9). To this end, we introduce the functional $G$ by
$$G(x) = \begin{cases} \dfrac{1}{d}\,\phi(x, y^\delta)^d + \alpha\ln(\beta_0 + \psi(x)), & 0 < d < 1,\\[4pt] \ln\phi(x, y^\delta) + \alpha\ln(\beta_0 + \psi(x)), & d = 0. \end{cases}$$
Clearly, the existence of a minimizer of the functional $G$ follows directly from Assumption 3.1, as $G$ is bounded from below, coercive and weakly lower semicontinuous. Under the premise that the functionals $\phi(x, y^\delta)$ and $\psi(x)$ are differentiable, a critical point $x^*$ of the functional $G$ satisfies
$$\phi(x^*, y^\delta)^{d-1}\,\phi'(x^*, y^\delta) + \frac{\alpha}{\psi(x^*) + \beta_0}\,\psi'(x^*) = 0.$$
Setting $\eta^* = \alpha\,\frac{\phi(x^*, y^\delta)^{1-d}}{\psi(x^*) + \beta_0}$ gives
$$\phi'(x^*, y^\delta) + \eta^*\,\psi'(x^*) = 0,$$
i.e. $x^*$ is a critical point of the functional $J_{\eta^*}$. Furthermore, if the functional $J_{\eta^*}$ is convex, then $x^*$ is also a minimizer of $J_{\eta^*}$ and $x^* = x_{\eta^*}$. The next theorem summarizes this observation.

Theorem 3.3 If the functional $J_\eta$ is convex and the functionals $\phi(x, y^\delta)$ and $\psi(x)$ are differentiable, then a solution $x_{\eta^*}$ computed by the choice rule (9) is a critical point of the functional $G$.

Theorem 3.3 and the existence of a minimizer of the functional $G$ ensure the existence of a solution to the strategy (9). The functional $G$ provides a variational characterization of the regularization parameter choice rule (9), while the strategy (9) implements the functional $G$ via its optimality condition. There might exist better strategies for numerically implementing the functional $G$; however, this is beyond the scope of the present study.

To offer further theoretical justification of the choice rule (9), we derive a posteriori error estimates, i.e. bounds on the error between the inverse solution $x_{\eta^*}$ and the exact solution $x^+$ of (1). We consider functionals of $L^2$-$\psi$ type with $\psi$ convex, and discuss the two cases $\psi(x) = \|x\|_2^2$ and $\psi(x)$ a general convex functional separately, owing to the inherent differences between them.

We first specialize to Tikhonov regularization in Hilbert spaces [11], with the norm denoted by $\|\cdot\|$. Let $x_{\eta^*}$ be a minimizer of the Tikhonov functional in (3) with $\phi(x, y^\delta) = \|Kx - y^\delta\|_2^2$, $\psi(x) = \|x\|_2^2$, and with $\eta^*$ chosen by equation (9). We adopt the general framework of reference [11]. Let $g_\eta(t) = \frac{1}{\eta + t}$ and $r_\eta(t) = 1 - t\,g_\eta(t) = \frac{\eta}{\eta + t}$; then define $G(\eta)$ by $G(\eta) := \sup\{|g_\eta(t)| : t \in [0, \|K\|^2]\} = \frac{1}{\eta}$, and let $\omega_\mu : (0, \|K\|^2) \to \mathbb{R}$ be such that for all $\gamma \in (0, \gamma_0)$ and $t \in [0, \|K\|^2]$, $t^\mu\,|r_\gamma(t)| \le \omega_\mu(\gamma)$ holds. Then for $0 < \mu \le 1$, we have $\omega_\mu(\eta) = \eta^\mu$. Moreover, define the source sets $X_{\mu,\rho}$ by $X_{\mu,\rho} := \{x \in \mathcal{X} : x = (K^*K)^\mu w,\ \|w\| \le \rho\}$. With these preliminaries, we are ready to state one of our main results on a posteriori error estimates.

Theorem 3.4 Let $x^+$ be the minimum-norm solution to $Kx = y$, and assume that $x^+ \in X_{\mu,\rho}$ for some $0 < \mu \le 1$. Let $\delta_* := \|y^\delta - Kx_{\eta^*}\|$ and $d = \frac{2\mu}{2\mu+1}$. Then we have
$$\|x^+ - x_{\eta^*}\| \le c\left(\rho^{\frac{1}{2\mu+1}} + \frac{\delta}{\delta_*}\sqrt{\frac{\psi(x_{\eta^*}) + \beta_0}{\alpha}}\right)\max\{\delta, \delta_*\}^{\frac{2\mu}{2\mu+1}}. \qquad (13)$$


Proof. We decompose the error $x^+ - x_\eta$ as
$$x^+ - x_\eta = r_\eta(K^*K)x^+ + g_\eta(K^*K)K^*(y - y^\delta).$$
Introducing the source representer $w$ with $x^+ = (K^*K)^\mu w$, the interpolation inequality gives
$$\begin{aligned}
\|r_\eta(K^*K)x^+\| &= \|r_\eta(K^*K)(K^*K)^\mu w\|\\
&\le \|(K^*K)^{\frac{1}{2}+\mu}\,r_\eta(K^*K)w\|^{\frac{2\mu}{2\mu+1}}\,\|r_\eta(K^*K)w\|^{\frac{1}{2\mu+1}}\\
&= \|r_\eta(KK^*)Kx^+\|^{\frac{2\mu}{2\mu+1}}\,\|r_\eta(K^*K)w\|^{\frac{1}{2\mu+1}}\\
&\le c\left(\|r_\eta(KK^*)y^\delta\| + \|r_\eta(KK^*)(y^\delta - y)\|\right)^{\frac{2\mu}{2\mu+1}}\|w\|^{\frac{1}{2\mu+1}},
\end{aligned}$$
where the constant $c$ depends only on the maximum of $r_\eta$ over $[0, \|K\|^2]$. Noting the relation
$$r_{\eta^*}(KK^*)y^\delta = y^\delta - Kx_{\eta^*},$$
we obtain
$$\|r_{\eta^*}(K^*K)x^+\| \le c\,(\delta_* + c\delta)^{\frac{2\mu}{2\mu+1}}\rho^{\frac{1}{2\mu+1}} \le c_1 \max\{\delta, \delta_*\}^{\frac{2\mu}{2\mu+1}}\rho^{\frac{1}{2\mu+1}}.$$
It remains to estimate the term $\|g_{\eta^*}(K^*K)K^*(y^\delta - y)\|$. The standard estimate (see Theorem 4.2 of [11]) yields
$$\|g_{\eta^*}(K^*K)K^*(y^\delta - y)\| \le \frac{c\,\delta}{\sqrt{\eta^*}}.$$
Meanwhile, by equation (9), we have
$$\frac{1}{\sqrt{\eta^*}} = \frac{\delta_*^d}{\delta_*}\,\frac{\sqrt{\psi(x_{\eta^*}) + \beta_0}}{\sqrt{\alpha}}.$$
Therefore, we derive that
$$\|g_{\eta^*}(K^*K)K^*(y^\delta - y)\| \le c\,\frac{\delta}{\delta_*}\,\frac{\sqrt{\psi(x_{\eta^*}) + \beta_0}}{\sqrt{\alpha}}\,\delta_*^d \le c\,\frac{\delta}{\delta_*}\,\frac{\sqrt{\psi(x_{\eta^*}) + \beta_0}}{\sqrt{\alpha}}\,\max\{\delta, \delta_*\}^d.$$
Combining these two estimates and taking into account that $d = \frac{2\mu}{2\mu+1}$, we arrive at the desired a posteriori error estimate. □

Remark 3.4 The error bound (13) states that the approximation obtained from the proposed rule is order-optimal provided that $\delta_*$ is of the order of $\delta$. To this end, however, the exponent $d$ must be chosen according to the source parameter $\mu$. The knowledge of $\delta_*$ enables an a posteriori check: if $\delta_* \ll \delta$, then one should be cautious about the chosen parameter, since the prefactor $\frac{\delta}{\delta_*}$ is very large; if $\delta_* \gg \delta$, the situation is not critical and the magnitude of $\delta_*$ essentially determines the error. Numerically, the factor $\frac{\psi(x_{\eta^*}) + \beta_0}{\alpha}$ remains almost constant as the noise level $\sigma_0^2$ varies.

Next we consider functionals of $L^2$-$\psi$ type with $\psi(x)$ convex. The convergence rate analysis for inverse problems in Banach spaces is fundamentally different from that in Hilbert spaces [8]. We will use an interesting distance function, the generalized Bregman distance (cf. [8]), to measure the a posteriori error. To this end, we need the concept of a $\psi$-minimizing solution.

Definition 3.1 An element $x^+ \in \mathcal{X}$ is called a $\psi$-minimizing solution of (1) if $Kx^+ = y$ and
$$\psi(x^+) \le \psi(x) \quad \forall x \in \mathcal{X} \text{ such that } Kx = y.$$


Let us denote the subdifferential of $\psi(x)$ at $x^+$ by $\partial\psi(x^+)$, i.e.
$$\partial\psi(x^+) = \{q \in \mathcal{X}^* : \psi(x) \ge \psi(x^+) + \langle q, x - x^+\rangle,\ \forall x \in \mathcal{X}\},$$
and define the generalized Bregman distance $D_\psi(x, x^+)$ by
$$D_\psi(x, x^+) := \left\{\psi(x) - \psi(x^+) - \langle q, x - x^+\rangle : q \in \partial\psi(x^+)\right\}.$$
One can verify that if $\psi(x) = \|x\|^2$, then the generalized Bregman distance $d(x_{\eta^*}, x^+)$ reduces to the familiar formula
$$d(x_{\eta^*}, x^+) = \|x_{\eta^*} - x^+\|^2.$$
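Indeed, for $\psi(x) = \|x\|^2$ the subdifferential is the singleton $\{2x^+\}$, and a one-line check (ours, for the reader's convenience) confirms the reduction:
$$\|x\|^2 - \|x^+\|^2 - \langle 2x^+, x - x^+\rangle = \|x\|^2 - 2\langle x^+, x\rangle + \|x^+\|^2 = \|x - x^+\|^2.$$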

Now we are ready to present another a posteriori error estimate.

Theorem 3.5 Let $x^+$ be a $\psi$-minimizing solution of equation (1), and assume that the following source condition holds: there exists a $w \in \mathcal{Y}$ such that
$$K^*w \in \partial\psi(x^+).$$
Let $\delta_* = \|Kx_{\eta^*} - y^\delta\|$ and $d = \frac{1}{2}$. Then for each $x_{\eta^*}$ that solves equation (9), there exists $d(x_{\eta^*}, x^+) \in D_\psi(x_{\eta^*}, x^+)$ such that
$$d(x_{\eta^*}, x^+) \le \left(\frac{\delta}{\delta_*}\,\frac{\psi(x_{\eta^*}) + \beta_0}{\alpha} + \frac{\alpha}{\psi(x_{\eta^*}) + \beta_0}\,\|w\|^2\right)\max\{\delta, \delta_*\}.$$
Proof. Let
$$d(x_{\eta^*}, x^+) = \psi(x_{\eta^*}) - \psi(x^+) - \langle K^*w, x_{\eta^*} - x^+\rangle \in D_\psi(x_{\eta^*}, x^+).$$
By the minimizing property of $x_{\eta^*}$, $Kx^+ = y$ and $\|y - y^\delta\| = \delta$, we have
$$\frac{1}{2}\|Kx_{\eta^*} - y^\delta\|_2^2 + \eta^*\psi(x_{\eta^*}) \le \frac{\delta^2}{2} + \eta^*\psi(x^+),$$
i.e.
$$\frac{1}{2}\|Kx_{\eta^*} - y^\delta\|_2^2 + \eta^*\,d(x_{\eta^*}, x^+) + \eta^*\left[\langle w, Kx_{\eta^*} - y^\delta\rangle + \langle w, y^\delta - y\rangle\right] \le \frac{\delta^2}{2}.$$
Adding $\frac{1}{2}\eta^{*2}\|w\|^2$ to both sides of the inequality and utilizing the Cauchy-Schwarz inequality yield
$$\frac{1}{2}\|Kx_{\eta^*} - y^\delta + \eta^* w\|_2^2 + \eta^*\,d(x_{\eta^*}, x^+) \le \frac{\delta^2}{2} + \frac{\eta^{*2}}{2}\|w\|^2 + \eta^*\langle w, y - y^\delta\rangle \le \delta^2 + \|w\|^2\eta^{*2}.$$
Therefore, we derive that
$$d(x_{\eta^*}, x^+) \le \frac{\delta^2}{\eta^*} + \|w\|^2\,\eta^*,$$
which, combined with equation (9), yields the desired estimate. □

4 Numerical algorithm for both the Tikhonov solution and the regularization parameter

The new choice rule requires solving the nonlinear regularization parameter equation (9) for the regularization parameter $\eta$ in order to find the Tikhonov solution $x_\eta$ through the functional $J_\eta$ in (3). A direct numerical treatment of (9) seems difficult. Motivated by the strict biconvexity structure of the g-Tikhonov functional $\mathcal{J}(x, \lambda, \tau)$, i.e. it is strictly convex in $x$ (respectively in $(\lambda, \tau)$) for fixed $(\lambda, \tau)$ (respectively $x$), we propose the following iterative algorithm for the efficient numerical realization of the proposed choice rule (9), along with the Tikhonov solution $x_\eta$ through the functional $J_\eta$ in (3).

Algorithm I. Choose an initial guess $\eta_0 > 0$, and set $k = 0$. Find $(x_k, \eta_k)$ for $k \ge 1$ as follows:

(i) Solve for $x_{k+1}$ by the Tikhonov regularization method
$$x_{k+1} = \arg\min_x\left\{\phi(x, y^\delta) + \eta_k\,\psi(x)\right\}.$$

(ii) Update the regularization parameter $\eta_{k+1}$ by
$$\eta_{k+1} = \alpha\,\frac{\phi(x_{k+1}, y^\delta)^{1-d}}{\psi(x_{k+1}) + \beta_0}.$$

(iii) Check the stopping criterion. If not converged, set $k = k + 1$ and repeat from Step (i).
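To make the iteration concrete, the following minimal Python sketch implements Algorithm I for the discrete $L^2$-$L^2$ case, where Step (i) reduces to solving the normal equations $(K^TK + \eta I)x = K^Ty^\delta$; the function name, the stopping tolerance and the parameter defaults are our illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def algorithm_I(K, y, alpha, beta0=1e-8, d=0.5, eta0=1.0,
                tol=1e-6, max_iter=100):
    """Fixed-point iteration for the choice rule (9), specialized to
    phi(x) = ||Kx - y||_2^2 and psi(x) = ||x||_2^2."""
    KtK, Kty = K.T @ K, K.T @ y
    I = np.eye(K.shape[1])
    eta = eta0
    for _ in range(max_iter):
        # Step (i): Tikhonov solution for the current eta (normal equations).
        x = np.linalg.solve(KtK + eta * I, Kty)
        phi = np.linalg.norm(K @ x - y) ** 2
        psi = np.linalg.norm(x) ** 2
        # Step (ii): update eta by the choice rule (9).
        eta_new = alpha * phi ** (1 - d) / (psi + beta0)
        # Step (iii): stop once the (monotone) parameter sequence settles.
        if abs(eta_new - eta) <= tol * eta:
            eta = eta_new
            break
        eta = eta_new
    return x, eta
```

By Lemma 4.1 below, the sequence $\{\eta_k\}_k$ produced by this loop is monotone for any starting value $\eta_0$, so the relative-change test is a safe stopping criterion.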

Before embarking on the convergence analysis of the algorithm, we mention that we have not specified the solver for the Tikhonov regularization problem in Step (i). The problem may itself be solved approximately with an iterative algorithm, e.g. the conjugate gradient method or the iteratively reweighted least-squares method. Numerically, we have found that this does not much affect the steady convergence of the algorithm, so long as the problem is solved with reasonable accuracy.

The following lemma provides an interesting and practically very important observation on the monotonicity of the sequence $\{\eta_k\}_k$ of regularization parameters generated by Algorithm I; this monotonicity is key to the demonstration of the convergence of the algorithm.

Lemma 4.1 For any initial guess $\eta_0$, the sequence $\{\eta_k\}_k$ generated by Algorithm I converges monotonically.

Proof. By the definition of $\eta_k$, we have
$$\eta_k := \alpha\,\frac{\phi(x_k, y^\delta)^{1-d}}{\psi(x_k) + \beta_0}.$$
Therefore,
$$\eta_k - \eta_{k-1} = \alpha\,\frac{\phi(x_k, y^\delta)^{1-d}}{\psi(x_k) + \beta_0} - \alpha\,\frac{\phi(x_{k-1}, y^\delta)^{1-d}}{\psi(x_{k-1}) + \beta_0} = \frac{\alpha}{D_k}\left[\mathrm{I} + \beta_0\,\mathrm{II}\right], \qquad (14)$$
where the denominator $D_k$ is defined as
$$D_k = (\psi(x_{k-1}) + \beta_0)(\psi(x_k) + \beta_0).$$
The terms I and II in the square bracket of equation (14) are respectively given by
$$\begin{aligned}
\mathrm{I} &:= \phi(x_k, y^\delta)^{1-d}\psi(x_{k-1}) - \phi(x_{k-1}, y^\delta)^{1-d}\psi(x_k)\\
&= \phi(x_k, y^\delta)^{1-d}(\psi(x_{k-1}) - \psi(x_k)) + \psi(x_k)\left(\phi(x_k, y^\delta)^{1-d} - \phi(x_{k-1}, y^\delta)^{1-d}\right),\\
\mathrm{II} &:= \phi(x_k, y^\delta)^{1-d} - \phi(x_{k-1}, y^\delta)^{1-d}.
\end{aligned}$$
We assume that $\eta_{k-1} \ne \eta_{k-2}$, since otherwise the claim is trivial. Lemma 3.1 indicates that each of the terms I and II has the same sign as $\eta_{k-1} - \eta_{k-2}$, and hence so does $\eta_k - \eta_{k-1}$; thus the sequence $\{\eta_k\}_k$ is monotone. Next we show that the sequence $\{\eta_k\}_k$ is bounded. A trivial lower bound is zero. By the minimizing property of $x_k$, we deduce
$$\phi(x_k, y^\delta) + \eta_k\psi(x_k) \le \phi(\tilde{x}, y^\delta) + \eta_k\psi(\tilde{x}),$$
where $\tilde{x} \in \mathcal{X}$ satisfies $\psi(\tilde{x}) = 0$ by Assumption 3.1(c). Consequently,
$$\phi(x_k, y^\delta) \le \phi(\tilde{x}, y^\delta).$$
Therefore, the definition of $\eta_k$ gives
$$\eta_k = \alpha\,\frac{\phi(x_k, y^\delta)^{1-d}}{\psi(x_k) + \beta_0} \le \frac{\alpha}{\beta_0}\,\phi(\tilde{x}, y^\delta)^{1-d},$$
i.e. the sequence $\{\eta_k\}_k$ is uniformly bounded, which combined with the monotonicity yields the desired convergence. □

Lemma 4.2 Assume that the functionals $\phi(x, y^\delta)$ and $\psi(x)$ are differentiable, and let $F(\eta) = \phi(x_\eta, y^\delta) + \eta\psi(x_\eta)$. Then the asymptotic convergence rate $r^*$ of the algorithm is dictated by
$$r^* := \lim_{k\to\infty}\frac{\eta^* - \eta_{k+1}}{\eta^* - \eta_k} = \frac{-\eta^* F''(\eta^*)}{\psi(x_{\eta^*}) + \beta_0}\left[(1-d)\,\alpha\,\phi(x_{\eta^*}, y^\delta)^{-d} + 1\right].$$

Proof. Differentiating $F(\eta)$ with respect to $\eta$ gives
$$F'(\eta) = \psi(x_\eta) + \left(\phi'(x_\eta, y^\delta) + \eta\,\psi'(x_\eta)\right)\frac{\mathrm{d}x_\eta}{\mathrm{d}\eta},$$
which, taking into account the optimality condition for $x_\eta$, gives
$$\psi(x_\eta) = F'(\eta) \quad \text{and} \quad \phi(x_\eta, y^\delta) = F(\eta) - \eta F'(\eta).$$
The asymptotic convergence rate $r^*$ of the algorithm is therefore dictated by
$$r^* := \lim_{k\to\infty}\frac{\eta^* - \eta_{k+1}}{\eta^* - \eta_k} = \left.\frac{\mathrm{d}}{\mathrm{d}\eta}\left(\alpha\,\frac{\phi(x_\eta, y^\delta)^{1-d}}{\psi(x_\eta) + \beta_0}\right)\right|_{\eta=\eta^*} = \alpha\left.\frac{\mathrm{d}}{\mathrm{d}\eta}\frac{[F(\eta) - \eta F'(\eta)]^{1-d}}{F'(\eta) + \beta_0}\right|_{\eta=\eta^*}$$
$$= \alpha\,\frac{[F(\eta^*) - \eta^* F'(\eta^*)]^{-d}\,F''(\eta^*)\left[-(1-d)\,\eta^*(F'(\eta^*) + \beta_0) - (F(\eta^*) - \eta^* F'(\eta^*))\right]}{(F'(\eta^*) + \beta_0)^2}$$
$$= \frac{-\eta^* F''(\eta^*)}{\psi(x_{\eta^*}) + \beta_0}\left[(1-d)\,\alpha\,\phi(x_{\eta^*}, y^\delta)^{-d} + 1\right].$$
This establishes the lemma. □

Remark 4.1 For the special case d = 0, the expression of the rate r* in Lemma 4.2 simplifies to
$$r^* = (1 + \alpha)\,\frac{-\eta^* F''(\eta^*)}{\psi(x_{\eta^*}) + \beta_0}.$$
The established monotone convergence of the sequence {η_k}_k implies that r* ≤ 1; however, a precise estimate of the rate r* is still missing. Nonetheless, fast convergence is always observed numerically.
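In practice the rate can be checked empirically from the recorded iterates. A minimal MATLAB sketch, assuming the vector eta_hist of η iterates returned by the eta_fixed_point sketch above:

    % Empirical estimate of the asymptotic rate r* (sketch); eta_hist is
    % the assumed record of the eta iterates, with eta_hist(end) ~ eta*.
    eta_star = eta_hist(end);                 % proxy for the limit eta*
    gap = abs(eta_star - eta_hist(1:end-1));  % distances to the limit
    r   = gap(2:end) ./ gap(1:end-1);         % successive ratios tend to r*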

Def<strong>in</strong>ition 4.1 [22] A functional ψ(x) is said to have the H-property on the space X if any sequence<br />

{x n } n ⊂ X weakly converg<strong>in</strong>g to a limit x 0 ∈ X and converg<strong>in</strong>g to x 0 <strong>in</strong> functional, i.e. ψ(x n ) → ψ(x 0 ),<br />

strongly converges to x 0 <strong>in</strong> X .<br />

This property is also known as the Efimov-Stechk<strong>in</strong> condition or the Kadec-Klee property <strong>in</strong> the literature.<br />

Norms and semi-norms on Hilbert spaces, and norms the spaces L p (Ω) and Sobolev spaces W m,p (Ω) with<br />

1 < p < ∞ and m ≥ 1 satisfy the H-property.<br />

Assisted by Lemma 4.1, we are now ready to prove the convergence of Algorithm I.

Theorem 4.1 Assume that η* > 0. Then every subsequence of the sequence {(x_k, η_k)}_k generated by Algorithm I has a further subsequence converging weakly to a solution (x*, η*) of equation (9), and the convergence of the sequence {η_k}_k is monotonic. If the minimizer of J_{η*}(x) is unique, the whole sequence converges weakly. Moreover, if the functional ψ(x) satisfies the H-property, the weak convergence is actually strong.



Proof. Lemma 4.1 shows that there exists some η* such that
$$\lim_{k\to\infty}\eta_k = \eta^* > 0.$$
By Lemma 3.1 and the monotonicity of the sequence {η_k}_k, we deduce that the sequences {φ(x_k, y^δ)}_k and {ψ(x_k)}_k are monotonic. By η* > 0 and Assumption 3.1, we observe that
$$0 \le \phi(x_k, y^\delta) \le \phi(\tilde{x}, y^\delta), \qquad 0 \le \psi(x_k) \le \max\{\psi(x_{\eta_0}), \psi(x_{\eta^*})\}.$$
Therefore, the sequences {φ(x_k, y^δ)}_k and {ψ(x_k)}_k are monotonically convergent. By Assumption 3.1(a), the sequence {x_k}_k is uniformly bounded, and there exist a subsequence of {x_k}_k, also denoted by {x_k}_k, and some x* ∈ X such that
$$x_k \to x^* \quad\text{weakly}.$$
The minimizing property of x_k gives
$$\phi(x_k, y^\delta) + \eta_{k-1}\,\psi(x_k) \le \phi(x, y^\delta) + \eta_{k-1}\,\psi(x), \quad \forall\, x \in X.$$
Letting k tend to ∞ and using the weak lower semicontinuity of φ and ψ, we have
$$\phi(x^*, y^\delta) + \eta^*\,\psi(x^*) \le \phi(x, y^\delta) + \eta^*\,\psi(x), \quad \forall\, x \in X,$$
i.e., x* is a minimizer of the Tikhonov functional J_{η*}. Therefore, the element (x*, η*) satisfies equation (9). Now if the minimizer of the functional J_{η*} is unique, the whole sequence {x_k}_k converges weakly to x*. Recall the monotone convergence
$$\lim_{k\to\infty}\psi(x_k) = c^*$$
for some constant c*. Next we show that c* = ψ(x*). By the lower semicontinuity of ψ(x) we have
$$\psi(x^*) \le \liminf_{k\to\infty}\psi(x_k) = \lim_{k\to\infty}\psi(x_k) = c^*.$$
Assume that c* > ψ(x*); then by the continuity of the functional value J_η(x_η) with respect to η, see Corollary 3.1, we would have φ(x*, y^δ) > lim_{k→∞} φ(x_k, y^δ), which contradicts the lower semicontinuity of the functional φ(x, y^δ). Therefore, we deduce that
$$\lim_{k\to\infty}\psi(x_k) = \psi(x^*),$$
which together with the H-property of ψ(x) on the space X implies the desired strong convergence. □

Remark 4.2 In the numerical algorithm, the quantity σ²(η) can also be computed as
$$\sigma^2(\eta_k) = \frac{\phi(x_k, y^\delta)}{\alpha_1},$$
which estimates the variance σ_0² of the data noise, analogous to the celebrated generalized cross-validation [31]. By Lemmas 3.1 and 4.1, the sequence {σ²(η_k)}_k converges monotonically. One distinction of the estimate σ²(η) is that it changes very mildly during the iteration, especially for severely ill-posed inverse problems.



Table 1: Numerical examples.

  example  description         ill-posedness  Cond(K)       noise      program   φ-ψ
  1        Shaw's problem      severe         1.94 × 10^19  Gaussian   shaw      L²-L²
  2        gravity surveying   severe         9.74 × 10^18  Gaussian   gravity   L²-H² with constraint
  3        differentiation     mild           1.22 × 10^4   Gaussian   deriv2    L²-TV
  4        Phillips's problem  mild           2.64 × 10^6   Gaussian   phillips  L²-L¹
  5        deblurring          severe         2.62 × 10^12  impulsive  deblur    L¹-TV

5 Numerical experiments and discussions

This section presents numerical results for five benchmark inverse problems, adapted from Hansen's popular MATLAB package Regularization Tools [17] and ranging from mild to severe ill-posedness, to illustrate salient features of the proposed rule. These are Fredholm (or Volterra) integral equations of the first kind with kernel k(s,t) and solution x(t). The discretized linear system takes the form Kx = y^δ and is of size 100 × 100. The regularizing functional is referred to by its φ-ψ type, e.g. L¹-TV denotes the one with L¹ data fitting and TV regularization. Table 1 summarizes the major features of these examples, e.g. the degree of ill-posedness, where the notation Cond(K) denotes the condition number of the matrix K; the relevant MATLAB programs are taken from the package. Let ε be the relative noise level; we consider five noise levels, i.e. ε ∈ {5 × 10^{-2}, 5 × 10^{-3}, 5 × 10^{-4}, 5 × 10^{-5}, 5 × 10^{-6}}, graphically differentiated by distinct colors. Unless otherwise specified, the initial guess for the regularization parameter η is η_0 = 1.0 × 10^{-8}, and the values for the parameter pair (α, β_0) and the constant d are taken to be (0.1, 1 × 10^{-4}) and 1/3, respectively. The value for α follows from the rule of thumb that α_0′ ≈ 1 works well for full norms in the case of the g-Tikhonov method; it is subsequently scaled by max_i |y_i|^{2d} to compensate for the effect of this component. Vector norms are rescaled so that the estimate σ²(η) is directly comparable with the variance σ_0². The nonsmooth minimization problems arising from the L²-TV, L²-L¹ and L¹-TV formulations are solved by the iterative reweighted least-squares method [18], as sketched below.
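For the L²-TV case, such a solver can be sketched as follows; this is a minimal MATLAB illustration, with the difference matrix, the smoothing constant 1e-8 and the sweep count being illustrative assumptions rather than the exact settings of the solver used in the experiments.

    % Iterative reweighted least-squares (IRLS) sketch for the L2-TV
    % subproblem  min_x ||Kx - y||^2 + eta*|x|_TV.
    n = size(K, 2);
    D = diff(speye(n));                         % forward differences, (n-1) x n
    x = (K'*K + eta*(D'*D)) \ (K'*y);           % smooth initial solve
    for sweep = 1:20
      w = 1 ./ sqrt((D*x).^2 + 1e-8);           % smoothed weights for |t|
      W = spdiags(w, 0, n-1, n-1);
      x = (K'*K + 0.5*eta*(D'*W*D)) \ (K'*y);   % half-quadratic reweighted solve
    end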

We will refer to the newly proposed choice rule as the g-Tikhonov rule, to emphasize its intimate connection with the g-Tikhonov functional, and compare it with three other popular heuristic choice rules, i.e. the quasi-optimality (QO) criterion, generalized cross-validation (GCV) and the L-curve (LC) criterion [16]. The quasi-optimality criterion requires the differentiability of the inverse solution x_η with respect to η, and thus it might be unsuitable for nonsmooth functionals, e.g. L²-L¹, while the GCV seems not directly amenable to problems with constraints and nonsmooth functionals, due to the lack of an explicit formula for computing the 'effective' degrees of freedom of the residual. Generally, the existence of a 'corner' on the L-curve is not guaranteed. Moreover, for regularizing functionals other than the L²-L² type, the L-curve must be sampled at discrete points, and numerically locating the corner from discrete sample points is highly nontrivial.

5.1 Case 1: L²-L²

Example 1 (Shaw's problem [17]). The functions k and x are given by
$$k(s,t) = (\cos s + \cos t)^2\left(\frac{\sin u}{u}\right)^2 \quad\text{with}\quad u(s,t) = \pi(\sin s + \sin t)$$
and $x(t) = 2e^{-6(t-\frac{4}{5})^2} + e^{-2(t+\frac{1}{2})^2}$, respectively, and the integration interval is [−π/2, π/2]. The data is contaminated by additive Gaussian noise, i.e.
$$y_i^\delta = y_i + \max_{1\le i\le 100}\{|y_i|\}\,\varepsilon\,\xi_i, \quad 1 \le i \le 100,$$
where the ξ_i are standard Gaussian random variables and ε refers to the relative noise level. The variance σ_0² is related to the noise level ε by σ_0 = max_{1≤i≤100}{|y_i|} ε. The parameter α is taken to be α = 1.
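For concreteness, the test data for this example can be generated as in the following minimal MATLAB sketch, assuming the function shaw from the Regularization Tools package [17] is on the path:

    % Noisy test data for Example 1 (sketch).
    n = 100;
    [K, y, x] = shaw(n);               % kernel matrix, exact data, exact solution
    eps_rel = 5e-2;                    % relative noise level epsilon
    ydelta  = y + max(abs(y)) * eps_rel * randn(n, 1);
    sigma0  = max(abs(y)) * eps_rel;   % noise standard deviation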

The automatically determined value of the regularization parameter η depends on the realization of the random noise, and thus it is also random. The probability density p(η) is estimated from 1000 samples with a kernel density estimation technique on a logarithmic scale [19] for Example 1, and it is shown in Figure 1(a).
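Such a density estimate can be computed, for instance, as in the following sketch, assuming the Statistics Toolbox function ksdensity and a vector eta_samples collecting the 1000 computed parameters:

    % Kernel density estimate of p(eta) on a log scale (sketch).
    [p, t] = ksdensity(log10(eta_samples));   % density of log10(eta)
    semilogx(10.^t, p); xlabel('\eta'); ylabel('p(\eta)');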




Figure 1: Density of (a) η, (b) e, and (c) σ² for Example 1.

Table 2: Numerical results for Example 1.

  ε     σ_0²      σ_GCV²    σ_GT²     η_QO     η_LC      η_GCV    η_GT     λ_GT    e_QO     e_LC     e_GCV    e_GT
  5e-6  3.34e-10  2.48e-10  3.24e-10  2.87e-8  5.81e-10  2.30e-8  4.74e-7  1.01e0  3.26e-2  3.67e-2  3.28e-2  3.32e-2
  5e-5  3.31e-8   2.51e-8   2.60e-8   9.87e-6  6.27e-8   9.97e-7  8.83e-6  1.01e0  4.55e-2  5.60e-2  4.09e-2  4.53e-2
  5e-4  3.31e-6   2.52e-6   3.20e-6   3.68e-5  3.02e-6   1.70e-5  2.20e-4  1.01e0  5.33e-2  9.38e-2  5.86e-2  5.80e-2
  5e-3  3.31e-4   2.54e-4   2.72e-4   1.08e-2  1.53e-4   4.58e-4  4.34e-3  1.03e0  1.52e-1  7.48e-2  8.02e-2  1.35e-1
  5e-2  3.31e-2   2.52e-2   2.99e-2   1.02e-2  7.44e-3   1.16e-2  1.08e-1  1.13e0  1.60e-1  1.58e-1  1.61e-1  1.88e-1

In Figure 1(a), the dash-dotted, dotted, dashed and solid curves refer to results given by η_QO, η_LC, η_GCV and η_GT, respectively. For medium noise levels, all the methods except the GCV work very well; however, the variation of η_QO and η_LC is larger than that of η_GT. The GCV fails for about 10% of the samples, signified by η_GCV taking very small values, e.g. 1 × 10^{-20}. This phenomenon occurs irrespective of the noise level, and it is attributed to the fact that the GCV curve is very flat [16], see e.g. Figure 2(c). On average, η_LC decays to zero faster than η_QO and η_GT as the noise level σ_0² tends to zero. Thus for low noise levels, η_LC often takes very small values and spans a broad scale, which renders its solution plagued with spurious oscillations, see e.g. the red dotted curve in Figure 1(b). This observation concurs with previous theoretical and numerical results of Hanke [14], which suggest that the L-curve criterion may suffer from nonconvergence in the case of smooth solutions. The quasi-optimality criterion also fails occasionally, as indicated by the long tail, despite its overall robustness. Therefore, the newly proposed choice rule is more robust than the other three. The inverse solution x_{η*} is also random. We utilize the accuracy error e defined below as the error metric:
$$e = \|x_{\eta^*} - x\|_{L^2}.$$
The probability density p(e) of the accuracy error e is shown in Figure 1(b). The accuracy errors e_QO and e_GT are very similar despite the apparent discrepancies of the regularization parameters, whereas e_LC and e_GCV vary very broadly, especially at low noise levels, although the variation of e_LC is much milder than that of e_GCV. The estimates σ_GCV² and σ_GT² are practically identical, see Figure 1(c), which qualifies σ_GT² as an estimator of the variance. Interestingly, σ_GCV² can slightly under-estimate the noise level σ_0² compared with σ_GT², due to the exceedingly small regularization parameters chosen by the GCV.


Figure 2: (a) convergence of σ², λ, η and e, (b) solution, and (c) GCV curve for Example 1 with ε = 5%.




Figure 3: Density of (a) η, (b) e, and (c) σ² for Example 2.

Next we <strong>in</strong>vestigate the convergence of Algorithm I <strong>for</strong> a particular realization of the noise. The<br />

numerical results are summarized <strong>in</strong> Figure 2 and Table 2. The algorithm converges with<strong>in</strong> five iterations,<br />

and thus it merits a fast convergence. Moreover, the convergence is rather steady, and a few extra<br />

iterations would not deteriorate the <strong>in</strong>verse solution. The estimate σ 2 (η) changes very little dur<strong>in</strong>g the<br />

iteration process, and a strik<strong>in</strong>g convergence with<strong>in</strong> one iteration is observed, concurr<strong>in</strong>g with previous<br />

numerical f<strong>in</strong>d<strong>in</strong>gs <strong>for</strong> severely ill-posed problems [19]. The convergence of the estimate σ 2 (η) is monotonic,<br />

substantiat<strong>in</strong>g the remark after Theorem 4.1. The estimate σGCV 2 also approximates reasonably<br />

σ0, 2 but it is less accurate than σGT 2 , see Table 2. The prefactor λ rema<strong>in</strong>s almost unchanged as the<br />

noise level varies, see Table 2, and thus η GT is <strong>in</strong>deed proportional to φ(x, y δ ) 1−d ≈ σ 2(1−d)<br />

0 = σ 4 3<br />

0 . The<br />

numerical solution rema<strong>in</strong>s accurate and stable <strong>for</strong> up to ε = 5%, see Figure 2(b).<br />

5.2 Case 2: L²-H² with constraint

Example 2 (1D gravity surveying [17] with nonnegativity constraint). The functions k and x are given by $k(s,t) = \frac{1}{4}\left(\frac{1}{16} + (s-t)^2\right)^{-\frac{3}{2}}$ and $x(t) = \sin(\pi t) + \frac{1}{2}\sin(2\pi t)$, respectively, and the integration interval is [0, 1]. The constrained optimization problems are solved by the built-in MATLAB function quadprog.
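A minimal sketch of one such constrained solve, assuming L denotes the (second-order) penalty matrix of the H² regularizing functional and ydelta the noisy data:

    % Nonnegativity-constrained subproblem  min ||Kx-y||^2 + eta*||Lx||^2,
    % x >= 0, posed as the quadratic program  min (1/2) x'Hx + f'x  (sketch).
    H  = 2 * (K'*K + eta * (L'*L));
    f  = -2 * (K' * ydelta);
    lb = zeros(size(K, 2), 1);                    % enforce x >= 0
    x  = quadprog(H, f, [], [], [], [], lb, []);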

The presence of the constraint rules out the use of the quasi-optimality criterion and the GCV, and it can also distort the shape of the L-curve greatly, so that a corner does not appear at all, e.g. in the case of the L²-L² functional. There does exist a distinct corner on the curve for the L²-H² functional, see Figure 4(c); however, it is numerically difficult to locate due to the lack of monotonicity and the discrete nature of the sampling points. This causes the frequent failure of the MATLAB functions corner and l_corner provided by the package, and visual inspection is required. This inconvenience persists for the remaining examples, and thus we do not investigate its statistical performance via computing the relevant probability densities. The results for the L-curve criterion are obtained by manually locating the corner.

However, the presence of constraints poses no difficulty for the proposed rule. Analogous to Example 1, η_GT and e_GT are narrowly distributed, see Figures 3(a) and (b), which clearly illustrates its excellent scalability with respect to the noise level. The estimate σ_GT² peaks around the exact variance σ_0², and always retains the correct magnitude, see Figure 3(c). Typical numerical results for Example 2 with ε = 5% are presented in Figure 4. A fast and steady convergence of the algorithm within five iterations is again observed, see Figure 4(a), and similar convergence behavior is observed for other noise levels. In contrast, in order for the L-curve to be representative, many points on the curve must be sampled, which effectively diminishes its computational efficiency. The numerical solution is in good agreement with the exact one, see Figure 4(b) and Table 3. The accuracy error e_GT improves steadily as the noise level ε decreases, and it compares very favorably with e_LC, for which a nonconvergence phenomenon is observed. The prefactor λ changes very little as the noise level ε varies, and thus η_GT decays at a rate commensurate with σ_GT^{4/3}, see also Figure 3(a).



Table 3: Numerical results for Example 2.

  ε     σ_0²     σ_GT²     η_LC      η_GT      λ_GT     e_LC     e_GT
  5e-6  1.14e-9  8.62e-10  7.20e-12  3.72e-10  4.11e-4  1.71e-1  4.84e-4
  5e-5  1.14e-7  8.67e-8   7.91e-11  8.06e-9   4.12e-4  1.11e-1  7.83e-4
  5e-4  1.14e-5  8.57e-6   9.54e-9   1.74e-7   4.15e-4  6.86e-2  2.49e-2
  5e-3  1.14e-3  8.58e-4   5.18e-7   3.81e-6   4.24e-4  1.65e-2  7.20e-3
  5e-2  1.14e-1  8.69e-2   6.25e-5   1.04e-4   5.32e-4  1.10e-1  4.12e-2


Figure 4: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 2 with ε = 5%.

5.3 Case 3: L²-TV

Example 3 (numerical differentiation, adapted from deriv2 [16]). The functions k and x are given by
$$k(s,t) = \begin{cases} s(t-1), & s < t,\\ t(s-1), & s \ge t,\end{cases} \qquad x(t) = \begin{cases} 1, & \tfrac{1}{3} < t \le \tfrac{2}{3},\\ 0, & \text{otherwise},\end{cases}$$
respectively, and the integration interval is [0, 1]. The constant α is taken to be 5 × 10^{-3}, and η_0 is 1 × 10^{-6}. The exact solution x is piecewise constant, and thus TV regularization is suitable.
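For concreteness, one way to obtain the 100 × 100 system for this example is the midpoint rule; a minimal MATLAB sketch (the quadrature choice is an assumption, the experiments use the program deriv2 from the package):

    % Midpoint-rule discretization of Example 3 on [0,1] (sketch).
    n = 100;  h = 1/n;  t = ((1:n)' - 0.5) * h;   % midpoints
    [T, S] = meshgrid(t, t);                      % S: rows = s, T: cols = t
    K = h * (S.*(T-1).*(S < T) + T.*(S-1).*(S >= T));
    x = double(t > 1/3 & t <= 2/3);               % piecewise-constant solution
    y = K * x;                                    % exact data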

The regularization parameter η_GT is narrowly distributed, and the accuracy error e_GT is mostly comparable with the noise level, see Figures 5(a) and (b), respectively. The sample mean of the estimate σ_GT²(η) agrees excellently with the exact value σ_0², see Figure 5(c). For instance, in the case of ε = 5%, the mean 1.23 × 10^{-5} almost coincides with the exact value. Typical numerical results for Example 3 are summarized in Figure 6 and Table 4. The L-curve has only an ambiguous corner, and the ambiguity persists for very low noise levels, e.g. ε = 5 × 10^{-6}. Nevertheless, the regularization parameters chosen by these two rules are comparable, and the numerical results are practically identical, see Table 4. Algorithm I converges within five iterations with an empirical asymptotic convergence rate r* < 0.15 for all five noise levels, and moreover it tends to accelerate as σ_0² decreases, e.g. r* ≈ 0.05 for ε = 5 × 10^{-6}. Therefore, the algorithm is computationally very efficient. The reconstructed profile remains accurate and stable for ε up to 5%, see Figure 6(b) and Table 4. Note that it exhibits the typical staircasing of TV regularization.


Figure 5: Density of (a) η, (b) e and (c) σ² for Example 3.



Table 4: Numerical results for Example 3.

  ε     σ_0²      σ_GT²     η_LC      η_GT      λ_GT     e_LC     e_GT
  5e-6  1.19e-13  8.21e-14  6.43e-12  4.70e-10  2.49e-1  1.26e-3  5.77e-4
  5e-5  1.19e-11  8.79e-12  3.59e-9   1.06e-8   2.49e-1  5.01e-3  9.38e-3
  5e-4  1.19e-9   8.65e-10  1.26e-7   2.25e-7   2.48e-1  4.34e-2  4.93e-2
  5e-3  1.19e-7   8.60e-8   2.01e-6   4.78e-6   2.45e-1  8.68e-2  8.08e-2
  5e-2  1.19e-5   8.89e-6   7.05e-5   1.08e-4   2.51e-1  1.05e-1  9.69e-2

Figure 6: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 3 with ε = 5%.

5.4 Case 4: L²-L¹

Example 4 (Phillips's problem [17]). The kernel is given by $k(s,t) = \left[1 + \cos\big(\tfrac{\pi(s-t)}{3}\big)\right]\chi_{|t-s|<3}$, and the integration interval is [−6, 6]; the exact solution consists of three small spikes, see Figure 8(b).


Table 5: Numerical results for Example 4.

  ε     σ_0²      σ_GT²     η_LC     η_GT     λ_GT    e_LC     e_GT
  5e-6  5.73e-12  3.59e-12  2.85e-9  3.90e-8  1.66e0  4.31e-4  5.64e-4
  5e-5  5.73e-10  3.81e-10  1.00e-8  8.74e-7  1.66e0  1.05e-2  6.70e-3
  5e-4  5.73e-8   3.94e-8   1.87e-7  1.92e-5  1.66e0  7.08e-2  3.12e-2
  5e-3  5.73e-6   4.10e-6   2.85e-5  4.25e-4  1.66e0  1.88e-1  5.91e-2
  5e-2  5.73e-4   4.27e-4   2.85e-3  9.45e-3  1.67e0  1.51e-1  6.88e-2


Figure 8: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 4 with ε = 5%.

The prefactor λ remains nearly constant over all five noise levels, and thus η_GT scales as σ_GT^{4/3}. The solution shows the feature of sparsity-promoting L¹ regularization: the locations of all three small spikes are perfectly detected, and the retrieved magnitudes are reasonably accurate.

5.5 Case 5: L¹-TV

Example 5 (deblurring a 1D image). The function k is the truncated Gaussian blurring kernel $k(s,t) = \frac{1}{4\sqrt{2\pi}}\,e^{-\frac{(s-t)^2}{32}}\,\chi_{|s-t|<\cdots}$ (cf. the program deblur in Table 1).


Table 6: Numerical results for Example 5.

  ε     q    σ_0²     σ_GT²    η_LC     η_GT     λ_GT    e_LC     e_GT
  1e-4  30%  2.18e-5  1.65e-5  5.34e-2  2.02e-2  4.97e0  4.03e-7  2.96e-7
  1e-3  30%  2.18e-4  1.64e-4  8.11e-2  6.38e-2  4.97e0  5.93e-7  4.68e-7
  1e-2  30%  2.18e-3  1.64e-3  4.33e-1  2.02e-1  4.97e0  8.18e-5  2.88e-6
  1e-1  30%  2.18e-2  1.64e-2  1.00e0   6.38e-1  4.97e0  2.93e-3  5.62e-4
  1e0   30%  2.18e-1  1.65e-1  1.52e0   2.02e0   4.97e0  9.42e-3  1.63e-2
  1e-4  50%  3.63e-5  3.27e-5  1.23e-1  2.84e-2  4.97e0  1.74e-4  3.77e-4
  1e-3  50%  3.63e-4  3.27e-4  1.23e-1  9.00e-2  4.97e0  1.03e-3  1.20e-3
  1e-2  50%  3.63e-3  3.28e-3  6.58e-1  2.85e-1  4.97e0  1.68e-2  8.58e-3
  1e-1  50%  3.63e-2  3.28e-2  6.58e-1  9.01e-1  4.97e0  2.69e-2  3.53e-2
  1e0   50%  3.63e-1  3.27e-1  1.00e0   2.85e0   4.97e0  3.81e-2  6.39e-2


Figure 10: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 5 with ε = 1 and q = 50%.

e.g. [2, 6] <strong>in</strong> case of ε = 1 and q = 50%, see Figure 9(a), where the solid and dashed curves refer to q = 50%<br />

and q = 30%, respectively. However, the accuracy error e varies broadly, albeit mostly it is very small.<br />

This is <strong>in</strong>herent to the L 1 -T V functional [9] <strong>for</strong> which η plays the role of a characteristic parameter, and<br />

at some specific values the profile of the solution undergoes sudden transition. The mean of the estimate<br />

σGT 2 approximates the exact noise level σ2 0 excellently, e.g. 3.62×10 −2 <strong>in</strong> case of ε = 1×10 −1 and q = 50%,<br />

which co<strong>in</strong>cides with the exact noise level, see Table 6. There<strong>for</strong>e, the estimate σGT 2 is a valid variance<br />

estimate <strong>for</strong> non-Gaussian noise. Note that the estimate σGT 2 <strong>for</strong> q = 30% is generally less accurate than<br />

that <strong>for</strong> q = 50% due to the less effective samples available, as <strong>in</strong>dicated by the looser probability bound.<br />

Interest<strong>in</strong>gly, the numerical results of the <strong>for</strong>mer are far more accurate than that of the latter, usually by<br />

one order <strong>in</strong> magnitude, which clearly illustrates the profound effect of the corruption percentage q on<br />

the <strong>in</strong>verse solution.<br />

Exemplary numerical results for Example 5 with ε = 1 and q = 50% noise in the data are presented in Figure 10. The L-curve does not exhibit the typical L-shape, and there is no distinct corner, see Figure 10(c), which makes it ambiguous in practical use. This concurs with previous theoretical and numerical observations in the context of image denoising [9]. Moreover, it lacks concavity and monotonicity due to the limited precision of the solver for the optimization problem, and there are many points clustering around the 'corner', which renders its accurate numerical location very challenging. Practically, the discontinuity points, as indicated by the small steps along the curve, may serve to select a regularization parameter, and the lower step is selected. The results presented in Table 6 for the L-curve criterion are obtained with this rule of thumb. The numerical results are accurate, and thus the ad hoc rule seems viable. A steady and fast convergence of the algorithm is also observed, and taking into account the large amount of data noise, the converged profile approximates the exact one excellently.



6 Concluding remarks

This paper proposes and analyzes a new rule for choosing the regularization parameter in Tikhonov regularization. The existence of solutions to the regularization parameter equation is shown, a variational characterization is provided, and a posteriori error estimates of the inverse solution are derived. An effective iterative algorithm is suggested, and its monotone convergence is investigated. Results of five regularizing formulations, i.e. L²-L², L²-H² with constraint, L²-TV, L²-L¹ and L¹-TV, for several benchmark examples are presented to illustrate its numerical features. The numerical results indicate that the proposed rule is competitive with existing heuristic choice rules, e.g. the L-curve criterion and generalized cross-validation, and that the algorithm enjoys fast and steady convergence. Unlike most existing choice rules, solid mathematical justification is established for the new choice rule.

There are several avenues for future research, which are currently under investigation. For large-scale inverse problems, iterative regularization methods, e.g. the Landweber method, the conjugate gradient method and the generalized minimal residual method, are of immense practical interest, and it would be of great practical significance to develop analogous choice rules for these methods. Secondly, some applications require multiple regularization terms for preserving several distinct features simultaneously and consequently call for multiple regularization parameters; extending the idea to this context would be interesting.

Acknowledgements

The authors would like to thank Professor Per Christian Hansen for his MATLAB package Regularization Tools, with which some of our numerical experiments were conducted.

References

[1] Archer G, Titterington DM. On some Bayesian/regularization methods for image restoration. IEEE Transactions on Image Processing 1995;4(7): 989–995.
[2] Aubert G, Aujol JF. A variational approach to remove multiplicative noise. SIAM Journal on Applied Mathematics 2008;68(4): 925–946.
[3] Bauer F, Kindermann S. The quasi-optimality criterion for classical inverse problems. Inverse Problems 2008;24(3): 035002.
[4] Banks HT, Kunisch K. Estimation Techniques for Distributed Parameter Systems. Birkhäuser: Boston; 1989.
[5] Bazán FSV. Fixed point iterations in determining the Tikhonov regularization parameter. Inverse Problems 2008;24(3): 035001.
[6] Beck JV, Blackwell B, Clair CRS. Inverse Heat Conduction: Ill-Posed Problems. Wiley: New York; 1985.
[7] Besag J. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B (Methodological) 1986;48(3): 259–302.
[8] Burger M, Osher S. Convergence rates of convex variational regularization. Inverse Problems 2004;20(5): 1411–1421.
[9] Chan TF, Esedoḡlu S. Aspects of total variation regularized L¹ function approximation. SIAM Journal on Applied Mathematics 2005;65(5): 1817–1837.
[10] Chan TF, Shen J. Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. SIAM: Philadelphia; 2005.
[11] Engl HW, Hanke M, Neubauer A. Regularization of Inverse Problems. Kluwer: Dordrecht; 1996.
[12] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis (2nd ed.). Chapman & Hall: Boca Raton; 2004.
[13] Golub GH, Heath MT, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 1979;21(2): 215–223.
[14] Hanke M. Limitations of the L-curve method in ill-posed problems. BIT 1996;36(2): 287–301.
[15] Hansen PC. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review 1992;34(4): 561–580.
[16] Hansen PC. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM: Philadelphia; 1998.
[17] Hansen PC. Regularization Tools Version 4.0 for Matlab 7.3. Numerical Algorithms 2007;46(2): 189–194. Software available at: http://www2.imm.dtu.dk/~pch/Regutools/index.html.
[18] Hintermüller M, Ito K, Kunisch K. The primal-dual active set strategy as a semismooth Newton method. SIAM Journal on Optimization 2003;13(3): 865–888.
[19] Jin B, Zou J. A Bayesian inference approach to the ill-posed Cauchy problem of steady-state heat conduction. International Journal for Numerical Methods in Engineering, in press. Available at: http://doi.wiley.com/10.1002/nme.2350.
[20] Johnston PR, Gulrajani RM. An analysis of the zero-crossing method for choosing regularization parameters. SIAM Journal on Scientific Computing 2002;24(2): 428–442.
[21] Kunisch K, Zou J. Iterative choices of regularization parameters in linear inverse problems. Inverse Problems 1998;14(5): 1247–1264.
[22] Leonov AS. Regularization of ill-posed problems in Sobolev space W¹₁. Journal of Inverse and Ill-Posed Problems 2005;13(6): 595–619.
[23] Natterer F. The Mathematics of Computerized Tomography. Teubner: Stuttgart; 1986.
[24] Nikolova M. Minimizers of cost-functions involving non-smooth data-fidelity terms: application to the processing of outliers. SIAM Journal on Numerical Analysis 2002;40(3): 965–994.
[25] Morozov VA. On the solution of functional equations by the method of regularization. Soviet Mathematics Doklady 1966;7(3): 414–417.
[26] Reginska T. A regularization parameter in discrete ill-posed problems. SIAM Journal on Scientific Computing 1996;17(3): 740–749.
[27] Thompson AM, Kay J. On some choices of regularization parameter in image restoration. Inverse Problems 1993;9(6): 749–761.
[28] Tikhonov AN, Arsenin VA. Solutions of Ill-posed Problems. Winston & Sons: Washington; 1977.
[29] Tikhonov AN, Glasko V, Kriksin Y. On the question of quasi-optimal choice of a regularized approximation. Soviet Mathematics Doklady 1979;20(5): 1036–1040.
[30] Vogel CR. Computational Methods for Inverse Problems. SIAM: Philadelphia; 2002.
[31] Wahba G. Spline Models for Observational Data. SIAM: Philadelphia; 1990.
[32] Xie JL, Zou J. An improved model function method for choosing regularization parameters in linear inverse problems. Inverse Problems 2002;18(3): 631–643.

