Research Report
A New Choice Rule for Regularization Parameters in Tikhonov Regularization
by Kazufumi Ito, Bangti Jin, Jun Zou
CUHK-2008-07 (362), September 2008
Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong
Fax: (852) 2603 5154. Email: dept@math.cuhk.edu.hk. URL: http://www.cuhk.edu.hk
A New Choice Rule for Regularization Parameters in Tikhonov Regularization

Kazufumi Ito∗  Bangti Jin†  Jun Zou‡

August 26, 2008
Abstract. This paper proposes and analyzes a novel rule for choosing the regularization parameters in Tikhonov regularization for inverse problems, which does not necessarily require knowledge of the exact noise level. The new choice rule is derived by drawing ideas from Bayesian statistical analysis. The existence of solutions to the regularization parameter equation is shown, and some variational characterizations of the feasible parameters are provided. With such feasible regularization parameters, we establish a posteriori error estimates of the approximate solutions to the inverse problem under consideration. An iterative algorithm is suggested for the efficient numerical realization of the choice rule, and it is shown to have a practically desirable monotonic convergence. Numerical experiments for both mildly and severely ill-posed benchmark inverse problems, with various regularizing functionals of Tikhonov type, e.g. L²-L², L²-L¹ and L¹-TV, are presented and demonstrate the effectiveness and robustness of the new choice rule.

Key Words: regularization parameter, a posteriori error estimate, Tikhonov regularization, inverse problem
1 Introduction

Inverse problems arise in real-world applications whenever one attempts to infer physical laws or parameters from imprecise and indirect observational data. This work considers linear inverse problems of the general form

    Kx = y^δ,    (1)

where x ∈ X and y^δ ∈ Y refer to the unknown parameters and the observational data, respectively. The spaces X and Y are Banach spaces, with respective norms ‖·‖_X and ‖·‖_Y, and X is reflexive. The forward operator K : dom(K) ⊂ X → Y is linear and bounded. System (1) can represent a wide variety of inverse problems arising in diverse industrial and engineering applications, e.g. computerized tomography [23], parameter identification [4], image processing [10] and inverse heat transfer [6]. The observational data y^δ is a noisy version of the exact data y = Kx⁺, and its noise level is often measured by the upper bound σ₀ in the following inequality

    φ(x⁺, y^δ) ≤ σ₀²,    (2)

where the functional φ(x, y^δ) : X × Y → R⁺ measures the proximity of the model output Kx to the data y^δ. We use the notation σ₀² in (2) in place of the more commonly used δ² to maintain its clear statistical interpretation as the variance of the data noise.
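The statistical reading of (2) can be checked numerically. The sketch below is ours, not the paper's: it takes φ, for illustration only, as the mean squared misfit, and shows that for additive i.i.d. Gaussian noise the misfit of the exact solution concentrates near the variance σ₀².

```python
import numpy as np

# Illustration only (our normalization, not the paper's): with additive
# i.i.d. Gaussian noise of variance sigma0^2 and phi taken as the MEAN
# squared misfit, phi(x+, y^delta) concentrates near sigma0^2 for large n.
rng = np.random.default_rng(0)
n, sigma0 = 100_000, 0.1
y_exact = np.ones(n)                       # stands in for K x+
y_delta = y_exact + sigma0 * rng.standard_normal(n)

phi = np.mean((y_exact - y_delta) ** 2)    # mean squared misfit
# phi is close to sigma0^2 = 0.01, up to statistical fluctuation
```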
∗ Center for Research in Scientific Computation & Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA. (kito@math.ncsu.edu)
† Universität Bremen, FB 3 Mathematik und Informatik, Zentrum für Technomathematik, Postfach 330 440, 28344 Bremen, Germany. (kimbtsing@yahoo.com.cn)
‡ Department of Mathematics, The Chinese University of Hong Kong, Shatin N.T., Hong Kong, P.R. China. The work of this author was substantially supported by Hong Kong RGC grants (Project 404105 and Project 404606). (zou@math.cuhk.edu.hk)
Inverse problems are generally ill-posed in the sense of Hadamard, i.e. a solution may not exist or may be nonunique, and, more severely, a small perturbation in the data may cause an enormous deviation of the solution. Therefore, the mathematical analysis and numerical solution of inverse problems are very challenging. The standard procedure for numerically treating inverse problems is regularization, which goes back to Tikhonov's inaugural work [28]. Regularization techniques mitigate the ill-posedness by incorporating a priori information about the solution, e.g. boundedness, smoothness and positivity [28, 11]. The celebrated Tikhonov regularization transforms the solution of system (1) into the minimization of the Tikhonov functional J_η defined by

    J_η(x) = φ(x, y^δ) + η ψ(x),    (3)

and takes its minimizer x_η as an approximate solution, where η is often called the regularization parameter, balancing the data-fitting term φ(x, y^δ) against the a priori information encoded in the regularization term ψ(x). Some commonly used data fidelity functionals include ‖Kx − y^δ‖²_{L²} [28], ‖Kx − y^δ‖_{L¹} [24] and ∫(Kx − y^δ log Kx) [2], and regularization functionals include ‖x‖^ν_{Lν} [24], ‖x‖²_{Hm} [28] and |x|_{TV} [10]. Traditionally, Tikhonov regularization considers only L² data fitting in conjunction with L² or H^m regularization, referred to as the L²-L² functional hereafter, which statistically corresponds to additive Gaussian noise and a smoothness prior, respectively. However, other nonconventional and nonstandard functionals have also received considerable recent attention, e.g. statistically motivated data fitting and feature-promoting (e.g. edge, sparsity and texture) regularization.
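For the classical L²-L² case, the minimizer of (3) is available in closed form via the normal equations. The sketch below (function name ours) illustrates this special case only; the paper treats general φ and ψ.

```python
import numpy as np

# Closed-form minimizer of the L2-L2 Tikhonov functional
#   J_eta(x) = ||Kx - y||_2^2 + eta ||x||_2^2,
# namely x_eta = (K^T K + eta I)^(-1) K^T y. A sketch for this special
# case only; the paper's setting allows general phi and psi.
def tikhonov_l2l2(K, y, eta):
    m = K.shape[1]
    return np.linalg.solve(K.T @ K + eta * np.eye(m), K.T @ y)

# tiny usage example on a mildly ill-conditioned matrix
K = np.array([[1.0, 0.0], [0.0, 1e-3]])
y = np.array([1.0, 1.0])
x_eta = tikhonov_l2l2(K, y, 1e-4)   # regularization damps the small singular value
```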
The regularization parameter η determines the tradeoff between data fidelity and a priori information, and it plays an indispensable role in designing a stable inverse reconstruction process and obtaining a practically acceptable inverse solution. To be more precise, the inverse solution is overwhelmed by the prior knowledge if η is too large, which often leads to undesirable effects, e.g. over-smoothing; conversely, it may be unstable and plagued with spurious and nonphysical details if η is too small. Therefore, its selection constitutes one of the major inconveniences and difficulties in applying existing regularization techniques, and is crucial to the success of a regularization method. A number of choice rules have been proposed in the literature, e.g. the discrepancy principle [21, 25, 32], the unbiased predictive risk estimator (UPRE) method [30], the quasi-optimality criterion [29], generalized cross-validation (GCV) [13], and the L-curve criterion [15]. The discrepancy principle is mathematically rigorous; however, it requires an accurate estimate of the exact noise level σ₀, and an inaccurate estimate can severely deteriorate the inverse solution [11, 21]. The UPRE method was originally developed for model selection in linear regression, and was later adapted for choosing the regularization parameter [30]. However, its application requires an estimate of the data noise like the discrepancy principle, and the minimization of the UPRE curve is tricky since the curve may be very flat over a broad scale [30], which is also the case for the GCV curve [16]. The latter three rules do not require any a priori knowledge of the noise level σ₀, and thus are fully data-driven. These methods have been very popular in the engineering community since their inception and have delivered satisfactory performance for numerous practical inverse problems [16]. However, these methods are heuristic in nature, and cannot be analyzed in the framework of deterministic inverse theory [11]. Nonetheless, their mathematical underpinnings might be laid down in the context of statistical inverse theory, e.g. the semidiscrete semistochastic linear data model [30], though such analysis is seldom carried out for general regularization formulations.
Another principled framework for selecting the regularization parameter is Bayesian inference [7, 12]. Thompson and Kay [27] and Archer and Titterington [1] investigated this framework in the context of image restoration, and proposed and numerically evaluated several choice rules by considering various point estimates, e.g. the maximum likelihood estimate and the maximum a posteriori estimate, of the posterior probability density function and their approximations. However, these were application-oriented papers comparing different methods, with neither mathematical analysis nor algorithmic description. Motivated by the hierarchical modeling of the Bayesian paradigm [12], the authors [19] recently proposed an augmented Tikhonov functional which determines the regularization parameter and the noise level along with the inverse solution for finite-dimensional linear inverse problems.
In this paper, we investigate Tikhonov regularization in a general setting, with a general data-fitting term φ(x, y^δ) and regularization term ψ(x) in (3), and propose a new choice rule for finding a reasonable regularization parameter η. The derivation of the parameter choice rule from the point of view of hierarchical Bayesian inference is detailed in Section 2. As we will see, the new rule preserves an important advantage of some existing heuristic rules in that it does not require knowledge of the noise level either. But for this new rule, solid theoretical justifications can be developed; in particular, a posteriori error estimates are established. In addition, an iterative algorithm of monotone type is developed for an efficient realization of the choice rule in practice, and it enjoys fast and steady convergence.
The newly proposed choice rule has several further distinctions in comparison with existing heuristic choice rules. Various nonconvergence results have been established for the L-curve criterion [14, 30]: the variation of the regularization parameter is unduly large in the case of low noise levels, and the existence of a corner is not ensured. The theoretical understanding of the quasi-optimality criterion is very limited despite its popularity [3]. The GCV enjoys solid statistical justifications [31, 11]; however, the existence of a minimum is not guaranteed. Moreover, in the L-curve criterion, numerically locating the corner from discrete sampling points is highly nontrivial. The GCV curve is often very flat and numerically difficult to minimize, and it sometimes requires tight bounds on the regularization parameter in order to work robustly. For functionals other than the L²-L² type, all three existing methods require computing the inverse solution at many discrete sampling points, and are thus computationally very expensive. The newly proposed choice rule essentially eliminates these computational inconveniences through an efficient and monotonically convergent iterative algorithm, while at the same time it can be justified mathematically, as is done in Sections 3 and 4. Moreover, the new choice rule applies straightforwardly to Tikhonov regularization of very general type, e.g. L¹-TV, whereas other rules have been numerically validated and theoretically studied mostly for functionals of L²-L² type.
We conclude this section with a general remark on heuristic choice rules. A well-known theorem of Bakushinskii [11] states that no deterministic convergence theory can exist for choice rules that do not make use of the exact noise level. In particular, the inverse solution does not necessarily converge to the exact solution as the noise level diminishes to zero. Therefore, we reiterate that no choice rule, in particular a heuristic one, for choosing the regularization parameter in ill-posed problems should be considered a “black-box routine”. One can always construct examples where heuristic choice rules perform poorly.
The rest of the paper is structured as follows. In Section 2, we derive the new choice rule within the Bayesian paradigm. Section 3 shows the existence of solutions to the regularization parameter equation, and derives some a posteriori error estimates. Section 4 proposes an iterative algorithm for efficient numerical computation, and establishes the monotone convergence of the algorithm. Section 5 presents numerical results for several benchmark linear inverse problems to illustrate relevant features of the proposed method. We conclude and indicate directions of future research in Section 6.
2 Derivation of the new choice rule

In this section, we motivate our new deterministic choice rule by drawing some ideas from nondeterministic Bayesian inference [12, 19], which has been used for a different purpose in the statistical community. The choice rule will then be rigorously analyzed and justified in the framework of deterministic inverse theory in the subsequent sections.

For ease of exposition, we derive the new choice rule by considering the following finite-dimensional linear inverse problem

    Kx = y^δ,    (4)

with K ∈ R^{n×m}, x ∈ R^m and y^δ ∈ R^n. One principled approach to providing solutions to this problem is Bayesian inference [12, 19]. The cornerstone of Bayesian inference is Bayes' rule

    p(x|y^δ) ∝ p(y^δ|x) p(x),

where the density functions p(x) and p(y^δ|x) are known as the prior and the likelihood function, and reflect the prior knowledge and the contribution of the data, respectively.
Here we have dropped the normalizing constant since it plays only an immaterial role in our subsequent developments. There are thus two building blocks in Bayesian inference, i.e. p(y^δ|x) and p(x), that are to be modeled. Assume that additive i.i.d. Gaussian random variables with mean zero and variance σ² account for the measurement errors contaminating the exact data; then the likelihood function p(y^δ|x, τ), with τ = 1/σ², is given by

    p(y^δ|x, τ) ∝ τ^{n/2} exp(−(τ/2)‖Kx − y^δ‖²₂).

Bayesian inference encodes the a priori information about the unknown x, available before collecting the data, in the prior density function p(x|λ), and this is often achieved with the help of the versatile tool of Markov random fields, which in its simplest form can be written as

    p(x|λ) ∝ λ^{m/2} exp(−(λ/2)‖Lx‖²₂),

where the matrix L ∈ R^{p×m} encapsulates the structure of interactions between neighboring sites, and typically corresponds to some discretized differential operator. The scale parameter λ dictates the strength of the interaction. Unfortunately, the scale parameter λ and the inverse variance τ are often nontrivial to assign and calibrate despite their critical role in the statistical modeling. The Bayesian paradigm resolves this difficulty flexibly through hierarchical modeling. The underlying idea is to regard them as unknowns and to let the data determine these parameters. More precisely, they are also modeled as random variables, each with its own prior. We follow the standard statistical practice of adopting conjugate priors for both λ and τ [12], i.e.

    p(λ) ∝ λ^{α₀−1} e^{−β₀λ}  and  p(τ) ∝ τ^{α₁−1} e^{−β₁τ},

where (α₀, β₀) and (α₁, β₁) are the parameter pairs for the prior distributions of λ and τ, respectively. By combining these densities via Bayes' rule, we arrive at the complete Bayesian solution, i.e. the posterior probability density function (PPDF) p(x, λ, τ|y^δ), to the inverse problem (4):

    p(x, λ, τ|y^δ) ∝ p(y^δ|x, τ) · p(x|λ) · p(λ) · p(τ)
                  ∝ τ^{n/2} exp(−(τ/2)‖Kx − y^δ‖²₂) · λ^{m/2} exp(−(λ/2)‖Lx‖²₂) · λ^{α₀−1} e^{−β₀λ} · τ^{α₁−1} e^{−β₁τ}.
The PPDF encapsulates complete information about the unknown x and the parameters λ and τ. The maximum a posteriori estimate remains the most popular Bayesian estimate, and it selects (x, λ, τ)_map as the most probable triple given the observational data y^δ. More precisely, it proceeds as follows:

    (x, λ, τ)_map = arg max_{(x,λ,τ)} p(x, λ, τ|y^δ) = arg min_{(x,λ,τ)} J(x, λ, τ),

where the functional J(x, λ, τ) is defined by

    J(x, λ, τ) = (τ/2)‖Kx − y^δ‖²₂ + (λ/2)‖Lx‖²₂ + β₀λ − (m/2 + α₀ − 1) ln λ + β₁τ − (n/2 + α₁ − 1) ln τ.
Slightly abusing the notations α₀, β₀, α₁ and β₁, the formal limit as m, n → ∞ suggests a new functional of continuous form

    J(x, λ, τ) = (τ/2)‖Kx − y^δ‖²_{L²} + (λ/2)‖Lx‖²_{L²} + β₀λ − (1/2 + α₀) ln λ + β₁τ − (1/2 + α₁) ln τ,

where the operators K and L are continuous analogues of the matrices K and L, respectively. Upon letting α₀′ = 1/2 + α₀ and α₁′ = 1/2 + α₁, we arrive at

    J(x, λ, τ) = (τ/2)‖Kx − y^δ‖² + (λ/2)‖Lx‖² + β₀λ − α₀′ ln λ + β₁τ − α₁′ ln τ.
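The finite-dimensional MAP objective can be evaluated directly; the sketch below is our illustration (with the exact coefficients m/2 + α₀ − 1 and n/2 + α₁ − 1, before the notational abuse above), not the paper's implementation.

```python
import numpy as np

# Negative log-posterior J(x, lam, tau) of the hierarchical model in its
# finite-dimensional form, as derived above. Purely illustrative.
def map_objective(x, lam, tau, K, L, y, alpha0, beta0, alpha1, beta1):
    n, m = K.shape
    fit = 0.5 * tau * np.sum((K @ x - y) ** 2)        # (tau/2)||Kx - y||^2
    reg = 0.5 * lam * np.sum((L @ x) ** 2)            # (lam/2)||Lx||^2
    lam_terms = beta0 * lam - (m / 2 + alpha0 - 1) * np.log(lam)
    tau_terms = beta1 * tau - (n / 2 + alpha1 - 1) * np.log(tau)
    return fit + reg + lam_terms + tau_terms
```

Minimizing this jointly over (x, λ, τ) is exactly the MAP problem stated above.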
This naturally motivates the following generalized Tikhonov (g-Tikhonov for short) functional

    J(x, λ, τ) = τ φ(x, y^δ) + λ ψ(x) + β₀λ − α₀′ ln λ + β₁τ − α₁′ ln τ,    (5)

defined for (x, λ, τ) ∈ X × R⁺ × R⁺. This extends the Tikhonov functional (3), but it will never be utilized to solve the inverse problem (1). The functional J(x, λ, τ) is introduced only in the hope of helping to construct an adaptive algorithm for selecting a reasonable regularization parameter η in (3), which remains our solver of interest for (1).

We now derive the algorithm for adaptively updating the parameter η by making use of the optimality system of the functional J(x, λ, τ), and by detecting the noise level σ₀ in a fully data-driven manner. As we will see, the parameter η and the noise level σ(η) are connected with the parameters λ and τ in (5) by the relations η := λτ⁻¹ and σ²(η) = τ⁻¹. Noting that we are considering a general setting, the functional might be nonsmooth and nonconvex, so we resort to optimality in a generalized sense.
Definition 2.1 An element (x*, λ*, τ*) ∈ X × R⁺ × R⁺ is called a critical point of the functional (5) if it satisfies the following generalized optimality system:

    x* = arg min_{x∈X} { φ(x, y^δ) + λ*(τ*)⁻¹ ψ(x) },
    ψ(x*) + β₀ − α₀′/λ* = 0,    (6)
    φ(x*, y^δ) + β₁ − α₁′/τ* = 0.
Note that the solution x* coincides with the Tikhonov solution x_{η*} of (3) with η* := λ*/τ*. Numerical experiments indicate that the estimate σ²(η*) = 1/τ* = (φ(x_{η*}, y^δ) + β₁)/α₁′ represents an excellent approximation to the exact variance σ₀² for the choice α₁ = β₁ ≈ 0, like the highly applauded GCV estimate of the variance [31].
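The structure of the optimality system (6) suggests a simple alternating iteration: minimize over x with the current η = λ/τ, then update λ and τ from the two scalar equations. The sketch below specializes, for illustration only, to φ = ½‖Kx − y^δ‖² and ψ = ½‖x‖²; it conveys the structure of (6), not the algorithm developed in Section 4.

```python
import numpy as np

# Alternating iteration for the critical-point system (6), specialized to
# phi = 1/2 ||Kx - y||^2 and psi = 1/2 ||x||^2 (our illustrative choice).
def alternating_critical_point(K, y, alpha0p, beta0, alpha1p, beta1,
                               eta=1.0, iters=20):
    m = K.shape[1]
    for _ in range(iters):
        # x-step: minimize phi + eta * psi (closed form for this phi, psi)
        x = np.linalg.solve(K.T @ K + eta * np.eye(m), K.T @ y)
        phi = 0.5 * np.sum((K @ x - y) ** 2)
        psi = 0.5 * np.sum(x ** 2)
        lam = alpha0p / (psi + beta0)   # from psi(x*) + beta0 - alpha0'/lam = 0
        tau = alpha1p / (phi + beta1)   # from phi(x*) + beta1 - alpha1'/tau = 0
        eta = lam / tau                 # eta* = lam*/tau*
    return x, eta, 1.0 / tau           # 1/tau* estimates the noise variance
```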
From the optimality system (6), the automatically determined regularization parameter η* satisfies

    η* := λ* · (τ*)⁻¹ = (α₀′/α₁′) · (φ(x_{η*}, y^δ) + β₁)/(ψ(x_{η*}) + β₀).    (7)

Under the premise that the estimate σ²(η*) accurately approximates the exact variance σ₀², the defining relation (7) implies that the regularization parameter η* satisfies the inequality η* ≤ (α₀′/β₀) σ₀². Experimentally, we have also observed that the value of the scale parameter λ* := α₀′/(ψ(x_{η*}) + β₀) is almost independent of the noise level σ₀ for fixed α₀′ and β₀, and thus, empirically speaking, η* is of the order O(σ₀²). However, deterministic inverse theory [11] requires a regularization parameter choice rule η̃(σ₀) satisfying

    lim_{σ₀→0} η̃(σ₀) = 0  and  lim_{σ₀→0} σ₀²/η̃(σ₀) = 0    (8)

in order to yield a valid regularizing scheme, i.e. the inverse solution converges to the exact one as the noise level σ₀ diminishes to zero. Therefore, the g-Tikhonov method (5) is bound to under-regularize the inverse problem (1) in the case of low noise levels, i.e. the regularization parameter η* is too small. Numerical findings also corroborate this assertion, evidenced by under-regularization in the case of low noise levels. One promising approach to remedy the difficulty is to rescale α₀′ as σ₀^{−d} with 0 < d < 2 as σ₀ tends to zero, in order to ensure the consistency conditions dictated by equation (8). Therefore, it seems natural to adaptively update α₀′ using the automatically determined noise level σ²(η*).
Our new choice rule derives from equation (7) and the preceding arguments. Abusing the notation α₀′ by identifying α₀′ with its rescaling by the automatically determined σ²(η*), it consists of choosing the regularization parameter η* by the rule

    η* = α₀′/(ψ(x_{η*}) + β₀) · (φ(x_{η*}, y^δ)/α₁′)^{−d} · φ(x_{η*}, y^δ)/α₁′
       = α φ(x_{η*}, y^δ)^{1−d}/(ψ(x_{η*}) + β₀),    0 < d < 1,

where α = α₀′/(α₁′)^{1−d} is a constant. Here we have dropped the constant β₁, since it has only a marginal practical impact on the solution procedure as long as its value is small. The rationale for invoking the exponent (1 − d) is to adaptively update the parameter α₀′ using the automatically detected noise level so that α₀′ ~ O(σ₀^{−2d}) as the noise level σ₀ decreases to zero, in the hope of verifying the consistency conditions dictated in equation (8). This choice rule is plausible provided that the estimate σ²(η*) agrees reasonably well with the exact noise level σ₀². In brief, we have arrived at the desired choice rule, which selects the regularization parameter according to the following nonlinear equation in η, with 0 < d < 1:

    η (ψ(x_η) + β₀) = α φ(x_η, y^δ)^{1−d},    (9)

for which we shall propose an effective iterative algorithm that converges monotonically.
We emphasize that the newly proposed choice rule (9) can also be regarded as an adaptive strategy for updating the parameter α₀′ in the g-Tikhonov functional. The specialization of the choice rule (9) to the L²-L² functional might also be used as a systematic strategy for adapting the parameter ν of the fixed point algorithm proposed in [5], which numerically implements a local minimum criterion [26, 11] for choosing the regularization parameter.
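Equation (9) is a scalar root-finding problem in η once x_η can be computed. The sketch below (ours) solves it by log-scale bisection for the L²-L² functional, assuming the root is bracketed by the endpoints; the paper's Section 4 instead develops a monotonically convergent fixed-point algorithm.

```python
import numpy as np

# Hedged sketch: solve the parameter equation (9),
#   eta * (psi(x_eta) + beta0) = alpha * phi(x_eta, y)^(1 - d),
# by log-scale bisection, with phi = ||Kx - y||_2^2 and psi = ||x||_2^2.
# Illustration only; not the monotone algorithm of Section 4.
def solve_choice_rule(K, y, alpha, beta0, d, lo=1e-12, hi=1e4, iters=100):
    m = K.shape[1]

    def residual(eta):
        x = np.linalg.solve(K.T @ K + eta * np.eye(m), K.T @ y)
        phi = np.sum((K @ x - y) ** 2)
        psi = np.sum(x ** 2)
        return eta * (psi + beta0) - alpha * phi ** (1 - d)

    # assumes residual(lo) < 0 < residual(hi), i.e. a root is bracketed
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)
```

Note that (9) may admit several solutions; bisection merely returns one root inside the initial bracket, which is part of why the analysis of Section 3 and the monotone iteration of Section 4 are needed.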
Selection of the parameters β_0, α and d in (9). Before proceeding to the analysis of the new choice rule based on (9), we give some practical guidelines on choosing the parameters β_0, α and d. The parameter β_0 plays only an insignificant role as long as its value is kept sufficiently small, so that it is dominated by the term ψ(x_η). In practice, we have observed that the numerical results are virtually identical for β_0 varying over a wide range, e.g. [1×10^{-10}, 1×10^{-3}]. Numerical experiments indicate that for the finite-dimensional inverse problem (4), small values of α'_0 and α'_1 work well for inverse problems with 5% relative noise in the data. Therefore, a value of α'_0 ≈ m/2 and α'_1 ≈ n/2 suffices in this case, which consequently indicates that α'_0 = 1/2 and α'_1 = 1/2 should suffice for its continuous analog if m ≈ n, i.e. the ratio α'_0/α'_1 should maintain order 1 in equation (7). With these experimental observations in mind, the constant α in (9) should be of order one, but scaled appropriately by the magnitude of the data to account for the rescaling φ(x_{η^*}, y^δ)^{-d}. This can be roughly achieved by rescaling its value by max_i |y_i|^{2d} and max_i |y_i|^d in the case of L^2 and L^1 data-fitting, respectively. The optimal value of the exponent d depends on the source condition verified by the exact solution x^+ (see Theorems 3.4 and 3.5), and typically we choose its value in the range [1/3, 1/2]. These guidelines on the selection of the parameters β_0, α and d are simple and easy to realize in numerical implementations, and have worked very well for all five benchmark problems, ranging from mildly to severely ill-posed inverse problems; see Section 5 for details.
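As a concrete illustration of these guidelines, the following Python sketch packages the rescaling of α by the data magnitude. The function name, the base value alpha_base = 0.1 and the toy data vector are our own illustrative choices, not prescriptions from the text; only the rescaling by max_i |y_i|^{2d} (L^2 fitting) or max_i |y_i|^d (L^1 fitting) and the ranges for β_0 and d come from the guidelines above.

```python
import numpy as np

def choose_parameters(y, fitting="L2", d=1.0 / 3.0, beta0=1e-4, alpha_base=0.1):
    """Return (alpha, beta0, d) following the guidelines above.

    alpha_base and the function itself are illustrative assumptions; only the
    data-magnitude rescaling of alpha is taken from the text.
    """
    ymax = np.max(np.abs(y))
    if fitting == "L2":
        alpha = alpha_base * ymax ** (2 * d)   # L2 data-fitting rescaling
    else:
        alpha = alpha_base * ymax ** d         # L1 data-fitting rescaling
    return alpha, beta0, d

y = np.array([0.5, -2.0, 1.5])                 # toy data vector
alpha, beta0, d = choose_parameters(y)
```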
3 Existence and error estimates

This section shows the existence of solutions to the regularization parameter equation (9), and derives a posteriori error estimates of the Tikhonov solution x_{η^*} in (3). We make the following assumptions on the functionals φ(x, y^δ) and ψ(x).
Assumption 3.1 Assume that the nonnegative functionals φ(x, y^δ) and ψ(x) satisfy:

(a) For any η > 0, the functional J_η(x) defined in (3) is coercive on X, i.e. J_η(x) → +∞ as ‖x‖_X → ∞.

(b) The functionals φ(x, y^δ) and ψ(x) are weakly lower semi-continuous, i.e.

    φ(x, y^δ) ≤ lim inf_{n→∞} φ(x_n, y^δ)  and  ψ(x) ≤ lim inf_{n→∞} ψ(x_n)

for any sequence {x_n}_n ⊂ X converging weakly to x.

(c) There exists an x̃ such that ψ(x̃) = 0.
Assumptions 3.1(a) and (b) are standard for ensuring the existence of a minimizer to the Tikhonov functional J_η [11].

Note that the minimizers x_η of the Tikhonov functional J_η in (3) might be nonunique; thus the functions φ(x_η, y^δ) and ψ(x_η) might be multi-valued. We will need the next lemma on the monotonicity of these functions with respect to η.
Lemma 3.1 Let x_{η_1} and x_{η_2} be minimizers of the Tikhonov functional J_η(x) in (3) with the regularization parameters η_1 and η_2, respectively. Then we have

    (ψ(x_{η_1}) − ψ(x_{η_2}))(η_1 − η_2) ≤ 0,
    (φ(x_{η_1}, y^δ) − φ(x_{η_2}, y^δ))(η_1 − η_2) ≥ 0.

Proof. By the minimizing properties of x_{η_1} and x_{η_2}, we have

    φ(x_{η_1}, y^δ) + η_1 ψ(x_{η_1}) ≤ φ(x_{η_2}, y^δ) + η_1 ψ(x_{η_2}),
    φ(x_{η_2}, y^δ) + η_2 ψ(x_{η_2}) ≤ φ(x_{η_1}, y^δ) + η_2 ψ(x_{η_1}).

Adding these two inequalities gives the first assertion, and the second can be derived analogously. □
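Lemma 3.1 can be observed numerically in the L^2-L^2 case, where the minimizer has the closed form x_η = (K^T K + ηI)^{-1} K^T y^δ. The toy problem below is our own illustration, not one of the paper's benchmark examples.

```python
import numpy as np

# Toy L2-L2 problem: phi(x, y) = ||Kx - y||_2^2 and psi(x) = ||x||_2^2, for
# which the Tikhonov minimizer has the closed form
# x_eta = (K^T K + eta I)^{-1} K^T y.
rng = np.random.default_rng(0)
K = rng.standard_normal((20, 10))
y = rng.standard_normal(20)

def tikhonov(eta):
    return np.linalg.solve(K.T @ K + eta * np.eye(10), K.T @ y)

etas = np.logspace(-4, 2, 30)
phi = [float(np.linalg.norm(K @ tikhonov(e) - y) ** 2) for e in etas]  # data fit
psi = [float(np.linalg.norm(tikhonov(e)) ** 2) for e in etas]          # penalty

# Lemma 3.1: phi(x_eta, y) is non-decreasing and psi(x_eta) is non-increasing.
assert all(a <= b + 1e-12 for a, b in zip(phi, phi[1:]))
assert all(a >= b - 1e-12 for a, b in zip(psi, psi[1:]))
```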
The minimizer x_η of the Tikhonov functional J_η in (3) is a nonlinear function of the regularization parameter η; therefore equation (9) is a nonlinear equation in η. Assisted by Lemma 3.1, we are now ready to give an existence result for equation (9).

Theorem 3.1 Assume that the functions φ(x_η, y^δ) and ψ(x_η) are continuous with respect to η. Then there exists at least one positive solution to equation (9) if lim_{η→0^+} φ(x_η, y^δ) > 0.
Proof. Define

    f(η) = η(ψ(x_η) + β_0) − α φ(x_η, y^δ)^{1−d},  (10)

then the nonnegativity of the functionals φ(x, y^δ) and ψ(x) and Lemma 3.1 imply that

    f(η) ≥ β_0 η − α φ(x_∞, y^δ)^{1−d},

from which we derive that

    lim_{η→∞} f(η) = +∞.

Note that by Lemma 3.1, φ(x_η, y^δ) is monotonically decreasing with the decrease of η, and by Assumption 3.1 it is bounded from below. Therefore, the following limiting process makes sense, and

    lim_{η→0^+} f(η) = lim_{η→0^+} −α φ(x_η, y^δ)^{1−d} < 0,

by the assumption that lim_{η→0^+} φ(x_η, y^δ) is positive. By the continuity of the functions φ(x_η, y^δ) and ψ(x_η) with respect to η, we conclude that there exists at least one positive solution to equation (9). □
Remark 3.1 The existence of a solution follows also from the convergence of the fixed point algorithm; see Theorem 4.1. Also, the existence of a positive solution can be ensured for a relaxation of equation (9):

    η(ψ(x_η) + β_0) = α (φ(x_η, y^δ) + β_1)^{1−d},   0 < d < 1,

where β_1 acts as a relaxation parameter and is usually taken to be much smaller than the magnitude of φ(x_η, y^δ).
Remark 3.2 The proposed choice rule (9) also generalizes the zero-crossing method for the L^2-L^2 functional, which seeks the solution of the nonlinear equation

    −φ(x_{η^*}, y^δ) + η^* ψ(x_{η^*}) = 0.

It is obtained by setting d = 0 and α = 1 in equation (9). The zero-crossing method is popular in the biomedical engineering community; for some analysis of the method, we refer to [20].
Theorem 3.1 relies crucially on the continuity of the functions φ(x_η, y^δ) and ψ(x_η) with respect to the regularization parameter η. Lemma 3.1 indicates that the functions are monotone, and thus differentiable almost everywhere. The following theorem gives one sufficient condition for the continuity.

Theorem 3.2 Suppose that the functional J_η has a unique minimizer for every η > 0. Then the functions φ(x_η, y^δ) and ψ(x_η) are continuous with respect to η.
Proof. Fix η^* > 0 and let x_{η^*} be the unique minimizer of J_{η^*}. Let {η_j}_j ⊂ R^+ converge to η^*, and consider the sequence of minimizers {x_{η_j}}_j. Observe that, since ψ(x̃) = 0 by Assumption 3.1(c),

    φ(x_{η_j}, y^δ) + η_j ψ(x_{η_j}) ≤ φ(x̃, y^δ) + η_j ψ(x̃) = φ(x̃, y^δ).

This implies that the sequences {φ(x_{η_j}, y^δ)}_j and {ψ(x_{η_j})}_j are uniformly bounded. By Assumption 3.1(a), the sequence {x_{η_j}}_j is uniformly bounded. Therefore, there exists a subsequence of {x_{η_j}}_j, also denoted by {x_{η_j}}_j, such that x_{η_j} → x^* weakly. By Assumption 3.1(b) on the weak lower semi-continuity of the functionals, we have

    φ(x^*, y^δ) ≤ lim inf_{j→∞} φ(x_{η_j}, y^δ)  and  ψ(x^*) ≤ lim inf_{j→∞} ψ(x_{η_j}).  (11)

Hence, we arrive at

    J_{η^*}(x^*) = φ(x^*, y^δ) + η^* ψ(x^*) ≤ lim inf_{j→∞} φ(x_{η_j}, y^δ) + lim inf_{j→∞} η_j ψ(x_{η_j}) ≤ lim inf_{j→∞} J_{η_j}(x_{η_j}).

Next we show that J_{η^*}(x_{η^*}) ≥ lim sup_{j→∞} J_{η_j}(x_{η_j}). To see this,

    lim sup_{j→∞} J_{η_j}(x_{η_j}) ≤ lim sup_{j→∞} J_{η_j}(x_{η^*}) = lim_{j→∞} J_{η_j}(x_{η^*}) = J_{η^*}(x_{η^*}),

by the fact that x_{η_j} is a minimizer of J_{η_j}. Consequently,

    lim sup_{j→∞} J_{η_j}(x_{η_j}) ≤ J_{η^*}(x_{η^*}) ≤ J_{η^*}(x^*) ≤ lim inf_{j→∞} J_{η_j}(x_{η_j}).  (12)

We thus see that x^* is a minimizer of J_{η^*}, and by the uniqueness of the minimizer of J_{η^*} we deduce that x^* = x_{η^*}, and the whole sequence {x_{η_j}}_j converges weakly to x_{η^*}. Consequently, the function J_η(x_η) is continuous with respect to η. Next we show that ψ(x_{η_j}) → ψ(x_{η^*}), for which it suffices to show that

    lim sup_{j→∞} ψ(x_{η_j}) ≤ ψ(x_{η^*}).

Assume that this does not hold. Then there exists a constant c such that c := lim sup_{j→∞} ψ(x_{η_j}) > ψ(x_{η^*}), and there exists a subsequence of {x_{η_j}}_j, denoted by {x_n}_n, such that

    x_n → x_{η^*} weakly,  and  ψ(x_n) → c.

As a consequence of (12), we have

    lim_{n→∞} φ(x_n, y^δ) = φ(x_{η^*}, y^δ) + η^* ψ(x_{η^*}) − lim_{n→∞} η_n ψ(x_n)
                          = φ(x_{η^*}, y^δ) + η^* (ψ(x_{η^*}) − c) < φ(x_{η^*}, y^δ).

This is in contradiction with (11). Therefore, we have lim sup_{j→∞} ψ(x_{η_j}) ≤ ψ(x_{η^*}), and the function ψ(x_η) is continuous with respect to η. The continuity of φ(x_η, y^δ) follows from the continuity of the functions J_η(x_η) and ψ(x_η). □
The following corollary is a direct consequence of Theorem 3.2, and is also of independent interest.

Corollary 3.1 The functional value J_η(x_η) is always continuous with respect to η. The multi-valued functions φ(x_η, y^δ) and ψ(x_η) share the same discontinuity set, which is at most of countable cardinality.

Proof. The continuity of the functional follows directly from the proof of Theorem 3.2, and it consequently implies that φ(x_η, y^δ) and ψ(x_η) share the same discontinuity set. The fact that the discontinuity set is countable follows from the monotonicity of φ(x_η, y^δ) and ψ(x_η); see Lemma 3.1. □
Remark 3.3 The continuity result also holds for non-reflexive spaces, e.g. the BV space. For a proof for the L^2-TV formulation, we refer to reference [9]. However, the uniqueness assumption is necessary in general, and one counterexample is the L^1-TV formulation [9]. Theorem 3.2 remains valid in the presence of a convex constraint set C or nonlinear operators K.
We are now going to establish some variational characterization of the regularization parameter choice rule (9). For this, we introduce a functional G by

    G(x) = { (1/d) φ(x, y^δ)^d + α ln(β_0 + ψ(x)),   0 < d < 1,
           { ln φ(x, y^δ) + α ln(β_0 + ψ(x)),        d = 0.

Clearly, the existence of a minimizer to the functional G follows directly from Assumption 3.1, as it is bounded from below, coercive and weakly lower semi-continuous. Under the premise that the functionals φ(x, y^δ) and ψ(x) are differentiable, a critical point x^* of the functional G satisfies

    φ(x^*, y^δ)^{d−1} φ'(x^*, y^δ) + (α/(ψ(x^*) + β_0)) ψ'(x^*) = 0.

Setting η^* = α φ(x^*, y^δ)^{1−d}/(ψ(x^*) + β_0) gives

    φ'(x^*, y^δ) + η^* ψ'(x^*) = 0,

i.e. x^* is a critical point of the functional J_{η^*}. Furthermore, if the functional J_{η^*} is convex, then x^* is also a minimizer of the functional J_{η^*} and x^* = x_{η^*}. The next theorem summarizes this observation.
Theorem 3.3 If the functional J_η is convex and the functionals φ(x, y^δ) and ψ(x) are differentiable, then a solution x_{η^*} computed by the choice rule (9) is a critical point of the functional G.

Theorem 3.3 and the existence of a minimizer to the functional G ensure the existence of a solution to the strategy (9). The functional G provides a variational characterization of the regularization parameter choice rule (9), while the strategy (9) implements the functional G via the optimality condition. There might exist better strategies for numerically implementing the functional G; however, this is beyond the scope of the present study.
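This variational characterization can be observed numerically. The following Python sketch uses a toy L^2-L^2 setup of our own (closed-form minimizers x_η, arbitrary parameter values): it evaluates G(x_η) along a grid of η and checks that the grid minimizer of η ↦ G(x_η) sits exactly where the residual f(η) = η(ψ(x_η) + β_0) − α φ(x_η, y^δ)^{1−d} of equation (9) changes sign.

```python
import numpy as np

# Toy L2-L2 problem: phi(x, y) = ||Kx - y||^2, psi(x) = ||x||^2, for which the
# Tikhonov minimizer x_eta is available in closed form. All values below are
# illustrative choices of ours.
rng = np.random.default_rng(1)
K = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
alpha, beta0, d = 1.0, 1e-4, 1.0 / 3.0

def x_of(eta):
    return np.linalg.solve(K.T @ K + eta * np.eye(10), K.T @ y)

def phi(x):
    return float(np.linalg.norm(K @ x - y) ** 2)

def psi(x):
    return float(np.linalg.norm(x) ** 2)

etas = np.logspace(-6, 7, 600)
G = [phi(x_of(e)) ** d / d + alpha * np.log(beta0 + psi(x_of(e))) for e in etas]
# Residual of the parameter equation (9): f(eta) changes sign at a solution.
f = [e * (psi(x_of(e)) + beta0) - alpha * phi(x_of(e)) ** (1 - d) for e in etas]

i = int(np.argmin(G))
# The grid minimizer of eta -> G(x_eta) brackets a sign change of f.
assert 0 < i < len(etas) - 1 and f[i - 1] < 0 < f[i + 1]
```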
To offer further theoretical justification of the choice rule (9), we will derive a posteriori error estimates, i.e. bounds on the error between the inverse solution x_{η^*} and the exact solution x^+ to (1). We will consider functionals of L^2-ψ type with ψ being convex, and discuss the two cases ψ(x) = ‖x‖_2^2 and ψ(x) being a general convex function separately, due to the inherent differences between them.
We first specialize to Tikhonov regularization in Hilbert spaces [11], with the norm denoted by ‖·‖. Let x_{η^*} be a solution of the Tikhonov functional in (3) with φ(x, y^δ) = ‖Kx − y^δ‖_2^2, ψ(x) = ‖x‖_2^2 and with η^* chosen by equation (9). To this end, we adopt the general framework of reference [11]. Let g_η(t) = 1/(η + t) and r_η(t) = 1 − t g_η(t) = η/(η + t), then define G(η) by G(η) := sup{|g_η(t)| : t ∈ [0, ‖K‖^2]} = 1/η, and let ω_μ : (0, ‖K‖^2) → R be such that for all γ ∈ (0, γ_0) and t ∈ [0, ‖K‖^2], t^μ |r_γ(t)| ≤ ω_μ(γ) holds. Then for 0 < μ ≤ 1, we have ω_μ(η) = η^μ. Moreover, define the source sets X_{μ,ρ} by X_{μ,ρ} := {x ∈ X : x = (K^*K)^μ w, ‖w‖ ≤ ρ}. With these preliminaries, we are ready to state one of our main results on a posteriori error estimates.
Theorem 3.4 Let x^+ be the minimum-norm solution to Kx = y, and assume that x^+ ∈ X_{μ,ρ} for some 0 < μ ≤ 1. Let δ_* := ‖y^δ − Kx_{η^*}‖ and d = 2μ/(2μ + 1). Then we have

    ‖x^+ − x_{η^*}‖ ≤ c ( ρ^{1/(2μ+1)} + (δ/δ_*) √((ψ(x_{η^*}) + β_0)/α) ) max{δ, δ_*}^{2μ/(2μ+1)}.  (13)
Proof. We decompose the error x^+ − x_η into

    x^+ − x_η = r_η(K^*K) x^+ + g_η(K^*K) K^* (y − y^δ).

Introducing the source representer w with x^+ = (K^*K)^μ w, the interpolation inequality gives

    ‖r_η(K^*K) x^+‖ = ‖r_η(K^*K)(K^*K)^μ w‖
                    ≤ ‖(K^*K)^{1/2+μ} r_η(K^*K) w‖^{2μ/(2μ+1)} ‖r_η(K^*K) w‖^{1/(2μ+1)}
                    = ‖r_η(KK^*) Kx^+‖^{2μ/(2μ+1)} ‖r_η(K^*K) w‖^{1/(2μ+1)}
                    ≤ c (‖r_η(KK^*) y^δ‖ + ‖r_η(KK^*)(y^δ − y)‖)^{2μ/(2μ+1)} ‖w‖^{1/(2μ+1)},

where the constant c depends only on the maximum of r_η over [0, ‖K‖^2]. By noting the relation

    r_{η^*}(KK^*) y^δ = y^δ − Kx_{η^*},

we obtain

    ‖r_{η^*}(K^*K) x^+‖ ≤ c (δ_* + cδ)^{2μ/(2μ+1)} ρ^{1/(2μ+1)} ≤ c_1 max{δ, δ_*}^{2μ/(2μ+1)} ρ^{1/(2μ+1)}.
It remains to estimate the term ‖g_{η^*}(K^*K) K^* (y^δ − y)‖. The standard estimate (see Theorem 4.2 of [11]) yields

    ‖g_{η^*}(K^*K) K^* (y^δ − y)‖ ≤ c δ/√η^*.

However, by equation (9), we have

    1/√η^* = (δ_*^d/δ_*) √((ψ(x_{η^*}) + β_0)/α).

Therefore, we derive that

    ‖g_{η^*}(K^*K) K^* (y^δ − y)‖ ≤ c (δ/δ_*) √((ψ(x_{η^*}) + β_0)/α) δ_*^d ≤ c (δ/δ_*) √((ψ(x_{η^*}) + β_0)/α) max{δ, δ_*}^d.

Combining these two estimates and taking into account that d = 2μ/(2μ + 1), we arrive at the desired a posteriori error estimate. □
Remark 3.4 The error bound (13) states that the approximation obtained from the proposed rule is order-optimal provided that δ_* is of the order of δ. However, to this end, the exponent d must be chosen according to the sourcewise parameter μ. The knowledge of δ_* enables an a posteriori check: if δ_* ≪ δ, then one should be cautious about the chosen parameter, since the prefactor δ/δ_* is very large; if δ_* ≫ δ, the situation is not critical and the magnitude of δ_* essentially determines the error. Numerically, the prefactor λ^* = (ψ(x_{η^*}) + β_0)/α remains almost constant as the noise level σ_0^2 varies.
Next we consider functionals of the type L^2-ψ with ψ(x) being convex. The convergence rate analysis for inverse problems in Banach spaces is fundamentally different from that in Hilbert spaces [8]. We will use an interesting new distance function, the generalized Bregman distance (cf. [8]), to measure the a posteriori error. To this end, we need the concept of a ψ-minimizing solution.

Definition 3.1 An element x^+ ∈ X is called a ψ-minimizing solution of (1) if Kx^+ = y and

    ψ(x^+) ≤ ψ(x)   ∀x ∈ X such that Kx = y.
Let us denote the subdifferential of ψ(x) at x^+ by ∂ψ(x^+), i.e.

    ∂ψ(x^+) = {q ∈ X^* : ψ(x) ≥ ψ(x^+) + ⟨q, x − x^+⟩, ∀x ∈ X},

and define the generalized Bregman distance D_ψ(x, x^+) by

    D_ψ(x, x^+) := { ψ(x) − ψ(x^+) − ⟨q, x − x^+⟩ : q ∈ ∂ψ(x^+) }.

One can verify that if ψ(x) = ‖x‖^2, then the generalized Bregman distance d(x_{η^*}, x^+) reduces to the familiar formula

    d(x_{η^*}, x^+) = ‖x_{η^*} − x^+‖^2.
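A quick numerical check of this reduction (our own illustration): for ψ(x) = ‖x‖_2^2 the subdifferential at x^+ contains only the gradient 2x^+, and the Bregman distance collapses to ‖x − x^+‖_2^2.

```python
import numpy as np

# For psi(x) = ||x||_2^2 the (generalized) Bregman distance reduces to
# ||x - x_plus||_2^2, since the only subgradient at x_plus is 2 * x_plus.
rng = np.random.default_rng(4)
x = rng.standard_normal(7)
x_plus = rng.standard_normal(7)
bregman = x @ x - x_plus @ x_plus - 2.0 * x_plus @ (x - x_plus)
assert abs(bregman - np.sum((x - x_plus) ** 2)) < 1e-12
```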
Now we are ready to present another a posteriori error estimate.

Theorem 3.5 Let x^+ be a ψ-minimizing solution to equation (1), and assume that the following source condition holds: there exists a w ∈ Y such that

    K^* w ∈ ∂ψ(x^+).

Let δ_* = ‖Kx_{η^*} − y^δ‖ and d = 1/2. Then for each x_{η^*} that solves equation (9), there exists d ∈ D_ψ(x_{η^*}, x^+) such that

    d(x_{η^*}, x^+) ≤ ( (δ/δ_*) (ψ(x_{η^*}) + β_0)/α + (α/(ψ(x_{η^*}) + β_0)) ‖w‖^2 ) max{δ, δ_*}.
Proof. Let

    d(x_{η^*}, x^+) = ψ(x_{η^*}) − ψ(x^+) − ⟨K^* w, x_{η^*} − x^+⟩ ∈ D_ψ(x_{η^*}, x^+).

By the minimizing property of x_{η^*}, Kx^+ = y and ‖y − y^δ‖ = δ, we have

    (1/2) ‖Kx_{η^*} − y^δ‖_2^2 + η^* ψ(x_{η^*}) ≤ δ^2/2 + η^* ψ(x^+),

i.e.

    (1/2) ‖Kx_{η^*} − y^δ‖_2^2 + η^* d(x_{η^*}, x^+) + η^* [⟨w, Kx_{η^*} − y^δ⟩ + ⟨w, y^δ − y⟩] ≤ δ^2/2.

Adding (η^{*2}/2) ‖w‖^2 to both sides of the inequality and utilizing the Cauchy-Schwarz inequality yield

    (1/2) ‖Kx_{η^*} − y^δ + η^* w‖_2^2 + η^* d(x_{η^*}, x^+) ≤ δ^2/2 + (η^{*2}/2) ‖w‖^2 + η^* ⟨w, y − y^δ⟩
                                                            ≤ δ^2 + ‖w‖^2 η^{*2}.

Therefore, we derive that

    d(x_{η^*}, x^+) ≤ δ^2/η^* + ‖w‖^2 η^*,

which combined with equation (9) yields the desired estimate. □
4 Numerical algorithm for both the Tikhonov solution and the regularization parameter

The new choice rule requires solving the nonlinear regularization parameter equation (9) for the regularization parameter η in order to find the Tikhonov solution x_η through the functional J_η in (3). A direct numerical treatment of (9) seems difficult. Motivated by the strict biconvexity structure of the g-Tikhonov functional J(x, λ, τ), i.e. it is strictly convex in x (respectively in (λ, τ)) for fixed (λ, τ) (respectively x), we propose the following iterative algorithm for the efficient numerical realization of the proposed choice rule (9), along with the Tikhonov solution x_η through the functional J_η in (3).

Algorithm I. Choose an initial guess η_0 > 0, and set k = 0. Find (x_k, η_k) for k ≥ 1 as follows:
(i) Solve for x_{k+1} by the Tikhonov regularization method

    x_{k+1} = arg min_x { φ(x, y^δ) + η_k ψ(x) }.

(ii) Update the regularization parameter η_{k+1} by

    η_{k+1} = α φ(x_{k+1}, y^δ)^{1−d}/(ψ(x_{k+1}) + β_0).

(iii) Check the stopping criterion. If not converged, set k = k + 1 and repeat from Step (i).
Before embarking on the convergence analysis of the algorithm, we mention that we have not specified the solver for the Tikhonov regularization problem in Step (i). The problem per se may be approximately solved with an iterative algorithm, e.g. the conjugate gradient method or the iterative reweighted least-squares method. Numerically, we have found that this does not much affect the steady convergence of the algorithm, so long as the problem is solved with reasonable accuracy.
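For the L^2-L^2 case, where Step (i) has the closed-form solution x_{k+1} = (K^T K + η_k I)^{-1} K^T y^δ, Algorithm I can be sketched in a few lines of Python. The toy problem and parameter values below are our own illustrative choices; the final assertion checks the monotone convergence of {η_k}_k established in Lemma 4.1.

```python
import numpy as np

# Toy problem: 30 noisy observations of a 15-dimensional unknown (ours).
rng = np.random.default_rng(2)
K = rng.standard_normal((30, 15))
y_delta = K @ rng.standard_normal(15) + 0.05 * rng.standard_normal(30)

alpha, beta0, d = 1.0, 1e-4, 1.0 / 3.0
eta = 1e-8                                   # initial guess eta_0
history = [eta]
for k in range(100):
    # Step (i): Tikhonov minimizer for the current eta (closed form).
    x = np.linalg.solve(K.T @ K + eta * np.eye(15), K.T @ y_delta)
    phi = float(np.linalg.norm(K @ x - y_delta) ** 2)
    psi = float(np.linalg.norm(x) ** 2)
    # Step (ii): parameter update.
    eta_new = alpha * phi ** (1 - d) / (psi + beta0)
    history.append(eta_new)
    # Step (iii): stop on the relative change of eta.
    if abs(eta_new - eta) <= 1e-14 * eta_new:
        eta = eta_new
        break
    eta = eta_new

diffs = np.diff(np.array(history))
# Lemma 4.1: the sequence eta_k converges monotonically.
assert np.all(diffs >= -1e-12) or np.all(diffs <= 1e-12)
```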
The following lemma provides an interesting and practically very important observation on the monotonicity of the sequence {η_k}_k of regularization parameters generated by Algorithm I; this monotonicity is key to the demonstration of the convergence of the algorithm.

Lemma 4.1 For any initial guess η_0, the sequence {η_k}_k generated by Algorithm I converges monotonically.
Proof. By the definition of η_k, we have

    η_k := α φ(x_k, y^δ)^{1−d}/(ψ(x_k) + β_0).

Therefore,

    η_k − η_{k−1} = α φ(x_k, y^δ)^{1−d}/(ψ(x_k) + β_0) − α φ(x_{k−1}, y^δ)^{1−d}/(ψ(x_{k−1}) + β_0) = (α/D_k) [I + β_0 II],  (14)

where the denominator D_k is defined as

    D_k = (ψ(x_{k−1}) + β_0)(ψ(x_k) + β_0).

The terms I and II in the square bracket of equation (14) are respectively given by

    I := φ(x_k, y^δ)^{1−d} ψ(x_{k−1}) − φ(x_{k−1}, y^δ)^{1−d} ψ(x_k)
       = φ(x_k, y^δ)^{1−d} (ψ(x_{k−1}) − ψ(x_k)) + ψ(x_k) (φ(x_k, y^δ)^{1−d} − φ(x_{k−1}, y^δ)^{1−d}),
    II := φ(x_k, y^δ)^{1−d} − φ(x_{k−1}, y^δ)^{1−d}.

We assume that η_{k−1} ≠ η_{k−2}, since otherwise the assertion is trivial. Lemma 3.1 indicates that each term has the same sign as η_{k−1} − η_{k−2}, and thus the sequence {η_k}_k is monotone. Next we show that the sequence {η_k}_k is bounded. A trivial lower bound is zero. Now by the minimizing property of x_k, we deduce

    φ(x_k, y^δ) + η_{k−1} ψ(x_k) ≤ φ(x̃, y^δ) + η_{k−1} ψ(x̃),

where x̃ ∈ X satisfies ψ(x̃) = 0 by Assumption 3.1(c). Consequently,

    φ(x_k, y^δ) ≤ φ(x̃, y^δ).

Therefore, the definition of η_k gives

    η_k = α φ(x_k, y^δ)^{1−d}/(ψ(x_k) + β_0) ≤ (α/β_0) φ(x̃, y^δ)^{1−d},

i.e. the sequence {η_k}_k is uniformly bounded, which combined with the monotonicity yields the desired convergence. □
Lemma 4.2 Assume that the functionals φ(x, y^δ) and ψ(x) are differentiable, and let F(η) = φ(x_η, y^δ) + η ψ(x_η). Then the asymptotic convergence rate r^* of the algorithm is dictated by

    r^* := lim_{k→∞} (η^* − η_{k+1})/(η^* − η_k) = (−η^* F''(η^*)/(ψ(x_{η^*}) + β_0)) [(1 − d) α φ(x_{η^*}, y^δ)^{−d} + 1].

Proof. Differentiating F(η) with respect to η gives

    F'(η) = ⟨φ'(x_η, y^δ), dx_η/dη⟩ + ψ(x_η) + η ⟨ψ'(x_η), dx_η/dη⟩,

which, taking into account the optimality condition for x_η, gives

    ψ(x_η) = F'(η)  and  φ(x_η, y^δ) = F(η) − η F'(η).

The asymptotic convergence rate r^* of the algorithm is dictated by

    r^* := lim_{k→∞} (η^* − η_{k+1})/(η^* − η_k) = d/dη [α φ(x_η, y^δ)^{1−d}/(ψ(x_η) + β_0)]|_{η=η^*}
        = α d/dη [(F(η) − η F'(η))^{1−d}/(F'(η) + β_0)]|_{η=η^*}
        = α [F(η^*) − η^* F'(η^*)]^{−d} F''(η^*) [−(1 − d) η^* (F'(η^*) + β_0) − (F(η^*) − η^* F'(η^*))]/(F'(η^*) + β_0)^2
        = −η^* F''(η^*) [(1 − d) η^* (ψ(x_{η^*}) + β_0) + φ(x_{η^*}, y^δ)]/((ψ(x_{η^*}) + β_0) φ(x_{η^*}, y^δ))
        = (−η^* F''(η^*)/(ψ(x_{η^*}) + β_0)) [(1 − d) α φ(x_{η^*}, y^δ)^{−d} + 1].

This establishes the lemma. □

Remark 4.1 For the special case d = 0, the expression of the rate r^* in Lemma 4.2 simplifies to

    r^* = (1 + α) (−η^* F''(η^*))/(ψ(x_{η^*}) + β_0).

The established monotone convergence of the sequence {η_k}_k implies that r^* ≤ 1; however, a precise estimate of the rate r^* is still missing. Nonetheless, fast convergence is always observed numerically.
Definition 4.1 [22] A functional ψ(x) is said to have the H-property on the space X if any sequence {x_n}_n ⊂ X weakly converging to a limit x_0 ∈ X and converging to x_0 in functional value, i.e. ψ(x_n) → ψ(x_0), converges strongly to x_0 in X.

This property is also known as the Efimov-Stechkin condition or the Kadec-Klee property in the literature. Norms and semi-norms on Hilbert spaces, and norms on the spaces L^p(Ω) and the Sobolev spaces W^{m,p}(Ω) with 1 < p < ∞ and m ≥ 1, satisfy the H-property.
Assisted by Lemma 4.1, we are now ready to prove the convergence of Algorithm I.

Theorem 4.1 Assume that η^* > 0. Then every subsequence of the sequence {(x_k, η_k)}_k generated by Algorithm I has a subsequence converging weakly to a solution (x^*, η^*) of equation (9), and the convergence of the sequence {η_k}_k is monotonic. If the minimizer of J_{η^*}(x) is unique, the whole sequence converges weakly. Moreover, if the functional ψ(x) satisfies the H-property, the weak convergence is actually strong.
Proof. Lemma 4.1 shows that there exists some η^* such that

    lim_{k→∞} η_k = η^* > 0.

By Lemma 3.1 and the monotonicity of the sequence {η_k}_k, we deduce that the sequences {φ(x_k, y^δ)}_k and {ψ(x_k)}_k are monotonic. By η^* > 0 and Assumption 3.1, we observe that

    0 ≤ φ(x_k, y^δ) ≤ φ(x̃, y^δ),   0 ≤ ψ(x_k) ≤ max{ψ(x_{η_0}), ψ(x_{η^*})}.

Therefore, the sequences {φ(x_k, y^δ)}_k and {ψ(x_k)}_k are monotonically convergent. By Assumption 3.1(a), the sequence {x_k}_k is uniformly bounded, and there exist a subsequence of {x_k}_k, also denoted by {x_k}_k, and some x^* ∈ X such that

    x_k → x^* weakly.

The minimizing property of x_k gives

    φ(x_k, y^δ) + η_{k−1} ψ(x_k) ≤ φ(x, y^δ) + η_{k−1} ψ(x),   ∀x ∈ X.

Letting k tend to ∞, we have

    φ(x^*, y^δ) + η^* ψ(x^*) ≤ φ(x, y^δ) + η^* ψ(x),   ∀x ∈ X,

i.e. x^* is a minimizer of the Tikhonov functional J_{η^*}. Therefore, the element (x^*, η^*) satisfies equation (9). Now if the minimizer of the functional J_{η^*} is unique, the whole sequence {x_k}_k converges weakly to x^*. Recall the monotone convergence

    lim_{k→∞} ψ(x_k) = c^*

for some constant c^*. Next we show that c^* = ψ(x^*). By the lower semi-continuity of ψ(x) we have

    ψ(x^*) ≤ lim inf_{k→∞} ψ(x_k) = lim_{k→∞} ψ(x_k) = c^*.

Assume that c^* > ψ(x^*). Then by the continuity of the functional value J_η(x_η) with respect to η (see Corollary 3.1), we have φ(x^*, y^δ) > lim_{k→∞} φ(x_k, y^δ), which is in contradiction with the lower semi-continuity of the functional φ(x, y^δ). Therefore, we deduce that

    lim_{k→∞} ψ(x_k) = ψ(x^*),

which together with the H-property of ψ(x) on the space X implies the desired strong convergence. □
Remark 4.2 In the numerical algorithm, the quantity σ^2(η) can also be computed as

    σ^2(η_k) = φ(x_k, y^δ)/α_1,

which estimates the variance σ_0^2 of the data noise, analogous to the highly applauded generalized cross-validation [31]. By Lemmas 3.1 and 4.1, the sequence {σ^2(η_k)}_k converges monotonically. One distinction of the estimate σ^2(η) is that it changes very mildly during the iteration, especially for severely ill-posed inverse problems.
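A sketch of this variance estimate for L^2 data-fitting. Here we read α_1 as the number m of data points after the norm rescaling mentioned in Section 5; this reading, the toy problem and all parameter values are our own assumptions, so that σ^2(η_k) = ‖Kx_k − y^δ‖_2^2/m.

```python
import numpy as np

# Toy problem with known noise variance sigma0^2 (our own setup). K is scaled
# by 1/sqrt(m) so that vector norms are comparable across dimensions.
rng = np.random.default_rng(5)
m, n, sigma0 = 200, 20, 0.1
K = rng.standard_normal((m, n)) / np.sqrt(m)
y_delta = K @ rng.standard_normal(n) + sigma0 * rng.standard_normal(m)

alpha, beta0, d = 1.0, 1e-4, 1.0 / 3.0
eta = 1e-8
sigma2 = []
for k in range(50):
    x = np.linalg.solve(K.T @ K + eta * np.eye(n), K.T @ y_delta)
    phi = float(np.linalg.norm(K @ x - y_delta) ** 2)
    psi = float(np.linalg.norm(x) ** 2)
    sigma2.append(phi / m)                     # variance estimate sigma^2(eta_k)
    eta = alpha * phi ** (1 - d) / (psi + beta0)

# The estimate converges monotonically and tracks sigma0^2 up to a modest factor.
assert np.all(np.diff(sigma2) >= -1e-12)
assert 0.2 * sigma0 ** 2 < sigma2[-1] < 5 * sigma0 ** 2
```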
Table 1: Numerical examples.

example  description         ill-posedness  Cond(K)     noise      program   φ-ψ
1        Shaw's problem      severe         1.94×10^19  Gaussian   shaw      L^2-L^2
2        gravity surveying   severe         9.74×10^18  Gaussian   gravity   L^2-H^2 with C
3        differentiation     mild           1.22×10^4   Gaussian   deriv2    L^2-TV
4        Phillips's problem  mild           2.64×10^6   Gaussian   phillips  L^2-L^2
5        deblurring          severe         2.62×10^12  impulsive  deblur    L^1-TV
5 Numerical experiments and discussions<br />
This section presents the numerical results <strong>for</strong> five benchmark <strong>in</strong>verse problems, which are adapted from<br />
Hansen's popular MATLAB package Regularization Tools [17] and range from mild to severe ill-posedness, to illustrate salient features of the proposed rule. These are Fredholm (or Volterra) integral equations of the first kind with kernel k(s, t) and solution x(t). The discretized linear system takes the form Kx = y^δ and is of size 100 × 100. The regularizing functional is referred to as φ-ψ type; e.g. L¹-TV denotes the one with L¹ data fitting and TV regularization. Table 1 summarizes the major features of these examples, e.g. the degree of ill-posedness, where Cond(K) denotes the condition number of the matrix K; the relevant MATLAB programs are taken from the package. Let ε be the relative noise level; we consider five noise levels, i.e. ε ∈ {5 × 10⁻², 5 × 10⁻³, 5 × 10⁻⁴, 5 × 10⁻⁵, 5 × 10⁻⁶}, graphically differentiated by distinct colors. Unless otherwise specified, the initial guess for the regularization parameter η is η₀ = 1.0 × 10⁻⁸, and the values for the parameter pair (α, β₀) and the constant d are taken to be (0.1, 1 × 10⁻⁴) and 1/3, respectively. The value for α follows from the rule of thumb that α₀′ ≈ 1 works well for full norms in the case of the g-Tikhonov method; it is subsequently scaled by maxᵢ|yᵢ|^{2d} to compensate for the effect of this component. Vector norms are rescaled so that the estimate σ²(η) is directly comparable with the variance σ₀². The nonsmooth minimization problems arising from the L²-TV, L²-L¹ and L¹-TV formulations are solved by the iteratively reweighted least-squares method [18].
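The reweighted least-squares idea can be sketched as follows. This is a minimal generic variant in Python, not the exact semismooth-Newton solver of [18]; the function name, the forward-difference matrix D and the smoothing floor are our illustrative choices:

```python
import numpy as np

def irls_l2_tv(K, y, eta, n_iter=30, floor=1e-6):
    """Minimize 0.5*||K x - y||^2 + eta*||D x||_1 by iteratively
    reweighted least squares: the |.|-term is majorized by a weighted
    quadratic with weights 1/max(|(D x)_i|, floor), refreshed each sweep."""
    n = K.shape[1]
    D = np.diff(np.eye(n), axis=0)            # forward differences: (Dx)_i = x_{i+1} - x_i
    x = np.linalg.lstsq(K, y, rcond=None)[0]  # least-squares initial guess
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(D @ x), floor)
        x = np.linalg.solve(K.T @ K + eta * D.T @ (w[:, None] * D), K.T @ y)
    return x
```

For the L²-L¹ functional one would replace D by the identity; for L¹ data fitting the residual term is reweighted in the same fashion.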
We term the newly proposed choice rule the g-Tikhonov rule to emphasize its intimate connection with the g-Tikhonov functional, and compare it with three other popular heuristic choice rules: the quasi-optimality (QO) criterion, generalized cross-validation (GCV) and the L-curve (LC) criterion [16]. The quasi-optimality criterion requires the differentiability of the inverse solution x_η with respect to η and thus may be unsuitable for nonsmooth functionals, e.g. L²-L¹, while the GCV is not directly amenable to problems with constraints or nonsmooth functionals due to the lack of an explicit formula for computing the 'effective' degrees of freedom of the residual. As for the L-curve, the existence of a 'corner' is in general not guaranteed; moreover, for regularizing functionals other than the L²-L² type, the L-curve must be sampled at discrete points, and numerically locating the corner from discrete sample points is highly nontrivial.
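For the standard L²-L² functional, by contrast, the GCV function of [13] does have an explicit SVD representation; a minimal sketch (the function name and the sampling grid etas are illustrative choices of ours):

```python
import numpy as np

def gcv_curve(K, y, etas):
    """GCV function for min ||K x - y||^2 + eta*||x||^2, via the SVD of K:
    filter factors f_i = s_i^2/(s_i^2 + eta),
    G(eta) = ||(I - A(eta)) y||^2 / trace(I - A(eta))^2."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    beta = U.T @ y          # coefficients of y in the left singular basis
    n = len(y)
    G = np.empty(len(etas))
    for j, eta in enumerate(etas):
        f = s**2 / (s**2 + eta)
        G[j] = np.sum(((1.0 - f) * beta)**2) / (n - np.sum(f))**2
    return G
```

The flatness of G near its minimizer, visible later in Figure 2(c), is precisely what makes its numerical minimization fragile.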
5.1 Case 1: L²-L²
Example 1 (Shaw's problem [17]). The functions k and x are given by k(s, t) = (cos s + cos t)² (sin u / u)² with u(s, t) = π(sin s + sin t) and x(t) = 2e^{−6(t−4/5)²} + e^{−2(t+1/2)²}, respectively, and the integration interval is [−π/2, π/2]. The data is contaminated by additive Gaussian noise, i.e.

    yᵢ^δ = yᵢ + max_{1≤i≤100}{|yᵢ|} ε ξᵢ,   1 ≤ i ≤ 100,

where ξᵢ is a standard Gaussian random variable, and ε refers to the relative noise level. The variance σ₀² is related to the noise level ε by σ₀ = max_{1≤i≤100}{|yᵢ|} ε. The parameter α is taken to be α = 1.
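For reference, a discretization of this example together with the noise model might look as follows; this is a simplified midpoint-rule stand-in for the shaw routine of Regularization Tools (the quadrature and function names are our choices):

```python
import numpy as np

def shaw(n=100):
    """Midpoint-rule discretization of Example 1:
    k(s,t) = (cos s + cos t)^2 (sin u / u)^2, u = pi*(sin s + sin t),
    on [-pi/2, pi/2]; np.sinc handles the removable singularity u = 0."""
    h = np.pi / n
    s = -np.pi / 2 + (np.arange(n) + 0.5) * h
    S, T = np.meshgrid(s, s, indexing="ij")
    U = np.pi * (np.sin(S) + np.sin(T))
    K = h * (np.cos(S) + np.cos(T))**2 * np.sinc(U / np.pi)**2
    x = 2 * np.exp(-6 * (s - 0.8)**2) + np.exp(-2 * (s + 0.5)**2)
    return K, x, K @ x

def add_noise(y, eps, rng):
    """y_i^delta = y_i + max_i|y_i|*eps*xi_i, so sigma_0 = max_i|y_i|*eps."""
    sigma0 = np.max(np.abs(y)) * eps
    return y + sigma0 * rng.standard_normal(y.shape), sigma0
```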
The automatically determined value of the regularization parameter η depends on the realization of the random noise, and is thus itself random. For Example 1, the probability density p(η) is estimated from 1000 samples with a kernel density estimation technique on a logarithmic scale [19], and it is shown in
Figure 1: Density of (a) η, (b) e, and (c) σ² for Example 1.
Table 2: Numerical results for Example 1.

ε     σ₀²      σ²_GCV   σ²_GT    η_QO     η_LC     η_GCV    η_GT     λ_GT    e_QO     e_LC     e_GCV    e_GT
5e-6  3.34e-10 2.48e-10 3.24e-10 2.87e-8  5.81e-10 2.30e-8  4.74e-7  1.01e0  3.26e-2  3.67e-2  3.28e-2  3.32e-2
5e-5  3.31e-8  2.51e-8  2.60e-8  9.87e-6  6.27e-8  9.97e-7  8.83e-6  1.01e0  4.55e-2  5.60e-2  4.09e-2  4.53e-2
5e-4  3.31e-6  2.52e-6  3.20e-6  3.68e-5  3.02e-6  1.70e-5  2.20e-4  1.01e0  5.33e-2  9.38e-2  5.86e-2  5.80e-2
5e-3  3.31e-4  2.54e-4  2.72e-4  1.08e-2  1.53e-4  4.58e-4  4.34e-3  1.03e0  1.52e-1  7.48e-2  8.02e-2  1.35e-1
5e-2  3.31e-2  2.52e-2  2.99e-2  1.02e-2  7.44e-3  1.16e-2  1.08e-1  1.13e0  1.60e-1  1.58e-1  1.61e-1  1.88e-1
Figure 1(a). Here the dash-dotted, dotted, dashed and solid curves refer to the results given by η_QO, η_LC, η_GCV and η_GT, respectively. For medium noise levels, all methods except the GCV work very well; however, the variations of η_QO and η_LC are larger than that of η_GT. The GCV fails for about 10% of the samples, signified by η_GCV taking very small values, e.g. 1 × 10⁻²⁰. This phenomenon occurs irrespective of the noise level, and is attributed to the fact that the GCV curve is very flat [16], see e.g. Figure 2(c). On average, η_LC decays to zero faster than η_QO and η_GT as the noise level σ₀² tends to zero. Thus for low noise levels η_LC often takes very small values and spans a broad range, which renders its solution plagued with spurious oscillations, see e.g. the red dotted curve in Figure 1(b). This observation concurs with previous theoretical and numerical results of Hanke [14], which suggest that the L-curve criterion may suffer from nonconvergence in the case of smooth solutions. The quasi-optimality criterion also fails occasionally, as indicated by the long tail, despite its overall robustness. Therefore, the newly proposed choice rule is more robust than the other three. The inverse solution x_{η*} is also random. We utilize the accuracy error e defined below as the error metric:

    e = ‖x_{η*} − x‖_{L²}.

The probability density p(e) of the accuracy error e is shown in Figure 1(b). The accuracy errors e_QO and e_GT are very similar despite the apparent discrepancies in the regularization parameters, whereas e_LC and e_GCV vary very broadly, especially at low noise levels, although the variation of e_LC is much milder than that of e_GCV. The estimates σ²_GCV and σ²_GT are practically identical, see Figure 1(c), which qualifies σ²_GT as an estimator of the variance. Interestingly, σ²_GCV can slightly under-estimate the noise level σ₀² compared with σ²_GT due to the exceedingly small regularization parameters chosen by the GCV.
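The densities above can be reproduced with a standard Gaussian kernel density estimator applied to log₁₀ of the samples; a minimal sketch, where Silverman's rule of thumb for the bandwidth is our assumption (the estimator of [19] may differ):

```python
import numpy as np

def kde_log10(samples, grid):
    """Gaussian KDE of z = log10(samples), evaluated at log10(grid);
    bandwidth h from Silverman's rule of thumb."""
    z = np.log10(samples)
    h = 1.06 * np.std(z) * len(z) ** (-0.2)
    d = (np.log10(grid)[:, None] - z[None, :]) / h
    return np.exp(-0.5 * d * d).sum(axis=1) / (len(z) * h * np.sqrt(2 * np.pi))
```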
Figure 2: (a) convergence of σ², λ, η and e, (b) solution, and (c) GCV curve for Example 1 with ε = 5%.
Figure 3: Density of (a) η, (b) e, and (c) σ² for Example 2.
Next we investigate the convergence of Algorithm I for a particular realization of the noise. The numerical results are summarized in Figure 2 and Table 2. The algorithm converges within five iterations, and thus merits a fast convergence. Moreover, the convergence is rather steady, and a few extra iterations would not deteriorate the inverse solution. The estimate σ²(η) changes very little during the iteration process, and a striking convergence within one iteration is observed, concurring with previous numerical findings for severely ill-posed problems [19]. The convergence of the estimate σ²(η) is monotone, substantiating the remark after Theorem 4.1. The estimate σ²_GCV also approximates σ₀² reasonably well, but it is less accurate than σ²_GT, see Table 2. The prefactor λ remains almost unchanged as the noise level varies, see Table 2, and thus η_GT is indeed proportional to φ(x, y^δ)^{1−d} ≈ σ₀^{2(1−d)} = σ₀^{4/3}. The numerical solution remains accurate and stable for ε up to 5%, see Figure 2(b).
5.2 Case 2: L²-H² with constraint
Example 2 (1D gravity surveying [17] with nonnegativity constraint). The functions k and x are given by k(s, t) = (1/4)(1/16 + (s − t)²)^{−3/2} and x(t) = sin(πt) + (1/2) sin(2πt), respectively, and the integration interval is [0, 1]. The constrained optimization problems are solved by the built-in MATLAB function quadprog.
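Outside MATLAB, the same constrained quadratic problem can be sketched with a simple projected-gradient iteration; this is a stand-in for quadprog, with the step size, iteration count and function name as our illustrative choices:

```python
import numpy as np

def nonneg_tikhonov(K, y, eta, L=None, n_iter=2000):
    """Solve min_x ||K x - y||^2 + eta*||L x||^2 subject to x >= 0
    by projected gradient descent with a fixed 1/Lipschitz step."""
    n = K.shape[1]
    L = np.eye(n) if L is None else L
    H = 2.0 * (K.T @ K + eta * L.T @ L)   # Hessian of the quadratic objective
    b = 2.0 * K.T @ y
    step = 1.0 / np.linalg.norm(H, 2)     # spectral norm = Lipschitz constant
    x = np.zeros(n)
    for _ in range(n_iter):
        x = np.maximum(0.0, x - step * (H @ x - b))  # gradient step + projection
    return x
```

For the L²-H² functional, L would be a discrete second-derivative matrix instead of the identity.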
The presence of the constraint rules out the use of the quasi-optimality criterion and the GCV, and it can also distort the shape of the L-curve so greatly that a corner does not appear at all, e.g. in the case of the L²-L² functional. There does exist a distinct corner on the curve for the L²-H² functional, see Figure 4(c); however, it is numerically difficult to locate due to the lack of monotonicity and the discrete nature of the sampling points. This causes the frequent failure of the MATLAB functions corner and l_corner provided by the package, and visual inspection is required. The inconvenience persists for the remaining examples, and thus we do not investigate the statistical performance of the L-curve criterion via the relevant probability densities. The results for the L-curve criterion are obtained by manually locating the corner.
The presence of constraints, however, poses no difficulty to the proposed rule. Analogously to Example 1, η_GT and e_GT are narrowly distributed, see Figures 3(a) and (b), which clearly illustrates the excellent scalability of the rule with respect to the noise level. The estimate σ²_GT peaks around the exact variance σ₀², and always retains the correct magnitude, see Figure 3(c). Typical numerical results for Example 2 with ε = 5% are presented in Figure 4. A fast and steady convergence of the algorithm within five iterations is again observed, see Figure 4(a), and similar convergence behavior is observed for other noise levels. In contrast, in order for the L-curve to be representative, many points on the curve must be sampled, which effectively diminishes its computational efficiency. The numerical solution is in good agreement with the exact one, see Figure 4(b) and Table 3. The accuracy error e_GT improves steadily as the noise level ε decreases, and it compares very favorably with e_LC, for which a nonconvergence phenomenon is observed. The prefactor λ changes very little as the noise level ε varies, and thus η_GT decays at a rate commensurate with σ_GT^{4/3}, see also Figure 3(a).
Table 3: Numerical results for Example 2.

ε     σ₀²      σ²_GT    η_LC     η_GT     λ_GT    e_LC     e_GT
5e-6  1.14e-9  8.62e-10 7.20e-12 3.72e-10 4.11e-4 1.71e-1  4.84e-4
5e-5  1.14e-7  8.67e-8  7.91e-11 8.06e-9  4.12e-4 1.11e-1  7.83e-4
5e-4  1.14e-5  8.57e-6  9.54e-9  1.74e-7  4.15e-4 6.86e-2  2.49e-2
5e-3  1.14e-3  8.58e-4  5.18e-7  3.81e-6  4.24e-4 1.65e-2  7.20e-3
5e-2  1.14e-1  8.69e-2  6.25e-5  1.04e-4  5.32e-4 1.10e-1  4.12e-2
Figure 4: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 2 with ε = 5%.
5.3 Case 3: L²-TV
Example 3 (numerical differentiation, adapted from deriv2 [16]). The functions k and x are given by

    k(s, t) = s(t − 1) for s < t and k(s, t) = t(s − 1) for s ≥ t,
    x(t) = 1 for 1/3 < t ≤ 2/3 and x(t) = 0 otherwise,

respectively, and the integration interval is [0, 1]. The constant α is taken to be 5 × 10⁻³, and η₀ is 1 × 10⁻⁶. The exact solution x is piecewise constant, and thus TV regularization is suitable.
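A midpoint-rule discretization of this kernel can be sketched as follows; this is a simplified stand-in for the deriv2 routine (the quadrature and function name are our choices):

```python
import numpy as np

def deriv2_like(n=100):
    """Midpoint-rule discretization of Example 3:
    k(s,t) = s(t-1) for s < t and t(s-1) for s >= t, on [0,1];
    x(t) = 1 on (1/3, 2/3] and 0 otherwise."""
    h = 1.0 / n
    t = (np.arange(n) + 0.5) * h
    S, T = np.meshgrid(t, t, indexing="ij")
    K = h * np.where(S < T, S * (T - 1.0), T * (S - 1.0))
    x = np.where((t > 1.0 / 3.0) & (t <= 2.0 / 3.0), 1.0, 0.0)
    return K, x, K @ x
```

The kernel is the (sign-flipped) Green's function of the second derivative with homogeneous Dirichlet conditions, whose singular values decay only algebraically, hence the mild ill-posedness.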
The regularization parameter η_GT is narrowly distributed, and the accuracy error e_GT is mostly comparable with the noise level, see Figures 5(a) and (b), respectively. The sample mean of the estimate σ²_GT(η) agrees very well with the exact value σ₀², see Figure 5(c). For instance, in the case ε = 5%, the mean 1.23 × 10⁻⁵ almost coincides with the exact value. Typical numerical results for Example 3 are summarized in Figure 6 and Table 4. The L-curve has only an ambiguous corner, and the ambiguity persists even for very low noise levels, e.g. ε = 5 × 10⁻⁶. Nevertheless, the regularization parameters chosen by the two rules are comparable, and the numerical results are practically identical, see Table 4. Algorithm I converges within five iterations with an empirical asymptotic convergence rate r* < 0.15 for all five noise levels; moreover, it tends to accelerate as σ₀² decreases, e.g. r* ≈ 0.05 for ε = 5 × 10⁻⁶. Therefore, the algorithm is computationally very efficient. The reconstructed profile remains accurate and stable for ε up to 5%, see Figure 6(b) and Table 4. Note that it exhibits the typical
Figure 5: Density of (a) η, (b) e and (c) σ² for Example 3.
Table 4: Numerical results for Example 3.

ε     σ₀²      σ²_GT    η_LC     η_GT     λ_GT    e_LC     e_GT
5e-6  1.19e-13 8.21e-14 6.43e-12 4.70e-10 2.49e-1 1.26e-3  5.77e-4
5e-5  1.19e-11 8.79e-12 3.59e-9  1.06e-8  2.49e-1 5.01e-3  9.38e-3
5e-4  1.19e-9  8.65e-10 1.26e-7  2.25e-7  2.48e-1 4.34e-2  4.93e-2
5e-3  1.19e-7  8.60e-8  2.01e-6  4.78e-6  2.45e-1 8.68e-2  8.08e-2
5e-2  1.19e-5  8.89e-6  7.05e-5  1.08e-4  2.51e-1 1.05e-1  9.69e-2
Figure 6: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 3 with ε = 5%.
staircases of TV regularization.
5.4 Case 4: L²-L¹
Table 5: Numerical results for Example 4.

ε     σ₀²      σ²_GT    η_LC     η_GT     λ_GT    e_LC     e_GT
5e-6  5.73e-12 3.59e-12 2.85e-9  3.90e-8  1.66e0  4.31e-4  5.64e-4
5e-5  5.73e-10 3.81e-10 1.00e-8  8.74e-7  1.66e0  1.05e-2  6.70e-3
5e-4  5.73e-8  3.94e-8  1.87e-7  1.92e-5  1.66e0  7.08e-2  3.12e-2
5e-3  5.73e-6  4.10e-6  2.85e-5  4.25e-4  1.66e0  1.88e-1  5.91e-2
5e-2  5.73e-4  4.27e-4  2.85e-3  9.45e-3  1.67e0  1.51e-1  6.88e-2
Figure 8: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 4 with ε = 5%.
all five noise levels, and thus η_GT scales as σ_GT^{4/3}. The solution shows the feature of sparsity-promoting L¹ regularization: the locations of all three small spikes are perfectly detected, and the retrieved magnitudes are reasonably accurate.
5.5 Case 5: L¹-TV
Example 5 (Deblurring a 1D image). The functions k and x are given by k(s, t) = (1/(4√(2π))) e^{−(s−t)²/32} χ_{|s−t|<
Table 6: Numerical results for Example 5.

ε     q    σ₀²      σ²_GT    η_LC     η_GT     λ_GT    e_LC     e_GT
1e-4  30%  2.18e-5  1.65e-5  5.34e-2  2.02e-2  4.97e0  4.03e-7  2.96e-7
1e-3  30%  2.18e-4  1.64e-4  8.11e-2  6.38e-2  4.97e0  5.93e-7  4.68e-7
1e-2  30%  2.18e-3  1.64e-3  4.33e-1  2.02e-1  4.97e0  8.18e-5  2.88e-6
1e-1  30%  2.18e-2  1.64e-2  1.00e0   6.38e-1  4.97e0  2.93e-3  5.62e-4
1e0   30%  2.18e-1  1.65e-1  1.52e0   2.02e0   4.97e0  9.42e-3  1.63e-2
1e-4  50%  3.63e-5  3.27e-5  1.23e-1  2.84e-2  4.97e0  1.74e-4  3.77e-4
1e-3  50%  3.63e-4  3.27e-4  1.23e-1  9.00e-2  4.97e0  1.03e-3  1.20e-3
1e-2  50%  3.63e-3  3.28e-3  6.58e-1  2.85e-1  4.97e0  1.68e-2  8.58e-3
1e-1  50%  3.63e-2  3.28e-2  6.58e-1  9.01e-1  4.97e0  2.69e-2  3.53e-2
1e0   50%  3.63e-1  3.27e-1  1.00e0   2.85e0   4.97e0  3.81e-2  6.39e-2
Figure 10: (a) convergence of σ², λ, η and e, (b) solution, and (c) L-curve for Example 5 with ε = 1 and q = 50%.
e.g. [2, 6] in the case ε = 1 and q = 50%, see Figure 9(a), where the solid and dashed curves refer to q = 50% and q = 30%, respectively. However, the accuracy error e varies broadly, albeit mostly remaining very small. This is inherent to the L¹-TV functional [9], for which η plays the role of a characteristic parameter: at some specific values the profile of the solution undergoes a sudden transition. The mean of the estimate σ²_GT approximates the exact noise level σ₀² excellently, e.g. 3.62 × 10⁻² in the case ε = 1 × 10⁻¹ and q = 50%, which coincides with the exact noise level, see Table 6. Therefore, the estimate σ²_GT is a valid variance estimate also for non-Gaussian noise. Note that the estimate σ²_GT for q = 30% is generally less accurate than that for q = 50% due to the fewer effective samples available, as indicated by the looser probability bound. Interestingly, the numerical results of the former are far more accurate than those of the latter, usually by one order of magnitude, which clearly illustrates the profound effect of the corruption percentage q on the inverse solution.
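The role of the corruption percentage q can be made concrete with a generic impulsive-noise sketch; the exact noise model of Example 5 is described elsewhere in the paper, so this routine, its name, and the uniform-impulse choice are assumptions of ours for illustration only:

```python
import numpy as np

def add_impulsive_noise(y, eps, q, rng):
    """Corrupt a random fraction q of the entries of y by uniform impulses
    of magnitude up to eps*max|y_i| (a generic impulsive-noise model,
    not necessarily the one used in Example 5)."""
    yd = y.copy()
    hit = rng.random(y.shape) < q          # each entry corrupted with prob. q
    amp = eps * np.max(np.abs(y))
    yd[hit] += amp * rng.uniform(-1.0, 1.0, size=int(hit.sum()))
    return yd
```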
Exemplary numerical results for Example 5 with ε = 1 and q = 50% noise in the data are presented in Figure 10. The L-curve does not exhibit the typical L-shape, and there is no distinct corner, see Figure 10(c), which renders it ambiguous in practical use. This concurs with previous theoretical and numerical observations in the context of image denoising [9]. Moreover, the curve lacks concavity and monotonicity due to the limited precision of the solver for the optimization problem, and many points cluster around the 'corner', which renders its accurate numerical location very challenging. In practice, the discontinuity points, as indicated by the small steps along the curve, may serve to select a regularization parameter, and here the lower step is selected. The results presented in Table 6 for the L-curve criterion are obtained with this rule of thumb. The numerical results are accurate, and thus this ad hoc rule seems viable. A steady and fast convergence of the algorithm is also observed, and, taking into account the large amount of noise in the data, the converged profile approximates the exact one excellently.
6 Concluding remarks
This paper proposes and analyzes a new rule for choosing the regularization parameter in Tikhonov regularization. The existence of solutions to the regularization parameter equation is shown, a variational characterization is provided, and a posteriori error estimates of the inverse solution are derived. An effective iterative algorithm is suggested, and its monotone convergence is investigated. Results for five regularizing formulations, i.e. L²-L², L²-H² with constraint, L²-TV, L²-L¹ and L¹-TV, on several benchmark examples are presented to illustrate its numerical features. The numerical results indicate that the proposed rule is competitive with existing heuristic choice rules, e.g. the L-curve criterion and generalized cross-validation, and that the algorithm merits a fast and steady convergence. Unlike most existing choice rules, a solid mathematical justification is established for the new rule.
There are several avenues for future research, which are currently under investigation. First, for large-scale inverse problems, iterative regularization methods, e.g. the Landweber method, the conjugate gradient method and the generalized minimal residual method, are of immense practical interest, and it is of great practical significance to develop analogous choice rules for them. Secondly, some applications require multiple regularization terms in order to preserve several distinct features simultaneously, and consequently call for multiple regularization parameters; extending the idea to this context would be interesting.
Acknowledgements
The authors would like to thank Professor Per Christian Hansen for his MATLAB package Regularization Tools, with which some of our numerical experiments were conducted.
References
[1] Archer G, Titterington DM. On some Bayesian/regularization methods for image restoration. IEEE Transactions on Image Processing 1995;4(7): 989–995.
[2] Aubert G, Aujol JF. A variational approach to remove multiplicative noise. SIAM Journal on Applied Mathematics 2008;68(4): 925–946.
[3] Bauer F, Kindermann S. The quasi-optimality criterion for classical inverse problems. Inverse Problems 2008;24(3): 035002.
[4] Banks HT, Kunisch K. Estimation Techniques for Distributed Parameter Systems. Birkhäuser: Boston; 1989.
[5] Bazán FSV. Fixed point iterations in determining the Tikhonov regularization parameter. Inverse Problems 2008;24(3): 035001.
[6] Beck JV, Blackwell B, Clair CRS. Inverse Heat Conduction: Ill-Posed Problems. Wiley: New York; 1985.
[7] Besag J. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, Series B (Methodological) 1986;48(3): 259–302.
[8] Burger M, Osher S. Convergence rates of convex variational regularization. Inverse Problems 2004;20(5): 1411–1421.
[9] Chan TF, Esedoḡlu S. Aspects of total variation regularized L¹ function approximation. SIAM Journal on Applied Mathematics 2005;65(5): 1817–1837.
[10] Chan TF, Shen J. Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. SIAM: Philadelphia; 2005.
[11] Engl HW, Hanke M, Neubauer A. Regularization of Inverse Problems. Kluwer: Dordrecht; 1996.
[12] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis (2nd ed.). Chapman & Hall: Boca Raton; 2004.
[13] Golub GH, Heath MT, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 1979;21(2): 215–223.
[14] Hanke M. Limitations of the L-curve method in ill-posed problems. BIT 1996;36(2): 287–301.
[15] Hansen PC. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review 1992;34(4): 561–580.
[16] Hansen PC. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM: Philadelphia; 1998.
[17] Hansen PC. Regularization Tools version 4.0 for Matlab 7.3. Numerical Algorithms 2007;46(2): 189–194. Software available at: http://www2.imm.dtu.dk/∼pch/Regutools/index.html.
[18] Hintermüller M, Ito K, Kunisch K. The primal-dual active set strategy as a semismooth Newton method. SIAM Journal on Optimization 2003;13(3): 865–888.
[19] Jin B, Zou J. A Bayesian inference approach to the ill-posed Cauchy problem of steady-state heat conduction. International Journal for Numerical Methods in Engineering, in press. Available at: http://doi.wiley.com/10.1002/nme.2350.
[20] Johnston PR, Gulrajani RM. An analysis of the zero-crossing method for choosing regularization parameters. SIAM Journal on Scientific Computing 2002;24(2): 428–442.
[21] Kunisch K, Zou J. Iterative choices of regularization parameters in linear inverse problems. Inverse Problems 1998;14(5): 1247–1264.
[22] Leonov AS. Regularization of ill-posed problems in Sobolev space W¹₁. Journal of Inverse and Ill-Posed Problems 2005;13(6): 595–619.
[23] Natterer F. The Mathematics of Computerized Tomography. Teubner: Stuttgart; 1986.
[24] Nikolova M. Minimizers of cost-functions involving non-smooth data-fidelity terms: application to the processing of outliers. SIAM Journal on Numerical Analysis 2002;40(3): 965–994.
[25] Morozov VA. On the solution of functional equations by the method of regularization. Soviet Mathematics Doklady 1966;7(3): 414–417.
[26] Regińska T. A regularization parameter in discrete ill-posed problems. SIAM Journal on Scientific Computing 1996;17(3): 740–749.
[27] Thompson AM, Kay J. On some choices of regularization parameter in image restoration. Inverse Problems 1993;9(6): 749–761.
[28] Tikhonov AN, Arsenin VY. Solutions of Ill-posed Problems. Winston & Sons: Washington; 1977.
[29] Tikhonov AN, Glasko V, Kriksin Y. On the question of quasi-optimal choice of a regularized approximation. Soviet Mathematics Doklady 1979;20(5): 1036–1040.
[30] Vogel CR. Computational Methods for Inverse Problems. SIAM: Philadelphia; 2002.
[31] Wahba G. Spline Models for Observational Data. SIAM: Philadelphia; 1990.
[32] Xie JL, Zou J. An improved model function method for choosing regularization parameters in linear inverse problems. Inverse Problems 2002;18(3): 631–643.