
Optimization problems in compressed sensing

Jalal Fadili
CNRS, ENSI Caen, France

Applied mathematics seminar, Stanford, July 2008


Today's talk is about ...

Compressed sensing.
Sparse representations.
Convex analysis and operator splitting.
Non-smooth optimization.
Monotone operator splitting.
Fast algorithms.




Compressed/ive Sensing

Common wisdom: Shannon sampling theorem and bandlimited signals.

The CS theory asserts that one can recover certain (sparse) signals and images from far fewer measurements m than data samples n.

The CS acquisition scenario:

    y = H x = H Φ α,   decoded by ∆_Φ(y),

with y: m × 1 (measurements), H: m × n (sensing), Φ: n × L (dictionary), x: n × 1 (signal/image), α: L × 1 (coefficients).




Compressed/ive Sensing (cont'd)

CS relies on two tenets:

Sparsity (compressibility).

Incoherence: the sensing vectors are as different as possible from the sparsity waveforms (ϕ_j)_{j=1,...,L}.

The CS decoder ∆_Φ(y): R^m → R^L proposes to recover the signal/image by solving the non-linear program

    (P_eq):  min ‖α‖_l1  s.t.  y = HΦα.
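To make the decoder concrete, here is a minimal numerical sketch, assuming numpy and scipy are available; the dimensions, the Gaussian sensing matrix and the choice Φ = I are illustrative, not the talk's setup. (P_eq) is recast as the classical linear program over the positive and negative parts of α:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, s = 128, 48, 5                            # ambient dim, measurements, sparsity

alpha = np.zeros(n)                             # s-sparse coefficient vector
alpha[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n)) / np.sqrt(m)    # A = H Phi, random Gaussian sensing
y = A @ alpha

# (P_eq): min ||a||_1 s.t. y = A a, as an LP with a = u - v, u, v >= 0
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * n), method="highs")
alpha_hat = res.x[:n] - res.x[n:]
print("exact recovery?", np.allclose(alpha_hat, alpha, atol=1e-5))
```

With these sizes, recovery is typically exact, in line with the sampling bound on the next slide.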


Construct H: Random Sensing

Random Sensing: α is s-sparse and we are given m measurements selected uniformly at random from an ensemble. If m ≥ C s µ²_{H,Φ} log n, then minimizing (P_eq) reconstructs α exactly with overwhelming probability.

Compressible signals/images and l2−l1 instance optimality: the (P_eq) solution recovers the s largest entries.

Stability to noise y = HΦα + ε: the decoder ∆_{Φ,σ}(y) solving

    (P_σ):  min ‖α‖_l1  s.t.  ‖y − HΦα‖_l2 ≤ σ

has l2−l1 instance optimality up to a factor of the noise std σ.
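The coherence µ_{H,Φ} entering the bound can be computed directly. A small sketch, assuming the common normalization in which µ ∈ [1, √n]; it checks the textbook fact that the Fourier/Dirac pair is maximally incoherent:

```python
import numpy as np

n = 64
H = np.fft.fft(np.eye(n)) / np.sqrt(n)   # orthonormal Fourier sensing basis
Phi = np.eye(n)                          # Dirac (identity) sparsity basis

# one common convention: mu = sqrt(n) * max_{k,j} |<h_k, phi_j>|, in [1, sqrt(n)]
mu = np.sqrt(n) * np.abs(H.conj() @ Phi).max()
print(mu)   # ~ 1: Fourier and Dirac are maximally incoherent
```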


Convex analysis and operator splitting


Class of problems in CS

Recall y = Hx + ε, with ε iid, Var(ε) = σ_ε² < +∞, x = Φα, α (nearly) sparse.

Typical (equivalent) minimization problems:

    (P_eq):  min Ψ(α)  s.t.  y = HΦα
    (P_σ):   min Ψ(α)  s.t.  ‖y − HΦα‖_l2 ≤ σ
    (P_τ):   min (1/2)‖y − HΦα‖²_l2  s.t.  Ψ(α) ≤ τ
    (P_λ):   min (1/2)‖y − HΦα‖²_l2 + λΨ(α)

(P_eq) is (P_σ) when there is no noise.

Ψ(α) = Σ_{γ∈Γ} ψ(α_γ), with ψ a sparsity-promoting penalty: non-negative, continuous, even-symmetric, non-decreasing on R_+, and not necessarily smooth at zero, so as to produce sparse solutions. E.g. Ψ(α) = ‖α‖_l1.




Class of problems in CS (cont'd)

(P_σ), (P_τ) and (P_λ) can all be cast as:

    (P1):  min_α  f_1(α) + f_2(α),

where f_1 and f_2 are proper lsc convex functions.

(P_λ):  min (1/2)‖y − HΦα‖²_l2 + λΨ(α)
        f_1(α) = (1/2)‖y − HΦα‖²_l2,   f_2(α) = λΨ(α)

(P_σ):  min Ψ(α)  s.t.  ‖y − HΦα‖_l2 ≤ σ
        f_1(α) = Ψ(α),   f_2(α) = ı_{B_{l2,σ}}(α)

(P_τ):  min (1/2)‖y − HΦα‖²_l2  s.t.  Ψ(α) ≤ τ
        f_1(α) = (1/2)‖y − HΦα‖²_l2,   f_2(α) = ı_{B_{Ψ,τ}}(α)




Characterization

Theorem 1

(i) Existence: (P1) possesses at least one solution if f_1 + f_2 is coercive, i.e. lim_{‖α‖→+∞} f_1(α) + f_2(α) = +∞.

(ii) Uniqueness: (P1) possesses at most one solution if f_1 + f_2 is strictly convex. This occurs in particular when either f_1 or f_2 is strictly convex.

(iii) Characterization: let α ∈ H. Then the following statements are equivalent:
    (a) α solves (P1).
    (b) α = (I + ∂(f_1 + f_2))^{-1}(α)  (a proximal fixed point).

∂f_i is the subdifferential (a set-valued map), a maximal monotone operator.

J_{∂(f_1+f_2)} = (I + ∂(f_1 + f_2))^{-1} is the resolvent of ∂(f_1 + f_2) (a firmly nonexpansive operator).

Explicit inversion is difficult in general.
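The firm nonexpansiveness of the resolvent can be checked numerically. A sketch assuming f = ‖·‖_l1, whose resolvent is the soft-thresholding operator introduced on the following slides:

```python
import numpy as np

rng = np.random.default_rng(1)
soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)  # J_{df}, f = ||.||_1

for _ in range(1000):
    x, z = rng.standard_normal(8), rng.standard_normal(8)
    d = soft(x) - soft(z)
    # firm nonexpansiveness: ||Jx - Jz||^2 <= <Jx - Jz, x - z>
    assert d @ d <= d @ (x - z) + 1e-12
print("firm nonexpansiveness holds on all random draws")
```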




Operator splitting schemes

Idea: replace the explicit evaluation of the resolvent of ∂(f_1 + f_2) (i.e. J_{∂(f_1+f_2)}) by a sequence of calculations involving only ∂f_1 and ∂f_2, one at a time.

An extensive literature, essentially divided into three classes:

Splitting method            Assumptions                               Iteration
Forward-Backward            f_2 has a Lipschitz-continuous gradient   α_{t+1} = J_{µ∂f_1}(I − µ∇f_2)(α_t)
Backward-Backward           f_1, f_2 proper lsc convex, but ...       α_{t+1} = J_{∂f_1} J_{∂f_2}(α_t)
Douglas/Peaceman-Rachford   all f_1, f_2 proper lsc convex            α_{t+1} = (J_{∂f_1}(2J_{∂f_2} − I) + I − J_{∂f_2})(α_t)

J_{∂f_i} = (I + ∂f_i)^{-1} are the Moreau(-Yosida) proximity operators.
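As a concrete instance of the Forward-Backward row, here is a minimal sketch for (P_λ) with Ψ = ‖·‖_l1, where J_{µ∂f_1} is soft-thresholding (the prox computed on the next slides); function names are illustrative and the step size follows the µ < 2/‖A‖² requirement:

```python
import numpy as np

def soft(x, t):
    """prox of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(A, y, lam, n_iter=500):
    """FB iteration a_{t+1} = J_{mu df1}(I - mu grad f2)(a_t) for
    (P_lambda): min 0.5*||y - A a||_2^2 + lam*||a||_1."""
    mu = 1.0 / np.linalg.norm(A, 2) ** 2      # safe step size, < 2/||A||^2
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a = soft(a + mu * (A.T @ (y - A @ a)), mu * lam)   # backward o forward
    return a
```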


Proximity operators

Some properties:

The prox is uniquely valued.

The Moreau envelope f_µ(x) = min_z f(z) + (1/(2µ))‖x − z‖²_l2 is Fréchet-differentiable with Lipschitz-continuous gradient ∇f_µ(x) = (x − prox_{µf}(x))/µ, hence the name Moreau-Yosida regularization.

Many useful rules in proximal calculus (scaling, translation, quadratic perturbation, etc.).
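A quick numerical illustration of the differentiability claim, sketched for the scalar f = |·| (whose Moreau envelope is the Huber function), using the standard identity ∇f_µ(x) = (x − prox_{µf}(x))/µ:

```python
import numpy as np

mu = 1.0
prox = lambda x: np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)    # prox_{mu |.|}
env = lambda x: np.abs(prox(x)) + (x - prox(x)) ** 2 / (2 * mu)  # Moreau envelope

x = np.linspace(-3, 3, 6001)
num_grad = np.gradient(env(x), x)                # numerical gradient
# the envelope's gradient is (x - prox(x))/mu: smooth, (1/mu)-Lipschitz
assert np.allclose(num_grad, (x - prox(x)) / mu, atol=1e-3)
```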


Example of proximity operator

[Figure: examples of proximity operators.]


Compressed sensing optimization problems


Problem statement

y = HΦα + ε,  Var(ε) = σ_ε².

Consider the following minimization problems:

    (P_σ):   min Ψ(α)  s.t.  ‖y − HΦα‖_l2 ≤ σ
    (P_eq):  min Ψ(α)  s.t.  y = HΦα
    (P_τ):   min (1/2)‖y − HΦα‖²_l2  s.t.  Ψ(α) ≤ τ

Ψ(α) = Σ_{γ∈Γ} ψ(α_γ), with ψ a sparsity-promoting penalty.


Characterizing Problem (P_τ)


FB to solve Problem (P_τ)

Theorem 1. Let {µ_t, t ∈ N} be a sequence such that 0 < inf_t µ_t ≤ sup_t µ_t < 2/‖HΦ‖², let {β_t, t ∈ N} be a sequence in (0, 1], and let {a_t, t ∈ N} and {b_t, t ∈ N} be sequences in H such that Σ_t ‖a_t‖ < +∞ and Σ_t ‖b_t‖ < +∞. Fix α_0, and define the sequence of iterates

    α_{t+1} = α_t + β_t ( proj_{B_{Ψ,τ}}( α_t + µ_t Φ*H*(y − HΦα_t) − b_t ) + a_t − α_t ),

where proj_{B_{Ψ,τ}} is the projector onto the Ψ-ball of radius τ.

(i) {α_t, t ≥ 0} converges weakly to a minimizer of (P_τ).

(ii) For Ψ(α) = ‖α‖_l1, there exists a subsequence of {α_t, t ≥ 0} that converges strongly to a minimizer of (P_τ).

(iii) The projection operator proj_{B_{Ψ,τ}} is

    proj_{B_{Ψ,τ}}(α) = α               if Ψ(α) ≤ τ,
                        prox_{κΨ}(α)    otherwise,

where κ (depending on α and τ) is chosen such that Ψ(prox_{κΨ}(α)) = τ.
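For Ψ = ‖·‖_l1, point (iii) is the classical projection onto the l1 ball: κ is the soft-threshold level at which the l1 norm hits τ, computable by a sort. A sketch of the standard O(L log L) scheme (names are illustrative):

```python
import numpy as np

def proj_l1_ball(alpha, tau):
    """proj_{B_{Psi,tau}} for Psi = ||.||_1: identity inside the ball, otherwise
    soft-thresholding at the kappa with ||soft(alpha, kappa)||_1 = tau."""
    a = np.abs(alpha)
    if a.sum() <= tau:
        return alpha
    u = np.sort(a)[::-1]                       # sorted magnitudes, descending
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, a.size + 1) > cssv - tau)[0][-1]
    kappa = (cssv[rho] - tau) / (rho + 1.0)    # the threshold from point (iii)
    return np.sign(alpha) * np.maximum(a - kappa, 0.0)
```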


Proximity operators of Ψ

Thresholding/shrinkage operator: e.g. soft-thresholding for the l1 norm.

Computational cost: O(L) for L coefficients.


Characterizing Problem (P_σ)


DR to solve Problem (P_σ)

Theorem 1. Suppose that A = HΦ is a tight frame, i.e. AA* = cI. Let µ ∈ (0, +∞), let {β_t} be a sequence in (0, 2), and let {a_t} and {b_t} be sequences in H such that Σ_t β_t(2 − β_t) = +∞ and Σ_t β_t(‖a_t‖ + ‖b_t‖) < +∞. Fix α_0 ∈ B_{l2,σ} and define the sequence of iterates

    α_{t+1/2} = α_t + c^{-1} A*(P_{B_σ} − I)(Aα_t) + b_t,
    α_{t+1}   = α_t + β_t ( prox_{µΨ}(2α_{t+1/2} − α_t) + a_t − α_{t+1/2} ),

where

    (P_{B_σ} − I)(x) = 0                              if ‖x − y‖_l2 ≤ σ,
                       (1 − σ/‖x − y‖_l2)(y − x)      otherwise,

and prox_{µΨ} is as before.

Then {α_t, t ≥ 0} converges weakly to some point α, and α + c^{-1} A*(P_{B_σ} − I)(Aα) is a solution to (P_σ).
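A compact sketch of this iteration for Ψ = ‖·‖_l1, with the perturbation sequences a_t, b_t set to zero and a real tight frame A with A Aᵀ = cI assumed; names and defaults are illustrative:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)   # prox_{t ||.||_1}

def dr_bpdn(A, y, sigma, mu, c, n_iter=300, beta=1.0):
    """Douglas-Rachford for (P_sigma), unperturbed (a_t = b_t = 0), A A^T = c I."""
    def half_step(a):                 # a + c^{-1} A*(P_{B_sigma} - I)(A a)
        r = A @ a - y
        nr = np.linalg.norm(r)
        if nr <= sigma:
            return a
        return a - (1.0 - sigma / nr) / c * (A.T @ r)
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a_half = half_step(a)
        a = a + beta * (soft(2 * a_half - a, mu) - a_half)
    return half_step(a)               # alpha + c^{-1} A*(P_{B_sigma} - I)(A alpha)
```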


Characterizing Problem (P_eq)

    min Ψ(α)  s.t.  y = HΦα.

Proposition 1

(i) Existence: (P_eq) has at least one solution.

(ii) Uniqueness: (P_eq) has a unique solution if ψ is strictly convex.

(iii) If α solves (P_eq), then it solves (P_{σ=0}).


DR to solve Problem (P_eq)

Theorem 1. Let A = HΦ. Let µ ∈ (0, +∞), let {β_t} be a sequence in (0, 2), and let {a_t} and {b_t} be sequences in H such that Σ_t β_t(2 − β_t) = +∞ and Σ_t β_t(‖a_t‖ + ‖b_t‖) < +∞. Fix α_0 and define the sequence of iterates

    α_{t+1/2} = α_t + A*(AA*)^{-1}(y − Aα_t) + b_t,
    α_{t+1}   = α_t + β_t ( prox_{µΨ}(2α_{t+1/2} − α_t) + a_t − α_{t+1/2} ).

Then {α_t, t ≥ 0} converges weakly to some point α, and α + A*(AA*)^{-1}(y − Aα) is a solution to (P_eq).

prox_{µΨ} is as before.
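The same Douglas-Rachford pattern for (P_eq) with Ψ = ‖·‖_l1: the half step becomes the exact projection onto the affine set {α : Aα = y}. An unperturbed sketch (a_t = b_t = 0), assuming AA* is invertible:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)   # prox_{t ||.||_1}

def dr_bp(A, y, mu, n_iter=300, beta=1.0):
    """Douglas-Rachford for (P_eq), unperturbed (a_t = b_t = 0)."""
    G = np.linalg.inv(A @ A.T)        # (A A*)^{-1}, precomputed once
    proj = lambda a: a + A.T @ (G @ (y - A @ a))   # projection onto {a : A a = y}
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a_half = proj(a)
        a = a + beta * (soft(2 * a_half - a, mu) - a_half)
    return proj(a)                    # alpha + A*(A A*)^{-1}(y - A alpha) solves (P_eq)
```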


Other problems

The same framework can also handle problems with an analysis prior, p ≥ 1:

    min_x ‖Dx‖^p_lp  s.t.  ‖y − Hx‖_l2 ≤ σ
    min_x ‖Dx‖^p_lp  s.t.  y = Hx,

with D a bounded linear operator, e.g. a frame D = Φ*, finite differences for TV regularization, etc.

The proximity operator of ‖Dx‖^p_p is obtained through Legendre-Fenchel conjugacy and the Forward-Backward splitting algorithm.
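A sketch of that last point for p = 1. By Legendre-Fenchel conjugacy, prox_{λ‖D·‖_1}(x) = x − D*u*, where u* minimizes (1/2)‖x − D*u‖² over the box ‖u‖_∞ ≤ λ; this smooth constrained dual is solved below by Forward-Backward, i.e. projected gradient (step size and names are illustrative):

```python
import numpy as np

def prox_analysis_l1(x, D, lam, n_iter=200):
    """prox of lam*||D.||_1 via Forward-Backward on the Fenchel dual:
    min_{||u||_inf <= lam} 0.5 * ||x - D^T u||_2^2."""
    mu = 1.0 / np.linalg.norm(D, 2) ** 2      # step size < 2/||D||^2
    u = np.zeros(D.shape[0])
    for _ in range(n_iter):
        u = np.clip(u + mu * (D @ (x - D.T @ u)), -lam, lam)  # gradient + box projection
    return x - D.T @ u
```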


Pros and cons

(P_σ) and (P_eq) have the same computational complexity: 2 N_iter (V_H + V_Φ + O(L)).

(P_τ): 2 N_iter (V_H + V_Φ) + O(N_iter L log L).

V_H and V_Φ are the complexities of applying the operators H (resp. H*) and Φ (resp. Φ*). V_{H,Φ} is O(n) or O(n log n) for most useful transforms/sensing operators.

(P_eq): no tight-frame requirement, but does not handle noise.

(P_τ): no tight-frame requirement, but τ is difficult to choose.

(P_σ): requires a tight frame, but handles noise and σ is easier to choose.


Stylized applications


CS reconstruction (1)

H = Fourier, Φ = DWT, Ψ(α) = ‖α‖_l1, m/n = 17%.

[Figure: original image; real-Fourier projections (m/n = 0.17, SNR = 30 dB); reconstructions by (P_τ) (LASSO-Prox, 1000 iterations, PSNR = 22.07 dB), (P_eq)-DR (BP-DR, noiseless, 1000 iterations, PSNR = 23.02 dB), and (P_σ)-DR (BPDN-DR, 1000 iterations, PSNR = 22.13 dB).]


CS reconstruction (2)

[Figure: original phantom image; real-Fourier projections (SNR = 35.26 dB); TV-DR on noiseless data (200 iterations, PSNR = 75.46 dB); TVDN-DR on noisy data (200 iterations, PSNR = 40.74 dB).]


Inpainting and CS

H = Dirac, Φ = Curvelets + LDCT, Ψ(α) = ‖α‖_l1, m/n = 20%.




Super-resolution


Computation time

[Figure: computation time (s) vs. log2(m), comparing BP-DR, LARS, LP-Interior Point, and StOMP. Left: CS with H = Fourier, Φ = Dirac, m/n = 20%, sparsity = 5%. Right: CS with H = Hadamard, Φ = Dirac, m/n = 20%, sparsity = 5%.]


Take-away messages

A general framework for optimization problems in CS: maximal monotone operator splitting.

Good and fast solvers for large-scale problems: iterative thresholding.

Grounded theoretical results.

A wide variety of applications beyond CS.


Ongoing and future work

Beyond the linear case with AWGN [FXD-F.-JLS 07-08].

More comparisons with other algorithms.

Work towards algorithms with faster-than-linear convergence.

Other problems, e.g. the Dantzig selector.

Other applications.


Extended experiments, code and paper available soon at
http://www.greyc.ensicaen.fr/~jfadili

Sparsity stuff:
http://www.morphologicaldiversity.org

Thanks. Any questions?
