
Optimization problems in compressed sensing

Jalal Fadili
CNRS, ENSI Caen, France

Applied mathematics seminar, Stanford, July 2008


Today's talk is about ...

Compressed sensing.
Sparse representations.
Convex analysis and operator splitting.
Non-smooth optimization.
Monotone operator splitting.
Fast algorithms.




Compressed/ive Sensing

Common wisdom: Shannon sampling theorem and bandlimited signals.

The CS theory asserts that one can recover certain (sparse) signals and images from far fewer measurements m than data samples n.

The CS acquisition scenario:

    y = H x = H Φ α,   decoded by ∆_Φ(y),

with y: m × 1 (measurements), H: m × n (sensing), Φ: n × L (dictionary), x: n × 1 (signal/image), α: L × 1 (coefficients).




Compressed/ive Sensing (cont'd)

CS relies on two tenets:

Sparsity (compressibility).

Incoherence: the sensing vectors are as different as possible from the sparsity waveforms (ϕ_j)_{j=1,...,L}.

The CS decoder ∆_Φ(y): R^m → R^L proposes to recover the signal/image by solving the non-linear program

    (P_eq):  min ‖α‖_l1  s.t.  y = HΦα.
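To make the decoder concrete, here is a minimal numerical sketch, assuming numpy and scipy are available; the dimensions, the Gaussian sensing matrix and the choice Φ = I are illustrative, not the talk's setup. (P_eq) is recast as the classical linear program over the positive and negative parts of α:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, s = 128, 48, 5                            # ambient dim, measurements, sparsity

alpha = np.zeros(n)                             # s-sparse coefficient vector
alpha[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n)) / np.sqrt(m)    # A = H Phi, random Gaussian sensing
y = A @ alpha

# (P_eq): min ||a||_1 s.t. y = A a, as an LP with a = u - v, u, v >= 0
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * n), method="highs")
alpha_hat = res.x[:n] - res.x[n:]
print("exact recovery?", np.allclose(alpha_hat, alpha, atol=1e-5))
```

With these sizes, recovery is typically exact, in line with the sampling bound on the next slide.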


Construct H: Random Sensing

Random Sensing: α is s-sparse and we are given m measurements selected uniformly at random from an ensemble. If m ≥ C s µ²_{H,Φ} log n, then minimizing (P_eq) reconstructs α exactly with overwhelming probability.

Compressible signals/images and l2−l1 instance optimality: the (P_eq) solution recovers the s largest entries.

Stability to noise y = HΦα + ε: the decoder ∆_{Φ,σ}(y) solving

    (P_σ):  min ‖α‖_l1  s.t.  ‖y − HΦα‖_l2 ≤ σ

has l2−l1 instance optimality up to a factor of the noise std σ.
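The coherence µ_{H,Φ} entering the bound can be computed directly. A small sketch, assuming the common normalization in which µ ∈ [1, √n]; it checks the textbook fact that the Fourier/Dirac pair is maximally incoherent:

```python
import numpy as np

n = 64
H = np.fft.fft(np.eye(n)) / np.sqrt(n)   # orthonormal Fourier sensing basis
Phi = np.eye(n)                          # Dirac (identity) sparsity basis

# one common convention: mu = sqrt(n) * max_{k,j} |<h_k, phi_j>|, in [1, sqrt(n)]
mu = np.sqrt(n) * np.abs(H.conj() @ Phi).max()
print(mu)   # ~ 1: Fourier and Dirac are maximally incoherent
```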


Convex analysis and operator splitting


Class of problems in CS

Recall y = Hx + ε, with ε iid, Var(ε) = σ_ε² < +∞, x = Φα, α (nearly) sparse.

Typical (equivalent) minimization problems:

    (P_eq):  min Ψ(α)  s.t.  y = HΦα
    (P_σ):   min Ψ(α)  s.t.  ‖y − HΦα‖_l2 ≤ σ
    (P_τ):   min (1/2)‖y − HΦα‖²_l2  s.t.  Ψ(α) ≤ τ
    (P_λ):   min (1/2)‖y − HΦα‖²_l2 + λΨ(α)

(P_eq) is (P_σ) when there is no noise.

Ψ(α) = Σ_{γ∈Γ} ψ(α_γ), with ψ a sparsity-promoting penalty: non-negative, continuous, even-symmetric, non-decreasing on R_+, and not necessarily smooth at zero, so as to produce sparse solutions. E.g. Ψ(α) = ‖α‖_l1.




Class of problems in CS (cont'd)

(P_σ), (P_τ) and (P_λ) can all be cast as:

    (P1):  min_α  f_1(α) + f_2(α),

where f_1 and f_2 are proper lsc convex functions.

(P_λ):  min (1/2)‖y − HΦα‖²_l2 + λΨ(α)
        f_1(α) = (1/2)‖y − HΦα‖²_l2,   f_2(α) = λΨ(α)

(P_σ):  min Ψ(α)  s.t.  ‖y − HΦα‖_l2 ≤ σ
        f_1(α) = Ψ(α),   f_2(α) = ı_{B_{l2,σ}}(α)

(P_τ):  min (1/2)‖y − HΦα‖²_l2  s.t.  Ψ(α) ≤ τ
        f_1(α) = (1/2)‖y − HΦα‖²_l2,   f_2(α) = ı_{B_{Ψ,τ}}(α)




Characterization

Theorem 1

(i) Existence: (P1) possesses at least one solution if f_1 + f_2 is coercive, i.e. lim_{‖α‖→+∞} f_1(α) + f_2(α) = +∞.

(ii) Uniqueness: (P1) possesses at most one solution if f_1 + f_2 is strictly convex. This occurs in particular when either f_1 or f_2 is strictly convex.

(iii) Characterization: let α ∈ H. Then the following statements are equivalent:
    (a) α solves (P1).
    (b) α = (I + ∂(f_1 + f_2))^{-1}(α)  (a proximal fixed point).

∂f_i is the subdifferential (a set-valued map), a maximal monotone operator.

J_{∂(f_1+f_2)} = (I + ∂(f_1 + f_2))^{-1} is the resolvent of ∂(f_1 + f_2) (a firmly nonexpansive operator).

Explicit inversion is difficult in general.
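The firm nonexpansiveness of the resolvent can be checked numerically. A sketch assuming f = ‖·‖_l1, whose resolvent is the soft-thresholding operator introduced on the following slides:

```python
import numpy as np

rng = np.random.default_rng(1)
soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)  # J_{df}, f = ||.||_1

for _ in range(1000):
    x, z = rng.standard_normal(8), rng.standard_normal(8)
    d = soft(x) - soft(z)
    # firm nonexpansiveness: ||Jx - Jz||^2 <= <Jx - Jz, x - z>
    assert d @ d <= d @ (x - z) + 1e-12
print("firm nonexpansiveness holds on all random draws")
```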




Operator splitting schemes

Idea: replace the explicit evaluation of the resolvent of ∂(f_1 + f_2) (i.e. J_{∂(f_1+f_2)}) by a sequence of calculations involving only ∂f_1 and ∂f_2, one at a time.

An extensive literature, essentially divided into three classes:

Splitting method            Assumptions                               Iteration
Forward-Backward            f_2 has a Lipschitz-continuous gradient   α_{t+1} = J_{µ∂f_1}(I − µ∇f_2)(α_t)
Backward-Backward           f_1, f_2 proper lsc convex, but ...       α_{t+1} = J_{∂f_1} J_{∂f_2}(α_t)
Douglas/Peaceman-Rachford   all f_1, f_2 proper lsc convex            α_{t+1} = (J_{∂f_1}(2J_{∂f_2} − I) + I − J_{∂f_2})(α_t)

J_{∂f_i} = (I + ∂f_i)^{-1} are the Moreau(-Yosida) proximity operators.
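As a concrete instance of the Forward-Backward row, here is a minimal sketch for (P_λ) with Ψ = ‖·‖_l1, where J_{µ∂f_1} is soft-thresholding (the prox computed on the next slides); function names are illustrative and the step size follows the µ < 2/‖A‖² requirement:

```python
import numpy as np

def soft(x, t):
    """prox of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(A, y, lam, n_iter=500):
    """FB iteration a_{t+1} = J_{mu df1}(I - mu grad f2)(a_t) for
    (P_lambda): min 0.5*||y - A a||_2^2 + lam*||a||_1."""
    mu = 1.0 / np.linalg.norm(A, 2) ** 2      # safe step size, < 2/||A||^2
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a = soft(a + mu * (A.T @ (y - A @ a)), mu * lam)   # backward o forward
    return a
```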


Proximity operators

Some properties:

The prox is uniquely valued.

The Moreau envelope f_µ(x) = min_z f(z) + (1/(2µ))‖x − z‖²_l2 is Fréchet-differentiable with Lipschitz-continuous gradient ∇f_µ(x) = (x − prox_{µf}(x))/µ, hence the name Moreau-Yosida regularization.

Many useful rules in proximal calculus (scaling, translation, quadratic perturbation, etc.).
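A quick numerical illustration of the differentiability claim, sketched for the scalar f = |·| (whose Moreau envelope is the Huber function), using the standard identity ∇f_µ(x) = (x − prox_{µf}(x))/µ:

```python
import numpy as np

mu = 1.0
prox = lambda x: np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)    # prox_{mu |.|}
env = lambda x: np.abs(prox(x)) + (x - prox(x)) ** 2 / (2 * mu)  # Moreau envelope

x = np.linspace(-3, 3, 6001)
num_grad = np.gradient(env(x), x)                # numerical gradient
# the envelope's gradient is (x - prox(x))/mu: smooth, (1/mu)-Lipschitz
assert np.allclose(num_grad, (x - prox(x)) / mu, atol=1e-3)
```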


Example of proximity operator

[Figure: examples of proximity operators.]


Compressed sensing optimization problems


Problem statement

y = HΦα + ε,  Var(ε) = σ_ε².

Consider the following minimization problems:

    (P_σ):   min Ψ(α)  s.t.  ‖y − HΦα‖_l2 ≤ σ
    (P_eq):  min Ψ(α)  s.t.  y = HΦα
    (P_τ):   min (1/2)‖y − HΦα‖²_l2  s.t.  Ψ(α) ≤ τ

Ψ(α) = Σ_{γ∈Γ} ψ(α_γ), with ψ a sparsity-promoting penalty.


Characterizing Problem (P_τ)


FB to solve Problem (P_τ)

Theorem 1. Let {µ_t, t ∈ N} be a sequence such that 0 < inf_t µ_t ≤ sup_t µ_t < 2/‖HΦ‖², let {β_t, t ∈ N} be a sequence in (0, 1], and let {a_t, t ∈ N} and {b_t, t ∈ N} be sequences in H such that Σ_t ‖a_t‖ < +∞ and Σ_t ‖b_t‖ < +∞. Fix α_0, and define the sequence of iterates

    α_{t+1} = α_t + β_t ( proj_{B_{Ψ,τ}}( α_t + µ_t Φ*H*(y − HΦα_t) − b_t ) + a_t − α_t ),

where proj_{B_{Ψ,τ}} is the projector onto the Ψ-ball of radius τ.

(i) {α_t, t ≥ 0} converges weakly to a minimizer of (P_τ).

(ii) For Ψ(α) = ‖α‖_l1, there exists a subsequence of {α_t, t ≥ 0} that converges strongly to a minimizer of (P_τ).

(iii) The projection operator proj_{B_{Ψ,τ}} is

    proj_{B_{Ψ,τ}}(α) = α               if Ψ(α) ≤ τ,
                        prox_{κΨ}(α)    otherwise,

where κ (depending on α and τ) is chosen such that Ψ(prox_{κΨ}(α)) = τ.
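For Ψ = ‖·‖_l1, point (iii) is the classical projection onto the l1 ball: κ is the soft-threshold level at which the l1 norm hits τ, computable by a sort. A sketch of the standard O(L log L) scheme (names are illustrative):

```python
import numpy as np

def proj_l1_ball(alpha, tau):
    """proj_{B_{Psi,tau}} for Psi = ||.||_1: identity inside the ball, otherwise
    soft-thresholding at the kappa with ||soft(alpha, kappa)||_1 = tau."""
    a = np.abs(alpha)
    if a.sum() <= tau:
        return alpha
    u = np.sort(a)[::-1]                       # sorted magnitudes, descending
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, a.size + 1) > cssv - tau)[0][-1]
    kappa = (cssv[rho] - tau) / (rho + 1.0)    # the threshold from point (iii)
    return np.sign(alpha) * np.maximum(a - kappa, 0.0)
```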


Proximity operators of Ψ

Thresholding/shrinkage operator: e.g. soft-thresholding for the l1 norm.

Computational cost: O(L) for L coefficients.


Characterizing Problem (P_σ)


DR to solve Problem (P_σ)

Theorem 1. Suppose that A = HΦ is a tight frame, i.e. AA* = cI. Let µ ∈ (0, +∞), let {β_t} be a sequence in (0, 2), and let {a_t} and {b_t} be sequences in H such that Σ_t β_t(2 − β_t) = +∞ and Σ_t β_t(‖a_t‖ + ‖b_t‖) < +∞. Fix α_0 ∈ B_{l2,σ} and define the sequence of iterates

    α_{t+1/2} = α_t + c^{-1} A*(P_{B_σ} − I)(Aα_t) + b_t,
    α_{t+1}   = α_t + β_t ( prox_{µΨ}(2α_{t+1/2} − α_t) + a_t − α_{t+1/2} ),

where

    (P_{B_σ} − I)(x) = 0                              if ‖x − y‖_l2 ≤ σ,
                       (1 − σ/‖x − y‖_l2)(y − x)      otherwise,

and prox_{µΨ} is as before.

Then {α_t, t ≥ 0} converges weakly to some point α, and α + c^{-1} A*(P_{B_σ} − I)(Aα) is a solution to (P_σ).
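A compact sketch of this iteration for Ψ = ‖·‖_l1, with the perturbation sequences a_t, b_t set to zero and a real tight frame A with A Aᵀ = cI assumed; names and defaults are illustrative:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)   # prox_{t ||.||_1}

def dr_bpdn(A, y, sigma, mu, c, n_iter=300, beta=1.0):
    """Douglas-Rachford for (P_sigma), unperturbed (a_t = b_t = 0), A A^T = c I."""
    def half_step(a):                 # a + c^{-1} A*(P_{B_sigma} - I)(A a)
        r = A @ a - y
        nr = np.linalg.norm(r)
        if nr <= sigma:
            return a
        return a - (1.0 - sigma / nr) / c * (A.T @ r)
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a_half = half_step(a)
        a = a + beta * (soft(2 * a_half - a, mu) - a_half)
    return half_step(a)               # alpha + c^{-1} A*(P_{B_sigma} - I)(A alpha)
```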


Characterizing Problem (P_eq)

    min Ψ(α)  s.t.  y = HΦα.

Proposition 1

(i) Existence: (P_eq) has at least one solution.

(ii) Uniqueness: (P_eq) has a unique solution if ψ is strictly convex.

(iii) If α solves (P_eq), then it solves (P_{σ=0}).


DR to solve Problem (P_eq)

Theorem 1. Let A = HΦ. Let µ ∈ (0, +∞), let {β_t} be a sequence in (0, 2), and let {a_t} and {b_t} be sequences in H such that Σ_t β_t(2 − β_t) = +∞ and Σ_t β_t(‖a_t‖ + ‖b_t‖) < +∞. Fix α_0 and define the sequence of iterates

    α_{t+1/2} = α_t + A*(AA*)^{-1}(y − Aα_t) + b_t,
    α_{t+1}   = α_t + β_t ( prox_{µΨ}(2α_{t+1/2} − α_t) + a_t − α_{t+1/2} ).

Then {α_t, t ≥ 0} converges weakly to some point α, and α + A*(AA*)^{-1}(y − Aα) is a solution to (P_eq).

prox_{µΨ} is as before.
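The same Douglas-Rachford pattern for (P_eq) with Ψ = ‖·‖_l1: the half step becomes the exact projection onto the affine set {α : Aα = y}. An unperturbed sketch (a_t = b_t = 0), assuming AA* is invertible:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)   # prox_{t ||.||_1}

def dr_bp(A, y, mu, n_iter=300, beta=1.0):
    """Douglas-Rachford for (P_eq), unperturbed (a_t = b_t = 0)."""
    G = np.linalg.inv(A @ A.T)        # (A A*)^{-1}, precomputed once
    proj = lambda a: a + A.T @ (G @ (y - A @ a))   # projection onto {a : A a = y}
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a_half = proj(a)
        a = a + beta * (soft(2 * a_half - a, mu) - a_half)
    return proj(a)                    # alpha + A*(A A*)^{-1}(y - A alpha) solves (P_eq)
```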


Other problems

The same framework can also handle problems with an analysis prior, p ≥ 1:

    min_x ‖Dx‖^p_lp  s.t.  ‖y − Hx‖_l2 ≤ σ
    min_x ‖Dx‖^p_lp  s.t.  y = Hx,

with D a bounded linear operator, e.g. a frame D = Φ*, finite differences for TV regularization, etc.

The proximity operator of ‖Dx‖^p_p is obtained through Legendre-Fenchel conjugacy and the Forward-Backward splitting algorithm.
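A sketch of that last point for p = 1. By Legendre-Fenchel conjugacy, prox_{λ‖D·‖_1}(x) = x − D*u*, where u* minimizes (1/2)‖x − D*u‖² over the box ‖u‖_∞ ≤ λ; this smooth constrained dual is solved below by Forward-Backward, i.e. projected gradient (step size and names are illustrative):

```python
import numpy as np

def prox_analysis_l1(x, D, lam, n_iter=200):
    """prox of lam*||D.||_1 via Forward-Backward on the Fenchel dual:
    min_{||u||_inf <= lam} 0.5 * ||x - D^T u||_2^2."""
    mu = 1.0 / np.linalg.norm(D, 2) ** 2      # step size < 2/||D||^2
    u = np.zeros(D.shape[0])
    for _ in range(n_iter):
        u = np.clip(u + mu * (D @ (x - D.T @ u)), -lam, lam)  # gradient + box projection
    return x - D.T @ u
```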


Pros and cons

(P_σ) and (P_eq) have the same computational complexity: 2 N_iter (V_H + V_Φ + O(L)).

(P_τ): 2 N_iter (V_H + V_Φ) + O(N_iter L log L).

V_H and V_Φ are the complexities of applying the operators H (resp. H*) and Φ (resp. Φ*). V_{H,Φ} is O(n) or O(n log n) for most useful transforms/sensing operators.

(P_eq): no tight-frame requirement, but does not handle noise.

(P_τ): no tight-frame requirement, but τ is difficult to choose.

(P_σ): requires a tight frame, but handles noise and σ is easier to choose.


Stylized applications


CS reconstruction (1)

H = Fourier, Φ = DWT, Ψ(α) = ‖α‖_l1, m/n = 17%.

[Figure: original image; real-Fourier projections (m/n = 0.17, SNR = 30 dB); reconstructions by (P_τ) (LASSO-Prox, 1000 iterations, PSNR = 22.07 dB), (P_eq)-DR (BP-DR, noiseless, 1000 iterations, PSNR = 23.02 dB), and (P_σ)-DR (BPDN-DR, 1000 iterations, PSNR = 22.13 dB).]


CS reconstruction (2)

[Figure: original phantom image; real-Fourier projections (SNR = 35.26 dB); TV-DR on noiseless data (200 iterations, PSNR = 75.46 dB); TVDN-DR on noisy data (200 iterations, PSNR = 40.74 dB).]


Inpainting and CS

H = Dirac, Φ = Curvelets + LDCT, Ψ(α) = ‖α‖_l1, m/n = 20%.




Super-resolution


Computation time

[Figure: computation time (s) vs. log2(m), comparing BP-DR, LARS, LP-Interior Point, and StOMP. Left: CS with H = Fourier, Φ = Dirac, m/n = 20%, sparsity = 5%. Right: CS with H = Hadamard, Φ = Dirac, m/n = 20%, sparsity = 5%.]


Take-away messages

A general framework for optimization problems in CS: maximal monotone operator splitting.

Good and fast solvers for large-scale problems: iterative thresholding.

Grounded theoretical results.

A wide variety of applications beyond CS.


Ongoing and future work

Beyond the linear case with AWGN [FXD-F.-JLS 07-08].

More comparisons with other algorithms.

Work towards algorithms with faster-than-linear convergence.

Other problems, e.g. the Dantzig selector.

Other applications.


Extended experiments, code and paper available soon at
http://www.greyc.ensicaen.fr/~jfadili

Sparsity stuff:
http://www.morphologicaldiversity.org

Thanks. Any questions?
