Optimization problems in compressed sensing
Jalal Fadili
CNRS, ENSI Caen, France
Applied mathematics seminar
Stanford, July 2008
Today's talk is about ...
Compressed sensing.
Sparse representations.
Convex analysis and operator splitting.
Non-smooth optimization.
Monotone operator splitting.
Fast algorithms.
Stanford seminar 08-2
Compressed/ive Sensing
Common wisdom: Shannon sampling theorem and bandlimited signals.
The CS theory asserts that one can recover certain (sparse) signals and images from far fewer measurements m than data samples n.
The CS acquisition scenario:
[Diagram: sensing y = HΦα, with measurements y (m × 1), sensing matrix H (m × n), dictionary Φ (n × L), coefficients α (L × 1), signal x = Φα (n × 1); decoder ∆_Φ(y).]
Compressed/ive Sensing (cont'd)
CS relies on two tenets:
Sparsity (compressibility): the signal x = Φα has a (nearly) sparse coefficient vector α.
Incoherence: the sensing vectors are as different as possible from the sparsity waveforms (ϕ_j)_{j=1}^L.
The CS decoder ∆_Φ(y) : R^m → R^L proposes to recover the signal/image by solving the non-linear program
(P_eq): min ‖α‖_{ℓ1} s.t. y = HΦα.
Construct H: Random Sensing
Random Sensing: α is s-sparse and given m measurements selected uniformly at random from an ensemble. If m ≥ C s µ²_{H,Φ} log n, then minimizing (P_eq) reconstructs α exactly with overwhelming probability.
Compressible signals/images and ℓ2−ℓ1 instance optimality: the (P_eq) solution recovers the s largest entries.
Stability to noise y = HΦα + ε: the decoder ∆_{Φ,σ}(y)
(P_σ): min ‖α‖_{ℓ1} s.t. ‖y − HΦα‖_{ℓ2} ≤ σ
has ℓ2−ℓ1 instance optimality up to a factor of the noise standard deviation σ.
Convex analysis and operator splitting
Class of problems in CS
Recall y = Hx + ε, ε iid with Var(ε) = σ_ε² < +∞, and x = Φα, α (nearly) sparse.
Typical (equivalent) minimization problems:
(P_eq): min Ψ(α) s.t. y = HΦα
(P_σ): min Ψ(α) s.t. ‖y − HΦα‖_{ℓ2} ≤ σ
(P_τ): min ½‖y − HΦα‖²_{ℓ2} s.t. Ψ(α) ≤ τ
(P_λ): min ½‖y − HΦα‖²_{ℓ2} + λΨ(α)
(P_eq) is (P_σ) when there is no noise.
Ψ(α) = ∑_{γ∈Γ} ψ(α_γ).
ψ is a sparsity-promoting penalty: non-negative, continuous, even-symmetric, non-decreasing on R₊, and not necessarily smooth at zero so as to produce sparse solutions.
e.g. Ψ(α) = ‖α‖_{ℓ1}.
Class of problems in CS (cont'd)
(P_σ), (P_τ) and (P_λ) can all be cast as:
(P1): min_α f_1(α) + f_2(α)
f_1 and f_2 are proper lsc convex functions.
(P_λ): min ½‖y − HΦα‖²_{ℓ2} + λΨ(α), with f_1(α) = ½‖y − HΦα‖²_{ℓ2}, f_2(α) = λΨ(α).
(P_σ): min Ψ(α) s.t. ‖y − HΦα‖_{ℓ2} ≤ σ, with f_1(α) = Ψ(α), f_2(α) = ı_{B_{ℓ2,σ}}(α).
(P_τ): min ½‖y − HΦα‖²_{ℓ2} s.t. Ψ(α) ≤ τ, with f_1(α) = ½‖y − HΦα‖²_{ℓ2}, f_2(α) = ı_{B_{ψ,τ}}(α).
Characterization
Theorem 1
(i) Existence: (P1) possesses at least one solution if f_1 + f_2 is coercive, i.e. lim_{‖α‖→+∞} f_1(α) + f_2(α) = +∞.
(ii) Uniqueness: (P1) possesses at most one solution if f_1 + f_2 is strictly convex. This occurs in particular when either f_1 or f_2 is strictly convex.
(iii) Characterization: Let α ∈ H. Then the following statements are equivalent:
(a) α solves (P1).
(b) α = (I + ∂(f_1 + f_2))^{-1}(α) (proximal iteration).
∂f_i is the subdifferential (a set-valued map), a maximal monotone operator.
J_{∂(f_1+f_2)} = (I + ∂(f_1 + f_2))^{-1} is the resolvent of ∂(f_1 + f_2) (a firmly nonexpansive operator).
Explicit inversion is difficult in general.
Operator splitting schemes
Idea: replace explicit evaluation of the resolvent of ∂(f_1 + f_2) (i.e. J_{∂(f_1+f_2)}) by a sequence of calculations involving only ∂f_1 or ∂f_2 at a time.
An extensive literature essentially divided into three classes:

Splitting method          | Assumptions                             | Iteration
Forward-Backward          | f_2 has a Lipschitz-continuous gradient | α_{t+1} = J_{µ∂f_1}(I − µ∇f_2)(α_t)
Backward-Backward         | f_1, f_2 proper lsc convex, but ...    | α_{t+1} = J_{∂f_1} J_{∂f_2}(α_t)
Douglas/Peaceman-Rachford | all f_1, f_2 proper lsc convex         | α_{t+1} = (J_{∂f_1}(2J_{∂f_2} − I) + I − J_{∂f_2})(α_t)

J_{∂f_i} = (I + ∂f_i)^{-1}: Moreau(-Yosida) proximity operators.
Proximity operators
Definition: for f proper lsc convex, prox_f(x) = argmin_u ½‖u − x‖² + f(u).
Some properties:
The prox is uniquely valued.
The Moreau envelope f_µ(x) = inf_u 1/(2µ)‖u − x‖² + f(u) is Fréchet-differentiable with Lipschitz-continuous gradient, hence the name Moreau-Yosida regularization.
Many useful rules in proximal calculus (scaling, translation, quadratic perturbation, etc.).
Example of proximity operator
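For the workhorse example Ψ = ‖·‖_{ℓ1}, the prox has the familiar closed form. A minimal NumPy sketch (my illustration, not from the slides; the test values are arbitrary): componentwise soft-thresholding, sanity-checked against a brute-force minimization of the defining objective ½(u − x)² + λ|u|.

```python
import numpy as np

def prox_l1(x, lam):
    """prox of lam*||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Sanity check coordinate by coordinate against the definition
# prox_f(x) = argmin_u 0.5*(u - x)^2 + lam*|u|.
x = np.array([3.0, -0.5, 1.0, -2.25])
lam = 1.0
p = prox_l1(x, lam)

grid = np.linspace(-4.0, 4.0, 80001)
for xi, pi in zip(x, p):
    obj = 0.5 * (grid - xi) ** 2 + lam * np.abs(grid)
    assert abs(grid[np.argmin(obj)] - pi) < 1e-3

# Firm nonexpansiveness of a resolvent in particular implies
# ||prox(x) - prox(z)|| <= ||x - z||.
z = np.array([1.0, 2.0, -3.0, 0.1])
assert np.linalg.norm(prox_l1(x, lam) - prox_l1(z, lam)) <= np.linalg.norm(x, 2 - 2) * 0 + np.linalg.norm(x - z)
```

The uniqueness property of the slide is visible here: each coordinate's objective is strictly convex, so the brute-force grid minimum matches the formula.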
Compressed sensing optimization problems
Problem statement
y = HΦα + ε, Var(ε) = σ_ε².
Consider the following minimization problems:
(P_σ): min Ψ(α) s.t. ‖y − HΦα‖_{ℓ2} ≤ σ
(P_eq): min Ψ(α) s.t. y = HΦα
(P_τ): min ½‖y − HΦα‖²_{ℓ2} s.t. Ψ(α) ≤ τ
Ψ(α) = ∑_{γ∈Γ} ψ(α_γ).
ψ is a sparsity-promoting penalty.
Characterizing Problem (P_τ)
FB to solve Problem (P_τ)
Theorem 1. Let {µ_t, t ∈ N} be a sequence such that 0 < inf_t µ_t ≤ sup_t µ_t < 2/‖HΦ‖², let {β_t, t ∈ N} be a sequence in (0, 1], and let {a_t, t ∈ N} and {b_t, t ∈ N} be sequences in H such that ∑_t ‖a_t‖ < +∞ and ∑_t ‖b_t‖ < +∞. Fix α_0, and define the sequence of iterates:
α_{t+1} = α_t + β_t ( proj_{B_{Ψ,τ}}( α_t + µ_t Φ*H*(y − HΦα_t) − b_t ) + a_t − α_t ),
where proj_{B_{Ψ,τ}} is the projector onto the Ψ-ball of radius τ.
(i) {α_t, t ≥ 0} converges weakly to a minimizer of (P_τ).
(ii) For Ψ(α) = ‖α‖_{ℓ1}, there exists a subsequence of {α_t, t ≥ 0} that converges strongly to a minimizer of (P_τ).
(iii) The projection operator proj_{B_{Ψ,τ}} is
proj_{B_{Ψ,τ}}(α) = α if Ψ(α) ≤ τ, and prox_{κΨ}(α) otherwise,
where κ (depending on α and τ) is chosen such that Ψ(prox_{κΨ}(α)) = τ.
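A numerical sketch of this theorem (my own illustration, not the authors' code), taking Ψ = ‖·‖_{ℓ1}, β_t ≡ 1, exact computations (a_t = b_t = 0), a synthetic Gaussian A = HΦ, and the oracle radius τ = ‖α_true‖_{ℓ1}. The ℓ1-ball projection uses the standard sort-based routine; its threshold plays exactly the role of κ in part (iii).

```python
import numpy as np

def proj_l1_ball(v, tau):
    """Euclidean projection onto {a : ||a||_1 <= tau} (standard sort-based method)."""
    if np.abs(v).sum() <= tau:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - tau)[0][-1]
    kappa = (css[rho] - tau) / (rho + 1.0)   # threshold: the kappa of part (iii)
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

rng = np.random.default_rng(0)
m, n = 40, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)   # stands in for HΦ (illustrative choice)
alpha_true = np.zeros(n)
alpha_true[[3, 17, 60]] = [2.0, -1.5, 1.0]     # s = 3 sparse
y = A @ alpha_true                             # noiseless measurements

mu = 1.0 / np.linalg.norm(A, 2) ** 2           # mu_t constant, in (0, 2/||A||^2)
alpha = np.zeros(n)
for _ in range(10000):                         # FB = projected gradient here
    alpha = proj_l1_ball(alpha + mu * (A.T @ (y - A @ alpha)), tau := np.abs(alpha_true).sum())
```

In this run the iterates drive the residual ‖y − Aα_t‖ toward zero while staying feasible, consistent with (i); with m = 40 random measurements of a 3-sparse vector, the minimizer coincides with α_true with high probability.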
Proximity operators of Ψ
Thresholding/shrinkage operator: e.g. soft-thresholding for the ℓ1 norm.
Computational cost: O(L) for L coefficients.
Characterizing Problem (P_σ)
DR to solve Problem (P_σ)
Theorem 1. Suppose that A = HΦ is a tight frame, i.e. AA* = cI. Let µ ∈ (0, +∞), let {β_t} be a sequence in (0, 2), and let {a_t} and {b_t} be sequences in H such that ∑_t β_t(2 − β_t) = +∞ and ∑_t β_t(‖a_t‖ + ‖b_t‖) < +∞. Fix α_0 ∈ B_{ℓ2,σ} and define the sequence of iterates:
α_{t+1/2} = α_t + c^{-1} A*(P_{B_σ} − I)A(α_t) + b_t
α_{t+1} = α_t + β_t ( prox_{µΨ}(2α_{t+1/2} − α_t) + a_t − α_{t+1/2} ),
where
(P_{B_σ} − I)(x) = 0 if ‖x − y‖_{ℓ2} ≤ σ, and (y − x)(1 − σ/‖x − y‖_{ℓ2}) otherwise.
prox_{µΨ} is as before.
Then {α_t, t ≥ 0} converges weakly to some point α, and α + c^{-1} A*(P_{B_σ} − I)A(α) is a solution to (P_σ).
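A sketch of this DR scheme (my illustration, not the authors' code): Ψ = ‖·‖_{ℓ1}, β_t ≡ 1, a_t = b_t = 0, and a tight frame with c = 1 obtained by orthonormalizing the rows of a Gaussian matrix; µ = 0.1 is an arbitrary illustrative choice. Note the theorem's final correction step: the limit point itself is not the solution, but for c = 1 the correction is exactly the projection onto {α : ‖Aα − y‖ ≤ σ}, so the corrected point is feasible at every iteration.

```python
import numpy as np

def prox_l1(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(1)
m, n = 40, 100
# Tight frame with A A^T = I (c = 1): orthonormalized Gaussian rows.
A = np.linalg.qr(rng.standard_normal((n, m)))[0].T
alpha_true = np.zeros(n)
alpha_true[[5, 30, 77]] = [2.0, -1.5, 1.0]
eps = 0.01 * rng.standard_normal(m)
y = A @ alpha_true + eps
sigma = np.linalg.norm(eps)

def resid_step(alpha):
    """c^{-1} A^T (P_{B_sigma} - I) A(alpha): correction toward the sigma-ball."""
    r = A @ alpha - y
    nr = np.linalg.norm(r)
    if nr <= sigma:
        return np.zeros_like(alpha)
    return A.T @ (-r * (1.0 - sigma / nr))

mu = 0.1
alpha = A.T @ y                      # feasible start: A(A^T y) = y
for _ in range(1000):
    half = alpha + resid_step(alpha)                        # alpha_{t+1/2}
    alpha = alpha + prox_l1(2 * half - alpha, mu) - half    # DR update, beta_t = 1

alpha_sol = alpha + resid_step(alpha)   # corrected point; solves (P_sigma) at the limit
```

The corrected iterate satisfies ‖y − A α_sol‖ ≤ σ exactly by construction, and in this run it lands close to α_true since σ is small.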
Characterizing Problem (P_eq)
min Ψ(α) s.t. y = HΦα.
Proposition 1
(i) Existence: (P_eq) has at least one solution.
(ii) Uniqueness: (P_eq) has a unique solution if ψ is strictly convex.
(iii) If α solves (P_eq), then it solves (P_{σ=0}).
DR to solve Problem (P_eq)
Theorem 1. Let A = HΦ. Let µ ∈ (0, +∞), let {β_t} be a sequence in (0, 2), and let {a_t} and {b_t} be sequences in H such that ∑_t β_t(2 − β_t) = +∞ and ∑_t β_t(‖a_t‖ + ‖b_t‖) < +∞. Fix α_0 and define the sequence of iterates:
α_{t+1/2} = α_t + A*(AA*)^{-1}(y − Aα_t) + b_t
α_{t+1} = α_t + β_t ( prox_{µΨ}(2α_{t+1/2} − α_t) + a_t − α_{t+1/2} ).
Then {α_t, t ≥ 0} converges weakly to some point α, and α + A*(AA*)^{-1}(y − Aα) is a solution to (P_eq).
prox_{µΨ} is as before.
Other problems
The same framework can also handle problems with an analysis prior, p ≥ 1:
min_x ‖Dx‖^p_{ℓp} s.t. ‖y − Hx‖_{ℓ2} ≤ σ
min_x ‖Dx‖^p_{ℓp} s.t. y = Hx.
D is a bounded linear operator, e.g. a frame D = Φ*, or finite differences for TV-regularization, etc.
The proximity operator of ‖Dx‖^p_{ℓp} is obtained through Legendre-Fenchel conjugacy and the Forward-Backward splitting algorithm.
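For p = 1 that last point can be sketched as follows (my own illustration; the helper name is hypothetical). By Legendre-Fenchel conjugacy, prox of g(x) = λ‖Dx‖_{ℓ1} is x − D*u*, where u* minimizes ½‖x − D*u‖² over the box ‖u‖_∞ ≤ λ; a forward-backward (projected gradient) loop on this dual problem does the work. With D = I the routine must reduce to plain soft-thresholding, which gives a built-in consistency check.

```python
import numpy as np

def prox_analysis_l1(x, D, lam, n_iter=500):
    """prox of g(x) = lam*||D x||_1 via projected gradient on the dual:
    prox_g(x) = x - D^T u*,  u* = argmin_{||u||_inf <= lam} 0.5*||x - D^T u||^2."""
    t = 1.0 / (np.linalg.norm(D, 2) ** 2)        # step < 2/||D||^2
    u = np.zeros(D.shape[0])
    for _ in range(n_iter):
        u = np.clip(u + t * (D @ (x - D.T @ u)), -lam, lam)
    return x - D.T @ u

# Consistency check: for D = I the analysis prox is soft-thresholding.
x = np.array([3.0, -0.5, 1.0])
p = prox_analysis_l1(x, np.eye(3), 1.0)
soft = np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)
assert np.allclose(p, soft, atol=1e-6)
```

The same routine with D a finite-difference matrix gives (anisotropic) TV denoising of a 1-D signal, which is the building block the slide alludes to.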
Pros and cons
(P_σ) and (P_eq) have the same computational complexity: 2 N_iter (V_H + V_Φ + O(L)).
(P_τ): 2 N_iter (V_H + V_Φ) + O(N_iter L log L).
V_H and V_Φ are the complexities of applying the operators H (resp. H*) and Φ (resp. Φ*).
V_{H,Φ} is O(n) or O(n log n) for most useful transforms/sensing operators.
(P_eq): no tight frame needed, but does not handle noise.
(P_τ): no tight frame needed, but τ is difficult to choose.
(P_σ): requires a tight frame, but handles noise and σ is easier to choose.
Stylized applications
CS reconstruction (1)
H = Fourier, Φ = DWT, Ψ(α) = ‖α‖_{ℓ1}, m/n = 17%
[Figure: original image and real-Fourier projections (m/n = 0.17, SNR = 30 dB); reconstructions after 1000 iterations: (P_τ) LASSO-Prox PSNR = 22.07 dB, (P_eq) BP-DR (noiseless) PSNR = 23.02 dB, (P_σ) BPDN-DR PSNR = 22.13 dB.]
CS reconstruction (2)
[Figure: original phantom image and real-Fourier projections (SNR = 35.26 dB); TV-DR on noiseless data, 200 iterations, PSNR = 75.46 dB; TVDN-DR on noisy data, 200 iterations, PSNR = 40.74 dB.]
Inpainting and CS
H = Dirac, Φ = Curvelets + LDCT, Ψ(α) = ‖α‖_{ℓ1}, m/n = 20%
Super-resolution
Computation time
CS with H = Fourier (RST), Φ = Dirac, m/n = 20%, sparsity = 5%
CS with H = Hadamard, Φ = Dirac, m/n = 20%, sparsity = 5%
[Figure: computation time (s), log scale, versus log2(m), comparing BP-DR, LARS, LP-Interior Point and StOMP for the two sensing ensembles.]
Take-away messages
A general framework of optimization problems in CS: maximal monotone operator splitting.
Good and fast solvers for large-scale problems: iterative thresholding.
Grounded theoretical results.
A wide variety of applications beyond CS.
Ongoing and future work
Beyond the linear case with AWGN [FXD-F.-JLS 07-08].
More comparisons to other algorithms.
Work harder for algorithms with faster-than-linear convergence.
Other problems, e.g. the Dantzig selector.
Other applications.
Extended experiments, code and paper available soon at
http://www.greyc.ensicaen.fr/~jfadili
Sparsity stuff:
http://www.morphologicaldiversity.org
Thanks
Any questions?