4.10 Numerical Issues

Damped Newton direction. To ensure that Newton's method converges, we implement a backtracking scheme to find the value of $\beta_i$ in (4.11). This guarantees that the value of the objective function is reduced at every iteration.

Truncated CG. To save computing time, in the early Newton iterations we terminate the CG solver before it reaches high precision; an inexact Newton direction does not hurt the precision of the final solution obtained by Newton's method [38, 39].

The algorithm is implemented in Matlab. Fast algorithms for the wavelet and edgelet transforms are implemented in C and called from Matlab through a CMEX interface.

4.11 Discussion

4.11.1 Connection With Statistics

In statistics, the same method is used in model selection, where we choose a subset of the variables so that the model remains sufficient for prediction and inference. More specifically, in linear regression we consider the model
\[
y = X\beta + \varepsilon,
\]
where $y$ is the response, $X$ is the model matrix whose columns contain the values of the variables (predictors), $\beta$ is the coefficient vector, and $\varepsilon$ is a vector of i.i.d. random variables. Model selection in this setting means choosing a subset of the columns of $X$, denoted $X^{(0)}$, such that for most possible responses $y$ we have $y \approx X^{(0)}\beta^{(0)}$, where $\beta^{(0)}$ is the subvector of $\beta$ whose entries correspond to the columns selected in $X^{(0)}$. The difference (or prediction error) $y - X^{(0)}\beta^{(0)}$ is negligible in the sense that it can be interpreted as a realization of the random noise vector $\varepsilon$.

Typically, penalized regression is used to select the model. Basically, we solve
\[
(\mathrm{PR}) \qquad \underset{\beta}{\text{minimize}}\ \ \|y - X\beta\|_2^2 + \lambda\,\rho(\beta),
\]
which is exactly the problem we encountered in (4.2). After solving (PR), we pick the $i$th column of $X$ if $\beta_i$ has a significant amplitude. When $\rho(\beta) = \|\beta\|_1$, the method (PR) is called the LASSO by R. Tibshirani [133] and Basis Pursuit by Chen et al. [27]. When $\rho(\beta) = \|\beta\|_2^2$, the method (PR) is called ridge regression by Hoerl and Kennard [81, 80].
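To make the contrast concrete, here is a small numerical sketch, written in Python/NumPy rather than the Matlab/C implementation described in Section 4.10, with an arbitrary toy design matrix and penalty value. It solves (PR) with the $\ell_1$ penalty by a simple proximal-gradient (ISTA) loop and with the squared $\ell_2$ penalty in closed form; the point is only that the $\ell_1$ penalty drives many coefficients exactly to zero while ridge merely shrinks them.

```python
# Toy illustration of (PR): l1 penalty (LASSO / Basis Pursuit) vs. squared l2
# penalty (ridge). The data, lambda, and the ISTA solver are illustrative
# choices, not taken from the thesis.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]                   # sparse "true" coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(n)
lam = 5.0                                          # penalty parameter lambda

# Ridge: minimize ||y - X b||_2^2 + lam ||b||_2^2 has the closed-form solution
# b = (X'X + lam I)^{-1} X'y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# LASSO: minimize ||y - X b||_2^2 + lam ||b||_1 by proximal gradient (ISTA):
# a gradient step on the quadratic term, then entrywise soft-thresholding.
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)     # 1 / Lipschitz const. of the gradient
beta_lasso = np.zeros(p)
for _ in range(2000):
    grad = 2.0 * X.T @ (X @ beta_lasso - y)
    beta_lasso = soft_threshold(beta_lasso - step * grad, step * lam)

print("nonzero coefficients:  lasso =", int(np.sum(np.abs(beta_lasso) > 1e-8)),
      "  ridge =", int(np.sum(np.abs(beta_ridge) > 1e-8)))
```

Because soft-thresholding sets small coefficients exactly to zero, the $\ell_1$ version of (PR) acts as a variable-selection (model-selection) device, whereas the ridge solution is dense.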
4.11.2 Non-convex Sparsity Measure

An ideal measure of sparsity is usually nonconvex. For example, in (4.2), the number of nonzero elements of $x$ is the most intuitive measure of sparsity. The $\ell_0$ norm of $x$, $\|x\|_0$, is equal to this number of nonzero elements, but it is not a convex function. Another choice of sparsity measure is the logarithmic function: for $x = (x_1, \ldots, x_N)^T \in \mathbb{R}^N$, we can take $\rho(x) = \sum_{i=1}^N \log |x_i|$. In sparse image component analysis, another nonconvex sparsity measure is used: $\rho(x) = \sum_{i=1}^N \log(1 + x_i^2)$ [53]. Generally speaking, a nonconvex optimization problem of this kind is a combinatorial optimization problem, and hence NP-hard. The next subsection discusses how reweighting methods can be used to attack such a nonconvex optimization problem.

4.11.3 Iterative Algorithm for Non-convex Optimization Problems

Sometimes a reweighted iterative method can be used to find a local minimum of a nonconvex optimization problem. Consider the following problem:
\[
(\mathrm{LO}) \qquad \underset{x}{\text{minimize}}\ \ \sum_{i=1}^N \log |x_i|, \quad \text{subject to } y = \Phi x;
\]
and its corresponding version with a Lagrange multiplier $\lambda$,
\[
(\mathrm{LO}_\lambda) \qquad \underset{x}{\text{minimize}}\ \ \|y - \Phi x\|_2^2 + \lambda \sum_{i=1}^N \log(|x_i| + \delta).
\]
(More precisely, (LO$_\lambda$) is the Lagrange multiplier version of the problem of minimizing $\sum_{i=1}^N \log(|x_i| + \delta)$ subject to $\|y - \Phi x\| \le \varepsilon$; when $\delta$ and $\varepsilon$ are small, this problem is close to (LO).)

Note that the objective function of (LO) is not convex. Consider the following reweighted iterative algorithm: for $\delta > 0$,
\[
(\mathrm{RIA}) \qquad x^{(k+1)} = \underset{x}{\operatorname{argmin}}\ \ \sum_{i=1}^N \frac{|x_i|}{|x_i^{(k)}| + \delta}, \quad \text{subject to } y = \Phi x.
\]
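As a concrete sketch of the reweighting idea, the code below works with a penalized objective in the spirit of (LO$_\lambda$) rather than the equality-constrained (RIA): each outer pass fixes weights $w_i = 1/(|x_i^{(k)}| + \delta)$ and solves the resulting convex weighted-$\ell_1$ problem, minimize $\|y - \Phi x\|_2^2 + \lambda \sum_i w_i |x_i|$, with a plain proximal-gradient (ISTA) inner loop. The matrix $\Phi$, the data, and the values of $\lambda$, $\delta$, and the iteration counts are illustrative choices, not values from this thesis, and the inner solver is not the Newton/CG scheme described earlier in this chapter.

```python
# Sketch of a reweighted iterative scheme for the log-penalized problem
#   minimize ||y - Phi x||_2^2 + lam * sum_i log(|x_i| + delta),
# approached by alternating (i) a convex weighted-l1 subproblem solved with
# ISTA and (ii) the weight update w_i = 1 / (|x_i| + delta). All constants and
# data below are illustrative, not taken from the thesis.
import numpy as np

def weighted_soft_threshold(v, t):
    """Entrywise soft-thresholding with a vector of thresholds t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def reweighted_l1(Phi, y, lam=0.5, delta=1e-3, outer=8, inner=500):
    n, N = Phi.shape
    x = np.zeros(N)
    w = np.ones(N)                                   # first pass: plain (unweighted) l1
    step = 1.0 / (2.0 * np.linalg.norm(Phi, 2) ** 2) # 1 / Lipschitz const. of the gradient
    for _ in range(outer):
        for _ in range(inner):                       # ISTA on the weighted-l1 subproblem
            grad = 2.0 * Phi.T @ (Phi @ x - y)
            x = weighted_soft_threshold(x - step * grad, step * lam * w)
        w = 1.0 / (np.abs(x) + delta)                # reweight from the current iterate
    return x

# Toy usage: a sparse x observed through an underdetermined Phi.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[5, 17, 42]] = [2.0, -1.5, 1.0]
y = Phi @ x_true
x_hat = reweighted_l1(Phi, y)
print("indices of the three largest |x_hat| entries:",
      np.sort(np.argsort(-np.abs(x_hat))[:3]))
```

Starting from unit weights makes the first pass an ordinary $\ell_1$ (Basis Pursuit-like) solve; subsequent passes down-weight large coefficients and penalize small ones more heavily, which is the mechanism by which the reweighting approximates the nonconvex logarithmic penalty.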