Sample A: Cover Page of Thesis, Project, or Dissertation Proposal

Chapter 5: Data Mining the Multiclass Dataset

Lung cancer is the most common form of cancer worldwide, representing 12.3% of all cancer diagnoses [1]. It is a devastating disease: 90% of those diagnosed with lung cancer will eventually succumb to it, and it accounts for 17.8% of cancer deaths worldwide. Smoking tobacco is strongly correlated with the development of lung cancer, with 80-90% of all diagnoses attributed to smokers, although only 11% of cigarette smokers will develop lung cancer [1]. There are four histologically distinct lung cancer variants: three are forms of non-small cell lung cancer (NSCLC), and the fourth is small cell lung cancer (SCLC) [1]. NSCLC is the predominant form of lung cancer, encompassing 80% of all lung cancer cases [2]. Adenocarcinoma and squamous cell carcinoma are both subtypes of NSCLC, and adenocarcinoma has now surpassed squamous cell carcinoma in prevalence [1, 2]. Alarmingly, adenocarcinoma is most common in women, non-smokers, and the young [1, 2]. Adenocarcinomas are located peripherally in the lungs and develop from Clara cells, alveolar cells, and mucin-producing cells. Squamous cell carcinomas arise in the central airways and are a direct result of smoking, since normal lungs contain no squamous epithelial cells. Even for patients without mediastinal involvement, surgical intervention yields only a 30-50% chance of disease-free survival, and long-term survival is greatly reduced for patients with mediastinal involvement [1].

We have separated the samples from the Bhattacharjee dataset into several subsets in order to produce an NSCLC multiclass dataset consisting of adenocarcinomas, squamous cell carcinomas, and normal (in fact, adjacent) biopsies. The 155 samples that survived the BaFL sample cleansing process encompass 4,248 ProbeSets. Borrowing from machine learning practice, we calculated a gain ratio from a k-means clustering analysis of each individual ProbeSet and used it to down-select the most significant features [3, 4]. The gain ratio was also calculated at the individual probe level and then averaged across the probes belonging to an accepted ProbeSet to give a ProbeSet-level response. We demonstrate that both approaches improve the model's performance in a supervised classification analysis implemented with either a k-nearest-neighbor (kNN) classifier using Euclidean distance [3, 5, 6] or Fisher's linear discriminant analysis (LDA) [7, 8]. The emphasis here is not on comparing analysis methods but on discovering intriguing biological phenomena revealed by using the BaFL pipeline to select the least ambiguous signal. Eighteen significant ProbeSets with a gain ratio greater than 0.8 were selected, and this set of genes suggests that abnormal cytokinesis is an important mechanism underlying tumorigenesis. The biological significance of these genes is validated by a literature survey in the Discussion section.

Materials and Methods

A subset of samples was selected from the BaFL-cleansed Bhattacharjee dataset to construct a multiclass dataset containing NSCLC tumor biopsies and adjacent/normal biopsies. This dataset contained the 125 adenocarcinomas and 13 normals that made up the two-state disease model considered previously, plus 17 additional squamous cell carcinoma samples. The BaFL cleansing pipeline was applied across all samples, with the result that 24,022 probes were found to be common to all three states; these lie in 4,248 ProbeSets (each with at least 4 acceptable probes).
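The section does not spell out how the k-means clusters feed the gain ratio used for down-selection. A minimal sketch, assuming the C4.5-style definition (information gain of the class labels given the cluster assignment of one feature, H(Y) - H(Y | cluster), divided by the split information H(cluster)) and assuming k equals the number of classes, might look like the following. All function names, the one-dimensional k-means initialization, and the probe-averaging helper are illustrative assumptions, not the thesis code; only the 0.8 threshold comes from the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on one feature's values across samples.

    Assumption: centroids start at evenly spaced order statistics.
    Returns one cluster index per value.
    """
    s = sorted(values)
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        assign = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = []
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return assign


def gain_ratio(values, labels, k):
    """Gain ratio of the class labels with respect to a k-means partition of one feature."""
    clusters = kmeans_1d(values, k)
    n = len(labels)
    cond = 0.0  # expected label entropy within the clusters, H(Y | cluster)
    for j in set(clusters):
        members = [lab for lab, c in zip(labels, clusters) if c == j]
        cond += len(members) / n * entropy(members)
    info_gain = entropy(labels) - cond
    split_info = entropy(clusters)  # penalizes partitions into many small clusters
    return info_gain / split_info if split_info > 0 else 0.0


def probeset_gain_from_probes(probe_values, labels, k):
    """Hypothetical probe-level variant: average the gain ratios of a ProbeSet's probes."""
    ratios = [gain_ratio(v, labels, k) for v in probe_values]
    return sum(ratios) / len(ratios)


def select_features(features, labels, k, threshold=0.8):
    """Keep features (name -> values) whose gain ratio exceeds the threshold."""
    return [name for name, vals in features.items()
            if gain_ratio(vals, labels, k) > threshold]
```

For example, with three classes of three samples each, a feature whose values fall into three tight, class-pure clusters scores 1.0, while a feature whose clusters each mix all three classes scores about 0 and is dropped by `select_features`.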
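The kNN classifier with Euclidean distance named above can be sketched in a few lines, treating each sample as a vector of its selected ProbeSet values. The default k = 3, the leave-one-out evaluation, and the nearest-member tie-break are illustrative assumptions; the thesis's own settings are not stated in this section.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples nearest to query (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    # Tie-break (assumption): among tied labels, prefer the one with the closest member.
    for i in nearest:
        if votes[train_y[i]] == top:
            return train_y[i]


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy: classify each sample using all of the others."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)
```

With 125 adenocarcinomas against only 13 normals, an unweighted vote can lean toward the larger class as k grows, so the choice of k deserves some care on this dataset.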
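As general background on the second classifier named above, Fisher's LDA [7, 8] projects the samples onto directions that maximize between-class scatter relative to within-class scatter. In the standard multiclass form below, the class means and sizes are written mu_c and n_c, mu is the overall mean, and the columns of W are the leading eigenvectors of S_W^{-1} S_B; at most C - 1 = 2 such directions exist for the three classes here. This is the textbook formulation, not necessarily the exact variant used in the thesis.

```latex
\begin{aligned}
S_W &= \sum_{c=1}^{C} \sum_{\mathbf{x}_i \in \text{class } c} (\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^{\top}, \\
S_B &= \sum_{c=1}^{C} n_c \,(\boldsymbol{\mu}_c - \boldsymbol{\mu})(\boldsymbol{\mu}_c - \boldsymbol{\mu})^{\top}, \\
J(W) &= \frac{\lvert W^{\top} S_B W \rvert}{\lvert W^{\top} S_W W \rvert},
\qquad S_W^{-1} S_B\, \mathbf{w}_j = \lambda_j \mathbf{w}_j .
\end{aligned}
```

Because S_W is built from deviations around C class means, its rank is at most n - C = 155 - 3 = 152, so it cannot be inverted over all 4,248 ProbeSets; a down-selected set such as the 18 ProbeSets above is typically small enough for S_W to be invertible.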