Clinical Proteomics
METHODS IN MOLECULAR BIOLOGY™<br />
John M. Walker, SERIES EDITOR<br />
447. Alcohol: Methods and Protocols, edited by<br />
Laura E. Nagy, 2008<br />
446. Post-translational Modification of Proteins:<br />
Tools for Functional Proteomics, Second Edition,<br />
edited by Christoph Kannicht, 2008<br />
443. Molecular Modeling of Proteins, edited by<br />
Andreas Kukol, 2008<br />
439. Genomics Protocols: Second Edition, edited by<br />
Mike Starkey and Ramnanth Elaswarapu, 2008<br />
438. Neural Stem Cells: Methods and Protocols,<br />
Second Edition, edited by Leslie P. Weiner, 2008<br />
437. Drug Delivery Systems, edited by Kewal K. Jain,<br />
2008<br />
436. Avian Influenza Virus, edited by Erica Spackman,<br />
2008<br />
435. Chromosomal Mutagenesis, edited by Greg Davis<br />
and Kevin J. Kayser, 2008<br />
434. Gene Therapy Protocols: Volume 2: Design and<br />
Characterization of Gene Transfer Vectors, edited<br />
by Joseph M. LeDoux, 2008<br />
433. Gene Therapy Protocols: Volume 1: Production<br />
and In Vivo Applications of Gene Transfer Vectors,<br />
edited by Joseph M. LeDoux, 2007<br />
432. Organelle Proteomics, edited by Delphine Pflieger<br />
and Jean Rossier, 2008<br />
431. Bacterial Pathogenesis: Methods and Protocols,<br />
edited by Frank DeLeo and Michael Otto, 2008<br />
430. Hematopoietic Stem Cell Protocols, edited by<br />
Kevin D. Bunting, 2008<br />
429. Molecular Beacons: Signalling Nucleic Acid<br />
Probes, Methods and Protocols, edited by Andreas<br />
Marx and Oliver Seitz, 2008<br />
428. Clinical Proteomics: Methods and Protocols,<br />
edited by Antonia Vlahou, 2008<br />
427. Plant Embryogenesis, edited by Maria Fernanda<br />
Suarez and Peter Bozhkov, 2008<br />
426. Structural Proteomics: High-Throughput Methods,<br />
edited by Bostjan Kobe, Mitchell Guss, and Huber<br />
Thomas, 2008<br />
425. 2D PAGE: Volume 2: Applications and Protocols,<br />
edited by Anton Posch, 2008<br />
424. 2D PAGE: Volume 1: Sample Preparation and<br />
Pre-Fractionation, edited by Anton Posch, 2008<br />
423. Electroporation Protocols, edited by Shulin Li,<br />
2008<br />
422. Phylogenomics, edited by William J. Murphy, 2008<br />
421. Affinity Chromatography: Methods and<br />
Protocols, Second Edition, edited by Michael<br />
Zachariou, 2008<br />
420. Drosophila: Methods and Protocols, edited by<br />
Christian Dahmann, 2008<br />
419. Post-Transcriptional Gene Regulation, edited by<br />
Jeffrey Wilusz, 2008<br />
418. Avidin-Biotin Interactions: Methods and<br />
Applications, edited by Robert J. McMahon, 2008<br />
417. Tissue Engineering, Second Edition, edited by<br />
Hannsjörg Hauser and Martin Fussenegger, 2007<br />
416. Gene Essentiality: Protocols and Bioinformatics,<br />
edited by Svetlana Gerdes and Andrei L. Osterman,<br />
2008<br />
415. Innate Immunity, edited by Jonathan Ewbank and<br />
Eric Vivier, 2007<br />
414. Apoptosis in Cancer: Methods and Protocols,<br />
edited by Gil Mor and Ayesha Alvero, 2008<br />
413. Protein Structure Prediction, Second Edition,<br />
edited by Mohammed Zaki and Chris Bystroff, 2008<br />
412. Neutrophil Methods and Protocols, edited by<br />
Mark T. Quinn, Frank R. DeLeo, and Gary M.<br />
Bokoch, 2007<br />
411. Reporter Genes for Mammalian Systems, edited<br />
by Don Anson, 2007<br />
410. Environmental Genomics, edited by Cristofre<br />
C. Martin, 2007<br />
409. Immunoinformatics: Predicting Immunogenicity<br />
In Silico, edited by Darren R. Flower, 2007<br />
408. Gene Function Analysis, edited by Michael Ochs,<br />
2007<br />
407. Stem Cell Assays, edited by Vemuri C. Mohan,<br />
2007<br />
406. Plant Bioinformatics: Methods and Protocols,<br />
edited by David Edwards, 2007<br />
405. Telomerase Inhibition: Strategies and Protocols,<br />
edited by Lucy Andrews and Trygve O. Tollefsbol,<br />
2007<br />
404. Topics in Biostatistics, edited by Walter T.<br />
Ambrosius, 2007<br />
403. Patch-Clamp Methods and Protocols, edited by<br />
Peter Molnar and James J. Hickman, 2007<br />
402. PCR Primer Design, edited by Anton Yuryev, 2007<br />
401. Neuroinformatics, edited by Chiquito J. Crasto,<br />
2007<br />
400. Methods in Membrane Lipids, edited by Alex<br />
Dopico, 2007<br />
399. Neuroprotection Methods and Protocols, edited<br />
by Tiziana Borsello, 2007<br />
398. Lipid Rafts, edited by Thomas J. McIntosh, 2007<br />
397. Hedgehog Signaling Protocols, edited by Jamila I.<br />
Horabin, 2007<br />
396. Comparative Genomics, Volume 2, edited by<br />
Nicholas H. Bergman, 2007<br />
395. Comparative Genomics, Volume 1, edited by<br />
Nicholas H. Bergman, 2007<br />
394. Salmonella: Methods and Protocols, edited by<br />
Heide Schatten and Abraham Eisenstark, 2007<br />
393. Plant Secondary Metabolites, edited by Harinder<br />
P. S. Makkar, P. Siddhuraju, and Klaus Becker,<br />
2007<br />
392. Molecular Motors: Methods and Protocols, edited<br />
by Ann O. Sperry, 2007<br />
391. MRSA Protocols, edited by Yinduo Ji, 2007<br />
390. Protein Targeting Protocols, Second Edition,<br />
edited by Mark van der Giezen, 2007<br />
389. Pichia Protocols, Second Edition, edited by James<br />
M. Cregg, 2007<br />
388. Baculovirus and Insect Cell Expression<br />
Protocols, Second Edition, edited by David W.<br />
Murhammer, 2007<br />
387. Serial Analysis of Gene Expression (SAGE):<br />
Digital Gene Expression Profiling, edited by Kare<br />
Lehmann Nielsen, 2007<br />
386. Peptide Characterization and Application<br />
Protocols, edited by Gregg B. Fields, 2007<br />
385. Microchip-Based Assay Systems: Methods and<br />
Applications, edited by Pierre N. Floriano, 2007
METHODS IN MOLECULAR BIOLOGY™<br />
Clinical Proteomics<br />
Methods and Protocols<br />
Edited by<br />
Antonia Vlahou<br />
Biomedical Research Foundation,<br />
Academy of Athens, Athens, Greece
Editor<br />
Antonia Vlahou<br />
Academy of Athens<br />
Biomedical Research Foundation<br />
Athens, Greece<br />
Athens 115 27<br />
e-mail: vlahoua@bioacademy.gr<br />
Series Editor<br />
John M. Walker<br />
School of Life Sciences<br />
University of Hertfordshire<br />
Hatfield, Herts., AL10 9AB<br />
UK<br />
ISBN: 978-1-58829-837-9 e-ISBN: 978-1-59745-117-8<br />
Library of Congress Control Number: 2007939413<br />
©2008 Humana Press, a part of Springer Science+Business Media, LLC<br />
All rights reserved. This work may not be translated or copied in whole or in part without the written<br />
permission of the publisher (Humana Press, 999 Riverview Drive, Suite 208, Totowa, NJ 07512 USA),<br />
except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form<br />
of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar<br />
methodology now known or hereafter developed is forbidden.<br />
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are<br />
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to<br />
proprietary rights.<br />
While the advice and information in this book are believed to be true and accurate at the date of going to<br />
press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors<br />
or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the<br />
material contained herein.<br />
Printed on acid-free paper<br />
springer.com
Preface<br />
Clinical proteomics has rapidly evolved over the past few years and is<br />
continuously growing as new methodologies and technologies emerge. In<br />
this volume, leading researchers in the field have contributed their<br />
state-of-the-art methodologies on protein profiling and identification of disease<br />
biomarkers in tissues, microdissected cells, and body fluids. Experimental<br />
approaches involving application of two-dimensional electrophoresis, multidimensional<br />
liquid chromatography, SELDI/MALDI mass spectrometry and<br />
protein arrays, as well as the bioinformatics and statistical tools pertinent to<br />
the analysis of proteomics data are described. As stated in the introductory<br />
chapter by Prof. Paik, the Vice President of the Human Proteome Organization,<br />
“clinical proteomics needs the integration of biochemistry, pathology,<br />
analytical technology, bioinformatics, and proteome informatics to develop<br />
highly sensitive diagnostic tools for routine clinical care in the future.” The<br />
multi-disciplinary character of clinical proteomics approaches is evident in the<br />
detailed step-by-step protocols described in this volume, which makes them<br />
of potential use to a wide range of researchers, including clinicians, molecular<br />
biologists, chemists, bioinformaticians, and computational biologists.<br />
Antonia Vlahou<br />
Acknowledgments<br />
The editor gratefully acknowledges all contributing authors for their<br />
collaboration, which made this project possible and brought it to fruition; the<br />
series editor, Prof. John Walker, whose help and guidance have been instrumental;<br />
Mr. Patrick Marton, Mr. David Casey, and the whole production team<br />
at Humana headed by the late Mr. Tom Laningan for making an excellent<br />
production of this book.<br />
Contents<br />
Preface<br />
Acknowledgments<br />
Contributors<br />
1. Overview and Introduction to Clinical Proteomics<br />
Young-Ki Paik, Hoguen Kim, Eun-Young Lee,<br />
Min-Seok Kwon, and Sang Yun Cho<br />
Part I: Specimen Collection for Clinical Proteomics<br />
2. Specimen Collection and Handling: Standardization of Blood<br />
Sample Collection<br />
Harald Tammen<br />
3. Tissue Sample Collection for Proteomics Analysis<br />
Jose I. Diaz, Lisa H. Cazares, and O. John Semmes<br />
Part II: Clinical Proteomics by 2DE and Direct<br />
MALDI/SELDI MS Profiling<br />
4. Protein Profiling of Human Plasma Samples<br />
by Two-Dimensional Electrophoresis<br />
Sang Yun Cho, Eun-Young Lee, Hye-Young Kim, Min-Jung<br />
Kang, Hyoung-Joo Lee, Hoguen Kim, and Young-Ki Paik<br />
5. Analysis of Laser Capture Microdissected Cells<br />
by 2-Dimensional Gel Electrophoresis<br />
Daohai Zhang and Evelyn Siew-Chuan Koay<br />
6. Optimizing the Difference Gel Electrophoresis (DIGE)<br />
Technology<br />
David B. Friedman and Kathryn S. Lilley<br />
7. MALDI/SELDI Protein Profiling of Serum<br />
for the Identification of Cancer Biomarkers<br />
Lisa H. Cazares, Jose I. Diaz, Rick R. Drake, and O. John Semmes<br />
8. Urine Sample Preparation and Protein Profiling<br />
by Two-Dimensional Electrophoresis and Matrix-Assisted Laser<br />
Desorption Ionization Time of Flight Mass Spectroscopy<br />
Panagiotis G. Zerefos and Antonia Vlahou<br />
9. Combining Laser Capture Microdissection and Proteomics<br />
Techniques<br />
Dana Mustafa, Johan M. Kros, and Theo Luider<br />
Part III: Clinical Proteomics by LC-MS Approaches<br />
10. Comparison of Protein Expression by Isotope-Coded Affinity<br />
Tag Labeling<br />
Zhen Xiao and Timothy D. Veenstra<br />
11. Analysis of Microdissected Cells by Two-Dimensional<br />
LC-MS Approaches<br />
Chen Li, Yi-Hong, Ye-Xiong Tan, Jian-Hua Ai,<br />
Hu Zhou, Su-Jun Li, Lei Zhang, Qi-Chang Xia,<br />
Jia-Rui Wu, Hong-Yang Wang, and Rong Zeng<br />
12. Label-Free LC-MS Method for the Identification of Biomarkers<br />
Richard E. Higgs, Michael D. Knierman,<br />
Valentina Gelfanova, Jon P. Butler, and John E. Hale<br />
13. Analysis of the Extracellular Matrix and Secreted Vesicle<br />
Proteomes by Mass Spectrometry<br />
Zhen Xiao, Thomas P. Conrads, George R. Beck, Jr.,<br />
and Timothy D. Veenstra<br />
Part IV: Clinical Proteomics and Antibody Arrays<br />
14. Miniaturized Parallelized Sandwich Immunoassays<br />
Hsin-Yun Hsu, Silke Wittemann, and Thomas O. Joos<br />
15. Dissecting Cancer Serum Protein Profiles Using<br />
Antibody Arrays<br />
Marta Sanchez-Carbayo<br />
Part V: Statistics and Bioinformatics in Clinical<br />
Proteomics Data Analysis<br />
16. 2D-PAGE Maps Analysis<br />
Emilio Marengo, Elisa Robotti, and Marco Bobba<br />
17. Finding the Significant Markers: Statistical Analysis<br />
of Proteomic Data<br />
Sebastien Christian Carpentier, Bart Panis,<br />
Rony Swennen, and Jeroen Lammertyn<br />
18. Web-Based Tools for Protein Classification<br />
Costas D. Paliakasis, Ioannis Michalopoulos, and Sophia Kossida<br />
19. Open-Source Platform for the Analysis of Liquid<br />
Chromatography-Mass Spectrometry (LC-MS) Data<br />
Matthew Fitzgibbon, Wendy Law, Damon May,<br />
Andrea Detter, and Martin McIntosh<br />
20. Pattern Recognition Approaches for Classifying Proteomic Mass<br />
Spectra of Biofluids<br />
Ray L. Somorjai<br />
Index
Contributors<br />
Jian-Hua Ai • Eastern Hepatobiliary Surgery Hospital, Shanghai, China<br />
George R. Beck, Jr. • Division of Endocrinology, Metabolism and Lipids,<br />
Emory University, School of Medicine, Atlanta, GA<br />
Marco Bobba • University of Eastern Piedmont, Department<br />
of Environmental and Life Sciences, Alessandria, Italy<br />
Jon P. Butler • Lilly Corporate Center, Indianapolis, IN<br />
Sebastien Christian Carpentier • Faculty of Bioscience Engineering,<br />
Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium<br />
Lisa H. Cazares • The George L. Wright Jr. Center for Biomedical<br />
Proteomics, Eastern Virginia Medical School, Norfolk, VA<br />
Sang Yun Cho • Yonsei Biomedical Proteome Research Center, Department<br />
of Biochemistry, College of Sciences, Seoul, Korea<br />
Thomas P. Conrads • Laboratory of Proteomics and Analytical<br />
Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick,<br />
Frederick, MD<br />
Andrea Detter • Fred Hutchinson Cancer Research Center, Seattle, WA<br />
Jose I. Diaz • Cancer Therapy Research Center’s Institute for Drug<br />
Development, University of Texas, Health Science Center, San Antonio, TX<br />
Rick R. Drake • Eastern Virginia Medical School, Norfolk, VA<br />
Matthew Fitzgibbon • Fred Hutchinson Cancer Research Center,<br />
Seattle, WA<br />
David B. Friedman • Proteomics Laboratory, Mass Spectrometry Research<br />
Center, Department of Biochemistry, Vanderbilt University School<br />
of Medicine, Nashville, TN<br />
Valentina Gelfanova • Lilly Corporate Center, Indianapolis, IN<br />
John E. Hale • Lilly Corporate Center, Indianapolis, IN<br />
Richard E. Higgs • Lilly Corporate Center, Indianapolis, IN<br />
Yi-Hong • Eastern Hepatobiliary Surgery Hospital, Shanghai, China<br />
Hsin-Yun Hsu • Biochemistry Department, NMI Natural and Medical<br />
Sciences Institute at the University of Tuebingen, Reutlingen, Germany<br />
Thomas O. Joos • Biochemistry Department, NMI Natural and Medical<br />
Sciences Institute at the University of Tuebingen, Reutlingen, Germany<br />
Min-Jung Kang • Yonsei Biomedical Proteome Research Center,<br />
Department of Biochemistry, College of Sciences, Seoul, Korea<br />
Hoguen Kim • Department of Pathology, College of Medicine, Yonsei<br />
University, Seoul, Korea<br />
Hye-Young Kim • Yonsei Biomedical Proteome Research Center,<br />
Department of Biochemistry, College of Sciences, Seoul, Korea<br />
Michael D. Knierman • Lilly Corporate Center, Indianapolis, IN<br />
Evelyn Siew-Chuan Koay • Department of Pathology, Yong Loo Lin<br />
School of Medicine, National University of Singapore, and Molecular<br />
Diagnosis Center, Department of Laboratory Medicine, National University<br />
Hospital, Singapore<br />
Sophia Kossida • Division of Biotechnology, Biomedical Research<br />
Foundation, Academy of Athens, Athens, Greece<br />
Johan M. Kros • Department of Pathology, Josephine Nefkens Institute,<br />
Erasmus Medical Center, Rotterdam, The Netherlands<br />
Min-Seok Kwon • Yonsei Biomedical Proteome Research Center,<br />
Department of Biochemistry, College of Sciences, Seoul, Korea<br />
Jeroen Lammertyn • Faculty of Bioscience Engineering, Division<br />
of Mechatronics, Biostatistics and Sensors, K.U. Leuven, Leuven, Belgium<br />
Wendy Law • Fred Hutchinson Cancer Research Center, Seattle, WA<br />
Eun-Young Lee • Yonsei Biomedical Proteome Research Center,<br />
Department of Biochemistry, College of Sciences, Seoul, Korea<br />
Hyoung-Joo Lee • Yonsei Biomedical Proteome Research Center,<br />
Department of Biochemistry, College of Sciences, Seoul, Korea<br />
Chen Li • Research Center for Proteome Analysis, Institute of Biochemistry<br />
and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese<br />
Academy of Sciences, Shanghai, China<br />
Su-Jun Li • Research Center for Proteome Analysis, Institute of<br />
Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences,<br />
Chinese Academy of Sciences, Shanghai, China<br />
Kathryn S. Lilley • Cambridge Centre for Proteomics, Department<br />
of Biochemistry, University of Cambridge, United Kingdom<br />
Theo Luider • Laboratories of Neuro-Oncology/Clinical and Cancer<br />
Proteomics, Josephine Nefkens Institute, Erasmus Medical Center,<br />
Rotterdam, The Netherlands<br />
Emilio Marengo • Department of Environmental and Life Sciences,<br />
University of Eastern Piedmont, Alessandria, Italy<br />
Damon May • Fred Hutchinson Cancer Research Center, Seattle, WA<br />
Martin McIntosh • Fred Hutchinson Cancer Research Center, Seattle, WA<br />
Ioannis Michalopoulos • Biomedical Research Foundation, Academy<br />
of Athens, Athens, Greece<br />
Dana Mustafa • Department of Pathology, Josephine Nefkens Institute,<br />
Erasmus Medical Center, Rotterdam, The Netherlands
Young-Ki Paik • Department of Biochemistry, Yonsei Proteome Research<br />
Center & Biomedical Proteome Research Center, Seoul, Korea<br />
Costas D. Paliakasis • Biomedical Research Foundation, Academy<br />
of Athens, Athens, Greece<br />
Bart Panis • Faculty of Bioscience Engineering, Division of Crop<br />
Biotechnics, K.U. Leuven, Leuven, Belgium<br />
Elisa Robotti • Department of Environmental and Life Sciences, University<br />
of Eastern Piedmont, Alessandria, Italy<br />
Marta Sánchez-Carbayo • Tumor Markers Group, Spanish National<br />
Cancer Center (CNIO), Madrid, Spain<br />
O. John Semmes • The George L. Wright Jr. Center for Biomedical<br />
Proteomics, Eastern Virginia Medical School, Norfolk, VA<br />
Ray L. Somorjai • Biomedical Informatics Institute for Biodiagnostics,<br />
National Research Council, Winnipeg, Manitoba, Canada<br />
Rony Swennen • Faculty of Bioscience Engineering, Division of Crop<br />
Biotechnics, K.U. Leuven, Leuven, Belgium<br />
Harald Tammen • Digilab BioVisioN GmbH, Hannover, Germany<br />
Ye-Xiong Tan • Eastern Hepatobiliary Surgery Hospital, Shanghai, China<br />
Timothy D. Veenstra • Laboratory of Proteomics and Analytical<br />
Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick,<br />
Frederick, MD<br />
Antonia Vlahou • Division of Biotechnology, Biomedical Research<br />
Foundation, Academy of Athens, Athens, Greece<br />
Hong-Yang Wang • Eastern Hepatobiliary Surgery Hospital,<br />
Shanghai, China<br />
Silke Wittemann • Biochemistry Department, NMI Natural and Medical<br />
Sciences Institute at the University of Tuebingen, Reutlingen, Germany<br />
Jia-Rui Wu • Research Center for Proteome Analysis, Institute of<br />
Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences,<br />
Chinese Academy of Sciences, Shanghai, China<br />
Qi-Chang Xia • Research Center for Proteome Analysis, Institute of<br />
Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences,<br />
Chinese Academy of Sciences, Shanghai, China<br />
Zhen Xiao • Laboratory of Proteomics and Analytical Technologies,<br />
SAIC-Frederick, Inc., National Cancer Institute at Frederick,<br />
Frederick, MD<br />
Rong Zeng • Research Center for Proteome Analysis, Institute of<br />
Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences,<br />
Chinese Academy of Sciences, Shanghai, China<br />
Panagiotis G. Zerefos • Division of Biotechnology, Biomedical Research<br />
Foundation, Academy of Athens, Athens, Greece
Daohai Zhang • Molecular Diagnosis Center, Department of Laboratory<br />
Medicine, National University Hospital, Singapore and Department of<br />
Pathology, Yong Loo Lin School of Medicine, National University of<br />
Singapore, Singapore<br />
Lei Zhang • Research Center for Proteome Analysis, Institute of<br />
Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences,<br />
Chinese Academy of Sciences, Shanghai, China<br />
Hu Zhou • Research Center for Proteome Analysis, Institute of Biochemistry<br />
and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese<br />
Academy of Sciences, Shanghai, China
1<br />
Overview and Introduction to Clinical Proteomics<br />
Young-Ki Paik, Hoguen Kim, Eun-Young Lee, Min-Seok Kwon,<br />
and Sang Yun Cho<br />
Summary<br />
As the field of clinical proteomics progresses, discovery of disease biomarkers becomes<br />
paramount. However, the immediate challenges are to establish standard operating procedures<br />
for both clinical specimen handling and reduction of sample complexity and to<br />
increase the ability to detect proteins and peptides present in low amounts. The traditional<br />
concept of a disease biomarker is shifting toward a new paradigm, namely, that an<br />
ensemble of proteins or peptides would be more efficient than a single protein/peptide<br />
in the diagnosis of disease. Because clinical proteomics usually requires easy access to<br />
well-defined fresh clinical specimens (including morphologically consistent tissue and<br />
properly pretreated body fluids of sufficient quantity), biorepository systems need to be<br />
established. Here, we address these questions and emphasize the necessity of developing<br />
various microdissection techniques for tissue specimens, multidimensional fractionation<br />
for body fluids, and other related techniques (including bioinformatics), tools which could<br />
become integral parts of clinical proteomics for disease biomarker discovery.<br />
Key Words: biomarker; body fluids; clinical proteomics; translational proteomics;<br />
depletion; biorepository; multidimensional fractionation; specimen bank; biomarker panel.<br />
Abbreviations: CSF: Cerebrospinal Fluid, SILAC: Stable Isotope Labeling with<br />
Amino acids in Cell culture, FFE: Free Flow Electrophoresis, IMAC: Immobilized Metal<br />
Affinity Chromatography, 2DE: Two-Dimensional Gel Electrophoresis, CBB: Coomassie<br />
Brilliant Blue, SELDI: Surface-Enhanced Laser Desorption/Ionization, MALDI:<br />
Matrix-Assisted Laser Desorption/Ionization, MDLC: Multidimensional Liquid Chromatography,<br />
LC: Liquid Chromatography, TOF: Time-of-Flight, CID: Collision-Induced Dissociation,<br />
ETD: Electron Transfer Dissociation, LIT: Linear Ion Trap, FT: Fourier Transform, Q:<br />
Quadrupole, ELISA: Enzyme-Linked Immunosorbent Assay, SISCAPA: Stable Isotope<br />
Standards with Capture by Anti-Peptide Antibody, AQUA: Absolute Quantitative<br />
Analysis. Commercial brands are also shown: MARS: Multiple Affinity Removal System<br />
(Agilent, Palo Alto, CA, USA), Enchant™: Enchant™ Multi-protein Affinity Separation<br />
Kit (Pall Life Sciences, Ann Arbor, MI, USA), Gradiflow™: Gradiflow™ Separation (Life<br />
Bioprocess, Frenchs Forest, Australia), FFE™: BD Free Flow Electrophoresis System<br />
(BD Diagnostics, Martinsried/Planegg, Germany), Zoom®: Zoom® Benchtop Proteomics<br />
System (Invitrogen Corporation, Carlsbad, CA, USA), Rotofor: Bio-Rad Rotofor® Prep<br />
IEF Cell (Bio-Rad, Hercules, CA, USA), PF2D: ProteomeLab™ PF2D Protein Fractionation<br />
System (Beckman Coulter, Inc., Fullerton, CA, USA), DIGE: Ettan™ DIGE System<br />
(GE Healthcare Bio-Sciences AB, Uppsala, Sweden), Deep Purple™: Deep Purple™ Total<br />
Protein Stain (GE Healthcare Bio-Sciences AB, Uppsala, Sweden), ICAT™: Isotope-coded<br />
affinity tags (Applied Biosystems, Foster City, CA, USA), iTRAQ™: iTRAQ™<br />
Reagents (Applied Biosystems, Foster City, CA, USA), Q-TRAP™ (Applied Biosystems,<br />
Foster City, CA, USA).<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
1. Overview and Scope of Clinical Proteomics<br />
Clinical proteomics is defined as comprehensive studies of qualitative and<br />
quantitative profiling of proteins (and peptides) present in clinical specimens<br />
such as body fluids and tissues. The comparison of specimens from healthy and<br />
diseased individuals may lead to the discovery of a disease biomarker (1). The<br />
biomarker serves as a molecular signature reflecting stages of disease before or<br />
after treatment and can also be used for prognostic purposes in monitoring the<br />
response to treatment (2). Clinical proteomics consists of a variety of experimental<br />
processes, which include the collection of well-phenotyped clinical<br />
specimens, analysis of proteins or peptides of interest, data interpretation, and<br />
validation of proteomics data in a clinical context (Fig. 1). After successful<br />
identification of a few disease biomarker candidates through extensive profiling,<br />
translational proteomics involving validation with a cohort study follows. Even<br />
after proper identification and verification of a disease biomarker, it takes quite<br />
a long time to prove that this biomarker is applicable to clinical diagnosis or<br />
prognosis (3,4).<br />
Fig. 1. Clinical and translational proteomics. The key components of experimental<br />
methods are included in each box.<br />
There has been a remarkable increase in publication of clinical proteomics<br />
papers within a short period of time [more than 800 papers in 2006 (Fig. 2)],<br />
coinciding with the rapid growth of proteomics. Reflecting this trend in clinical<br />
proteomics, this chapter aims to present a review of core technologies that<br />
are used in the field of clinical proteomics with respect to sample specimen<br />
processing, protein separation platforms (e.g., gel-based system or liquid-based<br />
methods), quantitative labeling, mass spectrometry (MS), and proteome informatics<br />
tools. It is noteworthy that despite the advent of new technologies,<br />
there remain several bottlenecks in the proteomics field such as lack of dataset<br />
standardization, quantification of the proteins of interest, verification of proteins<br />
or peptides identified, and an overall strategy for tackling biomarkers post-identification.<br />
Thus, the pace of biomarker discovery, one of the key agendas of<br />
clinical proteomics, will depend on how well these obstacles or bottlenecks are<br />
resolved by technical advancement (4). The following sections address these<br />
issues in the context of clinical proteomics.<br />
Fig. 2. Recent trends in clinical proteomics publications. The distribution of the<br />
articles related to clinical proteomics listed in PubMed is shown here. The key words<br />
used for searching articles are as follows: query (clinical[All Fields] OR ((“biological<br />
markers”[TIAB] NOT Medline[SB]) OR “biological markers”[MeSH Terms] OR<br />
biomarker[Text Word])) AND (“proteomics”[MeSH Terms] OR proteomics[Text<br />
Word] OR proteomic[All Fields] OR “proteome”[MeSH Terms] OR proteome[Text<br />
Word]).
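The search described in the Fig. 2 caption can be reproduced programmatically. The following is a minimal sketch, not the procedure the authors used: it assumes the NCBI E-utilities `esearch` endpoint with its `rettype=count` option, substitutes straight quotes for the typeset quotes in the caption, and uses an illustrative year range (2000 to 2006) that is not taken from the chapter. The function names `build_query` and `build_count_url` are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here; the chapter only names PubMed).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The two halves of the query, transcribed from the Fig. 2 caption.
CLINICAL_TERMS = (
    'clinical[All Fields] OR (("biological markers"[TIAB] NOT Medline[SB]) '
    'OR "biological markers"[MeSH Terms] OR biomarker[Text Word])'
)
PROTEOMICS_TERMS = (
    '"proteomics"[MeSH Terms] OR proteomics[Text Word] OR proteomic[All Fields] '
    'OR "proteome"[MeSH Terms] OR proteome[Text Word]'
)


def build_query():
    """Join the clinical/biomarker terms and the proteomics terms with AND."""
    return f"({CLINICAL_TERMS}) AND ({PROTEOMICS_TERMS})"


def build_count_url(year):
    """Return an esearch URL that asks for the number of hits published in `year`."""
    params = {
        "db": "pubmed",
        "term": build_query(),
        "datetype": "pdat",      # restrict by publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",      # return only the hit count
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical year range, chosen only for illustration.
    for year in range(2000, 2007):
        print(year, build_count_url(year))
```

Fetching each URL and reading the returned count would give a per-year distribution comparable in spirit to Fig. 2, although counts obtained today will differ from those reported at the time because PubMed indexing changes over time.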
2. Sample Specimens and Processing Techniques Used for Clinical<br />
Proteomics<br />
2.1. General Considerations<br />
Because clinical proteomics relies heavily on patient specimens, three<br />
important factors need to be considered before the selection and preparation of<br />
clinical specimens: (1) selection of the correct clinical samples according to the<br />
type of research, (2) isolation of the appropriate component from the clinical<br />
samples, and (3) establishment of optimal experimental conditions for each<br />
sample (5,6,7,8). For the selection of correct clinical samples, the relationship<br />
between clinical samples and the specific disease should also be considered.<br />
For example, although cancer tissue represents a specific cancer, several types<br />
of body fluids from patients may also have a relationship to the cancer. If<br />
the selected clinical samples specifically represent the disease, the next step<br />
is to evaluate what components are related to the specific disease. That is,<br />
tumor cells in cancerous tissues are surrounded by many types of stromal cells,<br />
inflammatory cells, and connective tissues that are directly related to changes<br />
in protein expression in the cancer. If the purpose of proteomic analysis is<br />
to identify characteristic changes of specific proteins in tumor cells, then<br />
precise determination of the tumor cell percentage, which can be increased by tissue<br />
microdissection, would appear to be necessary (5,6,7). As sample specimen<br />
conditions directly impact the results of biomarker discovery, well-defined<br />
clinical specimens should be used since the discovery of disease biomarkers is<br />
much easier when the samples have clear anatomical and pathophysiological<br />
definitions. Because clinical specimens are heterogeneous, sophisticated pathological<br />
discrimination is required for the isolation of specific diseased tissue or<br />
body fluids. Without the expertise of a pathologist at the earliest stage, it may<br />
be difficult to isolate a specifically defined specimen for clinical proteomics.<br />
Generally, clinical samples contain variable factors and components originating<br />
from the microenvironment of specific tissues. For instance, liver tissues usually<br />
contain a large amount of blood in the sinusoid and this amount is increased<br />
in tissues with dilated sinusoids (9). Lung tissues usually contain deposited<br />
exogenous materials and this amount is increased in heavy smokers (10). Note<br />
that the amount of blood present in isolated tissues may directly influence the<br />
relative proportion of proteins found in clinical specimens. Deposited materials<br />
and other chemicals, such as staining dyes and fixatives used in microdissection,<br />
may also influence the experimental conditions (11). In the analysis of<br />
clinical samples, suitable buffer conditions, minimal lysis time, and high-yield<br />
protein precipitation are highly recommended. To avoid substantial variations<br />
between experiments using clinical specimens, a large set of specimens is<br />
also necessary because, unlike cultured cell lines, clinical specimens have high<br />
component variability (12). More details on specific disease types are also<br />
described throughout this volume.<br />
2.2. Body Fluids<br />
Surveying the literature, there appear to be five to six different types of<br />
clinical specimens. Body fluids (e.g., plasma, urine, tears, cerebrospinal fluid,<br />
lymph, and ascites), tissues (e.g., liver, heart, muscle, brain, and lung), cells,<br />
bone, and hair have all been used for clinical proteomics (Table 1) (13,14,15,16,<br />
17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33). Each has its own merits<br />
and limitations for biomarker discovery via proteomic analysis. Among those<br />
sample specimens, the number of publications using body fluids has increased<br />
recently, perhaps because of their convenience and ease of use for noninvasive<br />
diagnosis. Since those proteins secreted in the body fluids during or after disease<br />
may reflect a broad range of pathophysiological conditions, much emphasis has<br />
been given to identification of prominent protein/peptide biomarkers that exhibit<br />
differential expression at different stages. In the literature, the terms “body<br />
fluids” and “biofluids” are being used interchangeably, although the former<br />
indicates a greater likelihood of being obtained directly from the patients, while<br />
the latter is applied more broadly, referring to liquid or liquid-like samples<br />
obtained from living organisms including model animals and plants. Throughout<br />
this chapter we will use “body fluids” for clarity.<br />
Given the large dynamic range of protein and peptide sources, plasma (a<br />
complex liquid interface between tissues) and extracellular fluids may be the<br />
best body fluids to use for clinical proteomics and biomarker discovery (34,35,<br />
36,37,38). In addition to plasma, more than a dozen additional body fluids are<br />
currently used for biomarker discovery, ranging from urine to peritoneal fluids<br />
(Table 1). However, the biggest challenge in body fluid proteomics may be the<br />
multiple pretreatment processes, including depletion of high-abundance proteins<br />
(in the case of plasma) (34,35,36) and/or protein enrichment (in the case of urine)<br />
(15,39), prior to analysis (Table 1). Thus, the outcome of clinical proteomics<br />
may depend on proper sample processing since the quality of selection and<br />
handling of the most specific type of specimen will affect the overall pattern of<br />
profiling. Because the details of body fluid proteomics have been well described<br />
by Shen Hu et al. (38), we would like to focus on only a few essential points.<br />
First, standard measures need to be introduced to protect specimens from<br />
nonspecific proteolysis, lysis, and modification during collection and preparation<br />
(11). For the standardization of blood sample collection, Tammen<br />
emphasizes many useful considerations of preanalytical variables in plasma<br />
proteomics, which can be applied to processes involved with blood specimens<br />
[(40) and see Chapter 2]. The more specific problems involved in sample<br />
handling are also addressed by Rai et al. (41).<br />
Table 1<br />
Types of Biological Specimens Used in Clinical Proteomics<br />
Type | Specimen | Disease | Reference<br />
Fluid (secretions) | Plasma/serum | | (13,14)<br />
Fluid (secretions) | Urine | Prostate cancer | (15)<br />
Fluid (secretions) | Nasal discharge | Seasonal allergic rhinitis | (16)<br />
Fluid (secretions) | Tears | Blepharitis and dry eye | (17,18)<br />
Fluid (secretions) | Saliva | Oral and breast cancer | (19)<br />
Fluid (secretions) | Amniotic/cervical fluid | Fetal aneuploidy and intra-amniotic inflammation | (20,21)<br />
Fluid (proximal fluid and body cavity fluid) | Follicular fluid | Recurrent spontaneous abortion | (22)<br />
Fluid (proximal fluid and body cavity fluid) | Seminal fluid | Male infertility | (23)<br />
Fluid (proximal fluid and body cavity fluid) | Nipple aspirate fluid | Breast cancer | (24)<br />
Fluid (proximal fluid and body cavity fluid) | Cerebrospinal fluid | Brain tumor | (25)<br />
Fluid (proximal fluid and body cavity fluid) | Synovial fluid | Rheumatoid arthritis | (26)<br />
Fluid (proximal fluid and body cavity fluid) | Ascites | Ovarian cancer | (13)<br />
Fluid (proximal fluid and body cavity fluid) | Bronchial lavage fluid | Chronic obstructive pulmonary disease, asthmatics and lung disease | (27,28)<br />
Fluid (proximal fluid and body cavity fluid) | Pleural fluid | Lung cancer | (29)<br />
Fluid (proximal fluid and body cavity fluid) | Peritoneal fluid | Ovarian cancer | (14)<br />
Tissue | LCM or LMPC isolated; formalin fixed; paraffin embedded | Any type of disease | (30)<br />
Cell | Cell lines or primary tissue culture | Any type of disease | (31)<br />
Bone | Cartilage | Rheumatoid arthritis | (32)<br />
Hair | | | (33)<br />
Characteristics of the samples<br />
Fluid (secretions):<br />
• Routinely accessible body fluids<br />
• Very important in the discovery of biomarkers of diseases (systemic vs. organ specific/local)<br />
• Important for early detection, disease severity, prognosis, monitoring of response to therapy<br />
Fluid (proximal fluid and body cavity fluid):<br />
• Can reflect disease perturbations in the organs or tissues from which they are secreted<br />
• Procedure of synovial biopsy is not very difficult<br />
Tissue:<br />
• Very important for the development of novel in situ biomarkers<br />
• Immunofluorescence, immunocytochemistry, imaging mass spectrometry<br />
Cell:<br />
• Very important in the discovery of biomarker candidates<br />
• Validation should be performed using primary tumor samples (e.g., immunohistologic methods, imaging MS)<br />
Bone:<br />
• Cartilage consists mainly of extracellular matrix, mostly made of collagens and proteoglycans<br />
Hair:<br />
• Over 300 proteins were found to constitute the insoluble complex formed by transglutaminase crosslinking<br />
Pretreatment required for proteomics<br />
Fluid:<br />
• Considerations for sample adequacy<br />
– Storage<br />
– Hemolysis<br />
– Influence of anticoagulants<br />
– Consistent results<br />
• Consider whether to pool samples or analyze individual samples<br />
• Depletion of high-abundance proteins (albumin constitutes 50% of plasma proteins)<br />
• Mucosa and salt must be removed<br />
Tissue:<br />
• Considerations for sample adequacy<br />
• Integrity, degradation of protein<br />
• Contamination (microorganisms, extraneous material)<br />
Cell:<br />
• Desalting and removal of media components<br />
Bone:<br />
• Cetylpyridinium chloride effectively aggregates with proteoglycan<br />
Hair:<br />
• Sufficient extraction of protein from the insoluble complex is needed<br />
Second, to increase the dynamic<br />
range of detection and reduce sample heterogeneity, pretreatments such as<br />
depletion of high-abundance proteins appear to be required (34,35,36), and<br />
several such depletion steps may be needed during initial sample processing.<br />
Multiple fractionations of clinical samples prior to the major separation work<br />
would also reduce sample complexity.<br />
Note that coremoval of low-abundance proteins during this type of multiple<br />
depletion (36,42) and modification of proteins of interest during or after<br />
isolation (43) should be considered as well. For several problems encountered<br />
with specimen collection, Xiao et al. (Chapter 13) in this volume also describe<br />
different methods to isolate extracellular matrix (ECM) and analyze the<br />
proteome of secreted vesicles. These methods will be useful for studying ECM<br />
and secreted vesicles in various samples ranging from primary cultured<br />
cells to tissue specimens. Therefore, one must consider the best options for this<br />
process before doing the main experiment.<br />
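As a rough illustration of why depletion helps, the following Python sketch computes how the relative share of a low-abundance protein changes when a fixed fraction of the total protein mass is removed.<br />
The function name and the example abundance are hypothetical. The 50% figure follows the albumin share given in Table 1, and the calculation assumes that the protein of interest is not coremoved, which may not hold in practice.<br />

```python
def relative_fraction_after_depletion(fraction_of_total, depleted_fraction):
    """Share of the remaining protein mass held by one protein after
    `depleted_fraction` of the total protein mass has been removed.

    Assumption: none of the protein of interest is lost during depletion.
    """
    if not 0.0 <= depleted_fraction < 1.0:
        raise ValueError("depleted_fraction must be in [0, 1)")
    return fraction_of_total / (1.0 - depleted_fraction)


# Hypothetical low-abundance protein at 0.001% of total plasma protein.
before = 0.00001
# Removing albumin alone, taken here as 50% of plasma protein.
after = relative_fraction_after_depletion(before, depleted_fraction=0.5)
enrichment = after / before  # fold increase in relative share
print(f"relative share: {before:.6%} -> {after:.6%} ({enrichment:.1f}-fold)")
```

Under these assumptions, removing half of the protein mass only doubles the relative share of every remaining protein, which is one way to see why several depletion or fractionation steps are often combined.<br />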
2.3. Tissues and Other Samples<br />
Usually tissues are used as primary screening samples to find direct causes<br />
of disease from the lesion present in tissues of the corresponding organ, for<br />
example, liver tissue in hepatocellular carcinoma (HCC) (44,45). Tissues are<br />
widely used for clinical proteomics, although there are no standard operating<br />
procedures for specimen fractionation and the detection limit of current instrumentation<br />
remains borderline. As listed in Table 1, many cancer tissues can be<br />
prepared in different ways such as laser capture microdissection (LCM) (5,6),<br />
laser microdissection and pressure catapulting<br />
(LMPC) (30,46), and formalin-fixed paraffin-embedded sample preparation<br />
(11). These techniques are well described in Chapters 3, 5, 9, and 11 in this<br />
volume. It is desirable, however, that proteomics studies of disease tissues<br />
should also be coupled with parallel analysis of the corresponding body fluids.<br />
For example, for the study of cancer biomarkers, paired cancer tissue sets (tumor<br />
vs. nontumor) and the same patient’s plasma were used, which led to a more<br />
comprehensive analysis (47,48). Experiments on tissue samples may mostly be<br />
suitable for pathophysiological studies rather than biomarker discovery due to<br />
the complexity of the sample.<br />
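The paired design mentioned above (tumor vs. nontumor tissue from the same patient) can be sketched as a per-patient log2 ratio for one protein.<br />
This is a minimal illustration and not the analysis used in refs. 47 and 48; the function name, patient identifiers, and abundance values are made up.<br />

```python
import math


def paired_log2_fold_changes(tumor, nontumor):
    """Per-patient log2(tumor / nontumor) for one protein.

    `tumor` and `nontumor` map a patient ID to a positive abundance value.
    Only patients present in both dictionaries are compared.
    """
    return {
        patient: math.log2(tumor[patient] / nontumor[patient])
        for patient in tumor
        if patient in nontumor
    }


# Hypothetical abundances of one protein in paired tissue samples.
tumor = {"P1": 8.0, "P2": 4.0, "P3": 2.0, "P4": 6.0}
nontumor = {"P1": 2.0, "P2": 4.0, "P3": 4.0}  # P4 has no paired sample

fold_changes = paired_log2_fold_changes(tumor, nontumor)
for patient, value in fold_changes.items():
    print(patient, value)
```

Because each patient serves as their own control, differences between patients that are unrelated to the tumor cancel out in the ratio.<br />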
In specimen processing for proteomics studies, there are usually several<br />
unwanted problems such as artifacts created during sample collection, processing,<br />
and storage. Other matters arise in the handling of patient information regarding<br />
sex, age, and race (49). To minimize those problems associated with systematic<br />
sample handling, it is advisable to establish a specimen bank (50,51,52). In fact,<br />
the collection of many clinical samples in a biorepository would have enormous
benefits for proteomic research. This enables the selection of homogeneous<br />
clinical samples according to the research purposes and isolation of specific<br />
components from clinical samples. Additionally, large scale collection of clinical<br />
specimens in a biorepository is essential for the validation of specific markers<br />
after biomarker candidate discovery. Ideally, the clinical samples stored in the<br />
biorepository should be (1) collected and stored immediately because dead cells<br />
and altered proteins affect proteomic analysis, (2) subjected to accurate quality<br />
control, and (3) catalogued with reliable and secure clinical data. The quality control<br />
of clinical samples includes trimming of specimens and confirmation of diagnosis<br />
by pathologists; information gained (such as the confirmation of tumor cell and<br />
stromal cell ratio, percentage of necrosis, percentage of fibrosis, proportion of<br />
infiltrated inflammatory cells, etc.) should be stored in a database of clinical<br />
samples. It is also essential to store clinical and follow-up data for each sample<br />
and each patient’s written informed consent form in the biorepository network.<br />
This clinical specimen banking network provides convenience, reduced budget,<br />
and reliability for researchers involved in clinical proteomic research (50,51,52).<br />
For representative tissue sample collection for proteomics studies, Diaz et al.<br />
(Chapter 3) address a practical experimental strategy for storage and handling of<br />
sample specimens that are used in surface-enhanced laser desorption/ionization<br />
(SELDI), 2D gel, and liquid chromatography (LC)-based proteomics. Emphasis<br />
should be given to the primary responsibility of pathologists in the whole<br />
process of tissue proteomics in addition to morphological analysis at the<br />
molecular level.<br />
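The quality-control information listed above could be stored as one record per specimen, as in the following Python sketch.<br />
The class, its field names, and the threshold values are hypothetical and chosen only for illustration; the chapter does not prescribe a schema or cutoffs.<br />

```python
from dataclasses import dataclass


@dataclass
class SpecimenRecord:
    """One banked specimen with the pathology QC fields named in the text."""
    specimen_id: str
    diagnosis_confirmed: bool      # confirmed by a pathologist
    tumor_cell_pct: float          # tumor cells as % of all cells
    stromal_cell_pct: float
    necrosis_pct: float
    fibrosis_pct: float
    inflammatory_cell_pct: float
    informed_consent: bool         # written consent on file

    def passes_qc(self, min_tumor_pct=70.0, max_necrosis_pct=20.0):
        # Thresholds are hypothetical defaults, not values from the chapter.
        return (
            self.diagnosis_confirmed
            and self.informed_consent
            and self.tumor_cell_pct >= min_tumor_pct
            and self.necrosis_pct <= max_necrosis_pct
        )


def select_homogeneous(records, **thresholds):
    """Return IDs of specimens that pass QC, for a homogeneous study set."""
    return [r.specimen_id for r in records if r.passes_qc(**thresholds)]


bank = [
    SpecimenRecord("S001", True, 85.0, 10.0, 5.0, 3.0, 2.0, True),
    SpecimenRecord("S002", True, 40.0, 45.0, 5.0, 8.0, 7.0, True),
    SpecimenRecord("S003", True, 90.0, 5.0, 30.0, 2.0, 1.0, True),
    SpecimenRecord("S004", True, 75.0, 15.0, 10.0, 4.0, 3.0, False),
]
selected = select_homogeneous(bank)
print(selected)
```

Keeping the thresholds as parameters lets each study choose its own definition of a homogeneous sample set from the same bank.<br />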
3. Biomarker Discovery and Clinical Proteomics<br />
Given that one of the central issues of clinical proteomics is biomarker<br />
discovery and its application, a brief account of this subject is appropriate<br />
here. An excellent review of the whole arena of biomarker development can be<br />
found elsewhere (53,54,55). Until now, the generally accepted<br />
concept of a disease biomarker has been a single protein/peptide<br />
with high specificity that is usually present in low abundance, is expressed in<br />
a disease in a stage-specific manner, and serves as a major fingerprint of the<br />
body’s response to drugs or other treatments. Although many examples of broad<br />
biomarkers for various diseases are known (56,57,58,59,60), identification of<br />
more specific and selective biomarkers is urgently needed. Accordingly, we<br />
may also need to change the current biomarker concept and eliminate the<br />
inherent bias toward individual disease biomarkers. Recently, a new idea has<br />
been introduced that an ensemble of different proteins would be more efficient<br />
than a single protein/peptide in the diagnosis of disease (61,62,63). To solve
this problem we propose a general strategy of clinical proteomics leading to<br />
disease biomarker discovery as outlined in Fig. 3.<br />
Since biomarker candidate proteins could come from many different cellular<br />
processes, they could be either in low abundance or high abundance, which<br />
would directly or indirectly reflect the physiological condition of the body.<br />
Perhaps they are present in different concentrations depending on the disease<br />
stage or tissue type. For example, common proteins such as Hsp 27 (64,<br />
65), 14-3-3 proteins (66,67), apoA-I (68,69), and serum amyloid precursor<br />
A (70) appear in most disease samples from lung cancer, gastric cancer,<br />
pancreatic cancer, prostate cancer, neuroblastoma, and inflammation. A number<br />
of questions then arise: Should they be treated as disease-specific or<br />
disease-nonspecific proteins? What would be the criterion for making this decision? Is this<br />
due to the fact that the number and type of proteins secreted under a specific<br />
physiological condition in many different types of diseases might be similar?<br />
How can one distinguish one type of disease from another simply by looking<br />
at their protein profiles?<br />
Fig. 3. The concept of the creation of a protein biomarker panel for a specific<br />
disease. Each white, gray, dark-gray, and black circle represents a putative protein<br />
biomarker of a specific disease at that clinical stage. A group of slash-lined circles<br />
symbolizes the biomarker panel of liver disease as an example.<br />
As outlined in Fig. 3, at the beginning of a certain disease, signals at earlier<br />
stages may be limited to only a few easily counted molecules. As the disease<br />
progresses, more signal molecules may be produced, resulting in mixed<br />
types of biomarkers representing multiple disease phenomena. Although this<br />
assumption seems to be oversimplified, more noise is created at a certain stage,<br />
where it becomes more difficult to identify those molecules at the molecular<br />
level for two reasons: (1) they are present in amounts too small to be detected<br />
using current technology and (2) it may be too early for the molecules<br />
to be specific for a particular disease. Presumably, proteins appearing in stage 3<br />
or 4 may have higher specificity for a particular disease but the sensitivity might<br />
be low. It may be likely that this noise interferes with the signaling pathway of<br />
a certain disease, and we may end up having no decisive marker. To circumvent<br />
this problem, it may be desirable to identify a set of biomarker candidate<br />
proteins, termed a “biomarker panel,” which ideally contains potential candidate<br />
proteins or peptides that represent specific stages of the disease as a group.<br />
Given this panel, extensive validation may be pursued using a large<br />
cohort. Analogous to this strategy, many biomarker candidates at stage 1<br />
can be included in the panel, which can have greater specificity and sensitivity<br />
than a single-molecule biomarker. A biomarker panel of this kind can be used<br />
not only as a diagnostic marker but also as a prognostic<br />
indicator in monitoring treatment effectiveness. For example, Linkov et al. (61)<br />
reported that sensitivity and specificity improved to 84.5 and<br />
98%, respectively, when they used a panel of 25 markers in the early<br />
diagnosis of squamous cell cancer of the head and neck.<br />
In the diagnosis of prostate cancer, specificity increased from 5–15<br />
to 84–95% when a biomarker panel containing six marker proteins was used<br />
instead of a single marker. In HCC, studies have been carried out on a<br />
biomarker panel consisting of a protein array that can be used as a diagnostic<br />
kit (62,63).<br />
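To make the terms sensitivity and specificity concrete, the following Python sketch compares a single marker with a small panel on a toy data set.<br />
All marker names, cutoffs, values, and the "at least two markers positive" rule are made up for illustration; they are not taken from the cited studies, and a real panel would be evaluated on an independent cohort.<br />

```python
def sensitivity_specificity(predictions, labels):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    tn = sum(1 for p, y in zip(predictions, labels) if not p and not y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    return tp / (tp + fn), tn / (tn + fp)


def panel_call(sample, cutoffs, min_positive):
    """Call a sample positive if enough markers exceed their cutoffs."""
    hits = sum(1 for marker, cutoff in cutoffs.items() if sample[marker] > cutoff)
    return hits >= min_positive


# Hypothetical marker levels (arbitrary units); True = diseased.
samples = [
    ({"A": 2.0, "B": 1.5, "C": 0.5}, True),
    ({"A": 0.8, "B": 1.6, "C": 1.4}, True),
    ({"A": 1.2, "B": 0.7, "C": 1.8}, True),
    ({"A": 0.9, "B": 1.3, "C": 1.1}, True),
    ({"A": 1.1, "B": 0.4, "C": 0.6}, False),
    ({"A": 0.5, "B": 0.6, "C": 0.3}, False),
    ({"A": 1.3, "B": 0.2, "C": 0.9}, False),
    ({"A": 0.7, "B": 1.2, "C": 0.5}, False),
]
labels = [diseased for _, diseased in samples]
cutoffs = {"A": 1.0, "B": 1.0, "C": 1.0}

single = [s["A"] > cutoffs["A"] for s, _ in samples]
panel = [panel_call(s, cutoffs, min_positive=2) for s, _ in samples]

single_result = sensitivity_specificity(single, labels)
panel_result = sensitivity_specificity(panel, labels)
print("single marker A:", single_result)
print("three-marker panel:", panel_result)
```

The toy data were constructed so that no single marker separates the groups while the combination does; real data are rarely this clean.<br />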
A general strategy for biomarker discovery is outlined in Fig. 4. In typical<br />
clinical proteomics work, sample collection is the first step, followed by<br />
pretreatment of the sample in order to reduce sample complexity to enable<br />
searching for low-abundance proteins (e.g., disease biomarkers) using various<br />
fractionation tools. This multidimensional fractionation is well-described<br />
elsewhere (34,35,36), and depends on the properties and concentration of the<br />
sample. Typically the prefractionated samples go either to a two-dimensional<br />
electrophoresis (2DE) or LC-based proteomics separation system, followed by<br />
single or multiple steps of mass spectrometric analysis depending on the sample
quantity and experimental goal. The data obtained from this series of analyses<br />
will be integrated into the proteome informatics system where protein/peptide<br />
identification, quantification, modification, and verification of peak list are<br />
carried out [(71) and also Chapter 19]. Usually this step becomes rate limiting<br />
since major profiling data are constructed and analyzed at this point. The<br />
clinical relevance of those proteins (and changes in their expression level) in<br />
a specific disease state is mostly determined at this step, which eventually leads to identification<br />
of biomarker candidates. In addition, SELDI, molecular imaging and<br />
protein microarrays can also be applied before or after this step. Once major<br />
biomarker candidates are identified, those proteins are subjected to further<br />
verification via sophisticated analytical arrays and translational proteomics,<br />
which involves cohort studies, pre-evaluation, and a robust analytical system<br />
(4,72). Throughout the process of translational proteomics, one may be able to<br />
judge whether the identified panel or single proteins are suitable as biomarkers<br />
of a specific disease. A recent comprehensive review by Zolg (73) addressed<br />
several considerations in the biomarker development pipeline from discovery<br />
to validation. Three critical challenges within the pipeline are reduction of<br />
clinical sample complexity, the proof of principle of biomarker function, and<br />
the detection limit of unique proteins present in the samples.<br />
Fig. 4. A typical experimental strategy for clinical proteomics and translational<br />
proteomics. In clinical proteomics research, various experimental techniques<br />
are included: specimen collection, prefractionation, 2DE, non-2DE (liquid-based<br />
separation), mass spectrometry, informatics, and others. The course of each section as<br />
marked (square, circle in different color) is determined by the investigators, depending<br />
on the experimental goal. At the bottom, experimental procedures for the verification<br />
and validation of biomarker candidates are schematically outlined leading to clinical<br />
screening and applications. The squares indicate the separation system based on the<br />
specific characteristics of proteins and general prefractionation system. The open circles<br />
and open triangle represent analytical modules at the protein and peptide level, respectively.<br />
The arrow and junction points indicate an option of each selection. The bottom part<br />
indicates the verification procedure employing multiple reaction monitoring and quantitative<br />
mass analysis. Those biomarker candidates identified from typical clinical proteomics<br />
would be subject to translational proteomics for validation where a large scale cohort<br />
study and evaluation would then proceed.<br />
In the search for biomarker panels, reliable statistical tools and bioinformatics<br />
resources are needed, which are now available on the web (Table 2;<br />
see also Chapters 16 and 17). As the number of biomarker panel candidates<br />
increases, more cases are being examined, which requires statistical learning<br />
methods. These methods include neural networks, genetic algorithms, k-means<br />
nearest-neighbor analysis, Euclidean distance-based nonlinear methods, fuzzy<br />
pattern matching, self-organizing mapping, and support vector machines<br />
(74,75,76,77,78). They are very useful for classification of proteins according<br />
to the specific disease state (see also Chapters 16 and 20). Once biomarker<br />
candidates are identified, it is necessary to predict in silico the function of<br />
these proteins and validate them in the context of clinical application. Table 3<br />
provides web resources that can be used for clinical data management, in<br />
silico functional annotation (see Chapter 18), prediction, and identification of<br />
modified forms of proteins. Thus, by combining experimental methods (Fig. 4)<br />
and informatics tools (Tables 2 and 3), one is able to obtain a set of biomarker<br />
candidate proteins (panel) that would be further used for validation through<br />
translational proteomics (Fig. 1).<br />
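As a small illustration of one family of methods named above, the following Python sketch classifies a sample by nearest-neighbor voting with Euclidean distance.<br />
It is a plain k-nearest-neighbor classifier written from scratch, not the specific procedure of refs. 74 to 78; the function names, the two-protein feature vectors, and the class labels are hypothetical.<br />

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    `training` is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical abundances of two panel proteins per sample.
training = [
    ((1.0, 1.2), "control"),
    ((0.9, 1.0), "control"),
    ((1.1, 0.8), "control"),
    ((3.0, 3.2), "disease"),
    ((2.8, 3.5), "disease"),
    ((3.3, 2.9), "disease"),
]

call_a = knn_classify((2.9, 3.0), training)
call_b = knn_classify((1.0, 1.0), training)
print(call_a, call_b)
```

With two classes, an odd k avoids tied votes. In practice the features would be scaled and the classifier assessed by cross-validation before any clinical interpretation.<br />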
4. Introduction of the Experimental Strategy Described<br />
in This Volume<br />
For protein profiling and identification, proteomics platform technologies<br />
are moving forward in many areas not only in clinical proteomics but also in<br />
the general biological field. In this section, the leading scientists in the field<br />
of proteomics outline core techniques and their application to the studies of<br />
clinical proteomics. For example, in plasma proteome analysis, it is necessary<br />
to deplete high-abundance proteins using various techniques such as multidimensional<br />
fractionation by immunoaffinity column, gel permeation, and beads<br />
(Fig. 4). Cho et al. (Chapter 4) address this in relation to 2D gel analysis of<br />
plasma wherein the technical details of sample preparation, gel electrophoresis,<br />
and quantification of proteins on the gel are described. Zhang and Koay<br />
(Chapter 5) describe the methods of 2D gel analysis for cells prepared by<br />
LCM. They describe the application of LCM in dissecting tumor cells in<br />
breast cancer for macromolecular extraction and 2D gels. This can be used<br />
for preparation of samples from paraffin-embedded tissue blocks in microdissecting<br />
the cells of interest. Further to this procedure, Mustafa et al. (Chapter 9)<br />
review the application of LCM for proteomics analysis and demonstrate that<br />
combining LCM and MS would facilitate identification of specific proteins<br />
for each sample type. For urine sample analysis, Zerefos et al. (Chapter 8)<br />
provide simple protocols for protein analysis by 2D gel or direct matrix-assisted<br />
laser desorption/ionization-time-of-flight mass spectrometry. These techniques<br />
include protein enrichment through protein precipitation and ultrafiltration.<br />
Combining these methods with the above profiling technologies allows<br />
reproducible and sensitive analysis of one of the most significant and complex<br />
biological samples (77).
Table 2<br />
Clinical Proteomics Initiatives and Resources<br />
Name | Details | Website<br />
Institute<br />
CPTI | National Cancer Institute’s Clinical Proteomics Technologies, initiative for cancer | http://proteomics.cancer.gov<br />
ABRF | The Association of Biomolecular Resource Facilities, an international society dedicated to advancing core and research biotechnology laboratories through research, communication, and education | http://www.abrf.org/<br />
PPI | Plasma Proteome Institute, the PPI is working to facilitate clinical adoption of advanced diagnostic tests using proteins in plasma and serum | http://www.plasmaproteome.org/plasmaframes.htm<br />
EDRN | The Early Detection Research Network, the EDRN provide up-to-date information on biomarker research through this website and scientific publications | http://edrn.nci.nih.gov<br />
Web resources<br />
ExPASy | Expert Protein Analysis System, proteomics related information and database | http://www.expasy.org/<br />
NCBI | National Center for Biotechnology Information, the protein entries in the Entrez search and retrieval system have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein&itool=toolbar<br />
CPRMap | Clinical Proteomics Research Map, updated research article for disease and clinical proteomics | http://www.cprmap.com/<br />
Database<br />
MedGene | MedGene can make a list of human genes associated with a particular human disease in ranking order | http://hipseq.med.harvard.edu/MEDGENE<br />
Table 3<br />
Available Bioinformatic Resources for the Analysis of Proteomics Data<br />
Name | Description | Website URL | PMID<br />
Clinical proteome data management system<br />
Proteus | LIMS for proteomics pipeline | http://www.genologics.com | <br />
CPAS | LIMS for identification and quantification using LC-MS/MS data | | 16396501<br />
Systems biology experiment analysis management system | A management system for collecting, storing, and accessing data produced by microarray, proteomics, and immunohistochemistry | http://www.sbeams.org/ | 16756676<br />
GPM database | Open source system for analyzing, validating, and storing protein identification data | http://www.thegpm.org/ | 15595733<br />
SpectrumMill | MS/MS data analysis and management system | http://www.chem.agilent.com/ | <br />
Phosphorylation<br />
Group-based phosphorylation scoring method | Prediction of kinase-specific phosphorylation sites | http://973-proteinweb.ustc.edu.cn/gps/gps_web/ | 15980451<br />
KinasePhos | A web tool for identifying protein kinase-specific phosphorylation sites using a hidden Markov model | http://kinasePhos.mbc.nctu.edu.tw | 15980458<br />
NetPhos | Sequence and structure-based prediction of eukaryotic protein phosphorylation sites | http://www.cbs.dtu.dk/services/NetPhos/ | 10600390<br />
NetPhosK | Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence | http://www.cbs.dtu.dk/services/NetPhosK/ | 15174133<br />
PredPhospho | Prediction of phosphorylation sites using support vector machine | http://pred.ngri.re.kr/PredPhospho.htm<br />
PREDIKIN | A prediction of substrates for serine/threonine protein kinases based on the primary sequence of a protein kinase catalytic domain | http://florey.biosci.uq.edu.au/kinsub/home.htm<br />
Prosite | A prediction of substrates for protein kinases-based conserved motif search | http://kr.expasy.org/prosite<br />
Scansite | Prediction of PK-specific phosphorylation site with Bayesian decision theory | http://scansite.mit.edu<br />
Phospho.ELM | A database of experimentally verified phosphorylation sites in eukaryotic proteins | http://phospho.elm.eu.org/<br />
Human protein reference database (HPRD) | A database of known kinase/phosphatase substrate as well as binding motifs that are curated from the published literature | http://www.hprd.org/PhosphoMotif_finder<br />
PhosphoSite | A bioinformatics resource dedicated to physiological protein phosphorylation | http://www.phosphosite.org/Login.jsp<br />
Glycosylation<br />
NetOGlyc 2.0 | Predicts O-glycosylation sites in mucin-type proteins | http://www.cbs.dtu.dk/services/NetOGlyc/<br />
DictyOGlyc 1.1 | Predicts O-GlcNAc sites in eukaryotic proteins | http://www.cbs.dtu.dk/services/DictyOGlyc/<br />
YinOYang 1.2 | Predicts O-GlcNAc sites in eukaryotic proteins | http://www.cbs.dtu.dk/services/YinOYang/<br />
NetNGlyc 1.0 | Predicting N-glycosylation sites | http://www.cbs.dtu.dk/services/NetNGlyc/<br />
GlycoMod | Web software for prediction of the possible oligosaccharide structures in glycoproteins from their experimentally determined masses | http://www.expasy.ch/tools/glycomod/<br />
PMID column for the twelve rows above, in printed order (row alignment not recoverable): 15231530, 16445868, 17237102, 16549034, 15212693, 15174125, 9557871, 10521537, 16316981, 11680880<br />
Glyco-fragment | A web tool to support the interpretation of mass spectra of complex carbohydrates | http://www.dkfz.de/spec/projekte/fragments/ | 14625865<br />
GlycoSearchMS | Compares each peak of a measured mass spectrum with the calculated fragments of all structures contained in the SweetDB | http://www.dkfz.de/spec/glycosciences.de/sweetdb/ms/ | 15215392<br />
GlycosidIQ | Based on the matching of experimental MS2 data with the theoretical fragmentation of glycan structures in GlycoSuiteDB | https://tmat.proteomesystems.com/glyco/glycosuite/glycodb | 15174134<br />
Saccharide topology analysis tool | A web-based computational program that can quickly extract sequence information from a set of MSn spectra for an oligosaccharide of up to 10 residues | | 10857602<br />
GlycoX | To determine simultaneously the glycosylation sites and oligosaccharide heterogeneity of glycoproteins using MATLAB | | 17022651<br />
MODi | A web server for identifying multiple post-translational peptide modifications from tandem mass spectra | http://www.unimod.org | 16845006<br />
SWEET-DB | An attempt to create annotated data collections for carbohydrates | http://www.dkfz.de/spec2/sweetdb/ | 11752350<br />
Protein–protein interaction<br />
Munich information center for protein sequence’s MPPI | The database of mammalian protein–protein interactions | http://mips.gsf.de | 16381839<br />
Overview and Introduction to Clinical Proteomics 19<br />
Database of<br />
interacting proteins<br />
Molecular<br />
interaction network<br />
database<br />
Protein–protein<br />
interactions of<br />
cancer proteins<br />
IntAct<br />
Biomolecular<br />
interaction network<br />
database<br />
A database that documents<br />
experimentally determined<br />
protein–protein interactions<br />
A database of storing, in<br />
a structured format,<br />
information about<br />
molecular interactions by<br />
extracting experimental<br />
details from work<br />
published in peer-reviewed<br />
journals<br />
Predicts interactions, which<br />
are derived from homology<br />
with experimentally known<br />
protein–protein interactions<br />
from various species<br />
IntAct provides a freely<br />
available, open source<br />
database system and<br />
analysis tools for protein<br />
interaction data<br />
A database designed to<br />
store full descriptions of<br />
interactions, molecular<br />
complexes and pathways<br />
http://dip.doembi.ecla.edu/<br />
http://mint.bio.<br />
uniroma2.it/mint<br />
http://bmm.<br />
cancerresearchuk.<br />
org/˜pip<br />
http://www.ebi.<br />
ac.uk/intact/<br />
Metabolic and<br />
signal pathway<br />
BioCarta A pathway database http://www.<br />
biocarta.com<br />
KEGG<br />
Cancer cell map<br />
HPRD<br />
A pathway database with<br />
genomical, chemical, and<br />
biological network<br />
information<br />
The cancer cell map is a<br />
selected set of human<br />
cancer focused pathways<br />
A database with<br />
data pertaining<br />
to post-translational<br />
modifications,<br />
protein–protein<br />
interactions, tissue<br />
expression,<br />
11752321<br />
17135203<br />
16398927<br />
17145710<br />
http://www.bind.ca 12519993<br />
http://www.<br />
genome.jp/kegg<br />
http://cancer.<br />
cellmap.org/cellmap/<br />
http://www.<br />
hprd.org/<br />
16381885<br />
(Continued)
20 Paik et al.<br />
Table 3<br />
(Continued)<br />
Name Description Website URL PMID<br />
subcellular localization,<br />
and enzyme–substrate<br />
relationships<br />
Proteomic data resource<br />
The cancer cell A database of clinical data<br />
map<br />
from SELDI-TOF<br />
Proteomics<br />
identifications<br />
database<br />
PeptideAtlas<br />
Disease resource<br />
Online<br />
mendelian<br />
inheritance in<br />
man<br />
GeneCards<br />
Cancer gene<br />
census<br />
A database of protein and<br />
peptide identifications that<br />
have been described in the<br />
scientific literature<br />
A multiorganism, publicly<br />
accessible compendium of<br />
peptides identified in a<br />
large set of tandem mass<br />
spectrometry proteomics<br />
experiments<br />
A database of human genes<br />
and genetic disorders<br />
An integrated database of<br />
human genes that includes<br />
automatically mined<br />
genomic, proteomic, and<br />
transcriptomic information<br />
A catalogue those genes for<br />
which mutations have been<br />
causally implicated in cancer<br />
http://home.ccr.<br />
cancer.gov/ncifda<br />
proteomics/<br />
ppatterns.asp<br />
http://www.ebi.<br />
ac.uk/pride/<br />
http://www.<br />
peptideatlas.org<br />
http://www.ncbi.nlm.<br />
nih.gov/entrez/query.<br />
fcgidb = OMIM<br />
http://www.genecards.<br />
org/index.shtml<br />
http://www.sanger.<br />
ac.uk/genetics/CGP/<br />
Census/<br />
16381953<br />
16381952<br />
17170002<br />
15608261<br />
14993899<br />
Two-dimensional electrophoresis (2DE) is perhaps the most popular entry-level tool for proteome analysis. In clinical proteomics, 2DE has been the traditional workhorse for analyzing clinical specimens ranging from plasma to urine (Table 1). Quantification problems in 2DE are now largely addressed by fluorescent dyes (Cy3 and Cy5), which allow normalization of data obtained from two different clinical specimens (79). Friedman and Lilley (Chapter 6) present general optimization conditions for differential in-gel electrophoresis (DIGE) in the quantitative analysis of clinical samples and address the usefulness of the differential labeling dyes (Cy2, Cy3, and Cy5). The essence of any DIGE system is to minimize potential human error in the identification and quantification of proteins spotted in a 2D gel (79). The difficulties of 2D map analysis are introduced by Marengo et al. (Chapter 16), who describe methods for comparing protein spots using image analysis technology and related informatics tools to minimize variation between measurements of spot volume, a key to successful 2D map construction.
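The normalization idea behind DIGE can be illustrated with a short calculation. The following is a minimal sketch in Python, not the procedure of Chapter 6. It assumes that each gel carries a pooled internal standard labeled with Cy2 alongside a Cy3- or Cy5-labeled sample, and that spot volumes have already been extracted by image analysis. The function names and the spot volumes are hypothetical.

```python
from math import log2


def standardized_abundance(sample_volume, standard_volume):
    """Spot volume of a sample relative to the Cy2 internal standard on the same gel."""
    if sample_volume <= 0 or standard_volume <= 0:
        raise ValueError("spot volumes must be positive")
    return sample_volume / standard_volume


def log2_fold_change(case, control):
    """Compare one spot between two specimens, possibly run on different gels.

    Each argument is a (sample_volume, standard_volume) pair. Dividing by the
    standard on each gel removes gel-to-gel differences before the comparison.
    """
    return log2(standardized_abundance(*case) / standardized_abundance(*control))


# Hypothetical volumes for the same spot on two gels.
gel_a = (1800.0, 900.0)   # tumor sample (Cy5), pooled standard (Cy2)
gel_b = (500.0, 1000.0)   # normal sample (Cy3), pooled standard (Cy2)
change = log2_fold_change(gel_a, gel_b)  # log2((1800/900) / (500/1000)) = 2.0
```

Working on the log2 scale makes a doubling and a halving symmetric (+1 and -1), which is convenient when spots from many gels are compared.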
There are many variations of LC in protein profiling, including mass detection methods, column types, data mining through search engines, mass accuracy, and running conditions (80,81,82). These all bear on the quantification of proteins or peptides in the sample, one of the major bottlenecks in proteomics (83,84,85,86,87). Quantification techniques include isotope-coded affinity tags (ICAT), mass-coded abundance tagging, and nonisotope-labeled methods. Xiao and Veenstra (Chapter 10) present the application of ICAT to the study of proteins regulated by a COX-2 inhibitor in a colon cancer cell line. With emphasis on sample preparation, they provide details on ICAT procedures for quantitative proteomics (88). In addition, Li et al. (Chapter 11) employ a strategy that combines LCM for sample preparation of HCC with cleavable isotope-coded affinity tags in order to identify markers quantitatively. It should be mentioned, however, that further measures are needed to increase the efficiency of ICAT, since sample recovery during or after the labeling steps is limited (87). A label-free serum quantification method has also been introduced recently (48) (see Chapter 12 by Higgs et al.).
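Isotope-coded quantification ultimately reduces to comparing the signals of the heavy- and light-labeled forms of each peptide. The sketch below, in Python, is an illustration under stated assumptions rather than the ICAT workflow of Chapter 10 or 11. It assumes that paired light and heavy peptide intensities have already been extracted and assigned to proteins, and it summarizes each protein by the median heavy/light ratio. The function name, the input layout, and the intensity values are hypothetical.

```python
from statistics import median


def protein_ratios(peptide_pairs):
    """Summarize heavy/light peptide ratios per protein.

    peptide_pairs: iterable of (protein_id, light_intensity, heavy_intensity).
    Pairs with a missing (non-positive) signal are skipped, so a protein with
    no usable pair does not appear in the result.
    """
    ratios_by_protein = {}
    for protein_id, light, heavy in peptide_pairs:
        if light <= 0 or heavy <= 0:
            continue  # one of the two labeled forms was not detected
        ratios_by_protein.setdefault(protein_id, []).append(heavy / light)
    # The median is less sensitive to a single outlying peptide than the mean.
    return {pid: median(vals) for pid, vals in ratios_by_protein.items()}


# Hypothetical peptide measurements.
pairs = [
    ("P1", 100.0, 200.0),
    ("P1", 50.0, 150.0),
    ("P1", 10.0, 10.0),
    ("P2", 0.0, 5.0),     # light form missing: skipped
    ("P3", 40.0, 10.0),
]
ratios = protein_ratios(pairs)
```

A ratio above 1 indicates more of the protein in the heavy-labeled sample. Which sample receives the heavy label is an experimental design choice that the sketch does not encode.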
The use of antibody arrays in clinical proteomics has increased recently in the context of high-throughput detection of cancer specimens where the identities of the proteins of interest are known (89,90). Evaluating antibody cross-reactivity and specificity is crucial in these assays. This matter is addressed by Sanchez-Carbayo (Chapter 15), who describes the technical aspects and application of planar antibody arrays in the quantification of serum proteins, and by Hsu et al. (Chapter 14), who describe the development and use of bead-based miniaturized multiplexed sandwich immunoassays for focused protein profiling in various body fluids. The latter method, which uses bead-based protein arrays (suspension microarrays), allows the simultaneous analysis of a variety of parameters within a single experiment. Given the versatility of suspension microarrays in analyzing proteins of interest in body fluids ranging from serum to synovial fluid, the multiplexed protein profiling technology described by Hsu et al. (Chapter 14) seems to hold great promise in clinical proteomics. Similarly, in combination with tissue microarray technology (91), it would also be possible to perform parallel molecular profiling of clinical samples together with immunohistochemistry, fluorescence in situ hybridization, or RNA in situ hybridization. SELDI is another avenue for high-throughput profiling of clinical samples in the course of disease marker discovery [(92,93), Chapter 7]. Profiling approaches such as SELDI-MS are expected to be used frequently in disease marker discovery, but only if the identification technologies coupled with SELDI are improved.
During the course of biomarker discovery, large data sets are usually generated and deposited in a coordinated fashion (Tables 2 and 3) (94,95). Statistical analysis of 2DE proteomics data, which typically comprise several hundred protein spots, is complex. To circumvent inconsistencies in 2D gel proteomics data, Friedman and Lilley (Chapter 6) and Carpentier et al. (Chapter 17) point out available statistical tools and suggest case-specific guidelines for 2D gel spot analysis. Fitzgibbon et al. (Chapter 19) describe an open source platform for LC-MS spectra in which the msInspect program is used to lower false positives and guide normalization of the dataset. They also demonstrate that msInspect can analyze data from quantitative studies with and without isotopic labels. Paliakasis et al. (Chapter 18) introduce web-based tools for protein classification, which lead to prediction of potential protein function and family clustering of related proteins, and provide guidelines for classifying protein data into more meaningful families. Finally, Somorjai (Chapter 20) addresses important filtering criteria for applying protein pattern recognition to biomarker discovery using statistical tools.
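When several hundred spots are tested at once, some will appear significant by chance. One widely used way to control this is the Benjamini-Hochberg false discovery rate procedure, sketched below in Python. It is shown only as an example of the kind of statistical tool involved. The chapters cited above are not claimed to use it, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests declared significant at false discovery rate alpha.

    The p-values are ranked in ascending order; the largest rank k with
    p_(k) <= alpha * k / m is found, and the k smallest p-values are accepted.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that satisfies the bound
    return sorted(order[:cutoff_rank])


# Hypothetical per-spot p-values from a two-group comparison.
spot_p_values = [0.001, 0.045, 0.025, 0.2, 0.008]
significant_spots = benjamini_hochberg(spot_p_values, alpha=0.05)
```

In this example the spot with p = 0.045 is not selected even though it is below 0.05, because its rank-adjusted threshold is 0.04. This is the intended effect of correcting for the number of spots tested.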
5. Concluding Remarks<br />
Although there are several bottlenecks in clinical proteomics (such as the lack of standardization in specimen processing, quantification, and the overall strategy for handling biomarkers after identification), we believe that the field holds great promise for biomarker discovery. The success of clinical proteomics depends on the availability and selection of well-phenotyped specimens, reduction of sample complexity, development of good informatics tools, and efficient data management. Sample handling techniques, including microdissection for tissue samples, multidimensional fractionation for body fluids, and pretreatment of other clinical specimens (e.g., urine, tears, and cells), should therefore be developed in this context. Since there is no gold standard for sample collection and handling, one needs to find the best available options for processing samples without damage. In addition, establishing a biorepository system would systematically minimize artifacts and variation between samples during or after identification of biomarkers.
It is now generally accepted that an ensemble (or panel) of different proteins would be more efficient than a single protein or peptide in the diagnosis of disease, an idea that is poised to replace the conventional concept of a biomarker. As a high-throughput means of protein profiling, antibody arrays have recently seen increased use in clinical proteomics for the detection of cancer specimens. However, when antibody arrays are used to profile serum autoantibodies, issues of cross-reactivity and specificity have to be resolved. Although not covered here due to space limitations, proteomics techniques also make it possible to analyze networks of protein–protein interactions as well as post-translational modifications of the proteins involved in a specific disease (Table 3). It is now highly recommended that common reagents such as antibodies and standard proteins, which are very useful for spiking, quantification, and normalization of sensitivity from one instrument to another, be used in worldwide efforts such as the Human Proteome Organization (HUPO) Plasma Proteome Project (96,97). Finally, clinical proteomics needs the integration of biochemistry, pathology, analytical technology, bioinformatics, and proteome informatics to develop highly sensitive diagnostic tools for routine clinical care in the future (71,98).
Acknowledgments<br />
This study was supported by a grant from the Korea Health 21 R&D project,<br />
Ministry of Health & Welfare, Republic of Korea (A030003 to YKP).<br />
References<br />
1. Etzioni, R., Urban, N., Ramsey, S., McIntosh, M., Schwartz, S., Reid, B., Radich, J.,<br />
Anderson, G., and Hartwell, L. (2003) The case for early detection. Nat. Rev.<br />
Cancer 3, 1–10.<br />
2. Ludwig, J. A. and Weinstein, J. N. (2005) Biomarkers in cancer staging, prognosis<br />
and treatment selection. Nat. Rev. Cancer 5, 845–856.<br />
3. Xiao, Z., Prieto, D., Conrads, T. P., Veenstra, T. D., and Issaq, H. J. (2005)<br />
Proteomic patterns: their potential for disease diagnosis. Mol. Cell Endocrinol.<br />
230, 95–106.<br />
4. Rifai, N., Gillette, M. A., and Carr, S. A. (2006) Protein biomarker discovery<br />
and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24,<br />
971–983.<br />
5. Emmert-Buck, M. R., Bonner, R. F., Smith, P. D., Chuaqui, R. F., Zhuang, Z.,<br />
Goldstein, S. R., Weiss, R. A., and Liotta, L. A. (1996) Laser capture microdissection.<br />
Science 274, 998–1001.<br />
6. Gillespie, J. W., Ahram, M., Best, C. J., Swalwell, J. I., Krizman, D. B.,<br />
Petricoin, E. F., Liotta, L. A., and Emmert-Buck, M. R. (2001) The role of tissue<br />
microdissection in cancer research. Cancer J. 7, 32–39.
7. Craven, R. A. and Banks, R. E. (2002) Use of laser capture microdissection to<br />
selectively obtain distinct populations of cells for proteomic analysis. Methods<br />
Enzymol. 356, 33–49.<br />
8. Vincourt, J. B., Lionneton, F., Kratassiouk, G., Guillemin, F., Netter, P.,<br />
Mainard, D., and Magdalou, J. (2006) Establishment of a reliable method for direct<br />
proteome characterization of human articular cartilage. Mol. Cell Proteomics 5,<br />
1984–1995.<br />
9. Platt, M. S., Agamanolis, D. P., Krill, C. E. Jr., Boeckman, C., Potter, J. L.,<br />
Robinson, H., and Lloyd, J. (1983) Occult hepatic sinusoid tumor of infancy<br />
simulating neuroblastoma. Cancer 52, 1183–1189.<br />
10. Mahadevia, P. J., Fleisher, L. A., Frick, K. D., Eng, J., Goodman, S. N., and<br />
Powe, N. R. (2003) Lung cancer screening with helical computed tomography<br />
in older adult smokers: a decision and cost-effectiveness analysis. JAMA 289,<br />
313–322.<br />
11. Hood, B. L., Darfler, M. M., Guiel, T. G., Furusato, B., Lucas, D. A.,<br />
Ringeisen, B. R., Sesterhenn, I. A., Conrads, T. P., Veenstra, T. D., and Krizman,<br />
D. B. (2005) Proteomic analysis of formalin-fixed prostate cancer tissue. Mol. Cell<br />
Proteomics 4, 1741–1753.<br />
12. Alaiya, A., Al-Mohanna, M., and Linder, S. (2005) Clinical cancer proteomics:<br />
promises and pitfalls. J. Proteome Res. 4, 1213–1222.<br />
13. Gericke, B., Raila, J., Sehouli, J., Haebel, S., Konsgen, D., Mustea, A., and<br />
Schweigert, F. J. (2005) Microheterogeneity of transthyretin in serum and ascitic<br />
fluid of ovarian cancer patients. BMC Cancer 17, 133–141.<br />
14. Swisher, E. M., Wollan, M., Mahtani, S. M., Willner, J. B., Garcia, R., Goff, B. A.,<br />
and King, M. C. (2005) Tumor-specific p53 sequences in blood and peritoneal fluid<br />
of women with epithelial ovarian cancer. Am. J. Obstet. Gynecol. 193, 662–667.<br />
15. Pisitkun, T., Johnstone, R., and Knepper, M. A. (2006) Discovery of urinary<br />
biomarkers. Mol. Cell Proteomics 5, 1760–1771.<br />
16. Ghafouri, B., Irander, K., Lindbom, J., Tagesson, C., and Lindahl, M. (2006)<br />
Comparative proteomics of nasal fluid in seasonal allergic rhinitis. J. Proteome<br />
Res. 5, 330–338.<br />
17. Koo, B. S., Lee, D. Y., Ha, H. S., Kim, J. C., and Kim, C. W. (2005) Comparative<br />
analysis of the tear protein expression in blepharitis patients using two-dimensional<br />
electrophoresis. J. Proteome Res. 4, 719–724.<br />
18. Grus, F. H., Podust, V. N., Bruns, K., Lackner, K., Fu, S., Dalmasso, E. A.,<br />
Wirthlin, A., and Pfeiffer, N. (2005) SELDI-TOF-MS ProteinChip array profiling<br />
of tears from patients with dry eye. Invest. Ophthalmol. Vis. Sci. 46, 863–876.<br />
19. Amado, F. M., Vitorino, R. M., Domingues, P. M., Lobo, M. J., and Duarte, J. A.<br />
(2005) Analysis of the human saliva proteome. Expert Rev. Proteomics 2, 521–539.<br />
20. Wang, T. H., Chang, Y. L., Peng, H. H., Wang, S. T., Lu, H. W., Teng, S. H.,<br />
Chang, S. D., and Wang, H. S. (2005) Rapid detection of fetal aneuploidy using<br />
proteomics approaches on amniotic fluid supernatant. Prenat. Diagn. 25, 559–566.<br />
21. Ruetschi, U., Rosen, A., Karlsson, G., Zetterberg, H., Rymo, L., Hagberg,<br />
H., and Jacobsson, B. (2005) Proteomic analysis using protein chips to detect
biomarkers in cervical and amniotic fluid in women with intra-amniotic inflammation.<br />
J. Proteome Res. 4, 2236–2242.<br />
22. Kim, Y. S., Kim, M. S., Lee, S. H., Choi, B. C., Lim, J. M., Cha, K. Y., and<br />
Baek, K. H. (2006) Proteomic analysis of recurrent spontaneous abortion: identification<br />
of an inadequately expressed set of proteins in human follicular fluid.<br />
Proteomics 6, 3445–3454.<br />
23. Pilch, B. and Mann, M. (2006) Large-scale and high-confidence proteomic analysis<br />
of human seminal plasma. Genome Biol. 7, R40.<br />
24. Varnum, S. M., Covington, C. C., Woodbury, R. L., Petritis, K., Kangas, L. J.,<br />
Abdullah, M. S., Pounds, J. G., Smith, R. D., and Zangar, R. C. (2003) Proteomic<br />
characterization of nipple aspirate fluid: identification of potential biomarkers of<br />
breast cancer. Breast Cancer Res. Treat. 80, 87–97.<br />
25. Zheng, P. P., Luider, T. M., Pieters, R., Avezaat, C. J., van den Bent, M. J., Sillevis<br />
Smitt, P. A., and Kros, J. M. (2003) Identification of tumor-related proteins by<br />
proteomic analysis of cerebrospinal fluid from patients with primary brain tumors.<br />
J. Neuropathol. Exp. Neurol. 62, 855–862.<br />
26. Gibson, D. S., Blelock, S., Brockbank, S., Curry, J., Healy, A., McAllister, C.,<br />
and Rooney, M. E. (2006) Proteomic analysis of recurrent joint inflammation in<br />
juvenile idiopathic arthritis. J. Proteome Res. 5, 1988–1995.<br />
27. Merkel, D., Rist, W., Seither, P., Weith, A., and Lenter, M. C. (2005)<br />
Proteomic study of human bronchoalveolar lavage fluids from smokers with<br />
chronic obstructive pulmonary disease by combining surface-enhanced laser<br />
desorption/ionization-mass spectrometry profiling with mass spectrometric protein<br />
identification. Proteomics 5, 2972–2980.<br />
28. Wu, J., Kobayashi, M., Sousa, E. A., Liu, W., Cai, J., Goldman, S. J., Dorner, A. J.,<br />
Projan, S. J., Kavuru, M. S., Qiu, Y., and Thomassen, M. J. (2005) Differential<br />
proteomic analysis of bronchoalveolar lavage fluid in asthmatics following<br />
segmental antigen challenge. Mol. Cell Proteomics 4, 1251–1264.<br />
29. Tyan, Y. C., Wu, H. Y., Lai, W. W., Su, W. C., and Liao, P. C. (2005) Proteomic<br />
profiling of human pleural effusion using two-dimensional nano liquid chromatography<br />
tandem mass spectrometry. J. Proteome Res. 4, 1274–1286.<br />
30. Khalil, A. A. and James, P. (2007) Biomarker discovery: a proteomic approach for<br />
brain cancer profiling. Cancer Sci. 98, 201–213.<br />
31. Khodavirdi, A. C., Song, Z., Yang, S., Zhong, C., Wang, S., Wu, H., Pritchard, C.,<br />
Nelson, P. S., and Roy-Burman, P. (2006) Increased expression of osteopontin<br />
contributes to the progression of prostate cancer. Cancer Res. 66, 883–888.<br />
32. Vincourt, J. B., Lionneton, F., Kratassiouk, G., Guillemin, F., Netter, P., Mainard, D.,<br />
and Magdalou, J. (2006) Establishment of a reliable method for direct proteome<br />
characterization of human articular cartilage. Mol. Cell Proteomics 5, 1984–1995.<br />
33. Lee, Y. J., Rice, R. H., and Lee, Y. M. (2006) Proteome analysis of human<br />
hair shaft: from protein identification to post-translational modification. Mol. Cell<br />
Proteomics 5, 789–800.<br />
34. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K.,<br />
Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and
Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human<br />
plasma and construction of a two-dimensional map. Proteomics 5, 3386–3396.<br />
35. Lathrop, J. T., Hayes, T. K., Carrick, K., and Hammond, D. J. (2005) Rarity gives<br />
a charm: evaluation of trace proteins in plasma and serum. Expert Rev. Proteomics<br />
2, 393–406.<br />
36. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery<br />
from the plasma proteome using multidimensional fractionation proteomics. Curr.<br />
Opin. Chem. Biol. 10, 42–49.<br />
37. Anderson, N. L. and Anderson, N. G. (2002) The human plasma proteome: history,<br />
character, and diagnostic prospects. Mol. Cell Proteomics 1, 845–867.<br />
38. Hu, S., Loo, J. A., and Wong, D. T. (2006) Human body fluid proteome analysis.<br />
Proteomics 6, 6326–6353.<br />
39. Park, M. R., Wang, E. H., Jin, D. C., Cha, J. H., Lee, K. H., Yang, C. W.,<br />
Kang, C. S., and Choi, Y. J. (2006) Establishment of a 2-D human urinary proteomic<br />
map in IgA nephropathy. Proteomics 6, 1066–1076.<br />
40. Tammen, H., Schulte, I., Hess, R., Menzel, C., Kellmann, M., and<br />
Schulz-Knappe, P. (2005) Prerequisites for peptidomic analysis of blood samples: I.<br />
Evaluation of blood specimen qualities and determination of technical performance<br />
characteristics. Comb. Chem. High Throughput Screen. 8, 725–733.<br />
41. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D.,<br />
Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P.,<br />
Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W.<br />
(2005) HUPO plasma proteome project specimen collection and handling: towards<br />
the standardization of parameters for plasma proteome samples. Proteomics 5,<br />
3262–3277.<br />
42. Zhou, M., Lucas, D. A., Chan, K. C., Issaq, H. J., Petricoin, E. F. 3rd, Liotta, L. A.,<br />
Veenstra, T. D., and Conrads, T. P. (2004) An investigation into the human serum<br />
“interactome”. Electrophoresis 25, 1289–1298.<br />
43. Findeisen, P., Sismanidis, D., Riedl, M., Costina, V., and Neumaier, M. (2005)<br />
Preanalytical impact of sample handling on proteome profiling experiments with<br />
matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clin.<br />
Chem. 51, 2409–2411.<br />
44. Park, K. S., Kim, H., Kim, N. G., Cho, S. Y., Choi, K. H., Seong, J. K., and Paik,<br />
Y. K. (2002) Proteomic analysis and molecular characterization of tissue ferritin<br />
light chain in hepatocellular carcinoma. Hepatology 35, 1459–1466.<br />
45. Park, K. S., Cho, S. Y., Kim, H., and Paik, Y. K. (2002) Proteomic alterations of the<br />
variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular<br />
carcinoma. Int. J. Cancer 97, 261–265.<br />
46. Marko-Varga, G., Berglund, M., Malmstrom, J., Lindberg, H., and Fehniger, T. E.<br />
(2003) Targeting hepatocytes from liver tissue by laser capture microdissection<br />
and proteomics expression profiling. Electrophoresis 24, 3800–3805.<br />
47. Paradis, V., Degos, F., Dargere, D., Pham, N., Belghiti, J., Degott, C., Janeau,<br />
J. L., Bezeaud, A., Delforge, D., Cubizolles, M., Laurendeau, I., and Bedossa, P.<br />
(2005) Identification of a new biomarker of hepatocellular carcinoma by serum<br />
protein profiling of patients with chronic liver diseases. Hepatology 41, 40–47.
48. Ru, Q. C., Zhu, L. A., Silberman, J., and Shriver, C. D. (2006) Label-free semiquantitative<br />
peptide feature profiling of human breast cancer and breast disease sera via<br />
two-dimensional liquid chromatography–mass spectrometry. Mol. Cell Proteomics<br />
5, 1095–1104.<br />
49. Azad, N. S., Rasool, N., Annunziata, C. M., Minasian, L., Whiteley, G., and<br />
Kohn, E. C. (2006) Proteomics in clinical trials and practice: present uses and<br />
future promise. Mol. Cell Proteomics 5, 1819–1829.<br />
50. Gunter, E. W. (1997) Biological and environmental specimen banking at the<br />
Centers for Disease Control and Prevention. Chemosphere 34, 1945–1953.<br />
51. Strauss, G. H. and Kelly, S. J. (1990) The development of the U.S. EPA health<br />
effects research laboratory frozen blood cell repository program. Mutat. Res. 234,<br />
349–354.<br />
52. Romeo, M. J., Espina, V., Lowenthal, M., Espina, B. H., Petricoin, E. F. 3rd, and<br />
Liotta, L. A. (2005) CSF proteome: a protein repository for potential biomarker<br />
identification. Expert Rev. Proteomics 2, 57–70.<br />
53. Conrads, T. P., Hood, B. L., Petricoin, E. F. 3rd, Liotta, L. A., and Veenstra, T. D.<br />
(2005) Cancer proteomics: many technologies, one goal. Expert Rev. Proteomics<br />
2, 693–703.<br />
54. Schrader, M. and Selle, H. (2006) The process chain for peptidomic biomarker<br />
discovery. Dis. Markers 22, 27–37.<br />
55. Danna, E. A. and Nolan, G. P. (2006) Transcending the biomarker mindset:<br />
deciphering disease mechanisms at the single cell level. Curr. Opin. Chem. Biol.<br />
10, 20–27.<br />
56. De Masi, S., Tosti, M. E., and Mele, A. (2005) Screening for hepatocellular<br />
carcinoma. Dig. Liver Dis. 37, 260–268.<br />
57. Yamaguchi, K., Nagano, M., Torada, N., Hamasaki, N., Kawakita, M., and<br />
Tanaka, M. (2004) Urine diacetylspermine as a novel tumor marker for pancreatobiliary<br />
carcinomas. Rinsho. Byori. 52, 336–339.<br />
58. Dabrowska, M., Grubek-Jaworska, H., Domagala-Kulawik, J., Bartoszewicz, Z.,<br />
Kondracka, A., Krenke, R., Nejman, P., and Chazan, R. (2004) Diagnostic usefulness<br />
of selected tumor markers (CA125, CEA, CYFRA 21–1) in bronchoalveolar lavage<br />
fluid in patients with non-small cell lung cancer. Pol. Arch. Med. Wewn 111, 659–665.<br />
59. Gann, P. H., Hennekens, C. H., and Stampfer, M. J. (1995) A prospective evaluation<br />
of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 273,<br />
289–294.<br />
60. Ciambellotti, E., Coda, C., and Lanza, E. (1993) Determination of CA 15–3 in the<br />
control of primary and metastatic breast carcinoma. Minerva Med. 84, 107–112.<br />
61. Linkov, F., Lisovich, A., Yurkovetsky, Z., Marrangoni, A., Velikokhatnaya, L.,<br />
Nolen, B., Winans, M., Bigbee, W., Siegfried, J., Lokshin, A., and Ferris, R. L.<br />
(2007) Early detection of head and neck cancer: development of a novel screening<br />
tool using multiplexed immunobead-based biomarker profiling. Cancer Epidemiol.<br />
Biomarkers Prev. 16, 102–107.<br />
62. Casiano, C. A., Mediavilla-Varela, M., and Tan, E. M. (2006) Tumor-associated<br />
antigen arrays for the serological diagnosis of cancer. Mol. Cell Proteomics 5,<br />
1745–1759.
63. Nissom, P. M., Lo, S. L., Lo, J. C., Ong, P. F., Lim, J. W., Ou, K., Liang, R. C.,<br />
Seow, T. K., and Chung, M. C. (2006) Hcc-2, a novel mammalian ER thioredoxin<br />
that is differentially expressed in hepatocellular carcinoma. FEBS Lett. 580, 2216–<br />
2226.<br />
64. Feng, J. T., Liu, Y. K., Song, H. Y., Dai, Z., Qin, L. X., Almofti, M. R., Fang, C. Y.,<br />
Lu, H. J., Yang, P. Y., and Tang, Z. Y. (2005) Heat-shock protein 27: a potential<br />
biomarker for hepatocellular carcinoma identified by serum proteome analysis.<br />
Proteomics 5, 4581–4588.<br />
65. Li, D. Q., Wang, L., Fei, F., Hou, Y. F., Luo, J. M., Wei-Chen, Zeng, R.,<br />
Wu, J., Lu, J. S., Di, G. H., Ou, Z. L., Xia, Q. C., Shen, Z. Z., and<br />
Shao, Z. M. (2006) Identification of breast cancer metastasis-associated proteins<br />
in an isogenic tumor metastasis model using two-dimensional gel electrophoresis<br />
and liquid chromatography-ion trap-mass spectrometry. Proteomics 6,<br />
3352–3368.<br />
66. Lee, I. N., Chen, C. H., Sheu, J. C., Lee, H. S., Huang, G. T., Yu, C. Y.,<br />
Lu, F. J., and Chow, L. P. (2005) Identification of human hepatocellular carcinoma-related<br />
biomarkers by two-dimensional difference gel electrophoresis and mass<br />
spectrometry. J. Proteome Res. 4, 2062–2069.<br />
67. Righetti, P. G., Castagna, A., Antonucci, F., Piubelli, C., Cecconi, D.,<br />
Campostrini, N., Rustichelli, C., Antonioli, P., Zanusso, G., Monaco, S., Lomas, L.,<br />
and Boschetti, E. (2005) Proteome analysis in the clinical chemistry laboratory:<br />
myth or reality? Clin. Chim. Acta 357, 123–139.<br />
68. Jang, J. S., Cho, H. Y., Lee, Y. J., Ha, W. S., and Kim, H. W. (2004) The<br />
differential proteome profile of stomach cancer: identification of the biomarker<br />
candidates. Oncol. Res. 14, 491–499.<br />
69. Steel, L. F., Shumpert, D., Trotter, M., Seeholzer, S. H., Evans, A. A., London,<br />
W. T., Dwek, R., and Block, T. M. (2003) A strategy for the comparative analysis<br />
of serum proteomes for the discovery of biomarkers for hepatocellular carcinoma.<br />
Proteomics 3, 601–609.<br />
70. Yip, T. T., Chan, J. W., Cho, W. C., Yip, T. T., Wang, Z., Kwan, T. L., Law, S. C.,<br />
Tsang, D. N., Chan, J. K., Lee, K. C., Cheng, W. W., Ma, V. W., Yip, C.,<br />
Lim, C. K., Ngan, R. K., Au, J. S., Chan, A., Lim, W. W., and Ciphergen SARS<br />
Proteomics Study Group (2005) Protein chip array profiling analysis in patients<br />
with severe acute respiratory syndrome identified serum amyloid a protein as a<br />
biomarker potentially useful in monitoring the extent of pneumonia. Clin. Chem. 51,<br />
47–55.<br />
71. Anderson, L. and Hunter, C. L. (2005) Quantitative mass spectrometric multiple<br />
reaction monitoring assays for major plasma proteins. Mol. Cell Proteomics 5,<br />
573–588.<br />
72. Lee, J. W., Figeys, D., and Vasilescu, J. (2007) Biomarker assay translation from<br />
discovery to clinical studies in cancer drug development: quantification of emerging<br />
protein biomarkers. Adv. Cancer Res. 96, 269–298.<br />
73. Zolg, W. (2006) The proteomic search for diagnostic biomarkers: lost in translation?<br />
Mol. Cell Proteomics 5, 1720–1726.
74. Bensmail, H., Golek, J., Moody, M. M., Semmes, J. O., and Haoudi, A. (2005)<br />
A novel approach for clustering proteomics data using Bayesian fast Fourier<br />
transform. Bioinformatics 21, 2210–2224.<br />
75. Ward, D. G., Cheng, Y., N’Kontchou, G., Thar, T. T., Barget, N., Wei, W.,<br />
Billingham, L. J., Martin, A., Beaugrand, M., and Johnson, P. J. (2006) Changes in<br />
the serum proteome associated with the development of hepatocellular carcinoma<br />
in hepatitis C-related cirrhosis. Br. J. Cancer 94, 287–292.<br />
76. Lin, N. and Zhao, H. (2005) Are scale-free networks robust to measurement errors?<br />
BMC Bioinformatics 6, 119.<br />
77. Castagna, A., Cecconi, D., Sennels, L., Rappsilber, J., Guerrier, L., Fortis, F.,<br />
Boschetti, E., Lomas, L., and Righetti, P. G. (2005) Exploring the hidden human<br />
urinary proteome via ligand library beads. J. Proteome Res. 4, 1917–1930.<br />
78. Rauch, A., Bellew, M., Eng, J., Fitzgibbon, M., Holzman, T., Hussey, P., Igra, M.,<br />
Maclean, B., Lin, C. W., Detter, A., Fang, R., Faca, V., Gafken, P., Zhang, H.,<br />
Whiteaker, J., States, D., Hanash, S., Paulovich, A., and McIntosh, M. W. (2006)<br />
Computational proteomics analysis system (CPAS): an extensible open source<br />
analytic system for evaluating and publishing proteomic data and high throughput<br />
biological experiments. J. Proteome Res. 5, 112–121.<br />
79. Lilley, K. S. and Friedman, D. B. (2004) All about DIGE: quantification technology<br />
for differential-display 2D-gel proteomics. Expert Rev. Proteomics 1, 401–409.<br />
80. Qian, W. J., Jacobs, J. M., Liu, T., Camp, D. G. 2nd, and Smith, R. D.<br />
(2006) Advances and challenges in liquid chromatography-mass spectrometry-based<br />
proteomics profiling for clinical applications. Mol. Cell Proteomics 5,<br />
1727–1744.<br />
81. Powell, D. W., Merchant, M. L., and Link, A. J. (2006) Discovery of regulatory<br />
molecular events and biomarkers using 2D capillary chromatography and mass<br />
spectrometry. Expert Rev. Proteomics 3, 63–74.<br />
82. Andre, M., Le Caer, J. P., Greco, C., Planchon, S., El Nemer, W., Boucheix, C.,<br />
Rubinstein, E., Chamot-Rooke, J., and Le Naour, F. (2006) Proteomic analysis of<br />
the tetraspanin web using LC-ESI-MS/MS and MALDI-FTICR-MS. Proteomics<br />
6, 1437–1449.<br />
83. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H.,<br />
Goldenring, J. R., Podolsky, R. H., Lee, J. R., and Dynan, W. S. (2005) Saturation<br />
labeling with cysteine-reactive cyanine fluorescent dyes provides increased sensitivity<br />
for protein expression profiling of laser-microdissected clinical specimens.<br />
Proteomics 5, 1746–1757.<br />
84. Heck, A. J. and Krijgsveld, J. (2004) Mass spectrometry-based quantitative<br />
proteomics. Expert Rev. Proteomics 1, 317–326.<br />
85. Schneider, L. V. and Hall, M. P. (2005) Stable isotope methods for high-precision<br />
proteomics. Drug Discov. Today 10, 353–363.<br />
86. Zhang, J., Goodlett, D. R., Peskind, E. R., Quinn, J. F., Zhou, Y., Wang, Q.,<br />
Pan, C., Yi, E., Eng, J., Aebersold, R. H., and Montine, T. J. (2005) Quantitative<br />
proteomic analysis of age-related changes in human cerebrospinal fluid. Neurobiol<br />
Aging 26, 207–227.
87. Liu, T., Qian, W. J., Strittmatter, E. F., Camp, D. G. 2nd, Anderson, G. A.,<br />
Thrall, B. D., and Smith, R. D. (2004) High-throughput comparative proteome<br />
analysis using a quantitative cysteinyl-peptide enrichment technology. Anal. Chem.<br />
76, 5345–5353.<br />
88. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., Zhang, L., Xia, Q. C.,<br />
Wu, J. R., Wang, H. Y., and Zeng, R. (2004) Accurate qualitative and quantitative<br />
proteomic analysis of clinical hepatocellular carcinoma using laser capture<br />
microdissection coupled with isotope-coded affinity tag and two-dimensional liquid<br />
chromatography mass spectrometry. Mol. Cell Proteomics 3, 399–409.<br />
89. Sheehan, K. M., Calvert, V. S., Kay, E. W., Lu, Y., Fishman, D., Espina, V.,<br />
Aquino, J., Speer, R., Araujo, R., Mills, G. B., Liotta, L. A., Petricoin, E. F.<br />
3rd, and Wulfkuhle, J. D. (2005) Use of reverse phase protein microarrays and<br />
reference standard development for molecular network analysis of metastatic<br />
ovarian carcinoma. Mol. Cell Proteomics 4, 346–355.<br />
90. Knezevic, V., Leethanakul, C., Bichsel, V. E., Worth, J. M., Prabhu, V. V., Gutkind,<br />
J. S., Liotta, L. A., Munson, P. J., Petricoin, E. F. 3rd, and Krizman, D. B. (2001)<br />
Proteomic profiling of the cancer microenvironment by antibody arrays. Proteomics<br />
1, 1271–1278.<br />
91. Sharma-Oates, A., Quirke, P., Westhead, D. R. (2005) TmaDB: a repository for<br />
tissue microarray data. BMC Bioinformatics 6, 218.<br />
92. Rai, A. J., Stemmer, P. M., Zhang, Z., Adam, B. L., Morgan, W. T., Caffrey,<br />
R. E., Podust, V. N., Patel, M., Lim, L. Y., Shipulina, N. V., Chan, D. W.,<br />
Semmes, O. J., and Leung, H. C. (2005) Analysis of human proteome organization<br />
plasma proteome project (HUPO PPP) reference specimens using surface enhanced<br />
laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multi-institution<br />
correlation of spectra and identification of biomarkers. Proteomics 5,<br />
3467–3474.<br />
93. Engwegen, J. Y., Gast, M. C., Schellens, J. H., and Beijnen, J. H. (2006)<br />
Clinical proteomics: searching for better tumour markers with SELDI-TOF mass<br />
spectrometry. Trends Pharmacol. Sci. 27, 251–259.<br />
94. Domon, B. and Aebersold, R. (2006) Mass spectrometry and protein analysis.<br />
Science 312, 212–217.<br />
95. Domon, B. and Aebersold, R. (2006) Challenges and opportunities in proteomics<br />
data analysis. Mol. Cell Proteomics 5, 1921–1926.<br />
96. Uhlen, M. and Ponten, F. (2005) Antibody-based proteomics for human tissue<br />
profiling. Mol. Cell Proteomics 4, 384–393.<br />
97. Taussig, M. J., Stoevesandt, O., Borrebaeck, C. A., Bradbury, A. R., Cahill, D.,<br />
Cambillau, C., de Daruvar, A., Dubel, S., Eichler, J., Frank, R., Gibson, T. J.,<br />
Gloriam, D., Gold, L., Herberg, F. W., Hermjakob, H., Hoheisel, J. D., Joos, T. O.,<br />
Kallioniemi, O., Koegll, M., Konthur, Z., Korn, B., Kremmer, E., Krobitsch, S.,<br />
Landegren, U., van der Maarel, S., McCafferty, J., Muyldermans, S., Nygren, P. A.,<br />
Palcy, S., Pluckthun, A., Polic, B., Przybylski, M., Saviranta, P., Sawyer, A.,<br />
Sherman, D. J., Skerra, A., Templin, M., Ueffing, M., and Uhlen, M. (2007)
ProteomeBinders: planning a European resource of affinity reagents for analysis<br />
of the human proteome. Nat. Methods 4, 13–17.<br />
98. Ilyin, S. E., Belkowski, S. M., and Plata-Salaman, C. R. (2004) Biomarker<br />
discovery and validation: technologies and integrative approaches. Trends<br />
Biotechnol. 22, 411–416.
I<br />
Specimen Collection for Clinical<br />
Proteomics
2<br />
Specimen Collection and Handling<br />
Standardization of Blood Sample Collection<br />
Harald Tammen<br />
Summary<br />
Preanalytical variables can alter the analysis of blood-derived samples. Prior to the<br />
analysis of a blood sample, multiple steps are necessary to generate the desired specimen.<br />
The choice of blood specimen and its collection, handling, processing, and storage are<br />
important because each of these factors can have a tremendous impact on the results<br />
of the analysis.<br />
The awareness of clinical practices in medical laboratories and the current knowledge<br />
allow for identification of specific variables that affect the results of a proteomic study.<br />
The knowledge of preanalytical variables is a prerequisite to understand and control their<br />
impact.<br />
Key Words: blood; plasma; serum; proteomics; specimen; preanalytical variables.<br />
1. Introduction<br />
Proteomic analysis of blood specimens by semi-quantitative multiplex<br />
techniques offers a valuable approach for discovery of disease- or therapy-related<br />
biomarkers (1,2). Based on reproducible separation of proteins by their<br />
physical–chemical properties in combination with semi-quantitative detection<br />
methods and bioinformatic data analysis, proteomics allows for sensitive<br />
measurement of proteins in blood specimens (3). Blood can be regarded as<br />
a complex liquid tissue that comprises cells and extracellular fluid (4). The<br />
choice of a suitable specimen-collection protocol is crucial to minimize artificial<br />
processes (e.g., cell lysis, proteolysis) occurring during specimen collection and<br />
preparation (5). Preanalytic procedures can alter the analysis of blood-derived<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
samples. These procedures comprise the processes prior to actual analysis of<br />
the sample and include steps needed to obtain the primary sample (e.g., blood)<br />
and the analytical specimen (e.g., plasma, serum, cells). Legal or ethical issues<br />
(e.g., importance of informed consents) or potential risks of phlebotomy (e.g.,<br />
bleeding) are not covered in this article.<br />
1.1. Collection of Blood Samples<br />
It has been reported that the most frequent faults in the preanalytical phase<br />
are the result of erroneous procedures of sample collection (e.g., drawing blood<br />
from an infusion line, resulting in sample dilution) (6). The design of blood<br />
collection devices may aid in correct sampling: evacuated containers draw<br />
an accurate quantity of blood, ensuring the correct concentration of<br />
additives or the correct dilution of the blood, such as in the case of citrated<br />
plasma. The speed of blood draw is also controlled, which restricts mechanical<br />
stress. The favored site of collection is the median cubital vein, which is<br />
generally easily found and accessed. As such, it will be most comfortable to<br />
the patient, and should not evoke additional stress. Preparation of the collection<br />
site includes proper cleaning of the skin with alcohol (2-propanol). The alcohol<br />
must be allowed to evaporate, since commingling of the remaining alcohol<br />
with blood sample may result in hemolysis, raise the levels of distinct analytes,<br />
and cause interferences. The position of the patient (standing, lying, sitting)<br />
can affect the hematocrit (7), and hence may change the concentration of the<br />
analytes. A tourniquet should be applied 3–4 inches above the site of venipuncture<br />
and should be released as soon as blood begins flowing into the collection<br />
device. Venous occlusion lasting longer than 1 min can affect the sample<br />
composition. Prolonged occlusion may result in hemoconcentration and subsequently<br />
increase the concentration of various analytes, e.g., total protein levels. Blood<br />
should be collected from fasting patients in the morning between 7 and 9 a.m.,<br />
because food ingestion or circadian rhythms can alter the concentration of analytes<br />
considerably (e.g., total protein, hemoglobin, myoglobin).<br />
1.2. Characteristics of Serum and Plasma Specimens<br />
Serum is one of the most frequently analyzed blood specimens. The<br />
generation of serum is time consuming and associated with the activation of<br />
the coagulation cascade and the complement system. These processes influence the<br />
composition of the samples, because they result in cell lysis (e.g., thrombocytes,<br />
erythrocytes). As a consequence, the concentrations of components in<br />
the extracellular fluid, such as aspartate-aminotransferase, serotonin, neuron-specific<br />
enolase, and lactate-dehydrogenase, are increased (8). On the other<br />
hand, degradation of the analytes (e.g., hormones) may occur faster (9). On the<br />
proteomic level, more peptides and fewer proteins are observed in serum when<br />
compared to plasma (10,11).<br />
Consequently, the activation of clotting cascades necessary to generate serum<br />
can lead to artefacts. A reason to use serum as a specimen is based on<br />
the notion that the proteome or peptidome of serum may reflect biological<br />
events (12). Post-sampling proteolytic cleavage products have been proposed<br />
as biomarkers, and it has been further suggested that serum peptidome is of<br />
particular diagnostic value for the detection of cancer (13). However, it has<br />
been reported that more protein changes occur in serum than in plasma (14).<br />
Thus, it can be expected that the reproducibility of such ex vivo proteolytic<br />
events is comparatively low.<br />
In contrast to serum, plasma is obtained with anticoagulants. Citrate and EDTA inhibit coagulation and other<br />
enzymatic processes by chelate formation with ions, thereby inhibiting ion-dependent<br />
enzymes. Heparin, by contrast, acts through the<br />
activation of antithrombin III. The main concern associated with heparinized<br />
plasma for proteomic studies is that heparin is a polydisperse charged molecule that<br />
binds many proteins non-specifically (15,16), and may also influence separation<br />
procedures and mass spectrometric detection of peptides and small proteins due<br />
to its similar molecular weight (17).<br />
The sampling of plasma is less time consuming than the acquisition of serum.<br />
Separation of the cells and the liquid phase can be performed directly after<br />
sample collection, since no clotting time (30–60 min) is required. In comparison<br />
to serum, the amount of plasma generated from blood is approximately 10 to<br />
20% higher. Additionally, the protein content of plasma is also higher than in<br />
serum, because of the presence of clotting factors and associated components.<br />
Furthermore, proteins may be bound to the clot, resulting in a decrease of<br />
protein concentration.<br />
1.3. Processing of Blood Samples<br />
A quick separation of cells from the plasma is favorable, since cellular<br />
constituents may liberate substances that alter the composition of the sample.<br />
Generally, it is recommended that plasma and serum be centrifuged at<br />
1300–2000×g for 10 min within 30 min from the collection of the sample. The<br />
temperature should generally be 15–24°C (18), unless recommended differently<br />
for distinct analytes like gastrin or A-type natriuretic peptide. Processing at 4°C<br />
appears to be attractive, because enzymatic degradation processes are reduced<br />
at low temperatures. However, platelets become activated at low temperatures<br />
(19) and release intracellular proteins and enzymes, which affect the sample<br />
composition. Thus, processing at low temperatures is safe only after thrombocytes<br />
have been removed. Since one centrifugation step may be insufficient for
depletion of platelets below 10 cells/nL, a second centrifugation step (2500×g<br />
for 15 min at room temperature) or filtration step may be required to obtain<br />
platelet-poor plasma. This procedure is applicable only to plasma since the<br />
platelets in serum are already activated.<br />
1.4. Protease Inhibitors<br />
Protease inhibitors would be attractive, but commonly used protease cocktails<br />
may introduce difficulties due to interference with mass spectrometry and<br />
formation of covalent bonds with proteins, which would result in shifting the<br />
isoform pattern (20). Protease inhibitors have been considered and investigated as<br />
additives in proteome research to prevent or slow down proteolytic processes and<br />
thereby provide a means of more sensitive detection of markers in blood (21).<br />
Even though protein integrity has been shown to be maintained by the<br />
addition of 15 commercially available protease inhibitors, the usefulness of<br />
protease inhibitors in overall protein stabilization of blood samples remains to<br />
be investigated in more detail (22). The presence of certain protease inhibitors<br />
in whole blood is toxic to live cells. Stressed, apoptotic, or necrotic cells release<br />
substances, and it may be argued that this affects the composition of serum or<br />
plasma until the cellular and soluble fractions of blood are separated. However,<br />
careful selection of an appropriate protease inhibitor may solve this problem.<br />
2. Materials<br />
1. Twenty gauge needles and an appropriate adapter (e.g., Sarstedt, Nümbrecht,<br />
Germany) or a Vacutainer system (BD Bioscience, Franklin Lakes, USA).<br />
2. Alcohol (2-propanol) in spray flask.<br />
3. Swabs.<br />
4. Examination gloves.<br />
5. Tourniquet or sphygmomanometer.<br />
6. Blood collection tubes (e.g., Sarstedt).<br />
7. Centrifuge with a swinging bucket rotor (e.g., Sigma 4K15, Sigma Laborzentrifugen,<br />
Osterode, Harz).<br />
8. A 10-mL syringe equipped with a cellulose acetate filter unit with 0.2 μm pore<br />
size and 5 cm² filtration area (e.g., Sartorius Minisart, Sarstedt).<br />
9. 2 mL cryo-vials.<br />
10. Pipette and tips.<br />
3. Methods<br />
1. Venipuncture of a cubital vein is performed using a 20-gauge needle (diameter:<br />
0.9 mm, e.g., butterfly system max. tubing length: 6 cm). If a tourniquet is applied,<br />
it should not remain in place for longer than 1 min (risk of falsifying results due to<br />
hemoconcentration). As soon as the blood flows into the container, the tourniquet<br />
has to be released at least partially. If more time is required, the tourniquet<br />
has to be released so that circulation resumes and normal skin color returns to<br />
the extremity.<br />
• Prior to blood collection for proteomic analysis, blood is aspirated into the<br />
first container (e.g., 2.7 mL S-Monovette, Sarstedt, Nümbrecht, Germany).<br />
This is done to flush the surface and remove initial traces of contact-induced<br />
coagulation. This sample is not useful for analysis.<br />
• Afterward, blood is drawn into a standard EDTA or citrate-containing syringe<br />
(e.g. 9 mL EDTA-Monovette, Sarstedt, Nümbrecht, Germany). Depending on<br />
ease of blood flow, several samples can be collected. Free flow with mild<br />
aspiration should be assured to avoid haemolysis.<br />
2. After venipuncture, plasma is obtained by centrifugation for 10 min at 2000×g at<br />
room temperature. Centrifugation should start within 30 min after blood collection.<br />
The resulting plasma sample may now be separated from red and white blood<br />
cells in an efficient and gentle way. Nevertheless, a significant number of platelets<br />
(∼25%) are still present in the sample. This requires an additional preparation<br />
step.<br />
3. For platelet depletion, one of the following procedures has to be undertaken<br />
directly after step 2:<br />
• Platelet removal by centrifugation: The plasma sample is transferred into a<br />
second vial for another centrifugation for 15 min at 2500×g at room temperature.<br />
After centrifugation, the supernatant is transferred in aliquots of 1.5 mL<br />
into cryo vials.<br />
• Platelet removal by filtration: Plasma aliquots of 1.5 mL resulting from step<br />
2 are transferred into 2-mL cryo vials using a 10-mL syringe equipped with<br />
a cellulose acetate filter unit with 0.2 μm pore size and 5 cm² filtration area<br />
(e.g., Sartorius Minisart®, Sartorius, Göttingen, Germany). Filtration requires<br />
only gentle pressure.<br />
4. Samples are transferred to an –80°C freezer within 30 min. Storage is at –80°C.<br />
Transport of samples is done on dry ice.<br />
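The time and centrifugation limits in steps 1 to 4 can be checked against a per-sample log. The following Python sketch is illustrative only: the record fields and the function name are hypothetical, and it assumes that the 30-min limit for freezing is counted from the end of platelet depletion, which the protocol does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleRecord:
    """Timings and settings logged for one plasma sample (hypothetical fields)."""
    tourniquet_s: float           # seconds the tourniquet stayed in place
    min_to_centrifugation: float  # minutes from blood draw to start of first spin
    spin_g: float                 # relative centrifugal force of first spin (x g)
    spin_min: float               # duration of first spin (minutes)
    min_to_freezer: float         # minutes from end of processing to freezer (assumed reference point)
    storage_temp_c: float         # storage temperature (degrees Celsius)


def protocol_deviations(rec: SampleRecord) -> List[str]:
    """Return a list of deviations from the limits given in Methods steps 1-4."""
    deviations = []
    if rec.tourniquet_s > 60:
        deviations.append("tourniquet in place longer than 1 min")
    if rec.min_to_centrifugation > 30:
        deviations.append("centrifugation started more than 30 min after collection")
    if rec.spin_g != 2000 or rec.spin_min != 10:
        deviations.append("first centrifugation was not 10 min at 2000 x g")
    if rec.min_to_freezer > 30:
        deviations.append("transfer to freezer took more than 30 min")
    if rec.storage_temp_c > -80:
        deviations.append("storage temperature above -80 C")
    return deviations


if __name__ == "__main__":
    sample = SampleRecord(tourniquet_s=90, min_to_centrifugation=20,
                          spin_g=2000, spin_min=10,
                          min_to_freezer=15, storage_temp_c=-80)
    for item in protocol_deviations(sample):
        print("Deviation:", item)
```

An empty list means that no logged value exceeded the stated limits; it does not by itself confirm sample quality.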
4. Notes<br />
4.1. Frequently Made Mistakes<br />
4.1.1. Blood Withdrawal<br />
• The patient was not fasting (i.e., had taken food prior to sampling).<br />
• The blood was drawn from an infusion line.<br />
• The blood was drawn in a wrong position (e.g., supine, upright).<br />
• The consumables used were different than those recommended.
• The expiry date of consumables was already reached.<br />
• The tubes were not properly filled.<br />
• The tubes were agitated vigorously (instead of gentle shaking to dissolve the anticoagulant).<br />
• The blood sample tubes were not consistently kept at room temperature.<br />
• The sample tubes were put on ice or in a refrigerator.<br />
4.1.2. Lab Handling<br />
• Centrifugation was delayed more than 30 min after blood withdrawal.<br />
• A cooling centrifuge was adjusted below room temperature.<br />
• The centrifugation speed was wrong (e.g., revolutions per minute were set instead of<br />
g-force).<br />
• The centrifugation time was wrong.<br />
• The removal of blood plasma by pipetting was done without proper caution. Consequently,<br />
the buffy coat or the red blood cells were churned up.<br />
• The second centrifugation of recovered plasma samples was delayed after first<br />
centrifugation.<br />
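Confusing revolutions per minute with g-force is easy to avoid by converting between the two for the rotor in use. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in centimeters. The 15-cm radius is an assumed example value, not a property of any centrifuge named in this chapter, and the function names are illustrative.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 15.0  # assumed rotor radius; read the real value from the rotor manual
    for target_g in (2000, 2500):
        rpm = rpm_from_rcf(target_g, radius_cm)
        print(f"{target_g} x g requires about {rpm:.0f} rpm at r = {radius_cm} cm")
    # Setting 2000 rpm by mistake gives far less than 2000 x g
    print(f"2000 rpm gives about {rcf_from_rpm(2000, radius_cm):.0f} x g")
```

With this assumed radius, entering 2000 as rpm would deliver roughly one third of the intended force, which is why the setting should always be verified in units of g.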
4.1.3. Storage of Samples<br />
• The storage of samples was delayed.<br />
• The storage temperatures were above –80°C.<br />
• The labeling of sample containers was unreadable or confusable.<br />
• Labels were not properly attached to the sample containers, so that storage or<br />
handling resulted in loss of labels.<br />
4.1.4. General Recommendations<br />
• A proper first centrifugation should produce a visible white blood cell layer (buffy<br />
coat) between red blood cells and plasma. If not, centrifugation speed or time may<br />
be wrong.<br />
• One should discard plasma that is icteric or exhibits signs of haemolysis. One should<br />
check with an expert whether these findings could be due to the disease under study.<br />
References<br />
1. Vitzthum F, Behrens F, Anderson NL, Shaw JH. (2005) Proteomics: from basic<br />
research to diagnostic application. A review of requirements and needs. J. Proteome<br />
Res. 4, 1086–97.<br />
2. Lathrop JT, Anderson NL, Anderson NG, Hammond DJ. (2003) Therapeutic<br />
potential of the plasma proteome. Curr. Opin. Mol. Ther. 5, 250–7.
3. Wang W, Zhou H, Lin H, Roy S, Shaler TA, Hill LR et al. (2003) Quantification of<br />
proteins and metabolites by mass spectrometry without isotopic labeling or spiked<br />
standards. Anal. Chem. 75, 4818–26.<br />
4. Anderson NL, Anderson NG. (2002) The human plasma proteome: history,<br />
character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–67.<br />
5. Omenn GS. (2004) The Human Proteome Organization Plasma Proteome<br />
Project pilot phase: reference specimens, technology platform comparisons, and<br />
standardized data submissions and analyses. Proteomics 4, 1235–40.<br />
6. Plebani M, Carraro P. (1997) Mistakes in a stat laboratory: types and frequency.<br />
Clin. Chem. 43, 1348–51.<br />
7. Burtis CA, Ashwood E. (eds) (2001) Fundamentals of Clinical Chemistry.<br />
Saunders, Philadelphia.<br />
8. Guder WG, Narayanan S, Wisser H, Zawata B. (2003) Samples: From the Patient to<br />
the Laboratory. The Impact of Preanalytical Variables on the Quality of Laboratory<br />
Results. GIT Verlag, Darmstadt, Germany.<br />
9. Evans MJ, Livesey JH, Ellis MJ, Yandle TG. (2001) Effect of anticoagulants and<br />
storage temperatures on stability of plasma and serum hormones. Clin. Biochem<br />
34, 107–12.<br />
10. Omenn GS, States DJ, Adamski M, Blackwell TW, Menon R, Hermjakob H et al.<br />
(2005) Overview of the HUPO Plasma Proteome Project: results from the pilot<br />
phase with 35 collaborating laboratories and multiple analytical groups, generating<br />
a core dataset of 3020 proteins and a publicly-available database. Proteomics 5,<br />
3226–45.<br />
11. Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD et al.<br />
(2005) HUPO Plasma Proteome Project specimen collection and handling: towards<br />
the standardization of parameters for plasma proteome samples. Proteomics 5,<br />
3262–77.<br />
12. Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H,<br />
Olshen AB et al. (2006) Differential exoprotease activities confer tumor-specific<br />
serum peptidome patterns. J. Clin. Invest. 116, 271–84.<br />
13. Liotta LA, Petricoin EF. (2006) Serum peptidome for cancer detection: spinning<br />
biologic trash into diagnostic gold. J. Clin. Invest. 116, 26–30.<br />
14. Tammen H, Schulte I, Hess R, Menzel C, Kellmann M, Schulz-Knappe P. (2005)<br />
Prerequisites for peptidomic analysis of blood samples: I. Evaluation of blood<br />
specimen qualities and determination of technical performance characteristics.<br />
Comb. Chem. High Throughput Screen. 8, 725–33.<br />
15. Holland NT, Smith MT, Eskenazi B, Bastaki M. (2003) Biological sample collection<br />
and processing for molecular epidemiological studies. Mutat. Res. 543, 217–34.<br />
16. Landi MT, Caporaso N. (1997) Sample collection, processing and storage. IARC<br />
Sci. Publ. 223–36.<br />
17. Tammen H, Schulte I, Hess R, Menzel C, Kellmann M, Mohring T,<br />
Schulz-Knappe P. (2005) Peptidomic analysis of human blood specimens:<br />
comparison between plasma specimens and serum by differential peptide display.<br />
Proteomics 13, 3414–22.
18. Favaloro EJ, Soltani S, McDonald J. (2004) Potential laboratory misdiagnosis of<br />
hemophilia and von Willebrand disorder owing to cold activation of blood samples<br />
for testing. Am. J. Clin. Pathol. 122, 686–92.<br />
19. Mustard JF, Kinlough-Rathbone RL, Packham MA. (1989) Isolation of human<br />
platelets from plasma by centrifugation and washing. Methods Enzymol. 169, 3–11.<br />
20. Schuchard MD, Mehigh RJ, Cockrill SL, Lipscomb GT, Stephan JD, Wildsmith J<br />
et al. (2005) Artifactual isoform profile modification following treatment of<br />
human plasma or serum with protease inhibitor, monitored by 2-dimensional<br />
electrophoresis and mass spectrometry. Biotechniques 39, 239–47.<br />
21. Jeffrey DH, Deidra B, Keith H, Shu-Pang H, Deborah LR, Gregory JO, Stanley AH.<br />
(2004) An Investigation of Plasma Collection, Stabilization, and Storage Procedures<br />
for Proteomic Analysis of Clinical Samples. Humana, Totowa, NJ.<br />
22. Rai AJ, Vitzthum F. (2006) Effects of preanalytical variables on peptide and protein<br />
measurements in human serum and plasma: implications for clinical proteomics.<br />
Expert Rev. Proteomics 3, 409–26.
3<br />
Tissue Sample Collection for Proteomics Analysis<br />
Jose I. Diaz, Lisa H. Cazares, and O. John Semmes<br />
Summary<br />
Successful collection of tissue samples for molecular analysis requires critical considerations.<br />
We describe here our procedure for tissue specimen collection for proteomic<br />
purposes with emphasis on the most important steps, including timing issues and the procedures<br />
for immediate freezing, storage, and microdissection of the cells of interest or “tissue<br />
targets” and the lysates for protein isolation for SELDI, MALDI, and 2DGE applications.<br />
The pathologist is the cornerstone of this process and is an invaluable collaborator.<br />
In most institutions, pathologists are responsible for “tissue custody,” and they closely<br />
supervise the tissue bank. In addition, they are optimally trained in histopathology and<br />
can therefore assist investigators in correlating tissue morphology with molecular findings.<br />
In recent years, the advent of the laser capture microscope, a tool ideally designed for<br />
pathologists, has tremendously facilitated the efficiency of collecting tissue targets for<br />
molecular analysis.<br />
Key Words: tissue bank; frozen section; immunofluorescence; laser capture microscope;<br />
proteomics.<br />
1. Introduction<br />
From the completion of surgery and the acquisition of tissue sample to<br />
protein isolation and performing the various proteomic techniques, a number<br />
of challenges must be overcome. The first challenge is time. Surgery is<br />
associated with loss of vascular supply, resulting in progressive increase of<br />
endogenous protease activity, protein degradation, and tissue autolysis. For<br />
this reason, specimens submitted for tissue procurement must be processed<br />
without delay. Formalin fixation, a standard processing procedure in pathology,<br />
stops protease activity. However, formalin is a cross-linking fixative that<br />
irreversibly alters protein, thus compromising the quality of the extracts for<br />
most proteomic techniques. Recent technical developments appear promising<br />
and may ultimately enable peptide analysis and protein identification (bottom-up<br />
proteomics) in formalin-fixed, paraffin-embedded tissue (1). At present,<br />
however, it is imperative to take a representative “fresh” tissue sample immediately<br />
after surgery when collecting tissue for proteomic studies, including<br />
MALDI TOF MS and 2DGE. The surgical specimen should be transported<br />
quickly to pathology, and a representative tissue sample should be obtained<br />
under the supervision of a pathologist. The sample should be embedded in OCT<br />
and frozen without delay. Ideally, a frozen section should be performed for<br />
quality assurance before archiving the sample. Once the pathologist confirms<br />
that the expected targets are present in the collected tissue (for instance, tumor<br />
and non-tumor tissue), the frozen specimen can be stored in a –80°C freezer for<br />
subsequent use. Overcoming time constraints requires appropriate institutional<br />
policies and dedicated personnel. From our experience, it is better to delegate<br />
the responsibility of transporting the surgical specimen from the operating room<br />
to pathology to dedicated tissue procurement personnel, instead of expecting<br />
the surgical team to deliver the specimens. When collecting and archiving tissue<br />
samples, our policy is to bisect the sample into two halves, one embedded<br />
in OCT and stored permanently at –80°C for future molecular studies, and<br />
one submitted as a “mirror image” processed in formalin after performing a<br />
frozen section for morphologic comparison and cell type mapping after basic<br />
hematoxylin and eosin (H&E) staining. This formalin-processed mirror image<br />
tissue provides optimal morphological detail, which might be necessary in<br />
the future. For instance, it is very difficult to identify prostatic intraepithelial<br />
neoplasia (PIN) on frozen section slides; however, the formalin fixed section,<br />
which closely mimics the frozen section, can be used for guidance.<br />
After archiving the tissue sample, the next challenge is to ensure that the<br />
proteomic findings are representative of the tissue targets under investigation,<br />
given the cellular heterogeneity present in most tissues. For instance, if one<br />
would like to determine the differential protein expression in tumor versus<br />
non-tumor, one must ensure that proteins are separately and reliably extracted<br />
from normal and tumor cells. Certainly, many solid tumors are visible to the<br />
naked eye, and both tumor and non-tumor tissues can be collected by gross<br />
inspection. However, under a microscope, the tumor bed contains not only<br />
tumor cells but many other tumor-associated, non-tumoral elements, such as<br />
supporting stromal cells, blood vessels, infiltrating lymphocytes, etc. Moreover,<br />
microscopic foci of tumor may infiltrate grossly normal tissue. In the past,<br />
various approaches were followed to collect cells from tissue sections, including<br />
manual microdissection with a syringe. In recent years, the procedure<br />
of laser-capture microdissection (2) has tremendously increased the quality,<br />
specificity, and speed of the process, allowing selective capture of cells and<br />
various tissue elements while preserving the molecular integrity (3,4,5).<br />
The LCM is a special microscope that isolates cells from frozen or formalinfixed<br />
tissues and cytological preparations. Microdissection of single cells or<br />
multicellular structures is accomplished by placing a plastic polymer (cap) over<br />
the tissue while pulsing an infrared laser for the polymer to melt and adhere<br />
to the target cells under the laser ring. When the cap is removed, the cells<br />
that adhered to the polymer detach from the surrounding tissue without any<br />
molecular damage, becoming suitable for the extraction of high-quality nucleic<br />
acids and proteins, and for a wide range of downstream molecular analyses,<br />
A<br />
B<br />
C<br />
D<br />
Fig. 1. Selective immunofluorescent LCM of prostate gland basal cells by immunocapture:<br />
(A) immunofluorescent staining of basal cells with a mAb against high-molecular-weight<br />
keratins, which are highly expressed on basal cells, (B) selection<br />
of immunofluorescent-positive basal cells for subsequent LCM, (C) captured<br />
immunofluorescent-positive cells after LCM photographed from the plastic cap,<br />
(D) remainder of the gland after removal of the basal cell layer by LCM.<br />
such as gene expression microarrays or proteomics. Microscopy can be<br />
coupled with special immunostaining procedures to capture specific cell types<br />
that are not easily identified by morphology alone. This so-called<br />
immunocapture procedure (6,7) further enhances the specificity<br />
of tissue procurement for molecular analysis. For example, in a previous<br />
study (8), we were able to selectively capture basal cells from benign prostate<br />
glands, which are extremely difficult to recognize morphologically but easily<br />
identifiable after immunostaining for high-molecular-weight cytokeratin (Fig. 1).<br />
We obtained excellent protein quality results and were able to identify several<br />
protein peaks preferentially expressed in these cells using SELDI-TOF-MS.<br />
When we compared protein spectra from sections of the same tissue sample<br />
stained routinely with hematoxylin with spectra from sections immunostained for<br />
high-molecular-weight cytokeratins, we found no difference, which argues against<br />
any significant protein deterioration due to the immunostaining procedure.<br />
2. Materials<br />
2.1. Tissue Collection and Storage<br />
1. Tissue-Tek Cryomold-standard (Sakura, Torrance, CA)<br />
2. Tissue-Tek OCT (Sakura)<br />
3. 2-Methylbutane (Mallinckrodt, St. Louis, MO)<br />
4. Shandon Histobath II (Thermo Electron Corp., Waltham, MA)<br />
5. –80°C freezer<br />
2.2. Frozen Tissue Sectioning and Staining<br />
1. Cryostat<br />
2. HistoGene™ LCM Frozen Section Staining Kit (Arcturus Biosciences Inc.,<br />
Mountain View, CA). The kit contains HistoGene staining solution, ethanol (75,<br />
95, 100%), xylene, nuclease-free distilled water, HistoGene LCM slides, and<br />
disposable slide staining jars.<br />
3. 1× PBS made from 10× stock (Fisher Scientific)<br />
4. Acetone (high purity grade)<br />
5. Cy3-Streptavidin (Invitrogen, Carlsbad, CA)<br />
6. Biotinylated mAbs: Any antibody can be biotinylated. We routinely have 1.5 mg of<br />
antibody labeled with 0.2 mg biotin (Alpha Diagnostic Intl. Inc. San Antonio, TX).<br />
2.3. LCM<br />
1. PixCell II LCM System (Arcturus Biosciences Inc)<br />
2. AutoPix™ Automated LCM System (Arcturus Biosciences Inc)<br />
3. CapSure® LCM caps (Arcturus Biosciences Inc)<br />
4. Prep Strip (Arcturus Biosciences Inc)<br />
5. Microcentrifuge tubes (0.5 ml) (Eppendorf North America)
2.4. LCM Lysate<br />
1. Micropipet capable of delivering 1 μl accurately<br />
2. 20 mM HEPES (pH to 8.0 with NaOH) with 1% Triton X-100<br />
3. Sonicator (optional)<br />
4. 1× PBS<br />
2.5. SELDI Analysis<br />
1. IMAC3 or WCX2 Protein Array Chips (Ciphergen Biosystems, Palo Alto, CA)<br />
2. HPLC grade water (Fisher Scientific)<br />
3. 100 mM sodium acetate pH 4.0<br />
4. 100 mM ammonium acetate pH 4.0<br />
5. Sinapinic acid (SPA) (Ciphergen Biosystems, Palo Alto, CA)<br />
6. Optima grade acetonitrile (Fisher Scientific)<br />
7. Trifluoroacetic acid, packaged in 1 ml ampules (Pierce Chemical Company,<br />
Rockford, IL)<br />
2.6. MALDI Analysis<br />
1. Target plate<br />
2. α-Cyano-4-hydroxycinnamic acid (CHCA) (Bruker Daltonics, Palo Alto, CA)<br />
3. SPA (Fluka)<br />
4. Optima grade acetonitrile (Fisher Scientific)<br />
5. Trifluoroacetic acid, packaged in 1 ml ampules (Pierce Chemical Company)<br />
3. Method<br />
3.1. Tissue Collection and Storage<br />
1. Embed the tissue sample in OCT using a cryomold and freeze it in the<br />
Shandon Histobath, which contains 2-methylbutane (see Note 1).<br />
2. Hold the cryomold against the 2-methylbutane liquid interface and allow the<br />
tissue to freeze slowly (3–5 min) (see Note 2).<br />
3. After achieving complete freezing, place the frozen cryomold containing the<br />
sample in a plastic bag and transport the sample within a liquid nitrogen container.<br />
Store the sample in a –80°C freezer.<br />
3.2. Frozen Tissue Sectioning and Staining<br />
3.2.1. Regular Hematoxylin Staining<br />
Prior to LCM, cut 8-μm-thick frozen tissue sections in the cryostat (discard<br />
folded or wrinkled sections). Keep the slides with sections in the cryostat after cutting<br />
and stain as follows (see Notes 3 and 9; slides may also be stored frozen at –80°C<br />
until stained):<br />
1. Remove the slides from the freezer or cryostat and place in 70% ethanol (30 s).<br />
2. Place in purified water (5 s).<br />
3. Add the Histogene staining solution (30 s) (see Note 4).<br />
4. Rinse the slides with purified water.<br />
5. Wash with 70% ethanol (60 s).<br />
6. Wash with 95% ethanol twice (60 s each).<br />
7. Wash with 100% ethanol (60 s).<br />
8. Place the slides in xylene to ensure complete dehydration (10 min) (see Note 5).<br />
9. Shake off and drain carefully by touching the corner with a particle-free tissue<br />
paper.<br />
10. Air dry the slides to allow xylene to evaporate completely (at least 2 min).<br />
11. The slides are now ready for LCM (they should not be coverslipped) (see<br />
Note 12).<br />
3.2.2. Immunofluorescence Staining (see Note 7)<br />
1. Thaw slides (1 min).<br />
2. Place in cold acetone at 4°C (2 min).<br />
3. Air dry (30 s).<br />
4. Wash in filtered pH 7.4 1× PBS.<br />
5. Drain off slides.<br />
6. Add 100 μl of the biotinylated primary antibody at its optimal dilution and incubate<br />
for 3 min (recommended concentration 30–100 μg/ml; optimize for best results).<br />
7. Rinse in PBS.<br />
8. Add 100 μl of Cy3-streptavidin at a 1:100 dilution and incubate for 1 min (the user<br />
may determine the optimal staining concentration of the Cy3-streptavidin conjugate<br />
by performing a serial dilution staining experiment).<br />
9. Rinse in PBS.<br />
10. Place slides in 75% ethanol (30 s).<br />
11. Place slides in 95% ethanol (30 s).<br />
12. Place slides in 100% ethanol (30 s).<br />
13. Place slides in xylene (5 min) (see Note 6).<br />
14. Air dry (5 min).<br />
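The timing of the immunofluorescence steps above can be checked against the 30-min ceiling given in Note 7. The following is a minimal Python sketch, not part of the original protocol: the step durations are copied from Section 3.2.2, while the names and the 30-s handling allowance for each untimed step (PBS wash, draining, and the two PBS rinses) are assumptions chosen only for illustration.

```python
# Durations (in seconds) of the timed steps in Section 3.2.2, in protocol order.
TIMED_STEPS_S = [
    ("thaw slides", 60),
    ("cold acetone at 4 C", 120),
    ("air dry", 30),
    ("biotinylated antibody", 180),
    ("Cy3-streptavidin", 60),
    ("75% ethanol", 30),
    ("95% ethanol", 30),
    ("100% ethanol", 30),
    ("xylene", 300),
    ("final air dry", 300),
]

# The PBS wash, the draining step, and the two PBS rinses have no stated duration.
UNTIMED_STEPS = 4
# Assumption (not from the protocol): seconds allowed for each untimed step.
HANDLING_ALLOWANCE_S = 30
# Note 7: the whole immunostaining and dehydration procedure should stay under 30 min.
LIMIT_S = 30 * 60


def timed_total_s(steps=TIMED_STEPS_S):
    """Sum of the durations that the protocol states explicitly."""
    return sum(seconds for _, seconds in steps)


def estimated_total_s(allowance_s=HANDLING_ALLOWANCE_S):
    """Timed steps plus an assumed allowance for every untimed step."""
    return timed_total_s() + UNTIMED_STEPS * allowance_s


print(f"Timed steps: {timed_total_s() / 60:.1f} min")
print(f"Estimated total: {estimated_total_s() / 60:.1f} min")
print("Within the 30-min limit:", estimated_total_s() <= LIMIT_S)
```

With the listed times, the explicitly timed steps add up to 19 min, so the margin for washes and slide handling is about 11 min per batch of slides.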
3.3. LCM<br />
The newer instruments developed by Arcturus, such as the AutoPix™ and the<br />
Veritas™, are enclosed, automated systems operated entirely by a computer.<br />
We describe here the LCM procedure using the PixCell II instrument, which<br />
is manually operated and is currently the least expensive LCM instrument and,<br />
therefore, more widely used (see Note 8).<br />
1. Turn on the instrument and enter pertinent data such as slide #, case #, cap lot #,<br />
and section thickness (always 8 μm). Place the stained slide on the mechanical stage (see<br />
Note 10).<br />
2. Turn on the vacuum pump to immobilize the slide (small aperture on the left side<br />
of the stage) and push in the filter bottom for optimal image quality.<br />
3. Place the caps in the rail on the right side of the stage. Unlock the mechanical arm,<br />
move it toward the tissue, and drop it at the top of the tissue. Align the joystick<br />
to move the stage to a centered and perpendicular position before beginning the<br />
microdissection process.<br />
4. Turn on the key on the right side of the power supply to enable the infrared laser.<br />
Focus the laser before beginning microdissection using the smallest ring diameter<br />
and adjust to the desired diameter.<br />
5. Select the appropriate energy (mW) and time of exposure (ms) for the desired<br />
laser ring diameter and confirm that the settings are effective in an area of the tissue<br />
that is of no interest, using a cap that will be discarded (see Note 11).<br />
6. Fire the laser each time the ring is over the desired tissue target. Move the stage<br />
supporting the glass slide with the aid of the joystick, which allows fine and<br />
precise motion. Check if the tissue is appropriately microdissected and capture<br />
the tissue images before and after LCM as well as the image of the target tissue<br />
that was captured in the cap (see Note 13).<br />
7. When the cap is filled with the desired amount of tissue, remove the cap and use a<br />
0.5-ml microcentrifuge tube to collect the tissue (the cap is designed to perfectly<br />
fit to close the tube) (see Note 14).<br />
8. The microcentrifuge tube can be safely stored in a –80°C freezer without adding any<br />
buffer and without lysing the cells, which may be done at a convenient time later.<br />
3.4. LCM Lysate<br />
1. Lyse a total of 1500–2000 laser shots (about 3000 to 6000 microdissected cells)<br />
in 4 μl of 20 mM HEPES pH 8.0 with 1% Triton X-100. This is sufficient for<br />
one SELDI protein array or one MALDI run. For 2D analysis, a minimum of<br />
approximately 25,000 cells is necessary.<br />
2. Add the lysis buffer described above to the surface of the cap, using the microfuge<br />
tube to hold the cap. This is usually done with two additions of 2 μl to the LCM cap. Pipet<br />
up and down and scrape the surface of the LCM cap to remove all the cells. A<br />
gentle scraping motion with the pipet tip may be necessary to remove the cells,<br />
but be careful not to rip the polymer film (see Note 15). Transfer the lysate<br />
from the surface of the cap to the microfuge tube. Cells from multiple caps may<br />
be combined by using the same 4 μl of LCM lysate to lyse the cells on the next<br />
cap. In this way the volume will remain small. If 2DGE is to be performed,<br />
the lysis procedure is different (see below). Make a 1:10 dilution of each lysate<br />
in PBS (for IMAC3 SELDI chips) or 100 mM ammonium acetate pH 4.0 (for<br />
WCX2 chips) (i.e., add 36 μl to the 4 μl lysate) and vortex for at least 1 min (see<br />
Note 16). Spin down briefly.<br />
3. Prepare the arrays of the IMAC chip with CuSO₄ according to the manufacturer’s<br />
specifications: 20 μl of 100 mM CuSO₄ for 10 min, wash with HPLC water; 20 μl of<br />
100 mM Na acetate pH 4.0 for 5 min, wash with water. Use the Micromix<br />
shaker for all incubations with the following settings: Form-20, Amplitude-5.<br />
4. Assemble the bioprocessor with the desired number of chips and wash each well<br />
twice with 200 μl PBS, incubating on the shaker for 5 min each time. Pretreat the<br />
WCX2 chip with 100 mM ammonium acetate pH 4.0. This can be done on the<br />
BioMek robot.<br />
5. Add the diluted lysate to the spot on the chip(s) in the bioprocessor.<br />
6. Cover the bioprocessor with a plastic seal and incubate overnight on MicroMix<br />
shaker at room temperature, using the same setting as given above.<br />
7. Remove lysates carefully with a pipet; do not touch the surface of the arrays.<br />
Save if needed for another experiment.<br />
8. Wash the spots in bioprocessor 2× with 200 μl PBS (for IMAC) or 100 mM<br />
ammonium acetate pH 4.0 (for WCX) for 5 min on the shaker.<br />
9. Wash the arrays with HPLC water 2× for 5 min (on shaker).<br />
10. Remove the chip(s) from bioprocessor and give them a final rinse with HPLC<br />
water.<br />
11. Let the chip dry completely, usually overnight.<br />
12. Add 2× 0.5 μl saturated SPA dissolved in 50% acetonitrile, 0.5% TFA.<br />
13. Read at instrument settings optimized for resolution and intensity for the m/z<br />
range of 1000–20,000. Higher laser energy will be required to see higher<br />
molecular weight peaks.<br />
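The 1:10 dilution in step 2 is simple arithmetic, but it is easy to get wrong when lysates from several caps are pooled and the starting volume is no longer 4 μl. The sketch below is illustrative only: the function and dictionary names are hypothetical, and the chip-to-diluent mapping simply restates step 2.

```python
# Diluent for each chip type, as stated in step 2 of Section 3.4.
CHIP_DILUENT = {
    "IMAC3": "PBS",
    "WCX2": "100 mM ammonium acetate pH 4.0",
}


def diluent_volume_ul(lysate_ul, fold=10):
    """Volume of diluent to add so that the lysate ends up diluted 1:fold."""
    if lysate_ul <= 0 or fold < 1:
        raise ValueError("lysate volume must be positive and fold must be >= 1")
    return lysate_ul * (fold - 1)


def dilution_plan(lysate_ul, chip, fold=10):
    """Return the diluent, the volume to add, and the final volume for one lysate."""
    add_ul = diluent_volume_ul(lysate_ul, fold)
    return {
        "diluent": CHIP_DILUENT[chip],
        "add_ul": add_ul,
        "final_ul": lysate_ul + add_ul,
    }


# The protocol's own case: 4 ul of lysate destined for an IMAC3 chip.
print(dilution_plan(4, "IMAC3"))
```

For the 4-μl lysate this reproduces the 36 μl stated in the protocol, giving a final volume of 40 μl.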
One method of MALDI sample preparation that reduces the complexity of cell<br />
lysates while remaining robust and easily amenable to automated high-throughput<br />
applications is sample fractionation using magnetic beads<br />
(MB) combined with pre-structured MALDI sample supports (AnchorChip<br />
Technology). Several magnetic bead types with different surface chemistries can<br />
be used to fractionate serum and increase the number of detectable peaks (see<br />
the chapter on serum protein profiling for details). For MALDI analysis, dilute<br />
the lysate 1:10 with CHCA or SPA matrix (5–10 mg/ml in 50% acetonitrile, 0.1%<br />
TFA). Spot on Anchorplate and read in a MALDI instrument. Further dilution<br />
and/or fractionation of the lysate may be necessary to achieve optimal spectra.<br />
If 2DGE analysis will be performed, the cells should be lysed as follows<br />
(the preferred number of laser shots is approximately 100,000):<br />
Remove the LCM cap from the tube and add a small volume (10 μl) of 1D<br />
focusing rehydration buffer to the tube. Replace the cap and invert the tube<br />
to allow the buffer to come in contact with the cells on the cap and lyse them.<br />
Incubate for 5 min at room temperature. Sonicate the samples to ensure lysis.<br />
Continue with the basic protocol for 1D IEF and 2D analysis.<br />
4. Notes<br />
1. In our experience, a time window of 30 min between completion of surgery<br />
and tissue freezing yields good protein quality for most proteomic techniques.<br />
However, if one is studying protein phosphorylation, note that phosphorylation<br />
begins to decrease significantly 20 min after completion of surgery (10).<br />
2. When freezing the tissue sample in the Histobath, avoid immediate and complete<br />
immersion in 2-methylbutane to preserve optimal tissue morphology. Hold the<br />
sample at the liquid interface with minimal immersion and wait until the OCT<br />
and the tissue slowly turn white.<br />
3. Use uncoated glass slides for LCM. Coated or electrically-charged glass slides<br />
will interfere with the detachment process of the plastic polymer and are not<br />
suitable for LCM.<br />
4. Precipitate from hematoxylin can contaminate the surface of the tissue, so filter<br />
the staining solutions. Add one tablet of protease inhibitor to each staining bath (we use<br />
Complete, from BMB). Do not add protease inhibitor to alcohol baths. If using<br />
the HistoGene staining kit (Arcturus) for frozen sections, this is not necessary.<br />
5. Change all the staining and alcohol solutions after staining 20 slides.<br />
6. Poor transfers may result if the 100% ethanol has absorbed water. Increasing the<br />
incubation time in xylene often improves transfer.<br />
7. When specific cells need to be microdissected and these cannot be identified<br />
morphologically, the cells of interest can be immunostained with specific mAbs<br />
against proteins highly expressed on those cells (immunophenotype). It is critical<br />
to expedite the immunostaining procedure because the shorter the immunostaining<br />
time, the better the protein quality. One must avoid exceeding 30 min for<br />
the total immunostaining and dehydration procedure. In the past, we used<br />
the immunoperoxidase technique with DAB labeling (6), but it was difficult<br />
to perform quickly enough to preserve optimal protein integrity. Also, manual<br />
microdissection of DAB-labeled cells with the PixCell II is extremely tedious and<br />
impractical. The immunofluorescence staining method (7) is faster and easier to<br />
perform. This method, coupled with the AutoPix microscope, which has dark-field<br />
fluorescence and automation capabilities, is the ideal procedure for immunocapture.<br />
Because Cy3-streptavidin binds directly to the biotin-labeled antibody, there is<br />
no need for a secondary antibody, which shortens the staining time.<br />
It is recommended to run negative control staining; use a biotinylated control<br />
antibody from the same animal species and of the same isotype as your primary<br />
antibody. Dilute to the same working concentration as the primary antibody.<br />
8. Do not forget to wear gloves every time while performing LCM, including when<br />
handling the plastic caps.<br />
9. The thickness of the tissue section is a critical parameter for effective LCM. In<br />
our experience (using the PixCell II and the AutoPix instruments by Arcturus),<br />
8 μm is the optimal thickness for LCM.<br />
10. Smooth out the surface of the tissue section with a Prep Strip before placing the<br />
slide on the LCM instrument. This improves the efficiency and uniformity of<br />
the microdissection process.<br />
11. The main factors affecting the efficiency of LCM include the energy, the time<br />
of exposure, and the diameter of the laser beam. Regarding the diameter, when<br />
using the PixCell II, the smallest ring is 7 μm, the medium ring is 15 μm, and the widest<br />
ring is 30 μm. Very often, we have used the medium ring (15 μm), which lifts<br />
about three cells with each shot. When trying to microdissect single cells with<br />
the PixCell II, one must use the smallest (7 μm) diameter ring, but we found this<br />
frustrating. With the AutoPix, we have observed that microdissection of individual<br />
cells is better achieved by setting the laser ring at 10 μm diameter, below which it<br />
becomes very difficult to lift up cells efficiently. A 30-μm diameter laser is very<br />
effective for microdissection of whole glands and other large tissue structures.<br />
Regarding the other two parameters, the optimization depends on the tissue<br />
type. For instance, for prostate tissue, an energy of 80 mW with a duration<br />
of 0.5 ms is usually effective for a medium-size ring (15 μm). These parameters<br />
are tuned by trial and error, progressively<br />
adjusting the energy and the time of exposure for the desired diameter, which<br />
in turn depends on the microdissection task (single cells vs. medium- or<br />
large-size tissue structures).<br />
12. Another factor that affects the effectiveness of LCM is the time the tissue section<br />
has been dry after the staining and dehydration procedure. Ideally, the tissue<br />
should be stained and microdissected within 1 h if possible. One must avoid<br />
having the slide under LCM for more than 4 h. If microdissecting many tissues,<br />
stain only four slides at a time.<br />
13. When capturing images before and after microdissection for documentation<br />
purposes, make sure the image on the monitor is in focus, because that is the<br />
image that will be captured. Sometimes the image is in focus through the microscope<br />
but out of focus on the monitor. In a typical experiment, you will capture the image<br />
before and after firing the laser, which provides a record of the effectiveness in<br />
removing the cell targets. You can also capture the image of microdissected<br />
cells from the polymer cap.<br />
14. Avoid allowing the LCM caps to become excessively crowded. When using<br />
the 15-μm laser ring, each shot captures about three cells. One should<br />
expect around 3000 cells for each 1000 shots, which is about the right load for a<br />
single cap.<br />
15. LCM caps can be viewed under a dissecting microscope to ensure that all cells<br />
have been removed from the polymer film after the lysing procedure.<br />
16. Depending on the cell type, vigorous vortexing and sonication may be necessary<br />
to completely lyse the cells after they are removed from the cap.<br />
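Notes 11 and 14 give enough figures to estimate how many laser shots and caps a target cell number will require. The following is a rough planning sketch under those figures (about three cells per shot with the 15-μm ring, about 1000 shots per cap); the function names are hypothetical, and actual yields will vary with tissue type and ring size.

```python
import math

# Note 14: the 15-um ring lifts about three cells per shot.
CELLS_PER_SHOT = 3
# Note 14: about 1000 shots (roughly 3000 cells) is the right load for one cap.
SHOTS_PER_CAP = 1000


def shots_needed(target_cells, cells_per_shot=CELLS_PER_SHOT):
    """Smallest number of laser shots expected to yield at least target_cells."""
    return math.ceil(target_cells / cells_per_shot)


def caps_needed(shots, shots_per_cap=SHOTS_PER_CAP):
    """Number of caps required if no cap receives more than shots_per_cap shots."""
    return math.ceil(shots / shots_per_cap)


# Example: the approximate minimum of 25,000 cells quoted for 2D analysis.
shots = shots_needed(25_000)
print(shots, "shots on", caps_needed(shots), "caps")
```

Under these assumptions, the 25,000 cells quoted for 2D analysis correspond to 8334 shots spread over 9 caps, which helps in judging whether the work fits within the drying-time limits of Note 12.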
References<br />
1. Prieto, D.A., Hood, B.L., Darfler, M.M., Guiel, T.G., Lucas, D.A., Conrads, T.P.,<br />
Veenstra, D.T., and Krizman, D.B. (2005) Liquid Tissue™: proteomic profiling of<br />
formalin-fixed tissues. Biotechniques 38: 32–5.<br />
2. Emmert-Buck, M.R., Bonner, R.F., Smith, P.D., Chuaqui, R.F., Zhuang, Z.,<br />
Goldstein, S.R., Weiss, R.A., and Liotta, L.A. (1996) Laser capture microdissection.<br />
Science 274: 998–1001.<br />
3. Espina, V., Milia, J., Wu, G., Cowherd, S., Liotta, L.A. (2006) Laser capture<br />
microdissection. Methods Mol Biol 319: 213–29.
4. Best, C.J., and Emmert-Buck, M.R. (2001) Molecular profiling of tissue samples<br />
using laser capture microdissection. Expert Rev Mol Diagn. 1: 53–60.<br />
5. Ornstein, D.K., Gillespie, J.W., Paweletz, C.P., Duray, P.H., Herring, J.,<br />
Vocke, C.D., Topalian, S.L., Bostwick, D.G., Linehan, W.M., Petricoin, E.F., III,<br />
and Emmert-Buck, M.R. (2000) Proteomic analysis of laser capture microdissected<br />
human prostate cancer and in vitro prostate cell lines. Electrophoresis 21:<br />
2235–42.<br />
6. Fend, F., Emmert-Buck, M.R., Chuaqui, R., Cole, K., Lee, J., Liotta, L.A., and<br />
Raffeld, M. (1999) Immuno-LCM: laser capture microdissection of immunostained<br />
frozen sections for mRNA analysis. Am J Pathol 154: 61–6.<br />
7. Murakami, H., Liotta, L., Star, R.A. (2000) IF-LCM: laser capture microdissection<br />
of immunofluorescently defined cells for mRNA analysis rapid communication.<br />
Kidney Int 58(3): 1346–53.<br />
8. Cazares, L.H., Adam, B.L., Ward, M.D., Nasim, S., Schellhammer, P.F.,<br />
Semmes, O.J., and Wright, G.L., Jr (2002) Normal, benign, preneoplastic, and<br />
malignant prostate cells have distinct protein expression profiles resolved by<br />
surface enhanced laser desorption/ionization mass spectrometry. Clin Cancer Res<br />
8: 2541–52.<br />
9. Diaz, J., Cazares, L.H., Corica, A., and Semmes O. (2004) Selective capture<br />
of prostatic basal cells and secretory epithelial cells for proteomic and genomic<br />
analysis. Urol Oncol 22(4): 329–36.<br />
10. Mora, L., Buettner, R., Seigne, J., Diaz, J., Hamad, N., Garcia, R., Bowman, T.,<br />
Falcone, R., Faigurth, R., Cantor, A., Muro-Cacho, C., Livistong, S., Levitzki, A.,<br />
Kraker, A., Karras, J., Pow-Sang, J., and Jove, R. (2002) Constitutive activation of<br />
Stat3 in human prostate tumors and cell lines: direct inhibition of stat3 signaling<br />
induces apoptosis of prostate cancer cells. Cancer Research 62: 6659–66.
4<br />
Protein Profiling of Human Plasma Samples<br />
by Two-Dimensional Electrophoresis<br />
Sang Yun Cho, Eun-Young Lee, Hye-Young Kim, Min-Jung Kang,<br />
Hyoung-Joo Lee, Hoguen Kim, and Young-Ki Paik<br />
Summary<br />
Human plasma is regarded as the most complex and well-known clinical specimen that<br />
can be easily obtained; alterations in the levels of plasma proteins or their corresponding<br />
enzyme activities may reflect either a healthy or a diseased state. Given that there is<br />
no defined genomic information as to the intact protein components in plasma, protein<br />
profiling could be the first step toward its molecular characterization. Several problems<br />
exist in the analysis of plasma proteins, however. For example, the very wide dynamic range<br />
of protein concentrations, the presence of high-abundance proteins, and post-translational<br />
modifications need to be considered before proteomic studies are undertaken. In particular,<br />
efficient depletion or pre-fractionation of high-abundance proteins is crucial for the identification<br />
of low-abundance proteins that may contain potential biomarkers. After the removal<br />
of high-abundance proteins, protein profiling can be initiated using two-dimensional<br />
electrophoresis (2DE), which has been widely used for displaying the differential proteome<br />
under specific physiological conditions. Here, we describe a typical 2DE procedure for<br />
the plasma proteome in either a healthy or a diseased state (e.g., liver cancer), in which<br />
pre-fractionation and depletion are integral steps in the search for disease biomarkers.<br />
Key Words: 2-dimensional gel electrophoresis; plasma; HPPP; immunoaffinity<br />
column.<br />
Abbreviations: IEF: Isoelectric Focusing, IPG: Immobilized pH Gradient, TCA:<br />
Trichloroacetic Acid, FFE: Free Flow Electrophoresis, HPMC: Hydroxypropyl Methylcellulose,<br />
TBP: Tributylphosphine, 2DE: 2-dimensional Gel Electrophoresis, BPB:<br />
Bromophenol Blue, CHCA: α-cyano-4-hydroxycinnamic acid, LTQ: Linear Ion Trap,<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
MALDI-TOF: Matrix-assisted Laser Desorption Ionization - Time of Flight Mass<br />
Spectrometry, HPPP: Human Plasma Proteome Project.<br />
1. Introduction<br />
Human plasma is an intravascular fluid that serves as a liquid medium<br />
for blood proteins that are derived from various cells, tissues, and other<br />
biofluids (1). In fact, the components of plasma are very heterogeneous,<br />
including inorganic ions (e.g., bicarbonate, calcium), metabolic intermediates<br />
(e.g., cholesterol, glucose), and plasma proteins (e.g., albumin, globulin), which<br />
are important in maintaining body fluid balance, immune response, blood<br />
clotting, and other metabolic mechanisms of homeostasis. Plasma contains<br />
many different proteins that are primarily synthesized in the liver and are often<br />
subjected to post-translational modification (PTM) (2).<br />
Since human plasma is the most complex and well-known clinical specimen<br />
that can be easily obtained, it has been a central target for many biomedical<br />
studies (2). Alterations in the levels of plasma proteins or their corresponding<br />
enzyme activities may reflect either a healthy or a diseased state that can<br />
be monitored by various analytical tools, including biochemical assays and<br />
proteomics. Given that there is no defined genomic information as to the<br />
intact protein components in plasma, a proteomic study may be the method of<br />
choice (3,4). Recently, plasma protein profiling was conducted as part of the<br />
plasma proteome project of HUPO, termed HPPP (5). The pilot phase of HPPP<br />
produced 3020 non-redundant proteins that were found to be present in human<br />
plasma and serum (5,6).<br />
However, several points must be addressed before proteomic studies are<br />
undertaken. First, plasma proteins are believed to span the widest dynamic<br />
range of concentrations (more than 10 orders of magnitude),<br />
creating many technical obstacles in proteomic detection by mass<br />
spectrometry (MS) (2,3). For example, removing the high-abundance proteins<br />
(e.g., albumin, IgG, transferrin, fibrinogen, and IgA), which account for more than<br />
90% of all plasma protein, prior to biochemical analysis may be a big<br />
challenge and perhaps even problematic in light of plasma-derived biomarker<br />
discovery (3,7). Second, since many plasma proteins have multiple structural<br />
isoforms, a more efficient analytical system is needed to facilitate their<br />
analysis (1). Third, since many plasma proteins<br />
are synthesized as pre-proteins that are subjected to various PTMs for cellular<br />
function, more efficient methods to analyze modified proteins (e.g., glycosylated<br />
proteins) are required. For example, glycopeptides are not easily<br />
ionized completely during MS analysis because of the attached glycans, which<br />
leads to inadequate spectral data and low detection sensitivity; a strategy<br />
for the removal of glycans must therefore be considered for protein identification.<br />
Taken together, all these factors are important for the proteomic study of<br />
plasma (8).<br />
Of the problems listed above, the first to address in the protein<br />
profiling of plasma is likely to be the depletion or pre-fractionation of high-abundance<br />
plasma proteins (3,4,7). Without this depletion procedure, the identification of<br />
low-abundance proteins (including biomarkers) may not be practical. After the<br />
removal of high-abundance proteins, two-dimensional electrophoresis (2DE)<br />
may be the first step chosen to analyze plasma proteins because it is easy to<br />
perform in the laboratory. Although 2DE has several limitations in terms of<br />
reproducibility, separation of membrane or low-molecular-weight proteins, and<br />
proteins with extreme pIs (10), this technique has been widely used<br />
as a first analysis of proteins in a particular physiological state when coupled<br />
with MS (9). Recently, quantitative 2DE has been performed with a difference<br />
gel electrophoresis (DIGE) system (see the chapter by Friedman and Lilley for<br />
details), where two or three differentially staining dyes can be applied to specific<br />
protein populations to determine their quantitative changes in expression levels<br />
under a specific physiological condition (10). Thus, this chapter is intended<br />
to provide the reader with necessary information on the systematic analysis<br />
of the plasma proteome using 2DE in an attempt to search for disease<br />
biomarkers from the plasma proteins of patients with hepatocellular carcinoma<br />
(HCC) (11,12).<br />
2. Materials<br />
2.1. Preparation of Human Plasma Samples<br />
1. Blood collection tubes: BD Plus Plastic K₂EDTA (BD, 367525; 10 mL), BD<br />
Glass Serum with silica clot activator (367820, 10 mL).<br />
2. Protease inhibitor (Complete Protease Inhibitor Cocktail, Roche, 11 697 498 001,<br />
20 tablets): One tablet contains protease inhibitors (antipain, bestatin, chymostatin,<br />
leupeptin, pepstatin, aprotinin, phosphoramidon, and EDTA) sufficient for the<br />
processing of 100 mL plasma samples. Prepare 25× stock solutions in 2 mL<br />
distilled water.<br />
2.2. Depletion of High-Abundance Proteins with an Immunoaffinity<br />
Column<br />
1. HPLC system, such as the HP1100 LC system (Agilent).<br />
2. Multiple affinity removal system (MARS): LC column (Agilent, 5185-5984);<br />
Buffer A for sample loading, washing, and equilibrating (Agilent, 5185-5987);<br />
Buffer B for eluting (Agilent, 5185-5988).
2.3. Isoelectric Focusing (IEF) with Immobilized pH Gradient (IPG)<br />
Strip<br />
1. MultiPhor™ (GE Healthcare) or Protean IEF cell (Bio-Rad): Numerous commercially<br />
available isoelectric focusing units exist<br />
2. Re-swelling tray<br />
3. Mineral oil: Immobiline Dry Strip Cover Fluid (GE Healthcare)<br />
4. Power supply, such as the EPS 3501 XL power supply (GE Healthcare)<br />
5. Thermostatic circulator: Multitemp III thermostatic circulator (GE Healthcare)<br />
6. IPG strip: Immobiline Dry Strip, pH 3-10 nonlinear (NL), or pH 4.0-5.0, and pH<br />
5.5-6.7, 18 cm long, 0.5 mm thick (GE Healthcare) or with the same pH ranges<br />
for ReadyStrip IPG strip (Bio-Rad)<br />
7. Carrier ampholyte mixtures: IPG buffer or Pharmalyte, same range as the selected<br />
IPG strip<br />
8. Sample buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 0.5% (v/v) ampholyte,<br />
100 mM DTT, 40 mM Tris-HCl, pH 7.5, a trace amount of bromophenol blue<br />
(BPB)<br />
2.4. Microscale Solution Isoelectric Focusing: ZOOM®<br />
1. ZOOM® IEF Fractionator (Invitrogen, ZF10001)<br />
2. ZOOM® disks: pHs 3.0, 4.6, 5.4, 6.2, 7.0, and 10.0 [Invitrogen, ZD series (e.g.,<br />
ZD10030 for pH 3.0)]<br />
3. IEF Anode Buffer (50X) (Novex, LC5300, 100 mL)<br />
4. IEF Cathode Buffer (10X) (Novex, LC5310, 125 mL)<br />
5. Anode buffer: 8.4 g urea, 3.0 g thiourea, 3.3 mL Novex® IEF Anode Buffer<br />
(50X). Add water to a final volume of 20 mL.<br />
6. Cathode buffer: 8.4 g urea, 3.0 g thiourea, 3.3 mL Novex® IEF Cathode Buffer<br />
(10X). Add water to a final volume of 20 mL.<br />
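The anode and cathode buffer recipes give urea and thiourea by mass. Converting them to molar concentrations shows that both buffers contain roughly the same chaotrope concentrations as the sample buffer in Section 2.3 (7 M urea, 2 M thiourea). The sketch below is a worked check only; the molecular weights are standard reference values rather than figures from this chapter, and the function name is hypothetical.

```python
# Standard molecular weights in g/mol (reference values, not from the chapter).
UREA_MW = 60.06
THIOUREA_MW = 76.12


def molarity(mass_g, mw_g_per_mol, final_volume_ml):
    """Molar concentration of a solute dissolved to the given final volume."""
    return mass_g / mw_g_per_mol / (final_volume_ml / 1000.0)


# Recipe values shared by the anode and cathode buffers (Section 2.4, items 5 and 6).
urea_m = molarity(8.4, UREA_MW, 20)
thiourea_m = molarity(3.0, THIOUREA_MW, 20)

print(f"urea: {urea_m:.2f} M, thiourea: {thiourea_m:.2f} M")
```

The same function can be used in reverse to scale the recipe: multiplying the target molarity by the molecular weight and the final volume in liters gives the mass to weigh out.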
2.5. Fractionation of Plasma Samples by Free Flow Electrophoresis<br />
(FFE)<br />
1. ProTeam TM FFE instrument (Tecan)<br />
2. 1% 2-(4-sulfophenylazo)-1,8-dihydroxy-3,6-naphthalenedisulfonic acid (SPADNS)<br />
(Tecan, 517074)<br />
3. 0.8% hydroxypropyl methylcellulose (HPMC) (Tecan, 5170709)<br />
4. pI markers: mixture of pI markers that indicate pHs 4.2, 5.1, 6.3, 7.4, 8.7, and<br />
10.1 (Tecan, 5170705)<br />
5. Prolyte™ 1, Prolyte™ 2, and Prolyte™ 3 (Tecan, 0309081, 0309102, and<br />
0309093)<br />
6. Anodic stabilization medium (Inlet I₁): 14.5% (w/w) glycerol, 8 M urea, 0.03%<br />
(w/w) HPMC, 100 mM H₂SO₄<br />
7. Separation medium 1 (Inlet I₂): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w)<br />
HPMC, 14.5% (w/w) Prolyte™ 1<br />
8. Separation medium 2 (Inlet I 3−5 ): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w)<br />
HPMC, 14.5% (w/w) Prolyte TM 2<br />
9. Separation medium 3 (Inlet I 6 ): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w)<br />
HPMC, 14.5% (w/w) Prolyte TM 3<br />
10. Cathodic stabilization medium (Inlet I 7 ): 14.5% (w/w) glycerol, 8 M urea, 0.03%<br />
(w/w) HPMC, 100 mM NaOH<br />
11. Counter flow medium (Inlet I 8 ): 14.5% (w/w) glycerol, 8 M urea<br />
12. Anodic circuit electrolyte: 100 mM H 2 SO 4<br />
13. Cathodic circuit electrolyte: 100 mM NaOH<br />
2.6. Preparation of 2D Gels<br />
1. Gradient former: One of the two Bio-Rad models can be used in this step: Model<br />
385 (30-100 mL capacity) or Model 395 (100-750 mL capacity).<br />
2. Orbital shaker with speed controller.<br />
3. SDS-PAGE: Protean II xi multicell and multicasting chamber (Bio-Rad) or Ettan<br />
DALT twelve large vertical system (GE Healthcare).<br />
4. 5× Tris-HCl buffer: Dissolve 227 g Tris into 800 mL distilled water and adjust<br />
the buffer to pH 8.8 with HCl (∼30 mL). Add distilled water to a final volume<br />
of 1 L.<br />
5. 5× Gel buffer: Dissolve 15 g Tris, 72 g glycine, and 5 g sodium dodecyl sulfate<br />
(SDS) into 800 mL distilled water and add distilled water to a final volume<br />
of 1 L.<br />
6. SDS Equilibration buffer contains 6 M urea, 2% (w/v) SDS, 5× gel buffer (pH<br />
8.8), 50% (v/v) glycerol, and 2.5% (w/v) acrylamide monomer.<br />
7. Acrylamide stock solution: Acrylamide/Bis-acrylamide 37.5:1, 40% (w/v)<br />
solution (Amresco, M157, 500 mL).<br />
8. Fixing solution: 40% (v/v) methanol and 5% (v/v) phosphoric acid in distilled<br />
water.<br />
9. Coomassie blue G-250 staining solution: 17% (w/v) ammonium sulfate, 3% (v/v)<br />
phosphoric acid, 34% (v/v) methanol, and 0.1% (w/v) Coomassie blue G-250 in<br />
distilled water.<br />
2.7. 2D Gel Image Analysis<br />
1. Scanner with transparency unit, such as Bio-Rad GS710 or GS800<br />
2. 2D gel image analysis program: Image Master Platinum 5 (GE Healthcare),<br />
PDQuest 7.3.0 (Bio-Rad), or Progenesis Discovery (NonLinear Dynamics, Ltd.)<br />
2.8. Destaining, In-gel Deglycosylation, and In-gel Tryptic Digestion<br />
1. Speed Vac (Heto)<br />
2. PNGase F stock solution for in-gel deglycosylation: PNGase F (Glyko, Inc., GKE-5010).<br />
Dilute 1 μL PNGase F (2 mU) with 2.5 mL 1× N-glycanase incubation<br />
buffer [20 mM sodium phosphate, pH 7.5, and 0.02% (w/v) sodium azide]<br />
3. Sequencing-grade modified trypsin (Promega, V5111, 100 μg, 18,100 U/mg)<br />
4. 50 mM ammonium bicarbonate<br />
2.9. Desalting of Peptides and MALDI Plating<br />
1. GELoader tips (Eppendorf, No. 0030 048.083, 20 μL capacity)<br />
2. Poros 10 R2 resin (PerSeptive Biosystems, 1-1118-02, 0.8 g)<br />
3. Oligo R3 resins (PerSeptive Biosystems, 1-1339-03, 6.3 g)<br />
4. 2% (v/v) formic acid in 70% (v/v) acetonitrile (ACN)<br />
5. 0.1% (v/v) trifluoroacetic acid in 70% (v/v) ACN<br />
6. 1-mL syringe<br />
7. Matrix: α-cyano-4-hydroxycinnamic acid (CHCA)<br />
8. Opti-TOF TM 384-well insert (123 × 81 mm, 1016491, Applied Biosystems)<br />
2.10. MALDI-TOF and Peptide Mass Fingerprinting<br />
1. MALDI-TOF and MALDI-TOF/TOF: Voyager DE-Pro and 4800 MALDI<br />
TOF/TOF TM Analyzer (Applied Biosystems) equipped with a 355-nm Nd:YAG<br />
laser. The pressure in the TOF analyzer is approximately 7.6 × 10⁻⁷ Torr.<br />
3. Methods<br />
3.1. Human Plasma Sample Preparation<br />
The following protocol is conducted according to the HUPO reference<br />
sample collection protocol (13).<br />
1. Each sample pool consisted of 400 mL blood from one healthy, fasting male and<br />
one healthy, fasting postmenopausal female, and was collected into 10-mL tubes<br />
by two venipunctures, 20 tubes per venipuncture (see Note 1).<br />
2. Equal numbers of tubes and aliquots were generated with appropriate concentrations<br />
of K 2 -EDTA, lithium heparin, or sodium citrate for plasma or were permitted<br />
to clot at room temperature for 30 min to yield serum (with micronized silica as<br />
the clot activator) (see Note 2).<br />
3. The specimens were centrifuged for 10–15 min under refrigerated conditions at<br />
2–6°C.<br />
4. The resultant serum and plasma from 10 spun tubes of the same type from each<br />
donor were pooled into one secondary 50-mL conical bottom BD TM Falcon tube<br />
for each tube type.<br />
5. The secondary tube was centrifuged at 2400×g for 15 min to remove residual<br />
cellular material from serum and to prepare platelet-poor plasma from the EDTA,<br />
heparin, and citrate secondary tubes.<br />
6. Equal volumes of either serum or plasma were pooled from each secondary tube<br />
into media bottles (see Note 3).<br />
7. Serum/plasma was mixed gently and kept on ice while distributed as 20-μL<br />
aliquots into cryovials and was then frozen and stored at –70°C.
3.2. Depletion of High-abundance Proteins with an Immunoaffinity<br />
Column<br />
Many reports indicate that commercially available immunoaffinity columns<br />
coupled with an HPLC system, such as the MARS (Agilent) (2,3) or the<br />
prepacked 2-mL Seppro TM MIXED12 affinity LC column (GenWay Biotech.)<br />
(14), are a convenient way to deplete high-abundance proteins efficiently prior<br />
to molecular analysis. To deplete the six most abundant proteins (i.e., albumin,<br />
transferrin, IgG, IgA, haptoglobin, and antitrypsin) from either serum or<br />
plasma, we used MARS, which has been applied successfully to a wide variety<br />
of sample types, including cerebrospinal fluid (CSF) and follicular fluid (2,3)<br />
(see Fig. 1 ).<br />
1. Dilute human serum or plasma fivefold with Buffer A (for example: 20 μL<br />
human plasma with 80 μL Buffer A) containing the protease inhibitor stock<br />
solution (40 μL per 1 mL plasma) (see Note 4) (adopted from the manufacturer’s<br />
instructions).<br />
2. Remove the particulates with a 0.22-μm spin filter for 1 min at 16,000×g.<br />
3. Inject 75-100 μL of the diluted serum or plasma at a flow rate of 0.5 mL/min.<br />
Fig. 1. The 2DE images of total human plasma proteins that were depleted of the<br />
major six abundant proteins through MARS. Proteins were isoelectrically focused with<br />
pH 3–10 NL IPG strips in the first dimension and then resolved by 9–16% SDS-<br />
PAGE in the second dimension. (A) Whole plasma. (B) Flow through from MARS.<br />
Approximately 800 protein spots are displayed by 2DE and identified by MALDI-TOF<br />
mass spectrometry. The names of the major proteins of each gel are marked on the<br />
image (5) (from (4) with permission).
4. Collect the flow-through fractions that appear between 1.5 and 4.5 min and store<br />
them at –20°C if they are not to be analyzed immediately.<br />
5. Elute bound proteins from the column with Buffer B (elution buffer) at a flow<br />
rate of 1 mL/min for 3.5 min.<br />
6. Regenerate the column by equilibrating with Buffer A for an additional 7.4 min<br />
at a flow rate of 1 mL/min.<br />
3.3. TCA/Acetone Precipitation<br />
During 2DE, interfering compounds, such as proteolytic enzymes, salts,<br />
lipids, nucleic acids, and any residual high-abundance proteins present after<br />
depletion, must be removed or inactivated. In the case of plasma samples, the<br />
two most important parameters are salt and proteolysis. TCA/acetone precipitation<br />
is the most useful method for desalting the whole plasma and the<br />
flow-through fractions of MARS.<br />
1. Add 50% (w/v) trichloroacetic acid (TCA, Sigma, T9159) to reach a final TCA<br />
concentration of 5-8%. Mix gently by inverting the tube 5 to 6 times and incubate<br />
on ice for 2 h.<br />
2. Centrifuge the sample at 14,000×g for 15 min and discard the supernatant.<br />
3. Add 200 μL cold acetone and resuspend the protein pellet with a pipette.<br />
4. Incubate on ice for 15 min and centrifuge the sample at 14,000×g for 20 min,<br />
discard the acetone, and dry the pellet in air (see Note 5).<br />
5. Dissolve the pellet in the sample buffer for 2DE and quantify the protein concentration<br />
by the Bradford protein assay.<br />
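Step 1 of this procedure gives a target final TCA concentration but not the volume of 50% stock to add. A minimal sketch of that dilution arithmetic follows; the function name and the 100-μL example sample volume are illustrative assumptions, not part of the protocol.

```python
def tca_stock_volume_ul(sample_volume_ul, final_percent, stock_percent=50.0):
    """Volume of TCA stock (uL) to add so the mixture reaches final_percent.

    Derived from: stock_percent * v_add = final_percent * (v_sample + v_add).
    """
    if not 0 < final_percent < stock_percent:
        raise ValueError("final concentration must be between 0 and the stock concentration")
    return sample_volume_ul * final_percent / (stock_percent - final_percent)


# Hypothetical 100-uL sample, bracketing the 5-8% range given in step 1.
for target in (5.0, 8.0):
    volume = tca_stock_volume_ul(100.0, target)
    print(f"{target:.0f}% final TCA: add {volume:.1f} uL of 50% stock")
```

For a different sample volume the result scales linearly, so the same ratio can be applied directly.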
3.4. Rehydration of the IPG Gel Strip<br />
For analytical purposes, typically 0.3–1.0 mg protein can be loaded onto an<br />
18-cm-long IPG strip with a wide pH range (e.g., pH 3–10), or 0.5–2.0 mg onto an<br />
IPG strip with a narrow pH range (e.g., pH 5.5–6.7). A narrow-range IPG strip usually<br />
gives higher resolution when the proteins are analyzed by sequential<br />
IEF: first, fractionate the proteins into several pI ranges in solution<br />
with ZOOM ® disks or FFE (see Subheadings 3.6 and 3.7), and then perform<br />
IEF with IPG strips [strips spanning one pH unit are also available (e.g., pH 3.0–<br />
4.0 or pH 3.5–4.5, up to pH 6.7)]. Certain proteins appear to be trapped in the<br />
disk membrane partitions, so sample loss should be considered.<br />
1. Dilute 1.0 mg protein with the sample buffer to a final volume of 400 μL for<br />
18-cm-long IPG strips (see Note 6).<br />
2. Transfer the entire protein-containing sample buffer into the re-swelling tray.<br />
3. Peel off the protective cover from the IPG strip and slowly slide the IPG strip (gel<br />
side down) onto the sample solution. Avoid trapping air bubbles and distribute<br />
the sample solution evenly under the strips.
4. Overlay the strip with mineral oil and leave for 12–16 h at room temperature (see<br />
Note 7 for cup loading).<br />
3.5. IEF with IPG Strip<br />
1. Remove the rehydrated IPG strips that are carrying the protein samples and place<br />
them (gel side up) on the strip tray.<br />
2. Place the 2.5-cm filter papers, wetted with distilled water, on both sides of the<br />
strips at both cathodic and anodic ends. Place the strip tray on the IEF unit.<br />
3. Cover the strips entirely with mineral oil.<br />
4. Program the instrument (e.g., Multiphor II): Increase the voltage from 100 to<br />
3500 V to reach 80,000 total voltage hours (Vh) (e.g., sequentially, 300 Vh at<br />
100 V, 600 Vh at 300 V, 600 Vh at 600 V, 1000 Vh at 1000 V, and 2000 Vh at<br />
2000 V, for a total of 80,000 Vh at 3500 V) (see Notes 8 and 9).<br />
5. During IEF, the temperature is set to 20°C with a water circulator.<br />
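The voltage program in step 4 is specified in volt-hours (Vh), so the run time of each step has to be derived. The sketch below does this under one stated assumption: that the 80,000 Vh total includes the five ramp steps, so the remainder is accumulated at 3500 V. If the 80,000 Vh is instead meant to be delivered entirely at 3500 V, the final step is correspondingly longer.

```python
# Ramp steps as (voltage in V, volt-hours in Vh), taken from step 4 above.
RAMP_STEPS = [(100, 300), (300, 600), (600, 600), (1000, 1000), (2000, 2000)]
FINAL_VOLTAGE = 3500  # V
TOTAL_VH = 80_000     # assumption: this total includes the ramp steps


def step_hours(voltage, volt_hours):
    """Time in hours needed to accumulate volt_hours at a constant voltage."""
    return volt_hours / voltage


ramp_vh = sum(vh for _, vh in RAMP_STEPS)
final_vh = TOTAL_VH - ramp_vh
ramp_hours = sum(step_hours(v, vh) for v, vh in RAMP_STEPS)
final_hours = step_hours(FINAL_VOLTAGE, final_vh)
total_hours = ramp_hours + final_hours

for voltage, vh in RAMP_STEPS:
    print(f"{voltage:>5} V: {vh:>6} Vh = {step_hours(voltage, vh):.1f} h")
print(f"{FINAL_VOLTAGE:>5} V: {final_vh:>6} Vh = {final_hours:.1f} h")
print(f"total run time: {total_hours:.1f} h")
```

Under this reading, the first step lasts 3 h at 100 V, which agrees with the 3–5 h low-voltage start recommended in Note 8.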
3.6. Microscale Solution IEF: ZOOM ®<br />
To reduce typical artifacts that may occur when using narrow-range IPG<br />
strips (e.g., streaking, distortion, and loss of protein spots), one may use<br />
MicroSol-IEF (e.g., ZOOM ® , Invitrogen) prior to running 2D gels (3) (see<br />
Fig. 2). MicroSol-IEF is a preparative solution-phase IEF apparatus that<br />
is partitioned into chambers by membrane disks of defined pH (15,16). Using MicroSol-IEF,<br />
2.5–3.0 mg plasma proteins can be loaded and efficiently fractionated into five<br />
separate chambers according to their pI values.<br />
1. Add 2 μL of 99% N,N-dimethylacrylamide (DMA) to the 400-μL sample (see Subheading<br />
3.4, Step 2) for alkylation and incubate the sample on a rotary shaker for 30 min<br />
at room temperature (adopted from the manufacturer’s instructions).<br />
2. Add 4 μL of 2 M DTT to quench any excess DMA. Centrifuge at 16,000×g for<br />
20 min at 4°C.<br />
3. Preparation of protein samples: Dilute 3 mg protein to a 3250-μL volume with<br />
sample buffer. The amount of diluted sample per chamber in the ZOOM ® IEF<br />
Fractionator is 650 μL.<br />
4. Assemble the ZOOM ® IEF Fractionator according to the manufacturer’s instructions.<br />
Six disks (pHs 3.0, 4.6, 5.4, 6.2, 7.0, and 10.0) are used to create five<br />
fractions that have a range of pH 3.0–10.0.<br />
5. Add each buffer (anode or cathode) to the corresponding blank chamber.<br />
6. Remove the sample chamber cap and add 650 μL of protein sample (step 3) to<br />
each chamber.<br />
7. Fractionation can be carried out under the following conditions: 100 V for 20 min,<br />
200 V for 80 min, and 600 V for 80 min (see Note 10). The starting current is<br />
approximately 0.6 mA, which increases to approximately 1.2 mA at the beginning<br />
of the 200-V step, and the ending current is approximately 0.2 mA.<br />
8. Load the electro-focused fractions onto narrow pH range IPG strips for 2DE.
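The six disk pH values in step 4 define five chambers, and the sample volumes in step 3 follow from that layout. The sketch below works through both: which chamber a protein of a given pI would be expected to focus into, and how much sample and protein each chamber receives at loading. The function name is hypothetical, and assigning a pI that falls exactly on a disk value to the more basic chamber is an arbitrary choice made only for illustration.

```python
from bisect import bisect_right

DISK_PH = [3.0, 4.6, 5.4, 6.2, 7.0, 10.0]  # disk pH values from step 4
N_CHAMBERS = len(DISK_PH) - 1              # six disks bound five chambers


def chamber_for_pi(pi):
    """Return the (low, high) pH bounds of the chamber expected to hold a given pI.

    Returns None when the pI lies outside the pH 3.0-10.0 range of the disks.
    """
    if not DISK_PH[0] <= pi <= DISK_PH[-1]:
        return None
    idx = min(bisect_right(DISK_PH, pi), len(DISK_PH) - 1)
    return DISK_PH[idx - 1], DISK_PH[idx]


# Loading arithmetic from step 3: 3 mg protein diluted to 3250 uL.
total_volume_ul = 3250.0
total_protein_mg = 3.0
volume_per_chamber_ul = total_volume_ul / N_CHAMBERS
protein_per_chamber_mg = total_protein_mg / N_CHAMBERS

print(chamber_for_pi(5.8))
print(f"{volume_per_chamber_ul:.0f} uL and {protein_per_chamber_mg:.1f} mg per chamber at loading")
```

After focusing, the protein content of each chamber depends on the pI distribution of the sample, so the per-chamber protein figure applies only at loading.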
Fig. 2. Narrow pH range 2DE images of plasma proteins after depletion of the major<br />
six abundant proteins through MARS. After microscale solution IEF (ZOOM ® ), the pH<br />
5.5–6.2 fraction was separated on pH 5.5–6.7 IPG strips by second isoelectric focusing<br />
and then resolved on a 9–16% gel. (A) Whole 2DE image of pH 3–10 NL and pH<br />
5.5–6.7. (B) One spot on the pH 3–10 NL gel can be separated into two or more spots<br />
in the narrow pH range 2DE. (C) Many hidden spots on the pH 3–10 NL gel appear<br />
in the narrow pH range 2DE of normal and HCC plasma.
3.7. Fractionation of the Plasma Samples by Free Flow Electrophoresis<br />
To identify and isolate biomarker candidates from the plasma of patients<br />
with HCC using 2DE, high resolution is critical, and this can be achieved<br />
by narrow pH range IEF. However, narrow pH range IEF requires larger<br />
amounts of protein (e.g., 10-fold or more) to be loaded onto the IPG strip,<br />
since proteins that fall in other pH ranges are discarded.<br />
Prefractionation or depletion is therefore required prior to running<br />
IEF and the 2D gel. FFE is useful for prefractionation of plasma samples because it<br />
yields a specific fraction of interest (e.g., by pI or density). For example, if<br />
the pI of the proteins of interest is known, FFE can be used to<br />
prefractionate complex plasma accordingly. We describe here one of several<br />
procedures for prefractionation of plasma samples using FFE.<br />
1. Dissolve the TCA-precipitated flow-through fractions of MARS (∼2.0 mg) in<br />
500 μL of separation medium 3 (see Subheading 2.5) (adopted from the manufacturer’s<br />
instructions).<br />
2. Add traces of red acidic dye 2-(4-sulfophenylazo)-1,8-dihydroxy-3,6-<br />
naphthalenedisulfonic acid (SPADNS, Aldrich) to ease the optical control of the<br />
migration of sample within the separation chamber.<br />
3. FFE is carried out at 10°C using the following media (solutions marked<br />
at each inlet are applied): Anodic stabilization medium (Inlet I 1 ), separation<br />
medium 1 (Inlet I 2 ), separation medium 2 (Inlet I 3−5 ), separation medium 3<br />
(Inlet I 6 ), cathodic stabilization medium (Inlet I 7 ), and counter-flow medium<br />
(Inlet I 8 ).<br />
4. Apply the anodic circuit electrolyte to the anode and the cathodic circuit<br />
electrolyte to the cathode.<br />
5. Assemble the ProTeam TM FFE instrument (Tecan). Use a 0.4-mm spacer for the<br />
separation chamber and a flow rate of approximately 60 mL/h (Inlet I 1−7 ) and a<br />
voltage of 1500 V, which results in a current of 20–24 mA.<br />
6. Perfuse the separation chamber with the sample using the cathodal inlet at approximately<br />
0.7 mL/h (4,17). Residence time in the separation chamber is approximately<br />
33 min.<br />
7. Collect each fraction into polypropylene, 96 deep-well plates, numbered 1 (anode)<br />
through 44 (cathode) (4).<br />
8. Remove glycerol and HPMC by TCA/acetone precipitation and dissolve the<br />
proteins with sample buffer.<br />
9. Load the electro-focused samples with narrow pH to the IPG strips for 2DE.<br />
3.8. Preparation of 2D Gels<br />
1. Cast the glass plates (separated by two 1.5-mm spacers positioned along the sides)<br />
and thin plastic sheets in the multi-casting chamber (20).<br />
2. Prepare gel solution for making 10 gels (20 × 20 cm, 1.5-mm spacer, 9–16%<br />
gradient): heavy solution (66.7 mL of 5× Tris-HCl buffer, 75 mL of a 40%
acrylamide stock solution, 0.7 mL of 10% ammonium persulfate (APS), 70 μL<br />
TEMED, and 191.7 mL of 50% glycerol), light solution (66.7 mL of 5× Tris-HCl<br />
buffer, 141.7 mL of a 40% acrylamide stock solution, 0.7 mL of 10% APS, 70 μL<br />
TEMED, and 125 mL distilled water).<br />
3. Assemble the gradient maker and peristaltic pump. Pour the light gel solution into<br />
the mixing chamber (close to the casting chamber) and the heavy gel solution<br />
into the reservoir chamber of the gradient maker. Operate the magnetic stirrer in<br />
the mixing chamber. Turn on the peristaltic pump until the gel solution reaches<br />
0.5-1.0 cm below the end of the glass plates (∼5 min). Check the flow rate, which<br />
should be between 100-120 mL/min.<br />
4. After the gel solution is poured, overlay the gel solution with distilled water to<br />
exclude air and to ensure a level surface on the top of the gel.<br />
5. Allow polymerization to occur overnight at room temperature.<br />
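The recipe in step 2 can be checked by computing the final acrylamide concentration of each solution from the listed volumes. This is a minimal sketch that uses the volumes and the "heavy" and "light" labels exactly as they are written in step 2; the dictionary keys are hypothetical names chosen for illustration. Comparing the output with the intended 9–16% gradient is a useful sanity check before casting.

```python
# Volumes in mL, copied from step 2; labels follow the step as written.
HEAVY = {"tris_5x": 66.7, "acrylamide_40pct": 75.0, "aps_10pct": 0.7,
         "temed": 0.07, "glycerol_50pct": 191.7}
LIGHT = {"tris_5x": 66.7, "acrylamide_40pct": 141.7, "aps_10pct": 0.7,
         "temed": 0.07, "water": 125.0}
ACRYLAMIDE_STOCK_PCT = 40.0


def final_acrylamide_pct(recipe):
    """Final acrylamide concentration (% w/v) after mixing all components."""
    total_ml = sum(recipe.values())
    return recipe["acrylamide_40pct"] * ACRYLAMIDE_STOCK_PCT / total_ml


def buffer_fold(recipe, stock_fold=5.0):
    """Final strength of the Tris-HCl buffer, as a multiple of 1x."""
    return recipe["tris_5x"] * stock_fold / sum(recipe.values())


for name, recipe in (("heavy", HEAVY), ("light", LIGHT)):
    print(f"{name}: {sum(recipe.values()):.1f} mL total, "
          f"{final_acrylamide_pct(recipe):.1f}% acrylamide, "
          f"{buffer_fold(recipe):.2f}x Tris-HCl")
```

Both solutions come to about 334 mL at approximately 1× Tris-HCl, so ten 20 × 20 cm gels share roughly 668 mL of gel solution in total.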
3.9. Equilibration of the Sample and Running of the Gel<br />
To solubilize the electro-focused proteins and to allow SDS to bind to them,<br />
it is necessary to soak the IPG strips in SDS equilibration buffer. This step<br />
is analogous to boiling the sample in SDS buffer prior to SDS-PAGE. The<br />
reducing agents dithiothreitol (DTT) and tributylphosphine (TBP) reduce<br />
disulfide bonds to free sulfhydryl groups (on cysteine residues). Alkylating agents such as<br />
iodoacetamide (IAA) prevent reoxidation of the free sulfhydryl groups (21).<br />
1. Prior to use, add approximately 158 μL TBP in 1 mL isopropanol to 100 mL<br />
SDS equilibration buffer and sonicate in a bath-type sonicator until the solution<br />
becomes transparent (see Note 11) (termed TBP equilibration buffer).<br />
2. Add 15 mL TBP equilibration buffer to each strip (gel side up) and gently shake<br />
for 25 min (TBP equilibration) (see Note 12) on an orbital shaker.<br />
3. Briefly rinse the IPG strip with 1× gel buffer and load the IPG strips onto the<br />
top of the gel and pour the agarose embedding solution (molten agarose solution<br />
with trace amounts of BPB) (see Note 13).<br />
4. Perform SDS-PAGE (40 mA/gel) until the BPB dye reaches the bottom of the<br />
gel. Keep the temperature at 10°C. The total run time for 20 × 20 cm gels is<br />
approximately 6 h.<br />
3.10. Coomassie Brilliant Blue G-250 Staining<br />
1. Fix the separated proteins in the gel in 200 mL of fixing solution for 1 h.<br />
2. Decant the fixing solution and stain the gel in Coomassie brilliant blue G-250<br />
overnight.<br />
3. Decant the staining solution.<br />
4. Wash several times (>3 times) in distilled water for more than 4 h.<br />
5. Scan the gel, then wrap the gel in plastic, and store it at 4°C.
3.11. 2D Gel Image Analysis<br />
1. Import the gel image (recommended 12–16 bit, tiff format) and convert it into an<br />
ImageMaster file (*.mel).<br />
2. Detect the protein spots and determine the volume and percentage volume of<br />
each spot. The percentage volume is the normalized value that remains relatively<br />
independent of any irrelevant variations between gels, particularly those caused<br />
by varying experimental conditions.<br />
3. Select the differentially displayed protein spots (see Fig. 3).<br />
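The percentage volume in step 2 is the volume of a spot divided by the summed volume of all detected spots on the same gel, expressed as a percentage. The sketch below illustrates why this normalization is insensitive to overall staining intensity; the spot identifiers and volumes are invented for illustration, and this is not the ImageMaster implementation.

```python
def percent_volumes(spot_volumes):
    """Normalize raw spot volumes to percentage of the total volume on one gel."""
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: 100.0 * volume / total for spot, volume in spot_volumes.items()}


# Hypothetical gels: gel_b is the same sample stained twice as intensely.
gel_a = {"spot_1": 1200.0, "spot_2": 300.0, "spot_3": 500.0}
gel_b = {spot: 2.0 * volume for spot, volume in gel_a.items()}

print(percent_volumes(gel_a))
print(percent_volumes(gel_b))
```

Both gels give the same percentage volumes, which is the property that makes the values comparable between gels run under slightly different conditions.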
3.12. Destaining, In-gel Deglycosylation, and In-gel Tryptic Digestion<br />
Most plasma proteins are glycosylated, including clotting factors, lipoproteins,<br />
and antibodies (22,23). These carbohydrate-containing proteins play<br />
major roles in normal biological functions in plasma. Because the attached glycans<br />
hinder ionization of glycopeptides during MS analysis, which may lead to inadequate<br />
spectral data and low detection sensitivity, a<br />
strategy for removing the glycans is necessary for protein identification.<br />
1. Pick (or excise) the protein spot with an end-cut yellow tip and transfer the gel<br />
piece into a 1.5-mL Eppendorf tube.<br />
2. Wash the gel piece with 100 μL distilled water.<br />
3. Add 50 μL of 50 mM NH4HCO3 (pH 7.8) and ACN (6:4), and shake for 10 min.<br />
4. Repeat step 3 until the Coomassie blue G-250 dye disappears (2 to 5 times).<br />
5. Decant the supernatant and dry the gel piece in a Speed Vac for 10 min (see<br />
Note 14).<br />
6. Add 5 μL trypsin (12.5 ng/μL in 50 mM NH4HCO3) and leave the gel piece on<br />
ice for 45 min.<br />
7. Add 10 μL of 50 mM NH4HCO3 to the gel slice.<br />
8. Incubate the gel piece at 37°C for 12 h.<br />
3.13. Desalting of Peptides and MALDI Plating<br />
1. Resin packing: Twist the column body (GELoader tip, Eppendorf) near the end of<br />
the tip and push the resin solution [Poros R2:Oligo R3 (2:1) in 70% (v/v) ACN,<br />
occasionally in a more efficient ratio of 1:1] with a 1-mL syringe. A packed resin<br />
length of 2-3 mm is suitable (18,19).<br />
2. Equilibration of the column: Add 20 μL of 2% (v/v) formic acid and push the<br />
solution through the column with the 1-mL syringe.<br />
3. Peptide binding: Add the peptide solution (supernatant of step 8 in Subheading<br />
3.12, approximately 10–12 μL) and push this solution through the column with<br />
the syringe.<br />
4. Washing: Add 20 μL of 2% (v/v) formic acid and push this solution through the<br />
column with the syringe.
Fig. 3. Detection of PTMs on the 2DE of plasma proteins. (A) 2DE images of<br />
plasma proteins that were depleted of the major six abundant proteins through MARS,<br />
untreated (left) and alkaline phosphatase (AP)-treated (right). (B) One of the<br />
differentially displayed proteins after treatment with AP. (C) Data-dependent neutral<br />
loss scan spectrum of sequence KEPCVESLVSpQYFQTVTDYGKD corresponding to<br />
the phosphorylated apolipoprotein A-II precursor.
5. MALDI spotting: Add 1 μL matrix solution [10 mg/mL CHCA in 70% (v/v) ACN<br />
and 2% (v/v) formic acid] and directly spot the eluted peptides and matrix mixture<br />
onto the MALDI plate (Opti-TOF TM 384-well Insert, Applied Biosystems).<br />
6. Reuse the column: Add 20 μL of 100% ACN and push this solution through the<br />
column with the syringe and repeat step 2 for equilibration of the column.<br />
3.14. MALDI-TOF and Peptide Mass Fingerprinting<br />
1. Perform peptide mass fingerprinting (PMF) with the Voyager DE-PRO or<br />
4800 MALDI-TOF/TOF mass spectrometer (Applied Biosystems).<br />
2. Obtain the mass spectra in reflectron/delayed extraction mode with an accelerating<br />
voltage of 20 kV and sum data from either 500 laser pulses (4800 MALDI-<br />
TOF/TOF) or 100 laser pulses (Voyager DE-PRO).<br />
3. Calibrate the spectrum with tryptic auto-digested peaks (m/z 842.5090 and<br />
2211.1046) and obtain monoisotopic peptide masses with Data Explorer 3.5<br />
(PerSeptive Biosystems).<br />
4. Search the Swiss-Prot and NCBInr databases with the Matrix Science search<br />
engine (http://www.matrixscience.com).<br />
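Step 3 uses the two trypsin autolysis peaks as internal calibrants. As a rough illustration of what an internal two-point calibration does, the sketch below fits a straight line through the two observed calibrant masses and applies it to other peaks. This is a deliberate simplification and an assumption on our part: instrument software such as Data Explorer calibrates on flight time rather than with a linear m/z correction, and the observed values used here are invented.

```python
# Reference m/z values of the trypsin autolysis peaks, from step 3.
REF_LOW = 842.5090
REF_HIGH = 2211.1046


def two_point_calibration(observed_low, observed_high):
    """Return (slope, intercept) mapping observed m/z onto the reference scale."""
    slope = (REF_HIGH - REF_LOW) / (observed_high - observed_low)
    intercept = REF_LOW - slope * observed_low
    return slope, intercept


def calibrate(mz, slope, intercept):
    """Apply the linear correction to one observed m/z value."""
    return slope * mz + intercept


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Hypothetical observed positions of the two calibrant peaks.
slope, intercept = two_point_calibration(842.60, 2211.30)
print(f"before: {ppm_error(842.60, REF_LOW):.0f} ppm")
print(f"after:  {ppm_error(calibrate(842.60, slope, intercept), REF_LOW):.1f} ppm")
```

By construction the two calibrant peaks map exactly onto their reference values; the benefit for PMF comes from applying the same correction to the sample peaks that lie between them.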
3.15. Profiling of PTMs on Selected Spots<br />
Although shotgun proteomics approaches that use various labeling techniques (e.g.,<br />
SILAC and iTRAQ) are useful for high-throughput protein identification,<br />
they have many limitations for PTM analysis. In contrast, 2D gels usually<br />
display proteins with PTMs, or isoforms of a given protein, as spots at<br />
different positions on a single gel, which allows further characterization of<br />
their molecular features with the aid of high-resolution LC-MS/MS. For<br />
example, in a typical 2D gel of plasma, the phosphorylated forms of certain<br />
proteins can be easily detected as a ladder of spots that results from their different<br />
pIs. Figure 3 shows the localization of the exact phosphorylation site of the<br />
apolipoprotein A-II precursor. As seen in the figure, there is a clear difference<br />
between spots that are alkaline phosphatase (AP)-treated and those that are<br />
untreated in the 2D gel: the treated group is shifted to a more<br />
basic position. The phosphorylation sites of these proteins can be determined<br />
using multidimensional MS (MS 2 and MS 3 ). Here, we describe the procedure<br />
for identification of phosphorylated proteins by 2DE coupled to MS.<br />
1. Desalt the MARS-treated (high-abundance proteins depleted) plasma sample<br />
using Amicon Ultra-15 (molecular weight cutoff, 5 kDa;<br />
Millipore).<br />
2. Dephosphorylation is carried out overnight at 37°C in a solution of 0.4%<br />
ammonium carbonate buffer (pH 8.5) with 24 ng/μL calf intestine AP in 0.4%<br />
NH4HCO3.<br />
3. The reaction is stopped by freeze drying for further analysis.
4. Execute 2DE, picking, extraction, and desalting of peptides under the same<br />
conditions (see Subheadings 3.8-3.13).<br />
5. Dissolve the extracted and desalted peptides in 10 μL of LC-MS/MS<br />
solution [0.4% (v/v) acetic acid and 0.005% (v/v) heptafluorobutyric acid<br />
(HFBA)].<br />
6. Nano LC-MS/MS analysis is then performed on an Agilent Nano HPLC system<br />
(Agilent) and LTQ mass spectrometer (Thermo Electron, San Jose, CA).<br />
7. The capillary column used for LC-MS/MS analysis (150 mm × 0.075 mm)<br />
was obtained from Proxeon (Odense M, Denmark), and the slurry was packed<br />
in-house with a 5-μm, 100-Å pore size Magic C18 stationary phase (Michrom<br />
Bioresources, Auburn, CA).<br />
8. The mobile phase A for LC separation was 0.4% acetic acid and 0.005% HFBA<br />
in deionized water (Cascada, Pall, USA), and the mobile phase B was 0.4%<br />
acetic acid and 0.005% HFBA in ACN.<br />
9. The sample obtained from the Oasis HLB (Waters, USA) desalting step and<br />
Nanosep (Pall, USA) filtering was loaded onto the LC column.<br />
10. The chromatography gradient was designed to provide a linear increase from<br />
5% B to 35% B over 50 min and from 40% B to 60% B over 20 min and from<br />
60% B to 80% B over 5 min. The flow rate was maintained at 300 nL/min.<br />
11. The mass spectra were acquired using data-dependent acquisition with a full mass<br />
scan (400-1800 m/z) followed by MS/MS scans. Each MS/MS scan acquired<br />
was an average of three microscans on LTQ.<br />
12. The temperature of the ion transfer tube was controlled at 200°C, and the spray<br />
was 2.0–3.0 kV. The normalized collision energy was set at 35% for MS2.<br />
13. To determine the exact position of the phosphorylation site, the automated<br />
neutral loss MS3 scan was employed, which relies on the observed behavior<br />
of phosphopeptides subjected to MS/MS analysis in an ion trap. If the MS/MS<br />
scan produces a fragment that corresponds to neutral loss of the phosphate group<br />
from the precursor (a decrease in m/z of 98 with charge state 1+, 49 with<br />
charge state 2+, and 32.6 with charge state 3+), an MS3 scan of the product ion<br />
is initiated (see Note 15).<br />
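The three m/z values quoted in step 13 are the mass of the lost phosphoric acid divided by the precursor charge. The following minimal sketch reproduces them and shows how a precursor/fragment pair could be tested for the neutral loss. The monoisotopic mass of H3PO4 is a standard value, while the 0.5 m/z tolerance and the example peaks are assumptions chosen for illustration, not instrument settings from the protocol.

```python
H3PO4_MONOISOTOPIC = 97.9769  # Da, standard monoisotopic mass of phosphoric acid


def neutral_loss_mz(charge):
    """Expected m/z decrease on loss of H3PO4 from a precursor of a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return H3PO4_MONOISOTOPIC / charge


def is_neutral_loss(precursor_mz, fragment_mz, charge, tolerance=0.5):
    """True if the fragment lies one H3PO4 loss below the precursor (hypothetical tolerance)."""
    observed_loss = precursor_mz - fragment_mz
    return abs(observed_loss - neutral_loss_mz(charge)) <= tolerance


for z in (1, 2, 3):
    print(f"{z}+: {neutral_loss_mz(z):.2f}")

# Hypothetical doubly charged precursor at m/z 900.0 with a fragment at m/z 851.0.
print(is_neutral_loss(900.0, 851.0, charge=2))
```

A fragment that passes this test is the ion that would be selected for the subsequent MS3 scan.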
4. Notes<br />
1. Donors were tested and determined negative for HIV-1 and HIV-2 antibodies,<br />
HIV-1 antigen (HIV-1), Hepatitis B surface antigen (HBsAg), Hepatitis B core<br />
antigen (anti-HBc), Hepatitis C virus (anti-HCV), HTLV-I/II antibody (anti-<br />
HTLV-I/II), and syphilis.<br />
2. No protease inhibitor cocktails were used. This procedure required 2 h at 2–6°C.<br />
3. Approximately 10% of the sample was left at the bottom of the secondary tube<br />
to ensure that no cellular material was collected.<br />
4. If an excess of protease inhibitors is used, the resolving power of protein spots in<br />
the 2D gel will be decreased, and the borders of the spots will be unclear.<br />
5. If protein pellets are dried completely in the Speed Vac, they will not redissolve<br />
in sample buffer. Pellets should be air dried for 15–30 min.<br />
6. To ensure complete dissolution of the sample buffer components, it is usually recommended<br />
to warm the sample buffer to room temperature. Sample buffer that contains<br />
proteins should not be heated, to avoid carbamylation of the proteins by isocyanate,<br />
which is formed by the decomposition of urea and may lead to charge<br />
heterogeneities.<br />
7. Cup loading: Rehydrate the IPG gel strip with 350 μL sample buffer (proteins<br />
are not included), and load the 100-μL protein sample in sample buffer in the<br />
sample cup. High salt concentrations are better tolerated by cup loading.<br />
8. Apply low voltages (100 V) at the beginning of the run for 3–5 h. Replace the<br />
filter paper (for desalting purposes) at the end of the run.<br />
9. After 1D (first dimension) is run, IPG strips that were not immediately used for<br />
2D (second dimension) run can be preserved at –80°C for several months.<br />
10. When electrical current passes through the system, BPB dye starts to migrate toward<br />
the anode reservoir, which eventually results in a change in the color of the<br />
anode buffer (to yellow).<br />
11. Concentrated TBP reacts violently with organic matter. All procedures for<br />
preparing TBP stock solutions should be done in a fume hood. Store the TBP<br />
stock solution in the dark at 4°C. Do not store it longer than 2 weeks.<br />
12. DTT/IAA equilibration procedure: For reduction and alkylation of proteins,<br />
the DTT/IAA equilibration procedure is also useful to replace the use of TBP<br />
equilibration procedure. Divide the SDS equilibration buffer into two 50-mL<br />
aliquots. Add 1 g DTT to the first aliquot and 1.25 g IAA to the second aliquot.<br />
Add 10 mL of the DTT equilibration buffer to each strip and place on a shaker<br />
for 10 min. Decant the DTT equilibration buffer and shake with 10 mL of the<br />
IAA equilibration buffer for another 10 min.<br />
13. To prepare the agarose embedding solution, dissolve 1 g of agarose in 100 mL<br />
of small gel buffer and melt in a microwave on medium power. For complete<br />
melting of the agarose solution, heat the agarose solution in short intervals with<br />
occasional swirling to mix the solution.<br />
14. In-gel deglycosylation: After destaining, one may remove the glycan groups<br />
of glycoproteins prior to trypsin digestion to obtain peptides of the highest purity.<br />
Rehydrate gel spots (see Subheading 3.12, step 5) with 10 μL of PNGase F<br />
stock solution (10 μU) and incubate for 3 h at 37°C. Decant the supernatant<br />
including the glycans. Wash the gel piece with 50 μL 50 mM NH4HCO3 (pH<br />
7.8) and ACN (6:4). Dry the gel piece in a Speed Vac.<br />
15. The SEQUEST software was used to identify the peptide sequences:<br />
DeltaCn ≥ 0.1 and Rsp ≤ 4; Xcorr ≥ 1.9 with charge state 1+, Xcorr ≥ 2.2 with<br />
charge state 2+, and Xcorr ≥ 3.75 with charge state 3+ were used as cutoffs for<br />
peptide identification.<br />
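The cutoffs in Note 15 amount to a charge-dependent filter on SEQUEST scores. A minimal sketch of such a filter is given below; it operates on plain numbers rather than on SEQUEST output files, the function name and example scores are hypothetical, and rejecting charge states that the note does not list is an assumption.

```python
# Minimum Xcorr per precursor charge state, from Note 15.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.1
RSP_MAX = 4


def passes_cutoffs(charge, xcorr, delta_cn, rsp):
    """Return True if a peptide match meets every cutoff listed in Note 15.

    Charge states without a listed Xcorr cutoff are rejected (an assumption).
    """
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        return False
    return xcorr >= threshold and delta_cn >= DELTA_CN_MIN and rsp <= RSP_MAX


# Hypothetical peptide-spectrum matches: (charge, Xcorr, DeltaCn, Rsp).
matches = [(2, 2.5, 0.15, 1), (3, 3.0, 0.20, 1), (1, 2.0, 0.05, 2)]
accepted = [m for m in matches if passes_cutoffs(*m)]
print(accepted)
```

Only the first match is retained here: the second falls below the 3+ Xcorr cutoff and the third fails the DeltaCn cutoff.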
Acknowledgments<br />
This study was supported by a grant from the Korean Health 21 R&D project,<br />
Ministry of Health & Welfare, Republic of Korea (A030003 to YKP).
References<br />
1. Putnam, F. W. (ed) (1987) The Plasma Proteins, Academic Press, New York.<br />
2. Anderson, N. L., and Anderson, N. G. (2002) The human plasma proteome: history,<br />
character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–867.<br />
3. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery<br />
from the plasma proteome using multidimensional fractionation proteomics. Curr.<br />
Opin. Chem. Biol. 10, 42–49.<br />
4. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K.,<br />
Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and<br />
Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human<br />
plasma and construction of a two-dimensional map. Proteomics 5, 3386–3396.<br />
5. Omenn, G. S., States, D. J., Adamski, M., and Blackwell, T. W. (2005). Overview<br />
of the HUPO Plasma Proteome Project: results from the pilot phase with 35<br />
collaborating laboratories and multiple analytical groups, generating a core dataset<br />
of 3020 proteins and a publicly-available database. Proteomics 5, 3226–3245.<br />
6. States, D. J., Omenn, G. S., Blackwell, T. W., Fermin, D., Eng, J., Speicher, D. W.,<br />
and Hanash, S. M. (2006) Challenges in deriving high-confidence protein identifications<br />
from data gathered by a HUPO plasma proteome collaborative study. Nat.<br />
Biotechnol. 24, 333–338.<br />
7. Yang, Z., Hancock, W. S., Chew, T. R., and Bonilla, L. (2005) A study of<br />
glycoproteins in human serum and plasma reference standards (HUPO) using<br />
multilectin affinity chromatography coupled with RPLC-MS/MS. Proteomics 5,<br />
3353–3366.<br />
8. Wang, Y., Wu, S. L., and Hancock, W. S. (2006) Approaches to the study of<br />
N-linked glycoproteins in human plasma using lectin affinity chromatography<br />
and nano-HPLC coupled to electrospray linear ion trap-Fourier transform mass<br />
spectrometry. Glycobiology 16, 514–523.<br />
9. Gorg, A., Boguth, G., Kopf, A., Reil, G., Parlar, H., and Weiss, W. (2002) Sample<br />
prefractionation with Sephadex isoelectric focusing prior to narrow pH range two-dimensional<br />
gels. Proteomics 2, 1652–1657.<br />
10. Wu, T. L. (2006) Two-dimensional difference gel electrophoresis. Methods Mol.<br />
Biol. 328, 71–95.<br />
11. Park, K. S., Kim, H., Kim, N. G., Cho, S. Y., Choi, K. H., Seong, J. K., and<br />
Paik, Y. K. (2002) Proteomic analysis and molecular characterization of tissue<br />
ferritin light chain in hepatocellular carcinoma. Hepatology 6, 1459–1466.<br />
12. Park, K. S., Cho, S. Y., Kim, H., and Paik, Y. K. (2002) Proteomic alterations of the<br />
variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular<br />
carcinoma. Int. J. Cancer 2, 261–265.<br />
13. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D.,<br />
Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P.,<br />
Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W.<br />
(2005) HUPO plasma proteome project specimen collection and handling: towards<br />
the standardization of parameters for plasma proteome samples. Proteomics 5,<br />
3262–3277.
Protein Profiling by Two-Dimensional Electrophoresis 75<br />
14. Huang, L., Harvie, G., Feitelson, J. S., Gramatikoff, K., Herold, D. A., Allen, D. L.,<br />
Amunngama, R., Hagler, R. A., Pisano, M. R., Zhang, W. W., and Fang, X. (2005)<br />
Immunoaffinity separation of plasma proteins by IgY microbeads: meeting the<br />
needs of proteomic sample preparation and analysis. Proteomics 5, 3314–3328.<br />
15. Herbert, B. and Righetti, P. G. (2000) A turning point in proteome analysis: sample<br />
prefractionation via multicompartment electrolyzers with isoelectric membranes.<br />
Electrophoresis 21, 3639–3648.<br />
16. Miklos, G. L. and Maleszka, R. (2001) Integrating molecular medicine with<br />
functional proteomics: realities and expectations. Proteomics 1, 30–41.<br />
17. Weber, G., Islinger, M., Weber, P., Eckerskorn, C., and Volkl, A. (2004)<br />
Efficient separation and analysis of peroxisomal membrane proteins using free-flow<br />
isoelectric focusing. Electrophoresis 25, 1735–1747.<br />
18. Choi, B. K., Cho, Y. M., Bae, S. H., Zouboulis, C. C., and Paik, Y. K. (2003)<br />
Single-step perfusion chromatography with a throughput potential for enhanced<br />
peptide detection by matrix-assisted laser desorption/ionization-mass spectrometry.<br />
Proteomics 3, 1955–1961.<br />
19. Gobom, J., Nordhoff, E., Mirgorodskaya, E., Ekman, R., and Roepstorff, P. (1999)<br />
A sample purification and preparation technique based on nano-scale RP-columns<br />
for the sensitive analysis of complex peptide mixtures by MALDI-MS. J. Mass<br />
Spectrom. 24, 105–116.<br />
20. Walsh, B. J., and Herbert, B. R. (1999) Casting and running vertical slab-gel<br />
electrophoresis for 2D-PAGE. Methods Mol. Biol. 112, 245–253.<br />
21. Newhall, W. J. and Jones, R. B. (1983) Disulfide-linked oligomers of the major<br />
outer membrane protein of chlamydiae. J. Bacteriol. 154, 998–1001.<br />
22. Kaufman, R. J. (1998) Post-translational modifications required for coagulation<br />
factor secretion and function. Thromb. Haemost. 79, 1068–1079.<br />
23. Tabas, I. (1999) Nonoxidative modifications of lipoproteins in atherogenesis. Annu.<br />
Rev. Nutr. 19, 123–139.
II<br />
Clinical Proteomics by 2DE and Direct<br />
MALDI/SELDI MS Profiling
5<br />
Analysis of Laser Capture Microdissected Cells<br />
by 2-Dimensional Gel Electrophoresis<br />
Daohai Zhang and Evelyn Siew-Chuan Koay<br />
Summary<br />
Laser capture microdissection (LCM) is a powerful tool for procuring near-pure<br />
populations of targeted cell types from specific microscopic regions of tissue sections,<br />
by overcoming problems due to tissue heterogeneity and minimizing intermixture and<br />
contamination by other cell types. The combination of LCM with various proteomic<br />
technologies has enabled high-throughput molecular analysis of human tumors, and<br />
provided critical tools in the search for novel disease markers and therapeutic targets. As<br />
an example, we describe the application of LCM in dissecting the tumor cells in breast<br />
cancer for macromolecular extraction and subsequent protein separation by 2-dimensional<br />
gel electrophoresis (2-D GE). The protocols and the key issues involved in preparing<br />
ethanol-fixed paraffin-embedded tissue blocks and microscopic sections, microdissecting<br />
the cells of interest using the PixCell II LCM system, extracting and separating the cellular<br />
proteins by 2-D GE, and preparing selective proteins for peptide mass analysis by mass<br />
spectrometry, are discussed. The aim is to provide a practical guide to performing high-throughput<br />
microdissection of target cells and gel-based proteomics, which can be adapted<br />
to research in cancer formation and growth.<br />
Key Words: laser capture microdissection; 2-dimensional gel electrophoresis; breast<br />
cancer; proteomics; silver staining.<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
1. Introduction<br />
Cellular proteins (collectively known as “proteomes”) are less susceptible<br />
than the transcriptome to experimental artifacts arising from the rigors of tissue<br />
collection and processing, and advances in global protein expression analysis<br />
(expression proteomics) have been used in mapping cellular pathways, identifying<br />
the molecular alterations associated with disease onset and progression<br />
and searching for potential tumor markers or drug targets in human disease,<br />
especially in cancer. However, to obtain cell-specific protein profiles, homogeneous<br />
or near-pure populations of the cells of interest, free from contamination<br />
by adjacent cell types, are prerequisites. Laser capture microdissection (LCM)<br />
was developed to enable the procurement of near-pure populations of the target<br />
cells with a greater speed and precision than is possible with manual dissection<br />
methods. LCM permits selective transfer of specific cell types, under direct<br />
microscopic visualization, from complex tissues onto a polymer film that is<br />
activated by laser pulses, whilst retaining their morphology. The homogeneity<br />
of encapsulated cells can be verified microscopically. With these inherent<br />
advantages, LCM has become a valuable research tool and has been applied to<br />
cellular and molecular studies of various cancers, including breast (1,2), colon<br />
(3), and liver (4) cancers. It is equally efficacious in procuring cell populations<br />
from both frozen tissues (3,4) and ethanol-fixed, paraffin-embedded tissues<br />
(1,5).<br />
Protein profiles of the LCM-dissected cells can be obtained by two-dimensional<br />
fluorescence difference gel electrophoresis (2-D DIGE) (6),<br />
¹⁶O/¹⁸O isotopic labeling (7), differential iodine radioisotope detection (2),<br />
isotope-coded affinity tag (iCAT) coupled with two-dimensional tandem mass<br />
spectrometry (2-D LC-MS/MS) (8), and mass spectrometry-compatible silver<br />
staining (1,9). Protein samples from LCM-dissected cells can also be applied<br />
to reverse-phase protein arrays to analyze the key cellular signaling pathways and<br />
metabolic networks (10,11). In this chapter, the in-house protocols used in<br />
the authors’ laboratory for procuring near-pure populations of breast tumor<br />
cells from clinical samples, and for the extraction, isolation, and analysis of<br />
their protein profiles, are described. These include: (1) preparation of ethanol-fixed<br />
paraffin-embedded tissue blocks; (2) microdissection using the PixCell II<br />
LCM System and cellular protein extraction; (3) protein separation by 2-D gel<br />
electrophoresis (2-D GE), silver staining, and gel image analysis; and (4) preparation<br />
of targeted proteins of interest for peptide mass analysis by tandem mass<br />
spectrometry and identification of proteins of interest via database search.<br />
2. Materials<br />
2.1. Histology—Tissue Block and Tissue Section Preparation<br />
1. 70% (v/v), 80% (v/v), 95% (v/v), 100% ethanol<br />
2. Deionized or Milli-Q water (Millipore, Bedford, MA, USA)<br />
3. Hematoxylin solution, Mayer’s (Sigma, St. Louis, MO, USA)<br />
4. Eosin Y solution (Sigma)
5. Complete, mini protease inhibitor cocktail tablets (Roche Applied Science,<br />
Pleasanton, CA, USA)<br />
6. Disposable microtome blades (Feather Safety Razor Co., Ltd., Osaka, Japan)<br />
7. Uncharged microscopic glass slides (Paul Marienfeld GmbH & Co, KG, Lauda-<br />
Koenigshofen, Germany)<br />
8. Sakura Tissue-Tek® V.I.P.™ 5 Jr tissue processor (Sakura Finetek, Inc. Japan<br />
Co., Ltd, Tokyo)<br />
9. Paraffin wax: Paraplast® tissue embedding medium; melting point 56–58°C,<br />
store at room temperature (RT) (Structure Probe, Inc., West Chester, PA, USA)<br />
10. Xylenes, Reagent Grade (Sigma)<br />
11. Embedding molds: super metal base molds, 66 mm × 54 mm × 15 mm (Surgipath<br />
Medical Industries, Richmond, IL, USA)<br />
2.2. Laser Capture Microdissection and Protein Sample Preparation<br />
1. PixCell II LCM system (Arcturus Engineering, Mountain View, CA, USA)<br />
2. CapSure transparent plastic caps (Arcturus Engineering)<br />
3. Lysis buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 1% Nonidet P-40 (NP-40),<br />
0.5% (v/v) Triton X-100, 50 mM dithiothreitol (DTT), 40 mM Tris-HCl, pH 7.5,<br />
2 mM tributyl phosphine (TBP), and 1% (v/v) IPG buffer (pH 3–10). Store at RT.<br />
4. PlusOne 2-D Clean-up Kit (GE Healthcare, San Francisco, CA, USA)<br />
5. Immobilized pH gradient (IPG) buffer (pH 3–10) (GE Healthcare)<br />
6. PlusOne 2-D Quantitation Kit (GE Healthcare)<br />
2.3. Isoelectric Focusing (IEF) and Sodium Dodecyl<br />
Sulfate-Polyacrylamide Gel Electrophoresis (SDS-PAGE)<br />
1. Ettan™ IPGphor™ IEF electrophoresis unit (GE Healthcare)<br />
2. Ceramic strip holders and Ettan™ IPGphor™ Strip Holder Cleaning Solution<br />
(GE Healthcare)<br />
3. Immobiline™ IPG DryStrips (18 cm, pH 3–10, NL) (GE Healthcare)<br />
4. DryStrip Cover Fluid (GE Healthcare)<br />
5. Sample rehydration buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 1%<br />
(w/v) NP-40, 1% (v/v) IPG buffer, 50 mM DTT. Add DTT fresh to the<br />
rehydration buffer just before use. Store at RT.<br />
6. Equilibration buffer A (prepare 10 ml for each strip): 6 M urea, 30% glycerol,<br />
2% SDS, 1% DTT, 50 mM Tris-HCl, pH 8.8. Add DTT to the stock solution<br />
just before use.<br />
7. Equilibration buffer B (prepare 10 ml for each strip): 6 M urea, 30% glycerol,<br />
2% SDS, 250 mg (2.5%, w/v) iodoacetamide (IAA), 50 mM Tris-HCl, pH 8.8.<br />
Add IAA to the stock solution just before use.<br />
8. 10% SDS-acrylamide gel: 33 ml acrylamide/bis (30% T, 5% C) (Bio-Rad<br />
Laboratories, Hercules, CA, USA), 25 ml Tris (1.5 M, pH 8.8), 1 ml 10% (w/v)<br />
SDS, 0.5 ml 10% (w/v) ammonium persulfate (freshly prepared on the day of<br />
use), 35 μl TEMED (Bio-Rad). Make up to 100 ml with Milli-Q water.
9. Water-saturated isobutanol: Shake equal volumes of Milli-Q water and isobutanol<br />
in a glass bottle and allow the mixture to separate. Transfer the top layer<br />
to a new bottle and store at RT.<br />
10. Agarose sealing solution: Dissolve 0.5% low-melting-point agarose and 0.1%<br />
(w/v) bromophenol blue in 1× SDS-PAGE running buffer. Store at RT.<br />
11. SDS-PAGE running buffer: 25 mM Tris, 198 mM glycine, 0.2% (w/v) SDS,<br />
pH 8.3<br />
12. PROTEAN™ II xi Cell system (Bio-Rad)<br />
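The recipes above mix percentage and volume notation, so a quick arithmetic check can help when weighing out reagents or scaling a recipe. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the 1% DTT in equilibration buffer A is a w/v percentage (the recipe does not say).

```python
def grams_for_wv(percent_wv, volume_ml):
    """Mass in grams for a % (w/v) solution; 1% (w/v) is 1 g per 100 ml."""
    return percent_wv * volume_ml / 100.0


def final_percent(stock_percent, stock_ml, total_ml):
    """Final concentration after diluting stock_ml of a stock to total_ml."""
    return stock_percent * stock_ml / total_ml


# Equilibration buffers, 10 ml per strip.
dtt_g = grams_for_wv(1.0, 10)   # buffer A, assuming 1% means w/v
iaa_g = grams_for_wv(2.5, 10)   # buffer B: 2.5% (w/v) IAA

# SDS-acrylamide gel: 33 ml of 30% T stock and 1 ml of 10% SDS in 100 ml.
gel_percent_t = final_percent(30.0, 33, 100)
gel_percent_sds = final_percent(10.0, 1, 100)

print(f"DTT: {dtt_g * 1000:.0f} mg, IAA: {iaa_g * 1000:.0f} mg")
print(f"Gel: {gel_percent_t:.1f}% T, {gel_percent_sds:.1f}% SDS")
```

The IAA result (250 mg per 10 ml) matches the mass stated in the recipe, and 33 ml of the 30% stock in 100 ml gives 9.9% T, which the protocol rounds to a 10% gel.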
2.4. Silver Staining (see Note 1)<br />
1. Fix solution: 5% acetic acid and 50% ethanol per 100 ml<br />
2. Sensitivity-enhancing solution: 30% (v/v) ethanol, 6.8% (w/v) sodium acetate,<br />
100 μl of 2% (w/v) sodium thiosulphate per 100 ml<br />
3. Silver staining solution: 0.25% (w/v) silver nitrate<br />
4. Development solution: 2.5% (w/v) anhydrous potassium carbonate, 20 μl of 2%<br />
(w/v) sodium thiosulphate per 100 ml, 40 μl of 37% formaldehyde per 100 ml.<br />
5. Stop solution: 4% (w/v) Tris and 2% (v/v) acetic acid per 100 ml<br />
6. Gel store (soak) solution: 1% (w/v) sodium acetate and 10% (v/v) methanol per<br />
100 ml<br />
2.5. Gel Image Analysis<br />
1. Personal Densitometer SI (Molecular Dynamics, Sunnyvale, CA, USA)<br />
2. ImageMaster 2D Elite (Platinum) software (GE Healthcare)<br />
2.6. In-gel Trypsin Digestion and Preparation for MS Analysis<br />
1. Destaining solution: 30 mM potassium ferricyanide and 100 mM sodium<br />
thiosulfate (1:1)<br />
2. 25 mM sodium bicarbonate<br />
3. Dehydrating solution: 50 mM sodium bicarbonate and 50% (v/v) methanol per<br />
100 ml<br />
4. SpeedVac centrifuge (TeleChem International, Inc., Sunnyvale, CA, USA)<br />
5. Digestion solution: 40 ng/μl trypsin sequencing grade (Promega, Madison, WI,<br />
USA) in 20 mM ammonium bicarbonate solution<br />
6. Extraction solution (for hydrophobic peptides): 5% (v/v) trifluoroacetic acid<br />
(TFA) and 50% (v/v) acetonitrile (ACN) per 100 ml<br />
7. Peptide reconstitution solution: 0.1% (v/v) TFA<br />
8. ZipTip C18 columns (Millipore)<br />
9. Eluant: 70% (v/v) ACN and 0.1% TFA per 100 ml<br />
10. Stainless steel MALDI-TOF sample target plates (Applied Biosystems,<br />
Framingham, MA, USA)<br />
11. Alpha-cyano-4-hydroxycinnamic acid (α-CHCA) matrix, 3 mg/ml (Sigma)<br />
12. Applied Biosystems 4700 MALDI-TOF/TOF mass spectrometer
2.7. Database Search for Protein Identification<br />
1. MASCOT software (Matrix Science, London, England)<br />
2. MS-Fit software (http://prospector.ucsf.edu)<br />
3. Methods<br />
The methods described below have been successfully used in the authors’<br />
laboratory for proteomics studies in human breast cancer specimens (1,9) and<br />
can be applied to other cancer tissues as well. Breast tumors and matched<br />
normal tissues were obtained from the Tissue Repository Unit of the National<br />
University Hospital, Singapore, after approval by our Institutional Review<br />
Board.<br />
3.1. Preparation of Tissue Sections for LCM<br />
In this step, frozen tissues can be transferred directly from the –80°C freezer,<br />
where they have been stored after surgical excision and trimming, to a pre-cooled<br />
tube containing 70% (v/v) ethanol and kept on ice. Ethanol-fixed paraffin-embedded<br />
tissue blocks should be prepared as quickly as possible, and the<br />
completed blocks stored at or below 4°C.<br />
1. Fix the frozen tissue overnight in 70% ethanol at 4°C.<br />
2. Place each ethanol-fixed tissue piece, trimmed to appropriate dimensions, into<br />
a pre-cooled cassette within the tissue processor and dehydrate according to the<br />
following procedure: 30 min each in 70% and 80% ethanol at 40°C; 45 min in<br />
95% ethanol at 40°C (twice); 45 min in 100% ethanol at 40°C (twice), and 45 min<br />
in xylene at 40°C (twice) (see Note 2).<br />
3. Embed the specimen in paraffin using embedding molds, with four changes of<br />
paraffin at 30-min intervals.<br />
4. Store the paraffin blocks at or below 4°C if they are not to be processed<br />
immediately for sectioning.<br />
5. Put the block in a –20°C freezer for at least 1 h before cutting sections from it.<br />
6. Cut sections of 8 μm thickness using a standard microtome. Blades should be<br />
changed regularly (see Note 3).<br />
7. Collect the tissue sections on uncharged microscopic glass slides, allow the<br />
sections to air-dry, and store the cut sections at or below 4°C.<br />
3.2. Staining of Paraffin-embedded Sections<br />
The staining of sections for LCM is similar to that used in most histology<br />
laboratories for morphological assessment. However, using the minimal amount of<br />
stain needed to visualize the tissue for microdissection will improve macromolecule<br />
recovery (see Note 4). One tablet of protease inhibitor cocktail should be added<br />
to every 10 ml of each reagent (except xylene), and all reagents should be prepared using<br />
double deionized water or Milli-Q® water. Staining should be performed as<br />
close as possible to the scheduled LCM dissection.<br />
1. Deparaffinize the sections in fresh xylene for 5 min, followed by another 5 min<br />
with a fresh change of xylene.<br />
2. Rehydrate for 15 s in each step of the following series: 100% ethanol, 95%<br />
ethanol, 75% ethanol, and deionized water.<br />
3. Stain with Mayer’s Hematoxylin for 30 s.<br />
4. Rinse off excess stain with deionized water for 15 s; repeat rinse a second time.<br />
5. Dehydrate for 15 s in 70% ethanol.<br />
6. Stain with Eosin Y for 5 s.<br />
7. Dehydrate the sections for 15 s (twice) in 95% ethanol, 15 s (twice) in 100%<br />
ethanol, and 60 s in xylene.<br />
8. Air-dry for approximately 2–5 min to allow xylene to evaporate completely (see<br />
Note 5).<br />
9. The tissue is now ready for LCM (see Note 6).<br />
3.3. Laser Capture Microdissection and Protein Sample Preparation<br />
The PixCell II LCM system (Arcturus Engineering, Mountain View, CA,<br />
USA) is used for specific microdissection of tumor cells in our laboratory.<br />
Tissue sections are usually mounted on uncoated glass slides to provide support<br />
for the CapSure cap during microdissection. LCM utilizes an infrared laser<br />
integrated into a standard microscope, and when the desired cells move into<br />
the path of the light source, the investigator activates the laser, which in<br />
turn activates the membrane (a short laser pulse emitted heats the transparent<br />
membrane to ∼90°C for 5 ms). This melts the membrane, with subsequent<br />
binding and encapsulation of the cells of interest, segregating them from the<br />
surrounding cells and connective tissues. Images of the tissues before and after<br />
microdissection and of the captured cells on the cap can be visualized, thus<br />
maintaining an accurate record of each dissection. The laser beam diameter<br />
may be adjusted from 7.5 to 30 μm to procure either single cells or groups of<br />
cells, respectively.<br />
1. Place the slide containing the prepared tissue on the microscope stage. Set the<br />
laser parameters as follows: spot diameter at 15 μm, pulse duration at 5 ms, and<br />
power at 50 mW.<br />
2. Scan the tissue section to locate the desired cells. Dissect out the target cells of<br />
interest and capture all encapsulated cells from each section in quick succession<br />
into one cap. Cells dissected from ∼2500 shots can be captured into one cap (see<br />
Note 7). Figure 1 shows an example of tumor cells before and after microdissection.
Fig. 1. Laser capture microdissection (LCM) of breast tumor cells. The tissue section<br />
on the uncharged glass slide was stained with hematoxylin and eosin and microdissected<br />
with the PixCell II LCM system (Arcturus Engineering). (A) section before LCM; (B)<br />
section after LCM; (C) microdissected cell.<br />
3. Place the LCM cap on an Eppendorf tube containing 100 μl of lysis buffer with<br />
protease inhibitor, invert the tube, and vortex vigorously for 1 min.<br />
4. Place the tube on ice for approximately 20 min, then sonicate the microdissected<br />
sample in a bath sonicator with 5-s pulses separated by 5-s intervals, for a total<br />
of 1 min.<br />
5. Return the sample to ice immediately after the 1-min sonication.<br />
6. Centrifuge the sample at 16,000 × g for 20 min at 4°C and transfer the supernatant<br />
to a new Eppendorf tube.<br />
7. Determine the protein concentration using the PlusOne 2-D Quantitation Kit (GE<br />
Healthcare) and clean up the sample using the PlusOne 2-D Clean-up Kit (GE<br />
Healthcare), following the manufacturer’s instructions closely.<br />
8. Dissolve the protein pellet in the appropriate volume of sample rehydration buffer<br />
and aliquot according to experimental plans for immediate and later usage. Store<br />
the aliquotted samples at –80°C until analyzed (see Note 8).<br />
3.4. First-dimension Gel Electrophoresis (Isoelectric Focusing)<br />
1. Prepare the strip holder for the 18-cm IPG strip (see Note 9).<br />
2. Squeeze a few drops of Ettan IPGphor Strip Holder Cleaning Solution (GE<br />
Healthcare) into the slot and clean thoroughly. Rinse with Milli-Q water and dry<br />
completely.<br />
3. Mix approximately 50 μl of the reconstituted protein samples (∼100–150 μg)<br />
with the appropriate volume of rehydration buffer. The total volume should be<br />
340 μl for one 18-cm IPG strip.<br />
4. Transfer the entire volume of the diluted protein sample into the groove of the<br />
IPG strip holder.<br />
5. Remove the cover from the IPG strip (18 cm, pH 3–10) and place the IPG strip<br />
in the holder such that the gel of the strip is in contact with the sample (i.e., gel
side down). Try to remove any trapped air bubbles by lifting the strip up and<br />
down from one side.<br />
6. Overlay the IPG strip with 2–3 ml of DryStrip Cover Fluid to prevent urea<br />
crystallization and evaporation, and replace the cover on the strip holder.<br />
7. Rehydrate the IPG strip at 20 V for 12 h at 20°C.<br />
8. Perform IEF under the following conditions: 500 V for 1 h, 2000 V for 1 h,<br />
4000 V for 1 h, and 8000 V for 6 h.<br />
9. Once focusing is complete, pour off the oil. The strips can be stored at –20°C for<br />
several weeks, or immediately treated as described below (see Subheading 3.5).<br />
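The sample dilution in step 3 and the focusing program in step 8 both reduce to simple arithmetic. The sketch below is a hedged illustration with hypothetical function names. The volt-hour total assumes each voltage is held for the full step with no ramping or current limiting, which the protocol does not specify, so it should be read as an upper estimate.

```python
TOTAL_VOLUME_UL = 340  # total rehydration volume for one 18-cm IPG strip

# Focusing program from step 8 as (volts, hours) pairs.
IEF_PROGRAM = [(500, 1), (2000, 1), (4000, 1), (8000, 6)]


def rehydration_buffer_ul(sample_ul, total_ul=TOTAL_VOLUME_UL):
    """Volume of rehydration buffer needed to bring the sample to total_ul."""
    if not 0 <= sample_ul <= total_ul:
        raise ValueError("sample volume must be between 0 and the total volume")
    return total_ul - sample_ul


def total_volt_hours(program):
    """Sum of volts x hours, assuming each step is held at its set voltage."""
    return sum(volts * hours for volts, hours in program)


buffer_ul = rehydration_buffer_ul(50)      # about 50 ul of sample per strip
volt_hours = total_volt_hours(IEF_PROGRAM)
print(f"Add {buffer_ul} ul of rehydration buffer; program totals {volt_hours} Vh")
```

For the 50 μl sample described in step 3 this gives 290 μl of rehydration buffer, and the four-step program sums to 54,500 Vh under the stated assumption.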
3.5. IPG Strip Equilibration<br />
1. Place the focused IPG strips in a container with 10 ml of equilibration buffer A<br />
and shake for 15 min at RT (see Note 10).<br />
2. Transfer the IPG strip to a container with 10 ml of equilibration buffer B and<br />
shake for 15 min at RT (see Note 10).<br />
3. The equilibrated strips can then be processed for second-dimension gel<br />
electrophoresis.<br />
3.6. Second-dimensional SDS-PAGE<br />
Prepare the SDS-polyacrylamide gels in advance, and make sure that the<br />
gels are well polymerized before performing the equilibration of IPG strips.<br />
The proteins have to be charged by equilibration with SDS, and be reduced<br />
and alkylated to avoid the formation of oligomers. In our laboratory, we use<br />
the PROTEAN II xi Cell system (Bio-Rad) for SDS-PAGE.<br />
1. Assemble the gel casting cassette as per the manufacturer’s instructions.<br />
2. Prepare 10% SDS-PAGE (see Note 10) and pour the solution slowly into the<br />
cassette (two 16 cm × 20 cm glass plates sandwiched by 1.5-mm thick spacers)<br />
until the gel height is approximately 1 cm from the top.<br />
3. Overlay the gel solution with 2 ml of water-saturated isobutanol. It is best to<br />
pour 1 ml of water-saturated isobutanol from one side of the gel and 1 ml on<br />
the other side. Do not pour it all along the gel meniscus.<br />
4. Allow the gel to polymerize for at least 2 h.<br />
5. When polymerization is complete, pour off the water-saturated isobutanol and<br />
rinse the gel surface with water.<br />
6. With a pair of forceps, carefully place the equilibrated strip on top of the PAGE<br />
gel, with the acidic side of the strip at left. Cover the strip with melted agarose<br />
sealing solution (see Note 11).<br />
7. Assemble the electrophoresis unit (Bio-Rad) and perform electrophoresis at 15°C<br />
as follows: 40 V for 15 min or until the blue dye enters the gel and then raise<br />
the voltage to 125 V and run the gel overnight or until the blue dye migrates to<br />
the bottom of the gel.<br />
8. Switch off the main power and disassemble the gel cassette.
9. Place the gel in a glass container and wash the gel with Milli-Q water.<br />
10. Stain the gel using the mass spectrometry-compatible silver staining protocol<br />
(see Subheading 3.7).<br />
3.7. Silver Staining and Image Analysis<br />
The silver staining protocol described below is used in the authors’ laboratory<br />
and is highly compatible with protein identification by MALDI-TOF MS and<br />
MALDI-TOF/TOF MS/MS. It should be noted that adequate washing with Milli-Q<br />
water is essential to reduce the risk of keratin contamination. All the solutions<br />
must be prepared with Milli-Q water, and all the chemical reagents should be<br />
filtered to remove any particles that may cause interference during MS analysis.<br />
All solutions prepared from solid chemicals should be freshly prepared before<br />
performing silver staining.<br />
1. Fix the gel with fix solution for at least 2 h,<br />
replacing it with fresh solution at hourly intervals.<br />
2. Briefly wash with Milli-Q water, with constant shaking for about 15 min.<br />
3. Remove the wash and cover the gel with appropriate sensitivity-enhancing<br />
solution and incubate for 1 h, with constant shaking.<br />
4. Wash the gel thoroughly with Milli-Q water for 6 × 15 min, with gentle shaking,<br />
replacing with fresh Milli-Q water after each cycle (see Note 12).<br />
5. Stain the gel with silver staining solution for 30 min.<br />
6. Wash off excess stain from the gel with Milli-Q water (twice, 1 min each).<br />
7. Develop the gel for 5–30 min in a developing solution (see Note 13).<br />
8. Add Stop Solution and shake the gel for approximately 20 min to stop the<br />
reaction.<br />
9. Wash the gel using Milli-Q water for 20 min; replace water and repeat the wash.<br />
10. Scan the gel using Personal Densitometer SI, or store the gel in the gel soak<br />
solution for analysis at a later time.<br />
11. Capture the image using ImageMaster 2D Elite software (GE Healthcare). The<br />
image analysis includes spot detection, quantification and normalization of spot<br />
intensity to the background interferences, according to the instructions from the<br />
software. An example of images showing the differences between the protein<br />
profiles of LCM-microdissected HER-2/neu positive and -negative tumor cells<br />
is shown in Fig. 2.<br />
12. Analyze the image using the software and identify spots that show significant<br />
differences in spot intensities (see Note 14), reflecting differential protein<br />
expression in the two subtypes of breast cancer triggered by the presence or<br />
suppression of HER-2/neu oncogene. Only those spots that show a more than<br />
threefold increase or decrease in signal intensity, consistently<br />
across three replicate sets of gels, are considered as demonstrating differential<br />
protein expression and selected for further analysis by MALDI-TOF MS/MS.<br />
The likelihood of any protein displaying less convincing evidence of differential<br />
protein expression being a potential biomarker for early detection of tumor<br />
growth or a therapeutic target for breast cancer treatment is low.
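The selection rule in step 12 can be sketched as a small filter over normalized spot intensities. This is an illustration rather than the ImageMaster workflow itself: the function name, spot labels, and intensity values are hypothetical, replicates are assumed to be paired by gel set, and intensities are assumed to be positive after normalization.

```python
def is_differential(pos_intensities, neg_intensities, threshold=3.0):
    """True if every replicate pair changes more than threshold-fold,
    all in the same direction (all increased or all decreased)."""
    if not pos_intensities or len(pos_intensities) != len(neg_intensities):
        raise ValueError("need the same non-zero number of replicates per group")
    ratios = [p / n for p, n in zip(pos_intensities, neg_intensities)]
    increased = all(r > threshold for r in ratios)
    decreased = all(r < 1.0 / threshold for r in ratios)
    return increased or decreased


# Hypothetical normalized intensities from three replicate gel sets:
# (HER-2/neu-positive replicates, HER-2/neu-negative replicates)
spots = {
    "spot_A": ([9.0, 10.5, 8.4], [2.0, 3.0, 2.5]),  # consistently increased
    "spot_B": ([1.0, 1.2, 0.9], [4.0, 3.0, 3.6]),   # one replicate only 2.5-fold
    "spot_C": ([1.0, 1.0, 1.0], [4.0, 3.5, 5.0]),   # consistently decreased
}
selected = [name for name, (pos, neg) in spots.items() if is_differential(pos, neg)]
print(selected)
```

In this example spot_B is rejected because one replicate falls short of a threefold change, which mirrors the requirement that the change be consistent across all three gel sets.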
[Figure 2: two gel panels, HER-2/neu-P and HER-2/neu-N; horizontal axis pI 3–10; vertical axis molecular mass (kDa) with markers at 92, 50, 35, and 28; labeled spots: AAH025396, P04075, NP004095, P06753-2, AAB49495, P07339, NP001531, NP000627.]<br />
Fig. 2. Silver-stained protein profiles of LCM-dissected cells. Protein samples from<br />
HER-2/neu positive and -negative cells were separated using IPG® strips (18 cm,<br />
pH 3–10 NL) and homogeneous SDS-PAGE (10%), and then stained with silver<br />
nitrate. Silver-stained gels were scanned using the Personal Densitometer SI (Molecular<br />
Dynamics) and differentially expressed protein spots were analyzed by ImageMaster<br />
2-D Elite software (GE Healthcare). The Accession Numbers indicate the protein<br />
ID identified by MALDI-TOF/TOF tandem mass spectrometry and NCBInr database<br />
search using Mascot software (Matrix Science, London, UK).<br />
3.8. Trypsin Digestion and Preparation of Peptides for Mass<br />
Spectrometric Analysis<br />
1. Excise the silver-stained protein spots showing significant differential protein<br />
expression, as mentioned above, one at a time, taking care not to include adjacent<br />
protein spots, and transfer to individual tubes.<br />
2. Wash with 100 μl of Milli-Q water for 5 min.<br />
3. Add 50 μl of the destaining solution to each tube and shake for about 20 min on a<br />
platform shaker at RT until the gel pieces become clear.<br />
4. Remove the solution carefully and wash with 100 μl of Milli-Q water.<br />
5. Incubate the gel pieces with 25 mM sodium bicarbonate for 20 min, and then<br />
cut them into smaller pieces with the tip of the transfer pipette. Avoid carryover<br />
and contamination during repetitive work on consecutive samples.<br />
6. Rinse the gel pieces with Milli-Q water, discard the wash after pulsing down<br />
the gel pieces, and repeat the washing process three times.<br />
7. Add 100 μl of dehydrating solution and incubate for 20 min at RT.<br />
8. Dry the gel pieces in a SpeedVac centrifuge.<br />
9. Re-swell the dried gel pieces with 10–20 μl of Digestion Solution and leave<br />
overnight at 37°C to ensure complete digestion.<br />
10. Extract the resultant hydrophilic peptides first with 10 μl of Milli-Q water for 1 h.
11. Then extract the hydrophobic peptides with Extraction Solution for 2 h.<br />
12. Pool the extracted hydrophilic and hydrophobic peptides and dry the peptide<br />
mixture using the SpeedVac centrifuge.<br />
13. Redissolve the dried peptides in 10 μl of 0.1% (v/v) TFA.<br />
14. Desalt the sample with ZipTip C18 columns (Millipore) and elute the treated<br />
and purified peptides with 2.5 μl of Eluant.<br />
15. Mix 0.5 μl of the sample eluate with 0.5 μl of CHCA matrix (3 mg/ml) and spot<br />
the mixture onto the stainless steel MALDI-TOF sample target plates.<br />
16. The pretreated peptide samples must be stored on ice during transfer to the<br />
core facility for mass spectrometric analysis. In our laboratory, peptide mass<br />
spectra are obtained by the Applied Biosystems 4700 Proteomics Analyzer<br />
MALDI-TOF/TOF mass spectrometer, set in the positive ion reflector mode.<br />
The subsequent MS/MS analyses are performed in a data-dependent manner,<br />
and the 10 most abundant ions fulfilling certain preset criteria are subjected to<br />
high-energy CID analysis. The collision energy is set to 1 keV, and nitrogen is<br />
used as the collision gas.<br />
3.9. Database Search to Match Protein Identities<br />
Database searches were conducted using the MASCOT search engine<br />
(Matrix Science, London, UK; http://www.matrixscience.com). Known contaminant<br />
peaks, such as keratin and autoproteolysis peaks, were removed prior to the<br />
search. All tandem mass spectra were searched against the NCBInr database, with<br />
a peptide mass tolerance of 200 ppm and an MS/MS fragment mass tolerance of<br />
0.5 Da. Searches were performed without constraining the protein molecular weight<br />
(Mr), isoelectric point (pI), or species, and allowing for carbamidomethylation<br />
of cysteine and partial oxidation of methionine residues. Up to one missed<br />
tryptic cleavage was considered for all tryptic-mass searches. Protein scores<br />
greater than 75 were considered to be significant (p < 0.05).<br />
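The two tolerances quoted above work on different scales: the 200 ppm window scales with the mass being matched, whereas the 0.5 Da fragment window is absolute. The following is a minimal sketch of both checks. The function names are illustrative only and are not part of MASCOT, which applies its own matching logic internally.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_ppm(observed_mz, theoretical_mz, tol_ppm=200.0):
    """True if the observed mass lies within a relative (ppm) tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_da(observed_mz, theoretical_mz, tol_da=0.5):
    """True if the observed mass lies within an absolute (Da) tolerance."""
    return abs(observed_mz - theoretical_mz) <= tol_da


def ppm_window(theoretical_mz, tol_ppm=200.0):
    """Lower and upper mass limits implied by a ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta
```

At m/z 1000, a 200 ppm tolerance corresponds to a window of ±0.2; at m/z 2500 the same tolerance widens to ±0.5.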
3.10. Experimental Example: Differential Protein Profiles<br />
between HER-2/neu Positive and -Negative Breast Tumors<br />
We dissected the tumor cells from two different subtypes of breast tumors<br />
and compared their protein profiles, based on the protocols described above.<br />
Figure 2 shows the LCM-dissected tumor cell protein patterns visualized by<br />
silver staining. It should be noted that pooled protein samples from different<br />
cases of the same tumor subtype were used for 2-D GE. This gel-based<br />
protein visualization technique requires a large amount of protein, and thus<br />
more sensitive detection reagents and protein identification strategies had to<br />
be developed to produce meaningful results (see Notes 15 and 16). Using
the silver-staining protocol, we detected 500–600 protein spots in the protein<br />
profiles generated by coupling LCM and 2-D GE. Protein spots of interest were<br />
excised, digested with trypsin (Promega), desalted with ZipTip C18 columns<br />
(Millipore), and analyzed using MALDI-TOF/TOF tandem mass spectrometry.<br />
Protein identities, as shown in Fig. 2, were obtained by searching the NCBInr<br />
database using the MASCOT software (Matrix Science).<br />
4. Notes<br />
1. All the chemical solutions should be filtered by passing them through filter paper<br />
(Cat No. 1001 150, Whatman ® , Whatman International Limited, Springfield<br />
Mill, Maidstone, Kent, England) to minimize precipitates forming on the<br />
gels during silver staining.<br />
2. Tissue processors in standard histopathology laboratories generally include<br />
formalin fixation as the first step in the paraffin infiltration procedure. It is<br />
important to avoid this step when processing tissues intended for molecular<br />
gene and proteome profiling.<br />
3. Consistent LCM transfers have been demonstrated from 5–10 μm thick paraffin-embedded<br />
tissue sections. For a successful LCM transfer, the strength of the bond<br />
between polymer film and targeted tissue must be stronger than that between the<br />
tissue and the underlying glass slide. Therefore, for most tissue types, sections<br />
should be collected with uncharged glass slides. To prevent cross-contamination<br />
while sectioning, residual paraffin and tissue fragments should be wiped off<br />
from the area of the sectioning blade with xylenes between consecutive slides.<br />
If possible, a fresh microtome blade should be used to section a different block.<br />
4. In our hands, hematoxylin and eosin are best reduced to 10% of their standard<br />
concentrations used for routine histomorphological work, when applied to slides<br />
prepared for LCM. Breast tumor cells can be clearly visualized and identified<br />
from other cell types, without influencing the procurement of tumor cells by<br />
LCM, with this modification. Minimum staining also improves macromolecular<br />
recovery during cellular protein extraction.<br />
5. Complete dehydration and air drying of sections are the main factors influencing<br />
the efficiency of LCM. Prolonged air drying or presence of moisture in the<br />
sections appears to inhibit, at least partially, the transfer of cells to the plastic<br />
film.<br />
6. Investigators with limited experience in examining cancer tissue sections<br />
should consult the pathologists in their institutions for assistance in<br />
identifying the target cell types that will be microdissected using LCM.<br />
It is essential to avoid contamination by other cell types and dissection<br />
of the wrong cells.<br />
7. During microdissection, make sure that there are no irregularities on the tissue<br />
surface in or near the area to be microdissected. It should also be noted that<br />
wrinkles can elevate the LCM cap away from the tissue surface and decrease the
membrane contact during laser activation. Use an adhesive pad after microdissection<br />
to remove cells that may have attached non-specifically to the LCM<br />
cap. A cap-alone control is recommended for each experiment to ensure that<br />
non-specific transfer is not occurring during microdissection. The cap should be<br />
processed together with other tissue-containing caps and serves as a negative<br />
control. For protein separation by 2-D GE, 20 to 30 sections from each tissue<br />
sample are dissected, depending on the percentage of target cells in the full<br />
sections. Generally, 2300–2700 laser pulse shots are used for each cap. Cells<br />
from at least 50,000 shots (spot diameter is 15 μm) are required for each<br />
18-cm gel.<br />
8. Up to 15 mg of proteins can be solubilized with 500 μl of the sample rehydration<br />
buffer, but with our breast tumor tissue samples, we usually reconstitute 1–2 mg<br />
of extracted proteins in 500 μl, or 2–4 mg/ml. It is recommended that the<br />
reconstituted proteins be stored in appropriate aliquots, and that only the required<br />
number of aliquots needed for the experiment at hand be removed at any time,<br />
to avoid repeated freezing and thawing of the proteins, which will lead to sample<br />
deterioration.<br />
9. IEF is performed using an Ettan IPGphor IEF unit. Rehydration<br />
loading of protein samples is used in the authors’ laboratory. The IPG strips for<br />
first-dimensional separation are commercially available from GE Healthcare<br />
and other suppliers in various pH gradients and dimensions, and should be<br />
chosen according to the resolution needed. The strips should be kept frozen<br />
at –20°C, and thawed just before use. The IEF conditions depend on the pH<br />
range; refer to the manufacturer’s protocol. For alkaline pH ranges, cup loading<br />
is required, and DTT in the rehydration buffer should be replaced by other<br />
reagents, such as hydroxyethyl-disulfide (HED) reagent (Destreak, GE<br />
Healthcare).<br />
10. It is essential to equilibrate the strips before they are applied to the second-dimension<br />
gel electrophoresis (2-D SDS-PAGE). DTT added to buffer A will<br />
reduce the disulfide bonds, whereas IAA in buffer B will alkylate the resulting<br />
sulfhydryl groups of proteins. This prevents re-oxidation of sulfhydryl groups<br />
and streaking of spots during 2-D SDS-PAGE. Further, the presence of SDS<br />
makes the proteins negatively charged and suitably primed for SDS-PAGE. Use<br />
the best quality SDS available for sample and running buffers that include SDS<br />
in their formulation. We recommend C12 Grade SDS from Pierce (Rockford, IL,<br />
USA).<br />
11. When placing the strips on top of the gel, ensure that the plastic backing of the<br />
strips is in contact with the glass wall. If necessary, the strips can be trimmed<br />
properly. When adding agarose sealing solution, make sure that there are no air<br />
bubbles trapped between the IEF strip and 2-D gel.<br />
12. Wash the gels thoroughly and repeatedly, as recommended, prior to the development<br />
step and during the development step itself, to get clear stained gels.<br />
During the development of the gels, formaldehyde should be added prior to use,
and the suggested concentration should be followed strictly to avoid interference<br />
during MALDI-TOF analysis. During the developing stage, the gel should be<br />
constantly shaken to reduce the background.<br />
13. The developing time depends on the total amount of protein that is used for<br />
2-D separation. With a higher amount of protein, a shorter developing time can<br />
be used, without compromising the aim of visualizing the maximum number of<br />
protein spots.<br />
14. It is important to manually verify spot detection and matching, as the variations<br />
in gel resolution, staining, gel background, and automatic image analysis may<br />
not correctly define the spot contours in every case. This variability and the<br />
complexity of 2-D gel patterns hinder the accurate matching of analogous spots<br />
in different gels.<br />
15. In our experience, approximately 500 to 600 distinct proteins from the dissected<br />
breast tumor cells can be visualized on 2D-PAGE stained with silver. On average,<br />
we can extract approximately 4–6 μg of total cellular proteins from 2500 laser<br />
pulses. Our experience is that silver staining of LCM-dissected cell proteins is a<br />
sufficiently sensitive tool for isolating and identifying the dysregulated cellular<br />
proteins of high or moderate abundance. However, for the dysregulated proteins<br />
of low abundance, the lower detection limit of this technology would have to<br />
be enhanced by other techniques such as 125-iodine labeling or biotinylation<br />
and fluorescent dye labeling. In addition, the use of scanning immunoblotting<br />
with class-specific antibodies, for example, would allow sensitive detection of<br />
specific subsets of proteins, e.g., all known proteins involved with cell-cycle<br />
regulation.<br />
16. Protein identification by MALDI-TOF, LC-MS/MS, or other techniques is also<br />
limited by the requirement of a minimal protein input amount, which is often not<br />
attainable from certain types of biopsy samples. A useful strategy to improve<br />
protein identification is to produce parallel “diagnostic” fingerprints derived<br />
from microdissected cells and “sequencing” the fingerprints generated from the<br />
whole tissue section from each case. Alignment of the diagnostic and sequencing<br />
2D gels permits determination of the proteins of interest for subsequent mass<br />
spectrometry or N-terminal sequence analysis.<br />
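The quantities given in Notes 7, 8, and 15 can be combined into a rough planning calculation. The sketch below assumes that protein yield scales linearly with the number of laser pulses, which the Notes do not state explicitly. The function names are illustrative.

```python
import math


def caps_needed(total_shots=50_000, shots_per_cap=2_500):
    """Number of LCM caps required to reach the total shot count for one gel."""
    return math.ceil(total_shots / shots_per_cap)


def protein_yield_ug(total_shots, ug_per_2500_shots=(4.0, 6.0)):
    """Estimated protein yield range (ug), assuming linear scaling with
    shot count from the 4-6 ug per 2500 pulses quoted in Note 15."""
    low, high = ug_per_2500_shots
    scale = total_shots / 2500
    return low * scale, high * scale


def concentration_mg_per_ml(mass_mg, volume_ul):
    """Protein concentration after reconstitution in a given buffer volume."""
    return mass_mg / (volume_ul / 1000)
```

Under these assumptions, the 50,000 shots required for an 18-cm gel correspond to about 19–22 caps at 2300–2700 shots per cap, and to roughly 80–120 μg of total protein. Reconstituting 1–2 mg in 500 μl gives the 2–4 mg/ml stated in Note 8.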
Acknowledgments<br />
The Tumor Repository of the National University Hospital, Singapore,<br />
provided the clinical breast cancer frozen tissues for LCM. The use of the<br />
PixCell II LCM system was courtesy of the Department of Pathology, Yong<br />
Loo Lin School of Medicine, National University of Singapore (NUS). This<br />
work was supported by an Academic Research Fund from the NUS (Grant No.<br />
R-179-000-032) to the authors.
References<br />
1. Zhang, D., Tai, L. K., Wong, L. L., Sethi, S. K., Koay, E. S. (2005) Proteomics of<br />
breast cancer: enhanced expression of cytokeratin 19 in human epidermal growth<br />
factor receptor type 2 positive breast tumors. Proteomics 5, 1797–1805.<br />
2. Neubauer, H., Clare, S. E., Kurek, R., Fehm, T., Wallwiener, D., Sotlar, K., et al.<br />
(2006) Breast cancer proteomics by laser capture microdissection, sample pooling,<br />
54-cm IPG IEF, and differential iodine radioisotope detection. Electrophoresis 27,<br />
1840–1852.<br />
3. Lawrie, L. C., Curran, S., McLeod, H. L., Fothergill, J. E., Murray, G. I. (2001)<br />
Application of laser capture microdissection and proteomics in colon cancer. J.<br />
Clin. Pathol: Mol. Pathol. 54, 253–258.<br />
4. Ai, J., Tan, Y., Ying, W., Hong, Y., Liu, S., Wu, M., et al. (2006) Proteome<br />
analysis of hepatocellular carcinoma by laser capture microdissection. Proteomics<br />
6, 538–546.<br />
5. Ahram, M., Flaig, M. J., Gillespie, J. W., Duray, P. H., Linehan, W. M.,<br />
Ornstein, D. K., et al. (2003) Evaluation of ethanol-fixed, paraffin-embedded tissues<br />
for proteomic applications. Proteomics 3, 413–421.<br />
6. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring,<br />
J. R., Podolsky, R. H., et al. (2005) Saturation labeling with cysteine-reactive<br />
cyanine fluorescent dyes provides increased sensitivity for protein expression<br />
profiling of laser-microdissected clinical specimens. Proteomics 5, 1746–1757.<br />
7. Zang, L., Palmer-Toy, D., Hancock, W. S., Sgroi, D. C., Karger, B. L. (2004)<br />
Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection,<br />
LC-MS, and 16O/18O isotopic labeling. J. Proteome Res. 3, 604–612.<br />
8. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., et al. (2004) Accurate<br />
qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma<br />
using laser capture microdissection coupled with isotope-coded affinity tag and<br />
two-dimensional liquid chromatography mass spectrometry. Mol. Cell. Proteomics<br />
3, 399–409.<br />
9. Zhang, D., Tai, L. K., Wong, L. L., Chiu, L. L., Sethi, S. K., and Koay, E. S. (2005)<br />
Proteomic study reveals that proteins involved in metabolic and detoxification<br />
pathways are highly expressed in HER-2/neu-positive breast cancer. Mol. Cell.<br />
Proteomics 4, 1686–1696.<br />
10. Cowherd, S. M., Espina, V. A., Petricoin, E. F. III, Liotta, L. A. (2004) Proteomic<br />
analysis of human breast cancer tissue with laser-capture microdissection and<br />
reverse-phase protein microarrays. Clin. Breast Cancer 5, 385–392.<br />
11. Gulmann, C., Espina, V., Petricoin, E. III, Longo, D. L., Santi, M., Knutsen, T.,<br />
et al. (2005) Proteomic analysis of apoptotic pathways reveals prognostic factors<br />
in follicular lymphoma. Clin. Cancer Res. 11, 5847–5855.
6<br />
Optimizing the Difference Gel Electrophoresis<br />
(DIGE) Technology<br />
David B. Friedman and Kathryn S. Lilley<br />
Summary<br />
Difference gel electrophoresis (DIGE) technology has been used to provide a powerful<br />
quantitative component to proteomics experiments involving 2D gel electrophoresis. DIGE<br />
combines spectrally resolvable fluorescent dyes (Cy2, Cy3, and Cy5) with sample multiplexing<br />
for low technical variation, and uses an internal standard methodology to analyze<br />
replicate samples from multiple experimental conditions with unsurpassed statistical confidence<br />
for 2D gel-based differential display proteomics. DIGE experiments can facilely<br />
accommodate sufficient independent (biological) replicate samples to control for the large<br />
interpersonal variation expected from clinical samples. Multivariate statistical<br />
analyses can then be used to assess the global variation in a complex set of independent<br />
samples, filtering out the noise from technical variation and normal biological variation<br />
and thereby focusing on the underlying variation that can describe different disease states. This<br />
chapter focuses on the design and implementation of the DIGE methodology employing<br />
the use of a pooled-sample internal standard in conjunction with the minimal CyDye<br />
chemistry. Notes are also provided for the use of the alternative saturation labeling<br />
chemistry.<br />
Key Words: difference gel electrophoresis; two-dimensional gel electrophoresis;<br />
quantification.<br />
1. Introduction<br />
Human disease phenotypes are a direct result of protein expression and<br />
modification. In many cases, such phenotypes cannot be tied directly to a single<br />
alteration in the genome or resulting proteome, but are likely to be the result<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
of multiple factors. Studying disease at the protein level is challenging, but<br />
as proteins are the mediators of phenotype, the study of protein abundance<br />
on a global scale is required to gain a more complete understanding of the<br />
underlying molecular mechanisms of disease. Proteomics in the clinical setting<br />
is rapidly developing and is having a major impact on the way in which diseases<br />
will be diagnosed, treated, and monitored (1). It has been estimated that there<br />
could be hundreds of thousands of different protein isoforms in a mammalian<br />
cell, but the vast dynamic range of protein abundance results in only the<br />
most abundant species of proteins being observable by quantitative proteomics<br />
approaches unless technically variable biochemical or subcellular fractionation<br />
is employed. The repertoire of techniques and associated hardware, which is<br />
now applied to this field, is expanding exponentially, and although a complete<br />
visualization of the proteome is still beyond reach of any single technique, each<br />
technology platform can provide complementary datasets.<br />
Difference gel electrophoresis (DIGE) has proven to be a powerful quantitative<br />
technology for differential display proteomics on a global level, where<br />
the individual abundance changes for thousands of intact proteins can be simultaneously<br />
monitored in replicate samples over multiple variables with statistical<br />
confidence (see Note 1). This includes quantitative information on protein<br />
isoforms that arise due to post-translational modifications (such as acetylation<br />
or phosphorylation), which result in a change in the isoelectric point of the<br />
protein. This also includes splice variants and the results of protein processing,<br />
all of which are resolved for individual quantification and subsequent analysis<br />
by MS.<br />
DIGE is based on conventional 2D gel technology that is capable of resolving<br />
several thousands of intact proteins first by charge using isoelectric focusing<br />
(IEF) and then by apparent molecular mass using SDS-polyacrylamide gel<br />
electrophoresis (PAGE) (6,7) (see Note 2 and Chapters 4 and 5 by Cho et al.<br />
and Zhang et al., respectively). Importantly, DIGE overcomes many of the<br />
limitations commonly associated with 2D gels such as analytical (gel-to-gel)<br />
variation and limited dynamic range that can severely hamper a quantitative<br />
differential display study. This is accomplished using up to three spectrally<br />
resolvable fluorescent dyes (Cy2, Cy3, and Cy5, referred to as CyDyes) that<br />
enable low- to subnanogram sensitivities with >10^4 linear dynamic range, and<br />
then by multiplexing the prelabeled samples into the same analytical run (2D<br />
gel). Multiplexing in this way allows for direct quantitative measurements<br />
between the samples coresolved in the same gel, and is therefore beyond<br />
the limitations imposed by between-gel comparisons with conventional 2D<br />
gels.<br />
The highest statistical power of this multiplexing approach stems from the<br />
utilization of a pooled-sample internal standard comprised of an equal aliquot
of every sample in the experiment (see Subheading 1.2.1). With this method,<br />
two dyes (Cy3 and Cy5) are used to individually label two independent samples<br />
from a much larger experiment, and the Cy2 dye is used to label an internal<br />
standard, which is comprised of an equal aliquot of proteins from every sample<br />
in the experiment. This pooled-sample internal standard is labeled only once in<br />
bulk to avoid additional technical variation, and enough is made and labeled to<br />
allow for an equal aliquot to be coresolved on each gel. The three differentially<br />
labeled samples are then coresolved on the same 2D gel, after which direct<br />
measurements can be made for each resolved protein using the spectrally<br />
exclusive dye channels without interference from technical variation of the<br />
separation (gel-to-gel variation).<br />
Rather than making direct quantitative measurements between the two<br />
samples in the gel, the measurements are instead made relative to the Cy2 signal<br />
for each resolved protein. The Cy2 signal should be the same for a given protein<br />
across different gels because it came from the same bulk mixture/labeling;<br />
therefore, any difference represents gel-to-gel variation, which can be effectively<br />
neutralized by normalizing all Cy2 values for a given protein across all<br />
gels. Using the Cy2 signal to normalize ratios between gels then allows for the<br />
Cy3:Cy2 and Cy5:Cy2 ratios for each protein within each gel to be normalized<br />
to the cognate ratios from the other gels, encompassing all samples. Each gel<br />
may contain different (and/or replicate) samples in the Cy3 and Cy5 channels,<br />
but all samples can be quantified relative to each other because each protein<br />
from each sample is measured relative to the cognate Cy2 signal from the internal<br />
standard present on each gel. With the use of sufficient replicates, a plethora of<br />
advanced statistical tests can be applied, which can highlight proteins of interest<br />
whose change in expression is related to the disease state under investigation.<br />
Since the technical noise is low, these vital replicates should be independent<br />
(biological) replicates, because most of the observed variation will arise from the<br />
clinical samples rather than from technical or experimental sources.<br />
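The normalization described above can be illustrated with a short sketch. It expresses each Cy3 or Cy5 spot volume as a log2 ratio to the Cy2 internal-standard volume for the same spot on the same gel. The spot volumes are invented for illustration, and the calculation is a simplification, not the algorithm used by any particular DIGE analysis package.

```python
import math


def standardized_log_abundance(sample_volume, standard_volume):
    """log2 of a sample spot volume relative to the Cy2 standard spot volume."""
    return math.log2(sample_volume / standard_volume)


def normalize_gel(gel):
    """Return Cy3 and Cy5 abundances for one spot, each relative to Cy2."""
    return {
        dye: standardized_log_abundance(gel[dye], gel["Cy2"])
        for dye in ("Cy3", "Cy5")
    }


# Hypothetical spot volumes for one protein on two gels. Gel 2 gave half
# the overall signal of gel 1 (gel-to-gel variation), but the ratios to
# the Cy2 standard are unchanged.
gels = [
    {"Cy2": 1000.0, "Cy3": 1000.0, "Cy5": 2000.0},
    {"Cy2": 500.0, "Cy3": 500.0, "Cy5": 1000.0},
]
normalized = [normalize_gel(g) for g in gels]
```

Because every value is a ratio to the same bulk-labeled standard, samples on different gels can be compared directly even when the raw signal intensities differ between gels.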
In a final step, specific proteins of interest are then identified using standard<br />
mass spectrometry (MS) approaches on gel-resolved proteins that have been<br />
excised and proteolyzed into a discrete set of peptides. Briefly, excised proteins<br />
are subjected to in-gel digestion with trypsin protease (typically), and MS<br />
is used to acquire accurate mass determinations on the resulting peptides,<br />
as well as fragmentation on individual peptides. The mass spectral data are<br />
then used to identify statistically significant candidate protein matches through<br />
sophisticated computer search algorithms that compare the observed MS data<br />
with theoretical peptide masses (using data generated by peptide mass fingerprinting)<br />
or collision-induced fragmentation patterns (obtained from tandem<br />
MS) generated in silico from protein sequences present in databases (see<br />
Chapter 19 by Fitzgibbon et al.).
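As a minimal sketch of how theoretical peptide masses are generated in silico for peptide mass fingerprinting, the code below cleaves a sequence after lysine or arginine (but not before proline), allows a set number of missed cleavages, and sums monoisotopic residue masses. The residue masses are standard values rounded to five decimals. The example sequence and function names are hypothetical, and fixed or variable modifications are not modeled.

```python
# Monoisotopic residue masses (Da), rounded to five decimals.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # added for the singly protonated ion


def cleave(sequence):
    """Split after K or R, except when the next residue is P."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def tryptic_peptides(sequence, missed_cleavages=1):
    """All peptides containing up to the given number of missed cleavages."""
    pieces = cleave(sequence)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


# Hypothetical sequence used only to exercise the functions.
theoretical = {p: mh_plus(p) for p in tryptic_peptides("AKRPGKLM")}
```

A search engine compares a list of this kind, computed for every database entry, against the observed peptide masses within the chosen mass tolerance.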
1.1. Optimizing Sensitivity and Resolution<br />
There are currently two forms of CyDye labeling chemistries available:<br />
minimal labeling involving the use of N-hydroxy succinimidyl (NHS) ester<br />
reagents for low-stoichiometry labeling of proteins largely via lysine residues,<br />
and saturation labeling, which utilizes maleimide reagents for the stoichiometric<br />
labeling of cysteine sulfhydryls.<br />
The most established DIGE chemistry is the “minimal labeling” method,<br />
which has been commercially available since July 2002. Here the CyDye DIGE<br />
fluors are supplied as NHS esters, which react with the ε-amine groups of<br />
lysine side chains. The three fluors are mass matched (ca. 500 Da), and carry<br />
an intrinsic +1 charge to compensate for the loss of each proton-accepting site<br />
that becomes labeled (thereby maintaining the pI of the labeled protein). Each<br />
dye molecule also adds a hydrophobic component to proteins, which along with<br />
MW influences how proteins migrate in SDS-PAGE.<br />
Minimal labeling reactions are optimized so that only 2–5% of the total<br />
number of lysine residues are labeled, and on average a given labeled<br />
protein would contain only one dye molecule. This is necessary because lysine is<br />
an abundant amino acid, and multiple labeling events may affect the hydrophobicity<br />
of some proteins such that they may no longer remain soluble under<br />
2DE conditions. Although a given protein form may exhibit specific labeling<br />
efficiencies, these will be the same for labeling with all three dyes, allowing<br />
for direct relative quantification. Minimal labeling with CyDye DIGE fluors is<br />
very sensitive, comparable to silver-staining or postelectrophoretic fluorescent<br />
stains such as Sypro Ruby, Deep Purple or Flamingo Pink (ca. 1 ng), but with<br />
a linear response in protein concentration over five orders of magnitude (8) (see<br />
Note 3).<br />
For maleimide labeling of the cysteine sulfhydryls, the overall lower cysteine<br />
content in proteins allows for labeling of these residues to saturation without<br />
increasing the overall hydrophobicity of the proteins to cause insolubility<br />
problems. Saturation labeling is ultimately more sensitive (150–500 picograms,<br />
and even more so for proteins with high cysteine content). Its use is not as<br />
commonplace, most likely due to the availability of only Cy3 and Cy5 with<br />
this chemistry (see Note 4), the fact that it is blind to the small but significant<br />
population of noncysteine containing proteins, and the additional optimization<br />
of complete cysteine reduction necessary for reproducible labeling. For these<br />
reasons, saturation DIGE is usually reserved for experiments where samples<br />
are limited, where the advantage of the increased sensitivity outweighs these<br />
additional considerations.<br />
To maximize the information that can be gained from DIGE experiments, it is<br />
imperative that resolution of protein species within gels is optimized. Although<br />
single 2DE runs can resolve proteins with pI ranges between pH 3 and 11, and
apparent molecular mass ranges between 10 and 200 kDa, higher resolution and<br />
sensitivity can be obtained by running a series of medium range (e.g., pH 4–7,<br />
7–11) and narrow range (e.g., pH 5–6) IEF gradients with increasing protein<br />
loads, leading to an overall more comprehensive proteomic analysis (6,7,10)<br />
(see Note 5). This is analogous to gaining increased resolution and sensitivity in<br />
an LC/MS-based strategy by using multiple high performance liquid chromatography<br />
columns with different affinity chemistries [e.g., MuDPIT (12)]. Much of<br />
the sensitivity limitation associated with 2D gels can be attributed to the analysis<br />
of unfractionated, whole-cell and whole-tissue extracts. Additional sensitivity<br />
can be gained via enrichment for the proteins of interest, such as by analyzing<br />
prefractionated or subcellular samples, or immune complexes. However, the<br />
additional experimental manipulations required for prefractionation introduce<br />
more technical variation into the samples and necessitate increased independent<br />
(biological) replicates (which can be accommodated with the DIGE internal<br />
standard methodology).<br />
The identification of proteins of interest using MS can be performed directly<br />
from the DIGE gels when protein amounts have been optimized in this way (see<br />
Subheading 3.5). Alternatively, some experimental approaches perform DIGE<br />
analysis using “analytical” gels with lower protein amounts, followed by protein<br />
excision from a secondary, “preparative” gel with higher protein amounts. This<br />
approach has its advantages when dealing with small sample amounts, as is<br />
often the case using the saturation dye chemistries, but is also prone to uncertainties<br />
that arise due to the disproportionate amount of protein loading (see<br />
Note 6). The methods presented in this protocol are for optimization of both<br />
the DIGE data as well as material for subsequent MS using high protein loads.<br />
1.2. Optimizing Statistical Significance<br />
1.2.1. Using the Internal Standard<br />
The ability to coresolve and compare two or three samples in a single gel is<br />
attractive, because it allows for direct relative quantification for a given protein<br />
without any interference from gel-to-gel variations in migration and resolution,<br />
removing the need for running replicate gels for each sample (similar to stable<br />
isotope LC/MS-based strategies, see Chapter 10). This approach has limited<br />
statistical power, however, since confidence intervals are determined based on<br />
the overall variation within a population (see Subheading 3.6.2).<br />
Many researchers new to DIGE technology are not immediately aware of<br />
the increased statistical advantage and multiplexing capabilities of DIGE when<br />
combining this approach with a pooled-sample mixture as an internal standard<br />
for a series of coordinated DIGE gels (13). This design will allow for repetitive<br />
measurements (vital to any type of experimental investigation), and in
such a way as to both control for gel-to-gel variation and provide increased<br />
statistical confidence. In this way, statistical confidence can be measured for<br />
each individual protein based on the variance of repetitive measurements,<br />
independent of the variation in the population. Incorporating independently<br />
prepared replicate samples into the experimental design also controls for<br />
unexpected variation introduced into the samples during sample preparation.<br />
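One way to picture a per-protein test based on the variance of repeated measurements is a two-sample comparison of standardized log abundances between two conditions. The sketch below computes a Welch t statistic and its approximate degrees of freedom using only the Python standard library. The choice of test and the replicate values are assumptions made for illustration; the text does not prescribe a specific test.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom
    for two groups of replicate measurements of one protein."""
    var_a = variance(group_a) / len(group_a)
    var_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical standardized log2 abundances for one protein,
# three independent replicates per condition.
condition_d = [1.0, 1.1, 0.9]
condition_a = [0.0, 0.1, -0.1]
t_stat, dof = welch_t(condition_d, condition_a)
```

The statistic grows as the replicate-to-replicate variance shrinks, which is why low technical noise and sufficient independent replicates both matter for detecting a change in a single protein.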
This more complex and statistically powerful experimental design is accomplished<br />
by using one of the three dyes (usually Cy2) to label an internal standard,<br />
which is comprised of equal aliquots of protein from all of the samples in an<br />
experiment. The total amount of the Cy2-labeled internal standard is such that<br />
an equal aliquot can be coresolved within each DIGE gel that also contains<br />
an individual Cy3- and Cy5-labeled sample from the experiment. Since this<br />
standard is composed of all of the samples in a coordinated experiment, each<br />
protein in a given sample should be represented in the standard and thus have<br />
its own unique internal standard (see Note 7). Direct quantitative comparisons<br />
are made individually for each resolved protein between the Cy3- or Cy5-<br />
labeled samples and the cognate protein signal from the Cy2-labeled standard<br />
for that gel (without interference from gel-to-gel variation) and results in the<br />
calculation of a standardized abundance for every spot matched across all<br />
gels within a multigel experiment. The individual signals from the internal<br />
standard are also used to normalize and compare between each in-gel direct<br />
quantitative comparison for that particular protein from the other gels. Using<br />
the Cy2-labeled standard in this fashion, therefore, allows for more precise<br />
and complex quantitative comparisons between gels, including independent<br />
(biological) sample repetition (Fig. 1).<br />
Importantly, the internal standard experimental design allows for the identification<br />
of significant changes that would not have been identified if the analyses<br />
were performed separately, even when using Cy3- and Cy5-labeled samples on<br />
the same DIGE gel (14). This experimental design also allows for multivariable<br />
analyses to be performed in one coordinated experiment, whereby statistically<br />
significant abundance changes can be quantitatively measured simultaneously<br />
between several sample types (e.g., different genotypes, drug treatments, or<br />
disease states), with repetition and without the necessity for every pairwise<br />
comparison to be made within a single DIGE gel (15,16) (see Note 8 and<br />
Chapter 17 by Carpentier et al.).<br />
1.2.2. Assessing Intersample Variation<br />
Clinical proteomics is hampered by the significant variation associated with<br />
patient samples. The largest proportion of this variation comes from biological<br />
diversity, but a significant amount may also come from variable collection<br />
and storage of biological samples. It is of vital importance to identify changes<br />
in protein abundance that are disease specific rather than patient or sample<br />
specific.<br />
Fig. 1. Illustration of DIGE and experimental design using the mixed-sample internal<br />
standard. (A) Representative gel from a six-gel set containing three differentially<br />
labeled samples: Cy2-labeled internal standard, Cy3-labeled sample #1, and Cy5-labeled<br />
sample #2. The individual protein forms all coresolve in this one gel, but these<br />
three independently labeled populations of proteins can be individually imaged using<br />
mutually exclusive excitation/emission properties of the CyDyes. (B) Schematic of<br />
the sample loading matrix indicating gel number, CyDye labeling and three replicates<br />
(indicated as “1, 2, and 3”) of the four conditions being tested (A, B, C, D). Within<br />
the boxed regions representing each labeled sample is depicted a theoretical protein<br />
that is upregulated in condition D. Dotted lines illustrate how the protein signals from<br />
each sample are directly quantified relative to the Cy2 internal standard signal for<br />
that protein without interference from gel-to-gel variation, and how the Cy3:Cy2 and<br />
Cy5:Cy2 intragel ratios are normalized between the six gels. (C) A graphical representation<br />
of the normalized abundance ratios for this theoretical protein change. Adapted<br />
from (10).<br />
To obtain data sets robust enough to support accurate conclusions from<br />
clinical proteomics studies, it is therefore necessary<br />
to collect and store samples using stringent, closely followed<br />
protocols. It is also necessary to assess the biological variation within the<br />
population being tested and within a single individual. Interindividual<br />
variation has been the focus of several studies (17,18). Determining the<br />
typical diversity within a single patient (i.e., taking longitudinal samples and<br />
assessing variability in protein abundance) and between patients will determine<br />
the minimum number of patient samples required for an experiment. This is<br />
an essential step before embarking on any large-scale and potentially costly<br />
DIGE experiment. Without this type of pretest, underpowered<br />
experiments run the risk of being peppered with false information (both false<br />
positives and false negatives).<br />
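As a rough illustration of how a variability pretest translates into a minimum sample number, the sketch below uses the standard normal-approximation formula for a two-group comparison, n = 2(z₁₋α/₂ + z₁₋β)²(σ/δ)². This is a generic power calculation and not a method taken from the cited studies; the function name and the example values are assumptions, and because it ignores the t-distribution it tends to underestimate n when n is small.

```python
import math
from statistics import NormalDist

def samples_per_group(sd, min_difference, alpha=0.05, power=0.8):
    """Approximate samples per group needed to detect `min_difference`
    given a between-sample standard deviation `sd` (same units, e.g. log10 ratio)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    n = 2 * ((z_alpha + z_beta) * sd / min_difference) ** 2
    return math.ceil(n)

# Hypothetical pretest result: SD of 0.2 log10 units between patients,
# and the smallest change of interest is 0.3 log10 units (about twofold).
n_needed = samples_per_group(sd=0.2, min_difference=0.3)
```

Doubling the patient-to-patient standard deviation roughly quadruples the required sample number, which is why the pretest matters before committing to a large experiment.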
As with all complex technologies, the DIGE technique itself is subject<br />
to technical variation, which will be laboratory specific to a greater or lesser<br />
extent. However, the amplitude of this variation is generally outweighed by the<br />
biological variation associated with a typical sample set (19).<br />
1.2.3. Univariate Statistical Analyses<br />
To date, the majority of published quantitative proteomics studies using the<br />
DIGE technology have applied a univariate test, such as a Student’s t-test<br />
or analysis of variance (ANOVA), to identify protein species with significant<br />
changes in expression [(20) and Chapter 17 by Carpentier et al.]. These tests<br />
calculate the probability (p) of observing a difference at least as large as the one<br />
measured if the samples being compared were in fact the same, that is, if any<br />
apparent change in expression arose by chance alone. An<br />
expression change is considered significant if the calculated p-value falls<br />
below a prescribed significance threshold, typically 0.05 (whereby 1 in 20 tests<br />
may give a change in expression by chance). For more stringent analyses, a<br />
p-value of 0.01 is often used as the significance threshold.<br />
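A minimal sketch of a two-sample Student's t-test on standardized log abundances for a single spot is given here. The abundance values are invented, and the comparison uses the tabulated two-sided 5% critical value for 4 degrees of freedom (2.776) instead of computing an exact p-value, which would require a statistics library.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Hypothetical standardized log10 abundances for one spot, three replicates each.
control = [0.02, -0.01, 0.03]
treated = [0.31, 0.28, 0.35]

t, df = student_t(treated, control)

# Tabulated critical value of t for df = 4, two-sided alpha = 0.05.
T_CRIT_DF4_P05 = 2.776
significant = df == 4 and abs(t) > T_CRIT_DF4_P05
```

The pooled-variance form assumes equal variances in the two groups, which is one of the assumptions discussed in the text that follows.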
When employing these tests on DIGE datasets, there are several factors<br />
that must be considered if valid conclusions are to be drawn from the ensuing<br />
analyses. Student’s t-tests and ANOVA assume that the data are<br />
normally distributed and that the variance is homogeneous. The measurement<br />
and correction of systematic bias within DIGE experiments have been the<br />
subject of several studies, which chart methods to optimize normalization of<br />
data sets (21,22,23).<br />
Another important consideration is the false discovery rate (FDR), which<br />
becomes a concern whenever statistical tests such as those described above are applied.<br />
These tests involve the simultaneous and independent testing of thousands of<br />
spots. Even though the probability of a false positive is small for each individual test,<br />
a substantial number of false positives may accumulate across all tests. There are several<br />
approaches to estimate the FDR and adjust p-values to compensate for this,<br />
the most widely used to date being the Benjamini and Hochberg method, whose<br />
use in conjunction with DIGE data has been described by Fodor et al. (21).<br />
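The Benjamini and Hochberg adjustment is short enough to sketch directly. The function below is a plain-Python illustration of the standard step-up procedure; the p-values are invented, and the code is not taken from Fodor et al. or from any DIGE software.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values for five spots.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)

raw_hits = sum(1 for value in p if value < 0.05)       # unadjusted calls
adjusted_hits = sum(1 for value in q if value < 0.05)  # calls at 5% FDR
```

With these example values, four spots pass an unadjusted 0.05 threshold but only two survive at a 5% FDR, showing how the adjustment trims borderline calls.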
1.2.4. Multivariate Statistical Analyses<br />
Discovery-phase proteomics often produces large lists of proteins that are<br />
identified as changing significantly in the experiment, many of which may well<br />
be false positives. Another approach to overcome this is the application of<br />
additional multivariate statistical analyses to these datasets, which can help to<br />
filter out false positives that result from whole-sample outliers (i.e., sample<br />
misclassification and/or poor sample preparation technique). These analyses,<br />
such as principal component analysis (PCA), partial least squares discriminant<br />
analysis, and unsupervised hierarchical clustering (HC) (see Figs. 2 and 3 and<br />
Chapter 16 by Marengo et al.), have recently been applied to DIGE datasets<br />
(10,24,25,26,27,28,29,30,31,32). Raw and normalized data can be exported<br />
from most DIGE software solutions (e.g., DeCyder, Progenesis), and several<br />
multivariate analyses are now part of an extended data analysis (EDA) software<br />
module within the DeCyder suite of software tools (GE Healthcare), which<br />
was specifically developed for DIGE analysis (see Subheading 3.6).<br />
These multivariate analyses work essentially by comparing the expression<br />
patterns of all (or a subset of) proteins across all samples, using the variation<br />
of expression patterns to group or cluster individual samples. Technical noise<br />
(poor sample prep, run-to-run variation) and biological noise (normal differences<br />
between samples, especially present in clinical samples) are almost always<br />
associated with any analytical dataset of this nature, and may well override<br />
any variation that arises due to actual differences related to the biological<br />
questions being tested. Unsupervised clustering of related samples, therefore,<br />
adds confidence that the proteins on a list of changes from a DIGE experiment<br />
are not arising stochastically (10).<br />
Fig. 2. Illustration of the use of principal component analysis. DIGE was used<br />
to analyze changes in Staphylococcus proteins in response to genetic and chemical<br />
alterations affecting iron utilization. [Plot: samples projected onto PC1 and PC2, with groups labeled control, Δfur, Heme, and –Fe.] Adapted from (24).<br />
Fig. 3. Hierarchical clustering (by average distance correlation) of representative<br />
novel circadian proteins detected by 2D DIGE of soluble protein extracts from mouse<br />
liver. Pale gray represents low levels of protein expression, black represents intermediate<br />
levels, and dark gray represents high levels of expression. Adapted from (32).<br />
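The idea of grouping samples by their overall expression patterns can be illustrated with a much simpler stand-in for PCA or hierarchical clustering: a nearest-neighbour check on correlation distance (1 minus the Pearson correlation). This is a swapped-in, minimal technique for illustration only; it is not what the DeCyder EDA module does, and the expression profiles are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def nearest_neighbour(profiles):
    """For each sample, name the other sample with the smallest distance 1 - r."""
    nearest = {}
    for name, profile in profiles.items():
        others = [(1 - pearson(profile, p), other)
                  for other, p in profiles.items() if other != name]
        nearest[name] = min(others)[1]
    return nearest

# Hypothetical standardized log abundances of four proteins in four samples.
profiles = {
    "control-1": [0.0, 0.1, -0.1, 0.0],
    "control-2": [0.1, 0.0, -0.1, 0.1],
    "treated-1": [0.5, -0.4, 0.3, -0.2],
    "treated-2": [0.6, -0.5, 0.2, -0.3],
}
neighbours = nearest_neighbour(profiles)
```

If replicates of the same condition are each other's nearest neighbours without the grouping having been supplied, the pattern of changes is less likely to be driven by a single outlying sample.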
1.3. DIGE in the Clinical Setting<br />
Although the application of DIGE to clinical studies is only beginning<br />
to be explored [for example, see (29,30)], many studies have been published<br />
demonstrating the feasibility and benefit of DIGE/MS using small patient<br />
cohorts for preliminary studies in colon (14), liver (33,34,35), breast (36,37),<br />
esophageal (38,39), and pancreatic cancers (40), as well as other important<br />
clinical studies such as severe acute respiratory syndrome (SARS) (41). Many<br />
studies also explore the important benefit of procuring samples using laser<br />
capture microdissection (LCM – see Chapters 3, 5, and 9 by Diaz et al., Zhang<br />
et al., and Mustafa et al., respectively) for a highly enriched population of the<br />
cells under study (16,30,42,43,44). These LCM studies necessitate the use of<br />
the saturation chemistry owing to its increased sensitivity, at the cost of limited multiplexing<br />
power, and typically require secondary preparative gels with higher<br />
protein loads to enable protein identification by MS.<br />
The study of Suehara et al. (29) illustrates the utility of a multivariable<br />
DIGE/MS analysis with an extended sample set pertinent to a clinical study.<br />
Eighty soft tissue sarcoma samples comprising seven different histological<br />
backgrounds were analyzed. Using the saturation DIGE fluors, individual<br />
samples were labeled with Cy5 and multiplexed with a pooled-sample internal<br />
standard (labeled in bulk with Cy3) for each DIGE gel. Using high-resolution<br />
2D gel separations and a combination of multivariate statistical tools (support<br />
vector machines, leave-one-out cross-validation, PCA, and HC), these studies<br />
identified a small subset of proteins including tropomyosin and HSP27 that<br />
were able to discriminate between the different classes of tumors. HSP27 in<br />
particular was part of a subclass of discriminating proteins that could distinguish<br />
between leiomyosarcoma and malignant fibrous histiocytoma (MFH), as<br />
well as correlate with patient survival between low-risk and high-risk groups.<br />
HSP27 has long been associated with prognosis in MFH as well as in other<br />
human carcinomas (45).<br />
2. Materials<br />
This chapter assumes a solid understanding of 2D gel electrophoresis and<br />
will focus on the design and implementation of the DIGE method using the<br />
pooled-sample internal standard methodology and the minimal dye chemistry<br />
for Cy2, Cy3, and Cy5, with notes provided for saturation labeling chemistry.<br />
2.1. Cell Lysis Buffers<br />
1. TNE: 50 mM Tris–HCl pH 7.6, 150 mM NaCl, 2 mM EDTA pH 8.0, 2 mM<br />
DTT, 1% (v/v) NP-40.
2. RIPA buffer: 50 mM Tris–HCl pH 8.0, 150 mM NaCl, 1% NP-40, 0.5% deoxycholic<br />
acid, 0.1% SDS.<br />
3. Two-dimensional gel electrophoresis lysis buffer: 7 M urea, 2 M thiourea, 4%<br />
CHAPS, 2 mg/mL DTT, 50 mM Tris–HCl pH 8.0.<br />
4. ASB14 lysis buffer: 7 M urea, 2 M thiourea, 2% amidosulfobetaine 14, 50 mM<br />
Tris–HCl pH 8.0.<br />
NB: depending on the sample, it may also be necessary to add protease<br />
inhibitors and phosphatase inhibitors [sodium pyrophosphate (1 mM), sodium<br />
orthovanadate (1 mM), beta-glycerophosphate (10 mM) and sodium fluoride<br />
(50 mM)] to the chosen lysis buffer (see Subheading 3.1).<br />
2.2. SDS-Polyacrylamide Gel Electrophoresis<br />
1. Immobilized pH gradient (IPG) strips and accompanying ampholyte mixtures can<br />
be purchased from a number of commercial vendors. Strip lengths vary from<br />
7 cm to high-resolution 24 cm strips, and pH ranges vary from wide-range (e.g.,<br />
pH 3–11) to high-resolution narrow-range (e.g., pH 5–6) strips.<br />
2. Bind silane working solution (50 mL): 40 mL ethanol, 1 mL acetic acid, 50 μL<br />
bind silane solution (GE Healthcare), 9 mL water (see Note 9).<br />
3. 4× separating gel buffer: 1.5 M Tris-base pH 8.8.<br />
4. 30% acrylamide:bis-acrylamide (37.5:1), N,N,N′,N′-tetramethylethylenediamine (TEMED),<br />
and ammonium persulfate.<br />
5. 10× SDS-PAGE running buffer (1 L): 30.25 g Tris-base, 144.13 g glycine, 10 g<br />
SDS (0.1% in the 1× working buffer).<br />
6. Fixing solution for SyproRuby staining (1 L): 100 mL methanol, 70 mL acetic<br />
acid, 830 mL water. SyproRuby stain is available from several commercial<br />
sources and can be substituted by other total protein stains, such as Deep Purple<br />
(GE Healthcare) or Flamingo Pink (BioRad).<br />
7. Two-dimensional equilibration buffer: 6 M urea, 50 mM Tris-base pH 8.8, 30%<br />
glycerol, 2% SDS, trace bromophenol blue.<br />
8. Water-saturated butanol (see Note 10).<br />
9. Dithiothreitol (store desiccated).<br />
10. Iodoacetamide (store desiccated, keep in the dark).<br />
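The gram amounts in the 10× running buffer recipe can be cross-checked against the familiar 1× composition with a short calculation. The molecular weights below are standard values supplied here as assumptions (they are not stated in the recipe), and the variable names are illustrative.

```python
# Molecular weights in g/mol (standard values; not given in the recipe itself).
MW = {"Tris-base": 121.14, "glycine": 75.07, "SDS": 288.38}

def molarity(grams, mw, litres=1.0):
    """Molar concentration of `grams` of solute dissolved in `litres` of solution."""
    return grams / mw / litres

# 10x SDS-PAGE running buffer, grams per litre, as listed in item 5.
stock_10x = {"Tris-base": 30.25, "glycine": 144.13, "SDS": 10.0}

# Concentration in the 1x working buffer, in mM, after a tenfold dilution.
working_1x_mM = {name: molarity(g, MW[name]) / 10 * 1000
                 for name, g in stock_10x.items()}

# SDS as percent (w/v) in the 1x working buffer: g per 1000 mL, times 100, divided by 10.
sds_percent_1x = stock_10x["SDS"] / 1000 * 100 / 10
```

The result is approximately 25 mM Tris, 192 mM glycine, and 0.1% SDS at 1×, so the 0.1% figure in the recipe refers to the diluted working buffer, not the 10× stock.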
2.3. DIGE Labeling Materials<br />
1. N,N-dimethyl formamide (DMF) (see Note 11).<br />
2. Labeling (L) buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 30 mM Tris-base<br />
(do not adjust the pH, but ensure that the pH of the final solution is between 8.0 and 9.0), 5 mM<br />
magnesium acetate (see Note 12). Alternatively, 4% CHAPS can be replaced<br />
with 2% ASB14, especially in cases where membrane-rich samples are being<br />
used.<br />
3. Rehydration (R) buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 2 mg/mL DTT<br />
(13 mM; 0.2% w/v).<br />
4. Cyanine dyes with NHS-ester chemistry for minimal labeling (Cy2, Cy3, and<br />
Cy5), and with maleimide chemistry for saturation labeling (Cy3 and Cy5) are<br />
available from GE Healthcare as dry solids.<br />
5. Quenching solution (for minimal labeling): 10 mM lysine.<br />
6. Dithiothreitol reduction stock solution: 200 mg/mL DTT.<br />
3. Methods<br />
DIGE is a powerful technique for quantitative multivariable differential<br />
display proteomics. However, the quality of the data will only be as good as the<br />
quality of the underlying 2D gel electrophoresis technology upon which it is<br />
based. The main focus of this chapter is to provide detailed notes on the DIGE<br />
technology; however, some key considerations to successful high-resolution 2D<br />
gel electrophoresis are also provided. This section describes methods associated<br />
with labeling using minimal CyDyes.<br />
3.1. Sample Preparation<br />
The key to success for any analytical measurement begins with robust sample<br />
preparation. This not only includes the buffers and materials used, but also the<br />
nature of the samples and the way in which they are procured. The addition<br />
of exogenous materials (such as DNAse, RNAse), or allowing for uncontrolled<br />
manipulation of the sample (such as conditions that may lead to proteolysis) can<br />
severely hamper and sometimes completely prevent an analysis. Care should<br />
be taken to ensure against common laboratory contaminants (e.g., mycoplasma<br />
for tissue culture) that if present may be detected as significant changes using<br />
DIGE, either due to the presence in a subset of samples, or by responding to<br />
the experimental perturbation.<br />
1. Prepare protein extracts using any method of preference.<br />
The appropriate amount of protein can be subsequently precipitated prior to<br />
resuspension in the CyDye labeling buffer (see Subheading 3.2). Guard against<br />
proteolysis and loss of post-translational modifications (e.g., phosphorylation);<br />
this is critically important.<br />
Care should be taken not to use reagents that will resolve on the 2D gel, such as<br />
soybean trypsin inhibitor. Small molecule inhibitors such as aprotinin, leupeptin,<br />
pepstatin A, antipain, 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride<br />
(AEBSF), sodium orthovanadate, okadaic acid, and microcystin, among others,<br />
are far better choices.<br />
2. Lyse cells using standard lysis buffers such as TNE and RIPA buffers, or even<br />
the buffers used for 2D gel electrophoresis.
All of these buffers have the capability of producing high-resolution samples for<br />
2DE. In most cases, the presence of reagents that would otherwise interfere with<br />
CyDye labeling (such as those that contain primary amines) will be removed prior<br />
to labeling by protein precipitation (see Subheading 3.2).<br />
3. Sonicate cells if necessary to improve sample quality.<br />
Sonication improves sample quality by disrupting nucleic acids, which are subsequently<br />
removed by sample cleanup (see Subheading 3.2) along with phospholipids.<br />
Both of these nonproteinaceous ionic components can obliterate the<br />
resolution during IEF.<br />
Short bursts with a tip-sonicator are suggested. It is important to keep the system<br />
chilled, especially in the presence of urea-containing samples that should never<br />
be heated (see Note 12).<br />
4. Determine the protein concentration of the sample using an assay that is<br />
compatible with the buffer in which the proteins are extracted.<br />
CHAPS and thiourea in the buffers used for DIGE, although adequately<br />
chaotropic, interfere with the Bradford and bicinchoninic acid assays, making<br />
the data inaccurate and unreliable. In these cases, aliquots should be precipitated<br />
and resuspended in a suitable buffer prior to quantification, or a detergent-compatible<br />
assay should be used.<br />
5. Aim to use a protein concentration between 1 and 10 mg/mL.<br />
Too dilute and it will be difficult to quantitatively recover proteins following<br />
precipitation cleanup (see Subheading 3.2); too concentrated and it will be<br />
difficult to accurately dispense the appropriate volume for the experiment.<br />
Freeze/thawing should also be kept to a minimum; freezing samples in 1 mL<br />
aliquots or less will usually suffice.<br />
3.2. Sample Cleanup<br />
The desired amount of sample to be used in the experiment should be<br />
precipitated prior to labeling. This both removes nonproteinaceous ionic components from<br />
the sample (e.g., nucleic acids, phospholipids) that can interfere with IEF<br />
and transfers the proteins into a labeling buffer optimized for CyDye<br />
labeling and subsequent IEF. Determine how much total protein will be on<br />
each gel, and precipitate ½ of that amount for each sample to be run on that<br />
gel. This is straightforward for a two-component separation, but also works out<br />
for the multigel experiments where 1/3 of the total protein amount on each gel<br />
comes from the pooled-sample internal standard (see Table 1). Precipitate only<br />
what is needed for each sample for the experiment; too much material may<br />
create pellets that are difficult to resolubilize completely.
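The arithmetic behind "precipitate ½ of the per-gel amount" can be written out as a small helper. This is an illustrative sketch: the function name and the example extract concentration of 5 mg/mL are assumptions, while the 300 μg per gel figure and the 2:1 split follow the worked example in Table 1.

```python
def precipitation_plan(protein_per_gel_ug, extract_mg_per_ml):
    """Amounts per individual sample when two samples plus the pooled standard share a gel."""
    per_sample_ug = protein_per_gel_ug / 2            # precipitate half of the gel load
    return {
        "precipitate_ug": per_sample_ug,
        "label_ug": per_sample_ug * 2 / 3,            # labeled with Cy3 or Cy5
        "pool_ug": per_sample_ug / 3,                 # contributed to the Cy2 pool
        # mg/mL is numerically equal to ug/uL, so ug / (mg/mL) gives uL of extract.
        "extract_volume_ul": per_sample_ug / extract_mg_per_ml,
    }

# 300 ug total protein per gel; extract concentration of 5 mg/mL is hypothetical.
plan = precipitation_plan(protein_per_gel_ug=300.0, extract_mg_per_ml=5.0)
```

Each gel then carries 100 μg of Cy3-labeled sample, 100 μg of Cy5-labeled sample, and 100 μg of pooled standard, so the standard accounts for 1/3 of the load as stated.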
Table 1<br />
Experimental Design for CyDye Labeling Using a Pooled-Sample Internal Standard<br />
Samples: Gel 1 (Control-1, Treated-1); Gel 2 (Control-2, Treated-2); Gel 3 (Control-3, Treated-3); Pool<br />
Step | Each of the six individual samples | Pool<br />
Precipitated amount | 150 μg | <br />
L-buffer | 24 μL | <br />
Aliquot | 16 μL | 8 μL (×6)<br />
Cy2 | | 6 μL<br />
Cy3 | 2 μL (three of the six samples) | <br />
Cy5 | 2 μL (the other three samples) | <br />
30 min on ice in the dark<br />
Lysine (quench) | 2 μL | 6 μL<br />
10 min on ice in the dark<br />
Total volume | 20 μL | 60 μL<br />
For each gel, combine the quenched Cy3- and Cy5-labeled samples and add 1/3 of the<br />
quenched Cy2-labeled pooled mixture<br />
Step | Gel 1 | Gel 2 | Gel 3<br />
Combined samples | 20+20+20 μL | 20+20+20 μL | 20+20+20 μL<br />
2× R-buffer | 60 μL | 60 μL | 60 μL<br />
Total | 120 μL | 120 μL | 120 μL<br />
R-buffer | to Vf | to Vf | to Vf<br />
This table illustrates a typical DIGE labeling experiment, as described in Subheadings 3.2 and 3.3.<br />
Many precipitation methods are available; the following is a MeOH/CHCl3<br />
protocol that works well for DIGE and can be easily performed in 1.5 mL<br />
tubes [adapted from (46)]:<br />
1. Bring up predetermined amount of protein extract to 100 μL with water.<br />
2. Add 300 μL (3 volumes) water.<br />
3. Add 400 μL (4 volumes) methanol.<br />
4. Add 100 μL (1 volume) chloroform.<br />
5. Vortex vigorously and centrifuge; the protein precipitate should appear at the<br />
interface.<br />
6. Remove the water/MeOH mix on top of the interface, being careful not to<br />
disturb the interface. The precipitated proteins often do not form a visibly white<br />
layer, so take care even when no precipitate can be seen.<br />
7. Add another 400 μL methanol to wash the precipitate.<br />
8. Vortex vigorously and centrifuge; the protein precipitate should now pellet to<br />
the bottom of the tube.<br />
9. Remove the supernatant and briefly dry the pellets in a vacuum centrifuge.<br />
10. Resuspend the pellets in a suitable amount of CyDye labeling buffer (L-buffer,<br />
see Table 1).<br />
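The reagent volumes in the MeOH/CHCl3 protocol are given as multiples of the 100 μL starting volume, so they can be tabulated with a small helper. Scaling to other starting volumes is an assumption made here for illustration (the protocol itself specifies only the 100 μL case), and the function name is hypothetical.

```python
def meoh_chcl3_volumes(sample_volume_ul=100.0):
    """Reagent volumes (uL) for the water:methanol:chloroform 3:4:1 precipitation."""
    return {
        "water": 3 * sample_volume_ul,
        "methanol": 4 * sample_volume_ul,
        "chloroform": 1 * sample_volume_ul,
        "methanol_wash": 4 * sample_volume_ul,  # step 7, after removing the upper phase
    }

vols = meoh_chcl3_volumes(100.0)

# Total liquid in the tube at the first centrifugation (steps 1 to 5).
total_before_spin_ul = 100.0 + vols["water"] + vols["methanol"] + vols["chloroform"]
fits_in_tube = total_before_spin_ul <= 1500.0   # 1.5 mL tube
```

At the stated volumes the tube holds 900 μL at the first spin, which is why the protocol fits comfortably in a 1.5 mL tube.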
An alternative widely used precipitation method is as follows:<br />
1. Add 5 volumes of cold 0.1 M ammonium acetate in methanol.<br />
2. Leave at –20°C for 12 h or overnight.<br />
3. Centrifuge at ∼3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.<br />
4. A pellet of protein should be visible at this stage.<br />
5. To wash the pellet, add 80% 0.1 M ammonium acetate in methanol and mix to<br />
resuspend the protein.<br />
6. Centrifuge at 3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.<br />
7. To dehydrate the pellet add 80% acetone and resuspend the pellet by mixing.<br />
8. Centrifuge at 3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.<br />
9. Dry pellet for 15 min by leaving open tube in a laminar flow cabinet.<br />
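Because the rpm needed to reach 1400×g depends on the rotor, the conversion is worth having at hand. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The 13.9 cm radius is not stated in the protocol; it is an assumption chosen because it reconciles the quoted 3000 rpm with 1400×g.

```python
import math

def rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) for a rotor of the given radius in cm."""
    return 1.118e-5 * radius_cm * rpm ** 2

def rpm_for_rcf(target_rcf, radius_cm):
    """Rotor speed (rpm) needed to reach `target_rcf` at the given radius in cm."""
    return math.sqrt(target_rcf / (1.118e-5 * radius_cm))

# Assumed rotor radius that matches the protocol's 3000 rpm = 1400 x g.
force_at_3000 = rcf(3000, 13.9)

# Hypothetical smaller rotor (8 cm): speed needed for the same 1400 x g.
rpm_small_rotor = rpm_for_rcf(1400, 8.0)
```

On a smaller rotor the same force requires a noticeably higher speed, so it is safer to set the centrifuge by ×g than by rpm.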
3.3. DIGE Experimental Design<br />
1. Start with a preliminary gel. All experiments should start with a preliminary gel<br />
on representative samples to ensure equivalent protein amounts between samples,<br />
and that the highest resolution and sensitivity are obtained before embarking on a<br />
multigel DIGE experiment (see Notes 6 and 13). The preliminary gel will also show<br />
any problems with the sample preparation that may be corrected by adjusting the<br />
procurement methods (see Subheading 3.1). This step can also be used to determine<br />
the maximal amount of protein that can be loaded without adversely affecting resolution.<br />
The preliminary gel needs only to test one or two of the samples of a much<br />
larger experiment. This gel can simply be stained with a total protein stain (e.g.,<br />
Sypro Ruby or Deep Purple) to visually inspect the resolution and sensitivity.
Alternatively, the gel can contain two different samples prelabeled with Cy3<br />
and Cy5 and coresolved (see Note 14).<br />
2. Choose a suitable pH gradient for the IEF. Precast IEF strips are commercially<br />
available from several vendors. The longest strip length is currently 24 cm, providing<br />
the highest resolving power for a given pH range. Medium-range IEF gradients<br />
(e.g., pH 4–7) offer the best trade-off between overall resolution and sensitivity.<br />
Subsequent experiments can then be designed to resolve proteins in the basic<br />
range (pH 7–11) and in narrow pI ranges with commensurate increases in protein<br />
loading to gain access to the lower abundant proteins in a given sample (see<br />
Note 5). In this way a more comprehensive picture of the proteomes under study<br />
can be obtained.<br />
3. Incorporate a pooled-sample mixture internal standard on every DIGE gel in<br />
a coordinated experiment. This internal standard, usually labeled with Cy2, is<br />
composed of an equal aliquot of every sample in the entire experiment, and<br />
therefore represents every protein present across all samples in an experiment. The<br />
use of this pooled-sample internal standard on every DIGE gel in a coordinated<br />
experiment allows for the facile comparison of independent sample replicates<br />
with increased statistical confidence. This experimental design also enables the<br />
simultaneous quantitative comparison between multiple variables in a coordinated<br />
experiment (Fig. 1).<br />
4. Plan out which samples will be labeled with which dyes ahead of time. For<br />
minimal dye labeling chemistry (see Subheading 3.4), each gel will contain two<br />
individual samples labeled with either Cy3 or Cy5, and an equal amount of the<br />
pooled-sample internal standard. The example outlined in Table 1 is for a two-component<br />
comparison repeated in triplicate, with 300 μg total protein loaded<br />
onto each of three gels. In this case, 150 μg of each sample should be precipitated<br />
(see Subheading 3.2), resuspended in L-buffer and then split 2:1. Two-thirds<br />
of each sample (100 μg) will be individually labeled with either Cy3 or Cy5.<br />
The remaining 1/3 of each sample (50 μg) will be pooled together and labeled with Cy2<br />
to serve as an internal standard. This scheme yields enough<br />
Cy2-labeled internal standard to load the same amount (100 μg) as each Cy3- or Cy5-labeled sample<br />
onto each gel (see Note 15).<br />
3.4. CyDye Labeling<br />
All steps are performed on ice. The following protocol is for sample loading<br />
via rehydration of IPG strips, and assumes incorporation of a pooled-sample<br />
internal standard to coordinate many samples across multiple DIGE gels simultaneously.<br />
The steps are summarized in Table 1 (see Note 16).<br />
1. Resuspend precipitated sample in 24 μL labeling (L) buffer. Remove 8 μL (1/3<br />
of sample) and place into a new tube that will contain the pooled-sample internal<br />
standard (8 μL from all of the other individual samples will be pooled into this<br />
tube) (see Note 17).
2. CyDyes are purchased as dry solids and should be reconstituted to 10× stock<br />
solutions (1 nmol/μL) in fresh DMF. Dilute stock solutions of CyDyes 1:10 in<br />
fresh DMF to a final working concentration of 100 pmol/μL (see Note 11).<br />
3. Label each sample (50–250 μg) with 2–4 μL (200–400 pmol) of either Cy3 or Cy5<br />
working dilution for 30 min on ice in the dark. Label the pooled-sample mixture<br />
with 2–4 μL (200–400 pmol) of Cy2 working dilution for every equivalent amount<br />
of sample present in the pooled standard as compared with the individually labeled<br />
samples. That is, if 100 μg of each sample is labeled with 200 pmol of Cy3 or<br />
Cy5, then 50 μg of each of these samples is present in the pooled standard, and<br />
200 pmol of Cy2 is used for every 100 μg of pooled standard. (see Table 1 and<br />
Note 18).<br />
4. Quench reactions with 2 μL of 10 mM lysine for 10 min on ice in the dark.<br />
5. For each gel, combine the quenched Cy3- and Cy5-labeled samples and add 1/3<br />
of the quenched Cy2-labeled pooled mixture.<br />
6. To each tripartite mixture, add an equal volume of 2× R-buffer and incubate on<br />
ice for 10 min. 2× R-buffer is R-buffer supplemented with an additional 2 mg/mL<br />
DTT using the 200 mg/mL DTT stock solution. DTT is omitted from the L-buffer<br />
to prevent unfavorable interaction with the CyDyes. Adding an equal volume of<br />
2× R-buffer to the quenched reactions provides the reducing agents to the total<br />
reaction volume at a 1× final concentration.<br />
7. Add R-buffer (1× DTT concentration) to a final volume suggested by the manufacturer<br />
for the given IPG strip length (e.g., 450 μL for 24 cm strips). Add the<br />
appropriate volume of IPG buffer ampholines to 0.5% final (v/v) for IEF. Proceed<br />
with rehydration of dehydrated IPG strips for >16 h and proceed with IEF (see<br />
Subheading 3.5.3 and Note 19).<br />
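The volumes in steps 1 to 7 can be checked with a short bookkeeping function. This is a sketch under stated assumptions: the defaults reproduce the Table 1 example (six samples, 2 μL of dye and 2 μL of lysine per labeling reaction, 450 μL for a 24 cm strip), the function and key names are hypothetical, and the R-buffer top-up is computed before the small volume of IPG buffer is added.

```python
def labeling_plan(n_samples=6, resuspend_ul=24.0, dye_ul=2.0, quench_ul=2.0,
                  strip_volume_ul=450.0, ipg_fraction=0.005):
    """Volume bookkeeping (uL) for minimal labeling with a pooled Cy2 standard."""
    if n_samples % 2:
        raise ValueError("two individual samples are loaded per gel")
    n_gels = n_samples // 2
    pool_aliquot = resuspend_ul / 3                 # 1/3 of each sample goes to the pool
    sample_aliquot = resuspend_ul - pool_aliquot    # 2/3 is labeled with Cy3 or Cy5
    sample_total = sample_aliquot + dye_ul + quench_ul
    # The pool holds one sample-equivalent per gel, so dye and quench scale with n_gels.
    pool_total = n_samples * pool_aliquot + n_gels * (dye_ul + quench_ul)
    per_gel_mix = 2 * sample_total + pool_total / n_gels
    with_2x_r_buffer = 2 * per_gel_mix              # equal volume of 2x R-buffer
    return {
        "sample_total": sample_total,
        "pool_total": pool_total,
        "per_gel_mix": per_gel_mix,
        "with_2x_r_buffer": with_2x_r_buffer,
        "r_buffer_top_up": strip_volume_ul - with_2x_r_buffer,
        "ipg_buffer": ipg_fraction * strip_volume_ul,
    }

plan_volumes = labeling_plan()
```

With the defaults this gives 20 μL per labeled sample, 60 μL of pooled standard, 120 μL per gel after adding 2× R-buffer, and a 330 μL R-buffer top-up for a 450 μL rehydration volume.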
3.5. 2D Gel Electrophoresis and Poststaining<br />
As a result of the minimal labeling, quantification with the CyDyes is carried<br />
out on only 2–5% of the proteins that are labeled, and the labeled portion of<br />
the protein may migrate at a higher apparent molecular mass than the majority<br />
of the unlabeled protein due to the added mass and hydrophobicity of the dyes<br />
(exacerbated in lower M r species). To ensure that the maximum amount of<br />
protein is excised for subsequent in-gel digestion and MS, minimally labeled<br />
2D DIGE gels are poststained with a total protein stain such as SyproRuby or<br />
Deep Purple. Accurate excision is also ensured by preferentially affixing the<br />
second dimension gel to a presilanized glass plate during gel casting so that<br />
the gel dimensions do not change during the analysis (see Notes 20 and 21).<br />
These methods assume the use of the Ettan 2D electrophoresis system (GE<br />
Healthcare), but are easily adaptable to other commercially available systems.<br />
It also assumes usage of high-resolution 24 cm × 20 cm gels.<br />
1. Special gels for second dimension SDS-PAGE. Using low-fluorescence glass<br />
plates, pretreat one plate for each gel with 3–5 mL bind silane working solution,
carefully wiping the entire surface of the plate with a lint-free wipe. Leave treated<br />
plates covered with lint-free wipes for several hours to allow for sufficient outgassing<br />
of fumes (that may contain bind silane) before assembling gel plates and<br />
casting of second dimensional SDS-PAGE gels (see Note 22).<br />
2. Assemble plates and pour 12% homogeneous SDS-PAGE gel(s) using the appropriate<br />
amount of 30% stock acrylamide and 4× separating gel buffer for the<br />
volumes needed for the number of gels being poured (see Note 23). Overlay the<br />
gels with water-saturated butanol for several hours to provide a straight and level<br />
surface to place the focused IPG strip (see Note 10).<br />
3. Perform IEF using an IPGphor II IEF unit (GE Healthcare) of the combined<br />
tripartite-labeled samples, brought up to final volume with 1× R-buffer and<br />
passively rehydrated into IPG strips for >16 h (see Subheading 3.4, step 7, and<br />
Note 24).<br />
4. Equilibrate the focused IPG strips into the second dimensional equilibration buffer.<br />
During this step, the cysteine sulfhydryls in the focused proteins are reduced<br />
and carbamidomethylated by supplementing the equilibration buffer with 1%<br />
DTT for 20 min at room temperature, followed by 2.5% iodoacetamide in fresh<br />
equilibration buffer for an additional 20 min room temperature incubation (see<br />
Note 25).<br />
5. Place equilibrated IPG strip on top of the SDS-PAGE gels that were precast with<br />
low-fluorescence glass plates. Use a thin card or ruler to carefully tamp down the<br />
IPG strip to the SDS-PAGE gel, removing air bubbles at the interface (see Notes<br />
26 and 27).<br />
6. Perform second dimensional SDS-PAGE at constant wattage, using ≪1 W/gel<br />
for at least 1 h prior to ramping up to
112 Friedman and Lilley<br />
two spot patterns, whereas most of the commercial products contain proprietary<br />
algorithms for protein spot detection, intergel matching, protein spot quantification,<br />
and even utilities for building web-based tools for data dissemination.<br />
Many include the ability to average replicate patterns into a single virtual<br />
pattern to be used in a comparative study. They are all designed to compare<br />
multiple spot patterns and quantify abundance changes for individual proteins<br />
between experimental conditions.<br />
Several software packages allow for the analysis of DIGE data. The DeCyder<br />
suite of software tools was specifically developed to support the DIGE platform<br />
when this technology was first marketed by GE Healthcare and is therefore used<br />
as an example here. The differential in-gel analysis (DIA) module of DeCyder<br />
is used for direct quantification of protein spot volume ratios between the triply<br />
codetected signals emanating from each resolved protein, and can be used for<br />
the simplest form of a DIGE experiment for pairwise comparisons with N = 1.<br />
The more advanced DIGE experiments that use the internal standard to cross-compare<br />
replicate samples from pairwise and multivariable analyses (N > 3)<br />
are handled by the biological variation analysis (BVA) module of DeCyder. In<br />
a BVA experiment, the signals emanating from the internal standard are used<br />
both for direct quantification within each DIGE gel in a coordinated set (using<br />
the DIA module) and for normalization and<br />
protein spot pattern matching between gels (see Note 31). This allows for the<br />
calculation of Student’s t-test and ANOVA statistics for individual abundance<br />
changes (see Subheading 3.6.2, and Table 2). BVA is also used to match<br />
patterns between SyproRuby- and CyDye-stained images to facilitate protein<br />
excision for subsequent MS (see Notes 20, 21, and 30).<br />
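The role of the internal standard can be illustrated with a minimal sketch. It is not<br />
the DeCyder implementation; it only assumes that each spot volume is divided by the<br />
internal-standard volume of the same spot on the same gel and then log-transformed.<br />
All spot volumes, dictionary keys, and the function name are hypothetical.<br />

```python
import math

def standardized_log_abundance(sample_volume, standard_volume):
    """log10 of a spot volume relative to the internal standard on the same gel."""
    if sample_volume <= 0 or standard_volume <= 0:
        raise ValueError("spot volumes must be positive")
    return math.log10(sample_volume / standard_volume)

# Hypothetical volumes for one matched protein spot on two gels. Each gel
# carries the pooled internal standard plus two individual samples.
gels = [
    {"standard": 1000.0, "control": 900.0, "treated": 1800.0},
    {"standard": 500.0, "control": 450.0, "treated": 900.0},
]

for gel in gels:
    # Quantification happens within a gel, so gel-to-gel differences in
    # overall signal cancel out in the ratio to the internal standard.
    gel["log_control"] = standardized_log_abundance(gel["control"], gel["standard"])
    gel["log_treated"] = standardized_log_abundance(gel["treated"], gel["standard"])
```

Although the two hypothetical gels differ twofold in overall signal, the standardized<br />
values agree between gels, which is what allows replicates on different gels to be compared.<br />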
3.6.2. Experimental Design and Statistical Confidence<br />
In the simplest form of a DIGE experiment, two or three samples are<br />
separately labeled with one of the three dyes and separated in the same gel for<br />
direct pairwise comparisons. In this case, the software first normalizes the entire<br />
signal for each CyDye channel and then calculates the protein spot volume<br />
ratio for each protein pair. A normal distribution is modeled over the actual<br />
distribution of protein pair volume ratios, and two standard deviations from the<br />
mean of this normal distribution mark the 95% confidence level for<br />
significant abundance changes.<br />
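The two-standard-deviation threshold can be sketched as follows. This is a simplified<br />
illustration, not the software's algorithm: it assumes log10 volume ratios and uses the<br />
plain sample mean and standard deviation rather than a fitted normal model. The data<br />
and function names are hypothetical.<br />

```python
import statistics
from statistics import NormalDist

def two_sd_thresholds(log_ratios):
    """Return (low, high) limits at two standard deviations from the mean."""
    mean = statistics.fmean(log_ratios)
    sd = statistics.stdev(log_ratios)  # sample standard deviation
    return mean - 2 * sd, mean + 2 * sd

def flag_changes(log_ratios):
    """Mark each ratio that falls outside the two-standard-deviation limits."""
    low, high = two_sd_thresholds(log_ratios)
    return [ratio < low or ratio > high for ratio in log_ratios]

# Fraction of a normal distribution lying within two standard deviations
# of its mean (roughly 95%).
coverage = NormalDist().cdf(2) - NormalDist().cdf(-2)

# Hypothetical log10 volume ratios for ten protein pairs from a single gel.
log_ratios = [0.0, 0.05, -0.05, 0.02, -0.02, 0.03, -0.03, 0.01, -0.01, 0.9]
flags = flag_changes(log_ratios)
```

Only the last ratio is flagged. Because the limits come from the spread of all ratios on<br />
the gel, a large change widens them and can mask smaller changes.<br />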
Table 2<br />
Statistical Applications of DeCyder Biological Variation Analysis and Extended<br />
Data Analysis (EDA) Modules<br />
Statistic | Description<br />
Average ratio | Calculated for each protein spot feature between two groups or experimental conditions. Derived from the log standardized protein abundance changes that were directly quantified within each DIGE gel relative to the internal standard for the protein spot feature.<br />
Student’s t-test | Univariate test of statistical significance for an abundance change between two groups or experimental conditions. p-values reflect the probability that the observed change has occurred due to stochastic chance alone. With DIGE, p-values of<br />
One-way ANOVA |<br />
Two-way ANOVA |<br />
Principal component analysis (EDA only) |<br />
Hierarchical clustering (EDA only) |<br />
K-means (EDA only) |<br />
Self organizing maps (EDA only) |<br />
Gene shaving (EDA only) |<br />
Discriminant analysis (EDA only) |<br />
This N = 1 type of experiment has limited statistical power, since the 95%<br />
confidence interval is determined based on the overall distribution of<br />
changes within the population (see Note 32). Many more changes in abundance<br />
of much lesser magnitude can be detected with much greater statistical confidence<br />
(Student’s t-test and ANOVA, Table 2) by incorporating independent<br />
replicate samples into the experiment (see Note 33). The number of replicates<br />
required in a study depends on the amount of variation in the system being<br />
investigated. Increasing the number of replicates will increase confidence in<br />
smaller changes in expression. The number of gel replicates that are needed for<br />
the experiment to have sufficient sensitivity to detect expression changes can<br />
be determined using power calculations (for example, see (19)).<br />
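A rough power calculation can be sketched as follows. This is not the procedure of (19);<br />
it assumes a two-group comparison, equal variance in both groups, and the standard<br />
normal approximation for sample size, which tends to underestimate the requirement<br />
when few replicates are used. The function name, defaults, and input values are hypothetical.<br />

```python
import math
from statistics import NormalDist

def replicates_per_group(sd, delta, alpha=0.05, power=0.8):
    """Approximate replicates per group needed to detect a difference `delta`.

    sd    -- expected standard deviation of the log standardized abundance
    delta -- smallest difference in mean log abundance worth detecting
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)

# Hypothetical system with sd = 0.1 log10 units.
n_large_change = replicates_per_group(sd=0.1, delta=0.2)
n_small_change = replicates_per_group(sd=0.1, delta=0.1)
```

Under these assumptions, halving the change to be detected roughly quadruples the<br />
number of replicates (4 versus 16 per group in this example).<br />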
With replicate samples, the Student’s t-test and ANOVA statistics are<br />
measuring the significance of the variation of a specific protein change,<br />
independent of the overall distribution of abundance changes in the population.<br />
Incorporating replicate samples into the experimental design also controls for<br />
unexpected variation introduced into the samples during sample preparation.<br />
This design not only allows for the identification of abundance changes that<br />
are consistent across multiple replicates of an experiment, but can also identify<br />
significant abundance changes that would not have been identified even if the<br />
analyses were performed using Cy3- and Cy5-labeled samples on the same<br />
gels, but without the pooled-sample internal standard to coordinate them (14).<br />
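The per-protein test on replicate samples can be sketched as a pooled-variance Student’s<br />
t statistic computed on log standardized abundances. This is a minimal illustration with<br />
hypothetical data and names, not the BVA module itself. Converting the statistic to a<br />
p-value additionally requires the t distribution, which a statistics library such as<br />
SciPy (`scipy.stats.ttest_ind`) provides.<br />

```python
import math
import statistics

def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance assumes both groups share the same underlying variance.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / standard_error, df

# Hypothetical log standardized abundances of one protein spot,
# four independent replicates per condition.
control = [0.00, 0.02, -0.02, 0.00]
treated = [0.10, 0.12, 0.08, 0.10]
t_value, df = student_t(control, treated)
```

Here a change of only 0.1 log units gives a t statistic of about 8.66 with 6 degrees of<br />
freedom, because the test depends on the variation of this one protein across replicates<br />
rather than on the distribution of changes across all proteins.<br />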
3.6.3. Multivariate Statistical Analysis<br />
Univariate analyses such as the Student’s t-test and ANOVA have traditionally<br />
been used in DIGE experiments to provide a list of statistically significant<br />
changes in protein abundance. The application of multivariate statistical<br />
analyses (as outlined in Subheading 1.2.4) allows for the assessment of<br />
changes on a global scale, and can bring added insight to the usual “list of<br />
proteins” generated. Most software packages support the export of raw and<br />
normalized protein spot volumes to allow for these additional statistical tests<br />
and data manipulations; in addition, the DeCyder suite of software tools now<br />
provides an Extended Data Analysis (EDA) module that includes many of these<br />
tools (Table 2). These tools are becoming more evident in recent DIGE<br />
publications (10,24,28,29,30,32,52). Although these multivariate analyses are<br />
especially beneficial when analyzing a DIGE experiment that contains three or<br />
more conditions, they can also be useful in two-condition comparisons to detect<br />
sample outliers, fouled samples, or even poor experimental design.<br />
Figure 2 illustrates an example of PCA applied to a DIGE dataset comprising<br />
four experimental conditions, each measured in quadruplicate. PCA simplifies<br />
multidimensional datasets by reducing the variation down to the two or three<br />
most significant sources of variation. In this example, the first principal<br />
component (PC1) accounts for 62.3% of the variation amongst 156 proteins<br />
of interest, with the second principal component (PC2) accounting for an<br />
additional 12.5% of the variation. Each sample datapoint describes the collective<br />
expression profile for the subset of 156 proteins, and PC1 and PC2 orthogonally<br />
divide the samples into quadrants based on these two largest sources of variation<br />
within the DIGE dataset. In this case, approximately 75% (62.3% + 12.5%) of the<br />
variance between these proteins<br />
clusters the samples into the proper categories (adapted from (24)).<br />
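How PCA assigns a share of the variation to each component can be sketched in plain<br />
Python. This is an illustrative implementation using power iteration with deflation,<br />
not the method used by DeCyder EDA or in (24). The tiny data matrix (samples in rows,<br />
proteins in columns) and all function names are hypothetical.<br />

```python
import math

def center_columns(rows):
    """Subtract each column (protein) mean so every protein is centered on zero."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    n, p = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_component(matrix, iterations=200):
    """Power iteration: (eigenvalue, unit eigenvector) of the largest component."""
    p = len(matrix)
    # Non-uniform start vector; the method fails only if the start happens to
    # be exactly orthogonal to the leading component.
    vector = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v * x for v, x in zip(vector, product))
    return eigenvalue, vector

def pca(rows, n_components=2):
    """Return [(fraction of variance, sample scores), ...] per component."""
    centered = center_columns(rows)
    matrix = covariance_matrix(centered)
    p = len(matrix)
    total_variance = sum(matrix[i][i] for i in range(p))
    results = []
    for _ in range(n_components):
        eigenvalue, vector = leading_component(matrix)
        scores = [sum(v * x for v, x in zip(vector, row)) for row in centered]
        results.append((eigenvalue / total_variance, scores))
        # Deflation: remove the component just found before extracting the next.
        matrix = [[matrix[i][j] - eigenvalue * vector[i] * vector[j]
                   for j in range(p)] for i in range(p)]
    return results

# Hypothetical abundances: four samples (rows) by three proteins (columns).
samples = [[4.0, 3.0, 5.0], [2.0, 5.0, 5.0], [-2.0, 1.0, 5.0], [0.0, -1.0, 5.0]]
(pc1_fraction, pc1_scores), (pc2_fraction, pc2_scores) = pca(samples)
```

For this toy matrix PC1 carries 80% and PC2 the remaining 20% of the variance, and the<br />
PC1 scores separate the first two samples from the last two.<br />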
Figure 3 is taken from a 2D DIGE study, which determined the change in<br />
protein abundance in mouse liver over a 24 h period. In this study, proteins<br />
were harvested from groups of mice on a second cycle after transfer from<br />
synchronized (12 h light:12 h dim red light) to free running conditions (constant<br />
dim red light). Proteins were extracted from each liver and pooled from six<br />
mice per 4-h time point. HC (by average distance correlation) was used to<br />
investigate the expression of 49 novel circadian proteins. This gave a range<br />
of phase groups with 10 proteins peaking during the subjective day and 39<br />
proteins distributed between two clusters, which were most abundant during<br />
the subjective night (adapted from (32)).<br />
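Hierarchical clustering of expression profiles can be sketched as follows. The sketch<br />
assumes a correlation-based distance (1 minus the Pearson correlation) and average<br />
linkage, which is one reading of “average distance correlation”; the exact settings used<br />
in (32) are not stated here. The six-point profiles and all names are hypothetical.<br />

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)

def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    names = list(profiles)
    distance = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }

    def linkage(first, second):
        # Average of all pairwise distances between members of two clusters.
        total = sum(distance[(a, b)] for a in first for b in second)
        return total / (len(first) * len(second))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Hypothetical abundance profiles at six 4-h time points.
profiles = {
    "protein_A": [1, 4, 5, 2, 1, 1],  # peaks early
    "protein_B": [2, 5, 6, 3, 2, 2],  # same shape as A, higher baseline
    "protein_C": [1, 1, 2, 5, 6, 3],  # peaks late
    "protein_D": [0, 1, 1, 4, 6, 2],  # peaks late
}
clusters = average_linkage_clusters(profiles, n_clusters=2)
```

Because the distance depends on profile shape rather than absolute abundance, proteins<br />
A and B group together despite their different baselines, as do C and D.<br />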
Finally, additional information may be gleaned by mapping proteins found<br />
to be changing by DIGE to existing biological pathways and networks.<br />
Many software solutions and services are becoming available for this type<br />
of extended analysis (e.g., KEGG pathways, Ingenuity Pathways Analysis,<br />
WebGestalt, DeCyder EDA). Although additional validation is necessary to<br />
establish biological significance, the mapping of members of a “list of proteins”<br />
to established pathways and networks can provide validating support for the<br />
proteins observed by DIGE alone. In some cases, it can also indicate potential<br />
proteins associated with the biological question that were not accessible in the<br />
DIGE analysis. For example, Friedman et al. (10) recently reported the use of<br />
network/pathway mapping for proteins found by DIGE/MS in MCF10A cells<br />
overexpressing the HER2 receptor after treatment with TGF-β. The majority of<br />
proteins identified with DIGE/MS mapped to a network of pathways involving<br />
TGF-β as a major hub, but also included an intercalating pathway involving p53<br />
that affected many proteins that were independently identified in the DIGE/MS<br />
experiments. This insight linking new players to those identified with DIGE/MS<br />
led to the further investigation of a direct role for p53 in the expression of the<br />
tumor suppressor maspin (53).<br />
4. Notes<br />
1. 2DE has traditionally been a popular method for differential display proteomics<br />
on a global scale, but until recently, these strategies lacked the ability to directly<br />
quantify abundance changes in the same fashion as in stable isotope LC/MS-based<br />
strategies (2,3,4). This has been mainly due to the inability to directly<br />
correlate migration patterns and protein staining between gel separations (gel-to-gel<br />
variation). Stable isotopes have been used in gel-based proteomics as<br />
well, whereby different proteomes have been separately labeled with different<br />
stable isotopes (e.g., growing cells using 14N- vs. 15N-labeled medium) prior to<br />
mixing and running together through the same 2DE separation (5). In this case,<br />
abundance changes can be monitored during the mass spectrometry (MS) stage<br />
on individual proteins, but this requires in-gel digestion and MS of every protein<br />
present to discover the subset of proteins that is changing.<br />
2. Both hydrophobicity and molecular weight influence how proteins migrate<br />
during SDS-PAGE, yielding information on apparent molecular mass.<br />
3. In comparison, commonly used silver or colloidal Coomassie blue (ca. 5–10 ng<br />
sensitivity) stains typically exhibit a dynamic range of less than two orders of<br />
magnitude (8,9). The CyDye labeling system is compatible with the downstream<br />
processing commonly used to identify proteins via MS and database interrogation,<br />
which involves the generation of tryptic peptides within excised gel<br />
plugs. Trypsin cleaves the peptide bonds on the C-terminal side of lysine and<br />
arginine residues, but peptide generation is mostly unhindered as so few lysine<br />
residues are modified by dye labeling.<br />
4. DIGE experiments can still be performed using the internal standard methodology<br />
with only two CyDyes, but twice as many gels are required to analyze the<br />
same number of samples compared with the three-dye minimal labeling scheme.<br />
With saturation labeling, one dye is used to label the internal standard, and the<br />
other is used to label individual samples. A dye-swap scheme is not necessary<br />
in this case because the individual samples are always labeled with the same<br />
CyDye.<br />
5. Hydroxyethyl disulfide (commercially available as “DeStreak<br />
reagent”), combined with anodic cup loading, should be used for enhanced<br />
resolution for IEF above pH 8 (11).<br />
6. Running every DIGE gel with the maximal amount of protein (without adversely<br />
affecting first dimension resolution) not only enables detection of lower<br />
abundance proteins, but also provides more material for subsequent protein<br />
identification using MS. This makes every gel in a coordinated DIGE experiment<br />
a “pick-able” gel, without the need to run subsequent preparative gels<br />
with increased protein load that then have to be carefully matched to a lower<br />
abundance analytical gel. When combined with narrow range IEF, maximizing<br />
the protein amount also allows interrogation of the lower abundance proteins in<br />
a sample.<br />
7. If one sample within a study has very skewed protein distributions compared<br />
with others, then many of the “novel proteins” within this sample will effectively<br />
be diluted out in the pool. Such a sample outlier can be easily identified using<br />
the multivariate statistical analyses described.<br />
8. Repetition not only enables the identification of subtle differences with statistical<br />
confidence, it is also vital to control for nonbiological variation. In most cases<br />
biological variation will outweigh technical variation; therefore, only biological<br />
replicates are necessary. Thus it is important that each replicate sample is derived<br />
from an independent experiment, ideally performed on different occasions and<br />
perhaps using different batches of medium. The independent samples can then be<br />
analyzed coordinately using the pooled-sample internal standard methodology.<br />
See Table 1 for an example of this design.<br />
9. All solutions should be prepared using water that has a resistivity of 18.2 MΩ·cm;<br />
this is referred to as “water” throughout the text.<br />
10. Mix equal parts of butanol and water and shake vigorously. Let the two phases<br />
separate overnight, and use the butanol phase for overlay. Butanol that is not<br />
completely water saturated can extract water from the top of the gel. A more<br />
recent improvement is to use a 0.1% SDS solution in a conventional spray bottle,<br />
used to carefully spray a fine mist over the top of the gels to thoroughly cover<br />
the top of the gel (the gel/overlay interface will not be as obvious).<br />
11. DMF can degrade, producing amines, which can react with the NHS-ester<br />
CyDyes. DMF stocks should be kept fresh (
dimension due to MW and hydrophobicity shifts. Overlabeling results in side<br />
reactions with the epsilon-amine groups of lysine side chains, but since the<br />
maleimide dyes do not carry compensatory charge, this results in the overall<br />
loss of a charge, which creates a series of isoelectric forms in the first dimension<br />
(“charge trains”). Labeling buffer should not contain any components with free<br />
thiols, as these will react with the satCyDyes.<br />
17. L-buffer volume can be increased if necessary for complete resolubilization,<br />
although 100–250 μg or more should resolubilize readily in this volume. The<br />
volume of labeling buffer used for resolubilization should not exceed 40 μL per<br />
sample when using cup loading for sample entry to ensure that the final volumes<br />
will not exceed the capacity of the cup loading (ca. 100–150 μL).<br />
18. These methods are provided assuming that all gels to be run will be used both<br />
for analytical (quantification) as well as preparative (providing material for<br />
subsequent MS) purposes. Current recommendations from the manufacturer are<br />
to label 50 μg of sample with 400 pmol CyDye. A sufficient amount of unlabeled<br />
sample can be added to the quenched reactions to achieve final protein amounts<br />
that facilitate subsequent MS. Alternatively, many have found that the ratios can<br />
be adjusted to label increasing amounts of sample (up to 200 μg with 200 pmol<br />
dye) without adversely affecting the overall labeling reaction (presented here).<br />
19. If samples are to be introduced using anodic cup loading, simply bring this<br />
mixture up to 100 μL in R-buffer and proceed with cup loading. R-buffer can<br />
always be supplemented with additional DTT using the 200 mg/mL DTT stock<br />
solution. In the presence of DeStreak reagent for focusing in pH ranges above pH<br />
8, the addition of an equal volume of 1× R-buffer should provide a sufficient amount<br />
of DTT without interfering with the DeStreak reagent.<br />
20. Comparison of minimally labeled protein 2D maps with unlabeled protein maps<br />
is generally not a problem, as the addition of only one dye molecule does not<br />
generally prevent the facile matching of small alterations in protein mobility<br />
between the 2–5% of protein that is labeled and the remaining unlabeled protein that<br />
will provide enough material for MS.<br />
21. Poststaining is not necessary with saturation DIGE, since an unlabeled population<br />
with potentially different migration characteristics will not exist.<br />
22. This treatment binds the gel to one of the glass plates and therefore prevents<br />
shrinking/swelling during the poststaining and protein excision processes,<br />
thereby facilitating accurate robotic protein excision. Nothing should be placed<br />
on top of wipes that are covering bind silane-treated plates, as this may leave<br />
impressions that are detected during the scanning phase. Assembly and casting<br />
too soon may create a binding surface on the opposite glass plate, preventing<br />
the gel from being subsequently poststained and picked. Automated protein excision<br />
can be facilitated for certain systems by placing fluorescent alignment reference<br />
targets on the plate, which can be performed at this stage.<br />
23. A stacking gel is not required for 2D gel electrophoresis, as the proteins are<br />
effectively “stacked” to the height of the IPG strip. SDS is also not essential in the<br />
separating gel, as the SDS associated with the proteins during the equilibration
step, and present in the running buffer, is sufficient (although many traditionally<br />
use it in the separating gel). Using 2× concentration running buffer in the upper<br />
buffer chamber can produce higher quality separations in some circumstances.<br />
24. Samples of similar nature should always be focused simultaneously for optimal<br />
reproducibility. Focusing programs vary for some pH gradients. A typical<br />
program for many ranges is 500 V for 500 V-h, stepping to 1000 V for 1000 V-h,<br />
followed by a final step to 8000 V until >50 V-h has been reached. Check<br />
recommendations from specific vendors.<br />
25. Use a large volume of equilibration buffer to ensure sufficient removal of<br />
ampholines and other components of the first dimensional run.<br />
26. Carefully wash out any remaining liquid on top of the SDS-PAGE gel. Prewet<br />
the IPG strip with 1× running buffer and place the strip between the gel plates<br />
with the plastic backing adhering to the inside surface of one of the glass plates.<br />
The prewetted running buffer will facilitate the manipulation of the IPG strip<br />
down the inside surface of the plate and on top of the SDS-PAGE gel.<br />
27. An agarose overlay, used by many protocols, is not absolutely necessary to<br />
ensure proper contact between the IPG strip and the second dimensional SDS-<br />
PAGE gel. Using a thin card or ruler to carefully tamp down the IPG strip to<br />
the gel is usually sufficient and removes the added problems associated with the<br />
overlay, such as trapped air bubbles in the solidified agarose.<br />
28. Running gels at less than 1 W/gel can improve resolution in the high molecular<br />
weight regions of the second dimension gel. Use wattage appropriate for the<br />
second dimensional unit being used. Many different gel units can accommodate<br />
increased power by compensating for the increased heat.<br />
29. Absorption/emission maxima in DMF are 491/506 nm for Cy2, 553/572 nm for Cy3,<br />
and 648/669 nm for Cy5. However, care must be taken to scan in regions of each<br />
spectrum that do not contain absorbance or emission in the other spectra, which<br />
may mean using a nonmaximal region of a given spectrum.<br />
30. Comparison of the 2D spot maps between saturation-labeled samples and<br />
minimally labeled or unlabeled samples is impossible, as proteins containing<br />
multiple cysteine residues may appear as significantly larger Mr species when<br />
labeled with the saturation dyes, which of course cannot be predicted without<br />
first knowing the protein identity.<br />
31. Almost all software packages for 2D electrophoresis involve matching of protein<br />
spot patterns between gels. For DeCyder, it is used in the BVA module to match<br />
the quantitative data obtained from the triply coresolved protein signals from<br />
each gel in the DIA module (where gel-to-gel variation does not come into<br />
play). Manual verification of the matching is almost always required with any<br />
software package.<br />
32. There are many “all-or-none” types of experiments where the single gel<br />
comparison may be valid, and subtle changes are not expected. Nevertheless,<br />
using independent replicates and the pooled-sample internal standard methodology<br />
is still needed to control for nonbiological sample preparation error.
33. The multigel approach allows many data points to be collected for each group<br />
to be compared. Spots of interest can be selected by looking for significant<br />
change across the groups. Student’s t-test and ANOVA probability scores (p)<br />
indicate the probability that the observed change occurred due to stochastic,<br />
random events (null hypothesis). Probability values
9. Lilley, K.S., Razzaq, A. and Dupree, P. (2002) Two-dimensional gel<br />
electrophoresis: recent advances in sample preparation, detection and quantitation.<br />
Curr Opin Chem Biol 6(1):46–50.<br />
10. Friedman, D.B., Wang, S.E., Whitwell, C.W., Caprioli, R.M. and Arteaga, C.L.<br />
(2007) Multi-variable difference gel electrophoresis and mass spectrometry: A<br />
case study on TGF-beta and ErbB2 signaling. Mol Cell Proteomics 6:150–69.<br />
11. Olsson, I., Larsson, K., Palmgren, R. and Bjellqvist, B. (2002) Organic disulfides<br />
as a means to generate streak-free two-dimensional maps with narrow range basic<br />
immobilized pH gradient strips as first dimension. Proteomics 2(11):1630–32.<br />
12. Wolters, D.A., Washburn, M.P. and Yates, J.R. 3rd (2001) An automated multidimensional<br />
protein identification technology for shotgun proteomics. Anal Chem<br />
73(23):5683–90.<br />
13. Alban, A., David, S.O., Bjorkesten, L., Andersson, C., Sloge, E., Lewis, S. and<br />
Currie, I. (2003) A novel experimental design for comparative two-dimensional gel<br />
analysis: two-dimensional difference gel electrophoresis incorporating a pooled<br />
internal standard. Proteomics 3(1):36–44.<br />
14. Friedman, D.B., Hill, S., Keller, J.W., Merchant, N.B., Levy, S.E., Coffey, R.J.<br />
and Caprioli, R.M. (2004) Proteome analysis of human colon cancer by two-dimensional<br />
difference gel electrophoresis and mass spectrometry. Proteomics<br />
4(3):793–811.<br />
15. Gerbasi, V.R., Weaver, C.M., Hill, S., Friedman, D.B. and Link, A.J. (2004) Yeast<br />
Asc1p and mammalian RACK1 are functionally orthologous core 40S ribosomal<br />
proteins that repress gene expression. Mol Cell Biol 24(18):8276–87.<br />
16. Sitek, B., Luttges, J., Marcus, K., Kloppel, G., Schmiegel, W., Meyer, H.E.,<br />
Hahn, S.A. and Stuhler, K. (2005) Application of fluorescence difference gel<br />
electrophoresis saturation labelling for the analysis of microdissected precursor<br />
lesions of pancreatic ductal adenocarcinoma. Proteomics 5(10):2665–79.<br />
17. Hu, Y., Malone, J.P., Fagan, A.M., Townsend, R.R. and Holtzman, D.M. (2005)<br />
Comparative proteomic analysis of intra- and interindividual variation in human<br />
cerebrospinal fluid. Mol Cell Proteomics 4(12):2000–9.<br />
18. Zhang, X., Guo, Y., Song, Y., Sun, W., Yu, C., Zhao, X., Wang, H., Jiang, H.,<br />
Li, Y., Qian, X., Jiang, Y. and He, F. (2006) Proteomic analysis of individual<br />
variation in normal livers of human beings using difference gel electrophoresis.<br />
Proteomics 6(19):5260–68.<br />
19. Karp, N.A., Spencer, M., Lindsay, H., O’Dell, K. and Lilley, K.S. (2005)<br />
Impact of replicate types on proteomic expression analysis. J Proteome Res 4(5):<br />
1867–71.<br />
20. Meunier, B., Dumas, E., Piec, I., Bechet, D., Hebraud, M. and Hocquette, J.F.<br />
(2007) Assessment of hierarchical clustering methodologies for proteomic data<br />
mining. J Proteome Res 6(1):358–66.<br />
21. Fodor, I.K., Nelson, D.O., Alegria-Hartman, M., Robbins, K., Langlois, R.G.,<br />
Turteltaub, K.W., Corzett, T.H. and McCutchen-Maloney, S.L. (2005) Statistical<br />
challenges in the analysis of two-dimensional difference gel electrophoresis experiments<br />
using DeCyder. Bioinformatics 21(19):3733–40.
22. Karp, N., Kreil, D. and Lilley, K. (2004) Determining a significant change in<br />
protein expression with DeCyder™ during a pair-wise comparison using two-dimensional<br />
difference gel electrophoresis. Proteomics 4(5):1421–32.<br />
23. Kreil, D., Karp, N. and Lilley, K. (2004) DNA microarray normalization methods<br />
can remove bias from differential protein expression analysis of 2-D difference gel<br />
electrophoresis results. Bioinformatics 20(13):2026–34.<br />
24. Friedman, D.B., Stauff, D.L., Pishchany, G., Whitwell, C.W., Torres, V.J. and<br />
Skaar, E.P. (2006) Staphylococcus aureus redirects central metabolism to increase<br />
iron availability. PLoS Pathog 2(8):e87.<br />
25. Fujii, K., Kondo, T., Yamada, M., Iwatsuki, K. and Hirohashi, S. (2006) Toward<br />
a comprehensive quantitative proteome database: protein expression map of<br />
lymphoid neoplasms by 2-D DIGE and MS. Proteomics 3:3.<br />
26. Fujii, K., Kondo, T., Yokoo, H., Yamada, T., Matsuno, Y., Iwatsuki, K. and<br />
Hirohashi, S. (2005) Protein expression pattern distinguishes different lymphoid<br />
neoplasms. Proteomics 5(16):4274–86.<br />
27. Karp, N.A., Griffin, J.L. and Lilley, K.S. (2005) Application of partial least squares<br />
discriminant analysis to two-dimensional difference gel studies in expression<br />
proteomics. Proteomics 5(1):81–90.<br />
28. Seike, M., Kondo, T., Fujii, K., Yamada, T., Gemma, A., Kudoh, S. and<br />
Hirohashi, S. (2004) Proteomic signature of human cancer cells. Proteomics<br />
4(9):2776–88.<br />
29. Suehara, Y., Kondo, T., Fujii, K., Hasegawa, T., Kawai, A., Seki, K., Beppu, Y.,<br />
Nishimura, T., Kurosawa, H. and Hirohashi, S. (2006) Proteomic signatures<br />
corresponding to histological classification and grading of soft-tissue sarcomas.<br />
Proteomics 6(15):4402–09.<br />
30. Hatakeyama, H., Kondo, T., Fujii, K., Nakanishi, Y., Kato, H., Fukuda, S. and<br />
Hirohashi, S. (2006) Protein clusters associated with carcinogenesis, histological<br />
differentiation and nodal metastasis in esophageal cancer. Proteomics 6(23):<br />
6300–16.<br />
31. Verhoeckx, K.C., Gaspari, M., Bijlsma, S., van der Greef, J., Witkamp, R.F.,<br />
Doornbos, R.P. and Rodenburg, R.J. (2005) In search of secreted protein<br />
biomarkers for the anti-inflammatory effect of beta2-adrenergic receptor agonists:<br />
application of DIGE technology in combination with multivariate and univariate<br />
data analysis tools. J Proteome Res 4(6):2015–23.<br />
32. Reddy, A.B., Karp, N.A., Maywood, E.S., Sage, E.A., Deery, M., O’Neill,<br />
J.S., Wong, G.K., Chesham, J., Odell, M., Lilley, K.S., Kyriacou, C.P. and<br />
Hastings, M.H. (2006) Circadian orchestration of the hepatic proteome. Curr Biol<br />
16(11):1107–15.<br />
33. Lee, I.N., Chen, C.H., Sheu, J.C., Lee, H.S., Huang, G.T., Yu, C.Y., Lu, F.J.<br />
and Chow, L.P. (2005) Identification of human hepatocellular carcinoma-related<br />
biomarkers by two-dimensional difference gel electrophoresis and mass<br />
spectrometry. J Proteome Res 4(6):2062–69.<br />
34. Liang, C.R., Leow, C.K., Neo, J.C., Tan, G.S., Lo, S.L., Lim, J.W., Seow, T.K.,<br />
Lai, P.B. and Chung, M.C. (2005) Proteome analysis of human hepatocellular
carcinoma tissues by two-dimensional difference gel electrophoresis and mass<br />
spectrometry. Proteomics 5(8):2258–71.<br />
35. Nabetani, T., Tabuse, Y., Tsugita, A. and Shoda, J. (2005) Proteomic analysis of<br />
livers of patients with primary hepatolithiasis. Proteomics 5(4):1043–61.<br />
36. Huang, H.L., Stasyk, T., Morandell, S., Dieplinger, H., Falkensammer, G., Griesmacher,<br />
A., Mogg, M., Schreiber, M., Feuerstein, I., Huck, C.W., Stecher, G.,<br />
Bonn, G.K. and Huber, L.A. (2006) Biomarker discovery in breast cancer serum<br />
using 2-D differential gel electrophoresis/ MALDI-TOF/TOF and data validation<br />
by routine clinical assays. Electrophoresis 27(8):1641–50.<br />
37. Somiari, R.I., Sullivan, A., Russell, S., Somiari, S., Hu, H., Jordan, R., George, A.,<br />
Katenhusen, R., Buchowiecka, A., Arciero, C., Brzeski, H., Hooke, J. and<br />
Shriver, C. (2003) High-throughput proteomic analysis of human infiltrating ductal<br />
carcinoma of the breast. Proteomics 3(10):1863–73.<br />
38. Nishimori, T., Tomonaga, T., Matsushita, K., Oh-Ishi, M., Kodera, Y., Maeda, T.,<br />
Nomura, F., Matsubara, H., Shimada, H. and Ochiai, T. (2006) Proteomic analysis<br />
of primary esophageal squamous cell carcinoma reveals downregulation of a cell<br />
adhesion protein, periplakin. Proteomics 6(3):1011–18.<br />
39. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., Flaig, M.,<br />
Gillespie, J.W., Hu, N., Taylor, P.R., Emmert-Buck, M.R., Liotta, L.A.,<br />
Petricoin, E.F. 3rd and Zhao, Y. (2002) 2D differential in-gel electrophoresis for<br />
the identification of esophageal scans cell cancer-specific protein markers. Mol<br />
Cell Proteomics 1(2):117–24.<br />
40. Yu, K.H., Rustgi, A.K. and Blair, I.A. (2005) Characterization of proteins in<br />
human pancreatic cancer serum using differential gel electrophoresis and tandem<br />
mass spectrometry. J Proteome Res 4(5):1742–51.<br />
41. Wan, J., Sun, W., Li, X., Ying, W., Dai, J., Kuai, X., Wei, H., Gao, X., Zhu, Y.,<br />
Jiang, Y., Qian, X. and He, F. (2006) Inflammation inhibitors were remarkably upregulated<br />
in plasma of severe acute respiratory syndrome patients at progressive<br />
phase. Proteomics 6(9):2886–94.<br />
42. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring, J.R.,<br />
Podolsky, R.H., Lee, J.R. and Dynan, W.S. (2005) Saturation labeling with<br />
cysteine-reactive cyanine fluorescent dyes provides increased sensitivity for<br />
protein expression profiling of laser-microdissected clinical specimens. Proteomics<br />
5(7):1746–57.<br />
43. Kondo, T., Seike, M., Mori, Y., Fujii, K., Yamada, T. and Hirohashi, S. (2003)<br />
Application of sensitive fluorescent dyes in linkage of laser microdissection and<br />
two-dimensional gel electrophoresis as a cancer proteomic study tool. Proteomics<br />
3(9):1758–66.<br />
44. Sitek, B., Potthoff, S., Schulenborg, T., Stegbauer, J., Vinke, T., Rump, L.C.,<br />
Meyer, H.E., Vonend, O. and Stuhler, K. (2006) Novel approaches to analyse<br />
glomerular proteins from smallest scale murine and human samples using DIGE<br />
saturation labelling. Proteomics 3:3.<br />
45. Tetu, B., Lacasse, B., Bouchard, H.L., Lagace, R., Huot, J. and Landry, J. (1992)<br />
Prognostic influence of HSP-27 expression in malignant fibrous histiocytoma:
a clinicopathological and immunohistochemical study. Cancer Res 52(8):<br />
2325–28.<br />
46. Wessel, D. and Flugge, U.I. (1984) A method for the quantitative recovery of<br />
protein in dilute solution in the presence of detergents and lipids. Anal Biochem<br />
138(1):141–43.<br />
47. Knowles, M.R., Cervino, S., Skynner, H.A., Hunt, S.P., de Felipe, C., Salim, K.,<br />
Meneses-Lorente, G., McAllister, G. and Guest, P.C. (2003) Multiplex proteomic<br />
analysis by two-dimensional differential in-gel electrophoresis. Proteomics<br />
3:1162–71.<br />
48. Prabakaran, S., Swatton, J.E., Ryan, M.M., Huffaker, S.J., Huang, J.J., Griffin, J.L.,<br />
Wayland, M., Freeman, T., Dudbridge, F., Lilley, K.S., Karp, N.A., Hester, S.,<br />
Tkachev, D., Mimmack, M.L., Yolken, R.H., Webster, M.J., Torrey, E.F. and<br />
Bahn, S. (2004) Mitochondrial dysfunction in schizophrenia: evidence for compromised<br />
brain metabolism and oxidative stress. Mol Psychiatry 9(7):684–97.<br />
49. Wang, D., Jensen, R., Gendeh, G., Williams, K. and Pallavicini, M.G. (2004)<br />
Proteome and transcriptome analysis of retinoic acid-induced differentiation of<br />
human acute promyelocytic leukemia cells, NB4. J Proteome Res 3(3):627–35.<br />
50. Zhang, W. and Chait, B.T. (2000) ProFound: an expert system for protein<br />
identification using mass spectrometric peptide mapping information. Anal Chem<br />
72(11):2482–89.<br />
51. Zhang, Y.Q., Matthies, H.J., Mancuso, J., Andrews, H.K., Woodruff, E. 3rd,<br />
Friedman, D. and Broadie, K. (2004) The Drosophila fragile X-related gene<br />
regulates axoneme differentiation during spermatogenesis. Dev Biol 270(2):<br />
290–307.<br />
52. Yokoo, H., Kondo, T., Fujii, K., Yamada, T., Todo, S. and Hirohashi, S. (2004)<br />
Proteomic signature corresponding to alpha fetoprotein expression in liver cancer<br />
cells. Hepatology 40(3):609–17.<br />
53. Wang, S.E., Narasanna, A., Whitell, C.W., Wu, F.Y., Friedman, D.B. and<br />
Arteaga, C.L. (2007) Convergence of P53 and TGFbeta signaling on activating<br />
expression of the tumor suppressor gene maspin in mammary epithelial cells. J<br />
Biol Chem 4:4.
7<br />
MALDI/SELDI Protein Profiling of Serum<br />
for the Identification of Cancer Biomarkers<br />
Lisa H. Cazares, Jose I. Diaz, Rick R. Drake, and O. John Semmes<br />
Summary<br />
The ability to visualize the full depth of the serum proteome in a high-throughput<br />
manner is a major goal of clinical proteomics. Methodologies that combine higher<br />
throughput with the ability to observe differential protein expression levels have been<br />
applied to this goal. An example of such a system is the coupling of robotic sample<br />
processing to matrix-assisted laser desorption/ionization time-of-flight mass spectrometry<br />
(MALDI-TOF-MS). Within this paradigm is a modification of MALDI-TOF termed surface-enhanced<br />
laser desorption/ionization-TOF (SELDI-TOF). Both conventional MALDI and<br />
SELDI have been used to generate protein expression profiles reflective of potential<br />
peptide changes in serum. This information can be used to identify proteins that may<br />
enable new diagnostic and therapeutic strategies.<br />
Key Words: matrix-assisted laser desorption ionization; surface-enhanced laser<br />
desorption ionization; mass spectrometry; protein profiling; proteomics.<br />
1. Introduction<br />
Mining the serum proteome for the discovery of new biomarkers is<br />
a major goal of many clinical proteomics efforts. Surface-enhanced laser<br />
desorption/ionization (SELDI) and matrix-assisted laser desorption ionization<br />
(MALDI) have been used extensively for protein profiling in efforts to discover<br />
biomarkers in serum from patients with prostate, lung, head and<br />
neck, ovarian, and colon cancer (1,2,3,4,5,6). MALDI techniques usually require some<br />
up-front fractionation of the serum to reduce the complexity of the sample<br />
(7,8,9) and the ease of use in sample fractionation is considered an advantage<br />
in SELDI. An advantage of MALDI-TOF instrumentation is the improved<br />
resolution over SELDI instruments and the ability to directly identify peaks<br />
of interest by analyzing samples in TOF/TOF mode. For routine linear mode<br />
profiling, both types of instrumentation give similar results with human serum<br />
(see Fig. 1).<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
Besides the instrumentation and methodologies related to mass spectrometry<br />
analysis, the quality and quantity of the clinical samples to be tested are<br />
important considerations. Serum is one of the most common sample types<br />
used in biomarker discovery, because it is routinely obtained in the clinic, a<br />
large proportion of blood clotting factors are removed, and it is a rich source<br />
of molecules that may indicate systemic function. Blood plasma is an alternative<br />
source; however, clinical plasma collection utilizes various anticoagulants,<br />
which should be standardized to allow for universal analysis. Whether<br />
serum or plasma is used, every effort to standardize the sample collection and<br />
processing protocols should be made. Several studies have highlighted this<br />
and determined that multiple factors can affect the resulting spectra generated<br />
from serum samples (10,11). These factors include the elapsed time between<br />
venipuncture and separation of plasma and serum, type of serum collection tube,<br />
storage conditions, and the number of freeze-thaw cycles. In our laboratory, we<br />
routinely use serum for proteomic profiling. The following protocols outline<br />
our method for collection and storage of serum samples for subsequent analysis<br />
via MALDI-MS.<br />
Fig. 1. Comparison of SELDI and MALDI spectra using QC sera. (A) MALDI<br />
spectra generated using QC sera processed with Bruker IMAC Cu2+ magnetic beads. (B) SELDI<br />
spectra from QC sera processed on Ciphergen IMAC Cu2+ chips. The three primary peaks used for instrument<br />
optimization are indicated. Peaks labeled in the figure: m/z 2663.4, 3266.1, 4212.3, 5337.6,<br />
5904.6, 7762.3, and 9282.0 (x-axis: m/z 4000–10,000).<br />
Reduction of sample complexity is an essential step in the generation of<br />
high quality TOF mass spectrometry data from serum. One method of MALDI<br />
sample preparation that reduces the complexity of serum while remaining<br />
robust and easily amenable to automated high throughput applications is sample<br />
fractionation using magnetic beads (MBs) combined with prestructured MALDI<br />
sample supports (AnchorChip technology). Several MB types with different<br />
surface chemistries can be used to fractionate serum and increase the number<br />
of detectable peaks (12) (see Fig. 2). In addition, depletion of high-abundance<br />
proteins such as albumin and IgG (13,14) serves to reduce ion suppression<br />
phenomena as well as to reveal less abundant species. Unfortunately, fractionation<br />
greatly increases the number of samples to be processed, which in<br />
turn increases the complexity of the experimental procedure. Processing of<br />
samples is, therefore, best facilitated by the use of robotics, which increases<br />
throughput and produces reproducible results. However, manual processing of<br />
small sample sets can be accomplished with careful attention to detail and to the<br />
protocols and methods contained in this chapter. Another caveat to depletion<br />
strategies is that highly abundant proteins such as albumin bind<br />
low-abundance species (15,16), which can be removed inadvertently along with them. For comprehensive biomarker discovery, the<br />
benefits of depletion and fractionation often outweigh these factors. We have<br />
used both depleted and nondepleted serum strategies for biomarker discovery,<br />
and this continues to be a major area of methodological development.<br />
Fig. 2. MALDI spectra of serum fractionated with magnetic beads. Example of<br />
spectra produced on the Ultraflex-TOF/TOF when serum is fractionated with different<br />
magnetic bead types (WCX, 84 peaks; IMAC, 85 peaks; C18, 62 peaks; WAX, 80 peaks).<br />
A total of 203 unique peaks are resolved in the m/z range of<br />
1000–10,000. Axes: intensity (a.u.) versus m/z.<br />
2. Materials<br />
2.1. Serum Collection and Storage<br />
1. Becton Dickinson vacutainer serum separator tube (SST) plus blood collection<br />
tube (16 mm×100 mm, draw volume 8.5 mL) (Becton Dickinson #367988)<br />
2. Screw cap microtubes for cryo-storage (2.0 mL) (Sarstedt Inc.# 72.609.001, with<br />
caps # 65.716)<br />
3. Microcentrifuge tubes for aliquots (1.7 mL) (Corning-Costar #3620)<br />
2.2. Serum Processing for MALDI Using MB-Based Fractionation<br />
1. The MB kit(s) (immobilized metal affinity-Cu, hydrophobic interaction, weak<br />
cationic, or weak anionic exchange) (Bruker Daltonics, Billerica, MA)<br />
2. Optional: ClinProt robotic workstation (Bruker Daltonics)<br />
3. Magnetic separators for manual processing: large (1.5 mL) or small tube (0.5 mL)<br />
format (Bruker Daltonics)<br />
4. α-Cyano-4-hydroxycinnamic acid (CHCA) (Bruker Daltonics)<br />
5. Ethanol ultra pure 100%<br />
6. Acetone ultra pure 100%<br />
7. Micropipette capable of delivering 1 μL accurately<br />
8. Peptide standard mix (Bruker)<br />
9. Microtiter plate AnchorChip 600/384 MALDI target 600 μm diameter (Bruker<br />
Daltonics)<br />
2.3. Serum Processing for SELDI<br />
1. Water high performance liquid chromatography (HPLC) grade (Fisher Scientific,<br />
Hampton, NH)
2. Copper sulfate, anhydrous (Sigma-Aldrich, St. Louis, MO)<br />
3. Sodium acetate trihydrate salt<br />
4. Phosphate buffered saline (PBS) buffer pH 7.4<br />
5. Urea, at least 99% pure (Promega Madison, WI)<br />
6. CHAPS ultra purity (Fisher Scientific)<br />
7. Sinapinic acid (SPA) (5 mg tube) (Ciphergen Biosystems, Palo Alto, CA)<br />
8. IMAC protein chip arrays (Ciphergen)<br />
9. Bioprocessor holder (Ciphergen) for the processing of 12 chips in a 96-well<br />
format<br />
10. Bioprocessor accessory, 96-well disposable reservoir and gasket (Ciphergen)<br />
11. Acetonitrile ultra high purity grade<br />
12. Trifluoroacetic acid (TFA) (100%, 1 mL ampules) [Sigma/Aldrich Chemical<br />
Company 26,977-8, (589-37-37)]<br />
13. Plate seals<br />
14. For calibration (all from Ciphergen Biosystems): NP20 ProteinChip arrays, All-in-one<br />
peptide standard, and All-in-one protein standard<br />
15. Optional: BioMek 2000 robotic workstation, adapted to process ProteinChip<br />
arrays (Ciphergen Biosystems)<br />
16. DPC MicroMix 5 shaker (Diagnostic Products Corporation, Los Angeles, CA)<br />
or another type of rotary or platform shaker<br />
17. Micropipette capable of delivering 1 μL accurately<br />
18. Pooled serum for quality control (QC)<br />
19. 100 mM CuSO4 in water [room temperature (RT)]: 1.6 g CuSO4 (MW = 159.6)<br />
made up to 100 mL in HPLC grade water<br />
20. 100 mM sodium acetate, pH 4.0 (RT): 9.0 mL 0.2 M sodium acetate stock<br />
(27.2 g/L), 50 mL HPLC water, 41.0 mL 0.2 M acetic acid (11.6 mL/L made from<br />
concentrated acid; add gradually to get to pH 4.0).<br />
21. PBS buffer pH 7.4 (RT): 10 mL PBS buffer (10×) made up to 100 mL in<br />
HPLC water. Check pH.<br />
22. 10% TFA stock: 1 mL TFA (100%), 9 mL HPLC water (store in amber bottle)<br />
23. 1% TFA working solution (store in amber bottle and make fresh every 2 weeks):<br />
take 1 mL TFA (10%) and add 9 mL HPLC water<br />
24. 8 M urea, 1% CHAPS in PBS, pH 7.4: 48.05 g urea, up to 90 mL PBS pH<br />
7.4; stir until dissolved, may need warming. Add 1 g CHAPS. Bring the final<br />
volume to 100 mL with PBS. Filter through 0.4 μm filter. Aliquot into 5 mL<br />
volumes and freeze.<br />
25. 1 M urea, 0.125% CHAPS in PBS, pH 7.4: dilute the 8 M stock above in PBS<br />
(100 mL of 8 M stock in 700 mL PBS).<br />
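The quantities in the stock recipes above follow from mass = molarity × volume × molecular weight and from C1V1 = C2V2. The following is a minimal sketch for checking them before weighing out reagents. The function names are illustrative, and the molecular weights of urea and sodium acetate trihydrate are standard values assumed here rather than taken from the chapter.

```python
def grams_needed(molarity_mol_per_l, volume_ml, mw_g_per_mol):
    """Mass of solute (g) for a solution of the given molarity and volume."""
    return molarity_mol_per_l * (volume_ml / 1000.0) * mw_g_per_mol


def diluted_concentration(stock_conc, stock_volume, diluent_volume):
    """Concentration after mixing stock with diluent (C1*V1 = C2*V2)."""
    return stock_conc * stock_volume / (stock_volume + diluent_volume)


# CuSO4 MW is the value stated in the recipe; the other two MWs are assumed.
cuso4_g = grams_needed(0.100, 100, 159.6)    # 100 mM CuSO4, 100 mL
urea_g = grams_needed(8.0, 100, 60.06)       # 8 M urea, 100 mL
naoac_g = grams_needed(0.2, 1000, 136.08)    # 0.2 M sodium acetate trihydrate, 1 L

# 100 mL of the 8 M urea, 1% CHAPS stock diluted into 700 mL PBS
urea_final_molar = diluted_concentration(8.0, 100, 700)
chaps_final_percent = diluted_concentration(1.0, 100, 700)

print(f"CuSO4 {cuso4_g:.2f} g, urea {urea_g:.2f} g, sodium acetate {naoac_g:.1f} g")
print(f"diluted stock: {urea_final_molar} M urea, {chaps_final_percent}% CHAPS")
```

The results agree with the recipe values after rounding (1.6 g, 48.05 g, 27.2 g/L, and 1 M urea with 0.125% CHAPS).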
2.4. SELDI and MALDI Spectra Acquisition<br />
1. SELDI PBS II, IIc, or PCS 4000 instrument (Ciphergen biosystems)<br />
2. Ultraflex I or II MALDI-TOF–TOF (Bruker Daltonics)
3. Method<br />
3.1. Serum Collection<br />
Obtain proper patient consent:<br />
1. Perform venipuncture into a 10 cc SST vacutainer tube (without anticoagulant).<br />
2. Allow blood to clot at RT for 30 min.<br />
3. Spin blood at 1700 rcf for 10 min, then immediately decant and freeze serum at –70°C<br />
in a screw cap freezer vial (Sarstedt). If this is not possible, the serum can be<br />
stored at –20°C for up to 5 days before moving to a –70°C freezer.<br />
4. Prior to SELDI or MALDI analysis, the sample should be thawed and divided<br />
into small volume aliquots to avoid multiple freeze-thaw cycles. When possible, no<br />
sample should be taken through more than two freeze-thaw cycles, and the number<br />
of freeze-thaw cycles should be recorded if unused volumes are returned to the<br />
freezer.<br />
3.2. Preparation of Human Serum<br />
Expression profiling of proteins/peptides utilizes both peak mass and<br />
intensity to quantify changes in differential spectra. This necessitates the use<br />
of a QC standard to monitor instrument performance (17). The QC sample<br />
routinely used in our lab is pooled human serum collected using the same<br />
protocol as the experimental samples (see Subheading 3.1).<br />
Efforts have been made to develop a standardized QC sample for<br />
serum mass spectrometry profiling (18). Until such a standard is available, a large volume<br />
of serum can be pooled and aliquoted to be run with every experimental<br />
sample set. This QC sample should be assayed using the same processing<br />
technique that will be employed for the experimental samples, and the data<br />
from multiple runs should be analyzed. In this way, the inter- and intra-assay variability<br />
can be determined. Additionally, the spectra obtained from the QC sample can<br />
be used as a benchmark for the integrity of processing, instrument optimization,<br />
and ProteinChip variability. We, therefore, recommend including several QC<br />
samples on a MALDI target and one QC spot on each SELDI ProteinChip.<br />
Acceptable levels of reproducibility need to be established for any new<br />
technology, and sample preparation is the most critical step in the production<br />
of reproducible spectra (see Notes 3, 4, and 5). We have optimized the SELDI<br />
system with high-throughput robotics, and previous studies in our laboratory<br />
have determined that the mass accuracy of SELDI spectra is highly reproducible,<br />
with coefficients of variation (CVs) of 0.05%. Operating in linear mode, we have found the mass accuracy<br />
of an Ultraflex-TOF–TOF to be 0.01% CV. Overall normalized intensity values<br />
for individual peaks using QC sera are routinely below a 20% CV for samples<br />
prepared robotically in our lab using either SELDI or MALDI-MS.
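As an illustration of how such reproducibility figures can be derived from repeated QC runs, the sketch below computes the percent CV (sample standard deviation divided by the mean) for the mass and the normalized intensity of one QC peak. The replicate values are hypothetical and serve only to show the calculation.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation (%): sample standard deviation / mean x 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate measurements of a single QC peak (illustrative only).
qc_peak_mz = [5904.6, 5903.7, 5902.8, 5905.1]
qc_peak_intensity = [41.2, 38.5, 44.0, 36.9]

mass_cv = percent_cv(qc_peak_mz)
intensity_cv = percent_cv(qc_peak_intensity)
print(f"mass CV = {mass_cv:.3f}%, intensity CV = {intensity_cv:.1f}%")
```

Running the same calculation within one processing batch and across batches gives the intra- and inter-assay variability, respectively.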
3.3. Serum Protein Profiling on the MALDI-TOF–TOF<br />
3.3.1. MB Fractionation of Human Serum<br />
These steps are performed by the ClinProt robot. Below is an outline of<br />
a comparable manual method. Sequential fractionation can also be performed<br />
with multiple bead types.<br />
1. Vortex MBs thoroughly for at least 1 min.<br />
2. In a 0.5 mL Eppendorf tube, pretreat 5 μL of MBs with 50 μL MB-IMAC Cu binding<br />
solution.<br />
3. Place the tube in the magnetic bead separator (MBS) and move it between<br />
adjacent wells 10 times.<br />
4. Collect the beads on the wall of the tube for 20 s and remove the supernatant<br />
carefully with a pipette.<br />
5. Repeat this pretreatment two more times.<br />
6. Add 20 μL of serum and mix carefully with the beads by pipetting up and down<br />
five times.<br />
7. Keep at RT for 2 min.<br />
8. Place the tube in the MBS and wait for 20 s for beads to separate.<br />
9. Remove the supernatant with a pipette tip carefully (the unbound fraction can<br />
be discarded or saved for analysis or a second fractionation step, if desired).<br />
10. To wash, add 80 μL MB-IMAC Cu wash solution and place the tube in the MBS<br />
again. Move the tube back and forth between adjacent wells 10 times.<br />
11. Collect the beads on the tube wall for 20 s and remove the supernatant carefully<br />
with a pipette.<br />
12. Repeat this wash two more times.<br />
13. To elute, add 10 μL MB-IMAC Cu elution solution and mix. Let the beads sit<br />
for 5 min at RT.<br />
14. Place the tube on the MBS and wait 20 s for beads to separate.<br />
15. Transfer the eluate to a fresh tube.<br />
3.3.2. Data Collection on MALDI-TOF–TOF Instrument<br />
To best detect proteins over the entire mass range on a MALDI instrument,<br />
it is necessary to optimize the instrument settings for both low mass (typically<br />
2000–20,000 Da) and high mass (20,000–100,000 Da or greater). The best<br />
sensitivity and resolution are in the mass range below m/z 20,000, and this is the<br />
mass range we routinely use for most profiling experiments.<br />
1. Prepare samples on an anchor plate by diluting the eluates 1:10<br />
in CHCA matrix prepared according to the AnchorChip protocol (0.3 mg/mL in<br />
ethanol:acetone 2:1). SPA and/or 2,5-dihydroxybenzoic acid may also be used.<br />
2. Spot 1 μL of the sample diluted in matrix onto the 600 μm diameter AnchorChip<br />
target. Also spot 1 μL of the peptide standard diluted according to the manufacturer’s<br />
instructions.
3. Allow spots to dry.<br />
4. Perform external calibration with the peptide standard using a linear mode method.<br />
5. Collect at least 300 shots in linear mode, adjusting the laser energy and detection<br />
sensitivity to maximize signal and resolution of the major peaks using a QC spot.<br />
Typically, in linear mode the resolution of the three major peaks should be greater<br />
than 600.<br />
6. Instrument settings will vary based on instrument set-up and are more numerous<br />
than is feasible to describe in this chapter, but the most important settings to<br />
optimize are acceleration voltage (IS1), laser power, time lag focusing (or PIE),<br />
detector settings, and matrix suppression. Our basic instrument settings in linear<br />
mode are as follows:<br />
IS1, 22<br />
Laser, 37% with laser attenuation offset at 48%, range at 40%<br />
Time lag focus, 200 ns<br />
Detector Gain, 24×<br />
Matrix suppression, gated with suppression up to m/z 800<br />
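The resolution criterion in step 5 can be checked numerically. The sketch below assumes the common definition of resolution as m/Δm, with Δm taken as the full width at half maximum (FWHM) of the peak; the chapter does not state which definition the instrument software uses. The peak is synthetic, and the function assumes that the supplied window contains one baseline-subtracted peak that falls below half height on both sides.

```python
import math


def fwhm(mz, intensity):
    """Full width at half maximum of a single, baseline-subtracted peak."""
    apex = max(range(len(intensity)), key=intensity.__getitem__)
    half = intensity[apex] / 2.0

    def crossing(i, j):
        # Linear interpolation of the m/z at which the intensity crosses
        # half maximum between sample points i and j.
        frac = (half - intensity[i]) / (intensity[j] - intensity[i])
        return mz[i] + frac * (mz[j] - mz[i])

    left = apex
    while left > 0 and intensity[left] > half:
        left -= 1
    right = apex
    while right < len(intensity) - 1 and intensity[right] > half:
        right += 1
    return crossing(right - 1, right) - crossing(left, left + 1)


# Synthetic Gaussian peak (illustrative values, not measured data).
center, sigma = 5904.6, 2.0
mz = [center - 10 + 0.1 * i for i in range(201)]
intensity = [math.exp(-0.5 * ((m - center) / sigma) ** 2) for m in mz]

resolution = center / fwhm(mz, intensity)
print(f"FWHM = {fwhm(mz, intensity):.2f}, resolution = {resolution:.0f}")
```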
All spectra should be processed using the same baseline subtraction protocol.<br />
Perform peak detection using a uniform definition of requisite signal-to-noise<br />
ratio and mass window. Although MALDI techniques have the potential to<br />
produce protein profiles that contain patterns capable of distinguishing disease<br />
and identifying biomarkers, a single analysis may produce many hundreds of<br />
protein peaks (see Note 2). Discerning<br />
the differentiating patterns therefore poses a major data analysis challenge, and the interpretation<br />
of the enormous volumes of proteomic data remains an unsolved<br />
bioinformatics problem. Many different classification tools are currently being<br />
used with success for the analysis of MALDI data. These approaches include<br />
Fisher discriminative analysis, CART (19,20), support vector machine (21),<br />
artificial neural network (22), boosted decision tree analysis (23), and genetic<br />
algorithm (24). General considerations for data preparation before any type<br />
of analysis should include averaging intensity values for duplicate samples,<br />
baseline subtraction, and peak picking.<br />
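The chapter does not prescribe particular algorithms for these preprocessing steps. The following is one minimal sketch, assuming a rolling-minimum baseline estimate, a noise estimate based on the median absolute deviation, and a fixed signal-to-noise threshold. The function names, the window size, and the threshold of 5 are illustrative choices, and the spectrum is synthetic. Applying the same parameters to every spectrum satisfies the requirement for a uniform definition.

```python
import numpy as np


def rolling_min_baseline(intensity, half_window):
    """Crude baseline: minimum within a sliding window around each point."""
    n = len(intensity)
    return np.array([intensity[max(0, i - half_window): i + half_window + 1].min()
                     for i in range(n)])


def pick_peaks(mz, intensity, half_window=50, min_snr=5.0):
    """Return (m/z, S/N) for local maxima whose S/N is at least min_snr."""
    corrected = intensity - rolling_min_baseline(intensity, half_window)
    signal = corrected - np.median(corrected)
    # Robust noise estimate: median absolute deviation scaled to about 1 SD.
    noise = 1.4826 * np.median(np.abs(signal))
    inner = signal[1:-1]
    is_local_max = (inner > signal[:-2]) & (inner >= signal[2:])
    idx = np.where(is_local_max & (inner >= min_snr * noise))[0] + 1
    return [(float(mz[i]), float(signal[i] / noise)) for i in idx]


# Synthetic spectrum: decaying baseline, small ripple standing in for noise,
# and two Gaussian peaks (all values illustrative).
mz = np.arange(2000.0, 10001.0)
baseline = 10.0 * np.exp(-(mz - 2000.0) / 3000.0)
ripple = 0.2 * np.sin(1.7 * mz)
peaks_true = sum(100.0 * np.exp(-0.5 * ((mz - c) / 5.0) ** 2) for c in (5900.0, 7764.0))
spectrum = baseline + ripple + peaks_true

peaks = pick_peaks(mz, spectrum)
for peak_mz, snr in peaks:
    print(f"m/z {peak_mz:.0f}, S/N {snr:.0f}")
```

Intensities of duplicate samples can be averaged after peak picking, once peaks from the two spectra have been matched within the chosen mass window.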
3.4. Protein Identification Using MALDI-TOF/TOF<br />
Biomarker candidates detected by protein profiling can be subjected to<br />
TOF/TOF analysis for the identification of peptides directly from serum profiles<br />
using the same sample spot and/or respotting of the sample. Initial analysis in<br />
the reflectron mode will allow for visualization of the target or parent peak.<br />
Metastable fragment ions of the respective precursor ion are then analyzed after<br />
a second acceleration step, and the resulting fragment pattern is interpreted and<br />
used for peptide identification via database search. The ability to directly<br />
sequence the peptides of interest is a powerful feature of this method (see Fig. 3).<br />
Fig. 3. Identification of a serum peptide directly from the serum profile. Serum<br />
profile (A) was generated in linear mode on the Ultraflex-TOF/TOF (baseline subtracted;<br />
intensity in a.u. versus m/z 1200–2600), from which a<br />
peptide (m/z 1469.09) was selected for MS/MS analysis resulting in a fragmentation<br />
spectrum (B). This peptide showed homology to fibrinopeptide A using the Mascot search<br />
engine (C). Mascot peptide view in panel C: MS/MS fragmentation of DSGEGDFLAEGGGVR,<br />
found in gi|229185, fibrinopeptide A (residues 2–16; observed 1465.72; Mr(expt) 1464.72;<br />
Mr(calc) 1464.65; delta 0.07; missed cleavages 0).<br />
3.5. Serum Protein Profiling on SELDI-TOF<br />
3.5.1. Preparation of Serum<br />
Note: All of the following steps including the ProteinChip preparation and<br />
serum incubation on the arrays are performed robotically by the BioMek 2000<br />
robot. The protocols below outline a manual method.<br />
1. Thaw human serum samples on ice. Use separate aliquots to set up duplicates or<br />
triplicates.<br />
2. Add 20 μL human serum into a 1.7 mL microcentrifuge tube (alternatively, this<br />
can be performed in a v-bottom 96-well plate for large sample sets).
3. Add 30 μL of 8 M Urea, 1% CHAPS in PBS pH 7.4.<br />
4. Vortex tube at 4°C for 10 min or, if using a plate, seal and place on MicroMix 5<br />
shaker at 4°C for 10 min (shaker settings: form 20, amplitude 5, time 10 min).<br />
5. Add 100 μL 1 M Urea, 0.125% CHAPS in PBS pH 7.4.<br />
6. Vortex or pipette up and down to mix (total volume 150 μL).<br />
7. Dilute sample 1:5 in PBS pH 7.4 by adding 600 μL PBS. If using a plate, remove<br />
35 μL of serum–urea mixture from first plate and transfer to a second plate. Then<br />
add 140 μL of PBS. Mix by vortexing tube or pipetting up and down.<br />
8. Store on ice until ready to add samples to a bioprocessor containing ProteinChip<br />
arrays.<br />
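The volumes in steps 2 to 7 can be checked with a short calculation. This sketch, with an illustrative helper name, confirms that the tube and plate variants give the same 1:5 dilution and works out the overall serum dilution and the urea concentration in the sample applied to the arrays.

```python
def dilution_factor(sample_ul, diluent_ul):
    """Fold dilution when sample_ul is mixed with diluent_ul."""
    return (sample_ul + diluent_ul) / sample_ul


step1 = dilution_factor(20, 30 + 100)     # serum into 8 M, then 1 M urea buffer
step2_tube = dilution_factor(150, 600)    # whole mixture into PBS (tube format)
step2_plate = dilution_factor(35, 140)    # 35 uL transferred into PBS (plate format)
overall = step1 * step2_tube

# Urea concentration after mixing 30 uL of 8 M and 100 uL of 1 M buffer with
# 20 uL of serum, then after the 1:5 dilution in PBS.
urea_after_step1 = (30 * 8.0 + 100 * 1.0) / 150
urea_on_array = urea_after_step1 / step2_tube

print(f"overall serum dilution 1:{overall}")
print(f"urea on array: {urea_on_array:.2f} M")
```

Serum is therefore diluted 1:37.5 overall, so each 100 μL applied to an array contains roughly 2.7 μL of the original serum.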
3.5.2. Preparation of ProteinChip Arrays<br />
This protocol describes the preparation of IMAC-Cu2+ ProteinChips. Other<br />
types of chips should be prepared according to the manufacturer’s (Ciphergen)<br />
instructions.<br />
1. Label or number IMAC chips on the reverse side and place them into the<br />
bioprocessor according to the manufacturer’s instructions. (see Note 1)<br />
2. Add 50 μL of 100 mM CuSO4 onto each spot or array.<br />
3. Shake on Micromix 5 for 10 min at RT.<br />
4. Shaker settings: form 20, amplitude 5, time 10 min<br />
5. Flick plate to remove CuSO4 to waste and pat upside down onto a clean paper<br />
towel to remove residual liquid (liquid can also be removed by aspiration, but<br />
be careful not to touch the array surface with the pipette tip).<br />
6. Wash with 200 μL of HPLC water 2 min × 5 min at RT on Micromix shaker at<br />
the same settings for form and amplitude as before.<br />
7. Flick plate and pat on paper towel.<br />
8. Add 50 μL of 100 mM sodium acetate pH 4.0.<br />
9. Shake on Micromix shaker for 5 min at RT.<br />
10. Flick plate and pat as before.<br />
11. Wash with HPLC water 2 min × 5 min at RT on Micromix.<br />
12. Add 200 μL PBS pH 7.4.<br />
13. Flick plate and pat as before.<br />
14. Wash with PBS pH 7.4 2 min × 5 min at RT on Micromix.<br />
Leave last volume of PBS on plate until ready to use.<br />
3.5.3. Incubation of Serum on ProteinChip Arrays<br />
1. Remove PBS from bioprocessor with multichannel pipettor, one row at a time<br />
to avoid drying chips.<br />
2. Add 100 μL of each sample to respective arrays. Note: samples should be<br />
randomized as to their placement on the ProteinChip arrays. Duplicate samples<br />
should also be randomly placed.
3. Seal plate and shake bioprocessor on micromix (form 20, amplitude 5) for<br />
30 min at RT.<br />
4. Remove samples carefully with a pipette, changing tips to avoid cross contamination.<br />
5. Add 200 μL PBS pH 7.4 to each array and shake on micromix for 5 min at RT<br />
using same shaker settings.<br />
6. Remove PBS with multichannel pipettor changing tips for each row.<br />
7. Wash with 200 μL HPLC water, shake on micromix for 5 min at RT.<br />
8. Remove water with multichannel pipettor.<br />
9. Repeat water wash.<br />
10. Remove chips from bioprocessor and allow chips to dry completely.<br />
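The randomized placement called for in step 2 can be planned before pipetting. The sketch below is one possible approach, not the authors' procedure: it shuffles all replicates, fills 8-spot chips (12 chips in a 96-well bioprocessor), and reserves one randomly chosen spot per chip for the QC serum, following the recommendation of one QC spot per ProteinChip. The function name, sample IDs, and fixed seed are illustrative.

```python
import random


def randomized_layout(sample_ids, replicates=2, spots_per_chip=8, seed=0):
    """Assign every replicate to a random spot, with one QC spot per chip."""
    rng = random.Random(seed)              # fixed seed so the plan can be regenerated
    items = [s for s in sample_ids for _ in range(replicates)]
    rng.shuffle(items)
    per_chip = spots_per_chip - 1          # one spot on each chip is kept for QC
    layout = []
    for start in range(0, len(items), per_chip):
        chip = items[start:start + per_chip]
        chip.insert(rng.randrange(len(chip) + 1), "QC")
        layout.append(chip)
    return layout


samples = [f"S{i:02d}" for i in range(1, 22)]    # 21 hypothetical sample IDs
layout = randomized_layout(samples)
for chip_number, chip in enumerate(layout, start=1):
    print(chip_number, chip)
```

Recording the seed together with the printed layout makes it possible to trace each spectrum back to its sample.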
3.5.4. Adding SPA Matrix to the Chips<br />
1. To one tube of SPA, add 200 μL acetonitrile (100%).<br />
2. Add 200 μL 1% TFA (final concentration of SPA: 12.5 mg/mL in 50% acetonitrile,<br />
0.5% TFA).<br />
3. Vortex for 5 min at RT.<br />
4. Quick spin.<br />
5. Add 1.0 μL SPA matrix to each dry spot, being careful not to touch the pipette<br />
tip to the array surface.<br />
6. Allow to dry.<br />
7. Arrays are now ready to read on the SELDI instrument. Note: The arrays should<br />
be stored in the dark in a cool dry place. It is recommended to read the chips<br />
within a few hours of the addition of the matrix. Some signal degradation may<br />
occur if the arrays are stored for more than 24 h.<br />
3.5.5. Collection of Spectra on SELDI-TOF<br />
We describe here the collection of spectra using the PBS II Ciphergen<br />
instrument.<br />
3.5.5.1. Calibration<br />
Calibration of the SELDI instrument is crucial to the accurate mass analysis<br />
of the proteins present in samples. Smaller ions fly faster than larger ions, and<br />
their m/z ratio can be calculated from their flight time using compounds of<br />
known mass. For the most accurate mass assignments, the instrument should be<br />
calibrated using conditions identical to the experimental conditions. Calibration<br />
should be performed at the beginning of an experimental run, and thereafter<br />
on every day that experimental data are collected. When obtaining calibration spectra,<br />
use instrument settings as close to the settings used for serum profiling (i.e.,<br />
detector voltage, lag time, etc.) as possible.<br />
1. Reconstitute one vial each of the seven-in-one peptide and protein standards,<br />
according to the manufacturer’s instructions. Aliquot and freeze.
2. Mix standards with SPA according to package insert.<br />
3. Deposit 1 μL of each standard onto an array of an NP20 ProteinChip.<br />
4. Air-dry the arrays completely, usually 30–60 min.<br />
5. Read the array in the SELDI instrument using a spot protocol created to read<br />
the experimental samples (see below). The laser intensity should be lowered<br />
such that the peaks from the standards do not exceed 75% maximum signal<br />
intensity.<br />
6. Follow the calibration dialogue in the software of the PBSII SELDI instrument<br />
to save the calibration equations.<br />
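The principle behind the calibration can be sketched as follows. In an idealized linear TOF instrument, flight time is proportional to the square root of m/z, so a straight-line fit of sqrt(m/z) against flight time for calibrants of known mass yields an equation that converts any flight time to m/z. This two-parameter model is an assumption made for illustration; the equation used by the instrument software may contain additional terms. The calibrant masses and flight times below are made up for the example.

```python
import numpy as np


def fit_tof_calibration(flight_times_us, known_mz):
    """Least-squares fit of sqrt(m/z) = a * t + b from calibrant peaks."""
    a, b = np.polyfit(flight_times_us, np.sqrt(known_mz), 1)
    return a, b


def time_to_mz(t_us, a, b):
    """Convert flight time(s) to m/z with the fitted calibration."""
    return (a * np.asarray(t_us) + b) ** 2


# Illustrative calibrant masses and synthetic flight times generated from
# assumed constants (not measured values).
calibrant_mz = np.array([1084.2, 2465.2, 5733.6, 7033.6])
true_a, true_b = 2.5, -0.4
times = (np.sqrt(calibrant_mz) - true_b) / true_a

a, b = fit_tof_calibration(times, calibrant_mz)
print(time_to_mz(times, a, b))
```

Because larger ions arrive later, calibrants should bracket the mass range of interest, which is why separate peptide and protein standards are used.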
3.5.5.2. SELDI Instrument Settings Optimization<br />
The SELDI instrument optimization refers to the adjustment of settings<br />
necessary for data collection, which will maximize signal intensity while<br />
retaining the optimal resolution and the lowest noise. In our studies, there are<br />
three consistently present protein peaks (m/z 5900, 7764, 9284 ± 0.2%) in the<br />
QC sera processed on IMAC-Cu2+ ProteinChips, which are used as benchmarks<br />
for instrument optimization (see Fig. 1). Based on multiple runs, the<br />
instrument settings are adjusted to maximize signal to noise and resolution for<br />
these three peaks. Thereafter, specific criteria were set to ensure instrument<br />
optimization (see Semmes et al. (17)). Generally, when trying to<br />
obtain a specific overall intensity level (e.g., to get two instruments to behave<br />
similarly, or to obtain similar intensity levels over time), three parameters can<br />
be adjusted. These include laser intensity, detector sensitivity, and detector<br />
voltage. The following spot protocols for data collection on the SELDI reader<br />
are a starting point. The settings will be different from instrument to instrument<br />
and will change over time, based on cumulative laser utilization and detector<br />
settings.<br />
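A simple way to confirm that the three benchmark peaks are present in a QC spectrum is to look for an observed peak within ±0.2% of each expected mass. The sketch below shows only this mass-window check; the signal-to-noise and resolution criteria are defined in ref. 17 and are not reproduced here. The function name is illustrative, and the observed values are m/z labels that appear in Fig. 1.

```python
QC_PEAKS_MZ = (5900.0, 7764.0, 9284.0)   # benchmark peaks named in the text
TOLERANCE = 0.002                        # +/- 0.2%


def match_qc_peaks(observed_mz, expected=QC_PEAKS_MZ, tol=TOLERANCE):
    """Map each expected QC peak to the closest observed peak within tolerance."""
    matches = {}
    for target in expected:
        candidates = [m for m in observed_mz if abs(m - target) <= tol * target]
        matches[target] = (min(candidates, key=lambda m: abs(m - target))
                           if candidates else None)
    return matches


observed = [4212.3, 5337.6, 5904.6, 7762.3, 9282.0]
matches = match_qc_peaks(observed)
print(matches)
```

A value of None for any benchmark peak indicates that the spectrum should be inspected before instrument settings are adjusted.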
Data collection: standard spot protocol for QC serum on IMAC-Cu (for a<br />
PBSII)<br />
1. Set detector voltage to 1650.<br />
2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da.<br />
3. Set starting laser intensity to 220.<br />
4. Set starting detector sensitivity to 7.<br />
5. Focus lag time at 900 ns.<br />
6. Set data acquisition method to SELDI quantitation.<br />
7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per<br />
position 12, ending position 80.<br />
8. Set warming positions with two shots at intensity 230 and do not include warming<br />
shots.<br />
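The acquisition parameters in step 7 determine how many laser shots contribute to each spectrum. The sketch below reads them as a starting position of 20, a step (delta) of 4, an ending position of 80, and 12 transients per position. This reading is an interpretation of the protocol wording, and it reproduces the total of 192 shots quoted for the same settings later in this section.

```python
def seldi_positions(start=20, end=80, delta=4):
    """Spot positions sampled by the laser, inclusive of the end position."""
    return list(range(start, end + 1, delta))


transients_per_position = 12
positions = seldi_positions()
total_shots = len(positions) * transients_per_position
print(len(positions), "positions,", total_shots, "shots")
```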
When adjusting to meet QC criteria:
• Increasing detector voltage typically increases signal and noise. Change this in units<br />
of 25 V.<br />
• Increasing laser intensity increases signal and generally decreases resolution. Change this in<br />
units of 10.<br />
• Increasing detector sensitivity increases signal intensity. Typical working range is 6 to 8.<br />
For example, if the settings above are not meeting QC specifications, try the<br />
following:<br />
If S/N passes easily but resolution is low, reduce detector voltage or laser<br />
intensity:<br />
1. Set detector voltage to 1625.<br />
2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da.<br />
3. Set starting laser intensity to 220.<br />
4. Set starting detector sensitivity to 7.<br />
5. Focus lag time at 900 ns.<br />
6. Set data acquisition method to SELDI quantitation.<br />
7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per<br />
position 12, ending position 80 (192 total shots).<br />
8. Set warming positions with two shots at intensity 230 and do not include warming<br />
shots.<br />
If resolution passes but S/N is low, increase laser intensity or detector voltage:<br />
1. Set detector voltage to 1650.<br />
2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da.<br />
3. Set starting laser intensity to 230.<br />
4. Set starting detector sensitivity to 7.<br />
5. Focus lag time at 900 ns.<br />
6. Set data acquisition method to SELDI quantitation.<br />
7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per<br />
position 12, ending position 80.<br />
8. Set warming positions with two shots at intensity 230 and do not include warming<br />
shots.<br />
If intensity is too high (generally, stay under 65), reduce laser intensity<br />
and/or sensitivity:<br />
1. Set detector voltage to 1650.<br />
2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da.<br />
3. Set starting laser intensity to 220.<br />
4. Set starting detector sensitivity to 6.<br />
5. Focus lag time at 900 ns.<br />
6. Set data acquisition method to SELDI quantitation.
7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per<br />
position 12, ending position 80.<br />
8. Set warming positions with two shots at intensity 230 and do not include warming<br />
shots.<br />
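The three adjustment examples can be summarized as a small decision rule. The sketch below is a hypothetical helper, not a feature of the reader software: the step sizes (25 V, 10 laser units), the sensitivity range (6 to 8), and the intensity ceiling (65) come from the text, whereas the order in which the checks are applied is an assumption made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpotSettings:
    detector_voltage: int = 1650
    laser_intensity: int = 220
    detector_sensitivity: int = 7


VOLTAGE_STEP = 25           # change detector voltage in units of 25 V
LASER_STEP = 10             # change laser intensity in units of 10
SENSITIVITY_RANGE = (6, 8)  # typical working range
MAX_INTENSITY = 65          # generally stay under this peak intensity


def suggest_adjustment(s, sn_ok, resolution_ok, peak_intensity):
    """Return adjusted settings for one QC outcome (check order is assumed)."""
    if peak_intensity > MAX_INTENSITY:
        # Intensity too high: lower sensitivity first, then laser intensity.
        if s.detector_sensitivity > SENSITIVITY_RANGE[0]:
            return replace(s, detector_sensitivity=s.detector_sensitivity - 1)
        return replace(s, laser_intensity=s.laser_intensity - LASER_STEP)
    if sn_ok and not resolution_ok:
        # S/N passes but resolution is low: reduce detector voltage.
        return replace(s, detector_voltage=s.detector_voltage - VOLTAGE_STEP)
    if resolution_ok and not sn_ok:
        # Resolution passes but S/N is low: increase laser intensity.
        return replace(s, laser_intensity=s.laser_intensity + LASER_STEP)
    # Everything passes, or both criteria fail: leave the choice to the operator.
    return s


start = SpotSettings()
print(suggest_adjustment(start, sn_ok=True, resolution_ok=False, peak_intensity=40))
print(suggest_adjustment(start, sn_ok=False, resolution_ok=True, peak_intensity=40))
print(suggest_adjustment(start, sn_ok=True, resolution_ok=True, peak_intensity=70))
```

Starting from the standard spot protocol, the three calls reproduce the three example settings (1625 V, laser intensity 230, and sensitivity 6, respectively).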
After data collection, each spectrum should be calibrated for mass using the<br />
current peptide calibration. If higher molecular weight data is included for<br />
analysis, the protein standard calibration should be used for the peaks in this<br />
mass range. Spectra should be normalized using total ion current (this is a<br />
feature in the Ciphergen software) with the same normalization coefficient<br />
and low mass cutoff (2000 Da for SPA matrix to exclude matrix peaks). All<br />
spectra should also be processed using the same baseline subtraction protocol.<br />
Perform peak detection using a uniform definition of requisite signal-to-noise<br />
ratio (usually 3) and mass window (usually 0.2–0.3%).<br />
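The processing settings above can be illustrated with a generic sketch. The exact normalization formula used by the Ciphergen software is not given here, so the functions below are assumptions for illustration only: total ion current is taken as the summed intensity at or above the low mass cutoff, and two peaks are treated as the same feature when their masses agree within a relative window.

```python
def tic_normalize(masses, intensities, low_mass_cutoff=2000.0, target_tic=1.0):
    """Scale a spectrum so the summed signal above the cutoff equals target_tic."""
    if len(masses) != len(intensities):
        raise ValueError("masses and intensities must have the same length")
    # Total ion current, ignoring the matrix region below the cutoff.
    tic = sum(i for m, i in zip(masses, intensities) if m >= low_mass_cutoff)
    if tic <= 0:
        raise ValueError("no signal above the low mass cutoff")
    factor = target_tic / tic
    # All points are scaled; only the factor ignores the low-mass region.
    return [i * factor for i in intensities]


def passes_sn(signal, noise, min_sn=3.0):
    """Apply a uniform signal-to-noise requirement (usually 3)."""
    return noise > 0 and signal / noise >= min_sn


def same_peak(mass_a, mass_b, window=0.003):
    """True if two masses agree within a relative window (0.003 = 0.3%)."""
    return abs(mass_a - mass_b) <= window * max(mass_a, mass_b)


normalized = tic_normalize([1500.0, 3000.0, 6000.0], [10.0, 2.0, 2.0])
print(normalized)
print(same_peak(5000.0, 5010.0), same_peak(5000.0, 5030.0))
```

Using the same cutoff, target, and window for every spectrum is what keeps the resulting peak lists comparable across samples.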
4. Notes<br />
1. Use powder-free nitrile (not latex) gloves when processing SELDI ProteinChips.<br />
Repetitive peaks at 3000–4000 Da will appear in the spectra if samples are<br />
contaminated with latex.<br />
2. Use sample sets of sufficient size. A sample set of at least 30 should be included<br />
in each classification group in order to do multivariate analysis and to give >90%<br />
statistical confidence in a single marker with p values
Liotta, L. A. (2002). Use of proteomic patterns in serum to identify ovarian cancer.<br />
Lancet, 359: 572–577.<br />
4. de Noo, M. E., Mertens, B. J., Ozalp, A., Bladergroen, M. R., van der Werff, M. P.,<br />
van de Velde, C. J., Deelder, A. M., and Tollenaar, R. A. (2006). Detection of colorectal<br />
cancer using MALDI-TOF serum protein profiling. Eur J Cancer, 42: 1068–1076.<br />
5. Sidransky, D., Irizarry, R., Califano, J. A., Li, X., Ren, H., Benoit, N., and Mao, L.<br />
(2003). Serum protein MALDI profiling to distinguish upper aerodigestive tract<br />
cancer patients from control subjects. J Natl Cancer Inst, 95: 1711–1717.<br />
6. Howard, B. A., Wang, M. Z., Campa, M. J., Corro, C., Fitzgerald, M. C., and<br />
Patz, E. F. Jr. (2003). Identification and validation of a potential lung cancer serum<br />
biomarker detected by matrix-assisted laser desorption/ionization-time of flight<br />
spectra analysis. Proteomics, 3: 1720–1724.<br />
7. Baumann, S., Ceglarek, U., Fiedler, G. M., Lembcke, J., Leichtle, A., and Thiery, J.<br />
(2005). Standardized approach to proteome profiling of human serum based on<br />
magnetic bead separation and matrix-assisted laser desorption/ionization time-of-flight<br />
mass spectrometry. Clin Chem, 51: 973–980.<br />
8. Orvisky, E., Drake, S. K., Martin, B. M., Abdel-Hamid, M., Ressom, H. W.,<br />
Varghese, R. S., An, Y., Saha, D., Hortin, G. L., Loffredo, C. A., and Goldman, R.<br />
(2006). Enrichment of low molecular weight fraction of serum for MS analysis of<br />
peptides associated with hepatocellular carcinoma. Proteomics, 6: 2895–2902.<br />
9. Feuerstein, I., Rainer, M., Bernardo, K., Stecher, G., Huck, C. W., Kofler, K.,<br />
Pelzer, A., Horninger, W., Klocker, H., Bartsch, G., and Bonn, G. K. (2005).<br />
Derivatized cellulose combined with MALDI-TOF MS: a new tool for serum<br />
protein profiling. J Proteome Res, 4: 2320–2326.<br />
10. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D.,<br />
Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P., Speicher,<br />
D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W. (2005). HUPO plasma<br />
proteome project specimen collection and handling: towards the standardization of<br />
parameters for plasma proteome samples. Proteomics, 5: 3262–3277.<br />
11. Banks, R. E., Stanley, A. J., Cairns, D. A., Barrett, J. H., Clarke, P., Thompson, D.,<br />
and Selby, P. J. (2005). Influences of blood sample processing on low-molecular<br />
weight proteome identified by surface-enhanced laser desorption/ionization mass<br />
spectrometry. Clin Chem, 51: 1637–1649.<br />
12. Villanueva, J., Philip, J., Entenberg, D., Chaparro, C. A., Tanwar, M. K.,<br />
Holland, E. C., and Tempst, P. (2004). Serum peptide profiling by magnetic<br />
particle-assisted, automated sample processing and MALDI-TOF mass<br />
spectrometry. Anal Chem, 76: 1560–1570.<br />
13. Guerrier, L., Thulasiraman, V., Castagna, A., Fortis, F., Lin, S., Lomas, L.,<br />
Righetti, P. G., and Boschetti, E. (2006). Reducing protein concentration range<br />
of biological samples using solid-phase ligand libraries. J Chromatogr B Analyt<br />
Technol Biomed Life Sci, 833: 33–40.<br />
14. Fountoulakis, M., Juranville, J. F., Jiang, L., Avila, D., Roder, D., Jakob, P.,<br />
Berndt, P., Evers, S., and Langen, H. (2004). Depletion of the high-abundance<br />
plasma proteins. Amino Acids, 27: 249–259.
15. Lowenthal, M. S., Mehta, A. I., Frogale, K., Bandle, R. W., Araujo, R. P.,<br />
Hood, B. L., Veenstra, T. D., Conrads, T. P., Goldsmith, P., Fishman, D., Petricoin,<br />
E. F. 3rd, and Liotta, L. A. (2005). Analysis of albumin-associated peptides and<br />
proteins from ovarian cancer patients. Clin Chem, 51: 1933–1945.<br />
16. Mehta, A. I., Ross, S., Lowenthal, M. S., Fusaro, V., Fishman, D. A.,<br />
Petricoin, E. F. 3rd, and Liotta, L. A. (2003). Biomarker amplification by serum<br />
carrier protein binding. Dis Markers, 19: 1–10.<br />
17. Semmes, O. J., Feng, Z., Adam, B. L., Banez, L. L., Bigbee, W. L., Campos, D.,<br />
Cazares, L. H., Chan, D. W., Grizzle, W. E., Izbicka, E., Kagan, J., Malik, G.,<br />
McLerran, D., Moul, J. W., Partin, A., Prasanna, P., Rosenzweig, J., Sokoll, L. J.,<br />
Srivastava, S., Srivastava, S., Thompson, I., Welsh, M. J., White, N., Winget, M.,<br />
Yasui, Y., Zhang, Z., and Zhu, L. (2005). Evaluation of serum protein profiling by<br />
surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for<br />
the detection of prostate cancer: I. Assessment of platform reproducibility. Clin<br />
Chem, 51: 102–112.<br />
18. Rai, A. J., Stemmer, P. M., Zhang, Z., Adam, B. L., Morgan, W. T., Caffrey, R. E.,<br />
Podust, V. N., Patel, M., Lim, L. Y., Shipulina, N. V., Chan, D. W., Semmes, O. J.,<br />
and Leung, H. C. (2005). Analysis of human proteome organization plasma<br />
proteome project (HUPO PPP) reference specimens using surface enhanced<br />
laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multi-institution<br />
correlation of spectra and identification of biomarkers. Proteomics, 5:<br />
3467–3474.<br />
19. Semmes, O. J., Cazares, L. H., Ward, M. D., Qi, L., Moody, M., Maloney, E.,<br />
Morris, J., Trosset, M. W., Hisada, M., Gygi, S., and Jacobson, S. (2005). Discrete<br />
serum protein signatures discriminate between human retrovirus-associated<br />
hematologic and neurologic disease. Leukemia, 19: 1229–1238.<br />
20. Qian, H. G., Shen, J., Ma, H., Ma, H. C., Su, Y. H., Hao, C. Y., Xing, B. C.,<br />
Huang, X. F., and Shou, C. C. (2005). Preliminary study on proteomics of gastric<br />
carcinoma and its clinical significance. World J Gastroenterol, 11: 6249–6253.<br />
21. Ressom, H. W., Varghese, R. S., Abdel-Hamid, M., Eissa, S. A., Saha, D.,<br />
Goldman, L., Petricoin, E. F., Conrads, T. P., Veenstra, T. D., Loffredo, C. A.,<br />
and Goldman, R. (2005). Analysis of mass spectral serum profiles for biomarker<br />
selection. Bioinformatics, 21: 4039–4045.<br />
22. Liu, J., Zheng, S., Yu, J. K., Zhang, J. M., and Chen, Z. (2005). Serum protein<br />
fingerprinting coupled with artificial neural network distinguishes glioma from<br />
healthy population or brain benign tumor. J Zhejiang Univ Sci B, 6: 4–10.<br />
23. Qu, Y., Adam, B. L., Yasui, Y., Ward, M. D., Cazares, L. H., Schellhammer, P. F.,<br />
Feng, Z., Semmes, O. J., and Wright, G. L. Jr. (2002). Boosted decision tree analysis<br />
of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates<br />
prostate cancer from noncancer patients. Clin Chem, 48: 1835–1843.<br />
24. Papadopoulos, M. C., Abel, P. M., Agranoff, D., Stich, A., Tarelli, E., Bell, B. A.,<br />
Planche, T., Loosemore, A., Saadoun, S., Wilkins, P., and Krishna, S. (2004). A<br />
novel and accurate diagnostic test for human African trypanosomiasis. Lancet, 363:<br />
1358–1363.
8<br />
Urine Sample Preparation and Protein Profiling<br />
by Two-Dimensional Electrophoresis<br />
and Matrix-Assisted Laser Desorption Ionization<br />
Time of Flight Mass Spectroscopy<br />
Panagiotis G. Zerefos and Antonia Vlahou<br />
Summary<br />
Urine represents the most easily attainable and consequently one of the most common<br />
samples in clinical analysis and diagnostics. However, urine is also considered one of<br />
the most difficult proteomic samples to work with due to its highly variable contents,<br />
as well as the presence of various proteins in low abundance or modified forms. In this<br />
chapter, we describe simple protocols and troubleshooting tips for urinary protein preparation<br />
and profiling by two-dimensional electrophoresis or directly via matrix-assisted laser<br />
desorption ionization time of flight mass spectroscopy. Direct dilution, protein precipitation,<br />
ultrafiltration, and solid phase extraction, in combination with the above profiling<br />
technologies, provide the means for reliable proteomic analysis of one of the most significant<br />
yet very complex biological samples.<br />
Key Words: urine; 2DE; MALDI-TOF-MS; protein profiling; sample preparation.<br />
Abbreviations: ACT: Acetone, CE: Capillary electrophoresis, CHAPS:<br />
3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, CHCA: α-Cyano-4-<br />
hydroxycinnamic acid, d: Dalton, 2DE: Two-dimensional gel electrophoresis, DHB:<br />
Dihydroxybenzoic acid, DTE: 1,4-Dithioerythritol, IEF: Isoelectric focusing, IPG:<br />
Immobilized pH gradient, LC: Liquid chromatography, MALDI: Matrix-assisted laser<br />
desorption ionization, MS: Mass spectrometry, MW: Molecular weight, MWCO:<br />
Molecular weight cut-off, ns: Nanosecond, o/n: Overnight, RCF: Relative centrifugal<br />
force, SA: Sinapinic acid, SDS: Sodium dodecylsulfate, SELDI: Surface-enhanced laser<br />
desorption ionization, SPE: Solid phase extraction, TCA: Trichloroacetic acid, TFA: Trifluoroacetic<br />
acid, TGS: Tris-Glycine-SDS, TOF: Time of flight, UF: Ultrafiltration<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
1. Introduction<br />
Biological fluids play a central role in clinical chemistry. Investigation<br />
of their cellular (cell number, morphology, etc.), biochemical (metabolites,<br />
biomolecules), and physicochemical (pH, transparency, absorption, etc.)<br />
attributes assists in formulating the clinical judgment on disease prognosis,<br />
diagnosis, and treatment. Urine, according to the International Union of Pure and<br />
Applied Chemistry, is the human fluid that contains water and metabolic<br />
products and is excreted by the kidneys, stored in the bladder, and normally<br />
discharged by way of the urethra. The protein content of urine is very low<br />
under normal conditions (1) and derives mainly from human plasma proteins,<br />
which are not filtered through the renal glomeruli. The presence of proteins at<br />
high concentrations in urine is usually the result of disease or pharmaceutical<br />
treatment. Creatinine assay in urine is one of the most common clinical examinations<br />
and serves this exact purpose, to assess unexpected protein excretion.<br />
It should be noted that besides the soluble proteins, urine also contains proteins<br />
included in exfoliated cells as well as in membrane components known as<br />
exosomes (2). In this chapter, we focus on the description of methods for the<br />
analysis of the soluble urinary proteins and refer the interested<br />
reader to the review by Pisitkun et al. (2) for a thorough description of the other<br />
urinary protein components.<br />
In comparison to other proteomics samples, urine is still less explored. The<br />
main reason for this is the fact that urine is a difficult and diverse sample. Its<br />
composition is age, sex, health, and drug dependent. In addition, tremendous<br />
daily variations in protein content exist between first void, midstream,<br />
morning, and random catch urine samples of a single donor. Despite these<br />
facts, protein markers for disease have been detected in urine and have been<br />
approved to be utilized as adjuncts to clinical assays for disease diagnosis<br />
and prognosis (3,4). This justifies and triggers an in-depth analysis of the<br />
urinary proteome, particularly with the advent of contemporary proteomics<br />
technologies, with the objective to identify novel disease diagnostic/prognostic<br />
biomarkers.<br />
Specifically, the urine proteome has been studied thoroughly by a series<br />
of proteomics technologies. These include two-dimensional electrophoresis<br />
(5,6), liquid chromatography (LC) in combination with mass spectroscopy<br />
(MS) (7,8), matrix-assisted laser desorption ionization-time of flight (MALDI-<br />
TOF) or surface-enhanced laser desorption ionization (SELDI)-TOF profiling (9,10,11,<br />
12,13), capillary electrophoresis coupled to MS (14,15), and combinations<br />
thereof, implementing several separation steps, both chromatographic and<br />
electrophoretic (15,16,17,18,19). The great interest in the investigation of the<br />
urinary proteome is reflected by the recent establishment of the human urine<br />
and kidney proteome initiative (http://hkupp.kir.jp) within the Human Proteome<br />
Organization that targets the integration of existing research efforts in this field.<br />
In this chapter, we provide detailed protocols and troubleshooting tips,<br />
as experienced by the authors, for the preparation and analysis of urinary<br />
proteins by two-dimensional gel electrophoresis (2DE) or directly by MALDI-<br />
TOF-MS. We selected these two profiling approaches since the former<br />
is a classical high-resolution profiling approach (see also Chapters 4–6),<br />
whereas the latter offers the advantage of high throughput (see also<br />
Chapters 7 and 13). In general, the process of urine analysis for the investigation<br />
of its protein content can be divided into three main steps: sample<br />
collection (usually performed at the physician’s office), protein extraction,<br />
and protein separation and detection. Each of these steps is crucial and<br />
significantly affects the output of the proteomics experiment. In this chapter,<br />
emphasis is given to the description of the various protein preparation/extraction<br />
methodologies, including ultrafiltration, precipitation, and<br />
solid phase extraction (SPE), as they complement 2DE and MALDI-TOF-MS<br />
profiling. Additional protein preparation methods exist, such as<br />
dialysis and ultracentrifugation (see Note 1); however, we have focused on<br />
the three aforementioned methods due to their simplicity, increased reproducibility,<br />
and overall compatibility with the 2DE and MALDI MS profiling<br />
approaches.<br />
1.1. Protein Precipitation<br />
Protein precipitation is a very common purification procedure employed<br />
for the isolation of macromolecules. The denaturation and precipitation of<br />
proteins occur in solutions of extreme ionic strength, very low pH, or high<br />
concentrations of organic solvents. In such conditions, biopolymers do not<br />
retain a conformation capable of sustaining their solubility. Commonly used<br />
reagents are ammonium sulfate ((NH4)2SO4), used for protein salting out at<br />
concentrations of 3 M, trichloroacetic acid (TCA) [used at concentrations higher<br />
than 5% (w/v)], and several organic solvents [ethanol, acetone, acetonitrile,<br />
chloroform, methanol, and isopropanol, at final concentrations higher than<br />
50% (v/v)]. The choice of the precipitation methodology depends primarily on<br />
the analytical procedure employed. In general, salting out is avoided in<br />
proteomics sample preparations since residual salts inhibit further analysis by<br />
2DE and mass spectrometry. TCA precipitation followed by acetone washes is<br />
very popular and efficient, especially in cases of very dilute protein solutions.<br />
Organic solvents offer very high yields, but some of them are toxic (methanol,<br />
acetonitrile), while others, like chloroform (also toxic), require rather complicated<br />
precipitation procedures. A detailed description of these approaches for urinary<br />
protein preparation is provided in Section 3.<br />
1.2. Ultrafiltration-SPE<br />
Ultrafiltration is a technique based on the use of molecular filters in combination<br />
with centrifugal forces. The whole procedure is performed in a centrifuge<br />
at temperatures varying from 4°C to ambient conditions. It presents many<br />
advantages; for example, proteins are kept in solution and are more easily<br />
handled. A major disadvantage is the cost of the approach and the fact that<br />
even traces of the filter materials, when eluted, produce significant problems<br />
in MS-based methodologies.<br />
Solid phase extraction in combination with MS for urine clinical proteomics<br />
is a newly added approach (22). SPE in the form of magnetic particles was<br />
recently developed as the front end of direct profiling of biological fluids by<br />
MS (23).<br />
We have found that acetone or TCA precipitation and ultrafiltration are very<br />
efficient urinary protein preparation approaches, highly compatible with 2DE<br />
analysis (Figs. 1, 2). In the case of MALDI MS profiling, we favor the utilization<br />
of ultrafiltration, SPE as well as direct dilution of urine in MS compatible<br />
buffers as front end protein preparation methods (see Note 2, Fig. 3). The<br />
detailed protocols are provided below.<br />
Fig. 1. Comparison of urinary sample preparation approaches. Lanes correspond to:<br />
(1) marker, (2) urine starting material, (3) TCA/acetone precipitation supernatant, (4)<br />
TCA precipitate, (5) urine supernatant after 3 h centrifugation at 200,000 RCF, (6)<br />
protein pellet after ultracentrifugation of 5 mL urine, (7) urine filtrate after ultrafiltration<br />
through 5 kd MWCO, and (8) urine retentate after ultrafiltration. In lanes 2, 3, 5, and<br />
7 equal volumes of urine sample were utilized; similarly, lanes 4 and 8 correspond<br />
to same amount of starting urine material in order to facilitate comparison of the<br />
approaches.
2. Materials<br />
2.1. Sample Collection, Handling, and Storage<br />
1. Polypropylene aliquoting tubes (1.5, 2, 15, and 50 mL), Sarstedt Corporation<br />
(Nümbrecht, Germany)<br />
2.2. Urine Sample Preparation/Protein Precipitation<br />
2.2.1. TCA/Acetone Precipitation Protocol<br />
1. Trichloroacetic acid, ultra pure (store solutions at 2–8°C), Sigma Corporation<br />
(St. Louis, MO, USA)<br />
2. Acetone, analytical purity grade, Sigma Corporation<br />
2.2.2. Organic Solvent Precipitation Protocol<br />
1. Acetone, analytical purity grade, Sigma Corporation<br />
2. Isopropanol, analytical purity grade, Sigma Corporation<br />
3. Ethanol, analytical purity grade, Sigma Corporation<br />
2.2.3. Urine Ultrafiltration<br />
1. Amicon ultrafiltration devices, Millipore Corporation (Billerica, MA, USA)<br />
2.2.4. Urine SPE<br />
1. Bioselect C18 SPE cartridges, Grace Vydac (Columbia, MS, USA)<br />
2. Methanol, high performance liquid chromatography (HPLC) grade, Sigma<br />
Corporation<br />
3. Acetonitrile, HPLC grade, Sigma Corporation<br />
4. Trifluoroacetic acid, HPLC grade, Sigma Corporation<br />
2.3. Analytical/Profiling Techniques<br />
2.3.1. Two-Dimensional Separation<br />
1. Protean isoelectric focusing (IEF) cell, Biorad (Hercules, CA, USA)<br />
2. Nonlinear immobilized pH gradient (IPG) strips (pH 3–10), 17 cm long<br />
3. 2DE sample buffer: 7 M urea, 2 M thiourea, 4% CHAPS w/v, 0.4% 1,4-<br />
dithioerythritol (DTE) w/v, 2% IPG buffer (Biorad) w/v, all components are of<br />
molecular biology grade<br />
4. Mineral oil<br />
5. Equilibration buffer I: 6 M urea, 50 mM Tris–HCl, pH 8.8, 30% glycerol, 2.0%<br />
sodium dodecylsulfate (SDS), 30 mM DTE<br />
6. Equilibration buffer II: 6 M urea, 50 mM Tris–HCl, pH 8.8, 30% glycerol (v/v),<br />
2.0% SDS (w/v), 230 mM iodoacetamide. All components are of molecular biology<br />
grade
7. Fixation solution: 5% phosphoric acid (p.a grade, Sigma) w/v, 50% methanol v/v<br />
(HPLC grade, Sigma)<br />
8. Colloidal coomassie brilliant blue staining kit, Invitrogen (Carlsbad, CA, USA)<br />
9. GS-800 calibrated densitometer and PDQuest software, Biorad<br />
2.3.2. MALDI-TOF-MS<br />
1. Matrix solution: 50% acetonitrile v/v, 0.1% trifluoroacetic acid (TFA) v/v, 0.75%<br />
α-cyano-4-hydroxycinnamic acid (CHCA, Sigma Corporation). Caution: all MALDI<br />
matrices are light sensitive; avoid unnecessary light exposure. Fresh preparation<br />
is advised; otherwise, store at 4°C for 1 week at most<br />
2. MALDI ground steel target plate<br />
3. Ultraflex I MALDI-TOF-TOF-MS (Bruker Daltonics, Bremen, Germany)<br />
4. FlexAnalysis 2.2 software, Bruker Daltonics<br />
2.4. Miscellaneous<br />
The HPLC grade water (Resistivity >18 MΩ·cm, Total organic carbon<br />
(TOC)
Fig. 2. Two-dimensional profiling of (A) 24 h collected urine concentrated by<br />
(1) ultrafiltration through 5000 MWCO, (2) TCA precipitation, (3) acetone precipitation<br />
without washing of the protein pellet, and (4) acetone precipitation with pellet washing.<br />
In these cases (1,2,3,4), the starting material was preconcentrated via membrane<br />
filtration (Pellicon 2 system, Millipore Corporation); ultrafiltration and TCA or acetone<br />
precipitation, as applicable, were applied for the further concentration of the sample<br />
prior to 2DE analysis. (B) Two-dimensional profiling of random catch urine (50 mL<br />
starting volume without any preconcentration) condensed via (1) ultrafiltration through<br />
5000 MWCO and (2) acetone precipitation. In all cases, 1 mg of protein was analyzed<br />
and visualized with colloidal coomassie stain in 3–10 nonlinear IPG strips.<br />
6. Let pellet dry at ambient temperature (see Note 11).<br />
7. Solubilize pellet in 2DE sample buffer and proceed with 2DE analysis (see<br />
Subheading 3.3.1, Note 12, and Fig. 2).<br />
8. The protein pellet may also be subjected to solubilization with MS compatible<br />
buffers and analyzed by MS profiling (see Note 13, Subheading 3.3.2, and<br />
Fig. 3).<br />
3.2.2. Organic Solvent Precipitation Protocol<br />
1. Add to the urine sample at least equal volume of the desired organic solvent<br />
(ethanol, acetone, or isopropanol) and mix (see Notes 14, 15).<br />
2. Keep at –20°C o/n (see Note 16).
[Fig. 3 image: stacked MALDI-TOF spectra A–G; x-axis: mass to charge (1000–10,000); y-axis: intensity (×10^4)]<br />
Fig. 3. MALDI-TOF-MS profiling of urine. (A) Ultrafiltration retentate through<br />
5000 MWCO, diluted 10× in 0.1% TFA; (B) 10× dilution of urine in 0.1% TFA;<br />
(C) supernatant of urine (diluted in 0.1% TFA) after protein precipitation via acetone;<br />
(D) urine protein pellet from acetone precipitation reconstituted in 0.1% TFA; (E) urine<br />
protein pellet from acetone precipitation reconstituted in 50% acetonitrile 0.1% TFA;<br />
(F) acetone precipitation (supernatant) and further purification of the supernatant by<br />
C18-SPE followed by dilution in 0.1% TFA; (G) C18-SPE eluate in 50% acetonitrile,<br />
0.1% TFA. Extensive reproducibility studies indicated that urine processing by ultrafiltration<br />
or direct dilution in 0.1% TFA provides the most robust spectra of the<br />
methods tested. Adapted from (13).<br />
3. Centrifuge in a standard refrigerated bench-top centrifuge (for Eppendorf-type<br />
tubes) for 15 min at an RCF of 16,000–17,000 and 4°C. Discard the supernatant.<br />
4. Wash pellet with ice-cold acetone, leave for 5–10 min at –20°C, and centrifuge<br />
again. Discard supernatant and repeat the washing step once more (see Note 17).<br />
5. Let pellet dry at ambient temperature.<br />
6. Solubilize pellet and proceed with 2DE analysis. The protein pellet or supernatant<br />
may also be subjected to solubilization with MS compatible buffers and analyzed<br />
by MS profiling (see Notes 12, 13, Subheading 3.3.1, Figs. 2, 3).<br />
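Step 1 asks for at least an equal volume of solvent, which corresponds to a final solvent concentration of 50% (v/v). The relation between added volume and final concentration can be sketched as follows. The helper names and the 75% target are illustrative, and the calculation assumes that volumes are additive, which holds only approximately for solvent and water mixtures.

```python
def final_fraction(sample_volume_ml, solvent_volume_ml):
    """Final solvent fraction (v/v) after mixing, assuming additive volumes."""
    return solvent_volume_ml / (sample_volume_ml + solvent_volume_ml)


def solvent_volume_for_fraction(sample_volume_ml, target_fraction):
    """Solvent volume (mL) needed to reach a target v/v fraction."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    # From f = Vs / (Vu + Vs) it follows that Vs = f * Vu / (1 - f).
    return sample_volume_ml * target_fraction / (1.0 - target_fraction)


# An equal volume of solvent added to a 15 mL urine aliquot.
print(final_fraction(15.0, 15.0))               # 0.5
# Hypothetical target of 75% (v/v) for the same aliquot.
print(solvent_volume_for_fraction(15.0, 0.75))  # 45.0
```

Because the required volume grows steeply as the target fraction approaches 100%, the tube size may limit the sample volume that can be processed in one step.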
3.2.3. Urine Ultrafiltration<br />
1. Place one volume of urine upon a 5 kd molecular weight cut-off (MWCO)<br />
Amicon ultrafiltration device (see Notes 18–20).<br />
2. Spin in a refrigerated centrifuge at 3500 RCF and 8–12°C (see Notes 21, 22).<br />
3. After condensation, collect the retentate and discard or keep the filtrate depending<br />
on the specific application (see Notes 23–25).<br />
4. For 2DE add the appropriate volume of sample buffer to the retentate and proceed<br />
with IEF (see Notes 26–27, Subheading 3.3.1, and Fig. 2).<br />
5. For MALDI profiling dilute the retentate 10 times with 0.1% TFA v/v, and<br />
proceed as described below (see Subheading 3.3.2, Fig. 3).<br />
3.2.4. Urine SPE (see Note 28)<br />
1. Activate cartridge with a total of 1 mL methanol (two applications of 500 μL each).<br />
2. Wash cartridge with 2 mL acetonitrile (four applications of 500 μL each, see<br />
Note 29).<br />
3. Equilibrate cartridge with a total of 1 mL 0.1% TFA v/v (two applications of<br />
500 μL each).<br />
4. Load cartridge with 1 mL urine acidified by TFA at 0.1% (v/v) final concentration.<br />
5. Wash cartridge with 1 mL 0.1% TFA v/v (two applications of 500 μL each).<br />
6. Elute compounds by adding 100 μL of 50% acetonitrile, 0.1% TFA v/v.<br />
7. Take 1 μL eluent, place on MALDI target, and process for MALDI MS profiling<br />
(see Subheading 3.3.2, Fig. 3).<br />
3.2.5. Direct Dilution of Urine<br />
This method is used only in conjunction with direct MALDI MS profiling.<br />
• Dilute urine 10 times with 0.1% TFA v/v (see Notes 30, 31).<br />
• Apply 1 μL of the urine sample on MALDI target.<br />
• Apply 1 μL matrix solution.<br />
• Proceed with MALDI-TOF-MS (see Subheading 3.3.2, Fig. 3).<br />
3.3. Analytical/Profiling Techniques<br />
3.3.1. Two-dimensional Separation<br />
1. Measure protein concentration of the sample (pretreated by precipitation or<br />
ultrafiltration) by the use of a commercially available protein kit.<br />
2. Take 0.5–1 mg of urinary proteins diluted in 300 μL of 2DE sample buffer (see<br />
Note 32).<br />
3. Distribute the sample volume equally in a lane of the IEF focusing tray.<br />
4. Place the strip carefully, with the gel face down and in contact with the electrodes<br />
(see Note 33).<br />
5. Rehydrate actively for 16 h at 50 V and 20°C. Caution: do not cover the strip<br />
with mineral oil immediately, but only after 1 h of rehydration (see Note 34).<br />
6. After rehydration, place moistened IEF papers between the strip and electrodes.<br />
7. Start IEF. The typical program is: 250 V for 30 min, linear increment up to<br />
5000 V in 12 h, 5000 V for 16 h (total 110,000 V-h) (see Note 35).
8. After IEF is complete, equilibrate strip with 10 mL equilibration buffer I for<br />
20 min at ambient temperature.<br />
9. Alkylate with 10 mL equilibration buffer II for 20 min (see Note 36).<br />
10. Place strip on top of 12.5% polyacrylamide gel, cover with 0.5% melted agarose<br />
in TGS buffer and start second dimension. Start with 10 mA current for 1 h and<br />
continue with 40 mA for approximately another 4 h (see Note 37).<br />
11. Fix gel for 2 h with fixation solution.<br />
12. Stain o/n with colloidal coomassie blue stain (Fig. 2).<br />
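The total volt-hours quoted for the IEF program in step 7 can be checked by summing the contribution of each phase. The sketch below assumes that the linear increment starts from 250 V and uses the mean voltage of a linear ramp; the function name is hypothetical.

```python
def volt_hours(steps):
    """Sum volt-hours over (start_v, end_v, hours) steps.

    A hold has start_v == end_v; a linear ramp contributes its mean voltage.
    """
    return sum((start_v + end_v) / 2.0 * hours for start_v, end_v, hours in steps)


program = [
    (250, 250, 0.5),   # 250 V for 30 min
    (250, 5000, 12),   # linear increment up to 5000 V in 12 h (start assumed 250 V)
    (5000, 5000, 16),  # 5000 V for 16 h
]
print(volt_hours(program))  # 125 + 31500 + 80000 = 111625.0
```

The computed value of about 111,600 V-h is consistent with the rounded total of 110,000 V-h given in the protocol.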
3.3.2. MALDI-TOF-MS<br />
1. Place 1 μL sample on the MALDI target plus 1 μL matrix solution and mix on<br />
spot (dried droplet technique, see Notes 38 and 39).<br />
2. Leave target to dry at ambient temperature in the dark.<br />
3. Load sample in the instrument and execute the appropriate MS method. Run the<br />
instrument in linear mode (see Note 40).<br />
4. Optimize ion acceleration; tampering with the sensitivity of the detector is not recommended<br />
prior to MS method establishment (see Note 41).<br />
5. Set pulsed ion extraction (delayed ion acceleration) according to the profiling<br />
region in use. Typically, when α-cyano-4-hydroxycinnamic acid is utilized, 50–150 ns are<br />
applied for large peptides (3–5 kd), 150–300 ns for small molecular weight<br />
proteins (15 kd), and higher than 300 ns for proteins (>20 kd, see Notes 42 and 43).<br />
6. Collect 1000–2000 shots per sample and sum the collected data (see Note 44).<br />
4. Notes<br />
1. Dialysis is one of the most classical methods for buffer exchange and purification<br />
(separation) of high from low molecular weight constituents of a specific<br />
sample. Although it has been utilized elsewhere (20), we consider it rather<br />
laborious and costly, and it serves solely purification and not condensation purposes.<br />
Ultracentrifugation has been applied (21) for the isolation of higher molecular<br />
weight urinary proteins prior to 2DE (Fig. 1). In our opinion, centrifugal<br />
isolation of proteins is a very diverse and complicated issue, and reproducibility is<br />
consequently compromised. Precipitation of biopolymers by ultracentrifugation<br />
requires the use of solutions with very well calculated composition in order to<br />
derive the velocity for protein isolation from the theoretical Svedberg values.<br />
Urine samples differ too much in density (d = m/v) and pH values to serve<br />
such purposes in a well-defined and reproducible manner.<br />
2. It should be emphasized that extensive complementarity of the various methods<br />
exists; thereby the combinatorial application of different methods is recommended<br />
in order to increase protein resolution.<br />
3. Urine samples can be first void, midstream, morning, random catch, or 24 h.<br />
Due to its high bacterial content, first morning urine is usually not recommended<br />
in biomarker discovery studies.
4. Upon collection, if not stored immediately at –80°C, urine samples should<br />
be stored at 4°C. Published data (9,10) indicate that, for analysis by 2DE or<br />
SELDI/MALDI MS, the generated proteomic profiles are usually stable for up to<br />
24 h of urine storage at 4°C prior to deep freezing. We have observed occasional<br />
profile changes after such prolonged storage at 4°C, and we therefore favor<br />
shorter times.<br />
5. An enrichment of the soluble supernatant for cellular proteins may be achieved<br />
if prior to the centrifugation step a mild sonication (sonicator bath) for 5–10 min<br />
is applied.<br />
6. The volume of urine required depends on the specific downstream application.<br />
For 2DE analysis an aliquot of at least 15 mL of urine is required. For direct<br />
MALDI MS profiling 1 mL urine aliquot is sufficient.<br />
7. The TCA can be added as a solid to a final concentration of 15% (w/v) (TCA<br />
is extremely hygroscopic and is easily solubilized). Alternatively, the appropriate<br />
volume of 100% (w/v) TCA may be added to the urine sample to reach<br />
a final concentration of 15% (w/v). TCA precipitation can also be performed<br />
at –20°C with overnight storage, occasionally with slightly better efficiency. Caution:<br />
TCA solutions may form bilayer aqueous–organic systems, depending on the<br />
salt concentration of the urine, at –20°C or lower temperatures. The precipitation<br />
efficiency depends on the protein concentration of a given sample;<br />
in our experience, for example, the precipitation yield for a starting material<br />
of 0.5 mg/mL protein concentration (i.e., 1 mg total protein in 2 mL of<br />
sample) ranges from 40 to 70%; in contrast, the precipitation efficiency for a<br />
starting material of 0.1 mg/mL protein concentration (i.e., 1 mg protein in 10 mL of<br />
sample) is 0–30%. For this reason, avoid adding TCA solution to very dilute<br />
protein samples.<br />
8. If the highest available centrifugal force is only 4000–5000 RCF,<br />
longer centrifugation times (45 min) are recommended.<br />
9. The volume of acetone utilized for washing depends on the size of the protein<br />
pellet. A general rule is to use 1 mL acetone for every 1 mL of urine starting<br />
material.<br />
10. Acetone washes are needed to drive off excess TCA; otherwise the pellet is extremely<br />
acidic and the buffers utilized in further steps are neutralized. In addition, TCA<br />
(a nonvolatile acid) may inhibit IEF, PAGE, LC, or MS analysis. We have found<br />
that acetone washes of the pellet do not induce significant protein losses.<br />
11. The pellet should not be dried completely, since this makes<br />
its subsequent solubilization in 2DE or other buffers difficult. Acetone evaporation at<br />
elevated temperatures is not recommended for the same reason.<br />
12. If the pellet does not come in solution, try mild sonication (5 min in a sonicator<br />
bath) or incubate at ambient temperature for 30 min with intermittent vortexing.<br />
However, heating should be avoided (particularly if the pellet is resuspended in<br />
2DE buffer since urea decomposes when heated and reacts with amino acids).<br />
The buffer volume required for solubilization depends on the protein content<br />
(pellet size) and the type of downstream application (2DE or MALDI-TOF-MS).
13. The protein pellet may be solubilized in 0.1% TFA v/v (roughly 100 μL of<br />
solubilization buffer for every milliliter of urine starting material) and analyzed<br />
by MALDI-TOF-MS. However, in our experience, plasticizers possibly extracted<br />
during the precipitation process are frequently detected and reproducibility<br />
problems are observed. Therefore, unless additional purification steps are introduced<br />
(SPE, etc.), we do not favor the application of precipitation methods at<br />
the front end of MALDI MS profiling.<br />
14. The use of ethanol, acetone, or isopropanol is favored. These are hydrophobic,<br />
water-miscible (even at elevated salt concentrations), nontoxic, and volatile.<br />
In particular, we favor the use of acetone since it is cheap, extremely volatile,<br />
and rarely forms aqueous–organic bilayers. Organic solvent mixtures, e.g.,<br />
isopropanol–acetone, do not increase precipitation efficiencies; in our experience<br />
their use induces reproducibility problems and is therefore not recommended.<br />
15. The solvent to sample ratio depends on the downstream application and the<br />
sample protein concentration. For dilute urine samples (protein concentration in the<br />
micrograms per milliliter range) a solvent to sample ratio of 3 provides relatively high<br />
precipitation efficiencies. We have observed that for more concentrated samples<br />
(for example, preconcentrated urine or in general starting material of protein<br />
content in the milligrams per milliliter range), the precipitation efficiency for<br />
lower MW constituents reaches its maximum at a solvent to sample ratio of<br />
about 9.<br />
16. Precipitation is most efficient at –20°C (lower efficiencies have been observed at<br />
4°C, whereas at –80°C bilayer systems may form, which inhibit the procedure).<br />
17. Acetone washes of the pellet are not customary in organic solvent precipitation<br />
protocols. In our experience, however, washing offers great advantages,<br />
especially when 2DE separation is the downstream application, since salts and<br />
other interfering substances are removed (Fig. 2). This washing step renders<br />
2DE gels produced after acetone precipitation as good as those generated<br />
following TCA precipitation. Acetone washing induces negligible protein losses.<br />
18. There are Amicon UF devices that can accommodate up to 4 (UF4) or 15 mL<br />
(UF15) sample volumes. We regularly utilize the UF4 devices when MALDI MS<br />
profiling is to be performed and UF15 when 2DE is the downstream application.<br />
19. Amicon devices are available with several MWCO values. We propose the use of a 5 kd (5000 Da) MWCO<br />
for the isolation and concentration of “total” urine protein content. The use of<br />
different MWCO values is advised for specific isolation of molecular weight groups<br />
(see also Note 25). It should be emphasized that UF is not an absolute size-exclusion<br />
separation method and cross-contamination between different protein<br />
size groups is expected and regularly observed.<br />
20. UF can be performed in the presence of chemical additives. The kind of additives<br />
in use depends on the downstream application (2DE, MALDI profiling, LC-<br />
MS, etc.) since in all cases the chemical compatibility to the latter should be<br />
maintained. For example, we have observed that in case of direct MALDI MS<br />
profiling most additives (detergents such as: octyl-glucopyranoside, triton-100,<br />
tween-20, and organic solvents such as: trifluoroethanol
Urine Sample Preparation 153<br />
and isopropanol
154 Zerefos and Vlahou<br />
(e.g., phospho- or glycopeptides) is feasible, and that is what differentiates SPE<br />
from other sample preparation steps. From our point of view, SPE in combination<br />
with direct MS profiling is encouraged.<br />
29. All chromatographic and SPE media contain residuals and plasticizers, which<br />
should be driven off prior to analyte binding. Failure to perform this step may<br />
result in complete ionization suppression during MALDI profiling.<br />
30. The user may have to try different dilutions of the urine sample. In MALDI MS<br />
profiling experiments, there is a range of protein concentration within which the<br />
spectra quality is not affected. It is advised to conduct preliminary experiments<br />
in order to address this issue.<br />
31. In addition to TFA, the use of several additives (urea, octyl-glucopyranoside,<br />
Triton X-100, Tween-20, NP-40, cholate, and organic solvents) at MALDI MS<br />
compatible concentrations has been tested on urinary peptide–protein ionization.<br />
However, we did not observe any clear advantage for protein resolution or<br />
ionization in these cases.<br />
32. The recommended protein amount of 0.5–1 mg is suitable for 17–18 cm length<br />
and 3–10 or 4–7 pH range strips. The protein amount will vary if different strip<br />
types are utilized, according to the manufacturer’s guidelines (for additional tips<br />
on 2DE see Chapters 4–6).<br />
33. Noncup loading was found to provide better resolution in urine analysis by 2DE<br />
compared to the cup loading method.<br />
34. Direct addition of the mineral oil might cause extraction of hydrophobic proteins<br />
to the oil layer.<br />
35. These running conditions are for the analysis of 1 mg protein sample on wide<br />
range (3–10 or 4–7) 17 or 18 cm IPG strips. The program will vary depending<br />
on the sample quantity and the type of strip in use.<br />
36. Reduction and alkylation are necessary for higher protein resolution in SDS-<br />
PAGE and also for protein identification through peptide mass fingerprinting.<br />
37. The low starting current is needed for the slow migration of the proteins from<br />
the strip to the polyacrylamide gel. Direct electrophoresis with 40 mA current<br />
may cause protein losses. Alternatively, the gel may run at 10 mA o/n. Although<br />
slower, the latter approach provides gels of higher resolution, in our experience<br />
(for additional tips on 2DE see Chapters 4–6).<br />
38. Several sample application techniques were tested (thin layer preparation, double<br />
layer, and variations of dried droplet). Of those, we found that dried droplet<br />
(with simultaneous sample and matrix application) was the simplest, fastest,<br />
and most reliable method. In addition, the simultaneous drying of sample and<br />
matrix solution (rather than sample and matrix separately) increases reproducibility<br />
and minimizes losses during subsequent spot washes. In contrast,<br />
if sample and matrix are mixed prior to their application on the target, their<br />
consumption is much higher and the sample exposure to plastics increases,<br />
thereby increasing the chances for sample contamination and subsequent ion<br />
suppression by plasticizers.
39. If crystal formation is hindered by a high salt content in the<br />
sample, wash the spot by pipetting two to three times with 2 μL of cool 0.1% (v/v)<br />
TFA solution (let dry again, do not wipe dry). Always prefer spot-by-spot<br />
washing to washing the entire target, in order to avoid sample cross-contamination.<br />
40. Instrument calibration is performed according to the manufacturer specifications.<br />
In any case, we propose daily calibration to ensure precision and accuracy.<br />
41. Acceleration of biomolecules is affected first of all by the voltage settings of the<br />
ion source. Settings of the analyzer (TOF) mainly affect resolution parameters,<br />
while detector settings should be adjusted only to improve the signal-to-noise<br />
characteristics of a given sample.<br />
42. The mass spectrum should be divided into subregions and data of each of the<br />
latter should be collected separately, in order to increase protein resolution.<br />
This is because ionization kinetics (and consequently instrument settings) are<br />
completely different for different protein sizes.<br />
43. Different matrices (e.g., CHCA or dihydroxybenzoic acid for peptides and SA<br />
for proteins) require different laser focusing settings. In general, large crystals<br />
(such as the ones formed by SA) and larger protein molecules require more<br />
concentrated energy bursts than smaller ones where more disperse hits may be<br />
used.<br />
44. Always sum the same number of laser shots and select as many regions of a<br />
spot as possible to ensure high reproducibility.<br />
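The reagent volumes quoted in Notes 7, 9, 13, and 15 follow from simple proportions, which can be sketched as below. The function names are hypothetical, and the TCA calculation assumes that volumes are additive on mixing; it is meant as a bench-side aid under those assumptions, not as a validated calculator.

```python
def tca_stock_volume_ml(sample_ml, final_pct=15.0, stock_pct=100.0):
    """Volume of TCA stock (% w/v) to add so that the mixture reaches final_pct (% w/v)."""
    if not 0 < final_pct < stock_pct:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    # Mass balance, assuming additive volumes:
    #   stock_pct * v = final_pct * (sample_ml + v)
    return sample_ml * final_pct / (stock_pct - final_pct)


def precipitation_solvent_ml(sample_ml, ratio):
    """Solvent volume for a given solvent to sample ratio (Note 15: about 3 or about 9)."""
    return sample_ml * ratio


def acetone_wash_ml(urine_ml):
    """Note 9 rule of thumb: 1 mL acetone for every 1 mL of urine starting material."""
    return 1.0 * urine_ml


def tfa_resuspension_ul(urine_ml):
    """Note 13: roughly 100 uL of 0.1% TFA for every mL of urine starting material."""
    return 100.0 * urine_ml


if __name__ == "__main__":
    # Example: 10 mL of urine brought to 15% (w/v) TCA with a 100% (w/v) stock.
    print(round(tca_stock_volume_ml(10.0), 2))  # 1.76
```

Note that simply adding 1.5 mL of stock to 10 mL of urine would give about 13% rather than 15%, because the added stock also increases the total volume.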
Acknowledgments<br />
This study was supported by the Greek Ministry of Health.<br />
References<br />
1. Norden, G.W.A., Sharratt, P., Cutillas, P.R., Cramer, R., Gardner, S.C. and<br />
Unwin, R.J. (2004) Quantitative amino acid and proteomics analysis: Very low<br />
excretion of polypeptides >750 Da in normal urine. Kidney International 66,<br />
1994–2003.<br />
2. Pisitkun, T., Johnstone, R. and Knepper, M.A. (2006) Discovery of urinary<br />
biomarkers. Molecular and Cellular Proteomics 5, 1760–1771.<br />
3. Nielsen, M.E., Schaeffer, E.M., Veltri, R.W., Schoenberg, M.P., Getzenberg, R.H.<br />
(2006) Urinary markers in the detection of bladder cancer: What’s new? Current<br />
Opinion in Urology 16, 350–355.<br />
4. Thongboonkerd, V. and Malasit, P. (2005) Renal and urinary proteomics: Current<br />
applications and challenges. Proteomics 5, 1033–1042.<br />
5. Pieper, R., Gatlin, C.L., McGrath, A.M., Makusky, A.J., Mondal, M.,<br />
Seonarain, M., Field E., Schatz, C.R., Estock, M.A., Ahmed, N., Anderson, N.G.<br />
and Steiner, S. (2004) Characterization of the human urinary proteome: A method
for high-resolution display of urinary proteins on two-dimensional electrophoresis<br />
gels with a yield of nearly 1400 distinct protein spots. Proteomics 4, 1159–1174.<br />
6. Oh, J., Pyo, J., Jo, E., Hwang, S., Kang, S., Jung, J., Park, E., Kim, S., Choi, J.<br />
and Lim, J. (2004) Establishment of a near-standard two-dimensional human urine<br />
proteomic map. Proteomics 4, 3485–3497.<br />
7. Spahr, C.S., Davis, M.T., McGinley, M.D., Robinson, J.H., Bures, E.J., Beierle, J.,<br />
Mort, J., Courchesne, P.L., Chen, K., Wahl, R.C., Yu, W., Luethy, R. and<br />
Patterson, S.D. (2001) Towards defining the urinary proteome using liquid<br />
chromatography-tandem mass spectrometry I. Profiling an unfractionated tryptic<br />
digest. Proteomics 1, 93–107.<br />
8. Cutillas, P.R., Norden, A., Cramer, R., Burlingame, A. and Unwin, R.J. (2003)<br />
Detection and analysis of urinary peptides by on-line liquid chromatography and<br />
mass spectrometry: Application to patients with renal Fanconi syndrome. Clinical<br />
Science 104, 483–490.<br />
9. Schaub, S., Wilkins J., Weiler, T., Sangster, K., Rush, D., Nickerson, P.<br />
(2004) Urine protein profiling with SELDI TOF MS. Kidney International 65,<br />
323–332.<br />
10. Rogers, M.A., Clarke, P., Noble, J., Munro, N.P., Paul, A., Selby, P.J. and<br />
Banks, R.E. (2003) Proteomic profiling of urinary proteins in renal cancer by<br />
surface enhanced laser desorption ionization and neural-network analysis: Identification<br />
of key issues affecting potential clinical utility. Cancer Research 63,<br />
6971–6983.<br />
11. Vlahou, A., Schellhammer, P.F., Mendrinos, S., Patel, K., Kondylis, F.I., Gong, L.,<br />
Nasim, S. and Wright, G.L. Jr. (2001) Development of a novel proteomic approach<br />
for the detection of transitional cell carcinoma of the bladder in urine. The American<br />
Journal of Pathology 158, 1491–1502.<br />
12. Vlahou, A., Giannopoulos, A., Gregory, B.W., Manousakas, T., Kondylis, F.I.,<br />
Wilson, L.L., Schellhammer, P.F., Semmes, O.J. and Wright, G.L. Jr. (2004) Protein<br />
profiling in urine for the diagnosis of bladder cancer. Clinical Chemistry 50,<br />
1438–1445.<br />
13. Zerefos, P.G., Prados, J., Kalousis, A. and Vlahou, A. (2007) Sample preparation<br />
and bioinformatics in MALDI profiling of urinary proteins. Journal of Chromatography<br />
B. Analyt Technol Biomed Life Sci. 15, 20–30.<br />
14. Zürbig, P., Renfrow, M.B., Schiffer, E., Novak, J., Walden, M., Wittke, S., Just, I.,<br />
Pelzing, M., Neusüß, C., Theodorescu, D., Root, K.E., Ross, M.M. and Mischak, H.<br />
(2006) Biomarker discovery by CE-MS enables sequence analysis via MS/MS with<br />
platform-independent separation. Electrophoresis 27, 2111–2125.<br />
15. Mischak, H., Kaiser, T., Walden, M., Hillmann, M., Wittke, S., Herrmann, A.,<br />
Knueppel, S., Haller, H. and Fliser, D. (2004) Proteomic analysis for the assessment<br />
of diabetic renal damage in humans. Clinical Science 107, 485–495.<br />
16. Zerefos, P.G., Vougas, K., Dimitraki, P., Kossida, S., Petrolekas, A.,<br />
Stravodimos, K., Giannopoulos, A., Fountoulakis, M. and Vlahou, A. (2006)<br />
Characterization of the human urine proteome by preparative electrophoresis in<br />
combination with 2-DE. Proteomics 6, 4346–4355.
17. Pang, J.X., Ginanni, N., Dongre, A.R., Hefta, S.A., and Opiteck, G.J. (2002)<br />
Biomarker discovery in urine by proteomics. Journal of Proteome Research 1,<br />
161–169.<br />
18. Sun, W., Li, F., Wu, S., Wang, X., Zheng, D., Wang, J. and Gao, Y. (2005) Human<br />
urine proteome analysis by three separation approaches. Proteomics 5, 4994–5001.<br />
19. Soldi, M., Sarto, C., Valsecchi, C., Magni, F., Proserpio, V., Ticozzi, D. and<br />
Mocarelli, P. (2005) Proteome profile of human urine with two-dimensional liquid<br />
phase fractionation. Proteomics 5, 2641–2647.<br />
20. Rasmussen, H.H., Orntoft, T.F., Wolf, H. and Celis, J.E. (1996) Towards a comprehensive<br />
database of proteins from the urine of patients with bladder cancer. The<br />
Journal of Urology 6, 2113–2119.<br />
21. Thongboonkerd, V., McLeish, K.R., Arthur, J.M. and Klein, J.B. (2002) Proteomic<br />
analysis of normal human urinary proteins isolated by acetone precipitation or<br />
ultracentrifugation. Kidney International 62, 1461–1469.<br />
22. Hortin, G.L., Meilinger, B. and Drake, S.K. (2004) Size-selective<br />
extraction of peptides from urine for mass spectrometric analysis. Clinical<br />
Chemistry 50, 1092–1095.<br />
23. Zhang, X., Leung, S., Morris, C.R. and Shigenaga, M.K. (2004) Evaluation<br />
of a novel, integrated approach using functionalized magnetic beads, benchtop<br />
MALDI-TOF-MS with prestructured sample supports, and pattern recognition<br />
software for profiling potential biomarkers in human plasma. Journal of<br />
Biomolecular Techniques 15, 167–175.
9<br />
Combining Laser Capture Microdissection<br />
and Proteomics Techniques<br />
Dana Mustafa, Johan M. Kros, and Theo Luider<br />
Summary<br />
Laser microdissection is an effective technique to harvest pure cell populations from<br />
complex tissue sections. In addition to using the microdissected cells in several DNA and<br />
RNA studies, it has been shown that the small number of cells obtained by this technique<br />
can also be used for proteomics analysis. Combining laser capture microdissection and<br />
different types of mass spectrometers opened ways to find and identify proteins that are<br />
specific for various cell types, tissues, and their morbid alterations. Although the combination<br />
of microdissection followed by the currently available techniques of proteomics has<br />
not yet reached the stage of genome wide representation of all proteins present in a tissue,<br />
it is a feasible way to find significant differentially expressed proteins in target tissues.<br />
Recent developments in mass spectrometric detection followed by proper statistics and<br />
bioinformatics make it possible to analyze the proteome of as few as 100–200 cells. Obviously,<br />
validation of results is essential. The present review describes and discusses the various<br />
methods developed to target cell populations of interest by laser microdissection, followed<br />
by analysis of their proteome.<br />
Key Words: laser capture microdissection; matrix-assisted laser desorption/<br />
ionization; Fourier transform mass spectrometry; time-of-flight mass spectrometry; liquid<br />
chromatography-electrospray ionization tandem mass spectrometry; two-dimensional<br />
polyacrylamide gel electrophoresis; differential in-gel electrophoresis; protein chip<br />
technology.<br />
Abbreviations: LCM: Laser Capture Microdissection, LMM: Laser Microbeam<br />
Microdissection, LPC: Laser Pressure Catapulting, 2D PAGE: Two-dimensional Polyacrylamide<br />
Gel Electrophoresis, 2D DIGE: Two-dimensional Differential In-gel Electrophoresis, SDS: Sodium<br />
Dodecyl Sulphate, MALDI-TOF/MS: Matrix-assisted Laser Desorption/Ionization Time-of-flight<br />
Mass Spectrometry, MALDI-FTMS: Matrix-assisted Laser Desorption/Ionization<br />
Fourier Transform Mass Spectrometry, LC-ESI-MS/MS: Liquid Chromatography-Electrospray<br />
Ionization Tandem Mass Spectrometry, HPLC: High Performance Liquid<br />
Chromatography, SELDI-TOF: Surface-enhanced Laser Desorption/Ionization Time-of-flight,<br />
ICAT: Isotope-coded Affinity Tag<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
1. Introduction<br />
Over the last years, significant progress in the analysis of the entire genome<br />
has triggered efforts to further analyze normal and abnormal protein expression<br />
patterns. There is, for instance, an eagerness to discover more and better<br />
diagnostic markers for specific diseases. High expectations of the use of better<br />
biomarkers for the purpose of improving diagnosis and monitoring treatment<br />
initiated technical developments. Human tissues are usually composed of rather<br />
complex mixtures of different cell types. Many techniques have been used<br />
for the isolation of pure cell populations and each technique has its advantages<br />
and limitations. For example, immunohistochemistry is an established<br />
and relatively easy technique applicable for localizing protein expression. A<br />
drawback of immunohistochemistry is that it does not allow quantitative assessment<br />
of proteins. Another method to obtain information about particular cell<br />
populations is growing cell cultures in order to amplify target cells. Despite<br />
the technical feasibility of this technique, the biological characteristics of the<br />
original cells may not be accurately preserved in an in vitro environment (1). Alternatively,<br />
xenografts mimic the normal situation better,<br />
but again this method only reflects the real situation of cells in vivo to<br />
some extent (2). Another way of separating cell populations for further investigation<br />
is flow cytometry, which has successfully been applied in the study of<br />
many disease processes. Flow cytometric analysis is applied to cell suspensions<br />
and specific markers for selection of cell population are required. To the best<br />
of our knowledge, the combination of flow cytometry and subsequent mass<br />
spectrometry (MS) has not yet been described for the analysis of solid tissues.<br />
In this review, we discuss methods of cell purification and harvesting<br />
techniques by the use of laser microdissection, which are currently applied for<br />
further MS analysis.<br />
2. Laser Capture Microdissection<br />
In order to select for specific cell populations in heterogeneous tissues,<br />
several microdissection techniques have been described. Most techniques<br />
involve the use of a needle to scrape off cells of interest under direct microscopic<br />
visualization (3,4). This method, however, tends to be slow, tedious, and highly<br />
operator dependent (2). In 1992, Shibata and coworkers described a new method<br />
of cell isolation. They used a specific pigment placed over small numbers of<br />
cells in a tissue section, which served as an umbrella preventing the covered<br />
cells from being destroyed. Ultraviolet light was used to destroy the DNA/RNA<br />
of the uncovered cells (5). Shortly afterwards, laser capture microdissection (LCM)<br />
under direct microscopic visualization was developed by Liotta and coworkers<br />
at the National Cancer Institute. This way of target cell isolation permits rapid,<br />
reliable laser microdissection to collect specific cell populations from a section<br />
of a complex, heterogeneous tissue (6). For this approach, a tissue section<br />
is placed in a holder of an inverted microscope. A transparent, thermoplastic<br />
polymer coating [e.g., ethylene vinyl acetate (EVA) (7)] is placed in contact<br />
with the tissue. The EVA polymer is positioned over microscopically selected<br />
cell clusters and subsequently the polymer is precisely activated by a near-infrared<br />
laser pulse steered by the investigator. The laser activation of the<br />
polymer results in specific binding to the targeted area. With the removal of<br />
the EVA and the tissue that was bound to it from the section the selected cell<br />
aggregates are isolated for molecular analysis (8). LCM is compatible with<br />
a variety of cellular staining methods and tissue preservation protocols (9).<br />
Depending on the laser microdissection device used, the collection caps are<br />
positioned in different ways. For instance, the caps in the PixCell II (Arcturus<br />
Engineering, Mountain View, CA, USA) technique make contact with the<br />
tissue sections; therefore, strict requirements for preparation apply. The<br />
PALM microlaser dissector (PALM Microlaser Technologies AG, Bernried,<br />
Germany) provides a powerful separation in which an important application of<br />
the cutting UV-laser is laser microbeam microdissection combined with laser<br />
pressure catapulting (10). Specific glass slides covered with a polyethylene<br />
naphthalate (PEN) membrane aid in stabilizing the morphological integrity of<br />
the captured area (11) (Fig. 1). In this method, collecting caps do not make<br />
any contact with the tissue sections, which increases the flexibility with<br />
respect to section preparation (12). Both LCM techniques are specific enough<br />
to dissect single cells. The PALM can dissect smaller sections of tissue as<br />
compared to the PixCell system. The two methods of microdissection yield<br />
RNA retrievals of comparable quality and quantity, but they have not been<br />
directly compared with regard to recent developments in protein retrieval for<br />
mass spectrometric applications (13). The collection of large quantities of cells<br />
by LCM is a time-consuming procedure requiring the microscopic visualization<br />
of the cells of interest in stained tissue sections before lasering. The<br />
software and the hardware of the different types of laser microdissection systems are<br />
still developing.
Fig. 1. A scheme that represents the principle of laser capture microdissection. Labeled components: buffer droplet, microdissected tissue, cap, PEN membrane, stage, tissue section, slide, and laser objective.<br />
3. LCM and Two-Dimensional Gel Electrophoresis<br />
A new development is the application of LCM for protein retrieval of<br />
tissues for further analysis by proteomic techniques. So far, several approaches<br />
have been performed on cells obtained by laser microdissection. In 2000,<br />
Emmert-Buck and coworkers applied two-dimensional polyacrylamide gel<br />
electrophoresis (2D PAGE) to 50,000 microdissected epithelial cells (14). They<br />
compared tumor cells and normal controls from two patients with oesophageal<br />
cancer (14). Staining the gels with silver yielded the visualization of 675 distinct<br />
proteins and isoforms. Seventeen differentially expressed spots were further<br />
analyzed by MS. This resulted in the identification of two specific proteins,<br />
cytokeratin 1 and annexin I. It was assumed that these proteins were present<br />
in an abundance range of 50,000–1,000,000 copies per cell (14). Using colon<br />
cancer as a model, Lawrie and coworkers also showed the feasibility of investigating<br />
protein expression by combining LCM with proteome<br />
analysis techniques such as 2D PAGE and MS (15).<br />
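To put the quoted abundance range into perspective, the number of copies per cell can be converted into the total amount of a protein present in a microdissected sample. The sketch below is an illustration only: the function name is hypothetical, and it assumes complete recovery of the protein from every cell, which is not claimed by the studies cited.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def total_femtomoles(cells, copies_per_cell):
    """Amount of one protein (fmol) in a sample, assuming complete recovery."""
    molecules = cells * copies_per_cell
    return molecules / AVOGADRO * 1e15


if __name__ == "__main__":
    # 50,000 microdissected cells at the two ends of the quoted abundance range.
    low = total_femtomoles(50_000, 50_000)
    high = total_femtomoles(50_000, 1_000_000)
    print(f"{low:.2f} fmol to {high:.1f} fmol")
```

Under these assumptions, 50,000 cells carry roughly 4 to 83 fmol of such a protein, which indicates why only relatively abundant proteins were identified from gels prepared from microdissected material.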
To overcome the limitation of LCM in producing relatively low numbers of<br />
cells, an extra step has been added to the separation method. In addition to the<br />
2D PAGE from the microdissected cells, an extra 2D PAGE from the whole<br />
section of the same set of samples can be useful. The comparison of silver<br />
stained 2D gels created from microdissected epithelial cells of ovarian cancer<br />
and the 2D gels created from the whole section of the same ovarian samples,<br />
facilitated the discovery of 23 differentially expressed proteins between low<br />
malignant potential and invasive ovarian cancers (16). In-gel digestion of the<br />
specific gel spots followed by MS/MS analysis resulted in the identification of<br />
glyoxalase I, RhoGDI, and a 52 kDa FK506 binding protein (16). In another<br />
study based on 2D PAGE, 315 protein spots were identified by collecting<br />
100,000 cells by LCM of normal and cancer ductal units from breast tissue
sections (17). Subsequent measurement of the spots by MS resulted in the<br />
identification of 57 differentially expressed proteins between the two groups of<br />
samples (17).<br />
The relatively low number of microdissected cells emphasizes the importance<br />
of loading equivalent amounts of protein on the gels. Thus, Shekouh and<br />
coworkers (18) followed a strategy to increase the accuracy of 2D PAGE from<br />
LCM samples. The samples were first separated by one-dimensional sodium<br />
dodecyl sulphate (SDS)-PAGE, stained with silver and subsequently subjected<br />
to densitometry. Evaluation of the staining intensity was used to normalize<br />
the samples. The 2D PAGE silver stained images from 50,000 microdissected<br />
adenocarcinoma cells were compared with the images from whole sections of<br />
pancreatic samples. Spots of their interest were subjected to MALDI-TOF/TOF<br />
MS, resulting in the identification of S100A6 as an over-expressed protein in<br />
pancreatic cancer cells (18). The same methodology has been used to understand<br />
the mechanism of a specific molecule, HER-2/neu, in breast<br />
cancer (19). Breast cancer tissue was used to microdissect about 50,000–70,000<br />
cells from three HER-2/neu-positive tumors and three HER-2/neu-negative<br />
tumors. This led to the detection of about 500–600 protein spots in each<br />
gel. The comparison of these two groups allowed the identification of cytokeratin<br />
19 (CK19) as an overexpressed protein in HER-2/neu-positive breast<br />
cancer patients (19). In another study, the 2D PAGE of 10,000 microdissected<br />
cells of hepatocellular carcinoma (HCC) samples was compared with normal<br />
surrounding tissue. The investigators visualized about 868 spots of which 20<br />
were considered as differentially expressed proteins. The digestion of these<br />
proteins into peptides was followed by the application of ESI-MS/MS, which<br />
allowed the identification of 11 proteins. Four out of these 11 proteins were<br />
considered as novel candidates of hepatitis B-related HCC markers (20). This<br />
approach of separating the microdissected cells on 2D PAGE followed by in-gel<br />
protein digestion and MS measurements for the identification of biomarkers has<br />
been applied to a wide range of cancers, using various numbers of microdissected<br />
cells. Between 10,000 and 100,000 cells harvested by LCM have been used for<br />
the successful application of 2D electrophoresis (Table 1).<br />
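The normalization strategy of Shekouh and coworkers described above (one-dimensional SDS-PAGE, silver staining, densitometry) can be sketched as a simple proportional adjustment of loading volumes. This is an illustration only, not the authors' published procedure: the function name and sample labels are hypothetical, and it assumes that equal volumes were loaded on the one-dimensional gel and that staining intensity is proportional to protein amount.

```python
def normalized_load_volumes(lane_intensity, max_volume_ul):
    """Scale loading volumes so that every sample delivers the same protein amount.

    lane_intensity: dict of sample name -> integrated lane intensity measured
                    after loading equal volumes on a 1D SDS-PAGE gel.
    max_volume_ul:  volume loaded for the weakest (least intense) sample.
    """
    if not lane_intensity:
        raise ValueError("at least one sample is required")
    if any(value <= 0 for value in lane_intensity.values()):
        raise ValueError("intensities must be positive")
    weakest = min(lane_intensity.values())
    # A sample that stains twice as strongly is loaded at half the volume.
    return {name: max_volume_ul * weakest / value
            for name, value in lane_intensity.items()}


if __name__ == "__main__":
    # Hypothetical densitometry readings in arbitrary units.
    print(normalized_load_volumes({"tumor": 200.0, "normal": 100.0}, 20.0))
```

Anchoring the calculation on the weakest sample means that no sample has to be loaded at a volume larger than the one already available.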
4. LCM and Differential In-Gel Electrophoresis<br />
In 2002, Zhou and coworkers described a new technique called differential<br />
in-gel electrophoresis (DIGE) (21). Two pools of proteins are labeled<br />
with 1-(5-carboxypentyl)-1-propylindocarbocyanine halide (Cy3) N-hydroxysuccinimidyl<br />
ester and 1-(5-carboxypentyl)-1-methylindodicarbocyanine<br />
halide (Cy5) N-hydroxysuccinimidyl ester fluorescent dyes (21). The labeled<br />
proteins are mixed and separated in the same 2D gel. This strategy improves
Table 1<br />
Overview of Different Methods to Combine Laser Microdissection and Different Proteomics Techniques<br />
Separation technique | Number of microdissected cells/sample | Number of visualized proteins | Identification technique | Number of significant differentially identified proteins | Number of samples/study | Tissue used | Reference<br />
2D PAGE, silver staining | 50,000 | Approximately 675 distinct proteins including isoforms | Mass spectrometry and immunoblot analysis | n = 2; cytokeratin 1 and annexin I | 2 cancer samples and 2 normal samples | Esophageal cancer | (14)<br />
2D PAGE, silver staining | 1–5 μg of total cellular protein | Not determined | Mass spectrometry data from all the protein spots cut from the gels | n = 3; cytokeratin 8, cytokeratin 18, and -actin | 2 cancer samples and 2 normal samples | Colon cancer | (15)<br />
2D PAGE, silver staining | 50,000 | 23 differentially expressed proteins were discussed | ESI-MS identification from gels made of whole sections | n = 3; FK506 binding protein, glyoxalase I, and RhoGDI | 3 invasive OV and 2 noninvasive (LMP) OV | Ovarian cancer | (16)<br />
2D PAGE, silver staining | 100,000 | 315 protein spots | MS identification from gels made of whole sections | n = 57 observed proteins; n = 2 after confirmation | 6 samples of DCIS and 6 samples of normal ductal/lobular units | Breast cancer | (17)<br />
2D PAGE, silver staining | 50,000 | 800 protein spots | MALDI-TOF/TOF | n = 1; calcium-binding protein, S100A6 | 4 cancer samples and 4 normal samples | Pancreas cancer | (18)<br />
2D PAGE, silver staining | 50,000–70,000 | 500–600 protein spots | MALDI-TOF mass spectrometer | n = 7; cytokeratin 19, tropomyosin 3, aldolase A, glyoxalase I, cathepsin D chain 3, albumin, and MnSOD | 3 HER-2/neu-positive samples and 3 HER-2/neu-negative samples | HER-2/neu-positive breast cancer cells | (19)<br />
2D PAGE, silver staining | 10,000 | 868 protein spots | Nano-flow ESI-MS/MS | n = 11 proteins, four of them were novel markers | 10 hepatic cancer cells samples | Hepatic cancer cells, hepatitis B positive cells | (20)<br />
2D DIGE, lysine specific dyes | 250,000 | 1038–1088 protein spots | Capillary LC tandem mass analysis | n = 1; tumor rejection antigen (gp96) | One sample contained normal and one sample contained cancer cells | Esophageal carcinoma | (21)<br />
2D DIGE, lysine specific dyes | 30,000 | 1200 protein spots | MALDI-TOF measurements | No further identifications | One sample contained gastric mucosa and one SPEM | Gastric metaplasia samples | (22)<br />
2D DIGE, lysine specific dyes | 50,000 | Not applicable | MALDI-TOF and/or immunoblotting for protein identification | n = 32 | Five samples contained malignant and normal breast tissue | Breast epithelium cell | (23)<br />
2D DIGE, cysteine specific dyes | 5000 | ∼1000 protein spots | MALDI-MS and MS/MS measurements | n = 40 | Cultured oncogene-transduced epithelial cells and precancerous versus cancerous tissue | | <br />
2D DIGE, cysteine specific dyes | Between 100 and 10 glomeruli, which equals 0.5–3 μg protein | Between 1400 and 900 protein spots | Nano LC-ESI-MS/MS | n = 23 between mice glomeruli and mice cortex | 3 different protein extracts from human glomeruli and 3 independent isolated glomeruli and cortex from 3 mice | | <br />
(IPG-IEF) 2D-PAGE gel | Proteins, 3.8 μg | Not applicable | Mass spectrometry | n = 29 | 2 samples contained renal cell carcinoma and normal kidney tissues | | <br />
(IPG-IEF) 2D-PAGE gel | Approximately | | | | | | <br />
HPLC system | 10,000 | Not applicable | ESI mass spectrometry followed by MS/MS | n = 9 | 3 slides from the same cell culture | Breast cancer cell line (SKBR-3) | (34)<br />
16O/18O isotopic labeling peptides | 10,000 | Not applicable | The reverse phase of LC-ESI-MS/MS on the ion trap mass spectrum | n = 76 | 2 samples with invasive ductal carcinoma of the breast | Ductal carcinoma of the breast | (29)<br />
Gel-free method | 30,000–50,000 | Not applicable | SELDI-TOF/MS | n = 1; prostate carcinoma-associated protein (PCa-24) | 17 prostate carcinoma that contained normal tissue and BPH tissue and 7 BPH samples | Prostate cancer | (41)<br />
Gel-free method | ∼2000 | Not applicable | MALDI-TOF/MS | n = 2; calgranulin A and chaperonin 10 | 8 endometrioid adenocarcinomas, 4 proliferative endometria, and 4 secretory endometria | Endometrial cancer | (36)<br />
Gel-free method | 150 | Not applicable | MALDI-TOF/MS | No protein identifications. Unique peptide pattern of ∼35 peptides for trophoblast and stroma cells | 1 placenta sample contained trophoblasts and surrounding stroma cells | Placenta samples | (37)<br />
Gel-free method | 2000–2400 | Not applicable | MALDI-TOF/TOF mass spectrometry | No protein identifications. 9 differentially expressed peptides | 6 invasive ductal breast carcinoma contained cancer and normal cells | Breast cancer | (38)<br />
Gel-free method | 3000 | Not applicable | Nano LC-FTICR mass spectrometry | n = 1003 proteins identified | 2 replicate samples of breast cancer epithelial cells | Breast cancer | Umar et al., 2006<br />
ProteinChip technology | 3000–5000 | Not applicable | Isolation by two-dimensional gel electrophoresis and tandem mass spectrometry analysis | n = 1; annexin V | 57 head and neck tumor samples and 44 mucosa samples | Head and neck cancer | (40)<br />
ProteinChip technology | 3000–5000 | Not applicable | Isolation by reverse-phase chromatography and SDS-PAGE then identified by MS/MS analysis | n = 1; heat shock protein 10 | 39 colorectal tumor samples, 40 normal mucosa samples, and 29 adenoma samples | Colorectal cancer | (39)<br />
Abbreviations: 2DE: 2 dimensional gel electrophoresis, OV: ovarian cancer, LMP: low malignant potential, DCIS: ductal carcinoma in situ, HCC: hepatocellular carcinoma, BPH: benign prostatic hyperplasia, SPEM: spasmolytic polypeptide expressing metaplasia, PR: progesterone receptor, ER: estrogen receptor<br />
Combining LCM and Proteomics Techniques 169<br />
the sensitivity of detection and enlarges the range of candidate proteins<br />
for detection. Molecular weight- and charge-matched cyanine dyes enable<br />
multiplex labeling, with different samples run on the same gel. The same investigators<br />
described a powerful tool for the molecular characterization of cancer<br />
progression and the identification of cancer-specific protein markers by combining<br />
2D DIGE with MS. They compared 2D DIGE patterns of about 250,000 microdissected<br />
cells from oesophageal carcinoma with those of normal epithelial cells from<br />
the oesophagus. The cancer cell lysate yielded 1038 protein spots while the<br />
normal epithelial lysate yielded 1088 protein spots. In-gel digestion of the<br />
differentially expressed protein spots was followed by capillary high performance<br />
liquid chromatography (HPLC) tandem mass analysis to achieve further<br />
identification. In this way, tumor rejection antigen (gp96) was found to be<br />
upregulated in oesophageal squamous cell cancer (21). Applying the same<br />
procedure to smaller numbers of microdissected cells from biopsy samples<br />
with gastric metaplasia was successful as well (22). Approximately<br />
1200 spots were detected from 30,000 microdissected cells. Twenty-eight of<br />
these spots were overexpressed in the metaplasia samples as compared to<br />
the normal surface cells (22). However, subsequent MALDI-TOF measurements<br />
of the spots did not result in the identification of proteins. The same<br />
procedure was applied to 50,000 microdissected cells, resulting in the identification<br />
of 32 proteins in breast epithelial cancer cells, of which thirteen<br />
had not previously been associated with these tumors (23). One technical aspect<br />
of the 2D DIGE method needs special attention: the nature of the fluorescent<br />
dyes and their ability to bind to lysine residues only (21). Proteins with high<br />
percentages of lysine residues are labeled more efficiently than<br />
proteins containing little or no lysine. The development of a new generation of dyes<br />
that react with cysteine residues has improved the sensitivity of DIGE (24).<br />
Although cysteine is less abundant than lysine in proteins in general, cysteine<br />
labeling can be carried to saturation. Lysine labeling, in contrast, must be limited to 1–3%<br />
of all the residues to prevent loss of solubility when bulky hydrophobic dyes<br />
are coupled to the polar lysine residues (24). Greengauz-Roberts and coworkers<br />
applied saturation labeling of cysteine residues to study about 5000 cells<br />
obtained by LCM of metaplasia and cancer cells. A total of 1471 distinct protein<br />
features were observed from this relatively small number of cells. Ninety-six of<br />
these spots were subjected to identification. MALDI-MS and MS/MS measurements,<br />
combined with the position of each protein in the gel, resulted in the<br />
identification of 42 proteins in cancer samples (25). Sitek and coworkers<br />
described a novel approach to analyze glomerular proteins from mouse and<br />
human samples using DIGE saturation labeling (26). Only 10 glomeruli (0.5 μg),<br />
picked by LCM from a slide of a human kidney biopsy, were sufficient<br />
to visualize 900 spots using the DIGE technique (26). 2D DIGE holds several<br />
170 Mustafa et al.<br />
advantages over conventional 2D gels. One of the most important<br />
is improved reproducibility. Gel-to-gel<br />
differences are minimized because the pooled samples are separated<br />
in the same gel. Therefore, protein expression in<br />
two cell populations or samples can be compared more accurately, and differences<br />
are easier to identify. Quantitative differences in protein content are also<br />
measured more accurately with fluorescent dyes. In addition, 2D DIGE enables<br />
higher-throughput analysis of 2D gels because it is amenable to automated gel<br />
imaging. Importantly, labeling of proteins with fluorescent dyes does not affect<br />
protein identification by MS, because only a small percentage of the molecules<br />
of each protein is labeled. Finally, 2D DIGE requires fewer microdissected<br />
cells for protein identification than<br />
the other 2D electrophoresis techniques (Table 1).<br />
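As a rough illustration of why running two labeled samples on one gel simplifies comparison, the sketch below computes a per-spot log2 ratio from Cy5 and Cy3 spot volumes after scaling each channel to its total signal. The spot names, the volume values, the total-signal normalization, and the twofold threshold are all hypothetical choices made for this example; they are not taken from the studies cited above.

```python
import math

def log2_ratios(cy3, cy5):
    """Return {spot: log2(Cy5/Cy3)} after scaling each channel to its total signal."""
    total3 = sum(cy3.values())
    total5 = sum(cy5.values())
    ratios = {}
    for spot, volume3 in cy3.items():
        volume5 = cy5.get(spot, 0.0)
        # Spots missing or empty in either channel cannot give a ratio
        if volume3 > 0 and volume5 > 0:
            ratios[spot] = math.log2((volume5 / total5) / (volume3 / total3))
    return ratios

# Hypothetical spot volumes from one gel scanned in two channels
normal_cy3 = {"spot_A": 100.0, "spot_B": 200.0, "spot_C": 100.0}
tumor_cy5 = {"spot_A": 100.0, "spot_B": 50.0, "spot_C": 250.0}

ratios = log2_ratios(normal_cy3, tumor_cy5)
# Illustrative cutoff: keep spots that change at least twofold
changed = {spot: r for spot, r in ratios.items() if abs(r) >= 1.0}
```

Because both channels come from the same gel, each spot is compared with itself at the same position, so no gel-to-gel spot matching is needed in this sketch.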
5. LCM and Different Labeling Techniques<br />
The comparison of the proteome of two different samples (for instance,<br />
normal and tumor cells) is facilitated by labeling. In 2004, Li and coworkers<br />
described a method for qualitative and quantitative protein analysis by<br />
combining LCM with isotope-coded affinity tag labeling technology and two-dimensional<br />
liquid chromatography coupled with tandem mass spectrometry<br />
(2D-LC-MS/MS) (27). Approximately 50,000–100,000 cells of HCC and<br />
non-HCC hepatocytes were microdissected; a total of 644 proteins in<br />
HCC hepatocytes were identified qualitatively, and 261 proteins that differed<br />
between the two groups were quantified (28). In 2004, 16O/18O isotopically labeled<br />
peptides were generated from 10,000 microdissected cells of ductal carcinoma<br />
of the breast. The approach allowed the identification of 76 proteins (29).<br />
Using reverse-phase liquid chromatography-electrospray ionization tandem<br />
mass spectrometry (LC-ESI-MS/MS), Zang and coworkers were able to identify<br />
proteins that were significantly upregulated in the breast tumor cells (29).<br />
Separating radioactively labeled peptides on a high-resolution 54 cm serial<br />
immobilized pH gradient isoelectric focusing 2D-PAGE gel provided a precise<br />
estimate of the abundance ratio for proteins from two samples (30). Radioiodination<br />
of 3.8 μg renal carcinoma proteins and 3.8 μg normal kidney proteins<br />
with both 125I and 131I, followed by mass spectrometric identification, revealed<br />
29 differentially expressed proteins (30). Applying the same methodology of<br />
radioactive labeling to a pool of microdissected breast cancer cells provided<br />
a sensitive method to identify differentially expressed proteins that correlate<br />
with the presence of progesterone receptor in estrogen receptor-positive<br />
breast cancer (31).
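The 16O/18O labeling approach mentioned above relies on pairing each unlabeled peptide peak with its labeled partner. A minimal sketch of that pairing follows, assuming complete labeling with two 18O atoms per peptide (a shift of about 4.0085 Da), singly charged ions, and a fixed m/z tolerance. The peak list is hypothetical, and the function is an illustration rather than the procedure used in the cited study.

```python
# Monoisotopic masses (Da) of the two oxygen isotopes
MASS_16O = 15.9949146
MASS_18O = 17.9991610
# Assumption: two 18O atoms are incorporated per labeled peptide
PAIR_SHIFT = 2 * (MASS_18O - MASS_16O)  # about 4.0085 Da

def pair_ratios(peaks, charge=1, tol=0.01):
    """Pair light peaks with heavy partners and return (light_mz, heavy/light).

    peaks: list of (mz, intensity) tuples.
    """
    peaks = sorted(peaks)
    pairs = []
    for light_mz, light_int in peaks:
        target = light_mz + PAIR_SHIFT / charge
        for heavy_mz, heavy_int in peaks:
            if abs(heavy_mz - target) <= tol and light_int > 0:
                pairs.append((light_mz, heavy_int / light_int))
                break
    return pairs

# Hypothetical singly charged peak list (m/z, intensity)
peaks = [
    (1000.5000, 2.0e5), (1004.5085, 6.0e5),
    (1500.7500, 4.0e5), (1504.7585, 1.0e5),
    (1800.9000, 3.0e5),
]
ratios = pair_ratios(peaks)
```

Peaks without a partner at the expected spacing, such as the last one in the list, are simply left unpaired.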
6. Combining LCM and Different Separation Methods<br />
It has been shown previously that the number of detected and identified<br />
peptides and proteins increases significantly by coupling MALDI-MS (32)<br />
and ESI-MS (33) to a peptide or protein separation system. In 2003, Wu and<br />
coworkers described a method for discovering biomarkers from microdissected<br />
homogeneous cells from breast cancer cell lines (34). After the cells were<br />
captured, the peptide digest was fractionated by reversed-phase HPLC and<br />
analyzed by ion trap MS (34). HPLC fractionation of about 10,000 endothelial<br />
cells from a breast cancer cell line (SKBR-3), followed by ESI-MS, resulted<br />
in the identification of low-abundance proteins in the cell line. Capillary<br />
isoelectric focusing combined with reverse-phase nano-LC in an automated<br />
and integrated platform provides systematic resolution of complex peptide<br />
mixtures generated from limited protein quantities (7). This method separates<br />
the peptide mixture based on differences in isoelectric point and hydrophobicity,<br />
and it eliminates peptide loss and analyte dilution (7). Coupled<br />
to ESI-tandem MS, this separation enabled the detection of 6866<br />
peptides, leading to the identification of 1820 proteins from 20,000 microdissected<br />
cells of glioblastoma (7). In order to increase the number of identified<br />
proteins from LCM of brain samples, Gozal and coworkers added an extra<br />
separation step (35). After the cells were collected by LCM, the total protein was<br />
extracted and resolved on an SDS gel. The gel was cut into multiple pieces,<br />
followed by trypsin digestion. Peptides were subjected to highly sensitive liquid<br />
chromatography-tandem mass spectrometry (LC-MS/MS). This approach resulted<br />
in the identification of hundreds to thousands of proteins (35).<br />
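The trypsin digestion step that precedes LC-MS/MS in these workflows can be mimicked in silico. The sketch below applies the commonly used simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The example sequence is hypothetical, and missed cleavages are ignored.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cleave after K or R, but not when the following residue is P
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the sequence does not end in K or R
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

# Hypothetical sequence containing an R-P bond that is left uncleaved
peptides = tryptic_digest("MKWVTFRPLLKAAR")
```

The list of peptides produced this way is what a database search compares against the measured peptide masses and fragment spectra.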
7. LCM and Gel-Free Mass Spectrometry<br />
The peptide digest of cells harvested by LCM can also be measured<br />
directly by MS, without an initial separation step on 2D PAGE (known as<br />
“gel-free MS”). Guo and coworkers directly analyzed endometrial epithelial<br />
cells obtained by LCM using matrix-assisted laser desorption/ionization time-of-flight<br />
mass spectrometry (MALDI-TOF/MS) (36). A total of 16 physiologic<br />
and malignant endometrial samples including four proliferative and four<br />
secretory endometria, and eight endometrioid adenocarcinomas were used for<br />
this study. Approximately 2000 cells were sufficient to confirm<br />
overexpression of two proteins, calgranulin A and chaperonin 10, in the<br />
epithelial cells of endometrial adenocarcinoma samples (36). In another study,<br />
the direct analysis of 125 trophoblast and stroma cells of placental tissue resulted<br />
in the detection of significant differences in protein expression between these two<br />
cell types (37). Differentially expressed proteins between breast cancer<br />
and normal samples can likewise be detected by direct MALDI-TOF/MS measurements<br />
of 2000–2400 LCM cells (38). In a recent study, it was possible to identify<br />
over 1000 proteins from 3000 microdissected cells by combining<br />
advanced nanoLC with high-resolution Fourier transform mass spectrometry<br />
(FTMS) (39).<br />
8. LCM and Protein Chip Technology<br />
There are currently two approaches to produce arrays capable of generating<br />
protein network information. The first method is the forward phase array in<br />
which each spot on the slide represents a specific antibody. Therefore, the array<br />
is incubated with only one test sample (9). The second method is the reverse<br />
phase array in which each spot represents an individual test sample, and the<br />
array is composed of multiple, different samples, which then can be tested<br />
under the same experimental conditions. In addition, when the arrays are probed<br />
separately with two different classes of antibodies, it is possible to specifically<br />
detect the total and phosphorylated forms of the protein of interest (9). By<br />
combining LCM with protein chip technology, Melle and coworkers<br />
identified annexin V as a specific protein in head and neck cancer patients,<br />
and heat shock protein 10 as a biomarker in colorectal cancer patients (40,41).<br />
The protein lysates from 3000 to 5000 microdissected cells were analyzed on<br />
both strong anion exchange arrays and weak cation exchange arrays, followed<br />
by separation steps (e.g., 2D gel or reverse phase chromatography and SDS-<br />
PAGE), MS measurements, and MS/MS analysis (40,41). In both cases, a<br />
validation step by immunohistochemistry confirmed their findings.<br />
In other studies, surface-enhanced laser desorption/ionization time-of-flight<br />
analysis was applied to microdissected cells because it is sensitive to<br />
smaller amounts of material than other techniques such as 2D gels (42). Using<br />
30,000–50,000 cells of prostate carcinoma specimens, the unique expression<br />
of a prostate carcinoma-associated protein, called PCa-24, in the epithelial cells<br />
was demonstrated (42). Protein microarrays pose several technical challenges (43).<br />
Nevertheless, their application offers the advantages of scalability, flexibility, and automated<br />
processing (43). Arrays may also enable the control of key parameters such as<br />
temperature, pH, and cofactor concentration, which are not easily afforded by<br />
cell-based systems.<br />
9. Perspectives of LCM and Mass Spectrometry Analysis<br />
The use of LCM to obtain (relatively) pure populations of cells for<br />
further analysis of their proteome is an important addition to the arsenal of<br />
techniques in bioscience. However, this technique is still time consuming and<br />
yields relatively small numbers of cells. To overcome this problem, alternative<br />
[Figure 2: MALDI FTMS spectrum, intensity versus m/z (1000–3500), with an inset covering 1700–2000 m/z. Labeled peaks include fibrinogen, GAPDH, CD34 antigen, GFAP, tubulin, and Hb alpha 2.]<br />
Fig. 2. MALDI FTMS spectrum obtained from 150 microdissected cells from a<br />
frozen glioma tissue sample. The spectrum contains approximately one thousand monoisotopic<br />
peaks between 700 and 3000 m/z at relatively high peak intensities. The inset<br />
magnifies a small part of the spectrum, between 1700 and 2000 m/z. It shows the<br />
very large number of peaks obtained from measuring a very small number of cells.<br />
The peaks can be identified by different MS sequencing techniques; some examples of<br />
identified peptides are indicated in the spectrum.<br />
steps of processing tissues are needed. Sample collection and preparation are<br />
crucial. During the microdissection procedure, special care should be taken<br />
to prevent waste and contamination of target material. For instance, material<br />
should not drop from, or stick to, the cap of the tubes used. Another consideration<br />
is to minimize the number of times the collected material is transferred from one<br />
tube into another. To limit such losses, the use of low protein binding tubes is recommended.<br />
A protocol for sample preparation is included in this chapter (Box 1).<br />
The 2D PAGE is a well-established technique that has been used in combination<br />
with LCM in many studies so far. The need for relatively large numbers of<br />
cells makes it difficult to measure large numbers of samples, as indicated<br />
in Table 1. In addition, the relatively low reproducibility hampers sound statistical<br />
analysis. 2D DIGE improves reproducibility and also lowers the required<br />
amount of microdissected tissue. However, this technique is suitable for experimental<br />
research only.<br />
Box 1. LCM sample preparation protocol:<br />
Cryosections of 8 μm were made from glioma brain tumor tissue and<br />
mounted on polyethylene naphthalate covered glass slides (PALM Microlaser<br />
Technologies AG, Bernried, Germany) as described previously (38). The<br />
slides were fixed in 70% ethanol and stored at –20°C for not more than 2<br />
days. After fixation and immediately before microdissection, the slides were<br />
washed twice with Milli-Q water, stained for 10 s in haematoxylin, washed<br />
again twice with Milli-Q water and subsequently dehydrated in a series of 50,<br />
70, 95, and 100% ethanol solution and air dried. The PALM laser microdissection<br />
and pressure catapulting device, type P-MB was used with PalmRobo<br />
v2.2 software at 40× magnification. Estimating that a cell has a volume of<br />
10 × 10 × 10 μm, we microdissected an area of about 190,000 μm² of blood<br />
vessels and another area of the same size of the surrounding tumor tissue from<br />
each sample, resulting in approximately 1500 cells per sample. The microdissected<br />
cells were collected in caps of PALM tubes in 5 μl of 0.1% RapiGest<br />
buffer (Waters, Milford, MA, USA). The caps were cut and placed onto<br />
0.5 ml Eppendorf protein LoBind tubes (Eppendorf, Hamburg, Germany).<br />
Subsequently, these tubes were centrifuged at 12,000 g for 5 min. To make<br />
sure that all the cells were covered with buffer, another 5 μl of RapiGest<br />
was added to the cells. All samples were stored at –80°C. After thawing<br />
the microdissected tissue, the tissue was disrupted by external sonication<br />
for 1 min at 70% amplitude at a maximum temperature of 25°C (Branson<br />
Ultrasonics, Danbury, USA). The samples were incubated at 37 and 100°C<br />
for 5 and 15 min, respectively, for protein solubilization and denaturation.<br />
To each sample, 1.5 μl of 100 ng/μl gold grade trypsin (Promega, Madison,<br />
WI, USA) in 3 mM Tris–HCl diluted 1:10 in 50 mM NH4HCO3 was added,<br />
and the samples were incubated overnight at 37°C for protein digestion. To inactivate trypsin<br />
and to degrade the RapiGest, 2 μl of 500 mM HCl was added, followed by incubation<br />
for 30 min at 37°C. Samples were dried in a Speedvac (Thermo Savant,<br />
Holbrook, NY, USA) and reconstituted in 5 μl of 50% acetonitrile/0.5% trifluoroacetic<br />
acid/water prior to measurement. Samples were used for immediate<br />
measurements, or stored for a maximum of 10 days at 4°C.<br />
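The cell number quoted in the protocol can be checked with a short calculation. The sketch below assumes that the dissected volume is the dissected area multiplied by the 8 μm section thickness and that each cell occupies a 10 × 10 × 10 μm cube, as stated in the protocol. The function name and its parameters are introduced here for illustration only.

```python
def estimate_cell_count(area_um2, section_thickness_um, cell_edge_um=10.0):
    """Estimate the number of cells in a dissected area of a tissue section."""
    tissue_volume_um3 = area_um2 * section_thickness_um
    cell_volume_um3 = cell_edge_um ** 3  # cube-shaped cell approximation
    return tissue_volume_um3 / cell_volume_um3

# 190,000 square micrometres dissected from an 8 micrometre cryosection
cells = estimate_cell_count(190_000, 8)
```

Under these assumptions the estimate is 1520 cells, in line with the approximately 1500 cells per sample reported in the protocol.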
Recently, the improvement of resolution and detection limits in modern mass<br />
spectrometers, particularly in FTMS, has opened a new research field: the analysis of<br />
small numbers of microdissected cells (in the range of 200–5000). FTMS<br />
offers unrivalled mass resolution (in the order of<br />
100,000–1,000,000), high mass accuracy (below 1 ppm), a wide dynamic range (three to<br />
four orders of magnitude), and a good signal-to-noise ratio (44). These features<br />
facilitate combining this technique with LCM. For instance, by MALDI-FTMS,<br />
peptide digests of no more than 150 cells taken from biological samples (e.g.,<br />
glioma vessel tissue) resulted in informative mass spectra (Fig. 2). It is expected<br />
that techniques like FTMS will soon be implemented in the practice of routine<br />
laboratories for the detection of disease-related proteins in clinical specimens.<br />
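To give a feel for the mass accuracy and resolution figures quoted above, the following sketch converts them into absolute m/z terms. It assumes the usual definitions: mass error in ppm is the relative deviation multiplied by 10^6, and resolving power is m/z divided by the peak width at half maximum. The m/z value is an arbitrary example, and the function names are illustrative.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def mass_window(mz, ppm):
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

def peak_width_fwhm(mz, resolving_power):
    """Peak width at half maximum implied by a resolving power m/dm."""
    return mz / resolving_power

mz = 1994.98513                      # arbitrary example peptide m/z
low, high = mass_window(mz, 1.0)     # 1 ppm search window
width = peak_width_fwhm(mz, 100_000)
error = ppm_error(mz + 0.002, mz)    # a 0.002 deviation expressed in ppm
```

At this m/z, a 1 ppm tolerance corresponds to roughly ±0.002 m/z units, and a resolving power of 100,000 corresponds to a peak width of about 0.02 m/z units.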
References<br />
1. Zhang, L., Zhou, W., Velculescu, V. E., Kern, S. E., Hruban, R. H., Hamilton, S. R.,<br />
Vogelstein, B. and Kinzler, K. W. (1997) Gene expression profiles in normal and<br />
cancer cells. Science 276, 1268–1272.<br />
2. Curran, S., McKay, J. A., McLeod, H. L. and Murray, G. I. (2000) Laser capture<br />
microscopy. Mol Pathol 53, 64–68.<br />
3. Going, J. J. and Lamb, R. F. (1996) Practical histological microdissection for PCR<br />
analysis. J Pathol 179, 121–124.<br />
4. Zhuang, Z., Bertheau, P., Emmert-Buck, M. R., Liotta, L. A., Gnarra, J., Linehan,<br />
W. M. and Lubensky, I. A. (1995) A microdissection technique for archival DNA<br />
analysis of specific cell populations in lesions
13. Ball, H. J. and Hunt, N. H. (2004) Needle in a haystack: microdissecting the<br />
proteome of a tissue. Amino Acids 27, 1–7.<br />
14. Emmert-Buck, M. R., Gillespie, J. W., Paweletz, C. P., Ornstein, D. K., Basrur, V.,<br />
Appella, E., Wang, Q. H., Huang, J., Hu, N., Taylor, P. and Petricoin, E. F. 3rd (2000)<br />
An approach to proteomic analysis of human tumors. Mol Carcinog 27, 158–165.<br />
15. Lawrie, L. C., Curran, S., McLeod, H. L., Fothergill, J. E. and Murray, G. I. (2001)<br />
Application of laser capture microdissection and proteomics in colon cancer. Mol<br />
Pathol 54, 253–258.<br />
16. Jones, M. B., Krutzsch, H., Shu, H., Zhao, Y., Liotta, L. A., Kohn, E. C. and<br />
Petricoin, E. F. 3rd (2002) Proteomic analysis and identification of new biomarkers<br />
and therapeutic targets for invasive ovarian cancer. Proteomics 2, 76–84.<br />
17. Wulfkuhle, J. D., Sgroi, D. C., Krutzsch, H., McLean, K., McGarvey, K.,<br />
Knowlton, M., Chen, S., Shu, H., Sahin, A., Kurek, R., Wallwiener, D.,<br />
Merino, M. J., Petricoin, E. F. 3rd, Zhao, Y. and Steeg, P. S. (2002) Proteomics<br />
of human breast ductal carcinoma in situ. Cancer Res 62, 6740–6749.<br />
18. Shekouh, A. R., Thompson, C. C., Prime, W., Campbell, F., Hamlett, J., Herrington,<br />
C. S., Lemoine, N. R., Crnogorac-Jurcevic, T., Buechler, M. W., Friess, H.,<br />
Neoptolemos, J. P., Pennington, S. R. and Costello, E. (2003) Application of laser<br />
capture microdissection combined with two-dimensional electrophoresis for the<br />
discovery of differentially regulated proteins in pancreatic ductal adenocarcinoma.<br />
Proteomics 3, 1988–2001.<br />
19. Zhang, D. H., Tai, L. K., Wong, L. L., Sethi, S. K. and Koay, E. S. (2005)<br />
Proteomics of breast cancer: enhanced expression of cytokeratin19 in human<br />
epidermal growth factor receptor type 2 positive breast tumors. Proteomics 5,<br />
1797–1805.<br />
20. Ai, J., Tan, Y., Ying, W., Hong, Y., Liu, S., Wu, M., Qian, X. and Wang, H. (2006)<br />
Proteome analysis of hepatocellular carcinoma by laser capture microdissection.<br />
Proteomics 6, 538–546.<br />
21. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., Flaig, M.,<br />
Gillespie, J. W., Hu, N., Taylor, P. R., Emmert-Buck, M. R., Liotta, L. A.,<br />
Petricoin, E. F. 3rd and Zhao, Y. (2002) 2D differential in-gel electrophoresis for<br />
the identification of esophageal scans cell cancer-specific protein markers. Mol<br />
Cell Proteomics 1, 117–124.<br />
22. Lee, J. R., Baxter, T. M., Yamaguchi, H., Wang, T. C., Goldenring, J. R. and<br />
Anderson, M. G. (2003) Differential protein analysis of spasmolytic polypeptide<br />
expressing metaplasia using laser capture microdissection and two-dimensional<br />
difference gel electrophoresis. Appl Immunohistochem Mol Morphol 11, 188–193.<br />
23. Hudelist, G., Singer, C. F., Pischinger, K. I., Kaserer, K., Manavi, M., Kubista, E.<br />
and Czerwenka, K. F. (2006) Proteomic analysis in human breast cancer: identification<br />
of a characteristic protein expression profile of malignant breast epithelium.<br />
Proteomics 6, 1989–2002.<br />
24. Shaw, J., Rowlinson, R., Nickson, J., Stone, T., Sweet, A., Williams, K. and<br />
Tonge, R. (2003) Evaluation of saturation labelling two-dimensional difference gel<br />
electrophoresis fluorescent dyes. Proteomics 3, 1181–1195.
25. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H.,<br />
Goldenring, J. R., Podolsky, R. H., Lee, J. R. and Dynan, W. S. (2005) Saturation<br />
labeling with cysteine-reactive cyanine fluorescent dyes provides increased sensitivity<br />
for protein expression profiling of laser-microdissected clinical specimens.<br />
Proteomics 5, 1746–1757.<br />
26. Sitek, B., Potthoff, S., Schulenborg, T., Stegbauer, J., Vinke, T., Rump, L. C.,<br />
Meyer, H. E., Vonend, O. and Stuhler, K. (2006) Novel approaches to analyse<br />
glomerular proteins from smallest scale murine and human samples using DIGE<br />
saturation labelling. Proteomics 6, 4337–4345.<br />
27. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., Zhang, L., Xia, Q. C.,<br />
Wu, J. R., Wang, H. Y. and Zeng, R. (2004) Accurate qualitative and quantitative<br />
proteomic analysis of clinical hepatocellular carcinoma using laser capture<br />
microdissection coupled with isotope-coded affinity tag and two-dimensional liquid<br />
chromatography mass spectrometry. Mol Cell Proteomics 3, 399–409.<br />
28. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H. and Aebersold, R.<br />
(1999) Quantitative analysis of complex protein mixtures using isotope-coded<br />
affinity tags. Nat Biotechnol 17, 994–999.<br />
29. Zang, L., Palmer Toy, D., Hancock, W. S., Sgroi, D. C. and Karger, B. L. (2004)<br />
Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection,<br />
LC-MS, and 16O/18O isotopic labeling. J Proteome Res 3, 604–612.<br />
30. Poznanovic, S., Wozny, W., Schwall, G. P., Sastri, C., Hunzinger, C.,<br />
Stegmann, W., Schrattenholz, A., Buchner, A., Gangnus, R., Burgemeister, R. and<br />
Cahill, M. A. (2005) Differential radioactive proteomic analysis of microdissected<br />
renal cell carcinoma tissue by 54 cm isoelectric focusing in serial immobilized pH<br />
gradient gels. J Proteome Res 4, 2117–2125.<br />
31. Neubauer, H., Clare, S. E., Kurek, R., Fehm, T., Wallwiener, D., Sotlar, K.,<br />
Nordheim, A., Wozny, W., Schwall, G. P., Poznanovic, S., Sastri, C.,<br />
Hunzinger, C., Stegmann, W., Schrattenholz, A. and Cahill, M. A. (2006)<br />
Breast cancer proteomics by laser capture microdissection, sample pooling,<br />
54-cm IPG IEF, and differential iodine radioisotope detection. Electrophoresis 27,<br />
1840–1852.<br />
32. Preisler, J., Hu, P., Rejtar, T., Moskovets, E. and Karger, B. L. (2002) Capillary<br />
array electrophoresis-MALDI mass spectrometry using a vacuum deposition<br />
interface. Anal Chem 74, 17–25.<br />
33. Bergstrom, S. K., Samskog, J. and Markides, K. E. (2003) Development<br />
of a poly(dimethylsiloxane) interface for on-line capillary column liquid<br />
chromatography-capillary electrophoresis coupled to sheathless electrospray<br />
ionization time-of-flight mass spectrometry. Anal Chem 75, 5461–5467.<br />
34. Wu, S. L., Hancock, W. S., Goodrich, G. G. and Kunitake, S. T. (2003) An approach<br />
to the proteomic analysis of a breast cancer cell line (SKBR-3). Proteomics 3,<br />
1037–1046.<br />
35. Gozal, Y. M., Cheng, D., Duong, D. M., Lah, J. J., Levey, A. I. and Peng, J. (2006)<br />
Merger of laser capture microdissection and mass spectrometry: a window into the<br />
amyloid plaque proteome. Methods Enzymol 412, 77–93.
36. Guo, J., Colgan, T. J., DeSouza, L. V., Rodrigues, M. J., Romaschin, A. D.<br />
and Siu, K. W. (2005) Direct analysis of laser capture microdissected endometrial<br />
carcinoma and epithelium by matrix-assisted laser desorption/ionization mass<br />
spectrometry. Rapid Commun Mass Spectrom 19, 2762–2766.<br />
37. de Groot, C. J., Steegers-Theunissen, R. P., Guzel, C., Steegers, E. A. and<br />
Luider, T. M. (2005) Peptide patterns of laser dissected human trophoblasts<br />
analyzed by matrix-assisted laser desorption/ionisation-time of flight mass<br />
spectrometry. Proteomics 5, 597–607.<br />
38. Umar, A., Dalebout, J. C., Timmermans, A. M., Foekens, J. A. and Luider, T.<br />
M. (2005) Method optimisation for peptide profiling of microdissected breast<br />
carcinoma tissue by matrix-assisted laser desorption/ionisation-time of flight<br />
and matrix-assisted laser desorption/ionisation-time of flight/time of flight-mass<br />
spectrometry. Proteomics 5, 2680–2688.<br />
39. Umar, A., Luider, T. M., Foekens, J. A. and Pasa-Tolic, L. (2007) NanoLC-FT-ICR MS<br />
improves proteome coverage attainable for approximately 3000 laser-microdissected<br />
breast carcinoma cells. Proteomics 7, 323–329.<br />
40. Melle, C., Bogumil, R., Ernst, G., Schimmel, B., Bleul, A. and von Eggeling, F.<br />
(2006) Detection and identification of heat shock protein 10 as a biomarker in<br />
colorectal cancer by protein profiling. Proteomics 6, 2600–2608.<br />
41. Melle, C., Ernst, G., Schimmel, B., Bleul, A., Koscielny, S., Wiesner, A.,<br />
Bogumil, R., Moller, U., Osterloh, D., Halbhuber, K. J. and von Eggeling, F.<br />
(2003) Biomarker discovery and identification in laser microdissected head and<br />
neck squamous cell carcinoma with ProteinChip technology, two-dimensional gel<br />
electrophoresis, tandem mass spectrometry, and immunohistochemistry. Mol Cell<br />
Proteomics 2, 443–452.<br />
42. Zheng, Y., Xu, Y., Ye, B., Lei, J., Weinstein, M. H., O’Leary, M. P., Richie, J. P.,<br />
Mok, S. C. and Liu, B. C. (2003) Prostate carcinoma tissue proteomics for<br />
biomarker discovery. Cancer 98, 2576–2582.<br />
43. Cutler, P. (2003) Protein arrays: the current state-of-the-art. Proteomics 3, 3–18.<br />
44. Dekker, L. J., Burgers, P. C., Guzel, C. and Luider, T. M. (2007) FTMS and<br />
TOF/TOF mass spectrometry in concert: identifying peptides with high reliability<br />
using matrix prespotted MALDI target plates. J Chromatogr B Analyt Technol<br />
Biomed Life Sci 847, 62–64.<br />
45. Mustafa, D. A., Burgers, P. C., Dekker, L. J., Charif, H., Titulaer, M. K.,<br />
Smitt, P. A., Luider, T. M. and Kros, J. M. (2007) Identification of glioma<br />
neovascularization-related proteins by using MALDI-FTMS and nano-LC fractionation<br />
to microdissected tumor vessels. Mol Cell Proteomics 6, 1147–1157.
III<br />
Clinical Proteomics by LC-MS Approaches
10<br />
Comparison of Protein Expression by Isotope-Coded<br />
Affinity Tag Labeling<br />
Zhen Xiao and Timothy D. Veenstra<br />
Summary<br />
Isotope-coded affinity tag (ICAT) labeling, in combination with mass spectrometry<br />
(MS), has been widely adopted as an effective method for comparing protein abundance<br />
levels. This chapter describes the ICAT labeling procedure used to search for celecoxib-regulated<br />
proteins in a colon cancer cell line. Celecoxib, a cyclooxygenase-2 (COX-2)<br />
specific inhibitor, is used as a colorectal cancer preventative drug in clinical trials. Here,<br />
celecoxib is used to inhibit the expression of COX-2 in a colon cancer cell line HT-29.<br />
To elucidate the proteomic changes induced by celecoxib, the protein lysates from the<br />
treated and control cells are prepared. The cysteine-containing proteins are labeled with the<br />
heavy and light ICAT reagents, respectively. The labeled proteins are then combined and<br />
digested with trypsin. The ICAT-labeled peptides are purified on<br />
an avidin column, and the biotin tags are then cleaved. This chapter focuses on<br />
the ICAT labeling procedure itself, because sample preparation is the most critical step of<br />
an ICAT-based protein expression comparison experiment. Other related procedures such<br />
as the cation exchange high performance liquid chromatography separation of peptides<br />
and MS analysis are detailed elsewhere in this book.<br />
Key Words: isotope-coded affinity tags; quantitative proteomics; mass spectrometry.<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
1. Introduction<br />
The application of mass spectrometry (MS) has rapidly expanded from<br />
simple identification of protein components to the quantitative comparison<br />
of proteomic changes under various biological and physiological conditions<br />
(1,2,3). In many studies, it is desirable to identify proteins and quantify their<br />
levels simultaneously using MS. While the ability to target specific molecules<br />
for quantitation is well established, there are experimental and technical issues<br />
that limit the accuracy of direct quantitation of hundreds (or thousands) of<br />
species in a single MS experiment and make it extremely challenging (4,5,6,7).<br />
To overcome this hurdle, a variety of chemical-based labeling and derivatization<br />
techniques have been developed (5,7,8,9). One of these techniques, isotope-coded<br />
affinity tags (ICATs), has been widely adopted and remains the model<br />
system on which most other differential labeling methods have been based<br />
(10). The reagent used in ICAT studies is composed of four<br />
parts: (1) an iodoacetamide group that covalently reacts with cysteine residues<br />
within proteins; (2) an isotope-coded linker region, which is prepared in two<br />
distinct versions containing either nine ¹³C (heavy version) or nine ¹²C (light<br />
version); (3) a biotin tag that facilitates the purification of labeled peptides via<br />
its specific binding to avidin; and (4) an acid-labile bond that is situated between<br />
the biotin and the isotopically differential domain of the reagent (Fig. 1). After<br />
labeling the cysteine residues, the protein mixture is enzymatically digested<br />
(usually with trypsin) and the labeled peptides purified via avidin chromatography.<br />
Following the enrichment of the ICAT-labeled peptides, the cleavable<br />
linker and the biotin tag are removed using trifluoroacetic acid (TFA). The<br />
removal of the biotin tag reduces the mass of the remaining tag attached to the<br />
peptide and increases the fragmentation efficiency and ultimately the success<br />
rate of peptide identification by tandem MS.<br />
The advantage of ICAT labeling is the identical chemistry, yet differential<br />
mass, of the heavy and light reagents, which enables the protein<br />
abundances within two complex proteome samples to be compared simultaneously.<br />
Following their coelution from a nanoflow reversed-phase liquid<br />
chromatography column, the light- and heavy-labeled peptides are easily recognized<br />
within the mass spectrum, being separated by ∼9 Da. The tandem MS<br />
spectrum enables the peptide to be identified, while the ratio of the areas of<br />
each peak is used as a measurement of the peptide’s relative abundance in<br />
the samples being compared. Since their inception, the ICAT reagents have been<br />
modified, improved, and made commercially available by Applied Biosystems<br />
as a kit (11). The combination of ICAT labeling, peptide fractionation, and<br />
liquid chromatography tandem mass spectrometry has enabled the rapid<br />
and simultaneous identification and quantitation of changes in complex protein<br />
mixtures (12,13,14,15,16).<br />
Fig. 1. The structure of cleavable isotope-coded affinity tag reagent.<br />
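The ∼9 Da spacing of a light/heavy peptide pair and the use of peak-area ratios for relative quantitation can be illustrated with a minimal Python sketch. The function names and the example peak areas are hypothetical, the isotope masses are standard values rather than numbers taken from this chapter, and each labeled cysteine is assumed to carry exactly one tag.

```python
# Illustrative sketch only: function names and example values are assumptions,
# not part of the ICAT kit or of any instrument software.

MASS_13C = 13.0033548  # Da, monoisotopic mass of carbon-13 (standard value)
MASS_12C = 12.0        # Da, carbon-12 by definition
LABELED_CARBONS = 9    # nine carbons differ between the heavy and light reagent


def icat_mass_shift(n_cysteines: int = 1) -> float:
    """Heavy-minus-light mass difference (Da) for n labeled cysteines."""
    if n_cysteines < 1:
        raise ValueError("a quantifiable peptide needs at least one labeled cysteine")
    return n_cysteines * LABELED_CARBONS * (MASS_13C - MASS_12C)


def mz_spacing(n_cysteines: int, charge: int) -> float:
    """Spacing of the light/heavy pair on the m/z axis for a given charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return icat_mass_shift(n_cysteines) / charge


def heavy_to_light_ratio(area_heavy: float, area_light: float) -> float:
    """Relative abundance estimated as the ratio of the two peak areas."""
    if area_light <= 0:
        raise ValueError("light peak area must be positive")
    return area_heavy / area_light


if __name__ == "__main__":
    # Hypothetical doubly charged peptide with one labeled cysteine.
    print(f"mass shift: {icat_mass_shift(1):.2f} Da")
    print(f"m/z spacing at 2+: {mz_spacing(1, 2):.2f}")
    print(f"heavy/light ratio: {heavy_to_light_ratio(3.0e6, 1.5e6):.1f}")
```

Because the spacing scales with the number of labeled cysteines and inversely with charge, a doubly charged peptide with one cysteine would appear as a pair separated by roughly 4.5 m/z units rather than 9.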
In this chapter, the ICAT labeling procedure is described as part of an experiment<br />
to identify celecoxib-induced proteomic changes in colon cancer cells.<br />
Celecoxib is a nonsteroidal anti-inflammatory drug that specifically inhibits<br />
cyclooxygenase-2 (COX-2) (17,18). In clinical trials, it has been shown to<br />
inhibit the development of precancerous polyposis in colon (19,20). In this<br />
study, a COX-2 expressing colon cancer cell line (HT-29) is used (21,22).<br />
After treating the cells with celecoxib, cell lysates are prepared and<br />
labeled with the ICAT reagents. A schematic diagram of the ICAT labeling<br />
and peptide analysis procedure is shown in Fig. 2. Since the core of ICAT-based<br />
quantitative proteomic analysis is sample preparation, this chapter is<br />
dedicated to the details of the ICAT labeling protocol itself. For information<br />
on strong cation exchange (SCX) high performance liquid chromatography<br />
(HPLC) separation of peptides, analysis by nanoflow reversed-phase liquid<br />
chromatography tandem mass spectrometry, and bioinformatics analysis, refer<br />
to the chapter on “Analysis of the Extracellular Matrix and Secreted Vesicle<br />
Proteomes by Mass Spectrometry” (Subheadings 3.6–3.8). The methods<br />
described in this chapter can be used to (1) understand the proteomic changes<br />
in response to a drug; (2) elucidate the molecular mechanisms underlying the<br />
drug's effects; and (3) search for biomarkers or endpoints that can be used to<br />
monitor and evaluate therapeutic and intervention approaches.<br />
2. Materials<br />
2.1. Cell Culture and Harvest<br />
1. T-75 cell culture flasks<br />
2. McCoy’s 5a medium supplemented with 10% (v/v) fetal bovine serum, 50 U/mL<br />
penicillin, 50 μg/mL streptomycin, and 1.5 mM L-glutamine (American Type<br />
Culture Collection (ATCC), Manassas, VA)<br />
3. Dimethylsulfoxide (DMSO, cell culture use)<br />
4. HT-29 cell line (ATCC, Manassas, VA)<br />
5. Celecoxib (Pfizer, New York, NY)<br />
6. 75 μM celecoxib: dissolve celecoxib in DMSO to make a 100 mM stock solution.<br />
Further dilute to 75 μM with McCoy’s 5a cell culture medium. Use the same<br />
concentration of DMSO in medium as negative control<br />
7. Sterile phosphate-buffered saline (PBS) solution<br />
8. 500 mM EDTA, pH 8<br />
9. 2 mM EDTA in sterile PBS: add 80 μL of 500 mM EDTA, pH 8, to 20 mL of PBS<br />
10. Centrifuge (maximum force: ∼17,000×g)
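The dilutions in this list follow the usual C1·V1 = C2·V2 relationship. The sketch below is a hedged illustration in Python: the helper names are hypothetical, and the 10 mL volume of medium per flask is an assumption made only for the example, since the chapter does not state it.

```python
# Illustrative dilution arithmetic; helper names and the 10 mL medium volume
# are assumptions for this example only.


def stock_volume_ul(stock_conc: float, final_conc: float, final_volume_ml: float) -> float:
    """Stock volume (μL) from C1*V1 = C2*V2; both concentrations in the same unit."""
    if stock_conc <= 0 or final_conc < 0 or final_conc > stock_conc:
        raise ValueError("need 0 <= final_conc <= stock_conc and stock_conc > 0")
    return final_conc / stock_conc * final_volume_ml * 1000.0


def conc_after_addition(stock_conc: float, stock_ul: float, diluent_ml: float) -> float:
    """Final concentration when stock_ul of stock is added to diluent_ml of diluent."""
    total_ul = diluent_ml * 1000.0 + stock_ul
    return stock_conc * stock_ul / total_ul


# Celecoxib: 100 mM stock (100,000 μM) diluted to 75 μM in an assumed 10 mL of medium.
celecoxib_ul = stock_volume_ul(100_000.0, 75.0, 10.0)

# DMSO carried over with the drug, as % (v/v); the vehicle control should match it.
dmso_percent = celecoxib_ul / (10.0 * 1000.0) * 100.0

# EDTA-PBS: 80 μL of 500 mM EDTA added to 20 mL of PBS.
edta_mm = conc_after_addition(500.0, 80.0, 20.0)

if __name__ == "__main__":
    print(f"celecoxib stock: {celecoxib_ul:.2f} μL per 10 mL")
    print(f"DMSO in medium: {dmso_percent:.3f}% (v/v)")
    print(f"EDTA: {edta_mm:.2f} mM")
```

Under these assumptions, the vehicle control would receive the same volume of DMSO per volume of medium as the treated culture.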
Fig. 2. Schematic diagram of the ICAT labeling procedure applied to the quantitative<br />
proteomic analysis.<br />
2.2. Cell Lysis, Desalting, and Protein Quantitation<br />
1. Lysis buffer: 50 mM Tris–HCl, pH 7.2, 1% Triton X-100, 10 mM sodium fluoride<br />
(NaF), 1 mM sodium orthovanadate (Na₃VO₄), and 1 mM EDTA<br />
2. Digital sonifier (Model 250, Branson Ultrasonics Corporation, Danbury, CT)<br />
3. Bicinchoninic acid (BCA) protein assay reagent kit (Pierce, Rockford, IL)<br />
4. D-Salt™ Excellulose plastic desalting column 5 mL (maximum binding capacity<br />
is 1.25 mg per column) (Pierce, Rockford, IL)<br />
5. 50 mM NH₄HCO₃, pH 8.3<br />
6. Coomassie blue reagent: Coomassie Plus – The Better Bradford™ assay reagent<br />
(Pierce, Rockford, IL)<br />
7. Centrifuge (maximum force: ∼17,000×g)<br />
8. Vacuum centrifuge<br />
2.3. Denaturing and Reducing the Proteins<br />
1. Denaturing buffer: 6 M guanidine in 50 mM NH₄HCO₃, pH 8.3<br />
2. 100 mM Tris(2-carboxyethyl)phosphine (TCEP) (Pierce, Rockford, IL)<br />
3. Boiling water bath<br />
2.4. Labeling with Cleavable ICAT Reagents, Desalting,<br />
and Tryptic Digestion<br />
1. Cleavable ICAT™ reagents (light and heavy sulfhydryl modifying biotinylating<br />
reagents). Store at –20 °C. One unit of either light or heavy reagent labels 100 μg<br />
of protein. The regular kit offers both reagents in 1 unit/tube. The bulk kit offers<br />
both reagents in 10 units/tube. The method described here is based on the use of<br />
a regular kit, i.e., 1 unit that labels 100 μg of protein/tube. (Applied Biosystems,<br />
Foster City, CA)<br />
2. Acetonitrile<br />
3. 37 °C water bath<br />
4. D-Salt™ Excellulose plastic desalting column 5 mL (Pierce, Rockford, IL)<br />
5. 50 mM NH₄HCO₃, pH 8.3<br />
6. Coomassie blue reagent: Coomassie Plus – The Better Bradford™ assay reagent<br />
(Pierce, Rockford, IL)<br />
7. Trypsin gold, MS grade (Promega, Madison, WI)<br />
2.5. Purifying the Labeled Peptides<br />
1. Phenylmethanesulfonyl fluoride (PMSF) (Sigma Chemical Co., St. Louis, MO)<br />
2. Glass wool<br />
3. 5-3/4″ disposable Pasteur glass pipettes<br />
4. UltraLink™ immobilized monomeric avidin slurry [50% (v/v)] (Pierce,<br />
Rockford, IL)<br />
5. Teflon tubing that fits the tip of the 5-3/4″ disposable Pasteur glass pipettes<br />
6. 2× PBS buffer, pH 7.2: dissolve 14.2 g of Na₂HPO₄ and 8.77 g of NaCl in<br />
450 mL of H₂O. Adjust pH to 7.2 by adding about 350 μL of 85% (v/v) H₃PO₄.<br />
Add H₂O to make a total volume of 500 mL. The final concentration is 200 mM<br />
Na₂HPO₄ and 300 mM NaCl<br />
7. 1× PBS, pH 7.2: dilute 2× PBS 1:1 in H₂O<br />
8. 2 mM biotin solution: dissolve 9.8 mg of D-biotin ImmunoPure (MW 244.31,<br />
Pierce, Rockford, IL) in 20 mL of 2× PBS, pH 7.2<br />
9. Acetonitrile [20% (v/v)] in 50 mM NH₄HCO₃, pH 8.3<br />
10. Acetonitrile [30% (v/v)] containing 0.4% (v/v) formic acid
11. pH paper (pH 2–9)<br />
12. Dry ice<br />
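The stated final concentrations of the 2× PBS and biotin solutions can be checked from the weighed masses. The following Python sketch is illustrative only: the helper name is hypothetical, and the molecular weights used for Na₂HPO₄ (anhydrous, 141.96 g/mol) and NaCl (58.44 g/mol) are standard values assumed here, whereas the biotin molecular weight is the one given in the list.

```python
# Illustrative recipe check; the helper name is hypothetical and the molecular
# weights of Na2HPO4 (anhydrous) and NaCl are assumed standard values.

MW_NA2HPO4 = 141.96  # g/mol, anhydrous form assumed
MW_NACL = 58.44      # g/mol
MW_BIOTIN = 244.31   # g/mol, as stated in the materials list


def molarity_mm(mass_g: float, mw_g_per_mol: float, volume_ml: float) -> float:
    """Concentration in mM for mass_g of solute dissolved to volume_ml."""
    if mw_g_per_mol <= 0 or volume_ml <= 0:
        raise ValueError("molecular weight and volume must be positive")
    moles = mass_g / mw_g_per_mol
    return moles / (volume_ml / 1000.0) * 1000.0


phosphate_mm = molarity_mm(14.2, MW_NA2HPO4, 500.0)  # 2x PBS, phosphate
nacl_mm = molarity_mm(8.77, MW_NACL, 500.0)          # 2x PBS, NaCl
biotin_mm = molarity_mm(0.0098, MW_BIOTIN, 20.0)     # 9.8 mg in 20 mL

if __name__ == "__main__":
    print(f"Na2HPO4: {phosphate_mm:.0f} mM")
    print(f"NaCl:    {nacl_mm:.0f} mM")
    print(f"biotin:  {biotin_mm:.2f} mM")
```

If a hydrated phosphate salt were used instead, the molecular weight constant would need to be changed and the weighed mass adjusted accordingly.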
2.6. Cleaving Biotin<br />
1. Cleaving reagent A (10 mL) (Applied Biosystems, Foster City, CA): contains<br />
concentrated TFA. Store in fume hood at room temperature<br />
2. Cleaving reagent B (Applied Biosystems, Foster City, CA): store at –20 °C<br />
3. 37 °C water bath<br />
4. Vacuum centrifuge<br />
3. Methods<br />
3.1. Cell Culture and Harvest<br />
1. On day 1, plate HT-29 cells in T-75 flasks at 5 × 10⁶ cells/flask.<br />
2. On day 2, aspirate medium. Culture cells with fresh medium containing 75 μM<br />
of celecoxib or DMSO (negative control).<br />
3. On day 3, 24 h after treating cells, aspirate cell culture medium. Rinse cells once<br />
quickly with 6 mL of PBS.<br />
4. Add 3 mL of 2 mM EDTA-PBS per flask, put flask into the 37 °C incubator.<br />
Monitor the detachment of cells carefully. Cells usually detach within 5 min. For<br />
the celecoxib-treated cells, it takes less than 5 min (see Note 1).<br />
5. Tap the side of the flask against the palm of hand to dislodge cells. When the<br />
cells are visibly detached, add 7 mL of PBS to flask. Resuspend cells and transfer<br />
cell suspension to a 15 mL centrifuge tube. Harvest the treated and control cells<br />
in separate tubes.<br />
6. Centrifuge the cell suspension at 500×g for 5 min. Remove the supernatant.<br />
7. Wash cell pellet with 10 mL of PBS three times. Centrifuge at 500×g for 5 min.<br />
Remove PBS after each centrifugation.<br />
8. Cell pellet is ready for lysis. Leave cell pellet on ice before proceeding to the<br />
next step, or store the pellet at –80 °C.<br />
3.2. Cell Lysis, Desalting and Protein Quantitation<br />
1. Add 500 μL of lysis buffer to the cell pellet harvested from each T-75 flask.<br />
Transfer the resuspended cells to a 1.5 mL eppendorf tube. Vortex briefly.<br />
2. Clean the sonifier probe with H₂O and then methanol, and let it air dry before use.<br />
3. To break the cells, set the digital sonifier amplitude at 16%. Hold up the<br />
eppendorf tube with suspended cells. Let the probe plunge half way into the<br />
lysis buffer. Pulse for 10 s, pause for 50 s. Repeat this cycle five times. Rest the<br />
tube on ice between pulses. Lift the tube up again in time before the next 10 s<br />
pulse cycle starts (see Note 2).<br />
4. Clean the sonifier probe as in step 2 before starting the next sample.<br />
5. Centrifuge cell lysate at 15,000×g for 15 min at 4 °C.
6. Transfer cell lysate to a fresh eppendorf tube (see Note 3).<br />
7. Quantify the protein in cell lysate using the BCA assay (see Note 4).<br />
8. Prepare desalting column (D-Salt™ Excellulose Plastic Desalting Column, 5 mL,<br />
Pierce) by washing column with 5× bed volume (i.e., 25 mL) of 50 mM<br />
NH₄HCO₃, pH 8.3 (see Note 5).<br />
9. Based on the BCA assay results, load up to 1.25 mg of cell lysate into each<br />
desalting column. Discard the flow through (see Note 6).<br />
10. Add 0.5 mL of 50 mM NH₄HCO₃, pH 8.3 into the column. Collect the flow<br />
through into one eppendorf tube. Perform this step seven times in total, collecting the eluant as<br />
seven 0.5 mL fractions.<br />
11. Take 10 μL of eluant from each fraction and mix with 300 μL (1:30) of coomassie<br />
blue reagent (Pierce). Visually examine the color of each tube. The color of<br />
the protein-containing fractions should change from brown to blue. Proteins<br />
normally elute in fractions 3–5.<br />
12. Pool the tubes containing protein. Mix well. Discard the tubes that do not contain<br />
protein.<br />
13. Measure the protein concentration using the BCA assay (see Note 4).<br />
14. Based on the BCA assay results, transfer 800 μg of protein from each of the<br />
treated and control samples into two separate eppendorf tubes (see Note 7).<br />
15. Lyophilize these two samples in vacuum centrifuge (see Note 8).<br />
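Two small calculations recur in this subsection: the lysate volume that contains a target amount of protein, and the number of desalting columns needed for a given load. A minimal Python sketch follows; the function names and the example concentration of 2.0 mg/mL are hypothetical, while the 1.25 mg column capacity and the 800 μg target come from the protocol.

```python
import math

# Illustrative helpers; names and the example BCA result are assumptions.

COLUMN_CAPACITY_MG = 1.25  # maximum load per desalting column (from the protocol)


def volume_for_mass_ul(target_ug: float, conc_mg_per_ml: float) -> float:
    """Volume (μL) containing target_ug of protein; note that mg/mL equals μg/μL."""
    if conc_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return target_ug / conc_mg_per_ml


def columns_needed(total_protein_mg: float, capacity_mg: float = COLUMN_CAPACITY_MG) -> int:
    """Smallest number of columns that keeps each load within capacity."""
    if total_protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return math.ceil(total_protein_mg / capacity_mg)


if __name__ == "__main__":
    # Hypothetical BCA result of 2.0 mg/mL for the pooled desalted fractions.
    print(f"{volume_for_mass_ul(800.0, 2.0):.0f} μL holds 800 μg")
    print(f"{columns_needed(1.6)} columns for 1.6 mg of labeled protein")
```

The second call reproduces the two-column setup used later for the combined 1.6 mg of labeled protein.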
3.3. Denaturing and Reducing the Proteins<br />
1. Freshly prepare denaturing buffer and 100 mM TCEP.<br />
2. Add denaturing buffer and 100 mM TCEP to the protein samples. For 800 μg of<br />
protein, add 640 μL of denaturing buffer and 8 μL of TCEP (see Note 9).<br />
3. Vortex until the sample is completely dissolved in the buffer.<br />
4. Boil the sample for 10 min.<br />
5. Vortex to mix well. Spin the samples in centrifuge briefly. Cool to room<br />
temperature.<br />
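The volumes in step 2 correspond to 80 μL of denaturing buffer and 1 μL of 100 mM TCEP per 100 μg of protein, so they can be scaled to other sample sizes. The sketch below assumes simple linear scaling, which the chapter implies but does not state explicitly; the function name is hypothetical.

```python
# Illustrative scaling of step 2; linear scaling is an assumption.

BUFFER_UL_PER_100UG = 80.0  # 640 μL for 800 μg
TCEP_UL_PER_100UG = 1.0     # 8 μL for 800 μg
TCEP_STOCK_MM = 100.0


def denaturing_volumes(protein_ug: float) -> tuple[float, float, float]:
    """Return (buffer μL, TCEP μL, final TCEP mM) for a given protein amount."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    units = protein_ug / 100.0
    buffer_ul = units * BUFFER_UL_PER_100UG
    tcep_ul = units * TCEP_UL_PER_100UG
    final_tcep_mm = TCEP_STOCK_MM * tcep_ul / (buffer_ul + tcep_ul)
    return buffer_ul, tcep_ul, final_tcep_mm


if __name__ == "__main__":
    for amount in (100.0, 800.0):
        buf, tcep, conc = denaturing_volumes(amount)
        print(f"{amount:.0f} μg: {buf:.0f} μL buffer, {tcep:.0f} μL TCEP, {conc:.2f} mM TCEP")
```

The final TCEP concentration is the same for every sample size because both volumes scale together.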
3.4. Labeling with Cleavable ICAT Reagents, Desalting,<br />
and Tryptic Digestion<br />
1. Remove the ICAT reagents from the –20 °C freezer. Bring to room temperature.<br />
Avoid exposing them to the light. To label 800 μg of protein (control or treated),<br />
use eight tubes of reagent (light or heavy, label 100 μg of protein/tube). Spin in<br />
centrifuge briefly to bring down the powder from the wall to the bottom of the<br />
tube.<br />
2. In the chemical hood with lights off, add 20 μL of acetonitrile into each of the<br />
eight reagent tubes (light or heavy). Add 80 μL (i.e., 100 μg) of protein sample into<br />
each tube. Tighten the tube caps. Vortex to mix well. Spin briefly in centrifuge<br />
(see Note 10).
3. Pool the control or treated sample mixtures (eight tubes of light or heavy),<br />
respectively, into two tubes. This pooling should result in one light and one heavy<br />
label tube with 800 μL of protein mixture in each.<br />
4. Incubate the samples in the 37 °C water bath for 2 h. Keep the samples from<br />
being exposed to light.<br />
5. Combine the light- and heavy-labeled samples together into one tube. Proceed<br />
with desalting.<br />
6. Use the same desalting column as in the previous section. Since the binding<br />
capacity per column is 1.25 mg, prepare two columns for a total of 1.6 mg of<br />
labeled protein. Wash each column with 5× bed volume (i.e., 25 mL) of 50 mM<br />
NH₄HCO₃, pH 8.3 (see Note 11).<br />
7. Load 800 μg of the combined and labeled proteins per column. Follow steps<br />
8–12 in Subheading 3.2. At the end of elution, pool the protein-containing eluant<br />
fractions (usually fractions 3–5) into one 15 mL tube. (see Note 12).<br />
8. Prepare trypsin freshly by reconstituting 20 μg of trypsin in 20 μL of 50 mM<br />
NH₄HCO₃, pH 8.3. Add trypsin to the labeled protein at a trypsin-to-protein ratio<br />
of 1:40 (w/w). For 1.6 mg of protein, add 40 μg of trypsin (see Note 13).<br />
9. Wrap the 15 mL tube with aluminum foil. Incubate at 37 °C overnight (see<br />
Note 14).<br />
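The reagent bookkeeping for labeling and digestion can be summarized in a short Python sketch. It assumes the regular kit (one tube labels 100 μg of protein), the per-tube volumes given in step 2, and the 1:40 trypsin-to-protein ratio from step 8; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

# Illustrative planning helper; names are assumptions, quantities are from the protocol.

ACETONITRILE_UL_PER_TUBE = 20.0
SAMPLE_UL_PER_TUBE = 80.0  # about 100 μg of protein


@dataclass
class LabelingPlan:
    reagent_tubes: int
    acetonitrile_ul_total: float
    pooled_volume_ul: float


def plan_labeling(protein_ug: float, ug_per_tube: float = 100.0) -> LabelingPlan:
    """Tubes of one reagent (light or heavy) and volumes for one sample."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    tubes = math.ceil(protein_ug / ug_per_tube)
    return LabelingPlan(
        reagent_tubes=tubes,
        acetonitrile_ul_total=tubes * ACETONITRILE_UL_PER_TUBE,
        pooled_volume_ul=tubes * (ACETONITRILE_UL_PER_TUBE + SAMPLE_UL_PER_TUBE),
    )


def trypsin_ug(total_protein_ug: float, protein_per_trypsin: float = 40.0) -> float:
    """Trypsin mass for a 1:protein_per_trypsin (w/w) trypsin-to-protein ratio."""
    if protein_per_trypsin <= 0:
        raise ValueError("ratio denominator must be positive")
    return total_protein_ug / protein_per_trypsin


if __name__ == "__main__":
    plan = plan_labeling(800.0)
    print(plan)
    print(f"trypsin for 1.6 mg at 1:40: {trypsin_ug(1600.0):.0f} μg")
    print(f"trypsin for 1.6 mg at 1:50: {trypsin_ug(1600.0, 50.0):.0f} μg")
```

For 800 μg per sample this gives eight tubes and a pooled volume of 800 μL, in line with step 3.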
3.5. Purifying the Labeled Peptides<br />
1. Boil the peptide solution for 10 min to deactivate trypsin.<br />
2. Freshly prepare 100 mM PMSF in methanol. Vortex to dissolve well.<br />
3. Add PMSF at a 1:100 dilution (v/v) to the trypsin-digested samples. For 3 mL<br />
of digests, add 30 μL of PMSF. The final PMSF concentration is 1 mM. Vortex<br />
briefly to mix.<br />
4. Prepare the avidin column: gently put a small amount of glass wool into a 5-3/4″<br />
Pasteur glass pipette. Push it down from the top for about 4-1/2″. This packing<br />
creates a support for the resin to settle onto (see Note 15).<br />
5. Add 0.5 mL of water into the pipette. Let the water level fall till it reaches the<br />
glass wool. At this point, the flow should stop naturally. Block the bottom of<br />
the pipette. Then slowly add 1.5 mL of water into the pipette. Mark the water<br />
level as an indicator for the volume of 1.5 mL.<br />
6. Gradually add the avidin slurry to the 1.5 mL mark. Connect Teflon tubing to<br />
the pipette tip to increase the flow rate (see Note 16).<br />
7. Condition the column using the following washing buffers in sequence<br />
(see Note 17):<br />
a. 2× PBS, pH 7.2, 8 mL (5× bed volume)<br />
b. 2 mM biotin solution, 6 mL (4× bed volume)<br />
c. 30% (v/v) acetonitrile, 0.4% (v/v) formic acid, 6 mL (4× bed volume)<br />
d. 2× PBS, pH 7.2, 8 mL (5× bed volume)<br />
8. Sample loading and incubation: take the Teflon tubing off. Load 1.5 mL of the<br />
digest sample into the column. After the sample flows through, incubate at room
temperature for 15 min. Load another 1.5 mL (or the rest) of sample. Incubate<br />
for 15 min (see Note 18).<br />
9. Connect the Teflon tubing back to the tip of the pipette. Wash the column, now bound<br />
with ICAT-labeled peptides, with the following buffers in sequence:<br />
a. 2× PBS, pH 7.2, 8 mL (5× bed volume)<br />
b. 1× PBS, pH 7.2, 8 mL (5× bed volume)<br />
c. 20% (v/v) acetonitrile in 50 mM NH₄HCO₃, pH 8.3, 6 mL (4× bed volume)<br />
10. Final wash: take off the Teflon tubing. Add 1.3 mL (a volume slightly less than<br />
the bed volume) of 30% (v/v) acetonitrile, 0.4% (v/v) formic acid as a final<br />
wash. Discard the flow through. Measure the pH of the last drop of this wash<br />
step with pH paper. The pH should be >8 (basic), suggesting that the acidic acetonitrile buffer<br />
has not eluted the peptides and that the peptides are still retained on the<br />
beads (see Note 19).<br />
11. Elute the peptides with 4 mL of 30% (v/v) acetonitrile, 0.4% (v/v) formic acid<br />
in one 15 mL tube. Mix well and divide into four 1 mL aliquots. Briefly freeze<br />
the peptides on dry ice or at –80 °C and then lyophilize in vacuum centrifuge<br />
(see Note 20).<br />
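The wash volumes in this subsection are given as multiples of the 1.5 mL bed volume, and Note 19 advises scaling the final wash down for a smaller bed. The Python sketch below is one hedged way to redo these numbers for a different bed volume. Rounding up to whole milliliters (which reproduces the stated 8 mL for 5× of 1.5 mL) and scaling the final wash in proportion to 1.3 mL per 1.5 mL of bed are both assumptions, as are the function names.

```python
import math

# Illustrative column arithmetic; rounding and proportional scaling are assumptions.

AVIDIN_CAPACITY_MG_PER_ML = 1.6  # protein per mL of packed avidin (Note 16)
REFERENCE_BED_ML = 1.5
REFERENCE_FINAL_WASH_ML = 1.3


def wash_volume_ml(bed_multiple: float, bed_volume_ml: float) -> int:
    """Wash volume rounded up to the next whole mL."""
    return math.ceil(bed_multiple * bed_volume_ml)


def final_wash_ml(bed_volume_ml: float) -> float:
    """Final wash kept slightly below the bed volume, scaled proportionally."""
    return round(REFERENCE_FINAL_WASH_ML / REFERENCE_BED_ML * bed_volume_ml, 2)


def column_capacity_mg(bed_volume_ml: float) -> float:
    """Approximate protein capacity of the packed avidin bed."""
    return AVIDIN_CAPACITY_MG_PER_ML * bed_volume_ml


if __name__ == "__main__":
    bed = 1.5
    print(f"5x wash: {wash_volume_ml(5, bed)} mL, 4x wash: {wash_volume_ml(4, bed)} mL")
    print(f"final wash: {final_wash_ml(bed)} mL")
    print(f"capacity: {column_capacity_mg(bed):.1f} mg")
```

For the 1.5 mL column used here, the estimated capacity exceeds the 1.6 mg of labeled protein being enriched.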
3.6. Cleaving Biotin<br />
1. Prepare the cleaving reagent mixture in a chemical hood. For 1.6 mg of labeled<br />
peptides, mix 760 μL of cleaving reagent A with 40 μL of cleaving reagent B. Add<br />
the cleaving reagent mixture to the dry peptides. Dispense the mixture equally to<br />
all four peptide aliquots (see Note 21).<br />
2. Close the tube caps. Vortex well to dissolve the peptides.<br />
3. Incubate the samples in a 37 °C water bath for 2 h.<br />
4. Pool all the aliquots together when the incubation is finished. Freeze briefly on<br />
dry ice or at –80 °C. Lyophilize the peptides in vacuum centrifuge.<br />
5. Store at –80 °C prior to the next step (i.e., fractionation by SCX HPLC).<br />
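The cleaving reagent volumes follow the ratio given in Note 21 (95 μL of reagent A plus 5 μL of reagent B per 200 μg of labeled peptides). A minimal Python sketch of that scaling is shown below; the function name is hypothetical and equal-sized aliquots are assumed.

```python
# Illustrative scaling of the cleaving mixture; the function name is an assumption.

REAGENT_A_UL_PER_200UG = 95.0
REAGENT_B_UL_PER_200UG = 5.0


def cleaving_mix(labeled_peptide_ug: float, n_aliquots: int = 1) -> tuple[float, float, float]:
    """Return (reagent A μL, reagent B μL, mixture μL per aliquot)."""
    if labeled_peptide_ug <= 0 or n_aliquots < 1:
        raise ValueError("need a positive peptide amount and at least one aliquot")
    units = labeled_peptide_ug / 200.0
    reagent_a_ul = units * REAGENT_A_UL_PER_200UG
    reagent_b_ul = units * REAGENT_B_UL_PER_200UG
    per_aliquot_ul = (reagent_a_ul + reagent_b_ul) / n_aliquots
    return reagent_a_ul, reagent_b_ul, per_aliquot_ul


if __name__ == "__main__":
    a, b, each = cleaving_mix(1600.0, n_aliquots=4)
    print(f"reagent A: {a:.0f} μL, reagent B: {b:.0f} μL, per aliquot: {each:.0f} μL")
```

For 1.6 mg of labeled peptides split into four aliquots, this reproduces the 760 μL and 40 μL given in step 1.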
4. Notes<br />
1. Dislodging cells using a low concentration of EDTA preserves the integrity of<br />
cell surface proteins, which is critical in quantitative proteomic analysis.<br />
2. For the Branson digital sonifier, use the following program settings: pulse on for<br />
10 s; off for 50 s; amplitude = 16%. If bubbles are generated during sonication,<br />
decrease the amplitude setting. Depending on the sample volume, the setting<br />
can sometimes be lowered to 14%. The clumps of cells should disappear when<br />
sonication is complete.<br />
3. After this step the cell lysate can be stored at –80 °C. Otherwise, proceed to the<br />
next step, i.e., BCA assay and desalting.<br />
4. Protein quantitation is a common laboratory procedure. The instructions are<br />
included within the BCA assay kit (Pierce); therefore, the procedure is not<br />
described in this chapter.
5. It is helpful to assemble a funnel reservoir on the top of the column to hold a<br />
larger volume (up to 25 mL) of buffer.<br />
6. The maximum binding capacity of the desalting column is 1.25 mg of protein<br />
per column.<br />
7. The method described here is based on the labeling of 800 μg of protein from<br />
each of the treated and control samples. This amount of protein is desirable if<br />
enough cell lysate is available. However, as little as 100 μg of protein from each<br />
of the treated and control samples can be labeled using this protocol.<br />
8. It takes about 3 h to lyophilize the samples. If necessary, leave the samples in<br />
the vacuum centrifuge overnight to dry.<br />
9. It is important to keep the pH of the cell lysate above 7 (ideally between 8 and<br />
9). A pH below 7 will inhibit the reaction between cysteine residues and the<br />
iodoacetamide group of the ICAT reagents.<br />
10. Usually the control sample is labeled with the light reagent and the treated<br />
sample is labeled with the heavy reagent.<br />
11. To save time, it is suggested to set the two columns up on the stand during the<br />
2-h labeling incubation time. It is better to attach a funnel reservoir to the top<br />
of each column to hold up to 25 mL of wash buffer.<br />
12. Normally the volume of sample after pooling is about 3 mL. Desalted samples<br />
may have an opaque color because of the protein present in the sample.<br />
13. Instead of using the buffer provided by the manufacturer, resuspend trypsin<br />
in 50 mM NH₄HCO₃, pH 8.3. Keep the trypsin-to-protein ratio between 1:40<br />
and 1:50.<br />
14. The digestion mixture is incubated overnight for approximately 16–18 h.<br />
15. Make sure the glass wool is well packed. There should be no holes present;<br />
however, it should still allow liquid flow through at a reasonable flow rate.<br />
Check the flow rate by adding 0.5 mL of water into the pipette. The water<br />
should flow through quickly. Note that the flow rate will be considerably slower<br />
once the avidin slurry is packed into the column. Take this<br />
into consideration and do not pack too much or too little glass<br />
wool.<br />
16. The protein binding capacity of avidin slurry is 1.6 mg protein per milliliter of<br />
packed avidin. One 1.5 mL column should offer sufficient capacity to enrich the<br />
labeled peptides from 1.6 mg of protein.<br />
17. The binding of 2 mM biotin to the column and the elution by 30% (v/v) acetonitrile,<br />
0.4% (v/v) formic acid preclear the column of any potential nonspecific<br />
binding activities.<br />
18. The Teflon tubing is a useful tool to adjust the flow rate. Connecting the Teflon<br />
tubing to the tip of the column will increase the flow rate. On the other hand,<br />
the flow rate will be slower without the Teflon tubing attached.<br />
19. The final wash is intended to remove any nonspecifically bound proteins. Using a<br />
volume slightly less than the bed volume ensures that the labeled peptides are<br />
retained on the column. The volume of the final wash buffer can be adjusted<br />
according to the actual bed volume. When the bed volume of avidin is smaller,
the volume of the final wash buffer needs to be scaled down. If the pH of the<br />
last drop is less than 3, the labeled peptides may have started to elute, meaning<br />
potential loss of the labeled peptides.<br />
20. The elution should be performed in a chemical fume hood to avoid inhaling<br />
acetonitrile. The quick freezing of samples on dry ice can prevent sample spill<br />
during vacuum centrifugation and reduce the time needed for the samples to<br />
dry.<br />
21. For every 200 μg of labeled peptides (i.e., 100 μg each of the heavy- and light-labeled<br />
samples in the pair), first mix 95 μL of cleaving reagent A with 5 μL of cleaving reagent B<br />
and then transfer the mixture to the labeled peptides.<br />
Acknowledgments<br />
This project has been funded in whole or in part with Federal funds from<br />
the National Cancer Institute, National Institutes of Health, under Contract No.<br />
N01-CO-12400. The content of this publication does not necessarily reflect<br />
the views or policies of the Department of Health and Human Services, nor<br />
does mention of trade names, commercial products, or organizations imply<br />
endorsement by the U.S. Government.<br />
References<br />
1. Aebersold, R., Rist, B. and Gygi, S. P. (2000) Quantitative proteome analysis:<br />
methods and applications. Ann N Y Acad Sci 919, 33–47.<br />
2. Gygi, S. P., Rist, B. and Aebersold, R. (2000) Measuring gene expression by<br />
quantitative proteome analysis. Curr Opin Biotechnol 11, 396–401.<br />
3. Yates, J. R. 3rd. (2004) Mass spectral analysis in proteomics. Annu Rev Biophys<br />
Biomol Struct 33, 297–316.<br />
4. Ong, S. E. and Mann, M. (2005) Mass spectrometry-based proteomics turns quantitative.<br />
Nat Chem Biol 1, 252–262.<br />
5. Zieske, L. R. (2006) A perspective on the use of iTRAQ reagent technology for<br />
protein complex and profiling studies. J Exp Bot 57, 1501–1508.<br />
6. Yan, W. and Chen, S. S. (2005) Mass spectrometry-based quantitative proteomic<br />
profiling. Brief Funct Genomic Proteomic 4, 27–38.<br />
7. Bronstrup, M. (2004) Absolute quantification strategies in proteomics based on<br />
mass spectrometry. Expert Rev Proteomics 1, 503–512.<br />
8. Conrads, T. P., Issaq, H. J. and Hoang, V. M. (2003) Current strategies for quantitative<br />
proteomics. Adv Protein Chem 65, 133–159.<br />
9. Leitner, A. and Lindner, W. (2004) Current chemical tagging strategies for<br />
proteome analysis by mass spectrometry. J Chromatogr B Analyt Technol Biomed<br />
Life Sci 813, 1–26.<br />
10. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H. and Aebersold, R.<br />
(1999) Quantitative analysis of complex protein mixtures using isotope-coded<br />
affinity tags. Nat Biotechnol 17, 994–999.
11. Flory, M. R., Griffin, T. J., Martin, D. and Aebersold, R. (2002) Advances in<br />
quantitative proteomics using stable isotope tags. Trends Biotechnol 20, S23–S29.<br />
12. Han, D. K., Eng, J., Zhou, H. and Aebersold, R. (2001) Quantitative profiling of<br />
differentiation-induced microsomal proteins using isotope-coded affinity tags and<br />
mass spectrometry. Nat Biotechnol 19, 946–951.<br />
13. Conrads, K. A., Yu, L. R., Lucas, D. A., Zhou, M., Chan, K. C., Simpson, K. A.,<br />
Schaefer, C. F., Issaq, H. J., Veenstra, T. D., Beck, G. R. Jr. and Conrads, T. P.<br />
(2004) Quantitative proteomic analysis of inorganic phosphate-induced murine<br />
MC3T3-E1 osteoblast cells. Electrophoresis 25, 1342–1352.<br />
14. Gygi, S. P., Rist, B., Griffin, T. J., Eng, J. and Aebersold, R. (2002) Proteome<br />
analysis of low-abundance proteins using multidimensional chromatography and<br />
isotope-coded affinity tags. J Proteome Res 1, 47–54.<br />
15. Tao, W. A. and Aebersold, R. (2003) Advances in quantitative proteomics via<br />
stable isotope tagging and mass spectrometry. Curr Opin Biotechnol 14, 110–118.<br />
16. Conrads, K. A., Yi, M., Simpson, K. A., Lucas, D. A., Camalier, C. E., Yu, L. R.,<br />
Veenstra, T. D., Stephens, R. M., Conrads, T. P. and Beck, G. R. Jr. (2005) A<br />
combined proteome and microarray investigation of inorganic phosphate-induced<br />
pre-osteoblast cells. Mol Cell Proteomics 4, 1284–1296.<br />
17. Koehne, C. H. and Dubois, R. N. (2004) COX-2 inhibition and colorectal cancer.<br />
Semin Oncol 31, 12–21.<br />
18. Sinicrope, F. A. and Gill, S. (2004) Role of cyclooxygenase-2 in colorectal cancer.<br />
Cancer Metastasis Rev 23, 63–75.<br />
19. Steinbach, G., Lynch, P. M., Phillips, R. K., Wallace, M. H., Hawk, E.,<br />
Gordon, G. B., Wakabayashi, N., Saunders, B., Shen, Y., Fujimura, T., Su, L. K.<br />
and Levin, B. (2000) The effect of celecoxib, a cyclooxygenase-2 inhibitor, in<br />
familial adenomatous polyposis. N Engl J Med 342, 1946–1952.<br />
20. Thun, M. J., Henley, S. J. and Patrono, C. (2002) Nonsteroidal anti-inflammatory<br />
drugs as anticancer agents: mechanistic, pharmacologic, and clinical issues. J Natl<br />
Cancer Inst 94, 252–266.<br />
21. Arico, S., Pattingre, S., Bauvy, C., Gane, P., Barbat, A., Codogno, P. and<br />
Ogier-Denis, E. (2002) Celecoxib induces apoptosis by inhibiting<br />
3-phosphoinositide-dependent protein kinase-1 activity in the human colon cancer<br />
HT-29 cell line. J Biol Chem 277, 27613–27621.<br />
22. Lev-Ari, S., Strier, L., Kazanov, D., Madar-Shapiro, L., Dvory-Sobol, H.,<br />
Pinchuk, I., Marian, B., Lichtenberg, D. and Arber, N. (2005) Celecoxib and<br />
curcumin synergistically inhibit the growth of colorectal cancer cells. Clin Cancer<br />
Res 11, 6738–6744.
11<br />
Analysis of Microdissected Cells by Two-Dimensional<br />
LC-MS Approaches<br />
Chen Li, Yi-Hong, Ye-Xiong Tan, Jian-Hua Ai, Hu Zhou, Su-Jun Li,<br />
Lei Zhang, Qi-Chang Xia, Jia-Rui Wu, Hong-Yang Wang, and Rong Zeng<br />
Summary<br />
Laser capture microdissection (LCM) is a powerful tool that enables the isolation of<br />
specific cell types from tissue sections, overcoming the problems of tissue heterogeneity<br />
and contamination. We combined LCM with isotope-coded affinity tag (ICAT) technology<br />
and two-dimensional liquid chromatography to investigate the qualitative and quantitative<br />
proteomes of hepatocellular carcinoma (HCC). The effects of three histochemical<br />
stains on tissue sections were compared, and toluidine blue proved to be the<br />
most suitable stain for LCM followed by proteomic analysis. The solubilized proteins<br />
from microdissected HCC and non-HCC hepatocytes were qualitatively and quantitatively<br />
analyzed with two-dimensional liquid chromatography tandem mass spectrometry<br />
(2D-LC-MS/MS), alone or coupled with cleavable isotope-coded affinity tag (cICAT)<br />
labeling technology. A total of 644 proteins were qualitatively identified and 261 proteins<br />
were unambiguously quantified. These results show that a clinical proteomic method<br />
using LCM coupled with ICAT and 2D-LC-MS/MS can deliver qualitative and quantitative<br />
analysis that is both large-scale and accurate.<br />
Key Words: hepatocellular carcinoma; laser capture microdissection; isotope-coded<br />
affinity tag; two-dimensional liquid chromatography; mass spectrometry.<br />
1. Introduction<br />
Hepatocellular carcinoma (HCC) is one of the most frequent tumors<br />
worldwide, with 0.25–1 million newly diagnosed cases each year<br />
(1). The highest frequencies of HCC are observed in sub-Saharan Africa and<br />
in Asia. In China, it has ranked as the second leading cause of cancer death since the<br />
1990s. The major risk factors for HCC are chronic hepatitis B virus (HBV) and hepatitis C<br />
virus (HCV) infections, chronic exposure to the mycotoxin aflatoxin B1 (AFB1),<br />
and alcoholic cirrhosis. To date, the mainstays of HCC diagnosis<br />
are serological tumor markers, such as alpha-fetoprotein, the L3 fraction<br />
of alpha-fetoprotein, and PIVKA-II, together with imaging modalities (1,2,3).<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
To improve the diagnosis and prognosis of HCC, there is an<br />
urgent need to identify molecular markers that detect the disease. Using<br />
tissue samples from patients with HCC may be the most direct and<br />
persuasive way to find useful diagnostic and/or prognostic markers. Recently,<br />
proteomic analysis has been applied to HCC tissues. Paik et al. analyzed 19 cases of HCC<br />
by two-dimensional electrophoresis (2DE) and matrix-assisted laser<br />
desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS)<br />
(4,5,6). Jung et al. observed proteome alterations in normal, cirrhotic, and tumorous<br />
tissue using a 2DE-MALDI-TOF-MS assay (7).<br />
Kim et al. analyzed 11 cases of HCC using 2DE and delayed extraction-matrix-assisted<br />
laser desorption/ionization time-of-flight mass spectrometry<br />
(DE-MALDI-TOF-MS) (8).<br />
Nonenzymatic sample preparation (NESP) is now one of the routine<br />
techniques for tissue sample preparation and can be modified according to<br />
tissue-type-specific properties (9). However, such samples may suffer from tissue<br />
heterogeneity and contaminating proteins, e.g., blood proteins. Several approaches<br />
have been developed to resolve these problems, and the selection of cell types<br />
of interest by dissection has received a great deal of attention. Since 1996,<br />
a laser-assisted technique, laser capture microdissection (LCM), has emerged<br />
as a good choice. LCM under direct microscopic visualization permits rapid<br />
one-step procurement of selected cell populations from a section of complex,<br />
heterogeneous tissue (10,11). LCM has been used to isolate specific types<br />
of cells for protein, DNA, and RNA analysis. In proteomics,<br />
proteins obtained from laser capture microdissected cells can be analyzed by<br />
two-dimensional gel electrophoresis (2DE gel) (12,13), immunoassay (14,15), and<br />
surface-enhanced laser desorption and ionization time-of-flight (SELDI-TOF)<br />
(16,17,18,19,20,21). The main shortcoming of LCM may be that it requires a long<br />
time to collect sufficient cells for one experiment: 2–7 h for 20,000–40,000<br />
cells per immunoassay and 15 h for 250,000 cells per 2DE gel (22).<br />
Our previous work applied proteomic analysis to HCC cell lines (23,24)<br />
and HCC metastatic cells (25). We then extended this work to clinical<br />
tissues using LCM. However, the present LCM assay yields only a few<br />
hundred micrograms of protein after several hours of dissection, an amount that is<br />
difficult to analyze by the traditional 2DE-MS proteomic route, especially with<br />
preparative 2DE gels followed by MS identification.<br />
Since 1999, the isotope-coded affinity tag (ICAT) strategy has been a leading<br />
technology for relative protein quantification relying on post-harvest stable<br />
isotope labeling (26). Post-harvest labeling with stable isotopes can be used for<br />
protein quantification in cells and tissues from any organism, and the ICAT<br />
method as initially described has been shown to be capable of accurate quantification<br />
of proteins in complex mixtures (26). After the first-generation ²H-ICAT<br />
reagents, the second-generation cleavable ¹³C-ICAT reagents provided<br />
improved performance (27,28,29). The 2D chromatography MS/MS method has<br />
been shown to be capable of identifying a large number of proteins, including<br />
proteins of low abundance (30,31).<br />
In this study, we used LCM to isolate HCC and non-HCC hepatocytes<br />
and, for the first time, combined LCM with cleavable isotope-coded affinity tag (cICAT)<br />
labeling technology and two-dimensional liquid chromatography tandem mass<br />
spectrometry (2D-LC-MS/MS) to carry out accurate qualitative and quantitative<br />
analysis of HCC and non-HCC tissues. The workflow is outlined in Fig. 1.<br />
In total, 644 proteins in HCC hepatocytes were qualitatively identified, and 261<br />
proteins were quantitatively compared between HCC and non-HCC hepatocytes.<br />
To date, this is one of the largest qualitative and quantitative proteome data sets for<br />
HCC and non-HCC tissues. Our strategy provides an accurate,<br />
fast, and sensitive approach for the proteomic analysis of clinical tissues, which<br />
will facilitate understanding of the mechanisms of HCC and other diseases<br />
and the mining of potential markers and drug targets for diagnosis and treatment.<br />
Fig. 1. Outline of accurate qualitative and quantitative proteomic analysis of clinical<br />
hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded<br />
affinity tag and two-dimensional liquid chromatography mass spectrometry.<br />
Flowchart steps: frozen sections of HCC tissues; staining with toluidine blue; laser capture<br />
microdissection of HCC hepatocytes and non-HCC hepatocytes; solubilized proteins; labeling<br />
with cICAT light chain or cICAT heavy chain; digestion of protein mixture; 2D-LC-MS/MS;<br />
analysis by bioinformatics. Reprinted with permission from (34).<br />
2. Materials<br />
2.1. Tissue Specimen and Sample Preparation by Nonenzymatic<br />
Method (NESP)<br />
1. Tissues from an HCC patient are isolated from fresh partial hepatectomy specimens<br />
of HCC at Shanghai Eastern Hepatobiliary Surgery Hospital. Access to human<br />
tissues complies with both Chinese laws and the guidelines of the Ethics<br />
Committee.<br />
2. Glutamine-free RPMI 1640 medium containing 5% fetal calf serum, 0.2 mM<br />
phenylmethylsulfonyl fluoride, 1 mM ethylenediaminetetraacetic acid tetrasodium<br />
salt dihydrate (EDTA), and antibiotics: oxacillin 25 μg/ml, gentamycin 50 μg/ml,<br />
penicillin 100 U/ml, streptomycin 100 μg/ml, amphotericin B 0.25 μg/ml, nystatin<br />
50 U/ml. Store at 4°C.<br />
3. Ceramic mortar and pestle (SIBAS Corp. Shanghai, China).<br />
4. Lysis buffer: 8 M urea, 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propane<br />
sulfonate (CHAPS), 40 mM Tris-HCl (pH 8.3), 65 mM dithiothreitol (DTT).<br />
Store in aliquots at –8°C.<br />
5. Proteinase inhibitor tablet mixture (Roche).<br />
2.2. Laser Capture Microdissection<br />
1. Tissues from an HCC patient are isolated from fresh partial hepatectomy specimens<br />
of HCC at Shanghai Eastern Hepatobiliary Surgery Hospital. Access to human<br />
tissues complies with both Chinese laws and the guidelines of the Ethics<br />
Committee. The tissues are from a 50-year-old male patient with Edmondson<br />
grade III HCC (HBV infected, AFP 7.3 μg/L, size 15 × 13 × 10.5 cm).<br />
2. Freezing microtome CM1900 (Leica).<br />
3. O.C.T. compound (Tissue-Tek).<br />
4. Hematoxylin, eosin, and toluidine blue stain (Shanghai Genebase Corp.).<br />
5. Leica AS LMD Laser Capture Microdissection System (Leica).<br />
6. Lysis buffer: 8 M urea, 4% CHAPS, 40 mM Tris, 65 mM DTT. Store in aliquots<br />
at –8°C.<br />
7. Proteinase inhibitor tablet mixture (Roche).
2.3. Removal of Toluidine Blue and Digestion of Protein Mixture<br />
for Qualitative Analysis<br />
1. Precipitation solution: 50% acetone, 50% ethanol, 0.1% acetic acid (HAc). Store<br />
at –20°C.<br />
2. Redissolving buffer: 6 M guanidine HCl, 100 mM Tris-HCl (pH 8.3). Store at<br />
4°C.<br />
3. DTT and iodoacetamide (IAA) are from Bio-Rad. Sequencing grade TPCK-trypsin<br />
is from Promega.<br />
4. YM3 ultrafiltration membranes (molecular mass cutoff, 3 kDa) are from Millipore<br />
Corp. All buffers are prepared with Milli-Q water (Millipore).<br />
2.4. Cleavable Isotope-Coded Affinity Tag Labeling of Proteins<br />
1. Tri-n-butylphosphate (TBP) is from Bio-Rad.<br />
2. cICAT light or heavy reagents, Avidin cartridge, affinity buffer–elute, affinity<br />
buffer–load, affinity buffer–wash 1, affinity buffer–wash 2, cleaving reagents A<br />
and B are from Applied Biosystems.<br />
3. Sequencing grade TPCK-trypsin (Promega).<br />
4. YM3 ultrafiltration membranes (molecular mass cutoff, 3 kDa) are from Millipore<br />
Corp. All buffers are prepared with Milli-Q water (Millipore).<br />
2.5. One-Dimensional and Two-Dimensional Liquid Chromatography<br />
Coupled with Tandem Mass Spectrometry<br />
1. Formic acid is obtained from Aldrich, and acetonitrile (HPLC gradient grade) is<br />
obtained from Merck.<br />
2. The LCQ Deca XP system, ProteomeX Workstation and TurboSequest<br />
software are purchased from Thermo Electron Corporation.<br />
2.6. Bioinformatics Analysis<br />
1. ExPASy proteomics tools are accessed from cn.expasy.org/tools/#proteome.<br />
2. Program TMHMM 2.0 is accessed from the Center for Biological Sequence<br />
Analysis (www.cbs.dtu.dk/services/TMHMM/).<br />
3. Classification tools are accessed from www.geneontology.org.<br />
3. Methods<br />
In brief, two priorities should be kept in mind throughout the LCM<br />
coupled with 2D-LC-MS/MS workflow: speed and the removal of impurities.<br />
Sample preparation by LCM must be<br />
done as quickly as possible, including fixation of fresh tissues, preparation of<br />
frozen sections, histochemical staining, and microdissection. Impurities,<br />
such as histochemical stains, should be removed as completely as possible<br />
by centrifugation, precipitation, and ultrafiltration before trypsin digestion and<br />
LC-MS/MS analysis.<br />
Fixation and histochemical staining are the two initial steps in LCM,<br />
and the appropriate choice of fixation and staining<br />
methods is an important factor for the downstream analysis. In this work, we used freshly<br />
prepared liver tissues to make frozen sections (8 μm thick) and fixed the<br />
sections with ethanol to avoid effects on proteins, such as the crosslinking<br />
caused by formalin fixation. Several histochemical stains (hematoxylin, eosin,<br />
methyl green, and toluidine blue) have been tested with 2DE gels (33):<br />
staining with a single stain (hematoxylin) was better than staining with two stains<br />
simultaneously (hematoxylin and eosin), and methyl green and toluidine blue<br />
staining were both compatible with the analysis of proteins by 2D-PAGE. The<br />
results with toluidine blue staining indicated a direct link between the intensity<br />
of tissue section staining and problems in generating good-quality<br />
protein separations. In our study, the proteins from cells after LCM were<br />
subjected to tryptic digestion and LC-MS/MS analysis. Because the staining material<br />
might affect the pH of the digestion buffer or inactivate the trypsin,<br />
we tried to remove the stains by precipitation and ultrafiltration prior to<br />
digestion. We stained frozen sections separately with three histochemical stains<br />
(hematoxylin, eosin, and toluidine blue). Of these three stains,<br />
we found that almost all of the toluidine blue could be removed<br />
by precipitation in the precipitation solution (50% acetone, 50% ethanol, 0.1% acetic<br />
acid) followed by desalting by ultrafiltration. In addition, proteins from sections stained<br />
with toluidine blue were solubilized better, whereas some colored protein precipitate<br />
appeared on the filtration membrane when hematoxylin or eosin<br />
was used. Therefore, we chose toluidine blue to optimize the experimental<br />
conditions, including staining, microdissection, and protein digestion.<br />
3.1. Tissue Specimen and Sample Preparation by Nonenzymatic<br />
Method (NESP)<br />
1. The tissues used were from a 50-year-old male patient with Edmondson<br />
grade III HCC (HBV infected, AFP 7.3 μg/L, size 15 × 13 × 10.5 cm). Tumorous<br />
tissues and their adjacent paired nontumorous tissues (3 cm away from the edge of<br />
HCC lesions, about 0.1 g) were isolated from fresh partial hepatectomy specimens<br />
of HBV-associated HCC. A part of the resected tissue was used for histological<br />
analysis.<br />
2. The tissues were rinsed several times with cold glutamine-free RPMI 1640<br />
medium and were homogenized in liquid nitrogen-cooled mortar and pestle (see<br />
Note 1).<br />
3. The tissue powders obtained were dissolved in lysis buffer (see Note 2).
4. The samples were sonicated on ice for 30 s (intensity: below 50 W) using an<br />
ultrasonic processor and centrifuged for 1 h at 20,627 × g to remove DNA, RNA,<br />
and any particulate materials.<br />
5. The protein concentrations of samples were measured by Bio-Rad Protein Assay<br />
kit. All samples were stored at –8°C until use (see Note 3).<br />
3.2. Laser Capture Microdissection<br />
1. Embed fresh tissues carefully in OCT in a plastic mold, taking care not to trap air<br />
bubbles around the tissue. Freeze the tissue by setting the mold on top of liquid<br />
nitrogen until 70–80% of the block turns white, and then put the block on top of<br />
dry ice.<br />
2. For cutting, mount the frozen block on the cryostat holder. Never, at any<br />
point, let the tissue warm to temperatures above –15°C. Allow frozen blocks<br />
to equilibrate in the cryostat chamber for about 5 min. Cut 8-μm sections.<br />
3. Wash the 8-μm sections of freshly prepared liver tissues with cold phosphate-buffered<br />
saline (PBS, pH 7.4), and stain with toluidine blue using the manufacturer’s standard<br />
protocol with minor modifications (see Note 4).<br />
4. Fix the sections in cold 95% ethanol for 10 min, air-dry, and microdissect with the<br />
Leica AS LMD Laser Capture Microdissection System.<br />
5. Using laser pulses of 7.5 μm diameter, 70 mW, and 2–3 ms duration,<br />
microdissect approximately 50,000 or 100,000 cells of HCC and non-HCC hepatocytes;<br />
store in microdissection caps at –8°C until lysed (see Note 5). An example<br />
of the results obtained with a hematoxylin and eosin (H&E) stained section is<br />
shown in Fig. 2.<br />
6. Each cell population was determined to be 95% homogeneous by microscopic<br />
visualization of the captured cells. Dissolve the laser capture microdissected HCC<br />
and non-HCC hepatocytes in lysis buffer (see Note 2).<br />
7. Sonicate the samples briefly on ice using an ultrasonic processor and<br />
centrifuge for 1 h at 20,627 × g to remove DNA, RNA, and any particulate<br />
materials.<br />
8. Measure the protein concentrations of the samples with the Bio-Rad Protein Assay kit.<br />
Store all the samples at –8°C until use (see Note 3).<br />
3.3. Removal of Toluidine Blue and Digestion of Protein Mixture<br />
for Qualitative Analysis<br />
1. Place the samples prepared by NESP or LCM in precipitation<br />
solution (50% acetone, 50% ethanol, 0.1% acetic acid; sample<br />
volume:precipitation solution volume = 1:5) for at least 12 h at –20°C. Wash the<br />
pellets with 100% acetone and 70% ethanol, and lyophilize (see<br />
Note 6).<br />
2. Redissolve the pellets in 6 M guanidine HCl, 100 mM Tris (pH 8.3); measure the<br />
protein concentrations with the Bio-Rad Protein Assay kit.<br />
Fig. 2. HCC tissues before (A) and after (B) LCM. Reprinted with permission<br />
from (34).<br />
4. Reduce 200 μg of solubilized proteins with DTT (final concentration 20 mM) and<br />
subsequently alkylate with IAA (final concentration 40 mM).<br />
5. After desalting with YM3 ultrafiltration membranes, incubate the protein mixture<br />
with trypsin (trypsin:protein mixture = 1:30, w/w, Promega, Madison, WI) at<br />
37°C for 16 h (see Note 7).<br />
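The ratios and final concentrations quoted in the steps above translate into simple bench<br />
arithmetic. The sketch below is illustrative only: the function names, the 1 M stock<br />
concentrations, and the 20 μl and 100 μl volumes are hypothetical values that are not given in<br />
the protocol.<br />

```python
def precipitation_volume_ul(sample_ul, ratio=5.0):
    """Volume of precipitation solution for a 1:5 sample:solution ratio."""
    return sample_ul * ratio


def trypsin_mass_ug(protein_ug, protein_per_trypsin=30.0):
    """Trypsin mass for a trypsin:protein ratio of 1:30 (w/w)."""
    return protein_ug / protein_per_trypsin


def stock_volume_ul(final_mm, final_volume_ul, stock_mm):
    """Stock volume from C1*V1 = C2*V2; final_volume_ul includes the stock."""
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * final_volume_ul / stock_mm


# Hypothetical example: 20 ul of lysate, 200 ug of protein in 100 ul,
# and 1 M (1000 mM) DTT and IAA stocks.
print(precipitation_volume_ul(20.0))           # 100.0 ul precipitation solution
print(round(trypsin_mass_ug(200.0), 2))        # 6.67 ug trypsin
print(stock_volume_ul(20.0, 100.0, 1000.0))    # 2.0 ul DTT stock for 20 mM
print(stock_volume_ul(40.0, 100.0, 1000.0))    # 4.0 ul IAA stock for 40 mM
```

The same 1:30 calculation applies to the digestion of the combined cICAT-labeled proteins.<br />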
3.4. Cleavable Isotope-Coded Affinity Tag Labeling of Proteins<br />
1. Reduce 100 μg HCC and 100 μg non-HCC solubilized proteins prepared by LCM<br />
technology with TBP (final concentration 5 mM) (see Note 8).
2. Transfer the reduced HCC and non-HCC solubilized proteins into the vials<br />
containing cICAT light and heavy reagent, respectively, and mix. After a brief<br />
centrifugation, incubate the proteins for 2 h at 37°C in the dark.<br />
3. Combine the labeled proteins into one tube. After desalting with YM3 ultrafiltration<br />
membranes, incubate the protein mixture with trypsin (trypsin:protein<br />
mixture = 1:30, w/w, Promega, Madison, WI) at 37°C for 16 h (see Note 7).<br />
4. Use an Avidin cartridge (Applied Biosystems) to purify the ICAT-labeled peptides<br />
from the tryptic digest according to the manufacturer’s protocol. In brief, activate<br />
the Avidin cartridge with 2 ml of affinity buffer–elute and 2 ml of affinity<br />
buffer–load. Slowly inject (∼1 drop/5 s) the peptide sample onto the Avidin cartridge.<br />
Wash the Avidin cartridge with 500 μl of affinity buffer–load, 1 ml of affinity<br />
buffer–wash 1, 1 ml of affinity buffer–wash 2, and 1 ml of Milli-Q water. To<br />
elute the labeled peptides, slowly inject (∼1 drop/5 s) the affinity buffer–elute and<br />
collect the eluate. Dry the eluate from the Avidin cartridge by lyophilization.<br />
5. Dissolve the dried cICAT-labeled peptides in cleaving reagents and cleave for<br />
2 h at 37°C. Concentrate the cICAT-labeled peptides by lyophilization.<br />
3.5. One-Dimensional and Two-Dimensional Liquid Chromatography<br />
Coupled with Tandem Mass Spectrometry (1D- and 2D-LC-MS/MS)<br />
1. All 2D HPLC separations are performed on a ProteomeX (Thermo Finnigan<br />
Corp., San Jose, CA) equipped with two LC pumps. The flow rates of both the salt and<br />
analytical pumps are 200 μl/min, reduced to about 2 μl/min after the split. The strong<br />
cation exchange (SCX) column has a 300 μm inner diameter (SCX resin, 5 μm), and the<br />
RPC column has a 150 μm inner diameter (C18 resin, 300 Å, 5 μm) (see Note 9).<br />
2. Nine salt concentration steps (0, 25, 50, 75, 100, 150, 200, 400, and<br />
800 mM ammonium chloride) are used for the step gradient.<br />
3. The mobile phases used for reverse phase are A: 0.1% formic acid in water, pH<br />
3.0, and B: 0.1% formic acid in acetonitrile.<br />
4. Load about 200 μg of peptides digested from the LCM protein onto the SCX<br />
column with the autosampler. The elution conditions are described in step 2. Load<br />
the peptides eluted at each salt step onto the RPC column. Wash the RPC column<br />
with 20 column volumes of 95% mobile phase A. Finally, separate<br />
the peptides using a 100-min linear gradient from 5 to 80% mobile phase B.<br />
The eluting peptides enter an LCQ ProteomeX mass spectrometer (Thermo<br />
Electron, San Jose, CA) through the metal needle (see Note 10).<br />
5. The 1D HPLC separation uses the same system/experimental steps, but without<br />
the use of a strong cation exchange column.<br />
6. An electrospray (ESI) ion-trap mass spectrometer (LCQ Deca XP, Thermo<br />
Finnigan, San Jose, CA) is used for peptide detection.<br />
7. The positive ion mode is employed and the spray voltage is set at 3.2 kV. The<br />
spray temperature is set at 150°C for peptides.<br />
8. The collision energy is automatically set by LCQ Deca XP. After the acquisition<br />
of full scan mass spectra, three MS/MS scans are acquired for the next three<br />
most intense ions using dynamic exclusion.
9. Peptides and proteins are identified using TurboSequest® (Thermo Finnigan,<br />
San Jose, CA), which uses the MS and MS/MS spectra of peptide ions<br />
to search against the publicly available NCBI non-redundant protein database<br />
(www.ncbi.nlm.nih.gov).<br />
10. The protein identification criteria that we used are based on Delta CN (≥0.1)<br />
and Xcorr (one charge ≥ 1.8, two charges ≥ 2.2, three charges ≥ 3.7). An<br />
example of the results produced is shown in Table 1 (see Note 11).<br />
11. For quantitative analysis with cICAT technology and 2D-LC-MS/MS, database<br />
searching and quantification by Xpress (TurboSequest® software) are followed by a<br />
manual check. Quantitative analysis results of 261 proteins from<br />
LCM-ICAT-2D-LC-MS/MS are shown in Fig. 3. In our experiment, a total of<br />
149 differentially expressed proteins with at least twofold quantitative alterations<br />
between HCC and non-HCC hepatocytes were detected, including 55 upregulated<br />
proteins (32 with 2- to 5-fold, 13 with 5- to 10-fold, and 10 with >10-fold changes) and 94<br />
downregulated proteins in HCC hepatocytes (62 with 2- to 5-fold, 17 with 5- to 10-fold,<br />
and 15 with >10-fold changes).<br />
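The fold-change bins used to summarize the quantified proteins can be expressed as a small<br />
classifier. This is a minimal sketch, not the authors' software: the function names and the<br />
example ratios are hypothetical, and the bin boundaries follow the legend of Fig. 3 (ratio<br />
below 2, 2 to 5, above 5 to 10, and above 10, in either direction). The final lines only check<br />
that the counts reported in the text add up to the 261 quantified proteins.<br />

```python
from collections import Counter


def classify_ratio(hcc_over_non_hcc):
    """Assign one HCC/non-HCC abundance ratio to a fold-change bin."""
    if hcc_over_non_hcc <= 0:
        raise ValueError("ratio must be positive")
    if hcc_over_non_hcc >= 1.0:
        direction, fold = "up", hcc_over_non_hcc
    else:
        # Express downregulation as the inverse ratio (non-HCC/HCC).
        direction, fold = "down", 1.0 / hcc_over_non_hcc
    if fold < 2.0:
        return "unchanged (<2)"
    if fold <= 5.0:
        return f"{direction} 2-5"
    if fold <= 10.0:
        return f"{direction} 5-10"
    return f"{direction} >10"


def summarize(ratios):
    """Count how many ratios fall into each bin."""
    return Counter(classify_ratio(r) for r in ratios)


# Hypothetical ratios, for illustration only.
print(summarize([1.2, 2.0, 4.0, 8.0, 16.0, 0.5, 0.125, 0.0625]))

# Consistency check of the counts reported in the text and in Fig. 3.
reported_up = [32, 13, 10]
reported_down = [62, 17, 15]
reported_unchanged = 112
assert sum(reported_up) == 55 and sum(reported_down) == 94
assert sum(reported_up) + sum(reported_down) + reported_unchanged == 261
```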
3.6. Bioinformatics Analysis<br />
1. The pI and Mr of the proteins are analyzed using ExPASy proteomics tools<br />
accessed from http://cn.expasy.org/tools/#proteome. Examples of the results<br />
produced are shown in Table 1 and Fig. 5A and 5B.<br />
Fig. 3. Quantitative analysis results of 261 proteins from LCM-ICAT-2D-LC-MS/MS.<br />
A total of 149 differentially expressed proteins with at least twofold quantitative<br />
alterations between HCC and non-HCC hepatocytes were detected, including 55 upregulated<br />
proteins (32 with 2 ≤ Ratio(HCC/non-HCC) ≤ 5, 13 with 5 < Ratio(HCC/non-HCC) ≤ 10, and<br />
10 with Ratio(HCC/non-HCC) > 10) and 94 downregulated proteins in HCC hepatocytes<br />
(62 with 2 ≤ Ratio(non-HCC/HCC) ≤ 5, 17 with 5 < Ratio(non-HCC/HCC) ≤ 10, and<br />
15 with Ratio(non-HCC/HCC) > 10). The remaining 112 proteins had<br />
Ratio(HCC/non-HCC or non-HCC/HCC) < 2. Reprinted with permission from (34).<br />
Table 1<br />
Summary of Total Proteins Identified in HCC-NESP-1D-LC-MS/MS,<br />
HCC-NESP-2D-LC-MS/MS and HCC-LCM-2D-LC-MS/MS<br />
Characteristic                           HCC-NESP-1D-LC-MS/MS   HCC-NESP-2D-LC-MS/MS   HCC-LCM-2D-LC-MS/MS<br />
Protein quantity                         200 μg                 200 μg                 200 μg<br />
Total proteins identified                208                    626                    644<br />
Hydrophobic proteins                     25 (12.0%)             64 (10.2%)             80 (12.4%)<br />
Trans-membrane proteins                  8 (3.9%)               30 (4.8%)              54 (8.4%)<br />
Proteins with Mr > 100 kDa or < 10 kDa   19 (9.1%)              77 (12.3%)             75 (11.6%)<br />
Proteins with pI > 9                     21 (10.1%)             78 (12.5%)             126 (19.6%)<br />
2. The grand average of hydropathicity (GRAVY) score is calculated as the sum of the<br />
hydropathic indices of all amino acids in a sequence divided by the number of residues (32).<br />
Examples of the results produced are shown in Table 1 and Fig. 5C.<br />
3. The trans-membrane prediction is conducted using the computer server<br />
program TMHMM server 2.0, which can be accessed from the CBS<br />
(http://www.cbs.dtu.dk/services/TMHMM/). Examples of the results produced are<br />
shown in Table 1 and Fig. 5D.<br />
4. All identified proteins are classified by their molecular function, cellular<br />
component, and biological process with the tools on http://www.geneontology.org.<br />
An example of the results produced is shown in Fig. 4.<br />
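The GRAVY calculation described in step 2 can be sketched in a few lines. The example below<br />
assumes the Kyte-Doolittle hydropathy scale, which is the scale commonly used for GRAVY values;<br />
the chapter itself does not list the index values, so the table and the function name are<br />
assumptions made for illustration. The pI, Mr, and trans-membrane predictions in the other<br />
steps rely on the cited web tools and are not reproduced here.<br />

```python
# Kyte-Doolittle hydropathy indices (assumed scale; not listed in the chapter).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Sum of hydropathy indices divided by sequence length."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        total = sum(KYTE_DOOLITTLE[residue] for residue in seq)
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
    return total / len(seq)


# Hypothetical peptide, for illustration only.
print(round(gravy("MKTLLVAG"), 3))
```

A positive score indicates an overall hydrophobic sequence and a negative score a hydrophilic<br />
one; the cutoff used to call a protein hydrophobic is not stated in the text.<br />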
4. Notes<br />
1. Glutamine-free RPMI 1640 medium must be cold (4°C) before use. Washing<br />
should be done as quickly as possible, until no contaminants (blood,<br />
etc.) remain on the tissues. Glutamine-free RPMI 1640 medium can be replaced by PBS<br />
(pH 7.4), 0.9% NaCl solution, or any other isotonic buffer.<br />
2. Store the lysis buffer in small aliquots at –8°C to avoid multiple freeze-thaw<br />
cycles. Protease inhibitor tablet mixture (Roche Molecular Biochemicals) should<br />
be dissolved in the lysis buffer.<br />
3. Store the samples in small aliquots at –8°C to avoid multiple freeze-thaw cycles.<br />
Protein concentrations of the samples should be about 10 μg/μl for subsequent<br />
experiments.<br />
4. The sections should be stained very lightly with toluidine blue, just enough to<br />
distinguish hepatocytes during microdissection. Otherwise, excess stain could affect<br />
follow-up experiments.<br />
5. To reduce microdissection time, operators can choose either to<br />
capture hepatocytes or to remove other cells, depending on the condition of each section.<br />
Fig. 4. Classification of differentially expressed proteins obtained by<br />
LCM-ICAT-2D-LC-MS/MS. (A) shows proteins with at least twofold increased expression levels<br />
in HCC hepatocytes. (B) shows proteins with at least twofold decreased expression<br />
levels in HCC hepatocytes. Reprinted with permission from (34).<br />
6. Precipitation solution, acetone, and ethanol must be precooled to –20°C before use.<br />
7. Ultrafiltration is very important for removing excess salts, stain, and other<br />
impurities so that the follow-up steps work properly.<br />
8. TBP is a much stronger reducing agent than DTT for the ICAT labeling reaction, but it<br />
is also more toxic.<br />
Fig. 5. Characteristics of differentially expressed proteins obtained by<br />
LCM-ICAT-2D-LC-MS/MS. (A) shows the Mr distribution; (B) shows the pI distribution; (C)<br />
presents the hydrophile and hydrophobicity distribution; and (D) shows the transmembrane<br />
proteins (horizontal axis: number of trans-membrane regions). The vertical axis of each<br />
panel is the protein number. Reprinted with permission from (34).<br />
9. The LCQ ProteomeX Workstation (Thermo Electron, San Jose, CA) is an<br />
automated 2D LC/MS system that can be used for high-throughput proteomic<br />
research. However, other equipment can be used to separate the proteomic<br />
sample by offline SCX fractionation. The steps involved in offline SCX fractionation<br />
are almost the same as for online fractionation; the difference is that the<br />
step salt-eluted peptides must be loaded manually onto the RPC column.<br />
10. If you use the nanospray kit on the mass spectrometer and the 75-μm<br />
inner diameter RPC column, the eluted peptides can directly enter the mass<br />
spectrometer. The sensitivity in nanospray mode is higher than in metal<br />
needle mode.<br />
11. The protein identification criteria can vary based on the type of mass<br />
spectrometer or other analytic needs. For example, we use Delta CN (≥ 0.1) and<br />
Xcorr (singly charged ≥ 1.9, doubly charged ≥ 2.2, triply charged ≥ 3.75) as criteria<br />
when using LTQ linear ion trap mass spectrometer (Thermo Finnigan, San Jose,<br />
CA).<br />
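The identification criteria in Note 11 can be expressed as a simple filter. The following is a minimal sketch only: the function name, the tuple layout, and the example matches are hypothetical, and rejecting charge states that Note 11 does not list is an assumption of the sketch rather than a rule stated above.<br />

```python
# Charge-dependent Xcorr minimums quoted in Note 11 (LTQ linear ion trap).
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.1


def passes_criteria(charge, xcorr, delta_cn):
    """Return True if a peptide-spectrum match meets the Note 11 thresholds.

    Charge states not listed in Note 11 are rejected; this is an assumption
    of the sketch, not something the note specifies.
    """
    xcorr_min = XCORR_MIN.get(charge)
    if xcorr_min is None:
        return False
    return delta_cn >= DELTA_CN_MIN and xcorr >= xcorr_min


# Hypothetical matches: (peptide, charge, Xcorr, Delta CN)
matches = [
    ("PEPTIDEK", 2, 2.50, 0.15),
    ("SAMPLER", 1, 1.50, 0.20),
    ("ANOTHERK", 3, 3.80, 0.05),
]
accepted = [m[0] for m in matches if passes_criteria(m[1], m[2], m[3])]
print(accepted)
```

Other instruments or search engines would need their own thresholds, as the note itself points out.<br />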
Acknowledgments<br />
This work was supported by National High-Technology Project<br />
(2001AA233031, 2002BA711A11) and Basic Research Foundation<br />
(2001CB210501).
References<br />
1. Feitelson M.A., Sun B., Satiroglu Tufan N.L., Liu J., Pan J. and Lian Z. (2002)<br />
Genetic mechanisms of hepatocarcinogenesis. Oncogene 21, 2593–2604.<br />
2. Fujiyama S., Tanaka M., Maeda S., Ashihara H., Hirata R. and Tomita K. (2002)<br />
Tumor markers in early diagnosis, follow-up and management of patients with<br />
hepatocellular carcinoma. Oncology 62(Suppl 1), 57–63.<br />
3. Qin L.X. and Tang Z.Y. (2002) The prognostic molecular markers in hepatocellular<br />
carcinoma. World J Gastroenterol 8, 385–392.<br />
4. Park K.S., Cho S.Y., Kim H. and Paik Y.K. (2002) Proteomic alterations of the<br />
variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular<br />
carcinoma. Int J Cancer 97, 261–265.<br />
5. Park K.S., Kim H., Kim N.G., Cho S.Y., Choi K.H., Seong J.K. and Paik Y.K.<br />
(2002) Proteomic analysis and molecular characterization of tissue ferritin light<br />
chain in hepatocellular carcinoma. Hepatology 35, 1459–1466.<br />
6. Cho S.Y., Park K.S., Shim J.E., Kwon M.S., Joo K.H., Lee W.S., Chang J.,<br />
Kim H., Chung H.C., Kim H.O. and Paik Y.K. (2002) An integrated proteome<br />
database for two-dimensional electrophoresis data analysis and laboratory information<br />
management system. Proteomics 2, 1104–1113.<br />
7. Lim S.O., Park S.J., Kim W., Park S.G., Kim H.J., Kim Y.I., Sohn T.S., Noh J.H.<br />
and Jung G. (2002) Proteome analysis of hepatocellular carcinoma. Biochem<br />
Biophys Res Commun 291, 1031–1037.<br />
8. Kim J., Kim S.H., Lee S.U., Ha G.H., Kang D.G., Ha N.Y., Ahn J.S., Cho<br />
H.Y., Kang S.J., Lee Y.J., Hong S.C., Ha W.S., Bae J.M., Lee C.W. and<br />
Kim J.W. (2002) Proteome analysis of human liver tumor tissue by two-dimensional<br />
gel electrophoresis and matrix assisted laser desorption/ionization-mass<br />
spectrometry for identification of disease-related proteins. Electrophoresis 23,<br />
4142–4156.<br />
9. Franzen B., Hirano T., Okuzawa K., Uryu K., Alaiya A.A., Linder S. and<br />
Auer G. (1995) Sample preparation of human tumors prior to two-dimensional<br />
electrophoresis of proteins. Electrophoresis 16, 1087–1089.<br />
10. Emmert-Buck M.R., Bonner R.F., Smith P.D., Chuaqui R.F., Zhuang Z.,<br />
Goldstein S.R., Weiss R.A. and Liotta L.A. (1996) Laser capture microdissection.<br />
Science 274, 998–1001.<br />
11. Bonner R.F., Emmert-Buck M., Cole K., Pohida T., Chuaqui R., Goldstein S. and<br />
Liotta L.A. (1997) Laser capture microdissection: molecular analysis of tissue.<br />
Science 278, 1481–1483.<br />
12. Ornstein D.K., Gillespie J.W., Paweletz C.P., Duray P.H., Herring J., Vocke<br />
C.D., Topalian S.L., Bostwick D.G., Linehan W.M., Petricoin E.F., III and<br />
Emmert-Buck M.R. (2000) Proteomic analysis of laser capture microdissected<br />
human prostate cancer and in vitro prostate cell lines. Electrophoresis 21,<br />
2235–2242.<br />
13. Jones M.B., Krutzsch H., Shu H., Zhao Y., Liotta L.A., Kohn E.C. and<br />
Petricoin E.F., III (2002) Proteomic analysis and identification of new biomarkers<br />
and therapeutic targets for invasive ovarian cancer. Proteomics 2, 76–84.
14. Simone N.L., Remaley A.T., Charboneau L., Petricoin E.F., III, Glickman J.W.,<br />
Emmert-Buck M.R., Fleisher T.A. and Liotta L.A. (2000) Sensitive immunoassay<br />
of tissue cell proteins procured by laser capture microdissection. Am J Pathol 156,<br />
445–452.<br />
15. Ornstein D.K., Englert C., Gillespie J.W., Paweletz C.P., Linehan W.M., Emmert-<br />
Buck M.R. and Petricoin E.F., III (2000) Characterization of intracellular prostate-specific<br />
antigen from laser capture microdissected benign and malignant prostatic<br />
epithelium. Clin Cancer Res 6, 353–356.<br />
16. Sauter E.R., Zhu W., Fan X.J., Wassell R.P., Chervoneva I. and Du Bois G.C.<br />
(2002) Proteomic analysis of nipple aspirate fluid to detect biologic markers of<br />
breast cancer. Br J Cancer 86, 1440–1443.<br />
17. Verma M., Wright G.L., Jr., Hanash S.M., Gopal-Srivastava R. and Srivastava<br />
S. (2001) Proteomic approaches within the NCI early detection research network<br />
for the discovery and identification of cancer biomarkers. Ann N Y Acad Sci 945,<br />
103–115.<br />
18. Jain K.K. (2002) Recent advances in oncoproteomics. Curr Opin Mol Ther 4,<br />
203–209.<br />
19. Wright G.L., Jr., Cazares L.H., Leung S.M., Nasim S., Adam B.L., Yip T.T., Schellhammer<br />
P.F., Gong L. and Vlahou A. (1999) ProteinChip® surface enhanced laser<br />
desorption/ionization (SELDI) mass spectrometry: a novel protein biochip<br />
technology for detection of prostate cancer biomarkers in complex protein mixtures.<br />
Prostate Cancer Prostatic Dis 2, 264–276.<br />
20. Batorfi J., Ye B., Mok S.C., Cseh I., Berkowitz R.S. and Fulop V. (2003) Protein<br />
profiling of complete mole and normal placenta using ProteinChip analysis on<br />
laser capture microdissected cells. Gynecol Oncol 88, 424–428.<br />
21. Wulfkuhle J.D., Paweletz C.P., Steeg P.S., Petricoin E.F., III and Liotta L. (2003)<br />
Proteomic approaches to the diagnosis, treatment, and monitoring of cancer. Adv<br />
Exp Med Biol 532, 59–68.<br />
22. Seow T.K., Liang R.C., Leow C.K. and Chung M.C. (2001) Hepatocellular<br />
carcinoma: from bedside to proteomics. Proteomics 1, 1249–1263.<br />
23. Yu L.R., Shao X.X., Jiang W.L., Xu D., Chang Y.C., Xu Y.H. and Xia Q.C. (2001)<br />
Proteome alterations in human hepatoma cells transfected with antisense epidermal<br />
growth factor receptor sequence. Electrophoresis 22, 3001–3008.<br />
24. Yu L.R., Zeng R., Shao X.X., Wang N., Xu Y.H. and Xia Q.C. (2000) Identification<br />
of differentially expressed proteins between human hepatoma and normal liver cell<br />
lines by two-dimensional electrophoresis and liquid chromatography-ion trap mass<br />
spectrometry. Electrophoresis 21, 3058–3068.<br />
25. Ding S.J., Li Y., Tan Y.X., Jiang M.R., Tian B., Liu Y.K., Shao X.X., Ye S.L.,<br />
Wu J.R., Zeng R., Wang H.Y., Tang Z.Y. and Xia Q.C. (2004) From proteomic<br />
analysis to clinical significance: overexpression of cytokeratin 19 correlates with<br />
hepatocellular carcinoma metastasis. Mol Cell Proteomics 3(1), 73–81.<br />
26. Gygi S.P., Rist B., Gerber S.A., Turecek F., Gelb M.H. and Aebersold R. (1999)<br />
Quantitative analysis of complex protein mixtures using isotope-coded affinity<br />
tags. Nat Biotechnol 17, 994–999.
27. Li J., Steen H. and Gygi S.P. (2003) Protein profiling with cleavable isotope<br />
coded affinity tag (cICAT) reagents: the yeast salinity stress response. Mol Cell<br />
Proteomics 2(11), 1198–1204.<br />
28. Oda Y., Owa T., Sato T., Boucher B., Daniels S., Yamanaka H., Shinohara Y.,<br />
Yokoi A., Kuromitsu J. and Nagasu T. (2003) Quantitative chemical proteomics<br />
for identifying candidate drug targets. Anal Chem 75, 2159–2165.<br />
29. Hansen K.C., Schmitt-Ulms G., Chalkley R.J., Hirsch J., Baldwin M.A. and<br />
Burlingame A.L. (2003) Mass spectrometric analysis of protein mixtures at<br />
low levels using cleavable 13C-isotope-coded affinity tag and multidimensional<br />
chromatography. Mol Cell Proteomics 2, 299–314.<br />
30. Washburn M.P., Wolters D. and Yates J.R., III (2001) Large-scale analysis of<br />
the yeast proteome by multidimensional protein identification technology. Nat<br />
Biotechnol 19, 242–247.<br />
31. Gygi S.P., Corthals G.L., Zhang Y., Rochon Y. and Aebersold R. (2000) Evaluation<br />
of two-dimensional gel electrophoresis-based proteome analysis technology. Proc<br />
Natl Acad Sci USA 97, 9390–9395.<br />
32. Kyte J. and Doolittle R.F. (1982) A simple method for displaying the hydropathic<br />
character of a protein. J Mol Biol 157, 105–132.<br />
33. Craven R.A., Totty N., Harnden P., Selby P.J. and Banks R.E. (2002) Laser<br />
capture microdissection and two-dimensional polyacrylamide gel electrophoresis:<br />
evaluation of tissue preparation and sample limitations. Am J Pathol 160, 815–822.<br />
34. Li C., Hong Y., Tan Y.X., Zhou H., Ai J.H., Li S.J., Zhang L., Xia Q.C., Wu J.R.,<br />
Wang Y. and Zeng R. (2004) Accurate qualitative and quantitative proteomic<br />
analysis of clinical hepatocellular carcinoma using laser capture microdissection<br />
coupled with isotope-coded affinity tag and two-dimensional liquid chromatography<br />
mass spectrometry. Mol Cell Proteomics 3(4), 399–409.
12<br />
Label-Free LC-MS Method for the Identification<br />
of Biomarkers<br />
Richard E. Higgs, Michael D. Knierman, Valentina Gelfanova,<br />
Jon P. Butler, and John E. Hale<br />
Summary<br />
Pharmaceutical companies and regulatory agencies are pursuing biomarkers as a means<br />
to increase the productivity of drug development. Quantifying differential levels of proteins<br />
from complex biological samples like plasma or cerebrospinal fluid is one specific<br />
approach being used to identify markers of drug action, efficacy, toxicity, etc. Academic<br />
investigators are also interested in markers that are diagnostic or prognostic of disease<br />
states. We report a comprehensive, fully automated, and label-free approach to relative<br />
protein quantification including: sample preparation, proteolytic protein digestion, LC-<br />
MS/MS data acquisition, de-noising, mass and charge state estimation, chromatographic<br />
alignment, and peptide quantification via integration of extracted ion chromatograms.<br />
Additionally, we describe methods for transformation and normalization of the quantitative<br />
peptide levels in multiplexed measurements to improve precision for statistical analysis.<br />
Lastly, we outline how the described methods can be used to design and power biomarker<br />
discovery studies.<br />
Key Words: relative quantification; label-free quantification; biomarkers;<br />
proteomics; LC-MS/MS.<br />
1. Introduction<br />
Recent advances in analytical technology, particularly mass spectrometry,<br />
are finding broad applications in the search for biomarkers. Biomarkers may<br />
be defined as indicators of biological processes and encompass a variety of<br />
measures including imaging, polynucleotides, proteins, and small molecule<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
metabolites, among others. These new biomarker discovery activities are<br />
motivated by the need to improve diagnosis, guide targeted therapies, and<br />
monitor therapeutic efficacy and toxicity throughout a treatment regimen.<br />
Biomarkers of drug efficacy or toxicity have the potential to shorten the drug<br />
development timeline as they may provide early indications of a drug’s activity.<br />
This potential for increased drug development productivity from high-quality<br />
biomarkers has fueled increased attention from pharmaceutical and biotechnology<br />
companies and regulatory agencies alike (1,2). Within the field of protein biomarkers,<br />
mass spectrometry is playing a central role in the discovery of biomarkers from<br />
various biological sample matrices. Quantification of small organic molecules<br />
using extracted ion chromatograms (XICs) from liquid chromatography mass<br />
spectrometry (LC-MS) experiments has a long history in analytical chemistry.<br />
Similar techniques using LC-MS experiments with proteolytic protein digests<br />
are now routinely being applied to quantify peptide and protein levels in<br />
biological samples. Early LC-MS peptide quantification methods relied on the<br />
modification of peptides with reagents enriched in stable isotopes to introduce<br />
mass shifts in the peptides from one sample in order to compare relative<br />
peptide levels to another un-labeled sample (3,4). The number of biological<br />
samples required for statistical power in many applications, the restriction that<br />
study samples must be paired or pooled for these label-based methods, and the<br />
increased cost due to specialized reagents have limited their application and<br />
motivated the search for label-free methods of non-targeted protein profiling.<br />
We report here a comprehensive analytical system to collect and automatically<br />
process the data from non-targeted LC-MS/MS analyses of complex<br />
protein mixtures. In contrast to pattern-based (5,6), difference-based (7), or<br />
identification-based quantification methods (8,9), the approach presented here<br />
simply integrates the peptide parent ion current in order to obtain a relative<br />
peptide level in each study sample. No labeling or pooling of study samples<br />
is required. The output from this approach is an N × P table in which each<br />
of P peptides has been quantified in each of the N study samples. This table<br />
maximizes the flexibility in downstream statistical data analysis including transformation,<br />
normalization, and an analysis suited to the experimental design.<br />
The described method is based on the collective efforts of the applied biochemistry<br />
and statistics groups within Lilly Research Laboratories (10,11,12). As<br />
a broad-looking, discovery-oriented assay, it is important to note the limitations<br />
imposed by the approach. An assay designed to detect and quantify many<br />
analytes simultaneously compromises on sensitivity, selectivity, dynamic range,<br />
and absolute quantification relative to a targeted assay designed for a particular<br />
analyte. Ion suppression and co-elution of peptides from complex mixtures<br />
have the potential to interfere with the ion current attributed to a peptide, thus<br />
confounding any inference that may be made about the relative quantities of
the peptide. The limited dynamic range of these uncalibrated assays tends to<br />
underestimate the magnitude of a change in protein levels for peptides that do<br />
not lie near the linear portion of the instrument response curve. Nonetheless,<br />
these non-targeted methods have shown promise in identifying relative changes<br />
in protein levels that can be followed in subsequent studies using more targeted<br />
assays (e.g., multiple reaction monitoring) (13) to verify the findings in a new<br />
sample set.<br />
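As a concrete picture of the N × P output table mentioned earlier in this section, the sketch below pivots per-sample peptide quantities into a table with one row per study sample and one column per peptide. The function name, the sample and peptide labels, and the use of None for a peptide that was not quantified in a sample are all assumptions made for illustration; they are not part of the described system.<br />

```python
def build_table(records):
    """Pivot (sample, peptide, area) records into an N x P table.

    Returns (samples, peptides, rows), where rows[i][j] is the integrated
    ion current of peptide j in sample i, or None if it was not quantified.
    """
    samples = sorted({sample for sample, _, _ in records})
    peptides = sorted({peptide for _, peptide, _ in records})
    lookup = {(sample, peptide): area for sample, peptide, area in records}
    rows = [[lookup.get((s, p)) for p in peptides] for s in samples]
    return samples, peptides, rows


# Hypothetical integrated peak areas (arbitrary units).
records = [
    ("subject_1", "PEPTIDE_A", 1.2e6),
    ("subject_1", "PEPTIDE_B", 3.4e5),
    ("subject_2", "PEPTIDE_A", 9.8e5),
]
samples, peptides, rows = build_table(records)
for sample, row in zip(samples, rows):
    print(sample, row)
```

Keeping the table in this shape leaves the choice of transformation, normalization, and statistical model to the downstream analysis.<br />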
The described method focuses on biomarker discovery from human plasma<br />
and cerebrospinal fluid (CSF). Biomarker discovery from these fluids has<br />
proven challenging as the highly abundant proteins (e.g., albumin, IgG) are<br />
difficult to completely remove and tend to mask the detection of lower<br />
abundance proteins that may be directly associated with the biology of interest.<br />
However, the analytical and statistical methods described here are directly<br />
applicable to more targeted sample matrices (e.g., tissues) in both clinical and<br />
pre-clinical models that may increase the probability of technical success based<br />
on samples more directly associated with the biology of interest with fewer<br />
abundant, masking proteins to remove. Sample collection and handling procedures<br />
are critical in reducing the overall variability in biomarker discovery<br />
studies. Age, gender, diet, time of day, and medication may affect the plasma<br />
or CSF protein profile and should be considered in study designs. Similarly,<br />
consistent sample handling tailored to proteomic profiling (e.g., preservatives,<br />
rapid sample freezing, controlling for blood contamination in CSF sampling,<br />
and limiting the number of sample freeze-thaw cycles) is important for<br />
ensuring high-quality starting material. The proteome is arguably the class of<br />
biomolecules most modulated by disease, treatment, and toxicity, which is the<br />
basis of the promise of proteomics for biomarker discovery. Despite this promise and<br />
rapid advances in technology, progress has been slow (14,15). A refined strategy<br />
that applies the right technology at the appropriate stage of the biomarker<br />
discovery life cycle (16) has a high potential to fulfill this promise:<br />
(1) applying non-targeted, hypothesis-generating methods like those described here<br />
to sample matrices proximal to the biology, (2) using targeted MS assays to verify<br />
early discoveries in new sample sets, and (3) clinically validating the findings<br />
with established diagnostic assay formats (e.g., ELISAs).<br />
2. Materials<br />
2.1. Albumin/IgG Depletion<br />
1. Montage equilibration buffer, wash buffer, and columns are provided with the<br />
Montage Albumin Deplete Kit (Millipore®).<br />
2. Protein G-Sepharose (Amersham Biosciences®).<br />
2.2. Reduction, Alkylation, and Digestion<br />
1. Denaturing solution and internal standard: 8 M urea in 100 mM (NH4)2CO3 buffer<br />
containing chicken lysozyme (Sigma, St Louis, MO; 10.4 μg/mL), pH 11.0.<br />
2. Reduction/alkylation cocktail: 97.5% acetonitrile (ACN), 2% iodoethanol, and 0.5%<br />
triethylphosphine (v/v).<br />
3. Trypsin solution: TPCK-treated bovine pancreatic trypsin (Worthington,<br />
Lakewood, NJ) is dissolved at 1 mg/mL in H2O and stored in single-use aliquots<br />
at –80°C. Working solutions are prepared by diluting to 5 μg/mL in 100 mM<br />
ammonium bicarbonate, pH 8.0, prior to use.<br />
2.3. HPLC<br />
1. The C-18 reversed-phase column is a Zorbax SB300, 1 × 50 mm (Agilent).<br />
2. Solvent A: 0.1% formic acid (Aldrich) in water (Burdick and Jackson HPLC<br />
grade).<br />
3. Solvent B: 50% acetonitrile, 0.1% formic acid (Aldrich) in water (Burdick and<br />
Jackson HPLC grade).<br />
4. Solvent C: 80% acetonitrile, 0.1% formic acid (Aldrich) in water (Burdick and<br />
Jackson HPLC grade).<br />
2.4. Mass Spectrometry<br />
1. LTQ ion trap mass spectrometer (ThermoFinnigan).<br />
3. Methods<br />
3.1. Plasma Sample Preparation<br />
3.1.1. Albumin/IgG Depletion<br />
1. Dilute a 25 μL aliquot of plasma (1.25 mg protein assuming 50 mg/mL total<br />
protein concentration) with Montage equilibration buffer to a volume of 200 μL<br />
(see Note 1).<br />
2. Add 100 μL of a 50% protein G-Sepharose bead suspension and rock the mixture<br />
for 1 h at RT.<br />
3. Pellet the G-Sepharose beads at 2000 rpm for 2 min and transfer 200 μL of the<br />
effluent to a pre-equilibrated Montage column. Pre-equilibration is performed<br />
with 400 μL of equilibration buffer and centrifugation for 2 min at 500×g<br />
(see Note 2).<br />
4. Centrifuge the Montage column at 500×g for 2 min, re-apply the flow-through to<br />
the column, and centrifuge again. Pass two consecutive 200 μL washes of Montage<br />
wash buffer over the column via 500×g centrifugation for 2 min (final volume<br />
approximately 600 μL).<br />
3.1.2. Reduction, Alkylation, and Digestion<br />
1. Spike a 120 μL aliquot of the diluted and depleted plasma with 120 μL of the<br />
denaturing and internal standard solution (see Note 3).<br />
2. Add an equal volume (240 μL) of reduction/alkylation cocktail (see Note 4).<br />
3. Cap the solutions and incubate for 1 h at 37°C.<br />
4. Speed vacuum the solutions to dryness (at least 3 h).<br />
5. Re-dissolve the pellet in 600 μL of the working trypsin solution. Digest overnight<br />
at 37°C (17).<br />
3.2. Cerebrospinal Fluid Sample Preparation<br />
3.2.1. Albumin/IgG Depletion<br />
1. Dilute an aliquot of CSF (34 μg protein based on a Bradford total protein assay)<br />
with Montage equilibration buffer to a volume of 200 μL (see Note 5).<br />
2. Add 100 μL of a 50% protein G-Sepharose bead suspension and rock the mixture<br />
for 1 h at RT.<br />
3. Pellet the G-Sepharose beads at 2000 rpm for 2 min and transfer 200 μL of<br />
the effluent to a pre-equilibrated Montage column. Pre-equilibration is performed<br />
with 400 μL of equilibration buffer and centrifugation for 2 min at 500×g (see<br />
Note 2).<br />
4. Centrifuge the Montage column at 500×g for 2 min and re-apply the flow-thru to<br />
the column and centrifuge again. Pass two consecutive 200 μL washes of Montage<br />
wash buffer over the column via 500×g centrifugation for 2 min (final volume<br />
approximately 600 μL).<br />
3.2.2. Reduction, Alkylation, and Digestion<br />
1. Speed vacuum the CSF samples to approximately 30–50 μL and mix with 40 μL<br />
of the denaturing and internal standard solution (see Note 3).<br />
2. Add 100 μL of reduction/alkylation cocktail (see Note 4).<br />
3. Cap the solutions and incubate for 1 h at 37°C.<br />
4. Speed vacuum the solutions to dryness (at least 3 h).<br />
5. Re-dissolve the pellet in 600 μL of the working trypsin solution. Digest overnight<br />
at 37°C (17).<br />
3.3. HPLC Conditions<br />
1. A Surveyor autosampler and MS HPLC pump (ThermoFinnigan) are used for<br />
separation. Inject 100 μL of tryptic digest (4.2 μg plasma non-depleted equivalent protein<br />
or 14 μg CSF non-depleted equivalent protein) onto the reversed-phase column<br />
at a flow rate of 50 μL/min (see Note 6). The gradient conditions are: 10–95% B<br />
(90–5% A) over 120 min, followed by a 0.1 min ramp to 100% C, followed by<br />
5 min at 100% C, followed by a 0.1 min ramp to 10% B (90% A), and a hold for<br />
17 min at 10% B (90% A). The effluent is diverted to waste for the first 2 min<br />
to keep the mass spectrometer source clean.<br />
2. Between each sample in the set, an injection of water is made and a shortened<br />
(60 min) gradient, identical to the above, is performed to reduce carryover.<br />
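The gradient program in step 1 can be written out as a table of breakpoints, which also makes it easy to see the acetonitrile concentration actually reaching the column (solvent B is 50% and solvent C is 80% acetonitrile; see Subheading 2.3). This is a sketch for orientation only: it assumes that time zero is the start of the 120-min ramp and that every segment is linear, and the names are illustrative.<br />

```python
# Gradient breakpoints from step 1 as (time in min, %A, %B, %C).
# Assumption: t = 0 is the start of the 120-min ramp.
GRADIENT = [
    (0.0, 90.0, 10.0, 0.0),
    (120.0, 5.0, 95.0, 0.0),
    (120.1, 0.0, 0.0, 100.0),
    (125.1, 0.0, 0.0, 100.0),
    (125.2, 90.0, 10.0, 0.0),
    (142.2, 90.0, 10.0, 0.0),
]

# Acetonitrile content of solvents A, B, and C (Subheading 2.3), in percent.
ACETONITRILE = (0.0, 50.0, 80.0)


def composition(t):
    """Return (%A, %B, %C) at time t (min) by linear interpolation."""
    if t <= GRADIENT[0][0]:
        return GRADIENT[0][1:]
    for start, end in zip(GRADIENT, GRADIENT[1:]):
        t0, t1 = start[0], end[0]
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return tuple(a + frac * (b - a) for a, b in zip(start[1:], end[1:]))
    return GRADIENT[-1][1:]


def percent_acetonitrile(t):
    """Acetonitrile percentage delivered to the column at time t (min)."""
    return sum(pct * acn / 100.0 for pct, acn in zip(composition(t), ACETONITRILE))


for minute in (0, 60, 120, 123, 130):
    print(minute, round(percent_acetonitrile(minute), 2))
```

Under these assumptions the full program runs for 142.2 min per study sample, and the analytical ramp spans 5% to 47.5% acetonitrile.<br />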
3.4. Mass Spectrometer Conditions<br />
1. The total column effluent (50 μL/min) is connected to the electrospray interface<br />
of the ion trap mass spectrometer.<br />
2. The source is operated in positive ion mode with a 4.8 kV electrospray potential,<br />
a sheath gas flow of 20 arbitrary units, and a capillary temperature of 225°C. The<br />
source lenses should be set by maximizing the ion current for the 2+ charge state<br />
of angiotensin.<br />
3. Data are collected in the triple play mode with the following parameters: centroid<br />
parent scan set to one microscan and 50 ms maximum injection time, profile<br />
zoom scan set to three microscans and 500 ms maximum injection time, and a<br />
centroid MS/MS scan set to two microscans and 2000 ms maximum injection<br />
time (see Note 7).<br />
4. Dynamic exclusion settings are set to a repeat count of one, exclusion list duration<br />
of 2 min, and rejection widths of –0.75 m/z and +2.0 m/z.<br />
5. Collisional activation is carried out with relative collision energy of 35% and an<br />
exclusion width of 3 m/z.<br />
6. Study samples should be injected in a random order to reduce any effects of<br />
carryover or confounding with a non-random injection order (see Note 8).<br />
7. All water blank samples should be analyzed by the mass spectrometer in the same<br />
manner as study samples in order to monitor carryover (see Note 9).<br />
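The randomized injection order of step 6, together with the water blank run between study samples (Subheading 3.3), can be generated in a few lines. This is a minimal sketch: the function name, the sample labels, and the fixed seed (used only so that the run sheet can be regenerated) are assumptions, not part of the published procedure.<br />

```python
import random


def injection_sequence(sample_ids, seed=None):
    """Shuffle study samples and place a water blank between injections."""
    rng = random.Random(seed)
    order = list(sample_ids)
    rng.shuffle(order)
    sequence = []
    for index, sample in enumerate(order):
        if index > 0:
            sequence.append("water_blank")
        sequence.append(sample)
    return sequence


# Hypothetical study with three treated and three control subjects.
study_samples = ["treated_1", "treated_2", "treated_3",
                 "control_1", "control_2", "control_3"]
for position, name in enumerate(injection_sequence(study_samples, seed=2024), 1):
    print(position, name)
```

Shuffling all samples together, rather than within each group, keeps treatment group from being confounded with run order.<br />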
3.5. Zoom Scan Data Processing<br />
The data collected from a zoom scan triple-play experiment are used to<br />
estimate the quality of the subsequent MS/MS spectrum, the charge state of<br />
the peptide, and the monoisotopic and average mass of the peptide. The quality<br />
estimate is used to eliminate those scan events that are triggered by noise<br />
or small molecules from further downstream processing. Peptide mass and<br />
charge state estimates are used in subsequent steps for peptide identification.<br />
Eliminating low-quality scan events and more accurately estimating the charge<br />
state and mass of peptides ultimately reduces the number of false positives that<br />
must be dealt with at the peptide identification stage of the process.<br />
1. Assume the charge state of the detected peptide is 1+.<br />
2. Given the m/z of the scan event and the assumed charge state, estimate the<br />
theoretical isotope distribution intensities for a peptide of the hypothesized mass<br />
using the relationships given in Fig. 1 (see Note 10). Begin by determining the<br />
relative intensity of the $^{12}$C peak ($I_0$) using the relationship in Fig. 1A and the<br />
MW for the assumed charge state. Next, estimate the relative peak intensity of<br />
the $^{13}$C peak ($I_1$) by multiplying the estimate of $I_0$ by the $I_1/I_0$ ratio from Fig. 1B<br />
using the MW for the assumed charge state. Isotope intensities $I_2$ and $I_3$ are<br />
derived in a similar manner using the ratios from Fig. 1C–D at the MW for the<br />
assumed charge state.<br />
3. Convolve the estimated theoretical isotope stick spectrum with a Gaussian peak<br />
shape that has a peak width similar to that produced in a typical zoom scan<br />
spectrum (18). Linearly scale the result of this convolution such that the maximum<br />
value is one.<br />
[Fig. 1, panels A–D: isotope intensity ratios plotted against monoisotopic MW, 500–2500.]<br />
Fig. 1. Empirically derived relationships (from 15,493 example peptides) between<br />
isotope peak intensities used to estimate the theoretical isotope pattern for a peptide.<br />
(A) $I_0/\max(I_0,I_1,I_2,I_3)$, non-linear least squares fit:<br />
$$\frac{I_0}{\max(I_0,I_1,I_2,I_3)} = \begin{cases} 1 & \text{if } MW < 1800 \\ e^{-0.00132\,(MW-1800)^{0.865}} & \text{if } MW \ge 1800 \end{cases}$$<br />
(B) $I_1/I_0$, linear least squares fit:<br />
$$I_1/I_0 = -0.00498 + 0.000560\,MW$$<br />
(C) $I_2/I_0$, linear least squares fit:<br />
$$I_2/I_0 = -0.367 + 0.000516\,MW + 1.59\times10^{-7}\,(MW - 1527.34)^2$$<br />
(D) $I_3/I_0$, non-linear least squares fit:<br />
$$I_3/I_0 = 0.0000605\,e^{\,0.00251\,MW - 2.70\times10^{-7}\,MW^2}$$<br />
Reprinted with permission from (10).
4. Convolve the result from step 3 above with the measured zoom scan to obtain the<br />
matched filter output between the expected zoom scan spectrum from the assumed<br />
charge state and the measured zoom scan spectrum. Record the maximum value<br />
of the output of this convolution along with the x-axis (m/z) value where the<br />
maximum occurred.<br />
5. Repeat steps 2–4 above for assumed charge states of 2+, 3+, and 4+. The<br />
detected peptide charge state and mass are estimated from the best match between<br />
the observed zoom scan spectrum and the theoretically derived spectrum for<br />
the possible charge states of 1+, 2+, 3+, and 4+. The cross-correlation between<br />
the best matching theoretical isotope pattern at the m/z shift value associated<br />
with the convolution maximum and the measured zoom scan is used as an<br />
intensity-independent matching score between the measured and the best matching<br />
theoretical spectrum. Triple play events with a cross-correlation score greater<br />
than 0.6 are retained for identification. Triple plays below this threshold represent<br />
scans that are not peptides, a mixture of several peptides in the ion trap, or<br />
very low signal-to-noise measurements. These lower quality scan events are not<br />
retained for any further processing.<br />
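The charge-state procedure in steps 1–5 can be sketched in code as follows. This is an illustration under several stated assumptions, not the authors' implementation: a Poisson isotope model with mean MW/1800 stands in for the empirical fits of Fig. 1 (it reproduces only the crossover of $I_0$ and $I_1$ near 1800 Da), the Gaussian peak width (0.15 m/z FWHM), the ±1 m/z search range, and the grid spacing are guesses, and the matched filter is evaluated by sliding the template explicitly and scoring each position with a normalized (intensity-independent) cross-correlation. All function names are hypothetical.<br />

```python
import math

import numpy as np

PROTON_MASS = 1.00728      # Da
ISOTOPE_SPACING = 1.00335  # Da, mass difference between 13C and 12C


def isotope_pattern(mw, n_peaks=4):
    """Relative intensities I0..I3, scaled so that the largest is 1.

    Stand-in model (assumption): Poisson with mean mw / 1800.
    """
    lam = max(mw, 0.0) / 1800.0
    raw = np.array([lam ** k / math.factorial(k) for k in range(n_peaks)])
    return raw / raw.max()


def template(mz_axis, mono_mz, charge, fwhm=0.15):
    """Isotope sticks for one charge state broadened by a Gaussian peak shape."""
    mw = (mono_mz - PROTON_MASS) * charge
    sigma = fwhm / 2.3548
    out = np.zeros_like(mz_axis, dtype=float)
    for k, height in enumerate(isotope_pattern(mw)):
        centre = mono_mz + k * ISOTOPE_SPACING / charge
        out += height * np.exp(-0.5 * ((mz_axis - centre) / sigma) ** 2)
    peak = out.max()
    return out / peak if peak > 0 else out


def best_match(mz_axis, intensity, trigger_mz, charge, max_shift=1.0, step=0.02):
    """Slide the template over the scan; return (best score, monoisotopic m/z)."""
    best_score, best_mz = -1.0, trigger_mz
    norm_scan = np.linalg.norm(intensity)
    for shift in np.arange(-max_shift, max_shift + step / 2, step):
        candidate = template(mz_axis, trigger_mz + shift, charge)
        denom = np.linalg.norm(candidate) * norm_scan
        score = float(candidate @ intensity) / denom if denom > 0 else 0.0
        if score > best_score:
            best_score, best_mz = score, trigger_mz + shift
    return best_score, best_mz


def assign_charge(mz_axis, intensity, trigger_mz, threshold=0.6):
    """Try charge states 1+ to 4+ and keep the best-matching one."""
    scores = {z: best_match(mz_axis, intensity, trigger_mz, z) for z in (1, 2, 3, 4)}
    charge = max(scores, key=lambda z: scores[z][0])
    score, mono_mz = scores[charge]
    return {
        "charge": charge,
        "mono_mz": mono_mz,
        "mono_mass": (mono_mz - PROTON_MASS) * charge,
        "score": score,
        "retained": score > threshold,
    }


# Synthetic zoom scan of a doubly charged peptide, triggered on its second isotope peak.
mz_axis = np.arange(796.0, 806.0, 0.01)
scan = 5000.0 * template(mz_axis, 800.40, 2)
result = assign_charge(mz_axis, scan, trigger_mz=800.90)
print(result["charge"], round(result["mono_mz"], 2), round(result["score"], 3))
```

Because the score is normalized, a weak but clean isotope envelope scores as well as an intense one, which is the property the 0.6 threshold in step 5 relies on.<br />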
3.6. MS/MS Spectral Filtering<br />
In order to reduce the effect of MS/MS noise peaks on the identification of<br />
peptides, a dynamic MS/MS noise level is estimated for each spectrum. This<br />
noise level estimate is then subtracted from all MS/MS peak intensities with<br />
any resulting differences less than zero set to zero. The spectral noise level is<br />
estimated based on the observation that ideal MS/MS spectra of peptides have<br />
relatively few peaks (e.g., y-ions, b-ions, adducts, etc.) in a theoretical or high<br />
signal-to-noise ratio spectrum, while noisy MS/MS spectra typically have a<br />
high density of peaks within a local m/z neighborhood (interpreted as chemical<br />
noise). Therefore, the filtering approach uses a percentile of the peak intensities<br />
within a local m/z neighborhood as the noise estimate, where the percentile<br />
used is based on the density of peaks in the neighborhood – a higher peak<br />
density results in a higher percentile to estimate the local noise level, a lower<br />
peak density results in a lower percentile to estimate the local noise level.<br />
1. Bin the MS/MS spectrum into a vector of equally spaced m/z values (bin width<br />
of 0.1 m/z).<br />
2. At 200 equally spaced m/z design points between the minimum and<br />
maximum m/z values observed in the MS/MS spectrum, estimate the<br />
local peak density by counting the number of non-zero intensities in a ±20 m/z<br />
window around each of the 200 design points. Define the local peak density at<br />
these 200 design points as the number of non-zero peaks counted divided by 40<br />
(peaks per m/z).<br />
3. Transform the local peak density values to a filtering percentile value using the<br />
relationship shown in Fig. 2.
Fig. 2. Filtering percentile as a function of local MS/MS peak density. Peak density<br />
is defined as the number of MS/MS peaks in a 40 m/z window divided by 40.<br />
$$\text{Filtering Percentile} = \begin{cases} 0 & \text{if PeakDensity} \le 0.1 \\ \dfrac{0.75}{1 + e^{(0.15 - \text{PeakDensity})/0.05}} & \text{if PeakDensity} > 0.1 \end{cases}$$<br />
Reprinted with permission from (10).<br />
4. Obtain an initial noise level estimate by the percentile of MS/MS peak intensities<br />
at each of the 200 design points, where the percentile used at each point is derived<br />
from step 3 above (see Note 11).<br />
5. Smooth the initial noise estimates with a Gaussian kernel smoother (150 m/z<br />
bandwidth) and interpolate between the 200 design points to obtain the final<br />
MS/MS noise estimate at each measured m/z value. Subtract this estimate from<br />
the measured MS/MS peak intensities and set any negative values to zero. An<br />
example of a high and low signal-to-noise MS/MS spectrum and the resulting<br />
estimated noise levels is shown in Fig. 3.<br />
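The five filtering steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, intensities falling in the same 0.1 m/z bin are summed, the 150 m/z bandwidth is treated as the standard deviation of the Gaussian kernel, and a filtering percentile of 0 is read literally as the smallest non-zero local intensity.<br />

```python
import numpy as np

def filtering_percentile(density):
    """Map local peak density (peaks per m/z) to a percentile fraction (Fig. 2)."""
    density = np.asarray(density, dtype=float)
    pct = 0.75 / (1.0 + np.exp((0.15 - density) / 0.05))
    return np.where(density <= 0.1, 0.0, pct)

def estimate_noise(mz, intensity, bin_width=0.1, n_design=200,
                   half_window=20.0, bandwidth=150.0):
    """Return (denoised_intensity, noise_estimate) for one MS/MS stick spectrum."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Step 1: bin onto an equally spaced m/z grid (assumption: sum per bin).
    idx = np.floor((mz - mz.min()) / bin_width).astype(int)
    binned = np.bincount(idx, weights=intensity)
    grid = mz.min() + bin_width * np.arange(binned.size)

    # Steps 2-4: local density -> percentile -> initial noise per design point.
    design = np.linspace(mz.min(), mz.max(), n_design)
    noise0 = np.zeros(n_design)
    for i, d in enumerate(design):
        local = binned[np.abs(grid - d) <= half_window]
        local = local[local > 0]
        if local.size:
            density = local.size / (2.0 * half_window)
            q = 100.0 * float(filtering_percentile(density))
            noise0[i] = np.percentile(local, q)

    # Step 5: Gaussian kernel smooth over the design points, interpolate to the
    # measured m/z values, subtract, and set negative values to zero.
    w = np.exp(-0.5 * ((design[:, None] - design[None, :]) / bandwidth) ** 2)
    smooth = (w @ noise0) / w.sum(axis=1)
    noise = np.interp(mz, design, smooth)
    return np.clip(intensity - noise, 0.0, None), noise
```

Calling `estimate_noise(mz, intensity)` on the peak list of a single spectrum returns the noise-subtracted intensities together with the noise level at each measured m/z value; peaks whose returned intensity is zero fall at or below the estimated noise.<br />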
Fig. 3. Example MS/MS spectra (intensity vs. m/z) and their estimated noise levels. 443 original peaks<br />
reduced to 118 peaks above estimated noise level in high-noise spectrum (A). 589<br />
original peaks reduced to 173 peaks above estimated noise level in lower noise spectrum<br />
(B). Reprinted with permission from (10).<br />
3.7. Peptide Identification<br />
A detailed description of peptide identification is beyond the scope of this<br />
chapter, but some general discussion is warranted given the importance of the<br />
subject and its linkage to quantification with the proposed method. The primary<br />
problem with peptide identification is controlling for false-positive identifications<br />
while maintaining a reasonable sensitivity to detect correct identifications.<br />
Our approach utilizes the outputs of two search engines, Sequest (19) and<br />
X! Tandem (20), along with other descriptive features of identification (e.g.,<br />
charge state, peptide length, etc.) as inputs to a classifier that has been trained<br />
to identify correct identifications (21). The output of the classifier provides a<br />
unit-less score indicative of the likelihood of a correct identification. False-positive<br />
identifications are controlled by running the searches against reversed<br />
versions of the protein databases and estimating p-values: the probability<br />
of observing a classifier score from the reversed database search that exceeds<br />
the observed score from the correct database. P-values alone are insufficient<br />
because of the large number of tests (identifications) being done (i.e., with a 0.05<br />
p-value cutoff, about 5% of all MS/MS spectra would be declared correctly identified<br />
in the null condition where there are truly no matches to any MS/MS spectra).<br />
To account for multiple testing, false discovery rates (FDRs) (q-values) for
peptide identifications are estimated from p-values using the method described<br />
by Benjamini and Hochberg (22). Peptides with identification q-values less than<br />
a threshold, say 0.10, are retained for quantification. Proteins identified by only<br />
one peptide are visually examined to eliminate obvious incorrect identifications<br />
(e.g., fewer than four consecutive y- or b-ions). We estimate that the proportion of<br />
false identifications using such a procedure is less than or equal to 2%. Overall,<br />
the method is similar in strategy to PeptideProphet (23) with the following<br />
extensions: multiple search engines are employed, a more flexible classifier<br />
(e.g., Random Forests) is used, and statistical significance is estimated from a<br />
null distribution of classifier scores derived from reversed database searching<br />
instead of fitting a mixture model to the distribution of classifier output scores.<br />
The method is described in detail in Higgs et al. (11).<br />
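The significance estimate described above can be illustrated with a short Python sketch: empirical p-values from a null distribution of reversed-database classifier scores, followed by Benjamini and Hochberg q-values. The function names are hypothetical, and the "+1" in the p-value (which avoids p-values of exactly zero) is an assumption of this sketch rather than something stated in the chapter.<br />

```python
import numpy as np

def decoy_p_values(target_scores, decoy_scores):
    """Empirical p-value: fraction of reversed-database scores >= each observed score."""
    target = np.asarray(target_scores, dtype=float)
    decoy = np.sort(np.asarray(decoy_scores, dtype=float))
    # Count of decoy scores greater than or equal to each target score.
    n_ge = decoy.size - np.searchsorted(decoy, target, side="left")
    # Assumption: add-one correction so that no p-value is exactly 0.
    return (n_ge + 1.0) / (decoy.size + 1.0)

def bh_q_values(p):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

# Usage: keep identifications whose q-value is below the chosen threshold.
# p = decoy_p_values(forward_scores, reversed_scores)
# keep = bh_q_values(p) < 0.10
```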
We typically restrict biomarker hypothesis generation to identified<br />
peptides. The same relative quantification method can be used with unidentified<br />
peptides (MS features), although in practice these features need to be identified<br />
to be of practical use to clinicians and biologists. To maximize the coverage<br />
of proteins identified in a study, identifications from all samples in the study<br />
are pooled and used to create a list of peptides to quantify in each sample.<br />
Thus, a confident identification needs to be made in only one sample in order<br />
for the associated peptide ion current to be quantified in all study samples.<br />
Pooling the identifications across all samples in a study significantly increases<br />
the number of identifications relative to the number of identifications from any<br />
single sample.<br />
3.8. Chromatographic Alignment<br />
Variability in the abundance of individual peptides between different samples<br />
may result in that peptide triggering an MS/MS scan in one sample and not in<br />
another. The area of this peptide may still be extracted from the primary mass<br />
spectrum in each sample. However, doing so requires high-quality chromatographic<br />
alignment between the samples so that a consistent region in the<br />
extracted ion chromatogram (XIC) is used for integration across all samples in<br />
a study. Large biomarker studies can produce chromatographic retention time<br />
shifts greater than 1 min between pairs of samples run several days and many<br />
samples apart. Simply expanding the integration window by 1 or 2 min to<br />
account for chromatographic variability is not an option in our experience as<br />
we are analyzing complex samples with multiple co-eluting peaks at most XIC<br />
masses. An expanded integration window that includes multiple peaks masks<br />
the quantification of individual peptides, produces results that are confounded<br />
with multiple peptides contributing to a value, and increases variability. Peak<br />
picking is another option, but was not applied here due to the computational
cost as well as the inherent heuristic nature of peak picking algorithms with<br />
an associated variability in what is being integrated. We have found a simple<br />
pair-wise alignment between all samples and a select reference sample in the<br />
study to work well for numerous biomarker discovery projects. This approach<br />
to alignment is founded on the following assumptions: (a) the samples included<br />
in the study are generally quite similar to each other with respect to their peptide<br />
content (i.e., there are many peptides or landmarks in common between the<br />
samples), (b) the same chromatographic conditions are used for each sample in<br />
the study, and (c) in a local region of retention time, the retention time offset<br />
between any two samples is approximately constant (see Note 12).<br />
1. Identify the landmarks in the reference sample by taking all triple-play scan events<br />
with a zoom scan cross-correlation score of 0.65 or greater. This set of reference<br />
sample landmarks will be matched against other samples in the study.<br />
2. Identify the matching landmarks in a study sample by declaring a landmark match<br />
if the sample and reference triple-play events meet all of the following criteria: (a) the retention times of the<br />
triple-play events in the two samples are within a user-specified amount (5 min) of each other,<br />
(b) the charge state of the peptide matches, (c) the m/z value of the monoisotopic<br />
peak from the zoom scans is within a user-specified amount (0.7 Da) between the<br />
two samples, (d) the zoom scan cross-correlation coefficient of both peptides to<br />
their respective theoretical isotope patterns exceeds a threshold (0.65), and (e) the<br />
similarity between the corresponding MS/MS spectra exceeds a threshold (e.g.,<br />
0.75). The MS/MS similarity metric has been implemented as a cross-correlation<br />
coefficient between two MS/MS spectra following a convolution of each MS/MS<br />
stick-spectrum with a Gaussian peak shape.<br />
3. For each matching pair of landmarks identified in step 2 above, generate the<br />
XIC for the feature in a local retention time window (e.g., ±5 min of scan event<br />
time in each sample). Convolve the two XICs to identify the time shift value that<br />
maximizes the convolution result between the landmark XICs in both samples.<br />
Record the time shift and cross-correlation at the optimal shift value for each<br />
landmark. The cross-correlation value will be used as a weighting factor in the<br />
subsequent smoothing step below.<br />
4. The optimal time shift values for each pair of landmarks between a sample and the<br />
reference defines a warping function that can be used to transform the retention<br />
time values of a sample to the reference. Estimate a smooth warping function<br />
by fitting a weighted loess (24) to the time shift versus retention time values<br />
for each sample. The loess should be done in a weighted manner using the XIC<br />
cross-correlation values from step 3 above as weights. The result is a smooth<br />
function that can be used to transform a sample’s retention time to a common<br />
time defined by the reference sample (Fig. 4).<br />
5. The loess warping function for a sample is then applied to all the retention times<br />
in the chromatogram (landmark or not). Thus, all samples in a study are projected<br />
onto the same retention time scale. The warping function between two samples is<br />
generally not monotonic over the entire retention time range, and no restriction<br />
on overall monotonicity is used in our estimate of the warping function. We<br />
do, however, preserve the overall rank order of the retention times following<br />
alignment by constraining the bandwidth (span = 0.5) used in the loess fitting<br />
(24) (see Note 13).<br />
Fig. 4. Example chromatographic alignment (“warping”) function between two rat<br />
serum samples. Retention time shift (min) vs. retention time (min) for 462 landmark<br />
peptides are plotted with the resulting loess fit. Reprinted with permission from (10).<br />
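Steps 3 to 5 of the alignment can be sketched as follows. This is an illustration under assumptions, not the authors' code: both XICs are assumed to be sampled on the same uniform time grid, the shift is found by cross-correlating the mean-centered XICs, and the weighted loess is approximated by a local linear fit with tricube distance weights multiplied by the landmark cross-correlation weights. The function names are hypothetical.<br />

```python
import numpy as np

def best_shift(xic_ref, xic_sample, dt):
    """Time shift of the sample XIC relative to the reference (positive means
    the sample elutes later) and the normalized cross-correlation at that shift."""
    a = np.asarray(xic_ref, dtype=float)
    b = np.asarray(xic_sample, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    cc = np.correlate(b, a, mode="full")
    lags = np.arange(-(a.size - 1), b.size)   # lag (in samples) of each cc entry
    k = int(np.argmax(cc))
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return lags[k] * dt, (cc[k] / denom if denom > 0 else 0.0)

def weighted_loess(x, y, w, x_eval, span=0.5):
    """Local linear smoother: tricube distance weights times user weights w."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    x_eval = np.asarray(x_eval, dtype=float)
    k = max(2, int(np.ceil(span * x.size)))   # points inside each local window
    fitted = np.empty(x_eval.size)
    for i, x0 in enumerate(x_eval):
        d = np.abs(x - x0)
        h = np.sort(d)[k - 1]
        h = h if h > 0 else 1.0
        tricube = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        sw = np.sqrt(tricube * w)
        design = np.column_stack([np.ones_like(x), x - x0]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fitted[i] = beta[0]                   # fitted shift at x0
    return fitted

# Usage: shifts and weights come from best_shift() on each landmark pair.
# aligned_rt = sample_rt - weighted_loess(landmark_rt, shifts, weights, sample_rt)
```

Subtracting the smoothed shift from every retention time in the sample projects the whole chromatogram onto the reference time scale, whether or not a given time point corresponds to a landmark.<br />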
3.9. Peptide Quantification<br />
Relative quantification of peptides is carried out by integration of the XIC<br />
peak (using normalized retention times from the chromatographic alignment)<br />
from the primary mass spectrum within each sample. A list of peptides to<br />
integrate within each sample is constructed by pooling together all triple-play<br />
events across all the samples. This pooling can be done with or without the use<br />
of peptide identification. As previously noted, we typically restrict the analyses<br />
to identified peptides. For each identified peptide, perform the following<br />
steps:<br />
1. For each sample in which the peptide was identified, extract the XIC for the<br />
peptide and compute the centroid (weighted average of retention time values<br />
where weighting factor is the XIC ion current) of the XIC in a small retention<br />
time neighborhood (–0.5 min to +1.0 min from triple-play trigger time) using the<br />
aligned time values in the XIC. Compute the mean centroid time for the peptide<br />
over all samples in which the peptide was identified. Also compute the mean of the<br />
average m/z values estimated from the zoom scan spectra over all samples in<br />
which the peptide was identified.<br />
2. For each sample in the study, create an XIC for the peptide using the mean zoom<br />
scan average m/z value determined in step 1.<br />
3. Estimate a local XIC baseline level and subtract the baseline from the XIC<br />
intensity values from each sample. A local linear baseline can be estimated by<br />
fitting a line between the lowest intensity XIC point before the peak and the lowest<br />
intensity XIC point following the peak in a local neighborhood (e.g., 5 min).<br />
This simple local linear baseline estimate always lies at or<br />
below the signal intensity in the local neighborhood, so the<br />
estimated baseline is biased low. For large peaks, this bias is negligible but for small peaks<br />
the bias may have a more pronounced effect on quantification. Alternatively,<br />
an asymmetric least squares smoothing approach may be used to estimate the<br />
baseline XIC values in order to reduce the potential bias with the simple local<br />
linear approach (25).<br />
4. A fixed retention time window (±0.5 min for the chromatography described)<br />
around the mean centroid time value described in step 1 is used for integration.<br />
The width of this window is dependent on the chromatography method used.<br />
For the chromatography method reported here, the peak width remains relatively<br />
constant across the HPLC gradient (i.e., no band-broadening is observed).<br />
If band-broadening is observed, then the integration window width should<br />
be modeled as a function of the retention time (e.g., integration window<br />
width = intercept + slope × retention time).<br />
5. Integrate the baseline corrected XIC values within the fixed retention time window<br />
for each sample in the study using a numerical integration algorithm such as the<br />
trapezoid rule. Record the XIC area values for each peptide in each sample. An<br />
example of XIC integration for a small study is shown in Fig. 5.<br />
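The centroid, baseline, and integration steps above can be sketched in Python. This is a minimal illustration with hypothetical function names. It assumes the XIC is given as aligned retention times `t` (min) and intensities `y`, reads the "local neighborhood (e.g., 5 min)" as 5 min on either side of the peak, assumes XIC points exist on both sides of the integration window, and sets negative baseline-corrected values to zero (the last choice is not stated in the chapter).<br />

```python
import numpy as np

def xic_centroid(t, y, trigger_time, lo=-0.5, hi=1.0):
    """Intensity-weighted mean retention time near the triple-play trigger time."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    m = (t >= trigger_time + lo) & (t <= trigger_time + hi)
    return float(np.sum(t[m] * y[m]) / np.sum(y[m]))

def linear_baseline(t, y, center, half_width=0.5, neighborhood=5.0):
    """Line through the lowest XIC point before and the lowest point after the peak."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    before = (t >= center - neighborhood) & (t < center - half_width)
    after = (t > center + half_width) & (t <= center + neighborhood)
    i0 = np.flatnonzero(before)[np.argmin(y[before])]
    i1 = np.flatnonzero(after)[np.argmin(y[after])]
    slope = (y[i1] - y[i0]) / (t[i1] - t[i0])
    return y[i0] + slope * (t - t[i0])

def xic_area(t, y, center, half_width=0.5):
    """Baseline-corrected area in a fixed window around the mean centroid time."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    corrected = np.clip(y - linear_baseline(t, y, center, half_width), 0.0, None)
    m = (t >= center - half_width) & (t <= center + half_width)
    tt, yy = t[m], corrected[m]
    # Trapezoid rule written out explicitly.
    return float(np.sum(0.5 * (yy[1:] + yy[:-1]) * np.diff(tt)))
```

In a full workflow, `xic_centroid` would be evaluated only in samples where the peptide was identified, the centroids averaged, and `xic_area` then called with that mean centroid for every sample in the study.<br />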
3.10. Data Transformation and Normalization<br />
Following the integration of peptide-specific XIC peaks in all study samples,<br />
we have a rectangular data table with N rows corresponding to N samples in<br />
the study, and P columns corresponding to peptides detected in the study. The<br />
cell values in this data table are the peptide peak areas. With this table in hand,<br />
the usual operations of transformation and normalization may be applied prior<br />
to any statistical analysis.<br />
1. Peptide peak areas are approximately log-normal distributed. Apply a log2 transformation<br />
to all peak area values (see Note 14).<br />
2. Normalize the log2 transformed peak areas using a quantile normalization<br />
procedure (26) (see Note 15).<br />
3. Normalized log2 peptide areas may be used directly as input to the statistical<br />
analysis for the study (peptide level analysis). Additionally, the average<br />
of normalized log2 peptide areas for all the peptides identified from a protein<br />
can be used as an overall estimate of the protein level (protein level analysis,<br />
see Note 16).<br />
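A compact sketch of these three steps, assuming the N × P table is a NumPy array with samples in rows and peptides in columns and contains no missing or zero areas. The function names are hypothetical, and ties are broken arbitrarily by the sort, which a production implementation of quantile normalization (26) would handle more carefully.<br />

```python
import numpy as np

def quantile_normalize(log_areas):
    """Force every sample (row) to share the same distribution of values."""
    x = np.asarray(log_areas, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=1), axis=1)   # rank of each peptide within its sample
    reference = np.sort(x, axis=1).mean(axis=0)         # mean of each order statistic across samples
    return reference[ranks]

def protein_levels(normalized, protein_of_peptide):
    """Average the normalized log2 peptide areas belonging to each protein."""
    labels = np.asarray(protein_of_peptide)
    proteins = sorted(set(protein_of_peptide))
    table = np.column_stack([normalized[:, labels == p].mean(axis=1) for p in proteins])
    return proteins, table

# Usage, given a positive N x P array of peak areas and one protein label per column:
# normalized = quantile_normalize(np.log2(areas))
# proteins, protein_table = protein_levels(normalized, protein_labels)
```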
Fig. 5. XICs from the 2+ α-1 macroglobulin peptide ATPLSLCALTAVDQSVLLLKPEAK<br />
for eight rat serum samples following chromatographic alignment. Note that<br />
the peak from all samples fits within the highlighted [83.2, 84.2] integration region.<br />
Reprinted with permission from (10).
3.11. Study Design, Power, Sample Size, and Analysis<br />
Our strategy of producing an N × P table of relative peptide levels allows<br />
the flexibility for the analysis to be done in a manner consistent with the<br />
study design. Note that no part of the described method imposes any limitation<br />
on the final study statistical analysis (e.g., pooling of samples, subtractive- or<br />
difference-based methods, etc.). In general, the statistical analysis used for<br />
identifying potential protein biomarkers in a study should follow the same<br />
approach as a primary clinical endpoint analysis would take (i.e., a simple<br />
paired design should be analyzed with a paired t-test, a crossover design with<br />
repeated measures within period should be analyzed as a crossover study with<br />
repeated measures within period, etc.).<br />
An analysis of a single clinical endpoint may use the familiar type I error<br />
threshold of 0.05 as a measure of statistical significance. This approach does not<br />
work well when testing hundreds or thousands of proteins in a study because,<br />
on average, 5% of all tests from a null experiment (an experiment in which<br />
there is truly no treatment or group effect) will have a p-value less than 0.05.<br />
The Bonferroni approach to control the family-wise type I error (controlling<br />
for no errors in the set of declared changes) has been commonly employed as<br />
a means to control false-positive findings (27). However, many investigators<br />
doing proteomic hypothesis generation are willing to tolerate some level of false-positive<br />
findings in a declared set as long as it is relatively low and estimated.<br />
The use of FDR as a means to identify a set of declared findings with a<br />
specified proportion of false-positives has been widely applied in genomics (22)<br />
and is the current recommendation for proteomic hypothesis generating experiments.<br />
There are numerous estimators of FDR (28,29) with the original method<br />
described by Benjamini and Hochberg used in the work presented here (22).<br />
Just as multiple comparisons should be considered in the analysis of study<br />
data, these should also be considered at the design stage of a new study<br />
aimed at generating hypotheses from highly multiplexed measurements like<br />
proteomics. This is a relatively new field of research with several methods<br />
recently reported (30,31,32,33). A simple approach originally suggested by<br />
Benjamini and Hochberg (22), and adapted by Bemis (34), uses traditional<br />
sample-size calculations with the following expression for average type I error<br />
($\alpha_{ave}$) over a set of tested hypotheses:<br />
\[ \alpha_{ave} = \frac{f_{ave}\, q^{*}\, m_1}{m_1 + m_0 (1 - q^{*})} \]
where $f_{ave}$ is the<br />
average power of hypothesis tests conducted in a study, $q^{*}$ is the rate at which<br />
FDR is to be controlled, $m_0$ is the number of true null hypotheses tested, and $m_1$<br />
is the number of true alternative hypotheses tested. Sample-size estimates are<br />
made by first estimating $\alpha_{ave}$ using the desired values for $f_{ave}$ and $q^{*}$, assumed<br />
values for $m_0$ and $m_1$, and then using existing sample-size calculators with $\alpha_{ave}$ for a given<br />
study design. An example set of sample-size curves using this approach<br />
for the two-sample t-test design is given in Fig. 6.<br />
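As a worked illustration, the average type I error can be computed and passed to a standard sample-size formula. The sketch below uses only the Python standard library. The `n_per_group` helper is a hypothetical stand-in for an "existing sample-size calculator": it uses the normal approximation to the two-sided two-sample t-test on the natural-log scale and converts a CV to a log-scale standard deviation under a log-normal assumption, so it is not the calculation behind Fig. 6 and will slightly underestimate exact t-test sample sizes for small groups.<br />

```python
from math import ceil, log, sqrt
from statistics import NormalDist

def alpha_ave(f_ave, q_star, m0, m1):
    """Average type I error implied by controlling FDR at q_star with power f_ave."""
    return f_ave * q_star * m1 / (m1 + m0 * (1.0 - q_star))

def n_per_group(fold_change, cv, alpha, power):
    """Approximate subjects per group for a two-sided two-sample comparison."""
    z = NormalDist()
    sigma = sqrt(log(1.0 + cv ** 2))    # log-scale SD implied by the CV (assumption)
    delta = abs(log(fold_change))       # effect size on the log scale
    n = 2.0 * ((z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)) * sigma / delta) ** 2
    return ceil(n)

# Settings quoted for Fig. 6: 85% power, FDR of 0.10, 98% true null hypotheses.
a = alpha_ave(f_ave=0.85, q_star=0.10, m0=98, m1=2)   # 0.17 / 90.2, about 0.0019
n = n_per_group(fold_change=1.5, cv=0.20, alpha=a, power=0.85)
```

Only the ratio of $m_0$ to $m_1$ matters in the expression, so the values 98 and 2 stand in for any split with 98% true null hypotheses.<br />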
Fig. 6. Estimated sample size required to detect protein changes in a two-sample<br />
t-test design. Number of subjects in each of the two groups is plotted against the<br />
detectable effect size expressed as a fold-change. Four different levels of total variability<br />
are shown (10% CV, 20% CV, 30% CV, and 40% CV). Sample size estimates were<br />
made using 85% power, a 0.10 target FDR for declaring significance, and an estimated<br />
proportion of true null hypotheses, $m_0/(m_0 + m_1)$, set to 0.98.<br />
4. Notes<br />
1. We find that plasma total protein concentration, as measured by a Bradford<br />
assay, has a total coefficient of variation (CV) of approximately 11% (includes<br />
inter-subject, intra-subject, and assay error) and ranges between approximately<br />
48 and 68 mg/mL (12). Due to the apparent highly regulated plasma total protein<br />
concentration, it is not generally necessary to measure total protein concentration<br />
for each sample in a study in order to load a consistent amount of protein.<br />
2. The depletion material used is based on a dye affinity removal method for<br />
albumin. There are commercially available antibody-based depletion kits that<br />
may improve albumin removal at a reasonable cost. Abundant protein depletion<br />
is an open and active research area at the time of this writing.<br />
3. Chicken lysozyme is added as a spiked internal standard at this stage in order<br />
to qualitatively assess the digestion efficiency as well as to quantitatively assess<br />
the measurement error across the samples in a study. Other internal standard(s)<br />
could also be used.<br />
4. The reduction/alkylation solution should be prepared just before use.<br />
Triethylphosphine is pyrophoric and should be handled in a fume hood in accordance<br />
with the material safety data sheet. The use of volatile reagents for this step
reduces the variability in the sample prep by minimizing sample handling steps<br />
and removing the majority of reduction and alkylating reagents. The digestion is<br />
performed with trypsin, which is sensitive to the presence of reducing reagents.<br />
5. We find that CSF total protein concentration, as measured by a Bradford assay,<br />
has a total CV of approximately 27% (includes inter-subject, intra-subject, and<br />
assay error) with a range between approximately<br />
0.12 and 0.41 mg/mL (12). The higher overall variability relative to plasma total<br />
protein is attributed to a significantly higher CSF inter-subject variability<br />
(12). Due to the higher variability with CSF total protein, we use the results of<br />
the Bradford total protein assay to process a consistent total CSF protein amount in<br />
the proteomics assay.<br />
6. The HPLC pumps must be capable of producing a smooth gradient at 50 μL/min.<br />
The gradient formation should be verified by using water in A and 1% acetone<br />
in water for B and running the gradient with UV monitoring at 254 nm. New<br />
HPLC columns should be conditioned with at least four runs of digested serum<br />
before use in the method.<br />
7. The mass spectrometer’s source should be carefully cleaned to minimize chemical<br />
noise. Monitor above 300 m/z and try to maximize the injection time as this is<br />
directly proportional to achievable dynamic range in an ion trap mass spectrometer.<br />
The spray conditions should be optimized for a peptide of about 1700 Da.<br />
8. Alternatively, a design could be used to balance various study factors (e.g.,<br />
treatment, gender, age, etc.) with injection order. This approach may be<br />
most appropriate for small studies (e.g.,
the +3 $^{13}$C isotopic peak. The 15,493 example peptides were then used to derive<br />
relationships for $I_0/\max(I_0, I_1, I_2, I_3)$, $I_1/I_0$, $I_2/I_0$, and $I_3/I_0$ as functions of the<br />
peptide monoisotopic molecular weight (Fig. 1).<br />
11. Percentile transformation is done to define the noise level as the Xth percentile<br />
of the peak intensities in a local m/z neighborhood where X is dependent on<br />
the peak density in the neighborhood (higher peak density → higher percentile →<br />
higher estimated noise level).<br />
12. One potential improvement to this alignment strategy would be to create a<br />
composite list of landmarks across all study samples instead of relying on a single<br />
sample to serve as the retention time reference. This could easily be accomplished<br />
by grouping or clustering landmarks from all samples enforcing a match on m/z,<br />
charge state, retention time, and MS/MS spectral similarity. This has not been<br />
employed yet due to the increased computational cost and the lack of data demonstrating<br />
any significant problems with the single reference sample approach. In<br />
practice, several different samples are evaluated as potential alignment reference<br />
samples, and the best sample based on a qualitative assessment of the alignment<br />
warping functions is chosen.<br />
13. A visual examination of the alignment warping functions for all samples included<br />
in a study is an effective means to detect and diagnose chromatography problems<br />
encountered in the analysis of dozens of study samples. For example, oscillatory<br />
warping functions have been associated with pump mixing problems while large<br />
magnitude mostly linear warping functions have been associated with column<br />
degradation.<br />
14. Log2 is convenient because a unit change can be interpreted as a twofold change<br />
on the original scale.<br />
15. Normalization can be particularly important for minimizing systematic biases in<br />
ion current introduced by sample collection and handling, sample concentration,<br />
instrument sensitivity drift during the course of data acquisition, etc. The spiked<br />
internal standard, chicken lysozyme, can be helpful in diagnosing and monitoring<br />
ion intensities before and after normalization. Quantile normalization assumes<br />
that the overall distribution of log2 peptide peak areas is unchanged from sample<br />
to sample. This is generally a reasonable assumption, but there are cases where<br />
a treatment effect may modulate the level of most of the proteins detected in<br />
a study, and in such cases quantile normalization should not be used. In these<br />
cases, the spiked internal standard, chicken lysozyme, can be used to normalize<br />
systematic effects of the process on ion current, but only those arising after the<br />
standard was spiked.<br />
16. In practice, we will analyze a study at both the peptide and protein levels.<br />
Peptide-level analyses are generally specific to the identified peptide and allow<br />
the opportunity to discover biologically related changes in peptide level due<br />
to processing of a specific region of a protein. Protein-level analyses provide<br />
additional statistical power to detect smaller magnitude changes in protein levels<br />
since we are averaging multiple peptide values, all of which have a high positive<br />
covariance.
Acknowledgments<br />
We thank John Saalwaechter and Andrew Kaczorek and the entire scientific<br />
computing team for their efforts in developing and maintaining a high-availability<br />
grid-computing environment used for this work. We also thank<br />
Jude Onyia and the statistical and mathematical sciences management team for<br />
supporting us in the development of these methods.<br />
References<br />
1. FDA Critical Path Initiative 2006 (http://www.fda.gov/oc/initiatives/criticalpath).<br />
2. NIH Road Map for Medical Research 2006 (http://www.nihroadmap.nih.gov/<br />
index.asp).<br />
3. Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., and Aebersold, R. 1999.<br />
Quantitative analysis of complex protein mixtures using isotope-coded affinity<br />
tags. Nat. Biotechnol. 17: 994–999.<br />
4. Aggarwal, K., Choe, L.H., and Lee, K.H. 2006. Shotgun proteomics using the<br />
iTRAQ isobaric tags. Brief. Funct. Genomic. Proteomic. 5: 112–120.<br />
5. Petricoin, E.F., Ardekani, A.M., Hitt, B.A., Levine, P.J., Fusaro, V.A.,<br />
Steinberg, S.M., Mills, G.B., Simone, C., Fishman, D.A., Kohn, E.C. et al. 2002. Use<br />
of proteomic patterns in serum to identify ovarian cancer. Lancet 359: 572–577.<br />
6. Radulovic, D., Jelveh, S., Ryu, S., Hamilton, T.G., Foss, E., Mao, Y., and Emili, A.<br />
2004. Informatics platform for global proteomic profiling and biomarker discovery<br />
using liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics 3:<br />
984–997.<br />
7. Wiener, M.C., Sachs, J.R., Deyanova, E.G., and Yates, N.A. 2004. Differential<br />
mass spectrometry: a label-free LC-MS method for finding significant differences<br />
in complex peptide and protein mixtures. Anal. Chem. 76: 6085–6096.<br />
8. Gao, J., Opiteck, G.J., Friedrichs, M.S., Dongre, A.R., and Hefta, S.A. 2003.<br />
Changes in the protein expression of yeast as a function of carbon source.<br />
J. Proteome. Res. 2: 643–649.<br />
9. Colinge, J., Chiappe, D., Lagache, S., Moniatte, M., and Bougueleret, L. 2005.<br />
Differential Proteomics via probabilistic peptide identification scores. Anal. Chem.<br />
77: 596–606.<br />
10. Higgs, R.E., Knierman, M.D., Gelfanova, V., Butler, J.P., and Hale, J.E. 2005.<br />
Comprehensive label-free method for the relative quantification of proteins from<br />
biological samples. J. Proteome. Res. 4: 1442–1450.<br />
11. Higgs, R.E., Knierman, M.D., Freeman, A.B., Gelbert, L.M., Patil, S.T., and<br />
Hale, J.E. 2007. Estimating the statistical significance of peptide identifications<br />
from shotgun proteomics experiments. J. Proteome. Res. 6: 1758–1767.<br />
12. Patil, S.T., Higgs, R.E., Brandt, J.E., Knierman, M.D., Gelfanova, V., Butler, J.P.,<br />
Downing, A.M., Dorocke, J., Dean, R.A., Potter, W.Z. et al. 2007. Identifying<br />
pharmacodynamic protein markers of centrally active drugs in humans: a pilot<br />
study in a novel clinical model. J. Proteome. Res. 6: 955–966.
13. Anderson, L., and Hunter, C.L. 2006. Quantitative mass spectrometric multiple<br />
reaction monitoring assays for major plasma proteins. Mol Cell Proteomics 5:<br />
573–588.<br />
14. Anderson, N.L., and Anderson, N.G. 2002. The human plasma proteome: history,<br />
character, and diagnostic prospects. Mol Cell Proteomics 1: 845–867.<br />
15. Gutman, S., and Kessler, L.G. 2006. The US Food and Drug Administration<br />
perspective on cancer biomarker development. Nat. Rev. Cancer 6: 565–571.<br />
16. Rifai, N., Gillette, M.A., and Carr, S.A. 2006. Protein biomarker discovery and<br />
validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24:<br />
971–983.<br />
17. Hale, J.E., Butler, J.P., Gelfanova, V., You, J.S., and Knierman, M.D. 2004.<br />
A simplified procedure for the reduction and alkylation of cysteine residues in<br />
proteins prior to proteolytic digestion and mass spectral analysis. Anal. Biochem.<br />
333: 174–181.<br />
18. Proakis, J.G., and Manolakis, D.G. 1992. Digital Signal Processing – Principles,<br />
Algorithms and Applications. Prentice Hall, New York, NY.<br />
19. Eng, J.K., McCormack, A.L., and Yates, J.R. 1994. An approach to correlate tandem<br />
mass spectral data of peptides with amino acid sequences in a protein database.<br />
J. Am. Soc. Mass Spectrom. 5: 976–989.<br />
20. Craig, R., and Beavis, R.C. 2003. A method for reducing the time required to match<br />
protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom. 17:<br />
2310–2316.<br />
21. Ulintz, P.J., Zhu, J., Qin, Z.S., and Andrews, P.C. 2006. Improved classification<br />
of mass spectrometry database search results using newer machine learning<br />
approaches. Mol Cell Proteomics 5: 497–509.<br />
22. Benjamini, Y., and Hochberg, Y. 1995. Controlling the false discovery rate: a<br />
practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B<br />
57: 289–300.<br />
23. Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. 2002. Empirical statistical<br />
model to estimate the accuracy of peptide identifications made by MS/MS<br />
and database search. Anal. Chem. 74: 5383–5392.<br />
24. Cleveland, W.S., Grosse, E., and Shyu, W.M. 1992. Local regression models.<br />
In Statistical Models in S. J.M. Chambers and T.J. Hastie, eds. Wadsworth &<br />
Brooks/Cole, Pacific Grove, CA.<br />
25. Boelens, H.F., Dijkstra, R.J., Eilers, P.H., Fitzpatrick, F., and Westerhuis, J.A. 2004.<br />
New background correction method for liquid chromatography with diode array<br />
detection, infrared spectroscopic detection and Raman spectroscopic detection. J.<br />
Chromatogr. A 1057: 21–30.<br />
26. Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. 2003. A comparison<br />
of normalization methods for high density oligonucleotide array data based on<br />
variance and bias. Bioinformatics 19: 185–193.<br />
27. Miller, R.G., Jr. 1991. Simultaneous Statistical Inference. Springer-Verlag,<br />
New York.
28. Butler, K.W., Deslauriers, R., Geoffrion, Y., Storey, J.M., Storey, K.B., Smith, I.C.,<br />
and Somorjai, R.L. 1985. 31P nuclear magnetic resonance studies of crayfish<br />
(Orconectes virilis). The use of inversion spin transfer to monitor enzyme kinetics<br />
in vivo. Eur. J. Biochem. 149: 79–83.<br />
29. Efron, B. 2004. Large-scale simultaneous hypothesis testing: the choice of a null<br />
distribution. J. Am. Stat. Assoc. 99: 96–104.<br />
30. Pounds, S., and Cheng, C. 2005. Sample size determination for the false discovery<br />
rate. Bioinformatics 21: 4263–4271.<br />
31. Hu, J., Zou, F., and Wright, F.A. 2005. Practical FDR-based sample size calculations<br />
in microarray experiments. Bioinformatics 21: 3264–3272.<br />
32. Jung, S.H. 2005. Sample size for FDR-control in microarray data analysis. Bioinformatics<br />
21: 3097–3104.<br />
33. Li, S.S., Bigler, J., Lampe, J.W., Potter, J.D., and Feng, Z. 2005. FDR-controlling<br />
testing procedures and sample size determination for microarrays. Stat. Med. 24:<br />
2267–2280.<br />
34. Bemis, K.G. 2005. Statistical Issues with Mass Spectrometry Proteomics for<br />
Biomarker Discovery. In International Workshop on Statistical Methodology in<br />
Clinical and Nonclinical R&DDIA conference, Nice, France.
13<br />
Analysis of the Extracellular Matrix and Secreted Vesicle<br />
Proteomes by Mass Spectrometry<br />
Zhen Xiao, Thomas P. Conrads, George R. Beck, Jr.,<br />
and Timothy D. Veenstra<br />
Summary<br />
The extracellular matrix (ECM) and secreted vesicles are unique structures outside of<br />
cells that carry out dynamic biological functions. ECM is created by most cell types and<br />
is responsible for the three-dimensional structure of the tissue or organ in which it<br />
originates. Many cells also produce or secrete specialized vesicles into the ECM,<br />
which are thought to influence the extracellular environment. ECM is not simply a physical<br />
structure that connects cells in a tissue or organ. The proteins in ECM and secreted vesicles<br />
are critical to cell function, differentiation, motility, and cell-to-cell interaction. Although<br />
a number of major structural proteins of ECM and secreted vesicles have long been<br />
known, an appreciation of the role of less-abundant non-collagenous proteins has just<br />
begun to emerge. This chapter outlines a series of methods used to isolate and enrich<br />
ECM constituents and secreted vesicles from bone-forming osteoblast cells, enabling<br />
comprehensive profiles of their proteomes to be obtained by mass spectrometry. These<br />
methods can be easily adapted to study ECM and secreted vesicles in other cell types,<br />
primary cell cultures derived from animal models, or tissue specimens.<br />
Key Words: extracellular matrix; matrix vesicle; osteoblast; proteomics; mass<br />
spectrometry.<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
1. Introduction<br />
Most cells reside in a matrix environment called the extracellular matrix<br />
(ECM), which offers the structural and nutritional support as well as a protective<br />
barrier required for cells to survive, interact, and differentiate. In addition to<br />
the intracellular and tissue-related processes, it is becoming increasingly clear<br />
that alterations in the ECM can affect the pathogenesis of disease. While<br />
much effort has been devoted to the understanding of intracellular processes,<br />
the characteristics and functions of ECM have not been equally well studied.<br />
The evidence gathered to date has shown that ECM is a complicated organelle<br />
formed of various proteins that play central roles in cell differentiation,<br />
migration, and cell-to-cell communication (1,2,3). The complexity of ECM is<br />
exemplified in the structure of the skeleton. The formation and homeostasis of<br />
bone is an ongoing process throughout life, and involves the recruitment, replication,<br />
and differentiation of osteoblasts and osteoclasts (4). Osteoblasts are<br />
derived from mesenchymal stem cells and have the potential to further develop<br />
into either osteocytes or lining cells. When induced by the appropriate stimuli,<br />
such as ascorbic acid and β-glycerophosphate, osteoblasts undergo proliferation<br />
and maturation toward the osteocyte phenotype (Fig. 1) (5). This process is<br />
accompanied by the accumulation of an ECM and ultimately mineralization of<br />
the ECM in the form of hydroxyapatite (6). The deposition of hydroxyapatite<br />
in ECM is initiated by a unique type of vesicle secreted by osteoblasts, called<br />
matrix vesicles (MVs). With diameters ranging from 30–300 nm, these vesicles<br />
reside in the ECM and play a critical role in mineralization (7,8). They serve<br />
as nucleation sites for mineralization and sustain the accumulation of ECM (9).<br />
A number of proteins, such as annexins and phosphatases, have been identified<br />
within MVs. These proteins are responsible for the enrichment of calcium<br />
and phosphate within the vesicles (8,10,11,12,13). Although the presence and<br />
function of other proteins are largely unknown, changes in ECM and MV<br />
proteins are associated with diseases such as osteoporosis (14), arteriosclerosis<br />
(15,16,17,18), tumor development, and metastasis (19,20,21,22). A comprehensive<br />
profile of the proteins present in these extracellular organelles enables<br />
a greater understanding of pathophysiology underlying these clinical manifestations.<br />
The development of mass spectrometry (MS) technology combined with<br />
appropriate protein enrichment and peptide separation strategies has made this<br />
aim achievable (23,24,25,26).<br />
Fig. 1. The three-stage timeline of the osteoblast cell differentiation. The mineral<br />
deposition is visualized by alizarin red staining of the osteoblasts cultured in the<br />
differentiation medium.<br />
This chapter describes the extraction of ECM constituents and MVs from<br />
an osteoblast cell line MC3T3-E1 followed by the analysis of their respective<br />
proteomic profiles by liquid chromatography (LC) fractionation combined with<br />
MS analysis (27). The ECM and MVs are isolated and enriched using centrifugation<br />
and enzymatic approaches. The enrichment of MVs is confirmed by the<br />
measurement of elevated alkaline phosphatase (ALP) activity. Following the<br />
creation of a complex mixture of peptides via a tryptic digestion of the extracted<br />
proteins, this mixture is fractionated using strong cation exchange (SCX) LC.<br />
These fractions are analyzed by nanoflow reversed-phase LC-tandem mass<br />
spectrometry (nanoRPLC-MS/MS), and proteins are identified by searching the<br />
data against an appropriate proteomic database.<br />
2. Materials<br />
2.1. Cell Culture<br />
1. MC3T3-E1 pre-osteoblast cell line (see Note 1)<br />
2. Cell culture medium MEM (Irvine Scientific, Santa Ana, CA)<br />
3. Fetal bovine serum (Atlanta Biologicals, Atlanta, GA)<br />
4. Penicillin-streptomycin solution (10,000 I.U./ml penicillin, 10,000 μg/ml streptomycin)<br />
(Invitrogen Corp., Carlsbad, CA)<br />
5. 200 mM L-glutamine (Invitrogen Corp.)<br />
6. Growth medium: MEM supplemented with 10% fetal bovine serum, 50 U/ml<br />
penicillin, 50 μg/ml streptomycin, and 2 mM L-glutamine<br />
7. Differentiation medium: growth medium supplemented with 50 μg/ml ascorbic<br />
acid (Sigma Chemical Co., St. Louis, MO) and 10 mM β-glycerophosphate (Sigma<br />
Chemical Co.)<br />
8. Phosphate-buffered saline (PBS)<br />
9. Trypsin/EDTA (0.25% (w/v) trypsin/0.53 mM EDTA solution in Hank’s BSS<br />
without calcium or magnesium) (ATCC, Manassas, VA)<br />
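The supplement concentrations listed for the growth medium follow from the stock concentrations by the usual dilution relation C1 × V1 = C2 × V2. The sketch below works through that arithmetic in Python. The 500 ml batch volume and the function name stock_volume_ml are assumptions made for illustration; the chapter does not state a batch size.

```python
def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Return the stock volume (ml) that gives final_conc in final_volume_ml.

    Uses C1 * V1 = C2 * V2; stock_conc and final_conc must share one unit.
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_conc * final_volume_ml / stock_conc


BATCH_ML = 500.0  # assumed batch volume; not specified in the chapter

# Penicillin-streptomycin stock (10,000 I.U./ml and 10,000 ug/ml)
# diluted to 50 U/ml penicillin and 50 ug/ml streptomycin.
pen_strep_ml = stock_volume_ml(10_000, 50, BATCH_ML)

# L-glutamine stock (200 mM) diluted to 2 mM.
glutamine_ml = stock_volume_ml(200, 2, BATCH_ML)

# Fetal bovine serum at 10% (v/v).
fbs_ml = 0.10 * BATCH_ML

# MEM makes up the remaining volume.
mem_ml = BATCH_ML - (pen_strep_ml + glutamine_ml + fbs_ml)

print(f"pen-strep {pen_strep_ml} ml, L-glutamine {glutamine_ml} ml, "
      f"FBS {fbs_ml} ml, MEM {mem_ml} ml")
```

Because the penicillin-streptomycin stock carries both components at the same nominal concentration, a single volume sets both final concentrations.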
2.2. Extraction of the ECM Constituents<br />
1. Liberase/blendzyme 1 (0.14 Wünsch units/ml) (Roche Applied Science, Indianapolis,<br />
IN)<br />
2. Centrifuge<br />
3. Bicinchoninic acid (BCA) protein assay reagent kit (Pierce, Rockford, IL)
2.3. Enrichment of MVs from the ECM<br />
1. Liberase/blendzyme 1 (0.14 Wünsch units/ml) (Roche Applied Science, Indianapolis,<br />
IN)<br />
2. Centrifuge<br />
2.4. Isolation of MVs from Medium<br />
1. Ultra-Clear centrifuge tubes: 1 × 3.5 in. (38 ml) and 5/8 × 4 in. (17 ml) (Beckman,<br />
Palo Alto, CA)<br />
2. Optima L-90K preparative ultracentrifuge (Beckman Coulter, Inc., Palo Alto, CA)<br />
2.5. Alkaline Phosphatase Assay<br />
1. Mild lysis buffer: 250 mM NaCl, 50 mM HEPES, pH 7.5, 0.1% NP-40<br />
2. ALP assay kit, including alkaline buffer (1.5 mM 2-amino-2-methyl-1-propanol,<br />
pH 10.3), p-nitrophenyl phosphate (PNPP) (4 mg/ml) and p-nitrophenol (PNP)<br />
standard solution (10 μmol/ml) (Sigma, St. Louis, MO)<br />
3. Flat bottom 96-well plate<br />
4. Lumimark microplate reader (Bio-Rad, Hercules, CA)<br />
2.6. Strong Cation Exchange Liquid Chromatography of Peptides<br />
1. Trypsin Gold, mass spectrometry grade (Promega, Madison, WI)<br />
2. 25% (v/v) acetonitrile containing 0.1% (v/v) formic acid<br />
3. SCX-LC column (1 mm × 150 mm, polysulfoethyl A) (PolyLC, Columbia, MD)<br />
4. Mobile phase A: 25% (v/v) acetonitrile<br />
5. Mobile phase B: 25% (v/v) acetonitrile containing 0.5 M ammonium formate, pH 3<br />
6. 0.1% (v/v) formic acid<br />
7. Vacuum centrifuge<br />
8. Laser-induced fluorescence (LIF) detector<br />
Fig. 2. Transmission electron microscopic image of matrix vesicles in the ultracentrifuge<br />
pellets (A). The high magnification image (B) shows fine-needle deposits and<br />
black dots, likely signs of calcification, both inside and around the vesicles. Also note<br />
the bilayer membrane of the vesicles (arrowhead).<br />
2.7. Nanoflow Reversed-phase Liquid Chromatography Tandem Mass<br />
Spectrometry<br />
1. Slurry packer model 1666 (Alltech, Columbia, MD)<br />
2. Ceramic cutter<br />
3. 75 μm i.d. × 360 μm o.d. × 12 cm long fused silica capillary column (Polymicro<br />
Technologies, Phoenix, AZ)<br />
4. 5 μm, 300 Å pore size C-18 silica-bonded stationary RP particles (Jupiter,<br />
Phenomenex, Torrance, CA)<br />
5. Agilent 1100 nanoLC system (Agilent Technologies, Palo Alto, CA) coupled with<br />
a linear ion-trap (LIT) mass spectrometer (LTQ, ThermoElectron, San Jose, CA)<br />
6. Glass sample injection vials 12 × 32 mm (Wheaton, Millville, NJ)<br />
7. Mobile phase A: 0.1% (v/v) formic acid<br />
8. Mobile phase B: 0.1% formic acid (v/v) in acetonitrile<br />
2.8. Bioinformatic Analysis<br />
1. 20-node Beowulf cluster computer server<br />
2. SEQUEST Cluster version 3.1 SR1 (Thermo Electron Corp., Waltham, MA)<br />
3. Bioworks Browser software 3.2 (Thermo Electron Corp.)<br />
2.9. Validation by Immunofluorescence Staining<br />
1. Primary antibodies: anti-annexin V, anti-emilin-1, anti-IQGAP1 (Santa Cruz<br />
Biotechnology, Inc., Santa Cruz, CA)<br />
2. Secondary antibodies: goat anti-rabbit IgG-FITC, and donkey anti-goat IgG-TR<br />
(Santa Cruz Biotechnology)<br />
3. PBS solution<br />
4. 18 × 18 × 0.15 mm thick glass cover slips<br />
5. Regular microscope glass slides<br />
6. Blocking serum: 10% normal blocking serum in PBS. The blocking serum is<br />
derived from the same species in which the secondary antibody is raised. For<br />
example, if the secondary antibody is raised in goat, use the normal goat serum<br />
diluted to 10% in PBS as the blocking serum.<br />
7. Fixative solution: 3.7% (v/v) formaldehyde in PBS<br />
8. DAPI diluted 1:50,000 in PBS (Invitrogen, Carlsbad, CA)<br />
9. ProLong mounting reagent (Invitrogen)<br />
10. Confocal fluorescence microscope LSM 510 Meta NLO (Carl Zeiss,<br />
Oberkochen, Germany)
3. Methods<br />
The ECM proteins are extracted from cultured cells by a short exposure<br />
to an ECM-degrading enzyme. To isolate MVs that are either confined to<br />
the ECM or reside in the cell culture medium, two approaches may be used:<br />
(1) For MVs confined to the ECM, an ECM-degrading enzyme is first applied<br />
followed by centrifugation and ultracentrifugation; (2) for MVs in the medium,<br />
centrifugation and ultracentrifugation are applied. The characterization of ECM<br />
and MV proteomes is performed using LC fractionation and MS analysis.<br />
3.1. Cell Culture<br />
1. Grow the murine calvaria-derived osteoblast MC3T3-E1 cells in growth medium.<br />
Change the medium every two or three days. Passage the cells with<br />
trypsin/EDTA (see Note 1).<br />
2. Once the cell culture reaches ∼50% confluency, replace the growth medium with<br />
10 ml of differentiation medium per plate to induce osteoblast differentiation.<br />
3. Extract the ECM or harvest culture medium on the day indicated in the methods<br />
below.<br />
3.2. Extraction of the ECM Constituents<br />
1. Grow MC3T3-E1 cells in differentiation medium on 10-cm plates. Change the<br />
medium every two or three days (see Note 2).<br />
2. On day 21, aspirate the medium from the plates. Wash the cells with 10 ml of<br />
PBS solution three times.<br />
3. Add 3 ml of liberase/blendzyme 1 solution to each plate. Incubate at 37°C for<br />
30 min.<br />
4. Carefully collect the digested supernatant from the plates without disturbing the<br />
cells.<br />
5. Centrifuge the supernatant at 2000×g for 5 min to remove any free cells. The<br />
resulting supernatant contains ECM proteins.<br />
6. Quantify the amount of ECM proteins using the BCA assay (see Note 3).<br />
3.3. Enrichment of MVs from the ECM<br />
1. Follow the same procedure described earlier to grow and prepare cells (see<br />
Subheading 3.2, steps 1 and 2, and Note 2).<br />
2. On day 21, aspirate the medium and wash the cells three times with PBS.<br />
3. Add 3 ml of liberase/blendzyme 1 solution to each plate. Incubate at 37°C for<br />
30 min (see Note 4).<br />
4. Collect the supernatant from the plates without disturbing the cells. Centrifuge<br />
the supernatant at 2000×g for 5 min to remove any cells that may have been<br />
detached from the plate. Collect the supernatant.<br />
5. Centrifuge the supernatant at 20,000×g at 4°C for 30 min.
6. Transfer the supernatant to the Ultra-Clear centrifuge tubes. Use the centrifuge<br />
tubes that fit the volume of the supernatant. Fill the tubes with PBS up to about<br />
2–3 mm from the top.<br />
7. Subject the supernatant to ultracentrifugation at 100,000×g at 4°C for 60 min.<br />
Carefully remove the supernatant without disturbing the pellet.<br />
8. The pellets are enriched with MVs designated as collagenase-released MVs<br />
(CRMVs) (see Note 5).<br />
9. Confirm the enrichment of CRMVs by assaying the ALP activity using an<br />
aliquot of the pellet (see Note 6 and Subheading 3.5).<br />
10. Resuspend the rest of the pellet in 25 mM NH4HCO3, pH 8.4. Quantify the<br />
amount of CRMV proteins in the pellet by BCA assay (see Note 3).<br />
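The centrifugation steps are specified as relative centrifugal force (×g), whereas most instruments are set in rpm. The two are related by the standard formula RCF = 1.118 × 10^-5 × r × rpm^2, with the rotor radius r in cm. The Python sketch below applies that conversion. The 10 cm radius is a hypothetical placeholder, not a property of any rotor named in this chapter; the actual radius should be taken from the rotor manual.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_to_rpm(rcf_g, radius_cm):
    """Convert relative centrifugal force (x g) to rotor speed in rpm."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("rcf_g and radius_cm must be positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rpm_to_rcf(rpm, radius_cm):
    """Convert rotor speed in rpm to relative centrifugal force (x g)."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


ASSUMED_RADIUS_CM = 10.0  # hypothetical rotor radius, for illustration only

# The three forces used in the protocol: cell removal, debris removal,
# and ultracentrifugation.
for label, rcf in [("cell removal", 2_000),
                   ("debris removal", 20_000),
                   ("ultracentrifugation", 100_000)]:
    rpm = rcf_to_rpm(rcf, ASSUMED_RADIUS_CM)
    print(f"{label}: {rcf} x g is about {rpm:.0f} rpm at r = {ASSUMED_RADIUS_CM} cm")
```

Since the force scales with the square of the speed, small errors in rpm at the ultracentrifugation step translate into comparatively large errors in ×g.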
3.4. Isolation of MVs from Medium<br />
1. Grow MC3T3-E1 cells in differentiation medium in four 10-cm plates.<br />
2. On day 15, collect the media from multiple plates (see Note 2).<br />
3. Separate cellular debris from the medium by centrifugation at 20,000×g for 30 min<br />
at 4°C.<br />
4. Transfer the supernatant to Ultra-Clear centrifuge tubes. Use the centrifuge<br />
tubes that fit the volume of the supernatant.<br />
5. Further centrifuge the supernatant by ultracentrifugation at 100,000×g for 60 min.<br />
6. Carefully remove the supernatant. The MVs in the pellet are designated as medium<br />
MVs (MMVs) (see Note 5 and Fig. 2).<br />
7. Resuspend an aliquot of the MMV sample in 25 mM NH4HCO3, pH 8.4.<br />
Determine the protein concentration in the pellet by BCA assay.<br />
3.5. Alkaline Phosphatase Assay<br />
1. For the standard curve: Dilute PNP standard 1:10 in dH2O. Add 0, 2, 4, 6, 8, 10,<br />
20, 30, 40, and 50 μl of the diluted standard (i.e., 0, 2, 4, 6, 8, 10, 20, 30, 40, and<br />
50 nmol, respectively) to the wells of a flat-bottom 96-well microtiter plate. Add<br />
mild lysis buffer to make a total volume of 135 μl.<br />
2. For the CRMV and MMV samples: Resuspend an aliquot of the ultracentrifuged<br />
pellet in mild lysis buffer. Quantify the protein by BCA assay. Based on the BCA<br />
assay results, add 25 μg of protein to the 96-well microtiter plate. Add mild lysis<br />
buffer to make a total volume of 135 μl/well.<br />
3. Add 25 μl of alkaline buffer and 25 μl of p-nitrophenyl phosphate (PNPP) to each<br />
well.<br />
4. Incubate the microtiter plate at 37°C for up to 3 h. Monitor the colorimetric<br />
change every hour by measuring absorbance at 405 nm using the microtiter plate<br />
reader. Stop incubation when the absorbance of the sample reaches the range of<br />
the standards.<br />
5. Determine the ALP activity in MV samples by comparing to the PNP standard<br />
curve. Report the ALP activity as nmol PNP produced per minute per milligram<br />
of protein used (see Note 6).
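The calculation in step 5 can be sketched as a linear standard curve followed by a unit conversion. The Python example below assumes a straight-line relation between absorbance at 405 nm and nmol PNP. The absorbance values are synthetic placeholders generated from an assumed line, not measurements, and the names fit_line and alp_activity are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def alp_activity(a405, slope, intercept, minutes, protein_ug):
    """Return ALP activity in nmol PNP per minute per mg of protein."""
    nmol_pnp = (a405 - intercept) / slope  # read the sample off the standard curve
    return nmol_pnp / (minutes * protein_ug / 1000.0)


# nmol PNP per well, as listed for the standard curve.
STANDARD_NMOL = [0, 2, 4, 6, 8, 10, 20, 30, 40, 50]

# Synthetic absorbances (assumed: blank 0.05, 0.02 absorbance units per nmol).
standard_a405 = [0.05 + 0.02 * n for n in STANDARD_NMOL]

slope, intercept = fit_line(STANDARD_NMOL, standard_a405)

# Hypothetical sample: A405 = 0.65 after 60 min with 25 ug of protein.
activity = alp_activity(0.65, slope, intercept, minutes=60, protein_ug=25)
print(f"ALP activity: {activity:.1f} nmol PNP/min/mg")
```

The sample reading should fall within the absorbance range spanned by the standards, which is why the incubation is stopped once that range is reached.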
3.6. Strong Cation Exchange Liquid Chromatography of Peptides<br />
1. Digest 100 μg of ECM, CRMV, or MMV proteins in 25 mM NH4HCO3, pH 8.4,<br />
with trypsin using a trypsin-to-protein ratio of 1:40. For 100 μg of protein, add<br />
2.5 μg of trypsin. Incubate the digestion at 37°C overnight (see Note 7).<br />
2. Lyophilize the peptide digests in a vacuum centrifuge.<br />
3. Dissolve peptide digests in 100 μl of 25% (v/v) acetonitrile containing 0.1% (v/v)<br />
formic acid.<br />
4. Inject the peptides onto a SCX-LC column (1 × 150 mm, polysulfoethyl A).<br />
5. Maintain the flow rate of the column at 50 μl/min. Mobile phase A is 25% (v/v)<br />
acetonitrile, and mobile phase B is 25% (v/v) acetonitrile with 0.5 M ammonium<br />
formate (pH 3).<br />
6. Elute the peptides using the following 96-min gradient method: 3% B for 3 min,<br />
followed by a linear increase to 10% B in 43 min, a further increase to 45% B<br />
in 40 min, and then to 100% B in 10 min. Monitor the peptide separation by<br />
fluorescence (266 nm excitation/350 nm emission). Collect fractions every minute<br />
for 96 min (see Note 8).<br />
7. Based on the chromatogram, pool the adjacent fractions into a total of 20 fractions<br />
and lyophilize (see Notes 9 and 10).<br />
8. Resuspend each pooled fraction in 20 μl of 0.1% (v/v) formic acid prior to<br />
nanoRPLC-MS/MS analysis.<br />
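The 96-min SCX gradient in step 6 can be written as a set of time/composition breakpoints. The Python sketch below interpolates the percentage of mobile phase B at any time and converts it to an ammonium formate concentration. It assumes every ramp is linear (the text states this explicitly only for the first ramp), and the names SCX_GRADIENT, percent_b, and formate_molar are illustrative.

```python
# (time in min, % mobile phase B) breakpoints taken from step 6.
SCX_GRADIENT = [
    (0.0, 3.0),     # start
    (3.0, 3.0),     # 3% B held for 3 min
    (46.0, 10.0),   # increase to 10% B over 43 min
    (86.0, 45.0),   # increase to 45% B over 40 min
    (96.0, 100.0),  # increase to 100% B over 10 min
]

FORMATE_IN_B_MOLAR = 0.5  # ammonium formate concentration in mobile phase B


def percent_b(t_min, breakpoints=SCX_GRADIENT):
    """Return % B at time t_min by linear interpolation between breakpoints."""
    if not breakpoints[0][0] <= t_min <= breakpoints[-1][0]:
        raise ValueError("time is outside the gradient")
    for (t0, b0), (t1, b1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def formate_molar(t_min):
    """Return the ammonium formate concentration (M) delivered at t_min."""
    return percent_b(t_min) / 100.0 * FORMATE_IN_B_MOLAR


for t in (0, 3, 46, 66, 86, 96):
    print(f"{t:3d} min: {percent_b(t):5.1f}% B, {formate_molar(t):.4f} M formate")
```

The segment durations sum to 3 + 43 + 40 + 10 = 96 min, which matches the one-fraction-per-minute collection over 96 min.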
3.7. Nanoflow Reversed-Phase Liquid Chromatography Tandem Mass<br />
Spectrometry<br />
1. Cut a 12-cm piece of 75 μm i.d. × 360 μm o.d. fused silica capillary column. Use<br />
a torch to briefly flame the section about 2 cm near one end. Once the flamed<br />
section is soft, pull the column to make a 10-cm long section with a closed tip.<br />
To make a fine and flat opening at the end of the tip, lightly score near the end<br />
of the closed tip using a ceramic cutter, and then break the end away.<br />
2. Connect the column to the slurry packer. Pack the column with 5 μm, 300 Å pore<br />
size C-18 silica-bonded stationary reversed-phase particles.<br />
3. Connect the column to an Agilent 1100 nanoLC system coupled with a LIT mass<br />
spectrometer (LTQ, ThermoElectron, operated with Xcalibur 1.4 SR1 software).<br />
4. Transfer the peptide fractions into glass vials. Inject 6 μl of the solution.<br />
5. Mobile phase A is 0.1% (v/v) formic acid and B is 0.1% (v/v) formic acid in<br />
acetonitrile. Elute the peptides using the following gradient method: 2% B at<br />
500 nl/min in 30 min; a linear increase of 2–42% B at 250 nl/min in 110 min;<br />
42–98% in 30 min including the first 15 min at 250 nl/min and then 15 min at<br />
500 nl/min; 98% at 500 nl/min for 10 min.<br />
6. Set the capillary temperature and electrospray voltage at 160°C and 1.5 kV,<br />
respectively. The LIT-MS is operated in a data-dependent MS/MS mode where<br />
the five most abundant peptide molecular ions in every MS scan are sequentially<br />
selected for collision-induced dissociation (CID) using a normalized collision
energy of 35%. Apply dynamic exclusion to minimize repeated selection of<br />
peptides previously selected for CID (see Notes 11 and 12).<br />
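The nanoRPLC program in step 5 changes both composition and flow rate, so it can help to tabulate the segments and check the totals. The Python sketch below sums the run time and delivered solvent volume, and compares the run time with the MS acquisition window described in Note 11 (150 min, starting 30 min into the gradient). The segment labels and variable names are illustrative.

```python
# (label, duration in min, flow rate in nl/min), following step 5.
SEGMENTS = [
    ("2% B", 30, 500),
    ("2-42% B", 110, 250),
    ("42-98% B, first 15 min", 15, 250),
    ("42-98% B, last 15 min", 15, 500),
    ("98% B", 10, 500),
]

total_min = sum(duration for _, duration, _ in SEGMENTS)
total_volume_ul = sum(duration * flow for _, duration, flow in SEGMENTS) / 1000.0

# MS acquisition window from Note 11.
ACQ_START_MIN = 30
ACQ_DURATION_MIN = 150
acq_end_min = ACQ_START_MIN + ACQ_DURATION_MIN

print(f"total run time: {total_min} min")
print(f"total solvent delivered: {total_volume_ul} ul")
print(f"MS acquisition: {ACQ_START_MIN}-{acq_end_min} min")
```

With these values the acquisition window ends together with the gradient, as Note 11 requires.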
3.8. Bioinformatic Analysis<br />
1. Search the tandem mass spectra against the UniProt proteomic database from<br />
the European Bioinformatics Institute (http://www.ebi.ac.uk/) with SEQUEST<br />
operating on a 40-node Beowulf cluster (SEQUEST Cluster version 3.1 SR1,<br />
Bioworks Browser 3.2). Limit the search to peptides generated with fully tryptic<br />
cleavage constraints.<br />
2. Set the criteria for legitimate peptide identification as follows: charge state-dependent<br />
minimum cross-correlation (Xcorr) scores of 1.9 for [M + H]1+, 2.2 for [M + 2H]2+, and 3.1 for<br />
[M + 3H]3+, and a minimum delta correlation (ΔCn) of 0.08.<br />
3. Base protein identification exclusively on unique peptide hits, i.e., peptides whose<br />
sequence is unique to a given protein (see Notes 13 and 14).<br />
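The acceptance criteria in steps 2 and 3 amount to a simple filter over peptide-spectrum matches. The Python sketch below shows one way to express it. The record layout (keys peptide, charge, xcorr, delta_cn, proteins) is an assumption and does not reflect the SEQUEST or Bioworks output format. The thresholds are treated as inclusive minima, charge states not listed in step 2 are rejected, and the example records are made-up placeholders.

```python
MIN_XCORR = {1: 1.9, 2: 2.2, 3: 3.1}  # minimum Xcorr per charge state (step 2)
MIN_DELTA_CN = 0.08                   # minimum delta correlation (step 2)


def passes_filter(psm):
    """Return True if a peptide-spectrum match meets the score criteria."""
    threshold = MIN_XCORR.get(psm["charge"])
    if threshold is None:
        return False  # assumption: unlisted charge states are not accepted
    return psm["xcorr"] >= threshold and psm["delta_cn"] >= MIN_DELTA_CN


def identify_proteins(psms):
    """Map each protein to its accepted peptides that match no other protein."""
    proteins = {}
    for psm in psms:
        if passes_filter(psm) and len(psm["proteins"]) == 1:
            proteins.setdefault(psm["proteins"][0], set()).add(psm["peptide"])
    return proteins


# Hypothetical records for illustration only.
example_psms = [
    {"peptide": "PEPTIDEK", "charge": 2, "xcorr": 2.5, "delta_cn": 0.10,
     "proteins": ["PROT_A"]},
    {"peptide": "SAMPLER", "charge": 1, "xcorr": 1.5, "delta_cn": 0.20,
     "proteins": ["PROT_B"]},            # fails the Xcorr threshold
    {"peptide": "SHAREDK", "charge": 3, "xcorr": 3.4, "delta_cn": 0.12,
     "proteins": ["PROT_A", "PROT_C"]},  # not unique to one protein
    {"peptide": "ANOTHERK", "charge": 3, "xcorr": 3.2, "delta_cn": 0.05,
     "proteins": ["PROT_C"]},            # fails the delta correlation threshold
]

identified = identify_proteins(example_psms)
print(identified)
```

Running the same filter on the CRMV and MMV results and intersecting the two protein sets gives the higher-confidence MV proteins described in Note 13.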
3.9. Immunofluorescence Staining<br />
1. Plate 50,000 cells on glass cover slips in 6-well plates. Culture in differentiation<br />
medium.<br />
2. On day 15, briefly wash the cells with PBS.<br />
3. Fix the cells in 3.7% (v/v) formaldehyde in PBS for 10 min.<br />
4. Incubate with 10% (v/v) normal blocking serum in PBS.<br />
5. Briefly wash the cells with PBS; incubate with primary antibodies for 1.5 h.<br />
6. Wash the cells three times with PBS for 5 min each, and then incubate with<br />
secondary antibodies conjugated with fluorochrome (FITC or Texas Red) for 1 h.<br />
7. Wash the cells three times with PBS for 5 min each, including once with DAPI<br />
diluted 1:50,000 in PBS to stain nuclei.<br />
8. Mount the cover slips on microscope glass slides with ProLong mounting reagent.<br />
9. Observe the cells using a confocal fluorescence microscope (see Note 15).<br />
4. Notes<br />
1. MC3T3-E1 pre-osteoblast cells are derived from newborn murine calvaria (28).<br />
These cells closely resemble primary cell cultures in their proliferation, differentiation,<br />
and mineralization (29,30,31). The combination of ascorbic acid and<br />
β-glycerophosphate stimulates MC3T3-E1 to undergo differentiation, which is<br />
characterized by substantial matrix mineralization (32,33). Therefore, it is a<br />
suitable model for the enrichment of ECM and isolation of MVs.<br />
2. It is necessary to culture multiple 10-cm plates (four or more at approximately<br />
4 × 10^6 cells/plate) in order to obtain a sufficient amount of protein from ECM<br />
or MVs.<br />
3. Protein quantitation is a common laboratory procedure. The instructions are<br />
included within the BCA assay kit (Pierce); therefore, the procedure is not<br />
described in this chapter.
4. The liberase/blendzyme 1 is a mixture of highly purified collagenase and<br />
dispase that offers gentle protease activity as compared to other ECM-degrading<br />
enzymes. Note that four blendzyme mixtures with increasing levels of enzymatic<br />
strength are available from Roche. Blendzyme 1 is the mildest version. The<br />
digestion time varies depending on the cell or tissue type. Alternatively, collagenase/dispase<br />
(1 mg/ml of collagenase/dispase in PBS, containing 0.1 U/ml collagenase<br />
and 0.8 U/ml dispase) (Sigma Chemical Co., St. Louis, MO) can<br />
be used. Collagenase/dispase enzyme mixture is commonly used to digest the<br />
ECM.<br />
5. Two approaches are designed to isolate MVs either from the ECM or directly<br />
from the cell culture medium. In the first approach, enzymatic digestion and<br />
ultracentrifugation are combined to release MVs embedded in the ECM (designated<br />
as CRMVs). In the second approach, ultracentrifugation is applied to the<br />
medium to isolate MVs, designated as MMVs (34). To confirm the enrichment of<br />
MVs, the ultracentrifugation pellets are fixed and examined using transmission<br />
electron microscopy (Fig. 2).<br />
6. Measurement of the enzymatic activity of ALP is a standard marker for MV<br />
isolation (35,36).<br />
7. Instead of using the buffer provided along with trypsin, it is desirable to<br />
resuspend trypsin in 25 mM NH4HCO3, pH 8.4. The trypsin-to-protein ratio<br />
should be between 1:40 and 1:50. The digestion mixture is incubated overnight<br />
(approximately 16 h).<br />
8. The LIF detector used in this method can be constructed in-house (37). The<br />
LIF detector is more sensitive than a conventional lamp-based fluorescence<br />
detector. The use of a LIF detector is particularly advantageous when a narrow<br />
bore column (
peptide is capable of identifying more proteins than the online procedure. Thus,<br />
the offline separation is described in this chapter.<br />
10. The pooling step is optional. The peptide fractions can be pooled based on the<br />
complexity of the chromatogram. In general, pooling to about 20 fractions is<br />
appropriate. It will save LC-MS running time without compromising the number<br />
of proteins that the approach can identify.<br />
11. In general, the MS data acquisition time is set to 150 min, starting 30 min after<br />
the beginning of the peptide elution gradient and synchronized to end with the<br />
elution gradient.<br />
12. An alternative approach: the resulting ECM, CRMV, or MMV protein samples<br />
can be resolved by SDS-PAGE and the proteins visualized by Coomassie<br />
staining. The protein bands that are of greater intensity than those prepared<br />
from undifferentiated cells can be excised and subjected to in-gel digestion with<br />
trypsin and analyzed using nanoRPLC-MS/MS (27).<br />
13. Proteins that are identified in both CRMV and MMV purifications can be<br />
considered as authentic MV proteins with a higher degree of confidence than<br />
those that were identified in only one of the preparations.<br />
14. Gene ontology (GO) (www.geneontology.org) can be used to annotate the<br />
identified proteins and categorize them according to their cellular location,<br />
molecular function, and cellular processes they are associated with.<br />
15. The validation of known MV proteins is conducted using Western blotting<br />
or immunofluorescence staining. Annexin V, a known constituent of MVs, is<br />
used as a protein landmark to locate vesicles in these experiments (38). The<br />
osteoblast cells can be double-stained with anti-annexin V and an additional<br />
antibody against either the extracellular protein emilin-1 or the Ras GTPase-activating-like<br />
protein IQGAP1 (27).<br />
Acknowledgments<br />
This project has been funded in whole or in part with Federal funds from<br />
the National Cancer Institute, National Institutes of Health, under Contract No.<br />
N01-CO-12400. The content of this publication does not necessarily reflect<br />
the views or policies of the Department of Health and Human Services, nor<br />
does the mention of trade names, commercial products, or organizations imply<br />
endorsement by the US Government.<br />
References<br />
1. Holmbeck, K. and Szabova, L. (2006) Aspects of extracellular matrix remodeling<br />
in development and disease. Birth Defects Res C Embryo Today 78, 11–23.<br />
2. Brooke, B. S., Karnik, S. K. and Li, D. Y. (2003) Extracellular matrix in vascular<br />
morphogenesis and disease: structure versus signal. Trends Cell Biol 13, 51–56.<br />
3. Tahinci, E. and Lee, E. (2004) The interface between cell and developmental<br />
biology. Curr Opin Genet Dev 14, 361–366.
4. Harada, S. and Rodan, G. A. (2003) Control of osteoblast function and regulation<br />
of bone mass. Nature 423, 349–355.<br />
5. Beck, G. R., Jr. (2003) Inorganic phosphate as a signaling molecule in osteoblast<br />
differentiation. J Cell Biochem 90, 234–243.<br />
6. Aubin, J. E. (2001) Regulation of osteoblast formation and function. Rev Endocr<br />
Metab Disord 2, 81–94.<br />
7. Anderson, H. C. (1995) Molecular biology of matrix vesicles. Clin Orthop Relat<br />
Res, 266–280.<br />
8. Anderson, H. C. (2003) Matrix vesicles and calcification. Curr Rheumatol Rep 5,<br />
222–226.<br />
9. Anderson, H. C., Garimella, R. and Tague, S. E. (2005) The role of matrix vesicles<br />
in growth plate development and biomineralization. Front Biosci 10, 822–837.<br />
10. Kirsch, T. (2005) Annexins – their role in cartilage mineralization. Front Biosci<br />
10, 576–581.<br />
11. Hessle, L., Johnson, K. A., Anderson, H. C., Narisawa, S., Sali, A., Goding, J. W.,<br />
Terkeltaub, R. and Millan, J. L. (2002) Tissue-nonspecific alkaline phosphatase<br />
and plasma cell membrane glycoprotein-1 are central antagonistic regulators of<br />
bone mineralization. Proc Natl Acad Sci USA 99, 9445–9449.<br />
12. Johnson, K. A., Hessle, L., Vaingankar, S., Wennberg, C., Mauro, S., Narisawa, S.,<br />
Goding, J. W., Sano, K., Millan, J. L. and Terkeltaub, R. (2000) Osteoblast tissue-nonspecific<br />
alkaline phosphatase antagonizes and regulates PC-1. Am J Physiol<br />
Regul Integr Comp Physiol 279, R1365–1377.<br />
13. Morris, D. C., Masuhara, K., Takaoka, K., Ono, K. and Anderson, H. C. (1992)<br />
Immunolocalization of alkaline phosphatase in osteoblasts and matrix vesicles of<br />
human fetal bone. Bone Miner 19, 287–298.<br />
14. Baldini, V., Mastropasqua, M., Francucci, C. M. and D’Erasmo, E. (2005) Cardiovascular<br />
disease and osteoporosis. J Endocrinol Invest 28, 69–72.<br />
15. Dao, H. H., Essalihi, R., Bouvet, C. and Moreau, P. (2005) Evolution and<br />
modulation of age-related medial elastocalcinosis: impact on large artery stiffness<br />
and isolated systolic hypertension. Cardiovasc Res 66, 307–317.<br />
16. Reynolds, J. L., Joannides, A. J., Skepper, J. N., McNair, R., Schurgers, L. J.,<br />
Proudfoot, D., Jahnen-Dechent, W., Weissberg, P. L. and Shanahan, C. M. (2004)<br />
Human vascular smooth muscle cells undergo vesicle-mediated calcification in<br />
response to changes in extracellular calcium and phosphate concentrations: a<br />
potential mechanism for accelerated vascular calcification in ESRD. J Am Soc<br />
Nephrol 15, 2857–2867.<br />
17. Abedin, M., Tintut, Y. and Demer, L. L. (2004) Vascular calcification: mechanisms<br />
and clinical ramifications. Arterioscler Thromb Vasc Biol 24, 1161–1170.<br />
18. Tintut, Y. and Demer, L. L. (2001) Recent advances in multifactorial regulation<br />
of vascular calcification. Curr Opin Lipidol 12, 555–560.<br />
19. Stewart, D. A., Cooper, C. R. and Sikes, R. A. (2004) Changes in extracellular<br />
matrix (ECM) and ECM-associated proteins in the metastatic progression of<br />
prostate cancer. Reprod Biol Endocrinol 2, 2.
20. Yin, J. J., Pollock, C. B. and Kelly, K. (2005) Mechanisms of cancer metastasis to<br />
the bone. Cell Res 15, 57–62.<br />
21. Mundy, G. R. (2002) Metastasis to bone: causes, consequences and therapeutic<br />
opportunities. Nat Rev Cancer 2, 584–593.<br />
22. Roodman, G. D. (2004) Mechanisms of bone metastasis. N Engl J Med 350,<br />
1655–1664.<br />
23. Yates, J. R., III. (2004) Mass spectral analysis in proteomics. Annu Rev Biophys<br />
Biomol Struct 33, 297–316.<br />
24. Yates, J. R., III, Gilchrist, A., Howell, K. E. and Bergeron, J. J. (2005) Proteomics<br />
of organelles and large cellular structures. Nat Rev Mol Cell Biol 6, 702–714.<br />
25. Domon, B. and Aebersold, R. (2006) Mass spectrometry and protein analysis.<br />
Science 312, 212–217.<br />
26. Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature<br />
422, 198–207.<br />
27. Xiao, Z., Camalier, C. E., Nagashima, K., Chan, K. C., Lucas, D. A., de la<br />
Cruz, M. J., Gignac, M., Lockett, S., Issaq, H. J., Veenstra, T. D., Conrads, T. P.<br />
and Beck Jr, G. R. (2006) Analysis of the extracellular matrix vesicle proteome in<br />
mineralizing osteoblasts. J Cell Physiol, In press.<br />
28. Sudo, H., Kodama, H. A., Amagai, Y., Yamamoto, S. and Kasai, S. (1983) In vitro<br />
differentiation and calcification in a new clonal osteogenic cell line derived from<br />
newborn mouse calvaria. J Cell Biol 96, 191–198.<br />
29. Choi, J. Y., Lee, B. H., Song, K. B., Park, R. W., Kim, I. S., Sohn, K. Y.,<br />
Jo, J. S. and Ryoo, H. M. (1996) Expression patterns of bone-related proteins during<br />
osteoblastic differentiation in MC3T3-E1 cells. J Cell Biochem 61, 609–618.<br />
30. Quarles, L. D., Yohay, D. A., Lever, L. W., Caton, R. and Wenstrup, R. J.<br />
(1992) Distinct proliferative and differentiated stages of murine MC3T3-E1 cells<br />
in culture: an in vitro model of osteoblast development. J Bone Miner Res 7,<br />
683–692.<br />
31. Franceschi, R. T., Iyer, B. S. and Cui, Y. (1994) Effects of ascorbic acid on collagen<br />
matrix formation and osteoblast differentiation in murine MC3T3-E1 cells. J Bone<br />
Miner Res 9, 843–854.<br />
32. Beck, G. R., Jr, Sullivan, E. C., Moran, E. and Zerler, B. (1998) Relationship<br />
between alkaline phosphatase levels, osteopontin expression, and mineralization in<br />
differentiating MC3T3-E1 osteoblasts. J Cell Biochem 68, 269–280.<br />
33. Beck, G. R., Jr, Zerler, B. and Moran, E. (2001) Gene array analysis of osteoblast<br />
differentiation. Cell Growth Differ 12, 61–83.<br />
34. Johnson, K., Moffa, A., Chen, Y., Pritzker, K., Goding, J. and Terkeltaub, R. (1999)<br />
Matrix vesicle plasma cell membrane glycoprotein-1 regulates mineralization by<br />
murine osteoblastic MC3T3 cells. J Bone Miner Res 14, 883–892.<br />
35. Ali, S. Y., Sajdera, S. W. and Anderson, H. C. (1970) Isolation and characterization<br />
of calcifying matrix vesicles from epiphyseal cartilage. Proc Natl Acad Sci USA<br />
67, 1513–1520.<br />
36. Dean, D. D., Schwartz, Z., Bonewald, L., Muniz, O. E., Morales, S., Gomez, R.,<br />
Brooks, B. P., Qiao, M., Howell, D. S. and Boyan, B. D. (1994) Matrix vesicles
produced by osteoblast-like cells in culture become significantly enriched in<br />
proteoglycan-degrading metalloproteinases after addition of beta-glycerophosphate<br />
and ascorbic acid. Calcif Tissue Int 54, 399–408.<br />
37. Chan, K. C., Muschik, G. M. and Issaq, H. J. (2000) Solid-state UV laser-induced<br />
fluorescence detection in capillary electrophoresis. Electrophoresis 21, 2062–2066.<br />
38. Wang, W., Xu, J. and Kirsch, T. (2005) Annexin V and terminal differentiation of<br />
growth plate chondrocytes. Exp Cell Res 305, 156–165.
IV<br />
Clinical Proteomics and Antibody Arrays
14<br />
Miniaturized Parallelized Sandwich Immunoassays<br />
Hsin-Yun Hsu, Silke Wittemann, and Thomas O. Joos<br />
Summary<br />
This chapter describes the development and use of bead-based miniaturized multiplexed<br />
sandwich immunoassays for focused protein profiling. Bead-based protein arrays<br />
or suspension microarrays allow simultaneous analysis of a variety of parameters within<br />
a single experiment. In suspension microarrays, capture antibodies are coupled onto color-coded<br />
microspheres.<br />
Applications of suspension microarrays are described for the analysis of<br />
proteins present in different types of body fluids, such as serum or plasma, cerebrospinal,<br />
pleural and synovial fluids, as well as cell culture supernatants. The chapter is divided into<br />
the generation of suspension microarrays, sample preparation, processing of suspension<br />
microarrays, validation of analytical performance, and finally pattern generation using<br />
bioinformatics tools.<br />
Key Words: suspension microarray; microspheres; immunoassay; protein profiling;<br />
biological fluids; serum; pleura; cell culture supernatants; cerebrospinal fluid; synovial<br />
fluid.<br />
1. Introduction<br />
Protein microarray technology allows simultaneous determination of a large<br />
variety of analytes from a minute amount of sample within a single experiment.<br />
Assay systems based on this technology are currently applied for identification<br />
and quantitation of proteins. Protein microarray technology is of major interest<br />
for proteomic research in basic and applied biology as well as for diagnostic<br />
applications. Miniaturized and parallelized assay systems have reached adequate<br />
sensitivity, and hence have the potential to replace singleplex analysis systems.<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
Besides the well-known planar microarray-based systems, which are well<br />
suited to screening a large number of target proteins, bead-based systems, known as<br />
suspension assays, are an attractive alternative, especially when the number<br />
of parameters of interest is comparatively low. Suspension assay systems employ<br />
different color-coded or size-coded microspheres as the solid support for capture<br />
molecules. A flow cytometer, which is able to identify each individual type of<br />
bead and quantify the amount of captured targets on each individual bead, is<br />
used as a readout system. In the first step, antigen-specific capture antibodies<br />
are immobilized on the individual bead type. Different bead types are combined<br />
and incubated with the sample of interest. A labeled secondary antibody<br />
detects the captured analytes and is visualized with a fluorescent reporter<br />
system. Sensitivity, reliability, and accuracy are similar to those observed with<br />
standard microtiter ELISA procedures (1). Color-coded microspheres can be<br />
used to perform up to a hundred different assay types simultaneously. The flow<br />
cytometer identifies several thousand microspheres in a second, and simultaneously<br />
quantitates the amount of captured analytes (2,3,4,5,6). Suspension<br />
microarrays are currently advanced within the field of miniaturized multiplexed<br />
ligand binding assays with respect to automation and throughput (7).<br />
Miniaturized parallelized assay systems have to demonstrate appropriate<br />
sensitivity, precision, and reliability before they will be applied for screening<br />
or diagnostic purposes.<br />
This chapter describes the development and use of suspension antibody<br />
microarrays for protein profiling of several human body fluids. Standard<br />
guidance for validating immunoassays (10,11,12) is described, covering how to<br />
determine the sensitivity, precision, and accuracy of the multiplexed analysis.<br />
The final section describes data analysis approaches for high-dimensional<br />
data sets (13,14).<br />
2. Materials<br />
2.1. Equipment<br />
1. Centrifuge: 5415D (Eppendorf)<br />
2. Vortex Mixer (Neolab)<br />
3. Ultrasonic bath<br />
4. Thermomixer (Eppendorf)<br />
5. Luminex100 instrument (Luminex Corp.)<br />
6. Vacuum manifold (Millipore)<br />
7. Filterplates (Millipore 96-well plate, cat. # MAB1250)<br />
8. Microcentrifuge tubes (Starlab 1.5 ml, cat. # I1415-2500)<br />
9. Carboxylated Beads (Qiagen, cat. # 922400 or Luminex Corp.)<br />
10. Deionized water
2.2. Common Reagents and Materials<br />
1. Bovine serum albumin (BSA, Roth T844.2)<br />
2. PBS (Fischer Scientific, cat. # 9472615)<br />
3. EDC (Pierce)<br />
4. Sulfo-NHS (Pierce)<br />
5. Detection reagent: Streptavidin-phycoerythrin (Streptavidin-PE) stock solution<br />
(1 mg/ml) in 100 mM NaCl, 100 mM sodium phosphate, pH 7.5, containing<br />
2 mM sodium azide (Molecular Probes, cat. #S21388)<br />
2.3. Buffers<br />
1. Activation buffer [100 mM sodium phosphate (Na₂HPO₄), pH 6.2]<br />
2. Coupling buffer (50 mM MES, pH 5.0)<br />
3. Washing buffer [PBS, pH 7.4, and 0.05 % (v/v) Tween-20]<br />
4. Blocking/storage (B/S) buffer: 1% BSA fraction IV (Roth, cat. # T844.2) in 1×<br />
PBS<br />
5. Assay buffer formulation: 1% BSA fraction IV in 1×PBS<br />
3. Methods<br />
3.1. Principle<br />
The principle of suspension antibody microarrays is based on sandwich<br />
immunoassays as represented in Fig. 1. First, capture antibodies are coupled to<br />
carboxylated microspheres. For performing suspension antibody microarrays,<br />
the samples are incubated with coupled microspheres. Bound analytes are<br />
detected with biotinylated antibodies. Phycoerythrin-labeled streptavidin is used<br />
for signal detection. Finally, microspheres are identified by a flow cytometer,<br />
hence allowing the quantitation of the captured analytes.<br />
3.2. Production of Suspension Microarrays—Antibody Coupling to<br />
Carboxylated Microspheres (see Note 1)<br />
Using proven carbodiimide coupling chemistry, the antibodies are covalently<br />
immobilized on carboxylated beads via the amine groups in lysine side chains.<br />
Before coupling, the beads are first activated using EDC/Sulfo-NHS.<br />
Fig. 1. Processing of suspension microarrays. Schematic representation of the steps<br />
required for performing a suspension microarray immunoassay. Figure reproduced from<br />
Proteomics of Human Body Fluids: Principles, Methods and Applications, edited by<br />
Thongboonkerd (2006).<br />
The antibodies should not contain foreign protein, azide, glycine, Tris, or<br />
any other reagent containing primary amine groups. Otherwise, the antibodies<br />
must be purified by gel-filtration chromatography or dialysis before use.<br />
3.2.1. Bead Activation<br />
1. Sonicate the carboxylated bead stock suspension for 15–20 s to yield a homogeneous<br />
bead suspension. Thoroughly vortex the bead stock suspension for at least<br />
10 s. Take 2.5 × 10⁶ beads per coupling reaction.<br />
2. Transfer the bead stock suspension to a Starlab microcentrifuge tube.<br />
3. Briefly centrifuge the bead suspension (a quick spin up to 3000×g is sufficient)<br />
and discard the supernatant.<br />
4. Wash the beads with 80 μl activation buffer. Briefly vortex and centrifuge at<br />
10,000×g for 2 min. Discard the supernatant and repeat washing.<br />
5. Resuspend the beads in 80 μl activation buffer. Sonicate for 15–20 s to yield a<br />
homogeneous bead suspension.<br />
6. Freshly prepare EDC solution (50 mg/ml) and Sulfo-NHS solution (50 mg/ml)<br />
(see Notes 2 and 3).<br />
7. Add 10 μl of EDC solution and 10 μl of Sulfo-NHS solution to the bead suspension.<br />
Incubate for 20 min at room temperature (15–25°C) in the dark.<br />
3.2.2. Coupling of Antibodies to Activated<br />
Carboxylated Beads<br />
8. Dilute the protein stock solution with coupling buffer to a concentration of<br />
100 μg/ml in a volume of 500 μl.<br />
9. Centrifuge the beads at 10,000×g for 2 min and discard the supernatant.<br />
10. Wash the beads with 500 μl of coupling buffer. Briefly vortex and centrifuge at<br />
10,000×g for 2 min. Discard the supernatant and repeat washing.<br />
11. Add the diluted antibody solution (500 μl) from step 8.<br />
12. Wrap the tube in aluminum foil to exclude light. Gently agitate the tube with<br />
activated beads and antibody solution on a plate shaker for 2 h at room temperature<br />
(15–25°C).<br />
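The volumes in the activation and coupling steps above imply a few simple dilutions, which can be sketched as follows. This is only an illustrative calculation: the helper name `dilute` and the antibody stock concentration of 1000 μg/ml are assumptions, not values given in this protocol.

```python
def dilute(stock_conc, target_conc, final_volume):
    """Return (stock_volume, diluent_volume) using C1 * V1 = C2 * V2."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Activation: 80 ul bead suspension + 10 ul EDC + 10 ul Sulfo-NHS (both 50 mg/ml).
activation_volume_ul = 80 + 10 + 10
edc_final_mg_per_ml = 50 * 10 / activation_volume_ul
sulfo_nhs_final_mg_per_ml = 50 * 10 / activation_volume_ul

# Coupling: 100 ug/ml antibody in 500 ul coupling buffer.
ASSUMED_STOCK_UG_PER_ML = 1000  # hypothetical antibody stock concentration
antibody_ul, buffer_ul = dilute(ASSUMED_STOCK_UG_PER_ML, 100, 500)

# Antibody mass offered to the beads in one coupling reaction (ug/ml * ml).
antibody_ug_per_reaction = 100 * 0.5

print(f"EDC and Sulfo-NHS during activation: {edc_final_mg_per_ml} mg/ml each")
print(f"Mix {antibody_ul} ul antibody stock with {buffer_ul} ul coupling buffer")
print(f"Antibody per coupling reaction: {antibody_ug_per_reaction} ug")
```

Under these assumptions each coupling reaction offers 50 μg of antibody to 2.5 × 10⁶ beads; a different stock concentration changes only the pipetting volumes, not this amount.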
3.2.3. Washing and Storage of Coupled<br />
Carboxylated Beads<br />
13. Centrifuge the beads at 10,000×g for 2 min and carefully remove and discard<br />
the supernatant.<br />
14. Wash the beads with 500 μl of washing buffer. Briefly vortex and centrifuge at<br />
10,000×g for 2 min. Discard the supernatant and repeat washing.<br />
15. Resuspend the bead pellet in 1 ml B/S buffer including 0.05% (w/v) azide.<br />
16. Determine the bead concentration of the suspension using a cell-counting<br />
chamber.
3.2.4. Counting Beads Using a Cell-Counting Chamber<br />
1. Add 5 μl of beads to 45 μl of PBS and mix.<br />
2. The hemacytometer is filled with 10 μl of the sample by placing the pipette tip<br />
against the loading “V” of the hemacytometer at a 45° angle. The sample is<br />
slowly released between the slide and the cover slip until the counting chamber<br />
is loaded. It is important to fill both sides of the chamber and wait for 2–3 min<br />
to allow the beads to settle.<br />
3. Count the beads at two opposite corners of the scored chamber and take an average.<br />
Each of the nine squares on the grid has an area of 1 mm², and the coverglass<br />
rests 0.1 mm above the floor of the chamber. Thus, the volume over each<br />
counting square is 0.1 mm³ or 0.1 μl. Multiply the average number of beads<br />
per counted square by 10,000 to obtain the number of beads per milliliter<br />
of diluted sample. Multiply by the dilution factor of 10 to get beads/ml.<br />
4. Store the beads at 25×, typically 5 × 10⁶ beads/ml.<br />
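The counting arithmetic above can be written as a short calculation. A 1 mm² square under a 0.1 mm gap holds 0.1 mm³, that is 0.1 μl, which is where the factor of 10,000 per milliliter comes from. The function name and the example counts are hypothetical and serve only as an illustration.

```python
SQUARES_PER_ML = 10_000  # one 1 mm^2 square x 0.1 mm depth = 0.1 ul = 1/10,000 ml


def beads_per_ml(counts_per_square, dilution_factor=10):
    """Bead concentration of the undiluted suspension from hemacytometer counts.

    counts_per_square: beads counted in each 1 mm^2 square that was scored.
    dilution_factor: 10 for 5 ul of beads added to 45 ul of PBS.
    """
    if not counts_per_square:
        raise ValueError("at least one square count is required")
    mean_count = sum(counts_per_square) / len(counts_per_square)
    return mean_count * SQUARES_PER_ML * dilution_factor


# Hypothetical counts from two opposite corner squares.
stock_concentration = beads_per_ml([52, 48])
print(f"{stock_concentration:.2e} beads/ml")
```

With these example counts the average is 50 beads per square, corresponding to 5 × 10⁶ beads/ml in the undiluted suspension.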
3.3. Processing of Bead-Based Multiplex Assays<br />
3.3.1. Sample Preparation<br />
Here, the preparation of proteins for use in multiplexed assay from clinical<br />
specimens or cell culture is described. Subheading 3.3.1.1 describes the use<br />
of serum or plasma; Subheading 3.3.1.2 describes the analysis of proteins<br />
present in cell culture supernatants; Subheading 3.3.1.3 describes the sample<br />
preparation of cerebrospinal, synovial, and pleural fluids.<br />
3.3.1.1. Serum or Plasma Samples<br />
Serum and plasma samples should be spun down (8000×g) prior to assay<br />
to remove particulates and lipid layers. This prevents blockage of the wash<br />
plate and of the sample needle. The samples should be handled as biohazards<br />
since they may carry infectious agents. Freeze-thaw cycles might result in<br />
a measurable breakdown of some proteins (e.g., cytokines), so the samples<br />
should be aliquoted before any experiment. The storage of aliquoted samples at<br />
–80°C is recommended. When we analyzed eight matched serum and plasma<br />
samples on the Luminex platform, no differences were seen between samples<br />
that underwent a freeze-thaw for levels of TNF, Eotaxin, IL-13, MCP-1, IFN,<br />
IL-12p70, MIP-1, IP-10, or GM-CSF. There was, however, a significant<br />
increase in IL-1 after freeze-thaw, suggesting that this process may liberate<br />
IL-1 from insoluble receptors. IL-1 and MCP-1 levels were significantly<br />
higher in plasma as compared to the matched serum sample. IP-10 was higher in<br />
serum. Figure 2 shows the freeze-thaw experiments to evaluate 10plex soluble<br />
receptor assays. It seemed that signal from some analytes was slightly decreased<br />
after freeze-thaw cycle; however, no statistically significant differences were
[Fig. 2 plot: MFI (log scale, 1 to 10,000) for fresh versus freeze-thawed serum; analytes gp130, ICAM, Fas, TNFRII, VCAM, IL-2R, E-sel, TNFRI, RAGE, MIF]<br />
Fig. 2. Serum samples were drawn from three healthy donors. Each sample was<br />
divided into two parts. One part was measured directly after serum was taken; and the<br />
other part was subjected to a freeze-thaw cycle. Soluble receptors were analyzed using<br />
Luminex technology. There were no significant differences in MFI signals attributed<br />
to the freeze-thaw cycle.<br />
observed. Another important consideration in analyzing serum or plasma<br />
samples is the need for an appropriate buffer (described in Subheading 3.3.2).<br />
3.3.1.2. Cell Culture Samples<br />
Before use, the cell culture supernatants should be centrifuged at 14,000×g<br />
to remove any particulates. The cell culture supernatants can be diluted in their<br />
corresponding cell culture medium. As with serum samples, cell culture<br />
supernatants should be aliquoted and stored frozen at –80°C before any experiment.<br />
3.3.1.3. Cerebrospinal, Synovial, and Pleural Fluids<br />
Precious samples of limited volume such as cerebrospinal fluid (CSF) and<br />
synovial fluid are ideal candidates for multiplex analysis. Animal serum should<br />
be added to synovial fluid to prevent binding of heterophilic antibodies and<br />
rheumatoid factor (RF), which can cause false positives. For cytokine<br />
assays, the samples may be filtered with a 50-kDa filter to remove the interfering<br />
antibodies. Another recently described method uses protein L to remove RF<br />
from serum (8). CSF samples have been analyzed for 22 cytokines using the<br />
Luminex platform, and 11 cytokines were detected (9). The authors performed spike<br />
recovery experiments and describe the recoveries as good.
3.3.2. Diluent<br />
It is important that the diluents selected for reconstitution and dilution of<br />
the standards reflect the environment of the samples being measured. Diluents<br />
for specific sample types have to be validated prior to use. For analyzing cell<br />
culture samples, the standards and samples are diluted in the respective cell<br />
culture medium. It is important to use the same lot of fetal bovine serum (FBS)<br />
as there may be significant differences between lots, which can interfere with<br />
the assay. Another factor to control is the pH of the sample, which will affect<br />
antibody binding. For assaying serum samples, each laboratory should develop<br />
and validate an appropriate diluent. We suggest starting with PBS supplemented<br />
with 10–50% animal serum (e.g., fetal calf serum, horse serum or goat serum,<br />
depleted human serum). The goal is to mimic the serum matrix to ensure similar<br />
binding kinetics in both serum and standard samples. The serum samples may<br />
also require dilution with small amounts of serum to prevent false positives,<br />
as some human antibodies may show reactivity toward the mouse captures.<br />
Generally, 1–2% of each species of antibodies is sufficient. The serum diluent<br />
must not be used to dilute the detection antibody or the streptavidin-PE.<br />
3.3.3. Detection Antibody<br />
The concentration of detection antibody used can be varied to create<br />
an immunoassay with different sensitivity and dynamic range. The authors<br />
typically use detection antibody at a concentration between 0.5 μg/ml and<br />
1.0 μg/ml. Optimization is necessary. The quantitative range of the assay can<br />
be shifted by changing the antibody concentration. The dilution of the detection<br />
antibody shifts the standard curve to the lower concentration range, whereas an<br />
increased concentration shifts the curve to the higher concentration range.<br />
3.3.4. General Protocol for Processing Bead-Based Multiplex Assays for<br />
the Determination of Proteins in Human<br />
1. Centrifuge the sample at 14,000×g to precipitate any particulates before diluting<br />
into appropriate diluent. The dilution factors will vary depending on sample type<br />
and concentration of analyte.<br />
2. Resuspend the standard into appropriate diluent and prepare an eight-point<br />
standard curve using twofold serial dilutions.<br />
3. Wet filter plate with 100 μl assay buffer.<br />
4. Plate fitting: Add 50 μl of the standard or sample to each well.<br />
5. Sonicate the coupled beads for 15–20 s to yield a homogeneous suspension.<br />
Thoroughly vortex the beads for at least 10 s.<br />
6. Dilute the beads to 1500 beads per well, and add 25 μl of diluted bead suspension<br />
to each well.
7. Incubate for 2 h in the dark at room temperature (see Note 4).<br />
8. Washing step: Apply vacuum manifold to the bottom of filter plate to remove<br />
liquid. Wash by adding 100 μl of assay buffer. Repeat washing twice. Resuspend<br />
the beads in 75 μl of assay buffer.<br />
9. Add 25 μl of the detection antibody solution to each well.<br />
10. Incubate for 1.5 h in the dark at room temperature.<br />
11. Washing step: Apply vacuum manifold to the bottom of filter plate to remove<br />
liquid. Wash by adding 100 μl of assay buffer. Repeat washing twice. Resuspend<br />
the beads in 75 μl of assay buffer.<br />
12. Add 25 μl of Streptavidin-Phycoerythrin solution to each well.<br />
13. Incubate for 0.5 h in the dark at room temperature.<br />
14. Washing step: Apply vacuum manifold to the bottom of filter plate to remove<br />
liquid. Wash by adding 100 μl of assay buffer. Repeat washing twice. Resuspend<br />
the beads in 125 μl of assay buffer.<br />
15. Incubate on a plate shaker for 1 min.<br />
16. Read the results on Luminex 100 instrument.<br />
17. Data evaluation: We recommend extrapolating the sample concentrations from<br />
a 4-PL or 5-PL curve.<br />
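As an illustration of the data evaluation step, the following sketch shows the four-parameter logistic (4-PL) model and its inverse, which is what is used to read a sample concentration off a fitted standard curve. The parameter values are hypothetical placeholders. In practice they come from fitting the eight-point standard curve with a curve-fitting routine, which is not shown here.

```python
def four_pl(conc, a, b, c, d):
    """4-PL response.

    a: signal at zero concentration, d: signal at infinite concentration,
    c: inflection point (same units as conc), b: slope factor.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def inverse_four_pl(signal, a, b, c, d):
    """Back-calculate the concentration that gives `signal` on the curve."""
    low, high = min(a, d), max(a, d)
    if not low < signal < high:
        raise ValueError("signal lies outside the curve asymptotes")
    return c * ((a - d) / (signal - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters (MFI versus pg/ml), for illustration only.
params = dict(a=20.0, b=1.2, c=800.0, d=12000.0)

sample_mfi = four_pl(250.0, **params)          # simulate a sample at 250 pg/ml
sample_conc = inverse_four_pl(sample_mfi, **params)
print(f"MFI {sample_mfi:.1f} corresponds to {sample_conc:.1f} pg/ml")
```

Signals at or beyond the asymptotes cannot be converted to a concentration, so the inverse function rejects them rather than returning a misleading value.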
3.3.5. Screening Protocol: 10plex Soluble Receptor Assay for Serum<br />
Samples<br />
1. Resuspend the standard into appropriate diluent and prepare an eight-point<br />
standard curve using twofold serial dilutions.<br />
2. Block the plate with 100 μl B/S buffer (1% BSA in PBS).<br />
3. Beads: 1500 beads of each colored code.<br />
4. Prepare an eight-point standard row mixture in 10% horse serum in B/S buffer<br />
by 1:2 serial dilutions. The highest concentration (ng/mL) used in the standard<br />
curves is shown in the following table:<br />
Molecule (highest standard, ng/mL): IL-2R (2), E-Selectin (6), ICAM (5), Fas (1), gp130 (2), TNFRI (0.8), TNFRII (1.5), RAGE (2), VCAM (5), MIF (4)<br />
5. Prepare the samples by 1:10 dilution in B/S buffer.<br />
6. Add 30 μl beads and 30 μl sample (or standard) into the wells.<br />
7. Incubate and shake for 1.5 h at room temperature.<br />
8. Wash 3×, each time with 100 μl PBS.<br />
9. Prepare the detection antibody mixture in B/S buffer as shown below:<br />
Detection antibody against (μg/mL): IL-2R (0.4), E-Selectin (1), ICAM (0.4), Fas (0.4), gp130 (1), TNFRI (1), TNFRII (0.6), RAGE (0.8), VCAM (0.8), MIF (0.8)
10. Add 30 μl detection antibody mixture to each well, incubate, and shake for 1 h<br />
at room temperature.<br />
11. Wash 3×, each time with 100 μl PBS.<br />
12. Prepare Streptavidin-PE solution (5 μg/mL) in B/S buffer and pipette 30 μl to<br />
each well.<br />
13. Incubate and shake for 30 min at room temperature.<br />
14. Wash 3×, each time with 100 μl PBS.<br />
15. Resuspend the beads in 100 μl B/S buffer.<br />
16. Read the data in Luminex100.<br />
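The eight-point standard row used in this screening protocol can be tabulated from the highest standard concentrations listed in step 4. The sketch below simply applies the 1:2 serial dilution to each analyte; the dictionary and function names are illustrative.

```python
# Highest standard concentrations (ng/mL) as listed in step 4 of the protocol.
TOP_STANDARD_NG_PER_ML = {
    "IL-2R": 2, "E-Selectin": 6, "ICAM": 5, "Fas": 1, "gp130": 2,
    "TNFRI": 0.8, "TNFRII": 1.5, "RAGE": 2, "VCAM": 5, "MIF": 4,
}


def serial_dilution(top, points=8, factor=2):
    """Concentrations of a serial dilution, starting at `top`."""
    return [top / factor ** i for i in range(points)]


standard_rows = {
    name: serial_dilution(top) for name, top in TOP_STANDARD_NG_PER_ML.items()
}

for name, row in standard_rows.items():
    print(name, ", ".join(f"{value:.4g}" for value in row))
```

The lowest point of each row is the top concentration divided by 128, for example about 0.047 ng/mL for E-Selectin.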
3.4. Validation of Analytical Performance of Miniaturized<br />
Multiplexed Protein Assays<br />
3.4.1. Accuracy<br />
Accuracy is expressed by the closeness of the measured value to the true<br />
value. It should be assessed using a minimum of five determinations over a<br />
minimum of three concentrations across the expected range of the assay. A<br />
deviation of up to 15% of the measured value from the true value is acceptable. Several<br />
methods for estimating accuracy are available:<br />
1. by comparing the measured analyte values with those of reference data;<br />
2. by adding known quantities of the analyte into an appropriate test matrix (e.g.,<br />
serum, plasma). Then, the recovery is expressed as the measured analyte concentration<br />
relative to the background concentration of the matrix plus the added analyte concentration.<br />
The recovery (%) is calculated as follows:<br />
$$\text{Recovery (\%)} = \frac{\text{Measured analyte concentration}}{\text{Background analyte concentration in test matrix} + \text{Added analyte concentration}} \times 100$$
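A spike recovery can be computed directly from this definition. The sketch below also applies the 15% deviation limit mentioned in this section; the function names and the example concentrations are hypothetical.

```python
def recovery_percent(measured, background, added):
    """Recovery (%) = measured / (background in test matrix + added) * 100."""
    expected = background + added
    if expected <= 0:
        raise ValueError("expected concentration must be positive")
    return measured / expected * 100.0


def within_tolerance(recovery, tolerance_percent=15.0):
    """True if the recovery deviates from 100% by no more than the tolerance."""
    return abs(recovery - 100.0) <= tolerance_percent


# Hypothetical spike experiment, all values in pg/ml.
recovery = recovery_percent(measured=540.0, background=100.0, added=500.0)
print(f"Recovery: {recovery:.1f}% (acceptable: {within_tolerance(recovery)})")
```

In this example 540 pg/ml is measured where 600 pg/ml is expected, giving a recovery of 90%, which lies within the 15% limit.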
3.4.2. Selectivity<br />
Selectivity can be assessed by performing cross-reactivity experiments in which<br />
the multiplex assay is performed with each of the standards assayed separately.<br />
This will ensure that the capture antibody is selective for its respective analyte<br />
only in the assay.<br />
3.4.3. Specificity<br />
Specificity is defined by the ability of an assay to measure unequivocally the<br />
amount of an analyte in the presence of interfering substances. Non-specificity<br />
might be derived from cross-reactivity of the antibody used in the assay with<br />
other proteins or antibodies present in the sample.
3.4.4. Precision<br />
Precision is expressed by the closeness of agreement between a series of<br />
repeated measurements. It should be assessed using a minimum of five determinations<br />
over a minimum of three concentrations across the expected range<br />
of the assay. The coefficient of<br />
variation (CV) should not exceed 15%.<br />
3.4.4.1. Repeatability<br />
Intra-assay precision, or repeatability, expresses the precision under constant<br />
conditions. The measurements are performed within 1 day by the same analyst<br />
using identical reagents and the same instruments.<br />
3.4.4.2. Reproducibility<br />
Inter-assay precision, or reproducibility, expresses the precision by changing<br />
the measurement conditions, which may involve different analysts, reagents,<br />
instruments, and laboratories.<br />
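The precision criteria can be checked with a short calculation of the coefficient of variation (CV) per concentration level. This is a minimal sketch: the function names and replicate values are hypothetical, and the sample standard deviation (n − 1) is assumed because the text does not specify which estimator to use.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    if len(values) < 2:
        raise ValueError("at least two determinations are required")
    return stdev(values) / mean(values) * 100.0


def precision_report(replicates_by_level, max_cv=15.0, min_replicates=5, min_levels=3):
    """Return {level: (CV %, passes)} after checking the minimum design."""
    if len(replicates_by_level) < min_levels:
        raise ValueError("too few concentration levels")
    report = {}
    for level, values in replicates_by_level.items():
        if len(values) < min_replicates:
            raise ValueError(f"too few determinations at level {level!r}")
        cv = cv_percent(values)
        report[level] = (cv, cv <= max_cv)
    return report


# Hypothetical replicate measurements at three concentration levels.
intra_assay = {
    "low": [100, 102, 98, 101, 99],
    "mid": [1000, 1100, 900, 1050, 950],
    "high": [5000, 6500, 3500, 5500, 4500],
}
report = precision_report(intra_assay)
for level, (cv, ok) in report.items():
    print(f"{level}: CV = {cv:.1f}% ({'pass' if ok else 'fail'})")
```

The same function can be applied to intra-assay data (replicates from one run) or to inter-assay data (results collected across days, analysts, or instruments); only the grouping of the input values differs.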
3.4.5. Limits of Detection and Quantitation (see Note 5)<br />
3.4.5.1. Detection Limit<br />
The limit of detection (LOD) is the lowest amount of analyte in a sample<br />
that can be detected but not quantitated as an exact value. According to IUPAC<br />
definition (2), the limit of detection is estimated as the mean of the zero<br />
standard signal plus three times the standard deviation (SD) obtained on the<br />
zero standard signal:<br />
$$\mathrm{LOD} = \mathrm{Mean}_{\text{zero standard}} + 3 \times \mathrm{SD}_{\text{zero standard}}$$<br />
3.4.5.2. Quantitation Limit<br />
The limit of quantitation (LOQ) is the lowest amount of analyte in a sample<br />
that can be quantitated with acceptable statistical significance. According to<br />
IUPAC definition, the limit of quantitation is estimated as the mean of the zero<br />
standard signal plus 10 times the SD obtained on the zero standard signal:<br />
$$\mathrm{LOQ} = \mathrm{Mean}_{\text{zero standard}} + 10 \times \mathrm{SD}_{\text{zero standard}}$$<br />
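Both limits follow from replicate measurements of the zero standard, as in the sketch below. The replicate signals are hypothetical, and the sample standard deviation (n − 1) is an assumption. The results are in signal units (MFI) and still have to be converted to a concentration via the standard curve.

```python
from statistics import mean, stdev


def detection_limits(zero_standard_signals):
    """Return (LOD, LOQ) as mean + 3 SD and mean + 10 SD of the zero standard."""
    if len(zero_standard_signals) < 2:
        raise ValueError("at least two zero-standard replicates are required")
    zero_mean = mean(zero_standard_signals)
    zero_sd = stdev(zero_standard_signals)
    return zero_mean + 3 * zero_sd, zero_mean + 10 * zero_sd


# Hypothetical MFI readings of the zero standard.
lod, loq = detection_limits([10, 12, 11, 9, 13])
print(f"LOD = {lod:.2f} MFI, LOQ = {loq:.2f} MFI")
```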
3.4.6. Linearity<br />
Linearity is defined as the ability of an analytical procedure to produce<br />
signals that are directly proportional to the analyte concentration of the sample.
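One common way to examine linearity is an ordinary least-squares fit of signal against concentration over the candidate range, reporting the coefficient of determination. The sketch below is only an illustration with made-up data. Immunoassay curves are typically sigmoidal, so a check of this kind would apply to a restricted concentration range (often after log transformation), which is an assumption rather than a procedure given in this chapter.

```python
def linear_fit(x, y):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical concentrations (pg/ml) and signals (MFI) within the working range.
slope, intercept, r_squared = linear_fit([100, 200, 400, 800], [52, 98, 205, 395])
print(f"slope = {slope:.3f}, intercept = {intercept:.1f}, R^2 = {r_squared:.4f}")
```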
3.4.7. Range<br />
The range of an analytical procedure is defined by the interval between the<br />
upper and lower amounts of analyte within which the analyte can be detected<br />
with a suitable level of accuracy, precision, and linearity.<br />
3.4.8. Robustness<br />
Robustness expresses the extent to which the measured values remain<br />
unaffected by small variations in method parameters like temperature, reagent<br />
concentration, or instrumental parameters. It indicates the reliability of an<br />
analytical procedure during normal usage. Figure 3 shows the standard<br />
curves of the 10plex soluble receptor assay. The data demonstrate the feasibility<br />
and robustness of the assays.<br />
3.5. Pattern Generation<br />
After optimization of the assays, screening jobs can be performed, and<br />
huge amounts of data will be generated. To deal with high-dimensional<br />
[Fig. 3 plot: 10plex soluble receptors assay; MFI (log scale, 1 to 10,000) versus concentration (pg/ml; log scale, 10 to 100,000); curves for MIF, VCAM, RAGE, TNFRII, TNFRI, gp130, Fas, ICAM, IL-2R, E-sel]<br />
Fig. 3. The standard curves of 10plex soluble receptors assay were plotted according<br />
to average MFI readings from several individual measurements; standard deviation bars<br />
were included. The data reflected the range of the linearity and also the robustness of<br />
the assays.
data sets, bioinformatic tools are available. For example,<br />
clustering analysis that distinguishes different diseases or symptoms<br />
of diseases can lead to useful taxonomies, and correct diagnosis of clusters<br />
of symptoms is essential for successful therapy in the field of<br />
medicine.<br />
Table 1 summarizes the main features of CIMminer (Clustered Image<br />
Maps) (13) and MeV (MultiExperiment Viewer) (14), two platforms that<br />
can both be applied for the purposes mentioned above. Unsupervised hierarchical<br />
clustering analysis can be performed using the online tool CIMminer<br />
developed by the National Cancer Institute. MeV is a more integrated<br />
freeware package, which was developed by TIGR (The Institute for Genomic Research).<br />
It offers 23 analysis modules. Its common clustering<br />
methods, such as HCL (Hierarchical clustering) and ST (Support<br />
Trees), and statistical methods such as TTEST (T-tests), SAM (Significance Analysis<br />
of Microarrays), ANOVA (Analysis of Variance), and TFA (Two-factor<br />
ANOVA) can help users discover significant parameters based on statistical<br />
analysis. More sophisticated techniques can also be applied, including PCA<br />
(Principal Components Analysis), SOTA (Self Organizing Tree Algorithm),<br />
RN (Relevance Networks), KMC (K-Means/K-Medians Clustering), KMS (K-<br />
Means/K-Medians Support), CAST (Clustering Affinity Search Technique),<br />
QTC (QT CLUST), SOM (Self Organizing Maps), GSH (Gene Shaving),<br />
FOM (Figures of Merit), PTM (Template Matching), SVM (Support Vector<br />
Machines), KNNC (K-Nearest-Neighbor Classification), DAM (Discriminant<br />
Analysis Module), COA (Correspondence Analysis), TRN (Expression Terrain<br />
Maps), and EASE (Expression Analysis Systematic Explorer).<br />
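To illustrate what unsupervised hierarchical clustering does with multiplex data, the following sketch groups samples by their analyte profiles using Euclidean distance and average linkage. It is a generic textbook formulation written for this illustration, not the implementation used by CIMminer or MeV, and the profiles are hypothetical log-transformed MFI values.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two analyte profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    distances = [
        euclidean(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b
    ]
    return sum(distances) / len(distances)


def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of samples")
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(cluster) for cluster in clusters]


# Hypothetical log10(MFI) profiles: rows are samples, columns are analytes.
profiles = [
    [2.0, 3.1, 1.0],
    [2.1, 3.0, 1.1],
    [3.5, 1.2, 2.8],
    [3.4, 1.0, 3.0],
]
groups = hierarchical_clusters(profiles, n_clusters=2)
print(groups)
```

In this example samples 0 and 1 have similar profiles, as do samples 2 and 3, so the procedure returns two groups. Dedicated tools additionally draw the dendrogram and the color-coded image map.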
Table 1<br />
Comparison of the Main Features in CIMminer and MeV<br />
Feature | CIMminer | MeV<br />
Contributor | NCI | TIGR<br />
Analysis platform | Web-based (http://discover.nci.nih.gov/cimminer/) | Off-line / Free software (http://www.tm4.org/mev.html)<br />
Input file | ".txt", ".zip" | ".txt", ".mev", ".tav", ".gpr"<br />
Order Algorithm | More | Less<br />
Statistical analysis | No | Yes, significant parameters could be found out<br />
Results | Color-coded Image | Color-coded Image<br />
Reference | Science 1997; 275:343–9 | Biotechniques 2003; 34:374–8
4. Notes<br />
1. This method can also be adapted for coupling reactions of antigens, receptors, or<br />
other proteins.<br />
2. Minimize the exposure of EDC and Sulfo-NHS to air, and close containers tightly.<br />
Use fresh aliquots for each coupling reaction and discard after use.<br />
3. Sulfo-NHS solution (50 mg/ml) can be prepared and stored at –20°C.<br />
4. Incubation time can be varied. The authors typically incubate between 30 min and<br />
2 h. The primary incubation of the bead and sample can be performed overnight<br />
at 4°C for greater low-end sensitivity.<br />
5. The detection limit is primarily dependent on the quality of the antibodies<br />
used. Additionally, the detection limit is influenced by detection conditions (e.g.,<br />
antibody concentration, incubation time), complexity of the multiplex assay, and<br />
matrix proteins.<br />
References<br />
1. Morgan, E., Varro, R., Sepulveda, H., Ember, J.A., Apgar, J., Wilson, J., Lowe, L.,<br />
Chen, R., Shivraj, L., Agadir, A., Campos, R., Ernst, D., Gaur, A. (2004)<br />
Cytometric bead array: a multiplexed assay platform with applications in various<br />
areas of biology. Clin Immunol, 110, 252–66<br />
2. Dasso, J., Lee, J., Bach, H., Mage, R.G. (2002) A comparison of ELISA and<br />
flow microsphere-based assays for quantification of immunoglobulins. J Immunol<br />
Methods, 263, 23–33<br />
3. Carson, R.T., Vignali, D.A. (1999) Simultaneous quantitation of 15 cytokines using<br />
a multiplexed flow cytometric assay. J Immunol Methods, 227, 41–52<br />
4. Dunbar, S.A., Vander Zee C.A., Oliver, K.G., Karem, K.L., Jacobson, J.W. (2003).<br />
Quantitative, multiplexed detection of bacterial pathogens: DNA and protein applications<br />
of the Luminex LabMAP system. J Microbiol Methods, 53, 245–52<br />
5. Joos, T.O., Stoll, D., Templin, M.F. (2002) Miniaturised multiplexed immunoassays.<br />
Curr Opin Chem Biol, 6, 76–80<br />
6. Prabhakar, U., Eirikis, E., Davis, H.M. (2002) Simultaneous quantification of<br />
proinflammatory cytokines in human plasma using the LabMAP assay. J Immunol<br />
Methods, 260, 207–18<br />
7. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-<br />
Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A., Downing, J.R., Jacks, T.,<br />
Horvitz, H.R., Golub, T.R. (2005) MicroRNA expression profiles classify human<br />
cancers. Nature, 435, 834–8<br />
8. de Jager, W., Prakken, B.J., Bijlsma, J.W., Kuis, W., Rijkers, G.T. (2005) Improved<br />
multiplex immunoassay performance in human plasma and synovial fluid following<br />
removal of interfering heterophilic antibodies. J Immunol Methods, 300, 124–35<br />
9. Natelson, B.H., Weaver, S.A., Tseng, C.L., Ottenweller, J.E. (2005) Spinal fluid<br />
abnormalities in patients with chronic fatigue syndrome. Clin Diagn Lab Immunol,<br />
12, 52–5
10. Findlay, J.W., Smith, W.C., Lee, J.W., Nordblom, G.D., Das, I., DeSilva, B.S.,<br />
Khan, M.N., Bowsher, R.R. (2000) Validation of immunoassays for bioanalysis: a<br />
pharmaceutical industry perspective. J Pharmaceutical Biomed Anal, 21, 1249–73<br />
11. Sanchez-Carbayo, M. (2006) Antibody arrays: technical considerations and clinical<br />
applications in cancer. Clin Chem, 52, 1651–9<br />
12. Kingsmore, S.F. (2006) Multiplexed protein measurement: technologies and applications<br />
of protein and antibody arrays. Nat Rev Drug Discov, 5, 310–20<br />
13. Weinstein, J.N., Myers, T.G., O’Connor, P.M., et al. (1997) An information-intensive
approach to the molecular pharmacology of cancer. Science, 275, 343–9
14. Saeed, A.I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J.,<br />
Klapa, M., Currier, T., Thiagarajan, M., Sturn, A., Snuffin, M., Rezantsev, A.,<br />
Popov, D., Ryltsov, A., Kostukovich, E., Borisovsky, I., Liu, Z., Vinsavich, A.,<br />
Trush, V., Quackenbush, J. (2003). TM4: a free, open-source system for microarray<br />
data management and analysis. Biotechniques, 34(2), 374–8.
15<br />
Dissecting Cancer Serum Protein Profiles Using<br />
Antibody Arrays<br />
Marta Sanchez-Carbayo<br />
Summary<br />
Antibody arrays represent one of the high-throughput techniques enabling detection<br />
of multiple proteins simultaneously. One of the main advantages of the technology over
other proteomic approaches is that the identities of the measured proteins are
known at the outset of the experimental design or can be readily characterized, which facilitates
biological interpretation of the results. This chapter reviews the technical issues
of the main antibody array formats as well as various applications using serum specimens<br />
in the context of neoplastic diseases. Clinical applications of antibody arrays vary from<br />
biomarker discovery for diagnosis, prognosis, and drug response to characterization of<br />
protein pathways and modification changes associated with disease development and
progression. As a high-throughput tool addressing protein levels and post-translational<br />
modifications, it improves the functional characterization of molecular bases for cancer.<br />
Furthermore, the identification and validation of protein expression patterns characteristic<br />
of cancer progression and tumor subtypes may enable tailored therapeutic intervention and<br />
improvement in the clinical management of cancer patients. Technical features such as
low sample volume and antibody concentration requirements, format versatility, and high reproducibility
support their increasing impact in cancer research.
Key Words: antibody arrays; protein profiling; serum; direct labeling.<br />
1. Introduction<br />
1.1. Antibody Arrays in the Context of Other Proteomic Strategies<br />
Two main proteomic strategies, untargeted and targeted, can be used to investigate the
cancer proteome. The terminology refers to
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
whether the proteins to be measured are unknown and identified in the course of an
untargeted proteomic approach, or known and built into the experimental
design for targeted strategies. Untargeted architecture platforms are best suited
for first-pass comparisons of proteomes to identify relatively few, novel, or<br />
known proteins that exhibit the greatest differences in abundance. The two<br />
most commonly used technologies are two-dimensional electrophoresis (2D)<br />
and low- and high-resolution mass spectrometry (1,2,3). Targeted architecture<br />
proteomic platforms measure and quantify proteins of interest identified previously,<br />
and are suited for analyses of quantitative differences in abundance<br />
among known protein families and pathways. The versatility of targeted
platforms makes it possible to control and estimate reproducibility, scalability, and
quantitative precision, leading to high sensitivity and coverage. This approach
allows experimental designs to address specific hypotheses and supports biological interpretation
of the results obtained. However, the number of proteins amenable
to these analyses depends on the availability of antibodies with high affinity
and specificity to bind a target protein. The main targeted techniques used for<br />
large-scale analysis of many samples and proteins include protein microarrays,<br />
multiplexed Western blots, and tissue arrays. Protein arrays represent the most<br />
versatile among the proteomics techniques available to date, since antigens,<br />
peptides, complex protein solutions, or antibodies can be immobilized to<br />
capture and quantify the presence of specific antibodies or proteins, respectively<br />
(1,2,3,4).<br />
1.2. Antibody Array Formats<br />
Innovation in the immobilization surfaces and detection strategies has led<br />
to an increasing number of planar antibody array technologies and bead-based<br />
versions. Planar antibody arrays represent the most common type of protein<br />
arrays, which is the major focus of the present chapter. This section describes<br />
the main formats of planar arrays covering their differences with bead-based<br />
assays (Fig. 1; for bead-based arrays, see also Chapter 14).<br />
The main planar label-based types comprise one-antibody assays (using<br />
one antibody to capture the target molecule) and sandwich assays (using two<br />
antibodies to capture the target protein) (1,2,3,4). One-antibody and sandwich
assays each have advantages and drawbacks relative to the other. In one-antibody label-based
assays, the targeted proteins are captured by an immobilized antibody
and detected through labeling with a tag (Fig. 1A). In direct labeling, the
proteins are labeled with a fluorophore, such as cyanines (Cy3 or Cy5). In<br />
indirect labeling, the proteins are labeled with a tag that is later detected by a<br />
labeled antibody. One-antibody label-based assays allow the incubation of two<br />
different samples, each labeled with a different tag on the arrays. Normalization<br />
is facilitated by co-incubating a reference sample with a test sample (1,2,3,4).
[Figure 1, panels A–D. Labels recoverable from the figure: Antibody-based arrays; Antigen-based arrays; Direct; Indirect; Competitive; Cy3; Cy5; Biotin; Digoxigenin; TSA; Reverse phase; Complex lysate; Tumor-associated antigen arrays; Suspension: bead based; RCA, RLS, ECL; TSA, Bio-SA-Cy3; Whole cell; Membrane; Soluble; Autoantibody, e.g., anti-p53; Tumor antigen, e.g., p53.]
Fig. 1. Main formats of planar and suspension protein arrays. RCA: rolling-circle
amplification; RLS: resonance light scattering; ECL: enhanced chemiluminescence;
TSA: tyramide signal amplification; SA: streptavidin.
Another benefit is that these assays are competitive, since the analytes in the
test and reference solutions compete for binding to the antibodies (1,2,3,4).
This improves the linearity of response and the dynamic range
compared to non-competitive assays (4). The main disadvantage is that
the label may disrupt the analyte–antibody interaction, which may limit
detection as well as sensitivity and specificity.
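The two-color reference design described above is commonly summarized as a per-spot log ratio of test to reference signal. The following is a minimal sketch, assuming background-subtracted Cy5 (test) and Cy3 (reference) intensities; the antibody names, intensity values, and the `floor` parameter are hypothetical and are not taken from the chapter.

```python
import math

def log2_ratio(test_signal, reference_signal, floor=1.0):
    """Return log2(test / reference) for one spot.

    `floor` is a hypothetical lower bound that keeps background-subtracted
    signals positive so that the logarithm is defined.
    """
    test = max(test_signal, floor)
    ref = max(reference_signal, floor)
    return math.log2(test / ref)

# Hypothetical background-subtracted intensities: (Cy5 test, Cy3 reference)
spots = {
    "antibody_A": (4000.0, 1000.0),
    "antibody_B": (500.0, 2000.0),
    "antibody_C": (800.0, 800.0),
}
ratios = {name: log2_ratio(t, r) for name, (t, r) in spots.items()}
```

A positive value indicates more signal in the test sample than in the co-incubated reference, and a negative value indicates less.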
In the sandwich label-based format, antibodies capture unlabeled proteins,<br />
which are detected by another antibody using several methods to generate the<br />
signal for detection (Fig. 1B). The use of two antibodies targeting each analyte<br />
increases the specificity as compared to one-antibody label-based assays. The<br />
reduced background of these assays also increases the sensitivity. The sandwich
format allows only non-competitive assays, since only one sample can be
incubated on each array (1,2,3,4). This results in a sigmoidal binding response,
as compared to the linear one in the competitive format, and requires standard
curves of known concentrations of analytes to achieve accurate calibration of<br />
concentrations (4). As compared to one-antibody label-based assays, sandwich<br />
assays are more difficult to develop in a multiplexed manner, since matched<br />
pairs of antibodies and purified antigens may not be available for each target,<br />
and the potential cross-reactivity among detection antibodies increases with<br />
additional analytes (2,4). Currently, the practical size of multiplexed sandwich
assays is limited to 30–50 different targets (1,2,3,4). This contrasts with one-antibody
assays, where only the availability of antibodies and space on the
substrate limit the number of targets being analyzed.
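The sigmoidal response and standard-curve calibration mentioned for sandwich assays can be illustrated with a four-parameter logistic (4PL) curve. The 4PL model is a common choice for immunoassay calibration, but it is an assumption here, since the chapter does not name a curve model. The parameter values below are hypothetical; in practice they would be fitted to standards of known concentration.

```python
def four_pl(conc, bottom, top, ec50, hill):
    """Signal predicted by a four-parameter logistic curve at `conc` (conc > 0)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)

def inverse_four_pl(signal, bottom, top, ec50, hill):
    """Concentration that yields `signal`; defined only for bottom < signal < top."""
    if not bottom < signal < top:
        raise ValueError("signal outside the calibrated range")
    return ec50 / ((top - bottom) / (signal - bottom) - 1.0) ** (1.0 / hill)

# Hypothetical fitted parameters for one analyte (signal units, pg/ml)
params = dict(bottom=50.0, top=10050.0, ec50=100.0, hill=1.2)

signal_at_25 = four_pl(25.0, **params)
recovered = inverse_four_pl(signal_at_25, **params)
```

Signals outside the span between `bottom` and `top` cannot be converted to a concentration, which is why the inverse function rejects them.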
In addition to the planar arrays, suspension or bead-based arrays use<br />
different fluorescent beads, each coated with a different antibody and spectrally<br />
resolvable from each other [(5,6,7,8,9) and see chapter 14]. The beads are<br />
incubated with a sample to allow protein binding to the capture antibodies, and<br />
the mixture is incubated with a cocktail of detection antibodies, each corresponding<br />
to one of the capture antibodies. The detection antibodies are tagged<br />
to allow fluorescent detection. The beads are passed through a flow cytometer<br />
system, and each bead is probed by two lasers, one to read the color or identity
of the bead, and another to read the amount of detection antibody on the
bead (5,6,7,8,9). Multiplexed bead-based flow-cytometry assays represent an<br />
active area of development. Differentially identifiable beads coated with either<br />
proteins, autoantigens, or antibodies can identify a variety of bound antibodies<br />
or proteins using a cytometer system (5,6,7,8,9). Advances in instrumentation<br />
and bead chemistries will probably make this approach very valuable for the<br />
detection of circulating cancer cells in clinical practice. In another version<br />
of this concept, suspensions of cells can be incubated on antibody arrays,<br />
and the number of cells bound to each antibody can be quantified by dark-field
microscopy. These arrays have the potential to characterize multiple
membrane proteins in specific cell populations or changes in cell surfaces<br />
induced by drug therapies.<br />
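The two-laser readout of bead-based assays described above yields, for each bead, an identity and a reporter signal. A minimal sketch of how such events might be summarized per analyte follows. The event format, the analyte names, and the use of the median as the summary statistic are assumptions made for illustration, not details given in the chapter.

```python
from collections import defaultdict
from statistics import median

def summarize_beads(events):
    """Group (bead_id, reporter_signal) events and return the median signal per bead_id."""
    by_bead = defaultdict(list)
    for bead_id, reporter in events:
        by_bead[bead_id].append(reporter)
    return {bead_id: median(values) for bead_id, values in by_bead.items()}

# Hypothetical events: the first laser gives the bead identity,
# the second laser gives the detection-antibody (reporter) signal.
events = [
    ("IL-6", 120.0),
    ("IL-8", 900.0),
    ("IL-6", 100.0),
    ("IL-8", 1100.0),
    ("IL-6", 140.0),
]
summary = summarize_beads(events)
```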
It is important to distinguish antibody arrays from two main protein array<br />
formats that can be applied to serum samples based also on the binding of<br />
antibodies to specific antigens. Tumor-associated
antigen (TAA) arrays are designed to enhance the detection of autoantibodies against TAAs
for cancer diagnosis (Fig. 1C). The rationale is that
sera from cancer patients contain antibodies that react with a unique group of autologous
cellular antigens, or TAAs (10,11). Complex protein extracts can also be spotted
onto membranes and probed with antibodies targeting specific proteins on the<br />
so-called reverse-phase arrays (12,13) (Fig. 1D).<br />
1.3. Types of Planar Antibody Arrays Based on the<br />
Labeling-Hybridization Methods<br />
The growing number of detection modalities has led to several types and applications
for antibody arrays (see Note 1). A number of labeling and detection<br />
methods can be employed for one-antibody and sandwich label-based planar<br />
arrays (Fig. 2). The signal can be generated by a fluorescently labeled detection<br />
antibody (Fig. 2A). This approach represents the standard sandwich arrays,
[Figure 2, panel titles: (A) antibody direct, sandwich; (B) species-specific tertiary antibody; (C) biotinylated antibodies with fluorescent streptavidin conjugates; (D) two SAPE layers; (E) tyramide signal amplification; (F) alkaline phosphatase linked to a species tertiary antibody, activated chemiluminescence; (G) rolling-circle amplification; (H) resonance light scattering.]
Fig. 2. Several labeling and detection methods can be employed for antibody arrays.
requiring chemical labeling of all secondary detection antibodies, but the<br />
assay is a simple two-step procedure that does not require a separate staining<br />
step (14,15). An alternative approach employs a species-specific fluorescently<br />
labeled tertiary antibody (Fig. 2B). This option avoids the use of large chemically<br />
modified detection antibodies, but limits the species of capture antibodies.<br />
A third option is the utilization of available biotinylated detection antibodies<br />
(Fig. 2C) (15). In these assays, detection occurs after staining of the sandwich<br />
complex with Cy3-labeled streptavidin or other streptavidin variants, such as<br />
Texas Red conjugates or streptavidin-R-phycoerythrin (SAPE) (15). In the fourth
option, the fluorescent signal is further amplified
using a second layer of SAPE coupled to the first layer via an anti-SAPE
antibody (Fig. 2D). Alternatively, in the fifth option, the number of biotin
labels can be increased via tyramide signal amplification (Fig. 2E) (2). An
antibiotin horseradish peroxidase (HRP) conjugate generates a tyramide radical that
cross-links a biotin or a fluorophore to all exposed tyrosine residues of any
protein near the recognition event (2). Chemiluminescence can also be applied
to multiplexed sandwich assays as a sixth possibility (Fig. 2F), using
streptavidin-HRP or a species-specific antibody conjugated with HRP or
alkaline phosphatase, together with chemiluminescent substrates. Chemiluminescence
is typically more sensitive than standard fluorescence applications. A polymer
decorated with streptavidin and europium chelates is utilized not only for
microplate but also for microarray measurements. Evanescence waveguide is<br />
employed as an alternative for ultrasensitive fluorescence (16). Rolling-circle<br />
amplification can be applied as a seventh option for signal generation (Fig. 2G).<br />
The 5′ end of an oligonucleotide primer is attached to an antibiotin antibody
(17). After binding of the antibiotin antibody to the biotinylated detection<br />
antibody of the sandwich, the oligonucleotide is enzymatically extended using a<br />
circular DNA sequence as template. Fluorescently labeled short oligonucleotides are then
hybridized to the extended DNA, decorating each bound antibody with thousands
of fluorophores (15). An alternative, eighth staining method, which yields sensitivity
similar to evanescence wave technology and rolling-circle amplification,
uses colloidal gold particles coated with an antibiotin antibody
(18). Because of resonance light scattering (RLS), these particles scatter white
light very intensely, and quantitative readouts of miniaturized sandwich assays
can be obtained with a simple charge-coupled device (CCD) camera-based
imaging system (18) (Fig. 2H). Unlike fluorescence or chemiluminescence labels, RLS particles
show no photobleaching (14,15,16,17,18,19,20).
Due to the high versatility of labeling-hybridization methods available to<br />
date, the present chapter will describe the detailed reagents and protocol of<br />
direct labeling on serum specimens, as summarized in Figure 3.<br />
1.4. Applications in Cancer Research Using Serum Specimens<br />
Direct labeling methods have been applied for cancer diagnostics to the<br />
detection of proteins in the serum of patients with prostate cancer (21). The<br />
use of a two-color rolling-circle amplification method improves the detection<br />
of low-abundance proteins. This method has also been shown to provide
adequate reproducibility and accuracy for protein profiling on serum specimens<br />
and clinical applications (17,22,23,24). Sandwich assays can also measure<br />
protein abundances in body fluids using detection methods such as RLS (25),<br />
enhanced chemiluminescence (26), tyramide signal amplification (27), and<br />
fluorescence (28).<br />
Reverse-phase protein arrays have also been optimized to spot serum specimens
and obtain high-throughput measurements of IgA in thousands of sera in
a single experiment (29). In another application, a recent report designed antibody
arrays for bladder cancer by selecting antibodies against targets found by gene profiling to be differentially
expressed in bladder tumors (24). Serum
protein profiles obtained by two independent antibody arrays represent comprehensive<br />
means for bladder cancer diagnosis and clinical outcome stratification<br />
(24). Validation analyses with ELISA and immunohistochemistry on<br />
tissue microarrays represent alternative approaches to confirm the relevance of<br />
identified proteins for tumor progression. Such a strategy provides experimental
evidence for the use of several integrated technologies and strengthens the
process of biomarker discovery.
Serum specimens can be utilized to profile the humoral immune signature of<br />
cancer patients to detect both autoantibodies against tumor antigens and secreted<br />
cytokines. The combined detection of antibodies against a group of TAAs has<br />
provided high sensitivity for diagnosis of prostate cancer (10). The use of phage<br />
display arrays can enhance tumor subtype specificity of such measurements<br />
(10,11). Cytokine profiling on serum and plasma specimens can differentiate<br />
cancer patients from control subjects, and can also stratify patients with leukemia
based on clinical outcome. Several reports have also compared the reproducibility<br />
and differences among several technologies available for multiplexing cytokine<br />
measurements, including planar and bead-based antibody arrays (5,6,7).<br />
In summary, antibody arrays can be utilized for the following applications:<br />
(1) the discovery of candidate disease biomarkers (21,24); (2) characterizing<br />
signaling pathways (28), disease progression, clinical subtypes, and<br />
outcomes (21,24); (3) measurement of changes in post-translational modifications<br />
or expression levels of disease-related proteins (28); (4) identifying
binding partners of proteins, which is especially important when conducting
functional studies for drug discovery; and (5) epitope mapping to determine
the regions of proteins that bind specific antibodies.
2. Materials<br />
2.1. Printing of Antibody Arrays<br />
1. Antibodies. A critical step is the selection of the antibodies to be printed onto the<br />
antibody arrays. The antibodies printed on the arrays will be selected based on<br />
their known affinity characterization and experimental design (see Note 2).<br />
2. Antibody purification with Affi-gel Protein A MASP II kit (Bio-Rad, Hercules,<br />
CA).<br />
3. Protein concentration measurements with BCA Protein Assay (Pierce, Rockford,<br />
IL).<br />
4. Fast Slides (Schleicher and Schuell Biosciences, Keene, NH) or HydroGel coated<br />
glass microscope slides (Perkin Elmer Life Sciences, Waltham, MA).<br />
5. Polypropylene 384-well microtiter plates (Genetix, New Milton, Hampshire, UK<br />
or MJ Research, Waltham, MA).<br />
6. Scotch brand aluminum foil sealing tape (R.S. Hughes, Sunnyvale, CA).
7. Printer.<br />
2.2. Labeling and Hybridization of Serum Samples<br />
1. NHS-linked Cy3 and Cy5 protein labeling agents (Amersham, GE Healthcare,<br />
Piscataway, NJ).
2. Microscopic slide staining chamber with slide racks (Shandon Lipshaw, Pittsburgh,<br />
PA).<br />
3. Diamond scribe (VWR, West Chester, PA).<br />
4. Hydrophobic marker (PAP pen, Immunotech, Marseille).<br />
5. Coverslips (Lifterslip, Erie Scientific, Portsmouth, NJ).<br />
6. Wafer handling tweezers (Technitool, West Berlin, NJ).<br />
7. Clinical centrifuge with flat swinging buckets for holding slide racks.<br />
8. Spin columns for protein cleanup (Bio-Rad Micro Bio-Spin P-6).<br />
9. Microcon YM-50 (Millipore, Bedford, MA).<br />
10. Complete protease inhibitors (Roche, Indianapolis, IN).<br />
11. Buffers: phosphate buffered saline (PBS), pH 7.4 (137 mM NaCl, 4.3 mM
Na₂HPO₄, 1.4 mM KH₂PO₄); carbonate buffer, pH 8.5 (50 mM NaHCO₃);
PBST, PBS containing 0.5% (v/v) Tween-20; 0.1 M PBS, pH 7.2 (68.4 ml
1 M Na₂HPO₄, 31.6 ml 1 M NaH₂PO₄, 900 ml dH₂O); NP40 lysis buffer:
50 mM Hepes-OH, EDTA, 50 mM NaCl, 10 mM NaPPi (tetrasodium diphosphate
decahydrate), 50 mM NaF, 1% (v/v) NP40, 10 mM sodium vanadate,
pH 7.5–8.0; saturated NaCl (Sigma); blocking buffer: 1% (w/v) bovine serum
albumin (BSA) in PBST; 7–10 mM dye stock in DMSO: dissolve one tube of
Cy3 or Cy5 dye in 30 μl of DMSO, then aliquot and freeze at –80°C.
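The dye stock concentration can be checked with a short calculation. This is a minimal sketch, assuming the 200 nmol of dye per vial quoted in Section 3.2 and ignoring any volume contributed by the dry dye; the function names are hypothetical.

```python
def stock_concentration_mM(dye_nmol, dmso_ul):
    """Concentration in mM; nmol per µl is numerically equal to mM."""
    if dmso_ul <= 0:
        raise ValueError("solvent volume must be positive")
    return dye_nmol / dmso_ul

def dmso_volume_ul(dye_nmol, target_mM):
    """DMSO volume (µl) needed to bring dye_nmol of dry dye to target_mM."""
    if target_mM <= 0:
        raise ValueError("target concentration must be positive")
    return dye_nmol / target_mM

stock = stock_concentration_mM(200.0, 30.0)   # one vial dissolved in 30 µl DMSO
volume_for_10mM = dmso_volume_ul(200.0, 10.0)  # DMSO needed for a 10 mM stock
```

Under these assumptions, 30 μl of DMSO gives a stock of roughly 6.7 mM, at the lower edge of the stated 7–10 mM range, and a smaller DMSO volume would be needed to reach 10 mM.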
2.3. Detection<br />
1. ScanArray microarray scanner at 543 nm and 633 nm wavelengths (Packard<br />
Bioscience, Research Parkway Meriden, CT).<br />
2. GenePix Pro 3.0 (Axon Instruments, Union City, CA) software program employed<br />
to quantify the image data.<br />
3. Methods<br />
The overall process of setting
up custom-made antibody arrays comprises three main steps: antibody array construction, sample labeling
and hybridization onto the antibody array, and scanning and data analysis. The
success of the whole process depends greatly on the availability of high-quality
antibodies for capturing the target proteins, as well as on serum samples that are
well handled, preserved, and characterized.
3.1. Antibody Array Construction<br />
1. Select the antibodies (see Note 2).<br />
2. Purify the antibodies (see Note 3).<br />
3. Keep the antibodies stable and quantify them (see Notes 4–7).
4. Prepare the printing plate with antibodies. Pipette 5–7 μl of antibody solution into each
well of a 384-well plate (see Note 8).
5. Prepare slides for printing (see Note 9).
For nitrocellulose slides, no preparation is needed (see Note 9).<br />
For hydrogel slides: The hydrogel slides should be prepared just before use<br />
(i.e., only when you are ready to print the arrays). Load the hydrogels into a<br />
slide rack, briefly rinse (1 s) in purified water, and wash three times at room<br />
temperature with gentle rocking for 10 min each time in purified water. A<br />
microscope slide staining chamber is useful for the washing steps. The staining<br />
chambers come with slide racks that hold 10–30 slides. The racks can be<br />
transferred between staining chambers containing different washing buffers as<br />
well as a clinical centrifuge for drying the slides.<br />
6. Centrifuge the slides to dry at no more than 350 g for about 3 min. A clinical centrifuge
with flat swinging bucket holders works well for this task. Place a paper towel
layer on the bottom of the swinging bucket to absorb water removed from the
slides, and place the slide rack on the paper towel before centrifuging.
7. Place the hydrogel slides in a 40°C water bath for 20 min using the staining
chamber, with a paper towel placed at the bottom.
8. Remove the slides from the incubator and allow slides to cool at room temperature<br />
for 5 min. The slides are now ready for printing.<br />
9. Print the antibodies on the slides (see Note 10).<br />
10. Start the post-print processing of microarrays.<br />
For hydrogels:<br />
• Prepare staining chambers with a wet paper towel soaked in saturated NaCl at the<br />
bottom.<br />
• After printing, the slides are incubated in a humidified staining chamber overnight<br />
at room temperature to allow adsorption of the antibodies to the matrix.<br />
• The next day, circumscribe the array boundaries on each slide with a marker (e.g.,<br />
PAPpen). Leave at least 3–4 mm between the array and the marker line. Allow the<br />
hydrophobic marker lines to fully dry.<br />
For nitrocellulose (FAST, Schleicher and Schuell) slides:
• Allow the slides to dry for at least 1 h (let the slides dry on a slide-staining chamber).<br />
• Store in a refrigerator on a slide rack in a humidified staining chamber.<br />
• The next day, circumscribe the array boundaries on each slide with a marker (e.g.,<br />
PAPpen). Leave at least 3–4 mm between the array and the marker line. Allow the<br />
hydrophobic marker lines to fully dry.<br />
11. Rinse the slides as follows:<br />
a. Rinse briefly (for 30 s) in PBST.<br />
b. Wash in PBST for 3 min with gentle rocking.<br />
c. Wash in PBST for 30 min with gentle rocking.
[Figure 3, workflow labels: Cy5 ligand + test proteins and Cy3 ligand + reference proteins; react; separate free dye; mix; place on array (antibodies on a coated slide); scan.]
Fig. 3. Scheme of the whole process when working with custom-made antibody
arrays. Once antibodies are selected and printed on the arrays, serum samples are labeled
and hybridized onto the antibody arrays. Scanning and data analyses of fluorescence
will provide quantitative measurement of multiple proteins simultaneously.
12. Block the slides. Once the antibodies are immobilized, it is necessary to block<br />
non-specific protein-binding sites on the printed microarrays. Typical blocking<br />
solutions include diluted BSA or casein solutions (1,2,9,12,19). If the arrays are<br />
not to be used for a day or more, leave them in the BSA-blocking solution in<br />
the refrigerator. Prepare the blocking buffer right before use. Add sodium azide<br />
to the blocking buffer if you intend to store for more than one day and then<br />
begin with step b shown below:<br />
a. Block in the blocking buffer for 1 h at room temperature with constant shaking.
b. Briefly rinse with PBST twice or alternatively rinse the second time with 0.1 M<br />
PBS, pH 7.2, for 20 min.<br />
c. Dry the slides by centrifugation immediately prior to incubating with the labeled<br />
samples using a clinical centrifuge with flat swinging bucket holders.<br />
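The slide-drying steps above give the centrifugation limit as a relative centrifugal force (350 g), whereas many clinical centrifuges are set in rpm. The conversion below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. It is a minimal sketch: the 15 cm rotor radius is a hypothetical value and should be replaced by the radius of the swinging-bucket rotor actually used.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm

def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2

def rpm_for_rcf(rcf_g, radius_cm):
    """Rotor speed (rpm) that produces rcf_g at the given rotor radius."""
    if rcf_g < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))

# Hypothetical rotor radius of 15 cm; 350 g is the limit given in the protocol.
max_rpm = rpm_for_rcf(350.0, 15.0)
```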
3.2. Labeling of Samples and Hybridization<br />
A protocol for direct labeling is provided, summarized in Figure 3.
1. Select the serum samples for labeling (see Note 11).<br />
2. Determine the volume of each serum sample to label with Cy3 and with Cy5. When
deciding whether to label samples or references with Cy3 or Cy5, note that Cy3 gives
a brighter and more consistent signal. For the samples, divide
the volume to be placed on the array by the desired final dilution of the sample<br />
(varying from 1/30 to 1/50). For a 20 μl volume (the volume used for a 12 ×<br />
12-mm standard hydrogel) and a 1/50 final dilution, use 0.4 μl of serum sample<br />
(20/50) per array.
If a pooled reference is to be used, each component of the reference is first<br />
labeled and then pooled (as opposed to pooling and then labeling). The amount<br />
to be labeled of each component of the reference is (Va × A)/Nr, where Va<br />
is the volume per array (0.4 μl in the above case), A is the number of arrays<br />
the reference will be used in, and Nr is the number of samples pooled in the<br />
reference. For example, if a pool of 10 samples will be used as the reference for<br />
20 arrays, the volume of each sample to be used in the reference labeling mix will be
(0.4 × 20)/10 = 0.8 μl.<br />
3. Dilute the serum sample approximately 15× with carbonate buffer or phosphate<br />
buffer at pH 7.5 spiked with 0.5 μg/ml dinitrophenol (DNP) flag (if the flag is<br />
to be used for normalization). Do not use amine-containing buffers such as
Tris base, because free amines compete with the proteins for the NHS-ester dye.
4. Add 1/20 volume of dye stock to each sample. The final concentration of the
NHS-ester activated Cy dyes within the serum protein solution should be between
100 and 300 μM (each vial of dye contains 200 nmol).
5. Mix each dye and serum protein solutions and let the reaction proceed on ice in<br />
the dark for 2 h. Normally, mix the reference protein solution with the Cy3 dye<br />
solution, and the test protein solution with the Cy5 dye solution.<br />
6. Add 1/20 volume of 1 M Tris-HCl pH 7.5–8.0 (or glycine) to each of the reactions
to quench (stop) the labeling, so that at least a 200-fold molar excess of quencher over dye
is achieved.
7. Load the samples onto a microconcentrator having the appropriate molecular<br />
cutoff, such as the Bio-Rad Bio-spin 6 microcolumn, and spin at 1000×g for<br />
2 min. A 3000-D cutoff captures most proteins while still removing the dye.<br />
If smaller proteins are not important, the 10,000-D cutoff is faster. Centrifuge<br />
according to the microconcentrator instructions. The 10,000-D microcon typically<br />
requires 20 min, and the 3000-D microcon requires 80 min of centrifugation at<br />
10,000×g at room temperature.<br />
8. Make 10× blocking solution: 30% (w/v) non-fat milk in PBS and 1% (v/v)<br />
Tween-20 (e.g., 3 ml milk in 10 ml buffer).<br />
9. Centrifuge the milk blocking solution at 10,000×g for 10 min to remove particulate matter.
10. After centrifugation through the microconcentrator column, add to the flow-through
(collection tube) 1 μl of the supernatant of the blocking mix
per array and 1 μl of 10× protease inhibitor per array.
11. Pool the reference samples and divide among the test samples according to the<br />
experimental plan.<br />
12. Add 1× PBS to bring the volume to 20–25 μl per array, if necessary. The labeled samples
may be stored overnight at 4°C.
13. Start hybridization of the labeled serum samples on the printed antibody<br />
arrays. Distribute the Cy3-labeled reference protein solution to the appropriate<br />
Cy5-labeled test protein solutions. Add PBS to each mix to achieve a volume<br />
of 20–25 μL per array. It is recommended to remove any particulate matter or
precipitate by (1) filtering with a 0.45-μm spin filter, or (2) centrifuging for 10 min<br />
at 14,000×g and pipetting out the supernatant.<br />
14. Load an appropriate amount of labeled sample on the slides within the marked<br />
boundaries, and cover with a Lifterslip. Use 20 μl for the 12 × 12-mm hydrogels.<br />
The cover slip should be at least 1/4 inch longer than the dimensions of the array.<br />
(The background is often higher at the edges of the cover slip.)<br />
15. Incubate for 2 h at room temperature with constant shaking.<br />
16. Rinse briefly in PBST to remove the Lifterslip.<br />
17. Wash three more times for 10 min in fresh changes of PBST. (All washes are<br />
performed in racks at room temperature.)<br />
18. Rinse for 20 s in PBS. Alternatively, final washes with H2O can be performed<br />
for 5 min each with gentle agitation.<br />
19. Dry the slides by centrifugation prior to scanning.<br />
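The quenching arithmetic in step 6 can be checked with a short calculation. The following is a minimal sketch, not part of the protocol: the function name and the example dye concentration (0.2 mM in the labeling reaction) are assumptions chosen only for illustration.

```python
def quencher_excess(reaction_ul, dye_mm, quencher_stock_mm=1000.0, volume_fraction=1 / 20):
    """Return (final quencher mM, final dye mM, fold excess) after adding quencher.

    reaction_ul       -- volume of the labeling reaction (microliters)
    dye_mm            -- dye concentration before quenching (mM); assumed value
    quencher_stock_mm -- Tris-HCl or glycine stock concentration (1 M = 1000 mM)
    volume_fraction   -- quencher volume as a fraction of the reaction volume
    """
    quencher_ul = reaction_ul * volume_fraction
    total_ul = reaction_ul + quencher_ul
    quencher_final = quencher_stock_mm * quencher_ul / total_ul
    dye_final = dye_mm * reaction_ul / total_ul
    return quencher_final, dye_final, quencher_final / dye_final


# Hypothetical example: 20 ul reaction containing 0.2 mM dye
q, d, fold = quencher_excess(reaction_ul=20.0, dye_mm=0.2)
print(f"quencher {q:.1f} mM, dye {d:.3f} mM, excess {fold:.0f}-fold")
```

With these assumed numbers the excess is 250-fold, which meets the 200-fold minimum. Under the same assumptions, a reaction containing more than 0.25 mM dye would need a larger volume of quencher.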
3.3. Scanning and Data Analysis<br />
1. Scan the slides at 552 nm and 635 nm using a microarray fluorescence scanner<br />
(see Note 12).<br />
2. Process the data: grid the arrays and reject unsatisfactory data points (see Note<br />
13).<br />
3. Normalize the data (see Note 14).<br />
4. Analyze the data (see Note 15).<br />
5. Interpret the data (see Note 16).<br />
4. Notes<br />
1. Radioactivity, fluorescence, or chemiluminescence detection methods have been<br />
used with antibody arrays. Radioactivity is not frequently used due to its<br />
safety concerns and its longer exposure times (up to 10 h). Fluorescence<br />
is one of the most frequently utilized detection methods. Fluorophores, like<br />
chromogens, exist in many formulations and have defined emission spectra.<br />
Fluorescein, rhodamine (Texas Red), phycobiliproteins, nitrobenzoxadiazole<br />
(NBD), acridines, Cy3, Cy5, and bodipy compounds are commonly used<br />
for protein labeling (13,14,15,16,17). The selection of fluorophores for use<br />
with microarrays depends on sample type, substratum, emission characteristics,<br />
and even the number of analytes to be assayed. Not all substrates are<br />
compatible with fluorescent detection strategies due to inherent autofluorescence<br />
of the material (14,15,16,17), which significantly reduces the signal-to-noise<br />
ratios. Nitrocellulose-coated slides cause light scatter and higher background<br />
as compared to aldehyde-treated slides with laser scanner detection methods,<br />
limiting the use of nitrocellulose substrata for fluorescent detection methods<br />
(13,14,15,16,17). The sample may also have components that interfere with a<br />
selected fluorophore. Flavoproteins autofluoresce and emit light in the same<br />
region as fluorescein, limiting the use of this fluorophore in samples rich in<br />
flavoproteins, e.g., liver and kidney tissues. Photobleaching and quenching of
fluorophores can decrease the total signal observed on an array. The Cy3 and<br />
Cy5 dyes are commonly used for fluorescent detection because they overcome<br />
these effects. They are well suited for fluorescence detection strategies due to<br />
their decreased dye interactions, increased brightness, and the ability to add<br />
charged groups to the molecules (13,14,15,16,17). Fluorescent-tagged proteins<br />
including antibodies can be used for detection of immobilized molecules on<br />
a microarray using either indirect or sandwich strategies. Streptavidin-biotin or<br />
RCA amplification chemistries can also be applied to fluorescence detection<br />
strategies (22,23,24), providing sufficient sensitivity for most applications.<br />
Chemiluminescent detection methods are based on Western blotting protocols<br />
for detection of antigen-bound antibodies with secondary antibodies conjugated<br />
to alkaline phosphatase or HRP (13,14,15,16,17,18). Chemiluminescent<br />
detection methods can be applied to any of the label detection methods. Chemiluminescence<br />
is highly sensitive but may pose limitations due to its dynamic range<br />
and compatibility with multiplexing. Amplification strategies such as biotinyl-tyramide<br />
can be applied to chemiluminescence. A useful application consists of<br />
total protein determination made directly on arrays using a ruthenium organic<br />
complex, which interacts non-covalently with proteins immobilized on nitrocellulose<br />
(13,14,15,16,17,18). The dye is applicable to arrays printed on nitrocellulose<br />
membranes. This type of total protein analysis is useful for minute sample<br />
volumes in which a standard protein spectrophotometric analysis would not be<br />
feasible.<br />
2. Antibody selection. The first critical step is the selection of protein targets to be<br />
measured with the antibody arrays, which depends on the experimental design<br />
and objectives of the analyses undertaken. It is advisable to have biological or<br />
experimental criteria supporting the search for specific proteins in the serum. An<br />
efficient approach is to perform high-throughput profiling at the DNA or RNA<br />
level before protein profiling, which increases the probability of<br />
finding a target protein in the serum. Not all proteins are suitable for measurement<br />
with this assay, since their size and likely abundance in the<br />
samples are limiting factors. If a protein is very small (or is a polypeptide), it<br />
may not be compatible with direct labeling detection methods, which use size-based<br />
separation of labeled product from the label. If a protein is in very low<br />
abundance, it may fall below the detection limit of the assay. Detection limits for<br />
the assay depend on the antibody used, the protein background in the sample,<br />
and the detection conditions. In general, the direct labeling method described<br />
here can give detection limits in the low ng/ml range for targets present in the<br />
serum background.<br />
Once the list of target proteins is assembled, the search for antibodies begins. The<br />
main bottleneck to the development of highly multiplexed planar antibody<br />
arrays is the requirement for specific affinity ligands for each analyte. Commercially<br />
available antibodies against novel or rare proteins may not exist, which<br />
leaves the option of having the antibody custom-produced. Custom antibody<br />
generation is lengthy, expensive, and probably not a viable choice for more<br />
than a few antibodies. If a protein target is more common and a choice of
antibody exists, it is advisable to search for antibodies that work efficiently for<br />
enzyme-immunoassays, since these assays are quite similar to antibody arrays.<br />
Monoclonal antibodies seem to have a higher success rate, but polyclonals may<br />
also work well, although they may lead to high background and reduced specificity<br />
and sensitivity as compared to monoclonal antibodies. In vitro selection<br />
of antibodies using phage, ribosome, or mRNA display technologies, and the<br />
use of engineered binding molecules, are playing an increasingly important role<br />
in generating specific affinity ligands for analytes for which antibodies are<br />
unavailable (14). An alternative strategy to produce specific antibodies, which has been<br />
validated, optimizes the design of protein sub-fragments of a selected size with<br />
minimal sequence similarity to other proteins. The fragments are selected using<br />
an alignment scanning procedure based on the principle of lowest sequence<br />
similarity to other human proteins, with the aim of generating antibodies with high<br />
selectivity (20). If the direct labeling method is to be used, only one antibody per<br />
target is needed. If using a sandwich assay, a matched pair of antibodies is<br />
needed. The direct labeling method works well for mid- to high-abundance<br />
proteins, while sandwich assays or amplification protocols are recommended for<br />
low-abundance proteins.<br />
Since antibodies cannot be manufactured with known affinity and specificity,<br />
it is advisable to validate the specificity and sensitivity of each antibody<br />
prior to use as a probe for protein arrays. The identification of a single band<br />
at the specified molecular weight on Western blotting represents a standard<br />
validation strategy for the specificity and sensitivity of the proposed antibody, as<br />
well as immunoprecipitation followed by mass spectrometry (1,6). The antigen-antibody<br />
properties of the antibodies printed on the arrays can be evaluated<br />
by the estimation of random and systematic errors. Western blotting analyses<br />
can serve to evaluate the specificity of the antibodies. Commercial or custom-made<br />
enzyme-immunoassays can be utilized to validate, by an independent method, the<br />
results obtained with antibody arrays on the same serum<br />
specimens.<br />
Recombinant antigens can be utilized as positive and negative controls for<br />
the process of printing (depositing the antibodies onto the slides), calibration,<br />
and detection methods (1,2,9). The linearity range of the assay depends on the<br />
antibody-antigen affinity. Linearity can only be achieved when the concentration<br />
of the analyte and antibody are matched to the affinity constant. It is advisable<br />
that dilution and recovery experiments evaluating the specificity and affinity of<br />
the antibodies for their ligands are included when utilizing antibody arrays (2,9).<br />
3. Purity of antibodies. Antibodies work best in the arrays when they are highly<br />
purified. The use of antibodies in a high background of other proteins often<br />
results in a weakened or non-specific signal, since the background proteins<br />
occupy many binding sites on the microarray. Some purified antibodies come in a<br />
BSA or gelatin stabilizer. It may be desirable to remove gelatin, since it can bind<br />
some biological molecules. BSA rarely has the problem of non-specific binding,<br />
but if it is at a much higher concentration than the antibody, it could significantly
reduce the signal from the antibody, which would warrant further purification of<br />
the antibody. Some antibodies come in a high concentration (8–50%) of glycerol<br />
to improve stability. While glycerol will not interfere with the assay, the added<br />
viscosity may negatively affect the printing process. Glycerol concentrations<br />
above 20% should be avoided. To change the buffer of an antibody, it is advisable<br />
to use the Bio-Rad Micro Bio-Spin P30 column. These columns come with<br />
two types of buffers: saline sodium citrate (SSC) and Tris buffer. The filtrate<br />
will come through in the packing buffer. This packing buffer can be changed<br />
by running a different buffer through the column three times. The P30 column<br />
removes solution components smaller than 30 kD, and the P6 column removes<br />
components smaller than 6 kD. Thus, the P30 column is better for purification of<br />
antibodies, and the P6 column is better for purification of complex mixtures in<br />
which low-molecular-weight species should be preserved. If the antibody<br />
is to be subsequently labeled, it is recommended not to put the antibody in a<br />
Tris or other amine-containing buffer, since amines quench the labeling reaction.<br />
Polyclonal antibodies come either as unpurified antisera, the IgG fraction of<br />
antisera, or the affinity purified (purified using the antigen) fraction of antisera.<br />
Affinity purified is best, since it yields the highest purity of specific antibody.<br />
IgG-purified fractions of antisera usually work well. Antibodies that arrive in<br />
pure ascites fluid may also need to be purified. If a monoclonal antibody is good,<br />
it will work well without further purification, so monoclonals should be tested first.<br />
Purification of IgG antibodies with the Affi-Gel<br />
Protein A MAPS II kit (Bio-Rad) is recommended. In general, the following antibody buffer<br />
requirements should be considered: (1) all antibodies that arrive as antisera need<br />
to be IgG purified; (2) antibodies in ascites fluid may also need to be purified,<br />
although they can first be tested without purification.<br />
4. Stability and concentration. Antibodies are stable when refrigerated in a standard<br />
buffer such as PBS. The concentration of an antibody can be measured using<br />
a protein concentration kit such as the BCA 200 Protein Assay Kit (Pierce<br />
Biotechnology). The optimal spotting concentration range is 100–200 μg/mL.<br />
Higher concentrations could yield better signal strengths and lower detection<br />
limits, and may be desirable if the consumption of antibody is not a concern.<br />
Each antibody’s concentration should be kept constant across printing sets, since<br />
variations in antibody concentration can affect the data. Simply stated, if a<br />
set of data is produced using a particular antibody at 300 μg/mL, subsequent<br />
experiments should use that antibody at 300 μg/mL for better comparison of<br />
the results.<br />
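Bringing an antibody stock to a fixed spotting concentration is a simple C1V1 = C2V2 dilution. The sketch below illustrates the arithmetic only; the function name and the example stock concentration are hypothetical.

```python
def dilution_volumes(stock_ug_per_ml, target_ug_per_ml, final_ul):
    """Return (stock ul, buffer ul) needed to prepare final_ul at the target concentration."""
    if target_ug_per_ml > stock_ug_per_ml:
        raise ValueError("target concentration exceeds stock; concentrate the antibody instead")
    stock_ul = final_ul * target_ug_per_ml / stock_ug_per_ml
    return stock_ul, final_ul - stock_ul


# Hypothetical example: a 1 mg/mL stock diluted to 200 ug/mL in a final volume of 50 ul
stock_ul, buffer_ul = dilution_volumes(1000.0, 200.0, 50.0)
print(f"{stock_ul:.1f} ul antibody + {buffer_ul:.1f} ul buffer")
```

Recording the computed volumes alongside the antibody code makes it easier to reproduce the same concentration in later printing sets.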
5. Antibody storage. Most antibodies can be stored or refrigerated for up to a year.<br />
New antibodies should be divided into aliquots that will last approximately a<br />
year each. One aliquot should be kept in the refrigerator as a working stock,<br />
and the others frozen at –70°C. Aliquoting the antibody stocks helps to avoid<br />
repeated freeze/thawing that can damage the proteins. Protein stocks should not<br />
be frozen in PBS; it is better to freeze them undiluted. When retrieving antibodies/proteins from
a freezer stock, thawing should be done slowly on ice to reduce damage to the<br />
antibody from the thawing process.<br />
6. Tracking antibodies. It is helpful to keep information about the antibodies in<br />
a database. It is advisable to provide a number code for each antibody, and if<br />
changes are made to an antibody’s buffer composition, a new code should be<br />
assigned to the new preparation. Relevant information to track includes clonality,<br />
manufacturer, animal of origin, concentration, and aliquot age. It is important<br />
to track the maximum information provided in the antibody datasheet, and label<br />
aliquots accordingly.<br />
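One possible way to structure such a tracking record is sketched below. This is an illustration under stated assumptions, not a prescribed database design: the class name, field names, and example values (including the manufacturer "ExampleCo") are hypothetical. The only rule taken from the note is that a change in buffer composition produces a new code.

```python
from dataclasses import dataclass, replace
from itertools import count

_codes = count(1)  # illustrative source of unique antibody codes


@dataclass(frozen=True)
class AntibodyRecord:
    code: int
    target: str
    clonality: str               # "monoclonal" or "polyclonal"
    manufacturer: str
    host_animal: str
    concentration_ug_per_ml: float
    buffer: str
    aliquot_date: str            # date the aliquot was prepared


def register(**fields):
    """Create a record for a newly received antibody with a fresh code."""
    return AntibodyRecord(code=next(_codes), **fields)


def change_buffer(record, new_buffer):
    """Return a new record with a NEW code, leaving the original record unchanged."""
    return replace(record, code=next(_codes), buffer=new_buffer)


# Hypothetical entry
original = register(target="CRP", clonality="monoclonal", manufacturer="ExampleCo",
                    host_animal="mouse", concentration_ug_per_ml=500.0,
                    buffer="PBS + 0.1% BSA", aliquot_date="2008-01-15")
exchanged = change_buffer(original, "PBS")
print(original.code, exchanged.code)
```

Keeping records immutable means the history of each preparation is preserved, so data printed with the earlier preparation can still be traced to its original buffer.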
7. Maintaining antibody stocks. A refrigerator stock of ready-to-use antibodies<br />
(kept at working concentration) should be maintained. Except for the antibodies that<br />
should not be frozen, only one tube of each antibody should be stored in the<br />
refrigerator at a time. The amount of each antibody in the refrigerator stock<br />
should be sufficient to last for six months or up to a year (normally around<br />
100 μL). The rest of the antibody stock should be aliquoted into similar volumes<br />
and frozen at –80°C. If the antibody in the refrigerator stock needs to be diluted<br />
in order to reach the working stock concentration, dilute only sufficient stock for<br />
the working solution. When retrieving antibodies/proteins from a freezer stock,<br />
they should be thawed slowly on ice in order to reduce damage from the thawing<br />
process. The protein stock master list will need to be adjusted to indicate when<br />
the antibodies are thawed and frozen.<br />
8. Print plate preparation. After the antibodies have been acquired and prepared<br />
at proper purity and concentration, they are assembled into a “print plate,”<br />
which is a microtiter plate used in the robotic printing of microarrays.<br />
Polypropylene microtiter plates are preferable to polystyrene because of lower<br />
protein adsorption. The plate should be rigid and precisely machined for<br />
optimal functioning with printing robots. The 384-well plates are generally more<br />
compatible with printing robots than 96-well plates and require less volume per<br />
well than 96-well plates. Load about 6–10 μl of each antibody into each well<br />
of the 384-well print plate. The volume may depend on the shape of the well<br />
and how far the print tips descend into the well. Too much volume may lead to<br />
droplets of antibody solution sticking to the outside of the print tip. The volume<br />
may also need to be optimized for particular applications, such as multiple<br />
draws from each well, which would require a greater volume. If printing is<br />
sometimes inconsistent or variable between printing tips, it is desirable to fill<br />
multiple wells with the same antibody solution so that different print pins spot<br />
the same antibody. Store the 384-well print plates sealed in the refrigerator until<br />
ready to use. Aluminum foil tape provides a good seal. Enclosing the covered<br />
plate in a sealed plastic bag ensures long-term, evaporation-free storage. It is<br />
very important to prepare a spreadsheet containing the well identities for use in<br />
downstream data processing applications.<br />
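A spreadsheet of well identities can be generated programmatically. The sketch below assumes a standard 384-well layout (rows A to P, columns 1 to 24) filled in row-major order; the antibody codes, the function names, and the choice of placing replicate wells next to each other are illustrative assumptions, and the appropriate layout depends on the printing robot.

```python
import csv
import io
import string

ROWS = string.ascii_uppercase[:16]   # 384-well plate: rows A-P
COLS = range(1, 25)                  # columns 1-24


def build_plate_map(antibody_codes, replicates_per_antibody=1):
    """Assign antibody codes to wells in row-major order (A1, A2, ... P24)."""
    wells = [f"{r}{c}" for r in ROWS for c in COLS]
    needed = len(antibody_codes) * replicates_per_antibody
    if needed > len(wells):
        raise ValueError("more wells requested than a 384-well plate holds")
    assignments = [code for code in antibody_codes for _ in range(replicates_per_antibody)]
    return list(zip(wells, assignments))


def plate_map_csv(plate_map):
    """Serialize the plate map as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["well", "antibody_code"])
    writer.writerows(plate_map)
    return buf.getvalue()


# Hypothetical antibody codes, each loaded into two wells
plate = build_plate_map(["AB001", "AB002", "AB003"], replicates_per_antibody=2)
print(plate_map_csv(plate))
```

The resulting CSV text can be saved and imported into the array analysis software so that each spot is linked back to its source well.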
9. Selection of slides. The various immobilization and detection strategies are<br />
devised depending on which target molecules are going to be measured and<br />
which ones are used to capture them. The attributes of an ideal substratum
for antibody arrays include limited non-specific binding, high surface area-to-volume<br />
ratio, inertness toward biological molecules, minimal autofluorescence, and compatibility<br />
with available detection methods. A variety of surfaces and immobilization<br />
chemistries have been described for antibody arrays. Derivatized supports where<br />
capture antibodies are immobilized include surfaces such as polyvinylidene<br />
difluoride, nitrocellulose, agarose, polyacrylamide, or hydrogels. Glass slides<br />
are frequently coated with one-, two-, or three-dimensionally structured surface<br />
modifications, being activated with aldehyde, polylysine, or a homo-functional<br />
cross-linker as part of the initial optimization experiments (2,9,14). The advantages<br />
of the use of distinct coating or surfaces under different blocking, pH<br />
buffering, or UV cross-linking conditions for specific applications have been<br />
described (14). Silane-coated glass slides or acrylamide hydrogel can provide<br />
good reproducibility from day to day, efficient immobilization of antibodies, and<br />
low background when used in conjunction with fluorescence detection. Various<br />
substrates for antibody arrays have been reported, such as poly-lysine coated<br />
glass (1), aldehyde-coated glass (30), nitrocellulose (31), and a poly-acrylamide<br />
based hydrogel (32). Hydrogels and nitrocellulose give good results for the direct<br />
labeling method described here. Nitrocellulose slides do not require any preparation<br />
before printing, and give clean and low background results. Hydrogel<br />
coating on glass slides (such as those supplied by PerkinElmer Life Sciences)<br />
can support multiple layers of protein, thus increasing the binding capacity<br />
and signal strengths, and it should be noted that the hydrophilic matrix of the<br />
hydrogel may better retain native protein structure. Hydrogels should be stored<br />
dry at room temperature. They must be used within 2 days after preparation.<br />
10. Printing of antibody arrays. The details of printing will depend on the printing<br />
robot used. It is necessary to immobilize antibodies in a way that the functional<br />
component will be efficiently deposited without interfering with subsequent binding.<br />
Conditions such as humidity, temperature, dust levels, and pin washing should<br />
also be stringently controlled during the printing step. It is important to minimize<br />
the time taken to unseal the print plates and their exposure in order to keep<br />
the evaporation of antibody solutions low. Maintaining a moderately high<br />
humidity in the printing environment (around 45%) will minimize evaporation<br />
and maintain spot quality. Excessive humidity can lead to overly large spots.<br />
The proper functioning of the printing robot should be confirmed with test prints on dummy<br />
slides before starting the microarray production. It is advisable to use 500 μg/mL<br />
BSA in 1× PBS for the test prints. If the tips are washed in a wash bath, make<br />
sure the water is changed regularly every 6–12 loads to prevent contamination<br />
of the tips. It is also desirable to confirm sufficient washing of the pins and lack<br />
of carry-over from load to load. This test can be done by loading labeled protein<br />
into one of the print plate wells in a dummy print, followed by scanning of the<br />
unwashed slide. If fluorescence is seen in spots printed after the fluorescently labeled<br />
material, the pins need to be washed more stringently. Most microarrayers will<br />
allow the printing of replicate spots on each array from the same well of the print<br />
plate. Replicate spots are useful to obtain more precise data through averaging
and ensure the acquisition of data if a portion of the array is somehow unusable.<br />
Six to ten spots per array per antibody are recommended.<br />
11. Serum sample handling and storage. Sera should be collected in red gel tubes,<br />
allowing the clot to retract, then centrifuged at 3000×g for 10 min, aliquoted, and<br />
stored at –80°C. All samples should be consecutively numbered so that no<br />
record compromises the identity of the patients or controls under study. Serum<br />
samples should be handled as biohazards. Tips and tubes that contact serum<br />
samples should be disposed of in a biohazard bag. Upon the first thaw, the samples<br />
need to be aliquoted. Samples should be aliquoted so that no more than four<br />
thaws are necessary for every experiment. Low volume aliquots (approximately<br />
10–15 μl) of each specimen are recommended. For greater than approximately<br />
50 samples, it is convenient to use a microtiter plate for aliquoting. In this case,<br />
approximately 50 μl from each sample is placed into each well of a 96-well<br />
microtiter plate. Either a robot or a matrix multichannel pipettor is used to<br />
aliquot small volumes into replicate 96-well plates.<br />
12. Scanning. The fluorescence signal from the microarrays is detected using a<br />
microarray scanner. The GenePix Pro 3.0 software program (Axon Instruments)<br />
quantifies the image data. The local background in each color channel is<br />
subtracted from the signal at each antibody spot, and spots having obvious<br />
defects, no detectable signal by GenePix, or a low net fluorescence in either color<br />
channel are removed from analysis. The ratio of net signal from the sample-specific<br />
channel to the net signal from the reference-specific channel is calculated<br />
for each antibody spot, and ratios from replicate antibody measurements in each<br />
array are averaged. An intensity-dependent normalization algorithm for antibody<br />
arrays is recommended.<br />
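The per-spot calculation described above (background subtraction, removal of weak spots, ratio of sample to reference, averaging of replicates) can be sketched as follows. This is a hedged illustration, not the GenePix implementation: the data layout, the `min_net` cutoff, and the use of an arithmetic mean for replicate averaging are assumptions.

```python
from collections import defaultdict
from statistics import mean


def replicate_ratios(spots, min_net=0.0):
    """Average sample/reference ratios over the replicate spots of each antibody.

    spots is a list of dicts with keys: antibody, sample, sample_bg,
    reference, reference_bg (raw intensities and local backgrounds).
    Spots whose net signal in either channel is <= min_net are dropped.
    """
    ratios = defaultdict(list)
    for s in spots:
        net_sample = s["sample"] - s["sample_bg"]
        net_reference = s["reference"] - s["reference_bg"]
        if net_sample <= min_net or net_reference <= min_net:
            continue  # low or absent net fluorescence: remove from analysis
        ratios[s["antibody"]].append(net_sample / net_reference)
    return {ab: mean(vals) for ab, vals in ratios.items()}


# Hypothetical intensities for two antibodies
spots = [
    {"antibody": "AB001", "sample": 1200, "sample_bg": 200, "reference": 700, "reference_bg": 200},
    {"antibody": "AB001", "sample": 1400, "sample_bg": 200, "reference": 600, "reference_bg": 200},
    {"antibody": "AB002", "sample": 210, "sample_bg": 200, "reference": 900, "reference_bg": 200},
]
print(replicate_ratios(spots, min_net=50))
```

In this example the AB002 spot is dropped because its net sample signal (10) falls below the assumed cutoff, so only AB001 appears in the result.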
Some of the particulars of the scanning method will depend on the instrument,<br />
but some general principles may be followed. Scanning of an experiment set<br />
should be performed immediately after incubation of the microarrays and all on<br />
the same day, if possible, to minimize noise introduced by variable breakdown of<br />
dye on the array (particularly Cy5). The microarrays should be kept in the dark<br />
to minimize bleaching of fluorescent dyes. Scanners typically have adjustments<br />
for laser power, detector gain, and scan rate. Set both lasers to about 95% and<br />
adjust the scanner to achieve the desired signal intensities. Adjust the laser power<br />
so that at least 50% of the pixels of each spot are saturated. The laser power<br />
should almost always be set very close to the maximum since the maximum<br />
powers of the small commercial scanners are still less than optimal. Lower scan<br />
rates will generally produce higher signal-to-noise ratios. Scanning is performed<br />
at either 50 or 25% speed, depending on practical time limitations, since<br />
scanning large sets of arrays at low rates is time-consuming. In order to<br />
find the optimal scanner settings, it is advisable to set the laser power close to<br />
maximum, set the scan rate to the lowest acceptable value, and then adjust the<br />
detector gain as high as possible without showing signal saturation in the data.<br />
When scanning a large set of arrays as part of a single experiment set, it is<br />
desirable to use similar settings for all the arrays to minimize the differences
in conditions between the arrays. It may not always be possible to use the<br />
same settings for every slide due to great variations in signal and background<br />
strengths, but subsequent normalization should readjust the data accordingly.<br />
Scanned images are typically stored as tiff files to be analyzed by microarray<br />
analysis programs. It is advisable to save the scanned images by their slide<br />
number followed by either Cy3 or Cy5 and the date of scanning.<br />
13. Gridding and rejection of data points. The analysis of scanned microarray<br />
data depends somewhat on whether the experiment is a one-color or a two-color<br />
direct-labeling experiment. In all experiment types, the image data first<br />
need to be converted into numbers. Various software programs that come with<br />
current scanners, such as GenePix with Axon scanners and ArrayQuant with<br />
PerkinElmer scanners, accomplish this. The details for using such programs are<br />
not discussed here, but the principles that these programs use are mentioned.<br />
The quantification of microarray data begins with loading the scanned images<br />
(usually in tiff format) into an analysis program and overlaying a grid that defines<br />
the locations of the antibody spots. After aligning the grid to the image data,<br />
the program calculates the intensities and various statistics for image areas both<br />
inside and outside the spots. The user can “flag” or reject spots if obvious gross<br />
defects are present. Spots with very low intensity in one or both of the color<br />
channels yield unreliable data and should be rejected. It is especially important<br />
to reject low-intensity spots in two-color ratio experiments, since the noisy low-intensity<br />
data can greatly affect the ratio. It is desirable to define statistical criteria for<br />
rejecting low-intensity spots rather than relying on user judgments. A threshold<br />
based on the overall variation in background on the arrays can be defined. The<br />
median signal intensity at each spot should be at least three standard deviations (of the<br />
background areas) above the local background median intensity. This objective<br />
criterion provides a uniform, statistically based standard for all data.<br />
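The rejection criterion of Note 13 can be expressed as a short test applied to each channel of each spot. The sketch below is illustrative: the dictionary layout and function names are assumptions, while the three-standard-deviation threshold and the rule that failure in either channel rejects the spot follow the note.

```python
def passes_threshold(spot_median, local_bg_median, bg_sd, n_sd=3.0):
    """True if the spot median is at least n_sd background SDs above the local background median."""
    return spot_median >= local_bg_median + n_sd * bg_sd


def reject_spot(spot, n_sd=3.0):
    """Return True if the spot should be rejected (it fails in one or both channels)."""
    return not all(
        passes_threshold(ch["median"], ch["bg_median"], ch["bg_sd"], n_sd)
        for ch in (spot["cy3"], spot["cy5"])
    )


# Hypothetical spots
good = {"cy3": {"median": 900, "bg_median": 200, "bg_sd": 50},
        "cy5": {"median": 400, "bg_median": 220, "bg_sd": 40}}
weak = {"cy3": {"median": 900, "bg_median": 200, "bg_sd": 50},
        "cy5": {"median": 300, "bg_median": 220, "bg_sd": 40}}
print(reject_spot(good), reject_spot(weak))
```

Here the second spot is rejected because its Cy5 median (300) is below the threshold of 220 + 3 × 40 = 340, even though its Cy3 channel is strong.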
14. Normalization of data. The signals obtained from each array need to be<br />
corrected or normalized for possible changes in the overall signal intensity due<br />
to factors such as scanner settings and dye labeling efficiency. This process<br />
uses signals from antibodies targeting an internal standard of known concentration.<br />
Antibodies against proteins commonly expressed in serum, such as<br />
immunoglobulin isotypes, albumin, or C-reactive protein, can be utilized as<br />
internal controls. A normalization factor is calculated for each array that sets the<br />
data from normalization antibodies to the expected or known values. A highly<br />
specific and quantitatively accurate antibody is required for measurement of the<br />
normalization protein. The protein standards can either be present naturally in<br />
the sample or can be spiked in. A spiked-in protein that works well is<br />
FLAG-labeled BSA; FLAG is a widely used peptide tag for which commercial labeling<br />
kits are available. Other tags such as DNP can work well too.<br />
Normalization is recommended to be based on an intensity-dependent<br />
algorithm as follows (24). In this case, the local background in each color<br />
channel is subtracted from the signal at each antibody spot, and spots having<br />
obvious defects, no detectable signal by GenePix, or a low net fluorescence
in either color channel are removed from analysis. The ratio of net signal<br />
from the sample-specific channel to the net signal from the reference-specific<br />
channel is calculated for each antibody spot, and ratios from replicated antibody<br />
measurements in the same array are averaged. It is common to plot a red (Cy5)<br />
versus green (Cy3) channel scatter plot to examine the distribution of intensities;<br />
however, transforming to fold change versus average intensity displays<br />
the data in a more easily readable form. If $I_{\text{red}}$ is the background-subtracted red<br />
channel intensity, and $I_{\text{green}}$ is the background-subtracted green intensity, then the<br />
following variables are created: $R = I_{\text{red}}/I_{\text{green}}$ and $A = \sqrt{I_{\text{red}} \times I_{\text{green}}}$, where $R$ is<br />
simply the fold change ratio and $A$ is the average intensity (the geometric mean,<br />
which is equivalent to averaging the log intensities). Curvature in the scatter<br />
plot of $\log R$ versus $\log A$ indicates a dependence of the ratio $R$ on the overall intensity. A curve $c(A)$ fitted to this plot<br />
is then used to normalize the data: $\log(I_{\text{red}}/I_{\text{green}}) \rightarrow \log(I_{\text{red}}/I_{\text{green}}) - c(A)$, where<br />
$c(A)$ is the fit. This is equivalent to multiplying the green channel intensity<br />
(or dividing the red) by an intensity-dependent normalization constant $k(A)$,<br />
where $\log[k(A)] = c(A)$. The optimal normalized data should be horizontal and<br />
centered (24).<br />
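The intensity-dependent normalization can be illustrated with a small sketch. The note does not specify how the fit c(A) is obtained, so the version below makes two assumptions: logarithms are taken in base 2, and a straight line fitted by ordinary least squares stands in for the curve. A smoother fit may be more appropriate for real data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept (needs at least two distinct x values)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def normalize_log_ratios(red, green):
    """Return log2(R) - c(A) per spot, where c is a line fitted to log2(R) versus log2(A).

    red, green -- background-subtracted intensities (all positive)
    R = red / green; A = sqrt(red * green), so log2(A) is the mean of the two log intensities.
    """
    log_r = [math.log2(r / g) for r, g in zip(red, green)]
    log_a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    slope, intercept = fit_line(log_a, log_r)
    return [lr - (slope * la + intercept) for lr, la in zip(log_r, log_a)]


# Hypothetical net intensities for four spots
red = [150.0, 520.0, 1900.0, 9000.0]
green = [100.0, 400.0, 1600.0, 6400.0]
print([round(v, 3) for v in normalize_log_ratios(red, green)])
```

After this correction the normalized log ratios are centered on zero and show no remaining linear trend with intensity, which corresponds to the "horizontal and centered" condition stated in the note.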
15. Data analysis. A critical step using quantitative data obtained through antibody<br />
arrays is the establishment of a filtering process to assess the quality of the<br />
data. The conceptual similarity of label-based antibody arrays with two-color<br />
competitive detection genomic arrays has allowed the application of normalization<br />
and data analysis tools classically utilized for cDNA arrays to protein<br />
profiling using antibody arrays (24). In order to measure<br />
multiple proteins simultaneously and reproducibly, with high sensitivity, specificity, and<br />
quantitative accuracy over large concentration ranges, it is<br />
necessary to consider quality control issues in the design of the arrays (1,4,9).<br />
Assessment of the technology through filtering and data analysis procedures<br />
later addresses the linearity, calibration, and specificity of the antibodies, as<br />
well as whether labeling and/or hybridization protocols are adequately optimized to<br />
ensure high signal-to-noise ratios (3,24). The very first level of quality control<br />
deals with the experimental design of the printing of antibody arrays, which<br />
should include various replicated spots dispersed along the complete surface<br />
of the array as well as the inclusion of controls in every single experiment to<br />
evaluate the intra- and inter-assay reproducibility of the measurements (1,4,9).<br />
The array should also include appropriate means that serve to test the presence of<br />
potential antibody interferences and cross-reactivity. In this regard, the quantity<br />
of antibody spotted can be used to standardize the antigen concentration. It is<br />
possible to use an internally controlled system where one color represents the<br />
amount of antibody spotted, and the other color represents the amount of antigen<br />
that is used to quantify the level of protein expression. This normalization for<br />
antibody spot intensity can decrease variability and lower the limits of detection<br />
of antibody arrays.<br />
The initial control of scanned data is at the spot level using the scanner<br />
software, e.g., GenePix (24). The customized report created can be utilized<br />
to analyze the quality of spots, and it is then possible to flag those spots of
low quality. The criteria to flag the spots may include the standard deviations<br />
away from background, the R², or the percent saturation (3,24). At<br />
the array level of comparison, the quality control of data includes normalization<br />
of the array, as well as calculation of average and standard deviation<br />
of the intensities of each antibody in its various replicates along the slide<br />
(3–24). Spots with high<br />
standard deviation between replicated spots can be filtered out. Normalization<br />
of the arrays can be performed using the average intensity of each array (24),<br />
protein standards such as Immunoglobulin G (1,21), or internal controls based<br />
on antibody spot intensity (31).<br />
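The spot-level and array-level checks described above can be sketched as follows. This is a minimal sketch under stated assumptions: the thresholds (`k_sd`, `max_cv`), the use of a coefficient of variation to express "high standard deviation between replicated spots", and all names are illustrative choices, not values taken from the cited studies.

```python
import numpy as np

def summarize_replicates(intensity, bg_mean, bg_sd, k_sd=2.0, max_cv=0.3):
    """Flag weak spots, then average the remaining replicates of each antibody.

    intensity : array (n_antibodies, n_replicates) for a single array
    bg_mean, bg_sd : background mean and standard deviation (scalars or
                     arrays broadcastable to `intensity`)
    k_sd      : assumed number of SDs a spot must exceed the background by
    max_cv    : assumed maximum SD/mean ratio tolerated between replicates
    """
    intensity = np.asarray(intensity, dtype=float)
    good = intensity > bg_mean + k_sd * bg_sd          # spot-level flag
    n_good = good.sum(axis=1)
    mean = np.where(good, intensity, 0.0).sum(axis=1) / np.maximum(n_good, 1)
    dev = np.where(good, intensity - mean[:, None], 0.0)
    # Sample SD over the unflagged replicates only.
    sd = np.sqrt((dev ** 2).sum(axis=1) / np.maximum(n_good - 1, 1))
    cv = np.divide(sd, mean, out=np.full_like(sd, np.inf), where=mean > 0)
    keep = (n_good >= 2) & (cv <= max_cv)              # replicate-level filter
    return mean, sd, keep

def normalize_to_array_mean(mean_intensity, keep):
    """Scale by the average intensity of the antibodies retained on the array."""
    return mean_intensity / mean_intensity[keep].mean()

intensity = np.array([[1000.0, 1100.0, 900.0],   # consistent replicates
                      [500.0, 1500.0, 100.0],    # highly variable replicates
                      [60.0, 55.0, 2000.0]])     # two spots near background
mean, sd, keep = summarize_replicates(intensity, bg_mean=50.0, bg_sd=10.0)
normalized = normalize_to_array_mean(mean, keep)
```

Normalization to a protein standard or to antibody spot intensity, the other two options mentioned above, would replace only the second function.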
In the next level of data filtering, each experiment set is compared, and the<br />
results are calibrated to a dilution series of antibodies by a best-fit line, removing<br />
data with high variability. The results can also be correlated to independent<br />
measurements obtained through enzyme-immunoassays (ELISA) available to<br />
quantify targets included in the antibody arrays. At this step, an antibody whose<br />
dilution series fits poorly can be flagged. It is possible to set thresholds of<br />
expression for an antibody, specifying a maximum and minimum ratio for spots<br />
to be considered in further analyses (24). This is a critical step due to its ability to<br />
filter the input data based on the standard deviation between replicate spots, and<br />
also the output data based on the standard deviation of dilution experiments. The<br />
last level of quality control refers to the comparison of independent experiment<br />
sets based on internal controls that will allow comparison between experiments<br />
performed on different days. The combined use of unsupervised and supervised<br />
methods can identify protein patterns associated with disease progression and<br />
clinical outcome.<br />
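The calibration of an antibody against its dilution series can be illustrated with an ordinary least-squares line. The sketch below assumes log-transformed concentrations and signals and uses R² with a hypothetical cut-off (`min_r2`) as the criterion for flagging an antibody; neither choice is prescribed by the text.

```python
import numpy as np

def fit_dilution_series(log_conc, log_signal, min_r2=0.9):
    """Fit a best-fit line to one antibody's dilution series.

    Returns (slope, intercept, r2, flagged); `flagged` is True when the
    fit quality falls below the assumed threshold `min_r2`.
    """
    x = np.asarray(log_conc, dtype=float)
    y = np.asarray(log_signal, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # highest power first
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2, bool(r2 < min_r2)

log_conc = np.log10([1.0, 10.0, 100.0, 1000.0])

# A well-behaved series: signal rises linearly with concentration.
slope, intercept, r2, flagged = fit_dilution_series(log_conc, [2.0, 3.0, 4.0, 5.0])

# An erratic series: the antibody would be flagged at this step.
_, _, r2_bad, flagged_bad = fit_dilution_series(log_conc, [2.0, 4.5, 2.5, 4.0])
```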
16. Antibody arrays are subject to false positive and false negative results,<br />
which can be mitigated using different strategies. Causes of false negative results<br />
on antibody arrays include: (1) The protein product may have been degraded<br />
by serum proteases during sample handling. (2) Interferences in the antibody-antigen<br />
binding process resulting in low detection of the target protein. The<br />
specificity of the targets for bladder cancer progression is addressed by immunohistochemistry,<br />
and using antibodies targeting different epitopes. The specificity<br />
of antigen-antibody binding is assessed by reverse-protocols, printing<br />
purified proteins and Western blots. Addition of protease inhibitors and serum<br />
preservation at –80°C will avoid protein degradation during sample handling.<br />
Serum aliquots will avoid degradation effects associated with repetitive thawing–<br />
freezing cycles. Modifications in amplification protocols such as rolling-circle<br />
amplification may increase signal detection.<br />
Similarly, the causes of false positive results on antibody arrays include: (A)<br />
The antibody is binding non-specific molecules or degradation products of the<br />
target protein. (B) Gelatin or protein-related additives to antibodies printed onto<br />
arrays. (C) The presence of heterophilic antibodies in serum samples. (D) Nonspecific<br />
binding of antibodies present in patients with any autoimmune or other<br />
diseases. False positive results can be addressed in several ways. Cross-reactivity
can be overcome by the selection of alternative antibodies directed to other<br />
epitopes (A), or including different preservatives without gelatin (B). In cases<br />
C and D, the interference and recovery experiments proposed for the analytical<br />
validation of antibodies using dilution and recovery coefficients will estimate<br />
the amount of interference. Clinical records on other coexisting diseases in the<br />
patients analyzed, enzyme-immunoassays, and immunohistochemical analyses<br />
will help to interpret unexpected results. The specificity of antigen-antibody<br />
binding can be assessed by reverse-protocols, printing purified proteins and<br />
Western blots.<br />
5. Final Remarks<br />
The methods and applications of antibody arrays are increasing in scope<br />
and effectiveness. The current and new antibody array formats that may be<br />
developed in the near future are likely to markedly accelerate the rate of<br />
biomarker discovery and characterization of cancer-specific pathways that will<br />
eventually lead to the development of individualized therapies that take into<br />
account markers of disease predisposition and therapeutic response. However,<br />
multiple challenges remain in the design and application of antibody arrays (33,<br />
34,35): (1) poor understanding of protein immobilization; (2) limited dynamic<br />
ranges of no more than three orders of magnitude; (3) achieving accuracy<br />
and reproducibility similar to clinical immunoassays; (4) molecular protein<br />
complexity and denaturation affecting immunoreactivity; (5) lack of standards<br />
and calibrators; (6) development of high-affinity and specific antibodies for<br />
target antigens. Such challenges are being addressed by the multi-institutional<br />
effort of the Human Proteome Organization (HUPO) toward the standardization<br />
of critical parameters in serum or plasma proteomic analyses. Initial studies<br />
provide guidance on pre-analytical variables that can alter the analysis of blood-derived<br />
samples, including choice of sample type, stability during storage, use<br />
of protease inhibitors, and clinical standardization [(33); see also Chapter 2].<br />
As part of the HUPO approach, it is also critical to standardize the statistical<br />
strategies for high-confidence protein identification and data analyses. These<br />
efforts and strategies toward integrating proteomic datasets should lead toward an<br />
accurate and comprehensive representation of human proteomes (34,35).
References<br />
1. Haab BB, Dunham MJ, Brown PO. (2001). Protein microarrays for highly parallel<br />
detection and quantitation of specific proteins and antibodies in complex solutions.<br />
Genome Biol. 2(2): research 0004.1–0004.13.<br />
2. Chan SM, Ermann J, Su L, Fathman CG, Utz PJ. (2004). Protein microarrays for<br />
multiplex analysis of signal transduction pathways. Nat Med. 10, 1390–6.
3. Sanchez-Carbayo M. (2006). Antibody arrays: technical considerations and clinical<br />
applications in cancer. Clin Chem. 52, 1651–9.<br />
4. Barry R, Diggle T, Terrett J, Soloviev M. (2003). Competitive assay formats for<br />
high-throughput affinity arrays. J Biomol Screen. 8, 257–63.<br />
5. Pang S, Smith J, Onley D, Reeve J, Walker M, Foy C. (2005). A comparability<br />
study of the emerging protein array platforms with established ELISA procedures.<br />
J Immunol Methods. 302, 1–13.<br />
6. Lash GE, Scaife PJ, Innes BA, Otun HA, Robson SC, Searle RF, Bulmer<br />
JN. (2006). Comparison of three multiplex cytokine analysis systems: Luminex,<br />
SearchLight and FAST Quant. J Immunol Methods. 309, 205–8.<br />
7. de Jager W, Rijkers GT. (2006). Solid-phase and bead-based cytokine immunoassay:<br />
a comparison. Methods. 38, 294–303.<br />
8. Waterboer T, Sehr P, Pawlita M. (2006). Suppression of non-specific binding in<br />
serological Luminex assays. J Immunol Methods. 309, 200–4.<br />
9. Kingsmore SF. (2006). Multiplexed protein measurement: technologies and applications<br />
of protein and antibody arrays. Nat Rev Drug Discov. 5, 310–21.<br />
10. Wang X, Yu J, Sreekumar A, Varambally S, Shen R, Giacherio D, Mehra R, Montie<br />
JE, Pienta KJ, Sanda MG, Kantoff PW, Rubin MA, Wei JT, Ghosh D, Chinnaiyan<br />
AM. (2005). Autoantibody signatures in prostate cancer. N Engl J Med. 353, 1224–35.<br />
11. Anderson KS, LaBaer J. (2005). The sentinel within: exploiting the immune system<br />
for cancer biomarkers. J Proteome Res. 4, 1123–33.<br />
12. Petricoin EF III, Bichsel VE, Calvert VS, Espina V, Winters M, Young L, Belluco<br />
C, Trock BJ, Lippman M, Fishman DA, Sgroi DC, Munson PJ, Esserman LJ,<br />
Liotta LA. (2005). Mapping molecular networks using proteomics: a vision for<br />
patient-tailored combination therapy. J Clin Oncol. 23, 3614–21.<br />
13. Angenendt P, Glokler J, Murphy D, Lehrach H, Cahill DJ. (2002). Toward<br />
optimized antibody microarrays: a comparison of current microarray support<br />
materials. Anal Biochem. 309, 253–60.<br />
14. Espina V, Woodhouse EC, Wulfkuhle J, Asmussen HD, Petricoin EF III, Liotta<br />
LA. (2004). Protein microarray detection strategies: focus on direct detection<br />
technologies. J Immunol Methods. 290, 121–33.<br />
15. Levit-Binnun N, Lindner AB, Zik O, Eshhar Z, Moses E. (2003). Quantitative<br />
detection of protein arrays. Anal Chem. 75, 1436–41.<br />
16. Pawlak B, Gordon R. (2005). Density estimation for positron emission tomography.<br />
Technol Cancer Res Treat. 4, 131–42.<br />
17. Schweitzer B, Roberts S, Grimwade B, Shao W, Wang M, Fu Q, Shu Q, Laroche<br />
I, Zhou Z, Tchernev VT, Christiansen J, Velleca M, Kingsmore SF. (2002).<br />
Multiplexed protein profiling on microarrays by rolling-circle amplification. Nat<br />
Biotechnol. 20, 359–65.<br />
18. Pasternack RF, Collings PJ. (1995). Resonance light scattering: a new technique<br />
for studying chromophore aggregation. Science. 269, 935–9.<br />
19. Stich N, Gandhum A, Matyushin V, Raats J, Mayer C, Alguel Y, Schalkhammer T.<br />
(2002). Phage display antibody-based proteomic device using resonance-enhanced<br />
detection. J Nanosci Nanotechnol. 2, 375–81.
20. Lindskog M, Rockberg J, Uhlen M, Sterky F. (2005). Selection of protein epitopes<br />
for antibody production. Biotechniques. 38, 723–7.<br />
21. Miller JC, Zhou H, Kwekel J, Cavallo R, Burke J, Butler EB, Teh BS, Haab BB.<br />
(2003). Antibody microarray profiling of human prostate cancer sera: antibody<br />
screening and identification of potential biomarkers. Proteomics. 3, 56–63.<br />
22. Zhou H, Bouwman K, Schotanus M, Verweij C, Marrero JA, Dillon D, Costa J,<br />
Lizardi P, Haab BB. (2004). Two-color, rolling-circle amplification on antibody<br />
microarrays for sensitive, multiplexed serum-protein measurements. Genome Biol.<br />
5, R28.<br />
23. Shao W, Zhou Z, Laroche I, Lu H, Zong Q, Patel DD, Kingsmore S, Piccoli SP.<br />
(2003). Optimization of rolling-circle amplified protein microarrays for multiplexed<br />
protein profiling. J Biomed Biotechnol. 5, 299–307.<br />
24. Sanchez-Carbayo M, Socci ND, Lozano JJ, Haab BB, Cordon-Cardo C. (2006).<br />
Profiling bladder cancer using targeted antibody arrays. Am J Pathol. 168, 93–103.<br />
25. Saviranta P, Okon R, Brinker A, Warashina M, Eppinger J, Geierstanger BH.<br />
(2004). Evaluating sandwich immunoassays in microarray format in terms of the<br />
ambient analyte regime. Clin Chem. 50, 1907–20.<br />
26. Huang R, Lin Y, Shi Q, Flowers L, Ramachandran S, Horowitz IR, Parthasarathy<br />
S, Huang RP. (2004). Enhanced protein profiling arrays with ELISA-based amplification<br />
for high-throughput molecular changes of tumor patients' plasma. Clin<br />
Cancer Res. 10, 598–609.<br />
27. Varnum SM, Woodbury RL, Zangar RC. (2004). A protein microarray ELISA for<br />
screening biological fluids. Methods Mol Biol. 264, 161–72.<br />
28. Gembitsky DS, Lawlor K, Jacovina A, Yaneva M, Tempst P. (2004). A prototype<br />
antibody microarray platform to monitor changes in protein tyrosine phosphorylation.<br />
Mol Cell Proteomics. 3, 1102–18.<br />
29. Janzi M, Odling J, Pan-Hammarstrom Q, Sundberg M, Lundeberg J, Uhlen M,<br />
Hammarstrom L, Nilsson P. (2005). Serum microarrays for large scale screening<br />
of protein levels. Mol Cell Proteomics. 4, 1942–7.<br />
30. MacBeath G, Schreiber SL. (2000). Printing proteins as microarrays for high-throughput<br />
function determination. Science. 289, 1760–3.<br />
31. Knezevic V, Leethanakul C, Bichsel VE, Worth JM, Prabhu VV, Gutkind JS,<br />
Liotta LA, Munson PJ, Petricoin EF III, Krizman DB. (2001). Proteomic profiling<br />
of the cancer microenvironment by antibody arrays. Proteomics. 1, 1271–8.<br />
32. Arenkov P, Kukhtin A, Gemmell A, Voloshchuk S, Chupeeva V, Mirzabekov A.<br />
(2000). Protein microchips: use for immunoassay and enzymatic reactions. Anal<br />
Biochem. 278, 123–31.<br />
33. Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD, Mehigh<br />
RJ, Cockrill SL, Scott GB, Tammen H, Schulz-Knappe P, Speicher DW, Vitzthum<br />
F, Haab BB, Siest G, Chan DW. (2005). HUPO Plasma Proteome Project specimen<br />
collection and handling: towards the standardization of parameters for plasma<br />
proteome samples. Proteomics. 5, 3262–77.<br />
34. States DJ, Omenn GS, Blackwell TW, Fermin D, Eng J, Speicher DW, Hanash<br />
SM. (2006). Challenges in deriving high-confidence protein identifications from
data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol.<br />
24, 333–8.<br />
35. Uhlen M, Bjorling E, Agaton C, Szigyarto CA, Amini B, Andersen E, Andersson<br />
AC, Angelidou P, Asplund A, Asplund C, Berglund L, Bergstrom K, Brumer<br />
H, Cerjan D, Ekstrom M, Elobeid A, Eriksson C, Fagerberg L, Falk R, Fall J,<br />
Forsberg M, Bjorklund MG, Gumbel K, Halimi A, Hallin I, Hamsten C, Hansson<br />
M, Hedhammar M, Hercules G, Kampf C, Larsson K, Lindskog M, Lodewyckx<br />
W, Lund J, Lundeberg J, Magnusson K, Malm E, Nilsson P, Odling J, Oksvold P,<br />
Olsson I, Oster E, Ottosson J, Paavilainen L, Persson A, Rimini R, Rockberg J,<br />
Runeson M, Sivertsson A, Skollermo A, Steen J, Stenvall M, Sterky F, Stromberg<br />
S, Sundberg M, Tegel H, Tourle S, Wahlund E, Walden A, Wan J, Wernerus H,<br />
Westberg J, Wester K, Wrethagen U, Xu LL, Hober S, Ponten F. (2005). A human<br />
protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell<br />
Proteomics. 4, 1920–32.
V<br />
Statistics and Bioinformatics in Clinical<br />
Proteomics Data Analysis
16<br />
2D-PAGE Maps Analysis<br />
Emilio Marengo, Elisa Robotti, and Marco Bobba<br />
Summary<br />
Due to the low reproducibility affecting 2D gel-electrophoresis and the complex maps<br />
provided by this technique, the use of effective and robust methods for the comparison<br />
and classification of 2D maps is a fundamental tool for the development of automated<br />
diagnostic methods. A review of classical and recently developed methods for the<br />
comparison of 2D maps is presented here. The methods proposed regard both the analysis<br />
of spot volume datasets through multivariate statistical tools (pattern recognition methods,<br />
cluster analysis, and classification methods) and the analysis of 2D map images through<br />
fuzzy logic, three-way PCA, and the use of moment functions.<br />
The theoretical basis of each procedure is briefly introduced, together with a review<br />
of the most interesting applications present in recent literature.<br />
Key Words: principal component analysis; cluster analysis; classification; SIMCA;<br />
image analysis; moment functions; fuzzy logic; three-way PCA; multidimensional scaling;<br />
spot volume data.<br />
1. Introduction<br />
The development of new and effective methods for the identification of<br />
differences between groups of 2D-PAGE maps represents one of the frontiers<br />
in the field of proteomics, for the development of reliable diagnostic/prognostic<br />
tools. The comparison of sets of 2D maps is not in fact a trivial problem<br />
due to some experimental limitations affecting 2D gel-electrophoresis. In<br />
spite of being a very powerful tool for the separation of proteins in cellular<br />
extracts, 2D gel-electrophoresis is characterized by quite low reproducibility:<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
this limit is dictated by both the specificity of the specimen and the instrumental<br />
procedure employed to obtain the final electrophoretic maps. In fact, the<br />
analyzed biological samples often present complex protein mixtures, covering<br />
a wide range of structures, properties, and molecular weights. The complexity<br />
of the sample is reflected in the complexity of the final map that may contain<br />
hundreds or thousands of spots, with the further appearance of spurious spots<br />
due to impurities or side reactions. The second aspect reducing reproducibility<br />
in 2D gel-electrophoresis is related to the instrumental technique itself, from<br />
sample preparation to the electrophoretic run. Sample pre-treatment, in fact,<br />
follows a multi-step procedure consisting of several purification and extraction<br />
steps, increasing the overall experimental uncertainty. In addition, the final<br />
result is strongly dependent on a great number of instrumental factors that<br />
have to be kept under strict control: polymerization conditions, temperature,<br />
running conditions, and time and temperature during the staining and de-staining steps.<br />
An unexpected or random variation of one or more of these instrumental parameters<br />
can strongly affect the reproducibility of the position, size,<br />
and intensity of the spots on the final map.<br />
The large number of spots present on each map and the low reproducibility<br />
of 2D gel-electrophoresis hinder a clear classification of<br />
samples and make it quite difficult to use 2D-PAGE maps for diagnostic and<br />
prognostic purposes or for drug-design studies. In this perspective, the use<br />
of effective and robust methods for the comparison and classification of 2D<br />
maps is a key point in the development of automated diagnostic tools based on<br />
proteomics. To take due account of the low reproducibility affecting the<br />
experimental protocol, sets of replicate 2D maps are usually run and compared.<br />
The classical analysis of 2D-PAGE maps is usually carried out by dedicated<br />
software packages, which will be briefly described here. The second part of<br />
the chapter will focus on the use of multivariate statistical tools for a more<br />
effective analysis of the so-called “spot volume datasets” produced by software<br />
packages dedicated to 2D-PAGE image analysis.<br />
The final part of the chapter will be devoted to the most advanced applications<br />
of image analysis tools for the study and classification of 2D maps;<br />
the methods presented are based either on fuzzy logic principles coupled with<br />
multivariate statistical tools or on the calculation of mathematical moments of<br />
the images.<br />
2. Gel Analysis Via Dedicated Software Packages<br />
The analysis of sets of 2D maps is usually carried out via dedicated software<br />
packages; among the most popular are PDQuest, Progenesis, Melanie, Z3,<br />
Phoretix, Z4000, but many other solutions are commercially available.
Many papers have appeared in the last decade on the development of software<br />
packages (1,2,3), the comparison of the performance of different packages<br />
(4,5), or the deeper treatment of particular topics such as point pattern matching, reproducibility,<br />
matching efficiency, and spot overlapping (6–16).<br />
All software solutions presently available perform the analysis of sets of 2D<br />
maps based on the digitized images of gels obtained by laser densitometry,<br />
phosphor imaging, or via a CCD camera. The analysis of digitized images<br />
involves several steps, which are described here in more detail with particular<br />
reference to one of the most widely used packages, namely the PDQuest system (17,18,19):<br />
1. Scanning. Gel images are turned into pixel data; each pixel is characterized by a<br />
pair of x–y coordinates indicating its position on the 2D image and a Z value<br />
corresponding to the signal intensity of the pixel. Each map is finally turned into<br />
a series of pixels described by their optical density (OD) value.<br />
2. Filtering images. This step performs a pre-processing of gel images, allowing the<br />
elimination of noise, background effects, specks, and other imperfections.<br />
3. Automated spot detection. Spot detection involves the identification of spots<br />
present on each gel independently. The operator has to select the faintest spot<br />
(to set the sensitivity and minimum peak value parameters), the smallest spot<br />
(to set the size scale parameter), and the largest spot that one aims to detect.<br />
A final smoothing is applied to remove spots close to the background level.<br />
Spots are then located on the gel image (i.e., each spot is identified by a pair<br />
of x–y coordinates indicating its position on the gel), fitted by ideal Gaussian<br />
distributions and quantified by the sum of the OD values within each Gaussian<br />
distribution.<br />
4. Matching of protein profiles. Sets of 2D gels are then edited and matched to one<br />
another in a “match set.” Each identified spot is matched to the same spot in<br />
all the other gels of the set under investigation. To this purpose, landmarks are<br />
needed, consisting of reference spots used by PDQuest to align and position the<br />
match set members for matching. The identification of the landmarks sets some<br />
parameters accounting for distortions existing among the gels to be compared.<br />
5. Normalization. Normalization is then applied to the maps to compensate for gel-to-gel<br />
variations due to sample preparation and loading, staining and de-staining<br />
procedures, etc.<br />
6. Differential analysis. This step allows the comparison of different sets of 2D maps,<br />
e.g., control and diseased samples. Within each group of 2D maps, a<br />
“sample group” is created containing the average values of all the spots identified.<br />
Once the sample groups have been created (i.e., control and diseased samples), the<br />
comparison of the groups is carried out to find differentially expressed proteins.<br />
Usually, only spots showing a two-fold variation are accepted as significantly<br />
changed (100% variation).<br />
7. Statistical analysis. Statistical analysis is then applied to the differentially<br />
expressed proteins. It is usually based on Student’s t-test (p
The final result of the overall procedure, therefore, appears deeply dependent<br />
on the accuracy of the software package adopted, and so the choice of the most<br />
suitable analysis software is critical.<br />
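Steps 6 and 7 of the procedure above can be sketched on a matrix of normalized spot volumes. This is an illustrative reimplementation, not the algorithm of PDQuest or any other package: the function name and example data are hypothetical, applying the two-fold criterion in both directions is an assumption, and the significance level for the t statistic is left to the reader.

```python
import numpy as np

def differential_spots(control, diseased, min_fold=2.0):
    """Compare two sample groups spot by spot.

    control, diseased : arrays (n_replicate_maps, n_spots) of normalized
                        spot volumes, with matched spots in the same column
    Returns (fold, t_stat, changed):
      fold    : ratio of the group means (the "sample group" averages)
      t_stat  : Student's two-sample t statistic with pooled variance
      changed : True where the means differ at least `min_fold`-fold,
                in either direction (an assumption of this sketch)
    """
    control = np.asarray(control, dtype=float)
    diseased = np.asarray(diseased, dtype=float)
    n_c, n_d = control.shape[0], diseased.shape[0]
    mean_c, mean_d = control.mean(axis=0), diseased.mean(axis=0)

    fold = mean_d / mean_c
    changed = (fold >= min_fold) | (fold <= 1.0 / min_fold)

    pooled_var = ((n_c - 1) * control.var(axis=0, ddof=1)
                  + (n_d - 1) * diseased.var(axis=0, ddof=1)) / (n_c + n_d - 2)
    t_stat = (mean_d - mean_c) / np.sqrt(pooled_var * (1.0 / n_c + 1.0 / n_d))
    return fold, t_stat, changed

# Three replicate maps per group, two matched spots.
control = np.array([[100.0, 50.0], [110.0, 55.0], [90.0, 45.0]])
diseased = np.array([[210.0, 52.0], [230.0, 50.0], [190.0, 48.0]])
fold, t_stat, changed = differential_spots(control, diseased)
```

The t statistic has n_c + n_d − 2 degrees of freedom; a p-value can be obtained from the t distribution, for example with `scipy.stats.t.sf`. Unlike the averaged "sample group" alone, the pooled variance keeps the replicate-to-replicate spread in the comparison.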
Commercial software packages, in spite of being powerful tools for image<br />
analysis, present two main disadvantages. The first one is related to human<br />
interference, which is introduced mainly in steps 2 and 3. The second disadvantage<br />
is related to the handling of replicates; the comparison of different groups<br />
of 2D maps is performed on the basis of the obtained “sample group” of each<br />
class, i.e., a gel containing the average of the information common to all replicates.<br />
In this way, single replicates are not considered, and the information about<br />
the reproducibility of the maps is not taken into proper consideration.<br />
3. Analysis of Spot Volume Datasets<br />
Spot volume datasets coming from the differential analysis via dedicated<br />
software (step 6 of the procedure described in Section 2) are particularly suitable<br />
for investigation by means of multivariate statistical tools; this is due both to<br />
their large dimensionality (a large number of spots identified on each map) and<br />
to the difficulty in identifying the small differences existing between groups<br />
of maps when hundreds of spots are simultaneously detected on each sample.<br />
From this point of view, multivariate statistical tools represent the best<br />
alternative since they are able to provide a clear representation of the case<br />
under study, considering all the variables simultaneously, and produce robust<br />
results, i.e., eliminating the contribution of experimental uncertainty. Among<br />
the statistical techniques that have recently been successfully<br />
applied to spot volume datasets are pattern recognition methods, e.g., Principal<br />
Component Analysis (PCA) and Cluster Analysis; classification methods, e.g.,<br />
Linear Discriminant Analysis (LDA) and Soft Independent Modeling of Class<br />
Analogy (SIMCA); and regression methods, e.g., discriminant analysis–partial<br />
least squares regression (DA-PLS).<br />
Data from spot volume datasets present a multivariate structure, where<br />
several samples (maps) are described by a large number of variables (spots<br />
identified). Multivariate data are usually arranged in matrices to undergo the<br />
statistical analysis. The datasets taken into account hereafter are arranged in<br />
data matrices of dimensions n × p, where n is the number of samples (one for<br />
each row of the matrix) and p is the number of variables (one for each column<br />
of the matrix).<br />
3.1. Principal Component Analysis<br />
Principal Component Analysis (20,21) is a multivariate pattern recognition<br />
method that represents the objects, described by the original variables, in a
new reference system characterized by new variables called principal components<br />
(PCs; see also Chapter 17). Each PC has the property of explaining the<br />
maximum possible amount of residual variance contained in the original dataset:<br />
the first PC explains the maximum amount of variance contained in the overall<br />
dataset, while the second one explains the maximum residual variance. The<br />
PCs are then calculated hierarchically so that experimental noise and random<br />
variations are contained in the last PCs.<br />
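The hierarchical construction of the PCs can be written compactly. The notation below is introduced here for illustration and is an assumption, not the chapter's own: X is the n × p data matrix after column centering, p_k is the unit-length vector of coefficients defining the k-th PC, t_k is the corresponding vector of sample coordinates, A is the number of PCs retained, and E collects the variance left to the discarded PCs.

```latex
\begin{aligned}
\mathbf{p}_1 &= \arg\max_{\lVert \mathbf{p} \rVert = 1}
                \operatorname{Var}(\mathbf{X}\mathbf{p}), \\
\mathbf{p}_k &= \arg\max_{\substack{\lVert \mathbf{p} \rVert = 1 \\
                \mathbf{p} \perp \mathbf{p}_1, \dots, \mathbf{p}_{k-1}}}
                \operatorname{Var}(\mathbf{X}\mathbf{p}),
                \qquad k = 2, \dots, \min(n-1,\, p), \\
\mathbf{X}   &= \sum_{k=1}^{A} \mathbf{t}_k \mathbf{p}_k^{\mathsf{T}} + \mathbf{E},
                \qquad \mathbf{t}_k = \mathbf{X}\mathbf{p}_k .
\end{aligned}
```

Each successive PC thus captures the largest share of the variance not already explained by the preceding ones, which is why noise tends to accumulate in the last terms.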
The PCs maintain a strict relationship with the original reference system,<br />
since they are calculated as linear combinations of the original variables. They<br />
are also orthogonal to each other, thus containing independent sources of information<br />
(Fig. 1). The hierarchical way in which PCs are calculated makes them<br />
useful for reducing the dimensionality of the original dataset: in fact,<br />
a large number of original variables can be substituted by a smaller number of<br />
significant PCs, containing a relevant amount of information when compared to<br />
the overall amount of variance contained in the original dataset, but eliminating<br />
experimental uncertainty (which is accounted for by the last PCs).<br />
Principal Component Analysis provides two main tools for data analysis: the<br />
scores and the loadings. The scores represent the coordinates of the samples<br />
in the new reference system, while the loadings represent the coefficients of<br />
the linear combination describing each PC, i.e., the weights of the original<br />
variables on each PC. The graphical representation of the scores in the space<br />
of the PCs allows the identification of groups of samples showing a similar<br />
behavior (samples close to one another in the graph) or different characteristics<br />
(samples far from each other). By looking at the corresponding loading plot, it<br />
is possible to identify the variables that are responsible for the analogies or the<br />
differences detected for the samples in the score plot.<br />
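Scores and loadings can be computed from a spot volume matrix with a singular value decomposition. The sketch below is one common way to do so and makes several assumptions of its own: the data are only column-centered (no scaling), the function name is hypothetical, and the example matrix is simulated rather than taken from real 2D maps.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of an n x p matrix (samples in rows, spots in columns).

    Returns (scores, loadings, explained):
      scores    : (n, n_components) coordinates of the samples on the PCs
      loadings  : (p, n_components) weights of the original variables
      explained : fraction of total variance explained by each retained PC
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # column centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Simulated example: 6 maps x 5 spots; the last three maps form a second
# group in which the first spot is strongly over-expressed.
rng = np.random.default_rng(0)
X = rng.normal(100.0, 1.0, size=(6, 5))
X[3:, 0] += 50.0
scores, loadings, explained = pca(X)
```

Plotting the two columns of `scores` against each other gives the score plot, and plotting the two columns of `loadings` gives the loading plot. In this simulated case the two groups separate along PC1, and the first spot carries the largest PC1 loading.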
An example of a loading plot and a score plot is shown in Fig. 2. The data belong<br />
to four groups of 2D maps (24 maps described by more than 1000 spots). From<br />
the score plot, it is possible to discriminate the four groups of samples present:<br />
Fig. 1. Construction of the principal components.
[Figure panel (A): Loading Plot; axis PC2 (ticks from –0.08 to 0.08); point labels are spot variables, omitted here.]<br />
V390 V461<br />
V504<br />
V109<br />
V332<br />
V433 V418<br />
V241<br />
V127<br />
V29 V38<br />
V179 V171<br />
V232<br />
V375 V484<br />
V502 V549<br />
V510<br />
V498<br />
V626<br />
V638 V777<br />
V1121 V1111 V1093<br />
V793 V940 V930<br />
V1145<br />
V740<br />
V696<br />
V736<br />
V535<br />
V462<br />
V63<br />
V201<br />
V337 V247 V615 V206 V459<br />
V272<br />
V170<br />
V25 V186 V33<br />
V193<br />
V248<br />
V494<br />
V566 V580<br />
V642<br />
V647<br />
V676<br />
V695<br />
V769<br />
V820<br />
V891 V911<br />
V942<br />
V1057<br />
V1138<br />
V1101<br />
V1141<br />
V797<br />
V926<br />
V851<br />
V892<br />
V711 V723 V752<br />
V762<br />
V662<br />
V750 V659<br />
V602<br />
V221 V301<br />
V666 V766 V679 V829<br />
V601 V545 V373<br />
V703 V832<br />
V844<br />
V854 V855 V874<br />
V893 V952<br />
V1025<br />
V1110<br />
V1041<br />
V1026<br />
V1152<br />
V896<br />
V913<br />
V1018<br />
V957<br />
V948 V1077<br />
V299<br />
V79 V169 V8<br />
V16<br />
V140<br />
V320 V192<br />
V293<br />
V164 V165<br />
V2<br />
V207 V184<br />
V212<br />
V338 V537 V482 V595<br />
V546<br />
V678 V681<br />
V358<br />
V362<br />
V20 V87<br />
V219<br />
V392V370<br />
V480<br />
V235<br />
V81 V252 V22<br />
V129 V173<br />
V48<br />
V155<br />
V268<br />
V144 V190 V218<br />
V294<br />
V342<br />
V533 V460<br />
V600<br />
V624 V718<br />
V835 V812<br />
V698<br />
V763<br />
V532 V343<br />
V411<br />
V523<br />
V578 V728 V830 V598 V593<br />
V685 V720 V862<br />
V905 V951 V918<br />
V954<br />
V1029<br />
V1146 V1143 V1096<br />
V1102 V1095<br />
V960 V1086<br />
V875 V1144<br />
V1098 V969 V1094<br />
V1044<br />
V544<br />
V677<br />
V424<br />
V539<br />
V744<br />
V821<br />
V800<br />
V702V321<br />
V627<br />
V225<br />
V398 V746 V497<br />
V556<br />
V95<br />
V71<br />
V202<br />
V434<br />
V551 V612<br />
V651<br />
V64<br />
V226 V68 V260 V240V507<br />
V564 V628<br />
V714<br />
V814<br />
V838<br />
V846 V853<br />
V958<br />
V1160<br />
V1031<br />
V959<br />
V962<br />
V970<br />
V1065<br />
V944 V923V924 V936 V870<br />
V1051<br />
V1108 V1140<br />
V1070<br />
V861 V852<br />
V72 V314<br />
V648<br />
V239<br />
V153 V261<br />
V286 V290<br />
V330<br />
V377 V382<br />
V402<br />
V393<br />
V449<br />
V699 V733<br />
V773 V880 V1021<br />
V909<br />
V1159 V1072 V1000<br />
V815<br />
V1081<br />
V885 V1125<br />
V1068 V999<br />
V1071<br />
V876<br />
V631 V619<br />
V783 V886 V971 V977 V1059<br />
V1161<br />
V801<br />
V316 V389<br />
V432 V457<br />
V565<br />
V442<br />
V739 V732 V664 V575<br />
V724 V742<br />
V175 V407 V383<br />
V196 V422 V132 V488<br />
V444 V319<br />
V562<br />
V663<br />
V765<br />
V670<br />
V907<br />
V894<br />
V908 V1099<br />
V981<br />
V1035<br />
V1054 V1132<br />
V1047<br />
V1142<br />
V1134 V925 V794 V1153<br />
V1158<br />
V1163 V1162<br />
V694<br />
V727<br />
V10 V151 V178 V304 V162 V39 V83<br />
V233<br />
V466<br />
V471 V622 V558<br />
V764 V667 V792<br />
V776<br />
V989<br />
V1012<br />
V997<br />
V986 V1005<br />
V1151<br />
V722<br />
V680 V636 V372<br />
V371<br />
V88<br />
V183<br />
V340 V147 V355 V52<br />
V131<br />
V209 V468 V518<br />
V583<br />
V300<br />
V326 V464<br />
V586<br />
V541<br />
V751 V1028<br />
V1043<br />
V1073 V1100<br />
V1097<br />
V782<br />
V1033 V1027<br />
V735 V384<br />
V755<br />
V839<br />
V1103<br />
V1131<br />
V1139<br />
V910<br />
V313 V837<br />
V47<br />
V629<br />
V499<br />
V594<br />
V391<br />
V687<br />
V749<br />
V481<br />
V635<br />
V767<br />
V1089<br />
V927<br />
V1113<br />
V945 V1104 V1038<br />
V1009 V1120 V1115<br />
V1088<br />
V775 V995 V811<br />
V697<br />
V75<br />
V27<br />
V5<br />
V40<br />
V89 V277<br />
V210<br />
V62<br />
V333 V289<br />
V322 V487 V704 V731 V1022<br />
V1156<br />
V1090<br />
V1061<br />
V1024<br />
V1055 V525<br />
V707<br />
V24 V242 V349 V350 V548<br />
V441 V515<br />
V719 V568<br />
V584<br />
V867 V620<br />
V770 V1136<br />
V690<br />
V1148<br />
V585<br />
V604<br />
V57V141<br />
V378<br />
V263 V1015<br />
V107<br />
V431 V577<br />
V610 V858<br />
V992<br />
V817 V868 V540<br />
V23<br />
V356 V374<br />
V134<br />
V607 V757V345<br />
V661 V673<br />
V683<br />
V51 V611 V614 V265<br />
V715<br />
V387 V530<br />
V748<br />
V873<br />
V895<br />
V988<br />
V1032<br />
V82 V67<br />
V281<br />
V712<br />
V756<br />
V761 V1130 V1052<br />
V90<br />
V13 V76V49<br />
V833<br />
V445<br />
V335<br />
V819 V710 V520<br />
V467<br />
V836<br />
V887 V1048<br />
V1046<br />
V1122<br />
V1118<br />
V866<br />
V440<br />
V974<br />
V897 V331<br />
V368<br />
V37<br />
V28 V12V3<br />
V154<br />
V315 V161<br />
V448<br />
V825<br />
V1016 V1053<br />
V1147<br />
V688<br />
V6<br />
V34<br />
V323 V1117<br />
V198 V18 V366<br />
V15<br />
V655<br />
V1002<br />
V700<br />
V61 V66<br />
V14 V69 V121<br />
V163<br />
V406 V327<br />
V91 V438 V827 V845 V786<br />
V645 V446<br />
V882<br />
V120 V205<br />
V721 V46 V606<br />
V657 V1112 V574 V654 V785 V993 V975<br />
V795<br />
V816<br />
V357<br />
V658 V806<br />
V1150<br />
V701<br />
V282<br />
V425<br />
V803<br />
V500 V589 V983<br />
V223<br />
V646 V713 V799<br />
– 0.06 – 0.04<br />
–0.02 0.00 0.02<br />
0.04<br />
0.06<br />
PC1<br />
(B)<br />
PC2<br />
25<br />
20<br />
15<br />
10<br />
5<br />
0<br />
–5<br />
–10<br />
– 15<br />
C6<br />
C5<br />
C1<br />
C4<br />
C2 C3<br />
A<br />
A5<br />
A6<br />
A1<br />
C<br />
A2<br />
A3<br />
A4<br />
Score Plot<br />
B6<br />
AB6<br />
AB1<br />
B2 B3<br />
B4<br />
B1 B5<br />
AB5AB2AB3<br />
AB4<br />
– 20<br />
– 25 –20 –15 –10 –5 0 5 10 15 20 25<br />
PC1<br />
B<br />
AB<br />
Fig. 2. Example of loadings (A) and scores plots (B).<br />
one group in each quadrant. The first PC is able to discriminate samples C and<br />
A (negative scores on PC1) from samples B and AB (positive scores on PC1);<br />
PC2 separates samples C and B (positive scores on PC2) from samples A and<br />
AB (negative scores on PC2). The analysis of the corresponding loading plot<br />
explains the reasons for the separation of the samples into the four groups: sample<br />
C shows large intensities of the spots in the 2nd quadrant and small intensities<br />
of the spots in the 4th quadrant; sample A shows large intensities of the spots<br />
in the 3rd quadrant and very small intensities in the 1st quadrant; samples AB present<br />
a behavior opposite to that of sample C, while sample B presents a behavior<br />
opposite to that of sample A.
2D-PAGE Maps Analysis 297<br />
As a way of identifying the groups of samples and variables<br />
present in a dataset, PCA is a very powerful visualization tool: it allows<br />
a multivariate dataset to be represented by only the few PCs identified<br />
as the most relevant.<br />
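As a rough illustration of how scores and loadings arise, the sketch below extracts the first PC of a small spot volume table.<br />
It is only a sketch under stated assumptions: the toy data, the function name first_pc, and the use of power iteration on the covariance matrix are illustrative choices, not the procedure used to produce Fig. 2.<br />

```python
import math

def first_pc(data, iterations=200):
    """Return (loadings, scores) of the first principal component.

    data: list of samples (rows), each a list of spot volumes (columns).
    Mean centering plus power iteration on the covariance matrix; this is
    an illustrative choice, not necessarily the algorithm of the chapter.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix of the variables (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Asymmetric starting guess, so it is unlikely to be orthogonal to PC1.
    loadings = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        product = [sum(cov[a][b] * loadings[b] for b in range(p))
                   for a in range(p)]
        norm = math.sqrt(sum(v * v for v in product))
        loadings = [v / norm for v in product]
    # Scores: projection of each centered sample on the loading vector.
    scores = [sum(r[j] * loadings[j] for j in range(p)) for r in centered]
    return loadings, scores

# Hypothetical toy dataset: 6 samples x 3 spots, two groups of 3 samples.
toy = [
    [10.0, 2.0, 5.0], [11.0, 1.0, 5.5], [9.0, 3.0, 4.5],   # group 1
    [2.0, 10.0, 5.0], [1.0, 11.0, 5.5], [3.0, 9.0, 4.5],   # group 2
]
loadings, scores = first_pc(toy)
```

In this toy case the two groups receive scores of opposite sign on the first PC, and the two spots that differ between the groups carry loadings of opposite sign, which is the kind of reading described for the score and loading plots.<br />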
In proteomics, the loadings are represented more effectively on a<br />
virtual 2D map. In proteomic datasets each variable represents a spot,<br />
characterized by a pair of x–y coordinates defining its position on the 2D maps<br />
used for the analysis. The loadings of each PC can therefore be represented on a “virtual”<br />
2D map, where each spot is drawn as a circle centered on the corresponding<br />
x–y position and shaded on a color scale, with a darker<br />
tone corresponding to a larger positive or negative loading. This<br />
representation was first proposed by Marengo et al. (22,23).<br />
An example is given in Fig. 3, which shows the positive and negative loadings<br />
of the first PC for the example of Fig. 2. The representation<br />
is clearer than the loading plot of Fig. 2 and allows<br />
the immediate identification of the spots showing the most relevant loadings<br />
(darker grey tones) on the corresponding PC.<br />
3.2. Cluster Analysis<br />
Cluster analysis techniques are pattern recognition methods that help to<br />
identify groups of samples or of variables in a dataset by<br />
investigating the relationships between the objects or variables. Cluster<br />
analysis tools are unsupervised methods: the operator does not know how the<br />
dataset is partitioned and wants to identify potential groups of objects. In this<br />
respect they differ from classification methods, where the operator<br />
knows in advance which class each object belongs to and wants to obtain the best<br />
assignment of the objects to their classes. The most widely used clustering<br />
methods belong to the class of agglomerative hierarchical methods (24), where<br />
the objects are grouped (linked together) on the basis of a measure of their<br />
similarity. The most similar objects or groups of objects are linked first. The<br />
final result is a graph called a dendrogram; the objects are represented on the x<br />
axis and are connected at decreasing levels of similarity along the y axis. An<br />
example is reported in Fig. 4, referring to the dataset already presented in Figs. 2<br />
and 3. The groups of samples are identified by cutting the dendrogram<br />
horizontally at a chosen dissimilarity level and counting the<br />
vertical lines that the cut crosses. The clustering technique applied shows a first<br />
partition of the samples into two main groups, which can be further separated<br />
into three groups at a dissimilarity level of 50%. The four groups present can<br />
be identified only by applying a further cut at a dissimilarity level of 25%.<br />
Samples B and AB thus appear to be the most similar groups.
298 Marengo et al.<br />
[Fig. 3 panels: "Positive Loadings" and "Negative Loadings", each drawn on a virtual 2D map with both axes running from 0 to 220.]<br />
Fig. 3. Positive and negative loadings of PC1 represented on a virtual 2D-map.
[Fig. 4 dendrogram: Ward method, Euclidean distances; y axis (D leg / D max)*100, from 0 to 100; leaves grouped as AB (AB3, AB2, AB5, AB4, AB6, AB1), B (B3, B2, B6, B5, B4, B1), A (A4, A1, A6, A2, A5, A3), and C (C4, C5, C2, C3, C6, C1).]<br />
Fig. 4. Dendrogram (Ward method, Euclidean distances).<br />
The results of hierarchical clustering methods depend on the specific measure<br />
of similarity and on the linking method, so several methods are usually<br />
compared to get a general idea of the number of groups present. In general,<br />
the linking methods that give the clearest<br />
groups are the Ward method and the Complete Linkage method.<br />
As the measure of similarity, Euclidean distances are usually<br />
adopted.<br />
Clustering techniques can be applied both to the original variables and to<br />
the results of PCA (scores of the significant PCs); in the latter case the<br />
samples are clustered using only the useful sources of variation, without the<br />
contribution of experimental error.<br />
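The agglomerative procedure and the horizontal cut of the dendrogram can be sketched as follows.<br />
This is a minimal sketch, assuming complete linkage with Euclidean distances (chosen here because it is shorter to write than the Ward method of Fig. 4); the function names and the toy points are hypothetical.<br />

```python
import math

def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(points):
    """Agglomerative clustering; returns the merge history.

    Each history entry is (members_of_new_cluster, linkage_distance).
    Complete linkage: the distance between two clusters is the largest
    distance between any pair of their members.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        history.append((merged, d))
    return history

def cut(points, history, percent):
    """Groups obtained by cutting at `percent` of the largest linkage distance."""
    threshold = history[-1][1] * percent / 100.0
    groups = [[i] for i in range(len(points))]
    for merged, d in history:
        if d > threshold:
            break  # linkage distances never decrease with complete linkage
        groups = [g for g in groups if not set(g) <= set(merged)]
        groups.append(merged)
    return sorted(groups)

# Hypothetical toy samples forming three tight pairs.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (30, 0), (31, 0)]
history = complete_linkage(points)
groups_25 = cut(points, history, 25)
groups_50 = cut(points, history, 50)
```

With these toy points a cut at 25% of the maximum linkage distance separates three groups, while a cut at 50% leaves two, mirroring the way the number of groups in a dendrogram depends on the dissimilarity level chosen.<br />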
3.3. Classification Methods<br />
Classification methods are particularly suitable for the analysis of<br />
proteomic spot volume datasets, since the primary need in this application<br />
is to assign samples belonging to different groups, e.g., control<br />
and diseased individuals, to their proper class. The final aim is both the<br />
development of diagnostic tools and the identification of the differences existing<br />
between the classes, to shed light on the mechanism of action of a disease or<br />
of a new drug.<br />
Here, two of the most widely used classification methods are briefly<br />
described: LDA and SIMCA.<br />
3.3.1. Linear Discriminant Analysis<br />
Linear Discriminant Analysis (25,26) belongs to the so-called Bayesian<br />
classification methods, since it exploits Bayes’s rule; it<br />
classifies the samples present in a dataset on the basis of its multivariate<br />
structure.<br />
In Bayesian classification methods, an object, x, is assigned to the class, g,<br />
for which the posterior probability P(g/x) is maximum. Posterior probability is<br />
computed according to Bayes’s formula:<br />
$$P(g/x) = \frac{P_g \, f(g/x)}{\sum_k P_k \, f(k/x)}$$<br />
where<br />
$P_g$ is the prior probability of class $g$;<br />
$P_k$ is the prior probability of class $k$ (the sum in the denominator runs over all the classes);<br />
$f(g/x)$ is the probability density function of class $g$; and<br />
$f(k/x)$ is the probability density function of class $k$.<br />
A common assumption is that each class is described by a Gaussian multivariate<br />
probability distribution:<br />
$$f(g/x) = \frac{P_g}{(2\pi)^{p/2} \, |S_g|^{1/2}} \, e^{-\frac{1}{2} (x_i - c_g)^T S_g^{-1} (x_i - c_g)}$$<br />
where<br />
$P_g$ is the prior probability of class $g$;<br />
$S_g$ is the covariance matrix of class $g$;<br />
$c_g$ is the centroid of class $g$; and<br />
$p$ is the number of descriptors.<br />
The quadratic form in the argument of the exponential function:<br />
$$(x_i - c_g)^T S_g^{-1} (x_i - c_g)$$<br />
is the (squared) Mahalanobis distance between object $x$ and the centroid of class<br />
$g$, and it takes into consideration the class covariance structure since it
contains the covariance matrix. The covariance matrix accounts for the relationships<br />
existing among the variables for each class, i.e., the shape of the<br />
class.<br />
Taking the logarithm of the posterior probability and eliminating the constant terms<br />
gives the so-called discriminant score; each object is classified in the class g for which<br />
this score is minimum:<br />
$$D_g(x) = (x_i - c_g)^T S_g^{-1} (x_i - c_g) + \ln |S_g| - 2 \ln P_g$$<br />
In LDA, the covariance matrix of each class is approximated with the pooled<br />
(between the classes) covariance matrix, thus considering all the classes having<br />
a common shape, i.e., a weighted average of the shape of each class present in<br />
the dataset.<br />
The variables contained in the LDA model, i.e., those that discriminate the classes<br />
present in the dataset, can be chosen by a stepwise algorithm that selects the<br />
most discriminating variables iteratively. LDA can be performed either on the<br />
original variables or on PCs, the latter choice eliminating the contribution to variation given<br />
by experimental uncertainty.<br />
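The discriminant score with a pooled covariance matrix can be sketched for the simplest case of two variables.<br />
This is a minimal sketch, not a full LDA implementation: the function names and toy data are hypothetical, the priors are assumed to equal the class frequencies, and the 2 x 2 covariance matrix is inverted by hand.<br />

```python
import math

def centroid(rows):
    """Mean vector of a class described by two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_covariance(classes):
    """Pooled within-class covariance matrix (2 x 2) of {label: rows}."""
    total = sum(len(rows) for rows in classes.values())
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in classes.values():
        c = centroid(rows)
        for r in rows:
            d = [r[0] - c[0], r[1] - c[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = total - len(classes)  # degrees of freedom of the pooled estimate
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]

def discriminant_score(x, c, s, prior):
    """D_g(x) = (x - c)^T S^-1 (x - c) + ln|S| - 2 ln P_g for a 2 x 2 S."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    d = [x[0] - c[0], x[1] - c[1]]
    mahalanobis = sum(d[a] * inv[a][b] * d[b]
                      for a in range(2) for b in range(2))
    return mahalanobis + math.log(det) - 2.0 * math.log(prior)

def classify(x, classes):
    """Assign x to the class with the minimum discriminant score."""
    s = pooled_covariance(classes)
    total = sum(len(rows) for rows in classes.values())
    scores = {g: discriminant_score(x, centroid(rows), s, len(rows) / total)
              for g, rows in classes.items()}
    return min(scores, key=scores.get)

# Hypothetical toy data: two spot volumes measured on four samples per class.
classes = {
    "control": [(1.0, 2.0), (2.0, 1.0), (1.5, 2.5), (2.5, 1.5)],
    "treated": [(6.0, 7.0), (7.0, 6.0), (6.5, 7.5), (7.5, 6.5)],
}
label_a = classify((2.0, 2.0), classes)
label_b = classify((6.0, 6.0), classes)
```

Because the same pooled matrix is used for every class, the ln|S| term is identical for all classes and only the Mahalanobis term and the prior actually decide the assignment.<br />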
3.3.2. Soft-Independent Model of Class Analogy<br />
The SIMCA method (27) is based on the independent modeling of each<br />
class by means of PCA; in fact, each class is described by its relevant PCs. The<br />
samples of each class are then contained in the so-called SIMCA boxes, defined<br />
by the relevant PCs of each class. This represents one of the most important<br />
advantages of SIMCA; the classification of each sample is not affected by<br />
experimental uncertainty and spurious information, since each class is modeled<br />
only by its relevant PCs. Moreover, this method is also useful when small<br />
datasets are analyzed (more variables than objects), since it performs substantial<br />
dimensionality reduction.<br />
Thus, SIMCA classification starts with a PCA calculated on each<br />
class independently and with the identification of the relevant PCs for each class, which<br />
define the so-called class model. If the data are autoscaled (mean centering<br />
followed by division by the standard deviation of each variable), each<br />
object $x_{iv}$ belonging to class $g$ is modeled as:<br />
$$x_{ivg} = \sum_{a=1}^{A_g} t_{iag} \, l_{vag} + r_{ivg} \qquad g = 1, \ldots, G; \; a = 1, \ldots, A_g; \; i = 1, \ldots, n_g; \; v = 1, \ldots, P$$<br />
($G$ = number of classes present; $A_g$ = number of significant PCs for class $g$;<br />
$n_g$ = number of samples in class $g$; $P$ = number of original variables)
where<br />
$t_{iag}$ = score of the $i$-th object of class $g$ on the $a$-th PC;<br />
$l_{vag}$ = loading of the $v$-th variable on the $a$-th PC of class $g$; and<br />
$r_{ivg}$ = residual of the $i$-th object of class $g$ for variable $v$.<br />
The values estimated by the model are then:<br />
$$\hat{x}_{ivg} = \sum_{a=1}^{A_g} t_{iag} \, l_{vag}$$<br />
while the residuals are defined as:<br />
$$r_{ivg} = x_{ivg} - \hat{x}_{ivg}$$<br />
The classification rule of object i is based on a Fisher’s F-test so that object<br />
i is classified in class g if:<br />
$$\frac{rsd_{ig}^{2}}{rsd_{g}^{2}}$$<br />
where<br />
$sd_{vc}$ = standard deviation of variable $v$ in class $c$;<br />
$rsd_{vc}$ = residual standard deviation of variable $v$ of the objects of class $c$<br />
from the model of their own class.<br />
The MP ranges from 0 (variable irrelevant to the definition of the class<br />
model) to 1.<br />
A typical representation of MP is given in Fig. 5, where the variables are<br />
represented on the x axis, and MP is represented as a bar diagram on the y<br />
axis. Figure 5 represents the MPs of class C in the example of Figs. 2–4.<br />
The discrimination power (DP) is a measure of the ability of each variable<br />
to discriminate between two classes ($c$ and $g$) at a time. The greater the DP,<br />
the more weight a variable carries in the classification of an object in class $c$ or $g$. It<br />
is defined as:<br />
$$DP_{vc} = \sqrt{\frac{rsd_{vcg}^{2} + rsd_{vgc}^{2}}{rsd_{vc}^{2} + rsd_{vg}^{2}}}$$<br />
Fig. 5. Modeling power of a class of six control samples.
where<br />
$rsd_{vcg}^{2}$ = squared residual standard deviation of variable $v$ of the objects<br />
of class $c$ from the model of class $g$;<br />
$rsd_{vgc}^{2}$ = squared residual standard deviation of variable $v$ of the objects of<br />
class $g$ from the model of class $c$;<br />
$rsd_{vc}^{2}$ = squared residual standard deviation of variable $v$ of the objects<br />
of class $c$ from the model of their own class;<br />
$rsd_{vg}^{2}$ = squared residual standard deviation of variable $v$ of the objects of<br />
class $g$ from the model of their own class.<br />
The DP cannot be negative, but it has no upper limit. A representation of the DP<br />
is shown in Fig. 6; the variables are represented on the x axis, and the DPs as a bar<br />
diagram on the y axis. Figure 6 represents the DPs of classes A and B for<br />
the example of Figs. 2–5. In general, when the dataset consists of two<br />
classes, a single set of DPs is obtained, corresponding to the discrimination<br />
between the two classes present. When more than two<br />
classes are present, on the other hand, a set of DPs can be obtained for each pair of<br />
classes compared.<br />
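Once the four residual standard deviations are available, the DP of each variable follows directly from the definition given above.<br />
A minimal sketch, assuming the residual standard deviations have already been computed from the class models; the function name, the spot names, and the numerical values are hypothetical.<br />

```python
import math

def discrimination_power(rsd_cg, rsd_gc, rsd_c, rsd_g):
    """DP of one variable for the pair of classes (c, g).

    rsd_cg: residual sd of the class-c objects fitted to the model of class g
    rsd_gc: residual sd of the class-g objects fitted to the model of class c
    rsd_c, rsd_g: residual sd of each class fitted to its own model
    """
    return math.sqrt((rsd_cg ** 2 + rsd_gc ** 2) / (rsd_c ** 2 + rsd_g ** 2))

# Hypothetical residual standard deviations for two spots.
residuals = {
    "spot_1": (0.9, 1.1, 1.0, 1.0),  # fits both class models about equally
    "spot_2": (6.0, 8.0, 1.0, 1.0),  # fits the other class's model poorly
}
dp = {spot: discrimination_power(*values) for spot, values in residuals.items()}
# Spots sorted from the most to the least discriminating.
ranking = sorted(dp, key=dp.get, reverse=True)
```

A spot whose residuals are much larger under the other class's model than under its own obtains a large DP, which is why the values have no upper limit.<br />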
Modeling powers and DPs can be represented on a color scale on “virtual”<br />
2D maps, as seen for the loadings plots, for clearer representation. An example<br />
is given in Fig. 7, where the MPs and DPs represented as bar diagrams in<br />
Figs. 5 and 6 are represented on virtual 2D maps.<br />
Fig. 6. Discriminating power of two classes: treated with drug A (six samples) and<br />
with drug B (six samples).
[Fig. 7 panels: "Modeling Power of class C" and "Discrimination Power classes A–B", each drawn on a virtual 2D map with both axes running from 0 to 220.]<br />
Fig. 7. MPs and DPs of Figs 5 and 6 represented on virtual 2D-maps.
3.4. Partial Least Squares (PLS) Regression and Discriminant<br />
Analysis–Partial Least Squares (DA-PLS) Regression<br />
Partial least squares is a regression method that uses the information contained<br />
in the X data matrix to predict the behavior of the Y data matrix. The PLS method models<br />
both X and Y variables simultaneously to find the latent variables in X that<br />
will predict the latent variables in Y. These PLS components (latent variables)<br />
are similar to the PCs. If there are several responses, they are modeled together<br />
in a multivariate way (28–30). PLS can be used for discriminant analysis<br />
(DA-PLS) by creating a response variable for each category: in the case of<br />
proteomic data, one response variable for each group of samples. Each response<br />
variable is assigned a value of 1 for the samples belonging to the corresponding<br />
class, and a value of 0 for the samples belonging to the other classes.<br />
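The construction of the response variables for DA-PLS can be sketched as follows.<br />
This is only an illustration of the 0/1 coding described above; the function name and the sample labels are hypothetical, and the PLS regression itself is not shown.<br />

```python
def dummy_responses(labels):
    """Build one 0/1 response variable per class.

    Returns the sorted class names and a matrix with one row per sample
    and one column per class (1 = the sample belongs to that class).
    """
    classes = sorted(set(labels))
    matrix = [[1 if label == c else 0 for c in classes] for label in labels]
    return classes, matrix

# Hypothetical class labels of five samples.
labels = ["C", "A", "B", "AB", "A"]
classes, y = dummy_responses(labels)
```

Each row of the resulting Y matrix contains a single 1, in the column of the class to which the sample belongs.<br />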
3.5. Applications<br />
3.5.1. Pattern Recognition Methods<br />
Many applications of multivariate tools to the analysis of spot volume datasets<br />
are reported in the literature. PCA can be considered quite a classical<br />
approach, with its first applications to spot volume data dating back to the<br />
mid-1980s, as reported by Anderson (31) in the USA and Tarroux (32) in France.<br />
Anderson (31) reports an application of PCA coupled to cluster analysis to<br />
identify the differences among a panel of human cell lines; all the groups were<br />
successfully separated considering only the subset of proteins present in all<br />
the cell lines simultaneously. Tarroux et al. (32) applied PCA in the HERMeS<br />
software package, again coupled to cluster analysis.<br />
More recently, both PCA and cluster analysis have been applied to the study<br />
of DNA and RNA fragments of several biological systems by the groups of<br />
Couto (33), Johansson (34), and Boon (35) and to the immunological diagnosis<br />
of hydatidosis (36,37). Other applications are from the group of Kovarova (38,<br />
39) and De Moor et al. (40), who applied multivariate tools to microarray data.<br />
Iwadate et al. (41) applied discriminant analysis to the classification of<br />
human gliomas; the proteomic patterns of 85 tissue samples were compared<br />
(52 glioblastoma multiforme, 13 anaplastic astrocytomas, 10 astrocytomas, 10<br />
normal brain tissues). Normal brain tissues could be correctly distinguished<br />
from glioma tissues by cluster analysis, which proved to be significantly correlated<br />
with the patient survival. Discriminant analysis extracted a set of 37<br />
proteins differentially expressed based on histological grading.<br />
Principal Component Analysis has also been applied to toxicological studies<br />
by the groups of Amin (42), Heijne (43), and Anderson (44). The first paper<br />
(42) reports a study of the effect of three<br />
nephrotoxicants (cisplatin, gentamicin, and puromycin) on gene expression profiles in rats, as a function of<br />
time after initial administration. PCA and gene expression-based clustering of<br />
compound effects confirmed sample separation based on dose, time, and degree<br />
of renal toxicity. Heijne (43) studied the acute hepatotoxicity induced in rats by<br />
bromobenzene administration; the physiological symptoms recorded coincided<br />
with many changes in hepatic mRNA and protein content. PCA proved to be<br />
effective in discriminating between control and treated samples for both<br />
protein and gene expression profiles; some of the proteins that changed<br />
significantly upon bromobenzene treatment were identified by mass spectrometry.<br />
Anderson (44) investigated the effects of five peroxisome proliferators on the<br />
protein profile in the livers of treated mice at 5- and 35-day time points. Data<br />
for a selected set of 107 liver protein spots, which respond strongly to at<br />
least one of the test compounds, were subjected to PCA to search for global<br />
protein pattern changes. PC1 was identified as a global measure of peroxisome<br />
proliferation by its correlation with enzymatic peroxisomal β-oxidation, while<br />
PC2 separated the samples on the basis of exposure time.<br />
Perrot et al. (45) applied PCA to the comparison of protein expression of<br />
gel-entrapped Escherichia coli cells submitted to a cold shock at 4 °C with<br />
those of exponential- and stationary-phase free-floating cells. Ten different<br />
incubation conditions were considered; each experiment was replicated three<br />
times and each gel was run in duplicate. PCA was carried out on the 203 spots<br />
identified as significantly reproducible than those corresponding to synthesis<br />
at 37 °C, using the average spot intensities for each experimental condition<br />
adopted. In order to remove the variability of staining conditions among the<br />
gels, each spot volume was normalized by the sum of volumes of all the spots<br />
detected on each map. The data were autoscaled before PCA. From score<br />
analysis, it was possible to point out that the protein response of immobilized<br />
cells after the cold shock was significantly different from those of exponential- and<br />
stationary-phase free-floating organisms. The reasons for these differences<br />
could be sought in the loadings analysis, from which the identification of<br />
nine families of proteins could also be confirmed.<br />
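The two pretreatment steps mentioned in this application, normalization of each spot volume by the total volume of its map and autoscaling of each variable, can be sketched as follows.<br />
This is a minimal sketch with hypothetical function names and toy volumes; it assumes that every spot varies across the maps, so that no standard deviation is zero.<br />

```python
import math

def normalize_by_total(volumes):
    """Divide each spot volume by the sum of all spot volumes on the map."""
    total = sum(volumes)
    return [v / total for v in volumes]

def autoscale(data):
    """Mean-center each variable (column) and divide it by its standard deviation."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    # Sample standard deviation (n - 1 in the denominator); assumed non-zero.
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]

# Hypothetical toy dataset: 3 maps (rows) x 3 spots (columns).
gels = [
    [100.0, 300.0, 600.0],
    [200.0, 200.0, 400.0],
    [50.0, 150.0, 100.0],
]
normalized = [normalize_by_total(g) for g in gels]
scaled = autoscale(normalized)
```

After normalization the volumes on each map sum to 1, which removes differences in overall staining intensity; after autoscaling every spot has zero mean and unit standard deviation, so that intense and faint spots weigh equally in the PCA.<br />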
Principal Component Analysis was applied to identify the differences in<br />
macrophage maturation in the U937 human lymphoma cell line by Verhoeckx<br />
et al. (46). PCA proved to be effective in the identification of variations between<br />
samples belonging to different macrophage maturation times, where standard<br />
t-tests identified a smaller number of biomarkers. Another application (47)<br />
consisted of the characterization of anti-inflammatory compounds.<br />
Other applications from Marengo (22,23,48) exploit PCA coupled to both<br />
cluster analysis and SIMCA classification for the identification of differences<br />
between groups of maps. The first application (48) refers to a spot quantity<br />
dataset comprising 435 spots detected in 18 samples belonging to two different
cell lines of control (untreated) and drug-treated pancreatic ductal carcinoma<br />
cells. The study was conceived for the identification of the role played by drugs<br />
on different cell lines. PCA allowed clear discrimination of the four groups of<br />
samples with the use of three PCs, and the analysis of the loadings provided<br />
reasons for the differences among groups of samples. The results were further<br />
confirmed by cluster analysis. Identification of some of the most relevant spots<br />
was also performed by mass spectrometry. The other two applications (22,23)<br />
concern the use of PCA and SIMCA for the classification of proteomic maps.<br />
The first paper (22) shows an application to the adrenal glands of healthy and<br />
diseased mice. PCA was able to discriminate the two classes of samples by<br />
means of the first PC, the loadings of which allowed the identification of spots<br />
responsible for the differences. SIMCA was then applied for the classification<br />
of samples in the two classes, and it was able to correctly classify all the<br />
samples present with one PC in the SIMCA model of each class. SIMCA<br />
allowed the identification of the most discriminating spots by the analysis of<br />
DPs. The comparison between the maps showed up- and down-regulation of<br />
84 polypeptide chains out of a total of 700 spots detected.<br />
An analogous approach was also followed for the comparison of the phenotypic<br />
expression of the mantle cell lymphoma GRANTA-519 and MAVER-1 cell<br />
lines (23).<br />
Marengo proposed an alternative method to show loadings from PCA, and<br />
modeling and discriminating powers calculated by SIMCA. In order to obtain<br />
clearer representation of the results, the spots showing relevant discriminating<br />
and/or modeling power (and loadings as well) are represented on a virtual 2D-<br />
PAGE map. Each discriminating spot is represented as a circle on a virtual 2D<br />
map; the position of each spot is determined by its x–y coordinates identified by<br />
standard software packages (PDQuest in this case). The spots are represented on<br />
a color scale: darker red tones identify spots showing a larger discriminating or<br />
modeling power. The use of such representations in common software packages<br />
could represent a valid alternative to the standard visualization of loadings for<br />
each variable in the space given by two PCs at a time.<br />
Fujii et al. (49) studied the histological subtypes of lymphoid neoplasms:<br />
42 cell lines from human lymphoid neoplasms were included. The discriminating<br />
spots were selected by means of different methods used in sequence:<br />
(1) Wilcoxon or Kruskal–Wallis tests to find spots whose intensity was significantly<br />
(p < 0.05) different among the cell line groups, (2) statistical learning<br />
methods to prioritize the spots according to their contribution to the classification,<br />
and (3) unsupervised classification methods to validate classification<br />
robustness by the selected spots. Thirty-one spots proved to be significant, 24<br />
of which were identified by mass spectrometry.
Other applications are in the field of food quality (coupled to cluster analysis<br />
and discriminant analysis): several examples are present in the literature on<br />
cheese classification (50) and the identification of the protein content of wheat and<br />
bread (51,52).<br />
3.5.2. Discriminant Analysis–Partial Least Squares<br />
With regard to the application of DA-PLS methods, many papers have<br />
appeared in the last few years. Jessen et al. (53) demonstrated with two<br />
examples how information can be extracted from 2DE data by discrimination<br />
PLSR with variable selection. The time course of post mortem proteome<br />
changes in the muscle tissues of pigs was investigated. A first discriminant<br />
PLSR was performed on the spot volume dataset derived from usual analysis<br />
via dedicated software (Bioimage 2D Analyser, Genomic Solutions, USA), the<br />
independent response being a binary indicator of the individual pig considered<br />
or of the sampling time (increasing post mortem time). PLS proved<br />
successful in the identification of spots characterized by systematic<br />
variation. In order to identify only those spots showing relevant variation<br />
among the groups identified, a variable selection procedure was applied, and<br />
non-relevant spots were iteratively eliminated from the model: the final model<br />
chosen contained the minimum number of spots giving the best correlation with<br />
the response. Variable selection was performed by a jack-knifing procedure.<br />
Kleno et al. (54) applied PCA and PLS to the identification of the mechanism<br />
of action of hydrazine toxicity in rat liver samples. PCA was carried out on<br />
a data matrix of dimensions 30 × 431 (30 being the 2D maps: 5 animals × 3<br />
doses of hydrazine × 2 times after administration; 431 being the spots revealed<br />
on the maps). PC 1 was able to separate the samples according to three different<br />
dose levels, while PC 4 allowed the separation of the two times after the administration,<br />
but only for the largest dose level. The analysis of the loadings did<br />
not allow a clear identification of the most relevant discriminating spots, and<br />
so a PLSR was applied to model the Y variable (dose level of hydrazine). A<br />
variable selection according to jack-knifing was applied. The PLS regression<br />
identified spots that play an important role in the differentiation of<br />
samples according to the dose level administered. The results were compared<br />
to standard univariate t-tests, showing that some spots identified by PLS could<br />
not be identified as relevant by standard t-tests; this is due to the fact that PLS<br />
takes into account the correlation structure of the dataset.<br />
Kiaersgard et al. (55) studied the change in the proteomic profile of cod<br />
muscle samples during different storage conditions. Eleven storage conditions<br />
were taken into account, deriving from a large factorial design including storage<br />
temperature (two levels), storage period (4 levels), and chill storage period
(5 levels). Each sample was replicated twice, and the replicated samples were<br />
run on different batches. PCA provided a grouping of samples on the basis of<br />
frozen storage time, but no information emerged with respect to the differences<br />
between the samples according to the other two parameters. The study was<br />
refined through the application of DA-PLS with variable selection by a jackknife<br />
procedure, and it allowed the identification of relevant spots with respect<br />
to the differentiation of samples according to the storage time. The authors also<br />
addressed the optimal normalization of data before multivariate<br />
analysis. Autoscaling is the most widely used method for data normalization<br />
in proteomics, but it risks amplifying the noise; this is particularly<br />
true for proteomics, where experimental uncertainty is large. To avoid this<br />
problem, mean centering was applied to the data, and normalization was then<br />
applied by dividing each mean-centered value by (SD + B) (SD = standard<br />
deviation of each variable, B = constant term to be optimized). The authors<br />
identified the scale range of the B value (2500 in their case) by plotting in a<br />
scatter diagram the mean volume of each variable (spot) versus its standard<br />
deviation; the best value was then selected, among several values of<br />
B, as the one giving the best agreement between univariate and multivariate<br />
approaches.<br />
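The scaling by (SD + B) described above can be sketched in a few lines of NumPy.<br />
This is only an illustration under stated assumptions: the function name<br />
`damped_autoscale`, the use of the sample standard deviation (ddof = 1), and the<br />
synthetic spot volumes are hypothetical and are not taken from (55).<br />

```python
import numpy as np

def damped_autoscale(X, B):
    """Mean-center each column of X and divide it by (SD + B).

    X : (n_samples, n_spots) array of spot volumes.
    B : non-negative constant; B = 0 reduces to ordinary autoscaling.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)  # sample SD of each spot (ddof=1 is an assumption)
    return centered / (sd + B)

# Hypothetical data: 6 maps x 3 spots; the last spot is faint and mostly noise.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(50000, 8000, 6),  # intense, variable spot
    rng.normal(20000, 3000, 6),  # medium spot
    rng.normal(300, 50, 6),      # faint spot
])
auto = damped_autoscale(X, B=0.0)       # every spot gets unit variance
damped = damped_autoscale(X, B=2500.0)  # the faint spot is strongly down-weighted
```

With B = 0 the faint spot carries the same weight as the intense ones, which is the<br />
noise amplification mentioned in the text; with a B of the order of 2500 its scaled<br />
variance becomes negligible while intense spots keep a variance close to 1.<br />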
Gottfries et al. (56) applied both PCA and DA-PLS to the study of two<br />
different datasets: the first dataset consists of samples of cerebrospinal fluid<br />
from control individuals and individuals affected by different pathologies (12<br />
control, 15 with Alzheimer’s disease, 15 with Frontotemporal dementia, and 10<br />
with Parkinson’s disease), giving a final dataset of dimension 52 × 96 (96 spots<br />
identified on 52 maps). The second dataset consists of liver samples from normal<br />
and obese mice (samples were grouped into six groups comprising four to eight<br />
animals each); the final dataset has dimension 30 × 603 (30 being the samples,<br />
and 603 the spots identified). In both cases, the groups of samples present in<br />
each dataset could be separated by means of the first three PCs after the application<br />
of PCA. DA-PLS was then applied to each dataset in order to identify<br />
the spots responsible for the differences between each pair of groups: in all the<br />
cases the first latent variable computed was able to correctly classify the samples.<br />
In another application, Karp et al. (57) demonstrated the effectiveness of<br />
PLS-DA in the identification of the differences in three proteomic datasets;<br />
among them, a dataset in which no difference was expected between the two<br />
groups of samples considered was also included: in this case, as expected, PLS-<br />
DA provided no model. Finally, Norden et al. (58) applied PCA and DA-PLS<br />
to the identification of the differences between urine samples of smoking and<br />
non-smoking individuals.<br />
The great number of applications of PCA, PLS, and other multivariate tools<br />
in proteomics (31–59) gives a clear idea of the importance of multivariate
methods in this field; such techniques are in fact able to identify a larger number<br />
of variables (spots) relevant for discrimination between the classes of samples<br />
with respect to the classical t-tests usually carried out by standard software<br />
packages.<br />
4. Image Analysis<br />
The second approach to 2D-PAGE analysis is focused on the direct analysis<br />
of 2D maps images. This approach could present a fundamental advantage to<br />
proteomic data analysis: the elimination of contribution given by the operator,<br />
which is usually relevant when dedicated software packages for proteomic<br />
maps analysis are used. Several methods for direct 2D maps image analysis<br />
are reported in the literature, but they are not yet widespread enough to be included<br />
in common software packages; these methods mainly exploit artificial neural<br />
networks, fuzzy logic principles, and the calculation of mathematical moments.<br />
Such procedures represent the frontier in bioinformatics, and some of them<br />
are still under development. The main principles related to these methods will<br />
be presented here, together with a review of the most interesting applications<br />
present in the literature.<br />
4.1. Fuzzy Logic<br />
The low reproducibility of 2D gel-electrophoresis, pointed out earlier in this<br />
chapter, produces significant differences even among maps corresponding to<br />
replicates of the same electrophoretic run; these differences consist of changes<br />
in spot position, size, and shape. The precise description of the position of each<br />
spot in terms of x–y coordinates thus appears very difficult to accomplish. The<br />
uncertainty on the position and shape of each spot can be effectively treated<br />
by fuzzy logic principles. Marengo et al. (60,61,62,63,64) successfully applied<br />
fuzzy logic principles coupled to multivariate statistical tools to the analysis<br />
of sets of 2D maps.<br />
Their four-step procedure consists of:<br />
1. image digitalization;<br />
2. image defuzzyfication;<br />
3. image refuzzyfication;<br />
4. application of multivariate tools to fuzzy maps.<br />
4.1.1. Image Digitalization<br />
The first step consists of scanning each map by a densitometer to provide a<br />
description of the map as a grid of a given step containing in each cell the OD
ranging from 0 to 1. The contribution to the signal of each map given by the<br />
background is eliminated by applying a cut-off value to each map (generally<br />
0.3/0.4): the values below the cut-off value are transformed into null values.<br />
The cut-off value applied has to be optimized independently for each case<br />
study.<br />
4.1.2. Image Defuzzyfication<br />
The second step mainly performs defuzzyfication of each map, consisting<br />
of the elimination of sensitivity due to the destaining protocol. The digitalized<br />
image is, in fact, turned into a grid of binary values: 0 is assigned to the cell<br />
where no signal is detected, 1 to the cell where a value above the cut-off<br />
threshold is present.<br />
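The first two steps can be sketched as follows. This is a minimal illustration,<br />
assuming the map is already available as a NumPy array of optical densities in<br />
[0, 1]; the function names and the small example grid are hypothetical.<br />

```python
import numpy as np

def digitalize(od, cutoff=0.3):
    """Background removal: set every value below the cut-off to zero."""
    od = np.asarray(od, dtype=float)
    return np.where(od < cutoff, 0.0, od)

def defuzzify(digitalized):
    """Binary map: 1 where a signal survived the cut-off, 0 elsewhere."""
    return (np.asarray(digitalized) > 0).astype(np.uint8)

# Hypothetical 3 x 3 grid of optical densities.
od = np.array([[0.05, 0.10, 0.35],
               [0.20, 0.80, 0.60],
               [0.00, 0.29, 0.45]])
dig = digitalize(od, cutoff=0.3)  # grey-scale values, background zeroed
binary = defuzzify(dig)           # presence/absence only
```

Since the text states that the cut-off has to be optimized for each case study, the<br />
default of 0.3 used here is only a placeholder.<br />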
4.1.3. Refuzzyfication<br />
The previous step eliminates the information about spatial uncertainty as<br />
well, since each spot is no longer described by grey-scale values but only<br />
by binary values (presence/absence). This step therefore focuses on the reintroduction<br />
of information about spatial uncertainty. Each cell containing a 1<br />
value in step 2 is substituted by a 2D probability function. The most suitable<br />
distribution is a 2D Gaussian function. The probability of finding a signal in<br />
cell $(x_i, y_j)$ when a signal is already present in cell $(x_k, y_l)$ is given by:<br />
$$f(x_i, y_j, x_k, y_l) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_i - x_k)^2}{\sigma_x^2} + \frac{(y_j - y_l)^2}{\sigma_y^2}\right]\right\}$$<br />
where<br />
$\rho$ is the correlation between the 1st and 2nd dimension;<br />
$(x_i, y_j)$ is the position of the spot influencing the spot in position $(x_k, y_l)$;<br />
$\sigma_y$ is the standard deviation along the 1st dimension; and<br />
$\sigma_x$ is the standard deviation along the 2nd dimension.<br />
The correlation between the two dimensions ($\rho$) is usually fixed at 0, corresponding<br />
to the complete independence of the two electrophoretic runs; the two<br />
standard deviations, $\sigma_x$ and $\sigma_y$, correspond to the standard deviations of the<br />
2D Gaussian function along the x and y axes. Keeping them identical<br />
corresponds to an identical repeatability of the result with respect to the two<br />
electrophoretic runs (according to the pH gradient and molecular mass): in<br />
this case, the parameter that is analyzed for its effect on the final result is<br />
$\sigma = \sigma_x = \sigma_y$. Alternatively, the two parameters can be fixed at different values,<br />
usually $\sigma_x = 1.5\,\sigma_y$, corresponding to an uncertainty along the second dimension<br />
that is about 50% larger than that along the first dimension. The separation<br />
according to the molecular mass is in fact expected to show a larger uncertainty<br />
(self-made polymerization of the gel for the second run versus a first dimension<br />
run on commercial strips).<br />
A change in the parameter σ (or in the parameters $\sigma_x$ and $\sigma_y$) corresponds to a<br />
modification of the distance at which an occupied cell exerts its effect: large σ<br />
values result in a perturbation operating at larger distances. Smaller values<br />
correspond to a perturbation operating at a smaller distance, with spots exerting<br />
a lesser effect on their neighbourhood and a crisper final image. Therefore, the<br />
larger the σ parameter, the larger the fuzzyfication level applied to the maps.<br />
In general, the best results are expected for intermediate values of the parameters,<br />
corresponding to maps that are not too fuzzyfied (and final images that are not too blurred).<br />
With respect to the choice of probability function, the Gaussian distribution<br />
appeared to be the best alternative, since spots can be described as<br />
intensity/probability distributions with the highest intensity/probability value at<br />
the center of the spot and decreasing intensities/probabilities as the distance<br />
from the center increases. In addition, the integral of the Gaussian function over<br />
the whole domain of the 2D-PAGE is 1, corresponding to a total signal that is<br />
blurred but, at the same time, remains quantitatively coherent.<br />
The value of the signal $S_k$ in each cell $(x_i, y_j)$ of the fuzzy map is calculated<br />
as the sum of the effects of all neighbor cells $(x_{i'}, y_{j'})$ containing spots:<br />
$$S_k = \sum_{i',j'=1}^{n} f(x_i, y_j, x_{i'}, y_{j'})$$<br />
Even if the sum runs over all the cells in the grid, only the neighbor cells are<br />
influenced by the presence of a signal, depending on the σ parameter.<br />
The procedure consists of turning each digitalized image into a virtual map<br />
containing, in each cell, the sum of the influence of all the spots of the original<br />
2D-PAGE; these virtual maps can be called fuzzy matrices or fuzzy maps. Due<br />
to the existence of complex spots of irregular shape in real maps, the Gaussian<br />
function is associated to each cell instead of to each spot.<br />
Figure 8 represents an example of fuzzyfication of a map at different σ<br />
values; the example shows the digitalized and defuzzyfied maps and the fuzzyfication<br />
of the map for five increasing σ values.<br />
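The refuzzyfication step can be sketched as below for the usual case ρ = 0, where<br />
the 2D Gaussian factorizes into two 1D Gaussians and the sum over all occupied<br />
cells reduces to two matrix products. The function names, the convention that rows<br />
correspond to y and columns to x, and the one-spot test map are assumptions made<br />
for this illustration only.<br />

```python
import numpy as np

def gaussian_weights(n, sigma):
    """n x n matrix W with W[a, b] = exp(-(a-b)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    idx = np.arange(n)
    d = idx[:, None] - idx[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def fuzzify(binary_map, sigma_x, sigma_y=None):
    """Sum, in every cell, the 2D Gaussians centred on all occupied cells (rho = 0).

    Rows are treated as the y axis and columns as the x axis (an assumption).
    """
    if sigma_y is None:
        sigma_y = sigma_x  # single parameter sigma = sigma_x = sigma_y
    b = np.asarray(binary_map, dtype=float)
    wy = gaussian_weights(b.shape[0], sigma_y)  # spreads the signal along rows (y)
    wx = gaussian_weights(b.shape[1], sigma_x)  # spreads the signal along columns (x)
    return wy @ b @ wx.T

# Hypothetical defuzzyfied map with a single occupied cell at the centre.
binary = np.zeros((41, 41))
binary[20, 20] = 1.0
crisp = fuzzify(binary, sigma_x=0.5)                  # low fuzzyfication level
blurred = fuzzify(binary, sigma_x=2.0)                # higher fuzzyfication level
aniso = fuzzify(binary, sigma_x=3.0, sigma_y=2.0)     # sigma_x = 1.5 * sigma_y
```

On a discrete grid the total signal contributed by each occupied cell is only<br />
approximately 1; the approximation is very close for σ of about 1 or more and<br />
slightly worse for very small σ.<br />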
4.1.4. Application of Multivariate Tools to Fuzzy Maps<br />
The final fuzzy maps can then be analyzed by several multivariate tools<br />
for diagnostic/prognostic purposes. Two approaches will be presented here: (1)<br />
the coupling of PCA and classification tools; (2) the use of multi-dimensional<br />
scaling (MDS) techniques.
[Figure 8: seven panels, each with both axes ticked from 20 to 200. (A) Digitalised image; (B) De-fuzzyfied image; (C) σ = 0.50; (D) σ = 1.00; (E) σ = 1.50; (F) σ = 2.00; (G) σ = 2.50.]<br />
Fig. 8. Sample ILL1 from (61): digitalized image (A); defuzzyfied image (B);<br />
fuzzyfication at five σ values (C–G).
4.1.4.1. PCA and Classification Methods (61)<br />
Marengo et al. (61) have reported an application of PCA and LDA to the fuzzy<br />
maps of a set of eight 2D maps belonging to control and mantle cell lymphoma<br />
samples.<br />
Principal Component Analysis can be applied to images after<br />
unwrapping each image; each sample (map) is turned into a series of variables<br />
describing the signal in each position of the map. In this case, 200 × 200<br />
pixel images were taken into consideration, providing a final set of 40,000<br />
variables for each map. PCA is particularly useful here to detect a small number<br />
of components accounting for the differences existing between the groups of<br />
samples while, at the same time, reducing the dimensionality. The significant<br />
PCs calculated were used to build an LDA model to classify the samples;<br />
the selection of the variables for the LDA model, which discriminates between<br />
the classes present in the dataset, was performed by a stepwise algorithm in<br />
forward search ($F_{\text{to-enter}}$ = 4.0).<br />
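The unwrapping and PCA step can be sketched as follows. This is a minimal<br />
illustration using a singular value decomposition; pixel-wise mean centering, the<br />
function name `pca_unwrapped`, and the synthetic maps are assumptions, and the<br />
stepwise LDA of (61) is not reproduced here.<br />

```python
import numpy as np

def pca_unwrapped(maps, n_components):
    """PCA of a stack of images after unwrapping each one into a row vector.

    maps : (n_maps, height, width) array.
    Returns the scores (n_maps, n_components), the loadings reshaped back to
    image size (n_components, height, width), and the explained-variance ratios.
    """
    maps = np.asarray(maps, dtype=float)
    n, h, w = maps.shape
    X = maps.reshape(n, h * w)   # unwrapping: one row of h*w variables per map
    Xc = X - X.mean(axis=0)      # pixel-wise mean centering (an assumption)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].reshape(n_components, h, w)
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Hypothetical data: eight 200 x 200 maps, four per class, each class with
# one class-specific region of signal on top of weak noise.
rng = np.random.default_rng(1)
maps = rng.random((8, 200, 200)) * 0.01
maps[:4, 50:60, 50:60] += 1.0
maps[4:, 120:130, 150:160] += 1.0
scores, loadings, explained = pca_unwrapped(maps, n_components=2)
```

Reshaping each loading vector back to 200 × 200 gives the kind of virtual loading<br />
map that is discussed for Figure 9.<br />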
The procedure was repeated for different values of the σ parameter in order<br />
to detect the best value providing correct classification of the samples with<br />
the smallest number of components in the final LDA model. The best results<br />
(100% of correct assignments) were obtained for σ values ranging from 1.75<br />
to 2.25, with PC 1 and PC 4 in the final LDA model. The differences existing<br />
between the two groups of samples could then be investigated by the analysis<br />
of loadings on the first and the fourth PCs.<br />
Figure 9 shows the score plot and the loading plot of PC 1 and PC 4 for<br />
σ = 2.00. The loadings are represented again on a virtual map on a color scale:<br />
white tones correspond to the zones in the map characterized by large positive<br />
loadings and the black tones to the zones characterized by large negative<br />
loadings on the corresponding PC.<br />
4.1.4.2. Multi-Dimensional Scaling<br />
In other applications of multivariate tools to fuzzy maps, Marengo et al.<br />
(62,63) describe the use of MDS procedures. MDS performs a substantial<br />
dimensionality reduction and an effective graphical representation of the data<br />
on the basis of the similarity calculated between pairs of objects. MDS searches<br />
for the smallest number of dimensions in which the objects can be represented<br />
as points, matching, as much as possible, the distances between the objects<br />
in the new reference system with those calculated in the original reference<br />
system. In these applications, the calculations were performed by the Kruskal<br />
iterative method; the search for the coordinates was based on the steepest<br />
descent minimization algorithm, where the target function is the so-called stress<br />
(S), which is a measure of the ability of the configuration of points to simulate<br />
the original distance matrix.
[Figure 9: (A) score plot of PC4 versus PC1 for σ = 2.00, showing samples HEA1–HEA4 and ILL1–ILL4; (B) loading maps titled "Loadings PC1" and "Loadings PC4", each with both axes ticked from 20 to 200 and its own colour scale.]<br />
Fig. 9. Score plot (A) and loading plots (B) of PC1 and PC4 with σ = 2.00.<br />
As for the previous applications based on PCA and LDA, several values of the σ<br />
parameter have been investigated, and the one providing the best classification<br />
was selected. In this case, for each value of the σ parameter, a similarity matrix<br />
has to be built.<br />
From the match between two fuzzy maps k and l, the common signal<br />
$SC_{kl}$ (the sum of all signals present in both maps) and the total signal $ST_{kl}$ can<br />
be computed:<br />
$$SC_{kl} = \sum_{i=1}^{n} \min\left(S_i^k, S_i^l\right)$$<br />
$$ST_{kl} = \sum_{i=1}^{n} \max\left(S_i^k, S_i^l\right)$$<br />
where n is the number of cells in the grid. The similarity index is then<br />
computed by:<br />
$$S_{kl} = \frac{SC_{kl}}{ST_{kl}}$$<br />
$S_{kl}$ ranges from 0 (two maps showing no common structure) to 1 (two identical<br />
maps). In both applications, the optimal σ values that provide the best classification<br />
of the samples with only one or two dimensions could be identified.<br />
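The similarity index defined above can be sketched as follows for a list of<br />
fuzzy maps of equal shape. The function names and the tiny example maps are<br />
hypothetical; the subsequent MDS step is not shown.<br />

```python
import numpy as np

def similarity_index(map_k, map_l):
    """S_kl = sum(min(S_k, S_l)) / sum(max(S_k, S_l)) over all cells."""
    sc = np.minimum(map_k, map_l).sum()  # common signal SC_kl
    st = np.maximum(map_k, map_l).sum()  # total signal ST_kl
    return sc / st                       # undefined if both maps are empty

def similarity_matrix(fuzzy_maps):
    """Symmetric matrix of similarity indices with ones on the diagonal."""
    n = len(fuzzy_maps)
    S = np.eye(n)
    for k in range(n):
        for l in range(k + 1, n):
            S[k, l] = S[l, k] = similarity_index(fuzzy_maps[k], fuzzy_maps[l])
    return S

# Hypothetical 2 x 2 "maps" chosen so that the indices are easy to check by hand.
a = np.array([[0.0, 1.0], [2.0, 0.0]])
b = np.array([[0.0, 1.0], [1.0, 1.0]])
c = np.array([[3.0, 0.0], [0.0, 0.0]])
S = similarity_matrix([a, b, c])
```

For maps a and b the common signal is 2 and the total signal is 4, giving an index<br />
of 0.5; map c shares no occupied cell with the others, giving an index of 0.<br />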
4.2. Moment Functions<br />
Moment functions have been widely used in image analysis, in applications<br />
related to invariant pattern recognition, object classification, pose estimation,<br />
image coding, and reconstruction (65,66,67,68,69). A set of moments computed<br />
from a digital image generally represents global characteristics of the image<br />
shape, and provides a lot of information about different types of geometrical<br />
features of the image. Geometric moments were the first ones to be applied to<br />
images, as they are computationally very simple. With the progress of research<br />
in image processing, many new types of moment functions have been introduced<br />
recently, such as orthogonal moments, rotational moments, and complex<br />
moments, which are useful tools in the field of pattern recognition, and can be<br />
used to describe the features of objects such as shape, area, border, location,<br />
and orientation; naturally each moment function has its own advantages in<br />
specific applications.<br />
The most important and most used moments are orthogonal moments (e.g.,<br />
Legendre (70,71,72) and Zernike moments (73,74,75)), which can attain a<br />
zero value of redundancy measure in a set of moment functions, so that<br />
these orthogonal moments correspond to the independent characteristics of the<br />
image. In other words, moments with orthogonal basis functions can be used<br />
to represent the image by a set of mutually independent descriptors, with a<br />
minimum amount of information redundancy. Moreover, orthogonal moments have<br />
the additional property of being more robust than non-orthogonal<br />
ones in the presence of image noise. Orthogonal moments also permit analytical<br />
reconstruction of an image intensity function from a finite set of moments,<br />
using the inverse moment transform.<br />
Legendre moments are the most used orthogonal moments and can be implemented<br />
as feature descriptors for 2D-PAGE maps classification.<br />
The main advantage of using Legendre moments to cluster the<br />
maps derives from the possibility of obtaining invariance to translation, scale, and<br />
rotation; in other words, the original maps, without any pre-treatment, can be<br />
used for classification, and the use of complex commercial software can be<br />
totally avoided.
The number of calculated moments is very large, and many of them do not<br />
contain information related to the specific target of correctly classifying the<br />
2D-PAGE maps; for this reason a method for selecting the moments having<br />
highest DP must be applied (e.g., LDA).<br />
4.2.1. Legendre Moments<br />
The Legendre polynomials form a complete orthogonal set on the interval<br />
[−1, 1]. Moments with Legendre polynomials as kernel functions were first<br />
introduced by Teague (68).<br />
The kernels of Legendre moments are products of Legendre polynomials<br />
defined along the rectangular image coordinate axes, inside the square [−1, 1] × [−1, 1].<br />
The two-dimensional Legendre moments of order (p + q) of an image<br />
intensity map f(x, y) are defined as:<br />
$$L_{pq} = \frac{(2p+1)(2q+1)}{4} \int_{-1}^{1}\int_{-1}^{1} P_p(x)\, P_q(y)\, f(x, y)\, dx\, dy, \qquad x, y \in [-1, 1]$$<br />
where the Legendre polynomial $P_p(x)$ of order p is given by:<br />
$$P_p(x) = \sum_{\substack{k=0 \\ p-k\ \text{even}}}^{p} (-1)^{\frac{p-k}{2}}\, \frac{1}{2^p}\, \frac{(p+k)!\, x^k}{\left(\frac{p-k}{2}\right)!\, \left(\frac{p+k}{2}\right)!\, k!}$$<br />
The recurrence relation of Legendre polynomials, $P_p(x)$, is:<br />
$$P_p(x) = \frac{(2p-1)\, x\, P_{p-1}(x) - (p-1)\, P_{p-2}(x)}{p}$$<br />
where $P_0(x) = 1$, $P_1(x) = x$, and p > 1. Since the region of definition of Legendre<br />
polynomials is the interior of [–1,1], a square image of N × N pixels with<br />
intensity function f(i, j), 0 ≤ i, j ≤ (N – 1), is scaled in the region –1 < x, y < 1.
The reconstruction of the image function from the calculated moments can be<br />
performed by the following inverse transformation:<br />
$$f(i, j) = \sum_{p=0}^{p_{\max}} \sum_{q=0}^{q_{\max}} L_{pq}\, P_p(x_i)\, P_q(y_j)$$<br />
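The moment computation and the inverse transformation can be sketched as follows.<br />
This is an illustration under stated assumptions, not the implementation used in (76):<br />
the integral is approximated by a midpoint-rule sum over pixel centres, the first<br />
image index is mapped to x and the second to y, and the function names and the test<br />
image are hypothetical. The polynomials are evaluated with NumPy's `legvander`.<br />

```python
import numpy as np
from numpy.polynomial import legendre

def pixel_centres(n):
    """Map pixel indices 0..n-1 to the centres of n equal cells inside (-1, 1)."""
    return -1.0 + (2.0 * np.arange(n) + 1.0) / n

def legendre_moments(image, max_order):
    """Approximate L_pq for p, q = 0..max_order by a midpoint-rule sum."""
    f = np.asarray(image, dtype=float)
    n_i, n_j = f.shape
    Px = legendre.legvander(pixel_centres(n_i), max_order)  # Px[i, p] = P_p(x_i)
    Py = legendre.legvander(pixel_centres(n_j), max_order)  # Py[j, q] = P_q(y_j)
    orders = 2.0 * np.arange(max_order + 1) + 1.0
    norm = np.outer(orders, orders) / 4.0                   # (2p+1)(2q+1)/4
    cell_area = (2.0 / n_i) * (2.0 / n_j)                   # dx * dy
    return norm * (Px.T @ f @ Py) * cell_area

def reconstruct(moments, shape):
    """Inverse transform: f(i, j) = sum_pq L_pq P_p(x_i) P_q(y_j)."""
    max_order = moments.shape[0] - 1
    Px = legendre.legvander(pixel_centres(shape[0]), max_order)
    Py = legendre.legvander(pixel_centres(shape[1]), max_order)
    return Px @ moments @ Py.T

# Hypothetical smooth test image f(x, y) = 1 + x*y on a 200 x 200 grid.
x = pixel_centres(200)
image = 1.0 + np.outer(x, x)
L = legendre_moments(image, max_order=2)
approx = reconstruct(L, image.shape)
```

Computing all moments with p and q from 0 to 100 gives a 101 × 101 array, i.e. the<br />
10,201 variables per map mentioned below. The midpoint rule is an approximation<br />
that becomes less accurate as the order approaches the image resolution.<br />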
Marengo et al. (76) report an interesting application of Legendre moments<br />
to a set of 2D-PAGE maps belonging to two different cell lines of control<br />
(untreated) and drug-treated pancreatic ductal carcinoma cells.<br />
The aim of the work was to obtain the correct classification of the 18 samples<br />
using the Legendre moments as discriminant variables.<br />
Each 2D-PAGE, which was automatically digitalized, was described by a<br />
200×200 matrix of pixels; the value of each pixel varies from 0 to 1 to indicate<br />
the staining intensity in the given position.<br />
The Legendre moments of the 18 digitalized images were calculated.<br />
Moments up to a maximum order of 100 were computed from the images. Each<br />
matrix held the global information of the corresponding 2D-PAGE map.<br />
The final dataset contained 18 samples and 10,201 variables. The number<br />
of variables was very large, and many of them were either redundant or did<br />
not contain information related to the specific target of correctly classifying<br />
the samples; for this reason a method for selecting the variables having the<br />
highest power of discrimination was applied (forward stepwise LDA with<br />
$F_{\text{to-enter}}$ = 4.0). The results of the stepwise LDA procedure showed that only six<br />
different Legendre moments were necessary in order to correctly classify the<br />
18 samples.<br />
The results demonstrate that the Legendre moments can be successfully<br />
applied for fast classification and similarity analysis of 2D-PAGE maps.<br />
4.3. Other Methods<br />
Schultz et al. (77), together with the application of PCA and PLS to spot<br />
volume data, applied PCA to the analysis of gel images after digitalization and<br />
unwrapping. The choice of the alignment procedure for the sets of gels proved<br />
to be decisive for the final result. PCA proved to be effective in the<br />
identification of the groups of maps present.<br />
Marengo et al. (78) also applied three-way PCA to the identification of<br />
the differences among groups of 2D maps. Proteomic datasets are suitable<br />
for treatment by three-way methods due to their three-way structure: the first<br />
dimension being the pH gradient, the second the molecular mass, and the third<br />
the samples. In three-way PCA, the observed modes (conventionally called I,<br />
J, and K) can be synthesized in more fundamental modes, each element of a<br />
reduced mode expressing a particular structure existing between all or a part
of the elements of the associated observation mode. The final result is given<br />
by three sets of loadings together with a core array describing the relationship<br />
among them. Each of the three sets of loadings can be displayed and interpreted<br />
in the same way as a score plot of standard PCA. Three-way PCA was preceded<br />
by data transformation to scale all the samples and make them comparable;<br />
to this purpose, maximum scaling was selected and the digitalized 2D PAGE<br />
maps were scaled one at a time to the maximum value for each map. This<br />
method was successfully applied to datasets of human lymph-nodes and rat<br />
sera allowing the identification of the main differences existing among the sets<br />
of 2D maps.<br />
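The maximum scaling applied before three-way PCA can be sketched as below. The<br />
arrangement of the stack as (samples, pH gradient, molecular mass), the function<br />
name, and the random example data are assumptions made for illustration; the<br />
three-way decomposition itself is not shown.<br />

```python
import numpy as np

def maximum_scale(maps):
    """Divide each map in a (K, I, J) stack by its own maximum value."""
    maps = np.asarray(maps, dtype=float)
    peak = maps.max(axis=(1, 2), keepdims=True)  # one maximum per map
    return maps / peak                           # undefined for an all-zero map

# Hypothetical stack of three digitalized maps with different overall intensity.
rng = np.random.default_rng(2)
stack = rng.random((3, 50, 50)) * np.array([0.4, 0.7, 1.0])[:, None, None]
scaled = maximum_scale(stack)
```

After scaling, every map has a maximum of exactly 1, so that maps with different<br />
overall staining intensity become comparable.<br />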
References<br />
1. Mahon, P., Dupree, P., (2001) Quantitative and reproducible two-dimensional gel<br />
analysis using Phoretix 2D Full, Electrophoresis 22, 2075–2085<br />
2. Rubinfeld, A., Keren-Lehrer, T., Hadas, G., Smilansky, Z., (2003) Hierarchical<br />
analysis of large-scale two-dimensional gel-electrophoresis experiments,<br />
Proteomics 3, 1930–1935<br />
3. Anderson, N.L., Taylor, J., Scandora, A.E., Coulter, B.P., Anderson, N.G., (1981)<br />
The TYCHO system for computer analysis of two-dimensional gel electrophoresis<br />
patterns, Clinical Chemistry 27 (11), 1807–1820<br />
4. Rosengren, A.T., Salmi, J.M., Aittokallio, T., Westerholm, J., Lahesmaa, R.,<br />
Nyman, T.A., Nevalainen, O.S., (2003) Comparison of PDQuest and Progenesis<br />
software packages in the analysis of two dimensional electrophoresis gels,<br />
Proteomics 3, 1936–1946<br />
5. Raman, B., Cheung, A., Marten, M.R., (2002) Quantitative comparison and<br />
evaluation of two commercially available, two-dimensional electrophoresis image<br />
analysis software packages, Z3 and Melanie, Electrophoresis 23, 2194–2202<br />
6. Panek, J., Vohradsky, J., (1999) Point pattern matching in the analysis of two-dimensional<br />
gel electropherograms, Electrophoresis 20, 3483–3491<br />
7. Pleissner, K.P., Hoffman, F., Kriegel, K., Wenk, C., Wegner, S., Sahistrom, A.,<br />
Oswald, H., Alt, H., Fleck, E., (1999) New algorithmic approaches to protein spot<br />
detection and pattern matching in two-dimensional electrophoresis gel databases,<br />
Electrophoresis 20, 755–765<br />
8. Voss, T., Haberl, P., (2000) Observations on the reproducibility and matching<br />
efficiency of two-dimensional electrophoresis gels: consequences for comprehensive<br />
data analysis, Electrophoresis 21, 3345–3350<br />
9. Cutler, P., Heald, G., White, I.R., Ruan, J., (2003) A novel approach to<br />
spot detection for two-dimensional gel electrophoresis images using pixel value<br />
collection, Proteomics 3, 392–401<br />
10. Molloy, M.P., Brzezinski, E.E., Hang, J., McDowell, M.T., VanBogelen, R.A.,<br />
(2003) Overcoming technical variation and biological variation in quantitative<br />
proteomics, Proteomics 3, 1912–1919
11. Moritz, B., Meyer, H.E., (2003) Approaches for the quantification of protein<br />
concentration ratios, Proteomics 3, 2208–2220<br />
12. Wheelock, A.M., Buckpitt, A.R., (2005) Software-induced variance in two-dimensional<br />
gel electrophoresis image analysis, Electrophoresis 26, 4508–4520<br />
13. Almeida, J.S., Stanislaus, R., Krug, E., Arthur, J.M., (2005) Normalisation and<br />
analysis of residual variation in two-dimensional gel electrophoresis for quantitative<br />
differential proteomics, Proteomics 5, 1242–1249<br />
14. Pietrogrande, M.C., Marchetti, N., Dondi, F., Righetti, P.G., (2003) Spot<br />
overlapping in two-dimensional polyacrylamide gel electrophoresis maps:<br />
relevance to proteomics, Electrophoresis 24, 217–224<br />
15. Pietrogrande, M.C., Marchetti, N., Dondi, F., Righetti, P.G., (2002) Spot<br />
overlapping in two-dimensional polyacrylamide gel electrophoresis separations: a<br />
statistical study of complex protein maps, Electrophoresis 23, 283–291<br />
16. Campostrini, N., Areces, L.B., Rappsilber, J., Pietrogrande, M.C., Dondi, F.,<br />
Pastorino, F., Ponzoni, M., Righetti, P.G., (2005) Spot overlapping in two-dimensional<br />
maps: a serious problem ignored for much too long, Proteomics 5,<br />
2385–2395<br />
17. Garrels, J.I., (1979) Two dimensional gel electrophoresis and computer analysis of<br />
proteins synthesized by clonal cell lines, J. Biol. Chem. 254, 7961–7977<br />
18. Garrels, J.I., Farrar, J.T., Burwell IV, C.B., (1984) In: Celis, J.E., Bravo, R. (Eds.),<br />
Two-dimensional Gel Electrophoresis of Proteins, Academic Press, Orlando, FL,<br />
USA, pp. 38–91<br />
19. Garrels, J.I., (1989) The QUEST system for quantitative analysis of two-dimensional<br />
gels, J. Biol. Chem. 264, 5269–5282<br />
20. Massart, D.L., Vandeginste, B.G.M., Deming, S.M., Michotte, Y., Kaufman, L.,<br />
(1988) Chemometrics: A Textbook. Amsterdam, Elsevier<br />
21. Vandeginste, B.G.M., Massart, D.L., Buydens, L.M.C., De Jong, S., Lewi, P.J.,<br />
Smeyers-Verbeke, J., (1998) Handbook of Chemometrics and Qualimetrics: Part B.<br />
Amsterdam, Elsevier<br />
22. Marengo, E., Robotti, E., Righetti, P.G., Campostrini, N., Pascali, J., Ponzoni, M.,<br />
(2004) Study of Proteomic changes associated with healthy and tumoral murine<br />
samples in Neuroblastoma by Principal Component Analysis and classification<br />
methods, Clinica Chimica Acta 345, 55–67<br />
23. Marengo, E., Robotti, E., Bobba, M., Liparota, M.C., Antonucci, F., Rustichelli, C.,<br />
Zamò, A., Chilosi, M., Hamdan, M., Righetti, P.G., (2006) Characterisation of<br />
the proteomic profiles of two human lymphoma cell lines by two-dimensional<br />
gel-electrophoresis and multivariate statistical tools, Electrophoresis 27,<br />
484–494<br />
24. Massart, D.L., Kaufman, L., (1983) In: Elving, P.J., Winefordner, J.D. (Eds.), The<br />
Interpretation of Analytical Chemical Data by the Use of Cluster Analysis. Wiley,<br />
New York, USA<br />
25. Eisenbeis, R.A. (Ed.), (1972) Discriminant Analysis and Classification Procedures:<br />
Theory and Applications. Lexington, USA
26. Klecka, W.R. (Ed.), (1980) Discriminant Analysis. Sage Publications, Beverly<br />
Hills, USA<br />
27. Wold, S., (1976) Pattern recognition by means of disjoint principal components<br />
models, Pattern Recognition 8, 127–139<br />
28. Martens, H., Naes, T., (1989) Multivariate Calibration, Wiley, London<br />
29. Kleinbaum, D., Kupper, L., Muller, K., (1988) Applied Regression Analysis and<br />
Other Multivariate Methods, 2nd ed.. Pws-Kent, Boston<br />
30. De Noord, O.E., (1994) Multivariate calibration standardization, Chemometr. Intell.<br />
Lab. Syst. 25, 85–97<br />
31. Anderson, N.L., Hofmann, J.P., Gemmell, A., Taylor, J., (1984) Global approaches<br />
to quantitative analysis of gene-expression patterns observed by use of two-dimensional<br />
gel electrophoresis, Clin Chem. 30, 2031–2036<br />
32. Tarroux, P., Vincens, P., Rabilloud, T., (1987) HERMeS: a second generation<br />
approach to the automatic analysis of two-dimensional electrophoresis gels. Part<br />
V: Data analysis, Electrophoresis 8, 187–199<br />
33. Couto, M.M.B., Vogels, J.T.W.E., Hofstra, H., Husiintveld, J.H.J., Vandervossen,<br />
J.M.B.M., (1995) Random amplified polymorphic DNA and restriction<br />
enzyme analysis of PCR amplified RDNA in taxonomy, 2 Identification techniques<br />
for food-borne yeasts, J. Applied Bacteriology 79 (5), 525–535<br />
34. Johansson, M.L., Quednau, M., Ahrne, S., Molin, G., (1995) Classification of<br />
lactobacillus-plantarum by restriction-endonuclease analysis of total chromosomal<br />
DNA using conventional agarose-gel electrophoresis, International J. of Systematic<br />
Bacteriology 45 (4), 670–675<br />
35. Boon, N., De Windt, W., Verstraete, W., Top, E.M., (2002) Evaluation of nested<br />
PCR-DGGE (denaturing gradient gel electrophoresis) with group-specific 16S<br />
rRNA primers for the analysis of bacterial communities from different wastewater<br />
treatment plants, FEMS Microbiology Ecology 39 (2), 101–112<br />
36. Gadea, I., Ayala, G., Diago, M.T., Cunat, A., Garcia de Lomas J., (2000) Immunological<br />
diagnosis of human hydatid cyst relapse: utility of the enzyme-linked<br />
immunoelectrotransfer blot and discriminant analysis, Clinical and Diagnostic<br />
Laboratory Immunology 7 (4), 549–552<br />
37. Gadea, I., Ayala, G., Diago, M.T., Cunat, A., Garcia de Lomas, J., (1999) Immunological<br />
diagnosis of human cystic echinococcosis: utility of discriminant analysis<br />
applied to the enzyme-linked immunoelectrotransfer blot, Clinical and Diagnostic<br />
Laboratory Immunology 6 (4), 504–508<br />
38. Kovarova, H., Hajduch, M., Korinkova, G., Halada, P., Krupickova, S.,<br />
Gouldsworthy, A., Zhelev, N., Strnad, M., (2000) Proteomics approach in classifying<br />
the biochemical basis of the anticancer activity of the new olomoucine-derived<br />
synthetic cyclin-dependent kinase inhibitor, bohemine, Electrophoresis 21,<br />
3757–3764<br />
39. Kovarova, H., Radzioch, D., Hajduch, M., Sirova, M., Blaha, V., Macela, A.,<br />
Stulik, J., Hernychova, L., (1998) Natural resistance to intracellular parasites: a<br />
study by two-dimensional gel electrophoresis coupled with multivariate analysis,<br />
Electrophoresis 19 (8–9), 1325–1331
40. De Moor, B., Marchal, K., Mathys, J., Moreau, Y., (2003) Bioinformatics:<br />
organisms from Venus, technology from Jupiter, algorithms from Mars, European<br />
Journal of Control 9 (2–3), 237–278<br />
41. Iwadate, Y., Sakaida, T., Hiwasa, T., Nagai, Y., Ishikura, H., Takiguchi, M.,<br />
Yamaura, A., (2004) Molecular classification and survival prediction in human<br />
gliomas based on proteome analysis, Cancer Research 64 (7), 2496–2501<br />
42. Amin, R.A., Vickers, A.E., Sistare, F., Thompson, K.L., Roman, R.J.,<br />
Lawton, M., Kramer, J., Hamadeh, H.K., Collins, J., Grissom, S., Bennett, L.,<br />
Tucker, C.J., Wild, S., Kind, C., Oreffo, V., Davis, J.W., Curtiss, S., Naciff, J.M.,<br />
Cunningham, M., Tennant, R., Stevens, J., Car, B., Bertram, T.A., Afsharil, C.A.,<br />
(2004) Identification of putative gene-based markers of renal toxicity, Environmental<br />
Health Perspectives 112 (4), 465–479<br />
43. Heijne, W.H.M., Stierum, R.H., Slijper, M., van Bladeren, P.J., van Ommen, B.,<br />
(2003) Toxicogenomics of bromobenzene hepatotoxicity: a combined transcriptomics<br />
and proteomics approach, Biochemical Pharmacology 65 (5), 857–875<br />
44. Anderson, N.L., EsquerBlasco, R., Richardson, F., Foxworthy, P., Eacho, P., (1996)<br />
The effects of peroxisome proliferators on protein abundances in mouse liver,<br />
Toxicology and Applied Pharmacology 137 (1), 75–89<br />
45. Perrot, F., Hebraud, M., Charlionet, R., Junter, G.A., Jouenne, T., (2001) Cell<br />
immobilisation induces changes in the protein response of Escherichia coli K-12<br />
to a cold shock, Electrophoresis 22, 2110–2119<br />
46. Verhoeckx, K.C.M., Bijlsma, S., de Groene, E.M., Witkamp, R.F., van der Greef, J.,<br />
Rodenburg, R.J.T., (2004) A combination of proteomics, principal component<br />
analysis and transcriptomics is a powerful tool for the identification of biomarkers<br />
for macrophage maturation in the U937 cell line, Proteomics 4 (4), 1014–1028<br />
47. Verhoeckx, K.C.M., Bijlsma, S., Jespersen, S., Ramaker, R., Verheij, E.R.,<br />
Witkamp, R.F., van der Greef, J., Rodenburg, R.J.T., (2004) Characterization<br />
of anti-inflammatory compounds using transcriptomics, proteomics, and<br />
metabolomics in combination with multivariate data analysis, International<br />
Immunopharmacology 4 (12), 1499–1514<br />
48. Marengo, E., Robotti, E., Cecconi, D., Scarpa, A., Righetti, P.G., (2004) Identification<br />
of the regulatory proteins in human pancreatic cancers treated with<br />
Trichostatin-A by 2D-PAGE maps and Multivariate Statistical Analysis, Analytical<br />
and Bioanalytical Chemistry 379 (7–8), 992–1003<br />
49. Fujii, K., Kondo, T., Yokoo, H., Yamada, T., Matsuno, Y., Iwatsuki, K.,<br />
Hirohashi, S., (2005) Protein expression pattern distinguishes different lymphoid<br />
neoplasms, Proteomics 5, 4274–4286<br />
50. Dewettinck, K., Dierckx, S., Eichwalder, P., Huyghebaert, A., (1997) Comparison<br />
of SDS-PAGE profiles of four Belgian cheeses by multivariate statistics, Lait 77<br />
(1), 77–89<br />
51. Alika, J.E., AkenOva, M.E., Fatokun, C.A., (1995) Variation among maize (Zea<br />
mays L) accessions of Bendel State, Nigeria – numerical analysis of zein protein<br />
band patterns, Genetic Resources and Crop Evolution 42 (4), 393–399
52. Magdic, D., Horvat, D., Jurkovic, Z., Sudar, R., Kurtanjek, K., (2002) Chemometric<br />
analysis of high molecular mass glutenin subunits and image data of bread crumb<br />
structure from Croatian wheat cultivars, Food Technology and Biotechnology 40<br />
(4), 331–341<br />
53. Jessen, F., Lametsch, R., Bendixen, E., Kjaersgard, I.V.H., Jorgensen, B.M., (2002)<br />
Extracting information from two-dimensional electrophoresis gels by partial least<br />
squares regression, Proteomics 2, 32–35<br />
54. Kleno, T.G., Leonardsen, L.R., Kjeldal, H.O., Laursen, S.M., Jensen, O.N.,<br />
Baunsgaard, D., (2004) Mechanisms of hydrazine toxicity in rat liver investigated<br />
by proteomics and multivariate data analysis, Proteomics 4, 868–880<br />
55. Kjaersgard, I.V.H., Norrelykke, M.R., Jessen, F., (2006) Changes in cod muscle<br />
proteins during frozen storage revealed by proteome analysis and multivariate data<br />
analysis, Proteomics 6, 1606–1618<br />
56. Gottfries, J., Sjogren, M., Holmberg, B., Rosengren, L., Davidsson, P.,<br />
Blennow, K., (2004) Proteomics for drug target discovery, Chemometrics and<br />
Intelligent Laboratory Systems 73, 47–53<br />
57. Karp, N.A., Griffin, J.L., Lilley, K.S., (2005) Application of partial least squares<br />
discriminant analysis to two-dimensional difference gel studies in expression<br />
proteomics, Proteomics 5, 81–90<br />
58. Norden, B., Broberg, P., Lindberg, C., Plymoth A., (2005) Analysis and understanding<br />
of high-dimensionality data by means of multivariate data analysis,<br />
Chemistry and Biodiversity 2 (11), 1487–1494<br />
59. Malone, J., McGarry, K., Bowermann, C., (2006) Automated trend analysis of<br />
proteomics data using an intelligent data mining architecture, Expert Systems with<br />
Applications 30, 24–33<br />
60. Marengo, E., Robotti, E., Gianotti, V., Righetti P.G., (2003) A new approach to<br />
the statistical treatment of 2D-Pages in proteomics using fuzzy logic, Annali di<br />
Chimica 93 (1–2), 105–116<br />
61. Marengo, E., Robotti, E., Righetti, P.G., Antonucci, F., (2003) A new approach<br />
based on fuzzy logic and principal component analysis for the classification of 2D-maps<br />
in health and disease: application to lymphomas, Journal of Chromatography<br />
A 1004, 13–28<br />
62. Marengo, E., Robotti, E., Gianotti, V., Righetti, P.G., Domenici, E., Cecconi, D.,<br />
(2003) A new integrated statistical approach to the diagnostic use of proteomic<br />
two-dimensional maps, Electrophoresis 24, 225–236<br />
63. Marengo, E., Robotti, E., Cecconi, D., Scarpa, A., Righetti, P.G., (2004) Application<br />
of fuzzy logic principles to the classification of 2D-PAGE maps belonging to<br />
human pancreatic cancers treated with Trichostatin-A, Proceedings of 2004 IEEE<br />
International Conference on Fuzzy Systems, Budapest, Hungary, 25–29 July 2004,<br />
1, 359–364<br />
64. Marengo, E., Robotti, E., Antonucci, F., Cecconi, D., Campostrini, N.,<br />
Righetti, P.G., (2005) Spot matching in two-dimensional gels: a review of<br />
commercial software and of “home-made” approaches, Proteomics 5, 654–666
65. Zenkouar, H., Nachit, A., (1997) Images compression using moments method of<br />
orthogonal polynomials, Materials Science and Engineering B 49, 211–215<br />
66. Yin, J., Rodolfo De Pierro, A., Wei, M., (2002) Analysis for the reconstruction of a<br />
noisy signal based on orthogonal moments, Applied Mathematics and Computation<br />
132, 249–263<br />
67. Hu, M.K., (1962) Visual pattern recognition by moment invariants, IRE Transaction<br />
on Information Theory 8, 179–187<br />
68. Teague, M.R., (1980) Image analysis via the general theory of moments, Journal<br />
of the Optical Society of America 70, 920–930<br />
69. Li, B.C., Shen, J., (1991) Fast computation of moment invariants, Pattern Recognition<br />
24, 807–813<br />
70. Chong, C., Raveendran, P., Mukundan, R., (2004) Translation and scale invariants<br />
of Legendre moments, Pattern Recognition 37, 119–129<br />
71. Mukundan, R., Ramakrishnan, K.R., (1995) Fast computation of Legendre and<br />
Zernike moments, Pattern Recognition 28, 1433–1442<br />
72. Zhou, J.D., Shu, H.Z., Luo, L.M., Yu, W.X., (2002) Two new algorithms for<br />
efficient computation of Legendre moments, Pattern Recognition 35, 1143–1152<br />
73. Wee, C., Paramesran, R., Takeda, F., (2004) New computational methods for full<br />
and subset Zernike moments, Information Sciences 159, 203–220<br />
74. Kan, C., Srinath, M.D., (2002) Invariant character recognition with Zernike and<br />
orthogonal Fourier-Mellin moments, Pattern Recognition 35, 143–154<br />
75. Khotanzad, A., Hong, Y.H., (1990) Invariant image recognition by Zernike<br />
moments, IEEE Transactions on Pattern Analysis and Machine Intelligence 12,<br />
489–497<br />
76. Marengo, E., Bobba, M., Robotti, E., Liparota, M.C., (2005) Use of Legendre<br />
moments for the fast comparison of 2D-PAGE maps images, Journal of<br />
Chromatography A 1096 (1–2), 86–91<br />
77. Marengo, E., Leardi, R., Robotti, E., Righetti, P.G., Antonucci, F., Cecconi, D.,<br />
(2003) Application of three-way principal component analysis to the evaluation<br />
of two-dimensional maps in proteomics, Journal of Proteome Research 2 (4),<br />
351–360<br />
78. Schultz, J., Gottlieb, D.M., Petersen, M., Nesic, L., Jacobsen, S., Sondergaard, I.,<br />
(2004) Explorative data analysis of two-dimensional electrophoresis gels,<br />
Electrophoresis 25 (3), 502–511
17<br />
Finding the Significant Markers<br />
Statistical Analysis of Proteomic Data<br />
Sebastien Christian Carpentier, Bart Panis, Rony Swennen,<br />
and Jeroen Lammertyn<br />
Summary<br />
After separation through two-dimensional gel electrophoresis (2DE), several hundreds<br />
of individual protein abundances can be quantified in a cell population or sample tissue.<br />
Both a good experimental setup and a valid statistical approach are essential to get insight<br />
into the data and to draw correct conclusions. High-throughput 2DE proteomics yields<br />
complex and large datasets with a huge disproportion between the hundreds of variables<br />
and the restricted number of replicates. However, the most commonly used statistical tests<br />
have been designed to cope with a high number of replicates and a restricted number<br />
of variables. The proteomics community is not consistent in its use of<br />
statistics. Two approaches to data analysis can be distinguished: exploratory data<br />
analysis and confirmatory data analysis. Currently, most proteomic data are analyzed<br />
with the emphasis on confirmatory analysis and do not take into account the exploratory<br />
data analysis. This chapter gives an overview of the typical statistical exploratory and<br />
confirmatory tools available and suggests case-specific guidelines for a reliable statistical<br />
approach that can be used for 2DE analysis. Examples are given for an experimental<br />
setup based on classical staining methods as well as for the more advanced difference gel<br />
electrophoresis.<br />
Key Words: assumptions; confirmatory data analysis; experimental set-up;<br />
exploratory data analysis; missing values; multivariate statistics; non-parametric test;<br />
parametric test; principal component analysis; univariate statistics.<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
1. Introduction<br />
The conventional approach to analyze a biological problem is to collect data<br />
in order to test a particular hypothesis. Starting from this hypothesis, the data<br />
are collected, which should lead to an objective and reliable decision. As such,<br />
the hypothesis can be accepted, revised, or rejected. This confirmatory way of<br />
data analysis is accompanied by a number of steps that define the experimental<br />
setup. However, our understanding of a biological system is usually rather<br />
limited, and data may be very heterogeneous and complex. Exploratory data<br />
analysis approaches a biological problem from a different angle and tries to<br />
describe patterns, relationships, trends, outlying data, etc. Two-dimensional<br />
gel electrophoresis (2DE) simultaneously quantifies hundreds of individual<br />
protein abundances in a cell population or sample tissue. High-throughput<br />
2DE proteomics yields complex and large datasets with a huge disproportion<br />
between the hundreds of variables and the restricted number of replicates. Most<br />
commonly used statistical tests are for confirmatory data analysis and have<br />
been designed to cope with a high number of replicates and a restricted number<br />
of variables.<br />
Both a good experimental setup and a valid statistical approach are extremely<br />
important. The proteomics community is not consistent in its use of statistics:<br />
proteomic data are currently analyzed by a variety of approaches. The objective of this<br />
chapter is to give a concise overview of statistical methods used in functional<br />
genomics and to find a good compromise between statistics and proteome<br />
analysis in practice. This chapter deals with the experimental design and data<br />
analysis and, at the end, provides two practical examples (classical staining<br />
approach and DIGE approach). Section 2 discusses the issues of replicates and<br />
the pooling of samples, and briefly discusses the calibration, normalization,<br />
and quantification of data. Section 3 discusses confirmatory univariate and<br />
exploratory multivariate analysis and the related assumptions and associated<br />
problems.<br />
2. Experimental Design<br />
The design of an experiment is crucial for the robustness of the results<br />
obtained. Careful planning is essential to maximize the information output of<br />
an experiment. The experimental conditions must be well designed in order<br />
to keep variation within an experimental group as small as possible, and the<br />
experimental setup should be kept as simple as possible in order to keep the data<br />
manageable. When the impact of a particular treatment is to be examined, proper<br />
controls should be included (positive and negative control), and irrelevant<br />
external influences should be eliminated or anticipated (e.g., by randomized<br />
design).
The conventional approach of analyzing a biological problem is to collect<br />
data in order to test a particular hypothesis. The collected data should enable the<br />
researcher to make an objective and reliable decision concerning the hypothesis.<br />
The experimental setup usually includes a procedure that involves several<br />
steps: (1) state a null hypothesis ($H_0$) (e.g., there is no difference in protein<br />
abundance(s) between the treatments) and its alternative ($H_1$) (e.g., there is<br />
a difference between the treatments), (2) choose the most appropriate test<br />
statistic to check the hypothesis, (3) specify a significance level (i.e., the<br />
accepted probability of a false positive result, that is, of unjustly rejecting the null<br />
hypothesis), (4) specify the sample size (number of replicates) needed for sufficient<br />
power, and (5) collect the data. The power of a statistical test is its ability to<br />
detect true differences between the experimental groups. The power of a<br />
statistical test, and hence its rate of false negative results, depends on the variance,<br />
the change in abundance, the number of replicates, the statistical test chosen,<br />
and the predetermined significance level. Lilley and Karp have illustrated the<br />
relationship between power, replicate number, and relative expression change<br />
in a proteomics experiment (1). Urfer et al. consider the effect of testing all the<br />
proteins simultaneously by means of the family-wise error rate and the false discovery<br />
rate (2). Increasing the number of replicates is the most direct way to raise the power of<br />
a statistical test. Given the labor and cost involved in the 2DE analysis, however, the<br />
number of replicates is often restricted, and thus the variance (technical and<br />
biological) should be kept under control.<br />
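The dependence of power on the variance, the change in abundance, the number of replicates, and the significance level can be illustrated with a short calculation. The following is a minimal sketch, not the method of references (1) or (2): it assumes a two-sided comparison of two groups of equal size with a common standard deviation, and it uses the normal approximation to the t-test, which is optimistic for the very small group sizes typical of 2DE work. The function names `approx_power` and `replicates_needed` are hypothetical.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation to the t-test (an assumption of this sketch).
    delta: true difference between the group means (e.g., on a log scale).
    sigma: common within-group standard deviation.
    """
    z = NormalDist()
    standard_error = sigma * (2.0 / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / standard_error
    # Probability that the test statistic falls in either rejection tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def replicates_needed(delta, sigma, target_power=0.8, alpha=0.05, max_n=1000):
    """Smallest number of replicates per group that reaches the target power."""
    for n in range(2, max_n + 1):
        if approx_power(delta, sigma, n, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Power for an illustrative effect of two standard deviations.
    for n in (3, 4, 6, 10):
        print(n, round(approx_power(delta=1.0, sigma=0.5, n_per_group=n), 3))
```

With no true difference (`delta=0`), the function returns the significance level itself, which is the false positive rate fixed in step (3).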
2.1. Replicates<br />
A well-discussed subject is the nature of replicates. Two types of replicates<br />
are reported in 2DE studies: (1) technical replicates (repeated measurements<br />
of the same sample, e.g., the same protein extract) and (2) biological replicates<br />
(measurements of different samples within the same experimental group). Ideally,<br />
only biological replicate samples should be used, and one should try to limit<br />
the technical variability to the strict minimum so that a repeated measurement<br />
of the same sample is not necessary (Fig. 1A). Therefore, both a reliable<br />
sample preparation method (3) and extensive experience in electrophoresis<br />
and proteomic techniques are indispensable (4,5,6,7). Technical variability can<br />
be introduced at the level of (1) sample collection, (2) sample preparation and<br />
protein extraction, (3) sample loading and electrophoresis, and (4) staining and<br />
image analysis. Some staining methods, like silver staining, involve many<br />
steps, and each sample is run in an individual gel, which makes the approach<br />
susceptible to technical variation. Technical replicates might be considered in<br />
experiments with a low sample yield, with cost restrictions, or when the overall<br />
technical variability is still too high (high inter-gel variability) (Fig. 1B).<br />
In any case, one should take care when analyzing technical replicates alongside<br />
biological replicates. Statistically speaking, we are dealing with mixed models<br />
and nested designs (8,9). Karp et al. discuss the impact of mixing biological and<br />
technical replicates in a proteomics experiment (10). Treating technical replicates<br />
as biological replicates can increase the rate of false positives. Analyzing<br />
biological and technical replicates in one test is reasonable only<br />
in a nested ANOVA test. If another statistical test is used, only the<br />
biological replicates should be used (Fig. 1A), and the technical repetition of the same<br />
biological samples (protein extracts) should be considered as a distinct and<br />
confirmatory analysis. Given the low technical variance observed with the difference<br />
gel electrophoresis (DIGE) approach (see below), the value of analyzing<br />
technical replicates can be questioned, and they can hence be skipped (Fig. 1A).<br />
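One simple way to avoid counting technical replicates as biological replicates is to average the technical measurements within each biological sample before any test is applied, so that each biological replicate contributes a single value. The sketch below illustrates this. It is a simplification and not a substitute for the nested ANOVA mentioned above; the function name `collapse_technical_replicates` and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_technical_replicates(measurements):
    """Average technical replicates within each biological replicate.

    measurements: iterable of (group, biological_id, value) tuples; several
    tuples may share the same (group, biological_id) pair.
    Returns {group: [one mean value per biological replicate]}.
    """
    per_sample = defaultdict(list)
    for group, bio_id, value in measurements:
        per_sample[(group, bio_id)].append(value)

    per_group = defaultdict(list)
    for (group, _bio_id), values in sorted(per_sample.items()):
        per_group[group].append(mean(values))
    return dict(per_group)


# Hypothetical spot volumes: two biological samples per group,
# each measured on two gels (technical replicates).
example = [
    ("control", "c1", 10.0), ("control", "c1", 12.0),
    ("control", "c2", 20.0), ("control", "c2", 22.0),
    ("treatment", "t1", 30.0), ("treatment", "t1", 34.0),
    ("treatment", "t2", 40.0), ("treatment", "t2", 44.0),
]
collapsed = collapse_technical_replicates(example)
```

After collapsing, the sample size per group equals the number of biological replicates (two here), not the number of gels (four).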
2.2. Pooling<br />
Another well-debated subject is the pooling of biological samples. Pooling<br />
of individual biological tissues or cells averages the sample. On one hand,<br />
pooling reduces the variability and thereby increases the power, but on the other hand,<br />
there is an incontestable loss of relevant information about individuals. The pooling of<br />
samples reduces the biological variance when detecting changes in protein abundance<br />
between the averages of the experimental groups. Pooling of samples is usually<br />
done when the biological variation within an experimental group is too large<br />
(Fig. 1C and 1D), or when the starting material from one individual is not sufficient<br />
to extract proteins from. Pooling of samples might be useful, but must be<br />
evaluated for each individual experimental setup.<br />
2.3. Data Processing<br />
Common strategies for quantitative determination of gel-separated proteins<br />
include organic dyes (e.g., colloidal coomassie blue), silver staining,<br />
radiolabeling, and fluorescent stains (e.g., Deep Purple, Flamingo, SYPRO<br />
Orange/Red/Ruby, and other ruthenium complexes and succinimidyl ester<br />
derivatives of cyanine dyes). The choice of a particular staining method should<br />
be considered carefully, taking into account the lab equipment available, the budget,<br />
and the power of the method. The dynamic range of staining methods and<br />
the technical variability both have a great impact on the power of a statistical<br />
test and are decisive for the experimental setup (the number of replicates) and<br />
the choice of the statistical test.<br />
Data from 2DE analysis are generated through image analysis software<br />
that detects and quantifies protein abundances and matches the same proteins<br />
across different gels. An important challenge in 2DE is to estimate the protein<br />
concentration in order to ensure that all gels are loaded with an equal amount of<br />
proteins, and hence to minimize the technical variation. Most current software<br />
packages take this into account and introduce a calibration or normalization in<br />
order to compensate for image differences caused by protein loading, staining,<br />
and scanning.<br />
Fig. 1. Experimental set-up. Theoretical examples of experimental setup control vs.<br />
treatment. (A) Small intra-group variation and small technical variation: four biological<br />
replicates for control and four biological replicates for treatment. (B) Small intra-group<br />
variation and big technical variation (mixed model): four biological and three technical<br />
replicates for control and the same for treatment. (C) Big intra-group variation and<br />
small technical variation: four replicates of biological pool for control and the same for treatment.<br />
(D) Big intra-group variation and big technical variation (mixed model): four replicates<br />
of biological pool and three technical replicates for control and the same for treatment.<br />
2.3.1. Classical Approach<br />
Calibration in a classical approach (like silver or coomassie staining) is<br />
designed to take into account the differences in scanning properties (such as<br />
image depth). Scanner grey values are converted to optical densities so that<br />
intensities are no longer dependent on the original pixel depth. The most logical<br />
normalization procedure to compensate for possible loading differences in a classical<br />
staining is % volume, where the individual spot volumes are normalized by the<br />
total volume of all spots. Normalized data, whether or not transformed, can<br />
subsequently be analyzed by a relevant statistical test (see below).<br />
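The % volume normalization described above amounts to dividing each spot volume by the summed volume of all spots on the same gel. A minimal sketch follows, assuming that spot volumes are available as a mapping from spot identifier to raw volume; the function name `percent_volume` and the example values are hypothetical and do not correspond to any particular image analysis package.

```python
def percent_volume(spot_volumes):
    """Express each spot volume as a percentage of the gel's total spot volume.

    spot_volumes: {spot_id: raw volume} for a single gel.
    """
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: 100.0 * volume / total for spot, volume in spot_volumes.items()}


# Hypothetical example: gel_b carries twice the protein load of gel_a.
gel_a = {"spot_1": 50.0, "spot_2": 150.0}
gel_b = {"spot_1": 100.0, "spot_2": 300.0}
normalized_a = percent_volume(gel_a)
normalized_b = percent_volume(gel_b)
```

In this example, both gels give the same normalized values although the second gel was loaded with twice as much protein, which is the loading difference the normalization is meant to remove.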
The most commonly used organic stain is coomassie brilliant blue (CBB).<br />
CBB staining has a relatively good dynamic range (approximately $10^3$)<br />
and is perfectly compatible with MS. However, its sensitivity is relatively<br />
low. The limit of protein detection for colloidal CBB stain is approximately<br />
8–10 ng (11). Therefore, several modifications have been proposed to improve<br />
its sensitivity. For an overview, see (12).<br />
The introduction of the first sensitive silver-staining method (13) was a major<br />
breakthrough in the field of protein detection, which led to extensive research<br />
and various alternative silver-staining protocols (14). Silver staining is still one<br />
of the most sensitive non-radioactive detection techniques with a detection limit<br />
in the lower nanogram range. However, the linearity and dynamic range are<br />
relatively poor (approximately $10^2$ or less), the staining is protein-dependent,<br />
and gel-to-gel variation is not negligible due to numerous solution changes and<br />
other carefully timed steps.<br />
2.3.2. Difference Gel Electrophoresis Approach<br />
Fluorescence-based methods are overtaking the conventional technologies in<br />
use. A standard UV-transilluminator can be used for visualization of most<br />
fluorescent stains, but more sophisticated and expensive CCD cameras or laser<br />
scanners are appropriate for quantitative determination. The development of<br />
succinimidyl ester derivatives of different cyanine fluorescent dyes that modify<br />
free amino groups of proteins prior to separation (15) was a major achievement in<br />
terms of reproducibility and throughput. The DIGE approach uses fluorophores<br />
that have different absorption optima, making it possible to run multiple samples<br />
simultaneously in the same gel. The dyes were designed to ensure that a<br />
protein acquires the same relative mobility irrespective of the dye used to tag it.<br />
The difference in MW introduced by linkers of different lengths is compensated by<br />
different alkyl moieties opposite the linker moiety. Originally, only two different<br />
cyanine dyes were included (Cy3 and Cy5), but the concept was extended with<br />
a third dye (Cy2) that opened the way for a totally new experimental design that<br />
further exploits the sample multiplexing capabilities of the dyes by including an<br />
internal standard (16,17). The internal standard is a mixture of equal amounts of<br />
each sample and provides a powerful normalization procedure for highly accurate<br />
protein quantification. This normalization reduces the variability considerably<br />
and provides reasonable grounds to justify the use of powerful parametric<br />
statistics after transformation of the standardized volume. If multiple conditions<br />
have to be tested spread over different electrophoresis runs, one common internal<br />
standard should be created and included in all the gels of each run. However,<br />
if an experimental setup is too complex, the internal standard will contain too<br />
many samples, possibly resulting in an overlap of spots from different samples. The<br />
minimal labeling approach has a dynamic range of four to five orders of magnitude, and its<br />
sensitivity is currently marginally lower than that of silver staining (18). Although<br />
the dyes have been carefully designed, care should be taken in the experimental<br />
design to take into account possible dye-specific effects. Therefore, a supervised<br />
randomization of the Cy3/Cy5 labeling is highly recommended. Not only should the<br />
labeling be randomized, but the samples representing an experimental<br />
group should also be mixed across gels in order to avoid systematic gel artefacts.<br />
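The role of the internal standard can be illustrated by expressing each spot volume relative to the volume of the same spot in the internal standard channel of the same gel, followed by a log transformation. The sketch below assumes that the standardization is a simple ratio and that the transformation is log2. Commercial DIGE software may apply additional corrections, and the function name `standardized_log_abundance` and the example values are hypothetical.

```python
import math


def standardized_log_abundance(sample_volume, standard_volume):
    """log2 ratio of a spot volume in a sample channel (Cy3 or Cy5) to the
    volume of the same spot in the internal standard channel (Cy2) of the
    same gel. The simple ratio is an assumption of this sketch."""
    if sample_volume <= 0 or standard_volume <= 0:
        raise ValueError("volumes must be positive")
    return math.log2(sample_volume / standard_volume)


# Hypothetical example: the same sample run on two gels, where the second
# gel gives half the signal for every channel.
gel_1 = standardized_log_abundance(sample_volume=2000.0, standard_volume=1000.0)
gel_2 = standardized_log_abundance(sample_volume=1000.0, standard_volume=500.0)
```

Both gels yield the same standardized value because the gel-wide difference in signal affects the sample and the internal standard equally and cancels in the ratio.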
3. Data Analysis<br />
3.1. Confirmatory Univariate Data Analysis<br />
Univariate statistical methods examine the individual protein spots one by<br />
one, considering the different proteins as independent measurements. Table 1<br />
gives an overview of some commonly used parametric and non-parametric<br />
univariate tests. Univariate methods start from the null hypothesis that there<br />
is no difference between the two experimental populations.<br />
Table 1<br />
Overview of Some Commonly Used Univariate Tests<br />
Classes of data | Parametric | Non-parametric<br />
Comparing 2 treatments | T-test | Mann–Whitney/Wilcoxon; Kolmogorov–Smirnov test<br />
Comparing k treatments | ANOVA | Kruskal–Wallis test<br />
Parametric models like the Student’s T-test start from the observed sampling and assume that the<br />
observed sample mean and variance approximate the real population mean and<br />
variance, and that the variances of the two experimental populations are equal.<br />
Based on the observed mean and variance, the two populations are considered<br />
normally distributed and a model is made (Fig. 2). If the test statistic (or T-<br />
value) is large enough, the null hypothesis is rejected (Eq. 1). The numerator<br />
measures the distance between the experimental means and is thus an estimation<br />
of the inter-group variability; the denominator approximates the real variability<br />
and estimates the intra-group variability.<br />
$$T^2 = \frac{(\bar{y}_2 - \bar{y}_1)^2}{S_P^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \qquad (1)$$<br />
where $\bar{y}_i$: experimental mean (estimate of the population mean, $\mu_i$); $S_P^2$: pooled<br />
sample variance (estimate of the variance; it is a weighted average of the<br />
group variances accounting for the number of replicates or samples in each<br />
group); $n_i$: number of replicates per experimental group.<br />
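Equation 1 can be evaluated directly from the replicate values of one protein spot. The following minimal sketch assumes the usual pooled variance, in which the group variances are weighted by their degrees of freedom ($n_i - 1$); the function names and the example values are hypothetical.

```python
from statistics import mean, variance


def pooled_variance(group_1, group_2):
    """Average of the two sample variances, weighted by degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    weighted = (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    return weighted / (n1 + n2 - 2)


def t_squared(group_1, group_2):
    """Squared T statistic of Eq. 1 for two independent groups."""
    n1, n2 = len(group_1), len(group_2)
    numerator = (mean(group_2) - mean(group_1)) ** 2  # inter-group variability
    denominator = pooled_variance(group_1, group_2) * (1.0 / n1 + 1.0 / n2)
    return numerator / denominator


# Hypothetical log-transformed abundances of one spot, three replicates each.
control = [1.0, 2.0, 3.0]
treatment = [3.0, 4.0, 5.0]
t2 = t_squared(control, treatment)
```

For these values, the means differ by 2, both group variances equal 1, and the statistic is $T^2 = 4 / (1 \cdot 2/3) = 6$. To decide whether to reject the null hypothesis, the square root of $T^2$ is compared with the critical value of the t distribution with $n_1 + n_2 - 2$ degrees of freedom.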
Parametric univariate statistical tests are very powerful, but the data must<br />
respect the restrictive assumptions (continuous and normally distributed data,<br />
homogeneity of variance, and independent samples) and the assumptions must<br />
be tested. A commonly used test for the estimation of homogeneity of variances<br />
is the Levene’s test, and for the estimation of normality, it is the Shapiro-Wilk<br />
test (19). If one assumption is not met, the significance levels and the power<br />
of the test might be invalidated. Transformation of data (e.g., log function,<br />
arcsine, square root) is frequently used to improve the distribution characteristics<br />
(normality and homogeneity of variance) (20). The problem with proteomic<br />
data is the low number of replicates: these assumptions cannot be tested reliably<br />
with the small sample sizes commonly used in 2DE experiments.<br />
Tests like the Levene’s test and the Shapiro-Wilk test are designed for higher<br />
sample sizes and have very limited power at the commonly used sample size in<br />
proteomics experiments. Given the labor and cost involved in the 2DE analysis,<br />
the number of replicates is often restricted and ranges usually between 3 and 6.<br />
Fig. 2. Distribution of two normal populations with a homogeneous variance. μ i :<br />
real population average estimated by the sample average.
Although some empirical evidence illustrates that slight deviations in meeting<br />
the assumptions underlying parametric tests may not have radical effects on<br />
the obtained probability levels, there is no general agreement as to what is a<br />
“slight” deviation (21).<br />
An alternative to the parametric tests is the use of non-parametric tests,<br />
which do not assume any distribution for the data but usually have a relatively<br />
low power (21). They assume only independent samples and continuous data<br />
measured on at least an ordinal scale. A useful non-parametric test is the Kolmogorov–Smirnov test. The<br />
Kolmogorov–Smirnov test determines whether or not the experimental groups<br />
come from the same distribution. To this end, the data points in each experimental<br />
group are sorted in ascending order, and an empirical distribution<br />
function is calculated without any assumption of distribution or variance. The<br />
Kolmogorov–Smirnov test statistic D is defined as the maximum distance<br />
between the cumulative distributions of two experimental groups (for an<br />
example, see Fig. 5).<br />
$$D_{n_1 n_2} = \max_X \left| S_{n_1}(X) - S_{n_2}(X) \right| \qquad (2)$$<br />
where $S_{n_i}(X) = K_i / n_i$; $K_i$: number of data points equal to or less than $X$; $n_i$: number of<br />
replicates per experimental group.<br />
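The statistic of Eq. 2 can be sketched in a few lines of Python. This is a minimal illustration with hypothetical %vol values and function names; it computes only D and does not derive a significance level.<br />

```python
def ecdf(sample, x):
    """Empirical distribution S_n(x) = K / n, with K the number of values <= x."""
    return sum(1 for value in sample if value <= x) / len(sample)


def ks_statistic(group1, group2):
    """Maximum absolute distance between the two empirical distributions (Eq. 2)."""
    # The distance can only change at observed values, so checking those suffices.
    points = sorted(set(group1) | set(group2))
    return max(abs(ecdf(group1, x) - ecdf(group2, x)) for x in points)


# Hypothetical %vol values of one spot, six replicates per experimental group.
group_a = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22]
group_b = [0.19, 0.25, 0.30, 0.35, 0.40, 0.45]
d = ks_statistic(group_a, group_b)
print(round(d, 3))
```

In this example the largest gap occurs at 0.22, where all six values of the first group but only one value of the second group have been reached, so D = 1 − 1/6.<br />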
3.2. Exploratory Multivariate Data Analysis<br />
Univariate statistical tests, such as the T-test, the Kolmogorov–Smirnov<br />
test, ANOVA, or the Kruskal–Wallis test, have not been designed to analyze<br />
complex datasets containing multiple correlated variables. Proteomic datasets<br />
generally contain hundreds of different proteins that are correlated. Proteins fit<br />
within the larger entity of networks and interact with each other. Univariate<br />
statistics test the individual variables one by one and cannot<br />
detect correlations to other variables (proteins). Moreover, testing hundreds<br />
of variables (protein spots) one by one, each with an accepted<br />
risk of a false positive (α), increases the chance of reporting<br />
false positive cases (multiple testing issue), and assumes that the different<br />
variables (proteins) are uncorrelated. Proteins are not uncorrelated; they fit<br />
within multiple biological pathways and might have close correlations. The field<br />
of multivariate analysis consists of those statistical techniques that consider two<br />
or more related random variables as a single entity and attempts to produce an<br />
overall result taking the relationship among the variables into account (22). In<br />
contrast to a univariate approach, it displays the inter-relationships between a<br />
large number of variables and is able to correlate multiple proteins to a specific<br />
experimental group. The data from different image analysis software packages<br />
can be exported, introduced, and analyzed using several software packages to
perform multivariate analysis. Some commonly used packages are Unscrambler,<br />
Matlab, SAS, and Statistica. GE Healthcare developed a statistical software<br />
package (EDA, extended data analysis) for the DIGE approach, which is linked to<br />
the image analysis software Decyder. The package offers both univariate and<br />
multivariate tools. Here, we will mainly discuss the use of Principal Component<br />
Analysis (PCA) (for an overview of the other possibilities of the EDA package and<br />
more DIGE-related statistical examples, see Chapter 6).<br />
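The multiple testing issue raised in this section can be made concrete with a short worked example. It assumes, for illustration only, m = 100 independent tests that are each carried out at α = 0.05 while every null hypothesis is true; real protein spots are correlated, so the figures are indicative rather than exact.<br />

```latex
\begin{align}
E[\text{false positives}] &= m\,\alpha = 100 \times 0.05 = 5 \\
P(\text{at least one false positive}) &= 1 - (1 - \alpha)^m = 1 - 0.95^{100} \approx 0.994
\end{align}
```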
3.2.1. Principal Component Analysis<br />
Principal Component Analysis is one of the multivariate possibilities to perform<br />
explorative data analysis. A comprehensive overview of the use of PCA in<br />
statistics is given by Sharma (23). The basics of PCA date back to Karl Pearson<br />
in 1901 (24), and the final procedure as we know it today was developed by<br />
Harold Hotelling in 1933 (25). The use of multivariate methods in the analysis<br />
of 2DE was already established in the early days of 2DE (26) and is an emerging<br />
application in transcriptomics and proteomics (27,28,29,30,31). PCA condenses<br />
the information contained in a huge dataset into a smaller number of artificial<br />
factors, which explain most of the variance observed. The most logical modus<br />
operandi is to consider the different biological replicate samples of the experimental<br />
groups as observations (score plot). The score plot allows the detection of<br />
trends in the samples, and the loading plot allows identification of the relevant proteins<br />
that explain the trends. A principal axis transformation transforms the correlated<br />
variables (proteins) into new uncorrelated variables. A principal component<br />
(PC) is a linear combination calculated from the existing variables (proteins)<br />
[$PC_1 = a_1(\text{protein}_1) + a_2(\text{protein}_2) + \dots + a_n(\text{protein}_n)$;<br />
$PC_2 = b_1(\text{protein}_1) + b_2(\text{protein}_2) + \dots + b_n(\text{protein}_n)$]. The relation<br />
between the original variables (proteins) and the PCs is displayed in the loading<br />
plot. This means that if a protein has a high loading score for a specific PC,<br />
that protein explains an important part of the sample variance. The starting<br />
point for PCA is the sample covariance matrix. It has been proven that the sum<br />
of the original variances is equal to the sum of the eigenvalues of the sample<br />
covariance matrix. The eigenvalues are the variances of the PCs. The ratio of<br />
each eigenvalue to the total variance indicates the portion of the total variability<br />
accounted for by each PC. For the fundamentals of data manipulation and a more<br />
detailed description of the properties and mechanisms of multivariate analysis<br />
and PCA, the reader is referred to the books of Jackson and Sharma (22,23).<br />
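The relation between the sample covariance matrix, its eigenvalues, and the variability explained per PC can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the abundance values are hypothetical, and power iteration with deflation is used here only because it is short to write; it is not the algorithm of any particular software package.<br />

```python
def covariance_matrix(rows):
    """Sample covariance matrix; rows are samples (gels), columns are proteins."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def mat_vec(matrix, vector):
    """Product of a square matrix and a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def principal_components(cov, iterations=1000):
    """Eigenvalues (PC variances) and loadings via power iteration with deflation."""
    p = len(cov)
    work = [row[:] for row in cov]
    eigenvalues, loadings = [], []
    for k in range(p):
        start = [1.0 / (i + k + 1) for i in range(p)]  # arbitrary non-zero start
        length = sum(x * x for x in start) ** 0.5
        vec = [x / length for x in start]
        for _ in range(iterations):
            nxt = mat_vec(work, vec)
            norm = sum(x * x for x in nxt) ** 0.5
            if norm < 1e-15:  # nothing left to extract
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
        eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(work, vec)))
        eigenvalues.append(eigenvalue)
        loadings.append(vec)
        # Deflation: remove the component that has just been extracted.
        work = [[work[i][j] - eigenvalue * vec[i] * vec[j] for j in range(p)]
                for i in range(p)]
    return eigenvalues, loadings


# Hypothetical abundances: 4 samples (rows) x 3 proteins (columns).
data = [[2.0, 1.0, 0.5], [4.0, 2.1, 0.4], [6.0, 2.9, 0.6], [8.0, 4.2, 0.5]]
cov = covariance_matrix(data)
eigenvalues, loadings = principal_components(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = [value / total_variance for value in eigenvalues]
print([round(100 * share, 1) for share in explained])
```

The sum of the eigenvalues equals the sum of the original variances (the diagonal of the covariance matrix), and each ratio in `explained` is the portion of the total variability accounted for by one PC. The entries of `loadings` are the coefficients of the linear combinations that define the PCs.<br />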
It is very important to have an insight into what is calculated and what the<br />
assumptions are of different models. The EDA software offers the user the<br />
choice to play with observations and loadings. Hence, the user also has the<br />
possibility to use the transposed data matrix, and to consider the gel images as
variables (loading plot) and the proteins as observations (score plot). This might<br />
be helpful to improve the image analysis and to detect protein mismatches,<br />
but should not be used to explore the inter- and intra-group variability of<br />
the biological samples. Explorative PCA does not impose strict requirements on<br />
the data. The majority of PCA applications are descriptive in nature. In these<br />
instances, distributional assumptions are of secondary importance (22). The<br />
only requirement that must be met is that the dataset has to be complete,<br />
meaning that there must be no missing spot values among the different samples.<br />
Techniques for performing PCA on incomplete data, or<br />
techniques for estimating the missing data, can solve this problem. Several methods<br />
for estimating missing data have been reported from the microarray community<br />
(32,33,34). A missing value in 2DE proteomics occurs when a spot is detected<br />
in the reference or master gel but not detected in one of the other sample gel<br />
images, or it is detected but not matched to the reference or master gel. The<br />
causes of missing values might be (1) faint spots close to the detection limit<br />
and detected in one gel but not detected in another; (2) mismatches probably<br />
caused by distortions in the protein pattern, or (3) absence of spots due to<br />
bad transfer from the first to the second dimension. Grove et al. show that the<br />
staining procedure was an important source of missing values (27). The concept<br />
of DIGE with its common internal standard anticipates the missing value<br />
problem to some extent by matching the different internal standard images.<br />
A good sample preparation (3) and a good experience in electrophoresis and<br />
proteomic techniques also reduce this problem, but missing values are inherent<br />
to 2DE and must be faced. Some software packages replace the missing values<br />
with the value zero, and others remove all the variables with missing values.<br />
Introducing zeros leaves the results open to serious bias when a protein is<br />
mismatched in a particular sample or when the spot is missing due to a technical<br />
error. This particular protein will get an important loading value for the sample<br />
in question, influencing incorrectly the score for this particular sample. In the<br />
case a protein is really absent or below the detection limit of the staining<br />
method, those missing values can be filled either with zeros or with a threshold<br />
value (35). A better alternative might be to average the samples within an<br />
experimental group and to explore the data based on the group mean. A missing<br />
value will still be considered as a zero and will lower the group mean, but the<br />
impact of loading on the sample score plot is buffered by the average. The<br />
EDA package offers this possibility (see example below). Taking into account<br />
only the proteins that are detected and matched to the master or reference gel<br />
solves the problem of missing values, but a lot of useful information is lost<br />
(see example below). The EDA package offers the possibility to filter the base<br />
dataset and to select only those proteins that are 100% matched. Troyanskaya<br />
et al. show that averaging is an improvement upon replacing missing values
with zeros, but it yields drastically lower accuracy than the estimation methods<br />
such as singular value decomposition and weighted K-nearest neighbors (32).<br />
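The simpler strategies for handling missing spot values discussed above (keeping only fully matched proteins, replacing missing values with zero, and averaging within an experimental group) can be sketched as follows. The data structure, names, and values are hypothetical; the estimation methods of Troyanskaya et al. (32) are not implemented here.<br />

```python
# Hypothetical spot table: protein -> group -> replicate values (None = missing spot).
spots = {
    "spot_1": {"A": [1.0, 1.2, 1.1], "B": [2.0, 2.2, 2.1]},
    "spot_2": {"A": [0.5, None, 0.7], "B": [0.6, 0.8, 0.7]},
}


def fully_matched(table):
    """Keep only proteins detected and matched in every sample (100% matched)."""
    return {name: groups for name, groups in table.items()
            if all(v is not None for values in groups.values() for v in values)}


def zero_filled(table):
    """Replace every missing value with zero."""
    return {name: {group: [0.0 if v is None else v for v in values]
                   for group, values in groups.items()}
            for name, groups in table.items()}


def group_means(table):
    """Average per experimental group; a missing value still counts as zero."""
    filled = zero_filled(table)
    return {name: {group: sum(values) / len(values)
                   for group, values in groups.items()}
            for name, groups in filled.items()}


print(sorted(fully_matched(spots)))
print(zero_filled(spots)["spot_2"]["A"])
print(group_means(spots)["spot_2"])
```

Filtering discards spot_2 completely, zero filling introduces an artificial extreme value for one sample, and averaging lowers the group mean but dampens the effect of the single missing value.<br />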
We recommend performing the initial PCA based on the complete dataset<br />
and not based on the proteins that appear to be significantly different from the<br />
individual univariate analyses. Multivariate statistics have an additional value<br />
by being capable of differentiating the different experimental groups in terms of<br />
correlated expression rather than absolute expression (28,36). Both approaches<br />
are complementary. Performing the analysis only on significant proteins from<br />
univariate analysis might disregard useful information. We recommend<br />
starting the analysis with explorative multivariate analysis and comparing the<br />
data subsequently with the confirmatory univariate analysis of the individual<br />
proteins.<br />
3.2.2. Marker Selection<br />
Principal Component Analysis is outstanding in detecting outlying data and<br />
correlations among the different variables (proteins), but it is not able to<br />
determine a threshold level for identifying which proteins are significant in<br />
classifying the experimental groups, allowing an objective removal of variables<br />
(proteins) that do not contribute to the class distinction. Several algorithms<br />
exist to select a subset of features from the whole dataset and to perform a<br />
classification. In proteome analysis, this corresponds to selecting the proteins<br />
that can best discriminate the experimental groups. The use of partial least<br />
squares (PLS) as a regression technique has been promoted primarily within the<br />
area of chemometrics (37). In contrast to PCA, PLS is a supervised technique<br />
mainly applied to link (or regress) a continuous response variable (or dependent<br />
variable) to a set of independent variables (e.g., proteins in a gel). However, in<br />
proteomic data, the response variable is often a discrete variable (e.g., treatment<br />
A, B, C,…) and only takes a fixed number of values. PLS-DA offers an<br />
algorithm to deal with this typical data structure. An analysis of the score and<br />
(correlation) loading plot allows defining the proteins that are important in<br />
discriminating the different experimental treatments. The variable importance<br />
plot (VIP) is an interesting tool for this purpose. According to the user manual,<br />
the PLS algorithm of EDA creates a supervised model of the data (predefined<br />
experimental groups) and then uses the variable influence on the projection<br />
(VIP) scores from the model to create a ranked list of how good a protein<br />
is for discrimination between the experimental groups. Discriminant analysis<br />
(DA) methods, in general, and PLS-DA, in particular, are used to calculate<br />
the probability or accuracy of the marker selection. The purpose of DA is to<br />
assign individual observations (samples) to one of the experimental<br />
groups [e.g., the classification of patient samples as healthy and tumor based<br />
on protein extractions (38)].<br />
4. Examples<br />
4.1. Classical Dyes, 2 Conditions<br />
In this example, we examine two different conditions, analyse six biological<br />
samples per condition, and perform the analysis with classical CBB staining.<br />
The data have been analyzed with the Image Master Platinum software version<br />
5 (GE Healthcare). Image Master version 5 offers the possibility to compensate<br />
for technical variance and offers intensity calibration and spot normalization. The<br />
relative volume (%vol) spot normalization is the best spot normalization procedure<br />
because this takes into account the intensity of a spot as well as the area (Eq. 3).<br />
$$\%vol = \frac{vol}{\sum_{S=1}^{n} vol_S} \qquad (3)$$<br />
where $vol_S$ is the volume of spot $S$ in a gel containing $n$ detected spots.<br />
Although this spot normalization procedure reduces the possible technical<br />
variance, it has consequences for the data. Normalizing all the spots transforms<br />
the data and creates an asymmetric population (Fig. 3). A logarithmic<br />
transformation of the data improves the distribution characteristics (Fig. 4).<br />
However, univariate statistical methods are not developed to analyze all the<br />
spots simultaneously like in Figs. 3 and 4. They examine the individual protein<br />
spots (variables) one by one, considering the different proteins as independent<br />
measurements. Therefore, one should consider each spot individually, and the<br />
real population for the experimental groups of this particular protein spot should<br />
be estimated based on the six replicates. Performing distribution tests like the<br />
Levene’s test and the Shapiro-Wilk test on six replicates is a possibility, but<br />
it is unlikely that the null hypotheses (homogeneous variance and normally<br />
distributed, respectively) will be rejected. The sample sizes need to be large<br />
enough in order to minimize the amount of false results (i.e., the populations<br />
will appear to be normally distributed and of equal variance although this is<br />
not necessarily the case).<br />
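The %vol normalization of Eq. 3 and the subsequent logarithmic transformation can be sketched as below. The spot volumes are hypothetical, and two choices are assumptions of this sketch rather than statements about the Image Master software: the ratio is multiplied by 100 to express it as a percentage, and a base-10 logarithm is used.<br />

```python
import math


def percent_volume(volumes):
    """Relative volume of each spot within one gel (Eq. 3), as a percentage."""
    total = sum(volumes)
    # Assumption: the ratio is scaled by 100 so that all spots of a gel sum to 100%.
    return [100.0 * volume / total for volume in volumes]


def log_transform(values):
    """Base-10 logarithm of each value (the base is an assumption of this sketch)."""
    return [math.log10(value) for value in values]


# Hypothetical raw spot volumes of a single gel.
raw_volumes = [50.0, 150.0, 300.0, 500.0]
pct = percent_volume(raw_volumes)
log_pct = log_transform(pct)
print(pct)
print([round(value, 3) for value in log_pct])
```

Because every spot is divided by the same gel total, a change in one abundant spot shifts the %vol of all other spots in that gel, which is one reason the normalized data deserve a distribution check before parametric tests are applied.<br />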
Taking into account the typical heterogeneity of variance associated with<br />
classical dyes, the %vol spot normalization of Image Master, and the limited<br />
sample size, a non-parametric statistical test seems to be the best choice<br />
in this case. We opted here for the non-parametric univariate Kolmogorov–<br />
Smirnov test. The test is one among the options offered by Image Master.<br />
It is a two-sample test with high power efficiency for small sample sizes.<br />
[Figure 3: histogram "Var1" with expected normal curve; y-axis: No. of obs. (0 to 2000); x-axis: −0.3 to 2.3; Shapiro-Wilk W = .35883, p = 0.0000]<br />
Fig. 3. Distribution of protein spots analyzed by Image Master and normalized using<br />
the %vol criterion. There is an asymmetrical distribution, with the majority of the spots<br />
lying between 0 and 0.1%.<br />
[Figure 4: histogram "Var2" with expected normal curve; y-axis: No. of obs. (0 to 1000); x-axis: −7 to 1; Shapiro-Wilk W = .98283, p = .00000]<br />
Fig. 4. A logarithmic transformation of the %vol data of Fig. 3.<br />
The reduced power of a non-parametric test was compensated for by including a<br />
higher number (6) of biological replicates. Figure 5 shows an example of<br />
an individual Kolmogorov–Smirnov test. For the complete experimental setup<br />
and biological background, see Carpentier et al. (39). The options of the Image<br />
Master Platinum software are rather limited and are focused on two experimental<br />
groups. The multivariate analysis offered by Image Master Platinum is<br />
factor analysis. Factor analysis is a technique similar in nature to PCA. The<br />
results of both techniques are quite similar except that factor analysis explains<br />
correlations between variables, while PCA explains variability (22). In<br />
Image Master Platinum, the gels (images) are used for the loading plot and the proteins<br />
for the score plot. Factor 1 (explaining the majority of the variability) is in<br />
our case associated with protein abundance, and the second factor is associated<br />
with inter-group variability. As stated above, this might be useful to improve<br />
the image analysis and to detect protein mismatches, but to explore the inter- and<br />
intra-group variability of the biological samples, it might be better to export the<br />
data to a statistical program. For an example of classical staining and uni- and<br />
multivariate analysis, see Pedreschi et al. (40).<br />
[Figure 5: panels A to C plot %vol (0 to 0.8), each labelled 1373; panel D plots cumulative frequency (0 to 0.9) against %vol (0 to 0.8) for groups A and B]<br />
Fig. 5. Example of Kolmogorov–Smirnov test. (A) Descriptive statistics displaying<br />
the experimental mean and standard deviation of the two experimental groups (A and B).<br />
(B) Descriptive statistics of the individual biological samples of the two experimental<br />
groups. (C) The data sorted in ascending order. (D) Empirical cumulative distribution<br />
functions of the two experimental groups.<br />
4.2. DIGE Approach, 4 Conditions<br />
In this example, we are interested in the effects of a specific treatment over<br />
time. Using the DIGE approach, we consider here four time points. At each time<br />
point, three biological samples were analyzed, quantifying several hundreds of<br />
protein spots (i.e., variables) per sample per time point. To process and analyze<br />
the gels, the Decyder software version 6.5 was used in combination with the<br />
EDA module (GE Healthcare). The standardized normalization procedure in<br />
Decyder 2D BVA is based on the concept of having for each gel the Cy2<br />
labeled internal standard image as reference. This standard image is used to<br />
normalize the abundance ratios between the different gels. Decyder offers<br />
the possibility to perform transformation and normalization of the data: log<br />
standardized abundance (Eq. 4).<br />
$$\text{Log standardized abundance} = \log_{10}\left(\frac{vol_{\text{Cy5 or Cy3}}}{vol_{\text{Cy2}}}\right) \qquad (4)$$<br />
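Eq. 4 amounts to a single log ratio per spot. The minimal sketch below uses hypothetical spot volumes and a hypothetical function name; it is not the Decyder implementation.<br />

```python
import math


def log_standardized_abundance(sample_volume, standard_volume):
    """Base-10 log of the sample (Cy3 or Cy5) to internal standard (Cy2) volume ratio."""
    return math.log10(sample_volume / standard_volume)


# Hypothetical volumes of one spot on one gel.
cy2_standard = 1000.0
cy3_sample = 2000.0   # twice as abundant as the internal standard
cy5_sample = 500.0    # half as abundant as the internal standard

up = log_standardized_abundance(cy3_sample, cy2_standard)
down = log_standardized_abundance(cy5_sample, cy2_standard)
print(round(up, 3), round(down, 3))
```

A twofold increase and a twofold decrease relative to the internal standard give values of equal magnitude and opposite sign, which makes the transformed data symmetric around zero.<br />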
Using the DIGE approach, Karp and Lilley gathered reasonable arguments to<br />
assume that the restrictive assumptions of parametric statistics are not violated<br />
too strongly after the logarithmic transformation of standardized abundance<br />
(1). The use of parametric statistics seems, therefore, acceptable. However,<br />
univariate statistics test the individual variables one by one and cannot<br />
correlate multiple proteins. Moreover, testing hundreds of variables<br />
(protein spots) one by one, each with an accepted<br />
risk of a false positive (α), increases the chance of reporting false positive cases<br />
(multiple testing issue). It is, therefore, advisable to first gain insight into the<br />
complex dataset by exploring the data via multivariate analysis and then to<br />
validate the individual differences via univariate statistics. Not all proteins are<br />
relevant to understand the differences between the time points. Therefore, it<br />
would be interesting to distinguish relevant proteins from irrelevant proteins<br />
that do not have a changing abundance over time. To facilitate the discovery<br />
of the differences, we used the PCA of the extended data analysis module of<br />
Decyder. PCA reduces more than 1000 variables into PCs that explain most<br />
of the variance between the treatment times. PCA is unsupervised,<br />
meaning that the samples are analyzed without the knowledge of sampling<br />
time. In Fig. 6, the score and loading plot are displayed, taking into account<br />
the two most important PCs. The different repetitions of the same time point<br />
cluster together, and the most important PC (i.e., PC1) is able to separate the<br />
clustered treatment times. In practice, this means that proteins with a high<br />
positive PC1 value will be abundantly present in the 2-day gels and less
abundant in 14-day gels and vice versa for proteins with a highly negative<br />
PC1 value. Proteins that cluster together have a similar impact on the PCs and<br />
have a similar expression pattern (Fig. 6). This rough approach explains only<br />
a small part of the variability. The first PC explains 34.2% of the variability<br />
and explains a great part of the inter-group biological variability (time effect).<br />
A high positive PC1 value is correlated to 2 days, and a high negative value is<br />
correlated to 14 days. Most proteins cluster around the origin, indicating a poor<br />
contribution to the variance; these proteins probably do not change in abundance during<br />
the examined time period. The second PC explains 15.1% of the variability<br />
and seems to explain mainly (technical) intra-group variability. By default<br />
EDA ignores the missing values. When the missing value issue is addressed by<br />
taking the average of each experimental group, which also reduces some technical<br />
variability, the first component explains 60.9% of the variability and the second<br />
PC 23.4%. Taking into account only the proteins that have been matched and<br />
detected in all the gels reduces the number of examined proteins by more<br />
than 50% and discards very useful proteins that have, for instance, a very low<br />
abundance in the early days of treatment and higher abundances at the end<br />
and vice versa.<br />
[Figure 6: panel A, score plot; panel B, loading plot]<br />
Fig. 6. PCA. (A) Score plot. The big circle is based on the Hotelling’s $T^2$ test<br />
statistic and is used to detect outlying observations (0.95). The three biological replicates<br />
of the same experimental group cluster together, indicating an acceptable intra-group<br />
variability (grey ellipse). The different experimental groups are also separated,<br />
indicating a certain inter-group variability. There is a clear difference between 2 and 14<br />
days of treatment. (B) The loading plot indicates the correlation between the original<br />
variables. A protein with a high loading score for a specific PC explains an important<br />
part of the sample variance.<br />
As an example, we focus on five proteins that seem highly correlated from<br />
the loading plot (highlighted in Fig. 6B). Confirmatory differential expression<br />
analysis via ANOVA confirms that all five proteins have a very similar<br />
expression pattern over time (Fig. 7). This might suggest a common regulatory<br />
mechanism or an interaction between the proteins. The individual confirmatory<br />
univariate statistics (ANOVA and multiple comparison test) confirm for four<br />
out of the five proteins that 2 days is significantly different from 4 days, 8 days,<br />
and 14 days; and that 14 days is significantly different from 4 days and 8 days<br />
(α ≤ 0.01). We could identify four proteins as lectin isoforms (39), confirming,<br />
indeed at a first level, the correlation between the proteins. One protein could<br />
not be identified and is under further investigation. This protein is likely to<br />
have a common regulatory mechanism (being also a lectin-like protein), might<br />
form a complex, or develop an interaction with lectin proteins. This particular<br />
protein shows exactly the same expression pattern as the four identified lectins,<br />
but the overall ANOVA has a p-value of 0.0122. This is a nice illustration of<br />
how exploratory data analysis performs: it indicates correlation and also<br />
brings up candidate markers that would have been missed when using only<br />
confirmatory data analysis (α ≤ 0.01).<br />
Fig. 7. Confirmatory differential expression analysis: expression pattern of the<br />
individual proteins selected from Fig. 6. The different normalized relative abundances<br />
are displayed for the different time points (14 days, 8 days, 4 days, and 2 days). The<br />
mean of each individual isoform is displayed as a cross.<br />
5. Conclusions<br />
The experimental conditions are important and must be well designed.<br />
Ideally, only biological replicate samples should be used, and one should try to<br />
limit the technical variability to the strict minimum. A reliable sample preparation<br />
and an extended experience in electrophoresis and proteomic techniques<br />
are indispensable. With the low technical variance observed with the DIGE<br />
approach, the need for analyzing technical replicates can be questioned. The<br />
pooling of samples reduces the biological variance, which helps to detect changes in protein<br />
abundance between the averages of the experimental groups. Pooling of samples<br />
might be useful but must be reconsidered for each individual experimental setup.<br />
The use of a particular staining method should carefully be considered taking<br />
into account the available lab equipment, budget, and power of a particular<br />
method. The dynamic range of the staining methods and the technical variability<br />
have a great impact on the power of a statistical test and are decisive for<br />
the experimental setup (the number of replicates) and the choice of the statistical<br />
test. Univariate statistics test the individual variables one by one and<br />
cannot correlate multiple proteins. Moreover, testing hundreds<br />
of variables (protein spots) one by one, each with an accepted<br />
risk of a false positive (α), increases the chance of reporting false<br />
positive cases (multiple testing issue). Therefore, it is advisable to first gain<br />
insight into the complex dataset by exploring the data via multivariate analysis<br />
and then to validate the individual differences via univariate statistics. Using a classical<br />
approach with the typical heterogeneity of variance associated with classical<br />
dyes and the limited sample sizes, a non-parametric test seems to be the best<br />
choice. Using the DIGE approach, the restrictive assumptions of parametric<br />
statistics are not violated too strongly after the logarithmic transformation of<br />
the standardized abundance. The use of parametric statistics seems, therefore,<br />
acceptable.<br />
Acknowledgments<br />
The authors would like to thank Romina Pedreschi for critical reading and<br />
suggestions and Prof. Verbeke for the sharing of his files. Financial support<br />
from the Belgian National Fund for Scientific Research (FWO-Flanders) is<br />
gratefully acknowledged.
References<br />
1. Karp, N. A. & Lilley, K. S. (2005) Proteomics 5, 3105–3115.<br />
2. Urfer, W., Grzegorczyk, M., & Jung, K. (2006) Proteomics S2, 48–55.<br />
3. Carpentier, S. C., Witters, E., Laukens, K., Deckers, P., Swennen, R., & Panis, B.<br />
(2005) Proteomics 5, 2497–2507.<br />
4. Bjellqvist, B., Ek, K., Righetti, P. G., Gianazza, E., Gorg, A., Westermeier, R., &<br />
Postel, W. (1982) J. Biochem. Biophys. Methods 6, 317–339.<br />
5. Westermeier, R. (2001) Electrophoresis in Practice. Wiley-VCH, Weinheim.<br />
6. Westermeier, R. & Naven, T. (2002) Proteomics in Practice. Wiley-VCH,<br />
Weinheim.<br />
7. Rabilloud, T. (2000) Proteome research: two dimensional gel electrophoresis and<br />
identification methods. Springer, Heidelberg.<br />
8. Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996) In:<br />
Applied Linear Statistical Models (Neter, J., Kutner, M. H., Nachtsheim, C. J., &<br />
Wasserman, W., eds.). Irwin, Chicago, pp. 958–1010.<br />
9. Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996) In:<br />
Applied Linear Statistical Models (Neter, J., Kutner, M. H., Nachtsheim, C. J., &<br />
Wasserman, W., eds.). Irwin, Chicago, pp. 1121–1164.<br />
10. Karp, N. A., Spencer, M., Lindsay, H., O’dell, K., & Lilley, K. S. (2005) J.<br />
Proteome Res. 4, 1867–1871.<br />
11. Patton, W. F. (2000) Electrophoresis 21, 1123–1144.<br />
12. Westermeier, R. (2006) Proteomics S2, 61–64.<br />
13. Switzer, R. C., Merril, C. R., & Shifrin, S. (1979) Anal. Biochem. 98, 231–237.<br />
14. Rabilloud, T., Vuillard, L., Gilly, C., & Lawrence, J. (1994) Cellular and Molecular<br />
Biology 40, 57–75.<br />
15. Unlu, M., Morgan, M. E., & Minden, J. S. (1997) Electrophoresis 18, 2071–2077.<br />
16. Alban, A., Currie, I., Lewis, S., Stone, T., & Sweet, A. C. (2002) Mol. Biol. Cell<br />
13, 407A–408A.<br />
17. Alban, A., David, S. O., Bjorkesten, L., Andersson, C., Sloge, E., Lewis, S., &<br />
Currie, I. (2003) Proteomics 3, 36–44.<br />
18. Tonge, R., Shaw, J., Middleton, B., Rowlinson, R., Rayner, S., Young, J.,<br />
Pognan, F., Hawkins, E., Currie, I. et al. (2001) Proteomics 1, 377–396.<br />
19. Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996) In:<br />
Applied Linear Statistical Models (Neter, J., Kutner, M. H., Nachtsheim, C. J., &<br />
Wasserman, W., eds.). Irwin, Chicago, pp. 95–152.<br />
20. Gustafsson, J. S., Ceasar, R., Glasbey, C. A., Blomberg, A., & Rudemo, M. (2004)<br />
Proteomics 4, 3791–3799.<br />
21. Siegel, S. C. N. J. (1988) Non Parametric Statistics for Behavioral Sciences.<br />
McGraw-Hill Book Company, Singapore.<br />
22. Jackson, J. E. (2003) A User’s Guide to Principal Components. Wiley, New York.<br />
23. Sharma, S. Applied Multivariate Techniques. Wiley, Hoboken, NJ.<br />
24. Pearson, K. (1901) Phil. Mag. Ser. B. 2, 559–572.<br />
25. Hotelling, H. (1933) J. Educ. Psychol. 24, 417–441.<br />
26. Tarroux, P. (1983) Electrophoresis 4, 63–70.
27. Grove, H., Hollung, K., Uhlen, A. K., Martens, H., & Faergestad, E. M. (2006) J.<br />
Proteome Res. 5, 3399–3410.<br />
28. Marengo, E., Robotti, E., Bobba, M., Liparota, M. C., Rustichelli, C., Zamoo, A.,<br />
Chilosi, M., & Righetti, P. G. (2006) Electrophoresis 27, 484–494.<br />
29. Schultz, J., Gottlieb, D. M., Petersen, M., Nesic, L., Jacobsen, S., & Sondergaard, I.<br />
(2004) Electrophoresis 25, 502–511.<br />
30. Verhoeckx, K. C. M., Gaspari, M., Bijlsma, S., Van Der Greef, J., Witkamp, R. F.,<br />
Doornbos, R. P., & Rodenburg, R. J. T. (2005) J. Proteome Res. 4, 2015–2023.<br />
31. Gottlieb, D. M., Schultz, J., Bruun, S. W., Jacobsen, S., & Sondergaard, I. (2004)<br />
Phytochemistry 65, 1531–1548.<br />
32. Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R.,<br />
Botstein, D., & Altman, R. B. (2001) Bioinformatics 17, 520–525.<br />
33. Scheel, I., Aldrin, M., Glad, I. K., Sorum, R., Lyng, H., & Frigessi, A. (2005)<br />
Bioinformatics 21, 4272–4279.<br />
34. Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., & Ishii, S. (2003)<br />
Bioinformatics 19, 2088–2096.<br />
35. Wood, J., White, I. R., & Cutler, P. (2004) Signal Process. 84, 1777–1788.<br />
36. Karp, N. A., Griffin, J. L., & Lilley, K. S. (2005) Proteomics 5, 81–90.<br />
37. Wold, S. (1985) Encyc. Stat. Sci. 6, 581–591.<br />
38. Nguyen, D. V. & Rocke, D. M. (2002) Bioinformatics 18, 39–50.<br />
39. Carpentier, S. C., Witters, E., Laukens, K., Van Onckelen, H., Swennen, R., &<br />
Panis, B. (2007) Proteomics 7, 92–105.<br />
40. Pedreschi, R., Vanstreels, E., Carpentier, S., Robben, J., Noben, J. P., Swennen, R.,<br />
Lammertyn, J., Vanderleyden, J., & Nicolaï, B. M. Proteomics 7, 2083–2099.
18<br />
Web-Based Tools for Protein Classification<br />
Costas D. Paliakasis, Ioannis Michalopoulos, and Sophia Kossida<br />
Summary<br />
Current proteomics technologies generate large amounts of data, among which the investigator<br />
has to identify the promising diagnostic/prognostic biomarkers as well as potential<br />
therapeutic targets. For the latter, classification of proteins into meaningful families is<br />
needed. Current databases, featuring a high level of interconnectivity (cross referencing),<br />
provide the tools necessary to bring various data together, facilitating protein classification<br />
and elucidation of protein function and interoperativity. This chapter provides guidelines<br />
for exploring, by means of web-based tools, the information-rich peptide sequences generated<br />
by proteomics methodologies, with the objective of predicting<br />
potential protein function. After proper preprocessing (e.g., for internal repeats) of a query<br />
protein sequence, known domains can be identified, which aid in dividing the query into<br />
smaller meaningful parts. Any unclassified remainder of the protein provides the material<br />
for low-level comparative analysis for the discovery of distant homologues or candidate<br />
novel domain types to be verified experimentally.<br />
Key Words: protein classification; domain families; recurrent tertiary structural<br />
motifs; sequence–structure relationships; (protein) structural evolution; protein database;<br />
homology searches; domain inference; protein structure redundancy.<br />
1. Introduction<br />
From the times of the “one man-one gene” approach, when individuals were<br />
working on single protein sequences, which were decoded from the corresponding<br />
DNA sequences, to the era of high-throughput techniques, when<br />
massive automated procedures produce large numbers of peptide sequences,<br />
one task remains virtually the same: individual protein sequences need classification.<br />
We humans have an amazing instinctive capability to categorize<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
objects, even the most complex ones, which in particular can be categorized<br />
along various kinds of natural or arbitrary schemes. Proteins feature<br />
multiple attributes, such as sequence, structure, function, organelle specificity,<br />
evolutionary origin, affinity, isoelectric point, and size (not to mention tissue<br />
specificity and antigenicity in higher organisms), all of which offer means for<br />
classification. For instance, 2D gel spots corresponding to proteins, which have<br />
been separated in terms of their size and isoelectric point, reflect a primary<br />
attempt for classification; affinity (e.g., nucleoprotein, lipoprotein, metalloprotein,<br />
etc.) and function (e.g., enzyme, carrier) offer another basis for classification,<br />
both relating to the chemistry of a protein, and basic spectroscopic<br />
data, like those of circular dichroism (which suggest an estimate of the relative<br />
amounts of β-stranded vs. α-helical structure), permit classification to the all-α,<br />
all-β or mixed α/β classes. However, classification schemes based on general<br />
attributes (e.g., the physicochemical properties of proteins) suffer from heterogeneity<br />
within their classes. For instance, a number of otherwise unrelated<br />
proteins can be classified as “metalloproteins.”<br />
In general, two requirements with opposing effects should be satisfied by<br />
any classification scheme: specificity, which leads to particularization (i.e., a<br />
higher number of narrower classes) and abstraction, which leads to generalization<br />
(i.e., a smaller number of wider classes). In the end, a comprehensive<br />
and useful hierarchy is a trade-off between specificity and abstraction (i.e.,<br />
the most general classes possible that are still useful in some desired way).<br />
Proteins, the structures of which represent successful solutions to the problem<br />
of thermodynamic stability and at the same time can accommodate a biologically<br />
useful function, provide the basis of all kinds of radiant variation at the<br />
level of protein sequence (and consequently function). Each protein variant,<br />
that survives the evolutionary pressure of competition against other potential<br />
variants, has emerged after a series of modifications of various extents; an<br />
explanation is presented later on why this is the preferred mode of action.<br />
Common ancestry classification schemes provide the specificity necessary to<br />
define sensible protein classes, in contrast to those classification schemes,<br />
which follow general features. In the former, all members of each class<br />
share a common tertiary structure across very wide evolutionary spans, while<br />
similarities at the level of amino acid sequence remain exploitable, even in<br />
cases where they are hard to detect. Therefore, evolution-based classification<br />
schemes are not driven by our natural impulse to categorize objects drawing<br />
arbitrary borderlines, but reflect basic principles of the protein nature. In<br />
fact, classification with respect to evolutionary history and structure comes<br />
so naturally, that when function is not preserved, we tend to refer to a<br />
“-like” form within the same family of proteins, rather than to a different<br />
family.
Protein sequences derived from a common ancestor by divergent evolution<br />
share a high degree of similarity (both with each other and naturally with their<br />
ancestor, although the latter may be unknown). This similarity persists over<br />
quite a wide evolutionary span, before it is worn out by divergence and rendered<br />
undetectable by direct pair-wise sequence alignments. Conveniently, it is highly<br />
unlikely that proteins without common evolutionary origin share a high degree<br />
of similarity; in fact, the higher the similarity the more recent the speciation.<br />
It will be shown how these nearest relatives provide the guidelines to identify<br />
the features that are crucial for the definition of a family of proteins, before<br />
the detection of the most remote relationships is attempted. In conclusion,<br />
the amino acid sequence offers a highly specific key to classification, although<br />
intermediary members and structure may need to be consulted before any<br />
remote members of a class can be detected.<br />
The evolution-based classification schemes, as well as the tools available<br />
over the web to explore them, constitute the subject of the following notes.<br />
Many researchers in the relevant fields tend to take simple homology searches<br />
and domain assignment tools for granted, until an unexpected outcome casts<br />
doubt and causes confusion; it is the authors’ intention that by the end of this chapter,<br />
the reader will be able to conduct those (otherwise routine) tasks with a<br />
higher degree of both awareness and confidence.<br />
2. Materials<br />
The procedure of protein classification comprises several more or less<br />
independent steps. Although these steps have been arranged (in the present<br />
notes) in the order they are usually employed, this order can change, depending<br />
on the nature of information available at each point. Steps can also be omitted,<br />
if they are unnecessary or their target has already been accomplished (although<br />
performing them will provide further reassurance). Each of the steps described<br />
is a small protocol in its own right; a number of web tools (some of them in<br />
several variations) implement each of those steps. However, improvements<br />
in user friendliness on the one hand and in users’ skills on the other have made the<br />
procedure look like a single protocol; in fact, automation sometimes hides<br />
a number of steps of which only the results can be viewed, in the form of a<br />
compiled web page. Instead of listing the websites of all relevant tools, a small<br />
and comprehensive selection of entry points is suggested in Table 1, via which<br />
a wealth of tools is then accessible. All of those websites provide user-friendly<br />
interfaces. It is suggested that the reader browse (and become familiar with) at<br />
least those main websites before attempting to delve deeper into the realm of<br />
web-based analysis tools.
Table 1<br />
Main Entry Points to the World Wide Web for Protein Classification<br />
ExPASy<br />
www.expasy.org<br />
A wide range of software tools for the analysis of protein sequences and<br />
structures as well as 2D PAGE, can be found here. It also offers an entry<br />
point to a rich collection of other web sites, mainly the SwissProt/UniProt<br />
databases<br />
BLAST<br />
www.ncbi.nlm.nih.gov/BLAST<br />
A convenient starting point for on-line search of sequence databases (both<br />
protein and DNA ones). Many other sites feature some version of BLAST as<br />
well<br />
EnsEMBL<br />
www.ensembl.org<br />
A collection of complete genomes, which offers an entry point from a different<br />
view – that of a genome rather than that of a sequence<br />
Pfam<br />
www.sanger.ac.uk/software/pfam<br />
A collection of profiles of protein families against which a sequence can be<br />
matched, for initial domain recognition<br />
Protein Data Bank<br />
www.pdb.org and www.rcsb.org<br />
The archive of experimentally determined 3D-structures (by crystallography,<br />
NMR, and other techniques) of biological macromolecules (proteins, nucleic<br />
acids, sugars, etc.)<br />
InterPro<br />
www.ebi.ac.uk/interpro<br />
An effort to integrate information from several diverse sources to a unified<br />
comprehensible form<br />
3. Methods<br />
3.1. Theoretical Issues: Classification Based on Sequence or<br />
Structure<br />
The specifics that define a set of sequences as a protein family (i.e., molecular<br />
function and involved amino acid residues, other kinds of sequence fingerprints,<br />
post-translational modification, etc.) have to be accommodated within<br />
a structural framework (Fig. 1). However, 3D structure is not reserved for one<br />
protein family. In fact, there seems to be a countable set of spatially local<br />
packing arrangements between α-helices and β-sheets, which, when combined,
Fig. 1. Complex shapes can be misclassified by a general property like size, because<br />
of small (or larger) parts missing in relation to the simplest forms from which they derive.<br />
More specific (“shape-related”) attributes can bring all stars (and parts thereof) together,<br />
as they can do with triangles, squares, and circles. Once a proper overall scheme is<br />
in place, general attributes (like color) can then detail the distribution within each class.<br />
lead to 3D structural assemblages, stable in terms of thermodynamics and<br />
useful in terms of function (1). The participant elements may be distant along<br />
the sequence or they may even belong to different chains. The small number<br />
of packing options leads to the occurrence of common 3D structural themes,<br />
termed the recurrent tertiary motifs, e.g., “up-and-down” helical bundles,<br />
β-barrels, etc. Descriptions at this level of abstraction take into account neither the<br />
sequential order of the helices and strands nor their length. Tertiary structural<br />
domains in proteins of unrelated evolutionary origin (or function), with apparently<br />
unrelated sequences, may adopt the same tertiary motif (usually including<br />
further 3D structural elements) [(2); see also Note 1]. It can be claimed that<br />
the abstract idea of a recurrent tertiary motif leans toward the basic packing<br />
arrangements, whereas the implemented domains are closer to the protein families.<br />
The 3D environment of certain positions on the structure (a different set<br />
of positions for each recurrent tertiary structural motif) poses physicochemical
5-vdef sNIR[enpvtpwnpeps]<br />
: * : * + : *+<br />
R1: A PVID PT AYID PE ASVI G<br />
R2: E VTIG AN VMVS PM ASIR S[degm]<br />
R3: P IFVG DR SNVQ DG VVLH A[letineegepiednivevdgkey]<br />
R4: A VYIG NN VSLA HQ SQVH G<br />
R5: P AAVG DD TFIG MQ AFVF -<br />
R6: K SKVG NN CVLE PR SAAI -<br />
R7: G VTIP DG RYIP AG MVVT - <br />
-------------------------<br />
CNS: a VfIG DN vyIa pQ AvVh(g|s) (Consensus)<br />
BS#1 T1 BS#2 T2 BS#3<br />
Fig. 2. The seven repeats that form the β-helix in MT-CA demonstrate the level of<br />
the impact that structure can have on sequence. The β-strands (groups of four residues)<br />
are shown, separated from the intervening “turns” (groups of two). The turns that<br />
connect successive repeats are split–one residue at the left end and a second one, which<br />
is missing in some cases, at the right end. Parts of the sequence in square brackets<br />
[] are intervening connecting loops; the part in angle brackets follows this core<br />
motif and is not part of the repeat sequence. The His residues that coordinate the Zn<br />
atom are underlined, and stem from positions (within the repeat) marked by a plus<br />
sign (+). A partial repeat (every six positions) has been proposed on the basis of other<br />
sequences that adopt this structure; the positions marked by stars (*) correspond to main<br />
positions in this (partial) repeat, and the ones marked by colon (:) correspond to the<br />
secondary ones. No repetition of this kind (i.e., every six positions) is apparent for any<br />
other positions, leaving the 17–18 residues long repeat unit as the only complete one.<br />
Positions Asn10–Arg12 (top row) form a small extension of the β-sheet #3; preceding<br />
residues are shown for completeness and only to emphasize that the repeat does not<br />
extend in them. In the consensus, drawn at the bottom row, the main ingredients of the<br />
repeat unit are shown in capital letters.<br />
requirements (which can usually be best met by one or a few amino acid types),<br />
thus defining a scale of preferences (3). These preferences are reflected onto<br />
patterns that may arise at the level of the primary sequences (that adopt the<br />
relevant recurrent tertiary motifs), whenever these spatially defined positions<br />
are close along the sequence (Fig. 2). It should be noted that these patterns are<br />
reflections along the sequence of the abstract tertiary theme and that they are<br />
much more general than the detailed protein family-specific sequence fingerprints.<br />
Simplified lattice models suggest that a small number of 3D structural<br />
motifs set loose requirements that can be met by a large number of sequences,<br />
along their evolutionary pathway (4). In this case, nature appears to reuse a
successful structural solution in evolutionarily unrelated sequences (see Note<br />
2). On the other end, a large number of 3D structural motifs pose requirements<br />
so manifold and exact that only a few sequences can be compatible with them.<br />
The resultant patterns of preferences along the sequence appear occasionally<br />
strong enough to permit structural motif prediction from the sequence alone (5).<br />
It can be claimed that no more than 200 recurrent tertiary structural motifs<br />
(the exact number depending on the stringency of their definition) provide the<br />
structural basis of perhaps 95% of the nonredundant set of protein structures<br />
(2). The average residue coverage is a much smaller figure due to the need<br />
of additional structural elements to complete a domain. Vice versa, a large<br />
number of tertiary structural motifs are so rare, that they provide the basis of the<br />
small remaining proportion of protein structures (see Note 3). Detailed specialization<br />
into families takes place within this structural framework: Chothia (6)<br />
has long ago estimated that 95% of the protein information to be discovered<br />
will derive from no more than 1000 protein families. In fact, for a substantial<br />
(and growing) proportion of any newly identified protein sequences, enough<br />
information already exists in the databases to build a 3D model (7). The<br />
reason for this lies in a simple fact: during the creation of new protein<br />
families, the relatively small number of structural alternatives directs nature<br />
to a strong preference for the reuse of already successful solutions at the<br />
level of sequence (not structure), especially when similar problems are to be<br />
solved rather than discovering new ones, on the basis of the same or different<br />
structure. The traits inherited through the reuse of sequences are usually the<br />
ones exploited in protein classification. On the other hand, this small<br />
set of structural motifs, the ones easily accessible to protein families of unrelated<br />
origin and/or function, occasionally leads otherwise unrelated proteins to<br />
elevated sequence similarity scores (which sometimes appear too high to be<br />
explained by chance), just because they fold in the same manner (see Note 4).<br />
The traits being developed (as opposed to being inherited) reflect convergent<br />
evolution.<br />
Protein structure has also served as the basis of classification in some<br />
schemes. However, the theoretical considerations, which have been discussed<br />
herein (in particular, the fact that unrelated proteins may fold in the same way),<br />
hint that classification on the basis of 3D structure alone, will tend to be on a<br />
coarser scale. On the other hand, the availability of detailed structural data for a<br />
(preferably representative) member of a protein family, experimentally derived<br />
by means of X-ray crystallography or NMR spectroscopy, besides all kinds of<br />
facilitation reserved for other procedures (e.g., structure-based protein design),<br />
offers a valuable aid in sequence-based classification. It provides a very solid<br />
ground to assess any sequence-based classification, and a great tool to detect<br />
the most remote members. However, unless classifying protein structure per se
(rather than proteins in their entirety), it appears that a common structural architecture<br />
alone is not sufficient evidence to classify proteins in the same class.<br />
Evolutionarily refined variants of tertiary structural domains, “similar-yet-different”<br />
within a given repertoire, appear in different combinations with those<br />
of other repertoires: a domain for a different cofactor or regulatory factor<br />
(e.g., GDP vs. ADP) may be combined with a catalytic domain for a slightly<br />
different substrate (fructose vs. glucose). Thus, the most complicated and best<br />
tuned series of (simpler) functions, necessary for life, can be accomplished<br />
in a spatially ordered and life-efficient manner. On the other hand, this fact<br />
makes it essentially imperative that any classification proceed down to the level of<br />
domains: it suffices to describe any sequence in question as comprising “an<br />
N-terminal domain of type X and a C-terminal domain of type Y, joined by a<br />
loop region of type Z”; otherwise, extensive subtyping and the “Russian doll”<br />
effect (see Note 5) will soon be confronted.<br />
In practice, the classification procedure starts in the form of the detection<br />
of some similarity between a protein (or part thereof) and a prototype (e.g.,<br />
a profile extracted from a multiple alignment or a structure through which<br />
it is threaded), which is too high to explain by chance alone. The tools to<br />
demonstrate this similarity are presented under Subheading 3.2; in any case,<br />
it is the network of similarities within a set of data (sequences, structures,<br />
etc.) that will clarify the underlying reason for the observed similarity.<br />
3.2. The Practical Side<br />
It cannot be stressed enough that most protein sequences are nowadays translations<br />
of relevant nucleic acid sequences. It is important to identify cDNA<br />
originals if possible, to ensure that the employed nucleic acid sequence corresponds<br />
to protein in a reliable way. When the original data are supplied in<br />
the form of genomic DNA fragments, introns could still be included and alternative<br />
splicing remains a possibility. Current gene recognition programs like<br />
GeneScan (8), normally expected in genome-oriented databases like EnsEMBL<br />
(9) (see Note 6), can efficiently detect and remove introns, but errors may still<br />
infiltrate. If this is the origin of the protein data, certain precautions should be<br />
taken:<br />
• Search for relevant proteins with reliable sequences, e.g., by means of a preliminary<br />
Basic Local Alignment Search Tool (BLAST) (10) search against SwissProt (11).<br />
• Align the sequence of interest to any trustworthy matches and observe the pattern<br />
of conservation. Sudden insertions in the sequence in question (especially ones with<br />
highly biased composition, short tandem repeats, or repetitions of other parts of the<br />
protein, especially partial ones, etc.) do not necessarily represent extra features or<br />
minidomains; deleted parts may have been mistakenly considered to be introns.
• Isolate “candidate” insertions and try to find similar sequences in the databases; see<br />
if any trustworthy match makes sense in terms of biology.<br />
• Alternatively, try finding a protein in the Protein Data Bank (PDB) (12), which<br />
is similar (even remotely) to the one in question (excluding the insert), and has<br />
its 3D structure experimentally known (see Note 7). The location of the candidate<br />
insertion/deletion on the structure may verify or reject it.<br />
• Parts of the query protein matching expressed sequence tags (ESTs) (13) provide an<br />
extra source of verification (see Note 8): a part matching an EST is an expressed part.<br />
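The alignment check in the precautions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences are supplied as an already computed pairwise alignment (equal-length strings with "-" for gaps), and the function name `find_insertions` and the 10-residue cutoff are hypothetical choices, not part of any tool named in this chapter.

```python
def find_insertions(aligned_query, aligned_ref, min_length=10, gap="-"):
    """Report long query segments that face gaps in a trusted reference.

    Both arguments are rows of one pairwise alignment (equal length).
    Returns (start, end, segment) tuples; coordinates are 0-based and
    half-open, counted on the ungapped query.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    ungapped = aligned_query.replace(gap, "")
    insertions = []
    qpos = 0          # position in the ungapped query
    run_start = None  # start of the current insertion run, if any
    for q, r in zip(aligned_query, aligned_ref):
        in_insert = q != gap and r == gap
        if in_insert and run_start is None:
            run_start = qpos
        elif not in_insert and run_start is not None:
            if qpos - run_start >= min_length:
                insertions.append((run_start, qpos, ungapped[run_start:qpos]))
            run_start = None
        if q != gap:
            qpos += 1
    # An insertion may run to the very end of the alignment.
    if run_start is not None and qpos - run_start >= min_length:
        insertions.append((run_start, qpos, ungapped[run_start:]))
    return insertions
```

Each reported segment is a candidate to be isolated and searched on its own, as described in the list above; a short cutoff will also flag ordinary loop-length differences, so the threshold should be adapted to the family at hand.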
Other criteria may apply to verify the integrity of a processed putative gene.<br />
For example, if the protein has been biochemically characterized, then any<br />
experimentally observed property must match that of the sequence<br />
predicted from the gene (or there must be a good reason why it does not).<br />
Another very serious issue is the fact that many annotations are automatically<br />
transferred between similar sequences of the same or different databases.<br />
Even SwissProt entries are crowded with annotations assigned “by similarity.”<br />
The number of proteins with primary annotations is many orders of magnitude<br />
smaller than the number of annotated sequences in the current databases.<br />
These annotations should be considered as hints that can direct experiments to<br />
promising routes rather than secure data.<br />
3.2.1. Preprocessing the Query<br />
A preliminary check up of the protein sequence itself is recommended.<br />
Repeats and parts of low complexity are of particular interest.<br />
3.2.1.1. REPEATS<br />
Regularities in biological macromolecular structure (like the helical nature<br />
of DNA or the super-coiled structure of some protein assemblies) and multimerization<br />
create room for repetitions along the protein sequences. Repeats can<br />
range in length from a few amino acid residues to complete domains (e.g., as<br />
a result of domain duplication).<br />
In the latter case, the repetition count is usually small, just two to three<br />
copies (14), although much higher counts do occur. When catalytic domains are<br />
repeated, the situation may have no basis in structural regularities; it may,<br />
for instance, reflect a need for efficiency (e.g., cooperativity between different<br />
copies of a domain). In database searches for multidomain protein queries,<br />
it is anyway recommended to treat different domains separately, for reasons<br />
explained later on; the difference here lies in the fact that the separate copies<br />
can be aligned, and their consensus (or profile) can be extracted and serve as<br />
the query.
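As a minimal sketch of how a consensus might be extracted from aligned copies of a repeat or domain, the Python function below takes the most frequent residue in each alignment column. The function name `consensus`, the 50% threshold, and the upper/lower-case convention are assumptions made for illustration; profile-building tools use position-specific scores rather than a single letter per column.

```python
from collections import Counter


def consensus(aligned_repeats, threshold=0.5, gap="-"):
    """Return a one-letter-per-column consensus of equal-length aligned rows.

    Upper case: the most common residue occurs in at least `threshold`
    of the rows. Lower case: it is the most common residue but falls
    below the threshold. A gap is emitted where gaps dominate a column.
    """
    if len({len(row) for row in aligned_repeats}) != 1:
        raise ValueError("all aligned repeats must have the same length")
    out = []
    for column in zip(*aligned_repeats):
        residue, count = Counter(column).most_common(1)[0]
        if residue == gap:
            out.append(gap)
        elif count / len(column) >= threshold:
            out.append(residue.upper())
        else:
            out.append(residue.lower())
    return "".join(out)
```

For example, the five toy rows `VTIG`, `VTVG`, `VFIG`, `AYIG`, and `AAVG` give `VtIG`: three columns are well conserved and one is only weakly so.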
On the other hand, short tandem repeats (e.g., about 10 amino acid residues<br />
long or shorter) normally reflect some structural regularity. In a dot-plot<br />
style alignment of a protein sequence to itself they manifest themselves as a<br />
(moderate-to-high) number of tracks, which run parallel to the main diagonal<br />
(and to each other) in a regular manner (Fig. 3). Since combinations of parts<br />
coming from different tracks produce significant alternative alignments, procedures<br />
that attempt to report all possible alternative alignments between two<br />
proteins will be severely confounded (see Note 9 on BLAST in particular).<br />
A consensus or a profile may be extracted again by a proper alignment of<br />
the repeats. However, statistically significant matches cannot be expected for<br />
a resultant query only (say) 6 or 12 amino acid residues long. One possible cure<br />
is to concatenate a small number of repeats, to produce a query no longer than<br />
50 amino acid residues (see Note 10 on why 50). The small number of repeats<br />
(e.g., four repeats of length 11) helps avoid the explosion of alternatives,<br />
although a few of them will not be completely avoided. If this step is taken, it is<br />
suggested that the output of a dot-plot utility (such as DOTLET, a Java-based<br />
tool hosted on the ExPASy server; Table 1) be consulted at all times.<br />
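To make the idea of parallel off-diagonal tracks concrete, here is a small Python sketch of a self dot-plot. It is a simplification, not a description of how DOTLET works: windows are compared by counting identical residues rather than by a substitution matrix, and the names `self_dotplot` and `diagonal_profile` as well as the default window and cutoff are hypothetical.

```python
def self_dotplot(seq, window=5, min_identities=4):
    """Return the set of (i, j) window starts, i < j, whose windows of
    length `window` share at least `min_identities` identical positions."""
    n = len(seq) - window + 1
    hits = set()
    for i in range(n):
        for j in range(i + 1, n):
            same = sum(a == b for a, b in
                       zip(seq[i:i + window], seq[j:j + window]))
            if same >= min_identities:
                hits.add((i, j))
    return hits


def diagonal_profile(hits):
    """Count hits per off-diagonal offset (j - i). Peaks at regularly
    spaced offsets suggest a tandem repeat with that period."""
    counts = {}
    for i, j in hits:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts
```

For three tandem copies of a 10-residue unit, the profile has entries only at offsets 10 and 20, which correspond to the tracks running parallel to the main diagonal. The quadratic loop is adequate for single proteins but not for long sequences.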
3.2.1.2. PARTS OF LOW COMPLEXITY<br />
Low complexity occurs when some part of the sequence comprises only<br />
a few types of amino acid residues, leading database queries to nonspecific<br />
results (see Note 11); the situation can be even worse if some of these types<br />
are similar to each other. In general, it is important to know beforehand any<br />
significant deviations of the composition in types of amino acid residues, as<br />
well as the presence of special features such as signal peptides or groupings<br />
of biologically relevant charged side chains (see Note 12). Relevant search<br />
procedures, like BLAST (10), detect stretches of low complexity and offer<br />
to ignore them during the search; however, what appears to be a part of low<br />
complexity may be, e.g., a transmembrane stretch. The action to take depends<br />
on both the importance and the position of the stretch:<br />
• If a single transmembrane part makes sense (or is known to exist), the extra- and<br />
intracellular moieties can be separate queries.<br />
• A signal peptide (especially when located at the extreme of the N-terminus) usually<br />
can be excluded from the procedure, profitably or at least without problem.<br />
• A stretch of low complexity, which appears to be of no special significance in terms<br />
of structure/function/evolution, is best left for the search procedure to mask.<br />
Relevant tools are available from the Web (e.g., the ExPASy site). Alternatively,<br />
a simple dot-plot style alignment of the protein sequence can be run vs. itself.<br />
Besides repeats, this will reveal areas of low complexity as square blocks of<br />
elevated average score, symmetrical around the main diagonal (Fig. 3). If low
complexity occurs within the boundaries of a repeat, similar square blocks will<br />
appear around relevant parallel off-diagonal tracks.<br />
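A rough way to flag stretches built from only a few residue types is to compute the Shannon entropy of the composition in a sliding window. The Python sketch below does this; it is not the masking algorithm used by BLAST, and the window length of 12 and the cutoff of 2.2 bits are arbitrary assumptions chosen for illustration (both names are hypothetical as well). Note that it treats all residue types as equally different, so it will underestimate the problem when the few types present are similar to each other.

```python
import math
from collections import Counter


def window_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    total = len(segment)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(segment).values())


def low_complexity_regions(seq, window=12, max_entropy=2.2):
    """Return merged (start, end) intervals, 0-based and half-open, covered
    by windows whose composition entropy is at most `max_entropy`."""
    regions = []
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) <= max_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent window: extend the last region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions
```

The reported intervals are candidates only; as noted above, a transmembrane stretch or a signal peptide can look like low complexity and calls for a different treatment.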
3.2.2. Inference of Domains<br />
In the spirit of the theoretical analysis earlier in this chapter, classification<br />
can take the form of assigning parts of the sequence to domains. Hence, using<br />
a domain inferring tool like the ones offered by Pfam (15) and SMART (16)<br />
should be among the first steps for classification of a protein, based on its<br />
sequence (see Note 13). This information serves to divide the sequence of<br />
interest into pieces and handle them separately (see Note 14).<br />
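Dividing the sequence into pieces can be sketched as follows, assuming the domain boundaries have already been obtained from a domain inference tool and are supplied by hand as (name, start, end) tuples with 0-based, half-open coordinates. The function name `split_by_domains`, the label "unassigned", and the 30-residue minimum for keeping an unclassified remainder are illustrative assumptions, not output conventions of Pfam or SMART.

```python
def split_by_domains(seq, domains, min_remainder=30):
    """Split a sequence into assigned domains and unassigned remainders.

    `domains` is a list of (name, start, end) tuples, 0-based half-open.
    Remainders shorter than `min_remainder` residues are dropped.
    Returns (label, start, end, subsequence) tuples in sequence order.
    """
    pieces = []
    cursor = 0  # end of the region already accounted for
    for name, start, end in sorted(domains, key=lambda d: d[1]):
        if start - cursor >= min_remainder:
            pieces.append(("unassigned", cursor, start, seq[cursor:start]))
        pieces.append((name, start, end, seq[start:end]))
        cursor = max(cursor, end)
    if len(seq) - cursor >= min_remainder:
        pieces.append(("unassigned", cursor, len(seq), seq[cursor:]))
    return pieces
```

Each returned piece can then serve as a separate query, with the unassigned pieces being the natural material for the more sensitive searches against other databases.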
Given the high coverage achieved by those collections (more than 75% of<br />
the proteins have at least one domain recognized by them, and on average about<br />
two-thirds of the length of a protein can be described this way) (15), some<br />
protein sequence classification efforts end here (see Note 15). In fact, database<br />
search procedures should soon be expected to exploit high-level features<br />
extracted from the query and relevant sequences, resorting to amino<br />
acids alone only for parts where such attempts fail.<br />
3.2.3. Querying Other Databases<br />
Despite the current high coverage of protein sequences in terms of known<br />
domains, parts of these sequences still elude classification. These parts may simply be<br />
too distant members of the families they belong to, and may therefore have failed the<br />
thresholds of automatic procedures. Those parts should be isolated, properly<br />
preprocessed (mainly for compositional biases), and queried against SwissProt<br />
and PDB.<br />
• Entries (records) in SwissProt (11) offer rich annotation and cross-references to a<br />
number of resources, all in a mainly human-readable form and via a<br />
user-friendly interface. The high level of curation (including annotation derived<br />
by similarity) will save duplicate effort and may provide valuable hints on how to<br />
move on.<br />
Fig. 3. (Continued) (A) Schematic representation of a dot-plot style alignment of<br />
a protein against itself; to depict the special cases presented in the text, the protein<br />
is supposed to feature two copies of some domain, a low complexity N-terminus and<br />
a C-terminal part dominated by some short internal repeat, except for a tail, which<br />
appears unique. (B) Alignment of a small part (from a real protein) of low complexity<br />
against itself. The situation here is worse than suspected, because the few types of<br />
amino acid residues are related to each other (alanine to valine and glycine; to proline<br />
and serine to a lesser extent).
• A search for similar sequences in PDB (12) will reveal experimentally determined<br />
3D structures of protein instances, possibly related (e.g., through evolution) to the<br />
protein of interest. A 3D structure offers a model to think with (even before a model of the query<br />
sequence is built from this information), a toy on which to visualize<br />
and handle data in far more efficient ways (see Note 16).<br />
If domains are inferred by the relevant procedures (or supplied by SwissProt<br />
annotation) and/or long stretches (say 30–40 amino acid residues or longer) of<br />
special behavior are observed, it is a good idea to handle each sequence part<br />
separately, or in small meaningful combinations. For instance, there may be no<br />
reason to treat, say, a propeptide separately from the main body of the domain<br />
it belongs to (see Notes 17 and 18).<br />
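One reason for searching with separate parts is the dependence of the E-value on the size of the search space. The sketch below uses the standard relation between a bit score S' and the expected number of chance hits, E = m × n × 2^(−S'), where m is the query length and n the database length. It is a rough illustration only: the effective-length corrections applied by actual BLAST implementations are ignored, and the database size, protein lengths, and bit score are hypothetical.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits for a given bit score,
    using E = m * n * 2**(-S') without effective-length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


DB_LEN = 100_000_000     # hypothetical database size in residues
BIT_SCORE = 38.0         # hypothetical score of a marginal domain match

e_full = expect_value(BIT_SCORE, query_len=900, db_len=DB_LEN)    # whole protein
e_domain = expect_value(BIT_SCORE, query_len=150, db_len=DB_LEN)  # domain alone

print(f"whole protein as query: E = {e_full:.3g}")
print(f"domain alone as query:  E = {e_domain:.3g}")
print(f"ratio: {e_full / e_domain:.1f}")
```

For the same alignment score, the E-value scales with the query length, so trimming a 900-residue query to the 150-residue domain that actually matches lowers the E-value sixfold in this simplified picture.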
If a few top hits of a database search can be aligned to the query with<br />
confidence, and the next ones are marginal (see Note 18), the output of a<br />
multiple alignment of the best hits (including the query) should be converted<br />
to some kind of profile [e.g., a position-specific scoring matrix (PSSM)] and<br />
the database should be scanned for the resulting profile (see Note 19). The<br />
marginal hits of the initial query (i.e., the protein of interest) that match positions<br />
conserved throughout the profile will have their statistical significance increased<br />
and they will surface. If domain inferring programs can detect some kind<br />
of domain on those (initially marginal) hits, this information can then be<br />
transferred to the initial query with confidence (recall: the query is a part on<br />
which no domain was detected).<br />
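To make the notion of a profile concrete, the following sketch derives a simple log-odds PSSM from a gapless block of aligned sequences and scores candidate segments against it. It is a minimal illustration, not the procedure used by any particular server: a uniform background frequency and a flat pseudocount are assumed, and the toy alignment and function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length
    aligned sequences, assuming a uniform background distribution."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Gap characters and unknown symbols are simply ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum of per-column log-odds scores for an ungapped segment."""
    return sum(column.get(res, 0.0) for column, res in zip(pssm, segment))


alignment = ["ACDE", "ACDF", "ACDE"]   # hypothetical aligned top hits
pssm = build_pssm(alignment)
print(score(pssm, "ACDE"), score(pssm, "WWWW"))
```

A segment that agrees with the conserved columns scores well above one that does not, which is why marginal hits matching those columns gain significance when the database is scanned with the profile.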
The few top hits will sometimes be marginal (see Note 18). Each of the<br />
“best” marginal hits should be used as a query and a number of homologues<br />
(about 10; see Note 20) should be collected and aligned without the initial<br />
query (i.e., protein of interest). A profile (e.g., a PSSM) should<br />
be produced from each of those alignments and the relevant part of the initial query (i.e.,<br />
protein of interest) should be aligned against it. If the initial query matches<br />
the profile at conserved positions (see Note 21), the hit was not fortuitous.<br />
Again, if domain inferring programs can detect some kind of domain along the<br />
sequences that formed the profile, this information can then be transferred to<br />
the initial query with confidence.<br />
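The check for agreement at conserved positions can be sketched as follows. The example assumes that the query has already been aligned to a block of homologues (same length, gaps written as "-") and considers exact identity only; the conservation cutoff, the toy block, and the function names are hypothetical.

```python
from collections import Counter


def conserved_columns(block, min_fraction=0.8):
    """Map column index -> consensus residue for columns in which a single
    residue type is present in at least `min_fraction` of the sequences."""
    conserved = {}
    for col in range(len(block[0])):
        residue, count = Counter(seq[col] for seq in block).most_common(1)[0]
        if residue != "-" and count / len(block) >= min_fraction:
            conserved[col] = residue
    return conserved


def conserved_match_fraction(block, aligned_query, min_fraction=0.8):
    """Fraction of conserved columns at which the query carries the consensus."""
    conserved = conserved_columns(block, min_fraction)
    if not conserved:
        return 0.0
    matches = sum(1 for col, res in conserved.items() if aligned_query[col] == res)
    return matches / len(conserved)


# Hypothetical block of homologues collected around a marginal hit.
block = ["GDSLAKH", "GDTLGKH", "GDSIAKH", "GDALSKH", "GESLAKH"]
print(conserved_columns(block))
print(conserved_match_fraction(block, "GDQLPKH"))  # agrees at conserved columns
print(conserved_match_fraction(block, "AEQIPRN"))  # disagrees at all of them
```

A query that differs from the block only at variable columns still matches every conserved column, whereas a fortuitous hit tends to miss them.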
Other databases provide high-level annotation for specific tasks. InterPro<br />
(17) offers a convenient entry point to a number of them, especially for manual<br />
sequence classification (as opposed to some massive automated procedure).<br />
SuperFamily (18) builds information based on classification of 3D structures (a<br />
hit here implies structural similarity regardless of common function or evolutionary<br />
origin). PRINTS (19) and PROSITE (20) are further examples, and one may continue with a<br />
long list in which each member targets a specific problem (e.g., if the protein of<br />
interest is found to be a peptidase, MEROPS (21) may be consulted for further<br />
relevant classification).
4. Notes<br />
1. It is often just a simple operation (e.g., a function) that is built by (part of) the<br />
sequences as 3D domains. For instance, there are tertiary structural domains,<br />
which simply bind a cofactor and feature an allosteric position, where some<br />
regulatory factor (e.g., ADP) will dock to exert its role. The active site may<br />
reside on a separate domain, or may be shared between two of them, within the<br />
range of the cofactor.<br />
2. Unpublished work (C.D.P., Ph.D. thesis) in continuation of (3) suggests that the<br />
requirements set – albeit too vaguely – by an α-helical “up-and-down” bundle,<br />
which is an abundant tertiary structural motif, raise the relevant parts of the<br />
sequence to the extreme 0.1–1% of a suitable distribution, when proteins in<br />
a databank are scored for compatibility. This shift is not enough for structure<br />
prediction from the sequence alone (too many false positives), but it still reflects<br />
a possibly minimal set of requirements posed by the structure for compatible<br />
sequences.<br />
3. There is a tendency to treat the observed structural solutions, i.e., the recurrent<br />
tertiary structural motifs and domains, as the end evolutionary product of our<br />
days. In fact, all the preceding evolutionary steps (as well as the future ones,<br />
probably) had to employ one of the solutions provided in this relatively narrow<br />
set. If we depict this set, so that similar architectures are close to each other,<br />
then “evolution” is a “walk” through this set. Whether this set is continuous or<br />
partitioned in a discontinuous manner, is the subject of ongoing research.<br />
4. A continuum is thus established in the scale of similarities between protein<br />
sequences: on one end, the small biases due to simple facts (e.g., two transmembrane<br />
pieces are coincidentally matched); remote similarities due to common<br />
structural architecture, in the middle of the scale; and on the other end, 30% (or<br />
more) identity observed due to common origin of a protein from a mammal to<br />
a bacterial homologue (and, usually, more than 80%, e.g., between mammals,<br />
etc.).<br />
5. This effect characterizes the situation in which a particular domain includes a<br />
smaller one, plus some extra structural elements (“decorations”); then, the new<br />
total constitutes part of a larger domain, which includes some further structural<br />
elements, and so on. Orengo and coworkers (2) have presented a number of<br />
examples in their series of papers on classification of protein structure.<br />
6. The version of BLAST featured in EnsEmbl can run against the results of<br />
GENSCAN; this does not simply translate genomic DNA into Open Reading<br />
Frames (ORFs) before comparison, but it also attempts to “splice” it, after<br />
predicting and removing potential introns. Other task-specific databases feature<br />
relevant tools.<br />
7. The version of BLAST at the National Center for Biotechnology Information<br />
(NCBI) has access to all protein sequences of known structure. Alternatively,<br />
the PDB resource (Table 1) can be directly accessed for this purpose, losing<br />
however the interconnection to other databases offered by NCBI.<br />
8. As in Note 7, access by the means provided by NCBI is recommended.
9. For example, BLAST seeks all the instances where a small part from the<br />
query matches the protein of interest. Then to form longer alignments, BLAST,<br />
depending on its version, either expands these “seed-alignments” to contiguous<br />
subalignments, uninterrupted by gaps, which are then joined in all valid combinations,<br />
or expands the seeds in a gapped alignment fashion. The presence<br />
of short repeats may make the output particularly hard to follow, due to the<br />
numerous alternatives.<br />
10. Sander and Schneider (22) suggest that the minimum percentage of identity<br />
between two proteins that is required to imply structural similarity converges<br />
to about 27% for a common alignment length of about 80 amino acids. However,<br />
the change in the range of 50–80 is too small to justify inclusion of further repeats,<br />
which would increase the number of alternative alignments. See also Note 18.<br />
11. For instance, assume that a stretch, about 20 amino acids long or longer, is<br />
dominated by leucine, isoleucine, and perhaps a couple of phenylalanines. Not<br />
only will this part be nonspecifically matched to any sequence that features a<br />
similar deviation in composition, but the resulting alignment will also appear<br />
unstable in this part, because of the numerous and almost equivalent alternative<br />
ways in which two stretches of the kind can be aligned.<br />
12. For example, a large deviation toward lysine and alanine will make the sequence<br />
look like a histone. Scanning a databank for similar peptide sequences, the<br />
results will tend to include nonspecific stretches rich in positive (and negative<br />
to a lesser extent) charges, in general.<br />
13. The NCBI/BLAST Server (Table 1) offers CDD (conserved domain databank),<br />
which is based on both Pfam and SMART, further including collections internal<br />
to NCBI. Other servers may offer similar compilations. However, for detailed<br />
inquiries one may need to resort to the original resources. The information<br />
presented by the original collection can be much richer. Furthermore, each<br />
specialized collection offers tools for flexible searches in terms of combinations<br />
of various domains, to help detect proteins of similar architecture, reference<br />
similarities to other related domains, and so on.<br />
14. The fact that tertiary structural domains tend to behave independently should be<br />
exploited. Bench work can usually be facilitated by studying isolated domains,<br />
e.g., if some part of a protein makes the molecule hard to crystallize, the relevant<br />
information (if available) could indicate which part to remove. Information<br />
derived using domain inferring tools can serve to divide a sequence of interest<br />
into meaningful pieces.<br />
Bioinformatics work may benefit similarly, e.g., during database<br />
searches: assume for example that a protein includes a general hydrolase domain<br />
(e.g., an esterase), which is found in many combinations with other domains,<br />
which particularize its use; and it also contains a domain, which is specific for<br />
the family this sequence belongs to. It will be the latter that will boost the most<br />
relevant sequences to the top of the sorted list of BLAST results; accordingly,<br />
it will be the one to drive the query protein to the correct subfamily within the<br />
framework of a larger family.
15. In the case of multidomain proteins, each hit to a constituent domain (or a<br />
significant part of it), signifies the existence of a related part in the databank.<br />
Occasionally, some domains will seem to be missing: either the relevant<br />
part of the sequence appears deleted or an expected domain is not recognized<br />
along it. Given the statistical nature of the recognition procedure and the<br />
nucleotide nature of underlying primary data, the tempting conclusion that this<br />
domain/part is not present, is by no means secure.<br />
• If the relevant part of the sequence is present, you may check whether<br />
domains, which were recognized by domain inference programs along remote<br />
homologues of this part, can be transferred by means of alignment involving<br />
preformed multiple alignments, as described in Subheading 3.2.3 for the case<br />
of remote hits.<br />
• If the relevant part of the sequence seems absent, then despite the efficiency<br />
of genetic data manipulation procedures, parts of the sequence may have been<br />
accidentally considered as introns. Once some major part of a multidomain<br />
protein has been located on the complete genome, the hits should serve as<br />
pointers to the locations that deserve a more careful search. Perhaps the next generation<br />
of data-mining will perform this retro-search of missing parts automatically<br />
(like the iterative BLAST is performed today). Until then, and in spite of<br />
the times of high-level annotation (which will retrieve the major part of the<br />
information being hunted) one should be ready for straightforward TBLASTN<br />
of minor parts of the sequence in hand to rule out their existence conclusively<br />
and beyond reasonable doubt.<br />
16. When an experimentally determined 3D structure for a similar sequence exists<br />
in PDB, the sequence of interest and the matching structure can be submitted<br />
to some automated model building server (like the SwissModel Server; some<br />
servers may also need a ready-made alignment between the two) to obtain a<br />
3D structural approximation of the query protein. If nothing else, inspection<br />
of this model will explain any mutational data available and will reveal key<br />
locations for experimentation by means of site-directed mutagenesis and other<br />
kinds of modification and querying (instead of blind trials along the sequence),<br />
in order to infer the mechanism of function or other valuable information. If<br />
the quality of the alignment is poor, but both the sequence and the structure<br />
can be aligned to e.g., a profile, this intermediary link can mediate alignment<br />
between the protein of interest and the distantly related sequence of known<br />
structure. Alternatively the remote match may serve as the query to retrieve<br />
further sequences homologous to the hit, in order to align the original query to<br />
their preformed multiple alignments, as it is described under Subheading 3.2.3.<br />
17. The expectancy value (E-value) provided with the sorted hit list by BLAST<br />
depends on the product of the length of the database and the length of the<br />
query. Assuming that matching counterparts exist for just one of the domains<br />
and that this domain comprises a small part of the total protein, BLAST may
miss matching hits of marginal similarity, just because the length product was<br />
unnecessarily (thanks to domain independence) too large.<br />
18. The expectancy value should be regarded as only a rough measure. It would<br />
be a more accurate measure of the expected number of hits, if databases were<br />
nonredundant (i.e., they contained absolutely nonhomologous sequences) and<br />
there were no biases toward specific types of amino acid residues or toward<br />
sequence patterns (e.g., the amphipathic ones met in α-helices, which account<br />
for about one quarter of protein structure in general). Besides, Sander and<br />
Schneider (22) have long shown that as soon as a subalignment of a given size<br />
exceeds a relevant level of identity, 3D structural similarity can be assumed,<br />
independently of the length of the proteins which participate in the comparison<br />
or the number of sequences which the query is compared to. They suggest a<br />
threshold $t(L) = 290.15 \times L^{-0.562}$ for L < 80 and about 27% for L > 80; cases<br />
with identity level higher than t(L) assume related structure, allowing only a<br />
small acceptable number of false positives. Alignments lying below<br />
the line defined by the equation above do not necessarily<br />
signify proteins of unrelated structure. For them, structural similarity, if it exists,<br />
simply cannot be asserted with confidence. Similarity becomes more and more<br />
improbable as the relevant figures decrease.<br />
19. Details on how to make or use a PSSM may change with implementation. It<br />
is worth spending some time on the on-line help offered on PSSM under their<br />
implementation at the NCBI. In any case, Clustal (23) may be used to align a<br />
sequence to a block of prealigned sequences, or even to two preformed multiple<br />
alignments. In both cases, if conserved positions in the “reference” block are<br />
conserved along the query sequence (or the query block) the match is reliable.<br />
Pfam (15) offers the tools for another approach involving hidden Markov models,<br />
the explanation of which is beyond the scope of the present notes.<br />
20. Following the results of Henikoff and Henikoff (24,25), it seems that about 10<br />
homologues are usually already enough, with the reservation that they should<br />
cover, if possible, the whole range of similarities from 90% down to 30–40%. If<br />
all of them are too similar to each other, it will be as if the same sequence was<br />
included 10 times. If all of them are too dissimilar to each other, then the risk<br />
of mistakes in their multiple alignment will be too high.<br />
21. As a reassurance, if a hit is correct, some of the sequences that are<br />
homologous to the hit should have appeared in the hit list of the initial search<br />
(i.e., the one in which the protein of interest was the query sequence). If just<br />
one protein from a large family was reported, chances are that the hit was<br />
coincidental.<br />
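The length-dependent identity threshold of Note 18 can be evaluated numerically as in the sketch below. It assumes the power law is read as t(L) = 290.15 × L^(−0.562), with t in percent identity, and takes the plateau for longer alignments from the text (about 27%); the function names are hypothetical. Note that the power law itself evaluates to roughly 25% near L = 80, so the plateau is kept as an adjustable parameter rather than a fixed constant.

```python
def identity_threshold(length, plateau=27.0):
    """Percent identity above which structural similarity is assumed for an
    alignment of `length` residues: the power law below 80 residues and a
    constant plateau (value as quoted in the text) from 80 upward."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length < 80:
        return 290.15 * length ** -0.562
    return plateau


def implies_structural_similarity(percent_identity, length, plateau=27.0):
    """True if the alignment lies above the threshold curve.
    False does not prove the structures are unrelated (see Note 18)."""
    return percent_identity > identity_threshold(length, plateau)


for length in (20, 40, 60, 79, 100):
    print(f"L = {length:3d}: t(L) = {identity_threshold(length):.1f}%")
```

Short alignments therefore need a much higher identity than long ones before structural similarity can be asserted.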
References<br />
1. Richardson J.S. and Richardson D.C. (1989) “Principles and patterns of protein<br />
conformation.” In: Fasman G. (ed) “Prediction of Protein Structure and the<br />
Principles of Protein Conformation.” Plenum Press, NY, pp 1–98.
2. Orengo C.A. and Thornton J.M. (2005) “Protein families and their evolution – a<br />
structural perspective.” Annu. Rev. Biochem. 74, 867–900.<br />
3. Paliakasis C.D. and Kokkinidis M. (1992) “Relationships between sequence and<br />
structure for the four-α-helix bundle tertiary motif in proteins.” Protein Eng. 5,<br />
739–748.<br />
4. Lattman E.E., Fiebig K.M. and Dill K.A. (1994) “Modeling compact denatured<br />
states in proteins.” Biochemistry 33, 6158–6166.<br />
5. Lupas A., vanDyke M. and Stock J. (1991) “Predicting coiled-coils from protein<br />
sequences.” Science 252, 1162–1164.<br />
6. Chothia C. (1992) “One thousand families for the molecular biologist.” Nature<br />
357, 543–544.<br />
7. Schwede T., Kopp J., Guex N. and Peitsch M.C. (2003) “SWISS MODEL:<br />
an automated protein homology modeling server.” Nucleic Acids Res. 31,<br />
3381–3385.<br />
8. Burge C. and Karlin S. (1997) “Prediction of complete gene structures in human<br />
genomic DNA.” J. Mol. Biol. 268, 78–94.<br />
9. Hubbard T., Andrews D., Caccamo M., et al. (2005) “Ensembl 2005.” Nucleic<br />
Acids Res. 33, D447–D453.<br />
10. Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W. and<br />
Lipman D.J. (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein<br />
database search programs.” Nucleic Acids Res. 25, 3389–3402.<br />
11. Bairoch A., Apweiler R., Wu C.H., Barker W.C., Boeckmann B., Ferro S.,<br />
Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A.,<br />
O’Donovan C., Redaschi N. and Yeh L-S.L. (2005) “The universal protein resource<br />
(UniProt).” Nucleic Acids Res. 33, D154–D159.<br />
12. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H.,<br />
Shindyalov I.N. and Bourne P.E. (2000) “The protein data bank.” Nucleic Acids<br />
Res. 28, 235–242.<br />
13. Boguski M.S., Lowe T.M.J. and Tolstoshev C.M. (1993) “dbEST – database for<br />
expressed sequence tags.” Nature Genet. 4, 332–333.<br />
14. Apic G., Gough J. and Teichmann S.A. (2001) “Domain combinations in archaeal,<br />
eubacterial and eukaryotic proteomes.” J. Mol. Biol. 310, 311–325.<br />
15. Bateman A., Coin L., Durbin R., Finn R.D., Hollich V., Griffiths-Jones S., Khanna<br />
A., Marshall M., Moxon S., Sonnhammer E.L.L., Studholme D.J., Yates C. and<br />
Eddy S.R. (2004) “The Pfam protein families database.” Nucleic Acids Res. 32,<br />
D138–D141.<br />
16. Letunic I., Copley R.R., Pils B., Pinkert S., Schultz J. and Bork P. (2006) “SMART<br />
5: domains in the context of genomes and networks.” Nucleic Acids Res. 34,<br />
D257–D260.<br />
17. The InterPro Consortium; Mulder N.J., Apweiler R., Attwood T.K., et al. (2005)<br />
“InterPro, Progress and Status in 2005.” Nucleic Acids Res. 33, D201–D205.<br />
18. Madera M., Vogel C., Kummerfeld S.K., Chothia C. and Gough J. (2004) “The<br />
SUPERFAMILY database in 2004: additions and improvements.” Nucleic Acids<br />
Res. 32, D235–D239.
19. Attwood T.K., Bradley P., Flower D.R., Gaulton A., Maudling N., Mitchell A.L.,<br />
Moulton G., Nordle A., Paine K., Taylor P., Uddin A. and Zygouri C. (2003)<br />
“PRINTS and its automatic supplement, prePRINTS.” Nucleic Acids Res. 31, 400–402.<br />
20. Hulo N., Bairoch A., Bulliard B., Cerutti L., de Castro E., Langendijk-Genevaux<br />
P.S., Pagni M. and Sigrist C.J.A. (2006) “The PROSITE database.” Nucleic Acids<br />
Res. 34, D227–D230.<br />
21. Rawlings N.D., Morton F.R. and Barrett A.J. (2006) “MEROPS: the peptidase<br />
database.” Nucleic Acids Res. 34, D270–D272.<br />
22. Sander C. and Schneider R. (1991) “Database of homology-derived protein structures<br />
and the structural meaning of sequence alignment.” Proteins: Struct. Funct.<br />
Genet. 9, 56–68.<br />
23. Thompson J.D., Higgins D.G. and Gibson T.J. (1994) “CLUSTAL W: improving<br />
the sensitivity of progressive multiple sequence alignment through sequence<br />
weighting, positions-specific gap penalties and weight matrix choice.” Nucleic<br />
Acids Res. 22, 4673–4680.<br />
24. Henikoff S. and Henikoff J.G. (1992) “Amino acid substitution matrices from<br />
protein blocks.” Proc. Natl. Acad. Sci. USA 89, 10915–10919.<br />
25. Henikoff S. and Henikoff J.G. (1993) “Performance evaluation of amino acid<br />
substitution matrices.” Proteins: Struct. Funct. Genet. 17, 49–61.
19<br />
Open-Source Platform for the Analysis of Liquid<br />
Chromatography-Mass Spectrometry (LC-MS) Data<br />
Matthew Fitzgibbon, Wendy Law, Damon May, Andrea Detter, and<br />
Martin McIntosh<br />
Summary<br />
The analysis of protein mixtures by liquid chromatography-mass spectrometry (LC-<br />
MS) requires tools for viewing and navigating LC-MS data, locating peptides in LC-MS<br />
data, and eliminating low-quality peptides. msInspect, an open source platform, can carry<br />
out these steps for single experiments and can align and normalize peptide features<br />
in comparative studies with multiple LC-MS runs. In addition, msInspect can analyze<br />
quantitative studies with and without isotopic labels to generate peptide arrays.<br />
Key Words: liquid chromatography-mass spectrometry; peptide identification;<br />
filtering; alignment; quantitation.<br />
1. Introduction<br />
msInspect is an open-source platform comprising algorithms and visualization<br />
tools that process liquid chromatography-mass spectrometry (LC-<br />
MS) data files to locate peptides in two dimensions [time and mass over<br />
charge (m/z)] and perform various analyses on them (1). msInspect can be<br />
used for:<br />
• Visually inspecting LC-MS spectra and peptide features<br />
• Automatically locating peptide features in high mass accuracy MS spectra<br />
• Filtering peptide features by various quality measures<br />
• Quantitating label-free peptide features between experiments via alignment and<br />
normalization of the data to create a peptide array<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
370 Fitzgibbon et al.<br />
• Identifying isotopically labeled pairs [e.g., isotope coded affinity tagging (ICAT),<br />
stable isotope labeling with amino acids in cell culture (SILAC)] for quantitative peptide<br />
analysis within a single experiment<br />
• Comparing and developing MS feature-finding algorithms<br />
msInspect implements multiple algorithms specifically designed for LC-MS<br />
data.<br />
The signal processing component exploits the two-dimensional nature of<br />
the data to identify coeluting isotopes and then groups them based on the<br />
similarity of the observed isotopic distributions to those of naturally occurring<br />
peptides. The alignment method estimates the underlying nonlinear mapping of<br />
retention times between experiments. The normalization approach (2) adapts<br />
methods developed for genomic arrays to accommodate natural variation of<br />
LC-MS signal intensities across runs. Ultimately, the goal of msInspect is to<br />
mine LC-MS data and to produce peptide arrays that can then be analyzed<br />
using tools traditionally applied to genomic arrays. msInspect also contains<br />
a complete Accurate Mass and Time (AMT) analysis workflow (3). These<br />
analytical techniques combine LC-MS and LC-MS/MS data in order to expand<br />
peptide coverage and enhance the confidence of peptide identifications.<br />
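The grouping of coeluting isotopes relies on the regular spacing of isotopic peaks along the m/z axis: for a peptide ion of charge z, neighboring isotope peaks lie about 1.00335/z apart (the mass difference between 13C and 12C divided by the charge). The sketch below infers a charge state from that spacing. It illustrates the principle only and is not the algorithm implemented in msInspect; the tolerance, the maximum charge, and the example peak lists are hypothetical.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C, in Da


def infer_charge(mz_peaks, max_charge=6, tol=0.01):
    """Infer the charge state from the mean spacing of isotope peaks.

    Returns the charge whose expected spacing (C13_C12_DELTA / z) is
    closest to the observed mean spacing, or None if no charge up to
    `max_charge` fits within `tol` (in m/z units)."""
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        return None
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    best = None
    for z in range(1, max_charge + 1):
        error = abs(mean_spacing - C13_C12_DELTA / z)
        if error <= tol and (best is None or error < best[1]):
            best = (z, error)
    return best[0] if best else None


print(infer_charge([500.27, 500.77, 501.27]))  # peaks about 0.5 m/z apart
print(infer_charge([800.40, 801.40, 802.41]))  # peaks about 1.0 m/z apart
```

High mass accuracy matters here because the expected spacings for neighboring charge states become very close as the charge increases.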
2. Materials<br />
To run msInspect the Java Runtime Environment must be installed. To<br />
perform alignment of multiple runs, the R environment must also be installed.<br />
Both of these programs must be properly configured and on the computer’s<br />
PATH. Information on acquiring these software packages is provided in<br />
Subheading 2.1 below. Please contact your local IT systems support group for<br />
details on installing this software properly.<br />
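A quick way to confirm that both prerequisites are reachable through the PATH is sketched below. It assumes the executables are named `java` and `R`, which is the usual case but may differ on a particular installation; the function name is hypothetical.

```python
import shutil


def check_prerequisites(commands=("java", "R")):
    """Return a dict mapping each command name to its resolved path,
    or to None if the command cannot be found on the PATH."""
    return {name: shutil.which(name) for name in commands}


for name, path in check_prerequisites().items():
    print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If either entry is reported as not found, the PATH should be corrected before launching msInspect or attempting an alignment of multiple runs.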
msInspect reads mass spectra from files in the open mzXML format (4). For<br />
background on mzXML and information about converting data from particular<br />
instruments to mzXML see Note 1.<br />
2.1. Software<br />
1. msInspect is written in platform-independent Java and requires that the Java<br />
Runtime Environment, version 1.5 or later, be installed and on the computer’s<br />
PATH. Installation of Java Runtime Environment will also install the latest<br />
version of Java Web Start, which will allow msInspect to be run without<br />
needing to explicitly install it or update it as new versions are released<br />
(see Note 2).<br />
a. Windows, Linux, and Solaris users can download “J2SE 5.0” from<br />
http://java.sun.com/j2se/1.5.0/download.jsp.<br />
b. Macintosh users running Mac OS X v10.4 or later can download Java from<br />
http://www.apple.com/support/downloads.
Open LC-MS Analysis Platform 371<br />
2. To align multiple runs into a peptide array, the R environment for statistical<br />
computing, version 2.1.0 or later, must be installed and on the computer’s PATH.<br />
R executables for various operating systems are available from http://www.r-project.org.<br />
2.2. Hardware<br />
msInspect will run on any computer that supports the software listed in<br />
Subheading 2.1. For large input files, typical of high mass accuracy measurements,<br />
feature extraction can require several hundred megabytes of memory<br />
(see Note 3). msInspect has been tested on computers running Windows XP,<br />
GNU Linux, and Mac OS X with at least 1 GB of main memory.<br />
2.3. Data Files<br />
msInspect will open any version 2.0 mzXML file containing MS1<br />
data. However, msInspect was designed using high-resolution liquid<br />
chromatography-electrospray ionization-time of flight mass spectrometer data<br />
so it may not perform as well with an mzXML file from another type of<br />
mass spectrometer (e.g., a matrix-assisted laser desorption-time of flight mass<br />
spectrometer).<br />
Sample mzXML files that may be used to follow all of the steps in Section 3<br />
are available on the Web (see Note 4).<br />
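Before loading a file, it can be useful to confirm that it actually contains MS1 scans. The sketch below counts the `scan` elements of an mzXML document by their `msLevel` attribute, ignoring the XML namespace so that it does not depend on a particular schema revision. The function name and the embedded sample document are hypothetical; a real file path can be passed in place of the in-memory sample.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_scans_by_level(source):
    """Count <scan> elements per msLevel in an mzXML file.

    `source` may be a file name or a binary file object. Elements are
    processed as they are closed, so nested scans are counted too."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]   # strip the namespace, if any
        if tag == "scan":
            counts[int(elem.get("msLevel", "0"))] += 1
            elem.clear()                    # release memory for large files
    return counts


# Tiny synthetic document for demonstration only (no peak data).
SAMPLE = b"""<?xml version="1.0"?>
<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_2.0">
 <msRun scanCount="3">
  <scan num="1" msLevel="1" peaksCount="0"/>
  <scan num="2" msLevel="1" peaksCount="0">
   <scan num="3" msLevel="2" peaksCount="0"/>
  </scan>
 </msRun>
</mzXML>"""

counts = count_scans_by_level(io.BytesIO(SAMPLE))
print(f"MS1 scans: {counts[1]}, MS2 scans: {counts[2]}")
```

A file reporting no MS1 scans would not be a suitable input for the steps that follow.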
3. Methods<br />
3.1. Viewing and Navigating LC-MS Data<br />
1. Launch msInspect from http://proteomics.fhcrc.org/download/tools/msInspect/<br />
viewer.jnlp by clicking on “Launch msInspect with Java Web Start.” “Fred<br />
Hutchinson Cancer Research Center” must be accepted as a trusted software<br />
publisher for the download to be completed.<br />
2. Upon launching msInspect, the Open File dialog box will automatically open.<br />
Browse for the mzXML file to be viewed, select the file, and left click the<br />
Open button (see Note 5). You may load a different mzXML file by selecting<br />
File > Open from the main msInspect menu bar.<br />
3. The msInspect window (Fig. 1) contains several panes for viewing and navigating<br />
the MS run:<br />
a. An image of the MS run will be displayed in the Image Pane (the largest<br />
pane in the center of the msInspect window).<br />
b. The Properties Pane (left side of the window) will display detailed information<br />
from the mzXML file loaded. This pane will later be used to<br />
display details of individual peptide features. It can be hidden with<br />
Windows > Show/hide properties.
c. The Detail Pane is on the right side of the window and the Chart Pane is<br />
at the bottom part of the window. Each provides a more detailed view of a<br />
region of the spectrum. The Detail Pane provides a zoomed view of the area<br />
selected in the full Image Pane. The Chart Pane plots intensity versus m/z<br />
(to show the isotopes in a single scan) or intensity versus scan (to show the<br />
elution profile of a single isotope).<br />
4. Hold the mouse cursor over a location in the Image Pane. A floating tag will<br />
appear displaying the scan number and m/z coordinates of that position.<br />
5. Areas containing peptide features in the Image Pane will appear dark. Left click<br />
in a dark area of the image where there appear to be many peptide features as<br />
shown in Fig. 1.<br />
a. The Detail Pane (right) shows a detailed view of the area selected. Feature<br />
finding is automatically launched in this area, and after a few seconds of<br />
computation, detected peptide features are circled. Xs indicate the monoisotopic<br />
peaks in each feature (see Note 6).<br />
b. To see detailed information about a detected peptide, position the mouse<br />
cursor over the monoisotopic peak. A floating tag will display scan<br />
number, m/z (followed by mass in parentheses), inferred charge state,<br />
Fig. 1. msInspect window showing the Properties Pane (top left), Image Pane (top<br />
center), Detail Pane (top right), and Chart Pane (bottom).
intensity/background intensity/median intensity, and the first and last scan<br />
for the feature.<br />
c. The Chart Pane (bottom) displays the m/z spectrum for the scan corresponding<br />
to the vertical red line in the Detail Pane.<br />
6. Zoom in on features in the Chart Pane by highlighting a desired area. To do<br />
this, anchor the mouse cursor by left clicking at the top left corner of the desired<br />
area and continue to hold down the left mouse button while dragging the mouse<br />
cursor down and to the right. When the mouse button is released, the chart will<br />
be redrawn to produce a magnified view of the selected area (see Note 7). To<br />
restore the original chart, left click anywhere in the Chart<br />
Pane and drag the cursor up or to the left.<br />
7. Select “elution” from the drop-down menu at the top of the Chart Pane to display<br />
an elution profile plot. This display shows peaks along the scan axis rather than the<br />
m/z axis. Note that the Detail Pane now displays a horizontal line corresponding<br />
to the m/z value for the profile as shown in Fig. 2.<br />
8. Zoom in on the Image Pane by right clicking and selecting a<br />
magnification value from the list (e.g., 200%).<br />
Fig. 2. msInspect window displaying an elution profile plot in the Chart Pane and<br />
corresponding horizontal line in the Detail Pane.
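The difference between the two Chart Pane views (steps 5c and 7) can be illustrated with a small sketch. It assumes that a run is held as a list of scans, each a list of intensities on a common m/z grid; this layout and the function names are illustrative assumptions and do not describe msInspect's internal data structures.

```python
# Hypothetical layout: run[scan_index][mz_index] = intensity.

def spectrum_at_scan(run, scan_index):
    """Intensities along the m/z axis for one fixed scan (the default view)."""
    return list(run[scan_index])


def elution_profile(run, mz_index):
    """Intensities along the scan axis for one fixed m/z bin (the "elution" view)."""
    return [scan[mz_index] for scan in run]


# Four scans, three m/z bins; the middle bin holds a peptide that elutes
# over several scans.
run = [
    [0, 1, 0],
    [0, 5, 1],
    [0, 9, 2],
    [0, 4, 1],
]

print(spectrum_at_scan(run, 2))  # fixed scan, varying m/z
print(elution_profile(run, 1))   # fixed m/z, varying scan
```

In this picture, the vertical line in the Detail Pane selects a scan and the horizontal line selects an m/z value.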
3.2. Locating Peptides in LC-MS Data<br />
A Feature Set file, which lists all of the peptide features detected in a run,<br />
can be generated using one of the algorithms included in the platform (see<br />
Note 8).<br />
1. Under the Tools menu, select two dimensional (2D) Peak Alignment. This is the<br />
default feature-finding algorithm and is recommended for most purposes.<br />
2. To initiate feature finding, select Tools > Find All Features. This will bring up<br />
the Extract Features dialog box as shown in Fig. 3.<br />
3. In the “Save Features to File” field, enter (or browse for) a path and add a name<br />
for the new Feature Set file.<br />
4. Specify a scan range in the “Start Scan” and “End Scan” fields to limit feature<br />
finding to a subset of scans. By default, msInspect will attempt to find peptides<br />
in all scans (see Note 9).<br />
5. Left click the Find Features button to begin the feature finding process. As the file<br />
is processed, the status bar at the bottom of the msInspect window will display<br />
progress. For a large input file, processing may take upwards of 20–30 min.<br />
6. When processing is complete, features will be written to the specified<br />
output file and highlighted as colored crosses in the Image and Detail<br />
Panes. The status bar will display “Finding features complete. See file<br />
yourfilepath\yourfile.peptides.tsv.” Place the mouse cursor over one of the<br />
detected features to display a summary of its properties. Left click on the feature to<br />
view details in the Properties Pane (display by Windows > Show/hide Properties).<br />
7. Select Tools > Display Peptides… to open the Display Features dialog box as<br />
shown in Fig. 4A for customization:<br />
Fig. 3. Extract Features dialog box.
a. Display or hide the colored crosses by checking or unchecking the box under<br />
the “Display” field.<br />
b. Change the color of the crosses by left clicking on the colored box under<br />
the “Color” field. A new color can be selected from a color palette.<br />
c. View the Feature Set browser by left clicking on the “…” button. This<br />
browser lists details of all peptides in the Feature Set. This list can be sorted<br />
and edited, comments can be added to a feature, features can be deleted, and<br />
the modified Feature Set file may be saved (see Note 10).<br />
3.3. Filtering to Eliminate Low-quality Peptides<br />
Low-quality peptides can be removed in msInspect by applying user-specified<br />
filtering criteria (e.g., a minimum number of isotopic peaks detected).<br />
Removing low-quality peptides is particularly helpful when peptide arrays are<br />
to be generated (described in Subheading 3.4.1).<br />
1. Select Tools > Display Peptides….<br />
2. Left click the Filter tab at the bottom of the Display Features dialog box. This<br />
tab displays several parameters by which features can be filtered.<br />
3. Set Min Charge = 1, Min Scans = 3, Min Intensity = 5, Max KL = 1.0, and Min<br />
Peaks = 2 as shown in Fig. 4A (see Note 11).<br />
4. Left click the Apply button. The Detail Pane now shows only the features that<br />
meet these filtering criteria.<br />
5. Save the filtered Feature Set file over the original file by left clicking on the “…”<br />
button at the top right of the Display Features dialog box, then left clicking on<br />
the Save button.<br />
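The effect of the filter settings in step 3 can be sketched outside the graphical interface as follows. The dictionary keys are hypothetical field names chosen for illustration (they are not taken from the Feature Set file format), and treating each threshold as inclusive is likewise an assumption.

```python
# Thresholds from step 3 of this subheading. Key names are hypothetical.
FILTER = {
    "min_charge": 1,
    "min_scans": 3,
    "min_intensity": 5.0,
    "max_kl": 1.0,
    "min_peaks": 2,
}


def passes(feature, f=FILTER):
    """Return True if a feature meets every filtering criterion (inclusive bounds assumed)."""
    return (
        feature["charge"] >= f["min_charge"]
        and feature["scan_count"] >= f["min_scans"]
        and feature["intensity"] >= f["min_intensity"]
        and feature["kl"] <= f["max_kl"]
        and feature["peaks"] >= f["min_peaks"]
    )


# Made-up example features.
features = [
    {"charge": 2, "scan_count": 6, "intensity": 40.0, "kl": 0.2, "peaks": 4},  # kept
    {"charge": 0, "scan_count": 4, "intensity": 12.0, "kl": 0.0, "peaks": 1},  # stray peak
    {"charge": 1, "scan_count": 2, "intensity": 30.0, "kl": 0.5, "peaks": 3},  # too few scans
    {"charge": 3, "scan_count": 5, "intensity": 9.0, "kl": 2.4, "peaks": 3},   # KL too high
]

kept = [feat for feat in features if passes(feat)]
print(len(kept), "of", len(features), "features kept")
```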
3.4. Quantitation of Peptide Features<br />
3.4.1. Quantitation Using Label-free Approaches<br />
Features from multiple experiments can be compared in msInspect by simultaneously<br />
opening Feature Set files from multiple LC-MS runs, displaying them<br />
together, and generating a peptide array. Below are directions for multiple<br />
LC-MS run comparisons after Feature Set files have been produced (as described<br />
above in Subheadings 3.1–3.3) for all LC-MS runs to be compared.<br />
1. Select Tools > Display Peptides….<br />
2. Left click on the Add Files button (Fig. 4A).<br />
Fig. 4. (A) Display Features dialog box with one file loaded and the Filter tab<br />
selected. (B) Display Features dialog box with two files loaded and the Peptide Array<br />
tab selected.
3. Browse to find another Feature Set file (with file extension .peptides.tsv) and open<br />
it. A different colored cross is assigned in the Image Pane to the features from<br />
each newly opened file. In this way, multiple Feature Set files can be opened and<br />
overlaid in the Image Pane (see Note 12).<br />
4. Left click on the Filter tab (Fig. 4A) at the bottom of the Display Features<br />
dialog box and make sure the filter criteria are still set to the values entered<br />
in Subheading 3.3 (Min Charge = 1, Min Scans = 3, Min Intensity = 5, Max<br />
KL = 1.0, and Min Peaks = 2). Left click on the Apply button if any changes are<br />
made.<br />
5. Left click on the Peptide Array tab (Fig. 4B) to set criteria for the peptide array<br />
to be generated:<br />
a. Enter a name for the peptide array file that will be generated. By convention,<br />
this file name should end with “.pepArray.tsv.”<br />
b. Click the Optimize button to have msInspect search for reasonable tolerances<br />
for matching features across runs (see Note 13).<br />
c. Check the Normalization box if normalization of features is desired (2).<br />
d. Click the Calculate button to actually compute the peptide array.<br />
6. The generated peptide array file consists of one column of intensities for each run<br />
and one row for each matched feature. The file is stored in a simple tab-delimited<br />
format, which can be exported (to Excel and other programs) and analyzed using<br />
tools traditionally applied to genomic arrays (see Note 14).<br />
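The structure of a peptide array (one row per matched feature, one intensity column per run) can be illustrated with a simple sketch that pairs features from two runs when their mass and scan values agree within fixed windows. This greedy matching, the field names, and the tolerance values are assumptions made for illustration only; it is not the alignment procedure implemented in msInspect.

```python
def build_array(run_a, run_b, mass_tol, scan_tol):
    """Pair features across two runs; return rows of (mass, intensity_a, intensity_b).

    A feature with no partner in the other run gets None in that run's column.
    """
    rows, used = [], set()
    for fa in run_a:
        best = None
        for j, fb in enumerate(run_b):
            if j in used:
                continue
            close_mass = abs(fa["mass"] - fb["mass"]) <= mass_tol
            close_scan = abs(fa["scan"] - fb["scan"]) <= scan_tol
            if close_mass and close_scan:
                # Prefer the candidate with the smallest mass difference.
                if best is None or abs(fa["mass"] - fb["mass"]) < abs(
                    fa["mass"] - run_b[best]["mass"]
                ):
                    best = j
        if best is None:
            rows.append((fa["mass"], fa["intensity"], None))
        else:
            used.add(best)
            rows.append((fa["mass"], fa["intensity"], run_b[best]["intensity"]))
    # Features seen only in the second run.
    for j, fb in enumerate(run_b):
        if j not in used:
            rows.append((fb["mass"], None, fb["intensity"]))
    return rows


# Made-up features from two runs.
run_a = [
    {"mass": 1000.50, "scan": 100, "intensity": 20.0},
    {"mass": 1500.75, "scan": 200, "intensity": 8.0},
]
run_b = [
    {"mass": 1000.52, "scan": 103, "intensity": 25.0},
    {"mass": 2000.00, "scan": 300, "intensity": 5.0},
]

array_rows = build_array(run_a, run_b, mass_tol=0.05, scan_tol=10)
for row in array_rows:
    print("\t".join("" if v is None else str(v) for v in row))
```

Missing values (empty cells here) are the reason normalization methods that account for non-random missingness, such as the one cited in step 5c, are relevant to peptide arrays.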
3.4.2. Quantitation Using Isotopic Labeling<br />
A common method of relative quantitation of peptides involves applying<br />
heavy and light isotopic labels separately to two samples, then mixing them<br />
prior to collecting LC-MS data. Typically, tandem MS/MS (or MS2) experiments<br />
are used to analyze these labeled samples. Peptide sequencing in<br />
MS/MS can detect the number of labeled residues in each peptide and therefore<br />
determine the expected mass difference between light and heavy forms of each<br />
peptide.<br />
msInspect can perform relative quantitation even in the absence of MS/MS<br />
information. Provided with the mass of the light and heavy reagents and with a<br />
threshold on the number of labeled residues to consider, msInspect will search<br />
for pairs of features consistent with isotopic labeling.<br />
1. Open the file to be analyzed as described in Subheading 3.1.<br />
2. Select Tools > Find All Features.<br />
3. This will again bring up the Extract Features dialog box as shown in Fig. 3.<br />
Enter a new output file name and select a scan range of interest as described in<br />
Subheading 3.2, steps 3 and 4.<br />
4. Note the “Quantitate” check box in this dialog. Selecting this box will enable<br />
several options for relative quantitation.
5. Select one of several common isotopic labeling strategies (e.g., Cleavable ICAT<br />
and 16O/18O) from the pull-down menu. Details can be entered including masses<br />
for light and heavy label reagents, the particular amino acid labeled, and the<br />
maximum number of labeled residues to consider.<br />
6. Left click on the “Find Features” button to locate all features in the specified<br />
scan range. Display features from the Feature Set file as described in Subheading<br />
3.2, step 7. An additional matching step is performed to locate isotopically labeled<br />
pairs. A pair is indicated by a vertical bar connecting the light and heavy partners<br />
in the Detail Pane. Selecting a pair by left clicking in the Detail Pane will display<br />
feature properties including the light and heavy intensities, the ratio of light to<br />
heavy, and the number of isotopic labels detected.<br />
7. The results of this quantitation process are stored in a tab separated value (TSV)<br />
file specified in step 3 of this subheading. One record is written for each isotopically labeled<br />
pair and for each unlabeled peptide (see Note 15).<br />
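The idea of searching for pairs of features consistent with isotopic labeling, without MS/MS information, can be sketched as follows. The sketch assumes that partners share a charge state, elute within a few scans of each other, and differ in mass by an integer multiple (up to a maximum number of labeled residues) of the heavy/light reagent mass difference. The field names, tolerances, and the label mass difference used in the example are illustrative values, not msInspect defaults; the actual reagent masses should be substituted.

```python
def find_pairs(features, label_delta, max_labels, mass_tol, scan_tol):
    """Return (light, heavy, n_labels, light/heavy ratio) for each candidate pair."""
    pairs = []
    for light in features:
        for heavy in features:
            if heavy is light or heavy["charge"] != light["charge"]:
                continue
            if abs(heavy["scan"] - light["scan"]) > scan_tol:
                continue
            diff = heavy["mass"] - light["mass"]
            # Try 1..max_labels labeled residues.
            for n in range(1, max_labels + 1):
                if abs(diff - n * label_delta) <= mass_tol:
                    ratio = light["intensity"] / heavy["intensity"]
                    pairs.append((light, heavy, n, ratio))
                    break
    return pairs


# Made-up features; the first two differ by one illustrative label mass.
features = [
    {"mass": 1200.60, "scan": 150, "charge": 2, "intensity": 30.0},
    {"mass": 1209.63, "scan": 151, "charge": 2, "intensity": 15.0},
    {"mass": 1800.90, "scan": 400, "charge": 3, "intensity": 10.0},
]

pairs = find_pairs(features, label_delta=9.03, max_labels=3,
                   mass_tol=0.02, scan_tol=5)
for light, heavy, n, ratio in pairs:
    print(light["mass"], heavy["mass"], n, ratio)
```

Because any two unrelated features whose masses happen to differ by a multiple of the label difference will also be paired, the output of such a search is only as good as the preceding feature filtering (see Note 15).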
4. Notes<br />
1. More information on the mzXML file format, as well as utilities to convert<br />
native acquisition files from many common MS instruments to mzXML, can be<br />
found on the Sashimi website at http://sashimi.sourceforge.net.<br />
2. Running msInspect via Java Web Start is highly recommended for casual use,<br />
as it greatly simplifies installation and update of the software. msInspect’s<br />
major features, such as feature finding and peptide array creation, are available<br />
from the command line as well, and command-line use is more appropriate<br />
for batch processing of large numbers of mzXML files. To use msInspect<br />
from the command line, the stand-alone JAR file can be downloaded from<br />
http://proteomics.fhcrc.org/CPL/msinspect.html. This web page also allows<br />
download of the msInspect user’s guide, which contains detailed instructions on<br />
installation, using msInspect’s features from the command line, and full source<br />
code for the released version (5).<br />
3. Feature extraction can require a great deal of memory since it operates on several<br />
scans at a time. By default the Java Web Start version of msInspect allows up to<br />
384 MB of memory to be allocated so that a number of scans and intermediate<br />
results may be cached. If additional memory is available on the computer, the<br />
amount of memory accessible by msInspect may be increased when running<br />
msInspect from the command line with the “-Xmx” option when invoking Java.<br />
For example: “java -Xmx512M -jar viewerApp.jar”.<br />
4. Sample data files are available at https://proteomics.fhcrc.org/CPAS. From that<br />
website, follow the “Published Experiments” link on the lower left side and<br />
then left click on the “MiMB Clinical Proteomics” link on the left side. Because<br />
LC-MS files can be quite large, the samples provided for download are only<br />
small subregions of the files used as figures in Section 3. Some browsers, such<br />
as Internet Explorer, may add a “.mzXML.xml” suffix when downloading these<br />
files. This should not affect msInspect’s ability to read the files, and the suffix may be<br />
safely changed to “.mzXML” if desired.<br />
5. The first time a particular mzXML file is loaded, msInspect will write a “.inspect”<br />
file in the same directory where the mzXML file is located. This file contains an<br />
index of each scan in the original file, which will speed subsequent file access.<br />
Construction of this index file can take some time for larger input files; the<br />
status bar at the bottom of the msInspect window will indicate progress.<br />
6. The area shown in the Detail Pane is indicated in the main Image pane by a blue<br />
rectangle. Several aspects of Detail Pane behavior can be adjusted by selecting<br />
Detail Pane Settings from the Tools menu. There, feature detection can be turned<br />
on or off, background noise that falls below a threshold can be hidden, and the<br />
color scheme of the Detail Pane can be modified.<br />
7. Note that in Fig. 1 the Chart Pane clearly shows individual isotopic peaks<br />
because the data is from a high-resolution instrument (in this case a Waters<br />
LCT Premier). msInspect depends on resolving individual isotopes to infer the<br />
charge state of the peptide and therefore its mass. The charge is derived from<br />
the reciprocal of the distance between adjacent peaks. In Fig. 1 the peaks of<br />
the peptide on the left side of the Chart Pane are 0.5 m/z units apart, therefore<br />
msInspect infers that this peptide has a charge of 2. It is not possible to infer a<br />
charge for a single peak, so “stray peaks” that cannot be grouped into an isotopic<br />
cluster are assigned a charge of zero.<br />
8. msInspect includes a number of feature extraction algorithms, which can be<br />
selected in the Tools menu. The default, two dimensional (2D) peak alignment,<br />
is recommended for most purposes. The single scan algorithm may be useful<br />
if there is little or no scan-to-scan coherence. The feature extraction algorithms<br />
in msInspect have been designed to work on high-resolution profile mode data.<br />
The algorithms have been successfully applied to centroided data, but performance<br />
will depend on the particular centroiding algorithm used and on the noise<br />
characteristics of the run under consideration. For such data, the centroided scan<br />
algorithm may be appropriate.<br />
9. Once peptides have been located, some amount of visual curation is recommended.<br />
The Heat Map view (accessed from the Tools menu) can provide a<br />
global view of features grouped by charge state and sorted by various metrics<br />
such as mass or intensity. Each column in the Heat Map view consists of a<br />
small intensity window around each feature, colored from low intensity (red) to<br />
high intensity (yellow). Clicking on a feature in the Heat Map will highlight it<br />
in the other windows. By sorting on KL score or intensity and inspecting a few<br />
features, one can gain a sense of what filtering criteria might be appropriate for a<br />
given data set. When new filter settings are applied, as described in Subheading<br />
3.3, the Heat Map view is automatically updated.<br />
10. A typical example of editing a Feature Set file:<br />
a. Sort by ascending KL score (Left click on the “KL” column header).<br />
b. Find a feature with KL < 1 that was misidentified by examining its spectrum<br />
in msInspect window’s Chart Pane.
c. Double click in the Description field for the feature to add a comment to<br />
the Feature Set List noting that this feature is “questionable.”<br />
d. Click “Save” to save changes by overwriting the old Feature Set file.<br />
11. Filtering peptide features can improve the performance of subsequent steps<br />
such as construction of peptide arrays. Specific filtering criteria will depend on<br />
instrumentation and the experiment goals. The most frequently used filtering<br />
criteria include:<br />
a. Minimum charge – msInspect locates features by first finding peaks and<br />
then grouping them into isotopic distributions consistent with individual<br />
peptides. Some peaks will not group with any others and are referred to as<br />
“stray peaks.” As described in Note 7, it is not possible to infer the charge<br />
state of these stray peaks, so they are assigned a charge of zero. Setting the<br />
minimum charge to 1 when filtering will remove these stray peaks, which<br />
are often due to noise or chemical contaminants.<br />
b. Minimum number of peaks – confidence in the location and charge state<br />
assignment of a peptide feature may be greater if it is supported by<br />
more isotopic peaks. Setting the minimum number of peaks to 2 will also<br />
eliminate the stray peaks described above.<br />
c. Minimum number of scans – set the minimum number of scans that a<br />
peptide must span in order to be considered. This has the effect of eliminating<br />
peptide features that persist for only a brief time.<br />
d. Minimum intensity – setting a minimum intensity threshold is often appropriate,<br />
although the specific value used will depend on the instrument.<br />
e. Maximum KL score – peaks are grouped by how well they match a model<br />
of the isotopic distribution of a peptide with a given mass. The KL score<br />
described in Bellew, et al. (1) measures how much an extracted group of<br />
peaks deviates from this model; in general, a lower KL score indicates a<br />
better match.<br />
12. When multiple feature sets are loaded, it is often useful to hide particular sets<br />
or to change the colors of the crosses that mark features in a given set. Both<br />
of these can be accomplished in the Display Features dialog box as shown in<br />
Fig. 4A (select Tools > Display Peptides). For each feature set, this dialog box<br />
provides a checkbox to control visibility and a color palette to select colors for<br />
the crosses.<br />
13. After optimization, the mass and scan window values that give the best alignment<br />
results automatically populate the Peptide Array tab.<br />
14. A number of high-quality open source tools are available for microarray analysis.<br />
To analyze peptide arrays produced by msInspect, tools from the Bioconductor<br />
project (http://www.bioconductor.org) and from the TM4 microarray software<br />
suite (http://www.tm4.org) have been used.<br />
15. Results from isotopic labeling should be treated as suggestive rather than authoritative.<br />
Without peptide sequence information, the mass difference between<br />
heavy and light partners cannot be definitively ascertained. The quality of the
matching is therefore dependent on the quality of feature filtering and the density<br />
of features in each run.<br />
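The charge inference described in Note 7 can be written out as a short calculation. The sketch assumes that adjacent isotopic peaks of a peptide differ by about 1 Da in mass, so that their spacing on the m/z axis is about 1/z, and that each charge is carried by a proton. The function names and the example m/z values are hypothetical.

```python
PROTON_MASS = 1.00728  # Da, approximate


def infer_charge(peak_mzs):
    """Charge from the reciprocal of the mean spacing between adjacent isotopic peaks.

    A single ("stray") peak has no spacing, so it is assigned a charge of zero.
    """
    if len(peak_mzs) < 2:
        return 0
    spacings = [b - a for a, b in zip(peak_mzs, peak_mzs[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(1.0 / mean_spacing)


def neutral_mass(mz, charge):
    """Neutral mass from the monoisotopic m/z, assuming protonated ions."""
    return (mz - PROTON_MASS) * charge


cluster = [500.25, 500.75, 501.25]  # peaks 0.5 m/z units apart
z = infer_charge(cluster)
print(z, neutral_mass(cluster[0], z))
print(infer_charge([622.0]))  # stray peak
```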
Acknowledgments<br />
The authors would like to thank Matthew Bellew, Marc Coram, Jimmy Eng,<br />
Ruihua Fang, Mark Igra, and Tim Randolph for their intellectual contributions<br />
to the development of msInspect. This work was supported by contract #<br />
23XS144A from the National Cancer Institute.<br />
References<br />
1. Bellew, M., Coram, M., Fitzgibbon, M., Igra, M., Randolph, T., Wang, P.,<br />
May, D., Eng, J., Fang, R., Lin, C.W., Chen, J., Goodlet, D., Whiteaker, J.,<br />
Paulovich, A., and McIntosh, M. (2006) A suite of algorithms for<br />
the comprehensive analysis of complex protein mixtures using high-resolution<br />
LC-MS. Bioinformatics Advance Access published on June 9, 2006<br />
http://bioinformatics.oxfordjournals.org/cgi/reprint/btl276v1.<br />
2. Wang, P., Tang, H., Zhang, H., Whiteaker, J., Paulovich, A.G., and McIntosh,<br />
M. (2006) Normalization regarding non-random missing values in high-throughput<br />
mass spectrometry data. Proceedings of the Pacific Symposium on Biocomputing<br />
11, 315–326.<br />
3. May, D., Fitzgibbon, M., Liu, Y., Holzman, T., Eng, J., Kemp, C.J., Whiteaker, J.,<br />
Paulovich, A., and McIntosh, M. (2007) A Platform for Accurate Mass and<br />
Time Analyses of Mass Spectrometry Data. Journal of Proteome Research 6(7),<br />
2685–2694.<br />
4. Pedrioli, P.G., Eng, J.K., Hubley, R., Vogelzang, M., Deutsch, E.W., Raught, B.,<br />
Pratt, B., Nilsson, E., Angeletti, R.H., Apweiler, R., Cheung, K., Costello, C.E.,<br />
Hermjakob, H., Huang, S., Julian, R.K., Kapp, E., McComb, M.E., Oliver, S.G.,<br />
Omenn, G., Paton, N.W., Simpson, R., Smith, R., Taylor, C.F., Zhu, W., and<br />
Aebersold, R. (2004) A common open representation of mass spectrometry data and<br />
its application to proteomics research. Nature Biotechnology 22(11), 1459–1466.<br />
5. Computational Proteomics Laboratory. msInspect website. Accessed on June 28,<br />
2006 at http://proteomics.fhcrc.org/CPL/msinspect.html.
20<br />
Pattern Recognition Approaches for Classifying<br />
Proteomic Mass Spectra of Biofluids<br />
Ray L. Somorjai<br />
Summary<br />
The statistical classification strategy we have developed for magnetic resonance,<br />
infrared, and Raman spectra for the analysis of biomedical data is discussed, particularly<br />
as it applies to proteomic mass spectra. A general discussion of the current use of<br />
pattern recognition methods is given, with caveats and suggestions relevant for clinical<br />
applicability.<br />
Key Words: visualization; preprocessing; feature selection/extraction; robust<br />
classifier; classifier aggregation; proteomics; mass spectroscopy; magnetic resonance<br />
spectroscopy; biodiagnostics.<br />
1. Introduction<br />
Unlike magnetic resonance spectroscopy (MRS), infrared spectroscopy<br />
(IRS), and Raman spectroscopy (RS) (1,2,3), proteomic mass spectroscopy<br />
(PMS) is a relative newcomer to the field of biodiagnostics. However, with<br />
the goal of discriminating various diseases and disease states, it is a welcome<br />
complementary technique that provides yet another means of analyzing<br />
biofluids. In particular, this complementarity extends the range of characterizing<br />
biofluids, from vibrational states of specific chemical groups (IRS, RS),<br />
through the identification of small molecules (MRS), to proteins and protein<br />
fragments (PMS).<br />
Being an emerging field, PMS suffers from growing-up pains. In particular,<br />
there are experimental difficulties specific to PMS that have yet to be addressed<br />
(see Note 1) (in the following, the author assumes that the spectra, for which<br />
classifiers are to be developed, have been properly “processed”).<br />
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols<br />
Edited by: A. Vlahou © Humana Press, Totowa, NJ<br />
Typically, biomedical data consist of relatively few (of the order 10–100)<br />
samples (patterns) that are initially presented in a very high-dimensional feature<br />
space (feature ≡ m/z intensity), with dimensionality L (dimension ≡ features) of<br />
order 1000–10,000. Unfortunately, these two characteristics lead to two curses<br />
that impede the development of robust classifiers: the curse of dimensionality<br />
and the curse of dataset sparsity (3). The consequence of the two curses is<br />
that the sample to feature ratio (SFR) is 1/10–1/1000, instead of the minimal<br />
5–10, required for robust classification, as is generally accepted by the machine<br />
learning community.<br />
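The arithmetic behind these ratios is simple but worth making explicit. The short sketch below reproduces the quoted SFR range from the sample and feature counts given above and shows what an SFR of 5 would imply for the number of usable features; the function names are illustrative.

```python
from fractions import Fraction


def sfr(n_samples, n_features):
    """Sample-to-feature ratio as an exact fraction."""
    return Fraction(n_samples, n_features)


def max_features(n_samples, target_sfr=5):
    """Largest number of features compatible with a target SFR."""
    return n_samples // target_sfr


print(sfr(100, 1000))     # 1/10
print(sfr(10, 10000))     # 1/1000
print(max_features(100))  # 20
```

In other words, with 100 samples, a dataset of several thousand m/z intensities would have to be reduced to about 20 features or fewer to reach the lower end of the recommended range.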
In this chapter, the author presents the specific strategy [dubbed statistical<br />
classification strategy (SCS)] that we have developed over the last dozen years<br />
to deal with such problems, particularly as they apply to MR, IR, and Raman<br />
spectra. We have been adapting this strategy and applying it with success to<br />
biomedical data derived from both proteomics mass spectra and microarrays<br />
(see Note 2). The author compares the differences and similarities of the SCS<br />
with the proteomics data analysts’ current tools and wherever possible, makes<br />
recommendations.<br />
2. The Statistical Classification Strategy<br />
Lifting the twin curses of high dimensionality and dataset sparsity requires<br />
special approaches. The “strategy” part of the SCS reflects the fact that no<br />
single approach is, or can be, optimal [“there are no panaceas in data analysis”<br />
(4)], and that a data-driven, multistage strategy is necessary or even essential.<br />
Using a divide-and-conquer philosophy, the SCS consists of five stages:<br />
1. Data visualization<br />
2. Preprocessing<br />
3. Feature selection/extraction<br />
4. Robust classifier development<br />
5. Classifier aggregation (ensembles)<br />
The five stages are, of course, intimately interrelated; in particular, we use<br />
the visualization stage to constantly monitor how well the other stages of the<br />
strategy are working. Figure 1 provides a flowchart of the SCS. A more detailed<br />
description of the SCS can be found in (5) (see Note 3).<br />
Fig. 1. Flowchart for the five stages of the SCS: data visualization, preprocessing,<br />
feature selection/extraction, classifier development, and classifier aggregation.<br />
2.1. Visualization of High-Dimensional Data<br />
Proper data visualization is an essential first step that requires dimensionality-reducing<br />
mapping/projection from typically a very large, L-dimensional feature<br />
space to one to three dimensions. Of course, mapping from high dimensions to<br />
lower ones cannot preserve all distances exactly, because most of the original<br />
degrees of freedom are lost. However, if only class separability is required,<br />
exact visualization, our primary goal, is both achievable and sufficient. In<br />
fact, we recently proposed such an approach (6). It involves mapping high-dimensional<br />
patterns to a special plane, the relative distance plane (RDP). The<br />
mapping procedure starts with the selection of a distance measure. This can<br />
range from Euclidean, city block, and maximum norm to Mahalanobis and its<br />
generalization (Anderson-Bahadur, AB) (7). Next, two reference patterns<br />
are chosen, one from each class. The critical observation, on which the RDP<br />
mapping relies, is that the distance of any other pattern to these two reference<br />
points is preserved exactly even after the mapping. This is because a triangle<br />
remains a triangle in any dimension and for any distance metric. Hence, the<br />
three distances of any such triangle can be displayed in two dimensions,<br />
without distortion. By cycling through all possible reference pairs, we can<br />
display and visualize the data with respect to these sets, i.e., from a large number<br />
of possible “perspectives” (as an analogy, consider looking at a sculpture from<br />
every angle to assess its shape and form). This is a very powerful approach for detecting<br />
outliers (e.g., poor quality spectra), discovering additional subgroups within a<br />
class (clustering), and assessing whether training and test sets derive from the same<br />
distributions; in short, for establishing and ensuring quality control.<br />
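The triangle argument can be made concrete with a minimal sketch. It places the first reference pattern at the origin and the second on the horizontal axis, then positions any other pattern so that its distances to both references are reproduced exactly. This is one straightforward way to realize the construction described above; the coordinate convention and function names are assumptions, and the published RDP method (6) may differ in detail.

```python
import math


def euclidean(a, b):
    """Euclidean distance; any metric obeying the triangle inequality could be used."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rdp_map(pattern, ref1, ref2, dist=euclidean):
    """Map a high-dimensional pattern to 2D, preserving its distances to ref1 and ref2.

    ref1 is placed at (0, 0) and ref2 at (d12, 0).
    """
    d12 = dist(ref1, ref2)
    d1 = dist(pattern, ref1)
    d2 = dist(pattern, ref2)
    # Law of cosines gives the coordinate along the ref1-ref2 axis.
    x = (d1 ** 2 + d12 ** 2 - d2 ** 2) / (2.0 * d12)
    # Guard against tiny negative values caused by rounding.
    y = math.sqrt(max(d1 ** 2 - x ** 2, 0.0))
    return x, y


# Made-up five-dimensional patterns.
ref1 = [0.0, 0.0, 0.0, 0.0, 0.0]
ref2 = [2.0, 0.0, 1.0, 0.0, 3.0]
pattern = [1.0, 2.0, 3.0, 4.0, 5.0]

x, y = rdp_map(pattern, ref1, ref2)
d12 = euclidean(ref1, ref2)
print(math.hypot(x, y), euclidean(pattern, ref1))        # distance to ref1
print(math.hypot(x - d12, y), euclidean(pattern, ref2))  # distance to ref2
```

Distances between two non-reference patterns are generally not preserved, which is why cycling through many reference pairs is informative.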
2.2. Preprocessing<br />
Preprocessing enables the user to adapt (“tune”) the data, so that the subsequent<br />
stages of the SCS are optimized. For spectra, whether MS or MR,<br />
we found that the most useful preprocessing approaches, alone or in combination,<br />
are normalization (“whitening,” or scaling to unit area), smoothing<br />
(filtering), and/or peak alignment (with respect to some internal or external
reference). Various transformations of the spectra lead frequently to better<br />
classification. Examples of such transformations include replacing the spectra<br />
by their (numerical) derivatives or by rank-ordered variants (the nonlinear<br />
rank-ordering replaces the original features by their ranks, thus minimizing<br />
the influence of accidentally large or small feature values) and combinations<br />
of these. Furthermore, creating differently preprocessed versions of the same<br />
dataset, selecting different sets of features from these (stage 3), and developing<br />
different classifiers using these feature sets (stage 4) facilitates the aggregation<br />
of these multiple classifiers for possibly increased accuracy (stage 5). The<br />
achieved classifier’s accuracy and reliability are also assessed by visualization<br />
of the results (stage 1). This demonstrates how the strategy uses the stages in<br />
an interactive, feedback fashion.<br />
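Several of the preprocessing operations named above are simple enough to sketch directly. The functions below show scaling to unit area, moving-average smoothing, a first-difference (numerical derivative) transform, and rank ordering. They are minimal illustrations with hypothetical names; the window width and the handling of tied values are assumptions.

```python
def normalize_unit_area(spectrum):
    """Scale intensities so that they sum to 1."""
    total = sum(spectrum)
    return [v / total for v in spectrum]


def smooth(spectrum, half_width=1):
    """Moving average over a window of up to 2*half_width + 1 points."""
    out = []
    for i in range(len(spectrum)):
        lo = max(0, i - half_width)
        hi = min(len(spectrum), i + half_width + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out


def first_difference(spectrum):
    """Numerical derivative as the difference between adjacent points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def rank_transform(spectrum):
    """Replace each intensity by its rank (1 = smallest); ties keep input order."""
    order = sorted(range(len(spectrum)), key=lambda i: spectrum[i])
    ranks = [0] * len(spectrum)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


spectrum = [2.0, 4.0, 100.0, 6.0, 8.0]  # one accidentally large value
print(normalize_unit_area(spectrum))
print(smooth(spectrum))
print(first_difference(spectrum))
print(rank_transform(spectrum))
```

Note how the rank transform reduces the outlying value 100.0 to rank 5, only one step above its neighbor in rank, which is the sense in which rank ordering limits the influence of accidentally large or small values.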
2.3. Feature Selection/Extraction<br />
In general, this stage is one of the two most important components of the<br />
SCS. It is essential not only for dimensionality reduction (which helps lift<br />
the curse of dimensionality) but, when done properly, also for arriving at<br />
biologically relevant and transparent interpretations of the data (“biomarker”<br />
identification). The driving force behind feature selection/extraction (FSE) is<br />
the goal of satisfying one of the two critical requirements for any reliable<br />
classifier development, lifting the curse of dimensionality.<br />
Spectra, whether mass or MR, are peculiar: their “intrinsic dimensionality,”<br />
the number of independent, relevant features they possess, is generally much<br />
smaller than their original dimensionality. This is because spectra have many<br />
irrelevant features (“noise”), and adjacent features are strongly correlated.<br />
Some of these correlated features correspond to spectral peaks, representing<br />
small molecules (MRS), or small proteins, protein fragments, or peptides<br />
(PMS). Thus, it is clearly beneficial to eliminate irrelevant features and<br />
identify discriminatory peaks (potential “biomarkers”). For spectra, principal<br />
component analysis, a frequently used dimension reduction method (often the<br />
principal tool of many PMS data analysts), is doubly dangerous. First, it<br />
“scrambles” the original features, making discriminatory feature identification<br />
and selection problematic; second, since the principal components (PCs) are<br />
ordered according to the maximum variance explained in the data, there is no<br />
guarantee that the first few PCs are discriminatory for classification. Even if<br />
one were to choose the first M ≪ L PCs from the original, total L-term set, these<br />
are rarely the best discriminators. One could try selecting m < M PCs as optimal<br />
for classification (e.g., by exhaustive search); our early experience indicates<br />
that some of the good discriminators are among the remaining k = M + 1,…,L
subset of PCs. All these difficulties point to the need for a feature selection<br />
method specific to spectral data, one that preserves spectral interpretability.<br />
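The second danger, that a large variance need not go with good class separation, can be illustrated with a deliberately simple two-feature example. The sketch compares each feature's total variance with a two-class separation score (squared difference of class means divided by the sum of class variances, often called a Fisher ratio). It does not perform a principal component analysis; it only shows that ranking by variance and ranking by discriminatory power can disagree. The data and function names are made up for illustration.

```python
from statistics import mean, pvariance


def total_variance(class1, class2):
    """Variance of a feature over all samples, ignoring class labels."""
    return pvariance(class1 + class2)


def fisher_ratio(class1, class2):
    """Squared difference of class means over the sum of class variances."""
    return (mean(class1) - mean(class2)) ** 2 / (pvariance(class1) + pvariance(class2))


# Feature A: large spread, classes almost completely overlapping.
a1, a2 = [-10.0, 0.0, 10.0], [-9.0, 1.0, 11.0]
# Feature B: small spread, classes cleanly separated.
b1, b2 = [0.9, 1.0, 1.1], [1.9, 2.0, 2.1]

print("variance A, B:", total_variance(a1, a2), total_variance(b1, b2))
print("Fisher   A, B:", fisher_ratio(a1, a2), fisher_ratio(b1, b2))
```

A variance-ordered method would rank feature A first, although feature B is the only one of the two that separates the classes.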
There are two generic approaches to feature selection (8). The filter method<br />
selects features without consideration of the classifiers to be used with these<br />
features. The wrapper (embedding) method finds optimal features, while using<br />
the eventual classifier to guide the selection method. We have developed a<br />
genetic algorithm-based optimal region selection (GA-ORS) method that finds<br />
discriminatory features without losing spectral interpretability (9).<br />
The GA-ORS is based on the wrapper approach and is an example of feature<br />
extraction. It has the advantage that the spectral ranges found are averaged<br />
over adjacent data points (thus equivalent to peak area determination). Such<br />
averaging increases the signal to noise ratio, a bonus. Within the GA-ORS suite<br />
of programs, one can also control the widths of the selected spectral subregions<br />
(discriminatory peaks); this helps to eliminate those regions that appear to be<br />
discriminatory simply because of accidental differences in the “noise” regions<br />
due to the limited sample size (9,10).<br />
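The feature extraction step, in which each selected spectral subregion is replaced by the average of its adjacent data points, can be sketched as follows. Only the averaging is shown; the genetic algorithm that searches for the regions is not reproduced here, and the region boundaries in the example are arbitrary, hypothetical choices.

```python
def region_features(spectrum, regions):
    """Average the intensities in each (start, end) index range; end is exclusive."""
    return [sum(spectrum[start:end]) / (end - start) for start, end in regions]


# Made-up spectrum with two hypothetical regions of interest.
spectrum = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
regions = [(0, 2), (3, 6)]

print(region_features(spectrum, regions))
```

Each region contributes a single feature, so a spectrum with thousands of data points is reduced to as many features as there are regions, and every feature still maps back to an identifiable m/z range.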
The GA-ORS has been very successful in identifying discriminatory subregions<br />
of MR, IR, and Raman spectra of biofluids and tissues, obtained for<br />
distinguishing between various diseases and disease states (1).<br />
In the context of feature selection, many proteomic mass spectroscopists first<br />
identify “relevant” peaks, sometimes in an ad hoc fashion, as possible contributors<br />
to discrimination. Although using all available “domain knowledge” is very<br />
important and should always be considered when available, it can also introduce<br />
bias, because of possible preconceived notions of what is relevant for discrimination.<br />
Our feature selection approach, sketched above, removes most of such<br />
bias, by identifying hitherto unsuspected, novel discriminatory “peaks,” or more<br />
accurately, discriminatory spectral subregions. Furthermore, by its explicit multivariate<br />
nature, GA-ORS tends to identify a “fingerprint,” a “panel” of peaks whose<br />
simultaneous interaction is necessary for discrimination.<br />
When the multidimensional feature space does not arise from spectra, e.g.,<br />
microarray data or preselected discrete peaks in PMS, for which averaging<br />
adjacent features is not meaningful, direct application of the GA-ORS methodology<br />
may not be appropriate [although we have used it as a preliminary,<br />
clustering-type feature selection “trick” (5)]. However, an exhaustive search<br />
(when feasible) or, failing that, a dynamic programming-based search for optimal<br />
or near-optimal discriminatory feature subsets is still possible and is one of the<br />
options available in GA-ORS.<br />
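A wrapper-style exhaustive search over discrete feature subsets might look like the sketch below.<br />
It is an illustration under stated assumptions, not the GA-ORS implementation: the nearest-centroid<br />
classifier, the leave-one-out scoring, and the toy data are all hypothetical choices.<br />

```python
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`."""
    correct = 0
    for i in range(len(X)):
        by_class = {}
        for j in range(len(X)):
            if j != i:  # the left-out sample never contributes to a centroid
                by_class.setdefault(y[j], []).append([X[j][k] for k in subset])
        cents = {label: centroid(rows) for label, rows in by_class.items()}
        probe = [X[i][k] for k in subset]
        predicted = min(cents, key=lambda label: sq_dist(probe, cents[label]))
        correct += predicted == y[i]
    return correct / len(X)


def best_subset(X, y, size):
    """Score every feature subset of the given size; return the best one."""
    return max(combinations(range(len(X[0])), size),
               key=lambda s: loo_accuracy(X, y, s))


# Hypothetical data: only feature 1 separates the two classes.
X = [[5, 0.1, 3], [1, 0.2, 7], [4, 0.0, 1],
     [2, 1.0, 6], [5, 0.9, 2], [1, 1.1, 4]]
y = [0, 0, 0, 1, 1, 1]
print(best_subset(X, y, 1))
```

The number of subsets grows combinatorially with the number of features, which is why an exhaustive<br />
search is only an option for small feature sets. Because the same samples are used to pick the subset<br />
and to score it, the reported accuracy is optimistic and would still need independent validation.<br />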
Figure 2 demonstrates the importance of feature selection, and the relevance<br />
of an interactive, feedback-mode visualization of data. For the two-class,<br />
prostate cancer vs. healthy proteomic (mass spectral) dataset (11), we display<br />
a Euclidean distance-based mapping, either directly from the original 15,154
[Figure 2. Prostate cancer, L2 mapping. Left panel: from 15,154 dimensions. Right panel: from 5 dimensions.]<br />
Fig. 2. Mapping from the original 15,154 dimensions (left panel) misclassified eight<br />
samples from the training set (TS; class 1, black disks, class 2, black crosses) and nine<br />
from the independent validation (test) set (VS; class 1, grey triangles, class 2, grey<br />
squares). The mapping from five dimensions (right panel) correctly classified all TS<br />
and VS samples. The dashed lines shown are the optimal LDA separators.<br />
dimensions (left panel) or from five dimensions, reduced via GA-ORS (right<br />
panel). Clearly, the success of class separation depends on the dimensionality<br />
of the feature space. When mapping from the original 15,154 dimensions,<br />
the optimal two-dimensional separation of training sets (TS; black disks for<br />
class 1, black crosses for class 2) and test sets (VS; grey triangles for class 1,<br />
grey squares for class 2) misclassifies eight samples from the training set and<br />
nine from the independent test set. For the mapping from five dimensions, all<br />
samples are classified correctly (see Note 4).<br />
2.4. Robust Classifier Development<br />
There are two, generally interrelated goals for supervised classifiers. First,<br />
we want robust classifiers, i.e., with high generalization power. This is realized<br />
when the classifier classifies new, unknown “patterns” correctly and reliably.<br />
Second, we want to identify the smallest subset of maximally discriminatory<br />
features. Eventual disease management/treatment would benefit from having<br />
only a few, biologically relevant and interpretable features. Ideally, both classification<br />
goals should be achieved, especially in clinically relevant studies.<br />
Unfortunately, achieving the first goal is frequently at the expense of the<br />
second. A good example is the recent use of support vector machines (SVMs)<br />
for classification. These have become particularly popular because of their
persuasive theoretical foundations (12,13) (see Note 5). However, because the<br />
SVMs project the data into even higher dimensional feature spaces to achieve<br />
linear separability of the classes, relevant, discriminatory feature identification<br />
becomes more difficult.<br />
The technical complexity and sophistication of the classifiers used range<br />
from the simplest correlation techniques, through k nearest neighbors, linear and<br />
quadratic discriminant analysis, decision trees, neural nets, etc., to (nonlinear)<br />
SVMs. However, the choice of classifier seems not to be dictated by the data<br />
to be classified, but rather by “expert” recommendation (usually based on other<br />
types of data), personal experience or preference, or simply software availability.<br />
The maxim “simpler is better” has mostly been ignored [see however<br />
(14)]. In general, no specific effort has been expended on choosing the most<br />
appropriate, optimal type of classifier for a given dataset. With a few exceptions,<br />
the proteomics (mass spectroscopy) community tends to use the “best”<br />
(i.e., the most sophisticated) classifier, whether appropriate or not!<br />
If the dataset size is sufficiently large, then the optimum approach for developing<br />
a robust classifier is to partition the data into a training set, a monitoring<br />
set, and a completely independent test (validation) set. Such partitioning is<br />
required to prevent overfitting. This occurs when the classifier adapts itself too<br />
closely to the peculiarities of a training set that comprises a limited number<br />
of samples. Using a monitoring set helps decide when to stop training. The<br />
ultimate assessment of the classifier’s generalization capability is how well it<br />
does on the independent test set that was in no way involved in creating the<br />
classifier.<br />
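The three-way partition described above can be sketched as follows. The split fractions, the seed,<br />
and the function name are hypothetical; the text does not prescribe particular proportions.<br />

```python
import random


def three_way_split(n_samples, frac_train=0.5, frac_monitor=0.25, seed=0):
    """Shuffle sample indices and cut them into three disjoint sets.

    The fractions are illustrative assumptions; whatever is left after the
    training and monitoring sets becomes the independent test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * frac_train)
    n_monitor = int(n_samples * frac_monitor)
    train = indices[:n_train]
    monitor = indices[n_train:n_train + n_monitor]
    test = indices[n_train + n_monitor:]
    return train, monitor, test


train, monitor, test = three_way_split(20)
print(len(train), len(monitor), len(test))
```

The test indices should be set aside before any feature selection or classifier tuning takes place,<br />
so that the test set is "in no way involved in creating the classifier."<br />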
Unfortunately, a sufficiently large sample size is a luxury rarely available to<br />
the data analysts of biomedical data. The only recourse is to use some version<br />
of crossvalidation (CV) (15). CV comes in different flavors, each with its<br />
advantages and disadvantages. All of them are designed to deal with the bias<br />
introduced by using the entire dataset both to develop the “optimal” classifier<br />
and to estimate the classification error (see Note 6).<br />
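As a rough sketch of the K-fold variant of CV detailed in Note 6, the generator below yields<br />
train/test index lists; setting K equal to the number of samples gives leave-one-out. The function<br />
names and the example accuracies are hypothetical.<br />

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for K-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must lie between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # K roughly equal subsets
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


def cv_summary(accuracies):
    """Mean and sample standard deviation of the per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


splits = list(k_fold_indices(10, 5))
loo_splits = list(k_fold_indices(4, 4))  # K = N reproduces leave-one-out
mean_acc, sd_acc = cv_summary([0.8, 1.0, 0.9])  # hypothetical fold accuracies
print(len(splits), mean_acc, sd_acc)
```

Any feature selection has to be repeated inside each training fold; otherwise the left-out fold has<br />
already influenced the classifier and the estimate is biased.<br />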
It is important to re-emphasize that because of the typical small sample size<br />
of biomedical data, the best approach to robust classifier development is to<br />
select the simplest classifier possible. This suggests linear classifiers. Complex<br />
classifiers have too many parameters that need optimization, inevitably raising<br />
the specter of overfitting (see Note 7). Dimensionality reduction (FSE) is, of<br />
course, essential for obtaining an appropriate SFR. Realizing the role of the<br />
SFR is important when developing classifiers. However, an essential caveat is<br />
that data sparsity can render any classification result statistically suspect, even<br />
if the SFR is satisfied (3). The importance of guaranteeing the appropriate SFR<br />
is being recognized. However, the consequences of data set sparsity are still<br />
not appreciated (16).
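Note 7 describes keeping a linear classifier while adding nonlinear (e.g., quadratic) terms built<br />
from already selected features. A minimal sketch of that augmentation step is given below; the<br />
function name is hypothetical, and the classifier that would be trained afterwards is omitted.<br />

```python
from itertools import combinations_with_replacement


def augment_quadratic(row):
    """Append all squares and pairwise products x_i * x_j (i <= j) to a row.

    The original features are kept, so a linear classifier trained on the
    result is still linear in the augmented feature space.
    """
    quadratic = [row[i] * row[j]
                 for i, j in combinations_with_replacement(range(len(row)), 2)]
    return list(row) + quadratic


augmented = augment_quadratic([2.0, 3.0])
print(augmented)
```

With d selected features the augmented space has d + d(d + 1)/2 dimensions (20 for d = 5), so the<br />
sample-to-feature ratio drops quickly; this is one reason further feature selection on the augmented<br />
set, as suggested in Note 7, may be needed.<br />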
The control of disparate sensitivities and specificities produced by classifiers<br />
when the dataset is imbalanced has particular clinical relevance (typically, there<br />
are many more samples from normal subjects than from patients with particular<br />
diseases) and tuning methods are needed for the classifiers developed. The<br />
standard method in the pattern recognition literature is either oversampling<br />
(taking multiple samples from the sparser class), or undersampling (taking a<br />
subset of the samples from the larger class), such that the sample sizes in the<br />
two classes become balanced (sensitivity, SE ≈ specificity, SP). However, this<br />
approach fails quite frequently. Our approach is based on penalizing misclassification<br />
of members of the smaller class until SE ≈ SP (note that the penalty<br />
weight is generally not equal to the ratio of the class sizes).<br />
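The penalty-based balancing idea can be illustrated with a deliberately simple stand-in. In the<br />
sketch below the "classifier" is a single threshold on a one-dimensional score, fitted by minimizing<br />
penalty × (false negatives) + (false positives); the penalty is then swept over a grid and the value<br />
giving the smallest |SE − SP| is kept. The data, the grid, and all function names are hypothetical,<br />
and this is not the author's tuning procedure.<br />

```python
def se_sp(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold means class 1."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    n_pos = sum(labels)
    return tp / n_pos, tn / (len(labels) - n_pos)


def fit_threshold(scores, labels, penalty):
    """Threshold minimizing penalty * FN + FP (class 1 is the smaller class)."""
    def cost(t):
        fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        return penalty * fn + fp
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    return min(candidates, key=cost)


def tune_penalty(scores, labels, penalties):
    """Return (penalty, threshold, SE, SP) with the smallest |SE - SP|."""
    best = None
    for w in penalties:
        t = fit_threshold(scores, labels, w)
        se, sp = se_sp(scores, labels, t)
        if best is None or abs(se - sp) < best[0]:
            best = (abs(se - sp), w, t, se, sp)
    return best[1:]


# Hypothetical imbalanced data: 9 "normal" (0) and 3 "disease" (1) samples.
scores = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.45, 0.55, 0.6, 0.4, 0.5, 0.9]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
penalty, threshold, se, sp = tune_penalty(scores, labels, [1.0, 1.5, 2.0, 3.0])
print(penalty, threshold, se, sp)
```

In this toy case the unweighted fit (penalty 1) yields SE = 1/3 and SP = 1, whereas the selected<br />
penalty of 1.5 is well below the 3:1 class ratio, in line with the remark that the penalty weight is<br />
generally not equal to the ratio of the class sizes. With so few samples SE and SP can only be made<br />
approximately equal.<br />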
2.5. Classifier Aggregation<br />
Clinically relevant classifiers require statistically significant class assignments<br />
for the samples. Thus, when a classifier’s assignment probability for<br />
a sample is “fuzzy” (e.g., less than 75% for a two-class problem), that<br />
assignment is not really useful from a clinical point of view. If the overall<br />
accuracy of a classifier is low and the assignments are fuzzy, a multiple classifier<br />
strategy (classifier aggregation) can frequently be beneficial. The idea is to<br />
combine the outputs of several classifiers, with the expectation that the new<br />
classifier thus formed will be more accurate and less fuzzy than the best of the<br />
individual constituents.<br />
One of the requirements for accurate ensemble-based classifiers is diversity.<br />
It is believed that the component classifiers should be as different as possible.<br />
This can be achieved in several ways. One of these approaches used conceptually<br />
and methodologically very different classifiers (Linear Discriminant<br />
Analysis (LDA), neural nets, and dynamic programming) on the same, unmodified<br />
data (17). However, our more recent experiments and experiences suggest<br />
that classifier diversity is not necessarily required. Comparable accuracy can<br />
be achieved in a simpler way, by employing a single, simple classifier (e.g.,<br />
LDA) and producing diversity using different transformations of the data (we<br />
have already discussed some of these in the context of feature selection).<br />
How are we to combine the outcomes of the various classifiers? The<br />
combination rules range from the simple majority rule to more complex,<br />
trainable rules, e.g., stacked generalization (SG) (18). SG uses the output<br />
probabilities of the constituent classifiers as input features for a new classifier.<br />
Boosting (19) is a very powerful version of a learnable classifier combination<br />
rule (see Note 8). It was used for identifying proteomic biomarkers for cancer<br />
detection (20). There are many classifier combination rules. When choosing<br />
such a rule, it is important to take into account both sample size and classifier<br />
complexity.
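The simplest combination rules can be sketched in a few lines. Below, a majority vote is taken over<br />
crisp class labels, and (as an additional assumption, not a rule taken from the text) the class-1<br />
probabilities of the constituent classifiers are averaged; the 75% cutoff for "fuzzy" two-class<br />
assignments follows the example given above. All names and probabilities are hypothetical.<br />

```python
from collections import Counter


def majority_vote(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def mean_probability(probabilities):
    """Average the class-1 probabilities of the constituent classifiers."""
    return sum(probabilities) / len(probabilities)


def is_fuzzy(p_class1, cutoff=0.75):
    """A two-class assignment is 'fuzzy' if neither class reaches the cutoff."""
    return max(p_class1, 1.0 - p_class1) < cutoff


# Hypothetical class-1 probabilities from three classifiers for one sample.
probs = [0.9, 0.6, 0.8]
votes = [1 if p >= 0.5 else 0 for p in probs]
combined = mean_probability(probs)
print(majority_vote(votes), combined, is_fuzzy(combined))
```

Here the second classifier's own assignment (0.6) is fuzzy, while the averaged probability is not.<br />
Trainable rules such as stacked generalization would instead feed `probs` into a further classifier,<br />
which in turn requires enough samples to train it.<br />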
3. Discussion<br />
Of course, experimental quality control is essential for good classifiers, i.e.,<br />
those that have useful generalization properties. Much has been made of the<br />
“surprising” observation that different (or even the same) experimental groups,<br />
using different classifiers end up with totally different sets of discriminatory<br />
features (21). These are ascribed to various possible experimental differences in<br />
the spectral acquisition, etc. (22,23,24). Although these are indeed significant<br />
contributing factors, and must be considered and corrected, sight is lost of the<br />
important fact that when nonunique discriminatory sets are found, they are as<br />
likely caused by dataset sparsity (3) as by differences in experimental protocols.<br />
The initial euphoria is over: one cannot (or should not be able to) publish<br />
in prestigious journals (e.g., Science, Nature, Lancet, PNAS, etc.) proteomic<br />
results based on very limited sample sizes. Furthermore, even when there<br />
are enough data to produce a respectable classifier, high-impact journals are<br />
unlikely to accept a manuscript unless the results are independently validated. In<br />
particular, the chemical/biological identification of the discriminatory proteins,<br />
protein fragments, or peptides must accompany the classification results. This<br />
increased focus on establishing the clinical relevance of putative biomarkers<br />
is definitely a good sign. However, at this stage of the game, it is possibly<br />
premature, and one would prefer first to have a quick, noninvasive, reliable<br />
diagnostic/prognostic tool. To be clinically relevant, many more samples are<br />
required to develop such a tool (i.e., a sufficiently robust classifier; this<br />
requirement will likely rule out the reliable detection of rare diseases). Unfortunately,<br />
currently available sample sizes preclude the discovery of unique<br />
biomarker “fingerprints” of a disease. This nonuniqueness due to data sparsity<br />
leads inevitably to expensive, onerous, and unnecessary laboratory investigations<br />
to sift out medically relevant, unique subsets from the plethora of<br />
putative biomarkers found and suggested for various diseases. Understanding<br />
the biochemical causes is, of course, essential for, say, finding a possible cure,<br />
but should follow the diagnostic/prognostic stage. Despite such caveats, the<br />
proteomics field is maturing and once the technical problems are successfully<br />
resolved, will undoubtedly provide important medical/clinical insights.<br />
The author further suggests that the power of proteomic spectroscopy can be<br />
enhanced by the simultaneous consideration of other experimental modalities<br />
that complement PMS, especially MRS, which could identify smaller discriminatory<br />
compounds also present in biofluids.<br />
4. Notes<br />
1. Amongst these are correcting the nonflat baselines arising from the matrix<br />
material, peak alignment of the spectra, reconciling data acquisition at different<br />
times, in different laboratories, with mass spectrometers of different sensitivity,
correcting high frequency noise, etc. Proper experimental design, including<br />
rigorous quality assessment and control is essential before any classifier development<br />
is attempted. Good discussions and summaries are given in (21,22,23,24).<br />
2. The realization that some classification strategy is essential for the analysis of<br />
proteomic data is recent. That these strategies are different emphasizes that not<br />
only is there no best classifier, but also that no unique, best strategy exists;<br />
different groups discovered different strategies that worked well for the data they<br />
analyzed (20,25). What is common is that all strategies are multistage.<br />
3. The data-driven nature of the SCS emphasizes the fact that there is no simple,<br />
universal prescription for creating an optimal classifier (4), i.e., no simple, ready<br />
“recipe” is, or is likely to be, available.<br />
4. This much-improved result underscores the importance of feature selection. Note<br />
that both mappings were done using the Euclidean distance. In the original<br />
15,154-dimensional space this is necessary, because one cannot use any other<br />
distance measure (e.g., Mahalanobis) that involves matrix inversion.<br />
After feature selection, when the number of features is fewer<br />
than the number of samples, much more powerful and relevant distance measures<br />
can be used. For a fair comparison, the Euclidean distance is used for both cases<br />
presented in Fig. 2 [for further possible improvements obtainable using other<br />
distance measures see (6)].<br />
5. In practice, SVMs are not nearly as effective as suggested by theory. In fact,<br />
we have found (26) that a simple LDA classifier, with wrapper-driven feature<br />
selection, when applied to several publicly available proteomic mass spectra, and<br />
to six microarray datasets, generally outperformed a linear SVM, even when<br />
the latter was used with feature selection. Furthermore, SVM-based classifiers<br />
frequently produce classification results that are distinctly out of balance: the<br />
accuracy obtained for one of the classes is usually considerably better than for the other.<br />
This imbalance between sensitivity and specificity is of clinical relevance when<br />
trying to minimize false negatives and/or false positives.<br />
6. Different variants of CV deal differently with the so-called bias-variance dilemma,<br />
particularly acute for datasets with limited sample size. The simplest version, the<br />
leave-one-out (LOO) method, removes one of the N samples, develops a classifier<br />
with the remaining N – 1 samples, and tests its prediction accuracy on the left-out<br />
sample. By cycling through all N samples, N accuracy assessments are found. For<br />
small N (for which the data partition, as described in the main text, is not possible),<br />
LOO suffers from large variance, even though it minimizes the bias. K-fold CV is<br />
frequently used to balance bias and variance. The samples are partitioned into K<br />
roughly equal subsets. K – 1 subsets are used for training the classifier, while the left-out<br />
subset is the current test set. Cycling through the K partitions and then calculating<br />
the mean and standard deviation of the accuracies over the K test sets assess how well<br />
and how reliably one is expected to classify new, unknown samples. K is typically<br />
chosen to be 5 or 10, whether or not the sample size warrants this choice. A more<br />
reasonable approach is to determine the best K via CV. Particularly powerful is<br />
Efron’s bootstrapping approach (15). This involves the entire dataset, but uses a<br />
random resampling with replacement strategy. A large number of artificial datasets
of the same size as the original are thus produced. A classifier is created for each<br />
of these, and the outcomes are averaged. Bootstrapping is supposed to reduce both<br />
large bias and variance. Inspired by the bootstrapping concept, we have been using,<br />
with some success, its generalization (27).<br />
7. Instead of the direct use of nonlinear classifiers, with the attendant optimization<br />
problems, a simple trick is to use nonlinear terms but retain the simplicity of a<br />
linear classifier. One approach we found useful is to first develop a linear classifier<br />
(with feature selection) and then augment the linear features by constructing from<br />
them nonlinear functions, say, quadratic terms. This, of course, increases the<br />
number of parameters to be determined. However, the problem remains linear in<br />
the augmented feature space and linear classifiers can be developed. Furthermore,<br />
our explicit approach produces new features that remain interpretable as interaction<br />
terms. This is unlike the SVM classifiers that map implicitly into a much<br />
higher dimensional linear feature space, without interpretability. In addition, we<br />
can reduce the dimensionality of our augmented feature space by additional feature<br />
selection via exhaustive search, optimized by CV.<br />
8. Boosting requires “weak” base classifiers, $C_j$, $j = 1, 2, \ldots, J$, that are combined into<br />
a more accurate composite classifier, $D_j = C_1 + C_2 + \cdots + C_j$. At stage $m$, the<br />
boosting algorithm carries out a weighted selection of a base classifier, given all<br />
previously chosen base classifiers. For the new base classifier $C_m$, larger weights<br />
are given to samples that are incorrectly classified by the current composite<br />
classifier $D_{m-1}$ so that $C_m$ will be chosen with a tendency to correctly classify<br />
previously incorrectly classified samples.<br />
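The reweighting scheme of Note 8 can be illustrated with a small AdaBoost-style sketch. Several<br />
assumptions are made here that are not in the text: the base classifiers are one-dimensional<br />
threshold "stumps", each is given a coefficient (alpha) in the composite sum, the weight update is<br />
the standard exponential AdaBoost rule, and the data are a hypothetical toy set.<br />

```python
import math


def stump_predict(stump, x):
    """A stump (threshold, polarity) predicts polarity if x >= threshold."""
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity


def best_stump(xs, ys, weights):
    """Weighted selection: the stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            stump = (threshold, polarity)
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(stump, x) != y)
            if best is None or err < best[0]:
                best = (err, stump)
    return best


def boost(xs, ys, n_rounds):
    """Build a composite classifier as a list of (alpha, stump) pairs."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_rounds):
        err, stump = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted sum of the base classifiers' outputs."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single stump can separate (labels are +1 / -1).
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = boost(xs, ys, 3)
print([predict(ensemble, x) for x in xs] == ys)
```

On this toy set a single stump misclassifies three samples, whereas the three-stage composite<br />
classifies all ten correctly, because each new stump concentrates on the samples the current<br />
composite gets wrong.<br />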
Acknowledgments<br />
The author thanks the entire Biomedical Informatics Group for their decade-long,<br />
essential contributions to the development of the algorithms and software<br />
described.<br />
References<br />
1. Lean, C. L., Somorjai, R. L., Smith, I. C. P., Russell, P., Mountford, C. E.<br />
(2002) Accurate diagnosis and prognosis of human cancers by proton MRS and<br />
a three stage classification strategy. Annual Reports on NMR Spectroscopy 48,<br />
71–111.<br />
2. Somorjai, R. L., Dolenko, B., Nikulin, A., Nickerson, P., Rush, D., Shaw, A. et al.<br />
(2002) Distinguishing normal from rejecting renal allografts: application of a three-stage<br />
classification strategy to MR and IR spectra of urine. Vibrational Spectroscopy<br />
28, 97–102.<br />
3. Somorjai, R. L., Dolenko, B., Baumgartner, R. (2003) Class prediction and<br />
discovery using gene microarray and proteomics mass spectroscopy data: curses,<br />
caveats, cautions. Bioinformatics 19, 1484–1491.<br />
4. Huber, P. J. (1985) Projection pursuit. Ann. Statistics 13, 435–475.
5. Somorjai, R. L., Alexander, M., Baumgartner, R., Booth, S., Bowman, C., Demko,<br />
A., Dolenko, B., Mandelzweig, M., Nikulin, A. E., Pizzi, N., Pranckeviciene,<br />
E., Summers, R., Zhilkin, P. (2004) A data-driven, flexible machine learning<br />
strategy for the classification of biomedical data. In: Dubitzky, W. and Azuaje, F.<br />
(eds.) Artificial Intelligence Methods and Tools for Systems Biology, Chapter 5.<br />
Computational Biology Series, Vol. 5. Springer, pp. 67–85.<br />
6. Somorjai, R. L., Demko, A., Mandelzweig, M., Dolenko, B., Nikulin, A. E.,<br />
Baumgartner, R. et al. (2004) Mapping high-dimensional data onto a relative<br />
distance plane – a novel, exact method for visualizing and characterizing high-dimensional<br />
patterns. Journal of Biomedical Informatics 37, 366–379.<br />
7. Anderson, T. W., Bahadur, R. R. (1962) Classification into two multivariate normal<br />
distributions with different covariance matrices. Annals of Mathematical Statistics<br />
33, 420–431.<br />
8. Kohavi, R., John, G. H. (1997) Wrappers for feature subset selection. Artificial<br />
Intelligence 273–324.<br />
9. Nikulin, A. E., Dolenko, B., Bezabeh, T., Somorjai, R. L. (1998) Near-optimal<br />
region selection for feature space reduction: novel preprocessing methods for<br />
classifying MR spectra. NMR in Biomedicine 11, 209–217.<br />
10. Li, J., Zhang, Zh., Rosenzweig, J., Wang, Y. Y., Chan, D. W. (2002) Proteomics<br />
and bioinformatics approaches for identification of serum biomarkers to detect<br />
breast cancer. Clinical Chemistry 48, 1296–1304.<br />
11. Dataset “JNCI-7-3-02,” downloaded from the NIH/FDA Clinical Proteomics<br />
Program Databank (http://clinicalproteomics.steem.com).<br />
12. Vapnik, V. N. (2000) The nature of statistical learning theory, 2nd edition, Statistics<br />
for Engineering and Information Science. Springer, New York.<br />
13. Schölkopf, B., Smola, A. J. (2002) Learning with Kernels. Support Vector<br />
Machines, Regularization, and Beyond. The MIT Press, Cambridge, Mass.<br />
14. Lee, K. R., Lin, X., Park, D. C., Eslava, S. (2003) Megavariate data analysis<br />
of mass spectrometric proteomics data using latent variable projection method.<br />
Proteomics 3, 1680–1686.<br />
15. Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. SIAM,<br />
Philadelphia.<br />
16. Diamandis, E. P. (2003) Proteomic patterns in biological fluids: do they represent<br />
the future of cancer diagnostics? Clinical Chemistry 49(8), 1272–1278.<br />
17. Somorjai, R. L., Nikulin, A. E., Pizzi, N., Jackson, D., Scarth, G., Dolenko, B.,<br />
Gordon, H., Russell, P., Lean, C. L., Delbridge, L., Mountford, C. E., Smith, I.<br />
C. P. (1995) Computerized consensus diagnosis: a classification strategy for the<br />
robust analysis of MR spectra. I. Application to 1 H spectra of thyroid neoplasms.<br />
Magnetic Resonance in Medicine 33, 257–263.<br />
18. Wolpert, D. H. (1992) Stacked generalization. Neural Networks 5, 241–259.<br />
19. Schapire, R. E. (1990) The strength of weak learnability. Machine Learning 5,<br />
197–227.<br />
20. Yasui, Y., Pepe, M., Thomson, M. L., Adam, B.-L., Wright Jr., G. L., Qu, Y.,<br />
Potter, J. D., Winget, M., Thornquist, M., Feng, Z. (2003) A data-analytic strategy
for protein biomarker discovery: profiling of high-dimensional data for cancer<br />
detection. Biostatistics 3, 449–463.<br />
21. Diamandis, E. P. (2004) Mass spectrometry as a diagnostic and a cancer biomarker<br />
discovery tool. Molecular and Cellular Proteomics 3(4), 367–378.<br />
22. Baggerly, K. A., Morris, J. S., Coombes, K. (2004) Cautions about reproducibility<br />
in mass spectrometry patterns: joint analysis of several proteomic data sets. Bioinformatics<br />
20, 777–785.<br />
23. Hu, J., Coombes, K. R., Morris, J. S., Baggerly, K. A. (2005) The importance<br />
of experimental design in mass spectrometry experiments: some cautionary tales.<br />
Briefings in Functional Genomics and Proteomics 3(4), 322–331.<br />
24. Shin, H. and Markey, M. K. (2006) A machine learning perspective on the development<br />
of clinical decision support systems utilizing mass spectra of blood samples.<br />
Journal of Biomedical Informatics 39, 2237–2248.<br />
25. Zhu, W., Wang, X., Ma, Y., Rao, M., Glimm, J., Kovach, J. S. (2003) Detection of<br />
cancer-specific markers amid massive mass spectral data. Proceedings of the National<br />
Academy of Sciences USA 100(25), 14666–14671.<br />
26. Somorjai, R. L. and Pranckeviciene, E. (2006) (Unpublished).<br />
27. Somorjai, R. L., Dolenko, B., Nikulin, A., Nickerson, P., Rush, D., Shaw, A., De<br />
Glogowski, M., Rendell, J., Deslauriers, R. (2002) Distinguishing normal from<br />
rejecting renal allografts: application of a three-stage classification strategy to MR<br />
and IR spectra of urine. Vibrational Spectroscopy 28, 97–102.
Index<br />
Affi-gel Protein A MAPS II kit, 277<br />
Aflatoxin B1 (AFB1), 194<br />
Alkaline phosphatase (ALP) assay, 233, 237<br />
Alpha-fetoprotein, 194<br />
Alzheimer’s disease, 310<br />
Annexin V, 172<br />
ANOVA, analysis of variance, 100, 112, 114, 259,<br />
330, 335, 344<br />
Antibody arrays<br />
construction, 270–272<br />
direct labeling methods, for cancer diagnostics,<br />
268–269<br />
formats for, 264–266<br />
labeling and hybridization, of serum samples,<br />
269–270, 272–274<br />
and other proteomic strategies, 263–264<br />
planar, labeling-hybridization methods and,<br />
266–268<br />
printing, 269<br />
scanning and data analysis, 274<br />
Anti-SAPE antibody, 267<br />
ArrayQuant scanners, 281<br />
AutoPix TM , 48. See also Laser-capture<br />
microdissection<br />
Axon scanners, 281<br />
Bayesian classification methods. See Linear<br />
Discriminant Analysis<br />
Bayes’s rule, 300<br />
BCA 200 Protein Assay Kit, 277<br />
Bead-based multiplex assays. See also Suspension<br />
antibody microarrays<br />
detection antibody, 254<br />
diluents, 254<br />
general protocol for, 254–255<br />
sample preparation, 252–254<br />
screening protocol, 255–256<br />
Biological variation analysis (BVA) module, of<br />
DeCyder, 112–113<br />
“Biomarker panel,” 11<br />
Bio-Rad Micro Bio-Spin P30 column, 277<br />
Biotinyl-tyramide, 275<br />
BLAST, 352, 358<br />
Blood samples, preanalytical phase<br />
collection of, 36<br />
processing of, 37–38<br />
protease inhibitors, 38<br />
serum and plasma specimens, characteristics of,<br />
36–37<br />
Bradford assay, 225<br />
Carboxylated beads, 249. See also Suspension<br />
antibody microarrays<br />
activation, 251<br />
antibodies coupling to activated, 251<br />
cell-counting chamber and, 252<br />
washing and storage of coupled, 251<br />
1-(5-Carboxypentyl)-1-methylindodi-carbocyanine<br />
halide (Cy5) N-hydroxy-succinimidyl<br />
ester, 163<br />
1-(5-Carboxypentyl)-1-propylindocarbocyanine<br />
halide (Cy3) N-hydroxy-succinimidyl<br />
ester, 163<br />
CAST. See Clustering Affinity Search Technique<br />
Celecoxib, and cyclooxygenase-2 (COX-2), 183<br />
Charge-couple device (CCD) camera-based<br />
imaging system, 268, 293, 332<br />
CIMminer (Clustered Image Maps), 259<br />
Cleavable isotope-coded affinity tag (cICAT)<br />
labeling technology, 195, 197, 200–201<br />
Clinical proteomics, 1<br />
biological specimens, 6–7<br />
biomarker discovery and, 9–14<br />
overview and scope of, 2–3<br />
sample specimens and processing techniques, 4–9<br />
Cluster analysis techniques, 297–299, 306<br />
gene expression-based, 307<br />
Clustering Affinity Search Technique, 259<br />
Coomassie brilliant blue (CBB) staining, 68,<br />
332, 339<br />
Creatinine assay, 142<br />
Cyanines (Cy3/Cy5), 264, 333<br />
Cyclooxygenase-2 (COX-2) and celecoxib, 183
CyDye labeling, 95, 105–106, 109–110. See also<br />
Difference gel electrophoresis (DIGE)<br />
technology<br />
Cy2-labeled internal standard, 98–99<br />
minimal labeling method, 96<br />
pooled-sample internal standard for, 107<br />
saturation labeling, 96<br />
Cy3-labeled streptavidin, 267<br />
Cytokeratin 19 (CK19), 163<br />
DA-PLS method. See Discriminant analysis–partial<br />
least squares method<br />
DeCyder software, 101, 112–113, 342. See also<br />
Difference gel electrophoresis (DIGE)<br />
technology<br />
Delayed extraction-matrix assisted laser<br />
desorption/ionization time-of-flight mass<br />
spectrometry (DE-MALDI-TOF-MS), 194<br />
Dendrogram, 297, 299<br />
Dialysis, 150. See also Urine protein profiling, by<br />
2DE and MALDI-TOF-MS<br />
Difference gel electrophoresis (DIGE) technology,<br />
78, 93, 330, 332–333, 342–345<br />
ANOVA, 100, 112, 114<br />
in clinical setting, 103<br />
CyDye labeling, 95, 105–106, 109–110<br />
Cy2-labeled internal standard, 98–99<br />
minimal labeling method, 96<br />
pooled-sample internal standard for, 107<br />
saturation labeling, 96<br />
DeCyder suite of software tools, 101, 112–113<br />
2D gel electrophoresis and poststaining, 94,<br />
110–111<br />
experimental design, 108–109<br />
and statistical confidence, 112–114<br />
extended data analysis (EDA) software module,<br />
101, 113<br />
false discovery rate (FDR), 100<br />
hierarchical clustering (HC), 102<br />
labeling materials, 104–105<br />
LCM and, 163–170<br />
MeOH/CHCl 3 protocol, 106<br />
MuDPIT, 97<br />
multivariate statistical analysis, 114–115<br />
principle component analysis, 101<br />
SDS-polyacrylamide gel electrophoresis, 104<br />
software algorithms, 111–112<br />
Student’s t-test, 100, 112, 114<br />
DIGE/MS analysis, 103, 115<br />
Direct labeling, 264, 268<br />
protocol for, 272–274<br />
Discriminant analysis–partial least squares method,<br />
306, 309–311<br />
Discrimination power (DP), 303–305<br />
Dithiothreitol (DTT), 68<br />
Dot-plot style alignment, of protein sequence,<br />
358–359<br />
DTT/IAA equilibration procedure, 73<br />
ECM. See Extracellular matrix<br />
EDA software. See Extended data analysis software<br />
EDC/Sulfo-NHS, 249. See also Suspension<br />
antibody microarrays<br />
2DE-MALDI-TOF-MS assay, 194<br />
EnsEmbl, 352, 356<br />
Escherichia coli, 307<br />
Ethylene vinyl acetate (EVA) polymer, 161<br />
Ettan 2D electrophoresis system, 110<br />
Exosomes, 142<br />
ExPASy proteomics tools, 202, 352<br />
Expressed sequence tags (ESTs), 357<br />
Extended data analysis software, 101, 113<br />
Extracellular matrix, 8<br />
and matrix vesicles (MVs) proteomes, MS and,<br />
231–232<br />
alkaline phosphatase assay, 234, 237<br />
immunofluorescence staining and, 235, 239<br />
MC3T3-E1, osteoblast cell line, 233,<br />
236–237, 239<br />
nanoRPLC-MS/MS, 235, 238–239<br />
strong cation exchange liquid chromatography,<br />
of peptides, 234–235, 238<br />
Extracted ion chromatogram, 219, 221–222, 224<br />
Fetal bovine serum (FBS), 254<br />
Fisher’s F-test, 302<br />
Flow cytometric analysis, 160<br />
Fluorophores, 264, 267<br />
photobleaching and quenching of, 274–275<br />
Fourier transformer mass spectrometry (FTMS),<br />
172–174<br />
Free flow electrophoresis (FFE), plasma samples<br />
fractionation and, 60–61, 67<br />
Frontotemporal dementia, 310<br />
GAORS method. See Genetic algorithm-based<br />
optimal region selection method<br />
2D Gaussian function, 312<br />
Gaussian multivariate probability distribution, 300<br />
2-D Gel-electrophoresis (2-D GE), 292. See also<br />
2D-PAGE maps analysis<br />
LCM cells analysis by, 77<br />
HER-2/neu positive and -negative breast<br />
tumors, 87–88
isoelectric focusing (IEF), 79–80, 83–84<br />
MASCOT search engine, 87<br />
paraffin-embedded sections staining, 81–82<br />
preparation and analysis, 61, 67–69<br />
protein sample preparation, 79, 82–83<br />
SDS-PAGE, 79–80, 84–85<br />
silver staining and image analysis, 80, 85–86<br />
tissue block and tissue section preparation,<br />
78–79, 81<br />
trypsin digestion and MS analysis, 80, 86–87<br />
Gel-free mass spectrometry and LCM, 171–172<br />
Gene expression microarrays, 45<br />
GenePix Pro 3.0 software program, 280–281<br />
GeneScan program, 356<br />
Genetic algorithm-based optimal region selection<br />
method, 387–388. See also Proteomic mass<br />
spectroscopy<br />
gp96, tumor rejection antigen, 169<br />
GRANTA-519, 308<br />
HCC. See Hepatocellular carcinoma<br />
HCL. See Hierarchical clustering<br />
Hematoxylin and eosin (H&E) staining, tissue<br />
sample collection, 44, 47–48<br />
Hepatitis B/C virus (HBV/HCV), 194<br />
Hepatocellular carcinoma, 8, 11, 59, 67, 163,<br />
170, 193<br />
qualitative and quantitative proteomic analysis of<br />
cICAT labeling technology, 195, 197, 200–201<br />
2DE-MALDI-TOF-MS assay, 194<br />
2D-LC-MS/MS for, 195–197, 201–202<br />
ExPASy proteomics tools, 202<br />
LCM for, 194–196, 199<br />
nonenzymatic method (NESP), 196, 198–199<br />
toluidine blue removal and protein mixture<br />
digestion, 197, 199–200<br />
HERMeS software package, PCA and, 306<br />
HER-2/neu oncogene, 85–86, 163<br />
Hierarchical clustering, 259, 299. See also Cluster<br />
analysis techniques<br />
High performance liquid chromatography, 169, 171,<br />
183, 212–214<br />
Horseradish peroxidase (HRP), 267<br />
HPLC. See High performance liquid<br />
chromatography<br />
HSP27 protein, 103<br />
HT-29, COX-2 expressing colon cancer cell<br />
line, 183<br />
Human Proteome Organization, 143<br />
Hydrogels, 271. See also Antibody arrays<br />
ICAT labeling. See Isotope-coded affinity tag<br />
labeling<br />
IMAC-Cu²⁺ ProteinChips, 134, 136<br />
Image analysis. See also 2D-PAGE maps analysis<br />
by fuzzy logic principles<br />
image defuzzyfication, 312<br />
image digitalization, 311–312<br />
multi-dimensional scaling (MDS), 315–317<br />
PCA and classification methods, 315<br />
refuzzyfication, 312–313<br />
moment functions, 317<br />
Legendre moments, 318–319<br />
Image Master Platinum software, 339, 341<br />
Immobilized pH gradient strip. See also<br />
Two-dimensional electrophoresis (2DE)<br />
isoelectric focusing (IEF) with, 60, 65<br />
rehydration of, 64–65<br />
Immunofluorescence staining, 235<br />
InterPro, 352, 361<br />
Iodoacetamide (IAA), 68<br />
IPG strip. See Immobilized pH gradient strip<br />
Isotope-coded affinity tag labeling, 78, 195<br />
mass spectrometry (MS) and, 181<br />
celecoxib, cyclooxygenase-2 (COX-2)<br />
and, 183<br />
cell culture and harvest, 183, 186<br />
cell lysis, desalting, and protein quantitation,<br />
184–187<br />
cleavable reagents, 182, 185, 187–188<br />
cleaving biotin, 186, 189<br />
labeled peptides purification, 185–186,<br />
188–189<br />
proteins, denaturation and reduction of,<br />
185, 187<br />
quantitative proteomic analysis and, 184<br />
Java Runtime Environment, 370. See also<br />
msInspect, for LC-MS data analysis<br />
KMC (K-Means/K-Medians Clustering), 259<br />
Kolmogorov–Smirnov test, 335, 339, 341<br />
Kruskal–Wallis test, 335<br />
Laser-capture microdissection, 8, 44–45, 160. See<br />
also Tissue sample collection, for proteomics<br />
analysis<br />
AutoPix™, 48<br />
cells analysis, by 2-D GE, 77<br />
HER-2/neu positive and -negative breast<br />
tumors, 87–88<br />
isoelectric focusing (IEF), 79–80, 83–84<br />
MASCOT search engine, 87<br />
paraffin-embedded sections staining, 81–82<br />
protein sample preparation, 79, 82–83<br />
SDS-PAGE, 79–80, 84–85<br />
silver staining and image analysis, 80, 85–86<br />
tissue block and tissue section preparation,<br />
78–79, 81<br />
trypsin digestion and MS analysis, 80, 86–87<br />
development, 161<br />
different labeling techniques and, 170<br />
DIGE and, 163–170<br />
and 2-D GE, 162–163<br />
gel-free mass spectrometry and, 171–172<br />
for HCC and non-HCC hepatocytes isolation,<br />
194–195, 199<br />
LCM lysate, 49–50<br />
and mass spectrometry analysis, 172–174<br />
PixCell II instrument, 48–49, 161<br />
and protein chip technology, 172<br />
separation methods and, 171<br />
for tissue sample collection, 44–45<br />
Veritas™, 48<br />
Laser microdissection and pressure catapulting, 8<br />
LC-ESI-MS/MS. See Liquid<br />
chromatography-electrospray ionization<br />
tandem mass spectrometry<br />
LCM. See Laser-capture microdissection<br />
LC-MS data. See Liquid chromatography-mass<br />
spectrometry data<br />
LC-MS/MS. See Liquid chromatography-tandem<br />
mass spectrometry<br />
LDA. See Linear Discriminant Analysis<br />
Legendre moments, 317–319<br />
Levene’s test, 334<br />
Linear Discriminant Analysis, 300–301,<br />
315–316<br />
Liquid chromatography-mass spectrometry data,<br />
370, 374–376, 377<br />
Liquid chromatography-mass spectrometry data<br />
analysis, msInspect for, 369<br />
data viewing and navigation, 371–373<br />
locating peptides in, 373–376<br />
low-quality peptides, elimination of, 376<br />
peptide quantitation, 376–378<br />
software installation for, 370<br />
Liquid chromatography-tandem mass spectrometry,<br />
170, 171<br />
label-free, for biomarker identification, 209–210<br />
albumin/IgG depletion, 211–213<br />
chromatographic alignment, 218–221<br />
data transformation and normalization, 222<br />
HPLC, 212–214<br />
mass spectrometer, 212, 214<br />
MS/MS spectral filtering, 216–217<br />
peptide identification, 217–218<br />
peptide quantification, 221–222<br />
statistical analysis, 223<br />
zoom scan data processing, 214–216<br />
LMPC. See Laser microdissection and pressure<br />
catapulting<br />
two-dimensional (2D-LC/MS/MS), 78<br />
Lysine labeling, 169<br />
MALDI/SELDI protein profiling, of serum,<br />
125–126<br />
on MALDI-TOF–TOF<br />
data collection, 131–132<br />
MB fractionation, of human serum, 131<br />
protein identification by, 132–133<br />
MB-based fractionation, 127, 128, 131<br />
SELDI and MALDI spectra acquisition, 129<br />
SELDI ProteinChip, 130<br />
(Magnetic bead based)<br />
on SELDI-TOF, 133<br />
ProteinChip arrays, 134–135<br />
SPA matrix addition, 135<br />
spectra collection on, 135–138<br />
MALDI-TOF-MS. See Matrix-assisted laser<br />
desorption time of flight mass spectrometry<br />
MALDI-TOF, peptide mass fingerprinting (PMF)<br />
and, 62, 71<br />
MALDI-TOF–TOF, serum protein profiling on<br />
data collection, 131–132<br />
MB fractionation, of human serum, 131<br />
protein identification by, 132–133<br />
Maleimide labeling, of cysteine<br />
sulfhydryls, 96<br />
MARS. See Multiple affinity removal system<br />
MASCOT software, 81, 87–88<br />
Mass spectrometry, 58–59, 214<br />
ICAT labeling and, 181<br />
celecoxib, cyclooxygenase-2 (COX-2)<br />
and, 183<br />
cell culture and harvest, 183, 186<br />
cell lysis, desalting, and protein quantitation,<br />
184–187<br />
cleavable reagents, 182, 185,<br />
187–188<br />
cleaving biotin, 186, 189<br />
labeled peptides purification, 185–186,<br />
188–189<br />
proteins, denaturation and reduction of,<br />
185, 187<br />
quantitative proteomic analysis and, 184<br />
LCM and, 172–174<br />
Matrix-assisted laser desorption time of flight mass<br />
spectrometry, 125–126, 142, 163, 194<br />
LCM and, 171<br />
for urine protein profiling. See Urine protein<br />
profiling, by 2DE and MALDI-TOF-MS<br />
MAVER-1 cell lines, 308<br />
MC3T3-E1, osteoblast cell line, 233, 236–237, 239<br />
MDS technique. See Multi-dimensional scaling<br />
techniques<br />
MeOH/CHCl₃ protocol, 106<br />
Metalloproteins, 350<br />
MicroSol-IEF, ZOOM®, 60, 65–66<br />
Miniaturized parallelized sandwich immunoassays.<br />
See Suspension antibody microarrays<br />
MS. See Mass spectrometry<br />
MS-Fit software, 81<br />
msInspect, for LC-MS data analysis, 369<br />
data viewing and navigation, 371–373<br />
locating peptides in, 373–376<br />
low-quality peptides, elimination, 376<br />
peptide quantitation, 376–378<br />
software installation for, 370<br />
MS/MS spectral filtering, 216–217<br />
Multi-dimensional scaling techniques, 313, 315–317<br />
MultiExperiment Viewer (MeV), 259<br />
Multiple affinity removal system, 59, 63–64<br />
Multiplexed bead-based flow-cytometry assays, 266<br />
Nanoflow reversed-phase LC-tandem mass<br />
spectrometry (nanoRPLC-MS/MS), 233, 235,<br />
238–239<br />
Non-enzymatic sample preparation (NESP), 194,<br />
196, 198–199<br />
One-antibody label-based assays, 264–266<br />
One-dimensional liquid chromatography coupled<br />
with tandem mass spectrometry<br />
(1D-LC-MS/MS), 201–202. See also<br />
Hepatocellular carcinoma<br />
¹⁶O/¹⁸O isotopic labeling, 78<br />
Osteoblasts, 232. See also Extracellular matrix<br />
MC3T3-E1, 233, 236–237, 239<br />
2D-PAGE maps analysis, 291<br />
dedicated software packages and, 292–294<br />
image analysis<br />
fuzzy logic, 311–317<br />
moment functions, 317–319<br />
spot volume datasets, analysis of, 294<br />
cluster analysis, 297–299<br />
DA-PLS method, 309–311<br />
linear discriminant analysis, 300–301<br />
pattern recognition methods, 306–309<br />
PLS regression and DA-PLS regression, 306<br />
principal component analysis, 294–297<br />
SIMCA method, 301–305<br />
PALM microlaser dissector, 161<br />
Parkinson’s disease, 310<br />
Partial least squares regression, 306, 308, 338<br />
Pattern recognition methods<br />
cluster analysis. See Cluster analysis techniques<br />
PCA. See Principal component analysis<br />
proteomic mass spectroscopy and. See Proteomic<br />
mass spectroscopy<br />
SIMCA classification. See Soft-independent<br />
model of class analogy method<br />
PCA. See Principal component analysis<br />
PCa-24 protein, in epithelial cells, 172<br />
PDB. See Protein data bank<br />
PDQuest system, 293, 308<br />
Peptide mass fingerprinting, MALDI-TOF and,<br />
62, 71<br />
Peptide/protein separation system, 171<br />
PerkinElmer scanners, 281<br />
Pfam, 352, 360<br />
PIN. See Prostatic intraepithelial neoplasia<br />
PIVKA-II, 194<br />
PixCell II system, 48–49, 77, 82–83, 161. See also<br />
Laser-capture microdissection<br />
Planar antibody arrays, 248, 264. See also Antibody<br />
arrays<br />
main formats of, 265<br />
types of, labeling-hybridization methods and,<br />
266–268<br />
10plex soluble receptor assay, 255–256, 258. See<br />
also Bead-based multiplex assays<br />
PLS regression. See Partial least squares regression<br />
PMF. See Peptide mass fingerprinting<br />
PMS. See Proteomic mass spectroscopy<br />
Position-specific scoring matrix, 361<br />
Post-translational modification (PTM) profiling, on<br />
selected spots, 71–72<br />
Principal component analysis, 101, 259, 294–297,<br />
308, 315–316, 343. See also 2D-PAGE maps<br />
analysis<br />
Escherichia coli, 307<br />
for explorative data analysis, 336–338<br />
in HERMeS software package, 306<br />
U937 human lymphoma cell line and, 307<br />
Prostatic intraepithelial neoplasia, 44<br />
Protein chip technology and LCM, 172<br />
Protein data bank, 352, 360–361<br />
Protein precipitation, 143–144<br />
Protein profiling of human plasma samples, by<br />
two-dimensional electrophoresis, 57<br />
coomassie brilliant blue G-250 staining, 68<br />
destaining, in-gel deglycosylation and in-gel<br />
tryptic digestion, 61–62, 69<br />
2D gels preparation and analysis, 61, 67–69<br />
difference in gel electrophoresis (DIGE)<br />
system, 59<br />
free flow electrophoresis (FFE), samples<br />
fractionation by, 60–61, 67<br />
high-abundance proteins depletion, by<br />
immunoaffinity column, 59, 63–64<br />
HPPP, 58<br />
IPG gel strip rehydration, 64–65<br />
isoelectric focusing (IEF), with IPG strip,<br />
60, 65<br />
MALDI plating and peptides desalting,<br />
62, 69–71<br />
mass spectrometry (MS), 58–59<br />
microscale solution isoelectric focusing,<br />
ZOOM®, 60, 65–66<br />
peptide mass fingerprinting, MALDI-TOF and,<br />
62, 71<br />
PTMs profiling, on selected spots,<br />
71–72<br />
samples preparation, 59, 62<br />
TCA/acetone precipitation, 64<br />
Proteomic data, statistical analysis, 327<br />
classical dyes, 339–342<br />
confirmatory univariate data analysis, 333–335<br />
DIGE approach, 342–345<br />
experimental design for, 328<br />
data processing, 330–333<br />
pooling, 330<br />
replicates, 329–330<br />
exploratory multivariate data analysis, 335<br />
marker selection, 338–339<br />
principal component analysis, 336–338<br />
Proteomic mass spectroscopy, 383<br />
statistical classification strategy (SCS) for<br />
classifier aggregation, 390<br />
data visualization, 384–385<br />
feature selection/extraction (FSE), 386–388<br />
preprocessing, 385–386<br />
robust classifier development, 388–390<br />
Proteomics analysis, for tissue sample collection<br />
formalin fixation, 43–44<br />
hematoxylin staining, 47–48<br />
immunocapture procedure, 46<br />
immunofluorescence staining, 48<br />
laser-capture microdissection (LCM), 44–45<br />
AutoPix™, 48<br />
PixCell II instrument, 48–49<br />
Veritas™, 48<br />
LCM lysate, 49–50<br />
SELDI-TOF-MS, 46<br />
PSSM. See Position-specific scoring matrix<br />
QTC (QT CLUST), 260<br />
Resonance light scattering (RLS), 268<br />
Reverse protein arrays, 268<br />
Rolling-circle amplification (RCA), 268<br />
SCX-LC. See Strong cation exchange liquid<br />
chromatography<br />
SDS-PAGE. See Sodium dodecyl<br />
sulfate-polyacrylamide gel electrophoresis<br />
SELDI. See Surface-enhanced laser<br />
desorption/ionization<br />
SELDI-TOF. See Surface-enhanced laser<br />
desorption/ionization time-of-flight<br />
Self Organizing Maps (SOM), 259<br />
Self Organizing Tree Algorithm (SOTA), 259<br />
Shapiro-Wilk test, 334, 339<br />
Significance Analysis of Microarrays (SAM), 259<br />
Silver staining, 80, 332–333. See also Laser-capture<br />
microdissection<br />
and image analysis, 85–86<br />
SIMCA method. See Soft-independent model of<br />
class analogy method<br />
SKBR-3, breast cancer cell line, 171<br />
Sodium dodecyl sulfate-polyacrylamide gel<br />
electrophoresis, 84–85, 94, 96,<br />
104, 110–111<br />
isoelectric focusing (IEF) and, 79–80<br />
PROTEAN II xi Cell system (Bio-Rad) for, 84<br />
Soft-independent model of class analogy method,<br />
301–305, 307–308<br />
Streptavidin-R-Phycoerythrin (SAPE), 267<br />
Strong cation exchange liquid chromatography,<br />
234–235, 238<br />
Strong cation exchange liquid chromatography, of<br />
peptides, 233, 234–235, 238<br />
Student’s t-test, 334<br />
2-(4-Sulfophenylazo)-1,8-dihydroxy-3,6-<br />
naphthalenedisulfonic acid (SPADNS), 60, 67<br />
Support vector machines, 388–389. See also<br />
Proteomic mass spectroscopy<br />
Surface-enhanced laser desorption/ionization, 9, 13,<br />
125–126, 142, 172, 194<br />
serum protein profiling on, 133<br />
ProteinChip arrays, 134–135<br />
SPA matrix addition, 135<br />
spectra collection on, 135–138<br />
Suspension antibody microarrays, 247–248<br />
bead-based multiplex assays processing,<br />
252–256<br />
limit of detection (LOD), 257<br />
miniaturized multiplexed protein assays,<br />
analytical performance, 256–259<br />
pattern generation, 259–260<br />
principle of, 249<br />
production, coupling to carboxylated<br />
microspheres, 249–252<br />
SVMs. See Support vector machines<br />
TAAs arrays. See Tumor-associated antigen arrays<br />
TCA/acetone precipitation, 2DE and, 64<br />
Tissue sample collection, for proteomics analysis<br />
formalin fixation, 43–44<br />
hematoxylin staining, 47–48<br />
immunocapture procedure, 46<br />
immunofluorescence staining, 48<br />
laser-capture microdissection (LCM), 44–45<br />
AutoPix™, 48<br />
PixCell II instrument, 48–49<br />
Veritas™, 48<br />
LCM lysate, 49–50<br />
SELDI-TOF-MS, 46<br />
Tributylphosphine (TBP), 68<br />
Trichloroacetic acid (TCA) precipitation, 143–144,<br />
146–147, 151<br />
Trifluoroacetic acid (TFA), 182<br />
Tris buffer, 277<br />
TTEST (T-tests), 259<br />
Tumor-associated antigen arrays, 266, 269<br />
Two-dimensional electrophoresis (2DE), 11,<br />
194, 328<br />
biological replicates, 329–330<br />
LCM and, 162–163<br />
for protein profiling of human plasma<br />
samples, 57<br />
coomassie brilliant blue G-250 staining, 68<br />
destaining, in-gel deglycosylation and in-gel<br />
tryptic digestion, 61–62, 69<br />
2D gels preparation and analysis, 61, 67–69<br />
difference in gel electrophoresis (DIGE)<br />
system, 59<br />
free flow electrophoresis (FFE), samples<br />
fractionation by, 60–61, 67<br />
high-abundance proteins depletion, by<br />
immunoaffinity column, 59, 63–64<br />
HPPP, 58<br />
IPG gel strip rehydration, 64–65<br />
isoelectric focusing (IEF), with IPG strip,<br />
60, 65<br />
MALDI plating and peptides desalting, 62,<br />
69–71<br />
mass spectrometry (MS), 58–59<br />
microscale solution isoelectric focusing,<br />
ZOOM®, 60, 65–66<br />
peptide mass fingerprinting, MALDI-TOF and,<br />
62, 71<br />
PTMs profiling, on selected spots, 71–72<br />
samples preparation, 59, 62<br />
TCA/acetone precipitation, 64<br />
technical replicates, 329–330<br />
for urine protein profiling. See Urine protein<br />
profiling, by 2DE and MALDI-TOF-MS<br />
Two-dimensional fluorescence difference gel<br />
electrophoresis (2-D DIGE), 78. See also<br />
Difference Gel electrophoresis (DIGE)<br />
technology<br />
Two-dimensional liquid chromatography tandem<br />
mass spectrometry (2D-LC-MS/MS), 78, 170.<br />
See also Liquid chromatography-tandem mass<br />
spectrometry<br />
for HCC and non-HCC hepatocytes isolation,<br />
195–197, 201–202<br />
Two-dimensional polyacrylamide gel<br />
electrophoresis (2D PAGE),<br />
162–163, 174. See also 2D gel electrophoresis,<br />
2D gels<br />
Two-factor ANOVA (TFA), 259<br />
Ultrafiltration technique, 144<br />
Urine protein profiling, by 2DE and<br />
MALDI-TOF-MS, 141–142<br />
analytical/profiling techniques, 145–146<br />
organic solvent precipitation protocol, 145,<br />
147–148<br />
protein precipitation, 143–144<br />
TCA/acetone precipitation protocol, 145–147<br />
ultrafiltration-SPE, 144–145, 148–149<br />
urine SPE, 149<br />
Veritas™, 48. See also Laser-capture<br />
microdissection<br />
Web-based tools, for protein classification, 349<br />
BLAST, 352, 358<br />
dot-plot style alignment, of protein sequence,<br />
358–359<br />
EnsEmbl, 352, 356<br />
evolution-based classification schemes, 351<br />
ExPASy, 352<br />
expressed sequence tags (ESTs), 357<br />
GeneScan program, 356<br />
InterPro, 352, 361<br />
MEROPS, 361<br />
metalloproteins, 350<br />
PDB, 352, 360–361<br />
Pfam, 352, 360<br />
PRINTS, 361<br />
PROSITE, 361<br />
sequence and structure of proteins and, 352–356<br />
SMART, 360<br />
Western blotting protocols, 275<br />
XIC. See Extracted ion chromatogram<br />
ZOOM®, MicroSol-IEF, 60, 65–66<br />
Zoom scan triple-play experiment, 214