
Clinical Proteomics

METHODS IN MOLECULAR BIOLOGY™

John M. Walker, SERIES EDITOR 

447. Alcohol: Methods and Protocols, edited by Laura E. Nagy, 2008
446. Post-translational Modification of Proteins: Tools for Functional Proteomics, Second Edition, edited by Christoph Kannicht, 2008
443. Molecular Modeling of Proteins, edited by Andreas Kukol, 2008
439. Genomics Protocols: Second Edition, edited by Mike Starkey and Ramnath Elaswarapu, 2008
438. Neural Stem Cells: Methods and Protocols, Second Edition, edited by Leslie P. Weiner, 2008
437. Drug Delivery Systems, edited by Kewal K. Jain, 2008
436. Avian Influenza Virus, edited by Erica Spackman, 2008
435. Chromosomal Mutagenesis, edited by Greg Davis and Kevin J. Kayser, 2008
434. Gene Therapy Protocols: Volume 2: Design and Characterization of Gene Transfer Vectors, edited by Joseph M. LeDoux, 2008
433. Gene Therapy Protocols: Volume 1: Production and In Vivo Applications of Gene Transfer Vectors, edited by Joseph M. LeDoux, 2007
432. Organelle Proteomics, edited by Delphine Pflieger and Jean Rossier, 2008
431. Bacterial Pathogenesis: Methods and Protocols, edited by Frank DeLeo and Michael Otto, 2008
430. Hematopoietic Stem Cell Protocols, edited by Kevin D. Bunting, 2008
429. Molecular Beacons: Signalling Nucleic Acid Probes, Methods and Protocols, edited by Andreas Marx and Oliver Seitz, 2008
428. Clinical Proteomics: Methods and Protocols, edited by Antonia Vlahou, 2008
427. Plant Embryogenesis, edited by Maria Fernanda Suarez and Peter Bozhkov, 2008
426. Structural Proteomics: High-Throughput Methods, edited by Bostjan Kobe, Mitchell Guss, and Thomas Huber, 2008
425. 2D PAGE: Volume 2: Applications and Protocols, edited by Anton Posch, 2008
424. 2D PAGE: Volume 1: Sample Preparation and Pre-Fractionation, edited by Anton Posch, 2008
423. Electroporation Protocols, edited by Shulin Li, 2008
422. Phylogenomics, edited by William J. Murphy, 2008
421. Affinity Chromatography: Methods and Protocols, Second Edition, edited by Michael Zachariou, 2008
420. Drosophila: Methods and Protocols, edited by Christian Dahmann, 2008
419. Post-Transcriptional Gene Regulation, edited by Jeffrey Wilusz, 2008
418. Avidin-Biotin Interactions: Methods and Applications, edited by Robert J. McMahon, 2008
417. Tissue Engineering, Second Edition, edited by Hannsjörg Hauser and Martin Fussenegger, 2007
416. Gene Essentiality: Protocols and Bioinformatics, edited by Svetlana Gerdes and Andrei L. Osterman, 2008
415. Innate Immunity, edited by Jonathan Ewbank and Eric Vivier, 2007
414. Apoptosis in Cancer: Methods and Protocols, edited by Gil Mor and Ayesha Alvero, 2008
413. Protein Structure Prediction, Second Edition, edited by Mohammed Zaki and Chris Bystroff, 2008
412. Neutrophil Methods and Protocols, edited by Mark T. Quinn, Frank R. DeLeo, and Gary M. Bokoch, 2007
411. Reporter Genes for Mammalian Systems, edited by Don Anson, 2007
410. Environmental Genomics, edited by Cristofre C. Martin, 2007
409. Immunoinformatics: Predicting Immunogenicity In Silico, edited by Darren R. Flower, 2007
408. Gene Function Analysis, edited by Michael Ochs, 2007
407. Stem Cell Assays, edited by Mohan C. Vemuri, 2007
406. Plant Bioinformatics: Methods and Protocols, edited by David Edwards, 2007
405. Telomerase Inhibition: Strategies and Protocols, edited by Lucy Andrews and Trygve O. Tollefsbol, 2007
404. Topics in Biostatistics, edited by Walter T. Ambrosius, 2007
403. Patch-Clamp Methods and Protocols, edited by Peter Molnar and James J. Hickman, 2007
402. PCR Primer Design, edited by Anton Yuryev, 2007
401. Neuroinformatics, edited by Chiquito J. Crasto, 2007
400. Methods in Membrane Lipids, edited by Alex Dopico, 2007
399. Neuroprotection Methods and Protocols, edited by Tiziana Borsello, 2007
398. Lipid Rafts, edited by Thomas J. McIntosh, 2007
397. Hedgehog Signaling Protocols, edited by Jamila I. Horabin, 2007
396. Comparative Genomics, Volume 2, edited by Nicholas H. Bergman, 2007
395. Comparative Genomics, Volume 1, edited by Nicholas H. Bergman, 2007
394. Salmonella: Methods and Protocols, edited by Heide Schatten and Abraham Eisenstark, 2007
393. Plant Secondary Metabolites, edited by Harinder P. S. Makkar, P. Siddhuraju, and Klaus Becker, 2007
392. Molecular Motors: Methods and Protocols, edited by Ann O. Sperry, 2007
391. MRSA Protocols, edited by Yinduo Ji, 2007
390. Protein Targeting Protocols, Second Edition, edited by Mark van der Giezen, 2007
389. Pichia Protocols, Second Edition, edited by James M. Cregg, 2007
388. Baculovirus and Insect Cell Expression Protocols, Second Edition, edited by David W. Murhammer, 2007
387. Serial Analysis of Gene Expression (SAGE): Digital Gene Expression Profiling, edited by Kare Lehmann Nielsen, 2007
386. Peptide Characterization and Application Protocols, edited by Gregg B. Fields, 2007
385. Microchip-Based Assay Systems: Methods and Applications, edited by Pierre N. Floriano, 2007

METHODS IN MOLECULAR BIOLOGY™

Clinical Proteomics 

Methods and Protocols 

Edited by 

Antonia Vlahou 

Biomedical Research Foundation, 

Academy of Athens, Athens, Greece

Editor

Antonia Vlahou
Biomedical Research Foundation
Academy of Athens
Athens 115 27, Greece
e-mail: vlahoua@bioacademy.gr

Series Editor 

John M. Walker 

School of Life Sciences 

University of Hertfordshire 

Hatfield, Herts., AL10 9AB 

UK 

ISBN: 978-1-58829-837-9 e-ISBN: 978-1-59745-117-8 

Library of Congress Control Number: 2007939413 

©2008 Humana Press, a part of Springer Science+Business Media, LLC 

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, 999 Riverview Drive, Suite 208, Totowa, NJ 07512 USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper 

987654321 

springer.com

Preface

Clinical proteomics has evolved rapidly over the past few years and continues to grow as new methodologies and technologies emerge. In this volume, leading researchers in the field have contributed their state-of-the-art methodologies for protein profiling and for the identification of disease biomarkers in tissues, microdissected cells, and body fluids. Experimental approaches involving two-dimensional electrophoresis, multidimensional liquid chromatography, SELDI/MALDI mass spectrometry, and protein arrays are described, as are the bioinformatics and statistical tools pertinent to the analysis of proteomics data. As stated in the introductory chapter by Prof. Paik, the Vice President of the Human Proteome Organization, “clinical proteomics needs the integration of biochemistry, pathology, analytical technology, bioinformatics, and proteome informatics to develop highly sensitive diagnostic tools for routine clinical care in the future.” The multidisciplinary character of clinical proteomics is evident in the detailed step-by-step protocols described in this volume, making them potentially useful to a wide range of researchers, including clinicians, molecular biologists, chemists, bioinformaticians, and computational biologists.

Acknowledgments

The editor gratefully acknowledges all contributing authors for their collaboration, which made this project possible and brought it to fruition; the series editor, Prof. John Walker, whose help and guidance have been instrumental; and Mr. Patrick Marton, Mr. David Casey, and the whole production team at Humana, headed by the late Mr. Tom Lanigan, for the excellent production of this book.

Contents

Preface
Acknowledgments
Contributors

1. Overview and Introduction to Clinical Proteomics
Young-Ki Paik, Hoguen Kim, Eun-Young Lee, Min-Seok Kwon, and Sang Yun Cho

Part I: Specimen Collection for Clinical Proteomics

2. Specimen Collection and Handling: Standardization of Blood Sample Collection
Harald Tammen

3. Tissue Sample Collection for Proteomics Analysis
Jose I. Diaz, Lisa H. Cazares, and O. John Semmes

Part II: Clinical Proteomics by 2DE and Direct MALDI/SELDI MS Profiling

4. Protein Profiling of Human Plasma Samples by Two-Dimensional Electrophoresis
Sang Yun Cho, Eun-Young Lee, Hye-Young Kim, Min-Jung Kang, Hyoung-Joo Lee, Hoguen Kim, and Young-Ki Paik

5. Analysis of Laser Capture Microdissected Cells by 2-Dimensional Gel Electrophoresis
Daohai Zhang and Evelyn Siew-Chuan Koay

6. Optimizing the Difference Gel Electrophoresis (DIGE) Technology
David B. Friedman and Kathryn S. Lilley

7. MALDI/SELDI Protein Profiling of Serum for the Identification of Cancer Biomarkers
Lisa H. Cazares, Jose I. Diaz, Rick R. Drake, and O. John Semmes

8. Urine Sample Preparation and Protein Profiling by Two-Dimensional Electrophoresis and Matrix-Assisted Laser Desorption Ionization Time of Flight Mass Spectroscopy
Panagiotis G. Zerefos and Antonia Vlahou

9. Combining Laser Capture Microdissection and Proteomics Techniques
Dana Mustafa, Johan M. Kros, and Theo Luider

Part III: Clinical Proteomics by LC-MS Approaches

10. Comparison of Protein Expression by Isotope-Coded Affinity Tag Labeling
Zhen Xiao and Timothy D. Veenstra

11. Analysis of Microdissected Cells by Two-Dimensional LC-MS Approaches
Chen Li, Yi-Hong, Ye-Xiong Tan, Jian-Hua Ai, Hu Zhou, Su-Jun Li, Lei Zhang, Qi-Chang Xia, Jia-Rui Wu, Hong-Yang Wang, and Rong Zeng

12. Label-Free LC-MS Method for the Identification of Biomarkers
Richard E. Higgs, Michael D. Knierman, Valentina Gelfanova, Jon P. Butler, and John E. Hale

13. Analysis of the Extracellular Matrix and Secreted Vesicle Proteomes by Mass Spectrometry
Zhen Xiao, Thomas P. Conrads, George R. Beck, Jr., and Timothy D. Veenstra

Part IV: Clinical Proteomics and Antibody Arrays

14. Miniaturized Parallelized Sandwich Immunoassays
Hsin-Yun Hsu, Silke Wittemann, and Thomas O. Joos

15. Dissecting Cancer Serum Protein Profiles Using Antibody Arrays
Marta Sanchez-Carbayo

Part V: Statistics and Bioinformatics in Clinical Proteomics Data Analysis

16. 2D-PAGE Maps Analysis
Emilio Marengo, Elisa Robotti, and Marco Bobba

17. Finding the Significant Markers: Statistical Analysis of Proteomic Data
Sebastien Christian Carpentier, Bart Panis, Rony Swennen, and Jeroen Lammertyn

18. Web-Based Tools for Protein Classification
Costas D. Paliakasis, Ioannis Michalopoulos, and Sophia Kossida

19. Open-Source Platform for the Analysis of Liquid Chromatography-Mass Spectrometry (LC-MS) Data
Matthew Fitzgibbon, Wendy Law, Damon May, Andrea Detter, and Martin McIntosh

20. Pattern Recognition Approaches for Classifying Proteomic Mass Spectra of Biofluids
Ray L. Somorjai

Index

Contributors

Jian-Hua Ai • Eastern Hepatobiliary Surgery Hospital, Shanghai, China
George R. Beck, Jr. • Division of Endocrinology, Metabolism and Lipids, Emory University School of Medicine, Atlanta, GA
Marco Bobba • University of Eastern Piedmont, Department of Environmental and Life Sciences, Alessandria, Italy
Jon P. Butler • Lilly Corporate Center, Indianapolis, IN
Sebastien Christian Carpentier • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium
Lisa H. Cazares • The George L. Wright Jr. Center for Biomedical Proteomics, Eastern Virginia Medical School, Norfolk, VA
Sang Yun Cho • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Thomas P. Conrads • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD
Andrea Detter • Fred Hutchinson Cancer Research Center, Seattle, WA
Jose I. Diaz • Cancer Therapy Research Center’s Institute for Drug Development, University of Texas Health Science Center, San Antonio, TX
Rick R. Drake • Eastern Virginia Medical School, Norfolk, VA
Matthew Fitzgibbon • Fred Hutchinson Cancer Research Center, Seattle, WA
David B. Friedman • Proteomics Laboratory, Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN
Valentina Gelfanova • Lilly Corporate Center, Indianapolis, IN
John E. Hale • Lilly Corporate Center, Indianapolis, IN
Richard E. Higgs • Lilly Corporate Center, Indianapolis, IN
Yi-Hong • Eastern Hepatobiliary Surgery Hospital, Shanghai, China
Hsin-Yun Hsu • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany
Thomas O. Joos • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany
Min-Jung Kang • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Hoguen Kim • Department of Pathology, College of Medicine, Yonsei University, Seoul, Korea
Hye-Young Kim • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Michael D. Knierman • Lilly Corporate Center, Indianapolis, IN
Evelyn Siew-Chuan Koay • Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore, and Molecular Diagnosis Center, Department of Laboratory Medicine, National University Hospital, Singapore
Sophia Kossida • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece
Johan M. Kros • Department of Pathology, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
Min-Seok Kwon • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Jeroen Lammertyn • Faculty of Bioscience Engineering, Division of Mechatronics, Biostatistics and Sensors, K.U. Leuven, Leuven, Belgium
Wendy Law • Fred Hutchinson Cancer Research Center, Seattle, WA
Eun-Young Lee • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Hyoung-Joo Lee • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Chen Li • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Su-Jun Li • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Kathryn S. Lilley • Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, United Kingdom
Theo Luider • Laboratories of Neuro-Oncology/Clinical and Cancer Proteomics, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
Emilio Marengo • Department of Environmental and Life Sciences, University of Eastern Piedmont, Alessandria, Italy
Damon May • Fred Hutchinson Cancer Research Center, Seattle, WA
Martin McIntosh • Fred Hutchinson Cancer Research Center, Seattle, WA
Ioannis Michalopoulos • Biomedical Research Foundation, Academy of Athens, Athens, Greece
Dana Mustafa • Department of Pathology, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
Young-Ki Paik • Department of Biochemistry, Yonsei Proteome Research Center & Biomedical Proteome Research Center, Seoul, Korea
Costas D. Paliakasis • Biomedical Research Foundation, Academy of Athens, Athens, Greece
Bart Panis • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium
Elisa Robotti • Department of Environmental and Life Sciences, University of Eastern Piedmont, Alessandria, Italy
Marta Sanchez-Carbayo • Tumor Markers Group, Spanish National Cancer Center (CNIO), Madrid, Spain
O. John Semmes • The George L. Wright Jr. Center for Biomedical Proteomics, Eastern Virginia Medical School, Norfolk, VA
Ray L. Somorjai • Biomedical Informatics, Institute for Biodiagnostics, National Research Council, Winnipeg, Manitoba, Canada
Rony Swennen • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium
Harald Tammen • Digilab BioVisioN GmbH, Hannover, Germany
Ye-Xiong Tan • Eastern Hepatobiliary Surgery Hospital, Shanghai, China
Timothy D. Veenstra • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD
Antonia Vlahou • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece
Hong-Yang Wang • Eastern Hepatobiliary Surgery Hospital, Shanghai, China
Silke Wittemann • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany
Jia-Rui Wu • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Qi-Chang Xia • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Zhen Xiao • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD
Rong Zeng • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Panagiotis G. Zerefos • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece
Daohai Zhang • Molecular Diagnosis Center, Department of Laboratory Medicine, National University Hospital, Singapore, and Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Lei Zhang • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Hu Zhou • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

1 

Overview and Introduction to Clinical Proteomics 

Young-Ki Paik, Hoguen Kim, Eun-Young Lee, Min-Seok Kwon, and Sang Yun Cho

Summary 

As the field of clinical proteomics progresses, the discovery of disease biomarkers becomes paramount. However, the immediate challenges are to establish standard operating procedures for both clinical specimen handling and the reduction of sample complexity, and to increase the ability to detect proteins and peptides present in low amounts. The traditional concept of a disease biomarker is shifting toward a new paradigm, namely, that an ensemble of proteins or peptides would be more efficient than a single protein or peptide in the diagnosis of disease. Because clinical proteomics usually requires easy access to well-defined fresh clinical specimens (including morphologically consistent tissue and properly pretreated body fluids of sufficient quantity), biorepository systems need to be established. Here, we address these issues and emphasize the necessity of developing various microdissection techniques for tissue specimens, multidimensional fractionation for body fluids, and other related techniques (including bioinformatics), tools that could become integral parts of clinical proteomics for disease biomarker discovery.

Key Words: biomarker; body fluids; clinical proteomics; translational proteomics; depletion; biorepository; multidimensional fractionation; specimen bank; biomarker panel.

Abbreviations: CSF: Cerebrospinal Fluid; SILAC: Stable Isotope Labeling with Amino acids in Cell culture; FFE: Free Flow Electrophoresis; IMAC: Immobilized Metal Affinity Chromatography; 2DE: Two-dimensional Gel Electrophoresis; CBB: Coomassie Brilliant Blue; SELDI: Surface-Enhanced Laser Desorption/Ionization; MALDI: Matrix-Assisted Laser Desorption/Ionization; MDLC: Multidimensional Liquid Chromatography; LC: Liquid Chromatography; TOF: Time-of-Flight; CID: Collision-Induced Dissociation; ETD: Electron Transfer Dissociation; LIT: Linear Ion Trap; FT: Fourier Transform; Q: Quadrupole; ELISA: Enzyme-Linked Immunosorbent Assay; SISCAPA: Stable Isotope Standards with Capture by Anti-Peptide Antibody; AQUA: Absolute Quantitative Analysis. Commercial brands are also shown: MARS: Multiple Affinity Removal System (Agilent, Palo Alto, CA, USA); Enchant™: Enchant™ Multi-protein Affinity Separation Kit (Pall Life Sciences, Ann Arbor, MI, USA); Gradiflow™: Gradiflow™ Separation (Life Bioprocess, Frenchs Forest, Australia); FFE™: BD Free Flow Electrophoresis System (BD Diagnostics, Martinsried/Planegg, Germany); Zoom®: Zoom® Benchtop Proteomics System (Invitrogen Corporation, Carlsbad, CA, USA); Rotofor: Bio-Rad Rotofor® Prep IEF Cell (Bio-Rad, Hercules, CA, USA); PF2D: ProteomeLab™ PF2D Protein Fractionation System (Beckman Coulter, Inc., Fullerton, CA, USA); DIGE: Ettan™ DIGE System (GE Healthcare Bio-Sciences AB, Uppsala, Sweden); Deep Purple™: Deep Purple™ Total Protein Stain (GE Healthcare Bio-Sciences AB, Uppsala, Sweden); ICAT™: Isotope-Coded Affinity Tags (Applied Biosystems, Foster City, CA, USA); iTRAQ™: iTRAQ™ Reagents (Applied Biosystems, Foster City, CA, USA); Q-TRAP™ (Applied Biosystems, Foster City, CA, USA).

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols
Edited by: A. Vlahou © Humana Press, Totowa, NJ

1. Overview and Scope of Clinical Proteomics 

Clinical proteomics can be defined as the comprehensive qualitative and quantitative profiling of proteins (and peptides) present in clinical specimens such as body fluids and tissues. Comparing specimens from healthy and diseased individuals may lead to the discovery of a disease biomarker (1). Such a biomarker serves as a molecular signature reflecting the stage of disease before or after treatment, and it can also be used for prognosis and for monitoring the response to treatment (2). Clinical proteomics consists of a variety of experimental processes, including the collection of well-phenotyped clinical specimens, the analysis of proteins or peptides of interest, data interpretation, and the validation of proteomics data in a clinical context (Fig. 1). After a few disease biomarker candidates have been identified through extensive profiling, translational proteomics, involving validation in a cohort study, follows. Even after proper identification and verification of a disease biomarker, it takes a long time to prove that the biomarker is applicable to clinical diagnosis or prognosis (3,4).

Fig. 1. Clinical and translational proteomics. The key components of experimental methods are included in each box.

There has been a remarkable increase in the number of clinical proteomics papers published within a short period of time [more than 800 papers in 2006 (Fig. 2)], coinciding with the rapid growth of proteomics. Reflecting this trend, this chapter reviews the core technologies used in clinical proteomics with respect to specimen processing, protein separation platforms (e.g., gel-based or liquid-based methods), quantitative labeling, mass spectrometry (MS), and proteome informatics tools. It is noteworthy that, despite the advent of new technologies, several bottlenecks remain in the proteomics field, such as the lack of dataset standardization, the quantification of proteins of interest, the verification of identified proteins or peptides, and the lack of an overall strategy for handling biomarker candidates after identification. Thus, the pace of biomarker discovery, one of the key agendas of clinical proteomics, will depend on how well technical advances resolve these bottlenecks (4). The following sections address these issues in the context of clinical proteomics.

Fig. 2. Recent trends in clinical proteomics publications. The distribution of articles related to clinical proteomics listed in PubMed is shown. The following query was used to search for articles: (clinical[All Fields] OR ((“biological markers”[TIAB] NOT Medline[SB]) OR “biological markers”[MeSH Terms] OR biomarker[Text Word])) AND (“proteomics”[MeSH Terms] OR proteomics[Text Word] OR proteomic[All Fields] OR “proteome”[MeSH Terms] OR proteome[Text Word]).
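Readers who want to re-run the Fig. 2 search can submit the query to PubMed programmatically. The sketch below is an illustration and assumes the public NCBI E-utilities ESearch endpoint with its `db`, `term`, `datetype`, `mindate`, `maxdate`, `rettype=count`, and `retmode=json` parameters. The function names `build_esearch_url` and `count_records` are hypothetical helpers. Counts returned today will not match Fig. 2, because PubMed indexing and MeSH mappings have changed since the figure was prepared.

```python
import json
import urllib.parse
import urllib.request

# Query transcribed from the Fig. 2 legend (curly quotes replaced by straight quotes).
FIG2_QUERY = (
    '(clinical[All Fields] OR (("biological markers"[TIAB] NOT Medline[SB]) '
    'OR "biological markers"[MeSH Terms] OR biomarker[Text Word])) '
    'AND ("proteomics"[MeSH Terms] OR proteomics[Text Word] '
    'OR proteomic[All Fields] OR "proteome"[MeSH Terms] OR proteome[Text Word])'
)

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(year: int, query: str = FIG2_QUERY) -> str:
    """Build an ESearch URL that counts PubMed records published in one year."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",   # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",   # ask only for the number of hits
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def count_records(year: int, query: str = FIG2_QUERY) -> int:
    """Fetch the hit count for one year (requires network access)."""
    with urllib.request.urlopen(build_esearch_url(year, query), timeout=30) as resp:
        payload = json.load(resp)
    return int(payload["esearchresult"]["count"])
```

For example, `count_records(2006)` returns the current count for a single year. Looping over a range of years reproduces the shape of a Fig. 2-style histogram. Without an API key, NCBI limits clients to a few requests per second, so a short pause between calls is advisable.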


2. Sample Specimens and Processing Techniques Used for Clinical Proteomics

2.1. General Considerations 

Because clinical proteomics relies heavily on patient specimens, three important factors need to be considered before clinical specimens are selected and prepared: (1) selection of the correct clinical samples according to the type of research, (2) isolation of the appropriate component from the clinical samples, and (3) establishment of optimal experimental conditions for each sample (5,6,7,8). When selecting clinical samples, the relationship between the samples and the specific disease should also be considered. For example, although cancer tissue represents a specific cancer, several types of body fluids from patients may also be related to the cancer. If the selected clinical samples specifically represent the disease, the next step is to evaluate which components are related to the specific disease. For instance, tumor cells in cancerous tissues are surrounded by many types of stromal cells, inflammatory cells, and connective tissues, all of which contribute directly to the changes in protein expression observed in the cancer. If the purpose of proteomic analysis is to identify characteristic changes of specific proteins in tumor cells, it appears necessary to determine the tumor cell percentage precisely, and this percentage can be increased by tissue microdissection (5,6,7). As specimen conditions directly affect the results of biomarker discovery, well-defined clinical specimens should be used, since disease biomarkers are much easier to discover when the samples have clear anatomical and pathophysiological definitions. Because clinical specimens are heterogeneous, sophisticated pathological discrimination is required to isolate specific diseased tissue or body fluids. Without the expertise of a pathologist at the earliest stage, it may be difficult to isolate a specifically defined specimen for clinical proteomics.

Generally, clinical samples contain variable factors and components originating from the microenvironment of specific tissues. For instance, liver tissues usually contain a large amount of blood in the sinusoids, and this amount is increased in tissues with dilated sinusoids (9). Lung tissues usually contain deposited exogenous materials, and this amount is increased in heavy smokers (10). Note that the amount of blood present in isolated tissues may directly influence the relative proportion of proteins found in clinical specimens. Deposited materials and other chemicals, such as stain dyes and fixatives used in microdissection, may also influence the experimental conditions (11). In the analysis of clinical samples, suitable buffer conditions, minimal lysis time, and high-yield protein precipitation are highly recommended. To avoid substantial variation between experiments, a large set of clinical specimens is also necessary because, unlike cultured cell lines, clinical specimens show high component variability (12). More details on specific disease types are described throughout this volume.
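Why tumor cell percentage matters can be illustrated with a deliberately simplified linear mixing model. This model is an assumption for illustration, not a method from this chapter. Suppose a protein changes k-fold only in tumor cells, which make up a fraction p of the sample, and is unchanged in all other cells. The fold change observed in the bulk lysate is then p × k + (1 - p). The sketch also assumes, hypothetically, that every cell type contributes the same amount of total protein. The function name `observed_fold_change` is illustrative.

```python
def observed_fold_change(tumor_fraction: float, tumor_fold_change: float) -> float:
    """Return the bulk-sample fold change under a simple linear mixing model.

    Illustrative assumptions (not from the chapter): every cell type contributes
    the same amount of total protein, and the protein is unchanged (fold
    change 1.0) in all non-tumor cells of the sample.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    # Tumor cells contribute the changed level; the remaining cells contribute baseline.
    return tumor_fraction * tumor_fold_change + (1.0 - tumor_fraction) * 1.0


# Example: a hypothetical 4-fold tumor-specific change at increasing tumor purity.
for purity in (0.3, 0.6, 0.9):
    print(f"tumor cells {purity:.0%}: observed fold change "
          f"{observed_fold_change(purity, 4.0):.2f}")
```

Under these assumptions, a 4-fold tumor-specific increase appears as only a 1.9-fold change when tumor cells make up 30% of the sample. This dilution is one reason why enriching tumor cells by microdissection can help.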

2.2. Body Fluids 

A survey of the literature shows five to six different types of clinical specimens. Body fluids (e.g., plasma, urine, tears, cerebrospinal fluid, lymph, and ascites), tissues (e.g., liver, heart, muscle, brain, and lung), cells, bone, and hair have all been used for clinical proteomics (Table 1) (13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33). Each has its own merits and limitations for biomarker discovery via proteomic analysis. Among these specimens, the number of publications using body fluids has increased recently, perhaps because body fluids are convenient and easy to use for noninvasive diagnosis. Because proteins secreted into body fluids during or after disease may reflect a broad range of pathophysiological conditions, much emphasis has been placed on identifying prominent protein/peptide biomarkers that are differentially expressed at different stages. In the literature, the terms “body fluids” and “biofluids” are used interchangeably. However, the former usually refers to samples obtained directly from patients, whereas the latter is applied more broadly to liquid or liquid-like samples obtained from living organisms, including model animals and plants. Throughout this chapter we use “body fluids” for clarity.

Given the large dynamic range of protein and peptide sources, plasma (a complex liquid interface between tissues) and extracellular fluids may be the best body fluids to use for clinical proteomics and biomarker discovery (34,35,36,37,38). In addition to plasma, more than a dozen other body fluids, ranging from urine to peritoneal fluid, are currently used for biomarker discovery (Table 1). However, the biggest challenge in body fluid proteomics may be the multiple pretreatment steps required before analysis. These steps include depletion of high-abundance proteins (in the case of plasma) (34,35,36) and/or protein enrichment (in the case of urine) (15,39) (Table 1). Thus, the outcome of clinical proteomics may depend on proper sample processing, since the quality of selection and handling of the most specific type of specimen will affect the overall profiling pattern. Because the details of body fluid proteomics have been well described by Hu et al. (38), we focus here on only a few essential points.

First, standard measures need to be introduced to protect specimens from nonspecific proteolysis, lysis, and modification during collection and preparation (11). For the standardization of blood sample collection, Tammen emphasizes many useful considerations regarding preanalytical variables in plasma proteomics, which can be applied to processes involving blood specimens [(40) and see Chapter 2]. The more specific problems involved in sample

Table 1 

Types of Biological Specimens Used in Clinical Proteomics 

Type Disease Reference Characteristics of the 

samples 

Fluid Secretions Plasma/serum (13,14) • Routinely accessible 

body fluids 

• Very important in the 

discovery of biomarkers 

Urine 

Nasal discharge Tears 

Saliva 

Amniotic-/cervical fluid 

Prostate cancer 

Seasonal allergic rhinitis Blepharitis and dry eye Oral and breast cancer Fetal aneuploidy 

and intra-amniotic 

inflammation 

(15) (16) (17,18) (19) (20,21) of diseases (systemic 

vs. organ specific/local) 

• Important for early 

detection, disease 

severity, prognosis, 

monitoring of response 

to therapy 

Proximal 

fluid 

Body 

cavity 

fluid 

Follicular fluid Recurrent spontaneous 

abortion 

Male infertility 

Breast cancer 

Brain tumor 

Seminal fluid 

Nipple aspirate 

fluid 

Cerebrospinal 

fluid 

Synovial fluid 

Ascites 

Bronchial lavage 

fluid 

Pleural fluid 

Peritoneal fluid 

Rheumatoid arthritis 

Ovarian cancer 

Chronic obstructive 

pulmonary disease, 

asthmatics and lung 

disease 

Lung cancer 

Ovarian cancer 

(22) 

(23) 

(24) 

(25) 

(26) 

(13) 

(27,28) 

(29) 

(14) 

• Can reflect disease 

perturbations in the 

organs or tissues from 

which they are secreted 

• Procedure of synovial 

biopsy is not very 

difficult 

Pretreatment required 

for proteomics 

• Considerations for 

sample adequacy 

– Storage 

– Hemolysis 

– Influence of 

anticoagulants 

–Consistent results 

• Consider whether to 

pool samples or analyze 

individual samples 

• Depletion of 

high-abundance proteins 

(Albumin consist of 

50% of plasma proteins) 

• Mucosa and salt 

have to be removed 

necessarily 

6

Tissue LCM or 

LMPC 

isolated 

Formalin 

fixed 

Paraffin 

embedded 

Any type of disease (30) • Very important for the 

development of novel 

in situ biomarkers 

• Immunofluorescence, 

immunocytochemistry, 

imaging mass 

spectrometry 

Cell Cell lines 

or 

primary 

tissue 

culture 

Any type of disease (31) • Very important in the 

discovery of biomarker 

candidates 

• Validation should be 

performed using 

primary tumor samples 

(e.g., immunohistologic 

methods, imaging MS) 

Bone Cartilage Rheumatoid arthritis (32) • Cartilage consists 

mainly of extracellular 

matrix, mostly made 

of collagens and 

proteoglycans 

Hair (33) • Over 300 proteins 

were found to constitute 

the insoluble complex 

formed by 

transglutaminase 

crosslinking 

• Considerations for 

sample adequacy 

• Integrity, degradation 

of protein 

• Contamination 

(microorganisms, 

extraneous material) 

• Desalting and removal 

of media component 

• Cetylpyridinium 

chloride effectively 

aggregate with 

proteoglycan 

• Need to sufficient 

extraction of protein 

from insoluble complex 

7

8 Paik et al. 

handling are also addressed by Rai et al. (41). Second, to increase the dynamic 

range of detection and reduce sample heterogeneity, pretreatments such as 

depletion of high-abundance proteins appear to be required (34,35,36). In 

addition, many pretreatment steps to remove high-abundance proteins may be 

required during initial sample processing. Multiple fractionations of clinical 

samples prior to major separation work would reduce the sample complexity. 

Note that coremoval of low-abundance proteins during this type of multiple 

depletion (36,42) and modification of proteins of interest during or after 

isolation (43) should be considered as well. For several problems encountered 

with specimen collection, Xiao et al. (Chapter 13) in this volume also describe 

different methods to isolate extra cellular matrix (ECM) and analyze the 

proteome of secreted vesicles. These methods will be useful for studying ECM 

and secreted vesicles in various samples ranging from the primary cultured 

cells to tissue specimens. Therefore, one must consider the best options for this 

process before doing the main experiment. 
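The benefit of depleting high-abundance proteins can be put into rough numbers. The Python sketch below rests on an idealized assumption that the source does not state: depletion removes exactly a mass fraction f of total protein, nothing else is co-removed, and the same total protein amount is loaded before and after depletion. Under that assumption every remaining protein is concentrated by a factor of 1/(1 − f). Removing albumin alone (about 50% of plasma protein, Table 1) would therefore roughly double the load of everything else. The function names and the 90% depletion figure are hypothetical.

```python
def enrichment_factor(depleted_fraction: float) -> float:
    """Fold increase in the share of each remaining protein after depletion.

    Idealized model (an assumption): a mass fraction `depleted_fraction` of
    total protein is removed, no other protein is co-removed, and the same
    total protein mass is loaded afterwards.
    """
    if not 0.0 <= depleted_fraction < 1.0:
        raise ValueError("depleted_fraction must be in [0, 1)")
    return 1.0 / (1.0 - depleted_fraction)


def loaded_amount_after_depletion(protein_share: float, depleted_fraction: float,
                                  total_load_ug: float) -> float:
    """Micrograms of a non-depleted protein within a fixed total load."""
    return protein_share * enrichment_factor(depleted_fraction) * total_load_ug


# Albumin alone (about 50% of plasma protein):
print(enrichment_factor(0.5))  # 2.0
# A hypothetical protein at 1 part per million of plasma protein, 100 ug total
# load, after removing an assumed 90% of the protein mass:
print(loaded_amount_after_depletion(1e-6, 0.9, 100.0))
```

In practice the gain is usually smaller. Co-removal of low-abundance proteins (36,42) breaks the assumption that nothing else is removed.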

2.3. Tissues and Other Samples 

Tissues are usually used as primary screening samples to find direct causes of disease in lesions of the affected organ, for example, liver tissue in hepatocellular carcinoma (HCC) (44,45). Tissues are widely used in clinical proteomics. However, there are no standard operating procedures for specimen fractionation, and the detection limit of current instrumentation remains borderline.

As listed in Table 1, cancer tissues can be prepared in different ways:
  • laser capture microdissection (LCM) (5,6)
  • laser microdissection and pressure catapulting (LMPC) (30,46)
  • formalin-fixed paraffin-embedded sample preparation (11)
These techniques are described in detail in Chapters 3, 5, 9, and 11 of this volume.

It is desirable, however, that proteomics studies of disease tissues be coupled with parallel analysis of the corresponding body fluids. For example, studies of cancer biomarkers have used paired cancer tissue sets (tumor vs. nontumor) together with the same patient’s plasma, which led to a more comprehensive analysis (47,48). Because of the complexity of the sample, experiments on tissue samples may be better suited to pathophysiological studies than to biomarker discovery.

In specimen processing for proteomics studies, several problems usually arise, such as artifacts created during sample collection, processing, and storage. Other issues arise in handling patient information regarding sex, age, and race (49). To minimize these problems and handle samples systematically, it is advisable to establish a specimen bank (50,51,52).

Collecting many clinical samples in a biorepository would have enormous benefits for proteomic research. It enables the selection of homogeneous clinical samples according to the research purpose and the isolation of specific components from clinical samples. Large-scale collection of clinical specimens in a biorepository is also essential for validating specific markers after biomarker candidate discovery. Ideally, the clinical samples stored in the biorepository should be:
  (1) collected and stored immediately, because dead cells and altered proteins affect proteomic analysis;
  (2) subjected to accurate quality control; and
  (3) catalogued with reliable and secure clinical data.

Quality control of clinical samples includes trimming of specimens and confirmation of diagnosis by pathologists. The information gained should be stored in a database of clinical samples. Examples include the ratio of tumor cells to stromal cells, the percentage of necrosis, the percentage of fibrosis, and the proportion of infiltrating inflammatory cells. It is also essential to store clinical and follow-up data for each sample, together with each patient’s written informed consent form, in the biorepository network. Such a clinical specimen banking network offers convenience, reduced cost, and reliability to researchers in clinical proteomics (50,51,52).

For representative tissue sample collection in proteomics studies, Diaz et al. (Chapter 3) present a practical strategy for the storage and handling of specimens. It covers specimens used in surface-enhanced laser desorption/ionization (SELDI), 2D gel, and liquid chromatography (LC)-based proteomics. The primary responsibility of pathologists throughout the process of tissue proteomics should be emphasized, a role that now extends beyond morphological analysis to the molecular level.
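The quality-control items listed above map naturally onto a structured database record. The Python sketch below is one hypothetical way to model a banked specimen. The field names, the 30-minute collection-to-storage limit, and the check that cell-composition percentages do not exceed 100% are illustrative assumptions, not a published biorepository schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """Hypothetical biorepository record for one banked tissue specimen."""
    specimen_id: str
    patient_id: str
    sex: str
    age: int
    diagnosis_confirmed: bool       # confirmed by a pathologist
    tumor_cell_pct: float           # percentages estimated on the trimmed specimen
    stromal_cell_pct: float
    necrosis_pct: float
    fibrosis_pct: float
    inflammatory_cell_pct: float
    minutes_to_storage: float       # time from collection to freezing/storage
    consent_on_file: bool           # written informed consent stored
    follow_up: List[str] = field(default_factory=list)

    def tumor_to_stroma_ratio(self) -> Optional[float]:
        """Ratio of tumor cells to stromal cells, or None if no stroma was scored."""
        if self.stromal_cell_pct == 0:
            return None
        return self.tumor_cell_pct / self.stromal_cell_pct

    def qc_problems(self, max_minutes_to_storage: float = 30.0) -> List[str]:
        """Return QC failures; an empty list means the record passes."""
        problems = []
        if not self.diagnosis_confirmed:
            problems.append("diagnosis not confirmed by pathologist")
        if not self.consent_on_file:
            problems.append("no written informed consent on file")
        if self.minutes_to_storage > max_minutes_to_storage:
            problems.append("stored too long after collection")
        composition = (self.tumor_cell_pct + self.stromal_cell_pct + self.necrosis_pct
                       + self.fibrosis_pct + self.inflammatory_cell_pct)
        if composition > 100.0:
            problems.append("cell composition exceeds 100%")
        return problems


record = SpecimenRecord(
    specimen_id="S-001", patient_id="P-17", sex="F", age=58,
    diagnosis_confirmed=True, tumor_cell_pct=60.0, stromal_cell_pct=30.0,
    necrosis_pct=5.0, fibrosis_pct=3.0, inflammatory_cell_pct=2.0,
    minutes_to_storage=20.0, consent_on_file=True,
)
print(record.tumor_to_stroma_ratio())  # 2.0
print(record.qc_problems())            # []
```

Keeping the QC checks on the record itself means every specimen can be screened the same way before it is released for a study.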

3. Biomarker Discovery and Clinical Proteomics 

Given that one of the central issues of clinical proteomics is biomarker discovery and its application, a brief account of this subject is appropriate here. Excellent reviews of the whole arena of biomarker development can be found elsewhere (53,54,55).

Until now, a disease biomarker has conventionally been conceived as a single protein or peptide with high specificity. Such a marker is usually present in low abundance, is expressed in a disease in a stage-specific manner, and serves as a major fingerprint of the body’s response to drugs or other treatments. Although many broad biomarkers for various diseases are known (56,57,58,59,60), more specific and selective biomarkers are urgently needed. Accordingly, we may also need to change the current biomarker concept and eliminate the inherent bias toward individual disease biomarkers.

Recently, the idea has been introduced that an ensemble of different proteins would be more efficient than a single protein or peptide for diagnosing disease (61,62,63). To address this problem, we propose a general strategy of clinical proteomics leading to disease biomarker discovery, as outlined in Fig. 3.

Biomarker candidate proteins can arise from many different cellular processes. They may therefore be present in low or high abundance and may reflect the physiological condition of the body directly or indirectly. Their concentrations may also differ depending on the disease stage or tissue type. For example, common proteins such as Hsp 27 (64,65), 14-3-3 proteins (66,67), apoA-I (68,69), and serum amyloid precursor A (70) appear in most disease samples from lung cancer, gastric cancer, pancreatic cancer, prostate cancer, neuroblastoma, and inflammation.

A number of questions then arise:
  • Should these be treated as disease-specific or disease-nonspecific proteins?
  • What criterion should be used to make this decision?
  • Do they recur because the number and type of proteins secreted under the physiological conditions of many different diseases are similar?
  • How can one distinguish one type of disease from another simply by looking at protein profiles?

Fig. 3. The concept of creating a protein biomarker panel for a specific disease. Each white, gray, dark-gray, and black circle represents a putative protein biomarker of a specific disease at that clinical stage. A group of slash-lined circles symbolizes the biomarker panel of liver disease as an example.

As outlined in Fig. 3, at the onset of a disease, early-stage signals may be limited to only a few easily counted molecules. As the disease progresses, more signal molecules may be produced, resulting in mixed types of biomarkers that represent multiple disease phenomena. This model is oversimplified, but at a certain stage more noise is generated, and it becomes harder to identify the relevant molecules for two reasons:
  (1) they are present in amounts too small to be detected with current technology; and
  (2) it may be too early for the molecules to be specific for a particular disease.
Proteins appearing at stage 3 or 4 are presumably more specific for a particular disease, but their sensitivity might be low. This noise may well interfere with the signals of a given disease, and we may end up with no decisive marker.

To circumvent this problem, it may be desirable to identify a set of biomarker candidate proteins, termed a “biomarker panel.” Ideally, the panel contains candidate proteins or peptides that, as a group, represent specific stages of the disease. Such a panel can then be subjected to extensive validation in a large cohort. Following this strategy, many biomarker candidates at stage 1 can be included in the panel. The panel may then have greater specificity and sensitivity than a single-molecule biomarker. It can serve not only as a diagnostic marker but also as a prognostic indicator for monitoring treatment effectiveness.

Published examples include:
  • Head and neck cancer: Linkov et al. (61) reported that sensitivity and specificity improved to 84.5% and 98%, respectively, when a panel of 25 markers was used for the early diagnosis of head and neck cancer (squamous cell carcinoma of the head and neck).
  • Prostate cancer: specificity increased from 5–15% with a single marker to 84–95% with a biomarker panel containing six marker proteins.
  • HCC: studies have been carried out on a biomarker panel in the form of a protein array that can be used as a diagnostic kit (62,63).
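Sensitivity and specificity, the two figures quoted for these panels, are simple ratios: sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). The Python sketch below computes both. It then compares a single marker with a panel scored by a hypothetical “at least k of n markers above threshold” rule. The rule, the thresholds, and the toy measurements are invented for illustration. They are not the scoring methods or data of the studies cited above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Performance:
    sensitivity: float  # TP / (TP + FN): fraction of patients called positive
    specificity: float  # TN / (TN + FP): fraction of controls called negative


def evaluate(calls: Sequence[bool], diseased: Sequence[bool]) -> Performance:
    """Compare positive/negative calls with true disease status."""
    tp = sum(c and d for c, d in zip(calls, diseased))
    fn = sum(not c and d for c, d in zip(calls, diseased))
    tn = sum(not c and not d for c, d in zip(calls, diseased))
    fp = sum(c and not d for c, d in zip(calls, diseased))
    return Performance(tp / (tp + fn), tn / (tn + fp))


def panel_call(levels: Sequence[float], thresholds: Sequence[float], min_hits: int) -> bool:
    """Hypothetical rule: positive if at least `min_hits` markers exceed their thresholds."""
    hits = sum(level > t for level, t in zip(levels, thresholds))
    return hits >= min_hits


# Toy data: three markers per sample, four patients followed by four controls.
samples: List[List[float]] = [
    [2.0, 2.0, 0.5], [0.5, 2.0, 2.0], [2.0, 0.5, 2.0], [2.0, 2.0, 2.0],  # patients
    [2.0, 0.5, 0.5], [0.5, 0.5, 0.5], [2.0, 0.5, 0.5], [0.5, 2.0, 0.5],  # controls
]
diseased = [True] * 4 + [False] * 4
thresholds = [1.0, 1.0, 1.0]

single = evaluate([s[0] > thresholds[0] for s in samples], diseased)
panel = evaluate([panel_call(s, thresholds, min_hits=2) for s in samples], diseased)
print(single)  # Performance(sensitivity=0.75, specificity=0.5)
print(panel)   # Performance(sensitivity=1.0, specificity=1.0)
```

With real data, the thresholds and k would have to be chosen on a training cohort and then confirmed on an independent validation cohort.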

A general strategy for biomarker discovery is outlined in Fig. 4. In typical clinical proteomics work, the steps are:
  1. Sample collection.
  2. Pretreatment with various fractionation tools, which reduces sample complexity and enables the search for low-abundance proteins (e.g., disease biomarkers). The choice of multidimensional fractionation, described in detail elsewhere (34,35,36), depends on the properties and concentration of the sample.
  3. Separation of the prefractionated samples by either two-dimensional electrophoresis (2DE) or an LC-based proteomics system.
  4. Single or multiple steps of mass spectrometric analysis, depending on the sample quantity and experimental goal.

The data from this series of analyses are then integrated into the proteome informatics system. There, protein/peptide identification, quantification, characterization of modifications, and verification of peak lists are carried out [(71) and also Chapter 19]. This step usually becomes rate limiting, because the major profiling data sets are constructed and analyzed at this point. The clinical relevance of the identified proteins (and of changes in their expression levels) in a specific disease state is largely determined here, which eventually leads to the identification of biomarker candidates. SELDI, molecular imaging, and protein microarrays can also be applied before or after this step.

Once major biomarker candidates are identified, they are subjected to further verification by sophisticated analytical arrays and translational proteomics. Translational proteomics involves cohort studies, pre-evaluation, and a robust analytical system (4,72). Throughout translational proteomics, one may be able to judge whether the identified panel or single proteins are suitable biomarkers for a specific disease. A recent comprehensive review by Zolg (73) addressed several considerations in the biomarker development pipeline, from discovery to validation. Three critical challenges within the pipeline are:
  • reducing clinical sample complexity;
  • establishing proof of principle of biomarker function; and
  • the detection limit for unique proteins present in the samples.

The search for biomarker panels requires reliable statistical tools and bioinformatics resources, many of which are now available on the web (Table 2; see also Chapters 16 and 17). As the number of candidate markers in a panel increases and more cases are examined, statistical learning methods become necessary. These methods include neural networks, genetic algorithms, k-means and nearest-neighbor analysis, Euclidean distance-based nonlinear methods, fuzzy pattern matching, self-organizing maps, and support vector machines (74,75,76,77,78). They are very useful for classifying samples by disease state on the basis of their protein profiles (see also Chapters 16 and 20).

Once biomarker candidates are identified, their function must be predicted in silico and they must be validated in the context of clinical application. Table 3 lists web resources that can be used for clinical data management, in silico functional annotation (see Chapter 18), prediction, and identification of modified forms of proteins. By combining experimental methods (Fig. 4) and informatics tools (Tables 2 and 3), one can obtain a set of biomarker candidate proteins (a panel) for further validation through translational proteomics (Fig. 1).

Fig. 4. A typical experimental strategy for clinical proteomics and translational proteomics. In clinical proteomics research, various experimental techniques are included: specimen collection, prefractionation, 2DE, non-2DE (liquid-based separation), mass spectrometry, informatics, and others. The investigators choose the path through each section (marked by squares and circles of different colors), depending on the experimental goal. The squares indicate separation systems based on specific characteristics of proteins and general prefractionation systems. The open circles and open triangle represent analytical modules at the protein and peptide levels, respectively. Arrows and junction points indicate alternative options at each step. The bottom part outlines the experimental procedures for verifying and validating biomarker candidates, leading to clinical screening and applications. Verification employs multiple reaction monitoring and quantitative mass analysis. Biomarker candidates identified by typical clinical proteomics are then subjected to translational proteomics for validation, in which a large-scale cohort study and evaluation proceed.
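To illustrate one of the methods named above, the Python sketch below classifies a sample by majority vote among its k nearest training samples, using Euclidean distance between protein-abundance profiles. It is a generic k-nearest-neighbor classifier, not the implementation used in any cited study. The three-marker profiles and labels are toy values. In practice the abundances would first be normalized (for example, log-transformed and scaled) so that no single protein dominates the distance.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Profile = Sequence[float]  # abundances of the same proteins, in the same order


def knn_classify(training: List[Tuple[Profile, str]], query: Profile, k: int = 3) -> str:
    """Label `query` by majority vote of its k nearest training profiles."""
    if not 0 < k <= len(training):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query profile.
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training_set = [
    ((2.1, 1.9, 0.4), "disease"),
    ((1.8, 2.2, 0.6), "disease"),
    ((2.0, 2.0, 0.5), "disease"),
    ((0.4, 0.6, 1.9), "control"),
    ((0.5, 0.5, 2.1), "control"),
    ((0.6, 0.4, 2.0), "control"),
]

print(knn_classify(training_set, (1.9, 2.1, 0.5)))  # disease
print(knn_classify(training_set, (0.5, 0.6, 2.0)))  # control
```

With two classes, an odd k avoids tied votes.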

4. Introduction of the Experimental Strategy Described in This Volume

Proteomics platform technologies for protein profiling and identification are advancing in many areas, not only in clinical proteomics but also in biology in general. This section introduces the core techniques that leading scientists in the field outline in this volume, together with their application to clinical proteomics.

In plasma proteome analysis, for example, high-abundance proteins must be depleted. Techniques for this include immunoaffinity columns, gel permeation, and beads, often combined in multidimensional fractionation (Fig. 4). Cho et al. (Chapter 4) address this in relation to 2D gel analysis of plasma, describing the technical details of sample preparation, gel electrophoresis, and quantification of proteins on the gel.

Zhang and Koay (Chapter 5) describe methods of 2D gel analysis for cells prepared by LCM. They apply LCM to dissect tumor cells in breast cancer for macromolecular extraction and 2D gels. This approach can also be used to microdissect cells of interest from paraffin-embedded tissue blocks. Extending this procedure, Mustafa et al. (Chapter 9) review the application of LCM to proteomics analysis. They demonstrate that combining LCM and MS facilitates identification of specific proteins for each sample type.

For urine sample analysis, Zerefos et al. (Chapter 8) provide simple protocols for protein analysis by 2D gel or direct matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry. These protocols include protein enrichment by precipitation and ultrafiltration. Combining these methods with the above profiling technologies allows reproducible and sensitive analysis of urine, one of the most significant and complex biological samples (77).


Table 2
Clinical Proteomics Initiatives and Resources
(Name: details. Website)

Institute
  CPTI: National Cancer Institute’s Clinical Proteomics Technologies initiative for cancer. http://proteomics.cancer.gov
  ABRF: The Association of Biomolecular Resource Facilities, an international society dedicated to advancing core and research biotechnology laboratories through research, communication, and education. http://www.abrf.org/
  PPI: Plasma Proteome Institute; the PPI is working to facilitate clinical adoption of advanced diagnostic tests using proteins in plasma and serum. http://www.plasmaproteome.org/plasmaframes.htm
  EDRN: The Early Detection Research Network; the EDRN provides up-to-date information on biomarker research through its website and scientific publications. http://edrn.nci.nih.gov
Web resources
  ExPASy: Expert Protein Analysis System; proteomics-related information and databases. http://www.expasy.org/
  NCBI: National Center for Biotechnology Information; the protein entries in the Entrez search and retrieval system have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein&itool=toolbar
  CPRMap: Clinical Proteomics Research Map; updated research articles on disease and clinical proteomics. http://www.cprmap.com/
Database
  MedGene: MedGene can generate a ranked list of human genes associated with a particular human disease. http://hipseq.med.harvard.edu/MEDGENE


Table 3
Available Bioinformatic Resources for the Analysis of Proteomics Data
(Name: description. Website URL. PMID)

Clinical proteome data management system
  Proteus: LIMS for proteomics pipelines. http://www.genologics.com
  CPAS: LIMS for identification and quantification using LC-MS/MS data.
  Systems biology experiment analysis management system: a management system for collecting, storing, and accessing data produced by microarray, proteomics, and immunohistochemistry. http://www.sbeams.org/
  GPM database: open source system for analyzing, validating, and storing protein identification data. http://www.thegpm.org/
  SpectrumMill: MS/MS data analysis and management system. http://www.chem.agilent.com/
  PMIDs: 16396501, 16756676, 15595733

Phosphorylation
  Group-based phosphorylation scoring method: prediction of kinase-specific phosphorylation sites. http://973-proteinweb.ustc.edu.cn/gps/gps_web/ (PMID 15980451)
  KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites using hidden Markov models. http://kinasePhos.mbc.nctu.edu.tw (PMID 15980458)
  NetPhos: sequence and structure-based prediction of eukaryotic protein phosphorylation sites. http://www.cbs.dtu.dk/services/NetPhos/ (PMID 10600390)
  NetPhosK: prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. http://www.cbs.dtu.dk/services/NetPhosK/ (PMID 15174133)


Phosphorylation (continued)
  PredPhospho: prediction of phosphorylation sites using support vector machines. http://pred.ngri.re.kr/PredPhospho.htm
  PREDIKIN: prediction of substrates for serine/threonine protein kinases based on the primary sequence of a protein kinase catalytic domain. http://florey.biosci.uq.edu.au/kinsub/home.htm
  Prosite: prediction of substrates for protein kinases based on conserved motif search. http://kr.expasy.org/prosite
  Scansite: prediction of PK-specific phosphorylation sites with Bayesian decision theory. http://scansite.mit.edu
  Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. http://phospho.elm.eu.org/
  Human protein reference database (HPRD): a database of known kinase/phosphatase substrates as well as binding motifs curated from the published literature. http://www.hprd.org/PhosphoMotif_finder
  PhosphoSite: a bioinformatics resource dedicated to physiological protein phosphorylation. http://www.phosphosite.org/Login.jsp

Glycosylation
  NetOGlyc 2.0: predicts O-glycosylation sites in mucin-type proteins. NetOGlyc/
  DictyOGlyc 1.1: predicts O-GlcNAc sites in eukaryotic proteins. DictyOGlyc/
  YinOYang 1.2: predicts O-GlcNAc sites in eukaryotic proteins. YinOYang/
  NetNGlyc 1.0: predicts N-glycosylation sites. http://www.cbs.dtu.dk/services/NetNGlyc/
  GlycoMod: web software for predicting the possible oligosaccharide structures in glycoproteins from their experimentally determined masses. http://www.expasy.ch/tools/glycomod/
  PMIDs (PredPhospho to GlycoMod): 15231530, 16445868, 17237102, 16549034, 15212693, 15174125, 9557871, 10521537, 16316981, 11680880


Glycosylation (continued)
  Glyco-fragment: a web tool to support the interpretation of mass spectra of complex carbohydrates. http://www.dkfz.de/spec/projekte/fragments/ (PMID 14625865)
  GlycoSearchMS: compares each peak of a measured mass spectrum with the calculated fragments of all structures contained in SweetDB. http://www.dkfz.de/spec/glycosciences.de/sweetdb/ms/ (PMID 15215392)
  GlycosidIQ: based on matching experimental MS2 data with the theoretical fragmentation of glycan structures in GlycoSuiteDB. https://tmat.proteomesystems.com/glyco/glycosuite/glycodb (PMID 15174134)
  Saccharide topology analysis tool: a web-based computational program that can quickly extract sequence information from a set of MSn spectra for an oligosaccharide of up to 10 residues.
  GlycoX: determines simultaneously the glycosylation sites and oligosaccharide heterogeneity of glycoproteins using MATLAB.
  MODi: a web server for identifying multiple post-translational peptide modifications from tandem mass spectra.
  SWEET-DB: an attempt to create annotated data collections for carbohydrates. http://www.dkfz.de/spec2/sweetdb/
  Additional URL listed in this section: http://www.unimod.org
  PMIDs (Saccharide topology analysis tool to SWEET-DB): 10857602, 17022651, 16845006, 11752350

Protein–protein interaction
  Munich Information Center for Protein Sequences’ MPPI: the database of mammalian protein–protein interactions. http://mips.gsf.de (PMID 16381839)
  Database of interacting proteins: a database that documents experimentally determined protein–protein interactions. http://dip.doe-mbi.ucla.edu/
  Molecular interaction network database: a database that stores, in a structured format, information about molecular interactions extracted from the experimental details of work published in peer-reviewed journals. http://mint.bio.uniroma2.it/mint
  Protein–protein interactions of cancer proteins: predicts interactions derived from homology with experimentally known protein–protein interactions from various species. http://bmm.cancerresearchuk.org/~pip
  IntAct: a freely available, open source database system and analysis tools for protein interaction data. http://www.ebi.ac.uk/intact/
  Biomolecular interaction network database: a database designed to store full descriptions of interactions, molecular complexes, and pathways. http://www.bind.ca (PMID 12519993)
  PMIDs (Database of interacting proteins to IntAct): 11752321, 17135203, 16398927, 17145710

Metabolic and signal pathway
  BioCarta: a pathway database. http://www.biocarta.com
  KEGG: a pathway database with genomic, chemical, and biological network information. http://www.genome.jp/kegg
  Cancer cell map: a selected set of human cancer-focused pathways. http://cancer.cellmap.org/cellmap/
  HPRD: a database with data pertaining to post-translational modifications, protein–protein interactions, tissue expression, subcellular localization, and enzyme–substrate relationships. http://www.hprd.org/
  PMID: 16381885

Proteomic data resource
  The cancer cell map: a database of clinical data from SELDI-TOF. http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp
  Proteomics identifications database: a database of protein and peptide identifications that have been described in the scientific literature. http://www.ebi.ac.uk/pride/
  PeptideAtlas: a multiorganism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments. http://www.peptideatlas.org

Disease resource
  Online Mendelian inheritance in man: a database of human genes and genetic disorders. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
  GeneCards: an integrated database of human genes that includes automatically mined genomic, proteomic, and transcriptomic information. http://www.genecards.org/index.shtml
  Cancer gene census: a catalogue of those genes for which mutations have been causally implicated in cancer. http://www.sanger.ac.uk/genetics/CGP/Census/
  PMIDs (proteomic data and disease resources): 16381953, 16381952, 17170002, 15608261, 14993899
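Several glycosylation tools in Table 3 start from sequence motifs. For N-glycosylation, the classic consensus sequon is Asn-X-Ser/Thr, where X is any amino acid except proline. The Python sketch below only scans a sequence for that motif. It is not NetNGlyc, which uses trained neural networks to score the surrounding sequence context. A sequon match alone also does not mean the site is actually glycosylated. The example sequence is an arbitrary toy string.

```python
import re
from typing import List

# Zero-width lookahead so that overlapping sequons (e.g., "NNTS") are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


print(find_sequons("MKNGSAANPTQNNTS"))  # [3, 12, 13]
```

Position 8 (N-P-T) is skipped because proline occupies the X position.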

Two-dimensional electrophoresis is perhaps the most popular starting tool for proteome analysis. In clinical proteomics, 2DE has been the traditional workhorse for analyzing clinical specimens ranging from plasma to urine (Table 1). Quantification problems in 2DE have now been largely overcome by fluorescent dyes (Cy3 and Cy5), which allow normalization of data obtained from two different clinical specimens (79).

Friedman and Lilley (Chapter 6) present general optimization conditions for difference gel electrophoresis (DIGE) in the quantitative analysis of clinical samples. They discuss the usefulness of the differential labeling dyes Cy2, Cy3, and Cy5. The essence of any DIGE system is to minimize potential human errors in identifying and quantifying protein spots in a 2D gel (79). The difficulties of 2D map analysis are discussed by Marengo et al. (Chapter 16). They describe methods for comparing protein spots using image analysis technology and related informatics tools. These methods minimize variation between measurements of spot volume, a key to successful 2D map construction.
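The normalization role of the dyes can be sketched as follows. This sketch assumes a common DIGE design rather than a specific protocol from Chapter 6. Each gel carries sample A labeled with Cy3, sample B labeled with Cy5, and a pooled internal standard labeled with Cy2. Each spot volume is divided by its Cy2 volume on the same gel. Each channel’s log ratios are then centered on their median, which assumes that most proteins do not change. The spot volumes are toy numbers, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, Tuple

# spot id -> (Cy2 internal standard, Cy3 sample A, Cy5 sample B) spot volumes
Gel = Dict[str, Tuple[float, float, float]]


def standardized_log_ratios(gel: Gel) -> Dict[str, Tuple[float, float]]:
    """log2(sample / internal standard) per spot, median-centered per channel."""
    a = {spot: math.log2(cy3 / cy2) for spot, (cy2, cy3, _) in gel.items()}
    b = {spot: math.log2(cy5 / cy2) for spot, (cy2, _, cy5) in gel.items()}
    # Centering assumes most spots are unchanged, so the median ratio reflects
    # dye and loading differences rather than biology.
    med_a, med_b = median(a.values()), median(b.values())
    return {spot: (a[spot] - med_a, b[spot] - med_b) for spot in gel}


def log2_fold_change(gel: Gel) -> Dict[str, float]:
    """Standardized log2 abundance of sample A minus that of sample B, per spot."""
    return {spot: sa - sb for spot, (sa, sb) in standardized_log_ratios(gel).items()}


gel = {
    "spot1": (100.0, 100.0, 50.0),
    "spot2": (100.0, 100.0, 50.0),
    "spot3": (100.0, 400.0, 50.0),
}
print(log2_fold_change(gel))  # {'spot1': 0.0, 'spot2': 0.0, 'spot3': 2.0}
```

Here the Cy5 channel is uniformly at half the standard, and median centering removes that global offset. Only spot3 then shows a fourfold (log2 = 2) difference. Across several gels, the shared Cy2 standard is what links measurements made on different gels.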

There are many variations of LC in protein profiling, including mass detection methods, column types, data mining with search engines, mass accuracy, and running conditions (80,81,82). All of these bear on the quantification of proteins or peptides in the sample, one of the major bottlenecks in proteomics (83,84,85,86,87). Quantitative techniques include isotope-coded affinity tags (ICAT), mass-coded affinity tagging, and nonisotope-labeled methods.

Xiao and Veenstra (Chapter 10) apply ICAT to the study of proteins regulated by a COX-2 inhibitor in a colon cancer cell line. With an emphasis on sample preparation, they provide details of ICAT procedures for quantitative proteomics (88). Li et al. (Chapter 11) employ a strategy that combines LCM for the preparation of HCC samples with cleavable isotope-coded affinity tags to identify markers quantitatively. ICAT, however, has drawbacks in sample recovery during or after the labeling steps, so further measures are needed to increase its efficiency (87). A label-free serum quantification method has recently been introduced (48) (see Chapter 12 by Higgs et al.).
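In isotope-labeling methods such as ICAT, relative quantification comes down to comparing the light- and heavy-labeled forms of each peptide. The Python sketch below assumes a hypothetical input of (protein, light intensity, heavy intensity) triples from already matched peptide pairs. It summarizes each protein by the median of its peptide log2 heavy/light ratios. The median is one common, outlier-resistant choice, not the specific procedure of Chapter 10 or 11. Which sample carries the heavy label is a design decision of the experiment.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

# (protein id, light-label intensity, heavy-label intensity) for one peptide pair
PeptidePair = Tuple[str, float, float]


def protein_ratios(pairs: Iterable[PeptidePair]) -> Dict[str, float]:
    """Heavy/light ratio per protein: 2 ** median of peptide log2 ratios."""
    log_ratios: Dict[str, List[float]] = defaultdict(list)
    for protein, light, heavy in pairs:
        if light <= 0 or heavy <= 0:
            continue  # one partner of the pair was not detected; skip it
        log_ratios[protein].append(math.log2(heavy / light))
    return {protein: 2 ** median(values) for protein, values in log_ratios.items()}


pairs = [
    ("P1", 1000.0, 2000.0),
    ("P1", 500.0, 1000.0),
    ("P1", 100.0, 800.0),   # outlier peptide; the median limits its influence
    ("P2", 300.0, 300.0),
    ("P2", 250.0, 0.0),     # heavy form missing; excluded
]
print(protein_ratios(pairs))  # {'P1': 2.0, 'P2': 1.0}
```

Working in log space makes up- and down-regulation symmetric: a twofold increase and a twofold decrease are +1 and −1.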

The use of antibody arrays in clinical proteomics has increased recently for high-throughput analysis of cancer specimens when the identities of the proteins of interest are known (89,90). The evaluation of antibody cross-reactivity and specificity is crucial in these assays. Two chapters address this matter:
  • Sanchez-Carbayo (Chapter 15) describes technical aspects and applications of planar antibody arrays for the quantification of serum proteins.
  • Hsu et al. (Chapter 14) describe the development and use of bead-based miniaturized multiplexed sandwich immunoassays for focused protein profiling in various body fluids.

The latter method, using bead-based protein arrays or suspension microarrays, allows many parameters to be analyzed simultaneously in a single experiment. Suspension microarrays are versatile enough to analyze proteins of interest in body fluids ranging from serum to synovial fluid. The multiplexed protein profiling technology described by Hsu et al. (Chapter 14) therefore seems to hold great promise for clinical proteomics.

Similarly, tissue microarray technology (91) would also make it possible to profile clinical samples in parallel with immunohistochemistry, fluorescence in situ hybridization, or RNA in situ hybridization. SELDI is another approach to high-throughput profiling of clinical samples in disease marker discovery [(92,93), Chapter 7]. Profiling approaches such as SELDI-MS are expected to be used frequently in disease marker discovery, but only if the identification technologies coupled with SELDI are improved.

During the course of biomarker discovery, large data sets are usually generated and deposited in a coordinated fashion (Tables 2 and 3) (94,95). Statistical analysis of 2DE proteomic data, in which several hundred protein spots are typically resolved, is particularly complex. To address inconsistencies in 2D gel proteomics data, Friedman and Lilley (Chapter 6) and Carpentier et al. (Chapter 17) review the available statistical tools and suggest case-specific guidelines for 2D gel spot analysis. Fitzgibbon et al. (Chapter 19) describe an open-source platform for LC-MS spectra in which the msInspect program is used to reduce false positives and guide normalization of the data set. They also demonstrate that msInspect can analyze data from quantitative studies with and without isotopic labels. Paliakasis et al. (Chapter 18) introduce web-based tools for protein classification that support prediction of potential protein function and clustering of related proteins into families, and they provide guidelines for classifying protein data into more meaningful families. Finally, Somorjai (Chapter 20) addresses important filtering criteria for applying statistical protein pattern recognition to biomarker discovery.
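Because a 2D gel comparison tests several hundred spots at once, some correction for multiple testing is usually needed before a spot is called differentially expressed. The Python sketch below shows one common choice, the Benjamini–Hochberg false discovery rate procedure, applied to per-spot p-values. It is a generic illustration, not necessarily the procedure recommended in Chapters 6 or 17, and the spot identifiers and p-values are hypothetical.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the spot IDs declared significant at false discovery rate `alpha`.

    Benjamini-Hochberg step-up procedure: sort the m p-values in ascending
    order, find the largest rank k with p_(k) <= k * alpha / m, and reject
    the hypotheses ranked 1..k.
    """
    m = len(pvalues)
    if m == 0:
        return []
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * alpha / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return [spot for spot, _ in ranked[:cutoff_rank]]


# Hypothetical per-spot p-values from a case-versus-control gel comparison.
spot_p = {"spot_012": 0.001, "spot_047": 0.008, "spot_103": 0.020,
          "spot_158": 0.041, "spot_240": 0.300}
print(benjamini_hochberg(spot_p, alpha=0.05))
# prints: ['spot_012', 'spot_047', 'spot_103']
```

Controlling the false discovery rate rather than the family-wise error rate is a design choice that tolerates a small proportion of false positives in exchange for retaining more candidate spots, which usually suits discovery-phase studies.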

5. Concluding Remarks 

Although there are several bottlenecks in clinical proteomics (such as the lack of standardization in specimen processing and quantification, and the lack of an overall strategy for the steps that follow biomarker identification), we believe that the field holds great promise for biomarker discovery. The success of clinical proteomics depends on the availability and selection of well-phenotyped specimens, reduction of sample complexity, development of good informatics tools, and efficient data management. Sample handling techniques should therefore be developed with these requirements in mind, including microdissection for tissue samples, multidimensional fractionation for body fluids, and pretreatment of other clinical specimens (e.g., urine, tears, and cells). Because there is no gold standard for sample collection and handling, one must identify the best available options for processing samples without damaging them. In addition, establishing a biorepository system would systematically minimize artifacts and variation between samples during and after biomarker identification.


It is now generally accepted that a panel (or ensemble) of different proteins would be more effective than a single protein or peptide for diagnosing disease, an idea that is poised to replace the conventional concept of a biomarker. As a high-throughput approach to protein profiling, antibody arrays have recently been used increasingly in clinical proteomics for the detection of cancer. However, when antibody arrays are used to profile serum autoantibodies, issues of cross-reactivity and specificity must be resolved. Although not covered here because of space limitations, proteomics techniques also make it possible to analyze protein–protein interaction networks and the post-translational modifications of proteins involved in a specific disease (Table 3). It is now strongly recommended that common reagents, such as antibodies and standard proteins, which are useful for spiking, quantification, and normalizing sensitivity from one instrument to another, be used in worldwide efforts such as the Human Proteome Organization (HUPO) Plasma Proteome Project (96,97). Finally, clinical proteomics requires the integration of biochemistry, pathology, analytical technology, bioinformatics, and proteome informatics to develop highly sensitive diagnostic tools for routine clinical care in the future (71,98).


This study was supported by a grant from the Korea Health 21 R&D project, 

Ministry of Health & Welfare, Republic of Korea (A030003 to YKP). 

References 

1. Etzioni, R., Urban, N., Ramsey, S., McIntosh, M., Schwartz, S., Reid, B., Radich, J., 

Anderson, G., and Hartwell, L. (2003) The case for early detection. Nat. Rev. 

Cancer 3, 1–10. 

2. Ludwig, J. A. and Weinstein, J. N. (2005) Biomarkers in cancer staging, prognosis 

and treatment selection. Nat. Rev. Cancer 5, 845–856. 

3. Xiao, Z., Prieto, D., Conrads, T. P., Veenstra, T. D., and Issaq, H. J. (2005) 

Proteomic patterns: their potential for disease diagnosis. Mol. Cell Endocrinol. 

230, 95–106. 

4. Rifai, N., Gillette, M. A., and Carr, S. A. (2006) Protein biomarker discovery 

and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24, 971–983.

5. Emmert-Buck, M. R., Bonner, R. F., Smith, P. D., Chuaqui, R. F., Zhuang, Z., 

Goldstein, S. R., Weiss, R. A., and Liotta, L. A. (1996) Laser capture microdissection. 

Science 274, 998–1001. 

6. Gillespie, J. W., Ahram, M., Best, C. J., Swalwell, J. I., Krizman, D. B., 

Petricoin, E. F., Liotta, L. A., and Emmert-Buck, M. R. (2001) The role of tissue 

microdissection in cancer research. Cancer J. 7, 32–39.


7. Craven, R. A. and Banks, R. E. (2002) Use of laser capture microdissection to 

selectively obtain distinct populations of cells for proteomic analysis. Methods 

Enzymol. 356, 33–49. 

8. Vincourt, J. B., Lionneton, F., Kratassiouk, G., Guillemin, F., Netter, P., 

Mainard, D., and Magdalou, J. (2006) Establishment of a reliable method for direct 

proteome characterization of human articular cartilage. Mol. Cell Proteomics 5, 

1984–1995. 

9. Platt, M. S., Agamanolis, D. P., Krill, C. E. Jr., Boeckman, C., Potter, J. L., 

Robinson, H., and Lloyd, J. (1983) Occult hepatic sinusoid tumor of infancy 

simulating neuroblastoma. Cancer 52, 1183–1189. 

10. Mahadevia, P. J., Fleisher, L. A., Frick, K. D., Eng, J., Goodman, S. N., and 

Powe, N. R. (2003) Lung cancer screening with helical computed tomography 

in older adult smokers: a decision and cost-effectiveness analysis. JAMA 289, 

313–322. 

11. Hood, B. L., Darfler, M. M., Guiel, T. G., Furusato, B., Lucas, D. A., 

Ringeisen, B. R., Sesterhenn, I. A., Conrads, T. P., Veenstra, T. D., and Krizman, 

D. B. (2005) Proteomic analysis of formalin-fixed prostate cancer tissue. Mol. Cell 

Proteomics 4, 1741–1753. 

12. Alaiya, A., Al-Mohanna, M., and Linder, S. (2005) Clinical cancer proteomics: 

promises and pitfalls. J. Proteome Res. 4, 1213–1222. 

13. Gericke, B., Raila, J., Sehouli, J., Haebel, S., Konsgen, D., Mustea, A., and 

Schweigert, F. J. (2005) Microheterogeneity of transthyretin in serum and ascitic 

fluid of ovarian cancer patients. BMC Cancer 17, 133–141. 

14. Swisher, E. M., Wollan, M., Mahtani, S. M., Willner, J. B., Garcia, R., Goff, B. A., 

and King, M. C. (2005) Tumor-specific p53 sequences in blood and peritoneal fluid 

of women with epithelial ovarian cancer. Am. J. Obstet. Gynecol. 193, 662–667. 

15. Pisitkun, T., Johnstone, R., and Knepper, M. A. (2006) Discovery of urinary 

biomarkers. Mol. Cell Proteomics 5, 1760–1771. 

16. Ghafouri, B., Irander, K., Lindbom, J., Tagesson, C., and Lindahl, M. (2006) 

Comparative proteomics of nasal fluid in seasonal allergic rhinitis. J. Proteome 

Res. 5, 330–338. 

17. Koo, B. S., Lee, D. Y., Ha, H. S., Kim, J. C., and Kim, C. W. (2005) Comparative 

analysis of the tear protein expression in blepharitis patients using two-dimensional 

electrophoresis. J. Proteome Res. 4, 719–724. 

18. Grus, F. H., Podust, V. N., Bruns, K., Lackner, K., Fu, S., Dalmasso, E. A., 

Wirthlin, A., and Pfeiffer, N. (2005) SELDI-TOF-MS ProteinChip array profiling 

of tears from patients with dry eye. Invest. Ophthalmol. Vis. Sci. 46, 863–876. 

19. Amado, F. M., Vitorino, R. M., Domingues, P. M., Lobo, M. J., and Duarte, J. A. 

(2005) Analysis of the human saliva proteome. Expert Rev. Proteomics 2, 521–539. 

20. Wang, T. H., Chang, Y. L., Peng, H. H., Wang, S. T., Lu, H. W., Teng, S. H., 

Chang, S. D., and Wang, H. S. (2005) Rapid detection of fetal aneuploidy using 

proteomics approaches on amniotic fluid supernatant. Prenat. Diagn. 25, 559–566. 

21. Ruetschi, U., Rosen, A., Karlsson, G., Zetterberg, H., Rymo, L., Hagberg, 

H., and Jacobsson, B. (2005) Proteomic analysis using protein chips to detect


biomarkers in cervical and amniotic fluid in women with intra-amniotic inflammation. 

J. Proteome Res. 4, 2236–2242. 

22. Kim, Y. S., Kim, M. S., Lee, S. H., Choi, B. C., Lim, J. M., Cha, K. Y., and 

Baek, K. H. (2006) Proteomic analysis of recurrent spontaneous abortion: identification 

of an inadequately expressed set of proteins in human follicular fluid. 


23. Pilch, B. and Mann, M. (2006) Large-scale and high-confidence proteomic analysis 

of human seminal plasma. Genome Biol. 7, R40.

24. Varnum, S. M., Covington, C. C., Woodbury, R. L., Petritis, K., Kangas, L. J., 

Abdullah, M. S., Pounds, J. G., Smith, R. D., and Zangar, R. C. (2003) Proteomic 

characterization of nipple aspirate fluid: identification of potential biomarkers of 

breast cancer. Breast Cancer Res. Treat. 80, 87–97. 

25. Zheng, P. P., Luider, T. M., Pieters, R., Avezaat, C. J., van den Bent, M. J., Sillevis 

Smitt, P. A., and Kros, J. M. (2003) Identification of tumor-related proteins by 

proteomic analysis of cerebrospinal fluid from patients with primary brain tumors. 

J. Neuropathol. Exp. Neurol. 62, 855–862. 

26. Gibson, D. S., Blelock, S., Brockbank, S., Curry, J., Healy, A., McAllister, C., 

and Rooney, M. E. (2006) Proteomic analysis of recurrent joint inflammation in 

juvenile idiopathic arthritis. J. Proteome Res. 5, 1988–1995. 

27. Merkel, D., Rist, W., Seither, P., Weith, A., and Lenter, M. C. (2005) 

Proteomic study of human bronchoalveolar lavage fluids from smokers with 

chronic obstructive pulmonary disease by combining surface-enhanced laser 

desorption/ionization-mass spectrometry profiling with mass spectrometric protein 

identification. Proteomics 5, 2972–2980. 

28. Wu, J., Kobayashi, M., Sousa, E. A., Liu, W., Cai, J., Goldman, S. J., Dorner, A. J., 

Projan, S. J., Kavuru, M. S., Qiu, Y., and Thomassen, M. J. (2005) Differential 

proteomic analysis of bronchoalveolar lavage fluid in asthmatics following 

segmental antigen challenge. Mol. Cell Proteomics 4, 1251–1264. 

29. Tyan, Y. C., Wu, H. Y., Lai, W. W., Su, W. C., and Liao, P. C. (2005) Proteomic 

profiling of human pleural effusion using two-dimensional nano liquid chromatography 

tandem mass spectrometry. J. Proteome Res. 4, 1274–1286. 

30. Khalil, A. A. and James, P. (2007) Biomarker discovery: a proteomic approach for 

brain cancer profiling. Cancer Sci. 98, 201–213. 

31. Khodavirdi, A. C., Song, Z., Yang, S., Zhong, C., Wang, S., Wu, H., Pritchard, C., 

Nelson, P. S., and Roy-Burman, P. (2006) Increased expression of osteopontin 

contributes to the progression of prostate cancer. Cancer Res. 66, 883–888. 

32. Vincourt, J. B., Lionneton, F., Kratassiouk, G., Guillemin, F., Netter, P., Mainard, D., 

and Magdalou, J. (2006) Establishment of a reliable method for direct proteome 

characterization of human articular cartilage. Mol. Cell Proteomics 5, 1984–1995. 

33. Lee, Y. J., Rice, R. H., and Lee, Y. M. (2006) Proteome analysis of human 

hair shaft: from protein identification to post-translational modification. Mol. Cell 


34. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K., 

Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and


Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human 

plasma and construction of a two-dimensional map. Proteomics 5, 3386–3396. 

35. Lathrop, J. T., Hayes, T. K., Carrick, K., and Hammond, D. J. (2005) Rarity gives 

a charm: evaluation of trace proteins in plasma and serum. Expert Rev. Proteomics 

2, 393–406. 

36. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery 

from the plasma proteome using multidimensional fractionation proteomics. Curr. 

Opin. Chem. Biol. 10, 42–49. 

37. Anderson, N. L. and Anderson, N. G. (2002) The human plasma proteome: history, 

character, and diagnostic prospects. Mol. Cell Proteomics 1, 845–867. 

38. Hu, S., Loo, J. A., and Wong, D. T. (2006) Human body fluid proteome analysis. 


39. Park, M. R., Wang, E. H., Jin, D. C., Cha, J. H., Lee, K. H., Yang, C. W., 

Kang, C. S., and Choi, Y. J. (2006) Establishment of a 2-D human urinary proteomic 

map in IgA nephropathy. Proteomics 6, 1066–1076. 

40. Tammen, H., Schulte, I., Hess, R., Menzel, C., Kellmann, M., and Schulz-Knappe, P. (2005) Prerequisites for peptidomic analysis of blood samples: I. Evaluation of blood specimen qualities and determination of technical performance characteristics. Comb. Chem. High Throughput Screen. 8, 725–733.

41. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D., 

Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P., 

Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W. 

(2005) HUPO plasma proteome project specimen collection and handling: towards 

the standardization of parameters for plasma proteome samples. Proteomics 5, 

3262–3277. 

42. Zhou, M., Lucas, D. A., Chan, K. C., Issaq, H. J., Petricoin, E. F. 3rd, Liotta, L. A., 

Veenstra, T. D., and Conrads, T. P. (2004) An investigation into the human serum 

“interactome”. Electrophoresis 25, 1289–1298. 

43. Findeisen, P., Sismanidis, D., Riedl, M., Costina, V., and Neumaier, M. (2005) 

Preanalytical impact of sample handling on proteome profiling experiments with 

matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clin. 

Chem. 51, 2409–2411. 

44. Park, K. S., Kim, H., Kim, N. G., Cho, S. Y., Choi, K. H., Seong, J. K., and Paik, 

Y. K. (2002) Proteomic analysis and molecular characterization of tissue ferritin 

light chain in hepatocellular carcinoma. Hepatology 35, 1459–1466. 

45. Park, K. S., Cho, S. Y., Kim, H., and Paik, Y. K. (2002) Proteomic alterations of the 

variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular 

carcinoma. Int. J. Cancer 97, 261–265. 

46. Marko-Varga, G., Berglund, M., Malmstrom, J., Lindberg, H., and Fehniger, T. E. 

(2003) Targeting hepatocytes from liver tissue by laser capture microdissection 

and proteomics expression profiling. Electrophoresis 24, 3800–3805. 

47. Paradis, V., Degos, F., Dargere, D., Pham, N., Belghiti, J., Degott, C., Janeau, 

J. L., Bezeaud, A., Delforge, D., Cubizolles, M., Laurendeau, I., and Bedossa, P. 

(2005) Identification of a new biomarker of hepatocellular carcinoma by serum 

protein profiling of patients with chronic liver diseases. Hepatology 41, 40–47.


48. Ru, Q. C., Zhu, L. A., Silberman, J., and Shriver, C. D. (2006) Label-free semiquantitative 

peptide feature profiling of human breast cancer and breast disease sera via 

two-dimensional liquid chromatography–mass spectrometry. Mol. Cell Proteomics 

5, 1095–1104. 

49. Azad, N. S., Rasool, N., Annuziata, C. M., Minasian, L., Whiteley, G., and 

Kohn, E. C. (2006) Proteomics in clinical trials and practice: present uses and 

future promise. Mol. Cell Proteomics 5, 1819–1829. 

50. Gunter, E. W. (1997) Biological and environmental specimen banking at the 

Centers for Disease Control and Prevention. Chemosphere 34, 1945–1953. 

51. Strauss, G. H. and Kelly, S. J. (1990) The development of the U.S. EPA health 

effects research laboratory frozen blood cell repository program. Mutat. Res. 234, 

349–354. 

52. Romeo, M. J., Espina, V., Lowenthal, M., Espina, B. H., Petricoin, E. F. 3rd, and 

Liotta, L. A. (2005) CSF proteome: a protein repository for potential biomarker 

identification. Expert Rev. Proteomics 2, 57–70. 

53. Conrads, T. P., Hood, B. L., Petricoin, E. F. 3rd, Liotta, L. A., and Veenstra, T. D. 

(2005) Cancer proteomics: many technologies, one goal. Expert Rev. Proteomics 

2, 693–703. 

54. Schrader, M. and Selle, H. (2006) The process chain for peptidomic biomarker 

discovery. Dis. Markers 22, 27–37. 

55. Danna, E. A. and Nolan, G. P. (2006) Transcending the biomarker mindset: 

deciphering disease mechanisms at the single cell level. Curr. Opin. Chem. Biol. 

10, 20–27. 

56. De Masi, S., Tosti, M. E., and Mele, A. (2005) Screening for hepatocellular 

carcinoma. Dig. Liver Dis. 37, 260–268. 

57. Yamaguchi, K., Nagano, M., Torada, N., Hamasaki, N., Kawakita, M., and Tanaka, M. (2004) Urine diacetylspermine as a novel tumor marker for pancreatobiliary carcinomas. Rinsho. Byori. 52, 336–339.

58. Dabrowska, M., Grubek-Jaworska, H., Domagala-Kulawik, J., Bartoszewicz, Z., 

Kondracka, A., Krenke, R., Nejman, P., and Chazan, R. (2004) Diagnostic usefulness 

of selected tumor markers (CA125, CEA, CYFRA 21–1) in bronchoalveolar lavage 

fluid in patients with non-small cell lung cancer. Pol. Arch. Med. Wewn. 111, 659–665.

59. Gann, P. H., Hennekens, C. H., and Stampfer, M. J. (1995) A prospective evaluation 

of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 273, 

289–294.

60. Ciambellotti, E., Coda, C., and Lanza, E. (1993) Determination of CA 15–3 in the 

control of primary and metastatic breast carcinoma. Minerva Med. 84, 107–112. 

61. Linkov, F., Lisovich, A., Yurkovetsky, Z., Marrangoni, A., Velikokhatnaya, L., 

Nolen, B., Winans, M., Bigbee, W., Siegfried, J., Lokshin, A., and Ferris, R. L. 

(2007) Early detection of head and neck cancer: development of a novel screening 

tool using multiplexed immunobead-based biomarker profiling. Cancer Epidemiol. 

Biomarkers Prev. 16, 102–107. 

62. Casiano, C. A., Mediavilla-Varela, M., and Tan, E. M. (2006) Tumor-associated 

antigen arrays for the serological diagnosis of cancer. Mol. Cell Proteomics 5, 

1745–1759.


63. Nissom, P. M., Lo, S. L., Lo, J. C., Ong, P. F., Lim, J. W., Ou, K., Liang, R. C., 

Seow, T. K., and Chung, M. C. (2006) Hcc-2, a novel mammalian ER thioredoxin 

that is differentially expressed in hepatocellular carcinoma. FEBS Lett. 580, 2216– 

2226. 

64. Feng, J. T., Liu, Y. K., Song, H. Y., Dai, Z., Qin, L. X., Almofti, M. R., Fang, C. Y., 

Lu, H. J., Yang, P. Y., and Tang, Z. Y. (2005) Heat-shock protein 27: a potential 

biomarker for hepatocellular carcinoma identified by serum proteome analysis. 


65. Li, D. Q., Wang, L., Fei, F., Hou, Y. F., Luo, J. M., Wei-Chen, Zeng, R., 

Wu, J., Lu, J. S., Di, G. H., Ou, Z. L., Xia, Q. C., Shen, Z. Z., and 

Shao, Z. M. (2006) Identification of breast cancer metastasis-associated proteins 

in an isogenic tumor metastasis model using two-dimensional gel electrophoresis 

and liquid chromatography-ion trap-mass spectrometry. Proteomics 6, 

3352–3368. 

66. Lee, I. N., Chen, C. H., Sheu, J. C., Lee, H. S., Huang, G. T., Yu, C. Y., 

Lu, F. J., and Chow, L. P. (2005) Identification of human hepatocellular carcinoma-related

biomarkers by two-dimensional difference gel electrophoresis and mass 

spectrometry. J. Proteome Res. 4, 2062–2069. 

67. Righetti, P. G., Castagna, A., Antonucci, F., Piubelli, C., Cecconi, D., 

Campostrini, N., Rustichelli, C., Antonioli, P., Zanusso, G., Monaco, S., Lomas, L., 

and Boschetti, E. (2005) Proteome analysis in the clinical chemistry laboratory: 

myth or reality? Clin. Chim. Acta 357, 123–139.

68. Jang, J. S., Cho, H. Y., Lee, Y. J., Ha, W. S., and Kim, H. W. (2004) The 

differential proteome profile of stomach cancer: identification of the biomarker 

candidates. Oncol. Res. 14, 491–499. 

69. Steel, L. F., Shumpert, D., Trotter, M., Seeholzer, S. H., Evans, A. A., London, 

W. T., Dwek, R., and Block, T. M. (2003) A strategy for the comparative analysis 

of serum proteomes for the discovery of biomarkers for hepatocellular carcinoma. 


70. Yip, T. T., Chan, J. W., Cho, W. C., Yip, T. T., Wang, Z., Kwan, T. L., Law, S. C., 

Tsang, D. N., Chan, J. K., Lee, K. C., Cheng, W. W., Ma, V. W., Yip, C., 

Lim, C. K., Ngan, R. K., Au, J. S., Chan, A., Lim, W. W., and Ciphergen SARS 

Proteomics Study Group (2005) Protein chip array profiling analysis in patients 

with severe acute respiratory syndrome identified serum amyloid a protein as a 

biomarker potentially useful in monitoring the extent of pneumonia. Clin. Chem. 51, 

47–55. 

71. Anderson, L. and Hunter, C. L. (2005) Quantitative mass spectrometric multiple 

reaction monitoring assays for major plasma proteins. Mol. Cell Proteomics 5, 

573–588. 

72. Lee, J. W., Figeys, D., and Vasilescu, J. (2007) Biomarker assay translation from 

discovery to clinical studies in cancer drug development: quantification of emerging 

protein biomarkers. Adv. Cancer Res. 96, 269–298. 

73. Zolg, W. (2006) The proteomic search for diagnostic biomarkers: lost in translation? Mol. Cell Proteomics 5, 1720–1726.


74. Bensmail, H., Golek, J., Moody, M. M., Semmes, J. O., and Haoudi, A. (2005) 

A novel approach for clustering proteomics data using Bayesian fast Fourier 

transform. Bioinformatics 21, 2210–2224. 

75. Ward, D. G., Cheng, Y., N’Kontchou, G., Thar, T. T., Barget, N., Wei, W., 

Billingham, L. J., Martin, A., Beaugrand, M., and Johnson, P. J. (2006) Changes in 

the serum proteome associated with the development of hepatocellular carcinoma 

in hepatitis C-related cirrhosis. Br. J. Cancer 94, 287–292. 

76. Lin, N. and Zhao, H. (2005) Are scale-free networks robust to measurement errors?

BMC Bioinformatics 6, 119. 

77. Castagna, A., Cecconi, D., Sennels, L., Rappsilber, J., Guerrier, L., Fortis, F., 

Boschetti, E., Lomas, L., and Righetti, P. G. (2005) Exploring the hidden human 

urinary proteome via ligand library beads. J. Proteome Res. 4, 1917–1930. 

78. Rauch, A., Bellew, M., Eng, J., Fitzgibbon, M., Holzman, T., Hussey, P., Igra, M., 

Maclean, B., Lin, C. W., Detter, A., Fang, R., Faca, V., Gafken, P., Zhang, H., 

Whiteaker, J., States, D., Hanash, S., Paulovich, A., and McIntosh, M. W. (2006) 

Computational proteomics analysis system (CPAS): an extensible open source 

analytic system for evaluating and publishing proteomic data and high throughput 

biological experiments. J. Proteome Res. 5, 112–121. 

79. Lilley, K. S. and Friedman, D. B. (2004) All about DIGE: quantification technology 

for differential-display 2D-gel proteomics. Expert Rev. Proteomics 1, 401–409. 

80. Qian, W. J., Jacobs, J. M., Liu, T., Camp, D. G. 2nd, and Smith, R. D. 

(2006) Advances and challenges in liquid chromatography-mass spectrometry-based

proteomics profiling for clinical applications. Mol. Cell Proteomics 5, 

1727–1744. 

81. Powell, D. W., Merchant, M. L., and Link, A. J. (2006) Discovery of regulatory 

molecular events and biomarkers using 2D capillary chromatography and mass 

spectrometry. Expert Rev. Proteomics 3, 63–74. 

82. Andre, M., Le Caer, J. P., Greco, C., Planchon, S., El Nemer, W., Boucheix, C., 

Rubinstein, E., Chamot-Rooke, J., and Le Naour, F. (2006) Proteomic analysis of 

the tetraspanin web using LC-ESI-MS/MS and MALDI-FTICR-MS. Proteomics 

6, 1437–1449. 

83. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., 

Goldenring, J. R., Podolsky, R. H., Lee, J. R., and Dynan, W. S. (2005) Saturation 

labeling with cysteine-reactive cyanine fluorescent dyes provides increased sensitivity 

for protein expression profiling of laser-microdissected clinical specimens. 


84. Heck, A. J. and Krijgsveld, J. (2004) Mass spectrometry-based quantitative 

proteomics. Expert Rev. Proteomics 1, 317–326. 

85. Schneider, L. V. and Hall, M. P. (2005) Stable isotope methods for high-precision 

proteomics. Drug Discov. Today 10, 353–363. 

86. Zhang, J., Goodlett, D. R., Peskind, E. R., Quinn, J. F., Zhou, Y., Wang, Q., 

Pan, C., Yi, E., Eng, J., Aebersold, R. H., and Montine, T. J. (2005) Quantitative 

proteomic analysis of age-related changes in human cerebrospinal fluid. Neurobiol.

Aging 26, 207–227.


87. Liu, T., Qian, W. J., Strittmatter, E. F., Camp, D. G. 2nd, Anderson, G. A., 

Thrall, B. D., and Smith, R. D. (2004) High-throughput comparative proteome

analysis using a quantitative cysteinyl-peptide enrichment technology. Anal. Chem. 

76, 5345–5353. 

88. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., Zhang, L., Xia, Q. C., 

Wu, J. R., Wang, H. Y., and Zeng, R. (2004) Accurate qualitative and quantitative 

proteomic analysis of clinical hepatocellular carcinoma using laser capture 

microdissection coupled with isotope-coded affinity tag and two-dimensional liquid 

chromatography mass spectrometry. Mol. Cell Proteomics 3, 399–409. 

89. Sheehan, K. M., Calvert, V. S., Kay, E. W., Lu, Y., Fishman, D., Espina, V., 

Aquino, J., Speer, R., Araujo, R., Mills, G. B., Liotta, L. A., Petricoin, E. F.

3rd, and Wulfkuhle, J. D. (2005) Use of reverse phase protein microarrays and 

reference standard development for molecular network analysis of metastatic 

ovarian carcinoma. Mol. Cell Proteomics 4, 346–355. 

90. Knezevic, V., Leethanakul, C., Bichsel, V. E., Worth, J. M., Prabhu, V. V., Gutkind, 

J. S., Liotta, L. A., Munson, P. J., Petricoin, E. F. 3rd, and Krizman, D. B. (2001) 

Proteomic profiling of the cancer microenvironment by antibody arrays. Proteomics 

1, 1271–1278. 

91. Sharma-Oates, A., Quirke, P., and Westhead, D. R. (2005) TmaDB: a repository for

tissue microarray data. BMC Bioinformatics 6, 218. 

92. Rai, A. J., Stemmer, P. M., Zhang, Z., Adam, B. L., Morgan, W. T., Caffrey, 

R. E., Podust, V. N., Patel, M., Lim, L. Y., Shipulina, N. V., Chan, D. W., 

Semmes, O. J., and Leung, H. C. (2005) Analysis of human proteome organization 

plasma proteome project (HUPO PPP) reference specimens using surface enhanced 

laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multi-institution

correlation of spectra and identification of biomarkers. Proteomics 5, 

3467–3474. 

93. Engwegen, J. Y., Gast, M. C., Schellens, J. H., and Beijnen, J. H. (2006) 

Clinical proteomics: searching for better tumour markers with SELDI-TOF mass 

spectrometry. Trends Pharmacol. Sci. 27, 251–259. 

94. Domon, B. and Aebersold, R. (2006) Mass spectrometry and protein analysis. 

Science 312, 212–217. 

95. Domon, B. and Aebersold, R. (2006) Challenges and opportunities in proteomics 

data analysis. Mol. Cell Proteomics 5, 1921–1926. 

96. Uhlen, M. and Ponten, F. (2005) Antibody-based proteomics for human tissue 

profiling. Mol. Cell Proteomics 4, 384–393. 

97. Taussig, M. J., Stoevesandt, O., Borrebaeck, C. A., Bradbury, A. R., Cahill, D., 

Cambillau, C., de Daruvar, A., Dubel, S., Eichler, J., Frank, R., Gibson, T. J., 

Gloriam, D., Gold, L., Herberg, F. W., Hermjakob, H., Hoheisel, J. D., Joos, T. O., 

Kallioniemi, O., Koegl, M., Konthur, Z., Korn, B., Kremmer, E., Krobitsch, S.,

Landegren, U., van der Maarel, S., McCafferty, J., Muyldermans, S., Nygren, P. A., 

Palcy, S., Pluckthun, A., Polic, B., Przybylski, M., Saviranta, P., Sawyer, A., 

Sherman, D. J., Skerra, A., Templin, M., Ueffing, M., and Uhlen, M. (2007)


ProteomeBinders: planning a European resource of affinity reagents for analysis 

of the human proteome. Nat. Methods 4, 13–17. 

98. Ilyin, S. E., Belkowski, S. M., and Plata-Salaman, C. R. (2004) Biomarker 

discovery and validation: technologies and integrative approaches. Trends 

Biotechnol. 22, 411–416.

I. Specimen Collection for Clinical Proteomics

2. Specimen Collection and Handling: Standardization of Blood Sample Collection

Harald Tammen 

Summary 

Preanalytical variables can alter the analysis of blood-derived samples. Before a blood sample can be analyzed, multiple steps are necessary to generate the desired specimen. The choice of blood specimen and its collection, handling, processing, and storage are important, since these factors can have a tremendous impact on the results of the analysis.

Awareness of clinical practices in medical laboratories, together with current knowledge, allows the identification of specific variables that affect the results of a proteomic study. Knowledge of preanalytical variables is a prerequisite for understanding and controlling their impact.

Key Words: blood; plasma; serum; proteomics; specimen; preanalytical variables. 

1. Introduction 

Proteomic analysis of blood specimens by semi-quantitative multiplex techniques offers a valuable approach for the discovery of disease- or therapy-related biomarkers (1,2). By combining reproducible separation of proteins according to their physical–chemical properties with semi-quantitative detection methods and bioinformatic data analysis, proteomics allows sensitive measurement of proteins in blood specimens (3). Blood can be regarded as a complex liquid tissue that comprises cells and extracellular fluid (4). The choice of a suitable specimen-collection protocol is crucial to minimize artificial processes (e.g., cell lysis, proteolysis) occurring during specimen collection and preparation (5). Preanalytical procedures can alter the analysis of blood-derived samples. These procedures comprise the processes prior to the actual analysis of the sample and include the steps needed to obtain the primary sample (e.g., blood) and the analytical specimen (e.g., plasma, serum, cells). Legal and ethical issues (e.g., informed consent) and potential risks of phlebotomy (e.g., bleeding) are not covered in this chapter.

1.1. Collection of Blood Samples 

It has been reported that the most frequent errors in the preanalytical phase result from faulty sample collection procedures (e.g., drawing blood from an infusion line, which dilutes the sample) (6). The design of blood collection devices can aid correct sampling: evacuated containers ensure that an accurate volume of blood is drawn, which guarantees the correct concentration of additives or the correct dilution of the blood, as in the case of citrated plasma. The speed of the blood draw is also controlled, which limits mechanical stress. The preferred collection site is the median cubital vein, which is generally easy to locate and access. Venipuncture at this site is therefore the most comfortable for the patient and should not cause additional stress. Preparation of the collection site includes proper cleaning of the skin with alcohol (2-propanol). The alcohol must be allowed to evaporate, because residual alcohol mixing with the blood sample may cause hemolysis, raise the levels of certain analytes, and cause analytical interference. The position of the patient (standing, lying, sitting) can affect the hematocrit (7) and hence may change analyte concentrations. A tourniquet should be applied 3–4 inches (approximately 7.5–10 cm) above the venipuncture site and released as soon as blood begins flowing into the collection device. Venous occlusion lasting longer than 1 min can affect sample composition: prolonged occlusion may cause hemoconcentration and thereby increase the levels of various analytes, e.g., total protein. Blood should be collected from fasting patients in the morning between 7 and 9 a.m., because food intake and circadian rhythms can alter analyte concentrations considerably (e.g., total protein, hemoglobin, myoglobin).
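The collection conditions above lend themselves to a simple checklist, for example in a study database. The Python sketch below is a hypothetical illustration only: the `DrawRecord` fields, the `collection_issues` function, and the message texts are assumptions and not part of the protocol; only the thresholds (fasting, 7–9 a.m. window, tourniquet no longer than 1 min, no infusion line) are taken from the text.

```python
from dataclasses import dataclass
from datetime import time

# Thresholds from Section 1.1; all names below are hypothetical.
MAX_TOURNIQUET_S = 60                     # venous occlusion should not exceed 1 min
DRAW_WINDOW = (time(7, 0), time(9, 0))    # morning collection between 7 and 9 a.m.


@dataclass
class DrawRecord:
    fasting: bool
    draw_time: time
    tourniquet_s: float
    from_infusion_line: bool = False


def collection_issues(rec: DrawRecord) -> list[str]:
    """List deviations from the recommended collection conditions (empty = none)."""
    issues = []
    if not rec.fasting:
        issues.append("patient not fasting")
    start, end = DRAW_WINDOW
    if not start <= rec.draw_time <= end:
        issues.append("drawn outside the 7-9 a.m. window")
    if rec.tourniquet_s > MAX_TOURNIQUET_S:
        issues.append("tourniquet applied for more than 1 min")
    if rec.from_infusion_line:
        issues.append("drawn from an infusion line")
    return issues
```

For example, `collection_issues(DrawRecord(fasting=True, draw_time=time(8, 15), tourniquet_s=45))` returns an empty list, whereas a 90-s occlusion is flagged.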

1.2. Characteristics of Serum and Plasma Specimens 

Serum is one of the most frequently analyzed blood specimens. The generation of serum is time consuming and is associated with activation of the coagulation cascade and the complement system. These processes influence sample composition, because they involve the lysis of cells (e.g., thrombocytes, erythrocytes). As a consequence, the concentrations of components such as aspartate aminotransferase, serotonin, neuron-specific enolase, and lactate dehydrogenase in the extracellular fluid are increased (8). On the other hand, degradation of some analytes (e.g., hormones) may occur faster (9).

Specimen Collection and Handling 37

At the proteomic level, more peptides and fewer proteins are observed in serum than in plasma (10,11).

Consequently, the activation of the clotting cascade that is necessary to generate serum can lead to artifacts. One rationale for using serum as a specimen is the notion that the serum proteome or peptidome may reflect biological events (12). Post-sampling proteolytic cleavage products have been proposed as biomarkers, and it has further been suggested that the serum peptidome is of particular diagnostic value for cancer detection (13). However, more protein changes have been reported to occur in serum than in plasma (14). The reproducibility of such ex vivo proteolytic events can therefore be expected to be comparatively low.

Unlike serum, plasma is obtained in the presence of anticoagulants. Citrate and EDTA inhibit coagulation and other enzymatic processes by chelating metal ions (e.g., Ca2+), thereby inhibiting ion-dependent enzymes. Heparin, in contrast, acts by activating antithrombin III. The main concern with heparinized plasma for proteomic studies is that heparin is a polydisperse, charged molecule that binds many proteins nonspecifically (15,16) and may also interfere with separation procedures and with the mass spectrometric detection of peptides and small proteins because of its similar molecular weight (17).

Plasma sampling is less time consuming than serum acquisition. Cells can be separated from the liquid phase immediately after sample collection, since no clotting time (30–60 min) is required. The volume of plasma obtained from a given volume of blood is approximately 10–20% higher than that of serum. The protein content of plasma is also higher than that of serum because of the presence of clotting factors and associated components. Furthermore, in serum, proteins may be bound to the clot, which decreases the protein concentration.

1.3. Processing of Blood Samples 

Rapid separation of cells from plasma is advantageous, since cellular constituents may release substances that alter the composition of the sample. It is generally recommended that plasma and serum be centrifuged at 1300–2000×g for 10 min within 30 min of sample collection. The temperature should generally be 15–24°C (18), unless otherwise recommended for specific analytes such as gastrin or A-type natriuretic peptide. Processing at 4°C appears attractive, because enzymatic degradation is reduced at low temperatures. However, platelets become activated at low temperatures (19) and release intracellular proteins and enzymes, which alter sample composition. Processing at low temperatures is therefore safe only after the thrombocytes have been removed.

38 Tammen

Because a single centrifugation step may be insufficient to deplete platelets below 10 cells/nL, a second centrifugation step (2500×g for 15 min at room temperature) or a filtration step may be required to obtain platelet-poor plasma. This procedure is applicable only to plasma, since the platelets in serum are already activated.
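The platelet-poor plasma criterion above (fewer than 10 platelets/nL) is easy to misapply when a hematology analyzer reports counts in other units. As a hedged sketch, the Python helpers below convert counts reported per μL or in 10^9/L to cells/nL and decide whether the second centrifugation or filtration step is still needed; the function names are hypothetical, and the example count in the usage note is an invented value, not a measurement from the text.

```python
# Platelet-poor plasma target from Section 1.3: fewer than 10 platelets per nL.
PPP_THRESHOLD_PER_NL = 10.0


def per_ul_to_per_nl(count_per_ul: float) -> float:
    """1 μL = 1000 nL, so a count per μL is divided by 1000."""
    return count_per_ul / 1000.0


def giga_per_l_to_per_nl(count_g_per_l: float) -> float:
    """1 L = 1e9 nL, so a count in 10^9/L is numerically equal to cells/nL."""
    return float(count_g_per_l)


def is_platelet_poor(count_per_nl: float) -> bool:
    """True if the count is below the platelet-poor plasma threshold."""
    return count_per_nl < PPP_THRESHOLD_PER_NL


def needs_second_step(count_per_ul: float) -> bool:
    """True if plasma from the first spin still exceeds the threshold."""
    return not is_platelet_poor(per_ul_to_per_nl(count_per_ul))
```

With a hypothetical first-spin count of 60,000 platelets/μL, `needs_second_step(60_000)` returns `True`, since this corresponds to 60 platelets/nL.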

1.4. Protease Inhibitors 

Protease inhibitors would seem attractive, but commonly used protease inhibitor cocktails may introduce difficulties: they can interfere with mass spectrometry and form covalent bonds with proteins, which shifts the isoform pattern (20). Protease inhibitors have been considered and investigated as additives in proteome research to prevent or slow down proteolytic processes and thereby allow more sensitive detection of markers in blood (21). Even though protein integrity has been shown to be maintained by the addition of 15 commercially available protease inhibitors, the usefulness of protease inhibitors for overall protein stabilization of blood samples remains to be investigated in more detail (22). Certain protease inhibitors are toxic to live cells in whole blood. Stressed, apoptotic, or necrotic cells release substances, and it may be argued that these alter the composition of serum or plasma until the cellular and soluble fractions of blood are separated. However, careful selection of an appropriate protease inhibitor may solve this problem.

2. Materials 

1. 20-gauge needles and an appropriate adapter (e.g., Sarstedt, Nümbrecht, Germany) or a Vacutainer system (BD Biosciences, Franklin Lakes, USA).

2. Alcohol (2-propanol) in spray flask. 

3. Swabs. 

4. Examination gloves. 

5. Tourniquet or sphygmomanometer. 

6. Blood collection tubes (e.g., Sarstedt). 

7. Centrifuge with a swinging-bucket rotor (e.g., Sigma 4K15, Sigma Laborzentrifugen, Osterode am Harz, Germany).

8. A 10-mL syringe equipped with a cellulose acetate filter unit with 0.2-μm pore size and 5-cm² filtration area (e.g., Sartorius Minisart, Sartorius, Göttingen, Germany).

9. 2-mL cryovials.

10. Pipette and tips. 

3. Methods 

1. Venipuncture of a cubital vein is performed using a 20-gauge needle (diameter: 0.9 mm; e.g., a butterfly system with a maximum tubing length of 6 cm). If a tourniquet is applied, it should not remain in place for longer than 1 min (risk of falsified results due to hemoconcentration). As soon as blood flows into the container, the tourniquet must be released at least partially. If more time is required, the tourniquet must be released so that circulation resumes and normal skin color returns to the extremity.

• Before blood is collected for proteomic analysis, a first container (e.g., 2.7 mL S-Monovette, Sarstedt, Nümbrecht, Germany) is filled. This flushes the surface and removes initial traces of contact-induced coagulation. This sample is not used for analysis.

• Afterward, blood is drawn into a standard EDTA- or citrate-containing syringe (e.g., 9 mL EDTA-Monovette, Sarstedt, Nümbrecht, Germany). Depending on the ease of blood flow, several samples can be collected. Free flow with only mild aspiration should be ensured to avoid hemolysis.

2. After venipuncture, plasma is obtained by centrifugation for 10 min at 2000×g at room temperature. Centrifugation should start within 30 min of blood collection. This separates the plasma from red and white blood cells efficiently and gently. Nevertheless, a significant number of platelets (∼25%) remains in the sample, which requires an additional preparation step.

3. For platelet depletion, one of the following procedures must be performed directly after step 2:

• Platelet removal by centrifugation: The plasma sample is transferred into a second vial and centrifuged again for 15 min at 2500×g at room temperature. After centrifugation, the supernatant is transferred in 1.5-mL aliquots into cryovials.

• Platelet removal by filtration: Plasma aliquots of 1.5 mL from step 2 are transferred into 2-mL cryovials through a 10-mL syringe equipped with a cellulose acetate filter unit with 0.2-μm pore size and 5-cm² filtration area (e.g., Sartorius Minisart®, Sartorius, Göttingen, Germany). Filtration requires only gentle pressure.

4. Samples are transferred to a –80°C freezer within 30 min and stored at –80°C. Samples are transported on dry ice.
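Because several of the steps above are time-critical, it can help to record timestamps and check them against the stated limits. The Python sketch below is a hypothetical example, not part of the published protocol: the function name and parameters are assumptions, and it assumes that the 30-min limit in step 4 is counted from the end of processing, which the protocol does not state explicitly.

```python
from datetime import datetime, timedelta

# Limits from the Methods above; how they are measured is partly an assumption.
MAX_DRAW_TO_SPIN = timedelta(minutes=30)         # step 2: start centrifugation within 30 min
MAX_PROCESSED_TO_FROZEN = timedelta(minutes=30)  # step 4 (assumed: from end of processing)
MAX_STORAGE_TEMP_C = -80.0                       # storage at -80 °C


def timeline_deviations(drawn: datetime, spin_start: datetime,
                        processed: datetime, frozen: datetime,
                        freezer_temp_c: float) -> list[str]:
    """Return deviations from the time and temperature limits (empty list = OK)."""
    if not (drawn <= spin_start <= processed <= frozen):
        raise ValueError("timestamps must be in chronological order")
    problems = []
    if spin_start - drawn > MAX_DRAW_TO_SPIN:
        problems.append("centrifugation started more than 30 min after draw")
    if frozen - processed > MAX_PROCESSED_TO_FROZEN:
        problems.append("freezing delayed more than 30 min after processing")
    if freezer_temp_c > MAX_STORAGE_TEMP_C:
        problems.append("storage temperature above -80 °C")
    return problems
```

A sample drawn at 8:00, centrifuged from 8:20, processed by 8:55, and frozen at 9:10 in a –80°C freezer yields an empty list; timestamps out of order raise `ValueError`, which catches transcription errors in the log.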

4. Notes 

4.1. Frequently Made Mistakes 

4.1.1. Blood Withdrawal 

• The patient was not fasting (i.e., had eaten before sampling).

• The blood was drawn from an infusion line.

• The blood was drawn with the patient in a position other than that specified (e.g., supine vs. upright).

• The consumables used differed from those recommended.

• The consumables had passed their expiry date.

• The tubes were not properly filled.

• The tubes were agitated vigorously (instead of being shaken gently to dissolve the anticoagulant).

• The blood sample tubes were not consistently kept at room temperature.

• The sample tubes were put on ice or in a refrigerator.

4.1.2. Lab Handling 

• Centrifugation was delayed by more than 30 min after blood withdrawal.

• A refrigerated centrifuge was set below room temperature.

• The centrifugation speed was wrong (e.g., revolutions per minute were entered instead of g-force).

• The centrifugation time was wrong.

• Plasma was removed by pipetting without proper care, so that the buffy coat or the red blood cells were stirred up.

• The second centrifugation of the recovered plasma samples was delayed after the first centrifugation.
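The rpm-versus-g-force mistake listed above can be avoided by converting explicitly. The Python sketch below uses the standard relation RCF = 1.118 × 10^-5 × r × rpm^2, where r is the radius in cm from the rotor axis to the sample. The radius depends on the rotor and buckets actually used and must be taken from the manufacturer's documentation; the 15-cm value in the usage note is a hypothetical placeholder, not a property of any specific centrifuge, and the function names are illustrative.

```python
import math

# Standard relation between relative centrifugal force (RCF, in x g) and speed:
#   RCF = 1.118e-5 * r_cm * rpm**2
# r_cm is the radius (cm) from the rotor axis to the sample; it depends on the
# rotor and buckets in use and must be taken from the manufacturer's data.
K = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given speed and radius."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF at a given radius."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (K * radius_cm))
```

With the hypothetical 15-cm radius, `rpm_from_rcf(2000, 15)` gives roughly 3450 rpm and `rpm_from_rcf(2500, 15)` roughly 3860 rpm; a rotor with a different radius needs different settings for the same g-force.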

4.1.3. Storage of Samples 

• The storage of samples was delayed.

• The storage temperature was above –80°C.

• Sample container labels were unreadable or could be confused.

• Labels were not properly attached to the sample containers, so that they were lost during storage or handling.

4.1.4. General Recommendations 

• A proper first centrifugation should produce a visible white blood cell layer (buffy coat) between the red blood cells and the plasma. If it does not, the centrifugation speed or time may be wrong.

• Plasma that is icteric or shows signs of hemolysis should be discarded. Check with an expert whether this may be due to the particular disease under study.

References 

1. Vitzthum F, Behrens F, Anderson NL, Shaw JH. (2005) Proteomics: from basic 

research to diagnostic application. A review of requirements and needs. J. Proteome 

Res. 4, 1086–97. 

2. Lathrop JT, Anderson NL, Anderson NG, Hammond DJ. (2003) Therapeutic 

potential of the plasma proteome. Curr. Opin. Mol. Ther. 5, 250–7.


3. Wang W, Zhou H, Lin H, Roy S, Shaler TA, Hill LR et al. (2003) Quantification of 

proteins and metabolites by mass spectrometry without isotopic labeling or spiked 

standards. Anal. Chem. 75, 4818–26. 

4. Anderson NL, Anderson NG. (2002) The human plasma proteome: history, 

character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–67. 

5. Omenn GS. (2004) The Human Proteome Organization Plasma Proteome 

Project pilot phase: reference specimens, technology platform comparisons, and 

standardized data submissions and analyses. Proteomics 4, 1235–40. 

6. Plebani M, Carraro P. (1997) Mistakes in a stat laboratory: types and frequency. 

Clin. Chem. 43, 1348–51. 

7. Burtis CA, Ashwood E. (eds) (2001) Fundamentals of Clinical Chemistry. 

Saunders, Philadelphia. 

8. Guder WG, Narayanan S, Wisser H, Zawta B. (2003) Samples: From the Patient to

the Laboratory. The Impact of Preanalytical Variables on the Quality of Laboratory 

Results. GIT Verlag, Darmstadt, Germany. 

9. Evans MJ, Livesey JH, Ellis MJ, Yandle TG. (2001) Effect of anticoagulants and 

storage temperatures on stability of plasma and serum hormones. Clin. Biochem.

34, 107–12. 

10. Omenn GS, States DJ, Adamski M, Blackwell TW, Menon R, Hermjakob H et al. 

(2005) Overview of the HUPO Plasma Proteome Project: results from the pilot 

phase with 35 collaborating laboratories and multiple analytical groups, generating 

a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 

3226–45. 

11. Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD et al. 

(2005) HUPO Plasma Proteome Project specimen collection and handling: towards 


3262–77. 

12. Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, 

Olshen AB et al. (2006) Differential exoprotease activities confer tumor-specific 

serum peptidome patterns. J. Clin. Invest. 116, 271–84. 

13. Liotta LA, Petricoin EF. (2006) Serum peptidome for cancer detection: spinning 

biologic trash into diagnostic gold. J. Clin. Invest. 116, 26–30. 

14. Tammen H, Schulte I, Hess R, Menzel C, Kellmann M, Schulz-Knappe P. (2005) 

Prerequisites for peptidomic analysis of blood samples: I. Evaluation of blood 

specimen qualities and determination of technical performance characteristics. 

Comb. Chem. High Throughput Screen. 8, 725–33. 

15. Holland NT, Smith MT, Eskenazi B, Bastaki M. (2003) Biological sample collection 

and processing for molecular epidemiological studies. Mutat. Res. 543, 217–34. 

16. Landi MT, Caporaso N. (1997) Sample collection, processing and storage. IARC 

Sci. Publ. 223–36. 

17. Tammen H, Schulte I, Hess R, Menzel C, Kellmann M, Mohring T, 

Schulz-Knappe P. (2005) Peptidomic analysis of human blood specimens: 

comparison between plasma specimens and serum by differential peptide display. 

Proteomics 5, 3414–22.

18. Favaloro EJ, Soltani S, McDonald J. (2004) Potential laboratory misdiagnosis of 

hemophilia and von Willebrand disorder owing to cold activation of blood samples 

for testing. Am. J. Clin. Pathol. 122, 686–92. 

19. Mustard JF, Kinlough-Rathbone RL, Packham MA. (1989) Isolation of human 

platelets from plasma by centrifugation and washing. Methods Enzymol. 169, 3–11. 

20. Schuchard MD, Mehigh RJ, Cockrill SL, Lipscomb GT, Stephan JD, Wildsmith J 

et al. (2005) Artifactual isoform profile modification following treatment of 

human plasma or serum with protease inhibitor, monitored by 2-dimensional 

electrophoresis and mass spectrometry. Biotechniques 39, 239–47. 

21. Jeffrey DH, Deidra B, Keith H, Shu-Pang H, Deborah LR, Gregory JO, Stanley AH. 

(2004) An Investigation of Plasma Collection, Stabilization, and Storage Procedures 

for Proteomic Analysis of Clinical Samples. Humana, Totowa, NJ. 

22. Rai AJ, Vitzthum F. (2006) Effects of preanalytical variables on peptide and protein 

measurements in human serum and plasma: implications for clinical proteomics. 

Expert Rev. Proteomics 3, 409–26.

3 

Tissue Sample Collection for Proteomics Analysis 

Jose I. Diaz, Lisa H. Cazares, and O. John Semmes 

Summary 

Successful collection of tissue samples for molecular analysis requires several critical considerations. We describe here our procedure for tissue specimen collection for proteomic purposes, with emphasis on the most important steps, including timing issues; immediate freezing and storage; microdissection of the cells of interest, or “tissue targets”; and the preparation of lysates for protein isolation for SELDI, MALDI, and 2DGE applications. The pathologist is the cornerstone of this process and an invaluable collaborator. In most institutions, pathologists are responsible for “tissue custody,” and they closely supervise the tissue bank. In addition, they are optimally trained in histopathology and can therefore assist investigators in correlating tissue morphology with molecular findings. In recent years, the advent of the laser capture microscope, a tool ideally suited to pathologists, has greatly increased the efficiency of collecting tissue targets for molecular analysis.

Key Words: tissue bank; frozen section; immunofluorescence; laser capture microscope; 

proteomics. 


1. Introduction

From the completion of surgery and the acquisition of the tissue sample to protein isolation and the various proteomic techniques, a number of challenges must be overcome. The first challenge is time. Surgical removal interrupts the vascular supply of the tissue, resulting in a progressive increase in endogenous protease activity, protein degradation, and tissue autolysis. For this reason, specimens submitted for tissue procurement must be processed without delay. Formalin fixation, a standard processing procedure in pathology, stops protease activity.

44 Diaz et al.

However, formalin is a cross-linking fixative that irreversibly alters proteins, thus compromising the quality of the extracts for most proteomic techniques. Recent technical developments appear promising and may ultimately enable peptide analysis and protein identification (bottom-up proteomics) in formalin-fixed, paraffin-embedded tissue (1). At present, however, it is imperative to take a representative “fresh” tissue sample immediately after surgery when collecting tissue for proteomic studies, including MALDI-TOF MS and 2DGE. The surgical specimen should be transported quickly to pathology, and a representative tissue sample should be obtained under the supervision of a pathologist. The sample should be embedded in OCT and frozen without delay. Ideally, a frozen section should be performed for quality assurance before the sample is archived. Once the pathologist confirms that the expected targets (for instance, tumor and non-tumor tissue) are present in the collected tissue, the frozen specimen can be stored in a –80°C freezer for subsequent use. Overcoming time constraints requires appropriate institutional policies and dedicated personnel. In our experience, it is better to delegate the responsibility for transporting the surgical specimen from the operating room to pathology to dedicated tissue procurement personnel than to expect the surgical team to deliver the specimens. When collecting and archiving tissue samples, our policy is to bisect the sample into two halves: one is embedded in OCT and stored permanently at –80°C for future molecular studies, and the other is submitted as a “mirror image” that is processed in formalin after a frozen section has been performed, for morphologic comparison and cell-type mapping after basic hematoxylin and eosin (H&E) staining. This formalin-processed mirror-image tissue provides optimal morphological detail, which might be needed in the future. For instance, prostatic intraepithelial neoplasia (PIN) is very difficult to identify on frozen section slides, but the formalin-fixed section, which closely matches the frozen section, can be used for guidance.

After the tissue sample has been archived, the next challenge is to ensure that the proteomic findings are representative of the tissue targets under investigation, given the cellular heterogeneity of most tissues. For instance, to determine differential protein expression in tumor versus non-tumor tissue, one must ensure that proteins are extracted separately and reliably from normal and tumor cells. Many solid tumors are certainly visible to the naked eye, and both tumor and non-tumor tissue can be collected by gross inspection. Under the microscope, however, the tumor bed contains not only tumor cells but also many other tumor-associated, non-tumoral elements, such as supporting stromal cells, blood vessels, and infiltrating lymphocytes. Moreover, microscopic tumor foci may infiltrate grossly normal tissue. In the past, various approaches were used to collect cells from tissue sections, including manual microdissection with a syringe. In recent years, laser capture microdissection (LCM) (2) has greatly increased the quality, specificity, and speed of this process, allowing selective capture of cells and various tissue elements while preserving their molecular integrity (3,4,5).

The LCM instrument is a specialized microscope that isolates cells from frozen or formalin-fixed tissues and from cytological preparations. Single cells or multicellular structures are microdissected by placing a cap carrying a plastic polymer film over the tissue and pulsing an infrared laser, which melts the polymer so that it adheres to the target cells under the laser ring. When the cap is removed, the cells that adhered to the polymer detach from the surrounding tissue without molecular damage, making them suitable for the extraction of high-quality nucleic acids and proteins and for a wide range of downstream molecular analyses, such as gene expression microarrays or proteomics.

LCM can be coupled with special immunostaining procedures when one wishes to capture specific cell types that are not easily identified by morphology alone. This so-called immunocapture procedure (6,7) further enhances the specificity of tissue procurement for molecular analysis. For example, in a previous study (8), we were able to selectively capture basal cells from benign prostate glands, which are extremely difficult to recognize morphologically but easily identifiable after immunostaining for high-molecular-weight cytokeratin (Fig. 1).

Fig. 1. Selective immunofluorescent LCM of prostate gland basal cells by immunocapture: (A) immunofluorescent staining of basal cells with a mAb against high-molecular-weight keratins, which are highly expressed in basal cells; (B) selection of immunofluorescence-positive basal cells for subsequent LCM; (C) captured immunofluorescence-positive cells after LCM, photographed on the plastic cap; (D) remainder of the gland after removal of the basal cell layer by LCM.

We obtained protein of excellent quality and were able to identify several protein peaks preferentially expressed in these cells using SELDI-TOF-MS. When we compared the protein spectra from sections of the same tissue sample routinely stained with hematoxylin with those immunostained for high-molecular-weight cytokeratins, there was no difference in the spectra, which argues against any significant protein deterioration caused by the immunostaining procedure.

2. Materials 

2.1. Tissue Collection and Storage 

1. Tissue-Tek Cryomold-standard (Sakura, Torrance, CA) 

2. Tissue-Tek OCT (Sakura) 

3. 2-Methylbutane (Mallinckrodt, St. Louis, MO)

4. Shandon Histobath II (Thermo Electron Corp., Waltham, MA) 

5. –80°C freezer 

2.2. Frozen Tissue Sectioning and Staining 

1. Cryostat 

2. HistoGene™ LCM Frozen Section Staining Kit (Arcturus Biosciences Inc, Mountain View, CA). The kit contains HistoGene staining solution, ethanol (75, 95, 100%), xylene, nuclease-free distilled water, HistoGene LCM slides, and disposable slide staining jars.

3. 1× PBS made from 10× stock (Fisher Scientific) 

4. Acetone (high purity grade) 

5. Cy3-streptavidin (Invitrogen, Carlsbad, CA)

6. Biotinylated mAbs: Any antibody can be biotinylated. We routinely have 1.5 mg of 

antibody labeled with 0.2 mg biotin (Alpha Diagnostic Intl. Inc. San Antonio, TX). 

2.3. LCM 

1. PixCell II LCM System (Arcturus Biosciences Inc) 

2. AutoPix™ Automated LCM System (Arcturus Biosciences Inc)

3. CapSure® LCM caps (Arcturus Biosciences Inc)

4. Prep Strip (Arcturus Biosciences Inc) 

5. Microcentrifuge tubes (0.5 ml) (Eppendorf North America)


2.4. LCM Lysate 

1. Micropipet capable of delivering 1 μl accurately 

2. 20 mM HEPES, pH 8.0 (adjusted with NaOH), with 1% Triton X-100

3. Sonicator (optional) 

4. 1× PBS 

2.5. SELDI Analysis 

1. IMAC3 or WCX2 Protein Array Chips (Ciphergen Biosystems, Palo Alto, CA)

2. HPLC grade water (Fisher Scientific) 

3. 100 mM sodium acetate pH 4.0 

4. 100 mM ammonium acetate pH 4.0 

5. Sinapinic acid (SPA) (Ciphergen Biosystems, Palo Alto, CA) 

6. Optima grade acetonitrile (Fisher Scientific)

7. Trifluoroacetic acid, packaged in 1 ml ampules (Pierce Chemical Company, 

Rockford, IL) 

2.6. MALDI Analysis 

1. Target plate 

2. α-Cyano-4-hydroxycinnamic acid (CHCA) (Bruker Daltonics, Palo Alto, CA)

3. SPA (Fluka) 

4. Optima grade acetonitrile (Fisher Scientific)

5. Trifluoroacetic acid, packaged in 1 ml ampules (Pierce Chemical Company) 

3. Methods

3.1. Tissue Collection and Storage 

1. Embed the tissue sample in OCT using a cryomold, and freeze it in the Shandon Histobath containing 2-methylbutane (see Note 1).

2. Hold the cryomold at the surface of the 2-methylbutane and allow the tissue to freeze slowly (3–5 min) (see Note 2).

3. After the sample is completely frozen, place the cryomold containing it in a plastic bag, and transport it in a liquid nitrogen container. Store the sample in a –80°C freezer.
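Note 1 gives time limits between the completion of surgery and tissue freezing. If both times are logged for each sample in the tissue bank, the interval can be checked automatically. The following Python sketch is hypothetical (the function names are assumptions), and it treats the limits from Note 1 (30 min in general, 20 min for phosphorylation studies) as inclusive cut-offs, which is a simplification.

```python
from datetime import datetime

# Limits from Note 1; treating them as inclusive cut-offs is an assumption.
GENERAL_LIMIT_MIN = 30.0   # good protein quality for most proteomic techniques
PHOSPHO_LIMIT_MIN = 20.0   # phosphorylation begins to decrease significantly after 20 min


def minutes_to_freeze(surgery_end: datetime, frozen: datetime) -> float:
    """Interval between completion of surgery and freezing, in minutes."""
    if frozen < surgery_end:
        raise ValueError("freezing time precedes completion of surgery")
    return (frozen - surgery_end).total_seconds() / 60.0


def suitable_for_study(surgery_end: datetime, frozen: datetime,
                       phosphorylation_study: bool = False) -> bool:
    """True if the sample was frozen within the limit for the intended study."""
    limit = PHOSPHO_LIMIT_MIN if phosphorylation_study else GENERAL_LIMIT_MIN
    return minutes_to_freeze(surgery_end, frozen) <= limit
```

A sample frozen 25 min after the completion of surgery would pass the general check but fail the phosphorylation check.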

3.2. Frozen Tissue Sectioning and Staining 

3.2.1. Regular Hematoxylin Staining 

Prior to LCM, cut 8-μm-thick frozen tissue sections on the cryostat (discard folded or wrinkled sections). Keep the slides with sections in the cryostat after cutting, and stain them as follows (see Notes 3 and 9; slides may also be frozen at –80°C until staining):


1. Remove the slides from the freezer or cryostat and place in 70% ethanol (30 s). 

2. Place in purified water (5 s). 

3. Add the HistoGene staining solution (30 s) (see Note 4).

4. Rinse the slides with purified water. 

5. Wash with 70% ethanol (60 s). 

6. Wash with 95% ethanol twice (60 s each). 

7. Wash with 100% ethanol (60 s). 

8. Place the slides in xylene to ensure complete dehydration (10 min) (see Note 5). 

9. Shake off excess xylene and drain carefully by touching the corner of the slide with a particle-free tissue.

10. Air dry the slides to allow the xylene to evaporate completely (at least 2 min).

11. The slides are now ready for LCM (they should not be coverslipped) (see Note 12).
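The hematoxylin staining steps above have fixed durations, so the whole run can be encoded as data and timed. In this Python sketch the step list is transcribed from the protocol, but the rinse and drain steps, for which no duration is given, are assigned 0 s, and air drying uses its 2-min minimum; these values and the function names are assumptions for illustration.

```python
# (description, duration in s) transcribed from the hematoxylin staining steps.
# Steps without a stated duration (rinse, drain) are set to 0 s, and air drying
# uses its 2-min minimum; both choices are assumptions.
HEMATOXYLIN_STEPS = [
    ("70% ethanol", 30),
    ("purified water", 5),
    ("HistoGene staining solution", 30),
    ("rinse in purified water", 0),
    ("70% ethanol", 60),
    ("95% ethanol", 60),
    ("95% ethanol", 60),
    ("100% ethanol", 60),
    ("xylene", 600),
    ("shake off and drain", 0),
    ("air dry (minimum)", 120),
]


def total_seconds(steps):
    """Sum of all step durations, excluding transfer time between baths."""
    return sum(seconds for _, seconds in steps)


def schedule(steps):
    """Yield (start offset in s, description) for each step, e.g. for a bench timer."""
    offset = 0
    for name, seconds in steps:
        yield offset, name
        offset += seconds
```

Under these assumptions, `total_seconds(HEMATOXYLIN_STEPS)` gives 1025 s (about 17 min), of which the 10-min xylene step is by far the largest part.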

3.2.2. Immunofluorescence Staining (see Note 7) 

1. Thaw slides (1 min). 

2. Place in cold acetone at 4°C (2 min). 

3. Air dry (30 s). 

4. Wash in filtered pH 7.4 1× PBS. 

5. Drain off slides. 

6. Add 100 μl of the biotinylated primary antibody at the optimal dilution (recommended concentration 30–100 μg/ml; optimize for best results) and incubate for 3 min.

7. Rinse in PBS.

8. Add 100 μl of Cy3-streptavidin at a 1:100 dilution and incubate for 1 min (the user may determine the optimal staining concentration of the Cy3-streptavidin conjugate in a serial dilution experiment).

9. Rinse in PBS. 

10. Place slides in 75% ethanol (30 s). 



13. Place slides in xylene (5 min) (see Note 6). 

14. Air dry (5 min). 
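The antibody and conjugate amounts in steps 6 and 8 follow from simple concentration arithmetic (C1·V1 = C2·V2). The Python sketch below is illustrative only: the helper names are hypothetical, the 1-mg/ml stock concentration in the usage note is an invented example (the protocol does not give one), and it reads “1:100” as 1 part reagent in 100 parts total volume, which is one common convention.

```python
def antibody_mass_ug(conc_ug_per_ml: float, volume_ul: float) -> float:
    """Mass applied per slide: concentration (μg/ml) x volume (ml)."""
    return conc_ug_per_ml * volume_ul / 1000.0


def stock_volume_ul(stock_ug_per_ml: float, target_ug_per_ml: float,
                    final_volume_ul: float) -> float:
    """C1*V1 = C2*V2 solved for V1, the volume of stock to pipette."""
    if target_ug_per_ml > stock_ug_per_ml:
        raise ValueError("target concentration exceeds stock concentration")
    return target_ug_per_ml * final_volume_ul / stock_ug_per_ml


def ratio_dilution_ul(fold: float, final_volume_ul: float) -> tuple[float, float]:
    """For a 1:fold dilution (1 part in fold parts total), return (reagent, diluent) in μl."""
    reagent = final_volume_ul / fold
    return reagent, final_volume_ul - reagent
```

At 30–100 μg/ml, the 100-μl application in step 6 corresponds to 3–10 μg of antibody per slide. With a hypothetical 1-mg/ml stock, `stock_volume_ul(1000, 50, 100)` gives 5 μl of stock (plus 95 μl of PBS) for a 50-μg/ml working solution, and `ratio_dilution_ul(100, 100)` gives 1 μl of Cy3-streptavidin plus 99 μl of diluent.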

3.3. LCM 

The newer instruments developed by Arcturus, such as the AutoPix™ and the Veritas™, are fully automated systems operated entirely by computer. We describe here the LCM procedure using the PixCell II instrument, which is manually operated and, being the least expensive LCM instrument available today, is more widely used (see Note 8).

1. Turn on the instrument and enter pertinent data such as slide number, case number, cap lot number, and section thickness (always 8 μm); then place the stained slide on the mechanical stage (see Note 10).


2. Turn on the vacuum pump to immobilize the slide (small aperture on the left side 

of the stage) and push in the filter bottom for optimal image quality. 

3. Place the caps in the rail on the right side of the stage. Unlock the mechanical arm, move it toward the tissue, and lower the cap onto the tissue. Align the joystick to move the stage to a centered and perpendicular position before beginning microdissection.

4. Turn the key on the right side of the power supply to enable the infrared laser. Before beginning microdissection, focus the laser using the smallest ring diameter, and then adjust to the desired diameter.

5. Select the appropriate energy (mW) and exposure time (ms) for the desired laser ring diameter, and verify their effectiveness on an area of the tissue that is not of interest, using a cap that will be discarded (see Note 11).

6. Fire the laser each time the ring is over the desired tissue target. Move the stage supporting the glass slide with the joystick, which allows fine and precise motion. Check that the tissue is properly microdissected, and record images of the tissue before and after LCM as well as an image of the target tissue captured on the cap (see Note 13).

7. When the cap holds the desired amount of tissue, remove it and use a 0.5-ml microcentrifuge tube to collect the tissue (the cap is designed to fit the tube exactly and close it) (see Note 14).

8. The microcentrifuge tube can be safely stored in a –80°C freezer without adding any 

buffer and without lysing the cells, which may be done at a convenient time later. 

3.4. LCM Lysate 

1. Lyse the cells from a total of 1500–2000 laser shots (about 3000–6000 microdissected cells) in 4 μl of 20 mM HEPES pH 8.0 with 1% Triton X-100. This is sufficient for one SELDI protein array or one MALDI run. For 2D analysis, a minimum of approximately 25,000 cells is necessary.

2. Add the lysis buffer onto the cap, which is held in the microfuge tube. This is usually done with two additions of 2 μl to the LCM cap. Pipet up and down and scrape the surface of the LCM cap to remove all the cells. A gentle scraping motion with the pipet tip may be necessary to remove the cells, but be careful not to tear the polymer film (see Note 15). Transfer the lysate from the surface of the cap to the microfuge tube. Cells from multiple caps may be combined by using the same 4 μl of lysate to lyse the cells on the next cap; in this way, the volume remains small. If 2DGE is to be performed, the lysis procedure is different (see below). Make a 1:10 dilution of each lysate in PBS (for IMAC3 SELDI chips) or 100 mM ammonium acetate pH 4.0 (for WCX2 chips) (i.e., add 36 μl to the 4-μl lysate), and vortex for at least 1 min (see Note 16). Spin down briefly.

3. Prepare the arrays of the IMAC chip with CuSO4 according to the manufacturer's specifications: 20 μl of 100 mM CuSO4 for 10 min, wash with HPLC water; 20 μl of 100 mM sodium acetate pH 4.0 for 5 min, wash with water. Use the Micromix shaker for all incubations with the following settings: Form 20, Amplitude 5.


4. Assemble the bioprocessor with the desired number of chips, add 200 μl of PBS to each well, and incubate on the shaker for 5 min; repeat once. Pretreat the WCX2 chip with 100 mM ammonium acetate pH 4.0. This can be done on the BioMek robot.

5. Add the diluted lysate to the spot on the chip(s) in the bioprocessor. 

6. Cover the bioprocessor with a plastic seal and incubate overnight on the Micromix shaker at room temperature, using the settings given above.

7. Remove lysates carefully with a pipet; do not touch the surface of the arrays. 

Save if needed for another experiment. 

8. Wash the spots in the bioprocessor twice with 200 μl of PBS (for IMAC) or 100 mM ammonium acetate pH 4.0 (for WCX2), 5 min each on the shaker.

9. Wash the arrays with HPLC water 2× for 5 min (on shaker). 

10. Remove the chip(s) from bioprocessor and give them a final rinse with HPLC 

water. 

11. Let the chip dry completely, usually overnight. 

12. Add 2× 0.5 μl saturated SPA dissolved in 50% acetonitrile, 0.5% TFA. 

13. Read at instrument settings optimized for resolution and intensity for the m/z 

range of 1000–20,000. Higher laser energy will be required to see higher 

molecular weight peaks. 
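The quantities in steps 1 and 2 can be turned into a small planning helper. The Python sketch below is hypothetical: the function names are assumptions, and the figure of 2–3 cells per laser shot is only derived from the ratio stated in step 1 (1500–2000 shots for about 3000–6000 cells); actual yields depend on ring diameter and tissue.

```python
import math

# Derived from step 1: 1500-2000 shots give about 3000-6000 cells,
# i.e. roughly 2-3 cells per shot (an approximation, not a fixed constant).
CELLS_PER_SHOT = (2.0, 3.0)
LYSIS_VOLUME_UL = 4.0   # 20 mM HEPES pH 8.0 with 1% Triton X-100
DILUTION_FOLD = 10      # 1:10 in PBS (IMAC3) or 100 mM ammonium acetate pH 4.0 (WCX2)


def shots_needed(target_cells: int) -> tuple[int, int]:
    """Return (optimistic, conservative) numbers of laser shots for a target cell count."""
    low_rate, high_rate = CELLS_PER_SHOT
    return math.ceil(target_cells / high_rate), math.ceil(target_cells / low_rate)


def diluent_volume_ul(lysate_ul: float = LYSIS_VOLUME_UL, fold: int = DILUTION_FOLD) -> float:
    """Diluent to add so that the lysate makes up 1/fold of the final volume."""
    return lysate_ul * (fold - 1)
```

For the 25,000 cells given as a minimum for 2D analysis, `shots_needed(25_000)` returns `(8334, 12500)`, and `diluent_volume_ul()` returns `36.0`, matching the 36 μl added to the 4-μl lysate in step 2. The same 1:10 calculation applies when the lysate is diluted with matrix for MALDI.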

One method of MALDI sample preparation that reduces the complexity of cell lysates while remaining robust and easily amenable to automated high-throughput applications is sample fractionation using magnetic beads (MB) combined with prestructured MALDI sample supports (AnchorChip technology). Several magnetic bead types with different surface chemistries can be used to fractionate serum and increase the number of detectable peaks (see the chapter on serum protein profiling for details). For MALDI analysis, dilute the lysate 1:10 with CHCA or SPA matrix (5–10 mg/ml in 50% acetonitrile, 0.1% TFA). Spot the mixture on an AnchorChip target plate and read it in a MALDI instrument. Further dilution and/or fractionation of the lysate may be necessary to obtain optimal spectra.

If 2DGE analysis will be performed, the preferred number of laser shots is approximately 100,000 (100 K), and the captured cells should be lysed as follows: Remove the LCM cap from the tube and add a small volume (10 μl) of 1D focusing rehydration buffer to the tube. Replace the cap and invert the tube so that the buffer contacts and lyses the cells on the cap. Incubate for 5 min at room temperature, then sonicate the samples to ensure complete lysis. Continue with the basic protocol for 1D IEF and 2D analysis.

4. Notes 

1. In our experience, freezing the tissue within 30 min of the completion of surgery yields good protein quality for most proteomic techniques. However, when studying protein phosphorylation, note that phosphorylation levels begin to decrease significantly 20 min after the completion of surgery (10).


2. When freezing the tissue sample in the Histobath, avoid immediate and complete immersion in 2-methylbutane to preserve optimal tissue morphology. Hold the sample at the liquid interface with minimal immersion and wait until the OCT and the tissue slowly turn white.

3. Use uncoated glass slides for LCM. Coated or electrically charged glass slides interfere with the detachment of the plastic polymer and are not suitable for LCM.

4. Precipitate from hematoxylin can contaminate the surface of the tissue, so filter the staining solutions. Add one tablet of protease inhibitor to each staining bath (we use Complete, from BMB), but do not add protease inhibitor to the alcohol baths. If using the HistoGene staining kit (Arcturus) for frozen sections, this is not necessary.

5. Change all the staining and alcohol solutions after staining 20 slides. 

6. Poor transfers may result if the 100% ethanol has absorbed water. Increasing the incubation time in xylene often improves transfer.

7. When specific cells need to be microdissected and these cannot be identified 

morphologically, the cells of interest can be immunostained with specific mAbs 

against proteins highly expressed on those cells (immunophenotype). It is critical 

to expedite the immunostaining procedure because the shorter the immunostaining 

time, the better the protein quality. One must avoid exceeding 30 min for 

the total immunostaining and dehydration procedure. In the past, we used the immunoperoxidase technique with DAB labeling (6), but it was difficult to perform quickly enough to preserve optimal protein integrity. In addition, manual microdissection of DAB-labeled cells with the Pixel II is extremely tedious and impractical. The immunofluorescence staining method (7) is faster and easier to perform. This method, coupled with the Autopix microscope, which has dark-field fluorescence and automation capabilities, is the ideal procedure for immunocapture. Because Cy3-streptavidin binds directly to the biotin-labeled primary antibody, no secondary antibody is needed, which shortens the staining time. Negative control staining is recommended: use a biotinylated control antibody from the same animal species and of the same isotype as the primary antibody, diluted to the same working concentration as the primary antibody.

8. Do not forget to wear gloves every time while performing LCM, including when 

handling the plastic caps. 

9. The thickness of the tissue section is a critical parameter for effective LCM. In 

our experience (using the Pixel II and the Autopix instruments by Arcturus), 

8 μm is the optimal thickness for LCM. 

10. Smooth out the surface of the tissue section with a Prep-strip before placing the 

slide on the LCM instrument, which improves the efficiency and uniformity of 

the microdissection process. 

11. The main factors affecting the efficiency of LCM are the laser energy, the exposure time, and the diameter of the laser beam. Regarding the diameter, the Pixel II offers a smallest ring of 7 μm, a medium ring of 15 μm, and a widest ring of 30 μm. We have most often used the medium ring (15 μm), which lifts about three cells with each shot. Microdissecting single cells with the Pixel II requires the smallest (7 μm) ring, but in our experience the results were disappointing. With the Autopix, we have observed that individual cells are better microdissected with the laser ring set at 10 μm; below this diameter it becomes very difficult to lift cells efficiently. A 30-μm laser spot is very effective for microdissecting whole glands and other large tissue structures. The optimal energy and exposure time depend on the tissue type. For instance, for prostate tissue, an energy of 80 mW with a duration of 0.5 ms is usually effective for the medium-size (15 μm) ring. These parameters are tuned by trial and error, progressively adjusting the energy and exposure time for the chosen diameter, which in turn depends on the microdissection task (single cells vs. medium- or large-size tissue structures).

12. Another factor that affects the effectiveness of LCM is the time the tissue section 

has been dry after the staining and dehydration procedure. Ideally, the tissue 

should be stained and microdissected within 1 h if possible. Avoid leaving a slide on the LCM instrument for more than 4 h. If microdissecting many tissues, stain only four slides at a time.

13. When capturing images before and after microdissection for documentation, make sure the image on the monitor is in focus, because that is the image that will be captured; sometimes the image is in focus in the microscope but out of focus on the monitor. In a typical experiment, you will capture images before and after firing the laser, which document how effectively the target cells were removed. You can also capture an image of the microdissected cells on the polymer cap.

14. Avoid overcrowding the LCM caps. With the 15-μm laser ring, each shot captures about three cells, so approximately 3000 cells are collected per 1000 shots, which is about the right load for a single cap.

15. LCM caps can be viewed under a dissecting microscope to ensure that all cells 

have been removed from the polymer film after the lysing procedure. 

16. Depending on the cell type, vigorous vortexing and sonication may be necessary 

to completely lyse the cells after they are removed from the cap. 
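Notes 11 and 14 imply simple arithmetic for planning how many laser shots and LCM caps a capture will need. The sketch below assumes the rule-of-thumb figures given there for the Pixel II 15-μm ring (about three cells per shot, and about 3000 cells, i.e. roughly 1000 shots, per cap); the function name `plan_lcm` is hypothetical, and real yields vary with tissue type and laser settings.

```python
import math

# Rule-of-thumb figures from Notes 11 and 14 (Pixel II, 15-um ring).
# They are starting assumptions, not instrument constants.
CELLS_PER_SHOT = 3         # ~3 cells lifted per shot with the medium ring
MAX_CELLS_PER_CAP = 3000   # ~1000 shots, i.e. ~3000 cells, per cap


def plan_lcm(target_cells, cells_per_shot=CELLS_PER_SHOT,
             max_cells_per_cap=MAX_CELLS_PER_CAP):
    """Return (shots, caps) needed to collect roughly target_cells cells."""
    if target_cells <= 0 or cells_per_shot <= 0 or max_cells_per_cap <= 0:
        raise ValueError("all arguments must be positive")
    shots = math.ceil(target_cells / cells_per_shot)
    caps = math.ceil(target_cells / max_cells_per_cap)
    return shots, caps


shots, caps = plan_lcm(10_000)
print(f"{shots} shots spread over {caps} caps")  # 3334 shots spread over 4 caps
```

Rounding both numbers up keeps each cap at or below the suggested load rather than slightly over it.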

References 

1. Prieto, D.A., Hood, B.L., Darfler, M.M., Guiel, T.G., Lucas, D.A., Conrads, T.P., 

Veenstra, D.T., and Krizman, D.B. (2005) Liquid Tissue™: proteomic profiling of

formalin-fixed tissues. Biotechniques 38: 32–5. 

2. Emmert-Buck, M.R., Bonner, R.F., Smith, P.D., Chuaqui, R.F., Zhuang, Z., 

Goldstein, S.R., Weiss, R.A., and Liotta, L.A. (1996) Laser capture microdissection. 

Science 274: 998–1001. 

3. Espina, V., Milia, J., Wu, G., Cowherd, S., and Liotta, L.A. (2006) Laser capture

microdissection. Methods Mol Biol 319: 213–29.


4. Best, C.J., and Emmert-Buck, M.R. (2001) Molecular profiling of tissue samples 

using laser capture microdissection. Expert Rev Mol Diagn 1: 53–60.

5. Ornstein, D.K., Gillespie, J.W., Paweletz, C.P., Duray, P.H., Herring, J., 

Vocke, C.D., Topalian, S.L., Bostwick, D.G., Linehan, W.M., Petricoin, E.F., III, 

and Emmert-Buck, M.R. (2000) Proteomic analysis of laser capture microdissected 

human prostate cancer and in vitro prostate cell lines. Electrophoresis 21: 

2235–42. 

6. Fend, F., Emmert-Buck, M.R., Chuaqui, R., Cole, K., Lee, J., Liotta, L.A., and 

Raffeld, M. (1999) Immuno-LCM: laser capture microdissection of immunostained 

frozen sections for mRNA analysis. Am J Pathol 154: 61–6. 

7. Murakami, H., Liotta, L., and Star, R.A. (2000) IF-LCM: laser capture microdissection

of immunofluorescently defined cells for mRNA analysis. Rapid communication.

Kidney Int 58(3): 1346–53. 

8. Cazares, L.H., Adam, B.L., Ward, M.D., Nasim, S., Schellhammer, P.F., 

Semmes, O.J., and Wright, G.L., Jr (2002) Normal, benign, preneoplastic, and 

malignant prostate cells have distinct protein expression profiles resolved by 

surface enhanced laser desorption/ionization mass spectrometry. Clin Cancer Res 

8: 2541–52. 

9. Diaz, J., Cazares, L.H., Corica, A., and Semmes, O.J. (2004) Selective capture

of prostatic basal cells and secretory epithelial cells for proteomic and genomic 

analysis. Urol Oncol 22(4): 329–36. 

10. Mora, L., Buettner, R., Seigne, J., Diaz, J., Hamad, N., Garcia, R., Bowman, T., 

Falcone, R., Faigurth, R., Cantor, A., Muro-Cacho, C., Livistong, S., Levitzki, A., 

Kraker, A., Karras, J., Pow-Sang, J., and Jove, R. (2002) Constitutive activation of 

Stat3 in human prostate tumors and cell lines: direct inhibition of Stat3 signaling

induces apoptosis of prostate cancer cells. Cancer Res 62: 6659–66.

4 

Protein Profiling of Human Plasma Samples 

by Two-Dimensional Electrophoresis 

Sang Yun Cho, Eun-Young Lee, Hye-Young Kim, Min-Jung Kang, 

Hyoung-Joo Lee, Hoguen Kim, and Young-Ki Paik 

Summary 

Human plasma is regarded as the most complex and well-known clinical specimen that can be easily obtained; alterations in the levels of plasma proteins or their corresponding enzyme activities may reflect either a healthy or a diseased state. Because genomic information does not define the intact protein components of plasma, protein profiling could be the first step toward its molecular characterization. However, several problems exist in the analysis of plasma proteins. For example, the extremely wide dynamic range of protein concentrations, the presence of high-abundance proteins, and post-translational modifications must be considered before proteomic studies are undertaken. In particular, efficient depletion or pre-fractionation of high-abundance proteins is crucial for identifying low-abundance proteins, which may include potential biomarkers. After the removal of high-abundance proteins, protein profiling can be initiated using two-dimensional electrophoresis (2DE), which has been widely used to display the differential proteome under specific physiological conditions. Here, we describe a typical 2DE procedure for the plasma proteome in either a healthy or a diseased state (e.g., liver cancer), in which pre-fractionation and depletion are integral steps in the search for disease biomarkers.

Key Words: 2-dimensional gel electrophoresis; plasma; HPPP; immunoaffinity 

column. 

Abbreviations: IEF: Isoelectric Focusing, IPG: Immobilized pH Gradient, TCA: Trichloroacetic Acid, FFE: Free Flow Electrophoresis, HPMC: Hydroxypropyl Methylcellulose, TBP: Tributylphosphine, 2DE: 2-dimensional Gel Electrophoresis, BPB: Bromophenol Blue, CHCA: α-cyano-4-hydroxycinnamic acid, LTQ: Linear Ion Trap,



57

58 Cho et al. 

MALDI-TOF: Matrix-assisted Laser Desorption Ionization - Time of Flight Mass 

Spectrometry, HPPP: Human Plasma Proteome Project. 


1. Introduction

Human plasma is an intravascular fluid that serves as a liquid medium for blood proteins derived from various cells, tissues, and other biofluids (1). The components of plasma are very heterogeneous and include inorganic ions (e.g., bicarbonate, calcium), metabolic intermediates (e.g., cholesterol, glucose), and plasma proteins (e.g., albumin, globulin), which are important in maintaining body fluid balance, the immune response, blood clotting, and other homeostatic mechanisms. Plasma contains many different proteins, which are primarily synthesized in the liver and are often subject to post-translational modification (PTM) (2).

Because human plasma is the most complex and well-known clinical specimen that can be easily obtained, it has been a central target of many biomedical studies (2). Alterations in the levels of plasma proteins or their corresponding enzyme activities may reflect either a healthy or a diseased state, which can be monitored by various analytical tools, including biochemical assays and proteomics. Because genomic information does not define the intact protein components of plasma, a proteomic study may be the method of choice (3,4). Recently, plasma protein profiling was conducted as part of the HUPO Plasma Proteome Project (HPPP) (5). The pilot phase of the HPPP identified 3020 nonredundant proteins in human plasma and serum (5,6).

However, several points must be addressed before proteomic studies are undertaken. First, the concentrations of plasma proteins are believed to span the widest dynamic range of any proteome (more than 10 orders of magnitude), which creates many technical obstacles to proteomic detection by mass spectrometry (MS) (2,3). For example, removing the high-abundance proteins (e.g., albumin, IgG, transferrin, fibrinogen, IgA), which account for more than 90% of total plasma protein, prior to biochemical analysis can be a major challenge and may even be problematic for plasma-derived biomarker discovery (3,7). Second, because many plasma proteins exist as multiple structural isoforms, more efficient analytical systems are needed to analyze these isoforms (1). Third, because many plasma proteins are synthesized as pre-proteins that undergo various PTMs for cellular function, more efficient methods for analyzing modified proteins (e.g., glycosylated proteins) are required. For example, because the attached glycans prevent glycopeptides from ionizing completely during MS analysis, leading to inadequate spectral data and low detection sensitivity, a strategy

Protein Profiling by Two-Dimensional Electrophoresis 59 

for the removal of glycans must be considered for protein identification. 

Taken together, all these factors are important for the proteomic study of 

plasma (8). 
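The two quantitative claims above, a concentration range of more than 10 orders of magnitude and high-abundance proteins making up more than 90% of the total, can be made concrete with a little arithmetic. The concentrations below are illustrative assumptions (albumin at about 40 mg/mL and a low-abundance cytokine at about 4 pg/mL), and the enrichment estimate assumes complete and perfectly specific removal, which real immunoaffinity depletion does not achieve.

```python
import math


def orders_of_magnitude(high, low):
    """Span between two concentrations (same units) in powers of ten."""
    return math.log10(high / low)


def enrichment_after_depletion(fraction_removed):
    """Fold increase in each remaining protein per unit of total protein
    loaded, assuming the removed fraction is taken out completely and
    nothing else is lost."""
    if not 0 <= fraction_removed < 1:
        raise ValueError("fraction_removed must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)


# Illustrative concentrations, both in pg/mL (assumed, not measured):
albumin_pg_per_ml = 40e9      # ~40 mg/mL
cytokine_pg_per_ml = 4.0      # ~4 pg/mL

span = orders_of_magnitude(albumin_pg_per_ml, cytokine_pg_per_ml)
print(f"dynamic range: {span:.1f} orders of magnitude")        # 10.0
print(f"enrichment if 90% is removed: "
      f"{enrichment_after_depletion(0.90):.1f}x")               # 10.0x
```

Even an ideal tenfold enrichment recovers only one of the roughly ten orders of magnitude, which is why depletion is usually combined with pre-fractionation.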

Of the problems listed above, the first to address in plasma protein profiling may be the depletion or pre-fractionation of high-abundance plasma proteins (3,4,7). Without this depletion step, the identification of low-abundance proteins (including biomarkers) may not be practical. After the removal of high-abundance proteins, two-dimensional electrophoresis (2DE) may be the first analytical step chosen because it is easy to perform in the laboratory. Although 2DE has several limitations in terms of reproducibility, separation of membrane or low-molecular-weight proteins, and proteins with extreme pIs (10), when coupled with MS it has been widely used as a first analysis of proteins in a particular physiological state (9). Recently, quantitative 2DE has been performed with the difference gel electrophoresis (DIGE) system (see Chapter by Friedman and Lilley for details), in which two or three spectrally distinct fluorescent dyes label different protein populations so that their quantitative changes in expression under a specific physiological condition can be determined (10). This chapter provides the reader with the information needed for a systematic analysis of the plasma proteome using 2DE in the search for disease biomarkers in the plasma of patients with hepatocellular carcinoma (HCC) (11,12).

2. Materials 

2.1. Preparation of Human Plasma Samples 

1. Blood collection tubes: BD Plus Plastic K2EDTA (BD, 367525; 10 mL), BD Glass Serum with silica clot activator (BD, 367820; 10 mL).

2. Protease inhibitor (Complete Protease Inhibitor Cocktail, Roche, 11 697 498 001, 

20 tablets): One tablet contains protease inhibitors (antipain, bestatin, chymostatin, 

leupeptin, pepstatin, aprotinin, phosphoramidon, and EDTA) sufficient for the 

processing of 100 mL plasma samples. Prepare 25× stock solutions in 2 mL 

distilled water. 

2.2. Depletion of High-Abundance Proteins with an Immunoaffinity 

Column 

1. HPLC system, such as the HP1100 LC system (Agilent). 

2. Multiple affinity removal system (MARS): LC column (Agilent, 5185-5984); 

Buffer A for sample loading, washing, and equilibrating (Agilent, 5185-5987); 

Buffer B for eluting (Agilent, 5185-5988).


2.3. Isoelectric Focusing (IEF) with Immobilized pH Gradient (IPG) 

Strip 

1. MultiPhor™ (GE Healthcare) or Protean IEF Cell (Bio-Rad); numerous other commercial isoelectric focusing units are also available

2. Re-swelling tray 

3. Mineral oil: Immobiline Dry Strip Cover Fluid (GE Healthcare) 

4. Power supply, such as the EPS 3501 XL power supply (GE Healthcare) 

5. Thermostatic circulator: Multitemp III thermostatic circulator (GE Healthcare) 

6. IPG strip: Immobiline Dry Strip, pH 3–10 nonlinear (NL), pH 4.0–5.0, or pH 5.5–6.7, 18 cm long, 0.5 mm thick (GE Healthcare), or ReadyStrip IPG strips (Bio-Rad) with the same pH ranges

7. Carrier ampholyte mixtures: IPG buffer or Pharmalyte, same range as the selected 

IPG strip 

8. Sample buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 0.5% (v/v) ampholyte, 

100 mM DTT, 40 mM Tris-HCl, pH 7.5, a trace amount of bromophenol blue 

(BPB) 

2.4. Microscale Solution Isoelectric Focusing: ZOOM®

1. ZOOM® IEF Fractionator (Invitrogen, ZF10001)

2. ZOOM® disks: pH 3.0, 4.6, 5.4, 6.2, 7.0, and 10.0 [Invitrogen, ZD series (e.g., ZD10030 for pH 3.0)]

3. IEF Anode Buffer (50X) (Novex, LC5300, 100 mL)

4. IEF Cathode Buffer (10X) (Novex, LC5310, 125 mL)

5. Anode buffer: 8.4 g urea, 3.0 g thiourea, 3.3 mL Novex® IEF Anode Buffer (50X). Add water to a final volume of 20 mL.

6. Cathode buffer: 8.4 g urea, 3.0 g thiourea, 3.3 mL Novex® IEF Cathode Buffer (50X). Add water to a final volume of 20 mL.

2.5. Fractionation of Plasma Samples by Free Flow Electrophoresis 

(FFE) 

1. ProTeam™ FFE instrument (Tecan)

2. 1% 2-(4-sulfophenylazo)-1,8-dihydroxy-3,6-naphthalenedisulfonic acid (SPADNS) (Tecan, 517074)

3. 0.8% hydroxypropyl methylcellulose (HPMC) (Tecan, 5170709)

4. pI markers: mixture of pI markers that indicate pH 4.2, 5.1, 6.3, 7.4, 8.7, and 10.1 (Tecan, 5170705)

5. Prolyte™ 1, Prolyte™ 2, and Prolyte™ 3 (Tecan, 0309081, 0309102, and 0309093)

6. Anodic stabilization medium (Inlet I1): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 100 mM H2SO4

7. Separation medium 1 (Inlet I2): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 1

8. Separation medium 2 (Inlets I3–I5): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 2

9. Separation medium 3 (Inlet I6): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 3

10. Cathodic stabilization medium (Inlet I7): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 100 mM NaOH

11. Counter-flow medium (Inlet I8): 14.5% (w/w) glycerol, 8 M urea

12. Anodic circuit electrolyte: 100 mM H2SO4

13. Cathodic circuit electrolyte: 100 mM NaOH

2.6. Preparation of 2D Gels 

1. Gradient former: either of two Bio-Rad models can be used: Model 385 (30–100 mL capacity) or Model 395 (100–750 mL capacity).

2. Orbital shaker with speed controller. 

3. SDS-PAGE: Protean II xi multicell and multicasting chamber (Bio-Rad) or Ettan 

DALT twelve large vertical system (GE Healthcare). 

4. 5× Tris-HCl buffer: Dissolve 227 g Tris in 800 mL distilled water and adjust the pH to 8.8 with HCl (∼30 mL). Add distilled water to a final volume of 1 L.

5. 5× Gel buffer: Dissolve 15 g Tris, 72 g glycine, and 5 g sodium dodecyl sulfate (SDS) in 800 mL distilled water and add distilled water to a final volume of 1 L.

6. SDS equilibration buffer: 6 M urea, 2% (w/v) SDS, 5× gel buffer (pH 8.8), 50% (v/v) glycerol, and 2.5% (w/v) acrylamide monomer.

7. Acrylamide stock solution: acrylamide/bis-acrylamide 37.5:1, 40% (w/v) solution (Amresco, M157, 500 mL).

8. Fixing solution: 40% (v/v) methanol and 5% (v/v) phosphoric acid in distilled 

water. 

9. Coomassie blue G-250 staining solution: 17% (w/v) ammonium sulfate, 3% (v/v) 

phosphoric acid, 34% (v/v) methanol, and 0.1% (w/v) Coomassie blue G-250 in 

distilled water. 
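As a quick check on items 4 and 5, the recipes can be converted to molar concentrations. The sketch assumes molar masses of 121.14 g/mol for Tris base and 75.07 g/mol for glycine and ignores the small volume contributed by the HCl; on those assumptions the 1× values come out at about 0.375 M Tris for the Tris-HCl buffer, and about 25 mM Tris, 192 mM glycine, and 0.1% SDS for the gel buffer, which is the standard Tris-glycine-SDS composition.

```python
MW_TRIS = 121.14     # g/mol, Tris base
MW_GLYCINE = 75.07   # g/mol
STOCK_FOLD = 5       # both stocks are made at 5x


def molarity(grams, molar_mass, litres=1.0):
    """Molar concentration of `grams` of solute made up to `litres`."""
    return grams / molar_mass / litres


# 5x stock concentrations from the recipes (all made up to 1 L)
components = {
    "Tris-HCl buffer, Tris (M)": molarity(227, MW_TRIS),
    "gel buffer, Tris (M)": molarity(15, MW_TRIS),
    "gel buffer, glycine (M)": molarity(72, MW_GLYCINE),
    "gel buffer, SDS (% w/v)": 5 / 1000 * 100,   # 5 g per 1000 mL
}

# Working (1x) concentrations after a fivefold dilution
working = {name: c / STOCK_FOLD for name, c in components.items()}

for name, c in components.items():
    print(f"{name}: 5x = {c:.3f}, 1x = {working[name]:.3f}")
```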

2.7. 2D Gel Image Analysis 

1. Scanner with transparency unit, such as Bio-Rad GS710 or GS800 

2. 2D gel image analysis program: Image Master Platinum 5 (GE Healthcare), 

PDQuest 7.3.0 (Bio-Rad), or Progenesis Discovery (NonLinear Dynamics, Ltd.) 

2.8. Destaining, In-gel Deglycosylation, and In-gel Tryptic Digestion 

1. Speed Vac (Heto) 

2. PNGase F stock solution for in-gel deglycosylation: PNGase F (Glyko, Inc., GKE-5010). Dilute 1 μL PNGase F (2 mU) with 2.5 mL of 1× N-glycanase incubation buffer (20 mM sodium phosphate, pH 7.5, and 0.02% (w/v) sodium azide)


3. Sequencing-grade modified trypsin (Promega, V5111, 100 μg, 18,100 U/mg) 

4. 50 mM ammonium bicarbonate 

2.9. Desalting of Peptides and MALDI Plating 

1. GELoader tips (Eppendorf, No. 0030 048.083, 20 μL capacity) 

2. Poros 10 R2 resin (PerSeptive Biosystems, 1-1118-02, 0.8 g) 

3. Oligo R3 resins (PerSeptive Biosystems, 1-1339-03, 6.3 g) 

4. 2% (v/v) formic acid in 70% (v/v) acetonitrile (ACN) 

5. 0.1% (v/v) trifluoroacetic acid in 70% (v/v) ACN 

6. 1-mL syringe 

7. Matrix: α-cyano-4-hydroxycinnamic acid (CHCA)

8. Opti-TOF™ 384-well insert (123 × 81 mm, 1016491, Applied Biosystems)

2.10. MALDI-TOF and Peptide Mass Fingerprinting

1. MALDI-TOF and MALDI-TOF/TOF: Voyager DE-Pro and 4800 MALDI TOF/TOF™ Analyzer (Applied Biosystems) equipped with a 355-nm Nd:YAG laser. The pressure in the TOF analyzer is approximately 7.6 × 10⁻⁷ Torr.

3. Methods 

3.1. Human Plasma Sample Preparation 

The following protocol is conducted according to the HUPO reference 

sample collection protocol (13). 

1. Each sample pool consisted of 400 mL blood from one healthy, fasting male and one healthy, fasting postmenopausal female, and was collected into 10-mL tubes by two venipunctures, 20 tubes per venipuncture (see Note 1).

2. Equal numbers of tubes and aliquots were generated with appropriate concentrations of K2-EDTA, lithium heparin, or sodium citrate for plasma, or were permitted to clot at room temperature for 30 min to yield serum (with micronized silica as the clot activator) (see Note 2).

3. The specimens were centrifuged for 10–15 min under refrigerated conditions at 

2–6°C. 

4. The resultant serum and plasma from 10 spun tubes of the same type from each 

donor were pooled into one secondary 50-mL conical bottom BD TM Falcon tube 

for each tube type. 

5. The secondary tube was centrifuged at 2400×g for 15 min to remove residual 

cellular material from serum and to prepare platelet-poor plasma from the EDTA, 

heparin, and citrate secondary tubes. 

6. Equal volumes of either serum or plasma were pooled from each secondary tube 

into media bottles (see Note 3). 

7. Serum/plasma was mixed gently and kept on ice while distributed as 20-μL 

aliquots into cryovials and was then frozen and stored at –70°C.


3.2. Depletion of High-abundance Proteins with an Immunoaffinity 

Column 

Many reports have shown that commercially available immunoaffinity columns coupled with an HPLC system, such as the MARS (Agilent) (2,3) or the prepacked 2-mL Seppro™ MIXED12 affinity LC column (GenWay Biotech) (14), are convenient for the efficient depletion of high-abundance proteins prior to molecular analysis. To deplete the six most abundant proteins (i.e., albumin, transferrin, IgG, IgA, haptoglobin, and α1-antitrypsin) from either serum or plasma, we used MARS, which has been used successfully with a wide variety of sample types, including cerebrospinal fluid (CSF) and follicular fluid (2,3) (see Fig. 1).

1. Dilute human serum or plasma fivefold with Buffer A (for example: 20 μL 

human plasma with 80 μL Buffer A) containing the protease inhibitor stock 

solution (40 μL per 1 mL plasma) (see Note 4) (adapted from the manufacturer's instructions).

2. Remove the particulates with a 0.22-μm spin filter for 1 min at 16,000×g. 

3. Inject 75-100 μL of the diluted serum or plasma at a flow rate of 0.5 mL/min. 

Fig. 1. 2DE images of human plasma proteins before and after depletion of the six most abundant proteins with MARS. Proteins were isoelectrically focused on pH 3–10 NL IPG strips in the first dimension and then resolved by 9–16% SDS-PAGE in the second dimension. (A) Whole plasma. (B) Flow-through from MARS. Approximately 800 protein spots are displayed by 2DE and identified by MALDI-TOF mass spectrometry. The names of the major proteins in each gel are marked on the image (5) (from (4), with permission).


4. Collect the flow-through fractions that elute between 1.5 and 4.5 min, and store them at –20°C if they are not analyzed immediately.

5. Elute bound proteins from the column with Buffer B (elution buffer) at a flow 

rate of 1 mL/min for 3.5 min. 

6. Regenerate the column by equilibrating with Buffer A for an additional 7.4 min 

at a flow rate of 1 mL/min. 
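The volumes in steps 1–4 follow from a few ratios: a fivefold dilution in Buffer A, protease-inhibitor stock at 40 μL per mL of plasma (the 25× stock of Subheading 2.1), and injections of 75–100 μL. The sketch below turns these into a pipetting plan. The function names are hypothetical, the inhibitor is assumed to be counted within the Buffer A volume (as in the 20 μL + 80 μL example), and the flow-through volume assumes the 0.5 mL/min loading flow rate still applies during the 1.5–4.5 min collection window.

```python
import math

DILUTION_FACTOR = 5                 # fivefold dilution in Buffer A
INHIBITOR_UL_PER_ML_PLASMA = 40.0   # 25x stock -> 40 uL per 1 mL plasma


def mars_dilution(plasma_ul):
    """Return volumes (uL) of plasma, inhibitor stock, and Buffer A."""
    total_ul = plasma_ul * DILUTION_FACTOR
    inhibitor_ul = plasma_ul * INHIBITOR_UL_PER_ML_PLASMA / 1000.0
    buffer_a_ul = total_ul - plasma_ul - inhibitor_ul
    return {"plasma": plasma_ul, "inhibitor stock": inhibitor_ul,
            "Buffer A": buffer_a_ul, "total": total_ul}


def injections_needed(total_ul, per_injection_ul=100.0):
    """Number of runs needed when injecting at most per_injection_ul."""
    return math.ceil(total_ul / per_injection_ul)


def flow_through_ml(start_min=1.5, end_min=4.5, flow_ml_per_min=0.5):
    """Approximate volume collected in the flow-through window."""
    return (end_min - start_min) * flow_ml_per_min


plan = mars_dilution(20)
for name, vol in plan.items():
    print(f"{name}: {vol:.1f} uL")
print("injections:", injections_needed(plan["total"]))
print(f"flow-through per run: {flow_through_ml():.2f} mL")
```

For the 20-μL example this gives 0.8 μL of inhibitor stock and 79.2 μL of Buffer A, one injection, and roughly 1.5 mL of flow-through per run.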

3.3. TCA/Acetone Precipitation 

Before 2DE, interfering compounds, such as proteolytic enzymes, salts, lipids, nucleic acids, and any high-abundance proteins remaining after depletion, must be removed or inactivated. For plasma samples, the two most important parameters are salt and proteolysis. TCA/acetone precipitation is the most useful method for desalting whole plasma and the flow-through fractions of MARS.

1. Add 50% (w/v) trichloroacetic acid (TCA, Sigma, T9159) to reach a final TCA 

concentration of 5-8%. Mix gently by inverting the tube 5 to 6 times and incubate 

on ice for 2 h. 

2. Centrifuge the sample at 14,000×g for 15 min and discard the supernatant. 

3. Add 200 μL cold acetone and resuspend the protein pellet with a pipette. 

4. Incubate on ice for 15 min and centrifuge the sample at 14,000×g for 20 min, 

discard the acetone, and dry the pellet in air (see Note 5). 

5. Dissolve the pellet in the sample buffer for 2DE and quantify the protein concentration 

by the Bradford protein assay. 
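Step 1 asks for a final TCA concentration of 5–8% made from a 50% (w/v) stock. Mass balance gives the volume to add: if V_s is the sample volume, c the target concentration, and C the stock concentration, then C·V_add = c·(V_s + V_add), so V_add = V_s·c/(C − c). A minimal sketch, treating the volumes as additive (an approximation) and using a hypothetical function name:

```python
def tca_volume_to_add(sample_ul, final_pct=6.0, stock_pct=50.0):
    """Volume of TCA stock (uL) that brings sample_ul to final_pct TCA.

    Solves stock_pct * v = final_pct * (sample_ul + v) for v.
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final_pct must be between 0 and stock_pct")
    return sample_ul * final_pct / (stock_pct - final_pct)


# e.g. a 1.5-mL MARS flow-through fraction brought to 6% TCA
v = tca_volume_to_add(1500, final_pct=6.0)
print(f"add {v:.0f} uL of 50% TCA")   # add 205 uL of 50% TCA
```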

3.4. Rehydration of the IPG Gel Strip 

For analytical purposes, typically 0.3–1.0 mg protein can be loaded onto an 18-cm-long IPG strip with a wide pH range (e.g., pH 3–10), or 0.5–2.0 mg onto an IPG strip with a narrow pH range (e.g., pH 5.5–6.7). A narrow-range IPG usually gives higher resolution, particularly when proteins are separated by sequential IEF: first fractionate the proteins over several pI ranges in solution with ZOOM® disks or FFE (see Subheadings 3.6 and 3.7), and then perform IEF with IPG strips [strips spanning one pH unit are also available (e.g., pH 3.0–4.0 or pH 3.5–4.5, up to pH 6.7)]. Certain proteins appear to be trapped in the disk membranes, so partitioning and sample loss should be considered.

1. Dilute 1.0 mg protein with the sample buffer to a final volume of 400 μL for 

18-cm-long IPG strips (see Note 6). 

2. Transfer the entire protein-containing sample buffer into the re-swelling tray. 

3. Peel off the protective cover from the IPG strip and slowly slide the IPG strip (gel 

side down) onto the sample solution. Avoid trapping air bubbles and distribute 

the sample solution evenly under the strips.


4. Overlay the strip with mineral oil and leave for 12-16 h at room temperature (see 

Note 7 for cup loading) 
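Step 1 fixes the protein load (1.0 mg) and the final volume (400 μL), so the volume of protein solution to take depends on the concentration measured by the Bradford assay in Subheading 3.3. The sketch below also checks the load against the ranges given above (0.3–1.0 mg for a wide-range and 0.5–2.0 mg for a narrow-range 18-cm strip); the names and the error handling are illustrative assumptions.

```python
# Load ranges (mg) for 18-cm strips, from the text of Subheading 3.4.
LOAD_RANGE_MG = {"wide": (0.3, 1.0), "narrow": (0.5, 2.0)}
REHYDRATION_VOLUME_UL = 400.0


def rehydration_mix(conc_ug_per_ul, load_mg=1.0, strip="wide"):
    """Return (protein solution uL, extra sample buffer uL) for one strip."""
    low, high = LOAD_RANGE_MG[strip]
    if not low <= load_mg <= high:
        raise ValueError(f"{load_mg} mg is outside {low}-{high} mg "
                         f"for a {strip}-range strip")
    protein_ul = load_mg * 1000.0 / conc_ug_per_ul
    if protein_ul > REHYDRATION_VOLUME_UL:
        raise ValueError("sample too dilute: concentrate it or lower the load")
    return protein_ul, REHYDRATION_VOLUME_UL - protein_ul


protein_ul, buffer_ul = rehydration_mix(5.0)   # 5 ug/uL by Bradford
print(f"{protein_ul:.0f} uL sample + {buffer_ul:.0f} uL sample buffer")
```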

3.5. IEF with IPG Strip 

1. Remove the rehydrated IPG strips that are carrying the protein samples and place 

them (gel side up) on the strip tray. 

2. Place the 2.5-cm filter papers, wetted with distilled water, on both sides of the 

strips at both cathodic and anodic ends. Place the strip tray on the IEF unit. 

3. Cover the strips entirely with mineral oil. 

4. Program the instrument (e.g., Multiphor II) to increase the voltage stepwise from 100 to 3500 V until a total of 80,000 volt-hours (Vh) is reached (e.g., sequentially, 300 Vh at 100 V, 600 Vh at 300 V, 600 Vh at 600 V, 1000 Vh at 1000 V, and 2000 Vh at 2000 V, followed by 3500 V until the total reaches 80,000 Vh) (see Notes 8 and 9).

5. During IEF, the temperature is set to 20°C with a water circulator. 
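Step 4 can be read as a step-and-hold program in which each voltage is held until its volt-hour quota is reached, with the final 3500-V step supplying whatever remains of the 80,000-Vh total. Under that reading (an assumption: the run may also be programmed as gradients, and current limits can keep the voltage below the set point, which lengthens the run), the nominal duration of each step is simply Vh/V:

```python
# (voltage in V, volt-hours) for the preset steps in step 4
PRESET_STEPS = [(100, 300), (300, 600), (600, 600), (1000, 1000), (2000, 2000)]


def ief_schedule(steps, final_voltage, total_vh):
    """Append a final step that makes up the remaining Vh; return rows of
    (voltage, volt-hours, nominal hours)."""
    used_vh = sum(vh for _, vh in steps)
    if used_vh >= total_vh:
        raise ValueError("preset steps already reach the total")
    all_steps = list(steps) + [(final_voltage, total_vh - used_vh)]
    return [(v, vh, vh / v) for v, vh in all_steps]


rows = ief_schedule(PRESET_STEPS, final_voltage=3500, total_vh=80_000)
for v, vh, h in rows:
    print(f"{v:>5} V  {vh:>6} Vh  {h:5.2f} h")
print(f"total: {sum(r[1] for r in rows)} Vh "
      f"in about {sum(r[2] for r in rows):.1f} h")
```

On this reading the final step contributes 75,500 Vh (about 21.6 h at 3500 V), and the whole run takes roughly 30 h.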

3.6. Microscale Solution IEF: ZOOM ® 

To reduce artifacts typical of narrow-range IPG strips (e.g., streaking, distortion, and loss of protein spots), one may use microscale solution IEF (MicroSol-IEF; e.g., ZOOM®, Invitrogen) prior to running 2D gels (3) (see Fig. 2). MicroSol-IEF is a preparative solution-phase IEF apparatus whose chambers are separated by membrane disks of defined pH (15,16). With MicroSol-IEF, 2.5–3.0 mg plasma protein can be loaded and efficiently fractionated by pI into five separate chambers.

1. Add 2 μL of 99% N,N-dimethylacrylamide (DMA) to the 400-μL sample (see Subheading 3.4, Step 2) for alkylation and incubate the sample on a rotary shaker for 30 min at room temperature (adapted from the manufacturer's instructions).

2. Add 4 μL of 2 M DTT to quench any excess DMA. Centrifuge at 16,000×g for 20 min at 4°C.

3. Preparation of protein samples: Dilute 3 mg protein to a volume of 3250 μL with sample buffer. Each chamber of the ZOOM® IEF Fractionator receives 650 μL of diluted sample.

4. Assemble the ZOOM® IEF Fractionator according to the manufacturer's instructions. Six disks (pH 3.0, 4.6, 5.4, 6.2, 7.0, and 10.0) are used to create five fractions spanning pH 3.0–10.0.

5. Add each buffer (anode or cathode) to the corresponding blank chamber. 

6. Remove the sample chamber cap and add 650 μL of protein sample (step 3) to 

each chamber. 

7. Fractionation can be carried out under the following conditions: 100 V for 20 min, 

200 V for 80 min, and 600 V for 80 min (see Note 10). The starting current is 

approximately 0.6 mA, which increases to approximately 1.2 mA at the beginning 

of the 200-V step, and the ending current is approximately 0.2 mA. 

8. Load the electro-focused samples to the narrow pH IPG strips for 2DE.
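The chamber layout in steps 3–6 follows from the disk set: six disks bound five chambers, and 650 μL per chamber gives the 3250-μL sample volume. A sketch that derives the fraction boundaries and the nominal protein load per chamber before focusing, assuming the sample is split evenly across the chambers as in step 6 (names hypothetical):

```python
DISK_PH = [3.0, 4.6, 5.4, 6.2, 7.0, 10.0]   # disks listed in Subheading 2.4
CHAMBER_UL = 650.0                          # sample per chamber (step 3)


def zoom_plan(protein_mg, disk_ph=DISK_PH, chamber_ul=CHAMBER_UL):
    """Return (pH fractions, total sample uL, protein mg per chamber)."""
    boundaries = sorted(disk_ph)
    # Each chamber lies between two adjacent disks
    fractions = list(zip(boundaries[:-1], boundaries[1:]))
    total_ul = len(fractions) * chamber_ul
    return fractions, total_ul, protein_mg / len(fractions)


fractions, total_ul, mg_per_chamber = zoom_plan(3.0)
for low, high in fractions:
    print(f"pH {low}-{high}")
print(f"dilute to {total_ul:.0f} uL; {mg_per_chamber:.1f} mg per chamber")
```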


Fig. 2. Narrow-pH-range 2DE images of plasma proteins after depletion of the six most abundant proteins with MARS. After microscale solution IEF (ZOOM®), the pH 5.5–6.2 fraction was separated on pH 5.5–6.7 IPG strips by a second isoelectric focusing and then resolved on a 9–16% gel. (A) Whole 2DE images at pH 3–10 NL and pH 5.5–6.7. (B) A single spot on the pH 3–10 NL gel can be resolved into two or more spots in narrow-pH-range 2DE. (C) Many spots hidden on the pH 3–10 NL gel appear in narrow-pH-range 2DE of normal and HCC plasma.


3.7. Fractionation of the Plasma Samples by Free Flow Electrophoresis 

To identify and isolate biomarker candidates from the plasma of patients with HCC using 2DE, high resolution is critical, and it can be achieved by narrow-pH-range IEF. However, for narrow-pH-range IEF, larger amounts of protein (e.g., 10-fold or more) must be loaded onto the IPG strip, because proteins outside the strip's pH range are discarded. Prefractionation or depletion is therefore required before both IEF and 2D gel electrophoresis. FFE is useful for prefractionating plasma samples because it yields specific fractions of interest (e.g., by pI or density). For example, if the pI of certain proteins is known, fractionation by FFE can be useful for prefractionating complex plasma. Here we describe one of several procedures for prefractionating plasma samples using FFE.

1. Dissolve the TCA-precipitated flow-through fractions of MARS (∼2.0 mg) in 500 μL of separation medium 3 (see below) (adapted from the manufacturer's instructions).

2. Add traces of red acidic dye 2-(4-sulfophenylazo)-1,8-dihydroxy-3,6- 

naphthalenedisulfonic acid (SPADNS, Aldrich) to ease the optical control of the 

migration of sample within the separation chamber. 

3. Carry out FFE at 10°C, applying the following media at the indicated inlets: anodic stabilization medium (inlet I1), separation medium 1 (inlet I2), separation medium 2 (inlets I3–I5), separation medium 3 (inlet I6), cathodic stabilization medium (inlet I7), and counter-flow medium (inlet I8).

4. Apply the anodic circuit electrolyte to the anode and the cathodic circuit electrolyte to the cathode.

5. Assemble the ProTeam™ FFE instrument (Tecan). Use a 0.4-mm spacer for the separation chamber, a flow rate of approximately 60 mL/h (inlets I1–I7), and a voltage of 1500 V, which results in a current of 20–24 mA.

6. Infuse the sample into the separation chamber through the cathodal inlet at approximately 0.7 mL/h (4,17). The residence time in the separation chamber is approximately 33 min.

7. Collect the fractions in polypropylene 96-deep-well plates; the fractions are numbered 1 (anode) through 44 (cathode) (4).

8. Remove glycerol and HPMC by TCA/acetone precipitation and dissolve the proteins in sample buffer.

9. Load the focused fractions onto narrow pH range IPG strips for 2DE.
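The FFE settings above also fix the heat load and the length of a run. The short sketch below works this out, assuming the whole 500-μL sample from step 1 is infused at a constant 0.7 mL/h and ignoring tubing dead volume; the function names are hypothetical helpers for illustration, not part of the ProTeam software.

```python
"""Back-of-the-envelope FFE run parameters (illustrative sketch, not instrument software)."""


def electrical_power_w(voltage_v: float, current_ma: float) -> float:
    """Joule heat dissipated in the separation chamber, P = V * I, in watts."""
    return voltage_v * current_ma / 1000.0


def infusion_time_min(sample_ml: float, sample_flow_ml_per_h: float) -> float:
    """Minutes needed to infuse the whole sample at a constant inlet flow rate."""
    return sample_ml / sample_flow_ml_per_h * 60.0


if __name__ == "__main__":
    # 1500 V at 20-24 mA (step 5): heat that the 10 degC cooling must remove.
    low, high = electrical_power_w(1500, 20), electrical_power_w(1500, 24)
    print(f"Power: {low:.0f}-{high:.0f} W")  # Power: 30-36 W

    # 500 uL sample (step 1) at 0.7 mL/h (step 6), plus ~33 min residence time,
    # assuming plug flow and no dead volume.
    infusion = infusion_time_min(0.5, 0.7)
    print(f"Infusion: {infusion:.0f} min; collection ends after ~{infusion + 33:.0f} min")
```

Under these assumptions the chamber dissipates roughly 30–36 W, and sample infusion alone takes about 43 min, so fraction collection lasts a little over an hour.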

3.8. Preparation of 2D Gels 

1. Assemble the glass plates (separated by two 1.5-mm spacers positioned along the sides) and thin plastic sheets in the multi-casting chamber (20).

2. Prepare gel solution for making 10 gels (20 × 20 cm, 1.5-mm spacer, 9–16% 

gradient): heavy solution (66.7 mL of 5× Tris-HCl buffer, 75 mL of a 40%


acrylamide stock solution, 0.7 mL of 10% ammonium persulfate (APS), 70 μL 

TEMED, and 191.7 mL of 50% glycerol), light solution (66.7 mL of 5× Tris-HCl 

buffer, 141.7 mL of a 40% acrylamide stock solution, 0.7 mL of 10% APS, 70 μL 

TEMED, and 125 mL distilled water). 

3. Assemble the gradient maker and peristaltic pump. Pour the light gel solution into the mixing chamber (the chamber with the outlet to the casting chamber) and the heavy gel solution into the reservoir chamber of the gradient maker. Start the magnetic stirrer in the mixing chamber. Run the peristaltic pump until the gel solution reaches 0.5–1.0 cm below the top of the glass plates (∼5 min). Check the flow rate, which should be 100–120 mL/min.

4. After the gel solution is poured, overlay the gel solution with distilled water to 

exclude air and to ensure a level surface on the top of the gel. 

5. Allow polymerization to occur overnight at room temperature. 
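The volumes in step 2 follow from the dilution relation C1V1 = C2V2. The sketch below checks that arithmetic: for example, 75 mL of the 40% acrylamide stock in a batch of about 334 mL gives close to 9% acrylamide. The helper names are hypothetical, and the calculation ignores any volume change on mixing.

```python
"""C1*V1 = C2*V2 helpers for acrylamide gel solutions (illustrative sketch)."""


def batch_volume_ml(components_ml: dict[str, float]) -> float:
    """Total volume of a gel solution: the sum of all component volumes."""
    return sum(components_ml.values())


def final_percent(stock_percent: float, stock_ml: float, total_ml: float) -> float:
    """Final acrylamide concentration after dilution (C2 = C1 * V1 / V2)."""
    return stock_percent * stock_ml / total_ml


def stock_volume_ml(target_percent: float, stock_percent: float, total_ml: float) -> float:
    """Stock volume needed for a target concentration (V1 = C2 * V2 / C1)."""
    return target_percent * total_ml / stock_percent


if __name__ == "__main__":
    # Component volumes (mL) of the step-2 solution that contains 75 mL of
    # 40% acrylamide stock; 70 uL TEMED = 0.07 mL.
    recipe = {
        "5x Tris-HCl": 66.7,
        "40% acrylamide": 75.0,
        "10% APS": 0.7,
        "TEMED": 0.07,
        "50% glycerol": 191.7,
    }
    total = batch_volume_ml(recipe)
    pct = final_percent(40.0, recipe["40% acrylamide"], total)
    print(f"{total:.1f} mL at {pct:.1f}% acrylamide")  # 334.2 mL at 9.0% acrylamide
    # Stock needed for a 9% solution of the same total volume:
    print(f"{stock_volume_ml(9.0, 40.0, total):.1f} mL of 40% stock")  # 75.2 mL
```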

3.9. Equilibration of the Sample and Running of the Gel 

To solubilize the electro-focused proteins and allow SDS to bind to them, the IPG strips must be soaked in SDS equilibration buffer. This step is analogous to boiling the sample in SDS buffer before SDS-PAGE. The reducing agents dithiothreitol (DTT) and tributylphosphine (TBP) reduce disulfide bonds to free sulfhydryl groups (of cysteine residues), and alkylating agents such as iodoacetamide (IAA) prevent reoxidation of the free sulfhydryl groups (21).

1. Prior to use, add approximately 158 μL of TBP dissolved in 1 mL of isopropanol to 100 mL of SDS equilibration buffer, and sonicate in a bath-type sonicator until the solution becomes clear (see Note 11). This solution is termed TBP equilibration buffer.

2. Add 15 mL of TBP equilibration buffer to each strip (gel side up) and shake gently on an orbital shaker for 25 min (TBP equilibration; see Note 12).

3. Briefly rinse each IPG strip with 1× gel buffer, place it on top of the second-dimension gel, and overlay it with the agarose embedding solution (molten agarose containing a trace of BPB) (see Note 13).

4. Perform SDS-PAGE (40 mA/gel) until the BPB dye reaches the bottom of the 

gel. Keep the temperature at 10°C. The total run time for 20 × 20 cm gels is 

approximately 6 h. 
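The reagent additions in step 1 and in Note 12 are given as volumes or masses. Converting them to molar concentrations makes it easier to compare the TBP and DTT/IAA procedures. The sketch below does this using standard handbook molecular weights, which the protocol does not state. It also assumes a density of 0.81 g/mL for neat TBP; check this value against the reagent datasheet.

```python
"""Convert the reagent additions in Subheading 3.9 and Note 12 to millimolar (sketch)."""

# Handbook molecular weights (g/mol); these are not given in the protocol.
MW = {"TBP": 202.32, "DTT": 154.25, "IAA": 184.96}
TBP_DENSITY_G_PER_ML = 0.81  # assumed density of neat tributylphosphine


def millimolar(mass_g: float, mw_g_per_mol: float, final_volume_ml: float) -> float:
    """Concentration (mM) of mass_g of solute in final_volume_ml of solution."""
    moles = mass_g / mw_g_per_mol
    return moles / (final_volume_ml / 1000.0) * 1000.0


if __name__ == "__main__":
    # Step 1: ~158 uL neat TBP in 1 mL isopropanol, added to 100 mL buffer (~101 mL).
    tbp_g = 0.158 * TBP_DENSITY_G_PER_ML
    print(f"TBP: {millimolar(tbp_g, MW['TBP'], 101.0):.1f} mM")  # TBP: 6.3 mM
    # Note 12: 1 g DTT or 1.25 g IAA per 50-mL aliquot.
    print(f"DTT: {millimolar(1.0, MW['DTT'], 50.0):.0f} mM")     # DTT: 130 mM
    print(f"IAA: {millimolar(1.25, MW['IAA'], 50.0):.0f} mM")    # IAA: 135 mM
```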

3.10. Coomassie Brilliant Blue G-250 Staining 

1. Fix the separated proteins in the gel with 200 mL of fixing solution for 1 h.

2. Decant the fixing solution and stain the gel in Coomassie brilliant blue G-250 

overnight. 

3. Decant the staining solution. 

4. Wash the gel in distilled water at least three times over a period of more than 4 h.

5. Scan the gel, then wrap the gel in plastic, and store it at 4°C.


3.11. 2D Gel Image Analysis 

1. Import the gel image (recommended 12–16 bit, tiff format) and convert it into an 

ImageMaster file (*.mel). 

2. Detect the protein spots and determine the volume and percentage volume of each spot. The percentage volume (the volume of a spot divided by the total volume of all spots on the gel, × 100) is a normalized value that is relatively insensitive to irrelevant gel-to-gel variation, particularly variation caused by differing experimental conditions.

3. Select the differentially displayed protein spots (see Fig. 3). 
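The percentage-volume normalization in step 2 is what makes spots comparable across gels. The sketch below shows this with hypothetical spot IDs and volumes. It also uses a hypothetical 2-fold ratio cutoff for step 3, which is not a value from this protocol. ImageMaster computes %Vol itself, so the code only illustrates the arithmetic.

```python
"""Percentage-volume normalization and a fold-change filter (illustrative sketch)."""


def percent_volume(spot_volumes: dict[str, float]) -> dict[str, float]:
    """%Vol of each spot = spot volume / total volume of all spots on the gel * 100."""
    total = sum(spot_volumes.values())
    return {spot: 100.0 * vol / total for spot, vol in spot_volumes.items()}


def differential_spots(control: dict[str, float], disease: dict[str, float],
                       min_fold: float = 2.0) -> list[str]:
    """Matched spots whose %Vol ratio (disease/control) is >= min_fold or <= 1/min_fold.

    min_fold = 2.0 is a hypothetical cutoff, not a value from this protocol.
    """
    pc, pd = percent_volume(control), percent_volume(disease)
    hits = []
    for spot in sorted(pc.keys() & pd.keys()):  # compare spots matched on both gels
        ratio = pd[spot] / pc[spot]
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits.append(spot)
    return hits


if __name__ == "__main__":
    # Hypothetical raw spot volumes; the HCC gel carries roughly twice the protein.
    normal = {"s1": 100.0, "s2": 300.0, "s3": 600.0}
    hcc = {"s1": 400.0, "s2": 600.0, "s3": 1000.0}
    print(percent_volume(normal))           # {'s1': 10.0, 's2': 30.0, 's3': 60.0}
    print(differential_spots(normal, hcc))  # ['s1']
```

In this example the raw volume of s2 doubles only because the second gel carries more protein overall. Its %Vol stays at 30%, so only s1 is flagged.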

3.12. Destaining, In-gel Deglycosylation, and In-gel Tryptic Digestion 

Most plasma proteins, including clotting factors, lipoproteins, and antibodies, are glycosylated (22,23), and these carbohydrate-containing proteins play major roles in the normal biological functions of plasma. Because the attached glycans hinder complete ionization of glycopeptides during MS analysis, which results in poor spectral data and low detection sensitivity, the glycans must be removed for reliable protein identification.

1. Pick (or excise) the protein spot with an end-cut yellow tip and transfer the gel 

piece into a 1.5-mL Eppendorf tube. 

2. Wash the gel piece with 100 μL distilled water. 

3. Add 50 μL of 50 mM NH4HCO3 (pH 7.8)/ACN (6:4) and shake for 10 min.

4. Repeat step 3 until the Coomassie Brilliant Blue G-250 dye disappears (two to five times).

5. Decant the supernatant and dry the gel piece in a Speed Vac for 10 min (see 

Note 14). 

6. Add 5 μL of trypsin (12.5 ng/μL in 50 mM NH4HCO3) and leave the gel piece on ice for 45 min.

7. Add 10 μL of 50 mM NH4HCO3 to the gel piece.

8. Incubate the gel piece at 37°C for 12 h. 

3.13. Desalting of Peptides and MALDI Plating 

1. Resin packing: Twist the column body (GELoader tip, Eppendorf) near the end of the tip to constrict it, then push the resin slurry [Poros R2:Oligo R3 (2:1) in 70% (v/v) ACN; a 1:1 ratio is sometimes more efficient] into the tip with a 1-mL syringe. A packed resin length of 2–3 mm is suitable (18,19).

2. Equilibration of the column: Add 20 μL of 2% (v/v) formic acid and push the 

solution through the column with the 1-mL syringe. 

3. Peptide binding: Add the peptide solution (the supernatant from step 8 of Subheading 3.12, approximately 10–12 μL) and push it through the column with the syringe.

4. Washing: Add 20 μL of 2% (v/v) formic acid and push this solution through the 

column with the syringe.


Fig. 3. Detection of PTMs on the 2DE of plasma proteins. (A) 2DE images of plasma proteins depleted of the six most abundant proteins with MARS, untreated (left) and alkaline phosphatase (AP)-treated (right). (B) One of the proteins differentially displayed after AP treatment. (C) Data-dependent neutral loss scan spectrum of the sequence KEPCVESLVSpQYFQTVTDYGKD, corresponding to the phosphorylated apolipoprotein A-II precursor.


5. MALDI spotting: Add 1 μL of matrix solution [10 mg/mL CHCA in 70% (v/v) ACN and 2% (v/v) formic acid] to the column, and elute the peptides together with the matrix directly onto the MALDI plate (Opti-TOF™ 384-well Insert, Applied Biosystems).

6. Reuse the column: Add 20 μL of 100% ACN and push this solution through the 

column with the syringe and repeat step 2 for equilibration of the column. 

3.14. MALDI-TOF and Peptide Mass Fingerprinting 

1. Acquire peptide mass fingerprinting (PMF) spectra on a Voyager DE-PRO or 4800 MALDI-TOF/TOF mass spectrometer (Applied Biosystems).

2. Obtain the mass spectra in reflectron/delayed extraction mode with an accelerating voltage of 20 kV, summing data from either 500 laser pulses (4800 MALDI-TOF/TOF) or 100 laser pulses (Voyager DE-PRO).

3. Calibrate the spectra using the trypsin autolysis peaks (m/z 842.5090 and 2211.1046), and obtain monoisotopic peptide masses with Data Explorer 3.5 (PerSeptive Biosystems).

4. Search the Swiss-Prot and NCBInr databases with the Matrix Science search 

engine (http://www.matrixscience.com). 
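PMF compares observed peptide masses with theoretical tryptic peptide masses. The two calibrant m/z values in step 3 agree to within 0.001 with the theoretical [M+H]+ of the porcine trypsin autolysis peptides VATVSLPR and LGEHNIDVLEGNEQFINAAK. That peptide assignment is our assumption, not the protocol's. The sketch below computes such masses and applies a simple two-point linear recalibration. The linear correction is a simplification; the instrument software fits its own time-of-flight calibration.

```python
"""Theoretical tryptic [M+H]+ masses and a two-point recalibration for PMF (sketch)."""
from typing import Callable

# Monoisotopic residue masses (Da). Cysteines are left unmodified here; add
# 57.02146 Da per Cys if they were carbamidomethylated with iodoacetamide.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once to the residue sum for the intact peptide
PROTON = 1.00728   # one proton for the singly charged MALDI ion


def tryptic_peptides(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def two_point_calibration(obs1: float, ref1: float,
                          obs2: float, ref2: float) -> Callable[[float], float]:
    """Straight line mapping observed m/z onto reference m/z through two calibrants."""
    slope = (ref2 - ref1) / (obs2 - obs1)
    return lambda mz: ref1 + slope * (mz - obs1)


if __name__ == "__main__":
    for pep in ("VATVSLPR", "LGEHNIDVLEGNEQFINAAK"):
        print(pep, f"{mh_plus(pep):.4f}")   # 842.5094 and 2211.1040
    # Hypothetical raw positions of the two autolysis peaks in one spectrum:
    recal = two_point_calibration(842.55, 842.5090, 2211.20, 2211.1046)
    print(f"{recal(1500.00):.4f}")
    print(tryptic_peptides("AKPAKR"))        # ['AKPAK', 'R']
```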

3.15. Profiling of PTMs on Selected Spots 

Although shotgun proteomics combined with labeling techniques (e.g., SILAC and iTRAQ) is useful for high-throughput protein identification, it has many limitations for PTM analysis. In contrast, 2D gels usually display modified forms and isoforms of a protein as separate spots at different positions on a single gel, and these spots can then be characterized further by high-resolution LC-MS/MS. For example, in a typical 2D gel of plasma, the phosphorylated forms of certain proteins are readily detected as a ladder of spots with different pIs. Figure 3 shows the localization of the exact phosphorylation site of the apolipoprotein A-II precursor. As the figure shows, the AP-treated and untreated spots differ clearly on the 2D gel: the treated spots are shifted to a more basic position, consistent with removal of the acidic phosphate groups. The phosphorylation sites of these proteins can be determined by multistage MS (MS2 and MS3). Here, we describe the procedure for identifying phosphorylated proteins by 2DE coupled to MS.

1. Desalt the MARS-treated (high-abundance protein-depleted) plasma sample using an Amicon Ultra-15 centrifugal filter unit (5-kDa molecular weight cutoff; Millipore).

2. Dephosphorylate the proteins overnight at 37°C in 0.4% ammonium carbonate buffer (pH 8.5) containing 24 ng/μL calf intestinal AP (in 0.4% NH4HCO3).

3. Stop the reaction by freeze-drying, and keep the sample for further analysis.


4. Perform 2DE, spot picking, peptide extraction, and peptide desalting under the same conditions as above (see Subheadings 3.8–3.13).

5. Dissolve the extracted and desalted peptides in 10 μL of LC-MS/MS 

solution [0.4% (v/v) acetic acid and 0.005% (v/v) heptafluorobutyric acid 

(HFBA)]. 

6. Nano LC-MS/MS analysis is then performed on an Agilent Nano HPLC system 

(Agilent) and LTQ mass spectrometer (Thermo Electron, San Jose, CA). 

7. The capillary column used for LC-MS/MS analysis (150 mm × 0.075 mm) was obtained from Proxeon (Odense M, Denmark) and packed in-house with a slurry of 5-μm, 100-Å pore size Magic C18 stationary phase (Michrom Bioresources, Auburn, CA).

8. Mobile phase A for LC separation was 0.4% acetic acid and 0.005% HFBA in deionized water (Cascada, Pall, USA), and mobile phase B was 0.4% acetic acid and 0.005% HFBA in ACN.

9. The sample obtained from the Oasis HLB (Waters, USA) desalting step and 

Nanosep (Pall, USA) filtering was loaded onto the LC column. 

10. The chromatography gradient was a linear increase from 5% to 35% B over 50 min, from 40% to 60% B over 20 min, and from 60% to 80% B over 5 min. The flow rate was maintained at 300 nL/min.

11. The mass spectra were acquired using data-dependent acquisition with a full mass 

scan (400-1800 m/z) followed by MS/MS scans. Each MS/MS scan acquired 

was an average of three microscans on LTQ. 

12. The temperature of the ion transfer tube was held at 200°C, and the spray voltage was 2.0–3.0 kV. The normalized collision energy for MS2 was set at 35%.

13. To determine the exact position of the phosphorylation site, an automated neutral-loss MS3 scan was employed. This scan relies on the characteristic behavior of phosphopeptides during MS/MS in an ion trap, where loss of phosphoric acid (H3PO4, 98 Da) is a prominent fragmentation. If the MS/MS scan contains a fragment ion formed by this neutral loss from the precursor (a shift of 98, 49, or 32.7 m/z for charge states 1+, 2+, and 3+, respectively), an MS3 scan of that product ion is initiated (see Note 15).
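The trigger rule in step 13 reduces to a simple m/z comparison. The sketch below implements that rule under stated assumptions. The 0.5 m/z tolerance is hypothetical and is not an LTQ setting. The function also checks every fragment, whereas instrument implementations may restrict the check to the most intense fragment ions.

```python
"""Decision rule for a neutral-loss-triggered MS3 scan (illustrative sketch)."""

H3PO4 = 97.9769  # monoisotopic mass of phosphoric acid, Da


def neutral_loss_mz(precursor_mz: float, charge: int) -> float:
    """m/z expected after the precursor loses one H3PO4 (98 Da spread over z charges)."""
    return precursor_mz - H3PO4 / charge


def should_trigger_ms3(precursor_mz: float, charge: int,
                       fragment_mzs: list[float], tol: float = 0.5) -> bool:
    """True if any MS/MS fragment lies within tol (m/z) of the neutral-loss product.

    tol = 0.5 is a hypothetical ion-trap tolerance, not an instrument default.
    """
    target = neutral_loss_mz(precursor_mz, charge)
    return any(abs(mz - target) <= tol for mz in fragment_mzs)


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"{z}+: loss of {H3PO4 / z:.1f} m/z")  # 98.0, 49.0, 32.7
    # Hypothetical doubly charged precursor at m/z 1000.00:
    print(should_trigger_ms3(1000.0, 2, [650.3, 951.1]))  # True
```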

4. Notes 

1. Donors were tested and found negative for HIV-1 and HIV-2 antibodies, HIV-1 antigen, hepatitis B surface antigen (HBsAg), antibodies to hepatitis B core antigen (anti-HBc), antibodies to hepatitis C virus (anti-HCV), HTLV-I/II antibodies (anti-HTLV-I/II), and syphilis.

2. No protease inhibitor cocktails were used. This procedure required 2 h at 2–6°C.

3. Approximately 10% of the sample was left at the bottom of the secondary tube 

to ensure that no cellular material was collected. 

4. If excess protease inhibitors are used, the resolution of protein spots on the 2D gel decreases and the spot borders become unclear.

5. Protein pellets that are dried completely in the Speed Vac will not redissolve in sample buffer. Air-dry the pellets for 15–30 min instead.


6. To ensure that the sample buffer components are completely dissolved, warm the sample buffer to room temperature. Do not heat sample buffer that contains proteins: urea decomposes into isocyanate, which carbamylates proteins and may introduce charge heterogeneity.

7. Cup loading: Rehydrate the IPG strip with 350 μL of sample buffer without proteins, and load 100 μL of the protein sample (in sample buffer) into the sample cup. High salt concentrations are better tolerated by cup loading.

8. Apply low voltages (100 V) at the beginning of the run for 3–5 h. Replace the 

filter paper (for desalting purposes) at the end of the run. 

9. After the first-dimension run, IPG strips that are not used immediately for the second-dimension run can be stored at –80°C for several months.

10. When electrical current passes through the system, the BPB dye migrates toward the anode reservoir and eventually turns the anode buffer yellow.

11. Concentrated TBP reacts violently with organic matter. All procedures for 

preparing TBP stock solutions should be done in a fume hood. Store the TBP 

stock solution in the dark at 4°C. Do not store it longer than 2 weeks. 

12. DTT/IAA equilibration procedure: For reduction and alkylation of proteins, the DTT/IAA equilibration procedure can be used instead of the TBP equilibration procedure. Divide the SDS equilibration buffer into two 50-mL

aliquots. Add 1 g DTT to the first aliquot and 1.25 g IAA to the second aliquot. 

Add 10 mL of the DTT equilibration buffer to each strip and place on a shaker 

for 10 min. Decant the DTT equilibration buffer and shake with 10 mL of the 

IAA equilibration buffer for another 10 min. 

13. To prepare the agarose embedding solution, dissolve 1 g of agarose in 100 mL of small gel buffer and melt it in a microwave on medium power. To melt the agarose completely, heat it in short intervals, swirling occasionally to mix the solution.

14. In-gel deglycosylation: After destaining, the glycans of glycoproteins can be removed before trypsin digestion to obtain peptides of the highest purity. Rehydrate the dried gel spots (see Subheading 3.12, step 5) with 10 μL of PNGase F stock solution (10 μU) and incubate for 3 h at 37°C. Decant the supernatant, which contains the released glycans. Wash the gel piece with 50 μL of 50 mM NH4HCO3 (pH 7.8)/ACN (6:4), and dry it in a Speed Vac.

15. The SEQUEST software was used to identify the peptide sequences: 

DeltaCn ≥ 0.1 and Rsp ≤ 4; Xcorr ≥ 1.9 with charge state 1+, Xcorr ≥ 2.2 with 

charge state 2+, and Xcorr ≥ 3.75 with charge state 3+ were used as cutoffs for 

peptide identification. 
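The Note 15 cutoffs combine two global thresholds with a charge-dependent Xcorr threshold, and the sketch below shows one way to apply them in a script. The PSM record and the example peptides are hypothetical and do not reflect SEQUEST's own output format. The sketch also rejects charge states for which Note 15 gives no cutoff.

```python
"""Filter SEQUEST peptide-spectrum matches with the Note 15 cutoffs (sketch)."""
from dataclasses import dataclass

# Minimum Xcorr per precursor charge state, as listed in Note 15.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}


@dataclass
class PSM:
    """Hypothetical container for one SEQUEST result (not SEQUEST's own format)."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float
    rsp: int


def passes_cutoffs(psm: PSM) -> bool:
    """DeltaCn >= 0.1, Rsp <= 4 and the charge-specific Xcorr cutoff.

    Charge states without a listed cutoff (e.g. 4+) are rejected.
    """
    cutoff = XCORR_MIN.get(psm.charge)
    if cutoff is None:
        return False
    return psm.xcorr >= cutoff and psm.delta_cn >= 0.1 and psm.rsp <= 4


if __name__ == "__main__":
    hits = [
        PSM("PEPTIDEK", 2, 2.8, 0.25, 1),   # passes
        PSM("PEPTIDEK", 3, 3.1, 0.30, 1),   # fails: Xcorr < 3.75 for 3+
        PSM("PEPTIDER", 1, 2.0, 0.05, 2),   # fails: DeltaCn < 0.1
    ]
    print([passes_cutoffs(p) for p in hits])  # [True, False, False]
```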


This study was supported by a grant from the Korean Health 21 R&D project, 

Ministry of Health & Welfare, Republic of Korea (A030003 to YKP).


References 

1. Putnam, F. W. (ed) (1987) The Plasma Proteins, Academic Press, New York. 

2. Anderson, N. L., and Anderson, N. G. (2002) The human plasma proteome: history, 

character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–867. 

3. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery 

from the plasma proteome using multidimensional fractionation proteomics. Curr. 

Opin. Chem. Biol. 10, 42–49. 

4. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K., 

Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and 

Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human 

plasma and construction of a two-dimensional map. Proteomics 5, 3386–3396.

5. Omenn, G. S., States, D. J., Adamski, M., and Blackwell, T. W. (2005) Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 3226–3245.

6. States, D. J., Omenn, G. S., Blackwell, T. W., Fermin, D., Eng, J., Speicher, D. W., 

and Hanash, S. M. (2006) Challenges in deriving high-confidence protein identifications 

from data gathered by a HUPO plasma proteome collaborative study. Nat. 

Biotechnol. 24, 333–338. 

7. Yang, Z., Hancock, W. S., Chew, T. R., and Bonilla, L. (2005) A study of 

glycoproteins in human serum and plasma reference standards (HUPO) using 

multilectin affinity chromatography coupled with RPLC-MS/MS. Proteomics 5, 

3353–3366. 

8. Wang, Y., Wu, S. L., and Hancock, W. S. (2006) Approaches to the study of 

N-linked glycoproteins in human plasma using lectin affinity chromatography 

and nano-HPLC coupled to electrospray linear ion trap-Fourier transform mass 

spectrometry. Glycobiology 16, 514–523. 

9. Görg, A., Boguth, G., Kopf, A., Reil, G., Parlar, H., and Weiss, W. (2002) Sample prefractionation with Sephadex isoelectric focusing prior to narrow pH range two-dimensional gels. Proteomics 2, 1652–1657.

10. Wu, T. L. (2006) Two-dimensional difference gel electrophoresis. Methods Mol. 

Biol. 328, 71–95. 

11. Park, K. S., Kim, H., Kim, N. G., Cho, S. Y., Choi, K. H., Seong, J. K., and 

Paik, Y. K. (2002) Proteomic analysis and molecular characterization of tissue 

ferritin light chain in hepatocellular carcinoma. Hepatology 6, 1459–1466. 

12. Park, K. S., Cho, S. Y., Kim, H., and Paik, Y. K. (2002) Proteomic alterations of the 


carcinoma. Int. J. Cancer 2, 261–265. 

13. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D.,

Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P., 

Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W. 

(2005) HUPO plasma proteome project specimen collection and handling: towards 


3262–3277.


14. Huang, L., Harvie, G., Feitelson, J. S., Gramatikoff, K., Herold, D. A., Allen, D. L., 

Amunngama, R., Hagler, R. A., Pisano, M. R., Zhang, W. W., and Fang, X. (2005) 

Immunoaffinity separation of plasma proteins by IgY microbeads: meeting the 

needs of proteomic sample preparation and analysis. Proteomics 5, 3314–3328. 

15. Herbert, B. and Righetti, P. G. (2000) A turning point in proteome analysis: sample 

prefractionation via multicompartment electrolyzers with isoelectric membranes. 

Electrophoresis 21, 3639–3648. 

16. Miklos, G. L. and Maleszka, R. (2001) Integrating molecular medicine with 

functional proteomics: realities and expectations. Proteomics 1, 30–41. 

17. Weber, G., Islinger, M., Weber, P., Eckerskorn, C., and Völkl, A. (2004)

Efficient separation and analysis of peroxisomal membrane proteins using free-flow 

isoelectric focusing. Electrophoresis 25, 1735–1747. 

18. Choi, B. K., Cho, Y. M., Bae, S. H., Zoubaulis, C. C., and Paik, Y. K. (2003) 

Single-step perfusion chromatography with a throughput potential for enhanced 

peptide detection by matrix-assisted laser desorption/ionization-mass spectrometry. 


19. Gobom, J., Nordhoff, E., Mirgorodskaya, E., Ekman, R., and Roepstorff, P. (1999) 

A sample purification and preparation technique based on nano-scale RP-columns 

for the sensitive analysis of complex peptide mixtures by MALDI-MS. J. Mass 

Spectrom. 34, 105–116.

20. Walsh, B. J., and Herbert, B. R. (1999) Casting and running vertical slab-gel

electrophoresis for 2D-PAGE. Methods Mol. Biol. 112, 245–253. 

21. Newhall, W. J. and Jones, R. B. (1983) Disulfide-linked oligomers of the major 

outer membrane protein of chlamydiae. J. Bacteriol. 154, 998–1001. 

22. Kaufman, R. J. (1998) Post-translational modifications required for coagulation 

factor secretion and function. Thromb. Haemost. 79, 1068–1079. 

23. Tabas, I. (1999) Nonoxidative modifications of lipoproteins in atherogenesis. Annu. 

Rev. Nutr. 19, 123–139.

II 

Clinical Proteomics by 2DE and Direct 

MALDI/SELDI MS Profiling

5 

Analysis of Laser Capture Microdissected Cells 

by 2-Dimensional Gel Electrophoresis 

Daohai Zhang and Evelyn Siew-Chuan Koay 

Summary 

Laser capture microdissection (LCM) is a powerful tool for procuring near-pure 

populations of targeted cell types from specific microscopic regions of tissue sections, 

by overcoming problems due to tissue heterogeneity and minimizing intermixture and 

contamination by other cell types. The combination of LCM with various proteomic 

technologies has enabled high-throughput molecular analysis of human tumors, and 

provided critical tools in the search for novel disease markers and therapeutic targets. As 

an example, we describe the application of LCM in dissecting the tumor cells in breast 

cancer for macromolecular extraction and subsequent protein separation by 2-dimensional 

gel electrophoresis (2-D GE). The protocols and the key issues involved in preparing 

ethanol-fixed paraffin-embedded tissue blocks and microscopic sections, microdissecting 

the cells of interest using the PixCell II LCM system, extracting and separating the cellular 

proteins by 2-D GE, and preparing selected proteins for peptide mass analysis by mass spectrometry, are discussed. The aim is to provide a practical guide for performing high-throughput microdissection of target cells and gel-based proteomics, which can be adapted to research on cancer formation and growth.

Key Words: laser capture microdissection; 2-dimensional gel electrophoresis; breast 

cancer; proteomics; silver staining. 


Cellular proteins (collectively known as the “proteome”) are less susceptible

than the transcriptome to experimental artifacts arising from the rigors of tissue 

collection and processing, and advances in global protein expression analysis 




78 Zhang and Koay 

(expression proteomics) have been used in mapping cellular pathways, identifying 

the molecular alterations associated with disease onset and progression 

and searching for potential tumor markers or drug targets in human disease, 

especially in cancer. However, to obtain cell-specific protein profiles, homogeneous 

or near-pure populations of the cells of interest, free from contamination 

by adjacent cell types, are prerequisites. Laser capture microdissection (LCM) 

was developed to enable the procurement of near-pure populations of the target 

cells with a greater speed and precision than is possible with manual dissection 

methods. LCM permits selective transfer of specific cell types, under direct 

microscopic visualization, from complex tissues onto a polymer film that is 

activated by laser pulses, whilst retaining their morphology. The homogeneity 

of encapsulated cells can be verified microscopically. With these inherent 

advantages, LCM has become a valuable research tool and has been applied to 

cellular and molecular studies of various cancers, including breast (1,2), colon 

(3), and liver (4) cancers. It is equally efficacious in procuring cell populations 

from both frozen tissues (3,4) and ethanol-fixed, paraffin-embedded tissues 

(1,5). 

Protein profiles of the LCM-dissected cells can be obtained by two-dimensional fluorescence difference gel electrophoresis (2-D DIGE) (6), 16O/18O isotopic labeling (7), differential iodine radioisotope detection (2), isotope-coded affinity tag (ICAT) labeling coupled with two-dimensional liquid chromatography-tandem mass spectrometry (2-D LC-MS/MS) (8), and mass spectrometry-compatible silver staining (1,9). Protein samples from LCM-dissected cells can also be applied to reverse-phase protein arrays to analyze the key cellular signaling pathways and

metabolic networks (10,11). In this chapter, the in-house protocols used in 

the authors’ laboratory for procuring near-pure populations of breast tumor 

cells from clinical samples, and for the extraction, isolation, and analysis of 

their protein profiles, are described. These include: (1) preparation of ethanol-fixed paraffin-embedded tissue blocks; (2) microdissection using the PixCell II LCM system and cellular protein extraction; (3) protein separation by 2-D gel

electrophoresis (2-D GE), silver staining, and gel image analysis; and (4) preparation 

of targeted proteins of interest for peptide mass analysis by tandem mass 

spectrometry and identification of proteins of interest via database search. 

2. Materials 

2.1. Histology—Tissue Block and Tissue Section Preparation 

1. 70% (v/v), 80% (v/v), 95% (v/v), 100% ethanol 

2. Deionized or Milli-Q water (Millipore, Bedford, MA, USA) 

3. Hematoxylin solution, Mayer’s (Sigma, St. Louis, MO, USA) 

4. Eosin Y solution (Sigma)

Combining LCM with 2-D Gel Electrophoresis 79 

5. Complete, mini protease inhibitor cocktail tablets (Roche Applied Science, 

Pleasanton, CA, USA) 

6. Disposable microtome blades (Feather Safety Razor Co., Ltd., Osaka, Japan) 

7. Uncharged microscope glass slides (Paul Marienfeld GmbH & Co. KG, Lauda-Koenigshofen, Germany)

8. Sakura Tissue-Tek® V.I.P.™ 5 Jr tissue processor (Sakura Finetek, Inc. Japan Co., Ltd, Tokyo)

9. Paraffin wax—Paraplast ® tissue embedding medium; melting point 56-58°C, 

store at room temperature (RT) (Structure Probe, Inc., West Chester, PA, USA) 

10. Xylenes, Reagent Grade (Sigma) 

11. Embedding molds: super metal base molds, 66 mm × 54 mm × 15 mm (Surgipath Medical Industries, Richmond, IL, USA)

2.2. Laser Capture Microdissection and Protein Sample Preparation 

1. PixCell II LCM system (Arcturus Engineering, Mountain View, CA, USA) 

2. CapSure transparent plastic caps (Arcturus Engineering) 

3. Lysis buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 1% Nonidet P-40 (NP-40), 0.5% (v/v) Triton X-100, 50 mM dithiothreitol (DTT), 40 mM Tris-HCl, pH 7.5, 2 mM tributylphosphine (TBP), and 1% (v/v) IPG buffer (pH 3–10). Store at RT.

4. PlusOne 2-D Clean-up Kit (GE Healthcare, San Francisco, CA, USA) 

5. Immobilized pH gradient (IPG) buffer (pH 3–10) (GE Healthcare) 

6. PlusOne 2-D Quantitation Kit (GE Healthcare) 

2.3. Isoelectric Focusing (IEF) and Sodium Dodecyl 

Sulfate-Polyacrylamide Gel Electrophoresis (SDS-PAGE) 

1. Ettan™ IPGphor™ IEF electrophoresis unit (GE Healthcare)

2. Ceramic strip holders and Ettan™ IPGphor™ Strip Holder Cleaning Solution (GE Healthcare)

3. Immobiline™ IPG DryStrips (18 cm, pH 3–10, NL) (GE Healthcare)

4. DryStrip Cover Fluid (GE Healthcare) 

5. Sample rehydration buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 1% (w/v) NP-40, 1% (v/v) IPG buffer, 50 mM DTT. DTT is added fresh to the rehydration buffer immediately before use. Store at RT.

6. Equilibration buffer A (prepare 10 ml for each strip): 6 M urea, 30% glycerol, 

2% SDS, 1% DTT, 50 mM Tris-HCl, pH 8.8. DTT is added to the stock solution 

before use. 

7. Equilibration buffer B (prepare 10 ml for each strip): 6 M urea, 30% glycerol, 2% SDS, 250 mg (2.5%, w/v) iodoacetamide (IAA), 50 mM Tris-HCl, pH 8.8. IAA is added to the stock solution before use.

8. 10% SDS-acrylamide gel: 33 ml acrylamide/bis (30% T, 5% C) (Bio-Rad 

Laboratories, Hercules, CA, USA), 25 ml Tris (1.5 M, pH 8.8), 1 ml 10% (w/v) 

SDS, 0.5 ml 10% (w/v) ammonium persulfate (freshly prepared on the day of 

use), 35 μl TEMED (Bio-Rad). Make up to 100 ml with Milli-Q water.


9. Water-saturated isobutanol: Shake equal volumes of Milli-Q water and isobutanol 

in a glass bottle and allow the mixture to separate. Transfer the top layer 

to a new bottle and store at RT. 

10. Agarose sealing solution: Dissolve 0.5% low-melting-point agarose and 0.1% 

(w/v) bromophenol blue in 1× SDS-PAGE running buffer. Store at RT. 

11. SDS-PAGE running buffer: 25 mM Tris, 198 mM glycine, 0.2% (w/v) SDS, 

pH 8.3 

12. PROTEAN™ II xi Cell system (Bio-Rad)

2.4. Silver Staining (see Note 1) 

1. Fix solution: 5% acetic acid and 50% ethanol per 100 ml 

2. Sensitivity-enhancing solution: 30% (v/v) ethanol, 6.8% (w/v) sodium acetate, 

100 μl of 2% (w/v) sodium thiosulphate per 100 ml 

3. Silver staining solution: 0.25% (w/v) silver nitrate 

4. Development solution: 2.5% (w/v) anhydrous potassium carbonate, 20 μl of 2% 

(w/v) sodium thiosulphate per 100 ml, 40 μl of 37% formaldehyde per 100 ml. 

5. Stop solution: 4% (w/v) Tris and 2% (v/v) acetic acid per 100 ml 

6. Gel store (soak) solution: 1% (w/v) sodium acetate and 10% (v/v) methanol per 

100 ml 

2.5. Gel Image Analysis 

1. Personal Densitometer SI (Molecular Dynamics, Sunnyvale, CA, USA) 

2. ImageMaster 2D Elite (Platinum) software (GE Healthcare) 

2.6. In-gel Trypsin Digestion and Preparation for MS Analysis 

1. Destaining solution: 30 mM potassium ferricyanide and 100 mM sodium 

thiosulfate (1:1) 

2. 25 mM sodium bicarbonate 

3. Dehydrating solution: 50 mM sodium bicarbonate and 50% (v/v) methanol per 

100 ml 

4. SpeedVac centrifuge (TeleChem International, Inc., Sunnyvale, CA, USA) 

5. Digestion solution: 40 ng/μl trypsin sequencing grade (Promega, Madison, WI, 

USA) in 20 mM ammonium bicarbonate solution 

6. Extraction solution (for hydrophobic peptides): 5% (v/v) trifluoroacetic acid (TFA) and 50% (v/v) acetonitrile (ACN) per 100 ml

7. Peptide reconstitution solution: 0.1% (v/v) TFA 

8. ZipTip C18 columns (Millipore) 

9. Eluant: 70% (v/v) ACN and 0.1% TFA per 100 ml 

10. Stainless steel MALDI-TOF sample target plates (Applied Biosystems, 

Framingham, MA, USA) 

11. Alpha-cyano-4-hydroxycinnamic acid (α-CHCA) matrix, 3 mg/ml (Sigma)

12. Applied Biosystems 4700 MALDI-TOF/TOF mass spectrometer


2.7. Database Search for Protein Identification 

1. MASCOT software (Matrix Science, London, England) 

2. MS-Fit software (http://prospector.ucsf.edu) 

3. Methods 

The methods described below have been successfully used in the authors’ 

laboratory for proteomics studies in human breast cancer specimens (1,9) and 

can be applied to other cancer tissues as well. Breast tumors and matched 

normal tissues were obtained from the Tissue Repository Unit of the National 

University Hospital, Singapore, after approval by our Institutional Review 

Board. 

3.1. Preparation of Tissue Sections for LCM 

In this step, frozen tissues can be directly transferred from the –80°C freezer, 

where they had been stored after surgical excision and trimming, to a pre-cooled 

tube containing 70% (v/v) ethanol and kept on ice. Ethanol-fixed paraffin-embedded

tissue blocks should be prepared as quickly as possible, and the 

completed blocks stored at or below 4°C. 

1. Fix the frozen tissue overnight in 70% ethanol at 4°C. 

2. Place each ethanol-fixed tissue piece, trimmed to appropriate dimensions, into 

a pre-cooled cassette within the tissue processor and dehydrate according to the 

following procedure: 30 min each in 70% and 80% ethanol at 40°C; 45 min in 

95% ethanol at 40°C (twice); 45 min in 100% ethanol at 40°C (twice), and 45 min 

in xylene at 40°C (twice) (see Note 2). 

3. Embed the specimen in paraffin using embedding molds, with four changes of paraffin at 30-min intervals.

4. Store the paraffin blocks at or below 4°C if they are not to be sectioned immediately.

5. Put the block in a –20°C freezer for at least 1 h before cutting sections from it. 

6. Cut sections of 8 μm thickness using a standard microtome. Blades should be 

changed regularly (see Note 3). 

7. Collect the tissue sections on uncharged glass microscope slides, allow them to air-dry, and store them at or below 4°C.
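The processor program in step 2 and the paraffin changes in step 3 add up to a fixed bench time that is useful to know when scheduling a batch of blocks. A minimal Python sketch of that arithmetic follows; the step list is transcribed from the text, and reading "four changes of paraffin" as four 30-min changes is an assumption.

```python
# Hypothetical planning helper: each step is (reagent, minutes per change, number of changes).
# Dehydration and clearing program from step 2 (all at 40 degC).
PROCESSOR_PROGRAM = [
    ("70% ethanol", 30, 1),
    ("80% ethanol", 30, 1),
    ("95% ethanol", 45, 2),
    ("100% ethanol", 45, 2),
    ("xylene", 45, 2),
]

# Paraffin embedding from step 3, assumed to be four changes of 30 min each.
PARAFFIN_PROGRAM = [("paraffin", 30, 4)]


def total_minutes(program):
    """Return the summed incubation time of a list of (reagent, minutes, changes) steps."""
    return sum(minutes * changes for _, minutes, changes in program)


if __name__ == "__main__":
    dehydration = total_minutes(PROCESSOR_PROGRAM)
    embedding = total_minutes(PARAFFIN_PROGRAM)
    print(f"Dehydration and clearing: {dehydration} min ({dehydration / 60:.1f} h)")
    print(f"Paraffin changes: {embedding} min")
```

With these values the processor program totals 330 min (5.5 h), followed by 120 min of paraffin changes, not counting transfer times.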

3.2. Staining of Paraffin-embedded Sections 

The staining of sections for LCM is similar to that used in most histology laboratories for morphological assessment. However, using a minimal amount of stain to visualize the tissue for microdissection improves macromolecule recovery (see Note 4). One tablet of protease inhibitor cocktail should be added to every 10 ml of each reagent (except xylene), and all reagents should be prepared with double-deionized water or Milli-Q® water. Staining should be performed as close as possible to the scheduled LCM dissection.

1. Deparaffinize the sections in fresh xylene for 5 min, followed by another 5 min 

with a fresh change of xylene. 

2. Rehydrate for 15 s in each step of the following series: 100% ethanol, 95% 

ethanol, 75% ethanol, and deionized water. 

3. Stain with Mayer’s Hematoxylin for 30 s. 

4. Rinse off excess stain with deionized water for 15 s; repeat rinse a second time. 

5. Dehydrate for 15 s in 70% ethanol. 

6. Stain with Eosin Y for 5 s. 

7. Dehydrate the sections for 15 s (twice) in 95% ethanol, 15 s (twice) in 100% 

ethanol, and 60 s in xylene. 

8. Air-dry for approximately 2–5 min to allow xylene to evaporate completely (see 

Note 5). 

9. The tissue is now ready for LCM (see Note 6). 

3.3. Laser Capture Microdissection and Protein Sample Preparation 

The PixCell II LCM system (Arcturus Engineering, Mountain View, CA, USA) is used in our laboratory to microdissect tumor cells. Tissue sections are usually mounted on uncoated glass slides, which support the CapSure cap during microdissection. LCM uses an infrared laser integrated into a standard microscope: when the desired cells are in the path of the beam, the investigator fires the laser, and a short pulse heats the transparent membrane on the cap to ∼90°C for 5 ms. The melted membrane binds and encapsulates the cells of interest, separating them from the surrounding cells and connective tissue. Images of the tissue before and after microdissection, and of the captured cells on the cap, can be recorded to maintain an accurate record of each dissection. The laser beam diameter can be adjusted from 7.5 to 30 μm to procure single cells or groups of cells, respectively.

1. Place the slide containing the prepared tissue on the microscope stage. Set the 

laser parameters as follows: spot diameter at 15 μm, pulse duration at 5 ms, and 

power at 50 mW. 

2. Scan the tissue section to locate the desired cells. Dissect the target cells and capture all encapsulated cells from each section into one cap in quick succession. Cells from ∼2500 laser shots can be captured on one cap (see Note 7). Figure 1 shows an example of tumor cells before and after microdissection.



Fig. 1. Laser capture microdissection (LCM) of breast tumor cells. The tissue section 

on the uncharged glass slide was stained with hematoxylin and eosin and microdissected 

with the PixCell II LCM system (Arcturus Engineering). (A) section before LCM; (B) 

section after LCM; (C) microdissected cell. 

3. Place the LCM cap on an Eppendorf tube containing 100 μl of lysis buffer with protease inhibitor, invert the tube, and vortex vigorously for 1 min.

4. Place the tube on ice for approximately 20 min, then sonicate the microdissected sample in a bath sonicator with 5-s pulses separated by 5-s intervals, for a total of 1 min.

5. Return the sample to ice immediately after sonication.

6. Centrifuge the sample at 16,000 × g for 20 min at 4°C and transfer the supernatant to a new Eppendorf tube.

7. Determine the protein concentration using the PlusOne 2D Quantitation kit (GE 

Healthcare) and clean up the sample using the PlusOne 2-D cleanup kit (GE 

Healthcare), following the manufacturer’s instructions closely. 

8. Dissolve the protein pellet in the appropriate volume of sample rehydration buffer and divide it into aliquots for immediate and later use, according to the experimental plan. Store the aliquoted samples at –80°C until analysis (see Note 8).
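The shot counts given in step 2 and Note 7 and the yield figure in Note 15 can be combined to estimate how many caps one gel needs and how much protein to expect. The sketch below only restates those published figures; recovery after the cleanup in step 7 is likely to be lower and will vary with the tissue and the density of target cells.

```python
import math

SHOTS_PER_CAP = 2500             # ~2500 shots per cap (step 2; Note 7 gives 2300-2700)
SHOTS_PER_GEL = 50_000           # minimum shots per 18-cm gel (Note 7)
UG_PER_2500_SHOTS = (4.0, 6.0)   # typical total protein per 2500 pulses (Note 15)


def caps_needed(total_shots, shots_per_cap=SHOTS_PER_CAP):
    """Number of caps required to collect a given number of laser shots."""
    return math.ceil(total_shots / shots_per_cap)


def expected_yield_ug(total_shots, ug_per_2500=UG_PER_2500_SHOTS):
    """Low and high estimate of total protein (ug) recovered from total_shots."""
    low, high = ug_per_2500
    scale = total_shots / 2500
    return (low * scale, high * scale)


if __name__ == "__main__":
    print(caps_needed(SHOTS_PER_GEL))        # 20
    print(expected_yield_ug(SHOTS_PER_GEL))  # (80.0, 120.0)
```

With the values from the text, one 18-cm gel needs 20 caps, and the estimated 80–120 μg of protein overlaps the lower end of the ∼100–150 μg loaded per strip in Subheading 3.4, step 3.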

3.4. First-dimension Gel Electrophoresis (Isoelectric Focusing) 

1. Prepare the strip holder for the 18-cm IPG strip (see Note 9). 

2. Squeeze a few drops of Ettan IPGphor Strip Holder Cleaning Solution (GE 

Healthcare) into the slot and clean thoroughly. Rinse with Milli-Q water and dry 

completely. 

3. Mix approximately 50 μl of the reconstituted protein samples (∼100–150 μg) 

with the appropriate volume of rehydration buffer. The total volume should be 

340 μl for one 18-cm IPG strip. 

4. Transfer the entire volume of the diluted protein sample into the groove of the 

IPG strip holder. 

5. Remove the cover from the IPG strip (18 cm, pH 3–10) and place the IPG strip 

in the holder such that the gel of the strip is in contact with the sample (i.e., gel


side down). Try to remove any trapped air bubbles by lifting the strip up and 

down from one side. 

6. Overlay the IPG strip with 2–3 ml of DryStrip Cover Fluid to prevent urea 

crystallization and evaporation, and replace the cover on the strip holder. 

7. Rehydrate the IPG strip at 20 V for 12 h at 20°C. 

8. Perform IEF under the following conditions: 500 V for 1 h, 2000 V for 1 h, 

4000 V for 1 h, and 8000 V for 6 h. 

9. Once focusing is complete, pour off the cover fluid. The strips can be stored at –20°C for several weeks or processed immediately as described below (see Subheading 3.5).
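Steps 3 and 8 involve two small calculations: how much sample and rehydration buffer to combine to reach 340 μl, and the total volt-hours of the focusing program. The sketch below assumes the stock concentration is known from the quantitation in Subheading 3.3, step 7, and treats each IEF step as a step-and-hold at the set voltage; on a current-limited instrument the delivered volt-hours can fall short of this nominal figure, so the instrument log remains the reference.

```python
REHYDRATION_VOLUME_UL = 340  # total volume per 18-cm IPG strip (step 3)

# Focusing program from step 8 as (volts, hours), assumed to be step-and-hold.
IEF_PROGRAM = [(500, 1), (2000, 1), (4000, 1), (8000, 6)]


def loading_volumes(target_ug, stock_mg_per_ml, total_ul=REHYDRATION_VOLUME_UL):
    """Return (sample_ul, buffer_ul) needed to load target_ug in total_ul."""
    sample_ul = target_ug / stock_mg_per_ml  # 1 mg/ml is 1 ug/ul
    if sample_ul > total_ul:
        raise ValueError("stock is too dilute for the strip volume")
    return sample_ul, total_ul - sample_ul


def volt_hours(program):
    """Nominal volt-hours of a list of (volts, hours) steps."""
    return sum(volts * hours for volts, hours in program)


if __name__ == "__main__":
    print(loading_volumes(125, 2.5))  # (50.0, 290.0)
    print(volt_hours(IEF_PROGRAM))    # 54500
```

For example, loading 125 μg from a 2.5 mg/ml stock gives 50 μl of sample plus 290 μl of buffer, and the programmed steps total 54,500 Vh, excluding the 240 Vh of the 20-V rehydration in step 7.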

3.5. IPG Strip Equilibration 

1. Place the focused IPG strips in a container with 10 ml of equilibration buffer A 

and shake for 15 min at RT (see Note 10). 

2. Transfer the IPG strip to a container with 10 ml of equilibration buffer B and 

shake for 15 min at RT (see Note 10). 

3. The equilibrated strips can then be processed for second-dimension gel 

electrophoresis. 

3.6. Second-dimensional SDS-PAGE 

Prepare the SDS-polyacrylamide gels in advance, and make sure that they are well polymerized before equilibrating the IPG strips. During equilibration, the proteins are coated with SDS, which gives them a negative charge, and are reduced and alkylated to avoid the formation of oligomers. In our laboratory, we use the PROTEAN II xi Cell system (Bio-Rad) for SDS-PAGE.

1. Assemble the gel casting cassette as per the manufacturer’s instructions. 

2. Prepare the 10% SDS-polyacrylamide gel solution (see Note 10) and pour it slowly into the cassette (two 16 cm × 20 cm glass plates separated by 1.5-mm spacers) until the solution is approximately 1 cm below the top.

3. Overlay the gel solution with 2 ml of water-saturated isobutanol. It is best to add 1 ml from each side of the gel rather than pouring it along the whole gel meniscus.

4. Allow the gel to polymerize for at least 2 h. 

5. When polymerization is complete, remove the water-saturated isobutanol and rinse the gel surface with water.

6. With a pair of forceps, carefully place the equilibrated strip on top of the PAGE 

gel, with the acidic side of the strip at left. Cover the strip with melted agarose 

sealing solution (see Note 11). 

7. Assemble the electrophoresis unit (Bio-Rad) and perform electrophoresis at 15°C 

as follows: 40 V for 15 min or until the blue dye enters the gel and then raise 

the voltage to 125 V and run the gel overnight or until the blue dye migrates to 

the bottom of the gel. 

8. Switch off the main power and disassemble the gel cassette.


9. Place the gel in a glass container and wash the gel with Milli-Q water. 

10. Stain the gel using the mass spectrometry-compatible silver staining protocol 

(see Subheading 3.7). 
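Step 2 leaves the volume of gel solution to prepare implicit. As a rough guide, the volume is width × height × spacer thickness. The sketch assumes the 20-cm dimension is the plate height, that the gel is poured to about 1 cm below the top (so 19 cm), ignores the space taken by the side spacers, and uses a hypothetical 30% acrylamide stock; check these against the recipe actually in use.

```python
def gel_volume_ml(width_cm, height_cm, spacer_mm):
    """Approximate slab-gel volume; 1 cm^3 equals 1 ml."""
    return width_cm * height_cm * (spacer_mm / 10.0)


def stock_volume_ml(final_percent, stock_percent, total_ml):
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    return final_percent * total_ml / stock_percent


if __name__ == "__main__":
    volume = gel_volume_ml(16, 19, 1.5)           # about 45.6 ml per gel
    acrylamide = stock_volume_ml(10, 30, volume)  # assumed 30% stock
    print(f"Gel volume: {volume:.1f} ml; 30% stock: {acrylamide:.1f} ml")
```

In practice a small excess of solution is usually mixed to allow for losses during pouring.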

3.7. Silver Staining and Image Analysis 

1. The silver staining protocol described below is used in the authors’ laboratory and is highly compatible with protein identification by MALDI-TOF MS and MALDI-TOF/TOF MS/MS. Adequate washing with Milli-Q water is essential to reduce the risk of keratin contamination. All solutions must be prepared with Milli-Q water, and all chemical reagents should be filtered to remove any particles that may interfere with MS analysis. All solutions prepared from solid chemicals should be made fresh before silver staining. Fix the gel in fixing solution for at least 2 h, replacing it with fresh solution every hour.

2. Wash the gel with Milli-Q water for about 15 min with constant shaking.

3. Remove the wash, cover the gel with the sensitivity-enhancing solution, and incubate for 1 h with constant shaking.

4. Wash the gel thoroughly with Milli-Q water for 6 × 15 min with gentle shaking, replacing the water with fresh Milli-Q water after each wash (see Note 12).

5. Stain the gel with silver staining solution for 30 min. 

6. Wash excess stain off the gel with Milli-Q water (2 × 1 min).

7. Develop the gel for 5–30 min in a developing solution (see Note 13). 

8. Add Stop Solution and shake the gel for approximately 20 min to stop the 

reaction. 

9. Wash the gel using Milli-Q water for 20 min; replace water and repeat the wash. 

10. Scan the gel using Personal Densitometer SI, or store the gel in the gel soak 

solution for analysis at a later time. 

11. Capture the image using ImageMaster 2D Elite software (GE Healthcare). Following the software instructions, image analysis includes spot detection, quantification, and normalization of spot intensity against background. Figure 2 shows an example of the differences between the protein profiles of LCM-microdissected HER-2/neu-positive and -negative tumor cells.

12. Use the software to identify spots that show significant differences in intensity (see Note 14), reflecting differential protein expression between the two breast cancer subtypes defined by the presence or absence of HER-2/neu overexpression. Only spots that show a more than threefold increase or decrease in signal intensity, consistently across three replicate sets of gels, are considered differentially expressed and selected for further analysis by MALDI-TOF/TOF MS/MS. Proteins with less convincing evidence of differential expression are considered unlikely to be useful biomarkers for the early detection of tumor growth or therapeutic targets for breast cancer treatment.
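The selection rule in step 12 (a more than threefold increase or decrease, seen consistently in all three replicate gel sets) can be written as a simple filter. This sketch assumes the image-analysis software exports one normalized intensity ratio (HER-2/neu-positive over -negative) per spot per replicate; the spot identifiers and the export format are hypothetical, and the software's own statistics are not reproduced here.

```python
from typing import Dict, List

FOLD_CUTOFF = 3.0  # threefold change from step 12


def is_differential(ratios: List[float], cutoff: float = FOLD_CUTOFF) -> bool:
    """True if every replicate ratio is above cutoff, or every one is below 1/cutoff."""
    if not ratios:
        return False
    up = all(r > cutoff for r in ratios)
    down = all(r < 1.0 / cutoff for r in ratios)
    return up or down


def select_spots(spot_ratios: Dict[str, List[float]], n_replicates: int = 3) -> List[str]:
    """Keep spots measured in all replicate gel sets that pass is_differential."""
    return [spot for spot, ratios in spot_ratios.items()
            if len(ratios) == n_replicates and is_differential(ratios)]


if __name__ == "__main__":
    # Hypothetical positive/negative intensity ratios, one per replicate gel set.
    example = {
        "s1": [3.5, 4.1, 3.2],
        "s2": [0.20, 0.25, 0.30],
        "s3": [3.5, 2.0, 4.0],
        "s4": [5.0, 5.0],
    }
    print(select_spots(example))  # ['s1', 's2']
```

Only s1 (consistently up) and s2 (consistently down) pass; s3 fails in one replicate, and s4 lacks a third replicate.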


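[Fig. 2 image: paired 2-D gels labeled HER-2/neu-P and HER-2/neu-N; pI 3–10 (horizontal axis); molecular mass markers 92, 50, 35, and 28 kDa; annotated spots AAH025396, P04075, NP004095, P06753-2, AAB49495, P07339, NP001531, NP000627]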

Fig. 2. Silver-stained protein profiles of LCM-dissected cells. Protein samples from HER-2/neu-positive and -negative cells were separated on IPG strips (18 cm, pH 3–10 NL) and homogeneous SDS-PAGE (10%) gels, and then stained with silver nitrate. Silver-stained gels were scanned using the Personal Densitometer SI (Molecular Dynamics), and differentially expressed protein spots were analyzed with ImageMaster 2-D Elite software (GE Healthcare). Accession numbers indicate the protein identities determined by MALDI-TOF/TOF tandem mass spectrometry and NCBInr database searching with Mascot software (Matrix Science, London, UK).

3.8. Trypsin Digestion and Preparation of Peptides for Mass Spectrometric Analysis

1. Excise the silver-stained protein spots showing significant differential expression, as described above, one at a time, taking care not to include adjacent spots, and transfer each to an individual tube.

2. Wash with 100 μl of Milli-Q water for 5 min. 

3. Add 50 μl of destaining solution to each tube and incubate for about 20 min on a platform shaker at RT, until the gel pieces become clear.

4. Remove the solution carefully and wash with 100 μl of Milli-Q water. 

5. Incubate the gel pieces in 25 mM sodium bicarbonate for 20 min, and then cut them into smaller pieces with the tip of the transfer pipette. Avoid carryover and contamination between consecutive samples.

6. Rinse the gel pieces with Milli-Q water, pulse-spin to collect them, and discard the wash; repeat the wash three times.

7. Add 100 μl of dehydrating solution and incubate for 20 min at RT. 

8. Dry the gel pieces in a SpeedVac centrifuge. 

9. Re-swell the dried gel pieces with 10–20 μl of Digestion Solution and leave 

overnight at 37°C to ensure complete digestion. 

10. Extract the resultant hydrophilic peptides first with 10 μl of Milli-Q water for 1 h.


11. Then extract the hydrophobic peptides with Extraction Solution for 2 h. 

12. Pool the extracted hydrophilic and hydrophobic peptides and dry the peptide 

mixture using the SpeedVac centrifuge. 

13. Redissolve the dried peptides in 10 μl of 0.1% (v/v) TFA. 

14. Desalt the sample with ZipTip C18 columns (Millipore) and elute the purified peptides with 2.5 μl of Eluant.

15. Mix 0.5 μl of the sample eluate with 0.5 μl of CHCA matrix (3 mg/ml) and spot 

the mixture onto the stainless steel MALDI-TOF sample target plates. 

16. Keep the prepared peptide samples on ice during transfer to the core facility for mass spectrometric analysis. In our laboratory, peptide mass spectra are acquired on an Applied Biosystems 4700 Proteomics Analyzer MALDI-TOF/TOF mass spectrometer operated in positive-ion reflector mode. Subsequent MS/MS analyses are performed in a data-dependent manner: the 10 most abundant ions that meet preset criteria are subjected to high-energy CID. The collision energy is set to 1 keV, with nitrogen as the collision gas.
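Trypsin cleaves on the C-terminal side of lysine and arginine, and the database search in Subheading 3.9 allows for up to one missed cleavage. The sketch below generates the corresponding candidate peptides for a protein sequence, using the common rule that cleavage does not occur before proline. It illustrates what a search engine considers and is not Mascot's own implementation.

```python
from typing import List


def tryptic_peptides(sequence: str, missed_cleavages: int = 1) -> List[str]:
    """In silico trypsin digest: cleave after K or R, but not before P.

    Returns every peptide with up to `missed_cleavages` internal uncleaved
    sites, ordered by start position and then by length.
    """
    seq = sequence.upper()
    if not seq:
        return []
    # A cleavage site lies after index i when seq[i] is K/R and seq[i + 1] is not P.
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(seq[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_peptides("AKPRGKR"))  # ['AKPR', 'AKPRGK', 'GK', 'GKR', 'R']
```

In the example, the K-P bond is left intact, and the one-missed-cleavage products AKPRGK and GKR appear alongside the fully cleaved peptides.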

3.9. Database Search to Match Protein Identities 

Database searches were conducted using the MASCOT search engine (Matrix Science, London, UK; http://www.matrixscience.com). Known contaminant peaks, such as those from keratin and trypsin autolysis, were removed before searching. All tandem mass spectra were searched against the NCBInr database with a mass tolerance of 200 ppm for precursor measurements and 0.5 Da for MS/MS. Searches were performed without constraints on protein molecular weight (Mr), isoelectric point (pI), or species, allowing for carbamidomethylation of cysteine and partial oxidation of methionine residues. Up to one missed tryptic cleavage was allowed in all searches. Protein scores greater than 75 were considered significant (p < 0.05).
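The tolerances and score threshold above are easier to interpret with a little arithmetic. A tolerance in ppm scales with mass, and Mascot reports scores as −10 log10(P), where P is the probability that the match is a random event, so a score can be converted back to a probability. In the sketch below the example masses are hypothetical; because the significance threshold Mascot reports depends on the size of the database searched, the value of 75 applies to the searches described here rather than in general.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float = 200.0) -> bool:
    """True if the observed mass lies inside the +/- tol_ppm window."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def window_half_width_da(theoretical: float, tol_ppm: float = 200.0) -> float:
    """Half-width of the ppm tolerance window in daltons at a given mass."""
    return theoretical * tol_ppm / 1e6


def mascot_probability(score: float) -> float:
    """Invert Mascot's score definition, score = -10 * log10(P)."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    print(window_half_width_da(1500.0))      # 0.3 (Da)
    print(within_tolerance(1000.1, 1000.0))  # True (about 100 ppm)
    print(f"{mascot_probability(75):.1e}")   # 3.2e-08
```

At m/z 1500 the 200-ppm window is ±0.3 Da, narrower than the 0.5 Da MS/MS window, and a score of 75 corresponds to P ≈ 3.2 × 10⁻⁸ for a single match.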

3.10. Experimental Example: Differential Protein Profiles between HER-2/neu Positive and -Negative Breast Tumors

We dissected tumor cells from two subtypes of breast tumors and compared their protein profiles using the protocols described above. Figure 2 shows the LCM-dissected tumor cell protein patterns visualized by silver staining. Note that pooled protein samples from different cases of the same tumor subtype were used for 2-D GE. This gel-based protein visualization technique requires relatively large amounts of protein, so more sensitive detection reagents and protein identification strategies are needed to produce meaningful results (see Notes 15 and 16). Using the silver-staining protocol, we detected 500–600 protein spots in the profiles generated by coupling LCM with 2-D GE. Protein spots of interest were excised, digested with trypsin (Promega), desalted with ZipTip C18 (Millipore), and analyzed by MALDI-TOF/TOF tandem mass spectrometry. Protein identities, shown in Fig. 2, were obtained by searching the NCBInr database with the MASCOT software (Matrix Science).

4. Notes 

1. All chemical solutions should be filtered through filter paper (Cat. No. 1001 150, Whatman®, Whatman International Limited, Springfield Mill, Maidstone, Kent, England) to minimize the deposition of precipitates on the gels during silver staining.

2. Tissue processors in standard histopathology laboratories generally include formalin fixation as the first step of the paraffin infiltration procedure. It is important to omit this step when processing tissues intended for gene expression and proteome profiling.

3. Consistent LCM transfers have been demonstrated from 5- to 10-μm-thick paraffin-embedded tissue sections. For a successful LCM transfer, the bond between the polymer film and the targeted tissue must be stronger than that between the tissue and the underlying glass slide. Therefore, for most tissue types, sections should be collected on uncharged glass slides. To prevent cross-contamination during sectioning, residual paraffin and tissue fragments should be wiped off the sectioning blade with xylene between consecutive slides. If possible, a fresh microtome blade should be used for each block.

4. In our hands, hematoxylin and eosin work best on slides prepared for LCM when diluted to 10% of the standard concentrations used for routine histomorphological work. With this modification, breast tumor cells can still be clearly visualized and distinguished from other cell types without affecting their procurement by LCM. Minimal staining also improves macromolecule recovery during cellular protein extraction.

5. Complete dehydration and air drying of sections are the main factors influencing LCM efficiency. Prolonged air drying or residual moisture in the sections appears to inhibit, at least partially, the transfer of cells to the plastic film.

6. We strongly recommend that investigators with limited experience in examining cancer tissue sections consult the pathologists at their institutions for help in identifying the target cell types to be microdissected by LCM. It is essential to avoid contamination by other cell types and dissection of the wrong cells.

7. During microdissection, make sure that there are no irregularities on the tissue surface in or near the area to be microdissected; wrinkles can lift the LCM cap away from the tissue surface and reduce membrane contact during laser activation. After microdissection, use an adhesive pad to remove cells that may have attached non-specifically to the LCM cap. A cap-alone control is recommended for each experiment to confirm that non-specific transfer is not occurring during microdissection; this cap should be processed together with the tissue-containing caps and serves as a negative control. For protein separation by 2-D GE, 20 to 30 sections from each tissue sample are dissected, depending on the percentage of target cells in the full sections. Generally, 2300–2700 laser shots are used for each cap. Cells from at least 50,000 shots (spot diameter 15 μm) are required for each 18-cm gel.

8. Up to 15 mg of protein can be solubilized in 500 μl of sample rehydration buffer, but with our breast tumor tissue samples we usually reconstitute 1–2 mg of extracted protein in 500 μl (2–4 mg/ml). We recommend storing the reconstituted proteins in appropriate aliquots and removing only the number of aliquots needed for the experiment at hand, to avoid repeated freezing and thawing, which leads to sample deterioration.

9. IEF is performed using the Ettan IPGphor IEF unit, and rehydration loading of protein samples is used in the authors’ laboratory. IPG strips for first-dimension separation are commercially available from GE Healthcare and other suppliers in a range of pH gradients and lengths; choose the gradient and length that give the resolution needed. The strips should be kept frozen at –20°C and thawed just before use. Because IEF conditions depend on the pH range, consult the manufacturer’s protocol. For alkaline pH ranges, cup loading is required, and the DTT in the rehydration buffer should be replaced by an alternative reagent such as hydroxyethyl disulfide (HED; DeStreak Reagent, GE Healthcare).

10. It is essential to equilibrate the strips before second-dimension gel electrophoresis (2-D SDS-PAGE). DTT in buffer A reduces disulfide bonds, and IAA in buffer B alkylates the resulting sulfhydryl groups. This prevents re-oxidation of sulfhydryl groups and streaking of spots during 2-D SDS-PAGE. In addition, SDS coats the proteins with negative charge, priming them for SDS-PAGE. Use the best-quality SDS available for sample and running buffers that contain SDS; we recommend C12-grade SDS from Pierce (Rockford, IL, USA).

11. When placing the strips on top of the gel, ensure that the plastic backing of each strip is in contact with the glass plate. If necessary, trim the strips to fit. When adding the agarose sealing solution, make sure that no air bubbles are trapped between the IEF strip and the 2-D gel.

12. Wash the gels thoroughly and repeatedly, as recommended, before and during the development step to obtain clearly stained gels. Formaldehyde should be added to the developing solution just before use, and the suggested concentration should be followed strictly to avoid interference during MALDI-TOF analysis. During development, the gel should be shaken constantly to reduce background.

13. The developing time depends on the total amount of protein that is used for 

2-D separation. With a higher amount of protein, a shorter developing time can 

be used, without compromising the aim of visualizing the maximum number of 

protein spots. 

14. It is important to manually verify spot detection and matching, as the variations 

in gel resolution, staining, gel background, and automatic image analysis may 

not correctly define the spot contours in every case. This variability and the 

complexity of 2-D gel patterns hinder the accurate matching of analogous spots 

in different gels. 

15. In our experience, approximately 500 to 600 distinct protein spots from dissected breast tumor cells can be visualized on silver-stained 2-D PAGE gels. On average, approximately 4–6 μg of total cellular protein can be extracted from 2500 laser pulses. Silver staining of LCM-dissected cell proteins is, in our experience, sufficiently sensitive for isolating and identifying dysregulated cellular proteins of high or moderate abundance. For dysregulated proteins of low abundance, however, the detection limit would have to be lowered by other techniques, such as 125I labeling, biotinylation, or fluorescent dye labeling. In addition, scanning immunoblotting with class-specific antibodies, for example, would allow sensitive detection of specific subsets of proteins, e.g., all known proteins involved in cell-cycle regulation.

16. Protein identification by MALDI-TOF, LC-MS/MS, or other techniques is also limited by the minimal amount of protein required, which is often not attainable from certain types of biopsy samples. A useful strategy to improve protein identification is to generate, for each case, a “diagnostic” fingerprint from microdissected cells in parallel with a “sequencing” fingerprint from the whole tissue section. Aligning the diagnostic and sequencing 2-D gels allows the proteins of interest to be located for subsequent mass spectrometry or N-terminal sequence analysis.


The Tumor Repository of the National University Hospital, Singapore, provided the frozen clinical breast cancer tissues for LCM. The use of the PixCell II LCM system was courtesy of the Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore (NUS). This work was supported by an Academic Research Fund from the NUS (Grant No. R-179-000-032) to the authors.


References 

1. Zhang, D., Tai, L. K., Wong, L. L., Sethi, S. K., Koay, E. S. (2005) Proteomics of 

breast cancer: enhanced expression of cytokeratin 19 in human epidermal growth 

factor receptor type 2 positive breast tumors. Proteomics 5, 1797–1805. 

2. Neubauer, H., Clare, S. E., Kurek, R., Fehm, T., Wallwiener, D., Sotlar, K., et al. 

(2006) Breast cancer proteomics by laser capture microdissection, sample pooling, 

54-cm IPG IEF, and differential iodine radioisotope detection. Electrophoresis 27, 

1840–1852. 

3. Lawrie, L. C., Curran, S., McLeod, H. L., Fothergill, J. E., Murray, G. I. (2001) 

Application of laser capture microdissection and proteomics in colon cancer. J. Clin. Pathol.: Mol. Pathol. 54, 253–258.

4. Ai, J., Tan, Y., Ying, W., Hong, Y., Liu, S., Wu, M., et al. (2006) Proteome 

analysis of hepatocellular carcinoma by laser capture microdissection. Proteomics 

6, 538–546. 

5. Ahram, M., Flaig, M. J., Gillespie, J. W., Duray, P. H., Linehan, W. M., 

Ornstein, D. K., et al. (2003) Evaluation of ethanol-fixed, paraffin-embedded tissues 

for proteomic applications. Proteomics 3, 413–421. 

6. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring, 

J. R., Podolsky, R. H., et al. (2005) Saturation labeling with cysteine-reactive

cyanine fluorescent dyes provides increased sensitivity for protein expression 

profiling of laser-microdissected clinical specimens. Proteomics 5, 1746–1757. 

7. Zang, L., Palmer-Toy, D., Hancock, W. S., Sgroi, D. C., Karger, B. L. (2004) 

Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection, 

LC-MS, and 16O/18O isotopic labeling. J. Proteome Res. 3, 604–612.

8. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., et al. (2004) Accurate 

qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma 

using laser capture microdissection coupled with isotope-coded affinity tag and 

two-dimensional liquid chromatography mass spectrometry. Mol. Cell. Proteomics 

3, 399–409. 

9. Zhang, D., Tai, L. K., Wong, L. L., Chiu, L. L., Sethi, S. K., Koay, E. S. (2005) Proteomic study reveals that proteins involved in metabolic and detoxification pathways are highly expressed in HER-2/neu-positive breast cancer. Mol. Cell. Proteomics.


10. Cowherd, S. M., Espina, V. A., Petricoin, E. F. III, Liotta, L. A. (2004) Proteomic 

analysis of human breast cancer tissue with laser-capture microdissection and 

reverse-phase protein microarrays. Clin. Breast Cancer 5, 385–392. 

11. Gulmann, C., Espina, V., Petricoin, E. III, Longo, D. L., Santi, M., Knutsen, T., 

et al. (2005) Proteomic analysis of apoptotic pathways reveals prognostic factors 

in follicular lymphoma. Clin. Cancer Res. 11, 5847–5855.

6 

Optimizing the Difference Gel Electrophoresis (DIGE) Technology

David B. Friedman and Kathryn S. Lilley 

Summary 

Difference gel electrophoresis (DIGE) technology has been used to provide a powerful quantitative component to proteomics experiments involving 2D gel electrophoresis. DIGE combines spectrally resolvable fluorescent dyes (Cy2, Cy3, and Cy5) with sample multiplexing for low technical variation, and uses an internal standard methodology to analyze replicate samples from multiple experimental conditions with unsurpassed statistical confidence for 2D gel-based differential display proteomics. DIGE experiments can readily accommodate enough independent (biological) replicate samples to control for the large interpersonal variation expected from clinical samples. Multivariate statistical analyses can then be used to assess the global variation in a complex set of independent samples, filtering out the noise from technical variation and normal biological variation and thereby focusing on the underlying variation that can describe different disease states. This chapter focuses on the design and implementation of the DIGE methodology using a pooled-sample internal standard in conjunction with the minimal CyDye labeling chemistry. Notes are also provided on the use of the alternative saturation labeling chemistry.

Key Words: difference gel electrophoresis; two-dimensional gel electrophoresis; 

quantification. 


Human disease phenotypes are a direct result of protein expression and 

modification. In many cases, such phenotypes cannot be tied directly to a single 

alteration in the genome or resulting proteome, but are likely to be the result 




of multiple factors. Studying disease at the protein level is challenging, but because proteins are the mediators of phenotype, the study of protein abundance on a global scale is required to gain a more complete understanding of the underlying molecular mechanisms of disease. Proteomics in the clinical setting is developing rapidly and is expected to have a major impact on the way diseases are diagnosed, treated, and monitored (1). It has been estimated that there could be hundreds of thousands of different protein isoforms in a mammalian cell, but the vast dynamic range of protein abundance means that only the most abundant proteins are observable by quantitative proteomics approaches unless biochemical or subcellular fractionation, which is itself technically variable, is employed. The repertoire of techniques and associated hardware applied to this field is expanding rapidly, and although complete visualization of the proteome is still beyond the reach of any single technique, each technology platform can provide complementary datasets.

Difference gel electrophoresis (DIGE) has proven to be a powerful quantitative 

technology for differential display proteomics on a global level, where 

the individual abundance changes for thousands of intact proteins can be simultaneously 

monitored in replicate samples over multiple variables with statistical 

confidence (see Note 1). This includes quantitative information on protein 

isoforms that arise due to post-translational modifications (such as acetylation 

or phosphorylation), which result in a change in the isoelectric point of the 

protein. This also includes splice variants and the results of protein processing, 

all of which are resolved for individual quantification and subsequent analysis 

by MS. 

DIGE is based on conventional 2D gel technology that is capable of resolving 

several thousands of intact proteins first by charge using isoelectric focusing 

(IEF) and then by apparent molecular mass using SDS-polyacrylamide gel 

electrophoresis (PAGE) (6,7) (see Note 2 and Chapters 4 and 5 by Cho et al. 

and Zhang et al., respectively). Importantly, DIGE overcomes many of the 

limitations commonly associated with 2D gels such as analytical (gel-to-gel) 

variation and limited dynamic range that can severely hamper a quantitative 

differential display study. This is accomplished by labeling samples with up to three spectrally resolvable fluorescent dyes (Cy2, Cy3, and Cy5, referred to as CyDyes) that enable low- to subnanogram sensitivity with a >10⁴ linear dynamic range, and then multiplexing the prelabeled samples into the same analytical run (2D gel). Multiplexing in this way allows direct quantitative measurements between the samples coresolved in the same gel, and therefore avoids the limitations imposed by between-gel comparisons with conventional 2D gels.

The greatest statistical power of this multiplexing approach comes from the use of a pooled-sample internal standard composed of an equal aliquot


of every sample in the experiment (see Subheading 1.2.1). With this method, 

two dyes (Cy3 and Cy5) are used to individually label two independent samples 

from a much larger experiment, and the Cy2 dye is used to label an internal 

standard, which is composed of an equal aliquot of proteins from every sample

in the experiment. This pooled-sample internal standard is labeled only once in 

bulk to avoid additional technical variation, and enough is made and labeled to 

allow for an equal aliquot to be coresolved on each gel. The three differentially 

labeled samples are then coresolved on the same 2D gel, after which direct 

measurements can be made for each resolved protein using the spectrally 

exclusive dye channels without interference from technical variation of the 

separation (gel-to-gel variation). 

Rather than making direct quantitative measurements between the two 

samples in the gel, the measurements are instead made relative to the Cy2 signal 

for each resolved protein. The Cy2 signal should be the same for a given protein 

across different gels because it came from the same bulk mixture/labeling; 

therefore, any difference represents gel-to-gel variation, which can be effectively 

neutralized by normalizing all Cy2 values for a given protein across all 

gels. Using the Cy2 signal to normalize ratios between gels then allows for the 

Cy3:Cy2 and Cy5:Cy2 ratios for each protein within each gel to be normalized 

to the cognate ratios from the other gels, encompassing all samples. Each gel 

may contain different (and/or replicate) samples in the Cy3 and Cy5 channels, 

but all samples can be quantified relative to each other because each protein from each sample is measured relative to the cognate Cy2 signal from the internal standard present on each gel. With sufficient replicates, a wide range of advanced statistical tests can be applied to highlight proteins whose change in expression is related to the disease state under investigation. Because the technical noise is low, these essential replicates should be independent (biological) replicates, as most of the observed variation will be related to the clinical samples rather than to technical or experimental factors.
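The core of the internal-standard normalization can be illustrated with a few hypothetical spot volumes. The sketch below assumes background-subtracted volumes that are already matched across gels, and expresses each Cy3 or Cy5 spot as a log2 ratio to the Cy2 spot on the same gel. The function name and all numbers are illustrative only; commercial packages such as DeCyder apply further in-gel normalization that is not reproduced here.

```python
import math


def standardized_log_abundance(sample_volume: float, cy2_volume: float) -> float:
    """log2 ratio of a sample spot volume to the cognate Cy2 spot volume on the same gel."""
    return math.log2(sample_volume / cy2_volume)


# Hypothetical spot volumes for ONE matched protein on two gels. Gel 2 is
# uniformly twice as bright (a gel-to-gel effect), which also scales the
# Cy2 internal standard on that gel.
spot_volumes = {
    "gel1": {"Cy2": 1000.0, "Cy3": 1000.0, "Cy5": 2000.0},
    "gel2": {"Cy2": 2000.0, "Cy3": 2000.0, "Cy5": 4000.0},
}

# Ratio each sample channel to the Cy2 signal from the same gel.
standardized = {
    (gel, dye): standardized_log_abundance(channels[dye], channels["Cy2"])
    for gel, channels in spot_volumes.items()
    for dye in ("Cy3", "Cy5")
}

# Samples on different gels become directly comparable:
# Cy5 on gel 1 vs. Cy3 on gel 2 gives a log2 difference of 1.0 (2-fold).
cross_gel_log2_ratio = standardized[("gel1", "Cy5")] - standardized[("gel2", "Cy3")]
print(standardized, cross_gel_log2_ratio)
```

Because the brightness difference between the two gels affects the Cy2 channel in the same way, it cancels in the ratio, which is the property the text relies on.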

In a final step, specific proteins of interest are identified using standard mass spectrometry (MS) approaches on gel-resolved proteins that have been excised and proteolyzed into a discrete set of peptides. Briefly, excised proteins are subjected to in-gel digestion, typically with trypsin, and MS is used to acquire accurate mass measurements of the resulting peptides as well as fragmentation spectra of individual peptides. The mass spectral data are then used to identify statistically significant candidate protein matches with database search algorithms that compare the observed data with theoretical peptide masses (peptide mass fingerprinting) or collision-induced fragmentation patterns (tandem MS) generated in silico from protein sequences in databases (see Chapter 19 by Fitzgibbon et al.).
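To make the database-matching step concrete, the sketch below performs an in silico tryptic digest and a simple peptide-mass-fingerprint count. It assumes the textbook trypsin rule (cleavage after K or R except before P), no missed cleavages or modifications, monoisotopic residue masses, and singly protonated [M+H]+ ions. The function names and the 20 ppm tolerance are illustrative; real search engines also score matches statistically.

```python
# Monoisotopic amino acid residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (termini)
PROTON = 1.00728   # singly protonated ion


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K/R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Theoretical monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def count_matches(observed_masses, sequence, tol_ppm=20.0):
    """Number of observed masses matching any theoretical peptide within tol_ppm."""
    theoretical = [mh_plus(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for obs in observed_masses
    )


print(tryptic_digest("AKGRPK"), count_matches([218.1500, 500.0], "AKGRPK"))
```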


1.1. Optimizing Sensitivity and Resolution 

There are currently two forms of CyDye labeling chemistry available: minimal labeling, which uses N-hydroxysuccinimidyl (NHS) ester reagents for low-stoichiometry labeling of proteins, largely via lysine residues; and saturation labeling, which uses maleimide reagents for the stoichiometric labeling of cysteine sulfhydryls.

The most established DIGE chemistry is the “minimal labeling” method, which has been commercially available since July 2002. Here the CyDye DIGE fluors are supplied as NHS esters, which react with the ε-amino groups of lysine side chains. The three fluors are mass matched (ca. 500 Da) and carry an intrinsic +1 charge to compensate for the loss of each proton-accepting site that becomes labeled (thereby maintaining the pI of the labeled protein). Each dye molecule also adds a hydrophobic component to the protein, which, along with molecular weight, influences how proteins migrate in SDS-PAGE.

Minimal labeling reactions are optimized so that only 2–5% of the total number of lysine residues are labeled, and on average a given labeled protein carries only one dye molecule. This is necessary because lysine is an abundant amino acid, and multiple labeling events may increase the hydrophobicity of some proteins to the point that they no longer remain soluble under 2DE conditions. Although a given protein form may exhibit specific labeling efficiencies, these will be the same for all three dyes, allowing direct relative quantification. Minimal labeling with CyDye DIGE fluors is very sensitive, comparable to silver staining or postelectrophoretic fluorescent stains such as Sypro Ruby, Deep Purple, or Flamingo Pink (ca. 1 ng), but with a linear response to protein concentration over five orders of magnitude (8) (see Note 3).

For maleimide labeling of cysteine sulfhydryls, the overall lower cysteine content of proteins allows these residues to be labeled to saturation without increasing the overall hydrophobicity of the proteins enough to cause insolubility problems. Saturation labeling is ultimately more sensitive (150–500 pg, and even more so for proteins with high cysteine content). Its use is not as commonplace, most likely because only Cy3 and Cy5 are available with this chemistry (see Note 4), because it is blind to the small but significant population of proteins that contain no cysteine, and because additional optimization of complete cysteine reduction is necessary for reproducible labeling. For these reasons, saturation DIGE is usually reserved for experiments where samples are limited and the advantage of increased sensitivity outweighs these additional considerations.

To maximize the information that can be gained from DIGE experiments, it is imperative to optimize the resolution of protein species within gels. Although single 2DE runs can resolve proteins with pI values between 3 and 11 and apparent molecular masses between 10 and 200 kDa, higher resolution and sensitivity can be obtained by running a series of medium-range (e.g., pH 4–7, 7–11) and narrow-range (e.g., pH 5–6) IEF gradients with increasing protein loads, leading to an overall more comprehensive proteomic analysis (6,7,10) (see Note 5). This is analogous to gaining resolution and sensitivity in an LC/MS-based strategy by using multiple high-performance liquid chromatography columns with different affinity chemistries [e.g., MudPIT (12)]. Much of the sensitivity limitation associated with 2D gels can be attributed to the analysis of unfractionated whole-cell and whole-tissue extracts. Additional sensitivity can be gained by enriching for the proteins of interest, such as by analyzing prefractionated or subcellular samples, or immune complexes. However, the additional experimental manipulations required for prefractionation introduce more technical variation into the samples and necessitate more independent (biological) replicates (which can be accommodated with the DIGE internal standard methodology).
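The resolution gain from medium- and narrow-range gradients can be seen with simple arithmetic: the same strip length is spread over fewer pH units. This sketch assumes a linear gradient and uses a 24 cm strip only as an example; the helper name is hypothetical, and nonlinear versions of wide-range strips also exist.

```python
def cm_per_ph_unit(strip_length_cm: float, ph_low: float, ph_high: float) -> float:
    """Separation distance available per pH unit, assuming a linear IPG."""
    return strip_length_cm / (ph_high - ph_low)


# Same 24 cm strip, three gradient widths.
for low, high in [(3, 11), (4, 7), (5, 6)]:
    print(f"24 cm, pH {low}-{high}: {cm_per_ph_unit(24, low, high):.1f} cm per pH unit")
```

Narrowing the range from pH 3–11 to pH 5–6 gives each pH unit eight times more distance, which is why higher protein loads can then be applied without spots merging.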

The identification of proteins of interest by MS can be performed directly from the DIGE gels when protein amounts have been optimized in this way (see Subheading 3.5). Alternatively, some experimental approaches perform DIGE analysis using “analytical” gels with lower protein amounts, followed by protein excision from a secondary, “preparative” gel with higher protein amounts. This approach has advantages when dealing with small sample amounts, as is often the case with the saturation dye chemistry, but it is also prone to uncertainties arising from the disproportionate protein loading between the two gels (see Note 6). The methods presented in this protocol optimize both the DIGE data and the material for subsequent MS, using high protein loads.

1.2. Optimizing Statistical Significance 

1.2.1. Using the Internal Standard 

The ability to coresolve and compare two or three samples in a single gel is 

attractive, because it allows for direct relative quantification for a given protein 

without any interference from gel-to-gel variations in migration and resolution, 

removing the need for running replicate gels for each sample (similar to stable 

isotope LC/MS-based strategies, see Chapter 10). This approach has limited 

statistical power, however, since confidence intervals are determined based on 

the overall variation within a population (see Subheading 3.6.2). 

Many researchers new to DIGE technology are not immediately aware of the additional statistical advantage and multiplexing capability gained by combining this approach with a pooled-sample mixture as an internal standard for a series of coordinated DIGE gels (13). This design allows repeated measurements (vital to any type of experimental investigation) in a way that both controls for gel-to-gel variation and provides increased statistical confidence. In this way, statistical confidence can be measured for each individual protein based on the variance of repeated measurements, independent of the variation in the population. Incorporating independently prepared replicate samples into the experimental design also controls for unexpected variation introduced during sample preparation.

This more complex and statistically powerful experimental design is accomplished by using one of the three dyes (usually Cy2) to label an internal standard composed of equal aliquots of protein from all of the samples in an experiment. The total amount of Cy2-labeled internal standard is chosen so that an equal aliquot can be coresolved within each DIGE gel, which also contains an individual Cy3- and Cy5-labeled sample from the experiment. Because this standard is composed of all of the samples in a coordinated experiment, each protein in a given sample should be represented in the standard and thus have its own unique internal standard (see Note 7). Direct quantitative comparisons are made individually for each resolved protein between the Cy3- or Cy5-labeled samples and the cognate protein signal from the Cy2-labeled standard for that gel (without interference from gel-to-gel variation), resulting in a standardized abundance for every spot matched across all gels within a multigel experiment. The individual signals from the internal standard are also used to normalize and compare each in-gel direct quantitative comparison for that particular protein with those from the other gels. Using the Cy2-labeled standard in this fashion therefore allows more precise and complex quantitative comparisons between gels, including independent (biological) sample repetition (Fig. 1).
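A coordinated multigel design of this kind can be laid out programmatically. The sketch below is a hypothetical loading-matrix generator, assuming one Cy3 and one Cy5 sample plus a Cy2 pool aliquot per gel; it rotates the condition order for each replicate so that no gel carries two samples of the same condition and every condition is seen with both dyes. It is one simple scheme among many and is not claimed to reproduce the exact assignment in Fig. 1B.

```python
from collections import deque


def loading_matrix(conditions, n_replicates):
    """Hypothetical gel/dye assignment for an internal-standard DIGE design."""
    ordered = []
    for rep in range(n_replicates):
        rotated = deque(conditions)
        rotated.rotate(-rep)  # shift the condition order for each replicate
        ordered.extend(f"{c}{rep + 1}" for c in rotated)
    if len(ordered) % 2:
        raise ValueError("need an even number of samples (two per gel)")
    return [
        {"gel": g + 1, "Cy2": "pool", "Cy3": ordered[2 * g], "Cy5": ordered[2 * g + 1]}
        for g in range(len(ordered) // 2)
    ]


# Four conditions (A-D) in triplicate -> six gels, as in the Fig. 1 scenario.
for row in loading_matrix(["A", "B", "C", "D"], 3):
    print(row)
```

In practice, many groups also randomize the assignment; the deterministic rotation here simply makes the balance easy to check.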

Importantly, the internal standard experimental design allows for the identification 

of significant changes that would not have been identified if the analyses 

were performed separately, even when using Cy3- and Cy5-labeled samples on 

the same DIGE gel (14). This experimental design also allows for multivariable 

analyses to be performed in one coordinated experiment, whereby statistically 

significant abundance changes can be quantitatively measured simultaneously 

between several sample types (e.g., different genotypes, drug treatments, or 

disease states), with repetition and without the necessity for every pairwise 

comparison to be made within a single DIGE gel (15,16) (see Note 8 and 

Chapter 17 by Carpentier et al.). 

1.2.2. Assessing Intersample Variation 

Clinical proteomics is hampered by the significant variation associated with patient samples. The largest proportion of this variation comes from biological diversity, but a significant amount may also come from variable collection and storage of biological samples. It is vitally important to identify changes in protein abundance that are disease specific rather than patient or sample specific.

Fig. 1. Illustration of DIGE and experimental design using the mixed-sample internal standard. (A) Representative gel from a six-gel set containing three differentially labeled samples: Cy2-labeled internal standard, Cy3-labeled sample #1, and Cy5-labeled sample #2. The individual protein forms all coresolve in this one gel, but the three independently labeled populations of proteins can be imaged individually using the mutually exclusive excitation/emission properties of the CyDyes. (B) Schematic of the sample loading matrix indicating gel number, CyDye labeling, and three replicates (indicated as “1, 2, and 3”) of the four conditions being tested (A, B, C, D). Within the boxed region representing each labeled sample, a theoretical protein that is upregulated in condition D is depicted. Dotted lines illustrate how the protein signals from each sample are directly quantified relative to the Cy2 internal standard signal for that protein without interference from gel-to-gel variation, and how the Cy3:Cy2 and Cy5:Cy2 intragel ratios are normalized between the six gels. (C) A graphical representation of the normalized abundance ratios for this theoretical protein change. Adapted from (10).

To obtain the more robust data sets needed to draw accurate conclusions from clinical proteomics studies, it is therefore necessary to collect and store samples using stringent, closely followed protocols. It is also necessary to assess the biological variation within the population being tested and within a single individual. Interindividual variation has been the focus of several studies (17,18). Determining the typical variation within a single patient (i.e., by taking longitudinal samples and assessing variability in protein abundance) and between patients establishes the minimum number of patient samples required for an experiment. This is an essential step before embarking on any large-scale and potentially costly DIGE experiment. Without this type of pretest, the results of underpowered experiments risk being peppered with false information (both false positives and false negatives).
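Once a pilot study has estimated the variation, a first-pass sample size can be computed. The sketch below uses the standard normal-approximation formula for a two-group, two-sided comparison of one protein, n per group = 2((z₁₋α/₂ + z₁₋β)·σ/Δ)². It assumes normally distributed log2 standardized abundances with equal SD in both groups; the function name and example values are hypothetical, and the approximation slightly underestimates n for very small groups.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group to detect a mean difference `delta` (log2 units)."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical: detect a 1.5-fold change with a pilot SD of 0.5 log2 units.
print(n_per_group(math.log2(1.5), 0.5))
```

Because the required n scales with σ², halving the variation (for example, through tighter collection and storage protocols) cuts the number of patients needed by roughly a factor of four.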

As with all complex technologies, the DIGE technique itself is subject to technical variation, which will be laboratory specific to a greater or lesser extent. However, the amplitude of this variation is generally outweighed by the biological variation associated with a typical sample set (19).

1.2.3. Univariate Statistical Analyses 

To date, the majority of published quantitative proteomics studies using DIGE technology have applied a univariate test, such as Student’s t-test or analysis of variance (ANOVA), to identify protein species with significant changes in expression [(20) and Chapter 17 by Carpentier et al.]. These tests calculate the probability (p) of observing a difference in expression at least as large as the one measured if the samples being compared were in fact the same, that is, if the apparent change had occurred by chance alone. An expression change is typically considered significant if the calculated p-value falls below a prescribed significance threshold, usually 0.05 (whereby, in the absence of any true change, about 1 in 20 tests will still report a change in expression by chance). For more stringent analyses, a p-value of 0.01 is often used as the significance threshold.
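For a single spot, the Student’s t-test described above can be sketched with the standard library alone. This assumes the inputs are log-transformed standardized abundances from two groups and that the pooled variance is nonzero; the two-sided p-value is obtained from the regularized incomplete beta function, evaluated with the continued-fraction method popularized by Numerical Recipes. The helper names are illustrative; in practice `scipy.stats.ttest_ind` or the DIGE software would normally be used.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # converged
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def students_t_test(group1, group2):
    """Pooled-variance two-sample t-test; returns (t, two-sided p)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # statistics.variance is the sample (n - 1) variance
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    p = _betai(df / 2.0, 0.5, df / (df + t * t))
    return t, p


print(students_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```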

When employing these tests on DIGE datasets, several factors must be considered if valid conclusions are to be drawn from the ensuing analyses. Student’s t-tests and ANOVA assume that the data are normally distributed and that variances are homogeneous. The measurement and correction of systematic bias within DIGE experiments have been the subject of several studies, which describe methods to optimize the normalization of data sets (21,22,23).

Another important consideration is the false discovery rate (FDR) that arises when statistical tests such as those described above are applied to DIGE data. These tests involve the simultaneous and independent testing of thousands of spots. Because each test carries its own probability of a false positive, a substantial number of false positives may accumulate across the experiment. There are several approaches to estimate the FDR and adjust p-values to compensate for this, the most widely used to date being the Benjamini and Hochberg method, whose use in conjunction with DIGE data has been described by Fodor et al. (21).
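The Benjamini and Hochberg step-up adjustment is short enough to show in full. The sketch below takes one p-value per matched spot and returns BH-adjusted p-values (often called q-values) in the original spot order; spots whose adjusted value falls at or below the chosen FDR (e.g., 0.05) would be reported. The function name and example p-values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p downward
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Taking the running minimum from the largest p-value downward keeps the adjusted values monotone with rank, which is what distinguishes BH-adjusted p-values from the raw p·m/rank products.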

1.2.4. Multivariate Statistical Analyses 

Discovery-phase proteomics often produces large lists of proteins identified as changing significantly in the experiment, many of which may well be false positives. Another approach to reducing these is to apply additional multivariate statistical analyses to these datasets, which can help to filter out false positives that result from whole-sample outliers (i.e., sample misclassification and/or poor sample preparation technique). These analyses, such as principal component analysis (PCA), partial least squares discriminant analysis, and unsupervised hierarchical clustering (HC) (see Figs. 2 and 3 and Chapter 16 by Marengo et al.), have recently been applied to DIGE datasets (10,24,25,26,27,28,29,30,31,32). Raw and normalized data can be exported from most DIGE software solutions (e.g., DeCyder, Progenesis), and several multivariate analyses are now part of the extended data analysis (EDA) software module of the DeCyder suite (GE Healthcare), which was specifically developed for DIGE analysis (see Subheading 3.6).

These multivariate analyses essentially work by comparing the expression patterns of all (or a subset of) proteins across all samples, using the variation in expression patterns to group or cluster individual samples. Technical noise (poor sample preparation, run-to-run variation) and biological noise (normal differences between samples, especially in clinical samples) are almost always associated with any analytical dataset of this nature and may well override any variation that arises from actual differences related to the biological questions being tested. Unsupervised clustering of related samples therefore adds confidence that a “list of proteins” changing in a DIGE experiment is not arising stochastically (10).

[Fig. 2 image: PCA score plot, PC1 vs. PC2, with sample groups labeled Δfur, Heme, –Fe, and control]

Fig. 2. Illustration of the use of principal component analysis. DIGE was used to analyze changes in Staphylococcus proteins in response to genetic and chemical alterations affecting iron utilization. Adapted from (24).

Fig. 3. Hierarchical clustering (by average distance correlation) of representative novel circadian proteins detected by 2D DIGE of soluble protein extracts from mouse liver. Pale gray represents low levels of protein expression, black represents intermediate levels, and dark gray represents high levels of expression. Adapted from (32).
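The idea behind a PCA score plot such as Fig. 2 can be sketched with NumPy. This assumes a samples × spots matrix of log standardized abundances (for example, exported from the DIGE software) with missing values already removed; the data below are simulated and hypothetical. PCA is computed here by singular value decomposition of the column-centered matrix, which is one standard way to do it, not necessarily how any particular DIGE package implements it.

```python
import numpy as np


def pca_scores(data: np.ndarray, n_components: int = 2):
    """Return sample scores and explained-variance fractions for the first components."""
    centered = data - data.mean(axis=0)  # center each spot across samples
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated data: 4 control and 4 treated samples, 50 matched spots.
rng = np.random.default_rng(0)
control = rng.normal(0.0, 0.1, size=(4, 50))
treated = rng.normal(0.0, 0.1, size=(4, 50))
treated[:, :10] += 1.0  # hypothetical: 10 spots up 2-fold (log2 scale)

scores, explained = pca_scores(np.vstack([control, treated]))
print(scores[:, 0].round(2), explained.round(2))
```

If samples separated along PC1 by preparation date or gel rather than by condition, that would be a warning sign of the technical outliers discussed above.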


1.3. DIGE in the Clinical Setting 

Although the potential of DIGE in clinical studies is only beginning to be explored [for example, see (29,30)], many studies have been published demonstrating the feasibility and benefit of DIGE/MS using small patient cohorts in preliminary studies of colon (14), liver (33,34,35), breast (36,37), esophageal (38,39), and pancreatic (40) cancers, as well as other important clinical conditions such as severe acute respiratory syndrome (SARS) (41). Many studies also exploit the important benefit of procuring samples by laser capture microdissection (LCM; see Chapters 3, 5, and 9 by Diaz et al., Zhang et al., and Mustafa et al., respectively) to obtain a highly enriched population of the cells under study (16,30,42,43,44). These LCM studies require the saturation chemistry because of its increased sensitivity, despite its more limited multiplexing power, and typically require secondary preparative gels with higher protein loads to enable protein identification by MS.

The study of Suehara et al. (29) illustrates the utility of a multivariable DIGE/MS analysis with an extended sample set pertinent to a clinical study. Eighty soft tissue sarcoma samples comprising seven different histological backgrounds were analyzed. Using the saturation DIGE fluors, individual samples were labeled with Cy5 and multiplexed with a pooled-sample internal standard (labeled in bulk with Cy3) on each DIGE gel. Using high-resolution 2D gel separations and a combination of multivariate statistical tools (support vector machines, leave-one-out cross-validation, PCA, and HC), the authors identified a small subset of proteins, including tropomyosin and HSP27, that were able to discriminate between the different classes of tumors. HSP27 in particular was part of a subclass of discriminating proteins that could distinguish between leiomyosarcoma and malignant fibrous histiocytoma (MFH), as well as correlate with patient survival between low-risk and high-risk groups. HSP27 has long been associated with prognosis in MFH as well as in various human carcinomas (45).
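Leave-one-out cross-validation (LOOCV), one of the tools named above, can be illustrated in a few lines. In this hypothetical sketch a simple nearest-centroid classifier is swapped in for the support vector machines used by Suehara et al., purely to keep the example dependency-free; each sample is a vector of log standardized abundances for a few candidate spots, and the data and class labels are invented.

```python
import math


def nearest_centroid(train, x):
    """Assign x to the class whose mean feature vector is closest (Euclidean)."""
    centroids = {}
    for label in {lab for _, lab in train}:
        vectors = [v for v, lab in train if lab == label]
        centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def loocv_accuracy(samples):
    """samples: list of (feature_vector, class_label); each sample is held out once."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # all samples except the held-out one
        correct += nearest_centroid(train, x) == label
    return correct / len(samples)


# Hypothetical log2 standardized abundances of two candidate spots per tumor.
tumors = [
    ([0.0, 0.0], "LMS"), ([0.1, 0.0], "LMS"), ([0.0, 0.1], "LMS"),
    ([2.0, 2.0], "MFH"), ([2.1, 2.0], "MFH"), ([2.0, 2.1], "MFH"),
]
print(loocv_accuracy(tumors))
```

Because every sample is classified by a model that never saw it, LOOCV gives a less optimistic estimate of discriminating power than accuracy measured on the training data.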

2. Materials 

This chapter assumes a solid understanding of 2D gel electrophoresis and focuses on the design and implementation of the DIGE method using the pooled-sample internal standard methodology and the minimal dye chemistry for Cy2, Cy3, and Cy5, with notes provided for the saturation labeling chemistry.

2.1. Cell Lysis Buffers 

1. TNE: 50 mM Tris–HCl pH 7.6, 150 mM NaCl, 2 mM EDTA pH 8.0, 2 mM DTT, 1% (v/v) NP-40.

2. RIPA buffer: 50 mM Tris–HCl pH 8.0, 150 mM NaCl, 1% NP-40, 0.5% deoxycholic acid, 0.1% SDS.

3. Two-dimensional gel electrophoresis lysis buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 2 mg/mL DTT, 50 mM Tris–HCl pH 8.0.

4. ASB14 lysis buffer: 7 M urea, 2 M thiourea, 2% amidosulfobetaine-14 (ASB14), 50 mM Tris–HCl pH 8.0.

NB: Depending on the sample, it may also be necessary to add protease inhibitors and phosphatase inhibitors [sodium pyrophosphate (1 mM), sodium orthovanadate (1 mM), beta-glycerophosphate (10 mM), and sodium fluoride (50 mM)] to the chosen lysis buffer (see Subheading 3.1).
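The weigh-outs for the urea/thiourea lysis buffer (item 3) follow from mass = molarity × volume × molecular weight, % (w/v) = g per 100 mL, and C1V1 = C2V2 for stock dilutions. The sketch below uses standard molecular weights; the 10 mL batch size and the 1 M Tris–HCl pH 8.0 stock are assumptions, and the helper names are hypothetical. Because urea and thiourea occupy substantial volume, the solids are dissolved first and then brought to the final volume.

```python
MW_G_PER_MOL = {"urea": 60.06, "thiourea": 76.12, "DTT": 154.25}


def grams_for_molarity(component, molar, volume_ml):
    """Mass (g) giving `molar` concentration in `volume_ml` of final volume."""
    return molar * (volume_ml / 1000.0) * MW_G_PER_MOL[component]


def grams_for_percent_wv(percent, volume_ml):
    """% (w/v) means grams per 100 mL of final volume."""
    return percent / 100.0 * volume_ml


def stock_volume_ml(stock_mM, final_mM, volume_ml):
    """C1*V1 = C2*V2: volume of stock needed to reach `final_mM` in `volume_ml`."""
    return final_mM * volume_ml / stock_mM


def mg_per_ml_to_mM(mg_per_ml, mw):
    """Convert a mass concentration (mg/mL) to millimolar."""
    return mg_per_ml / mw * 1000.0


volume_ml = 10.0  # assumed batch size
recipe_g = {
    "urea": grams_for_molarity("urea", 7.0, volume_ml),
    "thiourea": grams_for_molarity("thiourea", 2.0, volume_ml),
    "CHAPS": grams_for_percent_wv(4.0, volume_ml),
    "DTT": 2.0 * volume_ml / 1000.0,  # 2 mg/mL, expressed in g
}
tris_stock_ml = stock_volume_ml(1000.0, 50.0, volume_ml)  # hypothetical 1 M stock

for name, grams in recipe_g.items():
    print(f"{name:10s} {grams * 1000:8.1f} mg")
print(f"1 M Tris-HCl pH 8.0 stock: {tris_stock_ml} mL")
print(f"2 mg/mL DTT = {mg_per_ml_to_mM(2.0, MW_G_PER_MOL['DTT']):.1f} mM")
```

The last line also shows that 2 mg/mL DTT corresponds to about 13 mM, the figure quoted for the rehydration buffer.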

2.2. SDS-Polyacrylamide Gel Electrophoresis 

1. Immobilized pH gradient (IPG) strips and accompanying ampholyte mixtures can be purchased from a number of commercial vendors. Strip lengths range from 7 cm to high-resolution 24 cm strips, and pH ranges vary from wide-range (e.g., pH 3–11) to high-resolution narrow-range (e.g., pH 5–6) strips.

2. Bind silane working solution (50 mL): 40 mL ethanol, 1 mL acetic acid, 50 μL 

bind silane solution (GE Healthcare), 9 mL water (see Note 9). 

3. 4× separating gel buffer: 1.5 M Tris-base pH 8.8.

4. 30% acrylamide:bis-acrylamide (37.5:1), N,N,N′,N′-tetramethylethylenediamine (TEMED), and ammonium persulfate.

5. 10× SDS-PAGE running buffer (1 L): 30.25 g Tris-base, 144.13 g glycine, 10 g SDS (0.1% SDS at 1×).

6. Fixing solution for SyproRuby staining (1 L): 100 mL methanol, 70 mL acetic 

acid, 830 mL water. SyproRuby stain is available from several commercial

sources and can be substituted by other total protein stains, such as Deep Purple 

(GE Healthcare) or Flamingo Pink (BioRad). 

7. Two-dimensional equilibration buffer: 6 M urea, 50 mM Tris-base pH 8.8, 30% 

glycerol, 2% SDS, trace bromophenol blue. 

8. Water-saturated butanol (see Note 10). 

9. Dithiothreitol (store desiccated).

10. Iodoacetamide (store desiccated, keep in the dark).

2.3. DIGE Labeling Materials 

1. N,N-dimethylformamide (DMF) (see Note 11).

2. Labeling (L) buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 30 mM Tris-base (do not adjust the pH, but ensure that the pH of the final solution is between 8.0 and 9.0), 5 mM magnesium acetate (see Note 12). Alternatively, 4% CHAPS can be replaced with 2% ASB14, especially when membrane-rich samples are being used.

3. Rehydration (R) buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 2 mg/mL DTT (13 mM; 0.2% w/v).


4. Cyanine dyes with NHS-ester chemistry for minimal labeling (Cy2, Cy3, and 

Cy5), and with maleimide chemistry for saturation labeling (Cy3 and Cy5) are 

available from GE Healthcare as dry solids. 

5. Quenching solution (for minimal labeling): 10 mM lysine. 

6. Dithiothreitol reduction stock solution: 200 mg/mL DTT. 

3. Methods 

DIGE is a powerful technique for quantitative multivariable differential display proteomics. However, the quality of the data will only be as good as the quality of the underlying 2D gel electrophoresis technology on which it is based. The main focus of this chapter is to provide detailed notes on the DIGE technology; however, some key considerations for successful high-resolution 2D gel electrophoresis are also provided. This section describes methods for minimal labeling with CyDyes.

3.1. Sample Preparation 

The key to success for any analytical measurement is robust sample preparation. This includes not only the buffers and materials used but also the nature of the samples and the way in which they are procured. The addition of exogenous materials (such as DNase or RNase), or uncontrolled manipulation of the sample (such as conditions that may lead to proteolysis), can severely hamper and sometimes completely prevent an analysis. Care should be taken to guard against common laboratory contaminants (e.g., mycoplasma in tissue culture), which, if present, may be detected as significant changes by DIGE, either because they are present in only a subset of samples or because they respond to the experimental perturbation.

1. Prepare protein extracts using any method of preference. 

The appropriate amount of protein can subsequently be precipitated prior to resuspension in the CyDye labeling buffer (see Subheading 3.2). Guarding against proteolysis and loss of post-translational modifications (e.g., phosphorylation) is of paramount importance.

Care should be taken not to use reagents that will resolve on the 2D gel, such as soybean trypsin inhibitor. Small-molecule inhibitors such as aprotinin, leupeptin, pepstatin A, antipain, 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride (AEBSF), sodium orthovanadate, okadaic acid, and microcystin, among others, are far better choices.

2. Lyse cells using standard lysis buffers such as TNE and RIPA buffers, or even the buffers used for 2D gel electrophoresis.

All of these buffers can produce high-resolution samples for 2DE. In most cases, reagents that would otherwise interfere with CyDye labeling (such as those that contain primary amines) will be removed prior to labeling by protein precipitation (see Subheading 3.2).

3. Sonicate cells if necessary to improve sample quality. 

Sonication improves sample quality by shearing nucleic acids, which are subsequently removed by sample cleanup (see Subheading 3.2) along with phospholipids. Both of these nonproteinaceous ionic components can severely degrade resolution during IEF.

Short bursts with a tip sonicator are suggested. It is important to keep the sample chilled, especially urea-containing samples, which should never be heated (see Note 12).

4. Determine the protein concentration of the sample using an assay that is compatible with the buffer in which the proteins are extracted.

The CHAPS and thiourea in the buffers used for DIGE, although effective for protein solubilization, interfere with the Bradford and/or bicinchoninic acid (BCA) assays, making the data inaccurate and unreliable. In these cases, aliquots should be precipitated and redissolved in a suitable buffer prior to quantification, or a detergent-compatible assay should be used.

5. Aim to use a protein concentration between 1 and 10 mg/mL. 

Too dilute and it will be difficult to quantitatively recover proteins following 

precipitation cleanup (see Subheading 3.2); too concentrated and it will be 

difficult to accurately dispense the appropriate volume for the experiment. 

Freeze/thawing should also be kept to a minimum; freezing samples in 1 mL 

aliquots or less will usually suffice. 
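The concentration window above matters because of the volumes it implies. This sketch, assuming 150 μg per sample (the amount in Table 1) and the 100 μL starting volume of the MeOH/CHCl₃ precipitation in Subheading 3.2, checks how much extract would have to be dispensed; the function name is hypothetical and both amounts can be adjusted.

```python
def volume_needed_ul(protein_ug: float, conc_mg_per_ml: float) -> float:
    """Volume of extract (uL) holding protein_ug; 1 mg/mL equals 1 ug/uL."""
    return protein_ug / conc_mg_per_ml


START_VOLUME_UL = 100.0  # assumed: starting volume of the MeOH/CHCl3 protocol

for conc in (0.5, 1.0, 5.0, 10.0):
    v = volume_needed_ul(150.0, conc)
    note = "exceeds start volume; scale reagents up" if v > START_VOLUME_UL else "ok"
    print(f"{conc:4.1f} mg/mL -> {v:6.1f} uL ({note})")
```

At the dilute end, the extract alone can exceed the starting volume, so all precipitation reagents would need to be scaled proportionally; at the concentrated end, only a few microliters are dispensed and pipetting error becomes relatively larger.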

3.2. Sample Cleanup 

The desired amount of each sample to be used in the experiment should be precipitated prior to labeling. This both removes nonproteinaceous ions (e.g., nucleic acids, phospholipids) that can interfere with IEF and transfers the proteins into a labeling buffer optimized for CyDye labeling and subsequent IEF. Determine how much total protein will be on each gel, and precipitate ½ of that amount for each sample to be run on that gel. This is straightforward for a two-sample gel, but it also works for multigel experiments in which 1/3 of the total protein on each gel comes from the pooled-sample internal standard, because two-thirds of each precipitated sample is labeled individually and one-third is contributed to the pool (see Table 1). Precipitate only what is needed for each sample in the experiment; too much material may create pellets that are difficult to resolubilize completely.

Table 1
Experimental Design for CyDye Labeling Using a Pooled-Sample Internal Standard

Samples: Gel 1 = Control-1 and Treated-1; Gel 2 = Control-2 and Treated-2; Gel 3 = Control-3 and Treated-3; Pool = Cy2-labeled internal standard.

Step                    Each individual sample (×6)             Pool
Precipitated amount     150 μg                                  –
L-buffer                24 μL                                   –
Aliquot                 16 μL                                   8 μL from each sample (×6)
Cy2                     –                                       6 μL
Cy3                     2 μL (one sample per gel, 3 in total)   –
Cy5                     2 μL (the other sample per gel, 3 in total)   –
(30 min on ice in the dark)
Lysine (quench)         2 μL                                    6 μL
(10 min on ice in the dark)
Total volume            20 μL                                   60 μL

For each gel, combine the quenched Cy3- and Cy5-labeled samples and add 1/3 of the quenched Cy2-labeled pooled mixture:

Per gel (Gels 1–3)      20 + 20 + 20 μL
2× R-buffer             60 μL
Total                   120 μL
R-buffer                to V_f (final volume)

This table illustrates a typical DIGE labeling experiment, as described in Subheadings 3.2 and 3.3.
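The amounts in Table 1 can be checked with a short calculation. The sketch below takes its default values directly from the table (150 μg precipitated into 24 μL, 16 μL labeled individually, 8 μL pooled, two samples per gel); the function name is hypothetical. It shows that each gel receives 300 μg, of which one-third comes from the Cy2 pool, consistent with the rule of precipitating half the per-gel amount for each sample.

```python
def table1_amounts(n_samples=6, precipitated_ug=150.0, l_buffer_ul=24.0,
                   labeled_ul=16.0, pooled_ul=8.0):
    """Protein amounts implied by the Table 1 volumes (two samples per gel)."""
    conc = precipitated_ug / l_buffer_ul          # ug/uL after resuspension
    labeled_ug = labeled_ul * conc                # per Cy3- or Cy5-labeled sample
    pool_ug = n_samples * pooled_ul * conc        # entire Cy2-labeled pool
    n_gels = n_samples // 2                       # two individual samples per gel
    pool_per_gel_ug = pool_ug / n_gels
    per_gel_ug = 2 * labeled_ug + pool_per_gel_ug
    return {
        "labeled_ug": labeled_ug,
        "pool_per_gel_ug": pool_per_gel_ug,
        "per_gel_ug": per_gel_ug,
        "pool_fraction": pool_per_gel_ug / per_gel_ug,
    }


amounts = table1_amounts()
print(amounts)
```

Changing `n_samples` (keeping it even) shows that the one-third pool fraction holds for any number of gels, as long as each sample contributes the same one-third of its precipitated protein to the pool.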



Many precipitation methods are available; the following MeOH/CHCl₃ protocol works well for DIGE and can easily be performed in 1.5 mL tubes [adapted from (46)]:

1. Bring the predetermined amount of protein extract to 100 μL with water.

2. Add 300 μL (3 volumes) water.

3. Add 400 μL (4 volumes) methanol.

4. Add 100 μL (1 volume) chloroform. 

5. Vortex vigorously and centrifuge; the protein precipitate should appear at the 

interface. 

6. Remove the water/MeOH mix above the interface, being careful not to disturb the interface. The precipitated proteins often do not form a visibly white layer, so take particular care at this step.

7. Add another 400 μL methanol to wash the precipitate. 

8. Vortex vigorously and centrifuge; the protein precipitate should now pellet to 

the bottom of the tube. 

9. Remove the supernatant and briefly dry the pellets in a vacuum centrifuge. 

10. Resuspend the pellets in a suitable amount of CyDye labeling buffer (L-buffer, 

see Table 1). 

An alternative widely used precipitation method is as follows: 

1. Add 5 volumes of cold 0.1 M ammonium acetate in methanol. 

2. Leave at –20°C for 12 h or overnight. 

3. Centrifuge at ∼3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant. 

4. A pellet of protein should be visible at this stage. 

5. To wash the pellet, add 80% 0.1 M ammonium acetate in methanol and mix to 

resuspend the protein. 

6. Centrifuge at 3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.

7. To dehydrate the pellet, add 80% acetone and resuspend the pellet by mixing.

8. Centrifuge at 3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.

9. Dry the pellet for 15 min by leaving the tube open in a laminar flow cabinet.
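The rpm needed to reach a given g force depends on the rotor radius, so converting with the standard relation RCF = 1.118 × 10^-5 × r(cm) × rpm² can help when a different centrifuge is used. The sketch below assumes that relation; the 8 cm radius in the example is a hypothetical microcentrifuge rotor, not a value from the protocol. Working backward, the protocol's pairing of ~3000 rpm with 1400×g implies a rotor radius of roughly 14 cm.

```python
import math

RCF_FACTOR = 1.118e-5  # empirical constant in RCF = RCF_FACTOR * r_cm * rpm^2

def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    return RCF_FACTOR * radius_cm * rpm ** 2

def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF at a given radius."""
    return math.sqrt(rcf / (RCF_FACTOR * radius_cm))

# Radius implied by the protocol's "3000 rpm (1400 x g)" pairing
implied_radius_cm = 1400 / (RCF_FACTOR * 3000 ** 2)
print(f"implied radius: {implied_radius_cm:.1f} cm")

# Hypothetical 8 cm rotor: speed needed for the same 1400 x g
print(f"rpm at 8 cm: {rpm_for_rcf(1400, 8.0):.0f}")
```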

3.3. DIGE Experimental Design 

1. Start with a preliminary gel. All experiments should start with a preliminary gel on representative samples to ensure equivalent protein amounts between samples, and that the highest resolution and sensitivity are obtained, before embarking on a multigel DIGE experiment (see Notes 6 and 13). The preliminary gel will also reveal any problems with sample preparation that may be corrected by adjusting the procurement methods (see Subheading 3.1). This step can also be used to determine the maximal amount of protein that can be loaded without adversely affecting resolution.

The preliminary gel needs to test only one or two of the samples from a much larger experiment. This gel can simply be stained with a total protein stain (e.g., Sypro Ruby or Deep Purple) to visually inspect the resolution and sensitivity.

Alternatively, the gel can contain two different samples prelabeled with Cy3 and Cy5 and coresolved (see Note 14).

2. Choose a suitable pH gradient for the IEF. Precast IEF strips are commercially available from several vendors. The longest strips currently available are 24 cm, providing the highest resolving power for a given pH range. Medium-range IEF gradients (e.g., pH 4–7) offer the best trade-off between overall resolution and sensitivity. Subsequent experiments can then be designed to resolve proteins in the basic range (pH 7–11) and in narrow pI ranges, with commensurate increases in protein loading, to gain access to the lower-abundance proteins in a given sample (see Note 5). In this way a more comprehensive picture of the proteomes under study can be obtained.

3. Incorporate a pooled-sample mixture internal standard on every DIGE gel in 

a coordinated experiment. This internal standard, usually labeled with Cy2, is 

composed of an equal aliquot of every sample in the entire experiment, and 

therefore represents every protein present across all samples in an experiment. The 

use of this pooled-sample internal standard on every DIGE gel in a coordinated 

experiment allows for the facile comparison of independent sample replicates 

with increased statistical confidence. This experimental design also enables the 

simultaneous quantitative comparison between multiple variables in a coordinated 

experiment (Fig. 1). 

4. Plan out which samples will be labeled with which dyes ahead of time. For 

minimal dye labeling chemistry (see Subheading 3.4), each gel will contain two 

individual samples labeled with either Cy3 or Cy5, and an equal amount of the 

pooled-sample internal standard. The example outlined in Table 1 is for a two-component comparison repeated in triplicate, with 300 μg total protein loaded onto each of three gels. In this case, 150 μg of each sample should be precipitated (see Subheading 3.2), resuspended in L-buffer, and then split 2:1. Two-thirds of each sample (100 μg) will be individually labeled with either Cy3 or Cy5. The remaining 1/3 of each sample (50 μg) will be pooled together and labeled with Cy2 to serve as an internal standard. This provides enough Cy2-labeled internal standard to load an amount equal to each Cy3- or Cy5-labeled sample onto every gel (see Note 15).
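Writing the plan down before labeling helps avoid mix-ups, and the allocation in step 4 is simple arithmetic. The sketch below is a hypothetical helper, not from the original protocol: it reproduces the Table 1 example (two conditions, three replicates, 150 μg per sample), and it assumes a dye swap that alternates which condition receives Cy3 on successive gels, which the text does not prescribe. The condition names are placeholders.

```python
from dataclasses import dataclass

@dataclass
class GelPlan:
    gel: int
    cy3_sample: str
    cy5_sample: str
    cy2_pool_ug: float   # pooled internal standard loaded on this gel
    total_ug: float      # total protein loaded on this gel

def plan_dige(conditions, replicates, ug_per_sample=150.0, dye_swap=True):
    """Two-condition DIGE plan: 2/3 of each sample is labeled individually
    (Cy3 or Cy5) and 1/3 goes into the Cy2-labeled pooled internal standard."""
    if len(conditions) != 2:
        raise ValueError("this sketch only handles two-condition comparisons")
    individual_ug = ug_per_sample * 2 / 3
    pool_total_ug = ug_per_sample / 3 * 2 * replicates
    pool_per_gel_ug = pool_total_ug / replicates  # one gel per replicate pair
    plans = []
    for r in range(1, replicates + 1):
        a, b = f"{conditions[0]}{r}", f"{conditions[1]}{r}"
        if dye_swap and r % 2 == 0:
            a, b = b, a  # assumed dye swap: alternate which condition gets Cy3
        plans.append(GelPlan(r, a, b, pool_per_gel_ug,
                             2 * individual_ug + pool_per_gel_ug))
    return plans

plans = plan_dige(["Ctrl", "Treat"], replicates=3)
for p in plans:
    print(p)
```

With these inputs each gel carries 100 μg of a Cy3-labeled sample, 100 μg of a Cy5-labeled sample, and 100 μg of the Cy2-labeled pool, 300 μg in total, matching the example in step 4.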

3.4. CyDye Labeling 

All steps are performed on ice. The following protocol is for sample loading 

via rehydration of IPG strips, and assumes incorporation of a pooled-sample 

internal standard to coordinate many samples across multiple DIGE gels simultaneously. 

The steps are summarized in Table 1 (see Note 16). 

1. Resuspend each precipitated sample in 24 μL of labeling (L) buffer. Remove 8 μL (1/3 of the sample) and place it into a new tube that will contain the pooled-sample internal standard (8 μL from each of the other individual samples will be pooled into this tube) (see Note 17).


2. CyDyes are purchased as dry solids and should be reconstituted to 10× stock 

solutions (1 nmol/μL) in fresh DMF. Dilute stock solutions of CyDyes 1:10 in 

fresh DMF to a final working concentration of 100 pmol/μL (see Note 11). 

3. Label each sample (50–250 μg) with 2–4 μL (200–400 pmol) of either Cy3 or Cy5 

working dilution for 30 min on ice in the dark. Label the pooled-sample mixture 

with 2–4 μL (200–400 pmol) of Cy2 working dilution for every equivalent amount 

of sample present in the pooled standard as compared with the individually labeled 

samples. That is, if 100 μg of each sample is labeled with 200 pmol of Cy3 or Cy5, then 50 μg of each of these samples is present in the pooled standard, and 200 pmol of Cy2 is used for every 100 μg of pooled standard (see Table 1 and Note 18).

4. Quench reactions with 2 μL of 10 mM lysine for 10 min on ice in the dark. 

5. For each gel, combine the quenched Cy3- and Cy5-labeled samples and add 1/3 

of the quenched Cy2-labeled pooled mixture. 

6. To each tripartite mixture, add an equal volume of 2× R-buffer and incubate on 

ice for 10 min. 2× R-buffer is R-buffer supplemented with an additional 2 mg/mL 

DTT using the 200 mg/mL DTT stock solution. DTT is omitted from the L-buffer 

to prevent unfavorable interaction with the CyDyes. Adding an equal volume of 

2× R-buffer to the quenched reactions provides the reducing agents to the total 

reaction volume at a 1× final concentration. 

7. Add R-buffer (1× DTT concentration) to a final volume suggested by the manufacturer 

for the given IPG strip length (e.g., 450 μL for 24 cm strips). Add the 

appropriate volume of IPG buffer (ampholines) to a final concentration of 0.5% (v/v) for IEF. Rehydrate the dry IPG strips with this mixture for >16 h, then proceed with IEF (see Subheading 3.5.3 and Note 19).
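The volumes in steps 1–7 and in Table 1 follow from a few ratios: a 100 pmol/μL working dilution, 200 pmol dye per 100 μg protein, and a 2:1 split of each 24 μL resuspension. The sketch below is a hypothetical calculator under those assumptions. It additionally assumes that the lysine quench is scaled to the dye volume (2 μL per 2 μL of dye), which matches the 6 μL used for the pooled standard in Table 1, and it uses the 450 μL rehydration volume quoted for 24 cm strips.

```python
WORKING_PMOL_PER_UL = 100.0      # 1 nmol/uL stock diluted 1:10 in DMF (step 2)
DYE_PMOL_PER_UG = 200.0 / 100.0  # 200 pmol dye per 100 ug protein (step 3 example)

def dye_ul(protein_ug: float) -> float:
    """Volume (uL) of CyDye working dilution for a given amount of protein."""
    return protein_ug * DYE_PMOL_PER_UG / WORKING_PMOL_PER_UL

def dige_volumes(n_samples: int, resuspend_ul: float = 24.0, labeled_ug: float = 100.0,
                 rehydration_ul: float = 450.0, ipg_buffer_pct: float = 0.5) -> dict:
    """Volumes (uL) for Subheading 3.4 with two samples plus the pool on each gel."""
    n_gels = n_samples // 2
    # Step 1: 2/3 of each resuspended sample is kept, 1/3 goes into the pool
    single_ul = resuspend_ul * 2 / 3
    pool_ul = resuspend_ul / 3 * n_samples
    pool_ug = labeled_ug / 2 * n_samples          # 50 ug of each sample in the pool
    # Steps 3-4: dye plus lysine quench (quench volume assumed equal to dye volume)
    single_total = single_ul + 2 * dye_ul(labeled_ug)
    pool_total = pool_ul + 2 * dye_ul(pool_ug)
    # Step 5: Cy3 sample + Cy5 sample + one n_gels-th of the pool
    per_gel = 2 * single_total + pool_total / n_gels
    # Step 6: equal volume of 2x R-buffer
    after_2x = 2 * per_gel
    return {
        "single_labeled": single_total,
        "pool_labeled": pool_total,
        "per_gel_mix": per_gel,
        "after_2x_r_buffer": after_2x,
        "r_buffer_to_add": rehydration_ul - after_2x,         # step 7
        "ipg_buffer": rehydration_ul * ipg_buffer_pct / 100,  # 0.5% (v/v)
    }

v = dige_volumes(6)
print(v)
```

For six samples on three gels this reproduces Table 1: 20 μL per individually labeled sample, 60 μL of labeled pool, 60 μL per gel before and 120 μL after adding 2× R-buffer, leaving 330 μL of R-buffer to reach 450 μL, with 2.25 μL of IPG buffer giving 0.5% (v/v).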

3.5. 2D Gel Electrophoresis and Poststaining 

As a result of the minimal labeling, quantification with the CyDyes is carried out on only the 2–5% of each protein that is labeled, and this labeled portion may migrate at a higher apparent molecular mass than the majority of the unlabeled protein because of the added mass and hydrophobicity of the dyes (an effect exacerbated in lower-Mr species). To ensure that the maximum amount of protein is excised for subsequent in-gel digestion and MS, minimally labeled 2D DIGE gels are poststained with a total protein stain such as SyproRuby or Deep Purple. Accurate excision is also ensured by preferentially affixing the second-dimension gel to a presilanized glass plate during gel casting so that the gel dimensions do not change during the analysis (see Notes 20 and 21).

These methods assume the use of the Ettan 2D electrophoresis system (GE Healthcare) but are easily adaptable to other commercially available systems. They also assume the use of high-resolution 24 cm × 20 cm gels.

1. Special gels for second dimension SDS-PAGE. Using low-fluorescence glass 

plates, pretreat one plate for each gel with 3–5 mL bind silane working solution,


carefully wiping the entire surface of the plate with a lint-free wipe. Leave treated 

plates covered with lint-free wipes for several hours to allow for sufficient outgassing 

of fumes (that may contain bind silane) before assembling gel plates and 

casting of second dimensional SDS-PAGE gels (see Note 22). 

2. Assemble plates and pour 12% homogeneous SDS-PAGE gel(s) using the appropriate 

amount of 30% stock acrylamide and 4× separating gel buffer for the 

volumes needed for the number of gels being poured (see Note 23). Overlay the 

gels with water-saturated butanol for several hours to provide a straight and level 

surface to place the focused IPG strip (see Note 10). 

3. Perform IEF of the combined tripartite-labeled samples, brought up to final volume with 1× R-buffer and passively rehydrated into IPG strips for >16 h (see Subheading 3.4.7), using an IPGphor II IEF unit (GE Healthcare) (see Note 24).

4. Equilibrate the focused IPG strips in second-dimension equilibration buffer. During this step, the cysteine sulfhydryls in the focused proteins are reduced and carbamidomethylated by supplementing the equilibration buffer with 1% DTT for 20 min at room temperature, followed by 2.5% iodoacetamide in fresh equilibration buffer for an additional 20 min at room temperature (see Note 25).

5. Place equilibrated IPG strip on top of the SDS-PAGE gels that were precast with 

low-fluorescence glass plates. Use a thin card or ruler to carefully tamp down the 

IPG strip to the SDS-PAGE gel, removing air bubbles at the interface (see Notes 

26 and 27). 

6. Perform second dimensional SDS-PAGE at constant wattage, using ≪1 W/gel 

for at least 1 h prior to ramping up to


two spot patterns, whereas most of the commercial products contain proprietary 

algorithms for protein spot detection, intergel matching, protein spot quantification, 

and even utilities for building web-based tools for data dissemination. 

Many include the ability to average replicate patterns into a single virtual 

pattern to be used in a comparative study. They are all designed to compare 

multiple spot patterns and quantify abundance changes for individual proteins 

between experimental conditions. 

Several software packages allow for the analysis of DIGE data. The DeCyder 

suite of software tools was specifically developed to support the DIGE platform 

when this technology was first marketed by GE Healthcare and is therefore used 

as an example here. The differential in-gel analysis (DIA) module of DeCyder is used for direct quantification of protein spot volume ratios between the triply codetected signals emanating from each resolved protein, and can be used for the simplest form of a DIGE experiment: pairwise comparisons with N = 1. The more advanced DIGE experiments that use the internal standard to cross-compare replicate samples from pairwise and multivariable analyses (N > 3) are handled by the biological variation analysis (BVA) module of DeCyder. In a BVA experiment, the signals emanating from the internal standard are used both for direct quantification within each DIGE gel in a coordinated set (using the DIA module) and for normalization and protein spot pattern matching between gels (see Note 31). This allows for the calculation of Student’s t-test and ANOVA statistics for individual abundance changes (see Subheading 3.6.2 and Table 2). BVA is also used to match patterns between SyproRuby- and CyDye-stained images to facilitate protein excision for subsequent MS (see Notes 20, 21, and 30).
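The role of the internal standard can be illustrated with a simplified calculation. The sketch below is not DeCyder's algorithm: it assumes matched spot volumes are already available, expresses each sample spot as a log2 ratio to the Cy2 standard on the same gel, and median-centres each gel as a crude stand-in for whole-channel normalization. Because every gel is referenced to the same pooled standard, the resulting values can be compared across gels even when overall channel intensities differ. All volumes are hypothetical.

```python
import math
from statistics import median

def standardized_log_abundance(sample_vols, standard_vols):
    """Per-spot log2(sample / Cy2 standard) for ONE gel, median-centred.

    sample_vols, standard_vols: dicts mapping a matched spot ID to its volume.
    """
    ratios = {spot: math.log2(sample_vols[spot] / standard_vols[spot])
              for spot in sample_vols}
    shift = median(ratios.values())  # crude correction for overall channel intensity
    return {spot: r - shift for spot, r in ratios.items()}

cy2 = {"s1": 100.0, "s2": 100.0, "s3": 100.0}
gel1_cy3 = {"s1": 200.0, "s2": 100.0, "s3": 100.0}
gel2_cy5 = {"s1": 400.0, "s2": 200.0, "s3": 200.0}  # same pattern, brighter channel

print(standardized_log_abundance(gel1_cy3, cy2))
print(standardized_log_abundance(gel2_cy5, cy2))
```

Both gels yield the same standardized values, so the twofold increase in spot s1 is recognized as a consistent change rather than as a difference in channel brightness.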

3.6.2. Experimental Design and Statistical Confidence 

In the simplest form of a DIGE experiment, two or three samples are separately labeled with one of the three dyes and separated in the same gel for direct pairwise comparisons. In this case, the software first normalizes the entire signal for each CyDye channel and then calculates the protein spot volume ratio for each protein pair. A normal distribution is modeled over the actual distribution of protein pair volume ratios, and two standard deviations from the mean of this normal distribution define the 95% confidence level for significant abundance changes.
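For the N = 1 case, the thresholding idea can be sketched as follows. This is a simplified illustration, not DeCyder's implementation: it takes the log2 volume ratio of each matched spot pair, estimates the mean and standard deviation over all spots, and flags spots more than two standard deviations from the mean. The volumes are hypothetical.

```python
import math
from statistics import mean, stdev

def flag_changes(vol_a, vol_b, n_sd=2.0):
    """Return spot IDs whose log2(b/a) lies more than n_sd SDs from the mean
    ratio of all spots (assumes the ratios are roughly normally distributed)."""
    logs = {s: math.log2(vol_b[s] / vol_a[s]) for s in vol_a}
    mu, sd = mean(logs.values()), stdev(logs.values())
    return sorted(s for s, r in logs.items() if abs(r - mu) > n_sd * sd)

# Hypothetical gel: 19 spots with small fluctuations and one ~8-fold change
shifts = [0.1, -0.1] * 9 + [0.0, 3.0]
vol_a = {f"spot{i}": 100.0 for i in range(len(shifts))}
vol_b = {f"spot{i}": 100.0 * 2 ** x for i, x in enumerate(shifts)}
print(flag_changes(vol_a, vol_b))
```

The threshold depends entirely on the spread of all ratios on the gel, and a large change inflates that spread itself, which is one reason the single-gel approach has limited power.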

Table 2

Statistical Applications of DeCyder Biological Variation Analysis and Extended Data Analysis (EDA) Modules

Average ratio: Calculated for each protein spot feature between two groups or experimental conditions. Derived from the log standardized protein abundance changes that were directly quantified within each DIGE gel relative to the internal standard for the protein spot feature.

Student’s t-test: Univariate test of statistical significance for an abundance change between two groups or experimental conditions. p-values reflect the probability that the observed change has occurred due to stochastic chance alone. With DIGE, p-values of

One-way ANOVA

Two-way ANOVA

Principal component analysis (EDA only)

Hierarchical clustering (EDA only)

K-means (EDA only)

Self-organizing maps (EDA only)

Gene shaving (EDA only)

Discriminant analysis (EDA only)

This N = 1 type of experiment has limited statistical power, since the 95% confidence threshold is determined from the overall distribution of changes within the population (see Note 32). Many more changes in abundance, of much smaller magnitude, can be detected with much greater statistical confidence (Student’s t-test and ANOVA, Table 2) by incorporating independent replicate samples into the experiment (see Note 33). The number of replicates required in a study depends on the amount of variation in the system being investigated; increasing the number of replicates increases confidence in smaller changes in expression. The number of gel replicates needed for the experiment to have sufficient sensitivity to detect expression changes can be determined using power calculations (see, for example, (19)).
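As a rough illustration of such a power calculation, the sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison, n per group = 2(z_(1-α/2) + z_(1-β))² σ² / δ². It is not the method of (19). The standard deviation σ of log2 standardized abundances and the target fold change are hypothetical inputs that would have to be estimated from pilot data, and the normal approximation slightly underestimates n when n is small.

```python
import math
from statistics import NormalDist

def replicates_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate replicates per group to detect a mean difference `delta`
    (two-sided, two-sample test) when the SD is `sigma`, both on the same
    log scale. Normal approximation; rounds up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical: detect a 1.5-fold change when the log2 SD is 0.3
print(replicates_per_group(math.log2(1.5), 0.3))
```

Halving the detectable fold change on the log scale roughly quadruples the required number of replicates, which is the trade-off described above.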

With replicate samples, the Student’s t-test and ANOVA statistics measure the significance of the variation of a specific protein change, independent of the overall distribution of abundance changes in the population.

Incorporating replicate samples into the experimental design also controls for 

unexpected variation introduced into the samples during sample preparation. 

This design not only allows for the identification of abundance changes that 

are consistent across multiple replicates of an experiment, but can also identify 

significant abundance changes that would not have been identified even if the 

analyses were performed using Cy3- and Cy5-labeled samples on the same 

gels, but without the pooled-sample internal standard to coordinate them (14). 
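The replicate-based test can be sketched with the pooled-variance Student's t statistic computed on per-gel standardized log abundances for one protein spot. The values below are hypothetical. The p-value would then come from the t distribution with na + nb - 2 degrees of freedom (for example, `scipy.stats.t.sf(abs(t), df) * 2` if SciPy is available), which this standard-library sketch does not compute.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Pooled-variance two-sample Student's t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical log standardized abundances for one spot, four gels per condition
control = [0.02, -0.05, 0.03, 0.0]
treated = [0.31, 0.25, 0.36, 0.28]
t, df = student_t(control, treated)
print(f"t = {t:.2f}, df = {df}")
```

Unlike the single-gel threshold, the statistic depends only on the replicate values for this spot, not on the distribution of changes across all other spots.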

3.6.3. Multivariate Statistical Analysis 

Univariate analyses such as the Student’s t-test and ANOVA have traditionally been used in DIGE experiments to provide a list of statistically significant changes in protein abundance. The application of multivariate statistical analyses (as outlined in Subheading 1.2.4) allows for the assessment of changes on a global scale, and can bring added insight to the usual “list of proteins” generated. Most software packages allow for the export of raw and normalized protein spot volumes for these additional statistical tests and data manipulations; in addition, the DeCyder suite of software tools now provides an Extended Data Analysis (EDA) module that includes many of these tools (Table 2). These tools are appearing increasingly in recent DIGE publications (10,24,28,29,30,32,52). Although these multivariate analyses are especially beneficial when analyzing a DIGE experiment that contains three or more conditions, they can also be useful in two-condition comparisons to detect sample outliers, fouled samples, or even poor experimental design.

Figure 2 illustrates an example of PCA applied to a DIGE dataset comprising four experimental conditions, each measured in quadruplicate. PCA simplifies multidimensional datasets by reducing the variation down to the two or three most significant sources of variation. In this example, the first principal component (PC1) accounts for 62.3% of the variation amongst 156 proteins of interest, with the second principal component (PC2) accounting for an additional 12.5% of the variation. Each sample datapoint describes the collective expression profile for the subset of 156 proteins, and PC1 and PC2 orthogonally divide the samples into quadrants based on these two largest sources of variation within the DIGE dataset. In this case, these two components (together about 75% of the variance among these proteins) cluster the samples into the proper categories (adapted from (24)).
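A minimal PCA sketch, assuming NumPy is available and that the data are arranged as a samples × proteins matrix of standardized log abundances (for example, exported from the analysis software). The toy matrix is hypothetical. The returned fractions are the analogue of the 62.3% and 12.5% quoted for PC1 and PC2 above; the sign of each component is arbitrary, so only the grouping of scores is meaningful.

```python
import numpy as np

def pca(data, n_components=2):
    """PCA via SVD of column-centred data (rows = samples, columns = proteins).
    Returns sample scores and the fraction of total variance per component."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                       # centre each protein
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]

# Hypothetical standardized log abundances: 4 samples x 3 proteins, two groups
data = [
    [1.0, 0.1, 2.0],
    [1.1, 0.0, 2.1],
    [3.0, 0.1, 0.0],
    [3.1, 0.0, 0.1],
]
scores, explained = pca(data)
print(np.round(explained, 3))
print(np.round(scores[:, 0], 2))
```

Here nearly all of the variance falls on PC1, and the first two samples score on the opposite side of PC1 from the last two, separating the two hypothetical groups.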

Figure 3 is taken from a 2D DIGE study that determined the change in protein abundance in mouse liver over a 24 h period. In this study, proteins were harvested from groups of mice on the second cycle after transfer from synchronized (12 h light:12 h dim red light) to free-running conditions (constant dim red light). Proteins were extracted from each liver and pooled from six mice per 4-h time point. HC (by average distance correlation) was used to investigate the expression of 49 novel circadian proteins. This gave a range of phase groups, with 10 proteins peaking during the subjective day and 39 proteins distributed between two clusters that were most abundant during the subjective night (adapted from (32)).
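One plausible reading of "HC by average distance correlation" is average-linkage agglomerative clustering with 1 - Pearson correlation as the distance between protein profiles; this interpretation is an assumption. The standard-library sketch below implements it naively (adequate for tens of proteins, not thousands), and the four time-course profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters (mean pairwise 1 - r) until n_clusters remain."""
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        dist[a, b] = dist[b, a] = 1 - pearson(profiles[a], profiles[b])
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [dist[a, b] for a in clusters[i] for b in clusters[j]]
            avg = sum(pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters

# Hypothetical abundance profiles over six 4-h time points
profiles = {
    "A": [1, 2, 3, 2, 1, 0],
    "B": [2, 3, 4, 3, 2, 1],
    "C": [0, -1, -2, -1, 0, 1],
    "D": [1, 0, -1, 0, 1, 2],
}
print(average_linkage(profiles, 2))
```

Because correlation distance ignores absolute abundance, proteins A and B group together despite different levels, as do C and D, whose profiles peak at the opposite phase.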

Finally, additional information may be gleaned by mapping proteins found to be changing by DIGE to existing biological pathways and networks. Many software solutions and services are becoming available for this type of extended analysis (e.g., KEGG pathways, Ingenuity Pathways Analysis, WebGestalt, DeCyder EDA). Although additional validation is necessary to establish biological significance, mapping the members of a “list of proteins” to established pathways and networks can provide supporting evidence for the proteins observed by DIGE alone. In some cases, it can also point to proteins associated with the biological question that were not accessible in the DIGE analysis. For example, Friedman et al. (10) recently reported the use of network/pathway mapping for proteins found by DIGE/MS in MCF10A cells overexpressing the HER2 receptor after treatment with TGF-β. The majority of proteins identified with DIGE/MS mapped to a network of pathways with TGF-β as a major hub, but the network also included an intercalating pathway involving p53 that affected many proteins independently identified in the DIGE/MS experiments. This insight linking new players to those identified with DIGE/MS led to further investigation of a direct role for p53 in the expression of the tumor suppressor maspin (53).
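One common way to score such a mapping is an over-representation test: given N quantified proteins, K of which belong to a pathway, how surprising is it that k of the n DIGE hits fall in that pathway? The sketch below computes the one-sided hypergeometric tail probability with the standard library. It is a generic illustration rather than the method of any particular tool named above; the numbers are hypothetical, and real analyses also correct for testing many pathways at once.

```python
from math import comb

def overrepresentation_p(n_universe, n_pathway, n_hits, n_overlap):
    """P(X >= n_overlap) where X ~ Hypergeometric(N=n_universe, K=n_pathway, n=n_hits)."""
    upper = min(n_pathway, n_hits)
    tail = sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
               for k in range(n_overlap, upper + 1))
    return tail / comb(n_universe, n_hits)

# Hypothetical: 1000 quantified proteins, 20 in a pathway, 50 DIGE hits, 6 overlapping
print(overrepresentation_p(1000, 20, 50, 6))
```

Only about one overlapping protein would be expected by chance in this example, so six overlaps yields a small tail probability.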

4. Notes 

1. 2DE has traditionally been a popular method for differential display proteomics on a global scale, but until recently these strategies lacked the ability to directly quantify abundance changes in the same fashion as stable isotope LC/MS-based strategies (2,3,4). This has been mainly due to the inability to directly correlate migration patterns and protein staining between gel separations (gel-to-gel variation). Stable isotopes have been used in gel-based proteomics as well, whereby different proteomes have been separately labeled with different stable isotopes (e.g., growing cells using 14N- vs. 15N-labeled medium) prior to mixing and running together through the same 2DE separation (5). In this case, abundance changes can be monitored on individual proteins during the mass spectrometry (MS) stage, but this requires in-gel digestion and MS of every protein present to discover the subset of proteins that is changing.

2. Both hydrophobicity and molecular weight influence how proteins migrate 

during SDS-PAGE, yielding information on apparent molecular mass. 

3. In comparison, commonly used silver or colloidal Coomassie blue (ca. 5–10 ng sensitivity) stains typically exhibit a dynamic range of less than two orders of magnitude (8,9). The CyDye labeling system is compatible with the downstream processing commonly used to identify proteins via MS and database interrogation, which involves the generation of tryptic peptides within excised gel plugs. Trypsin cleaves peptide bonds on the C-terminal side of lysine and arginine residues, but peptide generation is mostly unhindered because so few lysine residues are modified by dye labeling.

4. DIGE experiments can still be performed using the internal standard methodology 

with only two CyDyes, but twice as many gels are required to analyze the 

same number of samples compared with the three-dye minimal labeling scheme. 

With saturation labeling, one dye is used to label the internal standard, and the 

other is used to label individual samples. A dye-swap scheme is not necessary 

in this case because the individual samples are always labeled with the same 

CyDye. 

5. The use of hydroxyethyl disulfide (commercially available as “DeStreak 

reagent”), combined with anodic cup loading, should be used for enhanced 

resolution for IEF above pH 8 (11). 

6. Running every DIGE gel with the maximal amount of protein (without adversely affecting first-dimension resolution) not only enables detection of lower-abundance proteins, but also provides more material for subsequent protein identification using MS. This makes every gel in a coordinated DIGE experiment a “pick-able” gel, without the need to run subsequent preparative gels with increased protein load that then have to be carefully matched to a lower-load analytical gel. When combined with narrow-range IEF, maximizing the protein amount also allows interrogation of the lower-abundance proteins in a sample.

7. If one sample within a study has very skewed protein distributions compared 

with others, then many of the “novel proteins” within this sample will effectively 

be diluted out in the pool. Such a sample outlier can be easily identified using 

the multivariate statistical analyses described. 

8. Repetition not only enables the identification of subtle differences with statistical confidence, it is also vital to control for nonbiological variation. In most cases biological variation will outweigh technical variation; therefore, only biological replicates are necessary. Thus it is important that each replicate sample is derived from an independent experiment, ideally performed on different occasions and perhaps using different batches of medium. The independent samples can then be analyzed coordinately using the pooled-sample internal standard methodology. See Table 1 for an example of this design.

9. All solutions should be prepared using water that has a resistivity of 18.2 MΩ·cm; this is referred to as “water” throughout the text.

10. Mix equal parts of butanol and water and shake vigorously. Let the two phases 

separate overnight, and use the butanol phase for overlay. Butanol that is not 

completely water saturated can extract water from the top of the gel. A more 

recent improvement is to use a 0.1% SDS solution in a conventional spray bottle, 

used to carefully spray a fine mist over the top of the gels to thoroughly cover 

the top of the gel (the gel/overlay interface will not be as obvious). 

11. DMF can degrade, producing amines, which can react with the NHS-ester 

CyDyes. DMF stocks should be kept fresh (


dimension due to MW and hydrophobicity shifts. Overlabeling results in side 

reactions with the epsilon-amine groups of lysine side chains, but since the 

maleimide dyes do not carry compensatory charge, this results in the overall 

loss of a charge, which creates a series of isoelectric forms in the first dimension 

(“charge trains”). Labeling buffer should not contain any components with free 

thiols, as these will react with the satCyDyes. 

17. L-buffer volume can be increased if necessary for complete resolubilization, 

although 100–250 μg or more should resolubilize readily in this volume. The 

volume of labeling buffer used for resolubilization should not exceed 40 μL per 

sample when using cup loading for sample entry, to ensure that the final volumes will not exceed the capacity of the loading cup (ca. 100–150 μL).

18. These methods are provided assuming that all gels to be run will be used both 

for analytical (quantification) as well as preparative (providing material for 

subsequent MS) purposes. Current recommendations from the manufacturer are 

to label 50 μg of sample with 400 pmol CyDye. A sufficient amount of unlabeled sample can then be added to the quenched reactions to reach the final protein amounts needed for subsequent MS. Alternatively, many have found that the ratios can be adjusted to label increasing amounts of sample (up to 200 μg with 200 pmol dye) without adversely affecting the overall labeling reaction (the approach presented here).

19. If samples are to be introduced using anodic cup loading, simply bring this 

mixture up to 100 μL in R-buffer and proceed with cup loading. R-buffer can 

always be supplemented with additional DTT using the 200 mg/mL DTT stock 

solution. In the presence of DeStreak reagent for focusing in pH ranges above pH 8, the addition of an equal volume of 1× R-buffer should provide a sufficient amount of DTT without interfering with the DeStreak reagent.

20. Comparison of minimally labeled protein 2D maps with unlabeled protein maps is generally not a problem, as the addition of only one dye molecule does not generally prevent the facile matching of small alterations in protein mobility between the 2–5% of protein that is labeled and the remaining unlabeled protein that will provide enough material for MS.

21. Poststaining is not necessary with saturation DIGE, since an unlabeled population 

with potentially different migration characteristics will not exist. 

22. This treatment binds the gel to one of the glass plates and therefore prevents 

shrinking/swelling during the poststaining and protein excision processes, 

thereby facilitating accurate robotic protein excision. Nothing should be placed 

on top of wipes that are covering bind silane-treated plates, as this may leave 

impressions that are detected during the scanning phase. Assembly and casting too soon may create a binding surface on the opposite glass plate, preventing the gel from being subsequently poststained and picked. Automated protein excision can be facilitated for certain systems by placing fluorescent alignment reference targets on the plate, which can be performed at this stage.

23. A stacking gel is not required for 2D gel electrophoresis, as the proteins are 

effectively “stacked” to the height of the IPG strip. SDS is also not essential in the 

separating gel, as the SDS associated with the proteins during the equilibration


step, and present in the running buffer, is sufficient (although many traditionally 

use it in the separating gel). Using 2× concentration running buffer in the upper 

buffer chamber can produce higher quality separations in some circumstances. 

24. Samples of similar nature should always be focused simultaneously for optimal 

reproducibility. Focusing programs vary for some pH gradients. A typical 

program for many ranges is 500 V for 500 V-h, stepping to 1000 V for 1000 V-h, 

followed by a final step to 8000 V until >50 V-h has been reached. Check 

recommendations from specific vendors. 

25. Volume of equilibration buffer should be large to ensure sufficient removal of 

ampholines and other components of the first dimensional run. 

26. Carefully wash out any remaining liquid on top of the SDS-PAGE gel. Prewet 

the IPG strip with 1× running buffer and place the strip between the gel plates 

with the plastic backing adhering to the inside surface of one of the glass plates. 

The prewetted running buffer will facilitate the manipulation of the IPG strip 

down the inside surface of the plate and on top of the SDS-PAGE gel. 

27. An agarose overlay, used by many protocols, is not absolutely necessary to ensure proper contact between the IPG strip and the second-dimension SDS-PAGE gel. Using a thin card or ruler to carefully tamp down the IPG strip to the gel is usually sufficient and removes the added problems associated with the overlay, such as trapped air bubbles in the solidified agarose.

28. Running gels at less than 1 W/gel can improve resolution in the high molecular 

weight regions of the second dimension gel. Use wattage appropriate for the 

second dimensional unit being used. Many different gel units can accommodate 

increased power by compensating for the increased heat. 

29. Absorption/emission maxima in DMF are 491/506 nm for Cy2, 553/572 nm for Cy3, and 648/669 nm for Cy5; however, care must be taken to scan in regions of each spectrum that do not contain absorbance or emission from the other dyes, which may mean using a nonmaximal region of a given spectrum.

30. Comparison of the 2D spot maps between saturation-labeled samples and 

minimal labeled or unlabeled samples is impossible, as proteins containing 

multiple cysteine residues may appear as significantly larger M r species when 

labeled with the saturation dyes, which of course cannot be predicted without 

first knowing the protein identity. 

31. Almost all software packages for 2D electrophoresis involve matching of protein 

spot patterns between gels. For DeCyder, it is used in the BVA module to match 

the quantitative data obtained from the triply coresolved protein signals from 

each gel in the DIA module (where gel-to-gel variation does not come into 

play). Manual verification of the matching is almost always required with any 

software package. 

32. There are many “all-or-none” types of experiments where the single-gel comparison may be valid and subtle changes are not expected. Nevertheless, using independent replicates and the pooled-sample internal standard methodology is still needed to control for nonbiological sample preparation error.


33. The multigel approach allows many data points to be collected for each group 

to be compared. Spots of interest can be selected by looking for significant 

change across the groups. Student’s t-test and ANOVA probability scores (p) 

indicate the probability that the observed change occurred due to stochastic, 

random events (null hypothesis). Probability values


9. Lilley, K.S., Razzaq, A. and Dupree, P. (2002) Two-dimensional gel 

electrophoresis: recent advances in sample preparation, detection and quantitation. 

Curr Opin Chem Biol 6(1):46–50. 

10. Friedman, D.B., Wang, S.E., Whitwell, C.W., Caprioli, R.M. and Arteaga, C.L. 

(2007) Multi-variable difference gel electrophoresis and mass spectrometry: A 

case study on TGF-beta and ErbB2 signaling. Mol Cell Proteomics 6:150–69. 

11. Olsson, I., Larsson, K., Palmgren, R. and Bjellqvist, B. (2002) Organic disulfides 

as a means to generate streak-free two-dimensional maps with narrow range basic 

immobilized pH gradient strips as first dimension. Proteomics 2(11):1630–32. 

12. Wolters, D.A., Washburn, M.P. and Yates, J.R. 3rd (2001) An automated multidimensional 

protein identification technology for shotgun proteomics. Anal Chem 

73(23):5683–90. 

13. Alban, A., David, S.O., Bjorkesten, L., Andersson, C., Sloge, E., Lewis, S. and 

Currie, I. (2003) A novel experimental design for comparative two-dimensional gel 

analysis: two-dimensional difference gel electrophoresis incorporating a pooled 

internal standard. Proteomics 3(1):36–44. 

14. Friedman, D.B., Hill, S., Keller, J.W., Merchant, N.B., Levy, S.E., Coffey, R.J. 

and Caprioli, R.M. (2004) Proteome analysis of human colon cancer by two-dimensional difference gel electrophoresis and mass spectrometry. Proteomics

4(3):793–811. 

15. Gerbasi, V.R., Weaver, C.M., Hill, S., Friedman, D.B. and Link, A.J. (2004) Yeast 

Asc1p and mammalian RACK1 are functionally orthologous core 40S ribosomal 

proteins that repress gene expression. Mol Cell Biol 24(18):8276–87. 

16. Sitek, B., Luttges, J., Marcus, K., Kloppel, G., Schmiegel, W., Meyer, H.E., 

Hahn, S.A. and Stuhler, K. (2005) Application of fluorescence difference gel 

electrophoresis saturation labelling for the analysis of microdissected precursor 

lesions of pancreatic ductal adenocarcinoma. Proteomics 5(10):2665–79. 

17. Hu, Y., Malone, J.P., Fagan, A.M., Townsend, R.R. and Holtzman, D.M. (2005) 

Comparative proteomic analysis of intra- and interindividual variation in human 

cerebrospinal fluid. Mol Cell Proteomics 4(12):2000–9. 

18. Zhang, X., Guo, Y., Song, Y., Sun, W., Yu, C., Zhao, X., Wang, H., Jiang, H., 

Li, Y., Qian, X., Jiang, Y. and He, F. (2006) Proteomic analysis of individual 

variation in normal livers of human beings using difference gel electrophoresis. 

Proteomics 6(19):5260–68. 

19. Karp, N.A., Spencer, M., Lindsay, H., O’Dell, K. and Lilley, K.S. (2005) 

Impact of replicate types on proteomic expression analysis. J Proteome Res 4(5): 

1867–71. 

20. Meunier, B., Dumas, E., Piec, I., Bechet, D., Hebraud, M. and Hocquette, J.F. 

(2007) Assessment of hierarchical clustering methodologies for proteomic data 

mining. J Proteome Res 6(1):358–66. 

21. Fodor, I.K., Nelson, D.O., Alegria-Hartman, M., Robbins, K., Langlois, R.G., 

Turteltaub, K.W., Corzett, T.H. and McCutchen-Maloney, S.L. (2005) Statistical 

challenges in the analysis of two-dimensional difference gel electrophoresis experiments 

using DeCyder. Bioinformatics 21(19):3733–40.


22. Karp, N., Kreil, D. and Lilley, K. (2004) Determining a significant change in 

protein expression with DeCyder™ during a pair-wise comparison using two-dimensional

difference gel electrophoresis. Proteomics 4(5):1421–32. 

23. Kreil, D., Karp, N. and Lilley, K. (2004) DNA microarray normalization methods 

can remove bias from differential protein expression analysis of 2-D difference gel 

electrophoresis results. Bioinformatics 20(13):2026–34. 

24. Friedman, D.B., Stauff, D.L., Pishchany, G., Whitwell, C.W., Torres, V.J. and 

Skaar, E.P. (2006) Staphylococcus aureus redirects central metabolism to increase 

iron availability. PLoS Pathog 2(8):e87. 

25. Fujii, K., Kondo, T., Yamada, M., Iwatsuki, K. and Hirohashi, S. (2006) Toward 

a comprehensive quantitative proteome database: protein expression map of 

lymphoid neoplasms by 2-D DIGE and MS. Proteomics 3:3. 

26. Fujii, K., Kondo, T., Yokoo, H., Yamada, T., Matsuno, Y., Iwatsuki, K. and 

Hirohashi, S. (2005) Protein expression pattern distinguishes different lymphoid 

neoplasms. Proteomics 5(16):4274–86. 

27. Karp, N.A., Griffin, J.L. and Lilley, K.S. (2005) Application of partial least squares 

discriminant analysis to two-dimensional difference gel studies in expression 

proteomics. Proteomics 5(1):81–90. 

28. Seike, M., Kondo, T., Fujii, K., Yamada, T., Gemma, A., Kudoh, S. and 

Hirohashi, S. (2004) Proteomic signature of human cancer cells. Proteomics 

4(9):2776–88. 

29. Suehara, Y., Kondo, T., Fujii, K., Hasegawa, T., Kawai, A., Seki, K., Beppu, Y., 

Nishimura, T., Kurosawa, H. and Hirohashi, S. (2006) Proteomic signatures 

corresponding to histological classification and grading of soft-tissue sarcomas. 

Proteomics 6(15):4402–09. 

30. Hatakeyama, H., Kondo, T., Fujii, K., Nakanishi, Y., Kato, H., Fukuda, S. and 

Hirohashi, S. (2006) Protein clusters associated with carcinogenesis, histological 

differentiation and nodal metastasis in esophageal cancer. Proteomics 6(23): 

6300–16. 

31. Verhoeckx, K.C., Gaspari, M., Bijlsma, S., van der Greef, J., Witkamp, R.F., 

Doornbos, R.P. and Rodenburg, R.J. (2005) In search of secreted protein 

biomarkers for the anti-inflammatory effect of beta2-adrenergic receptor agonists: 

application of DIGE technology in combination with multivariate and univariate 

data analysis tools. J Proteome Res 4(6):2015–23. 

32. Reddy, A.B., Karp, N.A., Maywood, E.S., Sage, E.A., Deery, M., O’Neill, 

J.S., Wong, G.K., Chesham, J., Odell, M., Lilley, K.S., Kyriacou, C.P. and 

Hastings, M.H. (2006) Circadian orchestration of the hepatic proteome. Curr Biol 

16(11):1107–15. 

33. Lee, I.N., Chen, C.H., Sheu, J.C., Lee, H.S., Huang, G.T., Yu, C.Y., Lu, F.J. 

and Chow, L.P. (2005) Identification of human hepatocellular carcinoma-related

biomarkers by two-dimensional difference gel electrophoresis and mass 

spectrometry. J Proteome Res 4(6):2062–69. 

34. Liang, C.R., Leow, C.K., Neo, J.C., Tan, G.S., Lo, S.L., Lim, J.W., Seow, T.K., 

Lai, P.B. and Chung, M.C. (2005) Proteome analysis of human hepatocellular


carcinoma tissues by two-dimensional difference gel electrophoresis and mass 

spectrometry. Proteomics 5(8):2258–71. 

35. Nabetani, T., Tabuse, Y., Tsugita, A. and Shoda, J. (2005) Proteomic analysis of 

livers of patients with primary hepatolithiasis. Proteomics 5(4):1043–61. 

36. Huang, H.L., Stasyk, T., Morandell, S., Dieplinger, H., Falkensammer, G., Griesmacher, 

A., Mogg, M., Schreiber, M., Feuerstein, I., Huck, C.W., Stecher, G., 

Bonn, G.K. and Huber, L.A. (2006) Biomarker discovery in breast cancer serum 

using 2-D differential gel electrophoresis/MALDI-TOF/TOF and data validation

by routine clinical assays. Electrophoresis 27(8):1641–50. 

37. Somiari, R.I., Sullivan, A., Russell, S., Somiari, S., Hu, H., Jordan, R., George, A., 

Katenhusen, R., Buchowiecka, A., Arciero, C., Brzeski, H., Hooke, J. and 

Shriver, C. (2003) High-throughput proteomic analysis of human infiltrating ductal 

carcinoma of the breast. Proteomics 3(10):1863–73. 

38. Nishimori, T., Tomonaga, T., Matsushita, K., Oh-Ishi, M., Kodera, Y., Maeda, T., 

Nomura, F., Matsubara, H., Shimada, H. and Ochiai, T. (2006) Proteomic analysis 

of primary esophageal squamous cell carcinoma reveals downregulation of a cell 

adhesion protein, periplakin. Proteomics 6(3):1011–18. 

39. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., Flaig, M., 

Gillespie, J.W., Hu, N., Taylor, P.R., Emmert-Buck, M.R., Liotta, L.A., 

Petricoin, E.F. 3rd and Zhao, Y. (2002) 2D differential in-gel electrophoresis for 

the identification of esophageal scans cell cancer-specific protein markers. Mol 

Cell Proteomics 1(2):117–24. 

40. Yu, K.H., Rustgi, A.K. and Blair, I.A. (2005) Characterization of proteins in 

human pancreatic cancer serum using differential gel electrophoresis and tandem 

mass spectrometry. J Proteome Res 4(5):1742–51. 

41. Wan, J., Sun, W., Li, X., Ying, W., Dai, J., Kuai, X., Wei, H., Gao, X., Zhu, Y., 

Jiang, Y., Qian, X. and He, F. (2006) Inflammation inhibitors were remarkably upregulated 

in plasma of severe acute respiratory syndrome patients at progressive 

phase. Proteomics 6(9):2886–94. 

42. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring, J.R., 

Podolsky, R.H., Lee, J.R. and Dynan, W.S. (2005) Saturation labeling with 

cysteine-reactive cyanine fluorescent dyes provides increased sensitivity for 

protein expression profiling of laser-microdissected clinical specimens. Proteomics 

5(7):1746–57. 

43. Kondo, T., Seike, M., Mori, Y., Fujii, K., Yamada, T. and Hirohashi, S. (2003) 

Application of sensitive fluorescent dyes in linkage of laser microdissection and 

two-dimensional gel electrophoresis as a cancer proteomic study tool. Proteomics 

3(9):1758–66. 

44. Sitek, B., Potthoff, S., Schulenborg, T., Stegbauer, J., Vinke, T., Rump, L.C., 

Meyer, H.E., Vonend, O. and Stuhler, K. (2006) Novel approaches to analyse 

glomerular proteins from smallest scale murine and human samples using DIGE 

saturation labelling. Proteomics 3:3. 

45. Tetu, B., Lacasse, B., Bouchard, H.L., Lagace, R., Huot, J. and Landry, J. (1992) 

Prognostic influence of HSP-27 expression in malignant fibrous histiocytoma:


a clinicopathological and immunohistochemical study. Cancer Res 52(8): 

2325–28. 

46. Wessel, D. and Flugge, U.I. (1984) A method for the quantitative recovery of 

protein in dilute solution in the presence of detergents and lipids. Anal Biochem 

138(1):141–43. 

47. Knowles, M.R., Cervino, S., Skynner, H.A., Hunt, S.P., de Felipe, C., Salim, K., 

Meneses-Lorente, G., McAllister, G. and Guest, P.C. (2003) Multiplex proteomic 

analysis by two-dimensional differential in-gel electrophoresis. Proteomics 

3:1162–71. 

48. Prabakaran, S., Swatton, J.E., Ryan, M.M., Huffaker, S.J., Huang, J.J., Griffin, J.L., 

Wayland, M., Freeman, T., Dudbridge, F., Lilley, K.S., Karp, N.A., Hester, S., 

Tkachev, D., Mimmack, M.L., Yolken, R.H., Webster, M.J., Torrey, E.F. and 

Bahn, S. (2004) Mitochondrial dysfunction in schizophrenia: evidence for compromised 

brain metabolism and oxidative stress. Mol Psychiatry 9(7):684–97. 

49. Wang, D., Jensen, R., Gendeh, G., Williams, K. and Pallavicini, M.G. (2004) 

Proteome and transcriptome analysis of retinoic acid-induced differentiation of 

human acute promyelocytic leukemia cells, NB4. J Proteome Res 3(3):627–35. 

50. Zhang, W. and Chait, B.T. (2000) ProFound: an expert system for protein 

identification using mass spectrometric peptide mapping information. Anal Chem 

72(11):2482–89. 

51. Zhang, Y.Q., Matthies, H.J., Mancuso, J., Andrews, H.K., Woodruff, E. 3rd, 

Friedman, D. and Broadie, K. (2004) The Drosophila fragile X-related gene 

regulates axoneme differentiation during spermatogenesis. Dev Biol 270(2): 

290–307. 

52. Yokoo, H., Kondo, T., Fujii, K., Yamada, T., Todo, S. and Hirohashi, S. (2004) 

Proteomic signature corresponding to alpha fetoprotein expression in liver cancer 

cells. Hepatology 40(3):609–17. 

53. Wang, S.E., Narasanna, A., Whitell, C.W., Wu, F.Y., Friedman, D.B. and 

Arteaga, C.L. (2007) Convergence of P53 and TGFbeta signaling on activating 

expression of the tumor suppressor gene maspin in mammary epithelial cells. J 

Biol Chem 4:4.

7 

MALDI/SELDI Protein Profiling of Serum 

for the Identification of Cancer Biomarkers 

Lisa H. Cazares, Jose I. Diaz, Rick R. Drake, and O. John Semmes 

Summary 

The ability to visualize the full depth of the serum proteome in a high-throughput manner is a major goal of clinical proteomics. Methodologies that combine higher throughput with the ability to observe differential protein expression levels have been applied to this goal. One example is the coupling of robotic sample processing to matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). Within this paradigm is a modification of MALDI-TOF termed surface-enhanced laser desorption/ionization-TOF (SELDI-TOF). Both conventional MALDI and SELDI have been used to generate protein expression profiles reflective of potential peptide changes in serum. This information can be used to identify proteins that may enable new diagnostic and therapeutic strategies.

Key Words: matrix-assisted laser desorption ionization; surface-enhanced laser 

desorption ionization; mass spectrometry; protein profiling; proteomics. 

1. Introduction
Mining the serum proteome for the discovery of new biomarkers is a major goal of many clinical proteomics efforts. Surface-enhanced laser desorption/ionization (SELDI) and matrix-assisted laser desorption/ionization (MALDI) have been used extensively for protein profiling in efforts to discover biomarkers in serum from patients with prostate, lung, head and neck, ovarian, and colon cancer (1–6). MALDI techniques usually require some up-front fractionation of the serum to reduce sample complexity (7–9), and the ease of sample fractionation is considered an advantage of SELDI. Advantages of MALDI-TOF instrumentation are its improved resolution compared with SELDI instruments and the ability to identify peaks of interest directly by analyzing samples in TOF/TOF mode. For routine linear mode profiling, both types of instrumentation give similar results with human serum (see Fig. 1).

Besides the instrumentation and methodologies related to mass spectrometry analysis, the quality and quantity of the clinical samples to be tested are important considerations. Serum is one of the most common sample types used in biomarker discovery because it is routinely obtained in the clinic, a large proportion of the blood clotting factors has been removed, and it is a rich source of molecules that may indicate systemic function. Blood plasma is an alternative source; however, clinical plasma collection uses various anticoagulants, which should be standardized to allow for universal analysis. Whether serum or plasma is used, every effort should be made to standardize the sample collection and processing protocols. Several studies have highlighted this need and shown that multiple factors can affect the spectra generated from serum samples (10,11). These factors include the time elapsed between venipuncture and separation of plasma or serum, the type of serum collection tube, storage conditions, and the number of freeze–thaw cycles. In our laboratory, we routinely use serum for proteomic profiling. The following protocols outline our method for the collection and storage of serum samples for subsequent analysis by MALDI-MS.

[Fig. 1 panels: (A) Bruker IMAC Cu²⁺ beads; (B) Ciphergen IMAC Cu²⁺ chip; x-axis m/z 4000–10,000; the three primary peaks used for instrument standardization are numbered 1–3.]

Fig. 1. Comparison of SELDI and MALDI spectra using QC sera. (A) MALDI spectra generated from QC sera processed with IMAC Cu²⁺ magnetic beads. (B) SELDI spectra from QC sera processed on IMAC Cu²⁺ chips. The three peaks used for instrument optimization are indicated.

Reduction of sample complexity is an essential step in the generation of high-quality TOF mass spectrometry data from serum. One method of MALDI sample preparation that reduces the complexity of serum while remaining robust and readily amenable to automated high-throughput applications is fractionation with magnetic beads (MBs) combined with prestructured MALDI sample supports (AnchorChip technology). Several MB types with different surface chemistries can be used to fractionate serum and increase the number of detectable peaks (12) (see Fig. 2). In addition, depletion of highly abundant proteins such as albumin and IgG (13,14) reduces ion suppression and reveals less abundant species. Unfortunately, fractionation greatly increases the number of samples to be processed, which in turn increases the complexity of the experimental procedure. Sample processing is, therefore, best facilitated by robotics, which increases throughput and produces reproducible results; however, small sample sets can be processed manually with careful attention to detail by following the protocols and methods in this chapter. Another caveat of depletion strategies is that highly abundant proteins such as albumin bind low-abundance species, which may then be removed inadvertently along with them (15,16). For comprehensive biomarker discovery, the benefits of depletion and fractionation often outweigh these drawbacks. We have used both depleted and nondepleted serum strategies for biomarker discovery, and this continues to be a major area of methodological development.

[Fig. 2 panels: MALDI spectra of serum fractions from four bead types, WCX (84 peaks), IMAC (85 peaks), C18 (62 peaks), and WAX (80 peaks); y-axis intensity (a.u., ×10⁴); x-axis m/z 1000–10,000; 203 total unique peaks.]

Fig. 2. MALDI spectra of serum fractionated with magnetic beads. Example spectra produced on the Ultraflex-TOF/TOF when serum is fractionated with different magnetic bead types. A total of 203 unique peaks are resolved in the m/z range of 1000–10,000.
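Counting "unique" peaks across several bead fractions, as in the 203-peak total of Fig. 2, requires a rule for deciding when peaks from different fractions represent the same species. The chapter does not state how this was done. The following Python sketch shows one simple possibility, single-linkage grouping of sorted m/z values within a relative tolerance; the tolerance (0.1% of m/z) and the peak lists in the example are hypothetical values chosen only for illustration.

```python
def merge_peak_lists(peak_lists, rel_tol=1e-3):
    """Group peaks from several fractions into unique species.

    peak_lists: dict mapping a fraction name (e.g. "IMAC") to a list of m/z values.
    rel_tol: hypothetical relative tolerance; a peak joins the current group if it
             lies within rel_tol * m/z of the previous (sorted) peak.
    Returns a list of (mean m/z, sorted fraction names), one entry per unique peak.
    """
    observations = sorted(
        (mz, fraction) for fraction, mzs in peak_lists.items() for mz in mzs
    )
    groups = []
    for mz, fraction in observations:
        if groups and mz - groups[-1]["mzs"][-1] <= rel_tol * mz:
            groups[-1]["mzs"].append(mz)
            groups[-1]["fractions"].add(fraction)
        else:
            groups.append({"mzs": [mz], "fractions": {fraction}})
    return [(sum(g["mzs"]) / len(g["mzs"]), sorted(g["fractions"])) for g in groups]


# Illustrative peak lists only (m/z values similar to labels seen in Fig. 2).
fractions = {
    "WCX": [4212.3, 5337.6],
    "IMAC": [4212.5, 5336.0, 7762.3],
    "C18": [4211.0, 9282.0],
}
unique_peaks = merge_peak_lists(fractions)
for mz, found_in in unique_peaks:
    print(f"{mz:9.1f}  {', '.join(found_in)}")
print(len(unique_peaks), "unique peaks")
```

Single linkage can chain neighboring peaks into one group when spectra are crowded; a stricter pipeline might also cap the total width of each group.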

2. Materials 

2.1. Serum Collection and Storage 

1. Becton Dickinson Vacutainer serum separator tube (SST) Plus blood collection tube (16 mm × 100 mm, draw volume 8.5 mL) (Becton Dickinson #367988)

2. Screw cap microtubes for cryo-storage (2.0 mL) (Sarstedt Inc.# 72.609.001, with 

caps # 65.716) 

3. Microcentrifuge tubes for aliquots (1.7 mL) (Corning-Costar #3620) 

2.2. Serum Processing for MALDI Using MB-Based Fractionation 

1. MB kit(s) (immobilized metal affinity-Cu, hydrophobic interaction, weak cation exchange, or weak anion exchange) (Bruker Daltonics, Billerica, MA)

2. Optional: ClinProt robotic workstation (Bruker Daltonics) 

3. Magnetic separators for manual processing: large (1.5 mL) or small tube (0.5 mL) 

format (Bruker Daltonics) 

4. α-Cyano-4-hydroxycinnamic acid (CHCA) (Bruker Daltonics)

5. Ethanol ultra pure 100% 

6. Acetone ultra pure 100% 

7. Micropipette capable of delivering 1 μL accurately 

8. Peptide standard mix (Bruker) 

9. Microtiter plate AnchorChip 600/384 MALDI target 600 μm diameter (Bruker 

Daltonics) 

2.3. Serum Processing for SELDI 

1. Water high performance liquid chromatography (HPLC) grade (Fisher Scientific, 

Hampton, NH)


2. Copper sulfate, anhydrous (Sigma-Aldrich, St. Louis, MO) 

3. Sodium acetate trihydrate salt 

4. Phosphate buffered saline (PBS) buffer pH 7.4 

5. Urea, at least 99% pure (Promega, Madison, WI)

6. CHAPS ultra purity (Fisher Scientific) 

7. Sinapinic acid (SPA) (5 mg tube) (Ciphergen Biosystems, Palo Alto, CA)

8. IMAC protein chip arrays (Ciphergen) 

9. Bioprocessor holder (Ciphergen) for processing 12 chips in a 96-well format

10. Bioprocessor accessory, 96-well disposable reservoir and gasket (Ciphergen) 

11. Acetonitrile ultra high purity grade 

12. Trifluoroacetic acid (TFA) (100%, 1 mL ampules) [Sigma/Aldrich Chemical 

Company 26,977-8, (589-37-37)] 

13. Plate seals 

14. For calibration (all from Ciphergen Biosystems): NP20 ProteinChip arrays, All-in-one peptide standard, and All-in-one protein standard

15. Optional: BioMek 2000 robotic workstation, adapted to process ProteinChip 

arrays (Ciphergen biosystems) 

16. DPC MicroMix 5 shaker (Diagnostic Products Corporation, Los Angeles, CA) or another type of rotary or platform shaker

17. Micropipet capable of delivering 1 μL accurately

18. Pooled serum for quality control (QC)

19. 100 mM CuSO4 in water [room temperature (RT)]: 1.6 g anhydrous CuSO4 (MW = 159.6) made up to 100 mL in HPLC-grade water

20. 100 mM sodium acetate, pH 4.0 (RT): 9.0 mL of 0.2 M sodium acetate stock (27.2 g/L of the trihydrate), 50 mL of HPLC water, and 41.0 mL of 0.2 M acetic acid (add gradually to reach pH 4.0; 0.2 M acetic acid is 11.6 mL/L of concentrated acid)

21. PBS, pH 7.4 (RT): 10 mL of 10× PBS made up to 100 mL in HPLC water. Check the pH.

22. 10% TFA stock: 1 mL TFA (100%) and 9 mL HPLC water (store in an amber bottle)

23. 1% TFA working solution (store in an amber bottle and make fresh every 2 weeks): add 1 mL of 10% TFA to 9 mL HPLC water

24. 8 M urea, 1% CHAPS in PBS, pH 7.4: dissolve 48.05 g urea in PBS pH 7.4 to a volume of about 90 mL; stir until dissolved (warming may be needed). Add 1 g CHAPS. Bring the final volume to 100 mL with PBS. Filter through a 0.4 μm filter. Aliquot into 5 mL volumes and freeze.

25. 1 M urea, 0.125% CHAPS in PBS, pH 7.4: dilute the 8 M stock above in PBS (100 mL of 8 M stock in 700 mL PBS).
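The buffer recipes above rest on simple molarity, dilution, and buffer arithmetic. As a cross-check, the following Python sketch recomputes the CuSO4 and urea masses, the working-strength urea/CHAPS dilution, and an approximate pH for the acetate buffer. The urea molecular weight (60.06) and the acetic acid pKa (4.76 at 25°C) are standard values not stated in the chapter, and the Henderson–Hasselbalch estimate ignores activity effects, which is why the recipe adjusts the acid gradually against a pH meter.

```python
import math


def grams_needed(molarity, volume_l, mw):
    """Mass (g) of solute for a given molarity (mol/L), volume (L), and MW (g/mol)."""
    return molarity * volume_l * mw


def diluted(conc, stock_ml, diluent_ml):
    """Final concentration after mixing stock with diluent (C1V1 = C2V2)."""
    return conc * stock_ml / (stock_ml + diluent_ml)


def henderson_hasselbalch(pka, base_mol, acid_mol):
    """Approximate pH of a weak-acid buffer (activity coefficients ignored)."""
    return pka + math.log10(base_mol / acid_mol)


cuso4_g = grams_needed(0.100, 0.100, 159.6)  # 100 mM CuSO4, 100 mL
urea_g = grams_needed(8.0, 0.100, 60.06)     # 8 M urea stock, 100 mL

# Acetate buffer: 9.0 mL 0.2 M sodium acetate + 41.0 mL 0.2 M acetic acid, 100 mL total
acetate_total_m = (9.0 + 41.0) * 0.2 / 100.0
acetate_ph = henderson_hasselbalch(4.76, 9.0 * 0.2, 41.0 * 0.2)

# Working urea/CHAPS: 100 mL of 8 M urea / 1% CHAPS stock + 700 mL PBS
urea_working_m = diluted(8.0, 100, 700)
chaps_working_pct = diluted(1.0, 100, 700)

print(f"CuSO4: {cuso4_g:.2f} g; urea: {urea_g:.2f} g")
print(f"acetate: {acetate_total_m:.2f} M total, estimated pH {acetate_ph:.2f}")
print(f"working urea: {urea_working_m:.3f} M; CHAPS: {chaps_working_pct:.3f}%")
```

The estimated acetate pH comes out near 4.1, close to the target of 4.0, and the dilution reproduces the stated 1 M urea and 0.125% CHAPS.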

2.4. SELDI and MALDI Spectra Acquisition 

1. SELDI PBS II, IIc, or PCS 4000 instrument (Ciphergen biosystems) 

2. Ultraflex I or II MALDI-TOF–TOF (Bruker Daltonics)


3. Method 

3.1. Serum Collection 

Obtain informed consent from the patient, then proceed as follows:

1. Perform venipuncture into a 10 cc SST vacutainer tube (without anticoagulant). 

2. Allow blood to clot at RT for 30 min. 

3. Centrifuge the blood at 1700 rcf for 10 min, then immediately decant the serum and freeze it at –70°C in a screw cap freezer vial (Sarstedt). If this is not possible, the serum can be stored at –20°C for up to 5 days before being moved to a –70°C freezer.

4. Before SELDI or MALDI analysis, thaw the sample and divide it into small-volume aliquots to avoid repeated freeze–thaw cycles. When possible, no sample should be taken through more than two freeze–thaw cycles, and the number of freeze–thaw cycles should be recorded if unused volume is returned to the freezer.
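Keeping to the two-cycle limit is easier when every tube carries its freeze–thaw history. The following Python sketch is a hypothetical record-keeping helper, not part of the published protocol: each tube counts its thaws, aliquots inherit the count of the tube they were split from, and a third thaw is refused. The tube IDs and the 20 μL aliquot size are illustrative assumptions.

```python
from dataclasses import dataclass

MAX_THAWS = 2  # "no sample should be taken through more than two freeze/thaw cycles"


@dataclass
class SerumTube:
    """Hypothetical record for one frozen serum tube or aliquot."""
    tube_id: str
    volume_ul: float
    thaw_count: int = 0  # freeze/thaw cycles experienced so far

    def thaw(self):
        """Record a thaw; refuse if the tube has already reached the limit."""
        if self.thaw_count >= MAX_THAWS:
            raise ValueError(f"{self.tube_id} already thawed {self.thaw_count} times")
        self.thaw_count += 1

    def split(self, aliquot_ul):
        """Divide a thawed tube into aliquots that inherit its thaw history."""
        n = int(self.volume_ul // aliquot_ul)
        return [
            SerumTube(f"{self.tube_id}-{i + 1}", aliquot_ul, self.thaw_count)
            for i in range(n)
        ]


parent = SerumTube("S001", 100.0)  # frozen at -70 C after collection
parent.thaw()                      # first thaw: divide into aliquots
aliquots = parent.split(20.0)      # five 20 uL aliquots, refrozen
aliquots[0].thaw()                 # second thaw: used for profiling
print(aliquots[0])
```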

3.2. Preparation of Human Serum 

Expression profiling of proteins/peptides uses both peak mass and peak intensity to quantify changes between spectra. This necessitates the use of a QC standard to monitor instrument performance (17). The QC sample routinely used in our laboratory is pooled human serum collected with the same protocol used for the experimental samples (see Subheading 3.1). Efforts have been made to develop a standardized QC sample for serum mass spectrometry profiling (18). Until such a standard is available, a large volume of serum can be pooled and aliquoted so that it can be run with every experimental sample set. This QC sample should be assayed with the same processing technique that will be used for the experimental samples, and the data from multiple runs should be analyzed. In this way, the inter- and intra-assay variability can be determined. Additionally, the spectra obtained from the QC sample can serve as a benchmark for the integrity of processing, instrument optimization, and ProteinChip variability. We therefore recommend including several QC samples on each MALDI target and one QC spot on each SELDI ProteinChip.

Acceptable levels of reproducibility need to be established for any new technology, and sample preparation is the most critical step in producing reproducible spectra (see Notes 3, 4, and 5). We have optimized the SELDI system with high-throughput robotics, and previous studies in our laboratory have shown that the mass accuracy of SELDI spectra is highly reproducible, with CVs of 0.05%. Operating in linear mode, we have found the mass accuracy of an Ultraflex-TOF–TOF to have a CV of 0.01%. Normalized intensity values for individual peaks in QC sera routinely have CVs below 20% for samples prepared robotically in our laboratory, using either SELDI or MALDI-MS.
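The reproducibility figures above are coefficients of variation (CV = standard deviation / mean) computed across replicate QC spectra. The following Python sketch shows that calculation for peak m/z and normalized intensity and flags peaks whose intensity CV exceeds 20%. The input layout, the peak labels, and the replicate values are hypothetical, and the intensities are assumed to have been normalized already (for example, to total ion current) by whatever method the laboratory uses.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def qc_summary(qc_peaks, intensity_limit=20.0):
    """Summarize replicate QC spectra.

    qc_peaks maps a peak label to a list of (m/z, normalized intensity) pairs,
    one pair per replicate QC spot. Returns label -> (m/z CV %, intensity CV %, passes).
    """
    summary = {}
    for label, observations in qc_peaks.items():
        mz_cv = percent_cv([mz for mz, _ in observations])
        int_cv = percent_cv([inten for _, inten in observations])
        summary[label] = (mz_cv, int_cv, int_cv <= intensity_limit)
    return summary


# Hypothetical replicate measurements for two QC peaks.
qc = {
    "peak A": [(4212.3, 100.0), (4212.5, 110.0), (4212.1, 95.0), (4212.4, 105.0)],
    "peak B": [(5337.6, 50.0), (5337.1, 90.0), (5337.9, 60.0), (5337.4, 100.0)],
}
for label, (mz_cv, int_cv, ok) in qc_summary(qc).items():
    status = "pass" if ok else "FAIL"
    print(f"{label}: m/z CV {mz_cv:.3f}%, intensity CV {int_cv:.1f}%, {status}")
```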


3.3. Serum Protein Profiling on the MALDI-TOF–TOF 

3.3.1. MB Fractionation of Human Serum 

In our laboratory, these steps are performed by the ClinProt robot; a comparable manual method is outlined below. Sequential fractionation with multiple bead types can also be performed.

1. Vortex MBs thoroughly for at least 1 min. 

2. In a 0.5 mL Eppendorf tube, pretreat 5 μL of MBs with 50 μL of MB-IMAC Cu binding solution.

3. Place the tube in the magnetic bead separator (MBS) and move it between 

adjacent wells 10 times. 

4. Collect the beads on the wall of the tube for 20 s and remove the supernatant 

carefully with a pipette. 

5. Repeat this pretreatment two more times. 

6. Add 20 μL of serum and mix carefully with the beads by pipetting up and down 

five times. 

7. Keep at RT for 2 min. 

8. Place the tube in the MBS and wait for 20 s for beads to separate. 

9. Remove the supernatant with a pipette tip carefully (the unbound fraction can 

be discarded or saved for analysis or a second fractionation step, if desired). 

10. To wash, add 80 μL of MB-IMAC Cu wash solution and place the tube in the MBS again. Move the tube back and forth between adjacent wells 10 times.

11. Collect the beads on the tube wall for 20 s and remove the supernatant carefully 

with a pipette. 

12. Repeat this wash two more times. 

13. To elute, add 10 μL MB-IMAC Cu elution solution and mix. Let the beads sit 

for 5 min at RT. 

14. Place the tube on the MBS and wait 20 s for beads to separate. 

15. Transfer the eluate to a fresh tube. 
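When processing a batch manually, it helps to prepare enough of each solution before starting. The following Python sketch totals the per-sample volumes used in the steps above (5 μL beads, 3 × 50 μL binding solution, 20 μL serum, 3 × 80 μL wash solution, 10 μL elution solution) for a batch. The 10% overage for pipetting losses is a hypothetical default, not a value from the chapter.

```python
import math

# Per-sample volumes (uL) taken from the manual IMAC-Cu steps above.
PER_SAMPLE_UL = {
    "magnetic beads": 5,
    "binding solution": 50 * 3,  # pretreatment, three times
    "serum (sample)": 20,
    "wash solution": 80 * 3,     # wash, three times
    "elution solution": 10,
}


def reagent_plan(n_samples, overage=0.10):
    """Total volume (uL) of each item for n_samples, with a hypothetical overage."""
    plan = {}
    for item, per_sample in PER_SAMPLE_UL.items():
        total = per_sample * n_samples * (1.0 + overage)
        # round() first so floating-point noise (e.g. 1650.0000000000002) is not rounded up
        plan[item] = math.ceil(round(total, 6))
    return plan


for item, volume in reagent_plan(24).items():
    print(f"{item:17s} {volume:6d} uL")
```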

3.3.2. Data Collection on MALDI-TOF–TOF Instrument 

To best detect proteins over the entire mass range on a MALDI instrument, it is necessary to optimize the instrument settings for both low mass (typically 2000–20,000 Da) and high mass (20,000–100,000 Da or greater). The best sensitivity and resolution are obtained below m/z 20,000, and this is the mass range we routinely use for most profiling experiments.

1. Prepare samples for the AnchorChip target by diluting the eluates 1:10 in CHCA matrix prepared according to the AnchorChip protocol (0.3 mg/mL in ethanol:acetone 2:1). SPA and/or 2,5-dihydroxybenzoic acid may also be used.

2. Spot 1 μL of the sample diluted in matrix onto the 600 μm diameter AnchorChip 

target. Also spot 1 μL of the peptide standard diluted according to the manufacturer’s 

instructions.


3. Allow spots to dry. 

4. Perform external calibration with the peptide standard using a linear mode method. 

5. Collect at least 300 shots in linear mode, adjusting the laser energy and detection 

sensitivity to maximize signal and resolution of the major peaks using a QC spot. 

Typically, in linear mode the resolution of the three major peaks should be greater 

than 600. 

6. Instrument settings vary with instrument set-up and are too numerous to describe in full in this chapter, but the most important settings to optimize are acceleration voltage (IS1), laser power, time lag focusing (or PIE), detector settings, and matrix suppression. Our basic instrument settings in linear mode are as follows:

IS1, 22 

Laser, 37% with laser attenuation offset at 48%, range at 40% 

Time lag focus, 200 ns 

Detector Gain, 24× 

Matrix suppression, gated with suppression up to m/z 800 
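The resolution criterion in step 5 (greater than 600 for the three major QC peaks) is commonly expressed as R = m/Δm, where Δm is the full peak width at half maximum (FWHM); the chapter does not state which definition the instrument software applies, so FWHM is an assumption here. The following Python sketch estimates R from a profile segment by interpolating the two half-maximum crossings, and tests it on a synthetic Gaussian peak at m/z 4212.3 whose width (σ = 2.0) is hypothetical.

```python
import math


def fwhm_resolution(mz, intensity):
    """Resolution m/FWHM of the most intense peak in a profile-spectrum segment.

    Half-maximum crossings are located by linear interpolation between data points.
    Assumes the segment contains one complete, positive peak.
    """
    apex = max(range(len(intensity)), key=lambda k: intensity[k])
    half = intensity[apex] / 2.0

    i = apex
    while i > 0 and intensity[i] > half:
        i -= 1
    j = apex
    while j < len(intensity) - 1 and intensity[j] > half:
        j += 1
    if intensity[i] > half or intensity[j] > half:
        raise ValueError("peak does not fall to half maximum inside the segment")

    # Left flank: intensity[i] <= half < intensity[i + 1]
    left = mz[i] + (half - intensity[i]) * (mz[i + 1] - mz[i]) / (
        intensity[i + 1] - intensity[i])
    # Right flank: intensity[j - 1] > half >= intensity[j]
    right = mz[j - 1] + (half - intensity[j - 1]) * (mz[j] - mz[j - 1]) / (
        intensity[j] - intensity[j - 1])
    return mz[apex] / (right - left)


# Synthetic Gaussian peak (hypothetical width) sampled every 0.05 m/z units.
sigma = 2.0
mz = [4190.0 + 0.05 * k for k in range(901)]
profile = [math.exp(-((m - 4212.3) ** 2) / (2 * sigma ** 2)) for m in mz]
r = fwhm_resolution(mz, profile)
print(f"resolution ~ {r:.0f}")  # about 4212.3 / (2.3548 * 2.0), i.e. near 894
```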

All spectra should be processed using the same baseline subtraction protocol. Perform peak detection using a uniform definition of the required signal-to-noise ratio and mass window. Although MALDI techniques can produce protein profiles containing patterns capable of distinguishing disease and identifying biomarkers, a single analysis may produce many hundreds of protein peaks (see Note 2). Discerning the differentiating patterns therefore poses a major challenge, and the analysis and interpretation of such large volumes of proteomic data remain an unsolved bioinformatics problem. Many different classification tools are currently being used successfully to analyze MALDI data, including Fisher discriminant analysis, CART (19,20), support vector machines (21), artificial neural networks (22), boosted decision tree analysis (23), and genetic algorithms (24). General considerations for data preparation before any type of analysis should include averaging intensity values for duplicate samples, baseline subtraction, and peak picking.
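The three preparation steps named above (averaging duplicates, baseline subtraction, and peak picking) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm used by the instrument software: the baseline is a simple moving minimum, noise is estimated from the median absolute deviation (MAD) of the baseline-corrected trace, and a peak is any local maximum at least 10 times that noise level. The window size, S/N threshold, and synthetic spectra are hypothetical.

```python
import math
import statistics


def average_replicates(spectra):
    """Point-by-point mean of replicate spectra sampled on the same m/z grid."""
    return [statistics.mean(values) for values in zip(*spectra)]


def subtract_baseline(intensity, half_window=50):
    """Subtract a moving-minimum baseline (a simple stand-in for vendor algorithms)."""
    n = len(intensity)
    return [
        intensity[k] - min(intensity[max(0, k - half_window): min(n, k + half_window + 1)])
        for k in range(n)
    ]


def pick_peaks(mz, intensity, snr=10.0):
    """Local maxima whose intensity is at least snr times a MAD-based noise estimate."""
    med = statistics.median(intensity)
    noise = 1.4826 * statistics.median([abs(x - med) for x in intensity])
    noise = max(noise, 1e-12)  # guard against a perfectly flat trace
    peaks = []
    for k in range(1, len(intensity) - 1):
        is_max = intensity[k] > intensity[k - 1] and intensity[k] >= intensity[k + 1]
        if is_max and intensity[k] >= snr * noise:
            peaks.append((mz[k], intensity[k]))
    return peaks


# Synthetic duplicate spectra: sloping baseline, small ripple, two peaks.
mz = [1000.0 + 0.5 * k for k in range(401)]


def synthetic(phase):
    return [
        100.0 + 0.05 * (m - 1000.0) + 2.0 * math.sin(k + phase)
        + 500.0 * math.exp(-((m - 1050.0) ** 2) / 4.5)
        + 300.0 * math.exp(-((m - 1150.0) ** 2) / 4.5)
        for k, m in enumerate(mz)
    ]


averaged = average_replicates([synthetic(0.0), synthetic(0.3)])
corrected = subtract_baseline(averaged)
found = pick_peaks(mz, corrected)
print([(m, round(i)) for m, i in found])
```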

3.4. Protein Identification Using MALDI-TOF/TOF 

Biomarker candidates detected by protein profiling can be subjected to TOF/TOF analysis to identify peptides directly from serum profiles, using the same sample spot and/or a respotted sample. Initial analysis in reflectron mode allows visualization of the target (parent) peak. Metastable fragment ions of the selected precursor ion are then analyzed after a second acceleration step, and the resulting fragment pattern is interpreted and used for peptide identification by database search. The ability to sequence peptides of interest directly is a powerful feature of this method (see Fig. 3).

[Fig. 3 panels: (A) linear-mode serum profile (baseline subtracted), m/z 1200–2600, with the selected peak near m/z 1468; (B) MS/MS fragmentation spectrum; (C) Mascot peptide view: DSGEGDFLAEGGGVR, fibrinopeptide A (gi|229185), residues 2–16, observed m/z 1465.72, Mr(expt) 1464.72, Mr(calc) 1464.65, delta 0.07, 0 missed cleavages.]

Fig. 3. Identification of a serum peptide directly from the serum profile. The serum profile (A) was generated in linear mode on the Ultraflex-TOF/TOF, and a peptide (m/z 1469.09) was selected for MS/MS analysis, yielding a fragmentation spectrum (B). Using the Mascot search engine, this peptide was matched to fibrinopeptide A (C).
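The Mr(calc) value reported by the database search for the fibrinopeptide A match in Fig. 3 is the neutral monoisotopic mass of DSGEGDFLAEGGGVR: the sum of its residue masses plus one water. The following Python sketch reproduces that value from standard monoisotopic residue masses and expresses the reported delta of 0.07 Da as a relative error in parts per million. It assumes an unmodified peptide; modifications would need their own mass shifts.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056
PROTON = 1.00728


def peptide_mass(sequence):
    """Neutral monoisotopic mass (Mr) of an unmodified linear peptide."""
    return sum(RESIDUE_MONO[aa] for aa in sequence) + WATER_MONO


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


mr_calc = peptide_mass("DSGEGDFLAEGGGVR")
print(f"Mr(calc) = {mr_calc:.2f}, [M+H]+ = {mr_calc + PROTON:.2f}")
print(f"reported delta 0.07 Da = {ppm_error(1464.72, 1464.65):.0f} ppm")
```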

3.5. Serum Protein Profiling on SELDI-TOF 

3.5.1. Preparation of Serum 

Note: All of the following steps including the ProteinChip preparation and 

serum incubation on the arrays are performed robotically by the BioMek 2000 

robot. The protocols below outline a manual method. 

1. Thaw human serum samples on ice. Use separate aliquots to set up duplicates or 

triplicates. 

2. Add 20 μL human serum into a 1.7 mL microcentrifuge tube (alternatively, this 

can be performed in a v-bottom 96-well plate for large sample sets).


3. Add 30 μL of 8 M Urea, 1% CHAPS in PBS pH 7.4. 

4. Vortex the tube at 4°C for 10 min or, if using a plate, seal it and place it on the MicroMix 5 shaker at 4°C (shaker settings: form 20, amplitude 5, time 10 min).

5. Add 100 μL 1 M Urea, 0.125% CHAPS in PBS pH 7.4. 

6. Vortex or pipette up and down to mix (total volume 150 μL). 

7. Dilute sample 1:5 in PBS pH 7.4 by adding 600 μL PBS. If using a plate, remove 

35 μL of serum–urea mixture from first plate and transfer to a second plate. Then 

add 140 μL of PBS. Mix by vortexing tube or pipetting up and down. 

8. Store on ice until ready to add samples to a bioprocessor containing ProteinChip 

arrays. 
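The denaturation and dilution steps above set the final urea and CHAPS concentrations and the amount of serum that reaches each array. The following Python sketch tracks these as volume-weighted averages through the two mixing stages (20 μL serum + 30 μL 8 M urea/1% CHAPS + 100 μL 1 M urea/0.125% CHAPS, then 1:5 in PBS). It assumes ideal mixing with additive volumes; the component labels are illustrative.

```python
def mix(*parts):
    """Combine parts given as (volume_uL, {component: concentration}) pairs.

    Concentrations are volume-weighted averages; a component absent from a part
    is treated as 0 in that part. Returns (total volume, {component: concentration}).
    """
    total = sum(volume for volume, _ in parts)
    final = {}
    for volume, concentrations in parts:
        for name, conc in concentrations.items():
            final[name] = final.get(name, 0.0) + conc * volume / total
    return total, final


# Steps 2-6: serum + 8 M urea/1% CHAPS + 1 M urea/0.125% CHAPS
vol1, denatured = mix(
    (20.0, {"serum (fraction)": 1.0}),
    (30.0, {"urea (M)": 8.0, "CHAPS (%)": 1.0}),
    (100.0, {"urea (M)": 1.0, "CHAPS (%)": 0.125}),
)
# Step 7: 1:5 dilution with PBS (150 uL + 600 uL)
vol2, loaded = mix((vol1, denatured), (600.0, {}))

for name, value in loaded.items():
    print(f"{name}: {value:.4f}")
print(f"serum per 100 uL applied to an array: {100.0 * loaded['serum (fraction)']:.2f} uL")
```

On these assumptions, each 100 μL applied to an array carries about 2.7 μL of the original serum in roughly 0.45 M urea.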

3.5.2. Preparation of ProteinChip Arrays 

This protocol describes the preparation of IMAC-Cu 2+ ProteinChips. Other 

types of chips should be prepared according to the manufacturer’s (Ciphergen) 

instructions. 

1. Label or number IMAC chips on the reverse side and place them into the 

bioprocessor according to the manufacturer’s instructions. (see Note 1) 

2. Add 50 μL of 100 mM CuSO 4 onto each spot or array. 

3. Shake on Micromix 5 for 10 min at RT. 

4. Shaker settings: form 20, amplitude 5, time 10 min 

5. Flick the plate to remove the CuSO4 to waste and pat it upside down onto a clean paper towel to remove residual liquid (liquid can also be removed by aspiration, but be careful not to touch the array surface with the pipette tip).

6. Wash with 200 μL of HPLC water, 2 × 5 min at RT, on the Micromix shaker at the same form and amplitude settings as before.

7. Flick plate and pat on paper towel. 

8. Add 50 μL of 100 mM sodium acetate pH 4.0. 

9. Shake on Micromix shaker for 5 min at RT. 

10. Flick plate and pat as before. 

11. Wash with HPLC water, 2 × 5 min at RT, on the Micromix.

12. Add 200 μL PBS pH 7.4. 

13. Flick plate and pat as before. 

14. Wash with PBS pH 7.4, 2 × 5 min at RT, on the Micromix. Leave the last volume of PBS on the plate until ready to use.

3.5.3. Incubation of Serum on ProteinChip Arrays 

1. Remove PBS from bioprocessor with multichannel pipettor, one row at a time 

to avoid drying chips. 

2. Add 100 μL of each sample to its respective array. Note: sample placement on the ProteinChip arrays should be randomized, and duplicate samples should also be placed randomly.


3. Seal plate and shake bioprocessor on micromix (form 20, amplitude 5) for 

30 min at RT. 

4. Remove samples carefully with a pipette, changing tips to avoid cross-contamination.

5. Add 200 μL PBS pH 7.4 to each array and shake on micromix for 5 min at RT 

using same shaker settings. 

6. Remove PBS with multichannel pipettor changing tips for each row. 

7. Wash with 200 μL HPLC water, shake on micromix for 5 min at RT. 

8. Remove water with multichannel pipettor. 

9. Repeat water wash. 

10. Remove chips from bioprocessor and allow chips to dry completely. 
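Randomized placement of samples and duplicates (step 2), together with the one QC spot per ProteinChip recommended earlier, can be planned before loading the bioprocessor. The following Python sketch draws such a layout. It assumes 12 chips with 8 spots each (lettered A–H), matching a 96-well bioprocessor; the spot count, the lettering, the sample IDs, and the choice to randomize the QC position on each chip are illustrative assumptions rather than requirements from the chapter.

```python
import random


def randomize_layout(sample_ids, replicates=2, n_chips=12, spots_per_chip=8, seed=None):
    """Randomly assign replicate samples to ProteinChip spots, one QC spot per chip.

    Returns {(chip number, spot letter): sample ID, "QC", or None for an unused spot}.
    """
    rng = random.Random(seed)
    runs = [sid for sid in sample_ids for _ in range(replicates)]
    capacity = n_chips * (spots_per_chip - 1)  # one spot per chip reserved for QC
    if len(runs) > capacity:
        raise ValueError(f"{len(runs)} runs exceed capacity of {capacity} spots")
    rng.shuffle(runs)
    remaining = iter(runs)
    layout = {}
    for chip in range(1, n_chips + 1):
        qc_spot = rng.randrange(spots_per_chip)  # QC position also randomized
        for spot in range(spots_per_chip):
            letter = chr(ord("A") + spot)
            if spot == qc_spot:
                layout[(chip, letter)] = "QC"
            else:
                layout[(chip, letter)] = next(remaining, None)
    return layout


samples = [f"P{n:03d}" for n in range(1, 41)]  # 40 hypothetical patient samples
layout = randomize_layout(samples, seed=7)     # fixed seed makes the plan reproducible
print({spot: sid for (chip, spot), sid in layout.items() if chip == 1})
```

Recording the seed alongside the run makes the layout reproducible when spectra are later matched back to samples.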

3.5.4. Adding SPA Matrix to the Chips 

1. To one tube of SPA, add 200 μL acetonitrile (100%). 

2. Add 200 μL of 1% TFA (final concentration of SPA: 12.5 mg/mL in 50% acetonitrile, 0.5% TFA).

3. Vortex for 5 min at RT. 

4. Quick spin. 

5. Add 1.0 μL SPA matrix to each dry spot, being careful not to touch the pipette 

tip to the array surface. 

6. Allow to dry. 

7. Arrays are now ready to read on the SELDI instrument. Note: the arrays should be stored in the dark in a cool, dry place. It is recommended to read the chips within a few hours of matrix addition; some signal degradation may occur if the arrays are stored for more than 24 h. 
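The final concentrations in step 2 follow from simple mixing arithmetic. This Python sketch checks them, assuming the mixed volumes are additive (a simplification for acetonitrile/water mixtures); the SPA mass per tube is inferred from the stated 12.5 mg/mL and is not given in the text.

```python
def final_percent(parts):
    """Final percentage of one component after mixing.

    parts: (volume_uL, percent_of_component) pairs; volumes assumed additive.
    """
    total_volume = sum(volume for volume, _ in parts)
    return sum(volume * percent for volume, percent in parts) / total_volume


# Steps 1 and 2: 200 uL of 100% acetonitrile mixed with 200 uL of aqueous 1% TFA
acetonitrile_pct = final_percent([(200, 100.0), (200, 0.0)])
tfa_pct = final_percent([(200, 0.0), (200, 1.0)])

# SPA mass implied by 12.5 mg/mL in the 400 uL final volume
spa_mg = 12.5 * (200 + 200) / 1000

print(acetonitrile_pct, tfa_pct, spa_mg)  # 50.0 0.5 5.0
```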

3.5.5. Collection of Spectra on SELDI-TOF 

Here we describe the collection of spectra using the Ciphergen PBS II instrument. 

3.5.5.1. Calibration 

Calibration of the SELDI instrument is crucial for accurate mass analysis of the proteins present in samples. Ions with smaller m/z values fly faster than ions with larger m/z values, and the m/z of an ion can be calculated from its flight time once the instrument has been calibrated with compounds of known mass. For the most accurate mass assignments, the instrument should be calibrated under conditions identical to the experimental conditions. Calibration should be performed at the beginning of an experimental run, and thereafter on every day that experimental data are collected. When obtaining calibration spectra, use instrument settings (e.g., detector voltage, lag time) as close as possible to those used for serum profiling. 

1. Reconstitute one vial each of the seven-in-one peptide and protein standards, 

according to the manufacturer’s instructions. Aliquot and freeze.


2. Mix standards with SPA according to package insert. 

3. Deposit 1 μL of each standard onto an array of an NP20 ProteinChip. 

4. Air-dry the arrays completely, usually 30–60 min. 

5. Read the array in the SELDI instrument using the spot protocol created to read the experimental samples (see below). The laser intensity should be lowered so that the peaks from the standards do not exceed 75% of the maximum signal intensity. 

6. Follow the calibration dialogue in the software of the PBSII SELDI instrument 

to save the calibration equations. 
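The calibration principle can be illustrated with a simplified linear time-of-flight model in which the square root of m/z is a linear function of flight time. The sketch below fits that model to standards of known mass by least squares with NumPy. The model, the function names, and the synthetic flight times are assumptions for illustration; the PBS II software applies its own calibration equation.

```python
import numpy as np


def fit_tof_calibration(flight_times, known_mz):
    """Least-squares fit of sqrt(m/z) = a * t + b (simplified TOF model)."""
    t = np.asarray(flight_times, dtype=float)
    root_mz = np.sqrt(np.asarray(known_mz, dtype=float))
    a, b = np.polyfit(t, root_mz, 1)  # degree-1 fit returns [slope, intercept]
    return a, b


def time_to_mz(flight_times, a, b):
    """Convert flight times to m/z with a fitted calibration."""
    return (a * np.asarray(flight_times, dtype=float) + b) ** 2


# Synthetic example: hypothetical standards whose flight times are generated
# from an assumed "true" calibration (a = 0.8, b = -0.5), arbitrary time units.
standards_mz = np.array([1000.0, 3000.0, 6000.0, 12000.0])
times = (np.sqrt(standards_mz) + 0.5) / 0.8
a, b = fit_tof_calibration(times, standards_mz)
print(time_to_mz([(np.sqrt(8000.0) + 0.5) / 0.8], a, b))  # approximately [8000.]
```

In practice, the equation fitted from the day's standards is applied to every experimental spectrum, which is why the standards are read with the same spot protocol as the samples.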

3.5.5.2. SELDI Instrument Settings Optimization 

SELDI instrument optimization refers to the adjustment of the data-collection settings to maximize signal intensity while retaining optimal resolution and the lowest possible noise. In our studies, three protein peaks (m/z 5900, 7764, and 9284 ± 0.2%) are consistently present in the QC sera processed on IMAC-Cu2+ ProteinChips, and these are used as benchmarks for instrument optimization (see Fig. 1). Based on multiple runs, the instrument settings are adjusted to maximize the signal-to-noise ratio and resolution for these three peaks. Thereafter, specific criteria were set to ensure instrument optimization (see Semmes et al. (17)). Generally, when trying to obtain a specific overall intensity level (e.g., to get two instruments to behave similarly, or to obtain similar intensity levels over time), three parameters can be adjusted: laser intensity, detector sensitivity, and detector voltage. The following spot protocols for data collection on the SELDI reader are a starting point. The settings will differ from instrument to instrument and will change over time, depending on cumulative laser use and detector settings. 
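A benchmark check of this kind can be automated. The sketch below (hypothetical function names) matches detected peaks to the three QC benchmark masses within the ±0.2% window and computes resolution as m/Δm, taking Δm as the full peak width at half maximum. The actual pass/fail criteria for S/N and resolution are those of Semmes et al. (17) and are deliberately not encoded here.

```python
BENCHMARK_MZ = (5900.0, 7764.0, 9284.0)  # QC serum peaks on IMAC-Cu2+ (from the text)


def match_benchmarks(observed_mz, benchmarks=BENCHMARK_MZ, rel_tol=0.002):
    """Map each benchmark m/z to the closest observed peak within +/- rel_tol, or None."""
    matches = {}
    for ref in benchmarks:
        hits = [mz for mz in observed_mz if abs(mz - ref) <= rel_tol * ref]
        matches[ref] = min(hits, key=lambda mz: abs(mz - ref)) if hits else None
    return matches


def resolution(peak_mz, fwhm):
    """Mass resolution m/dm, using the full width at half maximum as dm."""
    return peak_mz / fwhm


print(match_benchmarks([5905.0, 7700.0, 9290.0]))
# {5900.0: 5905.0, 7764.0: None, 9284.0: 9290.0}
```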

Data collection: standard spot protocol for QC serum on IMAC-Cu (for a 

PBSII) 

1. Set detector voltage to 1650. 

2. Set high mass to 100,000 Da, optimized from 3000 to 50,000 Da. 

3. Set starting laser intensity to 220. 

4. Set starting detector sensitivity to 7. 

5. Set focus lag time to 900 ns. 

6. Set data acquisition method to SELDI quantitation. 

7. Set SELDI acquisition parameters: starting position 20, delta 4, transients per position 12, ending position 80 (192 total shots). 

8. Set warming positions with two shots at intensity 230 and do not include warming 

shots. 
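The acquisition parameters in step 7 determine the number of shots collected per spot. This small Python check (hypothetical helper) reproduces the 192-shot total, assuming every position from the starting to the ending position, in steps of delta, receives the stated number of transients and that warming shots are excluded.

```python
def total_shots(start, end, delta, transients_per_position):
    """Count shots when positions start, start + delta, ..., end each receive
    transients_per_position shots (warming shots not counted)."""
    if delta <= 0 or (end - start) % delta:
        raise ValueError("end position must be reachable from start in steps of delta")
    n_positions = (end - start) // delta + 1
    return n_positions * transients_per_position


print(total_shots(20, 80, 4, 12))  # 192
```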

When adjusting to meet QC criteria:


• Increasing detector voltage typically increases signal and noise. Change it in units of 25 V. 

• Increasing laser intensity increases signal and generally decreases resolution. Change it in units of 10. 

• Increasing detector sensitivity increases signal intensity. The typical working range is 6 to 8. 

For example, if the settings above do not meet QC specifications, try the following: 

If S/N passes easily but resolution is low, reduce detector voltage or laser intensity. 

If resolution passes but S/N is low, increase laser intensity or detector voltage. 

If intensity is too high (as a general rule, stay under 65), reduce laser intensity and/or detector sensitivity. 
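The rules of thumb above can be expressed as a single adjustment step. The following Python sketch is one possible encoding, not the authors' procedure: `SpotSettings` and `adjust` are hypothetical names, the starting values come from the standard spot protocol, and changing only one parameter per iteration (detector voltage for low resolution, laser intensity otherwise) is an assumption made for simplicity.

```python
from dataclasses import dataclass, replace

VOLTAGE_STEP_V = 25  # detector voltage increment from the text
LASER_STEP = 10      # laser intensity increment from the text


@dataclass(frozen=True)
class SpotSettings:
    detector_voltage: int = 1650
    laser_intensity: int = 220
    detector_sensitivity: int = 7  # typical working range 6-8


def adjust(settings, sn_ok, resolution_ok, max_intensity, intensity_limit=65):
    """Return settings changed by one step according to the QC outcome."""
    if max_intensity > intensity_limit:
        # Intensity too high: lower the laser (sensitivity could be lowered instead).
        return replace(settings, laser_intensity=settings.laser_intensity - LASER_STEP)
    if sn_ok and not resolution_ok:
        # S/N has headroom but resolution is low: lower detector voltage.
        return replace(settings, detector_voltage=settings.detector_voltage - VOLTAGE_STEP_V)
    if resolution_ok and not sn_ok:
        # Resolution is fine but S/N is low: raise laser intensity.
        return replace(settings, laser_intensity=settings.laser_intensity + LASER_STEP)
    return settings  # both pass (or both fail): no automatic change


s = adjust(SpotSettings(), sn_ok=True, resolution_ok=False, max_intensity=40)
print(s)  # SpotSettings(detector_voltage=1625, laser_intensity=220, detector_sensitivity=7)
```

If both S/N and resolution fail, the sketch makes no change, since the text gives no rule for that case.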

After data collection, each spectrum should be mass-calibrated using the current peptide calibration. If higher molecular weight data are included in the analysis, the protein standard calibration should be used for peaks in that mass range. Spectra should be normalized by total ion current (a feature of the Ciphergen software) using the same normalization coefficient and low-mass cutoff for all spectra (2000 Da for SPA matrix, to exclude matrix peaks). All spectra should also be processed with the same baseline subtraction protocol. Perform peak detection using a uniform definition of the required signal-to-noise ratio (usually 3) and mass window (usually 0.2–0.3%). 
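These post-processing steps can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the Ciphergen implementation: each spectrum is assumed to be baseline-subtracted already, each spectrum is scaled to the mean total ion current (TIC) above the 2000 Da cutoff (one common normalization convention), and peaks with S/N ≥ 3 are grouped across spectra by a simple greedy clustering in a 0.3% relative mass window. All function names are hypothetical.

```python
import numpy as np


def normalize_to_mean_tic(spectra, low_mass_cutoff=2000.0):
    """Scale baseline-subtracted spectra to a common total ion current (TIC).

    spectra: list of (mz, intensity) array pairs. TIC is summed only above
    low_mass_cutoff so that matrix peaks do not affect the scaling; every
    spectrum is scaled to the mean TIC of the set.
    """
    tics = []
    for mz, intensity in spectra:
        above_cutoff = np.asarray(mz, dtype=float) >= low_mass_cutoff
        tics.append(np.asarray(intensity, dtype=float)[above_cutoff].sum())
    if min(tics) <= 0:
        raise ValueError("a spectrum has no positive signal above the cutoff")
    target = float(np.mean(tics))
    return [np.asarray(intensity, dtype=float) * (target / tic)
            for (_, intensity), tic in zip(spectra, tics)]


def cluster_peaks(peak_lists, min_sn=3.0, rel_window=0.003):
    """Group peaks from several spectra into m/z clusters.

    peak_lists: per-spectrum lists of (mz, signal_to_noise) tuples. Peaks below
    min_sn are dropped; a sorted peak joins the current cluster if it lies within
    rel_window of the cluster's lowest m/z. Returns the mean m/z of each cluster.
    """
    kept = sorted(mz for peaks in peak_lists for mz, sn in peaks if sn >= min_sn)
    clusters = []
    for mz in kept:
        if clusters and mz - clusters[-1][0] <= rel_window * clusters[-1][0]:
            clusters[-1].append(mz)
        else:
            clusters.append([mz])
    return [sum(c) / len(c) for c in clusters]


# TICs above 2000 Da are 4 and 8, so both spectra are scaled to the mean TIC of 6.
mz = np.array([1000.0, 2500.0, 3000.0])
normalized = normalize_to_mean_tic([(mz, [100.0, 1.0, 3.0]), (mz, [50.0, 2.0, 6.0])])
print(cluster_peaks([[(5900.0, 8.0), (7764.0, 5.0)],
                     [(5905.0, 6.0), (7770.0, 4.0), (8500.0, 2.0)]]))
# [5902.5, 7767.0]
```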

4. Notes 

1. Use powder-free nitrile (not latex) gloves when processing SELDI ProteinChips. 

Repetitive peaks at 3000–4000 Da will appear in the spectra if samples are 

contaminated with latex. 

2. Use sample sets of sufficient size. A sample set of at least 30 should be included 

in each classification group in order to do multivariate analysis and to give >90% 

statistical confidence in a single marker with p values


Liotta, L. A. (2002). Use of proteomic patterns in serum to identify ovarian cancer. 

Lancet, 359: 572–577. 

4. de Noo, M. E., Mertens, B. J., Ozalp, A., Bladergroen, M. R., van der Werff, M. P., van de Velde, C. J., Deelder, A. M., and Tollenaar, R. A. (2006). Detection of colorectal cancer using MALDI-TOF serum protein profiling. Eur J Cancer, 42: 1068–1076. 

5. Sidransky, D., Irizarry, R., Califano, J. A., Li, X., Ren, H., Benoit, N., and Mao, L. 

(2003). Serum protein MALDI profiling to distinguish upper aerodigestive tract 

cancer patients from control subjects. J Natl Cancer Inst, 95: 1711–1717. 

6. Howard, B. A., Wang, M. Z., Campa, M. J., Corro, C., Fitzgerald, M. C., and 

Patz, E. F. Jr. (2003). Identification and validation of a potential lung cancer serum 

biomarker detected by matrix-assisted laser desorption/ionization-time of flight 

spectra analysis. Proteomics, 3: 1720–1724. 

7. Baumann, S., Ceglarek, U., Fiedler, G. M., Lembcke, J., Leichtle, A., and Thiery, J. 

(2005). Standardized approach to proteome profiling of human serum based on 

magnetic bead separation and matrix-assisted laser desorption/ionization time-of-flight 

mass spectrometry. Clin Chem, 51: 973–980. 

8. Orvisky, E., Drake, S. K., Martin, B. M., Abdel-Hamid, M., Ressom, H. W., 

Varghese, R. S., An, Y., Saha, D., Hortin, G. L., Loffredo, C. A., and Goldman, R. 

(2006). Enrichment of low molecular weight fraction of serum for MS analysis of 

peptides associated with hepatocellular carcinoma. Proteomics, 6: 2895–2902. 

9. Feuerstein, I., Rainer, M., Bernardo, K., Stecher, G., Huck, C. W., Kofler, K., 

Pelzer, A., Horninger, W., Klocker, H., Bartsch, G., and Bonn, G. K. (2005). 

Derivatized cellulose combined with MALDI-TOF MS: a new tool for serum 

protein profiling. J Proteome Res, 4: 2320–2326. 

10. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D., 

Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P., Speicher, 

D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W. (2005). HUPO plasma 

proteome project specimen collection and handling: towards the standardization of 

parameters for plasma proteome samples. Proteomics, 5: 3262–3277. 

11. Banks, R. E., Stanley, A. J., Cairns, D. A., Barrett, J. H., Clarke, P., Thompson, D., 

and Selby, P. J. (2005). Influences of blood sample processing on low-molecular 

weight proteome identified by surface-enhanced laser desorption/ionization mass 

spectrometry. Clin Chem, 51: 1637–1649. 

12. Villanueva, J., Philip, J., Entenberg, D., Chaparro, C. A., Tanwar, M. K., 

Holland, E. C., and Tempst, P. (2004). Serum peptide profiling by magnetic 

particle-assisted, automated sample processing and MALDI-TOF mass 

spectrometry. Anal Chem, 76: 1560–1570. 

13. Guerrier, L., Thulasiraman, V., Castagna, A., Fortis, F., Lin, S., Lomas, L., 

Righetti, P. G., and Boschetti, E. (2006). Reducing protein concentration range 

of biological samples using solid-phase ligand libraries. J Chromatogr B Analyt 

Technol Biomed Life Sci, 833: 33–40. 

14. Fountoulakis, M., Juranville, J. F., Jiang, L., Avila, D., Roder, D., Jakob, P., 

Berndt, P., Evers, S., and Langen, H. (2004). Depletion of the high-abundance 

plasma proteins. Amino Acids, 27: 249–259.


15. Lowenthal, M. S., Mehta, A. I., Frogale, K., Bandle, R. W., Araujo, R. P., 

Hood, B. L., Veenstra, T. D., Conrads, T. P., Goldsmith, P., Fishman, D., Petricoin, 

E. F. 3rd, and Liotta, L. A. (2005). Analysis of albumin-associated peptides and 

proteins from ovarian cancer patients. Clin Chem, 51: 1933–1945. 

16. Mehta, A. I., Ross, S., Lowenthal, M. S., Fusaro, V., Fishman, D. A., 

Petricoin, E. F. 3rd, and Liotta, L. A. (2003). Biomarker amplification by serum 

carrier protein binding. Dis Markers, 19: 1–10. 

17. Semmes, O. J., Feng, Z., Adam, B. L., Banez, L. L., Bigbee, W. L., Campos, D., 

Cazares, L. H., Chan, D. W., Grizzle, W. E., Izbicka, E., Kagan, J., Malik, G., 

McLerran, D., Moul, J. W., Partin, A., Prasanna, P., Rosenzweig, J., Sokoll, L. J., 

Srivastava, S., Srivastava, S., Thompson, I., Welsh, M. J., White, N., Winget, M., 

Yasui, Y., Zhang, Z., and Zhu, L. (2005). Evaluation of serum protein profiling by 

surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for 

the detection of prostate cancer: I. Assessment of platform reproducibility. Clin 

Chem, 51: 102–112. 

18. Rai, A. J., Stemmer, P. M., Zhang, Z., Adam, B. L., Morgan, W. T., Caffrey, R. E., 

Podust, V. N., Patel, M., Lim, L. Y., Shipulina, N. V., Chan, D. W., Semmes, O. J., 

and Leung, H. C. (2005). Analysis of human proteome organization plasma 

proteome project (HUPO PPP) reference specimens using surface enhanced 

laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multi-institution 

correlation of spectra and identification of biomarkers. Proteomics, 5: 

3467–3474. 

19. Semmes, O. J., Cazares, L. H., Ward, M. D., Qi, L., Moody, M., Maloney, E., 

Morris, J., Trosset, M. W., Hisada, M., Gygi, S., and Jacobson, S. (2005). Discrete 

serum protein signatures discriminate between human retrovirus-associated 

hematologic and neurologic disease. Leukemia, 19: 1229–1238. 

20. Qian, H. G., Shen, J., Ma, H., Ma, H. C., Su, Y. H., Hao, C. Y., Xing, B. C., 

Huang, X. F., and Shou, C. C. (2005). Preliminary study on proteomics of gastric 

carcinoma and its clinical significance. World J Gastroenterol, 11: 6249–6253. 

21. Ressom, H. W., Varghese, R. S., Abdel-Hamid, M., Eissa, S. A., Saha, D., 

Goldman, L., Petricoin, E. F., Conrads, T. P., Veenstra, T. D., Loffredo, C. A., 

and Goldman, R. (2005). Analysis of mass spectral serum profiles for biomarker 

selection. Bioinformatics, 21: 4039–4045. 

22. Liu, J., Zheng, S., Yu, J. K., Zhang, J. M., and Chen, Z. (2005). Serum protein 

fingerprinting coupled with artificial neural network distinguishes glioma from 

healthy population or brain benign tumor. J Zhejiang Univ Sci B, 6: 4–10. 

23. Qu, Y., Adam, B. L., Yasui, Y., Ward, M. D., Cazares, L. H., Schellhammer, P. F., 

Feng, Z., Semmes, O. J., and Wright, G. L. Jr. (2002). Boosted decision tree analysis 

of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates 

prostate cancer from noncancer patients. Clin Chem, 48: 1835–1843. 

24. Papadopoulos, M. C., Abel, P. M., Agranoff, D., Stich, A., Tarelli, E., Bell, B. A., 

Planche, T., Loosemore, A., Saadoun, S., Wilkins, P., and Krishna, S. (2004). A 

novel and accurate diagnostic test for human African trypanosomiasis. Lancet, 363: 

1358–1363.

8 

Urine Sample Preparation and Protein Profiling 

by Two-Dimensional Electrophoresis 

and Matrix-Assisted Laser Desorption Ionization 

Time of Flight Mass Spectroscopy 

Panagiotis G. Zerefos and Antonia Vlahou 

Summary 

Urine represents the most easily attainable and consequently one of the most common 

samples in clinical analysis and diagnostics. However, urine is also considered one of 

the most difficult proteomic samples to work with due to its highly variable contents, 

as well as the presence of various proteins in low abundance or modified forms. In this 

chapter, we describe simple protocols and troubleshooting tips for urinary protein preparation 

and profiling by two-dimensional electrophoresis or directly via matrix-assisted laser 

desorption ionization time of flight mass spectroscopy. Direct dilution, protein precipitation, ultrafiltration, and solid phase extraction, in combination with the above profiling technologies, provide the means for reliable proteomic analysis of one of the most significant, yet most complex, biological samples. 

Key Words: urine; 2DE; MALDI-TOF-MS; protein profiling; sample preparation. 

Abbreviations: ACT: Acetone, CE: Capillary electrophoresis, CHAPS: 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, CHCA: α-Cyano-4-hydroxycinnamic acid, d: Dalton, 2DE: Two-dimensional gel electrophoresis, DHB: Dihydroxybenzoic acid, DTE: 1,4-Dithioerythritol, IEF: Isoelectric focusing, IPG: Immobilized pH gradient, LC: Liquid chromatography, MALDI: Matrix-assisted laser desorption ionization, MS: Mass spectrometry, MW: Molecular weight, MWCO: Molecular weight cut-off, ns: Nanosecond, o/n: Overnight, RCF: Relative centrifugal force, SA: Sinapinic acid, SDS: Sodium dodecylsulfate, SELDI: Surface-enhanced laser desorption/ionization, SPE: Solid phase extraction, TCA: Trichloroacetic acid, TFA: Trifluoroacetic acid, TGS: Tris-Glycine-SDS, TOF: Time of flight, UF: Ultrafiltration 



1. Introduction 


Biological fluids play a central role in clinical chemistry. Investigation of their cellular (cell number, morphology, etc.), biochemical (metabolites, biomolecules), and physicochemical (pH, transparency, absorption, etc.) attributes assists in formulating the clinical judgment on disease prognosis, diagnosis, and treatment. Urine, according to the International Union of Pure and Applied Chemistry, is the human fluid that contains water and metabolic products and is excreted by the kidneys, stored in the bladder, and normally discharged by way of the urethra. The protein content of urine is very low under normal conditions (1) and derives mainly from plasma proteins that are not retained by the renal glomeruli. The presence of proteins at high concentrations in urine is usually the result of disease or pharmaceutical treatment. Creatinine assay in urine is one of the most common clinical examinations and serves exactly this purpose: to assess unexpected protein excretion. 

It should be noted that, besides the soluble proteins, urine also contains proteins within exfoliated cells as well as in membrane vesicles known as exosomes (2). In this chapter, we focus on methods for the analysis of the soluble urinary proteins and refer the interested reader to the review by Pisitkun et al. (2) for a thorough description of the other urinary protein components. 

In comparison to other proteomics samples, urine is still less explored, mainly because it is a difficult and diverse sample. Its composition depends on age, sex, health, and drug intake. In addition, large variations in protein content over the course of a day exist between first-void, midstream, morning, and random-catch urine samples of a single donor. Despite these facts, protein markers for disease have been detected in urine and have been approved for use as adjuncts to clinical assays for disease diagnosis and prognosis (3,4). This justifies an in-depth analysis of the urinary proteome, particularly with the advent of contemporary proteomics technologies, with the objective of identifying novel diagnostic/prognostic disease biomarkers. 

The urinary proteome has been studied by a series of proteomics technologies. These include two-dimensional electrophoresis (5,6), liquid chromatography (LC) in combination with mass spectrometry (MS) (7,8), matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) or surface-enhanced laser desorption/ionization (SELDI)-TOF profiling (9,10,11,12,13), capillary electrophoresis coupled to MS (14,15), and combinations thereof implementing several separation steps, both chromatographic and electrophoretic (15,16,17,18,19). The great interest in the investigation of the urinary proteome is reflected by the recent establishment of the human urine and kidney proteome initiative (http://hkupp.kir.jp) within the Human Proteome Organization, which aims to integrate existing research efforts in this field. 

In this chapter, we provide detailed protocols and troubleshooting tips, based on the authors' experience, for the preparation and analysis of urinary proteins by two-dimensional gel electrophoresis (2DE) or directly by MALDI-TOF-MS. We selected these two profiling approaches because the former is a classical high-resolution profiling approach (see also Chapters 4–6), whereas the latter offers the advantage of high throughput (see also Chapters 7 and 13). In general, the process of urine analysis for the investigation of its protein content can be divided into three main steps: sample collection (usually performed at the physician's office), protein extraction, and protein separation and detection. Each of these steps is crucial and significantly affects the output of the proteomics experiment. In this chapter, emphasis is placed on the various protein preparation/extraction methodologies, including ultrafiltration, precipitation, and solid phase extraction (SPE), as they complement 2DE and MALDI-TOF-MS profiling. Additional protein preparation methods exist, such as dialysis and ultracentrifugation (see Note 1); however, we have focused on the three aforementioned methods because of their simplicity, increased reproducibility, and overall compatibility with the 2DE and MALDI MS profiling approaches. 

1.1. Protein Precipitation 

Protein precipitation is a very common purification procedure employed for the isolation of macromolecules. The denaturation and precipitation of proteins occur in solutions of extreme ionic strength, very low pH, or high concentrations of organic solvents. Under such conditions, biopolymers do not retain a conformation capable of sustaining their solubility. Commonly used reagents are ammonium sulfate ((NH4)2SO4), used for salting out proteins at concentrations of 3 M; trichloroacetic acid (TCA), used at concentrations higher than 5% (w/v); and several organic solvents (ethanol, acetone, acetonitrile, chloroform, methanol, and isopropanol) at final concentrations higher than 50% (v/v). The choice of precipitation methodology depends primarily on the analytical procedure employed. In general, salting out is avoided in proteomics sample preparation, since residual salts interfere with further analysis by 2DE and mass spectrometry. TCA precipitation followed by acetone washes is very popular and efficient, especially for very dilute protein solutions. Organic solvents offer very high yields, but some of them are toxic (methanol, acetonitrile), while others, such as chloroform (also toxic), require rather complicated precipitation procedures. A detailed description of these approaches for urinary protein preparation is provided in Section 3. 

1.2. Ultrafiltration-SPE 

Ultrafiltration is a technique based on the use of molecular filters in combination with centrifugal force. The whole procedure is performed in a centrifuge at temperatures ranging from 4°C to ambient. It presents many advantages; for example, proteins are kept in solution and are more easily handled. A major disadvantage is the cost of the approach and the fact that even traces of the filter materials, when eluted, cause significant problems in MS-based methodologies. 

Solid phase extraction in combination with MS is a newly added approach for urine clinical proteomics (22). SPE in the form of magnetic particles was recently developed as the front end of direct profiling of biological fluids by MS (23). 

We have found that acetone or TCA precipitation and ultrafiltration are very efficient urinary protein preparation approaches, highly compatible with 2DE analysis (Figs. 1, 2). For MALDI MS profiling, we favor ultrafiltration and SPE, as well as direct dilution of urine in MS-compatible buffers, as front-end protein preparation methods (see Note 2, Fig. 3). The detailed protocols are provided below. 


Fig. 1. Comparison of urinary sample preparation approaches. Lanes correspond to: 

(1) marker, (2) urine starting material, (3) TCA/acetone precipitation supernatant, (4) 

TCA precipitate, (5) urine supernatant after 3 h centrifugation at 200,000 RCF, (6) 

protein pellet after ultracentrifugation of 5 mL urine, (7) urine filtrate after ultrafiltration 

through 5 kd MWCO, and (8) urine retentate after ultrafiltration. In lanes 2, 3, 5, and 

7, equal volumes of urine sample were used; similarly, lanes 4 and 8 correspond to the same amount of starting urine material, to facilitate comparison of the approaches.


2. Materials 

2.1. Sample Collection, Handling, and Storage 

1. Polypropylene aliquoting tubes (1.5, 2, 15, and 50 mL), Sarstedt Corporation 

(Nümbrecht, Germany) 

2.2. Urine Sample Preparation/Protein Precipitation 

2.2.1. TCA/Acetone Precipitation Protocol 

1. Trichloroacetic acid, ultra pure (store solutions at 2–8°C), Sigma Corporation 

(St. Louis, MO, USA) 

2. Acetone, analytical purity grade, Sigma Corporation 

2.2.2. Organic Solvent Precipitation Protocol 

1. Acetone, analytical purity grade, Sigma Corporation 

2. Isopropanol, analytical purity grade, Sigma Corporation 

3. Ethanol, analytical purity grade, Sigma Corporation 

2.2.3. Urine Ultrafiltration 

1. Amicon ultrafiltration devices, Millipore Corporation (Billerica, MA, USA) 

2.2.4. Urine SPE 

1. Bioselect C18 SPE cartridges, Grace Vydac (Columbia, MD, USA) 

2. Methanol, high performance liquid chromatography (HPLC) grade, Sigma Corporation 

3. Acetonitrile, HPLC grade, Sigma Corporation 

4. Trifluoroacetic acid, HPLC grade, Sigma Corporation 

2.3. Analytical/Profiling Techniques 

2.3.1. Two-Dimensional Separation 

1. Protean isoelectric focusing (IEF) cell, Biorad (Hercules, CA, USA) 

2. Nonlinear immobilized pH gradient (IPG) strips, pH 3–10, 17 cm long 

3. 2DE sample buffer: 7 M urea, 2 M thiourea, 4% CHAPS w/v, 0.4% 1,4-dithioerythritol (DTE) w/v, 2% IPG buffer (Biorad) w/v; all components are of molecular biology grade 

4. Mineral oil 

5. Equilibration buffer I: 6 M urea, 50 mM Tris–HCl, pH 8.8, 30% glycerol, 2.0% 

sodium dodecylsulfate (SDS), 30 mM DTE 

6. Equilibration buffer II: 6 M urea, 50 mM Tris–HCl, pH 8.8, 30% glycerol (v/v), 2.0% SDS (w/v), 230 mM iodoacetamide. All components are of molecular biology grade


7. Fixation solution: 5% phosphoric acid (p.a. grade, Sigma) w/v, 50% methanol v/v (HPLC grade, Sigma) 

8. Colloidal coomassie brilliant blue staining kit, Invitrogen (Carlsbad, CA, USA) 

9. GS-800 calibrated densitometer and PDQuest software, Biorad 

2.3.2. MALDI-TOF-MS 

1. Matrix solution: 50% acetonitrile v/v, 0.1% trifluoroacetic acid (TFA) v/v, 0.75% α-cyano-4-hydroxycinnamic acid (CHCA, Sigma Corporation). Caution: all MALDI matrices are light sensitive; avoid unnecessary light exposure. Fresh preparation is advised; otherwise, store at 4°C for 1 week at most 

2. MALDI ground steel target plate 

3. Ultraflex I MALDI-TOF-TOF-MS (Bruker Daltonics, Bremen, Germany) 

4. FlexAnalysis 2.2 software, Bruker Daltonics 

2.4. Miscellaneous 

The HPLC grade water (resistivity >18 MΩ·cm, total organic carbon (TOC)



Fig. 2. Two-dimensional profiling of (A) 24 h collected urine concentrated by 

(1) ultrafiltration through 5000 MWCO, (2) TCA precipitation, (3) acetone precipitation 

without washing of the protein pellet, and (4) acetone precipitation with pellet washing. 

In these cases (1,2,3,4), the starting material was preconcentrated via membrane filtration (Pellicon 2 system, Millipore Corporation); ultrafiltration and TCA or acetone precipitation, as applicable, were then applied to concentrate the sample further prior to 2DE analysis. (B) Two-dimensional profiling of random catch urine (50 mL starting volume without any preconcentration) concentrated via (1) ultrafiltration through 5000 MWCO and (2) acetone precipitation. In all cases, 1 mg of protein was separated on pH 3–10 nonlinear IPG strips and visualized with colloidal Coomassie stain. 

6. Let pellet dry at ambient temperature (see Note 11). 

7. Solubilize pellet in 2DE sample buffer and proceed with 2DE analysis (see 

Subheading 3.3.1, Note 12, and Fig. 2). 

8. The protein pellet may also be subjected to solubilization with MS compatible 

buffers and analyzed by MS profiling (see Note 13, Subheading 3.3.2, and 

Fig. 3). 

3.2.2. Organic Solvent Precipitation Protocol 

1. Add at least an equal volume of the desired organic solvent (ethanol, acetone, or isopropanol) to the urine sample and mix (see Notes 14, 15). 

2. Keep at –20°C o/n (see Note 16).


(Fig. 3 plot: spectra A–G; y-axis, intensity (×10^4); x-axis, mass to charge, 1000–10,000.) 

Fig. 3. MALDI-TOF-MS profiling of urine. (A) Ultrafiltration retentate through 

5000 MWCO, diluted 10× in 0.1% TFA; (B) 10× dilution of urine in 0.1% TFA; 

(C) supernatant of urine (diluted in 0.1% TFA) after protein precipitation via acetone; 

(D) urine protein pellet from acetone precipitation reconstituted in 0.1% TFA; (E) urine 

protein pellet from acetone precipitation reconstituted in 50% acetonitrile 0.1% TFA; 

(F) acetone precipitation (supernatant) and further purification of the supernatant by 

C18-SPE followed by dilution in 0.1% TFA; (G) C18-SPE eluate in 50% acetonitrile, 

0.1% TFA. Extensive reproducibility studies indicated that, of the methods tested, urine processing by ultrafiltration or direct dilution in 0.1% TFA provides the most robust spectra. Adapted from (13). 

3. Centrifuge in a standard refrigerated bench-top centrifuge (for Eppendorf-type tubes) for 15 min at 16,000–17,000 RCF and 4°C. Discard the supernatant. 

4. Wash pellet with ice-cold acetone, leave for 5–10 min at –20°C, and centrifuge again. Discard the supernatant and repeat the washing step once more (see Note 17). 

5. Let pellet dry at ambient temperature. 

6. Solubilize pellet and proceed with 2DE analysis. The protein pellet or supernatant 

may also be subjected to solubilization with MS compatible buffers and analyzed 

by MS profiling (see Notes 12, 13, Subheading 3.3.1, Figs. 2, 3). 

3.2.3. Urine Ultrafiltration 

1. Place one volume of urine in a 5-kd (5000 Da) molecular weight cut-off (MWCO) Amicon ultrafiltration device (see Notes 18–20).


2. Spin in a refrigerated centrifuge at 3500 RCF and 8–12°C (see Notes 21, 22). 

3. After concentration, collect the retentate and discard or keep the filtrate, depending on the specific application (see Notes 23–25). 

4. For 2DE add the appropriate volume of sample buffer to the retentate and proceed 

with IEF (see Notes 26–27, Subheading 3.3.1, and Fig. 2). 

5. For MALDI profiling, dilute the retentate 10-fold with 0.1% TFA v/v and proceed as described below (see Subheading 3.3.2, Fig. 3). 

3.2.4. Urine SPE (see Note 28) 

1. Activate cartridge with a total of 1 mL methanol (two applications of 500 μL each). 

2. Wash cartridge with 2 mL acetonitrile (four applications of 500 μL each, see 

Note 29). 

3. Equilibrate cartridge with a total of 1 mL 0.1% TFA v/v (two applications of 

500 μL each). 

4. Load cartridge with 1 mL urine acidified by TFA at 0.1% (v/v) final concentration. 

5. Wash cartridge with 1 mL 0.1% TFA v/v (two applications of 500 μL each). 

6. Elute compounds by adding 100 μL of 50% acetonitrile, 0.1% TFA v/v. 

7. Take 1 μL of the eluate, place it on the MALDI target, and process for MALDI MS profiling (see Subheading 3.3.2, Fig. 3). 
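The SPE protocol also concentrates the sample: 1 mL of urine is loaded and eluted in 100 μL. The arithmetic below compares the urine volume represented by one 1-μL spot after SPE with that after the 10-fold direct dilution of Subheading 3.2.5. It assumes complete binding and recovery on the cartridge and ignores the small volume of TFA added for acidification; neither holds exactly in practice, and the helper name is hypothetical.

```python
def urine_equivalent_ul(spot_ul, concentration_factor):
    """Original urine volume (uL) represented by one spot, assuming full recovery."""
    return spot_ul * concentration_factor


spe_factor = 1000 / 100   # 1 mL loaded, 100 uL eluate -> 10-fold concentration
dilution_factor = 1 / 10  # 10-fold dilution in 0.1% TFA

spe_ul = urine_equivalent_ul(1.0, spe_factor)            # urine per SPE spot
dilution_ul = urine_equivalent_ul(1.0, dilution_factor)  # urine per diluted spot
print(spe_ul, dilution_ul, round(spe_ul / dilution_ul))  # 10.0 0.1 100
```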

3.2.5. Direct Dilution of Urine 

This method is used only in conjunction with direct MALDI MS profiling: 

• Dilute urine 10 times with 0.1% TFA v/v (see Notes 30, 31). 

• Apply 1 μL of the urine sample on MALDI target. 

• Apply 1 μL matrix solution. 

• Proceed with MALDI-TOF-MS (see Subheading 3.3.2, Fig. 3). 

3.3. Analytical/Profiling Techniques 

3.3.1. Two-dimensional Separation 

1. Measure the protein concentration of the sample (pretreated by precipitation or ultrafiltration) using a commercially available protein assay kit. 

2. Take 0.5–1 mg of urinary proteins diluted in 300 μL of 2DE sample buffer (see 

Note 32). 

3. Distribute the sample volume evenly along a lane of the IEF focusing tray. 

4. Place the strip carefully, with the gel face down and in contact with the electrodes 

(see Note 33). 

5. Rehydrate actively for 16 h at 50 V and 20°C. Caution: do not cover the strip with mineral oil immediately, but only after 1 h of rehydration (see Note 34). 

6. After rehydration, place moistened IEF papers between the strip and electrodes. 

7. Start IEF. The typical program is: 250 V for 30 min, linear increment up to 

5000 V in 12 h, 5000 V for 16 h (total 110,000 V-h) (see Note 35).


8. After IEF is complete, equilibrate strip with 10 mL equilibration buffer I for 

20 min at ambient temperature. 

9. Alkylate with 10 mL equilibration buffer II for 20 min (see Note 36). 

10. Place strip on top of a 12.5% polyacrylamide gel, cover with 0.5% melted agarose in TGS buffer, and start the second dimension. Start with 10 mA current for 1 h and continue with 40 mA for approximately another 4 h (see Note 37). 

11. Fix gel for 2 h with fixation solution. 

12. Stain o/n with colloidal coomassie blue stain (Fig. 2). 
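The volt-hour total quoted in step 7 can be checked from the IEF program. The Python sketch below integrates voltage over time, assuming the ramp rises linearly from 250 V to 5000 V and that the cell actually reaches the programmed voltages (a current-limited run may deliver fewer volt-hours). The function name is hypothetical.

```python
def volt_hours(program, start_voltage=0.0):
    """Integrate voltage over time for an IEF program.

    program: sequence of (mode, target_voltage, hours); 'hold' keeps the target
    voltage, 'ramp' rises linearly from the previous voltage to the target.
    """
    total = 0.0
    voltage = start_voltage
    for mode, target, hours in program:
        if mode == "hold":
            total += target * hours
        elif mode == "ramp":
            total += (voltage + target) / 2 * hours  # area under a linear ramp
        else:
            raise ValueError(f"unknown step mode: {mode!r}")
        voltage = target
    return total


ief_program = [("hold", 250, 0.5), ("ramp", 5000, 12), ("hold", 5000, 16)]
print(volt_hours(ief_program))  # 111625.0
```

The result, about 111,600 V-h, is consistent with the approximately 110,000 V-h given in step 7.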

3.3.2. MALDI-TOF-MS 

1. Place 1 μL sample on the MALDI target plus 1 μL matrix solution and mix on 

spot (dried droplet technique, see Notes 38 and 39). 

2. Leave target to dry at ambient temperature in the dark. 

3. Load sample in the instrument and execute the appropriate MS method. Run the 

instrument in linear mode (see Note 40). 

4. Optimize ion acceleration; adjusting detector sensitivity is not recommended before the MS method has been established (see Note 41). 

5. Set pulsed ion extraction (delayed ion acceleration) according to the profiling region in use. Typically, when α-cyano-4-hydroxycinnamic acid is used, delays of 50–150 ns are applied for large peptides (3–5 kd), 150–300 ns for low molecular weight proteins (~15 kd), and more than 300 ns for proteins (>20 kd; see Notes 42 and 43). 

6. Collect 1000–2000 shots per sample and sum the collected data (see Note 44). 
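Summing many shots (step 6) improves the signal-to-noise ratio because, for random noise, the summed noise grows only as the square root of the number of shots while the signal grows linearly. The NumPy simulation below illustrates this with synthetic data; the noise model, the peak position, and the helper names are assumptions, and the S/N estimate (peak height over the noise standard deviation) is simplified.

```python
import numpy as np


def accumulate(shots):
    """Sum single-shot spectra; rows are shots, columns are m/z (or time) bins."""
    return np.asarray(shots, dtype=float).sum(axis=0)


def signal_to_noise(spectrum, peak_index, noise_region):
    """Peak height divided by the standard deviation of a peak-free region."""
    return spectrum[peak_index] / np.std(spectrum[noise_region])


rng = np.random.default_rng(0)
n_shots, n_bins, peak_index = 1000, 200, 150
shots = rng.normal(0.0, 1.0, size=(n_shots, n_bins))  # unit Gaussian noise per shot
shots[:, peak_index] += 1.0                            # weak peak: S/N about 1 per shot

summed = accumulate(shots)
snr = signal_to_noise(summed, peak_index, slice(0, 100))
print(round(snr, 1))  # expected near sqrt(1000), i.e. roughly 32
```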

4. Notes 

1. Dialysis is one of the most classical methods for buffer exchange and for the purification (separation) of high from low molecular weight constituents of a sample. Although it has been used elsewhere (20), we consider it rather laborious and costly, and it serves only purification, not concentration, purposes. Ultracentrifugation has been applied (21) for the isolation of higher molecular weight urinary proteins prior to 2DE (Fig. 1). In our opinion, centrifugal isolation of proteins is a very diverse and complicated issue, and reproducibility is consequently compromised. Precipitation of biopolymers by ultracentrifugation requires solutions of precisely calculated composition in order to derive the velocity for protein isolation from the theoretical Svedberg values. Urine samples differ too much in density (d = m/v) and pH to serve such purposes in a well-defined and reproducible manner. 

2. It should be emphasized that the various methods are extensively complementary; combining different methods is therefore recommended in order to increase protein resolution.

3. Urine samples can be first void, midstream, morning, random catch, or 24 h. 

Due to its high bacterial content, first morning urine is usually not recommended 

in biomarker discovery studies.


4. Upon collection, if not stored immediately at –80°C, urine samples should be kept at 4°C. Published data (9,10) indicate that, for analysis by 2DE or SELDI/MALDI MS, the generated proteomic profiles are usually stable for up to 24 h of urine storage at 4°C prior to deep freezing. We have observed occasional profile changes after such prolonged storage at 4°C, and we therefore favor shorter times.

5. The soluble supernatant may be enriched in cellular proteins by applying mild sonication (sonicator bath) for 5–10 min prior to the centrifugation step.

6. The volume of urine required depends on the specific downstream application. 

For 2DE analysis, an aliquot of at least 15 mL of urine is required. For direct MALDI MS profiling, a 1-mL urine aliquot is sufficient.

7. TCA can be added as a solid to a final concentration of 15% (w/v) (TCA is extremely hygroscopic and dissolves easily). Alternatively, the appropriate volume of 100% (w/v) TCA may be added to the urine sample to reach a final concentration of 15% (w/v). TCA precipitation can also be performed at –20°C with o/n incubation, occasionally with slightly better efficiency. Caution: at –20°C or lower temperatures, TCA solutions may form aqueous–organic bilayers, depending on the salt concentration of the urine. The precipitation efficiency depends on the protein concentration of a given sample. In our experience, for example, the precipitation yield for a starting material of 0.5 mg/mL protein (i.e., 1 mg total protein in 2 mL sample) ranges from 40 to 70%; in contrast, the precipitation efficiency for a starting material of 0.1 mg/mL protein (i.e., 1 mg protein in 10 mL sample) is 0–30%. For this reason, avoid adding TCA solution to very dilute protein samples.

8. If the highest available centrifugal force is only 4000–5000 × g (RCF), longer centrifugation times (45 min) are recommended.

9. The volume of acetone utilized for washing depends on the size of the protein 

pellet. A general rule is to use 1 mL acetone for every 1 mL of urine starting 

material. 

10. Acetone washes are needed to drive off excess TCA; otherwise the pellet remains extremely acidic and exhausts the buffering capacity of the buffers used in subsequent steps. In addition, TCA (a nonvolatile acid) may interfere with IEF, PAGE, LC, or MS analysis. We have found that acetone washes of the pellet do not cause significant protein losses.

11. The pellet should not be dried completely, since this makes its subsequent solubilization in 2DE or other buffers difficult. Acetone evaporation at elevated temperatures is not recommended for the same reason.

12. If the pellet does not dissolve, try mild sonication (5 min in a sonicator bath) or incubate at ambient temperature for 30 min with intermittent vortexing. However, heating should be avoided (particularly if the pellet is resuspended in 2DE buffer, since urea decomposes when heated and its breakdown products react with amino acid residues, causing carbamylation).

The buffer volume required for solubilization depends on the protein content 

(pellet size) and the type of downstream application (2DE or MALDI-TOF-MS).


13. The protein pellet may be solubilized in 0.1% TFA v/v (roughly 100 μL of 

solubilization buffer for every milliliter of urine starting material) and analyzed 

by MALDI-TOF-MS. However, in our experience, plasticizers possibly extracted 

during the precipitation process are frequently detected and reproducibility 

problems are observed. Therefore, unless additional purification steps are introduced 

(SPE, etc.), we do not favor the application of precipitation methods at 

the front end of MALDI MS profiling. 

14. The use of ethanol, acetone, or isopropanol is favored. These solvents are less polar than water yet water-miscible, even at elevated salt concentrations; they are also of low toxicity and volatile. In particular, we favor acetone since it is cheap, extremely volatile, and rarely forms aqueous–organic bilayers. Organic solvent mixtures (e.g., isopropanol–acetone) do not increase precipitation efficiency; in our experience, their use induces reproducibility problems and is therefore not recommended.

15. The solvent-to-sample ratio depends on the downstream application and the sample protein concentration. For dilute urine samples (protein concentration in the micrograms per milliliter range), a solvent-to-sample ratio of 3 provides relatively high precipitation efficiencies. We have observed that for more concentrated samples (for example, preconcentrated urine or, in general, starting material with protein content in the milligrams per milliliter range), the precipitation efficiency for lower-MW constituents reaches its maximum at a solvent-to-sample ratio of about 9.

16. Precipitation is most efficient at –20°C (lower efficiencies have been observed at 

4°C, whereas at –80°C bilayer systems may form, which inhibit the procedure). 

17. Acetone washes of the pellet are not customary in organic solvent precipitation protocols. In our experience, however, washing offers great advantages, especially when 2DE separation is the downstream application, since salts and other interfering substances are removed (Fig. 2). This washing step makes 2DE gels produced after acetone precipitation as good as those generated following TCA precipitation. Acetone washing induces negligible protein losses.

18. There are Amicon UF devices that can accommodate up to 4 (UF4) or 15 mL 

(UF15) sample volumes. We regularly utilize the UF4 devices when MALDI MS 

profiling is to be performed and UF15 when 2DE is the downstream application. 

19. Amicon devices are available with several MWCOs. We propose the use of a 5 kDa (5000 Da) MWCO for the isolation and concentration of the “total” urine protein content. Different MWCOs are advised for the specific isolation of molecular weight groups (see also Note 25). It should be emphasized that UF is not an absolute size-exclusion separation method, and cross-contamination between different protein size groups is expected and regularly observed.

20. UF can be performed in the presence of chemical additives. The kind of additives used depends on the downstream application (2DE, MALDI profiling, LC-MS, etc.), since in all cases chemical compatibility with the latter must be maintained. For example, we have observed that in the case of direct MALDI MS profiling most additives (detergents such as octyl-glucopyranoside, Triton X-100, and Tween-20, and organic solvents such as trifluoroethanol and isopropanol


(e.g., phospho- or glycopeptides) is feasible, and this is what differentiates SPE from other sample preparation steps. From our point of view, SPE in combination with direct MS profiling is encouraged.

29. All chromatographic and SPE media contain residual contaminants and plasticizers, which should be washed off prior to analyte binding. Failure to perform this step may result in complete ion suppression during MALDI profiling.

30. The user may have to try different dilutions of the urine sample. In MALDI MS profiling experiments, there is a range of protein concentrations within which spectral quality is not affected. Preliminary experiments are advised to determine this range.

31. In addition to TFA, several additives (urea, octyl-glucopyranoside, Triton X-100, Tween-20, NP-40, cholate, and organic solvents) at MALDI MS-compatible concentrations have been tested for their effect on urinary peptide and protein ionization. However, we did not observe any clear advantage in protein resolution or ionization in these cases.

32. The recommended protein amount of 0.5–1 mg is suitable for 17–18 cm length 

and 3–10 or 4–7 pH range strips. The protein amount will vary if different strip 

types are utilized, according to the manufacturer’s guidelines (for additional tips 

on 2DE see Chapters 4–6). 

33. Noncup loading was found to provide better resolution in urine analysis by 2DE 

compared to the cup loading method. 

34. Direct addition of the mineral oil might cause extraction of hydrophobic proteins into the oil layer.

35. These running conditions are for the analysis of 1 mg protein sample on wide 

range (3–10 or 4–7) 17 or 18 cm IPG strips. The program will vary depending 

on the sample quantity and the type of strip in use. 

36. Reduction and alkylation are necessary for higher protein resolution in SDS-PAGE and also for protein identification through peptide mass fingerprinting.

37. The low starting current allows slow migration of the proteins from the strip into the polyacrylamide gel. Running directly at 40 mA may cause protein losses. Alternatively, the gel may be run at 10 mA o/n. Although slower, the latter approach provides gels of higher resolution, in our experience (for additional tips on 2DE, see Chapters 4–6).

38. Several sample application techniques were tested (thin layer preparation, double 

layer, and variations of dried droplet). Of those, we found that dried droplet 

(with simultaneous sample and matrix application) was the simplest, fastest, 

and most reliable method. In addition, the simultaneous drying of sample and 

matrix solution (rather than sample and matrix separately) increases reproducibility 

and minimizes losses during subsequent spot washes. In contrast, 

if sample and matrix are mixed prior to their application on the target, their 

consumption is much higher and the sample exposure to plastics increases, 

thereby increasing the chances for sample contamination and subsequent ion 

suppression by plasticizers.


39. If crystal formation is impaired by a high salt content in the sample, wash the spot by pipetting two to three times with 2 μL of cool 0.1% (v/v) TFA solution (let dry again; do not wipe dry). Always prefer spot-by-spot washing to washing the entire target, in order to avoid sample cross-contamination.

40. Instrument calibration is performed according to the manufacturer's specifications. In any case, we propose daily calibration to ensure precision and accuracy.

41. Acceleration of biomolecules is primarily affected by the voltage settings of the ion source. Settings of the analyzer (TOF) mainly affect resolution, while detector settings should be adjusted only to improve the signal-to-noise characteristics of a given sample.

42. The mass spectrum should be divided into subregions, and the data for each subregion should be collected separately in order to increase protein resolution. This is because ionization kinetics (and consequently the optimal instrument settings) differ markedly for different protein sizes.

43. Different matrices (e.g., CHCA or dihydroxybenzoic acid for peptides and sinapinic acid (SA) for proteins) require different laser focusing settings. In general, large crystals (such as those formed by SA) and larger protein molecules require more concentrated energy bursts than smaller ones, for which more dispersed hits may be used.

44. Always sum the same number of laser shots, and sample as many regions of a spot as possible to ensure high reproducibility.
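For the alternative in Note 7 (adding a concentrated TCA stock), the volume of stock itself dilutes the final concentration, so simply adding 15% of the sample volume undershoots the target. A minimal sketch of the mass balance C_stock × V_stock = C_final × (V_sample + V_stock), solved for V_stock, is shown below; it assumes volumes are additive on mixing, and the 10 mL urine volume is only an example.

```python
def stock_volume_to_add(sample_ml, final_pct, stock_pct=100.0):
    """Volume of TCA stock (mL) that brings sample_ml of urine to final_pct (w/v).

    Solves stock_pct * v = final_pct * (sample_ml + v) for v, so the volume
    contributed by the stock itself is taken into account. Assumes volumes are
    additive on mixing.
    """
    if not 0.0 < final_pct < stock_pct:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return final_pct * sample_ml / (stock_pct - final_pct)


# Example: 10 mL of urine brought to 15% (w/v) with 100% (w/v) TCA.
volume = stock_volume_to_add(10.0, 15.0)
print(f"Add {volume:.2f} mL of 100% (w/v) TCA to 10 mL of urine")
```

For 10 mL of urine this gives about 1.76 mL of stock, rather than the 1.5 mL suggested by taking 15% of the sample volume.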
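Note 8 is phrased in relative centrifugal force, whereas many benchtop centrifuges are set in rpm. The standard conversion is RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in centimeters. The sketch below applies it in both directions; the 15 cm radius is a hypothetical value and should be replaced with the radius specified for the rotor actually in use.

```python
import math

# (2*pi/60)**2 / 980.665: converts cm * rpm**2 into multiples of standard gravity.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at the given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach a target RCF at the given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


radius = 15.0  # hypothetical rotor radius in cm
print(f"5000 rpm gives about {rcf_from_rpm(5000, radius):.0f} x g")
print(f"4000 x g needs about {rpm_for_rcf(4000, radius):.0f} rpm")
```

Because RCF grows with the square of the speed, a modest rpm shortfall translates into a noticeably lower force, which is why the note compensates with a longer spin.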
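The density argument in Note 1 can be made concrete with the Svedberg relation, s = M(1 − v̄ρ)/(N_A f): for a given protein, the sedimentation coefficient scales with the buoyancy term (1 − v̄ρ), where ρ is the solvent density. The sketch below assumes a typical protein partial specific volume of 0.73 mL/g and uses 1.005–1.030 g/mL as an illustrative urine density range (approximating specific gravity as density); it ignores pH and viscosity, which also vary between samples and would add further variability.

```python
def buoyancy_factor(v_bar_ml_per_g, solvent_density_g_per_ml):
    """The (1 - v_bar * rho) term of the Svedberg equation."""
    return 1.0 - v_bar_ml_per_g * solvent_density_g_per_ml


def relative_sedimentation(v_bar, rho, rho_ref):
    """Sedimentation coefficient at density rho relative to that at rho_ref.

    Assumes the same protein (same mass and frictional coefficient) in both
    solvents, so only the buoyancy term changes.
    """
    return buoyancy_factor(v_bar, rho) / buoyancy_factor(v_bar, rho_ref)


V_BAR = 0.73  # assumed typical protein partial specific volume, mL/g
ratio = relative_sedimentation(V_BAR, rho=1.030, rho_ref=1.005)
print(f"s at 1.030 g/mL is {100 * (1 - ratio):.1f}% lower than at 1.005 g/mL")
```

Under these assumptions the sedimentation coefficient, and hence the time needed to pellet a given protein, differs by roughly 7% between the two ends of the range, which illustrates why a single centrifugation program cannot be equally effective for all urine samples.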


This study was supported by the Greek Ministry of Health. 

References 

1. Norden, G.W.A., Sharratt, P., Cutillas, P.R., Cramer, R., Gardner, S.C. and 

Unwin, R.J. (2004) Quantitative amino acid and proteomics analysis: Very low 

excretion of polypeptides >750 Da in normal urine. Kidney International 66, 

1994–2003. 

2. Pisitkun, T., Johnstone, R. and Knepper, M.A. (2006) Discovery of urinary 

biomarkers. Molecular and Cellular Proteomics 5, 1760–1771. 

3. Nielsen, M.E., Schaeffer, E.M., Veltri, R.W., Schoenberg, M.P. and Getzenberg, R.H. (2006) Urinary markers in the detection of bladder cancer: What’s new? Current Opinion in Urology 16, 350–355.

4. Thongboonkerd, V. and Malasit, P. (2005) Renal and urinary proteomics: Current 

applications and challenges. Proteomics 5, 1033–1042. 

5. Pieper, R., Gatlin, C.L., McGrath, A.M., Makusky, A.J., Mondal, M., 

Seonarain, M., Field, E., Schatz, C.R., Estock, M.A., Ahmed, N., Anderson, N.G.

and Steiner, S. (2004) Characterization of the human urinary proteome: A method


for high-resolution display of urinary proteins on two-dimensional electrophoresis 

gels with a yield of nearly 1400 distinct protein spots. Proteomics 4, 1159–1174. 

6. Oh, J., Pyo, J., Jo, E., Hwang, S., Kang, S., Jung, J., Park, E., Kim, S., Choi, J. 

and Lim, J. (2004) Establishment of a near-standard two-dimensional human urine 

proteomic map. Proteomics 4, 3485–3497. 

7. Spahr, C.S., Davis, M.T., McGinley, M.D., Robinson, J.H., Bures, E.J., Beierle, J., 

Mort, J., Courchesne, P.L., Chen, K., Wahl, R.C., Yu, W., Luethy, R. and 

Patterson, S.D. (2001) Towards defining the urinary proteome using liquid 

chromatography-tandem mass spectrometry I. Profiling an unfractionated tryptic 

digest. Proteomics 1, 93–107. 

8. Cutillas, P.R., Norden, A., Cramer, R., Burlingame, A. and Unwin, R.J. (2003) 

Detection and analysis of urinary peptides by on-line liquid chromatography and 

mass spectrometry: Application to patients with renal Fanconi syndrome. Clinical 

Science 104, 483–490. 

9. Schaub, S., Wilkins, J., Weiler, T., Sangster, K., Rush, D. and Nickerson, P.

(2004) Urine protein profiling with SELDI TOF MS. Kidney International 65, 

323–332. 

10. Rogers, M.A., Clarke, P., Noble, J., Munro, N.P., Paul, A., Selby, P.J. and 

Banks, R.E. (2003) Proteomic profiling of urinary proteins in renal cancer by 

surface enhanced laser desorption ionization and neural-network analysis: Identification 

of key issues affecting potential clinical utility. Cancer Research 63, 

6971–6983. 

11. Vlahou, A., Schellhammer, P.F., Mendrinos, S., Patel, K., Kondylis, F.I., Gong, L., 

Nasim, S. and Wright, G.L. Jr. (2001) Development of a novel proteomic approach

for the detection of transitional cell carcinoma of the bladder in urine. The American 

Journal of Pathology 158, 1491–1502. 

12. Vlahou, A., Giannopoulos, A., Gregory, B.W., Manousakas, T., Kondylis, F.I., 

Wilson, L.L., Schellhammer, P.F., Semmes, O.J. and Wright, G.L. Jr. (2004) Protein

profiling in urine for the diagnosis of bladder cancer. Clinical Chemistry 50, 

1438–1445. 

13. Zerefos, P.G., Prados, J., Kalousis, A. and Vlahou, A. (2007) Sample preparation 

and bioinformatics in MALDI profiling of urinary proteins. Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences 15, 20–30.

14. Zürbig, P., Renfrow, M.B., Schiffer, E., Novak, J., Walden, M., Wittke, S., Just, I., Pelzing, M., Neusüß, C., Theodorescu, D., Root, K.E., Ross, M.M. and Mischak, H.

(2006) Biomarker discovery by CE-MS enables sequence analysis via MS/MS with 

platform-independent separation. Electrophoresis 27, 2111–2125. 

15. Mischak, H., Kaiser, T., Walden, M., Hillmann, M., Wittke, S., Herrmann, A.,

Knueppel, S., Haller, H. and Fliser, D. (2004) Proteomic analysis for the assessment 

of diabetic renal damage in humans. Clinical Science 107, 485–495. 

16. Zerefos, P.G., Vougas, K., Dimitraki, P., Kossida, S., Petrolekas, A., 

Stravodimos, K., Giannopoulos, A., Fountoulakis, M. and Vlahou, A. (2006) 

Characterization of the human urine proteome by preparative electrophoresis in 

combination with 2-DE. Proteomics 6, 4346–4355.


17. Pang, J.X., Ginanni, N., Dongre, A.R., Hefta, S.A., and Opiteck, G.J. (2002) 

Biomarker discovery in urine by proteomics. Journal of Proteome Research 1, 

161–169. 

18. Sun, W., Li, F., Wu, S., Wang, X., Zheng, D., Wang, J. and Gao, Y. (2005) Human 

urine proteome analysis by three separation approaches. Proteomics 5, 4994–5001. 

19. Soldi, M., Sarto, C., Valsecchi, C., Magni, F., Proserpio, V., Ticozzi, D. and 

Mocarelli, P. (2005) Proteome profile of human urine with two-dimensional liquid 

phase fractionation. Proteomics 5, 2641–2647. 

20. Rasmussen, H.H., Orntoft, T.F., Wolf, H. and Celis, J.E. (1996) Towards a comprehensive 

database of proteins from the urine of patients with bladder cancer. The 

Journal of Urology 6, 2113–2119. 

21. Thongboonkerd, V., McLeish, K.R., Arthur, J.M. and Klein, J.B. (2002) Proteomic 

analysis of normal human urinary proteins isolated by acetone precipitation or 

ultracentrifugation. Kidney International 62, 1461–1469. 

22. Hortin, G.L., Meilinger, B. and Drake, S.K. (2004) Size-selective

extraction of peptides from urine for mass spectrometric analysis. Clinical 

Chemistry 50, 1092–1095. 

23. Zhang, X., Leung, S., Morris, C.R. and Shigenaga, M.K. (2004) Evaluation 

of a novel, integrated approach using functionalized magnetic beads, benchtop 

MALDI-TOF-MS with prestructured sample supports, and pattern recognition 

software for profiling potential biomarkers in human plasma. Journal of 

Biomolecular Techniques 15, 167–175.

9 

Combining Laser Capture Microdissection 

and Proteomics Techniques 

Dana Mustafa, Johan M. Kros, and Theo Luider 

Summary 

Laser microdissection is an effective technique to harvest pure cell populations from 

complex tissue sections. In addition to using the microdissected cells in several DNA and 

RNA studies, it has been shown that the small number of cells obtained by this technique 

can also be used for proteomics analysis. Combining laser capture microdissection and 

different types of mass spectrometers opened ways to find and identify proteins that are 

specific for various cell types, tissues, and their morbid alterations. Although the combination 

of microdissection followed by the currently available techniques of proteomics has 

not yet reached the stage of genome-wide representation of all proteins present in a tissue, it is a feasible way to find significantly differentially expressed proteins in target tissues. Recent developments in mass spectrometric detection, combined with proper statistics and bioinformatics, make it possible to analyze the proteome of as few as 100–200 cells. Obviously, validation of the results is essential. The present review describes and discusses the various

methods developed to target cell populations of interest by laser microdissection, followed 

by analysis of their proteome. 

Key Words: laser capture microdissection; matrix-assisted laser desorption/ionization; Fourier transform mass spectrometry; time-of-flight mass spectrometry; liquid chromatography-electrospray ionization tandem mass spectrometry; two-dimensional polyacrylamide gel electrophoresis; differential in-gel electrophoresis; protein chip technology.

Abbreviations: LCM: Laser Capture Microdissection, LMM: Laser Microbeam Microdissection, LPC: Laser Pressure Catapulting, 2D PAGE: Two-dimensional Polyacrylamide Gel Electrophoresis, 2D DIGE: Differential In-gel Electrophoresis, SDS: Sodium Dodecyl Sulphate, MALDI-TOF/MS: Matrix-assisted Laser Desorption/Ionization Time-of-flight Mass Spectrometry, MALDI-FTMS: Matrix-assisted Laser Desorption/Ionization Fourier Transform Mass Spectrometry, LC-ESI-MS/MS: Liquid Chromatography-Electrospray Ionization Tandem Mass Spectrometry, HPLC: High Performance Liquid Chromatography, SELDI-TOF: Surface-enhanced Laser Desorption/Ionization Time-of-flight, ICAT: Isotope-coded Affinity Tag


1. Introduction

Over the last years, significant progress in the analysis of the entire genome

has triggered efforts to further analyze normal and abnormal protein expression 

patterns. There is, for instance, an eagerness to discover more and better 

diagnostic markers for specific diseases. High expectations that better biomarkers will improve diagnosis and treatment monitoring have driven technical developments. Human tissues are usually composed of rather

complex mixtures of different cell types. Many techniques have been used 

for the isolation of pure cell populations and each technique has its advantages 

and limitations. For example, immunohistochemistry is an established 

and relatively easy technique for localizing protein expression. A drawback of immunohistochemistry is that it does not allow quantitative assessment of proteins. Another way to obtain information about particular cell populations is to grow cell cultures in order to amplify target cells. Although technically feasible, this approach may not preserve the biological characteristics of the original cells in an in vitro environment (1). Xenografts mimic the normal situation more closely, but again reflect the real situation of cells in vivo only to some extent (2). Another way of separating cell populations for further investigation

is flow cytometry, which has successfully been applied in the study of 

many disease processes. Flow cytometric analysis is applied to cell suspensions and requires specific markers for selecting the cell population. To the best

of our knowledge, the combination of flow cytometry and subsequent mass 

spectrometry (MS) has not yet been described for the analysis of solid tissues. 

In this review, we discuss laser microdissection-based techniques for cell purification and harvesting that are currently applied prior to MS analysis.

2. Laser Capture Microdissection 

In order to select for specific cell populations in heterogeneous tissues, 

several microdissection techniques have been described. Most techniques 

involve the use of a needle to scrape off cells of interest under direct microscopic visualization (3,4). This method, however, tends to be slow, tedious, and highly

operator dependent (2). In 1992, Shibata and coworkers described a new method 

of cell isolation. They used a specific pigment placed over small numbers of 

cells in a tissue section, which served as an umbrella preventing the covered cells from being destroyed. Ultraviolet light was used to destroy the DNA/RNA of the uncovered cells (5). Shortly afterwards, laser capture microdissection (LCM) under direct microscopic visualization was developed by Liotta and coworkers at the National Cancer Institute. This way of target cell isolation permits rapid,

reliable laser microdissection to collect specific cell populations from a section 

of a complex, heterogeneous tissue (6). For this approach, a tissue section 

is placed in a holder of an inverted microscope. A transparent, thermoplastic 

polymer coating [e.g., ethylene vinyl acetate (EVA) (7)] is placed in contact 

with the tissue. The EVA polymer is positioned over microscopically selected 

cell clusters and subsequently the polymer is precisely activated by a near-infrared laser pulse steered by the investigator. The laser activation of the

polymer results in specific binding to the targeted area. With the removal of 

the EVA and the tissue that was bound to it from the section the selected cell 

aggregates are isolated for molecular analysis (8). LCM is compatible with 

a variety of cellular staining methods and tissue preservation protocols (9). 

Depending on the laser microdissection device used, the collection caps are positioned in different ways. For instance, the caps in the PixCell II (Arcturus Engineering, Mountain View, CA, USA) system make contact with the tissue sections; therefore, section preparation must meet strict requirements. The

PALM microlaser dissector (PALM Microlaser Technologies AG, Bernried, Germany) uses its cutting UV laser for laser microbeam microdissection combined with laser pressure catapulting (10). Special glass slides covered with a polyethylene naphthalate (PEN) membrane help stabilize the morphological integrity of the captured area (11) (Fig. 1). In this method, the collecting caps do not make any contact with the tissue sections, which increases flexibility with respect to section preparation (12). Both LCM techniques are specific enough

to dissect single cells. The PALM can dissect smaller sections of tissue as 

compared to the PixCell system. The two methods of microdissection yield RNA of comparable quality and quantity, but they have not been directly compared with regard to recent developments in protein retrieval for mass spectrometric applications (13). The collection of large quantities of cells by LCM is a time-consuming procedure requiring microscopic visualization of the cells of interest in stained tissue sections before laser dissection. The software and hardware of the different laser microdissection systems are still under development.


Fig. 1. A scheme that represents the principle of laser capture microdissection (labeled components: laser objective, stage, slide, tissue section, PEN membrane, microdissected tissue, buffer droplet, cap).

3. LCM and Two-Dimensional Gel Electrophoresis 

A new development is the application of LCM for protein retrieval of 

tissues for further analysis by proteomic techniques. So far, several approaches have been applied to cells obtained by laser microdissection. In 2000,

Emmert-Buck and coworkers applied two-dimensional polyacrylamide gel 

electrophoresis (2D PAGE) to 50,000 microdissected epithelial cells (14). They 

compared tumor cells and normal controls from two patients with oesophageal 

cancer (14). Silver staining of the gels visualized 675 distinct proteins and isoforms. Seventeen differentially expressed spots were further

analyzed by MS. This resulted in the identification of two specific proteins, 

cytokeratin 1 and annexin I. It was assumed that these proteins were present 

in an abundance range of 50,000–1,000,000 copies per cell (14). Using colon cancer as a model, Lawrie and coworkers also showed the feasibility of investigating protein expression by combining LCM with proteome analysis techniques such as 2D PAGE and MS (15).
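The abundance estimate from the oesophageal cancer study (14) can be translated into absolute amounts to show what 50,000 microdissected cells actually deliver for a single protein. The calculation below multiplies cell number by copies per cell and converts to moles with Avogadro's number; the 50 kDa molecular weight is a hypothetical round value, not the mass of either protein identified in (14).

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def total_protein_amount(cells, copies_per_cell, mw_da):
    """Return (femtomoles, nanograms) of one protein in a pooled LCM sample.

    mw_da is the protein's molecular weight in daltons (g/mol).
    """
    molecules = cells * copies_per_cell
    moles = molecules / AVOGADRO
    femtomoles = moles * 1e15
    nanograms = moles * mw_da * 1e9
    return femtomoles, nanograms


# 50,000 microdissected cells, as in (14); the 50 kDa mass is hypothetical.
for copies in (50_000, 1_000_000):
    fmol, ng = total_protein_amount(50_000, copies, 50_000)
    print(f"{copies:>9,} copies/cell -> {fmol:6.1f} fmol, {ng:5.2f} ng")
```

With these inputs, 50,000 cells carry roughly 4 to 83 fmol of such a protein, or about 0.2 to 4 ng, spread over every isoform spot in which it appears.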

To overcome the limitation of LCM in producing relatively low numbers of 

cells, an extra step has been added to the separation method. In addition to the 

2D PAGE of the microdissected cells, an additional 2D PAGE of whole sections of the same set of samples can be useful. The comparison of silver

stained 2D gels created from microdissected epithelial cells of ovarian cancer 

and the 2D gels created from the whole section of the same ovarian samples, 

facilitated the discovery of 23 differentially expressed proteins between low 

malignant potential and invasive ovarian cancers (16). In-gel digestion of the 

specific gel spots followed by MS/MS analysis resulted in the identification of 

glyoxalase I, RhoGDI, and a 52 kDa FK506 binding protein (16). In another 

study based on 2D PAGE, 315 protein spots were detected after collecting 100,000 cells by LCM from normal and cancerous ductal units in breast tissue


sections (17). Subsequent measurement of the spots by MS resulted in the 

identification of 57 differentially expressed proteins between the two groups of 

samples (17). 

The relatively low number of microdissected cells emphasizes the importance of loading equivalent amounts of protein on the gels. Shekouh and coworkers (18) therefore followed a strategy to increase the accuracy of 2D PAGE from

LCM samples. The samples were first separated by one-dimensional sodium 

dodecyl sulphate (SDS)-PAGE, stained with silver and subsequently subjected 

to densitometry. Evaluation of the staining intensity was used to normalize 

the samples. The 2D PAGE silver stained images from 50,000 microdissected 

adenocarcinoma cells were compared with the images from whole sections of 

pancreatic samples. Spots of interest were subjected to MALDI-TOF/TOF MS, resulting in the identification of S100A6 as an over-expressed protein in pancreatic cancer cells (18). The same methodology has been used to understand the mechanism of a specific molecule, HER-2/neu, in breast cancer (19). Breast cancer tissue was used to microdissect about 50,000–70,000 cells from three HER-2/neu-positive tumors and three HER-2/neu-negative tumors. This led to the detection of about 500–600 protein spots in each

gel. The comparison of these two groups allowed the identification of cytokeratin 

19 (CK19) as an overexpressed protein in HER-2/neu-positive breast 

cancer patients (19). In another study, the 2D PAGE of 10,000 microdissected 

cells of hepatocellular carcinoma (HCC) samples was compared with normal 

surrounding tissue. The investigators visualized about 868 spots, of which 20 were considered differentially expressed proteins. The digestion of these

proteins into peptides was followed by the application of ESI-MS/MS, which 

allowed the identification of 11 proteins. Four out of these 11 proteins were 

considered as novel candidates of hepatitis B-related HCC markers (20). This 

approach of separating the microdissected cells on 2D PAGE followed by in-gel 

protein digestion and MS measurements for the identification of biomarkers has 

been applied to a wide range of cancers, using various numbers of microdissected 

cells. Successful applications of 2D electrophoresis have used 10,000–100,000 cells harvested by LCM (Table 1).
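The normalization step used by Shekouh and coworkers (18) can be sketched as a proportional scaling: the integrated stain intensity of each sample's one-dimensional SDS-PAGE lane gives a relative protein content per microliter of lysate, from which the volume to load on each 2D gel follows. This is an illustration only; it assumes silver-stain intensity is proportional to protein amount, which holds over a limited range, the sample names and intensity values are hypothetical, and the exact calculation used in (18) is not described here.

```python
def equal_load_volumes(lane_intensity, volume_run_ul, target_signal):
    """Volume (uL) of each lysate to load so every 2D gel gets the same stain signal.

    lane_intensity: integrated densitometry value of each sample's 1D SDS-PAGE lane
    volume_run_ul: lysate volume (uL) that produced that lane
    target_signal: desired stain-equivalent amount per 2D gel, in the same
        arbitrary densitometry units as lane_intensity
    """
    volumes = {}
    for sample, intensity in lane_intensity.items():
        # Signal per uL of lysate; assumes stain intensity is proportional to protein amount.
        signal_per_ul = intensity / volume_run_ul[sample]
        volumes[sample] = target_signal / signal_per_ul
    return volumes


# Hypothetical densitometry readings from 2 uL of each lysate.
lanes = {"tumor": 12000.0, "normal": 9000.0}
run = {"tumor": 2.0, "normal": 2.0}
for name, vol in equal_load_volumes(lanes, run, target_signal=30000.0).items():
    print(f"{name}: load {vol:.1f} uL")
```

Here the weaker-staining normal lysate would need a proportionally larger volume (about 6.7 versus 5.0 µL) to match the tumor sample on the 2D gels.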

4. LCM and Differential In-Gel Electrophoresis 

In 2002, Zhou and coworkers described a new technique called differential 

in-gel electrophoresis (DIGE) (21). Two pools of proteins are labeled 

with 1-(5-carboxypentyl)-1-propylindocarbocyanine halide (Cy3) N-hydroxysuccinimidyl 

ester and 1-(5-carboxypentyl)-1-methylindodicarbocyanine halide (Cy5) N-hydroxysuccinimidyl ester fluorescent dyes (21). The labeled

proteins are mixed and separated in the same 2D gel. This strategy improves

Table 1
Overview of Different Methods to Combine Laser Microdissection and Different Proteomics Techniques

Separation technique | Number of microdissected cells/sample | Number of visualized proteins | Identification technique | Number of significant differentially identified proteins | Number of samples/study | Tissue used | Reference

2D PAGE, silver staining | 50,000 | Approximately 675 distinct proteins including isoforms | Mass spectrometry and immunoblot analysis | n = 2; cytokeratin 1 and annexin I | 2 cancer samples and 2 normal samples | Esophageal cancer | (14)
2D PAGE, silver staining | 1–5 μg of total cellular protein | Not determined | Mass spectrometry data from all protein spots cut from the gels | n = 3; cytokeratin 8, cytokeratin 18, and β-actin | [...] and 2 normal samples | Colon cancer | (15)
2D PAGE, silver staining | 50,000 | 23 differentially expressed proteins were discussed | ESI-MS identification from gels made of whole sections | n = 3; FK506 binding protein, glyoxalase I, and RhoGDI | 3 invasive OV and 2 noninvasive (LMP) OV | Ovarian cancer | (16)
2D PAGE, silver staining | 100,000 | 315 protein spots | MS identification from gels made of whole sections | n = 57 observed proteins; n = 2 after confirmation | 6 samples of DCIS and 6 samples of normal ductal/lobular units | Breast cancer | (17)
2D PAGE, silver staining | 50,000 | 800 protein spots | MALDI-TOF/TOF | n = 1; calcium-binding protein S100A6 | [...] and 4 normal samples | Pancreatic cancer | (18)
2D PAGE, silver staining | 50,000–70,000 | 500–600 protein spots | MALDI-TOF mass spectrometry | n = 7; cytokeratin 19, tropomyosin 3, aldolase A, glyoxalase I, cathepsin D chain 3, albumin, and MnSOD | 3 HER-2/neu-positive samples and 3 HER-2/neu-negative samples | HER-2/neu-positive breast cancer cells | (19)
2D PAGE, silver staining | 10,000 | 868 protein spots | Nano-flow ESI-MS/MS | n = 11 proteins, 4 of them novel markers | 10 hepatic cancer cell samples | Hepatic cancer cells; hepatitis B-positive cells | (20)
2D DIGE, lysine-specific dyes | 250,000 | 1038–1088 protein spots | Capillary LC tandem mass analysis | n = 1; tumor rejection antigen (gp96) | One sample contained normal cells and one sample contained cancer cells | Esophageal carcinoma | (21)
2D DIGE, lysine-specific dyes | 30,000 | 1200 protein spots | MALDI-TOF measurements | No further identifications | One sample contained gastric mucosa and one contained SPEM | Gastric metaplasia samples | (22)
2D DIGE, lysine-specific dyes | 50,000 | Not applicable | MALDI-TOF and/or immunoblotting for protein identification | n = 32 | Five samples contained malignant and normal breast tissue | Breast epithelial cells | (23)
2D DIGE, cysteine-specific dyes | 5000 | ∼1000 protein spots | MALDI-MS and MS/MS measurements | n = 40 | Cultured oncogene-transduced epithelial cells and precancerous versus cancerous tissue | – | (25)
2D DIGE, cysteine-specific dyes | Between 10 and 100 glomeruli, equivalent to 0.5–3 μg of protein | Between 900 and 1400 protein spots | Nano LC-ESI-MS/MS | n = 23 between mouse glomeruli and mouse cortex | 3 different protein extracts from human glomeruli, and glomeruli and cortex independently isolated from 3 mice | – | (26)
(IPG-IEF) 2D PAGE gel | Proteins, 3.8 μg | Not applicable | Mass spectrometry | n = 29 | 2 samples contained renal cell carcinoma and normal kidney tissues | – | (30)
(IPG-IEF) 2D PAGE gel | Approximately [...] | – | – | – | – | – | (31)
HPLC system | 10,000 | Not applicable | ESI mass spectrometry followed by MS/MS | n = 9 | 3 slides from the same cell culture | Breast cancer cell line (SKBR-3) | (34)
16O/18O isotopic labeling of peptides | 10,000 | Not applicable | Reverse-phase LC-ESI-MS/MS on an ion trap mass spectrometer | n = 76 | 2 samples with invasive ductal carcinoma of the breast | Ductal carcinoma of the breast | (29)
Gel-free method | 30,000–50,000 | Not applicable | SELDI-TOF/MS | n = 1; prostate carcinoma-associated protein (PCa-24) | 17 prostate carcinomas that contained normal tissue and BPH tissue, and 7 BPH samples | Prostate cancer | (42)
Gel-free method | ∼2000 | Not applicable | MALDI-TOF/MS | n = 2; calgranulin A and chaperonin 10 | 8 endometrioid adenocarcinomas, 4 proliferative endometria, and 4 secretory endometria | Endometrial cancer | (36)
Gel-free method | 150 | Not applicable | MALDI-TOF/MS | No protein identifications; unique pattern of ∼35 peptides for trophoblast and stroma cells | 1 placenta sample containing trophoblasts and surrounding stroma cells | Placenta samples | (37)
Gel-free method | 2000–2400 | Not applicable | MALDI-TOF/TOF mass spectrometry | No protein identifications; 9 differentially expressed peptides | 6 invasive ductal breast carcinomas containing cancer and normal cells | Breast cancer | (38)
Gel-free method | 3000 | Not applicable | Nano LC-FTICR | n = 1003 proteins identified | 2 replicate samples of breast cancer epithelial cells | Breast cancer | (39)
ProteinChip technology | 3000–5000 | Not applicable | Isolation by two-dimensional gel electrophoresis and tandem mass spectrometry analysis | n = 1; annexin V | 57 head and neck tumor samples and 44 mucosa samples | Head and neck cancer | (41)
ProteinChip technology | 3000–5000 | Not applicable | Isolation by reverse-phase chromatography and SDS-PAGE, then identification by MS/MS analysis | n = 1; heat shock protein 10 | 39 colorectal tumor samples, 40 normal mucosa samples, and 29 adenoma samples | Colorectal cancer | (40)

Abbreviations: 2DE: two-dimensional gel electrophoresis, OV: ovarian cancer, LMP: low malignant potential, DCIS: ductal carcinoma in situ, HCC: hepatocellular carcinoma, BPH: benign prostatic hyperplasia, SPEM: spasmolytic polypeptide-expressing metaplasia, PR: progesterone receptor, ER: estrogen receptor


the sensitivity of detection and broadens the range of proteins that can be detected. Because the cyanine dyes are matched in molecular weight and charge, differently labeled samples can be multiplexed and run on the same gel. The same investigators combined 2D DIGE with MS into a powerful tool for the molecular characterization of cancer progression and the identification of cancer-specific protein markers. They compared the 2D DIGE profiles of approximately 250,000 microdissected oesophageal carcinoma cells with those of normal oesophageal epithelial cells. The cancer cell lysate yielded 1038 protein spots and the normal epithelial lysate 1088. The differentially expressed spots were digested in-gel and identified by capillary high-performance liquid chromatography (HPLC) tandem mass spectrometry. In this way, tumor rejection antigen (gp96) was found to be upregulated in oesophageal squamous cell carcinoma (21). Applying the same procedure to smaller numbers of microdissected cells from biopsy samples with gastric metaplasia also appeared to be successful (22): approximately 1200 spots were detected from 30,000 microdissected cells, and 28 of these spots were overexpressed in the metaplasia samples compared with the normal surface cells (22). However, subsequent MALDI-TOF measurements of these spots did not lead to protein identifications. Applied to 50,000 microdissected breast epithelial cancer cells, the same procedure resulted in the identification of 32 proteins, 13 of which had not previously been associated with these tumors (23).

One technical aspect of the 2D DIGE method needs special attention: the original fluorescent dyes bind only to lysine residues (21). Proteins with a high percentage of lysine residues are therefore labeled more efficiently than proteins containing little or no lysine. A newer generation of dyes that react with cysteine residues has improved the sensitivity of DIGE (24). Although cysteine is generally less abundant than lysine in proteins, cysteine labeling can be carried to saturation, whereas lysine labeling must be limited to 1–3% of all the residues to prevent loss of solubility when the bulky hydrophobic dyes are coupled to the polar lysine residues (24). Greengauz-Roberts and coworkers applied saturation labeling of cysteine residues to about 5000 laser-microdissected metaplasia and cancer cells. A total of 1471 distinct protein features were observed from this relatively small number of cells, and 96 of these spots were selected for further analysis. MALDI-MS and MS/MS measurements, combined with the position of each protein in the gel, led to the identification of 42 proteins in the cancer samples (25). Sitek and coworkers likewise described a novel approach to analyze glomerular proteins from mouse and human samples using DIGE saturation labeling (26). Only 10 glomeruli (about 0.5 μg of protein), picked by LCM from a slide of a human kidney biopsy, were sufficient to visualize 900 spots (26).

2D DIGE has several advantages over conventional 2D gels. One of the most important is its improved reproducibility: because the pooled samples are separated in the same gel, gel-to-gel differences are minimized. Protein expression in two cell populations or samples can therefore be compared more accurately, and differences are easier to detect. Fluorescent detection also allows quantitative differences in protein content to be measured more precisely. In addition, 2D DIGE enables higher-throughput analysis of 2D gels because gel imaging can be automated. Importantly, labeling with fluorescent dyes does not interfere with protein identification by MS, because with minimal lysine labeling only a small percentage of the molecules of each protein is labeled. Finally, 2D DIGE, particularly with cysteine saturation labeling, requires fewer microdissected cells for protein identification than the other 2D electrophoresis techniques (Table 1).
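Because both samples are separated in the same gel, a DIGE comparison essentially reduces to a ratio of the two dye signals for each matched spot. The following Python sketch illustrates the idea under simplifying assumptions: spot volumes are assumed to have already been extracted from the Cy3 and Cy5 images, the log2(Cy5/Cy3) ratios are centred on their median (on the assumption that most proteins are unchanged), and a hypothetical 1.5-fold cut-off flags candidate spots. The function names and spot volumes are illustrative only; dedicated DIGE software applies its own normalization and statistical testing.

```python
import math
from statistics import median


def dige_log_ratios(cy3, cy5):
    """Median-normalized log2(Cy5/Cy3) ratio for each spot seen in both channels.

    cy3, cy5: dicts mapping a spot identifier to its background-subtracted
    volume in each dye channel (hypothetical input format).
    """
    shared = sorted(set(cy3) & set(cy5))
    raw = {s: math.log2(cy5[s] / cy3[s]) for s in shared}
    # Assume most spots are unchanged: centre the distribution on zero to
    # correct for unequal labeling or scanning efficiency of the two dyes.
    offset = median(raw.values())
    return {s: r - offset for s, r in raw.items()}


def changed_spots(log_ratios, fold=1.5):
    """Spots whose normalized ratio exceeds a (hypothetical) fold-change cut-off."""
    cutoff = math.log2(fold)
    return sorted(s for s, r in log_ratios.items() if abs(r) >= cutoff)


# Hypothetical spot volumes: normal cells labeled with Cy3, tumor cells with Cy5.
cy3 = {"spot1": 1000.0, "spot2": 2000.0, "spot3": 500.0, "spot4": 800.0, "spot5": 1200.0}
cy5 = {"spot1": 1100.0, "spot2": 2200.0, "spot3": 2200.0, "spot4": 880.0, "spot5": 300.0}

ratios = dige_log_ratios(cy3, cy5)
print(changed_spots(ratios))  # ['spot3', 'spot5']
```

Here the uniform 1.1-fold offset shared by most spots is removed by the median step, so only the spots with a genuine shift relative to the rest of the gel are flagged.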

5. LCM and Different Labeling Techniques 

Labeling facilitates the comparison of the proteomes of two different samples (for instance, normal and tumor cells). In 2004, Li and coworkers described a method for qualitative and quantitative protein analysis that combines LCM with isotope-coded affinity tag (ICAT) labeling (28) and two-dimensional liquid chromatography coupled with tandem mass spectrometry (2D-LC-MS/MS) (27). Approximately 50,000–100,000 HCC and non-HCC hepatocytes were microdissected; a total of 644 proteins were identified in the HCC hepatocytes, and 261 proteins that differed between the two groups were quantified (27). Also in 2004, Zang and coworkers generated 16O/18O isotope-labeled peptides from 10,000 microdissected ductal carcinoma cells of the breast. Using reverse-phase liquid chromatography-electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS), they identified 76 proteins, including proteins that were significantly upregulated in the breast tumor cells (29). Separating radioactively labeled proteins on high-resolution 54-cm serial immobilized pH gradient isoelectric focusing (IPG-IEF) 2D PAGE gels provided a precise estimate of protein abundance ratios between two samples (30). Radioiodination of 3.8 μg of renal carcinoma proteins and 3.8 μg of normal kidney proteins, one with 125I and the other with 131I, followed by mass spectrometric identification, revealed 29 differentially expressed proteins (30). Applying the same radioactive labeling methodology to a pool of microdissected breast cancer cells provided a sensitive method to identify differentially expressed proteins correlated with the presence of progesterone receptor in estrogen receptor-positive breast cancer (31).
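For the 16O/18O approach, the underlying mass arithmetic is simple enough to sketch. When proteins are digested with trypsin in 18O-labeled water, up to two 18O atoms are incorporated at the C-terminal carboxyl group of each peptide, so a fully labeled peptide is about 4.008 Da heavier than its 16O counterpart, and at charge z the pair is separated by roughly 4.008/z in m/z. The Python sketch below computes the expected light and heavy m/z values and a naive heavy/light ratio from monoisotopic peak intensities. It assumes complete double labeling and ignores the overlap of the light peptide's natural isotope envelope with the heavy peak, both of which real quantification software has to correct for; the function names and the example peptide are hypothetical.

```python
# Monoisotopic masses (Da)
PROTON = 1.007276467
O16 = 15.9949146
O18 = 17.9991610


def pair_mz(neutral_mass, charge, n_o18=2):
    """Expected m/z of the light (16O) and heavy (18O) forms of a peptide.

    neutral_mass: monoisotopic neutral mass of the unlabeled (16O) peptide.
    n_o18: number of 18O atoms incorporated (2 for complete labeling).
    """
    shift = n_o18 * (O18 - O16)
    light = (neutral_mass + charge * PROTON) / charge
    heavy = (neutral_mass + shift + charge * PROTON) / charge
    return light, heavy


def naive_heavy_light_ratio(intensity_light, intensity_heavy):
    """Heavy/light ratio with no isotope-overlap or labeling-efficiency correction."""
    return intensity_heavy / intensity_light


# Hypothetical doubly charged peptide with a neutral mass of 1500.0 Da
light, heavy = pair_mz(1500.0, charge=2)
print(round(light, 4), round(heavy, 4))       # 751.0073 753.0115
print(naive_heavy_light_ratio(2.0e5, 3.0e5))  # 1.5
```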


6. Combining LCM and Different Separation Methods 

It has been shown previously that the number of detected and identified peptides and proteins increases significantly when MALDI-MS (32) or ESI-MS (33) is coupled to a peptide or protein separation system. In 2003, Wu and coworkers described a method for discovering biomarkers in microdissected, homogeneous cells from a breast cancer cell line (34). After the cells were captured, the peptide digest was fractionated by reversed-phase HPLC and analyzed by ion trap MS (34). HPLC fractionation of about 10,000 cells from the breast cancer cell line SKBR-3, followed by ESI-MS, resulted in the identification of proteins expressed at low levels in this cell line. Capillary isoelectric focusing combined with reversed-phase nano-LC in an automated, integrated platform provides systematic resolution of complex peptide mixtures generated from limited protein quantities (7). This method separates peptides on the basis of differences in isoelectric point and hydrophobicity, and it eliminates peptide loss and analyte dilution (7). Coupled to ESI tandem MS, this separation enabled the detection of 6866 peptides, leading to the identification of 1820 proteins from 20,000 microdissected glioblastoma cells (7). To increase the number of proteins identified from LCM of brain samples, Gozal and coworkers added an extra separation step (35). After the cells were collected by LCM, total protein was extracted and resolved by SDS-PAGE. The gels were cut into multiple pieces, which were digested with trypsin, and the resulting peptides were analyzed by highly sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS). This approach led to the identification of hundreds to thousands of proteins (35).
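Several of the workflows above start from a tryptic digest, whether in-gel or in solution. As a rough guide to which peptides such a digest should yield, trypsin cleaves on the C-terminal side of lysine (K) and arginine (R), usually not when the next residue is proline. The Python sketch below applies only that simplified rule; it assumes complete digestion (no missed cleavages), and the example sequence is arbitrary rather than a protein from the cited studies.

```python
def tryptic_peptides(sequence, min_length=1):
    """In-silico trypsin digest: cleave after K or R unless followed by P.

    Assumes complete digestion (no missed cleavages); peptides shorter than
    min_length residues are discarded.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide that does not end in K/R
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


print(tryptic_peptides("MAKPLRGEKSTRVL"))  # ['MAKPLR', 'GEK', 'STR', 'VL']
```

The K followed by P in "MAKPLR" is left uncut, which is the main exception to the cleavage rule; in practice search software also allows for a limited number of missed cleavages.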

7. LCM and Gel-Free Mass Spectrometry 

Peptide digests of cells harvested by LCM can also be measured directly by MS, without an initial separation step on 2D PAGE (known as “gel-free MS”). Guo and coworkers directly analyzed LCM-captured endometrial epithelial cells using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS) (36). A total of 16 physiological and malignant endometrial samples, including four proliferative and four secretory endometria and eight endometrioid adenocarcinomas, were used for this study. Approximately 2000 cells were sufficient to confirm overexpression of two proteins, calgranulin A and chaperonin 10, in the epithelial cells of endometrial adenocarcinoma samples (36). In another study, direct analysis of 125 trophoblast and stroma cells from placental tissue revealed significant differences in protein expression between these two cell types (37). Similarly, proteins differentially expressed between breast cancer and normal samples can be detected by direct MALDI-TOF/MS measurements of 2000–2400 LCM cells (38). In a recent study, over 1000 proteins were identified from 3000 microdissected cells by combining advanced nanoLC with high-resolution Fourier transform mass spectrometry (FTMS) (39).
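Gel-free profiling studies of this kind compare peak lists rather than gel spots. One simple way to do that, sketched below in Python under stated assumptions, is to count the fraction of samples in each group in which a peak is found within a mass tolerance of a given m/z. The 50 ppm tolerance, the peak lists, and the function names are hypothetical; published studies also apply baseline correction, peak alignment, and statistical testing that are not shown here.

```python
from bisect import bisect_left


def has_peak(peaks, target_mz, tol_ppm):
    """True if the sorted list `peaks` has a peak within tol_ppm of target_mz."""
    tol = target_mz * tol_ppm / 1e6
    i = bisect_left(peaks, target_mz - tol)  # index of first peak >= lower bound
    return i < len(peaks) and peaks[i] <= target_mz + tol


def detection_frequency(samples, target_mz, tol_ppm=50.0):
    """Fraction of samples (each a sorted list of peak m/z values) containing the peak."""
    hits = sum(has_peak(peaks, target_mz, tol_ppm) for peaks in samples)
    return hits / len(samples)


# Hypothetical peak lists from microdissected tumor and normal cells
tumor = [[1000.50, 1500.75, 2000.00], [1000.51, 1500.74], [1000.49, 2000.02]]
normal = [[1500.76, 2000.01], [1500.75], [1200.00, 2000.00]]

for mz in (1000.50, 1500.75):
    print(mz, detection_frequency(tumor, mz), detection_frequency(normal, mz))
```

In this toy data the peak near m/z 1000.50 appears in every tumor sample and in none of the normal samples, which is the kind of pattern such a comparison is meant to surface.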

8. LCM and Protein Chip Technology 

There are currently two approaches to producing arrays capable of generating protein network information. The first is the forward-phase array, in which each spot on the slide carries a specific antibody, so that the array is incubated with only one test sample (9). The second is the reverse-phase array, in which each spot represents an individual test sample; the array is thus composed of multiple different samples, which can then be tested under the same experimental conditions. In addition, when the arrays are probed separately with two different classes of antibodies, it is possible to specifically detect the total and phosphorylated forms of the protein of interest (9). By combining LCM with protein chip technology, Melle and coworkers identified annexin V as a specific protein in head and neck cancer (41) and heat shock protein 10 as a biomarker in colorectal cancer (40). In both studies, protein lysates from 3000 to 5000 microdissected cells were analyzed on strong anion exchange and weak cation exchange arrays, followed by separation steps (2D gel electrophoresis, or reverse-phase chromatography and SDS-PAGE), MS measurements, and MS/MS analysis (40,41). In both cases, a validation step by immunohistochemistry confirmed the findings.
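In the reverse-phase format described above, each spot is a sample, so probing one array with an antibody against the total protein and a second array with a phospho-specific antibody gives two readings per sample. The Python sketch below combines them into a relative phosphorylation index (phospho signal divided by total signal after background subtraction). It is a hypothetical illustration: the sample names, intensities, and single background value are invented, and because the two antibodies differ in affinity, the index is only comparable between samples and is not an absolute phosphorylated fraction.

```python
def relative_phosphorylation(total_signal, phospho_signal, background=0.0):
    """Phospho/total signal ratio for each sample spot on a reverse-phase array.

    total_signal, phospho_signal: dicts mapping sample ID -> spot intensity on
    the array probed with the total-protein and the phospho-specific antibody.
    """
    index = {}
    for sample, total in total_signal.items():
        t = total - background
        p = phospho_signal[sample] - background
        # A spot with no total signal above background cannot be interpreted.
        index[sample] = p / t if t > 0 else float("nan")
    return index


# Hypothetical intensities for two lysates spotted on replicate arrays
total = {"tumor_1": 1100.0, "normal_1": 600.0}
phospho = {"tumor_1": 600.0, "normal_1": 150.0}
print(relative_phosphorylation(total, phospho, background=100.0))
```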

In other studies, surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) analysis was applied to microdissected cells because it requires smaller amounts of material than other techniques such as 2D gel electrophoresis (42). Using 30,000–50,000 cells from prostate carcinoma specimens, the unique expression of a prostate carcinoma-associated protein, called PCa-24, in the epithelial cells was demonstrated (42). Although protein microarrays still pose several technical challenges, their application offers scalability, flexibility, and automated processing (43). Arrays may also allow key parameters such as temperature, pH, and cofactor concentration to be controlled, which is not easily achieved in cell-based systems.

9. Perspectives of LCM and Mass Spectrometry Analysis 

The use of LCM to obtain (relatively) pure cell populations for further analysis of their proteome is an important addition to the arsenal of techniques in bioscience. However, the technique is still time-consuming and yields relatively small numbers of cells. To overcome this problem, alternative


[Fig. 2 image: MALDI FTMS spectrum (+MS, intensity versus m/z) with an inset zooming in on m/z 1700–2000; annotated peaks include peptides from fibrinogen, GAPDH, CD34 antigen, GFAP, tubulin, and Hb.]

Fig. 2. MALDI FTMS spectrum obtained from 150 microdissected cells from a frozen glioma tissue sample. The spectrum contains approximately a thousand monoisotopic peaks between m/z 700 and 3000 at relatively high peak intensities. The small box is a zoom of a small part of the spectrum, between m/z 1700 and 2000, showing the very high number of peaks obtained from measuring a very small number of cells. The peaks can be identified by different MS sequencing techniques; some examples of identified peptides are indicated in the spectrum.

steps of processing tissues are needed. Sample collection and preparation are crucial. During microdissection, special care should be taken to prevent loss and contamination of the target material. For instance, material should not drop from, or stick to, the cap of the tubes used. Transfers of the collected material from one tube to another should also be kept to a minimum, and low-protein-binding tubes are recommended. A protocol for sample preparation is included in this chapter (Box 1).

2D PAGE is a well-established technique that has been used in combination with LCM in many studies so far. However, its need for relatively large numbers of cells precludes the measurement of large numbers of samples, as indicated in Table 1. In addition, its relatively low reproducibility hampers sound statistical analysis. 2D DIGE improves reproducibility and also lowers the required amount of microdissected tissue. However, this technique is suitable for experimental research only.


Box 1. LCM sample preparation protocol

Cryosections of 8 μm were made from glioma brain tumor tissue and mounted on polyethylene naphthalate-covered glass slides (PALM Microlaser Technologies AG, Bernried, Germany) as described previously (38). The slides were fixed in 70% ethanol and stored at –20°C for no more than 2 days. After fixation and immediately before microdissection, the slides were washed twice with Milli-Q water, stained for 10 s in haematoxylin, washed again twice with Milli-Q water, subsequently dehydrated in a series of 50, 70, 95, and 100% ethanol solutions, and air-dried. The PALM laser microdissection and pressure catapulting device, type P-MB, was used with PalmRobo v2.2 software at 40× magnification. Estimating the volume of a cell at 10 × 10 × 10 μm³, we microdissected an area of about 190,000 μm² of blood vessels and an area of the same size of the surrounding tumor tissue from each sample, resulting in approximately 1500 cells per sample. The microdissected cells were collected in the caps of PALM tubes in 5 μl of 0.1% RapiGest buffer (Waters, Milford, MA, USA). The caps were cut and placed onto 0.5 ml Eppendorf Protein LoBind tubes (Eppendorf, Hamburg, Germany), and the tubes were centrifuged at 12,000 × g for 5 min. To make sure that all the cells were covered with buffer, another 5 μl of RapiGest was added. All samples were stored at –80°C. After thawing, the microdissected tissue was disrupted by external sonication for 1 min at 70% amplitude at a maximum temperature of 25°C (Branson Ultrasonics, Danbury, USA). The samples were incubated at 37°C for 5 min and at 100°C for 15 min for protein solubilization and denaturation, respectively. To each sample, 1.5 μl of 100 ng/μl gold grade trypsin (Promega, Madison, WI, USA) in 3 mM Tris–HCl diluted 1:10 in 50 mM NH4HCO3 was added, and the samples were incubated overnight at 37°C for protein digestion. To inactivate trypsin and degrade the RapiGest, 2 μl of 500 mM HCl was added and the samples were incubated for 30 min at 37°C. Samples were dried in a SpeedVac (Thermo Savant, Holbrook, NY, USA) and reconstituted in 5 μl of 50% acetonitrile/0.5% trifluoroacetic acid/water prior to measurement. Samples were measured immediately or stored for a maximum of 10 days at 4°C.
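The protocol does not spell out how the figure of approximately 1500 cells per sample was derived. One reading consistent with the stated numbers, offered here only as an assumption, is to divide the dissected tissue volume (area × 8 μm section thickness) by the assumed cell volume of 10 × 10 × 10 μm³, as in this short Python sketch (the function name is hypothetical):

```python
def estimate_cell_count(area_um2, section_thickness_um, cell_edge_um=10.0):
    """Approximate number of cells in a microdissected area.

    Assumes cubic cells of edge cell_edge_um that completely fill the section.
    """
    tissue_volume_um3 = area_um2 * section_thickness_um
    cell_volume_um3 = cell_edge_um ** 3
    return tissue_volume_um3 / cell_volume_um3


print(round(estimate_cell_count(190_000, 8)))  # 1520, i.e. roughly 1500 cells
```

Counting cell footprints instead (190,000 μm² divided by 100 μm² per cell) would give about 1900 cells per section, so the volume-based reading is the one that matches the stated figure more closely.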

Recently, improvements in the resolution and detection limits of modern mass spectrometers, particularly FTMS instruments, have opened a new research field: the analysis of small numbers of microdissected cells (in the range of 200–5000). FTMS has specific characteristics: unrivalled mass resolution (on the order of 100,000–1,000,000), high mass accuracy (below 1 ppm), a dynamic range of three to four orders of magnitude, and a good signal-to-noise ratio (44). These features facilitate combining this technique with LCM. For instance, MALDI-FTMS of peptide digests of no more than 150 cells taken from biological samples (e.g., glioma vessel tissue) resulted in informative mass spectra (Fig. 2). Techniques such as FTMS are expected to be implemented soon in routine laboratory practice for the detection of disease-related proteins in clinical specimens.
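The two FTMS figures of merit quoted above can be made concrete with a little arithmetic. Resolving power is commonly expressed as R = m/Δm, with Δm the peak width at half height, and mass accuracy as the deviation from the theoretical m/z in parts per million. The Python sketch below (function names hypothetical; the separation criterion is deliberately rough) shows, for example, that at m/z 1500 a resolving power of 100,000 corresponds to a peak width of 0.015, too wide to separate two peaks 0.01 apart, whereas 1,000,000 is ample.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_width(mz, resolving_power):
    """Peak width at half height implied by resolving power R = m / delta_m."""
    return mz / resolving_power


def resolvable(mz_a, mz_b, resolving_power):
    """Rough criterion: two peaks separate if they are at least one peak width apart."""
    mean_mz = (mz_a + mz_b) / 2
    return abs(mz_a - mz_b) >= peak_width(mean_mz, resolving_power)


print(peak_width(1500.0, 100_000))              # 0.015
print(resolvable(1500.00, 1500.01, 100_000))    # False
print(resolvable(1500.00, 1500.01, 1_000_000))  # True
print(round(ppm_error(1500.0015, 1500.0), 2))   # 1.0
```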

References 

1. Zhang, L., Zhou, W., Velculescu, V. E., Kern, S. E., Hruban, R. H., Hamilton, S. R., 

Vogelstein, B. and Kinzler, K. W. (1997) Gene expression profiles in normal and 

cancer cells. Science 276, 1268–1272. 

2. Curran, S., McKay, J. A., McLeod, H. L. and Murray, G. I. (2000) Laser capture 

microscopy. Mol Pathol 53, 64–68. 

3. Going, J. J. and Lamb, R. F. (1996) Practical histological microdissection for PCR 

analysis. J Pathol 179, 121–124. 

4. Zhuang, Z., Bertheau, P., Emmert-Buck, M. R., Liotta, L. A., Gnarra, J., Linehan, 

W. M. and Lubensky, I. A. (1995) A microdissection technique for archival DNA 

analysis of specific cell populations in lesions


13. Ball, H. J. and Hunt, N. H. (2004) Needle in a haystack: microdissecting the 

proteome of a tissue. Amino Acids 27, 1–7. 

14. Emmert-Buck, M. R., Gillespie, J. W., Paweletz, C. P., Ornstein, D. K., Basrur, V., 

Appella, E., Wang, Q. H., Huang, J., Hu, N., Taylor, P. and Petricoin, E. F. 3rd (2000) 

An approach to proteomic analysis of human tumors. Mol Carcinog 27, 158–165. 

15. Lawrie, L. C., Curran, S., McLeod, H. L., Fothergill, J. E. and Murray, G. I. (2001) 

Application of laser capture microdissection and proteomics in colon cancer. Mol 

Pathol 54, 253–258. 

16. Jones, M. B., Krutzsch, H., Shu, H., Zhao, Y., Liotta, L. A., Kohn, E. C. and 

Petricoin, E. F. 3rd (2002) Proteomic analysis and identification of new biomarkers 

and therapeutic targets for invasive ovarian cancer. Proteomics 2, 76–84. 

17. Wulfkuhle, J. D., Sgroi, D. C., Krutzsch, H., McLean, K., McGarvey, K., 

Knowlton, M., Chen, S., Shu, H., Sahin, A., Kurek, R., Wallwiener, D., 

Merino, M. J., Petricoin, E. F. 3rd, Zhao, Y. and Steeg, P. S. (2002) Proteomics 

of human breast ductal carcinoma in situ. Cancer Res 62, 6740–6749. 

18. Shekouh, A. R., Thompson, C. C., Prime, W., Campbell, F., Hamlett, J., Herrington, 

C. S., Lemoine, N. R., Crnogorac-Jurcevic, T., Buechler, M. W., Friess, H., 

Neoptolemos, J. P., Pennington, S. R. and Costello, E. (2003) Application of laser 

capture microdissection combined with two-dimensional electrophoresis for the 

discovery of differentially regulated proteins in pancreatic ductal adenocarcinoma. 


19. Zhang, D. H., Tai, L. K., Wong, L. L., Sethi, S. K. and Koay, E. S. (2005) 

Proteomics of breast cancer: enhanced expression of cytokeratin19 in human 

epidermal growth factor receptor type 2 positive breast tumors. Proteomics 5, 

1797–1805. 

20. Ai, J., Tan, Y., Ying, W., Hong, Y., Liu, S., Wu, M., Qian, X. and Wang, H. (2006) 

Proteome analysis of hepatocellular carcinoma by laser capture microdissection. 


21. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., Flaig, M., 

Gillespie, J. W., Hu, N., Taylor, P. R., Emmert-Buck, M. R., Liotta, L. A., 

Petricoin, E. F. 3rd and Zhao, Y. (2002) 2D differential in-gel electrophoresis for 

the identification of esophageal scans cell cancer-specific protein markers. Mol 

Cell Proteomics 1, 117–124. 

22. Lee, J. R., Baxter, T. M., Yamaguchi, H., Wang, T. C., Goldenring, J. R. and 

Anderson, M. G. (2003) Differential protein analysis of spasomolytic polypeptide 

expressing metaplasia using laser capture microdissection and two-dimensional 

difference gel electrophoresis. Appl Immunohistochem Mol Morphol 11, 188–193. 

23. Hudelist, G., Singer, C. F., Pischinger, K. I., Kaserer, K., Manavi, M., Kubista, E. 

and Czerwenka, K. F. (2006) Proteomic analysis in human breast cancer: identification 

of a characteristic protein expression profile of malignant breast epithelium. 


24. Shaw, J., Rowlinson, R., Nickson, J., Stone, T., Sweet, A., Williams, K. and 

Tonge, R. (2003) Evaluation of saturation labelling two-dimensional difference gel 

electrophoresis fluorescent dyes. Proteomics 3, 1181–1195.


25. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., 

Goldenring, J. R., Podolsky, R. H., Lee, J. R. and Dynan, W. S. (2005) Saturation 

labeling with cysteine-reactive cyanine fluorescent dyes provides increased sensitivity 

for protein expression profiling of laser-microdissected clinical specimens. 


26. Sitek, B., Potthoff, S., Schulenborg, T., Stegbauer, J., Vinke, T., Rump, L. C., 

Meyer, H. E., Vonend, O. and Stuhler, K. (2006) Novel approaches to analyse 

glomerular proteins from smallest scale murine and human samples using DIGE 

saturation labelling. Proteomics 6, 4337–4345. 

27. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., Zhang, L., Xia, Q. C., 

Wu, J. R., Wang, H. Y. and Zeng, R. (2004) Accurate qualitative and quantitative 

proteomic analysis of clinical hepatocellular carcinoma using laser capture 

microdissection coupled with isotope-coded affinity tag and two-dimensional liquid 

chromatography mass spectrometry. Mol Cell Proteomics 3, 399–409. 

28. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H. and Aebersold, R. 

(1999) Quantitative analysis of complex protein mixtures using isotope-coded 

affinity tags. Nat Biotechnol 17, 994–999. 

29. Zang, L., Palmer Toy, D., Hancock, W. S., Sgroi, D. C. and Karger, B. L. (2004) 

Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection, 

LC-MS, and 16O/18O isotopic labeling. J Proteome Res 3, 604–612. 

30. Poznanovic, S., Wozny, W., Schwall, G. P., Sastri, C., Hunzinger, C., 

Stegmann, W., Schrattenholz, A., Buchner, A., Gangnus, R., Burgemeister, R. and 

Cahill, M. A. (2005) Differential radioactive proteomic analysis of microdissected 

renal cell carcinoma tissue by 54 cm isoelectric focusing in serial immobilized pH 

gradient gels. J Proteome Res 4, 2117–2125. 

31. Neubauer, H., Clare, S. E., Kurek, R., Fehm, T., Wallwiener, D., Sotlar, K., 

Nordheim, A., Wozny, W., Schwall, G. P., Poznanovic, S., Sastri, C., 

Hunzinger, C., Stegmann, W., Schrattenholz, A. and Cahill, M. A. (2006) 

Breast cancer proteomics by laser capture microdissection, sample pooling, 54-cm IPG IEF, and differential iodine radioisotope detection. Electrophoresis 27,

1840–1852. 

32. Preisler, J., Hu, P., Rejtar, T., Moskovets, E. and Karger, B. L. (2002) Capillary 

array electrophoresis-MALDI mass spectrometry using a vacuum deposition 

interface. Anal Chem 74, 17–25. 

33. Bergstrom, S. K., Samskog, J. and Markides, K. E. (2003) Development 

of a poly(dimethylsiloxane) interface for on-line capillary column liquid 

chromatography-capillary electrophoresis coupled to sheathless electrospray 

ionization time-of-flight mass spectrometry. Anal Chem 75, 5461–5467. 

34. Wu, S. L., Hancock, W. S., Goodrich, G. G. and Kunitake, S. T. (2003) An approach 

to the proteomic analysis of a breast cancer cell line (SKBR-3). Proteomics 3, 

1037–1046. 

35. Gozal, Y. M., Cheng, D., Duong, D. M., Lah, J. J., Levey, A. I. and Peng, J. (2006) 

Merger of laser capture microdissection and mass spectrometry: a window into the 

amyloid plaque proteome. Methods Enzymol 412, 77–93.


36. Guo, J., Colgan, T. J., DeSouza, L. V., Rodrigues, M. J., Romaschin, A. D. 

and Siu, K. W. (2005) Direct analysis of laser capture microdissected endometrial 

carcinoma and epithelium by matrix-assisted laser desorption/ionization mass 

spectrometry. Rapid Commun Mass Spectrom 19, 2762–2766. 

37. de Groot, C. J., Steegers-Theunissen, R. P., Guzel, C., Steegers, E. A. and 

Luider, T. M. (2005) Peptide patterns of laser dissected human trophoblasts 

analyzed by matrix-assisted laser desorption/ionisation-time of flight mass 

spectrometry. Proteomics 5, 597–607. 

38. Umar, A., Dalebout, J. C., Timmermans, A. M., Foekens, J. A. and Luider, T. 

M. (2005) Method optimisation for peptide profiling of microdissected breast 

carcinoma tissue by matrix-assisted laser desorption/ionisation-time of flight 

and matrix-assisted laser desorption/ionisation-time of flight/time of flight-mass 

spectrometry. Proteomics 5, 2680–2688. 

39. Umar, A., Luider, T. M., Foekens, J. A. and Pasa-Tolic, L. (2007) NanoLC-FT-ICR MS improves proteome coverage attainable for approximately 3000 laser-microdissected breast carcinoma cells. Proteomics 7, 323–329.

40. Melle, C., Bogumil, R., Ernst, G., Schimmel, B., Bleul, A. and von Eggeling, F. 

(2006) Detection and identification of heat shock protein 10 as a biomarker in 

colorectal cancer by protein profiling. Proteomics 6, 2600–2608. 

41. Melle, C., Ernst, G., Schimmel, B., Bleul, A., Koscielny, S., Wiesner, A., 

Bogumil, R., Moller, U., Osterloh, D., Halbhuber, K. J. and von Eggeling, F. 

(2003) Biomarker discovery and identification in laser microdissected head and 

neck squamous cell carcinoma with ProteinChip technology, two-dimensional gel 

electrophoresis, tandem mass spectrometry, and immunohistochemistry. Mol Cell 


42. Zheng, Y., Xu, Y., Ye, B., Lei, J., Weinstein, M. H., O’Leary, M. P., Richie, J. P., 

Mok, S. C. and Liu, B. C. (2003) Prostate carcinoma tissue proteomics for 

biomarker discovery. Cancer 98, 2576–2582. 

43. Cutler, P. (2003) Protein arrays: the current state-of-the-art. Proteomics 3, 3–18. 

44. Dekker, L. J., Burgers, P. C., Guzel, C. and Luider, T. M. (2007) FTMS and

TOF/TOF mass spectrometry in concert: identifying peptides with high reliability 

using matrix prespotted MALDI target plates. J Chromatogr B Analyt Technol 

Biomed Life Sci 847, 62–64. 

45. Mustafa, D. A., Burgers, P. C., Dekker, L. J., Charif, H., Titulaer, M. K., 

Smitt, P. A., Luider, T. M. and Kros, J. M. (2007) Identification of glioma

neovascularization-related proteins by using MALDI-FTMS and nano-LC fractionation 

to microdissected tumor vessels. Mol Cell Proteomics 6, 1147–1157.

III 

Clinical Proteomics by LC-MS Approaches

10 

Comparison of Protein Expression by Isotope-Coded 

Affinity Tag Labeling 

Zhen Xiao and Timothy D. Veenstra 

Summary 

Isotope-coded affinity tag (ICAT) labeling, in combination with mass spectrometry (MS), has been widely adopted as an effective method for comparing protein abundance levels. This chapter describes the ICAT labeling procedure as applied to a search for celecoxib-regulated proteins in a colon cancer cell line. Celecoxib, a cyclooxygenase-2 (COX-2)-specific inhibitor, is used in clinical trials as a colorectal cancer preventive drug. Here, celecoxib is used to inhibit COX-2 in the colon cancer cell line HT-29. To elucidate the proteomic changes induced by celecoxib, protein lysates are prepared from treated and control cells. The cysteine-containing proteins in the treated and control lysates are labeled with the heavy and light ICAT reagents, respectively. The labeled proteins are then combined and digested with trypsin. The ICAT-labeled peptides are purified on an avidin column, and the biotin tags are then cleaved. This chapter focuses on the ICAT labeling procedure itself, because sample preparation is the most critical step of an ICAT-based protein expression comparison experiment. Other related procedures, such as strong cation exchange high performance liquid chromatography separation of peptides and MS analysis, are detailed elsewhere in this book.

Key Words: isotope-coded affinity tags; quantitative proteomics; mass spectrometry. 

1. Introduction


The application of mass spectrometry (MS) has rapidly expanded from simple identification of protein components to the quantitative comparison of proteomic changes under various biological and physiological conditions (1,2,3). In many studies, it is desirable to identify proteins and quantify their levels simultaneously using MS. Although quantitation of specific targeted molecules is well established, experimental and technical issues limit the accuracy of directly quantifying hundreds (or thousands) of species in a single MS experiment, making such measurements extremely challenging (4,5,6,7).

To overcome this hurdle, a variety of chemical labeling and derivatization techniques have been developed (5,7,8,9). One of these techniques, isotope-coded affinity tags (ICATs), has been widely adopted and remains the model system on which most other differential labeling methods have been based (10). The ICAT reagent is composed of four parts: (1) an iodoacetamide group that reacts covalently with cysteine residues within proteins; (2) an isotope-coded linker region, prepared in two versions containing either nine ¹³C atoms (heavy version) or nine ¹²C atoms (light version); (3) a biotin tag that enables purification of labeled peptides through its specific binding to avidin; and (4) an acid-labile bond situated between the biotin and the isotope-coded domain of the reagent (Fig. 1). After the cysteine residues are labeled, the protein mixture is enzymatically digested (usually with trypsin), and the labeled peptides are purified by avidin chromatography. Following enrichment of the ICAT-labeled peptides, the acid-labile bond is cleaved with trifluoroacetic acid (TFA), removing the biotin tag. Removing the biotin tag reduces the mass of the tag remaining on the peptide, which increases fragmentation efficiency and, ultimately, the success rate of peptide identification by tandem MS.
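The mass arithmetic behind the heavy/light pairing can be checked in a few lines. This is a minimal sketch, assuming the heavy tag carries exactly nine ¹³C in place of nine ¹²C (as described above) and one tag per labeled cysteine; the function names are hypothetical and not part of any vendor software.

```python
# Monoisotopic masses (Da) of the two carbon isotopes.
C12_MASS = 12.0
C13_MASS = 13.0033548


def icat_mass_shift(n_cys: int = 1, n_c13_per_tag: int = 9) -> float:
    """Neutral mass difference (Da) between heavy- and light-labeled forms.

    Assumes one ICAT tag per labeled cysteine and n_c13_per_tag 13C atoms
    per heavy tag (nine for the cleavable reagent described in the text).
    """
    return n_cys * n_c13_per_tag * (C13_MASS - C12_MASS)


def pair_spacing_mz(n_cys: int, charge: int) -> float:
    """Spacing of the light/heavy pair on the m/z axis at a given charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return icat_mass_shift(n_cys) / charge


if __name__ == "__main__":
    for n_cys in (1, 2):
        for z in (1, 2, 3):
            print(f"{n_cys} Cys, {z}+: pair spacing = {pair_spacing_mz(n_cys, z):.3f} m/z")
```

A doubly charged peptide with one cysteine therefore shows its light/heavy pair only about 4.5 m/z units apart, which is worth keeping in mind when reading spectra by eye.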

The advantage of ICAT labeling is that the heavy and light reagents have identical chemistry but different masses, which enables protein abundances in two complex proteome samples to be compared simultaneously. Because they coelute from a nanoflow reversed-phase liquid chromatography column, the light- and heavy-labeled forms of a peptide are easily recognized in the mass spectrum as a pair separated by ∼9 Da per labeled cysteine (∼9/z on the m/z scale for an ion of charge z). The tandem MS spectrum enables the peptide to be identified, while the ratio of the two peak areas is used as a measure of the peptide’s relative abundance in the samples being compared. Since their inception, the ICAT reagents have been modified, improved, and made commercially available as a kit by Applied Biosystems (11). The combination of ICAT labeling, peptide fractionation, and liquid chromatography tandem mass spectrometry has enabled the rapid and simultaneous identification and quantitation of changes in complex protein mixtures (12,13,14,15,16).

Fig. 1. The structure of cleavable isotope-coded affinity tag reagent.
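The quantitation step, taking the peak-area ratio of a coeluting light/heavy pair, reduces to simple arithmetic. The sketch below assumes the convention of Note 10 (control light-labeled, treated heavy-labeled), so heavy/light equals treated/control. Rolling peptide ratios up to a protein value with the median log2 ratio is an illustrative choice, not the procedure specified in this chapter, and the function names are hypothetical.

```python
import math
from statistics import median


def peptide_ratio(light_area: float, heavy_area: float) -> float:
    """Heavy/light peak-area ratio for one peptide pair."""
    if light_area <= 0 or heavy_area <= 0:
        raise ValueError("peak areas must be positive")
    return heavy_area / light_area


def protein_log2_ratio(pairs: list[tuple[float, float]]) -> float:
    """Median log2(heavy/light) over all quantified peptides of one protein.

    `pairs` holds (light_area, heavy_area) tuples. The median is an
    illustrative, outlier-tolerant summary, not a prescribed method.
    """
    if not pairs:
        raise ValueError("no peptide pairs supplied")
    return median(math.log2(peptide_ratio(light, heavy)) for light, heavy in pairs)


# Hypothetical peak areas for three cysteine-containing peptides of one protein.
print(protein_log2_ratio([(1.0e6, 2.1e6), (8.0e5, 1.5e6), (5.0e5, 1.1e6)]))
```

A positive log2 ratio would indicate higher abundance in the heavy-labeled (here, celecoxib-treated) sample.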

In this chapter, the ICAT labeling procedure is described as part of an experiment to identify celecoxib-induced proteomic changes in colon cancer cells. Celecoxib is a nonsteroidal anti-inflammatory drug that specifically inhibits cyclooxygenase-2 (COX-2) (17,18). In clinical trials, it has been shown to inhibit the development of precancerous polyps in the colon (19,20). In this study, a COX-2-expressing colon cancer cell line (HT-29) is used (21,22). After the cells are treated with celecoxib, cell lysates are prepared and labeled with the ICAT reagents. A schematic diagram of the ICAT labeling and peptide analysis procedure is shown in Fig. 2. Because sample preparation is the core of ICAT-based quantitative proteomic analysis, this chapter is dedicated to the details of the ICAT labeling protocol itself. For information on strong cation exchange (SCX) high performance liquid chromatography (HPLC) separation of peptides, analysis by nanoflow reversed-phase liquid chromatography tandem mass spectrometry, and bioinformatics analysis, refer to the chapter “Analysis of the Extracellular Matrix and Secreted Vesicle Proteomes by Mass Spectrometry” (Subheadings 3.6–3.8). The methods described in this chapter can be used to (1) understand the proteomic changes in response to a drug; (2) elucidate the molecular mechanisms underlying the drug’s effects; and (3) search for biomarkers or endpoints that can be used to monitor and evaluate therapeutic and intervention approaches.

2. Materials 

2.1. Cell Culture and Harvest 

1. T-75 cell culture flasks 

2. McCoy’s 5a medium supplemented with 10% (v/v) fetal bovine serum, 50 U/mL penicillin, 50 μg/mL streptomycin, and 1.5 mM L-glutamine (American Type Culture Collection (ATCC), Manassas, VA)

3. Dimethylsulfoxide (DMSO, cell culture use) 

4. HT-29 cell line (ATCC, Manassas, VA) 

5. Celecoxib (Pfizer, New York, NY) 

6. 75 μM celecoxib: dissolve celecoxib in DMSO to make a 100 mM stock solution, then dilute it further to 75 μM with McCoy’s 5a cell culture medium. Use medium containing the same concentration of DMSO as the negative control

7. Sterile phosphate-buffered saline (PBS) solution

8. 500 mM EDTA, pH 8

9. 2 mM EDTA in sterile PBS: add 80 μL of 500 mM EDTA, pH 8, to 20 mL of PBS

10. Centrifuge (maximum force: ∼17,000×g)
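The two working solutions above follow from C₁V₁ = C₂V₂. The sketch below checks them; the 10 mL medium volume per flask is a hypothetical value chosen for illustration, since the protocol does not state it.

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Solve C1*V1 = C2*V2 for V1 (concentrations in matching units)."""
    return final_conc * final_volume / stock_conc


def diluted_conc(stock_conc: float, stock_vol: float, diluent_vol: float) -> float:
    """Final concentration after adding stock_vol of stock to diluent_vol of diluent."""
    return stock_conc * stock_vol / (stock_vol + diluent_vol)


# Celecoxib: 100 mM DMSO stock -> 75 uM (0.075 mM) in 10 mL final volume.
MEDIUM_ML = 10.0  # hypothetical per-flask volume, not given in the protocol
celecoxib_ml = stock_volume(100.0, 0.075, MEDIUM_ML)
dmso_percent = 100.0 * celecoxib_ml / MEDIUM_ML  # DMSO as % (v/v); match this in the control

# EDTA: 80 uL of 500 mM stock added to 20 mL of PBS.
edta_mM = diluted_conc(500.0, 0.080, 20.0)

print(f"celecoxib stock per {MEDIUM_ML:.0f} mL: {celecoxib_ml * 1000:.1f} uL ({dmso_percent:.3f}% DMSO)")
print(f"EDTA final: {edta_mM:.2f} mM")
```

Under this assumption, the DMSO-only control would receive the same ~7.5 μL of DMSO per 10 mL of medium, and the EDTA solution works out to about 1.99 mM, consistent with the stated 2 mM.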


Fig. 2. Schematic diagram of the ICAT labeling procedure applied to the quantitative 

proteomic analysis. 

2.2. Cell Lysis, Desalting, and Protein Quantitation 

1. Lysis buffer: 50 mM Tris–HCl, pH 7.2, 1% Triton X-100, 10 mM sodium fluoride (NaF), 1 mM sodium orthovanadate (Na₃VO₄), and 1 mM EDTA

2. Digital sonifier (Model 250, Branson Ultrasonics Corporation, Danbury, CT) 

3. Bicinchoninic acid (BCA) protein assay reagent kit (Pierce, Rockford, IL) 

4. D-Salt™ Excellulose plastic desalting column, 5 mL (maximum binding capacity is 1.25 mg per column) (Pierce, Rockford, IL)

5. 50 mM NH₄HCO₃, pH 8.3

6. Coomassie blue reagent: Coomassie Plus – The Better Bradford™ Assay Reagent (Pierce, Rockford, IL)

7. Centrifuge (maximum force: ∼17,000×g) 

8. Vacuum centrifuge 

2.3. Denaturing and Reducing the Proteins 

1. Denaturing buffer: 6 M guanidine in 50 mM NH₄HCO₃, pH 8.3

2. 100 mM Tris(2-carboxyethyl)phosphine (TCEP) (Pierce, Rockford, IL)

3. Boiling water bath 

2.4. Labeling with Cleavable ICAT Reagents, Desalting, 

and Tryptic Digestion 

1. Cleavable ICAT™ reagents (light and heavy sulfhydryl-modifying biotinylating

reagents). Store at –20 °C. One unit of either light or heavy reagent labels 100 μg 

of protein. The regular kit offers both reagents in 1 unit/tube. The bulk kit offers 

both reagents in 10 units/tube. The method described here is based on the use of 

a regular kit, i.e., 1 unit that labels 100 μg of protein/tube. (Applied Biosystems, 

Foster City, CA) 

2. Acetonitrile 

3. 37 °C water bath 

4. D-Salt™ Excellulose plastic desalting column, 5 mL (Pierce, Rockford, IL)

5. 50 mM NH₄HCO₃, pH 8.3

6. Coomassie blue reagent: Coomassie Plus – The Better Bradford™ Assay Reagent (Pierce, Rockford, IL)

7. Trypsin Gold, MS grade (Promega, Madison, WI)

2.5. Purifying the Labeled Peptides 

1. Phenylmethanesulfonyl fluoride (PMSF) (Sigma Chemical Co., St. Louis, MO) 

2. Glass wool 

3. 5-3/4-inch disposable Pasteur glass pipettes

4. UltraLink™ immobilized monomeric avidin slurry [50% (v/v)] (Pierce, Rockford, IL)

5. Teflon tubing that fits the tip of the 5-3/4-inch disposable Pasteur glass pipettes

6. 2× PBS buffer, pH 7.2: dissolve 14.2 g of Na₂HPO₄ and 8.77 g of NaCl in 450 mL of H₂O. Adjust the pH to 7.2 by adding about 350 μL of 85% (v/v) H₃PO₄. Add H₂O to a total volume of 500 mL. The final concentrations are 200 mM Na₂HPO₄ and 300 mM NaCl

7. 1× PBS, pH 7.2: dilute 2× PBS 1:1 in H₂O

8. 2 mM biotin solution: dissolve 9.8 mg of ImmunoPure D-biotin (MW 244.31, Pierce, Rockford, IL) in 20 mL of 2× PBS, pH 7.2

9. Acetonitrile [20% (v/v)] in 50 mM NH₄HCO₃, pH 8.3

10. Acetonitrile [30% (v/v)] containing 0.4% (v/v) formic acid


11. pH paper (pH 2–9) 

12. Dry ice 
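The buffer recipes in items 6 and 8 can be checked from molar masses. This sketch assumes anhydrous Na₂HPO₄ (141.96 g/mol) and NaCl (58.44 g/mol); these are standard values that the protocol does not state, whereas the biotin value (244.31) is taken from item 8. Function names are hypothetical.

```python
MOLAR_MASS = {  # g/mol
    "Na2HPO4": 141.96,   # anhydrous form (assumed)
    "NaCl": 58.44,       # standard value (assumed)
    "D-biotin": 244.31,  # as given in the Materials list
}


def grams_for(conc_mM: float, volume_mL: float, solute: str) -> float:
    """Mass (g) of solute needed to reach conc_mM in volume_mL."""
    return conc_mM / 1000.0 * volume_mL / 1000.0 * MOLAR_MASS[solute]


def conc_mM(mass_g: float, volume_mL: float, solute: str) -> float:
    """Concentration (mM) of a weighed mass dissolved to volume_mL."""
    return mass_g / MOLAR_MASS[solute] / (volume_mL / 1000.0) * 1000.0


print(f"2x PBS, 500 mL: {grams_for(200, 500, 'Na2HPO4'):.2f} g Na2HPO4, "
      f"{grams_for(300, 500, 'NaCl'):.2f} g NaCl")
print(f"Biotin: 9.8 mg in 20 mL -> {conc_mM(0.0098, 20, 'D-biotin'):.2f} mM")
```

Both salt masses agree with the stated 14.2 g and 8.77 g, and the biotin solution works out to approximately 2.0 mM.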

2.6. Cleaving Biotin 

1. Cleaving reagent A (10 mL) (Applied Biosystems, Foster City, CA): contains 

concentrated TFA. Store in fume hood at room temperature 

2. Cleaving reagent B (Applied Biosystems, Foster City, CA): store at –20 °C 

3. 37 °C water bath 


3. Methods 

3.1. Cell Culture and Harvest 

1. On day 1, plate HT-29 cells in T-75 flasks at 5 × 10⁶ cells/flask.

2. On day 2, aspirate medium. Culture cells with fresh medium containing 75 μM 

of celecoxib or DMSO (negative control). 

3. On day 3, 24 h after treating cells, aspirate cell culture medium. Rinse cells once 

quickly with 6 mL of PBS. 

4. Add 3 mL of 2 mM EDTA-PBS to each flask and place the flask in the 37 °C incubator. Monitor cell detachment carefully. Cells usually detach within 5 min; celecoxib-treated cells detach in less than 5 min (see Note 1).

5. Tap the side of the flask against the palm of your hand to dislodge the cells. When the cells are visibly detached, add 7 mL of PBS to the flask. Resuspend the cells and transfer the cell suspension to a 15 mL centrifuge tube. Harvest the treated and control cells in separate tubes.

6. Centrifuge the cell suspension at 500×g for 5 min. Remove the supernatant. 

7. Wash cell pellet with 10 mL of PBS three times. Centrifuge at 500×g for 5 min. 

Remove PBS after each centrifugation. 

8. The cell pellet is now ready for lysis. Keep it on ice before proceeding to the next step, or store it at –80 °C.

3.2. Cell Lysis, Desalting and Protein Quantitation 

1. Add 500 μL of lysis buffer to the cell pellet harvested from each T-75 flask. Transfer the resuspended cells to a 1.5 mL Eppendorf tube. Vortex briefly.

2. Clean the sonifier probe with H₂O and methanol, and let it air-dry before use.

3. To break the cells, set the digital sonifier amplitude to 16%. Hold the Eppendorf tube containing the suspended cells so that the probe is immersed halfway into the lysis buffer. Pulse for 10 s, then pause for 50 s; repeat this cycle five times. Rest the tube on ice between pulses, and lift it back up before the next 10 s pulse starts (see Note 2).

4. Clean the sonifier probe as in step 2 before starting the next sample. 

5. Centrifuge cell lysate at 15,000×g for 15 min at 4 °C.


6. Transfer cell lysate to a fresh eppendorf tube (see Note 3). 

7. Quantify the protein in cell lysate using the BCA assay (see Note 4). 

8. Prepare the desalting column (D-Salt™ Excellulose Plastic Desalting Column, 5 mL, Pierce) by washing it with 5× bed volume (i.e., 25 mL) of 50 mM NH₄HCO₃, pH 8.3 (see Note 5).

9. Based on the BCA assay results, load up to 1.25 mg of lysate protein onto each desalting column. Discard the flow-through (see Note 6).

10. Add 0.5 mL of 50 mM NH₄HCO₃, pH 8.3, to the column and collect the flow-through in one Eppendorf tube. Repeat to collect a total of seven 0.5 mL eluate fractions, each in a separate tube.

11. Take 10 μL of eluate from each fraction and mix with 300 μL (1:30) of Coomassie blue reagent (Pierce). Visually examine the color of each tube: in protein-containing fractions, the color changes from brown to blue. Proteins normally elute in fractions 3–5.

12. Pool the tubes containing protein. Mix well. Discard the tubes that do not contain 

protein. 

13. Measure the protein concentration using the BCA assay (see Note 4). 

14. Based on the BCA assay results, transfer 800 μg of protein from each of the 

treated and control samples into two separate eppendorf tubes (see Note 7). 

15. Lyophilize these two samples in a vacuum centrifuge (see Note 8).
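Steps 13 and 14 amount to converting the BCA concentration into a transfer volume. A minimal sketch follows; the BCA reading and pooled-fraction volume in the example are hypothetical, and the check against the protein actually available is an added convenience rather than a protocol step.

```python
def transfer_volume_uL(target_ug: float, conc_ug_per_uL: float, available_uL: float) -> float:
    """Volume of desalted lysate holding target_ug of protein.

    Raises ValueError if the pooled fractions do not contain enough protein;
    Note 7 indicates that as little as 100 ug per sample can be labeled.
    """
    if conc_ug_per_uL <= 0:
        raise ValueError("concentration must be positive")
    volume = target_ug / conc_ug_per_uL
    if volume > available_uL:
        raise ValueError(
            f"need {volume:.0f} uL but only {available_uL:.0f} uL available "
            f"({conc_ug_per_uL * available_uL:.0f} ug total)"
        )
    return volume


# Hypothetical example: pooled fractions 3-5 (1.5 mL) measured at 0.8 ug/uL by BCA.
print(f"{transfer_volume_uL(800, 0.8, 1500):.0f} uL per sample")
```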

3.3. Denaturing and Reducing the Proteins 

1. Freshly prepare denaturing buffer and 100 mM TCEP. 

2. Add denaturing buffer and 100 mM TCEP to the protein samples. For 800 μg of 

protein, add 640 μL of denaturing buffer and 8 μL of TCEP (see Note 9). 

3. Vortex until the sample is completely dissolved in the buffer. 

4. Boil the sample for 10 min. 

5. Vortex to mix well. Spin the samples in centrifuge briefly. Cool to room 

temperature. 
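The volumes in step 2 of Subheading 3.3 fix the concentrations that carry into labeling. This sketch assumes the 8 μL of TCEP adds to the 640 μL of denaturing buffer (648 μL total) and that the lyophilized protein contributes negligible volume. Under those assumptions, each 80 μL aliquot used in Subheading 3.4 carries just under the nominal 100 μg.

```python
def denatured_mix(protein_ug: float = 800.0, denat_uL: float = 640.0,
                  tcep_uL: float = 8.0, guanidine_M: float = 6.0,
                  tcep_stock_mM: float = 100.0) -> dict:
    """Final concentrations after redissolving lyophilized protein."""
    total_uL = denat_uL + tcep_uL  # protein pellet volume assumed negligible
    return {
        "total_uL": total_uL,
        "guanidine_M": guanidine_M * denat_uL / total_uL,
        "tcep_mM": tcep_stock_mM * tcep_uL / total_uL,
        "protein_ug_per_uL": protein_ug / total_uL,
    }


mix = denatured_mix()
per_tube_ug = 80.0 * mix["protein_ug_per_uL"]  # each ICAT reagent tube receives 80 uL
print(f"guanidine {mix['guanidine_M']:.2f} M, TCEP {mix['tcep_mM']:.2f} mM, "
      f"{per_tube_ug:.1f} ug protein per 80 uL aliquot")
```

Eight 80 μL aliquots use 640 of the 648 μL, so the small shortfall from 100 μg per tube is within the rounding used in the protocol.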

3.4. Labeling with Cleavable ICAT Reagents, Desalting, 

and Tryptic Digestion 

1. Remove the ICAT reagents from the –20 °C freezer and bring them to room temperature, protecting them from light. To label 800 μg of protein (control or treated), use eight tubes of reagent (light or heavy; each tube labels 100 μg of protein). Briefly centrifuge the tubes to bring the powder down from the wall to the bottom of each tube.

2. In the chemical hood with lights off, add 20 μL of acetonitrile into each of the 

eight reagent tubes (light or heavy). Add 80 μL (i.e., 100 μg) of protein sample into 

each tube. Tighten the tube caps. Vortex to mix well. Spin briefly in centrifuge 

(see Note 10).


3. Pool the eight tubes of each sample (control or treated, labeled with light or heavy reagent) into two separate tubes. This pooling should result in one light-labeled and one heavy-labeled tube, each containing 800 μL of protein mixture.

4. Incubate the samples in the 37 °C water bath for 2 h. Keep the samples from 

being exposed to light. 

5. Combine the light- and heavy-labeled samples together into one tube. Proceed 

with desalting. 

6. Use the same type of desalting column as in Subheading 3.2. Because the binding capacity per column is 1.25 mg, prepare two columns for the total of 1.6 mg of labeled protein. Wash each column with 5× bed volume (i.e., 25 mL) of 50 mM NH₄HCO₃, pH 8.3 (see Note 11).

7. Load 800 μg of the combined labeled proteins onto each column. Follow steps 8–12 in Subheading 3.2. At the end of elution, pool the protein-containing eluate fractions (usually fractions 3–5) into one 15 mL tube (see Note 12).

8. Prepare trypsin freshly by reconstituting 20 μg of trypsin in 20 μL of 50 mM NH₄HCO₃, pH 8.3. Add trypsin to the labeled protein at a trypsin-to-protein ratio of 1:40 (w/w); for 1.6 mg of protein, add 40 μg of trypsin (see Note 13).

9. Wrap the 15 mL tube with aluminum foil. Incubate at 37 °C overnight (see 

Note 14). 
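The number of reagent tubes, the number of desalting columns, and the trypsin amount in Subheading 3.4 all scale from the per-sample protein amount. A sketch using only the capacities stated in this chapter (100 μg per reagent tube, 1.25 mg per desalting column, 1:40 trypsin); the function name is hypothetical.

```python
import math


def labeling_plan(protein_ug_per_sample: float, ug_per_reagent_tube: float = 100.0,
                  desalt_capacity_ug: float = 1250.0, trypsin_ratio: float = 40.0) -> dict:
    """Reagent tubes, desalting columns, and trypsin for one light/heavy pair."""
    combined_ug = 2 * protein_ug_per_sample  # light and heavy samples are pooled
    columns = math.ceil(combined_ug / desalt_capacity_ug)
    return {
        "reagent_tubes_per_label": math.ceil(protein_ug_per_sample / ug_per_reagent_tube),
        "desalting_columns": columns,
        "ug_per_column": combined_ug / columns,
        "trypsin_ug": combined_ug / trypsin_ratio,  # 1:40 (w/w) trypsin:protein
    }


print(labeling_plan(800))  # amount used in this protocol
print(labeling_plan(100))  # minimum amount mentioned in Note 7
```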

3.5. Purifying the Labeled Peptides 

1. Boil the peptide solution for 10 min to deactivate trypsin. 

2. Freshly prepare 100 mM PMSF in methanol. Vortex to dissolve well. 

3. Add PMSF at a 1:100 dilution (v/v) to the trypsin-digested samples. For 3 mL 

of digests, add 30 μL of PMSF. The final PMSF concentration is 1 mM. Vortex 

briefly to mix. 

4. Prepare the avidin column: gently insert a small plug of glass wool into a 5-3/4-inch Pasteur glass pipette and push it down about 4-1/2 inches from the top. This plug supports the settled resin (see Note 15).

5. Add 0.5 mL of water to the pipette and let the water level fall until it reaches the glass wool; at this point, the flow should stop naturally. Block the bottom of the pipette, then slowly add 1.5 mL of water and mark the water level to indicate a volume of 1.5 mL.

6. Gradually add the avidin slurry to the 1.5 mL mark. Connect Teflon tubing to 

the pipette tip to increase the flow rate (see Note 16). 

7. Condition the column using the following washing buffers and sequence 

(see Note 17) 

a. 2× PBS, pH 7.2, 8 mL (5× bed volume) 

b. 2 mM biotin solution, 6 mL (4× bed volume) 

c. 30% (v/v) acetonitrile, 0.4% (v/v) formic acid, 6 mL (4× bed volume)

d. 2× PBS, pH 7.2, 8 mL (5× bed volume) 

8. Sample loading and incubation: remove the Teflon tubing. Load 1.5 mL of the digest onto the column. After the sample flows through, incubate at room temperature for 15 min. Load another 1.5 mL (or the rest) of the sample and incubate for 15 min (see Note 18).

9. Reconnect the Teflon tubing to the tip of the pipette. Wash the column, now carrying the bound ICAT-labeled peptides, with the following buffers in sequence:

a. 2× PBS, pH 7.2, 8 mL (5× bed volume)

b. 1× PBS, pH 7.2, 8 mL (5× bed volume)

c. 20% (v/v) acetonitrile in 50 mM NH₄HCO₃, pH 8.3, 6 mL (4× bed volume)

10. Final wash: remove the Teflon tubing. Add 1.3 mL (a volume slightly less than the bed volume) of 30% (v/v) acetonitrile, 0.4% (v/v) formic acid as a final wash, and discard the flow-through. Measure the pH of the last drop of this wash with pH paper. The pH should be >8 (basic), indicating that the acidic wash has not yet eluted the peptides and that they are still retained on the beads (see Note 19).

11. Elute the peptides with 4 mL of 30% (v/v) acetonitrile, 0.4% (v/v) formic acid into one 15 mL tube. Mix well and divide into four 1 mL aliquots. Briefly freeze the peptides on dry ice or at –80 °C, and then lyophilize them in a vacuum centrifuge (see Note 20).
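Several quantities in Subheading 3.5 are simple proportions: PMSF is added at 1:100 from a 100 mM stock, the avidin bed should cover the labeled-protein load at 1.6 mg per mL of packed resin (Note 16), and the final wash is kept slightly below the bed volume (Note 19). This sketch assumes that the final-wash volume scales with bed volume in the same 1.3:1.5 proportion used here, which is an interpretation of Note 19 rather than an instruction from the protocol; function names are hypothetical.

```python
AVIDIN_CAPACITY_MG_PER_ML = 1.6   # Note 16
FINAL_WASH_FRACTION = 1.3 / 1.5   # 1.3 mL wash on a 1.5 mL bed (assumed to scale)


def pmsf_volume_uL(digest_mL: float) -> float:
    """Volume of 100 mM PMSF for a 1:100 (v/v) addition (~1 mM final)."""
    return digest_mL * 1000.0 / 100.0


def avidin_bed_ok(bed_mL: float, labeled_protein_mg: float) -> bool:
    """True if the packed bed's stated capacity covers the labeled-protein load."""
    return bed_mL * AVIDIN_CAPACITY_MG_PER_ML >= labeled_protein_mg


def final_wash_mL(bed_mL: float) -> float:
    """Final acidic wash volume, kept slightly below the bed volume."""
    return round(bed_mL * FINAL_WASH_FRACTION, 2)


print(pmsf_volume_uL(3.0), avidin_bed_ok(1.5, 1.6), final_wash_mL(1.5), final_wash_mL(1.0))
```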


3.6. Cleaving Biotin 

1. Prepare the cleaving reagent mixture in a chemical hood. For 1.6 mg of labeled peptides, mix 760 μL of cleaving reagent A with 40 μL of cleaving reagent B. Add the cleaving reagent mixture to the dry peptides, dispensing it equally among all four peptide aliquots (see Note 21).

2. Close the tube caps. Vortex well to dissolve the peptides. 

3. Incubate the samples in a 37 °C water bath for 2 h. 

4. Pool all the aliquots together when the incubation is finished. Freeze briefly on 

dry ice or at –80 °C. Lyophilize the peptides in vacuum centrifuge. 

5. Store at –80 °C prior to the next step (i.e., fractionation by SCX HPLC). 
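The cleaving-reagent volumes in step 1 follow from the per-200 μg recipe in Note 21 (95 μL of reagent A plus 5 μL of reagent B). A sketch that scales the recipe to the peptide amount and splits it across aliquots; the function name is hypothetical.

```python
def cleaving_mix(peptide_ug: float, n_aliquots: int = 4,
                 a_per_200ug_uL: float = 95.0, b_per_200ug_uL: float = 5.0) -> dict:
    """Scale the Note 21 recipe to the labeled-peptide amount and split it evenly."""
    scale = peptide_ug / 200.0
    reagent_a = a_per_200ug_uL * scale
    reagent_b = b_per_200ug_uL * scale
    return {
        "reagent_A_uL": reagent_a,
        "reagent_B_uL": reagent_b,
        "per_aliquot_uL": (reagent_a + reagent_b) / n_aliquots,
    }


print(cleaving_mix(1600))  # 1.6 mg of labeled peptides in four aliquots
```

For 1.6 mg, this reproduces the 760 μL + 40 μL stated in step 1, i.e., 200 μL per aliquot.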

4. Notes 

1. Dislodging cells using a low concentration of EDTA preserves the integrity of 

cell surface proteins, which is critical in quantitative proteomic analysis. 

2. For the Branson digital sonifier, use the following program settings: pulse on for 

10 s; off for 50 s; amplitude = 16%. If bubbles are generated during sonication, 

decrease the amplitude setting. Depending on the sample volume, the setting 

can sometimes be lowered to 14%. The clumps of cells should disappear when 

sonication is complete. 

3. After this step the cell lysate can be stored at –80 °C. Otherwise, proceed to the 

next step, i.e., BCA assay and desalting. 

4. Protein quantitation is a common laboratory procedure. The instructions are 

included within the BCA assay kit (Pierce); therefore, the procedure is not 

described in this chapter.


5. It is helpful to assemble a funnel reservoir on the top of the column to hold a 

larger volume (up to 25 mL) of buffer. 

6. The maximum binding capacity of the desalting column is 1.25 mg of protein 

per column. 

7. The method described here is based on the labeling of 800 μg of protein from 

each of the treated and control samples. This amount of protein is desirable if 

enough cell lysate is available. However, as little as 100 μg of protein from each 

of the treated and control samples can be labeled using this protocol. 

8. It takes about 3 h to lyophilize the samples. If necessary, leave the samples in the vacuum centrifuge overnight to dry.

9. It is important to keep the pH of the cell lysate above 7 (ideally between 8 and 

9). A pH below 7 will inhibit the reaction between cysteine residues and the 

iodoacetamide group of the ICAT reagents. 

10. Usually the control sample is labeled with the light reagent and the treated 

sample is labeled with the heavy reagent. 

11. To save time, set up the two columns on the stand during the 2-h labeling incubation. Attaching a funnel reservoir to the top of each column to hold up to 25 mL of wash buffer is recommended.

12. Normally the volume of sample after pooling is about 3 mL. Desalted samples 

may have an opaque color because of the protein present in the sample. 

13. Instead of using the buffer provided by the manufacturer, resuspend trypsin in 50 mM NH₄HCO₃, pH 8.3. Keep the trypsin-to-protein ratio between 1:40 and 1:50.

14. The digestion mixture is incubated overnight for approximately 16–18 h. 

15. Make sure the glass wool is well packed. There should be no holes; however, the plug should still allow liquid to flow through at a reasonable rate. Check the flow rate by adding 0.5 mL of water to the pipette; the water should flow through quickly. Note that the flow rate will be considerably slower once the avidin slurry is packed into the column. Keep these points in mind to avoid packing too much or too little glass wool.

16. The protein binding capacity of avidin slurry is 1.6 mg protein per milliliter of 

packed avidin. One 1.5 mL column should offer sufficient capacity to enrich the 

labeled peptides from 1.6 mg of protein. 

17. The binding of 2 mM biotin to the column and the elution by 30% (v/v) acetonitrile, 

0.4% (v/v) formic acid preclear the column of any potential nonspecific 

binding activities. 

18. The Teflon tubing is a useful tool for adjusting the flow rate: connecting it to the tip of the column increases the flow rate, whereas the flow rate is slower without the tubing attached.

19. The final wash is intended to remove any nonspecifically bound proteins. Using a volume slightly less than the bed volume ensures that the labeled peptides are retained on the column. The volume of the final wash buffer can be adjusted according to the actual bed volume; when the avidin bed volume is smaller, the volume of the final wash buffer needs to be scaled down. If the pH of the last drop is less than 3, the labeled peptides may have started to elute, meaning potential loss of the labeled peptides.

20. The elution should be performed in a chemical fume hood to avoid inhaling acetonitrile. Quick-freezing the samples on dry ice helps prevent sample spills during vacuum centrifugation and reduces the time needed for the samples to dry.

21. For every 200 μg of labeled peptides (i.e., 100 μg each of the heavy- and light-labeled samples in the pair), first mix 95 μL of cleaving reagent A with 5 μL of cleaving reagent B, and then transfer the mixture to the labeled peptides.

Acknowledgments


This project has been funded in whole or in part with Federal funds from 

the National Cancer Institute, National Institutes of Health, under Contract No. 

N01-CO-12400. The content of this publication does not necessarily reflect 

the views or policies of the Department of Health and Human Services, nor 

does mention of trade names, commercial products, or organization imply 

endorsement by the U.S. Government. 

References 

1. Aebersold, R., Rist, B. and Gygi, S. P. (2000) Quantitative proteome analysis: 

methods and applications. Ann N Y Acad Sci 919, 33–47. 

2. Gygi, S. P., Rist, B. and Aebersold, R. (2000) Measuring gene expression by 

quantitative proteome analysis. Curr Opin Biotechnol 11, 396–401. 

3. Yates, J. R. 3rd. (2004) Mass spectral analysis in proteomics. Annu Rev Biophys 

Biomol Struct 33, 297–316. 

4. Ong, S. E. and Mann, M. (2005) Mass spectrometry-based proteomics turns quantitative. 

Nat Chem Biol 1, 252–262. 

5. Zieske, L. R. (2006) A perspective on the use of iTRAQ reagent technology for 

protein complex and profiling studies. J Exp Bot 57, 1501–1508. 

6. Yan, W. and Chen, S. S. (2005) Mass spectrometry-based quantitative proteomic 

profiling. Brief Funct Genomic Proteomic 4, 27–38. 

7. Bronstrup, M. (2004) Absolute quantification strategies in proteomics based on 

mass spectrometry. Expert Rev Proteomics 1, 503–512. 

8. Conrads, T. P., Issaq, H. J. and Hoang, V. M. (2003) Current strategies for quantitative 

proteomics. Adv Protein Chem 65, 133–159. 

9. Leitner, A. and Lindner, W. (2004) Current chemical tagging strategies for 

proteome analysis by mass spectrometry. J Chromatogr B Analyt Technol Biomed 

Life Sci 813, 1–26. 

10. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H. and Aebersold, R. 

(1999) Quantitative analysis of complex protein mixtures using isotope-coded 

affinity tags. Nat Biotechnol 17, 994–999.


11. Flory, M. R., Griffin, T. J., Martin, D. and Aebersold, R. (2002) Advances in 

quantitative proteomics using stable isotope tags. Trends Biotechnol 20, S23–S29. 

12. Han, D. K., Eng, J., Zhou, H. and Aebersold, R. (2001) Quantitative profiling of 

differentiation-induced microsomal proteins using isotope-coded affinity tags and 

mass spectrometry. Nat Biotechnol 19, 946–951. 

13. Conrads, K. A., Yu, L. R., Lucas, D. A., Zhou, M., Chan, K. C., Simpson, K. A., 

Schaefer, C. F., Issaq, H. J., Veenstra, T. D., Beck, G. R. Jr. and Conrads, T. P. 

(2004) Quantitative proteomic analysis of inorganic phosphate-induced murine 

MC3T3-E1 osteoblast cells. Electrophoresis 25, 1342–1352. 

14. Gygi, S. P., Rist, B., Griffin, T. J., Eng, J. and Aebersold, R. (2002) Proteome 

analysis of low-abundance proteins using multidimensional chromatography and 

isotope-coded affinity tags. J Proteome Res 1, 47–54. 

15. Tao, W. A. and Aebersold, R. (2003) Advances in quantitative proteomics via 

stable isotope tagging and mass spectrometry. Curr Opin Biotechnol 14, 110–118. 

16. Conrads, K. A., Yi, M., Simpson, K. A., Lucas, D. A., Camalier, C. E., Yu, L. R., 

Veenstra, T. D., Stephens, R. M., Conrads, T. P. and Beck, G. R. Jr. (2005) A 

combined proteome and microarray investigation of inorganic phosphate-induced 

pre-osteoblast cells. Mol Cell Proteomics 4, 1284–1296. 

17. Koehne, C. H. and Dubois, R. N. (2004) COX-2 inhibition and colorectal cancer. 

Semin Oncol 31, 12–21. 

18. Sinicrope, F. A. and Gill, S. (2004) Role of cyclooxygenase-2 in colorectal cancer. 

Cancer Metastasis Rev 23, 63–75. 

19. Steinbach, G., Lynch, P. M., Phillips, R. K., Wallace, M. H., Hawk, E., 

Gordon, G. B., Wakabayashi, N., Saunders, B., Shen, Y., Fujimura, T., Su, L. K. 

and Levin, B. (2000) The effect of celecoxib, a cyclooxygenase-2 inhibitor, in 

familial adenomatous polyposis. N Engl J Med 342, 1946–1952. 

20. Thun, M. J., Henley, S. J. and Patrono, C. (2002) Nonsteroidal anti-inflammatory 

drugs as anticancer agents: mechanistic, pharmacologic, and clinical issues. J Natl 

Cancer Inst 94, 252–266. 

21. Arico, S., Pattingre, S., Bauvy, C., Gane, P., Barbat, A., Codogno, P. and Ogier-Denis, E. (2002) Celecoxib induces apoptosis by inhibiting 3-phosphoinositide-dependent protein kinase-1 activity in the human colon cancer HT-29 cell line. J Biol Chem 277, 27613–27621.

22. Lev-Ari, S., Strier, L., Kazanov, D., Madar-Shapiro, L., Dvory-Sobol, H., 

Pinchuk, I., Marian, B., Lichtenberg, D. and Arber, N. (2005) Celecoxib and 

curcumin synergistically inhibit the growth of colorectal cancer cells. Clin Cancer 

Res 11, 6738–6744.

11 

Analysis of Microdissected Cells by Two-Dimensional 

LC-MS Approaches 

Chen Li, Yi-Hong, Ye-Xiong Tan, Jian-Hua Ai, Hu Zhou, Su-Jun Li, 

Lei Zhang, Qi-Chang Xia, Jia-Rui Wu, Hong-Yang Wang, and Rong Zeng 

Summary 

Laser capture microdissection (LCM) is a powerful tool that enables the isolation of specific cell types from tissue sections, overcoming the problems of tissue heterogeneity and contamination. We combined LCM with isotope-coded affinity tag (ICAT) technology and two-dimensional liquid chromatography to investigate the qualitative and quantitative proteomes of hepatocellular carcinoma (HCC). The effects of three different histochemical stains on tissue sections were compared, and toluidine blue proved to be the most suitable stain for LCM followed by proteomic analysis. The solubilized proteins from microdissected HCC and non-HCC hepatocytes were qualitatively and quantitatively analyzed with two-dimensional liquid chromatography tandem mass spectrometry (2D-LC-MS/MS), alone or coupled with cleavable isotope-coded affinity tag (cICAT) labeling technology. A total of 644 proteins were qualitatively identified and 261 proteins were unambiguously quantified. These results show that the clinical proteomic method combining LCM with ICAT and 2D-LC-MS/MS enables not only large-scale but also accurate qualitative and quantitative analysis.

Key Words: hepatocellular carcinoma; laser capture microdissection; isotope-coded 

affinity tag; two-dimensional liquid chromatography; mass spectrometry. 

1. Introduction


Hepatocellular carcinoma (HCC) is one of the most frequent tumors worldwide, with 0.25–1 million newly diagnosed cases each year (1). The highest frequencies of HCC are observed in sub-Saharan Africa and in Asia. In China, HCC has ranked as the second leading cause of cancer death since the 1990s. The major risk factors for HCC are chronic hepatitis B virus (HBV) and hepatitis C virus (HCV) infections, chronic exposure to the mycotoxin aflatoxin B1 (AFB1), and alcoholic cirrhosis. To date, the mainstays of HCC diagnosis include serological tumor markers, such as alpha-fetoprotein, the L3 fraction of alpha-fetoprotein, and PIVKA-II, as well as imaging modalities (1,2,3).

To improve the diagnosis and prognosis of HCC, there is an urgent need to identify molecular markers of the disease. Tissue samples from patients with HCC may offer the most direct and persuasive route to useful diagnostic and/or prognostic markers. Proteomic analysis has recently been applied to HCC tissues. Paik et al. analyzed nineteen cases of HCC by two-dimensional electrophoresis (2DE) and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) (4,5,6). Jung et al. observed proteome alterations in normal, cirrhotic, and tumorous tissue using 2DE and MALDI-TOF-MS (7). Kim et al. analyzed 11 cases of HCC using 2DE and delayed extraction matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (DE-MALDI-TOF-MS) (8).

Non-enzymatic sample preparation (NESP) is now a routine technique for tissue samples and can be adapted to tissue-type-specific properties (9). However, it is hampered by tissue heterogeneity and by contaminating proteins such as blood proteins. Several approaches have been developed to address these problems, and the selection of cell types of interest by dissection has received particular attention. Since 1996, laser capture microdissection (LCM) has emerged as a good choice: under direct microscopic visualization, it permits rapid, one-step procurement of selected cell populations from a section of complex, heterogeneous tissue (10,11). LCM has been used to isolate specific cell types for protein, DNA, and RNA analysis. Proteins from laser capture microdissected cells can be analyzed by two-dimensional gel electrophoresis (2DE) (12,13), immunoassay (14,15), and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (16,17,18,19,20,21). The main limitation of LCM is the time required to collect enough cells for one experiment: 2–7 h for 20,000–40,000 cells per immunoassay and 15 h for 250,000 cells per 2DE gel (22).

We have previously applied proteomic analysis to HCC cell lines (23,24) and to metastatic HCC cells (25), and here we extend this work to clinical tissues using LCM. However, several hours of microdissection yield only a few hundred micrograms of protein. This amount is difficult to analyze by the traditional 2DE-MS route, especially when preparative 2DE gels are needed for MS identification.

Proteomic Analysis of Clinical HCC Using LCM 195 

Since 1999, the isotope-coded affinity tag (ICAT) strategy has been a leading technology for relative protein quantification based on post-harvest stable isotope labeling (26). Post-harvest labeling with stable isotopes can be used to quantify proteins in cells and tissues from any organism, and the ICAT method as initially described was shown to quantify proteins in complex mixtures accurately (26). The second-generation cleavable ¹³C-ICAT (cICAT) reagents improved on the first-generation ²H-ICAT reagents (27,28,29). The ¹³C label avoids the reversed-phase retention shift between light and heavy peptides caused by deuterium, and the acid-cleavable linker allows the biotin tag to be removed before MS analysis. Two-dimensional chromatography coupled with MS/MS has been shown to identify large numbers of proteins, including low-abundance proteins (30,31).

In this study, we used LCM to isolate HCC and non-HCC hepatocytes and, for the first time, combined LCM with cICAT labeling and 2D-LC-MS/MS to perform accurate qualitative and quantitative analysis of HCC and non-HCC tissues. The workflow is outlined in Fig. 1. In total, 644 proteins were identified in HCC hepatocytes, and 261 proteins were quantified between HCC and non-HCC hepatocytes. To date, this is one of the largest qualitative and quantitative proteome datasets for HCC and non-HCC tissues. Our strategy provides an accurate, fast, and sensitive approach to the proteomic analysis of clinical tissues. It should facilitate understanding of the mechanisms of HCC and other diseases, as well as the mining of potential markers and drug targets for diagnosis and treatment.

[Fig. 1 flowchart: frozen sections of HCC tissues → stained with toluidine blue → laser capture microdissection → HCC hepatocytes and non-HCC hepatocytes → solubilized proteins → labeled with cICAT light chain and cICAT heavy chain → digestion of protein mixture → 2D-LC-MS/MS → analysis by bioinformatics]

Fig. 1. Outline of accurate qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded affinity tag and two-dimensional liquid chromatography mass spectrometry. Reprinted with permission from (34).

2. Materials 

2.1. Tissue Specimen and Sample Preparation by Nonenzymatic Method (NESP)

1. Tissues from an HCC patient are obtained from fresh partial hepatectomy specimens at the Shanghai Eastern Hepatobiliary Surgery Hospital. Access to human tissues complies with Chinese law and the guidelines of the Ethics Committee.

2. Glutamine-free RPMI 1640 medium supplemented with 5% fetal calf serum, 0.2 mM phenylmethylsulfonyl fluoride, 1 mM ethylenediaminetetraacetic acid tetrasodium salt dihydrate (EDTA), and antibiotics (oxacillin 25 μg/ml, gentamycin 50 μg/ml, penicillin 100 U/ml, streptomycin 100 μg/ml, amphotericin B 0.25 μg/ml, and nystatin 50 U/ml). Store at 4°C.

3. Ceramic mortar and pestle (SIBAS Corp., Shanghai, China).

4. Lysis buffer: 8 M urea, 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), 40 mM Tris-HCl (pH 8.3), 65 mM dithiothreitol (DTT). Store in aliquots at –80°C.

5. Protease inhibitor tablets (Roche).
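Item 4 specifies the lysis buffer by concentration only. The minimal Python sketch below converts it into weighed amounts using a hypothetical helper, `lysis_buffer_masses`. It uses standard molecular weights and assumes that 4% CHAPS is w/v and that Tris base is titrated to pH 8.3 with HCl. The protease inhibitor tablets are left out.

```python
# Hypothetical helper: masses of solids for the lysis buffer in item 4
# (8 M urea, 4% CHAPS, 40 mM Tris-HCl, 65 mM DTT).
# Molecular weights are standard values (g/mol), not taken from the protocol.
MW_G_PER_MOL = {
    "urea": 60.06,
    "Tris": 121.14,  # Tris base; titrate to pH 8.3 with HCl (assumption)
    "DTT": 154.25,
}


def lysis_buffer_masses(volume_ml: float) -> dict:
    """Return grams of each solid needed for `volume_ml` of lysis buffer.

    4% CHAPS is assumed to be w/v (4 g per 100 ml). Dissolve the solids,
    then bring the solution to the final volume.
    """
    litres = volume_ml / 1000.0
    return {
        "urea": 8.0 * litres * MW_G_PER_MOL["urea"],   # mol/L x L x g/mol
        "CHAPS": 4.0 * volume_ml / 100.0,              # 4 g per 100 ml, scaled
        "Tris": 0.040 * litres * MW_G_PER_MOL["Tris"],
        "DTT": 0.065 * litres * MW_G_PER_MOL["DTT"],
    }


if __name__ == "__main__":
    for name, grams in lysis_buffer_masses(10.0).items():
        print(f"{name}: {grams:.3f} g")
```

For 10 ml of buffer, this gives about 4.80 g urea, 0.40 g CHAPS, 0.048 g Tris, and 0.10 g DTT.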

2.2. Laser Capture Microdissection 

1. Tissues from an HCC patient are obtained from fresh partial hepatectomy specimens at the Shanghai Eastern Hepatobiliary Surgery Hospital. Access to human tissues complies with Chinese law and the guidelines of the Ethics Committee. The tissues are from a 50-year-old male patient with Edmondson grade III HCC (HBV infected; AFP 7.3 μg/L; tumor size 15 × 13 × 10.5 cm).

2. Freezing microtome CM1900 (Leica). 

3. O.C.T. compound (Tissue-Tek). 

4. Hematoxylin, eosin, and toluidine blue stains (Shanghai Genebase Corp.).

5. Leica AS LMD Laser Capture Microdissection System (Leica). 

6. Lysis buffer: 8 M urea, 4% CHAPS, 40 mM Tris, 65 mM DTT. Store in aliquots at –80°C.

7. Protease inhibitor tablets (Roche).


2.3. Removal of Toluidine Blue and Digestion of Protein Mixture for Qualitative Analysis

1. Precipitation solution: 50% acetone, 50% ethanol, 0.1% acetic acid (HAc). Store 

at –20°C. 

2. Redissolution buffer: 6 M guanidine HCl, 100 mM Tris-HCl (pH 8.3). Store at 4°C.

3. DTT and iodoacetamide (IAA) are from Bio-Rad. Sequencing grade TPCK-trypsin is from Promega.

4. YM3 ultrafiltration membranes (molecular mass cutoff, 3 kDa) are from Millipore 

Corp. All buffers are prepared with Milli-Q water (Millipore). 

2.4. Cleavable Isotope-Coded Affinity Tag Labeling of Proteins 

1. Tri-n-butylphosphate (TBP) is from Bio-Rad. 

2. cICAT light or heavy reagents, Avidin cartridge, affinity buffer–elute, affinity 

buffer–load, affinity buffer–wash 1, affinity buffer–wash 2, cleaving reagents A 

and B are from Applied Biosystems. 

3. Sequencing grade TPCK-trypsin (Promega). 

4. YM3 ultrafiltration membranes (molecular mass cutoff, 3 kDa) are from Millipore 

Corp. All buffers are prepared with Milli-Q water (Millipore). 

2.5. One-Dimensional and Two-Dimensional Liquid Chromatography Coupled with Tandem Mass Spectrometry

1. Formic acid is obtained from Aldrich, and acetonitrile (HPLC gradient grade) is 

obtained from Merck. 

2. The LCQ Deca XP system, ProteomeX Workstation and TurboSequest 

software are purchased from Thermo Electron Corporation. 

2.6. Bioinformatics Analysis 

1. ExPASy proteomics tools are accessed from cn.expasy.org/tools/#proteome. 

2. Program TMHMM 2.0 is accessed from the Center for Biological Sequence 

Analysis (www.cbs.dtu.dk/services/TMHMM/). 

3. Classification tools are accessed from www.geneontology.org. 

3. Methods 

Two principles apply throughout the LCM and 2D-LC-MS/MS workflow: speed and purity. Sample preparation for LCM must be done as quickly as possible, including fixation of fresh tissue, preparation of frozen sections, histochemical staining, and microdissection. Impurities such as histochemical stains should be removed as completely as possible by centrifugation, precipitation, and ultrafiltration before trypsin digestion and LC-MS/MS analysis.

Fixation and histochemical staining are the two initial steps of LCM, and an appropriate choice of fixation and staining methods is important for the rest of the workflow. In this work, we used freshly prepared liver tissue to make frozen sections (8 μm thick). We fixed the sections with ethanol to avoid adverse effects on proteins, such as the cross-linking caused by formalin fixation. Craven et al. (33) tested several histochemical stains (hematoxylin, eosin, methyl green, and toluidine blue) with 2DE. They found that staining with a single stain (hematoxylin) was better than staining with two stains together (hematoxylin and eosin), and that methyl green and toluidine blue were both compatible with 2D-PAGE analysis of proteins. Their results with toluidine blue also indicated a direct link between the intensity of tissue section staining and difficulty in obtaining good-quality protein separations. In our study, proteins from microdissected cells were subjected to tryptic digestion and LC-MS/MS analysis. Because staining material might alter the pH of the digestion buffer or inactivate trypsin, we removed the stains by precipitation and ultrafiltration before digestion. We stained frozen sections separately with three histochemical stains (hematoxylin, eosin, and toluidine blue). Of these, only toluidine blue could be almost completely removed by precipitation in 50% acetone, 50% ethanol, 0.1% acetic acid followed by desalting by ultrafiltration. In addition, proteins from toluidine blue-stained sections solubilized better, whereas colored protein precipitates appeared on the filtration membrane with hematoxylin or eosin staining. We therefore chose toluidine blue and optimized the experimental conditions, including staining, microdissection, and protein digestion, for this stain.

3.1. Tissue Specimen and Sample Preparation by Nonenzymatic Method (NESP)

1. The tissues used were from a 50-year-old male patient with Edmondson grade III HCC (HBV infected; AFP 7.3 μg/L; tumor size 15 × 13 × 10.5 cm). Tumorous tissue and adjacent paired nontumorous tissue (3 cm away from the edge of the HCC lesion, about 0.1 g) were isolated from a fresh partial hepatectomy specimen of HBV-associated HCC. Part of the resected tissue was used for histological analysis.

2. The tissues were rinsed several times with cold glutamine-free RPMI 1640 medium and ground in a liquid nitrogen-cooled mortar and pestle (see Note 1).

3. The tissue powders obtained were dissolved in lysis buffer (see Note 2).


4. The samples were sonicated on ice for 30 s (intensity below 50 W) using an ultrasonic processor and centrifuged for 1 h at 20,627 × g to remove DNA, RNA, and any particulate material.

5. Protein concentrations were measured with the Bio-Rad Protein Assay kit. All samples were stored at –80°C until use (see Note 3).
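Step 4 specifies centrifugation as a relative centrifugal force (20,627 × g). On a centrifuge set in rpm, the standard relation RCF = 1.118 × 10⁻⁵ × r × N² (r in cm, N in rpm) converts between the two. The sketch below assumes that relation. The rotor radii in the example are hypothetical and should be replaced with the radius of the rotor actually used.

```python
import math

# Standard relation between relative centrifugal force (x g), rotor radius
# r (cm), and speed N (rpm): RCF = 1.118e-5 * r * N**2.
K = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach `rcf` x g at a given radius."""
    return math.sqrt(rcf / (K * radius_cm))


if __name__ == "__main__":
    # Hypothetical rotor radii; substitute the value for your own rotor.
    for radius in (6.0, 8.2, 10.0):
        print(f"r = {radius} cm: {rpm_from_rcf(20627, radius):.0f} rpm")
```

For example, a rotor with an 8.2-cm radius would need about 15,000 rpm to reach 20,627 × g.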

3.2. Laser Capture Microdissection 

1. Carefully embed fresh tissue in OCT compound in a plastic mold, taking care not to trap air bubbles around the tissue. Freeze the tissue by setting the mold on top of liquid nitrogen until 70–80% of the block turns white, and then place the block on dry ice.

2. For cutting, mount the frozen block on the cryostat holder. Never let the tissue warm above –15°C at any point. Allow frozen blocks to equilibrate in the cryostat chamber for about 5 min, and then cut 8-μm sections.

3. Wash the 8-μm sections of freshly prepared liver tissue with cold phosphate-buffered saline (PBS, pH 7.4), and stain with toluidine blue following the manufacturer's standard protocol with minor modifications (see Note 4).

4. Fix the sections in cold 95% ethanol for 10 min, air-dry, and microdissect with the Leica AS LMD Laser Capture Microdissection System.

5. Using laser pulses of 7.5-μm diameter, 70 mW, and 2–3 ms duration, microdissect approximately 50,000 or 100,000 HCC and non-HCC hepatocytes, and store them in microdissection caps at –80°C until lysis (see Note 5). Fig. 2 shows an example using a hematoxylin and eosin (H&E)-stained section.

6. Microscopic inspection of the captured cells showed each cell population to be 95% homogeneous. Dissolve the microdissected HCC and non-HCC hepatocytes in lysis buffer (see Note 2).

7. Sonicate the samples briefly on ice using an ultrasonic processor, and centrifuge for 1 h at 20,627 × g to remove DNA, RNA, and any particulate material.

8. Measure the protein concentrations with the Bio-Rad Protein Assay kit. Store all samples at –80°C until use (see Note 3).

3.3. Removal of Toluidine Blue and Digestion of Protein Mixture for Qualitative Analysis

1. Precipitate the samples prepared by NESP or LCM in precipitation solution (50% acetone, 50% ethanol, 0.1% acetic acid; sample volume:precipitation solution volume = 1:5) for at least 12 h at –20°C. Wash the pellets with 100% acetone and 70% ethanol, and lyophilize them (see Note 6).

2. Redissolve the pellets in 6 M guanidine HCl, 100 mM Tris (pH 8.3), and measure the protein concentration with the Bio-Rad Protein Assay kit.

Fig. 2. HCC tissues before (A) and after (B) LCM. Reprinted with permission from (34).

4. Reduce 200 μg of solubilized protein with DTT (final concentration 20 mM), and then alkylate with IAA (final concentration 40 mM).

5. After desalting with YM3 ultrafiltration membranes, incubate the protein mixture with trypsin (trypsin:protein = 1:30, w/w; Promega, Madison, WI) at 37°C for 16 h (see Note 7).
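The steps above combine several ratios: 1:5 sample to precipitation solution, DTT and IAA at 20 and 40 mM final concentration, and 1:30 (w/w) trypsin to protein. The minimal Python sketch below works through the arithmetic. The stock concentrations (1 M DTT, 500 mM IAA) and sample volumes are hypothetical choices, not values from the protocol.

```python
from dataclasses import dataclass


@dataclass
class DigestionPlan:
    precipitation_ul: float  # precipitation solution for step 1
    dtt_stock_ul: float      # DTT stock to reach 20 mM (step 4)
    iaa_stock_ul: float      # IAA stock to reach 40 mM (step 4)
    trypsin_ug: float        # trypsin for a 1:30 (w/w) ratio (step 5)


def stock_to_add(final_mm: float, stock_mm: float, current_ul: float) -> float:
    """Volume of stock that brings a reagent-free solution to `final_mm`.

    Solves stock_mm * v = final_mm * (current_ul + v) for v.
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * current_ul / (stock_mm - final_mm)


def digestion_plan(lysate_ul: float, digest_ul: float, protein_ug: float = 200.0,
                   dtt_stock_mm: float = 1000.0,
                   iaa_stock_mm: float = 500.0) -> DigestionPlan:
    dtt_ul = stock_to_add(20.0, dtt_stock_mm, digest_ul)
    # IAA is added after DTT, so the volume includes the DTT addition
    # (the slight dilution of DTT by the IAA addition is ignored).
    iaa_ul = stock_to_add(40.0, iaa_stock_mm, digest_ul + dtt_ul)
    return DigestionPlan(
        precipitation_ul=5.0 * lysate_ul,  # sample:precipitation = 1:5
        dtt_stock_ul=dtt_ul,
        iaa_stock_ul=iaa_ul,
        trypsin_ug=protein_ug / 30.0,
    )


if __name__ == "__main__":
    # Hypothetical volumes: 20 ul of lysate (about 200 ug at ~10 ug/ul,
    # Note 3), redissolved in 20 ul of guanidine buffer.
    print(digestion_plan(lysate_ul=20.0, digest_ul=20.0))
```

With 200 μg of protein, the 1:30 ratio corresponds to about 6.7 μg of trypsin.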

3.4. Cleavable Isotope-Coded Affinity Tag Labeling of Proteins 

1. Reduce 100 μg each of solubilized HCC and non-HCC proteins prepared by LCM with TBP (final concentration 5 mM) (see Note 8).

2. Transfer the reduced HCC and non-HCC proteins into vials containing cICAT light and heavy reagent, respectively, and mix. After a brief centrifugation, incubate for 2 h at 37°C in the dark.

3. Combine the labeled proteins in one tube. After desalting with YM3 ultrafiltration membranes, incubate the protein mixture with trypsin (trypsin:protein = 1:30, w/w; Promega, Madison, WI) at 37°C for 16 h (see Note 7).

4. Purify the cICAT-labeled peptides from the tryptic digest on an Avidin cartridge (Applied Biosystems) according to the manufacturer's protocol. In brief, activate the Avidin cartridge with 2 ml of affinity buffer–elute and 2 ml of affinity buffer–load. Slowly inject (∼1 drop/5 s) the peptide sample onto the cartridge. Wash the cartridge with 500 μl of affinity buffer–load, 1 ml of affinity buffer–wash 1, 1 ml of affinity buffer–wash 2, and 1 ml of Milli-Q water. To elute the labeled peptides, slowly inject (∼1 drop/5 s) affinity buffer–elute and collect the eluate. Dry the eluate by lyophilization.

5. Dissolve the dried cICAT-labeled peptides in the cleaving reagents and cleave for 2 h at 37°C. Concentrate the cleaved peptides by lyophilization.
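HCC proteins carry the light tag and non-HCC proteins the heavy tag (step 2). Each labeled peptide therefore appears in the MS scan as a light/heavy pair, and the intensity ratio of the pair estimates HCC/non-HCC abundance. The sketch below illustrates this pairing logic only. It assumes the cleaved-tag mass shifts listed in the Unimod database for ICAT-C and ICAT-C:13C(9), which are about 9.03 Da apart per labeled cysteine. It uses single peak intensities instead of the chromatographic peak areas that quantification software such as Xpress uses, and the toy spectrum is invented.

```python
# Monoisotopic mass shifts per labeled cysteine after acid cleavage, as listed
# in Unimod (ICAT-C and ICAT-C:13C(9)); treat these as values to verify.
CICAT_LIGHT_DA = 227.126991
CICAT_HEAVY_DA = 236.157185
DELTA_DA = CICAT_HEAVY_DA - CICAT_LIGHT_DA  # about 9.03 Da per cysteine


def expected_heavy_mz(light_mz: float, charge: int, n_cys: int) -> float:
    """m/z of the heavy partner of a light-labeled peptide ion."""
    return light_mz + DELTA_DA * n_cys / charge


def find_pairs(peaks, charge, n_cys, tol=0.02):
    """Match light/heavy partners in a list of (m/z, intensity) peaks.

    All peaks are assumed to share `charge`. Because HCC proteins carry the
    light tag and non-HCC proteins the heavy tag, light/heavy = HCC/non-HCC.
    """
    pairs = []
    for light_mz, light_int in peaks:
        target = expected_heavy_mz(light_mz, charge, n_cys)
        for heavy_mz, heavy_int in peaks:
            if abs(heavy_mz - target) <= tol and heavy_int > 0:
                pairs.append((light_mz, heavy_mz, light_int / heavy_int))
    return pairs


if __name__ == "__main__":
    # Toy spectrum: one doubly charged, single-cysteine pair plus an unrelated peak.
    toy = [(800.0, 3.0e5), (800.0 + DELTA_DA / 2, 1.0e5), (950.0, 5.0e4)]
    for light, heavy, ratio in find_pairs(toy, charge=2, n_cys=1):
        print(f"{light:.4f} / {heavy:.4f}  HCC/non-HCC = {ratio:.2f}")
```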

3.5. One-Dimensional and Two-Dimensional Liquid Chromatography Coupled with Tandem Mass Spectrometry (1D- and 2D-LC-MS/MS)

1. All 2D HPLC separations are performed on a ProteomeX workstation (Thermo Finnigan Corp., San Jose, CA) equipped with two LC pumps. The salt and analytical pumps both run at 200 μl/min, which is reduced to about 2 μl/min after the split. The strong cation exchange (SCX) column has a 300-μm inner diameter (SCX resin, 5 μm), and the reversed-phase (RPC) column has a 150-μm inner diameter (C18 resin, 300 Å, 5 μm) (see Note 9).

2. A nine-step salt gradient of 0, 25, 50, 75, 100, 150, 200, 400, and 800 mM ammonium chloride is used for SCX elution.

3. The reversed-phase mobile phases are A, 0.1% formic acid in water (pH 3.0), and B, 0.1% formic acid in acetonitrile.

4. Load about 200 μg of peptides digested from the LCM protein onto the SCX column with the autosampler, and elute with the salt steps described in step 2. Load the peptides eluted at each salt step onto the RPC column, and wash the RPC column with 20 column volumes of 95% mobile phase A. Finally, separate the peptides with a 100-min linear gradient from 5 to 80% mobile phase B. Eluting peptides enter the LCQ ProteomeX mass spectrometer (Thermo Electron, San Jose, CA) through a metal needle (see Note 10).

5. The 1D HPLC separation uses the same system/experimental steps, but without 

the use of a strong cation exchange column. 

6. An electrospray (ESI) ion-trap mass spectrometer (LCQ Deca XP, Thermo 

Finnigan, San Jose, CA) is used for peptide detection. 

7. The positive ion mode is employed and the spray voltage is set at 3.2 kV. The 

spray temperature is set at 150°C for peptides. 

8. The collision energy is set automatically by the LCQ Deca XP. After each full-scan mass spectrum, MS/MS spectra are acquired for the three most intense ions, with dynamic exclusion enabled.


9. Peptides and proteins are identified with TurboSequest® (Thermo Finnigan, San Jose, CA), which searches the MS and MS/MS spectra of peptide ions against the publicly available NCBI nonredundant protein database (www.ncbi.nlm.nih.gov).

10. The identification criteria are Delta CN ≥ 0.1 and Xcorr ≥ 1.8, ≥ 2.2, and ≥ 3.7 for singly, doubly, and triply charged peptides, respectively. Table 1 summarizes the results (see Note 11).

11. For quantitative analysis with cICAT and 2D-LC-MS/MS, database searching and quantification with Xpress (TurboSequest® software) are followed by manual checking. Quantitative results for 261 proteins from LCM-ICAT-2D-LC-MS/MS are shown in Fig. 3. In total, 149 proteins with at least twofold quantitative differences between HCC and non-HCC hepatocytes were detected. Of these, 55 were upregulated in HCC hepatocytes (32 by 2- to 5-fold, 13 by 5- to 10-fold, and 10 by >10-fold), and 94 were downregulated (62 by 2- to 5-fold, 17 by 5- to 10-fold, and 15 by >10-fold).
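The fold-change categories in step 11 and Fig. 3 can be reproduced from a list of HCC/non-HCC ratios. This minimal sketch follows the Fig. 3 legend: a ratio r counts as upregulated when r ≥ 2 and as downregulated when 1/r ≥ 2, with 5 and 10 as the upper bin edges. The example ratios are invented, and the reported counts are taken from the text.

```python
from collections import Counter


def fold_category(ratio_hcc_over_non: float) -> str:
    """Assign an HCC/non-HCC ratio to one of the seven Fig. 3 categories."""
    if ratio_hcc_over_non <= 0:
        raise ValueError("ratio must be positive")
    if ratio_hcc_over_non >= 2:
        fold, direction = ratio_hcc_over_non, "up"
    elif 1 / ratio_hcc_over_non >= 2:
        fold, direction = 1 / ratio_hcc_over_non, "down"
    else:
        return "unchanged (<2-fold)"
    if fold <= 5:
        return f"{direction} 2-5"
    if fold <= 10:
        return f"{direction} >5-10"
    return f"{direction} >10"


def summarize(ratios):
    """Count how many ratios fall into each category."""
    return Counter(fold_category(r) for r in ratios)


# Counts reported in the text and Fig. 3 for the 261 quantified proteins.
REPORTED = {"up 2-5": 32, "up >5-10": 13, "up >10": 10,
            "down 2-5": 62, "down >5-10": 17, "down >10": 15,
            "unchanged (<2-fold)": 112}
assert sum(REPORTED.values()) == 261

if __name__ == "__main__":
    print(summarize([2.0, 5.0, 5.1, 11.0, 0.5, 0.19, 0.05, 1.5, 0.6]))
```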

3.6. Bioinformatics Analysis 

1. The pI and Mr of the proteins are analyzed using ExPASy proteomics tools 

accessed from http://cn.expasy.org/tools/#proteome. Examples of the results 

produced are shown in Table 1 and Fig. 5A and 5B. 

[Fig. 3 (pie chart), categories and protein counts: 2 ≤ Ratio(HCC/non-HCC) ≤ 5, 32; 5 < Ratio(HCC/non-HCC) ≤ 10, 13; Ratio(HCC/non-HCC) > 10, 10; 2 ≤ Ratio(non-HCC/HCC) ≤ 5, 62; 5 < Ratio(non-HCC/HCC) ≤ 10, 17; Ratio(non-HCC/HCC) > 10, 15; Ratio(HCC/non-HCC or non-HCC/HCC) < 2, 112]

Fig. 3. Quantitative analysis results of 261 proteins from LCM-ICAT-2D-LC-MS/MS. A total of 149 differentially expressed proteins with at least twofold quantitative alterations between HCC and non-HCC hepatocytes were detected, including 55 upregulated proteins (32 with 2- to 5-fold, 13 with 5- to 10-fold, and 10 with >10-fold changes) and 94 downregulated proteins in HCC hepatocytes (62 with 2- to 5-fold, 17 with 5- to 10-fold, and 15 with >10-fold changes). Reprinted with permission from (34).


Table 1

Summary of Total Proteins Identified in HCC-NESP-1D-LC-MS/MS, HCC-NESP-2D-LC-MS/MS, and HCC-LCM-2D-LC-MS/MS

Measure                               HCC-NESP-1D-LC-MS/MS   HCC-NESP-2D-LC-MS/MS   HCC-LCM-2D-LC-MS/MS
Protein quantity                      200 μg                 200 μg                 200 μg
Total proteins identified             208                    626                    644
Hydrophobic proteins                  25 (12.0%)             64 (10.2%)             80 (12.4%)
Trans-membrane proteins               8 (3.9%)               30 (4.8%)              54 (8.4%)
Proteins with Mr >100 kDa or <10 kDa  19 (9.1%)              77 (12.3%)             75 (11.6%)
Proteins with pI >9                   21 (10.1%)             78 (12.5%)             126 (19.6%)

2. The grand average of hydropathicity (GRAVY) score is calculated as the sum of the hydropathy values of all amino acids in a protein divided by the number of residues (32). Examples of the results are shown in Table 1 and Fig. 5C.

3. Transmembrane regions are predicted with the TMHMM 2.0 server of the Center for Biological Sequence Analysis (CBS; http://www.cbs.dtu.dk/services/TMHMM/). Examples of the results are shown in Table 1 and Fig. 5D.

4. All identified proteins are classified by their molecular function, cellular 

component, and biological process with the tools on http://www.geneontology.org. 

An example of the results produced is shown in Fig. 4. 
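The GRAVY definition in step 2 is simple enough to compute directly. This sketch applies the Kyte-Doolittle scale (32) to a one-letter sequence. It is meant to illustrate the definition, not to replace the ExPASy tools, and it rejects non-standard residue codes rather than guessing values for them.

```python
# Kyte-Doolittle hydropathy values (ref. 32), keyed by one-letter code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    print(round(gravy("AC"), 3))
```

On this scale, a positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one.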

4. Notes 

1. Glutamine-free RPMI 1640 medium must be cold (4°C) before use. Wash the tissue as quickly as possible, but continue until no contaminants (blood, etc.) remain on it. Glutamine-free RPMI 1640 medium can be replaced by PBS (pH 7.4), 0.9% NaCl, or any other isotonic buffer.

2. Store the lysis buffer in small aliquots at –80°C to avoid repeated freeze-thaw cycles. Dissolve the protease inhibitor tablets (Roche Molecular Biochemicals) in the lysis buffer.

3. Store the samples in small aliquots at –80°C to avoid repeated freeze-thaw cycles. Protein concentrations of the samples should be about 10 μg/μl for subsequent experiments.

4. Stain the sections with toluidine blue only lightly, just enough to distinguish hepatocytes during microdissection. Excess stain can interfere with subsequent steps.

5. To reduce microdissection time, the operator can either capture the hepatocytes directly or remove the other cells, depending on the condition of each section.

Fig. 4. Classification of differentially expressed proteins obtained by LCM-ICAT-2D-LC-MS/MS. (A) Proteins with at least twofold increased expression levels in HCC hepatocytes. (B) Proteins with at least twofold decreased expression levels in HCC hepatocytes. Reprinted with permission from (34).

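6. The precipitation solution, acetone, and ethanol must be prechilled to –20°C before use.

7. Ultrafiltration is essential for removing excess salt, stain, and other impurities that would otherwise compromise subsequent steps.

8. TBP is used as the reducing agent for the cICAT labeling reaction. It is a much stronger but more toxic reducing agent than DTT. Unlike DTT, it has no thiol groups that would compete with protein cysteines for the thiol-reactive cICAT reagent.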


[Fig. 5: four bar-chart panels; y-axis, protein number.]

Fig. 5. Characteristics of differentially expressed proteins obtained by LCM-ICAT-2D-LC-MS/MS. (A) Mr distribution; (B) pI distribution; (C) hydrophilicity and hydrophobicity distribution; (D) transmembrane proteins, by number of transmembrane regions. Reprinted with permission from (34).

9. The LCQ ProteomeX Workstation (Thermo Electron, San Jose, CA) is an automated 2D LC/MS system suitable for high-throughput proteomic research. Other equipment can be used instead with offline SCX fractionation. The offline procedure is almost the same as the online one, except that the peptides eluted at each salt step must be loaded manually onto the RPC column.

10. If the nanospray kit is fitted to the mass spectrometer and a 75-μm inner diameter RPC column is used, the eluted peptides can enter the mass spectrometer directly. Sensitivity is higher in nanospray mode than in metal needle mode.

11. The protein identification criteria can vary with the type of mass spectrometer and other analytical needs. For example, with the LTQ linear ion trap mass spectrometer (Thermo Finnigan, San Jose, CA), we use Delta CN ≥ 0.1 and Xcorr ≥ 1.9, ≥ 2.2, and ≥ 3.75 for singly, doubly, and triply charged peptides, respectively.
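The identification thresholds depend on the instrument (Subheading 3.5, step 10, and Note 11), so it can help to keep them in one table and filter search results against it. The sketch below does this for peptide-level matches. The `Match` record is a hypothetical stand-in for whatever the search software exports. Treating charge states above 3+ as failing is an assumption, because the stated criteria do not cover them.

```python
from dataclasses import dataclass

# Thresholds from Subheading 3.5, step 10 (LCQ Deca XP) and Note 11 (LTQ).
CRITERIA = {
    "LCQ Deca XP": {"delta_cn": 0.1, "xcorr": {1: 1.8, 2: 2.2, 3: 3.7}},
    "LTQ": {"delta_cn": 0.1, "xcorr": {1: 1.9, 2: 2.2, 3: 3.75}},
}


@dataclass
class Match:
    """One peptide-spectrum match (hypothetical export record)."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float


def passes(match: Match, instrument: str) -> bool:
    """True if the match meets the Xcorr and Delta CN thresholds for the instrument."""
    rules = CRITERIA[instrument]
    threshold = rules["xcorr"].get(match.charge)
    if threshold is None:
        # The stated criteria cover only 1+, 2+, and 3+ ions.
        return False
    return match.xcorr >= threshold and match.delta_cn >= rules["delta_cn"]


if __name__ == "__main__":
    m = Match("HYPOTHETICALK", charge=1, xcorr=1.85, delta_cn=0.12)
    print(passes(m, "LCQ Deca XP"), passes(m, "LTQ"))
```

A singly charged match with Xcorr 1.85 passes the LCQ criteria but not the stricter LTQ criteria.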


This work was supported by National High-Technology Project 

(2001AA233031, 2002BA711A11) and Basic Research Foundation 

(2001CB210501).


References 

1. Feitelson M.A., Sun B., Satiroglu Tufan N.L., Liu J., Pan J. and Lian Z. (2002) 

Genetic mechanisms of hepatocarcinogenesis. Oncogene 21, 2593–2604. 

2. Fujiyama S., Tanaka M., Maeda S., Ashihara H., Hirata R. and Tomita K. (2002) 

Tumor markers in early diagnosis, follow-up and management of patients with 

hepatocellular carcinoma. Oncology 62(Suppl 1), 57–63. 

3. Qin L.X. and Tang Z.Y. (2002) The prognostic molecular markers in hepatocellular 

carcinoma. World J Gastroenterol 8, 385–392. 

4. Park K.S., Cho S.Y., Kim H. and Paik Y.K. (2002) Proteomic alterations of the 


carcinoma. Int J Cancer 97, 261–265. 

5. Park K.S., Kim H., Kim N.G., Cho S.Y., Choi K.H., Seong J.K. and Paik Y.K. 

(2002) Proteomic analysis and molecular characterization of tissue ferritin light 

chain in hepatocellular carcinoma. Hepatology 35, 1459–1466. 

6. Cho S.Y., Park K.S., Shim J.E., Kwon M.S., Joo K.H., Lee W.S., Chang J., 

Kim H., Chung H.C., Kim H.O. and Paik Y.K. (2002) An integrated proteome 

database for two-dimensional electrophoresis data analysis and laboratory information 

management system. Proteomics 2, 1104–1113. 

7. Lim S.O., Park S.J., Kim W., Park S.G., Kim H.J., Kim Y.I., Sohn T.S., Noh J.H. 

and Jung G. (2002) Proteome analysis of hepatocellular carcinoma. Biochem 

Biophys Res Commun 291, 1031–1037. 

8. Kim J., Kim S.H., Lee S.U., Ha G.H., Kang D.G., Ha N.Y., Ahn J.S., Cho H.Y., Kang S.J., Lee Y.J., Hong S.C., Ha W.S., Bae J.M., Lee C.W. and Kim J.W. (2002) Proteome analysis of human liver tumor tissue by two-dimensional gel electrophoresis and matrix assisted laser desorption/ionization-mass spectrometry for identification of disease-related proteins. Electrophoresis 23, 4142–4156.

9. Franzen B., Hirano T., Okuzawa K., Uryu K., Alaiya A.A., Linder S. and 

Auer G. (1995) Sample preparation of human tumors prior to two-dimensional 

electrophoresis of proteins. Electrophoresis 16, 1087–1089. 

10. Emmert-Buck M.R., Bonner R.F., Smith P.D., Chuaqui R.F., Zhuang Z., 

Goldstein S.R., Weiss R.A. and Liotta L.A. (1996) Laser capture microdissection. 

Science 274, 998–1001. 

11. Bonner R.F., Emmert-Buck M., Cole K., Pohida T., Chuaqui R., Goldstein S. and 

Liotta L.A. (1997) Laser capture microdissection: molecular analysis of tissue. 

Science 278, 1481–1483. 

12. Ornstein D.K., Gillespie J.W., Paweletz C.P., Duray P.H., Herring J., Vocke 

C.D., Topalian S.L., Bostwick D.G., Linehan W.M., Petricoin E.F., III and 

Emmert-Buck M.R. (2000) Proteomic analysis of laser capture microdissected 

human prostate cancer and in vitro prostate cell lines. Electrophoresis 21, 

2235–2242. 

13. Jones M.B., Krutzsch H., Shu H., Zhao Y., Liotta L.A., Kohn E.C. and 

Petricoin E.F., III (2002) Proteomic analysis and identification of new biomarkers 

and therapeutic targets for invasive ovarian cancer. Proteomics 2, 76–84.


14. Simone N.L., Remaley A.T., Charboneau L., Petricoin E.F., III, Glickman J.W., 

Emmert-Buck M.R., Fleisher T.A. and Liotta L.A. (2000) Sensitive immunoassay 

of tissue cell proteins procured by laser capture microdissection. Am J Pathol 156, 

445–452. 

15. Ornstein D.K., Englert C., Gillespie J.W., Paweletz C.P., Linehan W.M., Emmert-Buck M.R. and Petricoin E.F., III (2000) Characterization of intracellular prostate-specific antigen from laser capture microdissected benign and malignant prostatic epithelium. Clin Cancer Res 6, 353–356.

16. Sauter E.R., Zhu W., Fan X.J., Wassell R.P., Chervoneva I. and Du Bois G.C. 

(2002) Proteomic analysis of nipple aspirate fluid to detect biologic markers of 

breast cancer. Br J Cancer 86, 1440–1443. 

17. Verma M., Wright G.L., Jr., Hanash S.M., Gopal-Srivastava R. and Srivastava 

S. (2001) Proteomic approaches within the NCI early detection research network 

for the discovery and identification of cancer biomarkers. Ann N Y Acad Sci 945, 

103–115. 

18. Jain K.K. (2002) Recent advances in oncoproteomics. Curr Opin Mol Ther 4, 

203–209. 

19. Wright G.L., Jr., Cazares L.H., Leung S.M., Nasim S., Adam B.L., Yip T.T., Schellhammer P.F., Gong L. and Vlahou A. (1999) ProteinChip® surface enhanced laser desorption/ionization (SELDI) mass spectrometry: a novel protein biochip technology for detection of prostate cancer biomarkers in complex protein mixtures. Prostate Cancer Prostatic Dis 2, 264–276.

20. Batorfi J., Ye B., Mok S.C., Cseh I., Berkowitz R.S. and Fulop V. (2003) Protein 

profiling of complete mole and normal placenta using ProteinChip analysis on 

laser capture microdissected cells. Gynecol Oncol 88, 424–428. 

21. Wulfkuhle J.D., Paweletz C.P., Steeg P.S., Petricoin E.F., III and Liotta L. (2003) 

Proteomic approaches to the diagnosis, treatment, and monitoring of cancer. Adv 

Exp Med Biol 532, 59–68. 

22. Seow T.K., Liang R.C., Leow C.K. and Chung M.C. (2001) Hepatocellular 

carcinoma: from bedside to proteomics. Proteomics 1, 1249–1263. 

23. Yu L.R., Shao X.X., Jiang W.L., Xu D., Chang Y.C., Xu Y.H. and Xia Q.C. (2001) 

Proteome alterations in human hepatoma cells transfected with antisense epidermal 

growth factor receptor sequence. Electrophoresis 22, 3001–3008. 

24. Yu L.R., Zeng R., Shao X.X., Wang N., Xu Y.H. and Xia Q.C. (2000) Identification 

of differentially expressed proteins between human hepatoma and normal liver cell 

lines by two-dimensional electrophoresis and liquid chromatography-ion trap mass 

spectrometry. Electrophoresis 21, 3058–3068. 

25. Ding S.J., Li Y., Tan Y.X., Jiang M.R., Tian B., Liu Y.K., Shao X.X., Ye S.L., 

Wu J.R., Zeng R., Wang H.Y., Tang Z.Y. and Xia Q.C. (2004) From proteomic 

analysis to clinical significance: overexpression of cytokeratin 19 correlates with 

hepatocellular carcinoma metastasis. Mol Cell Proteomics 3(1), 73–81. 

26. Gygi S.P., Rist B., Gerber S.A., Turecek F., Gelb M.H. and Aebersold R. (1999) 

Quantitative analysis of complex protein mixtures using isotope-coded affinity 

tags. Nat Biotechnol 17, 994–999.


27. Li J., Steen H. and Gygi S.P. (2003) Protein profiling with cleavable isotope-coded affinity tag (cICAT) reagents: the yeast salinity stress response. Mol Cell Proteomics 2(11), 1198–1204.

28. Oda Y., Owa T., Sato T., Boucher B., Daniels S., Yamanaka H., Shinohara Y., 

Yokoi A., Kuromitsu J. and Nagasu T. (2003) Quantitative chemical proteomics 

for identifying candidate drug targets. Anal Chem 75, 2159–2165. 

29. Hansen K.C., Schmitt-Ulms G., Chalkley R.J., Hirsch J., Baldwin M.A. and 

Burlingame A.L. (2003) Mass spectrometric analysis of protein mixtures at 

low levels using cleavable 13C-isotope-coded affinity tag and multidimensional 

chromatography. Mol Cell Proteomics 2, 299–314. 

30. Washburn M.P., Wolters D. and Yates J.R., III (2001) Large-scale analysis of 

the yeast proteome by multidimensional protein identification technology. Nat 

Biotechnol 19, 242–247. 

31. Gygi S.P., Corthals G.L., Zhang Y., Rochon Y. and Aebersold R. (2000) Evaluation 

of two-dimensional gel electrophoresis-based proteome analysis technology. Proc 

Natl Acad Sci USA 97, 9390–9395. 

32. Kyte J. and Doolittle R.F. (1982) A simple method for displaying the hydropathic 

character of a protein. J Mol Biol 157, 105–132. 

33. Craven R.A., Totty N., Harnden P., Selby P.J. and Banks R.E. (2002) Laser 

capture microdissection and two-dimensional polyacrylamide gel electrophoresis: 

evaluation of tissue preparation and sample limitations. Am J Pathol 160, 815–822. 

34. Li C., Hong Y., Tan Y.X., Zhou H., Ai J.H., Li S.J., Zhang L., Xia Q.C., Wu J.R., 

Wang Y. and Zeng R. (2004) Accurate qualitative and quantitative proteomic 

analysis of clinical hepatocellular carcinoma using laser capture microdissection 

coupled with isotope-coded affinity tag and two-dimensional liquid chromatography 

mass spectrometry. Mol Cell Proteomics 3(4), 399–409.

12 

Label-Free LC-MS Method for the Identification 

of Biomarkers 

Richard E. Higgs, Michael D. Knierman, Valentina Gelfanova, 

Jon P. Butler, and John E. Hale 

Summary 

Pharmaceutical companies and regulatory agencies are pursuing biomarkers as a means 

to increase the productivity of drug development. Quantifying differential levels of proteins 

from complex biological samples like plasma or cerebrospinal fluid is one specific 

approach being used to identify markers of drug action, efficacy, toxicity, etc. Academic 

investigators are also interested in markers that are diagnostic or prognostic of disease 

states. We report a comprehensive, fully automated, and label-free approach to relative 

protein quantification, including sample preparation, proteolytic protein digestion,

LC-MS/MS data acquisition, de-noising, mass and charge state estimation, chromatographic

alignment, and peptide quantification via integration of extracted ion chromatograms. 

Additionally, we describe methods for transformation and normalization of the quantitative 

peptide levels in multiplexed measurements to improve precision for statistical analysis. 

Lastly, we outline how the described methods can be used to design and power biomarker 

discovery studies. 

Key Words: relative quantification; label-free quantification; biomarkers; 

proteomics; LC-MS/MS. 

1. Introduction


Recent advances in analytical technology, particularly mass spectrometry, 

are finding broad applications in the search for biomarkers. Biomarkers may 

be defined as indicators of biological processes and encompass a variety of 

measures including imaging, polynucleotides, proteins, and small molecule

metabolites, among others. These new biomarker discovery activities are 

motivated by the need to improve diagnosis, guide targeted therapies, and

monitor therapeutic efficacy and toxicity throughout a treatment regimen. 

Biomarkers of drug efficacy or toxicity have the potential to shorten the drug 

development timeline as they may provide early indications of a drug’s activity. 

This potential for increased drug development productivity from high-quality 

biomarkers has fueled increased attention from pharmaceutical and biotechnology companies

and regulatory agencies alike (1,2). Within the field of protein biomarkers,

mass spectrometry is playing a central role in the discovery of biomarkers from 

various biological sample matrices. Quantification of small organic molecules 

using extracted ion chromatograms (XICs) from liquid chromatography mass 

spectrometry (LC-MS) experiments has a long history in analytical chemistry. 

Similar techniques using LC-MS experiments with proteolytic protein digests 

are now routinely being applied to quantify peptide and protein levels in 

biological samples. Early LC-MS peptide quantification methods relied on the 

modification of peptides with reagents enriched in stable isotopes to introduce 

mass shifts in the peptides from one sample in order to compare relative 

peptide levels to another unlabeled sample (3,4). The number of biological

samples required for statistical power in many applications, the restriction that 

study samples must be paired or pooled for these label-based methods, and the 

increased cost due to specialized reagents have limited their application and 

motivated the search for label-free methods of non-targeted protein profiling. 

We report here a comprehensive analytical system to collect and automatically 

process the data from non-targeted LC-MS/MS analyses of complex 

protein mixtures. In contrast to pattern-based (5,6), difference-based (7), or

identification-based quantification methods (8,9), the approach presented here 

simply integrates the peptide parent ion current in order to obtain a relative 

peptide level in each study sample. No labeling or pooling of study samples 

is required. The output from this approach is an N × P table in which each 

of P peptides has been quantified in each of the N study samples. This table 

maximizes flexibility in the downstream statistical analysis, including transformation,

normalization, and an analysis suited to the experimental design.

The described method is based on the collective efforts of the applied biochemistry 

and statistics groups within Lilly Research Laboratories (10,11,12). Because this

is a broad, discovery-oriented assay, it is important to note the limitations

imposed by the approach. An assay designed to detect and quantify many 

analytes simultaneously compromises on sensitivity, selectivity, dynamic range, 

and absolute quantification relative to a targeted assay designed for a particular 

analyte. Ion suppression and co-elution of peptides from complex mixtures 

have the potential to interfere with the ion current attributed to a peptide, thus 

confounding any inference that may be made about the relative quantities of

the peptide. The limited dynamic range of these uncalibrated assays tends to 

underestimate the magnitude of a change in protein levels for peptides that do 

not lie near the linear portion of the instrument response curve. Nonetheless, 

these non-targeted methods have shown promise in identifying relative changes 

in protein levels that can be followed in subsequent studies using more targeted 

assays (e.g., multiple reaction monitoring) (13) to verify the findings in a new 

sample set. 

The described method focuses on biomarker discovery from human plasma 

and cerebrospinal fluid (CSF). Biomarker discovery from these fluids has 

proven challenging as the highly abundant proteins (e.g., albumin, IgG) are 

difficult to completely remove and tend to mask the detection of lower 

abundance proteins that may be directly associated with the biology of interest. 

However, the analytical and statistical methods described here are directly applicable to more targeted sample matrices (e.g., tissues) from both clinical and preclinical models. Such samples are more directly associated with the biology of interest and contain fewer abundant, masking proteins to remove, which may increase the probability of technical success. Sample collection and handling procedures

are critical in reducing the overall variability in biomarker discovery 

studies. Age, gender, diet, time of day, and medication may affect the plasma 

or CSF protein profile and should be considered in study designs. Similarly, 

consistent sample handling tailored to proteomics profiling (e.g., preservatives, 

rapid sample freezing, controlling for blood contamination in CSF sampling, 

number of sample freeze-thaw cycles, etc.) is an important consideration for ensuring high-quality starting material. The proteome is arguably the most modulated class of biomolecules in disease, treatment, and toxicity, which underlies the promise of proteomics for biomarker discovery. Despite this promise and rapid advancements in technology, progress has been slow (14,15). However, the potential to fulfill this promise is high with a refined strategy that applies the right technology at the appropriate stage of the biomarker discovery life cycle (16): (1) applying non-targeted, hypothesis-generating methods like those described here to sample matrices proximal to the biology, (2) using targeted MS assays to verify early discoveries in new sample sets, and (3) performing clinical validation using established diagnostic assay formats (e.g., ELISAs).

2. Materials 

2.1. Albumin/IgG Depletion 

1. Montage equilibration buffer, wash buffer, and columns are provided with the 

Montage Albumin Deplete Kit (Millipore®).

2. Protein G-Sepharose (Amersham Biosciences®).


2.2. Reduction, Alkylation, and Digestion 

1. Denaturing solution and internal standard: 8 M urea in 100 mM (NH₄)₂CO₃ buffer containing chicken lysozyme (Sigma, St Louis, MO; 10.4 μg/mL), pH 11.0.

2. Reduction/alkylation cocktail: 97.5% ACN, 2% iodoethanol, and 0.5% 

triethylphosphine (v/v). 

3. Trypsin solution: TPCK-treated bovine pancreatic trypsin (Worthington, Lakewood, NJ) is dissolved at 1 mg/mL in H₂O and stored in single-use aliquots at –80°C. Working solutions are prepared by diluting to 5 μg/mL in 100 mM ammonium bicarbonate, pH 8.0, prior to use.

2.3. HPLC 

1. The C-18 reversed-phase column is a Zorbax SB300, 1 × 50 mm (Agilent).

2. Solvent A: 0.1% formic acid (Aldrich) in water (Burdick and Jackson HPLC 

grade). 

3. Solvent B: 50% acetonitrile, 0.1% formic acid (Aldrich) in water (Burdick and 

Jackson HPLC grade). 

4. Solvent C: 80% acetonitrile, 0.1% formic acid (Aldrich) in water (Burdick and 

Jackson HPLC grade). 

2.4. Mass Spectrometry 

1. LTQ ion trap mass spectrometer (ThermoFinnigan). 

3. Methods 

3.1. Plasma Sample Preparation 

3.1.1. Albumin/IgG Depletion 

1. Dilute a 25 μL aliquot of plasma (1.25 mg protein assuming 50 mg/mL total 

protein concentration) with Montage equilibration buffer to a volume of 200 μL 

(see Note 1). 

2. Add 100 μL of a 50% protein G-Sepharose bead suspension and rock the mixture for 1 h at RT.

3. Pellet the protein G-Sepharose beads at 2000 rpm for 2 min and transfer 200 μL of the effluent to a pre-equilibrated Montage column. Pre-equilibration is performed with 400 μL of equilibration buffer and centrifugation for 2 min at 500 × g (see Note 2).

4. Centrifuge the Montage column at 500 × g for 2 min, re-apply the flow-through to the column, and centrifuge again. Pass two consecutive 200 μL washes of Montage wash buffer over the column by centrifugation at 500 × g for 2 min (final volume approximately 600 μL).


3.1.2. Reduction, Alkylation, and Digestion 

1. Spike a 120 μL aliquot of the diluted and depleted plasma with 120 μL of the 

denaturing and internal standard solution (see Note 3). 

2. Add an equal volume (240 μL) of reduction/alkylation cocktail (see Note 4).

3. Cap the solutions and incubate for 1 h at 37°C.

4. Speed vacuum the solutions to dryness (at least 3 h). 

5. Re-dissolve the pellet in 600 μL of the working trypsin solution. Digest overnight 

at 37°C (17). 

3.2. Cerebrospinal Fluid Sample Preparation 

3.2.1. Albumin/IgG Depletion 

1. Dilute an aliquot of CSF (34 μg protein based on a Bradford total protein assay) 

with Montage equilibration buffer to a volume of 200 μL (see Note 5). 

2. Add 100 μL of a 50% protein G-Sepharose bead suspension and rock the mixture for 1 h at RT.

3. Pellet the protein G-Sepharose beads at 2000 rpm for 2 min and transfer 200 μL of the effluent to a pre-equilibrated Montage column. Pre-equilibration is performed with 400 μL of equilibration buffer and centrifugation for 2 min at 500 × g (see Note 2).

4. Centrifuge the Montage column at 500 × g for 2 min, re-apply the flow-through to the column, and centrifuge again. Pass two consecutive 200 μL washes of Montage wash buffer over the column by centrifugation at 500 × g for 2 min (final volume approximately 600 μL).

3.2.2. Reduction, Alkylation, and Digestion 

1. Speed vacuum the CSF samples to approximately 30–50 μL and mix with 40 μL 

of the denaturing and internal standard solution (see Note 3). 

2. Add 100 μL of reduction/alkylation cocktail (see Note 4). 

3. Cap the solutions and incubate for 1 h at 37°C.

4. Speed vacuum the solutions to dryness (at least 3 h). 

5. Re-dissolve the pellet in 600 μL of the working trypsin solution. Digest overnight 

at 37°C (17). 

3.3. HPLC Conditions 

1. A Surveyor autosampler and MS HPLC pump (ThermoFinnigan) are used for separation. Inject 100 μL of tryptic digest (4.2 μg plasma or 14 μg CSF non-depleted equivalent protein) onto the reversed-phase column at a flow rate of 50 μL/min (see Note 6). The gradient conditions are: 10–95% B (90–5% A) over 120 min, followed by a 0.1 min ramp to 100% C, 5 min at 100% C, a 0.1 min ramp back to 10% B (90% A), and a 17 min hold at 10% B (90% A). The effluent is diverted to waste for the first 2 min to keep the mass spectrometer source clean.

2. Between samples in the set, an injection of water is made and a shortened (60 min) gradient, otherwise identical to the above, is run to reduce carryover.
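The solvent program in step 1 can be written as a breakpoint table, which makes the total run time (142.2 min) easy to check and the program easy to re-enter on another pump. The sketch below is an illustrative transcription, not an instrument method file: the `GRADIENT` tuple layout and the `composition_at` helper are hypothetical, and linear changes between breakpoints are assumed.

```python
# Gradient from step 1 as (time_min, %A, %B, %C) breakpoints; composition is
# assumed to change linearly between consecutive breakpoints.
GRADIENT = [
    (0.0, 90.0, 10.0, 0.0),    # start at 10% B (90% A)
    (120.0, 5.0, 95.0, 0.0),   # 120 min linear ramp to 95% B (5% A)
    (120.1, 0.0, 0.0, 100.0),  # 0.1 min ramp to 100% C
    (125.1, 0.0, 0.0, 100.0),  # 5 min at 100% C
    (125.2, 90.0, 10.0, 0.0),  # 0.1 min ramp back to 10% B (90% A)
    (142.2, 90.0, 10.0, 0.0),  # 17 min hold at 10% B
]
DIVERT_TO_WASTE_MIN = 2.0  # effluent goes to waste for the first 2 min


def composition_at(t):
    """Return (%A, %B, %C) at time t (min) by linear interpolation."""
    if t <= GRADIENT[0][0]:
        return GRADIENT[0][1:]
    for (t0, *c0), (t1, *c1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            f = (t - t0) / (t1 - t0)
            return tuple(a + f * (b - a) for a, b in zip(c0, c1))
    return GRADIENT[-1][1:]
```

For example, `composition_at(60.0)` gives the midpoint of the 120 min ramp (47.5% A, 52.5% B).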

3.4. Mass Spectrometer Conditions 

1. The total column effluent (50 μL/min) is connected to the electrospray interface 

of the ion trap mass spectrometer. 

2. The source is operated in positive ion mode with a 4.8 kV electrospray potential, 

a sheath gas flow of 20 arbitrary units, and a capillary temperature of 225°C. The 

source lenses should be set by maximizing the ion current for the 2+ charge state 

of angiotensin. 

3. Data are collected in the triple play mode with the following parameters: centroid 

parent scan set to one microscan and 50 ms maximum injection time, profile 

zoom scan set to three microscans and 500 ms maximum injection time, and a 

centroid MS/MS scan set to two microscans and 2000 ms maximum injection 

time (see Note 7). 

4. Dynamic exclusion is set to a repeat count of one, an exclusion list duration of 2 min, and rejection widths of –0.75 m/z and +2.0 m/z.

5. Collisional activation is carried out with relative collision energy of 35% and an 

exclusion width of 3 m/z. 

6. Study samples should be injected in a random order to reduce any effects of 

carryover or confounding with a non-random injection order (see Note 8). 

7. All water blank samples should be analyzed by the mass spectrometer in the same 

manner as study samples in order to monitor carryover (see Note 9). 
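The random injection order of step 6 and the water blanks run between study samples can be combined when building the instrument sequence. This is a minimal sketch assuming a plain list of sample identifiers; `injection_sequence` and the blank label are hypothetical names, and the seed is there only so that the order can be recorded and reproduced later.

```python
import random


def injection_sequence(sample_ids, seed=None, blank="water blank"):
    """Shuffle the study samples and place a water blank between consecutive samples."""
    order = list(sample_ids)
    random.Random(seed).shuffle(order)  # random order avoids confounding with run order
    sequence = []
    for i, sample in enumerate(order):
        if i > 0:
            sequence.append(blank)  # blank between samples to monitor and reduce carryover
        sequence.append(sample)
    return sequence


print(injection_sequence(["S1", "S2", "S3", "S4"], seed=42))
```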

3.5. Zoom Scan Data Processing 

The data collected from a zoom scan triple-play experiment are used to 

estimate the quality of the subsequent MS/MS spectrum, the charge state of 

the peptide, and the monoisotopic and average mass of the peptide. The quality 

estimate is used to eliminate those scan events that are triggered by noise 

or small molecules from further downstream processing. Peptide mass and 

charge state estimates are used in subsequent steps for peptide identification. 

Eliminating low-quality scan events and more accurately estimating the charge 

state and mass of peptides ultimately reduces the number of false positives that 

must be dealt with at the peptide identification stage of the process. 

1. Assume the charge state of the detected peptide is 1+.

2. Given the m/z of the scan event and the assumed charge state, estimate the theoretical isotope distribution intensities for a peptide of the hypothesized mass using the relationships given in Fig. 1 (see Note 10). Begin by determining the relative intensity of the ¹²C peak (I₀) using the relationship in Fig. 1A and the MW for the assumed charge state. Next, estimate the relative peak intensity of the ¹³C peak (I₁) by multiplying the estimate of I₀ by the I₁/I₀ ratio from Fig. 1B, using the MW for the assumed charge state. Isotope intensities I₂ and I₃ are derived in a similar manner using the ratios from Fig. 1C–D at the MW for the assumed charge state.

3. Convolve the estimated theoretical isotope stick spectrum with a Gaussian peak 

shape that has a peak width similar to that produced in a typical zoom scan 

spectrum (18). Linearly scale the result of this convolution such that the maximum 

value is one. 

[Fig. 1, panels A–D: I₀/max(I₀, I₁, I₂, I₃), I₁/I₀, I₂/I₀, and I₃/I₀ plotted against monoisotopic MW (500–2500).]

Fig. 1. Empirically derived relationships (from 15,493 example peptides) between isotope peak intensities, used to estimate the theoretical isotope pattern for a peptide of monoisotopic molecular weight MW.

(A) I₀/max(I₀, I₁, I₂, I₃), nonlinear least squares fit:

$$\frac{I_0}{\max(I_0, I_1, I_2, I_3)} = \begin{cases} 1 & \text{if } MW < 1800 \\ e^{-0.00132\,(MW - 1800)^{0.865}} & \text{if } MW \ge 1800 \end{cases}$$

(B) I₁/I₀, linear least squares fit:

$$\frac{I_1}{I_0} = -0.00498 + 0.000560\,MW$$

(C) I₂/I₀, linear least squares fit:

$$\frac{I_2}{I_0} = -0.367 + 0.000516\,MW + 1.59 \times 10^{-7}\,(MW - 1527.34)^2$$

(D) I₃/I₀, nonlinear least squares fit:

$$\frac{I_3}{I_0} = 0.0000605\, e^{0.00251\,MW - 2.70 \times 10^{-7}\,MW^2}$$



4. Convolve the result from step 3 above with the measured zoom scan to obtain the 

matched filter output between the expected zoom scan spectrum from the assumed 

charge state and the measured zoom scan spectrum. Record the maximum value 

of the output of this convolution along with the x-axis (m/z) value where the 

maximum occurred. 

5. Repeat steps 2–4 above for assumed charge states of 2+, 3+, and 4+. The detected peptide charge state and mass are estimated from the best match between the observed zoom scan spectrum and the theoretically derived spectra for the possible charge states of 1+, 2+, 3+, and 4+. The cross-correlation between

the best matching theoretical isotope pattern at the m/z shift value associated 

with the convolution maximum and the measured zoom scan is used as an 

intensity-independent matching score between the measured and the best matching 

theoretical spectrum. Triple play events with a cross-correlation score greater 

than 0.6 are retained for identification. Triple plays below this threshold represent 

scans that are not peptides, a mixture of several peptides in the ion trap, or 

very low signal-to-noise measurements. These lower quality scan events are not 

retained for any further processing. 
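Steps 1–5 amount to a template-matching search over charge states. The sketch below is one possible reading, not the authors' code. It transcribes the Fig. 1 fits (the constants should be checked against the original figure before use), assumes a proton mass of 1.007276 Da and an isotope spacing of 1.00336 Da to convert between m/z and MW, assumes a Gaussian zoom-scan peak width (`sigma`) of 0.05 m/z, uses the Pearson correlation coefficient as the intensity-independent score, and picks the charge state with the highest score. All function names are hypothetical.

```python
import numpy as np

PROTON = 1.007276       # proton mass (Da); assumed for the m/z <-> MW conversion
ISOTOPE_STEP = 1.00336  # approximate spacing of isotope peaks (Da); assumed


def isotope_sticks(mw):
    """Relative I0..I3 intensities from the Fig. 1 fits (largest peak ~1)."""
    i0 = 1.0 if mw < 1800 else float(np.exp(-0.00132 * (mw - 1800) ** 0.865))  # Fig. 1A
    r1 = -0.00498 + 0.000560 * mw                                   # Fig. 1B
    r2 = -0.367 + 0.000516 * mw + 1.59e-7 * (mw - 1527.34) ** 2     # Fig. 1C
    r3 = 0.0000605 * np.exp(0.00251 * mw - 2.70e-7 * mw ** 2)       # Fig. 1D
    return np.clip([i0, i0 * r1, i0 * r2, i0 * r3], 0.0, None)


def theoretical_zoom(mz_grid, mono_mz, z, sigma):
    """Step 3: isotope sticks convolved with a Gaussian peak, scaled to a maximum of 1."""
    mw = z * (mono_mz - PROTON)
    spec = np.zeros(len(mz_grid))
    for k, height in enumerate(isotope_sticks(mw)):
        center = mono_mz + k * ISOTOPE_STEP / z
        spec += height * np.exp(-0.5 * ((mz_grid - center) / sigma) ** 2)
    peak = spec.max()
    return spec / peak if peak > 0 else spec


def estimate_charge(mz_grid, measured, trigger_mz, sigma=0.05, charges=(1, 2, 3, 4)):
    """Steps 4-5: return (charge, monoisotopic m/z, score) for the best-matching charge."""
    step = mz_grid[1] - mz_grid[0]
    best = None
    for z in charges:
        template = theoretical_zoom(mz_grid, trigger_mz, z, sigma)
        # matched filter: cross-correlate the measured zoom scan with the template
        xc = np.correlate(measured, template, mode="full")
        lag = int(np.argmax(xc)) - (len(template) - 1)
        mono_mz = trigger_mz + lag * step
        # intensity-independent score at the best shift (Pearson correlation)
        shifted = theoretical_zoom(mz_grid, mono_mz, z, sigma)
        score = float(np.corrcoef(measured, shifted)[0, 1])
        if np.isfinite(score) and (best is None or score > best[2]):
            best = (z, mono_mz, score)
    return best
```

With this reading, a triple play would be kept for identification when the returned score exceeds 0.6, and the monoisotopic MW follows as `z * (mono_mz - PROTON)`.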

3.6. MS/MS Spectral Filtering 

In order to reduce the effect of MS/MS noise peaks on the identification of 

peptides, a dynamic MS/MS noise level is estimated for each spectrum. This 

noise level estimate is then subtracted from all MS/MS peak intensities with 

any resulting differences less than zero set to zero. The spectral noise level is 

estimated based on the observation that ideal MS/MS spectra of peptides have 

relatively few peaks (e.g., y-ions, b-ions, adducts, etc.) in a theoretical or high 

signal-to-noise ratio spectrum, while noisy MS/MS spectra typically have a 

high density of peaks within a local m/z neighborhood (interpreted as chemical 

noise). Therefore, the filtering approach uses a percentile of the peak intensities 

within a local m/z neighborhood as the noise estimate, where the percentile 

used is based on the density of peaks in the neighborhood – a higher peak 

density results in a higher percentile to estimate the local noise level, a lower 

peak density results in a lower percentile to estimate the local noise level. 

1. Bin the MS/MS spectrum into a vector of equally spaced m/z values (bin width 

of 0.1 m/z). 

2. At 200 equally spaced m/z design points between the minimum and maximum m/z values observed in the MS/MS spectrum, estimate the

local peak density by counting the number of non-zero intensities in a ±20 m/z 

window around each of the 200 design points. Define the local peak density at 

these 200 design points as the number of non-zero peaks counted divided by 40 

(peaks per m/z). 

3. Transform the local peak density values to a filtering percentile value using the 

relationship shown in Fig. 2.


Fig. 2. Filtering percentile as a function of local MS/MS peak density. Peak density is defined as the number of MS/MS peaks in a 40 m/z window divided by 40.

$$\text{Filtering Percentile} = \begin{cases} 0 & \text{if PeakDensity} \le 0.1 \\ \dfrac{0.75}{1 + e^{(0.15 - \text{PeakDensity})/0.05}} & \text{if PeakDensity} > 0.1 \end{cases}$$

Reprinted with permission from (10).

4. Obtain an initial noise level estimate at each of the 200 design points as a percentile of the MS/MS peak intensities, where the percentile used at each point is derived from step 3 above (see Note 11).

5. Smooth the initial noise estimates with a Gaussian kernel smoother (150 m/z

bandwidth) and interpolate between the 200 design points to obtain the final 

MS/MS noise estimate at each measured m/z value. Subtract this estimate from 

the measured MS/MS peak intensities and set any negative values to zero. An 

example of a high and low signal-to-noise MS/MS spectrum and the resulting 

estimated noise levels is shown in Fig. 3. 
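The noise-estimation steps can be sketched with NumPy as below. Several details are assumptions rather than statements from the text: the percentile is taken over the non-zero binned intensities within ±20 m/z of each design point, a filtering percentile of 0 is taken to mean that no noise is subtracted there, and the 150 m/z bandwidth is used as the standard deviation of the Gaussian kernel (other smoothers define bandwidth differently). `filtering_percentile` and `denoise_msms` are hypothetical names.

```python
import numpy as np


def filtering_percentile(density):
    """Fig. 2: map local peak density (peaks per m/z) to a percentile in [0, 0.75]."""
    if density <= 0.1:
        return 0.0
    return 0.75 / (1.0 + np.exp((0.15 - density) / 0.05))


def denoise_msms(mz, intensity, bin_width=0.1, n_points=200,
                 half_window=20.0, bandwidth=150.0):
    """Return (noise-subtracted intensities, noise estimate at each peak m/z)."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Step 1: bin the spectrum on an equally spaced 0.1 m/z grid
    n_bins = int(np.ceil((mz.max() - mz.min()) / bin_width)) + 1
    edges = mz.min() + bin_width * np.arange(n_bins + 1)
    binned, _ = np.histogram(mz, bins=edges, weights=intensity)
    centers = edges[:-1] + bin_width / 2
    # Steps 2-4: local density -> filtering percentile -> initial noise level
    design = np.linspace(mz.min(), mz.max(), n_points)
    noise0 = np.zeros(n_points)
    for i, d in enumerate(design):
        window = binned[(np.abs(centers - d) <= half_window) & (binned > 0)]
        pct = filtering_percentile(window.size / (2 * half_window))
        if pct > 0:
            noise0[i] = np.percentile(window, 100 * pct)
    # Step 5: Gaussian kernel smooth, interpolate to the measured m/z values,
    # subtract, and set negative differences to zero
    w = np.exp(-0.5 * ((design[:, None] - design[None, :]) / bandwidth) ** 2)
    smooth = w @ noise0 / w.sum(axis=1)
    noise = np.interp(mz, design, smooth)
    return np.clip(intensity - noise, 0.0, None), noise
```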

3.7. Peptide Identification 

A detailed description of peptide identification is beyond the scope of this 

chapter, but some general discussion is warranted given the importance of the 

subject and its linkage to quantification with the proposed method. The primary 

problem with peptide identification is controlling for false-positive identifications 

while maintaining a reasonable sensitivity to detect correct identifications. 

Our approach utilizes the outputs of two search engines, Sequest (19) and 

X! Tandem (20), along with other descriptive features of identification (e.g., 

charge state, peptide length, etc.) as inputs to a classifier that has been trained


[Fig. 3, panels A and B: example MS/MS spectra, intensity vs. m/z, with estimated noise levels.]

Fig. 3. Example MS/MS spectra and their estimated noise levels. In the high-noise spectrum (A), 443 original peaks were reduced to 118 peaks above the estimated noise level; in the lower noise spectrum (B), 589 original peaks were reduced to 173. Reprinted with permission from (10).

to identify correct identifications (21). The output of the classifier provides a 

unit-less score indicative of the likelihood of a correct identification. False-positive identifications are controlled by running the searches against reversed versions of the protein databases and estimating p-values: the probability of observing a model score from the reversed database search that exceeds the observed score from the forward (correct) database. P-values alone are insufficient because of the large number of tests (identifications) being performed: with a 0.05 p-value cutoff, about 5% of all spectra would be declared correctly identified even in the null condition where no MS/MS spectrum truly matches the database, and every one of those identifications would be incorrect.

To account for multiple testing, false discovery rates (FDRs) (q-values) for


peptide identifications are estimated from p-values using the method described 

by Benjamini and Hochberg (22). Peptides with identification q-values less than 

a threshold, say 0.10, are retained for quantification. Proteins identified by only 

one peptide are visually examined to eliminate obvious incorrect identifications 

(e.g., fewer than four consecutive y- or b-ions). We estimate that the proportion of

false identifications using such a procedure is less than or equal to 2%. Overall, 

the method is similar in strategy to PeptideProphet (23) with the following 

extensions: multiple search engines are employed, a more flexible classifier 

(e.g., Random Forests) is used, and statistical significance is estimated from a 

null distribution of classifier scores derived from reversed database searching 

instead of fitting a mixture model to the distribution of classifier output scores. 

The method is described in detail in Higgs et al. (11). 
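The decoy-based p-values and Benjamini–Hochberg q-values described above can be computed as follows. This is a sketch, not the classifier pipeline of (11): it assumes the classifier has already produced one score per forward-database identification and a pool of scores from the reversed-database search, and the function names are hypothetical.

```python
import numpy as np


def decoy_pvalues(target_scores, decoy_scores):
    """p-value of each target score: fraction of reversed-database (decoy)
    classifier scores that exceed it."""
    decoy = np.sort(np.asarray(decoy_scores, dtype=float))
    target = np.asarray(target_scores, dtype=float)
    n_exceed = decoy.size - np.searchsorted(decoy, target, side="right")
    return n_exceed / decoy.size


def bh_qvalues(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values (q-values)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

Identifications with `bh_qvalues(decoy_pvalues(target, decoy)) < 0.10` would then be retained. With a finite decoy set a p-value can be exactly 0; a common variant adds one to both the numerator and the denominator to avoid this.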

We typically restrict biomarker hypothesis generation to identified

peptides. The same relative quantification method can be used with unidentified 

peptides (MS features), although in practice these features need to be identified 

to be of practical use to clinicians and biologists. To maximize the coverage 

of proteins identified in a study, identifications from all samples in the study 

are pooled and used to create a list of peptides to quantify in each sample. 

Thus, a peptide needs to be confidently identified in only one sample for its ion current to be quantified in all study samples.

Pooling the identifications across all samples in a study significantly increases 

the number of identifications relative to the number of identifications from any 

single sample. 

3.8. Chromatographic Alignment 

Variability in the abundance of individual peptides between different samples may cause a peptide to trigger an MS/MS scan in one sample but not in another. The area of this peptide may still be extracted from the primary mass

spectrum in each sample. However, doing so requires high-quality chromatographic 

alignment between the samples so that a consistent region in the 

extracted ion chromatogram (XIC) is used for integration across all samples in 

a study. Large biomarker studies can produce chromatographic retention time 

shifts greater than 1 min between pairs of samples run several days and many 

samples apart. Simply expanding the integration window by 1 or 2 min to 

account for chromatographic variability is not an option in our experience as 

we are analyzing complex samples with multiple co-eluting peaks at most XIC 

masses. An expanded integration window that includes multiple peaks masks 

the quantification of individual peptides, produces results that are confounded 

with multiple peptides contributing to a value, and increases variability. Peak 

picking is another option, but was not applied here due to the computational


cost as well as the inherent heuristic nature of peak picking algorithms with 

an associated variability in what is being integrated. We have found a simple 

pair-wise alignment between all samples and a select reference sample in the 

study to work well for numerous biomarker discovery projects. This approach 

to alignment is founded on the following assumptions: (a) the samples included 

in the study are generally quite similar to each other with respect to their peptide 

content (i.e., there are many peptides or landmarks in common between the 

samples), (b) the same chromatographic conditions are used for each sample in 

the study, and (c) in a local region of retention time, the retention time offset 

between any two samples is approximately constant (see Note 12). 

1. Identify the landmarks in the reference sample by taking all triple-play scan events 

with a zoom scan cross-correlation score of 0.65 or greater. This set of reference 

sample landmarks will be matched against other samples in the study. 

2. Identify the matching landmarks in a study sample by declaring a landmark match if the sample and reference triple-play events satisfy all of the following: (a) their retention times are within a user-specified amount (5 min) of each other, (b) their peptide charge states match, (c) the m/z values of the monoisotopic peaks from their zoom scans are within a user-specified amount (0.7 Da) of each other, (d) the zoom scan cross-correlation coefficient of each peptide to its theoretical isotope pattern exceeds a threshold (0.65), and (e) the similarity between the corresponding MS/MS spectra exceeds a threshold (e.g., 0.75). The MS/MS similarity metric has been implemented as a cross-correlation coefficient between two MS/MS spectra following a convolution of each MS/MS stick spectrum with a Gaussian peak shape.

3. For each matching pair of landmarks identified in step 2 above, generate the 

XIC for the feature in a local retention time window (e.g., ±5 min of scan event 

time in each sample). Convolve the two XICs to identify the time shift value that 

maximizes the convolution result between the landmark XICs in both samples. 

Record the time shift and cross-correlation at the optimal shift value for each 

landmark. The cross-correlation value will be used as a weighting factor in the 

subsequent smoothing step below. 

4. The optimal time shift values for each pair of landmarks between a sample and the reference define a warping function that can be used to transform the retention time values of a sample to the reference. Estimate a smooth warping function by fitting a weighted loess (24) to the time shift versus retention time values for each sample, using the XIC cross-correlation values from step 3 above as weights. The result is a smooth function that can be used to transform a sample's retention time to a common time defined by the reference sample (Fig. 4).

5. The loess warping function for a sample is then applied to all the retention times 

in the chromatogram (landmark or not). Thus, all samples in a study are projected 

onto the same retention time scale. The warping function between two samples is 

generally not monotonic over the entire retention time range, and no restriction


[Fig. 4: retention time shift (min) vs. retention time (min, 0–120) for n = 462 landmark peptides, with loess fit.]

Fig. 4. Example chromatographic alignment (“warping”) function between two rat serum samples. Retention time shift (min) vs. retention time (min) for 462 landmark peptides is plotted with the resulting loess fit. Reprinted with permission from (10).

on overall monotonicity is used in our estimate of the warping function. We 

do, however, preserve the overall rank order of the retention times following 

alignment by constraining the bandwidth (span = 0.5) used in the loess fitting 

(24) (see Note 13). 
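Steps 3–5 can be sketched as below. The XIC shift is taken from the peak of the cross-correlation of mean-centered XICs, and the warping function is a local linear loess with tricube weights multiplied by the landmark correlation weights. This is a simplified stand-in for the loess of (24), written out so that per-landmark weights can be passed; the sign convention (shift = sample time minus reference time) and all function names are assumptions.

```python
import numpy as np


def xic_shift(t, xic_ref, xic_sample):
    """Step 3: shift (sample time minus reference time, min) that maximizes the
    cross-correlation of two XICs sampled on the same uniform grid t, plus the
    correlation coefficient at that shift (used as a loess weight in step 4)."""
    dt = t[1] - t[0]
    a = xic_ref - xic_ref.mean()
    b = xic_sample - xic_sample.mean()
    lag = int(np.argmax(np.correlate(b, a, mode="full"))) - (len(a) - 1)
    aligned = np.roll(xic_sample, -lag)  # wraps around: keep peaks away from the ends
    return lag * dt, float(np.corrcoef(xic_ref, aligned)[0, 1])


def weighted_loess(x, y, w, x_new, span=0.5):
    """Step 4: local linear loess with tricube weights times per-landmark weights w."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    k = max(2, int(np.ceil(span * x.size)))  # number of landmarks in each local fit
    fitted = []
    for x0 in np.atleast_1d(np.asarray(x_new, dtype=float)):
        d = np.abs(x - x0)
        h = max(np.sort(d)[k - 1], 1e-12)  # distance to the k-th nearest landmark
        sw = np.sqrt(w * np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3)
        X = np.column_stack([np.ones_like(x), x - x0])
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        fitted.append(beta[0])  # local intercept = fitted shift at x0
    return np.array(fitted)


def to_reference_time(rt, landmark_rt, landmark_shift, landmark_weight, span=0.5):
    """Step 5: project any sample retention time onto the reference time scale."""
    rt = np.atleast_1d(np.asarray(rt, dtype=float))
    return rt - weighted_loess(landmark_rt, landmark_shift, landmark_weight, rt, span)
```

Here `landmark_rt` holds the sample retention times of the matched landmarks, and `span=0.5` mirrors the bandwidth constraint mentioned in step 5.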

3.9. Peptide Quantification 

Relative quantification of peptides is carried out by integration of the XIC 

peak (using normalized retention times from the chromatographic alignment) 

from the primary mass spectrum within each sample. A list of peptides to 

integrate within each sample is constructed by pooling together all triple-play 

events across all the samples. This pooling can be done with or without the use 

of peptide identification. As previously noted, we typically restrict the analyses 

to identified peptides. For each identified peptide, perform the following 

steps: 

1. For each sample in which the peptide was identified, extract the XIC for the 

peptide and compute the centroid (weighted average of retention time values 

where weighting factor is the XIC ion current) of the XIC in a small retention 

time neighborhood (–0.5 min to +1.0 min from triple-play trigger time) using the 

aligned time values in the XIC. Compute the mean centroid time for the peptide 

over all samples in which the peptide was identified. Also compute the mean of the average m/z values estimated from the zoom scan spectra of the samples in which the peptide was identified.


2. For each sample in the study, create an XIC for the peptide using the mean zoom 

scan average m/z value determined in step 1. 

3. Estimate a local XIC baseline level and subtract the baseline from the XIC 

intensity values from each sample. A local linear baseline can be estimated by 

fitting a line between the lowest intensity XIC point before the peak and the lowest 

intensity XIC point following the peak in a local neighborhood (e.g., 5 min). 

This simple local linear baseline estimate always results in a baseline estimate 

below the signal intensity in the local neighborhood, leading to a low bias in the 

estimated baseline. For large peaks, this bias is negligible but for small peaks 

the bias may have a more pronounced effect on quantification. Alternatively, 

an asymmetric least squares smoothing approach may be used to estimate the 

baseline XIC values in order to reduce the potential bias with the simple local 

linear approach (25). 

4. A fixed retention time window (±0.5 min for the chromatography described) 

around the mean centroid time value described in step 1 is used for integration. 

The width of this window is dependent on the chromatography method used. 

For the chromatography method reported here, the peak width remains relatively 

constant across the HPLC gradient (i.e., no band-broadening is observed). 

If band-broadening is observed, then the integration window width should 

be modeled as a function of the retention time (e.g., integration window 

width = intercept + slope × retention time). 

5. Integrate the baseline-corrected XIC values within the fixed retention time window

for each sample in the study using a numerical integration algorithm such as the 

trapezoid rule. Record the XIC area values for each peptide in each sample. An 

example of XIC integration for a small study is shown in Fig. 5. 
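The XIC steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, an XIC is assumed to be two arrays of aligned retention times (min) and intensities, "before" and "after" the peak are taken to mean before and after the centroid time, and the asymmetric least squares function is a generic dense version of that smoother (the `lam` and `p` defaults are illustrative), not necessarily the exact method of ref. 25.

```python
import numpy as np


def xic_centroid(t, y, trigger_time, before=0.5, after=1.0):
    """Step 1: intensity-weighted mean retention time near the triple-play trigger."""
    mask = (t >= trigger_time - before) & (t <= trigger_time + after)
    return float(np.sum(t[mask] * y[mask]) / np.sum(y[mask]))


def local_linear_baseline(t, y, center, neighborhood=5.0):
    """Step 3: line through the lowest point before and the lowest point after `center`."""
    half = neighborhood / 2.0
    left = np.flatnonzero((t >= center - half) & (t < center))
    right = np.flatnonzero((t > center) & (t <= center + half))
    i = left[np.argmin(y[left])]
    j = right[np.argmin(y[right])]
    slope = (y[j] - y[i]) / (t[j] - t[i])
    return y[i] + slope * (t - t[i])


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Step 3 alternative: asymmetric least squares baseline (dense, fine for short XICs)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = np.diff(np.eye(n), 2, axis=0)           # second-difference operator, (n - 2) x n
    penalty = lam * (d.T @ d)
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        w = np.where(y > z, p, 1.0 - p)         # points above the baseline get small weight
    return z


def integrate_xic(t, y, baseline, center, half_window=0.5):
    """Steps 4-5: trapezoid-rule area of the baseline-corrected XIC in center +/- half_window."""
    mask = (t >= center - half_window) & (t <= center + half_window)
    tt, yy = t[mask], (y - baseline)[mask]
    return float(np.sum((yy[1:] + yy[:-1]) / 2.0 * np.diff(tt)))


# Synthetic XIC: Gaussian peak (height 1000, sigma 0.1 min) on a slowly rising baseline.
t = np.linspace(80.0, 88.0, 801)                # aligned retention times (min)
y = 10.0 + (t - 80.0) + 1000.0 * np.exp(-0.5 * ((t - 83.7) / 0.1) ** 2)
center = xic_centroid(t, y, trigger_time=83.6)  # in practice, the mean over samples
area = integrate_xic(t, y, local_linear_baseline(t, y, center), center)
area_als = integrate_xic(t, y, als_baseline(y), center)
```

For this synthetic peak the linear-baseline area comes out close to the analytic value of about 250.7 (1000 × 0.1 × √(2π)). Negative baseline-corrected values are kept rather than clipped, which is one of several reasonable choices.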

3.10. Data Transformation and Normalization 

Following the integration of peptide-specific XIC peaks in all study samples, 

we have a rectangular data table with N rows corresponding to N samples in 

the study, and P columns corresponding to peptides detected in the study. The 

cell values in this data table are the peptide peak areas. With this table in hand, 

the usual operations of transformation and normalization may be applied prior 

to any statistical analysis. 

1. Peptide peak areas are approximately log-normally distributed. Apply a log2 transformation to all peak area values (see Note 14).

2. Normalize the log2-transformed peak areas using a quantile normalization procedure (26) (see Note 15).

3. Normalized log2 peptide areas may be used directly as input to the statistical analysis for the study (peptide-level analysis). Additionally, the average of the normalized log2 peptide areas for all the peptides identified from a protein can be used as an overall estimate of the protein level (protein-level analysis, see Note 16).
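A minimal NumPy sketch of these three steps, assuming the N × P table is a dense array with samples in rows and no missing values. The function names are hypothetical, and ties in the quantile step are broken by position rather than averaged, which is a simplification.

```python
import numpy as np


def log2_areas(areas):
    """Step 1: log2-transform positive peptide peak areas."""
    return np.log2(np.asarray(areas, dtype=float))


def quantile_normalize(x):
    """Step 2: give every sample (row) the same distribution of values.

    The reference distribution is the across-sample mean of each row's sorted values;
    every value is replaced by the reference value at its within-row rank.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=1)
    reference = np.sort(x, axis=1).mean(axis=0)
    out = np.empty_like(x)
    rows = np.arange(x.shape[0])[:, None]
    out[rows, order] = reference[None, :]
    return out


def protein_levels(x, proteins):
    """Step 3: per-sample mean of normalized log2 areas over each protein's peptides."""
    proteins = np.asarray(proteins)
    return {p: x[:, proteins == p].mean(axis=1) for p in dict.fromkeys(proteins.tolist())}


# Made-up 2-sample x 3-peptide table; peptides 1-2 map to protein "A", peptide 3 to "B".
areas = np.array([[2.0**5, 2.0**2, 2.0**3],
                  [2.0**4, 2.0**1, 2.0**6]])
normalized = quantile_normalize(log2_areas(areas))
levels = protein_levels(normalized, ["A", "A", "B"])
```

After normalization both samples contain exactly the values 1.5, 3.5, and 5.5, arranged in the rank order of their original log2 areas.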


Fig. 5. XICs of the 2+ α1-macroglobulin peptide ATPLSLCALTAVDQSVLLLKPEAK for eight rat serum samples following chromatographic alignment. Note that the peaks from all samples fall within the highlighted [83.2, 84.2] min integration region.



3.11. Study Design, Power, Sample Size, and Analysis 

Our strategy of producing an N × P table of relative peptide levels gives the flexibility to analyze the data in a manner consistent with the study design. Note that no part of the described method imposes any limitation on the final statistical analysis of the study (e.g., pooling of samples, subtractive or difference-based methods, etc.). In general, the statistical analysis used to identify potential protein biomarkers in a study should follow the same approach that a primary clinical endpoint analysis would take (e.g., a simple paired design should be analyzed with a paired t-test, a crossover design with repeated measures within period should be analyzed as a crossover study with repeated measures within period, etc.).

An analysis of a single clinical endpoint may use the familiar type I error threshold of 0.05 as a measure of statistical significance. This approach does not work well when testing hundreds or thousands of proteins in a study because, by definition, about 5% of the p-values from a null experiment (an experiment in which there is truly no treatment or group effect) are expected to fall below 0.05; with 1,000 proteins tested, roughly 50 would pass the threshold by chance alone. The Bonferroni approach to controlling the family-wise type I error (the probability of making even one false-positive declaration in the set of declared changes) has been commonly employed as a means to control false-positive findings (27). However, many investigators doing proteomic hypothesis generation are willing to tolerate some level of false-positive findings in a declared set as long as it is relatively low and estimated.

The use of FDR as a means to identify a set of declared findings with a 

specified proportion of false-positives has been widely applied in genomics (22) 

and is the current recommendation for proteomic hypothesis generating experiments. 

There are numerous estimators of FDR (28,29) with the original method 

described by Benjamini and Hochberg used in the work presented here (22). 
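The Benjamini-Hochberg step-up rule (22) is short enough to show in full. The plain-Python sketch below uses a hypothetical function name and returns which hypotheses are declared significant at FDR level q, not adjusted p-values.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking hypotheses declared significant at FDR q.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= k * q / m, and declare the k smallest p-values significant.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    declared = [False] * m
    for i in order[:k_max]:
        declared[i] = True
    return declared


# Ten made-up p-values: five fall below 0.05, but only two survive FDR control at q = 0.05.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
declared = benjamini_hochberg(p, q=0.05)
```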

Just as multiple comparisons should be considered in the analysis of study 

data, these should also be considered at the design stage of a new study 

aimed at generating hypotheses from highly multiplexed measurements like 

proteomics. This is a relatively new field of research with several methods 

recently reported (30,31,32,33). A simple approach originally suggested by Benjamini and Hochberg (22), and adapted by Bemis (34), uses traditional sample-size calculations with the following expression for the average type I error ($\alpha_{ave}$) over a set of tested hypotheses:

$$\alpha_{ave} = \frac{f_{ave}\, q^* m_1}{(m_1 + m_0)(1 - q^*)}$$

where $f_{ave}$ is the average power of the hypothesis tests conducted in a study, $q^*$ is the rate at which FDR is to be controlled, $m_0$ is the number of true null hypotheses tested, and $m_1$ is the number of true alternative hypotheses tested. Sample-size estimates are made by first computing $\alpha_{ave}$ from the desired values of $f_{ave}$ and $q^*$ and assumed values of $m_0$ and $m_1$, and then using $\alpha_{ave}$ in existing sample-size calculators for the given study design. An example set of sample-size curves obtained with this approach for the two-sample t-test design is given in Fig. 6.
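The calculation can be sketched as follows. This is an illustration under assumptions, not the calculator used for Fig. 6: the per-group sample size uses the normal approximation for a two-sided, two-sample comparison on the natural-log scale, with the standard deviation derived from the CV under an assumed log-normal distribution, so a t-based calculator would give slightly larger values for small groups. Function names are hypothetical. Only the ratio $m_0/(m_0 + m_1)$ enters the $\alpha_{ave}$ expression, so $m_0 = 98$, $m_1 = 2$ stand in for the 0.98 in the Fig. 6 caption, giving $\alpha_{ave} \approx 0.0019$.

```python
import math
from statistics import NormalDist


def average_alpha(f_ave, q_star, m0, m1):
    """Per-test type I error implied by target FDR q*, average power f_ave, and m0, m1."""
    return f_ave * q_star * m1 / ((m1 + m0) * (1.0 - q_star))


def n_per_group(fold_change, cv, alpha, power):
    """Subjects per group for a two-sided, two-sample comparison (normal approximation).

    Assumes log-normal data, so the SD of ln(x) is sqrt(ln(1 + CV^2)), and an
    effect of ln(fold_change) on that scale.
    """
    sigma = math.sqrt(math.log(1.0 + cv ** 2))
    delta = math.log(fold_change)
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Settings from the Fig. 6 caption: 85% power, FDR 0.10, m0 / (m0 + m1) = 0.98.
alpha = average_alpha(f_ave=0.85, q_star=0.10, m0=98, m1=2)
for cv in (0.10, 0.20, 0.30, 0.40):
    print(f"CV {cv:.0%}: n = {n_per_group(1.5, cv, alpha, 0.85)} per group for a 1.5-fold change")
```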


Fig. 6. Estimated sample sizes required to detect protein changes in a two-sample t-test design. The number of subjects in each of the two groups is plotted against the detectable effect size expressed as a fold-change. Four different levels of total variability are shown (10% CV, 20% CV, 30% CV, and 40% CV). Sample-size estimates were made using 85% power, a 0.10 target FDR for declaring significance, and an estimated proportion of true null hypotheses, $m_0/(m_0 + m_1)$, set to 0.98.

4. Notes 

1. We find that plasma total protein concentration, as measured by a Bradford assay, has a total coefficient of variation (CV) of approximately 11% (including inter-subject, intra-subject, and assay error) and ranges between approximately 48 and 68 mg/mL (12). Because plasma total protein concentration appears to be tightly regulated, it is generally not necessary to measure total protein concentration for each sample in a study in order to load a consistent amount of protein.

2. The depletion material used is based on a dye affinity removal method for 

albumin. There are commercially available antibody-based depletion kits that 

may improve albumin removal at a reasonable cost. Abundant protein depletion 

is an open and active research area at the time of this writing. 

3. Chicken lysozyme is added as a spiked internal standard at this stage in order 

to qualitatively assess the digestion efficiency as well as to quantitatively assess 

the measurement error across the samples in a study. Other internal standard(s) 

could also be used. 

4. The reduction/alkylation solution should be prepared just before use. Triethylphosphine is pyrophoric and should be handled in a fume hood in accordance with the material safety data sheet. Using volatile reagents for this step reduces the variability in sample preparation by minimizing sample-handling steps and by removing the majority of the reduction and alkylation reagents. This matters because the digestion is performed with trypsin, which is sensitive to the presence of reducing reagents.

5. We find that CSF total protein concentration, as measured by a Bradford assay, has a total CV of approximately 27% (including inter-subject, intra-subject, and assay error) and ranges between approximately 0.12 and 0.41 mg/mL (12). The higher overall variability relative to plasma total protein is attributed to a significantly higher CSF inter-subject variability (12). Because of this higher variability, we use the results of the Bradford total protein assay to process a consistent total amount of CSF protein in the proteomics assay.

6. The HPLC pumps must be capable of producing a smooth gradient at 50 μL/min. 

The gradient formation should be verified by using water in A and 1% acetone 

in water for B and running the gradient with UV monitoring at 254 nm. New 

HPLC columns should be conditioned with at least four runs of digested serum 

before use in the method. 

7. The mass spectrometer’s source should be carefully cleaned to minimize chemical noise. Monitor above 300 m/z and try to maximize the injection time, as this is directly proportional to the achievable dynamic range in an ion trap mass spectrometer. The spray conditions should be optimized for a peptide of about 1700 Da.

8. Alternatively, a design could be used to balance various study factors (e.g., 

treatment, gender, age, etc.) with injection order. This approach may be 

most appropriate for small studies (e.g.,


the +3 ¹³C isotopic peak. The 15,493 example peptides were then used to derive relationships for $I_0/\max(I_0, I_1, I_2, I_3)$, $I_1/I_0$, $I_2/I_0$, and $I_3/I_0$ as functions of the peptide monoisotopic molecular weight (Fig. 1).

11. Percentile transformation is done to define the noise level as the Xth percentile of the peak intensities in a local m/z neighborhood, where X depends on the peak density in the neighborhood (higher peak density → higher percentile → higher estimated noise level).

12. One potential improvement to this alignment strategy would be to create a composite list of landmarks across all study samples instead of relying on a single sample to serve as the retention time reference. This could easily be accomplished by grouping or clustering landmarks from all samples while enforcing a match on m/z, charge state, retention time, and MS/MS spectral similarity. This has not yet been employed because of the increased computational cost and the lack of data demonstrating any significant problems with the single-reference-sample approach. In practice, several different samples are evaluated as potential alignment reference samples, and the best sample, based on a qualitative assessment of the alignment warping functions, is chosen.

13. A visual examination of the alignment warping functions for all samples included in a study is an effective means to detect and diagnose chromatography problems encountered in the analysis of dozens of study samples. For example, oscillatory warping functions have been associated with pump mixing problems, while large-magnitude, mostly linear warping functions have been associated with column degradation.

14. Log2 is convenient because a unit change on the log2 scale can be interpreted as a twofold change on the original scale.

15. Normalization can be particularly important for minimizing systematic biases in ion current introduced by sample collection and handling, sample concentration, instrument sensitivity drift during the course of data acquisition, etc. The spiked internal standard, chicken lysozyme, can be helpful in diagnosing and monitoring ion intensities before and after normalization. Quantile normalization assumes that the overall distribution of log2 peptide peak areas is unchanged from sample to sample. This is generally a reasonable assumption, but there are cases where a treatment effect may modulate the level of most of the proteins detected in a study, and in such cases quantile normalization should not be used. In these cases, the spiked internal standard, chicken lysozyme, can be used to normalize systematic effects of the process on ion current, although only those occurring after the standard was spiked.

16. In practice, we will analyze a study at both the peptide and protein levels. 

Peptide-level analyses are generally specific to the identified peptide and allow 

the opportunity to discover biologically related changes in peptide level due 

to processing of a specific region of a protein. Protein-level analyses provide 

additional statistical power to detect smaller magnitude changes in protein levels 

since we are averaging multiple peptide values, all of which have a high positive 

covariance.
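One simple way to use the spiked standard described in Note 15, sketched here as an assumption rather than the authors' procedure, is to shift each sample on the log2 scale so that the mean log2 area of its lysozyme peptides equals the study-wide mean. As Note 15 points out, such a correction can only remove effects introduced after spiking. The function name and the column mask are hypothetical.

```python
import numpy as np


def internal_standard_normalize(log2_areas, is_standard):
    """Shift each sample so its spiked-standard level matches the study-wide mean.

    log2_areas: samples x peptides array of log2 peak areas.
    is_standard: boolean mask of columns that are internal-standard (e.g., lysozyme) peptides.
    """
    x = np.asarray(log2_areas, dtype=float)
    std_level = x[:, is_standard].mean(axis=1)   # per-sample standard level
    offset = std_level - std_level.mean()        # deviation from the study mean
    return x - offset[:, None]                   # additive shift on the log2 scale


# Made-up table: two samples, the third column is a lysozyme peptide.
x = np.array([[10.0, 20.0, 5.0],
              [12.0, 22.0, 7.0]])
corrected = internal_standard_normalize(x, np.array([False, False, True]))
```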



We thank John Saalwaechter and Andrew Kaczorek and the entire scientific computing team for their efforts in developing and maintaining a high-availability grid-computing environment used for this work. We also thank Jude Onyia and the statistical and mathematical sciences management team for supporting us in the development of these methods.

References 

1. FDA Critical Path Initiative 2006 (http://www.fda.gov/oc/initiatives/criticalpath).

2. NIH Road Map for Medical Research 2006 (http://www.nihroadmap.nih.gov/index.asp).

3. Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., and Aebersold, R. 1999. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17: 994–999.

4. Aggarwal, K., Choe, L.H., and Lee, K.H. 2006. Shotgun proteomics using the iTRAQ isobaric tags. Brief. Funct. Genomic. Proteomic. 5: 112–120.

5. Petricoin, E.F., Ardekani, A.M., Hitt, B.A., Levine, P.J., Fusaro, V.A., Steinberg, S.M., Mills, G.B., Simone, C., Fishman, D.A., Kohn, E.C., et al. 2002. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359: 572–577.

6. Radulovic, D., Jelveh, S., Ryu, S., Hamilton, T.G., Foss, E., Mao, Y., and Emili, A. 2004. Informatics platform for global proteomic profiling and biomarker discovery using liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 3: 984–997.

7. Wiener, M.C., Sachs, J.R., Deyanova, E.G., and Yates, N.A. 2004. Differential mass spectrometry: a label-free LC-MS method for finding significant differences in complex peptide and protein mixtures. Anal. Chem. 76: 6085–6096.

8. Gao, J., Opiteck, G.J., Friedrichs, M.S., Dongre, A.R., and Hefta, S.A. 2003. Changes in the protein expression of yeast as a function of carbon source. J. Proteome Res. 2: 643–649.

9. Colinge, J., Chiappe, D., Lagache, S., Moniatte, M., and Bougueleret, L. 2005. Differential proteomics via probabilistic peptide identification scores. Anal. Chem. 77: 596–606.

10. Higgs, R.E., Knierman, M.D., Gelfanova, V., Butler, J.P., and Hale, J.E. 2005. Comprehensive label-free method for the relative quantification of proteins from biological samples. J. Proteome Res. 4: 1442–1450.

11. Higgs, R.E., Knierman, M.D., Freeman, A.B., Gelbert, L.M., Patil, S.T., and Hale, J.E. 2007. Estimating the statistical significance of peptide identifications from shotgun proteomics experiments. J. Proteome Res. 6: 1758–1767.

12. Patil, S.T., Higgs, R.E., Brandt, J.E., Knierman, M.D., Gelfanova, V., Butler, J.P., Downing, A.M., Dorocke, J., Dean, R.A., Potter, W.Z., et al. 2007. Identifying pharmacodynamic protein markers of centrally active drugs in humans: a pilot study in a novel clinical model. J. Proteome Res. 6: 955–966.

13. Anderson, L., and Hunter, C.L. 2006. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol. Cell. Proteomics 5: 573–588.

14. Anderson, N.L., and Anderson, N.G. 2002. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1: 845–867.

15. Gutman, S., and Kessler, L.G. 2006. The US Food and Drug Administration perspective on cancer biomarker development. Nat. Rev. Cancer 6: 565–571.

16. Rifai, N., Gillette, M.A., and Carr, S.A. 2006. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24: 971–983.

17. Hale, J.E., Butler, J.P., Gelfanova, V., You, J.S., and Knierman, M.D. 2004. A simplified procedure for the reduction and alkylation of cysteine residues in proteins prior to proteolytic digestion and mass spectral analysis. Anal. Biochem. 333: 174–181.

18. Proakis, J.G., and Manolakis, D.G. 1992. Digital Signal Processing: Principles, Algorithms and Applications. Prentice Hall, New York, NY.

19. Eng, J.K., McCormack, A.L., and Yates, J.R. 1994. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5: 976–989.

20. Craig, R., and Beavis, R.C. 2003. A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun. Mass Spectrom. 17: 2310–2316.

21. Ulintz, P.J., Zhu, J., Qin, Z.S., and Andrews, P.C. 2006. Improved classification of mass spectrometry database search results using newer machine learning approaches. Mol. Cell. Proteomics 5: 497–509.

22. Benjamini, Y., and Hochberg, Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Methodol. 57: 289–300.

23. Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74: 5383–5392.

24. Cleveland, W.S., Grosse, E., and Shyu, W.M. 1992. Local regression models. In Statistical Models in S. J.M. Chambers and T.J. Hastie, eds. Wadsworth & Brooks/Cole, Pacific Grove, CA.

25. Boelens, H.F., Dijkstra, R.J., Eilers, P.H., Fitzpatrick, F., and Westerhuis, J.A. 2004. New background correction method for liquid chromatography with diode array detection, infrared spectroscopic detection and Raman spectroscopic detection. J. Chromatogr. A 1057: 21–30.

26. Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. 2003. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19: 185–193.

27. Miller, R.G., Jr. 1991. Simultaneous Statistical Inference. Springer-Verlag, New York.

28. Butler, K.W., Deslauriers, R., Geoffrion, Y., Storey, J.M., Storey, K.B., Smith, I.C., and Somorjai, R.L. 1985. 31P nuclear magnetic resonance studies of crayfish (Orconectes virilis). The use of inversion spin transfer to monitor enzyme kinetics in vivo. Eur. J. Biochem. 149: 79–83.

29. Efron, B. 2004. Large-scale simultaneous hypothesis testing: the choice of a null distribution. J. Am. Stat. Assoc. 99: 96–104.

30. Pounds, S., and Cheng, C. 2005. Sample size determination for the false discovery rate. Bioinformatics 21: 4263–4271.

31. Hu, J., Zou, F., and Wright, F.A. 2005. Practical FDR-based sample size calculations in microarray experiments. Bioinformatics 21: 3264–3272.

32. Jung, S.H. 2005. Sample size for FDR-control in microarray data analysis. Bioinformatics 21: 3097–3104.

33. Li, S.S., Bigler, J., Lampe, J.W., Potter, J.D., and Feng, Z. 2005. FDR-controlling testing procedures and sample size determination for microarrays. Stat. Med. 24: 2267–2280.

34. Bemis, K.G. 2005. Statistical Issues with Mass Spectrometry Proteomics for Biomarker Discovery. In International Workshop on Statistical Methodology in Clinical and Nonclinical R&D, DIA conference, Nice, France.

13 

Analysis of the Extracellular Matrix and Secreted Vesicle 

Proteomes by Mass Spectrometry 

Zhen Xiao, Thomas P. Conrads, George R. Beck, Jr., 

and Timothy D. Veenstra 

Summary 

The extracellular matrix (ECM) and secreted vesicles are unique structures outside of cells that carry out dynamic biological functions. ECM is created by most cell types and is responsible for the three-dimensional structure of the tissue or organ in which they originate. Many cells also produce or secrete specialized vesicles into the ECM, which are thought to influence the extracellular environment. ECM is not simply a physical structure that connects cells in a tissue or organ; the proteins in ECM and secreted vesicles are critical to cell function, differentiation, motility, and cell-to-cell interaction. Although a number of major structural proteins of ECM and secreted vesicles have long been known, an appreciation of the role of less-abundant non-collagenous proteins has only recently begun to emerge. This chapter outlines a series of methods used to isolate and enrich ECM constituents and secreted vesicles from bone-forming osteoblast cells, enabling comprehensive profiles of their proteomes to be obtained by mass spectrometry. These methods can be easily adapted to study ECM and secreted vesicles in other cell types, primary cell cultures derived from animal models, or tissue specimens.

Key Words: extracellular matrix; matrix vesicle; osteoblast; proteomics; mass 

spectrometry. 


Most cells reside in a matrix environment called the extracellular matrix (ECM), which offers the structural and nutritional support, as well as the protective barrier, required for cells to survive, interact, and differentiate. In addition to intracellular and tissue-related processes, it is becoming increasingly clear that alterations in the ECM can affect the pathogenesis of disease. While much effort has been devoted to understanding intracellular processes, the characteristics and functions of ECM have not been equally well studied. The evidence gathered to date has shown that ECM is a complicated organelle composed of various proteins that play central roles in cell differentiation, migration, and cell-to-cell communication (1,2,3).

The complexity of ECM is exemplified by the structure of the skeleton. The formation and homeostasis of bone is an ongoing process throughout life and involves the recruitment, replication, and differentiation of osteoblasts and osteoclasts (4). Osteoblasts are derived from mesenchymal stem cells and have the potential to further develop into either osteocytes or lining cells. When induced by appropriate stimuli, such as ascorbic acid and β-glycerophosphate, osteoblasts undergo proliferation and maturation toward the osteocyte phenotype (Fig. 1) (5). This process is accompanied by the accumulation of an ECM and, ultimately, mineralization of the ECM in the form of hydroxyapatite (6). The deposition of hydroxyapatite in the ECM is initiated by a unique type of vesicle secreted by osteoblasts, called matrix vesicles (MVs). With diameters ranging from 30 to 300 nm, these vesicles reside in the ECM and play a critical role in mineralization (7,8). They serve as nucleation sites for mineralization and sustain the accumulation of ECM (9). A number of proteins, such as annexins and phosphatases, have been identified within MVs. These proteins are responsible for the enrichment of calcium and phosphate within the vesicles (8,10,11,12,13). Although the presence and function of other proteins are largely unknown, changes in ECM and MV proteins are associated with diseases such as osteoporosis (14), arteriosclerosis (15,16,17,18), tumor development, and metastasis (19,20,21,22). A comprehensive profile of the proteins present in these extracellular organelles would enable a greater understanding of the pathophysiology underlying these clinical manifestations.

Fig. 1. The three-stage timeline of osteoblast cell differentiation. Mineral deposition is visualized by alizarin red staining of the osteoblasts cultured in the differentiation medium.

The development of mass spectrometry (MS) technology combined with 

appropriate protein enrichment and peptide separation strategies has made this 

aim achievable (23,24,25,26). 

This chapter describes the extraction of ECM constituents and MVs from the osteoblast cell line MC3T3-E1, followed by the analysis of their respective proteomic profiles by liquid chromatography (LC) fractionation combined with MS analysis (27). The ECM and MVs are isolated and enriched using centrifugation and enzymatic approaches. The enrichment of MVs is confirmed by the measurement of elevated alkaline phosphatase (ALP) activity. After the extracted proteins are digested with trypsin to create a complex mixture of peptides, this mixture is fractionated using strong cation exchange (SCX) LC. These fractions are analyzed by nanoflow reversed-phase LC-tandem mass spectrometry (nanoRPLC-MS/MS), and proteins are identified by searching the data against an appropriate protein database.

2. Materials 

2.1. Cell Culture 

1. MC3T3-E1 pre-osteoblast cell line (see Note 1) 

2. Cell culture medium MEM (Irvine Scientific, Santa Ana, CA) 

3. Fetal bovine serum (Atlanta Biologicals, Atlanta, GA) 

4. Penicillin-streptomycin solution (10,000 I.U./ml penicillin, 10,000 μg/ml streptomycin) 

(Invitrogen Corp., Carlsbad, CA) 

5. 200 mM L-glutamine (Invitrogen Corp.)

6. Growth medium: MEM supplemented with 10% fetal bovine serum, 50 U/ml penicillin, 50 μg/ml streptomycin, and 2 mM L-glutamine

7. Differentiation medium: growth medium supplemented with 50 μg/ml ascorbic acid (Sigma Chemical Co., St. Louis, MO) and 10 mM β-glycerophosphate (Sigma Chemical Co.)

8. Phosphate-buffered saline (PBS) 

9. Trypsin/EDTA (0.25% (w/v) trypsin/0.53 mM EDTA solution in Hank’s BSS 

without calcium or magnesium) (ATCC, Manassas, VA) 
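The growth-medium recipe in item 6 follows from the stock concentrations in items 3 to 5 by simple dilution (C1V1 = C2V2). A small Python sketch of that arithmetic, assuming a 500 ml final volume, 10% (v/v) serum, and MEM making up the remainder; the function name is hypothetical and the volumes are computed, not taken from the protocol.

```python
def growth_medium_volumes(final_ml=500.0):
    """Volumes (ml) of each component of the growth medium (item 6).

    Assumptions: FBS is 10% (v/v) of the final volume and MEM fills the remainder.
    Dilutions use C1 * V1 = C2 * V2 with the stock concentrations in items 4 and 5.
    """
    fbs = 0.10 * final_ml
    pen_strep = final_ml * 50.0 / 10_000.0      # 10,000 U/ml stock -> 50 U/ml (1:200)
    glutamine = final_ml * 2.0 / 200.0          # 200 mM stock -> 2 mM (1:100)
    mem = final_ml - fbs - pen_strep - glutamine
    return {"MEM": mem, "FBS": fbs,
            "penicillin-streptomycin": pen_strep, "L-glutamine": glutamine}


volumes = growth_medium_volumes(500.0)
```

The same 1:200 dilution brings the 10,000 μg/ml streptomycin in the stock to the 50 μg/ml listed in item 6.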

2.2. Extraction of the ECM Constituents 

1. Liberase/blendzyme 1 (0.14 Wünsch units/ml) (Roche Applied Science, Indianapolis, 

IN) 

2. Centrifuge 

3. Bicinchoninic acid (BCA) protein assay reagent kit (Pierce, Rockford, IL)


2.3. Enrichment of MVs from the ECM 

1. Liberase/blendzyme 1 (0.14 Wünsch units/ml) (Roche Applied Science, Indianapolis, 

IN) 

2. Centrifuge 

2.4. Isolation of MVs from Medium 

1. Ultra-Clear centrifuge tubes: 1 × 3.5 in (38 ml) and 5/8 × 4 in (17 ml) (Beckman, Palo Alto, CA)

2. Optima L-90K preparative ultracentrifuge (Beckman Coulter, Inc., Palo Alto, CA) 

2.5. Alkaline Phosphatase Assay 

1. Mild lysis buffer: 250 mM NaCl, 50 mM HEPES, pH 7.5, 0.1% NP-40 

2. ALP assay kit, including alkaline buffer (1.5 M 2-amino-2-methyl-1-propanol, pH 10.3), p-nitrophenyl phosphate (PNPP) (4 mg/ml), and p-nitrophenol (PNP) standard solution (10 μmol/ml) (Sigma, St. Louis, MO)

3. Flat bottom 96-well plate 

4. Lumimark microplate reader (Bio-Rad, Hercules, CA) 

2.6. Strong Cation Exchange Liquid Chromatography of Peptides 

1. Trypsin Gold, mass spectrometry grade (Promega, Madison, WI) 

2. 25% (v/v) acetonitrile containing 0.1% (v/v) formic acid 

3. SCX-LC column (1 mm × 150 mm, polysulfoethyl A) (PolyLC, Columbia, MD) 

Fig. 2. Transmission electron microscopic image of matrix vesicles in the ultracentrifuge 

pellets (A). The high magnification image (B) shows fine-needle deposits and 

black dots, likely signs of calcification, both inside and around the vesicles. Also note 

the bilayer membrane of the vesicles (arrowhead).


4. Mobile phase A: 25% (v/v) acetonitrile 

5. Mobile phase B: 25% (v/v) acetonitrile containing 0.5 M ammonium formate, pH 3 

6. 0.1% (v/v) formic acid 


8. Laser-induced fluorescence (LIF) detector 

2.7. Nanoflow Reversed-phase Liquid Chromatography Tandem Mass 

Spectrometry 

1. Slurry packer model 1666 (Alltech, Columbia, MD) 

2. Ceramic cutter 

3. 75 μm i.d. × 360 μm o.d. × 12 cm long fused silica capillary column (Polymicro 

Technologies, Phoenix, AZ) 

4. 5 μm, 300 Å pore size C-18 silica-bonded stationary RP particles (Jupiter, 

Phenomenex, Torrance, CA) 

5. Agilent 1100 nanoLC system (Agilent Technologies, Palo Alto, CA) coupled with 

a linear ion-trap (LIT) mass spectrometer (LTQ, ThermoElectron, San Jose, CA) 

6. Glass sample injection vials 12 × 32 mm (Wheaton, Millville, NJ) 

7. Mobile phase A: 0.1% (v/v) formic acid 

8. Mobile phase B: 0.1% formic acid (v/v) in acetonitrile 

2.8. Bioinformatic Analysis 

1. 20-node Beowulf cluster computer server 

2. SEQUEST Cluster version 3.1 SR1 (Thermo Electron Corp., Waltham, MA) 

3. Bioworks Browser software 3.2 (Thermo Electron Corp.) 

2.9. Validation by Immunofluorescence Staining 

1. Primary antibodies: anti-annexin V, anti-emilin-1, anti-IQGAP1 (Santa Cruz 

Biotechnology, Inc., Santa Cruz, CA) 

2. Secondary antibodies: goat anti-rabbit IgG-FITC, and donkey anti-goat IgG-TR 

(Santa Cruz Biotechnology) 

3. PBS solution 

4. 18 × 18 × 0.15 mm thick glass cover slips 

5. Regular microscope glass slides 

6. Blocking serum: 10% normal blocking serum in PBS. The blocking serum is 

derived from the same species in which the secondary antibody is raised. For 

example, if the secondary antibody is raised in goat, use the normal goat serum 

diluted to 10% in PBS as the blocking serum. 

7. Fixative solution: 3.7% (v/v) formaldehyde in PBS 

8. DAPI diluted 1:50,000 in PBS (Invitrogen, Carlsbad, CA) 

9. ProLong mounting reagent (Invitrogen) 

10. Confocal fluorescence microscope LSM 510 Meta NLO (Carl Zeiss, 

Oberkochen, Germany)


3. Methods 

The ECM proteins are extracted from cultured cells by a short exposure 

to an ECM-degrading enzyme. To isolate MVs that are either confined to 

the ECM or reside in the cell culture medium, two approaches may be used: 

(1) For MVs confined to the ECM, an ECM-degrading enzyme is first applied 

followed by centrifugation and ultracentrifugation; (2) for MVs in the medium, 

centrifugation and ultracentrifugation are applied. The characterization of ECM 

and MV proteomes is performed using LC fractionation and MS analysis. 

3.1. Cell Culture 

1. Grow the murine calvaria-derived osteoblast MC3T3-E1 cells in growth medium, changing the medium every two or three days. Passage the cells with trypsin/EDTA (see Note 1).

2. Once the cell culture reaches ∼50% confluency, replace the growth medium with 

10 ml of differentiation medium per plate to induce osteoblast differentiation. 

3. Extract the ECM or harvest culture medium on the day indicated in the methods 

below. 

3.2. Extraction of the ECM Constituents 

1. Grow MC3T3-E1 cells in differentiation medium on 10-cm plates. Change the 

medium every two or three days (see Note 2). 

2. On day 21, aspirate the medium from the plates. Wash the cells with 10 ml of 

PBS solution three times. 

3. Add 3 ml of liberase/blendzyme 1 solution to each plate. Incubate at 37°C for 

30 min. 

4. Carefully collect the digested supernatant from the plates without disturbing the 

cells. 

5. Centrifuge the supernatant at 2000×g for 5 min to remove any free cells. The 

resulting supernatant contains ECM proteins. 

6. Quantify the amount of ECM proteins using the BCA assay (see Note 3). 

3.3. Enrichment of MVs from the ECM 

1. Follow the same procedure described earlier to grow and prepare cells (see 

Subheading 3.2, steps 1 and 2, and Note 2). 

2. On day 21, aspirate the medium and wash the cells three times with PBS. 

3. Add 3 ml of liberase/blendzyme 1 solution to each plate. Incubate at 37°C for 

30 min (see Note 4). 

4. Collect the supernatant from the plates without disturbing the cells. Centrifuge 

the supernatant at 2000×g for 5 min to remove any cells that may have been 

detached from the plate. Collect the supernatant. 

5. Centrifuge the supernatant at 20,000×g at 4°C for 30 min.


6. Transfer the supernatant to Ultra-Clear centrifuge tubes of a size that fits the volume of the supernatant. Fill the tubes with PBS up to about 2–3 mm from the top.

7. Subject the supernatant to ultracentrifugation at 100,000×g at 4°C for 60 min. 

Carefully remove the supernatant without disturbing the pellet. 

8. The pellets are enriched with MVs, designated as collagenase-released MVs (CRMVs) (see Note 5).

9. Confirm the enrichment of CRMVs by assaying the ALP activity using an aliquot of the pellet (see Note 6 and Subheading 3.5).

10. Resuspend the rest of the pellet in 25 mM NH4HCO3, pH 8.4. Quantify the amount of CRMV proteins in the pellet by BCA assay (see Note 3).

3.4. Isolation of MVs from Medium 

1. Grow MC3T3-E1 cells in differentiation medium in four 10-cm plates. 

2. On day 15, collect the media from multiple plates (see Note 2). 

3. Separate cellular debris from the medium by centrifugation at 20,000×g for 30 min 

at 4°C. 

4. Transfer the supernatant to Ultra-Clear centrifuge tubes. Use the centrifuge 

tubes that fit the volume of the supernatant. 

5. Further centrifuge the supernatant by ultracentrifugation at 100,000×g for 60 min. 

6. Carefully remove the supernatant. The MVs in the pellet are designated as medium 

MVs (MMVs) (see Note 5 and Fig. 1). 

7. Resuspend an aliquot of the MMV sample in 25 mM NH4HCO3, pH 8.4. 

Determine the protein concentration in the pellet by BCA assay. 

3.5. Alkaline Phosphatase Assay 

1. For the standard curve: Dilute the PNP standard 1:10 in dH2O. Add 0, 2, 4, 6, 8, 10, 20, 30, 40, and 50 μl of the diluted standard (i.e., 0, 2, 4, 6, 8, 10, 20, 30, 40, and 50 nmol, respectively) to the wells of a flat-bottom 96-well microtiter plate. Add mild lysis buffer to a total volume of 135 μl per well. 

2. For the CRMV and MMV samples: Resuspend an aliquot of the ultracentrifuged pellet in mild lysis buffer and quantify the protein by BCA assay. Based on the BCA assay results, add 25 μg of protein to a well of the 96-well microtiter plate, and then add further mild lysis buffer to a total volume of 135 μl/well. 

3. Add 25 μl of alkaline buffer and 25 μl of p-nitrophenyl phosphate (PNPP) to each 

well. 

4. Incubate the microtiter plate at 37°C for up to 3 h. Monitor the colorimetric 

change every hour by measuring absorbance at 405 nm using the microtiter plate 

reader. Stop incubation when the absorbance of the sample reaches the range of 

the standards. 

5. Determine the ALP activity in MV samples by comparing to the PNP standard 

curve. Report the ALP activity as nmol PNP produced per minute per milligram 

of protein used (see Note 6).
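The calculations in steps 2 and 5 can be sketched in Python as below. This is a minimal illustration that assumes a straight-line (least-squares) fit of A405 against nmol PNP for the standards in step 1, with the fitted intercept serving as the blank. The BCA concentration and absorbance values are hypothetical, and the sample reading is assumed to fall within the range of the standards (step 4).

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loading_volumes(protein_ug_per_ul, protein_ug=25.0, total_ul=135.0):
    """Volumes (ul) of MV lysate and extra mild lysis buffer for one well (step 2)."""
    sample_ul = protein_ug / protein_ug_per_ul
    if sample_ul > total_ul:
        raise ValueError("lysate too dilute to load the requested protein in 135 ul")
    return sample_ul, total_ul - sample_ul


def alp_activity(a405, slope, intercept, minutes, protein_ug=25.0):
    """ALP activity as nmol PNP produced per minute per mg protein (step 5)."""
    nmol_pnp = (a405 - intercept) / slope  # read off the PNP standard curve
    return nmol_pnp / minutes / (protein_ug / 1000.0)


# PNP standards from step 1 (nmol/well) with hypothetical, perfectly linear A405 readings.
std_nmol = [0, 2, 4, 6, 8, 10, 20, 30, 40, 50]
std_a405 = [0.05 + 0.02 * n for n in std_nmol]
slope, intercept = linear_fit(std_nmol, std_a405)

sample_ul, buffer_ul = loading_volumes(protein_ug_per_ul=1.25)  # hypothetical BCA result
activity = alp_activity(a405=0.45, slope=slope, intercept=intercept, minutes=60)
print(f"{sample_ul:.1f} ul lysate + {buffer_ul:.1f} ul buffer; ALP = {activity:.1f} nmol/min/mg")
# prints: 20.0 ul lysate + 115.0 ul buffer; ALP = 13.3 nmol/min/mg
```

The incubation time entered should be the time at which the plate was read, because step 4 allows the reaction to run for up to 3 h.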


3.6. Strong Cation Exchange Liquid Chromatography of Peptides 

1. Digest 100 μg of ECM, CRMV, or MMV proteins in 25 mM NH4HCO3, pH 8.4, 

with trypsin using a trypsin-to-protein ratio of 1:40. For 100 μg of protein, add 

2.5 μg of trypsin. Incubate the digestion at 37°C overnight (see Note 7). 

2. Lyophilize the peptide digests in a vacuum centrifuge. 

3. Dissolve peptide digests in 100 μl of 25% (v/v) acetonitrile containing 0.1% (v/v) 

formic acid. 

4. Inject the peptides onto an SCX-LC column (1 × 150 mm, polysulfoethyl A). 

5. Maintain the flow rate of the column at 50 μl/min. Mobile phase A is 25% (v/v) 

acetonitrile, and mobile phase B is 25% (v/v) acetonitrile with 0.5 M ammonium 

formate (pH 3). 

6. Elute the peptides using the following 96-min gradient method: 3% B for 3 min, 

followed by a linear increase to 10% B in 43 min, a further increase to 45% B 

in 40 min, and then to 100% B in 10 min. Monitor the peptide separation by 

fluorescence (266 nm excitation/350 nm emission). Collect fractions every minute 

for 96 min (see Note 8). 

7. Based on the chromatogram, pool the adjacent fractions into a total of 20 fractions 

and lyophilize (see Notes 9 and 10). 

8. Resuspend each pooled fraction in 20 μl of 0.1% (v/v) formic acid prior to 

nanoRPLC-MS analysis. 
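The 96-min SCX program in step 6 is piecewise linear, so the %B at any time can be interpolated from its breakpoints. Because only mobile phase B contains ammonium formate (0.5 M), the salt concentration follows the same curve. The pooling function is a hypothetical heuristic, not the authors' procedure. It groups adjacent one-minute fractions so that each of the 20 pools carries roughly the same share of the integrated fluorescence signal, which is one way of pooling "based on the chromatogram" (step 7).

```python
from bisect import bisect_right

# (time in min, %B) breakpoints of the 96-min SCX gradient in step 6.
SCX_GRADIENT = [(0, 3), (3, 3), (46, 10), (86, 45), (96, 100)]
B_AMMONIUM_FORMATE_MM = 500.0  # mobile phase B contains 0.5 M ammonium formate


def percent_b(t_min, program=SCX_GRADIENT):
    """%B at time t_min, linearly interpolated between breakpoints."""
    times = [t for t, _ in program]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time outside the gradient program")
    i = bisect_right(times, t_min)
    if i == len(times):  # exactly at the last breakpoint
        return float(program[-1][1])
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def ammonium_formate_mm(t_min):
    """Ammonium formate concentration (mM) delivered at time t_min."""
    return percent_b(t_min) * B_AMMONIUM_FORMATE_MM / 100.0


def pool_fractions(signal, n_pools=20):
    """Group adjacent fractions into n_pools with roughly equal summed signal.

    Returns (first, last) 0-based fraction indices, inclusive, for each pool.
    """
    if n_pools > len(signal):
        raise ValueError("more pools than fractions")
    total = sum(signal)
    pools, start, cumulative = [], 0, 0.0
    for i, value in enumerate(signal):
        cumulative += value
        fractions_left = len(signal) - (i + 1)
        pools_left = n_pools - len(pools) - 1  # pools still to open after this one
        if pools_left == 0:
            continue  # the last pool takes all remaining fractions
        share_reached = cumulative >= total * (len(pools) + 1) / n_pools
        # Close early when each remaining pool can only get one fraction.
        if share_reached or fractions_left == pools_left:
            pools.append((start, i))
            start = i + 1
    pools.append((start, len(signal) - 1))
    return pools


trace = [1.0] * 96  # hypothetical flat fluorescence trace, one value per 1-min fraction
pools = pool_fractions(trace)
print(percent_b(66), ammonium_formate_mm(66), len(pools))  # 27.5 137.5 20
```

With a real chromatogram, crowded regions of the gradient end up split into more, narrower pools, and sparse regions into fewer, wider ones.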

3.7. Nanoflow Reversed-Phase Liquid Chromatography Tandem Mass Spectrometry 

1. Cut a 12-cm piece of 75 μm i.d. × 360 μm o.d. fused silica capillary column. Use 

a torch to briefly flame the section about 2 cm near one end. Once the flamed 

section is soft, pull the column to make a 10-cm long section with a closed tip. 

To make a fine and flat opening at the end of the tip, lightly score near the end 

of the closed tip using a ceramic cutter, and then break the end away. 

2. Connect the column to the slurry packer. Pack the column with 5 μm, 300 Å pore 

size C-18 silica-bonded stationary reversed-phase particles. 

3. Connect the column to an Agilent 1100 nanoLC system coupled to a linear ion trap (LIT) mass spectrometer (LTQ, Thermo Electron, operated with Xcalibur 1.4 SR1 software). 

4. Transfer the peptide fractions into glass vials. Inject 6 μl of the solution. 

5. Mobile phase A is 0.1% (v/v) formic acid and B is 0.1% (v/v) formic acid in 

acetonitrile. Elute the peptides using the following gradient method: 2% B at 

500 nl/min in 30 min; a linear increase of 2–42% B at 250 nl/min in 110 min; 

42–98% in 30 min including the first 15 min at 250 nl/min and then 15 min at 

500 nl/min; 98% at 500 nl/min for 10 min. 

6. Set the capillary temperature and electrospray voltage at 160°C and 1.5 kV, 

respectively. The LIT-MS is operated in a data-dependent MS/MS mode where 

the five most abundant peptide molecular ions in every MS scan are sequentially 

selected for collision-induced dissociation (CID) using a normalized collision


energy of 35%. Apply dynamic exclusion to minimize repeated selection of 

peptides previously selected for CID (see Notes 11 and 12). 
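Data-dependent acquisition with dynamic exclusion (step 6) can be illustrated with a small simulation. From each survey scan, the five most intense precursors that are not on the exclusion list are chosen for CID, and each chosen m/z is then excluded for a fixed time. This is a simplified Python sketch, not the LTQ/Xcalibur implementation. The m/z tolerance and exclusion duration are hypothetical because the protocol does not state them.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusion:
    """Exclusion list: recently fragmented precursors are skipped for a while."""
    mz_tolerance: float = 1.0   # hypothetical exclusion window, m/z units
    duration_s: float = 60.0    # hypothetical exclusion time, seconds
    entries: list = field(default_factory=list)  # (mz, expiry_time) pairs

    def is_excluded(self, mz, now):
        # Drop expired entries, then test whether mz falls inside any remaining window.
        self.entries = [(m, t) for m, t in self.entries if t > now]
        return any(abs(mz - m) <= self.mz_tolerance for m, _ in self.entries)

    def add(self, mz, now):
        self.entries.append((mz, now + self.duration_s))


def select_precursors(peaks, now, exclusion, top_n=5):
    """Pick up to top_n most intense (mz, intensity) peaks not currently excluded."""
    chosen = []
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == top_n:
            break
        if exclusion.is_excluded(mz, now):
            continue
        chosen.append(mz)
        exclusion.add(mz, now)
    return chosen


excl = DynamicExclusion()
scan = [(500.3, 9e5), (650.8, 8e5), (720.4, 7e5), (801.1, 6e5), (455.2, 5e5), (910.5, 4e5)]
print(select_precursors(scan, now=0.0, exclusion=excl))   # [500.3, 650.8, 720.4, 801.1, 455.2]
print(select_precursors(scan, now=5.0, exclusion=excl))   # [910.5]
print(select_precursors(scan, now=70.0, exclusion=excl))  # exclusions expired: same five as first
```

Without exclusion, the same abundant peptides would be fragmented in every scan. Exclusion frees the CID slots for less abundant ions, which is why it increases the number of distinct peptides sampled.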

3.8. Bioinformatic Analysis 

1. Search the tandem mass spectra against the UniProt proteomic database from 

the European Bioinformatics Institute (http://www.ebi.ac.uk/) with SEQUEST 

operating on a 40-node Beowulf cluster (SEQUEST Cluster version 3.1 SR1, 

Bioworks Browser 3.2). Limit the search to peptides generated with fully tryptic 

cleavage constraints. 

2. Set the criteria for legitimate peptide identification as follows: charge-state-dependent cross-correlation (Xcorr) scores of at least 1.9 for [M + H]⁺, 2.2 for [M + 2H]²⁺, and 3.1 for [M + 3H]³⁺ ions, and a minimum delta correlation (ΔCn) of 0.08. 

3. Base protein identification exclusively on unique peptide hits, i.e., peptides whose 

sequence is unique to a given protein (see Notes 13 and 14). 
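The acceptance criteria in steps 1–3 can be expressed as filters on a list of peptide-spectrum matches. The Python sketch below makes several assumptions. Each match is a dict with peptide, charge, xcorr, and delta_cn keys, which is a hypothetical layout and not the SEQUEST/Bioworks output format. The thresholds are treated as minimum values (≥). Uniqueness is decided from a hypothetical peptide-to-protein mapping. The fully tryptic check only illustrates the cleavage rule (after K or R, but not before P); in the protocol, the search engine applies that constraint itself.

```python
# Charge-dependent Xcorr minima and delta Cn minimum from step 2.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.1}
DELTA_CN_MIN = 0.08


def is_fully_tryptic(peptide, prev_aa, next_aa):
    """True if both termini follow trypsin specificity.

    prev_aa/next_aa are the flanking protein residues ('-' at a protein terminus).
    """
    n_ok = prev_aa == "-" or (prev_aa in "KR" and peptide[0] != "P")
    c_ok = next_aa == "-" or (peptide[-1] in "KR" and next_aa != "P")
    return n_ok and c_ok


def passes_filter(psm):
    """Apply the Xcorr/delta Cn criteria; charges other than 1+ to 3+ are rejected."""
    threshold = XCORR_MIN.get(psm["charge"])
    return (
        threshold is not None
        and psm["xcorr"] >= threshold
        and psm["delta_cn"] >= DELTA_CN_MIN
    )


def proteins_from_unique_peptides(psms, peptide_to_proteins):
    """Map accepted peptides to proteins, keeping only peptides unique to one protein."""
    evidence = {}
    for psm in psms:
        if not passes_filter(psm):
            continue
        proteins = peptide_to_proteins.get(psm["peptide"], set())
        if len(proteins) == 1:  # shared peptides are ignored
            (protein,) = proteins
            evidence.setdefault(protein, set()).add(psm["peptide"])
    return evidence


psms = [  # hypothetical matches
    {"peptide": "LVNEVTEFAK", "charge": 2, "xcorr": 3.0, "delta_cn": 0.25},
    {"peptide": "AEFVEVTK", "charge": 2, "xcorr": 2.0, "delta_cn": 0.30},  # Xcorr too low for 2+
    {"peptide": "SHAREDPEPTIDEK", "charge": 3, "xcorr": 3.5, "delta_cn": 0.20},
]
mapping = {  # hypothetical peptide -> protein accessions
    "LVNEVTEFAK": {"P1"},
    "AEFVEVTK": {"P1"},
    "SHAREDPEPTIDEK": {"P1", "P2"},
}
print(proteins_from_unique_peptides(psms, mapping))  # {'P1': {'LVNEVTEFAK'}}
```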

3.9. Immunofluorescence Staining 

1. Plate 50,000 cells on glass cover slips in 6-well plates. Culture in differentiation 

medium. 

2. On day 15, briefly wash the cells with PBS. 

3. Fix the cells in 3.7% (v/v) formaldehyde in PBS for 10 min. 

4. Incubate with 10% (v/v) normal blocking serum in PBS. 

5. Briefly wash the cells with PBS; incubate with primary antibodies for 1.5 h. 

6. Wash the cells three times with PBS for 5 min each, and then incubate with 

secondary antibodies conjugated with fluorochrome (FITC or Texas Red) for 1 h. 

7. Wash the cells three times with PBS for 5 min each, including once with DAPI 

diluted 1:50,000 in PBS to stain nuclei. 

8. Mount the cover slips on microscope glass slides with ProLong mounting reagent. 

9. Observe the cells using a confocal fluorescence microscope (see Note 15). 

4. Notes 

1. MC3T3-E1 pre-osteoblast cells are derived from newborn murine calvaria (28). These cells closely resemble primary cell cultures in their proliferation, differentiation, and mineralization (29,30,31). The combination of ascorbic acid and β-glycerophosphate stimulates MC3T3-E1 cells to differentiate, a process characterized by substantial matrix mineralization (32,33). MC3T3-E1 cells are therefore a suitable model for the enrichment of ECM and the isolation of MVs. 

2. It is necessary to culture multiple 10-cm plates (four or more, at approximately 4 × 10⁶ cells/plate) to obtain a sufficient amount of protein from the ECM or MVs. 

3. Protein quantitation is a common laboratory procedure. The instructions are 

included within the BCA assay kit (Pierce); therefore, the procedure is not 

described in this chapter.


4. Liberase/blendzyme 1 is a mixture of highly purified collagenase and dispase that offers gentler protease activity than other ECM-degrading enzymes. Note that four blendzyme mixtures of increasing enzymatic strength are available from Roche; blendzyme 1 is the mildest version. The digestion time varies depending on the cell or tissue type. Alternatively, collagenase/dispase (Sigma Chemical Co., St. Louis, MO) can be used at 1 mg/ml in PBS (containing collagenase, 0.1 U/ml, and dispase, 0.8 U/ml). The collagenase/dispase enzyme mixture is commonly used to digest the ECM. 

5. Two approaches are designed to isolate MVs either from the ECM or directly 

from the cell culture medium. In the first approach, enzymatic digestion and 

ultracentrifugation are combined to release MVs embedded in the ECM (designated 

as CRMVs). In the second approach, ultracentrifugation is applied to the 

medium to isolate MVs, designated as MMVs (34). To confirm the enrichment of 

MVs, the ultracentrifugation pellets are fixed and examined using transmission 

electron microscopy (Fig. 2). 

6. ALP enzymatic activity is a standard marker for MV isolation (35,36). 

7. Instead of using the buffer provided along with trypsin, it is desirable to 

resuspend trypsin in 25 mM NH4HCO3, pH 8.4. The trypsin-to-protein ratio 

should be between 1:40 and 1:50. The digestion mixture is incubated overnight 

(approximately 16 h). 

8. The laser-induced fluorescence (LIF) detector used in this method can be constructed in-house (37). The LIF detector is more sensitive than a conventional lamp-based fluorescence detector. The use of an LIF detector is particularly advantageous when a narrow 

bore column (


peptide is capable of identifying more proteins than the online procedure. Thus, 

the offline separation is described in this chapter. 

10. The pooling step is optional. The peptide fractions can be pooled based on the 

complexity of the chromatogram. In general, pooling to about 20 fractions is 

appropriate. It will save LC-MS running time without compromising the number 

of proteins that the approach can identify. 

11. In general, the MS data acquisition time is set to 150 min, starting 30 min after 

the beginning of the peptide elution gradient and synchronized to end with the 

elution gradient. 

12. An alternative approach: the resulting ECM, CRMV, or MMV protein samples 

can be resolved by SDS-PAGE and the proteins visualized by Coomassie 

staining. The protein bands that are of greater intensity than those prepared 

from undifferentiated cells can be excised and subjected to in-gel digestion with 

trypsin and analyzed using nanoRPLC-MS/MS (27). 

13. Proteins that are identified in both CRMV and MMV purifications can be 

considered as authentic MV proteins with a higher degree of confidence than 

those that were identified in only one of the preparations. 

14. Gene Ontology (GO) (www.geneontology.org) can be used to annotate the identified proteins and categorize them according to their cellular component, molecular function, and the biological processes with which they are associated. 

15. Known MV proteins are validated by Western blotting or immunofluorescence staining. Annexin V, a known constituent of MVs, is used as a protein landmark to locate vesicles in these experiments (38). The osteoblast cells can be double-stained with anti-annexin V and an additional antibody against either the extracellular protein emilin-1 or the Ras GTPase-activating-like protein IQGAP1 (27). 
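Note 13 amounts to a set comparison between the two MV preparations. A minimal Python sketch, using hypothetical accession identifiers:

```python
def confidence_tiers(crmv_proteins, mmv_proteins):
    """Split protein identifications by whether one or both MV preparations contain them."""
    crmv, mmv = set(crmv_proteins), set(mmv_proteins)
    return {
        "both": sorted(crmv & mmv),       # higher-confidence MV proteins (Note 13)
        "crmv_only": sorted(crmv - mmv),
        "mmv_only": sorted(mmv - crmv),
    }


# Hypothetical accession lists from the CRMV and MMV searches.
tiers = confidence_tiers(["P1", "P2", "P3"], ["P2", "P3", "P4"])
print(tiers)
# {'both': ['P2', 'P3'], 'crmv_only': ['P1'], 'mmv_only': ['P4']}
```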


Acknowledgments

This project has been funded in whole or in part with Federal funds from 

the National Cancer Institute, National Institutes of Health, under Contract No. 

N01-CO-12400. The content of this publication does not necessarily reflect 

the views or policies of the Department of Health and Human Services, nor 

does the mention of trade names, commercial products, or organization imply 

endorsement by the US Government. 

References 

1. Holmbeck, K. and Szabova, L. (2006) Aspects of extracellular matrix remodeling 

in development and disease. Birth Defects Res C Embryo Today 78, 11–23. 

2. Brooke, B. S., Karnik, S. K. and Li, D. Y. (2003) Extracellular matrix in vascular 

morphogenesis and disease: structure versus signal. Trends Cell Biol 13, 51–56. 

3. Tahinci, E. and Lee, E. (2004) The interface between cell and developmental 

biology. Curr Opin Genet Dev 14, 361–366.


4. Harada, S. and Rodan, G. A. (2003) Control of osteoblast function and regulation 

of bone mass. Nature 423, 349–355. 

5. Beck, G. R., Jr. (2003) Inorganic phosphate as a signaling molecule in osteoblast 

differentiation. J Cell Biochem 90, 234–243. 

6. Aubin, J. E. (2001) Regulation of osteoblast formation and function. Rev Endocr 

Metab Disord 2, 81–94. 

7. Anderson, H. C. (1995) Molecular biology of matrix vesicles. Clin Orthop Relat 

Res, 266–280. 

8. Anderson, H. C. (2003) Matrix vesicles and calcification. Curr Rheumatol Rep 5, 

222–226. 

9. Anderson, H. C., Garimella, R. and Tague, S. E. (2005) The role of matrix vesicles 

in growth plate development and biomineralization. Front Biosci 10, 822–837. 

10. Kirsch, T. (2005) Annexins – their role in cartilage mineralization. Front Biosci 

10, 576–581. 

11. Hessle, L., Johnson, K. A., Anderson, H. C., Narisawa, S., Sali, A., Goding, J. W., 

Terkeltaub, R. and Millan, J. L. (2002) Tissue-nonspecific alkaline phosphatase 

and plasma cell membrane glycoprotein-1 are central antagonistic regulators of 

bone mineralization. Proc Natl Acad Sci USA 99, 9445–9449. 

12. Johnson, K. A., Hessle, L., Vaingankar, S., Wennberg, C., Mauro, S., Narisawa, S., 

Goding, J. W., Sano, K., Millan, J. L. and Terkeltaub, R. (2000) Osteoblast tissue-nonspecific 

alkaline phosphatase antagonizes and regulates PC-1. Am J Physiol 

Regul Integr Comp Physiol 279, R1365–1377. 

13. Morris, D. C., Masuhara, K., Takaoka, K., Ono, K. and Anderson, H. C. (1992) 

Immunolocalization of alkaline phosphatase in osteoblasts and matrix vesicles of 

human fetal bone. Bone Miner 19, 287–298. 

14. Baldini, V., Mastropasqua, M., Francucci, C. M. and D’Erasmo, E. (2005) Cardiovascular 

disease and osteoporosis. J Endocrinol Invest 28, 69–72. 

15. Dao, H. H., Essalihi, R., Bouvet, C. and Moreau, P. (2005) Evolution and 

modulation of age-related medial elastocalcinosis: impact on large artery stiffness 

and isolated systolic hypertension. Cardiovasc Res 66, 307–317. 

16. Reynolds, J. L., Joannides, A. J., Skepper, J. N., McNair, R., Schurgers, L. J., 

Proudfoot, D., Jahnen-Dechent, W., Weissberg, P. L. and Shanahan, C. M. (2004) 

Human vascular smooth muscle cells undergo vesicle-mediated calcification in 

response to changes in extracellular calcium and phosphate concentrations: a 

potential mechanism for accelerated vascular calcification in ESRD. J Am Soc 

Nephrol 15, 2857–2867. 

17. Abedin, M., Tintut, Y. and Demer, L. L. (2004) Vascular calcification: mechanisms 

and clinical ramifications. Arterioscler Thromb Vasc Biol 24, 1161–1170. 

18. Tintut, Y. and Demer, L. L. (2001) Recent advances in multifactorial regulation 

of vascular calcification. Curr Opin Lipidol 12, 555–560. 

19. Stewart, D. A., Cooper, C. R. and Sikes, R. A. (2004) Changes in extracellular 

matrix (ECM) and ECM-associated proteins in the metastatic progression of 

prostate cancer. Reprod Biol Endocrinol 2, 2.


20. Yin, J. J., Pollock, C. B. and Kelly, K. (2005) Mechanisms of cancer metastasis to 

the bone. Cell Res 15, 57–62. 

21. Mundy, G. R. (2002) Metastasis to bone: causes, consequences and therapeutic 

opportunities. Nat Rev Cancer 2, 584–593. 

22. Roodman, G. D. (2004) Mechanisms of bone metastasis. N Engl J Med 350, 

1655–1664. 

23. Yates, J. R., III. (2004) Mass spectral analysis in proteomics. Annu Rev Biophys 

Biomol Struct 33, 297–316. 

24. Yates, J. R., III, Gilchrist, A., Howell, K. E. and Bergeron, J. J. (2005) Proteomics 

of organelles and large cellular structures. Nat Rev Mol Cell Biol 6, 702–714. 

25. Domon, B. and Aebersold, R. (2006) Mass spectrometry and protein analysis. 

Science 312, 212–217. 

26. Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature 

422, 198–207. 

27. Xiao, Z., Camalier, C. E., Nagashima, K., Chan, K. C., Lucas, D. A., de la 

Cruz, M. J., Gignac, M., Lockett, S., Issaq, H. J., Veenstra, T. D., Conrads, T. P. 

and Beck, G. R., Jr. (2006) Analysis of the extracellular matrix vesicle proteome in 

mineralizing osteoblasts. J Cell Physiol, In press. 

28. Sudo, H., Kodama, H. A., Amagai, Y., Yamamoto, S. and Kasai, S. (1983) In vitro 

differentiation and calcification in a new clonal osteogenic cell line derived from 

newborn mouse calvaria. J Cell Biol 96, 191–198. 

29. Choi, J. Y., Lee, B. H., Song, K. B., Park, R. W., Kim, I. S., Sohn, K. Y., 

Jo, J. S. and Ryoo, H. M. (1996) Expression patterns of bone-related proteins during 

osteoblastic differentiation in MC3T3-E1 cells. J Cell Biochem 61, 609–618. 

30. Quarles, L. D., Yohay, D. A., Lever, L. W., Caton, R. and Wenstrup, R. J. 

(1992) Distinct proliferative and differentiated stages of murine MC3T3-E1 cells 

in culture: an in vitro model of osteoblast development. J Bone Miner Res 7, 

683–692. 

31. Franceschi, R. T., Iyer, B. S. and Cui, Y. (1994) Effects of ascorbic acid on collagen 

matrix formation and osteoblast differentiation in murine MC3T3-E1 cells. J Bone 

Miner Res 9, 843–854. 

32. Beck, G. R., Jr, Sullivan, E. C., Moran, E. and Zerler, B. (1998) Relationship 

between alkaline phosphatase levels, osteopontin expression, and mineralization in 

differentiating MC3T3-E1 osteoblasts. J Cell Biochem 68, 269–280. 

33. Beck, G. R., Jr, Zerler, B. and Moran, E. (2001) Gene array analysis of osteoblast 

differentiation. Cell Growth Differ 12, 61–83. 

34. Johnson, K., Moffa, A., Chen, Y., Pritzker, K., Goding, J. and Terkeltaub, R. (1999) 

Matrix vesicle plasma cell membrane glycoprotein-1 regulates mineralization by 

murine osteoblastic MC3T3 cells. J Bone Miner Res 14, 883–892. 

35. Ali, S. Y., Sajdera, S. W. and Anderson, H. C. (1970) Isolation and characterization 

of calcifying matrix vesicles from epiphyseal cartilage. Proc Natl Acad Sci USA 

67, 1513–1520. 

36. Dean, D. D., Schwartz, Z., Bonewald, L., Muniz, O. E., Morales, S., Gomez, R., 

Brooks, B. P., Qiao, M., Howell, D. S. and Boyan, B. D. (1994) Matrix vesicles


produced by osteoblast-like cells in culture become significantly enriched in 

proteoglycan-degrading metalloproteinases after addition of beta-glycerophosphate 

and ascorbic acid. Calcif Tissue Int 54, 399–408. 

37. Chan, K. C., Muschik, G. M. and Issaq, H. J. (2000) Solid-state UV laser-induced 

fluorescence detection in capillary electrophoresis. Electrophoresis 21, 2062–2066. 

38. Wang, W., Xu, J. and Kirsch, T. (2005) Annexin V and terminal differentiation of 

growth plate chondrocytes. Exp Cell Res 305, 156–165.

IV 

Clinical Proteomics and Antibody Arrays

14 

Miniaturized Parallelized Sandwich Immunoassays 

Hsin-Yun Hsu, Silke Wittemann, and Thomas O. Joos 

Summary 

This chapter describes the development and use of bead-based miniaturized multiplexed sandwich immunoassays for focused protein profiling. Bead-based protein arrays, or suspension microarrays, allow simultaneous analysis of a variety of parameters within a single experiment. In suspension microarrays, capture antibodies are coupled to color-coded microspheres. 

Applications of suspension microarrays are described for analyzing proteins in different types of body fluids, such as serum or plasma and cerebrospinal, pleural, and synovial fluids, as well as in cell culture supernatants. The chapter covers the generation of suspension microarrays, sample preparation, processing of suspension microarrays, validation of analytical performance, and, finally, pattern generation using bioinformatics tools. 

Key Words: suspension microarray; microspheres; immunoassay; protein profiling; 

biological fluids; serum; pleura; cell culture supernatants; cerebrospinal fluid; synovial 

fluid. 


1. Introduction

Protein microarray technology allows the simultaneous determination of a large variety of analytes from a minute amount of sample within a single experiment. Assay systems based on this technology are currently used for the identification and quantitation of proteins. Protein microarray technology is of major interest for proteomic research in basic and applied biology as well as for diagnostic applications. Miniaturized and parallelized assay systems have reached adequate sensitivity and therefore have the potential to replace singleplex analysis systems. 




248 Hsu et al. 

Besides the well-known planar microarray-based systems, which are well suited to screening a large number of target proteins, bead-based systems, termed suspension assays, are an attractive alternative, especially when the number of parameters of interest is comparatively low. Suspension assay systems employ color-coded or size-coded microspheres as the solid support for capture molecules. A flow cytometer, which identifies each individual bead type and quantifies the amount of target captured on each individual bead, is used as the readout system. In the first step, antigen-specific capture antibodies are immobilized on the individual bead types. The different bead types are then combined and incubated with the sample of interest. A labeled secondary antibody detects the captured analytes and is visualized with a fluorescent reporter system. Sensitivity, reliability, and accuracy are similar to those observed with standard microtiter ELISA procedures (1). Color-coded microspheres can be used to perform up to a hundred different assays simultaneously. The flow cytometer identifies several thousand microspheres per second while quantifying the amount of captured analytes (2,3,4,5,6). With respect to automation and throughput, suspension microarrays are currently well advanced relative to other miniaturized multiplexed ligand-binding assays (7). 

Miniaturized parallelized assay systems must demonstrate appropriate sensitivity, precision, and reliability before they can be applied for screening or diagnostic purposes. 

This chapter describes the development and use of suspension antibody microarrays for protein profiling of several human body fluids. Standard methodological guidance for validating immunoassays (10,11,12) is described and used to determine the sensitivity, precision, and accuracy of the multiplexed analysis. The final section describes how to analyze high-dimensional data sets (13,14). 

2. Materials 

2.1. Equipment 

1. Centrifuge: 5415D (Eppendorf) 

2. Vortex Mixer (Neolab) 

3. Ultrasonic bath 

4. Thermomixer (Eppendorf) 

5. Luminex100 instrument (Luminex Corp.) 

6. Vacuum manifold (Millipore) 

7. Filter plates (Millipore 96-well plate, cat. # MAB1250) 

8. Microcentrifuge tubes (Starlab 1.5 ml, cat. # I1415-2500) 

9. Carboxylated Beads (Qiagen, cat. # 922400 or Luminex Corp.) 

10. Deionized water


2.2. Common Reagents and Materials 

1. Bovine serum albumin (BSA, Roth T844.2) 

2. PBS (Fischer Scientific, cat. # 9472615) 

3. EDC, 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (Pierce) 

4. Sulfo-NHS, N-hydroxysulfosuccinimide (Pierce) 

5. Detection reagent: Streptavidin-phycoerythrin (Streptavidin-PE) stock solution 

(1 mg/ml) in 100 mM NaCl, 100 mM sodium phosphate, pH 7.5, containing 

2 mM sodium azide (Molecular Probes, cat. #S21388) 

2.3. Buffers 

1. Activation buffer [100 mM sodium phosphate (Na2HPO4), pH 6.2] 

2. Coupling buffer (50 mM MES, pH 5.0) 

3. Washing buffer [PBS, pH 7.4, containing 0.05% (v/v) Tween-20] 

4. Blocking/storage (B/S) buffer: 1% BSA fraction IV (Roth, cat. # T844.2) in 1× 

PBS 

5. Assay buffer formulation: 1% BSA fraction IV in 1×PBS 

3. Methods 

3.1. Principle 

The principle of suspension antibody microarrays is based on sandwich immunoassays, as represented in Fig. 1. First, capture antibodies are coupled to carboxylated microspheres. The samples are then incubated with the coupled microspheres, and bound analytes are detected with biotinylated antibodies. Phycoerythrin-labeled streptavidin is used for signal detection. Finally, a flow cytometer identifies the microspheres, thereby allowing the captured analytes to be quantified. 

3.2. Production of Suspension Microarrays: Antibody Coupling to Carboxylated Microspheres (see Note 1) 

Using established carbodiimide coupling chemistry, the antibodies are covalently immobilized on carboxylated beads via their primary amine groups, mainly those of lysine side chains. Before coupling, the beads are activated with EDC/Sulfo-NHS. 

Fig. 1. Processing of suspension microarrays. Schematic representation of the steps 

required for performing a suspension microarray immunoassay. Figure reproduced from 

Proteomics of Human Body Fluids: Principles, Methods and Applications, edited by 

Thongboonkerd (2006). 


The antibody solution should be free of foreign proteins, azide, and reagents containing primary amine groups, such as glycine or Tris. Otherwise, the antibodies must be purified by gel-filtration chromatography or dialysis before use. 

3.2.1. Bead Activation 

1. Sonicate the carboxylated bead stock suspension for 15–20 s to yield a homogeneous 

bead suspension. Thoroughly vortex the bead stock suspension for at least 

10 s. Take 2.5 × 10⁶ beads per coupling reaction. 

2. Transfer the bead suspension to a Starlab microcentrifuge tube. 

3. Briefly centrifuge the bead suspension (a quick spin up to 3000×g is sufficient) 

and discard the supernatant. 

4. Wash the beads with 80 μl activation buffer. Briefly vortex and centrifuge at 

10,000×g for 2 min. Discard the supernatant and repeat washing. 

5. Resuspend the beads in 80 μl activation buffer. Sonicate for 15–20 s to yield a 

homogeneous bead suspension. 

6. Freshly prepare EDC solution (50 mg/ml) and Sulfo-NHS solution (50 mg/ml) 

(see Notes 2 and 3). 

7. Add 10 μl of EDC solution and 10 μl of Sulfo-NHS solution to the bead suspension. 

Incubate for 20 min at room temperature (15–25°C) in the dark. 

3.2.2. Coupling of Antibodies to Activated Carboxylated Beads 

8. Dilute the protein stock solution with coupling buffer to a concentration of 

100 μg/ml in a volume of 500 μl. 

9. Centrifuge the beads at 10,000×g for 2 min and discard the supernatant. 

10. Wash the beads with 500 μl of coupling buffer. Briefly vortex and centrifuge at 


11. Add the diluted antibody solution (500 μl) from step 8. 

12. Wrap the tube in aluminum foil to exclude light. Gently agitate the tube with 

activated beads and antibody solution on a plate shaker for 2 h at room temperature (15–25°C). 

3.2.3. Washing and Storage of Coupled Carboxylated Beads 

13. Centrifuge the beads at 10,000×g for 2 min and carefully remove and discard 

the supernatant. 

14. Wash the beads with 500 μl of washing buffer. Briefly vortex and centrifuge at 


15. Resuspend the bead pellet in 1 ml B/S buffer including 0.05% (w/v) azide. 

16. Determine the bead concentration of the suspension using a cell-counting 

chamber.
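The reagent quantities in steps 1–16 imply a few derived values that are useful when scaling the coupling reaction. Each activation reaction receives 0.5 mg each of EDC and Sulfo-NHS in 100 μl (5 mg/ml final), and 50 μg of antibody is offered to 2.5 × 10⁶ beads (20 μg per 10⁶ beads). The Python sketch below reproduces this arithmetic. The antibody stock concentration is hypothetical. The molecular weights are the values commonly listed for EDC hydrochloride and Sulfo-NHS sodium salt and should be checked against the vendor's data sheet.

```python
# Molecular weights (g/mol) commonly listed for EDC hydrochloride and Sulfo-NHS sodium salt;
# confirm against the vendor's data sheet.
MW_EDC_HCL = 191.70
MW_SULFO_NHS = 217.13


def final_conc(stock_conc, added_ul, total_ul):
    """Concentration after adding added_ul of stock into a total volume of total_ul."""
    return stock_conc * added_ul / total_ul


def antibody_dilution(stock_ug_per_ml, target_ug_per_ml=100.0, final_ul=500.0):
    """Volumes (ul) of antibody stock and coupling buffer for step 8."""
    stock_ul = target_ug_per_ml * final_ul / stock_ug_per_ml
    if stock_ul > final_ul:
        raise ValueError("antibody stock is more dilute than the 100 ug/ml target")
    return stock_ul, final_ul - stock_ul


# Steps 5-7: 80 ul beads in activation buffer + 10 ul EDC + 10 ul Sulfo-NHS (50 mg/ml each).
activation_ul = 80 + 10 + 10
edc_mg_ml = final_conc(50.0, 10, activation_ul)
edc_mm = edc_mg_ml / MW_EDC_HCL * 1000  # mg/ml equals g/l; (g/l) / (g/mol) * 1000 -> mM
nhs_mm = final_conc(50.0, 10, activation_ul) / MW_SULFO_NHS * 1000

# Step 8 with a hypothetical 1 mg/ml (1000 ug/ml) antibody stock.
stock_ul, buffer_ul = antibody_dilution(stock_ug_per_ml=1000.0)
ug_per_million_beads = (100.0 * 500.0 / 1000.0) / 2.5  # 50 ug offered to 2.5e6 beads

print(f"EDC {edc_mg_ml} mg/ml (~{edc_mm:.0f} mM), Sulfo-NHS ~{nhs_mm:.0f} mM")
print(f"antibody: {stock_ul} ul stock + {buffer_ul} ul coupling buffer")
print(f"{ug_per_million_beads} ug antibody per 10^6 beads")
```

Keeping the antibody-to-bead ratio constant is what allows the reaction to be scaled up or down while giving comparable bead loading.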


3.2.4. Counting Beads Using a Cell-Counting Chamber 

1. Add 5 μl of beads to 45 μl of PBS and mix (a 1:10 dilution). 

2. Fill the hemacytometer with 10 μl of the diluted sample by placing the pipette tip against the loading "V" of the hemacytometer at a 45° angle and slowly releasing the sample between the slide and the cover slip until the counting chamber is loaded. It is important to fill both sides of the chamber and to wait 2–3 min to allow the beads to settle. 

3. Count the beads in two opposite corner squares of the ruled chamber and take the average. Each of the nine squares on the grid has an area of 1 mm², and the cover glass rests 0.1 mm above the floor of the chamber. Thus, the volume over each square is 0.1 mm³, or 0.1 μl (10⁻⁴ ml). Multiply the average number of beads per square by 10,000 to obtain the number of beads per milliliter of diluted sample, and then multiply by the dilution factor of 10 to obtain beads/ml of the bead suspension. 

4. Store the beads at 25×, typically 5 × 10⁶ beads/ml. 
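The hemacytometer arithmetic in step 3, and the dilution needed to deliver 1500 beads in 25 μl per well (Subheading 3.3.4, step 6), can be sketched in Python as follows. Two choices here are hypothetical and not part of the protocol: the 10% volume overage, and the assumption that 1500 beads refers to each bead type in a multiplex.

```python
def beads_per_ml(square_counts, dilution_factor=10):
    """Bead concentration of the suspension from hemacytometer counts of 1-mm^2 squares."""
    mean_count = sum(square_counts) / len(square_counts)
    # Each square covers 1 mm^2 x 0.1 mm = 0.1 mm^3 = 0.1 ul = 1e-4 ml, hence x 10,000.
    return mean_count * 1e4 * dilution_factor


def working_suspension(stock_conc, n_wells, beads_per_well=1500,
                       ul_per_well=25.0, overage=1.1):
    """Volumes (ul) of each bead stock and of assay buffer for n_wells.

    stock_conc maps a (hypothetical) bead-type name to its concentration in beads/ml;
    beads_per_well is assumed to apply to every bead type in the multiplex.
    """
    total_ul = n_wells * ul_per_well * overage
    stock_ul = {
        bead: n_wells * beads_per_well * overage / conc * 1000.0  # ml -> ul
        for bead, conc in stock_conc.items()
    }
    buffer_ul = total_ul - sum(stock_ul.values())
    if buffer_ul < 0:
        raise ValueError("bead stocks are too dilute for this working volume")
    return stock_ul, buffer_ul


conc = beads_per_ml([48, 52])  # hypothetical counts from two opposite corner squares
print(f"stock: {conc:.2e} beads/ml")  # stock: 5.00e+06 beads/ml
stocks, buffer_ul = working_suspension({"bead_A": conc, "bead_B": conc}, n_wells=96)
print({k: round(v, 2) for k, v in stocks.items()}, round(buffer_ul, 1))
```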

3.3. Processing of Bead-Based Multiplex Assays 

3.3.1. Sample Preparation 

This section describes the preparation of clinical specimens and cell culture samples for use in multiplexed assays. Subheading 3.3.1.1 describes the use of serum or plasma, Subheading 3.3.1.2 the analysis of proteins present in cell culture supernatants, and Subheading 3.3.1.3 the preparation of cerebrospinal, synovial, and pleural fluid samples. 

3.3.1.1. Serum or Plasma Samples 

Serum and plasma samples should be centrifuged (8000×g) before the assay to remove particulates and lipid layers; this prevents clogging of the filter plate and the sample needle. The samples should be handled as biohazards, since they may carry infectious agents. Freeze-thaw cycles may cause measurable breakdown of some proteins (e.g., cytokines), so the samples should be aliquoted before any experiment, and the aliquots should be stored at –80°C. When we analyzed eight matched serum and plasma samples on the Luminex platform, a freeze-thaw cycle caused no differences in the levels of TNF, Eotaxin, IL-13, MCP-1, IFN, IL-12p70, MIP-1, IP-10, or GM-CSF. There was, however, a significant increase in IL-1 after freeze-thaw, suggesting that this process may liberate IL-1 from insoluble receptors. IL-1 and MCP-1 levels were significantly higher in plasma than in the matched serum samples, whereas IP-10 was higher in serum. Figure 2 shows a freeze-thaw experiment evaluating a 10-plex soluble receptor assay. The signal from some analytes appeared slightly decreased after the freeze-thaw cycle; however, no statistically significant differences were


(Fig. 2 plot: MFI on a log scale from 1 to 10,000 for fresh and thawed serum samples, by analyte: gp130, ICAM, Fas, TNFRII, VCAM, IL-2R, E-sel, TNFRI, RAGE, and MIF.)

Fig. 2. Serum samples were drawn from three healthy donors. Each sample was divided into two parts: one part was measured directly after the serum was taken, and the other part was subjected to a freeze-thaw cycle. Soluble receptors were analyzed using Luminex technology. There were no significant differences in MFI signals attributable to the freeze-thaw cycle. 

observed. Another important consideration in analyzing serum or plasma 

samples is the need for an appropriate buffer (described in Subheading 3.3.2). 

3.3.1.2. Cell Culture Samples 

Before use, the cell culture supernatants should be centrifuged at 14,000×g to remove any particulates. The supernatants can be diluted in the corresponding cell culture medium. As with serum samples, cell culture supernatants should be aliquoted and frozen at –80°C until used in an experiment. 

3.3.1.3. Cerebrospinal, Synovial, and Pleural Fluids 

Precious samples of limited volume, such as cerebrospinal fluid (CSF) and synovial fluid, are ideal candidates for multiplex analysis. Animal serum should be added to synovial fluid to block binding of heterophilic antibodies and rheumatoid factor (RF), which can cause false-positive results. For cytokine assays, the samples may be filtered through a 50-kDa cutoff filter to remove the interfering antibodies. Another recently described method uses protein L to remove RF from serum (8). CSF samples have been analyzed for 22 cytokines on the Luminex platform, and 11 of these cytokines were detected (9). Those authors performed spike-recovery experiments and described the recoveries as good. 
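Spike recovery is conventionally calculated as the increase in measured concentration divided by the amount of analyte added. Here is a minimal Python sketch. The 80–120% acceptance window and the example concentrations are hypothetical placeholders, because acceptance criteria should come from each laboratory's own validation plan.

```python
def spike_recovery(spiked_measured, unspiked_measured, spike_added):
    """Percent recovery of a known amount of analyte spiked into a sample.

    All three values must be in the same units (e.g., pg/ml in the final sample).
    """
    if spike_added <= 0:
        raise ValueError("spike amount must be positive")
    return (spiked_measured - unspiked_measured) / spike_added * 100.0


def acceptable(recovery_pct, low=80.0, high=120.0):
    # Placeholder acceptance window, not a value taken from this chapter.
    return low <= recovery_pct <= high


# Hypothetical CSF example: 12 pg/ml endogenous, 100 pg/ml spiked, 104 pg/ml measured.
rec = spike_recovery(104.0, 12.0, 100.0)
print(f"recovery = {rec:.0f} %, acceptable: {acceptable(rec)}")  # recovery = 92 %, acceptable: True
```

Low recovery suggests that something in the sample matrix suppresses the signal, such as binding proteins or interfering antibodies. This is why the choice of diluent, discussed next, matters.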


3.3.2. Diluent 

It is important that the diluents selected for reconstitution and dilution of the standards reflect the environment of the samples being measured. Diluents for specific sample types have to be validated prior to use. For analyzing cell culture samples, the standards and samples are diluted in the respective cell culture medium. It is important to use the same lot of fetal bovine serum (FBS) throughout, because lots can differ significantly and these differences can interfere with the assay. The pH of the sample should also be checked, because it affects antibody binding. For assaying serum samples, each laboratory should develop and validate an appropriate diluent. We suggest starting with PBS supplemented with 10–50% animal serum (e.g., fetal calf, horse, or goat serum) or depleted human serum. The goal is to mimic the serum matrix to ensure similar binding kinetics in serum samples and standards. The serum samples may also need to be diluted in a diluent containing small amounts of serum to prevent false positives, as some human antibodies may react with the mouse capture antibodies. Generally, 1–2% serum from each species is sufficient. The serum diluent must not be used to dilute the detection antibody or the streptavidin-PE. 

3.3.3. Detection Antibody 

The concentration of detection antibody used can be varied to create 

an immunoassay with different sensitivity and dynamic range. The authors 

typically use detection antibody at a concentration between 0.5 μg/ml and 

1.0 μg/ml. Optimization is necessary. The quantitative range of the assay can 

be shifted by changing the antibody concentration. The dilution of the detection 

antibody shifts the standard curve to the lower concentration range, whereas an 

increased concentration shifts the curve to the higher concentration range. 

3.3.4. General Protocol for Processing Bead-Based Multiplex Assays for 

the Determination of Proteins in Human 

1. Centrifuge the sample at 14,000×g to pellet any particulates before diluting it in the appropriate diluent. The dilution factors will vary depending on the sample type and the analyte concentration.

2. Reconstitute the standard in the appropriate diluent and prepare an eight-point standard curve using twofold serial dilutions.

3. Wet the filter plate with 100 μl assay buffer.

4. Plate loading: Add 50 μl of the standard or sample to each well.

5. Sonicate the coupled beads for 15–20 s to yield a homogeneous suspension. 

Thoroughly vortex the beads for at least 10 s. 

6. Dilute the beads to 1500 beads per well, and add 25 μl of diluted bead suspension 

to each well.


7. Incubate for 2 h in the dark at room temperature (see Note 4).

8. Washing: Apply the vacuum manifold to the bottom of the filter plate to remove the liquid. Wash by adding 100 μl of assay buffer and removing it by vacuum; repeat the wash twice. Resuspend the beads in 75 μl of assay buffer.

9. Add 25 μl of the detection antibody solution to each well. 

10. Incubate for 1.5 h in the dark at room temperature. 




12. Add 25 μl of Streptavidin-Phycoerythrin solution to each well. 

13. Incubate for 0.5 h in the dark at room temperature. 




15. Incubate on a plate shaker for 1 min. 

16. Read the results on a Luminex 100 instrument.

17. Data evaluation: We recommend calculating the sample concentrations by interpolation from a four- or five-parameter logistic (4-PL or 5-PL) standard curve.
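Step 17 can be illustrated with a minimal Python sketch. It back-calculates a concentration from a sample's MFI by inverting a 4-PL curve, then multiplies by the dilution factor used in step 1. The sketch assumes the four parameters have already been fitted to the standards with curve-fitting software (for example `scipy.optimize.curve_fit`). The function names and parameter values are hypothetical. A 5-PL fit adds an asymmetry parameter and would need a correspondingly modified inverse.

```python
def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """4-PL model: expected signal (MFI) at analyte concentration x."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y: float, a: float, b: float, c: float, d: float) -> float:
    """Concentration that gives signal y on a fitted 4-PL curve."""
    low, high = min(a, d), max(a, d)
    # No solution exists outside the asymptotes; such samples are usually
    # re-assayed at another dilution instead of being extrapolated.
    if not low < y < high:
        raise ValueError("signal outside the fitted curve; cannot interpolate")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


def sample_concentration(mfi: float, params: dict, dilution_factor: float) -> float:
    """Interpolated in-well concentration, corrected for the sample dilution (step 1)."""
    return inverse_four_pl(mfi, **params) * dilution_factor


# Hypothetical fitted parameters (MFI vs. pg/ml), for illustration only:
# a = MFI at zero concentration, b = slope, c = inflection point, d = MFI plateau.
PARAMS = {"a": 20.0, "b": 1.2, "c": 500.0, "d": 12000.0}

if __name__ == "__main__":
    well_mfi = four_pl(250.0, **PARAMS)  # signal a 250 pg/ml well would give
    print(sample_concentration(well_mfi, PARAMS, dilution_factor=10))  # about 2500 pg/ml
```

Interpolating only between the asymptotes, instead of extrapolating, keeps reported values inside the calibrated part of the curve.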

3.3.5. Screening Protocol: 10plex Soluble Receptor Assay for Serum 

Samples 

1. Reconstitute the standard in the appropriate diluent and prepare an eight-point standard curve using twofold serial dilutions.

2. Block the plate with 100 μl B/S buffer (1% BSA in PBS).

3. Beads: use 1500 beads of each color code per well.

4. Prepare an eight-point standard row mixture in 10% horse serum in B/S buffer by 1:2 serial dilutions. The highest concentrations used in the standard curves are shown in the following table:

Molecule      Highest standard concentration (ng/mL)
IL-2R         2
E-Selectin    6
ICAM          5
Fas           1
gp130         2
TNFRI         0.8
TNFRII        1.5
RAGE          2
VCAM          5
MIF           4

5. Prepare the samples by 1:10 dilution in B/S buffer. 

6. Add 30 μl beads and 30 μl sample (or standard) into the wells. 

7. Incubate and shake for 1.5 h at room temperature. 

8. Wash 3×, each time with 100 μl PBS. 

9. Prepare the detection antibody mixture in B/S buffer as shown below:

Detection antibody    Concentration (μg/mL)
Anti-IL-2R            0.4
Anti-E-Selectin       1
Anti-ICAM             0.4
Anti-Fas              0.4
Anti-gp130            1
Anti-TNFRI            1
Anti-TNFRII           0.6
Anti-RAGE             0.8
Anti-VCAM             0.8
Anti-MIF              0.8

10. Add 30 μl detection antibody mixture to each well, then incubate with shaking for 1 h at room temperature.

11. Wash 3×, each time with 100 μl PBS.

12. Prepare Streptavidin-PE solution (5 μg/mL) in B/S buffer and pipette 30 μl to 

each well. 

13. Incubate and shake for 30 min at room temperature. 

14. Wash 3×, each time with 100 μl PBS. 

15. Resuspend the beads in 100 μl B/S buffer. 

16. Read the plate on a Luminex 100 instrument.
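The eight-point, 1:2 standard series in step 4 can be tabulated in code when setting up the plate map. The sketch below takes the top concentrations from the step 4 table and halves them seven times. It also converts the values to pg/ml to match the concentration axis of Fig. 3. The dictionary and function names are illustrative, not part of the published protocol.

```python
# Highest standard concentrations (ng/ml) from the table in step 4.
TOP_STANDARD_NG_PER_ML = {
    "IL-2R": 2.0, "E-Selectin": 6.0, "ICAM": 5.0, "Fas": 1.0, "gp130": 2.0,
    "TNFRI": 0.8, "TNFRII": 1.5, "RAGE": 2.0, "VCAM": 5.0, "MIF": 4.0,
}


def serial_dilution(top: float, points: int = 8, factor: float = 2.0) -> list[float]:
    """Concentrations of a serial dilution series starting at `top`."""
    return [top / factor ** i for i in range(points)]


def standard_series_pg_per_ml() -> dict[str, list[float]]:
    """Eight-point 1:2 series for every analyte, converted from ng/ml to pg/ml."""
    return {
        name: [conc * 1000.0 for conc in serial_dilution(top)]
        for name, top in TOP_STANDARD_NG_PER_ML.items()
    }


if __name__ == "__main__":
    for name, series in standard_series_pg_per_ml().items():
        print(f"{name:<11}" + " ".join(f"{c:9.2f}" for c in series))
```

With eight twofold steps, each curve spans a 128-fold range. For example, IL-2R runs from 2000 down to 15.625 pg/ml.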

3.4. Validation of Analytical Performance of Miniaturized 

Multiplexed Protein Assays 

3.4.1. Accuracy 

Accuracy is expressed by the closeness of the measured value to the true value. It should be assessed using a minimum of five determinations over a minimum of three concentrations across the expected range of the assay. A deviation of up to 15% between the measured and the true value is acceptable. Several methods for estimating accuracy are available:

1. By comparing the measured analyte values with those of reference data.

2. By adding known quantities of the analyte to an appropriate test matrix (e.g., serum, plasma). The recovery is then expressed as the measured analyte concentration relative to the expected concentration, i.e., the background analyte concentration of the matrix plus the added analyte concentration:

$$\text{Recovery (\%)} = \frac{C_{\text{measured}}}{C_{\text{background}} + C_{\text{added}}} \times 100$$

where $C_{\text{measured}}$ is the measured analyte concentration, $C_{\text{background}}$ the background analyte concentration in the test matrix, and $C_{\text{added}}$ the added analyte concentration.
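The spike-recovery formula and the 15% accuracy criterion above can be written as two small helper functions. This is a sketch only. The function names and the example spike values are hypothetical.

```python
def recovery_percent(measured: float, background: float, added: float) -> float:
    """Spike recovery (%) = measured / (background + added) * 100."""
    expected = background + added
    if expected <= 0:
        raise ValueError("expected concentration must be positive")
    return measured / expected * 100.0


def within_accuracy_limit(measured: float, true_value: float, limit_pct: float = 15.0) -> bool:
    """True if the measured value deviates from the true value by at most limit_pct %."""
    deviation_pct = abs(measured - true_value) / true_value * 100.0
    return deviation_pct <= limit_pct


if __name__ == "__main__":
    # Hypothetical spike: matrix background 40 pg/ml, 200 pg/ml added, 228 pg/ml measured.
    rec = recovery_percent(measured=228.0, background=40.0, added=200.0)
    print(f"recovery = {rec:.1f}%")                 # recovery = 95.0%
    print(within_accuracy_limit(228.0, 240.0))       # 5% deviation -> True
```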

3.4.2. Selectivity 

Selectivity can be assessed by performing cross-reactivity experiments in which the multiplex assay is run with each standard added separately. A signal appearing only on the bead set for the corresponding analyte confirms that each capture antibody is selective for its own analyte in the assay.

3.4.3. Specificity 

Specificity is defined by the ability of an assay to measure unequivocally the 

amount of an analyte in the presence of interfering substances. Non-specificity 

might be derived from cross-reactivity of the antibody used in the assay with 

other proteins or antibodies present in the sample.


3.4.4. Precision 

Precision is expressed by the closeness of agreement between a series of repeated measurements. It should be assessed using a minimum of five determinations over a minimum of three concentrations across the expected range of the assay. The coefficient of variation (CV) of the replicate measurements should not exceed 15%.

3.4.4.1. Repeatability 

Intra-assay precision, or repeatability, expresses the precision under constant conditions. The measurements are performed within 1 day by the same analyst using identical reagents and the same instruments.

3.4.4.2. Reproducibility 

Inter-assay precision, or reproducibility, expresses the precision by changing 

the measurement conditions, which may involve different analysts, reagents, 

instruments, and laboratories. 
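The sketch below shows one simple way to summarize replicate data for repeatability and reproducibility. It averages the within-run CVs for intra-assay precision and takes the CV of the run means for inter-assay precision, then compares both with the 15% limit. This convention is an assumption made for illustration. More formal variance-component (ANOVA) estimates are also used, and the names and data are hypothetical.

```python
import statistics


def cv_percent(values: list[float]) -> float:
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def intra_assay_cv(runs: list[list[float]]) -> float:
    """Repeatability: mean of the within-run CVs (one replicate list per run)."""
    return statistics.mean([cv_percent(run) for run in runs])


def inter_assay_cv(runs: list[list[float]]) -> float:
    """Reproducibility: CV of the run means across independent runs."""
    return cv_percent([statistics.mean(run) for run in runs])


if __name__ == "__main__":
    # Hypothetical MFI replicates (five per run) for one control concentration.
    runs = [
        [1010.0, 985.0, 1032.0, 998.0, 1004.0],
        [1056.0, 1040.0, 1071.0, 1049.0, 1063.0],
        [962.0, 990.0, 975.0, 981.0, 969.0],
    ]
    for label, cv in (("intra-assay", intra_assay_cv(runs)), ("inter-assay", inter_assay_cv(runs))):
        verdict = "pass" if cv <= 15.0 else "fail"
        print(f"{label} CV = {cv:.1f}% ({verdict} at 15%)")
```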

3.4.5. Limits of Detection and Quantitation (see Note 5) 

3.4.5.1. Detection Limit 

The limit of detection (LOD) is the lowest amount of analyte in a sample 

that can be detected but not quantitated as an exact value. According to the IUPAC definition (2), the limit of detection is estimated as the mean of the zero standard signal plus three times the standard deviation (SD) obtained on the zero standard signal:

$$\mathrm{LOD} = \mathrm{Mean}_{\text{zero standard}} + 3 \times \mathrm{SD}_{\text{zero standard}}$$

3.4.5.2. Quantitation Limit 

The limit of quantitation (LOQ) is the lowest amount of analyte in a sample 

that can be quantitated with acceptable statistical significance. According to 

the IUPAC definition, the limit of quantitation is estimated as the mean of the zero standard signal plus 10 times the SD obtained on the zero standard signal:

$$\mathrm{LOQ} = \mathrm{Mean}_{\text{zero standard}} + 10 \times \mathrm{SD}_{\text{zero standard}}$$
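Both limits follow directly from replicate readings of the zero standard, as the minimal sketch below shows. The results are in signal (MFI) units. Converting them to concentrations by interpolation on the fitted standard curve is a separate step, assumed here and not shown. The function name and blank readings are hypothetical.

```python
import statistics


def detection_limits(zero_signals: list[float]) -> tuple[float, float]:
    """Return (LOD, LOQ) in signal units from replicate zero-standard readings.

    LOD = mean + 3 * SD and LOQ = mean + 10 * SD of the zero standard.
    """
    if len(zero_signals) < 2:
        raise ValueError("need at least two zero-standard replicates")
    mean = statistics.mean(zero_signals)
    sd = statistics.stdev(zero_signals)
    return mean + 3 * sd, mean + 10 * sd


if __name__ == "__main__":
    # Hypothetical MFI readings of the zero standard (blank) wells.
    blanks = [18.0, 21.0, 19.5, 22.0, 20.5, 19.0]
    lod, loq = detection_limits(blanks)
    print(f"LOD signal = {lod:.1f} MFI, LOQ signal = {loq:.1f} MFI")
```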

3.4.6. Linearity 

Linearity is defined as the ability of an analytical procedure to produce 

signals that are directly proportional to the analyte concentration of the sample.
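One common way to examine linearity is an ordinary least-squares fit of signal against concentration. The fit reports slope, intercept and the coefficient of determination R². The sketch below implements that check with hypothetical names and data. Bead-assay standard curves are sigmoidal overall (Fig. 3). The fit is therefore only meaningful over the part of the curve, or on log-transformed data, where a linear response is expected. That restriction is an assumption, not a criterion stated in the text.

```python
def linear_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical mid-range standards (pg/ml) and their mean MFI readings.
    conc = [62.5, 125.0, 250.0, 500.0]
    mfi = [410.0, 805.0, 1620.0, 3190.0]
    slope, intercept, r2 = linear_fit(conc, mfi)
    print(f"slope = {slope:.3f} MFI per pg/ml, intercept = {intercept:.1f}, R^2 = {r2:.4f}")
```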


3.4.7. Range 

The range of an analytical procedure is defined as the interval between the upper and lower amounts of analyte within which the analyte can be quantitated with a suitable level of accuracy, precision, and linearity.

3.4.8. Robustness 

Robustness expresses the extent to which the measured values remain 

unaffected by small variations in method parameters like temperature, reagent 

concentration, or instrumental parameters. It indicates the reliability of an 

analytical procedure during normal usage. Figure 3 shows the standard curves of the 10plex soluble receptor assay. The data demonstrate the feasibility and robustness of the assays.

3.5. Pattern Generation 

After optimization of the assays, screening can be performed, generating large amounts of data. Several bioinformatic tools are available for handling such high-dimensional data sets. For example, clustering analysis to distinguish different diseases or disease symptoms can lead to useful taxonomies, and correct diagnosis of symptom clusters is essential for successful therapy.

[Figure 3 (image not reproduced): standard curves of the 10plex soluble receptors assay (MIF, VCAM, RAGE, TNFRII, TNFRI, gp130, Fas, ICAM, IL-2R, E-selectin), plotted as MFI (log scale, 1–10,000) versus concentration (pg/ml, log scale, 10–100,000).]

Fig. 3. The standard curves of the 10plex soluble receptors assay were plotted from the average MFI readings of several individual measurements; standard deviation bars are included. The data reflect the linear range and the robustness of the assays.

Table 1 summarizes the main features of CIMminer (Clustered Image Maps) (13) and MeV (MultiExperiment Viewer) (14). Both platforms can be applied for the purposes mentioned above. Unsupervised hierarchical clustering analysis can be performed using the online tool CIMminer, developed by the National Cancer Institute. MeV is a more integrated freeware package developed by TIGR (The Institute for Genomic Research), and it offers 23 analysis modules. Its clustering methods include HCL (hierarchical clustering) and ST (support trees). Its statistical methods include TTEST (t-tests), SAM (Significance Analysis of Microarrays), ANOVA (analysis of variance) and TFA (two-factor ANOVA), which help users discover significant parameters. Further sophisticated techniques can be applied, including PCA (principal components analysis), SOTA (Self-Organizing Tree Algorithm), RN (Relevance Networks), KMC (K-Means/K-Medians Clustering), KMS (K-Means/K-Medians Support), CAST (Clustering Affinity Search Technique), QTC (QT CLUST), SOM (Self-Organizing Maps), GSH (Gene Shaving), FOM (Figures of Merit), PTM (Template Matching), SVM (Support Vector Machines), KNNC (K-Nearest-Neighbor Classification), DAM (Discriminant Analysis Module), COA (Correspondence Analysis), TRN (Expression Terrain Maps), and EASE (Expression Analysis Systematic Explorer).
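The sketch below makes unsupervised hierarchical clustering concrete. It performs average-linkage agglomerative clustering of samples, using the Euclidean distance between their analyte profiles. It is an illustrative stand-alone implementation, not the algorithm used by CIMminer or MeV, whose distance metrics and linkage options are configurable. In practice the analyte values are often log-transformed or standardized first. That preprocessing is an assumption here, not a recommendation from the text.

```python
import math
from itertools import combinations


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two analyte profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles: list[list[float]]) -> list[tuple[int, int, float]]:
    """Agglomerative clustering; returns merges as (cluster_id, cluster_id, distance).

    Leaves are numbered 0..n-1; each merge creates a new cluster id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(profiles))}
    merges = []
    next_id = len(profiles)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster sample pairs.
            dist = sum(
                euclidean(profiles[i], profiles[j])
                for i in clusters[a]
                for j in clusters[b]
            ) / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[2]:
                best = (a, b, dist)
        a, b, dist = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, dist))
        next_id += 1
    return merges


if __name__ == "__main__":
    # Hypothetical two-analyte profiles for four serum samples.
    samples = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8]]
    for a, b, d in average_linkage(samples):
        print(f"merge {a} + {b} at distance {d:.2f}")
```

The merge list encodes the dendrogram. Samples that join at small distances form the clusters that would appear as blocks in a clustered image map. Transposing the data matrix clusters analytes instead of samples.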

Table 1
Comparison of the Main Features in CIMminer and MeV

Feature               CIMminer                                            MeV
Contributor           NCI                                                 TIGR
Analysis platform     Web-based (http://discover.nci.nih.gov/cimminer/)   Off-line, free software (http://www.tm4.org/mev.html)
Input files           .txt, .zip                                          .txt, .mev, .tav, .gpr
Ordering algorithms   More                                                Less
Statistical analysis  No                                                  Yes (significant parameters can be identified)
Results               Color-coded image                                   Color-coded image
Reference             Science 1997; 275:343–9                             Biotechniques 2003; 34:374–8


4. Notes 

1. This method can also be adapted for coupling reactions of antigens, receptors, or 

other proteins. 

2. Minimize the exposure of EDC and Sulfo-NHS to air, and close containers tightly. 

Use fresh aliquots for each coupling reaction and discard after use. 

3. S-NHS solution (50 mg/ml) can be prepared and stored at –20°C. 

4. Incubation time can be varied. The authors typically incubate between 30 min and 

2 h. The primary incubation of the bead and sample can be performed overnight 

at 4°C for greater low-end sensitivity. 

5. The detection limit is primarily dependent on the quality of the antibodies 

used. Additionally, the detection limit is influenced by detection conditions (e.g., 

antibody concentration, incubation time), complexity of the multiplex assay, and 

matrix proteins. 

References 

1. Morgan, E., Varro, R., Sepulveda, H., Ember, J.A., Apgar, J., Wilson, J., Lowe, L., 

Chen, R., Shivraj, L., Agadir, A., Campos, R., Ernst, D., Gaur, A. (2004) 

Cytometric bead array: a multiplexed assay platform with applications in various 

areas of biology. Clin Immunol, 110, 252–66 

2. Dasso, J., Lee, J., Bach, H., Mage, R.G. (2002) A comparison of ELISA and 

flow microsphere-based assays for quantification of immunoglobulins. J Immunol 

Methods, 263, 23–33 

3. Carson, R.T., Vignali, D.A. (1999) Simultaneous quantitation of 15 cytokines using 

a multiplexed flow cytometric assay. J Immunol Methods, 227, 41–52 

4. Dunbar, S.A., Vander Zee, C.A., Oliver, K.G., Karem, K.L., Jacobson, J.W. (2003) Quantitative, multiplexed detection of bacterial pathogens: DNA and protein applications of the Luminex LabMAP system. J Microbiol Methods, 53, 245–52

5. Joos, T.O., Stoll, D., Templin, M.F. (2002) Miniaturised multiplexed immunoassays. 

Curr Opin Chem Biol, 6, 76–80 

6. Prabhakar, U., Eirikis, E., Davis, H.M. (2002) Simultaneous quantification of 

proinflammatory cytokines in human plasma using the LabMAP assay. J Immunol 

Methods, 260, 207–18 

7. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A., Downing, J.R., Jacks, T., Horvitz, H.R., Golub, T.R. (2005) MicroRNA expression profiles classify human cancers. Nature, 435, 834–8

8. de Jager, W., Prakken, B.J., Bijlsma, J.W., Kuis, W., Rijkers, G.T. (2005) Improved 

multiplex immunoassay performance in human plasma and synovial fluid following 

removal of interfering heterophilic antibodies. J Immunol Methods, 300, 124–35 

9. Natelson, B.H., Weaver, S.A., Tseng, C.L., Ottenweller, J.E. (2005) Spinal fluid 

abnormalities in patients with chronic fatigue syndrome. Clin Diagn Lab Immunol, 

12, 52–5


10. Findlay, J.W., Smith, W.C., Lee, J.W., Nordblom, G.D., Das, I., DeSilva, B.S., Khan, M.N., Bowsher, R.R. (2000) Validation of immunoassays for bioanalysis: a pharmaceutical industry perspective. J Pharm Biomed Anal, 21, 1249–73

11. Sanchez-Carbayo, M. (2006) Antibody arrays: technical considerations and clinical 

applications in cancer. Clin Chem, 52, 1651–9 

12. Kingsmore, S.F. (2006) Multiplexed protein measurement: technologies and applications 

of protein and antibody arrays. Nat Rev Drug Discov, 5, 310–20 

13. Weinstein, J.N., Myers, T.G., O’Connor, P.M., et al. (1997) An information-intensive approach to the molecular pharmacology of cancer. Science, 275, 343–9

14. Saeed, A.I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J., Klapa, M., Currier, T., Thiagarajan, M., Sturn, A., Snuffin, M., Rezantsev, A., Popov, D., Ryltsov, A., Kostukovich, E., Borisovsky, I., Liu, Z., Vinsavich, A., Trush, V., Quackenbush, J. (2003) TM4: a free, open-source system for microarray data management and analysis. Biotechniques, 34, 374–8

15 

Dissecting Cancer Serum Protein Profiles Using 

Antibody Arrays 

Marta Sanchez-Carbayo 

Summary 

Antibody arrays represent one of the high-throughput techniques enabling detection 

of multiple proteins simultaneously. One of the main advantages of the technology over other proteomic approaches is that the identities of the measured proteins are known in advance from the experimental design or can be readily characterized, facilitating the biological interpretation of the results. This chapter overviews the technical issues of the main antibody array formats, as well as various applications using serum specimens in the context of neoplastic diseases. Clinical applications of antibody arrays range from biomarker discovery for diagnosis, prognosis, and drug response to the characterization of protein pathways and modification changes associated with disease development and progression. As a high-throughput tool addressing protein levels and post-translational modifications, antibody arrays improve the functional characterization of the molecular bases of cancer. Furthermore, the identification and validation of protein expression patterns characteristic of cancer progression and tumor subtypes may enable tailored therapeutic intervention and improve the clinical management of cancer patients. Technical features such as low sample volume and antibody requirements, format versatility, and high reproducibility support their increasing impact in cancer research.

Key Words: antibody arrays; protein profiling; serum; direct labeling. 


1.1. Antibody Arrays in the Context of Other Proteomic Strategies 

Two main proteomic strategies can be used to investigate the cancer proteome, namely untargeted and targeted. The terminology refers to




whether the proteins to be measured are unknown and identified during the analysis (untargeted approach) or known and considered in the experimental design (targeted strategies). Untargeted platforms are best suited for first-pass comparisons of proteomes to identify relatively few novel or known proteins that exhibit the greatest differences in abundance. The two most commonly used technologies are two-dimensional electrophoresis (2D) and low- and high-resolution mass spectrometry (1,2,3). Targeted proteomic platforms measure and quantify previously identified proteins of interest. They are suited for analyzing quantitative differences in abundance among known protein families and pathways. The versatility of targeted platforms allows reproducibility and scalability to be controlled and estimated and enables precise quantification, leading to high sensitivity and coverage. This approach allows experimental designs to address specific hypotheses and facilitates the biological interpretation of the results. However, the number of proteins amenable to these analyses depends on the availability of antibodies that bind the target protein with high affinity and specificity. The main targeted techniques used for large-scale analysis of many samples and proteins include protein microarrays, multiplexed Western blots, and tissue arrays. Protein arrays are the most versatile of the proteomic techniques available to date. Antigens, peptides, complex protein solutions or antibodies can be immobilized on them to capture and quantify specific antibodies or proteins, respectively (1,2,3,4).

1.2. Antibody Array Formats 

Innovation in immobilization surfaces and detection strategies has led to an increasing number of planar and bead-based antibody array technologies. Planar antibody arrays are the most common type of protein array and are the major focus of the present chapter. This section describes the main formats of planar arrays and their differences from bead-based assays (Fig. 1; for bead-based arrays, see also Chapter 14).

The main planar label-based types comprise one-antibody assays (using one antibody to capture the target molecule) and sandwich assays (using two antibodies to capture and detect the target protein) (1,2,3,4). Each format has advantages and pitfalls relative to the other. In one-antibody label-based assays, the targeted proteins are captured by an immobilized antibody and detected through labeling with a tag (Fig. 1A). In direct labeling, the proteins are labeled with a fluorophore, such as the cyanine dyes Cy3 or Cy5. In indirect labeling, the proteins are labeled with a tag that is later detected by a labeled antibody. One-antibody label-based assays allow two different samples, each labeled with a different tag, to be incubated on the same array. Normalization is facilitated by co-incubating a reference sample with a test sample (1,2,3,4).

[Figure 1 (image not reproduced). Antibody-based arrays: (A) one-antibody label-based formats (direct and indirect labeling with Cy3, Cy5, biotin, or digoxigenin; competitive); (B) sandwich formats, planar (detection by RCA, RLS, ECL, TSA, or Bio-SA-Cy3) and suspension (bead based). Antigen-based arrays: (C) tumor-associated antigen arrays (tumor antigen, e.g., p53; autoantibody, e.g., anti-p53); (D) reverse-phase arrays (complex lysate; whole cell, membrane, and soluble fractions).]

Fig. 1. Main formats of planar and suspension protein arrays. RCA: rolling-circle 

amplification; RLS: resonance light scattering; ECL: enhanced chemiluminescence; 

TSA: tyramide signal amplification; SA: streptavidin. 

Another benefit is that these assays are competitive, since the analytes in the test and reference solutions compete for binding to the antibodies (1,2,3,4). This improves the linearity of response and the dynamic range compared with non-competitive assays (4). The main disadvantage is that the label may disrupt the antibody–analyte interaction, which may limit detection as well as sensitivity and specificity.
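In the two-color, one-antibody format, the comparison between the co-incubated test and reference samples is usually made spot by spot as a signal ratio. The sketch below computes background-subtracted log2(test/reference) ratios and then centers them on their median, so that the typical spot has a ratio of 1. Several details are assumptions chosen for illustration, not the normalization procedure described in this chapter: the dye assignment (test = Cy5, reference = Cy3), the small positive floor for weak spots, and median centering. The function names are hypothetical.

```python
import math
import statistics


def log2_ratios(test_cy5, ref_cy3, test_bg, ref_bg, floor=1.0):
    """Background-subtracted log2(test/reference) ratio for each spot."""
    ratios = []
    for t, r, tb, rb in zip(test_cy5, ref_cy3, test_bg, ref_bg):
        # Clamp net intensities at a small positive floor so log2 is defined.
        t_net = max(t - tb, floor)
        r_net = max(r - rb, floor)
        ratios.append(math.log2(t_net / r_net))
    return ratios


def median_center(values):
    """Shift log ratios so that their median is 0 (a median ratio of 1)."""
    m = statistics.median(values)
    return [v - m for v in values]


if __name__ == "__main__":
    # Hypothetical spot intensities (three antibodies) and local backgrounds.
    cy5 = [1100.0, 2100.0, 600.0]
    cy3 = [600.0, 1100.0, 600.0]
    bg = [100.0, 100.0, 100.0]
    print(median_center(log2_ratios(cy5, cy3, bg, bg)))
```

Working with log ratios makes up- and down-regulation symmetric around 0, which suits the competitive two-sample design.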

In the sandwich label-based format, antibodies capture unlabeled proteins, which are detected by a second antibody; several methods are available to generate the signal for detection (Fig. 1B). The use of two antibodies targeting each analyte increases the specificity compared with one-antibody label-based assays, and the reduced background of these assays also increases the sensitivity. The sandwich format allows only non-competitive assays, since only one sample can be incubated on each array (1,2,3,4). This results in a sigmoidal binding response, rather than the linear response of the competitive format, and requires standard curves of known analyte concentrations for accurate calibration (4). Compared with one-antibody label-based assays, sandwich assays are more difficult to multiplex. Matched antibody pairs and purified antigens may not be available for every target, and the potential cross-reactivity among detection antibodies increases with each additional analyte (2,4). Currently, multiplexed sandwich assays are practically limited to 30–50 different targets (1,2,3,4). In contrast, in one-antibody assays only the availability of antibodies and the space on the substrate limit the number of targets that can be analyzed.

In addition to planar arrays, suspension or bead-based arrays use different fluorescent beads, each coated with a different antibody and spectrally resolvable from the others [(5,6,7,8,9) and see Chapter 14]. The beads are incubated with a sample to allow protein binding to the capture antibodies, and the mixture is then incubated with a cocktail of detection antibodies, each corresponding to one of the capture antibodies. The detection antibodies are tagged to allow fluorescent detection. The beads are passed through a flow cytometer, where each bead is probed by two lasers: one reads the color, or identity, of the bead, and the other reads the amount of detection antibody on the bead (5,6,7,8,9). Multiplexed bead-based flow cytometry assays are an active area of development. Differentially identifiable beads coated with proteins, autoantigens, or antibodies can identify a variety of bound antibodies or proteins using a cytometer system (5,6,7,8,9). Advances in instrumentation and bead chemistries will probably make this approach very valuable for the detection of circulating cancer cells in clinical practice. In another version of this concept, cell suspensions can be incubated on antibody arrays, and the number of cells bound to each antibody can be quantified by dark-field microscopy. These arrays have the potential to characterize multiple membrane proteins in specific cell populations, or changes in cell surfaces induced by drug therapies.

It is important to distinguish antibody arrays from two other protein array formats that can be applied to serum samples and are also based on the binding of antibodies to specific antigens. Tumor-associated antigen (TAA) arrays enable the detection of autoantibodies against TAAs for cancer diagnosis (Fig. 1C). The rationale is that sera from cancer patients contain antibodies that react with a unique group of autologous cellular antigens, or TAAs (10,11). Complex protein extracts can also be spotted onto membranes and probed with antibodies targeting specific proteins on so-called reverse-phase arrays (12,13) (Fig. 1D).

1.3. Types of Planar Antibody Arrays Based on the 

Labeling-Hybridization Methods 

The growing number of detection modalities has led to several types and applications of antibody arrays (see Note 1). A number of labeling and detection methods can be employed for one-antibody and sandwich label-based planar arrays (Fig. 2). The signal can be generated by a fluorescently labeled detection antibody (Fig. 2A). This approach represents the standard sandwich array. It requires chemical labeling of all secondary detection antibodies, but the assay is a simple two-step procedure without a separate staining step (14,15). An alternative approach employs a species-specific, fluorescently labeled tertiary antibody (Fig. 2B). This option avoids the use of large, chemically modified detection antibodies but limits the species from which capture antibodies can be derived. A third option is the use of available biotinylated detection antibodies (Fig. 2C) (15). In these assays, detection occurs after staining of the sandwich complex with Cy3-labeled streptavidin or other streptavidin conjugates, such as Texas Red conjugates or streptavidin-R-phycoerythrin (SAPE) (15). In a fourth option, the fluorescent signal is further amplified using a second layer of SAPE coupled to the first layer via an anti-SAPE antibody (Fig. 2D). In a fifth option, the number of biotin labels can be increased via tyramide signal amplification (Fig. 2E) (2). An anti-biotin horseradish peroxidase (HRP) conjugate generates tyramide radicals. These radicals cross-link biotin or a fluorophore to exposed tyrosine residues of any protein near the recognition event (2). Chemiluminescence can also be applied to multiplexed sandwich assays as a sixth option (Fig. 2F). It uses streptavidin-HRP, or a species-specific antibody conjugated with HRP or alkaline phosphatase, together with chemiluminescent substrates. Chemiluminescence is typically more sensitive than standard fluorescence. A polymer decorated with streptavidin and europium chelates is used for both microplate and microarray measurements. Evanescent waveguide detection is an alternative for ultrasensitive fluorescence (16). Rolling-circle amplification can be applied as a seventh option for signal generation (Fig. 2G). The 5′ end of an oligonucleotide primer is attached to an anti-biotin antibody (17). After the anti-biotin antibody binds the biotinylated detection antibody of the sandwich, the oligonucleotide is enzymatically extended using a circular DNA sequence as template. Short, fluorescently labeled oligonucleotides are then hybridized to the extended DNA, decorating each bound antibody with thousands of fluorophores (15). An eighth staining method uses colloidal gold particles coated with an anti-biotin antibody (18). Its sensitivity is similar to that of evanescent-wave technology and rolling-circle amplification. Because of resonance light scattering (RLS), these particles scatter white light very intensely. Quantitative readouts of miniaturized sandwich assays can therefore be obtained with a simple charge-coupled device (CCD) camera-based imaging system (18) (Fig. 2H). Unlike fluorescence- or chemiluminescence-based labels, RLS particles do not photobleach (14,15,16,17,18,19,20).

[Figure 2 (image not reproduced). Panels: (A) antibody direct sandwich; (B) species-specific tertiary antibody; (C) biotinylated antibodies with fluorescent streptavidin conjugates; (D) two SAPE layers; (E) tyramide signal amplification; (F) alkaline phosphatase linked to a species-specific tertiary antibody, activated chemiluminescence; (G) rolling-circle amplification; (H) resonance light scattering.]

Fig. 2. Several labeling and detection methods can be employed for antibody arrays.

Given the wide variety of labeling-hybridization methods available to date, the present chapter describes in detail the reagents and protocol for direct labeling of serum specimens, as summarized in Figure 3.

1.4. Applications in Cancer Research Using Serum Specimens 

Direct labeling methods have been applied in cancer diagnostics to detect proteins in the serum of patients with prostate cancer (21). The use of a two-color rolling-circle amplification method improves the detection of low-abundance proteins. This method has also been shown to provide adequate reproducibility and accuracy for protein profiling of serum specimens and for clinical applications (17,22,23,24). Sandwich assays can also measure protein abundances in body fluids using detection methods such as RLS (25), enhanced chemiluminescence (26), tyramide signal amplification (27), and fluorescence (28).

Reverse-phase protein arrays have also been optimized to spot serum specimens, enabling high-throughput measurement of IgA in thousands of sera in a single experiment (29). In another example, a recent report designed antibody arrays for bladder cancer by selecting antibodies against targets differentially expressed in bladder tumors, as identified by gene expression profiling (24). Serum protein profiles obtained with two independent antibody arrays provided a comprehensive means for bladder cancer diagnosis and clinical outcome stratification (24). Validation analyses by ELISA and by immunohistochemistry on tissue microarrays represent alternative approaches to confirm the relevance of the identified proteins for tumor progression. Such a strategy provides experimental evidence for the use of several integrated technologies and strengthens the process of biomarker discovery.

Serum specimens can be utilized to profile the humoral immune signature of 

cancer patients to detect both autoantibodies against tumor antigens and secreted 

cytokines. The combined detection of antibodies against a group of TAAs has 

provided high sensitivity for diagnosis of prostate cancer (10). The use of phage 

display arrays can enhance tumor subtype specificity of such measurements 

(10,11). Cytokine profiling of serum and plasma specimens can differentiate cancer patients from control subjects and can also stratify patients with leukemia based on clinical outcome. Several reports have also compared the reproducibility

and differences among several technologies available for multiplexing cytokine 

measurements, including planar and bead-based antibody arrays (5,6,7). 

In summary, antibody arrays can be utilized for the following applications: 

(1) the discovery of candidate disease biomarkers (21,24); (2) characterizing 

signaling pathways (28), disease progression, clinical subtypes, and 

outcomes (21,24); (3) measurement of changes in post-translational modifications 

or expression levels of disease-related proteins (28); (4) identifying binding partners of proteins, which is especially important in functional studies for drug discovery; and (5) epitope mapping to determine the regions of proteins that bind specific antibodies.

2. Materials 

2.1. Printing of Antibody Arrays 

1. Antibodies. Selecting the antibodies to be printed onto the arrays is a critical step. The antibodies should be selected based on their known affinity characteristics and the experimental design (see Note 2).

2. Antibody purification with Affi-gel Protein A MASP II kit (Bio-Rad, Hercules, 

CA). 

3. Protein concentration measurements with BCA Protein Assay (Pierce, Rockford, 

IL). 

4. Fast Slides (Schleicher and Schuell Biosciences, Keene, NH) or HydroGel coated 

glass microscope slides (Perkin Elmer Life Sciences, Waltham, MA). 

5. Polypropylene 384-well microtiter plates (Genetix, New Milton, Hampshire, UK 

or MJ Research, Waltham, MA). 

6. Aluminum foil sealing tape, Scotch brand (R.S. Hughes, Sunnyvale, CA).

7. Microarray printing robot (see Note 10).

2.2. Labeling and Hybridization of Serum Samples 

1. NHS-linked Cy3 and Cy5 protein labeling agents (Amersham, GE Healthcare, 

Piscataway, NJ).


2. Microscope slide staining chambers with slide racks (Shandon Lipshaw, Pittsburgh, PA).

3. Diamond scribe (VWR, West Chester, PA). 

4. Hydrophobic marker (PAP pen, Immunotech, Marseille). 

5. Coverslips (Lifterslip, Erie Scientific, Portsmouth, NH).

6. Wafer handling tweezers (Technitool, West Berlin, NJ). 

7. Clinical centrifuge with flat swinging buckets for holding slide racks. 

8. Spin columns for protein cleanup (Bio-Rad Micro Bio-Spin P-6). 

9. Microcon YM-50 (Millipore, Bedford, MA). 

10. Complete protease inhibitors (Roche, Indianapolis, IN). 

11. Buffers: phosphate-buffered saline (PBS), pH 7.4 (137 mM NaCl, 4.3 mM Na2HPO4, 1.4 mM KH2PO4); carbonate buffer, pH 8.5 (50 mM NaHCO3); PBST: PBS containing 0.5% (v/v) Tween-20; 0.1 M PBS, pH 7.2 (68.4 ml 1 M Na2HPO4, 31.6 ml 1 M NaH2PO4, 900 ml dH2O); NP40 lysis buffer: 50 mM HEPES-OH, EDTA, 50 mM NaCl, 10 mM NaPPi (tetrasodium diphosphate decahydrate), 50 mM NaF, 1% (v/v) NP40, 10 mM sodium vanadate, pH 7.5–8.0; saturated NaCl (Sigma); blocking buffer: 1% (w/v) bovine serum albumin (BSA) in PBST; 7–10 mM dye stock in DMSO: dissolve one tube of Cy3 or Cy5 dye in 30 μl of DMSO, aliquot, and freeze at –80°C.
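The 7–10 mM figure for the dye stock is consistent with dissolving the 200 nmol of dye contained in each vial (the amount stated in the labeling protocol) in 20–30 μl of DMSO. The short Python sketch below reproduces that arithmetic and also checks that the 0.1 M phosphate buffer recipe above contains 0.1 M total phosphate. The function names are illustrative only.

```python
def molar_concentration_mM(amount_nmol: float, volume_ul: float) -> float:
    """Concentration (mM) of a solute dissolved in a given volume.

    1 nmol per microliter equals 1 mmol per liter, so nmol/ul is already mM.
    """
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_nmol / volume_ul


def mixed_molarity_M(components, final_volume_ml: float) -> float:
    """Total molarity of one species contributed by several stock solutions.

    components: iterable of (stock_molarity_M, volume_ml) pairs.
    """
    total_mmol = sum(stock_M * vol_ml for stock_M, vol_ml in components)  # M x ml = mmol
    return total_mmol / final_volume_ml  # mmol per ml = mol per liter


# One 200-nmol vial of Cy dye dissolved in 30 ul or in 20 ul of DMSO
print(f"{molar_concentration_mM(200, 30):.2f} mM")  # 6.67 mM
print(f"{molar_concentration_mM(200, 20):.2f} mM")  # 10.00 mM

# 0.1 M phosphate buffer: 68.4 ml 1 M Na2HPO4 + 31.6 ml 1 M NaH2PO4 + 900 ml dH2O
phosphate = mixed_molarity_M([(1.0, 68.4), (1.0, 31.6)], 68.4 + 31.6 + 900)
print(f"{phosphate:.3f} M total phosphate")  # 0.100 M total phosphate
```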

2.3. Detection 

1. ScanArray microarray scanner with 543-nm and 633-nm excitation (Packard BioScience, Meriden, CT).

2. GenePix Pro 3.0 software (Axon Instruments, Union City, CA) for quantifying the image data.

3. Methods 

The overall process of setting up custom-made antibody arrays comprises three main steps: antibody array construction; sample labeling and hybridization onto the antibody array; and scanning and data analysis. The success of the whole process depends greatly on the availability of high-quality antibodies for capturing the target proteins and of well-handled, well-preserved, and well-characterized serum samples.

3.1. Antibody Array Construction 

1. Select the antibodies (see Note 2). 

2. Purify the antibodies (see Note 3). 

3. Store the antibodies under conditions that keep them stable, and quantify them (see Notes 4–7).

4. Prepare the printing plate: put 5–7 μl of antibody solution in each well of a 384-well plate (see Note 8).

5. Prepare slides for printing (see Note 9).


For nitrocellulose slides, no preparation is needed (see Note 9). 

For hydrogel slides: prepare the slides just before use (i.e., only when you are ready to print the arrays). Load the hydrogel slides into a slide rack, rinse briefly (1 s) in purified water, and wash three times for 10 min each in purified water at room temperature with gentle rocking. A microscope slide staining chamber is useful for the washing steps. The staining chambers come with slide racks that hold 10–30 slides; the racks can be transferred between staining chambers containing different wash buffers and into a clinical centrifuge for drying the slides.

6. Dry the slides by centrifugation at no more than 350 × g for about 3 min. A clinical centrifuge with flat swinging-bucket holders works well for this task: place a layer of paper towel on the bottom of the swinging bucket to absorb the water removed from the slides, and place the slide rack on the paper towel.

7. Place the hydrogel slides in a staining chamber with a paper towel at the bottom, and incubate in a 40°C water bath for 20 min.

8. Remove the slides from the 40°C incubation and allow them to cool at room temperature for 5 min. The slides are now ready for printing.

9. Print the antibodies on the slides (see Note 10). 

10. Start the post-print processing of microarrays. 

For hydrogels: 

• Prepare staining chambers with a paper towel soaked in saturated NaCl at the bottom.

• After printing, incubate the slides in a humidified staining chamber overnight at room temperature to allow the antibodies to adsorb to the matrix.

• The next day, circumscribe the array boundaries on each slide with a hydrophobic marker (e.g., PAP pen). Leave at least 3–4 mm between the array and the marker line. Allow the marker lines to dry fully.

For nitrocellulose (FAST; Schleicher and Schuell) slides:

• Allow the slides to dry for at least 1 h in a slide-staining chamber.

• Store the slides on a slide rack in a humidified staining chamber in the refrigerator.

• The next day, circumscribe the array boundaries on each slide with a hydrophobic marker (e.g., PAP pen). Leave at least 3–4 mm between the array and the marker line. Allow the marker lines to dry fully.

11. Rinse the slides as follows: 

a. Rinse briefly (for 30 s) in PBST. 

b. Wash in PBST for 3 min with gentle rocking. 

c. Wash in PBST for 30 min with gentle rocking.


[Fig. 3 schematic: test proteins are reacted with Cy5 ligand and reference proteins with Cy3 ligand; free dye is separated from each; the labeled samples are mixed, placed on the antibody-coated slide, allowed to react, and scanned.]

Fig. 3. Scheme of the whole process when working with custom-made antibody arrays. Once antibodies are selected and printed on the arrays, serum samples are labeled and hybridized onto the antibody arrays. Scanning and analysis of the fluorescence data provide quantitative measurements of multiple proteins simultaneously.

12. Block the slides. Once the antibodies are immobilized, the remaining non-specific protein-binding sites on the printed microarrays must be blocked. Typical blocking solutions include dilute BSA or casein solutions (1,2,9,12,19). Prepare the blocking buffer right before use. If the arrays will not be used for a day or more, leave them in the BSA blocking solution, with sodium azide added, in the refrigerator; when they are to be used, begin with step b below.

a. Block in the blocking buffer for 1 h at room temperature with constant shaking.

b. Rinse briefly with PBST twice, or alternatively rinse the second time with 0.1 M PBS, pH 7.2, for 20 min.

c. Immediately before incubating the slides with the labeled samples, dry them by centrifugation in a clinical centrifuge with flat swinging-bucket holders.
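Step 6 caps the slide-drying spin at 350 × g, but many clinical centrifuges are set in rpm. The Python sketch below applies the standard conversion RCF = 1.118 × 10^-5 × r × rpm² (r in cm from the rotor axis). The 15-cm radius in the example is a hypothetical value, so measure the radius of your own rotor and buckets before relying on the result.

```python
import math

# Constant in RCF = 1.118e-5 * r(cm) * rpm^2, i.e. (2*pi/60)^2 / 981 cm/s^2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) that produces the requested relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical clinical centrifuge: 15 cm from the axis to the slide rack
max_rpm = rpm_for_rcf(350, radius_cm=15)
print(f"Do not exceed about {math.floor(max_rpm)} rpm")  # Do not exceed about 1444 rpm
```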

3.2. Labeling of Samples and Hybridization 

A protocol for direct labeling is provided below and summarized in Fig. 3.

1. Select the serum samples for labeling (see Note 11). 

2. Determine the volume of each serum sample to label with Cy3 and with Cy5. When deciding whether to label samples or references with Cy3 or Cy5, note that Cy3 gives more consistent and brighter signals. For the samples, divide the volume to be placed on the array by the dilution factor (for a final dilution of 1/30 to 1/50). For a 20-μl volume (the volume used for a standard 12 × 12-mm hydrogel array) and a 1/50 final dilution, use 0.4 μl of serum sample (20/50) per array.


If a pooled reference is to be used, each component of the reference is labeled first and then pooled (as opposed to pooling and then labeling). The volume of each reference component to be labeled is (Va × A)/Nr, where Va is the volume per array (0.4 μl in the above case), A is the number of arrays on which the reference will be used, and Nr is the number of samples pooled in the reference. For example, if a pool of 10 samples will be used as the reference for 20 arrays, the volume of each sample to be used in the Cy3 labeling mix will be (0.4 × 20)/10 = 0.8 μl.

3. Dilute the serum sample approximately 15-fold with carbonate buffer or pH 7.5 phosphate buffer, spiked with 0.5 μg/ml dinitrophenol (DNP) flag if the flag is to be used for normalization. Do not use buffers containing amine groups, such as Tris.

4. Add one-twentieth volume of dye stock to each sample. The final concentration of the NHS-ester-activated Cy dyes in the serum protein solution should be between 100 and 300 μM (each vial of dye contains 200 nmol).

5. Mix each dye solution with its serum protein solution, and let the reaction proceed on ice in the dark for 2 h. Normally, the reference protein solution is mixed with the Cy3 dye solution, and the test protein solution with the Cy5 dye solution.

6. Quench (stop) the labeling by adding one-twentieth volume of 1 M Tris-HCl, pH 7.5–8.0 (or glycine), to each reaction, so that the quencher is in at least 200-fold molar excess over the dye.

7. Remove the free dye by loading the samples onto a spin column or microconcentrator with an appropriate molecular-weight cutoff, such as the Bio-Rad Micro Bio-Spin P-6 column (spin at 1000 × g for 2 min). A 3000-Da cutoff retains most proteins while still removing the dye; if smaller proteins are not important, a 10,000-Da cutoff is faster. Centrifuge according to the manufacturer's instructions: the 10,000-Da Microcon typically requires 20 min, and the 3000-Da Microcon 80 min, of centrifugation at 10,000 × g at room temperature.

8. Make 10× blocking solution: 30% (w/v) non-fat milk and 1% (v/v) Tween-20 in PBS (e.g., 3 g of milk in 10 ml of buffer).

9. Centrifuge the milk solution at 10,000 × g for 10 min to remove particulate matter.

10. To the flow-through of each microconcentrator column (in the collection tube), add 1 μl of the supernatant of the blocking mix and 1 μl of 10× protease inhibitor per array.

11. Pool the reference samples and divide among the test samples according to the 

experimental plan. 

12. If necessary, add 1× PBS to bring the volume to 20–25 μl per array. The labeled samples may be stored overnight at 4°C.

13. Start hybridization of the labeled serum samples on the printed antibody 

arrays. Distribute the Cy3-labeled reference protein solution to the appropriate 

Cy5-labeled test protein solutions. Add PBS to each mix to achieve a volume 

of 20–25 μL per array. It is recommended to remove any particulate matter or


precipitate by (1) filtering with a 0.45-μm spin filter, or (2) centrifuging for 10 min 

at 14,000×g and pipetting out the supernatant. 

14. Load the appropriate amount of labeled sample onto each slide within the marked boundaries, and cover with a Lifterslip. Use 20 μl for the 12 × 12-mm hydrogels. The coverslip should be at least 1/4 inch longer than the dimensions of the array, because the background is often higher at the edges of the coverslip.

15. Incubate for 2 h at room temperature with constant shaking.

16. Rinse briefly in PBST to remove the Lifterslip. 

17. Wash three more times for 10 min in fresh changes of PBST. (All washes are 

performed in racks at room temperature.) 

18. Rinse for 20 s in PBS. Alternatively, perform final washes in H2O for 5 min each with gentle agitation.

19. Dry the slides by centrifugation prior to scanning. 
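The volume arithmetic in steps 2, 4, and 6 can be collected into a few helper functions. This is a minimal Python sketch rather than part of the original protocol: the function names are hypothetical, the 60-μl diluted sample is an example value, and the dye calculation assumes the dye stock simply dilutes into the sample (additive volumes).

```python
def serum_volume_per_array(array_volume_ul: float, dilution_factor: float) -> float:
    """Serum volume (ul) per array, e.g. 20 ul at a 1/50 final dilution."""
    return array_volume_ul / dilution_factor


def reference_component_volume(va_ul: float, n_arrays: int, n_pooled: int) -> float:
    """Volume of each pooled-reference sample to label: (Va x A) / Nr."""
    return va_ul * n_arrays / n_pooled


def dye_stock_volume(sample_ul: float, stock_mM: float, target_uM: float) -> float:
    """Dye stock volume (ul) that gives target_uM after mixing with sample_ul."""
    stock_uM = stock_mM * 1000.0
    if not 0 < target_uM < stock_uM:
        raise ValueError("target must be positive and below the stock concentration")
    # Solve stock_uM * v / (sample_ul + v) = target_uM for v
    return target_uM * sample_ul / (stock_uM - target_uM)


def quencher_molar_excess(reaction_ul: float, dye_uM: float,
                          quencher_ul: float, quencher_M: float = 1.0) -> float:
    """Molar ratio of quencher (Tris or glycine) to dye after quenching."""
    quencher_amount = quencher_M * 1e6 * quencher_ul  # uM x ul
    dye_amount = dye_uM * reaction_ul                  # uM x ul
    return quencher_amount / dye_amount


va = serum_volume_per_array(20, 50)
print(f"Va = {va:.1f} ul")  # Va = 0.4 ul
print(f"Each reference sample: {reference_component_volume(va, 20, 10):.1f} ul")  # 0.8 ul
print(f"Dye stock for 60 ul at 200 uM: {dye_stock_volume(60, 10, 200):.2f} ul")  # 1.22 ul
print(f"Quencher excess: {quencher_molar_excess(20, 250, 1):.0f}-fold")  # 200-fold
```

With one-twentieth volume of 1 M quencher, the ratio works out to 10^6/(20 × dye concentration in μM) regardless of reaction volume, so it reaches at least 200-fold only for dye concentrations up to 250 μM.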

3.3. Scanning and Data Analysis 

1. Scan the slides in the Cy3 and Cy5 channels using a microarray fluorescence scanner (the ScanArray scanner listed under Detection excites at 543 nm and 633 nm; see Note 12).


2. Process the data: grid the arrays and reject unsatisfactory data points (see Note 

13). 

3. Normalize the data (see Note 14). 

4. Analyze the data (see Note 15). 

5. Interpret the data (see Note 16). 

4. Notes 

1. Radioactivity, fluorescence, or chemiluminescence detection methods have been 

used with antibody arrays. Radioactivity is not frequently used due to its 

safety concerns and its longer exposure times (up to 10 h). Fluorescence 

is one of the most frequently utilized detection methods. Fluorophores, like 

chromogens, exist in many formulations and have defined emission spectra. 

Fluorescein, rhodamine (Texas Red), phycobiliproteins, nitrobenzoxadiazole 

(NBD), acridines, Cy3, Cy5, and bodipy compounds are commonly used 

for protein labeling (13,14,15,16,17). The selection of fluorophores for use 

with microarrays depends on sample type, substratum, emission characteristics, 

and even the number of analytes to be assayed. Not all substrates are 

compatible with fluorescent detection strategies due to inherent autofluorescence 

of the material (14,15,16,17), which significantly reduces the signal-to-noise 

ratios. Nitrocellulose-coated slides cause light scatter and higher background 

as compared to aldehyde-treated slides with laser scanner detection methods, 

limiting the use of nitrocellulose substrata for fluorescent detection methods 

(13,14,15,16,17). The sample may also have components that interfere with a 

selected fluorophore. Flavoproteins autofluoresce and emit light in the same 

region as fluorescein, limiting the use of this fluorophore in samples rich in 

flavoproteins, e.g., liver and kidney tissues. Photobleaching and quenching of


fluorophores can decrease the total signal observed on an array. The Cy3 and Cy5 dyes are commonly used for fluorescent detection because they are less susceptible to these effects. They are well suited for fluorescence detection strategies because of their reduced dye interactions, high brightness, and the possibility of adding charged groups to the molecules (13,14,15,16,17). Fluorescently tagged proteins, including antibodies, can be used to detect immobilized molecules on a microarray with either indirect or sandwich strategies. Streptavidin-biotin or rolling-circle amplification (RCA) chemistries can also be applied to fluorescence detection strategies (22,23,24), providing sufficient sensitivity for most applications.

Chemiluminescent detection methods are based on Western blotting protocols for detecting antigen-bound antibodies with secondary antibodies conjugated to alkaline phosphatase or HRP (13,14,15,16,17,18). Chemiluminescent detection can be applied with any of the labeling strategies. Chemiluminescence is highly sensitive but may be limited by its dynamic range and its compatibility with multiplexing. Amplification strategies such as biotinyltyramide can be applied to chemiluminescence. A useful application is total protein determination made directly on arrays with a ruthenium organic complex that interacts non-covalently with proteins immobilized on nitrocellulose (13,14,15,16,17,18); the dye is therefore applicable to arrays printed on nitrocellulose membranes. This type of total protein analysis is useful for minute sample volumes for which standard spectrophotometric protein analysis would not be feasible.

2. Antibody selection. The first critical step is the selection of the protein targets to be measured with the antibody arrays, which depends on the experimental design and the objectives of the analyses. It is advisable to have biological or experimental criteria supporting the search for specific proteins in the serum. One efficient approach is to perform high-throughput profiling at the DNA or RNA level before protein profiling, to increase the probability of finding a target protein in the serum. Not all proteins are suitable for measurement with this assay, since protein size and likely abundance in the samples are limiting factors. A very small protein (or polypeptide) may not be compatible with direct labeling detection methods, which use size-based separation of the labeled product from the free label. A protein of very low abundance may fall below the detection limit of the assay. Detection limits depend on the antibody used, the protein background in the sample, and the detection conditions. In general, the direct labeling method described here can give detection limits in the low ng/ml range for targets present in a serum background.

Once the target proteins have been chosen, the search for antibodies begins. The main bottleneck in developing highly multiplexed planar antibody arrays is the requirement for a specific affinity ligand for each analyte. Commercial antibodies against novel or rare proteins may not exist, which leaves the option of having the antibody custom-produced. Custom antibody generation is lengthy and expensive, and probably not a viable choice for more than a few antibodies. If a protein target is more common and a choice of antibodies exists, it is advisable to look for antibodies that work well in enzyme immunoassays, since these assays are quite similar to antibody arrays. Monoclonal antibodies seem to have a higher success rate; polyclonal antibodies may also work well, although they can give higher background and lower specificity and sensitivity than monoclonal antibodies. In vitro selection of antibodies by phage, ribosome, or mRNA display, and the use of engineered binding molecules, are playing an increasingly important role in generating specific affinity ligands for analytes for which antibodies are unavailable (14). An alternative, validated strategy for producing specific antibodies is to design protein subfragments of a selected size with minimal sequence similarity to other proteins. The fragments are chosen with an alignment scanning procedure based on the principle of lowest sequence similarity to other human proteins, so as to generate antibodies with high selectivity (20). If the direct labeling method is used, only one antibody per target is needed; a sandwich assay requires a matched pair of antibodies. The direct labeling method works well for mid- to high-abundance proteins, whereas sandwich assays or amplification protocols are recommended for low-abundance proteins.

Because antibodies cannot be manufactured with a predefined affinity and specificity, it is advisable to validate the specificity and sensitivity of each antibody before using it as a probe on protein arrays. Detection of a single band at the expected molecular weight on a Western blot is a standard validation of the specificity and sensitivity of the proposed antibody, as is immunoprecipitation followed by mass spectrometry (1,6). The antigen-antibody properties of the antibodies printed on the arrays can be evaluated by estimating random and systematic errors. Western blotting analyses can serve to evaluate the specificity of the antibodies. Commercial or custom-made enzyme immunoassays can be used to validate, by an independent method and on the same serum specimens, the results obtained with the antibody arrays.

Recombinant antigens can be used as positive and negative controls for the printing process (depositing the antibodies onto the slides), calibration, and detection (1,2,9). The linear range of the assay depends on the antibody-antigen affinity; linearity can only be achieved when the concentrations of the analyte and the antibody are matched to the affinity constant. It is advisable to include dilution and recovery experiments evaluating the specificity and affinity of the antibodies for their ligands when using antibody arrays (2,9).
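Dilution and recovery experiments reduce to two simple percentages. The Python sketch below uses common definitions, assumed here rather than taken from this protocol: spike recovery is the increase in measured concentration divided by the amount of recombinant antigen added, and dilution linearity is each dilution-corrected value relative to the least-diluted measurement. The example readings are made up for illustration.

```python
def spike_recovery_pct(measured_spiked: float, measured_unspiked: float,
                       added: float) -> float:
    """Percentage of a known spike of recombinant antigen that is recovered."""
    if added <= 0:
        raise ValueError("added amount must be positive")
    return 100.0 * (measured_spiked - measured_unspiked) / added


def dilution_linearity_pct(measured: dict[float, float]) -> dict[float, float]:
    """Dilution-corrected recovery at each dilution factor.

    measured maps dilution factor -> concentration measured in the diluted
    sample; the least-diluted point serves as the 100% reference.
    """
    ref_factor = min(measured)
    reference = measured[ref_factor] * ref_factor
    return {d: 100.0 * c * d / reference for d, c in sorted(measured.items())}


# Made-up readings (ng/ml) for illustration only
print(spike_recovery_pct(14.5, 5.0, 10.0))                  # 95.0
print(dilution_linearity_pct({1: 40.0, 2: 19.0, 4: 10.5}))  # {1: 100.0, 2: 95.0, 4: 105.0}
```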

3. Purity of antibodies. Antibodies work best in the arrays when they are highly purified. Antibodies in a high background of other proteins often give weakened or non-specific signals, since the background proteins occupy many binding sites on the microarray. Some purified antibodies come in a BSA or gelatin stabilizer. It may be desirable to remove gelatin, since it can bind some biological molecules. BSA rarely causes non-specific binding, but if it is at a much higher concentration than the antibody, it can significantly reduce the signal from the antibody, which would warrant further purification of the antibody. Some antibodies come in a high concentration (8–50%) of glycerol to improve stability. Although glycerol does not interfere with the assay, the added viscosity may impair the printing process; glycerol concentrations above 20% should be avoided. To change the buffer of an antibody, use of the Bio-Rad Micro Bio-Spin P-30 column is advisable. These columns come in two types of buffer: saline sodium citrate (SSC) and Tris. The filtrate comes through in the packing buffer, which can be changed by running a different buffer through the column three times. The P-30 column removes solution components smaller than 30 kD, and the P-6 column removes components smaller than 6 kD. Thus, the P-30 column is better for purifying antibodies, and the P-6 column is better for purifying complex mixtures in which low-molecular-weight species should be preserved. If the antibody is to be labeled subsequently, do not put it in Tris or any other amine-containing buffer.

Polyclonal antibodies come as unpurified antisera, as the IgG fraction of antisera, or as the affinity-purified (purified using the antigen) fraction of antisera. Affinity-purified antibody is best, since it yields the highest purity of specific antibody; IgG-purified fractions of antisera usually work well. Antibodies that arrive in pure ascites fluid may also need to be purified; however, a good monoclonal antibody will work well without further purification, so such antibodies should be tested first. For purification of IgG antibodies, the Affi-Gel Protein A MAPS II kit (Bio-Rad) is recommended. In general, the following purification requirements should be considered: (1) all antibodies that arrive as antisera need to be IgG purified; (2) antibodies in ascites fluid may also need to be purified, although they can first be tested without purification.

4. Stability and concentration. Antibodies are stable when refrigerated in a standard buffer such as PBS. The concentration of an antibody can be measured with a protein concentration kit such as the BCA 200 Protein Assay Kit (Pierce Biotechnology). The optimal spotting concentration range is 100–200 μg/mL. Higher concentrations can yield stronger signals and lower detection limits, and may be desirable if antibody consumption is not a concern. Each antibody’s concentration should be kept constant across printing sets, since variations in antibody concentration can affect the data. Simply stated, if a set of data is produced using a particular antibody at 300 μg/mL, subsequent experiments should use that antibody at 300 μg/mL for better comparison of the results.
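Keeping each antibody at the same spotting concentration across printing sets usually means diluting a measured stock to a fixed target. A minimal Python sketch of the C1V1 = C2V2 calculation, assuming the stock concentration has already been measured (e.g., by BCA assay); the stock concentration and well volume in the example are hypothetical.

```python
def dilution_plan(stock_ug_ml: float, target_ug_ml: float,
                  final_ul: float) -> tuple[float, float]:
    """Return (stock_ul, buffer_ul) giving final_ul at target_ug_ml (C1V1 = C2V2)."""
    if not 0 < target_ug_ml <= stock_ug_ml:
        raise ValueError("target must be positive and not above the stock concentration")
    stock_ul = target_ug_ml * final_ul / stock_ug_ml
    return stock_ul, final_ul - stock_ul


# Hypothetical antibody stock at 1000 ug/ml, spotted at 200 ug/ml, 10 ul per well
stock_ul, buffer_ul = dilution_plan(1000, 200, 10)
print(f"{stock_ul:.1f} ul stock + {buffer_ul:.1f} ul buffer")  # 2.0 ul stock + 8.0 ul buffer
```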

5. Antibody storage. Most antibodies can be stored refrigerated for up to a year. New antibodies should be divided into aliquots that will each last approximately a year. Keep one aliquot in the refrigerator as a working stock, and freeze the others at –70°C. Aliquoting the antibody stocks helps to avoid repeated freeze/thaw cycles, which can damage the proteins. Protein stocks should not be diluted in PBS for freezing; it is better to freeze them undiluted. When antibodies/proteins are retrieved from a freezer stock, thaw them slowly on ice to reduce damage to the antibody from the thawing process.

6. Tracking antibodies. It is helpful to keep information about the antibodies in a database. Assign a number code to each antibody, and if the buffer composition of an antibody is changed, assign a new code to the new preparation. Relevant information to track includes clonality, manufacturer, animal of origin, concentration, and aliquot age. Record as much of the information provided in the antibody datasheet as possible, and label aliquots accordingly.
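A spreadsheet is enough for this, but the same rule (a new code whenever the buffer changes) can also be expressed as a small record type. The Python sketch below is one possible layout; the field names, the code format, and the example values are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class AntibodyRecord:
    code: str                  # unique number code for this preparation
    target: str
    clonality: str             # "monoclonal" or "polyclonal"
    manufacturer: str
    host_animal: str
    concentration_ug_ml: float
    buffer: str
    aliquot_date: date

    def aliquot_age_days(self, today: date) -> int:
        return (today - self.aliquot_date).days


def rebuffer(record: AntibodyRecord, new_code: str, new_buffer: str,
             new_concentration_ug_ml: float, prepared_on: date) -> AntibodyRecord:
    """A buffer change creates a new preparation, so it must get a new code."""
    if new_code == record.code:
        raise ValueError("a re-buffered preparation needs a new code")
    return replace(record, code=new_code, buffer=new_buffer,
                   concentration_ug_ml=new_concentration_ug_ml,
                   aliquot_date=prepared_on)


original = AntibodyRecord("AB-0001", "protein X", "monoclonal", "Vendor A", "mouse",
                          500.0, "PBS with 50% glycerol", date(2008, 1, 15))
exchanged = rebuffer(original, "AB-0001-B", "PBS", 450.0, date(2008, 2, 1))
print(exchanged.code, exchanged.aliquot_age_days(date(2008, 3, 2)))  # AB-0001-B 30
```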

7. Maintaining antibody stocks. Maintain a refrigerator stock of ready-to-use antibodies (kept as working solutions). Except for antibodies that should not be frozen, only one tube of each antibody should be stored in the refrigerator at a time. The amount of each antibody in the refrigerator stock should be sufficient to last six months to a year (normally around 100 μL). Aliquot the rest of the antibody stock into similar volumes and freeze at –80°C. If the antibody in the refrigerator stock needs to be diluted to reach the working concentration, dilute only enough stock for the working solution. Thaw antibodies/proteins retrieved from a freezer stock slowly on ice to reduce damage from the thawing process. Update the protein stock master list to record when the antibodies are thawed and frozen.

8. Print plate preparation. After the antibodies have been acquired and prepared 

at proper purity and concentration, they are assembled into a “print plate,” 

which is a microtiter plate used in the robotic printing of microarrays. 

Polypropylene microtiter plates are preferable to polystyrene because of lower 

protein adsorption. The plate should be rigid and precisely machined for 

optimal functioning with printing robots. The 384-well plates are generally more 

compatible with printing robots than 96-well plates and require less volume per 

well than 96-well plates. Load about 6–10 μl of each antibody into each well 

of the 384-well print plate. The volume may depend on the shape of the well 

and how far the print tips descend into the well. Too much volume may lead to 

droplets of antibody solution sticking to the outside of the print tip. The volume 

may also need to be optimized for particular applications, such as multiple 

draws from each well, which would require a greater volume. If printing is 

sometimes inconsistent or variable between printing tips, it is desirable to fill 

multiple wells with the same antibody solution so that different print pins spot 

the same antibody. Store the 384-well print plates sealed in the refrigerator until 

ready to use. Aluminum foil tape provides a good seal. Enclosing the covered 

plate in a sealed plastic bag ensures long-term, evaporation-free storage. It is 

very important to prepare a spreadsheet containing the well identities for use in 

downstream data processing applications. 
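The well-identity spreadsheet can be generated rather than typed by hand. The Python sketch below lays antibody codes into a 384-well plate (rows A–P, columns 1–24) in row order and writes a CSV. The row-order layout and the option to repeat an antibody in consecutive wells are assumptions: which wells a given pin visits depends on the printing robot and pin head, so adjust the layout accordingly.

```python
import csv
import io
from string import ascii_uppercase

ROWS = ascii_uppercase[:16]  # A-P
COLUMNS = range(1, 25)       # 1-24


def well_names() -> list[str]:
    """All 384 well names in row order: A1, A2, ..., P24."""
    return [f"{row}{col}" for row in ROWS for col in COLUMNS]


def build_plate_map(antibody_codes: list[str],
                    wells_per_antibody: int = 1) -> list[tuple[str, str]]:
    """Assign each antibody code to consecutive wells in row order."""
    wells = well_names()
    needed = len(antibody_codes) * wells_per_antibody
    if needed > len(wells):
        raise ValueError(f"{needed} wells needed but a 384-well plate has {len(wells)}")
    layout = []
    well_iter = iter(wells)
    for code in antibody_codes:
        for _ in range(wells_per_antibody):
            layout.append((next(well_iter), code))
    return layout


def plate_map_csv(layout: list[tuple[str, str]]) -> str:
    """Render the layout as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["well", "antibody_code"])
    writer.writerows(layout)
    return buffer.getvalue()


print(plate_map_csv(build_plate_map(["AB-0001", "AB-0002"], wells_per_antibody=2)))
```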

9. Selection of slides. The immobilization and detection strategies are chosen according to which target molecules are to be measured and which molecules are used to capture them. The attributes of an ideal substratum for antibody arrays include limited non-specific binding, a high surface-area-to-volume ratio, inertness toward biological molecules, minimal autofluorescence, and compatibility with available detection methods. A variety of surfaces and immobilization chemistries have been described for antibody arrays. Derivatized supports on which capture antibodies are immobilized include polyvinylidene difluoride, nitrocellulose, agarose, polyacrylamide, and hydrogels. Glass slides are frequently coated with one-, two-, or three-dimensionally structured surface modifications activated with aldehyde, polylysine, or a homobifunctional cross-linker, and the choice is typically evaluated in initial optimization experiments (2,9,14). The advantages of distinct coatings or surfaces under different blocking, pH buffering, or UV cross-linking conditions for specific applications have been described (14). Silane-coated glass slides or acrylamide hydrogels can provide good day-to-day reproducibility, efficient immobilization of antibodies, and low background when used with fluorescence detection. Various substrates for antibody arrays have been reported, such as poly-lysine-coated glass (1), aldehyde-coated glass (30), nitrocellulose (31), and a polyacrylamide-based hydrogel (32). Hydrogels and nitrocellulose give good results for the direct labeling method described here. Nitrocellulose slides require no preparation before printing and give clean, low-background results. Hydrogel coatings on glass slides (such as those supplied by PerkinElmer Life Sciences) can support multiple layers of protein, thus increasing the binding capacity and signal strength, and the hydrophilic matrix of the hydrogel may better retain native protein structure. Hydrogels should be stored dry at room temperature and must be used within 2 days after preparation.

10. Printing of antibody arrays. The details of printing will depend on the printing robot used. Antibodies must be immobilized in a way that efficiently deposits the functional component without interfering with subsequent binding. Humidity, temperature, dust levels, and pin washing should also be stringently controlled during printing. To keep evaporation of the antibody solutions low, minimize the time for which the print plates are unsealed and exposed. Maintaining a moderately high humidity in the printing environment (around 45%) will minimize evaporation and maintain spot quality; excessive humidity can lead to overly large spots. Confirm that the robot prints properly with test prints on dummy slides before starting microarray production; 500 μg/mL BSA in 1× PBS is advisable for the test prints. If the tips are washed in a wash bath, change the water every 6–12 loads to prevent contamination of the tips. It is also desirable to confirm that the pins are washed sufficiently and that there is no carry-over from load to load. This can be tested by loading labeled protein into one of the print plate wells in a dummy print and then scanning the unwashed slide. If fluorescence is seen in spots printed after the fluorescently labeled material, the pins need to be washed more stringently. Most microarrayers allow the printing of replicate spots on each array from the same well of the print plate. Replicate spots are useful for obtaining more precise data through averaging and for ensuring that data are acquired even if a portion of the array is unusable. Six to ten spots per array per antibody are recommended.

11. Serum sample handling and storage. Collect sera in red gel tubes, allow the clot to form and retract, centrifuge at 3000 × g for 10 min, aliquot, and store at –80°C. Number all samples consecutively so that no record compromises the identity of the patients or controls under study. Serum samples should be handled as biohazards; tips and tubes that contact serum samples should be disposed of in a biohazard bag. Upon the first thaw, the samples need to be aliquoted. Aliquot samples so that no more than four thaws are necessary for any experiment. Low-volume aliquots (approximately 10–15 μl) of each specimen are recommended. For more than approximately 50 samples, it is convenient to use a microtiter plate for aliquoting: approximately 50 μl of each sample is placed into a well of a 96-well microtiter plate, and a robot or a matrix multichannel pipettor is used to aliquot small volumes into replicate 96-well plates.
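Consecutive numbering and plate-based aliquoting can be tracked programmatically. The Python sketch below assigns consecutive codes, places samples into 96-well positions in row order, and counts how many full aliquots of a given volume a specimen yields. The function names, the code format, and the example identifiers are hypothetical, and the key linking codes to patient identifiers would need to be stored separately from the experimental records.

```python
def assign_sample_codes(original_ids: list[str], prefix: str = "S") -> dict[str, str]:
    """Map each original identifier to a consecutive anonymous code (S0001, S0002, ...)."""
    if len(set(original_ids)) != len(original_ids):
        raise ValueError("duplicate sample identifiers")
    return {orig: f"{prefix}{i:04d}" for i, orig in enumerate(original_ids, start=1)}


def well_for_index(index: int) -> str:
    """96-well position (rows A-H, columns 1-12) for a zero-based sample index."""
    if not 0 <= index < 96:
        raise ValueError("a 96-well plate holds indices 0-95")
    return f"{'ABCDEFGH'[index // 12]}{index % 12 + 1}"


def aliquot_count(total_volume_ul: float, aliquot_volume_ul: float) -> int:
    """Number of full aliquots obtainable from a specimen."""
    return int(total_volume_ul // aliquot_volume_ul)


codes = assign_sample_codes(["hospital-ID-17", "hospital-ID-42"])  # hypothetical IDs
print(codes)                  # {'hospital-ID-17': 'S0001', 'hospital-ID-42': 'S0002'}
print(well_for_index(12))     # B1
print(aliquot_count(50, 12))  # 4
```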

12. Scanning. The fluorescence signal from the microarrays is detected using a microarray scanner, and the image data are quantified with the GenePix Pro 3.0 software (Axon Instruments). The local background in each color channel is subtracted from the signal at each antibody spot, and spots with obvious defects, no signal detectable by GenePix, or low net fluorescence in either color channel are removed from analysis. The ratio of the net signal in the sample-specific channel to the net signal in the reference-specific channel is calculated for each antibody spot, and the ratios from replicate antibody measurements on each array are averaged. An intensity-dependent normalization algorithm for antibody arrays is recommended.
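The spot-level processing described in this note consists of background subtraction, removal of unusable spots, ratio calculation, and averaging of replicates. The short Python sketch below illustrates these steps. The record layout (keys such as `sample_fg` and `ref_bg`) and the `min_net` cut-off for "low net fluorescence" are illustrative assumptions. They are not GenePix column names or recommended values.

```python
from collections import defaultdict
from statistics import mean


def spot_ratios(spots, min_net=100.0):
    """Average sample/reference net-signal ratio per antibody over replicate spots.

    Each spot is a dict with keys 'antibody', 'sample_fg', 'sample_bg',
    'ref_fg', 'ref_bg' and 'flagged' (True for defective or undetected spots).
    min_net is a hypothetical cut-off for low net fluorescence.
    """
    ratios = defaultdict(list)
    for s in spots:
        if s["flagged"]:
            continue  # obvious defect or no detectable signal
        net_sample = s["sample_fg"] - s["sample_bg"]  # subtract local background
        net_ref = s["ref_fg"] - s["ref_bg"]
        if net_sample < min_net or net_ref < min_net:
            continue  # low net fluorescence in either channel
        ratios[s["antibody"]].append(net_sample / net_ref)
    # average the ratios of replicate spots for each antibody
    return {antibody: mean(values) for antibody, values in ratios.items()}
```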

Some of the particulars of the scanning method will depend on the instrument, 

but some general principles may be followed. Scanning of an experiment set 

should be performed immediately after incubation of the microarrays and all on 

the same day, if possible, to minimize noise introduced by variable breakdown of 

dye on the array (particularly Cy5). The microarrays should be kept in the dark 

to minimize bleaching of fluorescent dyes. Scanners typically have adjustments 

for laser power, detector gain, and scan rate. Set both lasers to about 95% and adjust the scanner to achieve the desired signal intensities. Adjust the laser power so that at least 50% of the pixels of each spot are saturated. The laser power should almost always be set very close to the maximum, since even the maximum powers of small commercial scanners are less than optimal. Lower scan rates generally produce higher signal-to-noise ratios, but the time needed to scan large sets of arrays places a practical lower limit on the scan rate; scanning is therefore performed at either 50% or 25% speed, depending on time constraints. To find the optimal scanner settings, set the laser power close to maximum, set the scan rate to the lowest acceptable value, and then raise the detector gain as high as possible without producing signal saturation in the data.

When scanning a large set of arrays as part of a single experiment set, it is 

desirable to use similar settings for all the arrays to minimize the differences


in conditions between the arrays. It may not always be possible to use the 

same settings for every slide due to great variations in signal and background 

strengths, but subsequent normalization should readjust the data accordingly. 

Scanned images are typically stored as TIFF files for analysis by microarray analysis programs. It is advisable to name the scanned images by slide number, followed by the channel (Cy3 or Cy5) and the date of scanning.
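A minimal sketch of the suggested naming convention follows. The `slide` prefix, three-digit zero padding, underscores, and ISO date format are assumptions chosen for illustration. Any scheme that keeps slide number, channel, and date in a fixed order works the same way.

```python
from datetime import date


def scan_filename(slide_number: int, channel: str, scan_date: date) -> str:
    """Name a scanned TIFF image by slide number, channel and scan date."""
    if channel not in ("Cy3", "Cy5"):
        raise ValueError("channel must be 'Cy3' or 'Cy5'")
    return f"slide{slide_number:03d}_{channel}_{scan_date.isoformat()}.tif"
```

For example, `scan_filename(12, "Cy5", date(2007, 3, 14))` returns `"slide012_Cy5_2007-03-14.tif"`. Zero padding keeps the files in slide order when they are sorted alphabetically.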

13. Gridding and rejection of data points. The analysis of scanned microarray data depends somewhat on whether the experiment is a one-color or a two-color direct-labeling experiment. In all experiment types, the image data first need to be converted into numbers. Various software programs supplied with current scanners, such as GenePix with Axon scanners and ArrayQuant with PerkinElmer scanners, accomplish this. The details of using such programs are not discussed here, but the principles these programs use are outlined.

The quantification of microarray data begins with loading the scanned images 

(usually in tiff format) into an analysis program and overlaying a grid that defines 

the locations of the antibody spots. After aligning the grid to the image data, 

the program calculates the intensities and various statistics for image areas both inside and outside the spots. The user can “flag” or reject spots that show obvious gross defects. Spots with very low intensity in one or both color channels yield unreliable data and should be rejected. It is especially important to reject low-intensity spots in two-color ratio experiments, since noisy low-intensity data can greatly distort the ratio. Statistical criteria for rejecting low-intensity spots are preferable to user judgment. A threshold can be defined based on the overall variation of the background on the arrays: the median signal intensity of each spot should be at least three standard deviations (of the background areas) above the local background median intensity. This objective criterion provides a uniform, statistically based standard for all data.
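The three-standard-deviation rejection rule can be written as a simple test. In this Python sketch, the background SD is pooled from background areas across the array, as described above. The comparison uses the spot's own local background median. The function names and the choice of raw pixel lists as inputs are assumptions.

```python
from statistics import median, stdev


def background_sd(pooled_background_pixels):
    """SD of pixel intensities pooled from background areas across the array."""
    return stdev(pooled_background_pixels)


def passes_threshold(spot_pixels, local_background_pixels, bg_sd, k=3.0):
    """True if the spot median is at least k background SDs above the local
    background median (k = 3 follows the criterion in the text)."""
    return median(spot_pixels) >= median(local_background_pixels) + k * bg_sd
```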

14. Normalization of data. The signals obtained from each array need to be 

corrected or normalized for possible changes in the overall signal intensity due 

to factors such as scanner settings and dye labeling efficiency. This process 

uses signals from antibodies targeting an internal standard of known concentration. 

Antibodies against proteins commonly expressed in serum, such as 

immunoglobulin isotypes, albumin, or C-reactive protein, can be utilized as 

internal controls. For each array, a normalization factor is calculated that sets the data from the normalization antibodies to their expected or known values. A highly specific and quantitatively accurate antibody is required to measure the normalization protein. The protein standards can either be present naturally in the sample or be spiked in. A spiked-in standard that works well is FLAG-labeled BSA; FLAG is a widely used peptide tag for which commercial labeling kits are available. Other tags, such as DNP, can also work well.
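The following rough sketch shows per-array normalization against internal standards. The factor rescales each array so that the normalization antibodies read at their expected values, and the same factor is then applied to every antibody on that array. When several standards are present, this sketch combines them with a geometric mean of their expected/measured ratios. That choice is an assumption made for illustration, since the text does not specify how multiple standards are combined.

```python
from statistics import geometric_mean


def normalization_factor(measured, expected):
    """Per-array factor that maps the normalization antibodies onto their expected values.

    measured, expected: dicts keyed by normalization antibody. Several standards
    are combined with a geometric mean of expected/measured (an assumption).
    """
    return geometric_mean([expected[ab] / measured[ab] for ab in expected])


def normalize_array(values, factor):
    """Apply the array's normalization factor to every antibody on that array."""
    return {ab: v * factor for ab, v in values.items()}
```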

Normalization is recommended to be based on an intensity-dependent 

algorithm as follows (24). In this case, the local background in each color 

channel is subtracted from the signal at each antibody spot, and spots having 

obvious defects, no detectable signal by GenePix, or a low net fluorescence


in either color channel are removed from analysis. The ratio of net signal 

from the sample-specific channel to the net signal from the reference-specific 

channel is calculated for each antibody spot, and ratios from replicated antibody 

measurements in the same array are averaged. It is common to plot the red (Cy5) versus green (Cy3) channel intensities as a scatter plot to examine their distribution; however, transforming the data to fold change versus average intensity displays them in a more easily readable form. If $I_{red}$ is the background-subtracted red channel intensity and $I_{green}$ is the background-subtracted green channel intensity, then the following variables are defined:

$$R = \frac{I_{red}}{I_{green}}, \qquad A = \sqrt{I_{red} \times I_{green}}$$

where $R$ is simply the fold-change ratio and $A$ is the average intensity (the geometric mean, which is equivalent to averaging the log intensities). Curvature in the scatter plot indicates that the ratio $R$ depends on the overall intensity. The fitted curve $c(A)$ is then used to normalize the data:

$$\log\frac{I_{red}}{I_{green}} \rightarrow \log\frac{I_{red}}{I_{green}} - c(A)$$

This is equivalent to multiplying the green channel intensity (or dividing the red) by an intensity-dependent normalization constant $k(A)$, where $\log k(A) = c(A)$. In optimally normalized data, the plot of log ratio versus intensity should be horizontal and centered on zero (24).
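The intensity-dependent correction can be sketched in Python with NumPy. Two assumptions should be noted. Base-2 logarithms are used, and a low-order polynomial fit of log R on log A (`numpy.polyfit`) stands in for whatever smoothing curve is used to estimate c(A). The sketch illustrates the idea and does not reimplement the cited method.

```python
import numpy as np


def intensity_normalize(i_red, i_green, degree=2):
    """Intensity-dependent normalization of two-color antibody array data.

    i_red, i_green: background-subtracted intensities (must be positive).
    Returns (log2 A, normalized log2 R), where the fitted curve c(A) has been
    subtracted from log2 R. A polynomial in log2 A is an assumed stand-in for
    the smoothing curve used to estimate c(A).
    """
    i_red = np.asarray(i_red, dtype=float)
    i_green = np.asarray(i_green, dtype=float)
    log_r = np.log2(i_red / i_green)            # log R, fold change
    log_a = np.log2(np.sqrt(i_red * i_green))   # log A, geometric-mean intensity
    coeffs = np.polyfit(log_a, log_r, degree)   # estimate c(A)
    c_of_a = np.polyval(coeffs, log_a)
    return log_a, log_r - c_of_a                # log R - c(A)
```

Subtracting c(A) on the log scale is the same as multiplying `i_green` by k(A) = 2**c(A), which matches the equivalence stated above.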

15. Data analysis. A critical step in using quantitative data obtained with antibody arrays is establishing a filtering process to assess data quality. The conceptual similarity of label-based antibody arrays to two-color competitive-detection genomic arrays has allowed normalization and data analysis tools classically used for cDNA arrays to be applied to protein profiling with antibody arrays (24). To measure multiple proteins simultaneously with high sensitivity, specificity, reproducibility, and quantitative accuracy over large concentration ranges, quality control issues must be considered in the design of the arrays (1,4,9). Filtering and data analysis procedures subsequently assess the linearity, calibration, and specificity of the antibodies, as well as whether labeling and/or hybridization protocols are adequately optimized to ensure high signal-to-noise ratios (3,24). The very first level of quality control concerns the experimental design of antibody array printing, which should include multiple replicate spots distributed across the entire surface of the array, as well as controls in every experiment to evaluate the intra- and inter-assay reproducibility of the measurements (1,4,9).

The array should also include appropriate controls to test for potential antibody interference and cross-reactivity. In this regard, the quantity of antibody spotted can be used to standardize the antigen concentration. It is

possible to use an internally controlled system where one color represents the 

amount of antibody spotted, and the other color represents the amount of antigen 

that is used to quantify the level of protein expression. This normalization for 

antibody spot intensity can decrease variability and lower the limits of detection 

of antibody arrays. 

The initial control of scanned data is at the spot level using the scanner 

software, e.g., GenePix (24). The customized report created can be utilized 

to analyze the quality of spots, and it is then possible to flag those spots of


low quality. The criteria for flagging spots may include the number of standard deviations from background, the $R^2$, or the percent saturation (3,24). At the array level, quality control includes normalization of the array, as well as calculation of the average and standard deviation of the intensities of each antibody across its replicates on the slide (3–24). Spots with a high standard deviation between replicates can be filtered out. Normalization of the arrays can be performed using the average intensity of each array (24), protein standards such as immunoglobulin G (1,21), or internal controls based on antibody spot intensity (31).
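Replicate filtering at the array level can be sketched as follows. The sketch uses the coefficient of variation (SD divided by the mean) rather than the raw SD, and it applies a cut-off of 0.2. Both choices are illustrative assumptions. The text only states that antibodies whose replicates show a high standard deviation can be removed.

```python
from statistics import mean, stdev


def replicate_summary(replicates, max_cv=0.2):
    """Mean, SD and coefficient of variation per antibody over its replicate spots.

    replicates: dict antibody -> list of (normalized) spot intensities.
    'keep' is False when the CV exceeds the hypothetical max_cv cut-off.
    """
    summary = {}
    for antibody, values in replicates.items():
        m = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0  # stdev needs >= 2 points
        cv = sd / m if m else float("inf")
        summary[antibody] = {"mean": m, "sd": sd, "cv": cv, "keep": cv <= max_cv}
    return summary
```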

In the next level of data filtering, each experiment set is compared, and the results are calibrated to a dilution series by a best-fit line, removing data with high variability. The results can also be correlated with independent measurements obtained through enzyme-linked immunosorbent assays (ELISA) available for targets included in the antibody arrays. At this step, if the dilution series for an antibody performs poorly, the antibody can be flagged. It is also possible to set expression thresholds for an antibody, specifying a maximum and minimum ratio for spots to be considered in further analyses (24). This step is critical because it filters the input data based on the standard deviation between replicate spots, and the output data based on the standard deviation of dilution experiments. The last level of quality control is the comparison of independent experiment sets based on internal controls, which allows experiments performed on different days to be compared. The combined use of unsupervised and supervised methods can identify protein patterns associated with disease progression and clinical outcome.
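The dilution-series check and the ratio thresholds can be sketched with an ordinary least-squares line. In this sketch, an antibody is flagged when the coefficient of determination of its best-fit line falls below a hypothetical `min_r2` of 0.9. Fitting raw rather than log-transformed values, and the specific cut-offs, are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct dilution levels")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r2


def flag_dilution_series(series, min_r2=0.9):
    """Return antibodies whose dilution series fits a straight line poorly.

    series: dict antibody -> (dilution levels, measured signals).
    """
    return sorted(ab for ab, (x, y) in series.items() if fit_line(x, y)[2] < min_r2)


def within_ratio_limits(ratio, min_ratio, max_ratio):
    """Keep a spot only if its ratio lies between the antibody's min and max thresholds."""
    return min_ratio <= ratio <= max_ratio
```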

16. One should be aware that research procedures using antibody arrays have limitations associated with false-positive and false-negative results, which may be overcome using different strategies. Causes of false-negative results on antibody arrays include: (1) degradation of the protein product by serum proteases during sample handling; and (2) interference in the antibody-antigen binding process, resulting in low detection of the target protein. The specificity of the targets for bladder cancer progression is addressed by immunohistochemistry and by using antibodies targeting different epitopes. The specificity of antigen-antibody binding is assessed by reverse protocols, printing purified proteins, and Western blots. Addition of protease inhibitors and serum storage at –80°C will avoid protein degradation during sample handling, and aliquoting the serum avoids the degradation associated with repeated freeze-thaw cycles. Modifications of amplification protocols, such as rolling-circle amplification, may increase signal detection.

Similarly, the causes of false-positive results on antibody arrays include: (A) binding of the antibody to non-specific molecules or to degradation products of the target protein; (B) gelatin or protein-related additives in the antibodies printed onto the arrays; (C) the presence of heterophilic antibodies in serum samples; and (D) non-specific binding of antibodies present in patients with autoimmune or other diseases. False-positive results can be addressed in several ways. Cross-reactivity can be overcome by selecting alternative antibodies directed to other epitopes (A) or by using preservatives without gelatin (B). In cases C and D, the interference and recovery experiments proposed for the analytical validation of antibodies, using dilution and recovery coefficients, will estimate the amount of interference. Clinical records on other coexisting diseases in the patients analyzed, enzyme immunoassays, and immunohistochemical analyses will help interpret unexpected results. The specificity of antigen-antibody binding can be assessed by reverse protocols, printing purified proteins, and Western blots.
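The dilution and recovery coefficients mentioned above are commonly computed as percentages, as in the following Python sketch. These formulas are the usual definitions from immunoassay validation. They are given here as an assumption, not as the exact protocol of this chapter. Values far from 100% suggest matrix interference, for example from heterophilic antibodies.

```python
def spike_recovery(unspiked, spiked, added):
    """Percent of a known amount of analyte added to serum that is measured back."""
    return 100.0 * (spiked - unspiked) / added


def dilution_recovery(neat, diluted, dilution_factor):
    """Percent of the undiluted (neat) value recovered after correcting for dilution."""
    return 100.0 * diluted * dilution_factor / neat
```

For example, a serum that reads 10 units before and 55 units after a 50-unit spike gives 90% recovery.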

5. Final Remarks 

The methods and applications of antibody arrays are increasing in scope 

and effectiveness. The current and new antibody array formats that may be 

developed in the near future are likely to markedly accelerate the rate of 

biomarker discovery and characterization of cancer-specific pathways that will 

eventually lead to the development of individualized therapies that take into 

account markers of disease predisposition and therapeutic response. However, multiple challenges remain in the design and application of antibody arrays (33–35): (1) poor understanding of protein immobilization; (2) limited dynamic ranges of no more than three orders of magnitude; (3) achieving accuracy and reproducibility similar to those of clinical immunoassays; (4) molecular protein complexity and denaturation affecting immunoreactivity; (5) lack of standards and calibrators; and (6) development of high-affinity, specific antibodies for target antigens. These challenges are being addressed by the multi-institutional effort of the Human Proteome Organization (HUPO) toward standardizing critical parameters in serum and plasma proteomic analyses. Initial studies provide guidance on pre-analytical variables that can alter the analysis of blood-derived samples, including choice of sample type, stability during storage, use of protease inhibitors, and clinical standardization (33; see also Chapter 2). As part of the HUPO approach, it is also critical to standardize the statistical strategies for high-confidence protein identification and data analysis. These efforts and strategies for integrating proteomic datasets should lead toward an accurate and comprehensive representation of human proteomes (34,35).

References 

1. Haab BB, Dunham MJ, Brown PO. (2001). Protein microarrays for highly parallel 

detection and quantitation of specific proteins and antibodies in complex solutions. 

Genome Biol. 2, research0004.1–0004.13.

2. Chan SM, Ermann J, Su L, Fathman CG, Utz PJ. (2004). Protein microarrays for 

multiplex analysis of signal transduction pathways. Nat Med. 10, 1390–6.


3. Sanchez-Carbayo M. (2006). Antibody arrays: technical considerations and clinical 

applications in cancer. Clin Chem. 52, 1651–9. 

4. Barry R, Diggle T, Terrett J, Soloviev M. (2003). Competitive assay formats for 

high-throughput affinity arrays. J Biomol Screen. 8, 257–63. 

5. Pang S, Smith J, Onley D, Reeve J, Walker M, Foy C. (2005). A comparability 

study of the emerging protein array platforms with established ELISA procedures. 

J Immunol Meth. 302, 1–13. 

6. Lash GE, Scaife PJ, Innes BA, Otun HA, Robson SC, Searle RF, Bulmer 

JN. (2006). Comparison of three multiplex cytokine analysis systems: Luminex, 

SearchLight and FAST Quant. J Immunol Meth. 309, 205–8. 

7. de Jager W, Rijkers GT. (2006). Solid-phase and bead-based cytokine immunoassay: 

a comparison. Methods 38, 294–303. 

8. Waterboer T, Sehr P, Pawlita M. (2006). Suppression of non-specific binding in 

serological Luminex assays. J Immunol Methods. 309, 200–4. 

9. Kingsmore SF. (2006). Multiplexed protein measurement: technologies and applications 

of protein and antibody arrays. Nat Rev Drug Discov. 5, 310–21. 

10. Wang X, Yu J, Sreekumar A, Varambally S, Shen R, Giacherio D, Mehra R, Montie 

JE, Pienta KJ, Sanda MG, Kantoff PW, Rubin MA, Wei JT, Ghosh D, Chinnaiyan 

AM. (2005). Autoantibody signatures in prostate cancer. N Engl J Med. 353, 1224–35. 

11. Anderson KS, LaBaer J. (2005). The sentinel within: exploiting the immune system 

for cancer biomarkers. J Proteome Res. 4, 1123–33. 

12. Petricoin EF III, Bichsel VE, Calvert VS, Espina V, Winters M, Young L, Belluco 

C, Trock BJ, Lippman M, Fishman DA, Sgroi DC, Munson PJ, Esserman LJ, 

Liotta LA. (2005). Mapping molecular networks using proteomics: a vision for 

patient-tailored combination therapy. J Clin Oncol. 23, 3614–21. 

13. Angenendt P, Glokler J, Murphy D, Lehrach H, Cahill DJ. (2002). Toward 

optimized antibody microarrays: a comparison of current microarray support 

materials. Anal Biochem. 309, 253–60. 

14. Espina V, Woodhouse EC, Wulfkuhle J, Asmussen HD, Petricoin EF III, Liotta 

LA. (2004). Protein microarray detection strategies: focus on direct detection 

technologies. J Immunol Methods. 290, 121–33. 

15. Levit-Binnun N, Lindner AB, Zik O, Eshhar Z, Moses E. (2003). Quantitative 

detection of protein arrays. Anal Chem. 75, 1436–41. 

16. Pawlak B, Gordon R. (2005). Density estimation for positron emission tomography. 

Technol Cancer Res Treat. 4, 131–42. 

17. Schweitzer B, Roberts S, Grimwade B, Shao W, Wang M, Fu Q, Shu Q, Laroche 

I, Zhou Z, Tchernev VT, Christiansen J, Velleca M, Kingsmore SF. (2002). 

Multiplexed protein profiling on microarrays by rolling-circle amplification. Nat 

Biotechnol. 20, 359–65. 

18. Pasternack RF, Collings PJ. (1995). Resonance light scattering: a new technique 

for studying chromophore aggregation. Science. 269, 935–9. 

19. Stich N, Gandhum A, Matyushin V, Raats J, Mayer C, Alguel Y, Schalkhammer T. 

(2002). Phage display antibody-based proteomic device using resonance-enhanced 

detection. J Nanosci Nanotechnol. 2, 375–81.


20. Lindskog M, Rockberg J, Uhlen M, Sterky F. (2005). Selection of protein epitopes 

for antibody production. Biotechniques. 38, 723–7. 

21. Miller JC, Zhou H, Kwekel J, Cavallo R, Burke J, Butler EB, Teh BS, Haab BB. 

(2003). Antibody microarray profiling of human prostate cancer sera: antibody 

screening and identification of potential biomarkers. Proteomics. 3, 56–63. 

22. Zhou H, Bouwman K, Schotanus M, Verweij C, Marrero JA, Dillon D, Costa J, 

Lizardi P, Haab BB. (2004). Two-color, rolling-circle amplification on antibody 

microarrays for sensitive, multiplexed serum-protein measurements. Genome Biol. 

5, R28. 

23. Shao W, Zhou Z, Laroche I, Lu H, Zong Q, Patel DD, Kingsmore S, Piccoli SP. 

(2003). Optimization of rolling-circle amplified protein microarrays for multiplexed 

protein profiling. J Biomed Biotechnol. 5, 299–307. 

24. Sanchez-Carbayo M, Socci ND, Lozano JJ, Haab BB, Cordon-Cardo C. (2006). 

Profiling bladder cancer using targeted antibody arrays. Am J Pathol. 168, 93–103. 

25. Saviranta P, Okon R, Brinker A, Warashina M, Eppinger J, Geierstanger BH. 

(2004). Evaluating sandwich immunoassays in microarray format in terms of the 

ambient analyte regime. Clin Chem. 50, 1907–20. 

26. Huang R, Lin Y, Shi Q, Flowers L, Ramachandran S, Horowitz IR, Parthasarathy 

S, Huang RP. (2004). Enhanced protein profiling arrays with ELISA-based amplification 

for high-throughput molecular changes of tumor patients’ plasma. Clin

Cancer Res. 10, 598–609. 

27. Varnum SM, Woodbury RL, Zangar RC. (2004). A protein microarray ELISA for 

screening biological fluids. Methods Mol Biol. 264, 161–72. 

28. Gembitsky DS, Lawlor K, Jacovina A, Yaneva M, Tempst P. (2004). A prototype 

antibody microarray platform to monitor changes in protein tyrosine phosphorylation. 

Mol Cell Proteomics. 3, 1102–18. 

29. Janzi M, Odling J, Pan-Hammarstrom Q, Sundberg M, Lundeberg J, Uhlen M, 

Hammarstrom L, Nilsson P. (2005). Serum microarrays for large scale screening 

of protein levels. Mol Cell Proteomics. 4, 1942–7. 

30. MacBeath G, Schreiber SL. (2000). Printing proteins as microarrays for high-throughput

function determination. Science. 289, 1760–3. 

31. Knezevic V, Leethanakul C, Bichsel VE, Worth JM, Prabhu VV, Gutkind JS, 

Liotta LA, Munson PJ, Petricoin EF III, Krizman DB. (2001). Proteomic profiling

of the cancer microenvironment by antibody arrays. Proteomics. 1, 1271–8. 

32. Arenkov P, Kukhtin A, Gemmell A, Voloshchuk S, Chupeeva V, Mirzabekov A. 

(2000). Protein microchips: use for immunoassay and enzymatic reactions. Anal 

Biochem. 278, 123–31.

33. Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD, Mehigh 

RJ, Cockrill SL, Scott GB, Tammen H, Schulz-Knappe P, Speicher DW, Vitzthum 

F, Haab BB, Siest G, Chan DW. (2005). HUPO Plasma Proteome Project specimen 

collection and handling: towards the standardization of parameters for plasma 

proteome samples. Proteomics. 5, 3262–77. 

34. States DJ, Omenn GS, Blackwell TW, Fermin D, Eng J, Speicher DW, Hanash 

SM. (2006). Challenges in deriving high-confidence protein identifications from


data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol. 

24, 333–8. 

35. Uhlen M, Bjorling E, Agaton C, Szigyarto CA, Amini B, Andersen E, Andersson 

AC, Angelidou P, Asplund A, Asplund C, Berglund L, Bergstrom K, Brumer 

H, Cerjan D, Ekstrom M, Elobeid A, Eriksson C, Fagerberg L, Falk R, Fall J, 

Forsberg M, Bjorklund MG, Gumbel K, Halimi A, Hallin I, Hamsten C, Hansson 

M, Hedhammar M, Hercules G, Kampf C, Larsson K, Lindskog M, Lodewyckx 

W, Lund J, Lundeberg J, Magnusson K, Malm E, Nilsson P, Odling J, Oksvold P, 

Olsson I, Oster E, Ottosson J, Paavilainen L, Persson A, Rimini R, Rockberg J, 

Runeson M, Sivertsson A, Skollermo A, Steen J, Stenvall M, Sterky F, Stromberg 

S, Sundberg M, Tegel H, Tourle S, Wahlund E, Walden A, Wan J, Wernerus H, 

Westberg J, Wester K, Wrethagen U, Xu LL, Hober S, Ponten F. (2005). A human 

protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell 

Proteomics. 4, 1920–32.

V

Statistics and Bioinformatics in Clinical Proteomics Data Analysis

16 

2D-PAGE Maps Analysis 

Emilio Marengo, Elisa Robotti, and Marco Bobba 

Summary 

Because of the low reproducibility of 2D gel electrophoresis and the complex maps this technique provides, effective and robust methods for the comparison and classification of 2D maps are fundamental to the development of automated diagnostic methods. A review of classical and recently developed methods for the comparison of 2D maps is presented here. The methods reviewed cover both the analysis of spot volume datasets through multivariate statistical tools (pattern recognition methods, cluster analysis, and classification methods) and the analysis of 2D map images through fuzzy logic, three-way PCA, and moment functions.

The theoretical basis of each procedure is briefly introduced, together with a review 

of the most interesting applications present in recent literature. 

Key Words: principal component analysis; cluster analysis; classification; SIMCA; 

image analysis; moment functions; fuzzy logic; three-way PCA; multidimensional scaling; 

spot volume data. 


1. Introduction

The development of new and effective methods for identifying differences between groups of 2D-PAGE maps is one of the frontiers of proteomics, with a view to developing reliable diagnostic and prognostic tools. The comparison of sets of 2D maps is in fact not a trivial problem, owing to some experimental limitations of 2D gel electrophoresis. In spite of being a very powerful tool for separating the proteins in cellular extracts, 2D gel electrophoresis is characterized by quite low reproducibility:




this limit is dictated both by the specificity of the specimen and by the instrumental procedure employed to obtain the final electrophoretic maps. In fact, the biological samples analyzed often contain complex protein mixtures covering a wide range of structures, properties, and molecular weights. The complexity of the sample is reflected in the complexity of the final map, which may contain hundreds or thousands of spots, with further spurious spots appearing because of impurities or side reactions. The second factor reducing reproducibility in 2D gel electrophoresis is the instrumental technique itself, from sample preparation to the electrophoretic run. Sample pre-treatment follows a multi-step procedure consisting of several purification and extraction steps, which increases the overall experimental uncertainty. In addition, the final result depends strongly on a great number of instrumental factors that must be kept under strict control: polymerization conditions, temperature, running conditions, and time and temperature during the staining and de-staining steps. An unexpected or random variation in one or more of these instrumental parameters can strongly affect the reproducibility of the position, size, and intensity of the spots on the final map.

The large number of spots on each map and the low reproducibility of 2D gel electrophoresis hinder a clear classification of samples and make it quite difficult to use 2D-PAGE maps for diagnostic and prognostic purposes or for drug-design studies. From this perspective, effective and robust methods for the comparison and classification of 2D maps are key to developing automated diagnostic tools based on proteomics. To account for the low reproducibility of the experimental protocol, sets of replicate 2D maps are usually run and compared.

The classical analysis of 2D-PAGE maps is usually carried out with dedicated software packages, which are briefly described here. The second part of the chapter focuses on the use of multivariate statistical tools for a more effective analysis of the so-called “spot volume datasets” produced by software packages dedicated to 2D-PAGE image analysis.

The final part of the chapter is devoted to the most advanced applications of image analysis tools for the study and classification of 2D maps; these methods are based either on fuzzy logic principles coupled with multivariate statistical tools or on the calculation of mathematical moments of the images.

2. Gel Analysis Via Dedicated Software Packages 

The analysis of sets of 2D maps is usually carried out with dedicated software packages; among the most popular are PDQuest, Progenesis, Melanie, Z3, Phoretix, and Z4000, although many other solutions are commercially available.


Many papers have appeared in the last decade on the development of software packages (1,2,3), the comparison of the performance of different packages (4,5), and in-depth study of particular topics such as point pattern matching, reproducibility, matching efficiency, and spot overlapping (6–16).

All software solutions presently available analyze sets of 2D maps based on digitized images of the gels, obtained by laser densitometry, phosphor imaging, or a CCD camera. The analysis of digitized images involves several steps, described here in more detail with particular reference to one of the most widely used packages, the PDQuest system (17,18,19):

1. Scanning. Gel images are converted into pixel data; each pixel is characterized by a pair of x–y coordinates indicating its position on the 2D image and a Z value corresponding to its signal intensity. Each map is thus turned into a series of pixels described by their optical density (OD) values.

2. Filtering images. This step performs a pre-processing of gel images, allowing the 

elimination of noise, background effects, specks, and other imperfections. 

3. Automated spot detection. Spot detection involves identifying the spots present on each gel independently. The operator has to select the faintest spot (to set the sensitivity and minimum peak value parameters), the smallest spot (to set the size scale parameter), and the largest spot to be detected. A final smoothing is applied to remove spots close to the background level. Spots are then located on the gel image (i.e., each spot is identified by a pair of x–y coordinates indicating its position on the gel), fitted by ideal Gaussian distributions, and quantified by the sum of the OD values within each Gaussian distribution.

4. Matching of protein profiles. Sets of 2D gels are then edited and matched to one 

another in a “match set.” Each identified spot is matched to the same spot in 

all the other gels of the set under investigation. For this purpose, landmarks are

needed, consisting of reference spots used by PDQuest to align and position the 

match set members for matching. The identification of the landmarks sets some 

parameters accounting for distortions existing among the gels to be compared. 

5. Normalization. Normalization is then applied to the maps to compensate for gel-to-gel variations due to sample preparation and loading, staining and de-staining procedures, etc.

6. Differential analysis. This step allows the comparison of different sets of 2D maps, e.g., control and diseased samples. For each group of 2D maps, a “sample group” is created containing the average values of all the spots identified. Once the sample groups have been created (e.g., control and diseased samples), the groups are compared to find differentially expressed proteins. Usually, only spots showing a two-fold variation (100% variation) are accepted as significantly changed.

7. Statistical analysis. Statistical analysis is then applied to the differentially 

expressed proteins. It is usually based on Student’s t-test (p


The final result of the overall procedure therefore depends strongly on the accuracy of the software package adopted, so the choice of the most suitable analysis software is critical.
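Steps 5 and 6 can be illustrated with a small Python sketch. It assumes total-volume normalization, in which each spot volume is divided by the summed volume of all spots on its gel. This is only one of several possible normalization options. The sketch then compares group means of spots matched across all gels against the two-fold criterion. It does not reproduce PDQuest's internal algorithms, and the function names are hypothetical.

```python
from statistics import mean


def normalize_gel(volumes):
    """Total-volume normalization: each spot volume divided by the gel's summed volume."""
    total = sum(volumes.values())
    return {spot: v / total for spot, v in volumes.items()}


def two_fold_spots(control_gels, disease_gels, fold=2.0):
    """Spots whose mean normalized volume differs by at least `fold` between groups.

    Each gel is a dict spot_id -> raw (positive) volume; only spots matched in
    every gel are compared. Returns spot_id -> disease/control ratio.
    """
    ctrl = [normalize_gel(g) for g in control_gels]
    dis = [normalize_gel(g) for g in disease_gels]
    common = set.intersection(*(set(g) for g in ctrl + dis))  # matched spots
    changed = {}
    for spot in sorted(common):
        ratio = mean(g[spot] for g in dis) / mean(g[spot] for g in ctrl)
        if ratio >= fold or ratio <= 1.0 / fold:  # up- or down-regulated
            changed[spot] = ratio
    return changed
```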

Commercial software packages, in spite of being powerful tools for image analysis, present two main disadvantages. The first is related to operator intervention, which is introduced mainly in steps 2 and 3. The second is related to the problem of replicates: different groups of 2D maps are compared on the basis of the “sample group” obtained for each class, i.e., a gel containing the average of the information common to all replicates. In this way, individual replicates are not considered, and information about the reproducibility of the maps is not properly taken into account.

3. Analysis of Spot Volume Datasets 

Spot volume datasets produced by the differential analysis in dedicated software (step 6 of the procedure described in Section 2) are particularly suitable for investigation by multivariate statistical tools, both because of their large dimensionality (a large number of spots identified on each map) and because of the difficulty of identifying the small differences between groups of maps when hundreds of spots are detected simultaneously on each sample.

From this point of view, multivariate statistical tools are the best alternative, since they provide a clear representation of the case under study by considering all the variables simultaneously, and they produce robust results, i.e., they reduce the contribution of experimental uncertainty. Among the statistical techniques recently and successfully applied to spot volume datasets are pattern recognition methods, e.g., Principal Component Analysis (PCA) and Cluster Analysis; classification methods, e.g., Linear Discriminant Analysis (LDA) and Soft Independent Modeling of Class Analogy (SIMCA); and regression methods, e.g., discriminant analysis–partial least squares regression (DA-PLS).

Data from spot volume datasets have a multivariate structure, in which several samples (maps) are described by a large number of variables (the identified spots). Multivariate data are usually arranged in matrices for statistical analysis. The datasets considered hereafter are arranged in data matrices of dimensions n × p, where n is the number of samples (one per row of the matrix) and p is the number of variables (one per column of the matrix).

3.1. Principal Component Analysis 

Principal Component Analysis (20,21) is a multivariate pattern recognition method that represents the objects, described by the original variables, in a new reference system defined by new variables called principal components (PCs; see also Chapter 17). Each PC explains the maximum possible amount of the variance left unexplained by the preceding PCs: the first PC explains the maximum amount of variance in the overall dataset, the second explains the maximum residual variance, and so on. Because the PCs are calculated hierarchically in this way, experimental noise and random variations tend to be concentrated in the last PCs.

The PCs maintain a strict relationship with the original reference system, since they are calculated as linear combinations of the original variables. They are also orthogonal to each other, so each carries non-redundant (uncorrelated) information (Fig. 1). The hierarchical way in which the PCs are calculated makes them useful for reducing the dimensionality of the original dataset: a large number of original variables can be replaced by a smaller number of significant PCs, which retain a relevant fraction of the overall variance of the original dataset while discarding most of the experimental uncertainty (accounted for by the last PCs).
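The construction of the PCs can be sketched in a few lines of Python with NumPy. This is a minimal illustration assuming the spot volumes are already arranged in an n × p matrix (maps in rows, spots in columns); the function name `pca_svd` and the random toy data are hypothetical, and the singular value decomposition is one standard way of computing PCs, not necessarily the one used by the authors' software. The scores are the coordinates of the maps on the PCs, the loadings are the coefficients of the linear combinations, and the fraction of variance explained decreases from the first PC to the last.

```python
import numpy as np


def pca_svd(X, n_components):
    """PCA of an n x p matrix (rows = maps, columns = spots) via SVD.

    Returns the scores (n x k), the loadings (p x k), and the fraction of the
    total variance explained by each of the k retained PCs.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each spot (column) on its mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T          # coefficients of the original variables
    scores = Xc @ loadings                  # coordinates of the maps on the PCs
    explained = s**2 / np.sum(s**2)         # sorted from largest to smallest
    return scores, loadings, explained[:n_components]


# Toy data: 24 "maps" x 50 "spots" of random numbers (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 50))
scores, loadings, explained = pca_svd(X, n_components=3)
print(explained)                             # decreasing fractions of the total variance
print(np.round(loadings.T @ loadings, 6))    # identity matrix: the PCs are orthonormal
```

Keeping only the first few columns of `scores` is the dimensionality reduction described above; how many PCs count as significant remains a choice for the analyst (e.g., based on the cumulative explained variance).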

Principal Component Analysis provides two main tools for data analysis: the 

scores and the loadings. The scores represent the coordinates of the samples 

in the new reference system, while the loadings represent the coefficients of 

the linear combination describing each PC, i.e., the weights of the original 

variables on each PC. The graphical representation of the scores in the space 

of the PCs allows the identification of groups of samples showing a similar 

behavior (samples close to one another in the graph) or different characteristics 

(samples far from each other). By looking at the corresponding loading plot, it 

is possible to identify the variables that are responsible for the analogies or the 

differences detected for the samples in the score plot. 

An example of loading and score plots is shown in Fig. 2. The data belong to four groups of 2D maps (24 maps described by more than 1000 spots). The score plot discriminates the four groups of samples, one group in each quadrant. The first PC discriminates samples C and A (negative scores on PC1) from samples B and AB (positive scores on PC1); PC2 separates samples C and B (positive scores on PC2) from samples A and AB (negative scores on PC2). The corresponding loading plot explains the separation of the samples into the four groups: sample C shows large intensities of the spots in the 2nd quadrant and small intensities of the spots in the 4th quadrant; sample A shows large intensities of the spots in the 3rd quadrant and very small intensities of those in the 1st quadrant; samples AB behave opposite to sample C, while samples B behave opposite to sample A.

Fig. 1. Construction of the principal components.

Fig. 2. Example of loadings (A) and score plots (B). (A) Loading plot of the spot variables on PC1 vs. PC2. (B) Score plot of the 24 maps (C1–C6, A1–A6, B1–B6, AB1–AB6) on PC1 vs. PC2.


For identifying groups of samples and variables in a dataset, PCA is a very powerful visualization tool, since it represents multivariate datasets by means of only the few PCs identified as the most relevant.

In proteomics, loadings are represented more effectively on a virtual 2D map. In proteomic datasets, each variable represents a spot, characterized by a pair of x–y coordinates defining its position on the 2D maps used for the analysis. The loadings of each PC can then be represented on a “virtual” 2D map, where each spot is drawn as a circle centered on its x–y position and colored on a scale whose tone deepens as the absolute value of its positive or negative loading increases. This representation was first proposed by Marengo et al. (22,23).
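A virtual 2D map only needs the x–y coordinates of each spot and its loading on the chosen PC. The sketch below, with a hypothetical function `virtual_map_tones` and made-up coordinates, splits the spots into a positive and a negative layer and scales the color tone of each spot by its absolute loading; it is an assumption about how such a map might be prepared, not the implementation of refs. 22 and 23.

```python
import numpy as np


def virtual_map_tones(x, y, loadings, threshold=0.0):
    """Split the loadings of one PC into positive and negative 'virtual map' layers.

    x, y      : spot coordinates on the 2D gel (one value per spot)
    loadings  : loading of each spot on the chosen PC
    threshold : spots with |loading| <= threshold are not drawn (hypothetical option)

    Returns two arrays with rows (x, y, tone); tone runs from 0 to 1 and grows
    with the absolute value of the loading (darker color = larger loading).
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, loadings))
    layers = []
    for mask in (w > threshold, w < -threshold):       # positive layer, then negative
        mag = np.abs(w[mask])
        tone = mag / mag.max() if mag.size else mag     # scale to [0, 1] within the layer
        layers.append(np.column_stack([x[mask], y[mask], tone]))
    return layers[0], layers[1]


# Four hypothetical spots with their gel coordinates and PC1 loadings
x = [10, 50, 120, 200]
y = [30, 90, 150, 60]
load = [0.05, -0.02, 0.01, -0.08]
pos, neg = virtual_map_tones(x, y, load)
# pos holds spots 1 and 3 (tones 1.0 and about 0.2); neg holds spots 2 and 4
```

The resulting arrays can be passed to any plotting library, e.g., as the `c` argument of `matplotlib.pyplot.scatter` with a grey or red colormap.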

An example is shown in Fig. 3, where the positive and negative loadings of the first PC are represented for the example of Fig. 2. This representation is clearer than the loading plot of Fig. 2 and allows immediate identification of the spots with the most relevant loadings (darker grey tones) on the corresponding PC.

3.2. Cluster Analysis 

Cluster analysis techniques are pattern recognition methods that help to identify groups of samples or of variables in a dataset by investigating the relationships between the objects or variables. Cluster analysis tools are unsupervised methods: the operator does not know the partition of the dataset and wants to identify potential groups of objects. In this respect, they differ from classification methods, where the operator knows the separation of objects into classes and wants to obtain the best classification of objects into the corresponding classes. The most widely used clustering methods are agglomerative hierarchical methods (24), in which objects are grouped (linked together) on the basis of a measure of their similarity, the most similar objects or groups of objects being linked first. The final result is a graph called a dendrogram: the objects are represented on the x axis and are connected at decreasing levels of similarity along the y axis. An example is reported in Fig. 4, referring to the dataset already presented in Figs. 2 and 3. The dendrogram first partitions the samples into two main groups, which are further separated into three groups at a dissimilarity level of 50%. The four groups present are identified only by a further horizontal cut at a dissimilarity level of 25%, counting the number of vertical lines intersected by the cut. Samples B and AB thus appear to be the most similar groups.
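The dendrogram and the horizontal cut can be reproduced with SciPy's hierarchical clustering routines. In this sketch (the function `cut_dendrogram` and the toy data are hypothetical), the cut level is expressed as a percentage of the largest linkage distance, mirroring the (D_link/D_max) × 100 scale of Fig. 4; `method="complete"` can be passed instead of Ward to compare linkage methods.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cut_dendrogram(X, level_percent, method="ward"):
    """Agglomerative clustering of the rows of X (maps) with Euclidean distances.

    The tree is cut at level_percent of the largest linkage distance, i.e., on a
    (D_link / D_max) * 100 scale. Returns one cluster label per map.
    """
    Z = linkage(np.asarray(X, dtype=float), method=method, metric="euclidean")
    d_max = Z[:, 2].max()                          # height of the last (highest) merge
    return fcluster(Z, t=level_percent / 100.0 * d_max, criterion="distance")


# Toy data: four groups of six maps around well-separated centers (illustrative only)
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [20, 0], [0, 20], [20, 20]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.1, size=(6, 2)) for c in centers])
labels = cut_dendrogram(X, level_percent=25)
print(labels)                                      # one label per map
```

Running the same function on the PCA scores of the significant PCs, rather than on the raw spot volumes, gives the variant discussed below.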


Fig. 3. Positive and negative loadings of PC1 represented on a virtual 2D map (panels “Positive Loadings” and “Negative Loadings”; both axes are spot coordinates from 0 to 220).


Fig. 4. Dendrogram (Ward method, Euclidean distances) of the 24 maps; the y axis shows the linkage distance as (D_link/D_max) × 100, and the leaves are grouped as AB, B, A, and C.

The results of hierarchical clustering methods depend on the measure of similarity and on the linkage method, so several methods are usually compared to get a general idea of the number of groups present. In general, the linkage methods that give the clearest groups are the Ward method and the complete linkage method. Euclidean distance is usually adopted as the measure of (dis)similarity.

Clustering techniques can be applied both to the original variables and to the results of PCA (the scores on the significant PCs); in the latter case, the samples are clustered on the useful sources of variation only, with the contribution of experimental error largely removed.

3.3. Classification Methods 

Classification methods are particularly suitable for the analysis of proteomic spot volume datasets, since the primary need in this application is to assign samples belonging to different groups (e.g., control and diseased individuals) to their proper class. The final aim is both the development of diagnostic tools and the identification of the differences existing between the classes, to shed light on the mechanism of action of a disease or of a new drug.

Here, two of the most exploited classification methods will be briefly 

described: LDA and SIMCA. 

3.3.1. Linear Discriminant Analysis 

Linear Discriminant Analysis (25,26) belongs to the so-called Bayesian classification methods, since it exploits Bayes' rule; it classifies the samples in a dataset on the basis of the dataset's multivariate structure.

In Bayesian classification methods, an object $x_i$ is assigned to the class $g$ for which the posterior probability $P(g \mid x_i)$ is maximum. The posterior probability is computed according to Bayes' formula:

$$P(g \mid x_i) = \frac{P_g \, f(x_i \mid g)}{\sum_{k=1}^{G} P_k \, f(x_i \mid k)}$$

where

$P_g$ is the prior probability of class $g$;

$P_k$ is the prior probability of class $k$ (the sum runs over all $G$ classes, including $g$);

$f(x_i \mid g)$ is the probability density function of class $g$ evaluated at $x_i$; and

$f(x_i \mid k)$ is the probability density function of class $k$ evaluated at $x_i$.
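Bayes' rule itself reduces to a normalization of the prior-weighted densities. The minimal sketch below assumes that the class densities $f(x_i \mid g)$ have already been evaluated at the object; the function name `posterior_probabilities` and the numbers are illustrative only.

```python
import numpy as np


def posterior_probabilities(priors, densities):
    """Bayes' rule: P(g|x) = P_g f(x|g) / sum_k P_k f(x|k).

    priors    : prior probability P_g of each class
    densities : class density f(x|g) of each class, evaluated at the object x
    """
    weighted = np.asarray(priors, dtype=float) * np.asarray(densities, dtype=float)
    return weighted / weighted.sum()


# Two classes with equal priors; the object is twice as likely under class 0
post = posterior_probabilities([0.5, 0.5], [0.2, 0.1])
print(post)           # approximately [0.667, 0.333]
print(post.argmax())  # 0: the object is assigned to class 0
```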

A common assumption is that each class is described by a multivariate Gaussian probability distribution, so that the prior-weighted density of class $g$ is:

$$P_g \, f(x_i \mid g) = \frac{P_g}{(2\pi)^{p/2} \, |S_g|^{1/2}} \exp\left[-\frac{1}{2}(x_i - c_g)^T S_g^{-1} (x_i - c_g)\right]$$

where:

$P_g$ is the prior probability of class $g$;

$S_g$ is the covariance matrix of class $g$;

$c_g$ is the centroid of class $g$; and

$p$ is the number of descriptors.

The quadratic form in the exponent,

$$(x_i - c_g)^T S_g^{-1} (x_i - c_g),$$

is the squared Mahalanobis distance between object $x_i$ and the centroid of class $g$. It takes into account the covariance structure of the class, since it contains the class covariance matrix, which describes the relationships among the variables within the class, i.e., the shape of the class.

Taking −2 times the logarithm of the posterior probability and eliminating the terms that are the same for all classes gives the so-called discriminant score; each object is assigned to the class $g$ for which this score is minimum:

$$D_g(x_i) = (x_i - c_g)^T S_g^{-1} (x_i - c_g) + \ln|S_g| - 2\ln P_g$$

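In LDA, the covariance matrix of each class is approximated by the pooled within-class covariance matrix $S$, so all the classes are considered to have a common shape, i.e., a weighted average of the shapes of the individual classes in the dataset. Since $\ln|S|$ is then the same for every class, it drops out of the discriminant score, and the boundaries between classes become linear.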
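The LDA rule can be sketched with NumPy, assuming that the priors are estimated as class frequencies and that the pooled within-class scatter is divided by n − G (both are assumptions; other choices exist). The hypothetical functions `lda_fit` and `lda_predict` compute the discriminant score with the common covariance matrix, from which ln|S| is dropped, and assign each object to the class with the smallest score. The pooled covariance must be invertible, which in practice means working on PCs or on a selected subset of spots when spots outnumber maps.

```python
import numpy as np


def lda_fit(X, y):
    """Class centroids, pooled within-class covariance, and priors (class frequencies)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    centroids = np.array([X[y == g].mean(axis=0) for g in classes])
    # Within-class scatter summed over all classes, divided by n - G
    scatter = sum((X[y == g] - c).T @ (X[y == g] - c) for g, c in zip(classes, centroids))
    pooled = scatter / (len(X) - len(classes))
    priors = np.array([np.mean(y == g) for g in classes])
    return classes, centroids, pooled, priors


def lda_predict(model, X):
    """Assign each row of X to the class with minimum discriminant score.

    With a common covariance S, ln|S| is identical for all classes and is dropped:
    D_g(x) = (x - c_g)^T S^-1 (x - c_g) - 2 ln P_g
    """
    classes, centroids, pooled, priors = model
    S_inv = np.linalg.inv(pooled)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    D = np.empty((len(X), len(classes)))
    for j, (c, prior) in enumerate(zip(centroids, priors)):
        diff = X - c
        D[:, j] = np.einsum("ij,jk,ik->i", diff, S_inv, diff) - 2.0 * np.log(prior)
    return classes[D.argmin(axis=1)]


# Toy data: 10 control and 10 diseased samples described by 3 variables
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(4, 1, (10, 3))])
y = np.array(["control"] * 10 + ["diseased"] * 10)
model = lda_fit(X, y)
pred = lda_predict(model, [[0, 0, 0], [4, 4, 4]])
print(pred)
```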

The variables included in the LDA model, i.e., those that discriminate the classes in the dataset, can be chosen by a stepwise algorithm that iteratively selects the most discriminating variables. LDA can be performed either on the original variables or on the PCs; working on the PCs removes the contribution of experimental uncertainty to the variation and, like stepwise selection, also avoids the singular pooled covariance matrix that arises when there are more spots than maps.

3.3.2. Soft Independent Modeling of Class Analogy

The SIMCA method (27) is based on the independent modeling of each class by means of PCA: each class is described by its own relevant PCs. The samples of each class are thus contained in the so-called SIMCA boxes, defined by the relevant PCs of that class. This is one of the most important advantages of SIMCA: the classification of each sample is little affected by experimental uncertainty and spurious information, since each class is modeled only by its relevant PCs. Moreover, since it performs substantial dimensionality reduction, the method is also useful for small datasets (with more variables than objects).

SIMCA classification thus starts with a PCA calculated separately on each class, to identify the relevant PCs of each class; these define the so-called class model. If the data are autoscaled (mean centering followed by division by the standard deviation of each variable), the value of variable $v$ for object $i$ belonging to class $g$ is modeled as:

$$x_{ivg} = \sum_{a=1}^{A_g} t_{iag}\, l_{vag} + r_{ivg}, \qquad g = 1,\ldots,G;\; i = 1,\ldots,n_g;\; v = 1,\ldots,P$$

where

$G$ = number of classes present;

$A_g$ = number of significant PCs for class $g$;

$n_g$ = number of samples in class $g$;

$P$ = number of original variables;

$t_{iag}$ = score of the i-th object of class $g$ on the a-th PC;

$l_{vag}$ = loading of the v-th variable on the a-th PC of class $g$; and

$r_{ivg}$ = residual of the i-th object of class $g$ for variable $v$.

The values estimated by the model are then:

$$\hat{x}_{ivg} = \sum_{a=1}^{A_g} t_{iag}\, l_{vag}$$

while the residuals are defined as:

$$r_{ivg} = x_{ivg} - \hat{x}_{ivg}$$

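The classification rule for object $i$ is based on Fisher's F-test: object $i$ is classified in class $g$ if

$$\frac{rsd^2_{ig}}{rsd^2_{g}} < F_{crit}$$

where $rsd^2_{ig}$ is the residual variance of object $i$ from the model of class $g$, $rsd^2_{g}$ is the residual variance of the objects of class $g$ from their own model, and $F_{crit}$ is the critical value of the F distribution at the chosen significance level.

The modeling power (MP) of variable $v$ in class $c$ measures the relevance of that variable to the class model:

$$MP_{vc} = 1 - \frac{rsd_{vc}}{sd_{vc}}$$

where

$sd_{vc}$ = standard deviation of variable $v$ in class $c$;

$rsd_{vc}$ = residual standard deviation of variable $v$ of the objects of class $c$ from the model of their own class.

The MP ranges from 0 (variable irrelevant to the definition of the class model) to 1.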
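The class model, the modeling power, and the F ratio can be sketched for a single class as follows. This is a minimal illustration under stated assumptions, not a full SIMCA implementation: each class is autoscaled with its own mean and standard deviation, both per-variable standard deviations use the same n − 1 denominator (so that MP stays between 0 and 1), and the degrees of freedom of the F test follow one common convention. The class `SimcaClassModel` and the toy data are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


class SimcaClassModel:
    """PCA model of a single class (minimal SIMCA sketch)."""

    def __init__(self, X, n_components):
        X = np.asarray(X, dtype=float)
        self.n, self.p = X.shape
        self.A = n_components
        # Autoscaling with the class's own mean and standard deviation (assumption)
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0, ddof=1)
        Z = (X - self.mean) / self.std
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        self.L = Vt[:n_components].T                    # loadings l_va (p x A)
        R = Z - Z @ self.L @ self.L.T                   # residuals r_iv
        # Per-variable SDs, both with denominator n - 1 (simplifying assumption)
        self.sd_v = np.sqrt((Z**2).sum(axis=0) / (self.n - 1))
        self.rsd_v = np.sqrt((R**2).sum(axis=0) / (self.n - 1))
        # Residual variance of the class, rsd^2_g (one common dof convention)
        self.dof_class = (self.n - self.A - 1) * (self.p - self.A)
        self.rsd2_class = (R**2).sum() / self.dof_class

    def modeling_power(self):
        """MP_v = 1 - rsd_v / sd_v, between 0 (irrelevant) and 1."""
        return 1.0 - self.rsd_v / self.sd_v

    def f_ratio(self, x):
        """rsd^2_ig / rsd^2_g for a new object x."""
        z = (np.asarray(x, dtype=float) - self.mean) / self.std
        e = z - self.L @ (self.L.T @ z)                 # residual from the class model
        return (e**2).sum() / (self.p - self.A) / self.rsd2_class

    def accepts(self, x, alpha=0.05):
        """True if the F ratio is below the critical F value."""
        f_crit = f_dist.ppf(1 - alpha, self.p - self.A, self.dof_class)
        return self.f_ratio(x) < f_crit


# Toy class: 10 samples, 6 spots following one hypothetical correlation pattern
rng = np.random.default_rng(3)
load = np.array([1.0, -1.0, 2.0, 0.5, -0.5, 1.5])
t = rng.normal(size=(10, 1))
X_class = 5.0 + t * load + rng.normal(scale=0.02, size=(10, 6))
model = SimcaClassModel(X_class, n_components=1)
mp = model.modeling_power()
inlier = 5.0 + 0.3 * load          # follows the class pattern
outlier = 5.0 + np.ones(6)         # breaks the correlation pattern
print(mp.round(3), model.f_ratio(inlier), model.f_ratio(outlier), model.accepts(outlier))
```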

A typical representation of MP is given in Fig. 5, where the variables are 

represented on the x axis, and MP is represented as a bar diagram on the y 

axis. Figure 5 represents the MPs of class C in the example of Figs. 2–4. 

The discrimination power (DP) measures the ability of each variable to discriminate between two classes (c and g) at a time. The greater the DP of a variable, the more that variable weighs on the classification of an object in class c or g. It is defined as:

$$DP_{v}^{cg} = \sqrt{\frac{rsd^2_{vcg} + rsd^2_{vgc}}{rsd^2_{vc} + rsd^2_{vg}}}$$

where

$rsd^2_{vcg}$ = squared residual standard deviation of variable $v$ of the objects of class $c$ from the model of class $g$;

$rsd^2_{vgc}$ = squared residual standard deviation of variable $v$ of the objects of class $g$ from the model of class $c$;

$rsd^2_{vc}$ = squared residual standard deviation of variable $v$ of the objects of class $c$ from the model of their own class;

$rsd^2_{vg}$ = squared residual standard deviation of variable $v$ of the objects of class $g$ from the model of their own class.

Fig. 5. Modeling power of a class of six control samples (bar diagram: variable index on the x axis, MP from 0 to 1 on the y axis).
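A companion sketch computes the DP of each variable for two classes. It assumes that both classes are autoscaled together with the overall mean and standard deviation, so that residuals from the two class models are on the same scale, and it uses mean squared residuals in place of squared residual standard deviations (a simplification that omits degrees-of-freedom corrections). The function names and the toy data, in which only spot 0 differs between the classes, are hypothetical.

```python
import numpy as np


def class_model(Z, n_components):
    """Mean and loadings (p x A) of a PCA model fitted to one class."""
    mean = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def residual_var(Z, model):
    """Per-variable mean squared residual of the rows of Z from a class model."""
    mean, L = model
    D = Z - mean
    R = D - D @ L @ L.T
    return (R**2).mean(axis=0)


def discrimination_power(Xc, Xg, n_components=1):
    """DP_v for classes c and g as a ratio of cross- and own-class residual variances."""
    Xc, Xg = np.asarray(Xc, dtype=float), np.asarray(Xg, dtype=float)
    X = np.vstack([Xc, Xg])
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)   # joint autoscaling (assumption)
    Zc, Zg = (Xc - mu) / sd, (Xg - mu) / sd
    mc, mg = class_model(Zc, n_components), class_model(Zg, n_components)
    num = residual_var(Zc, mg) + residual_var(Zg, mc)    # rsd^2_vcg + rsd^2_vgc
    den = residual_var(Zc, mc) + residual_var(Zg, mg)    # rsd^2_vc + rsd^2_vg
    return np.sqrt(num / den)


# Toy data: two classes of 8 samples, 5 spots; only spot 0 differs between classes
rng = np.random.default_rng(4)
Xc = rng.normal(size=(8, 5))
Xg = rng.normal(size=(8, 5))
Xg[:, 0] += 20.0
dp = discrimination_power(Xc, Xg, n_components=1)
print(dp.round(2))   # spot 0 should have by far the largest DP
```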

The DP is always positive but has no upper bound. A representation of DP is shown in Fig. 6: the variables are represented on the x axis and the DPs as a bar diagram on the y axis. Figure 6 represents the DPs of classes A and B for the example of Figs. 2–5. In general, when the dataset consists of two classes, a single set of DPs is obtained, corresponding to the discrimination between the two classes. When more than two classes are present, a set of DPs can be obtained for each pair of classes compared.

Modeling powers and DPs can be represented on a color scale on “virtual” 

2D maps, as seen for the loadings plots, for clearer representation. An example 

is given in Fig. 7, where the MPs and DPs represented as bar diagrams in 

Figs. 5 and 6 are represented on virtual 2D maps. 

Fig. 6. Discriminating power of two classes: treated with drug A (six samples) and with drug B (six samples). (Bar diagram: variable index on the x axis, DP from 0 to about 6000 on the y axis.)

Fig. 7. MPs and DPs of Figs. 5 and 6 represented on virtual 2D maps (panels “Modeling Power of class C” and “Discrimination Power classes A–B”; both axes are spot coordinates from 0 to 220).


3.4. Partial Least Squares (PLS) Regression and Discriminant Analysis–Partial Least Squares (DA-PLS) Regression

Partial least squares is a regression method that uses the information contained in the X data matrix to predict the Y data matrix. PLS models the X and Y variables simultaneously, finding the latent variables in X that best predict the latent variables in Y. These PLS components (latent variables) are similar to PCs. If there are several responses, they are modeled together in a multivariate way (28,29,30). PLS can be used for discriminant analysis (DA-PLS) by creating one response variable for each category, which for proteomic data means one response variable for each group of samples. Each response variable takes the value 1 for the samples belonging to the corresponding class and 0 for the samples belonging to the other classes.
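DA-PLS can be sketched with NumPy by building the 0/1 response matrix and fitting a PLS2 model. The sketch assumes a variant of the NIPALS algorithm in which each weight vector is taken as the first left singular vector of X_a^T Y (X_a being the deflated X matrix), which gives the same weights as converged NIPALS; the function names, the choice of two components, and the toy data are hypothetical. Each sample is assigned to the class with the largest predicted response.

```python
import numpy as np


def dummy_responses(labels):
    """One response column per class: 1 for members of that class, 0 otherwise."""
    classes = np.unique(labels)
    Y = (np.asarray(labels)[:, None] == classes[None, :]).astype(float)
    return classes, Y


def pls2_fit(X, Y, n_components):
    """PLS2 regression with X deflation; returns means and coefficients B (p x m)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xa, Yc = X - x_mean, Y - y_mean
    W, P, C = [], [], []
    for _ in range(n_components):
        U, _, _ = np.linalg.svd(Xa.T @ Yc, full_matrices=False)
        w = U[:, 0]                         # X weight vector of this latent variable
        t = Xa @ w                          # X scores
        tt = t @ t
        p_vec = Xa.T @ t / tt               # X loadings
        c_vec = Yc.T @ t / tt               # Y loadings
        Xa = Xa - np.outer(t, p_vec)        # deflate X
        W.append(w)
        P.append(p_vec)
        C.append(c_vec)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    B = W @ np.linalg.inv(P.T @ W) @ C.T    # regression coefficients
    return x_mean, y_mean, B


def pls_da_predict(model, classes, X):
    """Assign each sample to the class whose predicted response is largest."""
    x_mean, y_mean, B = model
    Y_hat = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    return classes[Y_hat.argmax(axis=1)]


# Toy data: three groups of five maps described by ten spots
rng = np.random.default_rng(5)
means = np.zeros((3, 10))
means[1, :5] = 3.0
means[2, 5:] = 3.0
X = np.vstack([m + rng.normal(scale=0.3, size=(5, 10)) for m in means])
labels = np.repeat(["A", "B", "C"], 5)
classes, Y = dummy_responses(labels)
model = pls2_fit(X, Y, n_components=2)
pred = pls_da_predict(model, classes, X)
print((pred == labels).mean())   # fraction of maps assigned to their own class
```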

3.5. Applications 

3.5.1. Pattern Recognition Methods 

Many applications of multivariate tools to the analysis of spot volume datasets are reported in the literature. PCA can be considered quite a classical approach, its first applications to spot volume data dating back to the mid-1980s, as reported by Anderson (31) in the USA and Tarroux (32) in France. Anderson (31) reports an application of PCA coupled to cluster analysis to identify the differences among a panel of human cell lines; all the groups were successfully separated considering only the subset of proteins present in all the cell lines. Tarroux et al. (32) applied PCA in the HERMeS software package, again coupled to cluster analysis.

More recently, both PCA and cluster analysis have been applied to the study 

of DNA and RNA fragments of several biological systems by the groups of 

Couto (33), Johansson (34), and Boon (35) and to the immunological diagnosis 

of hydatidosis (36,37). Other applications are from the group of Kovarova (38, 

39) and De Moor et al. (40), who applied multivariate tools to microarray data. 

Iwadate et al. (41) applied discriminant analysis to the classification of human gliomas, comparing the proteomic patterns of 85 tissue samples (52 glioblastoma multiforme, 13 anaplastic astrocytomas, 10 astrocytomas, and 10 normal brain tissues). Cluster analysis correctly distinguished normal brain tissues from glioma tissues, and the resulting clustering proved to be significantly correlated with patient survival. Discriminant analysis extracted a set of 37 proteins differentially expressed according to histological grade.

Principal Component Analysis has also been applied to toxicological studies by the groups of Amin (42), Heijne (43), and Anderson (44). The first paper (42) reports a study of the effect of three nephrotoxicants (cisplatin, gentamicin, and puromycin) on gene expression profiles in rats, as a function of time after initial administration. PCA and gene expression-based clustering of compound effects confirmed the separation of samples on the basis of dose, time, and degree of renal toxicity. Heijne (43) studied the acute hepatotoxicity induced in rats by bromobenzene administration; the physiological symptoms recorded coincided with many changes in hepatic mRNA and protein content. PCA proved effective in discriminating between control and treated samples for both protein and gene expression profiles, and some of the proteins that changed significantly upon bromobenzene treatment were identified by mass spectrometry. Anderson (44) investigated the effects of five peroxisome proliferators on the protein profile in the livers of treated mice at 5- and 35-day time points. Data for a selected set of 107 liver protein spots that respond strongly to at least one of the test compounds were subjected to PCA to search for global changes in the protein pattern. PC1 was identified as a global measure of peroxisome proliferation through its correlation with enzymatic peroxisomal β-oxidation, while PC2 separated the samples on the basis of exposure time.

Perrot et al. (45) applied PCA to the comparison of protein expression of 

gel-entrapped Escherichia coli cells submitted to a cold shock at 4 °C with 

those of exponential- and stationary-phase free-floating cells. Ten different 

incubation conditions were considered; each experiment was replicated three 

times and each gel was run in duplicate. PCA was carried out on the 203 spots 

identified as significantly reproducible than those corresponding to synthesis 

at 37 °C, using the average spot intensities for each experimental condition 

adopted. In order to remove the variability of staining conditions among the 

gels, each spot volume was normalized by the sum of volumes of all the spots 

detected on each map. The data were autoscaled before PCA. Score analysis showed that the protein response of immobilized cells after the cold shock was significantly different from those of exponential- and stationary-phase free-floating organisms. The reasons for these differences could be traced in the loadings analysis, which also confirmed the identification of nine families of proteins.
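The preprocessing described for the study of Perrot et al. (45), normalization of each spot volume by the total spot volume of its gel followed by autoscaling, can be sketched as below. The function name `normalize_and_autoscale` and the toy volumes are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import numpy as np


def normalize_and_autoscale(volumes):
    """volumes: n x p matrix of raw spot volumes (rows = gels, columns = spots).

    1. Divide each spot volume by the total volume of all spots on its gel,
       removing gel-to-gel differences in staining intensity.
    2. Autoscale: center each spot (column) on its mean and divide by its SD.
    """
    V = np.asarray(volumes, dtype=float)
    rel = V / V.sum(axis=1, keepdims=True)              # relative volume per gel
    return (rel - rel.mean(axis=0)) / rel.std(axis=0, ddof=1)


gels = np.array([[10.0, 30.0, 60.0],
                 [20.0, 60.0, 120.0],    # same pattern as gel 1, twice the staining
                 [30.0, 20.0, 50.0]])
Z = normalize_and_autoscale(gels)
print(Z.round(3))
```

Gels 1 and 2 differ only in overall staining intensity, so they become identical after normalization. A spot with the same relative volume on every gel would have zero standard deviation and would have to be removed before autoscaling.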

Verhoeckx et al. (46) applied Principal Component Analysis to identify the differences in macrophage maturation in the U937 human lymphoma cell line. PCA proved effective in identifying variations between samples at different macrophage maturation times, whereas standard t-tests identified a smaller number of biomarkers. Another application (47) concerned the characterization of anti-inflammatory compounds.

Other applications from Marengo (22,23,48) exploit PCA coupled to both 

cluster analysis and SIMCA classification for the identification of differences 

between groups of maps. The first application (48) refers to a spot quantity 

dataset comprising 435 spots detected in 18 samples belonging to two different


cell lines of control (untreated) and drug-treated pancreatic ductal carcinoma 

cells. The study was conceived for the identification of the role played by drugs 

on different cell lines. PCA allowed clear discrimination of the four groups of 

samples with the use of three PCs, and the analysis of the loadings provided 

reasons for the differences among groups of samples. The results were further 

confirmed by cluster analysis. Identification of some of the most relevant spots 

was also performed by mass spectrometry. The other two applications (22,23) concern the use of PCA and SIMCA for the classification of proteomic maps. The first paper (22) reports an application to the adrenal glands of healthy and diseased mice. PCA discriminated the two classes of samples by means of the first PC, whose loadings allowed identification of the spots responsible for the differences. SIMCA was then applied to classify the samples into the two classes and correctly classified all of them with one PC in the SIMCA model of each class. The analysis of the DPs allowed SIMCA to identify the most discriminating spots. The comparison between the maps showed up- and down-regulation of 84 polypeptide chains out of a total of 700 spots detected.

An analogous approach was also followed for the comparison of the phenotypic expression of the mantle cell lymphoma cell lines GRANTA-519 and MAVER-1 (23).

Marengo proposed an alternative way of displaying the PCA loadings and the modeling and discriminating powers calculated by SIMCA. To obtain a clearer representation of the results, the spots showing relevant discriminating and/or modeling power (or relevant loadings) are plotted on a virtual 2D-PAGE map. Each discriminating spot is represented as a circle on the virtual map, placed at its x–y coordinates as identified by standard software packages (PDQuest in this case). The spots are shown on a color scale: darker red tones identify spots with a larger discriminating or modeling power. Implementing such representations in common software packages could provide a valid alternative to the standard display of the loadings of each variable in the space of two PCs at a time.
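The virtual-map display described above can be sketched as a mapping from each spot's power to a red tone. This is a minimal illustration under stated assumptions: each spot is given as an identifier, its x–y coordinates from the spot-detection software, and a discriminating (or modeling) power value. The linear scale and the two end-point colors (`PALE_RED`, `DARK_RED`) are hypothetical choices, not those of the original work. The returned colors would then be drawn as circles at the spot coordinates by any plotting library.

```python
# Hypothetical color scale: the end-point colors are illustrative assumptions.
PALE_RED = (255, 230, 230)  # weakest discriminating/modeling power
DARK_RED = (139, 0, 0)      # strongest discriminating/modeling power


def power_to_rgb(power, p_min, p_max):
    """Linearly interpolate between PALE_RED and DARK_RED; higher power -> darker."""
    t = 1.0 if p_max == p_min else (power - p_min) / (p_max - p_min)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return tuple(round(a + t * (b - a)) for a, b in zip(PALE_RED, DARK_RED))


def virtual_map(spots):
    """spots: iterable of (spot_id, x, y, power), with x-y taken from the
    spot-detection software. Returns (spot_id, x, y, rgb) tuples for plotting."""
    spots = list(spots)
    powers = [p for _, _, _, p in spots]
    lo, hi = min(powers), max(powers)
    return [(sid, x, y, power_to_rgb(p, lo, hi)) for sid, x, y, p in spots]


# Made-up spots, for illustration only.
demo = [("s1", 120.0, 340.0, 2.1), ("s2", 410.5, 88.0, 7.9), ("s3", 250.0, 500.0, 4.0)]
for sid, x, y, rgb in virtual_map(demo):
    print(sid, x, y, rgb)
```

A linear scale keeps the ranking of spots visible. A logarithmic scale might suit powers spanning several orders of magnitude.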

Fujii et al. (49) studied the histological subtypes of lymphoid neoplasms: 

42 cell lines from human lymphoid neoplasms were included. The discriminating 

spots were selected by means of different methods used in sequence: 

(1) Wilcoxon or Kruskal–Wallis tests to find spots whose intensity was significantly 

(p < 0.05) different among the cell line groups, (2) statistical learning 

methods to prioritize the spots according to their contribution to the classification, 

and (3) unsupervised classification methods to validate the robustness of the classification obtained with the selected spots. Thirty-one spots were found to be significant, 24 of which were identified by mass spectrometry.


Other applications, coupled to cluster analysis and discriminant analysis, concern food quality: several examples are reported in the literature on cheese classification (50) and on the identification of the protein content of wheat and bread (51,52).

3.5.2. Discriminant Analysis–Partial Least Squares 

Many papers applying DA-PLS methods have appeared in the last few years. Jessen et al. (53) demonstrated with two examples how information can be extracted from 2DE data by discriminant PLSR with variable selection. The time course of post mortem proteome changes in pig muscle tissue was investigated. A first discriminant PLSR was performed on the spot volume dataset obtained by the usual analysis with dedicated software (Bioimage 2D Analyser, Genomic Solutions, USA). The response was a binary indicator of either the individual pig or the sampling time (increasing post mortem time). PLS proved successful in identifying spots characterized by systematic variation. To retain only those spots showing relevant variation among the groups, a variable selection procedure was applied in which non-relevant spots were iteratively eliminated from the model. The final model contained the minimum number of spots giving the best correlation with the response. Variable selection was based on a jack-knifing procedure.
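The discriminant PLSR idea can be sketched as a PLS1 regression on a 0/1 class indicator, fitted here with the NIPALS algorithm in NumPy. This is an illustrative implementation, not the software used by Jessen et al. The synthetic data, the 0.5 classification threshold, and the helper names are assumptions, and the jack-knife variable selection step is not shown. The spots with the largest absolute regression coefficients are the candidate discriminating spots.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Minimal PLS1 (single response) by NIPALS.

    X: (n_maps, n_spots) spot volumes; y: (n_maps,) 0/1 class indicator.
    Returns regression coefficients for centered data plus the centering means.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered residual matrices
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # unit-length weight vector
        t = E @ w                            # scores on this latent variable
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        c = f @ t / tt                       # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)      # coefficients in the original spot space
    return B, x_mean, y_mean


def predict(X, B, x_mean, y_mean, threshold=0.5):
    """Predicted indicator and class (1 if above the assumed threshold)."""
    y_hat = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    return (y_hat > threshold).astype(int), y_hat


# Synthetic demo (random numbers, not real spot volumes): 10 maps x 20 spots,
# with spots 0-2 up-regulated in the second group.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(10, 20))
y = np.array([0] * 5 + [1] * 5)
X[5:, :3] += 5.0
B, xm, ym = pls1_nipals(X, y, n_components=1)
pred, _ = predict(X, B, xm, ym)
print(pred, np.argsort(-np.abs(B))[:3])
```

With a single latent variable, the coefficient vector is proportional to the weight vector. The up-regulated synthetic spots therefore dominate it.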

Kleno et al. (54) applied PCA and PLS to the identification of the mechanism 

of action of hydrazine toxicity in rat liver samples. PCA was carried out on 

a data matrix of dimensions 30 × 431 (30 being the 2D maps: 5 animals × 3 

doses of hydrazine × 2 times after administration; 431 being the spots revealed 

on the maps). PC 1 was able to separate the samples according to three different 

dose levels, while PC 4 allowed the separation of the two times after the administration, 

but only for the largest dose level. The analysis of the loadings did 

not allow a clear identification of the most relevant discriminating spots, and 

so a PLSR was applied to model the Y variable (dose level of hydrazine). A 

variable selection based on jack-knifing was applied. The PLS regression identified the spots that play an important role in differentiating the samples according to the administered dose level. The results were compared with standard univariate t-tests. Some spots identified by PLS were not flagged as relevant by the t-tests, because PLS takes into account the correlation structure of the dataset.

Kiaersgard et al. (55) studied the change in the proteomic profile of cod muscle samples under different storage conditions. Eleven storage conditions were drawn from a large factorial design including storage

temperature (two levels), storage period (4 levels), and chill storage period


(5 levels). Each sample was replicated twice, and the replicates were run in different batches. PCA grouped the samples according to frozen storage time but gave no information on the differences related to the other two parameters. The study was refined by applying DA-PLS with jackknife variable selection, which identified the spots relevant for differentiating the samples according to storage time. The authors also focused on the optimal normalization of the data before multivariate analysis. Autoscaling is the most widely used normalization method in proteomics, but it risks amplifying noise. This is particularly problematic in proteomics, where experimental uncertainty is large. To avoid this problem, the data were mean-centered, and each mean-centered value was then divided by (SD + B), where SD is the standard deviation of each variable and B is a constant term to be optimized. The authors identified the scale of B (2500 in their case) from a scatter diagram of the mean volume of each variable (spot) versus its standard deviation. Among several tested values of B, they then chose the one giving the best agreement between the univariate and multivariate approaches.
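The (SD + B) scaling just described can be written in a few lines. This sketch makes two assumptions: it uses the sample standard deviation (n − 1 denominator), and `scale_sd_plus_b` is a hypothetical helper name. With B = 0 it reduces to ordinary autoscaling, which fails for a spot with zero variance. As B grows, low-variance (often noisy) spots are no longer inflated by a small denominator, which is the noise-damping effect described above.

```python
from statistics import mean, stdev


def scale_sd_plus_b(X, B):
    """Mean-center each spot (column) and divide by (SD + B).

    X: list of rows (maps) of spot volumes; B: non-negative offset constant.
    B = 0 gives ordinary autoscaling; a very large B approaches plain mean
    centering, up to a common factor 1/B.
    """
    n_cols = len(X[0])
    stats = []
    for j in range(n_cols):
        col = [row[j] for row in X]
        stats.append((mean(col), stdev(col)))   # sample SD (n - 1), an assumption
    return [[(row[j] - m) / (s + B) for j, (m, s) in enumerate(stats)] for row in X]


# Three maps x two spots (made-up volumes).
volumes = [[1, 10], [3, 10], [5, 40]]
print(scale_sd_plus_b(volumes, 0.0))   # plain autoscaling
print(scale_sd_plus_b(volumes, 2.0))   # damped scaling
```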

Gottfries et al. (56) applied both PCA and DA-PLS to the study of two 

different datasets: the first dataset consists of samples of cerebrospinal fluid 

from control individuals and individuals affected by different pathologies (12 

control, 15 with Alzheimer’s disease, 15 with frontotemporal dementia, and 10

with Parkinson’s disease), giving a final dataset of dimension 52 × 96 (96 spots 

identified on 52 maps). The second dataset consists of liver samples from normal 

and obese mice (samples were grouped into six groups comprising four to eight 

animals each); the final dataset has dimension 30 × 603 (30 being the samples, 

and 603 the spots identified). In both cases, the groups of samples present in 

each dataset could be separated by means of the first three PCs after the application 

of PCA. DA-PLS was then applied to each dataset in order to identify 

the spots responsible for the differences between each pair of groups: in all the 

cases the first latent variable computed was able to correctly classify the samples. 

In another application, Karp et al. (57) demonstrated the effectiveness of PLS-DA in identifying the differences in three proteomic datasets. One of them was a dataset in which no difference was expected between the two groups of samples, and in this case, as expected, PLS-DA did not yield a model. Finally, Norden et al. (58) applied PCA and DA-PLS to identify the differences between urine samples of smoking and non-smoking individuals.

The large number of applications of PCA, PLS, and other multivariate tools in proteomics (31–59) clearly shows the importance of multivariate methods in this field. Such techniques can identify a larger number of variables (spots) relevant for discriminating between classes of samples than the classical t-tests usually carried out by standard software packages.

4. Image Analysis 

The second approach to 2D-PAGE analysis focuses on the direct analysis of 2D map images. This approach could offer a fundamental advantage for proteomic data analysis: it eliminates the contribution of the operator, which is usually relevant when dedicated software packages for proteomic map analysis are used. Several methods for the direct analysis of 2D map images are reported in the literature, but they are not yet widespread enough to be included in common software packages. These methods mainly exploit artificial neural networks, fuzzy logic principles, and the calculation of mathematical moments. Such procedures represent a frontier in bioinformatics, and some of them are still under development. The main principles of these methods are presented here, together with a review of the most interesting applications reported in the literature.

4.1. Fuzzy Logic 

The low reproducibility of 2D gel-electrophoresis, pointed out earlier in this 

chapter, produces significant differences even among maps corresponding to 

replicates of the same electrophoretic run; these differences consist of changes 

in spot position, size, and shape. The precise description of the position of each 

spot in terms of x–y coordinates thus appears very difficult to accomplish. The 

uncertainty on the position and shape of each spot can be effectively treated 

by fuzzy logic principles. Marengo et al. (60,61,62,63,64) successfully applied 

fuzzy logic principles coupled to multivariate statistical tools to the analysis 

of sets of 2D maps. 

Their four-step procedure consists of: 

1. image digitalization; 

2. image defuzzyfication; 

3. image refuzzyfication; 

4. application of multivariate tools to fuzzy maps. 

4.1.1. Image Digitalization 

The first step consists of scanning each map with a densitometer to describe it as a grid of a given step size. Each cell of the grid contains the optical density (OD), ranging from 0 to 1. The background contribution to the signal of each map is eliminated by applying a cut-off value (generally 0.3–0.4): values below the cut-off are set to zero. The cut-off value has to be optimized independently for each case study.
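The background cut-off of this first step amounts to zeroing every grid cell below a threshold. This is a minimal sketch, assuming the digitalized map is a list of rows of OD values in [0, 1]. The default of 0.3 is only the lower end of the typical range quoted above and would need tuning for each case study. The function name is hypothetical.

```python
def apply_cutoff(od_grid, cutoff=0.3):
    """Return a copy of the OD grid with every value below `cutoff` set to 0.0."""
    return [[v if v >= cutoff else 0.0 for v in row] for row in od_grid]


# Tiny 2 x 3 map with made-up OD values.
print(apply_cutoff([[0.10, 0.35, 0.80], [0.29, 0.30, 0.05]]))
```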

4.1.2. Image Defuzzyfication 

The second step performs the defuzzyfication of each map, which removes the dependence of the signal on the destaining protocol. The digitalized image is turned into a grid of binary values: 0 is assigned to cells where no signal is detected, and 1 to cells containing a value above the cut-off threshold.
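Defuzzyfication then keeps only presence/absence information. The sketch below assumes it is applied to the background-corrected grid from step 1, so any non-zero cell counts as a signal. The function name is hypothetical.

```python
def defuzzify(corrected_grid):
    """Binary presence/absence grid: 1 where a (cut-off surviving) signal exists, else 0."""
    return [[1 if v > 0 else 0 for v in row] for row in corrected_grid]


print(defuzzify([[0.0, 0.45, 0.0], [0.8, 0.0, 0.31]]))
```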

4.1.3. Refuzzyfication 

The previous step also eliminates the information about spatial uncertainty, since each spot is no longer described by grey-scale values but only by binary values (presence/absence). The third step therefore reintroduces the information about spatial uncertainty: each cell containing a 1 after step 2 is replaced by a 2D probability function. The most suitable distribution is a 2D Gaussian function. The probability of finding a signal in cell $(x_i, y_j)$ when a signal is already present in cell $(x_k, y_l)$ is given by:

$$f(x_i, y_j; x_k, y_l) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_i-x_k)^2}{\sigma_x^2} - \frac{2\rho\,(x_i-x_k)(y_j-y_l)}{\sigma_x\sigma_y} + \frac{(y_j-y_l)^2}{\sigma_y^2}\right]\right\}$$

where ρ is the correlation between the 1st and 2nd dimensions; $(x_k, y_l)$ is the position of the occupied cell influencing the cell in position $(x_i, y_j)$; $\sigma_y$ is the standard deviation along the 1st dimension; and $\sigma_x$ is the standard deviation along the 2nd dimension.

The correlation between the two dimensions (ρ) is usually fixed at 0, corresponding to the complete independence of the two electrophoretic runs. The two standard deviations, $\sigma_x$ and $\sigma_y$, are the standard deviations of the 2D Gaussian function along the x and y axes. Setting them equal assumes the same repeatability for the two electrophoretic runs (pH gradient and molecular mass). In this case, the single parameter whose effect on the final result is analyzed is $\sigma = \sigma_x = \sigma_y$. Alternatively, the two parameters can be fixed at different values, usually $\sigma_x = 1.5\,\sigma_y$, corresponding to an uncertainty along the second dimension that is about 50% larger than that along the first dimension. The separation according to molecular mass is in fact expected to show a larger uncertainty, because the gel for the second run is self-polymerized whereas the first dimension is run on commercial strips.
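The 2D Gaussian of step 3 can be coded directly from its formula. This sketch assumes that coordinates and standard deviations are expressed in grid-cell units, and `gaussian_2d` is a hypothetical name. Setting `sigma_x = 1.5 * sigma_y` reproduces the anisotropic case just described: following the symbol convention above, the influence of a spot then reaches farther along the x axis (the second dimension).

```python
import math


def gaussian_2d(xi, yj, xk, yl, sigma_x, sigma_y, rho=0.0):
    """Bivariate Gaussian weight exerted on cell (xi, yj) by the occupied cell (xk, yl)."""
    dx = (xi - xk) / sigma_x
    dy = (yj - yl) / sigma_y
    one_minus_r2 = 1.0 - rho * rho
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_r2))
    quad = (dx * dx - 2.0 * rho * dx * dy + dy * dy) / one_minus_r2
    return norm * math.exp(-0.5 * quad)


# Isotropic case (sigma = 1) versus anisotropic case (sigma_x = 1.5 * sigma_y).
print(gaussian_2d(0, 0, 0, 0, 1.0, 1.0))          # peak value 1 / (2*pi)
print(gaussian_2d(3, 0, 0, 0, 1.5, 1.0), gaussian_2d(0, 3, 0, 0, 1.5, 1.0))
```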

Changing σ (or $\sigma_x$ and $\sigma_y$) modifies the distance over which an occupied cell exerts its effect. Large values produce a perturbation operating at larger distances. Smaller values correspond to a perturbation operating at smaller distances, with spots exerting less effect on their neighbourhood and giving a crisper final image. Therefore, the larger σ, the stronger the fuzzyfication applied to the maps. In general, the best results are expected for intermediate values of σ, which give maps that are fuzzyfied but not excessively blurred.

As for the choice of probability function, the Gaussian distribution appeared to be the best alternative. Spots can be described as intensity/probability distributions with the highest value at the center of the spot and values decreasing as the distance from the center increases. In addition, the Gaussian function integrates to 1, so the total signal is blurred but remains quantitatively coherent.

The value of the signal $S_k$ in each cell $(x_i, y_j)$ of the fuzzy map k is calculated as the sum of the effects of all the cells $(x_{i'}, y_{j'})$ containing spots:

$$S_k(x_i, y_j) = \sum_{i',j'=1}^{n} f(x_i, y_j; x_{i'}, y_{j'})$$

Although the sum extends over all the occupied cells of the grid, only those near $(x_i, y_j)$ contribute appreciably, depending on the σ parameter.

This procedure turns each digitalized image into a virtual map containing, in each cell, the sum of the influences of all the spots of the original 2D-PAGE. These virtual maps can be called fuzzy matrices or fuzzy maps. Because real maps contain complex spots of irregular shape, the Gaussian function is associated with each cell rather than with each spot.
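Summing these Gaussian contributions over all occupied cells gives the fuzzy map. The sketch below assumes the isotropic case (σ = σx = σy, ρ = 0). As an added speed-up not stated in the source, it ignores cells farther than 4σ from an occupied cell, where the contribution is negligible. Because each occupied cell spreads a unit of probability mass, the total of the fuzzy map approximately equals the number of occupied cells, apart from small losses at the grid edges and from the truncation.

```python
import math


def fuzzy_map(binary, sigma, radius=None):
    """Refuzzify a binary grid: each occupied cell spreads an isotropic 2D Gaussian.

    binary: list of rows of 0/1 values; sigma: sigma_x = sigma_y (rho = 0).
    radius: neighbourhood half-width in cells (assumed default: ceil(4 * sigma)).
    """
    if radius is None:
        radius = int(math.ceil(4 * sigma))
    n_rows, n_cols = len(binary), len(binary[0])
    norm = 1.0 / (2.0 * math.pi * sigma * sigma)
    out = [[0.0] * n_cols for _ in range(n_rows)]
    occupied = [(r, c) for r in range(n_rows) for c in range(n_cols) if binary[r][c]]
    for r0, c0 in occupied:
        # Add this cell's Gaussian to every cell within the truncation window.
        for r in range(max(0, r0 - radius), min(n_rows, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(n_cols, c0 + radius + 1)):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                out[r][c] += norm * math.exp(-d2 / (2.0 * sigma * sigma))
    return out


grid = [[0] * 21 for _ in range(21)]
grid[10][10] = 1
crisp, blurred = fuzzy_map(grid, 1.0), fuzzy_map(grid, 2.0)
print(crisp[10][10], blurred[10][10])   # larger sigma -> lower, wider peak
```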

Figure 8 shows an example of the fuzzyfication of a map at different σ values: the digitalized and defuzzyfied maps are shown together with the fuzzyfication of the map at five increasing σ values.

4.1.4. Application of Multivariate Tools to Fuzzy Maps 

The final fuzzy maps can then be analyzed by several multivariate tools 

for diagnostic/prognostic purposes. Two approaches will be presented here: (1) 

the coupling of PCA and classification tools; (2) the use of multi-dimensional 

scaling (MDS) techniques.


[Figure 8 image: seven panels on a 200 × 200 grid: (A) digitalized image; (B) defuzzyfied image; (C) σ = 0.50; (D) σ = 1.00; (E) σ = 1.50; (F) σ = 2.00; (G) σ = 2.50.]

Fig. 8. Sample ILL1 from (61): digitalized image (A); defuzzyfied image (B); fuzzyfication at five σ values (C–G).


4.1.4.1. PCA and Classification Methods (61) 

Marengo et al. (61) applied PCA and LDA to the fuzzy maps of a set of eight 2D maps belonging to control and mantle cell lymphoma samples.

Principal Component Analysis can be applied to images after unwrapping each image: each sample (map) is turned into a series of variables describing the signal at each position of the map. In this case, 200 × 200 pixel images were considered, giving a final set of 40,000 variables for each map. PCA is particularly useful here because it detects a small number of components accounting for the differences between the groups of samples while, at the same time, reducing the dimensionality. The significant PCs were used to build an LDA model to classify the samples. The variables entering the LDA model, which discriminates between the classes present in the dataset, were selected by a stepwise algorithm in forward search (F-to-enter = 4.0).
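Unwrapping and PCA can be sketched with NumPy. Each map becomes one row of the data matrix, and PCA is computed from the singular value decomposition of the column-centered matrix. The synthetic maps, their size (20 × 20 instead of 200 × 200), and the helper names are assumptions for illustration, and the stepwise LDA on the retained PCs is not shown. Reshaping a column of loadings to the map size gives a virtual loading map of the kind shown in Fig. 9.

```python
import numpy as np


def unwrap(maps):
    """Stack 2D maps (each H x W) into an (n_maps, H*W) matrix, one row per map."""
    return np.stack([np.asarray(m, dtype=float).ravel() for m in maps])


def pca_scores_loadings(X, n_components):
    """PCA via SVD of the column-centered matrix.

    Returns scores (n_maps x k), loadings (n_vars x k) and explained-variance ratios.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Synthetic maps (not real gels): two groups differing in the position of one spot.
rng = np.random.default_rng(1)
maps = []
for k in range(8):
    m = rng.normal(0.0, 0.05, size=(20, 20))
    if k < 4:
        m[2:6, 2:6] += 1.0
    else:
        m[12:16, 12:16] += 1.0
    maps.append(m)

X = unwrap(maps)                                   # shape (8, 400)
scores, loadings, ratio = pca_scores_loadings(X, 2)
loading_map_pc1 = loadings[:, 0].reshape(20, 20)   # virtual loading map for PC1
print(np.round(scores[:, 0], 2), np.round(ratio, 3))
```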

The procedure was repeated for different values of the σ parameter to find the value providing correct classification of the samples with the smallest number of components in the final LDA model. The best results (100% correct assignments) were obtained for σ values from 1.75 to 2.25, with PC 1 and PC 4 in the final LDA model. The differences between the two groups of samples could then be investigated by analyzing the loadings on the first and fourth PCs.

Figure 9 shows the score plot and the loading plots of PC 1 and PC 4 for σ = 2.00. The loadings are again represented on a virtual map on a color scale: white tones correspond to zones of the map with large positive loadings, and black tones to zones with large negative loadings on the corresponding PC.

4.1.4.2. Multi-Dimensional Scaling 

In other applications of multivariate tools to fuzzy maps, Marengo et al. (62,63) describe the use of MDS procedures. MDS performs a substantial dimensionality reduction and provides an effective graphical representation of the data on the basis of the similarity calculated between pairs of objects. It searches for the smallest number of dimensions in which the objects can be represented as points, so that the distances between the objects in the new reference system match those calculated in the original one as closely as possible. In these applications, the calculations were performed by the Kruskal iterative method. The search for the coordinates was based on the steepest descent minimization algorithm, whose target function is the so-called stress (S), a measure of how well the configuration of points reproduces the original distance matrix.
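The stress minimized by Kruskal's method can be computed as follows. This is a simplified, metric version: the original dissimilarities are used directly as target distances, whereas Kruskal's non-metric method first replaces them with monotone-regression disparities. The optimization itself (steepest descent on the point coordinates) is not shown, and the function name is hypothetical.

```python
import math
from itertools import combinations


def kruskal_stress1(config, target):
    """Stress-1 = sqrt(sum (d_ij - t_ij)^2 / sum d_ij^2) over all pairs i < j.

    config: list of coordinate tuples (the low-dimensional MDS configuration).
    target: square matrix of original dissimilarities, used as disparities.
    """
    num = den = 0.0
    for i, j in combinations(range(len(config)), 2):
        d = math.dist(config[i], config[j])   # distance in the MDS configuration
        num += (d - target[i][j]) ** 2
        den += d * d
    return math.sqrt(num / den)


points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
target = [[math.dist(a, b) for b in points] for a in points]
print(kruskal_stress1(points, target))                             # perfect fit
print(kruskal_stress1([(2 * x, 2 * y) for x, y in points], target))  # doubled distances
```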


[Figure 9 image: (A) score plot of PC1 (horizontal axis) versus PC4 (vertical axis) for σ = 2.00, showing samples HEA1–HEA4 and ILL1–ILL4; (B) loadings of PC1 and PC4 displayed as 200 × 200 virtual maps on a color scale.]

Fig. 9. Score plot (A) and loading plots (B) of PC1 and PC4 with σ = 2.00.

As in the previous applications based on PCA and LDA, several values of the σ parameter were investigated, and the one providing the best classification was selected. In this case, a similarity matrix has to be built for each value of σ.

From the match between two fuzzy maps k and l, two quantities can be computed. The common signal $SC_{kl}$ is the signal shared by the two maps, taken cell by cell as the smaller of the two values. The total signal $ST_{kl}$ is taken cell by cell as the larger value:

$$SC_{kl} = \sum_{i=1}^{n} \min\left(S_{k,i}, S_{l,i}\right)$$

$$ST_{kl} = \sum_{i=1}^{n} \max\left(S_{k,i}, S_{l,i}\right)$$

where n is the number of cells in the grid. The similarity index is then computed as:

$$S_{kl} = \frac{SC_{kl}}{ST_{kl}}$$

$S_{kl}$ ranges from 0 (two maps with no common structure) to 1 (two identical maps). In both applications, the optimal σ values providing the best classification of the samples with only one or two dimensions could be identified.

4.2. Moment Functions 

Moment functions have been widely used in image analysis, in applications related to invariant pattern recognition, object classification, pose estimation, image coding, and reconstruction (65,66,67,68,69). A set of moments computed from a digital image generally represents global characteristics of the image shape and provides information about different types of geometrical features of the image. Geometric moments were the first to be applied to images, as they are computationally very simple. With the progress of research in image processing, many new types of moment functions have been introduced, such as orthogonal moments, rotational moments, and complex moments. These are useful tools in pattern recognition and can describe features of objects such as shape, area, border, location, and orientation; each moment function naturally has its own advantages in specific applications.

The most important and most widely used moments are orthogonal moments (e.g., Legendre (70,71,72) and Zernike moments (73,74,75)). They have zero information redundancy within a set of moment functions, so that each orthogonal moment corresponds to an independent characteristic of the image. In other words, moments with orthogonal basis functions represent the image by a set of mutually independent descriptors, with a minimum of information redundancy. In addition, orthogonal moments are more robust than non-orthogonal ones in the presence of image noise. Orthogonal moments also permit the analytical reconstruction of an image intensity function from a finite set of moments, using the inverse moment transform.

Legendre moments are the most widely used orthogonal moments and can be implemented as feature descriptors for the classification of 2D-PAGE maps.

The main advantage of using Legendre moments to cluster the maps is the possibility of obtaining invariance to translation, scale, and rotation. In other words, the original maps can be used for classification without any pre-treatment, and the use of complex commercial software can be avoided entirely.


The number of calculated moments is very large, and many of them carry no information relevant to the specific goal of correctly classifying the 2D-PAGE maps. For this reason, a method for selecting the moments with the highest discriminating power must be applied (e.g., LDA).

4.2.1. Legendre Moments 

The Legendre polynomials form a complete orthogonal set on the interval [−1, 1]. Moments with Legendre polynomials as kernel functions were first introduced by Teague (68).

The kernels of Legendre moments are products of Legendre polynomials defined along the rectangular image coordinate axes inside the square [−1, 1] × [−1, 1]. The two-dimensional Legendre moments of order (p + q) of an image intensity map f(x, y) are defined as:

$$L_{pq} = \frac{(2p+1)(2q+1)}{4} \int_{-1}^{1}\int_{-1}^{1} P_p(x)\, P_q(y)\, f(x, y)\, dx\, dy, \qquad x, y \in [-1, 1]$$

where the Legendre polynomial $P_p(x)$ of order p is given by:

$$P_p(x) = \frac{1}{2^p} \sum_{\substack{k=0 \\ p-k\ \text{even}}}^{p} (-1)^{\frac{p-k}{2}} \frac{(p+k)!\, x^k}{\left(\frac{p-k}{2}\right)!\,\left(\frac{p+k}{2}\right)!\, k!}$$

The Legendre polynomials $P_p(x)$ satisfy the recurrence relation:

$$P_p(x) = \frac{(2p-1)\, x\, P_{p-1}(x) - (p-1)\, P_{p-2}(x)}{p}$$

where $P_0(x) = 1$, $P_1(x) = x$, and p > 1. Since the Legendre polynomials are defined on the interval [−1, 1], a square image of N × N pixels with intensity function f(i, j), 0 ≤ i, j ≤ N − 1, is scaled into the region −1 < x, y < 1.
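In practice, the three-term recurrence is the convenient way to evaluate the polynomials. The sketch below checks it against the closed-form sum given above. Both function names are hypothetical.

```python
from math import factorial


def legendre_recurrence(p, x):
    """P_p(x) via P_p = ((2p - 1) x P_{p-1} - (p - 1) P_{p-2}) / p, with P_0 = 1, P_1 = x."""
    if p == 0:
        return 1.0
    prev, cur = 1.0, float(x)
    for n in range(2, p + 1):
        prev, cur = cur, ((2 * n - 1) * x * cur - (n - 1) * prev) / n
    return cur


def legendre_explicit(p, x):
    """P_p(x) from the closed-form sum over k = 0..p with p - k even."""
    total = 0.0
    for k in range(p % 2, p + 1, 2):          # keeps only k with p - k even
        m = (p - k) // 2
        total += (-1) ** m * factorial(p + k) * x ** k / (
            factorial(m) * factorial((p + k) // 2) * factorial(k))
    return total / 2 ** p


for p in range(5):
    print(p, legendre_recurrence(p, 0.5), legendre_explicit(p, 0.5))
```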


The image function can be reconstructed from the calculated moments by the following inverse transformation:

$$f(i, j) = \sum_{p=0}^{p_{max}} \sum_{q=0}^{q_{max}} L_{pq}\, P_p(x_i)\, P_q(y_j)$$
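The forward and inverse transforms can be sketched compactly in pure Python, for readability. Two choices here are assumptions rather than details from the source. First, pixel centers are mapped to x_i = (2i + 1)/N − 1, which keeps all points strictly inside (−1, 1). Second, the double integral is approximated by a simple sum with pixel area (2/N)². For 200 × 200 maps and orders up to 100, a vectorized implementation would be needed.

```python
def legendre(p, x):
    """Legendre polynomial P_p(x) by the three-term recurrence."""
    prev, cur = 1.0, x
    if p == 0:
        return prev
    for n in range(2, p + 1):
        prev, cur = cur, ((2 * n - 1) * x * cur - (n - 1) * prev) / n
    return cur


def pixel_coords(N):
    """Pixel centers of an N-pixel axis mapped into the open interval (-1, 1)."""
    return [(2 * i + 1) / N - 1.0 for i in range(N)]


def legendre_moments(img, max_order):
    """Approximate L_pq for p, q = 0..max_order of an N x N image img[i][j]
    (i runs along x, j along y) by a pixel-area-weighted sum."""
    N = len(img)
    xs = pixel_coords(N)
    d = 2.0 / N                                   # pixel width in scaled units
    P = [[legendre(p, x) for x in xs] for p in range(max_order + 1)]
    L = [[0.0] * (max_order + 1) for _ in range(max_order + 1)]
    for p in range(max_order + 1):
        for q in range(max_order + 1):
            s = sum(P[p][i] * P[q][j] * img[i][j] for i in range(N) for j in range(N))
            L[p][q] = (2 * p + 1) * (2 * q + 1) / 4.0 * s * d * d
    return L


def reconstruct(L, N):
    """Inverse transform: f_hat(i, j) = sum over p, q of L_pq P_p(x_i) P_q(y_j)."""
    order = len(L) - 1
    xs = pixel_coords(N)
    P = [[legendre(p, x) for x in xs] for p in range(order + 1)]
    return [[sum(L[p][q] * P[p][i] * P[q][j]
                 for p in range(order + 1) for q in range(order + 1))
             for j in range(N)] for i in range(N)]


N = 32
xs = pixel_coords(N)
ramp = [[xs[i]] * N for i in range(N)]            # intensity increasing along x
L = legendre_moments(ramp, 1)
print(round(L[1][0], 4))                          # close to 1
```

A linear ramp is captured almost entirely by L_10. The reconstruction from moments up to order 1 therefore already reproduces it to within the discretization error.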

Marengo et al. (76) report an interesting application of Legendre moments 

to a set of 2D-PAGE maps belonging to two different cell lines of control 

(untreated) and drug-treated pancreatic ductal carcinoma cells. 

The aim of the work was to obtain the correct classification of the 18 samples 

using the Legendre moments as discriminant variables. 

Each 2D-PAGE, which was automatically digitalized, was described by a 

200×200 matrix of pixels; the value of each pixel varies from 0 to 1 to indicate 

the staining intensity in the given position. 

The Legendre moments of the 18 digitalized images were then calculated up to a maximum order of 100 in each coordinate (p, q = 0 to 100). The resulting moment matrix of each image held the global information of the corresponding 2D-PAGE map. The final dataset therefore contained 18 samples and 10,201 (101 × 101) variables. The number of variables was very large, and many of them were either redundant or carried no information relevant to correctly classifying the samples. For this reason, the variables with the highest discriminating power were selected by forward stepwise LDA (F-to-enter = 4.0). The stepwise LDA showed that only six Legendre moments were needed to correctly classify the 18 samples.

The results demonstrate that the Legendre moments can be successfully 

applied for fast classification and similarity analysis of 2D-PAGE maps. 

4.3. Other Methods 

Schultz et al. (77), besides applying PCA and PLS to spot volume data, applied PCA to the analysis of gel images after digitalization and unwrapping. The choice of the alignment procedure for the sets of gels proved decisive for the final result. PCA was effective in identifying the groups of maps present.

Marengo et al. (78) also applied three-way PCA to identify the differences among groups of 2D maps. Proteomic datasets lend themselves to three-way methods because of their three-way structure: the first dimension is the pH gradient, the second the molecular mass, and the third the samples. In three-way PCA, the observed modes (conventionally called I, J, and K) are synthesized into more fundamental modes. Each element of a reduced mode expresses a particular structure existing among all or part of the elements of the associated observation mode. The final result consists of three sets of loadings together with a core array describing the relationships among them. Each of the three sets of loadings can be displayed and interpreted in the same way as a score plot in standard PCA. Three-way PCA was preceded by a data transformation to scale all the samples and make them comparable. For this purpose, maximum scaling was selected, and each digitalized 2D-PAGE map was scaled to its own maximum value. This method was successfully applied to datasets of human lymph nodes and rat sera, allowing the identification of the main differences among the sets of 2D maps.
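Maximum scaling and the arrangement of the maps into a three-way array can be sketched as below. Which image axis corresponds to the pH gradient and which to molecular mass is an assumption about the digitalization, and the helper names are hypothetical. The three-way PCA decomposition itself is not shown.

```python
def max_scale(maps):
    """Divide every cell of each map by that map's own maximum value."""
    scaled = []
    for m in maps:
        peak = max(max(row) for row in m)
        if peak <= 0:
            raise ValueError("cannot max-scale a map without positive signal")
        scaled.append([[v / peak for v in row] for row in m])
    return scaled


def to_three_way(maps):
    """Arrange K maps of size I x J into a nested list indexed [i][j][k].

    Assumption: i follows the first image axis (e.g. pH gradient), j the second
    (e.g. molecular mass), and k the samples.
    """
    I, J, K = len(maps[0]), len(maps[0][0]), len(maps)
    return [[[maps[k][i][j] for k in range(K)] for j in range(J)] for i in range(I)]


maps = [[[0, 2], [4, 1]], [[1, 1], [0, 0.5]]]     # two tiny made-up maps
cube = to_three_way(max_scale(maps))
print(cube)
```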

References 

1. Mahon, P., Dupree, P., (2001) Quantitative and reproducible two-dimensional gel 

analysis using Phoretix 2D Full, Electrophoresis 22, 2075–2085 

2. Rubinfeld, A., Keren-Lehrer, T., Hadas, G., Smilansky, Z., (2003) Hierarchical 

analysis of large-scale two-dimensional gel-electrophoresis experiments, 

Proteomics 3, 1930–1935 

3. Anderson, N.L., Taylor, J., Scandora, A.E., Coulter, B.P., Anderson, N.G., (1981) 

The TYCHO system for computer analysis of two-dimensional gel electrophoresis 

patterns, Clinical Chemistry 27 (11), 1807–1820 

4. Rosengren, A.T., Salmi, J.M., Aittokallio, T., Westerholm, J., Lahesmaa, R., 

Nyman, T.A., Nevalainen, O.S., (2003) Comparison of PDQuest and Progenesis 

software packages in the analysis of two dimensional electrophoresis gels, 

Proteomics 3, 1936–1946 

5. Raman, B., Cheung, A., Marten, M.R., (2002) Quantitative comparison and 

evaluation of two commercially available, two-dimensional electrophoresis image 

analysis software packages, Z3 and Melanie, Electrophoresis 23, 2194–2202 

6. Panek, J., Vohradsky, J., (1999) Point pattern matching in the analysis of two-dimensional gel electropherograms, Electrophoresis 20, 3483–3491

7. Pleissner, K.P., Hoffman, F., Kriegel, K., Wenk, C., Wegner, S., Sahistrom, A., 

Oswald, H., Alt, H., Fleck, E., (1999) New algorithmic approaches to protein spot 

detection and pattern matching in two-dimensional electrophoresis gel databases, 

Electrophoresis 20, 755–765 

8. Voss, T., Haberl, P., (2000) Observations on the reproducibility and matching 

efficiency of two-dimensional electrophoresis gels: consequences for comprehensive 

data analysis, Electrophoresis 21, 3345–3350 

9. Cutler, P., Heald, G., White, I.R., Ruan, J., (2003) A novel approach to 

spot detection for two-dimensional gel electrophoresis images using pixel value 

collection, Proteomics 3, 392–401 

10. Molloy, M.P., Brzezinski, E.E., Hang, J., McDowell, M.T., VanBogelen, R.A., 

(2003) Overcoming technical variation and biological variation in quantitative 

proteomics, Proteomics 3, 1912–1919


11. Moritz, B., Meyer, H.E., (2003) Approaches for the quantification of protein 

concentration ratios, Proteomics 3, 2208–2220 

12. Wheelock, A.M., Buckpitt, A.R., (2005) Software-induced variance in two-dimensional gel electrophoresis image analysis, Electrophoresis 26, 4508–4520

13. Almeida, J.S., Stanislaus, R., Krug, E., Arthur, J.M., (2005) Normalisation and 

analysis of residual variation in two-dimensional gel electrophoresis for quantitative 

differential proteomics, Proteomics 5, 1242–1249 

14. Pietrogrande, M.C., Marchetti, N., Dondi, F., Righetti, P.G., (2003) Spot 

overlapping in two-dimensional polyacrylamide gel electrophoresis maps: 

relevance to proteomics, Electrophoresis 24, 217–224 

15. Pietrogrande, M.C., Marchetti, N., Dondi, F., Righetti, P.G., (2002) Spot 

overlapping in two-dimensional polyacrylamide gel electrophoresis separations: a 

statistical study of complex protein maps, Electrophoresis 23, 283–291 

16. Campostrini, N., Areces, L.B., Rappsilber, J., Pietrogrande, M.C., Dondi, F., Pastorino, F., Ponzoni, M., Righetti, P.G., (2005) Spot overlapping in two-dimensional maps: a serious problem ignored for much too long, Proteomics 5, 2385–2395

17. Garrels, J.I., (1979) Two dimensional gel electrophoresis and computer analysis of 

proteins synthesized by clonal cell lines, J. Biol. Chem. 254, 7961–7977 

18. Garrels, J.I., Farrar, J.T., Burwell IV, C.B., (1984) In: Celis, J.E., Bravo, R. (Eds.), Two-dimensional Gel Electrophoresis of Proteins, Academic Press, Orlando, FL, USA, pp. 38–91

19. Garrels, J.I., (1989) The QUEST system for quantitative analysis of two-dimensional gels, J. Biol. Chem. 264, 5269–5282

20. Massart, D.L., Vandeginste, B.G.M., Deming, S.M., Michotte, Y., Kaufman, L., 

(1988) Chemometrics: A Textbook. Amsterdam, Elsevier 

21. Vandeginste, B.G.M., Massart, D.L., Buydens, L.M.C., De Jong, S., Lewi, P.J., 

Smeyers-Verbeke, J., (1998) Handbook of Chemometrics and Qualimetrics: Part B. 

Amsterdam, Elsevier 

22. Marengo, E., Robotti, E., Righetti, P.G., Campostrini, N., Pascali, J., Ponzoni, M., 

(2004) Study of Proteomic changes associated with healthy and tumoral murine 

samples in Neuroblastoma by Principal Component Analysis and classification 

methods, Clinica Chimica Acta 345, 55–67 

23. Marengo, E., Robotti, E., Bobba, M., Liparota, M.C., Antonucci, F., Rustichelli, C., 

Zamò, A., Chilosi, M., Hamdan, M., Righetti, P.G., (2006) Characterisation of 

the proteomic profiles of two human lymphoma cell lines by two-dimensional 

gel-electrophoresis and multivariate statistical tools, Electrophoresis 27, 

484–494 

24. Massart, D.L., Kaufman, L., (1983) In: Elving, P.J., Winefordner, J.D. (Eds.), The 

Interpretation of Analytical Chemical Data by the Use of Cluster Analysis. Wiley, 

New York, USA 

25. Eisenbeis, R.A. (Ed.), (1972) Discriminant Analysis and Classification Procedures: 

Theory and Applications. Lexington, USA


26. Klecka, W.R. (Ed.), (1980) Discriminant Analysis. Sage Publications, Beverly 

Hills, USA 

27. Wold, S., (1976) Pattern recognition by means of disjoint principal components 

models, Pattern Recognition 8, 127–139 

28. Martens, H., Naes, T., (1989) Multivariate Calibration, Wiley, London 

29. Kleinbaum, D., Kupper, L., Muller, K., (1988) Applied Regression Analysis and Other Multivariate Methods, 2nd ed., PWS-Kent, Boston

30. De Noord, O.E., (1994) Multivariate calibration standardization, Chemometr. Intell. 

Lab. Syst. 25, 85–97 

31. Anderson, N.L., Hofmann, J.P., Gemmell, A., Taylor, J., (1984) Global approaches 

to quantitative analysis of gene-expression patterns observed by use of two-dimensional 

gel electrophoresis, Clin. Chem. 30, 2031–2036 

32. Tarroux, P., Vincens, P., Rabilloud, T., (1987) HERMeS: a second generation 

approach to the automatic analysis of two-dimensional electrophoresis gels. Part 

V: Data analysis, Electrophoresis 8, 187–199 

33. Couto, M.M.B., Vogels, J.T.W.E., Hofstra, H., Huis in 't Veld, J.H.J., van der Vossen, 

J.M.B.M., (1995) Random amplified polymorphic DNA and restriction 

enzyme analysis of PCR amplified rDNA in taxonomy: two identification techniques 

for food-borne yeasts, J. Applied Bacteriology 79 (5), 525–535 

34. Johansson, M.L., Quednau, M., Ahrne, S., Molin, G., (1995) Classification of 

Lactobacillus plantarum by restriction endonuclease analysis of total chromosomal 

DNA using conventional agarose-gel electrophoresis, International J. of Systematic 

Bacteriology 45 (4), 670–675 

35. Boon, N., De Windt, W., Verstraete, W., Top, E.M., (2002) Evaluation of nested 

PCR-DGGE (denaturing gradient gel electrophoresis) with group-specific 16S 

rRNA primers for the analysis of bacterial communities from different wastewater 

treatment plants, FEMS Microbiology Ecology 39 (2), 101–112 

36. Gadea, I., Ayala, G., Diago, M.T., Cunat, A., Garcia de Lomas, J., (2000) Immunological 

diagnosis of human hydatid cyst relapse: utility of the enzyme-linked 

immunoelectrotransfer blot and discriminant analysis, Clinical and Diagnostic 

Laboratory Immunology 7 (4), 549–552 

37. Gadea, I., Ayala, G., Diago, M.T., Cunat, A., Garcia de Lomas, J., (1999) Immunological 

diagnosis of human cystic echinococcosis: utility of discriminant analysis 

applied to the enzyme-linked immunoelectrotransfer blot, Clinical and Diagnostic 

Laboratory Immunology 6 (4), 504–508 

38. Kovarova, H., Hajduch, M., Korinkova, G., Halada, P., Krupickova, S., 

Gouldsworthy, A., Zhelev, N., Strnad, M., (2000) Proteomics approach in classifying 

the biochemical basis of the anticancer activity of the new olomoucine-derived 

synthetic cyclin-dependent kinase inhibitor, bohemine, Electrophoresis 21, 

3757–3764 

39. Kovarova, H., Radzioch, D., Hajduch, M., Sirova, M., Blaha, V., Macela, A., 

Stulik, J., Hernychova, L., (1998) Natural resistance to intracellular parasites: a 

study by two-dimensional gel electrophoresis coupled with multivariate analysis, 

Electrophoresis 19 (8–9), 1325–1331


40. De Moor, B., Marchal, K., Mathys, J., Moreau, Y., (2003) Bioinformatics: 

organisms from Venus, technology from Jupiter, algorithms from Mars, European 

Journal of Control 9 (2–3), 237–278 

41. Iwadate, Y., Sakaida, T., Hiwasa, T., Nagai, Y., Ishikura, H., Takiguchi, M., 

Yamaura, A., (2004) Molecular classification and survival prediction in human 

gliomas based on proteome analysis, Cancer Research 64 (7), 2496–2501 

42. Amin, R.A., Vickers, A.E., Sistare, F., Thompson, K.L., Roman, R.J., 

Lawton, M., Kramer, J., Hamadeh, H.K., Collins, J., Grissom, S., Bennett, L., 

Tucker, C.J., Wild, S., Kind, C., Oreffo, V., Davis, J.W., Curtiss, S., Naciff, J.M., 

Cunningham, M., Tennant, R., Stevens, J., Car, B., Bertram, T.A., Afshari, C.A., 

(2004) Identification of putative gene-based markers of renal toxicity, Environmental 

Health Perspectives 112 (4), 465–479 

43. Heijne, W.H.M., Stierum, R.H., Slijper, M., van Bladeren, P.J., van Ommen, B., 

(2003) Toxicogenomics of bromobenzene hepatotoxicity: a combined transcriptomics 

and proteomics approach, Biochemical Pharmacology 65 (5), 857–875 

44. Anderson, N.L., Esquer-Blasco, R., Richardson, F., Foxworthy, P., Eacho, P., (1996) 

The effects of peroxisome proliferators on protein abundances in mouse liver, 

Toxicology and Applied Pharmacology 137 (1), 75–89 

45. Perrot, F., Hebraud, M., Charlionet, R., Junter, G.A., Jouenne, T., (2001) Cell 

immobilisation induces changes in the protein response of Escherichia coli K-12 

to a cold shock, Electrophoresis 22, 2110–2119 

46. Verhoeckx, K.C.M., Bijlsma, S., de Groene, E.M., Witkamp, R.F., van der Greef, J., 

Rodenburg, R.J.T., (2004) A combination of proteomics, principal component 

analysis and transcriptomics is a powerful tool for the identification of biomarkers 

for macrophage maturation in the U937 cell line, Proteomics 4 (4), 1014–1028 

47. Verhoeckx, K.C.M., Bijlsma, S., Jespersen, S., Ramaker, R., Verheij, E.R., 

Witkamp, R.F., van der Greef, J., Rodenburg, R.J.T., (2004) Characterization 

of anti-inflammatory compounds using transcriptomics, proteomics, and 

metabolomics in combination with multivariate data analysis, International 

Immunopharmacology 4 (12), 1499–1514 

48. Marengo, E., Robotti, E., Cecconi, D., Scarpa, A., Righetti, P.G., (2004) Identification 

of the regulatory proteins in human pancreatic cancers treated with 

Trichostatin-A by 2D-PAGE maps and Multivariate Statistical Analysis, Analytical 

and Bioanalytical Chemistry 379 (7–8), 992–1003 

49. Fujii, K., Kondo, T., Yokoo, H., Yamada, T., Matsuno, Y., Iwatsuki, K., 

Hirohashi, S., (2005) Protein expression pattern distinguishes different lymphoid 

neoplasms, Proteomics 5, 4274–4286 

50. Dewettinck, K., Dierckx, S., Eichwalder, P., Huyghebaert, A., (1997) Comparison 

of SDS-PAGE profiles of four Belgian cheeses by multivariate statistics, Lait 77 

(1), 77–89 

51. Alika, J.E., Aken'Ova, M.E., Fatokun, C.A., (1995) Variation among maize (Zea 

mays L.) accessions of Bendel State, Nigeria – numerical analysis of zein protein 

band patterns, Genetic Resources and Crop Evolution 42 (4), 393–399


52. Magdic, D., Horvat, D., Jurkovic, Z., Sudar, R., Kurtanjek, K., (2002) Chemometric 

analysis of high molecular mass glutenin subunits and image data of bread crumb 

structure from Croatian wheat cultivars, Food Technology and Biotechnology 40 

(4), 331–341 

53. Jessen, F., Lametsch, R., Bendixen, E., Kjaersgard, I.V.H., Jorgensen, B.M., (2002) 

Extracting information from two-dimensional electrophoresis gels by partial least 

squares regression, Proteomics 2, 32–35 

54. Kleno, T.G., Leonardsen, L.R., Kjeldal, H.O., Laursen, S.M., Jensen, O.N., 

Baunsgaard, D., (2004) Mechanisms of hydrazine toxicity in rat liver investigated 

by proteomics and multivariate data analysis, Proteomics 4, 868–880 

55. Kjaersgard, I.V.H., Norrelykke, M.R., Jessen, F., (2006) Changes in cod muscle 

proteins during frozen storage revealed by proteome analysis and multivariate data 

analysis, Proteomics 6, 1606–1618 

56. Gottfries, J., Sjogren, M., Holmberg, B., Rosengren, L., Davidsson, P., 

Blennow, K., (2004) Proteomics for drug target discovery, Chemometrics and 

Intelligent Laboratory Systems 73, 47–53 

57. Karp, N.A., Griffin, J.L., Lilley, K.S., (2005) Application of partial least squares 

discriminant analysis to two-dimensional difference gel studies in expression 

proteomics, Proteomics 5, 81–90 

58. Norden, B., Broberg, P., Lindberg, C., Plymoth, A., (2005) Analysis and understanding 

of high-dimensionality data by means of multivariate data analysis, 

Chemistry and Biodiversity 2 (11), 1487–1494 

59. Malone, J., McGarry, K., Bowermann, C., (2006) Automated trend analysis of 

proteomics data using an intelligent data mining architecture, Expert Systems with 

Applications 30, 24–33 

60. Marengo, E., Robotti, E., Gianotti, V., Righetti, P.G., (2003) A new approach to 

the statistical treatment of 2D-Pages in proteomics using fuzzy logic, Annali di 

Chimica 93 (1–2), 105–116 

61. Marengo, E., Robotti, E., Righetti, P.G., Antonucci, F., (2003) A new approach 

based on fuzzy logic and principal component analysis for the classification of 2D-maps 

in health and disease: application to lymphomas, Journal of Chromatography 

A 1004, 13–28 

62. Marengo, E., Robotti, E., Gianotti, V., Righetti, P.G., Domenici, E., Cecconi, D., 

(2003) A new integrated statistical approach to the diagnostic use of proteomic 

two-dimensional maps, Electrophoresis 24, 225–236 

63. Marengo, E., Robotti, E., Cecconi, D., Scarpa, A., Righetti, P.G., (2004) Application 

of fuzzy logic principles to the classification of 2D-PAGE maps belonging to 

human pancreatic cancers treated with Trichostatin-A, Proceedings of 2004 IEEE 

International Conference on Fuzzy Systems, Budapest, Hungary, 25–29 July 2004, 

1, 359–364 

64. Marengo, E., Robotti, E., Antonucci, F., Cecconi, D., Campostrini, N., 

Righetti, P.G., (2005) Spot matching in two-dimensional gels: a review of 

commercial software and of “home-made” approaches, Proteomics 5, 654–666


65. Zenkouar, H., Nachit, A., (1997) Images compression using moments method of 

orthogonal polynomials, Materials Science and Engineering B 49, 211–215 

66. Yin, J., Rodolfo De Pierro, A., Wei, M., (2002) Analysis for the reconstruction of a 

noisy signal based on orthogonal moments, Applied Mathematics and Computation 

132, 249–263 

67. Hu, M.K., (1962) Visual pattern recognition by moment invariants, IRE Transactions 

on Information Theory 8, 179–187 

68. Teague, M.R., (1980) Image analysis via the general theory of moments, Journal 

of the Optical Society of America 70, 920–930 

69. Li, B.C., Shen, J., (1991) Fast computation of moment invariants, Pattern Recognition 

24, 807–813 

70. Chong, C., Raveendran, P., Mukundan, R., (2004) Translation and scale invariants 

of Legendre moments, Pattern Recognition 37, 119–129 

71. Mukundan, R., Ramakrishnan, K.R., (1995) Fast computation of Legendre and 

Zernike moments, Pattern Recognition 28, 1433–1442 

72. Zhou, J.D., Shu, H.Z., Luo, L.M., Yu, W.X., (2002) Two new algorithms for 

efficient computation of Legendre moments, Pattern Recognition 35, 1143–1152 

73. Wee, C., Paramesran, R., Takeda, F., (2004) New computational methods for full 

and subset Zernike moments, Information Sciences 159, 203–220 

74. Kan, C., Srinath, M.D., (2002) Invariant character recognition with Zernike and 

orthogonal Fourier-Mellin moments, Pattern Recognition 35, 143–154 

75. Khotanzad, A., Hong, Y.H., (1990) Invariant image recognition by Zernike 

moments, IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 

489–497 

76. Marengo, E., Bobba, M., Robotti, E., Liparota, M.C., (2005) Use of Legendre 

moments for the fast comparison of 2D-PAGE maps images, Journal of 

Chromatography A 1096 (1–2), 86–91 

77. Marengo, E., Leardi, R., Robotti, E., Righetti, P.G., Antonucci, F., Cecconi, D., 

(2003) Application of three-way principal component analysis to the evaluation 

of two-dimensional maps in proteomics, Journal of Proteome Research 2 (4), 

351–360 

78. Schultz, J., Gottlieb, D.M., Petersen, M., Nesic, L., Jacobsen, S., Sondergaard, I., 

(2004) Explorative data analysis of two-dimensional electrophoresis gels, 

Electrophoresis 25 (3), 502–511

17 

Finding the Significant Markers 

Statistical Analysis of Proteomic Data 

Sebastien Christian Carpentier, Bart Panis, Rony Swennen, 

and Jeroen Lammertyn 

Summary 

After separation through two-dimensional gel electrophoresis (2DE), several hundred individual protein abundances can be quantified in a cell population or sample tissue. Both a good experimental setup and a valid statistical approach are essential to gain insight into the data and to draw correct conclusions. High-throughput 2DE proteomics yields large and complex datasets with a huge disproportion between the hundreds of variables and the restricted number of replicates. However, the most commonly used statistical tests have been designed to cope with a high number of replicates and a restricted number of variables. There is some inconsistency in the proteomics community related to the use of statistics. Two approaches to data analysis can be distinguished: exploratory data analysis and confirmatory data analysis. Currently, most proteomic data are analyzed with the emphasis on confirmatory analysis, and exploratory data analysis is not taken into account. This chapter gives an overview of the typical exploratory and confirmatory statistical tools available and suggests case-specific guidelines for a reliable statistical approach to 2DE analysis. Examples are given for an experimental setup based on classical staining methods as well as for the more advanced difference gel electrophoresis.

Key Words: assumptions; confirmatory data analysis; experimental set-up; 

exploratory data analysis; missing values; multivariate statistics; non-parametric test; 

parametric test; principal component analysis; univariate statistics. 



1. Introduction


The conventional approach to analyzing a biological problem is to collect data in order to test a particular hypothesis. Starting from this hypothesis, data are collected that should lead to an objective and reliable decision: the hypothesis can be accepted, revised, or rejected. This confirmatory way of data analysis is accompanied by a number of steps that define the experimental setup. However, our understanding of a biological system is usually rather limited, and data may be very heterogeneous and complex. Exploratory data analysis approaches a biological problem from a different angle and tries to describe patterns, relationships, trends, outlying data, etc. Two-dimensional gel electrophoresis (2DE) simultaneously quantifies hundreds of individual protein abundances in a cell population or sample tissue. High-throughput 2DE proteomics yields large and complex datasets with a huge disproportion between the hundreds of variables and the restricted number of replicates. Most commonly used statistical tests are intended for confirmatory data analysis and have been designed to cope with a high number of replicates and a restricted number of variables.

Both a good experimental setup and a valid statistical approach are extremely important. There is some inconsistency in the proteomics community: proteomic data are currently analyzed by a variety of approaches. The objective of this chapter is to give a concise overview of statistical methods used in functional genomics and to find a good compromise between statistics and proteome analysis in practice. This chapter deals with experimental design and data analysis and, at the end, provides two practical examples (a classical staining approach and a DIGE approach). Section 2 discusses the issues of replicates and the pooling of samples and briefly discusses the calibration, normalization, and quantification of data. Section 3 discusses confirmatory univariate and exploratory multivariate analysis, together with the related assumptions and associated problems.

2. Experimental Design 

The design of an experiment is crucial for the robustness of the results obtained. Careful planning is essential to maximize the information output of an experiment. The experimental conditions must be well designed in order to keep variation within an experimental group as small as possible, and the experimental setup should be kept as simple as possible in order to keep the data manageable. When the impact of a particular treatment is to be examined, proper controls (positive and negative) should be included, and irrelevant external influences should be eliminated or anticipated (e.g., by a randomized design).


The conventional approach of analyzing a biological problem is to collect data in order to test a particular hypothesis. The collected data should enable the researcher to make an objective and reliable decision concerning the hypothesis. The experimental setup usually follows a procedure that involves several steps: (1) state a null hypothesis (H0) (e.g., there is no difference in protein abundance(s) between the treatments) and its alternative (H1) (e.g., there is a difference between the treatments), (2) choose the most appropriate test statistic to check the hypothesis, (3) specify a significance level (i.e., the accepted probability of a false positive result, that is, of unjustly rejecting the null hypothesis), (4) specify the sample size (number of replicates) needed for sufficient power, and (5) collect the data. The power of a statistical test is its ability to detect true differences between the experimental groups. The power of a test, and thus its ability to avoid false negative results, depends on the variance, the change in abundance, the number of replicates, the statistical test chosen, and the predetermined significance level. Lilley and Karp have illustrated the relationship between power, replicate number, and relative expression change in a proteomics experiment (1). Urfer et al. consider the effect of testing all the proteins simultaneously by means of the family-wise error rate and the false discovery rate (2). Increasing the number of replicates is the most direct way to control the power of a statistical test. Given the labor and cost involved in 2DE analysis, however, the number of replicates is often restricted, and thus the variance (technical and biological) should be kept under control.
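To make the dependence of power on the number of replicates concrete, the sketch below uses the normal approximation for a two-sided comparison of two group means. It is an illustration under simplifying assumptions, not the calculation of Lilley and Karp (1): abundances are assumed to be (log-transformed and) normally distributed with a common standard deviation `sigma`, and `approx_power` is a hypothetical helper. An exact calculation for the t-test would use the noncentral t distribution.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    delta: true difference in mean abundance between the two groups
    sigma: common within-group standard deviation (technical + biological)
    n:     number of replicates per group
    The normal approximation overestimates the power of a t-test at the
    small sample sizes typical of 2DE (3-6 replicates per group).
    """
    z = NormalDist()                    # standard normal distribution
    se = sigma * (2.0 / n) ** 0.5       # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = abs(delta) / se             # standardized true difference
    # Probability of falling in either rejection region when the difference is real.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power for a difference of one standard deviation at increasing replicate numbers.
for n in (3, 4, 6, 10):
    print(n, round(approx_power(delta=1.0, sigma=1.0, n=n), 2))
```

When `delta` is zero the function returns `alpha`, the false positive rate; lowering `sigma` (less technical and biological variance) has the same effect on power as adding replicates.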

2.1. Replicates 

A well-discussed subject is the nature of replicates. Two types of replicates are reported in 2DE studies: (1) technical replicates (repeated measurements of the same sample, e.g., the same protein extract) and (2) biological replicates (measurements of different biological samples within the same experimental group). Ideally, only biological replicate samples should be used, and the technical variability should be limited to the strict minimum so that repeated measurement of the same sample is not necessary (Fig. 1A). Therefore, both a reliable sample preparation method (3) and extensive experience in electrophoresis and proteomic techniques are indispensable (4,5,6,7). Technical variability can be introduced at the level of (1) sample collection, (2) sample preparation and protein extraction, (3) sample loading and electrophoresis, and (4) staining and image analysis. Some staining methods, like silver staining, involve many steps, and each sample is run in an individual gel, which makes the approach susceptible to technical variation. Technical replicates might be considered in experiments with a low sample yield, with cost restrictions, or when the technical variability is still too high (high inter-gel variability) (Fig. 1B).


In any case, care should be taken when technical replicates are analyzed alongside biological replicates. Statistically speaking, we are then dealing with mixed models and nested designs (8,9). Karp et al. discuss the impact of mixing biological and technical replicates in a proteomics experiment (10). Treating technical replicates as biological replicates can increase the rate of false positives. Analyzing biological and technical replicates in one test seems reasonable only in a nested ANOVA. If another statistical test is used, only the biological replicates should be included (Fig. 1A), and the technical repetition of the same biological samples (protein extracts) should be considered a distinct, confirmatory analysis. Given the low technical variance observed with the difference gel electrophoresis (DIGE) approach (see below), the value of analyzing technical replicates is questionable, and they can be skipped (Fig. 1A).
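For the mixed designs of Fig. 1B and 1D, a two-level nested ANOVA tests the treatment effect against the variation between biological replicates rather than against the (usually smaller) technical variation. The Python sketch below assumes a balanced design (the same number of biological replicates per group and of technical replicates per biological replicate) and data for a single spot. `nested_anova` is a hypothetical helper; p-values would still have to be read from an F distribution (for example with `scipy.stats.f.sf`).

```python
from statistics import mean


def nested_anova(data):
    """Balanced two-level nested ANOVA for one protein spot.

    data[i][j][k]: abundance in group i, biological replicate j (nested in i),
    technical replicate k (nested in j).
    Returns (F_groups, F_bio, (df_groups, df_bio, df_tech)).
    """
    a = len(data)            # experimental groups (treatments)
    b = len(data[0])         # biological replicates per group
    n = len(data[0][0])      # technical replicates per biological replicate
    if any(len(grp) != b or any(len(bio) != n for bio in grp) for grp in data):
        raise ValueError("nested_anova expects a balanced design")

    grand = mean(y for grp in data for bio in grp for y in bio)
    group_means = [mean(y for bio in grp for y in bio) for grp in data]
    bio_means = [[mean(bio) for bio in grp] for grp in data]

    # Sums of squares for groups, biological replicates within groups, and
    # technical replicates within biological replicates (residual).
    ss_groups = b * n * sum((gm - grand) ** 2 for gm in group_means)
    ss_bio = n * sum((bm - group_means[i]) ** 2
                     for i in range(a) for bm in bio_means[i])
    ss_tech = sum((y - bio_means[i][j]) ** 2
                  for i in range(a) for j in range(b) for y in data[i][j])

    df_groups, df_bio, df_tech = a - 1, a * (b - 1), a * b * (n - 1)
    ms_groups = ss_groups / df_groups
    ms_bio = ss_bio / df_bio
    ms_tech = ss_tech / df_tech
    # Treatment is tested against biological variation; biological variation
    # is tested against technical variation.
    return ms_groups / ms_bio, ms_bio / ms_tech, (df_groups, df_bio, df_tech)


# Invented data: 4 biological x 3 technical replicates per group (Fig. 1B layout).
control = [[10.1, 10.3, 9.9], [11.0, 11.2, 10.8], [9.5, 9.7, 9.6], [10.4, 10.6, 10.5]]
treated = [[12.0, 12.2, 11.9], [11.4, 11.6, 11.5], [12.8, 12.6, 12.7], [11.9, 12.1, 12.0]]
print(nested_anova([control, treated]))
```

With two groups, `F_groups` equals the squared pooled t statistic computed on the biological-replicate means, with a(b − 1) = 6 degrees of freedom here. Treating all 24 values as independent observations would instead use 22 degrees of freedom and compare the treatment with the technical variation, which is how pseudo-replication inflates the false positive rate.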

2.2. Pooling 

Another well-debated subject is the pooling of biological samples. Pooling individual biological tissues or cells averages the sample. On the one hand, pooling reduces variability and thereby increases power; on the other hand, there is an incontestable loss of relevant information about individuals. The pooling of samples reduces the biological variance when detecting changes in protein abundance between the averages of the experimental groups. Pooling is usually done when the biological variation within an experimental group is too large (Fig. 1C and 1D) or when the starting material of an individual is insufficient for protein extraction. Pooling of samples might be useful but must be evaluated for each individual experimental setup.

2.3. Data Processing 

Common strategies for quantitative determination of gel-separated proteins include organic dyes (e.g., colloidal Coomassie blue), silver staining, radiolabeling, and fluorescent stains (e.g., Deep Purple, Flamingo, SYPRO Orange/Red/Ruby and other ruthenium complexes, and succinimidyl ester derivatives of cyanine dyes). The choice of a particular staining method should be considered carefully, taking into account the available lab equipment, the budget, and the power of the method. The dynamic range of the staining method and the technical variability both have a great impact on the power of a statistical test and are decisive for the experimental setup (the number of replicates) and the choice of statistical test.

Data from 2DE analysis are generated by image analysis software that detects and quantifies protein abundances and matches the same proteins across different gels. An important challenge in 2DE is to estimate the protein concentration in order to ensure that all gels are loaded with an equal amount of protein, and hence to minimize the technical variation. Most current software packages take this into account and introduce a calibration or normalization in order to compensate for image differences caused by protein loading, staining, and scanning.

Fig. 1. Experimental set-up. Theoretical examples of an experimental setup, control vs. treatment. (A) Small intra-group variation and small technical variation: four biological replicates for control and four biological replicates for treatment. (B) Small intra-group variation and large technical variation (mixed model): four biological and three technical replicates for control and the same for treatment. (C) Large intra-group variation and small technical variation: four replicates of a biological pool for control and the same for treatment. (D) Large intra-group variation and large technical variation (mixed model): four replicates of a biological pool and three technical replicates for control and the same for treatment.

2.3.1. Classical Approach 

Calibration in a classical approach (e.g., silver or Coomassie staining) is designed to take into account differences in scanning properties (such as image depth). Scanner grey values are converted to optical densities so that intensities no longer depend on the original pixel depth. The most logical normalization procedure to compensate for possible loading differences with classical staining is % volume, where the individual spot volumes are normalized by the total volume of all spots. The normalized data, whether or not transformed, can subsequently be analyzed with a relevant statistical test (see below).
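A minimal sketch of % volume normalization, assuming the spot volumes have already been calibrated to optical densities and matched across gels. The helper name, spot IDs, and volumes are invented for illustration.

```python
def percent_volume(spot_volumes):
    """Normalize the spot volumes of one gel to % of the total volume of all spots.

    spot_volumes: dict mapping a matched spot ID to its calibrated volume.
    """
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: 100.0 * vol / total for spot, vol in spot_volumes.items()}


# Hypothetical gels: gel_b was loaded with twice as much protein as gel_a.
gel_a = {"spot_1": 200.0, "spot_2": 300.0, "spot_3": 500.0}
gel_b = {"spot_1": 400.0, "spot_2": 600.0, "spot_3": 1000.0}
print(percent_volume(gel_a))
print(percent_volume(gel_b))
```

Because each gel is divided by its own total, a uniform loading difference cancels out. The procedure implicitly assumes that most spots do not change between treatments: a large change in a few very abundant spots shifts the total and therefore the % volume of every other spot.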

The most commonly used organic stain is Coomassie brilliant blue (CBB). CBB staining has a relatively good dynamic range (approximately 10³) and is perfectly compatible with MS. However, its sensitivity is relatively low: the limit of protein detection for colloidal CBB stain is approximately 8–10 ng (11). Therefore, several modifications have been proposed to improve its sensitivity. For an overview, see (12).

The introduction of the first sensitive silver-staining method (13) was a major breakthrough in the field of protein detection, which led to extensive research and various alternative silver-staining protocols (14). Silver staining is still one of the most sensitive non-radioactive detection techniques, with a detection limit in the low nanogram range. However, its linearity and dynamic range are relatively poor (approximately 10² or less), the staining is protein-dependent, and gel-to-gel variation is not negligible because of the numerous solution changes and other carefully timed steps.

2.3.2. Difference Gel Electrophoresis Approach 

Fluorescence-based methods are increasingly replacing the conventional technologies. A standard UV transilluminator can be used to visualize most fluorescent stains, but more sophisticated and expensive CCD cameras or laser scanners are more appropriate for quantitative determination. The development of succinimidyl ester derivatives of different cyanine fluorescent dyes that modify free amino groups of proteins prior to separation (15) was a major achievement in terms of reproducibility and throughput. The DIGE approach uses fluorophores with different absorption optima, making it possible to run multiple samples simultaneously in the same gel. The dyes were designed to ensure that a protein acquires the same relative mobility irrespective of the dye used to tag it: the difference in MW introduced by linkers of different length is compensated by different alkyl moieties opposite the linker moiety. Originally, only two different cyanine dyes were included (Cy3 and Cy5), but the concept was extended with a third dye (Cy2), which opened the way for a totally new experimental design that further exploits the sample multiplexing capabilities of the dyes by including an internal standard (16,17). The internal standard is a mixture of equal amounts of each sample and enables a powerful normalization procedure for highly accurate protein quantification. This normalization reduces the variability considerably and provides reasonable arguments to justify the use of powerful parametric statistics after transformation of the standardized volume. If multiple conditions have to be tested over different electrophoresis runs, one common internal standard should be created and included in all the gels of each run. However, if an experimental setup is too complex, the internal standard will contain too many samples, possibly resulting in overlapping spots from different samples.

The minimal labeling approach has a dynamic range of four to five orders of magnitude, and its sensitivity is currently marginally lower than that of silver staining (18). Although the dyes have been carefully designed, the experimental design should take possible dye-specific effects into account. Therefore, a supervised randomization of the Cy3/Cy5 labeling is highly recommended. Not only should the labeling be randomized, but the samples representing an experimental group should also be mixed across gels in order to avoid systematic gel artifacts.
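The normalization against the internal standard and the randomized labeling described above can be sketched as follows. This is an illustration, not the normalization implemented in commercial DIGE software: it assumes matched spot volumes per channel are already available, uses a log10 ratio as the transformation of the standardized volume, and both helper names are hypothetical.

```python
import math
import random


def standardized_log_abundance(sample_vols, standard_vols):
    """log10(sample / internal standard) for every spot matched in the same gel.

    sample_vols:   spot ID -> volume in the Cy3 or Cy5 channel
    standard_vols: spot ID -> volume in the Cy2 (internal standard) channel
    """
    return {spot: math.log10(vol / standard_vols[spot])
            for spot, vol in sample_vols.items() if spot in standard_vols}


def assign_dyes(controls, treatments, seed=None):
    """Hypothetical helper: randomly pair control and treatment samples on gels
    and balance the Cy3/Cy5 labeling so that each group carries each dye on
    half of the gels. Cy2 always carries the pooled internal standard."""
    if len(controls) != len(treatments) or len(controls) % 2:
        raise ValueError("need equal, even numbers of control and treatment samples")
    rng = random.Random(seed)
    controls, treatments = list(controls), list(treatments)
    rng.shuffle(controls)        # random mixing of samples across gels
    rng.shuffle(treatments)
    half = len(controls) // 2
    swap = [False] * half + [True] * half   # True: control labeled with Cy5
    rng.shuffle(swap)
    gels = []
    for ctrl, trt, swapped in zip(controls, treatments, swap):
        cy3, cy5 = (trt, ctrl) if swapped else (ctrl, trt)
        gels.append({"Cy2": "internal standard", "Cy3": cy3, "Cy5": cy5})
    return gels


for gel in assign_dyes(["C1", "C2", "C3", "C4"], ["T1", "T2", "T3", "T4"], seed=7):
    print(gel)
```

Balancing the dye swap, instead of randomizing each label independently, guarantees that a dye-specific effect cannot coincide with the treatment effect even with only four gels.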

3. Data Analysis 

3.1. Confirmatory Univariate Data Analysis 

Univariate statistical methods examine the individual protein spots one by one, considering the different proteins as independent measurements. Table 1 gives an overview of some commonly used parametric and non-parametric univariate tests. Univariate methods start from the null hypothesis that there is no difference between the two experimental populations.

Table 1
Overview of Some Commonly Used Univariate Tests

Classes of data            Univariate statistics
                           Parametric    Non-parametric
Comparing 2 treatments     T-test        Mann–Whitney/Wilcoxon test; Kolmogorov–Smirnov test
Comparing k treatments     ANOVA         Kruskal–Wallis test

Parametric models like Student's T-test start from the observed sample and assume that the observed sample mean and variance approximate the real population mean and variance, and that the variances of the two experimental populations are equal. Based on the observed mean and variance, the two populations are considered normally distributed, and a model is made (Fig. 2). If the test statistic (or T-value) is large enough, the null hypothesis is rejected (Eq. 1). The numerator measures the distance between the experimental means and is thus an estimate of the inter-group variability; the denominator approximates the real variability and estimates the intra-group variability.

$$T^2 = \frac{(\bar{y}_2 - \bar{y}_1)^2}{S_P^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \qquad (1)$$

where $\bar{y}_i$: experimental mean of group $i$ (estimate of the population mean, $\mu_i$); $S_P^2$: pooled sample variance (estimate of the population variance; it is a weighted average of the group variances that accounts for the number of replicates in each group); $n_i$: number of replicates per experimental group.
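A direct transcription of Eq. 1, with the pooled variance computed as the degrees-of-freedom-weighted average of the two group variances. The helper name and the abundance values are invented for illustration.

```python
from statistics import mean, variance


def pooled_t_squared(group1, group2):
    """Return (T^2, degrees of freedom) of Eq. 1 for two independent groups."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance S_P^2: group variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    diff = mean(group2) - mean(group1)
    return diff ** 2 / (sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2


# Hypothetical log-transformed abundances of one spot, three replicates per group.
t2, df = pooled_t_squared([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(t2, df)
```

The result is compared with the square of the tabulated two-sided critical t value for n1 + n2 − 2 degrees of freedom (equivalently, an F(1, n1 + n2 − 2) value). `scipy.stats.ttest_ind(a, b, equal_var=True)` returns the corresponding signed t value and p-value.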

Parametric univariate statistical tests are very powerful, but the data must respect restrictive assumptions (continuous and normally distributed data, homogeneity of variance, and independent samples), and these assumptions must be tested. A commonly used test for homogeneity of variances is Levene's test, and for normality, the Shapiro-Wilk test (19). If one assumption is not met, the significance levels and the power of the test might be invalidated. Transformation of data (e.g., logarithm, arcsine, or square root) is frequently used to improve the distribution characteristics (normality and homogeneity of variance) (20). The problem with proteomic data is the low number of replicates: it is practically impossible to test these assumptions reliably with the small sample sizes commonly used in 2DE experiments. Tests like Levene's test and the Shapiro-Wilk test are designed for larger sample sizes and have very limited power at the sample sizes commonly used in proteomics experiments. Given the labor and cost involved in 2DE analysis, the number of replicates is often restricted and usually ranges between 3 and 6.

Fig. 2. Distribution of two normal populations with a homogeneous variance. $\mu_i$: real population average, estimated by the sample average.


Although some empirical evidence illustrates that slight deviations in meeting 

the assumptions underlying parametric tests may not have radical effects on 

the obtained probability levels, there is no general agreement as to what is a 

“slight” deviation (21). 
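As an illustration of the assumption checks discussed above, the sketch below computes Levene's W statistic (the original version based on absolute deviations from the group means) and shows how a log transformation can remove variance heterogeneity caused by a multiplicative treatment effect. The values are invented and `levene_w` is a hypothetical helper; `scipy.stats.levene(..., center='mean')` computes the same statistic together with a p-value (its default, `center='median'`, is the Brown–Forsythe variant).

```python
import math
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean.

    W is referred to an F distribution with (k - 1, N - k) degrees of freedom,
    where k is the number of groups and N the total number of observations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = mean(v for zi in z for v in zi)
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


control = [1.0, 2.0, 3.0]
treated = [2.0, 4.0, 6.0]   # every value doubled: the spread doubles as well
print(levene_w(control, treated))
# After a log transformation the doubling becomes a constant shift,
# so the spreads, and hence W, become (numerically) equal and zero.
print(levene_w([math.log(y) for y in control], [math.log(y) for y in treated]))
```

With three replicates per group, W is referred to an F distribution with only N − k = 4 denominator degrees of freedom, which illustrates why such tests have very limited power at typical 2DE sample sizes.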

An alternative to parametric tests is the use of non-parametric tests, which do not assume any particular distribution for the data but usually have relatively low power (21). Their assumptions are independent observations and continuous (ordinal) data. A useful non-parametric test is the Kolmogorov–Smirnov test, which determines whether or not the experimental groups come from the same distribution. To this end, the data points in each experimental group are sorted in ascending order, and an empirical distribution function is calculated without any assumption about distribution or variance. The Kolmogorov–Smirnov test statistic D is defined as the maximum distance between the cumulative distributions of the two experimental groups (for an example, see Fig. 5).

$$D_{n_1 n_2} = \max_{X} \left| S_{n_1}(X) - S_{n_2}(X) \right| \qquad (2)$$

where $S_{n_i}(X) = K_i / n_i$, with $K_i$: number of data points in group $i$ equal to or less than $X$; $n_i$: number of replicates per experimental group.
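A minimal sketch of Eq. 2, computing the empirical distribution functions and their maximum distance. The helper names and values are invented for illustration; `scipy.stats.ks_2samp` provides the statistic together with a p-value.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical distribution function S_n(X) = K / n (K = number of values <= X)."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def ks_statistic(group1, group2):
    """Two-sample Kolmogorov-Smirnov statistic D of Eq. 2."""
    s1, s2 = ecdf(group1), ecdf(group2)
    # Both step functions only change at observed values, so the maximum
    # distance is attained at one of the pooled data points.
    points = sorted(set(group1) | set(group2))
    return max(abs(s1(x) - s2(x)) for x in points)


print(ks_statistic([1.0, 3.0, 5.0], [2.0, 4.0, 6.0]))   # interleaved groups
print(ks_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # completely separated groups
```

With three replicates per group there are only C(6, 3) = 20 equally likely orderings of the pooled values under the null hypothesis (assuming no ties), and complete separation (D = 1) occurs in just 2 of them, so the smallest attainable two-sided p-value is 0.1. This is a concrete instance of the low power of non-parametric tests at 2DE sample sizes.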

3.2. Exploratory Multivariate Data Analysis 

Univariate statistical tests, such as the t-test, the Kolmogorov–Smirnov test, ANOVA, or the Kruskal–Wallis test, have not been designed to analyze complex datasets containing multiple correlated variables. Proteomic datasets generally contain hundreds of different proteins that are correlated: proteins fit within the larger entity of networks and interact with each other. Univariate statistics test the individual variables one by one and cannot detect correlations with other variables (proteins). Moreover, testing hundreds of variables (protein spots) one by one, each with an accepted risk of false positives (α), increases the chance of reporting false positives (the multiple testing issue) and implicitly assumes that the different variables (proteins) are uncorrelated. Proteins are not uncorrelated; they fit within multiple biological pathways and may be closely correlated. The field of multivariate analysis comprises the statistical techniques that consider two or more related random variables as a single entity and attempt to produce an overall result that takes the relationships among the variables into account (22). In contrast to a univariate approach, multivariate analysis displays the inter-relationships among a large number of variables and can correlate multiple proteins with a specific experimental group. The data from different image analysis software packages can be exported and analyzed with several software packages that perform multivariate analysis, such as Unscrambler, Matlab, SAS, and Statistica. GE Healthcare developed a statistical software package for the DIGE approach (EDA, extended data analysis), which is linked to the image analysis software Decyder and offers both univariate and multivariate tools. Here, we mainly discuss the use of Principal Component Analysis (PCA) (for an overview of the other possibilities of the EDA package and more DIGE-related statistical examples, see Chapter 6).
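The multiple testing issue mentioned above can be quantified with simple arithmetic. The Python sketch below assumes a hypothetical gel with 500 spots, none of which truly changes, each tested at α = 0.05. The family-wise error rate formula assumes independent tests, which, as noted above, protein spots are not, so it is only an approximation. The Bonferroni threshold is shown as one common correction; it is not a procedure discussed in this chapter.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_spots, alpha = 500, 0.05  # hypothetical experiment
print(expected_false_positives(n_spots, alpha))  # about 25 spots
print(familywise_error_rate(n_spots, alpha))     # practically 1
print(bonferroni_threshold(n_spots, alpha))      # 0.05 / 500
```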

3.2.1. Principal Component Analysis 

Principal Component Analysis is one of the multivariate techniques available for explorative data analysis. A comprehensive overview of the use of PCA in statistics is given by Sharma (23). The basics of PCA date back to Karl Pearson in 1901 (24), and the procedure as we know it today was developed by Harold Hotelling in 1933 (25). Multivariate methods were already used for the analysis of 2DE in its early days (26) and are an emerging application in transcriptomics and proteomics (27,28,29,30,31). PCA condenses the information contained in a huge dataset into a smaller number of artificial factors that explain most of the observed variance. The most logical modus operandi is to consider the different biological replicate samples of the experimental groups as observations (score plot). The score plot allows the detection of trends in the samples, and the loading plot allows the identification of the relevant proteins that explain these trends. A principal axis transformation transforms the correlated variables (proteins) into new, uncorrelated variables. A principal component (PC) is a linear combination of the existing variables (proteins):

$$\mathrm{PC}_1 = a_1\,\mathrm{protein}_1 + a_2\,\mathrm{protein}_2 + \cdots + a_n\,\mathrm{protein}_n$$

$$\mathrm{PC}_2 = b_1\,\mathrm{protein}_1 + b_2\,\mathrm{protein}_2 + \cdots + b_n\,\mathrm{protein}_n$$

The relation between the original variables (proteins) and the PCs is displayed in the loading plot: if a protein has a high loading for a specific PC, that protein explains an important part of the sample variance. The starting point for PCA is the sample covariance matrix. The sum of the original variances equals the sum of the eigenvalues of the sample covariance matrix, and the eigenvalues are the variances of the PCs. The ratio of each eigenvalue to the total variance therefore indicates the portion of the total variability accounted for by each PC. For the fundamentals of data manipulation and a more detailed description of the properties and mechanisms of multivariate analysis and PCA, the reader is referred to the books of Jackson and Sharma (22,23).
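A real PCA handles hundreds of proteins, typically through an eigendecomposition or singular value decomposition of the data. For two proteins, however, the eigenproblem of the 2 × 2 covariance matrix has a closed form, which makes it easy to check the properties stated above in plain Python: a PC is a linear combination of the proteins, the eigenvalues are the variances of the PCs, and they sum to the total variance. The abundances are hypothetical, and the sign of the loadings is arbitrary.

```python
import math


def covariance_2x2(x, y):
    """Sample covariance matrix entries (n - 1 denominator) of two variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return s_xx, s_xy, s_yy


def pca_two_proteins(x, y):
    """Closed-form PCA for two variables.

    Returns the eigenvalues (variances of PC1 and PC2), the unit-length
    PC1 loadings (a1, a2) and the PC1 scores of the samples.
    """
    s_xx, s_xy, s_yy = covariance_2x2(x, y)
    half_trace = (s_xx + s_yy) / 2
    radius = math.sqrt(((s_xx - s_yy) / 2) ** 2 + s_xy ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Eigenvector of lam1, from the first row of (S - lam1 * I) v = 0
    if s_xy != 0:
        v = (s_xy, lam1 - s_xx)
    else:  # already uncorrelated: the original axes are the PCs
        v = (1.0, 0.0) if s_xx >= s_yy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    a1, a2 = v[0] / norm, v[1] / norm

    # PC1 = a1 * protein1 + a2 * protein2, computed on mean-centered data
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    scores = [a1 * (p - mean_x) + a2 * (q - mean_y) for p, q in zip(x, y)]
    return (lam1, lam2), (a1, a2), scores


# Hypothetical abundances of two correlated proteins in six samples
protein1 = [1.0, 1.2, 0.9, 2.1, 2.3, 2.0]
protein2 = [0.5, 0.7, 0.4, 1.4, 1.6, 1.5]

(lam1, lam2), loadings, scores = pca_two_proteins(protein1, protein2)
s_xx, _, s_yy = covariance_2x2(protein1, protein2)
print("total variance:", s_xx + s_yy, "sum of eigenvalues:", lam1 + lam2)
print("share of variance explained by PC1:", lam1 / (lam1 + lam2))
print("PC1 loadings:", loadings)
```

Because the two hypothetical proteins are strongly correlated, PC1 captures most of their joint variance, and the variance of the PC1 scores equals the first eigenvalue.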

It is very important to understand what is calculated and what assumptions the different models make. The EDA software lets the user choose what is treated as observations and what as variables. Hence, the user can also use the transposed data matrix and consider the gel images as variables (loading plot) and the proteins as observations (score plot). This might help to improve the image analysis and to detect protein mismatches, but it should not be used to explore the inter- and intra-group variability of the biological samples. Explorative PCA does not impose strict requirements on the data. Most PCA applications are descriptive in nature, and in these instances distributional assumptions are of secondary importance (22). The only requirement that must be met is that the dataset is complete, meaning that there must be no missing spot values among the different samples.

This problem can be solved with techniques that perform PCA on incomplete data and/or techniques that estimate the missing data. Several methods for estimating missing data have been reported by the microarray community (32,33,34). A missing value in 2DE proteomics occurs when a spot is detected in the reference or master gel but not in one of the other sample gel images, or when it is detected but not matched to the reference or master gel. Missing values may be caused by (1) faint spots close to the detection limit, detected in one gel but not in another; (2) mismatches, probably caused by distortions in the protein pattern; or (3) the absence of spots due to poor transfer from the first to the second dimension. Grove et al. showed that the staining procedure is an important source of missing values (27). The DIGE concept, with its common internal standard, addresses the missing value problem to some extent by matching the different internal standard images.

Good sample preparation (3) and experience with electrophoresis and proteomic techniques also reduce this problem, but missing values are inherent to 2DE and must be faced. Some software packages replace missing values with zero, and others remove all variables with missing values. Introducing zeros leaves the results open to serious bias when a protein is mismatched in a particular sample or when the spot is missing due to a technical error: this protein receives an important loading value for the sample in question and incorrectly influences the score of that sample. If a protein is truly absent or below the detection limit of the staining method, the missing values can be filled either with zeros or with a threshold value (35). A better alternative might be to average the samples within an experimental group and to explore the data based on the group means. A missing value is still treated as zero and lowers the group mean, but its impact on the sample score plot is buffered by the averaging. The EDA package offers this possibility (see the example in Section 4.2). Taking into account only the proteins that are detected and matched to the master or reference gel solves the problem of missing values, but a lot of useful information is lost (see the example in Section 4.2). The EDA package offers the possibility to filter the base dataset and to select only those proteins that are 100% matched. Troyanskaya et al. showed that averaging is an improvement over replacing missing values with zeros, but that it yields drastically lower accuracy than estimation methods such as singular value decomposition and weighted K-nearest neighbors (32).
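The three strategies discussed above (zero filling, averaging, and a neighbor-based estimate) can be contrasted in a short Python sketch. The K-nearest-neighbor function is a simplified illustration in the spirit of the weighted KNN method evaluated by Troyanskaya et al. (32), not their published implementation, and the row mean replaces a missing value by the mean of the same protein in the other gels rather than by a group mean as in EDA. The data are hypothetical.

```python
import math


def fill_with_zero(matrix):
    """Replace every missing value (None) by 0."""
    return [[0.0 if v is None else v for v in row] for row in matrix]


def fill_with_row_mean(matrix):
    """Replace a missing value by the mean of the observed values of that protein."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def fill_with_knn(matrix, k=2):
    """Simplified K-nearest-neighbor imputation.

    For a missing value of protein i in gel j, the k proteins that are
    observed in gel j and have the most similar profile to protein i
    (root-mean-square difference over the gels both proteins share) are
    selected, and their values in gel j are averaged with weights
    inversely proportional to the distance.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            neighbors = []
            for r, other in enumerate(matrix):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    neighbors.append((dist, other[j]))
            neighbors.sort(key=lambda pair: pair[0])
            nearest = neighbors[:k]
            if nearest:  # otherwise the value stays missing
                weights = [1.0 / (d + 1e-9) for d, _ in nearest]
                filled[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled


# Hypothetical abundances: rows are proteins, columns are gels; None = missing
data = [
    [1.0, 1.1, None, 1.2],
    [1.0, 1.2, 1.1, 1.3],
    [5.0, 5.2, 5.1, 5.3],
    [0.9, 1.0, 1.0, 1.1],
]
print("zero:", fill_with_zero(data)[0][2])
print("row mean:", fill_with_row_mean(data)[0][2])
print("KNN:", round(fill_with_knn(data)[0][2], 3))
```

In this toy example, zero filling creates an artificial outlier for the first protein, whereas the row mean and the neighbor-based estimate both stay close to that protein's other values.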

We recommend performing the initial PCA on the complete dataset and not only on the proteins that appear to be significantly different in the individual univariate analyses. Multivariate statistics add value by differentiating the experimental groups in terms of correlated expression rather than absolute expression (28,36). Both approaches are complementary, and performing the analysis only on the significant proteins from univariate analysis might disregard useful information. We recommend starting with explorative multivariate analysis and subsequently comparing the results with the confirmatory univariate analysis of the individual proteins.

3.2.2. Marker Selection 

Principal Component Analysis is outstanding at detecting outlying data and correlations among the different variables (proteins), but it cannot determine a threshold level for deciding which proteins are significant for classifying the experimental groups, which would allow an objective removal of the variables (proteins) that do not contribute to the class distinction. Several algorithms exist to select a subset of features from the whole dataset and to perform a classification. In proteome analysis, this corresponds to selecting the proteins that best discriminate the experimental groups. The use of partial least squares (PLS) as a regression technique has been promoted primarily within the area of chemometrics (37). In contrast to PCA, PLS is a supervised technique, mainly applied to link (or regress) a continuous response variable (or dependent variable) to a set of independent variables (e.g., the proteins in a gel). In proteomic data, however, the response variable is often discrete (e.g., treatment A, B, C, …) and takes only a fixed number of values. PLS discriminant analysis (PLS-DA) offers an algorithm to deal with this typical data structure. An analysis of the score and (correlation) loading plots allows the definition of the proteins that are important in discriminating the different experimental treatments. The variable importance plot is a useful tool for this purpose. According to the user manual, the PLS algorithm of EDA creates a supervised model of the data (predefined experimental groups) and then uses the variable influence on the projection (VIP) scores from the model to create a ranked list of how well each protein discriminates between the experimental groups. Discriminant analysis (DA) methods in general, and PLS-DA in particular, are used to calculate the probability or accuracy of the marker selection. The purpose of DA is to assign individual observations (samples) to one of the experimental groups [e.g., the classification of patient samples as healthy or tumor based on protein extractions (38)].
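The chapter does not give the VIP formula used by EDA. The Python sketch below uses the definition commonly given in the PLS literature, $\mathrm{VIP}_j = \sqrt{p \sum_a SS_a (w_{ja}/\lVert w_a \rVert)^2 / \sum_a SS_a}$, and assumes that the PLS weights $w_{ja}$ and the sums of squares of the response explained by each component, $SS_a$, have already been obtained from a fitted PLS-DA model; all numbers and spot names are hypothetical.

```python
import math


def vip_scores(weights, ss_explained):
    """Variable importance in the projection (VIP) for each variable.

    weights[j][a]   : PLS weight of variable (protein) j on component a
    ss_explained[a] : sum of squares of the response explained by component a
    VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a)
    """
    p = len(weights)
    n_components = len(ss_explained)
    column_norms = [
        math.sqrt(sum(weights[j][a] ** 2 for j in range(p)))
        for a in range(n_components)
    ]
    total_ss = sum(ss_explained)
    scores = []
    for j in range(p):
        weighted = sum(
            ss_explained[a] * (weights[j][a] / column_norms[a]) ** 2
            for a in range(n_components)
        )
        scores.append(math.sqrt(p * weighted / total_ss))
    return scores


# Hypothetical weights of four spots on two PLS-DA components
spots = ["spot 101", "spot 202", "spot 303", "spot 404"]
weights = [[0.70, 0.10], [0.60, -0.20], [0.05, 0.90], [0.10, 0.30]]
ss_explained = [0.55, 0.15]  # hypothetical explained sums of squares

vip = vip_scores(weights, ss_explained)
for name, score in sorted(zip(spots, vip), key=lambda t: t[1], reverse=True):
    print(f"{name}: VIP = {score:.2f}")
```

With this definition, the squared VIP scores average to 1 over all variables, which is why a VIP above 1 is often used as a rule of thumb for an important variable.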

4. Examples 

4.1. Classical Dyes, 2 Conditions 

In this example, we examine two different conditions, analyze six biological samples per condition, and perform the analysis with classical CBB staining. The data were analyzed with the Image Master Platinum software version 5 (GE Healthcare), which offers the possibility to compensate for technical variance and provides intensity calibration and spot normalization. Relative volume (%vol) spot normalization is the best spot normalization procedure here because it takes into account both the intensity and the area of a spot (Eq. 3).

$$\%\mathrm{vol} = \frac{\mathrm{vol}}{\sum_{S=1}^{n} \mathrm{vol}_S} \qquad (3)$$

where $\mathrm{vol}_S$ is the volume of spot S in a gel containing n detected spots.

Although this spot normalization procedure reduces technical variance, it has consequences for the data. Normalizing all the spots transforms the data and creates an asymmetric population (Fig. 3). A logarithmic transformation of the data improves the distribution characteristics (Fig. 4). However, univariate statistical methods are not designed to analyze all the spots simultaneously, as in Figs. 3 and 4. They examine the individual protein spots (variables) one by one, treating the different proteins as independent measurements. Therefore, one should consider each spot individually, and the population of this particular protein spot in each experimental group must be estimated from the six replicates. Performing distribution tests such as Levene's test and the Shapiro-Wilk test on six replicates is possible, but it is unlikely that the null hypotheses (homogeneous variance and normal distribution, respectively) will be rejected. The sample sizes need to be large enough to minimize the number of false results; otherwise, the populations will appear to be normally distributed and of equal variance although this is not necessarily the case.
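Eq. 3 and the subsequent log transformation can be sketched in Python as follows. The spot volumes are hypothetical, Eq. 3 yields a fraction that is multiplied by 100 here to express it as a percentage, and the natural logarithm is an assumption because the base used for Fig. 4 is not stated.

```python
import math


def relative_volumes(spot_volumes):
    """Relative volume (Eq. 3): each spot volume divided by the summed
    volume of all n spots detected in the same gel."""
    total = sum(spot_volumes.values())
    return {spot: vol / total for spot, vol in spot_volumes.items()}


def log_values(values):
    """Natural-log transformation, used here to reduce the right skew."""
    return {spot: math.log(v) for spot, v in values.items()}


# Hypothetical raw spot volumes of one gel (arbitrary units)
gel = {"spot 1": 1200.0, "spot 2": 300.0, "spot 3": 45.0,
       "spot 4": 15.0, "spot 5": 40.0}

rel = relative_volumes(gel)
percent = {spot: 100 * v for spot, v in rel.items()}  # as a percentage
logged = log_values(percent)
for spot in gel:
    print(f"{spot}: %vol = {percent[spot]:.2f}, log = {logged[spot]:.2f}")
```

Because every spot is divided by the same gel total, scaling all volumes of a gel (e.g., because more protein was loaded) leaves %vol unchanged; at the same time, each spot's value now depends on all the other spots of that gel.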

Taking into account the typical heterogeneity of variance associated with classical dyes, the %vol spot normalization of Image Master, and the limited sample size, a non-parametric statistical test seems to be the best choice in this case. We opted here for the non-parametric univariate Kolmogorov–Smirnov test, which is one of the options offered by Image Master. It is a two-sample test with high power efficiency for small sample sizes. The reduced power of a non-parametric test was compensated for by including a relatively high number (6) of biological replicates. Figure 5 shows an example of an individual Kolmogorov–Smirnov test. For the complete experimental setup and biological background, see Carpentier et al. (39). The statistical options of the Image Master Platinum software are rather limited and focus on two experimental groups. The multivariate analysis offered by Image Master Platinum is factor analysis, a technique similar in nature to PCA. The results of both techniques are quite similar, except that factor analysis rather explains the correlations between variables, whereas PCA explains variability (22). In Image Master Platinum, the gels (images) are used for the loading plot and the proteins for the score plot. In our case, factor 1 (explaining the majority of the variability) is associated with protein abundance, and the second factor with inter-group variability. As stated above, this might be useful to improve the image analysis and to detect protein mismatches, but to explore the inter- and intra-group variability of the biological samples, it might be better to export the data to a statistical program. For an example of classical staining and uni- and multivariate analysis, see Pedreschi et al. (40).

[Fig. 3: histogram of the %vol values (Var1) with the expected normal curve; x axis from −0.3 to 2.3; y axis: No. of obs.; Shapiro-Wilk W = 0.35883, p = 0.0000.]

Fig. 3. Distribution of protein spots analyzed by Image Master and normalized using the %vol criterion. The distribution is asymmetrical, with the majority of the spots lying between 0 and 0.1%.

[Fig. 4: histogram of the log-transformed %vol values (Var2) with the expected normal curve; x axis from −7 to 1; y axis: No. of obs.; Shapiro-Wilk W = 0.98283, p = 0.00000.]

Fig. 4. A logarithmic transformation of the %vol data of Fig. 3.

[Fig. 5: panels (A) to (C) plot the %vol (0 to 0.8) of spot 1373 for the two groups, for the individual samples a to l, and for the samples sorted in ascending order; panel (D) plots the cumulative frequency (0 to 0.9) against %vol (0 to 0.8) for groups A and B.]

Fig. 5. Example of a Kolmogorov–Smirnov test. (A) Descriptive statistics displaying the experimental mean and standard deviation of the two experimental groups (A and B). (B) Descriptive statistics of the individual biological samples of the two experimental groups. (C) The data sorted in ascending order. (D) Empirical cumulative distribution functions of the two experimental groups.

4.2. DIGE Approach, 4 Conditions 

In this example, we are interested in the effects of a specific treatment over time. Using the DIGE approach, we consider four time points. At each time point, three biological samples were analyzed, quantifying several hundred protein spots (i.e., variables) per sample. To process and analyze the gels, the Decyder software version 6.5 was used in combination with the EDA module (GE Healthcare). The standardized normalization procedure in Decyder 2D BVA uses, for each gel, the Cy2-labeled internal standard image as a reference. This standard image is used to normalize the abundance ratios between the different gels. Decyder offers the possibility to transform and normalize the data to the log standardized abundance (Eq. 4).

$$\text{log standardized abundance} = \log_{10}\left(\frac{\mathrm{vol}_{\mathrm{Cy5\ or\ Cy3}}}{\mathrm{vol}_{\mathrm{Cy2}}}\right) \qquad (4)$$
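A minimal Python sketch of Eq. 4 shows why the in-gel Cy2 standard helps: a gel-wide intensity difference that affects both channels cancels in the ratio. The volumes are hypothetical, and the sketch covers Eq. 4 only, not any other normalization step that Decyder may perform.

```python
import math


def log_standardized_abundance(sample_volume, standard_volume):
    """Eq. 4: log10 of the Cy3 or Cy5 spot volume divided by the Cy2
    (internal standard) volume of the same spot on the same gel."""
    return math.log10(sample_volume / standard_volume)


# Hypothetical volumes of one spot on two gels. Gel 2 is about twice as
# intense overall, which affects both the sample and the internal standard.
gel1 = {"Cy3": 5000.0, "Cy2": 2500.0}
gel2 = {"Cy3": 10000.0, "Cy2": 5000.0}

for name, gel in [("gel 1", gel1), ("gel 2", gel2)]:
    print(name, round(log_standardized_abundance(gel["Cy3"], gel["Cy2"]), 4))
```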

Using the DIGE approach, Karp and Lilley gathered reasonable arguments to assume that the restrictive assumptions of parametric statistics are not violated too strongly after logarithmic transformation of the standardized abundance (1). The use of parametric statistics therefore seems acceptable. However, univariate statistics test the individual variables one by one and cannot correlate multiple proteins. Moreover, testing hundreds of variables (protein spots) one by one, each with an accepted risk of false positives (α), increases the chance of reporting false positives (the multiple testing issue). It is therefore advisable to first gain insight into the complex dataset by exploring it with multivariate analysis and then to validate the individual differences with univariate statistics. Not all proteins are relevant to understanding the differences between the time points, so it is useful to distinguish the relevant proteins from irrelevant ones whose abundance does not change over time. To facilitate the discovery of the differences, we used the PCA of the extended data analysis module of Decyder. PCA reduces more than 1000 variables to a few PCs that explain most of the variance between the treatment times. PCA is unsupervised, meaning that the samples are analyzed without knowledge of the sampling time. Figure 6 displays the score and loading plots for the two most important PCs. The replicates of the same time point cluster together, and the most important PC (PC1) separates the clustered treatment times. In practice, this means that proteins with a high positive PC1 value are abundant in the 2-day gels and less abundant in the 14-day gels, and vice versa for proteins with a highly negative PC1 value. Proteins that cluster together have a similar impact on the PCs and a similar expression pattern (Fig. 6). This rough approach explains only a small part of the variability. The first PC explains 34.2% of the variability and captures a large part of the inter-group biological variability (time effect): a high positive PC1 value is correlated with 2 days, and a high negative value with 14 days. Most proteins cluster around the origin, indicating that they contribute little to the variance and probably do not change in abundance during the examined time period. The second PC explains 15.1% of the variability and seems to explain mainly (technical) intra-group variability. By default, EDA ignores missing values. By anticipating the missing value issue and taking the average of each experimental group, which also reduces some technical variability, the first component explains 60.9% of the variability and the second PC 23.4%. Taking into account only the proteins that have been matched and detected in all the gels reduces the number of examined proteins by more than 50% and discards very useful proteins that have, for instance, a very low abundance in the early days of treatment and higher abundances at the end, or vice versa.

Fig. 6. PCA analysis. (A) Score plot. The large circle is based on Hotelling's T² statistic and is used to detect outlying observations (95% confidence level). The three biological replicates of the same experimental group cluster together, indicating acceptable intra-group variability (grey ellipse). The different experimental groups are also separated, indicating a certain inter-group variability. There is a clear difference between 2 and 14 days of treatment. (B) The loading plot indicates the correlation between the original variables. A protein with a high loading score for a specific PC explains an important part of the sample variance.

As an example, we focus on five proteins that appear highly correlated in the loading plot (highlighted in Fig. 6B). Confirmatory differential expression analysis via ANOVA confirms that all five proteins have a very similar expression pattern over time (Fig. 7). This might suggest a common regulatory mechanism or an interaction between the proteins. For four of the five proteins, the individual confirmatory univariate statistics (ANOVA and multiple comparison test) confirm that 2 days differs significantly from 4, 8, and 14 days, and that 14 days differs significantly from 4 and 8 days (p ≤ 0.01). We identified these four proteins as lectin isoforms (39), which confirms, at a first level, the correlation between the proteins. One protein could not be identified and is under further investigation. This protein is likely to share a common regulatory mechanism (being also a lectin-like protein), to form a complex, or to interact with lectin proteins. It shows exactly the same expression pattern as the four identified lectins, but its overall ANOVA has a p value of 0.0122. This nicely illustrates how exploratory data analysis performs: it indicates correlations and also brings up candidate markers that would have been missed by using only confirmatory data analysis with a cutoff of p ≤ 0.01.

Fig. 7. Confirmatory differential expression analysis: expression pattern of the individual proteins selected from Fig. 6. The normalized relative abundances are displayed for the different time points (14 days, 8 days, 4 days, and 2 days). The mean of each individual isoform is displayed as a cross.
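The overall one-way ANOVA used in this confirmatory step compares the between-group variance with the within-group variance. The plain-Python sketch below computes the F statistic for a hypothetical spot with three replicates at each of the four time points; converting F into a p value requires the F distribution, for which SciPy provides `scipy.stats.f_oneway`. The multiple comparison test mentioned above is not included.

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic with its degrees of freedom.

    F = (SS_between / (k - 1)) / (SS_within / (N - k)), where k is the
    number of groups and N the total number of observations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical log standardized abundances of one spot, three replicates per time point
time_points = {
    "2 days": [0.42, 0.38, 0.45],
    "4 days": [0.05, 0.10, 0.02],
    "8 days": [0.00, 0.06, -0.03],
    "14 days": [-0.35, -0.30, -0.41],
}
f_stat, df1, df2 = one_way_anova(list(time_points.values()))
print(f"F({df1}, {df2}) = {f_stat:.1f}")
```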

5. Conclusions 

The experimental conditions are important and must be well designed. Ideally, only biological replicate samples should be used, and technical variability should be kept to the strict minimum. A reliable sample preparation and extensive experience in electrophoresis and proteomic techniques are indispensable. Given the low technical variance observed with the DIGE approach, the need for analyzing technical replicates can be questioned. Pooling samples reduces the biological variance, which helps to detect changes in protein abundance between the averages of the experimental groups. Pooling might be useful but must be reconsidered for each individual experimental setup.

The choice of a particular staining method should be carefully considered, taking into account the available lab equipment, the budget, and the power of the method. The dynamic range of the staining methods and the technical variability have a great impact on the power of a statistical test and are decisive for the experimental setup (the number of replicates) and the choice of the statistical test. Univariate statistics test the individual variables one by one and cannot correlate multiple proteins. Moreover, testing hundreds of variables (protein spots) one by one, each with an accepted risk of false positives (α), increases the chance of reporting false positives (the multiple testing issue). Therefore, it is advisable to first gain insight into the complex dataset by exploring the data via multivariate analysis and then to validate the individual differences via univariate statistics. With a classical approach, given the typical heterogeneity of variance associated with classical dyes and the limited sample sizes, a non-parametric test seems to be the best choice. With the DIGE approach, the restrictive assumptions of parametric statistics are not violated too strongly after logarithmic transformation of the standardized abundance, so the use of parametric statistics seems acceptable.


The authors would like to thank Romina Pedreschi for critical reading and suggestions and Prof. Verbeke for sharing his files. Financial support from the Belgian National Fund for Scientific Research (FWO-Flanders) is gratefully acknowledged.


References

1. Karp, N. A. & Lilley, K. S. (2005) Proteomics 5, 3105–3115.
2. Urfer, W., Grzegorczyk, M., & Jung, K. (2006) Proteomics S2, 48–55.
3. Carpentier, S. C., Witters, E., Laukens, K., Deckers, P., Swennen, R., & Panis, B. (2005) Proteomics 5, 2497–2507.
4. Bjellqvist, B., Ek, K., Righetti, P. G., Gianazza, E., Görg, A., Westermeier, R., & Postel, W. (1982) J. Biochem. Biophys. Methods 6, 317–339.
5. Westermeier, R. (2001) Electrophoresis in Practice. Wiley-VCH, Weinheim.
6. Westermeier, R. & Naven, T. (2002) Proteomics in Practice. Wiley-VCH, Weinheim.
7. Rabilloud, T. (2000) Proteome Research: Two-Dimensional Gel Electrophoresis and Identification Methods. Springer, Heidelberg.
8. Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996) In: Applied Linear Statistical Models (Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W., eds.). Irwin, Chicago, pp. 958–1010.
9. … Wasserman, W., eds.). Irwin, Chicago, pp. 1121–1164.
10. Karp, N. A., Spencer, M., Lindsay, H., O'Dell, K., & Lilley, K. S. (2005) J. Proteome Res. 4, 1867–1871.
11. Patton, W. F. (2000) Electrophoresis 21, 1123–1144.
12. Westermeier, R. (2006) Proteomics S2, 61–64.
13. Switzer, R. C., Merril, C. R., & Shifrin, S. (1979) Anal. Biochem. 98, 231–237.
14. Rabilloud, T., Vuillard, L., Gilly, C., & Lawrence, J. (1994) Cellular and Molecular Biology 40, 57–75.
15. Ünlü, M., Morgan, M. E., & Minden, J. S. (1997) Electrophoresis 18, 2071–2077.
16. Alban, A., Currie, I., Lewis, S., Stone, T., & Sweet, A. C. (2002) Mol. Biol. Cell 13, 407A–408A.
17. Alban, A., David, S. O., Bjorkesten, L., Andersson, C., Sloge, E., Lewis, S., & Currie, I. (2003) Proteomics 3, 36–44.
18. Tonge, R., Shaw, J., Middleton, B., Rowlinson, R., Rayner, S., Young, J., Pognan, F., Hawkins, E., Currie, I., et al. (2001) Proteomics 1, 377–396.
19. … Wasserman, W., eds.). Irwin, Chicago, pp. 95–152.
20. Gustafsson, J. S., Ceasar, R., Glasbey, C. A., Blomberg, A., & Rudemo, M. (2004)
21. Siegel, S. & Castellan, N. J. (1988) Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill Book Company, Singapore.
22. Jackson, J. E. (2003) A User's Guide to Principal Components. Wiley, New York.
23. Sharma, S. Applied Multivariate Techniques. Wiley, Hoboken, NJ.
24. Pearson, K. (1901) Phil. Mag. Ser. B. 2, 559–572.
25. Hotelling, H. (1933) J. Educ. Psychol. 24, 417–441.
26. Tarroux, P. (1983) Electrophoresis 4, 63–70.
27. Grove, H., Hollung, K., Uhlen, A. K., Martens, H., & Faergestad, E. M. (2006) J. Proteome Res. 5, 3399–3410.
28. Marengo, E., Robotti, E., Bobba, M., Liparota, M. C., Rustichelli, C., Zamò, A., Chilosi, M., & Righetti, P. G. (2006) Electrophoresis 27, 484–494.
29. Schultz, J., Gottlieb, D. M., Petersen, M., Nesic, L., Jacobsen, S., & Sondergaard, I. (2004) Electrophoresis 25, 502–511.
30. Verhoeckx, K. C. M., Gaspari, M., Bijlsma, S., Van Der Greef, J., Witkamp, R. F., Doornbos, R. P., & Rodenburg, R. J. T. (2005) J. Proteome Res. 4, 2015–2023.
31. Gottlieb, D. M., Schultz, J., Bruun, S. W., Jacobsen, S., & Sondergaard, I. (2004) Phytochemistry 65, 1531–1548.
32. Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., & Altman, R. B. (2001) Bioinformatics 17, 520–525.
33. Scheel, I., Aldrin, M., Glad, I. K., Sorum, R., Lyng, H., & Frigessi, A. (2005) Bioinformatics 21, 4272–4279.
34. Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., & Ishii, S. (2003) Bioinformatics 19, 2088–2096.
35. Wood, J., White, I. R., & Cutler, P. (2004) Signal Process. 84, 1777–1788.
36. Karp, N. A., Griffin, J. L., & Lilley, K. S. (2005) Proteomics 5, 81–90.
37. Wold, S. (1985) Encyc. Stat. Sci. 6, 581–591.
38. Nguyen, D. V. & Rocke, D. M. (2002) Bioinformatics 18, 39–50.
39. Carpentier, S. C., Witters, E., Laukens, K., Van Onckelen, H., Swennen, R., & Panis, B. (2007) Proteomics 7, 92–105.
40. Pedreschi, R., Vanstreels, E., Carpentier, S., Robben, J., Noben, J. P., Swennen, R., Lammertyn, J., Vanderleyden, J., & Nicolaï, B. M. Proteomics 7, 2083–2099.

18 

Web-Based Tools for Protein Classification 

Costas D. Paliakasis, Ioannis Michalopoulos, and Sophia Kossida 

Summary 

Current proteomics technologies generate large amounts of data, among which the investigator has to identify promising diagnostic/prognostic biomarkers as well as potential therapeutic targets. For the latter, classification of proteins into meaningful families is needed. Current databases, featuring a high level of interconnectivity (cross-referencing), provide the tools necessary to bring various data together, facilitating protein classification and the elucidation of protein function and interoperativity. This chapter provides guidelines for exploring, with web-based tools, the informationally rich peptide sequences generated by proteomics methodologies, with the objective of predicting potential protein function. After proper preprocessing (e.g., for internal repeats) of a query protein sequence, known domains can be identified, which helps to divide the query into smaller meaningful parts. Any unclassified remainder of the protein provides material for low-level comparative analysis aimed at discovering distant homologues or candidate novel domain types, which are then to be verified experimentally.

Key Words: protein classification; domain families; recurrent tertiary structural 

motifs; sequence–structure relationships; (protein) structural evolution; protein database; 

homology searches; domain inference; protein structure redundancy. 


From the times of the "one man-one gene" approach, when individuals worked on single protein sequences decoded from the corresponding DNA sequences, to the era of high-throughput techniques, when massive automated procedures produce large numbers of peptide sequences, one task has remained virtually the same: individual protein sequences need classification. We humans have an amazing instinctive capability to categorize objects, even the most complex ones, along various natural or arbitrary schemes. Proteins feature multiple attributes, such as sequence, structure, function, organelle specificity, evolutionary origin, affinity, isoelectric point, and size (not to mention tissue specificity and antigenicity in higher organisms), all of which offer a basis for classification. For instance, the spots of a 2D gel, which correspond to proteins separated by size and isoelectric point, reflect a primary attempt at classification; affinity (e.g., nucleoprotein, lipoprotein, metalloprotein) and function (e.g., enzyme, carrier) offer another basis for classification, both relating to the chemistry of a protein; and basic spectroscopic data, such as those of circular dichroism (which provide an estimate of the relative amounts of β-stranded vs. α-helical structure), permit classification into the all-α, all-β, or mixed α/β classes. However, classification schemes based on general attributes (e.g., the physicochemical properties of proteins) suffer from heterogeneity within their classes. For instance, a number of otherwise unrelated proteins can be classified as "metalloproteins."

In general, two requirements with opposing effects should be satisfied by 

any classification scheme: specificity, which leads to particularization (i.e., a 

higher number of narrower classes) and abstraction, which leads to generalization 

(i.e., a smaller number of wider classes). In the end, a comprehensive 

and useful hierarchy is a trade-off between specificity and abstraction (i.e., 

the most general classes possible that are still useful in some desired way). 

Proteins, whose structures represent successful solutions to the problem of thermodynamic stability while also accommodating a biologically useful function, provide the basis for all kinds of radiating variation at the level of protein sequence (and consequently of function). Each protein variant that survives the evolutionary pressure of competition against other potential variants has emerged after a series of modifications of various extents; why this is the preferred mode of action is explained later on.

Common-ancestry classification schemes provide the specificity necessary to define sensible protein classes, in contrast to classification schemes that follow general features. In the former, all members of each class share a common tertiary structure across very wide evolutionary spans, while similarities at the level of amino acid sequence remain exploitable, even in cases where they are hard to detect. Therefore, evolution-based classification schemes are not driven by our natural impulse to categorize objects by drawing arbitrary borderlines, but reflect basic principles of protein nature. In fact, classification with respect to evolutionary history and structure comes so naturally that, when function is not preserved, we tend to refer to a “-like” form within the same family of proteins, rather than to a different family.

Web-Based Tools for Protein Classification 351 

Protein sequences derived from a common ancestor by divergent evolution share a high degree of similarity (both with each other and, naturally, with their ancestor, although the latter may be unknown). This similarity persists over quite a wide evolutionary span before it is worn out by divergence and rendered undetectable by direct pairwise sequence alignments. Conveniently, it is highly unlikely that proteins without a common evolutionary origin share a high degree of similarity; in fact, the higher the similarity, the more recent the speciation. It will be shown how these nearest relatives provide the guidelines for identifying the features that are crucial for the definition of a protein family, before the detection of the most remote relationships is attempted. In conclusion, the amino acid sequence offers a highly specific key to classification, although intermediate members, as well as structure, may need to be consulted before any remote members of a class can be detected.
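Percent identity is the simplest way to put a number on the similarity discussed above. The minimal sketch below (hypothetical function name) computes it from two sequences that have already been aligned, with '-' marking gaps. Counting only gap-free columns in the denominator is one convention among several, so values reported by different tools may not agree exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


print(round(percent_identity("ACDEFGHIK", "ACDEYGHLK"), 1))  # 77.8
print(percent_identity("AC-DE", "ACQDF"))  # 75.0 (the gapped column is ignored)
```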

The evolution-based classification schemes, as well as the tools available over the web to explore them, constitute the subject of the following notes. Many researchers in the relevant fields tend to take simple homology searches and domain assignment tools for granted, until an unexpected outcome casts doubt and causes confusion; it is the authors’ intention that, by the end of this chapter, the reader will be capable of conducting those (otherwise routine) tasks with a higher degree of both awareness and confidence.

2. Materials 

The procedure of protein classification comprises several more or less independent steps. Although these steps have been arranged (in the present notes) in the order in which they are usually employed, this order can change, depending on the nature of the information available at each point. Steps can also be omitted if they are unnecessary or their target has already been accomplished (although performing them will provide further reassurance). Each of the steps described is a small protocol in its own right; a number of web tools, some of them in several variations, implement each of those steps. However, improvements in user friendliness on the one hand and in users’ skills on the other have made the procedure look like a single protocol; in fact, automation sometimes hides a number of steps, of which only the results can be viewed, in the form of a compiled web page. Instead of listing the websites of all relevant tools, a small but comprehensive selection of entry points is suggested in Table 1, via which a wealth of tools is then accessible. All of those websites provide user-friendly interfaces. It is suggested that the reader browse (and become familiar with) at least those main websites before attempting to delve deeper into the realm of web-based analysis tools.


Table 1 

Main Entry Points to the World Wide Web for Protein Classification 

ExPASy 

www.expasy.org 

A wide range of software tools for the analysis of protein sequences and structures, as well as 2D PAGE data, can be found here. It also offers an entry point to a rich collection of other websites, mainly the SwissProt/UniProt databases

BLAST 

www.ncbi.nlm.nih.gov/BLAST 

A convenient starting point for online searches of sequence databases (both protein and DNA). Many other sites feature some version of BLAST as well

EnsEMBL 

www.ensembl.org 

A collection of complete genomes, which offers an entry point from a different viewpoint: that of a genome rather than that of a sequence

Pfam 

www.sanger.ac.uk/software/pfam 

A collection of profiles of protein families against which a sequence can be 

matched, for initial domain recognition 

Protein data bank 

www.pdb.org and www.rcsb.org 

The archive of experimentally determined 3D-structures (by crystallography, 

NMR, and other techniques) of biological macromolecules (proteins, nucleic 

acids, sugars, etc.) 

InterPro 

www.ebi.ac.uk/interpro 

An effort to integrate information from several diverse sources into a unified, comprehensible form

3. Methods 

3.1. Theoretical Issues: Classification Based on Sequence or Structure

The specifics that define a set of sequences as a protein family (i.e., molecular function and the amino acid residues involved, other kinds of sequence fingerprints, post-translational modification, etc.) have to be accommodated within a structural framework (Fig. 1). However, a 3D structure is not reserved for a single protein family. In fact, there seems to be a limited set of spatially local packing arrangements between α-helices and β-sheets, which, when combined,


Fig. 1. Complex shapes can be misclassified by a general property like size, because 

of small (or larger) parts missing in relation to the simplest forms from which they derive. 

More specific (“shape-related”) attributes can bring all stars (and parts thereof) together, 

as they can do with triangles, squares, and circles. Once a proper overall scheme is 

in place, general attributes (like color) can then detail the distribution within each class. 

lead to 3D structural assemblages that are stable in terms of thermodynamics and useful in terms of function (1). The participating elements may be distant along the sequence, or they may even belong to different chains. The small number of packing options leads to the occurrence of common 3D structural themes, termed recurrent tertiary motifs, e.g., “up-and-down” helical bundles, β-barrels, etc. Descriptions at this level of abstraction take into account neither the sequential order of the helices and strands nor their length. Tertiary structural domains in proteins of unrelated evolutionary origin (or function), with apparently unrelated sequences, may adopt the same tertiary motif (usually including further 3D structural elements) (2) (see also Note 1). It can be claimed that the abstract idea of a recurrent tertiary motif leans toward the basic packing arrangements, whereas the implemented domains are closer to the protein families.

The 3D environment of certain positions on the structure (a different set 

of positions for each recurrent tertiary structural motif) poses physicochemical


5-vdef sNIR[enpvtpwnpeps] 

: * : * + : *+ 

R1: A PVID PT AYID PE ASVI G 

R2: E VTIG AN VMVS PM ASIR S[degm] 

R3: P IFVG DR SNVQ DG VVLH A[letineegepiednivevdgkey] 

R4: A VYIG NN VSLA HQ SQVH G 

R5: P AAVG DD TFIG MQ AFVF - 

R6: K SKVG NN CVLE PR SAAI - 

R7: G VTIP DG RYIP AG MVVT - 

------------------------- 

CNS: a VfIG DN vyIa pQ AvVh(g|s) (Consensus) 

BS#1 T1 BS#2 T2 BS#3 

Fig. 2. The seven repeats that form the β-helix in MT-CA demonstrate the extent of the impact that structure can have on sequence. The β-strands (groups of four residues) are shown separated from the intervening “turns” (groups of two). The turns that connect successive repeats are split: one residue at the left end and a second one, which is missing in some cases, at the right end. Parts of the sequence in square brackets [] are intervening connecting loops; the part in angle brackets follows this core motif and is not part of the repeat sequence. The His residues that coordinate the Zn atom are underlined, and stem from positions (within the repeat) marked by a plus sign (+). A partial repeat (every six positions) has been proposed on the basis of other sequences that adopt this structure; the positions marked by stars (*) correspond to main positions in this (partial) repeat, and the ones marked by a colon (:) correspond to the secondary ones. No repetition of this kind (i.e., every six positions) is apparent for any other positions, leaving the 17–18-residue repeat unit as the only complete one. Positions Asn10–Arg12 (top row) form a small extension of β-sheet #3; the preceding residues are shown for completeness, and only to emphasize that the repeat does not extend into them. In the consensus, drawn at the bottom row, the main ingredients of the repeat unit are shown in capital letters.

requirements that can usually be best met by one or a few amino acid types, thus defining a scale of preferences (3). These preferences are reflected in patterns that may arise at the level of the primary sequences (that adopt the relevant recurrent tertiary motifs) whenever these spatially defined positions are close along the sequence (Fig. 2). It should be noted that these patterns are reflections, along the sequence, of the abstract tertiary theme, and that they are much more general than the detailed, protein family-specific sequence fingerprints.

Simplified lattice models suggest that a small number of 3D structural motifs set loose requirements that can be met by a large number of sequences along their evolutionary pathway (4). In this case, nature appears to reuse a successful structural solution in evolutionarily unrelated sequences (see Note 2). At the other extreme, a large number of 3D structural motifs pose requirements so manifold and exact that only a few sequences can be compatible with them. The resulting patterns of preferences along the sequence occasionally appear strong enough to permit structural motif prediction from the sequence alone (5).

It can be claimed that no more than 200 recurrent tertiary structural motifs (the exact number depending on the stringency of their definition) provide the structural basis of perhaps 95% of the nonredundant set of protein structures (2). The average residue coverage is a much smaller figure, because additional structural elements are needed to complete a domain. Conversely, a large number of tertiary structural motifs are so rare that they account for only the small remaining proportion of protein structures (see Note 3). Detailed specialization into families takes place within this structural framework: Chothia (6) estimated long ago that 95% of the protein information to be discovered will derive from no more than 1000 protein families. In fact, for a substantial (and growing) proportion of newly identified protein sequences, enough information already exists in the databases to build a 3D model (7). The reason for this lies in a simple fact: during the creation of new protein families, the relatively small number of structural alternatives leads nature to strongly prefer reusing already successful solutions at the level of sequence (not structure), especially when similar problems are to be solved, rather than discovering new sequences on the basis of the same or a different structure. The traits inherited through such reuse of sequences are usually the ones exploited in protein classification. On the other hand, this small set of structural motifs, which are easily accessible to protein families of unrelated origin and/or function, occasionally leads otherwise unrelated proteins to elevated sequence similarity scores (which sometimes appear too high to be explained by chance), simply because they fold in the same manner (see Note 4). Such traits, developed independently rather than inherited, reflect convergent evolution.

Protein structure has also served as the basis of classification in some schemes. However, the theoretical considerations discussed herein (in particular, the fact that unrelated proteins may fold in the same way) hint that classification on the basis of 3D structure alone will tend to be on a coarser scale. On the other hand, the availability of detailed structural data for a (preferably representative) member of a protein family, experimentally derived by means of X-ray crystallography or NMR spectroscopy, besides facilitating other procedures of all kinds (e.g., structure-based protein design), offers a valuable aid in sequence-based classification. It provides very solid ground for assessing any sequence-based classification, and a great tool for detecting the most remote members. However, unless protein structure per se is being classified (rather than proteins in their entirety), a common structural architecture alone does not appear to be sufficient evidence for placing proteins in the same class.

Evolutionarily refined variants of tertiary structural domains, “similar-yet-different” within a given repertoire, appear in different combinations with those of other repertoires: a domain for a different cofactor or regulatory factor (e.g., GDP vs. ADP) may be combined with a catalytic domain for a slightly different substrate (e.g., fructose vs. glucose). Thus, the most complicated and finely tuned series of (simpler) functions necessary for life can be accomplished in a spatially ordered and efficient manner. On the other hand, this fact makes it essentially imperative that any classification be carried down to the level of domains: it suffices to describe any sequence in question as comprising “an N-terminal domain of type X and a C-terminal domain of type Y, joined by a loop region of type Z”; otherwise, extensive subtyping and the “Russian doll” effect (see Note 5) will soon be confronted.

In practice, the classification procedure starts with the detection of some similarity between a protein (or part thereof) and a prototype (e.g., a profile extracted from a multiple alignment, or a structure through which the protein is threaded) that is too high to be explained by chance alone. The tools to demonstrate this similarity are presented in Subheading 3.2; in any case, it is the network of similarities within a set of data (sequences, structures, etc.) that will clarify the underlying reason for the observed similarity.
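One common way to judge whether a similarity score is “too high to explain by chance” is to compare it with the scores obtained after shuffling one of the sequences, which preserves its composition but destroys its order. The sketch below is a generic illustration under simplifying assumptions (ungapped local alignment, +1/-1 identity scoring, hypothetical function names); it is not the statistics used by BLAST, which reports E-values based on extreme-value theory.

```python
import random
import statistics


def best_ungapped_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Best local ungapped alignment score over all diagonals (Kadane-style scan)."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):  # offset = i - j
        run = 0
        for i in range(max(0, offset), min(len(a), len(b) + offset)):
            run = max(0, run + (match if a[i] == b[i - offset] else mismatch))
            best = max(best, run)
    return best


def shuffle_z_score(query: str, target: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of the real score relative to scores against shuffled copies of target."""
    rng = random.Random(seed)
    real = best_ungapped_score(query, target)
    letters = list(target)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same composition, random order
        null_scores.append(best_ungapped_score(query, "".join(letters)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        return float("inf") if real > mean else 0.0
    return (real - mean) / sd
```

A large Z-score suggests that the similarity is unlikely to be explained by composition alone; which cut-off counts as “large” is a matter of convention, and shuffling does not rule out the structure-driven similarities discussed earlier in this subheading.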

3.2. The Practical Side 

It cannot be stressed enough that most protein sequences nowadays are translations of the relevant nucleic acid sequences. It is important to identify the original cDNAs, if possible, to ensure that the nucleic acid sequence employed corresponds reliably to the protein. When the original data are supplied in the form of genomic DNA fragments, introns could still be included, and alternative splicing remains a possibility. Current gene recognition programs like GeneScan (8), normally available in genome-oriented databases like EnsEmbl (9) (see Note 6), can efficiently detect and remove introns, but errors may still creep in. If this is the origin of the protein data, certain precautions should be taken:

• Search for relevant proteins with reliable sequences, e.g., by means of a preliminary Basic Local Alignment Search Tool (BLAST) (10) search against SwissProt (11).

• Align the sequence of interest to any trustworthy matches and observe the pattern of conservation. Sudden insertions in the sequence in question (especially ones with highly biased composition, short tandem repeats, or repetitions, especially partial ones, of other parts of the protein) do not necessarily represent extra features or minidomains; conversely, deleted parts may have been mistakenly considered to be introns.


• Isolate “candidate” insertions and try to find similar sequences in the databases; see whether any trustworthy match makes sense in terms of biology.

• Alternatively, try to find a protein in the Protein Data Bank (PDB) (12) that is similar (even remotely) to the one in question (excluding the insert) and whose 3D structure has been determined experimentally (see Note 7). The location of the candidate insertion/deletion on the structure may verify or reject it.

• Parts of the query protein that match expressed sequence tags (ESTs) (13) provide an extra source of verification (see Note 8): a part matching an EST is an expressed part.
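Once the sequence of interest has been aligned to a trustworthy match, candidate insertions are simply query stretches that face gaps in the trusted sequence. The sketch below lists such stretches from two gapped alignment strings; the function name and the minimum length of 5 residues are assumptions chosen for illustration, and the alignment itself has to come from a separate tool.

```python
from typing import List, Tuple


def candidate_insertions(query_aln: str, trusted_aln: str,
                         min_len: int = 5) -> List[Tuple[int, int, str]]:
    """Query stretches aligned against gaps ('-') in a trusted sequence.

    Returns (start, end, segment) with 1-based, inclusive positions
    in the ungapped query sequence.
    """
    if len(query_aln) != len(trusted_aln):
        raise ValueError("alignment strings must have equal length")
    found: List[Tuple[int, int, str]] = []
    run: List[str] = []
    run_start = 0
    q_pos = 0  # number of query residues seen so far
    for q, t in zip(query_aln, trusted_aln):
        if q != "-":
            q_pos += 1
        if q != "-" and t == "-":
            if not run:
                run_start = q_pos
            run.append(q)
            continue
        # Any other column closes the current run of inserted residues.
        if len(run) >= min_len:
            found.append((run_start, run_start + len(run) - 1, "".join(run)))
        run = []
    if len(run) >= min_len:
        found.append((run_start, run_start + len(run) - 1, "".join(run)))
    return found


print(candidate_insertions("MKTAYQQQQQQIAKQ", "MKTAY------IAKQ"))  # [(6, 11, 'QQQQQQ')]
```

Each reported segment is then a natural input for the database searches and structure checks listed above.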

Other criteria may apply to verify the integrity of a processed putative gene. For example, if the protein has been biochemically characterized, then any experimentally observed property must match those of the sequence predicted from the gene (or there must be a good reason why it does not).

Another very serious issue is the fact that many annotations are automatically 

transferred between similar sequences of the same or different databases. 

Even SwissProt entries are crowded with annotations assigned “by similarity.” 

The number of proteins with primary annotations is many orders of magnitude 

smaller than the number of annotated sequences in the current databases. 

These annotations should be considered as hints that can direct experiments to 

promising routes rather than secure data. 

3.2.1. Preprocessing the Query 

A preliminary check-up of the protein sequence itself is recommended. Repeats and parts of low complexity are of particular interest.

3.2.1.1. Repeats

Regularities in biological macromolecular structure (like the helical nature 

of DNA or the super-coiled structure of some protein assemblies) and multimerization 

create room for repetitions along the protein sequences. Repeats can 

range in length from a few amino acid residues to complete domains (e.g., as 

a result of domain duplication). 

In the latter case, the repetition count is usually small, just two to three copies (14), although much higher counts do occur. When catalytic domains are repeated, the situation may have no basis in structural regularities; it may, for instance, reflect a need for efficiency (e.g., cooperativity between different copies of a domain). In database searches with multidomain protein queries, it is recommended anyway to treat different domains separately, for reasons explained later on; the difference here lies in the fact that the separate copies can be aligned, and their consensus (or profile) can be extracted to serve as the query.
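Extracting a consensus from aligned repeat copies can be sketched as follows. Upper case marks a residue found in at least half of the copies and lower case a weaker plurality, loosely echoing the capital-letter convention of the consensus in Fig. 2; the 0.5 cut-off and the function name are assumptions, and the output is not expected to reproduce that figure exactly.

```python
from collections import Counter
from typing import List


def repeat_consensus(copies: List[str], strong: float = 0.5) -> str:
    """Column-wise consensus of aligned repeat copies ('-' marks gaps).

    Upper case: the most frequent residue occurs in at least `strong` of the copies;
    lower case: it is only a weaker plurality; '-': the column is mostly gaps.
    Ties are resolved in favor of the residue encountered first in the column.
    """
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must be aligned to the same length")
    consensus = []
    for column in zip(*copies):
        residue, count = Counter(column).most_common(1)[0]
        if residue == "-":
            consensus.append("-")
        elif count / len(copies) >= strong:
            consensus.append(residue.upper())
        else:
            consensus.append(residue.lower())
    return "".join(consensus)


print(repeat_consensus(["AVIG", "SVLG", "PTIA", "DVMG"]))  # aVIG
```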


On the other hand, short tandem repeats (e.g., about 10 amino acid residues long or shorter) normally reflect some structural regularity. In a dot-plot style alignment of a protein sequence to itself, they manifest themselves as a (moderate-to-high) number of tracks that run parallel to the main diagonal (and to each other) in a regular manner (Fig. 3). Since combinations of parts coming from different tracks produce significant alternative alignments, procedures that attempt to report all possible alternative alignments between two proteins will be severely confounded (see Note 9 on BLAST in particular).
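The regular off-diagonal tracks of a self dot-plot can also be summarized numerically: for each offset d, count how often residue i equals residue i + d. The sketch below is a simplification (single-residue identity instead of the windowed, substitution-matrix-based scoring that dot-plot tools typically use; hypothetical function name). A short tandem repeat of period p gives peaks at d = p, 2p, and so on.

```python
from typing import Dict


def diagonal_identity(seq: str, max_offset: int) -> Dict[int, float]:
    """Fraction of positions i with seq[i] == seq[i + d], for each offset d >= 1.

    Each offset d is one off-diagonal of the self dot-plot; a tandem repeat
    of period p produces high values at d = p, 2p, ...
    """
    profile: Dict[int, float] = {}
    for d in range(1, min(max_offset, len(seq) - 1) + 1):
        matches = sum(1 for i in range(len(seq) - d) if seq[i] == seq[i + d])
        profile[d] = matches / (len(seq) - d)
    return profile


repeat_unit = "ACDEFGHIKLM"  # hypothetical 11-residue repeat
profile = diagonal_identity(repeat_unit * 4, max_offset=30)
period = max(profile, key=profile.get)
print(period, profile[period])  # 11 1.0
```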

A consensus or a profile may again be extracted by a proper alignment of the repeats. However, statistically significant matches cannot be expected for a resulting query only (say) 6 or 12 amino acid residues long. One possible cure is to concatenate a small number of repeats, to produce a query no longer than 50 amino acid residues (see Note 10 on why 50). The small number of repeats (e.g., four repeats of length 11) helps avoid the explosion of alternatives, although a few alternatives will remain. If this step is taken, it is suggested that the output of a dot-plot utility (such as DOTLET, a Java-based tool hosted on the ExPASy server; Table 1) be consulted at all times.
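The arithmetic behind the example of four 11-residue repeats is simply the largest whole number of copies that fits within the 50-residue limit. A minimal sketch (hypothetical function name), assuming that the repeat unit, e.g., a consensus, has already been derived; the optional cap on the number of copies is an added assumption for very short units:

```python
from typing import Optional


def concatenated_repeat_query(repeat_unit: str, max_len: int = 50,
                              max_copies: Optional[int] = None) -> str:
    """Join as many whole copies of a repeat unit as fit within max_len residues."""
    if not repeat_unit or len(repeat_unit) > max_len:
        raise ValueError("repeat unit must be non-empty and at most max_len long")
    copies = max_len // len(repeat_unit)
    if max_copies is not None:
        copies = min(copies, max_copies)  # keep the number of repeats small
    return repeat_unit * copies


query = concatenated_repeat_query("ACDEFGHIKLM")  # hypothetical 11-residue unit
print(len(query))  # 44: four copies
```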

3.2.1.2. Parts of Low Complexity 

Low complexity occurs when some part of the sequence comprises only a few types of amino acid residues, leading database queries to nonspecific results (see Note 11); the situation can be even worse if some of these types are similar to each other. In general, it is important to know beforehand about any significant deviations in amino acid composition, as well as the presence of special features such as signal peptides or groupings of biologically relevant charged side chains (see Note 12). Relevant search procedures, like BLAST (10), detect stretches of low complexity and offer the option to ignore them during the search; however, what appears to be a part of low complexity may be, e.g., a transmembrane stretch. The action to take depends on both the importance and the position of the stretch:

• If a single transmembrane part makes sense (or is known to exist), the extra- and intracellular moieties can be submitted as separate queries.

• A signal peptide (especially when located at the extreme N-terminus) can usually be excluded from the procedure, with benefit or at least without harm.

• A stretch of low complexity that appears to be of no special significance in terms of structure/function/evolution is best left for the search procedure to mask.

Relevant tools are available from the Web (e.g., at the ExPASy site). Alternatively, a simple dot-plot style alignment of the protein sequence against itself can be run. Besides repeats, this will reveal areas of low complexity as square blocks of elevated average score, symmetrical around the main diagonal (Fig. 3). If low




complexity occurs within the boundaries of a repeat, similar square blocks will 

appear around relevant parallel off-diagonal tracks. 
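Both checks mentioned in this subheading, overall compositional bias and local low complexity, can be sketched in a few lines. The code below flags residue types whose frequency exceeds twice a flat 5% background, and marks 12-residue windows whose Shannon entropy falls below 2.2 bits. These parameters, the flat background, and the function names are assumptions for illustration; the approach is similar in spirit to, but not the same as, the SEG filter that BLAST applies to protein queries.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def overrepresented(seq: str, fold: float = 2.0, background: float = 0.05) -> Dict[str, float]:
    """Residue types whose frequency exceeds `fold` times a flat background frequency."""
    n = len(seq)
    freqs = {aa: count / n for aa, count in Counter(seq).items()}
    return {aa: f for aa, f in sorted(freqs.items()) if f > fold * background}


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_regions(seq: str, window: int = 12,
                           cutoff: float = 2.2) -> List[Tuple[int, int]]:
    """1-based (start, end) regions covered by windows with entropy below `cutoff`."""
    flagged = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < cutoff:
            flagged[i:i + window] = [True] * window
    regions: List[Tuple[int, int]] = []
    start = None
    for pos, low in enumerate(flagged, start=1):
        if low and start is None:
            start = pos
        elif not low and start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, len(seq)))
    return regions


print(overrepresented("AKKAAKPAKKAAKSAKAAKE"))  # {'A': 0.45, 'K': 0.4}
print(low_complexity_regions("MCDEFGHIKLNPQRSTVWY" + "A" * 15 + "CDEFGHIKLMNPQRSTVWY"))
```

Because a whole window is flagged, reported regions extend slightly beyond the biased residues themselves; a dot-plot remains the better way to see how such regions relate to repeats.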

3.2.2. Inference of Domains 

In the spirit of the theoretical analysis earlier in this chapter, classification can take the form of assigning parts of the sequence to domains. Hence, using a domain-inference tool like the ones offered by Pfam (15) and SMART (16) should be among the first steps in classifying a protein on the basis of its sequence (see Note 13). This information serves to divide the sequence of interest into pieces that can be handled separately (see Note 14).

Given the high coverage achieved by those collections (more than 75% of proteins have at least one domain recognized by them, and on average about two-thirds of the length of a protein can be described this way) (15), some protein sequence classification efforts end here (see Note 15). In fact, database search procedures can soon be expected to exploit high-level features extracted from the query and related sequences, resorting to the amino acid sequence alone only for parts where such attempts fail.
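Dividing a sequence into assigned domains and leftover stretches, which are then queried separately as described under Subheading 3.2.3, is simple bookkeeping. The sketch below assumes domain boundaries given as 1-based, inclusive (start, end) pairs, e.g., copied by hand from a Pfam or SMART result page; the function names and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive (start, end)


def uncovered_segments(length: int, domains: List[Segment],
                       min_len: int = 1) -> List[Segment]:
    """Stretches of a sequence not covered by any assigned domain."""
    covered = [False] * (length + 1)  # index 0 unused
    for start, end in domains:
        for pos in range(max(1, start), min(length, end) + 1):
            covered[pos] = True
    gaps: List[Segment] = []
    gap_start = None
    for pos in range(1, length + 1):
        if not covered[pos] and gap_start is None:
            gap_start = pos
        elif covered[pos] and gap_start is not None:
            if pos - gap_start >= min_len:
                gaps.append((gap_start, pos - 1))
            gap_start = None
    if gap_start is not None and length - gap_start + 1 >= min_len:
        gaps.append((gap_start, length))
    return gaps


def domain_coverage(length: int, domains: List[Segment]) -> float:
    """Fraction of residues assigned to at least one domain."""
    uncovered = sum(end - start + 1 for start, end in uncovered_segments(length, domains))
    return (length - uncovered) / length


hits = [(1, 40), (61, 90)]  # hypothetical domain boundaries
print(uncovered_segments(100, hits))  # [(41, 60), (91, 100)]
print(domain_coverage(100, hits))  # 0.7
```

Raising `min_len` (for example, to the 30–40 residues mentioned in the next subheading) discards short linkers that are unlikely to be worth a separate search.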

3.2.3. Querying Other Databases 

Despite the current high coverage of protein sequences in terms of known domains, parts of these sequences still elude classification. These parts may simply be too distant members of the families they belong to, which have failed the thresholds of automatic procedures. Those parts should be isolated, properly preprocessed (mainly for compositional biases), and queried against SwissProt and PDB.

• Entries (records) in SwissProt (11) offer rich annotation and cross-references to a number of resources, all in a mainly human-readable form and accessible via a user-friendly interface. The high level of curation (including annotation derived by similarity) will save duplicate effort and may provide valuable hints on how to move on.


Fig. 3. (Continued) (A) Schematic representation of a dot-plot style alignment of 

a protein against itself; to depict the special cases presented in the text, the protein 

is supposed to feature two copies of some domain, a low complexity N-terminus and 

a C-terminal part dominated by some short internal repeat, except for a tail, which 

appears unique. (B) Alignment of a small part (from a real protein) of low complexity 

against itself. The situation here is worse than suspected, because the few types of amino acid residues are related to each other (alanine to valine and glycine and, to a lesser extent, to proline and serine).


• A search for similar sequences in the PDB (12) will reveal experimentally determined 3D structures of protein instances possibly related (e.g., through evolution) to the protein of interest. A 3D structure offers a model to think about (even before a model of the query sequence is built on the basis of this information), a toy on which data can be visualized and handled far more efficiently (see Note 16).

If domains are inferred by the relevant procedures (or supplied by SwissProt annotation) and/or long stretches (say, 30–40 amino acid residues or longer) of special behavior are observed, it is a good idea to handle each sequence part separately, or in small meaningful combinations; for instance, there may be no reason to treat a propeptide separately from the main body of the domain it belongs to (see Notes 17 and 18).

If a few top hits of a database search can be aligned to the query with confidence, and the next ones are marginal (see Note 18), the multiple alignment of the best hits (including the query) should be converted into some kind of profile [e.g., a position-specific scoring matrix (PSSM)], and the database should be scanned with the resulting profile (see Note 19). Those marginal hits of the initial query (i.e., the protein of interest) that match positions conserved throughout the profile will have their statistical significance increased, and they will surface. If domain-inference programs can detect some kind of domain on those (initially marginal) hits, this information can then be transferred to the initial query with confidence (recall that the query is the part on which no domain was detected).
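The profile step can be sketched as a log-odds position-specific scoring matrix built from an ungapped block of aligned hits and then slid along a sequence. This is a bare-bones illustration, not the procedure of PSI-BLAST or any other particular tool: it assumes a uniform background frequency, a flat pseudocount of 1, and no sequence weighting, and the function names and example sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Profile = List[Dict[str, float]]


def build_pssm(block: List[str], pseudocount: float = 1.0) -> Profile:
    """Log-odds scores (bits) per column of equal-length aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    pssm: Profile = []
    for column in zip(*block):
        counts = Counter(r for r in column if r in AMINO_ACIDS)  # gaps ignored
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_segment(pssm: Profile, segment: str) -> float:
    """Sum of column scores for a segment as long as the profile; unknown letters score 0."""
    if len(segment) != len(pssm):
        raise ValueError("segment and profile lengths differ")
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, segment))


def best_hit(pssm: Profile, sequence: str) -> Tuple[float, int]:
    """Slide the profile along a sequence; return (best score, 1-based start)."""
    width = len(pssm)
    return max((score_segment(pssm, sequence[i:i + width]), i + 1)
               for i in range(len(sequence) - width + 1))


block = ["HVDGK", "HVEGK", "HIDGK", "HVDGR"]  # hypothetical aligned hits
pssm = build_pssm(block)
print(best_hit(pssm, "MSTWAHVDGKLLP"))  # best window starts at position 6
```

Residues that agree with the conserved columns earn positive scores and others negative ones, which is why marginal hits that share the conserved positions rise in significance when the profile, rather than the single query, is used.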

Sometimes even the few top hits will be marginal (see Note 18). In that case, each of the “best” marginal hits should be used as a query, and a number of homologues (about 10; see Note 20) should be collected and aligned without the initial query (i.e., the protein of interest). A profile (e.g., a PSSM) should be produced from each of those alignments, and the relevant part of the initial query should be aligned against it. If the initial query matches the profile at conserved positions (see Note 21), the hit was not fortuitous. Again, if domain-inference programs can detect some kind of domain along the sequences that formed the profile, this information can then be transferred to the initial query with confidence.
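Whether the initial query “matches the profile at conserved positions” can be quantified crudely: find the low-entropy columns of the homologue alignment and count how often the aligned query carries the dominant residue there. The 1-bit entropy cut-off and the function names are assumptions, and the query is assumed to be already aligned to the profile columns.

```python
import math
from collections import Counter
from typing import List


def column_entropy(residues: List[str]) -> float:
    """Shannon entropy (bits) of a list of residues."""
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_columns(alignment: List[str], max_entropy: float = 1.0) -> List[int]:
    """0-based indices of columns whose entropy (gaps ignored) is at most max_entropy."""
    conserved = []
    for idx, column in enumerate(zip(*alignment)):
        residues = [r for r in column if r != "-"]
        if residues and column_entropy(residues) <= max_entropy:
            conserved.append(idx)
    return conserved


def fraction_matching_conserved(query_aln: str, alignment: List[str],
                                max_entropy: float = 1.0) -> float:
    """Share of conserved columns where the aligned query has the dominant residue."""
    if len(query_aln) != len(alignment[0]):
        raise ValueError("query must be aligned to the profile columns")
    columns = conserved_columns(alignment, max_entropy)
    if not columns:
        return 0.0
    hits = 0
    for idx in columns:
        residues = [seq[idx] for seq in alignment if seq[idx] != "-"]
        dominant = Counter(residues).most_common(1)[0][0]
        hits += query_aln[idx] == dominant
    return hits / len(columns)


homologues = ["HVDGKLA", "HIDGKMS", "HVEGKLT", "HVDGRIA"]  # hypothetical profile members
print(conserved_columns(homologues))  # [0, 1, 2, 3, 4]
print(fraction_matching_conserved("HVNGKQQ", homologues))  # 0.8
```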

Other databases provide high-level annotation for specific tasks. InterPro (17) offers a convenient entry point to a number of them, especially for manual sequence classification (as opposed to some massive automated procedure). SuperFamily (18) builds on the classification of 3D structures (a hit here implies structural similarity, regardless of common function or evolutionary origin); then there are PRINTS (19) and PROSITE (20), and one may continue with a long list in which each member targets a specific problem (e.g., if the protein of interest is found to be a peptidase, MEROPS (21) may be consulted for further relevant classification).


4. Notes 

1. Often, a 3D domain formed by (part of) a sequence carries out just a simple operation (e.g., a single function). For instance, there are tertiary structural domains that simply bind a cofactor and feature an allosteric site, where some regulatory factor (e.g., ADP) will dock to exert its role. The active site may reside on a separate domain, or may be shared between two of them, within the range of the cofactor.

2. Unpublished work (C.D.P., Ph.D. thesis) in continuation of (3) suggests that the requirements set, albeit too vaguely, by an α-helical “up-and-down” bundle, which is an abundant tertiary structural motif, raise the relevant parts of the sequence to the extreme 0.1–1% of a suitable distribution when proteins in a databank are scored for compatibility. This shift is not enough for structure prediction from the sequence alone (too many false positives), but it still reflects a possibly minimal set of requirements posed by the structure on compatible sequences.

3. There is a tendency to treat the observed structural solutions, i.e., the recurrent tertiary structural motifs and domains, as the end products of evolution in our days. In fact, all the preceding evolutionary steps (as well as, probably, the future ones) had to employ one of the solutions provided in this relatively narrow set. If we depict this set so that similar architectures are close to each other, then “evolution” is a “walk” through this set. Whether this set is continuous or partitioned in a discontinuous manner is the subject of ongoing research.

4. A continuum is thus established in the scale of similarities between protein sequences: at one end, small biases due to simple facts (e.g., two transmembrane pieces are coincidentally matched); in the middle of the scale, remote similarities due to common structural architecture; and at the other end, 30% (or more) identity due to common origin, observed, e.g., between a mammalian protein and its bacterial homologue (and, usually, more than 80% identity, e.g., between mammals).

5. This effect characterizes the situation in which a particular domain includes a 

smaller one, plus some extra structural elements (“decorations”); then, the new 

total constitutes part of a larger domain, which includes some further structural 

elements, and so on. Orengo and coworkers (2) have presented a number of 

examples in their series of papers on classification of protein structure. 

6. The version of BLAST featured in EnsEmbl can run against the results of GeneScan; this does not simply translate genomic DNA into open reading frames (ORFs) before comparison, but also attempts to “splice” it, after predicting and removing potential introns. Other task-specific databases feature relevant tools.

7. The version of BLAST at the National Center for Biotechnology Information (NCBI) has access to all protein sequences of known structure. Alternatively, the PDB resource (Table 1) can be accessed directly for this purpose, although the interconnection to other databases offered by NCBI is then lost.

8. As in Note 7, access through the means provided by NCBI is recommended.


9. For example, BLAST seeks all the instances where a small part of the query (the protein of interest) matches a database sequence. Then, to form longer alignments, BLAST, depending on its version, either extends these “seed alignments” into contiguous subalignments, uninterrupted by gaps, which are then joined in all valid combinations, or extends the seeds in a gapped-alignment fashion. The presence of short repeats may make the output particularly hard to follow, due to the numerous alternatives.

10. Sander and Schneider (22) suggest that the minimum percentage of identity between two proteins required to imply structural similarity converges to about 27% for alignment lengths of about 80 amino acids. However, the change in this threshold over the range of 50–80 residues is too small to justify the inclusion of further repeats, which would increase the number of alternative alignments. See also Note 18.

11. For instance, assume that a stretch about 20 amino acids long, or longer, is dominated by leucine, isoleucine, and perhaps a couple of phenylalanines. Not only will this part be nonspecifically matched to any sequence that features a similar deviation in composition, but the resulting alignment will also appear unstable in this part, because of the numerous and almost equivalent alternative ways in which two such stretches can be aligned.

12. For example, a strong bias toward lysine and alanine will make the sequence look like a histone. When a databank is scanned for similar sequences, the results will tend to include nonspecific stretches rich in positive (and, to a lesser extent, negative) charges in general.
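Compositional bias of the kind described in Notes 11 and 12 can be spotted with a sliding-window measure of residue diversity. The sketch below computes the Shannon entropy of each window; it is a simplified stand-in for dedicated low-complexity filters such as SEG, and the window size of 12 and the 2.2-bit cut-off are illustrative assumptions, not recommended settings.

```python
import math
from collections import Counter


def window_entropy(seq: str, window: int = 12) -> list[float]:
    """Shannon entropy (bits) of the residue composition of each window."""
    values = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        values.append(-sum((c / window) * math.log2(c / window) for c in counts.values()))
    return values


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2) -> list[int]:
    """Start positions of windows whose entropy falls below an (assumed) threshold."""
    return [i for i, h in enumerate(window_entropy(seq, window)) if h < threshold]
```

A window of 12 different residues scores log2 12, about 3.58 bits, whereas a leucine/isoleucine-rich window such as LLILLIILLFLL scores about 1.19 bits and is flagged.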

13. The NCBI BLAST server (Table 1) offers CDD (Conserved Domain Database), which is based on both Pfam and SMART and further includes collections internal to NCBI. Other servers may offer similar compilations. However, for detailed inquiries one may need to resort to the original resources, because the information presented by the original collection can be much richer. Furthermore, each specialized collection offers tools for flexible searches involving combinations of various domains, to help detect proteins of similar architecture, reference similarities to other related domains, and so on.

14. The tendency of tertiary structural domains to behave independently should be exploited. Bench work can usually be made easier by studying isolated domains; e.g., if some part of a protein makes the molecule hard to crystallize, the relevant information (if available) could indicate which part to remove. Information derived using domain inference tools can serve to divide a sequence of interest into meaningful pieces.

Bioinformatics work can benefit in a similar way, e.g., during database searches. Assume, for example, that a protein includes a general hydrolase domain (e.g., an esterase), which is found in many combinations with other domains that particularize its use, and that it also contains a domain specific to the family the sequence belongs to. It is the latter domain that will boost the most relevant sequences to the top of the sorted list of BLAST results; accordingly, it is the one that will assign the query protein to the correct subfamily within the framework of a larger family.


15. In the case of multidomain proteins, each hit to a constituent domain (or a significant part of it) signifies the existence of a related part in the databank. Occasionally, some domains will appear to be missing: either the relevant part of the sequence appears deleted or an expected domain is not recognized along it. Given the statistical nature of the recognition procedure and the fact that the underlying primary data are nucleotide sequences, the tempting conclusion that this domain or part is absent is by no means secure.

• If the relevant part of the sequence is present, you may check whether domains recognized by domain inference programs along remote homologues of this part can be transferred by aligning to preformed multiple alignments, as described in Subheading 3.2.3 for the case of remote hits.

• If the relevant part of the sequence seems absent, then despite the efficiency of genetic data processing procedures, parts of the sequence may have been mistakenly treated as introns. Once a major part of a multidomain protein has been located in the complete genome, the hits should serve as pointers to the region that needs to be searched more carefully. Perhaps the next generation of data-mining tools will perform this retro-search for missing parts automatically (as iterative BLAST is performed today). Until then, and despite the availability of high-level annotation (which will retrieve most of the information sought), one should be prepared to run straightforward TBLASTN searches with the minor parts of the sequence in hand before ruling out their existence conclusively and beyond reasonable doubt.
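TBLASTN compares a protein query with a nucleotide databank translated in all six reading frames, which is why it can find coding regions that gene prediction has missed. The six-frame translation step itself can be sketched as follows, assuming the standard genetic code (NCBI translation table 1); the function names and the “+1…−3” frame labels are illustrative, and the search itself is left to TBLASTN.

```python
BASES = "TCAG"
# Standard genetic code; codon index = 16*b1 + 4*b2 + b3 with T, C, A, G = 0..3
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate codon by codon; incomplete trailing codons are dropped,
    and codons containing ambiguous bases (e.g., N) become X."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(base in BASES for base in codon):
            idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(AMINO_ACIDS[idx])
        else:
            protein.append("X")
    return "".join(protein)


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translations(dna: str) -> dict[str, str]:
    """Translations of the three forward and three reverse-complement frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```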

16. When an experimentally determined 3D structure of a similar sequence exists in PDB, the sequence of interest and the matching structure can be submitted to an automated model-building server (such as the SWISS-MODEL server; some servers also require a ready-made alignment between the two) to obtain a 3D structural approximation of the query protein. At the very least, inspecting this model can help explain any available mutational data and reveal key locations for site-directed mutagenesis and other kinds of modification and probing (instead of blind trials along the sequence), in order to infer the mechanism of function or other valuable information. If the quality of the alignment is poor, but both the sequence and the structure can be aligned to an intermediary such as a profile, this intermediary can mediate the alignment between the protein of interest and the distantly related sequence of known structure. Alternatively, the remote match may serve as the query to retrieve further sequences homologous to the hit, so that the original query can be aligned to their preformed multiple alignment, as described under Subheading 3.2.3.

17. The expectation value (E-value) that BLAST reports with the sorted hit list is proportional to the product of the database length and the query length. If matching counterparts exist for only one of the domains, and this domain makes up a small part of the whole protein, BLAST may miss hits of marginal similarity simply because the length product is unnecessarily large; because domains behave independently, that domain can be submitted on its own as a shorter query.
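The length dependence in Note 17 follows from the Karlin-Altschul relation used by BLAST, $E = K m n e^{-\lambda S}$, where $m$ and $n$ are the query and database lengths and $S$ is the raw score. The sketch below assumes λ = 0.267 and K = 0.041 (values commonly quoted for BLOSUM62 with gap costs 11/1) and ignores BLAST's effective-length corrections, so it illustrates only the proportionality, not the exact E-values BLAST would report.

```python
import math

# Assumed Karlin-Altschul parameters (commonly quoted for BLOSUM62, gap costs 11/1)
LAMBDA = 0.267
K = 0.041


def blast_evalue(raw_score: float, query_len: int, db_len: int,
                 lam: float = LAMBDA, k: float = K) -> float:
    """E = K * m * n * exp(-lambda * S), without effective-length corrections."""
    return k * query_len * db_len * math.exp(-lam * raw_score)
```

For the same raw score, a 400-residue full-length query receives an E-value four times larger than a 100-residue query consisting of the matching domain alone, which can be enough to push a marginal hit past the reporting threshold.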

18. The expectation value should be regarded as only a rough measure. It would be a more accurate measure of the expected number of chance hits if databases were nonredundant (i.e., contained only mutually nonhomologous sequences) and there were no biases toward specific types of amino acid residues or toward sequence patterns (e.g., the amphipathic ones found in α-helices, which account for about one quarter of protein structure in general). Moreover, Sander and Schneider (22) showed long ago that once a subalignment of a given length exceeds a length-dependent level of identity, 3D structural similarity can be assumed, regardless of the length of the proteins being compared or the number of sequences the query is compared with. They suggest the threshold $t(L) = 290.15 \times L^{-0.562}$ (percent identity, for alignment length $L < 80$) and about 25% for $L > 80$; alignments with an identity level higher than $t(L)$ imply related structure, with only a small, acceptable number of false positives. Alignments that fall below this threshold curve do not necessarily indicate proteins of unrelated structure; for them, structural similarity, if it exists, cannot simply be asserted with confidence. Similarity becomes increasingly improbable as these figures decrease.
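The threshold of Note 18 is easy to compute. In the sketch below, the plateau for alignments longer than 80 residues is taken to be the value of the curve itself at L = 80 (about 24.7%), an assumption that keeps the function continuous; the function names are illustrative.

```python
def hssp_threshold(length: int) -> float:
    """Percent identity threshold t(L) = 290.15 * L**-0.562 (Sander and Schneider).
    For L > 80 the curve's own value at L = 80 is assumed as the plateau."""
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return 290.15 * min(length, 80) ** -0.562


def implies_structural_similarity(identity_pct: float, length: int) -> bool:
    """True when the alignment lies above the threshold curve."""
    return identity_pct > hssp_threshold(length)
```

For example, hssp_threshold(50) is about 32%, so a 50-residue subalignment needs noticeably higher identity than an 80-residue one before structural similarity can be assumed.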

19. Details on how to make or use a PSSM (position-specific scoring matrix) may change with the implementation. It is worth spending some time on the online help about PSSMs that accompanies their implementation at NCBI. In any case, Clustal (23) may be used to align a sequence to a block of prealigned sequences, or even to align two preformed multiple alignments. In both cases, if positions conserved in the “reference” block are also conserved along the query sequence (or the query block), the match is reliable. Pfam (15) offers tools for another approach, based on hidden Markov models, the explanation of which is beyond the scope of these notes.
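As a rough illustration of what a PSSM contains, the sketch below builds log-odds scores (in bits) from an ungapped block of aligned sequences and scores a query segment against it. It assumes uniform background frequencies and a fixed pseudocount of 1; PSI-BLAST's own matrices use sequence weighting, database background frequencies, and more elaborate pseudocounts, so this is a teaching sketch only, and the function names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds scores (bits) per column of an ungapped block of aligned sequences,
    against assumed uniform background frequencies of 1/20."""
    length = len(block[0])
    if any(len(seq) != length for seq in block):
        raise ValueError("block sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in block)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_segment(pssm: list[dict[str, float]], segment: str) -> float:
    """Sum of column scores for a segment as long as the block."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, segment))
```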

20. Following the results of Henikoff and Henikoff (24,25), about 10 homologues usually seem to be enough, provided that they cover, if possible, the whole range of similarity from 90% down to 30–40%. If all of them are too similar to each other, it will be as if the same sequence had been included 10 times. If all of them are too dissimilar to each other, the risk of errors in their multiple alignment will be too high.
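Redundancy of the kind described in Note 20 can be screened for before the multiple alignment is refined. The sketch below computes percent identity over aligned columns and keeps sequences greedily, skipping any that are at least 90% identical to one already kept; the 90% cut-off mirrors the upper end of the range above, while the function names and the greedy input order are assumptions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def nonredundant(aligned: dict[str, str], max_identity: float = 90.0) -> list[str]:
    """Keep sequences (in input order) that are less than max_identity percent
    identical to every sequence kept so far."""
    kept: list[str] = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[other]) < max_identity for other in kept):
            kept.append(name)
    return kept
```

The identities of the retained sequences to the query can then be inspected to check that they span the range from about 90% down to 30–40%.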

21. As a consistency check: if a hit is correct, some of the sequences that are homologous to the hit should have appeared in the hit list of the initial search (i.e., the one in which the protein of interest was the query sequence). If just one protein from a large family was reported, chances are that the hit was coincidental.

References 

1. Richardson J.S. and Richardson D.C. (1989) “Principles and patterns of protein 

conformation.” In: Fasman G. (ed) “Prediction of Protein Structure and the 

Principles of Protein Conformation.” Plenum Press, NY, pp 1–98.


2. Orengo C.A. and Thornton J.M. (2005) “Protein families and their evolution – a 

structural perspective.” Annu. Rev. Biochem. 74, 867–900. 

3. Paliakasis C.D. and Kokkinidis M. (1992) “Relationships between sequence and structure for the four-α-helix bundle tertiary motif in proteins.” Protein Eng. 5, 739–748.

4. Lattman E.E., Fiebig K.M. and Dill K.A. (1994) “Modeling compact denatured 

states in proteins.” Biochemistry 33, 6158–6166. 

5. Lupas A., Van Dyke M. and Stock J. (1991) “Predicting coiled-coils from protein sequences.” Science 252, 1162–1164.

6. Chothia C. (1992) “One thousand families for the molecular biologist.” Nature 

357, 543–544. 

7. Schwede T., Kopp J., Guex N. and Peitsch M.C. (2003) “SWISS-MODEL: an automated protein homology-modeling server.” Nucleic Acids Res. 31, 3381–3385.

8. Burge C. and Karlin S. (1997) “Prediction of complete gene structures in human 

genomic DNA.” J. Mol. Biol. 268, 78–94. 

9. Hubbard T., Andrews D., Caccamo M., et al. (2005) “Ensembl 2005.” Nucleic 

Acids Res. 33, D447–D453. 

10. Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W. and 

Lipman D.J. (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein 

database search programs.” Nucleic Acids Res. 25, 3389–3402. 

11. Bairoch A., Apweiler R., Wu C.H., Barker W.C., Boeckmann B., Ferro S., 

Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A., 

O’Donovan C., Redaschi N. and Yeh L-S.L. (2005) “The universal protein resource 

(UniProt).” Nucleic Acids Res. 33, D154–D159. 

12. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., 

Shindyalov I.N. and Bourne P.E. (2000) “The protein data bank.” Nucleic Acids 

Res. 28, 235–242. 

13. Boguski M.S., Lowe T.M.J. and Tolstoshev C.M. (1993) “dbEST – database for 

expressed sequence tags.” Nature Genet. 4, 332–333. 

14. Apic G., Gough J. and Teichmann S.A. (2001) “Domain combinations in archaeal, eubacterial and eukaryotic proteomes.” J. Mol. Biol. 310, 311–325.

15. Bateman A., Coin L., Durbin R., Finn R.D., Hollich V., Griffiths-Jones S., Khanna 

A., Marshall M., Moxon S., Sonnhammer E.L.L., Studholme D.J., Yates C. and 

Eddy S.R. (2004) “The Pfam protein families database.” Nucleic Acids Res. 32, 

D138–D141. 

16. Letunic I., Copley R.R., Pils B., Pinkert S., Schultz J. and Bork P. (2006) “SMART 

5: domains in the context of genomes and networks.” Nucleic Acids Res. 34, 

D257–D260. 

17. The InterPro Consortium; Mulder N.J., Apweiler R., Attwood T.K., et al. (2005) “InterPro, Progress and Status in 2005.” Nucleic Acids Res. 33, D201–D205.

18. Madera M., Vogel C., Kummerfeld S.K., Chothia C. and Gough J. (2004) “The SUPERFAMILY database in 2004: additions and improvements.” Nucleic Acids Res. 32, D235–D239.


19. Attwood T.K., Bradley P., Flower D.R., Gaulton A., Maudling N., Mitchell A.L., Moulton G., Nordle A., Paine K., Taylor P., Uddin A. and Zygouri C. (2003) “PRINTS and its automatic supplement, prePRINTS.” Nucleic Acids Res. 31, 400–402.

20. Hulo N., Bairoch A., Bulliard B., Cerutti L., de Castro E., Langendijk-Genevaux P.S., Pagni M. and Sigrist C.J.A. (2006) “The PROSITE database.” Nucleic Acids Res. 34, D227–D230.

21. Rawlings N.D., Morton F.R. and Barrett A.J. (2006) “MEROPS: the peptidase 

database.” Nucleic Acids Res. 34, D270–D272. 

22. Sander C. and Schneider R. (1991) “Database of homology-derived protein structures 

and the structural meaning of sequence alignment.” Proteins: Struct. Fun. 

Gen. 9, 56–68. 

23. Thompson J.D., Higgins D.G. and Gibson T.J. (1994) “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.” Nucleic Acids Res. 22, 4673–4680.

24. Henikoff S. and Henikoff J.G. (1992) “Amino acid substitution matrices from 

protein blocks.” Proc. Natl. Acad. Sci. USA 89, 10915–10919. 

25. Henikoff S. and Henikoff J.G. (1993) “Performance evaluation of amino acid substitution matrices.” Proteins: Struct. Fun. Gen. 17, 49–61.

19 

Open-Source Platform for the Analysis of Liquid 

Chromatography-Mass Spectrometry (LC-MS) Data 

Matthew Fitzgibbon, Wendy Law, Damon May, Andrea Detter, and 

Martin McIntosh 

Summary 

The analysis of protein mixtures by liquid chromatography-mass spectrometry (LC-MS) requires tools for viewing and navigating LC-MS data, locating peptides in LC-MS data, and eliminating low-quality peptides. msInspect, an open-source platform, can carry out these steps for single experiments and can align and normalize peptide features in comparative studies with multiple LC-MS runs. In addition, msInspect can analyze quantitative studies with and without isotopic labels to generate peptide arrays.

Key Words: liquid chromatography-mass spectrometry; peptide identification; 

filtering; alignment; quantitation. 


1. Introduction

msInspect is an open-source platform comprising algorithms and visualization tools that process liquid chromatography-mass spectrometry (LC-MS) data files to locate peptides in two dimensions [time and mass over charge (m/z)] and perform various analyses on them (1). msInspect can be used for:

• Visually inspecting LC-MS spectra and peptide features 

• Automatically locating peptide features in high mass accuracy MS spectra 

• Filtering peptide features by various quality measures 

• Quantitating label-free peptide features between experiments via alignment and 

normalization of the data to create a peptide array 




• Identifying isotopically labeled pairs [e.g., isotope-coded affinity tags (ICAT), stable isotope labeling by amino acids in cell culture (SILAC)] for quantitative peptide analysis within a single experiment

• Comparing and developing MS feature-finding algorithms 

msInspect implements multiple algorithms specifically designed for LC-MS 

data. 

The signal processing component exploits the two-dimensional nature of 

the data to identify coeluting isotopes and then groups them based on the 

similarity of the observed isotopic distributions to those of naturally occurring 

peptides. The alignment method estimates the underlying nonlinear mapping of 

retention times between experiments. The normalization approach (2) adapts 

methods developed for genomic arrays to accommodate natural variation of 

LC-MS signal intensities across runs. Ultimately, the goal of msInspect is to mine LC-MS data and to produce peptide arrays that can then be analyzed using tools traditionally applied to genomic arrays. msInspect also contains a complete Accurate Mass and Time (AMT) analysis workflow (3). These analytical techniques combine LC-MS and LC-MS/MS data to expand peptide coverage and enhance the confidence of peptide identifications.

2. Materials 

To run msInspect the Java Runtime Environment must be installed. To 

perform alignment of multiple runs, the R environment must also be installed. 

Both of these programs must be properly configured and on the computer’s 

PATH. Information on acquiring these software packages is provided in 

Subheading 2.1 below. Please contact your local IT support group for details on installing this software properly.

msInspect reads mass spectra from files in the open mzXML format (4). For 

background on mzXML and information about converting data from particular 

instruments to mzXML see Note 1. 

2.1. Software 

1. msInspect is written in platform-independent Java and requires that the Java 

Runtime Environment, version 1.5 or later, be installed and on the computer’s 

PATH. Installing the Java Runtime Environment also installs the latest version of Java Web Start, which allows msInspect to be run without explicitly installing it or updating it as new versions are released (see Note 2).

a. Windows, Linux, and Solaris users can download “J2SE 5.0” from 

http://java.sun.com/j2se/1.5.0/download.jsp. 

b. Macintosh users running Mac OS X v10.4 or later can download Java from http://www.apple.com/support/downloads.


2. To align multiple runs into a peptide array, the R environment for statistical 

computing, version 2.1.0 or later, must be installed and on the computer’s PATH. 

R executables for various operating systems are available from http://www.r-project.org.

2.2. Hardware 

msInspect will run on any computer that supports the software listed in 

Subheading 2.1. For large input files, typical of high mass accuracy measurements, 

feature extraction can require several hundred megabytes of memory 

(see Note 3). msInspect has been tested on computers running Windows XP, GNU/Linux, and Mac OS X with at least 1 GB of main memory.

2.3. Data Files 

msInspect will open any version 2.0 mzXML file containing MS1 

data. However, msInspect was designed using high-resolution liquid 

chromatography-electrospray ionization-time of flight mass spectrometer data 

so it may not perform as well with an mzXML file from another type of 

mass spectrometer (e.g., a matrix-assisted laser desorption-time of flight mass 

spectrometer). 

Sample mzXML files that may be used to follow all of the steps in Section 3 are available on the Web (see Note 4).

3. Methods 

3.1. Viewing and Navigating LC-MS Data 

1. Launch msInspect from http://proteomics.fhcrc.org/download/tools/msInspect/viewer.jnlp by clicking on “Launch msInspect with Java Web Start.” “Fred Hutchinson Cancer Research Center” must be accepted as a trusted software publisher for the download to be completed.

2. Upon launching msInspect, the Open File dialog box will automatically open. 

Browse for the mzXML file to be viewed, select the file, and left click the 

Open button (see Note 5). You may load a different mzXML file by selecting 

File > Open from the main msInspect menu bar. 

3. The msInspect window (Fig. 1) contains several panes for viewing and navigating 

the MS run: 

a. An image of the MS run will be displayed in the Image Pane (the largest 

pane in the center of the msInspect window). 

b. The Properties Pane (left side of the window) will display detailed information

from the mzXML file loaded. This pane will later be used to 

display details of individual peptide features. It can be hidden with 

Windows > Show/hide properties.


c. The Detail Pane is on the right side of the window and the Chart Pane is 

at the bottom part of the window. Each provides a more detailed view of a 

region of the spectrum. The Detail Pane provides a zoomed view of the area 

selected in the full Image Pane. The Chart Pane plots intensity versus m/z 

(to show the isotopes in a single scan) or intensity versus scan (to show the 

elution profile of a single isotope). 

4. Hold the mouse cursor over a location in the Image Pane. A floating tag will 

appear displaying the scan number and m/z coordinates of that position. 

5. Areas containing peptide features in the Image Pane will appear dark. Left click 

in a dark area of the image where there appear to be many peptide features as 

shown in Fig. 1. 

a. The Detail Pane (right) shows a detailed view of the area selected. Feature 

finding is automatically launched in this area, and after a few seconds of 

computation, detected peptide features are circled. Xs indicate the monoisotopic 

peaks in each feature (see Note 6). 

b. To see detailed information about a detected peptide, position the mouse cursor over the monoisotopic peak. A floating tag will display the scan number, m/z (followed by mass in parentheses), inferred charge state, intensity/background intensity/median intensity, and the first and last scan for the feature.

c. The Chart Pane (bottom) displays the m/z spectrum for the scan corresponding to the vertical red line in the Detail Pane.

Fig. 1. msInspect window showing the Properties Pane (top left), Image Pane (top center), Detail Pane (top right), and Chart Pane (bottom).

6. Zoom in on features in the Chart Pane by highlighting a desired area. To do this, left click at the top left corner of the desired area and keep the left mouse button held down while dragging the cursor down and to the right. When the mouse button is released, the chart will be redrawn to show a magnified view of the selected area (see Note 7). To restore the original chart, left click anywhere in the Chart Pane and drag the cursor up or to the left.

7. Select “elution” from the drop-down menu at the top of the Chart Pane to display 

an elution profile plot. This display shows peaks along the scan axis rather than the 

m/z axis. Note that the Detail Pane now displays a horizontal line corresponding 

to the m/z value for the profile as shown in Fig. 2. 

8. Zoom in on the Image Pane by right clicking and selecting a magnification value from the list (e.g., 200%).

Fig. 2. msInspect window displaying an elution profile plot in the Chart Pane and 

corresponding horizontal line in the Detail Pane.


3.2. Locating Peptides in LC-MS Data 

A Feature Set file, which lists all of the peptide features detected in a run, 

can be generated using one of the algorithms included in the platform (see 

Note 8). 

1. Under the Tools menu, select two-dimensional (2D) Peak Alignment. This is the default feature-finding algorithm and is recommended for most purposes.

2. To initiate feature finding, select Tools > Find All Features. This will bring up 

the Extract Features dialog box as shown in Fig. 3. 

3. In the “Save Features to File” field, enter (or browse for) a path and add a name 

for the new Feature Set file. 

4. Specify a scan range in the “Start Scan” and “End Scan” fields to limit feature 

finding to a subset of scans. By default, msInspect will attempt to find peptides 

in all scans (see Note 9). 

5. Left click the Find Features button to begin the feature finding process. As the file 

is processed, the status bar at the bottom of the msInspect window will display 

progress. For a large input file, processing may take 20–30 min or longer.

6. When processing is complete, features will be written to the specified 

output file and highlighted as colored crosses in the Image and Detail 

Panes. The status bar will display “Finding features complete. See file 

yourfilepath\yourfile.peptides.tsv.” Place the mouse cursor over one of the 

detected features to display a summary of its properties. Left click on the feature to 

view details in the Properties Pane (display by Windows > Show/hide Properties). 

7. Select Tools > Display Peptides… to open the Display Features dialog box as 

shown in Fig. 4A for customization: 

Fig. 3. Extract Features dialog box.




a. Display or hide the colored crosses by checking or unchecking the box under 

the “Display” field. 

b. Change the color of the crosses by left clicking on the colored box under 

the “Color” field. A new color can be selected from a color palette. 

c. View the Feature Set browser by left clicking on the “…” button. This 

browser lists details of all peptides in the Feature Set. This list can be sorted 

and edited, comments can be added to a feature, features can be deleted, and 

the modified Feature Set file may be saved (see Note 10). 

3.3. Filtering to Eliminate Low-quality Peptides 

Low-quality peptides can be removed in msInspect by applying user-specified filtering criteria (e.g., a minimum number of isotopic peaks detected).

Removing low-quality peptides is particularly helpful when peptide arrays are 

to be generated (described in Subheading 3.4.1). 

1. Select Tools > Display Peptides…. 

2. Left click the Filter tab at the bottom of the Display Features dialog box. This 

tab displays several parameters by which features can be filtered. 

3. Set Min Charge = 1, Min Scans = 3, Min Intensity = 5, Max KL = 1.0, and Min Peaks = 2, as shown in Fig. 4A (see Note 11).

4. Left click the Apply button. The Detail Pane now shows only the features that 

meet these filtering criteria. 

5. Save the filtered Feature Set file over the original file by left clicking on the “…” 

button at the top right of the Display Features dialog box, then left clicking on 

the Save button. 
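The filter in step 3 amounts to a simple predicate applied to each feature. In the sketch below the field names are hypothetical (they are not necessarily the column names of msInspect's .peptides.tsv files), the KL field is assumed to be a divergence score between observed and expected isotope distributions, and inclusive bounds are assumed; msInspect's own comparison rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical field names, not necessarily msInspect's column headers
    mz: float
    charge: int
    scan_count: int     # number of scans the feature spans
    intensity: float
    kl: float           # assumed: divergence from the expected isotopic distribution
    peaks: int          # number of isotopic peaks detected


def passes_filter(f: Feature, min_charge: int = 1, min_scans: int = 3,
                  min_intensity: float = 5.0, max_kl: float = 1.0,
                  min_peaks: int = 2) -> bool:
    """Apply the step-3 criteria, assuming inclusive bounds."""
    return (f.charge >= min_charge
            and f.scan_count >= min_scans
            and f.intensity >= min_intensity
            and f.kl <= max_kl
            and f.peaks >= min_peaks)


def filter_features(features: list[Feature], **criteria) -> list[Feature]:
    return [f for f in features if passes_filter(f, **criteria)]
```

With these defaults, a “stray peak” assigned charge 0 (see Note 7) is removed by the minimum-charge criterion alone.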

3.4. Quantitation of Peptide Features 

3.4.1. Quantitation Using Label-free Approaches 

Features from multiple experiments can be compared in msInspect by simultaneously opening Feature Set files from multiple LC-MS runs, displaying them together, and generating a peptide array. Below are directions for comparing multiple LC-MS runs after Feature Set files have been produced (as described above in Subheadings 3.1–3.3) for all LC-MS runs to be compared.

1. Select Tools > Display Peptides…. 

2. Left click on the Add Files button (Fig. 4A). 


Fig. 4. (A) Display Features dialog box with one file loaded and the Filter tab 

selected. (B) Display Features dialog box with two files loaded and the Peptide Array 

tab selected.


3. Browse to find another Feature Set file (with file extension .peptides.tsv) and open it. The features from each newly opened file are assigned a different colored cross in the Image Pane. In this way, multiple Feature Set files can be opened and overlaid in the Image Pane (see Note 12).

4. Left click on the Filter tab (Fig. 4A) at the bottom of the Display Features 

dialog box and make sure the filter criteria are still set to the values entered 

in Subheading 3.3 (Min Charge = 1, Min Scans = 3, Min Intensity = 5, Max 

KL = 1.0, and Min Peaks = 2). Left click on the Apply button if any changes are 

made. 

5. Left click on the Peptide Array tab (Fig. 4B) to set criteria for the peptide array 

to be generated: 

a. Enter a name for the peptide array file that will be generated. By convention, this file name should end with “.pepArray.tsv.”

b. Click the Optimize button to have msInspect search for reasonable tolerances 

for matching features across runs (see Note 13). 

c. Check the Normalization box if normalization of features is desired (2). 

d. Click the Calculate button to actually compute the peptide array. 

6. The generated peptide array file consists of one column of intensities for each run 

and one row for each matched feature. The file is stored in a simple tab-delimited 

format, which can be exported (to Excel and other programs) and analyzed using 

tools traditionally applied to genomic arrays (see Note 14). 
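msInspect's normalization (2) is its own method. As a generic illustration of the kind of array-style processing a peptide array permits, the sketch below log2-transforms each run's intensities and subtracts that run's median, so that runs with different overall signal become comparable. The data layout (rows are features, columns are runs, None for a feature not detected in a run) and the function name are assumptions.

```python
import math
import statistics
from typing import List, Optional

Row = List[Optional[float]]


def median_normalize(array: List[Row]) -> List[Row]:
    """array[feature][run] holds raw intensities; None (or a non-positive value)
    means the feature was not detected in that run. Returns log2 intensities
    centered on each run's median (each run needs at least one observed value)."""
    logs = [[math.log2(v) if v is not None and v > 0 else None for v in row]
            for row in array]
    n_runs = len(array[0])
    medians = []
    for run in range(n_runs):
        observed = [row[run] for row in logs if row[run] is not None]
        medians.append(statistics.median(observed))
    return [[v - medians[run] if v is not None else None for run, v in enumerate(row)]
            for row in logs]
```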

3.4.2. Quantitation Using Isotopic Labeling 

A common method of relative quantitation of peptides involves applying 

heavy and light isotopic labels separately to two samples, then mixing them 

prior to collecting LC-MS data. Typically, tandem MS/MS (or MS2) experiments 

are used to analyze these labeled samples. Peptide sequencing in 

MS/MS can detect the number of labeled residues in each peptide and therefore 

determine the expected mass difference between light and heavy forms of each 

peptide. 

msInspect can perform relative quantitation even in the absence of MS/MS 

information. Provided with the mass of the light and heavy reagents and with a 

threshold on the number of labeled residues to consider, msInspect will search 

for pairs of features consistent with isotopic labeling. 
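The pair search described above can be sketched as follows: two co-eluting features form a candidate pair when their mass difference is close to n times the heavy-minus-light reagent mass difference, for n up to the maximum number of labels. The data class, the 0.02 Da tolerance, and the requirement that scan ranges overlap are assumptions for illustration, not msInspect's implementation; the double loop is quadratic, which is acceptable only for small feature lists.

```python
from dataclasses import dataclass


@dataclass
class MassFeature:
    mass: float        # neutral monoisotopic mass, Da
    first_scan: int
    last_scan: int
    intensity: float


def find_label_pairs(features, delta, max_labels=3, tol=0.02):
    """Return (light, heavy, n_labels, light/heavy ratio) for co-eluting features
    whose mass difference is within tol of n * delta, for n = 1..max_labels."""
    pairs = []
    for light in features:
        for heavy in features:
            if heavy is light:
                continue
            # Assumed co-elution test: the two scan ranges must overlap
            if light.first_scan > heavy.last_scan or heavy.first_scan > light.last_scan:
                continue
            diff = heavy.mass - light.mass
            for n in range(1, max_labels + 1):
                if abs(diff - n * delta) <= tol:
                    pairs.append((light, heavy, n, light.intensity / heavy.intensity))
                    break
    return pairs
```

For example, with delta = 9.0302 Da (nine ¹³C atoms, assumed here for the cleavable ICAT reagent), features at 1000.0000 and 1009.0302 Da with overlapping scans are reported as a singly labeled pair.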

1. Open the file to be analyzed as described in Subheading 3.1. 

2. Select Tools > Find All Features. 

3. This will again bring up the Extract Features dialog box as shown in Fig. 3. 

Enter a new output file name and select a scan range of interest as described in 

Subheading 3.2.3–3.2.4. 

4. Note the “Quantitate” check box in this dialog. Selecting this box will enable 

several options for relative quantitation.


5. Select one of several common isotopic labeling strategies (e.g., Cleavable ICAT and ¹⁶O/¹⁸O) from the pull-down menu. Details can be entered, including the masses of the light and heavy label reagents, the particular amino acid labeled, and the maximum number of labeled residues to consider.

6. Left click on the “Find Features” button to locate all features in the specified 

scan range. Display features from the Feature Set file as described in Subheading 

3.2.7. An additional matching step is performed to locate isotopically labeled 

pairs. A pair is indicated by a vertical bar connecting the light and heavy partners 

in the Detail Pane. Selecting a pair by left clicking in the Detail Pane will display 

feature properties including the light and heavy intensities, the ratio of light to 

heavy, and the number of isotopic labels detected. 

7. The results of this quantitation process are stored in a tab separated value (TSV) 

file specified in step 3.4.2.3. One record is written for each isotopically labeled 

pair and for each unlabeled peptide (see Note 15). 

4. Notes 

1. More information on the mzXML file format, as well as utilities to convert 

native acquisition files from many common MS instruments to mzXML, can be 

found on the Sashimi website at http://sashimi.sourceforge.net. 
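mzXML stores each scan's peak list as base64-encoded binary values in network (big-endian) byte order, with m/z and intensity interleaved. The sketch below reads MS1 peak lists from an mzXML document using only the Python standard library; it assumes uncompressed peak lists with 32- or 64-bit precision, reads the whole document into memory, and ignores all other attributes, so it is not a substitute for a full mzXML reader. The function names are illustrative.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}scan' -> 'scan'."""
    return tag.rsplit("}", 1)[-1]


def read_ms1_scans(mzxml_text: str):
    """Yield (scan number, [(m/z, intensity), ...]) for each MS1 scan."""
    root = ET.fromstring(mzxml_text)
    for scan in root.iter():  # iter() also reaches scans nested inside other scans
        if _local(scan.tag) != "scan" or scan.get("msLevel") != "1":
            continue
        peaks = next((c for c in scan if _local(c.tag) == "peaks"), None)
        if peaks is None or not peaks.text:
            yield int(scan.get("num")), []
            continue
        precision = int(peaks.get("precision", "32"))
        code = "f" if precision == 32 else "d"
        raw = base64.b64decode(peaks.text)
        # '>' = network (big-endian) byte order; values alternate m/z, intensity
        values = struct.unpack(">%d%s" % (len(raw) // (precision // 8), code), raw)
        yield int(scan.get("num")), list(zip(values[0::2], values[1::2]))
```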

2. Running msInspect via Java Web Start is highly recommended for casual use, 

as it greatly simplifies installation and update of the software. msInspect’s 

major features, such as feature finding and peptide array creation, are available 

from the command line as well, and command-line use is more appropriate 

for batch processing of large numbers of mzXML files. To use msInspect 

from the command line, the stand-alone JAR file can be downloaded from 

http://proteomics.fhcrc.org/CPL/msinspect.html. This web page also allows 

download of the msInspect user’s guide, which contains detailed instructions on 

installation, using msInspect’s features from the command line, and full source 

code for the released version (5). 

3. Feature extraction can require a great deal of memory because it operates on several scans at a time. By default, the Java Web Start version of msInspect allows up to 384 MB of memory to be allocated so that a number of scans and intermediate results can be cached. If additional memory is available on the computer, the amount of memory accessible to msInspect can be increased by running msInspect from the command line with the “-Xmx” option when invoking Java, for example “java -Xmx512M -jar viewerApp.jar”.

4. Sample data files are available at https://proteomics.fhcrc.org/CPAS. From that 

website, follow the “Published Experiments” link on the lower left side and 

then left click on the “MiMB Clinical Proteomics” link on the left side. Because 

LC-MS files can be quite large, the samples provided for download are only 

small subregions of the files used as figures in Section 3. Some browsers, such as Internet Explorer, may add a “.mzXML.xml” suffix when downloading these files. This should not affect msInspect's ability to read the files, and the suffix may safely be changed back to “.mzXML” if desired.

5. The first time a particular mzXML file is loaded, msInspect will write a “.inspect” 

file in the same directory where the mzXML file is located. This file contains an 

index of each scan in the original file, which will speed subsequent file access. 

Construction of this index file can take some time for larger input files; the 

status bar at the bottom of the msInspect window will indicate progress. 

6. The area shown in the Detail Pane is indicated in the main Image pane by a blue 

rectangle. Several aspects of Detail Pane behavior can be adjusted by selecting 

Detail Pane Settings from the Tools menu. There, feature detection can be turned 

on or off, background noise that falls below a threshold can be hidden, and the 

color scheme of the Detail Pane can be modified. 

7. Note that in Fig. 1 the Chart Pane clearly shows individual isotopic peaks because the data are from a high-resolution instrument (in this case a Waters LCT Premier). msInspect depends on resolving individual isotopes to infer the charge state of the peptide and therefore its mass. The charge is derived from the reciprocal of the distance between adjacent peaks. In Fig. 1 the peaks of the peptide on the left side of the Chart Pane are 0.5 m/z units apart; therefore, msInspect infers that this peptide has a charge of 2. It is not possible to infer a charge from a single peak, so “stray peaks” that cannot be grouped into an isotopic cluster are assigned a charge of zero.

8. msInspect includes a number of feature extraction algorithms, which can be 

selected in the Tools menu. The default, two-dimensional (2D) peak alignment,

is recommended for most purposes. The single scan algorithm may be useful 

if there is little or no scan-to-scan coherence. The feature extraction algorithms 

in msInspect have been designed to work on high-resolution profile mode data. 

The algorithms have been successfully applied to centroided data, but performance 

will depend on the particular centroiding algorithm used and on the noise 

characteristics of the run under consideration. For such data, the centroided scan 

algorithm may be appropriate. 

9. Once peptides have been located, some amount of visual curation is recommended. 

The Heat Map view (accessed from the Tools menu) can provide a 

global view of features grouped by charge state and sorted by various metrics 

such as mass or intensity. Each column in the Heat Map view consists of a 

small intensity window around each feature, colored from low intensity (red) to 

high intensity (yellow). Clicking on a feature in the Heat Map will highlight it 

in the other windows. By sorting on KL score or intensity and inspecting a few 

features, one can gain a sense of what filtering criteria might be appropriate for a 

given data set. When new filter settings are applied, as described in Subheading 

3.3, the Heat Map view is automatically updated. 

10. A typical example of editing a Feature Set file: 

a. Sort by ascending KL score (click the “KL” column header).

b. Find a feature with KL < 1 that was misidentified, by examining its spectrum in the Chart Pane of the msInspect window.


c. Double click in the Description field for the feature to add a comment to 

the Feature Set List noting that this feature is “questionable.” 

d. Click “Save” to save changes by overwriting the old Feature Set file. 

11. Filtering peptide features can improve the performance of subsequent steps 

such as construction of peptide arrays. Specific filtering criteria will depend on 

instrumentation and the experiment goals. The most frequently used filtering 

criteria include: 

a. Minimum charge – msInspect locates features by first finding peaks and 

then grouping them into isotopic distributions consistent with individual 

peptides. Some peaks will not group with any others and are referred to as 

“stray peaks.” As described in Note 7, it is not possible to infer the charge 

state of these stray peaks, so they are assigned a charge of zero. Setting the 

minimum charge to 1 when filtering will remove these stray peaks, which 

are often due to noise or chemical contaminants. 

b. Minimum number of peaks – confidence in the location and charge state 

assignment of a peptide feature may be greater if it is supported by 

more isotopic peaks. Setting the minimum number of peaks to 2 will also 

eliminate the stray peaks described above. 

c. Minimum number of scans – set the minimum number of scans that a 

peptide must span in order to be considered. This has the effect of eliminating 

peptide features that persist for only a brief time. 

d. Minimum intensity – setting a minimum intensity threshold is often appropriate, 

although the specific value used will depend on the instrument. 

e. Maximum KL score – peaks are grouped by how well they match a model 

of the isotopic distribution of a peptide with a given mass. The KL score 

described in Bellew et al. (1) measures how much an extracted group of

peaks deviates from this model; in general, a lower KL score indicates a 

better match. 

12. When multiple feature sets are loaded, it is often useful to hide particular sets 

or to change the colors of the crosses that mark features in a given set. Both 

of these can be accomplished in the Display Features dialog box as shown in 

Fig. 4A (select Tools > Display Peptides). For each feature set, this dialog box 

provides a checkbox to control visibility and a color palette to select colors for 

the crosses. 

13. After optimization, the mass and scan window values that give the best alignment 

results automatically populate the Peptide Array tab. 

14. A number of high-quality open source tools are available for microarray analysis. 

To analyze peptide arrays produced by msInspect, tools from the Bioconductor 

project (http://www.bioconductor.org) and from the TM4 microarray software 

suite (http://www.tm4.org) have been used. 

15. Results from isotopic labeling should be treated as suggestive rather than authoritative. 

Without peptide sequence information, the mass difference between 

heavy and light partners cannot be definitively ascertained. The quality of the


matching is therefore dependent on the quality of feature filtering and the density 

of features in each run. 
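The calculations behind Notes 7, 11, and 15 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not msInspect's implementation: the isotope spacing (about 1.00335 Da between 13C and 12C peaks, so the "reciprocal of the distance" is scaled by that constant) and the proton mass are standard constants; `kl_score` is a plain Kullback-Leibler divergence whose isotope model and normalization may differ from the KL score of Bellew et al. (1); and the `Feature` record, the default filter thresholds, and the `label_delta` mass shift are hypothetical placeholders, not values or file formats taken from msInspect.

```python
import math
from dataclasses import dataclass

ISOTOPE_SPACING = 1.00335  # approx. mass difference between 13C and 12C isotopic peaks (Da)
PROTON_MASS = 1.007276     # mass of a proton (Da)


def infer_charge(peak_mzs, tolerance=0.02, max_charge=6):
    """Charge from the spacing of adjacent isotopic peaks; 0 for a stray peak (Note 7)."""
    if len(peak_mzs) < 2:
        return 0
    spacings = [b - a for a, b in zip(peak_mzs, peak_mzs[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    if mean_spacing <= 0:
        return 0
    charge = round(ISOTOPE_SPACING / mean_spacing)
    # reject charges outside the plausible range or spacings inconsistent with 1/charge
    if not 1 <= charge <= max_charge or abs(mean_spacing - ISOTOPE_SPACING / charge) > tolerance:
        return 0
    return charge


def neutral_mass(monoisotopic_mz, charge):
    """Uncharged peptide mass implied by an [M + zH]z+ ion."""
    return charge * (monoisotopic_mz - PROTON_MASS)


def kl_score(observed, model):
    """KL divergence of observed isotope intensities from a (positive) model distribution (Note 11e)."""
    obs_total, model_total = sum(observed), sum(model)
    score = 0.0
    for o, m in zip(observed, model):
        p, q = o / obs_total, m / model_total
        if p > 0:
            score += p * math.log(p / q)
    return score


@dataclass
class Feature:
    """Hypothetical feature record; not the msInspect Feature Set file format."""
    mass: float
    charge: int
    peaks: int
    scans: int
    intensity: float
    kl: float


def filter_features(features, min_charge=1, min_peaks=2, min_scans=3,
                    min_intensity=0.0, max_kl=1.0):
    """Apply the Note 11 criteria; the default thresholds are illustrative only."""
    return [f for f in features
            if f.charge >= min_charge and f.peaks >= min_peaks
            and f.scans >= min_scans and f.intensity >= min_intensity
            and f.kl <= max_kl]


def heavy_light_candidates(masses, label_delta, max_labels=3, tolerance=0.01):
    """All (light, heavy, n_labels) index pairs whose mass difference is n * label_delta (Note 15)."""
    pairs = []
    for i, light in enumerate(masses):
        for j, heavy in enumerate(masses):
            diff = heavy - light
            if diff <= 0:
                continue
            for n in range(1, max_labels + 1):
                if abs(diff - n * label_delta) <= tolerance:
                    pairs.append((i, j, n))
    return pairs
```

With masses of 1000.0, 1009.0, and 1018.0 Da and a hypothetical shift of 9.0 Da per label, the 1000.0 Da feature has two candidate heavy partners (one or two labels). Without sequence information the number of labeled sites is unknown, which is why such pairings are only suggestive and why denser feature sets yield more ambiguous matches.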


The authors would like to thank Matthew Bellew, Marc Coram, Jimmy Eng, 

Ruihua Fang, Mark Igra, and Tim Randolph for their intellectual contributions 

to the development of msInspect. This work was supported by contract # 

23XS144A from the National Cancer Institute. 

References 

1. Bellew, M., Coram, M., Fitzgibbon, M., Igra, M., Randolph, T., Wang, P., 

May, D., Eng, J., Fang, R., Lin, C.W., Chen, J., Goodlett, D., Whiteaker, J., 

Paulovich, A., and McIntosh, M. (2006) A suite of algorithms for 

the comprehensive analysis of complex protein mixtures using high-resolution

LC-MS. Bioinformatics Advance Access published on June 9, 2006 

http://bioinformatics.oxfordjournals.org/cgi/reprint/btl276v1. 

2. Wang, P., Tang, H., Zhang, H., Whiteaker, J., Paulovich, A.G., and McIntosh, 

M. (2006) Normalization regarding non-random missing values in high-throughput 

mass spectrometry data. Proceedings of the Pacific Symposium on Biocomputing 

11, 315–326. 

3. May, D., Fitzgibbon, M., Liu, Y., Holzman, T., Eng, J., Kemp, C.J., Whiteaker, J., 

Paulovich, A., and McIntosh, M. (2007) A Platform for Accurate Mass and 

Time Analyses of Mass Spectrometry Data. Journal of Proteome Research 6(7), 

2685–2694. 

4. Pedrioli, P.G., Eng, J.K., Hubley, R., Vogelzang, M., Deutsch, E.W., Raught, B., 

Pratt, B., Nilsson, E., Angeletti, R.H., Apweiler, R., Cheung, K., Costello, C.E., 

Hermjakob, H., Huang, S., Julian, R.K., Kapp, E., McComb, M.E., Oliver, S.G., 

Omenn, G., Paton, N.W., Simpson, R., Smith, R., Taylor, C.F., Zhu, W., and 

Aebersold, R. (2004) A common open representation of mass spectrometry data and 

its application to proteomics research. Nature Biotechnology 22(11), 1459–1466. 

5. Computational Proteomics Laboratory. msInspect website. Accessed on June 28, 

2006 at http://proteomics.fhcrc.org/CPL/msinspect.html.

20 

Pattern Recognition Approaches for Classifying 

Proteomic Mass Spectra of Biofluids 

Ray L. Somorjai 

Summary 

The statistical classification strategy we have developed for magnetic resonance, 

infrared, and Raman spectra for the analysis of biomedical data is discussed, particularly 

as it applies to proteomic mass spectra. A general discussion of the current use of 

pattern recognition methods is given, with caveats and suggestions relevant for clinical 

applicability. 

Key Words: visualization; preprocessing; feature selection/extraction; robust 

classifier; classifier aggregation; proteomics; mass spectroscopy; magnetic resonance 

spectroscopy; biodiagnostics. 


Unlike magnetic resonance spectroscopy (MRS), infrared spectroscopy 

(IRS), and Raman spectroscopy (RS) (1,2,3), proteomic mass spectroscopy 

(PMS) is a relative newcomer to the field of biodiagnostics. However, with the goal of discriminating among diseases and disease states, it is a welcome complementary technique that provides yet another means of analyzing biofluids. In particular, this complementarity extends the range of biofluid characterization from vibrational states of specific chemical groups (IRS, RS), through the identification of small molecules (MRS), to proteins and protein fragments (PMS).

Being an emerging field, PMS suffers from growing pains. In particular, there are experimental difficulties specific to PMS that have yet to be addressed (see Note 1). In the following, the author assumes that the spectra for which classifiers are to be developed have been properly “processed.”

Typically, biomedical data consist of relatively few samples (patterns), of the order of 10–100, that are initially presented in a very high-dimensional feature space (feature ≡ m/z intensity), with dimensionality L (dimension ≡ features) of the order of 1000–10,000. Unfortunately, these two characteristics lead to two curses that impede the development of robust classifiers: the curse of dimensionality and the curse of dataset sparsity (3). As a consequence, the sample-to-feature ratio (SFR) is only 1/10–1/1000, instead of the minimum of 5–10 that the machine learning community generally accepts as necessary for robust classification.
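As a worked illustration of the SFR, with N denoting the number of samples (as in Note 6) and L the number of features, the ranges quoted above and the consequence of the 5–10 guideline for a 100-sample dataset are:

```latex
\mathrm{SFR} = \frac{N}{L}, \qquad
\frac{100}{1000} = \frac{1}{10}, \qquad
\frac{10}{10\,000} = \frac{1}{1000}, \qquad
\mathrm{SFR} \ge 5 \;\Longrightarrow\; L \le \frac{N}{5} = \frac{100}{5} = 20 .
```

Even at the lower guideline of 5, a 100-sample study can therefore support at most about 20 features, far fewer than the thousands of points in a raw spectrum.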

In this chapter, the author presents the specific strategy, dubbed the statistical classification strategy (SCS), that we have developed over the last dozen years to deal with such problems, particularly as they apply to MR, IR, and Raman spectra. We have been adapting this strategy and applying it successfully to biomedical data derived from both proteomic mass spectra and microarrays (see Note 2). The author compares the SCS with the current tools of proteomics data analysts, noting differences and similarities, and, wherever possible, makes recommendations.

2. The Statistical Classification Strategy 

Lifting the twin curses of high dimensionality and dataset sparsity requires special approaches. The “strategy” part of the SCS reflects the fact that no single approach is, or can be, optimal [“there are no panaceas in data analysis” (4)], and that a data-driven, multistage strategy is necessary, even essential.

Using a divide-and-conquer philosophy, the SCS consists of five stages: 

1. Data visualization 

2. Preprocessing 

3. Feature selection/extraction 

4. Robust classifier development 

5. Classifier aggregation (ensembles) 

The five stages are, of course, intimately interrelated; in particular, we use 

the visualization stage to constantly monitor how well the other stages of the 

strategy are working. Figure 1 provides a flowchart of the SCS. A more detailed 

description of the SCS can be found in (5) (see Note 3).

Fig. 1. Flowchart for the five stages of the SCS (top to bottom: Data visualization, Preprocessing, Feature selection/extraction, Classifier development, Classifier aggregation).

2.1. Visualization of High-Dimensional Data

Proper data visualization is an essential first step that requires a dimensionality-reducing mapping/projection from a typically very large, L-dimensional feature space to one to three dimensions. Of course, mapping from high dimensions to

lower ones cannot preserve all distances exactly, because most of the original degrees of freedom are lost. However, if only class separability is required, exact visualization, our primary goal, is both achievable and sufficient. In fact, we recently proposed such an approach (6). It involves mapping high-dimensional patterns to a special plane, the relative distance plane (RDP). The mapping procedure starts with the selection of a distance measure, which can range from Euclidean, city block, and maximum norm to Mahalanobis and its generalization (Anderson–Bahadur, AB) (7). Next, two reference patterns are chosen, one from each class. The critical observation on which the RDP mapping relies is that the distances of any other pattern to these two reference points are preserved exactly after the mapping. This is because the two reference points and any third pattern form a triangle in any dimension and for any distance metric, so the three distances of any such triangle can be displayed in two dimensions without distortion. By cycling through all possible reference pairs, we can display and visualize the data from a large number of possible “perspectives” (as an analogy, consider looking at a sculpture from every angle to assess its shape and form). This is a very powerful approach for detecting outliers (e.g., poor-quality spectra), discovering additional subgroups within a class (clustering), assessing whether training and test sets derive from the same distributions, etc.; in short, for establishing and ensuring quality control.
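The RDP construction can be sketched with NumPy. The sketch is our own minimal reading of the description above, not the implementation of (6): reference pattern `ref_a` is placed at the origin and `ref_b` at (d_ab, 0), and every other pattern is placed by the law of cosines so that its distances to both references are reproduced exactly. The function name and the pluggable `metric` argument are illustrative; any true metric (Euclidean, city block, maximum norm, Mahalanobis) satisfies the triangle inequality, which keeps the square root real, and the two references must be distinct patterns.

```python
import numpy as np


def rdp_coordinates(X, ref_a, ref_b, metric=None):
    """Map the rows of X onto the relative distance plane defined by rows ref_a and ref_b."""
    if metric is None:
        # Euclidean distance by default; a city-block or Mahalanobis metric can be passed instead
        metric = lambda u, v: float(np.linalg.norm(u - v))
    a, b = X[ref_a], X[ref_b]
    d_ab = metric(a, b)
    coords = np.empty((X.shape[0], 2))
    for i, x in enumerate(X):
        d_a, d_b = metric(x, a), metric(x, b)
        # x-coordinate from the law of cosines with ref_a at (0, 0) and ref_b at (d_ab, 0)
        u = (d_a**2 - d_b**2 + d_ab**2) / (2.0 * d_ab)
        # clip tiny negative values caused by floating-point rounding
        v = np.sqrt(max(d_a**2 - u**2, 0.0))
        coords[i] = (u, v)
    return coords
```

Calling `rdp_coordinates` for every pair of references (one from each class) and plotting the class labels in each resulting plane gives the multiple "perspectives" described above.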

2.2. Preprocessing 

Preprocessing enables the user to adapt (“tune”) the data so that the subsequent stages of the SCS are optimized. For spectra, whether MS or MR, we found that the most useful preprocessing approaches, alone or in combination, are normalization (“whitening,” or scaling to unit area), smoothing (filtering), and/or peak alignment (with respect to some internal or external reference). Various transformations of the spectra frequently lead to better classification. Examples include replacing the spectra by their (numerical) derivatives or by rank-ordered variants (the nonlinear rank-ordering replaces the original features by their ranks, thus minimizing the influence of accidentally large or small feature values), and combinations of these. Furthermore, creating differently preprocessed versions of the same dataset, selecting different sets of features from these (stage 3), and developing different classifiers using these feature sets (stage 4) facilitates the aggregation of these multiple classifiers for possibly increased accuracy (stage 5). The accuracy and reliability of the resulting classifier are also assessed by visualization of the results (stage 1). This demonstrates how the strategy uses the stages in an interactive, feedback fashion.
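The preprocessing options named above can be illustrated with NumPy. This is a minimal sketch, not the SCS software: normalization divides by the summed intensity (unit area, assuming non-negative intensities), smoothing is a plain moving average with an illustrative window of 5 points, the derivative is a finite-difference estimate from `np.gradient`, and ties in the rank transform are broken arbitrarily. Each returned version could feed a separate feature-selection and classifier run, as described for stages 3 to 5.

```python
import numpy as np


def normalize_unit_area(spectrum):
    """Scale a non-negative spectrum so that its intensities sum to 1."""
    return spectrum / spectrum.sum()


def moving_average(spectrum, window=5):
    """Smooth with a centered moving average; edge values are attenuated by zero padding."""
    kernel = np.ones(window) / window
    return np.convolve(spectrum, kernel, mode="same")


def rank_transform(spectrum):
    """Replace each intensity by its rank (1 = smallest)."""
    return np.argsort(np.argsort(spectrum)) + 1


def preprocessed_versions(spectrum, window=5):
    """Several differently preprocessed copies of one spectrum."""
    normalized = normalize_unit_area(spectrum)
    smoothed = moving_average(normalized, window)
    return {
        "normalized": normalized,
        "smoothed": smoothed,
        "first_derivative": np.gradient(smoothed),
        "ranks": rank_transform(normalized),
    }
```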

2.3. Feature Selection/Extraction 

In general, this stage is one of the two most important components of the 

SCS. It is essential not only for dimensionality reduction (which helps lift the curse of dimensionality) but also, when done properly, for arriving at biologically relevant and transparent interpretations of the data (“biomarker” identification). The driving force behind feature selection/extraction (FSE) is the goal of satisfying one of the two critical requirements for any reliable classifier development: lifting the curse of dimensionality.

Spectra, whether mass or MR, are peculiar: their “intrinsic dimensionality,” 

the number of independent, relevant features they possess, is generally much 

smaller than their original dimensionality. This is because spectra have many 

irrelevant features (“noise”), and adjacent features are strongly correlated. 

Some of these correlated features correspond to spectral peaks, representing 

small molecules (MRS), or small proteins, protein fragments, or peptides 

(PMS). Thus, it is clearly beneficial to eliminate irrelevant features and 

identify discriminatory peaks (potential “biomarkers”). For spectra, principal 

component analysis, a frequently used dimension reduction method (often the 

principal tool of many PMS data analysts), is doubly dangerous. First, it 

“scrambles” the original features, making discriminatory feature identification 

and selection problematic; second, since the principal components (PCs) are 

ordered according to the maximum variance explained in the data, there is no 

guarantee that the first few PCs are discriminatory for classification. Even if one were to choose the first M ≪ L PCs from the original, total L-term set, these are rarely the best discriminators. One could try selecting the m < M PCs that are optimal for classification (e.g., by exhaustive search), but our early experience indicates that some of the good discriminators are among the remaining PCs, k = M + 1, …, L. All these difficulties point to the need for a feature selection method specific to spectral data, one that preserves spectral interpretability.
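A small synthetic example makes the second danger concrete. In the sketch below (our own illustration with made-up data), feature 0 has large variance but the same distribution in both classes, while feature 1 has small variance but different class means. Because principal components are ordered by explained variance, PC1 follows the noisy feature and carries almost no class information, whereas the discriminatory signal ends up in PC2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
labels = np.repeat([0, 1], n // 2)
# Feature 0: large variance, identical in both classes (not discriminatory)
noise_feature = rng.normal(0.0, 10.0, size=n)
# Feature 1: small variance, class means differ by 2 (discriminatory)
signal_feature = rng.normal(0.0, 0.5, size=n) + 2.0 * labels
X = np.column_stack([noise_feature, signal_feature])

# PCA via SVD of the centered data; rows of Vt are ordered by explained variance
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc_scores = Xc @ Vt.T


def class_separation(values, labels):
    """Absolute difference of class means divided by the pooled standard deviation."""
    a, b = values[labels == 0], values[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return abs(a.mean() - b.mean()) / pooled_sd


sep_pc1 = class_separation(pc_scores[:, 0], labels)
sep_pc2 = class_separation(pc_scores[:, 1], labels)
print(f"separation along PC1: {sep_pc1:.2f}, along PC2: {sep_pc2:.2f}")
```

Keeping only "the first few PCs" here would discard exactly the direction that separates the classes.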

There are two generic approaches to feature selection (8). The filter method 

selects features without consideration of the classifiers to be used with these 

features. The wrapper (embedding) method finds optimal features while using the eventual classifier to guide the selection. We have developed a genetic algorithm-based optimal region selection (GA-ORS) method that finds discriminatory features without losing spectral interpretability (9).

The GA-ORS is based on the wrapper approach and is an example of feature extraction. It has the advantage that the spectral ranges found are averaged over adjacent data points (thus equivalent to peak area determination). Such averaging increases the signal-to-noise ratio, an added benefit. Within the GA-ORS suite of programs, one can also control the widths of the selected spectral subregions (discriminatory peaks); this helps to eliminate regions that appear to be discriminatory simply because of accidental differences in the “noise” regions due to the limited sample size (9,10).
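The core operation of GA-ORS, turning a set of spectral subregions into averaged features, can be sketched as below. This is an assumption-labeled illustration, not the GA-ORS code of (9): a candidate solution is represented as a list of hypothetical `(start, width)` index pairs, each region is averaged across adjacent data points, and regions outside an allowed width range are rejected. The genetic search itself (selection, crossover, and mutation of region lists, with a wrapper fitness such as the cross-validated accuracy of the eventual classifier) is not shown.

```python
import numpy as np


def region_features(spectra, regions, min_width=1, max_width=None):
    """Average each spectrum over contiguous index regions.

    spectra: array of shape (n_samples, n_points)
    regions: list of (start, width) pairs; a region covers points start .. start + width - 1
    Returns an array of shape (n_samples, len(regions)).
    """
    n_points = spectra.shape[1]
    columns = []
    for start, width in regions:
        if width < min_width or (max_width is not None and width > max_width):
            raise ValueError(f"region width {width} outside the allowed range")
        if start < 0 or start + width > n_points:
            raise ValueError(f"region ({start}, {width}) extends beyond the spectrum")
        # averaging adjacent points acts like a peak-area estimate and reduces noise
        columns.append(spectra[:, start:start + width].mean(axis=1))
    return np.column_stack(columns)
```

The `max_width` check mirrors, in simplified form, the width control described above for suppressing regions that look discriminatory only by chance.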

The GA-ORS has been very successful in identifying discriminatory subregions 

of MR, IR, and Raman spectra of biofluids and tissues, obtained for 

distinguishing between various diseases and disease states (1). 

In the context of feature selection, many proteomic mass spectroscopists first 

identify “relevant” peaks, sometimes in an ad hoc fashion, as possible contributors 

to discrimination. Although using all available “domain knowledge” is very 

important and should always be considered when available, it can also introduce 

bias, because of possible preconceived notions of what is relevant for discrimination. 

Our feature selection approach, sketched above, removes most of this bias by identifying hitherto unsuspected, novel discriminatory “peaks” or, more accurately, discriminatory spectral subregions. Furthermore, by its explicitly multivariate nature, GA-ORS tends to identify a “fingerprint,” a “panel” of peaks whose simultaneous interaction is necessary for discrimination.

When the multidimensional feature space does not arise from spectra, e.g., 

microarray data or preselected discrete peaks in PMS, for which averaging 

adjacent features is not meaningful, direct application of the GA-ORS methodology 

may not be appropriate [although we have used it as a preliminary, 

clustering-type feature selection “trick” (5)]. However, a search for optimal or near-optimal discriminatory feature subsets, exhaustive when possible and otherwise based on dynamic programming, is still feasible and is one of the options available in GA-ORS.

Figure 2 demonstrates the importance of feature selection and the value of interactive, feedback-mode visualization of data. For the two-class prostate cancer vs. healthy proteomic (mass spectral) dataset (11), we display a Euclidean distance-based mapping, either directly from the original 15,154 dimensions (left panel) or from five dimensions reduced via GA-ORS (right panel). Clearly, the success of class separation depends on the dimensionality of the feature space. When mapping from the original 15,154 dimensions, the optimal two-dimensional separation of the training set (TS; black disks for class 1, black crosses for class 2) and test set (VS; grey triangles for class 1, grey squares for class 2) misclassifies eight samples from the training set and nine from the independent test set. For the mapping from five dimensions, all samples are classified correctly (see Note 4).

Fig. 2. Prostate cancer: L2 (Euclidean) mapping from 15,154 dimensions (left panel) and from 5 dimensions (right panel). Mapping from the original 15,154 dimensions misclassified eight samples from the training set (TS; class 1, black disks; class 2, black crosses) and nine from the independent validation (test) set (VS; class 1, grey triangles; class 2, grey squares). The mapping from five dimensions classified all TS and VS samples correctly. The dashed lines shown are the optimal LDA separators.
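The reason given in Note 4 for using the Euclidean distance on the full spectra can be checked numerically. In the sketch below (random data with illustrative sizes, not the prostate dataset), a sample covariance matrix estimated from N samples has rank at most N − 1, so with more features than samples it is singular and cannot be inverted for a Mahalanobis distance; after reduction to a handful of features it becomes invertible.

```python
import numpy as np


def covariance_rank(X):
    """Rank of the sample covariance matrix of X (rows = samples, columns = features)."""
    return int(np.linalg.matrix_rank(np.cov(X, rowvar=False)))


rng = np.random.default_rng(0)
n_samples, n_features = 20, 100  # more features than samples, as with raw spectra
X = rng.normal(size=(n_samples, n_features))

full_rank = covariance_rank(X)            # at most n_samples - 1, so the matrix is singular
reduced_rank = covariance_rank(X[:, :5])  # 5 features from 20 samples: full rank
print(f"{n_features} features: rank {full_rank}; 5 features: rank {reduced_rank}")
```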

2.4. Robust Classifier Development 

There are two generally interrelated goals for supervised classifiers. First,

we want robust classifiers, i.e., with high generalization power. This is realized 

when the classifier classifies new, unknown “patterns” correctly and reliably. 

Second, we want to identify the smallest subset of maximally discriminatory 

features. Eventual disease management/treatment would benefit from having 

only a few, biologically relevant and interpretable features. Ideally, both classification 

goals should be achieved, especially in clinically relevant studies. 

Unfortunately, achieving the first goal is frequently at the expense of the 

second. A good example is the recent use of support vector machines (SVMs) 

for classification. These have become particularly popular because of their


persuasive theoretical foundations (12,13) (see Note 5). However, because SVMs project the data into even higher-dimensional feature spaces to achieve linear separability of the classes, identifying relevant, discriminatory features becomes more difficult.

The technical complexity and sophistication of the classifiers used range 

from the simplest correlation techniques, through k nearest neighbors, linear and 

quadratic discriminant analysis, decision trees, neural nets, etc., to (nonlinear) 

SVMs. However, the choice of classifier seems not to be dictated by the data 

to be classified, but rather by “expert” recommendation (usually based on other 

types of data), personal experience or preference, or simply software availability. 

The maxim “simpler is better” has mostly been ignored [see however 

(14)]. In general, no specific effort has been expended on choosing the most 

appropriate, optimal type of classifier for a given dataset. With a few exceptions, 

the proteomics (mass spectroscopy) community tends to use the “best” 

(i.e., the most sophisticated) classifier, whether appropriate or not! 

If the dataset is sufficiently large, the optimal approach for developing a robust classifier is to partition the data into a training set, a monitoring set, and a completely independent test (validation) set. Such partitioning is

required to prevent overfitting. This occurs when the classifier adapts itself too 

closely to the peculiarities of a training set that comprises a limited number 

of samples. Using a monitoring set helps decide when to stop training. The 

ultimate assessment of the classifier’s generalization capability is how well it 

does on the independent test set that was in no way involved in creating the 

classifier. 

Unfortunately, a sufficiently large sample size is a luxury rarely available to analysts of biomedical data. The only recourse is to use some version of cross-validation (CV) (15). CV comes in different flavors, each with its advantages and disadvantages. All of them are designed to deal with the bias introduced by using the entire dataset both to develop the “optimal” classifier and to estimate the classification error (see Note 6).
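Note 6 describes the simplest CV variant, leave-one-out (LOO). A minimal sketch follows, using a nearest-class-mean classifier purely as a hypothetical stand-in for whatever classifier is being developed. Any data-driven step that is part of classifier development, including feature selection, has to be repeated inside each fold; otherwise the left-out sample influences the classifier and the error estimate is biased in exactly the way CV is meant to avoid.

```python
import numpy as np


def nearest_mean_fit(X, y):
    """Store the mean feature vector of each class (hypothetical stand-in classifier)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means


def nearest_mean_predict(model, X):
    """Assign each row of X to the class with the nearest mean (squared Euclidean distance)."""
    classes, means = model
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]


def leave_one_out_accuracy(X, y, fit=nearest_mean_fit, predict=nearest_mean_predict):
    """Fraction of samples predicted correctly when each is left out of training in turn."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        # every data-driven step (including feature selection) belongs inside fit()
        model = fit(X[train], y[train])
        correct += int(predict(model, X[i:i + 1])[0] == y[i])
    return correct / n
```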

It is important to re-emphasize that, because of the typically small sample size of biomedical data, the best approach to robust classifier development is to select the simplest classifier possible. This suggests linear classifiers. Complex classifiers have too many parameters that need optimization, inevitably raising the specter of overfitting (see Note 7). Dimensionality reduction (FSE) is, of course, essential for obtaining an appropriate SFR, and recognizing the role of the SFR is important when developing classifiers. However, an essential caveat is that data sparsity can render any classification result statistically suspect, even if the SFR is satisfied (3). The importance of guaranteeing an appropriate SFR is increasingly recognized; the consequences of dataset sparsity, however, are still not appreciated (16).
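To make the preference for linear classifiers concrete, a two-class Fisher linear discriminant is sketched below. It is a textbook LDA rather than the author's software; equal class priors and a small ridge term added to the pooled covariance (for numerical stability) are assumptions of this sketch. It estimates a full L × L covariance matrix, so it is only usable after FSE has brought the number of features well below the number of samples.

```python
import numpy as np


def fit_lda(X, y, ridge=1e-6):
    """Two-class Fisher LDA with equal priors; returns (weights, threshold)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class covariance (L x L); needs far fewer features than samples
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    S = S + ridge * np.eye(X.shape[1])  # small ridge term for numerical stability
    w = np.linalg.solve(S, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0     # midpoint between the projected class means
    return w, threshold


def predict_lda(model, X):
    """Class 1 if the projection exceeds the threshold, otherwise class 0."""
    w, threshold = model
    return (X @ w > threshold).astype(int)
```

The model has only L + 1 parameters (the weights and the threshold), which is what keeps it comparatively resistant to overfitting at small sample sizes.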


The control of disparate sensitivities and specificities produced by classifiers when the dataset is imbalanced has particular clinical relevance (typically, there are many more samples from normal subjects than from patients with a particular disease), and tuning methods are needed for the classifiers developed. The standard method in the pattern recognition literature is either oversampling (taking multiple samples from the sparser class) or undersampling (taking a subset of the samples from the larger class), so that the sample sizes in the two classes become balanced, with the aim of making sensitivity (SE) approximately equal to specificity (SP). However, this approach fails quite frequently. Our approach is based on penalizing misclassification of members of the smaller class until SE ≈ SP (note that the penalty weight is generally not equal to the ratio of the class sizes).
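The idea of penalizing errors on the smaller class until SE ≈ SP can be sketched as a search over the penalty weight. This is a simplified stand-in, not the author's procedure: rather than retraining a classifier with a weighted loss, it keeps a fixed classifier score (higher meaning the smaller, positive class) and, for each candidate weight, picks the threshold that minimizes weight × false negatives + false positives. The weight grid is an arbitrary choice for illustration.

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred):
    """SE = fraction of positives called positive; SP = fraction of negatives called negative."""
    se = np.mean(y_pred[y_true == 1] == 1)
    sp = np.mean(y_pred[y_true == 0] == 0)
    return float(se), float(sp)


def weighted_threshold(scores, y_true, weight):
    """Threshold minimizing weight * FN + FP over the observed score values."""
    best_t, best_cost = None, np.inf
    for t in np.unique(scores):
        pred = (scores >= t).astype(int)
        fn = np.sum((pred == 0) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        cost = weight * fn + fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


def balance_by_penalty(scores, y_true, weights=np.linspace(1.0, 20.0, 77)):
    """Return (weight, threshold, SE, SP) for the weight giving the smallest |SE - SP|."""
    best = None
    for w in weights:
        t = weighted_threshold(scores, y_true, w)
        se, sp = sensitivity_specificity(y_true, (scores >= t).astype(int))
        if best is None or abs(se - sp) < best[0]:
            best = (abs(se - sp), float(w), float(t), se, sp)
    return best[1:]
```

With, say, 10 positives and 90 negatives, the weight that balances SE and SP need not equal the 9:1 class ratio, consistent with the remark above.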

2.5. Classifier Aggregation 

Clinically relevant classifiers require statistically significant class assignments 

for the samples. Thus, when a classifier’s assignment probability for a sample is “fuzzy” (e.g., less than 75% for a two-class problem), that assignment is not really useful from a clinical point of view. If the overall

accuracy of a classifier is low and the assignments are fuzzy, a multiple classifier 

strategy (classifier aggregation) can frequently be beneficial. The idea is to 

combine the outputs of several classifiers, with the expectation that the new 

classifier thus formed will be more accurate and less fuzzy than the best of the 

individual constituents. 

One of the requirements for accurate ensemble-based classifiers is diversity. 

It is believed that the component classifiers should be as different as possible. 

This can be achieved in several ways. One of these approaches used conceptually and methodologically very different classifiers [linear discriminant analysis (LDA), neural nets, and dynamic programming] on the same, unmodified

data (17). However, our more recent experiments and experiences suggest 

that classifier diversity is not necessarily required. Comparable accuracy can 

be achieved in a simpler way, by employing a single, simple classifier (e.g., 

LDA) and producing diversity using different transformations of the data (we 

have already discussed some of these in the context of feature selection). 

How are we to combine the outcomes of the various classifiers? Combination rules range from the simple majority rule to more complex, trainable rules, e.g., stacked generalization (SG) (18). SG uses the output probabilities of the constituent classifiers as input features for a new classifier. Boosting (19) is a very powerful version of a learnable classifier combination rule (see Note 8). It was used for identifying proteomic biomarkers for cancer detection (20). There are many classifier combination rules. When choosing

such a rule, it is important to take into account both sample size and classifier 

complexity.
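The two simplest combination schemes mentioned above can be sketched as follows: a majority vote over hard class labels, and an average of the class-1 probabilities that also flags "fuzzy" assignments using the 75% confidence level from the text. Both functions are illustrative; trainable rules such as stacked generalization or boosting would replace these fixed rules with a learned combiner and are not shown.

```python
import numpy as np


def majority_vote(labels):
    """labels: (n_classifiers, n_samples) array of 0/1 assignments; use an odd number of classifiers."""
    votes = labels.sum(axis=0)
    return (2 * votes > labels.shape[0]).astype(int)


def average_probabilities(probs, fuzzy_threshold=0.75):
    """probs: (n_classifiers, n_samples) array of P(class 1).

    Returns the combined labels and a boolean mask that is True where the
    combined assignment is confident enough not to be considered fuzzy.
    """
    mean_p = probs.mean(axis=0)
    labels = (mean_p >= 0.5).astype(int)
    confidence = np.maximum(mean_p, 1.0 - mean_p)
    return labels, confidence >= fuzzy_threshold
```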


3. Discussion 

Of course, experimental quality control is essential for good classifiers, i.e., 

those that have useful generalization properties. Much has been made of the 

“surprising” observation that different (or even the same) experimental groups, 

using different classifiers end up with totally different sets of discriminatory 

features (21). These are ascribed to various possible experimental differences in 

the spectral acquisition, etc. (22,23,24). Although these are indeed significant contributing factors, and must be considered and corrected, one loses sight of the important fact that, when nonunique discriminatory sets are found, they are as likely to be caused by dataset sparsity (3) as by differences in experimental protocols.

The initial euphoria is over: one cannot (or should not be able to) publish 

in prestigious journals (e.g., Science, Nature, Lancet, PNAS, etc.) proteomic 

results based on very limited sample sizes. Furthermore, even when there 

are enough data to produce a respectable classifier, high-impact journals are 

unlikely to accept a manuscript unless the results are independently validated. In 

particular, the chemical/biological identification of the discriminatory proteins, 

protein fragments, or peptides must accompany the classification results. This 

increased focus on establishing the clinical relevance of putative biomarkers 

is definitely a good sign. However, at this stage of the game, it is possibly 

premature, and one would prefer first to have a quick, noninvasive, reliable 

diagnostic/prognostic tool. To be clinically relevant, many more samples are 

required to develop such a tool (i.e., a sufficiently robust classifier; this 

requirement will likely rule out the reliable detection of rare diseases). Unfortunately, 

currently available sample sizes preclude the discovery of unique 

biomarker “fingerprints” of a disease. This nonuniqueness due to data sparsity 

leads inevitably to expensive, onerous, and unnecessary laboratory investigations 

to sift out medically relevant, unique subsets from the plethora of 

putative biomarkers found and suggested for various diseases. Understanding the biochemical causes is, of course, essential for, say, finding a possible cure, but should follow the diagnostic/prognostic stage. Despite such caveats, the proteomics field is maturing and, once the technical problems are successfully resolved, will undoubtedly provide important medical/clinical insights.

The author further suggests that the power of proteomic spectroscopy can be 

enhanced by the simultaneous consideration of other experimental modalities 

that complement PMS, especially MRS, which could identify smaller discriminatory 

compounds also present in biofluids. 

4. Notes 

1. Among these are correcting the nonflat baselines arising from the matrix material, peak alignment of the spectra, reconciling data acquired at different times, in different laboratories, and with mass spectrometers of different sensitivity, correcting high-frequency noise, etc. Proper experimental design, including rigorous quality assessment and control, is essential before any classifier development is attempted. Good discussions and summaries are given in (21,22,23,24).

2. The realization that some classification strategy is essential for the analysis of 

proteomic data is recent. That these strategies differ emphasizes not only that there is no best classifier, but also that no unique, best strategy exists either; different groups discovered different strategies that worked well for the data they analyzed (20,25). What the strategies have in common is that they are all multistage.

3. The data-driven nature of the SCS emphasizes that there is no simple, universal prescription for creating an optimal classifier (4); i.e., no simple, ready-made “recipe” is, or is likely to become, available.

4. This much-improved result underscores the importance of feature selection. Note that both mappings were done using the Euclidean distance. This was necessary because, before feature selection, the number of features exceeds the number of samples, so the sample covariance matrix is singular and any distance measure that involves its inversion (e.g., Mahalanobis) cannot be used. After feature selection, when there are fewer features than samples, much more powerful and relevant distance measures can be used. For a fair comparison, the Euclidean distance is used for both cases presented in Fig. 2 [for further improvements possibly obtainable with other distance measures, see (6)].

5. In practice, SVMs are not nearly as effective as suggested by theory. In fact, 

we have found (26) that a simple LDA classifier with wrapper-driven feature selection, applied to several publicly available proteomic mass spectral datasets and to six microarray datasets, generally outperformed a linear SVM, even when the latter was used with feature selection. Furthermore, SVM-based classifiers frequently produce distinctly unbalanced classification results: the accuracy obtained for one of the classes is usually considerably better than for the other. This imbalance between sensitivity and specificity is clinically relevant when one is trying to minimize false negatives and/or false positives.

6. Different variants of CV deal differently with the so-called bias-variance dilemma, which is particularly acute for datasets of limited sample size. The simplest version, the leave-one-out (LOO) method, removes one of the N samples, develops a classifier with the remaining N – 1 samples, and tests its prediction accuracy on the left-out sample. Cycling through all N samples yields N accuracy assessments. For small N (for which the data partition, as described in the main text, is not possible), LOO suffers from large variance, even though it minimizes the bias. K-fold CV is frequently used to balance bias and variance. The samples are partitioned into K roughly equal subsets; K – 1 subsets are used for training the classifier, while the left-out subset serves as the current test set. Cycling through the K partitions and then calculating the mean and standard deviation of the accuracies over the K test sets assesses how well, and how reliably, one can expect to classify new, unknown samples. K is typically chosen to be 5 or 10, whether or not the sample size warrants this choice. A more reasonable approach is to determine the best K via CV. Particularly powerful is Efron’s bootstrapping approach (15). It involves the entire dataset but uses random resampling with replacement to produce a large number of artificial datasets of the same size as the original. A classifier is created for each of these, and the outcomes are averaged. Bootstrapping is intended to reduce both bias and variance. Inspired by the bootstrapping concept, we have been using, with some success, its generalization (27).

7. Instead of the direct use of nonlinear classifiers, with the attendant optimization 

problems, a simple trick is to use nonlinear terms but retain the simplicity of a 

linear classifier. One approach we found useful is to first develop a linear classifier 

(with feature selection) and then augment the linear features by constructing from 

them nonlinear functions, say, quadratic terms. This, of course, increases the 

number of parameters to be determined. However, the problem remains linear in the augmented feature space, so linear classifiers can still be developed. Furthermore, our explicit approach produces new features that remain interpretable as interaction terms, unlike SVM classifiers, which map implicitly into a much higher-dimensional feature space without interpretability. In addition, we

can reduce the dimensionality of our augmented feature space by additional feature 

selection via exhaustive search, optimized by CV. 

8. Boosting requires “weak” base classifiers $C_j$, $j = 1, 2, \ldots, J$, that are combined into a more accurate composite classifier, $D_J = C_1 + C_2 + \cdots + C_J$. At stage $m$, the boosting algorithm carries out a weighted selection of a base classifier, given all previously chosen base classifiers. For the new base classifier $C_m$, larger weights are given to samples that are incorrectly classified by the current composite classifier $D_{m-1}$, so that $C_m$ will be chosen with a tendency to correctly classify the previously misclassified samples.
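Two of the steps listed in Note 1, baseline correction and high-frequency noise removal, can be illustrated with a minimal Python sketch. It assumes a spectrum stored as a one-dimensional NumPy intensity array sampled on a uniform m/z grid; the running-minimum baseline, the moving-average smoother, the function names, and the window sizes are illustrative assumptions, not the preprocessing used in this chapter.

```python
import numpy as np

def running_min_baseline(intensity: np.ndarray, window: int) -> np.ndarray:
    """Estimate a slowly varying baseline as the minimum over a centered window.

    `window` (in points, odd) is an assumed tuning parameter; it should be
    wider than the widest peak so that peaks are not absorbed into the baseline.
    """
    half = window // 2
    padded = np.pad(intensity, half, mode="edge")
    return np.array([padded[i:i + window].min() for i in range(intensity.size)])

def moving_average(x: np.ndarray, window: int) -> np.ndarray:
    """Suppress high-frequency noise with a centered moving average (odd window)."""
    half = window // 2
    kernel = np.ones(window) / window
    return np.convolve(np.pad(x, half, mode="edge"), kernel, mode="valid")

def preprocess(intensity: np.ndarray, baseline_window: int = 101,
               smooth_window: int = 5) -> np.ndarray:
    """Subtract the estimated baseline, then smooth the result."""
    corrected = intensity - running_min_baseline(intensity, baseline_window)
    return moving_average(corrected, smooth_window)

# Synthetic spectrum: sloping baseline + one Gaussian peak at index 500 + noise.
idx = np.arange(1000, dtype=float)
rng = np.random.default_rng(0)
spectrum = (50 + 0.05 * idx
            + 100 * np.exp(-0.5 * ((idx - 500) / 5) ** 2)
            + rng.normal(0, 1, idx.size))
clean = preprocess(spectrum)
print(clean.shape, int(np.argmax(clean)))
```

Because the window minimum always includes the point itself, the corrected spectrum is never negative; both window sizes would need tuning for a particular instrument and matrix.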
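The reason the Mahalanobis distance is unavailable before feature selection (Note 4) can be checked numerically. The sketch below assumes a pooled within-class covariance estimate and uses random data in place of real spectra; with 20 samples in two classes the pooled covariance has rank at most 18, so with 500 features it cannot be inverted, whereas after keeping a handful of features (chosen arbitrarily here, standing in for a real feature-selection step) it can. The function names are hypothetical.

```python
import numpy as np

def pooled_covariance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pooled within-class covariance estimate (an illustrative choice)."""
    classes = np.unique(y)
    scatter = np.zeros((X.shape[1], X.shape[1]))
    for c in classes:
        centered = X[y == c] - X[y == c].mean(axis=0)
        scatter += centered.T @ centered
    return scatter / (X.shape[0] - classes.size)

def mahalanobis(a: np.ndarray, b: np.ndarray, cov: np.ndarray) -> float:
    """Mahalanobis distance between two vectors; requires a nonsingular cov."""
    diff = a - b
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 500))   # 20 samples, 500 features: features >> samples
y = np.repeat([0, 1], 10)

full_cov = pooled_covariance(X, y)
print(np.linalg.matrix_rank(full_cov))   # at most 20 - 2 = 18, far below 500

selected = [0, 1, 2, 3, 4]   # hypothetical output of a feature-selection step
sel_cov = pooled_covariance(X[:, selected], y)
print(mahalanobis(X[0, selected], X[1, selected], sel_cov))
print(np.linalg.norm(X[0, selected] - X[1, selected]))   # Euclidean, for comparison
```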
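The sensitivity/specificity imbalance described in Note 5 is easy to overlook when only overall accuracy is reported. The following minimal sketch computes sensitivity, specificity, and their average (balanced accuracy) from true and predicted labels; the labels are invented for illustration, and class 1 is assumed to be the diseased (positive) class.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int],
                            positive: int = 1) -> Tuple[float, float]:
    """Fraction of positives and of negatives that are classified correctly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> float:
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return (sens + spec) / 2

# Hypothetical predictions for 10 diseased (1) and 10 healthy (0) samples.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 6 + [1] * 4
sens, spec = sensitivity_specificity(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={overall:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
# accuracy=0.75 sensitivity=0.90 specificity=0.60
```

An overall accuracy of 0.75 here hides a 0.30 gap between the two classes, which is exactly the kind of imbalance that matters clinically.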
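The resampling schemes of Note 6 can be sketched in a few lines of Python. To keep the example self-contained, a nearest-centroid classifier stands in for the LDA classifiers discussed in the chapter, and the bootstrap variant shown (train on each resample, test on the samples left out of it, the so-called out-of-bag samples) is one common choice, not necessarily the one the author used; all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(sq_dist, axis=1)]

def kfold_accuracies(X, y, k, seed=0):
    """Accuracy on each of k held-out folds; k = len(y) gives leave-one-out."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(y))
    accs = []
    for test in np.array_split(rng.permutation(all_idx), k):
        train = np.setdiff1d(all_idx, test)
        model = fit_centroids(X[train], y[train])
        accs.append(np.mean(predict_centroids(model, X[test]) == y[test]))
    return np.array(accs)

def bootstrap_oob_accuracies(X, y, n_boot=200, seed=0):
    """Train on resamples drawn with replacement; test on the out-of-bag samples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accs = []
    for _ in range(n_boot):
        boot = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), boot)
        if oob.size == 0 or np.unique(y[boot]).size < 2:
            continue  # skip degenerate resamples
        model = fit_centroids(X[boot], y[boot])
        accs.append(np.mean(predict_centroids(model, X[oob]) == y[oob]))
    return np.array(accs)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (15, 4)), rng.normal(3.0, 1.0, (15, 4))])
y = np.repeat([0, 1], 15)

for k in (5, 10, len(y)):   # len(y) folds = leave-one-out
    accs = kfold_accuracies(X, y, k)
    print(f"K={k:2d}: mean={accs.mean():.3f} sd={accs.std(ddof=1):.3f}")
boot = bootstrap_oob_accuracies(X, y)
print(f"bootstrap: mean={boot.mean():.3f} sd={boot.std(ddof=1):.3f}")
```

Because the folds are drawn at random, the K-fold figures change with the seed; reporting the spread across folds, and not only the mean, makes that variability visible.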
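The explicit quadratic augmentation of Note 7 can be sketched as follows. The sketch assumes the features have already been selected; it appends all squares and pairwise products with readable names, then fits an ordinary least-squares linear discriminant (an illustrative stand-in for the author's linear classifiers) on a synthetic XOR-like problem that no linear boundary separates in the original space. The feature names m1 and m2 and all function names are hypothetical.

```python
import itertools
import numpy as np

def augment_quadratic(X, names):
    """Append squares and pairwise products, keeping human-readable names."""
    columns, new_names = [X], list(names)
    for i, j in itertools.combinations_with_replacement(range(X.shape[1]), 2):
        columns.append((X[:, i] * X[:, j])[:, None])
        new_names.append(f"{names[i]}^2" if i == j else f"{names[i]}*{names[j]}")
    return np.hstack(columns), new_names

def fit_least_squares(X, y):
    """Least-squares linear discriminant with an intercept; y coded as -1/+1."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    w, *_ = np.linalg.lstsq(A, y.astype(float), rcond=None)
    return w

def predict_linear(w, X):
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.where(A @ w >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = np.where(X[:, 0] * X[:, 1] >= 0, 1, -1)   # XOR-like classes

Xq, q_names = augment_quadratic(X, ["m1", "m2"])
acc_linear = np.mean(predict_linear(fit_least_squares(X, y), X) == y)
w_quad = fit_least_squares(Xq, y)
acc_quad = np.mean(predict_linear(w_quad, Xq) == y)
print(q_names)   # ['m1', 'm2', 'm1^2', 'm1*m2', 'm2^2']
print(acc_linear, acc_quad)
# The largest coefficient (intercept excluded) should belong to m1*m2.
print(q_names[int(np.argmax(np.abs(w_quad[1:])))])
```

Because each added column carries a name such as m1*m2, the fitted coefficients can be read directly as interaction effects, which is the interpretability point Note 7 makes against implicit kernel mappings.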
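One widely used boosting algorithm that follows the scheme of Note 8 is AdaBoost. The sketch below is an illustration under stated assumptions, not the author's implementation: labels are coded as -1/+1, the weak base classifiers are single-feature threshold rules (decision stumps), and, unlike the unweighted sum written in Note 8, AdaBoost weights each base classifier by a coefficient alpha that grows as its weighted error shrinks. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def stump_predict(feature, threshold, sign, X):
    """Weak base classifier: +1 where sign * (x_feature - threshold) >= 0, else -1."""
    return np.where(sign * (X[:, feature] - threshold) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively choose the stump with the smallest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for sign in (1, -1):
                err = w[stump_predict(feature, threshold, sign, X) != y].sum()
                if err < best[0]:
                    best = (err, feature, threshold, sign)
    return best

def adaboost(X, y, n_rounds):
    """AdaBoost with labels y in {-1, +1}; returns a list of (alpha, stump) pairs."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(feature, threshold, sign, X)
        w = w * np.exp(-alpha * y * pred)       # misclassified samples gain weight
        w /= w.sum()
        ensemble.append((alpha, (feature, threshold, sign)))
    return ensemble

def boosted_predict(ensemble, X):
    """Sign of the alpha-weighted sum of base classifiers (the composite D_m)."""
    score = sum(alpha * stump_predict(*stump, X) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # oblique boundary: no single stump fits it

acc_single = np.mean(boosted_predict(adaboost(X, y, 1), X) == y)
acc_boosted = np.mean(boosted_predict(adaboost(X, y, 50), X) == y)
print(acc_single, acc_boosted)
```

With a single round the composite is just the best stump; additional rounds let later stumps concentrate on the samples near the oblique boundary that earlier stumps misclassified, which is the reweighting described in Note 8.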


Acknowledgments

The author thanks the entire Biomedical Informatics Group for their decade-long, essential contributions to the development of the algorithms and software described.

References 

1. Lean, C. L., Somorjai, R. L., Smith, I. C. P., Russell, P., Mountford, C. E. 

(2002) Accurate diagnosis and prognosis of human cancers by proton MRS and 

a three stage classification strategy. Annual Reports on NMR Spectroscopy 48, 

71–111. 

2. Somorjai, R. L., Dolenko, B., Nikulin, A., Nickerson, P., Rush, D., Shaw, A. et al. 

(2002) Distinguishing normal from rejecting renal allografts: application of a three-stage

classification strategy to MR and IR spectra of urine. Vibrational Spectroscopy

28, 97–102. 

3. Somorjai, R. L., Dolenko, B., Baumgartner, R. (2003) Class prediction and 

discovery using gene microarray and proteomics mass spectroscopy data: curses, 

caveats, cautions. Bioinformatics 19, 1484–1491. 

4. Huber, P. J. (1985) Projection pursuit. Annals of Statistics 13, 435–475.


5. Somorjai, R. L., Alexander, M., Baumgartner, R., Booth, S., Bowman, C., Demko, 

A., Dolenko, B., Mandelzweig, M., Nikulin, A. E., Pizzi, N., Pranckeviciene, 

E., Summers, R., Zhilkin, P. (2004) A data-driven, flexible machine learning 

strategy for the classification of biomedical data. In: Dubitzky, W. and Azuaje, F. 

(eds.) Artificial Intelligence Methods and Tools for Systems Biology, Chapter 5. 

Computational Biology Series, Vol. 5. Springer, pp. 67–85. 

6. Somorjai, R. L., Demko, A., Mandelzweig, M., Dolenko, B., Nikulin, A. E., 

Baumgartner, R. et al. (2004) Mapping high-dimensional data onto a relative 

distance plane – a novel, exact method for visualizing and characterizing high-dimensional

patterns. Journal of Biomedical Informatics 37, 366–379. 

7. Anderson, T. W., Bahadur, R. R. (1962) Classification into two multivariate normal 

distributions with different covariance matrices. Annals of Mathematical Statistics 

33, 420–431. 

8. Kohavi, R., John, G. H. (1997) Wrappers for feature subset selection. Artificial

Intelligence 97, 273–324.

9. Nikulin, A. E., Dolenko, B., Bezabeh, T., Somorjai, R. L. (1998) Near-optimal 

region selection for feature space reduction: novel preprocessing methods for 

classifying MR spectra. NMR in Biomedicine 11, 209–217. 

10. Li, J., Zhang, Zh., Rosenzweig, J., Wang, Y. Y., Chan, D. W. (2002) Proteomics 

and bioinformatics approaches for identification of serum biomarkers to detect 

breast cancer. Clinical Chemistry 48, 1296–1304. 

11. Dataset “JNCI-7-3-02,” downloaded from the NIH/FDA Clinical Proteomics 

Program Databank (http://clinicalproteomics.steem.com). 

12. Vapnik, V. N. (2000) The nature of statistical learning theory, 2nd edition, Statistics 

for Engineering and Information Science. Springer, New York. 

13. Schölkopf, B., Smola, A. J. (2002) Learning with Kernels. Support Vector 

Machines, Regularization, and Beyond. The MIT Press, Cambridge, Mass. 

14. Lee, K. R., Lin, X., Park, D. C., Eslava, S. (2003) Megavariate data analysis 

of mass spectrometric proteomics data using latent variable projection method. 


15. Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, 

Philadelphia. 

16. Diamandis, E. P. (2003) Proteomic patterns in biological fluids: do they represent 

the future of cancer diagnostics? Clinical Chemistry 49(8), 1272–1278.

17. Somorjai, R. L., Nikulin, A. E., Pizzi, N., Jackson, D., Scarth, G., Dolenko, B., 

Gordon, H., Russell, P., Lean, C. L., Delbridge, L., Mountford, C. E., Smith, I.

C. P. (1995) Computerized consensus diagnosis: a classification strategy for the 

robust analysis of MR spectra. I. Application to ¹H spectra of thyroid neoplasms.

Magnetic Resonance in Medicine 33, 257–263. 

18. Wolpert, D. H. (1992) Stacked generalization. Neural Networks 5, 241–259. 

19. Schapire, R. E. (1990) The strength of weak learnability. Machine Learning 5,

197–227. 

20. Yasui, Y., Pepe, M., Thomson, M. L., Adam, B.-L., Wright Jr., G. L., Qu, Y., 

Potter, J. D., Winget, M., Thornquist, M., Feng, Z. (2003) A data-analytic strategy


for protein biomarker discovery: profiling of high-dimensional data for cancer 

detection. Biostatistics 3, 449–463. 

21. Diamandis, E. P. (2004) Mass spectrometry as a diagnostic and a cancer biomarker 

discovery tool. Molecular and Cellular Proteomics 3(4), 367–378. 

22. Baggerly, K. A., Morris, J. S., Coombes, K. (2004) Cautions about reproducibility 

in mass spectrometry patterns: joint analysis of several proteomic data sets. Bioinformatics 

20, 777–785. 

23. Hu, J., Coombes, K. R., Morris, J. S., Baggerly, K. A. (2005) The importance 

of experimental design in mass spectrometry experiments: some cautionary tales. 

Briefings in Functional Genomics and Proteomics 3(4), 322–331. 

24. Shin, H., Markey, M. K. (2006) A machine learning perspective on the development

of clinical decision support systems utilizing mass spectra of blood samples. 

Journal of Biomedical Informatics 39, 2237–2248. 

25. Zhu, W., Wang, X., Ma, Y., Rao, M., Glimm, J., Kovach, J. S. (2003) Detection of 

cancer-specific markers amid massive mass spectral data. Proceedings of the National

Academy of Sciences USA 100(25), 14666–14671.

26. Somorjai, R. L., Pranckeviciene, E. (2006) (Unpublished).

27. Somorjai, R. L., Dolenko, B., Nikulin, A., Nickerson, P., Rush, D., Shaw, A., De 

Glogowski, M., Rendell, J., Deslauriers, R. (2002) Distinguishing normal from 

rejecting renal allografts: application of a three-stage classification strategy to MR 

and IR spectra of urine. Vibrational Spectroscopy 28, 97–102.

Index 

Affi-gel Protein A MAPS II kit, 277 

Aflatoxin B1 (AFB1), 194 

Alkaline phosphatase (ALP) assay, 233, 237 

Alpha-fetoprotein, 194 

Alzheimer’s disease, 310 

Annexin V, 172 

ANOVA, analysis of variance, 100, 112, 114, 259, 

330, 335, 344 

Antibody arrays 

construction, 270–272 

direct labeling methods, for cancer diagnostics, 

268–269 

formats for, 264–266 

labeling and hybridization, of serum samples, 

269–270, 272–274 

and other proteomic strategies, 263–264 

planar, labeling-hybridization methods and, 

266–268 

printing, 269 

scanning and data analysis, 274 

Anti-SAPE antibody, 267 

ArrayQuant scanners, 281 

AutoPix™, 48. See also Laser-capture

microdissection 

Axon scanners, 281 

Bayesian classification methods. See Linear 

Discriminant Analysis 

Bayes’s rule, 300 

BCA 200 Protein Assay Kit, 277 

Bead-based multiplex assays. See also Suspension 

antibody microarrays 

detection antibody, 254 

diluents, 254 

general protocol for, 254–255 

sample preparation, 252–254 

screening protocol, 255–256 

Biological variation analysis (BVA) module, of 

DeCyder, 112–113 

“Biomarker panel,” 11 

Bio-Rad Micro Bio-Spin P30 column, 277 

Biotinyl-tyramide, 275 


BLAST, 352, 358 

Blood samples, preanalytical phase 

collection of, 36 

processing of, 37–38 

protease inhibitors, 38 

serum and plasma specimens, characteristics of, 

36–37 

Bradford assay, 225 

Carboxylated beads, 249. See also Suspension

antibody microarrays

activation, 251 

antibodies coupling to activated, 251 

cell-counting chamber and, 252 

washing and storage of coupled, 251 

1-(5-Carboxypentyl)-1-methylindodicarbocyanine

halide (Cy5) N-hydroxy-succinimidyl 

ester, 163 

1-(5-Carboxypentyl)-1-propylindocarbocyanine 

halide (Cy3) N-hydroxy-succinimidyl 

ester, 163 

CAST. See Clustering Affinity Search Technique 

Celecoxib, and cyclooxygenase-2 (COX-2), 183 

Charge-coupled device (CCD) camera-based

imaging system, 268, 293, 332 

CIMminer (Clustered Image Maps), 259 

Cleavable isotope-coded affinity tag (cICAT) 

labeling technology, 195, 197, 200–201 

Clinical proteomics, 1 

biological specimens, 6–7 

biomarker discovery and, 9–14 

overview and scope of, 2–3 

sample specimens and processing techniques, 4–9 

Cluster analysis techniques, 297–299, 306 

gene expression-based, 307 

Clustering Affinity Search Technique, 259 

Coomassie brilliant blue (CBB) staining, 68, 

332, 339 

Creatinine assay, 142 

Cyanines (Cy3/Cy5), 264, 333 

Cyclooxygenase-2 (COX-2) and celecoxib, 183

398 Index 

CyDye labeling, 95, 105–106, 109–110. See also 

Difference gel electrophoresis (DIGE) 

technology 

Cy2-labeled internal standard, 98–99 

minimal labeling method, 96 

pooled-sample internal standard for, 107 

saturation labeling, 96 

Cy3-labeled streptavidin, 267 

Cytokeratin 19 (CK19), 163 

DA-PLS method. See Discriminant analysis–partial 

least squares method 

DeCyder software, 101, 112–113, 342. See also 

Difference gel electrophoresis (DIGE) 

technology 

Delayed extraction-matrix assisted laser 

desorption/ionization time-of-flight mass 

spectrometry (DE-MALDI-TOF-MS), 194 

Dendrogram, 297, 299 

Dialysis, 150. See also Urine protein profiling, by 

2DE and MALDI-TOF-MS 

Difference gel electrophoresis (DIGE) technology, 

78, 93, 330, 332–333, 342–345 

ANOVA, 100, 112, 114 

in clinical setting, 103 

CyDye labeling, 95, 105–106, 109–110 

Cy2-labeled internal standard, 98–99 

minimal labeling method, 96 

pooled-sample internal standard for, 107 

saturation labeling, 96 

DeCyder suite of software tools, 101, 112–113 

2D gel electrophoresis and poststaining, 94, 

110–111 

experimental design, 108–109 

and statistical confidence, 112–114 

extended data analysis (EDA) software module, 

101, 113 

false discovery rate (FDR), 100 

hierarchical clustering (HC), 102 

labeling materials, 104–105 

LCM and, 163–170 

MeOH/CHCl₃ protocol, 106

MuDPIT, 97 

multivariate statistical analysis, 114–115 

principal component analysis, 101

SDS-polyacrylamide gel electrophoresis, 104 

software algorithms, 111–112 

Student’s t-test, 100, 112, 114 

DIGE/MS analysis, 103, 115 

Direct labeling, 264, 268 

protocol for, 272–274 

Discriminant analysis–partial least squares method, 

306, 309–311 

Discrimination power (DP), 303–305 

Dithiothreitol (DTT), 68 

Dot-plot style alignment, of protein sequence, 

358–359 

DTT/IAA equilibration procedure, 73 

ECM. See Extracellular matrix 

EDA software. See Extended data analysis software 

EDC/Sulfo-NHS, 249. See also Suspension

antibody microarrays

2DE-MALDI-TOF-MS assay, 194 

EnsEmbl, 352, 356 

Escherichia coli, 307 

Ethylene vinyl acetate (EVA) polymer, 161 

Ettan 2D electrophoresis system, 110 

Exosomes, 142 

ExPASy proteomics tools, 202, 352 

Expressed sequence tags (ESTs), 357 

Extended data analysis software, 101, 113 

Extracellular matrix, 8 

and matrix vesicles (MVs) proteomes, MS and, 

231–232 

alkaline phosphatase assay, 234, 237 

immunofluorescence staining and, 235, 239 

MC3T3-E1, osteoblast cell line, 233, 

236–237, 239 

nanoRPLC-MS/MS, 235, 238–239 

strong cation exchange liquid chromatography, 

of peptides, 234–235, 238 

Extracted ion chromatogram, 219, 221–222, 224 

Fetal bovine serum (FBS), 254 

Fisher’s F-test, 302 

Flow cytometric analysis, 160 

Fluorophores, 264, 267 

photobleaching and quenching of, 274–275 

Fourier transform mass spectrometry (FTMS),

172–174 

Free flow electrophoresis (FFE), plasma samples 

fractionation and, 60–61, 67 

Frontotemporal dementia, 310 

GAORS method. See Genetic algorithm-based 

optimal region selection method 

2D Gaussian function, 312 

Gaussian multivariate probability distribution, 300 

2-D Gel-electrophoresis (2-D GE), 292. See also 

2D-PAGE maps analysis 

LCM cells analysis by, 77 

HER-2/neu positive and -negative breast 

tumors, 87–88


isoelectric focusing (IEF), 79–80, 83–84 

MASCOT search engine, 87 

paraffin-embedded sections staining, 81–82 

preparation and analysis, 61, 67–69 

protein sample preparation, 79, 82–83 

SDS-PAGE, 79–80, 84–85 

silver staining and image analysis, 80, 85–86 

tissue block and tissue section preparation, 

78–79, 81 

trypsin digestion and MS analysis, 80, 86–87 

Gel-free mass spectrometry and LCM, 171–172 

Gene expression microarrays, 45 

GenePix Pro 3.0 software program, 280–281 

GeneScan program, 356 

Genetic algorithm-based optimal region selection 

method, 387–388. See also Proteomic mass 

spectroscopy 

gp96, tumor rejection antigen, 169 

GRANTA-519, 308 

HCC. See Hepatocellular carcinoma 

HCL. See Hierarchical clustering 

Hematoxylin and eosin (H&E) staining, tissue 

sample collection, 44, 47–48 

Hepatitis B/C virus (HBV/HCV), 194 

Hepatocellular carcinoma, 8, 11, 59, 67, 163, 

170, 193 

qualitative and quantitative proteomic analysis of 

cICAT labeling technology, 195, 197, 200–201 

2DE-MALDI-TOF-MS assay, 194 

2D-LC-MS/MS for, 195–197, 201–202 

ExPASy proteomics tools, 202 

LCM for, 194–196, 199 

nonenzymatic method (NESP), 196, 198–199 

toluidine blue removal and protein mixture

digestion, 197, 199–200 

HERMeS software package, PCA and, 306 

HER-2/neu oncogene, 85–86, 163 

Hierarchical clustering, 259, 299. See also Cluster 

analysis techniques 

High performance liquid chromatography, 169, 171, 

183, 212–214 

Horseradish peroxidase (HRP), 267 

HPLC. See High performance liquid

chromatography

HSP27 protein, 103 

HT-29, COX-2 expressing colon cancer cell 

line, 183 

Human Proteome Organization, 143 

Hydrogels, 271. See also Antibody arrays 

ICAT labeling. See Isotope-coded affinity tag 

labeling 

IMAC-Cu²⁺ ProteinChips, 134, 136

Image analysis. See also 2D-PAGE maps analysis 

by fuzzy logic principles 

image defuzzyfication, 312 

image digitalization, 311–312 

multi-dimensional scaling (MDS), 315–317 

PCA and classification methods, 315 

refuzzyfication, 312–313 

moment functions, 317 

Legendre moments, 318–319 

Image Master Platinum software, 339, 341 

Immobilized pH gradient strip. See also 

Two-dimensional electrophoresis (2DE) 

isoelectric focusing (IEF) with, 60, 65 

rehydration of, 64–65 

Immunofluorescence staining, 235 

InterPro, 352, 361 

Iodoacetamide (IAA), 68 

IPG strip. See Immobilized pH gradient strip 

Isotope-coded affinity tag labeling, 78, 195 

mass spectrometry (MS) and, 181 

celecoxib, cyclooxygenase-2 (COX-2) 

and, 183 

cell culture and harvest, 183, 186 

cell lysis, desalting, and protein quantitation, 

184–187 

cleavable reagents, 182, 185, 187–188 

cleaving biotin, 186, 189 

labeled peptides purification, 185–186, 

188–189 

proteins, denaturation and reduction of, 

185, 187 

quantitative proteomic analysis and, 184 

Java Runtime Environment, 370. See also 

msInspect, for LC-MS data analysis 

KMC (K-Means/K-Medians Clustering), 259 

Kolmogorov–Smirnov test, 335, 339, 341 

Kruskal–Wallis test, 335 

Laser-capture microdissection, 8, 44–45, 160. See 

also Tissue sample collection, for proteomics 

analysis 

AutoPix™, 48

cells analysis, by 2-D GE, 77 

HER-2/neu positive and -negative breast 

tumors, 87–88 

isoelectric focusing (IEF), 79–80, 83–84


MASCOT search engine, 87 

paraffin-embedded sections staining, 81–82 

protein sample preparation, 79, 82–83 

SDS-PAGE, 79–80, 84–85 

silver staining and image analysis, 80, 85–86 

tissue block and tissue section preparation, 

78–79, 81 

trypsin digestion and MS analysis, 80, 86–87 

development, 161 

different labeling techniques and, 170 

DIGE and, 163–170 

and 2-D GE, 162–163 

gel-free mass spectrometry and, 171–172 

for HCC and non-HCC hepatocytes isolation, 

194–195, 199 

LCM lysate, 49–50 

and mass spectrometry analysis, 172–174 

PixCell II instrument, 48–49, 161 

and protein chip technology, 172 

separation methods and, 171 

for tissue sample collection, 44–45 

Veritas™, 48

Laser microdissection and pressure catapulting, 8 

LC-ESI-MS/MS. See Liquid 

chromatography-electrospray ionization 

tandem mass spectrometry 

LCM. See Laser-capture microdissection 

LC-MS data. See Liquid chromatography-mass 

spectrometry data 

LC-MS/MS. See Liquid chromatography-tandem

mass spectrometry

LDA. See Linear Discriminant Analysis 

Legendre moments, 317–319 

Levene’s test, 334 

Linear Discriminant Analysis, 300–301, 

315–316 

Liquid chromatography-mass spectrometry data, 

370, 374–376, 377 

Liquid chromatography-mass spectrometry data 

analysis, msInspect for, 369 

data viewing and navigation, 371–373 

locating peptides in, 373–376 

low-quality peptides, elimination of, 376 

peptide quantitation, 376–378 

software installation for, 370 

Liquid chromatography-tandem mass spectrometry, 

170, 171 

label-free, for biomarker identification, 209–210 

albumin/IgG depletion, 211–213 

chromatographic alignment, 218–221 

data transformation and normalization, 222 

HPLC, 212–214 

mass spectrometer, 212, 214 

MS/MS spectral filtering, 216–217 

peptide identification, 217–218 

peptide quantification, 221–222 

statistical analysis, 223 

zoom scan data processing, 214–216 

LMPC. See Laser microdissection and pressure 

catapulting 

two-dimensional (2D-LC/MS/MS), 78 

Lysine labeling, 169 

MALDI/SELDI protein profiling, of serum, 

125–126 

on MALDI-TOF–TOF 

data collection, 131–132 

MB fractionation, of human serum, 131 

protein identification by, 132–133 

MB-based fractionation, 127, 128, 131 

SELDI and MALDI spectra acquisition, 129 

SELDI ProteinChip, 130 

(Magnetic bead based) 

on SELDI-TOF, 133 

ProteinChip arrays, 134–135 

SPA matrix addition, 135 

spectra collection on, 135–138 

MALDI-TOF-MS. See Matrix-assisted laser 

desorption time of flight mass spectrometry 

MALDI-TOF, peptide mass fingerprinting (PMF) 

and, 62, 71 

MALDI-TOF–TOF, serum protein profiling on 

data collection, 131–132 

MB fractionation, of human serum, 131 

protein identification by, 132–133 

Maleimide labeling, of cysteine 

sulfhydryls, 96 

MARS. See Multiple affinity removal system 

MASCOT software, 81, 87–88 

Mass spectrometry, 58–59, 214 

ICAT labeling and, 181 

celecoxib, cyclooxygenase-2 (COX-2) 

and, 183 

cell culture and harvest, 183, 186 

cell lysis, desalting, and protein quantitation, 

184–187 

cleavable reagents, 182, 185, 

187–188 

cleaving biotin, 186, 189 

labeled peptides purification, 185–186, 

188–189 

proteins, denaturation and reduction of, 

185, 187 

quantitative proteomic analysis and, 184 

LCM and, 172–174


Matrix-assisted laser desorption time of flight mass 

spectrometry, 125–126, 142, 163, 194 

LCM and, 171 

for urine protein profiling. See Urine protein 

profiling, by 2DE and MALDI-TOF-MS 

MAVER-1 cell lines, 308 

MC3T3-E1, osteoblast cell line, 233, 236–237, 239 

MDS technique. See Multi-dimensional scaling 

techniques 

MeOH/CHCl₃ protocol, 106

Metalloproteins, 350 

MicroSol-IEF, ZOOM®, 60, 65–66

Miniaturized parallelized sandwich immunoassays. 

See Suspension antibody microarrays 

MS. See Mass spectrometry 

MS-Fit software, 81 

msInspect, for LC-MS data analysis, 369 

data viewing and navigation, 371–373 

locating peptides in, 373–376 

low-quality peptides, elimination, 376 

peptide quantitation, 376–378 

software installation for, 370 

MS/MS spectral filtering, 216–217 

Multi-dimensional scaling techniques, 313, 315–317 

MultiExperiment Viewer (MeV), 259 

Multiple affinity removal system, 59, 63–64 

Multiplexed bead-based flow-cytometry assays, 266 

Nanoflow reversed-phase LC-tandem mass 

spectrometry (nanoRPLC-MS/MS), 233, 235, 

238–239 

Non-enzymatic sample preparation (NESP), 194, 

196, 198–199 

One-antibody label-based assays, 264–266 

One-dimensional liquid chromatography coupled 

with tandem mass spectrometry 

(1D-LC-MS/MS), 201–202. See also 

Hepatocellular carcinoma 

¹⁶O/¹⁸O isotopic labeling, 78

Osteoblasts, 232. See also Extracellular matrix 

MC3T3-E1, 233, 236–237, 239 

2D-PAGE maps analysis, 291 

dedicated software packages and, 292–294 

image analysis 

fuzzy logic, 311–317 

moment functions, 317–319 

spot volume datasets, analysis of, 294 

cluster analysis, 297–299 

DA-PLS method, 309–311 

linear discriminant analysis, 300–301 

pattern recognition methods, 306–309 

PLS regression and DA-PLS regression, 306 

principal component analysis, 294–297 

SIMCA method, 301–305 

PALM microlaser dissector, 161 

Parkinson’s disease, 310 

Partial least squares regression, 306, 308, 338 

Pattern recognition methods 

cluster analysis. See Cluster analysis techniques 

PCA. See Principal component analysis

proteomic mass spectroscopy and. See Proteomic 

mass spectroscopy 

SIMCA classification. See Soft-independent 

model of class analogy method 

PCA. See Principal component analysis

PCa-24 protein, in epithelial cells, 172 

PDB. See Protein data bank 

PDQuest system, 293, 308 

Peptide mass fingerprinting, MALDI-TOF and, 

62, 71 

Peptide/protein separation system, 171 

PerkinElmer scanners, 281 

Pfam, 352, 360 

PIN. See Prostatic intraepithelial neoplasia 

PIVKA-II, 194 

PixCell II system, 48–49, 77, 82–83, 161. See also 

Laser-capture microdissection 

Planar antibody arrays, 248, 264. See also Antibody 

arrays 

main formats of, 265 

types of, labeling-hybridization methods and, 

266–268 

10plex soluble receptor assay, 255–256, 258. See 

also Bead-based multiplex assays 

PLS regression. See Partial least squares regression 

PMF. See Peptide mass fingerprinting 

PMS. See Proteomic mass spectroscopy 

Position-specific scoring matrix, 361 

Post-translational modification (PTM) profiling, on 

selected spots, 71–72 

Principal component analysis, 101, 259, 294–297,

308, 315–316, 343. See also 2D-PAGE maps 

analysis 

Escherichia coli, 307 

for explorative data analysis, 336–338 

in HERMeS software package, 306 

U937 human lymphoma cell line and, 307 

Prostatic intraepithelial neoplasia, 44 

Protein chip technology and LCM, 172 

Protein data bank, 352, 360–361 

Protein precipitation, 143–144


Protein profiling of human plasma samples, by

two-dimensional electrophoresis, 57 

coomassie brilliant blue G-250 staining, 68 

destaining, in-gel deglycosylation and in-gel 

tryptic digestion, 61–62, 69 

2D gels preparation and analysis, 61, 67–69 

difference in gel electrophoresis (DIGE) 

system, 59 

free flow electrophoresis (FFE), samples 

fractionation by, 60–61, 67 

high-abundance proteins depletion, by 

immunoaffinity column, 59, 63–64 

HPPP, 58 

IPG gel strip rehydration, 64–65 

isoelectric focusing (IEF), with IPG strip, 

60, 65 

MALDI plating and peptides desalting, 

62, 69–71 

mass spectrometry (MS), 58–59 

microscale solution isoelectric focusing, 

ZOOM®, 60, 65–66

peptide mass fingerprinting, MALDI-TOF and, 

62, 71 

PTMs profiling, on selected spots, 

71–72 

samples preparation, 59, 62 

TCA/acetone precipitation, 64 

Proteomic data, statistical analysis, 327 

classical dyes, 339–342 

confirmatory univariate data analysis, 333–335 

DIGE approach, 342–345 

experimental design for, 328 

data processing, 330–333 

pooling, 330 

replicates, 329–330 

exploratory multivariate data analysis, 335 

marker selection, 338–339 

principal component analysis, 336–338 

Proteomic mass spectroscopy, 383 

statistical classification strategy (SCS) for 

classifier aggregation, 390 

data visualization, 384–385 

feature selection/extraction (FSE), 386–388 

preprocessing, 385–386 

robust classifier development, 388–390 

Proteomics analysis, for tissue sample collection 

formalin fixation, 43–44 

hematoxylin staining, 47–48 

immunocapture procedure, 46 

immunofluorescence staining, 48 

laser-capture microdissection (LCM), 44–45 


PixCell II instrument, 48–49 



SELDI-TOF-MS, 46 

PSSM. See Position-specific scoring matrix 

QTC (QT CLUST), 260 

Resonance light scattering (RLS), 268 

Reverse protein arrays, 268 

Rolling-circle amplification (RCA), 268 

SCX-LC. See Strong cation exchange liquid

chromatography

SDS-PAGE. See Sodium dodecyl 

sulfate-polyacrylamide gel electrophoresis 

SELDI. See Surface-enhanced laser 

desorption/ionization 

SELDI-TOF. See Surface-enhanced laser 

desorption/ionization time-of-flight 

Self Organizing Maps (SOM), 259 

Self Organizing Tree Algorithm (SOTA), 259 

Shapiro-Wilk test, 334, 339 

Significance Analysis of Microarrays (SAM), 259 

Silver staining, 80, 332–333. See also Laser-capture

microdissection

and image analysis, 85–86 

SIMCA method. See Soft-independent model of 

class analogy method 

SKBR-3, breast cancer cell line, 171 

Sodium dodecyl sulfate-polyacrylamide gel 

electrophoresis, 84–85, 94, 96, 

104, 110–111 

isoelectric focusing (IEF) and, 79–80 

PROTEAN II xi Cell system (Bio-Rad) for, 84 

Soft-independent model of class analogy method, 

301–305, 307–308 

Streptavidin-R-Phycoerythrin (SAPE), 267 

Strong cation exchange liquid chromatography, 

234–235, 238 

Strong cation exchange liquid chromatography, of 

peptides, 233, 234–235, 238 

Student’s T-test, 334 

2-(4-Sulfophenylazo)-1,8-dihydroxy-3,6-naphthalenedisulfonic

acid (SPADNS), 60, 67

Support vector machines, 388–389. See also 

Proteomic mass spectroscopy 

Surface-enhanced laser desorption/ionization, 9, 13, 

125–126, 142, 172, 194 

serum protein profiling on, 133 

ProteinChip arrays, 134–135 

SPA matrix addition, 135 

spectra collection on, 135–138


Suspension antibody microarrays, 247–248 

bead-based multiplex assays processing, 

252–256 

limit of detection (LOD), 257 

miniaturized multiplexed protein assays, 

analytical performance, 256–259 

pattern generation, 259–260 

principle of, 249 

production, coupling to carboxylated 

microspheres, 249–252 

SVMs. See Support vector machines 

TAAs arrays. See Tumor-associated antigen arrays 

TCA/acetone precipitation, 2DE and, 64 

Tissue sample collection, for proteomics analysis 

formalin fixation, 43–44 

hematoxylin staining, 47–48 

immunocapture procedure, 46 

immunofluorescence staining, 48 

laser-capture microdissection (LCM), 44–45 


PixCell II instrument, 48–49 



SELDI-TOF-MS, 46 

Tributylphosphine (TBP), 68 

Trichloroacetic acid (TCA) precipitation, 143–144, 

146–147, 151 

Trifluoroacetic acid (TFA), 182 

Tris buffer, 277 

TTEST (T-tests), 259 

Tumor-associated antigen arrays, 266, 269 

Two-dimensional electrophoresis (2DE), 11, 

194, 328 

biological replicates, 329–330 

LCM and, 162–163 

for protein profiling of human plasma 

samples, 57 

coomassie brilliant blue G-250 staining, 68 

destaining, in-gel deglycosylation and in-gel 

tryptic digestion, 61–62, 69 

2D gels preparation and analysis, 61, 67–69 

difference in gel electrophoresis (DIGE) 

system, 59 

free flow electrophoresis (FFE), samples 

fractionation by, 60–61, 67 

high-abundance proteins depletion, by 

immunoaffinity column, 59, 63–64 

HPPP, 58 

IPG gel strip rehydration, 64–65 

isoelectric focusing (IEF), with IPG strip, 

60, 65 

MALDI plating and peptides desalting, 62, 

69–71 

mass spectrometry (MS), 58–59 

microscale solution isoelectric focusing, 

ZOOM®, 60, 65–66

peptide mass fingerprinting, MALDI-TOF and, 

62, 71 

PTMs profiling, on selected spots, 71–72 

samples preparation, 59, 62 

TCA/acetone precipitation, 64 

technical replicates, 329–330 

for urine protein profiling. See Urine protein 

profiling, by 2DE and MALDI-TOF-MS 

Two-dimensional fluorescence difference gel 

electrophoresis (2-D DIGE), 78. See also

Difference gel electrophoresis (DIGE)

technology 

Two-dimensional liquid chromatography tandem 

mass spectrometry (2D-LC-MS/MS), 78, 170.

See also Liquid chromatography-tandem mass

spectrometry 

for HCC and non-HCC hepatocytes isolation, 

195–197, 201–202 

Two-dimensional polyacrylamide gel 

electrophoresis (2D PAGE), 

162–163, 174. See also 2D gel electrophoresis,

2D gels 

Two-factor ANOVA (TFA), 259 

Ultrafiltration technique, 144 

Urine protein profiling, by 2DE and 

MALDI-TOF-MS, 141–142 

analytical/profiling techniques, 145–146 

organic solvent precipitation protocol, 145, 

147–148 

protein precipitation, 143–144 

TCA/acetone precipitation protocol, 145–147 

ultrafiltration-SPE, 144–145, 148–149 

urine SPE, 149 

Veritas™, 48. See also Laser-capture

microdissection

Web-based tools, for protein classification, 349 

BLAST, 352, 358 

dot-plot style alignment, of protein sequence, 

358–359 

EnsEmbl, 352, 356 

evolution-based classification schemes, 351 

ExPASy, 352 

expressed sequence tags (ESTs), 357 

GeneScan program, 356


InterPro, 352, 361 

MEROPS, 361 

metalloproteins, 350 

PDB, 352, 360–361 

Pfam, 352, 360 

PRINTS, 361 

PROSITE, 361 

sequence and structure of proteins and, 352–356 

SMART, 360 

Western blotting protocols, 275 

XIC. See Extracted ion chromatogram 

ZOOM®, MicroSol-IEF, 60, 65–66

Zoom scan triple-play experiment, 214
