Review


Nature Methods - 4, 807 - 815 (2007)
Published online: 27 September 2007; | doi:10.1038/nmeth1093

Mass spectrometry–based functional proteomics: from molecular machines to protein networks

Thomas Köcher & Giulio Superti-Furga

Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 19, 1090 Vienna, Austria.

Correspondence should be addressed to Thomas Köcher (tkoecher@cemm.oeaw.ac.at) or Giulio Superti-Furga (gsuperti@cemm.oeaw.ac.at).

The study of protein-protein interactions by mass spectrometry is an increasingly important part of post-genomics strategies to understand protein function. A variety of mass spectrometry–based approaches allow characterization of cellular protein assemblies under near-physiological conditions and subsequent assignment of individual proteins to specific molecular machines, pathways and networks, according to an increasing level of organizational complexity. An appropriate analytical strategy can be individually tailored—from an in-depth analysis of single complexes to a large-scale characterization of entire molecular pathways or even an analysis of the molecular organization of entire expressed proteomes. Here we review different options regarding protein-complex purification strategies, mass spectrometry analysis and bioinformatic methods according to the specific question that is being addressed.
The ability to characterize cellular, subcellular or organismal proteins in an unbiased fashion using mass spectrometry–based methods1, 2 has led to important insights into cell biological processes as well as signal transduction pathways3, 4, 5. We envision that together with sophisticated analytical methods targeting the genome and metabolome, ongoing advances in proteomic methodologies will eventually lead to important improvements in our understanding of pathological processes and ultimately in clinical practice6, 7.

Recent years have seen widely acclaimed breakthroughs in the large-scale characterization of protein complexes of entire organisms8, 9 and pathways3, 4. In parallel, an ever-increasing number of highly informative studies has increased our knowledge of selected molecular machines10, 11, 12. Finally, we have seen stunning cases of three-dimensional structures of molecular assemblies13, 14, 15, 16, converting the 'seeing-is-believing' fraction of researchers. As a result, characterization of the molecular partners of a protein has become a critical part of analyzing its biological function, next to knocking down its expression by RNA interference or studying its subcellular localization.

There are two fundamentally distinct approaches in proteomics. The original concept of proteomics, expression-based proteomics, is traditionally linked to two-dimensional electrophoresis17 and can be defined as the attempt to catalog the expression of all proteins present in cells, tissues or organisms, or as the differential analysis of biological systems responding to external stimuli or to specific disease conditions. Although this strategy has been successful in many cases18, 19, it suffers from the huge dynamic range of protein expression in biological systems, with regulatory proteins frequently hidden by abundant proteins. It is also limited by the difficulties associated with precise mass spectrometry–based quantitation20, 21, and by the variability of gene and protein expression resulting from genetic and environmental variability. Additionally, most cellular processes are partly controlled by post-translational modifications, which are difficult to analyze with presently available tools in a comprehensive and quantitative manner22, 23. Monitoring protein expression cannot be replaced by mRNA microarray technology because changes in protein abundance and mRNA abundance correlate only moderately with each other24, 25. Ultimately, expression-based proteomics offers only correlative relationships, requiring extensive validation.

The other approach—functional proteomics—is fundamentally and strategically different. Most cellular functions are executed by protein complexes, acting like molecular machines26. The term 'functional proteomics' derives from the hypothesis that the association of proteins would suggest their common involvement in a biological function, analogous to the 'guilt by association' concept in criminal investigation. There are many different functional proteomics technologies (Fig. 1), such as those (i) based on affinity purification procedures and physical measurement of the associated protein partners from physiological fluids27, 28, (ii) relying on pairwise testing of two partners, based on biochemical automation or chip technologies29, (iii) based on genetic readout systems such as the various two-hybrid systems and also phage-display technologies30, 31, 32, and (iv) based on computational prediction methods, some of which are based on known three-dimensional structures and binding motifs33, 34.

Figure 1. Decision tree of options for the most common different protein-protein (or protein-ligand) interaction experimental strategies.

Several permutations and variants of the individual approaches are possible. 2H, two-hybrid.



The data sets from these approaches can be integrated and compared to obtain additional insights about the function and the evolution of biological systems35, 36. Although 'maps' and even compilation of maps into 'atlases' of protein-protein interaction networks are currently incomplete, future maps might describe the cellular activities in a comprehensive and potentially even quantitative fashion. Driven by the vision of quantitative biology, the quantitation of proteins is an emerging trend in proteomics. Various methods are now used to distinguish complex components from contaminating proteins4, 37, 38 and to determine the stoichiometry of the proteins present in a complex39. Functional proteomics and expression-based proteomics provide complementary views onto the ensemble of proteins and their associations within a biological system.

Here we review mass spectrometry–based functional proteomics approaches, including protein-complex purification strategies, mass spectrometry analysis as well as data analysis and interpretation.

Functional proteomics
Goals of protein-protein interaction studies. Modern genetics and functional genomics experiments often lead to the identification of gene products with a putative biological function but a poorly characterized biochemical mode of action. Functional proteomics experiments allow researchers to identify the interacting proteins, facilitating mapping of a protein to a particular biological pathway. If only individual connections to a potential pathway are sought, then the highly effective and less resource-intensive binary yeast two-hybrid approach40 is recommended (Fig. 1).

Biochemical purification of protein complexes followed by characterization of their components by mass spectrometry41 requires a greater commitment of research capabilities, but the potential informational gain is an order of magnitude larger than the information about individual binary interactions. The researcher can discover the entire cellular machinery in which the protein of interest participates, which can seldom be recapitulated as the sum of binary interactions. Clues can be obtained about the links of the complex to various cellular signaling pathways as well as about cell biological processes governing its 'birth', 'death', stability and subcellular localization. In a characterization of protein complexes a protein of interest initially serves as the bait in cycles of purification under various physiological conditions (for example, in the presence of a stimulus) and subsequent purifications use identified preys as baits.

To characterize the proteins involved in entire biological processes and signaling pathways42, the approach is essentially identical to the study of a single protein complex but the scale is larger, starting with as few as five (ref. 5) or as many as 32 (ref. 3) entry points. Although such efforts require multiplication of resources, the synergistic effect of efficiency gains and economy of labor leads to a nonlinear relationship between required effort and output.

The ultimate goal of functional proteomics is to decipher the molecular function of an entire cell by generating a construction master plan describing all molecular machines, their functions, their reactions to external stimuli and their interconnectivities. An interdisciplinary and community-wide effort will be required to realize this vision, even when limited to the characterization of a few cellular states.

Biochemical approaches to map protein-protein interactions. Presently available protein-complex characterization methods can be grouped into methods to isolate endogenous protein complexes from cells and in vitro methods using recombinant proteins (Fig. 1). The simplest in vitro method uses immobilized recombinant proteins to capture putative interaction partners by binding. After washing, the bound proteins are eluted and typically identified by mass spectrometry.

Phage display technology43 screens for interacting proteins by expressing recombinant proteins on the surface of phage particles. Interacting proteins are identified when the phage displaying them bind to selected immobilized molecules. By fusing foreign cDNA libraries into the phage genome, libraries containing billions of proteins and peptides can be screened in a single experiment. After washing, the bound phages are amplified and the DNA encoding the putative interacting molecule is sequenced.

Protein arrays29, 44 are constructed by spotting proteins in defined locations on a chip surface. These proteins can be recombinant proteins45, samples from patients46 or antibodies47. Protein arrays have been used to detect protein-protein and protein–nucleic acid interactions as well as biochemical functions such as kinase activity48. Detection can be based on fluorescent49 or chemiluminescent probes, radioisotope labeling48, or mass spectrometry50.

Approaches for purifying endogenous protein complexes include antibody-based (immunochemical) methods, biochemical purification methods and affinity chromatography (Fig. 1). Subsequent characterization requires unambiguous identification of large numbers of proteins rapidly and with high sensitivity41. This can only be achieved by using state-of-the-art mass spectrometry, which can analyze and identify thousands of proteins in a few hours.

In contrast to universally applicable approaches such as immunochemical methods, biochemical purifications are now limited to rather abundant protein complexes such as the ribosome, the proteasome or the spliceosome51, 52. A specific purification strategy must be developed for each type of complex.

In immunochemical methods, protein complexes are precipitated from a cell lysate by using an immobilized antibody to a known component of a complex53 (Fig. 2a). The protein complex is then purified by washing away nonspecific interactors. A fundamental advantage of this approach is that protein complexes can be isolated with a specific and efficient antibody from all types of biological sources, including tissue samples from patients, circumventing the need to express the target protein. Whenever feasible, this route to characterize endogenous complexes should be chosen. Initiatives are underway to generate antibodies to the entire human proteome, which should enable large-scale studies54. Modifications of this approach use affibodies, minibodies or aptamers55, 56.

Figure 2. Main routes of protein-complex purification.

(a) In immunochemical purification, the endogenous protein complex is precipitated using an antibody to the target protein, allowing protein-complex characterization without expression of a tagged protein. (b) In one-step affinity purification, the purified protein complex is obtained by expression of the tagged construct in the cell, followed by specific binding and elution from an affinity column. (c) In two-step affinity purification, two rounds of specific binding and specific elution ensure a highly purified protein complex with few contaminating proteins, at the cost of losing transient interactions.



Affinity purification–based techniques exploit the biochemical properties of a tag attached to the bait protein to purify the other components of a protein complex (Fig. 2b,c). Using standard cloning techniques, target-protein and peptide-tag coding sequences are fused, and the resulting construct is expressed in target cells. Available tagging systems include His tags, glutathione S-transferase (GST) tags, Flag tags, the calmodulin-binding peptide, the streptavidin-binding peptide and in vivo biotinylation of the tagged target protein using coexpression of the BirA ligase57, 58. Combinations of these tags have also been used in various configurations (Fig. 2c). Columns with a high specificity for a certain tag are used to enrich the protein complex. One of the first protein complexes of considerable size analyzed by tagging technology was the spliceosomal U1 small nuclear ribonucleoprotein complex59. It was isolated by fusing a His tag to a known component of the protein complex and purifying the complex by nickel–nitrilotriacetic acid affinity chromatography.

One of the most successful tags developed to date is the tandem affinity purification (TAP) tag60, 61, which uses two sequential enrichment steps. Developed for yeast, the original TAP tag is composed of a protein A tag, followed by a tobacco etch virus (TEV) protease cleavage site and a calmodulin binding peptide60, 61. In the first purification step, the protein complex is purified from the cell lysate on an immunoglobulin gamma (IgG) affinity resin. The target protein complex is cleaved from the protein A tag with TEV protease. The eluate is then enriched in a second affinity purification step on an immobilized calmodulin column; elution yields a highly purified protein complex. Notably, all binding and elution steps are performed in mild buffer conditions, keeping the native complex as intact as possible. The TAP-tag can be fused to the N or C termini of the target protein. Fusion at one terminus might disturb the interaction of the protein with its partners; therefore, it might be necessary to test both variants. Several variants of this two-step approach, using different combinations of tags, have been described3, 60, 62, 63, 64, 65. The combination of protein G, a TEV protease cleavage site and streptavidin-binding peptide facilitated a tenfold improvement of recovery for complexes from mammalian cells62.

In yeast, bait proteins are usually expressed by replacing the endogenous gene by homologous recombination with its tagged version61. In higher eukaryotic systems such as human cell lines, the expression of the tagged protein in the natural chromosomal context cannot be easily achieved. Consequently, other methods of expressing the fusion protein have to be used such as the stable integration of the construct by retrovirus-mediated gene transfer3 or by transient transfection66. In our experience, the problems with transient transfection concern not only expression over endogenous levels of the protein but also the cellular shock associated with the large amount of newly translated proteins, often resulting in their association with chaperones. Therefore, whenever feasible, infection with viral vectors is preferred. Additionally, as the bait protein is overexpressed, it competes with the endogenous protein for complex formation. The recovery rate of protein complexes can be increased by abolishing competition from the endogenous protein by RNA interference–mediated knockdown63.

How should the researcher judge which tag to choose? Is a one-step or a two-step procedure recommended? Specific protocols might work best for a specific protein complex, cell line or organism. Successful purification of protein complexes from eukaryotic cells has been reported using both one-step and two-step purifications27. As the reader would guess, one-step purifications on average lead to a higher number of contaminating proteins and the two-step procedures tend to yield cleaner results but weak interactions can be lost62. Typically, the recovery yields for one-step procedures are 3–5 times higher than for two-step purifications. Many researchers prefer to start their investigation with a shorter list of interacting proteins obtained with a two-step procedure because the validation of data is the main experimental bottleneck. A recently developed flexible tag allows one- or two-step purifications62.

As tagging can interfere with protein function, it is recommended to try both N-terminal and C-terminal fusions if no additional information, such as the three-dimensional structure or subcellular localization data, is available. On average, however, the tag interferes with the function of the protein in only 10–15% of cases28, 67.

One new area of development is the characterization of complete protein complexes in their native form with mass spectrometry68. Similarly to other top-down mass spectrometry techniques69, these experiments are now limited to abundant complexes such as ribosomes70, proteasomes71 or exosomes72 because large amounts of the purified protein complex are required.

Cross-linking techniques can also be applied to the study of protein complexes73 and can be combined with biochemical approaches to purify protein complexes74. In general, these approaches aim for two major improvements over conventional techniques. They either attempt to maintain the interaction of loosely bound interacting proteins by covalently linking them, or they attempt to map the topology of the purified complex. In the latter approach, cross-linked peptides originating from different proteins prove that there is a direct interaction between two proteins and give clues about their binding interfaces. Although several successful experiments have been reported73, 74, the need for large amounts of protein and additional sample preparation steps have limited widespread application of these techniques.

Mass-spectrometric protein identification
Sample preparation options. In the majority of proteomics experiments the purified proteins are separated by one-dimensional SDS-PAGE and stained with a mass spectrometry–compatible stain such as silver, Coomassie or a fluorescence-based dye such as SYPRO Ruby (Fig. 3). SDS-PAGE separation removes unwanted contaminants such as buffer components from the protein sample, and the sample complexity is decreased by separating the proteins according to molecular weight. Additionally, the staining pattern of the gel can be used as a semiquantitative assessment of the experiment, for example, to evaluate the quantity of the bait protein versus the interacting components or to compare an experiment with a control sample.

Figure 3. Flowchart of different options for sample preparation, protein separation and mass-spectrometric analysis.

The main routes to protein identification are shown.



Individual protein bands of interest are excised or the entire lane is cut into slices, followed by in-gel digestion with a specific protease such as trypsin to produce peptides for mass spectrometry analysis75. Compared with intact proteins, peptides have lower detection limits and can be extracted from gels, and their cleavage pattern by specific proteases can provide additional information. Note that the extraction efficiency of peptides from a gel is only about 20% and is dependent on the primary structure of the peptide76.
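
The cleavage pattern mentioned above can be illustrated with a minimal in silico digestion sketch. It assumes the commonly used rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) except when the next residue is proline (P); this rule, the function name and the example sequence are illustrative assumptions and are not taken from the review.

```python
def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides from a simplified in silico tryptic digestion.

    Assumed rule (not stated in the review): cleave after K or R
    unless the following residue is P.
    """
    # Step 1: split the sequence into fully cleaved fragments.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Step 2: join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R-P bond is left intact under the assumed rule.
    print(tryptic_digest("MKWVTFRPSLK"))
    print(tryptic_digest("MKWVTFRPSLK", missed_cleavages=1))
```

Search engines typically build their theoretical peptide lists in a comparable way, although real implementations handle further enzyme rules and modifications that this sketch omits.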

As an alternative to gel-based protocols, protein mixtures can be digested in solution without prior separation of individual components and analyzed by mass spectrometry77 (Fig. 3). In many cases buffer components such as detergents prohibit direct analysis as they can interfere with the mass spectrometry ionization process. In such cases protein samples can be precipitated with trichloroacetic acid, washed and dissolved in a digestion buffer containing the appropriate protease. Although successful, the procedure requires a high-pressure liquid chromatography (HPLC) system that can readily resolve peptides in a complex mixture. An option to improve the peak capacity is two-dimensional liquid chromatography separation77, 78. The main advantages of in-solution digestion protocols are the reduction of the time required for analysis77 and a higher recovery of peptides compared to in-gel digestion.

Peptide separation options. The peptide mixture can be directly analyzed by mass spectrometry79, 80, or separated by HPLC before mass-spectrometric analysis (Fig. 3). Direct analysis of a peptide sample is rapid compared to liquid chromatography–mass spectrometry (LC-MS) because the loading of a sample onto a liquid chromatography system and subsequent separation is a time-consuming procedure (45–90 min per experiment). In contrast, the use of HPLC systems coupled to a mass spectrometer not only results in a dramatically higher number of detected peptides but also facilitates automation. Typically, an even greater number of proteins can be identified using two-dimensional peptide separations such as combining strong ion exchange with conventional reversed phase separation78.

Good chromatographic resolution is important because if peptides elute in a relatively short period, the high concentration will yield high ion counts in the mass spectrometer. More peptides can be sequenced by the mass spectrometer if they are well-separated over time. In practice, most laboratories aim to achieve chromatographic resolution in the range of 10–15 s full width half maximum (FWHM). Another important aspect of on-line coupled LC-MS instrumentation is the flow rate, which inversely affects the ionization efficiency80.

In general, low-complexity samples can be analyzed simply and rapidly without peptide separation. Complex samples such as the peptide mixtures generated from a complete pulldown require chromatographic separation before mass-spectrometric analysis.

Mass-spectrometric options. Two soft ionization techniques have been developed for the ionization of macromolecules (including proteins and peptides) into the gas phase: electrospray ionization (ESI)81 and matrix-assisted laser desorption/ionization (MALDI)82 (Fig. 3). In the most common instrumental designs, ESI is performed with mass spectrometers capable of tandem mass spectrometry (MS/MS) experiments. Ion traps, quadrupole time-of-flight instruments (Q-TOF), Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometers (FTMS) and the Orbitrap are the most common types of instrumentation now used in high-end protein analysis. A critical factor for the unambiguous identification of proteins is high mass accuracy of the mass spectrometer, best realized with FTMS83. But high sensitivity and high sequencing speed might compensate for lower mass accuracy. Therefore, although all of the above-mentioned mass spectrometers can be used for protein identification, a specific analytical question might be answered optimally using specific mass-spectrometric instrumentation. Q-TOF instruments now have mass accuracies for typical tryptic peptides in the range of 10–20 p.p.m., the Orbitrap can achieve mass accuracies of approximately 2 p.p.m. and FT-ICR instrumentation can be optimized for sub-p.p.m. mass accuracy.
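
To make the p.p.m. figures above more tangible, the following sketch converts a relative mass accuracy into an absolute m/z window and computes the p.p.m. error of a measurement. The function names and the example m/z of 1,000 are illustrative assumptions; only the p.p.m. values are taken from the text.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million (p.p.m.)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_window(mz, ppm):
    """Lower and upper m/z bounds for a given p.p.m. tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


if __name__ == "__main__":
    mz = 1000.0  # hypothetical precursor m/z
    # Accuracies quoted in the text for Q-TOF (upper bound) and Orbitrap.
    for label, ppm in (("Q-TOF", 20), ("Orbitrap", 2)):
        low, high = tolerance_window(mz, ppm)
        print(f"{label}: {ppm} p.p.m. at m/z {mz} -> {low:.4f} to {high:.4f}")
```

Because the window scales with m/z, a fixed p.p.m. tolerance is wider in absolute terms for heavier peptides, which is one reason search tolerances are usually specified in relative rather than absolute units on high-accuracy instruments.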

But even mass windows in the range of 1 Da (typical for ion traps or triple quadrupole instruments) allow the unambiguous identification of proteins. In such instances, improved fragment ion data are necessary to ensure correct protein identifications. In most cases, ESI experiments are performed by directly coupling an HPLC system to the mass spectrometer. The experiment is performed in a data-dependent acquisition mode whereby the MS/MS experiments are automated based on the eluting peptides (ref. 1). The m/z values of the peptides eluting at a given time from the column are recorded, the peptide with the most intense signal is automatically selected and fragmented, and the fragment ion spectrum is recorded. This procedure can be repeated with the other eluting peptides, but is limited by the duty cycle of the mass spectrometer, detection limits and the m/z value of the peptides.
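
The selection logic of data-dependent acquisition described above can be sketched as follows. This is a simplified illustration, not instrument control software: the data layout, the function names and the exclusion of already fragmented precursors within a fixed m/z tolerance are assumptions made for the example.

```python
def select_precursors(survey_scan, top_n=1, excluded=(), mz_tolerance=0.01):
    """Pick the top_n most intense peaks that have not been fragmented yet.

    survey_scan: list of (mz, intensity) tuples from one survey spectrum.
    excluded:    m/z values selected in earlier cycles (assumed behaviour).
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_scan
        if not any(abs(mz - seen) <= mz_tolerance for seen in excluded)
    ]
    # Most intense signal first, as described in the text.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


def run_cycles(survey_scans, top_n=1):
    """Process consecutive survey scans, excluding precursors once selected."""
    excluded, schedule = [], []
    for scan in survey_scans:
        chosen = select_precursors(scan, top_n=top_n, excluded=excluded)
        excluded.extend(chosen)
        schedule.append(chosen)
    return schedule


if __name__ == "__main__":
    # Hypothetical survey scan with three co-eluting peptides.
    scan = [(500.25, 1e5), (622.30, 5e5), (745.40, 2e5)]
    print(run_cycles([scan, scan, scan]))
```

The sketch also shows why the duty cycle matters: with one MS/MS event per cycle, the least intense peptide is reached only in the third cycle, by which time it may no longer be eluting.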

An alternative method to ESI-MS is MALDI-MS, usually performed with time-of-flight (TOF) instruments recording only the mass spectrum of the peptide ions; no fragmentation information is collected. This analytical strategy is rapid, but protein identification based on MS1 data alone uses less statistical information than identification based on MS/MS data. This can be problematic when analyzing protein mixtures or samples containing modified proteins, common in samples from higher eukaryotes. Instruments such as MALDI–Q-TOFs or MALDI-TOF/TOFs can operate in an MS/MS mode. It is common practice to separate complex peptide samples off-line with an HPLC system. A limitation of MALDI-MS/MS, however, is that mostly singly charged ions are generated, and thus weaker fragment ion spectra are produced compared to those generated by multiply charged electrospray ions.

If analysis time is the critical factor in the analytical strategy, a simple MALDI-TOF analysis without HPLC separation (with a time frame of seconds) is recommended. If high sample complexity is expected and identification of numerous proteins covering a substantial dynamic range is required, then LC-MS/MS is the method now chosen by most proteomics researchers. Given the importance of unambiguous identification of proteins, most proteomic studies published today are based on LC-MS/MS data acquired from peptides ionized via ESI.

Approaches to assess and increase confidence of data sets. In contrast to the plethora of routes to analyze peptide mixtures, approaches to analyze the resulting data sets are fairly similar. In most cases, the raw data files are first processed by the software controlling the respective mass spectrometry instrument. Typical processing steps include smoothing, centroiding and charge-state deconvolution of the acquired spectra. The generated data sets are then searched against a protein database. The two most commonly used algorithms use either a probabilistic approximation such as the search engine MASCOT84 or a mathematical correlation method such as SEQUEST85. Although these and other search engines differ in their mathematical approach and exact statistical methods, the most crucial factor affecting false positive and false negative rates is the applied mass accuracy. Consequently, the greatest care should be taken in defining the thresholds of the minimum scores and the allowed mass tolerances for the precursor ion and the fragment ions. A valid approach for validation of the chosen parameters is to search the obtained data sets against a decoy protein database86.
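
A decoy search of the kind mentioned above can be sketched as follows. The example assumes two widely used conventions that the review does not specify: decoy sequences are produced by reversing the target sequences, and the false discovery rate at a score threshold is estimated as the number of decoy hits divided by the number of target hits. All names and scores are hypothetical.

```python
def reverse_decoys(proteins):
    """Build decoy entries by reversing each target sequence (assumed scheme)."""
    return {"DECOY_" + accession: seq[::-1] for accession, seq in proteins.items()}


def fdr_at_threshold(matches, threshold):
    """Estimate the false discovery rate among matches scoring >= threshold.

    matches: list of (score, is_decoy) tuples from a target-plus-decoy search.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def score_cutoff_for_fdr(matches, max_fdr=0.01):
    """Lowest observed score whose estimated FDR does not exceed max_fdr."""
    cutoff = None
    for score in sorted({s for s, _ in matches}, reverse=True):
        if fdr_at_threshold(matches, score) <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    # Hypothetical search results: (score, matched a decoy sequence?)
    matches = [(90, False), (80, False), (70, False), (60, True), (50, False), (40, True)]
    print(fdr_at_threshold(matches, 50))
    print(score_cutoff_for_fdr(matches, max_fdr=0.1))
```

Running the same estimate with different precursor and fragment mass tolerances gives a direct, if rough, way to see how the chosen search parameters shift the balance between false positives and false negatives.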

After protein identification by one of the available search engines, the data might be further filtered by setting specific thresholds such as a minimum peptide length or a specific number of peptides to consider a protein identification to be correct. There is currently a controversy about whether protein identifications should be based on a minimum of two tryptic peptides or if protein identifications based on a single tandem mass spectrum with very high statistical significance or MS/MS/MS data of one peptide are sufficient87.
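
The post-search filtering described above might look like the following sketch. The default thresholds (two peptides, seven residues) are placeholders chosen for illustration and are not recommendations from the review; the data layout is likewise an assumption.

```python
def filter_identifications(protein_peptides, min_peptides=2, min_length=7):
    """Keep proteins supported by enough distinct peptides of sufficient length.

    protein_peptides: dict mapping a protein accession to its matched peptides.
    Both thresholds are hypothetical and should be set per experiment.
    """
    kept = {}
    for protein, peptides in protein_peptides.items():
        # Count each distinct peptide sequence once.
        valid = {p for p in peptides if len(p) >= min_length}
        if len(valid) >= min_peptides:
            kept[protein] = sorted(valid)
    return kept


if __name__ == "__main__":
    hits = {
        "A": ["PEPTIDEK", "LONGERPEPTIDER"],
        "B": ["PEPTIDEK"],
        "C": ["SHORT", "ANOTHERPEPTIDEK"],
    }
    print(filter_identifications(hits))                  # two-peptide rule
    print(filter_identifications(hits, min_peptides=1))  # single-peptide hits allowed
```

Switching `min_peptides` between 1 and 2 reproduces the two positions in the controversy: the stricter rule discards proteins B and C here, whereas the permissive rule would need a separate, stringent score criterion to remain trustworthy.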

Often a set of identified peptides will match to more than one protein, which are often homologous proteins or different isoforms of the same reading frame88. Commercial algorithms have been developed to group these proteins, facilitating assessment and interpretation of data.
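
One simple way to group such proteins is to merge all entries that are supported by exactly the same set of peptides, since the data cannot distinguish them. The sketch below illustrates only this idea; it is not the method used by any particular commercial algorithm, and the accessions and peptides are made up.

```python
from collections import defaultdict


def group_proteins(protein_peptides):
    """Group proteins that are matched by an identical set of peptides.

    Returns a sorted list of (protein accessions, shared peptides) pairs.
    """
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return sorted((sorted(members), sorted(peptides)) for peptides, members in groups.items())


if __name__ == "__main__":
    # Two hypothetical isoforms share all identified peptides.
    hits = {
        "ISO1": ["AAK", "CCR"],
        "ISO2": ["CCR", "AAK"],
        "P3": ["DDK"],
    }
    for members, peptides in group_proteins(hits):
        print(members, peptides)
```

More elaborate schemes also handle proteins whose peptides form a strict subset of another protein's peptides, which this sketch deliberately leaves out.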

Data standardization, interpretation and validation
Data standardization. The high-throughput nature of proteomics has generated a rapidly increasing flow of progressively complex protein identification data. Given the different types of mass spectrometric instrumentation, ionization processes and software platforms, the assessment of published data becomes increasingly difficult. To facilitate sharing experimental data, common standards in data acquisition, data interpretation and data storage are required. The proteomics community has begun a process toward defining the minimal standards for generating and publishing mass-spectrometric data and proteomics experiments. Although this process is far from complete and many different groups of researchers have defined their own standardization protocols89, it is foreseeable that in the future a common format of publishing and storing of proteomic data will exist90. It is evident that fundamental experimental information such as the mass-spectrometric instrumentation, resolution, mass accuracy, software for data interpretation, HPLC flow rates and composition or type of MALDI sample plates must be reported and stored in a specific format.

Related standardization initiatives tackle the protein identification process. Descriptions of the software used, input parameters such as the database queried, the restrictions applied to the search, the cleavage agents and the mass tolerances used should be reported in proteomics publications. Specific requirements for publishing the output of the identification process, such as the accession codes of the identified proteins, the protein scores or the obtained sequence coverage have also been defined. In addition, any publication reporting proteomics data should contain a statistical analysis of the data such as the determination of the false positive rate.

Similar trends have emerged for the standardization of proteomic protein-protein interaction data91. In recent years, several public databases have been created for storing functional proteomics data sets. Currently, the data are manually curated and are often extracted from publications. The main aim of standardization in reporting protein-protein interaction data is to define common standards, similar to those of nucleotide databases. The use of ambiguous protein identifiers and unclear descriptions of experimental conditions seriously hinder the development of interaction databases and the exchange of information between them. Key data such as exact accession numbers and classification of the molecular role of any published proteins should be included in each publication.

Data validation and interpretation. Because mass spectrometry is such a sensitive technique, an undesired side effect is that contaminating proteins such as keratins and highly abundant proteins are also identified in purification experiments. In the case of protein-protein interaction experiments at a small scale, these contaminants do not pose substantial problems for data validation and can be removed from the data set based on biological knowledge and experimental data sets. Putative interacting proteins can be evaluated by various orthogonal methods such as colocalization studies, gain-of-function or loss-of-function experiments28.

Quantitative methods in mass spectrometry can also facilitate this process (for reviews see refs. 20,21,38). By applying relative quantification methods such as the isotope-coded affinity tag (ICAT), the genuine components of a protein complex can be distinguished from contaminant proteins by comparing the relative abundances of the differently isotope-labeled peptides derived from a control sample and the purified specific complex37. Absolute quantitation of proteins can be achieved by spiking a single sample with isotopically labeled standard peptides92. This technique can be used to define the stoichiometry of a protein complex.
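As an illustration of how such quantitative readouts might be used, the following minimal Python sketch classifies proteins by the ratio of peptide abundance in the specific purification to that in the control, and derives relative stoichiometry from absolute amounts. The function names, the use of the median to summarize peptide ratios and the threefold threshold are assumptions made for this example and are not taken from the cited studies.

```python
from statistics import median

def classify_by_ratio(peptide_ratios, min_fold=3.0):
    """Label each protein from its peptide ratios (specific / control).

    peptide_ratios maps a protein name to a list of abundance ratios.
    min_fold is a hypothetical enrichment threshold for this sketch.
    """
    calls = {}
    for protein, ratios in peptide_ratios.items():
        protein_ratio = median(ratios)  # robust summary over peptides
        calls[protein] = "enriched" if protein_ratio >= min_fold else "background"
    return calls

def stoichiometry(amounts_fmol, reference):
    """Express absolute amounts relative to a chosen reference subunit."""
    ref = amounts_fmol[reference]
    return {protein: amount / ref for protein, amount in amounts_fmol.items()}

calls = classify_by_ratio({
    "partner": [8.0, 10.0, 12.0],   # strongly enriched over control
    "keratin": [0.9, 1.0, 1.2],     # equally present in both samples
})
subunits = stoichiometry({"A": 200.0, "B": 100.0}, reference="B")
```

A protein present at roughly equal levels in both samples is treated as background here; in practice the threshold would have to be chosen and justified for each experiment.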

One critical general limitation encountered in the interpretation of the data obtained from the purification of a protein complex is a lack of information about binary interactions. At least in theory it is possible that several proteins copurify, each binding like beads on a string to at most two other components. From biophysical, structural or genetic methods in combination with biochemistry, we know that complexes assemble according to precise assembly steps. The order of complex assembly, post-translational modifications, allosteric effects and cooperative binding are fundamental parts of the biological integration of the machine in the cellular orchestration. Bioinformatics tools can use existing protein-protein interaction data sets from the literature, binding prediction and structural considerations to compose a possible interaction diagram for each complex33, 93. Thus binary interaction data, as provided by yeast two-hybrid screens or pairwise biochemical interaction experiments (Fig. 4a), are perfectly complementary to mass spectrometry–based approaches.

Figure 4. Illustration of the types of protein networks that can be elucidated with different experimental approaches.

(a) Binary protein interactions are typically obtained from two-hybrid assays. (b) The purification of protein complexes leads to a corresponding protein network where two protein complexes are connected by sharing one or more proteins, indicated by lines. (c) Protein interaction networks can be generated from protein pulldown assays. The matrix model represents purified protein assemblies as if all their members interact with each other. (d) Statistical analysis and clustering demonstrate the modularity of protein complexes. Core components and alternative attachments or modules of one or more other proteins are depicted.



In contrast, the nature of systematic large-scale experiments does not allow for the subjective and individual evaluation of their results. In these cases, the removal of potential contaminating proteins cannot be based on judging individual purifications. One possible approach to highlight potentially contaminating proteins in high-throughput data is to quantitatively compare them37 against core proteomes94, defined as the subset of highly expressed proteins in a cell. Another approach is to subtract the proteins identified in a larger number of pulldown assays than a specific cut-off value95 or present in pulldowns of unrelated proteins3. In various yeast studies8, 95, 96 using the TAP tagging technique61, the problems associated with sticky proteins appear to be relatively moderate, partly because the redundancy of repeated analysis of a limited number of complexes using different baits allows for robust statistical and probabilistic analyses95, 96. In experiments using cell lines derived from higher organisms, contaminating proteins can be a critical issue, particularly if too few repetitions are performed to allow for statistical analysis28.
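The frequency-based subtraction described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the data layout (a mapping from bait to identified proteins) and the cut-off of half of all pulldowns are assumptions for the example, not values used in the cited studies.

```python
from collections import Counter

def frequent_proteins(pulldowns, max_fraction=0.5):
    """Return proteins seen in more than max_fraction of all pulldowns.

    pulldowns maps a bait name to the list of proteins identified with it.
    max_fraction is a hypothetical cut-off chosen for this sketch.
    """
    # Count each protein once per pulldown, however often it was listed.
    counts = Counter(p for preys in pulldowns.values() for p in set(preys))
    total = len(pulldowns)
    return {p for p, c in counts.items() if c / total > max_fraction}

def filter_pulldowns(pulldowns, max_fraction=0.5):
    """Remove frequently observed ('sticky') proteins from every pulldown."""
    sticky = frequent_proteins(pulldowns, max_fraction)
    return {bait: sorted(set(preys) - sticky) for bait, preys in pulldowns.items()}

example = {
    "bait1": ["HSP70", "A", "B"],
    "bait2": ["HSP70", "C"],
    "bait3": ["HSP70", "A"],
    "bait4": ["HSP70", "D"],
}
cleaned = filter_pulldowns(example)
```

In this toy data set only the protein found in every pulldown is removed. A fixed cut-off of this kind needs enough independent purifications to be meaningful, which is the limitation noted above for experiments with few repetitions.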

Medium- and large-scale experiments require a completely automated interpretation routine, and several bioinformatics approaches to this problem have been developed. Complexes can be represented as static entities, leading to the creation of networks in which the edges represent proteins shared between complexes (that is, present in different assemblies, Fig. 4b)8. This representation, however, captures the dynamics of complex composition only poorly and is very sensitive to any abundant false positive result that survives the statistical filtering for significance. If one assumes that each protein in a purified complex interacts with each other protein in the same complex (matrix model, Fig. 4c), then one can easily generate large networks in which each protein is represented as an entity connected to various degrees with other proteins, in a spiderweb of binary interactions similar to those generated by two-hybrid screens97, 98 (Fig. 4a). The comprehensive analyses of protein interactions in the yeast proteome have been performed with a high level of redundancy (via repurification using different baits), facilitating the calculation of how often a protein was retrieved in a reciprocal manner (spoke model)96. Building upon the matrix model, a conceptual 'affinity-index' can be assigned to each protein pair. The data can be further processed by clustering analysis and applying a cutoff that corresponds to the ideal representation of very well characterized examples of protein complexes from the literature. Complexes are thus represented as modular entities consisting of core components that are always together and alternative 'attachments' of one or more other proteins96 (Fig. 4d). Decomposed this way, the modular organization of the proteome allows for the prediction of evolutionarily more conserved elements (cores) and less conserved connections (attachments) as well as for diversification of function based on a combination of limited sets of components96.
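The difference between counting only bait-to-prey pairs (commonly called the spoke model) and assuming that all copurified proteins interact with each other (the matrix model) can be made concrete with a small Python sketch. The function names and the toy purification are hypothetical, and the sketch does not compute the affinity index or the clustering described above.

```python
from itertools import combinations

def spoke_edges(bait, preys):
    """Spoke representation: the bait is linked to each copurified prey."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}

def matrix_edges(bait, preys):
    """Matrix representation: every pair of copurified proteins is linked."""
    members = set(preys) | {bait}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}

# One hypothetical purification: bait A retrieves B, C and D.
spoke = spoke_edges("A", ["B", "C", "D"])
matrix = matrix_edges("A", ["B", "C", "D"])
```

For a purification with n proteins in total, the spoke representation yields n - 1 edges whereas the matrix representation yields n(n - 1)/2, which is why the latter so readily produces large, densely connected networks.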

Future perspectives
Recent major technical developments in protein-complex purification, mass spectrometry and bioinformatics will facilitate analysis of protein interactions. Several large-scale studies of protein complexes in yeast95, 96, 97, 98 and animals have been reported, and there are plans to tackle the human proteome.

What are the major challenges and future goals? The major ones include (i) the ever-changing nature of the proteome composition, (ii) the dynamics of protein complex assembly and disassembly, (iii) the absolute and relative quantitation of proteins, (iv) the capturing of endogenous complexes from native cells and tissues, (v) the integration of mass spectrometry–based data sets with the data sets from binary interaction screens, localization studies, genetic or structural information and known interactions with metabolites, and (vi) the generation of a 'molecular anatomy of the cell' view bridging the structure of the molecular machine to the organelle substructural level93, 99. Current experimental limitations hinder the mapping of transient or weak interactions between proteins and the comprehensive characterization of their post-translational modifications. We are confident, however, that all these experimental limitations will be overcome as an increasing number of scientists apply proteomic methods to biological as well as to clinically relevant questions in biomedical research. We predict that medicine will also profit enormously from these emerging trends100, especially from the development of technologies for the identification and characterization of multiparameter diagnostic and prognostic tools as well as from the discovery of new targets for therapeutics.

Published online: 27 September 2007.

REFERENCES
  1. Steen, H. & Mann, M. The ABC's (and XYZ's) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 5, 699–711 (2004).
  2. Ferguson, P.L. & Smith, R.D. Proteome analysis by mass spectrometry. Annu. Rev. Biophys. Biomol. Struct. 32, 399–424 (2003).
  3. Bouwmeester, T. et al. A physical and functional map of the human TNF-alpha NF-kappa B signal transduction pathway. Nat. Cell Biol. 6, 97–105 (2004).
  4. Blagoev, B. et al. A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling. Nat. Biotechnol. 21, 315–318 (2003).
  5. Major, M.B. et al. Wilms tumor suppressor WTX negatively regulates WNT/beta-catenin signaling. Science 316, 1043–1046 (2007).
  6. Weston, A.D. & Hood, L. Systems biology, proteomics, and the future of health care: Toward predictive, preventative, and personalized medicine. J. Proteome Res. 3, 179–196 (2004).
  7. Anderson, N.L. & Anderson, N.G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–867 (2002).
  8. Gavin, A.C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002).
  9. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183 (2002).
  10. Riedel, C.G. et al. Protein phosphatase 2A protects centromeric sister chromatid cohesion during meiosis I. Nature 441, 53–61 (2006).
  11. Vanacova, S. et al. A new yeast poly(A) polymerase complex involved in RNA quality control. PLoS Biol. 3, 986–997 (2005).
  12. Bertwistle, D., Sugimoto, M. & Sherr, C.J. Physical and functional interactions of the Arf tumor suppressor protein with nucleophosmin/B23. Mol. Cell. Biol. 24, 985–996 (2004).
  13. Lorentzen, E. et al. The archaeal exosome core is a hexameric ring structure with three catalytic subunits. Nat. Struct. Mol. Biol. 12, 575–581 (2005).
  14. Hao, B., Oehlmann, S., Sowa, M.E., Harper, J.W. & Pavletich, N.P. Structure of a Fbw7-Skp1-cyclin E complex: multisite-phosphorylated substrate recognition by SCF ubiquitin ligases. Mol. Cell 26, 131–143 (2007).
  15. Nickell, S., Kofler, C., Leis, A.P. & Baumeister, W. A visual approach to proteomics. Nat. Rev. Mol. Cell Biol. 7, 225–230 (2006).
  16. Groll, M., Bochtler, M., Brandstetter, H., Clausen, T. & Huber, R. Molecular machines for protein degradation. ChemBioChem 6, 222–256 (2005).
  17. Gorg, A., Weiss, W. & Dunn, M.J. Current two-dimensional electrophoresis technology for proteomics. Proteomics 4, 3665–3685 (2004).
  18. Wulfkuhle, J.D. et al. Proteomics of human breast ductal carcinoma in situ. Cancer Res. 62, 6740–6749 (2002).
  19. Le Naour, F. et al. Profiling changes in gene expression during differentiation and maturation of monocyte-derived dendritic cells using both oligonucleotide microarrays and proteomics. J. Biol. Chem. 276, 17920–17931 (2001).