Proteomics Analysis Basics

Quantitative proteomics analysis basics

An introduction to MS based quantitative analysis


Proteomics quantification

Quantitative proteomics is the set of techniques used to determine the amount of each protein in a sample. The most widely used technique today is mass spectrometry (MS) based proteomics, in which the proteins of a biological sample are broken (digested) into peptides that are then introduced into a mass spectrometer. The resulting mass spectra may not accurately give the absolute amount of a peptide in the sample, but they can accurately measure relative differences in the abundance of the same peptide between samples.

Relative quantification is often based on labelling the peptides produced under different conditions with different labels and then comparing their abundances to determine how the conditions affect protein production. The labels can be added by the cells’ own machinery, by supplying isotope-labelled amino acids to the cultures (metabolic labelling), or by chemical methods after protein production (chemical labelling). The alternative is the label-free technique, where each sample is passed through the mass spectrometer separately and the peptide quantities measured in the different samples are compared afterwards.

After peptide quantification, the peptides are fragmented and the fragments are used to determine the peptide sequence. The spectrum that results from this step is called MS2. The sequences are searched against a database of known proteins to identify the proteins and estimate their abundances. A processing program such as Thermo’s Proteome Discoverer or MaxQuant computes these relative abundances. ProteoSign takes the results of these programs, performs quality testing, produces descriptive plots and directs the biologist’s attention to the most interesting and most affected proteins of the experiment.

Figure: A diagram showing the different quantitative techniques
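To make the idea of relative quantification concrete, the following is a minimal sketch that compares the intensity of the same peptide in two samples as a log2 ratio. The peptide names and intensity values are made up for illustration, and `log2_ratio` is a hypothetical helper, not part of ProteoSign, MaxQuant or Proteome Discoverer.

```python
import math


def log2_ratio(intensity_a, intensity_b):
    """Return log2(intensity_a / intensity_b) for one peptide in two samples."""
    if intensity_a <= 0 or intensity_b <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(intensity_a / intensity_b)


# Hypothetical peptide intensities (arbitrary units): (condition A, condition B).
peptides = {
    "PEPTIDEA": (2.0e6, 1.0e6),  # twice as abundant in A
    "PEPTIDEB": (5.0e5, 2.0e6),  # four times as abundant in B
    "PEPTIDEC": (3.0e6, 3.0e6),  # unchanged
}

ratios = {seq: log2_ratio(a, b) for seq, (a, b) in peptides.items()}

for seq, ratio in ratios.items():
    print(f"{seq}: log2 ratio = {ratio:+.2f}")
```

A positive value means the peptide is more abundant in the first condition, a negative value means it is more abundant in the second, and zero means no change. The logarithm makes a doubling and a halving equally far from zero.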

Experiment types and experimental structure

Stable isotope labelling by amino acids in cell culture (SILAC) is one of the most widely used experiment types in proteomics. In this type of experiment two cultures are grown under different conditions, and amino acids carrying heavy stable isotopes (such as 13C or 15N) are supplied to one of them. After protein production the proteins are isolated, digested into peptides and fed to a mass spectrometer. The mass difference between labelled and unlabelled peptides tells which culture a peptide comes from. More isotope labels can be introduced to compare more than two conditions.

pSILAC (pulsed SILAC) is a SILAC variation in which unlabelled proteins are first produced in two cultures and different isotope labels are then supplied to the cultures for a limited amount of time. This allows relative quantification of de novo protein production. The unlabelled proteins are called “background proteins”.

Isobaric labelling is a strategy in which the labels are attached after biological protein production, using isobaric tags that are indistinguishable in the MS1 spectrum but produce distinct fragments in the MS2 spectrum. The intensity of these fragments represents the abundance of the respective peptides. Tandem mass tag (TMT) is a common isobaric labelling technique that allows up to 10 different tags to be used. Isobaric tag for relative and absolute quantitation (iTRAQ) is another isobaric labelling technique; its tags attach to the N terminus and to the side-chain amines of peptides, allowing up to 8 different labels per experiment.

Label free is a strategy of its own: the samples are quantified in separate MS runs, so each output corresponds to a single condition.

Compensating for biological and experimental errors:

Applying different conditions to two or more cultures is a delicate procedure. The cultures may show spurious differences in protein abundance caused by biological variation between them rather than by the conditions. To compensate for this error, several cultures are grown under the same condition in the same experiment; these are called biological replicates. Taking an aliquot from a culture is also prone to error. To compensate for this, several aliquots may be taken from the same culture, producing technical replicates. Finally, since a mass spectrometer can hardly analyse all the proteins of an aliquot at once, each aliquot is split into smaller parts called fractions, which are passed through the spectrometer consecutively. As a result, a typical labelled experiment, where different conditions are represented by different labels, can be described by the following diagram:

Figure: A diagram showing a typical labelled experiment

Notice that the number of MS files produced is equal to (Bio Reps) * (Tech Reps) * (Fractions).

In label-free experiments each condition is represented by its own raw files, so the number of raw files is equal to (Conditions) * (Bio Reps) * (Tech Reps) * (Fractions), and the following diagram best describes the procedure:

Figure: A diagram showing a typical label free experiment

Usually in a labelled experiment the conditions are represented by different labels and the biological and technical replicates by different MS runs, but this is not always the case, as in figure 1 of this experiment:

Figure: An experiment where replicates are represented as different tags.

These kinds of experiments are compatible with ProteoSign through a feature called “Replication Multiplexing”.
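The two file-count formulas above can be checked with a short sketch. The function names and the example numbers are hypothetical and chosen only for illustration.

```python
def labelled_raw_files(bio_reps, tech_reps, fractions):
    """Raw files in a labelled experiment: all conditions share each MS run."""
    for value in (bio_reps, tech_reps, fractions):
        if value < 1:
            raise ValueError("all counts must be at least 1")
    return bio_reps * tech_reps * fractions


def label_free_raw_files(conditions, bio_reps, tech_reps, fractions):
    """Raw files in a label-free experiment: each condition is run separately."""
    if conditions < 1:
        raise ValueError("conditions must be at least 1")
    return conditions * labelled_raw_files(bio_reps, tech_reps, fractions)


# Hypothetical design: 3 biological replicates, 2 technical replicates, 4 fractions.
labelled = labelled_raw_files(bio_reps=3, tech_reps=2, fractions=4)
# The same design run label free with 2 conditions.
label_free = label_free_raw_files(conditions=2, bio_reps=3, tech_reps=2, fractions=4)

print(f"labelled: {labelled} raw files, label free: {label_free} raw files")
```

With this design the labelled experiment yields 24 raw files and the label-free one 48, since every condition needs its own set of runs.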

MaxQuant and Proteome Discoverer output files

ProteoSign (PS) takes MaxQuant (MQ) and Proteome Discoverer (PD) files as input to produce quality plots and perform differential analysis using LIMMA. MQ produces many comprehensive output files, including text tables that describe the identified proteins and their abundances. The output tables are located in the txt folder produced by MQ; the two files that PS uses are proteinGroups and evidence. evidence is a table in which each row corresponds to a detected peptide, and the same peptide may appear in several rows in different forms, for example in different charge states. proteinGroups, on the other hand, contains information about the proteins detected in the samples and their calculated abundances. PS combines information from both files to perform a statistical analysis using LIMMA.

In contrast, Proteome Discoverer is a proprietary program that presents a result table to the user and does not save text files automatically as MQ does. However, at the end of an analysis PD shows a report that can be saved with the export -> to text command. Comprehensive instructions on how to do so are provided on the fly in PS.
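To illustrate how the two MaxQuant tables relate to each other, here is a minimal sketch that reads two tiny tab-separated tables and counts the evidence rows and distinct peptides behind each protein group. The data are made up, the column names (`Protein IDs`, `Leading razor protein`, `Sequence`, `Charge`, `Intensity`) are assumed to match the MaxQuant output and may differ between versions, and the joining logic is an illustration only, not the way ProteoSign combines the files.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-ins for proteinGroups.txt and evidence.txt (tab separated).
PROTEIN_GROUPS_TXT = (
    "Protein IDs\tIntensity\n"
    "P12345;Q67890\t3500000\n"
    "P99999\t800000\n"
)
EVIDENCE_TXT = (
    "Sequence\tCharge\tLeading razor protein\tIntensity\n"
    "AAAK\t2\tP12345\t1000000\n"
    "AAAK\t3\tP12345\t500000\n"
    "CCCR\t2\tP12345\t2000000\n"
    "DDDK\t2\tP99999\t800000\n"
)


def read_table(handle):
    """Read a tab-separated table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def evidence_by_protein(evidence_rows):
    """Group evidence rows by the protein they are assigned to."""
    grouped = defaultdict(list)
    for row in evidence_rows:
        grouped[row["Leading razor protein"]].append(row)
    return grouped


protein_groups = read_table(io.StringIO(PROTEIN_GROUPS_TXT))
evidence = evidence_by_protein(read_table(io.StringIO(EVIDENCE_TXT)))

summary = {}
for group in protein_groups:
    # A protein group lists its member proteins separated by semicolons.
    member_ids = group["Protein IDs"].split(";")
    rows = [row for pid in member_ids for row in evidence.get(pid, [])]
    summary[group["Protein IDs"]] = {
        "evidence_rows": len(rows),
        "distinct_peptides": len({row["Sequence"] for row in rows}),
    }

for group_id, counts in summary.items():
    print(group_id, counts)
```

In the made-up data the peptide AAAK appears twice with different charge states, so the first group has three evidence rows but only two distinct peptides. With real files, `io.StringIO(...)` would be replaced by `open(path, newline="")` on the files in the txt folder.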

Quality testing and post processing - ProteoSign’s output

PS runs a statistical analysis on your data using LIMMA, a powerful statistical package used mainly for microarray data. The results are a set of high-quality plots and a results spreadsheet that shows, for each protein, the log2 ratio between every pair of conditions and a p-value for its statistical significance. The spreadsheet also provides further information, such as the number of times a protein was found in each replicate, the intensity of the protein in each condition and replicate, and the log fold ratio in each replicate.

The result plots are well-documented statistical plots. For each pair of conditions a volcano plot is provided, showing the log fold ratio plotted against the respective p-value for each protein.

Figure: A typical volcano plot

Proteins towards the sides are up- or down-regulated, and the higher a protein appears, the higher its statistical significance.

An MA plot plots the binary logarithm of the abundance ratio against the average log intensity of each protein across all replicates. MA plots are excellent for quality testing since they should normally be symmetrical about the x axis, as seen below.

Figure: A typical MA plot

A unique matrix plot is also provided: each tile on the diagonal represents a different replicate and shows the distribution of the log ratios of all the proteins in that replicate. Each tile below the diagonal shows a scatter plot of the proteins identified in the corresponding pair of replicates (for example, the tile below replicate 1 and to the left of replicate 2 shows the scatter plot of proteins in replicates 1 and 2). The scatter should be more or less symmetrical around the y = x line, and the proteins should lie mostly close to this line, indicating low variation between replicates. The tile mirrored across the diagonal shows the corresponding Pearson R, which should be near 1, again indicating low variation.

Figure: A typical matrix plot

Some other plots are also produced, coming from LIMMA itself.
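The quantities behind the MA plot and the matrix plot can be sketched in a few lines. This is a simplified illustration under stated assumptions: the intensities and log ratios are made up, the MA values use the common two-condition form (M is the log2 ratio, A is the mean of the two log2 intensities) rather than an average over all replicates, and `ma_values` and `pearson_r` are hypothetical helpers, not ProteoSign or LIMMA functions.

```python
import math


def ma_values(intensity_a, intensity_b):
    """Return (M, A): the log2 ratio and the mean log2 intensity of one protein."""
    log_a = math.log2(intensity_a)
    log_b = math.log2(intensity_b)
    return log_a - log_b, 0.5 * (log_a + log_b)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# One hypothetical protein measured in two conditions.
m, a = ma_values(4.0e6, 1.0e6)

# Hypothetical log2 ratios of the same five proteins in two replicates.
replicate_1 = [1.9, -0.4, 0.1, -1.2, 0.8]
replicate_2 = [2.1, -0.5, 0.0, -1.0, 0.9]
r = pearson_r(replicate_1, replicate_2)

print(f"M = {m:.2f}, A = {a:.2f}, Pearson R between replicates = {r:.3f}")
```

Replicates that agree well give points close to the y = x line and a Pearson R near 1, which is the pattern the matrix plot is meant to reveal.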