In earlier posts, I talked about how a mass spectrometer works and how sequencing-based proteomics is done using MS. A few readers pointed out that I had covered MS-based shotgun sequencing, but that proteomics can also be done without sequencing. For example, MALDI-TOF analysis can indicate a protein's identity without any sequencing. This is absolutely true. However, such assays are now nearly outdated, and sequence information gives us far more insight than predicting a protein from m/z values alone. I ended the earlier post with a note saying that I would return to the topic and talk about proteogenomics, targeted proteomics and quantitative proteomics. In this post, I will talk about labelling methods for quantitative proteomics, sometimes referred to as differential proteomics. If you have not read my earlier posts on MS, I strongly recommend that you read them first.
Let us build an example scenario. You want to learn what changes occur in a cell after a virus infection. In terms of the proteome, the most likely outcome is that some proteins will show increased expression and others decreased expression as a result of interaction with the virus. If you could find out which proteins those are, there is a good chance that you could predict the pathways that have been disrupted. But to determine the fold change, we have to quantify each protein. In a traditional assay such as quantitative ELISA, the protein is estimated directly by running a set of standards and plotting a standard curve. In proteomics, several thousand proteins are estimated in a single run, so it is not practical to have standards for every individual protein. Moreover, MS was originally designed as a detection method, not a quantitative technique.
MS is a very sensitive technique, and some ions are more readily picked up than others, which means that the peak height or area in a mass spectrum does not by itself accurately reflect the abundance of a peptide in the sample. The main reasons for this are differences in the ionisation efficiency and detectability of peptides. Put loosely, the relationship looks something like this (I will not get into the actual mathematics since that is not relevant here).
Protein concentration = MS abundance value × Error factor
The error factor depends on the run and varies from experiment to experiment. Consider this: if you take one cell lysate and run it 10 times through LC-MS/MS, the result will differ from run to run. In fact, even the number of proteins identified changes significantly, and you can expect a variation of at least 30% between any two runs, as shown by multiple studies. If you run 2 independent batches of LC-MS/MS, a direct comparison between them will be dominated by this error. The better idea is to analyse the proteins from test and control in the same run, so that the error is the same for both. Since the (unknown) error factor is identical in both cases, it cancels out, and the relative fold change can be calculated accurately by comparing the abundance values of the m/z peaks.
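The cancellation argument can be checked with a few lines of Python. This is only a toy sketch, not real data: it assumes the simplified relation given earlier (concentration = abundance × error factor), and every name and number in it is hypothetical.

```python
def observed_abundance(concentration, error_factor):
    """Toy model: concentration = abundance * error_factor, solved for abundance."""
    return concentration / error_factor


def fold_change(test_abundance, control_abundance):
    """Relative fold change of test over control."""
    return test_abundance / control_abundance


# Hypothetical true concentrations (arbitrary units): test is 3x control.
control_conc, test_conc = 2.0, 6.0

# Same run: both samples share one unknown error factor, so it cancels
# and the ratio stays at about 3 whatever the factor happens to be.
same_run_ratios = [
    fold_change(observed_abundance(test_conc, k), observed_abundance(control_conc, k))
    for k in (0.5, 1.0, 1.7)
]

# Separate runs: each sample gets its own error factor, so the ratio is distorted.
separate_run_ratio = fold_change(
    observed_abundance(test_conc, 0.5), observed_abundance(control_conc, 1.7)
)

print(same_run_ratios)
print(separate_run_ratio)
```

In the same-run case the ratio recovers the true 3-fold difference for every error factor tried, while the separate-run ratio is far off, which is the whole motivation for multiplexing samples into one run.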
So, for a valid comparison, all the protein preparations to be compared must be run in a single mass spec run. You then need a way to tell which sample each peptide came from. That is why we label the peptide library obtained from each sample. Say you want to run 5 biological test cases against 5 biological controls: that would be a 10-plex labelling experiment, with each sample carrying a different label. The label tells the MS which sample a peptide originally came from and how much of it was present in that sample.
Fig 1: Hypothetical example of m/z abundance as an indicator of fold change.
Fig 1 is a hypothetical example of m/z abundance as an indicator of fold change. Suppose you are comparing 3 cases against a control sample. The height of each peak represents the peptide abundance. Compared with the control, case 1 is slightly elevated, case 2 is drastically reduced and case 3 is unchanged. This kind of comparison is available for every peptide detected in the MS run. The software then collates the peptide-level findings and presents them as protein expression data relative to the control.
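To make the step from peptide peaks to a protein-level fold change concrete, here is a minimal sketch. The peak heights are made up to mimic Fig 1, and taking the median of peptide log2 ratios is just one simple way to summarise them; I am assuming it for illustration, and real software may well use a different procedure.

```python
import math
from statistics import median

# Hypothetical peak heights (arbitrary units) for three peptides of one protein.
peptide_peaks = [
    {"control": 100.0, "case1": 130.0, "case2": 20.0, "case3": 100.0},
    {"control": 80.0, "case1": 100.0, "case2": 20.0, "case3": 84.0},
    {"control": 50.0, "case1": 70.0, "case2": 10.0, "case3": 48.0},
]


def protein_log2_fold_changes(peaks, reference="control"):
    """Median log2(sample / reference) across peptides, per non-reference sample."""
    samples = [name for name in peaks[0] if name != reference]
    return {
        name: median(math.log2(p[name] / p[reference]) for p in peaks)
        for name in samples
    }


fold_changes = protein_log2_fold_changes(peptide_peaks)
for sample, log2_fc in fold_changes.items():
    print(f"{sample}: log2 fold change = {log2_fc:+.2f}")
```

A positive value means the protein is up relative to control, a negative value means it is down, and a value near zero means no change. Working on the log2 scale keeps a 2-fold increase and a 2-fold decrease symmetric around zero.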
Fig 2: Labelling methods for quantification of proteins in mass spectrometry.
There is a wide variety of labelling methods available. Different sources classify them differently, and the categories overlap in some cases. For simplicity, labelling methods can be broadly classified into 3 subtypes: metabolic, enzymatic and chemical labelling. See Fig 2 for a summarised classification. Covering every method in intricate detail would make this post too long, so I will stick to explaining a few methods that are widely used in biological practice, which should give an idea of what exactly is happening. Chemical labelling is similar to metabolic labelling, except that the label is attached chemically to the peptide after extraction rather than being incorporated metabolically. Enzymatic labelling is close to chemical labelling, except that the label is introduced through an enzymatic reaction.
Stable isotope labelling by amino acids in cell culture (SILAC)
Fig 3: Example light and heavy amino acids for SILAC.
Enzymatic ¹⁸O labelling relies on class-2 proteases, such as trypsin, which catalyse the exchange of two ¹⁶O atoms for two ¹⁸O atoms at the C-terminal carboxyl group of proteolytic peptides. Hydrolysis of a protein in H₂¹⁸O by a protease incorporates one ¹⁸O atom into the carboxyl terminus of each proteolytically generated peptide, and the protease-catalysed exchange can then replace the second oxygen. Despite its simplicity, the method is not in regular use owing to the difficulty of attaining high labelling accuracy.
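The size of the shift that this oxygen exchange produces can be worked out directly from isotope masses. The sketch below uses standard monoisotopic masses rounded to six decimals; the function names are my own, chosen for illustration.

```python
# Monoisotopic masses in daltons (standard values, rounded to 6 decimals).
MASS_O16 = 15.994915
MASS_O18 = 17.999160


def o18_mass_shift(n_atoms):
    """Mass added to a peptide when n_atoms of 16O are replaced by 18O."""
    return n_atoms * (MASS_O18 - MASS_O16)


def o18_mz_shift(n_atoms, charge):
    """Spacing on the m/z axis between unlabelled and labelled peaks."""
    return o18_mass_shift(n_atoms) / charge


for n_atoms in (1, 2):
    for charge in (1, 2, 3):
        print(
            f"{n_atoms} x 18O, charge {charge}+: "
            f"m/z shift = {o18_mz_shift(n_atoms, charge):.4f}"
        )
```

One incorporated ¹⁸O adds roughly 2 Da and two add roughly 4 Da. A peptide population that carries a mix of one and two ¹⁸O atoms therefore spreads its signal over more than one peak, which is one way the labelling becomes hard to interpret.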
Fig 4: Structure of TMT tags.
Isobaric chemical labelling is probably one of the most commonly used labelling methods. Let us take the example of TMT (tandem mass tags). The labels are isobaric compounds (they all have the same net mass) that carry a peptide-reactive group.
Each chemical tag has three parts: a mass reporter region, which contains a different number of heavy isotopes in each tag and so yields a unique reporter mass during tandem MS/MS for sample identification and relative quantitation; a mass normaliser, which balances the reporter so that the total mass of every tag is the same; and a reactive group, which attaches the tag to the peptide.
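Once the reporter ions are released in MS/MS, relative quantitation comes down to comparing their intensities. Here is a rough sketch of that idea. The intensities, the channel-to-sample assignment and the function name are all hypothetical, the channels are named by their nominal reporter masses, and real pipelines apply further corrections that are left out here.

```python
# Hypothetical reporter-ion intensities from one MS/MS spectrum of one peptide.
# Keys are nominal reporter masses used as channel names; values are made up.
reporter_intensities = {"126": 1000.0, "127": 1500.0, "128": 250.0, "129": 1000.0}

# Hypothetical assignment of channels to samples (decided at labelling time).
channel_to_sample = {"126": "control", "127": "case1", "128": "case2", "129": "case3"}


def relative_abundance(intensities, channel_map, reference="control"):
    """Reporter intensity of each sample divided by the reference sample's."""
    by_sample = {channel_map[ch]: value for ch, value in intensities.items()}
    ref_value = by_sample[reference]
    if ref_value <= 0:
        raise ValueError("reference channel has no signal")
    return {sample: value / ref_value for sample, value in by_sample.items()}


ratios = relative_abundance(reporter_intensities, channel_to_sample)
print(ratios)
```

Because the tags are isobaric, the same peptide from every sample is selected together as one precursor, and only the reporter region tells the samples apart after fragmentation.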
I have limited the discussion of labelling methods to the bare essentials to give you an idea of how the system works. I recommend reading the references below for a detailed picture of the process.
Tabb et al. Repeatability and Reproducibility in Proteomic Identifications by Liquid Chromatography-Tandem Mass Spectrometry. Journal of Proteome Research. 2010;9(2):761. doi: 10.1021/pr9006365
Ong S, Mann M. A practical recipe for stable isotope labeling by amino acids in cell culture (SILAC). Nature Protocols. 2007;1(6):2650-2660.
Rauniyar N, Yates J. Isobaric Labeling-Based Relative Quantification in Shotgun Proteomics. Journal of Proteome Research. 2014;13(12):5293-5309.