Quickstart
This is a quickstart guide for ProteoSign (PS).
File extraction and preparation
Before using PS, make sure you have exported your data correctly from Proteome Discoverer (PD) or MaxQuant (MQ).
Proteome Discoverer
Proteome Discoverer does not produce any output files automatically, so a PSM file must be exported manually. The procedure depends on the PD version. For older versions, adjust the ratio calculation values and disable peptide grouping, then go to File -> export -> to text to get a table file containing all the necessary information. For later versions of PD, export the PSM information under the PSM tab to tabular format. Since PS is based on comparing the variation between biological and technical replicates and between conditions, make sure that the files from all MS runs, covering every biological replicate, technical replicate, condition and fraction, were fed to Proteome Discoverer for the analysis.
MaxQuant
MaxQuant automatically produces result files that are saved in a folder named “txt”. The evidence and proteinGroups files are the ones PS needs. Also make sure that all MS runs for all replicates, conditions and fractions are fed to MQ at once, so that the result files contain the merged information for all MS runs.
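Before uploading, it can help to confirm that the export folder holds both files. The following is a minimal Python sketch and not part of PS itself: the helper name missing_maxquant_files and the example folder path are hypothetical, while the file names evidence.txt and proteinGroups.txt are the ones PS asks for in Step 1.

```python
from pathlib import Path

# The two MaxQuant result files that PS asks for in Step 1.
REQUIRED_FILES = ("evidence.txt", "proteinGroups.txt")


def missing_maxquant_files(txt_dir):
    """Return the required file names that are absent from MaxQuant's "txt" folder."""
    folder = Path(txt_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# Hypothetical location of a MaxQuant "txt" folder; replace it with your own path.
missing = missing_maxquant_files("combined/txt")
if missing:
    print("Missing from the txt folder:", ", ".join(missing))
else:
    print("Both files found; ready to upload to PS.")
```

The check only looks at file names, so it does not verify that the files actually cover all MS runs of the experiment.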
Running a simple comparison analysis
The following instructions show how to run a simple analysis. We will use a test dataset (SILAC 2plex MQ): a two-condition dataset from MaxQuant with 72 MS runs in total, that is 3 biological replicates with 2 technical replicates each, every sample fractionated into 12 fractions.
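The run count follows directly from the design: 3 biological replicates x 2 technical replicates x 12 fractions = 72 MS runs. A short Python sketch of that arithmetic is given here; the function name enumerate_runs and the "Brep/Trep/Frac" labels are illustrative assumptions and not the actual rawfile names of the test dataset.

```python
from itertools import product

BIO_REPLICATES = 3
TECH_REPLICATES = 2
FRACTIONS = 12


def enumerate_runs(n_bio, n_tech, n_frac):
    """List one (bio replicate, tech replicate, fraction) tuple per MS run."""
    return list(
        product(range(1, n_bio + 1), range(1, n_tech + 1), range(1, n_frac + 1))
    )


runs = enumerate_runs(BIO_REPLICATES, TECH_REPLICATES, FRACTIONS)
print(len(runs))  # 72

# Illustrative labels only; real rawfile names depend on your acquisition setup.
labels = [f"Brep{b} Trep{t} Frac{f}" for b, t, f in runs]
print(labels[0])   # Brep1 Trep1 Frac1
print(labels[-1])  # Brep3 Trep2 Frac12
```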
Using the web interface
Whether you installed PS on a single computer or on a local server, or you are using PS as a web application, you will work with it through our web interface. We recommend keeping our help page open alongside, in case you want to follow screenshots of the procedure in parallel.
- First upload your data in Step 1: upload the PSM file from PD, or evidence.txt and proteinGroups.txt from MQ. A progress bar is shown until the files are uploaded, followed by the message “Processing please wait…”. ProteoSign automatically detects the type of experiment and the processing software (PD or MQ). It also detects all rawfiles contained (one per MS run) and all labels and tags, if any. If you do not have an experiment ready and just want to see how PS works, you can use one of our examples: click on choose in Step 1 and pick an experiment that suits your needs. Note that for the example datasets all options in the following steps are filled in automatically.
- Define your experimental structure in Step 2: As described in Quantitative proteomics analysis basics, a typical proteomics experiment consists of MS runs that correspond to different biological replicates, technical replicates, fractions and possibly different conditions (see the diagrams “A diagram showing a typical labelled experiment” and “A diagram showing a typical label free experiment”). In Step 2 you describe this structure to PS so that it knows where the data derive from. In a labelled or tagged experiment you only need to define the biological and technical replicate of each rawfile. For example, in the experiment described by the flowchart above, the rawfiles “Brep1 Trep1 Frac1”, “Brep1 Trep1 Frac2” and “Brep1 Trep1 Frac3” should all be assigned to Bio replicate # = 1 and Tech replicate # = 1: select the rawfiles by clicking on them and use the text boxes on the right to assign the replicate numbers. PS automatically assigns different fraction numbers to these rawfiles; the order of the fraction numbers does not matter. If your experiment is label free, different rawfiles also represent different conditions. To assign a condition to some rawfiles, select them, right click on them and select “assign condition”. Type a description of the condition and hit the Ok button. Then assign the biological and technical replicates as above. The next button is enabled once all necessary information is given. Note that PS currently needs at least two biological replicates to complete the analysis, so the next button may stay disabled if you provided only one biological replicate.
- Define your experiment’s parameters in Step 3: PS accepts two descriptive texts for your experiment: experiment ID and experiment description. The reason is that you may want to run many analyses in parallel, for example at different time points, and gather the results afterwards to extract more information. These parallel experiments might share a common ID but have different descriptions (e.g. 0h, 6h etc.); all result files contain both the ID and the description in their filename. In the same step, highlight the conditions you want to compare (at least one pair should be highlighted). To perform more than one pairwise comparison, select more than two conditions and all possible comparisons will be made. More advanced options are described below:
- Hit the Submit button and wait for the analysis to be completed
- After the analysis completes you can preview the main plots returned by PS. Hitting the next button gives you a download link for all the results. Most of them are plots of all comparisons in PDF format, as described in Quantitative proteomics analysis basics: Quality testing and post processing - ProteoSign’s output. The results also contain several tabular files that can be opened with a spreadsheet application such as Microsoft Excel. The most comprehensive of them is “results” ([Exp_ID]results[Exp_description].txt), which compares the abundance of all proteins between conditions. The following list describes all columns in this file:
- Protein: a short description of the protein
- avg log2 [condition]/[condition]: the binary logarithm of the abundance ratio of the protein; a value above 1 or below -1 (more than a two-fold change in either direction) usually indicates a notable difference in abundance between the conditions
- P-value adjusted [condition]/[condition]: the adjusted p-value of the analysis for the specific protein (adjusted using Bonferroni correction); a value below 0.05 usually indicates a statistically significant abundance ratio
- [replicate] Ratio counts: how many times a ratio between conditions was calculated in a specific replicate
- A: the average abundance (intensity) of the protein in all replicates
- Coef [condition]/[condition]: limma fits a linear regression model for each protein across the different replicates; coef is the regression coefficient of this model. The terms [condition]/[condition] might not appear if only 2 conditions are compared
- t [condition]/[condition]: a t statistic for the protein, testing the null hypothesis that it is not differentially expressed in a specific comparison (e.g. Heavy vs Light). The terms [condition]/[condition] might not appear if only 2 conditions are compared
- P-value [condition]/[condition]: the p-value of the t statistic, used to infer the statistical significance of the abundance difference. The aforementioned adjusted p-value is a corrected version of this p-value. The terms [condition]/[condition] might not appear if only 2 conditions are compared
- F: the F statistic is useful only when a protein is tested in multiple comparisons (e.g. Heavy vs Light and Medium vs Light etc.). It tests whether the protein is not differentially expressed in any of the comparisons, i.e. it is an overall test of significance for the protein.
- F p-value: the p value for the F statistic
- [condition] [replicate number]: the quantification value (the estimated concentration) of the protein in a specific condition and replicate. Note that the replicate number is not a description of the replicate but its index: if you sort the biological and technical replicates of your experiment in ascending order, the number in these columns is the corresponding position (e.g. b1t1 -> 1, b1t2 -> 2, b2t1 -> 3, b2t2 -> 4).
- N: the number of condition-replicate pairs in which the protein was quantified
- sd log2 [condition]/[condition]: the standard deviation of the residuals in the linear model for a specific comparison
- N log2 [condition]/[condition]: the number of abundance ratios computed for a specific comparison
- log2 [condition]/[condition] [replicate number]: the log2 intensity ratio for a specific replicate. Note again that the replicate number is not a description of the replicate but its index (see [condition] [replicate number] above for more info)
- The rest of the tabular files returned are for diagnostic purposes and mainly give insight into how PS uses limma. Other notable files are “diffexp” ([Exp_ID]diffexp[Exp_description].txt), which gives only brief information on the differentially expressed proteins, and the parameters file ([Exp_ID]parameters_from_session[session_number].txt), which is produced automatically and can be used to enter the same parameters in a future PS analysis of the same experiment.
- Hitting the Reset button will restart ProteoSign ready for a new analysis
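Two points from the column list above can be sketched in a few lines of Python: how the replicate index in the column names relates to the biological and technical replicate numbers, and how the usual thresholds (avg log2 ratio above 1 or below -1, adjusted p-value below 0.05) can be applied to the results table. This is an illustration under assumptions, not PS code: the function names, the tab delimiter, the condition labels H and L and the sample rows are all hypothetical, so check the header of your own results file before reusing it.

```python
import csv
import io


def replicate_indices(replicates):
    """Map each (bio, tech) pair to its 1-based position in ascending order."""
    ordered = sorted(set(replicates))
    return {pair: position for position, pair in enumerate(ordered, start=1)}


def significant_proteins(table_text, ratio_col, padj_col,
                         min_abs_log2=1.0, max_padj=0.05):
    """Return the Protein entries that pass both thresholds."""
    # Assumption: the results file is tab-delimited with a header row.
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = []
    for row in reader:
        try:
            log2_ratio = float(row[ratio_col])
            padj = float(row[padj_col])
        except ValueError:
            continue  # skip rows with missing or non-numeric values
        if abs(log2_ratio) > min_abs_log2 and padj < max_padj:
            hits.append(row["Protein"])
    return hits


# Replicate indexing: b1t1 -> 1, b1t2 -> 2, b2t1 -> 3, b2t2 -> 4
print(replicate_indices([(1, 1), (1, 2), (2, 1), (2, 2)]))

# Made-up rows with hypothetical column names for an H vs L comparison.
SAMPLE = (
    "Protein\tavg log2 H/L\tP-value adjusted H/L\n"
    "Protein A\t1.80\t0.003\n"
    "Protein B\t0.20\t0.900\n"
    "Protein C\t-2.10\t0.010\n"
    "Protein D\t1.50\tNA\n"
)
print(significant_proteins(SAMPLE, "avg log2 H/L", "P-value adjusted H/L"))
```

With the made-up rows, only Protein A and Protein C pass both thresholds; Protein D is skipped because its adjusted p-value is not numeric. To use the sketch on a real file, read its contents into a string and pass the exact column names from its header.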
Saving and Loading experimental parameters
A text file containing your experimental structure and all your options can be saved by hitting the Save parameters link (bottom right corner) in Steps 3 and 4. Also, as described above, a parameters file is automatically saved in the results folder when an analysis completes. If you want to rerun the analysis with slightly different options, you can re-upload the same experiment and load the initial options using the load parameters file link in Steps 2 or 3.
Give us feedback or contact us
To contact the PS team, hit the contact link in ProteoSign’s bottom right corner, send an e-mail to msdiffexp@gmail.com or (if you use our web service) use the feedback option after completing the analysis to rate PS and send us your thoughts.