Differential expression format
Use cases
Store the differentially expressed proteins between two conditions (a contrast), with the corresponding fold changes and p-values.
Enable easy visualization with tools such as volcano plots.
Enable easy integration with other omics data resources.
Store metadata information about the project, the workflow and the columns in the file.
Format
The quantms differential expression format is based on the MSstats output. The MSstats format is a tab-delimited file that contains the following fields (see example file):
protein -> Protein accession
label -> Label of the contrast on which the fold changes and p-values are based
log2fc -> Log2 fold change
se -> Standard error of the log2 fold change
df -> Degrees of freedom of the Student's t-test
pvalue -> Raw p-values
adj.pvalue -> P-values adjusted among all the proteins in the specific comparison using the approach by Benjamini and Hochberg
issue -> Shows whether there is any issue for inference in the corresponding protein and comparison, for example, OneConditionMissing or CompleteMissing
Example:
protein | label | log2fc | se | df | pvalue | adj.pvalue | issue |
---|---|---|---|---|---|---|---|
LV861_HUMAN | normal-squamous cell carcinoma | 0.60 | 0.87 | 8 | 0.51 | 0.62 | NA |
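As a minimal sketch of how such a file could be read, the snippet below uses only the Python standard library. The function name `read_de_table` is hypothetical, and the sample content is built from the example row above. It assumes that lines starting with `#` are header lines and that missing values are written as NA.

```python
import csv
import io

# Columns that hold floating-point values in the format described above.
NUMERIC_FIELDS = ("log2fc", "se", "pvalue", "adj.pvalue")


def read_de_table(handle):
    """Yield one dict per protein/contrast row, skipping '#' header lines.

    Hypothetical helper for illustration; not part of quantms or MSstats.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    for row in reader:
        for field in NUMERIC_FIELDS:
            # Assumption: missing values are written as NA; map them to None.
            row[field] = None if row[field] in ("NA", "") else float(row[field])
        row["df"] = None if row["df"] in ("NA", "") else int(row["df"])
        yield row


# Sample content assembled from the example row in this section.
example = (
    "#project_accession=PXD000000\n"
    "protein\tlabel\tlog2fc\tse\tdf\tpvalue\tadj.pvalue\tissue\n"
    "LV861_HUMAN\tnormal-squamous cell carcinoma\t0.60\t0.87\t8\t0.51\t0.62\tNA\n"
)
rows = list(read_de_table(io.StringIO(example)))
```

The `issue` column is left as a string, so a value such as NA or OneConditionMissing can be inspected directly by the caller.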
DE Header
By default, the MSstats format does not include a metadata header. We
suggest adding a header to the output to make the file easier to
understand. MSstats allows comments in the file if the line starts
with `#`. The quantms output starts with key-value pairs that
describe the project, the workflow, and the columns in the file. For
example:
#project_accession=PXD000000
In addition, for each default column of the matrix, the following
information should be added:
#INFO=<ID=protein, Number=inf, Type=String, Description="Protein Accession">
#INFO=<ID=label, Number=1, Type=String, Description="Label for the Conditions combination">
#INFO=<ID=log2fc, Number=1, Type=Double, Description="Log2 Fold Change">
#INFO=<ID=se, Number=1, Type=Double, Description="Standard error of the log2 fold change">
#INFO=<ID=df, Number=1, Type=Integer, Description="Degree of freedom of the Student test">
#INFO=<ID=pvalue, Number=1, Type=Double, Description="Raw p-values">
#INFO=<ID=adj.pvalue, Number=1, Type=Double, Description="P-values adjusted among all the proteins in the specific comparison using the approach by Benjamini and Hochberg">
#INFO=<ID=issue, Number=1, Type=String, Description="Issue column shows if there is any issue for inference in corresponding protein and comparison">
The `ID` is the column name in the matrix, the `Number` is the number of values in the column (separated by `;`), the `Type` is the type of the values in the column, and the `Description` is a description of the column. The number of values in the column can range from 1 to `inf` (infinity).
Protein groups are written as a list of protein accessions separated by `;` (e.g. `P12345;P12346`).
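The column descriptions can be recovered from the header with a small parser. The following is a sketch under the assumption that every `#INFO` line follows the `key=value` layout shown above, with only the `Description` value quoted. The names `parse_info_line`, `INFO_LINE` and `FIELD` are hypothetical.

```python
import re

# Matches a whole '#INFO=<...>' line and captures the part inside '<' and '>'.
INFO_LINE = re.compile(r"^#INFO=<(?P<body>.*)>$")
# Matches key=value pairs; a value is either double-quoted or runs to the next comma.
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|([^,]+))')


def parse_info_line(line):
    """Return the fields of one '#INFO' line as a dict, or None for other lines."""
    match = INFO_LINE.match(line.strip())
    if match is None:
        return None
    fields = {}
    for key, quoted, bare in FIELD.findall(match.group("body")):
        # Quoted values may contain commas; bare values are stripped of spaces.
        fields[key] = quoted if quoted else bare.strip()
    return fields


info = parse_info_line(
    '#INFO=<ID=log2fc, Number=1, Type=Double, Description="Log2 Fold Change">'
)
```

Lines that are not `#INFO` entries, such as `#project_accession=PXD000000`, return `None`, so the same loop can route them to a separate key-value handler.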
We suggest including the following properties in the header:
project_accession: The project accession in PRIDE Archive
project_title: The project title in PRIDE Archive
project_description: The project description in PRIDE Archive
quantms_version: The version of the quantms workflow used to generate the file
factor_value: The factor values used in the analysis (e.g. `phenotype`)
fdr_threshold: The FDR threshold used to filter the protein lists (e.g. `adj.pvalue < 0.05`)
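To illustrate how these properties might be written and read back, the sketch below serializes a dictionary into `#key=value` lines and parses them again. The helper names `build_header` and `parse_properties` are hypothetical, and the property values are placeholders taken from the examples in this section, not real project data.

```python
def build_header(properties):
    """Serialize properties as '#key=value' lines, one property per line."""
    lines = []
    for key, value in properties.items():
        # Keep each value on a single line so every header line starts with '#'.
        text = str(value).replace("\n", " ")
        lines.append(f"#{key}={text}")
    return "\n".join(lines) + "\n"


def parse_properties(text):
    """Collect '#key=value' header lines into a dict, ignoring '#INFO' lines."""
    props = {}
    for line in text.splitlines():
        if line.startswith("#") and not line.startswith("#INFO=") and "=" in line:
            # Split on the first '=' only, so values may contain '=' or '<'.
            key, _, value = line[1:].partition("=")
            props[key.strip()] = value.strip()
    return props


# Placeholder values based on the examples given in this section.
header = build_header({
    "project_accession": "PXD000000",
    "factor_value": "phenotype",
    "fdr_threshold": "adj.pvalue < 0.05",
})
properties = parse_properties(header)
```

Splitting on the first `=` only is a deliberate choice here: it keeps a threshold expression intact even if it contains comparison operators.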
A complete example of a quantms output file can be seen here.