Peptide table format
Use cases
The peptide table captures peptide-level details, including peptide intensity. Most of its content comes from the peptide section of mzTab.
Store peptide intensities for downstream analysis and integration.
Enable easy visualization and scanning at the peptide level.
Format
For large-scale datasets, the peptide section can become very large. The table is therefore stored in Parquet format, and its data section mainly consists of the following columns:
sequence: Peptide sequence -> string
protein_accessions: A list of protein accessions -> list[string] (e.g. [P02768, P02769])
unique: Indicates whether the peptide is unique for this protein with respect to the searched database -> boolean (0/1)
best_id_score: A key/value pair holding the best search engine score selected by the algorithm (e.g. "MS-GF:RawScore": 234.0) -> string
posterior_error_probability: Posterior error probability (PEP) score -> double
modifications: A list of modifications for a given peptide -> [modification1, modification2, ...]. Each modification should be recorded as a string, similarly to mzTab: {position}({Probabilistic Score:0.9})|{position2}|..-{modification accession or name}, e.g. 1(Probabilistic Score:0.9)|2|3-UNIMOD:35
charge: Precursor charge -> int
exp_mass_to_charge: The precursor's experimental mass-to-charge ratio (m/z) -> double
peptidoform: Peptidoform of the peptide, e.g. PEPTIDE[+80.0]FORM -> string
sample_accession: A unique sample accession corresponding to the source name in the SDRF -> string
abundance: The peptide's abundance in the given sample -> float
is_decoy: Indicates whether the peptide sequence is a decoy -> boolean (0/1)
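The modification string layout described for the modifications column can be illustrated with a small parser. This is only a sketch based on the single example given here: the names `ModSite` and `parse_modification` are hypothetical, and it assumes that positions are non-negative integers, that an optional parenthesised annotation has the form `name:value`, and that annotations never contain `|` or `)`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# One site: a position, optionally followed by "(score name:value)".
# Assumption: the annotation never contains "|" or ")".
_SITE = r"\d+(?:\([^)|]*\))?"
# Whole string: sites joined by "|", a "-", then the accession or name.
_MOD_RE = re.compile(rf"^(?P<sites>{_SITE}(?:\|{_SITE})*)-(?P<name>.+)$")
_SITE_RE = re.compile(r"(?P<pos>\d+)(?:\((?P<detail>[^)|]*)\))?")


@dataclass
class ModSite:
    """A candidate position for a modification (hypothetical helper type)."""
    position: int
    score_name: Optional[str] = None
    score: Optional[float] = None


def parse_modification(text: str) -> tuple[str, list[ModSite]]:
    """Split a modification string into (accession_or_name, sites)."""
    match = _MOD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised modification string: {text!r}")
    sites = []
    for part in match.group("sites").split("|"):
        site = _SITE_RE.fullmatch(part)
        detail = site.group("detail")
        if detail is None:
            sites.append(ModSite(int(site.group("pos"))))
            continue
        # Split on the last colon so the score name may itself contain colons.
        label, _, value = detail.rpartition(":")
        sites.append(ModSite(int(site.group("pos")), label or None, float(value)))
    return match.group("name"), sites


name, sites = parse_modification("1(Probabilistic Score:0.9)|2|3-UNIMOD:35")
print(name, [s.position for s in sites])
```

Parsing the sites with an explicit pattern, rather than splitting on the first hyphen, keeps the parser from mis-splitting when the modification name itself contains a hyphen.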
Optional fields:
number_of_psms: Number of PSMs for the peptide in the given sample (sample_accession) -> int
retention_time: Retention time in seconds; it can be the median across all retention times in the peptide quantification features -> float
gene_accessions: A list of gene accessions -> list[string] (e.g. [ENSG00000139618, ENSG00000139618])
gene_names: A list of gene names -> list[string] (e.g. [APOA1, APOA1])
consensus_support: Global consensus support score across multiple search engines -> float
id_scores: A list of identification scores from search engines, Percolator, etc. Each score is a key/value pair (e.g. "MS-GF:RawScore": 78.9) -> list[string]
reference_file_name: The name of the reference file that contains the spectrum -> string
scan_number: The scan number or index of the spectrum in the file -> string
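Since best_id_score and id_scores store each score as a key/value string, a reader will usually want to turn them back into names and numbers. The sketch below assumes the entries look exactly like the examples above (a quoted score name, a colon, then a numeric value); the spec does not state the serialisation more precisely, and `parse_score` and `parse_scores` are hypothetical helper names.

```python
from __future__ import annotations


def parse_score(entry: str) -> tuple[str, float]:
    """Parse one entry such as '"MS-GF:RawScore": 78.9' into (name, value)."""
    # Split on the LAST colon: score names like "MS-GF:RawScore" contain colons.
    name, sep, value = entry.rpartition(":")
    name = name.strip().strip('"')
    if not sep or not name:
        raise ValueError(f"not a key/value score entry: {entry!r}")
    return name, float(value)


def parse_scores(entries: list[str]) -> dict[str, float]:
    """Parse an id_scores list into a {score name: value} mapping."""
    return dict(parse_score(entry) for entry in entries)


best_name, best_value = parse_score('"MS-GF:RawScore": 234.0')
all_scores = parse_scores(['"MS-GF:RawScore": 78.9'])
print(best_name, best_value, all_scores)
```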