Peptide table format
Use cases
The peptide table captures peptide-level detail, including peptide intensity. Most of its content comes from the peptide section of mzTab.
Store peptide intensities for downstream analysis and integration.
Enable easy visualization and scanning at the peptide level.
Format
For large-scale datasets, a peptide section can become very large. The table is therefore stored in Parquet format, and its data section mainly consists of the following columns:
sequence: Peptide sequence -> string
protein_accessions: A list of protein accessions -> list[string] (e.g. [P02768, P02769])
unique: Indicates whether the peptide is unique to this protein with respect to the searched database -> boolean (0/1)
best_id_score: A key/value pair holding the best search engine score selected by the algorithm (e.g. "MS-GF:RawScore": 234.0) -> string
posterior_error_probability: Posterior error probability (PEP) score -> double
modifications: A list of modifications for a given peptide -> [modification1, modification2, ...]. Each modification should be recorded as a string, similarly to mzTab: {position}({Probabilistic Score:0.9})|{position2}|..-{modification accession or name}, e.g. 1(Probabilistic Score:0.9)|2|3-UNIMOD:35
charge: Precursor charge -> int
exp_mass_to_charge: The precursor’s experimental mass-to-charge ratio (m/z) -> double
peptidoform: Peptidoform of the peptide, e.g. PEPTIDE[+80.0]FORM -> string
sample_accession: A unique sample accession corresponding to the source name in the SDRF -> string
abundance: The peptide’s abundance in the given sample -> float
is_decoy: Indicates whether the peptide sequence is a decoy -> boolean (0/1)
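The modification string described for the modifications column can be split into its sites and its accession with a small parser. The following is a minimal sketch using only the Python standard library. The names Site, Modification and parse_modification are hypothetical and not part of any published API, and the grammar is inferred from the single example above (integer positions, an optional "label:score" in parentheses, "|" between sites, and the first "-" after the sites separating them from the accession or name).

```python
import re
from typing import List, NamedTuple, Optional


class Site(NamedTuple):
    position: int                # position of the modified residue
    score_label: Optional[str]   # e.g. "Probabilistic Score", if present
    score: Optional[float]       # e.g. 0.9, if present


class Modification(NamedTuple):
    sites: List[Site]
    name: str                    # accession or name, e.g. "UNIMOD:35"


# One site: digits, optionally followed by "(label:score)".
# Assumption: labels and scores never contain "(", ")" or "|".
_SITE_RE = re.compile(r"(\d+)(?:\(([^:()|]+):([^()|]+)\))?")


def parse_modification(text: str) -> Modification:
    """Parse a string such as '1(Probabilistic Score:0.9)|2|3-UNIMOD:35'."""
    text = text.strip()
    sites: List[Site] = []
    pos = 0
    while True:
        match = _SITE_RE.match(text, pos)
        if match is None:
            raise ValueError(f"expected a site at offset {pos}: {text!r}")
        position, label, score = match.groups()
        sites.append(
            Site(int(position), label, float(score) if score is not None else None)
        )
        pos = match.end()
        if text.startswith("|", pos):
            pos += 1  # another site follows
            continue
        break
    # Everything after the separating "-" is the accession or name; it may
    # itself contain hyphens, which is why we do not split on the last "-".
    if not text.startswith("-", pos) or pos + 1 >= len(text):
        raise ValueError(f"missing '-<accession or name>' part: {text!r}")
    return Modification(sites, text[pos + 1:])


example = parse_modification("1(Probabilistic Score:0.9)|2|3-UNIMOD:35")
```

Scanning the sites from the left, instead of splitting the whole string on "-", keeps the parser usable for names that contain a hyphen and for scores written in scientific notation.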
Optional fields:
number_of_psms: Number of PSMs for the peptide in the given sample (sample_accession) -> int
retention_time: Retention time in seconds; this can be the median across all retention times in the feature -> float
gene_accessions: A list of gene accessions -> list[string] (e.g. [ENSG00000139618, ENSG00000139618])
gene_names: A list of gene names -> list[string] (e.g. [APOA1, APOA1])
consensus_support: Global consensus support score across multiple search engines -> float
id_scores: A list of identification scores (search engine, Percolator, etc.). Each score is a key/value pair (e.g. "MS-GF:RawScore": 78.9) -> list[string]
reference_file_name: The name of the reference file that contains the spectrum -> string
scan_number: The scan number or index of the spectrum in the file -> string
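Both best_id_score and id_scores store scores as key/value strings, and the score name itself can contain a colon (as in "MS-GF:RawScore"). The sketch below shows one way to read such entries back in Python. It assumes the stored string looks exactly like the examples above, with an optionally quoted name, a colon, then a number; the source does not spell out the serialization, so treat this layout as an assumption. The function names parse_score and parse_scores are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_score(entry: str) -> Tuple[str, float]:
    """Split one entry such as '"MS-GF:RawScore": 234.0' into (name, value)."""
    # Split on the LAST colon: the score name may contain colons itself.
    key, sep, value = entry.rpartition(":")
    name = key.strip().strip('"')
    if not sep or not name:
        raise ValueError(f"not a '<name>: <value>' entry: {entry!r}")
    return name, float(value)  # float() tolerates surrounding whitespace


def parse_scores(entries: Iterable[str]) -> Dict[str, float]:
    """Turn an id_scores list into a mapping from score name to value."""
    return dict(parse_score(entry) for entry in entries)


best = parse_score('"MS-GF:RawScore": 234.0')
all_scores = parse_scores(['"MS-GF:RawScore": 78.9'])
```

Splitting on the first colon would cut "MS-GF:RawScore" in two, which is why rpartition is used here.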