Peptide quantification features

Use cases

The feature table (peptide features) captures peptide-level detail, including peptide intensity in relation to the sample metadata. The feature parquet file combines the MSstats, mzTab and Triqler peptide tables. It is intended to:

Store peptide intensities in relation to the sample metadata to perform downstream analysis and integration. This file can be used as input to MSstats and ibaqpy for protein quantification.
Enable peptide-level statistics and algorithms to move from peptide level to protein level.

NOTE: quantms also releases the peptide table for MSstats. The objective of the feature table is to provide a more general peptide table and to improve the peptide annotations with more columns.
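As a rough illustration of the second use case, the sketch below rolls peptide intensities up to protein level using only the Python standard library. It is a deliberately naive sum, not the method used by MSstats or ibaqpy. The rows are toy values, and the decision to skip decoys and non-unique peptides is an assumption made for this example.

```python
from collections import defaultdict

# Toy feature rows (hypothetical values) using column names from this format.
rows = [
    {"sample_accession": "S1", "protein_accessions": ["P1"], "unique": 1, "is_decoy": 0, "intensity": 100.0},
    {"sample_accession": "S1", "protein_accessions": ["P1"], "unique": 1, "is_decoy": 0, "intensity": 50.0},
    {"sample_accession": "S1", "protein_accessions": ["P1", "P2"], "unique": 0, "is_decoy": 0, "intensity": 999.0},
    {"sample_accession": "S2", "protein_accessions": ["P1"], "unique": 1, "is_decoy": 0, "intensity": 25.0},
    {"sample_accession": "S2", "protein_accessions": ["P3"], "unique": 1, "is_decoy": 1, "intensity": 10.0},
]


def sum_unique_peptide_intensity(feature_rows):
    """Sum intensities of unique, non-decoy peptides per (sample, protein)."""
    totals = defaultdict(float)
    for row in feature_rows:
        # Assumption for this sketch: ignore decoys and shared peptides.
        if row["is_decoy"] == 1 or row["unique"] != 1:
            continue
        # A unique peptide maps to a single accession, so take the first one.
        protein = row["protein_accessions"][0]
        totals[(row["sample_accession"], protein)] += row["intensity"]
    return dict(totals)


protein_totals = sum_unique_peptide_intensity(rows)
```

Here `protein_totals` is keyed by `(sample_accession, protein accession)`, so the result can be pivoted into a sample-by-protein matrix if needed.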

Format

Peptide properties and columns:

sequence: The peptide’s sequence corresponding to the feature. This peptide sequence does not include post-translational modifications -> string
unique: Indicates whether the peptide sequence is unique for this protein with respect to the searched database -> boolean (0/1)
modifications: A list of modifications for a given peptide [modification1, modification2, ...]. Each modification should be recorded as a string, as described in Modifications -> list[string]
charge: The charge assigned by the search engine/software -> integer
calc_mass_to_charge: The PSM’s calculated (theoretical) mass to charge (m/z) -> double
exp_mass_to_charge: The PSM’s experimental mass to charge (m/z) -> double
peptidoform: Peptidoform of the PSM. See Peptidoform for more details -> string
posterior_error_probability: Posterior Error Probability score from quantms -> double
global_qvalue: Global q-value of the peptide identification for the feature in the experiment -> double
is_decoy: Indicates whether the peptide sequence (coming from the PSM) is a decoy -> boolean (0/1)
intensity: The abundance of the peptide in the sample -> float
spectral_count: The number of spectra that match the peptide, that is, the number of PSMs for a given peptidoform (peptide sequence + charge + modifications) in a given file. If the peptidoform in the file is the product of an inference process such as match between runs, the value must be 0; if the value is not computed or not provided, it must be NA or Null -> integer
retention_time: The retention time of the feature -> float
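The peptide-level columns above can be checked with a small validator. The following is a minimal sketch in plain Python, assuming each row has already been loaded into a dictionary; the names `PEPTIDE_COLUMNS`, `spectral_count_value` and `validate_peptide_row` are hypothetical helpers, not part of quantms, and the example values are not real measurements.

```python
# Expected Python types for the peptide-level columns listed above.
PEPTIDE_COLUMNS = {
    "sequence": str,
    "unique": int,                        # boolean encoded as 0/1
    "modifications": list,
    "charge": int,
    "calc_mass_to_charge": float,
    "exp_mass_to_charge": float,
    "peptidoform": str,
    "posterior_error_probability": float,
    "global_qvalue": float,
    "is_decoy": int,                      # boolean encoded as 0/1
    "intensity": float,
    "spectral_count": (int, type(None)),  # None stands for NA/Null
    "retention_time": float,
}


def spectral_count_value(psm_count, inferred):
    """Encode spectral_count following the rule in its column description."""
    if inferred:
        return 0       # e.g. feature transferred by match between runs
    return psm_count   # None when the count was not computed or provided


def validate_peptide_row(row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    for name, expected in PEPTIDE_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected):
            problems.append(f"wrong type for {name}: {type(row[name]).__name__}")
    for flag in ("unique", "is_decoy"):
        if flag in row and row[flag] not in (0, 1):
            problems.append(f"{flag} must be 0 or 1")
    return problems


# Hypothetical example row; the numbers are placeholders.
example_row = {
    "sequence": "PEPTIDEK",
    "unique": 1,
    "modifications": [],
    "charge": 2,
    "calc_mass_to_charge": 464.7,
    "exp_mass_to_charge": 464.7,
    "peptidoform": "PEPTIDEK",
    "posterior_error_probability": 0.001,
    "global_qvalue": 0.005,
    "is_decoy": 0,
    "intensity": 12345.6,
    "spectral_count": spectral_count_value(3, inferred=False),
    "retention_time": 1520.4,
}

problems = validate_peptide_row(example_row)
```

The type check is intentionally strict (an integer in a `double` column is reported), which may be too rigid for real data and can be relaxed as needed.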

Properties and columns from sample:

sample_accession: The sample accession in the SDRF, taken from the source name column -> string
condition: The value of the factor value column in the SDRF, for example, the tissue name for the given sample in the column factor value[organism part] -> string
fraction: The index value in the SDRF for the fraction column -> string
biological_replicate: The value of the biological replicate column in the SDRF in relation with the condition -> string
fragment_ion: The column defines a spectral feature: fragment ions e.g. y7. If information for the column is not available or not applicable, it should be set to a constant value NA -> string
isotope_label_type: The column indicates whether the measurement is based on an endogenous peptide (indicated by value L or light) or a reference peptide (indicated by value H or heavy) -> string
run: The column stores IDs of mass spectrometry runs for LFQ experiments, e.g. 1. For TMT/iTRAQ experiments, it is an identifier of the mixture combined with technical replicate and fraction, {mixture}_{technical_replicate}_{fraction}, e.g. 1_2_3 -> string
channel: The channel used to label the sample (e.g. TMT115) -> string
reference_file_name: The reference file name that contains the feature. -> string
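The TMT/iTRAQ form of the run column follows the pattern {mixture}_{technical_replicate}_{fraction}. The sketch below builds and splits such identifiers; `build_run_id` and `parse_tmt_run_id` are hypothetical helper names, and the parser assumes that none of the three parts contains an underscore.

```python
def build_run_id(mixture, technical_replicate, fraction):
    """Compose a TMT/iTRAQ run identifier such as '1_2_3'."""
    return f"{mixture}_{technical_replicate}_{fraction}"


def parse_tmt_run_id(run):
    """Split a TMT/iTRAQ run identifier into its three parts (as strings)."""
    parts = run.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected mixture_replicate_fraction, got {run!r}")
    mixture, technical_replicate, fraction = parts
    return {
        "mixture": mixture,
        "technical_replicate": technical_replicate,
        "fraction": fraction,
    }


run_id = build_run_id(1, 2, 3)
run_parts = parse_tmt_run_id(run_id)
```

The parts are kept as strings because the run column itself is typed as a string.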

Protein group samples:

protein_accessions: A list of protein accessions -> list[string]
protein_start_positions: A list of the protein’s start positions -> list[int]
protein_end_positions: A list of the protein’s end positions -> list[int]
protein_global_qvalue: Global q-value associated with the protein or protein group -> double
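The three protein list columns can be read together. This is a minimal sketch, assuming the lists are parallel (same length, same order), which the column descriptions do not state explicitly; `protein_locations` is a hypothetical helper and the accessions and positions are placeholders.

```python
def protein_locations(row):
    """Pair each protein accession with its start and end position."""
    accessions = row["protein_accessions"]
    starts = row["protein_start_positions"]
    ends = row["protein_end_positions"]
    # Assumption: the three lists are aligned element by element.
    if not (len(accessions) == len(starts) == len(ends)):
        raise ValueError("protein lists differ in length")
    return list(zip(accessions, starts, ends))


# Hypothetical row with placeholder accessions and positions.
row = {
    "protein_accessions": ["P1", "P2"],
    "protein_start_positions": [10, 42],
    "protein_end_positions": [17, 49],
}
locations = protein_locations(row)
```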

Optional fields:

gene_accessions: A list of gene accessions -> list[string]
gene_names: A list of gene names -> list[string]
id_scores: A list of identification scores from search engines, Percolator, etc. Each search engine score is a key/value pair (e.g. "MS-GF:RawScore": 78.9) -> list[string]
best_psm_reference_file_name: The reference file containing the best PSM that identified the feature. Note: This file can be different from the file that contains the feature (reference_file_name).
best_psm_scan_number: The scan number or index of the spectrum in the file -> string
mz_array: A list of mz values for the spectrum -> list[double]
intensity_array: A list of intensity values for the spectrum -> list[float]
num_peaks: The number of peaks in the spectrum. This is the length of the preceding mz_array and intensity_array lists -> integer
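Two of the optional fields lend themselves to simple consistency checks. The sketch below verifies that num_peaks matches the lengths of mz_array and intensity_array, and splits an id_scores entry into a name and a numeric value. It assumes each id_scores string looks like the example given above ("MS-GF:RawScore": 78.9); the exact serialization is not specified here, and both function names are hypothetical.

```python
def spectrum_is_consistent(row):
    """Check that num_peaks equals the lengths of both spectrum arrays."""
    mz = row.get("mz_array")
    intensity = row.get("intensity_array")
    num_peaks = row.get("num_peaks")
    if mz is None and intensity is None and num_peaks is None:
        return True  # the optional spectrum fields are simply absent
    if mz is None or intensity is None or num_peaks is None:
        return False  # only some of the three fields are present
    return len(mz) == len(intensity) == num_peaks


def parse_id_score(entry):
    """Split one id_scores string into (score name, numeric value)."""
    # Split on the last colon because score names may contain colons.
    name, value = entry.rsplit(":", 1)
    return name.strip().strip('"'), float(value)


spectrum_ok = spectrum_is_consistent(
    {"mz_array": [100.1, 200.2], "intensity_array": [10.0, 20.0], "num_peaks": 2}
)
score = parse_id_score('"MS-GF:RawScore": 78.9')
```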