Project file

The project file is a JSON file that stores the metadata of the project and links the different files of the project. It contains the following fields:

project_accession -> ProteomeXchange Identifier -> string
project_title -> Project title -> string
project_description -> Project description -> string
project_sample_description -> Sample description of the project -> string
project_data_description -> Data description of the project -> string
project_pubmed_id -> PubMed identifier -> string
organism -> List of organism names -> list[string]
organism_part -> List of organism parts -> list[string]
disease -> List of diseases -> list[string]
cell line -> List of cell lines (if available) -> list[string]
instrument -> List of instrument names -> list[string]
enzyme -> List of proteases used for digestion -> list[string]
experiment_type -> List of all keywords associated with the dataset in ProteomeXchange or PRIDE -> list[string]
acquisition_properties -> List of key/value pairs for the acquisition properties (see example below) -> list[Key/Value]
quantms_files -> List of all files generated by quantms and collected in the final results folder -> list[Key/Value]
quantms_version -> Version of quantms used to generate the files -> string
comments -> List of comments or additional information needed -> list[string]
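As a rough illustration of the fields listed above, the following Python sketch assembles a small project record, checks that the list[string] fields really are lists of strings, and serializes the record to JSON. The helper `check_list_fields` is hypothetical, all values are placeholders, and only a subset of the fields is filled in; it is not how quantms itself writes the file.

```python
import json

# Fields documented above as list[string]; names are copied from that list.
LIST_FIELDS = ["organism", "organism_part", "disease", "instrument",
               "enzyme", "experiment_type", "comments"]

def check_list_fields(project):
    """Return the list[string] fields that are present but have another type."""
    bad = []
    for name in LIST_FIELDS:
        if name not in project:
            continue
        value = project[name]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            bad.append(name)
    return bad

# Placeholder values only; this is not a real project record.
project = {
    "project_accession": "PXD004683",
    "project_title": "Example project",
    "project_description": "Placeholder description",
    "organism": ["Homo sapiens"],
    "instrument": ["Q Exactive"],
    "enzyme": ["Trypsin"],
    "acquisition_properties": [],
    "quantms_files": [],
    "comments": [],
}

project_json = json.dumps(project, indent=2)
print(project_json)
print("fields with unexpected types:", check_list_fields(project))
```

A check like this catches the common mistake of writing a single string where a list is expected, for example "organism": "Homo sapiens".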

Key/Value pair object:

Key/value pairs are used to store the acquisition properties and the quantms files. Each key/value pair is a JSON object that contains the following fields:

key -> Key of the pair -> string
value -> Value of the pair -> string

Example of acquisition_properties:

"acquisition_properties": [
     {"precursor tolerance": "0.05 Da"},
     {"dissociation method": "HCD"}
]

The instrument and the enzyme are not included in the acquisition properties; they should be written separately in the instrument and enzyme fields.
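The sketch below shows one way the acquisition_properties list could be built in Python. It assumes the layout used in the example above (one single-entry object per property), and the helper name `to_acquisition_properties` and the instrument value are hypothetical.

```python
import json

def to_acquisition_properties(properties):
    """Turn a plain dict into a list of single-entry objects, as in the example above."""
    # instrument and enzyme are stored in their own top-level fields, so skip them here.
    excluded = {"instrument", "enzyme"}
    return [{key: str(value)}
            for key, value in properties.items()
            if key not in excluded]

props = to_acquisition_properties({
    "precursor tolerance": "0.05 Da",
    "dissociation method": "HCD",
    "instrument": "Q Exactive",  # placeholder; dropped from the result
})
print(json.dumps({"acquisition_properties": props}, indent=2))
```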

Quantms files

The recommended format for file names in a quantms project is:

{user_prefix}-{uuid}.{file_section}.{file_extension}

Example of quantms_files:

"quantms_files": [
     {"protein_file": "PXD004683-550e8400-e29b-41d4-a716-446655440000.protein.parquet"},
     {"peptide_file": "PXD004683-550e8400-e29b-41d4-a716-446655440000.peptide.parquet"},
     {"psm_file":     "PXD004683-550e8400-e29b-41d4-a716-446655440000.psm.parquet"},
     {"feature_file": "PXD004683-958e8400-e29b-41f4-a716-446655440000.feature.parquet"},
     {"differential_file": "PXD004683-958e8400-e29b-41f4-a716-446655440000.differential.tsv"},
     {"absolute_file":     "PXD004683-958e8400-e29b-41f4-a716-446655440000.absolute.tsv"},
     {"sdrf_file":         "PXD004683-958e8400-e29b-41f4-a716-446655440000.sdrf.tsv"}
]
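To make the naming pattern concrete, here is a minimal Python sketch that splits a file name back into its parts. The regular expression and the function `parse_quantms_filename` are illustrative assumptions derived from the pattern and example above, not part of quantms.

```python
import re
import uuid

# {user_prefix}-{uuid}.{file_section}.{file_extension}
FILENAME_PATTERN = re.compile(
    r"^(?P<user_prefix>.+)-"
    r"(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"\.(?P<file_section>[^.]+)\.(?P<file_extension>[^.]+)$"
)

def parse_quantms_filename(name):
    """Split a file name into user_prefix, uuid, file_section and file_extension."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the pattern: {name!r}")
    parts = match.groupdict()
    uuid.UUID(parts["uuid"])  # raises ValueError if the UUID text is malformed
    return parts

parts = parse_quantms_filename(
    "PXD004683-550e8400-e29b-41d4-a716-446655440000.psm.parquet"
)
print(parts)
```

Names that do not follow the pattern raise a ValueError, which makes the function usable as a simple pre-check before files are listed in quantms_files.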

uuids: A Universally Unique Identifier (UUID), as defined in RFC 4122, is a 128-bit identifier that can be generated without central coordination across systems and applications. In text form, a UUID has five groups of hexadecimal digits separated by hyphens. Time-based (version 1) UUIDs encode a timestamp, a clock sequence, and a node identifier, whereas random (version 4) UUIDs, such as those produced by Python's uuid.uuid4, are built from random bits. In either case, a generated UUID is highly unlikely to collide with any other UUID, even when produced by different entities and systems.

To generate file names using UUIDs in Python, you can use the standard-library uuid module, which provides functions to create UUIDs. Here’s an example of how you could generate and format UUID-based file names:

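import uuid

def generate_uuid_filename(user_prefix, file_section, file_extension):
    # uuid4() returns a random (version 4) UUID
    file_uuid = uuid.uuid4()
    # Assemble the name as {user_prefix}-{uuid}.{file_section}.{file_extension}
    return f"{user_prefix}-{file_uuid}.{file_section}.{file_extension}"

# Generate and print a UUID-based file name
print("Generated UUID filename:", generate_uuid_filename("PXD004683", "psm", "parquet"))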

In this Python code snippet, the generate_uuid_filename function creates a random UUID using the uuid4 function. The text form of the UUID has five groups of hexadecimal digits separated by hyphens; because uuid4 draws the UUID from random bits, it does not encode a timestamp, a clock sequence, or a node identifier.

file_sections: The file section identifies the type of file. The file sections are the following:

Protein file -> Protein table format
Peptide file -> Peptide table format
Peptide features -> Peptide quantification features
PSM file -> PSM table format
Differential file -> Differential expression format
Absolute file -> Absolute expression format
SDRF file -> Sample table
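In the quantms_files example earlier in this section, each key appears to be the file section followed by "_file" (for instance "psm_file" for a ".psm.parquet" file). Assuming that convention holds, the hypothetical helper below builds the quantms_files list from a list of file names.

```python
def to_quantms_files(file_names):
    """Build quantms_files entries keyed as '<file_section>_file' (assumed convention)."""
    entries = []
    for name in file_names:
        # Expect ...{uuid}.{file_section}.{file_extension}: split on the last two dots.
        parts = name.rsplit(".", 2)
        if len(parts) != 3:
            raise ValueError(f"cannot find a file section in {name!r}")
        section = parts[1]
        entries.append({f"{section}_file": name})
    return entries

quantms_files = to_quantms_files([
    "PXD004683-550e8400-e29b-41d4-a716-446655440000.protein.parquet",
    "PXD004683-958e8400-e29b-41f4-a716-446655440000.sdrf.tsv",
])
print(quantms_files)
```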

Sample table

Only the SDRF file used to analyze the data with quantms is provided here. The SDRF file is a tab-delimited file that stores the metadata of the samples and links the different files of the project.

Read more about SDRF here.
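Because the SDRF file is tab-delimited, it can be read with Python's csv module. The sketch below uses a tiny, made-up fragment with three columns chosen for illustration; real SDRF files carry many more columns, and the function name `read_sdrf` is hypothetical.

```python
import csv
import io

# Made-up SDRF fragment for illustration; not taken from a real project.
sdrf_text = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun1.raw\n"
    "sample 2\tHomo sapiens\trun2.raw\n"
)

def read_sdrf(handle):
    """Read a tab-delimited SDRF table into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_sdrf(io.StringIO(sdrf_text))

# Link each sample to its data file.
files_by_sample = {row["source name"]: row["comment[data file]"] for row in rows}
print(files_by_sample)
```

For a file on disk, the same function can be called with an open file handle (opened with newline=""). Note that csv.DictReader keeps only the last value when a column name repeats, so tables with repeated column names would need different handling.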