HDXMS Datasets

Welcome to the HDXMS datasets repository.

The hdxms-datasets package provides tools for handling HDX-MS datasets.

The package offers the following features:

  • Defining datasets and their experimental metadata
  • Verification of datasets and metadata
  • Loading datasets from a local or remote (WIP) database
  • Conversion of datasets from various formats (e.g., DynamX, HDExaminer) to a standardized format
  • Propagation of standard deviations from replicates to fractional relative uptake values
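
To illustrate the last item, the sketch below propagates replicate standard deviations into a fractional uptake value using first-order (Gaussian) error propagation for a ratio. It assumes the errors on the partially and fully deuterated uptake values are independent. The function name `fractional_uptake_with_sd` and the replicate numbers are hypothetical, and the package's own implementation may differ.

```python
import math
from statistics import mean, stdev


def fractional_uptake_with_sd(uptake, uptake_sd, fd_uptake, fd_uptake_sd):
    """Return (f, sd_f) for f = uptake / fd_uptake.

    Hypothetical helper: first-order propagation assuming independent errors,
    sd_f = sqrt((uptake_sd / fd_uptake)**2 + (uptake * fd_uptake_sd / fd_uptake**2)**2)
    """
    if fd_uptake == 0:
        raise ValueError("fd_uptake must be non-zero")
    f = uptake / fd_uptake
    # Partial derivatives of f with respect to uptake and fd_uptake
    term_uptake = uptake_sd / fd_uptake
    term_fd = uptake * fd_uptake_sd / fd_uptake**2
    return f, math.sqrt(term_uptake**2 + term_fd**2)


# Made-up replicate uptake values (Da) for a single peptide and exposure
pd_replicates = [3.9, 4.0, 4.1]  # partially deuterated
fd_replicates = [7.9, 8.0, 8.1]  # fully deuterated control

f, f_sd = fractional_uptake_with_sd(
    mean(pd_replicates),
    stdev(pd_replicates),  # sample standard deviation across replicates
    mean(fd_replicates),
    stdev(fd_replicates),
)
print(f"fractional uptake = {f:.3f} +/- {f_sd:.3f}")
```

The absolute form of the formula is used rather than the relative form so that a peptide with zero uptake does not cause a division by zero.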

Example Usage

```python {title="Loading a dataset"}
from hdxms_datasets import DataBase

db = DataBase('path/to/local_db')
dataset = db.get_dataset('HDX_D9096080')

# Protein identifier information
print(dataset.protein_identifiers.uniprot_entry_name)
#> 'SECB_ECOLI'

# Access HDX states
print([state.name for state in dataset.states])
#> ['Tetramer', 'Dimer']

# Get the sequence of the first state
state = dataset.states[0]
print(state.protein_state.sequence)
#> 'MSEQNNTEMTFQIQRIYT...'

# Load peptides
peptides = state.peptides[0]

# Access peptide information
print(peptides.deuteration_type, peptides.pH, peptides.temperature)
#> DeuterationType.partially_deuterated 8.0 303.15

# Load the peptide table as a standardized narwhals DataFrame
df = peptides.load(
    convert=True,  # convert column header names to the open HDX standard
    aggregate=True,  # aggregate centroids / uptake values across replicates
)

print(df.columns)
#> ['start', 'end', 'sequence', 'state', 'exposure', 'centroid_mz', 'rt', 'rt_sd', 'uptake', ...
```

```python {title="Define a set of peptides for a state"}
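from hdxms_datasets import ProteinState, Peptides, verify_sequence, merge_peptides, compute_uptake_metrics

# NOTE: PeptideFormat, DeuterationType, and data_dir (a pathlib.Path pointing at the
# directory containing the data files) are assumed to be imported or defined before this point.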

# Define the protein state
protein_state = ProteinState(
    sequence="MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDWQPEVKLDLDTASSQLADDVYEVVLRVTVTASLGEETAFLCEVQQGGIFSIAGIEGTQMAHCLGAYCPNILFPYARECITSMVSRGTFPQLNLAPVNFDALFMNYLQQQAGEGTEEHQDA",
    n_term=1,
    c_term=155,
    oligomeric_state=4,
)

# Define the partially deuterated peptides for the SecB state
pd_peptides = Peptides(
    data_file=data_dir / "ecSecB_apo.csv",
    data_format=PeptideFormat.DynamX_v3_state,
    deuteration_type=DeuterationType.partially_deuterated,
    filters={
        "State": "SecB WT apo",
        "Exposure": [0.167, 0.5, 1.0, 10.0, 100.000008],
    },
    pH=8.0,
    temperature=303.15,
    d_percentage=90.0,
)

# check for differences between the protein state sequence and the peptide sequences
mismatches = verify_sequence(pd_peptides.load(), protein_state.sequence, n_term=protein_state.n_term)
print(mismatches)
#> [] # sequences match

# Define the fully deuterated peptides for the SecB state
fd_peptides = Peptides(
    data_file=data_dir / "ecSecB_apo.csv",
    data_format=PeptideFormat.DynamX_v3_state,
    deuteration_type=DeuterationType.fully_deuterated,
    filters={
        "State": "Full deuteration control",
        "Exposure": 0.167,
    },
)

# merge both sets of peptides into a single dataframe
merged = merge_peptides([pd_peptides, fd_peptides])
print(merged.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd']

# compute uptake metrics for the merged peptides
# this function computes uptake from centroid mass if not present
# as well as fractional uptake
processed = compute_uptake_metrics(merged)
print(processed.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd', 'fractional_uptake', 'fractional_uptake_sd']
```
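The sequence check in the example above can be pictured with the following plain-Python sketch. It is not the package's `verify_sequence` (which takes a loaded peptide table); `find_sequence_mismatches` is a hypothetical stand-in that assumes `start` and `end` are inclusive residue numbers and that `n_term` is the residue number of the first character of the protein sequence.

```python
def find_sequence_mismatches(peptides, protein_sequence, n_term=1):
    """Return peptides whose sequence differs from the protein sequence.

    Hypothetical helper. `peptides` is an iterable of (start, end, sequence)
    tuples with inclusive residue numbers; each mismatch is reported as
    (start, end, peptide_sequence, expected_sequence).
    """
    mismatches = []
    for start, end, sequence in peptides:
        lo = start - n_term      # zero-based index of the first residue
        hi = end - n_term + 1    # exclusive end index
        # A peptide starting before the N-terminus cannot match anything
        expected = protein_sequence[lo:hi] if lo >= 0 else ""
        if expected != sequence:
            mismatches.append((start, end, sequence, expected))
    return mismatches


# First residues of the sequence shown in the example above
protein = "MSEQNNTEMTFQIQRIYT"
peptide_rows = [
    (1, 5, "MSEQN"),     # matches residues 1-5
    (9, 14, "MTFQIQ"),   # matches residues 9-14
    (9, 14, "MTFQIK"),   # deliberately wrong last residue
]
print(find_sequence_mismatches(peptide_rows, protein, n_term=1))
```

An empty list means every peptide agrees with the protein state sequence, which mirrors the `[]` result in the example above.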

Installation

$ pip install hdxms-datasets
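
To confirm that the installation succeeded, one option is to query the installed distribution's version with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and it only assumes the distribution name `hdxms-datasets` used in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None
print(installed_version("hdxms-datasets"))
```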