utils
contiguous_peptides(df)
Given a dataframe with 'start' and 'end' columns, where each row describes an inclusive interval, this function returns a list of tuples representing contiguous regions.
Source code in hdxms_datasets/utils.py
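The idea of contiguous regions can be illustrated with a minimal sketch over plain `(start, end)` tuples rather than a dataframe. The helper name `merge_contiguous` is hypothetical and not part of `hdxms_datasets`, and treating directly adjacent intervals (one ending at 15, the next starting at 16) as contiguous is an assumption about the intended behavior.

```python
def merge_contiguous(intervals):
    """Merge inclusive (start, end) intervals that overlap or touch.

    Hypothetical helper for illustration; not the library's implementation.
    """
    regions = []
    for start, end in sorted(intervals):
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or directly follows the current region: extend it.
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            # A gap of at least one residue: start a new region.
            regions.append((start, end))
    return regions


peptides = [(1, 10), (5, 15), (16, 20), (30, 40)]
print(merge_contiguous(peptides))  # [(1, 20), (30, 40)]
```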
get_peptides_by_type(peptides, deuteration_type)
Get peptides of a specific deuteration type.
Source code in hdxms_datasets/utils.py
non_overlapping_peptides(df)
Given a dataframe with 'start' and 'end' columns, where each row describes an inclusive interval, this function returns a list of tuples representing non-overlapping peptides.
Source code in hdxms_datasets/utils.py
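One way to pick non-overlapping peptides is a greedy pass over the intervals sorted by start position. The sketch below assumes that strategy and works on plain `(start, end)` tuples; the name `select_non_overlapping` is hypothetical, and the selection rule used by the library may differ.

```python
def select_non_overlapping(intervals):
    """Greedily keep inclusive (start, end) intervals that do not overlap.

    Hypothetical helper for illustration; assumes a sort-by-start greedy rule.
    """
    selected = []
    for start, end in sorted(set(intervals)):
        # Inclusive intervals overlap when start <= previous end,
        # so a peptide is kept only if it starts strictly after it.
        if not selected or start > selected[-1][1]:
            selected.append((start, end))
    return selected


peptides = [(1, 10), (5, 15), (11, 20), (18, 25), (21, 30)]
print(select_non_overlapping(peptides))  # [(1, 10), (11, 20), (21, 30)]
```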
peptide_redundancy(df)
Compute the redundancy of peptides in a DataFrame based on their start and end positions. Redundancy is defined as the number of peptides overlapping at each position.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | IntoFrame | DataFrame containing peptide information with 'start' and 'end' columns. | required |
start | | Column name for the start position. | required |
end | | Column name for the end position. | required |
Returns:
Type | Description |
---|---|
ndarray | A tuple containing: |
ndarray | |
tuple[ndarray, ndarray] | |
Source code in hdxms_datasets/utils.py
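The redundancy definition above (number of peptides overlapping at each position) can be sketched in plain Python. The helper `redundancy_per_residue` is hypothetical, and returning a pair of residue numbers and counts is an assumption based on the `tuple[ndarray, ndarray]` return type; the library itself returns NumPy arrays.

```python
def redundancy_per_residue(intervals):
    """Count how many inclusive (start, end) intervals cover each position.

    Hypothetical helper for illustration. Returns (positions, counts) as
    lists spanning the lowest start to the highest end.
    """
    if not intervals:
        return [], []
    first = min(start for start, _ in intervals)
    last = max(end for _, end in intervals)
    positions = list(range(first, last + 1))
    counts = [0] * len(positions)
    for start, end in intervals:
        # Both endpoints are included in the interval.
        for pos in range(start, end + 1):
            counts[pos - first] += 1
    return positions, counts


positions, counts = redundancy_per_residue([(1, 5), (3, 8), (7, 8)])
print(positions)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(counts)     # [1, 1, 2, 2, 2, 1, 2, 2]
```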
peptides_are_unique(peptides_df)
Check if the peptides in the dataframe are unique.
Source code in hdxms_datasets/utils.py
reconstruct_sequence(peptides, known_sequence, n_term=1, start='start', end='end', sequence='sequence')
Reconstruct the sequence from a dataframe of peptides with sequence information. The sequence is reconstructed by overwriting the known sequence with the peptide sequences at their specified start and end positions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
peptides | DataFrame | DataFrame containing peptide information. | required |
known_sequence | str | Starting sequence. Can be a string 'X' as placeholder. | required |
n_term | int | The residue number of the N-terminal residue. This is typically 1 but can be negative in case of purification tags. | 1 |
start | | Column name for the start position of the peptide. | 'start' |
end | | Column name for the end position of the peptide. | 'end' |
sequence | | Column name for the peptide sequence. | 'sequence' |
Returns:
Type | Description |
---|---|
str | The reconstructed sequence. |
Source code in hdxms_datasets/utils.py
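The overlay described for `reconstruct_sequence` can be sketched as follows, assuming peptides are given as `(start, end, sequence)` tuples instead of dataframe rows and that residue `n_term` maps to index 0 of the known sequence. The helper `overlay_peptides` is hypothetical and is not the library's implementation.

```python
def overlay_peptides(peptides, known_sequence, n_term=1):
    """Overwrite a known sequence with peptide sequences at their positions.

    Hypothetical helper for illustration. `peptides` is an iterable of
    (start, end, sequence) tuples with inclusive residue numbers.
    """
    residues = list(known_sequence)
    for start, end, seq in peptides:
        if len(seq) != end - start + 1:
            raise ValueError(f"peptide {seq!r} does not span {start}-{end}")
        # Assumption: residue number n_term sits at index 0.
        offset = start - n_term
        if offset < 0 or offset + len(seq) > len(residues):
            raise ValueError(f"peptide {start}-{end} is outside the sequence")
        residues[offset:offset + len(seq)] = seq
    return "".join(residues)


peptides = [(1, 4, "MKTA"), (3, 6, "TAYI"), (8, 10, "QRQ")]
print(overlay_peptides(peptides, "X" * 10))  # MKTAYIXQRQ
```

Position 7 keeps its placeholder 'X' because no peptide covers it.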
records_to_dict(records)
Convert a list of records to a dictionary.
Source code in hdxms_datasets/utils.py
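The one-line description does not say which dictionary layout is produced. A common interpretation, assumed here, is turning a list of row dictionaries into a dictionary of column lists, which is the shape most dataframe constructors accept. The name `records_to_columns` is hypothetical.

```python
from collections import defaultdict


def records_to_columns(records):
    """Convert a list of row dicts into a dict mapping each key to a list.

    Hypothetical helper for illustration; the column-oriented layout is an
    assumption about what "dictionary" means here.
    """
    columns = defaultdict(list)
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return dict(columns)


records = [{"start": 1, "end": 5}, {"start": 3, "end": 8}]
print(records_to_columns(records))  # {'start': [1, 3], 'end': [5, 8]}
```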
verify_sequence(peptides, known_sequence, n_term=1, start='start', end='end', sequence='sequence')
Verify the sequence of peptides against the given sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
peptides | IntoFrame | DataFrame containing peptide information. | required |
sequence | | The original sequence to check against. | 'sequence' |
n_term | int | The number of N-terminal residues to consider. | 1 |
Returns:
Type | Description |
---|---|
list[tuple[int, str, str]] | A tuple containing the fixed sequence and a list of mismatches. |
Source code in hdxms_datasets/utils.py
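A verification of this kind can be sketched by comparing each peptide residue with the known sequence and collecting the disagreements. The sketch assumes peptides as `(start, end, sequence)` tuples, residue `n_term` at index 0, and mismatch tuples ordered as (residue number, expected residue, found residue). That ordering is a guess based on the `list[tuple[int, str, str]]` return type, and `find_mismatches` is a hypothetical name.

```python
def find_mismatches(peptides, known_sequence, n_term=1):
    """List residues where peptide sequences disagree with a known sequence.

    Hypothetical helper for illustration. Returns sorted, de-duplicated
    (residue_number, expected, found) tuples.
    """
    mismatches = set()
    for start, _end, seq in peptides:
        for i, found in enumerate(seq):
            resnum = start + i
            idx = resnum - n_term
            if not 0 <= idx < len(known_sequence):
                # Residues outside the known sequence cannot be checked.
                continue
            expected = known_sequence[idx]
            if found != expected:
                mismatches.add((resnum, expected, found))
    return sorted(mismatches)


peptides = [(1, 4, "MKTA"), (3, 6, "TGYI")]
print(find_mismatches(peptides, "MKTAYIAK"))  # [(4, 'A', 'G')]
```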