utils
FrameSlicer
Bases: Generic[IntoFrameT]
Wrap a DataFrame and allow indexing by the sorted unique values of a column.
Example
s = FrameSlicer(df, col="exposure")
first_df = s[0] # filtered dataframe where col == first unique value
three = s[0:3] # filtered dataframe where col in first three unique values
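The indexing behaviour described above can be sketched without a dataframe library. This is a minimal illustration, not the library's implementation: `RowSlicer` is a hypothetical stand-in that works on a list of dict rows instead of a real DataFrame.

```python
from __future__ import annotations

from typing import Any


class RowSlicer:
    """Hypothetical stand-in for FrameSlicer, using a list of dict rows."""

    def __init__(self, rows: list[dict[str, Any]], col: str) -> None:
        self.rows = rows
        self.col = col
        # The sorted unique values of the column define the index positions.
        self.values = sorted({row[col] for row in rows})

    def __getitem__(self, index: int | slice) -> list[dict[str, Any]]:
        selected = self.values[index]
        # An integer index selects one value; a slice selects several.
        if not isinstance(index, slice):
            selected = [selected]
        keep = set(selected)
        return [row for row in self.rows if row[self.col] in keep]


rows = [
    {"exposure": 10.0, "peptide": "A"},
    {"exposure": 0.0, "peptide": "A"},
    {"exposure": 10.0, "peptide": "B"},
    {"exposure": 100.0, "peptide": "A"},
]
s = RowSlicer(rows, col="exposure")
first = s[0]        # rows where exposure is the smallest unique value
first_two = s[0:2]  # rows where exposure is one of the two smallest values
```

Sorting the unique values once in the constructor keeps integer and slice lookups consistent with each other.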
Source code in hdxms_datasets/utils.py
contiguous_peptides(df)
Given a dataframe with 'start' and 'end' columns, where each row describes an
inclusive interval, return a list of tuples representing the contiguous regions.
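One way to read "contiguous regions" is as the result of merging intervals that overlap or touch. The sketch below works on plain `(start, end)` tuples rather than a dataframe; `merge_contiguous` is a hypothetical name, and treating directly adjacent intervals (for example 5-15 and 16-20) as contiguous is an assumption.

```python
from __future__ import annotations


def merge_contiguous(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge inclusive (start, end) intervals into contiguous regions."""
    regions: list[tuple[int, int]] = []
    for start, end in sorted(set(intervals)):
        # Assumption: an interval starting at most one residue past the end
        # of the current region extends that region.
        if regions and start <= regions[-1][1] + 1:
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            regions.append((start, end))
    return regions


peptides = [(1, 10), (5, 15), (16, 20), (30, 40)]
regions = merge_contiguous(peptides)
```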
Source code in hdxms_datasets/utils.py
diff_sequence(a, b)
Compute the similarity ratio between two sequences.
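The source does not say how the ratio is computed. A common way to obtain a similarity ratio in Python is `difflib.SequenceMatcher` from the standard library; the sketch below assumes that approach, and `similarity_ratio` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity_ratio(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means identical sequences."""
    return SequenceMatcher(None, a, b).ratio()


identical = similarity_ratio("MKTAYIAK", "MKTAYIAK")
one_substitution = similarity_ratio("ABCD", "ABCE")
```

`SequenceMatcher.ratio()` returns twice the number of matching elements divided by the combined length of both inputs, so a single substitution in a four-residue sequence gives 0.75.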
Source code in hdxms_datasets/utils.py
get_peptides_by_type(peptides, deuteration_type)
Get peptides of a specific deuteration type.
Source code in hdxms_datasets/utils.py
non_overlapping_peptides(df)
Given a dataframe with 'start' and 'end' columns, where each row describes an
inclusive interval, return a list of tuples representing non-overlapping peptides.
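A simple way to pick non-overlapping peptides is a greedy pass over the sorted intervals. The sketch below assumes that strategy, which the source does not confirm, and works on plain `(start, end)` tuples; `select_non_overlapping` is a hypothetical name.

```python
from __future__ import annotations


def select_non_overlapping(
    intervals: list[tuple[int, int]],
) -> list[tuple[int, int]]:
    """Greedily keep intervals that start after the last kept interval ends."""
    selected: list[tuple[int, int]] = []
    last_end = None
    for start, end in sorted(set(intervals)):
        # Intervals are inclusive, so the next one must start strictly
        # after the previous end to avoid sharing a residue.
        if last_end is None or start > last_end:
            selected.append((start, end))
            last_end = end
    return selected


peptides = [(1, 10), (5, 15), (11, 20), (18, 25), (21, 30)]
selected = select_non_overlapping(peptides)
```

A greedy pass sorted by start position does not necessarily maximize the number of selected peptides; it is only one reasonable choice.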
Source code in hdxms_datasets/utils.py
peptide_redundancy(df)
Compute the redundancy of peptides in a DataFrame based on their start and end positions.
Redundancy is defined as the number of peptides overlapping at each position.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | IntoFrame | DataFrame containing peptide information with 'start' and 'end' columns. | required |
Returns:
| Type | Description |
|---|---|
| tuple[ndarray, ndarray] | A tuple containing two arrays. |
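The definition of redundancy above can be illustrated by counting, for every position, how many inclusive intervals cover it. The sketch below uses plain lists instead of NumPy arrays and assumes the two returned values are the positions and the per-position counts; the source does not state what each array holds, and `redundancy_per_position` is a hypothetical name.

```python
from __future__ import annotations


def redundancy_per_position(
    intervals: list[tuple[int, int]],
) -> tuple[list[int], list[int]]:
    """Count how many inclusive (start, end) intervals cover each position."""
    lowest = min(start for start, _ in intervals)
    highest = max(end for _, end in intervals)
    positions = list(range(lowest, highest + 1))
    counts = [0] * len(positions)
    for start, end in intervals:
        # Both endpoints are part of the peptide, hence end + 1.
        for pos in range(start, end + 1):
            counts[pos - lowest] += 1
    return positions, counts


positions, counts = redundancy_per_position([(1, 4), (3, 6), (4, 5)])
```

Here position 4 is covered by all three peptides, so its count is 3.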
Source code in hdxms_datasets/utils.py
peptides_are_unique(peptides_df)
Check if the peptides in the dataframe are unique.
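The source does not say which columns define uniqueness. As a sketch under the assumption that a peptide is identified by its `(start, end)` pair, the check reduces to comparing the number of distinct pairs with the number of rows; `intervals_are_unique` is a hypothetical name.

```python
from __future__ import annotations


def intervals_are_unique(intervals: list[tuple[int, int]]) -> bool:
    """Return True when no (start, end) pair occurs more than once."""
    # Assumption: uniqueness is judged on the (start, end) pair only.
    return len(set(intervals)) == len(intervals)


unique = intervals_are_unique([(1, 10), (5, 15)])
duplicated = intervals_are_unique([(1, 10), (1, 10), (5, 15)])
```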
Source code in hdxms_datasets/utils.py
reconstruct_sequence(peptides, known_sequence, n_term=1)
Reconstruct the sequence from a dataframe of peptides with sequence information.
The sequence is reconstructed by replacing the known sequence with the peptide
sequences at the specified start and end positions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| peptides | DataFrame | DataFrame containing peptide information. | required |
| known_sequence | str | Starting sequence. Can be a string 'X' as placeholder. | required |
| n_term | int | The residue number of the N-terminal residue. This is typically 1, can be ... | 1 |
Returns:
| Type | Description |
|---|---|
| str | The reconstructed sequence. |
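The replacement step can be sketched as overwriting residues of the starting sequence at each peptide's position. The example below uses a list of dicts in place of a DataFrame and assumes the columns are named `start`, `end`, and `sequence`, and that the placeholder is a string of 'X' characters as long as the full protein; none of this is confirmed by the source, and `reconstruct` is a hypothetical name.

```python
from __future__ import annotations


def reconstruct(
    peptides: list[dict], known_sequence: str, n_term: int = 1
) -> str:
    """Overwrite the known sequence with each peptide's residues."""
    residues = list(known_sequence)
    for peptide in peptides:
        # Convert the residue number to a zero-based index.
        offset = peptide["start"] - n_term
        for i, amino_acid in enumerate(peptide["sequence"]):
            residues[offset + i] = amino_acid
    return "".join(residues)


peptides = [
    {"start": 1, "end": 4, "sequence": "MKTA"},
    {"start": 7, "end": 9, "sequence": "IAK"},
]
result = reconstruct(peptides, "X" * 10)
```

Residues not covered by any peptide keep their placeholder, so gaps in coverage remain visible in the output.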
Source code in hdxms_datasets/utils.py
records_to_dict(records)
Convert a list of records to a dictionary.
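The source does not describe the shape of the resulting dictionary. One common reading is a column-oriented dictionary that maps each key to the list of values across records; the sketch below assumes that reading, and `records_to_columns` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def records_to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Collect the values of each key across all records."""
    columns: dict[str, list[Any]] = defaultdict(list)
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    # Return a plain dict so missing keys raise KeyError as usual.
    return dict(columns)


records = [
    {"start": 1, "end": 10},
    {"start": 5, "end": 15},
]
columns = records_to_columns(records)
```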
Source code in hdxms_datasets/utils.py
slice_exposure(df)
Factory returning FrameSlicer for df using column 'exposure'.
Source code in hdxms_datasets/utils.py
verify_sequence(peptides, known_sequence, n_term=1)
Verify the sequence of peptides against the given sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| peptides | IntoFrame | DataFrame containing peptide information. | required |
| known_sequence | str | The original sequence to check against. | required |
| n_term | int | The number of N-terminal residues to consider. | 1 |
Returns:
| Type | Description |
|---|---|
| list[tuple[int, str, str]] | A tuple containing the fixed sequence and a list of mismatches. |
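A mismatch check of this kind can be sketched by comparing each peptide residue with the known sequence at the same position. The example below uses a list of dicts in place of a DataFrame and assumes columns named `start`, `end`, and `sequence`. It also assumes each entry of the result is `(residue number, expected residue, found residue)`, which is one reading of the `list[tuple[int, str, str]]` return type; `find_mismatches` is a hypothetical name.

```python
from __future__ import annotations


def find_mismatches(
    peptides: list[dict], known_sequence: str, n_term: int = 1
) -> list[tuple[int, str, str]]:
    """List (residue number, expected, found) where peptides disagree."""
    mismatches: set[tuple[int, str, str]] = set()
    for peptide in peptides:
        # Convert the residue number to a zero-based index.
        offset = peptide["start"] - n_term
        for i, found in enumerate(peptide["sequence"]):
            expected = known_sequence[offset + i]
            if found != expected:
                mismatches.add((offset + i + n_term, expected, found))
    # A set removes duplicates reported by overlapping peptides.
    return sorted(mismatches)


known = "MKTAYIAK"
peptides = [
    {"start": 1, "end": 4, "sequence": "MKTA"},
    {"start": 4, "end": 8, "sequence": "AYLAK"},
]
mismatches = find_mismatches(peptides, known)
```

The second peptide carries an L at residue 6 where the known sequence has an I, so that position is the only one reported.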
Source code in hdxms_datasets/utils.py