dget package

Submodules

dget.adduct module

Class for adduct calculations.

class dget.adduct.Adduct(base: Formula, adduct: str)

Bases: object

Class used to create a molmass.Formula from a base molmass.Formula and an adduct string. This string should be in the format [nM+nX-nY]n+ where M is the base molecule and X, Y are gains / losses. Some valid examples are:

[M]+
[M-H]-
[M+Na]+
[2M-H]-
[M+2H]+
[M+K-2H]-

adduct: adduct string in the form [nM+nX-nY]n+

base: formula of the base molecule, represented by M in adduct

num_base: number of base molecules in adduct

formula: formula of the adduct

property composition: Composition: The composition of the adduct.

static is_valid_adduct(adduct: str) → bool

Test to see if adduct string is valid.

Tests string against Adduct.regex and makes sure any +/- adducts match Adduct.regex_split.

Parameters: adduct – adduct string in the form [nM+nX-nY]n+
Returns: True if valid
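The two-step check described above can be sketched with the standard re module. The patterns are copied from Adduct.regex and Adduct.regex_split; the helper names parse_adduct and is_valid are hypothetical, and the use of fullmatch and the "every character must belong to a +/- term" rule are assumptions about how the check could work, not the library's actual code.

```python
import re

# Patterns copied from Adduct.regex and Adduct.regex_split.
REGEX = re.compile(r"\[(\d*)M(.*)\](\d+)?([+-])")
REGEX_SPLIT = re.compile(r"([+-])(\d*)(\w+)")


def parse_adduct(adduct: str):
    """Return (num_base, [(sign, count, formula), ...], charge) or None.

    Hypothetical helper, not part of dget.
    """
    match = REGEX.fullmatch(adduct)
    if match is None:
        return None
    num_base, middle, num_charge, charge_sign = match.groups()
    parts = list(REGEX_SPLIT.finditer(middle))
    # Assumption: every character between 'M' and ']' must belong to a +/- term.
    if "".join(p.group(0) for p in parts) != middle:
        return None
    # A missing count means 1, e.g. '+Na' is one Na.
    terms = [(p.group(1), int(p.group(2) or 1), p.group(3)) for p in parts]
    charge = int(num_charge or 1) * (1 if charge_sign == "+" else -1)
    return int(num_base or 1), terms, charge


def is_valid(adduct: str) -> bool:
    """True if the string parses as an adduct (hypothetical helper)."""
    return parse_adduct(adduct) is not None
```

With this sketch, parse_adduct("[M+K-2H]-") splits the string into one base molecule, a gain of K, a loss of two H, and a charge of -1.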

mz_range(min_fraction: float = 0.0) → Tuple[float, float]: Return the spectrum mz range.

regex = re.compile('\\[(\\d*)M(.*)\\](\\d+)?([+-])')

regex_split = re.compile('([+-])(\\d*)(\\w+)')

property spectrum: Spectrum: The spectrum of the adduct.

dget.convolve module

Convolution implementations.

Deconvolution is used by DGet to recover the original deuteration pattern from a given mass spectrum.

dget.convolve.deconvolve(x: ndarray, psf: ndarray) → Tuple[ndarray, ndarray]

Inverse of convolution.

Deconvolution is performed in frequency domain.

Parameters:

x – array
psf – point spread function

Returns:

recovered data
remainder

Notes

Based on https://rosettacode.org/wiki/Deconvolution/1D
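As an illustration of frequency-domain deconvolution, the following is a minimal numpy sketch. The function name deconvolve_fft, the padding to a power of two, and the way the remainder is computed are assumptions made for this example; the arrays are made-up values, and the sketch is not the module's implementation.

```python
import numpy as np


def deconvolve_fft(x: np.ndarray, psf: np.ndarray):
    """Recover f from x = convolve(f, psf); return (recovered, remainder)."""
    # Zero-pad to the next power of two (at least x.size) so the circular
    # convolution implied by the FFT matches the linear one.
    n = 1 << (x.size - 1).bit_length()
    # Division in the frequency domain undoes the convolution.
    y = np.fft.irfft(np.fft.rfft(x, n) / np.fft.rfft(psf, n), n)
    recovered = y[: x.size - psf.size + 1]
    # Whatever the recovered pattern cannot explain is the remainder.
    remainder = x - np.convolve(recovered, psf, mode="full")
    return recovered, remainder


pattern = np.array([0.1, 0.2, 0.7])  # made-up deuteration fractions
psf = np.array([0.8, 0.15, 0.05])    # made-up isotope pattern
observed = np.convolve(pattern, psf)
recovered, remainder = deconvolve_fft(observed, psf)
```

For noise-free input the recovered array matches the original pattern and the remainder is close to zero; with real data the remainder carries the part of the signal the point spread function cannot account for.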

dget.dget module

Class for deuteration calculations.

Bases: object

Deuteration calculation class.

This class contains functions for calculating deuteration from a molecular formula and mass spectra.

The lowest deuteration state to include in the calculation can be selected using the cutoff argument. This accepts floats to specify an m/z or a string in the format ‘D<int>’ to specify the lowest state. By default the lowest state will be the first where 2 consecutive states are < 1% and the accumulated probability is > 10%.

Signals are read from the data according to signal_mode: ‘peak area’ integrates the signal_mass_width region around each m/z, while ‘peak height’ and ‘raw’ select the highest peak within this region. If ‘raw’ is selected, no deconvolution is performed.
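The difference between the modes can be sketched as follows. The function read_signals is hypothetical, and it assumes the window is centred on each m/z with a total width of signal_mass_width and that areas use trapezoidal integration; the class may define either differently.

```python
import numpy as np


def read_signals(masses, signals, targets, width, mode="peak area"):
    """Return one value per target m/z from the window target +/- width / 2."""
    out = []
    for mz in targets:
        window = np.abs(masses - mz) <= width / 2.0
        if not window.any():
            out.append(0.0)  # no data points near this target
            continue
        x, y = masses[window], signals[window]
        if mode == "peak area":
            # Trapezoidal integration over the window.
            out.append(float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))))
        else:
            # 'peak height' and 'raw' both take the tallest point.
            out.append(float(y.max()))
    return np.array(out)


# Made-up data: two small peaks at m/z 100 and 101.
masses = np.array([99.8, 99.9, 100.0, 100.1, 100.2, 100.8, 100.9, 101.0, 101.1, 101.2])
signals = np.array([0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 8.0, 2.0, 0.0])
areas = read_signals(masses, signals, [100.0, 101.0], width=0.5, mode="peak area")
heights = read_signals(masses, signals, [100.0, 101.0], width=0.5, mode="peak height")
```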

Mass spectra are expected as delimited text files with at least 2 columns, one for mass and one for signal. Specify the columns using the ‘usecols’ keyword in loadtxt_kws, a (zero-indexed) tuple of ints for the (mass, signal) columns. The delimiter can be specified using the ‘delimiter’ keyword. Mass spectra can also be passed as a tuple of numpy arrays, (masses, signals).
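The effect of these keywords can be checked directly with numpy.loadtxt. The snippet below reads a made-up semicolon-delimited export with a leading index column; passing unpack=True is a choice made for this example, not something the class is documented to do.

```python
import io

import numpy as np

# Hypothetical export: an index column, then mass and signal columns.
text = "0;100.00;12.0\n1;100.05;48.5\n2;100.10;11.0\n"
loadtxt_kws = {"delimiter": ";", "usecols": (1, 2)}

# unpack=True returns one array per selected column.
masses, signals = np.loadtxt(io.StringIO(text), unpack=True, **loadtxt_kws)
```

The resulting pair of arrays has the same (masses, signals) shape as the tuple form described above.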

deuterated_formula: formula of fully deuterated molecule

tofdata: path to mass spectra text file, or tuple of masses, signals

adduct: form of adduct ion, see dget.adduct

cutoff: cutoff for calculation as an m/z ‘123.4’ or state ‘D<int>’

signal_mass_width: range around each m/z to search for maxima or integrate

signal_mode: detection mode, one of ‘peak area’, ‘peak height’, ‘raw’

loadtxt_kws: parameters passed to numpy.loadtxt, defaults to {‘delimiter’: ‘,’, ‘usecols’: (0, 1)}

align_tof_with_spectra(alignment_mz: float | None = None) → float

Shifts ToF data to better align with monoisotopic m/z.

Please calibrate your MS instead of using this.

Parameters: alignment_mz – m/z used for alignment, defaults to monoisotopic m/z
Returns: offset used for alignment

property base_name: str: The name of the base formula, with D instead of [2H].

common_adducts = ['[M]+', '[M+H]+', '[M+Na]+', '[M+H2]2+', '[2M+H]+', '[M-H]-', '[2M-H]-', '[M-H2]2-', '[M+Cl]-', '[M-H3O]-']

property deuteration: float

The deuteration of the base molecule.

Deuteration is calculated as the fraction of deuterium positions in the molecular formula that have been successfully deuterated. For example, a mixture of 60% C2H5D1 and 40% C2H6 gives a deuteration of 0.6.

Deuteration is only calculated for the states above the deuteration cutoff.
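The calculation can be written out as a short worked example. The function below is a sketch under two assumptions that the text does not confirm: the probabilities are renormalised over the states kept after the cutoff, and the result is the probability-weighted mean state divided by the deuterium count.

```python
import numpy as np


def deuteration(probabilities, states, deuterium_count):
    """Mean fraction of deuterated positions over the given states."""
    states = np.asarray(states)
    p = np.asarray(probabilities, dtype=float)[states]
    p = p / p.sum()  # assumption: renormalise over the states that are kept
    return float(np.sum(p * states) / deuterium_count)


# The example above: 40% C2H6 (D0) and 60% C2H5D1 (D1), one deuterium.
example = deuteration([0.4, 0.6], states=[0, 1], deuterium_count=1)
# A made-up D3 compound where the D0 state falls below the cutoff.
partial = deuteration([0.0, 0.1, 0.3, 0.6], states=[1, 2, 3], deuterium_count=3)
```

The first call reproduces the 0.6 from the example; the second gives (0.1 × 1 + 0.3 × 2 + 0.6 × 3) / 3.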

property deuteration_probabilities: ndarray

The deuteration fraction of each possible deuteration.

Probabilities are listed in order of D=0 to N, where N is the number of deuterium atoms in the original molecular formula. Probabilities will sum to 1.0.

property deuteration_states: ndarray

Indexes of the valid deuteration states.

Valid states are Dx to Dn, where n is the number of deuterium atoms in the base molecule and x is inferred from self.deuteration_cutoff if defined, or otherwise from the last 2 consecutive probabilities that are < 1% with a cumulative probability of at least 10%.

property deuterium_count: int: The number of deuterium atoms in the adduct.

property formula: Formula: The adduct formula.

guess_adduct_from_base_peak(adducts: List[Formula] | None = None) → Tuple[Adduct, float]

Search for the adduct with the highest intensity.

If multiple adducts have the maximum intensity then the adduct with the monoisotopic mass closest to the local base peak is returned. This function will work best with highly deuterated samples.

Parameters: adducts – adducts to try, defaults to DGet.common_adducts
Returns: best adduct, mass difference from the adduct's base peak

min_fraction_for_spectra = 0.001

plot_predicted_spectra(ax: matplotlib.axes.Axes, mass_range: Tuple[float, float] | str = 'targets') → None

Plot spectra over mass spectra on ax.

mass_range can be passed as a tuple of floats (start m/z, end m/z), ‘full’ to plot the entire mass range or ‘targets’ to plot the region around the predicted spectra.

Parameters:

ax – matplotlib axes to plot on
mass_range – range to plot

print_results(file: TextIO | None = None) → None

Print results.

Parameters: file – file to print to, or sys.stdout if None

property psf: ndarray

The point spread function used for (de)convolution.

This is the normalised spectrum of the adduct.

property residual_error: float | None

The normalised (0.0 - 1.0) sum of deconvolution residuals.

A high residual error is indicative of a poor fit between the data and isotopic spectra. This can result from an incorrect formula or contaminants in the mass spectra.

spectra(**kwargs) → Generator[Spectrum, None, None]

Spectra of all compounds, from non-deuterated to fully deuterated.

kwargs are passed to molmass.Formula.spectrum()

property spectrum: Spectrum: The adduct spectrum.

subtract_baseline(mass_range: Tuple[float, float] | None = None, percentile: float = 25.0) → float

Subtracts baseline of region.

Calculates the given percentile of the signals in the designated mass region and subtracts it from the mass spec signals.

Parameters:

mass_range – region to find baseline
percentile – percentile to use

Returns:

amount subtracted from baseline
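A percentile baseline of this kind can be sketched with numpy.percentile. The standalone function below is illustrative only: unlike the method, it takes the arrays as arguments and returns the corrected signals together with the baseline, and treating mass_range=None as "use all signals" is an assumption.

```python
import numpy as np


def subtract_baseline(masses, signals, mass_range=None, percentile=25.0):
    """Return (corrected signals, baseline) using a percentile baseline."""
    if mass_range is None:
        region = signals  # assumption: use the whole spectrum
    else:
        lo, hi = mass_range
        region = signals[(masses >= lo) & (masses <= hi)]
    baseline = float(np.percentile(region, percentile))
    return signals - baseline, baseline


masses = np.linspace(100.0, 101.0, 11)
signals = np.full(11, 5.0)
signals[5] = 105.0  # a single peak on a flat background of 5.0
corrected, baseline = subtract_baseline(masses, signals)
```

A low percentile is insensitive to the peaks themselves, so the flat background of 5.0 is removed while the peak height above it is preserved.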

property target_masses: ndarray

The m/z of every possible spectrum.

A new spectrum is created by combining the spectra of every possible deuteration state.

property target_signals: ndarray

The signal for every m/z in the possible spectrum.

The signal_mass_width area around each of the target_masses is integrated or searched for the maximum peak height, depending on the current signal_mode.

dget.formula module

Module containing molmass helper functions.

dget.formula.divide_formulas(a: Formula, b: Formula) → Tuple[int, Formula]

Divide Formula a by b. Returns the number of times b is in a and remainder.

Parameters:

a – numerator Formula
b – divisor Formula

Returns:

number of times b in a
Formula of remainder

dget.formula.formula_in_formula(a: Formula, b: Formula) → bool

Check if all atoms of a are in b.

Returns: True if a in b
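The arithmetic behind these two helpers can be illustrated without molmass by treating a formula as a collections.Counter of element counts. This is a stand-in for molmass.Formula chosen for the example, and reading "all atoms of a are in b" as a per-element count comparison is an assumption.

```python
from collections import Counter


def formula_in_formula(a: Counter, b: Counter) -> bool:
    """True if b holds at least as many of every element as a."""
    return all(b[element] >= count for element, count in a.items())


def divide_formulas(a: Counter, b: Counter):
    """Return (n, remainder) such that a == n * b + remainder."""
    if not b or not formula_in_formula(b, a):
        return 0, Counter(a)
    # b fits into a as often as its scarcest element allows.
    n = min(a[element] // count for element, count in b.items())
    remainder = Counter(a)
    for element, count in b.items():
        remainder[element] -= n * count
    # Unary plus drops elements whose count reached zero.
    return n, +remainder


butane = Counter({"C": 4, "H": 10})
ethene = Counter({"C": 2, "H": 4})
times, left_over = divide_formulas(butane, ethene)
```

Here C2H4 fits into C4H10 twice, leaving H2 as the remainder.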

dget.formula.spectra_mz_spread(spectra: List[Spectrum], charge: int = 0) → Spectrum

Calculate the m/z spread of the given spectra.

Each entry with the same unit mass is averaged, weighted by its relative intensity.

Parameters: spectra – list of Spectrum to combine
Returns: array of mean m/z values
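The weighted averaging can be sketched with plain (m/z, relative intensity) pairs in place of molmass Spectrum objects. The function mz_spread is hypothetical, it groups entries by rounded m/z as an assumed definition of "unit mass", and it ignores the charge parameter.

```python
from collections import defaultdict


def mz_spread(spectra):
    """Map each unit mass to its intensity-weighted mean m/z."""
    weighted = defaultdict(float)  # sum of m/z * intensity per unit mass
    total = defaultdict(float)     # sum of intensity per unit mass
    for spectrum in spectra:
        for mz, intensity in spectrum:
            unit = round(mz)  # assumption: unit mass is the rounded m/z
            weighted[unit] += mz * intensity
            total[unit] += intensity
    return {unit: weighted[unit] / total[unit] for unit in sorted(total)}


# Two made-up spectra as (m/z, relative intensity) pairs.
spectra = [
    [(100.02, 1.0), (101.03, 0.1)],
    [(101.01, 1.0), (102.02, 0.1)],
]
spread = mz_spread(spectra)
```

At unit mass 101 the two entries are combined as (101.03 × 0.1 + 101.01 × 1.0) / 1.1, so the mean sits close to the more intense peak.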