OpenMSToffee: C++

TODO…

OpenSwathWorkflow

 $ OpenSwathWorkflow --helphelp
 OpenSwathWorkflow -- Complete workflow to run OpenSWATH
 Version: 2.3.0 Jun 21 2018, 07:51:05, Revision: 763e76a
 To cite OpenMS:
   Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

 Usage:
   OpenSwathWorkflow <options>

 Options (mandatory options marked with '*'):
   -in <files>*                                                                    Input files separated by blank (valid formats: 'mzML', 'mzXML', 'sqMass')
   -tr <file>*                                                                     Transition file ('TraML','tsv','pqp') (valid formats: 'traML', 'tsv', 'pqp')
   -tr_type <type>                                                                 Input file type -- default: determined from file extension or content
                                                                                   (valid: 'traML', 'tsv', 'pqp')
   -tr_irt <file>                                                                  Transition file ('TraML') (valid formats: 'traML')
   -rt_norm <file>                                                                 RT normalization file (how to map the RTs of this run to the ones stored in the library). If set, tr_irt may be omitted. (valid formats: 'trafoXML')
   -swath_windows_file <file>                                                      Optional, tab separated file containing the SWATH windows for extraction: lower_offset upper_offset \newline 400 425 \newline ... Note that the first line is a header and will be skipped.
   -sort_swath_maps                                                                Sort input SWATH files when matching to SWATH windows from swath_windows_file
   -use_ms1_traces                                                                 Extract the precursor ion trace(s) and use for scoring
   -enable_uis_scoring                                                             Enable additional scoring of identification assays
   -out_features <file>                                                            Output file (valid formats: 'featureXML')
   -out_tsv <file>                                                                 TSV output file (mProphet compatible TSV file) (valid formats: 'tsv')
   -out_osw <file>                                                                 OSW output file (PyProphet compatible SQLite file) (valid formats: 'osw')
   -out_chrom <file>                                                               Also output all computed chromatograms output in mzML (chrom.mzML) or sqMass (SQLite format) (valid formats: 'mzML', 'sqMass')
   -min_upper_edge_dist <double>                                                   Minimal distance to the edge to still consider a precursor, in Thomson (default: '0')
   -rt_extraction_window <double>                                                  Only extract RT around this value (-1 means extract over the whole range, a value of 600 means to extract around +/- 300 s of the expected elution). (default: '600')
   -extra_rt_extraction_window <double>                                            Output an XIC with a RT-window that by this much larger (e.g. to visually inspect a larger area of the chromatogram) (default: '0' min: '0')
   -mz_extraction_window <double>                                                  Extraction window used (in Thomson, to use ppm see -ppm flag) (default: '0.05' min: '0')
   -ppm                                                                            M/z extraction_window is in ppm
   -sonar                                                                          Data is scanning SWATH data
   -min_rsq <double>                                                               Minimum r-squared of RT peptides regression (default: '0.95')
   -min_coverage <double>                                                          Minimum relative amount of RT peptides to keep (default: '0.6')
   -split_file_input                                                               The input files each contain one single SWATH (alternatively: all SWATH are in separate files)
   -use_elution_model_score                                                        Turn on elution model score (EMG fit to peak)
   -readOptions <name>                                                             Whether to run OpenSWATH directly on the input data, cache data to disk first or to perform a datareduction step first. If you choose cache, make sure to also set tempDirectory (default: 'normal' valid:
                                                                                   'normal', 'cache', 'cacheWorkingInMemory', 'workingInMemory')
   -mz_correction_function <name>                                                  Use the retention time normalization peptide MS2 masses to perform a mass correction (linear, weighted by intensity linear or quadratic) of all spectra. (default: 'none' valid: 'none',
                                                                                   'unweighted_regression', 'weighted_regression', 'quadratic_regression', 'weighted_quadratic_regression', 'weighted_quadratic_regression_delta_ppm', 'quadratic_regression_delta_ppm')
   -irt_mz_extraction_window <double>                                              Extraction window used for iRT and m/z correction (in Thomson, use ppm use -ppm flag) (default: '0.05')
   -ppm_irtwindow                                                                  IRT m/z extraction_window is in ppm
   -tempDirectory <tmp>                                                            Temporary directory to store cached files for example (default: '/tmp/')
   -extraction_function <name>                                                     Function used to extract the signal (default: 'tophat' valid: 'tophat', 'bartlett')
   -batchSize <number>                                                             The batch size of chromatograms to process (0 means to only have one batch, sensible values are around 500-1000) (default: '0' min: '0')

 Common UTIL options:
   -ini <file>                                                                     Use the given TOPP INI file
   -log <file>                                                                     Name of log file (created only when specified)
   -instance <n>                                                                   Instance number for the TOPP INI file (default: '1')
   -debug <n>                                                                      Sets the debug level (default: '0')
   -threads <n>                                                                    Sets the number of threads allowed to be used by the TOPP tool (default: '1')
   -write_ini <file>                                                               Writes the default configuration file
   -write_ctd <out_dir>                                                            Writes the common tool description file(s) (Toolname(s).ctd) to <out_dir>
   -no_progress                                                                    Disables progress logging to command line
   -force                                                                          Overwrite tool specific checks.
   -test                                                                           Enables the test mode (needed for internal use only)
   --help                                                                          Shows options
   --helphelp                                                                      Shows all options (including advanced)
   --log_arguments                                                                 Print out all the command line arguments

 Debugging:
   -Debugging:irt_trafo <text>                                                     Transformation file for RT transform

 Parameters for the RTNormalization for iRT petides. This specifies how the RT alignment is performed and how outlier detection is applied. Outlier detection can be done iteratively (by default) which removes one outlier per iteration or using the RANSAC algorithm.:
   -RTNormalization:alignmentMethod <choice>                                       How to perform the alignment to the normalized RT space using anchor points. 'linear': perform linear regression (for few anchor points). 'interpolated': Interpolate between anchor points (for few,
                                                                                   noise-free anchor points). 'lowess' Use local regression (for many, noisy anchor points). 'b_spline' use b splines for smoothing. (default: 'linear' valid: 'linear', 'interpolated', 'lowess', 'b_spline')
   -RTNormalization:outlierMethod <choice>                                         Which outlier detection method to use (valid: 'iter_residual', 'iter_jackknife', 'ransac', 'none'). Iterative methods remove one outlier at a time. Jackknife approach optimizes for maximum r-squared
                                                                                   improvement while 'iter_residual' removes the datapoint with the largest residual error (removal by residual is computationally cheaper, use this with lots of peptides). (default: 'iter_residual' valid:
                                                                                   'iter_residual', 'iter_jackknife', 'ransac', 'none')
   -RTNormalization:useIterativeChauvenet                                          Whether to use Chauvenet's criterion when using iterative methods. This should be used if the algorithm removes too many datapoints but it may lead to true outliers being retained.
   -RTNormalization:RANSACMaxIterations <number>                                   Maximum iterations for the RANSAC outlier detection algorithm. (default: '1000')
   -RTNormalization:RANSACMaxPercentRTThreshold <number>                           Maximum threshold in RT dimension for the RANSAC outlier detection algorithm (in percent of the total gradient). Default is set to 3% which is around +/- 4 minutes on a 120 gradient. (default: '3')
   -RTNormalization:RANSACSamplingSize <number>                                    Sampling size of data points per iteration for the RANSAC outlier detection algorithm. (default: '10')
   -RTNormalization:estimateBestPeptides                                           Whether the algorithms should try to choose the best peptides based on their peak shape for normalization. Use this option you do not expect all your peptides to be detected in a sample and too many 'bad'
                                                                                   peptides enter the outlier removal step (e.g. due to them being endogenous peptides or using a less curated list of peptides).
   -RTNormalization:InitialQualityCutoff <value>                                   The initial overall quality cutoff for a peak to be scored (range ca. -2 to 2) (default: '0.5')
   -RTNormalization:OverallQualityCutoff <value>                                   The overall quality cutoff for a peak to go into the retention time estimation (range ca. 0 to 10) (default: '5.5')
   -RTNormalization:NrRTBins <number>                                              Number of RT bins to use to compute coverage. This option should be used to ensure that there is a complete coverage of the RT space (this should detect cases where only a part of the RT gradient is actually
                                                                                   covered by normalization peptides) (default: '10')
   -RTNormalization:MinPeptidesPerBin <number>                                     Minimal number of peptides that are required for a bin to counted as 'covered' (default: '1')
   -RTNormalization:MinBinsFilled <number>                                         Minimal number of bins required to be covered (default: '8')

 RTNormalization:lowess:
   -RTNormalization:lowess:span <value>                                            Span parameter for lowess (default: '0.666666666666667' min: '0' max: '1')

 RTNormalization:b_spline:
   -RTNormalization:b_spline:num_nodes <number>                                    Number of nodes for b spline (default: '5' min: '0')

 Scoring parameters section:
   -Scoring:stop_report_after_feature <number>                                     Stop reporting after feature (ordered by quality; -1 means do not stop). (default: '-1')
   -Scoring:rt_normalization_factor <value>                                        The normalized RT is expected to be between 0 and 1. If your normalized RT has a different range, pass this here (e.g. it goes from 0 to 100, set this value to 100) (default: '100')
   -Scoring:quantification_cutoff <value>                                          Cutoff in m/z below which peaks should not be used for quantification any more (default: '0' min: '0')
   -Scoring:write_convex_hull                                                      Whether to write out all points of all features into the featureXML
   -Scoring:uis_threshold_sn <number>                                              S/N threshold to consider identification transition (set to -1 to consider all) (default: '0')
   -Scoring:uis_threshold_peak_area <number>                                       Peak area threshold to consider identification transition (set to -1 to consider all) (default: '0')
   -Scoring:scoring_model <choice>                                                 Scoring model to use (default: 'default' valid: 'default', 'single_transition')

 Scoring:TransitionGroupPicker:
   -Scoring:TransitionGroupPicker:stop_after_feature <number>                      Stop finding after feature (ordered by intensity; -1 means do not stop). (default: '-1')
   -Scoring:TransitionGroupPicker:min_peak_width <value>                           Minimal peak width (s), discard all peaks below this value (-1 means no action). (default: '14')
   -Scoring:TransitionGroupPicker:peak_integration <choice>                        Calculate the peak area and height either the smoothed or the raw chromatogram data. (default: 'original' valid: 'original', 'smoothed')
   -Scoring:TransitionGroupPicker:background_subtraction <choice>                  Remove background from peak signal using estimated noise levels. The 'original' method is only provided for historical purposes, please use the 'exact' method and set parameters using the PeakIntegrator:
                                                                                   settings. The same original or smoothed chromatogram specified by peak_integration will be used for background estimation. (default: 'none' valid: 'none', 'original', 'exact')
   -Scoring:TransitionGroupPicker:recalculate_peaks <choice>                       Tries to get better peak picking by looking at peak consistency of all picked peaks. Tries to use the consensus (median) peak border if theof variation within the picked peaks is too large. (default: 'true'
                                                                                   valid: 'true', 'false')
   -Scoring:TransitionGroupPicker:use_precursors                                   Use precursor chromatogram for peak picking
   -Scoring:TransitionGroupPicker:recalculate_peaks_max_z <value>                  Determines the maximal Z-Score (difference measured in standard deviations) that is considered too large for peak boundaries. If the Z-Score is above this value, the median is used for peak boundaries
                                                                                   (default value 1.0). (default: '0.75')
   -Scoring:TransitionGroupPicker:minimal_quality <value>                          Only if compute_peak_quality is set, this parameter will not consider peaks below this quality threshold (default: '-1.5')
   -Scoring:TransitionGroupPicker:resample_boundary <value>                        For computing peak quality, how many extra seconds should be sample left and right of the actual peak (default: '15')
   -Scoring:TransitionGroupPicker:compute_peak_quality <choice>                    Tries to compute a quality value for each peakgroup and detect outlier transitions. The resulting score is centered around zero and values above 0 are generally good and below -1 or -2 are usually bad.
                                                                                   (default: 'true' valid: 'true', 'false')
   -Scoring:TransitionGroupPicker:compute_peak_shape_metrics                       Calulates various peak shape metrics (e.g., tailing) that can be used for downstream QC/QA.
   -Scoring:TransitionGroupPicker:boundary_selection_method <choice>               Method to use when selecting the best boundaries for peaks. (default: 'largest' valid: 'largest', 'widest')

 Scoring:TransitionGroupPicker:PeakPickerMRM:
   -Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_frame_length <number>       The number of subsequent data points used for smoothing.
                                                                                   This number has to be uneven. If it is not, 1 will be added. (default: '11')
   -Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_polynomial_order <number>   Order of the polynomial that is fitted. (default: '3')
   -Scoring:TransitionGroupPicker:PeakPickerMRM:gauss_width <value>                Gaussian width in seconds, estimated peak size. (default: '30')
   -Scoring:TransitionGroupPicker:PeakPickerMRM:use_gauss <choice>                 Use Gaussian filter for smoothing (alternative is Savitzky-Golay filter) (default: 'false' valid: 'false', 'true')
   -Scoring:TransitionGroupPicker:PeakPickerMRM:peak_width <value>                 Force a certain minimal peak_width on the data (e.g. extend the peak at least by this amount on both sides) in seconds. -1 turns this feature off. (default: '-1')
   -Scoring:TransitionGroupPicker:PeakPickerMRM:signal_to_noise <value>            Signal-to-noise threshold at which a peak will not be extended any more. Note that setting this too high (e.g. 1.0) can lead to peaks whose flanks are not fully captured. (default: '0.1' min: '0')
   -Scoring:TransitionGroupPicker:PeakPickerMRM:write_sn_log_messages              Write out log messages of the signal-to-noise estimator in case of sparse windows or median in rightmost histogram bin
   -Scoring:TransitionGroupPicker:PeakPickerMRM:remove_overlapping_peaks <choice>  Try to remove overlapping peaks during peak picking (default: 'true' valid: 'false', 'true')
   -Scoring:TransitionGroupPicker:PeakPickerMRM:method <choice>                    Which method to choose for chromatographic peak-picking (OpenSWATH legacy on raw data, corrected picking on smoothed chromatogram or Crawdad on smoothed chromatogram). (default: 'corrected' valid: 'legacy',
                                                                                   'corrected', 'crawdad')

 Scoring:TransitionGroupPicker:PeakIntegrator:
   -Scoring:TransitionGroupPicker:PeakIntegrator:integration_type <choice>         The integration technique to use in integratePeak() and estimateBackground() which uses either the summed intensity, integration by Simpson's rule or trapezoidal integration. (default: 'intensity_sum' valid:
                                                                                   'intensity_sum', 'simpson', 'trapezoid')
   -Scoring:TransitionGroupPicker:PeakIntegrator:baseline_type <choice>            The baseline type to use in estimateBackground() based on the peak boundaries. A rectangular baseline shape is computed based either on the minimal intensity of the peak boundaries, the maximum intensity or
                                                                                   the average intensity (base_to_base). (default: 'base_to_base' valid: 'base_to_base', 'vertical_division', 'vertical_division_min', 'vertical_division_max')

 Scoring:DIAScoring:
   -Scoring:DIAScoring:dia_extraction_window <value>                               DIA extraction window in Th or ppm. (default: '0.05' min: '0')
   -Scoring:DIAScoring:dia_extraction_unit <choice>                                DIA extraction window unit (default: 'Th' valid: 'Th', 'ppm')
   -Scoring:DIAScoring:dia_centroided                                              Use centroided DIA data.
   -Scoring:DIAScoring:dia_byseries_intensity_min <value>                          DIA b/y series minimum intensity to consider. (default: '300' min: '0')
   -Scoring:DIAScoring:dia_byseries_ppm_diff <value>                               DIA b/y series minimal difference in ppm to consider. (default: '10' min: '0')
   -Scoring:DIAScoring:dia_nr_isotopes <number>                                    DIA number of isotopes to consider. (default: '4' min: '0')
   -Scoring:DIAScoring:dia_nr_charges <number>                                     DIA number of charges to consider. (default: '4' min: '0')
   -Scoring:DIAScoring:peak_before_mono_max_ppm_diff <value>                       DIA maximal difference in ppm to count a peak at lower m/z when searching for evidence that a peak might not be monoisotopic. (default: '20' min: '0')

 Scoring:EMGScoring:
   -Scoring:EMGScoring:max_iteration <number>                                      Maximum number of iterations using by Levenberg-Marquardt algorithm. (default: '10')

 Scoring:Scores:
   -Scoring:Scores:use_shape_score <choice>                                        Use the shape score (this score measures the similarity in shape of the transitions using a cross-correlation) (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_coelution_score <choice>                                    Use the coelution score (this score measures the similarity in coelution of the transitions using a cross-correlation) (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_rt_score <choice>                                           Use the retention time score (this score measure the difference in retention time) (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_library_score <choice>                                      Use the library score (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_intensity_score <choice>                                    Use the intensity score (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_nr_peaks_score <choice>                                     Use the number of peaks score (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_total_xic_score <choice>                                    Use the total XIC score (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_sn_score <choice>                                           Use the SN (signal to noise) score (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_dia_scores <choice>                                         Use the DIA (SWATH) scores. If turned off, will not use fragment ion spectra for scoring. (default: 'true' valid: 'true', 'false')
   -Scoring:Scores:use_ms1_correlation                                             Use the correlation scores with the MS1 elution profiles
   -Scoring:Scores:use_sonar_scores                                                Use the scores for SONAR scans (scanning swath)
   -Scoring:Scores:use_ms1_fullscan                                                Use the full MS1 scan at the peak apex for scoring (ppm accuracy of precursor and isotopic pattern)
   -Scoring:Scores:use_uis_scores                                                  Use UIS scores for peptidoform identification
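Two of the options above are described precisely enough to sketch: -rt_extraction_window is a total width centred on the expected elution time (600 means +/- 300 s, and -1 means the whole run), and PeakPickerMRM:sgolay_frame_length must be odd, with 1 added to an even value. The helpers below (rtExtractionRange, effectiveSgolayFrameLength) are hypothetical illustrations of those two rules, not OpenMS functions. The help text does not say whether the window is clamped to the run boundaries, so the sketch does not clamp.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of OpenMS): the RT range implied by
// -rt_extraction_window. A negative window (-1) means "extract over the whole
// run"; otherwise the window is a total width centred on the expected RT.
std::pair<double, double> rtExtractionRange(double expectedRT, double window,
                                            double runStart, double runEnd)
{
  assert(runStart <= runEnd);
  if (window < 0.0) {
    return {runStart, runEnd};
  }
  // 600 s total width -> expectedRT +/- 300 s (no clamping to the run bounds)
  return {expectedRT - window / 2.0, expectedRT + window / 2.0};
}

// Hypothetical helper: the Savitzky-Golay frame length must be odd; an even
// value is increased by 1, as stated for sgolay_frame_length.
int effectiveSgolayFrameLength(int frameLength)
{
  return frameLength % 2 == 0 ? frameLength + 1 : frameLength;
}
```

With the default of 600, a peptide expected at 1000 s would be extracted from 700 s to 1300 s.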

Internal Class Structure

class OpenMSToffeeWorkflow : public TOPPBase

Public Types

enum FileFormat

Values:

TSV
SQLITE

Public Functions

OpenMSToffeeWorkflow(bool testing = false)
struct FileArguments

Public Members

std::string toffeeFilePath

input toffee (.tof) file path

std::string srlFilePath

input spectral library (srl) file path (TSV or PQP)

std::string alignmentTSVFilePath

input retention time alignment TSV file path

std::string outputFilePath

output file path

std::string inputRTTrafoXML

if not empty, read the RT normalisation from this file (trafoXML)

std::string outputRTTrafoXML

if not empty, save the RT normalisation to this file (trafoXML)

FileFormat format

defines whether the TSV or the SQLite format is used

Extracting data from mzML and mzXML files

class HDF5ChromatogramConsumer : public IMSDataConsumer

Public Types

using MapType = OpenMS::PeakMap
using SpectrumType = MapType::SpectrumType
using ChromatogramType = MapType::ChromatogramType

Public Functions

HDF5ChromatogramConsumer(const std::string &h5FilePath)
~HDF5ChromatogramConsumer()
void consumeSpectrum(SpectrumType &s)
void consumeChromatogram(ChromatogramType &c)
void setExpectedSize(size_t expectedSpectra, size_t expectedChromatograms)
void setExperimentalSettings(const OpenMS::ExperimentalSettings &exp)

Public Static Functions

void chromatogramToHDF5(const std::string &mzMLFilePath, const std::string &h5FilePath)

Save an mzML chromatogram file to a much more compressed (>100x) HDF5 file. The data is saved as a series of 1D vectors:

  • names: the transition ids of the chromatograms
  • offset: for each transition id, the index of its first point in the retentionTime and intensity vectors
  • size: for each transition id, the number of points in the retentionTime and intensity vectors, i.e. its data resides at [offset, offset + size)
  • retentionTime: all retention time data concatenated into a single vector
  • intensity: all intensity data concatenated into a single vector

Parameters
  • mzMLFilePath: path to the mzML chromatogram file generated by OpenSwath
  • h5FilePath: output file path
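The flattened layout described for chromatogramToHDF5 can be illustrated in memory. This is a minimal sketch, assuming plain std::vector storage in place of HDF5 datasets: FlatChromatograms is a hypothetical name, append() shows what a writer needs to record for each chromatogram, and find() recovers one transition's points from its [offset, offset + size) slice.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory mirror of the five 1D vectors (the real class writes
// them to an HDF5 file instead).
struct FlatChromatograms {
  std::vector<std::string> names;     // transition ids
  std::vector<std::size_t> offset;    // first index into retentionTime/intensity
  std::vector<std::size_t> size;      // number of points per transition
  std::vector<double> retentionTime;  // all RT values, concatenated
  std::vector<double> intensity;      // all intensity values, concatenated

  // Append one chromatogram, recording where its data starts and its length.
  void append(const std::string& id, const std::vector<double>& rt,
              const std::vector<double>& inten) {
    assert(rt.size() == inten.size());
    names.push_back(id);
    offset.push_back(retentionTime.size());
    size.push_back(rt.size());
    retentionTime.insert(retentionTime.end(), rt.begin(), rt.end());
    intensity.insert(intensity.end(), inten.begin(), inten.end());
  }

  // Return the (rt, intensity) slice [offset, offset + size) for a transition id.
  std::optional<std::pair<std::vector<double>, std::vector<double>>>
  find(const std::string& id) const {
    for (std::size_t i = 0; i < names.size(); ++i) {
      if (names[i] != id) continue;
      const auto first = static_cast<std::ptrdiff_t>(offset[i]);
      const auto last = static_cast<std::ptrdiff_t>(offset[i] + size[i]);
      return std::make_pair(
          std::vector<double>(retentionTime.begin() + first, retentionTime.begin() + last),
          std::vector<double>(intensity.begin() + first, intensity.begin() + last));
    }
    return std::nullopt;  // unknown transition id
  }
};
```

A linear scan over names keeps the sketch short; a reader that looks up many transitions would more likely build a map from id to index once.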

class RTNormalisation

Calculate the world to iRT normalisation transformation using raw SWATH-MS data contained in a toffee file and a list of precursor and product ions that can be used for alignment.

Public Functions

RTNormalisation(const std::string &toffeeFilePath, const std::string &alignmentTSVFilePath)

Parameters
  • toffeeFilePath: the toffee file for which we wish to calculate the world to iRT normalisation
  • alignmentTSVFilePath: the path to a TSV file of retention time alignment precursor and product ions

void updateNormalisationParams(const OpenMS::Param &param)

Update the default parameters passed to the normalisation algorithms.

void updateMzCorrectionFunction(const std::string &mzCorrectionFunction)

Update the default method for correcting the mass-to-charge ratio (m/z).

void updateMinRSquared(double minRSquared)

Update the default minimum R-squared value for the regression fit.

void updateMinCoverage(double minCoverage)

Update the default minimum coverage for the regression fit.
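The minimum R-squared and minimum coverage act as stopping rules for the regression fit. The sketch below illustrates the default OpenSwathWorkflow combination described in its help text ('linear' alignment with 'iter_residual' outlier removal): fit, and while R-squared is below the minimum, drop the point with the largest residual and refit, failing if fewer than the minimum fraction of points would remain. Reading "coverage" as the fraction of alignment points retained follows the -min_coverage description above and is an assumption here; LinearFit and fitWithOutlierRemoval are hypothetical names, not the OpenMS or toffee implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct LinearFit { double slope; double intercept; double rSquared; };

// Ordinary least-squares fit y = slope * x + intercept, plus R^2.
// Precondition: at least two points with distinct x values.
LinearFit fitLinear(const std::vector<std::pair<double, double>>& pts) {
  assert(pts.size() >= 2);
  const double n = static_cast<double>(pts.size());
  double sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (const auto& [x, y] : pts) { sx += x; sy += y; sxx += x * x; sxy += x * y; }
  const double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  const double intercept = (sy - slope * sx) / n;
  const double meanY = sy / n;
  double ssRes = 0, ssTot = 0;
  for (const auto& [x, y] : pts) {
    const double r = y - (slope * x + intercept);
    ssRes += r * r;
    ssTot += (y - meanY) * (y - meanY);
  }
  return {slope, intercept, ssTot > 0 ? 1.0 - ssRes / ssTot : 1.0};
}

// Iterative residual outlier removal: one point per iteration, until
// R^2 >= minRSquared. Returns nullopt if that would keep fewer than
// minCoverage * (original number of points).
std::optional<LinearFit> fitWithOutlierRemoval(
    std::vector<std::pair<double, double>> pts, double minRSquared, double minCoverage) {
  const double original = static_cast<double>(pts.size());
  while (pts.size() >= 2) {
    const LinearFit fit = fitLinear(pts);
    if (fit.rSquared >= minRSquared) return fit;
    if (static_cast<double>(pts.size() - 1) < minCoverage * original) return std::nullopt;
    std::size_t worst = 0;
    double worstResidual = -1.0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
      const double r = std::abs(pts[i].second - (fit.slope * pts[i].first + fit.intercept));
      if (r > worstResidual) { worstResidual = r; worst = i; }
    }
    pts.erase(pts.begin() + static_cast<std::ptrdiff_t>(worst));
  }
  return std::nullopt;
}
```

With minRSquared = 0.95 and minCoverage = 0.6 (the OpenSwathWorkflow defaults), ten points on a line plus one gross outlier recover the line after a single removal.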

void updateChromExtractParams(const OpenMS::ChromExtractParams &param)

Update the chromatogram extraction configuration.

void updateFeatureFinderParams(const OpenMS::Param &param)

Update the feature finding configuration.

OpenMS::TransformationDescription calculateNormalisation() const

Calculate the world to iRT normalisation.
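This page does not document whether calculateNormalisation() checks how well the alignment peptides cover the gradient. However, OpenSwathWorkflow's RTNormalization:NrRTBins, MinPeptidesPerBin and MinBinsFilled options describe such a check. The function below sketches it with the help-text defaults (10 bins, 1 peptide per bin, 8 bins filled). Binning over a caller-supplied [rtMin, rtMax] range is an assumption, and hasSufficientCoverage is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a binned RT coverage check: split [rtMin, rtMax] into
// nrBins equal bins, count anchor peptides per bin, and require at least
// minBinsFilled bins that hold >= minPeptidesPerBin peptides.
bool hasSufficientCoverage(const std::vector<double>& anchorRTs, double rtMin, double rtMax,
                           int nrBins = 10, int minPeptidesPerBin = 1, int minBinsFilled = 8) {
  assert(nrBins > 0 && rtMax > rtMin);
  std::vector<int> counts(static_cast<std::size_t>(nrBins), 0);
  const double width = (rtMax - rtMin) / nrBins;
  for (double rt : anchorRTs) {
    if (rt < rtMin || rt > rtMax) continue;  // ignore anchors outside the range
    int bin = static_cast<int>((rt - rtMin) / width);
    bin = std::min(bin, nrBins - 1);         // rt == rtMax falls in the last bin
    ++counts[static_cast<std::size_t>(bin)];
  }
  const auto filled = std::count_if(counts.begin(), counts.end(),
                                    [&](int c) { return c >= minPeptidesPerBin; });
  return filled >= minBinsFilled;
}
```

Anchors confined to the first half of the gradient fill only 5 of 10 bins and fail the default check, which is the situation the NrRTBins description warns about.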

OpenMS::TransformationDescription calculateNormalisationFromFile(const std::string &inputTrafoPath) const

Load the world to iRT normalisation from the trafoXML file at inputTrafoPath.

void saveToFile(const OpenMS::TransformationDescription &transform, const std::string &trafoXMLFilePath) const

Save a previously calculated iRT normalisation to file.

Warning
This function does not check whether the transformation maps world to iRT coordinates or the inverse; the caller is responsible for passing the correct direction.
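Because saveToFile() cannot tell a world-to-iRT transformation from its inverse, it helps to keep the direction explicit in code. The sketch below uses a hypothetical LinearRTTransform, a stand-in for a 'linear' alignment and not OpenMS::TransformationDescription, to show that the inverse of iRT = slope * RT + intercept is RT = iRT / slope - intercept / slope, and that applying the wrong direction silently yields a different value rather than an error.

```cpp
#include <cassert>

// Hypothetical linear RT mapping (not an OpenMS type):
// iRT = slope * worldRT + intercept.
struct LinearRTTransform {
  double slope;
  double intercept;

  double apply(double x) const { return slope * x + intercept; }

  // The inverse maps iRT back to world (instrument) RT.
  LinearRTTransform inverse() const {
    assert(slope != 0.0);
    return {1.0 / slope, -intercept / slope};
  }
};
```

For example, with slope 0.5 and intercept -10, a world RT of 100 maps to iRT 40, and the inverse maps 40 back to 100; feeding 100 to the inverse instead gives 220.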

void calculateAndSaveToFile(const std::string &trafoXMLFilePath) const

Calculate the world to iRT normalisation and save it to file.

Public Static Functions

static OpenMS::Param defaultNormalisationParams()

Default parameters passed to the normalisation algorithms. These can be changed with updateNormalisationParams().

static std::string defaultMzCorrectionFunction()

Default method for correcting the mass-to-charge ratio (m/z). This can be changed with updateMzCorrectionFunction().

static double defaultMinRSquared()

Default minimum R-squared value for the regression fit. This can be changed with updateMinRSquared().

static double defaultMinCoverage()

Default minimum coverage for the regression fit. This can be changed with updateMinCoverage().