OpenMSToffee: C++
Contents
TODO…
OpenSwathWorkflow
$ OpenSwathWorkflow --helphelp
OpenSwathWorkflow -- Complete workflow to run OpenSWATH
Version: 2.3.0 Jun 21 2018, 07:51:05, Revision: 763e76a
To cite OpenMS:
Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.
Usage:
OpenSwathWorkflow <options>
Options (mandatory options marked with '*'):
-in <files>* Input files separated by blank (valid formats: 'mzML', 'mzXML', 'sqMass')
-tr <file>* Transition file ('TraML','tsv','pqp') (valid formats: 'traML', 'tsv', 'pqp')
-tr_type <type> Input file type -- default: determined from file extension or content
(valid: 'traML', 'tsv', 'pqp')
-tr_irt <file> Transition file ('TraML') (valid formats: 'traML')
-rt_norm <file> RT normalization file (how to map the RTs of this run to the ones stored in the library). If set, tr_irt may be omitted. (valid formats: 'trafoXML')
-swath_windows_file <file> Optional, tab separated file containing the SWATH windows for extraction: lower_offset upper_offset \newline 400 425 \newline ... Note that the first line is a header and will be skipped.
-sort_swath_maps Sort input SWATH files when matching to SWATH windows from swath_windows_file
-use_ms1_traces Extract the precursor ion trace(s) and use for scoring
-enable_uis_scoring Enable additional scoring of identification assays
-out_features <file> Output file (valid formats: 'featureXML')
-out_tsv <file> TSV output file (mProphet compatible TSV file) (valid formats: 'tsv')
-out_osw <file> OSW output file (PyProphet compatible SQLite file) (valid formats: 'osw')
-out_chrom <file> Also output all computed chromatograms output in mzML (chrom.mzML) or sqMass (SQLite format) (valid formats: 'mzML', 'sqMass')
-min_upper_edge_dist <double> Minimal distance to the edge to still consider a precursor, in Thomson (default: '0')
-rt_extraction_window <double> Only extract RT around this value (-1 means extract over the whole range, a value of 600 means to extract around +/- 300 s of the expected elution). (default: '600')
-extra_rt_extraction_window <double> Output an XIC with a RT-window that by this much larger (e.g. to visually inspect a larger area of the chromatogram) (default: '0' min: '0')
-mz_extraction_window <double> Extraction window used (in Thomson, to use ppm see -ppm flag) (default: '0.05' min: '0')
-ppm M/z extraction_window is in ppm
-sonar Data is scanning SWATH data
-min_rsq <double> Minimum r-squared of RT peptides regression (default: '0.95')
-min_coverage <double> Minimum relative amount of RT peptides to keep (default: '0.6')
-split_file_input The input files each contain one single SWATH (alternatively: all SWATH are in separate files)
-use_elution_model_score Turn on elution model score (EMG fit to peak)
-readOptions <name> Whether to run OpenSWATH directly on the input data, cache data to disk first or to perform a datareduction step first. If you choose cache, make sure to also set tempDirectory (default: 'normal' valid: 'normal', 'cache', 'cacheWorkingInMemory', 'workingInMemory')
-mz_correction_function <name> Use the retention time normalization peptide MS2 masses to perform a mass correction (linear, weighted by intensity linear or quadratic) of all spectra. (default: 'none' valid: 'none', 'unweighted_regression', 'weighted_regression', 'quadratic_regression', 'weighted_quadratic_regression', 'weighted_quadratic_regression_delta_ppm', 'quadratic_regression_delta_ppm')
-irt_mz_extraction_window <double> Extraction window used for iRT and m/z correction (in Thomson, use ppm use -ppm flag) (default: '0.05')
-ppm_irtwindow IRT m/z extraction_window is in ppm
-tempDirectory <tmp> Temporary directory to store cached files for example (default: '/tmp/')
-extraction_function <name> Function used to extract the signal (default: 'tophat' valid: 'tophat', 'bartlett')
-batchSize <number> The batch size of chromatograms to process (0 means to only have one batch, sensible values are around 500-1000) (default: '0' min: '0')
Common UTIL options:
-ini <file> Use the given TOPP INI file
-log <file> Name of log file (created only when specified)
-instance <n> Instance number for the TOPP INI file (default: '1')
-debug <n> Sets the debug level (default: '0')
-threads <n> Sets the number of threads allowed to be used by the TOPP tool (default: '1')
-write_ini <file> Writes the default configuration file
-write_ctd <out_dir> Writes the common tool description file(s) (Toolname(s).ctd) to <out_dir>
-no_progress Disables progress logging to command line
-force Overwrite tool specific checks.
-test Enables the test mode (needed for internal use only)
--help Shows options
--helphelp Shows all options (including advanced)
--log_arguments Print out all the command line arguments
Debugging:
-Debugging:irt_trafo <text> Transformation file for RT transform
Parameters for the RTNormalization for iRT petides. This specifies how the RT alignment is performed and how outlier detection is applied. Outlier detection can be done iteratively (by default) which removes one outlier per iteration or using the RANSAC algorithm.:
-RTNormalization:alignmentMethod <choice> How to perform the alignment to the normalized RT space using anchor points. 'linear': perform linear regression (for few anchor points). 'interpolated': Interpolate between anchor points (for few, noise-free anchor points). 'lowess' Use local regression (for many, noisy anchor points). 'b_spline' use b splines for smoothing. (default: 'linear' valid: 'linear', 'interpolated', 'lowess', 'b_spline')
-RTNormalization:outlierMethod <choice> Which outlier detection method to use (valid: 'iter_residual', 'iter_jackknife', 'ransac', 'none'). Iterative methods remove one outlier at a time. Jackknife approach optimizes for maximum r-squared improvement while 'iter_residual' removes the datapoint with the largest residual error (removal by residual is computationally cheaper, use this with lots of peptides). (default: 'iter_residual' valid: 'iter_residual', 'iter_jackknife', 'ransac', 'none')
-RTNormalization:useIterativeChauvenet Whether to use Chauvenet's criterion when using iterative methods. This should be used if the algorithm removes too many datapoints but it may lead to true outliers being retained.
-RTNormalization:RANSACMaxIterations <number> Maximum iterations for the RANSAC outlier detection algorithm. (default: '1000')
-RTNormalization:RANSACMaxPercentRTThreshold <number> Maximum threshold in RT dimension for the RANSAC outlier detection algorithm (in percent of the total gradient). Default is set to 3% which is around +/- 4 minutes on a 120 gradient. (default: '3')
-RTNormalization:RANSACSamplingSize <number> Sampling size of data points per iteration for the RANSAC outlier detection algorithm. (default: '10')
-RTNormalization:estimateBestPeptides Whether the algorithms should try to choose the best peptides based on their peak shape for normalization. Use this option you do not expect all your peptides to be detected in a sample and too many 'bad' peptides enter the outlier removal step (e.g. due to them being endogenous peptides or using a less curated list of peptides).
-RTNormalization:InitialQualityCutoff <value> The initial overall quality cutoff for a peak to be scored (range ca. -2 to 2) (default: '0.5')
-RTNormalization:OverallQualityCutoff <value> The overall quality cutoff for a peak to go into the retention time estimation (range ca. 0 to 10) (default: '5.5')
-RTNormalization:NrRTBins <number> Number of RT bins to use to compute coverage. This option should be used to ensure that there is a complete coverage of the RT space (this should detect cases where only a part of the RT gradient is actually covered by normalization peptides) (default: '10')
-RTNormalization:MinPeptidesPerBin <number> Minimal number of peptides that are required for a bin to counted as 'covered' (default: '1')
-RTNormalization:MinBinsFilled <number> Minimal number of bins required to be covered (default: '8')
RTNormalization:lowess:
-RTNormalization:lowess:span <value> Span parameter for lowess (default: '0.666666666666667' min: '0' max: '1')
RTNormalization:b_spline:
-RTNormalization:b_spline:num_nodes <number> Number of nodes for b spline (default: '5' min: '0')
Scoring parameters section:
-Scoring:stop_report_after_feature <number> Stop reporting after feature (ordered by quality; -1 means do not stop). (default: '-1')
-Scoring:rt_normalization_factor <value> The normalized RT is expected to be between 0 and 1. If your normalized RT has a different range, pass this here (e.g. it goes from 0 to 100, set this value to 100) (default: '100')
-Scoring:quantification_cutoff <value> Cutoff in m/z below which peaks should not be used for quantification any more (default: '0' min: '0')
-Scoring:write_convex_hull Whether to write out all points of all features into the featureXML
-Scoring:uis_threshold_sn <number> S/N threshold to consider identification transition (set to -1 to consider all) (default: '0')
-Scoring:uis_threshold_peak_area <number> Peak area threshold to consider identification transition (set to -1 to consider all) (default: '0')
-Scoring:scoring_model <choice> Scoring model to use (default: 'default' valid: 'default', 'single_transition')
Scoring:TransitionGroupPicker:
-Scoring:TransitionGroupPicker:stop_after_feature <number> Stop finding after feature (ordered by intensity; -1 means do not stop). (default: '-1')
-Scoring:TransitionGroupPicker:min_peak_width <value> Minimal peak width (s), discard all peaks below this value (-1 means no action). (default: '14')
-Scoring:TransitionGroupPicker:peak_integration <choice> Calculate the peak area and height either the smoothed or the raw chromatogram data. (default: 'original' valid: 'original', 'smoothed')
-Scoring:TransitionGroupPicker:background_subtraction <choice> Remove background from peak signal using estimated noise levels. The 'original' method is only provided for historical purposes, please use the 'exact' method and set parameters using the PeakIntegrator: settings. The same original or smoothed chromatogram specified by peak_integration will be used for background estimation. (default: 'none' valid: 'none', 'original', 'exact')
-Scoring:TransitionGroupPicker:recalculate_peaks <choice> Tries to get better peak picking by looking at peak consistency of all picked peaks. Tries to use the consensus (median) peak border if theof variation within the picked peaks is too large. (default: 'true' valid: 'true', 'false')
-Scoring:TransitionGroupPicker:use_precursors Use precursor chromatogram for peak picking
-Scoring:TransitionGroupPicker:recalculate_peaks_max_z <value> Determines the maximal Z-Score (difference measured in standard deviations) that is considered too large for peak boundaries. If the Z-Score is above this value, the median is used for peak boundaries (default value 1.0). (default: '0.75')
-Scoring:TransitionGroupPicker:minimal_quality <value> Only if compute_peak_quality is set, this parameter will not consider peaks below this quality threshold (default: '-1.5')
-Scoring:TransitionGroupPicker:resample_boundary <value> For computing peak quality, how many extra seconds should be sample left and right of the actual peak (default: '15')
-Scoring:TransitionGroupPicker:compute_peak_quality <choice> Tries to compute a quality value for each peakgroup and detect outlier transitions. The resulting score is centered around zero and values above 0 are generally good and below -1 or -2 are usually bad. (default: 'true' valid: 'true', 'false')
-Scoring:TransitionGroupPicker:compute_peak_shape_metrics Calulates various peak shape metrics (e.g., tailing) that can be used for downstream QC/QA.
-Scoring:TransitionGroupPicker:boundary_selection_method <choice> Method to use when selecting the best boundaries for peaks. (default: 'largest' valid: 'largest', 'widest')
Scoring:TransitionGroupPicker:PeakPickerMRM:
-Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_frame_length <number> The number of subsequent data points used for smoothing.
This number has to be uneven. If it is not, 1 will be added. (default: '11')
-Scoring:TransitionGroupPicker:PeakPickerMRM:sgolay_polynomial_order <number> Order of the polynomial that is fitted. (default: '3')
-Scoring:TransitionGroupPicker:PeakPickerMRM:gauss_width <value> Gaussian width in seconds, estimated peak size. (default: '30')
-Scoring:TransitionGroupPicker:PeakPickerMRM:use_gauss <choice> Use Gaussian filter for smoothing (alternative is Savitzky-Golay filter) (default: 'false' valid: 'false', 'true')
-Scoring:TransitionGroupPicker:PeakPickerMRM:peak_width <value> Force a certain minimal peak_width on the data (e.g. extend the peak at least by this amount on both sides) in seconds. -1 turns this feature off. (default: '-1')
-Scoring:TransitionGroupPicker:PeakPickerMRM:signal_to_noise <value> Signal-to-noise threshold at which a peak will not be extended any more. Note that setting this too high (e.g. 1.0) can lead to peaks whose flanks are not fully captured. (default: '0.1' min: '0')
-Scoring:TransitionGroupPicker:PeakPickerMRM:write_sn_log_messages Write out log messages of the signal-to-noise estimator in case of sparse windows or median in rightmost histogram bin
-Scoring:TransitionGroupPicker:PeakPickerMRM:remove_overlapping_peaks <choice> Try to remove overlapping peaks during peak picking (default: 'true' valid: 'false', 'true')
-Scoring:TransitionGroupPicker:PeakPickerMRM:method <choice> Which method to choose for chromatographic peak-picking (OpenSWATH legacy on raw data, corrected picking on smoothed chromatogram or Crawdad on smoothed chromatogram). (default: 'corrected' valid: 'legacy', 'corrected', 'crawdad')
Scoring:TransitionGroupPicker:PeakIntegrator:
-Scoring:TransitionGroupPicker:PeakIntegrator:integration_type <choice> The integration technique to use in integratePeak() and estimateBackground() which uses either the summed intensity, integration by Simpson's rule or trapezoidal integration. (default: 'intensity_sum' valid: 'intensity_sum', 'simpson', 'trapezoid')
-Scoring:TransitionGroupPicker:PeakIntegrator:baseline_type <choice> The baseline type to use in estimateBackground() based on the peak boundaries. A rectangular baseline shape is computed based either on the minimal intensity of the peak boundaries, the maximum intensity or the average intensity (base_to_base). (default: 'base_to_base' valid: 'base_to_base', 'vertical_division', 'vertical_division_min', 'vertical_division_max')
Scoring:DIAScoring:
-Scoring:DIAScoring:dia_extraction_window <value> DIA extraction window in Th or ppm. (default: '0.05' min: '0')
-Scoring:DIAScoring:dia_extraction_unit <choice> DIA extraction window unit (default: 'Th' valid: 'Th', 'ppm')
-Scoring:DIAScoring:dia_centroided Use centroided DIA data.
-Scoring:DIAScoring:dia_byseries_intensity_min <value> DIA b/y series minimum intensity to consider. (default: '300' min: '0')
-Scoring:DIAScoring:dia_byseries_ppm_diff <value> DIA b/y series minimal difference in ppm to consider. (default: '10' min: '0')
-Scoring:DIAScoring:dia_nr_isotopes <number> DIA number of isotopes to consider. (default: '4' min: '0')
-Scoring:DIAScoring:dia_nr_charges <number> DIA number of charges to consider. (default: '4' min: '0')
-Scoring:DIAScoring:peak_before_mono_max_ppm_diff <value> DIA maximal difference in ppm to count a peak at lower m/z when searching for evidence that a peak might not be monoisotopic. (default: '20' min: '0')
Scoring:EMGScoring:
-Scoring:EMGScoring:max_iteration <number> Maximum number of iterations using by Levenberg-Marquardt algorithm. (default: '10')
Scoring:Scores:
-Scoring:Scores:use_shape_score <choice> Use the shape score (this score measures the similarity in shape of the transitions using a cross-correlation) (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_coelution_score <choice> Use the coelution score (this score measures the similarity in coelution of the transitions using a cross-correlation) (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_rt_score <choice> Use the retention time score (this score measure the difference in retention time) (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_library_score <choice> Use the library score (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_intensity_score <choice> Use the intensity score (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_nr_peaks_score <choice> Use the number of peaks score (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_total_xic_score <choice> Use the total XIC score (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_sn_score <choice> Use the SN (signal to noise) score (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_dia_scores <choice> Use the DIA (SWATH) scores. If turned off, will not use fragment ion spectra for scoring. (default: 'true' valid: 'true', 'false')
-Scoring:Scores:use_ms1_correlation Use the correlation scores with the MS1 elution profiles
-Scoring:Scores:use_sonar_scores Use the scores for SONAR scans (scanning swath)
-Scoring:Scores:use_ms1_fullscan Use the full MS1 scan at the peak apex for scoring (ppm accuracy of precursor and isotopic pattern)
-Scoring:Scores:use_uis_scores Use UIS scores for peptidoform identification
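Several options above are relative values, or are adjusted before use, rather than being taken literally. The sketch below spells out the arithmetic the help text describes for four of them. It is an illustration only: the function names are hypothetical, not OpenMS API, and it assumes the m/z window is split symmetrically around the target m/z, which the help text does not state explicitly.

```cpp
#include <cassert>

// A closed interval [lower, upper] in RT (s) or m/z (Th).
struct Interval {
    double lower;
    double upper;
};

// -rt_extraction_window: a total width centred on the expected elution time.
// A negative value (the help text uses -1) is treated here as "whole run".
Interval rtExtractionInterval(double expectedRt, double window, double runStart, double runEnd) {
    if (window < 0.0) {
        return {runStart, runEnd};
    }
    return {expectedRt - window / 2.0, expectedRt + window / 2.0};
}

// -mz_extraction_window: a total width in Thomson, or in ppm of the target
// m/z when usePpm is set (the -ppm flag). The symmetric split is an assumption.
Interval mzExtractionInterval(double targetMz, double window, bool usePpm) {
    assert(window >= 0.0);
    const double widthTh = usePpm ? targetMz * window * 1e-6 : window;
    return {targetMz - widthTh / 2.0, targetMz + widthTh / 2.0};
}

// RTNormalization:RANSACMaxPercentRTThreshold: a percentage of the gradient
// length, converted here to an absolute tolerance in the gradient's unit.
double ransacRtTolerance(double maxPercent, double gradientLength) {
    assert(maxPercent >= 0.0 && gradientLength > 0.0);
    return maxPercent / 100.0 * gradientLength;
}

// PeakPickerMRM:sgolay_frame_length: must be odd; an even value is bumped by 1.
int effectiveSgolayFrameLength(int frameLength) {
    return (frameLength % 2 == 0) ? frameLength + 1 : frameLength;
}
```

With the default -rt_extraction_window of 600 and an expected elution at 1200 s, rtExtractionInterval gives [900, 1500], the "+/- 300 s" from the help text. Likewise, the default RANSAC threshold of 3% on a 120-minute gradient is 3.6 minutes, which the help text rounds to "around +/- 4 minutes".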
Internal Class Structure
- class OpenMSToffeeWorkflow : public TOPPBase
Public Functions
- OpenMSToffeeWorkflow(bool testing = false)
- struct FileArguments
Public Members
- std::string toffeeFilePath
Input toffee (.tof) file path.
- std::string srlFilePath
Input SRL file path (TSV or PQP).
- std::string alignmentTSVFilePath
Input alignment TSV file path.
- std::string outputFilePath
Output file path.
- std::string inputRTTrafoXML
If not empty, use this trafoXML file to specify the RT normalisation.
- std::string outputRTTrafoXML
If not empty, save the RT normalisation to this trafoXML file.
- FileFormat format
Defines whether TSV or SQLite is used.
-
std::string
-
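FileArguments stores a FileFormat that defines whether TSV or SQLite is used, but how the workflow sets it is not documented here. As a purely hypothetical example, one could derive it from the output path, following the .tsv and .osw extensions that OpenSwathWorkflow uses for -out_tsv and -out_osw. The enum and helpers below are illustrative stand-ins, not the OpenMSToffee types.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the FileFormat type used by FileArguments.
enum class FileFormat { TSV, SQLite };

// True if `path` ends with `suffix`.
bool endsWith(const std::string &path, const std::string &suffix) {
    return path.size() >= suffix.size() &&
           path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: pick the output format from the file extension,
// mirroring OpenSwathWorkflow's .tsv (-out_tsv) and .osw (-out_osw) outputs.
FileFormat formatFromOutputPath(const std::string &outputFilePath) {
    assert(!outputFilePath.empty());
    if (endsWith(outputFilePath, ".tsv")) {
        return FileFormat::TSV;
    }
    if (endsWith(outputFilePath, ".osw")) {
        return FileFormat::SQLite;
    }
    throw std::invalid_argument("unsupported output extension: " + outputFilePath);
}
```

Throwing on an unknown extension, rather than silently defaulting to one format, surfaces a mistyped output path before any processing starts.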
Extracting data from mzML and mzXML files
- class HDF5ChromatogramConsumer : public IMSDataConsumer
Public Types
- using MapType = OpenMS::PeakMap
Public Functions
- HDF5ChromatogramConsumer(const std::string &h5FilePath)
- ~HDF5ChromatogramConsumer()
- void consumeSpectrum(SpectrumType &s)
- void consumeChromatogram(ChromatogramType &c)
- void setExpectedSize(size_t expectedSpectra, size_t expectedChromatograms)
- void setExperimentalSettings(const OpenMS::ExperimentalSettings &exp)
Public Static Functions
- void chromatogramToHDF5(const std::string &mzMLFilePath, const std::string &h5FilePath)
Save an mzML chromatogram file to a much more compressed (>100x) HDF5 file. The data is saved as a series of 1D vectors:
- names: the transition ids of the chromatograms
- offset: the index into the retentionTime and intensity vectors for the corresponding transition id
- size: the number of data points in the retentionTime and intensity vectors for the corresponding transition id, i.e. its data resides at [offset, offset + size)
- retentionTime: all retention time data concatenated into a single vector
- intensity: all intensity data concatenated into a single vector
- Parameters
mzMLFilePath: path to the mzML file generated by OpenSwath
h5FilePath: output file path
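To make the layout of chromatogramToHDF5 concrete, the following in-memory sketch packs chromatograms into the five vectors and reads one back using [offset, offset + size). It performs no HDF5 I/O, and the struct and method names are hypothetical rather than part of HDF5ChromatogramConsumer.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// In-memory mirror of the five 1D vectors (hypothetical, no HDF5 involved).
struct ChromatogramColumns {
    std::vector<std::string> names;     // transition ids
    std::vector<std::size_t> offset;    // start index into retentionTime/intensity
    std::vector<std::size_t> size;      // number of points per transition
    std::vector<double> retentionTime;  // all RT values, concatenated
    std::vector<double> intensity;      // all intensity values, concatenated

    // Append one chromatogram; its data will occupy [offset, offset + size).
    void append(const std::string &name, const std::vector<double> &rt,
                const std::vector<double> &inten) {
        assert(rt.size() == inten.size());
        names.push_back(name);
        offset.push_back(retentionTime.size());
        size.push_back(rt.size());
        retentionTime.insert(retentionTime.end(), rt.begin(), rt.end());
        intensity.insert(intensity.end(), inten.begin(), inten.end());
    }

    // Index of a transition id, found by linear search over `names`.
    std::size_t indexOf(const std::string &name) const {
        for (std::size_t i = 0; i < names.size(); ++i) {
            if (names[i] == name) {
                return i;
            }
        }
        throw std::out_of_range("unknown transition id: " + name);
    }

    // Copy the [offset, offset + size) slice of `column` for transition i.
    std::vector<double> slice(const std::vector<double> &column, std::size_t i) const {
        auto first = column.begin() + static_cast<std::ptrdiff_t>(offset[i]);
        return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(size[i]));
    }
};
```

Because each transition's data is contiguous, a reader needs only names, offset and size to slice one chromatogram out of the concatenated vectors without touching the others; for large libraries the linear search in indexOf could be replaced by a hash map.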
-
using
- class RTNormalisation
Calculate the world-to-iRT normalisation transformation using the raw SWATH-MS data contained in a toffee file and a list of precursor and product ions that can be used for alignment.
Public Functions
- RTNormalisation(const std::string &toffeeFilePath, const std::string &alignmentTSVFilePath)
- Parameters
toffeeFilePath: the toffee file for which we wish to calculate the world-to-iRT normalisation
alignmentTSVFilePath: the path to a TSV file of retention time alignment precursor and product ions
- void updateNormalisationParams(const OpenMS::Param &param)
Update the default parameters passed to the normalisation algorithms.
- void updateMzCorrectionFunction(const std::string &mzCorrectionFunction)
Update the default method for correcting the mass-to-charge ratio (m/z).
- void updateMinRSquared(double minRSquared)
Update the default minimum R-squared value of the regression fit.
- void updateMinCoverage(double minCoverage)
Update the default minimum coverage of the regression fit.
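updateMinRSquared() and updateMinCoverage() presumably correspond to the -min_rsq and -min_coverage options of OpenSwathWorkflow (defaults 0.95 and 0.6), and the default RTNormalization:outlierMethod, 'iter_residual', removes the data point with the largest residual one at a time. A minimal sketch of how the two limits could interact under a linear alignment: keep dropping the worst point until r-squared reaches minRSquared, and fail once fewer than minCoverage of the original points remain. The names are hypothetical and this is not the OpenMS implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Point {
    double worldRt;  // RT observed in this run
    double irt;      // library iRT
};

struct Fit {
    double slope, intercept, rSquared;
};

// Ordinary least-squares fit of irt = slope * worldRt + intercept.
Fit fitLinear(const std::vector<Point> &pts) {
    const double n = static_cast<double>(pts.size());
    double mx = 0.0, my = 0.0;
    for (const Point &p : pts) { mx += p.worldRt; my += p.irt; }
    mx /= n;
    my /= n;
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const Point &p : pts) {
        const double dx = p.worldRt - mx, dy = p.irt - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    assert(sxx > 0.0);  // all points at one RT cannot define a slope
    const double slope = sxy / sxx;
    // For simple linear regression, r^2 is the squared Pearson correlation.
    const double r2 = syy > 0.0 ? (sxy * sxy) / (sxx * syy) : 1.0;
    return {slope, my - slope * mx, r2};
}

// Drop the largest-residual point until r^2 >= minRSquared; throw if fewer
// than minCoverage * (initial count) points would be left for the fit.
std::vector<Point> removeOutliersIterResidual(std::vector<Point> pts, double minRSquared,
                                              double minCoverage) {
    const double initial = static_cast<double>(pts.size());
    while (true) {
        if (pts.size() < 2 || static_cast<double>(pts.size()) < minCoverage * initial) {
            throw std::runtime_error("coverage fell below minCoverage before r^2 reached minRSquared");
        }
        const Fit f = fitLinear(pts);
        if (f.rSquared >= minRSquared) {
            return pts;
        }
        // Find and remove the single point with the largest absolute residual.
        std::size_t worst = 0;
        double worstResidual = -1.0;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            const double residual = std::fabs(pts[i].irt - (f.slope * pts[i].worldRt + f.intercept));
            if (residual > worstResidual) {
                worstResidual = residual;
                worst = i;
            }
        }
        pts.erase(pts.begin() + static_cast<std::ptrdiff_t>(worst));
    }
}
```

Checking coverage before every fit means a badly misaligned run fails loudly instead of being "rescued" by a regression through a handful of surviving peptides.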
- void updateChromExtractParams(const OpenMS::ChromExtractParams &param)
Update the chromatogram extraction configuration.
- void updateFeatureFinderParams(const OpenMS::Param &param)
Update the feature finding configuration.
- OpenMS::TransformationDescription calculateNormalisation() const
Calculate the world-to-iRT normalisation.
- OpenMS::TransformationDescription calculateNormalisationFromFile(const std::string &inputTrafoPath) const
Calculate the world-to-iRT normalisation from the trafoXML file at inputTrafoPath.
- void saveToFile(const OpenMS::TransformationDescription &transform, const std::string &trafoXMLFilePath) const
Save a previously calculated iRT normalisation to file.
- Warning
- This function does not check whether the transformation maps world to iRT coordinates or the inverse; it relies on the user to take care.
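Since saveToFile() cannot tell a world-to-iRT transformation from its inverse, one cheap safeguard is to keep both directions explicit and check a round trip before saving. For a linear map y = a x + b, the inverse is x = (y - b) / a. The struct below is a hypothetical stand-in for a linear transformation, not OpenMS::TransformationDescription.

```cpp
#include <cassert>
#include <stdexcept>

// A linear RT map y = slope * x + intercept (hypothetical stand-in).
struct LinearRtMap {
    double slope;
    double intercept;

    double apply(double x) const { return slope * x + intercept; }

    // Inverse map x = (y - intercept) / slope, e.g. iRT -> world RT.
    LinearRtMap inverse() const {
        if (slope == 0.0) {
            throw std::domain_error("a constant map has no inverse");
        }
        return {1.0 / slope, -intercept / slope};
    }
};
```

A round trip such as irtToWorld.apply(worldToIrt.apply(rt)) should return rt; if a map meant to be world-to-iRT instead sends typical run times far outside the library's iRT range, it is probably the inverse.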
- void calculateAndSaveToFile(const std::string &trafoXMLFilePath) const
Calculate the world-to-iRT normalisation and save it to file.
Public Static Functions
- OpenMS::Param defaultNormalisationParams()
Default parameters passed to the normalisation algorithms. These can be changed with updateNormalisationParams().
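Assuming the default normalisation parameters mirror the RTNormalization: options of OpenSwathWorkflow listed in the help output (this page does not confirm it), they include NrRTBins, MinPeptidesPerBin and MinBinsFilled (defaults 10, 1 and 8). A sketch of such a coverage check, assuming equal-width bins over a given RT range; which range OpenMS actually bins over is not stated here, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Split [rtMin, rtMax] into nrBins equal bins, count anchor peptides per bin,
// and report whether at least minBinsFilled bins hold minPeptidesPerBin.
bool rtCoverageSufficient(const std::vector<double> &rts, double rtMin, double rtMax,
                          std::size_t nrBins, std::size_t minPeptidesPerBin,
                          std::size_t minBinsFilled) {
    assert(nrBins > 0 && rtMax > rtMin);
    std::vector<std::size_t> counts(nrBins, 0);
    const double width = (rtMax - rtMin) / static_cast<double>(nrBins);
    for (double rt : rts) {
        if (rt < rtMin || rt > rtMax) {
            continue;  // outside the binned range
        }
        auto bin = static_cast<std::size_t>((rt - rtMin) / width);
        bin = std::min(bin, nrBins - 1);  // rt == rtMax belongs to the last bin
        ++counts[bin];
    }
    const auto filled = std::count_if(counts.begin(), counts.end(),
                                      [&](std::size_t c) { return c >= minPeptidesPerBin; });
    return static_cast<std::size_t>(filled) >= minBinsFilled;
}
```

With the defaults, anchor peptides confined to the first 70% of the gradient fill only 7 of 10 bins and fail the check, which is the "only part of the RT gradient is covered" case the NrRTBins help text describes.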
- static std::string defaultMzCorrectionFunction()
Default method for correcting the mass-to-charge ratio (m/z). This can be changed with updateMzCorrectionFunction().
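The -mz_correction_function option in the help output offers, among others, 'unweighted_regression' and 'weighted_regression' (a linear correction "weighted by intensity"). As a hedged illustration of the weighted case, the sketch fits theoretical m/z against observed m/z for reference fragment ions, using intensities as weights, and applies the fitted line to observed values. The fit direction and all names are assumptions; this is not the OpenMS implementation.

```cpp
#include <cassert>
#include <vector>

// One reference fragment ion observation used for mass correction.
struct MzObservation {
    double observedMz;
    double theoreticalMz;
    double intensity;  // used as the regression weight
};

struct MzCorrection {
    double slope;
    double intercept;
    // Map an observed m/z onto the corrected scale.
    double apply(double mz) const { return slope * mz + intercept; }
};

// Weighted least-squares fit theoreticalMz = slope * observedMz + intercept,
// with each observation weighted by its intensity.
MzCorrection fitWeightedMzCorrection(const std::vector<MzObservation> &obs) {
    double sw = 0.0, mx = 0.0, my = 0.0;
    for (const auto &o : obs) {
        sw += o.intensity;
        mx += o.intensity * o.observedMz;
        my += o.intensity * o.theoreticalMz;
    }
    assert(sw > 0.0);
    mx /= sw;  // weighted mean of observed m/z
    my /= sw;  // weighted mean of theoretical m/z
    double sxx = 0.0, sxy = 0.0;
    for (const auto &o : obs) {
        const double dx = o.observedMz - mx;
        sxx += o.intensity * dx * dx;
        sxy += o.intensity * dx * (o.theoreticalMz - my);
    }
    assert(sxx > 0.0);  // need at least two distinct observed m/z values
    const double slope = sxy / sxx;
    return {slope, my - slope * mx};
}
```

Weighting by intensity lets abundant, well-measured fragments dominate the fit, so a few weak, noisy ions shift the correction less than in the unweighted variant.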
- static double defaultMinRSquared()
Default minimum R-squared value of the regression fit. This can be changed with updateMinRSquared().
- static double defaultMinCoverage()
Default minimum coverage of the regression fit. This can be changed with updateMinCoverage().
-