QGIS/python/plugins/processing/algs/otb/description/5.0.0/doc/TrainImagesClassifier-gbt.html
<html><head>
<style type="text/css">
dl { border: 3px double #ccc; padding: 0.5em; } dt { float: left; clear: left; text-align: left; font-weight: bold; color: green; } dt:after { content: ":"; } dd { margin: 0 0 0 220px; padding: 0 0 0.5em 0; }
</style>
</head><body><h1>TrainImagesClassifier</h1><h2>Brief Description</h2>Train a classifier from multiple pairs of images and training vector data.<h2>Tags</h2>Learning<h2>Long Description</h2>This application trains a classifier from multiple pairs of input images and training vector data. Samples are composed of pixel values in each band, optionally centered and reduced using an XML statistics file produced by the ComputeImagesStatistics application.
The training vector data must contain polygons with a positive integer field representing the class label. The name of this field can be set using the "Class label field" parameter. Training and validation sample lists are built so that each class is equally represented in both lists. A dedicated parameter controls the ratio between the number of samples in the training and validation sets, and two further parameters manage the size of the training and validation sets per class and per image.
Several classifier parameters can be set depending on the chosen classifier. In the validation process, the confusion matrix is organized as follows: rows = reference labels, columns = produced labels. In the header of the optional confusion matrix output file, the validation (reference) and predicted (produced) class labels are ordered according to the rows/columns of the confusion matrix.
This application is based on LibSVM and on the OpenCV Machine Learning classifiers, and is compatible with OpenCV 2.3.1 and later.
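The workflow described above (statistics estimation followed by training) can also be scripted. The snippet below is a minimal sketch, assuming the otbApplication Python bindings shipped with OTB are available; the input file names are taken from the example at the bottom of this page, the output names are placeholders, and the classifier is set to gbt since this page documents that variant.
<pre>
import otbApplication

# Optional first step: per-band statistics used to center/reduce the samples
stats = otbApplication.Registry.CreateApplication("ComputeImagesStatistics")
stats.SetParameterStringList("il", ["QB_1_ortho.tif"])
stats.SetParameterString("out", "EstimateImageStatisticsQB1.xml")
stats.ExecuteAndWriteOutput()

# Train a classifier from the image / training vector data pair
train = otbApplication.Registry.CreateApplication("TrainImagesClassifier")
train.SetParameterStringList("io.il", ["QB_1_ortho.tif"])
train.SetParameterStringList("io.vd", ["VectorData_QB1.shp"])
train.SetParameterString("io.imstat", "EstimateImageStatisticsQB1.xml")
train.SetParameterString("sample.vfn", "Class")  # class label field in the vector data
train.SetParameterInt("sample.mt", 100)          # max training samples per class
train.SetParameterInt("sample.mv", 100)          # max validation samples per class
train.SetParameterFloat("sample.vtr", 0.5)       # training/validation ratio
train.SetParameterString("classifier", "gbt")    # Gradient Boosted Trees
train.SetParameterString("io.out", "gbtModelQB1.txt")                   # placeholder output name
train.SetParameterString("io.confmatout", "gbtConfusionMatrixQB1.csv")  # placeholder output name
train.ExecuteAndWriteOutput()
</pre>
The classifier-specific keys listed below can be set in the same way; a configuration sketch for the gbt group follows the parameter list.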
<h2>Parameters</h2><ul>
<li><b>[param] -io</b> &lt;string&gt; This group of parameters allows setting input and output data. Mandatory: True. Default Value: &quot;0&quot;</li>
<li><b>[param] -elev</b> &lt;string&gt; This group of parameters allows managing elevation values. Supported formats are SRTM, DTED or any GeoTIFF. The DownloadSRTMTiles application can be a useful tool to list/download tiles related to a product. Mandatory: True. Default Value: &quot;0&quot;</li>
<li><b>[param] -sample</b> &lt;string&gt; This group of parameters allows setting the training and validation sample list parameters. Mandatory: True. Default Value: &quot;0&quot;</li>
<li><b>[param] -rand</b> &lt;int32&gt; Set a specific random seed with an integer value. Mandatory: False. Default Value: &quot;0&quot;</li>
<li><b>[param] -inxml</b> &lt;string&gt; Load the OTB application from an XML file. Mandatory: False. Default Value: &quot;&quot;</li>
<li><b>[param] -outxml</b> &lt;string&gt; Save the OTB application to an XML file. Mandatory: False. Default Value: &quot;&quot;</li>
<b>[choice] -classifier</b> Choice of the classifier to use for the training. Available choices: libsvm, svm, boost, dt, gbt, ann, bayes, rf, knn. Mandatory: True. Default Value: &quot;libsvm&quot;<ul>
<li><b>[group] -libsvm</b></li><ul>
<li><b>[param] -classifier.libsvm.k</b> &lt;string&gt; SVM kernel type. Mandatory: True. Default Value: &quot;linear&quot;</li>
<li><b>[param] -classifier.libsvm.c</b> &lt;float&gt; SVM models have a cost parameter C (1 by default) to control the trade-off between training errors and forcing rigid margins. Mandatory: True. Default Value: &quot;1&quot;</li>
<li><b>[param] -classifier.libsvm.opt</b> &lt;boolean&gt; SVM parameters optimization flag. Mandatory: False. Default Value: &quot;True&quot;</li>
</ul><li><b>[group] -svm</b></li><ul>
<li><b>[param] -classifier.svm.m</b> &lt;string&gt; Type of SVM formulation. Mandatory: True. Default Value: &quot;csvc&quot;</li>
<li><b>[param] -classifier.svm.k</b> &lt;string&gt; SVM kernel type. Mandatory: True. Default Value: &quot;linear&quot;</li>
<li><b>[param] -classifier.svm.c</b> &lt;float&gt; SVM models have a cost parameter C (1 by default) to control the trade-off between training errors and forcing rigid margins. Mandatory: True. Default Value: &quot;1&quot;</li>
<li><b>[param] -classifier.svm.nu</b> &lt;float&gt; Parameter nu of an SVM optimization problem. Mandatory: True. Default Value: &quot;0&quot;</li>
<li><b>[param] -classifier.svm.coef0</b> &lt;float&gt; Parameter coef0 of a kernel function (POLY / SIGMOID). Mandatory: True. Default Value: &quot;0&quot;</li>
<li><b>[param] -classifier.svm.gamma</b> &lt;float&gt; Parameter gamma of a kernel function (POLY / RBF / SIGMOID). Mandatory: True. Default Value: &quot;1&quot;</li>
<li><b>[param] -classifier.svm.degree</b> &lt;float&gt; Parameter degree of a kernel function (POLY). Mandatory: True. Default Value: &quot;1&quot;</li>
<li><b>[param] -classifier.svm.opt</b> &lt;boolean&gt; SVM parameters optimization flag.
-If set to True, then the optimal SVM parameters will be estimated. Parameters are considered optimal by OpenCV when the cross-validation estimate of the test set error is minimal. Finally, the SVM training process is run 10 times with these optimal parameters over subsets corresponding to 1/10th of the training samples, using k-fold cross-validation (with k = 10).
-If set to False, the SVM classification process will be run once with the currently set input SVM parameters over the training samples.
-Thus, even with identical input SVM parameters and the same random seed, the output SVM models will differ depending on the method used (optimized or not), because the samples are not identically processed within OpenCV. Mandatory: False. Default Value: &quot;True&quot;</li>
</ul><li><b>[group] -boost</b></li><ul>
<li><b>[param] -classifier.boost.t</b> &lt;string&gt; Type of Boosting algorithm. Mandatory: True. Default Value: &quot;real&quot;</li>
<li><b>[param] -classifier.boost.w</b> &lt;int32&gt; The number of weak classifiers. Mandatory: True. Default Value: &quot;100&quot;</li>
<li><b>[param] -classifier.boost.r</b> &lt;float&gt; A threshold between 0 and 1 used to save computational time. Samples with summary weight &lt;= (1 - weight_trim_rate) do not participate in the next iteration of training. Set this parameter to 0 to turn off this functionality. Mandatory: True. Default Value: &quot;0.95&quot;</li>
<li><b>[param] -classifier.boost.m</b> &lt;int32&gt; Maximum depth of the tree. Mandatory: True. Default Value: &quot;1&quot;</li>
</ul><li><b>[group] -dt</b></li><ul>
<li><b>[param] -classifier.dt.max</b> &lt;int32&gt; The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned. Mandatory: True. Default Value: &quot;65535&quot;</li>
<li><b>[param] -classifier.dt.min</b> &lt;int32&gt; If the number of samples in a node is smaller than this parameter, then the node will not be split. Mandatory: True. Default Value: &quot;10&quot;</li>
<li><b>[param] -classifier.dt.ra</b> &lt;float&gt; If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split. Mandatory: True. Default Value: &quot;0.01&quot;</li>
<li><b>[param] -classifier.dt.cat</b> &lt;int32&gt; Cluster possible values of a categorical variable into K &lt;= cat clusters to find a suboptimal split. Mandatory: True. Default Value: &quot;10&quot;</li>
<li><b>[param] -classifier.dt.f</b> &lt;int32&gt; If cv_folds &gt; 1, then the tree is pruned with K-fold cross-validation, where K is equal to cv_folds. Mandatory: True. Default Value: &quot;10&quot;</li>
<li><b>[param] -classifier.dt.r</b> &lt;boolean&gt; If true, then pruning will be harsher. This makes the tree more compact and more resistant to training data noise, but a bit less accurate. Mandatory: False. Default Value: &quot;True&quot;</li>
<li><b>[param] -classifier.dt.t</b> &lt;boolean&gt; If true, then pruned branches are physically removed from the tree. Mandatory: False. Default Value: &quot;True&quot;</li>
</ul><li><b>[group] -gbt</b></li><ul>
<li><b>[param] -classifier.gbt.w</b> &lt;int32&gt; Number "w" of boosting algorithm iterations, with w*K being the total number of trees in the GBT model, where K is the number of output classes (see the configuration sketch after this parameter list). Mandatory: True. Default Value: &quot;200&quot;</li>
<li><b>[param] -classifier.gbt.s</b> &lt;float&gt; Regularization parameter. Mandatory: True. Default Value: &quot;0.01&quot;</li>
<li><b>[param] -classifier.gbt.p</b> &lt;float&gt; Portion of the whole training set used for each algorithm iteration. The subset is generated randomly. Mandatory: True. Default Value: &quot;0.8&quot;</li>
<li><b>[param] -classifier.gbt.max</b> &lt;int32&gt; The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned. Mandatory: True. Default Value: &quot;3&quot;</li>
</ul><li><b>[group] -ann</b></li><ul>
<li><b>[param] -classifier.ann.t</b> &lt;string&gt; Type of training method for the multilayer perceptron (MLP) neural network. Mandatory: True. Default Value: &quot;reg&quot;</li>
<li><b>[param] -classifier.ann.sizes</b> &lt;string&gt; The number of neurons in each intermediate layer (excluding input and output layers). Mandatory: True. Default Value: &quot;&quot;</li>
<li><b>[param] -classifier.ann.f</b> &lt;string&gt; Neuron activation function. Mandatory: True. Default Value: &quot;sig&quot;</li>
<li><b>[param] -classifier.ann.a</b> &lt;float&gt; Alpha parameter of the activation function (used only with the sigmoid and Gaussian functions). Mandatory: True. Default Value: &quot;1&quot;</li>
<li><b>[param] -classifier.ann.b</b> &lt;float&gt; Beta parameter of the activation function (used only with the sigmoid and Gaussian functions). Mandatory: True. Default Value: &quot;1&quot;</li>
<li><b>[param] -classifier.ann.bpdw</b> &lt;float&gt; Strength of the weight gradient term in the BACKPROP method. The recommended value is about 0.1. Mandatory: True. Default Value: &quot;0.1&quot;</li>
<li><b>[param] -classifier.ann.bpms</b> &lt;float&gt; Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. It can vary from 0 (the feature is disabled) to 1 and beyond. A value of about 0.1 is usually sufficient. Mandatory: True. Default Value: &quot;0.1&quot;</li>
<li><b>[param] -classifier.ann.rdw</b> &lt;float&gt; Initial value Delta_0 of the update-values Delta_{ij} in the RPROP method (default = 0.1). Mandatory: True. Default Value: &quot;0.1&quot;</li>
<li><b>[param] -classifier.ann.rdwm</b> &lt;float&gt; Lower limit Delta_{min} of the update-values in the RPROP method. It must be positive (default = 1e-7). Mandatory: True. Default Value: &quot;1e-07&quot;</li>
<li><b>[param] -classifier.ann.term</b> &lt;string&gt; Termination criteria. Mandatory: True. Default Value: &quot;all&quot;</li>
<li><b>[param] -classifier.ann.eps</b> &lt;float&gt; Epsilon value used in the termination criteria. Mandatory: True. Default Value: &quot;0.01&quot;</li>
<li><b>[param] -classifier.ann.iter</b> &lt;int32&gt; Maximum number of iterations used in the termination criteria. Mandatory: True. Default Value: &quot;1000&quot;</li>
</ul><li><b>[group] -bayes</b></li><ul></ul><li><b>[group] -rf</b></li><ul>
<li><b>[param] -classifier.rf.max</b> &lt;int32&gt; The depth of the tree. A low value will likely underfit, and conversely a high value will likely overfit. The optimal value can be obtained using cross-validation or other suitable methods. Mandatory: True. Default Value: &quot;5&quot;</li>
<li><b>[param] -classifier.rf.min</b> &lt;int32&gt; If the number of samples in a node is smaller than this parameter, then the node will not be split. A reasonable value is a small percentage of the total data, e.g. 1 percent. Mandatory: True. Default Value: &quot;10&quot;</li>
<li><b>[param] -classifier.rf.ra</b> &lt;float&gt; If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split. Mandatory: True. Default Value: &quot;0&quot;</li>
<li><b>[param] -classifier.rf.cat</b> &lt;int32&gt; Cluster possible values of a categorical variable into K &lt;= cat clusters to find a suboptimal split. Mandatory: True. Default Value: &quot;10&quot;</li>
<li><b>[param] -classifier.rf.var</b> &lt;int32&gt; The size of the subset of features, randomly selected at each tree node, that are used to find the best split(s). If you set it to 0, then the size will be set to the square root of the total number of features. Mandatory: True. Default Value: &quot;0&quot;</li>
<li><b>[param] -classifier.rf.nbtrees</b> &lt;int32&gt; The maximum number of trees in the forest. Typically, the more trees you have, the better the accuracy. However, the improvement in accuracy generally diminishes and reaches an asymptote for a certain number of trees. Keep in mind that increasing the number of trees increases the prediction time linearly. Mandatory: True. Default Value: &quot;100&quot;</li>
<li><b>[param] -classifier.rf.acc</b> &lt;float&gt; Sufficient accuracy (OOB error). Mandatory: True. Default Value: &quot;0.01&quot;</li>
</ul><li><b>[group] -knn</b></li><ul>
<li><b>[param] -classifier.knn.k</b> &lt;int32&gt; The number of neighbors to use. Mandatory: True. Default Value: &quot;32&quot;</li>
</ul></ul></ul>
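For the Gradient Boosted Trees (gbt) variant documented on this page, the classifier-specific keys are set like any other parameter. The snippet below is a minimal sketch that continues the usage example above (it assumes the same otbApplication bindings and the train handle created there); the values shown are simply the defaults listed in the parameter list.
<pre>
# Gradient Boosted Trees: the model contains w*K trees in total,
# where K is the number of output classes.
train.SetParameterString("classifier", "gbt")
train.SetParameterInt("classifier.gbt.w", 200)     # boosting iterations (w)
train.SetParameterFloat("classifier.gbt.s", 0.01)  # regularization (shrinkage)
train.SetParameterFloat("classifier.gbt.p", 0.8)   # portion of the training set used per iteration
train.SetParameterInt("classifier.gbt.max", 3)     # maximum tree depth
</pre>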
<h2>Limitations</h2>None<h2>Authors</h2>OTB-Team<h2>See Also</h2>OpenCV documentation for machine learning: http://docs.opencv.org/modules/ml/doc/ml.html<h2>Example of use</h2>
<ul>
<li>io.il: QB_1_ortho.tif</li>
<li>io.vd: VectorData_QB1.shp</li>
<li>io.imstat: EstimateImageStatisticsQB1.xml</li>
<li>sample.mv: 100</li>
<li>sample.mt: 100</li>
<li>sample.vtr: 0.5</li>
<li>sample.edg: false</li>
<li>sample.vfn: Class</li>
<li>classifier: libsvm</li>
<li>classifier.libsvm.k: linear</li>
<li>classifier.libsvm.c: 1</li>
<li>classifier.libsvm.opt: false</li>
<li>io.out: svmModelQB1.txt</li>
<li>io.confmatout: svmConfusionMatrixQB1.csv</li>
</ul>
</body></html>