wiki/howto/windestimation.md
... ...
@@ -13,17 +13,17 @@ To complete the training process successfully, you need to make sure that you ha
13 13
14 14
## Model training process
15 15
1. Run ``com.sap.sailing.windestimation.model.SimpleModelsTrainingPart1`` as a normal Java Application. After this, all the necessary maneuver and wind data will be downloaded, pre-processed and maneuver classifiers get trained.
16
-2. Make sure that the launched program does not get termined by an uncaught exception. Wait until graphical info dialog shows up which requests you to perform data cleansing for duration dimension.
16
+2. Make sure that the launched program does not get terminated by an uncaught exception. Wait until a graphical info dialog shows up requesting you to perform data cleansing for the duration dimension.
17 17
![Screenshot of graphical info dialog requesting to perform data cleansing for duration dimension](../images/windestimation/dialogRequestingDataCleansingForDurationDimension.jpg "Screenshot of graphical info dialog requesting to perform data cleansing for duration dimension")
18
- Press OK. Afterwards, a graphical window must open with two charts. The top chart is an XY-chart where the x-axis represents **seconds** and the y-axis represents various TWD delta-based measures (e.g. standard deviation or mean). Below the XY-chart, a histogram for the data points of the XY-Chart is provided. You can zoom-in and zoom-out in each of the charts by mouse dragging. Be aware that currently, the zoom level of both charts is not synchronizing.
19
- ![Screenshot of graphical wind data visualization tool for duration dimension](https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png "Screenshot of graphical wind data visualization tool for duration dimension")
20
-3. Open your graphical MongoDB client and connect to ``windEstimation`` database hosted within your local MongoDB. Open the collection with name ``aggregatedDurationTwdTransition``. Within the collection, you will see all the instances/data points visualized in the previous step. The attribute used for the x-axis is ``value``.
21
- ![Screenshot of MongoDB Compass with opened aggregatedDurationTwdTransition collection](https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png "Screenshot of MongoDB Compass with opened aggregatedDurationTwdTransition collection")
18
+ Press OK. Afterwards, a graphical window should open with two charts. The top chart is an XY-chart where the x-axis represents **seconds** and the y-axis represents various TWD delta-based measures (e.g. standard deviation or mean). Below the XY-chart, a histogram of the data points of the XY-chart is provided. You can zoom in and out of each chart by dragging the mouse. Be aware that the zoom levels of the two charts are currently not synchronized.
19
+ ![Screenshot of graphical wind data visualization tool for duration dimension](../images/windestimation/aggregatedDurationBasedTwdDeltaTransitionBeforeDataCleansing.jpg "Screenshot of duration-based TWD delta visualization tool before data cleansing")
20
+3. Open your graphical MongoDB client and connect to the ``windEstimation`` database hosted within your local MongoDB. Open the collection named ``aggregatedDurationTwdTransition``. Within the collection, you will see all the instances/data points visualized in the previous step. The attribute used for the x-axis is ``value``. The other attributes are the corresponding metrics plotted on the y-axis. ``std`` represents the standard deviation (``Sigma`` curve in the XY-chart) and ``std0`` represents the standard deviation with zero as mean value (``Zero mean sigma`` curve in the XY-chart).
21
+ ![Screenshot of MongoDB Compass with opened aggregatedDurationTwdTransition collection](../images/windestimation/mongoDbCompassWithOpenedAggregatedDurationTwdTransitionCollection.jpg "Screenshot of MongoDB Compass with opened aggregatedDurationTwdTransition collection")
22 22
4. Delete all the instances within the collection which do not make sense. For this, use the data visualization tool from step 2 to identify such instances. Some of the instances are not representative due to the small number of supporting instances which is visualized in the histogram. Such instances can produce unreasonable bumps in the XY-chart. The desired output of this step is that the curve ``Zero mean sigma`` looks smooth and always growing, e.g. as depicted below:
23
- ![Screenshot of graphical visualization tool of duration dimension with after data cleansing](https://github.com/adam-p/markdown-here/raw/master/src/common/images/icon48.png "Screenshot of graphical visualization tool of duration dimension with after data cleansing")
23
+ ![Screenshot of graphical visualization tool of duration dimension after data cleansing](../images/windestimation/aggregatedDurationBasedTwdDeltaTransitionAfterDataCleansing.jpg "Screenshot of duration-based TWD delta visualization tool after data cleansing")
24 24
Use the ``Refresh charts`` button as often as needed to update the charts with the modified data in MongoDB. Close the graphical visualization tool window after you are done with data cleansing to resume the training process. A confirmation dialog shows up. Confirm it by pressing *"Continue with model training"* button.
25 25
![Screenshot of confirmation dialog for finishing the data cleansing](../images/windestimation/confirmationDialogAfterDataCleansingDurationDimension.jpg "Screenshot of confirmation dialog for finishing the data cleansing")
26
-5. A new information dialog shows up (**do not press OK yet!**) requesting you to open the source code of the class ``com.sap.sailing.windestimation.model.regressor.twdtransition.DurationBasedTwdTransitionRegressorModelContext``. Open it and scroll down to the definition of the inner enum ``DurationValueRange``. The enum defines the intervals for which a separate regressor model will be trained. Read the Javadoc of ``DurationValueRange`` and adjust the intervals accordingly in order to allow the regressor model to learn the ``Zero mean sigma`` curve with minimal error. You can also configure the polynomial which will be used for regressor training. Make sure that there are at least 2 data points contained within each configured interval. The data point with x = 0, y = 0 will be created automatically within model training procedure. Press OK in information dialog after you are done.
26
+5. A new information dialog shows up requesting you to open the source code of the class ``com.sap.sailing.windestimation.model.regressor.twdtransition.DurationBasedTwdTransitionRegressorModelContext``. Open it and scroll down to the definition of the inner enum ``DurationValueRange``. The enum defines the intervals, for each of which a separate regressor model will be trained. Read the Javadoc of ``DurationValueRange`` and adjust the intervals so that the regressor model can learn the ``Zero mean sigma`` curve with minimal error. You can also configure the polynomial which will be used as the regressor function. Make sure that each configured interval contains at least 2 data points. The data point with x = 0, y = 0 will be created automatically within the model training procedure. Press OK in the information dialog after you are done.
27 27
6. A graphical info dialog shows up which requests you to perform data cleansing for the *distance* dimension. Press OK. All steps for data cleansing for the distance dimension are analogous to the steps of the duration dimension described from step 2. until step 5. Thus, consult these steps in order to complete the data cleansing for the distance dimension. The unit used for the distance representation is **meters**. The collection name required in step 3. is ``aggregatedDistanceTwdTransition``. The class required in step 5. is ``com.sap.sailing.windestimation.model.regressor.twdtransition.DistanceBasedTwdTransitionRegressorModelContext`` and its inner enum is ``DistanceValueRange``.
28 28
7. Run ``com.sap.sailing.windestimation.model.SimpleModelsTrainingPart2`` as a normal Java Application. Wait until the model training finishes and the program terminates normally. A new file with serialized representation of internal wind estimation models should be located in ``./windEstimationModels.dat``. The absolute path of the file must be printed in the console output of the program. You can upload the file via HTTP POST to http://sapsailing.com/windestimation/api/windestimation_data (see ``com.sap.sailing.windestimation.jaxrs.api.WindEstimationDataResource``) to update the wind estimation of a server instance. If you changed the source files of ``DurationValueRange`` or ``DistanceValueRange``, then you will need to update ``com.sap.sailing.windestimation`` bundle of the server instance which is meant to receive the new wind estimation models.
29
-8. Optionally, run ``com.sap.sailing.windestimation.evaluation.WindEstimatorManeuverNumberDependentEvaluationRunner`` as normal Java Application to evaluate the wind estimation with the new trained models. The evaluation score will be stored as CSV in ``./maneuverNumberDependentEvaluation.csv``.
... ...
\ No newline at end of file
0
+8. Optionally, run ``com.sap.sailing.windestimation.evaluation.WindEstimatorManeuverNumberDependentEvaluationRunner`` as a normal Java Application to evaluate the wind estimation with the newly trained models. The evaluation score will be stored as CSV in ``./maneuverNumberDependentEvaluation.csv``.
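
The visual check from step 4 (the ``Zero mean sigma`` curve should look smooth and always growing) can be supported by a small helper that lists the data points breaking the growth. The following is only a sketch and not part of the wind estimation code base: the class ``ZeroMeanSigmaCheck`` and the ``supportCount`` field are hypothetical names, and the points are assumed to have been copied by hand from the ``value`` and ``std0`` attributes of the collection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.sap.sailing.windestimation.
final class ZeroMeanSigmaCheck {

    /** One aggregated data point: x-axis value, zero mean sigma, and an assumed support count. */
    static final class Point {
        final double value;      // attribute "value" (x-axis)
        final double std0;       // attribute "std0" (Zero mean sigma curve)
        final long supportCount; // hypothetical name for the number of supporting instances

        Point(double value, double std0, long supportCount) {
            this.value = value;
            this.std0 = std0;
            this.supportCount = supportCount;
        }
    }

    /**
     * Returns the points whose std0 is lower than the highest std0 seen at a
     * smaller x value. The input must be sorted by value in ascending order.
     */
    static List<Point> findDips(List<Point> sortedByValue) {
        List<Point> dips = new ArrayList<>();
        double runningMax = Double.NEGATIVE_INFINITY;
        for (Point p : sortedByValue) {
            if (p.std0 < runningMax) {
                dips.add(p);          // breaks the "always growing" shape
            } else {
                runningMax = p.std0;  // new highest value so far
            }
        }
        return dips;
    }
}
```

Note that the helper compares against the running maximum, so a single unreasonably high bump causes the following, possibly valid, points to be reported. The reported points are therefore only candidates; the charts and the histogram remain the reference for deciding what to delete.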
... ...
\ No newline at end of file
wiki/howto/windestimationAdvanced.md
... ...
@@ -1,6 +1,6 @@
1 1
# Training of internal Wind Estimation models
2 2
3
-This document describes the generation process of Machine Learning (ML) models which are used internally by wind estimation. It is highly recommended to proceed this howto step by step considering the order of sections. At the end of this howto, you will generate a file containing the representation of internal models used by ``com.sap.sailing.windestimation`` bundle. You can use this file to update the wind estimation models of a running server instance. If you are interested in a simpler tutorial with less execution steps thanks to automation, then you might be interested in [Simple Guide for training of internal Wind Estimation models](./windestimation.md)
3
+This document describes the generation process of Machine Learning (ML) models which are used internally by wind estimation. It is highly recommended to work through this howto step by step, following the order of the sections. At the end of this howto, you will generate a file containing the representation of the internal models used by the ``com.sap.sailing.windestimation`` bundle. You can use this file to update the wind estimation models of a running server instance. If you are interested in a simpler tutorial with fewer execution steps thanks to automation, then check out [Simple Guide for training of internal Wind Estimation models](./windestimation.md).
4 4
5 5
## Overview
6 6
In total, there are the following three categories of ML models used by wind estimation:
... ...
@@ -22,7 +22,7 @@ Each of the ML model categories must be trained individually. The common workflo
22 22
2. Preprocess data
23 23
3. Train the model category
24 24
25
-For each of the steps, appropriate Java classes must be executed per *Run with...->Java Application*. All referenced classes are located in *com.sap.sailing.windestimation.lab* Java project. Each class execution must finish without termination due to uncaught exceptions before proceeding to next instruction. You can skip the training of a model category if you do not want to update the models on server of that category. After model training, A new file with serialized representation of internal wind estimation models should be located in ``./windEstimationModels.dat``, which is normally */path/to/workspace/com.sap.sailing.windestimation/trained_wind_estimation_models* if you start the training classes in Eclipse per *Run with...->Java Application*.
25
+For each of the steps, the appropriate Java classes must be executed via *Run with...->Java Application*. All referenced classes are located in the *com.sap.sailing.windestimation.lab* Java project. Each class execution must finish without termination due to uncaught exceptions before you proceed to the next instruction. You can skip the training of a model category if you do not want to update the models of that category on the server. After model training, a new file with a serialized representation of the internal wind estimation models should be created as ``./windEstimationModels.dat``, where the working directory is normally */path/to/workspace/com.sap.sailing.windestimation/trained_wind_estimation_models* if you start the training classes in Eclipse via *Run with...->Java Application*.
26 26
27 27
The details of the training process for each model category are described in the following sections.
28 28
... ...
@@ -39,17 +39,21 @@ The following steps import all the data required from sapsailing.com into the lo
39 39
2. Run *com.sap.sailing.windestimation.data.importer.PolarDataImporter*
40 40
41 41
## Maneuver classifiers training
42
-1. Run *com.sap.sailing.windestimation.model.classifier.maneuver.ManeuverClassifierTrainer*. Within the this step, the maneuver data is preprocessed and all maneuver classifiers are trained for each supported context.
42
+1. Run *com.sap.sailing.windestimation.model.classifier.maneuver.ManeuverClassifierTrainer*. Within this step, the maneuver data is preprocessed and all maneuver classifiers are trained for each supported context.
43 43
2. Optionally, run *com.sap.sailing.windestimation.model.classifier.maneuver.ManeuverClassifierScoring* to print the performance of the trained classifiers. After this step, a list with the macro-averaged F2-score of each trained classifier will be stored in *./maneuverClassifierScores.csv*.
44 44
45 45
## Duration-based TWD delta standard deviation regressor
46 46
47 47
1. Run *com.sap.sailing.windestimation.data.importer.DurationBasedTwdTransitionImporter*
48 48
2. Run *com.sap.sailing.windestimation.data.importer.AggregatedDurationBasedTwdTransitionImporter*
49
-3. Run *com.sap.sailing.windestimation.datavisualization.AggregatedDurationDimensionPlot* to visualize the wind data. A Swing-based GUI-Window must open with two charts, one XY-chart where the x-axis represents **seconds**, and the y-axis represents TWD delta-based series measures (e.g. standard deviation or mean). Below the chart, a histogram for data points of the XY-Chart is provided. You can zoom-in and zoom-out in each of the chart by mouse dragging. Be aware that currently the zoom level of both charts is not synchronized
50
-4. Open your graphical MongoDB client and connect to *windEstimation* database hosted by your local MongoDB. Open the collection with name *aggregatedDurationTwdTransition*. Within the collection you will see all the instances/data points visualized in the previous step. The total number of the points must not exceed 100.
51
-5. Delete all the instances within the collection which do not make sense. For this, use the data visualization tool from step 3 to identify such instances. Some of the instances are not representative due to the small number of supporting instances which is visualized in the histogram. Such instances can produce unreasonable bumps in the XY-chart. The desired output of this step is that the curve ``Zero mean sigma`` looks smooth and always growing. Use the ``Refresh charts`` button as often as needed to update the charts with the modified data in MongoDB.
52
-6. Open the source code of the class ``com.sap.sailing.windestimation.model.regressor.twdtransition.DurationBasedTwdTransitionRegressorModelContext``. Open it and scroll down to the definition of the inner enum ``DurationValueRange``. The enum defines the intervals for which a separate regressor model will be trained. Read the Javadoc of ``DurationValueRange`` and adjust the intervals accordingly in order to allow the regressor model to learn the ``Zero mean sigma`` curve with minimal error. You can also configure the polynomial which will be used for regressor training. Make sure that there are at least 2 data points contained within each configured interval. The data point with x = 0, y = 0 will be created automatically within model training procedure.
49
+3. Run *com.sap.sailing.windestimation.datavisualization.AggregatedDurationDimensionPlot* to visualize the wind data. A Swing-based GUI window should open containing two charts. The upper chart is an XY-chart where the x-axis represents **seconds**, and the y-axis represents TWD delta-based series measures (e.g. standard deviation or mean). Below the chart, a histogram of the data points of the XY-chart is provided. You can zoom in and out of each chart by dragging the mouse. Be aware that the zoom levels of the two charts are currently not synchronized.
50
+ ![Screenshot of graphical wind data visualization tool for duration dimension](../images/windestimation/aggregatedDurationBasedTwdDeltaTransitionBeforeDataCleansing.jpg "Screenshot of duration-based TWD delta visualization tool before data cleansing")
51
+4. Open your graphical MongoDB client and connect to the ``windEstimation`` database hosted within your local MongoDB. Open the collection named ``aggregatedDurationTwdTransition``. Within the collection, you will see all the instances/data points visualized in the previous step. The attribute used for the x-axis is ``value``. The other attributes are the corresponding metrics plotted on the y-axis. ``std`` represents the standard deviation (``Sigma`` curve in the XY-chart) and ``std0`` represents the standard deviation with zero as mean value (``Zero mean sigma`` curve in the XY-chart).
52
+ ![Screenshot of MongoDB Compass with opened aggregatedDurationTwdTransition collection](../images/windestimation/mongoDbCompassWithOpenedAggregatedDurationTwdTransitionCollection.jpg "Screenshot of MongoDB Compass with opened aggregatedDurationTwdTransition collection")
53
+5. Delete all the instances within the collection which do not make sense. For this, use the data visualization tool from step 3 to identify such instances. Some of the instances are not representative due to the small number of supporting instances which is visualized in the histogram. Such instances can produce unreasonable bumps in the XY-chart. The desired output of this step is that the curve ``Zero mean sigma`` looks smooth and always growing, e.g. as depicted below:
54
+ ![Screenshot of graphical visualization tool of duration dimension after data cleansing](../images/windestimation/aggregatedDurationBasedTwdDeltaTransitionAfterDataCleansing.jpg "Screenshot of duration-based TWD delta visualization tool after data cleansing")
55
+ Use the ``Refresh charts`` button as often as needed to update the charts with the modified data in MongoDB.
56
+6. Open the source code of the class ``com.sap.sailing.windestimation.model.regressor.twdtransition.DurationBasedTwdTransitionRegressorModelContext`` and scroll down to the definition of the inner enum ``DurationValueRange``. The enum defines the intervals for which a separate regressor model will be trained. Read the Javadoc of ``DurationValueRange`` and adjust the intervals accordingly in order to allow the regressor model to learn the ``Zero mean sigma`` curve with minimal error. You can also configure the polynomial which will be used as regressor function. Make sure that there are at least 2 data points contained within each configured interval. The data point with x = 0, y = 0 will be created automatically within the model training procedure.
53 57
7. Run *com.sap.sailing.windestimation.model.regressor.twdtransition.DurationBasedTwdTransitionStdRegressorTrainer*
54 58
8. Verify the trained regressor functions. They are printed in the console output of the previous step. For instance, you can visualize the polynomials by means of https://www.wolframalpha.com/
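
As an alternative to plotting, the polynomials printed in step 7 can also be checked numerically. The sketch below evaluates a polynomial with Horner's scheme and samples it over an interval to see whether it ever decreases. The class ``PolynomialCheck`` is hypothetical, and the coefficient order (constant term first) is an assumption which must be matched against the actual console output by hand.

```java
// Hypothetical helper, not part of com.sap.sailing.windestimation.
final class PolynomialCheck {

    /** Evaluates c[0] + c[1]*x + c[2]*x^2 + ... using Horner's scheme. */
    static double evaluate(double[] coefficients, double x) {
        double result = 0.0;
        for (int i = coefficients.length - 1; i >= 0; i--) {
            result = result * x + coefficients[i];
        }
        return result;
    }

    /**
     * Samples the polynomial at steps + 1 equidistant x values in [from, to]
     * and returns true if no sampled value is lower than its predecessor.
     */
    static boolean isNonDecreasing(double[] coefficients, double from, double to, int steps) {
        double previous = evaluate(coefficients, from);
        for (int i = 1; i <= steps; i++) {
            double x = from + (to - from) * i / steps;
            double current = evaluate(coefficients, x);
            if (current < previous) {
                return false;
            }
            previous = current;
        }
        return true;
    }
}
```

Calling ``isNonDecreasing`` with the bounds of one configured ``DurationValueRange`` interval gives a quick indication of whether the trained function follows the growing shape of the ``Zero mean sigma`` curve within that interval. Since the check only samples the function, it does not replace a visual inspection.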
55 59
... ...
@@ -58,9 +62,9 @@ The following steps import all the data required from sapsailing.com into the lo
58 62
The steps of this section are similar to the steps of the previous section. It is recommended to work through the previous section before starting with this one, because the similar steps in this section are described with fewer details and hints.
59 63
60 64
1. Run *com.sap.sailing.windestimation.data.importer.DistanceBasedTwdTransitionImporter*
61
-2. Run *com.sap.sailing.windestimation.data.importer.AggregatedDistanceBasedTwdTransitionImporter* with at least 10 GB JVM memory.
65
+2. Run *com.sap.sailing.windestimation.data.importer.AggregatedDistanceBasedTwdTransitionImporter*
62 66
3. Run *com.sap.sailing.windestimation.datavisualization.AggregatedDistanceDimensionPlot* to visualize the wind data. Here, the x-axis of the XY-chart represents **meters**
63
-4. Open your graphical MongoDB client and connect to *windEstimation* database hosted by your local MongoDB. Open collection *aggregatedDistanceTwdTransition* collection. Within the collection you will see all the instances/data points visualized in the previous step. The total number of the points must not exceed 100.
67
+4. Open your graphical MongoDB client and connect to *windEstimation* database hosted by your local MongoDB. Open the collection *aggregatedDistanceTwdTransition*. Within the collection you will see all the instances/data points visualized in the previous step.
64 68
5. Delete all the instances within the collection which do not make sense.
65 69
6. Open the source code of the class *com.sap.sailing.windestimation.model.regressor.twdtransition.DistanceBasedTwdTransitionRegressorModelContext*. Scroll down to the definition of the inner class/enum *DistanceValueRange*. The enum defines the intervals for which a separate regressor model will be trained. Adjust the intervals accordingly in order to allow the regressor model to learn the data curve with minimal error.
66 70
7. Run *com.sap.sailing.windestimation.model.regressor.twdtransition.DistanceBasedTwdTransitionStdRegressorTrainer*
... ...
@@ -68,13 +72,13 @@ The steps of this sections are similar to the steps of the previous section. It
68 72
69 73
## Generate the models file
70 74
71
-Within this section, all the trained models produce within previous sections are aggregated and stored as one single file.
75
+Within this section, all the trained models produced within previous sections are aggregated and stored in a single file.
72 76
73
-1. Run ``com.sap.sailing.windestimation.model.ExportedModelsGenerator``. Wait until the model training finishes and the program terminates normally. A new file with serialized representation of internal wind estimation models should be located in ``./windEstimationModels.dat``. The absolute path of the file must be printed in the console output of the program. You can upload the file via HTTP POST to http://sapsailing.com/windestimation/api/windestimation_data (see ``com.sap.sailing.windestimation.jaxrs.api.WindEstimationDataResource``) to update the wind estimation of a server instance. If you changed the source files of ``DurationValueRange`` or ``DistanceValueRange``, then you will need to update ``com.sap.sailing.windestimation`` bundle of the server instance which is meant to receive the new wind estimation models.
77
+1. Run ``com.sap.sailing.windestimation.model.ExportedModelsGenerator``. Wait until the model serialization finishes and the program terminates normally. A new file with serialized representation of internal wind estimation models should be located in ``./windEstimationModels.dat``. The absolute path of the file must be printed in the console output of the program. You can upload the file via HTTP POST to http://sapsailing.com/windestimation/api/windestimation_data (see ``com.sap.sailing.windestimation.jaxrs.api.WindEstimationDataResource``) to update the wind estimation of a server instance. If you changed the source files of ``DurationValueRange`` or ``DistanceValueRange``, then you will need to update ``com.sap.sailing.windestimation`` bundle of the server instance which is meant to receive the new wind estimation models.
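
The HTTP POST upload mentioned in step 1 can be scripted, for example with the ``java.net.http`` client available since Java 11. This is a minimal sketch under assumptions: the class ``ModelUpload`` is hypothetical, the ``Content-Type`` value is a guess, and any authentication the server may require is not covered. Consult ``WindEstimationDataResource`` for what the endpoint actually expects.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of com.sap.sailing.windestimation.
final class ModelUpload {

    /** Builds a POST request carrying the serialized models as the request body. */
    static HttpRequest buildRequest(URI endpoint, byte[] serializedModels) {
        return HttpRequest.newBuilder(endpoint)
                // Assumption: the resource accepts a raw binary body.
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(serializedModels))
                .build();
    }

    /** Reads the models file, sends it to the endpoint and returns the HTTP status code. */
    static int upload(URI endpoint, Path modelsFile) throws IOException, InterruptedException {
        byte[] serializedModels = Files.readAllBytes(modelsFile);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(endpoint, serializedModels), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

A call could look like ``ModelUpload.upload(URI.create("http://sapsailing.com/windestimation/api/windestimation_data"), Path.of("windEstimationModels.dat"))``, with the path replaced by the absolute path printed in the console output.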
74 78
75 79
76 80
## Model Evaluation
77 81
78
-This step is optional. However, it is recommended, to evaluate the performance of wind estimation with new trained models.
82
+This step is optional. However, it is recommended in order to evaluate the performance of wind estimation operating with the new models.
79 83
80 84
1. Run ``com.sap.sailing.windestimation.evaluation.WindEstimatorManeuverNumberDependentEvaluationRunner`` to evaluate the wind estimation with the newly trained models. The evaluation score will be stored as CSV in ``./maneuverNumberDependentEvaluation.csv``.
... ...
\ No newline at end of file
wiki/images/windestimation/aggregatedDurationBasedTwdDeltaTransitionAfterDataCleansing.jpg
... ...
Binary files /dev/null and b/wiki/images/windestimation/aggregatedDurationBasedTwdDeltaTransitionAfterDataCleansing.jpg differ
wiki/images/windestimation/aggregatedDurationBasedTwdDeltaTransitionBeforeDataCleansing.jpg
... ...
Binary files /dev/null and b/wiki/images/windestimation/aggregatedDurationBasedTwdDeltaTransitionBeforeDataCleansing.jpg differ
wiki/images/windestimation/mongoDbCompassWithOpenedAggregatedDurationTwdTransitionCollection.jpg
... ...
Binary files /dev/null and b/wiki/images/windestimation/mongoDbCompassWithOpenedAggregatedDurationTwdTransitionCollection.jpg differ