Training of internal Wind Estimation models
This document describes how to generate the Machine Learning (ML) models that the wind estimation uses internally. Work through this howto step by step, in the order of its sections. At the end, you will have generated a file containing a serialized representation of the internal models used by the com.sap.sailing.windestimation bundle. You can use this file to update the wind estimation models of a running server instance. For a more advanced tutorial, in which you execute all the steps contained in the SimpleModelsTrainingPart... classes manually, see Advanced Guide for training of internal Wind Estimation models.
Prerequisites
To complete the training process successfully, make sure that the following prerequisites are met:
- A complete onboarding setup for SAP Sailing Analytics development
- MongoDB 3.4 or higher, up and running (this can be the same MongoDB instance as the one required by the onboarding howto)
- At least 100 GB of free space on the partition where MongoDB stores its data
- A graphical MongoDB client such as MongoDB Compass (Community version)
- 16 GB of RAM
- A computer that can keep running for 24 hours or more
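Because a training run takes a day or more, it may be worth checking the disk space prerequisite before you start. The following is a minimal Java sketch and is not part of the SAP Sailing Analytics code base. The class TrainingPrerequisiteCheck is hypothetical, and the sketch assumes you pass MongoDB's data directory (its dbPath) as the first argument. It uses only java.nio.file from the JDK and does not check RAM or the MongoDB version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of com.sap.sailing.windestimation: checks the
// "at least 100 GB free space" prerequisite for the partition hosting MongoDB's data.
final class TrainingPrerequisiteCheck {
    static final long GB = 1_000_000_000L;
    static final long REQUIRED_FREE_BYTES = 100L * GB;

    private TrainingPrerequisiteCheck() {
    }

    // Usable bytes on the file store (partition) that contains the given path.
    static long usableBytes(Path path) throws IOException {
        return Files.getFileStore(path).getUsableSpace();
    }

    // Pure comparison, kept separate so it can be checked without touching the disk.
    static boolean hasEnoughFreeSpace(long usableBytes, long requiredBytes) {
        return usableBytes >= requiredBytes;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is MongoDB's dbPath; "." is only a fallback for illustration.
        Path dbPath = Paths.get(args.length > 0 ? args[0] : ".");
        long usable = usableBytes(dbPath);
        System.out.printf("Usable space for %s: %.1f GB%n", dbPath.toAbsolutePath(), usable / (double) GB);
        if (!hasEnoughFreeSpace(usable, REQUIRED_FREE_BYTES)) {
            System.out.println("Less than 100 GB free: the training may run out of disk space.");
        }
    }
}
```

For example, with Java 11 or later you can run the single source file directly: java TrainingPrerequisiteCheck.java /data/db. Here /data/db is the default dbPath of a mongod started without a configuration file.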
Model training process
1. Run com.sap.sailing.windestimation.model.SimpleModelsTrainingPart1 as a normal Java application. It downloads and pre-processes all the necessary maneuver and wind data and then trains the maneuver classifiers.
2. Make sure that the launched program does not terminate with an uncaught exception. Wait until a graphical info dialog asks you to perform data cleansing for the duration dimension.
Press OK. A graphical window with two charts should then open. The top chart is an XY-chart: the x-axis represents seconds and the y-axis represents various TWD delta-based measures (e.g. standard deviation or mean). Below the XY-chart, a histogram shows the distribution of the XY-chart's data points. You can zoom in and out of each chart by dragging with the mouse. Note that the zoom levels of the two charts are currently not synchronized.
3. Open your graphical MongoDB client and connect to the windEstimation database hosted by your local MongoDB. Open the collection named aggregatedDurationTwdTransition. It contains all the instances/data points visualized in the previous step. The attribute used for the x-axis is "value".
4. Delete all instances within the collection that do not make sense. To identify such instances, use the data visualization tool from step 2. Some instances are not representative because only a small number of instances support them, as shown in the histogram. Such instances can produce unreasonable bumps in the XY-chart. The desired outcome of this step is a Zero mean sigma curve that looks smooth and increases monotonically, e.g. as depicted below.
Use the Refresh charts button as often as needed to update the charts with the modified data in MongoDB. When you are done with data cleansing, close the graphical visualization tool window to resume the training process. A confirmation dialog appears. Confirm it by pressing the "Continue with model training" button.
5. A new information dialog appears (do not press OK yet!) asking you to open the source code of the class com.sap.sailing.windestimation.model.regressor.twdtransition.DurationBasedTwdTransitionRegressorModelContext. Open it and scroll down to the definition of the inner enum DurationValueRange. The enum defines the intervals for which separate regressor models are trained. Read the Javadoc of DurationValueRange and adjust the intervals so that the regressor model can learn the Zero mean sigma curve with minimal error. You can also configure the polynomial used for regressor training. Make sure that each configured interval contains at least 2 data points. The model training procedure creates the data point with x = 0, y = 0 automatically. Press OK in the information dialog when you are done.
6. A graphical info dialog asks you to perform data cleansing for the distance dimension. Press OK. Data cleansing for the distance dimension works the same way as steps 2 to 5 for the duration dimension, so repeat those steps. Distances are given in meters. The collection name required in step 3 is aggregatedDistanceTwdTransition. The class required in step 5 is com.sap.sailing.windestimation.model.regressor.twdtransition.DistanceBasedTwdTransitionRegressorModelContext, and its inner enum is DistanceValueRange.
7. Run com.sap.sailing.windestimation.model.SimpleModelsTrainingPart2 as a normal Java application. Wait until the model training finishes and the program terminates normally. A new file with a serialized representation of the internal wind estimation models should then be located at ./windEstimationModels.dat. The program prints the absolute path of the file to the console. To update the wind estimation of a server instance, upload the file via HTTP POST to http://sapsailing.com/windestimation/api/windestimation_data (see com.sap.sailing.windestimation.jaxrs.api.WindEstimationDataResource). If you changed the source files of DurationValueRange or DistanceValueRange, you must also update the com.sap.sailing.windestimation bundle of the server instance that receives the new wind estimation models.
8. Optionally, run com.sap.sailing.windestimation.evaluation.WindEstimatorManeuverNumberDependentEvaluationRunner as a normal Java application to evaluate the wind estimation with the newly trained models. The evaluation scores are stored as CSV in ./maneuverNumberDependentEvaluation.csv.
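Two of the criteria from the data cleansing and interval configuration steps can be checked mechanically. First, once the Zero mean sigma values are ordered by x, they should never decrease. Second, each interval configured in DurationValueRange or DistanceValueRange should contain at least 2 data points. The sketch below expresses both checks in plain Java. It is hypothetical and makes several assumptions:
- It does not read from MongoDB. It assumes you have exported the x values ("value") and the corresponding Zero mean sigma values as arrays sorted by x.
- It models intervals as half-open [lower, upper), which may differ from how the real enums define their bounds.
- It counts only points from the collection, not the automatically created (0, 0) point.
- It does not judge smoothness, so the charts remain the primary tool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.sap.sailing.windestimation.
final class CleansingChecks {
    private CleansingChecks() {
    }

    // Returns every index i where the curve drops (y[i] < y[i - 1]).
    // Assumes the values are ordered by ascending x. Either point i or its
    // predecessor is a candidate for deletion.
    static List<Integer> findDrops(double[] zeroMeanSigma) {
        List<Integer> drops = new ArrayList<>();
        for (int i = 1; i < zeroMeanSigma.length; i++) {
            if (zeroMeanSigma[i] < zeroMeanSigma[i - 1]) {
                drops.add(i);
            }
        }
        return drops;
    }

    // Returns the indices of intervals {lower, upper} (treated as [lower, upper))
    // that contain fewer than 2 of the given x values.
    static List<Integer> intervalsWithTooFewPoints(double[] xs, double[][] intervals) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < intervals.length; k++) {
            int count = 0;
            for (double x : xs) {
                if (x >= intervals[k][0] && x < intervals[k][1]) {
                    count++;
                }
            }
            if (count < 2) {
                result.add(k);
            }
        }
        return result;
    }
}
```

For instance, findDrops(new double[] {1.0, 2.0, 1.5, 3.0}) returns [2], which flags the third data point (or the second) for a closer look in the charts.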
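The upload of windEstimationModels.dat in the step that runs SimpleModelsTrainingPart2 is a plain HTTP POST. A minimal sketch using the JDK's java.net.http client (Java 11+) could look as follows. The class name WindEstimationModelUpload is hypothetical. Sending the raw file as the request body with Content-Type application/octet-stream and without authentication is an assumption. Consult WindEstimationDataResource for the actual request format and for any credentials the server instance requires.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical upload helper; request format and missing authentication are assumptions.
final class WindEstimationModelUpload {
    static final URI DEFAULT_ENDPOINT =
            URI.create("http://sapsailing.com/windestimation/api/windestimation_data");

    private WindEstimationModelUpload() {
    }

    // Builds the POST request; kept separate from sending so it can be inspected offline.
    static HttpRequest buildUploadRequest(URI endpoint, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/octet-stream") // assumption
                .POST(body)
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path modelFile = Paths.get(args.length > 0 ? args[0] : "windEstimationModels.dat");
        URI endpoint = args.length > 1 ? URI.create(args[1]) : DEFAULT_ENDPOINT;
        // ofFile streams the file instead of loading it into memory at once.
        HttpRequest request = buildUploadRequest(endpoint, HttpRequest.BodyPublishers.ofFile(modelFile));
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode() + ": " + response.body());
    }
}
```

Passing a different endpoint as the second argument lets you target a server instance other than sapsailing.com.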