Training of internal Wind Estimation models

This document describes how the Machine Learning (ML) models used internally by wind estimation are generated. It is highly recommended to work through this howto step by step, following the order of the sections.

Prerequisites

To complete the training process successfully, make sure that you have the following:

  • A complete onboarding setup for SAP Sailing Analytics development
  • A running MongoDB (version 3.4 or higher), the same instance as required by the onboarding howto
  • At least 100 GB of free space on the partition where MongoDB stores its data
  • A graphical MongoDB client, such as MongoDB Compass (Community version)
  • 16 GB RAM
  • About 24 hours of computer run time
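The disk space prerequisite can be checked before starting the long-running training. The following is a minimal sketch, assuming you know the directory in which your MongoDB stores its data and pass it as the first argument. The class name FreeSpaceCheck and the method hasFreeSpace are hypothetical and not part of the SAP Sailing Analytics code base, and the threshold is interpreted as 100 GiB.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the SAP Sailing Analytics code base.
class FreeSpaceCheck {
    // 100 GiB, used here as the interpretation of the "100 GB" prerequisite.
    static final long REQUIRED_BYTES = 100L * 1024 * 1024 * 1024;

    // Returns true if the file store holding 'dir' has at least 'requiredBytes' usable.
    static boolean hasFreeSpace(Path dir, long requiredBytes) {
        try {
            return Files.getFileStore(dir).getUsableSpace() >= requiredBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the first argument is the MongoDB data directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        boolean ok = hasFreeSpace(dir, REQUIRED_BYTES);
        System.out.println(dir.toAbsolutePath()
                + (ok ? " has" : " does NOT have")
                + " at least 100 GiB of usable space");
    }
}
```

Run it with the MongoDB data directory as argument. Without an argument it checks the partition of the current working directory.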

Model training process

  1. Run com.sap.sailing.windestimation.model.SimpleModelsTraining as a normal Java application. The program downloads all the necessary maneuver and wind data, pre-processes it, and starts the training of the maneuver classifiers.
  2. Make sure that the launched program is not terminated by an uncaught exception. Wait until a graphical info dialog shows up asking you to perform data cleansing for the duration dimension, then press OK. [Screenshot: graphical info dialog requesting data cleansing for the duration dimension] A Swing-based GUI window opens with two charts. The first is an XY chart whose x-axis represents seconds and whose y-axis represents TWD delta-based series measures (e.g. standard deviation or mean). Below it, a histogram of the data points of the XY chart is shown. You can zoom in and out of each chart by dragging the mouse. Be aware that the zoom levels of the two charts are currently not synchronized. [Screenshot: graphical wind data visualization tool for the duration dimension]
  3. Open your graphical MongoDB client and connect to the windEstimation database hosted by your local MongoDB. Open the collection named aggregatedDurationTwdTransition. It contains all the instances (data points) visualized in the previous step. The attribute used for the x-axis is value. [Screenshot: MongoDB Compass with the aggregatedDurationTwdTransition collection opened]
  4. Delete all instances in the collection that do not make sense. Use the data visualization tool from step 2 to identify them. Some instances are not representative because they are backed by only a small number of supporting instances, which is visible in the histogram. Such instances can produce unreasonable bumps in the XY chart. The desired outcome of this step is that the series curve Zero mean sigma looks smooth and is always growing, as depicted below: [Screenshot: graphical visualization tool for the duration dimension after data cleansing] Use the Refresh charts button as often as needed to update the charts with the modified data in MongoDB. When you are done with data cleansing, close the visualization tool window to resume the training process. Then confirm the dialog asking whether you have finished the data cleansing of the duration dimension: [Screenshot: confirmation dialog for finishing the data cleansing]
  5. A new information dialog shows up (do not press OK yet!) asking you to open the source code of the class com.sap.sailing.windestimation.model.regressor.twdtransition.DurationBasedTwdTransitionRegressorModelContext. Open it and scroll down to the definition of the inner enum DurationValueRange. The enum defines the intervals for which a separate regressor model will be trained. Read the Javadoc of DurationValueRange and adjust the intervals so that the regressor model can learn the Zero mean sigma curve with minimal error. You can also configure the polynomial that will be used for regressor training. Make sure that at least 2 data points are available within each interval. The data point with x = 0, y = 0 is created automatically. Press OK in the information dialog when you are done.
  6. A graphical info dialog shows up asking you to perform data cleansing for the distance dimension. Press OK. The data cleansing steps for the distance dimension are very similar to steps 2 to 5 for the duration dimension, so follow those steps to complete data cleansing and model configuration for the distance dimension. The unit used for distances is meters. The collection name required in step 3 is aggregatedDistanceTwdTransition. The class required in step 5 is com.sap.sailing.windestimation.model.regressor.twdtransition.DistanceBasedTwdTransitionRegressorModelContext and its inner enum is DistanceValueRange.
  7. Wait until model training finishes and the program terminates normally. A new file with the serialized representation of the internal wind estimation models should then be located at ./windEstimationModels.dat. The absolute path of the file is printed in the console output of the program. To update the wind estimation of a server instance, you can upload the file via HTTP POST to http://sapsailing.com/windestimation/api/windestimation_data (see com.sap.sailing.windestimation.jaxrs.api.WindEstimationDataResource).
  8. Optionally, run com.sap.sailing.windestimation.evaluation.WindEstimatorManeuverNumberDependentEvaluationRunner as a normal Java application to evaluate the wind estimation with the newly trained models. The evaluation score will be stored as CSV in ./maneuverNumberDependentEvaluation.csv.
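Step 4 asks for a Zero mean sigma curve that is always growing. The following is a minimal sketch of how that property could be checked programmatically instead of only by eye. It assumes that the sigma values have been exported from the cleansed collection and sorted by ascending value (the x-axis attribute). The class name SigmaCurveCheck, the method name, and the sample numbers in main are hypothetical and purely illustrative. Smoothness is not checked here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SAP Sailing Analytics code base.
class SigmaCurveCheck {
    // Returns every index i at which the curve fails to grow,
    // i.e. sigma[i] <= sigma[i - 1]. Input must be ordered by ascending x.
    static List<Integer> nonGrowingIndices(double[] sigma) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < sigma.length; i++) {
            if (sigma[i] <= sigma[i - 1]) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up values for illustration only; the third point forms a bump.
        double[] sigma = {1.0, 2.0, 1.5, 3.0};
        List<Integer> suspicious = nonGrowingIndices(sigma);
        if (suspicious.isEmpty()) {
            System.out.println("Curve is strictly growing");
        } else {
            System.out.println("Inspect the data points at indices " + suspicious);
        }
    }
}
```

An index reported here only marks a candidate for inspection. Whether the instance should actually be deleted still depends on the histogram and on your judgment.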
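Step 5 requires at least 2 data points within each interval. The following sketch counts the data points per interval for a candidate set of interval boundaries before you edit the enum. It assumes half-open intervals (lower bound inclusive, upper bound exclusive), which may differ from how DurationValueRange and DistanceValueRange actually treat their bounds, so check their Javadoc. The class name IntervalCoverageCheck, the boundaries, and the x values are hypothetical. The automatically created data point at x = 0, y = 0 is not counted.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the SAP Sailing Analytics code base.
class IntervalCoverageCheck {
    // boundaries {b0, b1, b2} describe the intervals [b0, b1) and [b1, b2).
    // Requires at least two boundaries in ascending order.
    // Returns the number of x values that fall into each interval.
    static int[] countsPerInterval(double[] xs, double[] boundaries) {
        int[] counts = new int[boundaries.length - 1];
        for (double x : xs) {
            for (int i = 0; i < counts.length; i++) {
                if (x >= boundaries[i] && x < boundaries[i + 1]) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up x values (seconds) and boundaries for illustration only.
        double[] xs = {1, 5, 62, 70, 650};
        double[] boundaries = {0, 60, 600, 3600};
        int[] counts = countsPerInterval(xs, boundaries);
        System.out.println("Data points per interval: " + Arrays.toString(counts));
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] < 2) {
                System.out.println("Interval [" + boundaries[i] + ", "
                        + boundaries[i + 1] + ") has fewer than 2 data points");
            }
        }
    }
}
```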
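The upload mentioned in step 7 can be done with any HTTP client. The following is a minimal sketch using the java.net.http client (Java 11 or later). The Content-Type header value is an assumption, and any authentication the server may require is not shown; consult WindEstimationDataResource for what the endpoint actually expects. The class name ModelUpload and the method buildUploadRequest are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the SAP Sailing Analytics code base.
class ModelUpload {
    // Builds a POST request that carries the given body to the endpoint.
    static HttpRequest buildUploadRequest(URI endpoint, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(endpoint)
                // Assumption: the resource accepts a raw binary body.
                .header("Content-Type", "application/octet-stream")
                .POST(body)
                .build();
    }

    public static void main(String[] args) throws Exception {
        URI endpoint = URI.create(
                "http://sapsailing.com/windestimation/api/windestimation_data");
        Path modelFile = Paths.get("./windEstimationModels.dat");

        HttpRequest request = buildUploadRequest(
                endpoint, HttpRequest.BodyPublishers.ofFile(modelFile));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The status code alone tells whether the server accepted the upload.
        System.out.println("Server responded with HTTP " + response.statusCode());
    }
}
```

Replace the host in the URL with the server instance whose wind estimation you want to update.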