Training of internal Wind Estimation models
This document describes how to generate the Machine Learning (ML) models that are used internally by wind estimation. It is highly recommended to work through this how-to step by step, in the order of its sections.
Prerequisites
To complete the training process successfully, make sure that you have the following:
- A complete onboarding setup for SAP Sailing Analytics development
- MongoDB (3.4 or higher!) up and running (the same MongoDB instance as required in the onboarding how-to)
- At least 100 GB of free space on the partition where MongoDB is operating
- A graphical MongoDB client installed, such as MongoDB Compass (Community version)
- 16 GB RAM
- About 24 hours of running time on your computer
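The free-space prerequisite can be checked before starting the long-running training. The following is a minimal sketch, not part of SAP Sailing Analytics: the class name `PrerequisiteCheck` is hypothetical, and the directory passed as the first argument is assumed to lie on the partition where MongoDB stores its data.

```java
import java.io.File;

/** Hypothetical pre-flight check; not part of the SAP Sailing Analytics code base. */
final class PrerequisiteCheck {
    /** 100 GB, interpreted here as binary gigabytes (an assumption). */
    static final long REQUIRED_FREE_BYTES = 100L * 1024 * 1024 * 1024;

    /** Returns true if the given number of usable bytes meets the 100 GB requirement. */
    static boolean hasEnoughSpace(long usableBytes) {
        return usableBytes >= REQUIRED_FREE_BYTES;
    }

    public static void main(String[] args) {
        // Assumption: args[0] is a directory on the MongoDB data partition;
        // falls back to the current directory if no argument is given.
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        long usable = dataDir.getUsableSpace(); // 0 if the path does not name a partition
        System.out.printf("Usable space at %s: %d GB%n",
                dataDir.getAbsolutePath(), usable / (1024L * 1024 * 1024));
        System.out.println(hasEnoughSpace(usable) ? "OK" : "Less than 100 GB free");
    }
}
```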
Model training process
1. Run com.sap.sailing.windestimation.model.SimpleModelsTraining as a normal Java application. This program downloads all the necessary maneuver and wind data, pre-processes it, and initiates the training of the maneuver classifiers.
2. Make sure that the launched program is not terminated by an uncaught exception. Wait until a graphical info dialog shows up that asks you to perform data cleansing for the duration dimension, and press OK.
   A Swing-based GUI window should open with two charts: an XY chart where the x-axis represents seconds and the y-axis represents TWD delta-based series measures (e.g. standard deviation or mean), and, below it, a histogram of the data points of the XY chart. You can zoom in and out of each chart by dragging the mouse. Be aware that the zoom levels of the two charts are currently not synchronized.
3. Open your graphical MongoDB client and connect to the windEstimation database hosted by your local MongoDB. Open the collection named aggregatedDurationTwdTransition. Within the collection you will see all the instances (data points) visualized in the previous step. The attribute used for the x-axis is value.
4. Delete all instances in the collection that do not make sense. Use the data visualization tool from step 2 to identify them. Some instances are not representative because they are backed by only a small number of supporting instances, which is visible in the histogram. Such instances can produce unreasonable bumps in the XY chart. The desired outcome of this step is that the series curve "Zero mean sigma" looks smooth and always growing, as depicted below:
   Use the "Refresh charts" button as often as needed to update the charts with the modified data in MongoDB. Close the visualization tool window when you are done with data cleansing to resume the training process. Confirm the dialog that appears once you have finished the data cleansing of the duration dimension.
5. A new information dialog shows up (do not press OK yet!) asking you to open the source code of the class com.sap.sailing.windestimation.model.regressor.twdtransition.DurationBasedTwdTransitionRegressorModelContext. Open it and scroll down to the definition of the inner enum DurationValueRange. The enum defines the intervals for which a separate regressor model will be trained. Read the Javadoc of DurationValueRange and adjust the intervals so that the regressor model can learn the "Zero mean sigma" curve with minimal error. You can also configure the polynomial that will be used for regressor training. Make sure that at least 2 data points are available within each interval. The data point with x = 0, y = 0 is created automatically. Press OK in the information dialog when you are done.
6. A graphical info dialog shows up that asks you to perform data cleansing for the distance dimension. Press OK. The data cleansing steps for the distance dimension are very similar to steps 2 through 5 for the duration dimension, so consult those steps to complete the data cleansing and model configuration for the distance dimension. Distances are given in meters. The collection name required in step 3 is aggregatedDistanceTwdTransition. The class required in step 5 is com.sap.sailing.windestimation.model.regressor.twdtransition.DistanceBasedTwdTransitionRegressorModelContext, and its inner enum is DistanceValueRange.
7. Wait until model training finishes and the program terminates normally. A new file with the serialized representation of the internal wind estimation models should be located at ./windEstimationModels.dat. The absolute path of the file is printed in the console output of the program. You can upload the file via HTTP POST to http://sapsailing.com/windestimation/api/windestimation_data (see com.sap.sailing.windestimation.jaxrs.api.WindEstimationDataResource) to update the wind estimation of a server instance.
8. Optionally, run com.sap.sailing.windestimation.evaluation.WindEstimatorManeuverNumberDependentEvaluationRunner as a normal Java application to evaluate the wind estimation with the newly trained models. The evaluation score will be stored as CSV in ./maneuverNumberDependentEvaluation.csv.
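The two manual checks from the data cleansing steps (the "Zero mean sigma" curve should keep growing, and every configured interval needs at least 2 data points) can also be approximated in code. The following is a minimal sketch under stated assumptions: the class `CleansingCheck`, its methods, the sample data, and the interval boundaries are all hypothetical, and the intervals are treated as half-open, which may differ from what DurationValueRange actually does. It only flags candidates; deciding what to delete remains a manual task.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the manual cleansing steps; not part of the wind estimation code. */
final class CleansingCheck {

    /**
     * Returns the indices i where y[i] <= y[i-1], i.e. where the curve stops growing.
     * Assumes the points are already sorted by ascending x.
     */
    static List<Integer> nonGrowingIndices(double[] y) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < y.length; i++) {
            if (y[i] <= y[i - 1]) {
                result.add(i);
            }
        }
        return result;
    }

    /** Counts the x values that fall into the half-open interval [from, to). */
    static int pointsInInterval(double[] x, double from, double to) {
        int count = 0;
        for (double v : x) {
            if (v >= from && v < to) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample data: x in seconds, y is the "Zero mean sigma" value.
        double[] x = {1, 2, 5, 10, 30, 60};
        double[] y = {0.5, 0.9, 1.6, 1.4, 3.0, 4.2};
        System.out.println("Curve stops growing at indices: " + nonGrowingIndices(y));

        // Hypothetical interval boundaries; the real ones are defined in DurationValueRange.
        double[] bounds = {0, 5, 60, 3600};
        for (int i = 0; i + 1 < bounds.length; i++) {
            int n = pointsInInterval(x, bounds[i], bounds[i + 1]);
            System.out.printf("[%.0f, %.0f): %d point(s)%s%n",
                    bounds[i], bounds[i + 1], n, n < 2 ? " (fewer than 2)" : "");
        }
    }
}
```

To use it with real data, the x and y arrays would have to be filled from the aggregatedDurationTwdTransition or aggregatedDistanceTwdTransition collection, for example from an export made with the graphical MongoDB client.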
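The HTTP POST upload of the model file could look roughly like the sketch below. It rests on several assumptions that are not confirmed by this document: that the endpoint accepts the raw file bytes as the request body, that `application/octet-stream` is a suitable content type, and that no authentication is needed. Check WindEstimationDataResource for the format the server actually expects. The class `ModelUploadSketch` is hypothetical and requires Java 11 or newer for `java.net.http`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical upload helper; requires Java 11 or newer. */
final class ModelUploadSketch {

    /** Builds a POST request carrying the given bytes as the raw request body (an assumption). */
    static HttpRequest buildRequest(URI endpoint, byte[] modelBytes) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/octet-stream") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofByteArray(modelBytes))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // The target URL can be overridden by the first argument, e.g. for a test server.
        URI endpoint = URI.create(args.length > 0
                ? args[0]
                : "http://sapsailing.com/windestimation/api/windestimation_data");
        byte[] modelBytes = Files.readAllBytes(Path.of("./windEstimationModels.dat"));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(endpoint, modelBytes), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
    }
}
```

Building the request in a separate method keeps it inspectable without contacting a server, which is useful because an upload changes the wind estimation of the target instance.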