wiki/info/landscape/archive-server-upgrade.md
... ...
@@ -2,15 +2,22 @@
2 2
3 3
## TL;DR
4 4
5
-- Launch MongoDB replica for ``archive`` replica set (see [here](https://security-service.sapsailing.com/gwt/AdminConsole.html#LandscapeManagementPlace:))
6
-- Wait until MongoDB replica is in ``SECONDARY`` state
5
+- Optionally, to accelerate DB reads, launch a MongoDB replica for the ``archive`` replica set (see [here](https://security-service.sapsailing.com/gwt/AdminConsole.html#LandscapeManagementPlace:)); wait until the replica is in ``SECONDARY`` state
7 6
- Launch "more like this" based on existing primary archive, adjusting the ``INSTALL_FROM_RELEASE`` user data entry to the release of choice and the ``Name`` tag to "SL Archive (New Candidate)"
8
-- Wait until the new instance is done with its background tasks and CPU utilization goes to 0% (approximately 24h)
7
+- Wait until the new instance is done with its background tasks and CPU utilization goes to 0% (approximately 36h)
8
+- Create an entry in the reverse proxy's ``/etc/httpd/conf.d/001-events.conf`` file like this:
9
+```
10
+ Use Plain archive-candidate.sapsailing.com 172.31.46.203 8888
11
+```
12
+with ``172.31.46.203`` being an example of the internal IP address assigned to your new archive candidate instance.
9 13
- Compare server contents, either with ``compareServers`` script or through REST API, and fix any differences
14
+```
15
+ java/target/compareServers -ael https://www.sapsailing.com https://archive-candidate.sapsailing.com
16
+```
10 17
- Do some spot checks on the new instance
11 18
- Switch reverse proxy by adjusting ``ArchiveRewrite`` macro under ``root@sapsailing.com:/etc/httpd/conf.d/000-macros.conf`` followed by ``service httpd reload``
12
-- Terminate old fail-over EC2 instance
13
-- adjust Name tags for what is now the fail-over and what is now the primary archive server in EC2 console
19
+- Terminate old fail-over EC2 instance; you will have to disable its termination protection first
20
+- Adjust Name tags for what is now the fail-over and what is now the primary archive server in the EC2 console
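For reference, the ``ArchiveRewrite`` macro mentioned above has the following shape (sketch; the IP address and the instance-size comment are examples, so point ``Use Rewrite`` at your new archive server's internal IP):

```
<Macro ArchiveRewrite>
# ARCHIVE, based on i3.2xlarge, 64GB RAM and 1.9TB swap
 Use Rewrite 172.31.46.203 8888
</Macro>
```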
14 21
15 22
[[_TOC_]]
16 23
... ...
@@ -61,6 +68,16 @@ INFO: Thread[MarkPassingCalculator for race R14 initialization,4,main]: Timeout
61 68
```
62 69
that will keep repeating. Watch out for the ``queued tasks`` count: it should keep decreasing and eventually reach 0.
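If you don't want to eyeball the whole log for this, a small ``grep`` pipeline can pull out the latest count. This is only a sketch: the exact wording of the ``queued tasks`` log line and the log location are assumptions here, so adapt the pattern and pipe in the real log (e.g. via ``tail -f``) instead of the sample ``printf``:

```
# Sample log lines (made up for illustration); in practice, pipe the
# real server log into the same grep/tail pipeline.
printf 'INFO: ... queued tasks: 42\nINFO: ... queued tasks: 7\n' \
  | grep -o 'queued tasks: [0-9]*' | tail -n 1
# prints: queued tasks: 7
```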
63 70
71
+### Create a Temporary Mapping in ``/etc/httpd/conf.d/001-events.conf`` to Make New Server Accessible Before Switching
72
+
73
+Grab the internal IP address of your freshly launched archive server candidate (something like 172.31.x.x) and ensure you have a line of the form
74
+
75
+```
76
+ Use Plain archive-candidate.sapsailing.com 172.31.35.213 8888
77
+```
78
+
79
+in the file ``root@sapsailing.com:/etc/httpd/conf.d/001-events.conf``, preferably towards the top of the file where it can be found quickly. Save the changes and check the configuration using the ``apachectl configtest`` command; it should report ``Syntax OK``. Only then reload the configuration by issuing the ``service httpd reload`` command as user ``root``. After this command has completed, you can watch your archive server candidate start up at [https://archive-candidate.sapsailing.com/gwt/status](https://archive-candidate.sapsailing.com/gwt/status) and fix whatever the ``compareServers`` script (see below) reports as differences that need handling.
80
+
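The check-then-reload sequence described above boils down to one gated command line. The sketch below defines harmless stand-ins for ``apachectl`` and ``service`` so it can be run anywhere; on the real proxy host you would drop the stand-ins and run the real commands as ``root``:

```
# Stand-ins so this sketch runs outside the proxy host; on
# root@sapsailing.com, remove these two lines and use the real commands.
apachectl() { echo "Syntax OK"; }
service() { echo "reloaded"; }

# "&&" ensures the reload only happens if the syntax check succeeds.
apachectl configtest && service httpd reload
```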
64 81
### Comparing Contents with Primary ARCHIVE Server
65 82
66 83
This is when you can move on to comparing the new candidate's contents with the primary archive server. Two ways are currently possible.
... ...
@@ -94,6 +111,8 @@ Differences are reported in a JSON response document. If no differences are foun
94 111
```
95 112
In this example, ``34.242.227.113`` would be the public IP address of the server that responded to the ``www.sapsailing.com`` request (the current primary archive server), and ``1.2.3.4`` would be the public IP address of your new candidate archive server. The response status will be ``200`` if the comparison found no differences, and ``409`` otherwise. Differences are handled as described for the ``compareServers`` script above.
96 113
114
+You can also trigger the REST API-based comparison by using the ``-a`` option of the ``compareServers`` script.
115
+
97 116
### Manual Spot Checks
98 117
99 118
Following the mandatory automated content comparison, you should do a few spot checks on the new archive server candidate. Go to ``http://1.2.3.4:8888/gwt/Home.html``, where ``1.2.3.4`` is the public IP address of your new archive server candidate, and browse through a few events. Note that clicking on a link to show all events will take you back to ``www.sapsailing.com``. In this case, replace ``www.sapsailing.com`` with your candidate server's public IP address again and continue browsing.
wiki/info/landscape/upgrading-archive-server.md
... ...
@@ -1,113 +0,0 @@
1
-# Upgrading the Archive Server
2
-
3
-[[_TOC_]]
4
-
5
-For the archive server we currently use two EC2 instances: one as the production server, and a second one as a failover instance. The failover server is usually one version "behind" the production server which is meant to allow for recovering fast from regressions introduced accidentally by an upgrade. With startup cycles that take approximately 24h as of this writing it is no option to hope to recover by starting another instance "fast."
6
-
7
-The mapping for which archive server to use is encoded in the central web server's Apache reverse proxy configuration under ``/etc/httpd/conf.d/000-macros.conf``. At its head there is a configuration like this:
8
-
9
-```
10
-<Macro ArchiveRewrite>
11
-# ARCHIVE, based on i3.2xlarge, 64GB RAM and 1.9TB swap
12
- Use Rewrite 172.31.26.254 8888
13
-</Macro>
14
-```
15
-
16
-It defines the ``ArchiveRewrite`` macro which is then used by other macros such as ``Series-ARCHIVE`` and ``Event-ARCHIVE``, used then in ``/etc/httpd/conf.d/001-events.conf``.
17
-
18
-The process of upgrading the archive server works in the following steps:
19
-* Scale out the underlying ``archive`` MongoDB replica set
20
-* Launch a new EC2 instance with the necessary archive server configuration for the new release you'd like to upgrade to
21
-* Create a temporary mapping in ``/etc/httpd/conf.d/001-events.conf`` to make new server accessible before switching
22
-* Wait for the new candidate server to have finished loading
23
-* Compare contents to running archive
24
-* Switch ``ArchiveRewrite`` macro to new instance
25
-* Reload Apache httpd service
26
-* Re-label EC2 instances and shut down old failover server
27
-
28
-The following sections explain the steps in a bit more detail.
29
-
30
-## Scale out the underlying ``archive`` MongoDB replica set
31
-
32
-When not launching an archive server, the ``archive`` replica set runs on a small, inexpensive but slow instance ``dbserver.internal.sapsailing.com:10201`` with which launching a new archive server takes days and weeks. In order to accelerate loading all content, first two or three replicas need to be added. Either use the ``configuration/aws-automation/launch-mongodb-replica.sh`` script from your git workspace as many times as you want to have replicas:
33
-
34
-```
35
- ./configuration/aws-automation/launch-mongodb-replica.sh -r archive -p dbserver.internal.sapsailing.com:10201 -P 0 -t i3.2xlarge -k {your-key-name}
36
-```
37
-
38
-Or use [https://security-service.sapsailing.com/gwt/AdminConsole.html?locale=en#LandscapeManagementPlace:](https://security-service.sapsailing.com/gwt/AdminConsole.html?locale=en#LandscapeManagementPlace:) and log on with your account, make sure you have an EC2 key registered, then select the ``archive`` replica set under "MongoDB Endpoints" and click on the "Scale in/out" action button, then choose the number of replicas you'd like to launch and confirm.
39
-
40
-Once triggered, you can watch the instances start up in the AWS console under EC2 instances. The filling of those replicas may take a few hours; you may check progress by connecting to a host in the landscape with the ``mongo`` client installed and try something like this:
41
-
42
-```
43
- $ mongo "mongodb://dbserver.internal.sapsailing.com:10201/?replicaSet=archive&retryWrites=true&readPreference=nearest"
44
- ...
45
- archive:PRIMARY> rs.status()
46
- archive:PRIMARY> quit()
47
-```
48
-
49
-This will output a JSON document listing the members of the ``archive`` replica set. You should see a ``stateStr`` attribute which usually is one of ``PRIMARY``, ``SECONDARY``, or ``STARTUP2``. Those ``STARTUP2`` instances are in the process of receiving an initial load from the replica set and are not yet ready to receive requests. If you'd like to minimize the actual start-up time of your archive server process later, wait until all replicas have reached the ``SECONDARY`` state. But it's no problem, either, to launch the new archive server before that, only it will take longer to launch, and probably the additional load on the replica set will make the replicas reach the ``SECONDARY`` state even later.
50
-
51
-## Launch a new EC2 instance
52
-
53
-Use an instance of type ``i3.2xlarge`` or larger. The most important thing is to pick an "i" instance type that has fast NVMe disks attached. Those will be configured automatically as swap space. You may select "Launch more like this" based on the current production archive server to start with. Use the following as the user data:
54
-
55
-```
56
-INSTALL_FROM_RELEASE={build-...}
57
-USE_ENVIRONMENT=archive-server
58
-REPLICATE_MASTER_BEARER_TOKEN="***"
59
-```
60
-
61
-Obtain the actual build version, e.g., from looking at [https://releases.sapsailing.com](https://releases.sapsailing.com), picking the newest ``build-...`` release. The Java heap size is currently set at 200GB in [https://releases.sapsailing.com/environments/archive-server](https://releases.sapsailing.com/environments/archive-server). You can adjust this by either adding a ``MEMORY=...`` assignment to your user data, overriding the value coming from the ``archive-server`` environment, or you may consider adjusting the ``archive-server`` environment for good by editing ``trac@sapsailing.com:releases/environments/archive-server``.
62
-
63
-Launching the instance in our default availability zone (AZ) ``eu-west-1c`` may be preferable as it minimizes cross-AZ traffic since the central web server / reverse proxy is also located in that AZ. However, this is not absolutely necessary, and you may also consider it a good choice to launch the new candidate in an AZ explicitly different from the current production archive server's AZ in order to increase chances for high availability should an AZ die.
64
-
65
-Tag the instance with ``Name=SL Archive (New Candidate)`` and ``sailing-analytics-server=ARCHIVE``. You may have inherited these tags (except with the wrong name) in case you used "Launch more like this" before.
66
-
67
-## Create a temporary mapping in ``/etc/httpd/conf.d/001-events.conf`` to make new server accessible before switching
68
-
69
-Grab the internal IP address of your freshly launched archive server candidate (something like 172.31.x.x) and ensure you have a line of the form
70
-
71
-```
72
- Use Plain archive-candidate.sapsailing.com 172.31.35.213 8888
73
-```
74
-
75
-in the file ``root@sapsailing.com:/etc/httpd/conf.d/001-events.conf``, preferably towards the top of the file where it can be quickly found. Save the changes and check the configuration using the ``apachectl configtest`` command. It should give an output saying ``Syntax OK``. Only in this case reload the configuration by issuing the ``service httpd reload`` command as user ``root`. After this command has completed, you can watch your archive server candidate start up at [https://archive-candidate.sapsailing.com/gwt/status](https://archive-candidate.sapsailing.com/gwt/status).
76
-
77
-## Wait for the new candidate server to have finished loading
78
-
79
-At the top of the ``/gwt/status`` output you'll see the fields ``numberofracestorestore`` and the ``numberofracesrestored``. The ``/gwt/status`` health check will respond with a status 200 only after the ``numberofracesrestored`` has reached the ``numberofracestorestore``. But even after that the new archive server candidate will be busy for a few more hours. The ``numberofracesrestored`` count only tells you the loading of how many races has been *triggered*, not necessarily how many of those have already *completed* loading. Furthermore, even when all content has been loaded, calculations will keep going on for a few more hours, re-establishing information about wind estimation, mark passings, and maneuvers. You can follow the CPU load of the candidate server in the AWS EC2 console. Select your new instance and switch to the Monitoring tab, there watch for the "CPU Utilization" and wait for it to consistently drop from approximately 100% down to 0%. This indicates the end of the loading process, and you can now proceed to the next step.
80
-
81
-## Compare contents to running archive
82
-
83
-It is a good idea to ensure all races really have loaded successfully, and all wind data and GPS data is available again, just like in the current production archive server. For this purpose, the ``java/target/compareServers`` script exists in your git workspace. Call like this:
84
-
85
-```
86
- java/target/compareServers -el https://www.sapsailing.com https://archive-candidate.sapsailing.com
87
-```
88
-
89
-This assumes, of course, that you have completed the step explained above for establishing HTTPS access to ``archive-candidate.sapsailing.com``.
90
-
91
-The script will fetch leaderboard group by leaderboard group and compare their contents, race loading state, and presence or absence of tracking and wind data for all races reachable through those leaderboard groups. The names of the leaderboard groups will be printed to the standard output. If a difference is found, the script will exit, showing the difference on the console. You then have a chance to fix the problem by logging on to [https://archive-candidate.sapsailing.com/gwt/AdminConsole.html](https://archive-candidate.sapsailing.com/gwt/AdminConsole.html). Assuming you were able to fix the problem, continue the comparison by calling the ``compareServers`` script again, adding the ``-c`` option ("continue"), like this:
92
-
93
- ```
94
- java/target/compareServers -cel https://www.sapsailing.com https://archive-candidate.sapsailing.com
95
-```
96
-
97
-This will continue with the comparison starting with the leaderboard group where a difference was found in the previous run.
98
-
99
-Should you encounter a situation that you think you cannot or don't want to fix, you have to edit the ``leaderboardgroups.old.sed`` and ``leaderboardgroups.new.sed`` files and remove the leaderboard group(s) you want to exclude from continued comparisons. You find the leaderboard group that produced the last difference usually at the top of the two files. Once saved, continue again with the ``-c`` option as shown above.
100
-
101
-Once the script has completed successfully you may want to do a few spot checks on [https://archive-candidate.sapsailing.com](https://archive-candidate.sapsailing.com). Note, that when you navigate around, some links may throw you back to ``www.sapsailing.com``. There, you manually have to adjust your browser address bar to ``archive-candidate.sapsailing.com`` to continue looking at your new candidate archive server contents.
102
-
103
-## Switch ``ArchiveRewrite`` macro to new instance and reload httpd
104
-
105
-Once you're satisfied with the contents of your new candidate it's time to switch. Log on as ``root@sapsailing.com`` and adjust the ``/etc/httpd/conf.d/000-macros.conf`` file, providing the new internal server IP address in the ``ArchiveRewrite`` macro at the top of the file. Save and check the configuration again using ``apachectl configtest``. If the syntax is OK, reload the configuration with ``service httpd reload`` and inform the community by a corresponding JAM post, ideally providing the release to which you upgraded.
106
-
107
-## Re-label EC2 instances and shut down old failover server
108
-
109
-Finally, it's time to shut down the old failover instance which is now no longer needed. Go to the AWS EC2 console, look at the running instances and identify the one labeled "SL Archive (Failover)". In the "Instance Settings" disable the termination protection for this instance, then terminate.
110
-
111
-Rename the "SL Archive" instance to "SL Archive (Failover)", then rename "SL Archive (New Candidate)" to "SL Archive".
112
-
113
-Then, enable termination protection for your new "SL Archive" server if it isn't already enabled because you used "Launch more like this" to create the instance using the previous production archive server as a template.
... ...
\ No newline at end of file