wiki/info/landscape/archive-server-upgrade.md
... ...
@@ -15,7 +15,7 @@ with ``172.31.46.203`` being an example of the internal IP address your new arch
15 15
java/target/compareServers -ael https://www.sapsailing.com https://archive-candidate.sapsailing.com
16 16
```
17 17
- Do some spot checks on the new instance
18
-- Switch reverse proxy by adjusting ``ArchiveRewrite`` macro under ``root@sapsailing.com:/etc/httpd/conf.d/000-macros.conf`` followed by ``service httpd reload``
18
+- Switch reverse proxy by adjusting the archive IP definitions at the top of ``root@sapsailing.com:/etc/httpd/conf.d/000-macros.conf``, followed by ``service httpd reload``
19 19
- Terminate old fail-over EC2 instance; you will have to disable its termination protection first.
20 20
- Adjust Name tags for what is now the fail-over and what is now the primary archive server in EC2 console
21 21
... ...
@@ -119,46 +119,46 @@ Following the mandatory automated content comparison you should do a few spot ch
119 119
120 120
### Switching in Reverse Proxy
121 121
122
-Once you are content with the quality of the new archive server candidate's contents it's time to switch. Technically, switching archive servers is done by adjusting the corresponding configuration in the central Apache reverse proxy server. You find this in ``root@sapsailing.com:/etc/httpd/conf.d/000-macros.conf`` at the top. In the past we changed the macro directly and didn't have a failover setup to easily switch:
122
+Once you are content with the quality of the new archive server candidate's contents, it's time to switch. Technically, switching archive servers is done by adjusting the corresponding configuration in the central Apache reverse proxy server. You find this at the top of ``root@sapsailing.com:/etc/httpd/conf.d/000-macros.conf``. The current macros file is as follows:
123 123
124 124
```
125
-Define ARCHIVE_IP xxx.xx.xx.xxx
126
-Define ARCHIVE_FAILOVER_IP xxx.xx.xx.xxx
125
+Define ARCHIVE_IP 172.31.43.140
126
+Define ARCHIVE_FAILOVER_IP 172.31.9.8
127
+Define PRODUCTION_ARCHIVE ${ARCHIVE_IP}
127 128
128 129
<Macro ArchiveRewrite>
129
- Use Rewrite ${ARCHIVE_IP} 8888
130
+ Use Rewrite ${PRODUCTION_ARCHIVE} 8888
130 131
</Macro>
131 132
```
132 133
133
-This was slow if we needed to switchover. As an improvement, which happened to also be neater, we added variables -- defined at the top -- including a variable for an up-and-running failover. In the case of an outage, we could comment the current archive and rename the failover (and then reload). This way we could also switch back if the primary returns to healthy. However, we have worked on an automation script, which now changes the PRODUCTION_ARCHIVE value (see below) to point to the variables ARCHIVE_IP or ARCHIVE_FAILOVER_IP. Upon switching, it calls notify-operators, which can be found in /usr/local/bin: it's a symbollic link pointing to configuration/on-site-scripts/paris2024/notify-operators. The current macros file is as follows:
134
+When the new archive is ready, duplicate the ``Define ARCHIVE_IP ...`` line, comment out the first copy, and then change the IP
135
+of the second copy to the upgraded archive's private IP. Set the ``Define ARCHIVE_FAILOVER_IP ...`` value to the IP of the now old primary. Also make sure ``Define PRODUCTION_ARCHIVE ...`` points to the primary archive variable by setting it to ``${ARCHIVE_IP}``. It should look something like the following (if the new IP
136
+is 172.31.7.12):
137
+
134 138
```
135
-Define ARCHIVE_IP xxx.xx.xx.xxx
136
-Define ARCHIVE_FAILOVER_IP xxx.xx.xx.xxx
137
-Define PRODUCTION_ARCHIVE ${ARCHIVE_IP}
139
+# Old primary, now commented out:
+#Define ARCHIVE_IP 172.31.43.140
140
+# New, upgraded archive:
+Define ARCHIVE_IP 172.31.7.12
141
+# Old primary, now the failover:
+Define ARCHIVE_FAILOVER_IP 172.31.43.140
142
+# Ensure this points to the new archive variable:
+Define PRODUCTION_ARCHIVE ${ARCHIVE_IP}
138 143
139 144
<Macro ArchiveRewrite>
140 145
Use Rewrite ${PRODUCTION_ARCHIVE} 8888
141 146
</Macro>
142 147
```
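
The manual edit described above can also be sketched as a small shell helper. This is only an illustration, not a script from the repository: the function name ``promote_candidate`` and its two parameters (path to the macros file, private IP of the new archive) are assumptions, and it expects exactly one active ``Define ARCHIVE_IP`` line in the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: rewrites the three Define
# lines so that the new archive becomes primary and the old primary becomes
# the failover.
promote_candidate() {
  local macros_file=$1 new_ip=$2 old_ip tmp

  # Read the IP of the currently active (uncommented) primary definition.
  old_ip=$(sed -n 's/^Define ARCHIVE_IP \([0-9.]*\).*/\1/p' "$macros_file")
  if [ -z "$old_ip" ]; then
    echo "no active ARCHIVE_IP definition found in $macros_file" >&2
    return 1
  fi

  tmp=$(mktemp) || return 1
  awk -v new="$new_ip" -v old="$old_ip" '
    # Comment out the old primary and add the new one right after it.
    /^Define ARCHIVE_IP /          { print "#" $0; print "Define ARCHIVE_IP " new; next }
    # The old primary becomes the failover.
    /^Define ARCHIVE_FAILOVER_IP / { print "Define ARCHIVE_FAILOVER_IP " old; next }
    # Production traffic goes to the primary variable.
    /^Define PRODUCTION_ARCHIVE /  { print "Define PRODUCTION_ARCHIVE ${ARCHIVE_IP}"; next }
    { print }
  ' "$macros_file" > "$tmp" || { rm -f "$tmp"; return 1; }

  # Overwrite in place so that ownership and permissions of the file are kept.
  cat "$tmp" > "$macros_file"
  rm -f "$tmp"
}
```

A call such as ``promote_candidate /etc/httpd/conf.d/000-macros.conf 172.31.7.12`` would then be followed by the reload. Running ``apachectl configtest`` before reloading is a cheap way to catch a syntax error in the edited file.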
143
-When a new failover is setup, its IP must replace the ARCHIVE_FAILOVER_IP (a manual operation).
144
-The update script can be found in the git at **switchoverArchive.sh**. This script has 1 parameter which is the path to the macros file, which contains the above macros (currently in /etc/httpd/conf.d/000-macros.conf). Run ```crontab -e``` to edit the cronjobs and add
145
-```
146
-* * * * * /home/wiki/gitwiki/configuration/switchoverArchive.sh "/etc/httpd/conf.d/000-macros.conf"
147
-```
148
-Then save and exit the editor.
149
-Check that the new archive service is now active, e.g., by looking at [sapsailing.com/gwt/status](https://sapsailing.com/gwt/status). It should reflect the new release in its ``release`` field.
150
-
151
-## Tests
152 148
153
-1. Healthy -> Stay healthy
154
-2. Healthy -> Unhealthy
155
-3. Unhealthy -> Stay unhealthy
156
-4. Unhealthy -> Become healthy
157
-5. Multiple cycles
158
-6. Different order combinations: eg. 1,2,3,4; 2,4,1,2,3
149
+Then save and exit the editor, and run `systemctl reload httpd`.
150
+Check that the new archive service is now active, e.g., by looking at [sapsailing.com/gwt/status](https://sapsailing.com/gwt/status). It should reflect the new release in its ``release`` field.
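
The status check can also be done from a shell. The sketch below assumes that the status page returns JSON-like text containing an entry of the form ``"release": "<value>"``; that format, and the helper name ``extract_release``, are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the release entry of a status document read from stdin.
# Assumes the document contains text of the form "release": "<value>".
extract_release() {
  grep -o '"release" *: *"[^"]*"'
}

# Usage against the live site (URL taken from the text above):
#   curl --silent --fail https://sapsailing.com/gwt/status | extract_release
```

If the helper prints nothing, inspect the raw output of the ``curl`` call, since the status format may differ from what is assumed here.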
159 151
160 152
### Clean up EC2 Names and Instances
161 153
162 154
Next, you should terminate the previous fail-over archive server instance, and you need to adjust the ``Name`` tags in the EC2 console of the old primary to show that it's now the fail-over, and for the candidate to show that it's now the primary. Select the old fail-over instance and terminate it. Then change the name tag of "SL Archive" to "SL Archive (Failover)", then change that of "SL Archive (New Candidate)" to "SL Archive", and you're done for now....
163 155
164
-If you establish that the old primary will not recover you must setup a new failover and reconfigure the httpd and then run ```systemctl reload httpd ```, which won't drop any connections.
... ...
\ No newline at end of file
0
+If you need to upgrade this old failover then you can repeat the whole process.
1
+
2
+
3
+### How we automated the failover of the reverse proxy
4
+
5
+We set up a script that is installed as a cronjob on the reverse proxy. It runs multiple curl checks against `/gwt/status` of the primary. If a healthy status code is returned, no change is made;
6
+if multiple unhealthy status codes are returned, the PRODUCTION_ARCHIVE definition (found at the top of the macros file) is altered to point to the failover definition. Then httpd is reloaded
7
+and various users are notified by email. If the primary returns to healthy, the definition is changed back to point to the main archive: `${ARCHIVE_IP}`.
8
+Note that we only edit the file, reload, or send emails if the "new" status differs from what the macros file already shows.
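
The decision logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual cron script. The function names, the number of probes (3), the rule that the primary counts as unhealthy only if every probe fails, and the assumption that `/gwt/status` is reachable on port 8888 of the primary's private IP are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the failover decision; all names are illustrative.

# Print the variable PRODUCTION_ARCHIVE currently points to, e.g. ARCHIVE_IP.
current_target() {
  sed -n 's/^Define PRODUCTION_ARCHIVE \${\([A-Z_]*\)}.*/\1/p' "$1"
}

# Point PRODUCTION_ARCHIVE at the given variable. Returns 1 without touching
# the file if it already points there, so callers can skip reload and email.
set_target() {
  local macros_file=$1 wanted=$2 tmp
  if [ "$(current_target "$macros_file")" = "$wanted" ]; then
    return 1
  fi
  tmp=$(mktemp) || return 2
  sed "s/^Define PRODUCTION_ARCHIVE .*/Define PRODUCTION_ARCHIVE \${$wanted}/" \
    "$macros_file" > "$tmp" && cat "$tmp" > "$macros_file"
  rm -f "$tmp"
}

# Probe the primary several times; healthy if at least one probe succeeds.
# Port 8888 and the threshold are assumptions.
primary_is_healthy() {
  local ip=$1 attempts=${2:-3} failures=0 i=0
  while [ "$i" -lt "$attempts" ]; do
    curl --silent --fail --max-time 5 --output /dev/null \
      "http://$ip:8888/gwt/status" || failures=$((failures + 1))
    i=$((i + 1))
  done
  [ "$failures" -lt "$attempts" ]
}

main() {
  local macros_file=$1 primary_ip wanted
  primary_ip=$(sed -n 's/^Define ARCHIVE_IP \([0-9.]*\).*/\1/p' "$macros_file")
  if primary_is_healthy "$primary_ip"; then
    wanted=ARCHIVE_IP
  else
    wanted=ARCHIVE_FAILOVER_IP
  fi
  # Only reload (and notify) when the target actually changed.
  if set_target "$macros_file" "$wanted"; then
    systemctl reload httpd
    # The email notification step is omitted in this sketch.
  fi
}

if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

Because ``set_target`` reports whether it changed anything, a run in which the status is unchanged leaves the file, the running httpd, and the operators' inboxes alone, which matches the note above.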
... ...
\ No newline at end of file