Thoughts on Landscape Configuration for Paris 2024 / Marseille
As a baseline we'll use the Olympic set-up from Tokyo 2020. The major change, though, is that instead of running a local on-site master and a local on-site replica we will run two master instances locally on site, where one is the "shadow" and the other one is the "production" master.
We captured a set of scripts and configuration files in our Git repository at configuration/on-site-scripts, in particular also separately for the two laptops, in configuration/on-site-scripts/sap-p1-1 and configuration/on-site-scripts/sap-p1-2.
Many of these scripts and configuration files contain an explicit reference to the replica set name (and therefore sub-domain name, DB name, tag values, etc.) tokyo2020. With the test event coming up in July 2023 and the Paris Olympic Summer Games in 2024, we should consider making this a parameter of these scripts so it is easy to adjust. We will need different sub-domains for the test event and the Games, where the latter will most likely use paris2024.sapsailing.com as its domain name and hence paris2024 as the replica set name.
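A minimal sketch of how such a parameterization could look, assuming the scripts are plain Bash; the file name config.env and the variable name REPLICA_SET_NAME are illustrative placeholders, not existing names:

# config.env - single place to define the event/replica set name (hypothetical file)
REPLICA_SET_NAME=paris2024

# in each script, instead of hard-coding tokyo2020, source the shared definition:
. "$(dirname "$0")/config.env"
SERVER_NAME="${REPLICA_SET_NAME}"
DOMAIN="${REPLICA_SET_NAME}.sapsailing.com"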
VPCs and VPC Peering
From Tokyo2020 we still have the VPCs around in five regions (eu-west-3, us-west-1, us-east-1, ap-northeast-1, and ap-southeast-2). They were named Tokyo2020, and our scripts currently depend on this. But VPCs can easily be renamed, and with that we may save a lot of work regarding re-peering those VPCs. We will, though, need routes to the new "primary" VPC eu-west-3 from everywhere because the paris-ssh.sapsailing.com jump host will be based there. Note the inconsistency in capitalization: for the VPC name and as part of instance names such as SL Tokyo2020 (Upgrade Replica) we use Tokyo2020; for basically everything else it's tokyo2020 (lowercase). When switching to a parameterized approach we should probably harmonize this and use the lowercase name consistently throughout.
I've started with re-naming the VPCs and their routing tables from Tokyo2020 to Paris2024. I've also added VPC peering between Paris (eu-west-3) and California (us-west-1), Virginia (us-east-1), and Sydney (ap-southeast-2). The peering between Paris and Tokyo (ap-northeast-1) already existed because for Tokyo 2020, Paris hosted replicas that needed to access the jump host in the Tokyo region.
I've also copied the "SAP Sailing Analytics 1.150" image to all five regions.
Master and Shadow Master
We will use one laptop as production master, the other as "shadow master." The reason for not using a master and a local replica is that if the local master fails, re-starting later in the event can cause significant delays until all races have loaded and replicated again.
Both laptops shall run their local RabbitMQ instance. Each of the two master processes writes into RabbitMQ through an SSH tunnel that can point either to the local RabbitMQ or, while an active Internet/cloud connection is available, to the cloud-based RabbitMQ.
This will require setting up two MongoDB databases (not separate processes, just different DB names), e.g., "paris2024" and "paris2024-shadow". Note that for the shadow master this means the DB name does not follow the typical naming convention in which the SERVER_NAME property ("paris2024" for both the primary and the shadow master) is also used as the default MongoDB database name.
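For illustration, the shadow master's configuration could then spell out the deviating database name explicitly in its MongoDB URI; the URI below merely mirrors the replica-set addressing described further down and is a sketch, not the actual property file:

SERVER_NAME=paris2024
MONGODB_URI="mongodb://localhost:10201,localhost:10202,localhost:10203/paris2024-shadow?replicaSet=paris2024&retryWrites=true&readPreference=nearest"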
Note: The shadow master must have at least one registered replica because otherwise it would not send any operations into the RabbitMQ replication channel. This is a challenge for a shadow master that has never seen any replica. We could, for example, simulate a replica registration while the shadow master is still basically empty, using, e.g., a CURL request, and then ignore and later delete the initial load queue on the local RabbitMQ.
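A rough sketch of that work-around; the registration servlet path and its parameters are placeholders that need to be taken from the actual replication API, and deleting the queue assumes a rabbitmqctl version that offers delete_queue:

# 1. simulate a replica registration against the shadow master (placeholder path and token!)
curl -H "Authorization: Bearer ***" "http://localhost:8888/<replica-registration-servlet-path>"
# 2. inspect the local RabbitMQ and drop the initial load queue the registration created
rabbitmqctl list_queues name messages
rabbitmqctl delete_queue <name-of-initial-load-queue>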
Furthermore, while it is not in production itself, the shadow master must not send into the production RabbitMQ replication channel used by the production master instance, because that would duplicate the operations sent. Instead, the shadow master shall use a local RabbitMQ instance to which an SSH tunnel forwards.
We will install a cron job that regularly performs a "compareServers" between production and shadow master. Any deviation shall be reported using the e-mail notification mechanism already in place for all other alerts and monitoring activities.
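One possible shape of that cron job; the schedule, script name, and location are placeholders, and the wrapper script is expected to run compareServers against both instances and to e-mail any reported deviation through the existing notification mechanism:

# /etc/cron.d/compare-paris2024-masters (illustrative)
# compare production and shadow master every 15 minutes
*/15 * * * * sailing /home/sailing/bin/compare-paris2024-masters.sh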
Cloud RabbitMQ
We will use rabbit-eu-west-3.sapsailing.com pointing to the internal IP address of the RabbitMQ installation in eu-west-3 that is used as the default for the on-site master processes as well as for all cloud replicas.
ALB and Target Group Set-Up
Like for Tokyo2020, a separate ALB for the Paris2024 event will be set up in each of the regions supported. They will all be registered with the Global Accelerator to whose anycast IP addresses the DNS alias record for paris2024.sapsailing.com will point. Different from Tokyo2020, where we used a static "404 - Not Found" rule as the default rule for all of these ALBs, we can and should use an IP-based target group for the default rule's forwarding and should register the eu-west-1 "Webserver" (Central Reverse Proxy)'s internal IP address in these target groups. This way, when archiving the event, cached DNS records can still resolve to the Global Accelerator, from there to the ALB(s), and from there, via these default rules, back to the central reverse proxy, which then should know where to find the paris2024.sapsailing.com content in the archive.
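A hedged AWS CLI sketch for one region's default-rule target group; the VPC ID, target group name, protocol/port, and the reverse proxy's internal IP address are placeholders to be replaced with the real values:

# create an IP-type target group in the regional VPC (names/IDs are placeholders)
aws elbv2 create-target-group --region eu-west-3 --name paris2024-default-reverse-proxy \
  --protocol HTTP --port 80 --target-type ip --vpc-id vpc-xxxxxxxx
# register the eu-west-1 central reverse proxy's internal IP; IP targets outside the
# target group's VPC must be registered with AvailabilityZone=all
aws elbv2 register-targets --region eu-west-3 \
  --target-group-arn <arn-of-target-group-created-above> \
  --targets Id=<internal-ip-of-eu-west-1-reverse-proxy>,AvailabilityZone=all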
Target group naming conventions have changed slightly since Tokyo2020: instead of S-ded-tokyo2020 we will use only S-paris2024 for the public target group containing all the cloud replicas.
Cloud Replica Set-Up
Based on the cloud replica set-up for Tokyo2020 we can derive the following user data for Paris2024 cloud replicas:
INSTALL_FROM_RELEASE=build-.............
SERVER_NAME=paris2024
MONGODB_URI="mongodb://localhost/paris2024-replica?replicaSet=replica&retryWrites=true&readPreference=nearest"
USE_ENVIRONMENT=live-replica-server
REPLICATION_CHANNEL=paris2024-replica
REPLICATION_HOST=rabbit-eu-west-3.sapsailing.com
REPLICATE_MASTER_SERVLET_HOST=paris-ssh.internal.sapsailing.com
REPLICATE_MASTER_SERVLET_PORT=8888
REPLICATE_MASTER_EXCHANGE_NAME=paris2024
REPLICATE_MASTER_QUEUE_HOST=rabbit-eu-west-3.sapsailing.com
REPLICATE_MASTER_BEARER_TOKEN="***"
Make sure to align the INSTALL_FROM_RELEASE parameter with the release used on site.
SSH Tunnels
The baseline is again the Tokyo 2020 set-up, apart from the jump host's renaming from tokyo-ssh.sapsailing.com to paris-ssh.sapsailing.com. The tunnel scripts for sap-p1-2 that assume sap-p1-2 is the (primary) master seem to be faulty: at least, they don't establish a reverse port forward for port 8888, which is necessary to let cloud replicas reach the on-site master. sap-p1-2 becoming (primary) on-site master means that sap-p1-1 has failed. This can be a problem with the application process but could even be a hardware issue where the entire machine has crashed and become unavailable. In that case sap-p1-2 must take over at least the application and become primary master, and this requires a reverse port forward like this: -R '*:8888:localhost:8888'
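For illustration, the reverse forward would be part of an SSH invocation against the jump host roughly like the following; the key file and most of the -L forwards from the port list below are omitted, and binding the remote side to '*' requires GatewayPorts to be enabled in the jump host's sshd configuration:

# keep-alive and fail-fast options help the tunnel recover from flaky on-site connectivity
ssh -N -o ServerAliveInterval=30 -o ExitOnForwardFailure=yes \
    -R '*:8888:localhost:8888' \
    -L 5673:rabbit-eu-west-3.sapsailing.com:5672 \
    ec2-user@paris-ssh.sapsailing.com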
The ports and their semantics:
- 443: HTTPS port of security-service.sapsailing.com (or its local replacement through NGINX)
- 5673: Outbound RabbitMQ to be used by the on-site master (regularly the RabbitMQ in eu-west-3, local replacement as fallback)
- 5675: Inbound RabbitMQ (rabbit.internal.sapsailing.com) for replication from security-service.sapsailing.com (or local replacement)
- 9443: NGINX HTTP port on sap-p1-1 (also reverse-forwarded from paris-ssh.sapsailing.com)
- 9444: NGINX HTTP port on sap-p1-2 (also reverse-forwarded from paris-ssh.sapsailing.com)
- 10201: MongoDB on sap-p1-1
- 10202: MongoDB on sap-p1-2
- 10203: MongoDB on paris-ssh.sapsailing.com
- 15673: HTTP to RabbitMQ administration UI of the RabbitMQ server reached on port 5673
- 15675: HTTP to RabbitMQ administration UI of the RabbitMQ server reached on port 5675
- 22222: SSH access to sapsailing.com:22, e.g., for Git access through ssh://trac@localhost:22222/home/trac/git
- 22443: HTTPS access to sapsailing.com:443, e.g., for trying to download a release, although chances are slim this works without local /etc/hosts magic, e.g., for releases.sapsailing.com
/etc/hosts must map security-service.sapsailing.com to localhost so that local port 443 can be forwarded to different targets based on needs.
Regular Operations
- Three MongoDB nodes form the paris2024 replica set: sap-p1-1:10201, sap-p1-2:10202, and paris-ssh.sapsailing.com:10203, where SSH tunnels forward ports 10201..10203 such that everywhere on the three hosts involved the replica set can be addressed as mongodb://localhost:10201,localhost:10202,localhost:10203/?replicaSet=paris2024&retryWrites=true&readPreference=nearest
- sap-p1-1 runs the paris2024 production master from /home/sailing/servers/paris2024 against local database paris2024:paris2024, replicating from security-service.sapsailing.com through an SSH tunnel from local port 443 pointing to security-service.sapsailing.com (which actually forwards to the ALB hosting the rules for security-service.sapsailing.com) and RabbitMQ rabbit.internal.sapsailing.com tunneled through port 5675, with the RabbitMQ admin UI tunneled through port 15675; outbound replication goes to local port 5673 which tunnels to rabbit-eu-west-3.sapsailing.com, whose admin UI is reached through port 15673 which tunnels to rabbit-eu-west-3.sapsailing.com:15672
- sap-p1-2 runs the paris2024 shadow master from /home/sailing/servers/paris2024 against local database paris2024:paris2024-shadow, replicating from security-service.sapsailing.com through an SSH tunnel from local port 443 pointing to security-service.sapsailing.com (which actually forwards to the ALB hosting the rules for security-service.sapsailing.com) and RabbitMQ rabbit.internal.sapsailing.com tunneled through port 5675, with the RabbitMQ admin UI tunneled through port 15675; outbound replication goes to local port 5673 which tunnels to the RabbitMQ running locally on sap-p1-2, port 5672, whose admin UI is then reached through port 15673 which tunnels to sap-p1-2:15672
- The database mongodb://mongo0.internal.sapsailing.com,mongo1.internal.sapsailing.com/security_service?replicaSet=live is backed up on a regular basis (nightly) to the local MongoDB replica set paris2024, DB name security_service, which makes it visible especially in the two MongoDB replicas running on sap-p1-1 and sap-p1-2 (see the sketch after this list)
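A hedged sketch of what such a nightly copy could look like with standard MongoDB tooling; the actual backup job may be implemented differently, and the dump directory is a placeholder:

# dump the security_service DB from the live cloud replica set...
mongodump --uri "mongodb://mongo0.internal.sapsailing.com,mongo1.internal.sapsailing.com/security_service?replicaSet=live" --out /tmp/security_service-dump
# ...and restore it into the local paris2024 replica set, dropping the previous copy
mongorestore --uri "mongodb://localhost:10201,localhost:10202,localhost:10203/?replicaSet=paris2024" --nsInclude "security_service.*" --drop /tmp/security_service-dump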
Production Master Failure
Situation: production master fails, e.g., because of a Java VM crash or a deadlock or user issues such as killing the wrong process…
Approach: Switch to the previous shadow master on sap-p1-2, re-configuring all SSH tunnels accordingly; this includes the 8888 reverse forward from the cloud to the local on-site master, as well as the RabbitMQ forward, which needs to switch from the local RabbitMQ running on the shadow master's host to the cloud-based RabbitMQ. Clients such as SwissTiming clients need to switch to the shadow master. To remedy gaps in replication due to the SSH tunnel switch we may want to circulate the replica instances, rolling over to a new set of replicas that fetch a new initial load. If sap-p1-1's operating system is still alive, its SSH tunnel, especially the port 8888 reverse forward from paris-ssh.sapsailing.com, must be terminated because otherwise sap-p1-2 may not be able to establish its own reverse forward of port 8888 (see the sketch below).
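If sap-p1-1 is unreachable but its forward is still registered on the jump host, the stale listener can be found and terminated on paris-ssh.sapsailing.com roughly like this (a sketch; double-check the PID so as not to drop other tunnels):

# find the sshd process still holding the 8888 listener
sudo ss -tlnp 'sport = :8888'
# terminate that sshd session using the PID from the output above to free the port
sudo kill <pid-of-stale-sshd-session>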
Here are the major changes:
- sap-p1-2 runs the paris2024 former shadow master, now acting as production master, from /home/sailing/servers/paris2024 against local database paris2024:paris2024-shadow, replicating from security-service.sapsailing.com through an SSH tunnel from local port 443 pointing to security-service.sapsailing.com (which actually forwards to the ALB hosting the rules for security-service.sapsailing.com) and RabbitMQ rabbit.internal.sapsailing.com tunneled through port 5675, with the RabbitMQ admin UI tunneled through port 15675; outbound replication goes to local port 5673 which tunnels to rabbit-eu-west-3.sapsailing.com, whose admin UI is reached through port 15673 which tunnels to rabbit-eu-west-3.sapsailing.com:15672
Internet Failure
While cloud replicas and hence the ALBs and Global Accelerator will remain reachable with the latest data snapshot at the time the connection is lost, we will then lose the following capabilities:
- replicate the official security-service.sapsailing.com service, both from an HTTP as well as a RabbitMQ perspective; rabbit.internal.sapsailing.com will then no longer be reachable from the on-site network
- keep the cloud MongoDB instance on paris-ssh.sapsailing.com synchronized; it will fall behind
- outbound replication to rabbit-eu-west-3.sapsailing.com and from there on to the cloud replicas in all regions supported will stop
- inbound "reverse" replication from the cloud replicas to the on-site master through the reverse forward of paris-ssh.sapsailing.com:8888 will stop working; the cloud replicas will start buffering the operations to send to their master and will keep re-trying in growing time intervals
To recover with as little disruption as possible, we need to switch to a local copy of the security-service and to a local RabbitMQ for "outbound" replication. Of course, no replicas will be listening on that local RabbitMQ, but in order to keep working, the application server needs a RabbitMQ reachable on the outbound port 5673. This is achieved by switching the SSH tunnel such that port 5673 then forwards to a RabbitMQ running locally.
We will then start sap-p1-1:/home/sailing/servers/security_service on port 8889, which will connect to the local MongoDB replica set still consisting of the two on-site nodes, using the database security_service that has been obtained as a copy of the live MongoDB replica set in our default region. This local security service uses the local RabbitMQ running on the same host for its outbound replication. On both on-site laptops, port 443 then needs to forward to the NGINX instance running locally as a reverse proxy for the local security service. On sap-p1-1 this is port 9443, on sap-p1-2 this is port 9444. Furthermore, the port forwards for ports 5675 and 15675 on both laptops must then point to the local RabbitMQ used outbound by the security service running locally. This will usually be the RabbitMQ running on sap-p1-1, so sap-p1-1:5672, or sap-p1-1:15672, respectively, for the admin port.
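For illustration, sap-p1-1's re-pointed forwards could then look roughly as follows, assuming the local forwards are likewise kept as SSH tunnels (here simply against localhost; the user name sailing is a placeholder, and the real tunnel scripts live under configuration/on-site-scripts):

# On sap-p1-1 with the Internet connection down: 443 -> local NGINX (9443) fronting the
# local security service on 8889; 5673/15673 and 5675/15675 -> the local RabbitMQ (5672)
# and its admin UI (15672). Binding port 443 requires root privileges.
ssh -N -o ServerAliveInterval=30 \
    -L 443:localhost:9443 \
    -L 5673:localhost:5672 -L 15673:localhost:15672 \
    -L 5675:localhost:5672 -L 15675:localhost:15672 \
    sailing@localhost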
This makes for the following set-up:
- Only two MongoDB nodes remain available on site from the paris2024 replica set: sap-p1-1:10201 and sap-p1-2:10202, where SSH tunnels forward ports 10201..10203 such that everywhere on the three hosts involved the replica set can be addressed as mongodb://localhost:10201,localhost:10202,localhost:10203/?replicaSet=paris2024&retryWrites=true&readPreference=nearest
- sap-p1-1 runs the paris2024 production master from /home/sailing/servers/paris2024 against local database paris2024:paris2024, replicating from security-service.sapsailing.com through an SSH tunnel from local port 443 pointing to sap-p1-1:9443, which is the port of the local NGINX acting as an SSL-offloading reverse proxy for the security service running locally on port 8889; port 5675 forwards to sap-p1-1:5672 where the local RabbitMQ runs, with the local sap-p1-1 RabbitMQ admin UI tunneled through port 15675; outbound replication goes to local port 5673 which then also tunnels to the local RabbitMQ on sap-p1-1:5672, whose admin UI is reached through port 15673 which tunnels to sap-p1-1:15672
- sap-p1-2 runs the paris2024 shadow master from /home/sailing/servers/paris2024 against local database paris2024:paris2024-shadow, replicating from security-service.sapsailing.com through an SSH tunnel from local port 443 pointing to sap-p1-1:9443, which is the reverse proxy for the security service running on sap-p1-1:8889, and RabbitMQ tunneled through port 5675 to sap-p1-1:5672, with the RabbitMQ admin UI tunneled through port 15675 to sap-p1-1:15672; outbound replication still goes to local port 5673 which tunnels to the RabbitMQ running locally on sap-p1-2, port 5672, whose admin UI is then reached through port 15673 which tunnels to sap-p1-2:15672, which keeps the shadow master's outbound replication from interfering with the production master's outbound replication.
Internet Failure Using Shadow Master
TODO
Checklist After Event
The experience during "Tokyo 2020" has shown that after the last race of the last day everybody gets in a rush, and the on-site infrastructure starts to get dismantled quickly. For us this means that we need to prepare well for switching to cloud-only operations. The approach in Enoshima worked well, although we were caught a bit by surprise regarding the speed at which infrastructure was taken down.
Cleanly Remove On-Site MongoDB Replicas from paris2024 MongoDB Replica Set
Connecting to the paris2024 MongoDB replica set, we first need to make sure that the cloud replica can become primary. The production configuration was such that, by assigning a priority and votes of 0, the cloud replica would never become primary. Now it shall, so we need to change its priority and votes values in the configuration first. For this, issue the following command in the MongoDB shell while connected to the paris2024 replica set:
cfg=rs.config()
Then find the member using port number 10203 which is the cloud replica. Typically, this would be the first element (index 0) in the members array of the cfg object. Assuming it is at index 0, issue the following commands (replacing the 0 index by the actual index of the 10203 port member):
cfg.members[0].priority=1
cfg.members[0].votes=1
rs.reconfig(cfg)
rs.remove("localhost:10201")
rs.remove("localhost:10202")
This will make the MongoDB cloud replica running on paris-ssh.sapsailing.com the single primary of the now single-element replica set. The MongoDB processes running on the on-site laptops can then be stopped.
Stop Replication in Cloud Replicas
Then, all cloud replicas need to stop replicating because soon the on-site master will be stopped. See script configuration/on-site-scripts/paris2024/stop-all-cloud-replicas.sh.
Stop On-Site Master and Launch Cloud Master on paris-ssh.sapsailing.com
Next, an application master for the paris2024 application replica set needs to be launched on paris-ssh.sapsailing.com. It uses the MongoDB URI mongodb://localhost:10203/paris2024?replicaSet=paris2024&retryWrites=true&readPreference=nearest, hence connecting to the single-instance MongoDB "replica set" running on the same host. Other than this the instance uses a standard configuration for a live master. This configuration can already be prepared before the event. All that then needs to be done is to adjust the release to the one that all cloud replicas are using.
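A hedged sketch of the corresponding user data, derived from the replica user data above; the USE_ENVIRONMENT value is deliberately left as a placeholder for whatever the standard live-master environment is called, and the release must match the one used by the cloud replicas:

INSTALL_FROM_RELEASE=build-.............
SERVER_NAME=paris2024
MONGODB_URI="mongodb://localhost:10203/paris2024?replicaSet=paris2024&retryWrites=true&readPreference=nearest"
USE_ENVIRONMENT=<standard-live-master-environment>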
Test Plan for Test Event Marseille July 2023
Test Internet Failure
We shall emulate the lack of a working Internet connection and practice and test the procedures for switching to a local security-service.sapsailing.com installation as well as a local RabbitMQ standing in for the RabbitMQ deployed in the cloud.
Test Primary Master Hardware Failure
This will require switching entirely to the shadow master. Depending on the state of the reverse port forward of the 8888 HTTP port from the cloud we may or may not have to try to terminate a hanging connection in order to be able to establish a new reverse port forward pointing from the cloud to the shadow master. The shadow master also then needs to use the cloud-based RabbitMQ instead of its local one. As a fine-tuning, we can practice the rolling re-sync of all cloud replicas which will likely have missed operations in the meantime.
Test Primary Master Java VM Failure
This can be caused by a deadlock, VM crash, Full GC phase, massive performance degradation, or other faulty behavior. We then need to actively close the reverse SSH port forward from the cloud to the production master's 8888 HTTP port and, as a precaution, switch the RabbitMQ tunnel from the cloud-based to the local RabbitMQ instance so that, in case the production master "wakes up" again, e.g., after a Full GC, it does not start to interfere with the now active shadow master on the RabbitMQ fan-out exchange. On the shadow master we need to re-configure the SSH tunnels, particularly to target the cloud-based RabbitMQ and to have the reverse port forward on port 8888 now target the shadow master on site.
Test Primary Master Failures with no Internet Connection
Combine the above scenarios: a failing production master (hardware or VM-only) will require different tunnel re-configurations, especially regarding the then local security-service.sapsailing.com environment which may need to move to the shadow laptop.
TODO Before / During On-Site Set-Up (Both, Test Event and OSG2024)
- Set up Global Accelerator and have the already established DNS record paris2024.sapsailing.com (a placeholder that points to the Dynamic ALB in the default region eu-west-1 to effectively forward to the central reverse proxy and ultimately the archive server's landing page) become an alias pointing to this Global Accelerator
- Set up logging buckets for ALBs in all supported regions
- Set up ALBs in all supported regions, define their three rules (redirect for paris2024.sapsailing.com/path; forward to the public target group for all other paris2024.sapsailing.com traffic; default rule forwarding to the IP-based target group containing the eu-west-1 central reverse proxy) and register them with the Global Accelerator
- Add SSH public keys for password-less private keys of sap-p1-1 and sap-p1-2 to ec2-user@paris-ssh.sapsailing.com:.ssh/authorized_keys.org so that when the authorized_keys file is updated automatically, the on-site keys are still preserved
- Create LetsEncrypt certificates for the NGINX installations for paris2024.sapsailing.com and security-service.sapsailing.com and install them to the two on-site laptops' NGINX environments (see the sketch after this list)
- Ensure the MongoDB installations on both laptops use the paris2024 replica set
- Adjust Athena queries to include all ALB logging buckets from all regions
Other TODOs, Open Questions
- master set-up on sap-p1-2 must be configured in "failover" mode by default; this means it sends to the local RabbitMQ and the security_service MongoDB replica set that does not replicate into the cloud, to keep traffic on the SSH tunnel to the cloud as low as possible
- Create a "primary master" configuration on sap-p1-2 in case sap-p1-1 fails and we need to switch to sap-p1-2 for a longer time; in that case we would like to have DB replication into the cloud, so use the localhost:[10201 10202 10203] "paris2024" MongoDB replica set and send to the RabbitMQ in the cloud (rabbit-eu-west-3); these failover scenarios should be manageable by corresponding scripts
- Shall we obtain the tunnel scripts via symbolic links from the respective Git repo at /home/sailing/code, there then under configuration/on-site-scripts/paris2024/sap-p1-[12]? Currently, they are copies that evolve independently from the Git repo.
Replacing an Access Token Accidentally Revoked in security-service DB
SSH into the security-service instance, then run
mongo "mongodb://dbserver.internal.sapsailing.com:10203/security_service?replicaSet=live"
> db.PREFERENCES.update({"USERNAME": "username", "KEYS_AND_VALUES.KEY": "___access_token___"}, { $set: { 'KEYS_AND_VALUES.$': { "KEY" : "___access_token___", "VALUE" : "asfdasdfasdfasfdasfdasdfsadfasfdasdfdsaf=" } } })
> quit()