Thoughts on Landscape Configuration for Paris 2024 / Marseille

As a baseline we'll use the Olympic Setup. The major change is that instead of running a local on-site master and a local on-site replica, we will run two master instances locally on site, one being the "shadow" master and the other the "production" master.

We captured a set of scripts and configuration files in our Git repository at configuration/on-site-scripts, in particular separate sets for the two laptops in configuration/on-site-scripts/sap-p1-1 and configuration/on-site-scripts/sap-p1-2.

Many of these scripts and configuration files contain an explicit reference to the replica set name (and therefore the sub-domain name, DB name, tag values, etc.) tokyo2020. With the test event coming up in July 2023 and the Paris Olympic Summer Games 2024 we should consider making this a parameter of these scripts so it is easy to adjust. We will need different sub-domains for the test event and the Games, where the latter will most likely use paris2024.sapsailing.com as its domain name and hence paris2024 as the replica set name.
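
A minimal sketch of such a parameterization, assuming a shared environment file that the on-site scripts source (the file name and variable names are made up for illustration; the real scripts in configuration/on-site-scripts remain authoritative):

# configuration/on-site-scripts/event.env (hypothetical file): single place to define the event identifier
EVENT_NAME=paris2024
EVENT_DOMAIN="${EVENT_NAME}.sapsailing.com"
REPLICA_SET_NAME="${EVENT_NAME}"

# In each script, instead of hard-coding tokyo2020:
source "$(dirname "$0")/../event.env"
echo "Configuring for ${EVENT_DOMAIN} (replica set ${REPLICA_SET_NAME})"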

VPCs and VPC Peering

From Tokyo 2020 we still have the VPCs in five regions (eu-west-3, us-west-1, us-east-1, ap-northeast-1, and ap-southeast-2). They were named Tokyo2020, and our scripts currently depend on this. Since VPCs can easily be renamed, re-using them may save us a lot of re-peering work. We will, though, need routes to the new "primary" VPC in eu-west-3 from everywhere because the paris-ssh.sapsailing.com jump host will be based there. Note the inconsistency in capitalization: for the VPC name and as part of instance names such as SL Tokyo2020 (Upgrade Replica) we use Tokyo2020; for basically everything else it's tokyo2020 (lowercase). When switching to a parameterized approach we should harmonize this and use the lowercase name consistently throughout.

I've started with re-naming the VPCs and their routing tables from Tokyo2020 to Paris2024. I've also added VPC peering between Paris (eu-west-3) and California (us-west-1), Virginia (us-east-1), and Sydney (ap-southeast-2). The peering between Paris and Tokyo (ap-northeast-1) already existed because for Tokyo 2020, Paris hosted replicas that needed to access the jump host in the Tokyo region.
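
For reference, the renaming and one of the new peering connections could look roughly like this with the AWS CLI (all resource IDs and the CIDR are placeholders):

# Rename the VPC by overwriting its Name tag
aws ec2 create-tags --region eu-west-3 --resources vpc-0123456789abcdef0 \
  --tags Key=Name,Value=Paris2024

# Request a peering connection from Paris to California and accept it there
aws ec2 create-vpc-peering-connection --region eu-west-3 \
  --vpc-id vpc-0123456789abcdef0 --peer-vpc-id vpc-0fedcba9876543210 --peer-region us-west-1
aws ec2 accept-vpc-peering-connection --region us-west-1 \
  --vpc-peering-connection-id pcx-0123456789abcdef0

# Route from the remote region to the Paris VPC via the peering connection
aws ec2 create-route --region us-west-1 --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.0.0.0/16 --vpc-peering-connection-id pcx-0123456789abcdef0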

I've also copied the "SAP Sailing Analytics 1.150" image to all five regions.
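
The image copy needs to be triggered once per target region, roughly like this (the image ID is a placeholder):

aws ec2 copy-image --region us-west-1 --source-region eu-west-3 \
  --source-image-id ami-0123456789abcdef0 --name "SAP Sailing Analytics 1.150"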

Master and Shadow Master

We will use one laptop as the production master, the other as the "shadow master." The reason for not using a master and a local replica is that if the local master fails, re-starting it later in the event can cause significant delays until all races have loaded and replicated again.

Both laptops shall run their local RabbitMQ instance. Each of the two master processes can write into its local RabbitMQ through an SSH tunnel, which may instead be redirected to the cloud-based RabbitMQ while an active Internet/cloud connection is available.

This will require setting up two MongoDB databases (not separate processes, just different DB names), e.g., "paris2024" and "paris2024-shadow". Note that for the shadow master this means the DB name does not follow the typical naming convention where the SERVER_NAME property ("paris2024" for both the primary and the shadow master) is also used as the default MongoDB database name.
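
Expressed in the same property style as the cloud replica user data further below, and assuming the on-site masters are configured through the same mechanism, the relevant difference between the two instances would look roughly like this (only the deviating properties are shown):

# sap-p1-1 (production master)
SERVER_NAME=paris2024
MONGODB_URI="mongodb://localhost:10201,localhost:10202,localhost:10203/paris2024?replicaSet=paris2024&retryWrites=true&readPreference=nearest"

# sap-p1-2 (shadow master): same SERVER_NAME, but a different database
SERVER_NAME=paris2024
MONGODB_URI="mongodb://localhost:10201,localhost:10202,localhost:10203/paris2024-shadow?replicaSet=paris2024&retryWrites=true&readPreference=nearest"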

Note: The shadow master must have at least one registered replica because otherwise it would not send any operations into the RabbitMQ replication channel. This can be a challenge for a shadow master that has never seen any replica. We could, for example, simulate a replica registration while the shadow master is still basically empty, e.g., using a curl request, and then ignore and later delete the initial load queue on the local RabbitMQ.
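
Very roughly, and with the servlet path and queue name being placeholders rather than the actual replication API, such a simulated registration against the still-empty shadow master and the subsequent clean-up could look like this:

# Placeholder path and queue name; consult the replication servlet's actual API before using this.
curl -X POST -H "Authorization: Bearer ***" \
  "http://localhost:8888/<replica-registration-endpoint>"

# Then drop the initial load queue this creates on the local RabbitMQ
# (recent RabbitMQ versions; the management UI on port 15673 works as well):
sudo rabbitmqctl delete_queue <initial-load-queue-name>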

Furthermore, while it is not in production itself, the shadow master must not send into the production RabbitMQ replication channel used by the production master instance, because that would duplicate the operations sent. Instead, the shadow master shall use a local RabbitMQ instance to which an SSH tunnel forwards.

Cloud RabbitMQ

Instead of rabbit-ap-northeast-1.sapsailing.com we will use rabbit-eu-west-3.sapsailing.com pointing to the internal IP address of the RabbitMQ installation in eu-west-3 that is used as the default for the on-site master processes as well as for all cloud replicas.
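
Assuming the sapsailing.com zone is hosted in Route 53, creating that record could look like this (hosted zone ID and internal IP are placeholders):

aws route53 change-resource-record-sets --hosted-zone-id Z0000000PLACEHOLDER \
  --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{
    "Name":"rabbit-eu-west-3.sapsailing.com","Type":"A","TTL":300,
    "ResourceRecords":[{"Value":"10.0.0.10"}]}}]}'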

ALB and Target Group Set-Up

Like for Tokyo2020, a separate ALB for the Paris2024 event will be set up in each of the supported regions. They will all be registered with the Global Accelerator to whose anycast IP addresses the DNS alias record for paris2024.sapsailing.com will point. Different from Tokyo2020, where we used a static "404 - Not Found" rule as the default rule for all of these ALBs, we can and should use an IP-based target group for the default rule's forwarding and should register the eu-west-1 "Webserver" (Central Reverse Proxy)'s internal IP address in these target groups. This way, when archiving the event, cached DNS records can still resolve to the Global Accelerator, from there to the ALB(s), and from there, via these default rules, back to the central reverse proxy, which then should know where to find the paris2024.sapsailing.com content in the archive.
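
A sketch of that default-rule set-up with the AWS CLI (target group name, VPC ID, ARNs, and the reverse proxy's internal IP are placeholders; protocol and port must match how the central reverse proxy is actually addressed, and AvailabilityZone=all is needed because the registered IP lives in a peered VPC outside the ALB's own VPC):

# IP-type target group in the event VPC, with the eu-west-1 central reverse proxy's internal IP registered
aws elbv2 create-target-group --region eu-west-3 --name paris2024-default-proxy \
  --protocol HTTP --port 80 --vpc-id vpc-0123456789abcdef0 --target-type ip
aws elbv2 register-targets --region eu-west-3 \
  --target-group-arn arn:aws:elasticloadbalancing:eu-west-3:111111111111:targetgroup/paris2024-default-proxy/0123456789abcdef \
  --targets Id=10.0.0.20,AvailabilityZone=all

# Make this target group the ALB listener's default action
aws elbv2 modify-listener --region eu-west-3 \
  --listener-arn arn:aws:elasticloadbalancing:eu-west-3:111111111111:listener/app/paris2024/0123456789abcdef/0123456789abcdef \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:eu-west-3:111111111111:targetgroup/paris2024-default-proxy/0123456789abcdef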

Target group naming conventions have changed slightly since Tokyo2020: instead of S-ded-tokyo2020 we will use only S-paris2024 for the public target group containing all the cloud replicas.

Cloud Replica Set-Up

Based on the cloud replica set-up for Tokyo2020 we can derive the following user data for Paris2024 cloud replicas:

INSTALL_FROM_RELEASE=build-.............
SERVER_NAME=paris2024
MONGODB_URI="mongodb://localhost/paris2024-replica?replicaSet=replica&retryWrites=true&readPreference=nearest"
USE_ENVIRONMENT=live-replica-server
REPLICATION_CHANNEL=paris2024-replica
REPLICATION_HOST=rabbit-eu-west-3.sapsailing.com
REPLICATE_MASTER_SERVLET_HOST=paris-ssh.internal.sapsailing.com
REPLICATE_MASTER_SERVLET_PORT=8888
REPLICATE_MASTER_EXCHANGE_NAME=paris2024
REPLICATE_MASTER_QUEUE_HOST=rabbit-eu-west-3.sapsailing.com
REPLICATE_MASTER_BEARER_TOKEN="***"
ADDITIONAL_JAVA_ARGS="${ADDITIONAL_JAVA_ARGS} -Dcom.sap.sse.debranding=true"

Make sure the INSTALL_FROM_RELEASE parameter matches the release used on site.

SSH Tunnels

The baseline is again the Tokyo 2020 set-up, apart from the jump host's renaming from tokyo-ssh.sapsailing.com to paris-ssh.sapsailing.com. The tunnel scripts for sap-p1-2 that assume sap-p1-2 is the (primary) master seem to be faulty: at least, they don't establish a reverse port forward for port 8888, which, however, seems necessary to let cloud replicas reach the on-site master. sap-p1-2 becoming (primary) on-site master means that sap-p1-1 has failed. This can be a problem with the application process, but could even be a hardware issue where the entire machine has crashed and become unavailable. Therefore, sap-p1-2 must take over at least the application and become primary master, and this requires the reverse port forward like this: -R '*:8888:localhost:8888' (see the combined tunnel sketch after the port list below).

The ports and their semantics:

  • 443: HTTPS port of security-service.sapsailing.com (or its local replacement through NGINX)
  • 5673: Outbound RabbitMQ to use by on-site master (or local replacement)
  • 5675: Inbound RabbitMQ for replication from security-service.sapsailing.com (or local replacement)
  • 9443: NGINX HTTP port on sap-p1-1 (also reverse-forwarded from paris-ssh.sapsailing.com)
  • 9444: NGINX HTTP port on sap-p1-2 (also reverse-forwarded from paris-ssh.sapsailing.com)
  • 10201: MongoDB on sap-p1-1
  • 10202: MongoDB on sap-p1-2
  • 10203: MongoDB on paris-ssh.sapsailing.com
  • 15673: HTTP to RabbitMQ administration UI of the RabbitMQ server reached on port 5673
  • 15675: HTTP to RabbitMQ administration UI of the RabbitMQ server reached on port 5675
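
Put together, and with the forward/reverse directions being my reading of the port map above rather than taken from the actual scripts (which remain authoritative in configuration/on-site-scripts), the tunnels for the active primary master could be established roughly like this; the 9443/9444 reverse forwards on the jump host and the cross-laptop MongoDB forwards for 10201/10202 are omitted, and binding local port 443 requires elevated privileges:

# Sketch only; run on the active master laptop.
# -R 8888: lets cloud replicas reach the on-site master's HTTP servlet
# -L 443/5675/15675: security-service and its replication RabbitMQ (plus admin UI)
# -L 5673/15673: outbound replication RabbitMQ in eu-west-3 (plus admin UI)
# -L 10203: MongoDB node running on the jump host
ssh -N \
  -R '*:8888:localhost:8888' \
  -L 443:security-service.sapsailing.com:443 \
  -L 5675:rabbit.internal.sapsailing.com:5672 \
  -L 15675:rabbit.internal.sapsailing.com:15672 \
  -L 5673:rabbit-eu-west-3.sapsailing.com:5672 \
  -L 15673:rabbit-eu-west-3.sapsailing.com:15672 \
  -L 10203:localhost:10203 \
  ec2-user@paris-ssh.sapsailing.com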

Regular Operations

  • Three MongoDB nodes form the paris2024 replica set: sap-p1-1:10201, sap-p1-2:10202, and paris-ssh.sapsailing.com:10203. SSH tunnels forward ports 10201..10203 such that on all three hosts involved the replica set can be addressed as mongodb://localhost:10201,localhost:10202,localhost:10203/?replicaSet=paris2024&retryWrites=true&readPreference=nearest
  • sap-p1-1 runs the paris2024 production master from /home/sailing/servers/paris2024 against the local database paris2024:paris2024. It replicates from security-service.sapsailing.com through an SSH tunnel from local port 443 to security-service.sapsailing.com (which actually forwards to the ALB hosting the rules for security-service.sapsailing.com), with the RabbitMQ rabbit.internal.sapsailing.com tunneled through port 5675 and its admin UI through port 15675. Outbound replication goes to local port 5673, which tunnels to rabbit-eu-west-3.sapsailing.com; its admin UI is reached through port 15673, which tunnels to rabbit-eu-west-3.sapsailing.com:15672.
  • sap-p1-2 runs the paris2024 shadow master from /home/sailing/servers/paris2024 against the local database paris2024:paris2024-shadow. It replicates from security-service.sapsailing.com in the same way (local port 443 to security-service.sapsailing.com, i.e., the ALB hosting its rules; rabbit.internal.sapsailing.com tunneled through port 5675 with its admin UI on port 15675). Outbound replication, however, goes to local port 5673, which tunnels to the RabbitMQ running locally on sap-p1-2 on port 5672; its admin UI is then reached through port 15673, which tunnels to sap-p1-2:15672.
  • The database mongodb://mongo0.internal.sapsailing.com,mongo1.internal.sapsailing.com/security_service?replicaSet=live is backed up on a regular basis (nightly) to the local MongoDB replica set paris2024 DB named security_service.
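
A nightly cron job along these lines could implement that backup, assuming mongodump/mongorestore are acceptable for the data volume and that the internal MongoDB hosts are reachable from wherever the job runs (e.g., on the jump host):

#!/bin/bash
# Sketch: copy the cloud security_service DB into the local paris2024 replica set nightly.
set -euo pipefail
mongodump --archive \
  --uri="mongodb://mongo0.internal.sapsailing.com,mongo1.internal.sapsailing.com/security_service?replicaSet=live&readPreference=secondaryPreferred" \
  | mongorestore --archive --drop --nsInclude='security_service.*' \
    --uri="mongodb://localhost:10201,localhost:10202,localhost:10203/?replicaSet=paris2024"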

Production Master Failure

Situation: production master fails, e.g., because of a Java VM crash or a deadlock or user issues such as killing the wrong process…

Approach: Switch to the previous shadow master, re-configuring all SSH tunnels accordingly; this includes the 8888 reverse forward from the cloud to the local on-site master as well as the RabbitMQ forward, which needs to switch from the local RabbitMQ running on the shadow master's host to the cloud-based RabbitMQ. Clients such as SwissTiming clients need to switch to the shadow master. To remedy gaps in replication due to the SSH tunnel switch, we may want to cycle the replica instances, rolling over to a new set of replicas that fetch a fresh initial load.

Internet Failure

As in the Tokyo 2020 scenario; in particular, the local security service must be started, which will work off a regularly updated local MongoDB copy of the cloud-based security-service.sapsailing.com; this also requires adjusting /etc/hosts and the tunnels accordingly.
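
For the /etc/hosts part, a minimal sketch (the IP is an example only and would be the laptop running the local security-service stand-in; further entries may be needed depending on how the tunnels are re-pointed):

# /etc/hosts during an Internet outage (example IP only)
192.168.1.11   security-service.sapsailing.com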

Test Plan for Test Event Marseille July 2023

Test Internet Failure

We shall emulate the lack of a working Internet connection and practice and test the procedures for switching to a local security-service.sapsailing.com installation as well as a local RabbitMQ standing in for the RabbitMQ deployed in the cloud.

Test Primary Master Hardware Failure

This will require switching entirely to the shadow master. Depending on the state of the reverse port forward of the 8888 HTTP port from the cloud we may or may not have to try to terminate a hanging connection in order to be able to establish a new reverse port forward pointing from the cloud to the shadow master. The shadow master also then needs to use the cloud-based RabbitMQ instead of its local one. As a fine-tuning, we can practice the rolling re-sync of all cloud replicas which will likely have missed operations in the meantime.

Test Primary Master Java VM Failure

This can be caused by a deadlock, a VM crash, a Full GC phase, massive performance degradation, or other faulty behavior. We then need to actively close the reverse SSH port forward from the cloud to the production master's 8888 HTTP port and, as a precaution, switch the RabbitMQ tunnel from the cloud-based to the local RabbitMQ instance, so that in case the production master "wakes up" again, e.g., after a Full GC, it does not start to interfere with the now active shadow master on the RabbitMQ fan-out exchange. On the shadow master we need to re-configure the SSH tunnels, particularly to target the cloud-based RabbitMQ and to have the reverse port forward on port 8888 target the shadow master on site.

Test Primary Master Failures with no Internet Connection

Combine the above scenarios: a failing production master (hardware or VM-only) will require different tunnel re-configurations, especially regarding the then-local security-service.sapsailing.com environment, which may need to move to the shadow laptop.

TODO Before / During On-Site Set-Up (Both, Test Event and OSG2024)

  • Set up Global Accelerator and have the already established DNS record paris2024.sapsailing.com (placeholder that points to the Dynamic ALB in the default region eu-west-1 to effectively forward to the central reverse proxy and ultimately the archive server's landing page) become an alias pointing to this Global Accelerator
  • Set up logging buckets for ALBs in all supported regions
  • Set up ALBs in all supported regions, define their three rules (redirect for paris2024.sapsailing.com/ path; forward to public target group for all other paris2024.sapsailing.com traffic; default rule forwarding to IP-based target group containing the eu-west-1 central reverse proxy) and register them with the Global Accelerator
  • Add SSH public keys for password-less private keys of sap-p1-1 and sap-p1-2 to ec2-user@paris-ssh.sapsailing.com:.ssh/authorized_keys.org so that when the authorized_keys file is updated automatically, the on-site keys are still preserved.
  • Create Let's Encrypt certificates for the NGINX installations for paris2024.sapsailing.com and security-service.sapsailing.com and install them into the two on-site laptops' NGINX environments (a possible approach is sketched after this list)
  • Ensure the MongoDB installations on both laptops use the paris2024 replica set
  • Adjust Athena queries to include all ALB logging buckets from all regions
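
For the Let's Encrypt certificates mentioned above, one possible approach is a DNS-01 challenge, since the public DNS for these names points at the Global Accelerator / cloud ALBs rather than at the laptops; the resulting certificate and key would then be installed into both laptops' NGINX configurations. Sketch only, with the exact challenge automation still to be decided:

certbot certonly --manual --preferred-challenges dns \
  -d paris2024.sapsailing.com -d security-service.sapsailing.com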