doc/LandscapeOverview.pptx
... ...
Binary files /dev/null and b/doc/LandscapeOverview.pptx differ
wiki/info/landscape/amazon-ec2.md
... ...
@@ -4,14 +4,16 @@
4 4
5 5
## Quickstart
6 6
7
-Our default region in AWS EC2 is eu-west-1 (Ireland). Tests are currently run in the otherwise unused region eu-west-2 (London).
7
+Our default region in AWS EC2 is eu-west-1 (Ireland). Tests are currently run in the otherwise unused region eu-west-2 (London). Most regular operations can be handled through the AdminConsole's "Advanced / Landscape" tab. See, e.g., [https://security-service.sapsailing.com/gwt/AdminConsole.html#LandscapeManagementPlace:](https://security-service.sapsailing.com/gwt/AdminConsole.html#LandscapeManagementPlace:). Less frequent operations still require more in-depth knowledge of the steps involved, manual execution of commands on the command line, and a basic understanding of Linux.
8 8
9
-### Important Servers, Hostnames
9
+## Important Servers, Hostnames
10 10
11 11
- Web Server / Central Reverse Proxy: reachable through SSH to sapsailing.com:22
12 12
- Database Servers: dbserver.internal.sapsailing.com (archive server winddb on port 10201, all other slow/archived DBs on 10202, hidden replica of "live" replica set on 10203), mongo0.internal.sapsailing.com, mongo1.internal.sapsailing.com
13 13
- RabbitMQ Server: rabbit.internal.sapsailing.com
14
-- MySQL DB (mainly for Bugzilla): mysql.internal.sapsailing.com (currently co-deployed on the same old instance that also runs RabbitMQ)
14
+- MySQL DB (mainly for Bugzilla): mysql.internal.sapsailing.com (currently co-deployed on the same old instance that also runs RabbitMQ, so mysql.internal.sapsailing.com and rabbit.internal.sapsailing.com currently refer to the same instance)
15
+- Hudson Build Server: called "Build/Test/Dev", running a Hudson instance reachable at ``hudson.sapsailing.com`` and a test instance of the SAP Sailing Analytics available under ``dev.sapsailing.com``
16
+- Additional "Build Slaves" launched by the Hudson Build Server: Named ``Hudson Ubuntu Slave``, used to run individual build jobs
15 17
16 18
## Landscape Overview
17 19
... ...
@@ -22,9 +24,13 @@ In Route53 (the AWS DNS) we have registered the sapsailing.com domain and can ma
22 24
* accept HTTPS/TLS connections on port 443, using the ACM-managed certificate for ``*.sapsailing.com`` and ``sapsailing.com`` and also forwarding to the ``HTTP-to-sapsailing-dot-com`` target group
23 25
* optionally, this NLB could be extended by UDP port mappings in case we see a use case for UDP-based data streams that need forwarding to specific applications, such as the Expedition data typically sent on ports 2010 and following
24 26
25
-### Webserver
27
+Additionally, we have created a CNAME record for ``*.sapsailing.com`` pointing at a default application load balancer (ALB) (currently ``DefDynsapsailing-com-1492504005.eu-west-1.elb.amazonaws.com``) in our default region (eu-west-1). This default ALB is also called our "dynamic ALB" because it doesn't depend on DNS rules other than the default one for ``*.sapsailing.com``. Unlike DNS changes, which can take minutes to hours to propagate through the world-wide DNS, changes to the default ALB's rule set take effect immediately. Like all ALBs, this one also has a default rule that forwards all traffic not matched by other rules to a target group containing the Apache httpd webserver (currently one instance, in the future probably several). All these ALBs handle SSL termination by means of an ACM-managed certificate that AWS automatically renews before it expires. The traffic routed to the target groups is always HTTP only.
26 28
27
-The web server currently exists only as one instance but could now be replicated to other availabililty zones (AZ)s, entering those other IPs into the ``HTTP-to-sapsailing-dot-com`` target group (and, as will be described further below, to the ``CentralWebServerHTTP*`` (for the "dynamic" ALB in eu-west-1) or ``{ALB-name}-HTTP`` (for all DNS-mapped ALBs) target group of each application load balancer (ALB) in the region). For all of sapsailing.com it does not (no longer) care about SSL and does not need to have an SSL certificate (anymore). In particular, it offers the following services:
29
+Further ALBs may exist in addition to the default ALB and the NLB for ``sapsailing.com``. Each of them needs one or more DNS records pointing to it, and its listener's rule set needs matching hostname-based rules for those records. This set-up is particularly appropriate for "longer-lived" content, where a DNS lag during archiving or dismantling is not a significant problem.
30
+
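The split between DNS-mapped ALBs and the wildcard-backed "dynamic" ALB can be illustrated with a minimal Python sketch. It is not part of the landscape tooling: the function name, the record table and the hostname ``some-event.sapsailing.com`` are hypothetical, and the NLB is represented by a placeholder label because its DNS name is not given here. It only models the lookup order (explicit record first, wildcard otherwise).

```python
# Illustrative model only; real resolution is done by Route53, not by this code.

# Target of the wildcard CNAME record for *.sapsailing.com (the "dynamic" ALB).
WILDCARD_TARGET = "DefDynsapsailing-com-1492504005.eu-west-1.elb.amazonaws.com"

# Placeholder label (assumption): the apex domain is served by the NLB.
APEX_TARGET = "NLB for sapsailing.com"

# Explicit CNAME records for DNS-mapped content (single assumed example entry).
EXPLICIT_RECORDS = {
    "bugzilla.sapsailing.com": "DNSMapped-0-1286577811.eu-west-1.elb.amazonaws.com",
}


def resolve_load_balancer(hostname, explicit_records=EXPLICIT_RECORDS):
    """Return the load balancer a hostname would end up at, or None."""
    hostname = hostname.lower().rstrip(".")
    if hostname == "sapsailing.com":
        return APEX_TARGET
    # An explicit record always wins over the wildcard record.
    if hostname in explicit_records:
        return explicit_records[hostname]
    # Everything else below sapsailing.com is covered by the wildcard.
    if hostname.endswith(".sapsailing.com"):
        return WILDCARD_TARGET
    return None
```

In this model, adding or removing an entry in ``EXPLICIT_RECORDS`` corresponds to a DNS change with propagation delay, whereas hostnames covered only by the wildcard can be re-routed by changing the dynamic ALB's rules alone.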
31
+### Apache httpd Webserver and Reverse Proxy
32
+
33
+The web server currently exists only as one instance but could now be replicated to other availability zones (AZs), entering those other IPs into the ``HTTP-to-sapsailing-dot-com`` target group (and, as will be described further below, into the ``CentralWebServerHTTP*`` (for the "dynamic" ALB in eu-west-1) or ``{ALB-name}-HTTP`` (for all DNS-mapped ALBs) target group of each application load balancer (ALB) in the region). For all of sapsailing.com it no longer handles SSL and therefore no longer needs an SSL certificate. In particular, it offers the following services:
28 34
29 35
* hudson.sapsailing.com - a Hudson installation on dev.internal.sapsailing.com
30 36
* bugzilla.sapsailing.com - a Bugzilla installation under /usr/lib/bugzilla
... ...
@@ -37,7 +43,7 @@ The web server currently exists only as one instance but could now be replicated
37 43
* gitlist.sapsailing.com - for our git at /home/trac/git
38 44
* git.sapsailing.com - for git cloning for dedicated users, used among other things for replication into git.wdf.sap.corp
39 45
40
-Furthermore, it host aliases for ``sapsailing.com``, ``www.sapsailing.com`` and all subdomains for archived content, pointing to the archive server which is defined in ``/etc/httpd/conf.d/000-macros.conf``. This is also where the archive server switching has to be configured. Before reloading the configuration, make sure the syntax is correct, or else you may end up killing the web server, leading to downtime. Check by running
46
+Furthermore, it hosts aliases for ``sapsailing.com``, ``www.sapsailing.com`` and all subdomains for archived content, pointing to the archive server which is defined in ``/etc/httpd/conf.d/000-macros.conf``. This is also where the archive server switching has to be configured. Before reloading the configuration, make sure the syntax is correct, or else you may end up killing the web server, leading to downtime. Check by running
41 47
```
42 48
apachectl configtest
43 49
```
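The "check before reload" habit can also be scripted. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, and ``apachectl graceful`` is used as the reload step only for illustration; the reload command actually used on the web server may differ.

```python
import subprocess


def reload_if_config_valid(run=subprocess.run):
    """Run `apachectl configtest` and reload only if the syntax check passes.

    `run` is injectable so the logic can be exercised without Apache installed.
    """
    check = run(["apachectl", "configtest"], capture_output=True, text=True)
    if check.returncode != 0:
        # Show what the syntax check reported and leave the running
        # server untouched.
        print(check.stderr)
        return False
    # Assumption: a graceful restart is an acceptable way to reload here.
    run(["apachectl", "graceful"], check=True)
    return True
```

The point of the design is that a failed syntax check never reaches the reload step, so a broken configuration cannot take the web server down.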
... ...
@@ -51,7 +57,7 @@ The webserver is registered as target in various locations:
51 57
* As DNS record with its internal IP address (e.g., 172.31.19.129) for the two DNS entries ``logfiles.internal.sapsailing.com`` used by various NFS mounts, and ``smtp.internal.sapsailing.com`` for e-mail traffic sent within the landscape and not requiring the AWS SES
52 58
* as IP target with its internal IP address for the ``HTTP-to-sapsailing-dot-com`` target group, accepting the HTTP traffic sent straight to ``sapsailing.com`` (not ``www.sapsailing.com``)
53 59
* as IP target with its internal IP address for the ``SSH-to-sapsailing-dot-com`` target group, accepting the SSH traffic for ``sapsailing.com``
54
-* as regular instance target in all load balancers' default rule's target group, such as ``DefDynsapsailing-com``, ``DNSMapped-0``, ``DNSMapped-1``, and so on
60
+* as regular instance target in all load balancers' default rule's target group, such as ``DefDynsapsailing-com``, ``DNSMapped-0``, ``DNSMapped-1``, and so on; the names of the target groups are ``CentralWebServerHTTP-Dyn``, ``DDNSMapped-0-HTTP``, ``DDNSMapped-1-HTTP``, and so on, respectively.
55 61
* as target of the elastic IP address ``54.229.94.254``
56 62
57 63
Furthermore, it is helpful to ensure that the ``/internal-server-status`` path will resolve correctly to the Apache httpd server status page. For this, the ``/etc/httpd/conf.d/001-events.conf`` file contains three rules at the very beginning:
... ...
@@ -67,11 +73,65 @@ The second obviously requires maintenance as the internal IP changes, e.g., when
67 73
68 74
### DNS and Application Load Balancers (ALBs)
69 75
70
-We distinguish between DNS-mapped and non-DNS-mapped content. The basic services offered by the web server as listed above are DNS-mapped, with the DNS entries being CNAME records pointing to an ALB (DNSMapped-0-1286577811.eu-west-1.elb.amazonaws.com) which handles SSL offloading with the Amazon-managed certificate and forwards those requests to the web server. Furthermore, longer-running application replica sets can have a sub-domain declared in Route53's DNS, pointing to an ALB which then forwards to the public and master target groups for this replica set based on hostname, header fields and request method. A default redirect for the ``/`` path can also be defined, obsoleting previous Apache httpd reverse proxy redirects.
76
+We distinguish between DNS-mapped and non-DNS-mapped content. The basic services offered by the web server as listed above are DNS-mapped, with the DNS entries being CNAME records pointing to an ALB (DNSMapped-0-1286577811.eu-west-1.elb.amazonaws.com) which handles SSL offloading with the Amazon-managed certificate and forwards those requests to the web server. Furthermore, longer-running application replica sets can have a sub-domain declared in Route53's DNS, pointing to an ALB which then forwards to the public and master target groups for this replica set based on hostname, header fields and request method. A default redirect for the ``/`` path can also be defined, obsoleting previous Apache httpd reverse proxy redirects for non-archived ALB-mapped content.
71 77
72 78
Shorter-running events may not require a DNS record. The ALB ``DefDynsapsailing-com-1492504005.eu-west-1.elb.amazonaws.com`` is the target for ``*.sapsailing.com`` and receives all HTTP/HTTPS requests not otherwise handled. While HTTP immediately redirects to HTTPS, the HTTPS requests will pass through its rules. If application replica sets have their rules declared here, they will fire. Everything else falls through to the default rule which forwards to the web server's target group again. This is where requests for archived events as well as for ``www.sapsailing.com`` end up.
73 79
74
-The requests going straight to ``sapsailing.com`` are handled by the NLB (see above), get forwarded to the web server and are re-directed to ``www.sapsailing.com`` from there, ending up at the non-DNS-mapped load balancer where by default they are then sent again to the web server which sends it to the archive server.
80
+The requests going straight to ``sapsailing.com`` are handled by the NLB (see above), get forwarded to the web server and are re-directed to ``www.sapsailing.com`` from there, ending up at the non-DNS-mapped load balancer where by default they are then sent again to the web server / reverse proxy, which sends them to the archive server.
81
+
82
+In addition to a default re-direct for the "/" path, the following four ALB listener rules for a single application replica set are defined, all requiring the "Host" to match the hostname:
83
+- if the HTTP header ``X-SAPSSE-Forward-Request-To`` is ``master`` then forward to the master target group
84
+- if the HTTP header ``X-SAPSSE-Forward-Request-To`` is ``replica`` then forward to the public target group
85
+- if the request method is ``GET`` then forward to the public target group
86
+- forward all other requests for the hostname to the master target group
87
+
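The four listener rules can be read as a small decision function. The sketch below is a Python illustration only, not the ALB configuration itself: the function and parameter names are hypothetical, the rules are assumed to be evaluated in the order listed, header and method values are compared exactly as a simplification, and the default re-direct for the "/" path is left out.

```python
FORWARD_HEADER = "X-SAPSSE-Forward-Request-To"


def select_target_group(host, method, headers, replica_set_host):
    """Return "master" or "public" for a request to one application replica set.

    Returns None if the Host does not match, i.e. none of the four rules
    fires and the request falls through to other rules of the ALB.
    """
    if host.lower() != replica_set_host.lower():
        return None
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    forward_to = normalized.get(FORWARD_HEADER.lower())
    if forward_to == "master":
        return "master"
    if forward_to == "replica":
        return "public"
    if method == "GET":
        return "public"
    # All other requests for this hostname, e.g. POST or PUT.
    return "master"
```

For a hypothetical hostname ``my-event.sapsailing.com``, a plain ``GET`` is served from the public target group, while a ``POST`` without the header goes to the master; the header lets a client override that default in either direction.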
88
+### MongoDB Replica Sets
89
+
90
+### Shared Security and Application Data Across ``sapsailing.com``
91
+
92
+TODO explain the special role of security-service.sapsailing.com
93
+
94
+### Dedicated Application Replica Set
95
+
96
+### Application Replica Set Using Shared Instances
97
+
98
+### Auto-Scaling Groups and Launch Configurations
99
+
100
+### Important Amazon Machine Images (AMIs)
101
+
102
+TODO talk about image upgradability using the "image-upgrade" user data line, optionally combined with the "no-shutdown" flag.
103
+
104
+### AWS Tags
105
+
106
+AMIs / image-type and versions
107
+
108
+sailing-analytics-server
109
+
110
+mongo-replica-sets
111
+
112
+## Automated Procedures
113
+
114
+### Creating a New Application Replica Set
115
+
116
+### Moving Application Replica Set from Shared to Dedicated Infrastructure
117
+
118
+### Moving Application Replica Set from Dedicated to Shared Infrastructure
119
+
120
+### Scaling Replica Instances Up/Down
121
+
122
+### Scaling Master Up/Down
123
+
124
+### Upgrading Application Replica Set
125
+
126
+### Archiving Application Replica Set
127
+
128
+### Removing Application Replica Set
129
+
130
+### Upgrading Application AMI
131
+
132
+TODO explain how launch configurations can be upgraded as well
133
+
134
+### Upgrading MongoDB AMI
75 135
76 136
#### Starting an instance
77 137
wiki/projects/cloud-orchestrator.md
... ...
@@ -10,6 +10,29 @@ We believe that a central *orchestrator* approach should be used to solve this c
10 10
11 11
![](https://wiki.sapsailing.com/wiki/images/orchestration/architecture.png)
12 12
13
+## Current Status
14
+
15
+Parts of the plan outlined by this document around August 2018 have been implemented by February 2022. The document is still left in place because it contains several concepts and ideas not implemented yet. The following things have already been achieved by February 2022:
16
+
17
+- SSL offloading at the ALB: all application traffic is taken in through application load balancers (ALBs) by now. The HTTPS/SSL connection is terminated there, using an AWS-managed certificate with automatic renewal.
18
+- An Apache-based central reverse proxy exists only for archived content and a few basic services such as CI, bug tracking or HTTPS Git access
19
+- Automation functionality has been built using the Java/OSGi-based architecture suggested herein. It has been integrated in the existing AdminConsole for now, using a new "Landscape" tab in the "Advanced" category.
20
+- Automation covers handling and upgrading AMIs (images), DNS records, ALBs, creating and managing shared ("multi") instances, creating and managing auto-scaling groups, migrating application replica sets from shared to dedicated infrastructure and back, scaling master and replicas up and down, archiving content from a replica set into an archive server, scaling MongoDB replica sets up and down, and upgrading replica sets to new application versions.
21
+- Logging happens primarily from ALB to S3; a synchronization script under ``/var/log/old/cache/aws/sync-alb-access-logs-from-s3.sh`` which is also found in the git folder ``configuration/`` handles the synchronization
22
+-
23
+
24
+Major topics yet missing:
25
+
26
+- Making all relevant AMIs automatically upgradable (Webserver, Hudson build server, MongoDB)
27
+- Archive server handling (in particular upgrades and automatic failover)
28
+- Routing based on criteria other than hostname (e.g., in order to allow for more dynamic "scope migrations")
29
+- The central reverse proxy is still a single point of failure and should be replicated, based on a shared configuration
30
+- Cross-region set-ups will still require manual preparation (getting at least the sailing server AMI there, setting up security group, avoiding DNS interference if a Global Accelerator is being used, not setting up a master node for an application replica set in every region but only in a primary region or even on site, setting up the auto-scaling group and managing it during upgrades, ...); for the Tokyo 2020 Olympic Games we used scripts found in Git under ``configuration/on-site-scripts`` that helped during the process.
31
+- MongoDB disk space management automation or at least support
32
+- MongoDB replica set upgrades
33
+- automation of AMI production "from scratch" for reverse proxy nodes, MongoDB hosts and application servers
34
+- automatic management of alarms for every target group created
35
+
13 36
## Overview of Cloud Configuration
14 37
15 38
As of this writing (January 2020), our cloud setup has the following essential components: