f48ac53c10c310bd06be9e5a0197b4692c33a1ba
wiki/info/landscape/amazon-ec2.md
| ... | ... | @@ -470,6 +470,25 @@ proxy. And virtual hosts are created for the private IP and |
| 470 | 470 | localhost, so the internal server status and main healthcheck |
| 471 | 471 | can function (see below). |
| 472 | 472 | |
| 473 | +### Healthcheck |
|
| 474 | + |
|
| 475 | +The reverseProxyHealthcheck.sh script is deployed on the *central and |
|
| 476 | +disposables*. It reduces costly cross-AZ traffic between our instances while still ensuring reliability and availability. |
|
| 477 | + |
|
| 478 | +The general idea of this ALB target group healthcheck is to report an instance as healthy only if it is in the same AZ as the archive (the "correct" AZ). However, availability takes priority over cost saving, so if there is no healthy instance in the "correct" AZ, the healthcheck returns healthy for instances in other AZs too. |
|
| 479 | + |
|
| 480 | +All target groups tagged with allReverseProxies use this healthcheck: |
|
| 481 | + |
|
| 482 | +``` |
|
| 483 | +/cgi-bin/reverseProxyHealthcheck.sh?arn=TARGET_GROUP_ARN |
|
| 484 | +``` |
|
| 485 | + |
|
| 486 | +The healthcheck first checks internal-server-status. If the instance is genuinely unhealthy, unhealthy is returned to the ELB (Elastic Load Balancer) health checker. Otherwise, the instance uses cached CIDR masks (which correspond to AZ definitions) and nmap to check whether it is in the same AZ as the archive. |
|
| 487 | +If it is in the same AZ, "healthy" is returned to the ELB health checker. If not, the target group ARN, passed as a parameter |
|
| 488 | +to the healthcheck, is used to get the private IPs of the other instances in the target group via a describe-target-health call to the AWS API. This is the most costly part of the check, so these values are cached. |
|
| 489 | + |
|
| 490 | +We then use the same nmap/CIDR method to check which of the discovered instances are in the same AZ as the archive. Finally, we check the internal-server-status of those instances to see whether they are healthy. If there are no healthy instances in the "correct" AZ, we return healthy; otherwise we return unhealthy. |
|
| 491 | + |
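The decision flow described in this section can be summarised with a minimal Python sketch. It is not the contents of reverseProxyHealthcheck.sh: the function names, the `peers` mapping and the example CIDR are hypothetical, Python's `ipaddress` module stands in for the nmap/CIDR check, and the internal-server-status results and peer IPs are assumed to have been gathered already.

```python
import ipaddress


def in_archive_az(ip, archive_az_cidrs):
    """Return True if ip falls inside any CIDR block of the archive's AZ.

    Stand-in for the nmap/CIDR check; archive_az_cidrs is a hypothetical
    list of cached CIDR strings for the "correct" AZ.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in archive_az_cidrs)


def healthcheck(self_ip, self_ok, peers, archive_az_cidrs):
    """Return "healthy" or "unhealthy" for this instance.

    self_ok: result of this instance's own internal-server-status check.
    peers:   hypothetical dict of private IP -> internal-server-status result
             for the other instances in the target group.
    """
    # A genuinely unhealthy instance is always reported as unhealthy.
    if not self_ok:
        return "unhealthy"
    # An instance in the archive's AZ is healthy without further checks.
    if in_archive_az(self_ip, archive_az_cidrs):
        return "healthy"
    # Wrong AZ: step aside only if a healthy peer exists in the correct AZ.
    for ip, ok in peers.items():
        if ok and in_archive_az(ip, archive_az_cidrs):
            return "unhealthy"
    # No healthy instance in the correct AZ: availability wins over cost.
    return "healthy"
```

For example, with a hypothetical archive AZ of `10.0.1.0/24`, an instance at `10.0.2.5` reports unhealthy while `10.0.1.9` is up, and switches to healthy once that peer fails its own status check.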
|
| 473 | 492 | |
| 474 | 493 | ### Automating archive failover |
| 475 | 494 |