fa762504737a291f91a07d0dd322de3d0f02e851
Home.md
| ... | ... | @@ -59,6 +59,7 @@ SAP is at the center of today’s technology revolution, developing innovations |
| 59 | 59 | * Amazon |
| 60 | 60 | * [[Amazon EC2|wiki/info/landscape/amazon-ec2]] |
| 61 | 61 | * [[Upgrading ARCHIVE server|wiki/info/landscape/archive-server-upgrade]] |
| 62 | + * [[Upgrading MongoDB Nodes|wiki/info/landscape/mongo-cluster-upgrade]] |
|
| 62 | 63 | * [[EC2 Backup Strategy|wiki/info/landscape/amazon-ec2-backup-strategy]] |
| 63 | 64 | * [[Creating an EC2 image from scratch|wiki/info/landscape/creating-ec2-image-from-scratch]] |
| 64 | 65 | * [[Upgrading an EC2 image|wiki/info/landscape/upgrading-ec2-image]] |
wiki/info/landscape/mongo-cluster-upgrade.md
| ... | ... | @@ -0,0 +1,64 @@ |
| 1 | +# MongoDB Cluster Upgrades |
|
| 2 | + |
|
| 3 | +In our production environment on AWS, we currently (2026-02-10) run three MongoDB replica sets: |
|
| 4 | + |
|
| 5 | +- ``live``: holds all databases for live operations and consists of three nodes: two i3.large instances with fast NVMe storage used for the ``/var/lib/mongo`` partition, and a hidden instance with an EBS volume that is backed up on a daily basis |
|
| 6 | +- ``archive``: holds the ``winddb`` database used for the ARCHIVE server |
|
| 7 | +- ``slow``: used for backing up databases when removing them from the ``live`` replica set, e.g., when shutting down an application replica set after an event |
|
| 8 | + |
|
| 9 | +The ``archive`` and ``slow`` replica sets usually have only a single instance running on ``dbserver.internal.sapsailing.com``, and this is also where the hidden replica of the ``live`` replica set runs. The other two ``live`` nodes have internal DNS names set for them: ``mongo[01].internal.sapsailing.com``. |
|
| 10 | + |
|
| 11 | +Upgrades may affect the packages installed on the nodes, or may affect the major version of MongoDB being run. Both upgrade procedures are described in the following two sections. |
|
| 12 | + |
|
| 13 | +## Upgrade Using Package Manager |
|
| 14 | + |
|
| 15 | +Amazon Linux 2023 uses ``dnf`` as its package manager. When you log on to an instance, a message like |
|
| 16 | + |
|
| 17 | +``` |
|
| 18 | +A newer release of "Amazon Linux" is available. |
|
| 19 | + Version 2023.10.20260202: |
|
| 20 | +Run "/usr/bin/dnf check-release-update" for full release and version update info |
|
| 21 | +``` |
|
| 22 | + |
|
| 23 | +may be shown. In this case, run |
|
| 24 | + |
|
| 25 | +``` |
|
| 26 | +sudo dnf --releasever=latest upgrade |
|
| 27 | +``` |
|
| 28 | + |
|
| 29 | +and watch closely what the package manager proposes. If the transaction includes a kernel update (displayed in red if your terminal supports colored output), a reboot will be required after the installation completes. You can also check this with the following command: |
|
| 30 | + |
|
| 31 | +``` |
|
| 32 | +needs-restarting -r |
|
| 33 | +``` |
|
| 34 | + |
|
| 35 | +It will output a message like |
|
| 36 | + |
|
| 37 | +``` |
|
| 38 | +No core libraries or services have been updated since boot-up. |
|
| 39 | +Reboot should not be necessary. |
|
| 40 | +``` |
|
| 41 | + |
|
| 42 | +and exits with code ``0`` if no reboot is required; otherwise, it will exit with ``1`` and display a corresponding message. |
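The exit code makes this check easy to script. The following is a minimal sketch, not part of the documented procedure: the function name ``reboot_required`` and the ``NEEDS_RESTARTING_CMD`` override are hypothetical, introduced here only so the logic can be tried out without the real tool.

```shell
# Sketch: translate the exit code of "needs-restarting -r" into a yes/no answer.
# NEEDS_RESTARTING_CMD is a hypothetical override (assumption, not in the wiki).
reboot_required() {
  "${NEEDS_RESTARTING_CMD:-needs-restarting}" -r >/dev/null 2>&1
  case $? in
    0) return 1 ;;  # tool exited 0: no reboot required
    1) return 0 ;;  # tool exited 1: reboot required
    *) echo "could not determine reboot status" >&2
       return 2 ;;
  esac
}
```

It could be used as ``if reboot_required; then sudo reboot; fi`` once the upgrade has finished.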
|
| 43 | + |
|
| 44 | +To avoid interrupting user-facing services, reboot the MongoDB nodes one at a time, following this procedure: |
|
| 45 | + |
|
| 46 | +- Ensure that no ARCHIVE candidate is currently launching; such a candidate reads from the ``archive`` replica set, so rebooting the ``dbserver.internal.sapsailing.com`` node would interrupt the loading process. If an ARCHIVE candidate is launching, wait for the launch to finish. |
|
| 47 | +- Ensure that no application replica set is currently being shut down with a backup of its database in progress. This backup would fail if the ``dbserver.internal.sapsailing.com`` node were restarted, because that node hosts the ``slow`` replica set used for the backup. |
|
| 48 | +- ssh into ``ec2-user@dbserver.internal.sapsailing.com`` |
|
| 49 | +- There, run ``sudo dnf --releasever=latest upgrade`` and confirm with "yes" |
|
| 50 | +- Assuming an update was installed that now requires a reboot, run ``sudo reboot`` |
|
| 51 | +- Wait until the instance is back up, you can ssh into it again, and ``pgrep mongod`` lists three process IDs, one for each of the three running ``mongod`` processes |
|
| 52 | +- ssh into ``ec2-user@mongo0.internal.sapsailing.com`` |
|
| 53 | +- run ``mongosh`` to see if ``mongo0`` is currently primary or secondary in the ``live`` replica set |
|
| 54 | +- if the prompt shows "secondary", you're all set; if it shows "primary", enter ``rs.stepDown()`` and watch the prompt change from "primary" to "secondary" |
|
| 55 | +- use ``quit()`` to exit the ``mongosh`` shell |
|
| 56 | +- run ``sudo dnf --releasever=latest upgrade`` and confirm with "yes" |
|
| 57 | +- if a reboot is required, run ``sudo reboot`` |
|
| 58 | +- wait for the instance and its ``mongod`` process to become available again; you may probe, e.g., by ssh-ing into the instance and checking with ``mongosh`` |
|
| 59 | +- repeat the process described for ``mongo0`` for ``mongo1.internal.sapsailing.com`` |
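The "wait until ``mongod`` is back" steps above can be sketched as a small polling loop to run on the node after logging in again. This is an illustration under assumptions, not an existing script: ``wait_for_mongod``, its arguments, and the ``PGREP`` override are hypothetical names.

```shell
# Sketch: poll until the expected number of mongod processes is running.
# Arguments (all hypothetical): expected count, number of attempts, seconds between attempts.
# PGREP is a hypothetical override so the loop can be tried without real processes.
wait_for_mongod() {
  expected="${1:-3}"   # dbserver hosts three mongod processes; mongo0/mongo1 host one
  attempts="${2:-60}"
  interval="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # "pgrep -c" prints the number of matches; it exits non-zero when there are none
    count=$("${PGREP:-pgrep}" -c mongod 2>/dev/null) || true
    if [ "${count:-0}" -eq "$expected" ]; then
      echo "found $expected mongod process(es)"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $expected mongod process(es)" >&2
  return 1
}
```

On ``dbserver`` this would be called as ``wait_for_mongod 3``, on ``mongo0`` or ``mongo1`` as ``wait_for_mongod 1``. A running process does not guarantee that the node has rejoined its replica set, so checking with ``mongosh`` afterwards remains advisable.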
|
| 60 | + |
|
| 61 | +Hint: You can upgrade ``mongo0`` and ``mongo1`` in either order. If you start with the instance that is currently secondary, you save one ``rs.stepDown()`` call. |
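The primary/secondary check and the step-down can also be done non-interactively. The sketch below assumes that a plain ``mongosh`` call on the node connects to the local ``live`` member without further arguments, as in the manual steps, and that the server version supports ``db.hello()``. The function name ``step_down_if_primary`` and the ``MONGOSH`` override are hypothetical.

```shell
# Sketch: step the local node down only if it is currently the primary.
# MONGOSH is a hypothetical override (assumption) for trying this without a real cluster.
step_down_if_primary() {
  mongosh_cmd="${MONGOSH:-mongosh}"
  # db.hello().isWritablePrimary evaluates to true on the primary, false otherwise
  is_primary=$("$mongosh_cmd" --quiet --eval 'db.hello().isWritablePrimary') || return 2
  if [ "$is_primary" = "true" ]; then
    "$mongosh_cmd" --quiet --eval 'rs.stepDown()' || return 2
    echo "stepped down"
  else
    echo "already secondary"
  fi
}
```

Running ``step_down_if_primary`` before the upgrade on ``mongo0`` or ``mongo1`` would replace the manual prompt check; it prints which of the two cases applied.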
|
| 62 | + |
|
| 63 | +## MongoDB Major Version Upgrade |
|
| 64 | + |