In the previous guide, a robust Primary-Replica topology for Valkey was established. Read scaling is now active, and a hot copy of the data is securely stored on a second node.
But there is a catch. If a primary node crashes, the replica will remain faithful and wait for instructions. It will not automatically take over the responsibilities of the primary. Applications will start throwing write errors until an administrator manually logs in and reconfigures the replica to become the new primary.
To achieve true High Availability (HA) and ensure continuous uptime without manual intervention, Valkey Sentinel is required.
What is Valkey Sentinel?
Valkey Sentinel is a distributed system designed to monitor Valkey instances, detect failures, and automatically handle failover.
Sentinel provides four core capabilities:
- Monitoring: It continuously checks whether primary and replica nodes are functioning as expected.
- Notification: It can notify system administrators or another computer program via an API that something is wrong.
- Automatic Failover: It promotes a healthy replica to the new primary and reconfigures the other replicas to sync with it.
- Configuration Provider: It acts as a source of truth for clients. Applications can connect to Sentinel to ask for the current primary’s address. If a failover occurs, Sentinel reports the new address.
The Rule of Three (Quorum)
Sentinel is a distributed system: multiple Sentinel processes run side by side and must agree that a node has failed before any action is taken. The number of Sentinels that must agree is called the quorum.
To guard against a “split-brain” scenario (where a network partition causes two nodes to both assume they are the primary), at least three Sentinel instances must be deployed, so that a majority can still be reached when one of them is lost.
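The arithmetic behind the three-Sentinel minimum can be sketched in shell. The helper names below are hypothetical, and the sketch assumes that, in addition to the configured quorum, a failover has to be authorized by a strict majority of all known Sentinels.

```shell
#!/bin/sh
# Hypothetical helpers illustrating Sentinel voting arithmetic.

# Smallest number of Sentinels that forms a strict majority of $1.
sentinel_majority() {
  echo $(( $1 / 2 + 1 ))
}

# How many Sentinels can be lost while a majority is still reachable.
sentinel_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 5; do
  echo "sentinels=$n majority=$(sentinel_majority "$n") tolerated_failures=$(sentinel_tolerated_failures "$n")"
done
```

With two Sentinels, losing either one leaves no majority, which is why three is the practical minimum.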
For this guide, the environment consists of three dedicated database nodes. Each node will run both the Valkey database service and the Valkey Sentinel service:
- ArunValkeyPrimary (Primary + Sentinel): 172.31.32.27
- ArunValkeyReplica (Replica 1 + Sentinel): 172.31.37.55
- ArunValkeyReplica2 (Replica 2 + Sentinel): 172.31.39.58
The primary node is healthy and running as the master, with two replicas connected and actively syncing.
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -a amma@123
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.31.37.55,port=6379,state=online,offset=98214,lag=1
slave1:ip=172.31.39.58,port=6379,state=online,offset=98214,lag=1
master_failover_state:no-failover
master_replid:629656a198b7290bf6492e470b449ad1ced509e0
master_replid2:30977276632877f46ad12fcc2bbc2c5191c67c0c
master_repl_offset:98214
second_repl_offset:1643
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1643
repl_backlog_histlen:96572
127.0.0.1:6379>
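For scripted checks, the same information can be pulled out of the INFO replication text. The following is a minimal sketch; replication_summary is a hypothetical helper name, and it assumes the output format shown in the session above.

```shell
#!/bin/sh
# Hypothetical helper: read `INFO replication` text on stdin and print
# "<role> <online-replica-count>".
replication_summary() {
  # INFO output uses CRLF line endings, so strip the carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "role" { role = $2 }
    /^slave[0-9]+:/ && /state=online/ { online++ }
    END { print role, online + 0 }
  '
}

# Sample input copied from the session above (trimmed).
replication_summary <<'EOF'
role:master
connected_slaves:2
slave0:ip=172.31.37.55,port=6379,state=online,offset=98214,lag=1
slave1:ip=172.31.39.58,port=6379,state=online,offset=98214,lag=1
EOF
```

Against a live node, the output of valkey-cli INFO replication can be piped into the helper in place of the sample text.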
Step 1: Create the Sentinel Configuration File
Sentinel runs as a separate process from the main Valkey database, using its own configuration file and listening on port 26379 by default.
The Sentinel configuration file (typically /etc/valkey/sentinel.conf) must be created or edited on all three nodes (ArunValkeyPrimary, ArunValkeyReplica, and ArunValkeyReplica2).
Open the file and add the following core directives:
port 26379
# Format: sentinel monitor <cluster-name> <primary-ip> <primary-port> <quorum>
sentinel monitor mymaster 172.31.32.27 6379 2
# The primary password set in the previous setup
sentinel auth-user mymaster default
sentinel auth-pass mymaster amma@123
# How many milliseconds the primary must be unreachable before Sentinel considers it down
sentinel down-after-milliseconds mymaster 5000
# How long to wait before trying another failover if the first one fails
sentinel failover-timeout mymaster 10000
Understanding the monitor line:
- mymaster is the arbitrary name given to this cluster.
- 172.31.32.27 6379 points to the current primary node (ArunValkeyPrimary). (Sentinels will automatically discover both replicas by querying the primary, so the replica IPs do not need to be listed).
- 2 is the quorum. This means at least 2 out of the 3 Sentinels must agree the primary is down to initiate a failover.
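Because the same starting directives go onto all three nodes, they can be generated rather than typed three times. This is a sketch only: render_sentinel_conf and the SENTINEL_AUTH_PASS variable are hypothetical names, and the timeout values are simply the ones used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print the core Sentinel directives for one
# monitored primary. Arguments: <name> <primary-ip> <primary-port> <quorum>.
# The password is read from SENTINEL_AUTH_PASS (an assumed variable name)
# instead of being passed as an argument.
render_sentinel_conf() {
  if [ -z "${SENTINEL_AUTH_PASS:-}" ]; then
    echo "SENTINEL_AUTH_PASS is not set" >&2
    return 1
  fi
  name=$1 ip=$2 port=$3 quorum=$4
  cat <<EOF
port 26379
sentinel monitor $name $ip $port $quorum
sentinel auth-user $name default
sentinel auth-pass $name $SENTINEL_AUTH_PASS
sentinel down-after-milliseconds $name 5000
sentinel failover-timeout $name 10000
EOF
}

# Placeholder password for illustration only.
SENTINEL_AUTH_PASS='example-password'
render_sentinel_conf mymaster 172.31.32.27 6379 2
```

On each node the output could be redirected into the Sentinel configuration file, for example by piping it through sudo tee.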
Step 2: Ensure Proper Permissions
Sentinel needs the ability to rewrite its own configuration file. When a failover happens, Sentinel updates sentinel.conf with the new primary’s IP address and the current state of the cluster.
Ensure the valkey user has write permissions to the file on all three nodes:
sudo chown valkey:valkey /etc/valkey/sentinel.conf
Step 3: Start the Sentinel Services
Start the Sentinel service on all three nodes. Depending on the Linux distribution and the Valkey installation method, this is usually done via systemctl:
root@ArunValkeyPrimary:/home/ubuntu# sudo systemctl enable valkey-sentinel
Synchronizing state of valkey-sentinel.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable valkey-sentinel
root@ArunValkeyPrimary:/home/ubuntu# sudo systemctl start valkey-sentinel
root@ArunValkeyPrimary:/home/ubuntu#
root@ArunValkeyReplica:/home/ubuntu# sudo systemctl enable valkey-sentinel
Synchronizing state of valkey-sentinel.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable valkey-sentinel
root@ArunValkeyReplica:/home/ubuntu# sudo systemctl start valkey-sentinel
root@ArunValkeyReplica:/home/ubuntu#
root@ArunValkeyReplica2:/home/ubuntu# sudo systemctl enable valkey-sentinel
Synchronizing state of valkey-sentinel.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable valkey-sentinel
root@ArunValkeyReplica2:/home/ubuntu# sudo systemctl start valkey-sentinel
root@ArunValkeyReplica2:/home/ubuntu#
Step 4: Verify the Sentinel Cluster
Check if the Sentinels are successfully communicating with each other and monitoring the database. Log into any node and use the Valkey CLI to connect to the Sentinel port (26379):
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.31.32.27:6379,slaves=2,sentinels=3
127.0.0.1:26379>
Look closely at the master0 line at the bottom. This confirms everything is functioning correctly:
- status=ok: The primary (ArunValkeyPrimary) is healthy.
- slaves=2: Sentinel found both ArunValkeyReplica and ArunValkeyReplica2.
- sentinels=3: All three Sentinel instances have discovered each other and formed a quorum.
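The three checks above can be automated by parsing the master0 line. The sketch below assumes the comma-separated key=value layout shown in the output; sentinel_view_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `INFO sentinel` text on stdin and succeed only
# if the named primary is ok with the expected replica and Sentinel counts.
# Arguments: <name> <expected-replicas> <expected-sentinels>.
sentinel_view_ok() {
  tr -d '\r' | awk -v name="$1" -v want_r="$2" -v want_s="$3" '
    /^master[0-9]+:/ {
      sub(/^master[0-9]+:/, "")            # drop the "master0:" prefix
      n = split($0, pairs, ",")            # name=..., status=..., ...
      for (i = 1; i <= n; i++) {
        eq = index(pairs[i], "=")
        f[substr(pairs[i], 1, eq - 1)] = substr(pairs[i], eq + 1)
      }
      if (f["name"] == name && f["status"] == "ok" &&
          f["slaves"] == want_r && f["sentinels"] == want_s) found = 1
    }
    END { exit !found }
  '
}

# Sample line copied from the output above.
if sentinel_view_ok mymaster 2 3 <<'EOF'
master0:name=mymaster,status=ok,address=172.31.32.27:6379,slaves=2,sentinels=3
EOF
then
  echo "sentinel view healthy"
else
  echo "sentinel view NOT healthy"
fi
```

The helper exits non-zero when any of the three values differs, which makes it usable as a monitoring probe.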
Additional Verification: Sentinel Peer Health
To further validate that all Sentinel nodes are actively communicating and healthy, we can query the list of Sentinel peers and inspect their status:
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379 SENTINEL SENTINELS mymaster | grep -E -A 1 '^ip$|^flags$|^last-ok-ping-reply$|^down-after-milliseconds$'
ip
172.31.37.55
--
flags
sentinel
--
last-ok-ping-reply
65
--
down-after-milliseconds
5000
--
ip
172.31.39.58
--
flags
sentinel
--
last-ok-ping-reply
65
--
down-after-milliseconds
5000
root@ArunValkeyPrimary:/home/ubuntu#
What this means:
- ip: Lists the other Sentinel nodes in the cluster.
- flags=sentinel: Confirms these are active Sentinel peers.
- last-ok-ping-reply: The time, in milliseconds, since the last valid ping reply was received from that peer.
- down-after-milliseconds: The failure threshold, here 5000 ms.
Low last-ok-ping-reply values, well below the down-after-milliseconds threshold, indicate healthy and responsive communication between Sentinel nodes.
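The peer check can also be scripted without grep. This sketch assumes the raw, non-interactive output of SENTINEL SENTINELS, where field names and values alternate line by line; sentinel_peer_report is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read raw `SENTINEL SENTINELS <name>` output on stdin
# (alternating field-name / value lines) and print one
# "<ip> <last-ok-ping-reply-ms> <OK|SLOW>" line per peer.
# Argument: threshold in milliseconds.
sentinel_peer_report() {
  tr -d '\r' | awk -v limit="$1" '
    NR % 2 == 1 { key = $0; next }          # odd lines are field names
    key == "ip" { ip = $0 }                 # remember the current peer
    key == "last-ok-ping-reply" {
      print ip, $0, (($0 + 0) <= (limit + 0) ? "OK" : "SLOW")
    }
  '
}

# Sample input built from the fields shown above (other fields omitted).
sentinel_peer_report 5000 <<'EOF'
ip
172.31.37.55
flags
sentinel
last-ok-ping-reply
65
down-after-milliseconds
5000
ip
172.31.39.58
flags
sentinel
last-ok-ping-reply
65
down-after-milliseconds
5000
EOF
```

The helper expects the unfiltered command output; the -- separators that grep -A inserts would break the name/value pairing.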
Step 5: The Chaos Test (Triggering a Failover)
The best way to trust an HA setup is to break it intentionally. We will simulate a crash by killing the primary node, verifying the failover, and then manually failing back to our original primary.
1. Kill the Primary
On ArunValkeyPrimary (172.31.32.27), stop the Valkey database service (do not stop Sentinel, just the database):
root@ArunValkeyPrimary:/home/ubuntu# sudo systemctl stop valkey
root@ArunValkeyPrimary:/home/ubuntu#
2. Verify the Failover via Sentinel
Wait for about 5 to 10 seconds to allow the down-after-milliseconds threshold to pass and the Sentinels to complete the election process. Instead of checking the logs, you can query the Sentinel information directly to confirm the failover has occurred and find out which node was promoted.
On ArunValkeyReplica, connect to the Sentinel port (26379) and run the INFO sentinel command:
root@ArunValkeyReplica:/home/ubuntu# valkey-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.31.37.55:6379,slaves=2,sentinels=3
127.0.0.1:26379>
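Look at the master0 line at the bottom. It shows that the status is ok and the primary address is now 172.31.37.55:6379. The replica count still reads slaves=2 because Sentinel keeps the stopped old primary in its list as a replica, ready to be reconfigured when it returns.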
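Rather than waiting a fixed number of seconds, a script can poll Sentinel until the reported primary changes. The sketch below is one way to do it; wait_for_failover and fake_sentinel_query are hypothetical names, and the stub stands in for a live query so the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: poll until the primary address reported by Sentinel
# differs from <old-ip>, then print the new IP.
# Arguments: <old-ip> <max-attempts> <command whose first output line is the primary IP>...
wait_for_failover() {
  old_ip=$1
  attempts=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The first output line is taken as the IP; CRs are stripped defensively.
    current=$("$@" 2>/dev/null | tr -d '\r' | head -n 1)
    if [ -n "$current" ] && [ "$current" != "$old_ip" ]; then
      printf '%s\n' "$current"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Stand-in for a live Sentinel query, so the sketch runs anywhere.
fake_sentinel_query() {
  printf '172.31.37.55\n6379\n'
}

wait_for_failover 172.31.32.27 5 fake_sentinel_query
```

Against a live Sentinel, the stub would be replaced by a real query such as valkey-cli -p 26379 SENTINEL GET-MASTER-ADDR-BY-NAME mymaster, which returns the primary's IP followed by its port.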
3. Verify the Failover via the Database
Now, connect to that newly promoted node (172.31.37.55) on the standard database port to verify the promotion from the database’s perspective:
root@ArunValkeyReplica:/home/ubuntu# valkey-cli -a amma@123
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.31.39.58,port=6379,state=online,offset=574633,lag=0
master_failover_state:no-failover
master_replid:b93b82982616a59a2304a799e548d7398ee15732
master_replid2:43ea3aeca4846f06c3c6dd11174e9bfd7ac7fabf
master_repl_offset:574633
second_repl_offset:475110
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:450256
repl_backlog_histlen:124378
127.0.0.1:6379>
Notice that the role has changed from slave to master, and it now shows 1 connected slave (the other surviving replica, 172.31.39.58).
4. Restarting the Old Primary
When the Valkey service on ArunValkeyPrimary is eventually restarted, Sentinel will automatically detect it, reconfigure it as a read-only replica, and point it to the newly promoted primary to catch up on missed data.
root@ArunValkeyPrimary:/home/ubuntu# sudo systemctl start valkey
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379 INFO sentinel
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.31.37.55:6379,slaves=2,sentinels=3
Check the database replication status on the old primary to see it is now acting as a replica:
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli INFO replication
# Replication
role:slave
master_host:172.31.37.55
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:614120
slave_repl_offset:614120
slave_priority:1
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:b93b82982616a59a2304a799e548d7398ee15732
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:614120
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:607384
repl_backlog_histlen:6737
root@ArunValkeyPrimary:/home/ubuntu#
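Whether the old primary has really rejoined as a healthy replica can be reduced to a few fields of this output. The following sketch assumes the field names shown above; replica_follows is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `INFO replication` text on stdin and succeed only
# if the node is a replica of <expected-primary-ip> with a live link and no
# sync in progress.
replica_follows() {
  tr -d '\r' | awk -F: -v want="$1" '
    { v[$1] = $2 }                          # remember every key:value pair
    END {
      ok = v["role"] == "slave" && v["master_host"] == want &&
           v["master_link_status"] == "up" && v["master_sync_in_progress"] == "0"
      exit !ok
    }
  '
}

# Sample input copied from the session above (trimmed).
if replica_follows 172.31.37.55 <<'EOF'
role:slave
master_host:172.31.37.55
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
EOF
then
  echo "replica is following 172.31.37.55"
else
  echo "replica is NOT in sync"
fi
```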
5. Executing a Manual Failback
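If you want ArunValkeyPrimary to reclaim its throne as the primary node, you can trigger a manual failover. First, make it the preferred promotion candidate by setting its replica-priority to 1 (Sentinel favors the replica with the lowest non-zero priority value, and a value of 0 means the replica is never promoted), then issue the failover command to Sentinel: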
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli CONFIG SET replica-priority 1
OK
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli CONFIG REWRITE
OK
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379 SENTINEL FAILOVER mymaster
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
OK
(Note: The AUTH failed message appears because the CLI sent an AUTH command to a Sentinel instance that has no password configured. It is harmless here, and the OK confirms that the command executed successfully.)
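Why setting replica-priority to 1 steers the failover can be illustrated with a simplified model of the selection order. This is a sketch, not Sentinel's actual code: preferred_replica is a hypothetical helper, and the priority of 100 for the other replica is an assumed value for illustration.

```shell
#!/bin/sh
# Hypothetical helper: given "<ip> <replica-priority> <replication-offset>"
# lines on stdin, print the IP a Sentinel would be expected to prefer:
# priority 0 is never promoted, the lowest remaining priority wins, and a
# higher replication offset breaks ties. (Run-ID tie-breaking and the
# link-health filters Sentinel also applies are left out of this sketch.)
preferred_replica() {
  awk '$2 > 0' | sort -k2,2n -k3,3nr | head -n 1 | awk '{ print $1 }'
}

# The priority 100 and the second offset are illustrative values.
preferred_replica <<'EOF'
172.31.32.27 1 614120
172.31.39.58 100 614120
EOF
```

In this model ArunValkeyPrimary (172.31.32.27) wins because its priority value is the lowest non-zero one among the candidates.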
Check Sentinel one last time to confirm ArunValkeyPrimary (172.31.32.27) is back in charge:
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379 INFO sentinel
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.31.32.27:6379,slaves=2,sentinels=3
root@ArunValkeyPrimary:/home/ubuntu#
Wrapping Up
By combining replication with Sentinel, a single cache becomes a highly available, self-healing data cluster. If hardware fails or network hiccups occur, Sentinel automatically handles the reshuffling. Furthermore, as demonstrated, system administrators still retain full control to manually shuffle roles during planned maintenance or load balancing.
The post Achieving High Availability with Valkey Sentinel appeared first on Percona.