Applies to: BLAZE
Summary
BLAZE performs failover and failback operations when a server becomes unavailable or recovers. This article explains how failover and failback work in a cluster.
Failover/Failback
Failover is triggered when a server is detected as failed through one of two conditions: network failure or storage failure.
Failback is not always automatic. Its behavior depends on the failure type:
- Network Failure – The cluster initiates failback automatically once the connection is restored.
- Storage Failure – The administrator must manually trigger failback from the BLAZE Desktop Client after confirming that the storage issue has been resolved.
If no server has enough available channel capacity to take over a camera, that camera remains in a Pending state until capacity becomes available.
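The capacity rule above can be illustrated with a minimal sketch. This is not BLAZE's actual implementation: the `Server` class, the `channel_capacity` field, and the "most free channels first" selection policy are all assumptions made for illustration. Only the outcome is taken from this article, namely that a camera stays Pending when no server has room for it.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    channel_capacity: int  # hypothetical: maximum cameras this server can host
    cameras: list = field(default_factory=list)

    def free_channels(self) -> int:
        return self.channel_capacity - len(self.cameras)


def fail_over(failed: Server, survivors: list) -> list:
    """Reassign the failed server's cameras; return the ones left Pending."""
    pending = []
    for camera in list(failed.cameras):
        # Assumed policy: pick the surviving server with the most free channels.
        target = max(survivors, key=Server.free_channels, default=None)
        if target is None or target.free_channels() <= 0:
            pending.append(camera)  # no capacity anywhere: camera stays Pending
            continue
        target.cameras.append(camera)
    failed.cameras.clear()
    return pending


a = Server("A", 4, ["cam1", "cam2", "cam3"])
b = Server("B", 2, ["cam4"])
c = Server("C", 1)
pending = fail_over(a, [b, c])
```

In this example, servers B and C each have one free channel, so two of the three cameras are taken over and the third remains in `pending` until capacity becomes available.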
When a failback occurs, the recovering server signals that it is ready to reclaim its cameras. Each server currently hosting the recovered server's cameras releases them, and the cameras return to the original server and normal operation. Data synchronization to the recovered server restarts automatically.
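The release-and-return step can be sketched as follows. The two dictionaries (`home` for each camera's original server, `hosted` for its current server) and the `fail_back` function are hypothetical names used only for illustration; the article does not describe how BLAZE tracks camera ownership internally.

```python
def fail_back(recovered: str, home: dict, hosted: dict) -> list:
    """Return the recovered server's cameras to it.

    home:   camera -> original server (assumed bookkeeping)
    hosted: camera -> server currently hosting the camera
    """
    returned = []
    for camera, owner in home.items():
        if owner == recovered and hosted.get(camera) != recovered:
            # The temporary host releases the camera; the original server reclaims it.
            hosted[camera] = recovered
            returned.append(camera)
    return returned


home = {"cam1": "A", "cam2": "A", "cam3": "B"}
hosted = {"cam1": "B", "cam2": "C", "cam3": "B"}  # state after A failed over
returned = fail_back("A", home, hosted)
```

After the call, `hosted` matches `home` again, which corresponds to the cluster being back in normal operation.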
Network Failure
A network failure occurs when no heartbeat is received for 10 seconds and the connection is confirmed dropped. When this happens, the cluster automatically initiates failover for the affected server's cameras. Once network connectivity is restored, the cluster automatically performs failback and returns the cameras to the original server.
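The detection rule above combines two conditions: 10 seconds without a heartbeat, and a confirmed dropped connection. A minimal sketch of that check is shown below. The `HeartbeatMonitor` class and the `connection_dropped` callback are assumptions for illustration; how BLAZE actually confirms a dropped connection is not described in this article.

```python
HEARTBEAT_TIMEOUT_S = 10.0  # from this article: 10 seconds without a heartbeat


class HeartbeatMonitor:
    def __init__(self, timeout_s: float = HEARTBEAT_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_seen = {}  # server name -> timestamp of last heartbeat

    def record(self, server: str, now: float) -> None:
        self.last_seen[server] = now

    def network_failures(self, now: float, connection_dropped) -> list:
        """Servers that are silent past the timeout AND confirmed dropped.

        connection_dropped is a hypothetical callback: server name -> bool.
        """
        return sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen >= self.timeout_s and connection_dropped(server)
        )


monitor = HeartbeatMonitor()
monitor.record("A", 0.0)
monitor.record("B", 5.0)
monitor.record("C", 0.0)
early = monitor.network_failures(9.0, lambda s: True)
late = monitor.network_failures(12.0, lambda s: s != "C")
```

Timestamps are passed in explicitly so the logic can be exercised without waiting. At `now=12.0`, server B has been silent for only 7 seconds and server C is not confirmed dropped, so only server A qualifies for failover.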
Storage Failure
A storage failure occurs when a server detects a storage (HDD/SSD) failure and reports the failure in its heartbeat. Other servers receive this notification and immediately initiate failover for the affected server's cameras, even if the network connection is still active. Once the storage issue is resolved, the administrator must manually confirm recovery before the cluster initiates failback and returns the cameras to the original server. This ensures the server is fully operational before cameras are returned.
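The difference between the two failure types can be summarized in a short sketch. The `Heartbeat` structure, its `storage_ok` field, and both functions are hypothetical; they model only the behavior stated in this article, where a storage failure is reported in the heartbeat and failback after a storage failure waits for administrator confirmation.

```python
from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    NETWORK = "network"
    STORAGE = "storage"


@dataclass
class Heartbeat:
    server: str
    storage_ok: bool = True  # hypothetical field carrying the storage status


def needs_storage_failover(heartbeat: Heartbeat) -> bool:
    # Failover starts even though the heartbeat arrived, i.e. the network is up.
    return not heartbeat.storage_ok


def may_fail_back(failure: FailureType, recovered: bool,
                  admin_confirmed: bool = False) -> bool:
    if not recovered:
        return False
    if failure is FailureType.NETWORK:
        return True  # automatic once connectivity is restored
    return admin_confirmed  # storage failure: manual confirmation required
```

For example, `may_fail_back(FailureType.STORAGE, recovered=True)` is false until the administrator confirms recovery, while the same call with `FailureType.NETWORK` is true as soon as the server has recovered.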