All three applications that were part of the validation were tested for HA functionality:
Milestone 2022
BriefCam 6.3
Ipsotek 11.7.1
Each application was set up following the vendor recommendations. Application-level HA validations were performed in addition to the VxRail HA tests.
BriefCam HA results
Protecting mission-critical CV applications is becoming an increasingly important aspect of facilities management. BriefCam supports a rich set of active/active as well as active/passive HA features. In this validation, we tested the native active/active HA features of the BriefCam alert processing server and active/passive protection of the VS Server. One of our design goals is to combine native application HA with VMware HA to protect everything from a single VM to the full cluster and avoid operational disruption.
The BriefCam application has full support for the following features when building an HA solution.
Redundant Microservice architecture
Backup Management console
Backup Database servers
Description
This test simulates a failure of a machine that runs the real-time Alert Processing Service.
Steps
Select any BriefCam Alert Processing VM that has active camera traffic and shut it down.
Monitor camera behavior in the Web console.
Measure the time taken for the camera to be moved to an active node in the cluster.
Expected results
It is expected that when an Alert Processing Service goes offline, other nodes running the Alert Processing Service will take over processing and alert generation.
The time taken for the processing to switch from a failed node to an active one will be measured.
Results
When the Alert Processing VM was shut down, the cameras that were running on that server went into the "Active (Queued)" status.
No alerts were triggered while the cameras were in this state.
After a minimum of 5 minutes, the camera streams were picked up from the database queue and launched on another processing node in the cluster.
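The queue-based takeover described above can be sketched as a minimal simulation. All names and the data structures are illustrative assumptions; the real BriefCam services coordinate through a shared database queue rather than in-process lists.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A hypothetical stand-in for a BriefCam Alert Processing node."""
    name: str
    online: bool = True
    cameras: list = field(default_factory=list)

def fail_node(nodes, queue, name):
    """Mark a node offline; its cameras go onto the shared queue ("Active (Queued)")."""
    for node in nodes:
        if node.name == name:
            node.online = False
            queue.extend(node.cameras)  # no alerts are generated while queued
            node.cameras = []

def poll_queue(nodes, queue):
    """Surviving nodes pick queued cameras up from the database queue."""
    for node in (n for n in nodes if n.online):
        while queue:
            node.cameras.append(queue.pop(0))

nodes = [Node("alert-1", cameras=["cam-1", "cam-2"]), Node("alert-2", cameras=["cam-3"])]
queue = []
fail_node(nodes, queue, "alert-1")
print(queue)             # ['cam-1', 'cam-2'] -- queued, awaiting pickup
poll_queue(nodes, queue)
print(nodes[1].cameras)  # ['cam-3', 'cam-1', 'cam-2']
```

In the validated system the pickup happened after a minimum of 5 minutes; the polling interval is a property of the BriefCam services, not of this sketch.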
Ipsotek HA results
Ipsotek supports additional application-level protection on top of the VxRail-provided HA. This protection guards against an issue with a specific VM while the rest of the cluster remains operational.
The Ipsotek application has full support for building an HA solution. It is possible to build two fully redundant processing systems and also to build redundancy into a single cluster.
The single cluster option was used for this testing and has the following features:
Multiple active Camera Alert processing nodes
Backup Management console (active/passive)
Multiple Database servers (min 3-node cluster)
The following scenario was validated during this phase of testing.
Description
This test simulates a failure of a machine that runs the core Camera Alert processing.
Steps
Select any Ipsotek processing VM and shut it down to simulate failure.
Monitor camera behavior in the Ipsotek configuration tool.
Measure the time taken for the camera to be moved to an active node in the cluster.
Expected results
When a Camera Processing VM is shut down, other nodes take over and process the camera alerts instead.
The time taken for the processing to switch from a failed node to an active one is measured.
Results
Once the VM was shut down, 15 cameras went into "Active - No Video" mode in the Management console.
The cameras were automatically distributed among the remaining three active Ipsotek Processing nodes.
After a maximum of 30 seconds, all cameras were processing alerts again. A camera is reassigned to a new cluster member after 20 seconds of downtime, and the Management console refreshes the camera status every 30 seconds.
The "Processing Node Information" view on the Management console shows that one node is not part of the cluster and that the workload has been distributed among the remaining active nodes.
Once the Processing node was restarted, it automatically rejoined the cluster.
Note: Once an Ipsotek Processing node joins a cluster, the cluster will not automatically rebalance to adjust the workload. If rebalancing is required, cameras can be disabled and re-enabled to distribute them to the new nodes.
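The automatic redistribution of the failed node's cameras can be sketched as follows. The round-robin policy and all names are illustrative assumptions, not Ipsotek's documented assignment algorithm.

```python
def redistribute(cameras, nodes):
    """Spread a failed node's cameras across the remaining active nodes
    using a simple round-robin policy (an assumption for illustration)."""
    assignment = {n: [] for n in nodes}
    for i, cam in enumerate(cameras):
        assignment[nodes[i % len(nodes)]].append(cam)
    return assignment

# 15 cameras from the failed node, three surviving processing nodes
cams = [f"cam-{i}" for i in range(1, 16)]
new = redistribute(cams, ["proc-2", "proc-3", "proc-4"])
print({n: len(c) for n, c in new.items()})  # {'proc-2': 5, 'proc-3': 5, 'proc-4': 5}
```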
Milestone HA results
High Availability is an important consideration for security system deployments to avoid loss of camera stream data during an unplanned outage. For larger environments, Milestone Systems recommends hosting SQL Server on a dedicated server to maintain adequate latency with many devices and/or many event transactions. For small to medium environments, all four of the following VMS components can run on a single computer:
Management server
Event server
Log server
SQL Server
When using dedicated SQL Server installations, the HA design for the three remaining services (management, events, and logging) must be planned separately. During our validation, we installed two SQL Server database servers in an Always On Availability Group for high availability. In environments where the uptime needs of the other three services are critical, consult with Milestone Systems and your system integrator to understand the options available.
SQL database failover
Our test began with two Microsoft SQL Servers configured in an AAG, with all Milestone databases restored to the second server and log replication active. We also had two Milestone Management Servers installed and connected to the virtual server name of the AAG.
1. The test began by failing the databases over to become active on the second database server in the AAG (failed from mil-db-1 to mil-db-2).
2. All camera streams continued to be recorded without interruption.
3. The management server UI on mil-dir-1 was used to disable two camera streams.
4. A login was made to mil-dir-2, and the two cameras impacted in step 3 were in the disabled state.
5. The two cameras were then successfully enabled on mil-dir-2.
This test confirms that SQL Server can successfully be protected against a single VM failure using SQL Server AAG.
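The behavior the test relies on can be sketched as a toy model: because both replicas hold the same synchronously replicated state, a role swap does not lose it, and any management server reading through the AAG sees changes made by the others. This is an illustration only, not the SQL Server API.

```python
class AvailabilityGroup:
    """Toy model of an Always On AG: one primary replica, one synchronized secondary."""
    def __init__(self, primary, secondary):
        self.state = {}        # camera state, replicated synchronously between replicas
        self.primary = primary
        self.secondary = secondary

    def failover(self):
        """Swap roles; replicated state survives because both replicas hold it."""
        self.primary, self.secondary = self.secondary, self.primary

ag = AvailabilityGroup("mil-db-1", "mil-db-2")
ag.state["cam-1"] = "enabled"
ag.failover()                   # databases now active on mil-db-2
ag.state["cam-1"] = "disabled"  # change made via the UI on mil-dir-1
# a second management server (mil-dir-2) reads the same replicated state
print(ag.primary, ag.state["cam-1"])  # mil-db-2 disabled
```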
Recorder hot standby failover
Hot standby failover recording server setup requires a dedicated failover recording server for each protected recording server. This one-to-one mapping allows the system to quickly transition a failover recording server from "standby" mode by synchronizing the correct/current configuration of the recording server it is dedicated to.
We began the test with two VMs, the primary recorder (mil-rec-1) and one hot backup recorder (mil-hot-1).
The data folder configured for storing video recording locally is empty on mil-hot-1, the hot standby server.
vSphere was used to shut down the primary recorder (mil-rec-1), which was then shown as offline in the Management Server.
We observed data being written to local files on the hot standby server.
The data from the local file storage is then merged back to the primary server when failback is initiated.
Checking the video archive for a camera attached to the primary server shows approximately 15 seconds of missing recording when the initial failover occurred and 7 seconds of missing video when the failback occurred.
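The resulting recording timeline can be sketched numerically. The helper function and its parameters are hypothetical; the two gap durations (15 s on failover, 7 s on failback) are the values we measured in this test.

```python
def recording_timeline(fail_t, failover_delay, failback_t, failback_delay, end_t):
    """Return the (source, start, end) recording segments and the total
    seconds of video missing from the merged archive."""
    segments = [
        ("primary",     0,                         fail_t),      # before the failure
        ("hot-standby", fail_t + failover_delay,   failback_t),  # local files on standby
        ("primary",     failback_t + failback_delay, end_t),     # after merge-back
    ]
    missing = failover_delay + failback_delay
    return segments, missing

# failure at t=100 s, failback initiated at t=300 s; gaps as measured in the test
segments, missing = recording_timeline(fail_t=100, failover_delay=15,
                                       failback_t=300, failback_delay=7, end_t=400)
print(missing)  # 22 -- total seconds of video lost across failover and failback
```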
We also checked the potential impact of a CV application processing camera streams mapped to the primary server before performing a hot standby failover test. The charts below show the processing impact on a BriefCam alert processing server analyzing several camera streams as the processing moves from the primary recorder to the hot standby and back again. We first see the primary failover around time interval 20 and failback around time interval 50. The BriefCam user portal showed that alerts were continuously processing through both events even though there were losses of 15 seconds and 7 seconds of video data during failover and failback respectively.
Recorder cold standby failover
We tested several scenarios using the cold standby features for Milestone XProtect video recorders. Our setup started by creating three VMs and installing the Systems Failover Server service and the Failover Recording Server service described in the Milestone Architecture section of this Design Guide.
We then created two Failover Server groups called cold-failover and cold-failover-2. The mapping of server VMs to Failover Groups was:
Failover group     Servers
cold-failover      mil-cold-1, mil-cold-2
cold-failover-2    mil-hot-12
Note: We apologize for any confusion caused by the naming of a cold standby server "mil-hot-12". The VM was repurposed from its original intended use.
We completed our initial setup by configuring the cold-failover group as the Primary Failover Server Group and the cold-failover-2 group as the Secondary Failover Server Group for the mil-rec-3 video recorder.
The expected behavior with this configuration is that when mil-rec-3 fails or goes offline, recording for any attached cameras will be transferred to one of the two servers in the Primary failover server group. Failover server groups can protect more than one recording server, so if both servers in the Primary group are already in service from prior failures when mil-rec-3 fails, it will fail over to a server in the Secondary failover server group.
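The group-ordered selection described above can be sketched as follows. The function name and the within-group ordering are assumptions for illustration; Milestone documents only that the Primary group is exhausted before the Secondary group is used.

```python
def pick_failover_server(primary_group, secondary_group, in_service):
    """Return the next available failover server, preferring the Primary
    failover server group over the Secondary one."""
    for server in primary_group + secondary_group:
        if server not in in_service:
            return server
    return None  # no failover capacity remains

primary = ["mil-cold-1", "mil-cold-2"]
secondary = ["mil-hot-12"]
print(pick_failover_server(primary, secondary, in_service=set()))              # mil-cold-1
print(pick_failover_server(primary, secondary, {"mil-cold-1"}))                # mil-cold-2
print(pick_failover_server(primary, secondary, {"mil-cold-1", "mil-cold-2"}))  # mil-hot-12
```

This matches the sequence observed in the three tests below: mil-cold-1, then mil-cold-2, then the single server in the Secondary group.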
Our first test was to create a failure on mil-rec-3 with all three cold standby servers in the two groups available. We had two physical cameras streaming to mil-rec-3, receiving approximately 15 Mbps of video data. We then powered down mil-rec-3 to simulate a failure, and the camera streams were transferred to mil-cold-1. Approximately 37 seconds of video data was not recorded during the failover. After the first simulated failure, we still had two additional cold standby servers available.
For our second test, we simulated a failure of mil-cold-1 and saw nearly identical behavior. The two camera streams were transferred to mil-cold-2, the second failover server in the Primary failover server group, and we observed a gap of about 38 seconds in the video archive for those two cameras.
Our final test was to simulate a failure of mil-cold-2. We saw the same results as in the two previous tests, but this time the recording was transferred to the single VM in the Secondary failover group, mil-hot-12. The amount of lost video data this time was only about 26 seconds. This may be because it was the last available server and no selection logic was needed to choose a failover target, but we cannot verify that.
We also configured real-time alert processing for the two physical cameras through integration with our BriefCam CV application. The video from the physical cameras was of extremely low quality for use in a CV application. We were able to occasionally detect a person in the video using a people counting algorithm. The two charts below show resource consumption on the BriefCam Alert Processing server configured for this testing. There was minimal impact on resource utilization during the failovers.
We did see some interruptions in the alert rule processing in the BriefCam event log. One of the two streams would change from the Active Processing state to Recovering, then into the Processing Queue, and finally resume Active Processing. The duration of the disruption was roughly equivalent to the duration of the lost video recording. There was no manual intervention required on the BriefCam application to resume stream processing during any of these standby failover tests.