Execution Agent Failover

Several methods can be deployed to ensure high availability for the Execution Agent component.

 

  1. If you are running a cluster, you can assign the Execution Agent service as a cluster resource to be failed over.

  2. You can publish the Directory services name of the Execution Agent and use it in your Queue properties. This has the added benefit of allowing you to move or change machines without making any changes to the ActiveBatch system.

  3. You can use Generic Queues to further abstract and separate Job Scheduling from specific machines.

    • A Generic Queue consists of one or more Execution Queues. When jobs that are associated with a Generic Queue trigger, they are dispatched to a member Execution Queue. The member Execution Queue points to a system where the job will run. That system must have an ActiveBatch Execution Agent installed on it.

    • The Generic Queue is ActiveBatch's active-active high availability solution for the Execution Agent. For example, assume you have a job that can run on any of three Execution Agent machines. If you associate the job directly with an Execution Queue, the job will execute only on that queue's machine. If the machine isn’t available (perhaps it has reached its executing job limit, or the queue is stopped for maintenance purposes), the job will wait if triggered (unless the queue is closed). But if you associate the job with a Generic Queue that includes the three machines the job can run on, and one of those machines goes offline, there are still two machines left that can run the job when triggered. If you have a job that can only run on one specific Execution Agent (e.g., it is the only system with the job's required software installed), then it makes sense to associate that job with the appropriate Execution Queue. As a best practice, associate jobs that can run on more than one system with a Generic Queue.
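
The dispatch behavior described above can be illustrated with a small conceptual model. This is only a sketch and does not use any ActiveBatch API: the class names, the `job_limit` and `online` fields, and the "first available member" selection rule are all assumptions made for illustration. How ActiveBatch actually chooses among member Execution Queues is determined by the product's own scheduling settings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExecutionQueue:
    """Hypothetical stand-in for an Execution Queue tied to one agent machine."""
    machine: str
    job_limit: int = 1          # assumed executing job limit
    online: bool = True         # False models a stopped queue or offline machine
    running: List[str] = field(default_factory=list)

    def can_accept(self) -> bool:
        # A queue can take a job only if it is online and below its job limit.
        return self.online and len(self.running) < self.job_limit


@dataclass
class GenericQueue:
    """Hypothetical stand-in for a Generic Queue with member Execution Queues."""
    members: List[ExecutionQueue]

    def dispatch(self, job: str) -> Optional[str]:
        # Assumed rule: send the job to the first member that can accept it.
        for queue in self.members:
            if queue.can_accept():
                queue.running.append(job)
                return queue.machine
        return None  # no member is available, so the job waits


if __name__ == "__main__":
    agents = [ExecutionQueue("agent-a"), ExecutionQueue("agent-b"), ExecutionQueue("agent-c")]
    generic = GenericQueue(agents)
    agents[0].online = False                   # one machine goes offline
    print(generic.dispatch("nightly-report"))  # still runs on a remaining machine
```

In this model, a job tied directly to the `agent-a` queue would wait while that machine is offline, whereas the same job submitted through the Generic Queue is dispatched to one of the two remaining machines.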

 

While Execution Agent failover is supported, please remember that you may be limited to a specific machine based on the resources your job requires. This is particularly true of Microsoft Windows platforms, which generally require a cluster solution so that shared resources can be failed over as well. Other platforms (such as OpenVMS) may have fewer restrictions on job resources that can be shared or used concurrently.