Job Features
To this point we’ve touched most of the ActiveBatch objects, and many of each object’s properties, through the case studies and chapter discussions. This chapter begins to tie everything together.
Completion Status Rule
Problem:
We need to run a program in which the exit code cannot be counted on as the measure of success or failure. How do we handle this in ActiveBatch?
Solution:
ActiveBatch provides the ability to define a success code rule based on the presence of a string.
For our example, let’s use the Windows PING program. PING sends an ICMP echo request over a TCP/IP network to verify whether a machine is present on the network. In this example, the Windows PING command returns a zero (0) exit code regardless of whether it actually worked or not.
The above figure shows a successful PING command. Note the echo for the exit code (%errorlevel% represents the last exit code of a completed program).
The above figure shows a failed PING command. Note the echo for the exit code is also zero but we know it didn’t really work because the program says, “Request timed out”. The problem of an unreliable exit code does come up and ActiveBatch has addressed it through the Completion Status Rule – String Search capability.
String search allows you to have ActiveBatch scan through the job’s log file or a user-specified file and search for a word or phrase that denotes success. Alternatively, you can also search for a word or phrase that denotes failure (sometimes that might be easier).
To use the search string capability instead of relying on the exit code, enable the Use Search String checkbox. Click the Setup button to enter your text and select where ActiveBatch is to look.
For this example we entered the text “Reply from” since that text indicates the Ping command worked. The text, if present, will be found in the job’s log file. In this case we could have made it case sensitive but elected not to since Ping is pretty simple (even if a bit flawed). The last checkbox does offer an alternative.
We could have searched for “timed out” instead and, if found, indicated that the job failed. ActiveBatch offers both methods since sometimes a program will indicate failure but not success. Alternatively, some programs, particularly Visual Basic applications, write a separate file (other than the job log file) that carries the success or failure indication. If that’s applicable, select the “Search another file…” radio button and enter the full file specification. This specification must be entered from the Execution Machine’s point-of-view. In the interest of both security and performance, the search is performed by the Execution Agent using the job’s security credentials.
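The decision logic behind the string search rule can be illustrated with a short Python sketch. This is not ActiveBatch code; the function name, parameter names and sample log text are hypothetical and only model the choices described above (the text to find, case sensitivity, and whether a match means success or failure).

```python
def completion_status(log_text: str, search_text: str,
                      case_sensitive: bool = False,
                      found_means_failure: bool = False) -> str:
    """Return "success" or "failure" from the presence of search_text."""
    haystack = log_text if case_sensitive else log_text.lower()
    needle = search_text if case_sensitive else search_text.lower()
    found = needle in haystack
    if found_means_failure:
        # The search text denotes failure (for example "timed out").
        return "failure" if found else "success"
    # The search text denotes success (for example "Reply from").
    return "success" if found else "failure"


# Hypothetical log excerpts modelled on the PING example.
good_log = "Pinging 10.0.0.5 with 32 bytes of data:\nReply from 10.0.0.5: bytes=32\n"
bad_log = "Pinging 10.0.0.9 with 32 bytes of data:\nRequest timed out.\n"

print(completion_status(good_log, "Reply from"))
print(completion_status(bad_log, "Reply from"))
print(completion_status(bad_log, "timed out", found_means_failure=True))
```

In the sketch both rules reach the same verdict for the failed PING: the success text is absent, and the failure text is present.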
Pre/Post Job Steps
ActiveBatch Process-type jobs actually support three (3) steps within a single job context: Pre, Main and Post. Our case studies have focused on the “main” job step. However, there are advantages to using “pre” and “post” steps rather than creating separate jobs. First, job steps are tightly integrated. If the pre-step doesn’t work, the job fails. You don’t have to create any constraint or completion trigger rules for steps to run (or not run). Second, job steps allow the main-step to focus on the task at hand rather than on setting up an environment for the job. For example, a pre-step may move files into position for the main-step to process them. If the main-step works, the post step may move those files to other locations. If the main-step fails, the post step may clean up and delete the files from their temporary location.
The above figure shows how both pre and post job steps are used (see Job\Job Properties – Process Job). The job specifications, just as for the main step, must be viewed from the perspective of the execution machine. Just as with the main step, you can elect to have ActiveBatch copy your scripts to the Job Scheduler for later deployment to whatever execution machine needs them. If the pre step fails, the entire job fails. By default, the post step status is ignored. You may elect not to ignore the post step status by enabling the Failure results… checkbox.
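The way the three steps combine into one job status can be sketched as follows. This is a hypothetical Python model, not ActiveBatch code: the function and parameter names are ours, and we assume that when the pre step fails neither the main nor the post step runs, which the text above does not state explicitly.

```python
from typing import Callable


def run_job(pre: Callable[[], bool],
            main: Callable[[], bool],
            post: Callable[[bool], bool],
            post_failure_fails_job: bool = False) -> str:
    """Run pre, main and post in order and return the overall job status."""
    if not pre():
        # If the pre step fails, the entire job fails.
        # Assumption: main and post are skipped in this case.
        return "failed"
    main_ok = main()
    # The post step learns how main ended, so it can either move the
    # processed files onward or clean up the temporary location.
    post_ok = post(main_ok)
    if not main_ok:
        return "failed"
    if not post_ok and post_failure_fails_job:
        # Models enabling the checkbox that stops ignoring post status.
        return "failed"
    return "succeeded"


print(run_job(lambda: True, lambda: True, lambda ok: False))
print(run_job(lambda: True, lambda: True, lambda ok: False,
              post_failure_fails_job=True))
```

The first call succeeds because the post step status is ignored by default; the second fails because the post status has been made significant.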
Job/Plan Run-Time Monitoring
This feature allows you to have ActiveBatch monitor your plan and/or job’s elapsed running time. The purpose of run-time monitoring is twofold: first, to alert someone that a plan/job is running longer than expected and, second, to have ActiveBatch take proactive action and abort the object. ActiveBatch supports the monitoring of CPU time as well, but that feature has limited use.
For example, let’s say a plan normally runs for 3 hours +/- 1 hour. This means the acceptable range is from 2 hours to 4 hours. In today’s world, with 24 by 7 processing, if the plan is still running after 6 hours, you may literally run out of time and severely impact the next day’s production business.
The above figure illustrates the example. We enter 3 hours for the initial expected run-time. The tolerance is specified as a delta time of 1 hour. If the job runs for less than 2 hours, a potential underrun alert may be generated. If the job runs for more than 4 hours, a potential overrun alert may be generated. We say “potential” only because you would still need to establish an alert for these possible events.
You’ll notice that we can also specify the tolerance as a percentage of the expected run-time. For example, 3 hours +/- 50% would yield a range of 1.5 hours to 4.5 hours. The choice of percentage or delta time is up to you.
One question you are likely to have is this: “What if my business keeps expanding and the elapsed time gradually grows such that 4 or 5 hours becomes the norm? Do I have to keep changing the expected run-time value?” Good point. We anticipated this and provided the “Set Run Against Historical Average” feature. This feature, when enabled as shown, causes ActiveBatch to use the average of successful job or plan runs in place of the initial expected time. This means that if your successful runs are taking longer and longer for merited reasons, the increasing elapsed time “creep” will be factored in automatically by ActiveBatch without any manual intervention. You can always check the current historical average by clicking on the Counters tab of a Plan or Job definition. To reset the average back to the value of the initial expected property, click the Reset Average button.
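The arithmetic behind the tolerance range is simple enough to check by hand, and a short Python sketch makes it explicit. The function names are hypothetical, and the historical average is assumed here to be a plain mean of successful run times; the text does not say how ActiveBatch weights or windows the runs it averages.

```python
from datetime import timedelta
from typing import List, Optional, Tuple


def acceptable_range(expected: timedelta,
                     delta: Optional[timedelta] = None,
                     percent: Optional[float] = None) -> Tuple[timedelta, timedelta]:
    """Return (underrun limit, overrun limit) for a delta or percent tolerance."""
    if (delta is None) == (percent is None):
        raise ValueError("specify exactly one of delta or percent")
    if delta is None:
        # Convert the percentage into a delta time.
        delta = expected * percent / 100
    return expected - delta, expected + delta


def historical_average(successful_runs: List[timedelta]) -> timedelta:
    """Assumed model: simple mean of the successful runs' elapsed times."""
    return sum(successful_runs, timedelta()) / len(successful_runs)


low, high = acceptable_range(timedelta(hours=3), delta=timedelta(hours=1))
print(low, high)    # 2:00:00 4:00:00
low, high = acceptable_range(timedelta(hours=3), percent=50)
print(low, high)    # 1:30:00 4:30:00
```

If successful runs creep up to, say, 3, 4 and 5 hours, `historical_average` returns 4 hours, and feeding that value into `acceptable_range` shifts the whole window upward without anyone editing the expected run-time.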
By default, job/plan monitoring is reactive in that a potential alert may be raised. However, if you enable the Abort if Overrun checkbox, ActiveBatch will abort the job/plan on an elapsed time overrun. An overrun condition is signaled if the initial/average run-time plus the tolerance factor (percentage or delta time) is exceeded. The specific error code ABAT_EXCRUNTIME is used to abort the job. You may also select the Fail if Underrun option. This option has ActiveBatch treat the job/plan as a failure if an underrun condition occurs, regardless of the actual exit code or completion status rule. This is useful when, for example, a 1-hour job takes 10 seconds to complete. The likelihood that something went wrong is very high.
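The two options can be pictured as two separate checks: one made while the job is still running, the other when it completes. The Python sketch below is a hypothetical model of that behavior, not ActiveBatch code; the function names and returned strings are ours, and the 50-minute underrun limit in the example is an invented value.

```python
from datetime import timedelta


def check_running(elapsed: timedelta, overrun_limit: timedelta,
                  abort_if_overrun: bool = False) -> str:
    """Decide what to do with a job that is still executing."""
    if elapsed <= overrun_limit:
        return "keep running"
    # Overrun: abort proactively, or merely signal the event so that
    # an alert (if one has been established) can fire.
    return "abort" if abort_if_overrun else "signal overrun"


def final_status(elapsed: timedelta, underrun_limit: timedelta,
                 completed_ok: bool, fail_if_underrun: bool = False) -> str:
    """Decide the status of a job that has completed on its own."""
    if fail_if_underrun and elapsed < underrun_limit:
        # An underrun overrides the exit code or completion status rule.
        return "failed"
    return "succeeded" if completed_ok else "failed"


# A 1-hour job that "succeeds" after only 10 seconds.
print(final_status(timedelta(seconds=10), timedelta(minutes=50),
                   completed_ok=True, fail_if_underrun=True))
```

With Fail if Underrun modelled as enabled, the example reports a failure even though the job itself claimed success.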
Service Level Agreements
This feature allows you to have ActiveBatch monitor your Plans and/or Jobs to ensure that a time deadline has been met. In other words, the timeliness of your Plan and/or Job execution matters more to you here than it does under the elapsed time monitoring described in the previous section.
Two (2) forms of time deadlines are available:
Absolute Deadline (AD) is an actual wall clock time. For example, 14:00 means 2pm local and that the SLA marked object must have completed successfully by that time. If your Plan or Job runs multiple times per day you can establish multiple Absolute Deadline times.
Relative Duration (RD) means that your Plan or Job is given a specific time period to complete. The specification of HH:MM means hours and minutes (and not time of day). The start of the duration begins when the Plan/Job object is instantiated. At that point the countdown begins.
The above figure shows an object with an SLA expressed as a Relative Deadline. The Relative Deadline is six (6) minutes. Two (2) alerts have been established: a Warning alert at 80% of the time left and a Critical alert at 90%. The 80% zone is also set to “Take Action”. This means that any executing jobs will see their OS priority increased. Any waiting jobs will see both their Queue Priority and OS Priority increased. In addition, a Queue Priority Fence of 100 will be established on the Execution Queues involved. This effectively means that only SLA sensitive jobs will execute.
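To make the six-minute example concrete, the following Python sketch computes when the countdown ends and when each alert zone would begin. It is a hypothetical model only: the function name and start time are invented, and we assume the percentages measure how much of the allotted time has been used since instantiation, which is one reading of the figure.

```python
from datetime import datetime, timedelta
from typing import Dict


def sla_timeline(instantiated_at: datetime, duration: timedelta,
                 warning_pct: int = 80, critical_pct: int = 90) -> Dict[str, datetime]:
    """Return the wall-clock times of the alert zones and the deadline."""
    return {
        # Assumption: zones are measured from instantiation as a share
        # of the allotted duration that has been consumed.
        "warning": instantiated_at + duration * warning_pct / 100,
        "critical": instantiated_at + duration * critical_pct / 100,
        "deadline": instantiated_at + duration,
    }


# Hypothetical instantiation time; the six-minute duration is from the example.
start = datetime(2024, 1, 1, 9, 0, 0)
for zone, when in sla_timeline(start, timedelta(minutes=6)).items():
    print(zone, when.time())
```

Under that assumption, an object instantiated at 09:00:00 enters the Warning zone at 09:04:48, the Critical zone at 09:05:24, and misses its SLA if it has not completed by 09:06:00.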
Alerts
We discussed the Alert object briefly in section 5.2 as part of that case study. This section will focus on the alerts themselves. The types of events and actions that you can have ActiveBatch take run the gamut from sending an e-mail to triggering a job/plan for execution.
There are two basic alerts: Notification and Trigger. Notification basically indicates that ActiveBatch can notify someone or some entity when a specific event(s) happens. Trigger indicates that ActiveBatch can start the execution of a plan or job based on a specific event(s) occurring.
Notification currently supports twenty-two (22) actions. Most of the actions involve e-mail or some interaction with third-party monitoring applications. ActiveBatch provides its own alert capability through the ActiveBatch Alert action and the Alert view. While many alert types involve errors, several involve successful indications. This provides a symmetry in which some of the notification actions come in Open and Resolve pairs. A workflow author can open a ticket or alert when an error occurs and then resolve the ticket or alert when the error is corrected and the job succeeds. Good examples include the Open/Resolve SCSM and Open/Resolve ServiceNow actions.
ActiveBatch provides more than two dozen events for plans, jobs and queues that you can be alerted to. Sample events are: Job/Plan Begins Execution, Completed, Succeeded, Failed, Aborted, Elapsed Time Overrun, etc.
The E-Mail alert above is set for the Job/Plan Begins Execution event. When this event occurs on the selected plan(s) and/or job(s), this notification action will occur. Each notification action has its own properties. The ones you see above are for an e-mail. All notification types support customization. In the template mode, as above, you can modify some properties but not others. The To field uses the built-in system variable @Owner, which refers to the owner of the alert object. The ActiveBatch Reference Manual describes all the properties in more detail. All of the properties may be customized. For properties where the text may be longer than the field itself, there is a small dropdown symbol. Clicking it displays a small edit dialog that allows more comfortable editing of the property.
One aspect you have surely noticed is the liberal use of variable substitution in these alerts. You can use variable substitution in the messages you compose yourself as well.
Queue alerts are very useful when you need to ensure that jobs are running on time. If a machine isn’t available or a service fails to start, the Execution Queue is left in the “Starting” state. While this state can be transient, a queue that remains in it for a long time indicates a problem. With ActiveBatch you can associate Alerts and/or Alert objects that can take action should this state persist.
Besides the typical Plan and Job alert types, Queue and Job Scheduler alerts provide a more comprehensive means of detecting, in a timely manner, an interruption of service.
Best Practice: While embedded Alerts have their place, most alerts are so common to a set of workflows or operational aspects of the scheduling system that it is best to create these alerts in the context of an Alert object. The Alert object can then be associated with multiple targeted objects such as Plans, Jobs, Queues and even the Scheduler itself.
Alerts – Three Shifts Case Study
This mini-Case Study proposes and solves the following problem:
Problem:
You have three (3) shifts for each day and you would like to alert the proper people who are on shift (or off-shift as the case may be). The Alert Object does not have a date/time component so how would you do it?
Solution:
Create three (3) Alert Objects: Alert1st, Alert2nd, Alert3rd. Each Alert Object lists the proper people for its shift. The key to solving the problem, however, is that only one Alert Object should be active at any time (or two Alert Objects, depending on how you approach the problem). So you start by disabling all of them.
Next, you associate a schedule with the following type of job (you’ll probably end up with three (3) Schedules and Jobs).
In our example, we named the plan ThreeShiftsAlert, the alert object Alert1st and the job EnableAlert. The job uses the ActiveBatch job steps to enable the alert object and then disable it (in real life you would enable one alert object and then disable the other two).
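The end state that the scheduled jobs are meant to produce can be modelled in a few lines of Python. This sketch is illustrative only and does not call ActiveBatch: the shift start times (midnight, 08:00 and 16:00) are invented for the example, and the functions simply compute which of Alert1st, Alert2nd and Alert3rd should be enabled at a given time of day.

```python
from datetime import time
from typing import Dict

# Hypothetical shift boundaries, ordered by start time.
SHIFT_STARTS = [
    (time(0, 0), "Alert3rd"),
    (time(8, 0), "Alert1st"),
    (time(16, 0), "Alert2nd"),
]


def active_alert(now: time) -> str:
    """Return the Alert Object that belongs to the shift covering 'now'."""
    active = SHIFT_STARTS[0][1]
    for start, name in SHIFT_STARTS:
        if now >= start:
            active = name  # the latest shift that has already started
    return active


def enabled_state(now: time) -> Dict[str, bool]:
    """Enable the on-shift Alert Object and disable the other two."""
    on_shift = active_alert(now)
    return {name: name == on_shift for _, name in SHIFT_STARTS}


print(active_alert(time(9, 30)))
print(enabled_state(time(23, 0)))
```

In ActiveBatch itself there is no such lookup at alert time. Instead, each shift’s Schedule triggers a job at the shift boundary, and that job enables its own Alert Object and disables the others, leaving the system in the state `enabled_state` describes.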