For load balancing workers, the status worker shows an overview of the balancer's members.
It categorises the members of the load balancer into the classes "good", "bad" and "degraded".
This feature can be combined with external escalation procedures. Depending on your global
system design and your operating practices, your preferred categorisation might vary.
The categorisation is based on the activation state of the workers (active, disabled or stopped),
which is a pure configuration state, and the runtime state
(OK or ERR with possible substates idle, busy, recovering, probing, and forced recovery)
which only depends on the runtime situation.
The runtime substates have the following meanings:
- OK (idle): This worker didn't receive any request since the last balancer
  maintenance. By default balancer maintenance runs every 60 seconds. The
  worker should be OK, but since we didn't have to use it for some time, we
  can't be sure. This state was called N/A before version 1.2.24.
- OK (busy): All connections for this worker are in use for requests.
- ERROR (recovering): The worker was in error state for some time and is now
  marked for recovery. The next request suitable for this worker will use it.
- ERROR (probing): After setting the worker to recovering, we received a request
  suitable for this worker. This request is now using the worker.
- ERROR (forced recovery): The worker is in error, but we don't have an alternative
  worker, so we keep using it.
By default, the status worker groups into "good" all members that have activation "active" and
a runtime state other than "error" with empty substate.
The "bad" group consists of the members that either have activation
"stopped" or are in runtime state "error" with empty substate.
Workers that fit neither of the two groups are considered to be "degraded".
You can define other rules for the grouping into good, bad and degraded.
The two attributes "good" and "bad" can be populated by a comma-separated list of single characters or
dot-separated pairs. Each character is the first letter of one of the possible states "active",
"disabled", "stopped", "ok", "idle", "busy", "recovering" and "error". The additional states "probing"
and "forced recovery" are always rated as equivalent to "recovering".
Comma-separated entries are combined with logical "or". If you combine a configuration state and a
runtime state with a dot, they are combined with logical "and".
So the default value for "good" is "a.o,a.i,a.b,a.r", and for "bad" it is "e,s".
The status worker first tries to match against the "bad" definitions. If this doesn't succeed,
it tries to match against "good". If neither "bad" nor "good" matches, it chooses "degraded".
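
The matching order and the rule syntax described above can be illustrated with a small
Python model. This is only a sketch for experimenting with rule strings, not the code the
status worker itself uses. The function names "matches" and "classify" are hypothetical,
and the sketch assumes that each member has exactly one activation state and one runtime
state, where "ok" and "error" stand for the states with empty substate. It also assumes
that the order of the two characters in a dot-separated pair does not matter.

```python
# First-letter codes for the configuration (activation) states.
ACTIVATION_CODES = {"active": "a", "disabled": "d", "stopped": "s"}

# First-letter codes for the runtime states. "probing" and "forced recovery"
# are rated like "recovering", as described in the text.
RUNTIME_CODES = {
    "ok": "o",
    "idle": "i",
    "busy": "b",
    "recovering": "r",
    "probing": "r",
    "forced recovery": "r",
    "error": "e",
}


def matches(rule, activation, runtime):
    """Return True if any comma-separated entry of `rule` matches the member.

    Entries are combined with logical "or". The characters inside one
    dot-separated entry are combined with logical "and".
    """
    codes = {ACTIVATION_CODES[activation], RUNTIME_CODES[runtime]}
    for entry in rule.split(","):
        parts = [p.strip() for p in entry.split(".") if p.strip()]
        if parts and all(p in codes for p in parts):
            return True
    return False


def classify(activation, runtime, good="a.o,a.i,a.b,a.r", bad="e,s"):
    """Classify a member: try "bad" first, then "good", else "degraded"."""
    if matches(bad, activation, runtime):
        return "bad"
    if matches(good, activation, runtime):
        return "good"
    return "degraded"
```

With the default rules, classify("active", "probing") yields "good", classify("stopped", "ok")
yields "bad", and classify("disabled", "ok") yields "degraded". Passing good="a.o,d.o" as a
custom rule would rate the disabled member with runtime state "ok" as "good" instead.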