Process Load Balancing
Starting with the 0.9.9 release we have made load leveling much
easier on Red Hat systems. Loadleveling is automatically enabled on all
nodes for the program /bin/bash-ll, which otherwise works exactly the
same as /bin/bash. Any job you start from a bash-ll prompt can be
loadleveled. That's all there is to it.
If you always want to enable loadleveling for a certain program,
regardless of who starts it, you can add its full pathname to
/etc/sysconfig/loadlevellist. For the cluster to see the new
changes you will need to restart loadleveling using the
"service loadlevel stop" and "service loadlevel start" commands.
If you are not on Red Hat, see the details below.
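
As a rough sketch of the Red Hat steps above, the following shell function
appends a program's full pathname to a loadlevel list file only if it is
not already listed. The function name add_to_loadlevellist and the
one-pathname-per-line file layout are assumptions made for illustration;
this README does not specify the file format. The restart commands are the
ones quoted above and are left as comments because they need root on a
cluster node.

```shell
#!/bin/sh
# Hypothetical helper: add a program to a loadlevel list file.
# Assumes one full pathname per line (an assumption, not confirmed here).
add_to_loadlevellist() {
    prog=$1
    list=${2:-/etc/sysconfig/loadlevellist}

    # Only full (absolute) pathnames are accepted.
    case $prog in
        /*) ;;
        *) echo "error: '$prog' is not a full pathname" >&2; return 1 ;;
    esac

    # Append only if the exact line is not already present.
    if ! grep -qxF -- "$prog" "$list" 2>/dev/null; then
        printf '%s\n' "$prog" >> "$list"
    fi
}

# Example (as root, on Red Hat; /usr/bin/make is just a sample program):
#   add_to_loadlevellist /usr/bin/make
#   service loadlevel stop
#   service loadlevel start
```

Calling the function twice with the same pathname leaves a single entry in
the file, so it is safe to rerun.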
Note for Debian:
The program "bash-ll" is just a hard link to bash
(ln /bin/bash /bin/bash-ll). The link is not created by default during
OpenSSI installation, so you must create it manually.
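
A minimal sketch of creating that link by hand is shown here. The function
name make_bash_ll_link and its optional arguments are hypothetical and
exist only so the paths can be overridden; the underlying command is the
same "ln /bin/bash /bin/bash-ll" quoted above.

```shell
#!/bin/sh
# Hypothetical helper: create the bash-ll hard link if it is missing.
make_bash_ll_link() {
    src=${1:-/bin/bash}
    dst=${2:-/bin/bash-ll}

    # Nothing to do if a file of that name already exists.
    if [ -e "$dst" ]; then
        return 0
    fi

    # A hard link, as in "ln /bin/bash /bin/bash-ll"; both names must
    # live on the same filesystem.
    ln "$src" "$dst"
}

# Example (as root, on Debian):
#   make_bash_ll_link
```

Because the function returns early when the destination exists, running it
more than once does no harm.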
There is no file "/etc/sysconfig/loadlevellist" on Debian.
The equivalent file on Debian is "/cluster/etc/loadlevellist".
For the cluster to see the new changes you will need to restart
loadleveling using the "invoke-rc.d loadlevel stop" and
"invoke-rc.d loadlevel start" commands.
Load leveling in SSI (internal details)
In the 0.6 release, we took Mosix's process load leveling
algorithms and integrated them into our OpenSSI cluster base.
This is currently controlled by the compile-time config variable
CONFIG_MOSIX_LL, which is set by default. However, loadleveling itself
is not on by default. Once the machines are up, use the
"loadlevel -a on" command to turn loadleveling on for all nodes
in the cluster. To turn loadleveling on for a specific node, use
"loadlevel -n <node #> on".
To configure it, say yes to the "MOSIX Load Leveling for SSI" config
option at compile time. Once the cluster is up, add the absolute pathnames
of the programs/executables that you would like loadleveled automatically
to the /proc/cluster/loadlevellist inclusion file.
The list should contain only those programs/executables that should
be automatically load leveled in the cluster. Any new processes
forked from executables on the list are also eligible to move.
Thus, by default, every process is implicitly pinned to its node
unless it is explicitly allowed to move, either by being added to
the list, by inheritance from a parent that is on the list, or by
the user-level "migrate" or "loadlevel" commands.
Loadleveling must be enabled on at least two nodes for processes to
migrate. If only one node has loadleveling enabled, that node
will never try to load balance, since there is nowhere else to move
processes to. If loadleveling is not enabled, the commands
"loadlevel -p <pid>" and "loadlevel <program>" will still make the
process eligible to move, but the process will not be loadleveled until
loadleveling has been enabled on more than one node using the
The "loads" command displays the Mosix load of each node in the cluster.
An asterisk next to the load means that loadleveling is not on for
that specific node.
Using the Mosix algorithms, each node calculates its own load and
compares it to the loads of the other nodes in the cluster. If it
determines that it is overloaded, it chooses a process to migrate to the
best underloaded node. Using the existing node monitoring scheme, each
node sends its current load to the CLMS master node, and the master node
then sends all the loads it has received to all the other nodes.
Consequently, each node independently determines whether it needs to move
processes and automatically migrates them to other nodes.
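
The per-node decision described above can be illustrated with a small
shell sketch. This is NOT the Mosix algorithm: it is a deliberately
simplified stand-in that treats a node as "overloaded" whenever some other
node has a strictly lower load. The function name pick_target_node and the
"node load" input format are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only, not the Mosix algorithm. Reads "node load" pairs on
# stdin and prints the least-loaded other node if this node's load is
# higher than that node's load. Prints nothing otherwise.
pick_target_node() {
    self=$1
    awk -v self="$self" '
        # Remember our own load and skip to the next line.
        $1 == self { mine = $2 + 0; have_mine = 1; next }
        # Track the lowest load seen among the other nodes.
        !have_best || $2 + 0 < best {
            best = $2 + 0; target = $1; have_best = 1
        }
        END {
            if (have_mine && have_best && mine > best)
                print target
        }
    '
}

# Example: node 1 sees the loads of nodes 1, 2 and 3.
#   printf '1 5\n2 1\n3 3\n' | pick_target_node 1
```

Every node would run the same comparison against the same list of loads,
which mirrors the idea that each node decides independently once the loads
have been distributed.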
A process is chosen to migrate if it is in /proc/cluster/loadlevellist,
or has been marked eligible to move with the loadlevel command, and
has the highest weighted load. All processes/applications are
implicitly pinned to the node they started on unless they are added to
the /proc/cluster/loadlevellist file, started with the "loadlevel"
command, or made loadlevelable with the "loadlevel -p <pid>" command.
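
The selection rule above (eligible, and highest weighted load) can be
sketched as follows. The input format of "pid eligible weighted_load"
lines is hypothetical, and how the weighted load is actually computed is
not covered here; the sketch only shows the filter-then-maximum step.

```shell
#!/bin/sh
# Illustrative only. Reads "pid eligible weighted_load" lines on stdin,
# where eligible is 1 for processes allowed to move and 0 for pinned
# ones, and prints the pid of the eligible process with the highest
# weighted load. Prints nothing if no process is eligible.
pick_process() {
    awk '
        $2 == 1 && (!found || $3 + 0 > best) {
            best = $3 + 0; pid = $1; found = 1
        }
        END { if (found) print pid }
    '
}

# Example: pid 100 is pinned, so only 101 and 102 are considered.
#   printf '100 0 9.5\n101 1 2.0\n102 1 4.5\n' | pick_process
```

Pinned processes are ignored even when their load is the highest, which is
the effect of the implicit pinning described above.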
Only the load leveling algorithms have been taken from Mosix. The
OpenSSI project uses its own process migration model, membership
mechanism, and information sharing scheme needed for automatic load
In addition to process load balancing, OpenSSI also has exec-time load
balancing. At exec time, if the node is heavily loaded and the executable
is in the loadlevellist, the exec turns into an rexec and the
process is rexec'ed on a less loaded node. For the 1.0.0-rc2 release,
the frequency at which Mosix checks whether a node is overloaded was
increased from once per second to once every half second, and the
exec-time calculations were linked to the node monitoring frequency.
Limitations with respect to MOSIX
Some limitations of our first load leveling implementation are:
- Currently there is no support for the /proc Mosix interfaces.
This means a node cannot expel remote processes, bring back processes
that have gone remote, or prevent local processes from going remote,
by simply writing into /proc. The default values for decay and speed
cannot be changed either.
Information that is provided through the /proc mechanism in Mosix,
such as where the process is executing, the number of nodes in the
cluster, the number of CPUs per node, etc., can be obtained by using the
SSI cluster commands/interfaces, such as clusternode_num, cluster, and
cluster_getinfo. Migrating processes to another node can be done in
SSI by using the migrate command.