Developed in Python
CLUES is written entirely in Python, and all of its source code is distributed as open source under the GNU General Public License, version 3.0 (GPLv3). This license allows users to modify the software, provided that any modified version they distribute keeps the same license.
This also means that CLUES is multi-platform: the only requirement is that the Python interpreter and the libraries CLUES depends on can run on the target system. Given how widely available the Python interpreter is, the CLUES manager can be used on most current systems.
Thanks to its system of connectors, CLUES can manage any type of infrastructure, whether homogeneous or heterogeneous. It can therefore switch on and off computers running Microsoft Windows™, Linux or Apple™ operating systems, either directly or with the help of PDUs (Power Distribution Units) or embedded energy managers.
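The connector idea can be pictured as a common interface with one implementation per power-control method. The sketch below is a minimal illustration, not CLUES's actual plugin API: `PowerConnector`, `DirectConnector`, `PDUConnector` and `set_power` are hypothetical names, and the connectors only record state instead of talking to real hardware.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class PowerConnector(ABC):
    """Hypothetical common interface: one subclass per power-control method."""

    @abstractmethod
    def power_on(self, node: str) -> bool:
        """Request that the node be switched on; True if the request was accepted."""

    @abstractmethod
    def power_off(self, node: str) -> bool:
        """Request that the node be switched off; True if the request was accepted."""


class DirectConnector(PowerConnector):
    """Stand-in for controlling the machine itself (whatever its OS).

    It only logs requests; a real connector would issue OS-specific commands.
    """

    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []

    def power_on(self, node: str) -> bool:
        self.log.append(("on", node))
        return True

    def power_off(self, node: str) -> bool:
        self.log.append(("off", node))
        return True


class PDUConnector(PowerConnector):
    """Stand-in for a PDU where each node is plugged into a numbered outlet."""

    def __init__(self, outlets: dict[str, int]) -> None:
        self.outlets = outlets
        self.outlet_on = {outlet: False for outlet in outlets.values()}

    def _switch(self, node: str, state: bool) -> bool:
        outlet = self.outlets.get(node)
        if outlet is None:
            return False  # node is not wired to this PDU
        self.outlet_on[outlet] = state
        return True

    def power_on(self, node: str) -> bool:
        return self._switch(node, True)

    def power_off(self, node: str) -> bool:
        return self._switch(node, False)


def set_power(connectors: dict[str, PowerConnector], node: str, on: bool) -> bool:
    """Dispatch a request to whichever connector manages the node."""
    connector = connectors.get(node)
    if connector is None:
        return False
    return connector.power_on(node) if on else connector.power_off(node)
```

A heterogeneous cluster then becomes a mapping such as `{"node01": pdu, "winbox": direct}`, so the code deciding when to switch a node never needs to know how it is switched.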
The same connector system lets CLUES integrate with virtually any local resource management system (LRMS), including both batch-queuing systems and cloud computing infrastructure managers. Currently available connectors include:
- Torque: this connector integrates CLUES with Torque, one of the most popular batch-queuing systems in the scientific community. The connector takes advantage of Torque's callback system to perform the integration. In addition, users can keep using the queuing system in the conventional way, with all of its options (both the common ones and the less frequently used ones).
- SLURM: the integration mechanism used for Torque also allows other queuing systems that expose a PBS-compatible interface, such as SLURM, to be integrated with CLUES.
- OpenNebula: this connector integrates OpenNebula with CLUES by using the hooks system provided by this cloud infrastructure manager. Here the unit of work is the virtual machine: the number of CPUs and the amount of memory are the factors used to determine whether a node can run a given virtual machine. The integration is complete, so users simply interact with the system through its conventional mechanisms, both for commands and for virtual machine templates.
- Sun Grid Engine: although Sun Grid Engine provides an interface very similar to Torque's, a specific connector has been developed to integrate this queuing system with CLUES and capture some of its peculiarities. The integration approach is very similar to the one used for Torque, and the resulting versatility is equivalent, so practically every option of Sun Grid Engine can be used.
- Globus Toolkit and gLite: these two Grid middleware stacks integrate with the local batch-queuing systems, behaving as local users. Since batch-queuing systems such as OpenPBS/Torque, Sun Grid Engine and SLURM integrate completely with CLUES, an installation of gLite or Globus Toolkit on top of them is also completely integrated.
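For the OpenNebula connector, a node is judged suitable for a virtual machine by its CPUs and memory. A minimal sketch of that check follows. The data classes and `choose_node` are hypothetical names, and preferring nodes that are already on (so that no node has to be woken up) is an assumed policy, not documented CLUES behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeResources:
    name: str
    free_cpus: float
    free_memory_mb: int
    powered_on: bool


@dataclass
class VMRequest:
    cpus: float
    memory_mb: int


def can_host(node: NodeResources, vm: VMRequest) -> bool:
    """A node fits a VM if it has enough free CPUs and enough free memory."""
    return node.free_cpus >= vm.cpus and node.free_memory_mb >= vm.memory_mb


def choose_node(nodes: list[NodeResources], vm: VMRequest) -> tuple[str, bool] | None:
    """Return (node name, needs_power_on), or None if no node can host the VM.

    Assumed policy: use a node that is already on when possible, and only
    otherwise pick a powered-off node that would have to be switched on.
    """
    candidates = [n for n in nodes if can_host(n, vm)]
    for n in candidates:
        if n.powered_on:
            return n.name, False
    if candidates:
        return candidates[0].name, True
    return None
```

Returning the "needs power on" flag separately keeps placement and power management decoupled: the caller decides whether waking a node is acceptable.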
Adaptable energy saving policies
CLUES supports flexible energy-saving policies, which can be tuned through parameters such as:
- Number of extra nodes kept powered on in anticipation of future jobs. This improves the experience of the cluster's end users by reducing the waiting time for subsequent job submissions.
- Time after which a node is considered idle. This parameter lets administrators adapt node switch-off to the working patterns of an organization: for example, it can account for employee breaks or the time between test runs, so that nodes are not switched off during those periods, which would hurt system usability.
- List of nodes excluded from automatic powering on/off. This can be used either to keep a group of nodes always on or to keep them always off (for example, to hold some nodes in reserve).
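These three parameters can be combined into a single switch-off decision. The sketch below assumes a hypothetical `Policy` record and `nodes_to_power_off` function (names and defaults are illustrative, not CLUES's configuration keys): nodes idle longer than the threshold are candidates, longest-idle first, but enough idle nodes are left on to satisfy the spare-node setting, and excluded nodes are never touched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Policy:
    extra_nodes_on: int = 1       # idle nodes kept on for upcoming jobs
    idle_time_s: float = 1800.0   # idle longer than this => may be switched off
    excluded: set[str] = field(default_factory=set)  # never switched by policy


def nodes_to_power_off(idle_since: dict[str, float], now: float,
                       policy: Policy) -> list[str]:
    """Pick idle nodes to switch off.

    `idle_since` maps each powered-on idle node to the time it became idle.
    Excluded nodes are ignored entirely, including as spare capacity.
    """
    managed = {n: t for n, t in idle_since.items() if n not in policy.excluded}
    # Candidates: idle beyond the threshold, longest-idle first.
    expired = sorted(
        (n for n, t in managed.items() if now - t > policy.idle_time_s),
        key=lambda n: managed[n],
    )
    # Never leave fewer than `extra_nodes_on` idle managed nodes running.
    budget = max(0, min(len(expired), len(managed) - policy.extra_nodes_on))
    return expired[:budget]
```

For example, with three idle managed nodes, two of them past the threshold, and `extra_nodes_on=2`, only the longest-idle node is switched off.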
Additionally, most of these parameters can be adjusted individually for each subsystem controlled by CLUES, so each subsystem can follow its own policy.
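Per-subsystem tuning maps naturally onto a configuration file with global defaults and one section per subsystem. The layout below is hypothetical (it is not CLUES's real configuration format). It uses Python's standard `configparser`, whose `[DEFAULT]` section supplies values that each subsystem section may override.

```python
from __future__ import annotations

import configparser

# Hypothetical policy file: [DEFAULT] holds global values, and each
# subsystem section overrides only what it needs.
EXAMPLE_CONFIG = """
[DEFAULT]
extra_nodes_on = 1
idle_time_s = 1800

[torque]
idle_time_s = 3600

[opennebula]
extra_nodes_on = 0
"""


def load_policies(text: str) -> dict[str, dict[str, int]]:
    """Return the effective policy of every subsystem section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    policies = {}
    for name in parser.sections():   # sections() excludes DEFAULT
        section = parser[name]       # missing keys fall back to DEFAULT
        policies[name] = {
            "extra_nodes_on": section.getint("extra_nodes_on"),
            "idle_time_s": section.getint("idle_time_s"),
        }
    return policies
```

Here the Torque subsystem waits an hour before switching nodes off, while OpenNebula keeps no spare nodes; everything else is inherited from the defaults.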
Tools for node management
CLUES provides both CLI (Command Line Interface) tools and a web interface to power a node on or off in a controlled way, taking advantage of its integration with the cluster management subsystems. When a node is powered off with the CLUES tools, the necessary operations are performed to remove it from LRMS (or cloud system) management in an orderly way, avoiding failures or nodes left in an error state. Conversely, when a node is powered on with the CLUES tools, the necessary operations are performed to integrate it back into the local resource management systems.
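The ordering described above (take the node out of service, let it drain, then cut power; and on start-up, wait until it is up before handing it back to the scheduler) can be sketched as follows. `LRMS`, `Power`, both functions and `FakeCluster` are hypothetical; a real LRMS connector would call the batch system's own commands, and the timeout behaviour is an assumption.

```python
from __future__ import annotations

import time
from typing import Protocol


class LRMS(Protocol):
    """Hypothetical view of the batch system or cloud manager."""

    def disable(self, node: str) -> None: ...
    def enable(self, node: str) -> None: ...
    def running_jobs(self, node: str) -> int: ...


class Power(Protocol):
    """Hypothetical power connector that can also report whether a node is up."""

    def power_on(self, node: str) -> bool: ...
    def power_off(self, node: str) -> bool: ...
    def is_up(self, node: str) -> bool: ...


def controlled_power_off(node: str, lrms: LRMS, power: Power,
                         poll_s: float = 5.0, timeout_s: float = 600.0) -> bool:
    lrms.disable(node)  # stop the scheduler from placing new work on the node
    deadline = time.monotonic() + timeout_s
    while lrms.running_jobs(node) > 0:  # let running work drain first
        if time.monotonic() > deadline:
            lrms.enable(node)  # give up and return the node to service
            return False
        time.sleep(poll_s)
    return power.power_off(node)


def controlled_power_on(node: str, lrms: LRMS, power: Power,
                        poll_s: float = 5.0, timeout_s: float = 600.0) -> bool:
    if not power.power_on(node):
        return False
    deadline = time.monotonic() + timeout_s
    while not power.is_up(node):  # wait for the node to finish booting
        if time.monotonic() > deadline:
            return False
        time.sleep(poll_s)
    lrms.enable(node)  # only now let the scheduler use the node
    return True


class FakeCluster:
    """In-memory stand-in for both protocols; one job finishes per poll."""

    def __init__(self, jobs: int = 0) -> None:
        self.jobs, self.enabled, self.on = jobs, True, True
        self.events: list[str] = []

    def disable(self, node: str) -> None:
        self.enabled = False
        self.events.append("disable")

    def enable(self, node: str) -> None:
        self.enabled = True
        self.events.append("enable")

    def running_jobs(self, node: str) -> int:
        current = self.jobs
        self.jobs = max(0, self.jobs - 1)
        return current

    def power_on(self, node: str) -> bool:
        self.on = True
        self.events.append("on")
        return True

    def power_off(self, node: str) -> bool:
        self.on = False
        self.events.append("off")
        return True

    def is_up(self, node: str) -> bool:
        return self.on
```

Disabling the node before waiting for its jobs is what prevents the "node in error state" situation: the scheduler never sees a node vanish while it still believes the node is usable.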
Powering on a large number of nodes simultaneously can cause excessive electricity demand, made worse by the transient consumption peaks that power supplies draw at start-up. CLUES mitigates these peaks by powering nodes on in a controlled way, which limits the impact that switching nodes on and off has on the general electrical system.
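A controlled power-on can be approximated by switching nodes on in small batches with a pause between them, so that start-up current peaks do not all coincide. This is a minimal sketch of that technique, not CLUES's own scheduler: `batch_size` and `delay_s` are assumed tuning knobs rather than documented CLUES parameters, and `sleep` is injectable only so the function can be tried without waiting.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def staggered_power_on(nodes: Iterable[str], power_on: Callable[[str], object],
                       batch_size: int = 4, delay_s: float = 10.0,
                       sleep: Callable[[float], object] = time.sleep) -> list[list[str]]:
    """Switch nodes on `batch_size` at a time, pausing `delay_s` between batches.

    Returns the batches in the order they were started.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = list(nodes)
    batches: list[list[str]] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for node in batch:
            power_on(node)
        batches.append(batch)
        if start + batch_size < len(pending):
            sleep(delay_s)  # let start-up current settle before the next batch
    return batches
```

The trade-off is ramp-up time: with 40 nodes, `batch_size=4` and `delay_s=10`, there are 10 batches and 9 pauses, so the last batch starts 90 seconds after the first.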
CLUES also includes tools for generating reports on cluster operation, covering cluster usage (submitted jobs, active nodes, average values, waiting times, etc.). These reports can be exported to formats compatible with tools such as MS Excel™.
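CSV is one widely used format that spreadsheet tools such as MS Excel™ can open. The sketch below turns job records into a per-node usage table with Python's standard `csv` module; `JobRecord`, its fields and the chosen columns are hypothetical, not the actual CLUES report schema.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    job_id: str
    node: str
    submit_t: float  # seconds, same clock for all three timestamps
    start_t: float
    end_t: float


def usage_report_csv(jobs: list[JobRecord]) -> str:
    """One row per node: job count, average waiting time and total busy time."""
    by_node: dict[str, list[JobRecord]] = {}
    for job in jobs:
        by_node.setdefault(job.node, []).append(job)

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["node", "jobs", "avg_wait_s", "busy_s"])
    for node in sorted(by_node):
        records = by_node[node]
        writer.writerow([
            node,
            len(records),
            round(mean(r.start_t - r.submit_t for r in records), 1),
            round(sum(r.end_t - r.start_t for r in records), 1),
        ])
    return out.getvalue()
```

Writing to a string keeps the function easy to test; saving the report is then a matter of writing the returned text to a `.csv` file.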
The reports are a fundamental aid to energy management, enabling administrators to identify which nodes have been used the most and which have been idle the most. With this information, appropriate actions can be taken, such as changing what the nodes are dedicated to or adding specialized equipment. It is also possible to identify which subsystems have been most active, their impact on the infrastructure, etc.
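Identifying the most used and most idle nodes amounts to ranking nodes by utilization, that is, busy time divided by the length of the observation window. The sketch below is illustrative: the function names and the 10% "underused" cut-off are hypothetical choices, not values taken from CLUES.

```python
from __future__ import annotations


def rank_by_utilization(busy_s: dict[str, float],
                        window_s: float) -> list[tuple[str, float]]:
    """Utilization (busy time / observation window) per node, busiest first."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    util = {node: min(1.0, busy / window_s) for node, busy in busy_s.items()}
    # Descending utilization; ties broken by node name for stable output.
    return sorted(util.items(), key=lambda item: (-item[1], item[0]))


def underused(ranking: list[tuple[str, float]], threshold: float = 0.1) -> list[str]:
    """Nodes whose utilization falls below `threshold` (hypothetical cut-off)."""
    return [node for node, u in ranking if u < threshold]
```

Over a one-day window (86,400 s), a node busy for 43,200 s has a utilization of 0.5; nodes near the bottom of the ranking are the natural candidates for re-dedication.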
All this information supports decisions about administering equipment at the level of the management subsystems (batch-queuing system, cloud infrastructure, etc.). It also supports decisions about purchasing new equipment, since administrators get a global view of how the infrastructure is used.
Contact: +34963877023, Fax: +34963877274