| The computer industry is reaching a consensus that large-scale systems should be self-managing and self-optimizing. Otherwise, too many human experts would be needed to manage them, and their performance would still be inadequate because of constant, unpredictable changes in the external environment. The current industry-standard policy-based management approaches use built-in rules that specify what needs to be done when certain situations arise. These rules take actions that improve the situation, but because they are specified heuristically and the environment around the system evolves over time, it is very difficult to claim that such built-in rules actually optimize the system's behavior.

Another approach uses the concept of utility functions. In this approach, a utility function is formed that represents the system's expected future performance, and the system then automatically takes actions that optimize this utility function. This raises the level of abstraction at which humans get involved in system management: instead of specifying what actions the system should take, human administrators specify and adjust the optimization goals, which the system then automatically tries to achieve. To implement this approach, however, the administrators need to decide how to form the utility function for each decision-making agent.

Reinforcement Learning (RL) is a recently developed methodology for optimizing system management policies based on statistical estimation and maximization of the expected long-term utilities. The RL methodology can be used for learning the utility functions described above, as was demonstrated in the technical reports TR-2005-148, TR-2006-157, and TR-2007-164. The core idea of RL is to correlate the system's dynamically changing states and actions with the observed performance feedback, thus learning the expected value of the feedback signal starting from any state of the system.
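As a rough illustration of learning the expected value of the feedback signal from each state, the sketch below applies tabular TD(0), one standard RL update rule. The three-state system, its transitions, and its feedback values are hypothetical and are not taken from the technical reports, which may use a different estimator.

```python
import random


def td0_value_estimates(transitions, feedback, gamma=0.9, alpha=0.02,
                        steps=50000, seed=0):
    """Estimate the expected discounted feedback from each state with TD(0).

    transitions: dict mapping a state to its possible next states
                 (one is picked uniformly at random; an assumption).
    feedback:    dict mapping a state to the feedback observed on entering it.
    """
    rng = random.Random(seed)
    values = {state: 0.0 for state in transitions}
    state = next(iter(transitions))
    for _ in range(steps):
        next_state = rng.choice(transitions[state])
        # Observed feedback plus the discounted estimate for the next state.
        target = feedback[next_state] + gamma * values[next_state]
        # Move the current estimate a small step toward the target.
        values[state] += alpha * (target - values[state])
        state = next_state
    return values


# Hypothetical system with three load levels (illustrative only).
transitions = {
    "idle": ["idle", "busy"],
    "busy": ["idle", "busy", "overloaded"],
    "overloaded": ["busy", "overloaded"],
}
feedback = {"idle": 1.0, "busy": 0.5, "overloaded": -1.0}

values = td0_value_estimates(transitions, feedback)
```

In this toy setting the learned value of "idle" ends up higher than that of "overloaded", so the estimates can play the role of a utility function over states.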
RL can also be used for directly learning the best action to take in every state, thus obtaining the optimal policy for managing the system, as was demonstrated in the technical report TR-2007-169 linked below. Other frameworks besides RL have also been applied within this project. For example, the technical report TR-2009-179 describes an adaptive optimization approach in which the gradient of the performance function was continually estimated online, and the system's parameters were continually adjusted in the direction of the maximum performance gradient. The overall goal of this project is to create a generalized systemic foundation supporting autonomous adaptability and self-optimization of various systems and processes (using RL and other adaptive learning methods). Areas currently under investigation include:
- Dynamic migration of users in distributed virtual environments (in collaboration with project Darkstar)
- Dynamic bandwidth allocation among different packet flows (in collaboration with project Crossbow)
- Optimal cache-aware thread migration in Solaris
- Prediction of CPU power consumption of different applications based on performance monitoring events and then using this information for dynamic adjustment of the CPU frequency
- Dynamic tuning of garbage collection policies in the G1 HotSpot garbage collector (a preliminary report describing the mathematical framework of the G1 tuning is already available)
- Dynamic tuning of parameters for the Real-time Java garbage collector (a dynamic optimization algorithm has already been implemented and will appear in the next Real-time Java release; a report describing this work will be available soon)

The conference and journal publications for this project can be found at http://research.sun.com/people/vengerov/publications.html |
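The adaptive optimization idea mentioned above, estimating the gradient of the performance function online and adjusting parameters along it, can be sketched as follows. This is a minimal sketch that assumes a central finite-difference estimate and a single tunable parameter. The function names and the quadratic performance curve are hypothetical, and the estimator used in TR-2009-179 may differ.

```python
def tune_parameter(measure_performance, x0, step_size=0.1, probe=0.01,
                   iterations=200):
    """Repeatedly estimate the performance gradient and step along it."""
    x = x0
    for _ in range(iterations):
        # Central finite difference: probe slightly above and below x.
        gradient = (measure_performance(x + probe)
                    - measure_performance(x - probe)) / (2 * probe)
        # Adjust the parameter in the direction of increasing performance.
        x += step_size * gradient
    return x


def hypothetical_performance(setting):
    """Illustrative performance curve that peaks at setting == 3.0."""
    return -(setting - 3.0) ** 2


tuned = tune_parameter(hypothetical_performance, x0=0.0)
```

In a live system, `measure_performance` would return noisy measurements, so the step size and probe width would likely need to shrink over time or the measurements would need averaging. The sketch leaves that out.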