PASH

Productivity-Aware Scheduling in HPC


About PASH

Improving the productivity of a high-performance computing (HPC) system has been a non-trivial challenge since the inception of petascale computing. As modern HPC systems head toward the exascale era, the productivity challenge grows harder because solutions must also respect a constraint on system-wide power consumption. The U.S. Department of Energy has mandated that a future exascale system operate under a strict power budget of 20 MW to 30 MW, both to support efficient electricity generation and distribution and to keep the operational cost of an exascale system manageable. These challenges have contributed to pushing back the delivery of the first exascale system from 2018 to 2022. In the PASH project, we address the combined challenge of maximizing HPC productivity under a system-wide power constraint through power-aware resource management.

Publications and News

  1. "A Value-Oriented Job Scheduling Approach for Power-Constrained and Oversubscribed HPC Systems," IEEE Transactions on Parallel and Distributed Systems, 14 pages, Jan. 2020.
  2. "Adaptive Power Reallocation for Value-Oriented Schedulers in Power-Constrained HPC," 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 7 pages, Dec. 2019 (Best Paper Award).
  3. "Utility-based resource management in an oversubscribed energy-constrained heterogeneous environment executing parallel applications," Parallel Computing, 25 pages, Apr. 2019.
  4. "An empirical survey of performance and energy efficiency variation on Intel processors," 5th International Workshop on Energy Efficient Supercomputing (E2SC), 8 pages, Nov. 2017.
  5. "Value Based Scheduling for Oversubscribed Power-Constrained Homogeneous HPC Systems," International Conference on Cloud and Autonomic Computing (ICCAC), 11 pages, Sep. 2017.
  6. "Value of service based resource management for large-scale computing systems," Cluster Computing, 18 pages, May 2017.
  7. "Just In Time Architecture (JITA) for dynamically composable data centers," IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), 8 pages, Dec. 2016.
  8. "Value of Service Based Task Scheduling for Cloud Computing Systems," International Conference on Cloud and Autonomic Computing (ICCAC), 11 pages, Sep. 2016.
  9. "Value-Based Resource Management in High-Performance Computing Systems," ACM 7th Workshop on Scientific Cloud Computing, 8 pages, June 2016.

Project Status and Future Work

In the PASH project, we proposed several static and dynamic power-aware resource management strategies that maximize system productivity under a system-wide power constraint. We define a job's productivity with a time-dependent value function that captures how the importance of the application's output changes over time. Our resource management strategies combine these job value functions with each job's power-performance model to make informed scheduling decisions at runtime. We evaluated our strategies on a real HPC cluster at Lawrence Livermore National Laboratory, and we also developed a simulation environment to evaluate our scheduling algorithms on hypothetical systems.
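The idea of combining a time-dependent value function with a power-performance model can be illustrated with a small sketch. Everything in it is a hypothetical assumption chosen for illustration, not the published PASH algorithms: the `Job` fields, the value function shape (full value until a soft deadline, then linear decay to zero at a hard deadline), the per-node power-cap-to-runtime table standing in for a power-performance model, and the greedy "value per joule" selection that fills a system-wide power budget.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description; field names are illustrative only."""
    name: str
    max_value: float       # value earned if the job finishes by the soft deadline
    soft_deadline: float   # seconds from now
    hard_deadline: float   # value reaches zero at this time
    runtimes: dict[float, float]  # assumed power model: per-node cap (W) -> runtime (s)
    nodes: int


def value_at(job: Job, completion_time: float) -> float:
    """Assumed time-dependent value: constant, then linear decay to zero."""
    if completion_time <= job.soft_deadline:
        return job.max_value
    if completion_time >= job.hard_deadline:
        return 0.0
    remaining = job.hard_deadline - completion_time
    return job.max_value * remaining / (job.hard_deadline - job.soft_deadline)


def schedule(jobs: list[Job], power_budget: float, now: float = 0.0):
    """Greedy sketch: start (job, power cap) pairs with the highest value per joule
    while the total allocated power stays within the system-wide budget."""
    candidates = []
    for job in jobs:
        for cap, runtime in job.runtimes.items():
            value = value_at(job, now + runtime)
            energy = cap * job.nodes * runtime  # joules consumed at this cap
            if value > 0:
                candidates.append((value / energy, job, cap, value))
    candidates.sort(key=lambda c: c[0], reverse=True)

    chosen, used_power, started = [], 0.0, set()
    for _, job, cap, value in candidates:
        power = cap * job.nodes
        # Each job starts at most once, and only if its power fits the budget.
        if job.name in started or used_power + power > power_budget:
            continue
        chosen.append((job.name, cap, value))
        used_power += power
        started.add(job.name)
    return chosen, used_power


jobs = [
    Job("climate", 100, 3600, 7200, {200.0: 3000, 150.0: 4000}, nodes=10),
    Job("md", 40, 1800, 3600, {200.0: 1500, 150.0: 2000}, nodes=4),
]
chosen, used = schedule(jobs, power_budget=2500.0)
print(chosen, used)
```

With a 2500 W budget, the sketch starts `md` at the 200 W cap (800 W) and then runs `climate` at the lower 150 W cap (1500 W), accepting a smaller, time-decayed value because the 200 W setting would exceed the budget. A real scheduler would also reason about queued jobs over time and re-run the decision as jobs finish; this single-shot greedy pass only shows how value and power interact.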

In the future, we plan to extend our simulation environment to model manufacturing variations in the simulated CPUs, and to design scheduling algorithms for workflow-based applications.

Current Contributors