< session />

Advancing High-Performance Computing: The Hybrid Scheduler Approach at the D. E. Shaw Group

Thu, 25 April

This session offers a detailed discussion of a hybrid computing scheduler developed at the D. E. Shaw group. The scheduler was created to address the challenges of running high-performance computing workloads on on-premises compute clusters. These clusters, often used for scientific simulations and data analytics, are costly to maintain and update, and their resources are difficult to scale and test quickly. While the cloud isn't always more cost-efficient than an on-premises solution, it does offer much-needed flexibility.

Our high-performance hybrid computing scheduler allows our researchers to leverage both on-premises and cloud computing infrastructure using the same APIs they use internally. In addition to optimal scheduling, the hybrid computing scheduler offers advanced features like the ability to dynamically resize resources, predict future needs, monitor resource consumption, provide alerts, and more. It acts as a bridge between our researchers and both on-premises data centers and cloud computational resources.

In this session, we will present our hybrid computing scheduler and discuss (a) how we’ve addressed challenges such as application compatibility, cost management, vendor lock-in, and network connectivity, and (b) how the scheduler has been beneficial to our business.

< speaker_info />

About the speaker

Alex Tringham

Vice President, The D. E. Shaw Group

Alex Tringham is a Vice President in Quant Systems at the D. E. Shaw group in London. He leads the team responsible for the High-Performance Computing (HPC) and Orchestration platform. His team focuses on building robust, efficient systems, including large-scale platforms. He has expertise in distributed systems and cloud computing.