Developersummit
Orchestrating Thousands of GPUs: Engineering Patterns for Large-Scale Model Training
< session />

Tue, April 21 at 2:00 PM - 3:00 PM GMT+5:30
Architecture | DeepTech OpsTech

Training large AI models requires more than raw compute. It demands careful orchestration of multi-node GPU systems, robust communication, and disciplined engineering trade-offs. This session traces the shift from traditional computing models to large-scale parallel training, explaining how distributed training works beneath the surface and what it takes to make it reliable in production. The talk examines real-world challenges in distributed data processing, introduces the five dimensions of parallelism, and walks through practical heuristics and trade-off decisions used to scale AI training architectures across diverse hardware environments.

What You Will Learn

  • How gradient synchronization, collective operations, and fault tolerance operate in practice, including the role of communication libraries and backends such as NCCL, Gloo, and MPI

  • The five dimensions of parallelism and how data, tensor, pipeline, expert, and context parallelism are applied at scale

  • Engineering trade-offs across communication patterns, memory management, network topology, and resource utilization in distributed training systems
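
To give a feel for the gradient synchronization named in the first point, here is a minimal single-process sketch of a ring all-reduce that averages gradients across simulated workers. It is an illustration only and is not taken from the session material: the function name `ring_all_reduce_mean` is hypothetical, workers are plain Python lists rather than GPUs, and the sketch assumes the gradient length divides evenly by the worker count. Libraries such as NCCL provide tuned collectives for real interconnects; this only shows the data movement pattern.

```python
from typing import List


def ring_all_reduce_mean(grads: List[List[float]]) -> List[List[float]]:
    """Average gradients across simulated workers with a ring all-reduce.

    grads[r] is worker r's local gradient vector. Hypothetical helper for
    illustration; assumes len(grads[0]) is divisible by len(grads).
    """
    n = len(grads)
    size = len(grads[0])
    if size % n != 0:
        raise ValueError("sketch assumes the vector splits evenly into n chunks")
    chunk = size // n
    buf = [list(g) for g in grads]  # each worker's private working copy

    def span(c: int) -> range:
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: at each step worker r sends one chunk to its
    # right neighbour, which adds it to its own copy. After n - 1 steps,
    # worker r holds the fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing chunks first so sends happen "simultaneously".
        outgoing = [[buf[r][i] for i in span((r - step) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for i, v in zip(span((r - step) % n), outgoing[r]):
                buf[dst][i] += v

    # Phase 2, all-gather: completed chunks travel around the ring and
    # overwrite the partial sums until every worker has every chunk.
    for step in range(n - 1):
        outgoing = [[buf[r][i] for i in span((r + 1 - step) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for i, v in zip(span((r + 1 - step) % n), outgoing[r]):
                buf[dst][i] = v

    # Divide by the worker count to turn the sum into a mean gradient.
    return [[v / n for v in worker] for worker in buf]


if __name__ == "__main__":
    # Four simulated workers, each with an 8-element local gradient.
    local = [[float(r + 1)] * 8 for r in range(4)]
    synced = ring_all_reduce_mean(local)
    print(synced[0])
```

In this pattern each worker sends and receives 2 × (n - 1) chunks of size 1/n of the vector, so per-worker traffic stays close to twice the gradient size regardless of how many workers join the ring. That property is one reason ring-style collectives are a common starting point when discussing data-parallel training.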

Who Should Attend

  • Software Architects

  • Platform Engineers

  • Distributed Systems Engineers

  • Infrastructure and Systems Practitioners

  • Technical Leads working on large-scale compute platforms

< speaker_info />

About the speaker

Krishnaswamy Subramanian

Principal Consultant, Thoughtworks

Krishnaswamy Subramanian is a Principal Consultant at Thoughtworks with over 18 years of experience in custom software development. As an "expert generalist," he specializes in solving complex technical challenges across full-stack development, mobile applications, and DevOps. His expertise encompasses databases, infrastructure, and Kubernetes, with a proven track record of leading large-scale infrastructure projects.

Throughout his career, Krishnaswamy has served as technical leader, advisor, and principal architect. He is passionate about empowering teams and delivering impactful, scalable solutions. A dedicated knowledge sharer, he has presented at multiple conferences and actively contributes to open-source projects, demonstrating his commitment to technological innovation and community collaboration.

His technical approach focuses on understanding system architectures and creating innovative solutions through strategic development.
