AI Inference at Scale: Reliability, Observability, Cost, and Sustainability

Wed, April 22 at 11:00 AM - 12:00 PM GMT+5:30
Data | Tech Ops | Tech Architecture

AI inference has become the new production workload: always on, cost-intensive, and increasingly complex. Teams face unpredictable latency spikes, runaway GPU costs, and limited visibility across agentic and retrieval pipelines. This session presents a vendor-aware playbook for building reliable, observable, and sustainable inference systems at scale.

Grounded in the Google Cloud AI/ML Well-Architected Framework, Azure AI Workload Guidance, and Databricks Lakehouse Principles, the session explores practical strategies for managing latency, cost, and environmental impact. Attendees will learn how to design resilient inference flows using asynchronous queues, caching, and GPU pooling; implement full-stack observability for prompt, vector, and GPU metrics; and apply FinOps and GreenOps practices for financial and energy efficiency.

Through real-world case studies and cross-cloud design patterns, you will gain a framework for making AI inference performant, cost-effective, and planet-friendly.
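
As a rough illustration of the queueing, caching, and pooling ideas named in the description, here is a minimal Python sketch. It is not taken from the session: `InferenceGateway`, `fake_model`, and all parameter values are hypothetical, and the fixed set of worker tasks only stands in for a pool of GPU-backed model replicas.

```python
import asyncio
import hashlib


class InferenceGateway:
    """Hypothetical gateway: bounded async queue + response cache + worker pool."""

    def __init__(self, model_fn, workers=2, max_queue=100):
        self._model_fn = model_fn      # assumed: async callable, prompt -> str
        self._workers = workers        # stands in for the number of pooled GPUs
        self._max_queue = max_queue    # bound that applies backpressure to callers
        self._queue = None
        self._cache = {}
        self._tasks = []
        self.model_calls = 0           # counts cache misses that reached the model

    async def start(self):
        # Create the queue inside the running event loop.
        self._queue = asyncio.Queue(maxsize=self._max_queue)
        self._tasks = [asyncio.create_task(self._worker())
                       for _ in range(self._workers)]

    async def stop(self):
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)

    async def _worker(self):
        while True:
            prompt, fut = await self._queue.get()
            try:
                self.model_calls += 1
                result = await self._model_fn(prompt)
                if not fut.done():     # the caller may have timed out already
                    fut.set_result(result)
            except Exception as exc:
                if not fut.done():
                    fut.set_exception(exc)
            finally:
                self._queue.task_done()

    async def infer(self, prompt, timeout=5.0):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self._cache:         # cache hit: skip the queue entirely
            return self._cache[key]
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, fut))   # waits here when the queue is full
        result = await asyncio.wait_for(fut, timeout)
        self._cache[key] = result
        return result


async def fake_model(prompt):
    """Placeholder for a real model call."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def demo():
    gateway = InferenceGateway(fake_model, workers=2)
    await gateway.start()
    first = await gateway.infer("hello")
    second = await gateway.infer("hello")      # served from the cache
    await gateway.stop()
    return first, second, gateway.model_calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The bounded queue makes overload visible as waiting or timeouts at the caller instead of as unbounded memory growth, and the exact-match cache is the simplest possible choice; a production system would likely add eviction and expiry.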

What You Will Learn

  • How to engineer reliable inference pipelines using queueing, caching, and GPU pooling

  • Methods for full-stack observability across prompts, vector queries, and GPU utilization

  • FinOps guardrails for cost control and GreenOps strategies for sustainable AI workloads

  • How to align reliability, cost, and sustainability principles across GCP, Azure, and Databricks
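
One way to picture a FinOps guardrail combined with a basic latency metric is the Python sketch below. It is an assumption-laden illustration rather than the speaker's method: `BudgetGuard`, the budget, and the per-token price are all hypothetical values chosen for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BudgetGuard:
    """Hypothetical per-tenant guardrail: daily spend cap plus latency tracking."""

    daily_budget_usd: float
    usd_per_1k_tokens: float           # assumed blended price, not a real rate
    spent_usd: float = 0.0
    latencies_ms: list = field(default_factory=list)

    def cost_of(self, tokens: int) -> float:
        return tokens / 1000 * self.usd_per_1k_tokens

    def allow(self, estimated_tokens: int) -> bool:
        """Admit the request only if it fits in the remaining budget."""
        projected = self.spent_usd + self.cost_of(estimated_tokens)
        return projected <= self.daily_budget_usd

    def record(self, tokens: int, latency_ms: float) -> None:
        """Record actual usage after the request completes."""
        self.spent_usd += self.cost_of(tokens)
        self.latencies_ms.append(latency_ms)

    def p95_latency_ms(self) -> float:
        """Nearest-rank 95th percentile of recorded latencies (0.0 if none)."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        rank = math.ceil(0.95 * len(ordered))
        return ordered[rank - 1]


guard = BudgetGuard(daily_budget_usd=1.0, usd_per_1k_tokens=0.5)
if guard.allow(estimated_tokens=1000):
    guard.record(tokens=1000, latency_ms=120.0)
```

Checking the budget before the call and recording actual usage afterwards keeps the guardrail independent of any particular model provider; the same counters could be exported to whatever observability stack a team already runs.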

Who Should Attend

AI engineers, software architects, DevOps specialists, and FinOps or GreenOps practitioners responsible for optimizing large-scale AI inference systems for performance, cost, and sustainability.

About the speaker

Rohit Bhardwaj

Director of Architecture, Expert in Cloud-native Solutions

Rohit Bhardwaj is a Director of Architecture at Salesforce. He has extensive experience architecting multi-tenant, cloud-native solutions built on resilient microservices and service-oriented architectures using the AWS stack. He also has a proven record of designing solutions and delivering transformational programs that reduce costs and increase efficiency.

As a trusted advisor, leader, and collaborator, Rohit applies problem-solving, analytical, and operational skills to every initiative, and develops strategic requirements and solution analysis through all stages of the project life cycle, from product readiness to execution.

Rohit excels at designing scalable cloud microservice architectures with Spring Boot and Netflix OSS technologies on AWS and Google Cloud. As a Security Ninja, he looks for ways to resolve application security vulnerabilities through ethical hacking and threat modeling. He enjoys architecting cloud solutions with Docker, Redis, NGINX, RightScale, RabbitMQ, Apigee, Azul Zing, Actuate BIRT reporting, Chef, Splunk, REST Assured, SoapUI, Dynatrace, and EnterpriseDB. He has also developed lambda architecture solutions using Apache Spark, Cassandra, and Camel for real-time analytics and integration projects.

Rohit holds an MBA in Corporate Entrepreneurship from Babson College and a Master's in Computer Science from Boston University and Harvard University. He is a regular speaker at No Fluff Just Stuff, UberConf, RichWeb, GIDS, and other international conferences.
