
“Once again Saltmarch has knocked it out of the park with interesting speakers, engaging content and challenging ideas. No jetlag fog at all, which counts for how interesting the whole thing was.”
Cybersecurity Lead, PwC

“Very much looking forward to next year. I will be keeping my eye out for the date so I can make sure I lock it in my calendar.”
Software Engineering Specialist, Intuit

“Best conference I have ever been to with lots of insights and information on next generation technologies and those that are the need of the hour.”
Software Architect, Groupon

“Happy to meet everyone who came from near and far. Glad to know you've discovered some great lessons here, and glad you joined us for all the discoveries great and small.”
Web Architect & Principal Engineer, Scott Davis

“Wonderful set of conferences, well organized, fantastic speakers, and an amazingly interactive set of audience. Thanks for having me at the events!”
Founder of Agile Developer Inc., Dr. Venkat Subramaniam

“What a buzz! The events have been instrumental in bringing the whole software community together. There has been something for everyone from developers to architects to business to vendors. Thanks everyone!”
Voltaire Yap, Global Events Manager, Oracle Corp.

AI inference has become the new production workload: always on, cost-intensive, and increasingly complex. Teams face unpredictable latency spikes, runaway GPU costs, and limited visibility across agentic and retrieval pipelines. This session presents a vendor-aware playbook for building reliable, observable, and sustainable inference systems at scale.
Grounded in the Google Cloud AI/ML Well-Architected Framework, Azure AI Workload Guidance, and Databricks Lakehouse Principles, the session explores practical strategies for managing latency, cost, and environmental impact. Attendees will learn how to design resilient inference flows using asynchronous queues, caching, and GPU pooling; implement full-stack observability for prompt, vector, and GPU metrics; and apply FinOps and GreenOps practices for financial and energy efficiency.
Through real-world case studies and cross-cloud design patterns, you will gain a framework for making AI inference performant, cost-effective, and planet-friendly.

What You Will Learn
How to engineer reliable inference pipelines using queueing, caching, and GPU pooling
Methods for full-stack observability across prompts, vector queries, and GPU utilization
FinOps guardrails for cost control and GreenOps strategies for sustainable AI workloads
How to align reliability, cost, and sustainability principles across GCP, Azure, and Databricks

Who Should Attend
AI engineers, software architects, DevOps specialists, and FinOps or GreenOps practitioners responsible for optimizing large-scale AI inference systems for performance, cost, and sustainability.