
An agent that performs well in a demo still faces a harder test in production, where real users, changing prompts, and unstable tools expose hidden weaknesses. This session focuses on turning a working agent into a system you can trust. Using a single concrete agent as the running example, it defines what reliability means for multi-step, tool-using behavior, covering success, partial success, and failure modes. It then shows how to design evaluations that reflect real usage by building golden datasets grounded in actual user intent and scenario-based tests that cover full action paths. You will also learn how to structure evaluation runs, score outcomes to surface brittleness and silent failures, and set up regression tests that detect breakage as prompts, tools, or APIs change over time.
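As a rough illustration of classifying an agent run as success, partial success, or failure against a golden-dataset entry, here is a minimal Python sketch. It is not the session's own material: the names GoldenCase, AgentRun, and classify, the trace format, and the tool names are all hypothetical, and exact matching on answers and tool sequences is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


@dataclass(frozen=True)
class GoldenCase:
    """One golden-dataset entry: a user intent plus the expected result (hypothetical schema)."""
    intent: str
    expected_tools: tuple[str, ...]  # tool calls the agent should make, in order
    expected_answer: str


@dataclass(frozen=True)
class AgentRun:
    """What the agent actually did for one case (hypothetical trace format)."""
    tools_called: tuple[str, ...]
    answer: str


def classify(case: GoldenCase, run: AgentRun) -> Outcome:
    """Score both the final answer and the full action path."""
    answer_ok = run.answer.strip().lower() == case.expected_answer.strip().lower()
    path_ok = run.tools_called == case.expected_tools
    if answer_ok and path_ok:
        return Outcome.SUCCESS
    if answer_ok or path_ok:
        # A right answer reached by the wrong path (or the reverse) is a
        # candidate silent failure, so it is not counted as a full success.
        return Outcome.PARTIAL
    return Outcome.FAILURE


# Illustrative data only: the agent skipped the lookup step but still answered correctly.
case = GoldenCase("refund order 1234", ("lookup_order", "issue_refund"), "Refund issued")
run = AgentRun(("issue_refund",), "refund issued")
print(classify(case, run))  # Outcome.PARTIAL
```

Scoring the action path separately from the final answer is what lets a run with a plausible answer and a skipped step show up as a partial success instead of passing unnoticed.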
What You Will Learn
How to define and classify reliability for multi-step agents, including partial success and failure modes
How to build production-relevant evaluation suites using golden datasets and scenario-based action-path tests
How to operationalize evaluation through scoring, regression testing, and monitoring for prompt, tool, and API drift
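One way to picture the regression-testing point above is a check that compares per-case outcomes from a baseline evaluation run with those from a run made after a prompt, tool, or API change. The sketch below is a hypothetical illustration: the function find_regressions, the outcome labels, and the case ids are assumptions, not part of the session material.

```python
# Rank outcomes so that a drop in rank counts as a regression (labels are assumptions).
RANK = {"failure": 0, "partial": 1, "success": 2}


def find_regressions(baseline: dict[str, str], candidate: dict[str, str]) -> list[str]:
    """Return case ids whose outcome is worse in the candidate run than in the baseline."""
    regressed = []
    for case_id, old in baseline.items():
        new = candidate.get(case_id, "failure")  # a missing case is treated as a failure
        if RANK[new] < RANK[old]:
            regressed.append(case_id)
    return sorted(regressed)


# Illustrative data only: outcomes before and after a prompt change.
baseline = {"refund-01": "success", "refund-02": "partial", "lookup-01": "success"}
candidate = {"refund-01": "success", "refund-02": "failure", "lookup-01": "partial"}

print(find_regressions(baseline, candidate))  # ['lookup-01', 'refund-02']
```

Comparing case by case, not by aggregate pass rate, keeps a regression in one scenario from being masked by an improvement in another.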
Who Should Attend
Developers building or maintaining AI agents
Engineers responsible for testing and reliability of AI systems
AI and ML practitioners deploying agents into production environments
Technical leads overseeing quality and long-term robustness of agent-based systems