
Krakow, Poland, 17 - 19 June 2026
Building AI agents has become accessible, but making them reliable in production remains a major engineering challenge. Unlike deterministic software, agents are probabilistic, so a binary "Pass/Fail" test often fails to capture the nuances of an agent's reasoning process.
In this talk, we explore "Evaluation-Driven Development"—a paradigm shift for Python engineers building AI systems. We will focus on measuring the quality of agent trajectories using Python tools and visualizations.
The session covers:
- From Testing to Evaluation: Why we need to move beyond standard assertions to probabilistic scoring (0.0 to 1.0) for Generative AI.
- Metrics as Code: Implementing specific evaluation metrics in Python:
  - Faithfulness: Scoring whether the answer is grounded in the retrieved context to detect hallucinations.
  - Tool Selection Accuracy: Evaluating if the agent chose the correct tool (e.g., search vs. calculation) for the user's intent.
  - Answer Relevancy: Using embedding similarity to measure if the response actually answers the prompt.
- Visualizing the Black Box: A live demo using Streamlit. We will showcase a custom dashboard that runs these evaluations, allowing developers to visualize the "reasoning trace" and identify exactly where the agent failed (Retrieval layer vs. Generation layer).
- The Feedback Loop: How to use these evaluation scores to iteratively improve prompts and context retrieval logic.
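As a rough illustration of the "Metrics as Code" idea above, here is a minimal sketch of the three metrics as plain Python functions that each return a score between 0.0 and 1.0. The function names are hypothetical, and the token-overlap and bag-of-words calculations are simplified stand-ins (assumptions, not the talk's implementation) for LLM-based grounding checks and real embedding similarity.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tool_selection_accuracy(chosen: list[str], expected: list[str]) -> float:
    """Fraction of expected steps where the agent picked the expected tool."""
    if not expected:
        return 0.0
    hits = sum(c == e for c, e in zip(chosen, expected))
    return hits / len(expected)


def answer_relevancy(prompt: str, answer: str) -> float:
    """Cosine similarity of bag-of-words vectors (placeholder for embeddings)."""
    a, b = Counter(_tokens(prompt)), Counter(_tokens(answer))
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def faithfulness(answer: str, context: str) -> float:
    """Share of answer tokens that also appear in the retrieved context."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    context_vocab = set(_tokens(context))
    return sum(t in context_vocab for t in answer_tokens) / len(answer_tokens)


if __name__ == "__main__":
    # Hypothetical single-turn trace, invented purely for illustration.
    scores = {
        "tool_selection": tool_selection_accuracy(
            ["search", "calculator"], ["search", "search"]
        ),
        "answer_relevancy": answer_relevancy(
            "What is the capital of France?", "The capital of France is Paris."
        ),
        "faithfulness": faithfulness(
            "Paris is the capital", "The capital of France is Paris"
        ),
    }
    for name, score in scores.items():
        print(f"{name}: {score:.2f}")
```

Because every metric returns a float rather than a boolean, scores from many runs can be averaged, thresholded, or plotted over time, which is what separates this style of evaluation from a single assertion. In practice the bag-of-words vectors would likely be replaced by vectors from an embedding model, and the token-overlap check by a stronger grounding test.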
Sho Tanaka
Snowflake
Sho Tanaka is a Lead Developer Advocate at Snowflake, focused on AI/ML and data engineering. He previously worked at Google (gTech), delivering ML and data solutions across Japan, APAC, and globally. He is a Google Developer Expert (AI/ML) and a co-founder of the MLOps community in Japan. He enjoys turning messy real-world ML projects into reproducible, production-minded architectures.
Venue address
ICE Krakow, ul. Marii Konopnickiej 17
Phone
+48 691 793 877
info@devoxx.pl
