Krakow, Poland, 17 - 19 June 2026

Sho Tanaka
Snowflake

Sho Tanaka is a Lead Developer Advocate at Snowflake, focused on AI/ML and data engineering. He previously worked at Google (gTech), delivering ML and data solutions across Japan, APAC, and globally. He is a Google Developer Expert (AI/ML) and a co-founder of an MLOps community in Japan. He enjoys turning messy real-world ML projects into reproducible, production-minded architectures.

How to evaluate AI Agent to be robust Intelligence
Conference - Short (INTERMEDIATE level)

Building AI agents has become accessible, but making them reliable in production remains a major engineering challenge. Unlike deterministic software, agents are probabilistic: the same input can produce different outputs from one run to the next. A binary "Pass/Fail" test is often insufficient to capture the nuances of an agent's reasoning process.
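
As a rough illustration of why a single assertion falls short, the sketch below scores several runs of the same prompt and averages them. The sample outputs, the function names, and the substring-based scoring rule are all assumptions made for this example, not material from the talk.

```python
from statistics import mean


def exact_match_score(answer: str, expected: str) -> float:
    """Return 1.0 if the expected string appears in the answer, else 0.0."""
    return 1.0 if expected.lower() in answer.lower() else 0.0


def pass_rate(answers: list[str], expected: str) -> float:
    """Average the per-run scores into one value between 0.0 and 1.0."""
    return mean(exact_match_score(a, expected) for a in answers)


# Hypothetical outputs from sending the same prompt to an agent four times.
runs = ["Paris", "The capital is Paris.", "I think it is Lyon.", "paris"]
score = pass_rate(runs, "Paris")
```

Here `score` is 0.75. A single assertion on any one run would report either a clean pass or a clean fail, while the averaged score shows that the agent is right three times out of four.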

In this talk, we explore "Evaluation-Driven Development"—a paradigm shift for Python engineers building AI systems. We will focus on measuring the quality of agent trajectories using Python tools and visualizations.

The session covers:

  1. From Testing to Evaluation: Why we need to move beyond standard assertions to probabilistic scoring (0.0 to 1.0) for Generative AI.
  2. Metrics as Code: Implementing specific evaluation metrics in Python:
    1. Faithfulness: Scoring whether the answer is grounded in the retrieved context to detect hallucinations.
    2. Tool Selection Accuracy: Evaluating whether the agent chose the correct tool (e.g., search vs. calculation) for the user's intent.
    3. Answer Relevancy: Using embedding similarity to measure whether the response actually answers the prompt.
  3. Visualizing the Black Box: A live demo using Streamlit. We will showcase a custom dashboard that runs these evaluations, allowing developers to visualize the "reasoning trace" and identify exactly where the agent failed (Retrieval layer vs. Generation layer).
  4. The Feedback Loop: How to use these evaluation scores to iteratively improve prompts and context retrieval logic.
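
The three metrics listed under "Metrics as Code" could be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the speaker's implementation: the function names are hypothetical, the faithfulness score is a crude token-overlap proxy standing in for a proper grounding check, and the relevancy function expects embedding vectors that a real system would obtain from an embedding model.

```python
import math


def tool_selection_accuracy(chosen: list[str], expected: list[str]) -> float:
    """Fraction of steps where the agent picked the expected tool."""
    if not expected:
        return 0.0
    hits = sum(c == e for c, e in zip(chosen, expected))
    return hits / len(expected)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def faithfulness_proxy(answer: str, context: str) -> float:
    """Assumed proxy: share of answer tokens that also occur in the context."""
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    grounded = sum(t in context_tokens for t in answer_tokens)
    return grounded / len(answer_tokens)


# Hypothetical three-step trajectory: the agent's tool choices vs. a labelled reference.
accuracy = tool_selection_accuracy(
    ["search", "calculator", "search"],
    ["search", "calculator", "calculator"],
)

# Toy vectors standing in for prompt and response embeddings.
relevancy = cosine_similarity([1.0, 0.0], [1.0, 0.0])

# Two of the three answer tokens appear in the retrieved context.
faithfulness = faithfulness_proxy("paris is large", "paris is the capital")
```

Each function returns a value between 0.0 and 1.0 for these inputs, so the scores can be logged per run, compared across prompt versions, or plotted on a dashboard.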

Venue address

ICE Krakow, ul. Marii Konopnickiej 17

Phone

+48 691 793 877

Email

info@devoxx.pl