evaluation
A list of content tagged evaluation
- Why We Are Excited About Confessions
- How confessions can keep language models honest
- Introducing Metrax: performant, efficient, and robust model evaluation metrics in JAX
- Measuring AI Ability to Complete Long Tasks
- Evaluating Context Compression for AI Agents
- Predictive Human Preference: From Model Ranking to Model Routing