Luis Quintanilla

evaluation

A list of content tagged evaluation

Responses

  • Why We Are Excited About Confessions
  • How confessions can keep language models honest
  • Introducing Metrax: performant, efficient, and robust model evaluation metrics in JAX
  • Measuring AI Ability to Complete Long Tasks
  • Evaluating Context Compression for AI Agents
  • Predictive Human Preference: From Model Ranking to Model Routing

Bookmarks

  • Evaluating LLMs is a minefield
  • Best Practices for LLM Evaluation of RAG Applications
  • MLflow 2.8 with LLM-as-a-judge metrics and Best Practices for LLM Evaluation of RAG Applications