Agents, LLM Evals, OpenTelemetry – Lessons from Building Arize Phoenix in 2024
We're the team behind Arize Phoenix, an open-source LLM observability tool that we launched last year.

This past year, we watched the AI industry shift incredibly quickly. Agents became the new norm, LLM evaluation went from a niche interest to a product requirement, and OpenTelemetry became the LLM observability standard. And somewhere in the middle of all that, our open-source project went from 20k monthly downloads to 2.5M.

Along the way, we learned a ton: what it takes to adapt OpenTelemetry for LLM observability, why agents need better evaluation frameworks than existing function-calling evals and skill checks provide, and how to combine synthetic and non-synthetic data effectively for evaluations. It wasn't always smooth sailing, but working through these problems felt like doing our small part to move the whole space forward.
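
For anyone curious what OpenTelemetry-based LLM tracing looks like in practice, here's a minimal sketch using only the standard OpenTelemetry Python SDK, exporting spans over OTLP/HTTP to a locally running collector such as Phoenix. The endpoint and span attribute names are illustrative assumptions rather than exact Phoenix conventions; the project docs cover the supported auto-instrumentors.

    # Minimal sketch: trace an LLM call with the OpenTelemetry Python SDK and
    # export spans over OTLP/HTTP to a locally running collector (e.g. Phoenix).
    # The endpoint and span attribute names below are illustrative assumptions.
    from opentelemetry import trace
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

    # Point the exporter at the collector (adjust to wherever yours accepts OTLP/HTTP traces).
    provider = TracerProvider(resource=Resource.create({"service.name": "my-llm-app"}))
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
    )
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer(__name__)

    def ask_llm(prompt: str) -> str:
        # Wrap the model call in a span and record the prompt/completion as
        # attributes so the trace is searchable in the observability UI.
        with tracer.start_as_current_span("llm.chat_completion") as span:
            span.set_attribute("llm.prompt", prompt)
            completion = "..."  # call your model provider here
            span.set_attribute("llm.completion", completion)
            return completion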

The support from the community has fueled our ability to double down on development. We wrapped up 2024 with major launches like Prompt Playground and Sessions, and we have a ton more on the horizon.

Check out our full reflection for more: https://arize.com/blog/arize-phoenix-2024-in-review/

And finally, a big thank you to everyone who's tried out Arize Phoenix, left comments, submitted issues, and come to meetups. We, quite literally, couldn't do it without you!

- The Phoenix Team (Mikyo, Xander, Dustin, Roger, Parker, Tony, and John)
