
Traceloop Launches to Replace Vibes and Prompting With Data and Insight

Traceloop optimizes AI agent performance as an observability and evaluation layer on top of frameworks like MCP, Agent2Agent, and AGNTcy

Traceloop today announced its launch for general availability as well as the closing of $6.1 million in seed funding led by Sorenson Capital and Ibex Investors, with additional participation from Y Combinator, Samsung NEXT, and Grand Ventures. The investment will accelerate product development, expand go-to-market efforts, and support Traceloop’s mission to make AI agents production-ready and enterprise-grade.

Public model benchmarks don’t reveal how a new AI model will actually perform in a business setting. Today, businesses learn only by watching an agent go wrong, then spend days tweaking prompts in a tedious trial-and-error cycle. The software engineering practices of Continuous Integration and Continuous Deployment (CI/CD) have yet to be established for AI agents, and the result is customer churn as users disengage from unpredictable, buggy assistants.

“Prompt engineering shouldn’t be a guessing game or have to rely on ‘vibes’ to be successful,” said Nir Gazit, co-founder and CEO, Traceloop. “It should be like the rest of engineering – observable, testable, and reliable. When we bring the same rigor to AI that we expect from the rest of our stack, we unlock its full potential.”

New agent frameworks and protocols from OpenAI, Anthropic, and Google make it easier than ever to connect AI systems to external data and trigger autonomous actions. But as these agents grow more complex, developers face two persistent challenges. First, they lack visibility into how decisions are made. Second, they have no reliable way to evaluate performance in real-world conditions. With varying and arbitrary criteria, public benchmarks and testing methods often fall short once applications move into production. When AI agents misfire, whether by hallucinating, taking the wrong action, or producing unpredictable outputs, users do not file bug reports. They disengage.

"Trust but verify. It's no secret that LLMs represent a step-function improvement in how humans interact with data. But their confidence — and potential for inaccuracy — makes AI agents that much more dangerous. IBM, Cisco, Dynatrace, and others already rely on Traceloop's core technology for agent observability and verification, ensuring that AI agents function as intended. I expect the adoption of verification tools to outpace that of LLMs themselves," said Aaron Rinberg of Ibex Investors.

Built on open-source OpenLLMetry, Traceloop is now available as a commercial platform that helps teams test, troubleshoot, and improve AI agents before they reach users. By replacing manual “vibe checks” with automated evaluations, it gives teams the tools to reduce guesswork, deploy changes more frequently with data-backed confidence, and catch quality issues before they reach production – enabling faster iteration, more reliable outputs, and greater confidence in every release.
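The shift from manual “vibe checks” to automated evaluations can be sketched in a few lines. The harness below is purely illustrative: the function names, the keyword-matching metric, and the stand-in agent are hypothetical examples of the general pattern, not Traceloop’s actual API or scoring methodology.

```python
# Minimal sketch of an automated evaluation harness for an AI agent.
# Illustrative only: names and the scoring metric are hypothetical,
# not Traceloop's actual API.

def contains_required_facts(output: str, required: list[str]) -> bool:
    """Pass if every required fact appears in the agent's output."""
    return all(fact.lower() in output.lower() for fact in required)

def run_eval_suite(agent, cases):
    """Run each test case through the agent, score it, and return the pass rate."""
    results = []
    for case in cases:
        output = agent(case["prompt"])
        passed = contains_required_facts(output, case["required"])
        results.append({"prompt": case["prompt"], "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, results

# Stand-in "agent": a canned responder used here in place of a real LLM call.
def fake_agent(prompt: str) -> str:
    return "Your order #123 ships Tuesday via FedEx."

cases = [
    {"prompt": "When does my order ship?", "required": ["Tuesday"]},
    {"prompt": "Which carrier is delivering it?", "required": ["FedEx"]},
    {"prompt": "What's the tracking number?", "required": ["tracking"]},
]

rate, results = run_eval_suite(fake_agent, cases)
print(f"pass rate: {rate:.0%}")  # the third case fails: no tracking number given
```

Running a suite like this on every prompt or model change, rather than eyeballing a handful of outputs, is what makes it possible to deploy with data-backed confidence instead of guesswork.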

“Miro Insights processes millions of conversations across their platform. At that scale, edge cases appear almost instantly. We can’t assume that what works in testing will behave the same way in production,” said Eu-Tak Kong, AI Engineer at Miro. “Traceloop gives us real-world performance visibility, flags critical edge cases, and helps us confidently experiment with and migrate to new models like GPT-4.1 without disrupting the user experience.”

“Agents are rapidly becoming AI’s de facto customer-facing technology, but teams are still relying on customer feedback to iterate,” said Vidya Raman, partner at Sorenson Capital. “Traceloop has the right technology at the right moment to offer substantial value immediately.”

Traceloop was founded by a veteran team with experience spanning machine learning, artificial intelligence, and enterprise software development. Gazit’s tenure includes four years at Google, where he led a team of engineers building models to predict user engagement and retention using internal LLMs. Gal Kleinman, Traceloop’s co-founder and CTO, previously led the development of Fiverr’s machine learning platform and data infrastructure.

The company’s open-source technology, OpenLLMetry, powers critical AI systems at enterprise scale. IBM uses OpenLLMetry together with its Instana platform to monitor the performance of large language models running on services like Amazon Bedrock and IBM watsonx.ai, helping teams understand how AI applications behave in real-world conditions. Altogether, OpenLLMetry now sees half a million monthly installs across its open-source packages, with 5,600 GitHub stars, more than sixty contributors, and 50,000 weekly active installations of the SDK.

For more information, or to try a demo, visit the Traceloop website.

About Traceloop

Traceloop stops developers from shipping agents based on vibes. The company brings automated evaluation and monitoring to generative AI development to predict performance and prevent errors from reaching users. It is an enterprise-ready solution built by the developers who created the industry-leading open-source technology OpenLLMetry.

Traceloop is backed by marquee venture capitalists including Sorenson Capital, Ibex Investors, Y Combinator, Samsung NEXT, and Grand Ventures.

