Scale AI
Scale AI provides data labeling, RLHF annotation, and AI evaluation services for training frontier models. Powers the data…
Humanloop
Humanloop is an LLM engineering platform for managing prompts, evaluating quality, and fine-tuning models. Teams iterate on AI…
Vellum AI
Vellum is an AI product development platform with prompt versioning, side-by-side comparisons, and evaluation workflows. Product and engineering…
Mirascope
Mirascope is a Python toolkit for building LLM applications with clean abstractions for prompts, calls, and extractions. Type-safe…
HoneyHive
HoneyHive is an AI evaluation and observability platform for teams building LLM applications. Dataset management, automated evaluations, and…
Agenta
Agenta is an open-source LLMOps platform for prompt management, evaluation, and deployment. Teams collaborate on prompts, run systematic…
Eden AI
Eden AI provides a unified API for 100+ AI models across text, image, audio, and video. Test and…
Portkey AI
Portkey is an AI gateway providing unified access to 200+ LLMs with built-in observability, caching, and fallbacks. Production-grade…
Unify AI
Unify automatically routes LLM requests to the cheapest or fastest provider based on your optimization criteria. Benchmark any…
DeepEval
DeepEval is an open-source LLM evaluation framework with 14+ evaluation metrics including hallucination, answer relevancy, and faithfulness. pytest-style…
TruLens
TruLens is an open-source framework for evaluating and tracking LLM applications. Feedback functions assess truthfulness, harmlessness, and helpfulness…
Phoenix by Arize
Phoenix by Arize is an open-source AI observability library for ML engineers. Traces LLM and embedding applications, visualizes…