Scale AI provides data labeling, RLHF annotation, and AI evaluation services for training frontier models. Powers the data…
Humanloop is an LLM engineering platform for managing prompts, evaluating quality, and fine-tuning models. Teams iterate on AI…
Vellum is an AI product development platform with prompt versioning, side-by-side comparisons, and evaluation workflows. Product and engineering…
Mirascope is a Python toolkit for building LLM applications with clean abstractions for prompts, calls, and extractions. Type-safe…
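A minimal sketch of Mirascope's decorator-based call-and-extraction style; the `response_model` parameter and exact imports follow its docs but may shift between versions, so treat them as assumptions:

```python
from pydantic import BaseModel
from mirascope.core import openai, prompt_template


class Book(BaseModel):
    """Typed extraction target, validated by Pydantic."""
    title: str
    author: str


# response_model turns a plain call into a typed extraction (assumed
# parameter name from Mirascope's docs; verify for your version).
@openai.call("gpt-4o-mini", response_model=Book)
@prompt_template("Extract the book mentioned in: {text}")
def extract_book(text: str): ...


book = extract_book("I loved The Name of the Wind by Patrick Rothfuss.")
print(book.title, "-", book.author)
```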
HoneyHive is an AI evaluation and observability platform for teams building LLM applications. Dataset management, automated evaluations, and…
Agenta is an open-source LLMOps platform for prompt management, evaluation, and deployment. Teams collaborate on prompts, run systematic…
Eden AI provides a unified API for 100+ AI models across text, image, audio, and video. Test and…
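Illustrative only: one unified request fanned out to two providers. The endpoint path, payload keys, and response shape are assumptions based on Eden AI's documented v2 URL pattern:

```python
import requests

API_KEY = "YOUR_EDEN_AI_KEY"  # placeholder

# Hypothetical unified call: /v2/{feature}/{subfeature} is the pattern
# Eden AI documents; the exact path and body keys may differ.
response = requests.post(
    "https://api.edenai.run/v2/text/sentiment_analysis",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "providers": "google,amazon",  # compare two vendors in one call
        "text": "The new release is fantastic!",
        "language": "en",
    },
    timeout=30,
)
result = response.json()

# Each provider's result comes back under its own key; the inner
# shape varies per feature, so this just prints the raw payloads.
for provider in ("google", "amazon"):
    print(provider, result.get(provider))
```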
Portkey is an AI gateway providing unified access to 200+ LLMs with built-in observability, caching, and fallbacks. Production-grade…
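A hedged sketch of the gateway pattern: point the standard OpenAI SDK at Portkey and select the upstream provider via headers. The base URL and header names follow Portkey's docs, but verify them against the current reference:

```python
from openai import OpenAI

# Route OpenAI SDK traffic through the Portkey gateway. The header
# names are assumptions from Portkey's documented setup.
client = OpenAI(
    api_key="PROVIDER_API_KEY",            # your upstream provider key
    base_url="https://api.portkey.ai/v1",  # Portkey gateway endpoint
    default_headers={
        "x-portkey-api-key": "PORTKEY_API_KEY",
        "x-portkey-provider": "openai",    # which upstream to target
    },
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)
```

Because the gateway speaks the OpenAI wire format, caching, fallbacks, and logging apply without changing application code.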
Unify automatically routes LLM requests to the cheapest or fastest provider based on your optimization criteria. Benchmark any…
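A rough sketch, assuming Unify's OpenAI-compatible endpoint and its `model@provider` string format; the routing suffix shown is an assumption used only to illustrate the idea:

```python
from openai import OpenAI

# Unify exposes an OpenAI-compatible API; base URL is an assumption
# from its docs, so confirm before relying on it.
client = OpenAI(
    api_key="UNIFY_API_KEY",
    base_url="https://api.unify.ai/v0",
)

reply = client.chat.completions.create(
    # A routing suffix after "@" asks Unify to pick the endpoint that
    # best matches your criterion (cost, speed, etc.); a concrete
    # "model@provider" string pins a single endpoint instead. The
    # exact suffix names here are illustrative, not authoritative.
    model="llama-3.1-8b-chat@lowest-input-cost",
    messages=[{"role": "user", "content": "Summarize RLHF in one line."}],
)
print(reply.choices[0].message.content)
```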
DeepEval is an open-source LLM evaluation framework with 14+ evaluation metrics including hallucination, answer relevancy, and faithfulness. pytest-style…
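The pytest-style workflow in brief; this mirrors DeepEval's documented quickstart, though the metric choice and threshold here are just examples:

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric


def test_answer_relevancy():
    # Score whether the output actually addresses the input;
    # the test fails if the metric score drops below the threshold.
    metric = AnswerRelevancyMetric(threshold=0.7)
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
    )
    assert_test(test_case, [metric])
```

Running `deepeval test run` on the file executes it like an ordinary pytest suite, with per-metric pass/fail results.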
TruLens is an open-source framework for evaluating and tracking LLM applications. Feedback functions assess truthfulness, harmlessness, and helpfulness…
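A short sketch of a feedback function, using the `trulens_eval` package layout from earlier quickstarts (newer releases reorganize under `trulens-core`); treat the class and method names as assumptions:

```python
from trulens_eval import Feedback
from trulens_eval.feedback.provider import OpenAI as OpenAIProvider

# Feedback functions score recorded traces after the fact. This
# provider uses an LLM as the judge (requires OPENAI_API_KEY).
provider = OpenAIProvider()

# Score how relevant each response is to its prompt; attached to a
# wrapped app, this runs on every recorded input/output pair.
f_relevance = Feedback(provider.relevance).on_input_output()
```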
Phoenix by Arize is an open-source AI observability library for ML engineers. Traces LLM and embedding applications, visualizes…
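A minimal sketch of the local workflow: launch the UI, then register an OpenTelemetry tracer so spans flow into it. Both calls appear in Phoenix's docs, but consider them version-dependent assumptions:

```python
import phoenix as px
from phoenix.otel import register  # present in recent releases

# launch_app() serves the trace-visualization UI locally.
session = px.launch_app()
print("Phoenix UI:", session.url)

# Register a tracer so OpenInference/OTel spans from your LLM app
# are exported to the running Phoenix instance; the project name
# here is illustrative.
tracer_provider = register(project_name="my-llm-app")
```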