Braintrust
Braintrust is an enterprise AI evaluation platform for measuring, improving, and shipping AI applications. Logging, evaluation datasets, prompt…
Helicone
Helicone provides one-line LLM observability: add a single line to your OpenAI calls and get full logging,…
Opik
Opik by Comet is an open-source LLM evaluation framework for testing AI application quality at scale. Automated evaluation…
Langfuse
Langfuse is an open-source LLM engineering platform for observability, testing, and prompt management. Debug production AI issues, evaluate…
PromptLayer
PromptLayer is a platform for tracking, managing, and evaluating LLM prompts in production. Log every prompt and completion,…
Guardrails AI
Guardrails AI adds input/output validation to LLM applications. Define rules for what the LLM can and cannot say,…
LiteLLM
LiteLLM provides a unified API for 100+ LLM providers using the OpenAI format. Switch between GPT-4, Claude, Gemini,…
Instructor
Instructor makes it easy to get structured outputs from LLMs using Python type hints. Define a Pydantic model…
LlamaIndex
LlamaIndex is a data framework for building LLM-powered applications over your data. Simple connectors for 160+ data sources,…
DSPy
DSPy is a framework for algorithmically optimizing LLM prompts and weights. Instead of manually writing prompts, DSPy compiles…
Milvus
Milvus is a cloud-native vector database built for billion-scale AI applications. Handles massive collections of high-dimensional vectors with hybrid search combining…
Qdrant
Qdrant is a high-performance vector database optimized for production AI applications. Rust-based engine delivers exceptional speed with filtering…