deepeval

Here are 54 public repositories matching this topic...

kanchengw / cnllm

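A general-purpose SDK for Chinese large language models that systematically improves interface adaptation, response parsing, and batch processing, with deep integration for LLM application frameworks in the OpenAI ecosystem such as LangChain, LlamaIndex, and AutoGen. It can also be deployed as an Agent Skill in various AI coding tools.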

Updated May 23, 2026
Python

bansalkanav / GenAI-AgenticAI-From-Zero-to-Production

Learn GenAI and Agentic AI from Zero to Production

Updated May 10, 2026
Jupyter Notebook

avnlp / rag-pipelines

Advanced RAG Pipelines and Evaluation

pubmed unstructured rag baml milvus earnings-calls contextual-ai llm langgraph rag-pipeline agentic-rag deepeval financebench healthbench

Updated May 18, 2026
Python

avnlp / dspy-opt

Advanced RAG pipeline optimization framework using DSPy. Implements modular RAG pipelines with Query-Rewriting, Sub-Query Decomposition, and Hybrid Search via Weaviate. Automates prompt tuning and few-shot selection using GEPA, SIMBA, MIPRO, COPRO, and BootstrapFewShot optimizers on datasets like FreshQA, HotpotQA, TriviaQA, Wikipedia and PubMedQA.

metadata-extraction query-rewriting rag weaviate dspy rag-pipeline deepeval sub-query-generation

Updated May 21, 2026
Python

MERakram / Advanced-RAG-monorepo

🚀 Production-ready modular RAG monorepo: Local LLM inference (vLLM) • Hybrid retrieval with Qdrant • Semantic caching • Docling document parsing • Cross-encoder reranking • DeepEval evaluation • Full observability with Langfuse • Open WebUI chat interface • OpenAI-compatible API • Fully Dockerized

python nlp ai self-hosted reranking rag fastapi vector-database cross-encoder qdrant vllm langfuse open-webui deepeval

Updated Jan 28, 2026
Python

JohnRitchie / qa-llm-guard

python pytest allure testing-framework qa-automation llm-testing deepeval

Updated May 20, 2025
Python

adityapradhan202 / BNS-LexAI

BNS-LexAI is an AI-powered legal information and case understanding assistant.

docker python3 fastapi streamlit generative-ai pineconedb google-ai-studio deepeval

Updated Feb 1, 2026
Jupyter Notebook

gonzaloMorenoc / ai-testing-lab

pytest lab for testing LLMs: RAG eval, red teaming, guardrails, drift monitoring — 14 modules, 382 tests, zero API calls needed

Updated May 13, 2026
Python

avi350751 / test-llm-with-deepeval

A hands-on exploration of DeepEval, an open-source framework for evaluating and red-teaming large language models (LLMs). This repository documents my journey of testing, benchmarking, and improving LLM reliability using custom prompts, metrics, and pipelines.

evals deepeval llmtesting

Updated Nov 2, 2025
Jupyter Notebook

hellolets / letsrag

Step-by-step guide to building a local RAG system from scratch. Learn hybrid search, reranking, HyDE, and evaluation... 100% free, no cloud required.

python semantic-search bm25 reranker rag fastapi hybrid-search llm ollama chonkie deepeval

Updated Mar 1, 2026
Python

ahmedbutt2015 / deal-agent

Drop in deal documents → get an onboarding plan, draft invoice, and stakeholder summary. Multi-agent LangGraph pipeline with RAG, human approval, and self-correcting retries.

multi-agent openai ai-agents fastapi streamlit document-intelligence langchain llm-agent retrieval-augmented-generation langgraph deepeval

Updated Apr 16, 2026
Python

gabonavarroo / faultmap

Automatically discover where and why your LLM is failing — embedding-space clustering + statistical hypothesis testing to surface input slices with elevated failure rates and audit test suite coverage gaps.

python testing clustering evaluation embeddings hypothesis-testing observability hdbscan llm litellm ragas deepeval

Updated Apr 15, 2026
Python

kothakota-bindu / finsight-ai-testing

Production-grade LLM evaluation pipeline for RAG chatbot — DeepEval + RAGAS + Garak + CI/CD | Financial domain | 7 metrics | Adversarial testing

python pytest fintech llama rag github-actions groq langchain ai-quality llm-evaluation ragas llm-testing deepeval garak

Updated May 6, 2026
Python

kothakota-bindu / Medical-chatbot

LLM evaluation framework for medical chatbot — DeepEval quality + RAG metrics + hallucination detection + red teaming | pytest CI/CD | LLaMA 3.1 8B | Groq

python pytest llama ai-safety red-teaming rag github-actions groq medical-chatbot llm-evaluation hallucination-detection llm-testing deepeval

Updated May 6, 2026
Python

Michelin-Ensimag / AI-Agent-Testing

A research project to measure AI agent robustness. Contains automated testing pipelines and a benchmarking methodology developed to audit Agentic AI architectures for complex reasoning flaws.

python benchmarking mcp ai-agents generative-ai langchain llm-evaluation llm-as-a-judge deepeval

Updated May 21, 2026
Python

SchadenKai / Clinical-RAG

[UNDER DEVELOPMENT] Clinical-RAG is a production-grade, citation-backed AI system designed to bridge the "Trust Gap" in medical information retrieval.

milvus healthcare-ai langchain-python rag-pipeline rag-chatbot langgraph-python deepeval

Updated Mar 14, 2026
Python

piyush-genai / rag-copilot-eval

Production RAG evaluation pipeline: RAGAS (faithfulness · context recall · answer relevancy) + DeepEval (hallucination scoring). Lambda-triggered on KB updates. Regression gating blocks deployment at >5% metric drop.

python aws lambda evaluation bedrock rag llmops ragas deepeval deployment-gating

Updated May 16, 2026
Python

KooshaPari / kwality

STRICTLY DO NOT DELETE NOR UNARCHIVE - Personal Project - LLM validation platform

testing validation ai tdd neo4j observability claude playwright llm deepeval

Updated May 21, 2026
Makefile

viniciusfinger / surgical-planning-ai

AI assistant structures perioperative planning (ASA, checklist, post-op care) from indications and patient context

ai agents healthcare-ai langchain langgraph deepeval

Updated May 13, 2026
Python

serhiismetanskyi / llm-output-evaluation-with-deepeval

DeepEval LLM quality evaluation tests with LLM-as-a-judge

python ai pytest llm llm-evaluation deepeval

Updated Mar 17, 2026
Python

