Agentic AI · Enterprise

100% Private Agentic RAG API

A complete AI-powered research and writing assistant using CrewAI and LitServe with a modern glassmorphism web interface, running 100% locally for total privacy.

Oct 2025
10 min read

Project Overview

Most RAG systems rely on cloud-based LLMs, posing significant privacy risks for sensitive data. This project implements a fully local agentic system where a Researcher agent performs deep web searches and a Writer agent synthesizes the findings, all orchestrated via LitServe and running on local Ollama instances. This ensures that no data ever leaves the user's infrastructure.

100% Privacy · <5 min Setup Time · Qwen2.5/Llama3 Local LLM · Low Latency API Performance

System Architecture

The system follows a multi-layered architecture starting with a LitServe-powered API gateway. It utilizes CrewAI for agent orchestration, delegating tasks to specialized Researcher and Writer agents. The agents interact with a local Ollama server for inference, providing a seamless and private experience.

Figure 1: System Architecture Diagram

LitServe API

High-performance serving engine for the RAG agents.

CrewAI Agents

Coordinated Researcher and Writer agents for task completion.

Ollama Local LLM

Local inference engine ensuring data remains private.

Flask Web UI

Modern glassmorphism dashboard for user interaction.
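To make the contract between the Flask UI and the LitServe gateway concrete, the sketch below shows how the frontend might POST a query to the locally served API using only the standard library. The endpoint path `/predict`, the `{"topic": ...}` request shape, and the `"output"` response key are illustrative assumptions, not taken from the project source.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # assumed LitServe endpoint


def build_query_payload(topic: str) -> bytes:
    """Encode the user's research topic as a JSON request body.

    The {"topic": ...} shape is an assumption for illustration.
    """
    if not topic.strip():
        raise ValueError("topic must be non-empty")
    return json.dumps({"topic": topic.strip()}).encode("utf-8")


def fetch_report(topic: str, timeout: float = 120.0) -> str:
    """POST the topic to the local agentic RAG API and return the report text."""
    request = urllib.request.Request(
        API_URL,
        data=build_query_payload(topic),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["output"]  # assumed response key
```

Because everything runs on localhost, no request in this path ever crosses the machine boundary.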

Implementation Details

Code Example

```python
from crewai import Agent, Task, Crew, LLM

# Point CrewAI at the local Ollama server (model tag is configurable)
local_llm = LLM(model="ollama/qwen2.5", base_url="http://localhost:11434")

# Define the Researcher Agent
researcher = Agent(
    role='Senior Researcher',
    goal='Find comprehensive info on {topic}',
    backstory='Expert at digging through web data.',
    llm=local_llm
)

# Define the Writer Agent
writer = Agent(
    role='Content Writer',
    goal='Write a detailed report based on research',
    backstory='Talented writer capable of explaining complex topics.',
    llm=local_llm
)

# Wire the agents into tasks and run the crew
research_task = Task(
    description='Research {topic} thoroughly.',
    expected_output='A list of key findings.',
    agent=researcher
)
write_task = Task(
    description='Write a detailed report from the research findings.',
    expected_output='A structured report.',
    agent=writer
)
crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.kickoff(inputs={'topic': 'local agentic RAG'})
```

Local Model Performance

Using a quantized model like Qwen2.5-7B with Ollama allows for near-real-time responses on consumer-grade hardware while maintaining high reasoning quality.
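As a sketch of how the agents talk to the local model, the helper below builds a request for Ollama's documented `/api/generate` endpoint. The `qwen2.5:7b` tag refers to the quantized build Ollama distributes for Qwen2.5-7B; the exact model tag used in this project is an assumption.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_generate_request(prompt: str, model: str = "qwen2.5:7b") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False requests a single complete response instead of a token stream.
    """
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "qwen2.5:7b") -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Since inference never leaves localhost, latency is bounded by local hardware rather than network round-trips.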

Workflow

1. Query Ingestion: The user submits a query via the glassmorphism UI.
2. Agent Delegation: CrewAI assigns the Researcher agent to search for relevant information.
3. Synthesis: The Writer agent summarizes the research into a coherent report.
4. Local Serving: The LitServe API delivers the result back to the Flask frontend.
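The four steps above can be sketched as a simple pipeline. The `research` and `write` callables below are stand-in stubs for the Researcher and Writer agents, not the actual CrewAI invocations; in the real system each call is backed by local Ollama inference.

```python
from typing import Callable


def run_pipeline(
    query: str,
    research: Callable[[str], str],
    write: Callable[[str], str],
) -> str:
    """Mirror the workflow: ingest a query, delegate research, then synthesize.

    `research` stands in for the Researcher agent and `write` for the Writer
    agent; the returned report is what LitServe would serve back to the UI.
    """
    if not query.strip():
        raise ValueError("query must be non-empty")  # step 1: query ingestion
    findings = research(query)                       # step 2: agent delegation
    report = write(findings)                         # step 3: synthesis
    return report                                    # step 4: local serving


# Usage with trivial stubs in place of the real agents:
report = run_pipeline(
    "local RAG",
    research=lambda q: f"notes on {q}",
    write=lambda notes: f"REPORT: {notes}",
)
```

Keeping the orchestration a pure function of its inputs makes the agent hand-off easy to test without spinning up models.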

Figure 2: Workflow Diagram

Results & Impact

"The ability to run a research assistant entirely on my own machine without compromising on agent intelligence is a game-changer for our internal documents."

Data Security

Zero data leaks due to 100% local execution.

Efficiency

Automated research saves hours of manual searching.

Accessibility

Easy deployment with Docker and local API endpoints.
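A deployment along these lines could be expressed as a small Docker Compose file. The service names, image references, environment variable, and ports below are illustrative assumptions, not taken from the project repository.

```yaml
services:
  ollama:
    image: ollama/ollama            # official Ollama image
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama # persist pulled models across restarts

  rag-api:
    build: .                        # assumed image bundling the LitServe + CrewAI app
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "8000:8000"
    depends_on:
      - ollama

volumes:
  ollama-models:
```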

AI · RAG · CrewAI · LitServe · Ollama · Privacy

About the Author

Hansika, AI Solutions Architect


4+ Projects Delivered · 1.5yr Industry Experience


Apex Neural

Hansika specializes in designing and implementing intelligent AI systems, from agentic platforms to RAG pipelines. She leads complex enterprise deployments and has architected solutions for data labeling, document processing, and knowledge management.


Ready to Build Your AI Solution?

Get a free consultation and see how we can help transform your business.