Agentic AI · Enterprise

100% Private Agentic RAG API

A complete AI-powered research and writing assistant using CrewAI and LitServe with a modern glassmorphism web interface, running 100% locally for total privacy.

Oct 2025
10 min read

Project Overview

Most RAG systems rely on cloud-based LLMs, posing significant privacy risks for sensitive data. This project implements a fully local agentic system where a Researcher agent performs deep web searches and a Writer agent synthesizes the findings, all orchestrated via LitServe and running on local Ollama instances. This ensures that no data ever leaves the user's infrastructure.

100% Privacy · <5 min Setup Time · Qwen2.5/Llama3 Local LLM · Low Latency API Performance

System Architecture

The system follows a multi-layered architecture starting with a LitServe-powered API gateway. It utilizes CrewAI for agent orchestration, delegating tasks to specialized Researcher and Writer agents. The agents interact with a local Ollama server for inference, providing a seamless and private experience.

Figure 1: System Architecture Diagram

LitServe API

High-performance serving engine for the RAG agents.

CrewAI Agents

Coordinated Researcher and Writer agents for task completion.

Ollama Local LLM

Local inference engine ensuring data remains private.

Flask Web UI

Modern glassmorphism dashboard for user interaction.
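To make the contract between the Flask UI and the LitServe gateway concrete, the sketch below shows how the frontend might POST a query to the locally served API using only the standard library. The endpoint path `/predict`, the `{"topic": ...}` request shape, and the `"output"` response key are illustrative assumptions, not taken from the project source.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # assumed LitServe endpoint


def build_query_payload(topic: str) -> bytes:
    """Encode the user's research topic as a JSON request body.

    The {"topic": ...} shape is an assumption for illustration.
    """
    if not topic.strip():
        raise ValueError("topic must be non-empty")
    return json.dumps({"topic": topic.strip()}).encode("utf-8")


def fetch_report(topic: str, timeout: float = 120.0) -> str:
    """POST the topic to the local agentic RAG API and return the report text."""
    request = urllib.request.Request(
        API_URL,
        data=build_query_payload(topic),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["output"]  # assumed response key
```

Because everything runs on localhost, no request in this path ever crosses the machine boundary.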

Implementation Details

Code Example

```python
from crewai import Agent, Task, Crew, LLM

# Point CrewAI at the local Ollama server (model tag is configurable)
local_llm = LLM(model="ollama/qwen2.5", base_url="http://localhost:11434")

# Define the Researcher Agent
researcher = Agent(
    role='Senior Researcher',
    goal='Find comprehensive info on {topic}',
    backstory='Expert at digging through web data.',
    llm=local_llm
)

# Define the Writer Agent
writer = Agent(
    role='Content Writer',
    goal='Write a detailed report based on research',
    backstory='Talented writer capable of explaining complex topics.',
    llm=local_llm
)

# Wire the agents into tasks and run the crew
research_task = Task(
    description='Research {topic} thoroughly.',
    expected_output='A list of key findings.',
    agent=researcher
)
write_task = Task(
    description='Write a detailed report from the research findings.',
    expected_output='A structured report.',
    agent=writer
)
crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.kickoff(inputs={'topic': 'local agentic RAG'})
```

Local Model Performance

Using a quantized model like Qwen2.5-7B with Ollama allows for near-real-time responses on consumer-grade hardware while maintaining high reasoning quality.
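As a sketch of how the agents talk to the local model, the helper below builds a request for Ollama's documented `/api/generate` endpoint. The `qwen2.5:7b` tag refers to the quantized build Ollama distributes for Qwen2.5-7B; the exact model tag used in this project is an assumption.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_generate_request(prompt: str, model: str = "qwen2.5:7b") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False requests a single complete response instead of a token stream.
    """
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "qwen2.5:7b") -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Since inference never leaves localhost, latency is bounded by local hardware rather than network round-trips.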

Workflow

1. Query Ingestion: The user submits a query via the glassmorphism UI.
2. Agent Delegation: CrewAI assigns the Researcher agent to search for relevant information.
3. Synthesis: The Writer agent summarizes the research into a coherent report.
4. Local Serving: The LitServe API delivers the result back to the Flask frontend.
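The four steps above can be sketched as a simple pipeline. The `research` and `write` callables below are stand-in stubs for the Researcher and Writer agents, not the actual CrewAI invocations; in the real system each call is backed by local Ollama inference.

```python
from typing import Callable


def run_pipeline(
    query: str,
    research: Callable[[str], str],
    write: Callable[[str], str],
) -> str:
    """Mirror the workflow: ingest a query, delegate research, then synthesize.

    `research` stands in for the Researcher agent and `write` for the Writer
    agent; the returned report is what LitServe would serve back to the UI.
    """
    if not query.strip():
        raise ValueError("query must be non-empty")  # step 1: query ingestion
    findings = research(query)                       # step 2: agent delegation
    report = write(findings)                         # step 3: synthesis
    return report                                    # step 4: local serving


# Usage with trivial stubs in place of the real agents:
report = run_pipeline(
    "local RAG",
    research=lambda q: f"notes on {q}",
    write=lambda notes: f"REPORT: {notes}",
)
```

Keeping the orchestration a pure function of its inputs makes the agent hand-off easy to test without spinning up models.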

Figure 2: Workflow Diagram

Results & Impact

"The ability to run a research assistant entirely on my own machine without compromising on agent intelligence is a game-changer for our internal documents."

Data Security

Zero data leaks due to 100% local execution.

Efficiency

Automated research saves hours of manual searching.

Accessibility

Easy deployment with Docker and local API endpoints.
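A deployment along these lines could be expressed as a small Docker Compose file. The service names, image references, environment variable, and ports below are illustrative assumptions, not taken from the project repository.

```yaml
services:
  ollama:
    image: ollama/ollama            # official Ollama image
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama # persist pulled models across restarts

  rag-api:
    build: .                        # assumed image bundling the LitServe + CrewAI app
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "8000:8000"
    depends_on:
      - ollama

volumes:
  ollama-models:
```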

AI · RAG · CrewAI · LitServe · Ollama · Privacy

About the Author

Hansika, AI Solutions Architect


4+ Projects Delivered · 1.5yr Industry Experience


Apex Neural

Hansika specializes in designing and implementing intelligent AI systems, from agentic platforms to RAG pipelines. She leads complex enterprise deployments and has architected solutions for data labeling, document processing, and knowledge management.


Ready to Build Your AI Solution?

Get a free consultation and see how we can help transform your business.