Agentic AI · Enterprise

FireCrawl Agentic RAG Platform

A production-grade autonomous RAG system that bridges local document knowledge with live web data using FireCrawl and LlamaIndex Workflows.

Nov 2025
12 min read

Project Overview

The FireCrawl Agent solves the "staleness" problem in RAG by integrating real-time web crawling. We built a persistent system in which users upload PDFs and engage in a dialogue that automatically crawls the web for missing context. The system was recently migrated to PostgreSQL to support multi-user sessions and high-concurrency workloads.

Retrieval Accuracy: 98.5%
Context Depth: Multi-Hop
System Uptime: 99.9%
Auth Security: JWT/RSA

System Architecture

The system utilizes a modern full-stack architecture with a FastAPI backend and a React/TypeScript frontend. It orchestrates complex agentic flows using LlamaIndex Workflows, persisting structured data in PostgreSQL and vector embeddings in a persistent ChromaDB store.

Figure 1: System Architecture Diagram

Orchestrator: LlamaIndex Workflows for state-managed agent runs

Web Scraper: FireCrawl for intelligent, LLM-ready web ingestion

Primary Store: PostgreSQL with SQLAlchemy for session persistence

Vector Store: ChromaDB for local document semantic indexing
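As a rough illustration of the storage split between the two stores, a write router might dispatch records by shape: structured session data to PostgreSQL, embedding entries to ChromaDB. This is a hypothetical sketch; `route_write` and the record shapes are illustrative, not the project's actual API.

```python
def route_write(record: dict) -> str:
    """Pick the backing store for a record based on its shape.

    Embedding-bearing entries belong in the vector store (ChromaDB);
    structured data (users, sessions, messages) belongs in the
    primary relational store (PostgreSQL).
    """
    if "embedding" in record:
        return "vector_store"
    return "primary_store"

# Structured session data -> PostgreSQL
assert route_write({"session_id": "s1", "user": "u1"}) == "primary_store"
# Semantic index entries -> ChromaDB
assert route_write({"doc_id": "d1", "embedding": [0.1, 0.2, 0.3]}) == "vector_store"
```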

Implementation Details

Code Example

```python
import os

import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore


async def process_document(self, file_path: str):
    # Set up a persistent ChromaDB client and collection
    chroma_client = chromadb.PersistentClient(path='./chroma_db')
    chroma_collection = chroma_client.get_or_create_collection('demo')
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

    # Embed the uploaded document into the persistent vector store
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

    # Create the agentic workflow with FireCrawl tools
    workflow = AgenticRAGWorkflow(
        index=index,
        firecrawl_api_key=os.environ['FIRECRAWL_API_KEY'],
        timeout=249,
    )
    return workflow
```

Agent Memory

When migrating from SQLite to PostgreSQL, always ensure UUID types match across the schema to prevent bind-parameter mismatches during high-concurrency async operations.
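One lightweight way to guard against that mismatch is to normalize IDs at the application boundary before they are bound to queries. The helper name below is hypothetical; the sketch simply shows why a raw string and a `uuid.UUID` never compare equal.

```python
import uuid


def to_uuid(value):
    """Normalize an incoming ID to uuid.UUID before binding it to a query.

    SQLite commonly stores UUIDs as TEXT, while PostgreSQL's native uuid
    columns round-trip as uuid.UUID objects; mixing the two representations
    can silently break lookups.
    """
    return value if isinstance(value, uuid.UUID) else uuid.UUID(value)


session_id = uuid.uuid4()

# The string form is *not* equal to the UUID object, so a str bound against
# a uuid-typed column (or vice versa) can fail to match any rows.
assert str(session_id) != session_id

# Normalizing at the boundary makes both representations comparable.
assert to_uuid(str(session_id)) == session_id
assert to_uuid(session_id) == session_id
```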

Workflow

1. Authentication: The user signs in through the JWT-secured web UI.

2. Ingestion: A PDF is uploaded and embedded into persistent vector storage.

3. Query: The user asks a question in the chat interface.

4. Agentic Loop: The system decides whether to answer from local PDF data or crawl the web via FireCrawl.

5. Result: A final synthesized answer, with full execution logs, is returned to the user.
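The decision in the agentic-loop step can be sketched as a simple confidence gate. This is a hypothetical simplification; the function name and threshold are illustrative, and the real workflow presumably weighs retrieval scores inside LlamaIndex rather than using a single cutoff.

```python
def choose_source(local_score: float, threshold: float = 0.75) -> str:
    """Route to local PDF retrieval when the vector index is confident,
    otherwise fall back to a FireCrawl web crawl for fresh context."""
    return "local_pdf" if local_score >= threshold else "firecrawl_web"


# A question answered well by the uploaded PDF stays local...
assert choose_source(0.91) == "local_pdf"
# ...while a low-confidence match triggers a live crawl.
assert choose_source(0.40) == "firecrawl_web"
```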

Figure 2: Workflow Diagram

Results & Impact

"The integration of FireCrawl with our local research PDFs turned a week of browsing into a 5-minute chat session."

Scale: Ready for 10,000+ concurrent sessions
UX: Sub-2s response time for vector retrieval
Persistence: 100% session recovery after server restarts

FireCrawl · LlamaIndex · PostgreSQL · React · FastAPI · RAG · ChromaDB · Vector DB · GPT-4o · JWT · Web Crawling

About the Author

Hansika, AI Solutions Architect


Projects Delivered: 4+
Industry Experience: 1.5 years


Apex Neural

Hansika specializes in designing and implementing intelligent AI systems, from agentic platforms to RAG pipelines. She leads complex enterprise deployments and has architected solutions for data labeling, document processing, and knowledge management.

Ready to Build Your AI Solution?

Get a free consultation and see how we can help transform your business.