
Scholar AI - LLM-based RAG System

An LLM-based RAG system for academic literature retrieval, with intelligent document parsing and real-time streaming responses.

GPT 4o React FastAPI Python TypeScript Ant Design  Elasticsearch Hybrid Search BM25+Vector  BGE m3 embedding BGE Reranker

Overview

Scholar AI is a comprehensive academic literature retrieval system that leverages large language models and retrieval-augmented generation (RAG) to help researchers find and understand relevant academic papers. The system combines traditional keyword search with semantic understanding to provide more accurate and contextual results.


Key Features

  • Intelligent document parsing (PDF/Word/PPT): Extracts and interprets content from multiple document formats.
  • Real-time streaming responses: GPT-4 output is streamed to the user as it is generated, for conversational interaction.
  • Multi-language support (Chinese/English/Japanese): Cross-lingual capabilities for global accessibility.
  • Intelligent question recommendations: Context-aware suggestions that guide user inquiries and exploration.
  • Hybrid search combining BM25 and vector similarity: Retrieval that pairs keyword matching with semantic understanding.
  • BGE reranking for improved relevance: Reorders retrieved results so the most relevant passages come first.
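
As an illustration of the streaming feature, here is a minimal sketch of how generated tokens could be framed as Server-Sent Events before being sent to the browser. It is not the project's actual code: `fake_llm_tokens` is a hypothetical stand-in for a streaming LLM client, and the `delta` field name and `[DONE]` marker are assumptions chosen for the example.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def fake_llm_tokens(answer: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM client: yields one word at a time."""
    for word in answer.split():
        await asyncio.sleep(0)  # give control back to the event loop, like a network read
        yield word + " "


async def sse_events(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each token in a Server-Sent Events frame, then emit a completion marker."""
    async for token in tokens:
        # The "delta" key is an assumed payload shape, not a fixed standard.
        yield f"data: {json.dumps({'delta': token})}\n\n"
    yield "data: [DONE]\n\n"


async def collect(answer: str) -> List[str]:
    """Gather all frames into a list (only for demonstration and testing)."""
    return [frame async for frame in sse_events(fake_llm_tokens(answer))]


if __name__ == "__main__":
    for frame in asyncio.run(collect("hybrid search works")):
        print(frame, end="")
```

In a FastAPI backend, a generator like `sse_events` could plausibly be passed to `StreamingResponse` with `media_type="text/event-stream"`, so the frontend can render text as it arrives instead of waiting for the full answer.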

Technical Architecture


Core Technical Implementations

  • Layout-Aware PDF Parsing: Preserves reading order and structure.


  • Hybrid Retrieval System: Keyword search misses semantics; vector search misses exact terms (model names, authors). Combining BM25 with vector similarity covers both cases.
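
One common way to merge a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; the page does not say how Scholar AI actually fuses the two result lists. The document IDs and the constant `k = 60` are likewise example values.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query (IDs are made up for the example).
bm25_hits = ["bge-m3-paper", "bert-paper", "rag-survey"]          # exact-term matches
vector_hits = ["rag-survey", "bge-m3-paper", "dense-retrieval"]   # semantic matches

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])

if __name__ == "__main__":
    print(fused)
```

Because RRF uses only rank positions, it avoids having to normalize BM25 scores against cosine similarities. In a pipeline like the one described here, the fused top results would then be passed to the reranker for final ordering.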

© 2025 Qing Zhong