Aurenia: A Blazing Fast Offline and Multilingual Study Assistant
In a world where AI is becoming synonymous with the cloud, I asked myself a simple question: What if we could build a powerful study assistant that was truly ours—private, offline, and spoke our language? The current landscape often forces a trade-off between intelligence and privacy. I wanted to challenge that.
So I decided to build what I wanted to exist. This post details the technical journey of bringing it to life.
The Architecture: A Local-First Approach
My primary goal was to create an app that runs natively on the desktop without any external dependencies or cloud calls.
The Core: Tauri
I chose Tauri to build the application shell. It uses modern web technologies (HTML, CSS, JS) for the frontend and a lightweight, high-performance Rust backend. As I also wanted to learn some Rust, Tauri was the perfect fit for this project.
On-Device Inference
To avoid any cloud dependency for the AI models, I embedded the inference engine directly into the app. I used the brilliant llama.cpp project, bundling its prebuilt binaries (including the CUDA ones), and the app runs its llama-server as a sidecar process launched via tauri::shell. This approach powers both the LLM and the embedding models.
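To make that concrete, here is roughly what launching the sidecar looks like. This is a minimal sketch, not Aurenia's actual code: it assumes the Tauri v2 shell plugin's JavaScript API (the post's real launch path is the Rust side via tauri::shell), and the binary name, model path, port, and flags are placeholders.

```ts
import { Command } from "@tauri-apps/plugin-shell";

// Sketch: start the bundled llama-server when the app launches.
// Binary name, model path, port, and flags are placeholders.
export async function startLlamaServer() {
  const server = Command.sidecar("binaries/llama-server", [
    "-m", "models/gemma-3n.gguf",
    "--port", "8080",
  ]);
  // Keep the child handle so the process can be terminated on app exit.
  return await server.spawn();
}
```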
- LLM: Google’s Gemma 3n. This model is surprisingly powerful for its size, excelling at instruction-following and multilingual tasks. In my testing, its performance surpassed even some 12B and 27B parameter models, especially when considering its incredible speed on consumer hardware.
- Embedding Model: multilingual-e5-large. To understand the content of documents across many languages, I chose this powerful and efficient model. It converts chunks of text into vector embeddings for our RAG pipeline (a request sketch follows this list).
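For illustration, an embedding request to the local llama-server looks roughly like this. It is a sketch under my own assumptions: the embedding server is started with the --embedding flag, it exposes the OpenAI-compatible /v1/embeddings endpoint, and the port is a placeholder.

```ts
// Sketch: ask the local llama-server (serving multilingual-e5-large) for a vector.
// The port and endpoint are assumptions, not Aurenia's actual configuration.
export async function embed(text: string): Promise<number[]> {
  const res = await fetch("http://127.0.0.1:8081/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // e5-family models are trained with "query: " / "passage: " prefixes.
    body: JSON.stringify({ input: `passage: ${text}` }),
  });
  const json = await res.json();
  return json.data[0].embedding;
}
```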
LanceDB for Vector Storage
For the vector database, I needed a solution that was fast, embeddable, and didn’t require a separate server. LanceDB was the perfect choice. It’s an open-source, Rust-native database designed for high-performance vector search directly on disk, which integrated beautifully with the Tauri backend.
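To sketch the flow (the real integration lives in the Rust backend), here is what indexing a document's chunks could look like with LanceDB's TypeScript bindings. The @lancedb/lancedb API shape, table name, and row layout are assumptions for illustration; embed() is the helper sketched above.

```ts
import * as lancedb from "@lancedb/lancedb";

// Sketch: one row per embedded chunk, keyed by page number.
// Package API, table name, and schema are assumptions, not Aurenia's code.
export async function indexDocument(chunks: { page: number; text: string }[]) {
  const db = await lancedb.connect("./aurenia-data");
  const rows = [];
  for (const chunk of chunks) {
    rows.push({
      vector: await embed(chunk.text), // embed() sketched above
      page: chunk.page,
      text: chunk.text,
    });
  }
  return db.createTable("my_document", rows);
}
```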
OCR and PDF Handling
To handle PDFs, I used pdf.js
to parse documents and extract text (and implemented lazy loading for performance). For scanned documents or images within PDFs, I integrated PaddleOCR models via the ONNX runtime by using paddle-ocr-rs
crate, allowing Aurenia to read and understand text that isn’t digitally native.
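For reference, page text extraction with pdf.js looks roughly like this; a sketch, with the document URL and page number left as parameters.

```ts
import * as pdfjsLib from "pdfjs-dist";

// Sketch: extract the text of a single page with pdf.js.
// Worker setup and the document URL are left to the caller.
export async function getPageText(url: string, pageNumber: number): Promise<string> {
  const pdf = await pdfjsLib.getDocument(url).promise;
  const page = await pdf.getPage(pageNumber);
  const content = await page.getTextContent();
  return content.items
    .map((item: any) => ("str" in item ? item.str : ""))
    .join(" ");
}
```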
Implementing the Core Features
With the architecture in place, I focused on building intuitive features that would feel like a true study assistant. Here’s a look at the prompt engineering behind some of them.
Text Selection Menu
When a user highlights text, they are presented with three options (a request sketch for these calls follows the list):
- Define: Takes the selected text and explains it in the user’s native language.
  - System Prompt: You are a dictionary which defines things from any language into ${language} language. Output the definition in ${language} only.
  - User Prompt: Define: ${selectedText}
- Translate: Translates any highlighted text into the user’s chosen language.
  - System Prompt: You are a translator that converts any language to ${language}. Output the translation only.
  - User Prompt: Translate into ${language}: ${selectedText}
- What’s this?: Provides a contextual explanation of the selected text, using the entire page for context.
  - System Prompt: You are a study assistant. Provide explanation only in ${language}.
  - User Prompt: Here is a page from a PDF file: ${pageText}\nWhat does "${selectedText}" mean in this context?
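Wiring-wise, each of these actions sends its prompts to the local llama-server, which exposes an OpenAI-compatible chat endpoint. The sketch below shows the Define action; the port is a placeholder and the helper name is mine, not Aurenia's.

```ts
// Sketch: send the "Define" prompts to the llama-server sidecar.
// Port is a placeholder; /v1/chat/completions is llama-server's OpenAI-compatible endpoint.
export async function define(selectedText: string, language: string): Promise<string> {
  const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        {
          role: "system",
          content: `You are a dictionary which defines things from any language into ${language} language. Output the definition in ${language} only.`,
        },
        { role: "user", content: `Define: ${selectedText}` },
      ],
    }),
  });
  const json = await res.json();
  return json.choices[0].message.content;
}
```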
Page Context Menu
Right-clicking on a page gives access to more powerful, page-wide tools:
- Generate Quiz: This feature generates a 5-question interactive quiz based on the page’s content. To ensure the output is always usable, I leveraged llama.cpp’s built-in support for JSON Schema, forcing the model to reply in a structured format (a request sketch follows this list).
The Prompt:
Content: --- ${pageText} --- Based on the content above, generate 5 multiple-choice questions. Reply only in the specified JSON format.
Response format Schema:
export const questions_schema = {
  title: "Questions",
  type: "object",
  properties: {
    questions: {
      type: "array",
      items: {
        type: "object",
        properties: {
          question: { type: "string" },
          A: { type: "string" },
          B: { type: "string" },
          C: { type: "string" },
          D: { type: "string" },
          correct_option: { type: "string", enum: ["A", "B", "C", "D"] },
        },
        required: ["question", "A", "B", "C", "D", "correct_option"],
      },
      minItems: 5,
      maxItems: 5,
    },
  },
  required: ["questions"],
};
- Summarize: This summarizes the whole page in the user’s native language. The whole page is given as the user prompt.
The System Prompt:
`You are a ${language} summarizer. Only output in ${language} the summary of the given content in markdown.`
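For the quiz call, the schema is attached to the request so decoding is constrained to valid JSON. The sketch below is an assumption about the request shape: the response_format/json_schema field is how I would pass it to llama-server's OpenAI-compatible endpoint, and the exact field name can vary by server version, so treat it as illustrative rather than the app's literal code.

```ts
import { questions_schema } from "./schemas"; // hypothetical module path

// Sketch: request a quiz constrained by the JSON Schema above.
// The response_format shape is an assumption about llama-server's API surface.
export async function generateQuiz(pageText: string) {
  const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        {
          role: "user",
          content: `Content: --- ${pageText} --- Based on the content above, generate 5 multiple-choice questions. Reply only in the specified JSON format.`,
        },
      ],
      // Constrain decoding so the reply always parses.
      response_format: {
        type: "json_schema",
        json_schema: { name: "Questions", schema: questions_schema },
      },
    }),
  });
  const json = await res.json();
  return JSON.parse(json.choices[0].message.content).questions;
}
```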
The Intelligent Chat: A Multi-Step RAG Pipeline
The heart of Aurenia’s chat is a custom, multi-step pipeline designed to ensure responses are both fast and contextually accurate. I couldn’t rely on a model’s built-in tool use, and I didn’t want to fill the main context window with unnecessary context, so I built my own routing logic.
Here’s how it works every time the user sends a message:
Step 1: Is the answer on the current page?
First, the app performs a quick check to see if the user’s question can be answered by the text they are currently looking at. This avoids unnecessary searches. An LLM call is made with the following prompt and schema:
- Prompt:
`The user is reading a PDF. Based on the provided text visible on the app screen, determine if the user is asking a question related to the visible text or not. If the user is asking related to provided visible text, then output {question_from_visible_text: "true"} Else output {question_from_visible_text: "false"} Here is the text visible: ------- ${visibleText} ------- Here is user's query: ${userQuery}`
- Response format Schema:
export const question_from_visible_text_schema = {
  title: "Choice",
  type: "object",
  properties: {
    question_from_visible_text: { type: "string", enum: ["true", "false"] },
  },
  required: ["question_from_visible_text"],
};
If the LLM returns { question_from_visible_text: "true" }, there is no need to perform RAG: the page text is attached to the main chat completion as context, and the LLM answers normally.
If the LLM returns { question_from_visible_text: "false" }, we need to determine whether the user needs information from the document at all, or is just having a general chat. To decide this, the entire conversation history is passed as a string to another LLM call, with a custom response format, acting as a “router.”
- Conversation history string is of the form:
User: the user's message
Assistant: the reply of assistant
User: the user's message
...
- Prompt:
`Based on the provided conversation history, determine if the user is asking a new question that requires retrieving content from the document related to some query or if they are asking only a general or a follow-up question. You need to reply either {rag: "true", query: "natural language RAG query to retrieve the required information"} or {rag: "false", query: "none"} The RAG query shouldn't mention pdf name. Here is the conversation history: ${conversation_history}`
- Schema:
export const decide_rag_schema = {
  title: "Choice",
  type: "object",
  properties: {
    rag: { type: "string", enum: ["true", "false"] },
    query: { type: "string" },
  },
  required: ["rag", "query"],
};
The LLM replies in the form
- `{rag: "true", query: "natural language RAG query to retrieve the required information"}`
or
- `{rag: "false", query: "none"}`
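Assembling the history string for the router is simple; here is a minimal sketch, where the message shape is an assumption for illustration.

```ts
// Sketch: flatten the chat into the "User: ... / Assistant: ..." string
// the router prompt expects. The message type is an assumed shape.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

export function buildConversationHistory(messages: ChatMessage[]): string {
  return messages
    .map((m) => `${m.role === "user" ? "User" : "Assistant"}: ${m.content}`)
    .join("\n");
}
```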
If RAG is needed, we use the generated query to fetch the most similar chunk from the document’s table in the database and attach the corresponding page’s context.
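Tying it together, the retrieval step looks roughly like the sketch below, reusing the embed() and LanceDB helpers sketched earlier; the search()/toArray() calls and row fields are assumptions carried over from those sketches, not Aurenia's actual code.

```ts
import type { Table } from "@lancedb/lancedb";

// Sketch: embed the router's query, pull the closest chunk from the document's
// table, and return its page text to attach as context.
export async function retrieveContext(table: Table, ragQuery: string) {
  const vector = await embed(`query: ${ragQuery}`); // e5 models expect a "query: " prefix
  const hits = await table.search(vector).limit(1).toArray();
  return hits.length > 0 ? { page: hits[0].page, text: hits[0].text } : null;
}
```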
Languages Supported
Aurenia is designed for a global audience. Out of the box, it can understand and respond in the world’s 25 most spoken languages, turning any document into a conversation in your native tongue.
English
Mandarin Chinese
Hindi
Spanish
French
Standard Arabic
Bengali
Russian
Portuguese
Urdu
Indonesian
German
Japanese
Nigerian Pidgin
Egyptian Arabic
Marathi
Telugu
Turkish
Tamil
Cantonese
Vietnamese
Wu Chinese
Tagalog
Korean
Farsi
Language list sourced from Britannica’s list of languages by total number of speakers.
The Road Ahead: Future Improvements
Aurenia is a project I’m incredibly passionate about, and this is just the beginning. Here are some of the key improvements on the roadmap:
- Unlocking Vision Capabilities: Gemma 3n is a powerful multimodal model. A key priority is to enable its vision features within Aurenia, which would allow for a true multimodal understanding of documents, including diagrams, charts, and images, as the inference ecosystem evolves.
- Advanced RAG Pipeline: Implementing k=5 retrieval with re-ranking routing logic to improve accuracy on highly ambiguous queries.
- Expanded Document Support: Adding support for more file types like .epub, .docx, and .pptx.
- Chat History & Memory: Implementing a more robust system for long-term conversational memory.
- UI/UX Enhancements: Adding features like annotations and cross-document search.
Try Aurenia
Thank you for reading about my journey building Aurenia. I’ve made the entire project open-source and available for you to try.
- 💻 Get the Code & Installer: [https://github.com/inventwithdean/aurenia]
- 📺 Watch the Demo video: [https://youtu.be/n_-dwJi9wO8]