AI-Powered OCR Prescription Processing & Medicine Intelligence Platform
Digital Health Platform
A conversational AI health assistant that processes handwritten and printed prescriptions via OCR, identifies medicines through a multi-stage RAG pipeline, manages family health profiles, and delivers intelligent medication reminders.
Client Overview
A digital health startup needed an intelligent prescription processing system that patients could use directly through a messaging platform. The core challenge was replacing traditional OCR — which outputs raw text and struggles with doctor handwriting, faded ink, and non-standard layouts — with a Vision LLM that understands the context of a medical document, not just the characters in it.
The goal was to let users photograph a prescription, have the system extract and identify all medicines with full clinical detail, and attach the results to the correct patient profile — all within a single conversational flow.
The platform also needed to handle multi-patient households, lab report analysis, and automated medication reminder scheduling.
Business Challenges
The client faced several operational and technical challenges before engaging Aviasole:
- Traditional OCR engines output raw character strings and cannot interpret the structure or intent of a medical document — doctor shorthand, overlapping text, and poor scan quality produce unusable output
- Handwritten prescriptions vary significantly in style, abbreviation, and layout, making rule-based text extraction brittle and unmaintainable
- Even when text is extracted correctly, medicine names have spelling variants, regional brand names, and generic equivalents that a simple database lookup cannot resolve
- Managing patient identity becomes complex when one user submits prescriptions for multiple family members
- Lab reports require a separate processing pipeline with abnormal result detection and historical trend tracking
- Doctors seeing a patient for the first time have no consolidated view of their prescription history, lab results, and diagnoses — leading to repeated tests and incomplete consultations
- Reminder scheduling needed to stay synchronized across database and cache without creating duplicate notifications
Solution Provided
Aviasole designed and built a full-stack AI health assistant comprising a FastAPI backend, a multimodal prescription agent powered by Google Gemini, and a multi-stage medicine RAG pipeline backed by PostgreSQL and Redis. The system processes prescription images end-to-end, from OCR extraction through patient identity resolution, medicine matching, scan logging, and reminder creation, within a single API call.
Key Features & Capabilities
Vision LLM & OCR Layer
- Uses Google Gemini as a Vision LLM rather than a traditional OCR engine, enabling the model to understand the semantic structure of a prescription — not just extract characters
- Reads handwritten and printed prescriptions from photographs taken on any mobile device, handling variable image quality, rotation, and lighting conditions
- Identifies and separates medicine names, patient name, doctor name, clinic, diagnosis, and prescription date as distinct structured fields from a single image
- Interprets doctor shorthand and abbreviations in context — for example, understanding that “Tab.” means tablet or “BD” means twice daily — rather than passing them as raw strings
- Handles mixed-language prescriptions where medicine names may appear in English while surrounding text is in a regional language
- Extracts the complete medicine list in one pass, including partially legible names, ensuring nothing is silently dropped
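As a rough illustration of the structured fields listed above, the sketch below parses a model reply into typed records. The JSON shape, the field names, and the small `SHORTHAND` table are assumptions made for this example; the case study does not describe the actual response format, and in the real system the Vision LLM resolves shorthand in context rather than through a lookup table.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical shorthand table, used here only to normalise model output.
SHORTHAND = {
    "Tab.": "tablet",
    "Cap.": "capsule",
    "OD": "once daily",
    "BD": "twice daily",
}


@dataclass
class ExtractedMedicine:
    raw_name: str
    form: Optional[str] = None
    frequency: Optional[str] = None
    legible: bool = True  # False keeps partially legible names instead of dropping them


@dataclass
class PrescriptionExtraction:
    patient_name: Optional[str] = None
    doctor_name: Optional[str] = None
    clinic: Optional[str] = None
    diagnosis: Optional[str] = None
    prescription_date: Optional[str] = None
    medicines: List[ExtractedMedicine] = field(default_factory=list)


def expand(term: Optional[str]) -> Optional[str]:
    """Expand known shorthand; pass unknown or missing terms through unchanged."""
    return SHORTHAND.get(term, term) if term else term


def parse_extraction(model_json: str) -> PrescriptionExtraction:
    """Turn the model's JSON reply (assumed format) into typed fields."""
    data = json.loads(model_json)
    medicines = [
        ExtractedMedicine(
            raw_name=m["raw_name"],
            form=expand(m.get("form")),
            frequency=expand(m.get("frequency")),
            legible=m.get("legible", True),
        )
        for m in data.get("medicines", [])
    ]
    header_keys = ("patient_name", "doctor_name", "clinic", "diagnosis", "prescription_date")
    header = {k: data.get(k) for k in header_keys}
    return PrescriptionExtraction(medicines=medicines, **header)


# Made-up sample reply, for illustration only.
sample = json.dumps({
    "patient_name": "A. Patient",
    "prescription_date": "2024-01-15",
    "medicines": [
        {"raw_name": "Paracetamol 500", "form": "Tab.", "frequency": "BD"},
        {"raw_name": "Amoxi", "legible": False},
    ],
})
result = parse_extraction(sample)
print(result.medicines[0].form, result.medicines[0].frequency)
```

Keeping a `legible` flag on each medicine is one way to carry partially legible names forward to the matching step instead of discarding them.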
Agentic Processing Pipeline
- Operates as a ReAct agent with automatic tool calling, separating the vision extraction step from the medicine intelligence step
- The agent invokes the medicine lookup tool exactly once with the full extracted list, keeping the flow auditable and preventing partial saves
- Generates a human-readable, language-aware summary for the patient using only the verified tool response — never inferring dosages or side effects from the image independently
- Agent conversation history is retained per request, enabling full traceability of what was extracted and what the model resolved
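The "exactly once" rule and the per-request trace could be enforced with a thin wrapper around the tool, as in the sketch below. The class name `SingleCallTool`, the tool name `lookup_medicines`, and the stand-in `fake_lookup` function are hypothetical; the case study does not show how the agent framework is configured.

```python
from typing import Callable, Dict, List


class SingleCallTool:
    """Wraps a tool so a second invocation in the same request fails loudly."""

    def __init__(self, name: str, fn: Callable[[List[str]], Dict], trace: List[Dict]):
        self.name = name
        self._fn = fn
        self._trace = trace  # per-request audit trail
        self.calls = 0

    def __call__(self, medicine_names: List[str]) -> Dict:
        if self.calls >= 1:
            raise RuntimeError(f"{self.name} may only be called once per request")
        self.calls += 1
        result = self._fn(medicine_names)
        # Record input and output so the request can be audited afterwards.
        self._trace.append(
            {"tool": self.name, "input": list(medicine_names), "output": result}
        )
        return result


def fake_lookup(names: List[str]) -> Dict:
    # Stand-in for the real medicine lookup; returns one entry per name.
    return {n: {"matched": True} for n in names}


trace: List[Dict] = []
lookup_medicines = SingleCallTool("lookup_medicines", fake_lookup, trace)

# The full extracted list goes through in a single call.
response = lookup_medicines(["Paracetamol", "Amoxicillin"])

# A second call in the same request is rejected rather than silently accepted.
try:
    lookup_medicines(["Metformin"])
    second_call_blocked = False
except RuntimeError:
    second_call_blocked = True
```

Passing the whole list in one call means the lookup either succeeds for the full prescription or fails as a unit, which is what makes partial saves avoidable.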
Multi-Stage Medicine RAG Pipeline
- Four-stage sequential lookup: in-memory cache, exact database match, trigram fuzzy search, and vector semantic search
- Each stage handles a different class of OCR noise: clean names, partial names, misspellings, and phonetic variants respectively
- Medicines that match at none of the four stages are flagged as pharmaceutical terms and queued for automated scraper follow-up
- Matched results include brand name, salt composition, uses, side effects, safety advice, and visual medicine attributes
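The four-stage cascade can be sketched in plain Python as shown below. Everything here is a stand-in chosen so the example runs without external services: a dict plays the role of the Redis cache, a set plays the PostgreSQL catalog, `trigram_similarity` is loosely modelled on trigram matching as provided by pg_trgm, and `embed` is a toy letter-count vector rather than a real embedding model. The thresholds are assumptions, not values from the project.

```python
import math
from collections import Counter
from typing import Dict, Optional, Tuple

CATALOG = {"paracetamol", "amoxicillin", "metformin"}  # stand-in for the medicine table
CACHE: Dict[str, str] = {}                              # stand-in for the Redis name cache


def trigrams(text: str) -> set:
    # Pad the string so leading and trailing characters form trigrams too.
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def embed(text: str) -> Counter:
    # Toy, order-insensitive "embedding" (letter counts); a real system would use a model.
    return Counter(text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lookup(name: str, fuzzy_threshold: float = 0.4,
           vector_threshold: float = 0.9) -> Tuple[Optional[str], str]:
    """Return (canonical_name, stage) or (None, 'missing')."""
    key = name.strip().lower()
    # Stage 1: in-memory cache
    if key in CACHE:
        return CACHE[key], "cache"
    # Stage 2: exact match
    if key in CATALOG:
        CACHE[key] = key
        return key, "exact"
    # Stage 3: trigram fuzzy search
    best = max(CATALOG, key=lambda c: trigram_similarity(key, c))
    if trigram_similarity(key, best) >= fuzzy_threshold:
        CACHE[key] = best
        return best, "trigram"
    # Stage 4: vector similarity search
    best = max(CATALOG, key=lambda c: cosine(embed(key), embed(c)))
    if cosine(embed(key), embed(best)) >= vector_threshold:
        CACHE[key] = best
        return best, "vector"
    # No stage matched: the caller would queue the name for scraper follow-up.
    return None, "missing"


for raw in ["Paracetamol", "paracetmol", "xyzzy"]:
    print(raw, "->", lookup(raw))
```

Ordering the stages from cheapest to most expensive means most clean names never reach the fuzzy or vector stages, and every successful match is cached under the noisy spelling that produced it.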
Family Identity & Profile Management
- Resolves the correct family member profile before saving any prescription data
- Supports multi-patient households under a single user account
- When a patient name is new or ambiguous, the scan is held in a confirmation queue while a medicine preview is still returned to the user, so no OCR data is lost
- Profile data includes name, age, gender, and linked scan history per family member
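One way to picture the resolution step: match the extracted patient name against the household's profiles, save only on a single unambiguous match, and otherwise park the scan for confirmation while still returning the medicine preview. The `Profile` fields follow the list above; the matching rule (case-insensitive exact name), the status strings, and the list-based queue are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Profile:
    id: int
    name: str
    age: int
    gender: str


def normalize(name: str) -> str:
    return " ".join(name.lower().split())


def resolve_profile(extracted_name: Optional[str],
                    family: List[Profile]) -> Tuple[str, Optional[Profile]]:
    """Resolve only when exactly one family member matches the extracted name."""
    if not extracted_name:
        return "needs_confirmation", None
    target = normalize(extracted_name)
    matches = [p for p in family if normalize(p.name) == target]
    if len(matches) == 1:
        return "resolved", matches[0]
    return "needs_confirmation", None  # new name or several candidates


def handle_scan(extracted_name: Optional[str], medicines: List[str],
                family: List[Profile], confirmation_queue: List[Dict]) -> Dict:
    status, profile = resolve_profile(extracted_name, family)
    preview = list(medicines)  # always returned, whatever the outcome
    if status == "resolved":
        return {"status": "saved", "profile_id": profile.id, "medicines": preview}
    # Hold the scan until the user confirms who it belongs to.
    confirmation_queue.append({"patient_name": extracted_name, "medicines": preview})
    return {"status": "pending_confirmation", "profile_id": None, "medicines": preview}


family = [Profile(1, "Asha Rao", 34, "F"), Profile(2, "Ravi Rao", 61, "M")]
queue: List[Dict] = []

known = handle_scan("asha  rao", ["paracetamol"], family, queue)
unknown = handle_scan("Meera Rao", ["metformin"], family, queue)
```

The names and ages above are invented sample data. Returning the preview in both branches is what lets the user see the matched medicines even when the scan has not yet been attached to a profile.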
Lab Report Processing
- Parallel pipeline for diagnostic lab reports using the same multi-stage matching logic applied to test parameter names
- Results stored per test with computed normal, high, low, and critical flags
- Returns trend data comparing the current report against prior reports for the same patient, surfacing changes in abnormal values over time
- Supports multi-page PDF lab reports via a shared scan identifier
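The flagging and trend logic might look roughly like the following. The function names, the optional critical thresholds, and the numeric ranges are assumptions for illustration; the ranges in particular are made-up numbers, not clinical reference values.

```python
from typing import Dict, Optional


def flag_result(value: float, low: float, high: float,
                critical_low: Optional[float] = None,
                critical_high: Optional[float] = None) -> str:
    """Classify one test value as normal, low, high, or critical."""
    if critical_low is not None and value <= critical_low:
        return "critical"
    if critical_high is not None and value >= critical_high:
        return "critical"
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def trend(current: float, previous: float, tolerance: float = 0.0) -> Dict:
    """Compare the current value with the same test in a prior report."""
    delta = current - previous
    if delta > tolerance:
        direction = "up"
    elif delta < -tolerance:
        direction = "down"
    else:
        direction = "stable"
    return {"delta": delta, "direction": direction}


# Illustrative numbers only, not clinical reference ranges.
RANGE = {"low": 12.0, "high": 16.0, "critical_low": 7.0}
previous_value, current_value = 12.5, 11.0

current_flag = flag_result(current_value, **RANGE)
change = trend(current_value, previous_value)
print(current_flag, change)
```

Checking the critical thresholds before the ordinary range ensures a value that is both "low" and "critical" is reported with the more severe flag.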
Doctor Consultation Report
- Generates a structured summary report for the doctor before or during a consultation, compiled entirely from the patient’s scanned data
- Consolidates medicines from all past prescriptions, current diagnoses, and latest lab report results into a single document
- Surfaces abnormal lab values prominently so the doctor can identify critical findings at a glance without reviewing individual reports
- Includes prescription history with doctor names, clinic details, and dates, giving the consulting doctor full context on prior treatments
- Report is generated on demand from existing scan and lab data — no manual data entry required from the patient or clinic staff
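A minimal sketch of how such a report could be assembled from stored scan and lab records is shown below. The record shapes, key names, and severity ordering are assumptions, and `lab_results` is taken to hold the rows of the latest lab report with flags already computed.

```python
from typing import Dict, List

# Assumed ordering: critical findings first, then other abnormal values.
SEVERITY = {"critical": 0, "high": 1, "low": 1}


def build_consultation_report(prescriptions: List[Dict], lab_results: List[Dict]) -> Dict:
    # Deduplicate medicines and diagnoses across all past prescriptions.
    medicines = sorted({m for p in prescriptions for m in p["medicines"]})
    diagnoses = sorted({p["diagnosis"] for p in prescriptions if p.get("diagnosis")})
    # Abnormal lab values are pulled out and ordered by severity.
    abnormal = sorted(
        (r for r in lab_results if r["flag"] != "normal"),
        key=lambda r: SEVERITY.get(r["flag"], 2),
    )
    # Prescription history, most recent first (ISO dates sort as strings).
    history = sorted(
        ({"date": p["date"], "doctor": p["doctor"], "clinic": p["clinic"]}
         for p in prescriptions),
        key=lambda h: h["date"],
        reverse=True,
    )
    return {
        "medicines": medicines,
        "diagnoses": diagnoses,
        "abnormal_labs": abnormal,
        "lab_results": lab_results,
        "history": history,
    }


# Invented sample records, for illustration only.
prescriptions = [
    {"date": "2024-01-10", "doctor": "Dr. A", "clinic": "Clinic One",
     "diagnosis": "fever", "medicines": ["paracetamol", "amoxicillin"]},
    {"date": "2024-03-02", "doctor": "Dr. B", "clinic": "Clinic Two",
     "diagnosis": "type 2 diabetes", "medicines": ["metformin", "paracetamol"]},
]
lab_results = [
    {"test": "test_a", "value": 11.0, "flag": "low"},
    {"test": "test_b", "value": 5.1, "flag": "normal"},
    {"test": "test_c", "value": 6.5, "flag": "critical"},
]
report = build_consultation_report(prescriptions, lab_results)
```

Because the report is derived purely from existing records, it can be regenerated on demand without storing a separate copy.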
Medication Reminder Engine
- Creates time-aware reminders linked to specific medicines and prescription scans
- Each reminder is inserted into the database and then written to a Redis cache immediately afterwards for low-latency notification delivery
- Built-in deduplication prevents duplicate reminders when the same prescription is rescanned
- Supports one-time and recurring reminder schedules with configurable time slots
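Deduplication across rescans can be sketched with a deterministic key that leaves the scan identifier out, so a second scan of the same prescription maps to the same reminder. The key fields, the `ReminderStore` class, and the in-memory dicts standing in for PostgreSQL and Redis are all assumptions made for this example.

```python
import hashlib
import json
from typing import Dict


def reminder_key(profile_id: int, medicine_id: int, time_slot: str, recurrence: str) -> str:
    """Deterministic key; the scan id is deliberately excluded so rescans collide."""
    raw = f"{profile_id}:{medicine_id}:{time_slot}:{recurrence}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class ReminderStore:
    def __init__(self) -> None:
        self.db: Dict[str, Dict] = {}    # stand-in for the reminders table
        self.cache: Dict[str, str] = {}  # stand-in for Redis

    def create(self, profile_id: int, medicine_id: int, scan_id: str,
               time_slot: str, recurrence: str = "daily") -> bool:
        """Insert a reminder and mirror it to the cache; return False on a duplicate."""
        key = reminder_key(profile_id, medicine_id, time_slot, recurrence)
        if key in self.db:
            return False  # same reminder already exists, e.g. from an earlier scan
        record = {
            "profile_id": profile_id,
            "medicine_id": medicine_id,
            "scan_id": scan_id,
            "time_slot": time_slot,
            "recurrence": recurrence,
        }
        self.db[key] = record
        # Written to the cache right after the insert, so both stay in step.
        self.cache[f"reminder:{key}"] = json.dumps(record)
        return True


store = ReminderStore()
first = store.create(1, 42, "scan-001", "08:00")
rescan = store.create(1, 42, "scan-002", "08:00")   # same prescription, new scan
evening = store.create(1, 42, "scan-001", "20:00")  # different time slot
```

In a real deployment the duplicate check would more likely be a unique constraint on the database side, since an in-process check alone does not protect against concurrent requests.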
Data Pipeline & Medicine Coverage
- Medicine database seeded from a scraped catalog of widely used drugs and loaded into PostgreSQL
- A scheduled daily job processes the missing medicines queue, scraping and adding new entries automatically
- An embedding rebuild job regenerates vector representations when the underlying model is updated
- Admin dashboard provides the operations team direct control over the medicine catalog
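The daily missing-medicines job could follow a loop like the one below: take each queued name, skip it if the catalog has since gained an entry, try to scrape it, and leave it queued if the scrape fails. The function name, the `scrape` callable, and the in-memory catalog are hypothetical stand-ins; the case study does not describe the scraper or the scheduler.

```python
from typing import Callable, Dict, List, Optional, Tuple


def process_missing_queue(
    queue: List[str],
    catalog: Dict[str, Dict],
    scrape: Callable[[str], Optional[Dict]],
) -> Tuple[List[str], List[str]]:
    """Return (names added to the catalog, names still unresolved)."""
    added: List[str] = []
    remaining: List[str] = []
    for name in queue:
        key = name.strip().lower()
        if key in catalog:
            continue  # already added since it was queued
        entry = scrape(key)
        if entry is None:
            remaining.append(name)  # keep it for the next run
            continue
        catalog[key] = entry
        added.append(key)
    return added, remaining


def fake_scrape(name: str) -> Optional[Dict]:
    # Stand-in scraper: only "knows" one medicine.
    known = {"ibuprofen": {"brand_name": "ibuprofen", "uses": "example entry"}}
    return known.get(name)


catalog: Dict[str, Dict] = {"paracetamol": {"brand_name": "paracetamol"}}
missing = ["Ibuprofen", "paracetamol", "unknownmed"]
added, remaining = process_missing_queue(missing, catalog, fake_scrape)
```

Returning the unresolved names explicitly makes it easy for the next scheduled run, or the admin dashboard, to pick up where this run stopped.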
Technology Stack
AI & Agent Layer
- Google Gemini Vision LLM for prescription image understanding, structured extraction, and context-aware OCR
- ReAct agent pattern with automatic tool calling for auditable, step-by-step processing
- pgvector for approximate nearest-neighbour semantic search
Backend
- Python and FastAPI for the API server
- Connection-pooled PostgreSQL for all relational data
- Redis for medicine name cache and reminder state
Database
- PostgreSQL with trigram indexing for fuzzy text matching
- pgvector extension for vector embedding storage and search
- Automatic schema migration on startup
Frontend
- React admin dashboard for drug catalog and missing medicine management
Cloud & Infrastructure
- AWS deployment with scalable compute and storage
- Scheduled background jobs for scraping and embedding maintenance
AI-Ready Enhancements
- Multilingual Vision LLM extraction for prescriptions written in regional scripts and mixed-language documents
- Fine-tuned OCR model for low-quality or damaged prescription photographs
- AI-driven drug interaction detection across a patient’s full prescription history
- Intelligent abnormal lab result alerts with clinical context surfaced through the agent
- AI agents for proactive health reminders based on extracted diagnosis and prescription patterns
- Predictive refill reminders based on dosage duration and prescription history
Business Impact
- Vision LLM replaced brittle traditional OCR, enabling reliable extraction from handwritten, printed, and mixed-language prescriptions
- End-to-end processing from photograph to structured medicine data completed within a single API call
- Multi-stage RAG pipeline resolves brand names, generics, abbreviations, and OCR noise without any manual matching rules
- Family profile management supports multi-patient households under a single account
- Doctor consultation reports give physicians a consolidated view of prescriptions, diagnoses, and lab results without any manual preparation
- Missing medicines are logged and automatically scraped, improving database coverage continuously
- Reminder deduplication prevents duplicate notifications when the same prescription is rescanned
Outcome
The platform launched as a fully automated health assistant capable of reading prescription photographs — including handwritten doctor notes — and returning clinically structured medicine data in seconds. Replacing rule-based OCR with a Vision LLM was the foundational shift that made the rest of the system reliable: accurate extraction fed accurate matching, which fed accurate reminders. The family identity system enabled household-level health record management without requiring separate accounts per family member, and the architecture is designed to extend to additional document types, languages, and notification channels.