Case Study — Healthcare & MedTech

AI-Powered OCR Prescription Processing & Medicine Intelligence Platform

Digital Health Platform

A conversational AI health assistant that processes handwritten and printed prescriptions via OCR, identifies medicines through a multi-stage RAG pipeline, manages family health profiles, and delivers intelligent medication reminders.

Client Overview

A digital health startup needed an intelligent prescription processing system that patients could use directly through a messaging platform. The core challenge was replacing traditional OCR — which outputs raw text and struggles with doctor handwriting, faded ink, and non-standard layouts — with a Vision LLM that understands the context of a medical document, not just the characters in it.

The goal was to let users photograph a prescription, have the system extract and identify all medicines with full clinical detail, and attach the results to the correct patient profile — all within a single conversational flow.

The platform also needed to handle multi-patient households, lab report analysis, and automated medication reminder scheduling.

Business Challenges

The client faced several operational and technical challenges before engaging Aviasole:

  • Traditional OCR engines output raw character strings and cannot interpret the structure or intent of a medical document — doctor shorthand, overlapping text, and poor scan quality produce unusable output
  • Handwritten prescriptions vary significantly in style, abbreviation, and layout, making rule-based text extraction brittle and unmaintainable
  • Even when text is extracted correctly, medicine names have spelling variants, regional brand names, and generic equivalents that a simple database lookup cannot resolve
  • Managing patient identity becomes complex when one user submits prescriptions for multiple family members
  • Lab reports require a separate processing pipeline with abnormal result detection and historical trend tracking
  • Doctors seeing a patient for the first time have no consolidated view of their prescription history, lab results, and diagnoses — leading to repeated tests and incomplete consultations
  • Reminder scheduling needed to stay synchronized across database and cache without creating duplicate notifications

Solution Provided

Aviasole designed and built a full-stack AI health assistant comprising a FastAPI backend, a multimodal prescription agent powered by Google Gemini, and a multi-stage medicine RAG pipeline backed by PostgreSQL and Redis. The system processes prescription images end to end, from OCR extraction through patient identity resolution, medicine matching, scan logging, and reminder creation, all within a single API call.

Key Features & Capabilities

Vision LLM & OCR Layer

  • Uses Google Gemini as a Vision LLM rather than a traditional OCR engine, enabling the model to understand the semantic structure of a prescription — not just extract characters
  • Reads handwritten and printed prescriptions from photographs taken on any mobile device, handling variable image quality, rotation, and lighting conditions
  • Identifies and separates medicine names, patient name, doctor name, clinic, diagnosis, and prescription date as distinct structured fields from a single image
  • Interprets doctor shorthand and abbreviations in context — for example, understanding that “Tab.” means tablet or “BD” means twice daily — rather than passing them as raw strings
  • Handles mixed-language prescriptions where medicine names may appear in English while surrounding text is in a regional language
  • Extracts the complete medicine list in one pass, including partially legible names, ensuring nothing is silently dropped
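
The structured fields listed above could be represented along the following lines. This is a minimal sketch, not the platform's actual schema: the class names, the `legible` flag, and the shape of the `payload` dictionary are assumptions made for illustration, and the call to the Vision LLM itself is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtractedMedicine:
    raw_name: str
    legible: bool = True  # False when the name could only be partly read


@dataclass
class PrescriptionExtraction:
    patient_name: Optional[str] = None
    doctor_name: Optional[str] = None
    clinic: Optional[str] = None
    diagnosis: Optional[str] = None
    prescription_date: Optional[str] = None
    medicines: List[ExtractedMedicine] = field(default_factory=list)


def parse_extraction(payload: dict) -> PrescriptionExtraction:
    """Map a JSON-like model reply (hypothetical shape) onto structured fields."""
    medicines = [
        ExtractedMedicine(
            raw_name=item["name"].strip(),
            legible=bool(item.get("legible", True)),
        )
        for item in payload.get("medicines", [])
        if item.get("name", "").strip()  # drop only entries with no text at all
    ]
    return PrescriptionExtraction(
        patient_name=payload.get("patient_name"),
        doctor_name=payload.get("doctor_name"),
        clinic=payload.get("clinic"),
        diagnosis=payload.get("diagnosis"),
        prescription_date=payload.get("prescription_date"),
        medicines=medicines,
    )
```

Partially legible names stay in the list with `legible=False` rather than being filtered out, which mirrors the requirement that nothing is silently dropped.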

Agentic Processing Pipeline

  • Operates as a ReAct agent with automatic tool calling, separating the vision extraction step from the medicine intelligence step
  • The agent invokes the medicine lookup tool exactly once with the full extracted list, keeping the flow auditable and preventing partial saves
  • Generates a human-readable, language-aware summary for the patient using only the verified tool response — never inferring dosages or side effects from the image independently
  • Agent conversation history is retained per request, enabling full traceability of what was extracted and what the model resolved
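
The contract described above (one batched tool call, a summary built only from the tool response, and a per-request trace) might be sketched as follows. It is not the ReAct loop itself and does not call Gemini; `run_prescription_flow`, the `lookup_tool` callable, and the `uses` field are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

LookupTool = Callable[[List[str]], Dict[str, Optional[dict]]]


def run_prescription_flow(
    extracted_names: List[str], lookup_tool: LookupTool
) -> Tuple[str, List[dict]]:
    """Call the lookup tool once and summarize only what it returned."""
    history: List[dict] = []  # per-request trace, kept for auditing
    history.append({"step": "extraction", "content": list(extracted_names)})

    # Single call with the full list, so results are saved together or not at all.
    results = lookup_tool(list(extracted_names))
    history.append({"step": "tool_result", "content": results})

    lines = []
    for name in extracted_names:
        info = results.get(name)
        if info is None:
            # No verified data: say so instead of guessing from the image.
            lines.append(f"{name}: not yet in the catalog")
        else:
            lines.append(f"{name}: {info['uses']}")
    summary = "\n".join(lines)
    history.append({"step": "summary", "content": summary})
    return summary, history
```

Keeping the summary a pure function of the tool response is what prevents dosages or side effects from being inferred independently of the verified data.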

Multi-Stage Medicine RAG Pipeline

  • Four-stage sequential lookup: in-memory cache, exact database match, trigram fuzzy search, and vector semantic search
  • Each stage handles a different class of input: clean names, partial names, misspellings, and phonetic variants, respectively
  • Medicine names that match at none of the stages are flagged as pharmaceutical terms and queued for automated scraper follow-up
  • Matched results include brand name, salt composition, uses, side effects, safety advice, and visual medicine attributes
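
The four-stage cascade can be sketched in plain Python as below. This is an illustration under stated assumptions, not the production code: the in-memory `catalog` stands in for PostgreSQL, `trigram_similarity` only approximates what the pg_trgm extension computes, `semantic_search` is a hypothetical callable standing in for a pgvector query, and the 0.4 threshold is an arbitrary example value.

```python
from typing import Callable, Dict, List, Optional, Set


def trigrams(text: str) -> Set[str]:
    """Three-character windows over the padded, lowercased text."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class MedicineLookup:
    def __init__(
        self,
        catalog: Dict[str, dict],
        semantic_search: Callable[[str], Optional[str]],
        fuzzy_threshold: float = 0.4,  # assumed value, tune against real data
    ) -> None:
        self.catalog = {name.lower(): rec for name, rec in catalog.items()}
        self.semantic_search = semantic_search
        self.fuzzy_threshold = fuzzy_threshold
        self.cache: Dict[str, dict] = {}
        self.missing_queue: List[str] = []

    def resolve(self, raw_name: str) -> Optional[dict]:
        key = raw_name.strip().lower()
        if key in self.cache:  # stage 1: in-memory cache
            return {**self.cache[key], "stage": "cache"}

        record, stage = self.catalog.get(key), "exact"  # stage 2: exact match
        if record is None:  # stage 3: trigram fuzzy match
            best = max(
                self.catalog,
                key=lambda name: trigram_similarity(key, name),
                default=None,
            )
            if best and trigram_similarity(key, best) >= self.fuzzy_threshold:
                record, stage = self.catalog[best], "trigram"
        if record is None:  # stage 4: semantic search
            hit = self.semantic_search(key)
            if hit is not None:
                record, stage = self.catalog.get(hit.lower()), "vector"
        if record is None:  # nothing matched: queue for scraper follow-up
            self.missing_queue.append(raw_name)
            return None

        self.cache[key] = record
        return {**record, "stage": stage}
```

Ordering the stages from cheapest to most expensive means a clean or previously seen name never reaches the fuzzy or vector stages.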

Family Identity & Profile Management

  • Resolves the correct family member profile before saving any prescription data
  • Supports multi-patient households under a single user account
  • When a patient name is new or ambiguous, the scan is held in a confirmation queue while a medicine preview is still returned to the user, so no OCR data is lost
  • Profile data includes name, age, gender, and linked scan history per family member
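
One way to picture the resolve-or-hold behaviour is the sketch below. The exact-name comparison, the `ScanOutcome` type, and the list used as a confirmation queue are all assumptions for illustration; the real matching rules are not described in this case study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FamilyMember:
    member_id: int
    name: str


@dataclass
class ScanOutcome:
    status: str  # "saved" or "pending_confirmation"
    member_id: Optional[int]
    medicine_preview: List[str]


def resolve_member(
    patient_name: Optional[str],
    family: List[FamilyMember],
    medicines: List[str],
    pending: List[dict],
) -> ScanOutcome:
    """Attach the scan to one profile, or hold it until the user confirms."""
    wanted = (patient_name or "").strip().lower()
    matches = [m for m in family if wanted and m.name.strip().lower() == wanted]

    if len(matches) == 1:
        return ScanOutcome("saved", matches[0].member_id, medicines)

    # New or ambiguous name: keep the extracted data in the confirmation
    # queue and still return the medicine preview to the user.
    pending.append({"patient_name": patient_name, "medicines": list(medicines)})
    return ScanOutcome("pending_confirmation", None, medicines)
```

Because the preview is returned in both branches, the user sees the recognized medicines even when the profile still needs confirmation.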

Lab Report Processing

  • Parallel pipeline for diagnostic lab reports using the same multi-stage matching logic applied to test parameter names
  • Results stored per test with computed normal, high, low, and critical flags
  • Returns trend data comparing the current report against prior reports for the same patient, surfacing changes in abnormal values over time
  • Supports multi-page PDF lab reports via a shared scan identifier
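
A minimal sketch of how the per-test flags and a simple trend might be computed is shown below. The function names, the optional critical bounds, and any numeric ranges passed in are illustrative assumptions and are not clinical reference values.

```python
from typing import Optional


def flag_result(
    value: float,
    low: float,
    high: float,
    critical_low: Optional[float] = None,
    critical_high: Optional[float] = None,
) -> str:
    """Classify a lab value as normal, low, high, or critical."""
    # Critical bounds are checked first because they override low/high.
    if critical_low is not None and value <= critical_low:
        return "critical"
    if critical_high is not None and value >= critical_high:
        return "critical"
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def trend(previous: float, current: float) -> str:
    """Direction of change between the prior and the current report."""
    if current > previous:
        return "rising"
    if current < previous:
        return "falling"
    return "unchanged"
```

Storing the computed flag alongside each result lets later reports be compared without re-deriving reference ranges.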

Doctor Consultation Report

  • Generates a structured summary report for the doctor before or during a consultation, compiled entirely from the patient’s scanned data
  • Consolidates medicines from all past prescriptions, current diagnoses, and latest lab report results into a single document
  • Surfaces abnormal lab values prominently so the doctor can identify critical findings at a glance without reviewing individual reports
  • Includes prescription history with doctor names, clinic details, and dates, giving the consulting doctor full context on prior treatments
  • Report is generated on demand from existing scan and lab data — no manual data entry required from the patient or clinic staff
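
As a rough sketch, the consolidation step could look like the following. The dictionary keys (`date`, `doctor`, `clinic`, `diagnosis`, `medicines`, `flag`) and the severity ordering are assumptions for the example; dates are assumed to be ISO-formatted strings so they sort correctly as text.

```python
from typing import Dict, List

# Assumed display priority: most urgent findings first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "low": 2}


def build_consultation_report(
    prescriptions: List[dict], lab_results: List[dict]
) -> Dict[str, list]:
    """Compile one summary from existing scan and lab records."""
    history = sorted(prescriptions, key=lambda p: p["date"], reverse=True)

    # Unique medicines across all past prescriptions.
    medicines = sorted({m for p in prescriptions for m in p["medicines"]})

    # Diagnoses in most-recent-first order, without repeats.
    diagnoses: List[str] = []
    for p in history:
        if p["diagnosis"] not in diagnoses:
            diagnoses.append(p["diagnosis"])

    # Abnormal values are listed first so they are visible at a glance.
    abnormal = sorted(
        (r for r in lab_results if r["flag"] != "normal"),
        key=lambda r: SEVERITY_ORDER.get(r["flag"], 3),
    )

    return {
        "abnormal_labs": abnormal,
        "medicines": medicines,
        "diagnoses": diagnoses,
        "history": [
            {"date": p["date"], "doctor": p["doctor"], "clinic": p["clinic"]}
            for p in history
        ],
    }
```

Everything in the report is derived from records already stored, which is why no manual entry is needed at consultation time.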

Medication Reminder Engine

  • Creates time-aware reminders linked to specific medicines and prescription scans
  • Each reminder is inserted into the database and then immediately mirrored to a Redis cache for low-latency notification delivery
  • Built-in deduplication prevents duplicate reminders when the same prescription is rescanned
  • Supports one-time and recurring reminder schedules with configurable time slots
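
The deduplicate-then-mirror behaviour might be sketched as follows. To keep the example self-contained it uses SQLite's `INSERT OR IGNORE` in place of PostgreSQL and a plain dictionary in place of Redis; the table layout, the cache key format, and the choice of (member, medicine, time slot) as the uniqueness key are assumptions, not the platform's documented design.

```python
import sqlite3
from typing import Dict


def open_store() -> sqlite3.Connection:
    """In-memory stand-in for the reminders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE reminders (
            member_id   INTEGER NOT NULL,
            medicine_id INTEGER NOT NULL,
            time_slot   TEXT    NOT NULL,
            scan_id     INTEGER NOT NULL,
            UNIQUE (member_id, medicine_id, time_slot)
        )
        """
    )
    return conn


def create_reminder(
    conn: sqlite3.Connection,
    cache: Dict[str, str],
    member_id: int,
    medicine_id: int,
    time_slot: str,
    scan_id: int,
) -> bool:
    """Insert a reminder once; return False if it already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO reminders "
        "(member_id, medicine_id, time_slot, scan_id) VALUES (?, ?, ?, ?)",
        (member_id, medicine_id, time_slot, scan_id),
    )
    conn.commit()
    if cur.rowcount == 0:
        return False  # duplicate from a rescan: nothing written anywhere

    # Mirror to the cache only after the database insert succeeded.
    cache[f"reminder:{member_id}:{medicine_id}:{time_slot}"] = "scheduled"
    return True
```

The scan identifier is stored but deliberately left out of the uniqueness key, since a rescan of the same prescription would otherwise count as a new reminder.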

Data Pipeline & Medicine Coverage

  • Medicine database seeded from a scraped catalog of widely used drugs and loaded into PostgreSQL
  • A scheduled daily job processes the missing medicines queue, scraping and adding new entries automatically
  • An embedding rebuild job regenerates vector representations whenever the underlying embedding model is updated
  • Admin dashboard provides the operations team direct control over the medicine catalog
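
The daily pass over the missing medicines queue could be sketched like this. The `scrape` callable is a hypothetical stand-in for the scraper, and the in-memory `catalog` dictionary stands in for the PostgreSQL medicine table; scheduling and error handling are left out.

```python
from typing import Callable, Dict, List, Optional


def process_missing_queue(
    queue: List[str],
    catalog: Dict[str, dict],
    scrape: Callable[[str], Optional[dict]],
) -> List[str]:
    """Try to scrape each queued name; return the names still unresolved."""
    remaining: List[str] = []
    for name in queue:
        key = name.strip().lower()
        if key in catalog:
            continue  # already added by an earlier run or by an admin
        record = scrape(name)
        if record is None:
            remaining.append(name)  # keep for the next scheduled run
        else:
            catalog[key] = record
    return remaining
```

Returning the unresolved names keeps the job idempotent: running it twice in a row does not duplicate catalog entries.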

Technology Stack

AI & Agent Layer

  • Google Gemini Vision LLM for prescription image understanding, structured extraction, and context-aware OCR
  • ReAct agent pattern with automatic tool calling for auditable, step-by-step processing
  • pgvector for approximate nearest-neighbour semantic search

Backend

  • Python and FastAPI for the API server
  • Connection-pooled PostgreSQL for all relational data
  • Redis for medicine name cache and reminder state

Database

  • PostgreSQL with trigram indexing for fuzzy text matching
  • pgvector extension for vector embedding storage and search
  • Automatic schema migration on startup

Frontend

  • React admin dashboard for drug catalog and missing medicine management

Cloud & Infrastructure

  • AWS deployment with scalable compute and storage
  • Scheduled background jobs for scraping and embedding maintenance

AI-Ready Enhancements

  • Multilingual Vision LLM extraction for prescriptions written in regional scripts and mixed-language documents
  • Fine-tuned OCR model for low-quality or damaged prescription photographs
  • AI-driven drug interaction detection across a patient’s full prescription history
  • Intelligent abnormal lab result alerts with clinical context surfaced through the agent
  • AI agents for proactive health reminders based on extracted diagnosis and prescription patterns
  • Predictive refill reminders based on dosage duration and prescription history

Business Impact

  • Vision LLM replaced brittle traditional OCR, enabling reliable extraction from handwritten, printed, and mixed-language prescriptions
  • End-to-end processing from photograph to structured medicine data completed within a single API call
  • Multi-stage RAG pipeline resolves brand names, generics, abbreviations, and OCR noise without any manual matching rules
  • Family profile management supports multi-patient households under a single account
  • Doctor consultation reports give physicians a consolidated view of prescriptions, diagnoses, and lab results without any manual preparation
  • Missing medicines are logged and automatically scraped, improving database coverage continuously
  • Reminder deduplication prevents duplicate notifications when the same prescription is rescanned

Outcome

The platform launched as a fully automated health assistant capable of reading prescription photographs — including handwritten doctor notes — and returning clinically structured medicine data in seconds. Replacing rule-based OCR with a Vision LLM was the foundational shift that made the rest of the system reliable: accurate extraction fed accurate matching, which fed accurate reminders. The family identity system enabled household-level health record management without requiring separate accounts per family member, and the architecture is designed to extend to additional document types, languages, and notification channels.

Technologies

Python / FastAPI, Google Gemini, PostgreSQL, pgvector, Redis, React, AWS

Key Results

  • 4-stage RAG cascade for medicine lookup
  • ~95% medicine identification accuracy
  • 2 document types processed: prescriptions and lab reports
  • Zero OCR data lost during identity confirmation
