02 · AI · EdTech · 2025

MasomoAI

An AI learning platform that converts documents, images, video and text into structured study material. I led the backend architecture for ingestion, asynchronous AI work, learning domains and production operations.


Source code is private. The case study focuses on pipeline structure, asynchronous control, failure handling and production behavior rather than client-sensitive content.

Product model: AI learning SaaS
Audience: Students & learning partners
Timeline: 2025
My role: Lead backend engineer
Context: Private commercial product
Delivery scope: API · AI pipeline · workers · operations

5 priority queues

5 content processing states

10+ backend domains

Product surfaces

The product surfaces below show where uploads, generated study materials and tutoring appear to learners, before getting into the backend architecture.

Private product UI

Course upload and processing

Source ingestion, processing status and material preparation before AI generation.


Generated study materials

Summaries, quizzes, flashcards and practice outputs generated from uploaded content.

Private workflow

Tutor and chat interface

Learner-facing interactions that stay responsive while heavy generation runs in background queues.

Problem and constraints

Learning content arrives in formats with different extraction costs and failure modes. A useful product must isolate slow OCR and model calls from interactive requests while keeping generated content tied to the correct learner, course and processing state.

Unpredictable source quality

Native PDFs, scans, images and videos cannot share one extraction path or one cost profile.

Long-running AI work

OCR, generation and grading exceed the latency budget of normal API requests.

Interactive flows versus batch workloads

Learner-facing chat and browsing must not wait behind bulk extraction or scheduled analytics.

Operational recovery after async failure

The system must make stuck jobs, failed tasks and provider-side errors visible instead of hiding them behind false success states.

My exact ownership

I designed

The ingestion lifecycle separating source files, extracted text and generated learning artifacts.
Queue routing and workload isolation between chat, extraction, OCR, generation and scheduled work.
Processing states that keep asynchronous AI work inspectable to both users and operators.

I implemented

Asynchronous generation flows for summaries, quizzes, flashcards, practice tests and tutoring support.
Celery task controls with retries, late acknowledgements, worker-loss rejection and bounded execution.
Authentication, scoped throttling, redacted logging and production observability around long-running work.

I collaborated on

The product behavior around user-facing asynchronous states and generated study outputs.
Provider choices and operational handling for document extraction and AI workflows.

Objectives

Ingestion

Accept heterogeneous material and preserve a reusable extracted-text layer.

Responsiveness

Move expensive work off the request path and prioritize interactive operations.

Domain separation

Keep courses, practices, grading, chat, quizzes and flashcards independently maintainable.

Operations

Expose queue health, retries, failures and request correlation for incident analysis.

System architecture

01 Inputs

PDF · image · video · text

02 Extraction

PyMuPDF · pypdf · Document AI · OCR

03 Orchestration

Django · Celery · Redis · LLM services

04 Learning outputs

Summary · quiz · cards · practice · chat

The platform separates ingestion, extraction, orchestration and learning outputs so heavy AI work can evolve without contaminating the request path.

Technology stack

Application

Django 5 · Django REST Framework · PostgreSQL · OpenAPI

Async runtime

Celery · Redis · Celery Beat · Priority queues

AI & extraction

Document AI · PyMuPDF · pypdf · Tesseract · DeepSeek · OpenAI · Mistral

Production services

Cloudflare R2 · PgBouncer · Better Stack · Gunicorn

State model

Course processing is an explicit asynchronous state machine. The request can return before extraction, moderation and generation finish.

Pending

The file and course record exist; background work is queued.

Processing

Text extraction, content checks and AI generation run outside the HTTP request.

Ready

Extracted text and generated learning assets can be served.

Alternate transitions

Rejected

The moderation gate records a reason; rejected user files can be purged.

Failed

A processing error is recorded for retry or operational investigation.

The visible processing lifecycle is part of the product contract: users and operators can distinguish pending, active, rejected and failed work.
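The lifecycle above can be sketched as a small transition table. This is an illustrative sketch, not the production model: the names `CourseState`, `ALLOWED` and `transition` are hypothetical, and the `pending -> failed` and `failed -> pending` edges (enqueue failure and retry) are assumptions rather than documented behavior.

```python
from enum import Enum


class CourseState(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    REJECTED = "rejected"
    FAILED = "failed"


# Allowed transitions. The PENDING -> FAILED and FAILED -> PENDING edges
# are assumptions made for this sketch.
ALLOWED = {
    CourseState.PENDING: {CourseState.PROCESSING, CourseState.FAILED},
    CourseState.PROCESSING: {
        CourseState.READY,
        CourseState.REJECTED,
        CourseState.FAILED,
    },
    CourseState.FAILED: {CourseState.PENDING},
    CourseState.READY: set(),
    CourseState.REJECTED: set(),
}


class InvalidTransition(ValueError):
    """Raised when a state change is not in the ALLOWED table."""


def transition(current: CourseState, target: CourseState) -> CourseState:
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the table in one place means a worker cannot move a course from ready back to processing by accident; the attempt raises instead of silently overwriting state.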

Engineering challenges

One content model, several extraction paths

Problem

Source quality varies from selectable PDF text to scans that require OCR.

Solution

Attempt native extraction first, escalate difficult files to OCR, cache extracted text and run moderation before generation.

Tradeoff

The pipeline has more states and cleanup logic, but avoids paying the OCR cost for every document.

Failure if handled poorly

If every file is treated the same way, cost, latency and failure recovery all become harder to control.
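A minimal sketch of the native-first, OCR-on-escalation path, assuming the extractors are injected as plain callables. The function name, the `min_chars` threshold and the dictionary cache are hypothetical stand-ins; the real pipeline's escalation rule is not described in this case study.

```python
from typing import Callable, Dict, Tuple

# Hypothetical threshold: below this many characters the native result
# is treated as unusable and the file is escalated to OCR.
MIN_CHARS = 200


def extract_text(
    file_id: str,
    data: bytes,
    native: Callable[[bytes], str],
    ocr: Callable[[bytes], str],
    cache: Dict[str, str],
    min_chars: int = MIN_CHARS,
) -> Tuple[str, str]:
    """Return (text, source) where source is 'cache', 'native' or 'ocr'."""
    if file_id in cache:
        return cache[file_id], "cache"

    text = native(data)  # cheap path first
    source = "native"
    if len(text.strip()) < min_chars:
        text = ocr(data)  # expensive path only when needed
        source = "ocr"

    cache[file_id] = text  # reuse the extracted-text layer later
    return text, source
```

Returning the source alongside the text lets callers record which path a document took, which helps when comparing cost and failure rates between native extraction and OCR.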

Interactive work versus bulk AI work

Problem

A single FIFO queue lets long OCR jobs delay chat replies and notifications.

Solution

Use five routed queues with priorities, per-task time limits, late acknowledgements and worker-loss rejection.

Tradeoff

Operations become more complex, but latency-sensitive and batch workloads receive explicit capacity rules.

Failure if handled poorly

If interactive and bulk workloads share one queue blindly, the product feels broken even when the workers are technically alive.
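The routing idea can be sketched as a settings dictionary plus a small resolver. The setting names (`task_acks_late`, `task_reject_on_worker_lost`, `task_time_limit`, `task_soft_time_limit`, `task_routes`, `task_default_queue`) are standard Celery configuration keys; the queue names, task-name patterns and limit values are assumptions chosen to match the workloads named in this case study, and `queue_for` is a hypothetical helper that only imitates glob-style route matching.

```python
from fnmatch import fnmatch

# Five hypothetical queue names, one per workload class.
QUEUES = ("chat", "extraction", "ocr", "generation", "scheduled")

CELERY_SETTINGS = {
    # Acknowledge the message only after the task has run.
    "task_acks_late": True,
    # Re-queue instead of acknowledging when the worker process dies.
    "task_reject_on_worker_lost": True,
    # Global bounds in seconds (illustrative values); individual tasks
    # can override them with time_limit / soft_time_limit.
    "task_soft_time_limit": 240,
    "task_time_limit": 300,
    "task_default_queue": "generation",
    "task_routes": {
        "chat.*": {"queue": "chat"},
        "extraction.*": {"queue": "extraction"},
        "ocr.*": {"queue": "ocr"},
        "generation.*": {"queue": "generation"},
        "scheduled.*": {"queue": "scheduled"},
    },
}


def queue_for(task_name: str, settings: dict = CELERY_SETTINGS) -> str:
    """Resolve a task name to its queue, falling back to the default."""
    for pattern, route in settings["task_routes"].items():
        if fnmatch(task_name, pattern):
            return route["queue"]
    return settings["task_default_queue"]
```

With routes like these, capacity rules can be expressed by dedicating workers to specific queues, so a backlog of OCR jobs does not occupy the workers that serve chat.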

Fallbacks that do not hide failure

Problem

Silently running jobs in web-process threads can lose work during restarts.

Solution

Disable thread fallback in production, fail enqueue operations explicitly and persist retry/failure worker events.

Tradeoff

Users see a recoverable failure when the broker is down instead of receiving false confirmation.

Failure if handled poorly

If the system pretends background work started when it did not, support and users are left with content stuck in misleading states.
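One way to express "fail the enqueue explicitly" is a thin wrapper around the send call. This is a sketch under assumptions: `EnqueueError`, `enqueue` and the `run_inline` development fallback are hypothetical names, and the built-in `ConnectionError` stands in for whatever exception the broker client raises when it is unreachable.

```python
from typing import Callable, Optional


class EnqueueError(RuntimeError):
    """The job could not be handed to the broker."""


def enqueue(
    send: Callable[[str], None],
    job_id: str,
    *,
    production: bool,
    run_inline: Optional[Callable[[str], None]] = None,
) -> str:
    """Queue a job; never pretend it was queued when the broker is down."""
    try:
        send(job_id)
        return "queued"
    except ConnectionError as exc:
        # In production there is no in-process fallback: surface the failure
        # so the caller can mark the content as failed and offer a retry.
        if production or run_inline is None:
            raise EnqueueError(f"could not enqueue job {job_id}") from exc
        # Development-only convenience path (assumption for this sketch).
        run_inline(job_id)
        return "ran_inline"
```

The caller can translate `EnqueueError` into a failed processing state and an error response, so the user sees a recoverable failure rather than content that appears queued forever.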

Failure modes & mitigations

OCR failure on low-quality material

The pipeline records explicit failure or rejection states instead of silently looping on unreadable documents.

Worker crash during generation

Late acknowledgements, worker-loss rejection and retries make interrupted jobs visible and recoverable.

Broker or queue unavailability

Enqueue failures surface explicitly rather than falling back to unsafe in-process behavior in production.

Stale task never completes

Stale-job recovery and queue health visibility give operations a way to inspect and recover incomplete work.
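Stale-job detection can be as simple as comparing a job's last update against a cutoff. The sketch below is illustrative only: the `Job` dataclass, the `find_stale` helper and the 30-minute default are assumptions, not the production schema or threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class Job:
    id: str
    state: str
    updated_at: datetime


def find_stale(
    jobs: Iterable[Job],
    now: datetime,
    max_age: timedelta = timedelta(minutes=30),  # hypothetical threshold
) -> List[Job]:
    """Return jobs still marked 'processing' that have not moved within max_age."""
    return [
        job
        for job in jobs
        if job.state == "processing" and now - job.updated_at > max_age
    ]


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    jobs = [
        Job("a", "processing", now - timedelta(hours=2)),
        Job("b", "processing", now - timedelta(minutes=5)),
        Job("c", "ready", now - timedelta(days=1)),
    ]
    print([job.id for job in find_stale(jobs, now)])
```

A scheduled task could run a query like this periodically and either re-queue the stale jobs or mark them failed, so they appear in the same diagnostics as other task failures.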

Technical decisions

Extraction before generation

Native PDF text extraction is attempted first, with repair and OCR paths reserved for difficult documents.

Jobs, not long requests

Generation work is moved to Celery so HTTP responses remain predictable and work can be retried or observed independently.

Domain-specific AI services

Learning, practice and grading flows have separate services rather than one oversized prompt layer.

Security & operations

Isolated Redis responsibilities

Cache, sessions and Celery use separate logical databases so that a flush or key collision in one subsystem cannot remove another subsystem's data.

Private media delivery

R2 objects remain private and are served through signed URLs with bounded expiration.

Proxy-aware security

HTTPS, trusted origins, proxy headers and callback source allowlists are validated for production.

Redacted observability

Request IDs and latency buckets are logged while tokens, passwords, cookies and file payloads are redacted.
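A minimal sketch of redacted, bucketed logging using only the standard `logging` module. The key list, the bucket boundaries and the `context` attribute on the log record are assumptions made for illustration; the production field names are not shown in this case study.

```python
import logging

# Hypothetical set of keys whose values must never reach the logs.
SENSITIVE_KEYS = {"token", "password", "cookie", "authorization", "file"}


def redact(payload: dict) -> dict:
    """Replace sensitive values while keeping the keys visible."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in payload.items()
    }


def latency_bucket(ms: float) -> str:
    """Map a latency in milliseconds to a coarse, low-cardinality label."""
    for limit, label in ((100, "<100ms"), (500, "100-500ms"), (2000, "500ms-2s")):
        if ms < limit:
            return label
    return ">=2s"


class RedactingFilter(logging.Filter):
    """Redact the dict stored in record.context before handlers see it."""

    def filter(self, record: logging.LogRecord) -> bool:
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            record.context = redact(context)
        return True  # never drop the record, only clean it
```

Attaching the filter to a handler with `handler.addFilter(RedactingFilter())` applies the redaction to every record that passes through it, so individual call sites do not need to remember to scrub their payloads.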

Worker time limits, retry strategies and queue monitoring.
Structured production logs and error/performance monitoring.
Provider boundaries that keep model orchestration replaceable.

Delivered system

Implementation outcomes only; no unverified commercial metrics are reported.

Deterministic content lifecycle

Every uploaded course exposes a processing state and terminal failure path.

Workload isolation

Chat, generation, OCR and scheduled work no longer compete under one undifferentiated queue.

Recoverable operations

Retries, stale work and task failures produce explicit diagnostic signals.

Replaceable providers

Extraction and model providers sit behind domain services rather than product endpoints.
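The provider boundary can be sketched with a structural interface. Everything named here is hypothetical (`SummaryProvider`, `SummaryService`, `TruncatingProvider`); the sketch only illustrates how a domain service can depend on an interface so that the concrete model vendor can be swapped without touching endpoints.

```python
from typing import Protocol


class SummaryProvider(Protocol):
    """Anything that can turn extracted text into a summary."""

    def summarize(self, text: str) -> str: ...


class TruncatingProvider:
    """Stand-in provider for tests; a real one would call a model API."""

    def __init__(self, limit: int = 40) -> None:
        self.limit = limit

    def summarize(self, text: str) -> str:
        return text[: self.limit]


class SummaryService:
    """Domain service: endpoints call this, never a vendor SDK directly."""

    def __init__(self, provider: SummaryProvider) -> None:
        self.provider = provider

    def summarize_course(self, extracted_text: str) -> str:
        if not extracted_text.strip():
            raise ValueError("no extracted text to summarize")
        return self.provider.summarize(extracted_text)
```

Swapping vendors then means constructing `SummaryService` with a different provider object; the endpoint code and the service's validation stay unchanged.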

What I learned

“AI product quality depends as much on document conditioning, queue behavior and failure handling as it does on the model itself.”
