MasomoAI
An AI learning platform that converts documents, images, video and text into structured study material. I led the backend architecture for ingestion, asynchronous AI work, learning domains and production operations.
Source code is private. The case study focuses on pipeline structure, asynchronous control, failure handling and production behavior rather than client-sensitive content.
- Product model: AI learning SaaS
- Audience: Students & learning partners
- Timeline: 2025
- My role: Lead backend engineer
- Context: Private commercial product
- Delivery scope: API · AI pipeline · workers · operations
01
Product surfaces
Three learner-facing surfaces frame the product before getting into the backend architecture.
Course upload and processing
Source ingestion, processing status and material preparation before AI generation.
Generated study materials
Summaries, quizzes, flashcards and practice outputs generated from uploaded content.
Tutor and chat interface
Learner-facing interactions that stay responsive while heavy generation runs in background queues.
02
Problem and constraints
Learning content arrives in formats with different extraction costs and failure modes. A useful product must isolate slow OCR and model calls from interactive requests while keeping generated content tied to the correct learner, course and processing state.
Unpredictable source quality
Native PDFs, scans, images and videos cannot share one extraction path or one cost profile.
Long-running AI work
OCR, generation and grading exceed the latency budget of normal API requests.
Interactive flows versus batch workloads
Learner-facing chat and browsing must not wait behind bulk extraction or scheduled analytics.
Operational recovery after async failure
The system must make stuck jobs, failed tasks and provider-side errors visible instead of hiding them behind false success states.
03
My exact ownership
I designed
- The ingestion lifecycle separating source files, extracted text and generated learning artifacts.
- Queue routing and workload isolation between chat, extraction, OCR, generation and scheduled work.
- Processing states that keep asynchronous AI work inspectable to both users and operators.
I implemented
- Asynchronous generation flows for summaries, quizzes, flashcards, practice tests and tutoring support.
- Celery task controls with retries, late acknowledgements, worker-loss rejection and bounded execution.
- Authentication, scoped throttling, redacted logging and production observability around long-running work.
I collaborated on
- The product behavior around user-facing asynchronous states and generated study outputs.
- Provider choices and operational handling for document extraction and AI workflows.
04
Objectives
Ingestion
Accept heterogeneous material and preserve a reusable extracted-text layer.
Responsiveness
Move expensive work off the request path and prioritize interactive operations.
Domain separation
Keep courses, practices, grading, chat, quizzes and flashcards independently maintainable.
Operations
Expose queue health, retries, failures and request correlation for incident analysis.
05
System architecture
Ingestion: PDF · image · video · text
Extraction: PyMuPDF · pypdf · Document AI · OCR
Orchestration: Django · Celery · Redis · LLM services
Learning outputs: Summary · quiz · cards · practice · chat
The platform separates ingestion, extraction, orchestration and learning outputs so heavy AI work can evolve without contaminating the request path.
Technology stack
Application
Django 5 · Django REST Framework · PostgreSQL · OpenAPI
Async runtime
Celery · Redis · Celery Beat · Priority queues
AI & extraction
Document AI · PyMuPDF · pypdf · Tesseract · DeepSeek · OpenAI · Mistral
Production services
Cloudflare R2 · PgBouncer · Better Stack · Gunicorn
06
State model
Course processing is an explicit asynchronous state machine. The request can return before extraction, moderation and generation finish.
Pending
The file and course record exist; background work is queued.
Processing
Text extraction, content checks and AI generation run outside the HTTP request.
Ready
Extracted text and generated learning assets can be served.
Alternate transitions
Rejected
The moderation gate records a reason; rejected user files can be purged.
Failed
A processing error is recorded for retry or operational investigation.
The visible processing lifecycle is part of the product contract: users and operators can distinguish pending, processing, ready, rejected and failed work.
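The lifecycle above can be sketched as a small transition table. This is a minimal illustration rather than the production model: the names `CourseState`, `ALLOWED` and `transition` are hypothetical, and the `PENDING -> FAILED` and `FAILED -> PENDING` edges are assumptions about how enqueue failures and retries might be represented.

```python
from enum import Enum


class CourseState(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    REJECTED = "rejected"
    FAILED = "failed"


# Allowed transitions between states.
# Assumptions: PENDING -> FAILED covers a job that could not be queued,
# and FAILED -> PENDING covers a re-queue for retry.
ALLOWED = {
    CourseState.PENDING: {CourseState.PROCESSING, CourseState.FAILED},
    CourseState.PROCESSING: {
        CourseState.READY,
        CourseState.REJECTED,
        CourseState.FAILED,
    },
    CourseState.FAILED: {CourseState.PENDING},
    CourseState.READY: set(),
    CourseState.REJECTED: set(),
}


class InvalidTransition(Exception):
    """Raised when a state change is not in the transition table."""


def transition(current: CourseState, target: CourseState) -> CourseState:
    """Return the target state if the move is allowed, otherwise raise."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the table in one place means an API view, a worker and an admin action would all reject the same illegal moves, such as a ready course going back to processing.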
07
Engineering challenges
One content model, several extraction paths
Source quality varies from selectable PDF text to scans that require OCR.
Attempt native extraction first, escalate difficult files to OCR, cache extracted text and run moderation before generation.
The pipeline has more states and cleanup logic, but avoids paying the OCR cost for every document.
If every file is treated the same way, cost, latency and failure recovery all become harder to control.
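A rough sketch of the native-first, OCR-on-escalation path described above. The extractors and the moderation check are injected as plain callables so the example stays self-contained; `extract_text`, `gate_for_generation` and the `MIN_NATIVE_CHARS` threshold are hypothetical, and the real escalation heuristic may differ.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical threshold: below this many characters, native extraction
# is treated as having failed and the file is escalated to OCR.
MIN_NATIVE_CHARS = 200


def extract_text(
    data: bytes,
    native: Callable[[bytes], str],
    ocr: Callable[[bytes], str],
    cache: Dict[str, Tuple[str, str]],
    min_chars: int = MIN_NATIVE_CHARS,
) -> Tuple[str, str]:
    """Return (text, method), trying the cheap path before OCR."""
    key = hashlib.sha256(data).hexdigest()  # cache by file content
    if key in cache:
        return cache[key]
    try:
        text = native(data)
    except Exception:  # a real pipeline would catch the parser's own errors
        text = ""
    method = "native"
    if len(text.strip()) < min_chars:
        text, method = ocr(data), "ocr"
    cache[key] = (text, method)
    return cache[key]


def gate_for_generation(
    text: str, moderate: Callable[[str], Tuple[bool, str]]
) -> Tuple[str, str]:
    """Run moderation before generation; return (state, reason)."""
    ok, reason = moderate(text)
    return ("processing", "") if ok else ("rejected", reason)
```

Caching by content hash means a re-uploaded file does not pay for OCR twice, which is where most of the cost saving in this design would come from.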
Interactive work versus bulk AI work
A single FIFO queue lets long OCR jobs delay chat replies and notifications.
Use five routed queues with priorities, per-task time limits, late acknowledgements and worker-loss rejection.
Operations become more complex, but latency-sensitive and batch workloads receive explicit capacity rules.
If interactive and bulk workloads share one queue blindly, the product feels broken even when the workers are technically alive.
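The routing rules above could look roughly like the Django-style settings below, which Celery reads when the app is configured with `app.config_from_object("django.conf:settings", namespace="CELERY")`. The setting names are standard Celery options; the task module paths, the time limits and the default queue are assumptions for illustration. The `queue_for` helper is not part of Celery and only mimics its glob-based route matching so the mapping can be checked without a broker.

```python
from fnmatch import fnmatch

# Acknowledge a message only after the task finishes, and re-queue it
# if the worker process dies mid-task.
CELERY_TASK_ACKS_LATE = True
CELERY_TASK_REJECT_ON_WORKER_LOST = True

# Bounded execution (values are hypothetical defaults, in seconds).
CELERY_TASK_SOFT_TIME_LIMIT = 540
CELERY_TASK_TIME_LIMIT = 600

# Avoid a worker reserving a backlog of long tasks it cannot start yet.
CELERY_WORKER_PREFETCH_MULTIPLIER = 1

CELERY_TASK_DEFAULT_QUEUE = "generation"  # assumption

# Five routed queues; the module paths are hypothetical.
CELERY_TASK_ROUTES = {
    "chat.tasks.*": {"queue": "chat"},
    "extraction.tasks.*": {"queue": "extraction"},
    "ocr.tasks.*": {"queue": "ocr"},
    "generation.tasks.*": {"queue": "generation"},
    "scheduled.tasks.*": {"queue": "scheduled"},
}


def queue_for(task_name: str) -> str:
    """Illustrative lookup that mirrors glob matching on task names."""
    for pattern, route in CELERY_TASK_ROUTES.items():
        if fnmatch(task_name, pattern):
            return route["queue"]
    return CELERY_TASK_DEFAULT_QUEUE
```

Per-task limits can override the global ones through the `time_limit` and `soft_time_limit` task options, and capacity per workload is typically set by starting separate workers that consume specific queues with the worker's `-Q` option.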
Fallbacks that do not hide failure
Silently running jobs in web-process threads can lose work during restarts.
Disable thread fallback in production, fail enqueue operations explicitly and persist retry/failure worker events.
Users see a recoverable failure when the broker is down instead of receiving false confirmation.
If the system pretends background work started when it did not, support and users are left with content stuck in misleading states.
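A minimal sketch of failing an enqueue explicitly, with no in-process fallback. `Job`, `EnqueueError` and `enqueue_or_fail` are hypothetical names, and `send` stands in for whatever publishes the task. With Celery, the error raised when the broker is unreachable is typically `kombu.exceptions.OperationalError`; the sketch catches `ConnectionError` by default only to stay dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Type


class EnqueueError(Exception):
    """Raised when background work could not be queued."""


@dataclass
class Job:
    id: str
    state: str = "pending"
    error: Optional[str] = None


def enqueue_or_fail(
    job: Job,
    send: Callable[[str], object],
    broker_errors: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Job:
    """Publish the job, or record a visible failure and raise.

    There is deliberately no thread fallback: if the broker is down,
    the job is marked failed so the caller can return a recoverable
    error instead of a false confirmation.
    """
    try:
        send(job.id)
    except broker_errors as exc:
        job.state = "failed"
        job.error = f"enqueue failed: {exc}"
        raise EnqueueError(job.error) from exc
    return job
```

An API view could catch `EnqueueError` and respond with a retryable error status, so the client never sees a "processing" state for work that was never queued.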
08
Failure modes & mitigations
OCR failure on low-quality material
The pipeline records explicit failure or rejection states instead of silently looping on unreadable documents.
Worker crash during generation
Late acknowledgements, worker-loss rejection and retries make interrupted jobs visible and recoverable.
Broker or queue unavailability
Enqueue failures surface explicitly rather than falling back to unsafe in-process behavior in production.
Stale task never completes
Stale-job recovery and queue health visibility give operations a way to inspect and recover incomplete work.
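Stale-job recovery can be sketched as a periodic sweep over jobs that have sat in processing for too long. The 30-minute window, the dictionary shape and the function names are assumptions; in a Django codebase this would more likely be a queryset filter run from a scheduled task.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

STALE_AFTER = timedelta(minutes=30)  # hypothetical window


def find_stale_jobs(
    jobs: List[Dict[str, Any]],
    now: datetime,
    stale_after: timedelta = STALE_AFTER,
) -> List[Dict[str, Any]]:
    """Return jobs still processing with no update inside the window."""
    return [
        job
        for job in jobs
        if job["state"] == "processing" and now - job["updated_at"] > stale_after
    ]


def recover_stale_jobs(jobs: List[Dict[str, Any]], now: datetime) -> List[Dict[str, Any]]:
    """Mark stale jobs as failed so they show up for retry or inspection."""
    stale = find_stale_jobs(jobs, now)
    for job in stale:
        job["state"] = "failed"
        job["error"] = "stale: no progress within the allowed window"
    return stale
```

Marking a stale job as failed, rather than silently re-running it, keeps the recovery decision visible to operators.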
09
Technical decisions
Extraction before generation
Native PDF text extraction is attempted first, with repair and OCR paths reserved for difficult documents.
Jobs, not long requests
Generation work is moved to Celery so HTTP responses remain predictable and work can be retried or observed independently.
Domain-specific AI services
Learning, practice and grading flows have separate services rather than one oversized prompt layer.
10
Security & operations
Isolated Redis responsibilities
Cache, sessions and Celery use separate logical databases so that key collisions or a cache flush in one subsystem cannot affect another.
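The Redis separation can be sketched as Django-style settings. The host, the database numbers and the `sessions` cache alias are assumptions; the cache backend path and the session settings are standard Django options.

```python
# Hypothetical Redis endpoint; production would read this from the environment.
REDIS_BASE_URL = "redis://localhost:6379"


def redis_url(db: int) -> str:
    """Build a URL for one logical Redis database."""
    return f"{REDIS_BASE_URL}/{db}"


CACHES = {
    # General application cache.
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": redis_url(0),
    },
    # Session storage kept apart from the general cache.
    "sessions": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": redis_url(1),
    },
}

SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "sessions"

# Celery broker on its own logical database.
CELERY_BROKER_URL = redis_url(2)
```

Logical databases in one Redis instance still share the same memory limit, so a hard memory boundary between subsystems would need separate instances.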
Private media delivery
R2 objects remain private and are served through signed URLs with bounded expiration.
Proxy-aware security
HTTPS, trusted origins, proxy headers and callback source allowlists are validated for production.
Redacted observability
Request IDs and latency buckets are logged while tokens, passwords, cookies and file payloads are redacted.
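A minimal sketch of redacted request logging with latency buckets. The key list, the bucket boundaries and the function names are assumptions, and the redaction only covers top-level keys; nested payloads would need a recursive walk.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("requests")

# Hypothetical set of keys whose values must never reach the logs.
SENSITIVE_KEYS = {"token", "password", "cookie", "authorization", "file"}


def redact(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Replace values of sensitive top-level keys with a placeholder."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in payload.items()
    }


def latency_bucket(duration_ms: float) -> str:
    """Map a duration to a coarse bucket (boundaries are assumptions)."""
    for limit in (100, 500, 2000):
        if duration_ms < limit:
            return f"<{limit}ms"
    return ">=2000ms"


def log_request(request_id: str, duration_ms: float, payload: Dict[str, Any]) -> None:
    """Log the request ID and latency bucket with a redacted payload."""
    logger.info(
        "request_id=%s latency=%s payload=%s",
        request_id,
        latency_bucket(duration_ms),
        redact(payload),
    )
```

Logging a bucket instead of the raw duration keeps log lines easy to group during incident analysis.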
- Worker time limits, retry strategies and queue monitoring.
- Structured production logs and error/performance monitoring.
- Provider boundaries that keep model orchestration replaceable.
11
Delivered system
Implementation outcomes only; no unverified commercial metrics.
Deterministic content lifecycle
Every uploaded course exposes a processing state and a terminal failure path.
Workload isolation
Chat, generation, OCR and scheduled work no longer compete under one undifferentiated queue.
Recoverable operations
Retries, stale work and task failures produce explicit diagnostic signals.
Replaceable providers
Extraction and model providers sit behind domain services rather than product endpoints.
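One way to keep providers replaceable is to have domain services depend on a narrow interface rather than a vendor SDK. The sketch below uses `typing.Protocol`; `TextGenerator`, `EchoProvider` and `SummaryService` are hypothetical names, and the echo provider is a stand-in for a real model client.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """The only surface a domain service needs from a model provider."""

    def generate(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider used here in place of a real model client."""

    def generate(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class SummaryService:
    """Domain service that owns the prompt, not the provider."""

    def __init__(self, provider: TextGenerator) -> None:
        self.provider = provider

    def summarize(self, text: str) -> str:
        return self.provider.generate(f"Summarize:\n{text}")


# Swapping providers means passing a different object with the same method.
service = SummaryService(EchoProvider())
```

Because endpoints call `SummaryService` and never the provider, changing the model vendor would touch one constructor argument rather than every view.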
What I learned
“AI product quality depends as much on document conditioning, queue behavior and failure handling as it does on the model itself.”