Case Study | Mobile + Backend + Custom LoRA | Hummus Development LLC

Papi AI.

A cross-platform AI dating wingman that ships on a real iPhone, runs on Vercel-hosted FastAPI, and is being progressively de-risked from third-party LLMs by training a custom Qwen2.5-7B LoRA on a 16 GB GPU at home.

Stack · React Native / FastAPI / LoRA / GGUF
Status · v2.9 in peers' hands via TestFlight
Deploy · Vercel + EAS Update + Local Ollama
Role · Solo founder / engineer
v2.9 · Mobile Version Live
7B · Param LoRA Base
31 tok/s · Local Inference
3 · Distinct Layers
01

The Goal

Papi started as a single-file dating-app helper and has grown into a three-layer product: a mobile app users actually open on their phone, a backend that runs on Vercel and Groq, and an in-progress custom fine-tune, trained and served on a personal RTX 4060 Ti, that will eventually replace the third-party LLM.

The interesting engineering isn't "Papi answers Tinder messages." It's three layers that have to ship together — every release means a coordinated mobile + backend + (eventually) model update — and the long-term goal is to own the entire stack so the inference cost tends toward zero and the data never leaves my hardware.

02

Three Layers, Tightly Versioned

Layer 01 · Mobile

papi-ai-app

React Native on Expo SDK 54, distributed to peers via the EAS Update preview channel — no App Store review for iteration. Tap-to-fullscreen chat-image modal, voice + screenshot reply, per-match memory.

Expo 54 · EAS Update · Reanimated 4 · RN Worklets
Layer 02 · Backend

papi-ai-web

FastAPI on Vercel, backed by Groq Llama 3.3 70B (chat) and whisper-large-v3 (multilingual voice). Strict version-tagged rollouts; the health endpoint hardcodes the version string so a stale deployment lights up red immediately.

Vercel · FastAPI · Groq · Whisper
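
A minimal sketch of that version-pinned health check, assuming FastAPI and a /health route; the route name and constant are illustrative, not the actual papi-ai-web code. The point is that the version string is a literal baked into each deployment rather than something a stale build could still report correctly.

# health.py - minimal sketch of the version-pinned health check; the route name
# and constant are illustrative, not the actual papi-ai-web endpoint.
from fastapi import FastAPI

app = FastAPI()

# Hardcoded on purpose and bumped in the same commit as each release tag,
# so an old deployment can never report the new version by accident.
BUILD_VERSION = "2.9.0"

@app.get("/health")
def health() -> dict:
    # The mobile app compares this value to the version it expects;
    # a mismatch is what renders the "stale deployment" red state.
    return {"status": "ok", "version": BUILD_VERSION}
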
Layer 03 · Local Model

papi-finetune

Qwen2.5-7B LoRA fine-tune lab. WSL Ubuntu + uv-managed Python 3.11 + torch 2.11+cu130 + Unsloth + xformers. Seed-extract → synthetic-corpus → train → GGUF export → drop into local Ollama gateway.

Unsloth · PEFT · TRL · GGUF
03

Architecture

Today the app calls Vercel; tomorrow that same call routes through a personal Tailscale-fronted Ollama gateway running the fine-tuned model. The mobile and backend layers don't need to know which model is on the other end.

                       [ User on iPhone ]
                               │
                      React Native (Expo)
                               │
                 HTTPS / WebSocket / multipart
                               │
                    Vercel Edge / Functions
                     FastAPI (papi-ai-web)
                               │
        ┌─────────────────┬────┴──────────────┬───────────────────┐
        │                 │                   │                   │
   Groq Cloud       Local Ollama         Whisper-V3            SQLite
  Llama 3.3 70B   Qwen2.5-7B + LoRA    (multilingual)          memory
     (today)        (in progress)     voice transcribe
                          │
                     routed via
                  ┌───────┴───────┐
                  │ Tailscale     │
                  │ 100.89.111.87 │
                  │ :8089 Bearer  │
                  └───────────────┘
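
How that swap can stay invisible to the layers above, sketched in Python: both Groq and an Ollama gateway speak an OpenAI-style chat-completions API, so switching models is a base URL plus model name change. The env var names, gateway token handling, and local model tag below are assumptions, not the real papi-ai-web configuration.

# model_router.py - illustrative sketch; env var names, the gateway token, and the
# local model tag are assumptions, not the actual papi-ai-web configuration.
import os
import httpx

# Today: Groq cloud. Tomorrow: the Tailscale-fronted Ollama gateway, which serves
# the fine-tuned model behind an OpenAI-compatible /chat/completions endpoint.
USE_LOCAL = os.getenv("PAPI_USE_LOCAL_MODEL", "0") == "1"

BASE_URL = (
    "http://100.89.111.87:8089/v1" if USE_LOCAL
    else "https://api.groq.com/openai/v1"
)
MODEL = "papi-qwen2.5-7b-lora" if USE_LOCAL else "llama-3.3-70b-versatile"
API_KEY = os.getenv("PAPI_GATEWAY_TOKEN" if USE_LOCAL else "GROQ_API_KEY", "")

def chat(messages: list[dict]) -> str:
    """Send a chat turn; the caller never learns which model answered."""
    resp = httpx.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": MODEL, "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
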
04

The Local Fine-Tune Lab

GPU spec

NVIDIA RTX 4060 Ti · 16 GB VRAM

Workstation card, single GPU, running in Karim's tower at home. Enough VRAM for a Qwen2.5-7B QLoRA run without offload, with headroom to swap between inference and training.
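
Back-of-envelope on why that fits, with assumed numbers rather than a measured profile: the 4-bit base weights dominate, and QLoRA only carries gradients and optimizer state for the small adapter, so most of the 16 GB is left for activations and the KV cache.

# Rough QLoRA memory estimate for a 7B base; adapter size is an assumption, not a profile.
base_params    = 7.6e9    # Qwen2.5-7B is roughly 7.6B parameters
adapter_params = 40e6     # assumed LoRA size at a modest rank across the linear layers

weights_4bit = base_params * 0.5 / 2**30     # ~3.5 GiB of 4-bit weights
adapter_bf16 = adapter_params * 2 / 2**30    # ~0.07 GiB of trainable weights
optimizer    = adapter_params * 8 / 2**30    # ~0.30 GiB of Adam state (adapter only)

print(f"{weights_4bit + adapter_bf16 + optimizer:.1f} GiB before activations")  # ~3.9 GiB
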

The fine-tune lab is a five-script pipeline:

1. Seed extraction
2. Synthetic corpus generation (from a local model)
3. LoRA training
4. GGUF export
5. Load into the local Ollama gateway
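
A trimmed sketch of what the train step looks like on Unsloth's QLoRA path; the hyperparameters, dataset path, and output names are placeholders rather than the repo's actual config.

# train_lora.py - illustrative Unsloth QLoRA setup; hyperparameters, paths, and
# the dataset name are placeholders, not the repo's actual config.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# 4-bit base model: fits comfortably on a 16 GB RTX 4060 Ti
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these weights are trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Synthetic corpus produced by the earlier pipeline stages
dataset = load_dataset("json", data_files="synthetic_corpus.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        output_dir="outputs",
        logging_steps=10,
    ),
)
trainer.train()

# Export to GGUF so the model can be served from the local Ollama gateway
model.save_pretrained_gguf("papi-gguf", tokenizer, quantization_method="q4_k_m")
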

Discipline

"Own AI" includes the data layer. Synthesizing training data via someone else's API still leaves a critical dependency upstream. This pipeline draws its synthetic corpus from a local model so the entire training-and-serving loop is self-hosted from day one.

05

The Hard Bugs

iPhone HEIC + Vercel's 4.5 MB body cap

iPhones photograph in HEIC. Vercel Functions cap multipart bodies at 4.5 MB. A modern iPhone HEIC plus form metadata = 5–8 MB easily. The mobile client now handles both the format flip (HEIC → JPEG re-encode) and a 1600 px resize before upload — and the error strings on the server differentiate which layer failed so the next bug doesn't take an hour to bisect.
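
On the server side, a sketch of what those differentiated error strings can look like; the route, prefixes, and limits shown here are illustrative, not the production endpoint.

# upload_guard.py - illustrative upload checks with layer-specific error strings;
# the route, prefixes, and limits shown here are not the production endpoint.
from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

# Vercel Functions reject multipart bodies above ~4.5 MB, so anything near the
# limit that still reaches us is already suspicious.
VERCEL_BODY_LIMIT = int(4.5 * 1024 * 1024)

@app.post("/screenshot-reply")
async def screenshot_reply(image: UploadFile = File(...)):
    data = await image.read()

    # Distinct error strings per failure layer, so a bug report says which fix applies.
    if image.content_type in ("image/heic", "image/heif"):
        raise HTTPException(415, "CLIENT_ENCODE: got HEIC; re-encode to JPEG before upload")
    if len(data) > VERCEL_BODY_LIMIT:
        raise HTTPException(413, "CLIENT_RESIZE: body exceeds the 4.5 MB platform cap; resize to 1600 px")

    return {"ok": True, "bytes": len(data)}
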

Whisper Turbo's English-only failure mode

Initial voice transcription used whisper-large-v3-turbo. Speed was great; multilingual quality was abysmal — non-English voice notes came back as confident-sounding nonsense. Switched to whisper-large-v3 with explicit language hint + a domain-specific prompt bias. Real bug, real fix, permanent note.
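
Sketched with the Groq SDK's transcription call; the prompt text and language value below are illustrative, not the production strings.

# transcribe.py - hedged sketch of the whisper-large-v3 call; the prompt text and
# language value are illustrative.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def transcribe_voice_note(path: str, language: str) -> str:
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=(path, f.read()),
            model="whisper-large-v3",   # not the -turbo variant: multilingual quality matters
            language=language,          # explicit hint, e.g. "es", instead of auto-detect
            prompt="Casual dating-app voice note; informal slang is expected.",
        )
    return result.text
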

Reanimated 4 + missing Worklets dep

Upgrading to Reanimated 4 silently requires react-native-worklets as a peer dependency. Missing it surfaces as a misleading "unable to resolve the native app" error in Expo Go that never mentions worklets. The fix is one line in package.json; the diagnosis is where the time went.

06

What's Next

Next Case Study

RoofRoof.solutions

Live roofing-lead marketplace — Stripe in production, Twilio toll-free, Supabase + RLS, Meta Pixel + CAPI dedup, restoreSiteDeploy incident write-up.