A cross-platform AI dating wingman that ships on a real iPhone, runs on Vercel-hosted FastAPI, and is being progressively de-risked from third-party LLMs by training a custom Qwen2.5-7B LoRA on a 16 GB GPU at home.
Papi started as a single-file dating-app helper and has grown into a three-layer product: a mobile app users actually open on their phones, a backend that runs on Vercel and Groq, and an in-progress custom fine-tune, trained on a personal RTX 4060 Ti, that will eventually replace the third-party LLM.
The interesting engineering isn't "Papi answers Tinder messages." It's three layers that have to ship together — every release means a coordinated mobile + backend + (eventually) model update — and the long-term goal is to own the entire stack so the inference cost tends toward zero and the data never leaves my hardware.
React Native on Expo SDK 54, distributed to peers via the EAS Update preview channel — no App Store review for iteration. Tap-to-fullscreen chat-image modal, voice + screenshot reply, per-match memory.
Vercel FastAPI on Groq Llama 3.3 70B (chat) + whisper-large-v3 (multilingual voice). Strict version-tagged rollouts; the health endpoint hardcodes the version so a stale deployment lights up red immediately.
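The version-pinned health check can be sketched like this. It's a minimal model of the idea, not the real endpoint; the version string and function names are illustrative:

```python
# Sketch of the version-pinned deploy check. The deployed backend hardcodes
# its version string; after a rollout, the caller compares what it expects
# against what the health endpoint actually reports.

APP_VERSION = "1.4.2"  # hardcoded at build time, bumped on every release

def health() -> dict:
    """Shape of the /health response."""
    return {"status": "ok", "version": APP_VERSION}

def deploy_is_fresh(expected_version: str, health_payload: dict) -> bool:
    """Red/green check: a stale deployment still reports the old version."""
    return health_payload.get("version") == expected_version

print(deploy_is_fresh("1.4.2", health()))  # fresh deploy
print(deploy_is_fresh("1.5.0", health()))  # stale deploy still serving 1.4.2
```

Because the version is baked in rather than read from an env var, a deployment that silently failed to roll forward can't accidentally report the new number.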
Qwen2.5-7B LoRA fine-tune lab. WSL Ubuntu + uv-managed Python 3.11 + torch 2.11+cu130 + Unsloth + xformers. Seed-extract → synthetic-corpus → train → GGUF export → drop into local Ollama gateway.
Today the app calls Vercel; tomorrow that same call routes through a personal Tailscale-fronted Ollama gateway running the fine-tuned model. The mobile and backend layers don't need to know which model is on the other end.
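The swap works because both Groq and Ollama expose an OpenAI-compatible chat endpoint, so only the base URL and model name change. A minimal sketch of that routing idea (the model names and the `local` gateway path are assumptions, not the project's actual config):

```python
# Sketch of model-agnostic routing: the mobile app always calls the backend;
# the backend picks a base URL per environment. Groq and a local Ollama both
# speak OpenAI-style /chat/completions, so the payload is identical.

BACKENDS = {
    "groq": ("https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
    # Tailscale-fronted Ollama gateway; model tag is hypothetical.
    "local": ("http://100.89.111.87:8089/v1", "papi-qwen2.5-7b-lora"),
}

def chat_request(backend: str, messages: list[dict]) -> tuple[str, dict]:
    """Return (url, payload); the caller never learns which model answered."""
    base_url, model = BACKENDS[backend]
    return f"{base_url}/chat/completions", {"model": model, "messages": messages}
```

Flipping `backend` from `"groq"` to `"local"` is the entire migration from the mobile layer's point of view.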
```
                  [ User on iPhone ]
                          │  React Native (Expo)
                          │  HTTPS / WebSocket / multipart
                          ▼
                 Vercel Edge / Functions
                  FastAPI (papi-ai-web)
                          │
    ┌──────────────┬──────┴────────────┬───────────────┐
    ▼              ▼                   ▼               ▼
Groq Cloud     Local Ollama        Whisper-V3        SQLite
Llama 3.3 70B  Qwen2.5-7B + LoRA  (multilingual)     memory
(today)        (in progress)      voice transcribe
                   ▲
                   │ routed via
            ┌──────────────┐
            │  Tailscale   │
            │ 100.89.111.87│
            │ :8089 Bearer │
            └──────────────┘
```
Workstation card, single GPU, runs in my tower at home. Sufficient VRAM for Qwen2.5-7B QLoRA without offload; enough headroom to swap between inference and training.
The fine-tune lab is a five-script pipeline:
extract_seed.py — pulls real Papi conversations + persona blocks into a clean ChatML format.

synth_generate.py — expands the seed corpus with synthetic continuations (currently via the local Ollama gateway, intentionally not via third-party APIs, because if the goal is "build my own AI" the data layer also has to be self-hosted).

train_lora.py — Unsloth-accelerated QLoRA on Qwen2.5-7B-instruct. WSL Ubuntu, Python 3.11.15 in a uv-managed venv, torch 2.11+cu130, xformers 0.0.35, trl 0.15.2, peft 0.19.1, bitsandbytes 0.49.2 — the full stack imports clean.

export_gguf.py — converts the LoRA adapter + base model into a single GGUF for llama.cpp / Ollama consumption.

peek.py + Modelfile.papi — the final drop into the personal Ollama instance behind the Tailscale gateway, ready to serve through the same API the Vercel backend calls today.

"Own AI" includes the data layer. Synthesizing training data via someone else's API still leaves a critical dependency upstream. This pipeline draws its synthetic corpus from a local model, so the entire training-and-serving loop is self-hosted from day one.
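The serialization step in the seed-extract stage can be sketched as follows. Qwen2.5 uses ChatML delimiters (`<|im_start|>` / `<|im_end|>`); the function name and sample data are hypothetical, not the script's actual code:

```python
# Hypothetical sketch of the ChatML serialization in extract_seed.py:
# the persona block becomes the system turn, and each conversation turn
# is wrapped in Qwen's ChatML delimiters.

def to_chatml(persona: str, turns: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>system\n{persona}<|im_end|>"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    return "\n".join(parts)

sample = to_chatml(
    "You are Papi, a dating wingman.",
    [("user", "She left me on read, what now?"),
     ("assistant", "Give it a day, then send something playful.")],
)
print(sample.splitlines()[0])  # <|im_start|>system
```

Keeping the corpus in the base model's native chat template from the start means the training data, the fine-tuned adapter, and the Ollama Modelfile all agree on one format.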
iPhones photograph in HEIC. Vercel Functions cap multipart bodies at 4.5 MB. A modern iPhone HEIC plus form metadata easily reaches 5–8 MB. The mobile client now handles both the format flip (HEIC → JPEG re-encode) and a 1600 px resize before upload — and the server's error strings differentiate which layer failed, so the next bug doesn't take an hour to bisect.
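The decision logic behind that pre-upload step can be sketched like this. The real re-encode runs in the React Native client; this is a pure-logic model, and the step names are illustrative:

```python
# Sketch of the pre-upload guard, assuming Vercel's 4.5 MB multipart cap
# and the 1600 px resize target described above.

VERCEL_BODY_CAP = int(4.5 * 1024 * 1024)  # Vercel Functions body limit
MAX_DIMENSION = 1600                      # resize target before upload

def upload_plan(fmt: str, size_bytes: int, longest_px: int) -> list[str]:
    """Decide which transforms the client applies before uploading."""
    steps = []
    if fmt.lower() == "heic":
        steps.append("reencode-jpeg")            # flip format first
    if longest_px > MAX_DIMENSION:
        steps.append(f"resize-{MAX_DIMENSION}px")
    if not steps and size_bytes > VERCEL_BODY_CAP:
        steps.append("recompress")               # oversized JPEG fallback
    return steps
```

A fresh iPhone photo (`upload_plan("heic", 6_000_000, 4032)`) gets both transforms; a small JPEG passes through untouched.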
Initial voice transcription used whisper-large-v3-turbo. Speed was great; multilingual quality was abysmal — non-English voice notes came back as confident-sounding nonsense. Switched to whisper-large-v3 with explicit language hint + a domain-specific prompt bias. Real bug, real fix, permanent note.
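The shape of the fixed request, sketched as plain parameters (field names follow the OpenAI-compatible transcription API that Groq exposes; the prompt text is illustrative, not the production prompt):

```python
# Sketch of the transcription request after the fix: whisper-large-v3
# (not -turbo) with an explicit language hint and a domain prompt bias.

def transcription_params(language: str) -> dict:
    return {
        "model": "whisper-large-v3",   # turbo traded multilingual quality for speed
        "language": language,          # explicit hint instead of auto-detect
        "prompt": "Casual dating-app voice note; slang and emoji are likely.",
    }
```

The prompt biases the decoder toward the app's register, which matters most for short, noisy voice notes where Whisper otherwise guesses.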
Upgrading to Reanimated 4 silently requires react-native-worklets as a peer dep. Missing it surfaces as a misleading "unable to resolve the native app" in Expo Go that doesn't mention worklets. The fix is one line in package.json; the diagnostic is the cost.
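The fix is a single dependency entry next to Reanimated (version ranges here are illustrative, not pinned from the project):

```json
{
  "dependencies": {
    "react-native-reanimated": "~4.0.0",
    "react-native-worklets": "^0.4.0"
  }
}
```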
Live roofing-lead marketplace — Stripe in production, Twilio toll-free, Supabase + RLS, Meta Pixel + CAPI dedup, restoreSiteDeploy incident write-up.