BIFROST
document evidence: Financial document pipeline with OCR, semantic chunking, cited retrieval, and reviewable answers for messy uploads.
AI products that survive production.
Madrid-based product engineer working across UX, backend, retrieval, evals, and agentic workflows.
I use Codex and Claude inside the engineering loop, but the standard is still clear product judgment, reliable software, and systems people can operate when the model is unsure.
The public version is intentionally compact. The pattern is the same across the private work: build the product surface, define the harness, keep evidence visible, and make failure recoverable.
Financial document pipeline with OCR, semantic chunking, cited retrieval, and reviewable answers for messy uploads.
B2B collections workflow with intent parsing, deterministic states, drafting support, and escalation rules.
Permission-aware retrieval, cited answers, and streaming UX for support, sales, and product teams.
The useful work is usually the system around the model: how it gets context, chooses tools, exposes uncertainty, hands work back to people, and stays understandable after months of changes.
I build products where language models are useful components inside a larger system: permissions, state, queues, interfaces, persistence, observability, and failure handling.
I design routing, tool use, handoffs, evaluator loops, human checkpoints, and stopping conditions for workflows that need judgment without becoming uncontrolled automation.
I shape the environment around Codex, Claude, and human engineers: repository knowledge, executable plans, review loops, browser checks, traces, evals, and CI guardrails.
I work on the less glamorous parts of RAG: document ingestion, chunking strategy, metadata, grounding, permission boundaries, citations, and workflows for when retrieval is uncertain.
I turn vague quality into examples, traces, graders, regression checks, and operating thresholds so teams can improve systems instead of arguing from screenshots.
I can own the product surface and the backend path: Next.js, TypeScript, Python, FastAPI, Postgres, Redis, background jobs, deployment, monitoring, and cost control.
I use AI tools aggressively, but not as a substitute for judgment. The leverage comes from designing a loop where humans, agents, and the running system can all give useful feedback.
I clarify the goal, the risk, and what must stay under human control. The point is not to automate everything; it is to make the right work easier to trust.
I use Codex and Claude as engineering collaborators, but I design the harness around them: context, tools, tests, reviews, and observable feedback from the running system.
I care about boundaries, data contracts, failure modes, latency, cost, and the operational screen someone will use when the model is wrong or unsure.
routing, tool use, handoffs, evaluator loops, and stopping conditions
repo-local knowledge, executable plans, browser checks, traces, and review loops
Python, FastAPI, Next.js, TypeScript, Postgres, Redis, observability, CI