BIFROST
Document intelligence layer with ingestion quality gates, semantic and visual retrieval, pgvector/HNSW search, caching, source quality, and honest no-answer behavior.
Production AI systems: retrieval, agents, guardrails.
Madrid-based product engineer working across UX, backend, retrieval infrastructure, evals, guardrails, and agent runtimes.
I use Codex and Claude inside the engineering loop, but the standard is still clear product judgment, reliable software, and systems people can inspect when the model is wrong or unsure.
The public version is intentionally compact. The pattern is the same across the private work: build the product surface, define the runtime, keep evidence visible, and make failure recoverable.
Document intelligence layer with ingestion quality gates, semantic and visual retrieval, pgvector/HNSW search, caching, source quality, and honest no-answer behavior.
AI workflow runtime with context assembly, durable memory, execution tiers, run events, queues, idempotency, and human-review metadata.
Internal AI assistant product that combines BIFROST retrieval, MONARCH guardrails, cached tool handoff, citations, streaming UX, and suggestion revalidation.
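The runtime's idempotency and run-event ideas can be sketched in a few lines. This is an illustrative in-memory version, not the actual implementation: `RunLog`, `idempotency_key`, and `run_once` are hypothetical names, and a real deployment would back this with Postgres and a queue.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class RunLog:
    """In-memory stand-in for a durable run-event store."""
    events: list = field(default_factory=list)
    results: dict = field(default_factory=dict)

def idempotency_key(task: str, payload: dict) -> str:
    # Same task + same payload -> same key, so retries collapse into one run.
    raw = task + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

def run_once(log: RunLog, task: str, payload: dict, handler):
    key = idempotency_key(task, payload)
    if key in log.results:
        # Duplicate delivery: record it, return the stored result, do no work.
        log.events.append(("skipped_duplicate", key))
        return log.results[key]
    log.events.append(("started", key))
    result = handler(payload)
    log.results[key] = result
    log.events.append(("finished", key))
    return result
```

The point of recording even skipped duplicates is that the run log stays a complete account of what the system was asked to do, not just what it executed.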
The useful work is usually the system around the model: how it gets context, chooses tools, exposes uncertainty, hands work back to people, records what happened, and stays understandable after months of changes.
I build products where language models are useful components inside a larger system: permissions, state, queues, interfaces, persistence, observability, cost control, and failure handling.
I design runtime state, tool use, handoffs, evaluator loops, human checkpoints, idempotency, and stopping conditions for workflows that need judgment without becoming uncontrolled automation.
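The shape of those stopping conditions is simple to show. A minimal sketch, assuming a `step_fn` that advances the workflow one model/tool step and a `needs_human` checkpoint predicate (both hypothetical interfaces, not any specific runtime's API):

```python
def run_workflow(step_fn, needs_human, max_steps=8):
    """Drive a tool-using loop with explicit stopping conditions.

    step_fn(state) -> (state, done)  # one model/tool step (assumed interface)
    needs_human(state) -> bool       # human-checkpoint predicate (assumed)
    """
    state = {"steps": 0}
    for _ in range(max_steps):  # hard stop: the loop can never run forever
        state, done = step_fn(state)
        state["steps"] += 1
        if needs_human(state):
            # Hand off instead of guessing past a checkpoint.
            return {"status": "handed_off", "state": state}
        if done:
            return {"status": "completed", "state": state}
    return {"status": "stopped_at_limit", "state": state}
```

Every exit path returns a named status, so downstream code and operators can tell a finished run from a step-limit stop from a deliberate handoff.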
I shape the environment around Codex, Claude, and human engineers: repository knowledge, executable plans, review loops, browser checks, traces, evals, and CI guardrails.
I work on the less glamorous parts of retrieval: document ingestion, chunking strategy, metadata, semantic and visual search, grounding, permission boundaries, citations, source quality, and honest no-answer paths.
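An honest no-answer path is mostly a thresholding decision after retrieval. A sketch with illustrative operating points (the function name, score scale, and thresholds are assumptions, not values from any deployment):

```python
def answer_or_abstain(hits, min_score=0.75, min_hits=2):
    """Decide whether retrieved evidence is strong enough to answer.

    hits: list of (chunk_text, source_id, score) tuples, with similarity
    scores assumed to lie in [0, 1]. Thresholds are illustrative operating
    points that would be tuned against evals in practice.
    """
    strong = [h for h in hits if h[2] >= min_score]
    if len(strong) < min_hits:
        # Honest no-answer: refuse rather than pad weak evidence.
        return {"answer": None, "reason": "insufficient_evidence"}
    citations = sorted({src for _, src, _ in strong})
    return {
        "answer": " ".join(text for text, _, _ in strong),
        "citations": citations,
    }
```

Returning a machine-readable reason on abstention matters as much as the answer path: it is what lets the UI say "not enough evidence" instead of rendering a hallucinated paragraph.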
I turn vague quality into examples, traces, graders, regression checks, and operating thresholds so teams can improve systems instead of arguing from screenshots.
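The "operating thresholds" part reduces to a small gate over grader output. A minimal sketch (the function name and default tolerance are hypothetical):

```python
def regression_gate(graded_runs, baseline_pass_rate, tolerance=0.02):
    """Fail the check when pass rate drops more than `tolerance` below baseline.

    graded_runs: list of booleans, one per eval example, produced by graders.
    """
    if not graded_runs:
        raise ValueError("no graded runs")
    pass_rate = sum(graded_runs) / len(graded_runs)
    return {
        "pass_rate": pass_rate,
        "ok": pass_rate >= baseline_pass_rate - tolerance,
    }
```

Wired into CI, a gate like this turns "the answers feel worse" into a concrete, reviewable failure with a number attached.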
I can own the product surface and the backend path: Next.js, TypeScript, Python, FastAPI, Postgres, Redis, background jobs, deployment, monitoring, and cost control.
I use AI tools aggressively, but not as a substitute for judgment. The leverage comes from designing a loop where humans, agents, and the running system can all give useful feedback.
I clarify the goal, the risk, and what must stay under human control. The point is not to automate everything; it is to make the right work easier to trust.
I use Codex and Claude as engineering collaborators, but I design the harness around them: context, tools, tests, reviews, and observable feedback from the running system.
I care about boundaries, data contracts, failure modes, latency, cost, and the operational screen someone will use when the model is wrong or unsure.
runtime state, tool use, traces, handoffs, human review, queues, and stopping conditions
ingestion, chunking, pgvector, visual search, cache strategy, citations, and no-answer paths
Python, FastAPI, Next.js, TypeScript, Postgres, Redis, observability, CI