AI engineering lead · Madrid · remote EU

I build AI systems after the prototype.

Retrieval, agent runtimes, guardrails, evals, traces, and the screens operators need when an answer is wrong, incomplete, or expensive. Professional software work since 2006. Machine learning since 2015.

Currently: AI Engineering Lead · Atlax360
Experience: Software work since 2006 · ML since 2015
Focus: Production LLM · RAG · agents · evals · guardrails
Based: Madrid · Remote across EU

01 / background

Banking software habits, applied to AI.

Professional software work since 2006, in salaried roles and selected freelance client work. Machine learning since 2015. Before the current LLM wave, I shipped systems for banks and retailers where defects hit money movement, customer data, compliance review, and release windows.

since 2006
Professional software work across product, frontend, backend, and platform systems.

ml since 2015

now
Retrieval, agent runtimes, guardrails, evals, traces, and operator screens.

At Atlax360 I lead AI engineering across BIFROST, ORVIAN, and Polaris: document intelligence, an AI workflow runtime, and an internal assistant. They run inside financial operations, so retrieval quality, permissions, cost, latency, and review paths matter more than a polished chat response.

I use coding agents and language models heavily in my own work. The useful part is not the fluent output. It is the surrounding system: context, tool limits, tests, traces, approvals, and clear stopping points when evidence is weak.

Worked with
Company work

BBVA, Santander/PagoFX, Bankinter, Decathlon, El Corte Inglés, and currently Atlax360.

Team ownership

Led architecture and delivery across frontend, backend, platform, and AI systems.

Public work

Open-source repos for agent policy, memory, tracing, logging, and local systems tooling.

02 / current work

Current production work.

The implementations are private. The shape is public enough to explain the work: document ingestion and retrieval, a multi-tenant AI workflow runtime, and an internal assistant with guardrails and citations.

01

BIFROST

document intelligence

Document intelligence for the documents finance actually runs on: ingestion quality gates, semantic and visual chunking, pgvector/HNSW search, caching, source-quality scoring, analytics, and explicit no-answer paths when evidence is weak.

02

ORVIAN

workflow runtime

Multi-tenant AI workflow runtime: context assembly, durable memory, deterministic and cached execution tiers, queue processing, idempotency, run events, and human-review metadata when automation should stop.

03

Polaris

internal assistant

Internal assistant that combines BIFROST retrieval with guardrails, citations, streaming UX, suggestion revalidation, and operator analytics. One surface for support, sales, and product teams.

03 / open source

Public engineering work.

Public work I can point to directly: agent policy, traces, governed memory, structured observability, local systems tooling, and developer environments.

04 / what i work on

The parts I usually own.

Most of the value sits between a source document and the person using the result. That path includes indexing, permissions, runtime state, evals, guardrails, latency, and the UI for failure handling.

Retrieval

Chunking, metadata, permissions, source quality, citations, and caching, with explicit refusal when retrieved sources are weak, contradictory, missing, or out of scope.

Agent runtimes

Tool boundaries, stopping conditions, traces, handoffs, evaluators, queues, and cost control, defined before an agent reaches production.

Evaluation

Eval sets, abstention logic, and regression traces: the discipline of checking whether a change actually helped before treating it as an improvement.

Compliance-aware operations

Permissions, audit trails, PII handling, and human-in-the-loop review for AI running inside regulated finance, where a fluent wrong answer is still a defect.

Harness engineering

Repository context, executable plans, browser checks, CI gates, and review logs around coding agents, so their output stays verifiable inside a real codebase.

Product surfaces

Next.js, TypeScript, streaming UX, and operator screens for model failures, showing retrieved evidence, failure state, audit trail, escalation path, and the next valid action.

05 / how i work

How I work.

I work async and stay accountable for retrieval quality, runtime behavior, latency, cost, permissions, and operator workflows. Madrid-based, remote-first across the EU. Open to Staff, Principal, Architect, and Forward Deployed roles, and to contract or fractional engagements where retrieval, agents, evals, and guardrails are part of the shipped product.

Production stack

  • Python · FastAPI · TypeScript · Next.js
  • PostgreSQL · pgvector · Redis · Supabase
  • OpenAI · Anthropic · Vercel AI SDK
  • Evals · traces · guardrails · observability

Good fit for

  • Teams shipping AI inside regulated or operationally heavy environments
  • Roles that need retrieval, runtime, evals, and product judgment in one architect
  • Products where an AI feature is already close to users and now needs reliability, cost control, and permissions

06 / questions

Common questions.

Quick facts on stack, roles, and fit. The CV has the full timeline.

07

When the prototype has to survive real use.

I work with teams shipping LLM features that need owned retrieval quality, cost ceilings, latency targets, permission checks, and failure review. Available full-time, contract, or fractional. Madrid-based, remote across the EU.

© 2026 Petru Arakiss · Madrid · AI engineer · full-time, contract, fractional · remote across the EU