petruarakiss

Open for staff / principal roles · Madrid

The model is cheap. The system isn't.

Retrieval, agent runtimes, guardrails, evals — the infrastructure around the model, built for production teams who need it to hold up after launch.


01 / positioning

Most AI projects fail for boring reasons. Making them reliable is a software engineering problem.

I am a production AI engineer with 20 years in software. Before shifting to AI platform work, I spent years shipping code for banks and retailers where production bugs were expensive. My focus isn't prompt engineering or clever tricks to bypass limitations. I build the operational layer around the model: retrieval quality, deterministic runtimes, strict guardrails, and interfaces operators can actually trust when a prediction is wrong.

operating principles

  • Deterministic Safety
  • Inspectable Traces
  • System Governance
20 Yrs

Full-stack architecture and platform execution behind financial and B2B products.

[01.01_context]

At Atlax360, I lead AI engineering across BIFROST (our document intelligence layer), ORVIAN (our multi-tenant workflow runtime), and Polaris (our internal knowledge assistant). These are not portfolio mockups or hackathon prototypes—they are live systems running inside financial operations. My job is to ensure they hold up under production stress.

[01.02_context]

I use coding agents and language models heavily inside my own engineering loop, but I have no illusions about their limitations. The final standard is always deterministic safety, inspectable traces, and software that keeps running when input quality, permissions, or API latency get messy.

corporate footprint & trust track
BBVA & Santander

Led engineering teams and major rewrites inside strict, highly regulated financial environments.

12–20 Engineer Lead

Managed cross-functional platform teams through complex shipping schedules and legacy software transitions.

IBM & LF Credentials

Verified certifications: IBM Machine Learning Professional and Linux Foundation Node.js developer.

Deployed for: BBVA, Santander, Bankinter, Decathlon, El Corte Inglés

02 / current proof

Private work. Public patterns.

Most of my work stays behind enterprise firewalls. These three systems show the technical architecture I build: ingestion pipelines, multi-tenant workflow runtimes, and guardrailed assistant interfaces.

01

BIFROST

document intelligence

Document intelligence for messy real-world inputs: ingestion quality gates, semantic and visual chunking, pgvector/HNSW search, caching, source-quality scoring, analytics, and explicit no-answer paths when evidence is weak.
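
The explicit no-answer path can be illustrated with a small gate on retrieval evidence. This is a minimal sketch, not BIFROST's code: the `Chunk` type, the `answer_or_abstain` function, the rule that multiplies similarity by source quality, and the 0.55 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    text: str
    similarity: float      # hypothetical: vector-search similarity in [0, 1]
    source_quality: float  # hypothetical: source-quality score in [0, 1]


def evidence_score(chunk: Chunk) -> float:
    # Assumed scoring rule: a weak source discounts a strong match.
    return chunk.similarity * chunk.source_quality


def answer_or_abstain(chunks: list[Chunk], min_score: float = 0.55) -> Optional[Chunk]:
    """Return the best-supported chunk, or None when evidence is too weak."""
    if not chunks:
        return None
    best = max(chunks, key=evidence_score)
    return best if evidence_score(best) >= min_score else None
```

A caller would treat `None` as the signal to answer "I don't know" rather than pass thin context to the model.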

02

ORVIAN

workflow runtime

Multi-tenant AI workflow runtime: context assembly, durable memory, deterministic and cached execution tiers, queue processing, idempotency, run events, and human-review metadata when automation should stop.
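
Idempotency in queue processing usually means a redelivered message must not repeat its side effect. The sketch below shows that idea with an in-memory store. It is not ORVIAN's implementation: `IdempotentRunner` and its `run` method are hypothetical names, and a production version would need durable storage and an atomic check-and-set.

```python
from typing import Callable


class IdempotentRunner:
    """Runs each job at most once per idempotency key (in-memory sketch)."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def run(self, key: str, job: Callable[[], str]) -> str:
        # A redelivered message with a known key gets the stored result
        # back instead of executing the side effect again.
        if key in self._results:
            return self._results[key]
        result = job()
        self._results[key] = result
        return result
```

Calling `run` twice with the same key executes the job once and returns the same result both times.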

03

Polaris

internal assistant

Internal assistant product combining BIFROST retrieval with guardrails, citations, streaming UX, suggestion revalidation, and operator analytics — one surface for support, sales, and product teams.


03 / open source

Deterministic utilities. Publicly shared.

Most AI frameworks rely entirely on prompts for behavior control. I publish Rust and TypeScript libraries that enforce determinism, tracing, and PII sanitization at compile time or at the platform level.

gommage / Rust

gommage

Policy-as-code permission harness for AI coding agents: deterministic rule files instead of prompt-only safety instructions.

github.com/Arakiss →
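
To show what deterministic rules buy over prompt-only instructions, here is a minimal sketch of first-match, default-deny rule evaluation. It does not reproduce gommage's rule format or API: the `Rule` fields, the glob matching, and the `decide` function are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    tool: str      # hypothetical: tool name the rule applies to
    pattern: str   # hypothetical: glob matched against the tool's argument
    decision: str  # "allow", "deny", or "ask"


def decide(rules: list[Rule], tool: str, argument: str) -> str:
    """First matching rule wins; anything unmatched is denied."""
    for rule in rules:
        if rule.tool == tool and fnmatchcase(argument, rule.pattern):
            return rule.decision
    return "deny"
```

The same input always yields the same decision, which is the property a prompt-level instruction cannot guarantee.
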
traceframe / Rust

traceframe

Local-first trace recorder and inspector for agentic workflows: append-only evidence, hook ingestion, and signed audits.

github.com/Arakiss →
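
One common way to make an append-only trace tamper-evident is a hash chain, where each record commits to the one before it. The sketch below shows that technique only. It is not traceframe's API or storage format, it does not cover signing, and `TraceLog`, `append`, and `verify` are hypothetical names.

```python
import hashlib
import json


class TraceLog:
    """Append-only list of events; each entry hashes the previous entry."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # Canonical JSON keeps the hash stable regardless of key order.
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry_hash = self._digest(prev_hash, event)
        self._entries.append({"event": event, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if self._digest(prev_hash, entry["event"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Editing any recorded event after the fact makes `verify()` return `False`, because the stored hash no longer matches the recomputed one.
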
vestig / TypeScript

vestig

Runtime-agnostic structured logging with context propagation, observability primitives, and automatic PII sanitization.

github.com/Arakiss →
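
Automatic PII sanitization in structured logs can be sketched as a recursive pass over the fields before serialization. This example is written in Python for consistency with the other sketches on this page and does not mirror vestig's TypeScript API: `sanitize`, `log_line`, the email pattern, and the `SENSITIVE_KEYS` set are all assumptions.

```python
import json
import re

# Assumed patterns and key names; a real sanitizer would cover far more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"password", "token", "iban"}


def sanitize(value, key=""):
    """Redact sensitive keys and mask email addresses, recursively."""
    if str(key).lower() in SENSITIVE_KEYS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: sanitize(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value


def log_line(message: str, **fields) -> str:
    """Serialize one structured log record with sanitized content."""
    record = {"msg": sanitize(message), **sanitize(fields)}
    return json.dumps(record, sort_keys=True)
```

Because sanitization runs inside `log_line`, callers cannot forget to apply it.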

04 / what ships

Everything except the slide deck.

To me, AI engineering means owning the full execution loop—not wiring up a thin API wrapper and calling it a platform. I build the retrieval index, the orchestration runtime, the evaluation harness, and the operator-facing interface.

Retrieval that survives mess

Chunking strategy, metadata, permissions, source quality, citations, caching, and the unglamorous work of saying “I don't know” when the index is wrong or the document is ambiguous.

Agent runtimes with brakes

Tool boundaries, stopping conditions, traces, handoffs, evaluators, queues, and cost controls — not infinite autonomy theater that breaks the first time an edge case hits production.
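
Stopping conditions and cost controls can be as plain as hard limits around the agent loop. The sketch below is one way to express that, under stated assumptions: `StepResult`, `run_with_brakes`, the step budget of 8, and the 0.50 USD cap are illustrative, not taken from any of the systems described here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    cost_usd: float  # hypothetical: cost of this step's model and tool calls
    done: bool       # True when the agent reports the task finished


def run_with_brakes(
    step: Callable[[int], StepResult],
    max_steps: int = 8,
    max_cost_usd: float = 0.50,
) -> str:
    """Run `step` until it finishes or a hard limit stops it."""
    spent = 0.0
    for i in range(max_steps):
        result = step(i)
        spent += result.cost_usd
        if result.done:
            return "completed"
        if spent >= max_cost_usd:
            return "stopped:cost_limit"
    return "stopped:step_limit"
```

The returned status string gives the operator a concrete reason for every stop, which is what makes a handoff to human review possible.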

Harness engineering

Repository context, executable plans, browser checks, CI guardrails, and review loops around Codex and Claude — so agent output stays legible inside a real codebase.

Product surfaces people trust

Next.js, TypeScript, streaming UX, operator screens, and the interface someone uses when the model misfires — because AI products fail in the UI long before they fail in the benchmark.


05 / operating model

How the work actually runs.

I operate with high autonomy and own production outcomes. Based in Madrid, I work remote-first across European time zones, targeting Staff and Principal roles in teams that treat language models as core product infrastructure—not as a marketing layer on top of brittle glue code.

Production stack

  • Python · FastAPI · TypeScript · Next.js
  • PostgreSQL · pgvector · Redis · Supabase
  • OpenAI · Anthropic · Vercel AI SDK
  • Evals · traces · guardrails · observability

Good fit when

  • Teams shipping AI inside regulated or operationally heavy environments
  • Leaders who need retrieval, runtime, and product in one senior engineer
  • Organizations past the demo phase and into reliability, cost, and permissions
runtime before hype · context is a product surface · evals before confidence · code must stay legible

06 / questions

Straight answers for search, recruiters, and models.


Quick facts, core stack, and operational fit compiled for recruiters, hiring managers, and search crawlers.


07

Let's talk about production AI.


I am looking for teams treating language models as product infrastructure — where retrieval quality, cost, latency, and safety are treated as software engineering problems.

Available for select conversations | Madrid / Remote across EU
© 2026 Petru Arakiss · Madrid · Staff / Principal AI engineering · Systems that hold up