Platform and Reliability
Full-time
Infrastructure Engineer
Remote-first, with working-hours overlap across Europe and the Americas
Build the runtime, deployment, and observability foundations that let omegaXiv run heavy research workloads without losing reliability.
You will shape the control plane for research execution, packaging, and public delivery, making the system fast to ship and boring to operate.
Apply or comment
Send a short intro, relevant work samples, and a concise explanation of the systems you have owned.
What you will own
- Own deployment, runtime isolation, secrets handling, and service-to-service reliability for the platform.
- Improve CI/CD, rollout safety, and environment parity across local, staging, and production setups.
- Build observability around long-running jobs, packaging flows, and operator-facing diagnostics.
- Partner with application engineers on performance, caching, and failure containment.
- Push the platform toward simpler operations with fewer manual interventions and clearer runbooks.
What we need
- Strong experience with cloud infrastructure, containers, and service operations.
- Comfort debugging distributed failures across app code, networks, and runtime environments.
- Experience with observability, alerting, and incident response for production systems.
- Ability to simplify deployments and reduce operational surface area instead of adding layers.
- Clear judgment around security boundaries, secrets, and least-privilege defaults.
First 90 days
- Remove one painful deployment or runtime bottleneck from the current platform.
- Add clearer operational telemetry for research runs and export packaging.
- Define a pragmatic reliability roadmap for the next growth stage of omegaXiv.
Stack and environment
- Cloud Run and containers
- CI/CD
- Secrets and identity
- Tracing and metrics
- Batch orchestration
Nice to have
- Experience running GPU or batch-heavy workloads.
- Familiarity with multi-tenant execution, artifact publishing, or sandboxed job systems.
- Background with Next.js, API backends, and developer productivity tooling.
- Experience supporting research or data platforms under rapid iteration.
Working style
How we operate
We value engineers who can reason from first principles, keep systems understandable, and make tradeoffs visible. The team is small, so ownership is real and surface area is broad.
If your best work is at the intersection of product urgency and infrastructure rigor, you will likely fit well here.