As Senior Manager, LLMOps & AI Integration Engineer (Individual Contributor), you will lead model integration, agent/tooling, and LLM operations strategy for our pip‑installable AI Capabilities SDK. You will design and maintain clean, extensible interfaces for LLMs, embeddings, retrieval, tool execution, and agentic workflows; build runtime guards and evaluation hooks; and optimize latency, reliability, and cost so that teams can embed AI safely and predictably.
WHY THIS ROLE MATTERS
LLMOps sits at the intersection of capability, safety, and operability. By turning complex model and agent behaviors into well-abstracted, evaluable, and observable SDK modules, you enable teams to ship AI features quickly and safely, reducing bespoke implementations, ensuring reproducibility, and controlling latency and cost at scale. Your work is foundational for the platform services that rely on the SDK as their backbone.
ROLE RESPONSIBILITIES
1) Model & Agent Integration Architecture
- Design Python SDK interfaces for LLM clients, embeddings, tokenization, and structured outputs (e.g., Pydantic/JSON schemas); a sketch follows this list.
- Implement function/tool calling abstractions, agent orchestration patterns (ReAct-style, planner/executor), and MCP adapters for interoperable tools.
- Provide configuration and secrets integration points that downstream services can adopt consistently (env vars, config schemas, key management hooks).
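To illustrate the kind of interface contract this area implies, here is a minimal sketch assuming Pydantic v2 is available; `LLMClient`, `Completion`, `TicketTriage`, and `complete_structured` are hypothetical names, not an existing API:

```python
from typing import Protocol

from pydantic import BaseModel


class Completion(BaseModel):
    """Normalized result shape returned by any provider backend."""
    text: str
    input_tokens: int
    output_tokens: int


class LLMClient(Protocol):
    """Backend-agnostic contract that every provider adapter implements."""

    def complete(self, prompt: str, *, max_tokens: int = 512) -> Completion: ...


class TicketTriage(BaseModel):
    """Example structured-output schema enforced on model responses."""
    severity: str
    summary: str


def complete_structured(client: LLMClient, prompt: str, schema: type[BaseModel]) -> BaseModel:
    """Request JSON matching `schema` and validate it before returning."""
    raw = client.complete(
        f"{prompt}\nRespond only with JSON matching this schema: {schema.model_json_schema()}"
    )
    return schema.model_validate_json(raw.text)
```

Validating structured outputs at the SDK boundary keeps downstream services from parsing free-form model text themselves.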
2) Retrieval & RAG Components
- Build RAG modules (document loaders, chunking/segmentation, embeddings, retrievers, rerankers) with pluggable backends (vector stores, search indices).
- Standardize connectors and interface contracts for common data sources; ensure reproducible pipelines and behavioral parity between the SDK and consuming services.
- Optimize retrieval quality via evaluation hooks (precision/recall, MRR, hit rate) and tunable parameters (top‑k, thresholds).
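As one concrete example of such an evaluation hook, a mean-reciprocal-rank check might look like the sketch below; the `Retriever` protocol and `mrr` helper are illustrative, assuming retrieval returns document IDs:

```python
from typing import Protocol, Sequence


class Retriever(Protocol):
    """Pluggable backend contract (vector store, search index, ...)."""

    def retrieve(self, query: str, *, top_k: int = 5) -> list[str]: ...


def mrr(retriever: Retriever, queries: Sequence[str], relevant_ids: Sequence[str], *, top_k: int = 5) -> float:
    """Mean reciprocal rank over (query, relevant document ID) pairs."""
    total = 0.0
    for query, doc_id in zip(queries, relevant_ids):
        hits = retriever.retrieve(query, top_k=top_k)
        rank = next((i + 1 for i, hit in enumerate(hits) if hit == doc_id), None)
        total += 1.0 / rank if rank else 0.0
    return total / max(len(queries), 1)
```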
3) Guardrails, Safety & Policy Integration
- Codify safety guardrails: prompt hygiene, jailbreak resistance, content filters, sensitive topic/policy checks, red‑team/adversarial probes.
- Integrate privacy/security controls: data minimization, PII detection flags, safe logging, and deterministic truncation strategies.
- Provide policy‑as‑code hooks and pre/post‑processing middleware to enforce runtime constraints within SDK flows.
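The middleware idea can be sketched as a pair of check chains around the model call; `GuardedPipeline` and `PolicyViolation` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Callable

# A check either returns (possibly redacted) text or raises PolicyViolation.
Check = Callable[[str], str]


class PolicyViolation(Exception):
    """Raised when a guardrail check blocks a prompt or response."""


@dataclass
class GuardedPipeline:
    """Runs policy checks before and after a model call."""
    pre_checks: list[Check] = field(default_factory=list)
    post_checks: list[Check] = field(default_factory=list)

    def run(self, prompt: str, call_model: Callable[[str], str]) -> str:
        for check in self.pre_checks:
            prompt = check(prompt)        # e.g. PII redaction, prompt hygiene
        response = call_model(prompt)
        for check in self.post_checks:
            response = check(response)    # e.g. content filter, policy check
        return response
```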
4) Performance, Reliability & Cost Engineering
- Implement caching, batching, retry/backoff, circuit breakers, and fallback strategies across model/tool calls (sketched after this list).
- Profile and tune latency, throughput, and token usage/cost; expose configuration knobs and budget guards (token and cost caps).
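A minimal sketch of two of these mechanisms, retry with exponential backoff and a token budget guard; `TransientError`, `TokenBudget`, and `BudgetExceeded` are illustrative names:

```python
import random
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits)."""


def with_retries(call: Callable[[], str], *, attempts: int = 3, base_delay: float = 0.5) -> str:
    """Retry a model/tool call with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise AssertionError("unreachable")


class BudgetExceeded(Exception):
    """Raised when a request exceeds its token cap."""


class TokenBudget:
    """Hard cap on tokens spent across a request's model calls."""

    def __init__(self, cap: int) -> None:
        self.cap, self.spent = cap, 0

    def charge(self, tokens: int) -> None:
        self.spent += tokens
        if self.spent > self.cap:
            raise BudgetExceeded(f"spent {self.spent} of {self.cap} tokens")
```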
5) Observability‑by‑Design for Libraries
- Embed telemetry hooks (structured logs, metrics, traces) suitable for a library, so that services consuming the SDK can attach enterprise observability (sketched below).
- Define SLIs relevant to LLMOps (latency, error rates, cache hit ratio, token usage, cost per call) and document runbook guidance for downstream teams.
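For a library, "telemetry hooks" typically means emitting to a null-handled logger and letting services register callbacks; a minimal sketch, with `ai_sdk` as a placeholder package name:

```python
import logging
from typing import Callable

# Library convention: emit log records but attach no real handlers;
# the consuming service decides where they go.
logger = logging.getLogger("ai_sdk")  # placeholder package name
logger.addHandler(logging.NullHandler())

MetricHook = Callable[[str, float, dict[str, str]], None]  # name, value, tags

_metric_hooks: list[MetricHook] = []


def register_metric_hook(hook: MetricHook) -> None:
    """Called once by a consuming service to route SDK metrics into its
    own observability stack (OpenTelemetry, StatsD, ...)."""
    _metric_hooks.append(hook)


def emit_metric(name: str, value: float, **tags: str) -> None:
    """Used at SDK call sites wherever an SLI is produced, e.g.
    emit_metric("llm.latency_ms", 182.0, provider="a", cache_hit="false")."""
    for hook in _metric_hooks:
        hook(name, value, tags)
```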
6) Evaluation & CI/CD Integration
- Partner with Evaluation & QA Engineers to build test harnesses (unit/integration/contract/fuzz/adversarial) and benchmark suites for agents/tools, RAG, and model endpoints.
- Work with DevOps Engineers to wire quality gates into CI/CD: coverage thresholds, performance budgets, safety checks, SBOM/signing for release artifacts.
- Provide fixtures and simulators to enable deterministic tests across backends (vector stores, providers).
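A deterministic backend simulator might look like the sketch below (assuming pytest; `InMemoryVectorStore` and its toy word-overlap scorer are illustrative):

```python
import pytest


class InMemoryVectorStore:
    """Deterministic stand-in for a vector-store backend in tests."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def retrieve(self, query: str, *, top_k: int = 5) -> list[str]:
        # Toy scorer: rank documents by word overlap with the query.
        words = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: -len(words & set(self.docs[d].lower().split())),
        )
        return ranked[:top_k]


@pytest.fixture
def vector_store() -> InMemoryVectorStore:
    store = InMemoryVectorStore()
    store.add("doc-1", "reset your password from the account page")
    return store


def test_retriever_contract(vector_store: InMemoryVectorStore) -> None:
    assert vector_store.retrieve("how do I reset my password", top_k=1) == ["doc-1"]
```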
7) Developer Experience & Enablement
- Collaborate with Developer Experience and Technical Writers to deliver quickstarts, sample apps, notebooks, and code recipes for common patterns (RAG, agents, tool use, structured outputs).
- Contribute to cookiecutters/CLI scaffolds that prewire configuration, tests, telemetry, and guardrails for new modules or integrations.
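A CLI scaffold along these lines could be as small as the sketch below; the directory layout and `DEFAULTS` contents are illustrative, not a prescribed structure:

```python
import argparse
from pathlib import Path

TEST_STUB = "def test_placeholder():\n    assert True\n"


def scaffold(name: str, root: Path) -> None:
    """Create a new SDK module directory with prewired config and test stubs."""
    module = root / name
    module.mkdir(parents=True, exist_ok=True)
    (module / "__init__.py").write_text("")
    (module / "config.py").write_text("DEFAULTS = {'top_k': 5, 'token_cap': 4096}\n")
    (module / f"test_{name}.py").write_text(TEST_STUB)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Scaffold a new SDK module.")
    parser.add_argument("name")
    args = parser.parse_args()
    scaffold(args.name, Path("src"))
```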
8) Collaboration & Continuous Improvement
- Close feedback loops with Embedded Architecture, Solution Design, AI Engineering, Creation Centers, and FITs to refine APIs, defaults, and ergonomics.
- Participate in post‑release retrospectives and incident reviews to strengthen guardrails, performance tuning, and developer experience.
MEASURES OF SUCCESS
- Performance & cost: Latency and token/cost budgets consistently met; cache hit ratios improved; fewer performance regressions.
- Safety & reliability: Guardrail tests pass; reduced policy violations and incident rates; predictable error handling.
- Adoption & DX: Faster time‑to‑first‑success for consuming teams; positive developer feedback on APIs and examples.
QUALIFICATIONS
Basic Qualifications
Preferred Qualifications
Work Location Assignment: Hybrid