[Remote] Senior AIOps Engineer, Incident Response [Remote-US]
Note: This is a remote position open to candidates in the USA. Quanata is an insurance technology innovation company focused on creating context-based insurance solutions. It is seeking a Senior AIOps Engineer to lead production health, incident response, and operational reliability, collaborating with engineering and AI orchestration teams to improve scalability and speed up issue resolution.
Responsibilities
- Own production health, reliability, and operational support processes across critical systems and services
- Lead incident response efforts, stakeholder communication, root cause analysis, and post-incident reviews
- Identify patterns in production issues and drive improvements to reduce recurring incidents and operational overhead
- Design and implement AI-driven agents and workflows that automate support and operational tasks
- Partner with engineering, product, and AI orchestration teams to improve system resilience and operational efficiency
- Build and maintain operational runbooks, documentation, and knowledge base content for both human and AI-assisted workflows
- Support observability, monitoring, and troubleshooting efforts across cloud-based production environments
- Participate in on-call rotations and continuously improve operational readiness and response processes
Skills
- 6–8 years of experience in production operations, site reliability engineering, technical support engineering, or similar operational roles
- Strong background in incident management, root cause analysis, and production system troubleshooting
- Experience working within modern SDLC, DevOps, and change management environments
- Familiarity with operational tooling such as Jira, Confluence, and observability/monitoring platforms
- Strong analytical and problem-solving skills with the ability to identify trends and drive operational improvements
- Comfortable working cross-functionally with engineering, product, operations, and leadership teams
- Strong communication skills and ability to operate effectively in fast-moving technical environments
- Bachelor's degree in Computer Science, Engineering, or equivalent relevant experience
- Experience building or working with AI/LLM-powered systems, intelligent agents, or workflow automation tools
- Familiarity with cloud platforms such as AWS and modern observability ecosystems
- Experience with event-driven architectures, orchestration frameworks, or operational automation platforms
- Background leading operational transformation or reliability improvement initiatives
- Passion for AI-native operations, automation, and improving developer/support experiences
Benefits
- Medical, dental, vision, life insurance and supplemental income plans for you and your dependents
- A Headspace app subscription
- Monthly wellness allowance
- A 401(k) Plan with a company match
- A one-time $2K payment toward home office equipment and furniture of your choice
- A MacBook Pro, delivered to you fully provisioned before your first day
- All employees accrue four weeks of PTO in their first year of employment
- New parents receive twelve weeks of fully paid parental leave which may be taken within one year after the birth and/or adoption of a child
- The twelve weeks apply to both birthing and non-birthing parents
- All employees receive up to $5,000 each year for professional learning, continuing education, and career development
- All team members also receive LinkedIn Learning subscriptions and access to a range of coaching opportunities through BetterUp
Company Overview