Director, Infrastructure & SRE at TailorCare

AI summary: Director leads infrastructure, site reliability, disaster recovery, and security operations while maintaining hands-on technical contributions in Terraform, incident management, and platform architecture.

Description

About the Role

The Director of Infrastructure & SRE owns the function end-to-end: reliability, security, scalability, and operational governance of TailorCare’s infrastructure, plus the team that delivers it. You will be a peer to the Director of Software Engineering, Director of Data Engineering, and Director of Data Science, own the Infrastructure & SRE scorecard in front of the executive team, and lead vendor escalations with Salesforce, AWS, and Cresta, among others, at the Director level.

This is a player-coach role. In year one you will spend roughly 60% of your time hands-on (writing Terraform, leading incidents, doing architecture work) and 40% building the team and the practice. As the team scales, that ratio shifts toward leadership, but you will never stop being technical.

This is not a slideware role. We are not hiring a manager who reviews architecture diagrams from a distance. We are hiring an operator who codes, runs incidents, owns the platform, and ships.

Primary Responsibilities

Infrastructure as Code

Converge all AWS resources to Terraform; eliminate manual provisioning
Establish reproducible environments (dev, staging, production) with proper isolation and parity
Standardize CI/CD pipelines across all engineering teams

Site Reliability

Define and operate SLOs, SLIs, and error budgets for all production systems (web/mobile applications, Salesforce, data processing, telephony stack)
Build observability (metrics, logs, traces, alerting) across AWS, Salesforce, telephony/omni-channel, and Cresta integrations
Stand up the infrastructure on-call rotation, incident management, and post-incident review discipline, including RCAs
Own uptime, MTTR, and incident-volume trends as published metrics

Disaster Recovery & Business Continuity

Design and implement a tested DR strategy with documented RPO/RTO commitments
Validate recovery procedures on a recurring cadence
Align DR posture with HITRUST and HIPAA expectations

Integration Reliability

Stabilize Salesforce, telephony/omni-channel, and Cresta integrations; close persistent gaps in skills-based routing, warm transfers, and telephony data parity
Partner with Data Engineering on the reliability of data ingest paths (Fivetran, SFTP, S3) and Salesforce bulk API flows

Security & Compliance Engineering

Translate Security & Compliance policy into enforced infrastructure controls: IAM, encryption (at rest and in transit), network segmentation, secrets management, audit logging
Partner with Security & Compliance on HITRUST evidence, audit readiness, and remediation
Own vulnerability management across cloud and application layers

Email & Domain Infrastructure

Fix DNS, SPF, DKIM, DMARC, and IP reputation to resolve spam-folder deliverability impacting patient and operational communications
Own all TailorCare domain and email infrastructure

Developer Experience

Build and maintain test, staging, and ephemeral environments engineers actually use
Reduce cycle time and remove infrastructure friction from the SDLC
Establish self-service tooling so engineers ship without filing tickets

Team & Function Leadership

Hire, level, develop, and retain the Infrastructure & SRE team
Own the function’s MBR contribution: scorecard, risks, decisions needed
Partner with Engineering, Data, Product, and Security & Compliance leadership as a peer

Other duties as assigned

Qualifications

10+ years in Infrastructure Engineering, SRE, or DevOps, with 3+ years in a senior IC or tech lead role and 2+ years directly managing engineers
Recent hands-on technical work (within the last 12 to 18 months) in Terraform, AWS, and production incident response
Track record of hiring, leveling, and developing infrastructure or SRE engineers
Deep AWS expertise (VPC, IAM, ECS/EKS, Lambda, RDS, DynamoDB, S3, API Gateway, WAF, Connect)
Production Terraform experience at scale (modules, state management, multi-environment)
Hands-on with observability stacks (CloudWatch, Datadog, Grafana, or equivalents)
Demonstrated experience standing up SRE practices: SLOs, on-call, incident management, blameless postmortems
Experience operating in a HIPAA or comparably regulated environment (PCI, SOC 2 Type II, HITRUST, FedRAMP)
CI/CD pipeline design (GitHub Actions, GitLab CI, or equivalent)
Ability and willingness to travel up to 10% as needed for onsite meetings, team collaboration, and company events

Preferred Qualifications

Salesforce platform integration and operational experience
Amazon Connect or comparable contact center telephony platforms
Data platforms (Databricks, Snowflake, Fivetran)
HITRUST certification participation (e1 or r2)
AI/LLM-assisted operations tooling
Experience scaling an infrastructure function in a healthcare or other regulated growth-stage company

Who You Are

You own outcomes. When something breaks, you fix it and improve the system so it does not happen again.
You write code and ship infrastructure. You lead by doing, not by delegating.
You surface risks early. Bad news early is manageable; bad news late is expensive.
You build for clarity and simplicity. You distrust complexity that does not earn its keep.
You bring calm to incidents and discipline to operations.
You grow engineers. You hire well, develop your team, and create the kind of operating environment where senior people want to work.
You communicate with executives the way they want to be communicated with: concise, structured, honest, low-drama.

What You Will Deliver in Year One

This role is explicitly hands-on. In year one:
You will personally write production Terraform and review infrastructure pull requests
You will influence product and engineering roadmaps to meet the operational standards expected by the organization and our clients
You will participate in the infrastructure on-call rotation while it is being built
You will lead incidents until the team and process are mature enough to do so without you
You will pair directly with engineers on critical migrations

What’s In It For You

Meaningful Work: We are dedicated to our mission and deeply value our patients and each other. Each day offers the opportunity to make a positive impact.
Work Environment: We operate as a remote-first company with options for a hybrid work model in Nashville.
Time Off: Our generous paid time off (PTO) and holiday plans ensure you have ample time to rest and recharge.
Family First: We offer paid parental leave and support a healthy work-life balance, encouraging flexibility and autonomy. We love talking about our family and pets!
Comprehensive Benefits: From Day 1, employees enjoy medical, dental, vision, life, and disability insurance, wellness resources, and an employer HSA contribution.
Fair Compensation: We are committed to equitable pay for all team members and support your future goals with a 401(k) plan that includes employer matching.
Community: We foster an inclusive environment where you can rely on your teammates, share honest feedback, and feel comfortable being your authentic self at work each day.

TailorCare seeks to recruit and retain staff from diverse backgrounds and encourages qualified candidates to apply. TailorCare is an equal opportunity employer and does not discriminate on the basis of age, sex, gender identity/expression, sexual orientation, color, race, creed, national origin, ancestry, religion, marital status, political belief, physical or mental disability, pregnancy, military, or veteran status.
