Lead DevOps Engineer
ABOUT THE ROLE
An enterprise organization is building a modern payments infrastructure. The team runs lean and moves quickly, and engineers own outcomes end-to-end. As a Lead DevOps Engineer, you’ll be responsible for keeping the platform resilient, scalable, and secure while enabling developers to ship features quickly and safely. You’ll be hands-on with AWS, automation, observability, and multi-tenant architecture.
KEY RESPONSIBILITIES
- Cloud Infrastructure
  - Design, build, and operate AWS environments across multiple accounts and regions (Aurora, ECS/Fargate, Lambda, DynamoDB, SQS/SNS, CloudFront, Route 53, API Gateway).
  - Balance resilience, cost optimization, and delivery speed in a fast-moving environment.
- Automation & Infrastructure as Code
  - Own infrastructure as code (Terraform, CDK, or CloudFormation).
  - Automate provisioning, scaling, patching, and configuration management.
- CI/CD & Developer Experience
  - Build and maintain CI/CD pipelines (GitHub Actions, Jenkins, AWS CodePipeline/CodeBuild).
  - Ensure developers can release quickly with built-in guardrails for quality and security.
- Observability & Reliability
  - Implement centralized logging, tracing, and metrics (CloudWatch, OpenTelemetry, ELK, Dynatrace/Splunk).
  - Proactively monitor and tune performance; set SLOs and alerts to catch issues early.
- Networking & Security
  - Design and secure VPC architectures (IGWs, NATs, Transit Gateway, private/public subnets).
  - Partner with InfoSec to harden infrastructure, enforce IAM least privilege, and ensure PCI compliance.
- High Availability & Disaster Recovery
  - Architect multi-region and active/active patterns for critical services (Aurora Global, S3 replication, Route 53 failover).
  - Define and automate backup, DR, and failover strategies.
- Collaboration
  - Work closely with developers, product engineers, and program managers in an agile pod model.
  - Provide technical mentorship and influence DevOps best practices across the company.
- Documentation & Standards
  - Maintain documentation for infrastructure, runbooks, and incident response.
  - Promote consistency and repeatability across all environments.
QUALIFICATIONS
- 5–8+ years of hands-on DevOps/SRE experience in AWS-heavy environments.
- Strong knowledge of AWS services: ECS/Fargate, Lambda, Aurora/RDS, DynamoDB, SQS/SNS, API Gateway, CloudFront.
- IaC expertise with Terraform (preferred) or CloudFormation/CDK.
- Solid background in CI/CD (GitHub Actions, Jenkins, CodePipeline/CodeBuild).
- Networking knowledge: VPC design, security groups, peering, Transit Gateway.
- Experience with observability tools (CloudWatch, ELK, OpenTelemetry, Dynatrace, Splunk, or similar).
- Proven ability to deliver secure, compliant environments (PCI/SOC2/HIPAA experience a plus).
NICE TO HAVE
- Aurora Global Database and multi-region active/active experience.
- Event-driven architectures (Kinesis, EventBridge).
- Data pipelines (Glue, Redshift, Snowflake).
- Kubernetes/EKS or other container orchestration.