The ideal candidate has experience running automated production infrastructure in the cloud, using platforms and tools such as Azure, Kubernetes, Terraform, Spinnaker, and Packer.
The position offers opportunities for building and designing a modern, automated platform in the cloud, spanning multiple regions around the globe.
This is a high-visibility role where the candidate will work across multiple teams to shape a common infrastructure for running machine-learning solutions.
You must have practical experience working with AWS and Kubernetes.
Responsibilities
Continuously improve the infrastructure for cloud-based services and client interfaces
Collaborate with team leads and management across the company to define shared capabilities
Manage the day-to-day operations of our build, testing, and continuous integration environment
Support an effective developer workflow including build, test automation, and deployment
Apply best practices in IT operations to maintain an always-up, always-available service
Proactively communicate project & task status to project stakeholders
Bring strong systems administration skills together with a background in and understanding of software development
Provide occasional on-call support which may include irregular hours as needed
Qualifications
3-5 years of experience in provisioning, operating, and managing Azure/AWS environments
Must have experience designing and deploying scalable infrastructure using Kubernetes or similar
Experience with development operations, including continuous integration, automated testing, and automation of the development process
Experience building out continuous integration/continuous delivery pipelines and overall development operations
Experience with automation/configuration management tools such as Terraform and Ansible
Previous containerization experience with Docker or similar technology
Strong background in Linux/Unix Administration
Strong skills in at least one scripting language (Shell, Bash, Python)
Proven experience managing multiple projects and competing priorities in a fast-paced work environment
Familiarity with monitoring, metrics collection, and reporting using open source tools is a plus
Strong written and verbal communication skills
Proven ability to work across multiple product teams and deliver solutions on tight deadlines
Familiarity with cloud networking and traffic management (VPCs, load balancers, network segregation) is a plus