Senior/Lead AI DevOps/SRE Latvia or Remote
Description
As a key team member, you will lead both traditional DevOps work and new approaches to MLOps. Your proactive involvement will be essential to the success of our AI initiatives and to extending their impact across the organization.
What You’ll Do
- Implement and maintain CI/CD pipelines for AI and machine learning projects, ensuring robust deployment strategies and continuous integration
- Monitor and ensure the reliability, availability, and performance of AI applications, particularly those built on large language models (LLMs) and retrieval-augmented generation (RAG)
- Collaborate with AI research teams to operationalize machine learning models and systems efficiently
- Develop and enforce best practices for version control, configuration management, and testing of AI-driven software solutions
- Utilize MLOps tools such as Kubeflow, MLflow, or TensorFlow Extended (TFX) to streamline the machine learning lifecycle from experimentation to production
- Implement monitoring solutions that track both system metrics and model performance to facilitate proactive issue resolution
- Participate in on-call rotations to support the operational health of critical systems, employing SRE principles to meet service-level objectives (SLOs) and reduce downtime
What You Have
- Bachelor’s degree in Computer Science, Engineering, or a related field
- Proven experience as a DevOps Engineer or SRE, with a strong background in software development and automation
- Expertise in deploying and managing LLMs and related techniques such as RAG
- Proficiency in CI/CD tools (Jenkins, GitLab CI, CircleCI) and infrastructure-as-code tools (Terraform, Ansible)
- Solid knowledge of containerization and container orchestration technologies (Docker, Kubernetes)
- Familiarity with MLOps tools and practices to support machine learning lifecycle management
Nice to Have
- Experience with cloud services (AWS, GCP, Azure), particularly in AI/ML deployments
- Experience with monitoring and logging tools such as Prometheus, Grafana, and the ELK stack
- Understanding of Python, particularly in data science and machine learning contexts
- Certification in Kubernetes, AWS/GCP/Azure, or similar technologies
We Offer
- Engineering Heritage: Best-in-class experts sharing a culture of excellence and tackling complex engineering challenges for over 30 years
- Advanced Tech Stack: Innovative projects where you can apply or enhance your expertise in Cloud, Data, AI and other emerging technologies
- World-Class Clients: Work closely with 295+ of the Forbes Global 2000 companies on creating disruptive solutions that make a global impact
- Professional Growth: Exceptional support for career development with comprehensive in-house resources for upskilling or reskilling
- GenAI-X Global Community: A network of AI enthusiasts within EPAM, dedicated to exploring and sharing AI innovations
- Entrepreneurial Culture: If you're passionate about and dedicated to business transformation, we provide the support you need to bring your ideas to life
- Hybrid Setup: The flexibility to work from any location in Latvia, whether it's your home or our office in Riga
- Other Benefits: Salary of 3400-5900 EUR gross (based on interview results), additional vacation and trust days, private health insurance, Employee Stock Purchase Plan and more
About EPAM
- EPAM is a leading global provider of digital platform engineering and development services. For over 30 years, our team has helped leading brands navigate the waves of digital transformation, building solutions that help them stay competitive through constant market disruption
- With offices in 55+ countries, EPAM has grown in Latvia to 130+ talented innovators in 2 years. We foster creativity and unconventional ways of doing things, welcoming like-minded professionals to join us