Senior DevOps Engineer

About the job

As a DevOps engineer at our AI product company, you will define and create the platform for deploying, managing, and optimizing our distributed systems across on-premises infrastructure, multiple cloud environments (AWS, Azure, Google Cloud), and Kubernetes.

Our system leverages multiple LLMs and graph and vector databases, and integrates data from multiple sources to power our AI solutions. You will ensure our infrastructure is robust, scalable, and secure, supporting the seamless delivery of our innovative products. This role combines expertise in cloud technologies and database management with the challenges of integrating AI and machine learning workflows on modern GPUs.

Requirements

You have

  • Excellent problem-solving and technical skills
  • Experience in building platforms with custom deployment models
  • A structured working approach and the ability to work incrementally toward long-term goals
  • Ability to document and explain technical details clearly, with solid communication and collaboration skills
  • Interest and experience in working on early-stage software and solving various problems

Key Responsibilities

  • Deploy, manage, and scale cloud infrastructure, meeting the required SLAs
  • Manage graph and vector databases for optimal performance and reliability
  • Maintain and operate platform observability
  • Ensure system security by implementing best practices and complying with data protection laws
  • Provide technical support, troubleshoot complex issues, and ensure uninterrupted service
  • Document system configurations and procedures and generate performance reports
  • Manage infrastructure costs

Requirements

  • M.Sc. or Ph.D. degree in Computer Science or a related field (preferred)
  • At least 7 years of experience deploying and managing cloud infrastructure (AWS, Azure, Google Cloud) 
  • At least 3 years of experience working with Kubernetes environments
  • Proficiency in managing and scaling Kubernetes clusters, including monitoring, troubleshooting, and ensuring high availability
  • Experience with cloud-native technologies, CI/CD pipelines, and containerization tools (e.g., Docker)
  • Familiarity with data integration and management from multiple sources in a distributed system environment
  • Proficiency in at least one programming language (Python, Java, Go), and experience with scripting for automation
  • Strong understanding of network infrastructure and security principles, ensuring compliance with data protection regulations

A Bonus:

  • Proficiency in database management, specifically with Neo4j and vector databases, including setup, scaling, and optimization for performance and reliability
  • Experience deploying and running machine learning solutions, including LLMs

Benefits

  • Competitive salary
  • Opportunities to work on groundbreaking NLP & AI-related projects
  • Remote working (your location must be within 4-5 hours of the CET time zone)