Infrastructure Engineer

About Us:

At Apolo, we’re committed to simplifying AI/ML operations for organizations. By addressing the deployment challenges of AI/ML in varied environments, we provide cost-effective and hassle-free solutions. Our managed services and comprehensive tools allow businesses to focus on their core objectives, ensuring seamless AI integration and innovation without the operational complexity.

The Role:

We are looking for an Infrastructure Engineer who will be crucial in managing our product infrastructure. This role requires technical expertise, leadership qualities, and a proactive mindset to ensure our systems are secure, efficient, and in line with our product goals. Ideal candidates are resourceful, excel at problem-solving, and can work autonomously with minimal supervision.

Requirements:

● Extensive knowledge and hands-on experience with Kubernetes, including overall cluster administration.

● Proficiency with cloud service providers (AWS, GCP, Azure).

● Experience in managing bare metal infrastructure.

● Proficiency in Terraform for infrastructure automation.

● Expertise in Helm for package management.

● Strong foundation in Linux system administration, with skills in performance tuning, troubleshooting, and understanding operating system internals.

● Solid networking knowledge, including TCP/IP, DNS, load balancing, and firewall configurations, to ensure secure and efficient network operations.

● Expertise in container engines such as containerd and Docker, with practical experience in configuring, managing, and optimizing containerized environments.

● Proficiency in CI/CD practices, particularly with GitHub Actions.

Responsibilities:

● Oversee infrastructure across cloud, on-premise, and bare metal environments.

● Manage resources in multiple cloud service providers (AWS, GCP, Azure).

● Enhance observability across all environments.

● Implement and integrate solutions that align with our product goals.

● Streamline provisioning pipelines, focusing on the automation of manual processes.

● Apply Infrastructure as Code (IaC) principles using tools like Terraform and Helm.

● Facilitate certification processes and maintain compliance with industry standards.

● Implement robust security hardening practices.

Desirable Skills:

● Experience with CNI, Ingress Controllers, Service Meshes, and Gateways.

● Experience with CSI, NAS, NFS and other related storage technologies.

● Experience with Prometheus/Thanos, Grafana, and related tools.

● Proficiency in Python for scripting and automation.

Benefits

What We Offer:

● Work remotely, with time zones aligned for effective collaboration.

● Shape the product’s direction and success by taking ownership of essential components.

● Solve complex and innovative challenges.

● Join a supportive and dynamic team environment.

● Receive a competitive salary and benefits package.