Site Reliability Engineer (EU timezones)
TileDB is looking for a Site Reliability Engineer to join our dedicated infrastructure team and help us develop, maintain, and administer our TileDB Cloud product. Your key responsibilities will include managing, configuring, and deploying infrastructure components such as Kubernetes clusters, MariaDB, and Cloudflare Workers, as well as developing tools to automate them. You will help expand TileDB Cloud to other cloud providers such as Google Cloud, Microsoft Azure, and Alibaba Cloud, and improve our existing multi-region setup. You will also assist with establishing deployments of TileDB Cloud Enterprise edition for on-premises use.
Company
We build the TileDB Cloud Database and an interlocking set of vertical solutions, built on the TileDB Arrays system, for genomics, geospatial, imaging, and other applications. TileDB was founded in 2017, after several years of implementation at Intel Labs, to solve use cases at the Broad Institute and MIT. TileDB has raised over $50 million in funding from high-profile investors, with our most recent funding round ($36 million Series B) announced in Oct. 2023.
We are a fully-remote, distributed team with employees in the USA, Europe, and South America. Our core business hours for meetings with team overlap are 9 AM-12 PM US Eastern Time. Our headquarters are in Cambridge, MA, USA, and we have a subsidiary in Athens, Greece.
Expectations
In your first 30 days, you will familiarize yourself with TileDB, TileDB Cloud, and our Kubernetes infrastructure. After 30 days, you will be fully integrated into our team. You’ll be an active contributor to and maintainer of the TileDB Cloud infrastructure, and ready to start adding new functionality.
Note: this role is expected to cover EU timezone working hours.
How You Will Contribute
- Designing and building a new distributed batch task graph feature
- Optimizing and iterating on the horizontal scaling solution of our task infrastructure
- Creating self-service and customer-driven usability improvements (global search API, performance improvements to access control)
- Making our product multi-cloud (GCP, Azure)
- Participating in on-call rotations
Our Interview Process (~1 week)
- 45 min call covering screening questions, a resume walk-through, and time set aside for questions about the role and team
- ~1 hour technical assessment using CoderByte, containing a TileDB-specific exercise with some open-ended discussion questions
- Note: this step is skipped if you can provide demonstrable open source contributions or example work
- 45 min call with the Cloud team’s Engineering Manager
- 45 min call with our CTO/CEO
- Offer
Requirements
Prerequisites
- In-depth experience using Kubernetes for production service deployments
- Obsessed with infrastructure as code (we use Terraform)
- Experience with performance monitoring tools (Prometheus, CloudWatch, etc.)
- Experience debugging performance-critical applications
- Software engineering experience
- An automate-everything mindset
- Willing to handle on-call responsibilities as part of a team
Bonus points
- In-depth knowledge of Linux, networking, and well-known protocols (DNS, HTTP, TCP, etc.)
- Experience running and managing databases
Benefits
- Fully-remote: work where you are most productive
- Note: this role is expected to cover EU timezone working hours
- Stock options
- Private medical insurance (MetLife)
- Flexible hours – we do our best to allow schedules that fit everyone’s needs
- Generous training budget – we love Ardan Labs’ Ultimate Go!