Senior Site Reliability Engineer at Ritual
Toronto, CA

The Engineering team at Ritual is making it possible to launch our product in new markets globally. When they’re not solving complex problems and building scalable, reliable services, they’re hosting Super Smash Bros tournaments and working on side projects that turn into major features. We’re primarily a Java shop hosted on Google Cloud, and we’re transitioning from a monolithic codebase on App Engine to microservices running on Docker/Kubernetes and communicating over gRPC. Our data layer is in MySQL, Redis, and BigQuery, and our customers interact with our application through native iOS and Android apps, as well as a web (React) interface. It’s like… pretty cool.

The Senior Site Reliability Engineer is a very hands-on role. You will write code, build tools and dashboards, and monitor and report on metrics. As the team grows, the role will focus on developing tooling, processes, and methodology that empower the entire engineering org to move faster and more safely. You will also be a key evangelist for maintaining a high degree of reliability across the platform.

As a Senior Site Reliability Engineer at Ritual you will…

  • Manage production deployments and improve deployment stability and global availability.
  • Design and implement observability infrastructure for production systems.
  • Implement and monitor production metrics and dashboards.
  • Advise on and implement production network and security rules and best practices.
  • Deliver purpose-built solutions to meet clear milestones.
  • Drive reliability excellence by building tooling to solve reliability challenges, designing efficient and standardized processes, automating relentlessly, engineering reliability back into applications, and maximizing performance.
  • Participate in on-call rotations and provide input to your team and partners to sustain SLAs.
  • Architect, review, develop and deliver applications to improve availability, scalability, performance and efficiency of our services.

We’re looking for a Senior Site Reliability Engineer who has…

  • Experience with cloud platforms
  • Knowledge of containers and orchestration
  • Extensive experience in DevOps, SRE, or other production operations roles
  • Strong cloud, network, and systems architecture design skills
  • Expertise in database server design
  • Practical experience with incident management and response
  • Experience with CI/CD pipelines and tools (e.g., Jenkins)
  • Advanced expertise in at least one programming language (Python, Go, or Java preferred); polyglots are a plus

Nice to haves:

  • Experience building microservices.
  • Kubernetes (Istio, Ambassador, GKE)
  • Metrics, monitoring, alerting, and dashboard tools (Fluentd, Telegraf, InfluxDB, ELK, Grafana, Datadog, Stackdriver, PagerDuty, etc.)
  • Experience with IaC and SCM tools (Terraform, Ansible, Packer)

What we offer:

  • Contribute to a lifestyle-changing consumer product used worldwide
  • Ritual credits to spend on lunch and coffee at local spots
  • Healthcare coverage from day one, and a flexible vacation policy (Need time? Take time)
  • Equity in the business - ownership is a key value at Ritual and we want you to share in our long-term success
  • The good stuff to help you grow in your career with us: performance reviews, product reviews with the founders
  • The tools you need to get the job done, like that really specific keyboard set up that lets you code faster than Usain Bolt runs