Site Reliability Engineer

Sumo Logic

Region

N/A
  • The role can be fully remote, from anywhere in Poland

We are a cloud-native SaaS machine data analytics platform that solves complex monitoring problems for DevOps, SecOps and ITOps teams. Our customers, including Airbnb, Twitter, BBC and Toyota, choose our solution because it allows them to easily monitor and optimise their large-scale applications.

Our microservices architecture on AWS ingests hundreds of terabytes of data daily across many geographic regions. We also have short release cycles and no legacy versions to maintain. We write in Scala and use open-source technologies such as Kafka, Kubernetes and Cassandra.

As a Site Reliability Engineer, you will work to enhance the reliability of the Sumo Logic product. Our customers rely not only on the product’s rich feature set but also on its being always available: often it’s their primary tool for maintaining their own software.

The SRE team is unique within Sumo Logic: it doesn’t own any product service, so you will work across the whole codebase of the Sumo Logic product. You will identify the weakest links in either reliability or performance, research and benchmark possible improvements, and implement solutions in cooperation with the owning teams. Your focus will not be narrow: there is a broad spectrum of topics and projects you might get involved in. You will not operate the software; instead, you will create tools that help other teams increase the visibility, observability, and scalability of Sumo Logic services.

As a Senior Site Reliability Engineer you will:

Work with software that processes data at massive scale

Identify areas for reliability improvement based on evidence from past production incidents

Program in Scala

Research, benchmark, optimise and implement solutions aimed at improving the performance and reliability of our product

Work with other teams in Sumo Logic Engineering to increase the observability of their services, share reliability knowledge, automate toil, improve their tooling and replace manual processes

Example projects:

SLI (Service Level Indicator) monitoring

Performance measurement and visualisation tooling (perf, eBPF, flame graphs)

Configuration as Code

Optimising usage of cloud services (AWS)

You have:

At least a BSc in Computer Science or a related field

6+ (Senior) / 9+ (Staff) years of professional experience

Good coding skills in any language; object-oriented languages are preferred

Strong troubleshooting skills in complex systems

Ability to rapidly learn new software, frameworks, open source tools and development languages

Strong knowledge of large-scale internet service architecture (e.g. load balancing)

Strong understanding of Unix and TCP/IP fundamentals

Ideally you also have:

Experience with performance, scalability, and reliability issues of 24x7 commercial services

Proficiency with the AWS ecosystem

A self-driven, proactive attitude

Experience configuring and maintaining common infrastructure such as Apache ZooKeeper and HAProxy

Experience working in a test-driven environment

Why it’s worth applying:

Great salary on an employment contract (65% authorship costs).

Strong engineering teams.

Stock (RSU) grant.

$2,000/year education budget plus 2 extra days off.

4 extra days off in 2022 (Sumo Wellness Days).

Hack weeks and tech talks.

Private healthcare for you and your family.

Medical and life insurance. 

Sports card.

WFH budget. 

Lunch budget.

Individual English lessons with a native speaker.

You can work from the office, 100% remotely or in a hybrid model. 

Salary and compensation

No salary data was published by the company, so we estimated the salary based on similar jobs related to Amazon, Education, Cloud, Scala, Senior, SaaS, Engineer and Apache:

$70,000 to $120,000/year

Location

Wrocław, Lower Silesian Voivodeship, Poland

Company

Sumo Logic