Our client is a high-load video conferencing platform. Earlier this year they reached a million minutes of video conferencing going through their platform every hour. That's video and audio being decoded, mixed and encoded in real time on more than 4,000 virtual machines all around the world.
Currently, they are looking for a talented Site Reliability Engineer to join their engineering team and participate in building great software, increasing manageability of the platform and automating anything and everything.
Must-have skills:
8+ years of combined experience with both software development and system administration/operations.
Proficient in at least one of the following languages: Go, Python, C or C++.
Experience managing applications running on private, public or hybrid cloud platforms.
Deep understanding of the Linux operating system.
Willingness to participate in 24x7 on-call duty within the team for critical services and escalation workflows.
Good-to-have skills:
Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
Experience with Docker, Kubernetes, Terraform, Ansible or equivalent technologies.
Understanding of standard networking protocols and components.
Ability to debug, optimize code, and automate routine tasks.
Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
Higher degree in Computer Science or a related field.
This job comes with several perks and benefits.