Site Reliability Engineer

Gauss Labs (가우스랩스)
💰 Hiring bonus: KRW 500,000
  • 🚆 Near a Subway Line 2 station

Position Details

Tech Stack
GCP
AWS
Prometheus
Python
Azure
Grafana
Key Responsibilities
• Monitoring and Alerting: Creating and maintaining robust monitoring systems to proactively identify and resolve issues before they impact customers. Implementing effective alerting mechanisms to ensure timely response to critical events.
• Incident Response: Participating in on-call rotations and leading incident response efforts to minimize downtime and restore service quickly.
• Automation: Developing and implementing automation tools and scripts to streamline operations, reduce manual effort, and improve efficiency.
• Capacity Planning: Forecasting resource needs, optimizing resource utilization, and ensuring the customers’ infrastructure can handle increasing workloads.
• Performance Optimization: Identifying and resolving performance bottlenecks, optimizing system performance, and improving response times.
• Collaboration: Partnering with software engineers, data scientists, and other teams to ensure alignment and efficient operations.
• Customer Focus: Working closely with the AI program manager and Technical Account Manager to understand customer issues, provide technical support, and improve customer satisfaction.
• Continuous Improvement: Driving a culture of continuous improvement by identifying opportunities to enhance system reliability, performance, and efficiency.
Requirements
• Bachelor's degree in computer science, engineering, or a related discipline.
• 5+ years of industry experience as a Site Reliability Engineer.
• Experience with cloud platforms (e.g., AWS, GCP, Azure).
• Experience with scripting languages (e.g., Python).
• Experience with monitoring and alerting tools (e.g., Prometheus, Grafana).
• Experience in ticket management, issue resolution, and troubleshooting.
• Strong problem-solving and troubleshooting skills.
• Ability to work independently and as part of a team.
• Excellent customer communication and interpersonal skills.
Preferred Qualifications
• Knowledge of containerization technologies (Docker, Kubernetes).
• Knowledge of AI/ML infrastructure and workloads.
• Knowledge of big data technologies (Hadoop, Spark).
• Fluency in verbal and written English.
Benefits and Perks
• Annual medical check-up
• Group accident insurance
• Gym support
• Overtime support
• Documentation support (Grammarly)
• Self-development allowance
• Growth support
Hiring Process and Application Notes
Application Review > Phone Interview > (Virtual) Onsite Interview > CEO Interview and Core Values Interview > Offer

Experience, Education, Deadline, and Work Location

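Experience
5 to 10 years of experience
Education
Bachelor's degree (4-year university) or higher
Application Deadline
2025-02-08
Work Location
  • 7F, Aju Building, 201 Teheran-ro, Gangnam-gu, Seoul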

Company and Service Introduction

Gauss Labs: Site Reliability Engineer
Gauss Labs is seeking a highly skilled Site Reliability Engineer to join our team. As an SRE at Gauss Labs, you will play a critical role in ensuring our industrial AI platform's reliability, performance, and scalability. You will be responsible for building and maintaining a robust solution that supports our growing business at the customer site.