SRE Metrics Analyst Intern improving system reliability through data collection and analysis. Engaging with engineering teams to shape metrics strategies for operational excellence.
Responsibilities
Design and implement a comprehensive metrics collection framework that captures key performance indicators (KPIs) related to system reliability and operational efficiency.
Identify relevant metrics and establish methods for collecting, aggregating, and storing data from various sources, including monitoring tools, logs, and databases.
Analyze collected metrics to identify trends, patterns, and anomalies that impact system reliability and performance.
Develop dashboards and visualizations to present data in a clear and actionable manner using tools such as Grafana, Kibana, or Tableau.
Create regular reports on system performance, reliability, incident response times, and other critical metrics for various stakeholders, including technical teams and management.
Provide insights and recommendations based on data analysis to drive continuous improvement initiatives.
Work closely with SRE teams to identify their metric needs and ensure alignment with operational goals.
Collaborate with engineering and operations teams to ensure that metric collection is integrated into development and deployment processes.
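To make the analysis and reporting duties above more concrete, here is a minimal Python sketch of two common calculations: mean time to resolve (MTTR) from incident records, and a simple z-score check that flags anomalous values in a metric series. The `Incident` record shape, the function names, and the z-score method are illustrative assumptions, not tooling specified by this role; real data would come from an incident tracker or a monitoring backend.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, pstdev


@dataclass
class Incident:
    # Hypothetical record shape; real records would come from an incident tracker.
    opened: datetime
    resolved: datetime


def mean_time_to_resolve(incidents: list[Incident]) -> timedelta:
    """Average time from an incident being opened to being resolved (MTTR)."""
    if not incidents:
        raise ValueError("no incidents to summarize")
    seconds = mean((i.resolved - i.opened).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


def flag_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the series
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical data: two incidents and a week of daily error counts.
    incidents = [
        Incident(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),
        Incident(datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 30)),
    ]
    print("MTTR:", mean_time_to_resolve(incidents))
    print("Anomalous days:", flag_anomalies([100, 102, 98, 101, 99, 100, 500], threshold=2.0))
```

A z-score is only one possible choice: on short series the largest achievable population z-score is bounded by sqrt(n - 1), so a threshold of 3.0 needs at least 11 points before it can flag anything, which is why the usage above lowers it for seven days of data.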
Requirements
Currently enrolled in a degree program in a related major, with a GPA of 3.0 or better
US citizenship required
Ability to obtain and maintain a DoD security clearance
Experience in metrics collection, data analysis, or reporting, preferably in a Site Reliability Engineering or DevOps environment.
Proven experience working with monitoring and observability tools (e.g., Prometheus, Datadog, New Relic).
Strong understanding of key concepts used in Site Reliability Engineering, including service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs).
Proficiency in data analysis tools and languages (e.g., SQL, Python, R) for data manipulation and reporting.
Experience with data visualization tools (e.g., Grafana, Kibana, Tableau) to create dashboards and reports.
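As an illustration of how SLIs and SLOs relate, the following minimal Python sketch computes a request-based availability SLI (good events divided by total events) and the fraction of the error budget remaining, where the error budget is 1 minus the SLO target. The function names, the request counts, and the 99.9% target are hypothetical examples. SLAs, which are contractual commitments and typically looser than internal SLOs, are not modeled here.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI as the ratio of good events to total events (e.g. non-5xx requests)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, slo: float) -> float:
    """Fraction of the error budget left in the window; negative means the SLO is breached."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_bad = (1.0 - slo) * total_events  # error budget as a count of bad events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical 30-day window: 1,000,000 requests, 999,500 good, 99.9% SLO target.
    print("SLI:", availability_sli(999_500, 1_000_000))
    print("Budget left:", error_budget_remaining(999_500, 1_000_000, slo=0.999))
```

With these numbers the SLI is 99.95%, so the service has spent roughly half of its 1,000-request error budget for the window.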