Tag: site reliability engineering
Harnessing AI for Automated and Toil-Free SRE
AI not only reduces toil but also contributes to improving system reliability, efficiency and scalability, forming a critical part of modern SRE practices ...
Revolutionizing the Nine Pillars of SRE With AI-Engineered Tools
In my blog "Rapid Strategic SRE Assessments Accelerate IT Transformations," published last year, I classified site reliability engineering (SRE) into nine pillars of SRE practices, a comprehensive framework that covers the full scope ...
Why SREs Are Critical to DevOps
Although site reliability engineering is a relatively new concept, site reliability engineers (SREs) have become crucial for DevOps teams, helping to solve an array of operational problems such as network availability and user experience. However, in ...
Best of 2022: Day in the Life of a Site Reliability Engineer (SRE)
As we close out 2022, we at DevOps.com wanted to highlight the most popular articles of the year. Following is the latest in our series of the Best of 2022. By now, ...
SRE Survey Reveals Major Technical and Cultural Challenges
Catchpoint, in partnership with Blameless, today published an annual survey of 559 site reliability engineers (SREs) that found 59% of respondents didn't view tool sprawl as a major concern. Another 40% ...
Scaling Predictive Analytics With AIOps to Drive Next-Gen SRE
Enterprise systems are only as valuable as they are reliable, in the sense that they don’t suffer excessive breakdowns. Otherwise, companies experience costly downtime and added stress for engineers due to the ...
5 Ways to Prevent an Outage
In today’s always-on, ever-connected world, we all expect 100% availability. What gets in the way of this? The devil is in the details. Over time, everything breaks: Disks, nodes, containers, networks, DNS ...
Why More Incidents Are Better
Ask most SREs how many incidents they’d have to respond to in a perfect world, and their answer would probably be “zero.” After all, making software and infrastructure so reliable that incidents ...
How to Adopt an SRE Practice (When You’re not Google)
Site reliability engineering (SRE) isn’t a new term or practice. The practice of applying software engineering skills and principles to operations problems and tasks happened even before site reliability engineer was a ...
The Pros and Cons of Embedded SREs
To embed or not to embed: That is the question. At least, that’s one of the questions that companies have to answer as they decide how to implement site reliability engineering. They ...
The Evolution of Incident Management
Have you ever thought about the history of incident management? If you’re an SRE, you might be so caught up in the day-to-day work of managing reliability and responding to incidents that ...
Site Reliability Engineering (SRE) Comes of Age in 2022
The site reliability engineer (SRE) role is still gathering steam across organizations. In January 2022, LinkedIn listed SRE as the 21st job with the highest global demand throughout the past five years ...