Podcast: Safely Changing Software to Avoid Incidents: A Conversation with Justin Sheehy

by David Chen September 8, 2025

Safely Changing Software to Avoid Incidents: Insights from Justin Sheehy

In a recent podcast on software deployment, Michael Stiefel spoke with Justin Sheehy about how to move software changes into production without triggering incidents that disrupt operations.

The Futility of Root Cause Analysis

One key takeaway was Sheehy’s view of the limits of traditional root cause analysis. Rather than searching for a single cause behind an incident, he emphasized systemic improvements and preventive measures. Prioritizing the resilience of the whole system over fault attribution leaves teams better protected against future failures.

Importance of a Shared Incident Language

Another point was the value of a shared language for discussing incidents. Sheehy stressed that a common vocabulary and framework for analyzing and communicating about incidents helps teams collaborate and learn. That shared understanding can streamline incident response and make post-incident reviews more coherent.

The Need for Malleable and Observable Software

Sheehy also highlighted software malleability and observability as twin requirements for smooth deployments. Malleable software is flexible and easy to modify, so teams can adapt quickly to changing requirements and emerging issues. Observability, the ability to monitor and analyze system behavior comprehensively, helps teams detect anomalies and diagnose problems before they escalate into full-blown incidents.
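As a rough illustration of how these two properties can show up in code (this sketch is not taken from the podcast), the Python snippet below puts a change behind a runtime flag so it can be switched on or off without a redeploy, and records which code path ran so the effect of the change is visible. The names `FLAGS`, `METRICS`, `price`, and the pricing rules are all hypothetical, and a real system would likely use a dedicated flag service and metrics backend rather than in-process dictionaries.

```python
import json
import logging
from collections import Counter

logger = logging.getLogger("deploy_sketch")

# Hypothetical in-process flag store and metrics registry (assumptions for
# illustration only; production systems usually externalize both).
FLAGS = {"new_pricing": False}
METRICS = Counter()


def legacy_price(cents):
    # Existing behavior: add 10 percent, using integer cents to avoid float drift.
    return cents * 110 // 100


def new_price(cents):
    # Changed behavior being rolled out: add 8 percent.
    return cents * 108 // 100


def price(cents):
    # Malleability: the flag selects the code path at runtime.
    path = "new" if FLAGS["new_pricing"] else "legacy"
    # Observability: count every call per path and log a structured event.
    METRICS[f"price.{path}"] += 1
    try:
        result = new_price(cents) if path == "new" else legacy_price(cents)
    except Exception:
        METRICS[f"price.{path}.error"] += 1
        logger.exception(json.dumps({"event": "price_failed", "path": path}))
        raise
    logger.info(json.dumps(
        {"event": "price_computed", "path": path, "cents": cents, "result": result}
    ))
    return result
```

Flipping `FLAGS["new_pricing"]` back to `False` restores the old path immediately, and comparing the `price.new` and `price.new.error` counters against the legacy ones gives an early signal of whether the change is misbehaving.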

Embracing Continuous Improvement

Throughout the podcast, Sheehy emphasized continuous improvement in software deployment practices. By learning from both successes and failures, teams can refine their deployment processes step by step, strengthen system resilience, and deliver more reliable software to end users.

In conclusion, Sheehy’s insights point to proactive, collaborative, and adaptive approaches to software deployment. By reevaluating traditional incident response practices, building a shared incident language, and prioritizing software malleability and observability, teams can change software with greater confidence and resilience.

For more from Justin Sheehy and Michael Stiefel on safely changing software to avoid incidents, listen to the full podcast.
