EXEED AI

This is Fine! A Podcast about Software and Resilience Engineering's Recent LinkedIn Posts

This is Fine! A Podcast about Software and Resilience Engineering

@charity-majors

138 followers

en · 20 posts · LinkedIn

Posts

Charity Majors

Tech & AI

3mo

Agents inherit permissions from users. Users touch around 4% of the perms they have. That is a hell of a gap to leave sitting around, inviting mischief.
94
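
A minimal sketch of how that gap could be measured and closed, assuming you can list both the permissions a user was granted and the ones they actually exercised. The function name, the `perm:N` labels, and the 25-permission example are hypothetical, chosen only so that one used permission works out to 4%.

```python
from typing import Iterable


def permission_gap(granted: Iterable[str], used: Iterable[str]) -> dict:
    """Compare granted permissions with the ones actually exercised."""
    granted_set = set(granted)
    # Ignore "used" entries that were never granted in the first place.
    used_set = set(used) & granted_set
    unused = granted_set - used_set
    ratio = len(used_set) / len(granted_set) if granted_set else 0.0
    return {
        # Candidate scope for an agent acting on the user's behalf.
        "agent_scope": sorted(used_set),
        # Permissions an inheriting agent would hold without any evidence of need.
        "unused": sorted(unused),
        "used_ratio": ratio,
    }


# Hypothetical data: 25 granted permissions, 1 ever used.
granted = [f"perm:{i}" for i in range(25)]
used = ["perm:3"]
report = permission_gap(granted, used)
print(f"used {report['used_ratio']:.0%}, unused: {len(report['unused'])}")
```

Scoping the agent to `agent_scope` rather than the full grant is one possible policy, not the only one; the point is that the unused set is measurable.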

Charity Majors

Tech & AI

3mo

Yeah, this is interesting. I used to worry about where the AI-ification of all things was going to leave SREs, since it is flattening or collapsing much of the SDLC. But SREs seem to be thriving. They are used to being relentlessly outcome oriented. It's software engineers who loved the art and craft of writing code who are struggling. And I get it. For a long time, the elegance of a diff was the best proxy we had for how maintainable that code would be, and we rewarded developers richly for that elegance. It's a tough shift to make. You can't tell someone where to find their joy. But joy is more resilient when it aligns with value.
132

Charity Majors

Tech & AI

3mo

"What I am saying is that pre-deployment testing alone is radically insufficient for the systems we’re building now." She says it nicely and professionally, with citations and data. I'll say it differently: Folks, you are fucked if you keep trying to make AI work with the same old pre-production testing regime plus "metrics, logs and traces". Models behave differently when being observed. So *observe them*.
112

Charity Majors

Tech & AI

3mo

Like the rest of you, I'm guessing, I snarfed up the notes Martin Fowler published after the Deer Valley unconference on how software is changing in the AI-native era. SO much good stuff. So much food for thought. But the longer I sit with them, the more troubled I am by what they don't say. Only one mention of production? In more than 5,000 words? I wasn't there. All I have is the artifact. But I worry that some of the most respected minds in software engineering are unintentionally replicating a serious blind spot that has haunted the field for as long as I've been in it: treating code like the outcome, and production like an afterthought. Production is not your exhaust pipe. Observability is not something you look at just to fix bugs. And code is a terrible source of truth. Test in prod, or live a lie. Now more than ever.
544

Charity Majors

Tech & AI

3mo

Vendor support for OpenTelemetry will only be real if users punish them for performative support or shady bait-and-switch tactics. If a vendor claims to support OTel, but once they get you in the proof of concept or buying process, they privately urge you not to use it, or tell you it's not ready for prime time: *do not buy from that vendor*. They know what they're doing. The more a vendor can lock you in, the more they can ignore market pressure to improve their own product.
101

Charity Majors

Tech & AI

3mo

TIL that when you turn AI-SRE agents loose on your system, and give them access to a bunch of three pillars-style telemetry, they... turn up their noses and refuse to use it. They go back to the source and fetch the raw, pre-digested telemetry data, with all its relationships intact. With thanks to Kyle Forster for his excellent and thought-provoking blog post. https://lnkd.in/gbMBgQSn
86

Charity Majors

Tech & AI

3mo

Props to ClickHouse for saying this, in a sharp essay on how AI is reshaping the observability landscape: "This is the core shift that Charity Majors has been describing as Observability 2.0: replacing the three-pillar model with a single source of truth based on wide structured events stored in a columnar storage engine... Every modern observability company is now built on this model, and many use ClickHouse as the main storage engine." This might seem like a small thing. But I have long been baffled by the way vendors refuse to link arms and amplify each other's messaging, even when we see the world the same way and are trying to displace the same incumbents (who are massively larger than all of us smaller fish combined). It's confusing to the market when companies with similar philosophies and approaches insist on NOT using similar technical language, I think. My feeling is, "Come on, y'all. We can always go for each other's throats later on, after we've displaced the legacy solutions." 😉 Anyways, if you want to hear me, Alexey Milovidov and Vijay Samuel on a panel talking about o11y 2.0 and columnar stores -- and the second edition of "Observability Engineering" -- register here. March 31st in downtown SF!!! https://lnkd.in/g-QckEc6 https://lnkd.in/graytzZ8
161

Charity Majors

Tech & AI

3mo

"You still have to look at your data." It's amazing how often software engineers forget about this. I was just reading the notes from last month's vaunted Deer Valley meetup on the future of software, hosted by Thoughtworks. Question one: "Where does the rigor go?" Their answers:
* Upstream to specification review
* Into test suites as first-class artifacts
* Into type systems and constraints
* Into risk mapping
* Into continuous comprehension
What about... validating your intent in production? You can only validate your code so much pre-production, because the same code can behave very differently on different systems, backed by different data, with different usage patterns. Only prod is prod. It needs rigor too... and not *just* in the context of bugs and rollbacks and self-healing infrastructure.
32

Charity Majors

Tech & AI

3mo

I've done a lot of shitting on metrics over the years. (It's not their fault, they're just... lacking.) But what if you could have most of the things you love about metrics, without most of the things you hate about metrics? Metrics, meet context. 🔥
75

Charity Majors

Tech & AI

3mo

The gods of LinkedIn have served me up a Friday gift. "How does your pricing model change behaviour over time? What will teams hesitate to collect, retain, or query once usage grows?" and, "The right vendor doesn’t just answer these questions. They help you think through them." If you care about observability, you should probably follow Fredrik Vikström. I am charmed by his feed: relentless, vendor neutral fact checking interspersed with goofy, dorky memes. I ship it.
35

Charity Majors

Tech & AI

3mo

If you bear responsibility for making decisions about data, SRE, or reliability in your organization, you owe it to yourself to comb through this post. Possibly more than once. The most fascinating bits to me are the tension encapsulated by surprises #2 and #3. Diversity of data sources helps a lot...but many data sources aren't as good as their authors think. Years of task tracking, hundreds of pages of architecture documentation, tens of millions of lines of code... all collapsible into a paragraph per service or a one page summary. "When the output of these diverse tools are combined in the same context window, iteratively narrowing in on the same problem, magic happens... Replace that with a bunch of logs and metrics labels alone? No chance." Thanks to Kyle Forster and the rest of the RunWhen team for showing their work. This is gold. https://lnkd.in/gDt7CHmb
65

Charity Majors

Tech & AI

3mo

Okay, I must know. How many other people building agents are generating wide canonical log events or traces and storing them in DuckDB (or some other local columnar store) for all your telemetry needs? 🦆 And, or: How many of you have discovered that the agents you are building have learned to route around the logs and metrics you gave them, seeking out wider, richer upstream telemetry to use instead?
17
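
For readers who have not seen the pattern, here is a rough sketch of what "wide canonical events" can look like: one record per unit of work carrying every field you might later want to slice by, with metric-style aggregates computed at query time instead of being pre-aggregated. It uses only the Python standard library as a stand-in; the post mentions DuckDB, which this sketch does not use. All field names (`agent`, `tool`, `duration_ms`, and so on) are hypothetical.

```python
import json
import time
from collections import defaultdict


def make_event(**fields):
    """Build one wide event: a flat record with a timestamp plus arbitrary context."""
    return {"timestamp": time.time(), **fields}


# One event per tool call, with as much context attached as is available.
events = [
    make_event(agent="triage-bot", tool="search_logs", user_id="u1",
               duration_ms=120, tokens=900, status="ok"),
    make_event(agent="triage-bot", tool="search_logs", user_id="u2",
               duration_ms=180, tokens=1100, status="ok"),
    make_event(agent="triage-bot", tool="run_query", user_id="u1",
               duration_ms=950, tokens=300, status="error"),
]


def group_avg(records, key, value):
    """Average `value` per distinct `key`, computed from the raw events."""
    sums = defaultdict(lambda: [0.0, 0])
    for record in records:
        bucket = sums[record[key]]
        bucket[0] += record[value]
        bucket[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}


# The same events answer questions nobody planned for up front.
avg_by_tool = group_avg(events, "tool", "duration_ms")
avg_by_user = group_avg(events, "user_id", "tokens")

# JSON Lines is one simple on-disk form that a columnar store could ingest later.
jsonl = "\n".join(json.dumps(e) for e in events)
print(avg_by_tool, avg_by_user)
```

Because nothing is aggregated at write time, grouping by `tool`, `user_id`, or any other field is a query-time decision.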

Charity Majors

Tech & AI

3mo

"The production telemetry of a system isn't just operational data. It's the accumulated record of what the system has been asked to do and how it responded. The spec that survived. Every other artifact decayed."

"For non-greenfield systems, the sequence isn't:
Write a spec → generate code → run tests
It's:
Observe production → extract behavioral contracts → encode as system tests → use those as the spec → then bring the agent in"

tfw you've been meaning to write about something, then you read someone's essay and go, "well now I don't have to!" this whole piece 🔥🔥🔥
57
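
The quoted sequence (observe production, extract behavioral contracts, encode them as tests) can be sketched roughly as follows. This is an illustration under heavy assumptions, not the essay author's method: the event shape, the endpoints, and the choice of "observed status codes plus worst observed latency" as the contract are all hypothetical simplifications.

```python
# Step 1: observed production events (hypothetical sample data).
observed = [
    {"endpoint": "/checkout", "status": 200, "duration_ms": 180},
    {"endpoint": "/checkout", "status": 200, "duration_ms": 240},
    {"endpoint": "/checkout", "status": 402, "duration_ms": 90},
    {"endpoint": "/health", "status": 200, "duration_ms": 5},
]


def extract_contracts(events):
    """Step 2: summarize what production actually did, per endpoint."""
    contracts = {}
    for event in events:
        contract = contracts.setdefault(
            event["endpoint"], {"statuses": set(), "max_duration_ms": 0}
        )
        contract["statuses"].add(event["status"])
        contract["max_duration_ms"] = max(
            contract["max_duration_ms"], event["duration_ms"]
        )
    return contracts


def violations(contracts, response):
    """Step 3: check a candidate response against the extracted contract."""
    contract = contracts.get(response["endpoint"])
    if contract is None:
        return ["unknown endpoint"]
    problems = []
    if response["status"] not in contract["statuses"]:
        problems.append(f"unexpected status {response['status']}")
    if response["duration_ms"] > contract["max_duration_ms"]:
        problems.append("slower than anything observed")
    return problems


contracts = extract_contracts(observed)

# A response from a regenerated implementation, checked like a system test.
candidate = {"endpoint": "/checkout", "status": 500, "duration_ms": 100}
print(violations(contracts, candidate))
```

A real contract would need far more than status codes and a latency ceiling, but the shape is the same: the checks come from recorded behavior, and generated code is held to them.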

Charity Majors

Tech & AI

3mo

"The uncomfortable implication for the AI-SRE space is that the quality ceiling of your automation is set by the quality of the data going in, not by the intelligence of the model doing the reasoning." Why was AIOps such a dud, and AI SRE so much more promising? Yes, the models got better. Yes, agents. But also, AIOps data was only ever flimsy and sparse. There is only so much a model can do with shallow data. The relationships must exist, the ontologies must cohere, the context must connect and shade and inform. Otherwise we go right back to confusing count with importance.
62

This is Fine! A Podcast about Software and Resilience Engineering

Tech & AI

3mo

Are software engineers artisans or are they an "arm of management" automating things that other people do? Or a mix of both? Just one of the things we spoke with Fred Hebert about as we reviewed the 2025 DORA report together in our new episode.
17

Charity Majors

Tech & AI

3mo

We are all in this compressed moment of change, where the entire lifecycle of software development is being radically reimagined. But under the hood, it's still software. Just 10x, 100x, 1000x as much of it, with changes coming faster than ever before. We were built for this moment. It's exciting to watch.
45

Charity Majors

Tech & AI

3mo

"The problem with frontend technical debt is that it creates two kinds of pain: the obvious kind that slows down developers, and the insidious kind that impacts a company's bottom line." and, "The frontend IS your product from the user's perspective" 🔥🔥🔥
81

honeycomb.io

Tech & AI

3mo

Where does rigor live in modern software systems? In this new post, Charity Majors argues that it doesn’t come from pre-production environments; it comes from production itself. As systems grow more complex (and AI accelerates change), the only place to truly understand behavior is where it actually happens. Read more 👉 https://lnkd.in/ebPUd_qR #honeycomb #observability #AI
23

Charity Majors

Tech & AI

3mo

DID YOU KNOW, fine citizens, that shortening the feedback loop between development and production could HELP WITH BOTH? I laugh so I do not cry. Love 2 see the kidz figure this one out. And no better guides for your transformation journey than the one and only Austin Parker and Akshay Utture!!! Bring your questions, try and stump em. I'll send you a sticker if you can.
73

Charity Majors

Tech & AI

3mo

"ok kyle, but could you use a larger font next time?" 😉 As Kyle says, "the race to get the most useful data, turn it into useful context, and surface it in useful ways will continue on....But logs and metrics AI-SRE alone? That's a cul-de-sac." If your three pillars data can't deliver the goods, agents will simply route around you. Traditional observability vendors should be afraid.
30