Escaping the Spaghetti: How to Test Untestable Codebases

173 slides extracted.

Slide 1 — 0:08 (watch)

Welcome to the Bug Bash podcast, where we discuss software correctness and reliability. I’m your host, David Nguyen.

Slide 2 — 0:36 (watch)

Everyone wants reliable software, but few want to test messy legacy code. Today, Lewis Campbell from Outdata joins the show to share a practical approach for implementing deterministic simulation testing in existing systems. We discuss why React components are unsuitable for business logic and how front-end static typing can prevent nondeterminism. We also address non-technical aspects, such as the politics of updating old codebases, the risks of concealing data conflicts, and strategies for introducing your team to property-based testing. There's even an internet-hungry sheep named Angus, but we'll get to that later. Stay tuned.

Slide 3 — 1:02 (watch)

Before we begin, I want to mention the Bug Bash conference, taking place on April 23rd and 24th in Washington, DC. If you're interested in connecting with others who care about software correctness and reliability, I encourage you to attend. You can find all the details at bugbash.antithesis.com.

Slide 4 — 1:14 (watch)

Welcome, Lewis. Could you please introduce yourself?

Slide 5 — 1:20 (watch)

Hello, my name is Lewis Campbell. I run a small consulting company called Outdata.

Slide 6 — 1:38 (watch)

My company, Outdata, focuses on advanced testing in systems and products, particularly SaaS products with pre-existing codebases that aren't designed for advanced testing. I find that this approach significantly helps clients, especially when they struggle with the velocity of feature development. I'm excited to discuss how to test in pre-existing systems that may not be inherently suited for it.

Slide 7 — 1:54 (watch)

I believe the tagline I read stated that you were one of the first consultants to apply deterministic simulation testing to legacy codebases. Is that correct?

Slide 8 — 2:14 (watch)

To be fair, the portion where I applied deterministic simulation testing was a new component intended to integrate into an existing system. As far as I know, I am the first consultant to do this. While Antithesis implements it on a larger scale, I am not aware of another consultant who has claimed this before. I coined the term myself, and while I didn't conduct an extensive check, I believe it to be accurate.

Slide 9 — 2:30 (watch)

We didn't want to spread that across the internet and investigate potential failure conditions for that property.

Slide 10 — 2:52 (watch)

Exactly. Let's start at the beginning. How do you get into these kinds of engagements? Developer velocity and unlocking features are common challenges that teams face. What prompts companies to bring you in specifically? How do you approach the situation from day one?

Slide 11 — 3:08 (watch)

We recognize that we have a problem with complex, tangled code. The challenge is to address this issue effectively. My approach is to connect with individuals who may not be particularly interested in testing, especially advanced testing, and to engage the larger audience that simply wants their software to be more reliable and to instill confidence in its performance.

Slide 12 — 3:32 (watch)

To begin, you need to find a thread to pull. In the case of a web application, I typically start with the incoming inputs. Most web apps I encounter do not even parse or validate these inputs. Instead of focusing on testing, we often overlook this aspect and just look at feedback.

Slide 13 — 4:02 (watch)

The first step I take is to implement feedback mechanisms that involve parsing and validating incoming data inputs to ensure they have a static type. While I believe static typing is one of the least useful forms of feedback, it is the fastest and easiest to implement. Therefore, I often begin with statically typing the inputs, focusing on these straightforward elements.

Slide 14 — 4:24 (watch)

When assessing the situation, focus on the architecture or system rather than the team or individuals involved. You might encounter someone on one side who appears hopeful, while on the other side, the team may seem frustrated, feeling that their time is being wasted.

Slide 15 — 4:46 (watch)

People in denial typically do not engage with me, as my marketing is not effective in attracting such clients. They usually recognize they have some kind of issue, so there is minimal pushback. They often agree that my suggestions are good ideas, but they may lack the time, focus, or resources to act on them. Most developers are aware that something is wrong; if their velocity is slow, they understand the implications.

Slide 16 — 5:30 (watch)

I have a unique ability to detect when nothing is happening. As we dive into a new project, whether internally or externally, we approach it as if we're entering a fresh environment. The first step is to establish low-pass guardrails and implement some form of static typing. What does this look like in practice? Are we focusing on back-end validation for incoming data, or are we addressing the front-end? There are multiple ways to achieve this, so what are your thoughts?

Slide 17 — 5:54 (watch)

A common approach involves having a SaaS application with a back-end, typically a relational database. The back-end understands various aspects of the data being sent to the front-end, including foreign key constraints, and it constructs a well-defined object from this data. The back-end possesses extensive knowledge about the data.

Slide 18 — 6:34 (watch)

You pass the data to the front-end, which interprets it as JSON. To enhance specificity, I create a repository containing schemas that both the back-end and front-end can access. This practice is surprisingly rare. It's essential to validate what the back-end sends against a schema and what the front-end receives against a schema. The data brought in by the front-end, whether for a web app or a mobile app, often introduces significant non-determinism. Web apps and mobile apps are not just floating across a sea of JSON; they are tightly coupled to the back-end. I make this coupling explicit by applying a schema to the input. It's better to display an error page than to present a page full of errors. If the data doesn't match our expectations, we show an error page instead of allowing silent errors, which many developers tend to do. Thus, you identify the primary source of non-determinism and implement the simplest solution to provide feedback to developers, typically through various schemas that align with your static typing system.

Slide 19 — 7:20 (watch)

What do we mean by schema? I can generate a schema file and feel proud of it, but then I often find myself unsure of its purpose.

Slide 20 — 7:40 (watch)

For JavaScript, I recommend using Superstruct or Zod. The challenge with receiving data from external sources is that its static type is inherently unknown. We can discuss static typing extensively, but since everything must interact with the outside world, that world is not aware of your type system.

Slide 21 — 8:02 (watch)

We are creating a bridge so that incoming data conforms to the world defined by your static type system. This involves designing strictly shaped holes instead of wide openings that can accept anything. While the wide opening still exists, we place a cookie cutter shape behind it to enforce our expectations. At the front entrance, think of it like a bouncer checking who is allowed in.
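As a rough illustration of the "cookie cutter behind the wide opening", here is a minimal hand-rolled sketch in TypeScript. The `Invoice` shape, its field names, and the function names are hypothetical and not taken from any project discussed in the episode. In practice a schema library such as Zod or Superstruct, as recommended above, would replace the manual checks.

```typescript
// Hypothetical payload shape; the field names are illustrative assumptions.
interface Invoice {
  id: number;
  customerEmail: string;
  amountCents: number;
}

// Either a typed value or a description of what did not fit the shape.
type Parsed<T> = { ok: true; value: T } | { ok: false; problem: string };

function isWholeNumber(x: unknown): x is number {
  return typeof x === "number" && isFinite(x) && Math.floor(x) === x;
}

// The cookie cutter: accepts anything, lets through only a well-shaped Invoice.
function parseInvoice(input: unknown): Parsed<Invoice> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, problem: "expected an object" };
  }
  const raw = input as Record<string, unknown>;
  const id = raw["id"];
  const customerEmail = raw["customerEmail"];
  const amountCents = raw["amountCents"];
  if (!isWholeNumber(id)) {
    return { ok: false, problem: "id must be a whole number" };
  }
  if (typeof customerEmail !== "string" || customerEmail.indexOf("@") < 1) {
    return { ok: false, problem: "customerEmail must look like an email" };
  }
  if (!isWholeNumber(amountCents) || amountCents < 0) {
    return { ok: false, problem: "amountCents must be a non-negative whole number" };
  }
  return { ok: true, value: { id, customerEmail, amountCents } };
}

// An error page rather than a page full of errors.
function renderInvoicePage(responseBody: string): string {
  let decoded: unknown;
  try {
    decoded = JSON.parse(responseBody);
  } catch {
    return "ERROR PAGE: response was not valid JSON";
  }
  const parsed = parseInvoice(decoded);
  if (parsed.ok === false) {
    return "ERROR PAGE: " + parsed.problem;
  }
  return `Invoice #${parsed.value.id}: ${(parsed.value.amountCents / 100).toFixed(2)}`;
}
```

Everything past `parseInvoice` works with a statically typed `Invoice`, so the untyped blob stops at the entry point instead of spreading through the rest of the code.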

Slide 22 — 8:22 (watch)

You have an entry point into your application, so it's essential to know what data is coming in. A blob of JSON can propagate throughout the entire system.

Slide 23 — 8:36 (watch)

This situation occurs frequently. In TypeScript, you may find the same issue appearing everywhere in your code. This is often the case in projects that lack tests. I'm not advocating for static typing as the most crucial aspect, but if your project has no tests, you should start with something that provides feedback. Implementing static typing is one of the easiest and quickest ways to begin improving your codebase.

Slide 24 — 9:08 (watch)

We described setting up schemas between the front end and the back end as the easiest way to understand where determinism might exist. My understanding is that you view determinism differently than most people. Typically, when people think of determinism, they envision perfect reproduction in every sense. However, you take a more pragmatic approach to determinism.

Slide 25 — 9:50 (watch)

My definition of determinism is similar, but I recognize that for existing projects, achieving a perfect deterministic core, like what Will Wilson discussed about FoundationDB or what TigerBeetle does today, isn't feasible. However, you can have segments of code that are deterministic, which I believe is still a significant advantage. I view it as finding your "islands of determinism," which is crucial in my opinion.

Slide 26 — 10:04 (watch)

You can reason about those small sections of code, which clarifies the overall structure significantly. These sections can expand and eventually integrate with one another.

Slide 27 — 10:16 (watch)

I prefer to think about it in terms of identifying smaller components rather than rewriting everything into one large core, as that approach is often impractical. The question then becomes: how do you find those smaller components? What should you search for? TSX and React are good starting points.

Slide 28 — 10:24 (watch)

I'm going to focus on React, as it remains one of the most popular frameworks.

Slide 29 — 10:30 (watch)

In a React project, you'll find numerous TSX files, with most of the code contained within these files.

Slide 30 — 10:34 (watch)

A component is an element that renders to the screen and often performs fetch calls.

Slide 31 — 10:46 (watch)

Often, the business logic resides within these components. I started with WinForms, where we were advised to write our code behind the form. However, it seems that web developers have overlooked this practice.

Slide 32 — 10:54 (watch)

The islands of determinism are often found in your top-level, front-facing components.

Slide 33 — 11:00 (watch)

Remove them from the component, as they are difficult to test and create complications.

Slide 34 — 11:08 (watch)

Remove those elements, and as you do, you'll often discover repeated business logic or opportunities to group related components together.

Slide 35 — 11:14 (watch)

I would start with that approach on a typical React project.

Slide 36 — 11:20 (watch)

Does it have a particular look or feel, or is there a signature style to it?

Slide 37 — 11:52 (watch)

The Clojure community often emphasizes the importance of pushing interfaces, particularly those with Java and other foreign function interfaces (FFIs), to the edges of your program. They advocate for maintaining a core that is as pure and lispy as possible, consisting of pure data that is also pure code, which ensures determinism. In contrast, when faced with a complex and tangled codebase, such as a "giant pile of spaghetti" or a "ball of mud," it is advisable to start from the outside and work inward, untangling the first elements you encounter. There are likely multiple entry points for this process.

Slide 38 — 12:18 (watch)

I'm trying to determine where to start when I encounter a different application. Should I begin at the index?

Slide 39 — 12:32 (watch)

If you have a significant issue, such as a large bug log with multiple bugs, it's important to make a good first impression and address the most pressing concerns. Start by identifying where the bugs are occurring. As you resolve one issue, you will likely uncover additional problems. You mentioned what the Clojure community does.

Slide 40 — 12:50 (watch)

I am not a Clojurist, but I believe I understand your point and I agree with it. In a preexisting codebase that hasn't been designed with those principles in mind, you have to start somewhere. Often, you'll encounter several small cores. Additionally, computing tends to be very tribal.

Slide 41 — 13:10 (watch)

Software development is indeed very tribal. The distinction between determinism and non-determinism has been rediscovered by various groups that often do not communicate with one another. This concept can be illustrated by the functional core and imperative shell model. The IO monad essentially embodies this idea.

Slide 42 — 13:20 (watch)

I could be mistaken, and I'm sure the Haskell community will correct me if I am.

Slide 43 — 13:24 (watch)

We also have the hexagonal architecture, created by Alistair Cockburn. He is a signatory of the Agile Manifesto, but like everyone, he has made mistakes.

Slide 44 — 13:34 (watch)

Don't judge him for that. However, he did create the hexagonal architecture, also known as ports and adapters. This architecture places your business logic in a single deterministic component, with ports and adapters representing user inputs, incoming requests, and interactions with a database.

Slide 45 — 13:48 (watch)

Dependency injection is often utilized for this mechanism. The concept of separating determinism from non-determinism exists across various programming cultures and is generally regarded as a good practice. It tends to be familiar to most people, provided you communicate using their terminology.
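To make the ports-and-adapters idea concrete, here is a small TypeScript sketch under stated assumptions: the subscription-renewal rule, the `TimeSource` and `SubscriptionStore` ports, and every function name are invented for illustration. The point is only that the core receives its non-deterministic dependencies (the clock, the data store) as arguments, so a test can substitute fixed, in-memory versions.

```typescript
interface Subscription {
  expiresAtMs: number;
}

// Ports: the only ways the core is allowed to touch the outside world.
interface TimeSource {
  nowMs(): number;
}
interface SubscriptionStore {
  load(userId: string): Subscription | undefined;
  save(userId: string, sub: Subscription): void;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Deterministic core: same inputs and same port behaviour give the same result.
function renewSubscription(
  userId: string,
  days: number,
  time: TimeSource,
  store: SubscriptionStore
): number {
  const now = time.nowMs();
  const current = store.load(userId);
  // Extend from the existing expiry if it is still in the future, else from now.
  const base = current !== undefined && current.expiresAtMs > now ? current.expiresAtMs : now;
  const expiresAtMs = base + days * MS_PER_DAY;
  store.save(userId, { expiresAtMs });
  return expiresAtMs;
}

// Test adapters: a frozen clock and an in-memory store.
function fixedTime(ms: number): TimeSource {
  return { nowMs: (): number => ms };
}
function inMemoryStore(): SubscriptionStore {
  const rows: Record<string, Subscription> = Object.create(null);
  return {
    load: (userId: string): Subscription | undefined => rows[userId],
    save: (userId: string, sub: Subscription): void => {
      rows[userId] = sub;
    },
  };
}
```

In production the same `renewSubscription` would be handed adapters backed by the real clock and the real database; only the adapters differ between test and production.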

Slide 46 — 14:04 (watch)

There is likely a neighboring abstraction we can borrow from, as starting with "find all the monads" is probably not the right approach.

Slide 47 — 14:14 (watch)

We probably can't start with finding all the monads, but I think most people understand that. I'll continue to focus on React.

Slide 48 — 14:22 (watch)

Most people understand that React components are difficult to test.

Slide 49 — 14:30 (watch)

A good starting point is to emphasize that you can create TypeScript or JavaScript files that do not contain a component. These files can focus solely on business logic, making them much easier to test. As a result, our components can become very simple.

Slide 50 — 14:40 (watch)

They contain code for business logic, which we can create to make testing straightforward. This approach allows our components to become very simple. Most people intuitively understand this concept.
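A sketch of what such a component-free file might contain, assuming a hypothetical cart-pricing rule (the `LineItem` shape, the `SAVE10` coupon, and the function names are all made up for the example). In a real project these functions would be exported from a plain `.ts` file and imported by the component.

```typescript
// Plain TypeScript: no React imports, no fetch calls, no rendering.
interface LineItem {
  unitPriceCents: number;
  quantity: number;
}

function subtotalCents(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}

// Hypothetical rule: the coupon "SAVE10" takes 10% off, anything else does nothing.
function discountCents(subtotal: number, couponCode: string | null): number {
  return couponCode === "SAVE10" ? Math.round(subtotal * 0.1) : 0;
}

function orderTotalCents(items: LineItem[], couponCode: string | null): number {
  const subtotal = subtotalCents(items);
  return subtotal - discountCents(subtotal, couponCode);
}

// The component then shrinks to wiring, roughly (TSX sketch, not compiled here):
//   function CartSummary({ items, coupon }) {
//     return <p>Total: {orderTotalCents(items, coupon)}</p>;
//   }
```

Because `orderTotalCents` takes plain data and returns plain data, it can be tested without mounting a component or mocking a network call.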

Slide 51 — 15:00 (watch)

You need to use the right terminology for different programmers. Keep the simple aspects straightforward while making the complex parts as simple as possible. The first step is to separate your core business logic from your front-facing components. This separation allows you to identify where the business logic resides. Now that we have the pieces separated and new files created with easily testable components, we are in a good position.

Slide 52 — 15:36 (watch)

Our business logic is now well-structured. What comes next? How do we begin enhancing this? There are various techniques available, and while listeners may be familiar with many of them, the challenge lies in selecting the appropriate technique for a legacy codebase. It's not always a straightforward choice between property-based testing, deterministic simulation testing, or introducing randomness into unit tests. Sometimes, you may need to scale back from an ideal state you wish to achieve, as there can be blockers such as dependencies that complicate the process.

Slide 53 — 16:06 (watch)

Which approach do you take? I prefer to build from the bottom up. Small unit tests serve as sanity checks. Remember, don't let perfect be the enemy of good.

Slide 54 — 16:22 (watch)

Little unit tests in the deterministic parts of the code serve as useful sanity checks, but I don't mistake them for anything too robust. It's fairly easy to generalize these into property-based testing.

Slide 55 — 16:38 (watch)

Once you have a core large enough, deterministic simulation testing is not something only systems programmers or database experts can perform. I see a hierarchy that includes unit and integration testing, although I often struggle to define the boundary between the two.

Slide 56 — 16:56 (watch)

A unit test can be example-based, parameterized, or random-based. From there, you can work towards deterministic simulation testing. I always start small to get results on the page quickly, so I begin from the bottom up.

Slide 57 — 17:08 (watch)

To break down the problem, consider a typical component.

Slide 58 — 17:20 (watch)

Consider a typical component that comes to mind. Let's walk through how we would build from there if we encounter a codebase that is otherwise messy. However, we have cleared out the clutter.

Slide 59 — 17:30 (watch)

Now we're focusing on the piece we've created, our little island of determinism. What is the first step we take from this foundation?

Slide 60 — 17:48 (watch)

Yes, the island is a great concept. When I refer to islands, I mean something like a benevolent bacteria that clusters together. Perhaps that's not the best metaphor, but it conveys the idea. The first thing to consider is whether this could be a fungus of determinism that spreads and eventually constricts everything. I'm not a great marketer, but I believe user stories are important. For example, you have a scenario where a user clicks a button and a specific action occurs.

Slide 61 — 18:06 (watch)

People will intuitively understand this concept. It involves considering actions at the component level. For example, when a user clicks a button, we need to determine what will happen next.

Slide 62 — 18:18 (watch)

You should consider various scenarios of user interactions with the component and determine the expected outcomes. Ensure that these outcomes make sense to you.

Slide 63 — 18:24 (watch)

Once you're finished, you can start randomizing the inputs. This is similar to the concept of a thousand monkeys at a thousand typewriters.

Slide 64 — 18:34 (watch)

You can't predict user behavior; they will always surprise you. This unpredictability is what property-based testing and simulation are all about: it's like simulating the monkeys at the typewriters.
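The "monkeys at typewriters" idea can be sketched as a seeded simulation of user actions against a small piece of deterministic state logic. Everything here is an assumption made for illustration: the cart actions, the `MAX_QUANTITY` limit, and the invariant. The random number generator is a simple Park-Miller style generator written inline so that a given seed always replays the same sequence of actions.

```typescript
type CartAction = "add" | "remove" | "clear";
interface CartState {
  quantity: number;
}

const MAX_QUANTITY = 5;
const ALL_ACTIONS: CartAction[] = ["add", "remove", "clear"];

// Deterministic logic under test: one user action moves the state forward.
function applyAction(state: CartState, action: CartAction): CartState {
  if (action === "add") return { quantity: Math.min(state.quantity + 1, MAX_QUANTITY) };
  if (action === "remove") return { quantity: Math.max(state.quantity - 1, 0) };
  return { quantity: 0 }; // "clear"
}

// Seeded generator (integer seed expected); returns numbers in [0, 1).
function makeRng(seed: number): () => number {
  let s = seed % 2147483647;
  if (s <= 0) s += 2147483646;
  return (): number => {
    s = (s * 48271) % 2147483647;
    return (s - 1) / 2147483646;
  };
}

// Fire random actions and check the invariant after every step.
function simulate(seed: number, steps: number): { trace: CartAction[]; final: CartState } {
  const rng = makeRng(seed);
  let state: CartState = { quantity: 0 };
  const trace: CartAction[] = [];
  for (let i = 0; i < steps; i++) {
    const action = ALL_ACTIONS[Math.floor(rng() * ALL_ACTIONS.length)];
    state = applyAction(state, action);
    trace.push(action);
    if (state.quantity < 0 || state.quantity > MAX_QUANTITY) {
      // The seed in the message is what makes the failure reproducible.
      throw new Error(`invariant broken at step ${i}, seed ${seed}: ${trace.join(",")}`);
    }
  }
  return { trace, final: state };
}
```

Running `simulate` over many seeds explores sequences nobody wrote down by hand, and any failing seed can be replayed exactly because the logic and the generator are both deterministic.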

Slide 65 — 18:52 (watch)

You can think of examples that the machine can generate, which you might not have considered initially. When working with a new team and unfamiliar codebase, do you find it necessary to rely on occasional detractors? While you may not actively seek out those who are disengaged, have you encountered situations where they remain present, and you might eventually move on?

Slide 66 — 19:12 (watch)

To convey what is being built, I must acknowledge that I cannot control everything, especially what happens after I leave. My role is to guide others, but ultimately, I can only lead them to the water.

Slide 67 — 19:32 (watch)

I want people to see the value in what I'm building. While it may feel overgrown with weeds, I believe that what I create lasts longer than my presence. Regarding pushback, I have encountered some resistance. In the book "Working Effectively with Legacy Code," the author, Michael Feathers, discusses the necessity of changing existing code to enable testing.

Slide 68 — 19:54 (watch)

While it's important to make these changes as minimally invasive as possible, they are often unavoidable. Everyone has sections of their codebase that they hesitate to modify, but sometimes, those areas must be addressed.

Slide 69 — 20:06 (watch)

You will receive feedback, but ultimately, you cannot ignore it indefinitely.

Slide 70 — 20:20 (watch)

Eventually, you will need to confront the issues in your code. Ignoring them until the entire project requires a rewrite is likely to be much more disruptive than making small adjustments along the way. While I do receive some pushback and cannot control the outcomes, I believe that most people appreciate having feedback about the code.

Slide 71 — 20:28 (watch)

I don't receive much pushback.

Slide 72 — 20:38 (watch)

What I'm trying to convey is that many teams find themselves in a spaghetti code situation for various reasons. This could be due to expedience or the influence of their chosen LLM, among other factors.

Slide 73 — 20:56 (watch)

These ideas may seem obvious in retrospect, but not long ago, they weren't always clear from the other side of the table. I'm exploring the perspective of someone who might not be fully engaged, perhaps with one eyebrow raised, looking at you skeptically while taking notes.

Slide 74 — 21:16 (watch)

Those situations can be challenging. Consider someone who has one eyebrow raised, looking at you with a notepad, as if to say, "What do you mean?"

Slide 75 — 21:26 (watch)

Consider thinking about their first experience.

Slide 76 — 21:38 (watch)

An example of a real-world case for property-based testing could involve walking through their first property. Many resources on the internet discuss property-based testing, but most focus on sorting lists, which is useful.

Slide 77 — 22:02 (watch)

Sorting lists is important, but it's not our primary focus here. I'm confident the sort function is adequate, and if there are issues, I'll leave that to those with more expertise. Instead, I need to concentrate on defining the property I want to test. Property-based testing originates from the functional programming world, which has its own terminology and concepts.

Slide 78 — 22:40 (watch)

I present property-based testing as a way to parameterize unit tests or generalize them. While some programmers may understand concepts like associativity and commutativity, many do not. However, most people grasp the idea of unit tests. I emphasize that instead of using a single example, we can parameterize the test to cover multiple examples. I liken a unit test to a constant and a property-based test to a function. It's crucial to use the right language, as many people do not fully understand property-based testing.
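The "constant versus function" comparison can be shown in a few lines. This is a sketch under assumptions: `applyDiscount` and its rounding rule are hypothetical, and the inputs are enumerated with plain loops to keep the example free of dependencies. A property-based testing library (fast-check is one for TypeScript) would generate and shrink the inputs instead.

```typescript
// Hypothetical function under test: take a percentage off a price in cents.
function applyDiscount(priceCents: number, percent: number): number {
  return priceCents - Math.round((priceCents * percent) / 100);
}

// A unit test is a constant: one chosen input, one expected output.
function exampleTest(): boolean {
  return applyDiscount(1000, 10) === 900;
}

// A property is a function: a claim that should hold for every input in range.
// Here: a discount between 0% and 100% never makes the price negative or larger.
function discountStaysInRange(priceCents: number, percent: number): boolean {
  const result = applyDiscount(priceCents, percent);
  return result >= 0 && result <= priceCents;
}

// Let the machine supply the examples; returns how many cases were checked.
function checkProperty(): number {
  let checked = 0;
  for (let priceCents = 0; priceCents <= 2000; priceCents += 7) {
    for (let percent = 0; percent <= 100; percent++) {
      if (!discountStaysInRange(priceCents, percent)) {
        throw new Error(`property failed for price=${priceCents}, percent=${percent}`);
      }
      checked++;
    }
  }
  return checked;
}
```

The example test and the property check the same function; the difference is that the property states what must be true in general, so the number of cases is limited by machine time rather than by what someone thought to type.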

Slide 79 — 23:14 (watch)

Once you explain property-based testing in terms they understand and demonstrate how many tests the machine can perform, people are usually receptive. I haven't encountered anyone who has been overtly resistant to the concept, although they may have reservations that I am unaware of.

Slide 80 — 23:32 (watch)

I haven't encountered anyone who has been difficult to my face. I try to meet people where they are, considering their diverse experiences, preferences, and biases regarding programming.

Slide 81 — 24:00 (watch)

It's very important to explain concepts in terms that resonate with the audience, and this approach has proven effective. We have examined the codebase, starting from the outside and working our way in, focusing on decoupling external components and identifying islands of determinism near the entry point. We began with schemas, leaving the door open while ensuring we placed a cookie cutter inside to maintain the correct shape.

Slide 82 — 24:24 (watch)

We utilized various units and integrations. For more details, you can refer to Lewis's blog post on the topic.

Slide 83 — 25:00 (watch)

We took their existing units and started parameterizing them to get them up and running, demonstrating what you could obtain for free. I often joke that many of us are fascinated with technology because the machine does the work, making it easy for most people to lean into that impulse. However, the machine should indeed be doing the work at this point, which helps facilitate the process. What are the real-world challenges you've encountered? You've outlined a clear pipeline that involves generalizing step by step. In theory, everything should work perfectly, but I'm curious if anything has gone wrong.

Slide 84 — 25:36 (watch)

Have there been any challenges along that path? Yes, I understand your point. In my experience, determining the right course of action has become fairly straightforward over the years.

Slide 85 — 25:46 (watch)

Sometimes, there is so little substance that it becomes very challenging to extract something deterministic.

Slide 86 — 26:02 (watch)

This may be a controversial point, but when developers place a lot of logic in their database, the backend often becomes just thin API routes around database calls. This setup can be quite challenging to work with.

Slide 87 — 26:40 (watch)

Unless you have something like SQLite, which allows for easy creation of an in-memory version of your database with your schema, placing a lot of logic in your database makes reliable testing very difficult. This issue is particularly prevalent in the backend, where significant processing occurs within the database. While it may be tempting to keep logic close to the data, there are trade-offs to consider. It's much easier to mock a simpler data model or a database with minimal business logic than to mock a complex PostgreSQL system filled with stored procedures. Therefore, this is a significant challenge, and politically, it may be necessary to consider changing your database.

Slide 88 — 27:16 (watch)

Changing your database may not make you popular. It’s not exactly a way to make friends and influence people, so you need to pick your battles wisely. However, we are discussing correctness here.

Slide 89 — 27:26 (watch)

You've stated, and I believe I just heard you say, that you should never use a stored procedure.

Slide 90 — 27:32 (watch)

Those were your words, not mine.

Slide 91 — 27:38 (watch)

If not, let me put it another way.

Slide 92 — 27:48 (watch)

Never use a stored procedure on a database that lacks an in-memory representation that you can run, test, and tear down afterward. This is an important qualifier that few would dispute. I would advocate for this principle immediately, and I believe everyone would agree with it.

Slide 93 — 27:56 (watch)

That's what I want to emphasize. It's a significant point.

Slide 94 — 28:18 (watch)

Technology influences patterns significantly. The choice of programming languages, frameworks, and architectural decisions tends to favor certain styles of solutions. While this can enhance functionality when integrating large amounts of data into a database, it can also lead to complexities that complicate development. A recommendation to counteract this trend is to consider using flat text files.

Slide 95 — 28:40 (watch)

Let's slow down a bit.

Slide 96 — 28:44 (watch)

I won't go into detail about key-value stores.

Slide 97 — 28:58 (watch)

Who can possibly disagree with that? Embedded key-value stores are a solid choice for everyone. I would recommend avoiding stored procedures in the future; consider moving some of that logic into user code. However, I’m not going to tell anyone to stop using their database.

Slide 98 — 29:20 (watch)

For a new project, I strongly recommend using a simpler data model. You gain more value in correctness from something that can be tested and simulated than from a model that centralizes all its correctness in one place.

Slide 99 — 30:00 (watch)

I am not advocating for the exclusive use of document stores or relational databases. My point is that the more logic you embed in your data store, the more challenges you will face. However, convincing people to change an existing data store is often unrealistic. Focus on testing the components that are feasible to test. If certain parts of your system are written in a way that makes them untestable, and if you are unlikely to migrate away from them, then test what you can. Conduct traditional non-deterministic tests using the old staging database. While this approach may not be ideal or deterministic, it is certainly better than not testing at all or relying solely on manual testing. Aim to test as deterministically as possible, but don't get bogged down by concerns about purity, such as whether a test is deterministic, parameterized, or property-based.

Slide 100 — 30:44 (watch)

Build up towards testing pragmatically. Not every company will have the same capabilities as FoundationDB or TigerBeetle, especially those with existing codebases. We are addressing the challenges associated with tightly coupled implementations within databases that lack an in-memory representation, which makes them difficult to test effectively.

Slide 101 — 31:26 (watch)

When working with a legacy codebase, some components may not be technically difficult to change but can be politically challenging. There’s no technical reason preventing us from swapping a datastore, yet such changes often require a pressing need to be considered. We started with two integrations on the front end and are now moving towards two on the back end. Are there any additional challenges you've encountered while identifying these isolated components or while developing your testing strategy?

Slide 102 — 32:00 (watch)

There can be political challenges when addressing certain parts of the system. In my experience, this resistance often comes from individuals who are more entrenched in the company.

Slide 103 — 32:22 (watch)

I find it effective to speak anonymously with every developer and go above their heads to address concerns. Most developers view certain parts of the codebase as significant sources of bugs; they find them incredibly difficult to change and dislike interacting with them. This situation often slows down their work. It's rare for everyone to be satisfied with the problematic areas of the codebase; typically, they want to move in the opposite direction.

Slide 104 — 33:08 (watch)

Developers often want to take a big hammer and smash the code apart. For any change you propose, you need broad consensus among the developers, which requires explaining the change in terms they care about. If you can't do that, you won't be able to implement the change. Building consensus is crucial, but it doesn't mean you have to accept the dynamics of a large meeting where one developer may dominate the conversation. You can reach out privately via Slack to gauge opinions on specific parts of the system. This approach is reasonable and can help you understand developers' perspectives. Ultimately, if you can't secure buy-in, it's important not to persist unnecessarily. If you find yourself unable to effect change, it may indicate a short engagement with the company. You can assert your position, but changing developers' minds is often challenging.

Slide 105 — 33:46 (watch)

You have to be pragmatic. This reminds me of a software development shop I was familiar with.

Slide 106 — 34:20 (watch)

I wasn't directly involved, but I was very close to a software development shop that had released a product written in Delphi for desktop use. One developer was fully committed to this project, and despite the need for updates, everything was built around a Delphi adapter and library. There were ongoing requests for a rewrite, but this developer consistently rejected them. Ultimately, the company decided to form a separate shadow team to rewrite the entire application in .NET. They changed the Delphi developer's access keys and let the situation unfold, reminiscent of a scene from "Office Space."

Slide 107 — 35:04 (watch)

We changed the keys, believing that the issue would resolve itself. They fixed the glitch, but there are more extreme solutions available if you have the will to pursue them. That's why I'm not a management consultant; I would never recommend such drastic measures against a Delphi developer. I believe there are better approaches, but who am I to judge? It's quite an extreme option, and I generally wouldn't recommend it.

Slide 108 — 35:44 (watch)

I want to walk through this process because many of you may be considering how to apply these concepts to your own projects. We start from the outside in, ensuring we set up the appropriate gateways and identify the islands of determinism, avoiding excessive integration at both the top and bottom levels. Typically, there is someone, likely the person listening, who is most familiar with the techniques, approaches, and scaffolding being built. While you mentioned that you can't always control what happens after you leave, what strategies do you use to set up the team for success as you transition out? I assume you don't just hit build and then leave, right?

Slide 109 — 36:12 (watch)

Or, do you bike? Ah, car analogies, it's the old Slashdot classic.

Slide 110 — 36:24 (watch)

I try to get by, and you remember Slashdot. Just hearing it mentioned gave me five more gray hairs. Anyway, you were saying.

Slide 111 — 36:38 (watch)

As I develop, I make it a point to share my progress with others. I often say, "Hey, look, I created this test." I explain how this test addresses the bug we've been experiencing.

Slide 112 — 36:52 (watch)

I fix the bug, and the test is no longer needed. I also take a unit test and generalize it.

Slide 113 — 36:58 (watch)

I created a parameterized test, and it fails when the input is NaN, an empty string, negative one, epsilon, negative infinity, or other edge cases.
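As a rough illustration of generalizing a single unit test into a parameterized one, here is a minimal sketch in Python. The function `parse_quantity` and the list of inputs are hypothetical, not taken from any client codebase; the point is only that one property ("reject the input, or return a finite non-negative number") is checked against every edge case.

```python
import math
import sys

def parse_quantity(text: str) -> float:
    """Hypothetical function under test: parse a finite, non-negative quantity."""
    value = float(text)  # raises ValueError on "" or non-numeric text
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"quantity out of range: {text!r}")
    return value

# Edge cases of the kind mentioned above, written as raw user input.
EDGE_CASES = [
    "NaN", "", "-1", str(sys.float_info.epsilon), "-inf", "inf", "  ", "0", "42",
]

def check_case(text: str) -> str:
    """Property: the parser either rejects the input or returns a finite value >= 0."""
    try:
        value = parse_quantity(text)
    except ValueError:
        return "rejected"
    assert math.isfinite(value) and value >= 0, f"bad value {value!r} for {text!r}"
    return "accepted"

def run_all() -> dict:
    """Run the same property over every edge case and collect the outcomes."""
    return {text: check_case(text) for text in EDGE_CASES}

if __name__ == "__main__":
    for text, outcome in run_all().items():
        print(f"{text!r:26} -> {outcome}")
```

A version of `parse_quantity` without the range check would fail the property on "NaN" and "-1", which is the kind of finding that tends to surprise people.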

Slide 114 — 37:10 (watch)

I highlight my findings, and people often express surprise, saying they never thought such issues could occur. I aim to include valuable insights without overwhelming the audience with too much information.

Slide 115 — 37:38 (watch)

I put a lot of effort into making this presentation concise. At the end, I summarize what I’ve done and how I recommend building things moving forward. I hope the examples I’ve shown, the results I’ve delivered, and the written material resonate with people to some extent. For some teams, these ideas will stick significantly, while for others, they may not. This largely depends on whether they choose to incorporate these practices into their culture. While I can present strong arguments and gain temporary buy-in, I cannot enforce these practices as the standard moving forward.

Slide 116 — 38:10 (watch)

This may sound like a cop-out answer, but I believe it's essential to ensure that the path to water is clear. We should leave signs indicating the direction to the water.

Slide 117 — 38:38 (watch)

People often struggle with creating and renaming files, so they tend to add onto existing work instead. This leads to a situation where we end up with an implementation file that accumulates business logic. I've created a clear path for them to follow, making it easier for others to contribute.

Lewis, do you notice that when people begin using these techniques, which are often new to their projects, they adapt relatively quickly?

Slide 118 — 39:40 (watch)

Do you find that it's not hard to identify unusual cases quickly, and that the payoff comes relatively soon? Some people perceive property-based testing as complex due to the number of syllables in the term. They might think, "Oh, that sounds complicated," and associate it with other well-known concepts like deterministic simulation testing, which they feel requires unnecessary effort. However, do you find that the barrier to entry is actually lower? Many of your examples involve NaNs and unusual values, which are classic elements of randomized testing. This reminds me of the first episode of the podcast, where Dave Scherer, who was building the simulator at FoundationDB, discussed the process.

Slide 120 — 41:00 (watch)

I implemented a simple queue, but it immediately failed.

Slide 121 — 41:08 (watch)

I am fully committed to randomized testing. I consider these principles in my development process.

Slide 122 — 41:16 (watch)

My solution failed immediately. I don't believe many people can write code that works perfectly on the first attempt.

Slide 123 — 41:26 (watch)

The state space of all the possible things that can go wrong is essentially a matter of basic combinatorics, to paraphrase John Cena.

Slide 124 — 41:34 (watch)

Things can go wrong easily due to the numerous combinations of potential issues. The impact becomes apparent very quickly.
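To make the queue anecdote concrete, here is a minimal sketch of randomized, model-based testing in Python. The `RingQueue` class is a hypothetical stand-in for "a simple queue"; it is not the queue discussed in the episode. Each trial applies a seeded random sequence of operations to the queue and to a trusted model (`collections.deque`) and compares the results, so any failure can be replayed from its seed.

```python
import random
from collections import deque

class RingQueue:
    """Hypothetical fixed-capacity FIFO queue backed by a circular buffer."""

    def __init__(self, capacity: int) -> None:
        self._items = [None] * capacity
        self._head = 0   # index of the oldest item
        self._size = 0   # number of items currently stored

    def push(self, item) -> bool:
        if self._size == len(self._items):
            return False  # full: reject instead of overwriting
        tail = (self._head + self._size) % len(self._items)
        self._items[tail] = item
        self._size += 1
        return True

    def pop(self):
        if self._size == 0:
            return None  # empty
        item = self._items[self._head]
        self._head = (self._head + 1) % len(self._items)
        self._size -= 1
        return item

def run_trial(seed: int, steps: int = 200, capacity: int = 4) -> None:
    """Drive the queue and a trusted model with the same seeded random operations."""
    rng = random.Random(seed)
    queue, model = RingQueue(capacity), deque()
    for step in range(steps):
        if rng.random() < 0.6:
            item = rng.randint(0, 99)
            accepted = queue.push(item)
            expected = len(model) < capacity
            if expected:
                model.append(item)
            assert accepted == expected, f"seed={seed} step={step}: push mismatch"
        else:
            expected = model.popleft() if model else None
            assert queue.pop() == expected, f"seed={seed} step={step}: pop mismatch"

def run_many(trials: int = 500) -> int:
    """Run one trial per seed; an assertion message names the seed to replay."""
    for seed in range(trials):
        run_trial(seed)
    return trials
```

A small capacity is a deliberate choice here: it forces the full and wrap-around cases to occur within a few steps, which is where off-by-one mistakes in a circular buffer usually hide.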

Slide 125 — 41:52 (watch)

I strongly advocate for this approach because it doesn't require extensive time to identify small, elusive bugs. User-facing software is inherently full of bugs—every piece of software we create is flawed before testing. I can only speak for myself, but everything I produce initially contains bugs.

Slide 126 — 42:14 (watch)

I am always surprised by what the tools find. I've never found it difficult to identify bugs in either my own code or others'. As a user, I can confirm that many applications are broken. My wife often asks why they didn't get it right the first time, expressing frustration with almost everything she interacts with. It's essential to simulate your users, as they will ultimately discover the issues.

Slide 127 — 42:34 (watch)

We need to establish a common language within the broader community focused on rigorous testing. You mentioned the multi-syllable property-based testing approach.

Slide 128 — 42:44 (watch)

Can we refer to it as randomized testing? We might need to improve our marketing. "DST" sounds better than deterministic simulation testing.

Slide 129 — 42:56 (watch)

Can we refer to it as simulating your customers? I'm not sure if that will catch on; perhaps we need some marketing. Do you encounter pushback when introducing randomization? Some people might say that no one will enter NaNs, reflecting a sense of unrealism around the concept.

Slide 130 — 43:24 (watch)

Yes, if we establish type expectations, we can redefine the process. We start from the edges, identify our deterministic islets, and apply schemas to define our expectations in a verifiable, static manner. However, it’s important to note that this approach has its limits. When we begin to test these expectations, people often question the scenarios we present. For example, borrowing from one of your blog posts about shopping carts, they might ask, "Who would enter an item with a negative price? That doesn’t make any sense."

Slide 131 — 43:50 (watch)

Do you receive that feedback in the real world, and how do you respond?

Sometimes. Most people are experienced enough to have encountered the user, or the bug report that arises from an angry user email, highlighting the unexpected behavior that they initially questioned.
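The shopping-cart question can be sketched in code. The example below is a minimal illustration in Python, with hypothetical names (`LineItem`, `parse_line_item`, `cart_total`) and made-up input values; it is not taken from the blog post mentioned above. A boundary function turns untrusted input into a typed value or rejects it, and a seeded random generator then throws implausible inputs at it, including negative prices.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class LineItem:
    name: str
    price_cents: int
    quantity: int

def parse_line_item(raw: dict) -> LineItem:
    """Boundary check: turn untrusted input into a LineItem or raise ValueError."""
    name, price, qty = raw.get("name"), raw.get("price_cents"), raw.get("quantity")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    for label, value in (("price_cents", price), ("quantity", qty)):
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{label} must be an integer")
    if price < 0:
        raise ValueError("price_cents must not be negative")
    if qty < 1:
        raise ValueError("quantity must be at least 1")
    return LineItem(name, price, qty)

def cart_total(items: list) -> int:
    """Deterministic core: only ever sees values that passed the boundary."""
    return sum(item.price_cents * item.quantity for item in items)

def random_raw_item(rng: random.Random) -> dict:
    """Generate plausible and implausible inputs, including negative prices."""
    return {
        "name": rng.choice(["widget", "", "   ", None, "gadget"]),
        "price_cents": rng.choice([-500, -1, 0, 1, 999, 2.5, "10", None, True]),
        "quantity": rng.choice([-3, 0, 1, 2, 1000, None, "1"]),
    }

def check_cart_property(seed: int, attempts: int = 1000) -> int:
    """Property: whatever gets past the boundary never produces a negative total."""
    rng = random.Random(seed)
    cart = []
    for _ in range(attempts):
        try:
            cart.append(parse_line_item(random_raw_item(rng)))
        except ValueError:
            continue  # rejected at the edge, never reaches the core
        assert cart_total(cart) >= 0
    return len(cart)
```

The answer to "who would enter a negative price?" then stops mattering: the generator does, on every run, and the boundary either rejects it or the property fails.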

Slide 132 — 44:02 (watch)

People who are further removed from the customer tend to think that way.

Slide 133 — 44:12 (watch)

It's easy to connect with those who have closely interacted with customers.

Slide 134 — 44:22 (watch)

I previously worked at a warehouse where I developed and maintained the software used in both the warehouse and a small factory. My office was located upstairs, while the software operated on all the machines downstairs.

Slide 135 — 44:28 (watch)

If there was a problem, people would come upstairs and say, "Hey, there's a bug." Often, they did something that should never have been done.

Slide 136 — 44:36 (watch)

These individuals work night shifts and face pressure from supervisors, along with various workplace dramas.

Slide 137 — 45:04 (watch)

People who have worked closely with customers are not surprised by their behavior, while those further removed may find it surprising. It's important to treat users as a source of inputs; anything you allow them to do will eventually be done. You should communicate that their actions may not always make sense, as humans are fallible.

Slide 138 — 45:28 (watch)

Thank you. Yes, "fallible" means capable of making mistakes.

Slide 139 — 45:34 (watch)

Emphasize that people are fallible, including themselves, and that they must take action. Everyone makes mistakes.

Slide 140 — 45:44 (watch)

People may have concerns that influence their actions. This highlights the inverse challenge of example-based testing. If you don't consider these factors, you may overlook important aspects of how your software is used.

Slide 141 — 46:14 (watch)

If you have any expectations about how your software will be used, consider that users may have different intentions. This aligns with Hyrum's Law: any behavior you expose through an API, if it has enough users, will become a dependency for someone, even if it is not explicitly stated in the contract. Therefore, we should implement methods of randomization and generalization to manage behaviors we want to restrict, preventing them from becoming unmanageable. Otherwise, giving the Internet a text box can lead to unpredictable outcomes.

Slide 142 — 46:48 (watch)

You need to be prepared for it. This requires a mindset shift: understanding what we, as software developers, can control and recognizing that there are aspects of the world beyond our influence.

Slide 143 — 47:08 (watch)

It's simply a source of data that we cannot control, and we need to approach it with that mindset, even regarding external dependencies. Many APIs have their own contracts, but we cannot control if they go down, if they take too long to respond, or if they change their schema. It's essential to fit your software into the context of the real world. Today, many software developers are very...
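One way to treat an external dependency as an uncontrolled data source is to replace it in tests with a fake whose failures come from a seeded random number generator. The sketch below is a hedged illustration in Python; `FakeInventoryApi`, `available_quantity`, the failure rates, and the payload fields are all hypothetical. It covers the three cases named above: the service is down, it is slow, or its schema has changed.

```python
import random

class FakeInventoryApi:
    """Hypothetical stand-in for an external service; failures come from a seeded RNG."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def stock_level(self, sku: str) -> dict:
        roll = self._rng.random()
        if roll < 0.2:
            raise ConnectionError("service is down")
        if roll < 0.4:
            raise TimeoutError("service took too long")
        if roll < 0.6:
            return {"sku": sku, "qty": "7"}  # schema drift: renamed field, string value
        return {"sku": sku, "quantity": 7}

def available_quantity(api, sku: str, retries: int = 3):
    """Return the stock level, or None when the dependency cannot be trusted right now."""
    for _ in range(retries):
        try:
            payload = api.stock_level(sku)
        except (ConnectionError, TimeoutError):
            continue  # outage or slow response: try again
        quantity = payload.get("quantity")
        if isinstance(quantity, int) and not isinstance(quantity, bool) and quantity >= 0:
            return quantity
        return None  # unexpected schema: refuse to guess
    return None  # every attempt failed

def simulate(seed: int, calls: int = 100) -> list:
    """Replay the same sequence of outages and schema changes for a given seed."""
    api = FakeInventoryApi(seed)
    return [available_quantity(api, "sku-1") for _ in range(calls)]
```

Because the fake is seeded, a run that exposes a bad code path can be repeated exactly, which is the property that makes this kind of simulation deterministic.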

Slide 144 — 47:22 (watch)

I don't know.

Slide 145 — 47:34 (watch)

In an enterprise software environment, when a bug is reported, it typically goes through multiple levels of support, from the level one help desk to level two and level three, before becoming a ticket for the project team. This process can involve many steps before it reaches the programmer, making it feel quite remote. It's important to convey to stakeholders that you're essentially creating a small input interface—a box or panel—where users can submit information. However, you have no control over what goes into that panel or portal.

Slide 146 — 48:00 (watch)

Anything that can be let into the system will eventually find its way in, especially if the product is successful and attracts enough users. This mindset can be more comfortable for people, even if they are not particularly customer-focused, as it frames the situation as an engineering challenge. The question then becomes how to manage the various inputs that enter the system. There are different ways to communicate this concept depending on the audience's position within the organization, their experience, and their mindset.

Slide 147 — 48:30 (watch)

In general, people want systems to be more reliable because it reduces pain points. This desire for reliability is often an easy sell. Many techniques exist that, while named differently, essentially aim to separate determinism from nondeterminism. This approach is widely accepted as beneficial.

Slide 148 — 48:50 (watch)

I hope that our community can develop effective metaphors and straightforward terminology to make these concepts more accessible. Not everyone will share our enthusiasm for testing, so it's important to communicate these ideas in a way that resonates with a broader audience.

Slide 149 — 49:02 (watch)

I hope we can start expanding these ideas in ways that appeal to a broader audience. Perhaps I'm just being selfish in wanting to make my job easier.

Slide 150 — 49:14 (watch)

I believe this approach would likely make our jobs easier. I often use curl as an example because it is known for being exceptionally reliable. While not the most distributed, it is certainly solid.

Slide 151 — 49:48 (watch)

If your system were as solid as curl, how would that impact your development practices? What changes would you make? Consider how much of your current process is focused on defense and safety due to a lack of confidence in your system. As you improve your testing practices, remember that it's not just about writing countless tests manually. You could also leverage an LLM to assist in this process.

Slide 153 — 51:02 (watch)

I use this great website every day.

Slide 154 — 51:12 (watch)

It took a million lines of code to achieve that. This raises a question for me that I wasn't sure how to ask, but you've provided the perfect opportunity.

Slide 155 — 51:46 (watch)

Many teams share interesting reliability stories about their efforts to ensure system availability and reliability, often encountering unexpected challenges. At Google Cloud, we discussed the need for shark-proof cages around undersea cables because the wrong color attracted sharks. In another instance, Angus the sheep chewed through the wire connected to a satellite, disrupting internet access, with the nearest 2G tower being a 20-minute drive away, which was quite inconvenient. This highlights the importance of adopting more resilient practices and tools in system design.

Slide 156 — 52:22 (watch)

This reminds me of a fascinating article about programming from Antarctica, which highlights the unique challenges of dealing with latency. It emphasizes how this environment influences the design of systems, leading to very minimal examples transmitted over the network.

Slide 157 — 52:42 (watch)

You mentioned something interesting at the end. Aside from recommending best practices for application development, you noted that many frameworks and tools developers use address issues of reliability and retries, but they often obscure these solutions from the user.

Slide 158 — 53:10 (watch)

As you walk up the abstraction ladder, you may not fully understand how a particular toolchain operates, but you trust that the framework handles it effectively. The value of abstraction lies in allowing users to utilize a durable execution engine, like Temporal or DBOS, without needing to consider its internal workings. However, you suggested that users should still be aware of certain aspects.

Slide 159 — 53:40 (watch)

I was wondering if you have any guidance on how to balance these two perspectives. There are aspects that can be hidden from the user and aspects that should not be concealed. The specific context of that discussion involves both the system user and the application developer.

Slide 160 — 53:54 (watch)

I encountered that post while working in ag tech, where many users are offline, and there is significant concurrent modification of data.

Slide 161 — 54:02 (watch)

Many vendors address the issue of concurrent data editing in various ways. To illustrate this, we can consider two people editing a Git file on their own local machines.

Slide 162 — 54:22 (watch)

Git provides a commit mechanism. Many systems operate on a "last write wins" principle, where they simply record the time of each commit—like saying, "I committed at 4:02 PM, and you committed at 4:05 PM, so your commit is better," even though you haven't seen my commit.

Slide 163 — 54:42 (watch)

What I intended to convey, though I may not have articulated it clearly, is that if your domain has multiple sources of truth that allow people to edit shared items independently or concurrently, you should never hide conflicts.

Slide 164 — 55:00 (watch)

For simpler data structures, there are methods to deterministically merge them, which relates to the CRDT aspect. However, these methods often require user input if the semantic differences between the edits are significant. Therefore, it is important not to hide conflicts when a machine cannot reliably resolve them.

Slide 165 — 55:14 (watch)

The "last write wins" approach is not a good solution because it arbitrarily discards one copy of the data. I observe this happening far too often in so-called offline systems.
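The difference between hiding and surfacing a conflict can be shown in a few lines. This is a minimal sketch in Python with hypothetical names and made-up field data; it is not how any particular sync product works. `last_write_wins` keeps whichever edit has the later timestamp, while `three_way_merge` compares both edits against their common ancestor and returns an explicit `Conflict` when both sides changed the value differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    value: str
    timestamp: float  # wall-clock time reported by the editing device

@dataclass(frozen=True)
class Conflict:
    base: str
    local: str
    remote: str

def last_write_wins(local: Edit, remote: Edit) -> str:
    """Keeps the edit with the later timestamp and silently drops the other."""
    return local.value if local.timestamp >= remote.timestamp else remote.value

def three_way_merge(base: str, local: str, remote: str):
    """Merge against the common ancestor; return a Conflict when both sides diverged."""
    if local == remote:
        return local    # both sides agree
    if local == base:
        return remote   # only the remote side changed
    if remote == base:
        return local    # only the local side changed
    return Conflict(base, local, remote)  # both changed differently: ask a person

# Two offline edits to the same record, neither having seen the other.
base = "Paddock 4: 120 sheep"
mine = Edit("Paddock 4: 118 sheep", timestamp=16 * 3600 + 2 * 60)                 # 4:02 PM
yours = Edit("Paddock 4: 120 sheep, fence broken", timestamp=16 * 3600 + 5 * 60)  # 4:05 PM

lww_result = last_write_wins(mine, yours)                 # the updated head count is lost
merge_result = three_way_merge(base, mine.value, yours.value)  # a Conflict to resolve
```

With last write wins, the 4:05 PM edit replaces the 4:02 PM one and nobody learns that the head count changed. The three-way merge returns both versions alongside the ancestor, so the application can show them to a person.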

Slide 166 — 55:48 (watch)

Multiple entry points to the state of a system are a critical aspect to manage in your application. You should be cautious about offloading core functionality to components whose semantics you do not fully understand. As you mentioned, the last write wins strategy can be appropriate in certain situations, but it is not universally applicable.

Slide 167 — 56:08 (watch)

It's less about labeling a pattern as always good or bad, and more about ensuring that you keep the important elements front and center within your sphere of control. Avoid pushing away aspects that are core to what you do.

Slide 168 — 56:18 (watch)

Is that fair?

It is fair. We can relate this back to marketing. Instead of thinking of conflicts, perhaps we should consider them as opportunities for consensus.

Slide 169 — 56:30 (watch)

We can view conflicts as opportunities to reach a consensus.

Slide 170 — 56:40 (watch)

In certain systems, conflicts will occur, and they are not something to fear. You cannot ignore them; they can be resolved. It's important to recognize whether your system inherently has conflicts. Sometimes, conflicts are a natural part of your data model.

Slide 171 — 57:04 (watch)

You can't always sweep conflicts under the rug.

We’ll have to invite you back, Lewis, for that distributed systems discussion, as that’s what we’re here for. We’ve run out of time for today, so thank you so much for joining us, Lewis. I really appreciate your time. And to everyone watching, thank you for listening. We hope you have a great day and take care.

Slide 172 — 57:28 (watch)

Thank you for checking out the Bug Bash podcast. If you have an idea for a show or would like to be a guest, please email us at [email protected]. If you prefer chatting, visit antithesis.com and scroll to the bottom to find the link to our Discord.

Slide 173 — 57:50 (watch)

Finally, if you want to connect with others who care about software correctness and reliability, consider attending the Bug Bash conference this year on April 23rd and 24th in Washington, D.C. All the details are available at bugbash.antithesis.com. Until next time.
