Slide 1 — 0:08 (watch)

Slide 1Welcome to the Bug Bash podcast, where we discuss software correctness and reliability. I’m your host, David Nguyen.

Slide 2 — 0:36 (watch)

Slide 2Everyone wants reliable software, but few want to test messy legacy code. Today, Lewis Campbell from OutData joins the show to share a practical approach for implementing deterministic simulation testing in existing systems. We discuss why React components are unsuitable for business logic and how front-end static typing can prevent nondeterminism. We also address non-technical aspects, such as the politics of updating old codebases, the risks of concealing data conflicts, and strategies for introducing your team to property-based testing. There's even an internet-hungry sheep named Angus, but we'll get to that later. Stay tuned.

Slide 3 — 1:02 (watch)

Slide 3Before we begin, I want to mention the Bug Bash conference, taking place on April 23rd and 24th in Washington, DC. If you're interested in connecting with others who care about software correctness and reliability, I encourage you to attend. You can find all the details at bugbash.antithesis.com.

Slide 4 — 1:14 (watch)

Slide 4Welcome, Lewis. Could you please introduce yourself?

Slide 5 — 1:20 (watch)

Slide 5Hello, my name is Lewis Campbell. I run a small consulting company called Outdata.

Slide 6 — 1:38 (watch)

Slide 6My company, Outdata, focuses on advanced testing in systems and products, particularly SaaS products with pre-existing codebases that aren't designed for advanced testing. I find that this approach significantly helps clients, especially when they struggle with the velocity of feature development. I'm excited to discuss how to test in pre-existing systems that may not be inherently suited for it.

Slide 7 — 1:54 (watch)

Slide 7I believe the tagline I read stated that you were one of the first consultants to apply deterministic simulation testing to legacy codebases. Is that correct?

Slide 8 — 2:14 (watch)

Slide 8To be fair, the portion where I applied deterministic simulation testing was a new component intended to integrate into an existing system. As far as I know, I am the first consultant to do this. While Antithesis implements it on a larger scale, I am not aware of another consultant who has claimed this before. I coined the term myself, and while I didn't conduct an extensive check, I believe it to be accurate.

Slide 9 — 2:30 (watch)

Slide 9We didn't want to spread that across the internet and investigate potential failure conditions for that property.

Slide 10 — 2:52 (watch)

Slide 10Exactly. Let's start at the beginning. How do you get into these kinds of engagements? Developer velocity and unlocking features are common challenges that teams face. What prompts companies to bring you in specifically? How do you approach the situation from day one?

Slide 11 — 3:08 (watch)

Slide 11We recognize that we have a problem with complex, tangled code. The challenge is to address this issue effectively. My approach is to connect with individuals who may not be particularly interested in testing, especially advanced testing, and to engage the larger audience that simply wants their software to be more reliable and to instill confidence in its performance.

Slide 12 — 3:32 (watch)

Slide 12To begin, you need to find a thread to pull. In the case of a web application, I typically start with the incoming inputs. Most web apps I encounter do not even parse or validate these inputs. Rather than starting with testing, I set that aside at first and think about feedback.

Slide 13 — 4:02 (watch)

Slide 13The first step I take is to implement feedback mechanisms that involve parsing and validating incoming data so that it has a static type. While I believe static typing is one of the least useful forms of feedback, it is the fastest and easiest to implement. Therefore, I often begin with statically typing the inputs, focusing on these straightforward elements.

Slide 14 — 4:24 (watch)

Slide 14When assessing the situation, focus on the architecture or system rather than the team or individuals involved. You might encounter someone on one side who appears hopeful, while on the other side, the team may seem frustrated, feeling that their time is being wasted.

Slide 15 — 4:46 (watch)

Slide 15People in denial typically do not engage with me, as my marketing is not effective in attracting such clients. They usually recognize they have some kind of issue, so there is minimal pushback. They often agree that my suggestions are good ideas, but they may lack the time, focus, or resources to act on them. Most developers are aware that something is wrong; if their velocity is slow, they understand the implications.

Slide 16 — 5:30 (watch)

Slide 16I have a unique ability to detect when nothing is happening. As we dive into a new project, whether internally or externally, we approach it as if we're entering a fresh environment. The first step is to establish low-pass guardrails and implement some form of static typing. What does this look like in practice? Are we focusing on back-end validation for incoming data, or are we addressing the front-end? There are multiple ways to achieve this, so what are your thoughts?

Slide 17 — 5:54 (watch)

Slide 17A common approach involves having a SaaS application with a back-end, typically a relational database. The back-end understands various aspects of the data being sent to the front-end, including foreign key constraints, and it constructs a well-defined object from this data. The back-end possesses extensive knowledge about the data.

Slide 18 — 6:34 (watch)

Slide 18You pass the data to the front-end, which interprets it as JSON. To enhance specificity, I create a repository containing schemas that both the back-end and front-end can access. This practice is surprisingly rare. It's essential to validate what the back-end sends against a schema and what the front-end receives against a schema. The data brought in by the front-end, whether for a web app or a mobile app, often introduces significant non-determinism. Web apps and mobile apps are not just floating across a sea of JSON; they are tightly coupled to the back-end. I make this coupling explicit by applying a schema to the input. It's better to display an error page than to present a page full of errors. If the data doesn't match our expectations, we show an error page instead of allowing silent errors, which many developers tend to do. Thus, you identify the primary source of non-determinism and implement the simplest solution to provide feedback to developers, typically through various schemas that align with your static typing system.
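
A minimal sketch of the boundary check described above, written by hand so that it runs without dependencies. The names `UserProfile`, `parseUserProfile`, and `renderProfilePage` are hypothetical and not from the conversation; in a real project the schema would likely live in a shared package and be expressed with a schema library rather than hand-written checks.

```typescript
// Hypothetical shared schema module, importable by both back-end and front-end.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// The static type the rest of the front-end is allowed to rely on.
interface UserProfile {
  id: number;
  email: string;
  teamId: number | null; // a foreign key the back-end already knows about
}

function isRecord(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

// Parse untrusted input into a UserProfile, or explain why it does not fit.
function parseUserProfile(input: unknown): Result<UserProfile> {
  if (!isRecord(input)) return { ok: false, error: "expected an object" };
  const { id, email, teamId } = input;
  if (typeof id !== "number" || !Number.isInteger(id)) {
    return { ok: false, error: "id must be an integer" };
  }
  if (typeof email !== "string" || !email.includes("@")) {
    return { ok: false, error: "email must be a string containing @" };
  }
  let team: number | null;
  if (teamId === null) {
    team = null;
  } else if (typeof teamId === "number" && Number.isInteger(teamId)) {
    team = teamId;
  } else {
    return { ok: false, error: "teamId must be an integer or null" };
  }
  return { ok: true, value: { id, email, teamId: team } };
}

// Front-end entry point: one error page instead of a page full of errors.
function renderProfilePage(responseBody: string): string {
  let json: unknown;
  try {
    json = JSON.parse(responseBody);
  } catch {
    return "Error page: response was not valid JSON";
  }
  const parsed = parseUserProfile(json);
  if (!parsed.ok) return `Error page: ${parsed.error}`;
  const team = parsed.value.teamId ?? "none";
  return `Profile for ${parsed.value.email} (team ${team})`;
}
```

The same `parseUserProfile` function could be called by the back-end just before it sends a response, so both sides fail loudly against one definition.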

Slide 19 — 7:20 (watch)

Slide 19What do we mean by schema? I can generate a schema file and feel proud of it, but then I often find myself unsure of its purpose.

Slide 20 — 7:40 (watch)

Slide 20For JavaScript, I recommend using Superstruct or Zod. The challenge with receiving data from external sources is that its static type is inherently unknown. We can discuss static typing extensively, but since everything must interact with the outside world, that world is not aware of your type system.

Slide 21 — 8:02 (watch)

Slide 21We are creating a bridge so that incoming data conforms to the world defined by your static type system. This involves designing strictly shaped holes instead of wide openings that can accept anything. While the wide opening still exists, we place a cookie cutter shape behind it to enforce our expectations. At the front entrance, think of it like a bouncer checking who is allowed in.

Slide 22 — 8:22 (watch)

Slide 22You have an entry point into your application, so it's essential to know what data is coming in. A blob of JSON can propagate throughout the entire system.

Slide 23 — 8:36 (watch)

Slide 23This situation occurs frequently. In TypeScript, you may find the same issue appearing everywhere in your code. This is often the case in projects that lack tests. I'm not advocating for static typing as the most crucial aspect, but if your project has no tests, you should start with something that provides feedback. Implementing static typing is one of the easiest and quickest ways to begin improving your codebase.

Slide 24 — 9:08 (watch)

Slide 24We described setting up schemas between the front end and the back end as the easiest way to understand where determinism might exist. My understanding is that you view determinism differently than most people. Typically, when people think of determinism, they envision perfect reproduction in every sense. However, you take a more pragmatic approach to determinism.

Slide 25 — 9:50 (watch)

Slide 25My definition of determinism is similar, but I recognize that for existing projects, achieving a perfect deterministic core, like what Will Wilson discussed about FoundationDB or what TigerBeetle does today, isn't feasible. However, you can have segments of code that are deterministic, which I believe is still a significant advantage. I view it as finding your "islands of determinism," which is crucial in my opinion.

Slide 26 — 10:04 (watch)

Slide 26You can reason about those small sections of code, which clarifies the overall structure significantly. These sections can expand and eventually integrate with one another.

Slide 27 — 10:16 (watch)

Slide 27I prefer to think about it in terms of identifying smaller components rather than rewriting everything into one large core, as that approach is often impractical. The question then becomes: how do you find those smaller components? What should you search for? TSX and React are good starting points.

Slide 28 — 10:24 (watch)

Slide 28I'm going to focus on React, as it remains one of the most popular frameworks.

Slide 29 — 10:30 (watch)

Slide 29In a React project, you'll find numerous TSX files, with most of the code contained within these files.

Slide 30 — 10:34 (watch)

Slide 30A component is an element that renders to the screen and often performs fetch calls.

Slide 31 — 10:46 (watch)

Slide 31Often, the business logic resides within these components. I started with WinForms, where we were advised not to write our business logic in the code behind the form. However, it seems that web developers have overlooked this practice.

Slide 32 — 10:54 (watch)

Slide 32The islands of determinism are often found in your top-level, front-facing components.

Slide 33 — 11:00 (watch)

Slide 33Remove them from the component, as they are difficult to test and create complications.

Slide 34 — 11:08 (watch)

Slide 34Remove those elements, and as you do, you'll often discover repeated business logic or opportunities to group related components together.

Slide 35 — 11:14 (watch)

Slide 35I would start with that approach on a typical React project.

Slide 36 — 11:20 (watch)

Slide 36Does it have a particular look or feel, or is there a signature style to it?

Slide 37 — 11:52 (watch)

Slide 37The Clojure community often emphasizes the importance of pushing interfaces, particularly those with Java and other foreign function interfaces (FFIs), to the edges of your program. They advocate for maintaining a core that is as pure and lispy as possible, consisting of pure data that is also pure code, which ensures determinism. In contrast, when faced with a complex and tangled codebase, such as a "giant pile of spaghetti" or a "ball of mud," it is advisable to start from the outside and work inward, untangling the first elements you encounter. There are likely multiple entry points for this process.

Slide 38 — 12:18 (watch)

Slide 38I'm trying to determine where to start when I encounter a different application. Should I begin at the index?

Slide 39 — 12:32 (watch)

Slide 39If you have a significant issue, such as a large bug log with multiple bugs, it's important to make a good first impression and address the most pressing concerns. Start by identifying where the bugs are occurring. As you resolve one issue, you will likely uncover additional problems. You mentioned the Clojure community.

Slide 40 — 12:50 (watch)

Slide 40I am not a Clojurist, but I believe I understand your point and I agree with it. In a preexisting codebase that hasn't been designed with those principles in mind, you have to start somewhere. Often, you'll encounter several small cores. Additionally, computing tends to be very tribal.

Slide 41 — 13:10 (watch)

Slide 41Software development is indeed very tribal. The distinction between determinism and non-determinism has been rediscovered by various groups that often do not communicate with one another. This concept can be illustrated by the functional core and imperative shell model. The IO monad essentially embodies this idea.

Slide 42 — 13:20 (watch)

Slide 42I could be mistaken, and I'm sure Haskellers will correct me if I am.

Slide 43 — 13:24 (watch)

Slide 43We also have the hexagonal architecture, created by Alistair Cockburn. He is a signatory of the Agile Manifesto, but like everyone, he has made mistakes.

Slide 44 — 13:34 (watch)

Slide 44Don't judge him for that. However, he did create the hexagonal architecture, also known as ports and adapters. This architecture places your business logic in a single deterministic component, with ports and adapters representing user inputs, incoming requests, and interactions with a database.

Slide 45 — 13:48 (watch)

Slide 45Dependency injection is often utilized for this mechanism. The concept of separating determinism from non-determinism exists across various programming cultures and is generally regarded as a good practice. It tends to be familiar to most people, provided you communicate using their terminology.
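
A small sketch of ports and adapters with dependency injection, under the assumption that the non-deterministic inputs are the current time and ID generation. The invoice example and every name in it (`Clock`, `IdSource`, `issueInvoice`) are hypothetical illustrations, not something described in the conversation.

```typescript
// Ports: everything non-deterministic the core needs, expressed as interfaces.
interface Clock {
  now(): number; // milliseconds since the Unix epoch
}
interface IdSource {
  nextId(): string;
}

interface Invoice {
  id: string;
  customer: string;
  issuedAt: number;
  dueAt: number;
}

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Core: deterministic for a given pair of ports.
function issueInvoice(customer: string, clock: Clock, ids: IdSource): Invoice {
  const issuedAt = clock.now();
  return { id: ids.nextId(), customer, issuedAt, dueAt: issuedAt + THIRTY_DAYS_MS };
}

// Production adapters: the only place real time and randomness enter.
const systemClock: Clock = { now: () => Date.now() };
const randomIds: IdSource = { nextId: () => Math.random().toString(36).slice(2) };

// Test adapters: fully controlled, so every run gives the same invoice.
function fixedClock(time: number): Clock {
  return { now: () => time };
}
function sequentialIds(prefix: string): IdSource {
  let counter = 0;
  return { nextId: () => `${prefix}-${++counter}` };
}

// Same core, two wirings.
const productionInvoice = issueInvoice("acme", systemClock, randomIds);
const testInvoice = issueInvoice("acme", fixedClock(0), sequentialIds("inv"));
```

The core function never calls `Date.now()` or `Math.random()` itself, which is what makes it an island that can be tested repeatably.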

Slide 46 — 14:04 (watch)

Slide 46There is likely a neighboring abstraction we can borrow from, as starting with "find all the monads" is probably not the right approach.

Slide 47 — 14:14 (watch)

Slide 47We probably can't start with finding all the monads, but I think most people understand that. I'll continue to focus on React.

Slide 48 — 14:22 (watch)

Slide 48Most people understand that React components are difficult to test.

Slide 49 — 14:30 (watch)

Slide 49A good starting point is to emphasize that you can create TypeScript or JavaScript files that do not contain a component. These files can focus solely on business logic, making them much easier to test. As a result, our components can become very simple.
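
To make this concrete, here is a sketch of what such a component-free file might contain. The pricing example, the file name, and all function names are hypothetical; the point is only that nothing in the file imports React or performs a fetch.

```typescript
// pricing.ts (hypothetical file name): no React imports, no fetch calls.
interface LineItem {
  unitPriceCents: number;
  quantity: number;
}

// Sum of price times quantity, in integer cents to avoid float drift.
function subtotalCents(items: readonly LineItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}

// Apply a percentage discount, clamped to the range 0..100.
function applyDiscount(subtotal: number, percentOff: number): number {
  const clamped = Math.min(100, Math.max(0, percentOff));
  return Math.round((subtotal * (100 - clamped)) / 100);
}

function formatCents(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// The only thing a component would need to call.
function cartSummary(items: readonly LineItem[], percentOff: number): string {
  return formatCents(applyDiscount(subtotalCents(items), percentOff));
}

// A component then shrinks to fetching data and rendering this string.
const exampleSummary = cartSummary(
  [
    { unitPriceCents: 1000, quantity: 2 },
    { unitPriceCents: 250, quantity: 1 },
  ],
  10,
);
```

Because each function depends only on its arguments, it can be tested without rendering anything.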

Slide 50 — 14:40 (watch)

Slide 50Those files hold the business logic, written so that testing is straightforward. Most people intuitively understand this concept.

Slide 51 — 15:00 (watch)

Slide 51You need to use the right terminology for different programmers. Keep the simple aspects straightforward while making the complex parts as simple as possible. The first step is to separate your core business logic from your front-facing components. This separation allows you to identify where the business logic resides. Now that we have the pieces separated and new files created with easily testable components, we are in a good position.

Slide 52 — 15:36 (watch)

Slide 52Our business logic is now well-structured. What comes next? How do we begin enhancing this? There are various techniques available, and while listeners may be familiar with many of them, the challenge lies in selecting the appropriate technique for a legacy codebase. It's not always a straightforward choice between property-based testing, deterministic simulation testing, or introducing randomness into unit tests. Sometimes, you may need to scale back from an ideal state you wish to achieve, as there can be blockers such as dependencies that complicate the process.

Slide 53 — 16:06 (watch)

Slide 53Which approach do you take? I prefer to build from the bottom up. Small unit tests serve as sanity checks. Remember, don't let perfect be the enemy of good.

Slide 54 — 16:22 (watch)

Slide 54Little unit tests in the deterministic parts of the code serve as useful sanity checks, but I don't mistake them for anything too robust. It's fairly easy to generalize these into property-based testing.

Slide 55 — 16:38 (watch)

Slide 55Once you have a core large enough, deterministic simulation testing is not something only systems programmers or database experts can perform. I see a hierarchy that includes unit and integration testing, although I often struggle to define the boundary between the two.

Slide 56 — 16:56 (watch)

Slide 56A unit test can be example-based, parameterized, or randomized. From there, you can work towards deterministic simulation testing. I always start small to get results on the page quickly, so I begin from the bottom up.

Slide 57 — 17:08 (watch)

Slide 57To break down the problem, consider a typical component.

Slide 58 — 17:20 (watch)

Slide 58Let's walk through how we would build from there. The codebase is otherwise messy, but we have cleared out the clutter around this one component.

Slide 59 — 17:30 (watch)

Slide 59Now we're focusing on the piece we've created, our little island of determinism. What is the first step we take from this foundation?

Slide 60 — 17:48 (watch)

Slide 60Yes, the island is a great concept. When I refer to islands, I mean something like a benevolent bacteria that clusters together. Perhaps that's not the best metaphor, but it conveys the idea. The first thing to consider is whether this could be a fungus of determinism that spreads and eventually constricts everything. I'm not a great marketer, but I believe user stories are important. For example, you have a scenario where a user clicks a button and a specific action occurs.

Slide 61 — 18:06 (watch)

Slide 61People will intuitively understand this concept. It involves considering actions at the component level. For example, when a user clicks a button, we need to determine what will happen next.

Slide 62 — 18:18 (watch)

Slide 62You should consider various scenarios of user interactions with the component and determine the expected outcomes. Ensure that these outcomes make sense to you.

Slide 63 — 18:24 (watch)

Slide 63Once you're finished, you can start randomizing the inputs. This is similar to the concept of a thousand monkeys at a thousand typewriters.

Slide 64 — 18:34 (watch)

Slide 64You can't predict user behavior; they will always surprise you. This unpredictability is what property-based testing and simulation are all about: it's like simulating the monkeys at the typewriters.
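
One way to picture the monkeys at the typewriters is a seeded generator firing random user actions at a component's state logic and checking an invariant after every step. The quantity stepper, its actions, and the invariant below are hypothetical; the seeded generator is the well-known mulberry32 routine, chosen here only so that a failing run can be replayed from its seed.

```typescript
// Seeded PRNG (mulberry32): the same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical component logic: a quantity stepper, extracted from its UI.
type Action =
  | { type: "increment" }
  | { type: "decrement" }
  | { type: "reset" }
  | { type: "setQuantity"; value: number };

const MAX_QUANTITY = 10;

function reduce(quantity: number, action: Action): number {
  switch (action.type) {
    case "increment":
      return Math.min(MAX_QUANTITY, quantity + 1);
    case "decrement":
      return Math.max(0, quantity - 1);
    case "reset":
      return 0;
    case "setQuantity": {
      const v = action.value;
      return Number.isInteger(v) && v >= 0 && v <= MAX_QUANTITY ? v : quantity;
    }
  }
}

// Values a user (or a bug elsewhere) might plausibly feed in.
const NASTY_VALUES = [NaN, -1, 0, 5, MAX_QUANTITY, MAX_QUANTITY + 1, 2.5, Infinity];

function randomAction(rand: () => number): Action {
  switch (Math.floor(rand() * 4)) {
    case 0:
      return { type: "increment" };
    case 1:
      return { type: "decrement" };
    case 2:
      return { type: "reset" };
    default:
      return { type: "setQuantity", value: NASTY_VALUES[Math.floor(rand() * NASTY_VALUES.length)] };
  }
}

// Invariant: the quantity is always an integer between 0 and MAX_QUANTITY.
function invariantHolds(quantity: number): boolean {
  return Number.isInteger(quantity) && quantity >= 0 && quantity <= MAX_QUANTITY;
}

// Run one random session; report the seed and step if the invariant breaks.
function simulate(seed: number, steps: number): { finalQuantity: number; failure: string | null } {
  const rand = mulberry32(seed);
  let quantity = 0;
  for (let step = 0; step < steps; step++) {
    quantity = reduce(quantity, randomAction(rand));
    if (!invariantHolds(quantity)) {
      return { finalQuantity: quantity, failure: `seed ${seed} broke the invariant at step ${step}` };
    }
  }
  return { finalQuantity: quantity, failure: null };
}
```

Calling `simulate` over a range of seeds gives many distinct sessions, and any reported failure can be reproduced exactly by rerunning that one seed.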

Slide 65 — 18:52 (watch)

Slide 65You can think of examples that the machine can generate, which you might not have considered initially. When working with a new team and unfamiliar codebase, do you find it necessary to rely on occasional detractors? While you may not actively seek out those who are disengaged, have you encountered situations where they remain present, and you might eventually move on?

Slide 66 — 19:12 (watch)

Slide 66To convey what is being built, I must acknowledge that I cannot control everything, especially what happens after I leave. My role is to guide others, but ultimately, I can only lead them to the water.

Slide 67 — 19:32 (watch)

Slide 67I want people to see the value in what I'm building. While it may feel overgrown with weeds, I believe that what I create lasts longer than my presence. Regarding pushback, I have encountered some resistance. In the book "Working Effectively with Legacy Code," the author, Michael Feathers, discusses the necessity of changing existing code to enable testing.

Slide 68 — 19:54 (watch)

Slide 68While it's important to make these changes as minimally invasive as possible, they are often unavoidable. Everyone has sections of their codebase that they hesitate to modify, but sometimes, those areas must be addressed.

Slide 69 — 20:06 (watch)

Slide 69You will receive feedback, but ultimately, you cannot ignore it indefinitely.

Slide 70 — 20:20 (watch)

Slide 70Eventually, you will need to confront the issues in your code. Ignoring them until the entire project requires a rewrite is likely to be much more disruptive than making small adjustments along the way. While I do receive some pushback and cannot control the outcomes, I believe that most people appreciate having feedback about the code.

Slide 71 — 20:28 (watch)

Slide 71I don't receive much pushback.

Slide 72 — 20:38 (watch)

Slide 72What I'm trying to convey is that many teams find themselves in a spaghetti code situation for various reasons. This could be due to expedience or the influence of their chosen LLM, among other factors.

Slide 73 — 20:56 (watch)

Slide 73These ideas may seem obvious in retrospect, but not long ago, they weren't always clear from the other side of the table. I'm exploring the perspective of someone who might not be fully engaged, perhaps with one eyebrow raised, looking at you skeptically while taking notes.

Slide 74 — 21:16 (watch)

Slide 74Those situations can be challenging. Consider someone who has one eyebrow raised, looking at you with a notepad, as if to say, "What do you mean?"

Slide 75 — 21:26 (watch)

Slide 75Consider thinking about their first experience.

Slide 76 — 21:38 (watch)

Slide 76An example of a real-world case for property-based testing could involve walking through their first property. Many resources on the internet discuss property-based testing, but most focus on sorting lists, which is useful.

Slide 77 — 22:02 (watch)

Slide 77Sorting lists is important, but it's not our primary focus here. I'm confident the sort function is adequate, and if there are issues, I'll leave that to those with more expertise. Instead, I need to concentrate on defining the property I want to test. Property-based testing originates from the functional programming world, which has its own terminology and concepts.

Slide 78 — 22:40 (watch)

Slide 78I present property-based testing as a way to parameterize unit tests or generalize them. While some programmers may understand concepts like associativity and commutativity, many do not. However, most people grasp the idea of unit tests. I emphasize that instead of using a single example, we can parameterize the test to cover multiple examples. I liken a unit test to a constant and a property-based test to a function. It's crucial to use the right language, as many people do not fully understand property-based testing.
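
The constant-versus-function framing can be sketched as follows. The function under test (`normalizeTags`), the property chosen, and the random input generator are all hypothetical, and the hand-rolled loop stands in for what a property-based testing library would normally provide.

```typescript
// Hypothetical business logic: trim, lowercase, drop empties, dedupe, sort.
function normalizeTags(tags: readonly string[]): string[] {
  const cleaned = tags.map((t) => t.trim().toLowerCase()).filter((t) => t.length > 0);
  return Array.from(new Set(cleaned)).sort();
}

// A unit test is like a constant: one fixed input, one fixed expectation.
function exampleTestPasses(): boolean {
  return JSON.stringify(normalizeTags([" B", "a", "b "])) === JSON.stringify(["a", "b"]);
}

// A property-based test is like a function: it takes ANY input and says
// whether the expectation holds for it.
function propertyHolds(input: readonly string[]): boolean {
  const once = normalizeTags(input);
  // Strictly increasing means sorted with no duplicates.
  const sortedAndUnique = once.every((tag, i) => i === 0 || once[i - 1] < tag);
  // Normalizing an already normalized list changes nothing.
  const idempotent = JSON.stringify(normalizeTags(once)) === JSON.stringify(once);
  return sortedAndUnique && idempotent;
}

// Seeded PRNG (mulberry32) so a counterexample can be reproduced.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generate small lists of short strings from an ASCII alphabet with spaces.
function randomTags(rand: () => number): string[] {
  const alphabet = [" ", "a", "A", "b", "B", "c"];
  const tags: string[] = [];
  const count = Math.floor(rand() * 6);
  for (let i = 0; i < count; i++) {
    let tag = "";
    const length = Math.floor(rand() * 4);
    for (let j = 0; j < length; j++) tag += alphabet[Math.floor(rand() * alphabet.length)];
    tags.push(tag);
  }
  return tags;
}

// Let the machine try many inputs; return the first one that fails, if any.
function findCounterexample(seed: number, runs: number): string[] | null {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomTags(rand);
    if (!propertyHolds(input)) return input;
  }
  return null;
}
```

The example test checks one case; `findCounterexample(1, 1000)` checks a thousand generated cases against the same expectation.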

Slide 79 — 23:14 (watch)

Slide 79Once you explain property-based testing in terms they understand and demonstrate how many tests the machine can perform, people are usually receptive. I haven't encountered anyone who has been overtly resistant to the concept, although they may have reservations that I am unaware of.

Slide 80 — 23:32 (watch)

Slide 80I haven't encountered anyone who has been difficult to my face. I try to meet people where they are, considering their diverse experiences, preferences, and biases regarding programming.

Slide 81 — 24:00 (watch)

Slide 81It's very important to explain concepts in terms that resonate with the audience, and this approach has proven effective. We have examined the codebase, starting from the outside and working our way in, focusing on decoupling external components and identifying islands of determinism near the entry point. We began with schemas, leaving the door open while ensuring we placed a cookie cutter inside to maintain the correct shape.

Slide 82 — 24:24 (watch)

Slide 82We used various unit and integration tests. For more details, you can refer to Lewis's blog post on the topic.

Slide 83 — 25:00 (watch)

Slide 83We took their existing units and started parameterizing them to get them up and running, demonstrating what you could obtain for free. I often joke that many of us are fascinated with technology because the machine does the work, making it easy for most people to lean into that impulse. However, the machine should indeed be doing the work at this point, which helps facilitate the process. What are the real-world challenges you've encountered? You've outlined a clear pipeline that involves generalizing step by step. In theory, everything should work perfectly, but I'm curious if anything has gone wrong.

Slide 84 — 25:36 (watch)

Slide 84Have there been any challenges along that path? Yes, I understand your point. In my experience, determining the right course of action has become fairly straightforward over the years.

Slide 85 — 25:46 (watch)

Slide 85Sometimes, there is so little substance that it becomes very challenging to extract something deterministic.

Slide 86 — 26:02 (watch)

Slide 86This may be a controversial point, but when developers place a lot of logic in their database, the backend often becomes just thin API routes around database calls. This setup can be quite challenging to work with.

Slide 87 — 26:40 (watch)

Slide 87Unless you have something like SQLite, which allows for easy creation of an in-memory version of your database with your schema, placing a lot of logic in your database makes reliable testing very difficult. This issue is particularly prevalent in the backend, where significant processing occurs within the database. While it may be tempting to keep logic close to the data, there are trade-offs to consider. It's much easier to mock a simpler data model or a database with minimal business logic than to mock a complex PostgreSQL system filled with stored procedures. Therefore, this is a significant challenge, and politically, it may be necessary to consider changing your database.

Slide 88 — 27:16 (watch)

Slide 88Changing your database may not make you popular. It’s not exactly a way to make friends and influence people, so you need to pick your battles wisely. However, we are discussing correctness here.

Slide 89 — 27:26 (watch)

Slide 89You've stated, and I believe I just heard you say, that you should never use a stored procedure.

Slide 90 — 27:32 (watch)

Slide 90Those were your words, not mine.

Slide 91 — 27:38 (watch)

Slide 91If not, let me put it another way.

Slide 92 — 27:48 (watch)

Slide 92Never use a stored procedure on a database that lacks an in-memory representation that you can run, test, and tear down afterward. This is an important qualifier that few would dispute. I would advocate for this principle immediately, and I believe everyone would agree with it.

Slide 93 — 27:56 (watch)

Slide 93That's what I want to emphasize. It's a significant point.

Slide 94 — 28:18 (watch)

Slide 94Technology influences patterns significantly. The choice of programming languages, frameworks, and architectural decisions tends to favor certain styles of solutions. While this can enhance functionality when integrating large amounts of data into a database, it can also lead to complexities that complicate development. A recommendation to counteract this trend is to consider using flat text files.

Slide 95 — 28:40 (watch)

Slide 95Let's slow down a bit.

Slide 96 — 28:44 (watch)

Slide 96I won't go into detail about key-value stores.

Slide 97 — 28:58 (watch)

Slide 97Who can possibly disagree with that? Embedded key-value stores are a solid choice for everyone. I would recommend avoiding stored procedures in the future; consider moving some of that logic into user code. However, I'm not going to tell anyone to stop using their database.
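
As an illustration of moving logic into user code, here is a sketch in which a rule that might otherwise sit in a stored procedure is written as an ordinary function against a narrow store interface. The balance-transfer rule, the `BalanceStore` interface, and the in-memory implementation are all hypothetical, and the sketch assumes a synchronous store for brevity; a real database adapter would be asynchronous and would need to wrap the two writes in a transaction.

```typescript
// Port: the only operations the business rule needs from a data store.
interface BalanceStore {
  get(account: string): number | undefined;
  set(account: string, cents: number): void;
}

// In-memory adapter: cheap to create, run against, and throw away in tests.
class InMemoryBalanceStore implements BalanceStore {
  private readonly balances = new Map<string, number>();
  get(account: string): number | undefined {
    return this.balances.get(account);
  }
  set(account: string, cents: number): void {
    this.balances.set(account, cents);
  }
}

type TransferResult = { ok: true } | { ok: false; reason: string };

// The rule itself lives in user code, so it is testable without a database.
function transfer(store: BalanceStore, from: string, to: string, cents: number): TransferResult {
  if (!Number.isInteger(cents) || cents <= 0) {
    return { ok: false, reason: "amount must be a positive integer" };
  }
  if (from === to) return { ok: false, reason: "accounts must differ" };
  const fromBalance = store.get(from);
  const toBalance = store.get(to);
  if (fromBalance === undefined || toBalance === undefined) {
    return { ok: false, reason: "unknown account" };
  }
  if (fromBalance < cents) return { ok: false, reason: "insufficient funds" };
  store.set(from, fromBalance - cents);
  store.set(to, toBalance + cents);
  return { ok: true };
}

// Each test builds its own store, so tests cannot interfere with each other.
function freshStore(): BalanceStore {
  const store = new InMemoryBalanceStore();
  store.set("alice", 500);
  store.set("bob", 100);
  return store;
}
```

With the rule outside the database, the store behind `BalanceStore` can be swapped for an in-memory one in tests while production keeps its existing database.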

Slide 98 — 29:20 (watch)

Slide 98For a new project, I strongly recommend using a simpler data model. You gain more value in correctness from something that can be tested and simulated than from a model that centralizes all its correctness in one place.

Slide 99 — 30:00 (watch)

Slide 99I am not advocating for the exclusive use of document stores or relational databases. My point is that the more logic you embed in your data store, the more challenges you will face. However, convincing people to change an existing data store is often unrealistic. Focus on testing the components that are feasible to test. If certain parts of your system are written in a way that makes them untestable, and if you are unlikely to migrate away from them, then test what you can. Conduct traditional non-deterministic tests using the old staging database. While this approach may not be ideal or deterministic, it is certainly better than not testing at all or relying solely on manual testing. Aim to test as deterministically as possible, but don't get bogged down by concerns about purity, such as whether a test is deterministic, parameterized, or property-based.

Slide 100 — 30:44 (watch)

Slide 100Build up towards testing pragmatically. Not every company will have the same capabilities as FoundationDB or TigerBeetle, especially those with existing codebases. We are addressing the challenges associated with tightly coupled implementations within databases that lack an in-memory representation, which makes them difficult to test effectively.

Slide 101 — 31:26 (watch)

Slide 101When working with a legacy codebase, some components may not be technically difficult to change but can be politically challenging. There's no technical reason preventing us from swapping a datastore, yet such changes often require a pressing need to be considered. We started with too much integration on the front end and are now looking at too much integration on the back end. Are there any additional challenges you've encountered while identifying these isolated components or while developing your testing strategy?

Slide 102 — 32:00 (watch)

Slide 102There can be political challenges when addressing certain parts of the system. In my experience, this resistance often comes from individuals who are more entrenched in the company.

Slide 103 — 32:22 (watch)

Slide 103I find it effective to speak anonymously with every developer and go above their heads to address concerns. Most developers view certain parts of the codebase as significant sources of bugs; they find them incredibly difficult to change and dislike interacting with them. This situation often slows down their work. It's rare for everyone to be satisfied with the problematic areas of the codebase; typically, they want to move in the opposite direction.

Slide 104 — 33:08 (watch)

Slide 104Developers often want to take a big hammer and smash the code apart. For any change you propose, you need broad consensus among the developers, which requires explaining the change in terms they care about. If you can't do that, you won't be able to implement the change. Building consensus is crucial, but it doesn't mean you have to accept the dynamics of a large meeting where one developer may dominate the conversation. You can reach out privately via Slack to gauge opinions on specific parts of the system. This approach is reasonable and can help you understand developers' perspectives. Ultimately, if you can't secure buy-in, it's important not to persist unnecessarily. If you find yourself unable to effect change, it may indicate a short engagement with the company. You can assert your position, but changing developers' minds is often challenging.

Slide 105 — 33:46 (watch)

Slide 105You have to be pragmatic. This reminds me of a software development shop I was familiar with.

Slide 106 — 34:20 (watch)

Slide 106I wasn't directly involved, but I was very close to a software development shop that had released a product written in Delphi for desktop use. One developer was fully committed to this project, and despite the need for updates, everything was built around a Delphi adapter and library. There were ongoing requests for a rewrite, but this developer consistently rejected them. Ultimately, the company decided to form a separate shadow team to rewrite the entire application in .NET. They changed the Delphi developer's access keys and let the situation unfold, reminiscent of a scene from "Office Space."

Slide 107 — 35:04 (watch)

Slide 107We changed the keys, believing that the issue would resolve itself. They fixed the glitch, but there are more extreme solutions available if you have the will to pursue them. That's why I'm not a management consultant; I would never recommend such drastic measures against a Delphi developer. I believe there are better approaches, but who am I to judge? It's quite an extreme option, and I generally wouldn't recommend it.

Slide 108 — 35:44 (watch)

Slide 108I want to walk through this process because many of you may be considering how to apply these concepts to your own projects. We start from the outside in, ensuring we set up the appropriate gateways and identify the islands of determinism, avoiding excessive integration at both the top and bottom levels. Typically, there is someone, likely the person listening, who is most familiar with the techniques, approaches, and scaffolding being built. While you mentioned that you can't always control what happens after you leave, what strategies do you use to set up the team for success as you transition out? I assume you don't just hit build and then leave, right?

Slide 109 — 36:12 (watch)

Slide 109Or, do you bike? Ah, car analogies, it's the old Slashdot classic.

Slide 110 — 36:24 (watch)

Slide 110I try to get by, and you remember Slashdot. Just hearing it mentioned gave me five more gray hairs. Anyway, you were saying.

Slide 111 — 36:38 (watch)

Slide 111As I develop, I make it a point to share my progress with others. I often say, "Hey, look, I created this test." I explain how this test addresses the bug we've been experiencing.

Slide 112 — 36:52 (watch)

Slide 112I fix the bug, and the test is no longer needed. I also take a unit test and generalize it.

Slide 113 — 36:58 (watch)

Slide 113I created a parameterized test, and it fails when the input is NaN, an empty string, negative one, epsilon, negative infinity, or other edge cases.
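A minimal sketch of the kind of generalized test described here. The function `parse_quantity` and its rules are hypothetical, invented only for illustration; the list of edge-case inputs is the one named in the conversation.

```python
import math
import sys


def parse_quantity(raw):
    """Hypothetical function under test: turn user input into a positive, finite number."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        raise ValueError(f"not a number: {raw!r}")
    if not math.isfinite(value):
        raise ValueError(f"not finite: {raw!r}")
    if value <= 0:
        raise ValueError(f"must be positive: {raw!r}")
    return value


# The edge cases named in the discussion, kept as one table of inputs.
EDGE_CASES = [float("nan"), "", -1, sys.float_info.epsilon, float("-inf")]


def check_edge_cases():
    """Run the same check for every input: accept with a sane value, or reject cleanly."""
    results = {}
    for raw in EDGE_CASES:
        try:
            value = parse_quantity(raw)
            assert math.isfinite(value) and value > 0
            results[repr(raw)] = "accepted"
        except ValueError:
            results[repr(raw)] = "rejected"
    return results
```

The point of the table is that one test body covers every input, so adding a newly discovered edge case is a one-line change. With a test runner such as pytest, the same table could be passed to `pytest.mark.parametrize` instead of a hand-written loop.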

Slide 114 — 37:10 (watch)

Slide 114I highlight my findings, and people often express surprise, saying they never thought such issues could occur. I aim to include valuable insights without overwhelming the audience with too much information.

Slide 115 — 37:38 (watch)

Slide 115I put a lot of effort into making this presentation concise. At the end, I summarize what I’ve done and how I recommend building things moving forward. I hope the examples I’ve shown, the results I’ve delivered, and the written material resonate with people to some extent. For some teams, these ideas will stick significantly, while for others, they may not. This largely depends on whether they choose to incorporate these practices into their culture. While I can present strong arguments and gain temporary buy-in, I cannot enforce these practices as the standard moving forward.

Slide 116 — 38:10 (watch)

Slide 116This may sound like a cop-out answer, but I believe it's essential to ensure that the path to water is clear. We should leave signs indicating the direction to the water.

Slide 117 — 38:38 (watch)

Slide 117People often struggle with creating and renaming files, so they tend to add onto existing work instead. This leads to a situation where we end up with an implementation file that accumulates business logic. I've created a clear path for them to follow, making it easier for others to contribute. Lewis, do you notice that when people begin using these techniques, which are often new to their projects, they adapt relatively quickly?

Slide 118 — 39:40 (watch)

Slide 118Do you find that it's not hard to identify unusual cases quickly, and that the payoff comes relatively soon? Some people perceive property-based testing as complex due to the number of syllables in the term. They might think, "Oh, that sounds complicated," and associate it with other well-known concepts like deterministic simulation testing, which they feel requires unnecessary effort. However, do you find that the barrier to entry is actually lower? Many of your examples involve NaNs and unusual values, which are classic elements of randomized testing. This reminds me of the first episode of the podcast, where Dave Scherer, who was building the simulator at FoundationDB, discussed the process.

Slide 120 — 41:00 (watch)

Slide 120I implemented a simple queue, but it immediately failed.

Slide 121 — 41:08 (watch)

Slide 121I am fully committed to randomized testing. I consider these principles in my development process.

Slide 122 — 41:16 (watch)

Slide 122My solution failed immediately. I don't believe many people can write code that works perfectly on the first attempt.

Slide 123 — 41:26 (watch)

Slide 123The state space of all the possible things that can go wrong is essentially a matter of basic combinatorics, to paraphrase John Cena.

Slide 124 — 41:34 (watch)

Slide 124Things can go wrong easily due to the numerous combinations of potential issues. The impact becomes apparent very quickly.
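To make the queue anecdote concrete, here is a rough sketch of a randomized test that drives a queue and a trusted model with the same operations. `RingQueue` is a hypothetical implementation written for this example, not the code discussed in the episode, and the operation mix and capacity are arbitrary assumptions.

```python
import random
from collections import deque


class RingQueue:
    """Hypothetical fixed-capacity FIFO queue backed by a circular buffer."""

    def __init__(self, capacity):
        self._items = [None] * capacity
        self._head = 0
        self._size = 0

    def push(self, item):
        if self._size == len(self._items):
            return False  # full: reject instead of overwriting
        tail = (self._head + self._size) % len(self._items)
        self._items[tail] = item
        self._size += 1
        return True

    def pop(self):
        if self._size == 0:
            return None
        item = self._items[self._head]
        self._head = (self._head + 1) % len(self._items)
        self._size -= 1
        return item


def run_random_ops(seed, steps=1000, capacity=4):
    """Apply the same random operations to the queue and to a deque used as the model."""
    rng = random.Random(seed)  # seeded, so any failure replays exactly
    queue, model = RingQueue(capacity), deque()
    for step in range(steps):
        if rng.random() < 0.5:
            item = rng.randint(0, 99)
            accepted = queue.push(item)
            expected = len(model) < capacity
            assert accepted == expected, f"seed={seed} step={step}: push mismatch"
            if expected:
                model.append(item)
        else:
            expected = model.popleft() if model else None
            actual = queue.pop()
            assert actual == expected, f"seed={seed} step={step}: pop mismatch"
    return True
```

A small capacity is a deliberate choice: it forces the full, empty, and wrap-around cases to occur constantly, which is where off-by-one mistakes in this kind of structure tend to hide. Because the seed is printed in every failure message, a failing run can be replayed exactly.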

Slide 125 — 41:52 (watch)

Slide 125I strongly advocate for this approach because it doesn't require extensive time to identify small, elusive bugs. User-facing software is inherently full of bugs—every piece of software we create is flawed before testing. I can only speak for myself, but everything I produce initially contains bugs.

Slide 126 — 42:14 (watch)

Slide 126I am always surprised by what the tools find. I've never found it difficult to identify bugs in either my own code or others'. As a user, I can confirm that many applications are broken. My wife often asks why they didn't get it right the first time, expressing frustration with almost everything she interacts with. It's essential to simulate your users, as they will ultimately discover the issues.

Slide 127 — 42:34 (watch)

Slide 127We need to establish a common language within the broader community focused on rigorous testing. You mentioned the multi-syllable property-based testing approach.

Slide 128 — 42:44 (watch)

Slide 128Can we refer to it as randomized testing? We might need to improve our marketing. "DST" sounds better than deterministic simulation testing.

Slide 129 — 42:56 (watch)

Slide 129Can we refer to it as simulating your customers? I'm not sure if that will catch on; perhaps we need some marketing. Do you encounter pushback when introducing randomization? Some people might say that no one will ever enter NaNs, reflecting a sense of unrealism around the concept.

Slide 130 — 43:24 (watch)

Slide 130Yes, if we establish type expectations, we can redefine the process. We start from the edges, identify our deterministic islets, and apply schemas to define our expectations in a verifiable, static manner. However, it’s important to note that this approach has its limits. When we begin to test these expectations, people often question the scenarios we present. For example, borrowing from one of your blog posts about shopping carts, they might ask, "Who would enter an item with a negative price? That doesn’t make any sense."
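The negative-price question can be sketched as a check at the edge plus a randomized property. Everything named below (`LineItem`, `validate_item`, the integer-cents convention, the list of odd values) is an assumption made for illustration rather than something taken from the blog post mentioned above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """Hypothetical cart item; prices are integer cents to avoid float rounding."""
    name: str
    price_cents: int
    quantity: int


def validate_item(raw):
    """Check untrusted input at the edge; raise ValueError rather than let bad data in."""
    name, price, qty = raw.get("name"), raw.get("price_cents"), raw.get("quantity")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    for label, value in (("price_cents", price), ("quantity", qty)):
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            raise ValueError(f"{label} must be a non-negative integer")
    return LineItem(name, price, qty)


def cart_total(items):
    return sum(item.price_cents * item.quantity for item in items)


def random_raw_item(rng):
    """Generate plausible inputs mixed with the 'nobody would do that' ones."""
    weird = [-1, -500, 0, float("nan"), float("-inf"), "", None, 10**12]

    def pick():
        return rng.choice(weird) if rng.random() < 0.3 else rng.randint(0, 10_000)

    return {"name": rng.choice(["widget", "", None]), "price_cents": pick(), "quantity": pick()}


def check_total_never_negative(seed, runs=500):
    """Property: whatever is thrown at the edge, the cart total is never negative."""
    rng = random.Random(seed)
    for _ in range(runs):
        cart = []
        for _ in range(rng.randint(0, 5)):
            try:
                cart.append(validate_item(random_raw_item(rng)))
            except ValueError:
                pass  # rejected at the edge, never reaches the cart
        assert cart_total(cart) >= 0, f"seed={seed}: negative total"
    return True
```

The schema check answers "who would enter a negative price?" by making the question irrelevant: the input is rejected before business logic sees it, and the property test confirms that no generated input slips through.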

Slide 131 — 43:50 (watch)

Slide 131Do you receive that feedback in the real world, and how do you respond? Sometimes. Most people are experienced enough to have encountered the user, or the bug report that arises from an angry user email, demonstrating exactly the unexpected behavior that they initially questioned.

Slide 132 — 44:02 (watch)

Slide 132People who are further removed from the customer tend to think that way.

Slide 133 — 44:12 (watch)

Slide 133It's easy to connect with those who have closely interacted with customers.

Slide 134 — 44:22 (watch)

Slide 134I previously worked at a warehouse where I developed and maintained the software used in both the warehouse and a small factory. My office was located upstairs, while the software operated on all the machines downstairs.

Slide 135 — 44:28 (watch)

Slide 135If there was a problem, people would come upstairs and say, "Hey, there's a bug." Often, they did something that should never have been done.

Slide 136 — 44:36 (watch)

Slide 136These individuals work night shifts and face pressure from supervisors, along with various workplace dramas.

Slide 137 — 45:04 (watch)

Slide 137People who have worked closely with customers are not surprised by their behavior, while those further removed may find it surprising. It's important to treat users as a source of inputs; anything you allow them to do will eventually be done. You should communicate that their actions may not always make sense, as humans are fallible.

Slide 138 — 45:28 (watch)

Slide 138Thank you. Yes, "fallible" means capable of making mistakes.

Slide 139 — 45:34 (watch)

Slide 139Emphasize that people are fallible, including themselves, and that they must take action. Everyone makes mistakes.

Slide 140 — 45:44 (watch)

Slide 140People may have concerns that influence their actions. This highlights the inverse challenge of example-based testing. If you don't consider these factors, you may overlook important aspects of how your software is used.

Slide 141 — 46:14 (watch)

Slide 141If you have any expectations about how your software will be used, consider that users may have different intentions. This aligns with Hyrum's Law: any behavior you expose through an API, if it has enough users, will become a dependency for someone, even if it is not explicitly stated in the contract. Therefore, we should implement methods of randomization and generalization to manage behaviors we want to restrict, preventing them from becoming unmanageable. Otherwise, giving the Internet a text box can lead to unpredictable outcomes.

Slide 142 — 46:48 (watch)

Slide 142You need to be prepared for it. This requires a mindset shift: understanding what we, as software developers, can control and recognizing that there are aspects of the world beyond our influence.

Slide 143 — 47:08 (watch)

Slide 143It's simply a source of data that we cannot control, and we need to approach it with that mindset, even regarding external dependencies. Many APIs have their own contracts, but we cannot control if they go down, if they take too long to respond, or if they change their schema. It's essential to fit your software into the context of the real world. Today, many software developers are very...
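One way to picture an external dependency as an uncontrolled data source is a simulated API whose outages, timeouts, and schema changes all come from a seeded random number generator. The class `SimulatedPriceApi`, the function `get_price_cents`, the field names, and the fault rates below are all hypothetical choices for this sketch.

```python
import random


class SimulatedPriceApi:
    """Hypothetical stand-in for an external API; every fault comes from a seeded RNG."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def fetch(self, sku):
        roll = self._rng.random()
        if roll < 0.2:
            raise ConnectionError("simulated outage")
        if roll < 0.4:
            raise TimeoutError("simulated slow response")
        if roll < 0.6:
            # simulated schema change: the field was renamed and retyped
            return {"sku": sku, "amount": "12.50"}
        return {"sku": sku, "price_cents": 1250}


def get_price_cents(api, sku, attempts=3):
    """Code under test: returns a price in cents or None, never raises to the caller."""
    for _ in range(attempts):
        try:
            response = api.fetch(sku)
        except (ConnectionError, TimeoutError):
            continue  # retry on outage or timeout
        price = response.get("price_cents")
        if isinstance(price, int) and price >= 0:
            return price
        return None  # unexpected shape: refuse to guess
    return None


def simulate(seed, calls=200):
    """Run many calls against the simulated API; the same seed gives the same run."""
    api = SimulatedPriceApi(seed)
    return [get_price_cents(api, "sku-1") for _ in range(calls)]
```

Because the faults are driven by the seed rather than by the real network, a run that exposes a bug can be repeated exactly, which is the core idea behind deterministic simulation as discussed in this episode.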

Slide 144 — 47:22 (watch)

Slide 144I don't know.

Slide 145 — 47:34 (watch)

Slide 145In an enterprise software environment, when a bug is reported, it typically goes through multiple levels of support, from the level one help desk to level two and level three, before becoming a ticket for the project team. This process can involve many steps before it reaches the programmer, making it feel quite remote. It's important to convey to stakeholders that you're essentially creating a small input interface—a box or panel—where users can submit information. However, you have no control over what goes into that panel or portal.

Slide 146 — 48:00 (watch)

Slide 146Anything that can be let into the system will eventually find its way in, especially if the product is successful and attracts enough users. This mindset can be more comfortable for people, even if they are not particularly customer-focused, as it frames the situation as an engineering challenge. The question then becomes how to manage the various inputs that enter the system. There are different ways to communicate this concept depending on the audience's position within the organization, their experience, and their mindset.

Slide 147 — 48:30 (watch)

Slide 147In general, people want systems to be more reliable because it reduces pain points. This desire for reliability is often an easy sell. Many techniques exist that, while named differently, essentially aim to separate determinism from nondeterminism. This approach is widely accepted as beneficial.

Slide 148 — 48:50 (watch)

Slide 148I hope that our community can develop effective metaphors and straightforward terminology to make these concepts more accessible. Not everyone will share our enthusiasm for testing, so it's important to communicate these ideas in a way that resonates with a broader audience.

Slide 149 — 49:02 (watch)

Slide 149I hope we can start expanding these ideas in ways that appeal to a broader audience. Perhaps I'm just being selfish in wanting to make my job easier.

Slide 150 — 49:14 (watch)

Slide 150I believe this approach would likely make our jobs easier. I often use curl as an example because it is known for being exceptionally reliable. While not the most distributed, it is certainly solid.

Slide 151 — 49:48 (watch)

Slide 151If your system were as solid as curl, how would that impact your development practices? What changes would you make? Consider how much of your current process is focused on defense and safety due to a lack of confidence in your system. As you improve your testing practices, remember that it's not just about writing countless tests manually. You could also leverage an LLM to assist in this process.

Slide 153 — 51:02 (watch)

Slide 153I use this great website every day.

Slide 154 — 51:12 (watch)

Slide 154It took a million lines of code to achieve that. This raises a question for me that I wasn't sure how to ask, but you've provided the perfect opportunity.

Slide 155 — 51:46 (watch)

Slide 155Many teams share interesting reliability stories about their efforts to ensure system availability and reliability, often encountering unexpected challenges. At Google Cloud, we discussed the need for shark-proof cages around undersea cables because the wrong color attracted sharks. In another instance, Angus chewed through the wire connected to a satellite, disrupting internet access, with the nearest 2G tower being a 20-minute drive away, which was quite inconvenient. This highlights the importance of adopting more resilient practices and tools in system design.

Slide 156 — 52:22 (watch)

Slide 156This reminds me of a fascinating article about programming from Antarctica, which highlights the unique challenges of dealing with latency. It emphasizes how this environment influences the design of systems, leading to very minimal examples transmitted over the network.

Slide 157 — 52:42 (watch)

Slide 157You mentioned something interesting at the end. Aside from recommending best practices for application development, you noted that many frameworks and tools developers use address issues of reliability and retries, but they often obscure these solutions from the user.

Slide 158 — 53:10 (watch)

Slide 158As you walk up the abstraction ladder, you may not fully understand how a particular toolchain operates, but you trust that the framework handles it effectively. The value of abstraction lies in allowing users to utilize a durable execution engine, like Temporal or DBOS, without needing to consider its internal workings. However, you suggested that users should still be aware of certain aspects.

Slide 159 — 53:40 (watch)

Slide 159I was wondering if you have any guidance on how to balance these two perspectives. There are aspects that can be hidden from the user and aspects that should not be concealed. The specific context of that discussion involves both the system user and the application developer.

Slide 160 — 53:54 (watch)

Slide 160I encountered that post while working in ag tech, where many users are offline, and there is significant concurrent modification of data.

Slide 161 — 54:02 (watch)

Slide 161Many vendors address the issue of concurrent data editing in various ways. To illustrate this, we can consider two people editing a Git file on their own local machines.

Slide 162 — 54:22 (watch)

Slide 162Git provides a commit mechanism. Many systems operate on a "last write wins" principle, where they simply record the time of each commit—like saying, "I committed at 4:02 PM, and you committed at 4:05 PM, so your commit is better," even though you haven't seen my commit.

Slide 163 — 54:42 (watch)

Slide 163What I intended to convey, though I may not have articulated it clearly, is that if your domain has multiple sources of truth that allow people to edit shared items independently or concurrently, you should never hide conflicts.

Slide 164 — 55:00 (watch)

Slide 164For simpler data structures, there are methods to deterministically merge them, which relates to the CRDT aspect. However, these methods often require user input if the semantic differences between the edits are significant. Therefore, it is important not to hide conflicts when a machine cannot reliably resolve them.

Slide 165 — 55:14 (watch)

Slide 165The "last write wins" approach is not a good solution because it arbitrarily discards one copy of the data. I observe this happening far too often in so-called offline systems.
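The difference between discarding an edit and surfacing a conflict can be sketched as follows. The `Edit` and `Record` types and the version-number scheme are assumptions made for this example; real offline-sync systems differ in how they track what each editor had seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical offline edit: the version it was based on, plus the new value."""
    author: str
    base_version: int
    value: str
    timestamp: float


@dataclass
class Record:
    version: int
    value: str


def apply_last_write_wins(record, edits):
    """Latest timestamp wins; every other edit is silently dropped."""
    winner = max(edits, key=lambda e: e.timestamp)
    return Record(record.version + 1, winner.value), []


def apply_with_conflicts(record, edits):
    """Apply an edit only if it was based on the current version; report the rest."""
    conflicts = []
    for edit in sorted(edits, key=lambda e: e.timestamp):
        if edit.base_version == record.version:
            record = Record(record.version + 1, edit.value)
        else:
            conflicts.append(edit)  # surfaced for a person or a merge rule to resolve
    return record, conflicts
```

With two edits that both started from version 1, the first function keeps only the later one and reports nothing, while the second applies the earlier one and returns the later one as a conflict. Neither outcome is automatically correct, but only the second leaves evidence that a decision still has to be made.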

Slide 166 — 55:48 (watch)

Slide 166Multiple entry points to the state of a system are a critical aspect to manage in your application. You should be cautious about offloading core functionality to components whose semantics you do not fully understand. As you mentioned, the last write wins strategy can be appropriate in certain situations, but it is not universally applicable.

Slide 167 — 56:08 (watch)

Slide 167It's less about labeling a pattern as always good or bad, and more about ensuring that you keep the important elements front and center within your sphere of control. Avoid pushing away aspects that are core to what you do.

Slide 168 — 56:18 (watch)

Slide 168Is that fair? It is fair. We can relate this back to marketing. Instead of thinking of conflicts, perhaps we should consider them as opportunities for consensus.

Slide 169 — 56:30 (watch)

Slide 169We can view conflicts as opportunities to reach a consensus.

Slide 170 — 56:40 (watch)

Slide 170In certain systems, conflicts will occur, and they are not something to fear. You cannot ignore them; they can be resolved. It's important to recognize whether your system inherently has conflicts. Sometimes, conflicts are a natural part of your data model.

Slide 171 — 57:04 (watch)

Slide 171You can't always sweep conflicts under the rug. We’ll have to invite you back, Lewis, for that distributed systems discussion, as that’s what we’re here for. We’ve run out of time for today, so thank you so much for joining us, Lewis. I really appreciate your time. And to everyone watching, thank you for listening. We hope you have a great day and take care.

Slide 172 — 57:28 (watch)

Slide 172Thank you for checking out the Bug Bash podcast. If you have an idea for a show or would like to be a guest, please email us at [email protected]. If you prefer chatting, visit antithesis.com and scroll to the bottom to find the link to our Discord.

Slide 173 — 57:50 (watch)

Slide 173Finally, if you want to connect with others who care about software correctness and reliability, consider attending the Bug Bash conference this year on April 23rd and 24th in Washington, D.C. All the details are available at bugbash.antithesis.com. Until next time.