110 slides extracted.
Slide 1 — 0:04 (watch)
![]() | I'm Ben Eggers from OpenAI. To start, could I have a volunteer from the front row? |
Slide 2 — 0:10 (watch)
![]() | What's your name? I tend to get nervous when I speak, so if I'm talking too fast, please let me know. Thank you. |
Slide 3 — 0:20 (watch)
![]() | I'm Ben Eggers, and I work at OpenAI. I'm here to explain that nothing has changed in software development. |
Slide 4 — 0:28 (watch)
![]() | Software development remains unchanged. |
Slide 5 — 0:38 (watch)
![]() | Nothing has changed. First, let's conduct a poll. Please raise your hand if you have used a large language model (LLM) to develop software. |
Slide 6 — 0:52 (watch)
![]() | How many people here believe that we experienced a significant leap in model intelligence a few months ago? How many would trust an LLM to develop all of their software? Let's discuss. |
Slide 7 — 1:04 (watch)
Slide 8 — 1:14 (watch)
![]() | This slide features a photo of me speaking at Bug Bash last year, alongside a childhood photo of myself that I shared during the event. |
Slide 9 — 1:26 (watch)
Slide 10 — 1:44 (watch)
Slide 11 — 2:14 (watch)
Slide 12 — 2:26 (watch)
![]() | In part two, we will discuss how agents facilitate the work and influence your thinking about it, but they do not eliminate the need for the work itself. |
Slide 13 — 2:32 (watch)
![]() | All the challenging aspects remain unchanged. |
Slide 14 — 2:44 (watch)
Slide 15 — 2:54 (watch)
![]() | I would like to provide a disclaimer. |
Slide 16 — 2:58 (watch)
Slide 17 — 3:14 (watch)
![]() | We are not discussing web portals and applications, which typically have a broader feature set. In these cases, each feature does not significantly impact the architecture of the rest of the system. |
Slide 18 — 3:22 (watch)
![]() | I cannot definitively say that this does not apply to broad, high surface area systems. My experience with them is limited, so I do not feel qualified to comment extensively. |
Slide 19 — 3:30 (watch)
![]() | I believe I am somewhat qualified to discuss the systems on the left, so that will be our focus. |
Slide 20 — 3:36 (watch)
![]() | Many of you who attended Bug Bash last year might recall that I enjoy games. Last year, we played Guess the Impact. This year, the game is Guess What the Agent Gave Me. |
Slide 21 — 3:48 (watch)
![]() | I will describe a system and share the prompt I used with the agent. Then, you will guess the outcome. |
Slide 22 — 3:56 (watch)
![]() | Let's practice. |
Slide 23 — 4:02 (watch)
Slide 24 — 4:18 (watch)
![]() | Python's `re` module is primarily written in Python. This makes it an interesting problem to explore in the context of autonomous software engineering. |
Slide 25 — 4:26 (watch)
![]() | I built a harness with an essentially empty repository that operated in a while loop. |
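A harness of the kind described here can be sketched as a bounded while loop around an agent call. This is only an illustration: `run_harness`, `fake_agent_step`, the iteration budget, and the goal string are all hypothetical names invented for this sketch, and the talk does not show the actual harness or prompt.

```python
from typing import Callable


def run_harness(
    agent_step: Callable[[str, int], bool],
    goal: str,
    max_iterations: int = 100,
) -> int:
    """Call agent_step in a loop until it reports completion or the budget runs out.

    Returns the number of iterations that actually ran.
    """
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        # A real harness would invoke an agent against the repository here.
        done = agent_step(goal, iterations)
        if done:
            break
    return iterations


def fake_agent_step(goal: str, iteration: int) -> bool:
    # Hypothetical stand-in for an agent call: pretends the work is done on pass 3.
    return iteration >= 3


if __name__ == "__main__":
    print(run_harness(fake_agent_step, "placeholder goal"))
```

The iteration cap is a design choice of the sketch: without it, a loop around an agent that never reports completion would run indefinitely.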
Slide 26 — 4:38 (watch)
Slide 27 — 4:52 (watch)
![]() | What do you think happened here? Any guesses? A Rust wrapper for Python would be even better than what I have. Do we have any other suggestions? |
Slide 28 — 5:12 (watch)
![]() | A guess of one million lines of code is low, but morally correct. The guess that it contains numerous markdown documents is also accurate. |
Slide 29 — 5:26 (watch)
Slide 30 — 5:50 (watch)
Slide 31 — 6:18 (watch)
Slide 32 — 6:36 (watch)
Slide 33 — 6:44 (watch)
![]() | Part one. |
Slide 34 — 6:46 (watch)
![]() | Writing code has always revealed the most challenging aspects of programming. |
Slide 35 — 6:52 (watch)
Slide 36 — 7:10 (watch)
![]() | Code is relatively inexpensive to produce, even at high typing speeds. In fact, code has historically been considered cheap to create. |
Slide 37 — 7:26 (watch)
Slide 38 — 7:50 (watch)
Slide 39 — 8:14 (watch)
Slide 40 — 8:28 (watch)
Slide 41 — 9:12 (watch)
![]() | A human would assess whether something was correct, consider the shape of the problem, and review the API contracts. Filling in the code was often the least interesting part of the process. |
Slide 42 — 9:26 (watch)
Slide 43 — 9:38 (watch)
![]() | The slowness was a critical factor. I asked our new image model to provide an illustration of slowness as a load-bearing concept, and this is the result I received. |
Slide 44 — 9:50 (watch)
Slide 45 — 10:04 (watch)
![]() | When writing your queries, it's crucial to carefully consider how you manage your indices and the efficiency of your scans, especially for those of you interested in databases. |
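As a small illustration of the index-versus-scan point, the sketch below uses Python's built-in `sqlite3` module and `EXPLAIN QUERY PLAN` to compare the plan for the same query before and after an index exists. The `events` table, its columns, and the index name are made up for this example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); only the detail text is kept.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 50, "click") for i in range(1000)],
)

sql = "SELECT kind FROM events WHERE user_id = ?"

# Without an index on user_id, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, (7,))

conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

# With the index in place, the planner can search it instead.
plan_after = query_plan(conn, sql, (7,))

if __name__ == "__main__":
    print("before:", plan_before)
    print("after: ", plan_after)
```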
Slide 46 — 10:22 (watch)
Slide 47 — 11:00 (watch)
Slide 48 — 11:14 (watch)
Slide 49 — 11:58 (watch)
Slide 50 — 12:26 (watch)
Slide 51 — 13:14 (watch)
Slide 52 — 13:22 (watch)
![]() | In summary, the old coding loop compelled design thinking. |
Slide 53 — 13:34 (watch)
Slide 54 — 13:58 (watch)
Slide 55 — 14:16 (watch)
Slide 56 — 14:50 (watch)
Slide 57 — 15:08 (watch)
Slide 58 — 15:16 (watch)
Slide 59 — 15:38 (watch)
Slide 60 — 15:44 (watch)
![]() | In comparing GPT 4.1 to GPT 4.6, it's clear that 4.1 performs significantly worse on all benchmarks. |
Slide 61 — 15:54 (watch)
Slide 62 — 16:12 (watch)
Slide 63 — 16:22 (watch)
![]() | I feel that a model is often better at my job than I am, and this change occurred about three months ago. |
Slide 64 — 16:42 (watch)
Slide 65 — 16:54 (watch)
Slide 66 — 17:10 (watch)
![]() | I need to consider what the desired outcome looks like and how to achieve it effectively. Therefore, it's essential to make decisions first. |
Slide 67 — 17:14 (watch)
![]() | You define the desired behavior changes, identify what must continue functioning, and specify the trade-offs you are willing to accept. |
Slide 68 — 17:26 (watch)
Slide 69 — 17:50 (watch)
Slide 70 — 18:02 (watch)
![]() | The design slide is possibly the most important slide in this entire presentation, as it reflects what I have learned about developing with agents. |
Slide 71 — 18:10 (watch)
Slide 72 — 18:20 (watch)
![]() | I have coworkers who write all of their tests by hand, which I find excessive. I believe that's the most challenging aspect of software engineering. |
Slide 73 — 18:32 (watch)
![]() | However, I know people who strongly advocate for it. |
Slide 74 — 18:36 (watch)
![]() | There is an interesting perspective that unit testing may be becoming obsolete. We've observed instances where AI-generated unit tests assert unusual values and provide little meaningful information. |
Slide 75 — 18:46 (watch)
Slide 76 — 19:02 (watch)
Slide 77 — 19:16 (watch)
![]() | If you have a provable correctness harness, the aesthetics of the code become irrelevant. You specify which data models are immutable and provide a comprehensive test harness. |
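One way to picture this pairing of immutable data models with a test harness is sketched below. It is an assumption-laden toy, not the speaker's setup: `Interval`, `merge_intervals`, and `check_merge` are hypothetical names, and the harness uses randomized checks against a brute-force oracle rather than a formal proof.

```python
import random
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Interval:
    """Immutable data model: fields cannot be reassigned after construction."""
    start: int
    end: int  # inclusive


def merge_intervals(intervals):
    """Implementation under test (imagine it was written by an agent)."""
    merged = []
    for iv in sorted(intervals, key=lambda i: (i.start, i.end)):
        if merged and iv.start <= merged[-1].end + 1:
            last = merged[-1]
            merged[-1] = Interval(last.start, max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged


def covered_points(intervals):
    """Brute-force oracle: the set of integers covered by the intervals."""
    return {p for iv in intervals for p in range(iv.start, iv.end + 1)}


def check_merge(trials: int = 500, seed: int = 0) -> bool:
    """Randomized harness: checks invariants without looking at code style."""
    rng = random.Random(seed)
    for _ in range(trials):
        intervals = []
        for _ in range(rng.randint(0, 6)):
            start = rng.randint(0, 20)
            intervals.append(Interval(start, start + rng.randint(0, 5)))
        result = merge_intervals(intervals)
        # Invariant 1: merging must not add or lose any covered point.
        if covered_points(result) != covered_points(intervals):
            return False
        # Invariant 2: output is sorted with a real gap between neighbours.
        for a, b in zip(result, result[1:]):
            if not a.end + 1 < b.start:
                return False
    return True


def rejects_mutation(iv: Interval) -> bool:
    """True if the data model refuses in-place modification."""
    try:
        iv.start = -1
    except FrozenInstanceError:
        return True
    return False
```

The harness only inspects behavior, so any implementation of `merge_intervals` that satisfies the two invariants passes, however it is written.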
Slide 78 — 19:28 (watch)
Slide 79 — 19:46 (watch)
![]() | This is the only effective approach to AI-driven, test-driven development, particularly concerning unit tests. |
Slide 80 — 19:50 (watch)
![]() | In summary, here is a checklist for managing your AI agents. |
Slide 81 — 20:02 (watch)
Slide 82 — 20:20 (watch)
![]() | That brings us to our final game, Contrafact. I developed a practice journal to track jazz practice. |
Slide 83 — 20:28 (watch)
![]() | The specifics of the app do not matter much. What matters is that it is not a complex system, but a relatively shallow one with many screens. |
Slide 84 — 20:44 (watch)
Slide 85 — 21:40 (watch)
Slide 86 — 21:56 (watch)
Slide 87 — 22:12 (watch)
![]() | I completely unraveled the situation because it was problematic. It's crucial to pay attention to your data models. |
Slide 88 — 22:20 (watch)
![]() | I outlined what I intended to convey, and now I've shared that information with you. |
Slide 89 — 22:24 (watch)
![]() | Writing code used to require significant design thinking that often occurred implicitly. |
Slide 90 — 22:32 (watch)
Slide 91 — 22:44 (watch)
![]() | Software engineering is fundamentally unchanged. We prioritize correctness, and demonstrating that correctness is a fundamentally stochastic process, though it has become faster in some respects. |
Slide 92 — 22:52 (watch)
![]() | Code has become inexpensive, but ensuring its correctness remains a challenge. |
Slide 93 — 22:54 (watch)
![]() | Thank you. |
Slide 94 — 23:24 (watch)
![]() | I ran a bit over time, but I believe I still have a few minutes for any thoughts, feelings, or questions. |
Slide 95 — 23:38 (watch)
![]() | I have not seen any programmatic enforcement of restrictions on modifying schemas or other elements, and I have seen such restrictions fail when they are enforced only at the prompt level. |
Slide 96 — 23:46 (watch)
![]() | I would like to see a product, open-source library, or VS Code extension that allows you to specify which files the model can or cannot access. This type of guardrail is likely to emerge soon. |
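A guardrail like the one wished for here could look roughly like the following. This is a hypothetical sketch, not an existing product or extension: `EditPolicy`, its glob patterns, and the example paths are all invented, and a real tool would need to hook into whatever the agent uses to write files.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


class EditPolicy:
    """Decides whether an agent may modify a repository-relative path.

    Deny patterns win over allow patterns; anything unmatched is denied.
    """

    def __init__(self, allow, deny):
        self.allow = list(allow)
        self.deny = list(deny)

    def can_edit(self, path: str) -> bool:
        p = PurePosixPath(path)
        # Refuse absolute paths and ".." segments so patterns cannot be bypassed.
        if p.is_absolute() or ".." in p.parts:
            return False
        name = p.as_posix()
        if any(fnmatchcase(name, pattern) for pattern in self.deny):
            return False
        return any(fnmatchcase(name, pattern) for pattern in self.allow)


# Hypothetical policy: source and tests are editable, data models and SQL are not.
policy = EditPolicy(
    allow=["src/*", "tests/*"],
    deny=["src/models/*", "*.sql"],
)

if __name__ == "__main__":
    for candidate in ["src/api/handlers.py", "src/models/user.py", "migrations/001.sql"]:
        print(candidate, policy.can_edit(candidate))
```

Note that `fnmatchcase` lets `*` match across `/`, so `src/*` covers nested directories in this sketch.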
Slide 97 — 24:04 (watch)
![]() | As an inexperienced engineer, you might wonder how to transition to a tech lead role, where the responsibilities differ significantly from those of junior engineers. This is an important question. |
Slide 98 — 24:16 (watch)
![]() | That is one of the key questions in the industry right now. |
Slide 99 — 24:24 (watch)
Slide 100 — 24:34 (watch)
![]() | You can begin by defining what you want to build. |
Slide 101 — 24:40 (watch)
![]() | Consider the structure of your database and identify the core entities that are important to you. |
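To make the idea of starting from core entities concrete, here is a minimal sketch using `sqlite3`. The entities are purely hypothetical, loosely inspired by the practice journal mentioned earlier: the `tune` and `practice_session` tables, their columns, and the helper functions are assumptions for illustration, not the schema of the actual app.

```python
import sqlite3

# Hypothetical core entities: a tune, and sessions spent practicing it.
SCHEMA = """
CREATE TABLE tune (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
CREATE TABLE practice_session (
    id INTEGER PRIMARY KEY,
    tune_id INTEGER NOT NULL REFERENCES tune(id),
    practiced_on TEXT NOT NULL,
    minutes INTEGER NOT NULL CHECK (minutes > 0)
);
"""


def open_journal() -> sqlite3.Connection:
    """Create an in-memory database with the schema and foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def total_minutes(conn: sqlite3.Connection, title: str) -> int:
    """Sum the minutes practiced for one tune; 0 if there are no sessions."""
    row = conn.execute(
        "SELECT COALESCE(SUM(s.minutes), 0) "
        "FROM practice_session s JOIN tune t ON t.id = s.tune_id "
        "WHERE t.title = ?",
        (title,),
    ).fetchone()
    return row[0]
```

Putting the constraints in the schema means that whatever code is generated on top of it cannot store a session for a missing tune or a non-positive duration.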
Slide 102 — 24:46 (watch)
Slide 103 — 25:08 (watch)
![]() | People can learn to do this directly. Thank you. I have a quick question. Even if a pattern is abstracted, do you quantify how familiar it is to a model before asking it a question? |
Slide 104 — 25:16 (watch)
![]() | For a novel concept, you would develop more architecture around it, whereas for something that is relatively standard, you would do less. |
Slide 105 — 25:20 (watch)
![]() | That's a good question. |
Slide 106 — 25:24 (watch)
![]() | There are many aspects of the software world that I am not familiar with. |
Slide 107 — 25:30 (watch)
![]() | Many patterns and practices exist that I am unaware of in places I have never visited. |
Slide 108 — 25:44 (watch)
Slide 109 — 25:52 (watch)
Slide 110 — 26:10 (watch)
![]() | I make a concerted effort to establish all the core patterns myself and understand their underlying reasons, even when a model is generating the content. Thank you. |