Slide 1 — 0:16
Slide 2 — 1:54
Slide 3 — 3:48
Slide 4 — 5:58
Slide 5 — 8:06
Slide 6 — 9:20
Slide 7 — 10:02
It's synthetic data, but it is live and changing. I apologize for racing ahead in the demo to show you what is happening behind the scenes.
Slide 8 — 10:42
Slide 9 — 11:34
Slide 10 — 12:44
Slide 11 — 13:10
Slide 12 — 14:36
Slide 13 — 15:44
I did not realize this when I prepared the demo, but we have a column labeled "slop," and it is zero. It will always remain zero. We will run this query a few more times.
Slide 14 — 16:04
Slide 15 — 16:28
Slide 16 — 16:42
Slide 17 — 17:10
Systems that function effectively do so for specific reasons. Their success is not solely due to extensive testing; there must be a foundational rationale for why the system is designed to work.
Slide 18 — 17:38
Slide 19 — 18:28
Slide 20 — 19:30
The reason for this discussion of Materialize is virtual time, which turned out to be unexpectedly familiar to many attendees. The concept was introduced in a 1985 paper by David Jefferson, and we will explore it further.
Slide 21 — 19:44
The paper's abstract identifies two uses for virtual time. The first is discrete-event simulation, which many people use today. The second is concurrency control.
Slide 22 — 19:56
We will use virtual time for concurrency control. I will explain what that means, but in short, it is essential to our approach.
Slide 23 — 20:02
Materialize and its underlying stack function as a large-scale simulator.
Slide 24 — 20:10
It simulates potential computations rather than simply reacting to external stimuli.
Slide 25 — 20:16
It is performing a prescribed sequence of computations.
Slide 26 — 20:20
We previously observed this on the screen.
Slide 27 — 20:30
Slide 28 — 20:40
We write changes down as triples, whose exact form may vary slightly. Here we use (data, time, diff) triples, as displayed on the big screen.
Slide 29 — 20:44
Each triple pairs a record payload with a timestamp and a difference, typically plus or minus one.
Slide 30 — 20:50
As these changelogs grow, they describe exactly what the collection contains at each moment in time.
Slide 31 — 21:00
To see the collection at a given time, you aggregate all changes up to that time, summing the diffs for each record. Any record with a non-zero sum is present in your dataset with that multiplicity.
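A minimal sketch of this accumulation in Python, assuming integer timestamps that are totally ordered (virtual time in general allows richer orders). The names `changelog` and `collection_at` are hypothetical and not part of Materialize's API.

```python
from collections import defaultdict

# A changelog of (data, time, diff) triples; diffs are usually +1 or -1.
changelog = [
    ("alice", 1, +1),
    ("bob",   2, +1),
    ("alice", 3, -1),
    ("carol", 3, +1),
    ("bob",   4, +1),
]


def collection_at(changelog, t):
    """Sum the diffs of every update with time <= t; keep records with non-zero totals."""
    counts = defaultdict(int)
    for data, time, diff in changelog:
        if time <= t:
            counts[data] += diff
    # A non-zero total is the record's multiplicity at time t.
    return {data: count for data, count in counts.items() if count != 0}
```

For example, `collection_at(changelog, 3)` returns `{"bob": 1, "carol": 1}`: the insertion of `"alice"` at time 1 and its retraction at time 3 sum to zero, so she drops out of the collection.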
Slide 32 — 21:12
The backbone of Materialize is this continually updated changelog, in which differences may be negative. It may seem unusual, but the approach is entirely workable.
Slide 33 — 21:16
Is this just an accounting technique?
Slide 34 — 21:20
This process is not simply about writing things down and being finished.
Slide 35 — 21:28
Slide 36 — 21:44
This could be a dataset containing information about individuals, where the filter retains only those individuals whose ages are even.
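The even-ages filter can be sketched on a changelog, assuming integer timestamps and hypothetical helper names (`accumulate`, `filter_log`, `even_age`). The filter acts on each update independently, and accumulating the filtered changelog gives the same collection as filtering the accumulated input, at every time.

```python
from collections import Counter


def accumulate(log, t):
    """Collection at time t: sum diffs of updates with time <= t, drop zeros."""
    totals = Counter()
    for data, time, diff in log:
        if time <= t:
            totals[data] += diff
    return {d: c for d, c in totals.items() if c != 0}


def filter_log(log, pred):
    # Each update is kept or dropped on its own; times and diffs pass through.
    return [(data, time, diff) for data, time, diff in log if pred(data)]


def even_age(person):
    _, age = person
    return age % 2 == 0


# Hypothetical (name, age) records.
people = [
    (("ann", 30), 1, +1),
    (("ben", 41), 1, +1),
    (("ann", 30), 2, -1),   # ann's birthday: retract age 30...
    (("ann", 31), 2, +1),   # ...and insert age 31
    (("cat", 22), 3, +1),
]
evens = filter_log(people, even_age)

# Filtering then accumulating matches accumulating then filtering.
for t in range(4):
    snapshot = accumulate(people, t)
    assert accumulate(evens, t) == {p: c for p, c in snapshot.items() if even_age(p)}
```

This works because filtering is linear. Operators such as aggregations or joins must keep state to produce correct output changes, which this sketch does not attempt.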
Slide 37 — 22:02
Slide 38 — 22:22
Slide 39 — 22:30
You can easily connect them together.
Slide 40 — 22:34
Slide 41 — 22:46
That property holds regardless of your expertise with distributed systems.
Slide 42 — 23:04
Slide 43 — 23:18
This property is very useful. Because the relationship between inputs and outputs is certain, you can confidently assemble various components.
Slide 44 — 23:26
Moreover, SQL plans themselves demonstrate that computations at any scale can be composed this way.
Slide 45 — 23:38
Slide 46 — 23:48
It will always be as if the input data were frozen, both of your bits of logic were executed, and then the third person's logic was applied to produce the changelog for the results.
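A small sketch of this "as if frozen" guarantee, assuming integer timestamps and hypothetical linear operators written by three different authors: A filters, B maps, and C takes the union (in the UNION ALL sense) of A's and B's outputs. At every time, C's accumulated changelog matches running A, B, and then C on a frozen snapshot of the input. This illustrates the property only for linear operators; it is not how Materialize implements composition.

```python
from collections import Counter


def accumulate(log, t):
    totals = Counter()
    for data, time, diff in log:
        if time <= t:
            totals[data] += diff
    return {d: c for d, c in totals.items() if c != 0}


def filter_log(log, pred):
    return [(d, t, r) for d, t, r in log if pred(d)]


def map_log(log, f):
    return [(f(d), t, r) for d, t, r in log]


def union_log(left, right):
    # UNION ALL of two collections is concatenation of their changelogs.
    return left + right


source = [(3, 1, +1), (4, 1, +1), (5, 2, +1), (4, 3, -1), (6, 3, +1)]

view_a = filter_log(source, lambda x: x % 2 == 0)   # author A: keep even values
view_b = map_log(source, lambda x: x * 10)          # author B: scale by ten
view_c = union_log(view_a, view_b)                  # author C: builds on A and B


def frozen_c(t):
    """Run A, B, then C on a frozen snapshot of the input at time t."""
    snapshot = accumulate(source, t)
    out = Counter()
    for x, count in snapshot.items():
        if x % 2 == 0:
            out[x] += count
        out[x * 10] += count
    return {d: c for d, c in out.items() if c != 0}


for t in range(5):
    assert accumulate(view_c, t) == frozen_c(t)
```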
Slide 47 — 24:08
Slide 48 — 24:24
Slide 49 — 24:34
Materialize allows for extensive functionality within the system's confines. You can choose how to assemble the various components, and they are predominantly deterministic at that stage.
Slide 50 — 24:46
Slide 51 — 24:56
From a system perspective, it effectively eliminates logical contention.
Slide 52 — 25:12
Slide 53 — 25:22
While there is a potential for errors, the design aims to minimize user mistakes. Ideally, users should not be able to make significant errors in this system.
Slide 54 — 25:34
Slide 55 — 25:48
You're likely thinking of vector clocks or Lamport clocks.
Slide 56 — 25:58
The fundamental difference is that virtual time is a prescriptive technique, while logical clocks are descriptive. Virtual time introduces a structure that limits the available options.
Slide 57 — 26:06
The system now has significantly fewer degrees of freedom than it previously did. However, this reduction also lowers the overall complexity.
Slide 58 — 26:12
Users of the system no longer have to manage certain issues that could have arisen, which can be a significant relief.
Slide 59 — 26:22
Logical clocks, in my view, do not eliminate complexity; rather, they transcribe it.
Slide 60 — 26:38
Slide 61 — 27:00
I spent a considerable amount of time thinking that my goal was to be perceived as a really smart person. I focused on presenting myself in a way that reflected that intelligence.
Slide 62 — 27:16
Ultimately, what are computer scientists uniquely good at? Where do we provide the most value?
Slide 63 — 27:24
In my personal experience, effective abstraction has been a key strength. There are many opportunities for talented individuals in this area.
Slide 64 — 27:30
One of our strengths is our ability to take complex concepts and reduce their surface area, making them easier and more accessible for those who prefer not to engage with that complexity.
Slide 65 — 27:42
In mathematics, a 150-page paper can be met with enthusiasm, and a 200-page paper may receive even more praise.
Slide 66 — 27:50
In reality, none of our customers want to read a 200-page document. They prefer simplicity and ease of use. For example, my computer contains a lot of complex silicon that I do not understand.
Slide 67 — 28:06
Slide 68 — 28:16
It's not necessarily as interesting. Many of you may feel that you want complexity.
Slide 69 — 28:24
We aim to remove complexity for others, which is a key aspect of our service. Often, we are compensated for successfully achieving this goal.
Slide 70 — 28:38
Virtual time is a valuable abstraction. When implemented correctly, it is not overly complicated, and it is hard to misuse.
Slide 71 — 28:50
Slide 72 — 28:56
The pieces fit together nicely and logically.
Slide 73 — 29:00
You obtain a result that you understand.
Slide 74 — 29:04
You may not fully understand how or why it works, but you receive the correct answer, which is satisfying. It's a relief not to have to learn about distributed systems to use this effectively.
Slide 75 — 29:22
We will now discuss several vignettes related to the abstraction of virtual time. This abstraction is beneficial not only for users but also for various system-building functions.
Slide 76 — 29:32
Slide 77 — 29:38
We will do a brief demonstration of active replication at the end.
Slide 78 — 29:46
All forms of parallelism are generally straightforward, though I hesitate to say they are easy.
Slide 79 — 29:52
Task parallelism is relatively straightforward, though it can be challenging to define.
Slide 80 — 29:56
If five people want to use the same changelog, they can proceed without hesitation, and they will all see consistent results.
Slide 81 — 30:10
Slide 82 — 30:24
Pipeline parallelism is an excellent example of where virtual time excels. In a sequence of tasks A, B, and C, task B cannot begin until task A has completed a portion of its work.
Slide 83 — 30:34
Virtual time records the intended start time for tasks, even if there is a delay before they can actually begin.
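A minimal sketch of how virtual times let pipeline stages overlap, assuming integer times and a per-stage "frontier" (a lower bound on the times that may still appear in a stage's output). The `Stage` class and the round-based driver are hypothetical; the stages run in sequential rounds here, whereas a real system would run them concurrently, and this is not Materialize's implementation.

```python
from typing import Callable, List, Tuple

Update = Tuple[int, int, int]  # (data, virtual time, diff)


class Stage:
    """Hypothetical pipeline stage: transforms updates and tracks a frontier."""

    def __init__(self, fn: Callable[[int], int]):
        self.fn = fn
        self.frontier = 0              # no future output will carry a time below this
        self.output: List[Update] = []

    def receive(self, updates: List[Update], upstream_frontier: int) -> List[Update]:
        # Work on whatever has arrived now; each update keeps its virtual time.
        produced = [(self.fn(d), t, r) for d, t, r in updates]
        self.output.extend(produced)
        # A stage can promise completeness only as far as its input is complete.
        self.frontier = upstream_frontier
        return produced

    def complete_times(self) -> List[int]:
        # Times below the frontier are final: no more updates will arrive for them.
        return sorted({t for _, t, _ in self.output if t < self.frontier})


a, b, c = Stage(lambda x: x + 1), Stage(lambda x: x * 2), Stage(lambda x: x - 3)

# Each source round delivers some updates plus a new frontier for the source.
source_rounds = [
    ([(1, 0, +1)], 1),
    ([(5, 1, +1)], 2),
    ([(1, 2, -1)], 3),
]

progress = []
for updates, frontier in source_rounds:
    out_a = a.receive(updates, frontier)
    out_b = b.receive(out_a, a.frontier)
    c.receive(out_b, b.frontier)
    # C reports final results for early times before the source sends later ones.
    progress.append((frontier, c.complete_times()))
```

After the first round, `c` already knows its output for time 0 is final, even though the source has not yet produced anything for times 1 and 2.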
Slide 84 — 30:40
Pipelining is not easy, but it is relatively straightforward. Queries, which are another form of interaction with the system, receive virtual times.
Slide 85 — 30:52
Slide 86 — 31:02
If you impose additional constraints on the chosen time, such as a specific lower bound, you can achieve strict serializability.
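A heavily simplified sketch of the idea, assuming a single process and hypothetical `VersionedTable` and `TimestampOracle` classes. Reading at any single virtual time yields a consistent snapshot, which is roughly what serializability asks for; additionally requiring the read time to be no lower than any timestamp already handed out rules out stale reads, which is the "strict" part. The sketch ignores concurrency and does not wait for data at the chosen time to be complete.

```python
from collections import Counter


class VersionedTable:
    """Hypothetical multiversion table stored as a (data, time, diff) changelog."""

    def __init__(self):
        self.log = []

    def update(self, data, time, diff):
        self.log.append((data, time, diff))

    def read_at(self, t):
        # A read at time t sees exactly the updates with time <= t.
        totals = Counter()
        for data, time, diff in self.log:
            if time <= t:
                totals[data] += diff
        return {d: c for d, c in totals.items() if c != 0}


class TimestampOracle:
    """Hypothetical oracle that tracks a lower bound for new operations."""

    def __init__(self):
        self.lower_bound = 0

    def write_ts(self):
        # Each write gets a fresh timestamp above everything handed out so far.
        self.lower_bound += 1
        return self.lower_bound

    def read_ts(self):
        # A read may not go below any timestamp already handed out, so it
        # observes every write that completed before the read began.
        return self.lower_bound


table, oracle = VersionedTable(), TimestampOracle()
table.update("x", oracle.write_ts(), +1)   # this write completes at time 1

stale = table.read_at(0)                   # consistent snapshot, but misses the write
fresh = table.read_at(oracle.read_ts())    # respects the lower bound, sees the write
```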
Slide 87 — 31:08
A notable property is that you obtain something stronger than strict serializability alone: composable strict serializability.
Slide 88 — 31:16
By revealing these timestamps, multiple users can combine two strictly serializable databases to create a third one. However, I prefer not to have to do that.
Slide 89 — 31:32
With virtual times, their structure is revealed in a way that allows for composition, enabling the creation of even more fascinating composed systems.
Slide 90 — 31:44
Errors, which elsewhere often manifest as behavior, are simply data within these systems.
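One way to picture errors as data, sketched with a hypothetical division operator: instead of raising an exception, a failing record produces an update in a separate error changelog, carrying the same time and diff as the record that caused it. The split into "ok" and "error" changelogs is an illustrative assumption, not a description of Materialize's internals.

```python
from collections import Counter


def accumulate(log, t):
    totals = Counter()
    for data, time, diff in log:
        if time <= t:
            totals[data] += diff
    return {d: c for d, c in totals.items() if c != 0}


def divide(log):
    """Map (a, b) to a // b; failures become updates in a separate error changelog."""
    oks, errs = [], []
    for (a, b), time, diff in log:
        if b == 0:
            # The error inherits the time and diff of the record that caused it.
            errs.append((f"division by zero: {a}/0", time, diff))
        else:
            oks.append((a // b, time, diff))
    return oks, errs


inputs = [
    ((10, 2), 1, +1),
    ((7, 0), 2, +1),   # a bad record arrives...
    ((7, 0), 3, -1),   # ...and is later retracted
]
oks, errs = divide(inputs)
```

Because the error is just another update, retracting the bad record at time 3 retracts the error too, and the computation never stops running.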
Slide 91 — 31:54
Slide 92 — 32:10
Slide 93 — 32:24
Finally, active replication, which I will demonstrate, ultimately reduces to deduplication.
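A minimal sketch of replication as deduplication, assuming each replica runs the same deterministic computation and sends its batches in order, where each batch holds all of that replica's updates for times from its previous upper bound up to (not including) the new one. The `Deduplicator` class and the replica data are hypothetical.

```python
from typing import List, Tuple

Update = Tuple[str, int, int]      # (data, time, diff)
Batch = Tuple[List[Update], int]   # (updates, upper): a replica's updates up to upper


class Deduplicator:
    """Hypothetical consumer merging output from replicas of one deterministic computation."""

    def __init__(self):
        self.frontier = 0            # every time below this has already been emitted
        self.emitted: List[Update] = []

    def accept(self, batch: Batch) -> None:
        updates, upper = batch
        # Replicas agree on the updates at each time, so anything below our
        # frontier is a duplicate and only newer times need to be emitted.
        for data, time, diff in updates:
            if time >= self.frontier:
                self.emitted.append((data, time, diff))
        self.frontier = max(self.frontier, upper)


# Two replicas produce identical changelogs, reporting progress at their own pace.
replica_1 = [([("a", 0, +1)], 1), ([("b", 1, +1)], 2)]
replica_2 = [([("a", 0, +1)], 1), ([("b", 1, +1)], 2), ([("a", 2, -1)], 3)]

dedup = Deduplicator()
dedup.accept(replica_1[0])
dedup.accept(replica_2[0])   # covers times already emitted: ignored
dedup.accept(replica_2[1])
dedup.accept(replica_1[1])   # replica 1 is behind: ignored
dedup.accept(replica_2[2])
```

If no replica is running, no batches arrive and `frontier` stops advancing, so a consumer can only wait rather than report a possibly wrong answer. When a replica returns and replays from the start, its batches below the frontier are discarded and the output continues without a gap.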
Slide 94 — 32:34
Slide 95 — 32:50
Slide 96 — 33:00
Here's a quick demonstration of active replication that will take about a minute or two.
Slide 97 — 33:20
Slide 98 — 33:40
A cluster defines the computation you want to perform, while the replicas are the engines that generate the results.
Slide 99 — 33:48
You can have any number of replicas, including zero. One is a common choice, and two is also used frequently. Having zero replicas is typically a mistake, but we will proceed with that configuration.
Slide 100 — 34:02
We have a default cluster running, and we just dropped a replica. As a result, the highlighted line has stopped changing, which is unusual.
Slide 101 — 34:12
This is exactly the desired behavior in a virtually timed system. We cannot predict what will happen next.
Slide 102 — 34:18
In particular, we cannot assume that nothing has changed. While we cannot definitively say that this is the wrong answer, we are not certain that it is the correct one either.
Slide 103 — 34:22
We need to pause the feed.
Slide 104 — 34:28
It needs to stop, and it is clear that it has. As consumers, we are left wondering what comes next, but the system is unable to provide an answer.
Slide 105 — 34:52
Slide 106 — 35:14
The numbers may be difficult to see, but if you examine them closely, the changelog is essentially uninterrupted.
Slide 107 — 35:22
Although we experienced an interruption, the changelog remains essentially uninterrupted, as if we had continued running without pause.
Slide 108 — 35:32
Slide 109 — 35:58
Slide 110 — 36:24
Zero-downtime physical replication is a principle that applies in various other settings as well.
Slide 111 — 36:40
Logical reconfiguration allows you to change the business logic of your view. Transitioning from one configuration to another can be challenging, but it is not unexpected.
Slide 112 — 36:54
The final point I want to address is brief, as this is the last slide.
Slide 113 — 37:08
Slide 114 — 37:18
Building confidence involves figuring out how to transfer that confidence to others. Sharing your personal conviction is interesting, but not my primary focus.
Slide 115 — 37:32
Slide 116 — 37:44
I recommend three key actions, though this is not an exhaustive list.
Slide 117 — 37:58
Dogfooding is crucial. Use the system you are developing; if you don't use it yourself, the validity of your claims about it is questionable.
Slide 118 — 38:08
Slide 119 — 38:22
Benchmarking is something I strongly advocate for.
Slide 120 — 38:30
I often engage in performance-related work. If this area is not your focus, it may seem less relevant. However, I encourage you to test the limits of your system.
Slide 121 — 38:40
Slide 122 — 39:00
Slide 123 — 39:04
The advantage of thorough testing is that you uncover hidden issues not only in your own system but also in others.
Slide 124 — 39:16
Slide 125 — 39:26
It's important to thoroughly test your ideas and identify where they may fail. Additionally, I strongly recommend maintaining open communication with others about your progress and challenges.
Slide 126 — 39:38
Slide 127 — 39:54
I attempt to explain the value of the idea, which requires balancing numerous elements at precise angles. Often, I end up deleting the post and going back to improve it.
Slide 128 — 40:00
The exercise of communicating and bringing others on board is essential.
Slide 129 — 40:10
Explaining how simple and easy this process can be is both challenging and informative. It tests the theory that this information is valuable and can genuinely simplify people's lives.
Slide 130 — 40:26
The final takeaway is that building confidence is something you provide to others. It is not solely a technical issue; confidence cannot be solved by a piece of software.
Slide 131 — 40:36
Building confidence is a process that involves many components working together to achieve a successful outcome.
Slide 132 — 41:04
Slide 133 — 41:22
Is your study for others, or is it for yourself? Is it a self-indulgent pursuit?
Slide 134 — 41:32
With software reliability, it is easy to fall into building "highly reliable" systems in the abstract, without a clear understanding of what reliability actually entails.
Slide 135 — 41:40
Reliability has different implications for a self-driving car compared to a pacemaker.
Slide 136 — 41:46
Slide 137 — 41:54
With Black Friday approaching, will the system remain operational as I conduct business?
Slide 138 — 42:04
Slide 139 — 42:16
Design bugs are a crucial aspect of reliability, often more significant than pointer access bugs or similar issues.
Slide 140 — 42:30
Slide 141 — 42:38
I will pause here. We have time for one or two questions. If the questions are particularly challenging, we may have time for fewer.
Slide 142 — 43:00
Slide 143 — 43:20
To clarify, I did not originate the concept of virtual time; that was developed by Jefferson in the 1980s.
Slide 144 — 43:40
Slide 145 — 43:54
I apologize for the delay in communicating this, but this is how the system needs to function.
Slide 146 — 44:00
Slide 147 — 44:44
Slide 148 — 45:18
Slide 149 — 46:54
Slide 150 — 47:58
Thank you.
Slide 151 — 48:14
Thank you.