151 slides extracted.
Slide 1 — 0:16 (watch)
Slide 2 — 1:54 (watch)
Slide 3 — 3:48 (watch)
Slide 4 — 5:58 (watch)
Slide 5 — 8:06 (watch)
Slide 6 — 9:20 (watch)
Slide 7 — 10:02 (watch)
![]() | It's admittedly synthetic data, but it's live and changing. I apologize for racing ahead in the demo to show you what's happening behind the scenes. |
Slide 8 — 10:42 (watch)
Slide 9 — 11:34 (watch)
Slide 10 — 12:44 (watch)
Slide 11 — 13:10 (watch)
Slide 12 — 14:36 (watch)
Slide 13 — 15:44 (watch)
![]() | I did not realize when I put this demo together that we have a column here for slop, and it is zero. It will always be zero. We will run this query a few more times. |
Slide 14 — 16:04 (watch)
Slide 15 — 16:28 (watch)
Slide 16 — 16:42 (watch)
Slide 17 — 17:10 (watch)
Slide 18 — 17:38 (watch)
Slide 19 — 18:28 (watch)
Slide 20 — 19:30 (watch)
Slide 21 — 19:44 (watch)
![]() | The abstract presents two uses for virtual time. The first is in discrete event-based simulations, which many people are currently utilizing. The second use case is for concurrency control. |
Slide 22 — 19:56 (watch)
![]() | We will use virtual time for concurrency control. I will explain what that means, but the key takeaway is that it plays a crucial role in our system. |
Slide 23 — 20:02 (watch)
![]() | Materialize and its underlying stack function as a large-scale simulator. |
Slide 24 — 20:10 (watch)
![]() | It simulates computations that could occur instead of merely reacting to external stimuli. |
Slide 25 — 20:16 (watch)
![]() | It is performing a prescribed sequence of computations. |
Slide 26 — 20:20 (watch)
![]() | We previously observed this on the screen. |
Slide 27 — 20:30 (watch)
![]() | The backbone of Materialize consists of continually changing and evolving changelogs. These changelogs evolve only in the append sense, allowing us to learn more about our underlying data. |
Slide 28 — 20:40 (watch)
![]() | We represent them as triples, often with slight variations. In this case, the triples are (time, diff, data), which we previously displayed on the big screen. |
Slide 29 — 20:44 (watch)
![]() | We have timestamps, diffs of plus or minus one, and some record payloads. |
Slide 30 — 20:50 (watch)
![]() | As these collections evolve and grow in length, they provide a specific understanding of their contents at particular moments in time. |
Slide 31 — 21:00 (watch)
![]() | You accumulate all the changes up to the specified time, summing the diffs for each piece of data. Any data with a non-zero total is included in your dataset with that multiplicity. |
Slide 32 — 21:12 (watch)
![]() | It may seem unusual to add to a negative number, but it is entirely feasible. The backbone of Materialize is its continual changelog. |
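The accumulation described above can be sketched in a few lines. This is a minimal illustration, not Materialize's implementation: it assumes each update is a `(time, diff, data)` triple, and the names `accumulate` and `changelog` are hypothetical.

```python
from collections import Counter

def accumulate(changelog, as_of):
    """Sum the diffs of every update with time <= as_of and keep non-zero totals."""
    totals = Counter()
    for time, diff, data in changelog:
        if time <= as_of:
            totals[data] += diff
    # Records whose diffs cancel out are not part of the collection at this time.
    return {data: count for data, count in totals.items() if count != 0}

# An append-only changelog: later entries never rewrite earlier ones.
changelog = [
    (1, +1, "alice"),
    (1, +1, "bob"),
    (2, -1, "bob"),    # a retraction is just another appended update
    (3, +1, "alice"),  # a second copy of the same record
]

snapshot_at_1 = accumulate(changelog, 1)
snapshot_at_2 = accumulate(changelog, 2)
snapshot_at_3 = accumulate(changelog, 3)
```

The same log answers "what did the collection contain at time t?" for every t, which is why appending to it only ever adds knowledge.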
Slide 33 — 21:16 (watch)
![]() | Is this just an accounting technique? |
Slide 34 — 21:20 (watch)
![]() | No, it's not just about writing things down and being done. |
Slide 35 — 21:28 (watch)
Slide 36 — 21:44 (watch)
![]() | This could represent a collection of data about people, where the filter specifies that we want to retain only those individuals whose ages are even. |
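As a rough sketch of what a filter over a changelog might look like, the example below keeps only the updates whose record has an even age, then checks that the filtered changelog accumulates to the filtered collection at every time. The `(time, diff, data)` layout and the names `filter_updates`, `has_even_age`, and `people` are assumptions made for illustration.

```python
from collections import Counter

def accumulate(changelog, as_of):
    """Sum diffs of updates with time <= as_of; keep records with non-zero totals."""
    totals = Counter()
    for time, diff, data in changelog:
        if time <= as_of:
            totals[data] += diff
    return {data: count for data, count in totals.items() if count != 0}

def filter_updates(changelog, predicate):
    """Apply a filter update by update, preserving each update's time and diff."""
    return [(time, diff, data) for time, diff, data in changelog if predicate(data)]

def has_even_age(person):
    _name, age = person
    return age % 2 == 0

people = [
    (1, +1, ("ann", 30)),
    (1, +1, ("bo", 41)),
    (2, -1, ("ann", 30)),  # ann has a birthday: retract the old record...
    (2, +1, ("ann", 31)),  # ...and insert the new one at the same time
    (3, +1, ("cy", 22)),
]

even_aged = filter_updates(people, has_even_age)

# At every time, the output changelog describes exactly the filtered input.
for t in range(0, 4):
    expected = {p: n for p, n in accumulate(people, t).items() if has_even_age(p)}
    assert accumulate(even_aged, t) == expected
```

Because the output carries the same times as the input, the result at time t depends only on the input at time t, not on when the filter happened to run.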
Slide 37 — 22:02 (watch)
Slide 38 — 22:22 (watch)
Slide 39 — 22:30 (watch)
![]() | You can start connecting them easily. |
Slide 40 — 22:34 (watch)
Slide 41 — 22:46 (watch)
![]() | That property holds true regardless of your familiarity with distributed systems. |
Slide 42 — 23:04 (watch)
Slide 43 — 23:18 (watch)
![]() | This approach is very useful. The certainty regarding the equivalences between inputs and outputs allows for confident assembly of all components. |
Slide 44 — 23:26 (watch)
![]() | Moreover, the SQL plans themselves, and the computations at any scale, also compose effectively. |
Slide 45 — 23:38 (watch)
Slide 46 — 23:48 (watch)
![]() | It will always be as if the input data was taken, frozen, both pieces of logic were executed, and then the third person's logic was applied to produce the changelog for the results. |
Slide 47 — 24:08 (watch)
Slide 48 — 24:24 (watch)
Slide 49 — 24:34 (watch)
![]() | Materialize allows you to perform many operations within the confines of the system. You can choose how to assemble these components, and they are largely deterministic at that stage. |
Slide 50 — 24:46 (watch)
Slide 51 — 24:56 (watch)
![]() | From a system perspective, it effectively removes logical contention. |
Slide 52 — 25:12 (watch)
![]() | It eliminates complicated coordination and synchronization issues that could lead to multiple outcomes at runtime. This simplifies the challenge to a performance question: how to optimize speed. |
Slide 53 — 25:22 (watch)
![]() | While there is a potential for correctness issues, the design aims to prevent users from making mistakes. |
Slide 54 — 25:34 (watch)
Slide 55 — 25:48 (watch)
![]() | You're probably thinking of a vector clock or a Lamport clock. |
Slide 56 — 25:58 (watch)
![]() | The fundamental difference here is that virtual time is a prescriptive technique, while logical clocks are descriptive. Virtual time introduces structure and eliminates certain options. |
Slide 57 — 26:06 (watch)
![]() | The system now has significantly fewer degrees of freedom than before, which also reduces its complexity. |
Slide 58 — 26:12 (watch)
![]() | Users of the system no longer have to deal with certain issues that could have arisen, which can be a relief. |
Slide 59 — 26:22 (watch)
![]() | Logical clocks, in my view, do not reduce complexity; rather, they transcribe it. |
Slide 60 — 26:38 (watch)
Slide 61 — 27:00 (watch)
![]() | I spent a lot of time thinking about my professional goals. Initially, I believed that my aim was to be perceived as a really smart person, showcasing my intelligence on stage. |
Slide 62 — 27:16 (watch)
![]() | Ultimately, what are computer scientists uniquely good at? Where do we truly provide value? |
Slide 63 — 27:24 (watch)
![]() | In my personal experience, effective abstraction has been a key strength. There are many opportunities for talented individuals in various fields. |
Slide 64 — 27:30 (watch)
![]() | One of our strengths is our ability to take complex systems and reduce their surface area, making them easier and more accessible for those who prefer not to engage with that complexity. |
Slide 65 — 27:42 (watch)
![]() | In mathematics, a 150-page paper can be met with enthusiasm, and a 200-page paper is often praised even more. |
Slide 66 — 27:50 (watch)
Slide 67 — 28:06 (watch)
Slide 68 — 28:16 (watch)
![]() | It's not necessarily as interesting. Many of you may be thinking that you want complexity. |
Slide 69 — 28:24 (watch)
![]() | We aim to remove complexity for others, which is a key aspect of the service we provide. We often receive recognition and compensation for achieving this goal. |
Slide 70 — 28:38 (watch)
![]() | Virtual time is a great abstraction. While it can be challenging to implement correctly, it is not overly complicated. However, it is easy to misuse if not handled properly. |
Slide 71 — 28:50 (watch)
Slide 72 — 28:56 (watch)
![]() | The components fit together nicely and logically. |
Slide 73 — 29:00 (watch)
![]() | You obtain a result that you understand. |
Slide 74 — 29:04 (watch)
![]() | You may not fully understand how it works or why it produces the correct answer, but you appreciate that it does. It's a relief not to have to learn about distributed systems to use this effectively. |
Slide 75 — 29:22 (watch)
![]() | We will now discuss a few vignettes related to the abstraction of virtual time. This abstraction is beneficial not only for users but also for other system-building functions. |
Slide 76 — 29:32 (watch)
Slide 77 — 29:38 (watch)
![]() | I will demonstrate active replication at the end of this section. |
Slide 78 — 29:46 (watch)
![]() | All forms of parallelism are generally relatively straightforward, though I hesitate to say they are easy. |
Slide 79 — 29:52 (watch)
![]() | Task parallelism is relatively straightforward, although it can be challenging to define. |
Slide 80 — 29:56 (watch)
![]() | If five people want to use the same changelog, they can proceed without hesitation. You'll receive consistent answers at the end. |
Slide 81 — 30:10 (watch)
Slide 82 — 30:24 (watch)
![]() | Pipeline parallelism is a prime example of where virtual time excels. In a sequence of tasks A, B, and C, task B cannot start until task A has completed a portion of its work. |
Slide 83 — 30:34 (watch)
![]() | Virtual time records the intended start time for tasks, even if it takes a minute or so before they can actually begin. |
Slide 84 — 30:40 (watch)
![]() | Pipelining is not easy, but it is relatively straightforward. Queries, which are another form of interaction with the system, receive virtual times. |
Slide 85 — 30:52 (watch)
Slide 86 — 31:02 (watch)
![]() | If you have specific requirements for time, such as a minimum lower bound, you can achieve strict serializability. |
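One way to picture a lower bound on query times is a timestamp oracle that never hands out a virtual time earlier than one it has already returned. The sketch below is an assumption made for illustration, not a description of how Materialize selects timestamps; `TimestampOracle`, `advance_to`, and `assign` are hypothetical names.

```python
class TimestampOracle:
    """Hypothetical oracle: assigns each query a virtual time that never goes backwards."""

    def __init__(self):
        self.latest = 0

    def advance_to(self, time):
        # Called as input data becomes complete up to `time`.
        self.latest = max(self.latest, time)

    def assign(self, at_least=0):
        # A caller may insist on a lower bound, for example the time of a
        # result it has already observed elsewhere.
        self.latest = max(self.latest, at_least)
        return self.latest

oracle = TimestampOracle()
oracle.advance_to(5)
first = oracle.assign()              # served at the latest known time
second = oracle.assign(at_least=9)   # caller demands a time of at least 9
third = oracle.assign()              # a later query never sees an earlier time
```

Since every later query receives a time at least as large as every earlier one, the real-time order of queries agrees with their virtual-time order, which is the property strict serializability asks for.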
Slide 87 — 31:08 (watch)
![]() | A notable property is that you achieve something stronger than serialization: composable strict serializability. |
Slide 88 — 31:16 (watch)
![]() | By revealing these timestamps, multiple users can combine two strictly serializable databases to create a third one. While this might seem straightforward, I prefer not to rely on that approach. |
Slide 89 — 31:32 (watch)
![]() | With virtual times, their structure is revealed in a way that allows for composition, enabling the creation of even more fascinating composed systems. |
Slide 90 — 31:44 (watch)
![]() | Errors in these systems often manifest as data. |
Slide 91 — 31:54 (watch)
Slide 92 — 32:10 (watch)
Slide 93 — 32:24 (watch)
![]() | Finally, active replication, which I will demonstrate, effectively functions as deduplication. |
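To make the deduplication idea concrete, here is a minimal sketch in which two replicas emit the same changelog in differently sized batches and a consumer keeps each update exactly once by tracking a frontier. It assumes `(time, diff, data)` updates and batches that claim completeness over a half-open time range; `DedupMerger`, `absorb`, and `batch` are hypothetical names, and a real system would buffer batches that arrive early rather than skip them.

```python
class DedupMerger:
    """Merge batches from several replicas of the same deterministic computation."""

    def __init__(self):
        self.frontier = 0  # every update with time < frontier is already in output
        self.output = []

    def absorb(self, lower, upper, updates):
        """Absorb a batch holding every update with lower <= time < upper."""
        if lower > self.frontier or upper <= self.frontier:
            return  # a gap we cannot bridge yet, or nothing new
        # Keep only the part of the batch we have not seen from any replica.
        self.output.extend(u for u in updates if u[0] >= self.frontier)
        self.frontier = upper

full_log = [(0, +1, "a"), (1, +1, "b"), (2, -1, "a"), (3, +1, "c"), (4, +1, "a")]

def batch(lower, upper):
    """What any replica would report for the time range [lower, upper)."""
    return lower, upper, [u for u in full_log if lower <= u[0] < upper]

merger = DedupMerger()
merger.absorb(*batch(0, 2))  # replica A
merger.absorb(*batch(0, 3))  # replica B overlaps A; only time 2 is new
merger.absorb(*batch(2, 4))  # replica A again; only time 3 is new
merger.absorb(*batch(3, 5))  # replica B; only time 4 is new
```

The merge is this simple only because both replicas produce identical updates at identical virtual times; without that determinism the consumer would have to reconcile conflicting answers instead of discarding duplicates.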
Slide 94 — 32:34 (watch)
Slide 95 — 32:50 (watch)
Slide 96 — 33:00 (watch)
![]() | Here’s a quick demonstration of active replication. |
Slide 97 — 33:20 (watch)
Slide 98 — 33:40 (watch)
![]() | A cluster defines the computation you want, while the replicas are the engines that produce the results. |
Slide 99 — 33:48 (watch)
![]() | You can have any number of replicas, including zero. One is a common choice, and two is also possible. However, having zero is usually a mistake, but we will proceed with that configuration. |
Slide 100 — 34:02 (watch)
![]() | Here we have the default cluster that this is running on. We just dropped a replica, which has caused the highlighted line to stop changing. This behavior may look unexpected. |
Slide 101 — 34:12 (watch)
![]() | This behavior is exactly what you want in a virtually timed system. We cannot predict what will happen next. |
Slide 102 — 34:18 (watch)
![]() | In particular, we cannot assume that nothing has changed. While we can't definitively say that this is the wrong answer, we are not yet certain that it is the correct one. |
Slide 103 — 34:22 (watch)
![]() | We need to pause the feed. |
Slide 104 — 34:28 (watch)
![]() | It needs to stop, and it's clear that it has. As consumers, we are left wondering what comes next, but the system indicates that it doesn't know. |
Slide 105 — 34:52 (watch)
Slide 106 — 35:14 (watch)
![]() | If you look closely at the numbers, you may find them tricky to see, but the changelog is essentially uninterrupted. |
Slide 107 — 35:22 (watch)
![]() | We observed an interruption on our end, but the changelog as data remains essentially uninterrupted, as if we had continued running without pause. |
Slide 108 — 35:32 (watch)
Slide 109 — 35:58 (watch)
Slide 110 — 36:24 (watch)
![]() | This approach enables zero-downtime physical replication, which is a principle that applies in various other settings as well. |
Slide 111 — 36:40 (watch)
![]() | Logical reconfiguration allows you to change the business logic of your view. Transitioning from one configuration to another is challenging, but not unexpected. |
Slide 112 — 36:54 (watch)
![]() | The final point I want to make is brief, as this is the last slide. |
Slide 113 — 37:08 (watch)
Slide 114 — 37:18 (watch)
Slide 115 — 37:32 (watch)
Slide 116 — 37:44 (watch)
![]() | I recommend three key things, and this is not an exhaustive list. |
Slide 117 — 37:58 (watch)
![]() | Dogfooding is crucial. Use the system you are building; if you don't use it, I doubt the validity of anything you say about it. |
Slide 118 — 38:08 (watch)
Slide 119 — 38:22 (watch)
![]() | Benchmarking is something I strongly advocate for. |
Slide 120 — 38:30 (watch)
![]() | I focus on performance-related work, and if that’s not your area, this may be less relevant. However, I encourage you to try to break your system. |
Slide 121 — 38:40 (watch)
![]() | Everyone involved in structural engineering understands the point at which steel bends. Similarly, if you don't know the limits of your system, I doubt it will withstand significant stress. |
Slide 122 — 39:00 (watch)
![]() | Challenge yourself by attempting to break things and identify their weaknesses. Avoid relying solely on the easiest benchmarks; instead, engage in tests that may complicate your work. |
Slide 123 — 39:04 (watch)
![]() | The good news is that if you conduct thorough testing, you'll discover not only the weaknesses in your own system but also in others' systems. |
Slide 124 — 39:16 (watch)
Slide 125 — 39:26 (watch)
![]() | It’s important to thoroughly test systems to identify their weaknesses. Additionally, effective communication is crucial; engage with others about ongoing developments. |
Slide 126 — 39:38 (watch)
![]() | I have historically written many blog posts, and often, while writing about a fascinating idea, I find myself halfway through realizing that it’s not a good idea at all. |
Slide 127 — 39:54 (watch)
![]() | I'm trying to explain how good an idea is, but the explanation requires balancing many different elements. Often, I end up deleting the post and going back to improve it. |
Slide 128 — 40:00 (watch)
![]() | The exercise of trying to communicate and bring others on board is essential. |
Slide 129 — 40:10 (watch)
![]() | Explaining how simple and easy this process can be is not only valuable work but also informative. It tests the theory that this information is worthwhile and genuinely simplifies people's lives. |
Slide 130 — 40:26 (watch)
Slide 131 — 40:36 (watch)
![]() | Building confidence is a process that involves many components working together to achieve a functional outcome. |
Slide 132 — 41:04 (watch)
Slide 133 — 41:22 (watch)
![]() | Is your goal to build confidence for others, or is it for yourself? Are you engaging in a self-indulgent study? |
Slide 134 — 41:32 (watch)
![]() | Software reliability can easily lead to the misconception that one can create highly reliable systems in the abstract, without considering the true implications of what reliability entails. |
Slide 135 — 41:40 (watch)
![]() | Reliability has different meanings depending on the context; for example, it differs significantly for a self-driving car compared to a pacemaker. |
Slide 136 — 41:46 (watch)
Slide 137 — 41:54 (watch)
![]() | With Black Friday approaching, will Materialize remain operational as I conduct business? |
Slide 138 — 42:04 (watch)
Slide 139 — 42:16 (watch)
![]() | A crucial aspect of reliability involves design bugs rather than pointer access bugs or similar issues. |
Slide 140 — 42:30 (watch)
Slide 141 — 42:38 (watch)
![]() | I'll pause here. We have time for one or two questions. If the questions are particularly challenging, we may have less time to address them. |
Slide 142 — 43:00 (watch)
Slide 143 — 43:20 (watch)
![]() | To be clear, I did not develop the concept of virtual time; that was done by Jefferson in the 1980s. |
Slide 144 — 43:40 (watch)
Slide 145 — 43:54 (watch)
![]() | I apologize that you didn't hear about it until now, but this is how it actually needs to work. |
Slide 146 — 44:00 (watch)
Slide 147 — 44:44 (watch)
Slide 148 — 45:18 (watch)
Slide 149 — 46:54 (watch)
Slide 150 — 47:58 (watch)
![]() | Thank you. |
Slide 151 — 48:14 (watch)
![]() | Thank you. |