How Building with AI Can Double the Throughput of Your Engineering Team — Brian Scanlan, Intercom

Slide 1 — 0:08 (watch)

I'm Brian from Intercom, and this conference has been great so far.

Slide 2 — 0:24 (watch)

I've gained a lot of valuable insights and inspiration from the talks and conversations with attendees.

Slide 3 — 0:30 (watch)

Intercom is a 15-year-old, privately held, Irish-American B2B SaaS startup that pivoted to become an AI company the weekend ChatGPT was released.

Slide 4 — 0:44 (watch)

We have approximately 1,400 employees across Dublin, London, Berlin, San Francisco, Chicago, and Sydney. Our R&D is primarily based in Dublin, while engineering is mostly located throughout Europe. The presence of forward-deployed engineers has altered this distribution. This graph compares our revenue growth to the growth rate of publicly traded SaaS companies over the past few years.

Slide 5 — 0:58 (watch)

You can see that publicly traded SaaS companies are experiencing a decline. In contrast, Intercom is performing well despite this downward trend. Now, I will shut Intercom down live on stage.

Slide 6 — 1:10 (watch)

I once gave a live deployment talk, which I found to be quite impressive.

Slide 7 — 1:16 (watch)

Intercom has become a leading example of companies redefining themselves in the age of AI.

Slide 8 — 1:22 (watch)

The New York Times recently published an article about SaaS companies reinventing themselves, which prominently featured Intercom.

Slide 9 — 1:52 (watch)

Being an AI company involves much more than adding lightweight wrappers or auto-completing text fields. Our AI agent for customer support, Fin, serves over 8,000 customers and boasts industry-leading average resolution rates, with revenues approaching $100 million. We launched Fin on the same day GPT-4 was released, making it the first product to use GPT-4. We have been developing AI features since around 2018, and modern LLMs have significantly enhanced our ability to resolve customer support inquiries. Companies like Anthropic, Snowflake, Linear, Glean, and LaunchDarkly use Fin for their customer support, demonstrating that SaaS is still very much alive and effective for businesses of all sizes.

Slide 10 — 2:28 (watch)

We recently announced that our own model now serves 100% of Fin's English text-based conversations and outperforms frontier models like Sonnet. It is cheaper, faster, and better. Currently, we achieve about 2 million resolutions per week, and we are also happy to offer direct access to our suite of models.

Slide 11 — 2:56 (watch)

I am a senior principal engineer at Intercom, where I have worked for 12 years in our platform group. We manage Intercom's uptime, performance, security, cost management, and observability. We also look after internal developer productivity for our applications, which are primarily built with Ruby on Rails. At Intercom, we are obsessed with shipping. Shipping quickly and iteratively is the best way to create high-quality products that customers love. We have always invested in developer productivity because shipping is the heartbeat of our company. That idea was the subject of a blog post we published years ago, and Honeycomb even made cool stickers for it.

Slide 12 — 3:20 (watch)

For the past few years, I have focused on integrating AI into our software development lifecycle.

Slide 13 — 3:32 (watch)

I will discuss our excitement about AI and its impact on our company. We have transformed our entire approach to customer support by implementing AI agents. We are eager to accelerate adoption and change our development processes across Intercom.

Slide 14 — 4:00 (watch)

We explored several familiar options, starting with GitHub Copilot, followed by widespread adoption of Cursor, and evaluations of Augment and other tools. However, by the middle of last year, we were dissatisfied with the results. There were some positive signs, and certain tasks became marginally better and more enjoyable, but we recognized the limitations of the models and tools available at the time. We still firmly believed that AI would transform knowledge work. Therefore, in the middle of last year, we established a straightforward goal.

Slide 15 — 4:48 (watch)

Our goal is to double the throughput of engineering this year. At Intercom, we track various metrics and run developer surveys, using tools like DX. However, we have chosen code changes per R&D person as our primary measure of productivity. Every measure has its flaws and can be distorted once people are measured against it, but what we want to see is an overall increase in throughput. By adopting new ways of working and integrating AI across different areas, we anticipate a significant boost in productivity. We call this initiative "2X," which reflects our aim to double productivity without increasing team size. This goal may seem wildly ambitious, but it is also realistic given the advances in models and coding harnesses.
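As a rough illustration of the metric described above, the following sketch computes code changes per R&D person for two periods and the resulting multiple. The `Period` type, the function names, and all of the numbers are made up for illustration; the talk does not describe how Intercom actually calculates this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    label: str
    merged_changes: int  # code changes (e.g. merged pull requests) in the period
    rd_headcount: int    # R&D people in the same period


def throughput_per_person(period: Period) -> float:
    """Code changes per R&D person for one period."""
    if period.rd_headcount <= 0:
        raise ValueError("headcount must be positive")
    return period.merged_changes / period.rd_headcount


def throughput_multiple(baseline: Period, current: Period) -> float:
    """How many times higher per-person throughput is than the baseline."""
    base = throughput_per_person(baseline)
    if base == 0:
        raise ValueError("baseline throughput must be non-zero")
    return throughput_per_person(current) / base


# Hypothetical numbers, not Intercom's data.
baseline = Period("baseline", merged_changes=3000, rd_headcount=400)
current = Period("current", merged_changes=6300, rd_headcount=420)

multiple = throughput_multiple(baseline, current)
print(f"{multiple:.2f}x")  # 7.5 -> 15.0 changes per person, i.e. 2.00x
```

Normalizing by headcount is what makes the "without increasing team size" part of the goal visible: hiring alone raises the raw count of changes but not the per-person figure.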

Slide 16 — 5:22 (watch)

In this talk, I will explain our approach to achieving this productivity increase, our perspective on productivity, and provide a sneak peek at some of our internal data and skills.

Slide 17 — 5:38 (watch)

This work coincided with a significant shift in model and coding capabilities. During the Christmas break last year, one of our principal engineers shared a sentiment that many were feeling: the landscape had changed dramatically. This shift has greatly contributed to our success in achieving 2X productivity.

Slide 18 — 6:04 (watch)

This section focuses on engineering leadership. It is essential to be decisive, provide clear executive guidance, and implement organizational change. We have made several updates, including revising job descriptions. At Intercom, if you are not adopting AI—whether you are a designer, product manager, or engineer—you are not meeting expectations. This is a binary situation.

Slide 19 — 6:32 (watch)

You need to communicate the same message repeatedly across various forums to emphasize the urgency of adopting AI. It's important to recognize and reward achievements. When team members update their skills or accomplish tasks, these updates should be shared in Slack channels to celebrate successes. This encourages collaboration, as people can share techniques and strategies that are working for them.

Slide 20 — 7:02 (watch)

We have organized hackathons and AI immersion days, which are essential for engaging our team. We also have a dedicated full-time staff, with our team size doubling and continuing to grow. We're not simply telling everyone to adopt AI; we are actively supporting our hundreds of engineers and R&D personnel in this transition. In medium to large organizations, it's crucial to have your best people focused on this full-time.

Slide 21 — 7:26 (watch)

We chose Claude Code as our platform. Previously, we allowed people to select their favorite editor without restrictions.

Slide 22 — 7:42 (watch)

Many people are adopting Claude Code, Cursor, and Augment. We believe in the importance of platforms in general, and while it doesn't matter much which one you choose, selecting a single platform is crucial.

Slide 23 — 7:58 (watch)

To a certain extent, you need to overcome model anxiety. It's similar to being multi-cloud; you don't gain the compounding benefits of a well-designed platform if you distribute your work across different cloud providers.

Slide 24 — 8:10 (watch)

It's far more effective to fully commit to one platform, optimizing it and demonstrating its effectiveness. Only consider using multiple agents if there are specific, significant reasons that necessitate such an approach.

Slide 25 — 8:46 (watch)

Our vision is to enable Claude to function like a senior engineer for any technical task at Intercom. We aim to connect Claude to everything, allowing it to perform any action that I can do on my laptop. We are cautious and not reckless, ensuring that Claude cannot delete our databases, and we are a mature company with robust controls, permissions, and audits. This gives us the confidence to integrate Claude into our environments as we do with our engineers. We need to onboard Claude and teach it the same information we provide to new hires, including our Rails conventions, architecture, and React patterns. Over the past 15 years, we have built a lot of software, and Claude must understand our testing standards, security rules, and other Intercom-specific knowledge to perform technical work effectively.

Slide 26 — 9:34 (watch)

If the system encounters an issue or goes down the wrong path, we update the guidance accordingly. This creates a flywheel effect that we all contribute to. We have encapsulated much of this knowledge within our engineering context, capturing the necessary skills, guidance, and hooks to ensure proper functionality. We invest significant effort in optimizing Claude Code. For instance, we deploy our internal Claude plugins directly to everyone's laptops, bypassing the standard Claude Code update mechanisms. This approach helps us avoid the extensive debugging required for Claude Code installations across hundreds of laptops, which can be as challenging as managing Python installations.

Slide 27 — 10:04 (watch)

Ultimately, every aspect of technical work is involved. This includes not only code production and advanced autocomplete but also debugging, testing, and planning. The goal is for you to drive Claude, ideally doing so less frequently as you move up the hierarchy of tasks.

Slide 28 — 10:30 (watch)

It delivers real value by providing code and products to customers. Everything is in scope. Even if the models and harnesses stopped improving, which is unlikely given that the capability curve is accelerating, we already have the building blocks to shift a significant portion of our software development lifecycle to be agent-first. We can pause everything and use this flywheel to examine every piece of work.

Slide 29 — 10:48 (watch)

The tools available today are sufficiently advanced to enable this approach.

Slide 30 — 11:04 (watch)

I wrote some principles to guide us through this transition. When you're trying to get hundreds of people to change their work habits or understand our goals, it's essential to document these principles and provide support. While different principles may apply in various contexts, we believe that all of engineering is evolving. Everything that can be done should be achievable by the agent, which can feel strange, especially when integrating it into production systems.

Slide 31 — 11:36 (watch)

Our role as engineers and product builders is evolving. In the past, I worked as a Unix sysadmin, spending time in data centers racking servers, cabling, and configuring networks. With the advent of cloud computing, I transitioned up the stack, and many others moved from sysadmin roles to Site Reliability Engineers (SREs). This shift focused on automation, leading to more impactful work and higher salaries. Now, we are experiencing this transformation at an accelerated pace across the entire industry, and it feels familiar to me.

Slide 32 — 12:08 (watch)

At Intercom, we are technically conservative and prefer to use single tools that we can master thoroughly. This approach has led us to develop Ruby on Rails monoliths. We are applying this mindset to determine where our focus and attention should be. We need to consider whether we want everyone to create their own multi-agent orchestrators or opinionated workflows.

Slide 33 — 12:48 (watch)

We aim to build durable, testable, high-quality components, encouraging everyone to consider the lifetime value of their outputs. While the specific tools and implementations will evolve over time, documenting our processes at Intercom will remain valuable. Currently, discoverability is a challenge. In practice, we focus on creating small, high-quality, durable, and testable skills that perform exceptionally well. We leverage data and backtesting, drawing from our extensive body of work, including changes, code, and incidents, to validate that these skills operate at a high standard. Additionally, we emphasize continuous improvement, ensuring these components are self-updating and maintain high quality.

Slide 34 — 13:28 (watch)

We don't want to fall behind by relying too heavily on our own implementations. Instead, we aim to adopt new tools and technologies as they become available, such as those from Anthropic. While we may not use Anthropic indefinitely, we are eager to leverage the advantages of software and capabilities developed by others, rather than building everything ourselves.

Slide 35 — 13:58 (watch)

We encourage people to assign problems to agents rather than tasks. Often at Intercom, we prompt agents to execute specific skills, which is generally acceptable and still necessary. However, we are shifting towards simply describing the problem and allowing the agent to determine which skills to invoke. I recently encountered this approach during a security incident where we accidentally published some Snowflake table metadata to a public GitHub repository.

Slide 36 — 14:24 (watch)

Out of habit, I opened Claude Code and instructed it to join a Slack channel to investigate. I was unaware that a skill existed which perfectly encapsulated our data breach policies, criteria, and analysis procedures.

Slide 37 — 14:50 (watch)

Claude automatically downloaded the files, performed a full analysis, and concluded that the situation was innocuous, outlining all the next steps. I didn't instruct it to do this; it figured it out on its own in about two minutes. This task would have taken me 20 minutes and involved tedious work, such as locating the relevant policy and reviewing various documents. While this may seem like a small example, it illustrates how I simply presented the problem of analyzing a security incident. The system understood the intent and used a well-written internal skill to complete the task for me.

Slide 38 — 15:16 (watch)

Even at Intercom, AI adoption is unevenly distributed. While we are ahead of the vast majority of companies, it's essential to help people understand their current capabilities and grow towards effectively using agents in their work. CVA recently discussed a maturity rating for engineers, which parallels our internal approach. We aim to guide individuals through different levels of proficiency.

Slide 39 — 15:40 (watch)

Ultimately, you will master all skills and become proficient with the tools. We want people to use Claude Code for everything, automate their work, and develop their skills. This includes writing and improving skills, as well as optimizing the environment for agents. This optimization can involve various aspects, such as software architecture, documentation, and other methods that enhance the agents' effectiveness and align with their strengths.

Slide 40 — 16:04 (watch)

We have reached a significant milestone. After fully committing to one tool in December, we began the rollout in January. Within less than a year, we achieved a doubling of our pull request throughput.

Slide 41 — 16:46 (watch)

Here is additional data from our internal dashboards. The share of pull requests produced with Claude Code is in the 90s, in percentage terms. Our current bottleneck is code review, but we have a 17.6% automatic approval rate. This process is more complex than simply asking an agent for approval; we have done extensive work using backtesting and historical data. We involve humans to label outputs and assess the confidence levels of the automatic approvers, which helps shape pull requests to be safe and straightforward. This approach allows for automatic approvals that should have been implemented from the start. Additionally, we have worked with our auditors on SOC 2, ISO 27001, and HIPAA compliance, confirming that human approval is not necessary to meet these certifications.
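The backtesting idea above can be sketched as follows: replay historical pull requests that humans have labelled, and choose a confidence threshold for the automatic approver that keeps false approvals within a budget. This is a minimal sketch under assumptions, not Intercom's implementation; `LabelledPR`, the confidence score, the candidate thresholds, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledPR:
    confidence: float  # hypothetical approver confidence score in [0, 1]
    human_safe: bool   # human label: was automatic approval appropriate?


def backtest(prs, threshold):
    """Return (approval_rate, false_approval_rate) for one threshold."""
    approved = [pr for pr in prs if pr.confidence >= threshold]
    approval_rate = len(approved) / len(prs) if prs else 0.0
    bad = sum(1 for pr in approved if not pr.human_safe)
    false_approval_rate = bad / len(approved) if approved else 0.0
    return approval_rate, false_approval_rate


def pick_threshold(prs, candidates, max_false_approval_rate):
    """Lowest candidate threshold whose false-approval rate fits the budget."""
    for threshold in sorted(candidates):
        _, false_rate = backtest(prs, threshold)
        if false_rate <= max_false_approval_rate:
            return threshold
    return None


# Made-up labelled history for illustration only.
history = [
    LabelledPR(0.99, True), LabelledPR(0.97, True), LabelledPR(0.95, True),
    LabelledPR(0.90, False), LabelledPR(0.85, True), LabelledPR(0.60, False),
    LabelledPR(0.40, True), LabelledPR(0.20, False),
]

chosen = pick_threshold(history, [0.5, 0.8, 0.92], max_false_approval_rate=0.0)
print(chosen, backtest(history, chosen))
```

Picking the lowest threshold that stays within the budget maximizes the share of pull requests that skip human review while holding the error rate measured on labelled history at or below the chosen limit.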

Slide 42 — 17:32 (watch)

You need to know exactly what you're doing and ensure you have proper auditing controls in place. By moving approvals to a well-organized, tested, and competent suite of agents, including Codex for code reviews, we can confidently conduct multi-model code reviews. We are confident that this approach does not degrade the environment or add risk. In fact, it reduces risk because well-defined agents perform better than humans in these scenarios.

Slide 43 — 18:00 (watch)

Here’s skill invocation. I believe the earlier numbers were inaccurate. We integrate everything into Honeycomb, with hooks for basic information about which skills are being invoked. This data is internally available and contains no private information, allowing everyone to understand what is being used and where. Additionally, we store all session transcripts in S3 for data mining, report writing, and evaluating the effectiveness of our skills.
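One way to picture the hook-based telemetry described above is a small function that turns a raw hook payload into an event containing only allowlisted, non-private fields, plus a counter over those events. This is a sketch under assumptions: the payload shape and the field names (`skill_name`, `session_id`, `repo`, `prompt`) are hypothetical, and actually sending the events to Honeycomb is left out.

```python
import json
from collections import Counter

# Hypothetical allowlist: only these fields leave the laptop.
ALLOWED_FIELDS = ("skill_name", "session_id", "repo")


def build_event(raw_payload: str) -> dict:
    """Parse a JSON hook payload and keep only allowlisted fields."""
    payload = json.loads(raw_payload)
    return {key: payload[key] for key in ALLOWED_FIELDS if key in payload}


def count_by_skill(events) -> Counter:
    """Count how often each skill was invoked."""
    return Counter(e["skill_name"] for e in events if "skill_name" in e)


# Made-up payloads for illustration; "prompt" stands in for private content.
raw_payloads = [
    json.dumps({"skill_name": "fix-flaky-spec", "session_id": "s1",
                "repo": "monolith", "prompt": "private text"}),
    json.dumps({"skill_name": "fix-flaky-spec", "session_id": "s2",
                "repo": "monolith"}),
    json.dumps({"skill_name": "incident-analysis", "session_id": "s3",
                "repo": "infra"}),
]

events = [build_event(raw) for raw in raw_payloads]
print(count_by_skill(events).most_common())
```

Using an allowlist rather than a blocklist means any new field added to the payload stays private by default, which fits the goal of making the usage data safe to share internally.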

Slide 44 — 18:20 (watch)

We have established a feedback loop using the session data, which allows us to maximize its value. We are already implementing some interesting initiatives with this data.

Slide 45 — 18:48 (watch)

We are not particularly proud that the number of defects was increasing until recently, but we are closing defects faster than ever. Some teams have been inspired by the move to AI to consider goals like backlog zero or addressing hundreds or thousands of defects. While some of this has been deliberate and planned, there is also a natural deflation occurring as we process this work more efficiently. We are seeing significant progress in this area. Additionally, we have been collaborating with a research group at Stanford, and according to their metrics, our code quality has been improving over time.

Slide 46 — 19:14 (watch)

I'm running out of time at this point.

Slide 47 — 19:24 (watch)

We have hundreds of contributors and thousands of lines of code in our Claude Code plugins. The project is very active, and Claude itself supports us.

Slide 48 — 19:34 (watch)

Here’s an example skill. We have base plugins that handle session transcripts, session syncing, and safety hooks.

Slide 49 — 19:48 (watch)

Here’s a skill I built that fixes flaky specs. We have hundreds of thousands of tests, and they tend to become flaky over time. Since we ship frequently, we need to address these flakes efficiently.

Slide 50 — 20:02 (watch)

This skill was not built by me alone. I did not sit down and determine all the necessary steps to fix flaky specs.

Slide 51 — 20:28 (watch)

I’ve worked within a feedback loop, giving the agent a goal and guiding it to the right solutions while addressing many flaky specs. It has produced a well-organized output, including cheat codes and lookup tables, using progressive disclosure. The results are impressive; if our most senior Rails engineers had produced this, I would be amazed by their work. Additionally, we faced challenges like our CI breaking down, which we need to resolve. Claude Code has gained significant traction across Intercom, becoming widely adopted beyond software. People are eager to use our consoles.

Slide 52 — 21:04 (watch)

We are considering the future of engineering, including the possibility of merging product management and design. The single-person team product experiments have been particularly interesting. I've even shipped code that users can utilize in their agents to sign up for Intercom. I've been leveraging our skills to function as a product manager, which is quite remarkable. I wish you all the best of luck. If you are not already implementing these practices, you will likely be doing so in the near future.

Slide 53 — 21:20 (watch)

My contact details are at brian.scanlan.ie. You can visit ideas.fin.ai for more information about Intercom and our agents. Thank you.
