Vulkan was created to address the inefficiencies of older APIs like OpenGL and Direct3D, which were easy to use but did not provide the performance and control developers needed. The need for Vulkan emerged around 2012-2014 when developers struggled to get high performance from GPUs using existing APIs, leading to the development of more efficient, low-level APIs like Mantle, DX12, and Vulkan.
Vulkan supports a wide range of platforms including PCs, consoles, Android devices, and even embedded devices. It is fundamental to Android and is supported by most modern mobile phones. Vulkan is also found in less obvious devices like Coke machines and other embedded systems with modern GPUs.
The primary goal of Vulkan is to provide developers with direct control over the GPU, reducing overhead and enabling high performance. Unlike older APIs like OpenGL, which presented a CPU-like programming model, Vulkan exposes the massive parallelism of GPUs, allowing developers to write more efficient and powerful applications.
The Vulkan Working Group feels responsible for ensuring the API works in the ecosystem, even though they don't have authority to dictate how it develops. They track developer feedback, survey members and advisory panels, and address issues that arise. They also leverage external efforts, such as those funded by Valve, to improve the ecosystem around Vulkan.
The Khronos Group is an international consortium and standards body that connects software to hardware. It includes companies like ARM, Samsung, Intel, AMD, and game engine companies. The Vulkan Working Group, a part of Khronos, designs and maintains Vulkan, ensuring it meets the needs of developers and hardware vendors.
The Vulkan roadmap is a collection of extensions and features that GPU vendors and implementers aim to support by a specific date. It is designed to bring cohesion to the development process and help developers understand what to expect. Roadmaps are planned years in advance, with milestones every two years, and they address specific subsets of devices like mid to high-end smartphones, desktop PCs, and consoles.
Vulkan offers compute capabilities but is not as orthogonal and clean as OpenCL. While Vulkan can handle compute tasks, it is more focused on graphics. The working group is working to bring compute parity with OpenCL, including features like 64-bit addressing, state management improvements, and robustness features to ensure safety and reliability.
SPIR-V (Standard Portable Intermediate Representation) is a binary intermediate representation for shader programs. It standardizes the intermediate representation, allowing multiple front-end languages to generate it. This reduces the complexity and bugs in compilers and allows drivers to focus on translating the intermediate representation to the hardware instruction set. SPIR-V is a powerful choice that supports multiple shader languages like HLSL and GLSL.
To get involved in the Vulkan Working Group, the most common route is to work for a company that is a member of the Khronos Group, such as a GPU vendor or a company that uses Vulkan. Alternatively, small companies can join as associate members, which is less expensive and allows participation without a vote. Individuals can also contribute to the open-source aspects of Vulkan, such as bug reports and sample code.
Vulkan is a low-level graphics API designed to provide developers with more direct control over the GPU, reducing overhead and enabling high performance in applications like games, simulations, and visualizations. It addresses the inefficiencies of older APIs like OpenGL and Direct3D and helps solve issues with cross-platform compatibility.
Tom Olson is a distinguished engineer at Arm, and Ralph Potter is the lead Khronos standards engineer at Samsung. Tom and Ralph are also the outgoing and incoming chairs of the Vulkan Working Group. They joined the podcast to talk about earlier graphics APIs, what motivated the creation of Vulkan, modern GPUs, and more.
Joe Nash is a developer, educator, and award-winning community builder who has worked at companies including GitHub, Twilio, Unity, and PayPal. Joe got his start in software development by creating mods and running servers for Garry's Mod, and game development remains his favorite way to experience and explore new technologies and concepts. Welcome to the show. Thank you so much for joining me today. How are you doing? Hi, Joe. Doing good.
This is Tom Olson, by the way. Awesome. Perfect. How about you, Ralph? Hi, Joe. Yeah, I'm doing good. Thank you for having us both. Awesome. So to kick us off, you know, I introduced you both there as the chairs of the Khronos Vulkan Working Group. There's a bit more to both of your stories. Tom, do you want to kick us off by introducing yourself and how you came to be working on Vulkan? Sure. So I work for ARM, which is known as a CPU company, but we do also make GPUs, the Mali GPU family.
And I've been a professional graphics standards committee chair for the past 18 years. It's a bit horrible. I started chairing the OpenGL ES standard, which some folks may have heard of. It's the mobile version of OpenGL.
And when the need came up for Vulkan in 2014 or so, I, for my sins, picked up the flag and ran with it and helped to get that effort started. And I've been doing that ever since. I've reached the stage where it's time for me to get out of the way and let younger people take charge. So, hence Ralph.
Perfect. Yeah. And speaking of those younger people, Ralph, how about you? Yeah. So I work for Samsung where I'm located in our GPU team, specifically Samsung Mobile, the portion who make kind of mobile phone handsets. I've been in the Vulkan working group for six or seven years now, five of them representing Samsung. Yeah.
And yeah, I don't have Tom's 18 years of experience of chairing, but I have a little bit and it will be hard to replace Tom, but we will do our best. Of course. Can I point something out? Yeah, please do. Both Ralph and I have been working on standards before our current employers. This is, it's kind of the culture of the group. There's a large number of people involved with creating Vulkan standards
where their involvement persists across multiple employers. It kind of gets in your blood, it's hard to put down, and it's a rare skill that companies need. So they will often hire you to keep doing what you're doing. Fascinating. So actually, yeah, that leads me into, I guess, a topic that I think would be interesting to explore, which is
how Vulkan is developed and what Khronos is. But I guess first to set the scene for folks who somehow aren't familiar with Vulkan. Tom, can you tell us briefly what Vulkan is? Sure. So Vulkan is the modern way to program GPUs. In the past, you've heard of APIs like DX9 and OpenGL, and those were kind of graphics APIs. And there was a big magic driver that turned that into GPU commands.
With Vulkan, what we've done is remove, it's not really a graphics API, it's an API for controlling and programming a GPU. All GPUs today you probably know are highly programmable. They have multiple execution engines and you can write code for them, but getting it to run and run efficiently in parallel on the GPU is difficult. Vulkan's job is to expose that power.
That's a really interesting distinction between, you know, it's not a graphics API, it's an API for controlling a GPU. Very, very interesting. So I guess what are the, you know, you mentioned, obviously, it's not easy to control that power. What are some of the advantages of this approach? What problems are you directly looking to solve versus the previous generation?
So in the previous generation, the basic problem is that a GPU's programming model is incredibly different from a CPU programming model, even a CPU cluster programming model. Massively, massively parallel, data parallel typically, and increasingly flexible too. And the basic problem was that the old generation of APIs, OpenGL, et cetera, presented a sort of very...
nice, convenient, but very CPU-like programming model where you gave a command, the device did it. You gave another command, the device did it. GPUs don't work like that at all. You queue up massive numbers of commands and you shove them in, and the driver and the hardware take them all apart, execute everything. It's kind of like a data flow paradigm. Everything runs as soon as it can, as long as the hardware understands the dependencies. And so the result is that
With the old APIs, you couldn't get the efficiency you wanted because you were working through this very thick abstraction and this very sequential abstraction.
You need an API that exposes the massive parallelism of the device. And so Vulkan does that. Specific problems we had in the old days, OpenGL had no real way to make use of multiple CPU cores. So that if you were trying to keep the GPU fed with commands, because the GPU is a voracious consumer of data and commands,
And you might need multiple threads in order to generate those commands fast enough. You couldn't do it. It wasn't in the programming model. Vulkan solves those problems. So one of the goals of Vulkan, as I understand it, was to be hugely cross-platform and support lots of platforms, which I guess is kind of shown by both your background, between ARM and Samsung, obviously focusing on a whole world of devices.
Ralph, can you talk about what kind of platforms you're looking to support with Vulkan? Because I understand it's not... I think most people think of PCs and the consoles, right? I understand it's like a whole world of things. Yeah, sure. So definitely it exists on PCs. All of the desktop GPU vendors have Vulkan drivers. It exists on some of the consoles, both handheld and dedicated ones. It is...
pretty fundamental to Android these days and becoming more important. So the vast majority of mobile phones that you can buy today will support Vulkan. There are some outliers, but the vast majority will, at least in the Android space.
You will also see it, it may not be so obvious, but it also exists on other devices as well, more embedded devices, appliance type devices. You might find there is Vulkan in there even though it's not obvious to you as a user. So the short answer is if it's got a relatively modern GPU, there's a good chance that there is a Vulkan driver available somewhere.
Right. I think I saw in a talk from SIGGRAPH that someone mentioned a Coke machine. That might be you, Tom, actually, with that Coke machine example. Yeah, yeah, yeah. Yeah, particularly cursed programming environment. Cool. So yeah, I think that really sets the foundation for what Vulkan is. So I guess too, I wanted to talk a little bit about the history, which obviously you mentioned a bit in your intro, Tom, about, you know, how it,
you know, around the start of it and transitioning from OpenGL and OpenGL ES to Vulkan. You mentioned that there was a need for Vulkan and that's when you started working on it. What was that need? Can you give us the run-up and the history to Vulkan? Sure. So if you go...
Boy, it's shocking how long ago this was. So go back to, say, 2012 to 2014. The dominant graphics APIs of the day were OpenGL and DX11, which was the most modern version of DirectX on Windows. And they both had this problem that they were lovely programming environments. It was very comfortable and easy to move into those environments.
Using them, they worked the way a CPU programmer would expect a graphics API to work. But people had enormous difficulty getting performance out of the devices. So as a result, for example, on consoles, nobody used them at all. Well, they used DX a bit on Xbox, but mostly they just threw them away because they could not get the performance.
And so around that time, you could say it was kind of a revolution. The farmers with the pitchforks, developers were looking for alternatives. And you had things emerge like Mantle, which was an AMD proprietary API since AMD had cornered the console market at the time.
And it had this property that it exposed the parallelism at the price of not being as nice a programming environment. It was much more painful to use and complex to think about, but it gave you the power. And so people were very excited about that. And there was talk of moving toward it. Microsoft began moving in the same direction with an API called DX12.
which I should say DX12 is to DX11 pretty much what Vulkan is to OpenGL. We divide the world down into modern GPU APIs, which is Vulkan, DX12, Metal,
and the old ones, which is OpenGL, OpenGL ES, DX9, 10, 11. So these modern APIs were emerging and we felt that OpenGL was going to be left behind. We could see that it had problems. Well, I would say we were all working
Ralph too, I believe, we were all working in the OpenGL space. I was chairing OpenGL ES. We could see that there was no way we could evolve those APIs in a gradual way to meet the need, to provide the efficiency that developers were just demanding. And so that led to kicking off the effort. We kicked it off in 2014, took us a year and a bit. We came out in early 2016 with Vulkan 1.0.
So did that answer your question? It did. And it also answered another question. You know, obviously, DirectX 12 and Metal, my timeline is a bit fuzzy, but my understanding is they're kind of all at the same time, all the same generation. And so I was going to ask, you know, how did that happen? Why were they all at the same time? But I think you've really filled that in. So I guess one of the things you mentioned there is, you know, this kind of step change in developer experience in terms of, you know, what the developers could expect the API to do for them and how much extra work they had to put in.
How do you go about navigating that in terms of an API design philosophy? You're trying to meet the needs of the users as graphics programmers. There was a demand for this level of API, but I imagine you still have a lot of people who are still expecting the affordances of the old API. How do you juggle that? Painfully. It's a constant, I won't say balancing act, but it's a constant debate between
To be frank, in Vulkan as it is today, well, certainly in Vulkan 1.0, we created an API which gave developers what they said they wanted, but was frankly quite difficult to use. And we made sort of an intellectual commitment. And by the way, when we started this effort, we had massive participation, particularly from game engine companies, Epic, Valve, and
Well, Valve first and foremost, they were real champions of Vulkan early. But Epic, Unity, all the majors were there. And we had a big fight about, are we going to make concessions to ease of use? Or are we going to say performance is first, full stop? And we pretty much said, no, performance is first. We will never sacrifice that.
What's happened is that the hardware has gotten easier to use. And so Vulkan has gotten easier to use in parallel. And modern Vulkan is not nearly as gnarly and, you know, sharp edged as Vulkan was early.
Well, I think I'm rambling a bit here, but I'm trying to make sure I hit the various aspects to this. I would say one way we deal with this problem is by tooling. So a feature of Vulkan that we think is one of the best ideas ever. I wish I could remember who in the group had it. So a feature of Vulkan, since it's dedicated to efficiency before all else,
It does not check for errors. When you give a command to a Vulkan function, if the commands you give it are meaningless, the specification says you get undefined behavior, possibly including program termination. So you make one mistake and it's dead. Driver restarts. What do you do? Well, what we do is Vulkan has a defined interface to shim layers.
We call it the layer system. And so when you create a Vulkan instance, you're a programmer, your program says, I want to use Vulkan, give me a driver, please. You go through this negotiation. You can say, please install the validation layer on top of Vulkan. So if you do that, you get the same interface, all the same functions.
But when you call, if you pass garbage information into a Vulkan function, the validation layer checks it before it calls the underlying driver. And it logs an error if you did something wrong. The validation layers are incredibly powerful and useful. The investment that's gone into them is millions and millions of dollars. It's very complex software. A lot of it paid for by Valve.
a lot of other stuff written by members, but it's one of the most important things. So we're trying to provide you with
a safe and sane programming environment, but not at the price of slowing down the hardware. So the idea is you develop with validation turned on. When you ship your code, you turn it off and suddenly everything runs much faster because the driver's not checking any errors itself. Ralph, what have I forgotten? I kind of rabbit-holed on. No, I think all of that is correct.
If I was going to give one quick piece of developer advice, I would say if you're writing a Vulkan application and you are doing it without the validation layers enabled, you are doing it wrong and you will come to regret it. They are pretty fundamental to that. I also agree with Tom that this balance of
usability and the challenge of using the API is a difficult problem. If you go back to our kind of 2016 launch publicity, at the time we said Vulkan is not the API for everybody. I think it has become less thorny to use as time has gone on. There is still no free lunch. Like it is still definitely
harder to get started in Vulkan than it is to get started in OpenGL. There is a higher expectation. Tom said that this is an API to control a GPU. There is a higher expectation that you will understand how a GPU functions than maybe there was in OpenGL. I think once you have that understanding and once you've got over the initial hurdles of how to get started,
The fact that it is more predictable, there are fewer unexpected driver heroics going on. There is a place in which you can say it is more workable, but it requires a certain base level of understanding. Certainly the barrier to entry is higher. That's kind of undeniable. It's kind of intrinsic to what we built.
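To make the validation-layer workflow described above concrete, here is a minimal sketch (illustrative, not code from the episode) of asking the loader for the standard VK_LAYER_KHRONOS_validation layer when creating an instance. The layer name and the create-info pattern are standard Vulkan; the application name and version are just placeholders.

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

int main() {
    // Ask the Vulkan loader to stack the Khronos validation layer on top of
    // the driver. During development it checks every call; release builds
    // simply leave enabledLayerCount at 0 and pay no checking cost.
    const char* layers[] = { "VK_LAYER_KHRONOS_validation" };

    VkApplicationInfo app{};
    app.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.pApplicationName = "validation-demo";   // placeholder name
    app.apiVersion = VK_API_VERSION_1_3;

    VkInstanceCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    info.pApplicationInfo = &app;
    info.enabledLayerCount = 1;
    info.ppEnabledLayerNames = layers;

    VkInstance instance = VK_NULL_HANDLE;
    if (vkCreateInstance(&info, nullptr, &instance) != VK_SUCCESS) {
        std::fprintf(stderr, "instance creation failed; is the validation layer installed?\n");
        return 1;
    }
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```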
It's a really interesting point about if you understand how a GPU works, it's easier to use. I feel like in recent years, there's been a lot more general awareness of how GPUs work among base-level programmers because of general-purpose GPU computing, and obviously ML and AI is accelerating people's usage of GPUs. Do you think that understanding is becoming more widespread in the developer community and making it easier for developers at large to use Vulkan or...
I think, well, it's hard to answer that question without comparing where we are today versus where we were with the old APIs. There is a lot of information out there nowadays. There's been a lot of presentations, a lot of talks.
A lot of documentation from GPU vendors that will tell you nowadays how things work. In the OpenGL days, I'm not sure how much of the same information was out there for the general public to consume in the first place. That makes sense and goes along with Tom's point about, you know, because the graphics cards are easier to use, Vulkan is therefore easier to use. Absolutely.
So you mentioned, throughout the explanation, you mentioned Valve and you mentioned members and sponsors. And we had the introductions both to you about working for members of the group. So I think it'd be great to talk now about what the Khronos Group is, how it's organized, and how Vulkan is developed. Tom, would you like to kick off with, I guess, start with what is Khronos and the working group? Sure. So Khronos is, well, it's an international consortium and standards body with the mission statement of connecting software to hardware.
And so they generally, most of the products they create, though not all, are interfaces between some gnarly piece of hardware like a video accelerator, a computer vision accelerator, a graphics accelerator, and applications.
And their standards kind of range in approach from being hard over in the direction of developer friendly and relatively easy to use, to things like Vulkan, which are, you know, here be dragons, but you'll get power if you use them. It's got about 100 to 120 members today, I think. And it includes, you know, Samsung, ARM, Intel, AMD,
silicon companies. It includes game engine companies, Valve, Unity, Epic, several others. It includes software consultancies like
LunarG, who create the SDK and the validation layers for us. There are other kinds of people involved. It's a wide variety. The thing that ties us together is that there is an IP agreement, and this is very important. You don't want to read the legalese, but in a nutshell, we agree that if you have patents that are necessary to implement Vulkan or any other of our ratified standards,
You agree to license them to implementers of Vulkan at no cost if they're necessary. Patents on techniques for implementing Vulkan, like particular circuits for this or that, you can own and you can enforce the patents. But something that the standard itself necessarily infringes, you have to license or you can withdraw from. There's a way to withdraw yourself, but it's suicidal. So it's very rare to do that.
So that's Khronos at a high level. There's a board, there's all this infrastructure. Then there's the Vulkan Working Group, which has the members that I said. On a typical call, we have maybe 40 people from maybe 20 companies participating. We do design work on new
functionality and we do this maintenance call, which we did this morning where we go through and fix the corner cases and answer developer complaints. We have a presence on GitHub and anybody in the world can come in and say, "This doesn't seem to work the way I thought it did. Is the spec wrong or what?" We will jump on that and answer that. Often it results in spec clarifications.
We do other things. We make a conformance test so that if you're implementing Vulkan and the spec says implementations must do this, we make sure they do. And that's our biggest expense. We spend about a half million a year, more than that, actually, writing those tests. We have software contractors who do that for us. Then there's the SDK and the tooling part.
There's a compiler that is used for, as I said, GPUs are programmable. You program them in special languages and there's a compiler for that. Those are all things that we maintain as part of this effort. Okay, that opens up so many questions. I'm going to start with the conformance test. So you mentioned if there's an implementation, you will test it and that's the responsibility you take. So when you say an implementation, what does that mean and why?
I guess my question that came from that is, if someone goes and starts a new Vulkan implementation, just like some random person, and then they say, "Cool, do you have to test it?" Is that how that works? So here's how it works. Vulkan is a trademark of the Khronos Group. And if you want to use that trademark,
you have to have permission, and the Khronos Group's guidelines on the website say you can use it for conformant implementations. And there's some weasel words about, you know, if it's in development and it's not certified yet. But basically, when you've got your thing done, and sorry, let me back up and say, typically, a Vulkan implementation is a device driver.
It comes from a GPU vendor and you get it and install it on your machine. Because you have the right GPU, you install their driver. We have the ability, this is built into the infrastructure, that if you have two separate graphics cards in your machine and two GPUs from two different vendors, you can put in drivers for both of them and it'll work.
And your application will have to choose when it's setting up, starting to use Vulkan, say, well, which device do I want to run on? But generally, a Vulkan implementation is a device driver. There's a lovely software implementation out there called LavaPipe. And if you are experimenting and learning and having trouble or you just don't want to install a device driver, you can use LavaPipe. It's quite efficient, actually.
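As a small illustration of the "which device do I want to run on" step Tom mentions, here is a hypothetical sketch (not from the episode) that enumerates the physical devices behind an existing VkInstance; a software implementation such as LavaPipe would show up here as a CPU-type device alongside any real GPUs.

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

// Assumes `instance` is a valid VkInstance created elsewhere.
void listDevices(VkInstance instance) {
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);        // query count
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(instance, &count, devices.data()); // fill list

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceProperties props{};
        vkGetPhysicalDeviceProperties(dev, &props);
        // deviceType distinguishes discrete and integrated GPUs from CPU
        // implementations (VK_PHYSICAL_DEVICE_TYPE_CPU, e.g. LavaPipe).
        std::printf("%s: type %d, Vulkan %u.%u\n",
                    props.deviceName, props.deviceType,
                    VK_API_VERSION_MAJOR(props.apiVersion),
                    VK_API_VERSION_MINOR(props.apiVersion));
    }
}
```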
Sorry, I got sidetracked. No, absolutely. That's perfect. Yeah, LavaPipe sounds really cool. And I think that ultimately answered my question. So sorry, I'm now going backwards through your answer to how Khronos works. So when it comes to participating in the working group, you work at Arm, Ralph works at Samsung. What is the, I guess, arrangement there? Do the members give employees over to the working group full-time? How does that work? No. So, well, it's up to the member.
Members of Khronos are typically companies. There are a small number of individual contributors approved by the board, but generally to work in Khronos, your company joins, they pay a fee, it's a couple of tens of thousands a year.
And then they have the right to participate. And what that typically means is they tell certain of their employees, part of your job is to go to the meetings and contribute to making Vulkan better and making Vulkan work for the community. So in my case as chair, it's been a full-time job for me, but that's rare.
For most of our members, they're working maybe 10 to 40, 50% of their time devoted to working on Vulkan and the rest, they're doing things for their own companies.
Participation involves, in the case of Vulkan, two 90-minute calls a week, one for new tech and one for old tech. Plus we have subgroups. There's a separate group that deals with ray tracing, and they have their own meeting. There's a separate group that deals with machine learning, and they have their own meeting.
There is a separate group for dealing with the programming language that is used to program the programmable parts of the GPU. And we have a few others. There's like a marketing committee, et cetera. So you can get as involved as you want. You can't really be effective if you aren't spending at least 10% of your time on it because you kind of need to be known. You need to have traction. You need to understand what's going on. And there's a minimum cost to that.
Yeah, that makes total sense. This episode is brought to you by WorkOS. If you're building a B2B SaaS app, at some point your customers will start asking for enterprise features like single sign-on, SCIM provisioning, fine-grained authorization, and audit logs. That's where WorkOS comes in, with easy-to-use and flexible APIs that help you ship enterprise features on day one without slowing down your core product development. Today, some of the hottest startups in the world are already powered by WorkOS.
including ones you probably know, like Perplexity, Vercel, Brex, and Webflow. WorkOS also provides a generous free tier of up to 1 million monthly active users for user management, making it the perfect authentication and authorization solution for growing companies. It comes standard with rich features like bot protection, MFA, roles and permissions, and more. If you are currently looking to build SSO for your first enterprise customer, you should consider using WorkOS.
The APIs are easy to use and modular, letting you pick exactly what you need to plug into your existing stack. Integrate in minutes and start shipping enterprise plans today. Check it out at WorkOS.com. That's WorkOS.com.
So I guess getting into the particulars, you've got those meetings, but Vulkan has, like any software project, a cadence of releases, and the things that get added in those meetings have to come out somewhere. Ralph, as the person who's now responsible for this, can you talk to us about the roadmaps and how they're constructed? There's a couple of terms that I think would be useful, because I know there's been some change in how the roadmap and how versions have worked over the years, from core versions to roadmap profiles to milestones. Can you lay all that out for us and how it works? Yeah. Yeah.
So the first thing to understand about Vulkan is that we have a core API, which is kind of
the default set of things that everybody implements, the mandatory requirements. On top of that we have a notion of a thing that we call an extension, which is another package of functionality that GPU vendors or implementers can implement if they feel that it's valuable. It's essentially an optional piece of functionality that they might decide their market and their customers see value in.
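As a rough sketch of how this split looks from the application side (an illustrative example, not from the episode): core functionality is simply there for the API version the device reports, while optional extensions have to be discovered by enumeration before you rely on them. The extension name used here is only an example.

```cpp
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

// Returns true if the device advertises the named optional extension,
// e.g. "VK_KHR_ray_tracing_pipeline". Core features need no such check;
// they are implied by the device's reported apiVersion.
bool hasDeviceExtension(VkPhysicalDevice device, const char* name) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(device, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> available(count);
    vkEnumerateDeviceExtensionProperties(device, nullptr, &count, available.data());
    for (const VkExtensionProperties& ext : available) {
        if (std::strcmp(ext.extensionName, name) == 0) return true;
    }
    return false;
}
```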
And the way that we used to do things is that we would release a core version on a pretty regular two-year cadence, which would roll up a certain amount of functionality that had been exposed in extensions, but extensions would flow out throughout the year on a pretty ad hoc frequency.
And a few years back, we came to the realization that this was getting extremely difficult to handle as a software developer.
We have 11 adopters, I want to say, off the top of my head, somewhere around that figure. And we had all made different decisions about what we felt were valuable extensions. The extensions themselves contain optional sub-functionality in them. And identifying exactly what you could expect as a developer became very difficult. On top of which, we had always taken the view that
The core API had to be capable of running on more or less everything. You referred earlier to Tom saying Vulkan on Coke machines. That was sort of a constraint on the core API. From 2016, when we launched Vulkan 1.0, we never raised the minimum specs that you needed to run the core on. And so our only approach was to add more and more extensions.
As a route to trying to bring some order to that, our process now is we define a thing that we refer to as a roadmap, which is, again, a collection of extensions and features. But we say for a particular subset of devices, we describe it as immersive graphics devices. You can think mid to high end smartphones, desktop PCs, consoles, etc.
that everyone will ship devices that fit the requirements of a particular roadmap by a particular point in time or approximately that point in time. So we hope that brings some cohesion. Those have dates on them. So we've released a roadmap 2024 earlier in the year. I don't think it will be a huge surprise to anyone to say that there will be another one coming. And we have now got into a model where we can plan
the rough content of roadmaps many years out in advance, which is also important because hardware roadmaps are amazingly long. And if we need to have a conversation in the working group about we would like us all to support feature X,
If somebody doesn't have it in hardware, we're talking about probably five years to go from a hardware design to an implementation to something that shows up in a product. And so we have a roadmap for a couple of years out, and we have more tentative things for as far out as 2030. And when we get that far out, they're kind of nebulous. Maybe they won't arrive in practice, but there's a structure to it now.
And so I think my message would be if you're a developer trying to figure out where we're going, the roadmap tells you directionally where we're going. The core API is supposed to tell you what you can rely on to exist on
Any device that has updated drivers is kind of how I would categorize it. That hardware view and the amount of time that adds to it is, yeah, that's a really fun constraint for working in this world. So obviously you've just told me there that you do plan out far enough in advance, which leaves me no choice but to ask what's next, the next milestone, which I guess is 2026 if it's every two years, right? Yeah.
There is a milestone planned for 26. I believe that we shared some of this at SIGGRAPH. Yeah, I had slides on it, but I can't remember. I am now trying to recall your slides, but I mean, things that I know are on there. We have some work on debugging improvements. I believe there's some work on compute improvements. I believe we've talked about ML work to come.
We've got a couple of robustness features. Well, we have this expectation that WebGPU, which is the Web Graphics API, which is vaguely Vulkan-like but much friendlier and does a lot more work for you and therefore runs slower. But it is what it is. It'll be great for learners and...
Anyway, so they have very strict requirements for safety because you're going to run code off the web and you have no idea what it is and you don't want it to screw up your machine. So a bunch of robustness features that will make it, we hope, possible to write an interpreter for WebGPU that is absolutely impossible to crash from the outside. We have that. We have a bunch of stuff that
related to getting compute parity with OpenCL, which is a lovely higher level computing API for GPUs.
primarily. It's not absolutely limited to GPUs. But you can also do compute in Vulkan, but it's not as nice. It's not as orthogonal and regular and clean. And there are some safety features that aren't present in Vulkan that are present in OpenCL. So we're trying to have some uniformity there. Things like 64-bit addressing. You don't necessarily have it in Vulkan, but we'll have
In Roadmap 2026, we'll be basically saying all of the interesting, highly programmable devices that support lively open software markets are going to have 64-bit addressing in the GPU. So these things are coming.
The ability to cast pointers. Vulkan doesn't have it. Some state management improvements. State is the bane. Okay, now we're going to get too far into stuff you don't want to hear about. But managing, GPUs have enormous amounts of
state and you typically set up all this state and it defines a virtual machine and then you shove data through it with a shovel as fast as you can shovel and it all just works. But managing that state is a nightmare because you want to change it and start shoveling more data, but the old data isn't finished running through. Anyway, so we have state stuff in mind, ML stuff, as Ralph said. Obviously,
Okay, this goes to a meta point. What is Vulkan's job? In our view, we have this discussion with our board of directors who persist in thinking of Vulkan as a graphics API. In our view, the mission of Vulkan is to do whatever people want to do with a GPU. And so GPUs, for example, on your desktop,
Graphics card has a video decoder in it. They all do. And so exposing video is part of Vulkan's job, and we do that. People use GPUs for machine learning. It's like the dominant platform for machine learning. And so therefore, it's in our wheelhouse to expose machine learning on GPUs. So anything people want to do with a GPU, we want to provide what you need to do it.
I think, Ralph, you did cover the debug, for example. Ralph was one of the leaders of getting debug functionality in. Yeah, I think you mentioned the other day you worked on some of the extensions prior to your chairship, right? One of my first initiatives in arriving in the Vulkan working group as Samsung's representative was, this is not unique to Vulkan, but debugging what happened to your GPU when it crashed is...
a really thorny, painful problem for developers. And so, yeah, when I arrived, one of my first initiatives was the working group should really do something about this problem. And it's not an easy problem to solve. We spent at least two years discussing exactly what we could do there.
The classic maneuver of joining the committee to solve your own problem. I like it a lot. Perfect. That was essentially what went on there. We all do that. We all do that. I think I'll also add one caveat to all of this discussion about roadmaps, which is to say we're talking about the future here and...
Historically, we have been a little bit risk averse about saying we're going to do a thing and then essentially we have historically only wanted to say we're going to do a thing when we absolutely knew it was done and nothing could possibly go wrong. Roadmaps are new ground for us. Speaking about what's on roadmaps that have not been announced is definitely new ground for us. And so there is a world in which
companies start working on these things that we've said are on our roadmap and somebody discovers there's a problem. And collaboratively, if we're asserting that everybody will support a thing, sometimes that means we need to figure things out. So this is where we're trying to go. The things that are not in a published document, there is room for them to move around in time based on problems that people run into.
Talking about the future is difficult. Yeah, yeah, that totally makes sense.
Thank you for that, that's amazing. Yeah, great review of the content. But going back to that, you mentioned your SIGGRAPH talk, Tom, and you said something in that talk that I wanted to chat about because I thought it was really interesting and it hits on another thing you just said about, you know, what is the role of Vulkan. So to paraphrase, I think you said something like it takes an ecosystem to raise an API, and you were talking about the ecosystem around Vulkan as a whole. But the really interesting thing you said was that although the working group doesn't have authority to dictate how the ecosystem develops, it does have responsibility to ensure that it works.
which seems like a very difficult hill to stand on, and I imagine it's a hard thing to manage. Can you talk a little bit about this and how it influences your work? Sure. Well, I mean, this was a realization we came to slowly because we created Vulkan 1.0 back in 2016, and people desperately wanted to use it. And we came out and said, here it is. We finally got it done. And we gave it to them.
And they were like, well, now what do I do? I don't know how to learn this. The API is enormously complex. I don't have any tools that I can use. There are bugs in the implementations. This, thank you, but it's not solving my problem. And it was a gradual process for us to understand that we have to define our job broadly as if the job of the Vulkan working group is to create Vulkan and also make sure it
is successful. We have to own all the problems that somebody else isn't owning for us. I mean, it's a tiny organization. We have a budget of about a million and a quarter per year,
half of which we spend on conformance testing. So compared to some other standards bodies, we're tiny. The way I like to say it, and maybe I said this at SIGGRAPH, is that we are approximately one third the size of the average McDonald's in terms of our annual budget. I don't recall that, but that's fantastic. Okay. In terms of our annual cash flow. But we have, fortunately, a lot of, well, I will say we are leveraging efforts of many people outside of Khronos.
Valve is wonderful about funding a lot of work in the ecosystem that Khronos doesn't pay for. So the total value going into the Vulkan ecosystem is many times what the working group's budget is, but still we're small. And so anytime a developer is finding Vulkan not usable for some reason, even if we can't solve it ourselves,
We feel a responsibility to listen to them seriously, understand the problem, give them the best answer we can, and hopefully find or motivate a solution from some other part of the ecosystem if we can't do it ourselves. So you asked, how does this affect your work day to day? We do a lot of tracking. Every time we have a face-to-face, which is three times a year, one of them is virtual these days.
One of the things we always do is go through, survey, try to find every piece of feedback we can find from the developer community, survey our members, survey our advisory panel. We have an advisory panel. And all of our GPU vendor members have developer relations teams that are constantly talking to developers and trying to help them use Vulkan on their implementation.
But they hear things and they hear what's not working. So job one, we just keep on top of it. Job two, if it's a problem, it becomes an issue in our issue tracker and it comes up on the agenda, and lucky Ralph gets to deal with it. A lot of the chair's job, I will say, is rubbing the group's nose in problems that aren't progressing.
And so I've been doing that for a long time and Ralph is going to do it going forward. Got to keep things moving. So another thing you mentioned, in that talk and in your summary of Roadmap 2026, was the OpenCL feature parity. And, you know, you mentioned that Vulkan does offer compute. So obviously that's an enormous topic at the moment. Can we talk a little bit about what facilities Vulkan offers for compute on the GPU? Tom, do you want to kick us off on that?
Well, I'm old enough. I always start with history. Compute came into GPUs on the desktop back with, I think, DX10 Compute Shaders and OpenGL 3, was it 4.0? I can't remember what OpenGL version introduced Compute Shaders, but it's been around for a long time. There's been a Compute model.
It's GPU-flavored in that GPUs are quirky and thorny. So it's special memory spaces. Compute can only happen here. It can't interact with other things. But the shading languages are general purpose. They have a full population of float and integer types. In modern Vulkan, let's say Vulkan with the extensions that bring it up to 1.3 and beyond,
You have the ability to do something which is like having pointers. It's not quite exactly the same thing, but you can do fully general computing. On desktop hardware, you can do double precision. We have that. We have slowly and painfully worked ourselves to where we think the behavior of floating point numbers is fully specified. There used to be a lot of quirks like
Do you get not a number when you divide by zero or do you get zero? There was a lot of latitude in early Vulkan and we've slowly nailed that down. You may have to enable certain extensions, what do we call it? Like if, for example, you decide, I really don't want round to nearest, I really want
truncation. We have an extension shader float controls that will give you the hooks you need in the language to turn on and off different kinds of floating point behavior. Ralph, do you have any thoughts? I mean, I think I would take it up a higher level and say there are compute APIs, things like OpenCL and things like CUDA
that provide you very precise... So first of all, they tend to be more general programming, kind of C-like programming models. There's things like pointers in there. They also provide you very precise guarantees about things like what precision will my floating point operations give me, exactly how much error can I have in a square root extension, in a square root instruction, that sort of thing.
These are the sorts of things that you need if you're doing, for example, scientific computing. If you're doing a physics, a complex physics simulation, you need to know how is your floating point math going to behave. Graphics historically has been very forgiving of being slightly more lax about that because we're dealing with colors and perception and pixels. And so...
Historically, the graphics answer to how precise does a square root have to be was very different from the compute API answer to that same question. But once you start doing the same sort of compute problems on Vulkan, then a lot of the same considerations come in and we start having to nail those things down, but also nail them down in a way where if you write a compute app and it's critical, you're maybe willing to pay those costs.
But we can't make all of graphics slower as a consequence. And so those are the sorts of trade-offs. That's kind of the high-level take on where there's a difference is they've come from different places and now the use cases are sort of converging. And so some of those things have to come from the compute side. There's more things that have to be nailed down.
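For a concrete view of the floating-point "hooks" mentioned above, here is a minimal sketch (illustrative, not from the episode) of querying what a device guarantees through the float-controls properties that the VK_KHR_shader_float_controls extension introduced (core in Vulkan 1.2):

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>

// Ask the device what it promises about rounding modes and inf/NaN
// preservation for 32-bit floats in shaders (requires Vulkan 1.2, or 1.1
// plus the VK_KHR_shader_float_controls extension).
void printFloatControls(VkPhysicalDevice device) {
    VkPhysicalDeviceFloatControlsProperties fc{};
    fc.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES;

    VkPhysicalDeviceProperties2 props{};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &fc;   // chain the float-controls query onto the base query
    vkGetPhysicalDeviceProperties2(device, &props);

    std::printf("fp32 preserves signed zero/inf/NaN: %s\n",
                fc.shaderSignedZeroInfNanPreserveFloat32 ? "yes" : "no");
    std::printf("fp32 round-to-nearest-even selectable: %s\n",
                fc.shaderRoundingModeRTEFloat32 ? "yes" : "no");
    std::printf("fp32 round-toward-zero selectable: %s\n",
                fc.shaderRoundingModeRTZFloat32 ? "yes" : "no");
}
```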
You preempted my next question, which was going to be where does this fit alongside OpenCL and CUDA? That's awesome. So I guess to round off this section, you mentioned earlier the programming language for Vulkan and programming shaders in general. So this kind of ties into, I guess, nicely the news this month that Microsoft will be supporting SPIR-V, which I believe is the language you're referring to, for HLSL, their shader language. Can we talk a bit about what SPIR-V is and its role in Vulkan? So again, I guess I'll refer back to history.
In OpenGL, we took in graphics shaders, in a shading language, as human-readable source code. Everybody had a compiler that parsed that source code and translated it down to the native instructions of their GPU. It was built into the API that there would be a function call that you provided the source to and it would do the compilation.
There are a couple of consequences to that. One is that your API only consumes one source language, and people either have to code in that source language or they have to have something that generates that source language. A further complication is that compilers are complicated. Compilers have bugs in them. Different vendors' compilers have different bugs in them.
And that was a painful experience. So putting my former compiler engineer hat on, the typical process for compilers is they're taking a source language and they're translating it into some intermediate representation of the language, something the compiler understands that is not human readable, but still contains the structure of the code.
and then they translate that down to the actual individual instructions, the hardware level instructions. So what SPIR-V is, is that we essentially said we would standardize
the intermediate representation. We would standardize a format that says this is a representation of your program, it's not designed to be human-readable, it's a binary representation, but that allows a multitude of front-end languages and front-end compilers to generate those intermediate representations. It gets drivers out of the business of parsing text.
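To ground this, a small hypothetical sketch of the consuming side: the shader is compiled to SPIR-V offline (for example with a front-end such as glslangValidator or DXC), and the driver only ever sees the binary, handed over as a VkShaderModule.

```cpp
#include <vulkan/vulkan.h>
#include <fstream>
#include <vector>

// Load a SPIR-V binary produced offline (e.g. with something like
// `glslangValidator -V shader.comp -o shader.spv`) and hand it to the
// driver. The driver never parses GLSL or HLSL text; it only translates
// the SPIR-V word stream down to its own instruction set.
VkShaderModule loadSpirv(VkDevice device, const char* path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    const size_t size = static_cast<size_t>(file.tellg());
    std::vector<uint32_t> words(size / sizeof(uint32_t));
    file.seekg(0);
    file.read(reinterpret_cast<char*>(words.data()), size);

    VkShaderModuleCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = words.size() * sizeof(uint32_t);  // size in bytes
    info.pCode = words.data();                        // SPIR-V words

    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &info, nullptr, &module);
    return module;
}
```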
And it lets driver engineers just concentrate on the problem of how do I get from an intermediate representation of this problem to my instruction set. And it's been a very powerful thing. It's got us to a place where application developers can write their shaders in HLSL if they're coming from the DX world. They can still write them in GLSL if they're coming from that world. There are...
other compilers out there as well that also generate SPIR-V. So in that sense, it's been a very powerful choice. I would say that is one of the early Vulkan 1.0 decisions that we made that was absolutely right. Awesome. Cool. Thank you for running through that. So that's kind of covered all of the Vulkan-specific topics I wanted to chat about today. And I'm conscious that we're running low on time, but I do want to follow up on
I think towards the beginning of this podcast, we made a couple of jokes about, you know, committee work and people who enjoy it and doing it as a career. Folks who've heard this and they're like, you know, actually working on a committee for an open standard sounds like it's for me. Do you have tips for how you would get involved from the start? A lot of it comes down to picking your employer carefully. As I said, most of us in the group work either for a GPU company which supports Vulkan or
a company which makes use of Vulkan in some fashion. I did leave out, by the way, Google is a member because Android depends on Vulkan. So they put a lot of effort into it. So either you need to work for a company that needs Vulkan to exist for some reason, either because they want to sell it or because they want to buy it and use it. And then you work your way into it. It's
I mean, another thing to say about Vulkan, which maybe we haven't touched on, is that we're heavily committed to open source. So the specification is open source. All the tooling is open source. All the compilers that we use and the validation layers and all that stuff. And we've had enormous benefit from people who read, comment on. Proposing a new feature through the open source interface is a tough sell because people
If you're coming from outside and you don't work for a GPU vendor, there are tons of constraints that you're not aware of, and your chances of producing something that will actually work in hardware are near zero. So I wouldn't encourage people to just come in and try to add features. But if you start working with the spec, understand the spec,
Through looking at things that go by, you'll know enough to make a contribution. And your contributions, by the way, we would be desperately grateful for. We always are. Bug reports, etc. Not in other people's drivers, but in the spec itself. It really comes down to, it's difficult to contribute to everything.
Actually, let me back up. There are other places that we're very interested in having help with which are not the spec. For example, part of our DevRel operation, we have a large and growing collection of sample code.
Ralph, do you know, can you join that group without working for a member? No. Because that group develops examples for unpublished extensions. That's why it's under NDA. That makes total sense. Yeah. Getting into a company that's a member, I guess, is a starting point. Ralph, anything to add? I mean, I think Tom's point is largely correct. The short answer is the most likely route in is
to work for a company that is a member, or, if your company is small but in the right space, for your company to join, and that gets you a seat at the table. Do your own LunarG. Yeah, well, whether you do your own, definitely there are, if you're a GPU contractor type company, there's space for those. We have game company members. In the grand scheme of things, the cost of joining Khronos is a lot less than the cost of your engineering time.
So that is probably the route in. If I had to say how have most members who are regular participants of the working group got there, the most traditional route is...
become a driver engineer at a hardware company and volunteer to do this stuff. You have to have a certain mindset to find standards work engaging. I love it. There are other driver engineers in my team who find it, you know, a lot of meticulous paperwork and they would rather write code.
It's something that you either learn to love or you learn that you want to do something else. But the traditional route is probably through driver teams in hardware companies. But as Tom has mentioned, we have other members as well. There are game companies, there are platform vendors, there are people like LunarG and Mobica and Igalia who are kind of software contractors. There are people from some of the open source projects, albeit sponsored by
working in that space. So yeah, there's a variety of routes in. I should mention, I said you pay several tens of thousands of dollars for a company to become a Khronos member. We really do want the participation of small companies, small game developers, et cetera. So there is what's called an associate membership, which is...
And rather than tens of thousands, it's thousands. And it scales with company size counted as number of employees. Those members don't get a vote in the committee, but they can do everything else. They get to participate. They can make non-NDA proposals for changes, et cetera. And we do that. Wonderful. Awesome. Cool.
Thank you both so much. This has been illuminating for me and it's great to hear how everything works under the hood. And I do believe, as you said, Tom, you mentioned at the beginning that you've got an upcoming retirement. Thank you so much for all your years of service to Vulkan. As someone very downstream, as a big enjoyer of video games, I've enjoyed the fruits of your labours for many years. Thank you very much. And congratulations on your election, Ralph, and good luck for the future. Thank you. Thanks.
Thank you.