Back to All Videos

Raw Transcript: Stop Struggling with CUDA: How Ubuntu 26.04 is Fixing AI Development Forever

Channel: Camp Gagnon

Raw Transcript

I'm here not to tell you that Canonical is pivoting to becoming an AI company, but to tell you that it's been here all along. Ubuntu powers the majority of today's AI workloads. Unfortunately, I think I am going to contribute to the AI native dev eye test because I have dark slides and light slides, so apologies for that. As Alan said, I'm Jon. I'm the VP of Engineering for Ubuntu. I've been at Canonical for five years. In that time, I've mostly worked on cloud native orchestration-type tools, but in the last year, I've been working on this little open-source project that some of you might have heard of called Ubuntu. I'm here not to tell you that Canonical is pivoting to becoming an AI company. I don't think that's the case. It's certainly not in my immediate roadmap, but to tell you that it's been here all along, in a sense. Ubuntu has been here for the last 21 years, 22 years, and I suspect it will be here for the next 20 years if we don't make any huge mistakes in the coming three to four years. We are in the privileged position where I can say this today: Ubuntu powers the majority of today's AI workloads. You might ask, why is that? Some of it is because the wonderful agents that we have today, when you ask them how to do something, are really good at saying, on Ubuntu, sometimes on Ubuntu or Debian, type this command. And I think that is a representation of the fact that for 20 years, the orange Linux has been the one that people reach for. That is a combination of exceptional strategy from the previous people at Canonical and a healthy dollop of luck. But when you launch cloud instances today, most of you probably don't care what kind of Linux you get, but it's probably our one in reality. It doesn't matter whether you're on Google Cloud, whether you're on Amazon, whether you're on DigitalOcean, or Vultr, or Hetzner, or whatever you like. If you launch a VM, there's a good chance it's going to be Ubuntu.
Whether or not we are, in fact, at the hallowed year of the Linux desktop, we have very much been the year of the Linux server for many, many years. But none of those things are particularly new. Those have been facts for some time, long before I got involved with Ubuntu. One of the things that I think is really fascinating in this particular era is something that no one really talks about, because it's not glamorous, it's not glitzy, it doesn't get great headlines on Reddit and Phoronix, where they like to tell us that our choices are wrong for choosing Rust and such. But it's actually our very, very strong partner business. So we're actually a pretty small company in the enterprise Linux game. We're about 1,300 people, of which about 1,000 are engineers. When I joined five years ago, we were 550 in total. But the fact remains that SUSE is, I think, five times our size, and Red Hat is 25 times our size, if you ignore the big IBM-shaped thing on the side of it. So we are relatively little, but I think we punch well above our weight when it comes to working with the people you see on screen, and more. I couldn't get all of the logos on here, but there's lots of fun ones, like MediaTek and Rivos, and all these things. And this makes a huge difference, because this means when you go down to whichever consumer electronics store you want, or whichever internet business you buy your computer from, and you want to play with AI stuff, you want to get working with it, you open the lid, and it is actually going to work. This is particularly true of NVIDIA hardware in recent months, but we are expanding massively with AMD. We have always done lots of work with Intel. And with things like Qualcomm Dragonwing, which is one of their kind of edge IoT platforms, we were right there on launch day, good to go, with support in the kernel, support for all of the accelerators, right?
Not just the CPU and GPU, but all of the NPUs and TPUs, and sort of *PUs that they're starting to introduce. And this, I think, was a pretty big moment. So there's this little silicon company called NVIDIA that some of you will have heard of, and they released an ARM64-only AI workstation called the NVIDIA DGX Spark, to much applause in the late part of last year. And what was different about this for us is NVIDIA have shipped Ubuntu for years, and they've done that in agreement with us, and it's been called DGX OS, or it's been called NVIDIA something OS, but it's been Ubuntu with a bunch of stuff installed and some different kernel configurations. And what changed for this is they just ship Ubuntu and they call it Ubuntu. And not only do they ship Ubuntu and call it Ubuntu, it is the only thing they will support on the DGX Spark. Now, I don't have all of their product roadmap, and I don't know what's coming up, but I'd take a pretty good educated guess that in the future, their other workstations, so this is the Grace Blackwell architecture, also seen sold by Dell, and I think Lenovo, and possibly HP, all of them run Ubuntu. And the reason they run Ubuntu is because they can boot Ubuntu on it, and it just works. Like all of the drivers are there, the kernel behaves properly, all of the accelerators are there. So this was a pretty huge moment for us, but I also think it was quite a big moment for the AI development community, because working in development on the operating system that is running in your cloud, I think, is fairly obviously, and has always been fairly obviously, a bit of an advantage. This is something I'm very excited about. So we do a two-year LTS release. These are the ones that realistically everybody runs, something in the high 90s percent of all of our users run an LTS, and they are 22.04, 24.04, and the one coming out in April is 26.04.
And for the first time, you will be able to apt install CUDA or apt install ROCm with no other commands on a base Ubuntu system, and get exactly the version of ROCm that works with your version of Ubuntu and your GPU with no messing about. And having personally suffered through this with a collection of quite high-end AMD machines and various versions of ROCm and the driver and various other things, and I'm told it is equally painful in CUDA land, I think this is kind of a big deal. And it also speaks to that cloud story as well. In reality, all of the clouds are also running these big shiny green GPUs in their data centres and not investing engineering time in trying to work out how to get the latest version of CUDA working. Security-maintained for 15 years, which is our kind of promise, I think is a pretty big thing. And it's particularly big for developers getting started in this industry. Lots of you will already know how to wrangle CUDA into shape and make ROCm work with PyTorch. Good for you. If you have any ideas, do please let me know. But I think it's going to get a lot easier because of this. It's just going to be out of the box, essentially. Having said that we're not an AI company, because I don't think we are, in quotes, an AI company, we have released something relatively new which is distinctly kind of an AI product. Not a product in the sense that we are monetising it and selling it to you. A product in the sense that you can go and get it and use it, it's open source, and we built it. I think this is really interesting for the hobbyist, the tinkerer, the developer at the moment. But I think it has huge potential to go further than that. So in the world of Linux packaging, and the microcosm of internet culture that that is, there are these things called snaps. Most people in the world don't care that there is a thing called snaps. It is a packaging format.
It is a packaging format we invented, and it has some interesting properties in many, many applications on the internet and in computing. But those exact same properties make it especially interesting, I think, in the world of AI. So snaps are a confined package format. Unlike a deb where you apt install, or an RPM package where you dnf install, or whatever it is, a snap is a little bit more like a Docker container. It is a compressed file system full of an application and all of its dependencies. But critically, it runs in a security-confined environment. So we use the AppArmor Linux security module, which is a bit like SELinux for those of you that have been in the Linux space. And that means that you can install a snap with all kinds of scary stuff in it and run it and not worry about it doing wild things to your machine. Does this sound familiar in the AI space? So we started distributing these things called inference snaps. And this is a bit of a play towards how do we make this stuff easy for people to get working with. At the moment, it's all very fun to be at the very cutting edge playing on Hugging Face, looking at the rankings and trying to work out which model is going to fit on your GPU. And do I want llama.cpp or do I not want llama.cpp? The 90% AI engineer, if not now then in the next six months, is going to be like, give me the model. I've heard of Gemma. I've heard of Nemotron. I've heard of Qwen. How do I play with it without getting a PhD in Hugging Face? So that's what we've done. Inference snaps are high quality, silicon-optimised AI models for everybody. And this goes a bit back to my previous statement. The key here is they are actually optimised by the silicon company. So we are partnering with AMD and NVIDIA and Intel and whoever. When we first started talking about this, no one was really interested, and when we launched the first one, everyone started kicking our door down. So we are slowly rolling out more models.
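On the confinement point above, you can poke at the sandboxing snapd applies on any Ubuntu machine. A quick sketch, using Canonical's tiny hello-world snap as a stand-in for any strictly confined package:

```shell
# Install a small strictly-confined demo snap and run it inside its sandbox.
sudo snap install hello-world
hello-world

# List the AppArmor profiles snapd generated for it:
sudo aa-status | grep snap.hello-world

# See which interfaces (deliberate holes in the confinement) it may use:
snap connections hello-world
```

The same machinery is what lets an inference snap carry "scary stuff" without being able to touch the rest of your machine.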
But the gist of it is you can, on any Ubuntu machine in the world, type snap install Qwen VL, snap install DeepSeek R1, snap install Gemma 3 or Nemotron Nano, and what you will get is a working model that has been optimised for the hardware on your machine by the people who built the silicon, delivered by us and maintained by us. And it doesn't just come with the model itself because, again, what do you do if you just download the model in its bare form? Not super interesting. It comes in a silicon-optimised form with the right inference engine for that model according to the manufacturer. And it gives you this super nice onboarding if you just want to play with local AI models. You want to hook them up to Ollama, you want to hook them up to, like, Continue, whatever it is you might want to do. This is the kind of gory-guts-of-it diagram. There's a lot going on here, but it's actually relatively simple. So the snap, the big orange box, is the SquashFS file system inside which is stuff, right? And we have this thing called an engine manager which is shared between all of our inference snaps. The engine manager has some nice capabilities, like understanding what a machine is, what API level your card supports, that kind of thing. We have engine manifests which describe the different kinds of things you might be: you're an AMD GPU, you're an NVIDIA-capable, sorry, a CUDA-capable NVIDIA GPU. And then we have the inference engine, which is kind of swappable, could be llama.cpp, could be something else, and the runtime of the model. And all of that is hooked up automatically when you install it; through the snap confinement it punches just enough of a hole to speak to the kernel and get what it needs about the machine. But the outcome is this: you can snap install Qwen VL and then you can Qwen VL chat. And in your terminal, because it wouldn't be a Linux talk without some terminal chat, you can chat with the model.
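As a concrete sketch of that flow end to end; note that the exact snap name, CLI entry point, and port number here are illustrative assumptions, not confirmed values:

```shell
# Install an inference snap; in the background the store matches
# a silicon-optimised build to the hardware it detects.
sudo snap install qwen-vl        # snap name is an assumption

# Chat with the model directly in the terminal:
qwen-vl chat

# Each snap also serves an OpenAI-compatible API on its own localhost port,
# so anything that speaks that API can talk to it:
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-vl", "messages": [{"role": "user", "content": "Hello"}]}'
```

Because every snap gets a distinct port, several models can be served side by side on one machine.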
We've got four of these at the moment. This is the least interesting thing you could do with it. Every single one of them also comes with an OpenAI API spec compatible endpoint, right? Just running on localhost on a port. Each of them comes on a distinct port so you can co-install them. Some of them can be running one engine and runtime; some of them can be running another. If you've got a big honking machine with a bunch of GPUs in it, you can have one running on your ROCm-capable AMD card and one running on your CUDA-capable NVIDIA card. That's not a problem. And basically anything that can speak the OpenAI API spec will be able to speak to this thing. I'll do a little demo of this at the end. All of the examples on here are talking to localhost, but there is nothing to stop you slapping a Caddy proxy in front of this and sticking it on a cloud machine with a big honking H100 in it or something like that and using it. So, assuming you have access to cloud machinery that has access to Ubuntu, which is I think everybody, then you can make use of this immediately. But it's particularly fun to play with on a machine that has some capability. Unfortunately, when I do the demo, I'm talking to you on a thing that has absolutely no acceleration whatsoever,
so it's real slow. But I promise you, it's cool on a machine that's a little bit quicker than my laptop. The other thing that I think matters, as the de facto operating system for a bunch of engineers, is sandboxing agents. I am probably preaching to the choir in here when I say that the agents have clearly been a big step up in capability. That is the lived experience for lots of people I speak to in my job, but also my own experience. It went from, hey, there's this cool bot I can talk to in my browser, to, okay, I actually have a fleet of robots that can half do some of the boring stuff, and actually some of the more interesting stuff, for me. And I imagine lots of you are in a position where it's kind of difficult to imagine going back to doing certain classes of work without an agent. Like you're sitting there going, why was I ever doing this myself? You're probably also reading the horror stories on Reddit, sort of laughing and going, ha ha,
that user did something silly. It deleted their home directory because they're idiots. And also, in the back of your mind, going: what if it deletes my home directory? I have personally not had any of those big fails, but I did have a fun one recently where a set of parallel agents were set off by Claude which, as part of a build process or something else, decided to build five copies of Node.js from source, completely exhausted the memory on my cloud server, and resulted in Tailscale being OOM-killed and me no longer being able to talk to it. Not catastrophic data loss, but kind of annoying. And so lots of the agents are responding to this by telling you you can run /sandbox and it's going to be fine. It might be fine. Confinement is a tough topic. It is something that we have been doing for years with our snap packages, for years with AppArmor, for years with virtual machines and micro-VMs, and chucking a big honking Node.js code base onto someone's machine and wrapping it with Bubblewrap with a thousand exceptions is not that handy. It prevents some failures, and that's why the vast majority of you probably haven't had an epic fail while living the YOLO life, dangerously-skip-permissions, whatever it is, and are mostly okay. But I'm sure you've all experienced it doing something you'd rather not have done; you're all sensible and responsible, so you recover from it just fine. No tears. The good news is we have a bunch of stuff that is out of the box on pretty much every Ubuntu machine on the planet that just makes this pretty simple. We actually have a really cool product, which is one of the only things I have worked on in five years that is not yet open source, that we will announce in April, which I think is going to blow the doors off this thing. But it is built on top of all the things I'm going to talk about today. So one of which is LXD.
This is a product that, until I worked at Canonical, I had no idea about, and about six months after I joined I couldn't work out why nobody was talking about it. It is in the category of something that should be boring but I think is very cool. LXD is a clustered version of LXC, which is a decade-old piece of containerisation technology in the Linux kernel. It lets you get Linux system containers, which are a bit like a Docker container but feel a bit more like a VM. So it's a container, but it also runs systemd. And so it's kind of heavier weight than a Docker container, but it still doesn't have its own kernel. And it also does virtual machines. And the API is basically identical: lxc launch, and then you can lxc launch --vm, but all of the other flags and config are exactly the same. But if you want a situation where actually you'd prefer a kernel, another kernel, in between you and the agent for a little bit more separation, you can do that. Personally, I switch between the two depending on what project I'm working on and the source code I'm writing. Sometimes I launch a VM, sometimes I launch a container. The script is identical apart from one flag. And this is broadly how I've been using Claude Code for the last few months. So I have a little tiny script. It's only about six lines, and it basically creates an LXD container, the image for which is already cached. It mounts my local working directory into the container as a bind mount. It mounts my .claude directory and a couple of dotfiles, and starts Ubuntu and Claude Code, and it takes about three or four seconds from cache. So I just type cc, Claude Code here, and I get Claude Code in a box just talking to my project, and it can basically run wild. It can't make commits, because I have to tap my YubiKey for that, but I can just let it do what it wants, set five of them off. You can set all the usual constraints, like the CPUs and memory it can use, with or without a VM layer.
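The six-line wrapper itself isn't shown in the talk, but a plausible reconstruction, assuming a container called dev and the stock ubuntu:24.04 image, looks something like this (add --vm to lxc launch for the extra-kernel variant):

```shell
#!/bin/sh
# cc: run Claude Code inside a disposable LXD container bound to this project.
# Hypothetical sketch; names, paths, and image are illustrative assumptions.
set -e
lxc launch ubuntu:24.04 dev 2>/dev/null || true   # reuse the container if it already exists
lxc config device add dev work disk source="$PWD" path=/work 2>/dev/null || true
lxc config device add dev claude disk source="$HOME/.claude" path=/root/.claude 2>/dev/null || true
lxc exec dev --cwd /work -- claude                # Claude Code, confined to /work
```

Resource caps apply the same way with or without the VM layer, e.g. `lxc config set dev limits.cpu=4 limits.memory=8GiB`.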
And this is just installed. It's literally ready to go on every Ubuntu machine out of the box. It's just there. So I'd implore you to go and take a look at it. It's on all of the cloud instances that you're already hosting your stuff on, and it makes a really nice little wrapper around these tools. We also have Multipass. Multipass is like the Docker Desktop thing in our portfolio. You can install Multipass on your Mac, on your Windows machine, and of course on your Linux machine. You get a nice UI, and it's just the fastest way to get Ubuntu anywhere. You multipass launch and you will be at an Ubuntu shell in two or three seconds, and that is a disposable machine you can just crap all over, or an agent can crap all over, and then you can get rid of it. It's got all of our different versions; you can do blueprints and recipes and things like that, but it has this nice kind of Docker Desktop style feel, a bit more orange, that you can use for getting disposable Ubuntu instances that are running under QEMU, essentially. So that's the kind of development side. We describe Multipass as a bit of a cloud sandbox, which is a nice segue into: well, what about once you finish building your shiny thing, your hundreds of thousands of lines of TypeScript or Rust or Go or whatever it is that your agent has been diligently coding for you? How do you launch it? How do you deploy it?
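Before he turns to deployment: the Multipass flow described above amounts to three commands (the instance name here is arbitrary):

```shell
multipass launch --name scratch     # fresh Ubuntu VM under QEMU in a few seconds
multipass shell scratch             # a disposable shell an agent can safely trash
multipass delete --purge scratch    # throw the whole machine away afterwards
```

A different Ubuntu release can be picked at launch time, e.g. `multipass launch 24.04 --name scratch`.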
Again, a question that we have been dealing with for 20 or so years of production Linux. Working at Canonical is kind of a really interesting experience, because literally everybody knows about Ubuntu and basically nobody has heard of Canonical or any of their other products. But like LXD, we have all of these other kind of interesting things which, again, I think are genuinely coming into their own as we head into this era of a whole bunch of semi-autonomous things going off. We want to use them to the best of our ability, we want them to run as fast and efficiently as possible, but we also want to have some guarantees around that. We want to know what they can mess with. We want to know that if we set them off in production, they will continue to be patched for 15 years, even if the vendor disappears in the AI boom. Some of the vendors that we lovingly talk about right now, I suspect, might not be here in five years; we're at that sort of point, I think, in the AI product launch sphere. The thing that we have done, and have always done, and the only reason I can stand here after 22 years of Ubuntu being a big open source project, is because we have done the long-term support, security maintenance, and patching work that lets other people do the exciting work on Linux for years. And that applies here too. We have this nice initiative that we've called sort of LTS Anything, and essentially you can come to us with a Docker container, we come to an agreement on what that's going to cost, and we will literally security-maintain that thing, your application and all of its dependencies, even if that's thousands of Python dependencies or tens of thousands of Node dependencies. We will keep patching that thing for CVEs for 15 years for you, for a cost, all based on Ubuntu and the work that we do keeping the archive running and keeping all those servers going. And then we have a suite of kind of automation stuff. Kubeflow, for example: we
have a one-command deploy of Kubeflow that works on any Kubernetes. It works on EKS, it works on AKS, it works on our Kubernetes. You can literally one-shot a Kubeflow for yourself, play around with it, and then destroy it and not lose any sleep. The same for MLflow, the same for OpenSearch if you want a vector database, the same for Postgres. And so while we don't necessarily directly peddle the newest, shiniest AI thing, we're not building agents, we're not building models, we're not training models, we kind of very competently take care of all the stuff that that stuff relies on, in a very safe, secure, stop-worrying-about-it sort of way. So that's the end of my slides. I thought I would give you a little demo of the inference snaps. It's a slow demo, so maybe I should just slow down my talking. This is the CLI, so you can take a look at the status of this thing. This is the Gemma 3 snap, so that's Gemma 3 from Google. You could also do the same for Nemotron, which is mostly for NVIDIA stuff, and each of these has a nice kind of concept of what it supports and what it doesn't. So you can see here Nemotron has the concept of a CPU engine, which is obviously going to run pretty slow on my potato Framework laptop, or you can run it on an NVIDIA GPU and it will scream, and it'll get the right-sized model for the GPU that you have. And this is all automatic: you can install the snap really quickly, and the snap in the background will go talk to our store, exchange a bit of detail about what your hardware is, and then go and essentially fetch the right model in the background for each of them. So we'll talk to DeepSeek for a bit. You can do a chat thing; this is just a command-line chat, so: hi DeepSeek. This will then sit here and set my laptop on fire, and you can see the token rate is pretty low on my little AMD Framework. But equally you can open it up to things like Continue. Because it speaks all of the right APIs, like this little config file I wrote for the Continue VS Code extension, which gives
you your kind of self-hosted-model, Claude Code type experience. So you can see here each of these just has an API URL which runs on localhost on a separate port. You can get the model details from the status CLI, and then of course you can start asking it questions about your code: what does this do? And we can sit here and laugh at the token rate that my laptop can do. And because Continue is designed for all this stuff, you can obviously
just switch between them, and as we launch more models and different optimised models, you'll just be able to turn them on and off as you wish. And when you're not actually talking to the thing, it consumes essentially no resources; it's just there in the background if you need it. Like I say, I think there is a genuine use case here: if you are a company that can afford one big stinking H100 in the cloud somewhere and you want to do private inference, slap one of these snaps on it with a Caddy proxy in front of it, or an nginx proxy or something like that, and you're good to go. You suddenly are hosting an AI model on hardware, on a machine, that you control. And that is pretty much it. Does anyone have any questions? Thank you so much, Jon.
How on earth does Canonical survive for 22 years giving it all away? Thank you very much, Jon Seager. Any questions? Yep, let's dive straight over here. Hi, I was wondering: you mentioned how Ubuntu is very popular, everyone knows about it. Do you think now, with LLMs being all over the internet, you're kind of in a position where you will slowly just end up consuming all of it? Because every LLM will always be like, oh, let's do it the Ubuntu way, blah blah blah, and therefore now you get more code on the internet also using the Ubuntu way. And I'd say, with any luck, yes. But I think we need to not rely on luck, and so part of our job is to work out how we either do deals with the AI providers to ensure that happens, or how we keep being a default enough in other things that the other material the LLMs are trained on, they know about Ubuntu, and those kinds of things. Like we're sort of lucky, not lucky, that that's the situation, but I think if we're not careful we could be, you know, overtaken: a SUSE or a Red Hat could do a big drive on having an llms.txt on all of their endpoints and really going for it. So I think we need to pay attention. I also think it's very easy for somebody who works at Canonical on Ubuntu to get all wound up on which Linux distribution it is. I'm quite aware that most people don't care; they're like, give me a cloud instance, and it is what it is. Again, I think because for so long that has been Ubuntu, lots of people have got muscle memory around apt install or whatever it might be. We just need to basically keep that up. In the same way that lots of marketing teams are flapping over SEO because AI models work a bit differently, we have to think a little bit differently about how we make sure that our content is in the right places, in the right forms, and is actually relevant. Like I said, we're not, in quotes, an AI company, but that doesn't mean we don't have something to say in an era where everyone is
building with AI. So it's about how we position it in a way that is actually authentic to what we can actually do and what value we can actually add, because it isn't going to be competing with NVIDIA to train models and build GPUs. Yeah, I'd say pretty much every company wants to be having that conversation. In our case, it's about how do we make sure that the LLMs keep telling people how to actually install ROCm. Yeah. Cool, thanks for the great talk. I'm going to ask you something controversial, and I'm sure you're watching the news: OpenAI is now planning to do something operating-system related. Yeah. We had browser use, computer use; soon it's going to be operating system use. Yeah. What is the water cooler chat in your organisation, and maybe you've heard whispers in other operating system worlds on this: where is this going, and what impact does it have? Great question. Yes, so, I don't know. I mean, I wouldn't say there's a lot of water cooler chat. I'm certainly not sat in my seat at home at the moment going, oh my goodness, OpenAI is going to out-Linux us tomorrow. I follow it with interest. Like, I thought Claude Code building a C compiler was kind of a fun science experiment, I'm sure you all saw this one, right, and Claude Code built a browser and various other things. I think what that shows is it's a very interesting technology; these are interesting experiments. But building an operating system isn't just about building an operating system. The fun part of building an operating system in Ubuntu was done 20 years ago. The success has come from grinding away patching security vulnerabilities, doing usability studies, shipping the latest and greatest open source, doing things that have set the internet on fire a little bit, like this big Rust transition where we're replacing all the core utilities and sudo and the time-syncing daemon with new Rust-based alternatives. Everyone's like, oh my god, what are you doing? But it's those sorts of
things that have actually meant people have kept using Ubuntu. I'm sure that Claude Code can build an OS. I'm absolutely sure it can. But are Anthropic going to be in the business of actually maintaining an operating system for use on definitely tens of, if not hundreds of, millions of cloud instances? I don't know. It doesn't seem like their business model, right? People are still going to want security maintenance and rock-solid Linux, I think. So at the moment I'm not sweating too much. Cool, thank you for the talk.
So, I've usually run models using Ollama. Have you noticed any significant performance improvement if you use your local Ubuntu-optimised version? Yes.
Because I imagine hardware is a limiting thing.
So half of it, I think, is just selecting the right model. I'm sure everyone has felt the overwhelm of, like, I'll go and get something off Hugging Face, and it's like, okay, cool, now what? What's going on? So part of it is we're just going to restrict that down. We're just going to say: this is of the DeepSeek R1 variety, there are maybe 10 variants, and it's going to pick the best one for your machine. And the best one for your machine is not only the one that's actually sized for your machine, but the one that has been tuned to some extent by the silicon vendor. Now, does that mean it's going to be perfect for your use case every time? Probably not. Does it mean it's going to be better for 90% of people 90% of the time? Probably, right? If you are doing something hyper-specific in a really, really optimised way, I'm sure it's worth you investing time in retuning a model and parameterising and quantising it very, very specifically. But if you're like, I'd like to write some Python, I'd like to get some help from an AI model that isn't hosted in the cloud, I think we're going to be pretty helpful to you. Thank you very much, that's quite an interesting talk, actually. I basically have two questions, essentially, and one is... I can't quite hear you, sorry. So I think you'll still get those kinds of errors. The point is you can kind of choose how much access you give that container to your machine. For me, everything is in git, and the only way I can sign commits is by tapping my little YubiKey, so I'm a bit like, go throw git commits at my git history, it doesn't matter, do you know what I mean? Even if I tell Claude not to and it does, it doesn't matter, because it can't push and overwrite what I've got. So I think, to a limited extent, if the agent really, really wants to delete the working directory on the thing you're working on, it's going to delete the working directory. The question is how far you want that blast radius to possibly be. So it's really, to me,
about limiting the damage. And also, you talked a lot about context engineering: I've noticed Claude Code does this fun thing where it asks if it can read other places on your file system, and then gets all bogged down in other codebases that have nothing to do with what you're working on. So putting it in a box where it can only look at the thing you wanted it to look at is kind of handy just from an efficiency perspective as well, I think. So there's a win on both sides there. No worries. Thank you once again, Jon.