Adrian Mouat is Chief Scientist at Container Solutions, a European services company specialising in container technologies. He is currently researching container orchestration platforms and image management. A member of the Docker Captains program, Adrian is also the author of Using Docker, published by O’Reilly Media.
[00:00:06] So does anybody know what this is? Just shout out if you know. Okay, so it's actually a Thinking Machines CM-200. It might look a little bit familiar because its predecessor, the CM-5, was used in the film Jurassic Park, I think just because it has flashing lights on the front. So it's a supercomputer in Jurassic Park, but this was cutting-edge computing machinery in the mid 90s. And it was actually this exact model, the CM-200, that was installed at the University of Edinburgh Parallel Computing Centre, which is just up the road. So I mention that for the local connection, but also because, well, it looks cool, and it kind of represents where we used to be in terms of computer hardware and software development. So I'm going to try and take us on a journey: back in the 90s, if you wanted to run a large application, you effectively had to buy your own hardware such as this and run it on your own premises — and I'm going to take us from that through towards cloud computing and microservices, hopefully.
[00:01:28] Now, admittedly, you probably wouldn't have used a CM-200. These were generally used in artificial intelligence and high performance computing. So if you were an Internet startup, you probably wouldn't have had a CM-200, but you may well have used Sun kit. In the late 90s, Sun was selling so much hardware that they started calling themselves 'the dot in dot com'. Unfortunately they also became 'the dot in the dot com bust'. But that's kind of another story. They were also pretty famous for the Java programming language — how many people here have used Java? Yeah, quite a lot of you. So Java and also C# became pretty ubiquitous, especially in enterprise software, and they're pretty good languages, especially when you're working in large teams. They've got great standard libraries, they work at quite a high level of abstraction, and there's great tooling around them, editors and so on. But the thing is, the way that we started to develop applications using them has now become known as the monolithic style of architecture. So basically what that means is that all your functionality, all your software, goes into this one large process or one binary. Everything goes into this one thing. So even disparate business concerns — your search functionality, your favouriting, the UI for the front end — are all part of the same process. You may have a separate database for storing all your state, but you've still got one large application in the middle. And this works. There's certainly nothing wrong with that. But you start to see a few issues when you add more users: you're running on a machine and you start to struggle with the amount of load from extra users, or you want to process more data or something.
[00:03:34] And the first solution to that problem — the simplest solution, actually — is to scale up: to simply run your application on a bigger machine, with more CPU and more memory. That's the easiest solution, and it's the cheapest as well, because it requires very little engineering time to do. However, there are a couple of fairly obvious constraints. The first is that you can only go so far. At some point, even on the biggest machine with the most memory that you can afford, you'll get stuck. And there's also no failover scenario here. If that machine goes down, you're in trouble; there'll be some downtime while you migrate to a new machine. So we moved to an architecture for large applications more like this: where we simply run multiple instances of the same monolithic code across several pieces of computer hardware — I'll probably refer to them as nodes from now on. And that works. It gets a bit more complicated, because you've got to figure out some way of spreading your users across the different instances of the monolith, and you've also got to figure out some way of keeping state in sync between the different instances of the application. There are different ways to do that. We also start to see — and this is a nice feature, assuming you don't have too much traffic — that if you have a hardware failure, there's a good chance your site won't go down, because the remaining two instances may be enough to handle the load. So we certainly gain some resilience there. But the main issue is that we can only scale so fast. And in this modern day of internet applications, we tend to see quite spiky traffic.
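The "spreading your users across the different instances" step can be sketched very simply. This is a toy round-robin router, not a real load balancer (a real deployment would put something like HAProxy or a cloud load balancer in front); the node names are made up for illustration:

```python
# Toy sketch of spreading requests across monolith instances with
# round-robin routing. Node names are invented for illustration.
from itertools import cycle

nodes = cycle(["node-1", "node-2", "node-3"])

def route(request_id):
    # Each incoming request simply goes to the next node in turn.
    return next(nodes)

assignments = [route(i) for i in range(6)]
print(assignments)
# → ['node-1', 'node-2', 'node-3', 'node-1', 'node-2', 'node-3']
```

Real load balancers also handle health checks and sticky sessions, which is where the state-syncing problem mentioned above comes in.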
[00:05:25] So I've got this example from the "Have I Been Pwned" website, run by a guy called Troy Hunt, a security researcher. If you've not seen it, it's a great website: you put in your username or email address and it tells you if it's been involved in any public data breaches. So it's a really useful website. And what happened to him one day was that somebody mentioned the website on This Morning, the TV show — just in passing, it wasn't a scripted piece; the person being interviewed just happened to mention his website — and suddenly he went from serving under 1,000 requests a minute to serving over 20,000 requests a minute. So he had to scale up 20 times, or 20 to 40 times. And that implies that if you're running an architecture like this, you require 20 times the amount of compute resources lying there, normally doing nothing, just waiting for a peak like this. And that's simply not affordable for the majority of companies running on premises. So this issue of scalability was one of the main drivers of the move towards cloud computing. Now, I'm not going to bore you with a definition of cloud computing, because I reckon most of you here are pretty used to the term, but basically in cloud computing we rent our computing resources from people like Amazon and Google rather than buying our own hardware. And because different people and different organizations experience peaks at different times, the provider's computing resources see a fairly level amount of traffic.
[00:07:12] The interesting thing about the early days of cloud computing was that there were a lot of naysayers; people said that nobody serious would ever run on Amazon's cloud. Now this sounds ridiculous, but it was common at the time; there was a lot of negativity around cloud computing. They thought it was just a toy, basically, and that any proper company would have to have its own datacentres and its own computing. And one of the main use cases for cloud computing came out of Netflix. I got this slide from Adrian Cockcroft, who used to be one of the architects at Netflix. Can anybody remember back in the old days when you used Netflix — or I think it was LoveFilm in the UK — and you had DVDs by mail? Does anybody remember that? You went onto the website, you selected which DVDs you wanted to see, and then they sent you one in the mail; you'd watch that DVD, send it back, and they'd send you the next one. So that was the old model. Now, that did involve a website and a web application, and it was fairly popular; it required a reasonable amount of compute resources to run Netflix at that point. But at some point people started getting broadband into their homes, and it became plausible to start streaming video directly to them as opposed to sending a DVD in the mail. The interesting thing, though, is that unsurprisingly streaming requires a lot more datacentre capacity — about 1,000 times more.
[00:08:50] So at the point when just 0.1 percent of Netflix's customers had moved to streaming, they were already using more compute power than everybody using the DVD-by-mail service. That turned out to be a big issue for Netflix, because they had a datacentre that was running all their operations, but the datacentre was almost full, and streaming was taking off like crazy. So they were faced with a very stark choice, and they had to make a decision very quickly. They could either build a new datacentre, which would have cost tens of millions — and they had that money, or could easily have persuaded a bank to lend it to them; I think they had quite a lot in the bank anyway. Or they could move to the cloud. Now, it's not an obvious choice, because the thing is, even today it's cheaper to run your own datacentre. The cost of compute in Amazon is still significantly more than the cost of compute in your own datacentre, even when you take into account things like operations. But the main reason they went with Amazon — and I remember people telling them it was a bad idea to move to Amazon, because they just didn't believe that any serious company could run on top of it — was that they took the money they saved by not building a datacentre and put it into content. So literally the reason we got House of Cards when we did is that they decided to move to the cloud and take that capital expenditure and spend it on content instead of a datacentre.
[00:10:33] Another thing that started to come out around this time was users' expectations around reliability. By this time I guess we're talking about 2009 or so. I think it took a few more years until House of Cards came out — I guess it takes some time before your investment in content comes to fruition. But around the same time, users started getting much more upset when websites weren't working. Back in the 90s it wasn't uncommon to see a website down for maintenance, and nobody really worried too much about that. But nowadays people get very upset. So I looked at some tweets. If you just search for Netflix and scroll down, you get some quite interesting tweets. Like this one, where this person gets very upset and asks for a rebate — I think she thinks Netflix is a public service, like water or electricity. I felt sorry for this girl: "It's Saturday night, what am I going to do?" I would suggest going outside. And other people are refusing to pay because Netflix isn't working. So we really do not tolerate it any more when websites are down; people get very upset and start asking for their money back. And there was another problem with running everything in this monolith. There was one day at Netflix when a coder left a semicolon off the end of an SQL statement. Literally one missing semicolon in an SQL statement. And it took down the database — not just the database, because they only had one database with multiple schemas in it — it took down the entire system. I think they were out for hours, if not days, whilst they tried to restore the system from backups. Thankfully they had good backups, so they were back up and running eventually, but it took them down for days because of one very simple mistake.
And what they realised was: well, this is kind of silly — if we move away from the monolithic architecture we're running and split it into separate components, we can actually become more resilient. So they moved to this sort of architecture, and they split up the databases as well.
[00:12:46] So instead of having all our functions in one application, we split them out into separate, independently deployable components that can run on separate VMs, or separate containers as we have nowadays. And what happens is, if one of these components goes down and the architecture works correctly, it doesn't necessarily affect the rest of the system. So for example, if this is search and search goes down, it doesn't necessarily mean that I stop streaming video, or that users stop being able to favourite items. The rest of the system can continue working. If billing goes down, it doesn't mean that I interrupt the user's playback of a video, for example. Compare that with the semicolon incident, where one mistake took everything down.
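That failure-isolation idea can be sketched in a few lines. This is a hypothetical example, not Netflix's actual code — the service names and the fallback are invented to show the shape of graceful degradation:

```python
# Minimal sketch of failure isolation between services (hypothetical
# service names). If the search component is down, the page degrades
# gracefully instead of the whole system failing.

def call_search_service(query):
    # Stand-in for a network call to a separate search service;
    # here we simulate that service being down.
    raise ConnectionError("search service unavailable")

def render_home_page(query):
    try:
        results = call_search_service(query)
    except ConnectionError:
        # Fall back to an empty result set: streaming, favourites
        # and the rest of the page keep working.
        results = []
    return {"search_results": results, "streaming": "available"}

page = render_home_page("jurassic park")
print(page["streaming"])
# → available
```

In production you'd typically get this behaviour from a circuit-breaker library rather than hand-rolled try/except, but the principle is the same.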
[00:13:30] So one of the largest drivers was this idea of reliability; another is that you can scale components independently. So for instance, if this is the UI, it may be that my user interface component requires a lot more resources than the billing component up here, so I can run the UI across several instances while I run billing on just a few. I can scale things to the level they need, as opposed to scaling the whole monolith each time.
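A back-of-the-envelope sketch of that independent scaling. The load figures and per-instance capacity here are invented purely for illustration:

```python
# Scale each component to its own load instead of scaling the
# whole monolith. All numbers are hypothetical.
import math

load_rps = {"ui": 900, "search": 300, "billing": 50}  # requests/sec per component
capacity_per_instance = 100                           # requests/sec one instance handles

replicas = {
    svc: math.ceil(rps / capacity_per_instance)
    for svc, rps in load_rps.items()
}
print(replicas)
# → {'ui': 9, 'search': 3, 'billing': 1}
```

In a monolith, handling the UI load would mean nine copies of everything, billing included; here billing runs one instance while the UI runs nine.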
[00:14:07] And another thing that comes out of this is speed of change: I can change any given component without having to update the entire system. In the monolith, if I want to make an update to billing, for example, I have to redeploy the entire monolith across each node, which is a much bigger deal than making a single update to a single service that can be deployed independently. So by moving to a microservice architecture, they were also able to increase their speed of change considerably.
[00:14:50] And that ties into this idea of 'two-pizza teams'. Have people heard of 'two-pizza teams'? A few people. So this is from Amazon rather than Netflix — actually, that's an interesting point: one of the interesting things about Netflix is that they are arguably making better use of AWS than Amazon themselves have with things like Prime Video, which I thought was frankly interesting. They are direct competitors in some sense.
[00:15:15] Anyway, Amazon started using this idea of 'two-pizza teams'. A 'two-pizza team' is basically the idea that a team should be limited to a size that can be fed by two pizzas. Do bear in mind we're talking about American pizzas, so that's still a fair number of people — I think six to 10 would be a rough number. Amazon felt this was a good team size, because any bigger and you get too much communication overhead, any smaller and you can't do a significant amount of work. And what you can do is give each team a microservice to look after, as opposed to trying to break up bits of the monolith, which is more difficult. So your organization is reflected in your software, which will be very familiar to anyone who's heard of Conway's law. If you go to any conferences, you usually hear the term Conway's law at least a dozen times.
[00:16:17] Another thing that came out of this was that you can use different languages and different technologies for different parts of your architecture. It may be that I write one service in Go because I need it to be more efficient than the other parts, or write another bit in a language suited to serious data processing, as opposed to a monolith where I was forced to write everything in the one language, like Java or C#. So I can use the right language for the task at hand. The same goes for databases. Generally you split up your databases so that each service has its own datastore, and that means you can use whatever datastore is appropriate for the use case. So I might, for example, use a graph database in one area, a NoSQL store elsewhere, and a traditional Oracle SQL database in another place.
[00:17:10] Do be aware that there is a large downside here: if you start using lots of different languages, all your developers have to become familiar with several different languages, or you might fall foul of the bus factor, as Mike described.
[00:17:28] And there are other issues as well, the primary one being that we've moved all of the complexity that used to be inside the monolith's software out into the network layer. This is a diagram that Adrian Cockcroft tried to create of the Netflix architecture — and it's actually just one small, simplified part of Netflix. It's very hard to figure out what's happening in your system, because all of this complexity has moved into your network. Adrian ended up calling these the Death Star diagrams, after Star Wars. As well as the complexity moving into the network, you've also made things a lot slower. What was a fast function call in the Java monolith has now become a slow call over the network in our microservice framework, which will be several orders of magnitude slower. And that's a serious problem if you don't break up your monolith in the correct way. If I have two microservices that are too chatty, that will incur serious overhead, and you'll want to refactor to address that problem.
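To put rough numbers on that "orders of magnitude" claim: the latencies below are illustrative ballpark figures, not measurements — an in-process call costs on the order of nanoseconds, while a service-to-service round trip inside a datacentre costs on the order of a millisecond.

```python
# Back-of-the-envelope cost of a "chatty" service boundary.
# Both latency figures are assumed, order-of-magnitude values.
IN_PROCESS_NS = 100        # ~100 ns per in-process function call
NETWORK_NS = 1_000_000     # ~1 ms per call between services

calls_per_request = 50     # a chatty pair of services per user request

in_process_total_ms = calls_per_request * IN_PROCESS_NS / 1e6
network_total_ms = calls_per_request * NETWORK_NS / 1e6
print(in_process_total_ms, network_total_ms)
# → 0.005 50.0
```

The same 50 calls that were invisible inside the monolith add 50 ms per request across a network boundary, which is why chatty services usually get refactored into fewer, coarser calls.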
[00:18:44] One thing that I haven't talked about is containers or VMs. And that's largely because I see that as an implementation detail — although, if you know me, I'm kind of more famous for talking about containers. But in this talk I feel like it's an implementation detail.
[00:18:58] So when we all started moving towards the cloud, virtual machines really enabled that move. They gave us a form of software packaging and an abstraction layer that we could use to move to the cloud, and they kept us away from caring about the actual underlying hardware architecture. And containers are exactly the same. The advantage of containers, however, is that they're a lot smaller and faster than virtual machines. So they've kind of grown up hand in hand with microservices. A lot of people basically think of microservices and containers as synonymous, but it's worth pointing out that when Netflix started moving towards microservices, they were very much using VMs; they weren't using containers. They are starting to use containers now.
[00:19:47] OK. I'd also like to talk about a couple of things that I can see us getting into in the future. The first one is unikernels. This gets a bit technical, but when we're talking about virtual machines, we basically have a full copy of a guest operating system inside each virtual machine, with a lot of code in there, plus a full copy of your application, and they tend to be quite large — typically gigabytes in size, I would guess — running on top of a hypervisor. With containers, what you have tends to be a lot smaller. As you're probably aware, a container shares the kernel of the host it's running on: the container doesn't have its own kernel, everything runs directly on top of the host kernel, and that kernel is shared between all of the containers, which slims down the size that you require for a container. The cost is that you can't do true virtualization, although in this case that shouldn't be an issue: we're really using VMs in the cloud as a form of packaging and portability, as opposed to translating between machine architectures.
[00:21:02] So typically a container may be tens to hundreds of megabytes, as opposed to VMs, which are gigabytes in size, primarily because we tend to strip down containers to just the operating system essentials required to run our application. But if you think about it, we can take that even further, because there's a lot of stuff in the kernel that we don't actually need. A Linux kernel is designed to run multi-user, multi-process workloads; a lot of the code in the kernel exists to handle multiple users at the same time and multiple processes at the same time. But then we go and run containers with a single process and a single user. So we actually don't need most of what's in the kernel, and that's the problem unikernels are designed to address. Unikernels are this idea that we can create a binary that will run directly on top of a hypervisor, like a VM, or directly on hardware, and that contains just the parts of the operating system that we need for our application. So we can end up with a binary that's perhaps kilobytes in size but contains a full webserver, because we're able to drop all the stuff from the kernel we don't need — the multi-user stuff, the floppy disk drivers — and combine what's left with our application code into a single deployable unit that we can run on bare metal or a hypervisor. This has big implications for security, because we cut down our attack surface, and also potentially for things like IoT, because we can create deployable units to be put directly onto IoT devices.
[00:22:47] However, most of you probably aren't going to use unikernels directly. I see this, again, as more of an implementation detail, and potentially it could be used under the hood of something like serverless. So we may see platforms like serverless beginning to use things like unikernels. If you don't know serverless — or function-as-a-service, which is the term I prefer — it's this idea that rather than renting compute power directly, like VMs or containers, from the cloud, we just say to our vendor, "okay, I've got this endpoint; when this endpoint gets hit, start this function up." And you don't say anything about the server that it runs on. So now AWS becomes responsible for handling all the servers, and I don't care about the servers. It scales up as much as it needs to, and the magic thing is, if my function isn't being called, I won't be charged at all for it. I'm only charged when my functions are called. And that's going to be a big thing in the future, just because it can really drive down costs and lets me stop caring about servers.
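A minimal function-as-a-service handler, sketched in the style of AWS Lambda's Python runtime. The `(event, context)` signature is how Lambda invokes Python handlers; the event shape and response body here are illustrative, not taken from any real deployment:

```python
# Sketch of a function-as-a-service handler (AWS Lambda style).
# The platform invokes this only when the endpoint is hit; nothing
# runs, and nothing is billed, while it sits idle.
import json

def handler(event, context):
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Invoking it locally to show the shape; in production the cloud
# provider calls it for you.
response = handler({"queryStringParameters": {"name": "QCon"}}, None)
print(response["statusCode"])
# → 200
```

Note there's no server setup anywhere in the code: scaling, routing, and idle capacity are all the provider's problem.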
[00:23:56] So, just to wrap up. We've moved from a world where we had to buy expensive bespoke hardware and run it on our own premises, towards a world of cloud computing, largely for reasons of scalability and capital expenditure, and because running servers isn't your core business. People don't want to run servers; they want to run a business. At the same time, or slightly later, we've also seen this move away from a monolithic style of software architecture towards a microservices style, for reasons again of reliability, but also of scalability — I can scale faster with this approach — and speed of deployment, so people can deploy things faster. And if you're part of an organisation facing challenges around reliability or scalability, or you find it difficult to change as fast as you'd like, I would suggest that you perhaps consider looking at a microservices architecture or something like serverless. Thank you.