
TRANSCRIPT: GC On-Demand Ep. 42 - Rob Hirschfeld SRE Chat

Blog post created by discoposse on Feb 28, 2017

The following is a full transcript of GC On-Demand Episode 42, featuring Rob Hirschfeld talking about Spiraling Ops Debt and the rise of the SRE (Site Reliability Engineer).

 

If you like getting our full transcripts, please drop in a comment to let me know and I'll start a regular export of them. 

 

Thanks for listening and supporting the GC On-Demand Podcast! 

 

 

Eric Wright:                           Here we are again. Welcome, everybody, to the GC On-Demand Podcast, a very fun time in the growth of any medium, podcast especially, is when we get to bring people back. One of my favorite things is looking for the right opportunity to keep bringing familiar faces and voices to the podcast, and I'm lucky enough to do that today.

                                                      Of course, for folks that don't already know me, my name is Eric Wright. You can find me on Twitter @DiscoPosse, and I'm in the Green Circle Community at Disco Posse as well. We've got ourselves again a great friend of the community and also somebody who wrote one of the cooler articles that I've seen recently. With that, I'd like to say, welcome aboard, Rob Hirschfeld.

                                                      Rob, if you want to reintroduce yourself to the audience, and then we're going to talk a little about Spiraling Ops Debt.

Rob Hirschfeld:                    Excellent. Thank you for the introduction. My name is Rob Hirschfeld. I'm @zehicle online, and CEO and a co-founder of a company called RackN. It specializes in hybrid infrastructure automation, so really trying to solve the problem of helping people do operations in a portable way across physical, cloud, on premises, colo, sort of making infrastructure not matter, if you will, in a really positive way. I've been really deeply involved in the OpenStack community and now the Kubernetes community and now this crazy hybriding of Kubernetes and OpenStack together.

Eric Wright:                           There's really a rich opportunity in front of us, and I love that you're doing such a great job of bridging it. What I really love is seeing the opportunity be there for you, because you've been doing this kind of thing for a long time. You've had this vision, and we could talk briefly for a second about Crowbar, now Rebar. Then we'll get into what we wanted to talk about around the SRE approach to infrastructure.

Rob Hirschfeld:                    The history might be useful with this. My team was originally at Dell in the early OpenStack days. We had done a ton of cloud solutions, hyperscale solutions like Joyent and Azure and Eucalyptus, wayback machine type stuff. We had battle scars like you wouldn't believe deploying this stuff. When we hit OpenStack, we realized that we really were going to have an operational challenge, because OpenStack was very fast moving, relied on Linux very deeply. There were few operational baselines, and every infrastructure we touched was different. Every data center was just enough different that it was really hard to get things started without a lot of professional services.

                                                      We built a platform to automate a lot of the things that we thought were routine operations, right? That was what Crowbar was, a standard way to boot, provision, install, set RAID and BIOS, set up DNS. Really this ... I hate using the word canonical, but it's the right word here. Sort of this right way to install OpenStack. That's what Crowbar was. Amazingly, it's still in use. SUSE still maintains that project.

                                                      We got into it and realized that Crowbar wasn't sufficient because it really wasn't able to adapt to the operational environments that we were coming across. Networking changes would break it. Scale would break it. Somebody who didn't like Chef would throw it out because it was Chef only.

Eric Wright:                           Right.

Rob Hirschfeld:                    That's really what gave birth to Digital Rebar, which is a much more flexible, composable infrastructure that can handle that type of operations. Then that led to being able to be hybrid, because mixing ... Once you're flexible, you can mix cloud and physical infrastructure in an exchangeable way.

Eric Wright:                           It couldn't have worked out better that you happened to choose naming of the projects that, one is a rigid piece of metal. The other one is a much more flexible piece of metal that can provide rigidity for infrastructure when the right things are wrapped around it.

Rob Hirschfeld:                    I love that. I had not gone that direction in the naming. I'm glad you appreciate it. It seems natural now. It took us a long time to come up with the new project name.

Eric Wright:                           No one ever understands how painful that part of the process is. You could come up with a platform long before you come up with a name, and then there's 18 people arguing over the name that had nothing to do with the build of the project, so it's a whole different beast unto itself.

                                                      The thought around SRE, and it's still fresh in a lot of people's minds as to what it is, what, as far as the definition. We talk about site reliability engineering and site reliability engineer. Did it come out of the Google camp? Is that really where it originated, Rob?

Rob Hirschfeld:                    It didn't. It's actually a long-standing term but for a very narrow form of ops. Google morphed it into something much broader, which they were doing quietly and sort of covertly in Google's fashion. Then Google in the last six months, especially with the Kubernetes project, has really started to open up about their operational practice, but site reliability used to be just monitoring performance. Is my website up? Is it down? That was what it originally was, monitoring a website and the load balancers and all that that went into it.

                                                      Google went deeper, and they realized that they couldn't ... because they could, that the site reliability engineering team really had to own the full chain of command from the data center all the way to their website. If you think about it, Google was at the time really web property, so it was mostly web properties. What happened is the web team said, "Well, you care about uptime. We're going to work all the way backwards and run the infrastructure, because that's the only way we can guarantee uptime." The concepts that surfaced out of that are really revolutionary, because it really [inaudible 00:16:06] system thinking.

Eric Wright:                           Yeah. That's where people, it's tricky that we've chosen a name that seems to have a different connotation. Like you say, we think of site reliability as web app, like it stops at the application layer and it's really around service uptime, node uptime and there's those things. Now when we think about composable infrastructure, those are service container ... The physical bare metal portion of it seems to be almost the least important piece of it right now, because there's this sort of blind trust that the physical hardware is okay and what do we do that's literally one tiny abstraction above that, and that seems to be where we have to move our thinking a lot more.

Rob Hirschfeld:                    It is and it isn't, though. This is what I liked about the Google SRE concepts, and they just published a book. There's a great book out there. I highly recommend reading it. They actually go all the way down to when their ops team was building a data center and how they had to go from taking a month to turn up a data center to a week to turning up a data center. Then they just published something on security, about how Google secures their infrastructure, and they go all the way down to custom chips, ASICs, and credentialing and things like that that are available to everybody but Google integrates it.

Eric Wright:                           Yeah. It can be the type of network one as well. Like you said, they've realized ... They had a very unique thing. Maybe this is one, and this is what's really cool and I want to give a shout-out to your team and what you're doing, is that it's kind of like what Alex Polvi talks about with CoreOS. He calls it GIFEE, Google's Infrastructure for Everyone Else, and they are really thinking of a particular way to target it.

                                                      You, as well, where you don't necessarily have custom ASICs. You don't have the ability to choose that this is the particular thing that you can model everything around. You've taken, it seems, a complete hybrid approach: we cannot guarantee that we've got the same custom physical hardware, the same custom physical network. That's the neat thing that I like about the approach you're taking.

Rob Hirschfeld:                    Thank you. What we've found is that even in companies that think that they're very homogenous, there's a lot more variation between their hardware, between their infrastructure, between their data centers, between their ops than they're used to. Then as soon as they start throwing in, "Oh, we're going to do these things in cloud," the variation between cloud and physical is really high. The lack of fidelity between that means that people silo their operations, they do things in multiple ways, and it ends up adding a lot of cost and complexity to their ops. That's sort of sad.

                                                      The other thing I would note is that a lot of people have capabilities in their servers, their hardware, that they don't even realize they've got. The custom ASICs that Google's bragging about are really available to everybody. TPM capabilities, trusted platform modules, and hardware security modules, which include encryption for your system, are very accessible technologies. They're well understood standards. They're just incredibly hard to set up, because most people don't look at their servers as a system. They look at them as individual units. Frankly, that's what we see as the problem: the lack of systems thinking from an ops perspective.

Eric Wright:                           I think that's the right part to pick up. Let's talk about Spiraling Ops Debt and the SRE coding imperative, which was a phenomenal article, really, really well done, and it's part of ... You've got an SRE series you're writing about at your website, which for folks, is RobHirschfeld.com. Talk about the inspiration and what the background is around this idea of Spiraling Ops Debt, Rob.

Rob Hirschfeld:                    Oh, excellent. Some of this comes from reading the Google book, then of course salted with my own personal experience and frustrations. The Google book does something sort of radical. They start with this premise that 50% of an operator's time should be spent in development, which when I first heard about that, I said, "Oh, my God, that's insane. Operators need to spend their time doing ops," but the more I look at what's going on in industry, we're speeding up developers and they're getting more efficient. They're producing more code. There's more variations, so complexity is going way up.

                                                      We've got Kubernetes on a three-month cycle and Docker on a fast cycle and SDNs on a cycle around that, and you've got to do all that work to build a working infrastructure and then figure out if it's Google or Amazon or physical gear or OpenStack. It's insane, right? There's so many moving parts.

                                                      You've got increasing developer demand at the same time you have increasing platform complexity. The reality is that if you just keep doing things the way you've been doing it, those two factors will bury you in ops. I've seen this a lot. I watch people ... We sell Digital Rebar, and we provide commercial support for it. It's a platform that allows people to fully automate their infrastructure, but the cost is, it's a new tool. It's more complex. It's a system-level tool instead of a node-by-node tool.

                                                      What we find is that the people we talk to are so under water, they don't have time to do a new tool. They're sort of like, "I hate using Cobbler. It's really a pain in the ****. I spend so much time doing it," and we offer, we're like, "We can come in and automate that in a week." They're like, "I don't have a week."

Eric Wright:                           Yeah, that's right.

Rob Hirschfeld:                    It literally is the truth. They are so behind in getting their stuff done that they can't sharpen the saw, or they can't take the risk that they're going to try sharpening the saw and it won't work. That's where I had this "Aha" moment that what Google's really saying is that if you don't have a way for your operational team to breathe ... This is a known thing for development teams and physical inventory and management. It's all that. If you can't breathe, then you can't actually be secure. You can't be robust. You can't swim.

                                                      That's really where the Spiraling Ops Debt concept comes from, because what we're watching happen is that people are saying, "Well, my ops team can't swim. They're so far behind, the only option that they have is to reset," right?

Eric Wright:                           Yeah.

Rob Hirschfeld:                    "Burn down my data center, move it all to Amazon, wipe the slate clean, damn the torpedoes and then burn the boats that I came in on."

Eric Wright:                           That's right. It's a very violent shift that we seem to be taking sometimes in order to respond to the fact that we haven't kept up for a while, and it's like saying that I'm having trouble losing five pounds, so let me saw off my leg in order to really get at this thing. That's kind of, I feel, the approach that people are getting into, but there really is, to coin the phrase of all the good infomercials, there has to be a better way. This is where this thing comes in, right?

Rob Hirschfeld:                    Right. One of the things that we really had an objective for when we watched OpenStack become a dumpster fire, and I say that with all due respect and love, because I was part of it. OpenStack is I think actually starting to see the dawn of some good things. I have some personal ... That's a whole other podcast. I have things I would rather see them focused on than the way they spread out the project.

                                                      The thing that happened is that since every operational environment was so different, when OpenStack showed up, it didn't have a way to solve that problem, and so every OpenStack deployment was different and every deployment methodology was customized. There was no sharing back into the community. One person would be successful, but it was impossible for the next person to benefit from their operational use cases.

                                                      I went to a lot of the operational summits. Each operator was doing great things and they'd compare notes, but they wouldn't share scripts. They wouldn't share code. They were all islands. That to me hurt OpenStack. I'm worried it's going to hurt Kubernetes and other things. What I really want to see, and one of the things that we're very passionate about, part of our vision for Digital Rebar, is this concept of open ops or operational reuse, so that you can say, "Well, I'm going to reuse this Ansible script and make it work across multiple sites."

                                                      What we've done with Digital Rebar is a really great example. We're about to cut over into code for Digital Rebar that uses the upstream Ansible playbooks for Kubernetes unmodified.

Eric Wright:                           Wow.

Rob Hirschfeld:                    Then it runs them so that, yeah, this is a really, really big deal, and it takes a ton of work on the background to make it happen. You take the upstream Ansible playbooks, so there's a community around Kubernetes doing Ansible work. It's really cool. We take those playbooks, zero modifications, and run them on Amazon, Google, Metal, OpenStack, with options for Docker or Rocket, three different SDN layers right now, multiple operating system choices. It's this ala carte menu out of Ansible playbooks, and you can run them all in parallel. It's really cool.

                                                      That to me is paying down the operational debt, because now you can make choices that fit your operational needs but still use community stuff. If you find a bug in the playbook, you can submit it back, and then we can test it against these ... I just named a combinatorial matrix of 48 different combinations at least.

Eric Wright:                           That's right.

Rob Hirschfeld:                    That's where we get community acceleration around these projects.
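
For readers who want to see where a figure like "48 combinations" could come from, here is a minimal sketch in Python. The provider, runtime, and SDN counts come straight from Rob's list above; treating "multiple operating system choices" as exactly two is an assumption made purely to reproduce the arithmetic, and the placeholder names are illustrative, not real product names.

    import itertools

    # Option axes Rob describes for the unmodified upstream Kubernetes playbooks.
    # The SDN and OS entries are placeholders; assuming two OS choices is what
    # makes the arithmetic come out to 48.
    providers = ["Amazon", "Google", "Metal", "OpenStack"]
    runtimes = ["Docker", "Rocket"]
    sdn_layers = ["SDN-A", "SDN-B", "SDN-C"]
    os_choices = ["OS-1", "OS-2"]

    combos = list(itertools.product(providers, runtimes, sdn_layers, os_choices))
    print(len(combos))  # 4 * 2 * 3 * 2 = 48 combinations to test against

Every playbook fix submitted upstream would, in principle, need to hold across that whole matrix, which is the testing burden Rob is pointing at.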

Eric Wright:                           One of the things, Rob, and you talked about it with the OpenStack ops challenge of we had so many people doing really cool things. We continue to have. I say "had" like it's died already, but we've got all these amazing things going on, but how do we encourage and create the right way to share that stuff back into this common, these common frameworks and creating common frameworks, because I think the problem we've always got is if someone, they get a great way to do something and then they run with it, then you hear of ... It's like the stuff of legend, like, "Oh, yeah, I hear the guys at SAP, they're doing ..."

                                                      That was the whole neat thing of how they were doing OpenStack and Kubernetes and they had operationalized on three distinct physical topologies, and they had a real nice way to do it. You're like, "That's really cool. How did you do that?" He's like, "Okay, I got to go, guys." That was it. It now is this biblical thing that you're like, "Wow, I hear you can do it, but I don't know how." How do we get that tribal knowledge back into what you're doing with Rebar and other things?

Rob Hirschfeld:                    One of the things that really hurts these projects, in my opinion, and I am very strongly opinionated about this, is relying on the project to fix its own operational framework. In my opinion, it breaks the abstractions of the project. One of the problems that OpenStack encountered was the community said, "We're going to use OpenStack to solve OpenStack's operational problems." That happened at the Hong Kong Summit, not Tokyo, at the actual Hong Kong Summit in China. It was 2012, I think, 2013.

                                                      It really derails the effort, because the focus of OpenStack is to serve the OpenStack use cases, not the platform for OpenStack. Kubernetes is trying to do the same thing, and I think it's a distraction. Operational tooling around operational needs for these platforms is different than what the platforms are trying to do, which is simplify it and hide all this stuff. It's very simple, right? OpenStack is meant to hide all the operational complexity of a data center, so don't make it understand all that stuff in order to run itself. Run it, and this is where I love SREs as a concept.

                                                      The SREs as a concept say, "Look, we're going to have a select group of operators. They're going to deal with the nuances and complexities and all the underlay mess," what I heard one Google engineer describing as the lizard brain. It's the plumbing, right? You don't want OpenStack to have to expose network interface card topologies and stuff. You're trying to hide all that.

                                                      What we did with Digital Rebar is we said, "Look, the underlay automation has its own abstraction, its own challenges, its own heterogeneity." We deal with that. We wrote a platform that says the data center is a messy place. We're going to get dirty. That's okay, so that people running OpenStack and Kubernetes and other platforms never have to worry about that. You want 90% of your people at that abstraction layer, and then you want the 10 ... This is what SRE says. You want that 10% of the team to deal with the messy reality of the real physical infrastructure and let them spend their time automating and cleaning it up and making that happen.

                                                      That's a lot of this balance, where we need to have shared ways to automate the underlay so that that team, that SRE team, can improve their productivity and not reinvent the wheel over and over again. Expecting them to do it with Kubernetes or OpenStack is a misuse of the tool. You're pulling the tool off center in my opinion and slowing down the actual use of those tools to solve a different problem.

Eric Wright:                           This is one of the things I always think of in ... We look at the successes that have been discovered through a lot of really good embracing of the SRE approach and this GIFEE approach. Rob, where's the floor of environment size, where it's almost too much of an investment? In my mind it never is, but the reality is I believe a three-host environment should be automated as much as a 3,000-host one, but do you think that there's this low water mark where it becomes really, really tough to justify, especially for folks that are new to it, that they really don't ... this is a fresh concept.

Rob Hirschfeld:                    I love this question. It really depends on not where you are today but where you need to get to. If you are truly only going to be a three-host environment or if you're just playing with something, use the cloud. If you're playing with Kubernetes and you just want to learn how to use it, use Amazon, use Google. That's fine. There's some tools that make it super easy to get running on those platforms, but don't expect that that's going to then translate into a scaled, HA, secure, production, upgradeable thing.

                                                      If you know that you're going to that environment, if you know that you're going to run a 100-node on-premises infrastructure on metal for Kubernetes, which I think a lot of people are doing or should be considering, it's a smart way to run an infrastructure, then start with something more like what we're doing with Rebar, where you can actually operationalize it up front. You're going to have a little bit more learning curve. Frankly, it takes 20 minutes to do a Rebar on Amazon thing. You just have to install Rebar first. Oh, the horror.

                                                      We see this all the time. People want to use Terraform or Ansible because it's a desktop tool and it's super fast to download and get something running, and it is. They're very powerful tools. We love Ansible. We use a ton of it, but it's not ... Running a data center from a laptop tool is an anti-pattern. You need something that maintains the state of your data center so that you can maintain it.

                                                      That's what I would suggest is that if you're going to be looking at running infrastructure on an ongoing basis, then take some time to figure out how you're going to sustain it up front, invest in those tools, learn how to make it happen. Frankly, we've been spending a lot of time and working really hard to make it so that that's not a big penalty, that you can come in and do it.

                                                      Here's the rationale, and this really comes straight back to that blog post: if you can start in a place where you are able to stay with the community and not be inventing your own stuff, and be able to pull things in and then give back and share, it will really help you avoid getting into an operational debt place, right?

                                                      We talk to people who are still maintaining kickstart files, which there's no ... I'll say this very, very directly. There is no corporate payback for you, for anybody out there, to be maintaining kickstart files for their company. It's not a value proposition. That's the type of thing where you should be able to reuse a templated system like what we've been building or doing that. They're just not value added places. Get upstack quick.

Eric Wright:                           On that one, Rob, one of the things that I find is, just because of the ... I think it's networking, and that's probably the scariest challenge for a lot of raw ops folks: in order for them to get to that next layer of automation, they kind of have to embrace networking, whether it's vSphere Auto Deploy, whether it's going to be running PXE, running environments which need a little bit more care and feeding on that one hunk of network infrastructure, or the ability to make sure that when you're running, say, a Digital Rebar admin node, it's not going to get severed off or interfere with other stuff that's happening out there.

                                                      That I think seems to be the first most challenging step. Once they get comfortable with the network side, then all bets are off, and they're like, "Oh, this is awesome. I don't know why I haven't been doing this for 10 years," right?

Rob Hirschfeld:                    I will tell you now that what we find is people can run the demo in the cloud where there's no networking pretty easily. As soon as they step into trying to bootstrap on their own infrastructure and they have to know enough about their subnet mask to build a DHCP server ...

Eric Wright:                           Yeah.

Rob Hirschfeld:                    They run aground. They can call us, but people want to figure it out themselves. I agree with you, that ends up being a big problem. Networking is hard enough. It's not getting easier. It's a factor. We find that, for very good reasons, we ship with DHCP that doesn't automatically broadcast. You have to configure it so that it broadcasts on the subnets that you want to broadcast on.

Eric Wright:                           Yeah.

Rob Hirschfeld:                    Good behavior for us.

Eric Wright:                           That's right. Excellent choice, and thank you for that. Every network person in the world thanks you for that.

Rob Hirschfeld:                    Yeah. At the same time, what it means is that if you want the system to come up, you have to understand enough about what interface you're binding to and what network it's going to be on.
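
As a side note for anyone following along at home, this is the kind of subnet knowledge Rob means: before a provisioning DHCP server can safely answer on a network, you need to know the netmask, the broadcast address, and which addresses are already spoken for. Here is a generic sketch using Python's standard ipaddress module; the 10.1.10.0/24 subnet and the range boundaries are made-up examples, not anything specific to Digital Rebar.

    import ipaddress

    # Hypothetical admin subnet; substitute your own.
    network = ipaddress.ip_network("10.1.10.0/24")

    print(network.netmask)            # 255.255.255.0
    print(network.network_address)    # 10.1.10.0
    print(network.broadcast_address)  # 10.1.10.255

    # A DHCP range for PXE-booting machines has to sit inside this subnet and
    # steer clear of addresses already in use (gateway, admin node, and so on).
    hosts = list(network.hosts())
    dhcp_start, dhcp_end = hosts[99], hosts[199]
    print(dhcp_start, "-", dhcp_end)  # 10.1.10.100 - 10.1.10.200

If you can't fill in those values for the network you're standing the admin node up on, you're exactly the person Rob describes running aground.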

Eric Wright:                           I've seen products that will ship. I can call this one out because it has long gone away: the original HPE, like their Helion cloud. When I saw it, I thought, "Oh, this is great. They're finally going to ship it so you can push it out as an OVA. People can kick the tires on it," and everything inside it was awash with hard-coded IP addresses and networks.

                                                      I'm like, "There's really no way that people are going to be able to stand this up except on Virtual Box." That's cute but if they really want to test it out in multi-node, then there's a huge lift to get to that next stage. That, like I said, when you ship with things are down. You have the options to do the configuration. While that is a tiny bit more, like that first step, at least you've got the mindset of let's make it open. They have to do a little configurations so it's going to match their environment.

                                                      Time and time again I see people, they ship out a demo platform and it's just, like I said, it's just rife with static IP addresses, which are going to be a huge problem for many environments.

Rob Hirschfeld:                    It's definitely something we walked through in the evolution of the system, right? That's why we ended up where we were. You do have to understand your environments. We had the same problem with Amazon subnet IDs or VPCs: if you have multiple VPCs defined in your environment, then you have to know which ones your [inaudible 00:39:19] ends up on before, so they connect to each other. That's the way it is.

                                                      People overlook this. If we want to talk about hybrid and multi-cloud operations, Google, Amazon, Azure, OpenStack networking models are different. They're even more different than switch vendors on physical gear. If you're trying to create portable operational scripts, which people should want to do, it really requires some understanding of what you're doing with these different network models. Subnet broadcasts between Amazon and Google have different behaviors. It's crazy.

Eric Wright:                           Yeah. Then I had the classic, the question that got asked me and I knew I was talking ... We were way over someone's head is they said, "How do you enable jumbo frames out into the cloud?" I was like, "Wow, you just need to stop."

Rob Hirschfeld:                    Oh, my goodness.

Eric Wright:                           "You need to stop right there." Even when you are pretty, at the slightly higher level, where you're able to understand overly networks across multiple clouds. Like you said, it's just adapting to the idiosyncrasies of each local environment. Everybody thinks, "Oh, the cloud creates such flexibility." For that cloud, yes. They are all effectively a raccoon trap. They're designed to create ...

                                                      I loved at the AWS summit, they stopped saying "vendor lock-in." Then they started saying "legal lock-in," that, "We stop the legal lock-in that you have, because you can just vacate the entire AWS environments today, and there's nothing stopping you from doing that." It allows them to not be stuck with having the thing of like, but it's all your tooling. It's all your specifics.

                                                      Again, huge props to your team on how do you pick the horse to hitch to, and then how do you keep on top of those changes and the continuously shifting underlay environments that you're walking into when you become the cloud?

Rob Hirschfeld:                    That's probably worth a small drill-down for this also, because one of the anti-patterns we saw with DevOps scripts was that they had a tendency to build ... It's all [inaudible 00:41:47] book type thinking, but they would build very connected chains of logic. Ansible builds a big inventory file where all the variables are injected at the beginning. Chef does variable searches across a shared database, but what that ended up doing is you ended up with very connected actions. These roles that people were wiring together had very deliberate assumptions, role to role to role to role, where the variables would sort of string through them.

                                                      That was one of the anti-patterns that we encountered from a work perspective, because you would end up with assumptions across roles that were very hard to troubleshoot. The reason why I'm connecting that statement to what you made is that as people go trying to connect all these pieces together and using more Amazon services or trying not to use Amazon services, they have a tendency to embed assumptions into those roles, because it's convenient, and it's very hard to pull back from that once you've gotten all in, [inaudible 00:42:53] the anti-patterns we saw with Crowbar.

                                                      I'm actually working on some blog posts about exactly that, but it's hard to explain how dangerous it is. The more seasoned SREs and DevOps engineers will see this, but it's a very hard pattern to explain for a lot of people.
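
To make that anti-pattern a bit more concrete, here is a deliberately simplified Python sketch. It is not RackN code and not how Ansible or Chef work internally; it only illustrates the coupling Rob describes: when every step reads from and writes to one shared variable bag populated up front, a later role silently depends on what an earlier role happened to set, which is exactly the kind of cross-role assumption that is hard to troubleshoot. Passing explicit inputs and outputs makes the same chain visible and testable step by step.

    # Coupled version: roles communicate through one shared variable bag.
    shared_vars = {"nic": "eth0"}

    def network_role(vars):
        # Hidden assumption: "nic" was injected up front by something else.
        vars["bridge"] = vars["nic"] + "-br0"

    def overlay_role(vars):
        # Hidden assumption: network_role already ran and set "bridge".
        return "overlay bound to " + vars["bridge"]

    network_role(shared_vars)
    print(overlay_role(shared_vars))

    # Less coupled version: each step declares what it needs and returns what it
    # produces, so the assumptions are explicit and each piece can be tested alone.
    def make_bridge(nic: str) -> str:
        return nic + "-br0"

    def bind_overlay(bridge: str) -> str:
        return "overlay bound to " + bridge

    print(bind_overlay(make_bridge("eth0")))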

Eric Wright:                           I think as people start to explore, they're inevitably going to bump into some of those things. It's not a terrible thing anymore so than we did with any pattern as people started to adopt it, but what's good is I think the ecosystem is getting better at ... Your team is rapidly responding to things and continuously innovating on what you've got. You've got Kubernetes, like I said, rapid life cycle on Kubernetes, on Docker, so we've got a lot more rapid response to stuff. People are learning for us.

                                                      We used to have to ... You were like a design expert in one particular platform, and it was a handful of people that shared that depth of knowledge. Now you're leaning on RackN people. You're leaning on the CoreOS team and Tectonic and all these other things where people are like, "All right, they're running it elsewhere, and they're learning for me, and so I can kind of lean on that community," which is good.

Rob Hirschfeld:                    If you're spinning these things up and not thinking how you're going to share it back, and this is what we saw over and over with OpenStack. When people would get too far off the main course, it became very hard for them to keep up, not because of the ways they weren't good or productive. It would just be that they got off course. Then if we didn't provide ways for them to collaborate, then we had a thousand flowers bloom, and then it would just become a huge fight. I could go into specifics around even some new OpenStack efforts where they've branched into three different approaches to do something, and it's a bit of a mess.

Eric Wright:                           Yeah. As we get close to Boston, we'll get together again, and I want to dive into some of those specifically, because as we get to the exciting times and challenging times, as much as we've learned, we also continue to repeat a lot of the same mistakes, but ...

                                                      On the RackN side, Rob, what's new and exciting in that world? What can we expect to see next, and where do people go when they want to find out more about Digital Rebar and RackN and all the stuff that you're doing?

Rob Hirschfeld:                    Wow. We are cruising like crazy. The upstream Ansible stuff for us is this next jump that we've done from a Digital Rebar perspective. We are doing some really interesting work on the OpenStack on top of Kubernetes environment that I had actually considered a joke. I even call it the Joint OpenStack Kubernetes Environment.

                                                      I'll tell you, it's firming up into actually a much more real way to do it, and the way we're doing it is the way that I think lines up with community, where we're actually using Kubernetes manifests. We're managing OpenStack in Kubernetes in a way that's true to Kubernetes, which is what I saw as a goal. I think it's very troubling from an OpenStack perspective, because it really boxes OpenStack's utility in. From OpenStack's perspective it's a joke. From Kubernetes' perspective, it's inevitable. I'm in both camps. It works for me.

                                                      We'd love to see people collaborate and play with that. Our Kubernetes stuff is top notch. I don't think anybody's got a more robust, complete Kubernetes, especially community facing, Kubernetes approach, that we can support and help people with. RackN is RackN.com. I am @zehicle online. The Digital Rebar project is an open source project, Apache 2 licensed, and it's all containerized.

                                                      This is one of the things that's awesome about Digital Rebar. I don't usually talk about how it works itself. It's a containerized infrastructure, so it runs as a 12-plus container microservices architecture, super lightweight. We walk the walk when it comes to containerized stuff. Rebar.Digital is the website for that.

                                                      We're in there every single day doing stuff, making improvements. We love to get people's feedback, help people get their physical provisioning done, play with new hardware types and things like that. Always looking for fun things to play with, and to help people escape from Cobbler, the hell that is Cobbler.

Eric Wright:                           That's right. That was my funniest thing I wrote on my Twitter bio one time. It said, "Cobbler," and someone said, "Oh, you work with Cobbler?" I'm like, "No, I was legitimately a shoe repair man for five years." I said, "Believe me, I wouldn't go near the automation project. That thing's a steaming fire."

                                                      It was interesting in what it did. God, we could talk for a whole hour on Cobbler and Razor and the like and how those things, where they lived.

Rob Hirschfeld:                    What people forget with this is that those things, the Cobbler pieces and stuff that Digital Rebar does, because people are like, "Why do you still use TFTP boot? That's old school stuff." I'm like, "PXE is built into people's BIOS." It's not changing very fast. There's no new sexy when it comes to booting gear. It's RAID and BIOS configuration and TFTP boot and DHCP. You've just got to make it work. As I said, we've done what we can to make it cool.

Eric Wright:                           That's one of those things that there's nothing that's made us need to get rid of it. There's no landmark shift that said, "We really don't need that stuff anymore because look at what we've got now." It's like, well, it's got its little oddities. We've put all the effort into that next layer. These are minimum requirements, and this is where the real action happens, once that process is underway. That's what's cool, and as well, for folks that haven't taken a look at Rebar.Digital, do so. Then you've also got some good demo videos.

                                                      What's really great is you literally go from ground up, implementation, pushing out Kubernetes on clouds in a few minutes, and they're real-time demos. They're super-easy to walk through yourself, which is really cool, so I encourage people to have a run at it. It costs you a few cents to take the leap. As I tell people, they're like, "Doesn't it cost money to run on AWS?" I'm like, "It does if you keep it there," but that's the whole fun is you can watch the magic happen and then undo the magic at the end, and you're not going to get that bill at the end of the month. Just don't forget that part.

Rob Hirschfeld:                    Actually, we do a lot with this bare metal hosting company called Packet, Packet.net.

Eric Wright:                           Oh, okay.

Rob Hirschfeld:                    If you use the code RACKN100, all caps, you'll get a hundred-dollar credit on that site. You could do this stuff for nothing.

Eric Wright:                           That's awesome.

Rob Hirschfeld:                    Don't let the costs slow you down, although Amazon is not expensive either for an hour-long tutorial and demo.

Eric Wright:                           No. Exactly. Exactly. Well worth it. Excellent.

                                                      Rob, thanks very much for chatting. It's always great to catch up, and I love hearing about the stuff that's going on with the team and with the community wrapped around RackN and everything you're doing. I hope to catch up in person soon, but like I said, definitely we'll, as we approach the OpenStack Summit, we'll do a separate stream on that. That's a fun stream of consciousness chat all unto itself.

Rob Hirschfeld:                    I would love to do it. As always, maybe we'll catch up on the OpenStack [inaudible 00:51:24], informal running event.

Eric Wright:                           Yes, that's right. You've got it. There we go. Excellent. Thanks very much, Rob.

Rob Hirschfeld:                    Eric, thanks. Have a great day.
