These are unedited transcripts and may contain errors.
OpenSource session
Wednesday, 16th of October, 2013, at 9 a.m.:
ONDREJ FILIP: Welcome, everybody, to this historic moment, because it's the first official round of the OpenSource Working Group. Thank you all for joining us.
(Applause)
Let me introduce myself: I am co-chairing this Working Group together with Martin Winter, on my left-hand side. So, again, thank you very much, and I am very glad that the room is full and that the idea of an OpenSource Working Group makes sense. We have prepared an agenda for you; we have basically three topics, three presentations that will be delivered. The first will be delivered by Nat Morris from Cumulus, on overcoming traditional network limitations with OpenSource. The second presentation will be by Scott Wilkerson and it's going to be about Nagios, and last but not least will be an NSD update by .... After those presentations, we would like to spend a couple of minutes talking to you, because this is new; we would like to get feedback and hear from you what shape this is going to take and how it's going to work in the future, so we will spend a couple of minutes on that at the end of the session.
So, let me start with some mandatory administrative things. I think the welcome was done. We need to select a scribe, and we have Daniel, who volunteered, or was volunteered, I don't know, to be a scribe, so thank you very much. And we have Michaela, who will monitor the remote participants. Thank you very much. And the agenda is, I think, finalised, unless there is somebody who would like to add something. I apologise to those whose presentations were declined. The time is limited, we have just one slot and we cannot fit everybody, but we will use those presentations in some future sessions. So don't be afraid.
And last reminder, which is the microphone etiquette, say your name and your affiliation so the scribe will be able to catch what was said and who said it, so thank you very much.
And without any further ado, I will pass the microphone to Martin and he will introduce the first presenter.
MARTIN WINTER: Good morning, everyone.
So as first speaker we have Nat Morris from Cumulus Networks. He will talk a little bit about a lot of the changes and ideas around moving from the traditional way of running networks to using OpenSource. For me it was an interesting inspiration, how they embrace and use a lot of the OpenSource tools. I hope it gives you a lot of ideas of things which can be done differently, some of them radically different from what you have been used to in the last five or ten years. Look at it as an inspiration for how things can be changed.
NAT MORRIS: We are a start-up based in California; we came out of stealth mode in June this year. I want to talk about some changes that we are seeing in the data centre scene and about how running Linux on all the network devices in the data centre can reduce your management overheads. I am going to talk about some of the management tools that we are seeing people use, some of the monitoring tools, and then a few of the contributions that we have made from Cumulus to the community, and finally I will take some questions.
So, modern data centres are huge; we are not talking 20 or 30 racks, these are a thousand racks at a time, and sometimes just for a single website, the likes of Google, Amazon and Facebook. People are deploying 100 gig into these sites now and there is a lot of admin overhead, and people would like to reduce that, to reduce the cost that is required to deploy servers en masse.
There are a lot of buzzwords around, SDN and OpenFlow. And with that sometimes comes a lot of overhead, and running Linux in the data centre can really help you reduce that.
A typical network operating system usually comes from a single vendor, on the hardware that you buy, and you are kind of locked in. And what we were seeing is, you just have a ?? there is no API to it, and you have got to programme it as if you are sat in front of it. It's very hard to see what is happening. You might have to create exports and send them off to TAC or something if you have got a problem, and monitoring and looking after that environment is a bit of a nightmare; you could be Googling for MIBs and things on the Internet, screen scraping, and using tools like Expect to look at the state of the box. And the scripting environments on these boxes traditionally aren't too good, either.
We are seeing these large web environments having to grow very, very quickly, typically as prefabricated racks that arrive on the back of a lorry and are deployed within a few hours. In the old way, you'd see people turn up at the racks with their laptops and consoles to run scripts against the switches, but with an OpenSource Linux environment running on the switches you can automate that much more easily.
So traditionally in the data centre, things like monitoring and enforcement have been a bit of a nightmare, along with spinning up and deploying these apps. What we see now is a lot more east/west traffic within the same pod, with lots of Ceph data stores and CloudStack and OpenStack environments, whereas traditionally the traffic within the data centre was north/south, leaving that environment to go to the user. So there is a lot of east/west traffic now, and the design has to take that into account; that is why we see the leaf-spine design.
Why Linux? It's well-known and it's very mature now. There are a lot of practices that people are familiar with; people have been deploying Linux in the data centre for years. It's got excellent network support, people are very comfortable debugging it, and it's got a great community behind it, so you can easily debug things and work with colleagues to add new features quickly, and it also brings a lot of people together.
The management frameworks for looking after Linux are very mature now, with tools like Puppet, and it makes sense to use those same tools to look after your network devices as well as the servers.
So it's a great fit. It makes sense to kind of standardise that across the data centre so you can use Linux not only for the servers but the switching and routing environment as well.
We have seen network operating systems develop over the past ten to 15 years. Originally, there was a monolithic OS that came shipped by the vendor of the switch; there was a loop and not much else happening, no real OS there. Then they became more modularised, built on something like QNX, with proper process and memory management. In recent years we have seen people picking up Linux for the control plane, and this first sort of box uses Linux just to manage the control plane, with process management and some modularity, and to look after the routing stack, but there was still a CLI layer of their own that they were presenting to the user; it wasn't open Linux as you'd normally see on a server. And now there are people like ourselves who are just releasing Linux that runs on this hardware. Typically it's white box switches from Taiwan, from vendors like Quanta, Accton and Delta, and when we say it's Linux on the switch, it's just Linux: all the switch ports on the front are presented as additional interfaces in iproute2. That allows you to use the traditional tools you have been using for years, but now they are the identical tools you use to manage your switching infrastructure.
What advantages does it provide? You can use ?? software routers; the community is really well developed and mature now, so you have got great OpenSource routing programs like Quagga and BIRD, with BGP, OSPF and RIP, and those tools are great and they have got an awesome community behind them, with fantastic mailing lists and support to get you up and running. The other Layer 2 tools in Linux are coming on now as well, like BPDU guard and bridge assurance, and LLDP for discovery; that is all there as well, so we might as well embrace that and standardise on the toolset we deploy in the data centre for servers as well as switches and routers.
These are some of the traditional tools that people have used to automate their locked vendor devices in the past. I think we have all spent a few hours running ?? on a Cisco switch and given up, and there are a few vendors now that you can send a big chunk of XML to. One of the most common ways of deploying switches is also something like an Expect script, where you just have commands sent as if you were in front of the box trying to configure it. And now, coming from the server world, there is this movement where there are some great tools that you can use to manage OpenSource operating systems en masse: CFEngine, Puppet, Chef, and every time you go back and look at that community new ones are appearing, SaltStack and Trigger. This is really growing, and a lot of people are getting into the way of thinking that they can document their data centre and their deployments of operating systems and applications inside their manifests and their recipes. The nirvana is this environment where you are using the same tools to look after the server estate as the networking estate, because in these large web-scale data centres, traditionally they need a much larger team to look after the networking stack. By running Linux as your routing and switching operating system you can reduce those overheads and redeploy staff to more useful tasks and better projects.
And in this DevOps environment you can use Puppet to configure network interfaces now, and the same with Chef, which kind of brings it all back together.
Monitoring has been pretty stagnant in how people have been monitoring these traditional networking devices, and that is typically via SNMP polling, with tools like MRTG, and as you grow your environments you have to scale up all those pollers. The folks who have been looking after these big traditional clusters have had this kind of sorted for a long time, using tools like CollectD, Diamond, Graphite and Sensu, which push the data from the device to the monitoring box. By having agents deployed on the network switching infrastructure and the routers, you can reduce that management overhead, and you can run the same client on the switches as on the servers now.
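As a rough illustration of the push model described here, the following Python sketch parses interface byte counters in the format of Linux's /proc/net/dev and formats them as lines of Graphite's plaintext protocol (one `path value timestamp` line per metric). The metric naming scheme (`switches.<host>.<interface>...`), the collector name `graphite.example.net` and the helper names are assumptions made for this example and do not come from the talk; port 2003 is the usual default for Graphite's plaintext listener.

```python
import socket


def format_metric(path, value, timestamp):
    """Return one line of Graphite's plaintext protocol: 'path value timestamp'."""
    return f"{path} {value} {int(timestamp)}\n"


def read_interface_counters(proc_net_dev_text):
    """Parse the text of /proc/net/dev into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 9:
            # field 0 is received bytes, field 8 is transmitted bytes
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def build_payload(hostname, counters, timestamp):
    """Build the full text block to push for one host (hypothetical naming scheme)."""
    lines = []
    for ifname, (rx, tx) in sorted(counters.items()):
        prefix = f"switches.{hostname}.{ifname}"
        lines.append(format_metric(f"{prefix}.rx_bytes", rx, timestamp))
        lines.append(format_metric(f"{prefix}.tx_bytes", tx, timestamp))
    return "".join(lines)


def push(payload, host="graphite.example.net", port=2003):
    """Send the payload to a Graphite plaintext listener (host name is an assumption)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

An agent on a switch would read /proc/net/dev on a timer, call `build_payload` with a host name that contains no dots (dots separate path components in Graphite), and hand the result to `push`. Because the switch is just a Linux host, the same script could run unchanged on the servers.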
I could talk quickly about some of the contributions that we have made to the community. So we have got an OpenSource boot loader, and a prescriptive topology daemon which helps to manage this cabling nightmare in the data centre. We are also actively producing patches for Quagga, and we have done a lot of work around the Layer 2 support in the kernel as well, with bridging and STP.
The first of the projects is ONIE, which is the Open Network Install Environment. What we have seen is that when you go and buy a switch from a vendor, it comes pretty vendor locked, a Cisco box with CatOS or a Juniper box, and we really wanted to break that mould so you can go to different vendors to buy the hardware and run a common network operating system on there. The way we have done that is we have OpenSourced this small boot loader, which is based on U-Boot with BusyBox, and we have managed to persuade all the ODMs, Quanta, Delta, Accton, that when the switches leave the factory they will have this tiny boot loader on them. We have opened up that environment for other network operating systems, not just ourselves, who can now go to these vendors and ask for these white box switches to run their own Linux operating systems on. If you look at the model at the top, on the server side, for years we have had the bare metal server and BIOS, and on the switch side that has been missing. So now we have got the bare metal switch with ONIE, on top of that you place your OpenSource operating system, and on top of that you layer your applications.
And this is how ONIE works: the switches come shipped from Taiwan, where they are manufactured, with ONIE installed, and when you plug one in it will go through some discovery mechanisms. It will look for DHCP options for the network OS, and it will also do IPv6-enabled discovery to try and find the binary, locate it on a management web server or TFTP host, and install it and run it. It's this platform that we think is going to break that mould, so that you can develop your own operating systems, or use your own, to run on these much lower-cost switches. There is some great documentation on there about how it does discovery and how you can use it to manage the flash environments on these switches.
The next project that we have been working on is the prescriptive topology module, and this fixes the big issue that we have seen with data centre cabling. When you get prefabricated racks coming in from a fabrication house, where they are putting the cabling, the servers and the switches into the racks, and 230 amp powers, you can't really be sure they have been cabled up correctly every time. So what we have done is taken the graph declaration format of Graphviz, as you can see on the right-hand side here, and mapped that to the data centre. We have got this topology here with spine switches, mid-spine and top of rack, and using Graphviz we can declare the relationships between the entities, so switch one port one maps to port 3, and so on. By using this combination of the host name and the switch port we can use LLDP to verify connectivity. We have a daemon that runs on the operating system that you can use to verify whether the cabling is correct or not.
So here is a really simple topology: we have got a spine and a leaf and the mapping of the ports. You can end up with a tiny text file for this topology, or a huge one, and apply the same file to all of your switches in the data centre. It can read which switch it is running on and which neighbours it is expecting to see on the switch ports. Because we are just using LLDPD, it has got things like CDP as well, so you are not tied to links between two hosts that both do LLDP. It's written in C, it has a hook into LLDPD, and at the bottom here you can see the output. It's Python-based ?? you can see the switch ports, the neighbours that we are expecting and the ones that we observed on each port via LLDPD. The last one has failed because it's got incorrect cabling, and you can have some hooks, so you can place scripts or your own Python into some directories in /etc and have actions occur based on correct or incorrect cabling in the data centre.
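The comparison described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the daemon itself (which, as stated, is written in C and hooks into LLDPD): it reads edges in a Graphviz-style `"host":"port" -- "host":"port"` form, works out which neighbour each local port should see, and compares that with a table of observed neighbours. The host names, port names and the `observed` dictionary are hypothetical, and the exact file syntax accepted by the real tool is an assumption.

```python
import re

# Matches one undirected edge such as: "spine1":"swp1" -- "leaf1":"swp3";
EDGE = re.compile(r'"([^"]+)":"([^"]+)"\s*--\s*"([^"]+)":"([^"]+)"')


def expected_neighbours(dot_text, local_host):
    """Map each local port to the (host, port) the topology file says it should reach."""
    expected = {}
    for a_host, a_port, b_host, b_port in EDGE.findall(dot_text):
        if a_host == local_host:
            expected[a_port] = (b_host, b_port)
        if b_host == local_host:
            expected[b_port] = (a_host, a_port)
    return expected


def verify(expected, observed):
    """Compare expected and observed neighbours; return {port: 'pass' or 'fail'}."""
    return {
        port: "pass" if observed.get(port) == peer else "fail"
        for port, peer in expected.items()
    }


# Hypothetical topology file shared by every switch in the data centre.
TOPOLOGY = """
graph dc {
  "spine1":"swp1" -- "leaf1":"swp3";
  "spine1":"swp2" -- "leaf2":"swp3";
}
"""

# Hypothetical neighbours as they might be learned via LLDP on spine1;
# the second cable is plugged into the wrong port on leaf2.
observed = {"swp1": ("leaf1", "swp3"), "swp2": ("leaf2", "swp4")}

result = verify(expected_neighbours(TOPOLOGY, "spine1"), observed)
for port in sorted(result):
    print(port, result[port])
```

The same file yields a different expectation on each switch because the lookup is keyed on the local host name, which is what lets one topology file be applied everywhere.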
And because it uses this, it has really got inter-op in the data centre. Here is an example with some different vendors: switch ports one and two are connected up to another switch; 3 and 4 are connected to ?? we are looking for the host name and then the port number. Switch port 5 is connected to a Cisco switch, and there is a Juniper box as well on switch port 6. What we see a lot of people deploying these huge data centres want to do is verify their cabling all the way down to the hosts in the rack, so instead of just running LLDP and CDP on the switches, people are now deploying LLDP daemons on, maybe, their Ceph file stores, and that allows you to verify the cabling right down to the host as well.
So what are we missing in this environment? Traditionally people have deployed these Quagga or BIRD boxes in the data centre, in small to medium-sized hosting companies, where they are routing in CPU, and that doesn't really scale up to these big web-scale data centres with thousands of racks requiring hundreds of gigs of bandwidth. So one way to do the hardware acceleration is how we have done it, hardware-accelerating Linux via a special module. We have a kernel driver, and that talks to the SDK from the silicon manufacturer, which in our case is Broadcom. We have switchd, which talks to the driver and also monitors the routing state, the routing table and the network interfaces, so any routes that you are adding to Linux are monitored via Netlink, and in real time we push those routes down into the hardware. The Layer 2 and 3 switching is then actually taking place on the silicon and not coming near the control plane at all. That means we can do routing at 40 gig a second on a switch, but you are configuring it like it's a Linux box, so you could write your own crazy routing table, and as long as you are adding those routes into the kernel, we are pushing them down into the hardware, so to all intents and purposes it looks just like a soft router.
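The mirroring idea described here (watch kernel route changes, replay them into the forwarding hardware) can be illustrated with a small, purely simulated Python sketch. Nothing below talks to Netlink or to a vendor SDK: the events are plain tuples, `HardwareTable` is a hypothetical stand-in for whatever the real driver programs, and only the event names `RTM_NEWROUTE` and `RTM_DELROUTE` are borrowed from the Netlink route message types.

```python
import ipaddress


class HardwareTable:
    """Hypothetical stand-in for the forwarding table programmed through a vendor SDK."""

    def __init__(self):
        self.routes = {}

    def program(self, prefix, next_hop):
        # Adding an existing prefix simply overwrites its next hop.
        self.routes[prefix] = next_hop

    def remove(self, prefix):
        self.routes.pop(prefix, None)


def sync(events, hardware):
    """Replay kernel route events into the hardware table, in order."""
    for kind, prefix, next_hop in events:
        # Validate and normalise the prefix; raises ValueError on malformed input.
        network = str(ipaddress.ip_network(prefix))
        if kind == "RTM_NEWROUTE":
            hardware.program(network, next_hop)
        elif kind == "RTM_DELROUTE":
            hardware.remove(network)
    return hardware


# Simulated stream of route changes, as a routing daemon might cause them.
EVENTS = [
    ("RTM_NEWROUTE", "10.1.0.0/24", "192.0.2.1"),
    ("RTM_NEWROUTE", "10.2.0.0/24", "192.0.2.2"),
    ("RTM_NEWROUTE", "10.1.0.0/24", "192.0.2.3"),  # next hop changes
    ("RTM_DELROUTE", "10.2.0.0/24", None),         # route withdrawn
]

table = sync(EVENTS, HardwareTable())
print(table.routes)
```

The point of the design is that the routing software never needs to know about the hardware: anything that installs routes in the kernel is accelerated as a side effect.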
The other thing is monitoring the state of bridges. Normally you'd log on to a Cisco switch, declare the VLAN and place some interfaces into the VLAN. What we do is create Linux bridges with brctl, as you normally would on a home router box; in our case you create a Linux bridge and add in the switch ports, and then the switch driver will see that happening and will create VLANs on the switch. By doing this we allow network and system administrators to treat it like it's a Linux server, but in this case with 48 ?? Because it's just Linux, you can run anything you like on top of it. So you can use something like Chef, Puppet or CFEngine to automate, and you can install CollectD, Nagios or Ganglia to send the stats back to your monitoring station. By running Linux in the data centre you can reduce the overheads of deploying the switches, because you are treating them just as you have been treating Linux servers for years.
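To make the bridge workflow concrete, here is a small Python sketch that builds the standard `brctl` and `ip` commands for creating a bridge and adding ports to it. The bridge name `br10` and the port names `swp1` and `swp2` are assumptions for illustration, and the sketch only builds and prints the commands rather than running them.

```python
def bridge_commands(bridge, ports):
    """Return the commands that create a Linux bridge and enslave the given ports."""
    commands = [["brctl", "addbr", bridge]]
    for port in ports:
        commands.append(["brctl", "addif", bridge, port])
    # Bring the bridge interface up once its members are attached.
    commands.append(["ip", "link", "set", "dev", bridge, "up"])
    return commands


for command in bridge_commands("br10", ["swp1", "swp2"]):
    print(" ".join(command))
```

On a real system each command list could be passed to `subprocess.run(command, check=True)` as root; the same function would work on an ordinary Linux server, which is the uniformity the talk is arguing for.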
I think that is it. Have you got any questions?
MARTIN WINTER: OK. Thank you.
AUDIENCE SPEAKER: One thing is to look at the whole thing, the ideas, lots of these OpenSource tools which he brought up, and the whole movement which ?? when I talk to people, of moving from the classical way of how a network is run and how it is designed to the idea of using many more OpenSource tools.
AUDIENCE SPEAKER: Stefan from AFNIC. Indeed, the problem of hardware acceleration is very often mentioned by people who are sceptical about Unix-based solutions. You didn't mention another possible approach, the one described by the ForCES Working Group at the IETF, which is a standard interface between the network element and the control element of a switch, allowing you to have fast closed for ?? and a control element built with Unix software. Do you know ForCES, and do you think it has a future? All of the RFCs are now published, so there are now implementations, but as far as I know only in software. Do you have any opinion about it?
NAT MORRIS: Because people want to reduce their switching costs in these huge data centres, we have been looking at hardware that is currently available, typically boxes that normally come with the Broadcom reference OS, stripping that off, putting on our own boot loader and then accelerating it. People want to greatly reduce those costs, so we have been using existing hardware rather than looking at the project that you mentioned. But lots of people are going about it in different ways. There are some people that don't present the ?? the interfaces into Linux, but it's still really early on in this Linux-on-a-switch world.
AUDIENCE SPEAKER: OTESA here in Greece. Quite a lot of stuff, most of the stuff we will take it outside, I guess. The most interesting thing for this is the hardware acceleration part. I didn't quite understand, does your company produce that hardware module in the Linux kernel, yes?
NAT MORRIS: Yes.
AUDIENCE SPEAKER: What is exactly your business model?
NAT MORRIS: So, without being too ?? our business model is that customers can go to the ODMs themselves in Taiwan and negotiate a good rate, or go through a reseller. We produce ?? with the kernel module, and that is what we are licensing. So you can use all your OpenSource tools on there, but the element that does the hardware acceleration, that is what our business is.
AUDIENCE SPEAKER: And is the GPL an obstacle in this business?
NAT MORRIS: No,
AUDIENCE SPEAKER: Do you release all the sources in the Linux kernel?
NAT MORRIS: All the changes that we have made have been released and if you look at Quagga mailing list all the changes we have contributed to that, all the patches, if you look in August there is maybe like ?? patches we have contributed back and on the Linux kernel mailing list as well, all the improvements like IF tool they have all been contributed back as well.
AUDIENCE SPEAKER: That is quite interesting. So if one vendor, if you develop the module for one vendor for one card then it's free for everyone? Is that correct?
NAT MORRIS: Well, not free but the way we use the SDK so the platform we go for the moment is Broadcom boxes, and that is ??
AUDIENCE SPEAKER: Is not one of the friendliest vendors from what I know.
NAT MORRIS: That is why we have got this abstraction, so that in the future, when there are other vendors producing these low-cost data centre switches that we want to see people running Linux on as well, the switch driver would talk through a different SDK to do the acceleration. For the user it looks like Debian; you can run Quagga and Intel box ??
AUDIENCE SPEAKER: The layer is the magic.
NAT MORRIS: Yes.
MIKAEL ABRAHAMSSON: From Deutsche Telekom. I think this is really interesting, and you answered part of the question I had written down about distribution: you are piggybacking off Debian to make sure you have a complete packaged distribution continuously?
NAT MORRIS: Yes.
MIKAEL ABRAHAMSSON: Really great. Is there a problem with the hardware requirements for Debian when it comes to ?? these devices tend to be fairly low end when it comes to CPU and memory, but I guess this has improved lately.
NAT MORRIS: What we are typically seeing is that these boxes are not very far off the Broadcom reference design, so if you bought switches from all these manufacturers and opened them up they would look pretty similar, with different I2C to manage flash and fans and things. At the moment they are PowerPC, dual core, 1.2 or .6 gigahertz, and between that control plane and the data plane there is about 500 megabits of bandwidth possible. So you have got that dual ?? to play about with Debian, and that is ample for running Quagga or BIRD on there.
AUDIENCE SPEAKER: So we have somewhat the same problem in the PC market, I see this as going the same way it went in the '80s, this is the same thing.
NAT MORRIS: Yes, in a more marketing presentation we have got that kind of slide showing the server world to the networking world.
AUDIENCE SPEAKER: There is the problem with Nvidia and AMD releasing binary blobs and so on, which in some cases is good enough. Do you see the Broadcoms and so on that manufacture the chipsets for these going the same way as Broadcom might be doing in the PC market with their network cards? Because there they seem to be releasing a lot more information for OpenSource drivers than they do for their home gateways, where they are not even pushing the changes upstream to the mainstream Linux kernel. Any thoughts?
NAT MORRIS: At the moment the elements of the SDK are closed source, but hopefully when other vendors as well as ourselves are breaking into this market that is going to change. And hopefully with things like the Open Compute Project there should be a good emphasis behind this to try and get these vendors to open up a lot more.
AUDIENCE SPEAKER: Thank you.
MARTIN WINTER: OK. Thank you.
(Applause)
Our next speaker is Scott Wilkerson. I am very happy to get him over; when we talk about OpenSource companies, this was always one of the names which came up, and I am very happy to get two people from Nagios who are actually attending RIPE for the first time. I hope it's not the last time.
SCOTT WILKERSON: Thank you, Martin. It's really a pleasure to be invited to attend the first version of the OpenSource Working Group. In traditional presentation style, I am going to tell you what we are going to talk about and then we will talk about it. Throughout the presentation I will probably ask for some feedback from the audience, as I like to gauge how things are going and the direction that we take.
So I am going to tell you a bit about myself, we will talk about what Nagios is exactly, a little bit of the history, and the effects that commercial versions of Nagios will have on the system going forward. Then I have some Nagios announcements, some latest news, and then we will cover the Nagios ecosystem. What I mean by that is that the Nagios project is just one portion of managing the whole thing; there are many different add-ons and plug-ins that have been created for Nagios, and it is challenging managing and maintaining all that. We will talk about some of the challenges that a project of this size and magnitude has, and then finally we will touch on supporting Nagios, and then I will open it up for Q & A.
Before I get started, by a show of hands in the room, how many people are familiar with Nagios and think that maybe their organisation already uses Nagios? That is incredible, that is pretty much everybody. And I wouldn't even have believed that most of these people are in the particular department that would use it.
So that helps.
So with the vast majority of people already using Nagios, that actually helps my presentation and helps me understand the knowledge level of our product.
So, a bit about myself: I am the IT manager at Nagios Enterprises. I was lucky enough to be able to join the Nagios team in 2011, just two years after we launched our first commercial product. I have nearly 20 years in the IT industry; I spent most of it in senior management, and I owned an ISP back in the day when 144 and ?? I do some development as well, not so much on the Nagios Core project but a little bit on the commercial projects, and I manage all of the information coming in, both from contributors to the project and from our internal teams, technical support technicians, developers and staff.
So, what is Nagios? Most people believe that Nagios is an infrastructure monitoring product, and that is correct, but for a long time the term Nagios was primarily synonymous with Nagios Core, meaning the core engine that is utilised to monitor different systems; they can be IT infrastructure, hardware, launch sequences for space missions, etc. But Nagios has actually moved beyond Nagios Core; it is now a full-on brand, company and trademark, and we cover the IT-related industry as a whole. There are a lot of different projects that we work on. Nagios Core is one of those projects, but in-house we also contribute to some add-on projects that run in conjunction with Nagios Core, like NDOUtils, which is a back-end database for Core, and several different agents that can be deployed on systems, such as NRPE, and we have a newly released agent, NCPA, which is the Nagios Cross-Platform Agent. And we have our commercial products: that consists of the commercial monitoring engine, Nagios XI, and Nagios Fusion, which gives you the ability to take multiple Nagios installations, either Core or XI, and combine them together. We also have a lightweight incident management product, and just released a week ago is Nagios Analyser.
So the different projects that we have hosted on SourceForge since 2001, when they were first moved there, have approximately 5.6 million downloads to date. We believe there are a little over a million installations of Nagios Core worldwide. Some of that is a guesstimate; it may be much higher than that because there are RPM installations, prepackaged virtual machines, etc., that aren't taken into account there.
A little bit of history: Nagios was originally developed by one man, Ethan Galstad, who is my boss, and he developed Nagios to meet the needs that he had in his organisation in 1999. It was released as ?? NetSaint was the name of the product, but because of trademark complications with another company, who believed they had rights to NetSaint, an executive decision was made at the time to change the name to Nagios in 2001. It's Nagios in the United States; in Europe, it's often Nagios or ?? We really don't care what you decide to call it; we are just happy that people are talking about it.
So the name was changed to Nagios, which is a recursive acronym that stands for Nagios Ain't Gonna Insist On Sainthood. Even though we had worked out the trademark complications with the company, it was decided at that time to just rename the project and move forward, making sure that we didn't have any issues in future. In 2007, Nagios actually formed a company to become a commercial organisation, and Nagios Enterprises was formed. This was an effort, after over a dozen years working on the project, to actually make a commercial entity, and in 2009 the first commercial product, Nagios XI, was released.
So, in speaking with some people at the conference over the last several days, it became apparent to me that there is some uneasiness in the community, since so many people use Nagios, as we saw by the show of hands, that because a commercial organisation has been formed around Nagios, perhaps the OpenSource Nagios Core is going to go away or disappear or something. So I'd like to first say that by no means will that happen any time in the foreseeable future. As a matter of fact, the commercial version of Nagios, I believe, will likely enhance Nagios Core. Now that a company has been formed, we have the ability to give additional exposure through marketing efforts, and add-ons that have been created for some of the commercial versions of Nagios can also be utilised within Nagios Core. In addition to that, now that there is some revenue coming in, we can have in-house developers actually develop more OpenSource projects that add on to and extend the capabilities of Nagios.
So, moving forward, I do have some announcements. The first one: Nagios Core 4, for those who aren't familiar, was released on the 20th of September, just prior to the Nagios World Conference. About six hours before I came on stage, Nagios Core 4.0.1 was released with just some wrap-up bug fixes. Core 4 was a long time in the making. It has vast improvements in performance, both in the underlying monitoring engine and the scheduling engine, as well as adding some functionality to be able to distribute the actual checks that are being performed out to various systems. It has a new query handler, which is an interface that gives developers the ability to communicate with the Nagios Core engine underneath the hood, and one of the key items that we are adding into Nagios 4, which hasn't been implemented yet, unless it went into the 4.0.1 release this morning, is a new back-end API; it will be a JSON API. The reason why we have added this into the project is, again, to give contributors and community members the ability to actually extend Nagios, building new interfaces, etc.
We also have a new agent released, and that is the Nagios Cross-Platform Agent. It was designed to be built from one source, so builds for many different operating systems, including Windows, Linux and various Debian operating systems, will all be compiled out of one package. It's Python-based but has some unique abilities to get the same data out of various different machines, and it can be extended. Also recently released is the analyser; this is a NetFlow analyser. It is a commercial tool but built on OpenSource components.
And finally, the last piece of news that I have is that, as a result of all of the information and feedback that we obtained during the Nagios World Conference, we have actually put out a search for next-generation Nagios stars. What this means is that we are basically looking for additional community members to step up to the plate and say, we really like this project and want to contribute to the project as a whole, and that could be in the form of documentation, development, etc.
So, to talk about the Nagios ecosystem: I use the term ecosystem because it's loosely defined as a community of interacting organisms, and that is really what Nagios is. Nagios by design doesn't do anything until it's extended, and it's because of that that we have a really large community of developers who have contributed projects to Nagios. Those come in the form of plug-ins that can perform monitoring and checks against various systems; without the plug-ins and the extensions that people and community members have written, Nagios really wouldn't have the ability to do very much.
So, each Nagios installation by design is really different because every organisation builds it to the needs that they have. They add the plug-ins for the checks that they need to perform against their systems.
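As an illustration of the plug-in model described above, here is a minimal sketch in Python of what a check plug-in can look like. It relies on the widely documented plug-in convention of exit codes 0 to 3 (OK, WARNING, CRITICAL, UNKNOWN) and a single output line with optional performance data after a `|`. The disk check itself, the thresholds and the function names are assumptions chosen for the example, not something shown in the talk.

```python
import shutil
import sys

# Conventional plug-in exit codes understood by the monitoring engine.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit_code, status_label) pair.

    The 80/90 thresholds are illustrative defaults, not Nagios defaults.
    """
    if percent_used >= crit:
        return CRITICAL, "CRITICAL"
    if percent_used >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/"):
    """Run the check and return (exit_code, output_line)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the check from running maps to UNKNOWN.
        return UNKNOWN, f"UNKNOWN - cannot inspect {path}: {exc}"
    percent = 100.0 * usage.used / usage.total
    code, label = evaluate(percent)
    # Text before '|' is for humans; text after it is performance data.
    return code, f"{label} - {path} is {percent:.1f}% full | used_pct={percent:.1f}%"


if __name__ == "__main__":
    exit_code, line = check_disk()
    print(line)
    sys.exit(exit_code)
```

The monitoring engine only needs the exit code and the printed line, which is why each site can add whatever checks it needs without touching the core.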
So, when changes are made to Nagios Core, it really can affect thousands of different plug-ins, both open and closed source. There are many plug-ins that have been contributed back to the community and we host an exchange of those plug-ins, it's exchange.nagios.org, but changes to the Core engine really can have vast impact, so we have to be very careful when accepting vast changes, which is kind of why Nagios Core 4 remained in Beta for well over a year before it was finally released.
We also have, as I mentioned, several different commercial products and built on the same philosophy, should be able to be extended just the same as Nagios 4 can be extended and with that, somebody who is a customer of commercial products chooses to extend the capabilities of Nagios they can choose whether or not they want to release that code back into the community so it can be built on.
So, some of the challenges that we see pretty regularly at Nagios, some things seem to come up over and over. One of the biggest challenges is to actually guard the code. We need to make sure that patches, bug fixes, etc., when they come in, that they are not - that they are not just fulfilling somebody else's agenda or the needs that they have within their organisation. Those types of changes should be made through add-ons or extensions or plug-ins, etc.; that is one thing. We have implemented some new procedures to revise the patch review, testing and acceptance policies to make sure that for everybody that is using Nagios Core it's going to continue to be a fast, robust monitoring engine, and the changes that are made to Core will not affect items that have been created to extend its capabilities.
Another challenge is always protecting the brand and trademark. As I mentioned, Nagios is more than Nagios Core now; Nagios is a brand, a trademark, and while it's really exciting for many people to create add-on projects, it is a challenge that we run into because there is occasionally brand collision, where a new add-on is created and they want it to be branded as a Nagios item directly or maybe they want to use the Nagios name, and in some instances that is OK, in others it actually can cause confusion.
The project itself begs to be extended. I mean, there would be nothing that Nagios could do if the - if a plug-in wasn't added to the system, and while that is great, it also means that often there are items that are contributed back to the community that occasionally just don't function. We host an exchange of those plug-ins and sometimes they don't function, and that is not really something that we can control; the source code is managed by the individual that is the developer for that product.
Also, possible changes to the Nagios Core engine, as I had mentioned, could conceivably affect thousands of add-on projects that have been developed to extend the capabilities of Nagios.
Finally, some other challenges that we see as an organisation are some misconceptions. One is that Nagios Enterprises will only support or is only interested in supporting paid clients, and that is simply wrong. We actually have a support forum at support.Nagios.com/forum and we invite all Nagios users to come there and exchange information and help each other, but we also have a staff of technical support specialists that are more than happy to help out both paid and unpaid clients. And the second misconception is that, somehow, now that an organisation exists, the free version of Nagios could disappear or we could start extorting abnormal fees for it, and that is not going to happen because we really rely on the community and embrace the community. As a matter of fact, we host a conference so people can share ideas within the community, and the bulk of the conference attendees are just Nagios Core users.
So, people often ask, how can I support the project? I have been using it for a long time. We really love getting feedback from the community. The support forum is probably the easiest way to do that. But new feature ideas or potential add-ons, projects you would like to see in OpenSource or commercially available products, those are all items that we love to hear as feedback, either today or at a later time, maybe via the support forum. And reporting bugs: I mean, the project itself, while it is very stable, there is a never-ending supply of bugs. The action of actually adding features into a project is generally accompanied by introducing bugs, but if you have a patch for a bug we would love to have you submit that.
We have moved the Nagios development mailing list to the support forum, to make it easier for everybody to be able to see what is going on; they don't have to receive all the e-mails, but it also makes it just a little friendlier. You can find results on Google that will take you directly to the mail archives.
Another way to contribute would be to go to our support forum and assist other users that may be experiencing a problem that you have resolved in the past, and that would be great.
I put up here financial support because, in the last talk we had - or the birds of a feather session that I had watched on-line in preparation for this - there was a talk about supporting projects. So I put send donations, I thought that would be kind of funny. We'd accept donations to build additional features. But maybe there is a better option if you cannot do some of the other items and still want to support the project. In our instance, although it may be different in other projects, maybe considering one of the commercial solutions that we have that are built on OpenSource is an additional way to support the project.
So, I'd like to open it up now maybe for some questions and answers. There are a lot of different items. As Martin had mentioned to me, this would be a good forum for us to be able to really get some feedback from the community. I would be happy to answer any technical questions as well as to receive input and feedback on future OpenSource products in the monitoring and IT space.
MARTIN WINTER: So we have the room open for questions.
It's great to see the Nagios folks here, it's like kind of most of the support is interesting so you mentioned to me earlier basically from like the most like Nagios actually you seem to be at least slightly profitable, which is good to know that some OpenSource are OK to stay around and they they are not like on the brink of extinction, but like many people using it and some ?? it's sometimes nice if it's reported.
SCOTT WILKERSON: We are slightly profitable, which is good. We are not a nonprofit organisation. We are a small organisation and have approximately 20 employees in-house as well as some additional contractors and then contributors from all across the globe really adding to the project. So, we do have some funds to be able to come to conferences, put on our own conference, so that is good. Not every project is so lucky, and it really wasn't always that way with Nagios, obviously, put in well over a decade of his life before any revenue came out of the project so it's not an easy road, but with a lot of hard work we are able to make a commercial entity that does OK.
MARTIN WINTER: Can you maybe highlight what is the difference between Nagios Core and Nagios XI, between free and open commercial version?
SCOTT WILKERSON: Sure. I am going to assume that the bulk of the attendees are using Nagios Core, being that Nagios Core has been around far, far longer. Nagios XI is actually built on Nagios Core, so you have the same foundation and fundamental engine. Nagios XI basically is a combination of many OpenSource projects. It contains Nagios Core and PE and NCPA, and then the primary differences: with Nagios XI you get to run easy configuration wizards, so setting up or adding new items to a Nagios configuration is just a step-by-step wizard. You run some basic steps. Configuration wizards can be added in to monitor various things. Some of the other advanced capabilities of Nagios XI: it has user and session based authentication and has hooks into the system, so individual users can manage their own notification settings, they can create their own customisable dashboards and views within the system, and it has an advanced reporting engine, including some capabilities for capacity planning, so you can take performance data that you have collected over time and project that out into the future, say six months, a year, two years, and see, if you continue along the same path, where you would be out in the future.
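The capacity-planning idea mentioned above, taking collected performance data and projecting it forward, can be sketched as follows. The talk does not say which method Nagios XI uses; this sketch assumes a plain least-squares straight-line fit, and the function names and sample figures are hypothetical.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(xs, ys, future_x):
    """Extrapolate the fitted trend to a future point in time."""
    slope, intercept = linear_fit(xs, ys)
    return slope * future_x + intercept


def time_until(xs, ys, limit):
    """Return the x value at which the trend reaches `limit`, or None if it never does."""
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


if __name__ == "__main__":
    months = [0, 1, 2, 3]             # hypothetical sample times, in months
    disk_used = [40.0, 42.0, 44.0, 46.0]  # hypothetical disk usage, in percent
    print(project(months, disk_used, 9))      # projected usage six months after the last sample
    print(time_until(months, disk_used, 100.0))  # month at which the disk would be full
```

A straight line is the simplest possible model; real data with seasonality or step changes would need something more careful.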
In addition to that, Nagios XI was designed to have an extended interface, it has the ability to take components that are basically just some PHP files that can add and extend the capabilities of Nagios XI, you can add reports in by basically writing a small PHP script. You can add additional configuration wizards, that takes any of the available plug-ins, there is 3,000 available just on the Nagios exchange and that doesn't include private plug-ins, you can take any of the 3,000 plug-ins, put a wrapper around it so while some of the people in this room may be the Nagios experts within their organisation, not everybody is, and it's nice to be able to give some other users in some other departments the ability to monitor some of the items without having to come to the Nagios expert to add in the monitoring configuration, etc.
MARTIN WINTER: One last question.
BEN GORDON: Ben Gordon from Sweden. I have a question about the other commercial vendors that use Nagios as their - the base; we have a Swedish company, OP5, and - do you feel that the commercial vendors also contribute back to the community with plug-ins and stuff, or do they just sort of take -
SCOTT WILKERSON: Just leech, no. Actually my experience is that the other commercial vendors absolutely contribute back to the community. We have a lot of different projects that have been started and use Nagios Core as a base for their own product, their commercial products as well as non-commercial products. The vast majority of them absolutely contribute back to the community, both in patches and development for the Nagios Core engine, as well as additional plug-ins, etc. So yeah, absolutely. It's not - the Nagios ecosystem really consists of tens of thousands of people that have developed items that really extend the capabilities of Nagios, so while the Core engine is the fundamental building block, there are plug-ins and add-ons, and that really comes from every - every end of the globe.
AUDIENCE SPEAKER: Thanks.
MARTIN WINTER: OK. I think that is it. Thank you very much.
(Applause)
As our next speaker we have Willem Toorop, and he is from NLnet Labs, talking a little bit about NSD, the Name Server Daemon, and about version 4, the new one.
WILLEM TOOROP: So, I am tempted to ask the same question as Scott Leibrand did, who uses NSD? A bit less than Nagios. But still about 25% or so, I think, or maybe a bit less.
So, it's almost released. We hoped it was - we had the NSD4 release this week, last Monday, but an issue came up and it's proposed - postponed to next Monday. So, NSD started out as a lean and mean authoritative-only name server and it still is.
Things were added during the years, and NSD3 came out in 2006 and was stable and continued being developed until 2010, when we started on NSD4. This is what NSD4 will bring with respect to NSD3. It will be much faster. The underlying database has been changed to a Radix based database. Also it will be more easily managed by a live backend; current NSD has a precompiled backend. It can handle more TCP connections, more than a thousand. It is now Libevent based and not select any more. Incremental transfers are handled much faster, relative to the size of the transfer instead of the zone. Most important is the live backend. NSD4 will be much more manageable. We have added the concept of patterns, which are sort of templates in which you can configure how certain zones should act as a secondary or primary, and then add and remove them without restarting the name server.
Also, configuration can be changed and reloaded in the same server without restart. It's more manageable in the sense that it doesn't fork away any more. The PID stays the same, and we have a remote provisioning and control programme which I will talk about later.
And it uses more memory, though that is open to debate. I will discuss that in another talk I will be giving at DNS Working Group.
The proven DNS protocol logic of NSD3 has not been touched, the one that was developed and improved over the four years or since 2006, that is almost seven years. The old database, the configuration file is backwards compatible, though not the other way around: once you use the NSD4 features you have to cut those pieces out if you want to go back to NSD3. The database will be converted to the new format, the Radix, on start up, but also if you want to go back to NSD3 then you have to convert it back to NSD3 format.
The NSDC control programme is replaced with a new programme, so it's no longer there, so probably some people will have to alter their start-up and stop scripts.
So, NSD control is now used to provision zones and do all the other things with NSD that were formerly done with the NSDC programme. It uses SSL. By default it connects to the server over the loop-back interface, but it can be configured to contact it over the network. It has all the features of X - a bit of authorisation is possible, in the sense that the server authenticates the control programme and vice versa. They both check if the certificate of the other end is signed by the NSD server certificate. So the NSD will have the certificate - will be the certificate authority.
So the control programme needs three things for that: its own certificate, its key file and the certificate of the server to authenticate the server, and the server only needs the - or I am sorry, the other way around, the server only needs its own key and certificate, and the control programme needs the certificate of the server and its own key and certificate. There is a programme that configures the certificates for use on your local host.
These are the other things that NSD control can do. It can write out zone files to disc, and you can force a transfer without checking if it has a new serial or not. So, to the configuration file we have added the concept of patterns, which are sort of templates to be used by zone files. Here, you have a pattern in which a secondary server for - two zone files that use the pattern. Patterns can also be nested. Here, there is a pattern which is a secondary at NLnet Labs and a secondary at CBI. So you add a zone that would be a master, and has the secondary at NLnet Labs and CBI.
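The nesting of patterns described above can be illustrated with a small Python sketch. It is not NSD's implementation or configuration syntax; it only models the idea that a pattern is a named template, that one pattern can pull in others, and that a zone gets the merged result. The pattern names, option keys and addresses below are hypothetical (the addresses come from the documentation ranges).

```python
# Hypothetical pattern table: two "secondary" templates and one pattern nesting both.
PATTERNS = {
    "secondary-at-site-a": {"notify": ["192.0.2.1"], "provide-xfr": ["192.0.2.1"]},
    "secondary-at-site-b": {"notify": ["198.51.100.1"], "provide-xfr": ["198.51.100.1"]},
    "master-with-both": {"include": ["secondary-at-site-a", "secondary-at-site-b"]},
}


def resolve(name, patterns, seen=()):
    """Flatten a pattern and everything it includes into one settings dict."""
    if name in seen:
        raise ValueError(f"pattern loop detected at {name!r}")
    merged = {}
    pattern = patterns[name]
    # Settings from included patterns come first, in the order they are listed.
    for parent in pattern.get("include", []):
        for key, values in resolve(parent, patterns, seen + (name,)).items():
            merged.setdefault(key, []).extend(values)
    # Then the pattern's own settings are appended.
    for key, values in pattern.items():
        if key != "include":
            merged.setdefault(key, []).extend(values)
    return merged


def zone_config(zone, pattern_name, patterns=PATTERNS):
    """Build the effective configuration for a zone that uses a pattern."""
    return {"zone": zone, **resolve(pattern_name, patterns)}


if __name__ == "__main__":
    print(zone_config("example.org", "master-with-both"))
```

Because the zone only names a pattern, adding or removing a zone is a one-line change, which is what makes provisioning without a restart practical.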
And you can use NSD control addzone to add those zones on the fly without restarting the Daemon. A list is maintained, in text format so it can also be edited manually, of which zones will be served by the name server and which pattern they use for configuration. And as you can see, the location of the zone file is processed: the top level domain is - the zone name; in this way it's very suitable for name servers with many, many zones, like thousands, hundreds of thousands or more.
NSD control allows you to fetch statistics of the name server remotely or locally, as you wish. These are the - and that is in the form of counters. We have a plug-in in the contribution section which utilises NSD control. It makes all sorts of graphs; I just put in some of the more interesting ones here.
In the second part of the DNS Working Group I will be comparing performance of different name servers and try to tell something about what makes the difference, why some name servers perform better in some environments than in other environments. But when configured correctly, NSD is now the fastest, I think, of the recent name servers. There is an issue going on here in Linux, you see, because when you try to utilise more than two cores, performance degrades. This was not the case on FreeBSD.
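Deriving the zone file location from the zone name, as mentioned above, can be sketched like this in Python. The placeholders used here (`%s` for the zone name, `%1` for its first character, `%z` for the top level domain) are assumptions modelled on how such templates are commonly written; the exact placeholder set supported by NSD should be checked in its configuration manual. The function name and directory layout are hypothetical.

```python
def expand_zonefile(template, zone_name):
    """Expand a zone file path template for one zone.

    Assumed placeholders: %s = zone name, %1 = first character, %z = top level domain.
    """
    name = zone_name.rstrip(".").lower()
    labels = name.split(".") if name else []
    replacements = {
        "%s": name or ".",
        "%1": name[:1] or ".",
        "%z": labels[-1] if labels else ".",
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template


if __name__ == "__main__":
    # Spreads many zones over per-TLD and per-letter directories.
    print(expand_zonefile("zones/%z/%1/%s.zone", "Example.com."))
```

Splitting files over directories by top level domain and first letter keeps any single directory small, which matters once a server carries hundreds of thousands of zones.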
So NSD4 uses more memory in that sense, that it has the back end database memory mapped so that doesn't mean that it has to be in Core memory, but it's swapped in and out and written to disc on the fly, so if you have 10% of that green part via set extra then NSD will perform OK.
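What "memory mapped" means in practice can be shown with a short Python sketch. This is a generic illustration of the operating-system facility, not NSD's database code; the file contents, record size and function names are made up for the example.

```python
import mmap
import os
import tempfile


def read_record(path, offset, length):
    """Read a slice of a file through a read-only memory mapping.

    Only the pages that are actually touched need to be resident in RAM;
    the operating system pages the rest in and out on demand.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]


def demo():
    """Create a throwaway 'database' file and read one record from it."""
    record = b"example.com.|"  # hypothetical 13-byte record
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(record * 1000)
        path = tmp.name
    try:
        # Skip the first record and read the name part of the second one.
        return read_record(path, len(record), len(record) - 1)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The mapped size counts towards the process's virtual memory, which is one reason a memory-mapped server can appear to use more memory than it needs resident at any moment.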
So, release candidate 2 is now available. I suggest people who want to try it out, do that and I suspect there will be no issues in the next release will be next Monday. And that is my presentation. So it's ten minutes.
MARTIN WINTER: We have a few minutes if any questions towards this presentation? As you mentioned before, there will be more about the name server in the DNS Working Group. At what time again?
WILLEM TOOROP: At 2:00. At the 2:00 session.
SHANE KERR: Just a quick question. When you choose Libevent did you look at other models for using different sockets, like if you go to the Libevent page? It's quite compelling, I haven't done any benchmarks or anything.
WILLEM TOOROP: I don't really know, maybe router disc, but using Libevent for router -
SHANE KERR: That makes sense then.
ONDREJ FILIP: I have a question regarding the performance tests or the graphs. Can you tell me which versions of the Daemons did you use for those graphs? Because, you know, this never ending for the fastest and best DNS server is really entertaining and it's always very important to state the versions of those Daemons.
WILLEM TOOROP: The version of knot. Knot is 1.2.0.
ONDREJ FILIP: That is Engine 1, yes.
WILLEM TOOROP: Your - I don't know - the current version.
The performance measurements have been done in June, so that is when the Beta release 5 was available and nothing has been done since on the performance part and so...
AUDIENCE SPEAKER: To answer Shane's question. While we -
ONDREJ FILIP: Say your name, please.
Jaap: We looked at others as well but Libevent was the most portable on all systems, so that was the main reason to choose that one.
MARTIN WINTER: OK. Thank you.
(Applause)
So the last remaining few minutes I want to talk a little bit more about the Working Group itself, and as you don't know - all know, we have a very new Working Group and discussions come up like what do you expect out of this Working Group, what do you want to see here. This time we got the Nagios folks in, something which was always mentioned as one of the key OpenSource products which everybody seems to be using. I was very happy to bring in Nat Morris to talk about things you may not be using but giving eye-opening views or ideas for where the future might be going, in that direction. So the question comes up, what do you expect out of the Working Group? First of all, if you are not on the mailing list, please subscribe there, start providing feedback, like specific things you want to see shown and presented the next time, like your favourite products out in the OpenSource community, somebody who has never been here; we should probably try reaching out to them and getting them here. For me personally, part of the idea of this is really to get the communication going between all these OpenSource products and you as the RIPE community, as the users, a little bit more and more, so we have a bit better understanding of what is going on, and you will have a better way to help out and give feedback there and preferably even help in some way to get - around the ones you are using and you think are useful. So, I want to open the floor a bit for ideas, questions or - things, like what do you expect out of the Working Group? Do you want us to bring up random speakers from other OpenSource projects, do you have other ideas what you want to see here?
MIKAEL ABRAHAMSSON: I think it's good to bring in speakers for areas of interest to operators so it's like the whole ecosystem that we have been talking about here before where you actually go to commodity hardware and you put OpenSource software on it so all these components are now developed, not as closed source with vendor where you buy everything as a package but install your decoming those two eco systems. This is what I would like to see, how do you package the software and develop components like routing Daemons and managing software and all that, and yeah basically development in all those areas. And as operators, everything needs to be carrier grade and needs to be maintained for a long period of time. How does this work together with fast moving development that usually OpenSource is where people get together for a while develop together and then leave the project. So how do you make these two roles work, I think that is good to talk about.
ONDREJ FILIP: Thank you very much for that.
CARSTEN: I would be interested in hearing about people reusing OpenSource, maybe in not so expected ways and maybe also they see problems in their day-to-day use with OpenSource, by that giving feedback to the projects what can be optimised.
ONDREJ FILIP: Is it more request for a panel discussion?
AUDIENCE SPEAKER: No, I mean for the community, maybe not only to have the projects present stuff but also having like a mix of people really using the stuff and what they experience and how they use it.
ONDREJ FILIP: Sort of case study of some company using it.
AUDIENCE SPEAKER: Exactly.
MARTIN WINTER: I think that is a good idea if someone will be very happy like at the next RIPE comes up and speaks about the experience on some of the OpenSource, things they are frustrated, things unexpected that went way better than ever expected, talk a bit how they are using and why, that probably would be very useful. So apart from it, what was mentioned before from the OpenSource tools and carrier grade, maybe today not some products that are like definitely carrier grade, I would say a lot of them maybe even better than what normal vendors sell. But there is sometimes other projects which may not be there yet but are sometimes very interesting ideas and me personally I see a lot of the things in there in OpenSource as new ideas because a lot of the things in the system, in ecosystem I see slowly changing over to OpenSource, more flexibility sometimes OpenSource allows you to do things completely different which may be that key advantage you can have, like other people or saves you a lot of money somewhere.
ONDREJ FILIP: Just curious, and to wake you up at the end, who of you sitting in the room is joined to the mailing list or follows the mailing list of OpenSource Working Group? And why the others are not?
AUDIENCE SPEAKER: Because I do not know about it.
ONDREJ FILIP: Please join it.
MARTIN WINTER: And speak up, feel free. The other thing, if there are some favourite pet projects you are using in the OpenSource community which you have never seen here: keep in mind that frequently some of these OpenSource projects are done by volunteers in their spare time, and sometimes it's very hard for us to convince them to come here, spend their own money to fly out to wherever RIPE is, spend money for hotel and conference fees. So feel free, if you have your pet project, to maybe consider sponsoring somebody to come out here and speak and talk to the community. And please support the OpenSource projects, the ones you are using, so they stay in business and you still have them next year.
OK, it seems we are done for the coffee break, the talks were shorter than expected. Thank you everyone for attending and see you next time at the next RIPE meeting. Thank you very much for joining us, thank you.
(Applause).