These are unedited transcripts and may contain errors.

DNS Working Group

Wednesday, 16th of October, 2013, at 2 p.m.:

PETER KOCH: So, ladies and gentlemen, welcome back to the second session of the DNS Working Group at RIPE 67. Would people be so kind as to slowly close the doors. Welcome back also to the remote participants; a show of hands from the remote participants would be great. Audio and video check. I will guide you through the second and final session of today. We are slightly reshuffling the agenda for some logistics: we will have Willem first, then Dave Knight, and after that Jeff Osborn, and then we will continue with the agenda as published on the website.

With that, I would like to hand the stage to Anand Buddhdev from the RIPE NCC for the DNS report.

ANAND BUDDHDEV: Good afternoon Athens, and everyone. I work at the RIPE NCC and I am here to do a short update on the RIPE NCC's DNS services.

So, we have had a few changes lately and I would like to reintroduce you to our team; you know, people move around and such, so we would like you all to know who we are and whom you regularly talk to. As you may have read, we have a new member of the senior management team, Kaveh, and we all report to him. This team, the GII team, which stands for Global Information Infrastructure, is led by Romeo, and in his team are myself and Colin, whom you will have seen walking around at this meeting; these are the six current members of the team. We have one more place to fill and we are looking for a candidate, and we hope to fill that place soon, so that there will be seven of us in this team.

So, this GII team doesn't only do DNS. We do of course manage the DNS services at the RIPE NCC: we run the K-root server and authoritative DNS servers for reverse DNS, and we provide secondary DNS services to ccTLDs and other organisations. We also run the ENUM zone and a small AS112 instance as well.

One of the other things we look after is the RIPE NCC's DNSMON service, this is a service where we send queries to top level domains and plot responses on graphs.

This team is also responsible for the RIPE NCC's Routing Information Service, which consists of a number of BGP route collectors. The data collected by these is fed back into our systems, goes into RIPEstat, and is used to show all kinds of interesting stuff such as BGPlay. We also provide support for the RIPE Atlas anchor project; in this team we don't actually do any development of the Atlas anchor software, that is done by our research and development team, but we provide the back-end services. So we run a whole bunch of servers, we have clusters where we do all kinds of post-processing of Atlas and RIS data, and this goes into the fancy graphs and that kind of stuff that we display.

So, one of the services we run is K-root. This service has been running stably for a long time. We have 17 instances in various places throughout the world; five of these are global and the other 12 are local, which means that they only announce the prefix with no-export to peers at an Internet Exchange.

At the moment, we are busy with major operating system upgrades. We run a slightly older version of CentOS, so we are trying to update, and we are also moving to a configuration management tool called Ansible; perhaps I can talk more about this in a future presentation. We are still busy with getting everything organised here, but it's interesting and we are doing some fun stuff with it.

We have some hardware replacements, we have to replace some of the routers and servers so we are talking to the K?root hosts about this.

For the time being, because of resource constraints and because we are so busy with lots of other things, we are not doing any expansion of K-root, not this year, and not in 2014 either. We want to focus on some of the other stuff we are busy with, especially improving our RIS architecture and the back ends. So, K-root expansion is planned for 2015 and you will see announcements from us about this next year.

The other DNS cluster we run is the authoritative DNS cluster. We have two instances of this; it's anycasted, with one site in Amsterdam and the other in London.

We are also making improvements to this system. We currently have a single provisioning master, and we are trying to address the resiliency issue there by introducing a multi-master set-up. This should also help prevent the kind of disaster we had last year, where a number of reverse zones were pushed out empty.

We are also actually planning on deploying a third instance of this Anycast cluster and we hope we can get it deployed by the end of this year.

One of the other services we provide is secondary DNS for ccTLDs. At the moment we support 71 ccTLDs, of which 26 are in the RIPE NCC service region. The others are in places like Africa, Asia and various small outlying islands. In total, we serve 240 zones, so this also includes second-level domains for some of these ccTLDs, and some of them have internationalised domain name equivalents of their ccTLDs, so we secondary those as well. This is a best-effort service, so we don't have any service level agreements with any of the operators; you know, we operate this, but we provide no guarantees at the moment. And this is one of the things that I would like to bring up with the RIPE NCC community. We would like all our services to be provided in an open and transparent fashion, and for this particular ccTLD secondary service we don't have very clear guidelines on who qualifies and who doesn't, whom we say yes to and whom we say no to. With all the developments going on in the DNS world, we would like to have a clear and transparent policy about this, so that we can say to our community and users in general, and to operators, that this is the RIPE NCC's policy. So, we are really asking the community for input and guidelines on what this policy should look like, and Peter will talk a little bit more about this later.

We also operate the ENUM zone, e164.arpa. As you may have heard from Carsten earlier this morning, there are 52 delegations in this zone, and there have been no changes since the last RIPE meeting. Six of these delegations are secure, and again, no changes, so there has been no movement.

I mentioned earlier a service called DNSMON, which we have been running, and we are busy with changes to this service. We are switching to the Atlas anchor and probe infrastructure for doing the measurements for DNSMON, and the user interface that goes with this is currently in development. We will be testing this interface towards the end of this year and we plan to release it in the first quarter of 2014.

At that point, the current DNSMON, which is based upon test traffic measurement probes, will be deprecated and we will no longer provide support for it.

So one of the bigger changes that will come with the new DNSMON is that the raw data format will be completely different. The Atlas anchors doing the measurements produce data in JSON; this raw data is available for anyone to download, and users of DNSMON should be prepared to adapt their scripts and code to deal with a different data format.
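For DNSMON users who will have to adapt, a minimal sketch of consuming such JSON raw data might look like the following. The field names used here ("prb_id", "result", "rt") follow the general RIPE Atlas style but are assumptions for illustration, not the documented DNSMON schema.

```python
import json

def response_times(raw_json):
    """Map probe ID -> list of response times (ms) from a JSON blob.
    Field names are illustrative assumptions, not the real schema."""
    times = {}
    for measurement in json.loads(raw_json):
        probe = measurement["prb_id"]
        rt = measurement.get("result", {}).get("rt")
        if rt is not None:  # a missing "rt" would mean no answer
            times.setdefault(probe, []).append(rt)
    return times

sample = json.dumps([
    {"prb_id": 1, "result": {"rt": 23.4}},
    {"prb_id": 1, "result": {"rt": 25.1}},
    {"prb_id": 2, "result": {}},          # timeout: no rt field
])
print(response_times(sample))  # {1: [23.4, 25.1]}
```

The point is simply that scripts written against the old DNSMON data files will need a JSON-parsing layer like this in front of them.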

And that is the end of my update. I am happy to take questions.

PETER KOCH: Apparently, no questions. Then we have to take up the thread on the ccTLD secondary service. The RIPE NCC would like to have feedback from the community; we have discussed this topic a couple of times over the course of the years. We have talked about this and tried to find a way forward so that you get some more information to make an informed decision or give informed input. The current suggestion is that we are looking for maybe two or three volunteers, whom we would like to identify themselves by e-mail to the Working Group Chairs by the end of next week. We will then compile a small group, the RIPE NCC will draft some text on the current state of affairs and make some suggestions, we cook and boil this a bit, come back to the Working Group with a suggestion, and then have a more informed discussion on that.

So, as I said, let's not have the volunteers right now and here but we will send that request to the mailing list which gives the remote participants a chance, stay tuned for that.

My humble co-chair is going to co-manage me, which is helpless.

JIM REID: I have just got a comment and a question of clarification. Would it be the intent that we actually produce a policy?

PETER KOCH: Thank you for asking that question. If you so wish. Currently it's not necessarily the case that we need to subject the question to the PDP, but if people would like the exercise, we can of course follow the community's wishes. Niall.

NIALL O'REILLY: UCD. Maybe it's just me, I am in bear-of-very-little-brain mode, but it's not clear to me what you are asking the volunteers to volunteer for.

PETER KOCH: Good point. The question that Anand, or the NCC, is posing is: should we continue this pro bono secondary service for ccTLDs, and if not, why not, or if yes, in what shape or form? If it is to be continued, there would have to be some eligibility criteria to give the NCC more guidance, which is basically what the NCC is asking for, and these criteria might need a review of the current state of affairs. So what the reviewers would do is look at the current data set: who is being served, and what might be a good description of how they became and remain eligible. If you are interested in that topic, just send e-mail to the Chairs; we will explain to you in more detail what we actually expect, but you will have subscribed by then. So it's an opt-in to an interesting task: if you are interested in this secondary service topic, please approach us and we will report back to the Working Group. With that, thanks, Anand.

ANAND BUDDHDEV: Thank you, everyone.

PETER KOCH: And next on stage is Willem Toorop. And he will give us some insight, conclusions and suggestions learned from a diversity of performance measurements.

WILLEM TOOROP: Yes. So, in June, we tested the upcoming NSD4 for performance against some other name servers. We compared performance with multiple threads to see if it scales up, on both Linux and FreeBSD, and we also looked at memory usage. We tested it against the name servers listed. We noticed that not every name server scaled up as expected when adding more threads and processes to match the available CPU core count, and the memory requirements are a more subtle story than just saying that this name server uses the least memory, or that you should use that one because it is fastest. So this presentation is about our findings from testing NSD4 performance.

To test the UDP query rate we used the DISTEL set-up: a player controls replayers, which send out queries to the server with tcpreplay; the replies are captured with tcpdump and afterwards everything is analysed. This is the hardware used. It's all donated by RIPE, thank you; it's seven-year-old hardware that they used to use, and now we can have it. We used a fake root zone that we had used in earlier performance tests, so we could compare with the tests that we had already done. Here are the results for FreeBSD, and everything looks good; it doesn't scale much after three cores, but some of the other name servers do. But on Linux, the picture is different. You see that only Knot performed better with more than two cores utilised. We were thinking: what is happening here? The difference between NSD and the other name servers is that NSD uses processes, and forks off processes to answer queries, while the other ones use threads; we also ran a build recompiled with threads for comparison.
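Before the results, a note on what the replayers actually send: raw DNS wire-format queries over UDP. A hand-rolled sketch of building one such packet (this is the general wire format, not the actual DISTEL tooling) could look like this:

```python
import struct

def build_query(qname, qtype=1, qid=0x1234):
    """Build a minimal DNS query packet in wire format, the kind of
    payload a replayer like tcpreplay sends over UDP.
    qtype=1 is an A query; qid is the transaction ID."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte
    question = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00"
    question += struct.pack(">HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question

pkt = build_query("www.example.nl")
```

Feeding thousands of such pre-built packets through a replayer, and counting the answers captured by tcpdump, is essentially what the DISTEL measurement does.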

We have a suggestion that it might have been the non-uniform memory architecture: the machine had two dual-core processors, and one processor might access certain parts of memory quicker than the other processor.

Also, it was noticed that during these performance tests, the IRQ daemon saturated our remaining cores; it had full CPU load. So it might also be something in the network driver, but it was a fairly old machine.

So what can we say with these results? For BIND and Knot on Linux, use whatever cores you have, or perhaps one core less; for NSD use two out of four cores. For FreeBSD, just use everything you have got. Here are the results; you see how the query rate grows on FreeBSD with more and more cores used, which is as expected, and for Linux you see it's bumpy; especially with Knot it's rather bumpy.

So, we also tested TCP performance using PowerDNS's dnstcpbench, which is why PowerDNS is also in these results. Here you see that everything degrades more or less after two cores, except for NSD3; all the rest do best with two cores or less. Most surprising was the FreeBSD performance.

TCP on FreeBSD currently works slightly differently than on Linux, in the sense that if a TCP server on FreeBSD can't handle the query load any more, it sends connection resets. On Linux it doesn't do that; it turns into a timeout.
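Part of what makes DNS-over-TCP heavier than UDP, and what a benchmark like dnstcpbench has to handle, is the framing: each DNS message on a TCP stream carries a two-byte length prefix (RFC 1035), and replies have to be reassembled from the stream. A rough stand-alone sketch of that framing, not taken from any of the tools discussed:

```python
import struct

def frame_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with the two-byte length field that
    DNS-over-TCP requires (RFC 1035, section 4.2.2)."""
    return struct.pack(">H", len(message)) + message

def deframe_tcp(stream: bytes):
    """Split a TCP byte stream back into individual DNS messages."""
    messages, offset = [], 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from(">H", stream, offset)
        messages.append(stream[offset + 2 : offset + 2 + length])
        offset += 2 + length
    return messages
```

On top of this framing come connection set-up and teardown, which is where the FreeBSD reset versus Linux timeout behaviour shows up under overload.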

So, what can we tell from this? That on Linux, the degradation is slight, so considering that you would mostly answer UDP queries, you should use the same recommended number of cores as with UDP and accept the slight performance decrease for TCP, we concluded.

The same holds for FreeBSD, except with Yadifa, but this must be a bug, we think.

Memory usage: the machine had eight gigabytes of core memory, and we loaded all name servers with the .nl zone of June 2013. It has 5.3 million delegations and is NSEC3 opt-out signed, which adds another one and a half million records, or names to be resolvable.

You see that there is a difference between the name servers that use zone compilers and the name servers that don't. BIND and Yadifa don't; they read the text file and build up their internal database from that. Knot and NSD use zone compilers: they first translate the zone into a database and write that database out, and the next time you start up the name server it will use the written-out database.

So, for memory, if you are going to compile the zones on the same machine your name server is running on, then you obviously need more memory than eight gigabytes for all the zone-compiling name servers, but it's also possible to do the compilation on a different machine.

Also, in NSD3, the database is read-only and all updates that you receive are written to a different file and then committed to the database in memory. In a cron job you regularly run nsd-patch, which merges those updates into the database and writes out a new database, and it needs another three gigabytes of memory in this case to do that, because it again needs all the memory that the database needs.

The green part of the memory is memory-mapped, and this means that it is convenient to have it in core, but it doesn't need to be in core memory; it can be swapped out to disk when needed. NSD4 keeps its back-end database in memory-mapped memory, which means that if you have small incremental updates to your zone, then it only needs to modify a little part of that memory, and the rest doesn't need to be in core memory. So, actually, NSD4 would be able to serve this .nl zone in memory comparable to NSD3, given that there won't be many large updates.
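The memory-mapping pattern can be illustrated with a small stand-alone sketch. This is the general mmap technique, not NSD4's actual database code: a small in-place write dirties only the touched pages, and the operating system is free to keep the rest of the file out of core.

```python
import mmap
import os
import tempfile

# A 1 MiB file standing in for a zone database back end.
path = os.path.join(tempfile.mkdtemp(), "zonedb")
with open(path, "wb") as f:
    f.write(b"\x00" * 1024 * 1024)

with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)   # map the whole file into memory
    mem[4096:4100] = b"SOA!"         # small incremental update in place
    mem.flush()                      # write the dirty pages back
    mem.close()

with open(path, "rb") as f:
    f.seek(4096)
    print(f.read(4))                 # prints b'SOA!'
```

Only the pages around offset 4096 were ever dirtied; the remaining megabyte never has to be resident, which is why small incremental zone updates are cheap in this model.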

Though if you have complete transfers then it would add another six gigabytes. But as I have told you it just works.

So these are the start and stop times we looked at. We used Knot 1.2.0, which doesn't have the new zone compiler yet, so it might be a bit slow; also, as we saw on the previous slide, the zone compiler went into swap space, which might have slowed it down a bit as well.

NSD4's zone compiler takes a little while, because it has to write the memory map to the back-end file. So, what suits your environment best? Do you regularly have large updated zone files, or new zone files that need to be compiled? Then you would be better off with a name server that does not compile. It is also notable that NSD3 and NSD4 stop quickly. NSD3 has the read-only database, and NSD4 applies all modifications directly, so I can only guess that the other name servers have some state that needs to be written out when they stop.

So, what are the overall observations:

FreeBSD is faster than Linux. Except for Yadifa and TCP of course.

FreeBSD CPU cores are more dedicated to your application; on Linux it depends on whether your server uses processes or threads. If it uses processes, use two cores less.

BIND and Yadifa use the least memory. NSD4's memory depends on the size of the updates, because of the memory-mapped back end. Also, if you have large new and updated zone files, then you do not want a zone compiler; otherwise the name servers with a zone compiler might be quickest.

The fact that they don't need to write out data or state to disk when stopping means that the NSD servers might be more resilient. All the name servers handle updates and new zones without a restart, except for NSD3, though it starts the quickest. Also, these performance measurements were done to measure NSD4; we had these observations and we thought it might be interesting to tell you about them, but to give rigorous advice you would need to test more different environments, of course, like different processor types or different network cards.

That is my presentation.

PETER KOCH: Thanks. We should have time for one or two questions.

AUDIENCE SPEAKER: Just a remark. We have dropped zone compilation as of 1.3, so this is inaccurate. And we sped up zone loading; the new version is really, really fast. So don't just take those measurements into account.

WILLEM TOOROP: Yes, they were performed in June, and so the presentation is with the June data. We had Knot 1.2.0. But, yeah, we will do that.

AUDIENCE SPEAKER: Liam Hines. For your performance tests, what type of queries were you sending to the name servers?

WILLEM TOOROP: All different types of queries: A queries, or AAAA and NS records, and they all existed; there were no non-existent names.

AUDIENCE SPEAKER: Did you try different combinations, like all As? All AAAAs? 50/50? Maybe some CNAMEs in there? TXTs?


AUDIENCE SPEAKER: Do you have those results?

WILLEM TOOROP: No, they are just all sent out randomly.


AUDIENCE SPEAKER: Raffle. To extend a bit on that question: were the queries EDNS or just plain DNS?

WILLEM TOOROP: It was plain DNS, so no EDNS.

AUDIENCE SPEAKER: Every record you asked basically existed in the zone?


AUDIENCE SPEAKER: There was no kind of NXDOMAIN, because that is when it gets more complicated.

WILLEM TOOROP: Yes, it would actually be better for us if we did ask for non-existent names, I think. But, yeah, we did not, no.

PETER KOCH: Thanks. That seems to be it with questions. Thanks, Willem.

And next on stage is David Knight, who is giving us some zoological insight, I guess.

DAVE KNIGHT: I work in the DNS operations group at ICANN. Over the past year, we have been developing a replacement for DSC called Hedgehog. Hedgehog is a drop-in replacement for the DSC presenter. We liked using DSC, and we used it with L-root for many years, but over the last couple of years L-root has grown a lot, and DSC just didn't cope with the new scale. Currently, we have about 350 name servers spread across 150 locations, and presenting that with DSC as it is just didn't work any more. Also, if we wanted to have multiple presenters, we had to copy all of this data around, and it was just becoming too hard to work with.

So about a year ago we asked Sinodun Internet Technologies to investigate a suitable replacement for us. Ultimately we want something more than just a straight replacement for DSC; we want to do more things to better understand the health of the L-root service, involving deeper inspection of queries and things like that. Looking at what was out there, there didn't seem to be a lot of active development going on in other tools, so we asked Sinodun to go ahead and develop a replacement for us.

So it uses the DSC collector, with one change: you would normally use rsync to copy data files back to the presenter; we now post to the web server directly, using mod_dav_fs. Every agent has a client certificate and uses certificates to authenticate the uploads. The collector pushes XML towards the presenter. The presenter now has a data importer on it which can take either the XML or the .dat files and inject them into a database, and that is pretty much a drop-in replacement for the existing DSC extractor script. So if you have an existing DSC set-up, you can bring up the database, modify the script to use the new importer, and start importing your DSC data into Hedgehog alongside. The web interface is written using R and HTML5. R is a statistics language, which will allow us to do more statistical analysis of the data in future; currently we are not exploiting any of those features, but that is on the road map. I was going to do a demo now, but my laptop makes the projector sad, so I will keep that to the end.
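A toy version of the importer step described above, turning collector XML into database rows, might look like this. Note that the element and attribute names used here ("dataset", "counter", "val", "count") are illustrative assumptions, not the exact DSC XML schema.

```python
import xml.etree.ElementTree as ET

def xml_to_rows(xml_text, server, node):
    """Flatten DSC-style XML into (server, node, dataset, key, count)
    rows ready for a database insert. Schema names are assumptions."""
    rows = []
    for dataset in ET.fromstring(xml_text).iter("dataset"):
        name = dataset.get("name")
        for counter in dataset.iter("counter"):
            rows.append((server, node, name,
                         counter.get("val"), int(counter.get("count"))))
    return rows

sample = """<dsc start_time="1381928400">
  <dataset name="qtype">
    <counter val="A" count="120"/>
    <counter val="AAAA" count="45"/>
  </dataset>
</dsc>"""
print(xml_to_rows(sample, "l-root", "ams01"))
```

Once the data is in relational rows like these, many presenters can sit in front of one database cluster, which is the scaling win over copying DSC's flat files around.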

The web interface has interactive plots and can also generate static plots to be used in reporting. It has a nice time picker, and you can also type in text to select time ranges to generate plots. The GUI is optimised for large data sets; we have over 300 sources of data for L-root, so we know it scales to at least 300 nodes, and we assume it will scale far beyond that. One of the nice things about this is that with DSC, if we wanted to have multiple presenters, we would have to copy all of the data around; with this set-up we can have a single database cluster and lots of presenters in front of it talking to that single database back end. And one other thing: there is automated caching of popular plots, so the daily plots will get generated once and cached, so they will load quickly.

On the road map for the next releases, we want to add support for RSSAC plots. We have a requirement with L-root to generate statistics related to root zone scaling, which we currently do with a bunch of Perl scripts looking at pcaps; we want to build this in here. When I show you the demo you will see that, because we have so many nodes to include, a plot is a bit unwieldy at the moment, so there is a plan to improve that. More caching of the popular plots will go in there. And we also want to have node maps, though I am not exactly sure yet how we will present the geographic map of nodes.

Additionally, right now when you look at a plot there is no information about what is contained in it, so, particularly for use in reporting, we want to see metadata about the plot: when it was created, a description of it, and a static URL to download it.

In version 2, we want to further enhance the node selection and tidy up the web page a bit. You will see this in the demo; it's one big page right now, so we are organising it so it's easier to see on screen at once. Also, right now we have a picker that lets you select a one-day or one-hour plot, but if you want to select a specific time period you have to type it in as text, so the idea is to add a picker to be able to do that graphically, to add an option to export raw data, and to start actually using the R stuff to do more statistical analysis of the data. And then beyond that, we have ideas to do more interesting displays of the data.

We also want to add the ability to drill down into pcaps, because we are capturing queries on the nodes and we want to make that visible through this interface. As you have probably gathered, this first version is the drop-in replacement for DSC; it hasn't been fully optimised for speed yet, and right now we are running this on one box, so I hope lots of people won't start looking at it while I am trying to do my demo, because it might go quite slowly. I missed a point: we also want a new way of uploading the data that is a bit more lightweight than doing it with XML.

Hedgehog is online for L-root right now. ICANN does intend to release this as free open source software, but we have not yet figured out the licensing details; we hope to do that soon. If you have any questions, you can direct them to Terry Manderson at ICANN. I am going to be leaving ICANN soon, so sending them to me will probably not work. So I will try this demo now.

PETER KOCH: While you switch your laptop, I see so many people running to the mic for the most pressing question: what's the name?

DAVE KNIGHT: The name is Hedgehog and it's Hedgehog because we liked hedgehogs.

PETER KOCH: Great, now the demo.

DAVE KNIGHT: The low resolution makes this look a bit worse than it does normally. This is the default page; right at the top is the basic time picker, so by default it will show a plot of nodes by ICANN region during the 24-hour period ending at midnight last night. There is an advanced time picker where you put in a specific period by typing in text. You will see it can generate static plots to be used in reporting, or the default is to do interactive plots, and the default plot is to show queries by node. What you see in the plot, these five lines, are the five ICANN regions, and as you scroll along, the counters for each region update and it tells you exactly the time that you are hovering over, and you can zoom in on this; unfortunately I can't see the right-hand side of this, but trust me, there is a right-hand side to it. So you can zoom in on the plots. At the bottom here is the node selector; this panel has all of the nodes, so it's quite unwieldy, there are about 350 individual name servers there. But we can break it into individual regions, and so that selects everything just in Europe, or we can deselect those and select individual nodes. So I can pick Amsterdam and regenerate the plot, and then get a plot just for our node in Amsterdam. And I actually picked two nodes, I guess; so that is Amsterdam and Barcelona, I think. If you are familiar with DSC, then the things that you can see here will be quite familiar; it's pretty much the same set of things as you would get with DSC, for instance we could look at what CHAOS queries the nodes got. And yes, like I said, this is online; it's probably easier to see if you go and investigate it for yourself, just if you could try not to all do that at once, that would be good. Thanks. Any questions?

PETER KOCH: Thanks for visualising the chaos, Dave. We should have time for one or two questions. Thanks, Dave.

So, while the AV now gets rescued from the resolution change, I would invite Jeff Osborn to the stage to say hello, basically.

JEFF OSBORN: Hello. That was too easy. Actually, I wanted to thank Peter and Jim for giving me a chance. I had promised this was going to be a public service announcement rather than a commercial, but there are a lot of shared stakeholders in what ISC does, and I just want to give a little bit of an update about what it is we are doing. I had a lot of input from my compatriots here, so I am going to read something to keep it simple.

My name is Jeff Osborn. I was made the CEO of ISC two weeks ago, on August 1st. I think most of you know that ISC basically is BIND, and a lot of BIND users have heard things coming out of ISC over the last year or so that have been disconcerting. We have heard your concerns and we are working to make some changes. We are trying to refocus on our original goals, which were the public benefit of the open source software BIND that so many of you run and that so much of the Internet works and runs on. We are continuing to maintain and develop BIND: this year we released 9.9.4 including RRL, and we are releasing 9.9.5 early next year, with an alpha release due out shortly. We are committed to improving its quality; we have an increased QA function, we are hiring now for some people in QA, and that is something we hope to get better at.

Another issue we had was pricing, and I wanted you to know that one of the first things we did when we switched leadership at the beginning of the month was to ensure advance security notifications are available to everyone at a low, reasonable fee, reduced by quite a bit for non-profits and education, and, for folks like the root operators who require this, at no fee at all. The previous subscription model was intended to be a funding model, and in practice it turned out to be wildly unrealistic at best and really alienated a lot of the people we would like to re-establish relationships with and continue to work with. So, we are correcting that and other mistakes, and I have got to say, for myself and the organisation, from having been a part of that failed process: we are genuinely sorry, we meant well.

There is currently a subscription version of BIND. This is something I believe is a mistake. I think it's up to us to support what should basically be an open source piece of software. Again, there was an attempt to figure out a way to enhance funding for the organisation, and I think we ended up losing our way. As you can appreciate, this is going to take a little time to separate out, but basically all of the advanced functionality of BIND, as we develop it in the future, will be part of the open source piece that is available.

The BIND 10 development continues although the recent development was challenging because of the way the funding for that product stopped. The pieces of that we are expecting to come out are sort of in a separate order, it's not a monolith, it's individual chunks. DHCP came out well, we are looking forward to having that available later this year and we are launching a demo at IETF next month, we hope you stop by and take a look.

The original motivations behind BIND 10, again, kind of got lost, and we are really looking for input from you, the user community, to figure out how we can best go forward. Open source is the heart and soul of our work, and I want to assure you that the open source continuation and improvement of BIND is the heart and soul of what we do and always will be. So, we mean well, thank you for your time, and we are looking forward to earning back your trust.

PETER KOCH: Thank you. And I think you are available for the rest of the week for face-to-face communication.


PETER KOCH: Thanks. With that, I'd like to call Florian Streibelt to the stage, and he will give us a presentation on some side effects of an EDNS0 option.

FLORIAN STREIBELT: I heard that a couple of people here know DNS very well; I am from academia, so there might be some bullshit included.

This talk is not about repeating all the arguments from the IETF mailing list about whether this is good or evil or whatever. I don't know if it is good or evil, but it seems to solve a problem for some people. And while working on my thesis I discovered some things that might be of interest for the community.

The situation today is that a lot of people are using public DNS resolvers: for Google, about 7 per cent, and there is a study from 2011 saying that almost 9 per cent of people are using public DNS resolvers, which is a fair amount of people.

The problem here is that with these resolvers gaining momentum, all the CDNs are having problems, because they can't locate the clients. This leads to mislocation of the clients in the assignment to CDN cluster servers, giving bad performance for these end users. You could take the publicly available /24 prefixes of Google and special-case them in a script on your resolver or something like that, but obviously this does not scale, for example to other public DNS resolver addresses.

There is a proposal from this community: they have an extension that adds the actual client IP address, or rather the client IP subnet, to the query to the authoritative name server. This means that the authoritative name server of the CDN gets to know the client subnet and can do its DNS tricks, and now it has the client subnet and not only that of the resolver. The assumption that was needed before, that the ISP resolver is near to the client, is not needed any more, because the server now has data on the client location.

And there is a study that shows that this actually gives a performance gain, so the problem that the CDNs have is being addressed with this extension.

How does it work? It is a standard EDNS 0 extension with an option code, and it is now officially IANA-assigned, so that problem is gone. Basically, I include the client IP or prefix, and the answer differs in one byte, a scope that tells me how I may cache the answer. And what is interesting here is that as a researcher, or an attacker, or a competitor of a CDN, I can put in whatever client IP address or client prefix I want, so I can emulate any client location using this extension, from just one single vantage point. This is pretty interesting.
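The mechanics described here can be sketched in a few lines. This is a minimal illustration of the ECS option's wire format (option code 8, as registered with IANA), not production code; `build_ecs_option` is a hypothetical helper and the prefix is a documentation address:

```python
import socket
import struct

ECS_OPTION_CODE = 8  # IANA-assigned EDNS option code for Client Subnet

def build_ecs_option(prefix: str, source_len: int, scope_len: int = 0) -> bytes:
    """Build the EDNS Client Subnet option body for an IPv4 prefix.

    In a query, scope_len is 0; the server echoes the option back with
    scope_len set to the prefix length its answer covers -- the single
    byte that differs between query and reply. Nothing stops the sender
    from putting an arbitrary prefix here, which is the whole point of
    the measurement described in the talk."""
    addr = socket.inet_aton(prefix)                 # 4-byte IPv4 address
    addr_bytes = addr[: (source_len + 7) // 8]      # keep only significant bytes
    body = struct.pack("!HBB", 1, source_len, scope_len) + addr_bytes  # family 1 = IPv4
    return struct.pack("!HH", ECS_OPTION_CODE, len(body)) + body

opt = build_ecs_option("192.0.2.0", 24)
# 4-byte option header + 2-byte family + 2 prefix-length bytes + 3 address bytes
```

A real query would carry this inside the OPT pseudo-record's RDATA; the sketch only shows the option itself.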

To enable all of this, your primary name server has to be ECS enabled, and some people are obviously still using load balancers in front, so those have to be enabled as well, and your name servers have to be manually white-listed by the public DNS providers to receive this extension, because they are, of course, first of all afraid your name server might break, and we actually noticed name servers behaving strangely when being queried this way. But we found that almost 13% of the host names in the list may already be ECS enabled. Why only roughly 13%, and only maybe? Well, if you look at this, you see the answer differs in just one byte, and if the name server just copies back the whole request, I see a 0 in the scope, which means the name server didn't use this extension. But we don't know if it's just a copy from a name server that is not aware of this extension, or if it's actually a server not using the extension to resolve the name. So it's hard to get real data here.

So, what we showed is that a single vantage point is sufficient to emulate any location. We used the RIPE RIS data to take all of the publicly announced prefixes and push all of them into DNS queries, and we did this for Google and YouTube, which is basically the same, MySqueezebox, EdgeCast, CacheFly, TorrentFreak and so on. I want to present a subset of our experiments and results, and there will be new measurements as well from the other side: from a CDN we get all of the traffic, so we can look at that side as well.

There will be a snapshot of what we can see there as well.

So what we actually did is, we resolved the names with ns1, which is an Anycast name server from Google, with all of the RIPE prefixes. And over a couple of months we saw a massive growth of A records returned from the servers, so we found more than 21,000 different IPs in more than 760 ASes worldwide hosting Google global caches. If you subtract the one AS that Google officially uses, you get all of the different ISPs, and if you are interested in more about this growth, there is a paper being published next week, I think, from another group, which has more on geolocation of these servers and so on.

But what we see is, we find all of the Google global caches in various ISPs, and these ISPs are by NDA not allowed to talk about that or advertise it, but we as researchers are. This could be interesting for other ISPs, for example to find out which of my competitors has a Google global cache, how they redirect their customers there, and when they are overloaded and redirecting to the original Google servers instead of into the AS of the respective ISP. Also, we see how a CDN is growing; for example, we were able to observe how the YouTube infrastructure was, we think, folded into the Google global cache, because the IP addresses fall together and we see a huge increase in these caches. Also, by comparing the results from four different vantage points, we see that it is actually being used: if we have one query executed in four locations in the same second and we get the same results differing only by two A records, we see that on these two occasions the caches may have been overloaded and redirected to the original servers. So there is a pretty huge amount of information we can try to infer from this. We do see that most of the time the clients are being served from their own ASes, so the Google caches do work.

We also see that the information in the ECS extension is actually being used, because the results do not depend on the vantage point, which was in Europe and the USA ?? the same results for different vantage points ?? which addresses one comment from Peter Koch at the IETF meeting, that we should look into that.

Now, the other thing that gets returned is the scope that allows for caching, and if you look at this distribution, the bubbles are the network prefix lengths from the announced routes. You can see that Google is giving more specific answers, while EdgeCast is less specific, so they are aggregating. And if you look at the right, you see that Google is giving a huge amount of /32s, for example; that means that a resolver resolving such a record is not allowed to cache it for different clients, so only one IPv4 client is allowed to get this answer. And yes, this makes your caches explode if it is used in the wrong fashion.

Now, if you look at this scatter plot, we see this more clearly: EdgeCast, the smaller CDN on the left, is giving less specific answers, that means aggregating, so as a caching resolver I am allowed to cache them for a huge number of client networks, while with Google the opposite is the case. So, when I query for a /24, the scope in the answer might go all the way up to a /32.
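The caching consequence of the scope field can be illustrated with the standard library. This is only a sketch; `cache_network` is a hypothetical helper name, not part of any real resolver:

```python
import ipaddress

def cache_network(client_ip: str, scope_len: int) -> ipaddress.IPv4Network:
    """The set of clients an ECS answer may be reused for: the client
    address truncated to the scope prefix length the server returned."""
    return ipaddress.ip_network(f"{client_ip}/{scope_len}", strict=False)

# An aggregated answer (scope /16) can serve a whole /16 of clients from
# one cache entry; a /32 scope pins the answer to exactly one address,
# which is what makes caches grow so quickly.
aggregated = cache_network("192.0.2.77", 16)   # covers 192.0.0.0/16
pinned = cache_network("192.0.2.77", 32)       # covers only 192.0.2.77
```

This is why a CDN returning many /32 scopes multiplies the number of cache entries a public resolver has to hold.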

The other experiment we did is looking at the CDN side. I got the data last week, so not that much has been done with it yet. If we map the client IP we see in the ECS request to the known location of the /24 of the back-end server, we can try to infer some data about the Google location database, and of course there will be some future work with that. We can see that Anycast, for example, is working. There is one plot, which you probably cannot really read, but on the right you see a plot of all the queries that the CDN received from the Google resolvers in Berlin; the subnet is known from the Google website. We see that a lot of requests are coming from Poland and from Germany, Russia, the Czech Republic and so on, so there are regional differences. These are absolute numbers, so they are not useful to compare, but we see that Anycast is working and we have a huge amount of clients from there, and if you compare this to Frankfurt, you will see a lot more of the Eastern European countries that are peering there, for example.

What is the conclusion:

Well, enabling ECS gives better performance for the clients, but it comes with a trade-off for DNS providers and CDNs. It reveals internal information: for the CDN, we can see how they are growing and where the data centres and caches are, and from Google we can infer parts of their geolocation database. Also, this enables not only researchers but also competitors to look at this data, and this may give them more information than you actually want. And there was a discussion that this whole extension testing is just a closed experiment. Well, it's not; it is actually a public experiment on the public Internet, and you have to be aware that if you do such a thing, other people can use this data as well, and it might not be as harmless as it looked in the first place. That is the title. And we think that future adopters and the current players in this game should be aware of this. Thank you.

PETER KOCH: Thanks. Questions? Come on. We are an hour away from lunch. You are awake now.

FLORIAN STREIBELT: Nobody listened or I didn't tell any bullshit.

PETER KOCH: You should have been a bit more provocative maybe.

FLORIAN STREIBELT: DNS is broken, it sucks.

PETER KOCH: Thank you. Next in line is Kostas, we now have a local contribution, really down to operational reality, I guess. Kostas, you have got 20 minutes.

KOSTAS ZORBADELOS: Hello. First a minor clarification: you have seen in the programme that I come from OTEnet, which was a subsidiary company doing the ISP business and was absorbed by the parent company, so we are now OTE.

Well, in contrast to various others, we try to keep our users using our resolvers, our infrastructure. We don't know if we succeed, of course, but our goal is to give the user the best possible experience with our infrastructure. So, having said all that, a few words about OTE. These are not marketing slides, I hope. We are the largest ISP in Greece, we are the local incumbent. We come from the old Greek telecommunications organisation, which was a monopoly, so we own pretty much all of the infrastructure around Greece. We have presence with nodes in our network all over Greece, with multi-10-gig exit points to the global Internet via another subsidiary company called OTEGlobe, and we also interconnect with other Greek ISPs at the local Internet Exchange, GR-IX.

So what do we provide as services? Pretty much everything in terms of IP. We have retail customers, broadband, we have corporate, leased lines, VPN services, and also all the traditional local ISP services like web hosting, domain hosting and mail services, and of course IPTV, and recently VoIP services, which actually tend to replace the entire PSTN world; we will see how that goes, of course.

Our DNS services: we have both authoritative and resolver infrastructures. We provide authoritative DNS service for both our own company domains and customer domains; we have around 35,000 ?? 37,000 was more like the exact number the last time I checked. We are a domain registrar for .gr. And we provide DNS resolving, but we are not providing an open resolving service; we just provide the resolving service for our IP networks and the customers that we have behind our AS.

We have split the DNS resolving into infrastructure and clients, so end users do not use the same infrastructure as our services, our servers.

So, the rest of the presentation is about the resolving services and their redesign, which is a recent development. The old set-up involved farms of machines behind load balancers in two physical data centres. All clients were served two resolver IPs in the nodes, the terminating devices, and were directed first to the primary and then to the secondary location. With this, of course, the primary location might look like a single point of failure, and it actually was; we had some extreme cases of failure, but our design had a lot of redundancy in various layers. We used carrier-grade switches with load balancing modules, and each server had presence on both switches. As you might imagine from the set-up, this is not the most straightforward or clean network design. As I mentioned, we had our outages, but only in very extreme and rare cases. So, the understanding was that those services worked fine for quite a few years; we had this set-up for, I should say, a decade or so, and as you can imagine, if something works there is a bit of reluctance to change it, so there was scepticism even inside the company about why we should redesign the service.

Well, one strong motivation was that the load balancer components from the vendors were end of life, and they would also be end of support in the following year, so at least we had to do something about that.

So, having heard of all the Anycast deployments, mostly the ones the root name servers have, we went with it: first of all because we then have no reliance on a single data centre location ?? DNS is critical ?? and it also has all the characteristics of a service that can be Anycasted, being connectionless or having short-lived TCP connections. And to be honest, with this set-up we ended up simplifying our network design: we don't need a load balancer component and we leave load balancing to the routing layer.

Well, one distinguishing feature of this presentation is that we talk about Anycasting a DNS service within one Autonomous System; we therefore do not use BGP for it, so I am describing a set-up used internally.

So, how did we proceed with that?

Very cautiously, I must say. The initial target was the infrastructure resolvers, to try to minimise the impact on end users. All our tests and evaluations were conducted in VM environments, and the initial thought was to give each DNS node two network interfaces, to separate the management and measurement traffic from the resolving traffic. But this turned out to be much more complex with little benefit, so we abandoned that. We just used plain old /30 point-to-point links on each node. We chose easy-to-remember IPs for the Anycast services, one for the infrastructure and one for the end users. And our first testers were ourselves: we tested for quite a few months on the engineering teams' workstations and then we proceeded with the production infrastructure. It went quite OK, I must say; we didn't notice anything unusual.

So, having settled on all those software choices, the main design choice in an Anycast set-up is where the nodes should be in the network. There are also quite a few practical restrictions: OTE has a lot of infrastructure, buildings and legacy equipment scattered all over Greece, but not all these buildings have the necessary capacity and infrastructure to host servers. So, as the optimal set-up, one might say that we could deploy Anycast nodes as close to the end user as possible, but that would be a lot of locations, and not all of them have the infrastructure, even electricity and basic stuff, to host server equipment.

So given all these restrictions, we tried to figure out an optimal set-up for the load distribution of our service. In order to do that, we have to say a few words about the network topology. We will consider the typical three-layer approach: Core, distribution, and access. Our Core has points of presence in five or six major cities ?? Athens, Patra, Tripoli ?? with multi-10G interconnections among them, and we have the distribution layer that performs layer 3 aggregation of the end routers that terminate the users; this layer has presence in many other locations, around 40 POPs in different cities.

The access layer is even more distributed, so the access layer was not a real candidate for deploying Anycast nodes. Our equipment is mostly Cisco, but we also have Juniper boxes in the access layer.

So, the role of our Core, as in most cores, of course, is to just forward packets quickly, nothing else. We tried to evaluate a scenario where we could put Anycast nodes inside our Core, so we posed the question to the vendor, but it was not recommended even by them. So, this is what our Core looks like, with its interconnections.

So we went on with the distribution layer. It has high availability in the distribution nodes, and we have a major project under way to replace our legacy BRAS terminating equipment. The actual difference is that in the legacy set-up, which of course will remain for quite a few years, we have users terminating their PPP sessions on BRAS nodes, which have uplinks to distribution points, and from there we have the interconnection with our Core. In the BNG set-up the sessions will be locally aggregated in the BNG node, so there will be no need for the distribution layer; the distribution and access layers kind of consolidate.

So, our choice was to place the Anycast nodes in a few selected locations of the distribution layer of the network, and one extra practical criterion for that was easy access for the operations teams, because we do not have operations teams scattered all over Greece, so one concern was to actually be able to have access to the relevant POP.

So our final set-up was a cluster of ten machines in three locations. Two locations are in downtown Athens, where we also have presence in our Core. We used two machines out of the ten-node cluster for the infrastructure resolving. We have enabled DNSSEC validation in the infrastructure resolving, but not yet for the end users. And we ended up using eight machines for the end users, where the legacy set-up had five nodes in each location.

So this is more or less how the set?up is currently in the three locations.

The software choices:

We use Linux CentOS on most of our server infrastructure. We use Quagga to just announce the /32 Anycast IP. Our layer of defence is mostly host based and we use iptables. And we use BIND 9.9 for our name server. So this is not a big departure from the legacy set-up in software choices; this was also to accommodate the operations teams.

A few more extra details:

BIND runs chrooted, we use a single /24 for the point-to-point links, and we tuned the OSPFD timers a bit and used fast hello support to provide minimal failover times; this works quite well. We also filter all the OSPF routes on each node; we just use the default route on each node. And we have iptables rule protection, of course, to provide the relevant services only to the relevant people; the administration of the servers and all the console access is via the central management LANs.
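The host-based protection described here might look roughly like the following; this is an illustrative sketch only, with placeholder addresses, and the actual OTE rule set is of course not shown in the talk:

```
# Placeholder addresses throughout.
# Serve DNS on the anycast service address to anyone who can route to it...
iptables -A INPUT -d 192.0.2.53 -p udp --dport 53 -j ACCEPT
iptables -A INPUT -d 192.0.2.53 -p tcp --dport 53 -j ACCEPT
# ...but allow SSH only from the central management LAN.
iptables -A INPUT -s 198.51.100.0/24 -p tcp --dport 22 -j ACCEPT
# Housekeeping: loopback and replies to our own traffic.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
```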

So, in terms of OSPF topology, this more or less describes the set-up in our distribution layer. We have two OSPF areas, one for the uplinks of the BRAS and the backbone area, so we placed each name server node in the local OSPF area, and each router has the point-to-point link.

This is the actual Quagga configuration providing the filtering of routes and also the minimal hello configuration.
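The slide itself is not in the transcript; a configuration along the lines described ?? announce only the /32 anycast address and use fast hellos for quick failure detection ?? might look something like this (all addresses, areas, and interface names are placeholders, not OTE's actual config):

```
! ospfd.conf sketch: fast hellos on the point-to-point link,
! redistribute only the anycast service /32.
interface eth0
 ip ospf dead-interval minimal hello-multiplier 4
!
router ospf
 ospf router-id 192.0.2.10
 network 192.0.2.8/30 area 0.0.0.1
 redistribute connected route-map ANYCAST-ONLY
!
ip prefix-list ANYCAST seq 5 permit 192.0.2.53/32
!
route-map ANYCAST-ONLY permit 10
 match ip address prefix-list ANYCAST
```

With this, stopping the OSPF process (or losing the link) withdraws the /32 and the IGP converges to the next-nearest node, which matches the few-seconds failover described later.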

This works quite well in practice: we have a failover in the order of five seconds, so whenever a link is dropped, or we simply stop the OSPF process, within five seconds the clients fail over to a different location.

The failover seems to work quite OK, we tested it quite a few times before deploying in production.

And how do we migrate the end users? This is actually something that is currently work in progress. Most of our BRAS nodes have been changed, however. All the users terminating their PPP sessions get the new settings, the Anycast IP, so when the PPP sessions drop and they reconnect, the users will transparently use the new infrastructure.

At the end of the project, the old resolver IPs will not be discarded ?? this is not something that we can do easily ?? so the same cluster will announce the old resolver IPs as well.

The migration process is quite smooth. In the upper graphs, which contain a legacy node, we can see the traffic gradually going down while the new nodes gradually take up the traffic. So it's quite OK.

How did we manage the load distribution with these choices? It went quite OK. The load was quite evenly distributed, so we can say that the choices we made were quite good given the restrictions we had. Of course, we have also overprovisioned a bit in each node. What are the immediate next steps in our set-up?

First of all, we will dual stack the service, of course. There has been IPv6 connectivity for quite some time in our networks; we have dual-stacked deployments, but the addressing was not quite there yet. My colleague Janice is giving a presentation about the addressing plan in the next room, so we have the addressing in place now. Nothing is actually stopping us, except for one minor detail: the chosen IGP for IPv6 in our case was IS-IS, we didn't go with OSPF, and as far as I know there is no IS-IS implementation in Quagga yet. We have heard the Quagga folks have something, but I don't know its maturity. However, we are willing to test; if anyone has an implementation, we are pretty happy to cooperate and test it.

So, the plan for now is just to go with static routes and use tracking operations in the routers for the IPv6 Anycast deployment.
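On Cisco IOS, the static-route-with-tracking approach mentioned here might be sketched as follows; this is a hypothetical illustration with placeholder addresses, and support for `track` on an `ipv6 route` depends on platform and IOS version:

```
! Probe the resolver node's real (unicast) point-to-point address...
ip sla 1
 icmp-echo 2001:db8:0:10::1
 frequency 5
ip sla schedule 1 life forever start-time now
track 1 ip sla 1 reachability
! ...and withdraw the anycast /128 when the probe fails,
! so traffic falls through to the route via another node.
ipv6 route 2001:db8:53::53/128 2001:db8:0:10::1 track 1
```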

As I said, the administration, monitoring, and alerting infrastructure has not changed since our legacy set-up. We have a few central management LANs where we provide all the necessary connectivity for the hosts. We have Cacti-based graphs, and all our DNS measurements come from BIND directly using the corresponding Cacti plug-in.

A few extra considerations: one of the major concerns for us, as for all Anycast operators, I think, is overwhelming specific nodes inside the cluster, and whether there is a form of attack that can saturate a single node or cause a cascading effect inside the infrastructure. For now, we have just overprovisioned and hope for the best. And of course, we are pretty new to operating an Anycast service ?? this is our first Anycast service deployment ?? so we are pretty happy to welcome any input from other operators.

As for future work, I am really glad I saw the presentation from Dave Knight earlier about the Hedgehog software. The current set of tools provides all the necessary stuff for a basic understanding of what is going on, but since we have had our fair share of amplification attacks, we would like to have better tools in place to provide anomaly detection and as near to realtime detection and analysis of DNS queries as possible.

So, a few ideas we have, and we will go forward with them: first of all, to enable NetFlow export from our Anycast nodes, and to store, collect and display this information in realtime. We would like to have software using contemporary visualisation libraries ?? there are very interesting JavaScript libraries out there ?? and that was, in my humble opinion, one of the problems with DSC: the presentation layer is quite outdated, I think. We would also like to have storage and analysis down to individual queries and not just aggregated data. I know there are people trying to produce such software. We will keep an eye on things and try to contribute in whatever way we can.

So, I am happy to take any questions?

AUDIENCE SPEAKER: First of all, thank you for this great presentation. I wish we had more ISPs giving presentations here; it was very interesting for me. I have just one little question. On one of the slides I noticed that you do DNSSEC validation for your infrastructure name servers, but not yet for your customers. Can you tell us a little bit about the considerations for that?

KOSTAS ZORBADELOS: Well, the major consideration is the performance impact, and to tell you the honest truth, I didn't have time to catch up with Mr. Huston's findings and read stuff about it, so we have to be very cautious on that. The major concern is the performance impact, nothing more than that. Or I don't know if there are also any considerations with current operators using DNSSEC validation and the brokenness out there. So we are just being cautious, nothing else.


KOSTAS ZORBADELOS: I cannot give an estimate, but it's definitely in our agenda and we will try to accommodate that. And having DNSSEC validation in the infrastructure resolvers is one step towards introducing it gradually to end users.

DAVE KNIGHT: You mentioned an issue choosing an IGP to use with v6. Have you thought about using BGP?

KOSTAS ZORBADELOS: No, to tell you the truth.

DAVE KNIGHT: This is what we used with R root, we switched to BGP and it works ??

KOSTAS ZORBADELOS: You mean as a ?? between the Anycast nodes and the network?


KOSTAS ZORBADELOS: Yes, we have thought about that.  ??

DAVE KNIGHT: We used to use OSPF and switched to BGP because everything was much easier to debug.

KOSTAS ZORBADELOS: I can understand it's quite easier but what happens in terms of failover, failover times I mean, in our case having tuned the OSPF timers as well we have a failover in the order of a few seconds. I don't know what will happen using BGP and what impact that will give, that is our major concern. I mean, all the resolvers out there, I don't know how they will react and we definitely like to keep the operating costs and the calls and the call centres to a minimum for that.

DAVE KNIGHT: I don't have any numbers for you but for us it works very well.

KOSTAS ZORBADELOS: OK. I will also keep an eye on Hedgehog, yes.

JIM REID: I want to go back to ask you questions about DNSSEC deployment, you said you are concerned about the impact. Do you have any numbers for that, have you actually instrumented what the performance impact is on your resolving servers or other aspects of your operations?

KOSTAS ZORBADELOS: Well, we haven't measured DNSSEC performance, but we have made benchmarks on our current nodes, and we can currently accommodate up to 30,000 queries per second, where we had a peak of 10K, more or less, in the legacy set-up. So the answer is no, we do not have any numbers on whether enabling DNSSEC validation will have an impact or not. I was hoping you could provide extra information on that.

JIM REID: I will give a sales pitch here. Something that the Working Group Chairs have been trying to do for a while is get a bunch of operators together to give us information such as: these were our experiences when we switched on DNSSEC in our network, what sort of things broke, what was the impact on the infrastructure, and so on. So hopefully ?? again fingers crossed, no promises or guarantees or commitments ?? we might be able to get that done for Warsaw. We have heard one or two bits of anecdotal evidence of resolvers that have switched this on saying they haven't noticed anything, which surprises me a little bit, so let's get some real data, and if you had any it would be great to share it.

KOSTAS ZORBADELOS: Yes, we are pretty happy to share whatever we can. With this set-up, after we have migrated all our users, we can try enabling it on selected nodes and not the entire infrastructure, so we can control it much better than in the previous set-up.

AUDIENCE SPEAKER: From cz.nic. Just a remark: we are already working on IS-IS for the BIRD routing daemon, so you can contact me afterwards.

KOSTAS ZORBADELOS: OK, we are really interested.

AUDIENCE SPEAKER: I am from Forthnet, an ISP here in Greece. When we turned on DNSSEC validation on our customer resolvers in December 2012, we saw no impact.

KOSTAS ZORBADELOS: So in theory you have DNSSEC enabled infrastructure.

AUDIENCE SPEAKER: Yes, for our customers.

KOSTAS ZORBADELOS: OK, that is really good to know.

AUDIENCE SPEAKER: I am from GRNET, and we have also turned on DNSSEC validation on our resolvers for our customers with no impact at all; around 98% of our customers get DNSSEC validation right now and it's going fine.

KOSTAS ZORBADELOS: Perhaps I will turn it on by tomorrow then.

PETER KOCH: Thank you. Any more confessions in overtime? Thank you so much. Thank you.

So, before we move on into the coffee break, a couple of short announcements. Jim Reid has surrendered his presentation slot, but for those of you interested in the presentation, it would have been a slight modification of what Jim presented at the OARC meeting in Phoenix the previous week. The video capture of that is available on OARC's website.

Second, everybody registered for the RIPE NCC AGM is kindly asked to grab their packages from the info hub during the coffee break, otherwise the door bouncers will not let you in.

Next: there is so much talk about DNS collisions these days, and we are doing something else: we have slot collisions, and we need your feedback about the arrangement of the Working Group slots like this. We also need all your feedback on the content that you see, that you would like to see, or that you would not like to see any more, so please don't be shy: approach one of us three Working Group Chairs with your feedback, send us e-mail or talk to us directly.

JIM REID: And just one other thing: The people who were at the back of the room who said that they had switched on DNSSEC validation, could you please come and see me at the end of this because I really would like you to come and talk about your experiences in Warsaw.

PETER KOCH: Absolutely. So, then finally, I'd like to thank the NCC staff ?? Marco, Rumi, and Denis ?? for scribing, monitoring, and videotaping, the AV staff behind the black wall, our fearless stenographer, and you all for coming. That is it. See you in Warsaw.