These are unedited transcripts and may contain errors.

DNS Working Group

Wednesday, 16th of October, 2013, at 11 a.m.:

JIM REID: Good morning, everybody. This is the DNS Working Group's first session for the RIPE meeting in Athens, I was going to say Amsterdam there. Welcome.

Before we get proceedings underway, I would just like to remind everybody that this session is being recorded and it's being broadcast out on the interwebs, so when you come to the microphones, could you please make a point of giving your name and affiliation so those who are following remotely can have an idea of who is saying what.

We have got a fairly packed agenda for the whole of the Working Group in Athens so let's get down to business.

First couple of things we need to take care of: we have a couple of last minute changes to the agenda, one or two speaker changes and an additional talk here from Jeff Osborn, who is the new chief executive of the ISC. Any other questions or comments on the agenda that was published late yesterday? We can accommodate them under any other business in the second slot after lunch.

There are currently no open action items for the Working Group so there is nothing to go through there at the moment. And we published the minutes of the last Working Group session fairly recently, and I'd like to ask if there are any comments on those and if we can consider them approved? Are we happy with those minutes? Silence implies consent, so I guess we are done. Thank you very much.

Before we get down to the actual business of the DNS Working Group, we have kind of inherited a situation since the ENUM Working Group has become dormant. We have come to an understanding that, from time to time, if its co-chairs have any items of business or relevant topics for discussion, we would make time for them in the DNS Working Group as and when; and if the ENUM Working Group did have a full agenda again, they would have their own slot in the meeting. So we do have something, because I understand there is a short announcement they wish to make, and it's going to be Carsten Schiefner. So over to Carsten.

CARSTEN SCHIEFNER: Thank you, Jim, and good morning, everybody. This is really going to be quick because, as Jim said, the ENUM Working Group is currently dormant and we intend to give you a little bit of a status update nonetheless during the DNS Working Group session instead. Stats from enumdata.org: there has not been any change since the last meeting in Dublin. The plus one in brackets is an error correction from the last statistics, so there was no real change; it was just that there was a slight misunderstanding of how to interpret the numbers and figures on that very website.

Compared to what is going on in the NREN area, there has been a change of plus one, so currently we have 32 zones delegated under nrenum.net, which we could possibly call the silver tree or something, maybe even the platinum tree. So there is quite some activity going on there; at least the number of delegations is higher than in the golden tree.

These are the countries and regions of the earth where ENUM delegations under nrenum.net are currently happening.

And well, the real breaking news from the ENUM and nrenum.net initiative is that by end of September there is a global NRENum document in place. This one got approved by the TERENA Executive Board, and TERENA as an organisation is essentially in charge of running the silver tree. Two formal bodies have been established: first of all, a governance committee that sees after the policy and the means to become a member of the NRENum initiative, and also an operations team that actually does the day-to-day operations for this very tree.

Currently in process, yeah, is a discussion about using virtual numbers for all sorts of services under the tree. But these discussions are still going on and there is no final agreement on what to do with virtual numbers: where should they come from, should it be an e.164 country code or a set-aside one, etc., so these discussions are ongoing.

The other thing is, and that is kind of interesting, the services are primarily for the research and education community, but not tightly restricted. So it essentially turns out that the NRENs are doing something that we possibly have seen 20 years ago, when e-mail back then was something that was still in the R and D community and people just thought, why shouldn't we have that at home as well. So I think it's kind of interesting to see the R and D community pick the idea of ENUM up and make something out of it.

Current activities include using tools like a WebRTC-to-SIP proxy service and video-conferencing tools, tying all these together and using ENUM as a means of addressing the services, well, across the R and D community.

And as I said, yeah, the use of ENUM or ENUM-like services is likely a default; whenever a new service is being considered to be made productional, the use of ENUM, the use of the nrenum.net tree, I wouldn't call that mandatory, but it's at least considered by default.

Which brings me to one of the last slides already: if you have any updates concerning the public user ENUM tree, please send us those updates; please review what kind of data is available under enumdata.org, and if you have an update, just let us know. Thanks again to the RIPE NCC, who host enumdata.org, and also to you guys, to the DNS Working Group, for this little agenda slot we are having here from time to time.

And that brings me to the question section, and I have just put up some default questions in case you don't have any.

JIM REID: No questions for Carsten? Thank you.


JIM REID: For the rest of the morning we have got a group of related talks on various issues around packet fragmentation, DNS amplification attacks, and the potential use of TCP and other mitigation techniques. So for the next four speakers, could I ask you to please restrict your questions to questions of clarification about the specific presentations; we will then have a wider general discussion about these topics involving all of the speakers afterwards. So please try to restrict questions to issues of clarification for each of the next four talks, and then we will have a wider discussion about those things at the end of the Working Group session this morning. With that in mind, next up is Willem, who is going to talk about issues around packet fragmentation.

WILLEM TOOROP: Yes. So this is about path MTU discovery for DNS. The MTU is typically 1,500; it's the maximum size of packet that your link accepts, and of course from source to destination there is a series of links, and the path MTU is the smallest link MTU on that path. Path MTU discovery is the process of discovering what the path MTU is.

We have been investigating issues with path MTU discovery, which are called path MTU black holes, for a bit more than a year now. We have done several student projects, and the last one, at the beginning of this year, was path MTU discovery for DNS.

So, with IPv4, fragmentation could be handled by the network: here we have the Internet, and there is a smaller link on the path, and the router bordering that link fragments a packet that doesn't fit into little pieces, and the other router reassembles it. But in IPv6, this is not possible any more. Fragmentation has moved to the end points, so to the client and to the name server. So how does the end point know to which size to fragment? It is told by a signalling packet; in the case of IPv6 it is called Packet Too Big, and in the Packet Too Big ICMP message is the size of the link MTU that the packet bumped into.
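As an aside on the mechanics: in IPv6 the sending end point has to do this splitting itself. A minimal sketch (an illustration only, not NLnet Labs' code) of how a UDP payload would be cut up for a given path MTU, remembering that each fragment repeats the 40-byte IPv6 header plus an 8-byte fragment header, and that all fragments but the last must carry a multiple of 8 bytes:

```python
def fragment_payloads(payload_len: int, path_mtu: int = 1280) -> list:
    """Return the payload sizes of the IPv6 fragments needed to carry
    `payload_len` bytes over a path with the given MTU. Each fragment
    spends 40 bytes on the IPv6 header and 8 on the fragment header,
    and every fragment except the last must be a multiple of 8 bytes."""
    per_frag = (path_mtu - 40 - 8) // 8 * 8
    sizes = []
    while payload_len > per_frag:
        sizes.append(per_frag)
        payload_len -= per_frag
    sizes.append(payload_len)
    return sizes

# A 2,000-byte DNS answer over the minimum path MTU of 1280:
print(fragment_payloads(2000, 1280))  # [1232, 768]
```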

But DNS servers have no state. They are built to answer questions quickly. They get a question, answer it and forget about it completely. So that is a bit of a problem. They do not do path MTU discovery by default; but at the operating system level it is remembered for some time, typically ten minutes, that that destination has a smaller path MTU, and the next time a client or requester returns, the answer will be fragmented to that size. If there is no IPv4 fallback, it will get an answer eventually, after five seconds or so.

So, this is not so nice, of course; we have the five second delay. So there has been a proposal by Mark Andrews, and he said, well, what if we just start fragmenting immediately: we don't wait for the operating system, we fragment to the minimum size and it always goes through. But then the somewhat bigger DNS messages will be fragmented too, and in earlier research we have seen that 10% of fragments are dropped in IPv6, so they never reach the requester.

And that is problematic also from a security perspective, it opens a window for cache poisoning, etc.

So, this is what a Packet Too Big message looks like, and it's different from the IPv4 version in that it now also contains as much of the original packet as possible in its payload, not exceeding the packet size of 1280, which is the minimum path MTU. So the answer of the name server is also in there, including the IP header and the query, the name of the query. So there is our state; why not utilise that and make sure an answer is returned immediately instead of after five seconds?
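The state he refers to can be pulled straight out of the ICMPv6 message. A minimal sketch of recovering it, assuming the embedded packet carries no IPv6 extension headers between the IPv6 and UDP headers (function and variable names are mine, not from the presented proof of concept):

```python
import struct

def parse_packet_too_big(icmp6: bytes):
    """Parse an ICMPv6 Packet Too Big message (RFC 4443, type 2).

    Returns the reported path MTU, the requester's address, and the
    start of the DNS message the server sent earlier, all recovered
    from the embedded copy of the invoking packet."""
    icmp_type, code, _checksum, mtu = struct.unpack("!BBHI", icmp6[:8])
    if icmp_type != 2:
        raise ValueError("not a Packet Too Big message")
    original = icmp6[8:]       # invoking packet, truncated to fit in 1280
    # Embedded layout: 40-byte IPv6 header, 8-byte UDP header, then the
    # DNS answer (whose question section names the original query).
    requester = original[24:40]    # destination address of our answer
    dns = original[48:]
    return mtu, requester, dns
```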

And in that way, increase DNS responsiveness for messages in the 1232 to 1452 byte range.

Though you have to be careful, because the Packet Too Big message is coming from the router; its source address is not that of the requester, it's the source address of the router. And if we assume that its content is valid, or is actually an answer that we gave before, then we assume, well, this requester has asked this question, and this may not be the case.

So, my first thought, our first thought, was: we take the payload, set the truncated bit and we resubmit the answer. That is not a very good idea, because it makes cache poisoning very, very easy. So we have to reinject the query, which is also in the answer, into the name server somehow. But then we also have the security issue of what the packet size was: the packet size that a client accepts is typically at the end of the request, and it's not in the answer; at least it would be at the end, and it's probably ripped off because it doesn't fit in the 1280. So you also have to be careful that, if you reinject it, you use a proper size, and that you do not use a size that would make it possible to return an answer greater than the request, because that would open a window for amplification attacks.

So, the maximum size allowed in an answer should be the actual payload size of the ICMP message.

In June, I redid the experiment of the students and measured what the situation was on RIPE Atlas with 863 probes. We forced a baseline measurement, forced a measurement that would result in fragmented answers, and then you could count the numbers returned to see how much didn't make it. So, more than 50% of IPv6 probes were on a smaller path MTU than 1,500; here you see the announced path MTUs. 7.7%, which is less than 10%, was behind fragment filters. Besides Packet Too Big, we also saw some other ICMP messages that also have the original packet in the payload, which is interesting, so we adapted our proof of concept programme to reinject those messages as well.

This is the result when used with the proof of concept programme: so this is reinjecting the query when a Packet Too Big message was received, or administratively prohibited, or one of the other ICMP messages.

Last week, I thought it would be nice to have some recent measurements as well, so I redid all the measurements and was able to use far more probes than before, 1059. I also made sure that all experiments were executed with the same probes and that the test was done more than 10 times at each probe, and I have somewhat different results; the results are getting better, it seems.

In January, we also had packet captures from SIDN and SURFnet: SIDN had a one-hour packet capture of one of their name servers, SURFnet four hours of all their name servers, to see how much of those answers could have been handled this way. That didn't really give a good representation of how they reply with fragmented answers, though it's interesting to see that for SURFnet, 1763 messages could perhaps have been sent out without fragmentation, extrapolating from the ICMP messages.

So, to do: I have started doing measurements against the probes' network resolvers as well, and that is actually more realistic than using the probe directly, but I wasn't able to put it into this presentation yet. And also, we have hired another student at NLnet Labs, but this one is a programmer, and he is going to make a framework that does this type of measurement for us on a structural basis, and also to check how the network evolves: whether indeed fragmentation dropping is decreasing, and whether there are more IPv6 nodes on a 1,500 path MTU.

This is my presentation.

JIM REID: Thank you, Willem. Are there any questions or clarifications on Willem's presentation? No. OK, thanks very much.

Next up is Geoff Huston.

GEOFF HUSTON: Good day. Most of this you will have seen before. You saw RFC 791; I did get the original text here, and this is what we said in RFC 791 so many billion years ago: in the original specification of IP, you had to accept 576 octets in a packet; it didn't matter if it was fragmented, when it came to you, you had to reassemble at least 576. Then came RFC 1123, and that says in UDP you can only assume 512, and you kind of go, well, what is going on here? Firstly, it says if the payload needs to be bigger than 512 you should use TCP, and for those of you who are obsessive compulsive about standards and normative text, that use of the word "should" in lower case is incredibly misleading. Is that a "should should" or a "nay should"? Some of you have interpreted this when you write your applications as a "should should". And others have said "nay should". And we will see what happens as a result. But the first thing is kind of, hang on a second, someone said 576 and 512, what is going on? Well, if you look at the original v4 spec you find 20 bytes of header, 8 bytes of UDP, but also the ability to carry up to 40 bytes of IP options, which is why 576 comes down to 512.

There you go.

What happened in the original DNS when your response was larger than 512? Well, the answer was, you couldn't do it. The original spec said if the reply is greater than 512, set that truncated bit, that one just there, set it and just give back some stuff. And the theory was that should, kind of should, trigger the client to requery using TCP, right? And then RFC 2671 came along and said, you know, here is a way of saying I am big, I can do this, I can handle more than, wow, 512: I am going to use an option to say you can handle responses up to size X, I can do the packet reassembly, just send it to me.
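The truncate-then-retry dance described here can be sketched as a toy stub resolver. This is an illustration only (hypothetical helper names, no EDNS0, no error handling), not how any production resolver is implemented:

```python
import socket
import struct

def build_query(qname: str, qtype: int = 1, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query: 12-byte header plus question section."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD set, QDCOUNT 1
    question = b"".join(
        bytes([len(label)]) + label.encode() for label in qname.split(".")
    ) + b"\x00" + struct.pack("!HH", qtype, 1)                # class IN
    return header + question

def tc_bit_set(response: bytes) -> bool:
    """The TC (truncated) bit is bit 9 of the flags word in the header."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)

def resolve(qname: str, server: str) -> bytes:
    """Query over UDP; on a truncated reply, retry the query over TCP."""
    query = build_query(qname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(5)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(512)
    if tc_bit_set(response):
        # TCP DNS prefixes each message with a two-byte length field.
        with socket.create_connection((server, 53), timeout=5) as tcp:
            tcp.sendall(struct.pack("!H", len(query)) + query)
            length = struct.unpack("!H", tcp.recv(2))[0]
            response = tcp.recv(length)
    return response
```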

Now, this gets a bit weird. We went and sampled a few million people out there, and we looked at a whole bunch of resolvers that you use and the EDNS0 size that you offer, and I must admit that is a weird distribution. That is a log scale, but when you sort of write it out there are some remarkably curious values, which all say to me that you should never let programmers get creative, because they are shit at it. Some of you read RFC 6891, which said 4096, and did that. And a whole bunch of you used 1480; where did you get that number? Because that is 1,500 octets minus 20, but there are another 8 octets of UDP, guys; 1480 makes no sense to me. And where did 1452 come from? Is this some kind of weird perversion of v6? And if you are doing v6, it's the payload, not the packet, so those 906 people offering me an EDNS0 of 1280 are demented. And the only other folk that make sense are the 62977 who said forget this EDNS0 stuff, I am going back to the original thing, it's 512 or nothing; good on them. I don't know why you pick those other random values. Don't.
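For reference on where those advertised sizes travel: EDNS0 reuses the CLASS field of an OPT pseudo-record in the additional section to carry the requester's UDP payload size. A sketch of its wire encoding per RFC 6891, assuming no EDNS options are attached:

```python
import struct

def opt_record(udp_payload_size: int = 4096) -> bytes:
    """Encode a bare EDNS0 OPT pseudo-RR (RFC 6891). The CLASS field,
    normally IN/CH/HS, instead advertises the sender's maximum UDP
    payload size."""
    return (b"\x00"                            # root domain name
            + struct.pack("!H", 41)            # TYPE = OPT
            + struct.pack("!H", udp_payload_size)  # CLASS = payload size
            + b"\x00\x00\x00\x00"              # extended RCODE, version, flags
            + struct.pack("!H", 0))            # RDLENGTH = 0, no options
```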

So, anyway, why is this an issue? Well, you will all understand that a small query can give you a massive answer. Querying isc.org gives you about a ten times gain; you combine that with DNSSEC and it just gets amazing, and you look at the number of open recursive resolvers on the net today, which is somewhere variously between 28 million and 33 million, that is million, and all of a sudden this stuff just goes ballistic. What we were able to do was use the DNS for bad rather than good, and use it amazingly for bad. So we have had endless discussions: what do we do? Get everyone to use BCP 38. How old is that? Five years? No, 15 years. You guys all have hearing defects, because after 15 years none of you have heard a damn thing; no one does it, there is just no evidence of anyone doing that kind of filtering on any scale. So source address spoofing just works, and BCP 38 is not anywhere, so that is not an answer.

What about a smaller EDNS maximum size? Instead of having these humungous UDP things flying around, let's push everyone down into TCP. So this whole thing about RRL with a slip value of one or two is almost the same kind of thing: it sort of says the way to get around source address spoofing is just to push folk into TCP, because there is a handshake and it's really hard to spoof that. What goes on there? So this is cool. But you get the secondary question: is using TCP a kind of "nay should" (I wonder how the stenographer is coping with "nay should") or a "should should"? So we thought, let's really, really try this out on the real world, and let's do an experiment by turning down the EDNS0 size. Through an online ad network we sent a whole bunch of unsuspecting clients those tests there: we give them a short name that fits in 512, a signed one that doesn't fit in 512, and we give them longer names. So we tested it for seven days and found 2 million unsuspecting victims all over the world, and we came up with this kind of answer: it's not that bad. Out of those 2 million-odd folk we found just 52,000 were unable to get to that thing, that URL; so whether it was a long unsigned or a long signed name, around 2.6% of folk just simply didn't get there.

And then we thought maybe we are testing two things at once. Without DNSSEC it's kind of a challenge to bring up a really long response. You can try a really long name, but once you get over 255 everyone just barfs. So the only way we could make a really long response was to whack a CNAME in. And we thought maybe we are just counting CNAME failures, and maybe this is CNAME, not EDNS0 and TCP. Then we looked in the BIND code, and after retching in horror for a few days, we found the places where we could change the EDNS0 size below 512 (hint: Mark Andrews hid this in many places, it's not obvious, and the guy just loves documentation... not). So we found the points in BIND where it had 512 and got them down to 275, so we could dispense with the CNAME construct, and if the response was over 275 you got the truncated bit. This is the second round, and it's much the same. We only ran it for about three days, tested a million folk, and the same 2.6% of folk failed.

So, that is the first kind of answer: a little more than one in 50 don't like using TCP to have their names resolved. And don't forget, I am not talking about resolvers; I am talking about clients, the end users. They have resolvers and forwarding resolvers, whatever; the end result of all of that stuff is that 2.6% of them can't resolve a name if you have to use TCP to do it. You all understand the DNS, you all know this shit, so I won't go through it, but there is something else I can talk about: the resolvers that ask me the query, those visible ones. And interestingly, over there, the failure rate is higher: 17% of those resolvers never asked me over TCP. And interestingly, 6.4% of clients use those resolvers. So 2.6% just died, the name didn't work, but another lot, almost double that, 3.8%, took a longer time, because their first resolver query didn't work and they had to use their secondary resolver to actually pull through an answer. So that is getting a bit worse.

What about timing and performance: does TCP make stuff go incredibly slow? The theory says if I add TCP to the mix I add two RTTs, because quite frankly there is a handshake, and most of the time I can get the question and response through one way or another, so let's assume that is just one RTT. So I look, because I can, at the packets, and look at the time to resolve a name, and I get a graph like that with one-second peaks, and that is all beautiful. But then I did a cumulative graph of that, and that is where things get interesting. The median point of the additional time when referring folk to TCP is 400 milliseconds. Interesting. So it's just under half a second. If you send TCP all over the world, roughly 400 milliseconds is the average. But around 600 milliseconds, folk start to barf. And between 20 and 30% of folk don't like it and start taking enormously long amounts of time to resolve that name. This stretches out to 2 seconds, and even at 2 seconds 15% of folk are still struggling, still asking questions, kind of going, what is going on here?

So, this is interesting. The 400 milliseconds: that is because I am in Dallas for this experiment, and 400 is about the average for the world; you know, that is the world, that is Dallas, that is fine. But the issue is that 70% of clients get that penalty of two round trip times, while the other 30% really get penalised: 10% of them are doing this sort of multi-query delay, and 20% are really taking an extraordinary amount of time. The other thing here that I should also point out, that I am still interested in: on the left, up the top, is a name that should resolve in a single UDP query: question, answer. And for 10% of the folk, that doesn't work. That question and answer takes multiple questions and multiple answers. So if you think the DNS is just fabulous and hunky and really good, you are wrong for at least 10% of the world. There is something basically broken, and 10% is not insignificant. That is a significant brokenness factor flying around in there. Anyway, the conclusion of all of this is: you could probably get away with using TCP. If you are willing to say 2.6% doesn't matter, you are OK. If one of those 2.6% is you, those people over there: tough, no DNS, sorry.

And maybe we could fix it, I am really not sure. But one modest suggestion, and I have heard this from various folk including Paul Vixie: the real answer is to use JSON over HTTP over port 80 over TCP, and that is the other option that maybe we should test. Thank you.


JIM REID: Thank you. Do we have any questions? I see Robert coming to the mike, I think.

ROBERT KISTELEKI: Private person. That should be port 443.
GEOFF HUSTON: The DNS is everybody's property.

ROBERT KISTELEKI: And is registered.

GEOFF HUSTON: You are so quick. I take it it's you.

SHANE KERR: Could you go back a slide or two, I am not sure, to one of your graphs? I don't understand the zero-seconds resolution.

GEOFF HUSTON: What I am looking at here is the time between the first query for a name, because all the names are unique, and the time of the last query for that name.


GEOFF HUSTON: A single query is 0 seconds, of course, and all of these are multiples, and it's just that some of that timing, even for the red, which is the simple name, is wrong; you know, if the world was perfect that wouldn't happen.

JIM REID: I have one question, sorry. When you are talking about the problem with TCP failover, or reverting to TCP, and then the queries not succeeding: have you been able to verify whether that is a problem with the DNS software, or with broken firewalls that are blocking TCP?

GEOFF HUSTON: Great question. We did a lot of work similarly when we analysed why using 6to4 was a remarkably stupid thing to do: between 10 and 20% of folk have a filter blocking protocol 41, but the block is incoming, not outgoing. And our suspicion is that if there are TCP port 53 blocks, it wouldn't be preventing the outgoing packet, it would be preventing the incoming; but that is brilliant, because I get to see the opening SYN, send back a SYN-ACK and never see the connection complete. I looked at what percentage of TCP connections use what I call naked SYNs, and the naked SYN rate in the DNS is remarkably low; whether that is good or bad, there are very few around, it's 0.3%, which is almost down into experimental error, so I can't see systematic blocking from resolvers that query authoritative name servers. For the ones at the end of the food chain, from those resolvers to authoritative name servers, the TCP blocking rate is remarkably small, and it's, as far as I can see, down in the zeros.

AUDIENCE SPEAKER: Jared Mauch. So, I have considered, as part of the Open Resolver Project, also doing a TCP 53 scan, which would take a little bit more time than a UDP scan. Is that something that people would find valuable in measuring this, to understand the number of DNS servers out there that actually do respond to TCP queries?

GEOFF HUSTON: That is a really hard question to answer from my point of view. You ask a resolver, who forwards, who talks to me, and I am forcing that last hop conversation into TCP; they may send the answer back on UDP. And when you think about reflection attacks, you kind of think: which resolver should I worry about if I am going to turn my authoritative name servers into rate limiting, slip one, or other forms? I think it doesn't matter what we are talking about: it's the end point that turns the query into a response that you have got to worry about. It's what I would call (I am taking a bit long here, but hopefully the answer is useful) that set of resolvers there you have got to worry about, because in any reflection attack it's those resolvers and the name server that are being perverted into evil, and you are finding 33 million folk inside this cloud here, and it doesn't matter whether they do TCP or not, as far as I think. Others might have different theories, but those were my thoughts on it. So I am not sure it would make much difference. But we can think about it some more and talk.

AUDIENCE SPEAKER: Yes, I am just wondering if it would be a valuable measurement to have.

GEOFF HUSTON: Bill says yes.

AUDIENCE SPEAKER: Bill says yes.

JIM REID: Who are you?

AUDIENCE SPEAKER: Bill Manning. The first time I looked at this was about 2009, and there was some question about what actually is happening under the covers. One thing that we looked at then, which I did not explicitly see in your experiments, was the impact of a query on one address family and a response on a different address family: a query comes in on v6 and a response comes back on v4 because of the dual stack nature. Have you looked at that?

GEOFF HUSTON: I am the authoritative name server, and the entire thing has been deliberately constructed in this experiment in v4, because I am trying to get at the TCP behaviour, and I thought if I was going to introduce multiple protocols, either the experiment or my brain would explode; I suspect the latter, and that would be ugly and not nice.

AUDIENCE SPEAKER: Well, I apologise for your brain.

GEOFF HUSTON: Someone has got to, I am glad it's you.

AUDIENCE SPEAKER: Running an authoritative name server, I have seen, until we actually hammered the code to respond to the request in the same address family, that the response would not necessarily match the request in address family; so we would see queries come in on one family and the response go out on a different family. And TCP and UDP for v6 are slightly different than for v4, so it would be interesting to see what that impact would be in a future experiment.

GEOFF HUSTON: Thank you.

JIM REID: Thank you very much.

Our next speaker is Ralf Weber from Nominum and some ideas about DNS protections.

RALF WEBER: So, DNS amplification attacks are not a new thing. We have had them for a couple of years, the ideas behind them are probably even older, and they resurface from time to time; they were actually the thing that was used in the largest ever DDoS attack that happened, so I can probably skip that for you. How it works: you need a spoofed IP address, and you get back a really large response for a small query like this. ANY is kind of one of the common queries used in these attacks, and it gives you back something like 2K or 3K for a question of something like 50, 60 bytes. And if you do the maths, a home connection with two or three machines can generate a gig of traffic, and you can do a lot of stuff with that. The largest attack was 300 gigs, so, well, it's easy for the attackers to actually just get these resources; a relatively small botnet can deliver that easily.
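The arithmetic behind those figures can be sketched as follows; the query and response sizes are the rough ones quoted in the talk, used purely for illustration:

```python
def amplification(query_bytes: int, response_bytes: int) -> float:
    """How many bytes of reflected traffic each spoofed query byte buys."""
    return response_bytes / query_bytes

def reflected_gbit(uplink_mbit: float, gain: float) -> float:
    """Attack volume one spoofing host can reflect, in Gbit/s."""
    return uplink_mbit * gain / 1000

# A ~60-byte ANY query drawing a ~3,000-byte answer is a 50x gain;
# a single 20 Mbit/s home uplink then reflects to about 1 Gbit/s.
gain = amplification(60, 3000)
print(gain, reflected_gbit(20, gain))  # 50.0 1.0
```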

Now, what we initially saw was that the primary targets were the authoritative servers, because these are kind of nice resources: Anycast, distributed, lots of power. So people used these initially for the attacks, but then the authoritative server operators said, well, if they are attacking us, we are going to do something about it, so rate limiting, blocking or what have you, and these resources became unusable. Then of course the attackers went for something else, and one of the things out there, as has been talked about, is the open resolvers; in fact, they are mostly, I think, not open resolvers but, I would say, open proxies. And these open proxies then actually ask the ISP's resolvers, and the attack that we currently see with lots of our customers looks like that: the attacker finds a home gateway that is broken, wrongly configured or has a bad default configuration, and sends that home gateway a query. The gateway says, I am a DNS proxy, so I answer everybody, and sends the DNS query with its own IP address to the ISP resolver; the resolver says, it's a legitimate client, I have to answer it, and sends back the packet, and the home gateway then kind of sends it on against the target.

So how did we figure that out? Well, we sell DNS software, and we take people's money for it, and when the Open Resolver Project (thanks largely to these folks) did name server fingerprinting of what kind of software these open resolvers were using, they found that our software had something like nearly 500,000 hits, and I am pretty sure we never sold 500,000 copies of our software; so either somebody is stealing it or, if we had sold that many, I would probably be rich enough not to be giving this talk here. There must be something else going on. We did a small instrumented test: we created a special domain and then asked these open resolvers for it, and the reason, just as Geoff pointed out, is that there is a huge kind of DNS cloud and what you get to see is the last resolver out of it. We found with that that most of these devices were actually just proxying or forwarding to some ISPs or some larger resolvers, so when we sent a lot of these queries, only a small subset of resolvers was coming back. So this was how we found out that the attackers were actually targeting these open proxies.

And the other thing we found out, the initial set of queries were usually right any queries, but most recently the attackers don't give them shit about DNSSEC or these large attacks, they just create large domains and pretty much change every day, and when you block one domain the next day you have another domain and they do that by kind of giving back large sets of A records, I mean, or text or what have you, and also, in the DNS there is some stuff that are, well, people who may not understand DNS too good and create RR sets or domains that are not bad, they are usable probably but they are actually also leveraging attack ? for these attackers. And the ISPs resolvers are a great target because, I mean, ISPs try to size their resolvers so that they can withstand attacks and run them usually not up to the limits so they have headroom so when the attacks come and when you do these attacks, on a low volume level you might even get unnoticed but spread them out enough and the target will get basically hit. So, what can can we do actually about it to defend against these attacks? Well, I think the kind of, if you are running a resolver, the clue is in the data because if you have say low volume of attacks you might not notice them in your normal kind of QPS so you need to look into the query data of that stuff and we have our software can store or stores every query that we get and the answers and we can then report on that and we have an open resolver somewhere and I am going to now actually show you what this resolver currently has. So, this is basically, this basically gives you the top domains that had a response size for more than 1 K over the last hour and this I never seen before but that might be something else, packet dot Asia is a old one and this is the dot info, something very common and when you want to look at what they were queried for, we are going to group them by name and query type and what I will guess we will see, I am not sure ?? 
So, most of the attacks in the large volume are still ANY, but there is some stuff going on here that is not ANY. Once you have this data, there is something you need to do with it, and I think the best thing to do is to filter it; or actually, if you care about giving the client back an answer, the best thing is to truncate it. It really depends on what your software can do. You can also give back an NXDOMAIN if you are 100 percent sure that this domain is bad. What you need is some kind of reputation, because these attacks are mostly using made-up domains: if you have a reputation list, a list of domains that you know are bad, or if you have these dual-use domains, you put them in a policy and drop the query. This is how our software does it: you just add a list, and then drop the query if something on that list is hit.
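The drop-or-truncate policy described above can be sketched roughly as follows. This is only an illustration of the idea, not any vendor's actual product or API; the list contents, threshold, and function names are all made up.

```python
# Sketch of a resolver-side policy: check each query against a
# reputation list of known-bad domains and drop it, and truncate any
# answer that would exceed a size threshold so real clients retry
# over TCP. Names and thresholds are illustrative only.

REPUTATION_LIST = {"bad-attack-domain.example", "dual-use.example"}
TRUNCATE_OVER = 1024  # bytes; "responses over 1K" from the talk

def registered_domain(qname):
    """Very rough approximation: keep the last two labels."""
    labels = qname.rstrip(".").split(".")
    return ".".join(labels[-2:])

def policy(qname, answer_size):
    """Return one of 'drop', 'truncate' (send TC=1), or 'answer'."""
    if registered_domain(qname) in REPUTATION_LIST:
        return "drop"        # query for a listed domain: drop it
    if answer_size > TRUNCATE_OVER:
        return "truncate"    # large answer: force the client to TCP
    return "answer"

print(policy("www.bad-attack-domain.example", 200))   # drop
print(policy("www.clean.example", 4096))              # truncate
print(policy("www.clean.example", 300))               # answer
```

In practice such a list would be fed automatically from query-log analysis, which is the operational burden discussed next.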

Now, of course, that is a very operationally intensive task, because you need to monitor your resolvers, you need to add these lists, and you need to do a lot of stuff on your resolver. And since it's all about size, maybe it would be a good idea to respond differently if the size of the answer is large; maybe it would be a good idea to rate limit differently on that. I want to show you some of the domains we had before, how they look, and what you maybe can do about them.

So, I never saw this one before, so I am actually going to try it. That sort of looks valid, although I am not too sure what it is, because it's giving you a very low TTL and lots of answers. The other stuff we more usually see is like this; that is around 4K, and that is also something very common in these attacks. Most of the software out there seems to automatically fall back to TCP at around 4K, and that is why the attackers target domains that are just around 4K: if you switch back to TCP, some software gives back as much as it can, but other software just gives back the truncated answer, and then you don't have an amplification, and these attacks are about amplification.

Another interesting thing I want to show you: there is a domain called netfirms, which is one of the domains we see a lot as a dual-use attack domain. If you query something like this, you get back 4K, and that is actually something I really fear has not been detected yet: you can spread the queries over multiple names and get back these answers. I have tried to contact them and have had no reply so far. These are the names I call dual-use domains; they can be used for good or bad.

The root domain also: I mean, the answer you get back there isn't bad, 1.8K, but the query is the smallest you can think of, so we have also seen that in attacks. So, stuff to do: I think rate limiting or truncating is the right thing to do, and rate limiting or truncating on size would actually be the best thing to do; I am not sure, we have lots of other vendors here. You want to have that happen automatically; you don't want your operations department maintaining these lists by hand. The best thing would be if these attacks could be detected automatically. And with that, I will open for questions.

JIM REID: Thank you Ralph. Questions?

AUDIENCE SPEAKER: We have been hit by exactly this type of attack you have been describing. In our case it was just a simple attack, it was the ANY query on the domain, so we just filtered or rate limited that one. The other thing you described, the analysis of the resolver queries, the pattern recognition and anomaly detection, and somehow injecting this information into lists and making policies out of them: do you have a ready solution for that? I mean, do you offer a product, or anything else?

RALF WEBER: We have what we call a network protection solution that you can buy with our software, and while it initially was not designed to defeat these attacks, the way it works was totally suited to it. So with some customers that had that product, we actually put those domains in, because we see traffic from many customers, and rapidly put them into reputation lists. So yes, we have that.

AUDIENCE SPEAKER: OK. And also, one more question which I forgot.

RALF WEBER: You can talk to me later, no problem.

SHANE KERR: From ISC. We think having the servers limit the reply rate based on size is a really good idea, such a good idea we are implementing it right now.

RALF WEBER: So do we.

SHANE KERR: So that is one thing. And I guess another thing is, we don't currently turn on RRL by default, and it's something where it would maybe be good to get some feedback from the operator community. On the one hand, RRL is really useful if other people have it turned on on their servers, because I am getting attacked by reflection and amplification attacks from them. But we didn't turn it on by default because it can be quite confusing, I think, for an administrator, because it basically causes your server to operate in ways that you are not expecting based on historical use, right? So, if the community sees that these attacks are so widespread that we have to get this technology out there, we can consider turning it on by default, but that is kind of the dilemma that we have right now.

RALF WEBER: RRL mostly protects the authoritative side; on a recursive or caching server it doesn't make too much sense. I think it's an operator's decision to actually turn it on or off, especially because if you go to slip levels above one, you may drop legitimate traffic.

SHANE KERR: We can talk about slip levels over beer.

JIM REID: We could have a more detailed discussion about that in a few minutes at the end of the session, Shane, in the open discussion, if that is OK. So thanks very much.

Our final speaker for the first session this morning is Tomas Hlavacek from NIC.CZ, who has a proof-of-concept packet fragmentation attack tool.

JIM REID: We are having a slight problem with the projector in the room but I think normal service will be resumed shortly.

TOMAS HLAVACEK: I work for CZ.NIC and I am going to talk about the DNS fragmentation attack and the proof of concept we have made. So, the idea of using IP fragmentation as an attack tool came from the paper called "Fragmentation Considered Poisonous". We took the idea, approximately one month ago, and we created a proof of concept, which eventually worked. Two weeks ago I learned about similar work from Brian Dickson of Verisign Labs, and we exchanged a few e-mails and found out that our approaches are slightly different: we focused on the DNS part of the attack and have been cheating a little bit on the IP side, while Brian has been fiddling mostly with the details of IP fragmentation, so it's a different thing. And I want to point out that the complexity of this kind of attack is not in the depth of the ideas behind it, but in the implementation details and the whole bunch of conditions the case brings.

What it is all about: the idea is to use IP fragments as a new attack vector for off-path modification. It means that the attacker does not need to intercept packets as a man in the middle, but he can still discard or alter the contents of the communication. You probably know that IP fragment reassembly is based on the IP ID, which is a 16-bit number in the IP header; the destination host stores and orders all incoming fragments in a reassembly queue and, when all the fragments that form a packet have arrived, eventually reassembles the packet. The attacker can preload the cache with a malicious second fragment, I mean a fragment with some offset, and it waits for the first fragment to come, and then the packet is reassembled on the spot. So it is not exactly a race condition attack, right?

The legal second fragment arrives late, and it's going to be discarded after some time, because it just stays in the cache and no first fragment is available any more, since reassembly has already happened. IP ID numbers are generated by counters in most operating systems, so there is no randomness, nothing. If you can find out what the IP ID for that particular connection is, you are not guessing: you are just increasing your own counter, and maybe slightly increasing or decreasing it based on successes or failures. And there are actually tricks for learning the current value of the IP ID on a remote host; it's covered in the Shulman paper, so you can read it there, it's not my topic here. But I have to say that in Linux, IP ID counters are specific to each destination, and they are held in the route cache. Other operating systems have one global counter for all connections, but those operating systems are not frequently used for DNS servers.
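The predictability of a sequential IP ID counter can be shown with a toy simulation. This is purely illustrative (no real packets, and it models the simple global-counter case, not the Linux per-destination tricks from the paper): the attacker probes the host, learns the counter, and preloads spoofed fragments for a small window of predicted IDs.

```python
# Toy simulation of why sequential IP ID counters are predictable:
# probe the target, read the IDs of its replies, then predict the IDs
# of the packets it will send to the victim. Illustrative only.

class Host:
    """Host with one global, sequential 16-bit IP ID counter."""
    def __init__(self, start=4242):
        self.counter = start

    def send_packet(self):
        pkt_id = self.counter
        self.counter = (self.counter + 1) & 0xFFFF  # 16-bit wraparound
        return pkt_id

server = Host()

# Attacker probes the server twice to learn the counter and its rate.
probe1 = server.send_packet()
probe2 = server.send_packet()
step = (probe2 - probe1) & 0xFFFF   # 1 for an otherwise idle host

# Attacker predicts a small window of IDs for upcoming packets and
# would preload one spoofed second fragment per predicted ID.
window = [(probe2 + step * k) & 0xFFFF for k in range(1, 9)]

# The server's next real packet (e.g. the fragmented reply headed to
# the resolver) falls inside the predicted window.
real_id = server.send_packet()
print(real_id in window)   # True
```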

The aim of the IP fragmentation attack on DNS is to poison the cache by modification of the second fragment. The idea is to reduce the entropy in DNS transactions from 32 bits, source port plus DNS ID, to 16 bits, which is only the IP ID in the IP header. And it works because the UDP header plus the first portion of the DNS data, which contains the DNS ID, is something like 30 or 40 bytes and stays definitely in the first fragment, while the attacker modifies the second fragment, which contains part of the answer, usually the whole additional section.

There are actually two types, or flavours, of these DNS attacks. The first one is based on spoofing an ICMP "destination unreachable, fragmentation needed" message, and the idea is to convince the authoritative server to fragment its replies to the caching resolver. The second type is forging a special zone which generates responses over the MTU, so they fragment inherently, or naturally.

The first type is based on spoofing, which means you have to send an ICMP destination unreachable, type 3 code 4. Spoofing this ICMP is not really a problem, even in a BCP 38 environment, because you can set the source IP address to whatever you need to accommodate your BCP 38 policy. Linux accepts these spoofed ICMP destination unreachable messages into the routing cache for ten minutes, and the minimum MTU you can set is 552 bytes.
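At the byte level, forging such an ICMP message is not much work, which is what Tomas later calls "easy, about 50 lines". The sketch below builds the type 3 code 4 message with the standard Internet checksum; the inner header bytes are dummy placeholders, and really they would be the IP/UDP header of a reply from the authoritative server to the resolver.

```python
import struct

def inet_checksum(data):
    """Standard 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

def icmp_frag_needed(inner_ip_header, mtu=552):
    """Forge an ICMP type 3 code 4 ("fragmentation needed") message.

    inner_ip_header is the spoofed IP header (plus 8 payload bytes) of
    the packet we pretend was too big; it is what tells the receiver
    which flow the reported MTU applies to.
    """
    header = struct.pack("!BBHHH", 3, 4, 0, 0, mtu)  # checksum = 0 first
    csum = inet_checksum(header + inner_ip_header)
    return struct.pack("!BBHHH", 3, 4, csum, 0, mtu) + inner_ip_header

# Dummy inner bytes: an IPv4 version/IHL byte and zero padding.
pkt = icmp_frag_needed(b"\x45\x00" + b"\x00" * 26)
print(len(pkt))  # 36: 8-byte ICMP header + 28 inner bytes
```

A correctly built message checksums to zero when verified over the whole ICMP part, which is the usual sanity check a receiver performs.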

So we have the picture of the attack: we have an authoritative server and a caching resolver. The first step is to send a spoofed destination unreachable, and you have to spoof the internal IP header, which is part of the ICMP data; this internal IP header is what actually convinces the authoritative server to fragment its subsequent replies to the caching resolver.

The second thing is that we can spoof the second response fragment to the caching resolver. Of course, to forge the second response fragment we need to query the authoritative server ourselves, to know what the content is, but that is simple.

Then we can send a query to the caching resolver; it could of course also be spoofed, to accommodate some access list on the caching resolver. The caching resolver then starts recursion and eventually asks the authoritative server for some data. The authoritative server responds with the first fragment, and this response is reassembled on the spot with the spoofed second response fragment. The real second response fragment arrives right after the first one and is put into the cache, but it just times out afterwards.

So, this is the main idea of the attack. As for the effects of the ICMP spoofing, you can see them in the routing cache: when you spoof the authoritative server with ICMP unreachable packets, it creates a record in the routing cache with MTU 552, and you can also see the IP ID, which is interesting for us because we are working on the assumption that the IP ID is known to the attacker.

And this is an example of a response which came from the authoritative server, so it's a real response, but fragmented. It cannot be seen from this packet, but the borderline between the first and second fragment was inside the first RR of the additional section; it's where the red line is. So the attacker could modify the second fragment, which contains the glue A record for the second name server, and then everything after it.

This is an example of what was logged in the resolver, so this is what the resolver received after reassembling the first real fragment with the second spoofed one. You can see that the IP address of the glue record is different, and there is also some difference in the second RRSIG record, which we are using for UDP checksum fixing: the resolver is not doing any DNSSEC validation, so the RRSIGs are only opaque data for us, and we are using those bytes to recalculate the UDP checksum, spoofing the right bytes in order to compensate for the changes to the IP address in the previous records.
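The checksum-compensation trick just described can be demonstrated numerically. The bytes below are made-up stand-ins, not a real DNS message: the point is only that after rewriting the glue address, one 16-bit word in the non-validated "RRSIG" region is adjusted by the one's-complement difference, so the overall sum (and hence the UDP checksum) does not change.

```python
import struct

def oc_sum(data):
    """16-bit one's-complement sum, as used by the UDP checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def oc_add(a, b):
    """One's-complement 16-bit addition (with end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

# Stand-in for the second fragment: 4 bytes of glue A record, then
# 12 bytes standing in for RRSIG data the resolver will not validate.
original = bytes.fromhex("c0000201") + b"\x10" * 12

spoofed = bytearray(original)
spoofed[0:4] = bytes.fromhex("0a000001")   # rewrite the glue to 10.0.0.1

# Compensate inside the "RRSIG" region: add the one's-complement
# difference of the two sums to one 16-bit word.
delta = oc_add(oc_sum(original), (~oc_sum(bytes(spoofed))) & 0xFFFF)
word = struct.unpack("!H", bytes(spoofed[4:6]))[0]
struct.pack_into("!H", spoofed, 4, oc_add(word, delta))

print(oc_sum(original) == oc_sum(bytes(spoofed)))  # True
```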

So, what were the challenges in our proof of concept? First, ICMP packet forgery. That was easy; it was like 50 lines of code using packet libraries, so it's easy.

Second, a vulnerable zone. That was slightly more complicated, because we had to look at a few domains and think about it a bit, but we found out that almost all DNSSEC-signed domains are vulnerable to this attack. And then the most complicated part was fragment forgery and fixing the UDP checksums, because we had to make some changes in the packet library and write some model of the UDP checksum counter, and things like that.

And then there is injecting into networks, which depends on local rules and BCP 38 implementation status. Another problem is the IP reassembly queue size in most operating systems, which needs further research; we are cheating on this point right now. RRset order randomisation is just a sort of annoyance, because it decreases the probability of successful reassembly, but you will still get one third or something like that with resource record randomisation, and you can be sure that reassembly happens when randomisation is turned off.

There is label compression, which is not a problem for us, but it might be a problem if you want to do some more complicated name or label forgery. And there is the possibility of fragment arrival reordering, which potentially breaks the attack, but it does not usually happen, so it's not a problem for us.

There is another part of the attack, and that is really the DNS part. The idea is that if you want your forged packet to be accepted into the cache, you have to follow the bailiwick rules. There is a low level of trust in resource records that come from the additional section, so you must not ask the caching name server you have already poisoned for the name you have poisoned, because it's only a glue record: if you ask for it, your server would start a new recursion and would eventually overwrite the poisoned record in the cache. So there are these tricks, and actually my impression was that the rules are getting stronger over time; in BIND it was pretty hard to figure out what to do and what not to do in order to poison the cache and keep the records inside the cache. For Unbound we don't really know yet, but it's on my roadmap to test Unbound in the future.

So what were the tricks, or cheats, in our proof of concept:

So, first, this attack worked in the lab. The main trick was that the IP ID was known to the attacker; not exactly the number, but the range, and you have some window you can spoof into, so it's fine.

There were no firewalls and no connection tracking, which could be a problem. And I have used slightly tweaked IP reassembly queue settings, but I think that could be worked around.

And with those settings, one out of three trials succeeded. It's one out of three because of resource record randomisation and timing problems; if we turned off the resource record randomisation, we could get three out of three.

The second type of attack is based on forging zones with specific NS records: you have to add some interesting target name server and its glue, which you are going to poison, and you have to register the domain at the lowest possible level you can afford, which is in a second-level zone.

This is the example. We have a zone which contains, I think, four or five really long names for fake NS records, and then it contains one interesting name server record and its glue, so we can attack this interesting glue. The idea is that this zone produces a really long referral message, slightly below 2K, so it fragments naturally by itself. The zone is perfectly valid and could be registered by a common customer, and even though it contains weird NS records, perhaps nobody would notice that, so this is the main danger here.

What are the defences? DNSSEC, obviously. And while DNSSEC is not really popular among end users and some administrators, there are some workarounds. For the first type, you can ignore ICMP destination unreachable, which is from my point of view not a good idea because it breaks path MTU discovery, but it happens. And for the second type, you can limit the response size: set the EDNS0 buffer size to your MTU value, and it should mitigate the second type of attack.
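The EDNS0 buffer size mentioned as a mitigation is just a 16-bit field in the OPT pseudo-record of a query. The sketch below builds a minimal query carrying an OPT record whose CLASS field advertises the UDP payload size; the query ID and the 1280-byte value are illustrative choices, not recommendations from the talk.

```python
import struct

def dns_query(qname, qtype=1, edns_bufsize=1280):
    """Build a minimal DNS query with an EDNS0 OPT record.

    The OPT record's CLASS field carries the advertised UDP payload
    size; keeping it at or below your MTU means responses should not
    need to fragment. Illustrative sketch only.
    """
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0,
    # ARCOUNT=1 (the OPT record lives in the additional section).
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 1)
    question = b"".join(
        bytes([len(label)]) + label.encode() for label in qname.split(".")
    ) + b"\x00" + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    # OPT RR: root name, TYPE=41, CLASS=payload size, TTL=0, RDLEN=0
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_bufsize, 0, 0)
    return header + question + opt

msg = dns_query("www.example.com")
# The OPT record is the last 11 bytes; the advertised buffer size sits
# in its CLASS field, bytes -8..-6 of the message.
print(struct.unpack("!H", msg[-8:-6])[0])  # 1280
```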

That is it. If you are interested in a live demonstration, I suggest we meet in the terminal room at half past one, or you can catch me in the lobby, or on mail or Jabber. Expect that the live demonstration is going to take approximately half an hour for set-up, launching the attack and everything. So that is it, thank you for your attention.

JIM REID: Thank you Tomas. Do we have any questions? Nobody is coming to the mike. Thank you very much.

We are pretty much on schedule, and I will now ask the four speakers if they would mind coming up to the stage, and we will have an open discussion around these issues: fragmentation, amplification attacks, DNS signing and stuff like that. If people want to ask any questions on these issues, please come to the mike and make your points, and hopefully the four speakers can address them. Or are we all so fed up already that we want to get some lunch? Discussion or lunch?

Shane, you were talking about the issue of potentially using response sizes for some degree of rate limiting, and also about the use of response rate limiting in general. Maybe you could try to spark a discussion about your thoughts on that and get some feel from the operators in the room.

SHANE KERR: So the basic observation is that when you are operating a name server and you don't want it to be used in an amplification attack, what you are actually trying to do is control your amplification factor, right? So rather than just arbitrarily limiting on number of packets or anything like that, the idea is to target the actual problem, which is amplification, and the way to do that is to penalise larger responses more. Now, that is a more sophisticated algorithm, which again can be confusing for administrators and may cause unexpected side effects for users, but with a properly running DNS set-up it should only be a problem in an attack scenario. That does bring up the whole point, which is that, unfortunately, the way we are dealing with amplification and reflection, which I consider separate problems, and I think trying to tackle them together may be a mistake, is basically forcing your servers to act in a kind of degraded mode when they are being used in an attack, which is unfortunate, because ideally, in security, in an attack scenario you ignore the attacking traffic; in DNS we don't have any way to separate that out, so we are kind of stuck with ad hoc prevention techniques.
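One way to read Shane's "penalise larger responses more" is a token bucket that budgets response bytes per client rather than queries per client. This is only a sketch of that idea under assumed parameters, not ISC's actual implementation.

```python
import time

class SizeWeightedLimiter:
    """Token bucket where big responses cost more than small ones.

    Sketch of size-weighted rate limiting: budget *bytes* per client
    address, so the amplification a client (or a spoofed victim) can
    extract is capped. Parameters are illustrative.
    """
    def __init__(self, bytes_per_sec=10_000, burst=20_000):
        self.rate = bytes_per_sec
        self.burst = burst
        self.buckets = {}   # client address -> (tokens, last timestamp)

    def allow(self, client, response_size, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(client, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if response_size <= tokens:
            self.buckets[client] = (tokens - response_size, now)
            return "answer"
        self.buckets[client] = (tokens, now)
        return "truncate"   # send TC=1 so real clients retry over TCP

rl = SizeWeightedLimiter()
print(rl.allow("192.0.2.1", 4096, now=0.0))    # answer
print(rl.allow("192.0.2.1", 60_000, now=0.1))  # truncate
```

Small answers keep flowing while repeated huge ones get truncated, which is the amplification-factor control being discussed.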

GEOFF HUSTON: I suspect that where we are going isn't good enough. The whole idea of rate limiting seems to go: the attacker selects a small number of recursive resolvers and pummels away at the authoritative server, there is a large rate of repeated queries, so you can find a signature, apply the rate limiting and off you go. Jared's work tends to suggest there is another way: if you scan the entire v4 space for port 53 you will get 33 million successes at least, you spin that off against the authoritative name server, and rate limiting is irrelevant against that kind of attack. I think we are kind of beating a horse when we should be looking at cars if you think rate limiting is an answer. It is the response size that seems to be the problem. And I certainly think there is very little chance that BCP 38 ever gets traction in the Internet we know and love. The only conclusion from that is that you have got to reduce the amplification, and the only way we know how to do that is to drop the EDNS0 size, and 4096 was an ambitious size. I was pointing out that 4096 was ambitious, and I am agreeing with Shane that rate limiting is really not an answer: when you get an attack that sweeps the entire v4 space, there is no repeated query that you can rate limit, so then you are back down to basic amplification. EDNS0 at 4096 was overly optimistic about trying to make UDP fast, and it just opened up vulnerabilities that folk have driven trucks through. 512 sounds pretty cool; 513 sounds like a reasonable compromise.

JIM REID: But it's not a power of 2.

SPEAKER: While scanning all these open resolvers: most of them, from what I have seen, are proxies, so even if you have a low-volume attack, the queries will at some point end up at a recursive resolver, it might be the final resolver, and then you have an accumulation of attacks, so it might be effective to actually do something about it there. And on the sizes: we all talk about IPv6, so why not 1280?

GEOFF HUSTON: The ones those 33 million concentrate on are the ones the rest of the world concentrates on too. So if those are the ones you are rate limiting, you are rate limiting everyone, so you might as well stop trying to protect yourself, because you are basically sin-binning Google, right? Yes.

BILL MANNING: And this is for, I guess, the folks that are already doing rate limiting. Geoff put up a really interesting slide earlier that looked at response sizes, and there were two significant end points; it looked sort of like an inverted bell. We had 512 at one end, and what was the other end, Geoff?

GEOFF HUSTON: More folk do 4096; the other end was 65,000.

AUDIENCE SPEAKER: If DNS goes to a 64K response size, what is your definition of an amplification attack? If everybody supports 64K as a response size, the authoritative servers are doomed; just pick up and walk away. There is not enough bandwidth for me to absorb 33 million queries that want a response size of 64K.

JIM REID: Fragmentation problems before we get to that point.

BILL MANNING: There were a number of things done in the Internet community about large windows or large frame sizes, and it was discovered that, although almost everybody does it, it always goes down to the least common denominator. So the idea of doing RRL or any of these other kinds of things: if you actually push that response size to 64K, it changes your perception of what your threat is. At least it does mine.

AUDIENCE SPEAKER: I don't think many people are seriously considering 64K as a major issue. I saw in the scan data that there were indeed some packets that had it, but I think it's a bit of a red herring for this discussion.

JIM REID: Thanks.

AUDIENCE SPEAKER: Rate limiting will work for some very short period of time: when you think about it just a little bit, you can randomise destinations, spoof to the whole prefix of your victim, and there goes your rate limiting; you have to come up with more and more complex limiting algorithms. And being frank, it's not just about DNS; there are enough other public UDP services in the network, SNMP and so on, and all of them can be exploited for a more or less similar order of magnitude of amplification attack. So it's all about BCP 38.

JIM REID: Well, maybe, but the thing about this, just to intervene for a second: yes, you are right, there are all those other UDP services, but I think DNS is unique in the case of authoritative servers, TLD servers and root servers, which have to answer queries and sit in locations with lots of bandwidth and lots of iron for the DNS servers themselves, a very good vector for the attacks, because they have to answer large query volumes.

AUDIENCE SPEAKER: I know, exactly; we had this in Russia with the Russian Google's resolvers, high-performance, well-connected servers, hitting a victim 24/7. So this attacker was rate limited, and it works for just that little period of time. But the idea is, it's not only about DNS, it's about UDP and the fact that we allow spoofed traffic through, and the problem is really serious; even big guys like Rostelecom are getting upset. And the guidelines should be not just something that operators do once, but something that they review on a quarterly basis to check they comply with them, which is the only way to go, in my opinion.

GEOFF HUSTON: With respect, I disagree. NTP is packet in and packet out; there is no amplification going on there. SNMP is not widely deployed as a public service; there are folk who leak the stuff, and if you stick your head out of the window something is going to hit you, it does, but as an attack vector it's not reliable. The DNS is one of the few public UDP services where it's out there on UDP and anyone should be able to use it. You are right about the issue of RRL only buying you a few seconds of time, because the immediate response to a server that does RRL is to increase the diversity of resolvers that query you, and with a little bit of work, and it's not a lot of work, you can actually figure out a set of questions against open recursive resolvers that fan out nicely even at the authoritative name server. Certainly a lot of them funnel into Google's public DNS, but not all, and the issue is you can craft better attacks. And then you get into a war of escalation on RRL algorithms, and we all lose. I don't think RRL is going to buy you a lifetime of DNS, so then comes the issue: if it's not RRL, where do you go? You can hope about BCP 38, or you can look at exactly what we are doing with EDNS0 and response sizes and TC=1 and TCP.

AUDIENCE SPEAKER: Michael Daley from Nominet. Our infrastructure has been used for some of these amplification attacks, and we found that rate limiting has been helpful in keeping services up and running and keeping bandwidth open, so we can at least go and look at the servers and take some other preventative measures. But we are finding that actually looking at signatures of attacks and rapidly injecting those into some configs gives us a lot more protection against these kinds of attacks. I just can't tell you about it on an open mic, but if you ask me privately I am quite happy to talk about it.

JIM REID: Thank you.

SHANE KERR: So it sounds quite grim, actually. There has been, I think, a lot of discussion about possible technological measures to improve the situation. One possibility is TCP, which I am kind of a fan of, because I think it's basically implementable today, not 100 percent, but pretty close. There has been discussion of some other technological approaches, like UDP-based cookies, to try to put another patch on UDP to make it like TCP without making it TCP; it's probably the first time a protocol has tried to imitate TCP without actually being TCP. I guess my question, though, is: if we decide we want to solve this technologically, there is so much old stuff on the Internet, is it actually worthwhile pursuing that? Is it worthwhile trying to adopt new technologies that we know are going to take 15 years before they get out there?

GEOFF HUSTON: I did actually look across those 2.8 million users that received the ad, which is smeared across the globe, and 58% of those users used less than 1% of the unique resolvers, so there is an awful lot of funnelling, and because of that I'd like to think the ones that handle the biggest client populations are well fed and watered and maintained. Unlike with a query out of the stone age that only old resolvers would know about, I could not easily identify how recent things are, because when you are the authoritative name server, the folk who are asking you questions don't generally say, "Hi, I am version 8 of BIND" or whatever; they don't tell you this stuff, so fingerprinting queries is kind of difficult. But I suspect it's not as grim as you think: the tail end of really bad stuff serves a very small population of the clients. So you can think about changing the world, bringing a lot of the world with you, and not damaging too much. There is some collateral damage, but maybe that is the 2.6% again rearing their little heads.

JIM REID: I am going to close the mics off at this point, because we are now into the lunch break, so we have got...

AUDIENCE SPEAKER: We have seen the presentations, and we had a discussion last Sunday about this topic, about all sorts of vulnerabilities that we see, like the fragmentation one, and I don't think the fragmentation itself is an issue: all these new sorts of Kaminsky-like attacks can be solved by DNSSEC, in my opinion, so I really liked the presentation saying that if you do DNSSEC you are not vulnerable to cache pollution. But on RRL I have a different perspective: we have used RRL since we first saw the amplification attacks, and that was actually quite a principled decision for us, because we are dropping queries and we are not supposed to do that, but it was the only defence we had (if you want to protect yourself, do DNSSEC). Now, lately, I hear these proposals of going to DNS over TCP, and my experience has been that the only time we got in trouble with response rate limiting, with the slip factor set to 2, was when an attack was played over open recursive resolvers: we were attacked over the open resolver cloud, and we did response rate limiting, asking the open resolvers to retry over TCP, and we could not handle the TCP coming in. So I think we need some more data on whether or not it's really viable to do DNS over TCP.

SPEAKER: On TCP: the people who built DNS software treated TCP as something that happened once in a lifetime, and I think that will change, and people probably need to adapt their software, or do something so that they handle it better. I think the problem is that large volumes of TCP probably can't be consumed right now, but I think it's a solvable problem.

GEOFF HUSTON: A couple of years ago I had a bad idea and simulated UDP and TCP. It is possible: you are trying to do a simple handshake, get the query, cut the connection, and if you are willing to use your own TCP driver you can cut the overheads substantially, because what you have really got is a cookie that is generated from a SYN exchange to say the source address is real, and another interchange that says query, response. If you are willing to cut corners, a high initial window, all that kind of stuff, and not worry about doing the correct FIN handoff, just send the FIN and die, I think you will find that you can reduce the load on your server; but if you are using a standard TCP driver you are going to consume memory, process blocks and everything else.
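The cookie Geoff mentions, in the spirit of SYN cookies, can be sketched as a keyed hash over the client address and a time window: the server can then verify that a source address is real without keeping per-client state. This is an illustrative sketch only; the cookie length, epoch window, and function names are assumptions, not any deployed protocol.

```python
import hashlib
import hmac
import os
import time

SECRET = os.urandom(16)   # server-side secret, rotated periodically

def make_cookie(client_addr, epoch=None):
    """Stateless cookie binding a client address to a time window.

    Like a SYN cookie: the server hands this out in the first
    exchange and later verifies it, proving the source address was
    reachable, with no per-client state kept in between.
    """
    epoch = int(time.time() // 300) if epoch is None else epoch
    msg = ("%s|%d" % (client_addr, epoch)).encode()
    return hmac.new(SECRET, msg, hashlib.sha256).digest()[:8]

def check_cookie(client_addr, cookie, epoch=None):
    return hmac.compare_digest(cookie, make_cookie(client_addr, epoch))

c = make_cookie("192.0.2.1", epoch=100)
print(check_cookie("192.0.2.1", c, epoch=100))    # True: real source
print(check_cookie("203.0.113.9", c, epoch=100))  # False: spoofed
```

As Shane notes next, the catch is where the state and retry burden lands, not the cryptography.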

AUDIENCE SPEAKER: How long will it take for that to be deployed?

JIM REID: Six months.

GEOFF HUSTON: Folk at a university of technology took that, wrote a driver, and you can contact the folk there and see how it works. It is an amazing beast, a hybrid of TCP over UDP.

SHANE KERR: That was an interesting idea, but, and I don't remember the details, I believe it pushes the complexity and state and load onto the resolvers, so it doesn't really save anything for the overall system; it will help the authoritative server.

GEOFF HUSTON: Precisely, there is badness in the world and it's saying it's yours and not mine at this point.

AUDIENCE SPEAKER: Shane made me come up here. He was talking about the futility of trying to go forward, given all the old stuff, and we had the same idea: we have so many challenges out here, and I think we have got to recognise where we want to go and forget about how long it takes to get there, because the abusers are innovating with what we have currently, and if we leave it here forever they are going to continue to innovate using what we have already built. We have to react both to the theoretical, perceived threats and also, more importantly, to what we see happening; we see a lot of these theoretical papers saying here is what could go bad. Today we have to get short-term fixes in place and have a long-term goal: how do we make this protocol what it's supposed to be, a utility used everywhere, reliable, but one that cannot be abused. What we have now can be abused.

AUDIENCE SPEAKER: A short comment, I can't resist. This gives me the image that the DNS people are in a very small car, driving along and discovering obstacles in real time. Oh, UDP is harmful, so let's do RRL. But that hurts too, so let's go to TCP. We will be dead. Think of BCP 38: get all the operators here, or maybe at the IETF, and convince them that the most harm is letting spoofed traffic out of the network, with DNS pushed back and forth between the resolvers and the authoritative name servers; it can't continue this way.

JIM REID: OK. I am reminded of a quote by the famous John Maynard Keynes, who said: in the long run we are all dead. I would like to thank all the speakers, and I hope to see you after lunch at 2 o'clock. And thanks to the nice lady doing the stenography, and to Chris for taking notes and managing the Jabber room. Thank you.