Bind Misconfiguration/Bug creates too many DNSKEY queries

1 Summary of results

At least some versions of ISC's bind recursive resolver, when configured with certain options and with the old DNSKEY as the trust anchor, occasionally send large numbers of queries for the root zone's DNSKEY set. This is occurring today, after the old DNSKEY (KSK2010) has been revoked and removed from the active DNSKEY list. I believe it, or something similar, was also occurring during the period when KSK2010 was advertised in the zone as a revoked DNSKEY. It happens only sporadically, which makes reproducing the bug challenging.

To trigger the issue, I created an experiment that repeatedly starts and stops bind and sends it a consistent set of requests. See further below for detailed setup instructions.

1.1 Graphing the number of queries sent per experiment

Each bar in the plot below shows the number of queries for the root ('.') DNSKEY sent during each experiment. The quantity varies widely, with the large peaks being the problem this page documents. This graph shows that reproducing the bug is challenging, as after 20 experiments it only showed up 3 times (15% of the time).
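The per-experiment totals can be tallied from a text rendering of each capture. The following is a minimal sketch, assuming each capture was printed with something like `tcpdump -n -r <file>` and that tcpdump renders a root DNSKEY query with the substring `DNSKEY? . `. The function names, the sample lines, and the threshold of 100 queries for a "peak" are illustrative assumptions, not values taken from the experiments.

```python
def count_root_dnskey_queries(tcpdump_text):
    """Count lines that look like a DNSKEY query for the root zone."""
    # Assumed tcpdump rendering: "... > 198.41.0.4.53: 4242 [1au] DNSKEY? . (28)"
    return sum(1 for line in tcpdump_text.splitlines() if " DNSKEY? . " in line)


def fraction_with_peak(per_experiment_counts, threshold=100):
    """Fraction of experiments whose query count exceeds the threshold."""
    peaks = sum(1 for count in per_experiment_counts if count > threshold)
    return peaks / len(per_experiment_counts)


# Made-up sample lines (documentation addresses), only to exercise the counter.
sample = (
    "15:18:01.000001 IP 192.0.2.1.40000 > 198.41.0.4.53: 4242 [1au] DNSKEY? . (28)\n"
    "15:18:01.000002 IP 192.0.2.1.40001 > 198.41.0.4.53: 4243 [1au] A? example.com. (40)\n"
    "15:18:01.000003 IP 192.0.2.1.40002 > 198.41.0.4.53: 4244 [1au] DNSKEY? com. (32)\n"
)
print(count_root_dnskey_queries(sample))
```

With 3 peaks in 20 runs, `fraction_with_peak` gives the 15% figure quoted above.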


1.2 Graphing the requests leaving bind over time

Each point in the following graph represents the number of queries for the root ('.') DNSKEY sent in a given second. Each experiment run is shown using a different symbol/color. Note that some experiments show DNSKEY queries only at the beginning, while others generate many more DNSKEY queries over time. In some runs these queries eventually stop; in others they do not. Note the periodicity, which I originally thought to be 60s but which actually seems shorter when measured (the vertical dotted lines mark 60s intervals).
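The per-second counts and a rough estimate of the period can be computed from packet timestamps. This is a sketch under stated assumptions: the timestamps are epoch floats (for example from `tcpdump -tt`), already filtered to root DNSKEY queries, and a "burst" is defined here as a run of consecutive seconds that each contain at least one query. That burst definition is mine, not something the measurements prescribe.

```python
from collections import Counter
from statistics import median


def queries_per_second(timestamps):
    """Map each whole second (relative to the first packet) to a query count."""
    start = min(timestamps)
    return Counter(int(ts - start) for ts in timestamps)


def burst_spacing(per_second, min_queries=1):
    """Median gap, in seconds, between the starts of consecutive bursts.

    A burst starts at an active second whose preceding second was not active.
    Returns None when fewer than two bursts are present.
    """
    active = {sec for sec, n in per_second.items() if n >= min_queries}
    starts = sorted(sec for sec in active if sec - 1 not in active)
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return median(gaps) if gaps else None
```

A spacing consistently below 60 would agree with the observation that the period is shorter than the dotted 60s grid.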


2 Experimental Setup

To reproduce these results, we take a freshly compiled and installed copy of bind 9.11.5-P4, and configure it with specific settings:

./configure --prefix=/usr/local/bind-9.11.5-P4

2.1 Bind config file

The bind configuration file I used, saved as /usr/local/bind-9.11.5-P4/etc/named.conf:

options {
        listen-on port 53 { 127.0.0.1; };
        listen-on-v6 port 53 { ::1; };
        directory       "/usr/local/bind-9.11.5-P4/var/named";
        dump-file       "/usr/local/bind-9.11.5-P4/var/named/data/cache_dump.db";
        statistics-file "/usr/local/bind-9.11.5-P4/var/named/data/named_stats.txt";
        memstatistics-file "/usr/local/bind-9.11.5-P4/var/named/data/named_mem_stats.txt";
        secroots-file   "/usr/local/bind-9.11.5-P4/var/named/data/named.secroots";
        recursing-file  "/usr/local/bind-9.11.5-P4/var/named/data/named.recursing";
        allow-query     { localhost; };

        recursion yes;

        dnssec-enable no;
        //dnssec-validation yes;

        managed-keys-directory "/usr/local/bind-9.11.5-P4/var/named/dynamic";

        pid-file "/run/named/named.pid";
        session-keyfile "/run/named/session.key";
};

zone "." IN {
        type hint;
        file "named.ca";
};

include "/usr/local/bind-9.11.5-P4/etc/bind.keys";

2.2 Out of date bind.keys file

Packaging software on Linux and other systems often refuses to overwrite a configuration file that already exists. Assuming this may have happened on a system, we replace the newly installed bind.keys file with an old version that contains only the 2010 DNSKEY (KSK2010), with comments removed for brevity:

managed-keys {
        . initial-key 257 3 8 "AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTOiW1vkIbzxeF3

2.3 Out of date managed keys file

Similarly, we replace the /usr/local/bind-9.11.5-P4/var/named/dynamic/managed-keys.bind file with one containing just the old key (taken from a system that had never updated it using RFC 5011 processing):

$TTL 0  ; 0 seconds
@                       IN SOA  . . (
                                4          ; serial
                                0          ; refresh (0 seconds)
                                0          ; retry (0 seconds)
                                0          ; expire (0 seconds)
                                0          ; minimum (0 seconds)
                                )
                        KEYDATA 20150704162113 20150703162113 19700101000000 257 3 8 (
                                ) ; KSK; alg = RSASHA256; key id = 19036
                                ; next refresh: Sat, 04 Jul 2015 16:21:13 GMT
                                ; trusted since: Fri, 03 Jul 2015 16:21:13 GMT
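The three timestamps on the KEYDATA line use a YYYYMMDDHHMMSS layout, and the first two match the "next refresh" and "trusted since" comments in the file. The sketch below decodes them; reading the third field as a removal time, with the epoch value meaning "not set", is my assumption about the record layout rather than something the file states.

```python
from datetime import datetime, timezone


def parse_keydata_time(stamp):
    """Parse a YYYYMMDDHHMMSS timestamp from a KEYDATA record as UTC."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


# The three timestamps from the KEYDATA line above, in record order.
refresh, trusted_since, removal = (
    parse_keydata_time(s)
    for s in ("20150704162113", "20150703162113", "19700101000000")
)
print("next refresh: ", refresh.strftime("%a, %d %b %Y %H:%M:%S GMT"))
print("trusted since:", trusted_since.strftime("%a, %d %b %Y %H:%M:%S GMT"))
print("removal unset:", removal.timestamp() == 0)
```

The refresh time lies years in the past, so a resolver started with this file finds its trust anchor overdue for a refresh as soon as it starts.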

3 Experimental procedure

To trigger the bug we perform the following task repeatedly:

  1. start bind
  2. start tcpdump
  3. Every 30 seconds:
    1. dig @localhost example.com
    2. sleep 1
    3. dig @localhost example.org
    4. sleep 1
    5. … repeat 7 times total
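The steps above can be sketched as a small driver script. This is one possible reading, not the script actually used: it assumes named was installed under the prefix's sbin directory, that "repeat 7 times total" means the example.com/example.org pair is sent seven times per burst, and that ten bursts make up one experiment (the run length is not stated). It needs enough privileges to bind port 53 and to capture packets, and `-i any` assumes a Linux tcpdump.

```python
import subprocess
import time

NAMED = "/usr/local/bind-9.11.5-P4/sbin/named"      # assumed install path
CONF = "/usr/local/bind-9.11.5-P4/etc/named.conf"


def burst_commands(repeats=7, names=("example.com", "example.org")):
    """dig commands for one burst: the name pair, repeated `repeats` times."""
    return [["dig", "@localhost", name] for _ in range(repeats) for name in names]


def run_experiment(pcap_path, bursts=10, period=30.0):
    """Start named and tcpdump, send timed bursts of digs, then stop both."""
    named = subprocess.Popen([NAMED, "-f", "-c", CONF])
    dump = subprocess.Popen(
        ["tcpdump", "-n", "-i", "any", "-w", pcap_path, "port", "53"]
    )
    try:
        for _ in range(bursts):
            began = time.monotonic()
            for cmd in burst_commands():
                subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
                time.sleep(1)
            # Wait out the remainder of the 30-second period before the next burst.
            time.sleep(max(0.0, period - (time.monotonic() - began)))
    finally:
        for proc in (dump, named):
            proc.terminate()
            proc.wait()
```

Calling `run_experiment("run01.pcap")` once per experiment, with a fresh copy of the stale key files each time, would produce one capture per run for later counting.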

The results of this experiment, and the analysis of the resulting measurements, are shown at the top of this page.

4 Follow-on questions

  1. Does the bug only happen when the bind.keys file is included, or is it enough to have just the (old) trust-anchor listed in the managed keys set?
  2. Does it matter what the state of the managed keys set is? Maybe it's just the bind.keys file that triggers the problem?
  3. Which of the above named.conf settings are actually necessary to trigger the issue?
  4. What other versions of bind exhibit this behavior?
  5. Is this the same bug that was seen during the period where the DNSKEY set was marked as revoked? If so, are we about to see an increasing trend again? If different, how do we find and fix the other bug?
  6. What's the periodicity of the results?
  7. What causes the queries to end after a while for some running instances and not others?

Author: Wes Hardaker

Created: 2019-04-04 Thu 15:18