Create and Expose a public DNS service

What is DNS?

The Domain Name System (DNS) is a system that provides human-readable names for computers, services, and other resources connected to the internet. Basic records translate a domain name (which humans can understand) into an IP address (which computers understand for routing).

DNS is used all the time. Right now, to access this blog, your computer made a DNS request to translate lunik.tiwabbit.fr into an IP address, then requested the web page from the IP address resolved by the DNS query. You can observe this behavior by opening the developer console in your web browser:

blog-dns-debug-through-developper-console

Warning

All the IP addresses and DNS names used in this blog post are no longer in use by my project. Please don't do anything stupid with them; just ignore them.

Making a DNS query

You can make a basic DNS query using nslookup in your terminal:

nslookup lunik.tiwabbit.fr

You will get something like:

Name: lunik.tiwabbit.fr
Address: 13.224.247.30
Name: lunik.tiwabbit.fr
Address: 13.224.247.31
Name: lunik.tiwabbit.fr
Address: 13.224.247.51
Name: lunik.tiwabbit.fr
Address: 13.224.247.100

Where to deploy the service?

I have decided to host my DNS service on a public cloud provider named Scaleway.

Here is a global view of the architecture:

scalway-architecture

I have decided to expose two public DNS servers, one in each region: France (Paris 1) and the Netherlands (Amsterdam 1).

I chose those two regions because they are the only ones offering Stardust Instances. DNS servers don't require a ton of resources, so this lets me reduce the project cost to a minimum.

The two servers are exposed with public flexible IP addresses (one IPv4 and one IPv6 each) that can be resolved via two DNS entries for convenience.

Security was my main concern, so I attached a security group to my instances, allowing me to filter incoming and outgoing traffic.

Deploying the infrastructure

Now that I know what architecture I want for the service, I need to deploy it on the cloud provider. I decided to use Terraform to create all the resources and bind them together.

Following the Terraform documentation to create my resources is pretty straightforward. After an hour I had the necessary code; I just needed to apply it.
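
For reference, here is a minimal sketch of what the Terraform code can look like. The for_each set and the image label are illustrative assumptions, not my exact code:

# Minimal sketch: resource names match the plan below, but the
# server set and image reference are illustrative assumptions.
locals {
  france_servers = toset(["tiwabbit-dns-01"])
}

resource "scaleway_instance_ip" "france" {
  for_each = local.france_servers
  zone     = "fr-par-1"
  reverse  = "dns01.tiwabbit.fr"
}

resource "scaleway_instance_server" "france" {
  for_each          = local.france_servers
  name              = each.key
  type              = "STARDUST1-S"
  zone              = "fr-par-1"
  image             = "fedora_32" # assumption: an image label instead of the UUID from the plan
  enable_ipv6       = true
  ip_id             = scaleway_instance_ip.france[each.key].id
  security_group_id = scaleway_instance_security_group.france.id
}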

Here is what the plan looked like (truncated):

Terraform will perform the following actions:

  # scaleway_instance_ip.france["tiwabbit-dns-01"] will be created
  + resource "scaleway_instance_ip" "france" {
      + address         = (known after apply)
      + id              = (known after apply)
      + reverse         = "dns01.tiwabbit.fr"
      + server_id       = (known after apply)
      + zone            = "fr-par-1"
    }

  # scaleway_instance_security_group.france will be created
  + resource "scaleway_instance_security_group" "france" {
      + description             = "dns"
      + enable_default_security = true
      + external_rules          = false
      + id                      = (known after apply)
      + inbound_default_policy  = "drop"
      + name                    = "public-dns"
      + outbound_default_policy = "accept"
      + stateful                = true
      + zone                    = "fr-par-1"

      + inbound_rule {
          + action   = "accept"
          + ip       = "X.X.X.X"
          + port     = 22
          + protocol = "TCP"
        }
      + inbound_rule {
          + action   = "accept"
          + ip_range = "0.0.0.0/0"
          + port     = 53
          + protocol = "UDP"
        }
      + inbound_rule {
          + action   = "accept"
          + ip_range = "::/0"
          + port     = 53
          + protocol = "UDP"
        }
    }

  # scaleway_instance_server.france["tiwabbit-dns-01"] will be created
  + resource "scaleway_instance_server" "france" {
      + enable_dynamic_ip                = false
      + enable_ipv6                      = true
      + id                               = (known after apply)
      + image                            = "fr-par-1/5c8bbf4b-10f0-4cac-863b-4561781043ff"
      + ip_id                            = (known after apply)
      + ipv6_address                     = (known after apply)
      + ipv6_gateway                     = (known after apply)
      + ipv6_prefix_length               = (known after apply)
      + name                             = "tiwabbit-dns-01"
      + private_ip                       = (known after apply)
      + public_ip                        = (known after apply)
      + security_group_id                = "fr-par-1/dns"
      + state                            = "started"
      + type                             = "STARDUST1-S"
      + zone                             = "fr-par-1"

      + root_volume {
          + size_in_gb            = "10"
          + volume_id             = (known after apply)
        }
    }

[...]

Plan: 8 to add, 0 to change, 0 to destroy.

After completing the apply, this is what I had in the Scaleway Console:

Instances: scaleway-console-instances

Volumes: scaleway-console-volumes

IP addresses: scaleway-console-flexible-ips

Security groups: scaleway-console-security-group

Here, you can see that I'm only allowing inbound traffic on port 53, which is the one used by DNS servers. (The first rule, on port 22, allows me to manage the servers over SSH from a known location.)

Installing and configuring the service

Now that I have two brand new servers at my disposal, I need to install and configure the DNS software on them. They are running Fedora 32 with a 5.6 Linux kernel:

[root@tiwabbit-dns-01 ~]# screenfetch
           /:-------------:\          root@tiwabbit-dns-01
        :-------------------::        OS: Fedora 
      :-----------/shhOHbmp---:\      Kernel: x86_64 Linux 5.6.6-300.fc32.x86_64
    /-----------omMMMNNNMMD  ---:     Uptime: 18m
   :-----------sMMMMNMNMP.    ---:    Packages: 405
  :-----------:MMMdP-------    ---\   Shell: bash 5.0.11
 ,------------:MMMd--------    ---:   Disk: 1.0G / 9.6G (11%)
 :------------:MMMd-------    .---:   CPU: AMD EPYC 7281 16-Core @ 2.096GHz
 :----    oNMMMMMMMMMNho     .----:   RAM: 293MiB / 969MiB
 :--     .+shhhMMMmhhy++   .------/  
 :-    -------:MMMd--------------:   
 :-   --------/MMMd-------------;    
 :-    ------/hMMMy------------:     
 :-- :dMNdhhdNMMNo------------;      
 :---:sdNMMMMNds:------------:       
 :------:://:-------------::         
 :---------------------://  

Fun fact: Scaleway Stardust instances run on AMD EPYC SoCs!

I have decided to use the Bind9 DNS server because it's already packaged in many distributions, there is a lot of documentation, and the community is strong.

I'm using Ansible to deploy the whole stack: base Linux config, Bind9, Firewalld, and Fail2Ban. A sketch of the playbook layout is shown below.
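
As a rough sketch (the inventory group and role names are assumptions, not my actual roles), the playbook layout looks like this:

---
# Hedged sketch of the playbook layout; group and role names are illustrative.
- hosts: dns_servers
  become: true
  roles:
    - base       # base linux config (users, updates, hardening)
    - bind9      # DNS server installation and configuration
    - firewalld  # host firewall rules
    - fail2ban   # ban abusive clients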

Tip

I have already talked about Firewalld and Fail2Ban in another blog post: Securing web entrypoint from external threats

But for the purpose of this blog post, here are equivalent bash commands that can be used.
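
For example, the cloud security group above translates roughly to these firewalld commands (a hedged sketch; replace X.X.X.X with your own management IP):

# Open DNS to everyone (53/udp and 53/tcp), restrict SSH to the management IP.
firewall-cmd --permanent --add-service=dns
firewall-cmd --permanent --remove-service=ssh
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="X.X.X.X/32" port port="22" protocol="tcp" accept'
firewall-cmd --reload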

Bind9 installation and configuration

The installation of Bind9 is pretty straightforward. Since it's already packaged, a single command is enough to install it:

[root@tiwabbit-dns-01 ~]# dnf install bind bind-utils

Last metadata expiration check: 0:02:41 ago on Fri 05 Nov 2021 09:49:15 AM UTC.
Dependencies resolved.
============================================================================================================================================================================================================
 Package                                                       Architecture                            Version                                               Repository                                Size
============================================================================================================================================================================================================
Installing:
 bind                                                          x86_64                                  32:9.11.28-1.fc32                                     updates                                  2.0 M
 bind-utils                                                    x86_64                                  32:9.11.28-1.fc32                                     updates                                  233 k
Installing dependencies:
 bind-dnssec-doc                                               noarch                                  32:9.11.28-1.fc32                                     updates                                   46 k
 bind-libs                                                     x86_64                                  32:9.11.28-1.fc32                                     updates                                   90 k
 bind-libs-lite                                                x86_64                                  32:9.11.28-1.fc32                                     updates                                  1.1 M
 bind-license                                                  noarch                                  32:9.11.28-1.fc32                                     updates                                   16 k
 fstrm                                                         x86_64                                  0.5.0-2.fc32                                          fedora                                    28 k
 mariadb-connector-c                                           x86_64                                  3.1.12-1.fc32                                         updates                                  203 k
 mariadb-connector-c-config                                    noarch                                  3.1.12-1.fc32                                         updates                                   11 k
 policycoreutils-python-utils                                  noarch                                  3.0-2.fc32                                            fedora                                    83 k
 protobuf-c                                                    x86_64                                  1.3.2-2.fc32                                          fedora                                    35 k
 python3-bind                                                  noarch                                  32:9.11.28-1.fc32                                     updates                                   64 k
Installing weak dependencies:
 bind-dnssec-utils                                             x86_64                                  32:9.11.28-1.fc32                                     updates                                  128 k

Transaction Summary
============================================================================================================================================================================================================
Install  13 Packages

Total download size: 4.0 M
Installed size: 10 M
Is this ok [y/N]:

Once it's installed, following the Bind9 documentation, I put the following configuration in /etc/named.conf:

acl "managment" {
    X.X.X.X/32;
};
acl "public" {
    0.0.0.0/0;
    ::/0;
};

options {

  dump-file          "/etc/named/data/cache_dump.db";
  statistics-file    "/etc/named/data/named_stats.txt";
  memstatistics-file "/etc/named/data/named_mem_stats.txt";
  secroots-file      "/etc/named/data/named.secroots";
  recursing-file     "/etc/named/data/named.recursing";

  listen-on port 53 { any; };
  listen-on-v6 port 53 { any; };

  allow-transfer { none; };

  max-cache-size 70%;
  allow-query-cache {
    127.0.0.1;
    localhost;
    managment;
    public;
  };

  allow-query {
    127.0.0.1;
    localhost;
    managment;
    public;
  };

  recursion yes;
  allow-recursion {
    127.0.0.1;
    localhost;
    managment;
    public;
  };

  dnssec-enable yes;
  dnssec-validation yes;

  prefetch 4 10;

  rate-limit {
    ipv4-prefix-length 28;
    ipv6-prefix-length 56;
    responses-per-second 20;
    window 5;
    slip 3;
  };

  managed-keys-directory "/var/named/dynamic";
  geoip-directory        "/usr/share/GeoIP";

  pid-file        "/run/named/named.pid";
  session-keyfile "/run/named/session.key";

  hostname "dns01.tiwabbit.fr";
  server-id "dns01.tiwabbit.fr";

  /* https://fedoraproject.org/wiki/Changes/CryptoPolicy */
  include "/etc/crypto-policies/back-ends/bind.config";
};

statistics-channels {
  inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
};


logging {
  channel client_file {
    file "/var/log/named/client.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel cname_file {
    file "/var/log/named/cname.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel config_file {
    file "/var/log/named/config.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel database_file {
    file "/var/log/named/database.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel default_file {
    file "/var/log/named/default.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel delegation-only_file {
    file "/var/log/named/delegation-only.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel dispatch_file {
    file "/var/log/named/dispatch.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel dnssec_file {
    file "/var/log/named/dnssec.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel dnstap_file {
    file "/var/log/named/dnstap.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel edns-disabled_file {
    file "/var/log/named/edns-disabled.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel general_file {
    file "/var/log/named/general.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel lame-servers_file {
    file "/var/log/named/lame-servers.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel network_file {
    file "/var/log/named/network.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel notify_file {
    file "/var/log/named/notify.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel queries_file {
    file "/var/log/named/queries.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel query-errors_file {
    file "/var/log/named/query-errors.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel rate-limit_file {
    file "/var/log/named/rate-limit.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel resolver_file {
    file "/var/log/named/resolver.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel rpz_file {
    file "/var/log/named/rpz.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel security_file {
    file "/var/log/named/security.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel spill_file {
    file "/var/log/named/spill.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel trust-anchor-telemetry_file {
    file "/var/log/named/trust-anchor-telemetry.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel unmatched_file {
    file "/var/log/named/unmatched.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel update_file {
    file "/var/log/named/update.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel update-security_file {
    file "/var/log/named/update-security.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel xfer-in_file {
    file "/var/log/named/xfer-in.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel xfer-out_file {
    file "/var/log/named/xfer-out.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };

  category client { client_file; default_debug; };
  category cname { cname_file; default_debug; };
  category config { config_file; default_debug; };
  category database { database_file; default_debug; };
  category default { default_file; default_debug; };
  category delegation-only { delegation-only_file; default_debug; };
  category dispatch { dispatch_file; default_debug; };
  category dnssec { dnssec_file; default_debug; };
  category dnstap { dnstap_file; default_debug; };
  category edns-disabled { edns-disabled_file; default_debug; };
  category general { general_file; default_debug; };
  category lame-servers { lame-servers_file; default_debug; };
  category network { network_file; default_debug; };
  category notify { notify_file; default_debug; };
  category queries { queries_file; default_debug; };
  category query-errors { query-errors_file; default_debug; };
  category rate-limit { rate-limit_file; default_debug; };
  category resolver { resolver_file; default_debug; };
  category rpz { rpz_file; default_debug; };
  category security { security_file; default_debug; };
  category spill { spill_file; default_debug; };
  category trust-anchor-telemetry { trust-anchor-telemetry_file; default_debug; };
  category unmatched { unmatched_file; default_debug; };
  category update { update_file; default_debug; };
  category update-security { update-security_file; default_debug; };
  category xfer-in { xfer-in_file; default_debug; };
  category xfer-out { xfer-out_file; default_debug; };
};

Wow, that's a lot!
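
Before going through the important parts, a quick sanity check. Assuming the stock Fedora packaging (service name named), the configuration can be validated and the service started like this:

# Check the configuration syntax, create the log directory, then start Bind9.
named-checkconf /etc/named.conf
mkdir -p /var/log/named && chown named:named /var/log/named
systemctl enable --now named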

The important lines to configure are:

Access-control lists

ACLs allow you to choose the behavior of the service depending on the client IP address.

Let's take a look at my configuration:

acl "managment" {
    X.X.X.X/32;
};
acl "public" {
    0.0.0.0/0;
    ::/0;
};

Here I'm creating two ACL groups: public and managment. In each of these acl blocks I can put as many CIDRs as I want. public contains the global IPv4 and IPv6 CIDRs. managment contains a single IP CIDR (the one used to configure the service).

Then you can reuse those ACL groups in other parts of the configuration, like in allow-query, allow-query-cache, and allow-recursion.

Query behavior

I chose to implement only three query behaviors to keep it simple:

  • allow-query defines who is allowed to query my DNS service. If the client IP is not in the list, the server will not respond.
  • allow-query-cache defines which clients are allowed to get answers from the server's cache. This allows a query to be resolved more quickly the next time.
  • allow-recursion defines who can make recursive queries. Query recursion is a DNS mechanism that finds the IP associated with a DNS entry by making all the necessary requests one by one, starting from the root servers. This makes the server independent when resolving queries (you don't need to forward requests to another public DNS server like 8.8.8.8 or 1.1.1.1); see the example below.
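
For instance, here is a hedged way to test recursion against one of the servers (using the dns01.tiwabbit.fr entry created earlier):

# Recursive query: answered only if your IP matches allow-recursion.
dig @dns01.tiwabbit.fr example.com A

# Non-recursive query: only answered from cache or authoritative data.
dig @dns01.tiwabbit.fr example.com A +norecurse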

Prefetching

The name says it all. This configuration allows the server to make DNS requests on its own, in anticipation of future requests. This makes responses faster most of the time when a DNS entry is requested very often.

Each DNS entry has a Time To Live (or TTL). You can get it using dig:

dig lunik.tiwabbit.fr

; <<>> DiG 9.11.28-RedHat-9.11.28-1.fc32 <<>> lunik.tiwabbit.fr
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41640
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;lunik.tiwabbit.fr.   IN  A

;; ANSWER SECTION:
lunik.tiwabbit.fr.  66  IN  A 52.222.158.86
lunik.tiwabbit.fr.  66  IN  A 52.222.158.38
lunik.tiwabbit.fr.  66  IN  A 52.222.158.55
lunik.tiwabbit.fr.  66  IN  A 52.222.158.103

;; Query time: 0 msec
;; SERVER: 10.194.3.3#53(10.194.3.3)
;; WHEN: Fri Nov 05 10:13:42 UTC 2021
;; MSG SIZE  rcvd: 110

In the ANSWER SECTION:

lunik.tiwabbit.fr.  66  IN  A 52.222.158.86

66 is the TTL of this DNS entry. This means that in 66 seconds it will no longer be valid and the client will need to make another request to get the new IP (most of the time, it doesn't change).
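
You can watch the TTL count down by querying a caching resolver repeatedly, for example:

# Repeat the query every second: the TTL decreases until the cached entry expires.
watch -n 1 'dig +noall +answer lunik.tiwabbit.fr'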

In my configuration I have:

prefetch 4 10;

If the server has cached a DNS entry with 4 seconds or less of TTL remaining, it will make a DNS query on its own to refresh the cached entry. 10 is an optional parameter that defines the eligibility of the record for prefetching (only records whose original TTL is at least 10 seconds are eligible).
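
One hedged way to observe the cache (and prefetching keeping popular entries warm) is to dump it with rndc, using the dump-file path configured above:

# Dump the cache to the configured dump-file, then inspect the remaining TTLs.
rndc dumpdb -cache
grep lunik.tiwabbit.fr /etc/named/data/cache_dump.db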

Rate limiting

This is maybe one of the most important parameters of a public DNS server. The goal of this configuration is to limit the number of responses the server sends when a client requests the same DNS entry too many times. It is very useful for mitigating DNS amplification attacks.

Here is the configuration I have:

rate-limit {
  ipv4-prefix-length 28;
  ipv6-prefix-length 56;
  responses-per-second 20;
  window 5;
  slip 3;
};

If a client (or clients in the same subnet) triggers more than 20 identical responses per second, measured over a 5-second window, the server starts dropping the excess responses; with slip 3, every third dropped response is replaced by a truncated reply so legitimate clients can retry over TCP. To prevent attacks from large subnets, the rate limit is applied per /28 IPv4 subnet and per /56 IPv6 subnet.
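
A hedged way to see rate limiting in action is to hammer the server with the same query and watch for timeouts and truncated (slipped) answers:

# Send 40 identical queries in a burst; past the limit, some will time out or
# come back truncated, and matching events show up in rate-limit.log.
for i in $(seq 1 40); do
  dig @dns01.tiwabbit.fr example.com A +short +time=1 +tries=1
done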

Logging

I made the choice to be very verbose with the logs; that's why I have configured all possible channels.

The most interesting ones are queries, which logs every query resolved by the server:

# /var/log/named/queries.log
05-Nov-2021 10:21:43.536 queries: info: client @0x7f76073dfe50 X.X.X.X#14821 (mms-iad.sp-prod.net): query: mms-iad.sp-prod.net IN A +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.545 queries: info: client @0x7f760793d450 X.X.X.X#47583 (psja.isd.us): query: psja.isd.us IN MX +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.547 queries: info: client @0x7f760759b400 X.X.X.X#50020 (global.reputation.invincea.com): query: global.reputation.invincea.com IN A +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.556 queries: info: client @0x7f76075dddc0 X.X.X.X#38383 (forconzoomnyc233mmr.zoom.us): query: forconzoomnyc233mmr.zoom.us IN A + (10.70.2.235)
05-Nov-2021 10:21:43.566 queries: info: client @0x7f76077ba820 X.X.X.X#39161 (ps-membership.us-ctkip-ps3.dell.com): query: ps-membership.us-ctkip-ps3.dell.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.618 queries: info: client @0x7f7607a9f2c0 X.X.X.X#39161 (ps-membership.usgit.dell.com): query: ps-membership.usgit.dell.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.622 queries: info: client @0x7f76077fcbc0 X.X.X.X#56424 (icn.intl-global-adns.alibabacloud.com): query: icn.intl-global-adns.alibabacloud.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.623 queries: info: client @0x7f760765f400 X.X.X.X#38383 (forcon-zoomca193-123-14-158mmr.zoom.us): query: forcon-zoomca193-123-14-158mmr.zoom.us IN A + (10.70.2.235)
05-Nov-2021 10:21:43.635 queries: info: client @0x7f75f0ce5910 X.X.X.X#56548 (sfc-idzwww.riotgames.roblox.com.ru): query: sfc-idzwww.riotgames.roblox.com.ru IN A + (10.70.2.235)
and rate-limit, which logs every event regarding rate-limiting behavior:
# /var/log/named/rate-limit.log
05-Nov-2021 10:21:39.631 rate-limit: info: client @0x7f76077cdd80 X.X.X.X#38383 (zoomff134-224-74-182mmrforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:39.694 rate-limit: info: client @0x7f75f0cf3f10 X.X.X.X#38383 (bisdtx-orgforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:39.752 rate-limit: info: client @0x7f76077fcbc0 X.X.X.X#38383 (kirklandwa-forcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:39.863 rate-limit: info: client @0x7f76077eab00 X.X.X.X#38383 (friendsnrcforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:39.942 rate-limit: info: client @0x7f76078e8610 X.X.X.X#38383 (deltatrust-org-uk-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.857 rate-limit: info: client @0x7f7607a13120 X.X.X.X#38383 (12mmrforcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.871 rate-limit: info: client @0x7f76076509a0 X.X.X.X#38383 (datamarkgisforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.891 rate-limit: info: client @0x7f7607866d60 X.X.X.X#38383 (wvsd208-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.907 rate-limit: info: client @0x7f7607146a70 X.X.X.X#38383 (zoomva198-251-217-184mmrforcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.968 rate-limit: info: client @0x7f760713b020 X.X.X.X#38383 (zoomdvs185mmr-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for buffer.com  (bc75c42e)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting error responses to X.X.X.X/28
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for restintergamma.nl  (97dd2767)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for alibabacloud.com  (d22fa033)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for dell.com  (98e605b5)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting error responses to X.X.X.X/28
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for zoom.us  (4ef96ce9)

Monitoring

I chose to use Datadog to monitor the DNS servers.

Datadog agent installation

The installation is pretty simple with Ansible since Datadog provides an Ansible Galaxy role. I used it with the following configuration:

---

datadog_api_key: "{{ vault_datadog_api_key }}"
datadog_site: "datadoghq.eu"

datadog_agent_flavor: "datadog-agent"
datadog_agent_major_version: 7

datadog_enabled: yes

datadog_bind9_integration_version: 1.0.0

datadog_config:
  logs_enabled: true

datadog_checks:
  bind9:
    init_config:
    instances:
      - name: bind9
        url: "http://127.0.0.1:{{ bind_statistics_port }}"
    logs:
      - type: file
        path: /var/log/named/queries.log
        service: named
        source: bind9
        sourcecategory: queries
      - type: file
        path: /var/log/named/rate-limit.log
        service: named
        source: bind9
        sourcecategory: rate-limit

I'm activating the Bind9 and logs integrations.
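
Once the playbook has run, a quick way to confirm the check is wired up (the output layout may vary between agent versions):

# Verify that the agent is running and the bind9 check is collecting.
sudo datadog-agent status | grep -A 5 bind9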

Once deployed, I can see my agents through the Datadog web console in the Infrastructure panel:

datadog-host-map

Datadog metrics and logs

Metrics

Since I have activated the Bind9 integration in the agent configuration, I need to do the same in the web console. In the Integrations panel, search for Bind9 and follow the configuration guide.

datadog-bind-integration

Now I can see metrics showing up.

Logs

The logs are already showing up in the log explorer in the Logs panel:

datadog-log-explorer

But Datadog doesn't yet know how to interpret those logs; it only sees a large string of characters. If I want to analyze those logs, Datadog needs to parse them for me.

In Datadog's built-in log pipelines in the Logs panel, I can define a list of patterns and actions to parse the log files produced by the Bind9 service. For some more popular software, Datadog has already done the work for you, but it seems Bind9 is an exception.

datadog-pipeline-library

Defining log parsing pipelines

First I created a new pipeline using predefined filters: source:bind9 and sourcecategory:queries. Those two filters are derived from the Datadog agent configuration from earlier:

datadog_checks:
  bind9:
[...]
    logs:
[...]
      - type: file
        path: /var/log/named/queries.log
        service: named
        source: bind9
        sourcecategory: queries
[...]

Now that I'm only going to parse logs from the right source, I need a Grok parser to parse the log string. A Grok parser matches blocks in the string and puts them into variables. When a log line is parsed, it returns a beautifully formatted JSON object.

Blocks are defined by "simplified" regexes: number, word, date, ip, ...

Datadog makes this really simple with a "live" parsing view where you can see in real time which part of the log is matched by which block.

datadog-grok-parser

So here is the final Grok parsing rule for the queries logs:

default %{number}-%{word}-%{number}\s+%{date("HH:mm:ss.SSS"):timestamp}\s+queries:\s+%{word:status}:\s+client\s+@%{word:client.data}\s+%{ip:client.ip}\#%{number:client.port}\s+\(%{hostname:query.hostname}\):\s+query:\s+%{hostname:query.fqdn}\s+%{word:query.location}\s+%{word:query.qcode}\s+.*\s+\(%{ip:server.ip}\).*

With this example:

27-Oct-2021 16:53:43.594 queries: info: client @0x7fad52ee1fd0 127.0.0.1#64318 (google.fr): query: google.fr IN A +E(0) (10.69.86.243)

I get:
{
  "status": "info",
  "query": {
    "qcode": "A",
    "location": "IN",
    "hostname": "google.fr",
    "fqdn": "google.fr"
  },
  "client": {
    "ip": "127.0.0.1",
    "data": "0x7fad52ee1fd0",
    "port": 64318
  },
  "timestamp": 1636131223594,
  "server": {
    "ip": "10.69.86.243"
  }
}

Pretty neat!

Datadog allows enriching that new JSON object with extra metadata. In my case, since I have the client IP address, I can use the GeoIP parser to find metadata about that IP. Now I can determine the ISP, the country, and even the city of that client.

Here is an example:

05-Nov-2021 14:07:16.442 queries: info: client @0x7f6b462b2900 X.X.X.X#38383 (zoom.us): query: zoom.us IN A + (10.70.2.235)
{
  "client": {  
    "geoip": { 
      "as": {  
        "domain": "online.net",
        "name": "ONLINE S.A.S.",
        "number": "AS12876",
        "route": "51.15.0.0/16",
        "type": "isp"
      },
      "city": {
        "name": "Paris"
      },
      "continent": {
        "code": "EU",
        "name": "Europe"
      },
      "country": {
        "iso_code": "FR",
        "name": "France"
      },
      "ipAddress" : "X.X.X.X",
      "location": {
        "latitude": "48.85341",
        "longitude" "2.3488"
      },
      "subdivision": {
        "name": "Île-de-France"
      },
      "timezone": "Europe/Paris"
    }
  }
}

Dashboard

I can now begin the fun part of using a monitoring service: making dashboards and graphs!

Here are some that I have created:

datadog-dashboard-overview datadog-dashboard-queries-by-code datadog-dashboard-client-by-location

Conclusions

Now I have a fully operational public DNS service. I can configure enhanced behavior if I want, like blocking some domain names (malware, illegal stuff, ...); a minimal sketch follows below.
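
For example, domain blocking can be done with a Response Policy Zone (RPZ); here is a minimal sketch, where the zone name and file path are assumptions:

# Inside the options block of named.conf: enable a local response policy zone.
response-policy { zone "rpz.local"; };

# At the top level of named.conf: declare the zone itself.
zone "rpz.local" {
  type master;
  file "/etc/named/rpz.local.db";
};

In the zone file, a record like bad-domain.example IN CNAME . rewrites the answer to NXDOMAIN for that name.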

Note: If you want to make your public DNS available to anyone, you can publish it on public-dns.info.