---
title: Create and Expose a public DNS service
date: 2021-12-24
slug: create-public-dns-service
authors:
- lunik
description: I'm explaining how I created and exposed a public DNS service
tags:
- dns
- server
- public
- bind9
- bind
- named
- cloud
- scaleway
- datadog
---

<!--
# CHANGELOG

-->

![cover](/blog/img/posts/2021-12-24-create-public-dns-service/cover.png)

## What is DNS ?

The [Domain Name System (DNS)][dns-rfc] is a system that provides human-readable names for computers, services and other resources connected to the internet. Basic records translate a [Domain Name][domain-name-wikipedia] (that humans can understand) into an [IP address][ip-address-wikipedia] (that computers use for routing).

<!-- truncate -->

[DNS][dns-rfc] is used all the time. Right now, to access this blog, your computer made a [DNS][dns-rfc] request to translate `lunik.tiwabbit.fr` into an [IP address][ip-address-wikipedia], then requested the web page from the [IP address][ip-address-wikipedia] returned by that [DNS][dns-rfc] query.
You can observe this behavior by opening the [developer console][developper-console-wikipedia] in your web browser:

![blog-dns-debug-through-developper-console](/blog/img/posts/2021-12-24-create-public-dns-service/dns_safari_debug.jpg)

:::warning
All the IP addresses and [DNS][dns-rfc] names used in this blog post are no longer used by my project. Please don't do anything stupid with them, just ignore them.
:::

### Make DNS query

You can make a basic [DNS][dns-rfc] query using `nslookup` in your terminal:
```shell
nslookup lunik.tiwabbit.fr
```

You will get something like:
```shell
Name: lunik.tiwabbit.fr
Address: 13.224.247.30
Name: lunik.tiwabbit.fr
Address: 13.224.247.31
Name: lunik.tiwabbit.fr
Address: 13.224.247.51
Name: lunik.tiwabbit.fr
Address: 13.224.247.100
```
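
The same lookup can be done from code. Here is a minimal Python sketch that asks the system resolver for the addresses behind a name; the helper name `resolve` is made up for this illustration and is not part of the original setup.

```python
import socket
from typing import List


def resolve(name: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses the system resolver gives for `name`."""
    # getaddrinfo asks the resolver configured on the machine, much like a browser would.
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("lunik.tiwabbit.fr")` should return a list of addresses comparable to the `nslookup` output, although the exact values depend on when and where the query is made.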

## Where to deploy the service ?

I have decided to host my [DNS][dns-rfc] service on a public cloud provider named [Scaleway][scaleway-website].

Here is a global view of the architecture:

![scalway-architecture](/blog/img/posts/2021-12-24-create-public-dns-service/scaleway-arch.jpg)

I decided to expose two public [DNS][dns-rfc] servers, one in each of two regions: France (Paris 1) and the Netherlands (Amsterdam 1).

I chose those two regions because they are the only ones offering [Stardust Instances][scaleway-stardust-instance]. [DNS][dns-rfc] servers don't require a ton of resources, so this allows me to keep the project cost to a minimum.

The two servers are exposed through public `flexible` [IP addresses][ip-address-wikipedia] (one [v4][ipv4-wikipedia] and one [v6][ipv6-wikipedia] each), which can be resolved via two [DNS][dns-rfc] entries for convenience.

Security was my main concern, so I attached a security group to my instances, which lets me filter incoming and outgoing traffic.

## Deploying the infrastructure

Now that I know what architecture I want for the service, I need to deploy it on the cloud provider.
I decided to use [Terraform][scaleway-terraform] to create all the resources and bind them together.

Following the [Terraform documentation][scaleway-terraform-provider] to create my resources was pretty straightforward. After about an hour I had the necessary code. I just needed to apply it.

Here is what the `plan` looked like (truncated):

```hcl
Terraform will perform the following actions:

  # scaleway_instance_ip.france["tiwabbit-dns-01"] will be created
  + resource "scaleway_instance_ip" "france" {
      + address         = (known after apply)
      + id              = (known after apply)
      + reverse         = "dns01.tiwabbit.fr"
      + server_id       = (known after apply)
      + zone            = "fr-par-1"
    }

  # scaleway_instance_security_group.france will be created
  + resource "scaleway_instance_security_group" "france" {
      + description             = "dns"
      + enable_default_security = true
      + external_rules          = false
      + id                      = (known after apply)
      + inbound_default_policy  = "drop"
      + name                    = "public-dns"
      + outbound_default_policy = "accept"
      + stateful                = true
      + zone                    = "fr-par-1"

      + inbound_rule {
          + action   = "accept"
          + ip       = "X.X.X.X"
          + port     = 22
          + protocol = "TCP"
        }
      + inbound_rule {
          + action   = "accept"
          + ip_range = "0.0.0.0/0"
          + port     = 53
          + protocol = "UDP"
        }
      + inbound_rule {
          + action   = "accept"
          + ip_range = "::/0"
          + port     = 53
          + protocol = "UDP"
        }
    }

  # scaleway_instance_server.france["tiwabbit-dns-01"] will be created
  + resource "scaleway_instance_server" "france" {
      + enable_dynamic_ip                = false
      + enable_ipv6                      = true
      + id                               = (known after apply)
      + image                            = "fr-par-1/5c8bbf4b-10f0-4cac-863b-4561781043ff"
      + ip_id                            = (known after apply)
      + ipv6_address                     = (known after apply)
      + ipv6_gateway                     = (known after apply)
      + ipv6_prefix_length               = (known after apply)
      + name                             = "tiwabbit-dns-01"
      + private_ip                       = (known after apply)
      + public_ip                        = (known after apply)
      + security_group_id                = "fr-par-1/dns"
      + state                            = "started"
      + type                             = "STARDUST1-S"
      + zone                             = "fr-par-1"

      + root_volume {
          + size_in_gb            = "10"
          + volume_id             = (known after apply)
        }
    }

[...]

Plan: 8 to add, 0 to change, 0 to destroy.
```

After completing the `apply`, this is what I had in the Scaleway Console:

**Instances :**
![scaleway-console-instances](/blog/img/posts/2021-12-24-create-public-dns-service/scaleway-console-instances.png)

**Volumes :**
![scaleway-console-volumes](/blog/img/posts/2021-12-24-create-public-dns-service/scaleway-console-volumes.png)

**[IP addresses][ip-address-wikipedia] :**
![scaleway-console-flexible-ips](/blog/img/posts/2021-12-24-create-public-dns-service/scaleway-console-flexible-ips.png)

**Security groups :**
![scaleway-console-security-group](/blog/img/posts/2021-12-24-create-public-dns-service/scaleway-console-security-group.png)
Here, you can see that I'm only allowing inbound traffic on port `53`, which is the one used by [DNS][dns-rfc] servers. (The first rule, on port `22`, allows me to manage the server from a private location using SSH.)
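
To make the effect of these rules concrete, here is a small Python sketch that evaluates a packet against the three inbound rules from the plan, falling back to the `drop` default policy. It is only a model for illustration, not how Scaleway implements security groups; `203.0.113.10/32` is a placeholder for the masked `X.X.X.X` address, and the `Rule` and `evaluate` names are invented.

```python
import ipaddress
from typing import NamedTuple


class Rule(NamedTuple):
    action: str    # "accept" or "drop"
    ip_range: str  # CIDR the source address must belong to
    port: int
    protocol: str  # "TCP" or "UDP"


# Rules copied from the plan; 203.0.113.10/32 is a placeholder for the masked X.X.X.X.
INBOUND_RULES = [
    Rule("accept", "203.0.113.10/32", 22, "TCP"),
    Rule("accept", "0.0.0.0/0", 53, "UDP"),
    Rule("accept", "::/0", 53, "UDP"),
]
INBOUND_DEFAULT_POLICY = "drop"


def evaluate(source_ip: str, port: int, protocol: str) -> str:
    """Return the action of the first matching rule, else the default policy."""
    source = ipaddress.ip_address(source_ip)
    for rule in INBOUND_RULES:
        network = ipaddress.ip_network(rule.ip_range)
        same_family = source.version == network.version
        if (same_family and source in network
                and port == rule.port and protocol == rule.protocol):
            return rule.action
    return INBOUND_DEFAULT_POLICY
```

With this model, `evaluate("198.51.100.7", 53, "UDP")` is accepted while `evaluate("198.51.100.7", 22, "TCP")` falls through to the default policy and is dropped.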

## Installing and configuring the service

Now that I have two brand new servers at my disposal, I need to install and configure the [DNS][dns-rfc] software to run on them.
They are running [Fedora 32][fedora-website] with a `5.6` [Linux Kernel][linux-kernel-github]:
```shell
[root@tiwabbit-dns-01 ~]# screenfetch
           /:-------------:\          root@tiwabbit-dns-01
        :-------------------::        OS: Fedora 
      :-----------/shhOHbmp---:\      Kernel: x86_64 Linux 5.6.6-300.fc32.x86_64
    /-----------omMMMNNNMMD  ---:     Uptime: 18m
   :-----------sMMMMNMNMP.    ---:    Packages: 405
  :-----------:MMMdP-------    ---\   Shell: bash 5.0.11
 ,------------:MMMd--------    ---:   Disk: 1.0G / 9.6G (11%)
 :------------:MMMd-------    .---:   CPU: AMD EPYC 7281 16-Core @ 2.096GHz
 :----    oNMMMMMMMMMNho     .----:   RAM: 293MiB / 969MiB
 :--     .+shhhMMMmhhy++   .------/  
 :-    -------:MMMd--------------:   
 :-   --------/MMMd-------------;    
 :-    ------/hMMMy------------:     
 :-- :dMNdhhdNMMNo------------;      
 :---:sdNMMMMNds:------------:       
 :------:://:-------------::         
 :---------------------://  
```

Fun fact: [Scaleway Stardust instances][scaleway-stardust-instance] run on [AMD EPYC][amd-epyc] [SoC][soc-wikipedia]s!

I decided to use the [Bind9][bind9-website] [DNS][dns-rfc] server because it is already packaged in many distributions, there is a lot of documentation, and the community is strong.

I'm using [Ansible][ansible-website] to deploy the whole stack: base Linux config, [Bind9][bind9-website], [Firewalld][firewalld-website] and [Fail2Ban][fail2ban-website].

:::tip
I have already talked about [Firewalld][firewalld-website] and [Fail2Ban][fail2ban-website] in another blog post: Securing web entrypoint from external threats
:::

But for the purpose of this blog post, I will detail the equivalent `bash` commands.

### Bind9 installation and configuration

The installation of [Bind9][bind9-website] is pretty straightforward. Since it's already packaged, a single command is enough to install it:

```shell
[root@tiwabbit-dns-01 ~]# dnf install bind bind-utils

Last metadata expiration check: 0:02:41 ago on Fri 05 Nov 2021 09:49:15 AM UTC.
Dependencies resolved.
============================================================================================================================================================================================================
 Package                                                       Architecture                            Version                                               Repository                                Size
============================================================================================================================================================================================================
Installing:
 bind                                                          x86_64                                  32:9.11.28-1.fc32                                     updates                                  2.0 M
 bind-utils                                                    x86_64                                  32:9.11.28-1.fc32                                     updates                                  233 k
Installing dependencies:
 bind-dnssec-doc                                               noarch                                  32:9.11.28-1.fc32                                     updates                                   46 k
 bind-libs                                                     x86_64                                  32:9.11.28-1.fc32                                     updates                                   90 k
 bind-libs-lite                                                x86_64                                  32:9.11.28-1.fc32                                     updates                                  1.1 M
 bind-license                                                  noarch                                  32:9.11.28-1.fc32                                     updates                                   16 k
 fstrm                                                         x86_64                                  0.5.0-2.fc32                                          fedora                                    28 k
 mariadb-connector-c                                           x86_64                                  3.1.12-1.fc32                                         updates                                  203 k
 mariadb-connector-c-config                                    noarch                                  3.1.12-1.fc32                                         updates                                   11 k
 policycoreutils-python-utils                                  noarch                                  3.0-2.fc32                                            fedora                                    83 k
 protobuf-c                                                    x86_64                                  1.3.2-2.fc32                                          fedora                                    35 k
 python3-bind                                                  noarch                                  32:9.11.28-1.fc32                                     updates                                   64 k
Installing weak dependencies:
 bind-dnssec-utils                                             x86_64                                  32:9.11.28-1.fc32                                     updates                                  128 k

Transaction Summary
============================================================================================================================================================================================================
Install  13 Packages

Total download size: 4.0 M
Installed size: 10 M
Is this ok [y/N]:
```

Once it's installed, and following the [Bind9 documentation][bind9-documentation], I put the following configuration in `/etc/named.conf`:

```bind9

acl "managment" {
    X.X.X.X/32;
};
acl "public" {
    0.0.0.0/0;
    ::/0;
};

options {

  dump-file          "/etc/named/data/cache_dump.db";
  statistics-file    "/etc/named/data/named_stats.txt";
  memstatistics-file "/etc/named/data/named_mem_stats.txt";
  secroots-file      "/etc/named/data/named.secroots";
  recursing-file     "/etc/named/data/named.recursing";

  listen-on port 53 { any; };
  listen-on-v6 port 53 { any; };

  allow-transfer { none; };

  max-cache-size 70%;
  allow-query-cache {
    127.0.0.1;
    localhost;
    managment;
    public;
  };

  allow-query {
    127.0.0.1;
    localhost;
    managment;
    public;
  };

  recursion yes;
  allow-recursion {
    127.0.0.1;
    localhost;
    managment;
    public;
  };

  dnssec-enable yes;
  dnssec-validation yes;

  prefetch 4 10;

  rate-limit {
    ipv4-prefix-length 28;
    ipv6-prefix-length 56;
    responses-per-second 20;
    window 5;
    slip 3;
  };

  managed-keys-directory "/var/named/dynamic";
  geoip-directory        "/usr/share/GeoIP";

  pid-file        "/run/named/named.pid";
  session-keyfile "/run/named/session.key";

  hostname "dns01.tiwabbit.fr";
  server-id "dns01.tiwabbit.fr";

  /* https://fedoraproject.org/wiki/Changes/CryptoPolicy */
  include "/etc/crypto-policies/back-ends/bind.config";
};

statistics-channels {
  inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
};


logging {
  channel client_file {
    file "/var/log/named/client.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel cname_file {
    file "/var/log/named/cname.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel config_file {
    file "/var/log/named/config.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel database_file {
    file "/var/log/named/database.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel default_file {
    file "/var/log/named/default.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel delegation-only_file {
    file "/var/log/named/delegation-only.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel dispatch_file {
    file "/var/log/named/dispatch.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel dnssec_file {
    file "/var/log/named/dnssec.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel dnstap_file {
    file "/var/log/named/dnstap.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel edns-disabled_file {
    file "/var/log/named/edns-disabled.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel general_file {
    file "/var/log/named/general.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel lame-servers_file {
    file "/var/log/named/lame-servers.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel network_file {
    file "/var/log/named/network.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel notify_file {
    file "/var/log/named/notify.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel queries_file {
    file "/var/log/named/queries.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel query-errors_file {
    file "/var/log/named/query-errors.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel rate-limit_file {
    file "/var/log/named/rate-limit.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel resolver_file {
    file "/var/log/named/resolver.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel rpz_file {
    file "/var/log/named/rpz.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel security_file {
    file "/var/log/named/security.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel spill_file {
    file "/var/log/named/spill.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel trust-anchor-telemetry_file {
    file "/var/log/named/trust-anchor-telemetry.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel unmatched_file {
    file "/var/log/named/unmatched.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel update_file {
    file "/var/log/named/update.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel update-security_file {
    file "/var/log/named/update-security.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel xfer-in_file {
    file "/var/log/named/xfer-in.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };
  channel xfer-out_file {
    file "/var/log/named/xfer-out.log" versions 3 size 5m;
    severity dynamic;
    print-time yes;
    print-category yes;
    print-severity yes;
  };

  category client { client_file; default_debug; };
  category cname { cname_file; default_debug; };
  category config { config_file; default_debug; };
  category database { database_file; default_debug; };
  category default { default_file; default_debug; };
  category delegation-only { delegation-only_file; default_debug; };
  category dispatch { dispatch_file; default_debug; };
  category dnssec { dnssec_file; default_debug; };
  category dnstap { dnstap_file; default_debug; };
  category edns-disabled { edns-disabled_file; default_debug; };
  category general { general_file; default_debug; };
  category lame-servers { lame-servers_file; default_debug; };
  category network { network_file; default_debug; };
  category notify { notify_file; default_debug; };
  category queries { queries_file; default_debug; };
  category query-errors { query-errors_file; default_debug; };
  category rate-limit { rate-limit_file; default_debug; };
  category resolver { resolver_file; default_debug; };
  category rpz { rpz_file; default_debug; };
  category security { security_file; default_debug; };
  category spill { spill_file; default_debug; };
  category trust-anchor-telemetry { trust-anchor-telemetry_file; default_debug; };
  category unmatched { unmatched_file; default_debug; };
  category update { update_file; default_debug; };
  category update-security { update-security_file; default_debug; };
  category xfer-in { xfer-in_file; default_debug; };
  category xfer-out { xfer-out_file; default_debug; };
};
```

Wow, that's a lot!

The important parts of the configuration are:

#### Access-control lists

[ACL][acl-website]s allow you to choose the behavior of the service depending on the client [IP address][ip-address-wikipedia].

Let's take a look at my configuration:
```bind9
acl "managment" {
    X.X.X.X/32;
};
acl "public" {
    0.0.0.0/0;
    ::/0;
};
```

Here I'm creating two [ACL][acl-website] groups: `public` and `managment`. In each of these `acl` blocks I can put as many [CIDR][cidr-wikipedia]s as I want.
`public` contains the global [IPv4][ipv4-wikipedia] and [IPv6][ipv6-wikipedia] [CIDR][cidr-wikipedia]s. `managment` contains a single [IP][ip-address-wikipedia] [CIDR][cidr-wikipedia] (the one used to configure the service).

You can then reuse those [ACL][acl-website] groups in other parts of the configuration, such as `allow-query`, `allow-query-cache` and `allow-recursion`.
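
The matching logic behind an [ACL][acl-website] can be sketched in a few lines of Python. This is an illustration of the idea, not Bind9's implementation; `203.0.113.10/32` stands in for the masked `X.X.X.X/32`, and the function names are invented.

```python
import ipaddress
from typing import List

# Named ACLs, mirroring the two `acl` blocks; 203.0.113.10/32 stands in for X.X.X.X/32.
ACLS = {
    "managment": ["203.0.113.10/32"],
    "public": ["0.0.0.0/0", "::/0"],
}


def acl_matches(acl_name: str, client_ip: str) -> bool:
    """True if client_ip falls inside one of the CIDRs of the named ACL."""
    client = ipaddress.ip_address(client_ip)
    for cidr in ACLS[acl_name]:
        network = ipaddress.ip_network(cidr)
        if client.version == network.version and client in network:
            return True
    return False


def is_allowed(acl_names: List[str], client_ip: str) -> bool:
    """Model a statement such as `allow-query { managment; public; };`."""
    return any(acl_matches(name, client_ip) for name in acl_names)
```

For example, `is_allowed(["managment"], "198.51.100.7")` is `False`, while adding `"public"` to the list makes any client match.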

#### Query behavior

I chose to implement only three query behaviors to keep it simple:

- `allow-query` defines who is allowed to query my [DNS][dns-rfc] service. If the client [IP][ip-address-wikipedia] is not in the list, the server refuses the query.
- `allow-query-cache` defines which clients are allowed to get answers from the server's cache. An answer served from the cache comes back quicker than one that has to be resolved again.
- `allow-recursion` defines who can make [recursive queries][dns-recursion-article]. [Query recursion][dns-recursion-article] is a [DNS][dns-rfc] mechanism in which the server finds the [IP][ip-address-wikipedia] associated with a [DNS][dns-rfc] entry by making all the necessary requests itself, one by one, starting from the root servers. This makes the service independent when resolving queries (there is no need to forward the request to another public [DNS][dns-rfc] server like `8.8.8.8` or `1.1.1.1`).
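
The recursion described in the last item can be pictured with a toy model. The Python sketch below walks a hand-written delegation table from the root down to the server that holds the answer; every server name, zone and address in it is invented for the illustration, and a real resolver exchanges DNS messages instead of reading a dictionary.

```python
from typing import List, Tuple

# Toy delegation data: each "server" either refers us to a more specific zone or answers.
# All names and addresses below are invented for the illustration.
SERVERS = {
    "root": {"referrals": {"fr.": "fr-servers"}},
    "fr-servers": {"referrals": {"example.fr.": "example-servers"}},
    "example-servers": {"answers": {"www.example.fr.": "192.0.2.10"}},
}


def resolve_recursively(name: str) -> Tuple[str, List[str]]:
    """Follow referrals from the root until a server gives the final answer.

    Returns the address and the list of servers that were queried.
    """
    server, path = "root", []
    while True:
        path.append(server)
        zone_data = SERVERS[server]
        if name in zone_data.get("answers", {}):
            return zone_data["answers"][name], path
        # Pick the referral whose zone is a suffix of the requested name.
        for zone, next_server in zone_data.get("referrals", {}).items():
            if name == zone or name.endswith("." + zone):
                server = next_server
                break
        else:
            raise LookupError(f"no server knows about {name}")
```

Resolving `www.example.fr.` in this model queries `root`, then `fr-servers`, then `example-servers`, which is the chain of requests a recursive server performs on behalf of its client.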

#### Prefetching

The name says it all. This setting lets the server refresh a cached [DNS][dns-rfc] entry on its own, shortly before it expires, in anticipation of further requests. A [DNS][dns-rfc] entry that is requested very often therefore stays in the cache, and the server answers quicker most of the time.

Each [DNS][dns-rfc] entry has a [Time To Live (or TTL)][ttl-wikipedia]. You can get it using `dig`:

```shell
dig lunik.tiwabbit.fr

; <<>> DiG 9.11.28-RedHat-9.11.28-1.fc32 <<>> lunik.tiwabbit.fr
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41640
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;lunik.tiwabbit.fr.   IN  A

;; ANSWER SECTION:
lunik.tiwabbit.fr.  66  IN  A 52.222.158.86
lunik.tiwabbit.fr.  66  IN  A 52.222.158.38
lunik.tiwabbit.fr.  66  IN  A 52.222.158.55
lunik.tiwabbit.fr.  66  IN  A 52.222.158.103

;; Query time: 0 msec
;; SERVER: 10.194.3.3#53(10.194.3.3)
;; WHEN: Fri Nov 05 10:13:42 UTC 2021
;; MSG SIZE  rcvd: 110
```

In the `ANSWER SECTION`:
```shell
lunik.tiwabbit.fr.  66  IN  A 52.222.158.86
```

`66` is the [TTL][ttl-wikipedia] of this [DNS][dns-rfc] entry. This means that in `66` seconds it will no longer be valid, and the client needs to make another request to get the new [IP][ip-address-wikipedia] (most of the time, it doesn't change).

In my configuration I have:

```bind9
prefetch 4 10;
```

If a query hits a cached [DNS][dns-rfc] entry with `4` seconds or less of [TTL][ttl-wikipedia] remaining, the server makes a [DNS][dns-rfc] query to refresh it. `10` is an optional parameter that defines eligibility: only records whose original [TTL][ttl-wikipedia] is at least `10` seconds are prefetched.
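
As a rough illustration of that decision, here is a Python sketch of the trigger and eligibility checks. It models the rule as described in the Bind9 documentation rather than reproducing Bind9's code, and the function names are invented.

```python
PREFETCH_TRIGGER = 4       # first argument of `prefetch 4 10;`
PREFETCH_ELIGIBILITY = 10  # second argument of `prefetch 4 10;`


def remaining_ttl(original_ttl: int, cached_at: float, now: float) -> float:
    """Seconds left before a cached record expires (never negative)."""
    return max(0.0, original_ttl - (now - cached_at))


def should_prefetch(original_ttl: int, cached_at: float, now: float) -> bool:
    """Decide whether a cache hit should also trigger a background refresh."""
    if original_ttl < PREFETCH_ELIGIBILITY:
        return False  # record is too short-lived to be eligible
    left = remaining_ttl(original_ttl, cached_at, now)
    return 0 < left <= PREFETCH_TRIGGER
```

Taking the record above with its [TTL][ttl-wikipedia] of `66`: a query arriving 60 seconds after it was cached leaves 6 seconds and does nothing special, while a query arriving after 63 seconds leaves 3 seconds and triggers a refresh.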

#### Rate limiting

This is perhaps one of the most important parameters of a public [DNS][dns-rfc] server. The goal of this configuration is to limit the number of responses the server sends when a client requests the same [DNS][dns-rfc] entry too many times. It is very useful to mitigate [DNS amplification attacks][dns-amplification-attack].

Here is the configuration I have:

```bind9
rate-limit {
  ipv4-prefix-length 28;
  ipv6-prefix-length 56;
  responses-per-second 20;
  window 5;
  slip 3;
};
```

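If the clients of a block receive more than `20` identical responses per second, the server starts rate limiting them. `window 5` sets the period, in seconds, over which those rates are accounted. While the limit is exceeded, responses are dropped, except that every `3`rd one (`slip 3`) is sent as a truncated reply instead, so that a legitimate client can retry over TCP.
Clients are not counted individually: they are grouped by `/28` [IPv4 subnet][ipv4-subnet-wikipedia] and `/56` [IPv6 subnet][ipv6-subnet-wikipedia], so the limit cannot be bypassed by spreading requests over neighbouring addresses.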
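
The Python sketch below models this behavior in a simplified way. It uses plain per-second counters, which is an assumption made for readability: Bind9 itself uses a credit-based accounting over the `window`, which is not modeled here. The class and function names are invented.

```python
import ipaddress
from collections import defaultdict

RESPONSES_PER_SECOND = 20
SLIP = 3
PREFIX_LENGTH = {4: 28, 6: 56}  # ipv4-prefix-length / ipv6-prefix-length


def client_block(client_ip: str) -> str:
    """Map a client address to the block that shares one rate-limit counter."""
    address = ipaddress.ip_address(client_ip)
    prefix = PREFIX_LENGTH[address.version]
    return str(ipaddress.ip_network(f"{client_ip}/{prefix}", strict=False))


class RateLimiter:
    """Simplified model: per-second counters instead of Bind9's credit accounting."""

    def __init__(self):
        self.sent = defaultdict(int)     # (block, response, second) -> responses seen
        self.limited = defaultdict(int)  # (block, response) -> rate-limited responses

    def decide(self, client_ip: str, response: str, now: float) -> str:
        """Return "send", "drop" or "slip" for one outgoing response."""
        block = client_block(client_ip)
        bucket = (block, response, int(now))
        self.sent[bucket] += 1
        if self.sent[bucket] <= RESPONSES_PER_SECOND:
            return "send"
        # Over the limit: every SLIP-th limited response goes out truncated.
        self.limited[(block, response)] += 1
        return "slip" if self.limited[(block, response)] % SLIP == 0 else "drop"
```

In this model, the 21st and 22nd identical responses to a block within the same second are dropped and the 23rd is slipped, which is the `drop`, `drop`, `slip` pattern that shows up in a rate-limit log.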

#### Logging

I chose to be very verbose with the logs, which is why I configured every possible channel.

The most interesting ones are:
- `queries`, which logs every query received by the server:
```shell
# /var/log/named/queries.log
05-Nov-2021 10:21:43.536 queries: info: client @0x7f76073dfe50 X.X.X.X#14821 (mms-iad.sp-prod.net): query: mms-iad.sp-prod.net IN A +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.545 queries: info: client @0x7f760793d450 X.X.X.X#47583 (psja.isd.us): query: psja.isd.us IN MX +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.547 queries: info: client @0x7f760759b400 X.X.X.X#50020 (global.reputation.invincea.com): query: global.reputation.invincea.com IN A +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.556 queries: info: client @0x7f76075dddc0 X.X.X.X#38383 (forconzoomnyc233mmr.zoom.us): query: forconzoomnyc233mmr.zoom.us IN A + (10.70.2.235)
05-Nov-2021 10:21:43.566 queries: info: client @0x7f76077ba820 X.X.X.X#39161 (ps-membership.us-ctkip-ps3.dell.com): query: ps-membership.us-ctkip-ps3.dell.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.618 queries: info: client @0x7f7607a9f2c0 X.X.X.X#39161 (ps-membership.usgit.dell.com): query: ps-membership.usgit.dell.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.622 queries: info: client @0x7f76077fcbc0 X.X.X.X#56424 (icn.intl-global-adns.alibabacloud.com): query: icn.intl-global-adns.alibabacloud.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.623 queries: info: client @0x7f760765f400 X.X.X.X#38383 (forcon-zoomca193-123-14-158mmr.zoom.us): query: forcon-zoomca193-123-14-158mmr.zoom.us IN A + (10.70.2.235)
05-Nov-2021 10:21:43.635 queries: info: client @0x7f75f0ce5910 X.X.X.X#56548 (sfc-idzwww.riotgames.roblox.com.ru): query: sfc-idzwww.riotgames.roblox.com.ru IN A + (10.70.2.235)
```
- `rate-limit`, which logs every event related to rate limiting:
```shell
# /var/log/named/rate-limit.log
05-Nov-2021 10:21:39.631 rate-limit: info: client @0x7f76077cdd80 X.X.X.X#38383 (zoomff134-224-74-182mmrforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:39.694 rate-limit: info: client @0x7f75f0cf3f10 X.X.X.X#38383 (bisdtx-orgforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:39.752 rate-limit: info: client @0x7f76077fcbc0 X.X.X.X#38383 (kirklandwa-forcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:39.863 rate-limit: info: client @0x7f76077eab00 X.X.X.X#38383 (friendsnrcforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:39.942 rate-limit: info: client @0x7f76078e8610 X.X.X.X#38383 (deltatrust-org-uk-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.857 rate-limit: info: client @0x7f7607a13120 X.X.X.X#38383 (12mmrforcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.871 rate-limit: info: client @0x7f76076509a0 X.X.X.X#38383 (datamarkgisforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.891 rate-limit: info: client @0x7f7607866d60 X.X.X.X#38383 (wvsd208-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.907 rate-limit: info: client @0x7f7607146a70 X.X.X.X#38383 (zoomva198-251-217-184mmrforcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:40.968 rate-limit: info: client @0x7f760713b020 X.X.X.X#38383 (zoomdvs185mmr-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us  (4ef96ce9)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for buffer.com  (bc75c42e)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting error responses to X.X.X.X/28
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for restintergamma.nl  (97dd2767)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for alibabacloud.com  (d22fa033)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for dell.com  (98e605b5)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting error responses to X.X.X.X/28
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for zoom.us  (4ef96ce9)
```

## Monitoring

I chose to use [Datadog][datadog-website] to monitor the [DNS][dns-rfc] servers.

### Datadog agent installation

The installation is pretty simple with [Ansible][ansible-website] since Datadog provides a [galaxy role][ansible-galaxy-datadog-role].
I used it with the following configuration:
```yaml
---

datadog_api_key: "{{ vault_datadog_api_key }}"
datadog_site: "datadoghq.eu"

datadog_agent_flavor: "datadog-agent"
datadog_agent_major_version: 7

datadog_enabled: yes

datadog_bind9_integration_version: 1.0.0

datadog_config:
  logs_enabled: true

datadog_checks:
  bind9:
    init_config:
    instances:
      - name: bind9
        url: "http://127.0.0.1:{{ bind_statistics_port }}"
    logs:
      - type: file
        path: /var/log/named/queries.log
        service: named
        source: bind9
        sourcecategory: queries
      - type: file
        path: /var/log/named/rate-limit.log
        service: named
        source: bind9
        sourcecategory: rate-limit
```

I'm activating the [Bind9][bind9-website] and logs integrations.

Once deployed, I can see my agents through the [Datadog][datadog-website] web console in the `Infrastructure` panel :

![datadog-host-map](/blog/img/posts/2021-12-24-create-public-dns-service/datadog-host-map.png) 

### Datadog metrics and logs

#### Metrics

Since I have activated the [Bind9 integration][datadog-bind9-integration] in the agent configuration, I need to do the same in the web console. In the `Integrations` panel, search for `Bind9` and follow the configuration guide.

![datadog-bind-integration](/blog/img/posts/2021-12-24-create-public-dns-service/datadog-bind9-integration-01.png)

Now I can see metrics showing up.

#### Logs

The logs are already showing up in the log explorer in the `Logs` panel:

![datadog-log-explorer](/blog/img/posts/2021-12-24-create-public-dns-service/datadog-log-explorer.png)

But [Datadog][datadog-website] doesn't know how to interpret those logs. For now it only sees a long string of characters.
If I want to analyse those logs, [Datadog][datadog-website] needs to parse them for me.

In [Datadog][datadog-website]'s built-in log pipelines, in the `Logs` panel, I can define a list of patterns and actions to parse the log files produced by the [Bind9][bind9-website] service. For some more popular software [Datadog][datadog-website] has already done the work for you, but it seems [Bind9][bind9-website] is an exception.

![datadog-pipeline-library](/blog/img/posts/2021-12-24-create-public-dns-service/datadog-log-pipeline-library.png)

##### Defining logs parsing pipelines

First, I created a new pipeline with two filters : `source:bind9` and `sourcecategory:queries`.
Those two filters are derived from the [Datadog][datadog-website] agent configuration from earlier :

```yaml
datadog_checks:
  bind9:
[...]
    logs:
[...]
      - type: file
        path: /var/log/named/queries.log
        service: named
        source: bind9
        sourcecategory: queries
[...]
```
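
To make the mapping explicit, here is a minimal Python sketch that builds the filter query from one agent log entry. The `pipeline_filter` helper is hypothetical and only illustrates how the `source` and `sourcecategory` attributes end up as search terms; it is not part of the Datadog agent or API.

```python
# One entry of the `logs` list from the agent configuration above.
QUERIES_LOG_ENTRY = {
    "type": "file",
    "path": "/var/log/named/queries.log",
    "service": "named",
    "source": "bind9",
    "sourcecategory": "queries",
}


def pipeline_filter(log_entry, keys=("source", "sourcecategory")):
    """Build a `key:value` search query from the tagging attributes of a log entry.

    Hypothetical helper: attributes missing from the entry are simply skipped.
    """
    return " ".join(f"{key}:{log_entry[key]}" for key in keys if key in log_entry)


print(pipeline_filter(QUERIES_LOG_ENTRY))
```

The same logic applied to the `rate-limit` entry would give the filter for a second pipeline dedicated to rate-limit logs.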

Now that the pipeline only receives logs from the right source, I need a [Grok parser][datadog-log-parsing] to parse the log string.
A [Grok parser][datadog-log-parsing] defines blocks in the string and puts them into variables. Once the log line is parsed, it returns a beautifully formatted [JSON][json-rfc] object.

Blocks are defined by "simplified" regex matchers : `number`, `word`, `date`, `ip`, ...

[Datadog][datadog-website] makes this really simple with a "live" parsing view where I can see in real time which part of the log is parsed by which block.

![datadog-grok-parser](/blog/img/posts/2021-12-24-create-public-dns-service/datadog-grok-parser.png)

So here is the final [Grok parsing][datadog-log-parsing] rule for `queries` logs :
```grok
default %{number}-%{word}-%{number}\s+%{date("HH:mm:ss.SSS"):timestamp}\s+queries:\s+%{word:status}:\s+client\s+@%{word:client.data}\s+%{ip:client.ip}\#%{number:client.port}\s+\(%{hostname:query.hostname}\):\s+query:\s+%{hostname:query.fqdn}\s+%{word:query.location}\s+%{word:query.qcode}\s+.*\s+\(%{ip:server.ip}\).*
```

With this example :
```bind9
27-Oct-2021 16:53:43.594 queries: info: client @0x7fad52ee1fd0 127.0.0.1#64318 (google.fr): query: google.fr IN A +E(0) (10.69.86.243)
```
I get :
```json
{
  "status": "info",
  "query": {
    "qcode": "A",
    "location": "IN",
    "hostname": "google.fr",
    "fqdn": "google.fr"
  },
  "client": {
    "ip": "127.0.0.1",
    "data": "0x7fad52ee1fd0",
    "port": 64318
  },
  "timestamp": 1636131223594,
  "server": {
    "ip": "10.69.86.243"
  }
}
```

Pretty neat !
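
If you want to test the same idea outside of [Datadog][datadog-website], the Grok rule can be approximated with a regular expression. The following Python sketch is my own hand-written translation, not what the Datadog engine runs : the group names, the loose IP matcher and the `parse_query_log` helper are assumptions made for illustration, and the timestamp conversion is left out.

```python
import json
import re

# Hand-written approximation of the Grok rule above. Group names mirror the
# Grok attributes, with dots replaced by underscores because Python group
# names cannot contain dots.
QUERY_LOG = re.compile(
    r"^\d+-\w+-\d+\s+"                        # date, e.g. 27-Oct-2021 (not captured)
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"  # time with milliseconds
    r"queries:\s+(?P<status>\w+):\s+"
    r"client\s+@(?P<client_data>\w+)\s+"
    r"(?P<client_ip>[0-9a-fA-F.:]+)#(?P<client_port>\d+)\s+"  # loose IPv4/IPv6 match
    r"\((?P<query_hostname>[^)\s]+)\):\s+"
    r"query:\s+(?P<query_fqdn>\S+)\s+"
    r"(?P<query_location>\w+)\s+(?P<query_qcode>\w+)\s+"
    r".*\s+\((?P<server_ip>[0-9a-fA-F.:]+)\)"
)

SAMPLE = (
    "27-Oct-2021 16:53:43.594 queries: info: client @0x7fad52ee1fd0 "
    "127.0.0.1#64318 (google.fr): query: google.fr IN A +E(0) (10.69.86.243)"
)


def parse_query_log(line):
    """Return a nested dict shaped like the JSON above, or None if the line does not match."""
    match = QUERY_LOG.match(line)
    if match is None:
        return None
    groups = match.groupdict()
    return {
        "status": groups["status"],
        "query": {
            "qcode": groups["query_qcode"],
            "location": groups["query_location"],
            "hostname": groups["query_hostname"],
            "fqdn": groups["query_fqdn"],
        },
        "client": {
            "ip": groups["client_ip"],
            "data": groups["client_data"],
            "port": int(groups["client_port"]),
        },
        "server": {"ip": groups["server_ip"]},
    }


print(json.dumps(parse_query_log(SAMPLE), indent=2))
```

Lines that do not follow the `queries` format return `None`, which makes it easy to spot log lines the rule would miss.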

[Datadog][datadog-website] can enhance that new [JSON][json-rfc] object with extra metadata. In my case, since I have the client [IP address][ip-address-wikipedia], I can use the GeoIP parser to find metadata about that [IP][ip-address-wikipedia]. Now I can determine the ISP, the country and even the city of that client.

Here is an example :

```bind9
05-Nov-2021 14:07:16.442 queries: info: client @0x7f6b462b2900 X.X.X.X#38383 (zoom.us): query: zoom.us IN A + (10.70.2.235)
```

```json
{
  "client": {
    "geoip": {
      "as": {
        "domain": "online.net",
        "name": "ONLINE S.A.S.",
        "number": "AS12876",
        "route": "51.15.0.0/16",
        "type": "isp"
      },
      "city": {
        "name": "Paris"
      },
      "continent": {
        "code": "EU",
        "name": "Europe"
      },
      "country": {
        "iso_code": "FR",
        "name": "France"
      },
      "ipAddress": "X.X.X.X",
      "location": {
        "latitude": "48.85341",
        "longitude": "2.3488"
      },
      "subdivision": {
        "name": "Île-de-France"
      },
      "timezone": "Europe/Paris"
    }
  }
}
```
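
Once events carry a `client.geoip` object, counting clients by country is a simple aggregation. [Datadog][datadog-website] does this for you, but here is a rough Python sketch of the idea. The `count_by_country` helper and the sample events are illustrative assumptions, not real traffic.

```python
from collections import Counter


def count_by_country(events):
    """Count events per ISO country code, using "unknown" when GeoIP data is missing."""
    counts = Counter()
    for event in events:
        country = event.get("client", {}).get("geoip", {}).get("country", {})
        counts[country.get("iso_code", "unknown")] += 1
    return counts


# Illustrative events only: the first mirrors the enriched example above,
# the second stands for a log line that could not be enriched.
events = [
    {"client": {"geoip": {"country": {"iso_code": "FR", "name": "France"}}}},
    {"client": {"ip": "127.0.0.1"}},
]
print(count_by_country(events).most_common())
```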

### Dashboard

I can now begin the fun part of using a monitoring service : making dashboards and graphs !

Here are some that I have created :

![datadog-dashboard-overview](/blog/img/posts/2021-12-24-create-public-dns-service/datadog-dashboard-01.png)
![datadog-dashboard-queries-by-code](/blog/img/posts/2021-12-24-create-public-dns-service/datadog-dashboard-02.png)
![datadog-dashboard-client-by-location](/blog/img/posts/2021-12-24-create-public-dns-service/datadog-dashboard-03.png)

## Conclusions

Now I have a fully operational public [DNS][dns-rfc] server. I can now configure enhanced behaviour if I want, like blocking some domain names (malware, illegal stuff, ...).

Note : If you want to make your public [DNS][dns-rfc] available to anyone, you can publish it on [public-dns.info][public-dns-info-website].

<!-- links -->

[acl-website]: https://en.wikipedia.org/wiki/Access-control_list
[amd-epyc]: https://www.amd.com/en/products/epyc
[ansible-galaxy-datadog-role]: https://galaxy.ansible.com/datadog/datadog
[ansible-website]: https://www.ansible.com
[bind9-documentation]: https://bind9.readthedocs.io/en/latest/reference.html
[bind9-website]: https://bind9.net
[cidr-wikipedia]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing
[datadog-bind9-integration]: https://docs.datadoghq.com/integrations/bind9/
[datadog-log-parsing]: https://docs.datadoghq.com/logs/log_configuration/parsing
[datadog-website]: https://www.datadoghq.com
[developper-console-wikipedia]: https://en.wikipedia.org/wiki/Web_development_tools
[dns-amplification-attack]: https://www.cloudflare.com/learning/ddos/dns-amplification-ddos-attack
[dns-recursion-article]: https://www.cloudflare.com/learning/dns/what-is-recursive-dns
[dns-rfc]: https://tools.ietf.org/html/rfc2929
[domain-name-wikipedia]: https://en.wikipedia.org/wiki/Domain_name
[fail2ban-website]: https://www.fail2ban.org
[fedora-website]: https://getfedora.org/en/
[firewalld-website]: https://firewalld.org
[ip-address-wikipedia]: https://en.wikipedia.org/wiki/IP_address
[ipv4-subnet-wikipedia]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#IPv4_CIDR_blocks
[ipv4-wikipedia]: https://en.wikipedia.org/wiki/IPv4
[ipv6-subnet-wikipedia]: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#IPv6_CIDR_blocks
[ipv6-wikipedia]: https://en.wikipedia.org/wiki/IPv6
[json-rfc]: https://tools.ietf.org/html/rfc8259
[linux-kernel-github]: https://github.com/torvalds/linux
[public-dns-info-website]: https://public-dns.info
[scaleway-stardust-instance]: https://www.scaleway.com/en/stardust-instances/
[scaleway-terraform-provider]: https://registry.terraform.io/providers/scaleway/scaleway/latest
[scaleway-terraform]: https://www.scaleway.com/en/terraform/
[scaleway-website]: https://www.scaleway.com/en/elements/
[soc-wikipedia]: https://en.wikipedia.org/wiki/System_on_a_chip
[ttl-wikipedia]: https://en.wikipedia.org/wiki/Time_to_live