Create and Expose a public DNS service
What is DNS?
The Domain Name System (DNS) is a system that provides human-readable names for computers, services, and other resources connected to the internet. Basic records translate a domain name (that humans can understand) into an IP address (that computers understand for routing).
DNS is used all the time. Right now, to access this blog, your computer made a DNS request to translate lunik.tiwabbit.fr into an IP, then requested the web page from the IP address resolved by the DNS query.
You can view this behavior by opening the developer console in your web browser:
Warning
All the IP addresses and DNS names used in this blog post are no longer in use by my project. Please don't do anything stupid with them; just ignore them.
Making DNS queries
You can make a basic DNS query using nslookup in your terminal:
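# the command matching the output below:
nslookup lunik.tiwabbit.fr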
You will get something like:
Name: lunik.tiwabbit.fr
Address: 13.224.247.30
Name: lunik.tiwabbit.fr
Address: 13.224.247.31
Name: lunik.tiwabbit.fr
Address: 13.224.247.51
Name: lunik.tiwabbit.fr
Address: 13.224.247.100
Where to deploy the service?
I have decided to host my DNS service on a public cloud provider: Scaleway.
Here is a global view of the architecture:
I have decided to expose two public DNS servers, one in each region: France (Paris 1) and the Netherlands (Amsterdam 1).
I chose those two regions because they are the only ones offering Stardust instances. DNS servers don't require a ton of resources, so this keeps the project cost to a minimum.
The two servers are exposed via public flexible IP addresses (one v4 and one v6 each), which can be resolved through two DNS entries for convenience.
Security was my main concern, so I attached a security group to my instances, allowing me to filter incoming and outgoing traffic.
Deploying the infrastructure
Now that I know what architecture I want for the service, I need to deploy it on the cloud provider. I decided to use Terraform to create all the resources and bind them together.
Following the Terraform documentation to create my resources is pretty straightforward; after an hour I had the necessary code. Now I just needed to apply it.
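The usual Terraform workflow applies; roughly (a sketch, run from the directory holding the .tf files; the dns.tfplan file name is arbitrary):

terraform init                   # download the providers (Scaleway)
terraform plan -out=dns.tfplan   # preview the changes
terraform apply dns.tfplan       # create the resources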
Here is what the plan looked like (truncated):
Terraform will perform the following actions:
# scaleway_instance_ip.france["tiwabbit-dns-01"] will be created
+ resource "scaleway_instance_ip" "france" {
+ address = (known after apply)
+ id = (known after apply)
+ reverse = "dns01.tiwabbit.fr"
+ server_id = (known after apply)
+ zone = "fr-par-1"
}
# scaleway_instance_security_group.france will be created
+ resource "scaleway_instance_security_group" "france" {
+ description = "dns"
+ enable_default_security = true
+ external_rules = false
+ id = (known after apply)
+ inbound_default_policy = "drop"
+ name = "public-dns"
+ outbound_default_policy = "accept"
+ stateful = true
+ zone = "fr-par-1"
+ inbound_rule {
+ action = "accept"
+ ip = "X.X.X.X"
+ port = 22
+ protocol = "TCP"
}
+ inbound_rule {
+ action = "accept"
+ ip_range = "0.0.0.0/0"
+ port = 53
+ protocol = "UDP"
}
+ inbound_rule {
+ action = "accept"
+ ip_range = "::/0"
+ port = 53
+ protocol = "UDP"
}
}
# scaleway_instance_server.france["tiwabbit-dns-01"] will be created
+ resource "scaleway_instance_server" "france" {
+ enable_dynamic_ip = false
+ enable_ipv6 = true
+ id = (known after apply)
+ image = "fr-par-1/5c8bbf4b-10f0-4cac-863b-4561781043ff"
+ ip_id = (known after apply)
+ ipv6_address = (known after apply)
+ ipv6_gateway = (known after apply)
+ ipv6_prefix_length = (known after apply)
+ name = "tiwabbit-dns-01"
+ private_ip = (known after apply)
+ public_ip = (known after apply)
+ security_group_id = "fr-par-1/dns"
+ state = "started"
+ type = "STARDUST1-S"
+ zone = "fr-par-1"
+ root_volume {
+ size_in_gb = "10"
+ volume_id = (known after apply)
}
}
[...]
Plan: 8 to add, 0 to change, 0 to destroy.
After completing the apply, this is what I had in the Scaleway console:
Instances:
Volumes:
Security groups:
Here you can see that I'm only allowing inbound traffic on port 53, which is the one used by DNS servers. (The first rule, on port 22, allows me to manage the server over SSH from a private location.)
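A quick, hypothetical way to double-check those rules from the outside is a port scan (run it from a machine you control; dns01.tiwabbit.fr stands in for one of the servers):

# UDP 53 should be open to everyone:
nmap -sU -p 53 dns01.tiwabbit.fr
# TCP 22 should only answer from the management IP, filtered elsewhere:
nmap -p 22 dns01.tiwabbit.fr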
Installing and configuring the service
Now that I have two brand new servers at my disposal, I need to install and configure the DNS software that will run on them.
They are running Fedora 32 with a 5.6 Linux kernel:
[root@tiwabbit-dns-01 ~]# screenfetch
/:-------------:\ root@tiwabbit-dns-01
:-------------------:: OS: Fedora
:-----------/shhOHbmp---:\ Kernel: x86_64 Linux 5.6.6-300.fc32.x86_64
/-----------omMMMNNNMMD ---: Uptime: 18m
:-----------sMMMMNMNMP. ---: Packages: 405
:-----------:MMMdP------- ---\ Shell: bash 5.0.11
,------------:MMMd-------- ---: Disk: 1.0G / 9.6G (11%)
:------------:MMMd------- .---: CPU: AMD EPYC 7281 16-Core @ 2.096GHz
:---- oNMMMMMMMMMNho .----: RAM: 293MiB / 969MiB
:-- .+shhhMMMmhhy++ .------/
:- -------:MMMd--------------:
:- --------/MMMd-------------;
:- ------/hMMMy------------:
:-- :dMNdhhdNMMNo------------;
:---:sdNMMMMNds:------------:
:------:://:-------------::
:---------------------://
Fun fact: Scaleway Stardust instances run on an AMD EPYC SoC!
I have decided to use the Bind9 DNS server because it's already packaged in many distributions, there is a lot of documentation, and the community is strong.
I'm using Ansible to deploy the whole stack: base Linux config, Bind9, Firewalld, Fail2Ban.
Tip
I have already talked about Firewalld and Fail2Ban in another blog post: Securing web entrypoint from external threats.
But for the purpose of this blog post, I will detail the equivalent bash commands that can be used.
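For example, the Firewalld part of the stack boils down to something like this (a minimal sketch; the Ansible role does the equivalent):

# Allow DNS (53/udp and 53/tcp) and SSH in the default zone:
firewall-cmd --add-service=dns
firewall-cmd --add-service=ssh
# Persist the runtime configuration:
firewall-cmd --runtime-to-permanent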
Bind9 installation and configuration
The installation of Bind9 is pretty straightforward. Since it's already packaged, I only need a single command to install it:
[root@tiwabbit-dns-01 ~]# dnf install bind bind-utils
Last metadata expiration check: 0:02:41 ago on Fri 05 Nov 2021 09:49:15 AM UTC.
Dependencies resolved.
============================================================================================================================================================================================================
Package Architecture Version Repository Size
============================================================================================================================================================================================================
Installing:
bind x86_64 32:9.11.28-1.fc32 updates 2.0 M
bind-utils x86_64 32:9.11.28-1.fc32 updates 233 k
Installing dependencies:
bind-dnssec-doc noarch 32:9.11.28-1.fc32 updates 46 k
bind-libs x86_64 32:9.11.28-1.fc32 updates 90 k
bind-libs-lite x86_64 32:9.11.28-1.fc32 updates 1.1 M
bind-license noarch 32:9.11.28-1.fc32 updates 16 k
fstrm x86_64 0.5.0-2.fc32 fedora 28 k
mariadb-connector-c x86_64 3.1.12-1.fc32 updates 203 k
mariadb-connector-c-config noarch 3.1.12-1.fc32 updates 11 k
policycoreutils-python-utils noarch 3.0-2.fc32 fedora 83 k
protobuf-c x86_64 1.3.2-2.fc32 fedora 35 k
python3-bind noarch 32:9.11.28-1.fc32 updates 64 k
Installing weak dependencies:
bind-dnssec-utils x86_64 32:9.11.28-1.fc32 updates 128 k
Transaction Summary
============================================================================================================================================================================================================
Install 13 Packages
Total download size: 4.0 M
Installed size: 10 M
Is this ok [y/N]:
Once it's installed, following the Bind9 documentation, I put the following configuration in /etc/named.conf:
acl "managment" {
X.X.X.X/32;
};
acl "public" {
0.0.0.0/0;
::/0;
};
options {
dump-file "/etc/named/data/cache_dump.db";
statistics-file "/etc/named/data/named_stats.txt";
memstatistics-file "/etc/named/data/named_mem_stats.txt";
secroots-file "/etc/named/data/named.secroots";
recursing-file "/etc/named/data/named.recursing";
listen-on port 53 { any; };
listen-on-v6 port 53 { any; };
allow-transfer { none; };
max-cache-size 70%;
allow-query-cache {
127.0.0.1;
localhost;
managment;
public;
};
allow-query {
127.0.0.1;
localhost;
managment;
public;
};
recursion yes;
allow-recursion {
127.0.0.1;
localhost;
managment;
public;
};
dnssec-enable yes;
dnssec-validation yes;
prefetch 4 10;
rate-limit {
ipv4-prefix-length 28;
ipv6-prefix-length 56;
responses-per-second 20;
window 5;
slip 3;
};
managed-keys-directory "/var/named/dynamic";
geoip-directory "/usr/share/GeoIP";
pid-file "/run/named/named.pid";
session-keyfile "/run/named/session.key";
hostname "dns01.tiwabbit.fr";
server-id "dns01.tiwabbit.fr";
/* https://fedoraproject.org/wiki/Changes/CryptoPolicy */
include "/etc/crypto-policies/back-ends/bind.config";
};
statistics-channels {
inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
};
logging {
channel client_file {
file "/var/log/named/client.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel cname_file {
file "/var/log/named/cname.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel config_file {
file "/var/log/named/config.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel database_file {
file "/var/log/named/database.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel default_file {
file "/var/log/named/default.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel delegation-only_file {
file "/var/log/named/delegation-only.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel dispatch_file {
file "/var/log/named/dispatch.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel dnssec_file {
file "/var/log/named/dnssec.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel dnstap_file {
file "/var/log/named/dnstap.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel edns-disabled_file {
file "/var/log/named/edns-disabled.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel general_file {
file "/var/log/named/general.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel lame-servers_file {
file "/var/log/named/lame-servers.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel network_file {
file "/var/log/named/network.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel notify_file {
file "/var/log/named/notify.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel queries_file {
file "/var/log/named/queries.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel query-errors_file {
file "/var/log/named/query-errors.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel rate-limit_file {
file "/var/log/named/rate-limit.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel resolver_file {
file "/var/log/named/resolver.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel rpz_file {
file "/var/log/named/rpz.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel security_file {
file "/var/log/named/security.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel spill_file {
file "/var/log/named/spill.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel trust-anchor-telemetry_file {
file "/var/log/named/trust-anchor-telemetry.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel unmatched_file {
file "/var/log/named/unmatched.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel update_file {
file "/var/log/named/update.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel update-security_file {
file "/var/log/named/update-security.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel xfer-in_file {
file "/var/log/named/xfer-in.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
channel xfer-out_file {
file "/var/log/named/xfer-out.log" versions 3 size 5m;
severity dynamic;
print-time yes;
print-category yes;
print-severity yes;
};
category client { client_file; default_debug; };
category cname { cname_file; default_debug; };
category config { config_file; default_debug; };
category database { database_file; default_debug; };
category default { default_file; default_debug; };
category delegation-only { delegation-only_file; default_debug; };
category dispatch { dispatch_file; default_debug; };
category dnssec { dnssec_file; default_debug; };
category dnstap { dnstap_file; default_debug; };
category edns-disabled { edns-disabled_file; default_debug; };
category general { general_file; default_debug; };
category lame-servers { lame-servers_file; default_debug; };
category network { network_file; default_debug; };
category notify { notify_file; default_debug; };
category queries { queries_file; default_debug; };
category query-errors { query-errors_file; default_debug; };
category rate-limit { rate-limit_file; default_debug; };
category resolver { resolver_file; default_debug; };
category rpz { rpz_file; default_debug; };
category security { security_file; default_debug; };
category spill { spill_file; default_debug; };
category trust-anchor-telemetry { trust-anchor-telemetry_file; default_debug; };
category unmatched { unmatched_file; default_debug; };
category update { update_file; default_debug; };
category update-security { update-security_file; default_debug; };
category xfer-in { xfer-in_file; default_debug; };
category xfer-out { xfer-out_file; default_debug; };
};
Wow, that's a lot!
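Before digging into it, it's worth validating the file and starting the service. A minimal sketch using the standard BIND tooling (the log directory matches the logging section above):

# Check the configuration syntax:
named-checkconf /etc/named.conf
# Create the log directory referenced by the logging channels:
mkdir -p /var/log/named && chown named:named /var/log/named
# Enable and start the service:
systemctl enable --now named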
The important lines to configure are:
Access-control lists
ACLs allow you to choose the behavior of the service depending on the client IP address.
Let's take a look at my configuration:
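acl "managment" {
X.X.X.X/32;
};
acl "public" {
0.0.0.0/0;
::/0;
};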
Here I'm creating two ACL groups: public and managment. In each of these acl blocks I can put as many CIDRs as I want.
public contains the global IPv4 and IPv6 CIDRs. managment contains a single-IP CIDR (the one I use to administer the service).
You can then reuse those ACL groups in other parts of the configuration, such as allow-query, allow-query-cache, and allow-recursion.
Query behavior
I chose to implement only three query behaviors to keep it simple:
- allow-query defines who is allowed to query my DNS service. If the client IP is not in the list, the server will not respond.
- allow-query-cache defines which clients the server may answer from its cache. This allows queries to be resolved faster the next time.
- allow-recursion defines who can make recursive queries. Query recursion is the DNS mechanism that finds the IP associated with a DNS entry by making all the necessary requests one by one, starting from the root servers. This makes the server independent when resolving queries: you don't need to forward requests to another public DNS server like 8.8.8.8 or 1.1.1.1 (see the example after this list).
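To illustrate what recursion means in practice (a sketch; dns01.tiwabbit.fr stands in for one of the servers):

# Ask the server to recurse on our behalf (the default; the "ra" flag in
# the reply header shows recursion is available to us):
dig @dns01.tiwabbit.fr lunik.tiwabbit.fr A
# Or do the recursion ourselves, walking down from the root servers,
# which is exactly the work allow-recursion authorizes the server to do:
dig lunik.tiwabbit.fr A +trace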
Prefetching
The name says it all: this configuration allows the server to make DNS requests on its own, in anticipation of future requests. Most of the time this makes responses quicker for DNS entries that are requested very often.
Each DNS entry has a Time To Live (or TTL). You can see it using dig:
dig lunik.tiwabbit.fr
; <<>> DiG 9.11.28-RedHat-9.11.28-1.fc32 <<>> lunik.tiwabbit.fr
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41640
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;lunik.tiwabbit.fr. IN A
;; ANSWER SECTION:
lunik.tiwabbit.fr. 66 IN A 52.222.158.86
lunik.tiwabbit.fr. 66 IN A 52.222.158.38
lunik.tiwabbit.fr. 66 IN A 52.222.158.55
lunik.tiwabbit.fr. 66 IN A 52.222.158.103
;; Query time: 0 msec
;; SERVER: 10.194.3.3#53(10.194.3.3)
;; WHEN: Fri Nov 05 10:13:42 UTC 2021
;; MSG SIZE rcvd: 110
In the ANSWER SECTION, 66 is the TTL of this DNS entry. This means that in 66 seconds it will no longer be valid and the client will need to make another request to get the new IP (which, most of the time, hasn't changed).
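You can watch the TTL count down by querying a caching resolver twice in a row, for instance:

dig +noall +answer lunik.tiwabbit.fr
sleep 10
dig +noall +answer lunik.tiwabbit.fr
# The TTL in the second answer is ~10 seconds lower; once it reaches 0,
# the resolver fetches a fresh record upstream.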
In my configuration I have:
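prefetch 4 10;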
If the server has a cached DNS entry with 4 seconds or less of TTL remaining, it will make a DNS query to refresh it. 10 is an optional parameter which defines the eligibility of the record for prefetching: only records with an original TTL of at least 10 seconds qualify.
Rate limiting
This is maybe one of the most important parameters of a public DNS server. The goal of this configuration is to limit the number of responses the server sends when a client requests the same DNS entry too many times. It is very useful to mitigate DNS amplification attacks.
Here is the configuration I have:
rate-limit {
ipv4-prefix-length 28;
ipv6-prefix-length 56;
responses-per-second 20;
window 5;
slip 3;
};
If clients in the same subnet trigger more than 20 identical responses per second within the 5-second window, the server starts dropping the excess responses; with slip set to 3, one out of every three dropped responses is sent back truncated instead, so legitimate clients can retry over TCP.
To prevent attacks from large subnets, the rate limit applies per /28 IPv4 subnet and per /56 IPv6 subnet.
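A crude way to see the limiter in action (a hypothetical test; only run it against your own server):

# Fire the same query 40 times in quick succession; once past
# responses-per-second, some answers are dropped and every third dropped
# one comes back truncated (the "slip"):
for i in $(seq 1 40); do
  dig @dns01.tiwabbit.fr +tries=1 +time=1 +short lunik.tiwabbit.fr A
done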
Logging
I made the choice to be very verbose with the logs; that's why I have configured all possible channels.
The most interesting ones are:
- queries, which logs every query resolved by the server:
# /var/log/named/queries.log
05-Nov-2021 10:21:43.536 queries: info: client @0x7f76073dfe50 X.X.X.X#14821 (mms-iad.sp-prod.net): query: mms-iad.sp-prod.net IN A +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.545 queries: info: client @0x7f760793d450 X.X.X.X#47583 (psja.isd.us): query: psja.isd.us IN MX +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.547 queries: info: client @0x7f760759b400 X.X.X.X#50020 (global.reputation.invincea.com): query: global.reputation.invincea.com IN A +E(0)DV (10.70.2.235)
05-Nov-2021 10:21:43.556 queries: info: client @0x7f76075dddc0 X.X.X.X#38383 (forconzoomnyc233mmr.zoom.us): query: forconzoomnyc233mmr.zoom.us IN A + (10.70.2.235)
05-Nov-2021 10:21:43.566 queries: info: client @0x7f76077ba820 X.X.X.X#39161 (ps-membership.us-ctkip-ps3.dell.com): query: ps-membership.us-ctkip-ps3.dell.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.618 queries: info: client @0x7f7607a9f2c0 X.X.X.X#39161 (ps-membership.usgit.dell.com): query: ps-membership.usgit.dell.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.622 queries: info: client @0x7f76077fcbc0 X.X.X.X#56424 (icn.intl-global-adns.alibabacloud.com): query: icn.intl-global-adns.alibabacloud.com IN A + (10.70.2.235)
05-Nov-2021 10:21:43.623 queries: info: client @0x7f760765f400 X.X.X.X#38383 (forcon-zoomca193-123-14-158mmr.zoom.us): query: forcon-zoomca193-123-14-158mmr.zoom.us IN A + (10.70.2.235)
05-Nov-2021 10:21:43.635 queries: info: client @0x7f75f0ce5910 X.X.X.X#56548 (sfc-idzwww.riotgames.roblox.com.ru): query: sfc-idzwww.riotgames.roblox.com.ru IN A + (10.70.2.235)
- rate-limit, which logs every event regarding the rate-limiting behavior:
# /var/log/named/rate-limit.log
05-Nov-2021 10:21:39.631 rate-limit: info: client @0x7f76077cdd80 X.X.X.X#38383 (zoomff134-224-74-182mmrforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:39.694 rate-limit: info: client @0x7f75f0cf3f10 X.X.X.X#38383 (bisdtx-orgforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:39.752 rate-limit: info: client @0x7f76077fcbc0 X.X.X.X#38383 (kirklandwa-forcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:39.863 rate-limit: info: client @0x7f76077eab00 X.X.X.X#38383 (friendsnrcforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:39.942 rate-limit: info: client @0x7f76078e8610 X.X.X.X#38383 (deltatrust-org-uk-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:40.857 rate-limit: info: client @0x7f7607a13120 X.X.X.X#38383 (12mmrforcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:40.871 rate-limit: info: client @0x7f76076509a0 X.X.X.X#38383 (datamarkgisforcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:40.891 rate-limit: info: client @0x7f7607866d60 X.X.X.X#38383 (wvsd208-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:40.907 rate-limit: info: client @0x7f7607146a70 X.X.X.X#38383 (zoomva198-251-217-184mmrforcon.zoom.us): rate limit slip NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:40.968 rate-limit: info: client @0x7f760713b020 X.X.X.X#38383 (zoomdvs185mmr-forcon.zoom.us): rate limit drop NXDOMAIN response to X.X.X.X/28 for zoom.us (4ef96ce9)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for buffer.com (bc75c42e)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting error responses to X.X.X.X/28
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for restintergamma.nl (97dd2767)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for alibabacloud.com (d22fa033)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for dell.com (98e605b5)
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting error responses to X.X.X.X/28
05-Nov-2021 10:21:43.761 rate-limit: info: *stop limiting NXDOMAIN responses to X.X.X.X/28 for zoom.us (4ef96ce9)
Monitoring
I chose to use Datadog to monitor the DNS servers.
Datadog agent installation
The installation is pretty simple with Ansible, since Datadog provides a Galaxy role. I used it with the following configuration:
---
datadog_api_key: "{{ vault_datadog_api_key }}"
datadog_site: "datadoghq.eu"
datadog_agent_flavor: "datadog-agent"
datadog_agent_major_version: 7
datadog_enabled: yes
datadog_bind9_integration_version: 1.0.0
datadog_config:
  logs_enabled: true
datadog_checks:
  bind9:
    init_config:
    instances:
      - name: bind9
        url: "http://127.0.0.1:{{ bind_statistics_port }}"
    logs:
      - type: file
        path: /var/log/named/queries.log
        service: named
        source: bind9
        sourcecategory: queries
      - type: file
        path: /var/log/named/rate-limit.log
        service: named
        source: bind9
        sourcecategory: rate-limit
I'm activating the Bind9 integration and log collection.
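The check's url points at the statistics-channels endpoint configured in named.conf earlier, so you can verify it responds by hand:

# named serves its statistics as XML on the loopback interface:
curl -s http://127.0.0.1:8053/ | head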
Once deployed, I can see my agents through the Datadog web console in the Infrastructure panel:
Datadog metrics and logs
Metrics
Since I have activated the Bind9 integration in the agent configuration, I need to do the same in the web console: in the Integrations panel, search for Bind9 and follow the configuration guide.
Now I can see metrics showing up.
Logs
The logs are already showing up in the log explorer in the Logs panel:
But Datadog doesn't know how to interpret those logs yet: for now, it only sees a large string of characters. If I want to analyze those logs, Datadog needs to parse them for me.
In the built-in logs pipeline, in the Logs panel, I can define a list of patterns and actions to parse the log files produced by the Bind9 service. For some more popular software, Datadog has already done this work for you, but it seems Bind9 is an exception.
Defining log parsing pipelines
First I created a new pipeline using the predefined filters source:bind9 and sourcecategory:queries.
Those two filters are derived from the Datadog agent configuration shown earlier:
datadog_checks:
  bind9:
    [...]
    logs:
      [...]
      - type: file
        path: /var/log/named/queries.log
        service: named
        source: bind9
        sourcecategory: queries
      [...]
Now that I'm only going to parse logs from the right source, I need a Grok parser to parse the log string. A Grok parser matches blocks in the string and puts them into variables; when a log line is parsed, it returns a beautifully formatted JSON object.
Blocks are defined by "simplified" regexes: number, word, date, ip, ...
Datadog makes this really simple by providing a "live" parsing view where you can see in real time which part of the log is parsed by which block.
So here is the final Grok parsing rule for the queries logs:
default %{number}-%{word}-%{number}\s+%{date("HH:mm:ss.SSS"):timestamp}\s+queries:\s+%{word:status}:\s+client\s+@%{word:client.data}\s+%{ip:client.ip}\#%{number:client.port}\s+\(%{hostname:query.hostname}\):\s+query:\s+%{hostname:query.fqdn}\s+%{word:query.location}\s+%{word:query.qcode}\s+.*\s+\(%{ip:server.ip}\).*
With this example:
27-Oct-2021 16:53:43.594 queries: info: client @0x7fad52ee1fd0 127.0.0.1#64318 (google.fr): query: google.fr IN A +E(0) (10.69.86.243)
{
  "status": "info",
  "query": {
    "qcode": "A",
    "location": "IN",
    "hostname": "google.fr",
    "fqdn": "google.fr"
  },
  "client": {
    "ip": "127.0.0.1",
    "data": "0x7fad52ee1fd0",
    "port": 64318
  },
  "timestamp": 1636131223594,
  "server": {
    "ip": "10.69.86.243"
  }
}
Pretty neat!
Datadog also allows enhancing that new JSON object with extra metadata. In my case, since I have the client IP address, I can use the GeoIP parser to find metadata about that IP: I can now determine the ISP, the country, and even the city of that client.
Here is an example:
05-Nov-2021 14:07:16.442 queries: info: client @0x7f6b462b2900 X.X.X.X#38383 (zoom.us): query: zoom.us IN A + (10.70.2.235)
{
  "client": {
    "geoip": {
      "as": {
        "domain": "online.net",
        "name": "ONLINE S.A.S.",
        "number": "AS12876",
        "route": "51.15.0.0/16",
        "type": "isp"
      },
      "city": {
        "name": "Paris"
      },
      "continent": {
        "code": "EU",
        "name": "Europe"
      },
      "country": {
        "iso_code": "FR",
        "name": "France"
      },
      "ipAddress": "X.X.X.X",
      "location": {
        "latitude": "48.85341",
        "longitude": "2.3488"
      },
      "subdivision": {
        "name": "Île-de-France"
      },
      "timezone": "Europe/Paris"
    }
  }
}
Dashboard
I can now begin the fun part of using a monitoring service: making dashboards and graphs!
Here are some that I have created:
Conclusions
Now I have a fully operational public DNS service. I can configure enhanced behaviors if I want, like blocking some domain names (malware, illegal stuff, ...).
Note: if you want to make your public DNS resolver available to anyone, you can publish it on public-dns.info.