Peter Lowe

The Joy of TXT

TXT records are perhaps the most flexible type of DNS record available - but have you ever wondered how they’re really used? To try to answer this, I examined the TXT records of 1 million domains to see if there’s any rhyme or reason to how people employ this quirky, open-ended record type.


Probably the most common way that the Domain Name System (DNS) is described is as a way to map domains to IP addresses. But this only tells part of the story: domains can have all sorts of records associated with them other than just IP addresses. These different resource record types, known as RRTYPEs, are used for things like aliasing a domain to another one (CNAME and DNAME records), specifying which mail servers to use for a domain (MX records), or securing domains against poisoning attacks with DNSSEC (DS, DNSKEY, et al.) - and a whole bunch of other things.

One of the most common, and versatile, types is the TXT record. TXT records can contain arbitrary text, and a domain can have multiple TXT records associated with it. Each TXT record can also contain multiple strings (see below). They’re often used to configure mail security for a domain, verify domain ownership, or just record various bits of information.

Because they’re arbitrary though, they can really be used for anything. So I decided to take a closer look, by checking the TXT records for 1 million domains and seeing what I could find.

TXTual history

NB: If you’re not familiar with RFCs, they’re kind of like standards for the Internet - they codify common practices. See Wikipedia for a full description of what they are and how they work.

Due to their flexibility, it's hard to predict how TXT records will be used. But where did they come from? In 1987, TXT records were first defined in RFC 1035 as "descriptive text". The only note provided was that "the semantics of the text depends on the domain where it is found". So that left it pretty open.

Later, in 1993, RFC 1464 was published - "Using the Domain Name System To Store Arbitrary String Attributes". This formalised the use of TXT records to store configuration settings for a domain in the format "key=value". While this format wasn’t required when using TXT records in this way, it definitely seems to have become the most common method used since then.
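As a rough illustration of the "key=value" convention, a record can be split at its first equals sign. This is a minimal bash sketch, assuming the surrounding quotes have already been removed from the record. The function name split_key_value is made up for this example, and the sketch deliberately ignores the backquote escaping that RFC 1464 defines for attribute names.

```bash
#!/bin/bash
# Hypothetical helper: split a TXT record value into key and value at the
# first "=". Prints "key<TAB>value", or returns 1 if there is no "=" at all.
# Assumption: RFC 1464 backquote escaping is NOT handled here.
split_key_value() {
    local record="$1"
    case "$record" in
        *=*) ;;              # contains at least one "=", carry on
        *) return 1 ;;       # not a key=value style record
    esac
    local key="${record%%=*}"    # everything before the first "="
    local value="${record#*=}"   # everything after the first "="
    printf '%s\t%s\n' "$key" "$value"
}

# An SPF record also counts as key=value: the key is "v".
split_key_value 'v=spf1 include:amazon.com ~all'
```

Splitting at the first equals sign rather than the last means a value can itself contain "=" characters without being cut short.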

TXT records also turn up as building blocks in other RFCs. Because of their flexibility, they can be used to store details about a protocol or framework, as in RFC 7208 (Sender Policy Framework). These RFCs specify the exact values and format that should be used, as an alternative to defining a whole new RRTYPE, as was done for LOC.

Notes on the format of TXT records

Your basic TXT record looks like you’d expect - a simple character string:

"Example text here"

Domains can also have multiple TXT records associated:

"Example"
"text"
"here"

There is no way to specify any kind of ordering, so multiple records can be returned in a different order each time you ask for them.

But as mentioned earlier, it’s also valid to specify multiple strings:

"one" "two" "three"

Something I discovered while writing this is that RFC 7208 (Sender Policy Framework) has an interesting definition of how this usage is interpreted:

3.3.  Multiple Strings in a Single DNS Record

   As defined in [RFC1035], Sections 3.3 and 3.3.14, a single text DNS
   record can be composed of more than one string.  If a published
   record contains multiple character-strings, then the record MUST be
   treated as if those strings are concatenated together without adding
   spaces.  For example:

      IN TXT "v=spf1 .... first" "second string..."

   is equivalent to:

      IN TXT "v=spf1 .... firstsecond string..."

   TXT records containing multiple strings are useful in constructing
   records that would exceed the 255-octet maximum length of a
   character-string within a single TXT record.

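As far as I can tell, this rule is specific to SPF rather than to all TXT records: RFC 1035 only says that a TXT record holds one or more character-strings and leaves their interpretation to whatever is reading them, although other standards built on TXT records, such as DKIM (RFC 6376), require the same space-free concatenation. This doesn’t appear to be a well-known fact, because a lot of domains out there specify multiple strings but seem to assume that a space will be added between them when they are joined together. One particular example is one of the DNS giants of the Internet, Akamai, who have the following set for akamai.net: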

"This" "is" "not" "the" "nameserver" "you" "are" "looking" "for"

Which, according to RFC 7208, should end up as:

"Thisisnotthenameserveryouarelookingfor"

Move along.
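The concatenation rule quoted above, and the 255-octet limit that motivates it, can be sketched in a few lines of bash. The function names join_txt_strings and split_txt_value are invented for this illustration, and the splitter assumes plain ASCII input so that characters and octets are the same thing.

```bash
#!/bin/bash
# Hypothetical helpers illustrating the RFC 7208 rule quoted above.

# Join the character-strings of one TXT record with no separator.
join_txt_strings() {
    local joined="" s
    for s in "$@"; do
        joined+="$s"
    done
    printf '%s\n' "$joined"
}

# Split one long value into pieces of at most 255 characters, one per line.
# Assumption: the value is plain ASCII, so characters and octets coincide.
split_txt_value() {
    local value="$1" max=255
    while [ -n "$value" ]; do
        printf '%s\n' "${value:0:$max}"
        value="${value:$max}"
    done
}

join_txt_strings "This" "is" "not" "the" "nameserver" "you" "are" "looking" "for"
# prints: Thisisnotthenameserveryouarelookingfor
```

Anyone who wants the spaces to survive has to put them inside the strings themselves, for example "This " "is " "not ".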

TXT work (source data and methodology)

I wrote a pretty basic shell script that I’m not particularly proud of, but that worked well enough, to go through two different lists of top domains and check the TXT records for each entry:

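#!/bin/bash
# Look up the TXT records for every domain listed in domains.txt (one per line).
domainsfile="domains.txt"

# IFS= and -r keep each line exactly as written (no trimming, no backslash handling).
while IFS= read -r domain
do
    echo "--"
    echo "[$(date)] CHECKING DOMAIN $domain"
    echo "--"

    # -W 5 sets a 5 second timeout; the trailing dot makes the name fully qualified.
    host -W 5 -t txt "$domain".
done < "$domainsfile"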

The general intent was to make the output human-readable, because I wanted to be able to look through it myself, but also easy to parse for totals and so on.

I then ran this and captured the output:

./get-txt-records.sh > host.out 2>&1

This produces output in the file host.out for each domain that looks like this:

--
[Mon 03 Apr 2023 04:12:36 PM BST] CHECKING DOMAIN amazonaws.com
--
amazonaws.com descriptive text "pf2vv39dfkf9tszsg5lggfs6tp6bkjn4"
amazonaws.com descriptive text "v=spf1 include:amazon.com ~all"
amazonaws.com descriptive text "spf2.0/pra include:amazon.com ~all"
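Output in this shape is easy to reduce to one record per line. The following is a minimal sketch, not the script used for this article: the function name parse_host_output is hypothetical, and it assumes every record line contains the phrase " descriptive text " exactly as in the sample above.

```bash
#!/bin/bash
# Hypothetical helper: turn captured host output into "domain<TAB>record"
# lines. Assumption: every record line contains " descriptive text ", as in
# the sample above; separator, date, and error lines are skipped.
parse_host_output() {
    awk '
        {
            marker = " descriptive text "
            idx = index($0, marker)
            if (idx == 0) next                        # not a record line
            name = substr($0, 1, idx - 1)             # text before the marker
            value = substr($0, idx + length(marker))  # text after the marker
            print name "\t" value
        }
    '
}

parse_host_output <<'EOF'
--
[Mon 03 Apr 2023 04:12:36 PM BST] CHECKING DOMAIN amazonaws.com
--
amazonaws.com descriptive text "v=spf1 include:amazon.com ~all"
EOF
```

Against the real capture this would be run as parse_host_output < host.out, with the result redirected to a file for cut, grep, and friends.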

Each run of the script took about 48 hours, against a nameserver whose cache hadn’t been specifically warmed for these lookups. The first run used the Tranco list generated on 10 March 2023, available at https://tranco-list.eu/list/PZJ3J. This wasn’t as interesting as I’d hoped, though, I think because it contained only effective second-level domains (eSLDs, which I call “parent domains”). So I ran it again with the Cisco top 1 million domains list, downloaded on 13 March 2023.

I did a lot of checking with just cut, grep, etc., but I did also then write a little script to import all records into an SQLite database to make things easier.
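Headline totals of the kind reported below can be approximated straight from the captured output. This is only a sketch of that style of counting, not necessarily how the figures in this article were produced: the function name txt_totals is made up, the example.org line in the demo is invented input, and "unique" here means unique by exact text as printed by host.

```bash
#!/bin/bash
# Hypothetical helper: count records, unique record values, and domains that
# have at least one TXT record, reading host output on standard input.
txt_totals() {
    awk '
        {
            marker = " descriptive text "
            idx = index($0, marker)
            if (idx == 0) next                  # skip anything that is not a record
            total++
            domains[substr($0, 1, idx - 1)] = 1
            values[substr($0, idx + length(marker))] = 1
        }
        END {
            nd = 0; for (d in domains) nd++
            nv = 0; for (v in values) nv++
            printf "records=%d unique=%d domains=%d\n", total, nv, nd
        }
    '
}

# The example.org line is invented purely to show a duplicate value.
txt_totals <<'EOF'
amazonaws.com descriptive text "v=spf1 include:amazon.com ~all"
amazonaws.com descriptive text "spf2.0/pra include:amazon.com ~all"
example.org descriptive text "v=spf1 include:amazon.com ~all"
EOF
# prints: records=3 unique=2 domains=2
```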

All of the files mentioned here, including scripts, the SQLite database, and files created while looking at record lengths, unique records, etc., have been uploaded to GitHub under a Creative Commons Zero licence.

TXT by numbers

Number of TXT records: 765,650
Number of unique TXT records: 595,398
Domains with TXT records: 584,244
Domains without TXT records: 415,756
Average number of TXT records per domain (that has them): 1 (765,650 / 584,244 is about 1.31)
Longest TXT record: 7,886 characters
Second-longest TXT record: 5,498 characters
Total length of all TXT records concatenated: 49,813,321 characters
DMARC1 records: 4,218
SPF1 records: 164,459
SPF1 records with "include": 131,892
SPF1 records with just "v=spf1 -all": 8,444
SPF2 records: 5,091
SPF3 records: 808
key=value TXT records: 630,317
Empty records (just ""): 183
Empty records with spaces ("\s+"): 12
Empty records with just "~": 109
Verification / confirmation records: 402,230
Top 5 verification records:
  google-site-verification: 170,225
  MS: 56,160
  Facebook: 28,273
  Globalsign: 17,396
  Apple: 17,201
Fixed-length TXT records:
  68 characters: 170,941 (mostly Google site verifications)
  13 characters: 48,763 (mostly MS= TXT records)
  32 characters: 48,648 (random strings)
  59 characters: 33,205 (Facebook verifications)
  26 characters: 25,559 (random strings)
URLs: 708
Non-HTTP URLs: 16
Email addresses: 4,487
Hello worlds: 3
Greetings: 14
Swear words not appearing in domains: 0
<script> tags: 1
Embedded DNS records ("IN ..."): 225
Mentions of "ALIAS for" another domain: 363
Security code: 769
Please: 27
References to tickets: 28

Conclusion

So, what are TXT records used for exactly? Well, we can see that key-value settings are the most common use case, with domain verification records being the majority of those. SPF records also make a strong showing, as well as a lot of seemingly random fixed-length records that are probably being used for encoding data somehow.

But overall, they really are used for anything and everything. There are some patterns we can pick out, but the lack of rigid rules means that the freedom to put whatever you like in a TXT record has been liberally accepted by the Internet as a whole. Which, if you ask me, is a good thing - having something in the DNS that can act as a config store, notes field, playground for new standards, or even the basis for file storage (not that this would really be recommended) has meant that we haven’t had to wait for standards to catch up in order to continue making use of this wonderful system that underpins a fundamental part of the Internet.


About the author

Peter Lowe Based in UK

Peter Lowe is the FIRST DNS Abuse Ambassador and co-chair of the DNS Abuse SIG. He has worked in or around internet protocols since joining the second internet cafe in the UK in 1995, co-hosts the Not So Critical Update podcast, and maintains one of the popular blocklists used by ad blockers and tracking prevention software. He likes travel and finding new ways to use things in ways they weren't originally intended.
