Frank DENIS random thoughts.

Getting prepared for the next Cryptolocker DGA

A common feature of modern malware is the ability to communicate with command and control servers (C&C).

Infected hosts become "slaves", waiting for commands sent by C&C servers. These commands usually include launching DoS attacks, sending spam, sending passwords captured from web forms, and downloading and running additional software.

Cryptolocker also communicates with C&C servers, but for a totally different purpose.

A computer infected by Cryptolocker downloads a public key from a C&C server, generates a random key for each file it wants to encrypt using symmetric encryption, and encrypts each of these keys using RSA with the public key received from the C&C server.

The same C&C server can be used later on to get the private key, which can be used to decrypt the file-specific keys, which in turn can be used to decrypt the content of the files. And getting the RSA private key requires paying the ransom.

If Cryptolocker were dependent on a single C&C server, taking it down would be pretty easy. Unfortunately, the malware uses a domain generation algorithm (DGA) to be very resilient to takedowns.

A set of 1000 domains is computed every day, based on the current date. And the malware tries to contact all of them until an active C&C is found.
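To make the idea of a date-seeded DGA concrete, here is a minimal Python sketch. It is not the Cryptolocker algorithm: the hashing scheme, the label length, and the `TLDS` list (reserved TLDs that never resolve) are assumptions chosen purely for illustration. The only properties it shares with the description above are that the output depends solely on the date and that 1000 candidates are produced per day.

```python
import hashlib
from datetime import date

# Hypothetical TLD list (reserved names, so nothing generated here can resolve).
TLDS = ["example", "test", "invalid"]

def daily_domains(day: date, count: int = 1000) -> list[str]:
    """Derive `count` pseudo-random domain names from a calendar date."""
    domains = []
    for index in range(count):
        # The seed only depends on the date and the position in the list.
        seed = f"{day.isoformat()}-{index}".encode()
        digest = hashlib.sha256(seed).hexdigest()
        # Map the first 14 hex digits to the letters a..p to build a label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:14])
        domains.append(f"{label}.{TLDS[index % len(TLDS)]}")
    return domains
```

Because the list is a pure function of the date, anyone who knows the algorithm (malware author or defender) can compute the same candidates in advance, which is what makes blocking and sinkholing possible once the algorithm has been reverse engineered.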

If Cryptolocker is unable to contact a valid C&C server, the whole encryption process cannot happen at all. And OpenDNS customers infected by the current versions of the malware won’t see their files encrypted, as we are blocking all C&C candidates for them.

The algorithm used to generate the domain names has been reverse engineered. Some domains have been sinkholed.

But malware authors are constantly improving their code, and the malware could switch to a different algorithm at any time.

In this context, we investigated ways to possibly spot domain names used by the malware even after an update of the name generation algorithm.

A look at DNS queries sent by the malware

Right after having downloaded and run a Cryptolocker malware sample, the system it is running on will start trying to contact the domain names computed for the current day.

This is a tcpdump capture right after an infection:

21:29:18.869632 IP > 53730+ A? (38)
21:29:18.958944 IP > 53730 NXDomain
0/1/0 (98)
21:29:18.959753 IP > 47034+ A? (50)
21:29:19.067598 IP > 47034 NXDomain
0/1/0 (125)
21:29:20.058000 IP > 59150+ A? (37)
21:29:20.104971 IP > 59150 NXDomain
0/1/0 (110)
21:29:20.105764 IP > 46623+ A? (49)
21:29:20.206606 IP > 46623 NXDomain
0/1/0 (124)
21:29:21.198283 IP > 45524+ A? (37)
21:29:21.316721 IP > 45524 NXDomain
0/1/0 (110)
21:29:21.317851 IP > 56585+ A? (49)
21:29:21.427553 IP > 56585 NXDomain
0/1/0 (124)
21:29:22.416900 IP > 43280+ A? (37)
21:29:22.547327 IP > 43280 NXDomain
0/1/0 (99)
21:29:22.547939 IP > 62173+ A? (49)
21:29:22.712282 IP > 62173 NXDomain
0/1/0 (124)
21:29:23.715155 IP > 15149+ A? (36)
21:29:23.759934 IP > 15149 NXDomain
0/1/0 (97)
21:29:23.760797 IP > 43759+ A? (48)
21:29:23.857448 IP > 43759 NXDomain
0/1/0 (123)
21:29:24.854171 IP > 59677+ A? (37)
21:29:24.898653 IP > 59677 NXDomain
0/1/0 (100)
21:29:24.899409 IP > 31114+ A? (49)
21:29:25.141083 IP > 31114 NXDomain
0/1/0 (124)
21:29:26.135519 IP > 58033+ A? (39)

Rebooting the system immediately restarts the scan.

The delay between two queries is just over 1 second.

The malware doesn’t use a custom stub resolver and doesn’t parallelize queries. It works sequentially: it sends a query, waits for a response from the default resolver, waits for one second, and then sends a query for the next domain name in the list.

Even if all the domains are blocked, the malware doesn’t seem to retry at fixed intervals.

We built a set of client IPs that sent queries to a domain name previously known to be a Cryptolocker C&C, and we assume these client IPs are infected by the malware.

These infected client IPs were observed scanning the list multiple times before they find an active C&C.

But the period of these scans doesn’t seem to follow any pattern, even for a given IP. A reasonable explanation is that rescans mainly happen when clients are rebooting their system.

Out of 1000 domains, only a dozen of them actually resolve every day, some being sinkholes that we handled as nonexistent domains.

Once a valid C&C is found, we do not observe any scans of the C&C list any more, even after restarting the system.

The following observations characterize the DNS traffic observed for clients that got infected, before their files get encrypted:

  • A spike of queries leading to nonexistent domains, none of these domain names having been observed in our traffic before. In our experiments, we defined “before” as the 2 days preceding the reference date. The delay between two queries leading to a nonexistent domain is, on average, 1.43 seconds, with a standard deviation of 0.79 seconds. We saw some outliers, scanning the full list of daily domains at a very high rate. We believe these queries have been sent by security researchers and not by the malware.
  • The TTL of DNS records used by C&C servers is 300 seconds.
  • None of the functional domains seems to have been registered before September 2013.
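The timing feature in the first item can be computed from the timestamps of a client's non-resolving queries. The following Python sketch is an assumption about how one might do it, not the code we ran: the function names are hypothetical, and the use of the population standard deviation is a choice made here for simplicity.

```python
from datetime import datetime
from statistics import mean, pstdev

def inter_query_delays(timestamps: list[datetime]) -> list[float]:
    """Return the delays, in seconds, between consecutive queries."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]

def delay_stats(timestamps: list[datetime]):
    """Return (mean, standard deviation) of the delays, or None if fewer than two queries."""
    delays = inter_query_delays(timestamps)
    if not delays:
        return None
    return mean(delays), pstdev(delays)
```

A client whose mean delay is far below one second is more likely to be a script walking the daily list than the malware itself, which is how the outliers mentioned above stand out.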

An intuition to complete this list is that the number of queries leading to nonexistent domains is higher than what is observed for non-infected clients.

A trivial way to check if this intuition can be turned into a significant feature for our detection model is to compute (number of queries leading to a nonexistent domain) / (number of resolving queries) for each client IP address and for a given time window. Client IPs with a high ratio are more likely to be infected by malware phoning home using DGAs.
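As a rough illustration of this ratio, here is a Python sketch. The record layout `(client_ip, unix_timestamp, resolved)` and the one-hour default window are assumptions made for the example, and returning infinity when a client has no resolving query in the window is a design choice of the sketch.

```python
from collections import defaultdict

def nxdomain_ratio(records, window_seconds=3600):
    """Compute nonexistent/resolving query ratios per client and time window.

    `records` is an iterable of (client_ip, unix_timestamp, resolved) tuples.
    Returns {(client_ip, window_start): ratio}.
    """
    counts = defaultdict(lambda: [0, 0])  # [nonexistent, resolving]
    for client_ip, timestamp, resolved in records:
        window_start = int(timestamp) - int(timestamp) % window_seconds
        counts[(client_ip, window_start)][1 if resolved else 0] += 1
    ratios = {}
    for key, (nonexistent, resolving) in counts.items():
        # No resolving query at all in the window: report an infinite ratio.
        ratios[key] = nonexistent / resolving if resolving else float("inf")
    return ratios
```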

Unfortunately, this didn't quite work in practice.

We observe a ratio close to 1.0 even for non-infected clients, mainly due to misconfigured routers leaking queries for internal domain names.

Some antivirus products and content filters also send a lot of DNS queries as a way to check if a URL or domain exists in the vendor’s database.

In addition, some clients don’t run any caching resolver or stub resolver, and keep sending the same valid queries over and over again.

We thus changed the formula to use the number of unique resolving and non-resolving domains (not host names) instead of the number of queries.
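A sketch of the revised formula, counting unique domains rather than queries, could look like the following. The record layout is an assumption, and `registered_domain` is a deliberately naive helper that keeps the last two labels of a host name; a real implementation would need the Public Suffix List to handle names such as `example.co.uk`.

```python
from collections import defaultdict

def registered_domain(hostname: str) -> str:
    """Naive assumption: the registered domain is the last two labels."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])

def unique_nxdomain_ratio(records, window_seconds=3600):
    """Ratio of unique nonexistent domains to unique domains looked up.

    `records` is an iterable of (client_ip, unix_timestamp, hostname, resolved).
    Returns {(client_ip, window_start): ratio}, with ratios between 0.0 and 1.0.
    """
    seen = defaultdict(lambda: (set(), set()))  # (nonexistent, all domains)
    for client_ip, timestamp, hostname, resolved in records:
        window_start = int(timestamp) - int(timestamp) % window_seconds
        nonexistent, looked_up = seen[(client_ip, window_start)]
        domain = registered_domain(hostname)
        looked_up.add(domain)
        if not resolved:
            nonexistent.add(domain)
    return {key: len(nx) / len(total) for key, (nx, total) in seen.items()}
```

Deduplicating by domain means that a client repeating the same valid query thousands of times, or hammering many host names under one internal domain, only counts once on each side of the ratio.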

At first, the results look interesting.

This graph shows (number of unique domains leading to a nonexistent domain) / (number of unique domains looked up), for 1-hour time slices, and for 4 clients that just got infected by Cryptolocker.

These clients are all OpenDNS customers, therefore all the Cryptolocker C&C domains have been blocked.

For all of them, we observe a query rate peaking around 3,000 queries/hour. This is not surprising considering how Cryptolocker scans the C&C candidates: at a little over one second per query, a client can send roughly 3,000 queries in an hour. Along with Cryptolocker C&C domains, we frequently observed queries for Zeus botnet C&C domains coming from the same client IP addresses.

The ratio always quickly grows from ~0.0 to a high value, and suddenly drops to ~0.0 again. This is presumably caused by infected users repeatedly switching their systems on and off.

We implemented this model using Hadoop Pig to compare the mean ratio observed for infected clients to the mean ratio observed for non-infected clients.

Things didn’t work out quite as expected, with a high ratio also being observed for a lot of non-infected clients. Mail servers, in particular, tend to send a lot of queries for nonexistent domains. Some clients that we knew were infected also didn’t have a ratio high enough for us to be confident that they were infected by some malware looking for an active C&C. We were not able to find an acceptable threshold for this model.

A different model to find Cryptolocker DGAs

Cryptolocker runs as a background process. It sends DNS queries even when the user is not actively using their computer to access the Internet.

So, a different approach is to count the number of consecutive DNS queries leading to distinct nonexistent domains observed for a given client, that is, before a resolving DNS query is observed.
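For a single client, this count can be sketched in Python as follows. The input layout (a time-ordered sequence of `(domain, resolved)` pairs) is an assumption, and so is the decision to also report a run that is still open when the data ends.

```python
def nxdomain_runs(queries):
    """Count distinct nonexistent domains seen between resolving queries.

    `queries` is a time-ordered iterable of (domain, resolved) pairs for one client.
    Returns one count per uninterrupted run of non-resolving queries.
    """
    runs = []
    current = set()
    for domain, resolved in queries:
        if resolved:
            # A resolving query closes the current run, if there is one.
            if current:
                runs.append(len(current))
            current = set()
        else:
            # Using a set means repeated lookups of the same name count once.
            current.add(domain)
    if current:
        runs.append(len(current))  # Run still open at the end of the data.
    return runs
```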

Here is the probability distribution function for the number of consecutive non-resolving queries for 843 infected client IPs and 928,461 non-infected client IPs over the same day.

We can expect clients for which a high number of consecutive queries for nonexistent and distinct domains have been observed to be more likely to be infected than clients for which this number is low.

This model has a flaw, though. Client IP addresses sending a lot of queries, mainly because they are the external IPs of routers hiding multiple devices, are unlikely to exhibit a high number of consecutive non-resolving queries, as resolving queries from other devices keep interrupting the sequence.

This issue can be worked around, and it is not a showstopper. Cryptolocker is widespread on isolated devices, and a few queries from such devices for overlapping sets of domains can be enough to build a list of suspicious DGAs to look at.

We used Hadoop by way of our friend Pig in order to compute the average number of consecutive unique nonexistent domains looked up by each client IP, over 24 hours.
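The actual job was written in Pig; as an in-memory illustration of the same aggregation, here is a Python sketch. The record layout `(client_ip, domain, resolved)`, assumed to be time-ordered and limited to a 24-hour window, is an assumption of the example.

```python
from collections import defaultdict
from statistics import mean

def mean_nxdomain_run(records):
    """Rank clients by their mean run of distinct nonexistent domains.

    `records` is a time-ordered iterable of (client_ip, domain, resolved).
    Returns a list of (client_ip, mean_run_length), highest mean first.
    """
    runs = defaultdict(list)     # client_ip -> lengths of closed runs
    current = defaultdict(set)   # client_ip -> domains in the open run
    for client_ip, domain, resolved in records:
        if resolved:
            if current[client_ip]:
                runs[client_ip].append(len(current[client_ip]))
            current[client_ip] = set()
        else:
            current[client_ip].add(domain)
    # Close the runs that are still open at the end of the window.
    for client_ip, pending in current.items():
        if pending:
            runs[client_ip].append(len(pending))
    ranking = [(ip, mean(lengths)) for ip, lengths in runs.items()]
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

Clients at the top of this ranking are the ones whose lookups are worth extracting and inspecting.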

We then extracted the list of domain names looked up by these IP addresses.

Unsurprisingly, the IP addresses with the highest mean number of consecutive lookups for nonexistent domains sent queries to many domains looking like DGAs.

Here are snippets from the top three different IPs.




2013-10-30 20:18:01.090922500 1
2013-10-30 20:18:08.175095500        1
2013-10-30 20:18:08.175193500        1
2013-10-30 20:18:08.246492500      1
2013-10-30 20:25:53.535547500      1
2013-10-30 20:25:58.818431500       1
2013-10-30 20:25:58.818438500 1
2013-10-30 20:25:58.818796500        1
2013-10-30 22:21:39.862487500    1
2013-10-30 22:21:39.862631500      1
2013-10-30 22:21:39.862658500     1
2013-10-30 22:21:39.958563500        1
2013-10-30 22:21:46.594546500     1
2013-10-30 22:21:46.594547500   1
2013-10-30 22:21:46.594547500     1
2013-10-30 22:21:46.594551500   1
2013-10-30 22:21:46.594552500 1
2013-10-30 22:21:46.594552500    1
2013-10-30 22:21:46.594553500        1


2013-10-30 09:16:45.117311500 1
2013-10-30 09:16:45.160197500    1
2013-10-30 09:17:44.876118500  1
2013-10-30 09:17:44.887683500    1
2013-10-30 09:18:44.886323500 1
2013-10-30 09:18:44.924607500   1
2013-10-30 09:19:47.505442500  1
2013-10-30 09:19:47.582744500      1
2013-10-30 09:20:46.348474500 1
2013-10-30 09:20:46.359586500  1
2013-10-30 09:21:46.042207500 1
2013-10-30 09:21:46.202756500 1
2013-10-30 09:22:44.932434500 1
2013-10-30 09:22:44.996834500    1
2013-10-30 09:23:44.799974500  1
2013-10-30 09:23:44.903129500    1
2013-10-30 09:24:46.934405500       1
2013-10-30 09:24:47.001118500    1

We frequently see client IPs sending queries to nonexistent domains, either because they are performing some kind of experiment or because they are operating a tunnel over DNS.

But one of the Hadoop jobs we are running daily computes the “popularity” of each domain name observed in our traffic, based on the number of distinct client IP addresses having looked up each domain.

DGAs with a high popularity make us confident that these lookups don’t come from an isolated network in the context of an experiment or a tunnel over DNS. In the previous examples, domain names looked up by IP1 and IP2 are fairly popular, while those looked up by IP3 haven’t been looked up by other clients. We thus discard the names looked up by IP3 until they are looked up by more distinct client IP addresses.
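The popularity filter can be sketched in a few lines of Python. This is an illustration, not the daily Hadoop job itself: the record layout and the `min_clients` threshold are assumptions, since the threshold used in practice is not given here.

```python
from collections import defaultdict

def domain_popularity(records):
    """Return {domain: number of distinct client IPs that looked it up}.

    `records` is an iterable of (client_ip, domain) pairs.
    """
    clients = defaultdict(set)
    for client_ip, domain in records:
        clients[domain].add(client_ip)
    return {domain: len(ips) for domain, ips in clients.items()}

def popular_candidates(candidates, popularity, min_clients=2):
    """Keep candidate domains looked up by at least `min_clients` distinct IPs."""
    return [d for d in candidates if popularity.get(d, 0) >= min_clients]
```

Names that fail the threshold are not discarded forever; they can be reconsidered on a later run once more distinct clients have looked them up.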

Remaining names could be attributed to broken links on popular web sites. However, as we only consider domain names looked up by IPs that also looked up a significant number of other nonexistent domains, we rule out this hypothesis.

These domains are definitely suspicious and should be blocked, even if the threat they belong to is unknown.

Are they part of Cryptolocker? This is very unlikely: in this example, the interval between two queries for nonexistent domains sent by a given client is far too short to match Cryptolocker’s pace of just over one second per query.

Looking at other client IP addresses having sent a lot of consecutive and distinct queries leading to nonexistent domains, we quickly stumble upon one whose query rate for these queries matches the Cryptolocker one.

2013-10-30 01:47:14.2953525     1
2013-10-30 01:47:15.6041884      1
2013-10-30 01:47:17.1396666     1
2013-10-30 01:47:18.6635084   1
2013-10-30 01:47:20.0317924    1
2013-10-30 01:47:21.4113436     1

These domain names were not seen in our traffic before, another important feature of Cryptolocker DGAs.


This client was confirmed to be infected by Cryptolocker.

Other IPs with a high number of consecutive queries for nonexistent domains are sharing a large subset of domains. We were able to reconstruct the full list of Cryptolocker C&C domains from a single domain using this method.
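The expansion from a single known domain can be sketched as a fixed-point computation over clients that share nonexistent domains. This Python sketch rests on several assumptions: `nx_lookups` is a hypothetical mapping restricted to clients already flagged by the consecutive-lookup model, and `min_shared` is an illustrative overlap threshold.

```python
def expand_from_seed(seed_domain, nx_lookups, min_shared=1):
    """Grow a set of related domains starting from one known domain.

    `nx_lookups` maps each client IP to the set of nonexistent domains it
    looked up. A client's whole set is merged in as soon as it shares at
    least `min_shared` domains with the set built so far.
    """
    known = {seed_domain}
    changed = True
    while changed:
        changed = False
        for domains in nx_lookups.values():
            if len(domains & known) >= min_shared and not domains <= known:
                known |= domains
                changed = True
    return known
```

With a low threshold, unrelated nonexistent names looked up by an infected client would be merged as well, so in practice the input should be filtered first (for example by popularity and by first-seen date) or the threshold raised.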

It is hard to predict how the Cryptolocker malware is going to evolve. We recently saw botnets take many creative approaches to become more resistant to takedowns: changing DNS settings, leveraging third-party sites (Twitter, Google Docs, Blogger…), and using P2P networks.

But the easiest and quickest way for Cryptolocker authors to work around security companies blocking it at the network level is to change the domain name generation algorithm.

Having a way to immediately spot new C&C domains without knowing the new algorithm means that, if other network features are not changed, we will be able to quickly block the latest version of the threat, even before we can get hold of the new malware sample.