Riddler is an online research project that investigates algorithms for mapping the topology of the Internet. Riddler collects data about public systems by crawling websites and by mapping common ports.


How do I exclude myself from Riddler visits?

The crawler (Riddlerbot) complies with the robots.txt standard. To prevent it from accessing your website, add the following to the robots.txt file at the root of your site:

User-agent: Riddler
Disallow: /
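
To check the rule locally before deploying it, Python's standard-library urllib.robotparser module can evaluate it. This is only an approximation of how a robots.txt-compliant crawler interprets the file, not Riddlerbot's own parser; the helper name is_blocked and the example.com URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule shown above.
ROBOTS_TXT = """\
User-agent: Riddler
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Riddler is excluded from every path on the site...
    print(is_blocked(ROBOTS_TXT, "Riddler", "https://example.com/any/page"))       # True
    # ...while crawlers with other user-agents are not affected by this rule.
    print(is_blocked(ROBOTS_TXT, "SomeOtherBot", "https://example.com/any/page"))  # False
```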

If you wish to exclude a network from port mapping, please contact sir@riddler.io with the netblock in question. Please make sure that the whois record for the netblock lists a valid email address; it is used for verification.
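
A netblock is usually written in CIDR notation, such as 192.0.2.0/24. The sketch below shows, under that assumption, how an exclusion list of netblocks could be matched against an address with Python's ipaddress module. It is a hypothetical illustration (in_excluded_netblock is an invented name, and the ranges are reserved documentation addresses), not a description of how Riddler applies exclusions internally.

```python
import ipaddress
from collections.abc import Iterable


def in_excluded_netblock(address: str, netblocks: Iterable[str]) -> bool:
    """Return True if address falls inside any of the given CIDR netblocks."""
    ip = ipaddress.ip_address(address)
    # ip_network() is strict by default: a block with host bits set raises
    # ValueError instead of matching silently. Comparing an IPv4 address
    # against an IPv6 network simply returns False.
    return any(ip in ipaddress.ip_network(block) for block in netblocks)


if __name__ == "__main__":
    # Hypothetical exclusion list built from documentation ranges.
    excluded = ["192.0.2.0/24", "2001:db8::/32"]
    print(in_excluded_netblock("192.0.2.17", excluded))    # True
    print(in_excluded_netblock("198.51.100.5", excluded))  # False
```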

Crawler politeness

For most sites, Riddlerbot should not access your site more than once every few seconds on average. Because of network delays, however, the rate may appear slightly higher over short periods. In general, Riddlerbot should download only one copy of each page at a time. If you see Riddlerbot downloading a page multiple times, the crawler was probably stopped and restarted.
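
A per-host politeness delay of this kind can be sketched as follows. The actual interval Riddlerbot uses is not stated beyond "every few seconds", so the 3-second default is an assumption, and HostRateLimiter is a hypothetical name; this is an illustration of the general technique, not Riddlerbot's code.

```python
import time
from urllib.parse import urlsplit


class HostRateLimiter:
    """Space out requests so each host sees at most one per min_interval seconds."""

    def __init__(self, min_interval: float = 3.0) -> None:
        # 3.0 s is an assumed stand-in for "once every few seconds".
        self.min_interval = min_interval
        # Earliest monotonic time at which each host may be contacted again.
        self._next_allowed: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until url's host may be fetched again; return the seconds slept."""
        host = urlsplit(url).hostname or ""  # hostname is already lower-cased
        now = time.monotonic()
        delay = max(0.0, self._next_allowed.get(host, now) - now)
        if delay > 0:
            time.sleep(delay)
        # The next request to this host must wait a full interval after this one.
        self._next_allowed[host] = now + delay + self.min_interval
        return delay


if __name__ == "__main__":
    limiter = HostRateLimiter(min_interval=0.5)
    for url in ["https://example.com/a", "https://example.com/b", "https://example.org/"]:
        slept = limiter.wait(url)
        # A real crawler would fetch url here.
        print(f"{url}: waited {slept:.2f}s")
```

The limiter keeps its state in a single process and is not thread-safe; crawlers spread across several machines would need shared state (or a fixed assignment of hosts to machines) to keep the per-host rate bounded.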

Riddlerbot is designed to be distributed across several machines so that it performs well and scales as the web grows. To reduce bandwidth usage, we also run many crawlers on machines that are close, in network terms, to the sites they index. As a result, your logs may show visits from several machines at riddler.io, all with the user-agent Riddler. Our goal is to crawl as many pages from your site as we can on each visit without overwhelming your server's bandwidth.
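
To see how these visits are spread across machines, you can group your access-log entries by client address. The sketch below assumes logs in the common Apache/nginx "combined" format and treats any user-agent containing "Riddler" as a match; the sample lines and addresses are made up. Keep in mind that a user-agent string can be spoofed, so a match alone does not prove a request came from riddler.io.

```python
import re
from collections import Counter
from collections.abc import Iterable

# client ident user [time] "request" status size "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical log lines using documentation address ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Riddler"',
    '203.0.113.8 - - [10/Oct/2023:13:55:39 +0000] "GET /about HTTP/1.1" 200 734 "-" "Riddler"',
    '203.0.113.7 - - [10/Oct/2023:13:55:42 +0000] "GET /contact HTTP/1.1" 404 0 "-" "Riddler"',
    '198.51.100.4 - - [10/Oct/2023:13:55:44 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]


def riddler_hits_by_client(lines: Iterable[str]) -> Counter:
    """Count requests whose user-agent mentions Riddler, keyed by client address."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        # Lines in other formats are skipped rather than guessed at.
        if match and "riddler" in match.group("agent").lower():
            counts[match.group("client")] += 1
    return counts


if __name__ == "__main__":
    for client, hits in riddler_hits_by_client(SAMPLE_LOG).most_common():
        print(client, hits)
```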

Why is Riddlerbot generating 404 (not found) errors?

The two most common reasons are (a) broken links that point to a non-existent page on your site and (b) unsafe characters (such as spaces, tabs, and newlines) in your URLs. If you believe that Riddlerbot is parsing your links incorrectly, please let us know and we will fix the problem promptly.
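
Unsafe characters in links can be avoided by percent-encoding URL paths before publishing them. The sketch below uses Python's urllib.parse.quote to do this; the helper names needs_encoding and encode_path are illustrative, not part of any Riddler tooling.

```python
import re
from urllib.parse import quote

_WHITESPACE = re.compile(r"\s")


def needs_encoding(path: str) -> bool:
    """True if the path contains a space, tab, newline or other whitespace."""
    return bool(_WHITESPACE.search(path))


def encode_path(path: str) -> str:
    """Percent-encode a raw (not yet encoded) URL path, keeping '/' separators."""
    return quote(path, safe="/")


if __name__ == "__main__":
    raw = "/docs/annual report.pdf"
    print(needs_encoding(raw))  # True
    print(encode_path(raw))     # /docs/annual%20report.pdf
```

Apply encode_path only to raw paths: quote also encodes "%", so a path that is already encoded (for example one containing %20) would be double-encoded.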