By erdgeist | January 21, 2007
Surely you have seen Weird Al Yankovich making fun of Rambo movies, where dozens of villains were standing on the top of a hill, firing rounds after rounds and still being shot – one after another by our dear Al.
You might argue that a bittorrent tracker does not actually seek to kill its clients, yet the amount of fire it attracts is comparable and I always wanted to refer to “UHF” in a blog posting.
When you have to handle that many requests per seconds (I’m told, were approaching the 350 in the moment), you basically have two choices: try to figure out a complex multithreaded connection handling framework that dispatches your complex database layout or go for simplicity and try to handle request sequentially and be fast to serve them.
Before I can go into that I need to explain, what a bittorrent tracker does. In Germany there’s an institution called Mitfahrzentrale. Basically you tell them, you’re driving from one city to another at a certain date or that you would want to but lack the car. In any way, Mitfahrzentrale tells you, who also wanted to get to the same place by the same time and provides you with contact information.
With bittorrent you acquire all information you need to share a file by the .torrent file you get from bittorrent search engines. This file lists all trackers that will play Mitfahrzentrale for you plus a token to identify your ride there: a twenty bytes (160 bits) info_hash, calculated over the .torrent file.
All contact information you need to connect to another peer wanting to share the same file is its IP address and a port number its client listens on. In IPv4 thats six bytes. If you are familiar with organizing data you might by now have noticed that you just need structures to hold twenty bytes per torrent plus a pointer to a list of six byte entries for each peer.
Before I can describe, how our first important design decission evolved, you need to know about the reasons, other clients chose different designs. Bram Cohen decided to tunnel all tracker requests through HTTP (over TCP, of course) which has two draw backs. First: most requests plus its answers comfortably fit into one ethernet frame each. Having to setup a TCP connection for requests requires nine packets average instead of two, you would get when using UDP, which would even leave us with enough bandwidth for eight retries before the more reliable TCP protocol could perform better. Second: HTTP itself is an awfully complex protocol to implement. If you’re just supposed to give a simple answer to a simple question, I would argue its use is inappropriate. Considering that most firewalls let HTTP connections pass unhindered, using HTTP would be excused – if the actual data connections between peers weren’t just some binary TCP streams doomed to be filtered.
Now, when you’re using some httpd to handle the protocol for you, anyway – why bother doing the rest as a C or C++ language CGI? A typical coders reflex is to write a php-script using mysql as data backend. You learned in computer science course, that using a data base to store your data is the way to go. Right? The framework will do all the threading and forking for you, your data base ensures an appropriate locking.
Well, we thought different.
Using a kick-ass all-features-on data base? Constructing complex SQL-queries to select six byte structures from lists accessed by twenty byte indexes, that already come pre-hashed? Having a data/metadata ratio of 1:10000? Doing local data base connections, that involves setting up unix sockets again to blow up a twenty bytes request and its 50*6 bytes answer up to several kbytes overhead. No way!
All techniques to efficiently store that little data (were talking about eight MBs for 100,000 info_hashes and six MBs for 1,000,000 peers here) were around for fourty years. As already mentioned, you can use info_hash’s first bytes to address buckets, since they are already hashed. So you can easily keep sorted lists you may do binary searches in. With some minor abstractions you can sort your peers into pools you can chose to release after bittorrents peer time out.
All left to do was to find a framework to handle HTTP. Fortunally I’ve been following libowfat‘s development for a while. It is a highly scalable multi platform framework modelled after Prof. Daniel J. Bernsteins fascinating low level abstraction library, but exploiting most IO acceleration techniques like kqueue and poll. Felix von Leitner added an example httpd application that became basis for what opentracker is today. What we do is to accept, answer and close all incoming connections as fast as possible, using static buffers. This approach easily scales up to several thousand requests per second. The only reason for not having exact numbers is us being too lazy to implement stress testing software in C and all other test suites being too slow to request more than some hundred connections per second.
Topics: coding | Comments Off
Comments are closed.