Controlling Connection Bursts in Twisted Applications

I’ve been happily using Twisted’s Deferred and DeferredList, but recently I found I needed to limit the number of tasks/connections that would run at once: my application simply spawned too many connections too quickly, and killed my little home router.

In this post I want to demonstrate how to do that using Twisted’s DeferredSemaphore.

Here’s a sample application which doesn’t use DeferredSemaphore:

from twisted.internet import defer
from twisted.internet import reactor
from twisted.names.client import lookupMailExchange

domains = [
    'google.com',
    'yahoo.com',
    'microsoft.com',
    'facebook.com',
    'twitter.com'
    ]

def showMailExchanges(results):
    for success, value in results:
        # DeferredList yields a (success, result) pair for each deferred, e.g.:
        # (True, ([<RR name=twitter.com type=MX class=IN ttl=164s auth=False>,
        #          <RR name=twitter.com type=MX class=IN ttl=164s auth=False>,
        #          <RR name=twitter.com type=MX class=IN ttl=164s auth=False>,
        #          <RR name=twitter.com type=MX class=IN ttl=164s auth=False>,
        #          <RR name=twitter.com type=MX class=IN ttl=164s auth=False>],
        #         [], []))
        if not success:
            continue  # value is a Failure here, so there is nothing to unpack
        ans, auth, add = value  # a successful lookup is a 3-part tuple
        for x in ans:
            # str() so the width spec is applied to plain text, not a Name object
            print("{0:15} {1}".format(str(x.name), x.payload.name))

deferreds = []
for domain in domains:
    d = lookupMailExchange(domain)
    deferreds.append(d)
dl = defer.DeferredList(deferreds)
dl.addCallback(showMailExchanges)
dl.addBoth(lambda _: reactor.stop())  # exit once every lookup has finished

reactor.run()

# Example output:
# google.com      aspmx.l.google.com
# google.com      alt2.aspmx.l.google.com
# google.com      alt3.aspmx.l.google.com
# google.com      alt1.aspmx.l.google.com
# google.com      alt4.aspmx.l.google.com
# yahoo.com       b.mx.mail.yahoo.com
# yahoo.com       d.mx.mail.yahoo.com
# yahoo.com       e.mx.mail.yahoo.com
# yahoo.com       f.mx.mail.yahoo.com
# yahoo.com       g.mx.mail.yahoo.com
# yahoo.com       h.mx.mail.yahoo.com
# yahoo.com       i.mx.mail.yahoo.com
# yahoo.com       j.mx.mail.yahoo.com
# yahoo.com       k.mx.mail.yahoo.com
# yahoo.com       l.mx.mail.yahoo.com
# yahoo.com       m.mx.mail.yahoo.com
# yahoo.com       n.mx.mail.yahoo.com
# yahoo.com       a.mx.mail.yahoo.com
# microsoft.com   mail.messaging.microsoft.com
# facebook.com    smtpin.mx.facebook.com
# twitter.com     alt2.aspmx.l.google.com
# twitter.com     ASPMX2.GOOGLEMAIL.com
# twitter.com     ASPMX3.GOOGLEMAIL.com
# twitter.com     aspmx.l.google.com
# twitter.com     alt1.aspmx.l.google.com

Say we want to limit the number of lookups that are performed at once. Enter Twisted’s DeferredSemaphore:

from twisted.internet import defer
from twisted.internet import reactor
from twisted.names.client import lookupMailExchange

domains = [
    'google.com',
    'yahoo.com',
    'microsoft.com',
    'facebook.com',
    'twitter.com'
    ]

def showMailExchanges(results):
    for success, value in results:
        if not success:
            continue  # skip failed lookups; value is a Failure
        ans, auth, add = value
        for x in ans:
            print("{0:15} {1}".format(str(x.name), x.payload.name))

deferreds = []
sem = defer.DeferredSemaphore(2)            # New
for domain in domains:
    d = sem.run(lookupMailExchange, domain) # New
    deferreds.append(d)
dl = defer.DeferredList(deferreds)
dl.addCallback(showMailExchanges)
dl.addBoth(lambda _: reactor.stop())  # exit once every lookup has finished

reactor.run()

Neat. We only had to change two lines.

What Twisted’s “asynchronous semaphore” does is restrict how many of the calls made through sem.run can be in flight at once. In this case we give it two tokens: sem.run waits for a free token, calls lookupMailExchange, and hands the token back when the resulting Deferred fires. That means our script won’t try to look up the mail exchanges of more than two domains at once.

If we want to implement an application-wide semaphore, we can write a helper function that returns a global semaphore:

from twisted.internet import defer
from twisted.internet import reactor
from twisted.names.client import lookupMailExchange

domains = [
    'google.com',
    'yahoo.com',
    'microsoft.com',
    'facebook.com',
    'twitter.com'
    ]

def showMailExchanges(results):
    for success, value in results:
        if not success:
            continue  # skip failed lookups; value is a Failure
        ans, auth, add = value
        for x in ans:
            print("{0:15} {1}".format(str(x.name), x.payload.name))

theSemaphore = None
def getSemaphore():
    global theSemaphore
    if theSemaphore is None:
        theSemaphore = defer.DeferredSemaphore(2)
    return theSemaphore

deferreds = []
sem = getSemaphore()
for domain in domains:
    d = sem.run(lookupMailExchange, domain)
    deferreds.append(d)
dl = defer.DeferredList(deferreds)
dl.addCallback(showMailExchanges)
dl.addBoth(lambda _: reactor.stop())  # exit once every lookup has finished

reactor.run()

Now, as long as every part of the application gets its semaphore from this module’s getSemaphore function, the number of lookups that run at once is restricted to two, application-wide. Awesome.

We can even write a “semaphore map” to do away with the boilerplate looping and adding:

from twisted.internet import defer
from twisted.internet import reactor
from twisted.names.client import lookupMailExchange

domains = [
    'google.com',
    'yahoo.com',
    'microsoft.com',
    'facebook.com',
    'twitter.com'
    ]

def showMailExchanges(results):
    for success, value in results:
        if not success:
            continue  # skip failed lookups; value is a Failure
        ans, auth, add = value
        for x in ans:
            print("{0:15} {1}".format(str(x.name), x.payload.name))

theSemaphore = None
def getSemaphore():
    global theSemaphore
    if theSemaphore is None:
        theSemaphore = defer.DeferredSemaphore(2)
    return theSemaphore

def semMap(function, things, *args, **kwargs):
    assert callable(function)
    sem = getSemaphore()
    deferreds = []
    for x in things:
        d = sem.run(function, x, *args, **kwargs)
        deferreds.append(d)
    dl = defer.DeferredList(deferreds)
    return dl

dl = semMap(lookupMailExchange, domains)
dl.addCallback(showMailExchanges)
dl.addBoth(lambda _: reactor.stop())  # exit once every lookup has finished

reactor.run()

Pretty sweet, huh?

What Would You Do With Your Own Google?

Nice talk from O’Reilly OSCON Data 2011 by Steve Yegge on what programming/engineering activities are popular (cat pictures), and what should be popular (sequencing the human genome), and how some companies are trying to help facilitate a culture change.

Microsoft Releasing USB/DVD Tool Under the GPL

Grab your windbreaker jacket ’cause it’s a cold, stormy day in hell. Microsoft is going to release the Windows 7 USB/DVD download tool under the GNU General Public License.

Okay, so they might have a strong moral, even legal, obligation to do so, since the tool already contains GPL-licensed code. Still, it’s the thought that counts, right?

See Microsoft Open Source Community Manager Peter Galli’s official announcement and the original license violation claim by Rafael Rivera Jr. for more information.

Cdecl Tells You What Your C Code Means

The new, AJAX-powered version of the “C gibberish ↔ English” translation tool cdecl is a gem for any C programmer who has ever scratched their head wondering what a complex statement like char (((* const x[3])())[5])(int) or (double (^)(int , long long ))foo actually means. Give it a try!


The public domain source code for the tool is available, as well. Thanks ridiculous_fish!

Google Releases Its Own Programming Language, "Go"

Google has just announced the release of a new computer programming language named, quite creatively, Go. It is similar to the C family of imperative programming languages, but strives to be as simple in syntax as dynamic languages like Python and JavaScript. It aims to be very fast (both at compile time and at execution), safe, concurrent, memory-managed, speedy (in terms of developer productivity) and open source. Quite interestingly, some of its main authors are programming demigods Rob Pike and Ken Thompson.

Here is an example of a simple web server written in Golang:

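package main

import (
    "io"
    "net/http"
)

// HelloServer answers every request with a fixed greeting.
func HelloServer(w http.ResponseWriter, req *http.Request) {
    io.WriteString(w, "Hello world!\n")
}

func main() {
    http.Handle("/", http.HandlerFunc(HelloServer))
    // ListenAndServe blocks until the server fails;
    // binding to port 80 usually needs elevated privileges.
    err := http.ListenAndServe(":80", nil)
    if err != nil {
        panic("ListenAndServe: " + err.Error())
    }
}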

I’m not sure what I think about Go yet; all of its features look extremely attractive, but I can’t shake the feeling of something being “off” somehow. For instance, I’m not particularly fond of the function name capitalization, and there are no generics/templates in the language (yet). I suppose the former might largely be a matter of conditioning, though.

One aspect of Golang that has really impressed me is how fast it compiles; you really have to see it to believe it. Projects with several thousand lines of source code compile in less than a tenth of a second on an average workstation. Sadly, I think this might mean we won’t be having as many office chair sword duels in the future:

xkcd: Compiling

Have a look at the Google Go TechTalk for an introduction to and overview of the language.

Note that Go (or Golang) isn’t to be confused with the Go! Programming Language. Why Google chose an almost identical name, or why they chose such a common word at all, I don’t understand. I’m rather fond of the term ‘Golang’ since, well, it makes sense. Ericsson previously made its own language, Erlang, so it’s only a natural addition.