Play Framework and Upstart
I’ve been doing some work in Scala recently, specifically with the awesome Play Framework. More on that another time, but it’s been pretty great thus far. While setting up a production environment I ran into some difficulty getting Play’s “staged” production package working with Upstart on Ubuntu. I’ve typically used init scripts in the past and wasn’t too familiar with how things work in Upstart land. I found a couple of somewhat up-to-date Play examples but couldn’t get them to work out of the box. I narrowed the problem down to executing the staged “start” script. Since that script is rather simple, I rolled its contents into the Upstart job and bypassed it:
description "Play Production"
env USER=user
env GROUP=group
env HOME=/path/to/project
env JAVA=/usr/bin/java
env JAVA_OPTS="-cp ./staged/* play.core.server.NettyServer ./.."
env PORT=8000
env EXTRA="-Xms128M -Xmx512m -server"
start on filesystem or runlevel [2345]
stop on runlevel [!2345]
respawn
respawn limit 5 10
exec start-stop-daemon --start --chuid $USER:$GROUP --chdir $HOME \
--exec $JAVA -- -Dhttp.port=$PORT $EXTRA $JAVA_OPTS
This removes the complexity of deploying the “start” script and accommodating the forking sequence that occurs within.
This should be sufficient in cases where you’re behind a load balancer or proxy. If you need to bind to privileged ports then you’ll need to change things around a bit, although running a web server as root isn’t recommended anyway.
Django JSON Field
Some time ago I created django-json-field to scratch an itch. There were other widely used JSON model fields for Django, but all of them lacked what I consider key features. The most important of these is a JSON to Python decoder that allows direct manipulation of the JSON data as native Python objects. This lazy (de)serialization is compatible with the built-in Django Python to JSON serializer and thus integrates pretty seamlessly. It also includes a form field that supports evaluation of datetime objects for your convenience, along with a few other useful things.
This custom field also integrates very well with the PostgreSQL JSON data type which provides native JSON validation as well as the ability to index and query the JSON data.
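To make the lazy decoding idea concrete, here is a minimal standard-library sketch. It is not the code in django-json-field: the `LazyJSON` class and `_encode_extra` helper are hypothetical names used only for illustration. The sketch keeps the raw JSON text, decodes it on first access, and serializes datetime objects as ISO 8601 strings.

```python
import json
from datetime import datetime


def _encode_extra(obj):
    # Fallback encoder (hypothetical helper): datetimes become ISO 8601 strings.
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


class LazyJSON:
    """Hypothetical wrapper that decodes raw JSON text only when it is needed."""

    def __init__(self, raw):
        self._raw = raw
        self._value = None
        self._decoded = False

    @property
    def value(self):
        # Decode once, the first time the data is actually touched.
        if not self._decoded:
            self._value = json.loads(self._raw)
            self._decoded = True
        return self._value

    def dumps(self):
        # If nothing was ever decoded, the original text is still valid as-is.
        if not self._decoded:
            return self._raw
        return json.dumps(self._value, default=_encode_extra)
```

After `field = LazyJSON('{"tags": ["a"]}')`, nothing is parsed until `field.value` is read. From then on the data behaves like ordinary Python dicts and lists, and `field.dumps()` writes the modified structure back out.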
Check it out! Pull requests and suggestions are much appreciated.
2012 Presidential Election Sentiment Analysis
I’ve had an interest in natural-language processing for some time now, beginning in college where I did some research into writing software to analyze medical documents with the goal of learning facts and relationships at scale. I’m always looking for ways to exercise that interest and the 2012 presidential election was a prime candidate.
I began collecting Tweets on October 30th. By the time I stopped gathering Tweets on the evening of November 7th I had amassed nearly 2.5 million. While that is certainly nowhere near the total number of Tweets posted regarding Obama and/or Romney during this period, the sampling is both stable and relatively random, so it should still be statistically significant. That said, this is most definitely a light-hearted endeavor, so don’t take the results too seriously.
My goal was to measure sentiment regarding Obama and Romney on Twitter. The measurement consists of two parts: polarity (from -1 to 1), the positivity or negativity of the language, and subjectivity (from 0 to 1), how opinionated the language is. Both considered together give a clearer picture of what’s going on.
It’s important to note that these measures don’t consider the meaning of a text, per se, but rather the temperament of the language used. For example, they won’t necessarily detect sarcasm or snark. However, they do analyze sentence structure on a basic level. “Obama is a cool guy” will register as positive while “Obama is NOT a cool guy” will be negative, so it is somewhat clever.
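As a rough illustration of how a word-level scorer can still react to negation, here is a toy sketch. It is not Pattern’s algorithm: the lexicon values, the negation list, and the `toy_sentiment` function are all assumptions made up for this example.

```python
import re

# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
LEXICON = {
    "cool": (0.35, 0.65),
    "great": (0.8, 0.75),
    "awful": (-1.0, 1.0),
}
NEGATIONS = {"not", "never", "no"}


def toy_sentiment(text):
    """Return (polarity, subjectivity) averaged over the lexicon words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for word in words:
        if word in NEGATIONS:
            # Remember the negation until the next scored word.
            negate = True
            continue
        if word in LEXICON:
            polarity, subjectivity = LEXICON[word]
            # A preceding negation flips polarity but leaves subjectivity alone.
            scores.append((-polarity if negate else polarity, subjectivity))
            negate = False
    if not scores:
        # No known words: treat the text as neutral and objective.
        return 0.0, 0.0
    count = len(scores)
    return (
        sum(p for p, _ in scores) / count,
        sum(s for _, s in scores) / count,
    )
```

With this toy lexicon, “Obama is a cool guy” scores a positive polarity and “Obama is NOT a cool guy” scores the same magnitude with the sign flipped, while subjectivity is identical for both. Sarcasm would sail straight through, which is the limitation described above.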
Setup
The heavy lifting is mostly handled by the wonderful Pattern library. It provides a convenient method for accessing data from common sources (Twitter, Wikipedia, Facebook, etc.), and also advanced natural-language processing tools for measuring sentiment and much, much more.
The Pattern library is nice enough to stay just under Twitter usage limits. The program searches Twitter for posts containing Obama and/or Romney every 5 minutes, gathering 10 pages of 100 results each. This means every 5 minutes we can expect to get up to 1500 unique Tweets (determined by their URL). In reality it was less than this most of the time, suggesting that a good number of all the Tweets mentioning Obama and/or Romney during this period were captured.
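The de-duplication step can be sketched as follows. This is only an illustration under assumptions: each Tweet is represented as a dict with a `"url"` key, and the `merge_unique` function is a hypothetical helper, not part of Pattern or the original program.

```python
def merge_unique(seen, pages):
    """Add Tweets from `pages` to `seen` (keyed by URL); return how many were new."""
    added = 0
    for page in pages:
        for tweet in page:
            url = tweet["url"]
            # Consecutive polls overlap, so skip anything already stored.
            if url not in seen:
                seen[url] = tweet
                added += 1
    return added
```

Calling `merge_unique(seen, pages)` after each 5-minute poll returns the number of genuinely new Tweets. Comparing that number with the maximum a poll can return gives a rough sense of how much of the stream is being captured.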
I initially used SQLite as the database, which worked perfectly until the database grew to approximately a gigabyte in size, including indices and all. Queries then slowed down very noticeably. This was still a couple days before election day and I was worried that it wouldn’t be able to withstand the deluge of activity, so I migrated to a fresh PostgreSQL installation. Unsurprisingly this proved to be far more performant.
Results
Here are the interactive charts. Warning: this might be a bit sluggish on slower machines.
Charts: Polarity, Subjectivity, Volume
Reactions
Perhaps what I found most interesting was that although Obama had many more Tweets (1.6M vs 1.1M for Romney), their polarity and subjectivity scores were pretty similar. Inspecting the Tweets reveals a constant tug of war: lots of praise for one’s candidate of choice, but also plenty of name calling and general unpleasantness. Not terribly surprising, I’m sad to say.
Examining the charts reveals some interesting trends. Most obvious is the large bump at the right: election day and the subsequent Obama victory. As you would guess, Obama’s polarity shot up greatly at this point, and Romney’s decreased. Subjectivity also increased during this time, suggesting a lot of congratulatory (or inflammatory) Tweets. What becomes more obvious when viewing the interactive charts is the relationship between polarity and subjectivity. When a candidate’s subjectivity rises, their polarity tends to decrease. This means that when the overall discussion becomes more opinionated, those opinions are also more often negative than positive.
Overall, Obama’s polarity clocked in at an average of 0.061 and Romney’s at 0.049, giving Obama a pretty decisive upper hand on that front. Subjectivity was 0.299 and 0.302 respectively, essentially a tie.
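Averages like these, and an hourly time-line, can be produced by bucketing the scored Tweets by hour. The sketch below is an assumption about how one might do it, not the code used for the charts. The `(timestamp, polarity, subjectivity)` row layout and the `hourly_means` function are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(rows):
    """Group (timestamp, polarity, subjectivity) rows by hour and average each."""
    buckets = defaultdict(list)
    for timestamp, polarity, subjectivity in rows:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append((polarity, subjectivity))

    result = {}
    for hour, scores in sorted(buckets.items()):
        count = len(scores)
        result[hour] = {
            "volume": count,
            "polarity": sum(p for p, _ in scores) / count,
            "subjectivity": sum(s for _, s in scores) / count,
        }
    return result
```

Running this once per candidate yields the three series shown in the charts (polarity, subjectivity, and volume). Averaging over all rows instead of per hour gives overall figures like the ones quoted above.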
Data
- Hourly time-line: JSON (34K)
- Raw Tweet data: CSV (171M)
Welcome!
Your regularly scheduled content will be arriving shortly…


