rsscluster

June 30, 2013 at 03:15 PM | categories: software

A year ago I went to a Python meetup where William Bert gave a presentation on gensim, a "topic modelling" library for Python. One of the most practical uses of topic modelling is finding similar documents in a large corpus. For example, where Google Search takes a query and returns documents matching it, Google News groups stories from multiple sources into clusters about the same topic or event.

In memory of Google Reader, which shuts down tomorrow, I present rsscluster, a small script that demonstrates how to use the gensim library. One problem I ran into when I wanted to play with gensim was finding a large corpus of documents with which to populate the database. When Google Reader announced it was closing and I looked for a new home for my feeds, I realized they made a great corpus to play with.

My typical set of feeds contains about 3000 stories, which isn't a huge corpus, but enough to play with. Crucially, there are a number of feeds I subscribe to which are likely to produce "similar" documents. For example, Ars Technica and The Verge will probably both have stories about the latest Apple keynote, while The Washington Post and NPR will both cover recent Supreme Court decisions.

You can see how well rsscluster did at finding stories similar to the ones published in my feeds today (June 30th) here. Considering how little effort I put in and how small the corpus is, it did a pretty good job of detecting clusters around recent events (Edward Snowden and the NSA, a presidential tour of Africa, John Kerry speaking on the Middle East). It did a slightly worse job with Supreme Court stories: the Court has been very active lately and has ruled on a lot of disparate issues, so gensim lumped those stories together more than it should have. Finally, some of my feeds just aren't very semantically rich, and it decided to cluster a bunch of Steam sales together. One feed is particularly degenerate: every entry contains only the word "NO". Gensim dutifully clustered all of those documents together, but they probably don't belong in the corpus at all.

Again, this is a really naive use of gensim (I spent more time parsing command-line arguments than actually exercising the library), but it got the urge to test gensim out of my system, and it demonstrates how easy it is to set up a pretty powerful document similarity search engine. Hopefully it inspires someone else to play with gensim and find a better scenario in which to exercise it.
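For readers curious about what "finding similar documents" means mechanically, here is a minimal sketch in plain Python. It is not the rsscluster code and does not use gensim. Instead it swaps in simple TF-IDF weighting with cosine similarity, which is one common way to score how alike two documents are. gensim provides its own TF-IDF model and similarity indexes. The function names and the sample headlines are hypothetical and are chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A term found in every document gets idf = log(1) = 0, so it adds nothing.
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(docs, query_index, top_n=3):
    """Rank the other documents by cosine similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [(i, cosine(query, v)) for i, v in enumerate(vectors) if i != query_index]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Hypothetical headlines standing in for RSS entries.
    stories = [
        "Snowden leaks reveal NSA surveillance program",
        "NSA surveillance program defended after Snowden leaks",
        "Supreme Court rules on voting rights act",
        "Steam summer sale discounts hundreds of games",
    ]
    for i, score in most_similar(stories, 0):
        print(f"{score:.2f}  {stories[i]}")
```

When you run it as a script, the other NSA headline should rank first for the Snowden query. The Supreme Court and Steam headlines score 0.0 because they share no terms with it. Entries that consist of nothing but the same single word produce identical vectors, so they score a perfect match against each other. In this toy setting, that mirrors the degenerate "NO" feed described in the post.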

See it on GitHub.

I Just Played Little Inferno

June 21, 2013 at 09:45 PM | categories: games

Little Inferno is an entertaining little morsel of a game. It sustains its simplistic "gameplay" (which mainly consists of adventure-game inventory combinations) with a dark, almost morbid sense of humor: the entire game has you buying toys just so you can immediately throw them in a fire, making room to buy more toys. The social commentary isn't exactly subtle, but it's effective. The writing and the foreboding soundtrack elevate the game, so from the very first toy you throw in the fire, you know it is more ambitious than its mechanics suggest. Still, this may be to the game's detriment, since it has you expecting dark twists from the start. By the time the game ended, after all the foreshadowing leading up to the finale, the story seemed almost conventional.

If you picked up this game as part of a Humble Bundle, I recommend actually playing it through to the end. It only takes a couple of hours and is entirely worth the time.

I Just Watched The Queen of Versailles

April 29, 2013 at 09:40 PM | categories: movies

If someone ever makes a documentary about my life, I want it to be made by the people behind The Queen of Versailles. Anyone who can make me feel empathy for the subjects of this documentary has a fair amount of talent. The documentary begins by following the comically ostentatious lifestyle of timeshare real estate moguls as they live out the worst nouveau riche stereotypes and build the largest home in America.

Halfway through filming, though, the stock market crashes and the documentary becomes something very different. The family vacillates between total obliviousness to the extent of their privilege and occasional moments of brutal self-awareness. This makes it impossible to write them off completely as evil one-percenters, even as they eat caviar by the $2,000 tubful. I recommend the documentary both as an intelligent explanation of the housing bubble and crisis and as a character study of people whom most of us despise (with good reason) but who are still more complicated than we give them credit for.

I Just Watched Django Unchained

April 25, 2013 at 07:51 PM | categories: movies

Over the weekend I caught up with Django Unchained. Like every other Tarantino movie, it was a supremely enjoyable watch. It retreads a lot of the ground from Inglourious Basterds, but I'm really starting to enjoy the whole "historical revenge fantasy" genre Tarantino is creating.

Christoph Waltz and Sam Jackson give great, showy performances, and for a while Jamie Foxx almost gets lost in his own movie. Still, the only out-of-place performance was Tarantino's own. At this point, he's gotta be trolling us. As great a director as he is, he has to know he's not a very good actor, so giving himself accent work just seems like another of his "in-jokes."

Since watching this, I've added Shaft and Coffy to my Netflix queue. It's time I watched those.

I Just Played Bioshock Infinite

April 14, 2013 at 07:52 PM | categories: games

I just finished Bioshock Infinite, and you should have too.

I don't want to get too deep into it lest there be spoilers, but I'll say that the game has the best music direction since Bastion and the best art direction since, well, Bioshock. Setting the story in an early 20th-century dystopia of American Exceptionalism is really refreshing, and it makes me wonder how many other times and places in the planet's history go underserved by modern media.

Also, any game that gets me to refresh my memory of the Boxer Rebellion and Wounded Knee (without coming off as pretentious in doing so) must be doing something right.