Filtered News

Over on LJ, Candid compares getting news from MSNBC with only getting it from the internet:

Besides, I'm sure if you watch those shows, all they ever talk about is the Presidential election. Whereas my way, I don't have to hear a word about it, and I can pretend that the only newsworthy things are hurricanes, gay state chief executives, and interviews with John Perry Barlow.

Since I read about all 3 of those things last week, while avoiding pretty much anything about the election, I found this particularly amusing. Ah lovely internet, nestled in thy bosom I can almost forget that a world exists outside my geeky libertarian circles...

More automated filtering, IMHO, is the Next Big Thing. RSS aggregators let you do it in a very coarse way by choosing which blogs you read. The next step (something I've done some work on) is clustering stories the way news.google.com does, combined with adaptive filtering.

I've started on a document clustering system in Python, but it's still very slow even though it is supposed to implement the most efficient algorithm in the literature. It needs a few tweaks. The purpose is to write an aggregator. Unfortunately, document clustering is infrastructure, not an application, so my desire for a document-clustering news aggregator is not quite sufficient to get me to finish the project. The infrastructure for parsing feeds and Bayesian filtering already exists, but I had to go on a document clustering kick :)
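For a rough idea of what clustering stories could look like, here is a minimal sketch. It is not the algorithm mentioned above (which isn't named here); it swaps in a naive greedy single-pass approach over bag-of-words vectors with cosine similarity. The names `tokenize`, `cosine`, `cluster_stories`, and the `threshold` value are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_stories(stories, threshold=0.3):
    """Greedy single-pass clustering (illustrative, not tuned).

    Each story joins the first existing cluster whose summed term vector
    is at least `threshold` similar to it; otherwise it starts a new
    cluster. Returns a list of clusters, each a list of story indices.
    """
    clusters = []  # list of (summed term Counter, [story indices])
    for index, story in enumerate(stories):
        vector = Counter(tokenize(story))
        for centroid, members in clusters:
            if cosine(vector, centroid) >= threshold:
                centroid.update(vector)  # fold the story into the cluster
                members.append(index)
                break
        else:
            # No cluster was similar enough, so start a new one.
            clusters.append((vector, [index]))
    return [members for _, members in clusters]
```

Calling `cluster_stories` on a list of headlines or story bodies gives back groups of indices; a real aggregator would presumably want TF-IDF weighting and stop-word removal on top of this, which the sketch leaves out.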

I think the simplest useful improvement would be to have a consistent category set and use it. I'd read a lot more blogs (and a ton more LJs) if I could restrict the categories of stuff I read about.

Adaptive filtering is promising too. It really shouldn't be hard to build an interesting / not interesting classifier based on the corpus of a week's blog-reading choices.
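As a sketch of how small such a classifier can be, here is a two-class multinomial naive Bayes filter with add-one smoothing. Naive Bayes is my assumption about what would fit here, and the class name `InterestFilter` and its methods are hypothetical; the training data would be whatever you did or didn't click on during the week.

```python
import math
import re
from collections import Counter


class InterestFilter:
    """Two-class multinomial naive Bayes with add-one smoothing (a sketch)."""

    LABELS = ("interesting", "not interesting")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        """Lowercase the text and split it into alphabetic word tokens."""
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        """Record one post the reader did (or did not) find interesting."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """Log of the smoothed class prior times the word likelihoods."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Smoothed prior, so a class with no training posts avoids log(0).
        score = math.log(
            (self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2)
        )
        for word in self.tokenize(text):
            # The extra +1 in the denominator reserves mass for unseen words.
            score += math.log((counts[word] + 1) / (total + len(vocab) + 1))
        return score

    def classify(self, text):
        """Return whichever label has the higher log score."""
        return max(self.LABELS, key=lambda label: self.log_score(text, label))
```

Usage would be something like calling `train(post_text, "interesting")` for every post you opened and `train(post_text, "not interesting")` for every one you skipped, then running `classify` on new posts as they arrive in the aggregator.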

This relates to what I want to do at Google: I want to build classifiers for search refinement. And perhaps eventually do document clustering on the entire web - but that's a much bigger, long-term project.

The LJ link is Forbidden.

Oops, it was friends-only. Thanks, I'll just link to his journal in general.