Over on LJ, Candid compares getting news from MSNBC with only getting it from the internet:

Besides, I'm sure if you watch those shows, all they ever talk about is the Presidential election. Whereas my way, I don't have to hear a word about it, and I can pretend that the only newsworthy things are hurricanes, gay state chief executives, and interviews with John Perry Barlow.

Since I read about all 3 of those things last week, while avoiding pretty much anything about the election, I found this particularly amusing. Ah lovely internet, nestled in thy bosom I can almost forget that a world exists outside my geeky libertarian circles...

More automated filtering, IMHO, is the Next Big Thing. RSS aggregators let you do it in a very coarse way by choosing which blogs you read. The next step (something I've done some work on) is clustering stories the way does and adaptive filtering.

I've started on a document clustering system in Python, but it's still very slow even though it is supposed to implement the most efficient algorithm in the literature. It needs a few tweaks. The purpose is to write an aggregator. Unfortunately, document clustering is infrastructure, not an application, so my desire for a document-clustering news aggregator is not quite sufficient to get me to finish the project. The infrastructure for parsing feeds and Bayesian filtering already exists, but I had to go on a document clustering kick :)
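For anyone wondering what story clustering looks like in its most basic form, here is a rough sketch. It is not the algorithm mentioned above (that one isn't named here), just a naive single-pass scheme that compares word counts by cosine similarity. The names tokenize, cosine and cluster, and the 0.3 threshold, are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.3):
    """Single-pass clustering: each document joins the most similar
    existing cluster, or starts a new one if nothing reaches the
    (arbitrary, illustrative) similarity threshold."""
    clusters = []  # each entry: (summed word counts, [document indices])
    for index, doc in enumerate(docs):
        vec = Counter(tokenize(doc))
        best, best_sim = None, threshold
        for candidate in clusters:
            sim = cosine(vec, candidate[0])
            if sim >= best_sim:
                best, best_sim = candidate, sim
        if best is None:
            clusters.append((vec, [index]))
        else:
            best[0].update(vec)    # fold the document into the centroid
            best[1].append(index)
    return [members for _, members in clusters]


stories = [
    "hurricane hits florida coast",
    "florida coast braces for hurricane",
    "john perry barlow interview",
]
groups = cluster(stories)
```

A single pass like this is cheap but depends on the order the stories arrive in, so treat it as a toy rather than something an aggregator should ship with.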

I think the simplest useful improvement would be to have a consistent category set and use it. I'd read a lot more blogs (and a ton more LJs) if I could restrict the category of stuff I read about.

Adaptive filtering is promising too. It really shouldn't be hard to build an interesting / not interesting classifier based on the corpus of a week's blog-reading choices.
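To make that concrete, here is a minimal sketch of such a classifier, assuming a plain naive Bayes model over word counts with add-one smoothing. The InterestClassifier class, the "interesting" and "boring" labels, and the training sentences are all invented for illustration; a real version would be trained on the posts you actually clicked on or skipped during the week.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class InterestClassifier:
    """Naive Bayes over word counts with two hypothetical labels."""

    LABELS = ("interesting", "boring")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one post that the reader found interesting or boring."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def score(self, text, label):
        """Log-probability of the label given the text, up to a constant."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = len(set().union(*self.word_counts.values()))
        # Smoothed prior so an untrained label never gets log(0).
        logp = math.log((self.doc_counts[label] + 1)
                        / (sum(self.doc_counts.values()) + len(self.LABELS)))
        for word in tokenize(text):
            # Add-one smoothing, with one extra slot for unseen words.
            logp += math.log((counts[word] + 1) / (total + vocab + 1))
        return logp

    def classify(self, text):
        """Return whichever label scores higher for this text."""
        return max(self.LABELS, key=lambda label: self.score(text, label))


reader = InterestClassifier()
reader.train("hurricane forecast florida", "interesting")
reader.train("john perry barlow interview", "interesting")
reader.train("presidential election poll", "boring")
reader.train("election debate candidates", "boring")

verdict_election = reader.classify("new election poll")
verdict_hurricane = reader.classify("hurricane interview")
```

With only four training posts this proves nothing about accuracy; the point is just that the moving parts are small.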

This relates to what I want to do at Google: I want to build classifiers for search refinement, and perhaps eventually do document clustering on the entire web, but that's a much bigger, long-term project.

