05:19 pm, 9 Mar 04
bleargh
I've been considering writing an aggregator in OCaml, because, y'know, there aren't a million different aggregators already. Oh wait, yes there are. But I want to make one that is not web-based and works especially well with LiveJournal, because that's where almost everyone I read is.
Trying to reinvent the fewest wheels, I found MyRSS, an OCaml library for parsing RSS. (Not to be confused with the million other "MyRSS" programs out there, such as MyRSS, a Python aggregator.)
They use XML light, which I pointed at a feed or two until I discovered it doesn't even support CDATA. Um.
I sent the XML light author a patch, but it really seems to me that if I were making an XML lexer/parser I'd start with the (incredibly verbose) XML spec and just follow it. But I guess it is a "light" parser.
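For reference, a CDATA section is just character data wrapped in `<![CDATA[` and `]]>` that the parser should pass through unparsed. Here is a minimal sketch of a pre-processing workaround for a parser that lacks CDATA support. It is written in Python for brevity and is not the patch mentioned above. The function name `cdata_to_text` is made up for illustration, and the regex would also rewrite CDATA-looking text inside comments or processing instructions, which a real lexer would not.

```python
import re
from xml.sax.saxutils import escape

# A CDATA section runs from "<![CDATA[" to the first "]]>".
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def cdata_to_text(xml):
    """Rewrite each CDATA section as ordinary escaped character data.

    The result carries the same text content, but a parser that does not
    know about CDATA can now read it.
    """
    return CDATA_RE.sub(lambda m: escape(m.group(1)), xml)
```

Feeding a feed through this before handing it to the parser turns `<![CDATA[<b>hi</b>]]>` into `&lt;b&gt;hi&lt;/b&gt;`, which any XML parser should decode back to the same text.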
Blah, wasting time.
Jeff Waugh always sends encouraging mail to the GNOME lists, keeping people motivated. Whenever I read something inspiring via DWN or whatever, I try to remember to send "thanks" emails to the people that are doing thankless work on my behalf.
• Non-paid users have severely truncated posts
• You have to wangle together an LJ login cookie to send with the RSS request in order to see any non-public posts
Also, I think it's annoying because
• You have to poll n different RSS feeds, one per friend, which defeats the nice aggregation the regular friends-page already does for you.
Do you know if any of those problems are going to be addressed?
LJ login cookie: yeah, part of the show. Not that bad, I hope. I can't think of any other solution. (Other than that they were discussing authentication on the Atom list when I abandoned it.)
Polling different feeds: yes, that is a problem, sorta. But I look at it from LJ's perspective: LJ has to "poll" all of those feeds whenever it generates a friend view, so if anything I'm making LJ's load lighter. And I bypass the style system, etc., which is even less load. And I can poll infrequently-updating users more infrequently, etc. etc.
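The "poll infrequently-updating users more infrequently" idea could look something like the sketch below: halve the wait when a poll finds new entries, double it when it finds nothing. The function name and the 15-minute and 24-hour bounds are assumptions picked for illustration, not anything LJ specifies.

```python
def next_poll_interval(current, had_new_entries, minimum=900, maximum=86400):
    """Return the number of seconds to wait before polling a feed again.

    current         -- the interval (in seconds) used for the last poll
    had_new_entries -- True if the last poll returned unseen entries
    minimum/maximum -- hypothetical bounds: 15 minutes and 24 hours
    """
    if had_new_entries:
        # Active journal: check back sooner, but never below the floor.
        return max(minimum, current // 2)
    # Quiet journal: back off, but never beyond the ceiling.
    return min(maximum, current * 2)
```

A journal that posts once a week drifts toward one poll a day, while a chatty one settles near the floor.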
Authentication: Heck, couldn't LJ just use HTTP-Auth? That's what that [admittedly really annoying] Dare Obasanjo was bugging you about a few months ago, and he has a point.
Polling: Doesn't generating an RSS feed go through something not unlike the style system?
And I remembered another gripe about getting LJ via RSS: You don't get all the nice metadata like mood/music/userpic...
Generating an RSS feed queries the same backends as the style system, but the RSS generation is explicit. (It's not a custom style or anything like that, though that would seem to make a whole lot of sense. The layers of code don't hook up that way in an easy manner.)
Brad put up my "new" livejournal.org (that I wrote months ago) that includes a spec for our own RSS (and Atom, I imagine) namespace: http://www.livejournal.org/rss/lj/1.0/
It's only currently used by the latest-rss feed, which is already doing some ugly hacks to shoehorn in the aggregate LJ output (per-post usernames are going into the subject field or something like that).
If I were serious about the aggregator, I'd probably extend this namespace with more metadata and then send in LJ patches to improve the RSS/Atom output.
False:
http://www.livejournal.com/users/brad/data/rss?auth=digest
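Assuming that URL answers with a standard HTTP Digest challenge (which is what the `auth=digest` parameter suggests, though nothing here documents it), a client could fetch it roughly like this. The sketch uses Python's standard `urllib.request`; the helper name, username and password are placeholders.

```python
import urllib.request


def make_digest_opener(feed_url, username, password):
    """Build a URL opener that answers HTTP Digest challenges for feed_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm at this URL".
    password_mgr.add_password(None, feed_url, username, password)
    handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)
```

Usage would be along the lines of `make_digest_opener(url, "someuser", "secret").open(url).read()`, with the handler retrying the request after the server's 401 challenge.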
Uh, was this announced anyplace, like lj_dev or lj_clients?
And is it documented? It's going to be hard for people and aggregators to find since the LINK tag on the journal page doesn't point to that URL.
Mainly because I was thinking about applying filters. Bayesian filters to separate the wheat from the chaff. And something to detect stale meme propagation. Something that realises that oh, you've seen that link before. I won't show it to you again. Handy when a meme spreads like wildfire.
One day LJAssassin will Exist to purge "What kind of toaster am I?" from my friends list.
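The "you've seen that link before" part is the easy half. A rough sketch, assuming entries arrive as HTML and that an entry counts as stale only when every link in it has already appeared in an earlier entry; the class name and the naive `href` regex are illustrative, not taken from any existing tool.

```python
import re

# Naive link extraction; a real aggregator would use an HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"')


class SeenLinks:
    """Remember every link shown so far and flag entries that add nothing new."""

    def __init__(self):
        self._seen = set()

    def is_stale(self, entry_html):
        links = set(LINK_RE.findall(entry_html))
        # Stale only if the entry has links and all of them were seen before.
        stale = bool(links) and links <= self._seen
        self._seen |= links
        return stale
```

The first friend to post the toaster quiz gets through; everyone who reposts the same link afterwards gets flagged. Entries with no links at all are never flagged, so ordinary posts are left alone.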
Filters are a good idea! You should learn OCaml. :P