Harvester software, v1.5

Surfing 24h/day, so you don't have to


Christina Norman, who's known me since university (and was the first to crack the "learning by Os-Moses" joke), has described me as a Master Surfer. It's true that I've already seen any office joke before it gets passed to me (as paradoxical as that sounds), and I can be a fountain of trivial current events. This has been compounded by my interest in current affairs since the events in New York City one particular September (it's surprising how few people read the newspaper and make connections).

Keeping up with the memes and viral media that populate popular culture can be a chore, and if you're using the Internet as your primary information source (which is like saying "using the printed word"), it involves a lot of repetitive tasks. So, I wrote software to help me with this. At first it was only something to check a members-only media site for updated material, but it's grown into something more like an agent.

The software was broken for a year (like poor, poor TinGoth), but eventually I fixed it up, and after a year of not programming a thing (I didn't go to alt.goth.convergence that year, so no photography database) and only fixing things as a sysadmin... I got bitten by the bug and started writing. Okay, and it took a bottle of not very good red wine. The Harvester software now enjoys multiple sources for memes and viral media, uses Perl's abilities for object-oriented programming to make development faster, and does its error-trapping in eval { } blocks to make it more resilient.

I know I'm being a serious tease by not posting the source code here, but I worry that some of the websites I've been using as meme sources will be upset that I'm not looking at the ad banners, pop-ups and pop-unders that they feel are necessary to pay their bills. If you know me, then you can ask me for a Harvester CD: in a week, the Harvester collects between 500 and 700 megabytes of memetic media. I usually burn it to two CDs and date them; one for my own collection (I make video mix tapes with the video material to show at parties), and one to give away to somebody if I feel like it. I've also written something that generates an RSS file pointing to the latest material downloaded.
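Since the source isn't posted, here is a minimal sketch of what an RSS file pointing to the latest downloads could look like. It is written in Python rather than the Harvester's Perl, and everything in it is an assumption: the function name build_rss, the channel title, and the base_url the files would be served from are all hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from email.utils import formatdate
from urllib.parse import quote


def build_rss(directory, base_url, limit=20):
    """Return RSS 2.0 XML (as a string) listing the newest files in `directory`."""
    paths = [os.path.join(directory, f) for f in os.listdir(directory)]
    files = [p for p in paths if os.path.isfile(p)]
    files.sort(key=os.path.getmtime, reverse=True)  # newest first

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Channel metadata below is made up for the example.
    ET.SubElement(channel, "title").text = "Harvester: latest downloads"
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Most recently fetched media"

    for path in files[:limit]:
        name = os.path.basename(path)
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = name
        ET.SubElement(item, "link").text = base_url + quote(name)
        # pubDate is taken from the file's mtime, formatted per RFC 2822.
        ET.SubElement(item, "pubDate").text = formatdate(os.path.getmtime(path))
    return ET.tostring(rss, encoding="unicode")
```

Sorting by mtime means the feed needs no database of its own; the download directory is the only state.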

If you would like to see a related project, check out Andrew Medico's dailystrips.

Harvester Doccos, notes, such.

Idea: Goes to websites on a regular basis, fetches interesting
 media and dumps it in a local dir for perusing later.

Implementation: cronjob that runs every 2-3 hours, uses independent parts
 (one for each source of media) that each have the proper intelligence
 (trathborne called it "business logic") to find the media and download
 it if it hasn't already been downloaded.  Each of these parts also knows
 not to hit the website too often.

  Know the difference between no modules worth loading and no modules with
  unexpired semaphores.
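The throttling idea can be sketched roughly as follows. This is Python, not the Harvester's Perl, and it is only an illustration under assumptions: the class names Source and ExampleSource, the two-hour min_interval, and the status strings returned by run_all are all hypothetical. The one detail taken from the notes is that each part decides whether it is due by looking at the mtime of a semaphore file.

```python
import os
import time


class Source:
    """Hypothetical base class: one subclass per media source."""

    name = "base"
    min_interval = 2 * 60 * 60  # seconds between visits (assumed value)

    def __init__(self, state_dir):
        self.semaphore = os.path.join(state_dir, self.name + ".semaphore")

    def is_due(self, now=None):
        """True if the semaphore is missing or at least min_interval old."""
        now = time.time() if now is None else now
        try:
            last_run = os.path.getmtime(self.semaphore)
        except FileNotFoundError:
            return True  # never run before
        return now - last_run >= self.min_interval

    def touch(self):
        """Record this visit by creating the semaphore or resetting its mtime."""
        with open(self.semaphore, "a"):
            pass
        os.utime(self.semaphore, None)

    def fetch(self):
        raise NotImplementedError  # site-specific logic lives in subclasses

    def run(self):
        """Fetch only when due; return True if a fetch happened."""
        if not self.is_due():
            return False
        self.fetch()
        self.touch()
        return True


class ExampleSource(Source):
    name = "example"

    def __init__(self, state_dir):
        super().__init__(state_dir)
        self.fetched = 0

    def fetch(self):
        self.fetched += 1  # stand-in for the real download logic


def run_all(sources):
    """Report 'nothing installed' separately from 'everything still throttled'."""
    if not sources:
        return ("no modules", [])
    ran = [s.name for s in sources if s.run()]
    return ("ran", ran) if ran else ("all throttled", [])
```

Keeping the semaphore check in the base class means a new source only has to supply a name and a fetch method, and the cronjob can fire as often as it likes without any one site being visited more than once per interval.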

  version 0.7: Fetches only pictures from filepile.org
  version 0.8: Fetches pictures, movies and text from filepile.org, and
    uses the cookie for my membership (good thing I got in early; no more
  version 0.9: Torrez introduced some anti-leeching methods in filepile.
  version 1.0: Torrez changed the anti-leeching methods so v0.9 was
    busted and I didn't fix it for a year.  This new version uses
    referers and parses a series of links for each file.
    v1.0 is dedicated to the three things that restored my programmer's
    erection: meditation, Linux 2.6.0-test2 kernel, and DanceDance
    Revolution.  2003-08-08
  version 1.1: moved the FilePile-specific material to a separate
    OOP module, in anticipation of making more modules for different
    sites.  Other places I might make modules for are listed below.
    Renamed the program "harvester", since it's no longer just about
    filepile. 2003-10-13
  version 1.2: deleted v1.2 by accident.  Sh*t.  Anyways, new features
    that I'm going to have to rewrite are: loads modules from a directory,
    and each module checks to make sure it's not being invoked too often
    by checking the mtime on a semaphore file.  Moving these into modules
    makes it much easier and faster to include new sources.  2003-10-29
  version 1.3: since I deleted 1.2, might as well go to 1.3 and make the
    loaded modules inherit from a base class so I don't have to
    cut-and-paste the semaphore code every time.  Although I really
    wanted to avoid using eval(), I'm going to put the require() statement
    inside an eval{} so a broken module won't halt the entire thing.
    I will need to rewrite the modules now.  Also added command line
    switches so less will have to be configured by constants defined in
    the core program. 2003-11-12
  version 1.4: now comes with an RSS generator. 2004-02-17
  version 1.5: 2004-10-28: added prototype class for pulling images out
    of RSS feeds.
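
The v1.2/v1.3 notes describe loading modules from a directory and wrapping the require() in an eval{} so a broken module won't halt the entire thing. A rough Python analogue of that pattern is sketched below, with try/except standing in for Perl's eval{} and importlib standing in for require(). The function name load_modules and its return shape are hypothetical, not taken from the Harvester.

```python
import importlib.util
import os


def load_modules(directory):
    """Import every .py file in `directory`, skipping any that fail to load.

    Returns (loaded, failed): `loaded` maps module name to module object,
    `failed` maps module name to a short description of the error.
    """
    loaded, failed = {}, {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".py"):
            continue
        name = filename[:-3]
        path = os.path.join(directory, filename)
        try:
            spec = importlib.util.spec_from_file_location(name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)  # runs the module's top-level code
        except Exception as err:
            # A broken module is noted and skipped; the others still load.
            failed[name] = f"{type(err).__name__}: {err}"
            continue
        loaded[name] = module
    return loaded, failed
```

Returning the failures instead of printing them lets the caller decide whether a broken source is worth a line in the cron mail or can be ignored until the next run.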