Evan Martin (evan) wrote in evan_tech,

how to rip media from web pages / summary of software

Start by finding the stream URL. View the page source, find the link to the media, and fetch it with wget; then read the file you get and see whether it just forwards you to another URL. Eventually you should end up either with the file you want or with a reference to a protocol wget doesn’t support.
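If wget brings back a small text metafile instead of the media itself (for example, an ASX playlist behind a Windows Media link), the next URL is sitting in it as plain text. Here is a minimal sketch for pulling those URLs out. It assumes the file is text and treats http, https and ftp as the schemes wget handles. The function name and example URLs are made up for illustration:

```python
import re

# scheme://rest, stopping at whitespace, quotes or angle brackets (as in ASX/XML attributes)
URL_RE = re.compile(r"\b([a-zA-Z][a-zA-Z0-9+.-]*)://[^\s\"'<>]+")
WGET_SCHEMES = {"http", "https", "ftp"}  # assumption: the schemes plain wget can fetch


def find_stream_urls(text):
    """Return (url, fetchable_by_wget) pairs for every URL found in a metafile's text."""
    results = []
    for match in URL_RE.finditer(text):
        scheme = match.group(1).lower()
        results.append((match.group(0), scheme in WGET_SCHEMES))
    return results


if __name__ == "__main__":
    asx = '<asx version="3.0"><entry><ref href="mms://media.example.com/show.wmv"/></entry></asx>'
    for url, ok in find_stream_urls(asx):
        print(url, "wget can fetch" if ok else "needs another tool")
```

Anything flagged False (an mms:// or rtsp:// link, say) is where wget stops and another tool has to take over.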

Many streams use the Windows Media “mms” protocol or Real’s “rtsp” (and there’s another one?). These aren’t just for obfuscation: they can stream over UDP and vary what they send based on your real-time bandwidth. Unfortunately, that also means few tools support downloading them.

With Real, you’re currently out of luck. (When I investigated this before, I seem to recall reading that Real is switching to a new standards-based streaming format that might be easier to steal; however, most existing Real streams will keep using the old format.) For Windows Media, read on.

Start by grabbing mmsclient and using it to download the ASF file. ASF is Yet Another Proprietary Format, patented by Microsoft; it appears to work like AVI or QuickTime, acting as a container for data from real codecs. Despite Microsoft’s legal threats, many open source players now support ASF; I’ve had success with both mplayer and totem (for totem, the stock Debian package was enough).
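You can check the first 16 bytes to tell whether a download actually produced ASF and not, say, another redirect file or an HTML error page. An ASF file begins with the ASF Header Object, whose GUID is 75B22630-668E-11CF-A6D9-00AA0062CE6C, stored in little-endian byte order. Here is a small sketch of that check; the helper name is hypothetical:

```python
import uuid

# ASF Header Object GUID; ASF stores GUIDs in little-endian (bytes_le) order.
ASF_HEADER_GUID = uuid.UUID("75B22630-668E-11CF-A6D9-00AA0062CE6C").bytes_le


def looks_like_asf(path):
    """True if the file starts with the ASF Header Object GUID."""
    with open(path, "rb") as f:
        return f.read(16) == ASF_HEADER_GUID


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "ASF" if looks_like_asf(name) else "not ASF")
```

The check only looks at the header, so it says nothing about which codecs are inside; mplayer or totem will tell you that when they play it.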

Converting between codecs is a messy business: there are a million different codecs with a million settings, and a million different programs with slightly different settings. In the beginning, everyone wrote their own implementations and stole from each other; more recently, people appear to have more or less standardized on the ffmpeg and avifile libraries for codecs that don’t already have their own libraries (Ogg, for example, does). On top of those, transcode is often mentioned as the program to use to drive the libraries (and, as you can see, there are a lot of them).

In my case, I had ASF audio that I wanted to convert to a usable audio format. I tried to make transcode work but failed. Eventually I realized that ffmpeg can do simple conversions itself, and the simple command ffmpeg -i input.asf output.wav converted my 260 MB ASF into a 3.8 GB WAV.
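The size jump is expected. Uncompressed PCM WAV grows by sample rate × channels × bytes per sample every second. The post doesn’t say what ffmpeg produced, since that depends on the source, so assume 44.1 kHz 16-bit stereo. At those settings the rate is 176,400 bytes per second, about 635 MB per hour, so 3.8 GB would be roughly six hours of audio. Here is a quick sketch of the arithmetic, assuming a 44-byte canonical WAV header:

```python
HEADER_BYTES = 44  # assumption: canonical RIFF/WAVE header with no extra chunks


def pcm_wav_bytes(seconds, rate=44100, channels=2, bits=16):
    """Approximate size in bytes of an uncompressed PCM WAV of the given duration."""
    return int(seconds * rate * channels * (bits // 8)) + HEADER_BYTES


def pcm_wav_seconds(size_bytes, rate=44100, channels=2, bits=16):
    """Invert pcm_wav_bytes: estimate duration in seconds from file size."""
    return (size_bytes - HEADER_BYTES) / (rate * channels * (bits // 8))


if __name__ == "__main__":
    print(pcm_wav_bytes(3600))             # one hour at the assumed settings
    print(pcm_wav_seconds(3.8e9) / 3600)   # hours in a 3.8 GB file
```

With different settings (mono, or a lower sample rate), pass them in; the ratio to the compressed ASF changes accordingly.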

From there, I wanted to clip out the relevant portion of this program. Unfortunately, I couldn’t find any audio editor that could load a file that large. I assume most tried to load the whole file into memory, or used a signed 32-bit integer to represent the file size. (I also tried splitting the ASF before converting it to WAV, but that didn’t work for some reason.) After a lot more fruitless searching, I realized that good old sox has a trim command. I shrank the file down to about 500 MB with that, then loaded it into Audacity, which appears simple but powerful and works well. It also let me set envelopes at the beginning and end of the clip so it wouldn’t start or stop abruptly.
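The signed 32-bit guess is at least plausible. A 3.8 GB file is well past 2,147,483,647 bytes (2^31 − 1), the largest value a signed 32-bit integer can hold. sox avoids the problem by streaming: it reads and writes a little at a time and never holds the whole file. The sketch below does the same thing with Python’s standard wave module. It assumes a plain PCM WAV that wave can open, and it is not how sox itself is implemented. The optional linear fade is a stand-in for Audacity’s envelopes and only handles 16-bit samples. The function names are hypothetical.

```python
import array
import sys
import wave

CHUNK_FRAMES = 65536  # frames copied per read/write, so memory use stays small


def _fade_chunk(data, nchannels, first_frame, total_frames, fade_frames):
    """Apply a linear fade-in/fade-out to a chunk of 16-bit PCM frames."""
    samples = array.array("h", data)
    if sys.byteorder == "big":  # WAV stores samples little-endian
        samples.byteswap()
    for i in range(len(samples)):
        f = first_frame + i // nchannels  # frame index within the output clip
        gain = min(1.0, f / fade_frames, (total_frames - 1 - f) / fade_frames)
        samples[i] = int(samples[i] * gain)
    if sys.byteorder == "big":
        samples.byteswap()
    return samples.tobytes()


def trim_wav(src_path, dst_path, start_s, length_s, fade_s=0.0):
    """Copy length_s seconds starting at start_s from src_path into dst_path.

    Streams the audio in fixed-size chunks instead of loading the whole file.
    If fade_s > 0 (16-bit PCM only), ramps the clip's start and end linearly.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        nchannels = src.getnchannels()
        frame_size = src.getsampwidth() * nchannels
        first = min(int(start_s * rate), src.getnframes())
        count = min(int(length_s * rate), src.getnframes() - first)
        fade_frames = int(fade_s * rate)
        if fade_frames and src.getsampwidth() != 2:
            raise ValueError("fades are only implemented for 16-bit PCM")
        src.setpos(first)
        with wave.open(dst_path, "wb") as dst:
            # Same channels/width/rate; wave patches the header's frame count on close.
            dst.setparams(src.getparams())
            done = 0
            while done < count:
                data = src.readframes(min(CHUNK_FRAMES, count - done))
                if not data:
                    break
                n = len(data) // frame_size
                # Only chunks overlapping the first or last fade_frames need rescaling.
                if fade_frames and (done < fade_frames or done + n > count - fade_frames):
                    data = _fade_chunk(data, nchannels, done, count, fade_frames)
                dst.writeframes(data)
                done += n
```

For example, trim_wav("program.wav", "clip.wav", 3600, 1800, fade_s=2.0) would keep half an hour starting one hour in, with two-second ramps at each end. The equivalent sox cut without fades is sox program.wav clip.wav trim 1:00:00 30:00.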

Finally, I encoded that WAV with oggenc down to a 39 MB file. (Audacity supports Ogg export as well, but I wanted to set the Ogg headers on the file myself.) Hooray!
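If “headers” here means the metadata tags, oggenc sets the Vorbis comment fields from its command line. It takes -t for the title, -a for the artist, -l for the album, and -c for an arbitrary tag=value pair. It also takes -q for quality and -o for the output name. The sketch below builds such a command. The tag values and the quality level are placeholders, not what the post used:

```python
import shlex


def oggenc_command(wav_path, ogg_path, title=None, artist=None, album=None,
                   comments=None, quality=3):
    """Build (but don't run) an oggenc argument list that also sets Vorbis comments."""
    cmd = ["oggenc", "-q", str(quality), "-o", ogg_path]
    # Each of these flags writes one comment field into the Ogg Vorbis header.
    if title:
        cmd += ["-t", title]
    if artist:
        cmd += ["-a", artist]
    if album:
        cmd += ["-l", album]
    for key, value in (comments or {}).items():
        cmd += ["-c", f"{key}={value}"]  # arbitrary TAG=value comment
    cmd.append(wav_path)
    return cmd


if __name__ == "__main__":
    cmd = oggenc_command("clip.wav", "clip.ogg",
                         title="Example program", artist="Example station")
    print(shlex.join(cmd))
    # To actually encode: subprocess.run(cmd, check=True)
```

Building the argument list separately keeps tags with spaces intact without any shell quoting.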
