SBS Latest Online Video RSS Feed

[An updated (but more complex) script can be found in this post]

I needed an excuse to practice some Perl. So this was my first try.

The Perl script below converts http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/94/ to an RSS feed. Playlist 94 is a list of recent episodes from the TV broadcaster SBS that are available online. The script relies on the exact layout of that source file, so it will likely break if the file’s structure changes.

#!/usr/bin/perl

# This script downloads the AJAX XML file containing the latest full episode videos added to the SBS.com.au site.

# Adapted from the code at http://www.perl.com/pub/a/2001/11/15/creatingrss.html by Chris Ball.

# I declare this code to be in the public domain.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

use strict;
use warnings;

use LWP::Simple;
use HTML::TokeParser;
use XML::RSS;
use Date::Format;

# Constants
my $playlisturl = "http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/94/"; # Latest Full Ep
#my $playlisturl = "http://www.sbs.com.au/shows/ajax/getplaylist/playlistId/95/"; # Latest Sneak Peek

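# Download the XML file using LWP::Simple's get().
# get() returns undef on failure and does not set $!, so report the URL instead.
my $content = get( $playlisturl );
die "Could not download $playlisturl\n" unless defined $content;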

# Create a TokeParser object, using our downloaded HTML.
my $stream = HTML::TokeParser->new( \$content ) or die $!;

# Create the RSS object.
my $rss = XML::RSS->new( version => '2.0' );

# Prep the RSS.
$rss->channel(
 title            => "SBS Latest Full Episodes",
 link             => $playlisturl,
 language         => 'en',
 lastBuildDate    => time2str("%a, %d %b %Y %T GMT", time, "GMT"),
 description      => "Gives the most recent full episodes available from SBS.com.au"
 );

$rss->image(
 title    => "sbs.com.au Latest Full Episodes",
 url    => "http://www.sbs.com.au/web/images/sbslogo_footer.jpg",
 link    => $playlisturl
 );

# Declare variables.
my ($tag);

# vars from sbs xml
my ($eptitle, $epthumb, $eptime, $baseurl, $img, $url128, $url300, $url1000, $code1char, $code2char, $code1);

# get_tag skips forward in the HTML from our current position to the tag specified, and
# get_trimmed_text grabs plain text from the current position to the end position specified.

# Find an <a> tag.
while ( $tag = $stream->get_tag("a") ) {
 # Inside this loop, $tag is at an <a> tag.
 # But do we have a "title" token, too?
 if ($tag->[1]{title}) {
 # We do!
 $eptitle = $tag->[1]{title};
 #print $eptitle."\n";

 # The next step is an <img></img> set.
 $tag = $stream->get_tag('img');
 $epthumb = $tag->[1]{src};

 #get the flv urls from the img url
 #eg. http://videocdn.sbs.com.au/u/thumbnails/SRS_FE_Global_Village_Ep_19_44_48467.jpg
 #print $epthumb."\n";
 $baseurl = substr($epthumb, 40, length($epthumb)-40-4);
 $url128 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_128K.flv";
 $url300 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_300K.flv";
 $url1000 = "http://videocdn.sbs.com.au/u/video/".$baseurl."_1000K.flv";

 #SRS|DOC|MOV
 $code1char = substr($baseurl,0,3);
 #SP|FE
 $code2char = substr($baseurl,4,2);

 my %epcode_hash = (
 'DOC'    => 'Documentary',
 'MOV'    => 'Movie',
 'SRS'    => 'Series',
 );
 $code1 = $epcode_hash{$code1char};

 $stream->get_tag('a');
 $tag = $stream->get_tag('p');

 # Now we can grab $eptime, by using get_trimmed_text
 # up to the close of the <p> tag.
 $eptime = $stream->get_trimmed_text('/p');

 # We need to escape ampersands, as they start entity references in XML.
 $eptime =~ s/&/&amp;/g;

 # Add the item to the RSS feed.
 $rss->add_item(
 title         => $eptitle,
 permaLink     => $url1000,
 enclosure    => { url=>$url1000, type=>"video/x-flv"},
 description     => "<![CDATA[<img src=\"$epthumb\" width=\"100\" height=\"56\" /><br>
 $eptitle<br>
 $eptime<br>
 Links: <a href=\"$url128\">128k</a>, <a href=\"$url300\">300k</a>, <a href=\"$url1000\">1000k</a><br>
 Type: $code1<br>]]>");

 }
}
print "Content-Type: application/xml; charset=ISO-8859-1\n\n"; # HTTP header (must be followed by a blank line) so the browser displays the feed properly.
#$rss->save("sbslatestfullep.rss"); #this will save the RSS XML feed to a file when you run the script.
print $rss->as_string; #this will send the RSS XML feed to stdout when you run the script.
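
The script above cuts the base name out of the thumbnail URL with fixed offsets (substr at position 40), which only works while the URL prefix keeps exactly the same length. Below is a minimal sketch of a pattern-based alternative. The helper names base_from_thumb and codes_from_base are hypothetical and not part of the original script, and the sketch assumes thumbnails still end in .jpg and follow the SRS_FE_... naming shown in the script's comments. It only restates the thumbnail-based approach; it does not make the derived file names any more reliable.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the video base name from a thumbnail URL
# by pattern rather than by fixed character offsets.
sub base_from_thumb {
    my ($thumb) = @_;
    # Capture everything after the last "/" up to the ".jpg" suffix.
    return $thumb =~ m{/([^/]+)\.jpg$}i ? $1 : undef;
}

# Same lookup table as in the script above.
my %epcode = (
    DOC => 'Documentary',
    MOV => 'Movie',
    SRS => 'Series',
);

# Hypothetical helper: split "SRS_FE_..." into its two leading codes.
# Returns an empty list if the name does not follow that pattern.
sub codes_from_base {
    my ($base) = @_;
    my ($type, $kind) = $base =~ /^([A-Z]{3})_([A-Z]{2})_/ or return;
    return ($epcode{$type} // 'Unknown', $kind);
}

# Example thumbnail URL taken from the comment in the script above.
my $thumb = 'http://videocdn.sbs.com.au/u/thumbnails/SRS_FE_Global_Village_Ep_19_44_48467.jpg';
my $base  = base_from_thumb($thumb);
my ($type, $kind) = codes_from_base($base);
print "$base\n$type $kind\n";
```

In the main loop this would replace the substr calls, with a `next` when base_from_thumb returns undef so that an unexpected thumbnail URL skips the item instead of producing a malformed link.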
 
  1. June 28, 2009 at 6:56 pm

    The long sequence of ifs (if ($code1char eq "DOC") {) can be replaced with a hash.

    • Andrew Harvey
      June 29, 2009 at 12:53 pm

      Thanks, that’s a much better method.

  2. Randy
    August 29, 2009 at 3:58 pm

    Hi Andrew,
    Thank you for your brilliant work!

    I have just been playing with your script and it works fine, but I’m having no success driving it to completion with flvstreamer. I get Netstream.failed using
    .\flvstreamer -r rtmp://specialbsc.fcod.llnwd.net/a1768/o21/s/SRS_FE_Global_Village_Ep_19_44_73712_1000K.flv > globalvillage.flv

    Are there flvstreamer parameters other than -r that are required?

    Can you please help, or perhaps modify your script to output the flvstreamer command in an executable format?

    many thanks in advance …randy

    • Andrew Harvey
      September 1, 2009 at 9:58 am

      Randy,

      Yes, that command doesn’t seem to work anymore. If you run flvstreamer --help you can see all the possible arguments. If you then inspect the RTMP packets sent when you view the video through their standard Flash interface, you can see what they use for these values. (At least this is how I did it. I’m not sure what the packets are properly called; hopefully I can study networks next session.)

      From some trial and error it seems that you need to have --swfUrl set, so just add the argument --swfUrl 'http://player.sbs.com.au/web/flash/standalone_video_player_application.swf'.

      Also, my script uses the thumbnail URL to determine the file URL. I wrote it before I knew about URLs like http://player.sbs.com.au/video/smil/index/standalone/73712. Really I should get the file name from there because, as it turns out, all my file names for videos over RTMP are wrong. I will fix this eventually. In the meantime you can use the /video/smil/index pages.

      So it should work if you use this:
      ./flvstreamer -r 'rtmp://specialbsc.fcod.llnwd.net/a1768/o21/s/SRS_FE_Global_Village_Ep_19_44_73712_1000K?ru=24&e=1251880200&h=0ce8076ba80d82988730202977bcd82b' --swfUrl 'http://player.sbs.com.au/web/flash/standalone_video_player_application.swf' -o globalvillage.flv
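
A rough sketch of how a Perl script could print such a command in a ready-to-run form, as asked above. The helper names shell_quote and flvstreamer_command are hypothetical, and only the flags already shown in this thread (-r, --swfUrl, -o) are used. It assumes you already have a working RTMP URL (for example from the /video/smil/index pages) and a POSIX shell.

```perl
use strict;
use warnings;

# Quote a string for a POSIX shell: wrap it in single quotes and
# escape any embedded single quotes.
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
}

# Hypothetical helper: build the flvstreamer command line for one video.
sub flvstreamer_command {
    my ($rtmp_url, $outfile) = @_;
    # Player URL taken from the comment above.
    my $swf = 'http://player.sbs.com.au/web/flash/standalone_video_player_application.swf';
    return join ' ', './flvstreamer',
        '-r',       shell_quote($rtmp_url),
        '--swfUrl', shell_quote($swf),
        '-o',       shell_quote($outfile);
}

# RTMP URL copied from the command above, for illustration only.
my $rtmp = 'rtmp://specialbsc.fcod.llnwd.net/a1768/o21/s/SRS_FE_Global_Village_Ep_19_44_73712_1000K?ru=24&e=1251880200&h=0ce8076ba80d82988730202977bcd82b';
print flvstreamer_command($rtmp, 'globalvillage.flv'), "\n";
```

Quoting matters here because the RTMP URL contains & characters, which an unquoted shell would treat as command separators.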

  3. Randy
    September 3, 2009 at 11:03 pm

    Hi Andrew,
    Thanks for your response. Interestingly, I did manage to retrieve the file using the above command, but only after 8 or 9 dropouts that required using the --resume parameter of flvstreamer.
    There is still a way to go to perfection, though, as the file shows washed-out colour and a vertical green strip in the centre when played back with VLC.
    I am using flvstreamer with ABC iview and it does not have that problem, driven from a similar Perl script called iview_fetcher that I found on Whirlpool.

    I’m not sure that I am competent enough to understand the details of Wireshark RTMP packet traces, but there is nothing like a challenge to build new skills.

    cheers


I don't read comments anymore due to an increase in spam comments. If you want to get in touch please send me an email (see tianjara.net for details).
