Here’s a snippet I used to download some images from Wikipedia.
Usage:
perl younameit.pl WikipediaFilePage LocalName
where WikipediaFilePage is a URL like « https://en.wikipedia.org/wiki/File:Bournemouth,_The_Square.jpg »
which describes the image you want to download at full resolution, and LocalName is the name you want to give it on your hard disk. You can give as many « WikipediaFilePage LocalName » pairs as you like.
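For instance, assuming you saved the script as wpget.pl (a hypothetical name, as are the local file names here), fetching two images in one run looks like:

```shell
perl wpget.pl \
  "https://en.wikipedia.org/wiki/File:Bournemouth,_The_Square.jpg" square.jpg \
  "https://en.wikipedia.org/wiki/File:Example.jpg" example.jpg
```

Quoting the URLs matters: commas and other characters in the File: page name are best kept away from the shell.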
KNOWN BUGS:
– Any error will stop everything (in case there are multiple pairs on the command line).
– HTML::TreeBuilder trees leak memory unless you free them with $tree->delete after each iteration (thanks to feldspath for deeply analyzing my code :D)
– This list is very much incomplete.
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # allow UTF-8 inside the code
use WWW::Mechanize;
use HTML::TreeBuilder;
use URI;

binmode STDOUT, ':utf8';   # spit UTF-8 to the terminal

while (defined(my $url = shift)) {
    my $name = shift;      # user supplied; arguments come in pairs
    die "Missing local name for $url\n" unless defined $name;

    my $mech = WWW::Mechanize->new;   # autocheck is on by default: get() dies on HTTP errors
    my $tree = HTML::TreeBuilder->new;

    print "Downloading $url...";
    $mech->get($url);
    print " done!\n";

    $tree->parse_content($mech->content);

    # The full-resolution image is the <a> link inside the div of class "fullImageLink".
    my $div = $tree->look_down(class => 'fullImageLink')
        or die "No fullImageLink element found on $url\n";
    my $href = $div->look_down(_tag => 'a')->attr('href');

    # Wikipedia serves protocol-relative hrefs ("//upload.wikimedia.org/...");
    # resolve against the page URL to get something fetchable.
    $url = URI->new_abs($href, $mech->uri);

    print "Downloading $url to $name...";
    $mech->get($url, ':content_file' => $name);
    print " done!\n";

    $tree->delete;   # free the tree explicitly; otherwise it leaks memory
}