I'm working on a Python script that can download images from Flickr, among other sites. I use the Flickr API to pull the various sizes of the image I'm trying to download and identify the URL for the original size. Well, that's what I'm trying to do. Here's my code so far...
```python
import re
import requests

URL = {a Flickr link}
flickr = re.match(r".*flickr\.com\/photos\/([^\/]+)\/([0-9^\/]+)\/", URL)
URL = "https://api.flickr.com/services/rest/?method=flickr.photos.getSizes&api_key=6002c84e96ff95c1a861eafafa4284ba&photo_id=" + flickr.group(2) + "&format=json&nojsoncallback=1"
request = requests.get(URL)
result = request.text
parsed = re.match(r".\"Original\".*\"source\"\: \"([^\"]+)", result)
URL = parsed.group(1)
```
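To check the first step in isolation, here is a small sketch that applies the same photo-ID pattern to the example URL used in this question. The helper name extract_photo_id is hypothetical and only for illustration.

```python
import re

# Same pattern as in the question: group 1 is the user name, group 2 the photo ID.
# Note that [0-9^\/] also admits a literal "^" or "/"; backtracking still leaves
# only the digits in group 2 here, but [0-9]+ would be stricter.
PHOTO_URL_PATTERN = re.compile(r".*flickr\.com\/photos\/([^\/]+)\/([0-9^\/]+)\/")


def extract_photo_id(url):
    """Return the photo ID from a Flickr photo page URL, or None if it doesn't match."""
    match = PHOTO_URL_PATTERN.match(url)
    return match.group(2) if match else None


print(extract_photo_id("https://www.flickr.com/photos/matbellphotography/33413612735/sizes/h/"))
# prints 33413612735
```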
From print() statements throughout my code, I know that the first regular expression (which parses the Flickr URL to extract the photo ID) works, and that the API request succeeds, returning the following result (for the example URL https://www.flickr.com/photos/matbellphotography/33413612735/sizes/h/)...
```json
{ "sizes": { "canblog": 0, "canprint": 0, "candownload": 1,
"size": [
{ "label": "Square", "width": 75, "height": 75, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_s.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/sq\/", "media": "photo" },
{ "label": "Large Square", "width": "150", "height": "150", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_q.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/q\/", "media": "photo" },
{ "label": "Thumbnail", "width": 100, "height": 67, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_t.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/t\/", "media": "photo" },
{ "label": "Small", "width": "240", "height": "160", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_m.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/s\/", "media": "photo" },
{ "label": "Small 320", "width": "320", "height": "213", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_n.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/n\/", "media": "photo" },
{ "label": "Medium", "width": "500", "height": "333", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/m\/", "media": "photo" },
{ "label": "Medium 640", "width": "640", "height": "427", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_z.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/z\/", "media": "photo" },
{ "label": "Medium 800", "width": "800", "height": "534", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_c.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/c\/", "media": "photo" },
{ "label": "Large", "width": "1024", "height": "683", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_b.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/l\/", "media": "photo" },
{ "label": "Large 1600", "width": "1600", "height": "1067", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_4d92e2f70d_h.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/h\/", "media": "photo" },
{ "label": "Large 2048", "width": "2048", "height": "1365", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_81441ed1da_k.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/k\/", "media": "photo" },
{ "label": "Original", "width": "5760", "height": "3840", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_34cbc172c1_o.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/o\/", "media": "photo" }
] }, "stat": "ok" }
```
My code breaks down after that. The second regular expression, meant to pick out the download URL of the original-size image, doesn't find any match. According to yet another print() statement...
parsed.group(1) = None
I set up the expression using RegExr, which identified exactly what I needed from the JSON result. What have I done wrong?
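Use re.search instead of re.match. re.match only tries to match at the beginning of the string, so your pattern would need "Original" to appear right after the first character of the response.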
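To see the difference, here is a sketch that runs the question's pattern through both functions against a shortened stand-in for request.text (an assumption: the real response has all twelve size entries). With re.match, the leading . consumes the opening { and the next character is a space rather than the start of "Original", so the match fails. re.search tries each starting position in turn. Note that the captured URL still contains the JSON \/ escapes.

```python
import re

# Shortened stand-in for request.text (assumption: only the "Original" entry is kept).
RESULT = (
    '{ "sizes": { "canblog": 0, "canprint": 0, "candownload": 1,\n'
    '"size": [\n'
    '{ "label": "Original", "width": "5760", "height": "3840", '
    '"source": "https:\\/\\/farm3.staticflickr.com\\/2855\\/33413612735_34cbc172c1_o.jpg", '
    '"media": "photo" }\n'
    '] }, "stat": "ok" }'
)

# The question's second pattern, unchanged.
PATTERN = r".\"Original\".*\"source\"\: \"([^\"]+)"

# re.match is anchored at position 0: "." eats "{", then a space follows, not '"Original"'.
print(re.match(PATTERN, RESULT))  # None

# re.search scans forward until the pattern fits somewhere in the string.
found = re.search(PATTERN, RESULT)
print(found.group(1))  # the URL, but still with the JSON "\/" escapes in it
```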
Better still, parse the response as JSON: use request.json() instead of request.text (or json.loads(request.content)) and go from there.
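A minimal sketch of the JSON approach, assuming the requests library as in the question. Passing the query through params lets requests build and encode the query string instead of concatenating it by hand. The helper names original_source and fetch_original_url are hypothetical, and the parsing is kept separate from the network call so it can be tried on a saved response.

```python
import json


def original_source(data):
    """Return the "source" URL of the size labelled "Original", or None if absent."""
    for entry in data["sizes"]["size"]:
        if entry["label"] == "Original":
            return entry["source"]
    return None


def fetch_original_url(photo_id, api_key):
    """Call flickr.photos.getSizes for photo_id and return the original-size URL, or None."""
    import requests  # local import so original_source() can be tried without requests installed

    response = requests.get(
        "https://api.flickr.com/services/rest/",
        params={
            "method": "flickr.photos.getSizes",
            "api_key": api_key,
            "photo_id": photo_id,
            "format": "json",
            "nojsoncallback": 1,
        },
    )
    response.raise_for_status()
    data = response.json()  # a plain dict; the "\/" escapes are already decoded to "/"
    if data.get("stat") != "ok":
        raise RuntimeError("Flickr API error: %r" % data)
    return original_source(data)


# Offline check against a trimmed copy of the response from the question.
sample = json.loads(
    '{"sizes": {"size": [{"label": "Original", "source": '
    '"https:\\/\\/farm3.staticflickr.com\\/2855\\/33413612735_34cbc172c1_o.jpg"}]}, '
    '"stat": "ok"}'
)
print(original_source(sample))
```

With a real key, fetch_original_url(flickr.group(2), your_api_key) would stand in for both the getSizes request and the second regular expression; the None return covers a response that has no "Original" entry.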