Making use of a decorator within a Python script

Question

I've written a script in Python that collects the links of posts and then fetches the title of each post by going one layer deep from the target page.

I've applied the @get_links decorator, which scrapes the titles from the inner pages.

However, I would welcome any suggestions for improving my existing approach while keeping the decorator, as I'm very new to working with decorators.

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/questions/tagged/web-scraping"

def get_links(func):

    def get_target_link(*args,**kwargs):
        titles = []
        for link in func(*args,**kwargs):
            res = requests.get(link)
            soup = BeautifulSoup(res.text,"lxml")
            title = soup.select_one("h1[itemprop='name'] a").text
            titles.append(title)
        return titles
    return get_target_link

@get_links
def get_info(link):
    ilink = []
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    for items in soup.select(".summary .question-hyperlink"):
        ilink.append(urljoin(url,items.get('href')))
    return ilink

if __name__ == '__main__':
    print(get_info(url))

I don't see from the code itself how check_pagination is supposed to help. Can you explain its purpose in more detail, please?
– 301_Moved_Permanently, Commented Dec 6, 2018 at 13:54
Right you were, @Mathias Ettinger. The decorator in my earlier script was for nothing. Check the update. Thanks.
– SIM, Commented Dec 6, 2018 at 14:19
Why do you think that using a decorator is appropriate here?
– 200_success, Commented Dec 6, 2018 at 15:42
Where did you find that I thought it would be appropriate here, @200_success? I'm trying to figure out how decorators work and that's it.
– SIM, Commented Dec 7, 2018 at 8:28
Your bolded italicized paragraph seemed to insist on keeping the decorator at all costs.
– 200_success, Commented Dec 7, 2018 at 8:29

Graipher · Accepted Answer · 2018-12-07 13:04:04Z

While decorators are fun to learn about (especially when you get to decorators taking arguments and class decorators) and they can be quite useful, I think this decorator should not be one. Sorry.

Your code becomes much easier to read and understand by making this into two functions, one that gets the links and one that gets the title from a link, which you then apply to each link:

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def get_title(link):
    """Load a link to get the page title."""
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    return soup.title.text.split(" - ")[1]  # Only works exactly like this for Stack Exchange page titles
    # return soup.select_one("h1[itemprop='name'] a").text

def get_links(link):
    """Get all links from a page."""
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    relative_urls = soup.select(".summary .question-hyperlink")
    # Resolve against the page's own URL instead of relying on a global `url`.
    return [urljoin(link, item.get('href')) for item in relative_urls]


if __name__ == '__main__':
    url = "https://stackoverflow.com/questions/tagged/web-scraping"
    links = get_links(url)
    link_titles = [get_title(link) for link in links]
    print(link_titles)

If you really want to, you can then make a new function that uses these two functions:

def get_link_titles(url):
    """Get the titles of all links present in `url`."""
    return [get_title(link) for link in get_links(url)]

In addition, you should use requests.Session to reuse the connection to the website (since you are always connecting to the same host).

You could put getting a page and parsing it with BeautifulSoup into its own function:

SESSION = requests.Session()

def get_soup(url):
    res = SESSION.get(url)
    return BeautifulSoup(res.text,"lxml")
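
With that helper in place, the two functions from above shrink to one or two lines each. The following is only a sketch of how they might look. The call to raise_for_status() is my own addition, not something the original code does, and the selectors are the ones already used above, so they depend on the page markup staying the same.

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# One shared session, so the connection to the host is reused.
SESSION = requests.Session()


def get_soup(url):
    """Fetch `url` with the shared session and parse the response."""
    res = SESSION.get(url)
    res.raise_for_status()  # assumption: fail loudly on HTTP errors
    return BeautifulSoup(res.text, "lxml")


def get_title(link):
    """Load a link to get the page title."""
    # Same title-splitting trick as above; specific to Stack Exchange titles.
    return get_soup(link).title.text.split(" - ")[1]


def get_links(link):
    """Get all question links from a page."""
    anchors = get_soup(link).select(".summary .question-hyperlink")
    return [urljoin(link, a.get("href")) for a in anchors]
```

Both functions now go through the same session without having to know about it.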

You might also want to check the response headers for a rate limit: when I ran your code and tried to time it, Stack Exchange temporarily blocked me because the request rate was too high :).
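
If you still want to practice decorators, pacing the requests is a case where one fits better, because it adds behavior around a function without changing what the function returns. Which headers to inspect depends on the site, so this sketch does not parse them and simply spaces calls out by a fixed interval. The names throttle and fetch and the 0.5 second interval are made up for illustration; fetch is a stand-in for the real request.

```python
import functools
import time


def throttle(min_interval):
    """Decorator factory: keep calls at least `min_interval` seconds apart."""
    def decorator(func):
        last_call = None  # time.monotonic() of the previous call, if any

        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            nonlocal last_call
            if last_call is not None:
                remaining = min_interval - (time.monotonic() - last_call)
                if remaining > 0:
                    time.sleep(remaining)
            last_call = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@throttle(0.5)  # hypothetical interval; pick one the site tolerates
def fetch(url):
    """Stand-in for the real request, e.g. SESSION.get(url)."""
    return f"fetched {url}"
```

This is also an example of a decorator that takes an argument: throttle(0.5) returns the actual decorator, which then wraps fetch.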
