Some help understanding my own Python code

Question

I'm starting to learn Python and I've written the following Python code (some of it omitted) and it works fine, but I'd like to understand it better. So I do the following:

html_doc = requests.get('[url here]')

Followed by:

if html_doc.status_code == 200:
    soup = BeautifulSoup(html_doc.text, 'html.parser')
    line = soup.find('a', class_="some_class")
    value = re.search('[regex]', str(line))
    print (value.group(0))

My questions are:

1. What does html_doc.text really do? I understand that it makes "text" (a string?) out of html_doc, but why isn't it text already? What is it? Bytes? Maybe a stupid question, but why doesn't requests.get create a really long string containing the HTML code?
2. The only way I could get the result of re.search was with value.group(0), but I have literally no idea what this does. Why can't I just look at value directly? I'm passing it a string and there's only one match, so why is the resulting value not a string?

When you use the re module's search method, the return value is not just a string but a Match object. You are asking this object for group 0 (the whole match), but you could just as easily ask for another group, if the pattern defines one.
– PatNowak, Commented Dec 1, 2015 at 12:15
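
To make the comment above concrete, here is a minimal sketch of group 0 versus a numbered group. The input string and the pattern are hypothetical, chosen only for illustration; they are not the asker's actual HTML or regex.

```python
import re

# Hypothetical input and pattern, used only for illustration.
line = '<a class="some_class" href="/item/42">Item 42</a>'
match = re.search(r'href="/item/(\d+)"', line)

if match is not None:
    whole = match.group(0)   # the entire text matched by the pattern
    first = match.group(1)   # only the first parenthesised group
    print(whole)
    print(first)
```

Group 0 is always the full match; groups 1 and up exist only if the pattern contains parentheses.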

Łukasz Rogalski · Accepted Answer · 2015-12-01 12:20:46Z

4

The return value of requests.get(), as stated in the docs, is a Response object.

The return value of re.search(), as stated in the docs, is a match object (called MatchObject in the Python 2 docs), or None if nothing matches.

Both objects exist because they carry much more information than just the response bytes (e.g. the HTTP status code and the response headers) or just the matched string (e.g. the start and end positions of the match).

For more information, you'll have to study the docs.

FYI, to check the type of a returned value you can use the built-in type function:
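
As a rough illustration of the extra information a match object carries, the sketch below reads the match positions and shows the no-match case. The sample string is an assumption made up for this example.

```python
import re

text = "status: 200 OK"          # hypothetical sample string
match = re.search(r"\d+", text)  # first run of digits

# The match object records where the match was found, not just what matched.
matched_text = match.group(0)
start, end = match.span()        # same as (match.start(), match.end())

# When nothing matches, re.search returns None rather than a match object.
no_match = re.search(r"\d+", "no digits here")

print(matched_text, start, end, no_match)
```

Because of the None case, it is usually safer to check the result before calling group() on it.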

import requests

response = requests.get('[url here]')
print(type(response))  # <class 'requests.models.Response'>

edited Dec 1, 2015 at 12:20

answered Dec 1, 2015 at 12:14

Iron Fist · Accepted Answer · 2015-12-01 12:44:07Z

It seems to me you are lacking some basic knowledge about classes, objects, methods, etc. You need to read more about them here (for Python 2.7) and about the requests module here.

Concerning what you asked: when you type html_doc = requests.get('url'), you are creating an instance of the class requests.models.Response. You can check this with:

>>> type(html_doc)
<class 'requests.models.Response'>

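Now, html_doc has attributes and methods of its own. html_doc.text gives you the body of the server's response decoded into a string, while html_doc.content gives you the same body as raw bytes.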
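
The difference between a raw body and its text form can be sketched without any network access. The snippet below is only an analogy using plain Python 3 bytes and str: the body is made up, and UTF-8 is assumed, whereas requests works out the encoding from the response itself.

```python
# Hypothetical response body as it would arrive over the network: raw bytes.
original = "<p>Łukasz says: café</p>"
raw_body = original.encode("utf-8")

# Decoding turns the bytes into a str; the encoding is assumed to be UTF-8 here.
decoded = raw_body.decode("utf-8")

print(type(raw_body))               # a bytes object
print(type(decoded))                # a str object
print(len(raw_body), len(decoded))  # byte count vs. character count
```

The byte count is larger than the character count because the non-ASCII characters take more than one byte each in UTF-8, which is why the body cannot simply be treated as text without knowing its encoding.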

The same goes for the re module: functions such as re.search and re.match return a match object rather than a plain int or string.
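
For comparison, here is a small sketch of what a few re functions hand back. The sample string is hypothetical. Note that some functions, such as re.findall and re.sub, do return plain strings or lists of strings.

```python
import re

text = "a1 b22 c333"  # hypothetical sample string

found_one = re.search(r"\d+", text)    # a match object (or None)
found_all = re.findall(r"\d+", text)   # a plain list of strings
replaced = re.sub(r"\d+", "#", text)   # a plain string

print(type(found_one).__name__, found_one.group(0))
print(found_all)
print(replaced)
```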
