Webcrawler: extracting string out of array using Python3 on mac

Question

I have a problem writing a web crawler to extract currency rates:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import re


url = "https://wechselkurse-euro.de/"

r = requests.get(url)
rates = []
status = r.status_code

if status != 200:
    print("Something went wrong while parsing the website " + url)

temp = BeautifulSoup(r.text, "html.parser")
current_date = temp.select(".ecb")[0].text.strip().split(" ")[5]

#rates_array = temp.select(".kurz_kurz2.center", limit= 20).string

rates_array = temp.select(".kurz_kurz2.center", limit= 20)

#for i in rates_array:
#    rate = rates_array[i].string
#    rates.append(rate)

rates = list( map( lambda x: re.search(">\d{1}\.\d{4}",x), rates_array))

print(rates)

#rate_1EUR_to_USD =  
#rate_1EUR_to_GBP =

I tried several approaches, which are commented out. None of them work and I don't know why. That .string doesn't work is especially surprising to me, since rates_array seems to carry all the information of the bs4 object, including the td tag <td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">0.5554</td>, where I just want the string inside the tag (the value 0.5554 in the example above). This should be easy, but nothing works. What am I doing wrong?

The regular expression should not be the problem; I tested it on RegExr.

I tried using the map function, as in the currently active line, but I can't convert the map object to a list as I am supposed to.

select().string returns an empty list, and the same happens when I use regular expressions to search through the items I saved in rates_array the old-school way, iterating over every item with a for loop.

String as attribute of bs4-object

Try for o in rates_array: print(type(o), o.text). :)

– oriberu, Commented Apr 5, 2020 at 20:48

oriberu · Accepted Answer · 2020-04-05 21:04:10Z

0

Your rates_array contains Beautiful Soup tag objects, not strings. So you'll have to access their text property in order to get the values. For example:

rates = [o.text for o in rates_array]

Now rates contains:

['0.5554', '0.1758']
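As a minimal, self-contained sketch of the same idea, the snippet below parses a small HTML fragment instead of the live site. The fragment is an assumption modeled on the td tag quoted in the question, and the names cells, rate_strings and rate_values are only illustrative. It also converts the extracted strings to floats, which is a likely next step but not something the question asks for.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment modeled on the <td> quoted in the question;
# the real page may be structured differently.
html = """
<table>
  <tr><td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">0.5554</td></tr>
  <tr><td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">0.1758</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list-like collection of Tag objects, not strings.
cells = soup.select(".kurz_kurz2.center", limit=20)

# Read the text of each Tag individually.
rate_strings = [cell.text.strip() for cell in cells]

# Optional: turn the strings into numbers for later calculations.
rate_values = [float(s) for s in rate_strings]

print(rate_strings)
print(rate_values)
```

With the real page, soup would be built from r.text as in the question, and the rest should stay the same as long as the selector still matches.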

answered Apr 5, 2020 at 21:04


2 Comments

mr_harm Over a year ago

Damn, I made it way too complicated... I still don't understand why o.text works but .text or .string doesn't when filling the array. But thanks!

oriberu Over a year ago

@mr_harm You could just as well have accessed the text properties via [item.text for item in temp.select(".kurz_kurz2.center", limit = 20)], but you can't just attach a property selector to temp.select(...) since it's a result set, not a single result item. You have to act on all items in the set, hence the list comprehension.
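The difference between a result set and a single result item can be checked directly. The sketch below uses a made-up two-cell HTML fragment (an assumption, not the real page) and illustrative variable names. It shows that select() gives back a list of tags, that asking the list itself for .text fails, and that select_one() is an option when only the first match is wanted.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical fragment standing in for the real page.
html = (
    '<td class="kurz_kurz2 center">0.5554</td>'
    '<td class="kurz_kurz2 center">0.1758</td>'
)
soup = BeautifulSoup(html, "html.parser")

result = soup.select(".kurz_kurz2.center")

# The result set behaves like a list whose items are Tag objects.
is_list = isinstance(result, list)
all_tags = all(isinstance(item, Tag) for item in result)

# The list itself has no .text; only its items do.
try:
    result.text
    attribute_on_set_worked = True
except AttributeError:
    attribute_on_set_worked = False

# select_one() returns a single Tag (or None), so .text works on it directly.
first = soup.select_one(".kurz_kurz2.center")
first_text = first.text

print(is_list, all_tags, attribute_on_set_worked, first_text)
```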

sinshev · Accepted Answer · 2020-04-05 20:51:29Z

0

I would recommend checking the locator first. Are you sure that rates_array is not empty? Also, try: rates_array[i].text
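A rough sketch of both suggestions, assuming a small stand-in HTML fragment rather than the live site: first check that the selector matched anything, then read .text by index. Note that i has to be an integer index here; in the commented-out loop from the question, "for i in rates_array" makes i a tag, so rates_array[i] cannot work.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page.
html = (
    '<td class="kurz_kurz2 center">0.5554</td>'
    '<td class="kurz_kurz2 center">0.1758</td>'
)
rates_array = BeautifulSoup(html, "html.parser").select(".kurz_kurz2.center")

# Check the locator: an empty result means the selector matched nothing.
if not rates_array:
    raise ValueError("selector '.kurz_kurz2.center' matched no elements")

# Index-based access: i is an integer position, not a tag.
rates = []
for i in range(len(rates_array)):
    rates.append(rates_array[i].text)

print(rates)
```

Iterating over the tags directly, as in the list comprehension from the other answer, avoids the index altogether and is usually the simpler form.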

edited Apr 5, 2020 at 20:51

answered Apr 5, 2020 at 20:38


1 Comment

mr_harm Over a year ago

rates_array is not empty, checked that. Tried to go via index as well. Since .text works above as well I guess your way would work too. Thanks!
