I have a problem writing a web crawler to extract currency rates:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import re
url = "https://wechselkurse-euro.de/"
r = requests.get(url)
rates = []
status = r.status_code
if status != 200:
    print("Something went wrong while parsing the website " + url)
temp = BeautifulSoup(r.text, "html.parser")
current_date = temp.select(".ecb")[0].text.strip().split(" ")[5]
#rates_array = temp.select(".kurz_kurz2.center", limit= 20).string
rates_array = temp.select(".kurz_kurz2.center", limit= 20)
#for i in rates_array:
# rate = rates_array[i].string
# rates.append(rate)
rates = list( map( lambda x: re.search(">\d{1}\.\d{4}",x), rates_array))
print(rates)
#rate_1EUR_to_USD =
#rate_1EUR_to_GBP =
I tried several approaches, which are commented out. None of them work and I don't know why. The .string attribute not working is especially surprising to me, since rates_array seems to hold all the information of the bs4 object, including the td tag <td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">0.5554</td>, from which I just want the string inside the tag (the value 0.5554 in the example above). This should be easy, but nothing works. What am I doing wrong?
The regular expression should not be the problem; I tested it on RegExr.
I tried using the map function, as in the currently active code, but I can't convert the map object to a list as I am supposed to.
Calling .string on the result of select() returns an empty list, and so does using regular expressions to search through the strings I saved in rates_array when I go the old-school way of iterating over every item of my list with a for loop.
for o in rates_array: print(type(o), o.text). :)
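To spell that one-liner out a bit: select() hands back a list-like ResultSet of Tag objects, so the text has to be read from each element rather than from the result as a whole, and re.search() wants a str, not a Tag. Below is a rough sketch of both routes. It runs against a small inline HTML snippet instead of the live site. SAMPLE_HTML is a made-up stand-in modelled on the td from the question, and the second value in it is invented purely for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the live page so the sketch runs offline.
# The first cell is the one quoted in the question; the second value is invented.
SAMPLE_HTML = """
<table>
  <tr><td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">0.5554</td></tr>
  <tr><td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">1.0785</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a ResultSet (a list of Tag objects), not a single Tag,
# so the text is read from each element individually.
cells = soup.select(".kurz_kurz2.center", limit=20)

# Route 1: take the text of every cell directly.
rates = [cell.get_text(strip=True) for cell in cells]

# Route 2: if a regex is still wanted, search the cell's text (a str),
# not the Tag object itself.
pattern = re.compile(r"\d\.\d{4}")
matched = [m.group() for m in (pattern.search(cell.text) for cell in cells) if m]

print(rates)
print(matched)
```

If the numbers are needed for calculations, something like [float(r) for r in rates] should do, assuming every cell really holds a plain decimal number.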