I have a problem writing a web crawler to extract currency rates:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import re


url = "https://wechselkurse-euro.de/"

r = requests.get(url)
rates = []
status = r.status_code

if status != 200:
    print("Something went wrong while parsing the website " + url)

temp = BeautifulSoup(r.text, "html.parser")
current_date = temp.select(".ecb")[0].text.strip().split(" ")[5]

#rates_array = temp.select(".kurz_kurz2.center", limit= 20).string

rates_array = temp.select(".kurz_kurz2.center", limit= 20)

#for i in rates_array:
#    rate = rates_array[i].string
#    rates.append(rate)

rates = list( map( lambda x: re.search(">\d{1}\.\d{4}",x), rates_array))

print(rates)

#rate_1EUR_to_USD =  
#rate_1EUR_to_GBP =


I tried several approaches, which are commented out. None of them work and I don't know why. That .string fails is especially surprising to me, since rates_array seems to carry all the information of the bs4 object, including the td tag <td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">0.5554</td>, from which I just want the string inside the tag (the value 0.5554 in the example above). This should be easy, but nothing works. What am I doing wrong?

The regular expression should not be the problem; I tested it on RegExr.

I tried using the map function, as in the currently active code, but I can't convert the map object to a list as I am supposed to.

Calling .string on the select() result returns an empty list, and the same happens when I use regular expressions to search through the items saved in rates_array the old-school way, iterating over every item with a for loop.
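One likely reason the regex route fails: re.search expects a string, while rates_array holds Tag objects, so each item would first need str(tag) or tag.text. The sketch below runs the pattern against the td markup quoted above as a plain string. The names html_cell and RATE_PATTERN are illustrative, and the capture group is an addition that keeps the leading ">" out of the result.

```python
import re

# Cell markup copied from the question, held as a plain string
# (this is what str(tag) would give for one item of rates_array).
html_cell = (
    '<td class="kurz_kurz2 center" '
    'title="Aktueller Wechselkurs am 3.4.2020">0.5554</td>'
)

# Raw string avoids invalid-escape warnings; the group captures only the number.
RATE_PATTERN = re.compile(r">(\d\.\d{4})<")

match = RATE_PATTERN.search(html_cell)

# re.search returns a Match object (or None), so the value comes from group().
rate = match.group(1) if match else None
print(rate)
```

Note that re.search never returns the matched text directly, so even on strings the map in the question would produce Match objects rather than rates.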

String as attribute of bs4-object

1
  • Try for o in rates_array: print(type(o), o.text). :) Commented Apr 5, 2020 at 20:48

2 Answers

0

Your rates_array contains Beautiful Soup tag objects, not strings. So you'll have to access their text property in order to get the values. For example:

rates = [o.text for o in rates_array]

Now rates contains:

['0.5554', '0.1758']
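A minimal offline sketch of the same idea, so it can be tried without hitting the website. The sample_html table is an assumption built around the td tag quoted in the question (the second cell is made up to match the output above), and the float conversion is an optional extra step.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two cells using the class
# names from the question. The real page layout may differ.
sample_html = """
<table>
  <tr><td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">0.5554</td></tr>
  <tr><td class="kurz_kurz2 center" title="Aktueller Wechselkurs am 3.4.2020">0.1758</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# select() returns a list-like collection of Tag objects.
cells = soup.select(".kurz_kurz2.center", limit=20)

# Read the text of each Tag individually.
rates_text = [cell.text.strip() for cell in cells]

# Optional: convert to numbers for later calculations.
rates = [float(text) for text in rates_text]

print(rates_text)
print(rates)
```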

2 Comments

Damn, I made it way too complicated... I still don't understand why o.text works, but .text or .string while filling the array does not. But thanks!
@mr_harm You could just as well have accessed the text properties via [item.text for item in temp.select(".kurz_kurz2.center", limit = 20)], but you can't just attach a property selector to temp.select(...) since it's a result set, not a single result item. You have to act on all items in the set, hence the list comprehension.
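The difference between a result set and a single item can be checked directly. This is a small sketch using a one-cell document as a stand-in for the real page. It assumes a Beautiful Soup 4 version where attribute access on a ResultSet raises AttributeError, and uses select_one as an alternative when only the first match is needed.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Single-cell stand-in for the real page.
soup = BeautifulSoup('<td class="kurz_kurz2 center">0.5554</td>', "html.parser")

result = soup.select(".kurz_kurz2.center")
print(isinstance(result, ResultSet))  # the collection, not an element

# The collection itself has no .string or .text.
try:
    result.string
    raised = False
except AttributeError:
    raised = True
print(raised)

# select_one() returns a single Tag (or None), which does have .string.
first = soup.select_one(".kurz_kurz2.center")
print(isinstance(first, Tag), first.string)
```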
0

I would recommend checking the locator first. Are you sure that rates_array is not empty? Also, try rates_array[i].text, where i is an integer index.

1 Comment

rates_array is not empty, checked that. Tried to go via index as well. Since .text works above as well I guess your way would work too. Thanks!