I want to parse some information from a page, but I'm having trouble because I can't work out how to select an element that has no id or class. I have a div with an image inside it followed by some text (a number), and I need to get that number. The div only has a style attribute, and I can't select on that because the style keeps changing.

The page is a game-site auction, and I'm trying to parse the name of each item, its price, and its link. So far I can only get the names.

I have tried finding every 'a' inside the div with the parent class, finding the hrefs, and finding by style.

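import requests
from bs4 import BeautifulSoup as bs

def rshp_parse(base_url, headers):
    session = requests.Session()
    request = session.get(base_url, headers=headers)
    if request.status_code == 200:
        soup = bs(request.content, 'html.parser')
        # One div per auction listing
        divs = soup.find_all('div', class_={'shop-search-row'})
        for div in divs:
            # This part works: the item name
            title = div.find('span').text
            # Attempt at the price: only returns the first nested div
            price = div.find('div')
            # Attempt at the link
            href = div.find('a', class_={'champions_container'})['href']
            # Attempt at the link by style
            # href = soup.find('div', style='color:#FFFFFF;text-decoration:none')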

HTML

<div style="display:inline-block;width:15%;line-height:50px;vertical-align:top;white-space: nowrap;">
            <img src="/assets/rpc/shard.png" style="width:20px">35,000
        </div>

35,000 is the value I need

<a href="/market/auction/1227124" target="_blank" style="color:#FFFFFF;text-decoration:none">

and this is the link I need

2 Comments
  • Can you share the url? Commented May 11, 2019 at 18:20
  • link Commented May 11, 2019 at 20:03

1 Answer

You can rebuild the listing table as follows. Once the rows are in a dataframe, you can use the usual pandas syntax to access any element.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import numpy as np

r = requests.get('https://www.roshpit.ca/market/browse')
soup = bs(r.content, 'lxml')
results = []

for row in soup.select('.shop-search-row'):
    name = row.select_one('.item_image + span').text
    seller = row.select_one('div:nth-child(3)').text.strip()
    bid = row.select_one('div:nth-child(4)').text.strip()
    buyout = row.select_one('div:nth-child(5)').text.strip()
    ends = row.select_one('div:nth-child(6)').text.strip()
    listing = [name, seller, bid, buyout, ends]
    results.append(listing)

df = pd.DataFrame(results, columns = ['name', 'seller' , 'bid' , 'buyout' , 'ends'])
df = df.replace(r'^\s*$', np.nan, regex=True)
df.buyout = df.buyout.str.replace(',', '').astype(float)
df[df['name'].str.contains("Hammer") & (df["buyout"] < 50000)]
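
For the price and the link specifically, here is a minimal sketch that avoids the changing style attribute. It runs against a hypothetical row assembled from the fragments in the question, so the nesting of the `a` and `span` inside `.shop-search-row`, the item name, and the `parse_rows` helper are all assumptions rather than the site's confirmed markup. It finds the price through the shard icon's parent and the link through the start of its href.

```python
from bs4 import BeautifulSoup

# Hypothetical row markup, assembled from the fragments quoted in the question.
# The real page may nest these elements differently.
SAMPLE_HTML = """
<div class="shop-search-row">
  <a href="/market/auction/1227124" target="_blank" style="color:#FFFFFF;text-decoration:none">
    <span>Example Item</span>
  </a>
  <div style="display:inline-block;width:15%;line-height:50px;">
    <img src="/assets/rpc/shard.png" style="width:20px">35,000
  </div>
</div>
"""

def parse_rows(html):
    """Return name, href and price for every listing row in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for row in soup.select(".shop-search-row"):
        # Match the link by the start of its href instead of its style.
        link = row.select_one('a[href^="/market/auction/"]')
        # Find the price cell through the shard icon it contains.
        icon = row.select_one('img[src$="shard.png"]')
        price_text = icon.parent.get_text(strip=True) if icon else ""
        listings.append({
            "name": row.select_one("span").get_text(strip=True),
            "href": link["href"] if link else None,
            # Drop the thousands separator before converting.
            "price": int(price_text.replace(",", "")) if price_text else None,
        })
    return listings

listings = parse_rows(SAMPLE_HTML)
```

If a row contains more than one shard icon (for example one for the bid and one for the buyout), `select_one` only returns the first, so `select` with an index would be needed instead.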

3 Comments

Yes, it works, thank you! Now I'm a little stuck trying to filter the dataframe on two columns at once. I have df[df['name'].str.match('Raijin')] and I want a second condition like buyout <= 50.
Remember to also check the column types. The values are read in as strings, so you may need to convert them.
Yeah, I understand. What I'm trying to do is df[['bid', 'buyout']] = df[['bid', 'buyout']].apply(pd.to_numeric), but I get an error: ValueError: ('Unable to parse string "50,000" at position 15', 'occurred at index bid')
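
The ValueError comes from the thousands separator: "50,000" is not a number until the comma is removed. A minimal sketch of the conversion followed by a two-condition filter is below. The rows are made up for illustration (only the column names and the "Raijin" search term come from this thread).

```python
import pandas as pd

# Made-up listings standing in for the scraped rows.
df = pd.DataFrame({
    "name": ["Raijin Blade", "Raijin Helm", "Hammer of Dawn"],
    "bid": ["50,000", "1,200", ""],
    "buyout": ["75,000", "40", "30,000"],
})

for col in ["bid", "buyout"]:
    # Remove the thousands separator, then convert; anything that still
    # cannot be parsed (such as an empty cell) becomes NaN.
    cleaned = df[col].str.replace(",", "", regex=False)
    df[col] = pd.to_numeric(cleaned, errors="coerce")

# Combine conditions with & and wrap each comparison in parentheses.
matches = df[df["name"].str.contains("Raijin") & (df["buyout"] <= 50)]
```

Rows where a comparison hits NaN are excluded, because a comparison against NaN evaluates to False.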
