Python parse element

Question

I want to parse some information from a page, but I'm having trouble because I can't select an element that has no id or class. I have a div tag with an image and some text (a number) inside it, and I need to get that number. The div only has a style attribute, and I can't match on it because it keeps changing.

The page is something like a game site's auction, and I'm trying to parse each item's name, price and link. So far I can only get the names.

I've tried finding all 'a' tags inside the div with the parent class, finding the hrefs, and finding elements by style.

import requests
from bs4 import BeautifulSoup as bs

def rshp_parse(base_url, headers):
    session = requests.Session()
    response = session.get(base_url, headers=headers)
    if response.status_code == 200:
        soup = bs(response.content, 'html.parser')
        divs = soup.find_all('div', class_='shop-search-row')
        for div in divs:
            title = div.find('span').text  # this works: the item name
            price = div.find('div')  # returns the first nested div, not the price
            href = div.find('a', class_='champions_container')['href']
            # href = soup.find('div', style='color:#FFFFFF;text-decoration:none')

HTML

<div style="display:inline-block;width:15%;line-height:50px;vertical-align:top;white-space: nowrap;">
            <img src="/assets/rpc/shard.png" style="width:20px">35,000
        </div>

35,000 is the value I need.

<a href="/market/auction/1227124" target="_blank" style="color:#FFFFFF;text-decoration:none">

And I need this link.

Can you share the URL?

– QHarr, Commented May 11, 2019 at 18:20
link

– Mavic, Commented May 11, 2019 at 20:03

QHarr · Accepted Answer · 2019-05-13 11:05:51Z

You can rebuild the listings table as a DataFrame as follows. Once the data is in a DataFrame, you can use the usual pandas syntax to access any element.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import numpy as np

r = requests.get('https://www.roshpit.ca/market/browse')
soup = bs(r.content, 'lxml')
results = []

# Each listing is one .shop-search-row; its columns are positional child divs.
for row in soup.select('.shop-search-row'):
    name = row.select_one('.item_image + span').text
    seller = row.select_one('div:nth-child(3)').text.strip()
    bid = row.select_one('div:nth-child(4)').text.strip()
    buyout = row.select_one('div:nth-child(5)').text.strip()
    ends = row.select_one('div:nth-child(6)').text.strip()
    listing = [name, seller, bid, buyout, ends]
    results.append(listing)

df = pd.DataFrame(results, columns=['name', 'seller', 'bid', 'buyout', 'ends'])
# Treat empty or whitespace-only cells as missing values.
df = df.replace(r'^\s*$', np.nan, regex=True)
# Strip thousands separators so buyout can be compared numerically.
df['buyout'] = df['buyout'].str.replace(',', '').astype(float)
# Example filter: hammers with a buyout below 50,000.
df[df['name'].str.contains("Hammer") & (df["buyout"] < 50000)]
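
The question also asks for the price and the auction link, which sit in elements with no id or class. Here is a minimal sketch of one way to reach them, assuming the shard image path and the /market/auction/ href prefix stay stable. SAMPLE_ROW is hypothetical markup assembled from the two snippets in the question (the real page may nest these elements differently), and parse_row is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical row markup assembled from the two snippets in the question;
# the real page may nest these elements differently.
SAMPLE_ROW = """
<div class="shop-search-row">
  <a href="/market/auction/1227124" target="_blank" style="color:#FFFFFF;text-decoration:none">
    <span>Example Item</span>
  </a>
  <div style="display:inline-block;width:15%;line-height:50px;">
    <img src="/assets/rpc/shard.png" style="width:20px">35,000
  </div>
</div>
"""

def parse_row(row):
    """Return (price, href) for one row without relying on id, class or style."""
    # Anchor on the image path, then read the text of the div that contains it.
    img = row.select_one('img[src$="shard.png"]')
    price_text = img.parent.get_text(strip=True) if img else ""
    price = int(price_text.replace(",", "")) if price_text else None
    # Match the link by the shape of its href rather than by its style.
    link = row.select_one('a[href^="/market/auction/"]')
    href = link["href"] if link else None
    return price, href

soup = BeautifulSoup(SAMPLE_ROW, "html.parser")
for row in soup.select(".shop-search-row"):
    print(parse_row(row))
```

If a row contains several shard images (for example one for the bid and one for the buyout), select_one returns only the first, so row.select would be needed to collect all of them.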

edited May 13, 2019 at 11:05

answered May 11, 2019 at 20:32

QHarr


3 Comments

Mavic Over a year ago

Yes, it works, thank you! Now I'm a little stuck trying to filter the dataframe on two columns at once. I have df[df['name'].str.match('Raijin')] and want to add a second condition like buyout <= 50.

QHarr Over a year ago

Remember to check the column types as well. The values are read in as strings, so you may need to convert them.

Mavic Over a year ago

Yeah, I understand. This is what I'm trying to do: df[['bid', 'buyout']] = df[['bid', 'buyout']].apply(pd.to_numeric), but I get an error: ValueError: ('Unable to parse string "50,000" at position 15', 'occurred at index bid')
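
The error comes from the thousands separator: pd.to_numeric cannot parse "50,000" as it stands. Here is a minimal sketch of one possible fix, which strips the commas first and then combines the two conditions. The sample rows are made up for illustration, and errors="coerce" is a choice that turns anything still unparseable into NaN instead of raising.

```python
import pandas as pd

# Hypothetical sample rows; the real frame comes from the scraped listings.
df = pd.DataFrame({
    "name": ["Raijin Blade", "Hammer of Ruin", "Raijin Helm"],
    "bid": ["50,000", "1,200", ""],
    "buyout": ["75,000", "", "40"],
})

for col in ["bid", "buyout"]:
    # Drop thousands separators, then coerce blanks to NaN instead of raising.
    cleaned = df[col].str.replace(",", "", regex=False)
    df[col] = pd.to_numeric(cleaned, errors="coerce")

# Combine conditions with & and wrap the comparison in parentheses.
cheap_raijin = df[df["name"].str.match("Raijin") & (df["buyout"] <= 50)]
print(cheap_raijin)
```

Rows whose buyout is NaN fail the comparison, so they drop out of the filtered result.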
