All Questions
544 questions
0
votes
1
answer
63
views
Parse WhatsApp message read status [closed]
My question is more about HTML layout and parsing of dynamic content.
My task: parse the contacts who read a particular message of mine in a group.
I tried to inspect the DOM structure of the DIV block that holds that ...
2
votes
1
answer
107
views
How to handle self-closing tags without end-slash in html.parser.HTMLParser
By default, it seems that html.parser.HTMLParser cannot handle self-closing tags correctly if they are not terminated with /. E.g. it handles <img src="asfd"/> fine, but it ...
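One possible way to approach the behaviour described in this excerpt is to treat HTML void elements (such as img or br) as closed the moment their start tag is seen. The sketch below is only an illustration, not the accepted answer: the class name VoidAwareParser, the helper parse_events, and the VOID_ELEMENTS set are hypothetical names introduced here, and only the standard html.parser.HTMLParser callbacks are used.

```python
from html.parser import HTMLParser

# Assumption: the set of HTML void elements, which never take a closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class VoidAwareParser(HTMLParser):
    """Hypothetical parser that records start/end events for every tag."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        # A void element without a trailing "/" is closed immediately.
        if tag in VOID_ELEMENTS:
            self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for tags written with a trailing "/", e.g. <img .../>.
        self.events.append(("start", tag))
        self.events.append(("end", tag))

    def handle_endtag(self, tag):
        # Ignore stray closing tags for void elements to avoid duplicates.
        if tag not in VOID_ELEMENTS:
            self.events.append(("end", tag))


def parse_events(html):
    """Return the list of (event, tag) pairs seen while parsing html."""
    parser = VoidAwareParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

With this sketch, both spellings of the img tag produce the same event sequence, so downstream code that tracks open tags does not need to know which form the document used.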
1
vote
1
answer
117
views
python: parse html document with UNNESTED div tags into dataframe (using beautifulsoup)
Long-time user, but I never had to ask my own question.
I want to use Python to parse a table from an HTML document into a dataframe. The table is NOT an HTML table; I think it is JavaScript-created ...
-2
votes
1
answer
60
views
I don't understand web parsing completely
I tried using this code below to extract hyperlinks, and it worked for 1 website I tried it with:
import requests
from bs4 import BeautifulSoup
import time
def timedelay(amount):
    print("...&...
2
votes
1
answer
49
views
Replace an HTML tag in an HTML document using Python without modifying the rest of the document
I'm making a simple Python + HTML website (as part of my study). The website menu looks like this:
<ul>
<li><a href="/">Home</a></li>
<li><a ...
0
votes
2
answers
90
views
Why does my code print the same HTML link many times?
I'm doing a link-following activity in Python (it's an assignment from Python Web Access Data on Coursera). Here is the problem:
In this assignment you will write a Python program that expands on http://...
0
votes
1
answer
88
views
Why does requests-html return content only partially?
I know it's because the content is rendered by JS, but requests-html supports JS, so that's strange.
The code itself:
from requests_html import HTMLSession
session = HTMLSession()
session....
1
vote
1
answer
33
views
Python: How can I get a list of li tags in BeautifulSoup4
I'm trying to scrape a Persian webpage and I want to get 3 li tags from a ul containing 6 of them. My problem is that every li has nested li tags in it, and when I use soup.find_all('li'), it finds ...
0
votes
2
answers
102
views
How to find multiple tags at once along with attributes using BeautifulSoup in python3?
I am trying to find different tags at once using the find_all() method of BeautifulSoup. I found a way to pass all the tag names in a list to get the respective tags. But I am trying to get tags along with ...
0
votes
1
answer
109
views
Removing Specific Span Tags from a CSV file
I am trying to remove specific span tags from a CSV file, but my code is deleting all of them. I only need to remove certain ones, for example '<span style="font-family: verdana,...
1
vote
2
answers
174
views
HTML parser find tag info
I have a project that uses HTMLParser(). I have never worked with this parser, so I read the documentation and found two useful methods I can override to extract information from the site: handle_starttag ...
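As a rough illustration of the override mentioned in this excerpt, the sketch below collects the attributes of every occurrence of one tag by overriding handle_starttag. The names TagInfoCollector, wanted_tag, found, and collect_attrs are hypothetical and chosen only for this example; the second method the asker refers to is cut off in the excerpt, so it is not shown.

```python
from html.parser import HTMLParser


class TagInfoCollector(HTMLParser):
    """Hypothetical collector: stores the attributes of one tag name."""

    def __init__(self, wanted_tag):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag is lowercased.
        if tag == self.wanted_tag:
            self.found.append(dict(attrs))


def collect_attrs(html, wanted_tag):
    """Return one attribute dict per matching start tag in html."""
    collector = TagInfoCollector(wanted_tag)
    collector.feed(html)
    collector.close()
    return collector.found
```

For instance, collect_attrs(page_source, "a") would give one dictionary per link, from which the href values can be read; page_source here stands for whatever HTML string the project already has.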
2
votes
1
answer
56
views
How to dynamically find the nearest specific parent of a selected element?
I want to parse many HTML pages and remove a div that contains the text "Message", using BeautifulSoup with html.parser and Python. The div has no name or id, so pointing to it directly is not possible. I ...
0
votes
1
answer
65
views
Selenium. NoSuchElementException
Can someone work out what the problem with this code is? I understand that the question is not new, but what I found didn't help me; maybe I didn't search well enough.
wd = webdriver.Chrome('...
0
votes
0
answers
45
views
Chrome user agent doesn't work for a scraper
I have the following code to scrape images:
import os, requests, lxml, re, json, urllib.request
from bs4 import BeautifulSoup
from os.path import expanduser
headers = {
"User-Agent": &...
-1
votes
1
answer
46
views
BeautifulSoup find_all when a tag is not inside another tag
html = """
<html>
<h2>Top Single Name</h2>
<table>
<tr>
<p>hello</p>
</tr>
</table>
<div>
...