Newest 'html+python+html-parsing' Questions

0 votes

1 answer

63 views

Parse WhatsApp message read status [closed]

My question is more about HTML layout and parsing of dynamic content. My task: parse the contacts who read a particular message of mine in the group. I tried to inspect the DOM structure of the DIV block that holds that ...

Jeffrey Rasmussen

393

asked Aug 13, 2024 at 17:22

2 votes

1 answer

107 views

How to handle self-closing tags without end-slash in html.parser.HTMLParser

By default it seems that html.parser.HTMLParser cannot handle self-closing tags correctly if they are not terminated with /. E.g. it handles <img src="asfd"/> fine, but it ...

flawr

11.6k

asked Aug 4, 2024 at 12:41
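
One possible way to approach the behaviour described in this question, as a minimal sketch rather than the accepted answer: subclass `HTMLParser` and treat HTML void elements as self-closing even when the trailing `/` is missing. The class name `VoidAwareParser`, the `events` list, and the `VOID_ELEMENTS` set are hypothetical names introduced for illustration; only `HTMLParser` and its `handle_*` hooks come from the standard library.

```python
from html.parser import HTMLParser

# HTML void elements: they never have a closing tag, with or without "/".
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class VoidAwareParser(HTMLParser):
    """Records tag events, reporting void elements as self-closing."""

    def __init__(self):
        super().__init__()
        self.events = []  # hypothetical log of (kind, tag) tuples

    def handle_starttag(self, tag, attrs):
        # <img src="x"> arrives here; report it like <img src="x"/>.
        kind = "startend" if tag in VOID_ELEMENTS else "start"
        self.events.append((kind, tag))

    def handle_startendtag(self, tag, attrs):
        # <img src="x"/> arrives here.
        self.events.append(("startend", tag))

    def handle_endtag(self, tag):
        # Ignore stray closers such as </br> for void elements.
        if tag not in VOID_ELEMENTS:
            self.events.append(("end", tag))


parser = VoidAwareParser()
parser.feed('<p><img src="a"><img src="b"/></p>')
print(parser.events)
# [('start', 'p'), ('startend', 'img'), ('startend', 'img'), ('end', 'p')]
```

Both spellings of `<img>` end up as the same event, so downstream code that tracks nesting depth does not wait for an `</img>` that never arrives.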

1 vote

1 answer

117 views

python: parse html document with UNNESTED div tags into dataframe (using beautifulsoup)

Long-time user, but I've never had to ask my own question. I want to use Python to parse a table from an HTML document into a dataframe. The table is NOT an HTML table; I think it is created by JavaScript ...

tailor

15

asked Feb 22, 2024 at 23:19

-2 votes

1 answer

60 views

I don't understand web parsing completely

I tried using the code below to extract hyperlinks, and it worked for one website I tried it with: import requests from bs4 import BeautifulSoup import time def timedelay(amount): print("...&...

bacon_man284

1

asked Dec 4, 2023 at 16:24

2 votes

1 answer

49 views

Replace an HTML tag in an HTML document using Python without modifying the rest of the document

I'm making a simple Python + HTML website (as part of my study). The website menu looks like this: <ul> <li><a href="/">Home</a></li> <li><a ...

Irina Shishilova

35

asked Sep 18, 2023 at 13:13

0 votes

2 answers

90 views

Why does my code print the same HTML link many times?

I'm doing a link-following activity in Python (it's an assignment from Python Web Access Data on Coursera). Here is the problem: In this assignment you will write a Python program that expands on http://...

Vinh Nguyễn Thành

1

asked Jul 27, 2023 at 4:44

0 votes

1 answer

88 views

Why does requests-html return content only partially?

I know that it's because the content is rendered by JS, but requests-html supports JS, so that's strange. The code itself: from requests_html import HTMLSession session = HTMLSession() session....

GayLord

3

asked Jul 1, 2023 at 20:00

1 vote

1 answer

33 views

Python: How can I get a list of li tags in BeautifulSoup4

I'm trying to scrape a Persian webpage and I want to get 3 li tags from a ul containing 6 of them. My problem is that every li has nested li tags in it, and when I use soup.find_all('li'), it finds ...

Seyedmahdi moosavyan

95

asked Jun 12, 2023 at 15:42

0 votes

2 answers

102 views

How to find multiple tags at once along with attributes using BeautifulSoup in python3?

I am trying to find different tags at once using find_all() method of BeautifulSoup. I found a way to include all tags in the list to get the respective tags. But I am trying to get tags along with ...

David

379

asked Apr 25, 2023 at 12:16

0 votes

1 answer

109 views

Removing Specific Span Tags from a CSV file

I am trying to remove specific span tags from a CSV file, but my code is deleting all of them. I need to target only certain ones for removal, for example '<span style="font-family: verdana,...

3ndurance

3

asked Jan 21, 2023 at 4:27

1 vote

2 answers

174 views

HTML parser find tag info

I have a project that uses HTMLParser(). I have never worked with this parser, so I read the documentation and found two useful methods I can override to extract information from the site: handle_starttag ...

Beginner

39

asked Jan 8, 2023 at 17:48
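
As a rough illustration of the override mentioned in this question, and not the poster's actual code: `handle_starttag` receives the tag name and a list of `(name, value)` attribute pairs, which a subclass can collect. The class name `TagInfoCollector`, the `tags` list, and the sample markup are assumptions made for this sketch.

```python
from html.parser import HTMLParser


class TagInfoCollector(HTMLParser):
    """Collects the name and attributes of every opening tag."""

    def __init__(self):
        super().__init__()
        self.tags = []  # hypothetical store of (tag, attribute dict) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; a dict is easier to query.
        self.tags.append((tag, dict(attrs)))


collector = TagInfoCollector()
collector.feed('<div id="main"><a href="/home" class="nav">Home</a></div>')

for tag, attributes in collector.tags:
    print(tag, attributes)
# div {'id': 'main'}
# a {'href': '/home', 'class': 'nav'}
```

Converting `attrs` to a dict drops duplicate attribute names on the same tag; keep the raw list if that matters for the page being parsed.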

2 votes

1 answer

56 views

How to dynamically find the nearest specific parent of a selected element?

I want to parse many html pages and remove a div that contains the text "Message", using beautifulsoup html.parser and python. The div has no name or id, so pointing to it is not possible. I ...

Newbie

590

asked Nov 30, 2022 at 18:25

0 votes

1 answer

65 views

Selenium. NoSuchElementException

Can someone help me understand what the problem with this code is? I understand that the question is not new, but what I found didn't help me; maybe I was searching badly. wd = webdriver.Chrome('...

exPriceD

1

asked Nov 22, 2022 at 20:48

0 votes

0 answers

45 views

Chrome user agent doesn't work for a scraper

I have the following code to scrape images: import os, requests, lxml, re, json, urllib.request from bs4 import BeautifulSoup from os.path import expanduser headers = { "User-Agent": &...

v_head

815

asked Nov 20, 2022 at 17:31

-1 votes

1 answer

46 views

BeautifulSoup find_all when a tag is not inside another tag

html = """ <html> <h2>Top Single Name</h2> <table> <tr> <p>hello</p> </tr> </table> <div> ...

hit

87

asked Oct 26, 2022 at 17:45
