Newest 'html+python+lxml' Questions

0 votes

2 answers

65 views

Unconditionally stop scraping HTML at specified element (or EOF)

I'm using the Python lxml.html package to scrape an HTML file. The HTML I'm trying to scrape reads in part <h1>Description of DAB Ensemble 1</h1><table>Stuff I don't care about</...

David Chung

21

asked Oct 17, 2024 at 15:16

2 votes

0 answers

77 views

Encoding issue with parsing the same text using lxml.etree

I am parsing an HTML document using the lxml.etree library. I am facing a weird issue: when I parse the exact same document and get the content using a different XPath, the encoding of the retrieved text ...

Minions

5,497

asked Feb 16, 2024 at 22:05

2 votes

2 answers

48 views

lxml library doesn't extract text in a given html tag when there is another tag with the text

I have the following HTML snippet: <div> <p class="test1"> <i class="empty"> </i> WANTED TEXT </p> </div> I want to ...

Minions

5,497

asked Aug 2, 2023 at 20:31
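
The excerpt above is cut off, so the following is only a minimal sketch, assuming the goal is to extract WANTED TEXT from the snippet. In the ElementTree data model, text that follows a child element is stored in that child's `tail`, not in the parent's `text`. The sketch uses the standard-library `xml.etree.ElementTree` so it runs without extra packages; lxml elements expose the same `text`, `tail`, and `itertext()` attributes. The variable names are illustrative and not taken from the question.

```python
import xml.etree.ElementTree as ET

# The snippet quoted in the question excerpt.
snippet = '<div> <p class="test1"> <i class="empty"> </i> WANTED TEXT </p> </div>'

root = ET.fromstring(snippet)
p = root.find("p")
i = p.find("i")

# p.text only holds the whitespace before <i>; the wanted text
# follows the closing </i> and is therefore stored in i.tail.
tail_text = (i.tail or "").strip()

# itertext() walks text and tail of every descendant in document order,
# which avoids having to know which child the text follows.
all_text = "".join(p.itertext()).strip()

print(tail_text)
print(all_text)
```

Reading `tail` is the more targeted option when the position of the text is known; `itertext()` is the more forgiving one when the markup varies.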

1 vote

1 answer

41 views

Lxml check if element is there and get href of sibling if true

I want to check whether a specific element has text and, if so, get the href of a sibling element. There are multiple products like this on the site and I want to check each one for availability. ...

Hey Hey

13

asked Jul 25, 2023 at 18:13

1 vote

0 answers

117 views

Disable BeautifulSoup lxml parser logs

I am working with the BeautifulSoup library to parse multiple HTML documents and I keep getting this annoying error message and can't find a way to stop it from being printed. document = BeautifulSoup(html_content_bytes,...

Franco Lopez Paviolo

49

asked Jun 30, 2023 at 19:16

0 votes

2 answers

55 views

Loop isn't scraping multiple pages, only returning data from one page repeatedly

import requests from bs4 import BeautifulSoup import pandas as pd headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/...

Akhilesh Desai

9

asked May 19, 2023 at 17:53

1 vote

1 answer

959 views

Python: cannot import name 'etree' from 'lxml'

I am trying to import html from lxml. I have tried uninstalling and reinstalling, without luck; I keep getting the following error: File "C:\devtools\Python3\Lib\site-packages\lxml\html\__init__.py"...

Rahbek

11

asked Apr 12, 2023 at 11:40

-2 votes

1 answer

843 views

Need help scraping data off of 3rd party dynamic website (lolalytics.com)

For example, I'm trying to scrape specific data such as deltas on pages like https://lolalytics.com/lol/nautilus/build/. I know the XPath of the elements that I need, such as: //*[@id="root&...

van dam

69

asked Mar 2, 2023 at 8:59

0 votes

0 answers

139 views

Python lxml html tostring

I have the following code and output (without the colon and the text before it): from lxml import html root = html.fromstring('<ac:link> <ri:attachment filename="test.pdf"/> </ac:...

TheDuck

15

asked Feb 17, 2023 at 12:44

1 vote

1 answer

77 views

Python lxml method tostring with selfclosing tag <br/>

I have the following code and output (without the closing br tag): from lxml import html root = html.fromstring('<br/>') html.tostring(root) Output: '<br>' However, I expected some ...

TheDuck

15

asked Feb 15, 2023 at 16:52

0 votes

0 answers

119 views

Round-trip HTML using Python xml.etree.ElementTree or lxml.ElementTree

I have code that creates and saves XML fragments. Now I would like to handle HTML as well. ElementTree.write() has a method="html" parameter that suppresses end tags for "area", &...

samwyse

2,996

asked Feb 14, 2023 at 22:13
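
As a rough illustration of the `method="html"` behavior mentioned in the excerpt above, the sketch below serializes the same element twice with the standard-library `xml.etree.ElementTree`. The sample fragment and variable names are assumptions made for the example, not part of the question. It only covers serialization; parsing real-world HTML back in is a separate problem that the strict XML parser used here does not solve.

```python
import xml.etree.ElementTree as ET

# A well-formed fragment containing a void element (<br/>).
fragment = ET.fromstring("<p>line one<br/>line two</p>")

# XML serialization keeps the empty element self-closed.
as_xml = ET.tostring(fragment, encoding="unicode", method="xml")

# HTML serialization omits the end tag for void elements such as br.
as_html = ET.tostring(fragment, encoding="unicode", method="html")

print(as_xml)   # <p>line one<br />line two</p>
print(as_html)  # <p>line one<br>line two</p>
```

The HTML output is no longer well-formed XML, so feeding it back to `ET.fromstring` would fail; a round trip needs an HTML-aware parser on the input side.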

0 votes

1 answer

147 views

Getting raw text from lxml

Trying to get the text from an HtmlElement in lxml. For example, I have the HTML read in by thing = lxml.html.fromstring("<code><div></code>") But when I call ...

erobson

13

asked Feb 14, 2023 at 2:00

0 votes

1 answer

118 views

How to reconstruct original tag

Working with lxml and Python. Given an HtmlElement, I'd like to reconstruct the original opening tag used to define it with attributes. For example, if I have an HtmlElement representing <p id=&...

erobson

13

asked Feb 11, 2023 at 20:53

1 vote

1 answer

64 views

How to scrape all values from a table like HTML DIV structure without missing some of them?

I'm just 3 months into learning Python and I ran into a little problem while building a Yahoo Finance web scraper. import pandas as pd from bs4 import BeautifulSoup import lxml import requests import ...

Dominik Kostienko

13

asked Feb 4, 2023 at 18:43

0 votes

2 answers

1k views

Selecting an element with multiple classes in python lxml using xpath

I was trying to scrape a website using Python requests and lxml. I could easily select elements with a single class using html.xpath(), but I can't figure out how to select elements with multiple ...

Ajay Pun Magar

468

asked Dec 17, 2022 at 19:29
