All Questions

0 votes
2 answers
65 views

Unconditionally stop scraping HTML at specified element (or EOF)

I'm using the Python lxml.html package to scrape an HTML file. The HTML I'm trying to scrape reads in part <h1>Description of DAB Ensemble 1</h1><table>Stuff I don't care about</...
David Chung
2 votes
0 answers
77 views

Encoding issue with parsing the same text using lxml.etree

I am parsing an HTML document using the lxml.etree library. I am facing a weird issue: when I parse the exact same document and get the content using a different XPath, the encoding of the retrieved text ...
Minions (reputation 5,497)
2 votes
2 answers
48 views

lxml library doesn't extract text in a given html tag when there is another tag with the text

I have the following HTML snippet: <div> <p class="test1"> <i class="empty"> </i> WANTED TEXT </p> </div> I want to ...
Minions (reputation 5,497)
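
A minimal sketch of why the wanted text in the snippet above can seem to be missing: in the ElementTree data model, which lxml also follows, text that comes after a child element is stored on that child's `.tail`, not on the parent's `.text`. The example uses the standard library's `xml.etree.ElementTree` so that it runs without extra packages; the variable names are illustrative only, and the question's full goal is truncated, so this shows the general behaviour, not a confirmed answer.

```python
import xml.etree.ElementTree as ET

# The snippet quoted in the question excerpt (it happens to be well-formed XML).
snippet = '<div> <p class="test1"> <i class="empty"> </i> WANTED TEXT </p> </div>'
root = ET.fromstring(snippet)

p = root.find(".//p[@class='test1']")
i = p.find("i")

# p.text only covers the text before the first child, which is whitespace here.
own_text = (p.text or "").strip()

# The text after </i> belongs to the <i> element's tail.
tail_text = (i.tail or "").strip()

# itertext() walks all text and tail pieces below <p>, in document order.
full_text = "".join(p.itertext()).strip()

print(repr(own_text), repr(tail_text), repr(full_text))
```

With `lxml.html` elements, `text_content()` plays a similar role to joining `itertext()`.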
1 vote
1 answer
41 views

Lxml check if element is there and get href of sibling if true

I want to check if a specific element has text and, if true, get the href of a sibling element. There are multiple products like this on the site and I want to check each one for availability. ...
Hey Hey (reputation 13)
1 vote
0 answers
117 views

Disable BeautifulSoup lxml parser logs

I am working with the BeautifulSoup library to parse multiple HTML documents and I keep getting this annoying error message and can't find a way to stop it from printing. document = BeautifulSoup(html_content_bytes,...
Franco Lopez Paviolo
0 votes
2 answers
55 views

Loop isn't scraping multiple pages, only returning data from one page repeatedly

import requests from bs4 import BeautifulSoup import pandas as pd headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/...
Akhilesh Desai
1 vote
1 answer
959 views

Python: cannot import name 'etree' from 'lxml'

I am trying to import html from lxml. I have tried uninstalling and reinstalling, without luck; I keep getting the following error: File "C:\devtools\Python3\Lib\site-packages\lxml\html\__init__.py"...
Rahbek (reputation 11)
-2 votes
1 answer
843 views

Need help scraping data off of 3rd party dynamic website (lolalytics.com)

For example, I'm trying to scrape specific data such as deltas on pages like: https://lolalytics.com/lol/nautilus/build/. I know the XPath of the elements that I need, such as: //*[@id="root&...
van dam (reputation 69)
0 votes
0 answers
139 views

Python lxml html tostring

I have the following code and output (without colon and text before it): from lxml import html root = html.fromstring('<ac:link> <ri:attachment filename="test.pdf"/> </ac:...
TheDuck (reputation 15)
1 vote
1 answer
77 views

Python lxml method tostring with selfclosing tag <br/>

I have the following code and output (without closing br-tag): from lxml import html root = html.fromstring('<br/>') html.tostring(root) Output: '<br>' However, I expected some ...
TheDuck (reputation 15)
0 votes
0 answers
119 views

Round-trip HTML using Python xml.etree.ElementTree or lxml.ElementTree

I have code that creates and saves XML fragments. Now I would like to handle HTML as well. ElementTree.write() has a method="html" parameter that suppresses end tags for "area", &...
samwyse (reputation 2,996)
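
As a rough illustration of the `method="html"` behaviour the excerpt above mentions, the following sketch serializes the same elements with the standard library's `xml.etree.ElementTree` using both the `html` and `xml` methods. It only shows how void elements such as `<br>` are written; it does not attempt the full round-trip the question asks about, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

br = ET.fromstring("<br/>")
p = ET.fromstring("<p/>")

# method="html" drops the end tag for void elements such as <br>,
# but still writes an explicit end tag for ordinary elements such as <p>.
br_html = ET.tostring(br, method="html", encoding="unicode")
p_html = ET.tostring(p, method="html", encoding="unicode")

# method="xml" (the default) writes empty elements in self-closing form.
br_xml = ET.tostring(br, method="xml", encoding="unicode")

print(br_html, p_html, br_xml)
```

The same `method` keyword is accepted by `ElementTree.write()`, which is the call named in the excerpt.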
0 votes
1 answer
147 views

Getting raw text from lxml

Trying to get the text from an HtmlElement in lxml. For example, I have the HTML read in by thing = lxml.html.fromstring("<code>&lt;div&gt;</code>") But when I call ...
erobson (reputation 13)
0 votes
1 answer
118 views

How to reconstruct original tag

Working with lxml and Python. Given an HtmlElement, I'd like to reconstruct the original opening tag used to define it with attributes. For example, if I have an HtmlElement representing <p id=&...
erobson (reputation 13)
1 vote
1 answer
64 views

How to scrape all values from a table like HTML DIV structure without missing some of them?

I'm just 3 months into learning Python and I ran into a little problem while building a Yahoo Finance web scraper. import pandas as pd from bs4 import BeautifulSoup import lxml import requests import ...
Dominik Kostienko
0 votes
2 answers
1k views

Selecting an element with multiple classes in python lxml using xpath

I was trying to scrape a website using Python requests and lxml. I could easily select elements with a single class using html.xpath(), but I can't figure out how to select elements with multiple ...
Ajay Pun Magar
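
One common XPath 1.0 idiom for matching an element that carries several classes is to pad the `class` attribute with spaces and test for each class name separately. The sketch below only builds such an expression as a string; the helper `class_xpath` and the class names `card` and `active` are hypothetical, and the question's actual markup is not shown in the excerpt.

```python
def class_xpath(tag, *classes):
    """Build an XPath 1.0 expression matching <tag> elements that have
    every one of the given class names, in any order."""
    if not classes:
        raise ValueError("at least one class name is required")
    conditions = [
        # Pad with spaces so 'card' does not also match 'card-header'.
        f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
        for name in classes
    ]
    return f"//{tag}[{' and '.join(conditions)}]"


expr = class_xpath("div", "card", "active")
print(expr)
```

With lxml, the resulting string could then be passed to an element's `xpath()` method, for example `tree.xpath(expr)`, assuming `tree` is a parsed document.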