All Questions
478 questions
0
votes
2
answers
65
views
Unconditionally stop scraping HTML at specified element (or EOF)
I'm using the Python lxml.html package to scrape an HTML file. The HTML I'm trying to scrape reads in part
<h1>Description of DAB Ensemble 1</h1><table>Stuff I don't care about</...
2
votes
0
answers
77
views
Encoding issue with parsing the same text using lxml.etree
I am parsing an HTML script using the lxml.etree library. I am facing a weird issue: when I parse the exact same script and get the content using a different XPath, the encoding of the retrieved text ...
2
votes
2
answers
48
views
lxml library doesn't extract text in a given html tag when there is another tag with the text
I have the following html script:
<div>
<p class="test1">
<i class="empty"> </i>
WANTED TEXT
</p>
</div>
I want to ...
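A minimal sketch of one way to read this, assuming the markup shown in the excerpt: in lxml, text that follows a child element is stored on that child's `.tail`, not on the parent's `.text`. The variable names below are illustrative and not taken from the question.

```python
from lxml import html

# Markup copied from the excerpt above.
snippet = """<div>
<p class="test1">
<i class="empty"> </i>
WANTED TEXT
</p>
</div>"""

root = html.fromstring(snippet)
p = root.xpath('//p[@class="test1"]')[0]

# The text after </i> belongs to the <i> element's tail.
wanted = p.find('i').tail.strip()

# Alternative: concatenate all text inside <p>, including child tails.
all_text = p.text_content().strip()

print(wanted)
print(all_text)
```

`text_content()` is the simpler choice when every piece of text inside the element is wanted; `.tail` is more precise when only the text after a particular child matters.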
1
vote
1
answer
41
views
Lxml check if element is there and get href of sibling if true
I want to check if a specific element has text and, if so, get the href of a sibling element. There are multiple products like this on the site and I want to check each one for availability.
...
1
vote
0
answers
117
views
Disable BeautifulSoup lxml parser logs
I am working with the BeautifulSoup library to parse multiple HTML documents and I keep getting this annoying error message and can't find a way to stop it from printing.
document = BeautifulSoup(html_content_bytes,...
0
votes
2
answers
55
views
Loop isn't scraping multiple pages, only returning data from one page repeatedly
import requests
from bs4 import BeautifulSoup
import pandas as pd
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/...
1
vote
1
answer
959
views
Python: cannot import name 'etree' from 'lxml'
I am trying to import html as lxml. I have tried uninstalling and reinstalling without luck, and I keep getting the following error:
File "C:\devtools\Python3\Lib\site-packages\lxml\html\__init__.py"...
-2
votes
1
answer
843
views
Need help scraping data off of 3rd party dynamic website (lolalytics.com)
For example, I'm trying to scrape specific data such as deltas on pages like https://lolalytics.com/lol/nautilus/build/.
I know the XPath of the elements that I need, such as: //*[@id="root&...
0
votes
0
answers
139
views
Python lxml html tostring
I have the following code and output (without the colon and the text before it):
from lxml import html
root = html.fromstring('<ac:link> <ri:attachment filename="test.pdf"/> </ac:...
1
vote
1
answer
77
views
Python lxml method tostring with selfclosing tag <br/>
I have the following code and output (without a closing br tag):
from lxml import html
root = html.fromstring('<br/>')
html.tostring(root)
Output : '<br>'
However, I expected some ...
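A hedged sketch, assuming the goal is a self-closing `<br/>` in the output (the excerpt is cut off before it says so): `lxml.html.tostring` serializes with `method="html"` by default, which writes void elements without a closing slash, and passing `method="xml"` switches to XML-style serialization.

```python
from lxml import html

root = html.fromstring('<br/>')

# Default HTML serialization: void elements are written as <br>.
as_html = html.tostring(root)

# XML serialization: empty elements are written self-closed.
as_xml = html.tostring(root, method='xml')

print(as_html)
print(as_xml)
```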
0
votes
0
answers
119
views
Round-trip HTML using Python xml.etree.ElementTree or lxml.ElementTree
I have code that creates and saves XML fragments. Now I would like to handle HTML as well. ElementTree.write() has a method="html" parameter that suppresses end tags for "area", &...
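To illustrate the behaviour the excerpt describes, here is a small sketch using the standard library's `xml.etree.ElementTree`. The sample markup is an assumption made up for the illustration; it is not from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment containing a void element.
fragment = ET.fromstring('<p>a<br/>b</p>')

# method="xml" keeps the self-closing form.
as_xml = ET.tostring(fragment, encoding='unicode', method='xml')

# method="html" writes <br> with no end tag.
as_html = ET.tostring(fragment, encoding='unicode', method='html')

print(as_xml)
print(as_html)
```

Note that `ET.fromstring` is an XML parser, so this only covers the serialization half of a round trip; reading real-world HTML back in generally needs an HTML parser.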
0
votes
1
answer
147
views
Getting raw text from lxml
Trying to get the text from an HtmlElement in lxml. For example, I have the HTML read in by
thing = lxml.html.fromstring("<code><div></code>")
But when I call ...
0
votes
1
answer
118
views
How to reconstruct original tag
Working with lxml and Python. Given an HtmlElement, I'd like to reconstruct the original opening tag used to define it with attributes. For example, if I have an HtmlElement representing <p id=&...
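A rough sketch of one possible approach, not necessarily what the asker ended up using: rebuild an opening tag from the element's `tag` and `attrib`. The helper name `opening_tag` and the sample markup are hypothetical. The parser does not keep the original quoting or whitespace, so this yields an equivalent tag rather than the exact source text.

```python
from html import escape
from lxml import html


def opening_tag(el):
    """Build an opening tag string from an element's name and attributes."""
    attrs = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in el.attrib.items()
    )
    return f"<{el.tag}{attrs}>"


# Hypothetical element for illustration.
el = html.fromstring('<p id="intro" class="lead">Hello</p>')
tag = opening_tag(el)
print(tag)
```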
1
vote
1
answer
64
views
How to scrape all values from a table like HTML DIV structure without missing some of them?
I'm just 3 months into learning Python and I ran into a little problem while building a Yahoo Finance web scraper.
import pandas as pd
from bs4 import BeautifulSoup
import lxml
import requests
import ...
0
votes
2
answers
1k
views
Selecting an element with multiple classes in python lxml using xpath
I was trying to scrape a website using Python requests and lxml. I could easily select the elements with a single class using html.xpath() but I can't figure out how to select the elements with multiple ...
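A common XPath 1.0 idiom for this, shown as a sketch on made-up markup: pad the `class` attribute with spaces and test for each class token separately, so that the order of classes and any extra classes do not matter. The helper `has_class` and the sample document are assumptions for illustration only.

```python
from lxml import html

# Hypothetical markup; class names are illustrative.
doc = html.fromstring(
    '<div>'
    '<span class="price sale">10</span>'
    '<span class="price">20</span>'
    '<span class="sale price big">30</span>'
    '</div>'
)


def has_class(name):
    """Return an XPath predicate that matches one whole class token."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


# Select elements that carry both classes, in any order.
both = doc.xpath(f"//span[{has_class('price')} and {has_class('sale')}]")
values = [el.text for el in both]
print(values)
```

A plain `@class="price sale"` comparison only matches that exact attribute string, which is why the token-based test tends to be more robust.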