Newest 'html-parsing' Questions

0 votes

1 answer

44 views

BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction

I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs. The goal is to extract the Memory ...

zeromiedo

1

asked yesterday

0 votes

2 answers

202 views

Beautiful Soup, children are clearly inside but can't get it

From the structure below I only want the value of the href attribute, but rec_block returns the h5 element without its children, so basically <h5 class="series">Recommendations</h5>. <...

Emby

1

asked Nov 25 at 18:27

0 votes

0 answers

56 views

Issue With Jsoup Document Selector

I'm using Java Spring Boot with jsoup, and I recently upgraded jsoup to version 1.21.1. My code creates a search query and searches for it in the document: Elements targetElements = document.select(...

user613

243

asked Nov 18 at 9:04

Advice

1 vote

0 replies

96 views

Parsing with Python html.parser: accessing and using raw tags

I'm not a Python specialist, so bear with me. I'm trying to replace a Perl HTML::TokeParser based parser that I use for template foreign language translation to use Python html.parser. Here's the ...

Hugh Barnard

352

asked Oct 29 at 10:50

3 votes

1 answer

61 views

Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working

I'm rather new to using Beautiful Soup and I'm having some issues splitting some html correctly by only looking at html breaks and ignoring other html elements such as changes in font color etc. The ...

James Brian

33

asked Aug 30 at 17:29
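
One possible way to approach the <br> question above, sketched with the standard-library html.parser instead of Beautiful Soup (a swapped-in technique, not the asker's code). The idea is to start a new text segment only when a <br> tag is seen and to ignore every other tag, so inline styling such as font color changes does not split the text. The names BrSplitter and split_on_br and the sample markup are assumptions made up for illustration; the original excerpt does not show the real HTML.

```python
from html.parser import HTMLParser


class BrSplitter(HTMLParser):
    """Accumulates text into segments, starting a new one at each <br>."""

    def __init__(self):
        super().__init__()
        self.segments = [""]

    def handle_starttag(self, tag, attrs):
        # Only <br> (or <br/>) starts a new segment; other tags are ignored.
        if tag == "br":
            self.segments.append("")

    def handle_data(self, data):
        # Text inside inline elements is appended to the current segment.
        self.segments[-1] += data


def split_on_br(html):
    parser = BrSplitter()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    # Drop empty segments and trim surrounding whitespace.
    return [s.strip() for s in parser.segments if s.strip()]


# Hypothetical sample markup, not taken from the question.
sample = '<p>First <font color="red">line</font><br>Second line<br/>Third <b>line</b></p>'
print(split_on_br(sample))  # ['First line', 'Second line', 'Third line']
```

The same idea should carry over to Beautiful Soup by walking the paragraph's descendants and breaking on br elements, but that variant is not shown here.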

0 votes

1 answer

47 views

Can we combine '[class="a"]' and '[id="c d"]' in the same command?

I have an HTML document where I want to get elements with class="a" and id="c d". If I have only one of them, I can use soup.select('[class="a"]') and soup.select('[id="c d&...

Akira

2,820

asked May 25 at 13:27

0 votes

0 answers

99 views

parse marked customize for list

I've seen the docs at https://marked.js.org/using_pro#renderer, but they have no example for the list I want to customize. More detail: https://github.com/markedjs/marked/blob/master/src/Tokens.ts#L137 as the ...

zummon

996

asked Apr 4 at 0:57

4 votes

5 answers

184 views

How to extract links from an html page

I have an html page that has data like so: <td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td> <td><a href="PASS_report_test_2025-...

Archie

389

asked Mar 26 at 20:23
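
For the link-extraction question above, a minimal sketch using only the standard-library html.parser might look like the following. It collects the href of every <a> start tag. The class name LinkCollector, the helper extract_links, and the second table cell in the sample are hypothetical; only the first cell comes from the excerpt, and the truncated second link is deliberately not reconstructed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = (
    "<table><tr>"
    '<td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td>'
    '<td><a href="example.log">example.log</a></td>'  # hypothetical second cell
    "</tr></table>"
)
print(extract_links(sample))  # ['test-2025-03-24_17-05.log', 'example.log']
```

If only some links are wanted (for example those ending in .log), filtering the returned list with str.endswith is probably simpler than adding logic to the parser.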

3 votes

1 answer

93 views

Why isn't the end tag included in an ASIDE.OuterHTML

My intent was to give advice on the question Delete everything between two strings (inclusive): use the HTMLDocument parser instead of a text-based replace command. But somehow the OuterHTML ...

iRon

24.4k

asked Mar 3 at 9:46

-1 votes

1 answer

52 views

why is my html parser not outputting wanted number

My programming teacher had us write a calculator in Python for fuel consumption in L/100 km, and I decided to go further and have it also calculate the price per 100 km, but here's the ...

VXV

1

asked Feb 12 at 22:26

1 vote

0 answers

31 views

Passing CSRF token through Dart html parsing

I'm making an app where students can log in to their portal website and see their data. However, I'm having trouble authenticating users. When I did this project on another website I used ...

abtlb

11

asked Feb 12 at 9:54

1 vote

2 answers

92 views

Extracting text from Wikisource using BeautifulSoup returns empty result

I'm trying to extract the text of a book from a Wikisource page using BeautifulSoup, but the result is always empty. The page I'm working on is Le Père Goriot by Balzac. Here's the code I'm using: ...

Hugo Durif

13

asked Jan 30 at 21:31

-1 votes

2 answers

85 views

Parser on python returns an empty list (i guess its an HTML class selection issue)

The idea is to collect the name and price of every flat on the website as a list. I've made a simple parser in Python, but it looks like I can't get any values, since it returns an ...

Danny Mxxre

1

asked Jan 18 at 16:45

1 vote

1 answer

151 views

How can I scrape a table from baseball reference using pandas and beautiful soup? [duplicate]

I am trying to scrape the pitching stats on this url and then save the dataframe to a csv file. https://www.baseball-reference.com/boxes/ARI/ARI202204070.shtml My current code is below (Python 3.9.7) ...

Preston Albury

13

asked Jan 14 at 6:11

0 votes

1 answer

53 views

Duplicate extra data when webscraping fbref.com

I am trying to scrape the league table for the EPL, but when I do I get duplicate links as well as links to teams that are not even in the Premier League, which makes no sense. Here ...

Vignesh

27

asked Dec 26, 2024 at 22:39
