All Questions
26 questions
-1
votes
1
answer
62
views
extracting a string from html using HTMLParser & Python2.7
I am developing an Alexa skill, so only the standard Python (2.7) libraries are available to me. That means I can't use BeautifulSoup4.
I'm trying to identify the below ...
1
vote
1
answer
113
views
Replace all <img> tags with one word in XML file
I have an XML file that consists of many tweets with HTML tags in them.
Among all the other tasks, I need to replace all <img> tags with the word @emoji.
I have written the following code:
for word in re.findall(...
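The asker's code is cut off above, so the following is only a minimal sketch of one way to do the replacement with `re.sub` instead of looping over `re.findall`. It assumes the `<img>` tags appear literally in the text (not escaped as `&lt;img`) and that no attribute value contains a `>`; the names `IMG_TAG` and `replace_imgs` are made up for illustration.

```python
import re

# Matches an opening <img ...> tag, including self-closing <img ... />.
# Assumption: no attribute value contains a literal '>'.
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)

def replace_imgs(text, word='@emoji'):
    """Replace every <img> tag in text with the given word."""
    return IMG_TAG.sub(word, text)

sample = 'hi <img src="x.png" alt="smile"/> there <IMG class="e">'
print(replace_imgs(sample))  # hi @emoji there @emoji
```

A single `re.sub` call rewrites every match in one pass, which avoids editing a string while iterating over its matches.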
-1
votes
2
answers
43
views
Python - Why isn't this specific text being found by findall regex?
EDIT: PLEASE DO NOT DOWNVOTE WITHOUT COMMENTING ON WHY YOU ARE DOWNVOTING. I AM TRYING MY BEST TO WRITE THIS PROPERLY!
I am trying to print all of the URL links of watches on a website. I have all ...
0
votes
1
answer
123
views
Python - How to use finditer regex?
I would like to find every instance of img src="([^"]+)" that is preceded by the div class="grid" and followed by div class="orderplacebut" in some HTML code, i.e. I want to find all the images in the ...
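One possible approach, sketched under assumptions since the question is truncated: first isolate each region between the two div markers, then run `finditer` for the image pattern inside that region only. The sample HTML and the names `REGION`, `IMG_SRC` and `images_in_grid` are hypothetical.

```python
import re

# Hypothetical sample: only the two images inside the grid should be found.
html = (
    '<img src="logo.png">'
    '<div class="grid"><img src="one.jpg"><img src="two.jpg">'
    '<div class="orderplacebut"></div></div>'
    '<img src="footer.png">'
)

# Non-greedy region between the two markers; DOTALL lets it span newlines.
REGION = re.compile(r'div class="grid"(.*?)div class="orderplacebut"', re.DOTALL)
IMG_SRC = re.compile(r'img src="([^"]+)"')

def images_in_grid(text):
    """Return the src of every img found between the two div markers."""
    found = []
    for region in REGION.finditer(text):
        for img in IMG_SRC.finditer(region.group(1)):
            found.append(img.group(1))
    return found

print(images_in_grid(html))  # ['one.jpg', 'two.jpg']
```

Splitting the work into two patterns keeps each regex simple; a real HTML parser would be more robust if the markup varies.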
0
votes
0
answers
20
views
Python regex only returning one result when DOTALL used [duplicate]
I am trying to return a heap of image URLs and want to include every character, such as new lines, in my findall function. However, when I use the DOTALL flag and .* in my regex, I go from having ...
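The excerpt is cut off, so the actual cause is not known. A common reason for this symptom, assumed here, is greedy matching: with DOTALL, `.*` also crosses newlines and can swallow everything up to the last possible terminator. The sample HTML below is made up for illustration.

```python
import re

html = '<img src="a.png">\n<p>text</p>\n<img src="b.png">'

# Greedy: with DOTALL, .* crosses newlines, so a single match runs from
# the first src=" all the way to the last closing quote.
greedy = re.findall(r'src="(.*)"', html, re.DOTALL)

# Non-greedy: .*? stops at the nearest closing quote, one match per image.
lazy = re.findall(r'src="(.*?)"', html, re.DOTALL)

print(len(greedy))  # 1
print(lazy)         # ['a.png', 'b.png']
```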
0
votes
2
answers
864
views
How to change original match in re.sub
I want to split text in my HTML using <br> tags. If the text is longer than 50 characters, I want to replace the last space before 10 characters by <br>.
The text is in <span class="value"&...
1
vote
1
answer
142
views
Replace all html tag attributes with regex
I'm trying to figure out how I can add the attribute id=ID_<number> to all tags in an HTML snippet and remove all other attributes.
For example:
<div class="...">...</div>
to:
<div id="...
2
votes
2
answers
1k
views
How to remove any html tags within a specific pattern in beautifulsoup
<p>
A
<span>die</span>
is thrown \(x = {-b \pm
<span>\sqrt</span>
{b^2-4ac} \over 2a}\) twice. What is the probability of getting a sum 7 from
both the ...
0
votes
2
answers
89
views
Parsing javascript using re.findall
So I have several problems that I am trying to tackle.
First, I am trying to parse this JavaScript that I got from HTML.
$(document).ready(function() {
$('#commodity-show-thumbnails').bxSlider({...
2
votes
1
answer
96
views
Scrape data from an ill-formed pdf table
I am trying to scrape data from a poorly laid out pdf (URL in the following code). I will need to use information about the position of the lines/borders of the table to make meaningful data records. ...
0
votes
3
answers
80
views
Python How to get a specific code in website using re
I'm trying to solve the Python Challenge.
http://www.pythonchallenge.com/pc/def/ocr.html
OK, I know I can just copy and paste the code from the source into a txt file and work from that, but I want to take it ...
0
votes
1
answer
67
views
Find the start and the end of Programming Code in a whole Text [closed]
I have HTML containing text and also (generic) programming code, without any distinction or markup between them. Is there a way to mark the start and the end of the code, suitable for any programming ...
1
vote
1
answer
363
views
Unable to select only first occurrence of href in anchor tag?
Here is my HTML code:
<ul class="asidemenu_h1">
<li class="top">
<h3>Mobiles</h3>
</li>
<li>
<a href="http://www.mega.pk/mobiles-...
-1
votes
1
answer
265
views
Regex to capitalize paragraphs in HTML python
I want to take everything in an HTML document and capitalize the sentences (within paragraph tags). The input file has everything in all caps.
My attempt has two flaws - first, it removes the ...
2
votes
1
answer
94
views
Python Regex matching string between abcd="_blank"> and </a>
How can I match strings between abcd="_blank"> and </a> using a regex in Python 2.7?
For example for abcd="_blank">ABBA</a> the result should be ABBA.
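A minimal sketch of one way to do this, assuming the link text contains no nested `</a>`. The names `LINK_TEXT` and `link_texts` are made up for illustration; the code should run on Python 2.7 as well as Python 3.

```python
import re

# Capture the text between the literal abcd="_blank"> and the next </a>.
# The non-greedy (.*?) stops at the first closing tag it reaches.
LINK_TEXT = re.compile(r'abcd="_blank">(.*?)</a>', re.DOTALL)

def link_texts(html):
    """Return every string found between abcd="_blank"> and </a>."""
    return LINK_TEXT.findall(html)

sample = '<a href="#" abcd="_blank">ABBA</a> and <a abcd="_blank">Queen</a>'
print(link_texts(sample))  # ['ABBA', 'Queen']
```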