I've been following this tutorial to scrape data from a URL, as it matches my need pretty closely (three divs deep). Unfortunately, since StackOverflow no longer supports IE, I can't test the tutorial's code to see if it works as is. In my case I cannot use the Chrome plugin, and I have to authenticate to the site before navigating to the URL. I also tried the solutions on question 15191847, specifically gembird's solution, and it gave me the same error.
When I run the code below I get "Run-time error '91'". I printed ie.document to a text file and verified that the div IDs I'm searching on are correct and that they're being captured. The error is raised on the line Set Questions = QuestionList.Children. Any thoughts on why I'm getting this error?
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim QuestionList As IHTMLElement, QuestionField As IHTMLElement
Dim Questions As IHTMLElementCollection, QuestionFieldLinks As IHTMLElementCollection, QuestionFields As IHTMLElementCollection
Dim Question As IHTMLElement
Dim RowNumber As Long
Dim votes As String, url As String, views As String, QuestionId As String
url = "<<my url>>"
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate url
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
    Application.StatusBar = "Trying to go to " & url
    DoEvents
Loop
Cells.Clear
'show text of HTML document returned
Set html = ie.Document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'put heading across the top of row 3
Range("A3").Value = "Field"
Range("B3").Value = "Values"
Set QuestionList = html.getElementByID("fieldgroup ")
Set Questions = QuestionList.Children
RowNumber = 4
For Each Question In Questions
    If Question.className = "fieldrow _text-field" Then
        'get a list of all of the parts of this question, and loop over them
        Set QuestionFields = Question.all
        For Each QuestionField In QuestionFields
            'if this is the question's votes, store it (get rid of any surrounding text)
            If QuestionField.className = "fieldlabel" Then
                Cells(RowNumber, 1).Value = Trim(QuestionField.innerText)
            End If
            'likewise for views (getting rid of any text)
            If QuestionField.className = "fieldvalue" Then
                Cells(RowNumber, 2).Value = Trim(QuestionField.innerText)
            End If
        Next QuestionField
        'go on to next row of worksheet
        RowNumber = RowNumber + 1
    End If
Next
Set html = Nothing
The HTML output looks like this.
<div class="fieldgroup " style="" group-title="">
    <div class="fieldrow _text-field">
        <div class="fieldlabel">Reporting</div>
        <div class="fieldvalue">Yes</div>
    </div>
    <div class="fieldrow _text-field">
        <div class="fieldlabel">Annotate ''Yes''</div>
        <div class="fieldvalue">Yes</div>
    </div>
...
There is no getElementsByID, only getElementByID (singular). The ID of an HTML element should be unique, so there shouldn't be more than one element. Are you using Option Explicit? That should give you a compiler error.
Most likely no element with the ID fieldgroup is found, and therefore QuestionList is Nothing. Try stepping through the code with the debugger and checking the result of each step. There is not much more we can help with, as we don't have access to the web page.
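For what it's worth, here is a minimal sketch of how the lookup could be guarded so that a failed search does not end in error 91. It assumes, going by the HTML posted in the question, that fieldgroup is a class rather than an id, and that the project references the Microsoft HTML Object Library. WriteFieldRows and its firstRow parameter are hypothetical names, and getElementsByClassName needs the page to render in IE9 or later document mode. It has not been tested against the actual page.

```vba
Option Explicit
' Requires a reference to "Microsoft HTML Object Library" (MSHTML).

' Hypothetical helper: copies label/value pairs from the first element
' with class "fieldgroup" to the active sheet, starting at firstRow.
Sub WriteFieldRows(ByVal html As HTMLDocument, ByVal firstRow As Long)
    Dim groups As IHTMLElementCollection
    Dim fieldGroup As IHTMLElement
    Dim fieldRow As IHTMLElement
    Dim part As IHTMLElement
    Dim rowNumber As Long

    ' Assumption: "fieldgroup" is a class in the posted HTML, so search
    ' by class name instead of calling getElementByID.
    Set groups = html.getElementsByClassName("fieldgroup")

    ' Guard against an empty result before touching any members;
    ' using an unset object is what raises run-time error 91.
    If groups Is Nothing Then
        MsgBox "No element with class 'fieldgroup' was found."
        Exit Sub
    End If
    If groups.Length = 0 Then
        MsgBox "No element with class 'fieldgroup' was found."
        Exit Sub
    End If
    Set fieldGroup = groups.Item(0)

    rowNumber = firstRow
    For Each fieldRow In fieldGroup.Children
        ' Match on the "fieldrow" token so extra classes or stray
        ' spaces in the class attribute do not break the comparison.
        If InStr(1, " " & fieldRow.className & " ", " fieldrow ") > 0 Then
            For Each part In fieldRow.Children
                Select Case Trim$(part.className)
                    Case "fieldlabel"
                        Cells(rowNumber, 1).Value = Trim$(part.innerText)
                    Case "fieldvalue"
                        Cells(rowNumber, 2).Value = Trim$(part.innerText)
                End Select
            Next part
            rowNumber = rowNumber + 1
        End If
    Next fieldRow
End Sub
```

In the question's code this could be called as WriteFieldRows html, 4 right after Set html = ie.Document. If the message box appears, the element really is missing from the document IE returned, which narrows the problem down to page loading or authentication rather than the loop.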