3

I'm experienced with VBA but really new with webscraping. So far I managed to extract some tables from other webpages but this one is giving me a hard time. The link is http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es

Basically, I click the arrow drop down list next to "Exportar Cuadro" button. After that, I need to change both dates that appear there to a specific one I will have into a variable.

How can I get to change that input boxes on webpage? My code so far is the next one:

Option Explicit

Sub test()

Dim URL As String, URL2 As String, URL3 As String, URL4 As String
Dim IE As Object, obj As Object, colTR As Object, doc As Object, tr As Object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Dim objCollection As Object
Dim j As String, i As Integer


URL = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA51&locale=es"
URL2 = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA52&locale=es"
URL3 = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA53&locale=es"
URL4 = "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es"
'Tipos de cambio
Set IE = CreateObject("InternetExplorer.Application")

IE.Visible = True
IE.navigate URL4

Do While IE.Busy Or IE.readyState <> 4
    DoEvents
Loop

Application.Wait (Now + TimeValue("00:00:01"))

IE.document.getElementById("exportaCuadroToggle").Click

Set objCollection = IE.document.getElementsByTagName("ID")
i = 0
While i < objCollection.Length
    If objCollection(i).Value = "26/08/2019" Then
        ' Set text for search
        objCollection(i).Value = "01/08/2019"
    End If
    If objCollection(i).Name = "form-control form-control-sm fechaFin" Then
        ' Set text for search
        objCollection(i).Value = "01/08/2019"
    End If
Wend

End Sub

Note: URL, URL2 and URL3 are used in the complete code but I ommited that part for now because those links are already doing what I want.

2
  • 1
    What happens (or doesn't happen) when you run your code? Commented Aug 27, 2019 at 0:26
  • The dates I said on the question don't change. Commented Aug 27, 2019 at 2:08

3 Answers 3

2

From what I can see, changing the dates in that drop down box, doesn't update the table displayed in the page, which means there's no point of scraping that.

Unless I'm missing something, it seems way easier to download the excel file and manipulate it with vba to get the data you need. Therefore I will not address the "how to change the dates in the inputbox" issue, because I find it futile. Instead, I will suggest an alternative approach.

If you inspect the network traffic using your browser's developer tools, you'll see that when you press the "Exportar cuadro" button, a GET request is being sent which uses as parameters the start and end dates in unix timestamp and returns the corresponding excel file. You only need the URL

Here's an example of how you could get the file:

Option Explicit

Sub Test()

Dim wb As Workbook
Dim url As String
Dim startDate As Double
Dim endDate As Double
startDate = ToUnix("10/08/2019") 'use whichever date you want
endDate = ToUnix("20/08/2019") 'use whichever date you want
url = "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es&formatoXLS.x=1&fechaInicio=" & startDate & "&fechaFin=" & endDate
Set wb = Workbooks.Open(url)

End Sub

Public Function ToUnix(dt As Date) As Double 'credits to @Tim Williams
ToUnix = DateDiff("s", "1/1/1970", dt) * 1000
End Function

The code above, for demonstration purposes, will just open the report for two random dates. Once the workbook is stored in a workbook variable you can manipulate it as usual and do whatever you need with it.

You can modify the code to suit your needs.

Now, having said that, the website offers an API with extensive documentation and examples which you could use to get whatever info you need in a fast and reliable way. I would strongly recommend looking into it.

On a separate note, there's no such thing as an HTML tag named "ID", so this:

IE.document.getElementsByTagName("ID")

should return Nothing.

2
  • Hello and thank you for your answer, where do you check the GET request? I'm really new to this HTML coding. However I already tried your solution and works perfectly for what I need. I really appreciate it. Commented Aug 27, 2019 at 14:49
  • @CarlosCasio you can see the request in the network tab of your browser’s developer tools. If you use Firefox for example just press Ctrl+Shift+E and it will show up. Glad I could help. Please don’t forget to upvote and/or accept the answers that contributed to the solution of your problem. Commented Aug 27, 2019 at 22:34
2

Looking at the API documentation referenced by @StavrosJon it seems you can do the following. The relevant API end point is:

GET series/:idSerie/datos/:fechaI/:fechaF

You can get a free token. The details regarding usage and limitations are here.

The API call requires a comma separated list of series ids as one of its parameters. You can hardcode these or, as I do, simply grab these from the existing webpage you reference and then pass in the subsequent API call. I regex out the necessary series ids.

The response is json - as detailed here - ergo you need a json parser to handle the response. I use jsonconverter.bas. Download raw code from here and add to standard module called jsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime.

I use some helper functions to ensure that I have correctly ordered dates output and that missing information is handled appropriately.

Sort output on titulo column if you want pairings of items e.g. Max / Min. Otherwise, you could implement a custom sort.


VBA:

Option Explicit
Public Sub GetData()
    '<  VBE > Tools > References > Microsoft Scripting Runtime
    Dim json As Object, re As Object, s As String, xhr As Object
    Dim startDate As String, endDate As String, ws As Worksheet, ids As String

    startDate = "2019-08-18"
    endDate = "2019-08-24"

    Dim datesDict As Object, headers(), results(), key As Variant, r As Long

    Set datesDict = GetDateDictionary(startDate, endDate)
    ReDim headers(1 To datesDict.Count + 2)
    headers(1) = "idSerie"
    headers(2) = "titulo"
    r = 3

    For Each key In datesDict.keys
        headers(r) = key
        r = r + 1
    Next

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set re = CreateObject("VBScript.RegExp")
    Set xhr = CreateObject("MSXML2.XMLHTTP")

    With xhr
        .Open "GET", "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es", False
        .send
        s = .responseText
        ids = GetIds(re, s)
        If ids = "No match" Then Exit Sub
        .Open "GET", "https://www.banxico.org.mx/SieAPIRest/service/v1/series/" & ids & "/datos/" & startDate & "/" & endDate & "", False 'https://www.banxico.org.mx/SieAPIRest/service/v1/doc/consultaDatosSerieRango
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "Bmx-Token", "aa833b22ee2a350192df6962b1eb6d8ea569ac895862ecc31b79b46859c7e74c" 'https://www.banxico.org.mx/SieAPIRest/service/v1/token  ''<== Replace with your generated token
        .send
        s = .responseText
    End With

    Set json = JsonConverter.ParseJson(s)("bmx")("series")

    ReDim results(1 To json.Count, 1 To UBound(headers))

    WriteOutResults ws, re, startDate, endDate, json, results, headers
End Sub

Public Sub WriteOutResults(ByVal ws As Worksheet, ByVal re As Object, ByVal startDate As String, ByVal endDate As String, ByVal json As Object, ByRef results(), ByRef headers())
    Dim item As Object, subItem As Object, key As Variant
    Dim r As Long, c As Long, datesDict As Object, nextKey As Variant
    re.Pattern = "\s{2,}"
    For Each item In json
        Set datesDict = GetDateDictionary(startDate, endDate)
        r = r + 1

        For Each key In item.keys
            Select Case key
            Case "idSerie"
                results(r, 1) = item(key)
            Case "titulo"
                results(r, 2) = re.Replace(item(key), Chr$(32))
            Case "datos"
                c = 3
                For Each subItem In item(key)
                    datesDict(subItem("fecha")) = subItem("dato")
                Next subItem
                For Each nextKey In datesDict.keys
                    results(r, c) = datesDict(nextKey)
                    c = c + 1
                Next
            End Select
        Next
    Next
    With ws
        .Cells(1, 1).Resize(1, UBound(headers)) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub

Public Function GetIds(ByVal re As Object, ByVal responseText As String) As String
    Dim matches As Object, i As Long, dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
    With re
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "'(SF\d{5})'"                 'regex pattern to get json string
        If .test(responseText) Then
            Set matches = .Execute(responseText)
            For i = 0 To matches.Count - 1
                dict(matches(i).SubMatches(0)) = vbNullString
            Next
            GetIds = Join$(dict.keys, ",")
        Else
            GetIds = "No match"
        End If
    End With
End Function

Public Function GetDateDictionary(ByVal startDate As String, ByVal endDate As String) As Object
    Dim sDate As Long, eDate As Long
    Dim dateDict As Object, i As Long

    Set dateDict = CreateObject("Scripting.Dictionary")
    sDate = CDate(startDate)
    eDate = CDate(endDate)
    For i = sDate To eDate
        dateDict(Format$(i, "dd/mm/yyyy")) = vbNullString
    Next
    Set GetDateDictionary = dateDict
End Function

Example results:

enter image description here


Regex:

enter image description here

8
  • Thank you, I did see the API information yet I'm really new to this HTML thing. I only know VBA for macros on Microsoft Office and that's it. Do you have any good source where I can read more about this? How to call API's with VBA, what do I need to know about an API, etc? Thank you! Commented Aug 27, 2019 at 14:48
  • Everything about the relevant API is in the answer above. The code is VBA. API is just an interface with defined methods. You send an XHR request formatted according to API instructions in links and use a json parser to handle the response (all demonstrated above).
    – QHarr
    Commented Aug 27, 2019 at 14:59
  • This gives an introduction to using xmlhttp to make xhr requests (the url in the .open "GET" is the xhr request to the API endpoint (the dedicated interface for retrieving the info you want).
    – QHarr
    Commented Aug 27, 2019 at 15:01
  • I did try but it says "Variable not defined" on JsonConverter variable. I don't know how that sould be declared. Thank you for the link, I will read more about it. Commented Aug 27, 2019 at 16:08
  • Did you download the code from the link and add to a standard module called JsonConverter?
    – QHarr
    Commented Aug 27, 2019 at 16:09
1

I was able to change the date by putting a breakpoint on:

Set objCollection = IE.document.getElementsByTagName("ID")

Then I used the immediate window to set a variable you already declared for me:

set elecol = ie.document.queryselector("#selecPeriodoCuadro > div > div > input.form-control.form-control-sm.fechaInicio")

And used the immediate window again to change the value of the element:

elecol.value = "20/07/2019"

You can work with the other date field using this string:

"#selecPeriodoCuadro > div > div > input.form-control.form-control-sm.fechaFin"

1
  • Thank you. Although it does change the date it doesn't change the data when you click on Extract button. That's why I mark Stavros's answer as the best one. Commented Aug 28, 2019 at 23:17

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.