
I'm trying to scrape a website (URL: https://sports.betway.be/nl/sports/grp/soccer/belgium/first-division-a) with Python requests. In the browser's network tab I found the request that downloads the JSON data (it's called GetEvents, if you want to try it). I copied it as cURL and converted it to Python (via https://curlconverter.com/), which gave me this code:

import requests

cookies = {
    '_gcl_au': '1.1.1661658686.1660753544',
    '_ga': 'GA1.2.1596995754.1660753545',
    'BETWAY_ENSIGHTEN_PRIVACY_Marketing': '1',
    'BETWAY_ENSIGHTEN_PRIVACY_Analytics': '1',
    'bwui_cookieToastDismissed': 'true',
    'ssc_DeviceId': '0bccd9bb-9f38-4ceb-b225-2956e2163d27',
    'ssc_DeviceId_HttpOnly': '0bccd9bb-9f38-4ceb-b225-2956e2163d27',
    'ai_user': 'v0JA/|2022-08-17T16:25:54.617Z',
    'ens_firstVisit': '1660753692684',
    'bw_BrowserId': '19353358536559595037829240027349442244',
    '_sp_srt_id.c606': '543c5eed-4191-4fe4-9a12-2290ac66f159.1660753575.6.1661113080.1661108591.d6222a58-ed29-4651-ac7e-0b69163cf243',
    'userLanguage': 'nl',
    'SpinSportVisitId': '33f030d9-79ab-45de-b0bd-1ef52ae71f37',
    'ssc_btag': '13d5a754-1cb1-45ec-88a5-a4ab3da0a288',
    'TrackingVisitId': '13d5a754-1cb1-45ec-88a5-a4ab3da0a288',
    'bw_SessionId': '7fab38de-b50f-4d35-8333-e86a4a026a6f',
    'ai_session': 'hkBwK|1661929976364.3|1661929976364.3',
    'domainCookie': 'betway.be',
    '_gid': 'GA1.2.2070786832.1661929989',
    '_gat_UA-1515961-1': '1',
    'TimezoneOffset': '120',
    '_gat': '1',
    '_scid': '6e81eb56-b82b-4086-b7a0-783de175b7d8',
    'ens_firstPageView': 'false',
    'AMCVS_74756B615BE2FD4A0A495EB8%40AdobeOrg': '1',
    'AMCV_74756B615BE2FD4A0A495EB8%40AdobeOrg': '359503849%7CMCIDTS%7C19236%7CMCMID%7C25582052571887037000237579884308421467%7CMCAAMLH-1662534800%7C6%7CMCAAMB-1662534800%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCCIDH%7C1868381894%7CMCOPTOUT-1661937200s%7CNONE%7CMCAID%7CNONE%7CvVersion%7C5.0.1',
    '_gat_reg1': '1',
    '_gat_ens': '1',
    'gpv_pn': 'nl%3Asports%3Agrp%3Asoccer%3Abelgium%3Afirst-division-a',
    's_cc': 'true',
    'StaticResourcesVersion': '12.67.0.1',
    '__cf_bm': 'ljcKg_UW1mTQeV80dsFzYz8bMMSdjFYNyXvAegwnB9c-1661930020-0-AZzywRq0XM6JfSMfWMZwpLg7AkUfHOh2ZOqCgf6xeJOmLuat6hyWkt/33xjyKT5yMZAPtLUM/OpG39S+/RtEOUg=',
}

headers = {
    'authority': 'sports.betway.be',
    'accept': 'application/json; charset=UTF-8',
    'accept-language': 'nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7',
    'content-type': 'application/json; charset=UTF-8',
    # Requests sorts cookies= alphabetically
    # 'cookie': '_gcl_au=1.1.1661658686.1660753544; _ga=GA1.2.1596995754.1660753545; BETWAY_ENSIGHTEN_PRIVACY_Marketing=1; BETWAY_ENSIGHTEN_PRIVACY_Analytics=1; bwui_cookieToastDismissed=true; ssc_DeviceId=0bccd9bb-9f38-4ceb-b225-2956e2163d27; ssc_DeviceId_HttpOnly=0bccd9bb-9f38-4ceb-b225-2956e2163d27; ai_user=v0JA/|2022-08-17T16:25:54.617Z; ens_firstVisit=1660753692684; bw_BrowserId=19353358536559595037829240027349442244; _sp_srt_id.c606=543c5eed-4191-4fe4-9a12-2290ac66f159.1660753575.6.1661113080.1661108591.d6222a58-ed29-4651-ac7e-0b69163cf243; userLanguage=nl; SpinSportVisitId=33f030d9-79ab-45de-b0bd-1ef52ae71f37; ssc_btag=13d5a754-1cb1-45ec-88a5-a4ab3da0a288; TrackingVisitId=13d5a754-1cb1-45ec-88a5-a4ab3da0a288; bw_SessionId=7fab38de-b50f-4d35-8333-e86a4a026a6f; ai_session=hkBwK|1661929976364.3|1661929976364.3; domainCookie=betway.be; _gid=GA1.2.2070786832.1661929989; _gat_UA-1515961-1=1; TimezoneOffset=120; _gat=1; _scid=6e81eb56-b82b-4086-b7a0-783de175b7d8; ens_firstPageView=false; AMCVS_74756B615BE2FD4A0A495EB8%40AdobeOrg=1; AMCV_74756B615BE2FD4A0A495EB8%40AdobeOrg=359503849%7CMCIDTS%7C19236%7CMCMID%7C25582052571887037000237579884308421467%7CMCAAMLH-1662534800%7C6%7CMCAAMB-1662534800%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCCIDH%7C1868381894%7CMCOPTOUT-1661937200s%7CNONE%7CMCAID%7CNONE%7CvVersion%7C5.0.1; _gat_reg1=1; _gat_ens=1; gpv_pn=nl%3Asports%3Agrp%3Asoccer%3Abelgium%3Afirst-division-a; s_cc=true; StaticResourcesVersion=12.67.0.1; __cf_bm=ljcKg_UW1mTQeV80dsFzYz8bMMSdjFYNyXvAegwnB9c-1661930020-0-AZzywRq0XM6JfSMfWMZwpLg7AkUfHOh2ZOqCgf6xeJOmLuat6hyWkt/33xjyKT5yMZAPtLUM/OpG39S+/RtEOUg=',
    'origin': 'https://sports.betway.be',
    'referer': 'https://sports.betway.be/nl/sports/grp/soccer/belgium/first-division-a',
    'sec-ch-ua': '"Chromium";v="104", " Not A;Brand";v="99", "Google Chrome";v="104"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
}

json_data = {
    'LanguageId': 8,
    'ClientTypeId': 2,
    'BrandId': 3,
    'JurisdictionId': 3,
    'ClientIntegratorId': 1,
    'ExternalIds': [
        10121019,
        10121020,
        10121021,
    ],
    'MarketCName': 'win-draw-win',
    'ScoreboardRequest': {
        'ScoreboardType': 3,
        'IncidentRequest': {},
    },
    'BrowserId': 3,
    'OsId': 3,
    'ApplicationVersion': '',
    'BrowserVersion': '104.0.0.0',
    'OsVersion': 'NT 10.0',
    'SessionId': None,
    'TerritoryId': 21,
    'CorrelationId': '625f145c-8a5e-40a5-af44-a0a13116961c',
    'VisitId': '33f030d9-79ab-45de-b0bd-1ef52ae71f37',
    'ViewName': 'sports',
    'JourneyId': 'd857f2d0-b79f-4346-9683-f61d1e0c9854',
}

response = requests.post('https://sports.betway.be/api/Events/v2/GetEvents', cookies=cookies, headers=headers, json=json_data)

But when I run this, I get a 403 Forbidden response.

  • Welcome to Stack Overflow. Forbidden means you're not permitted to send POST requests to that URL. Check whether the headers need any tokens for POST requests.
    – ewokx
    Commented Aug 31, 2022 at 8:18
  • Hey Ewong, thanks for your comment. Do you have any idea how to find out which tokens are required? As I said, I just copied the cURL command (headers included) and converted it to Python.
    Commented Aug 31, 2022 at 8:23
  • Unfortunately, I have no idea. I can't access that website so you'll need to ask the website owners. Why are you sending a POST request?
    – ewokx
    Commented Aug 31, 2022 at 8:28
  • Maybe your request gets blocked for another reason. A lot of websites block requests that lack valid user agent information etc. in the headers. Here is some information on headers often used when scraping: oxylabs.io/blog/5-key-http-headers-for-web-scraping
    – Oivalf
    Commented Aug 31, 2022 at 8:28
  • And as @ewong said, POST seems like the wrong method for web scraping. I would expect GET here.
    – Oivalf
    Commented Aug 31, 2022 at 8:31

1 Answer

Many sites do not want to be scraped, and they check the request headers carefully.
The 403 Forbidden tells you something: they know what you are trying to do, and they are giving you a challenge. I took a quick look at this site, and it uses a lot of cookies. I had to write my own code to receive and send back the cookies because the cookie jar in PHP's curl did not work well enough.

Look at your browser's requests when you go to the site. Your browser is doing a GET request, so you must do so too.
It looks like you are off to a good start with the cookies, but there may be a timestamp in there.
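
One way to check for timestamps is to scan the cookie values for numbers that look like Unix epoch times. The following is only a heuristic sketch: the function name `find_epoch_timestamps` is made up for illustration, and it assumes that a standalone run of 10 digits means seconds and 13 digits means milliseconds. In the question's cookies, for example, `__cf_bm` contains `1661930020`, which decodes to 2022-08-31 07:13:40 UTC, so a copied value like that is probably stale by the time the script is run later.

```python
import re
from datetime import datetime, timezone

# Heuristic: a standalone run of 13 digits (milliseconds) or 10 digits (seconds).
EPOCH_PATTERN = re.compile(r"(?<!\d)(\d{13}|\d{10})(?!\d)")

# Only accept values between 2000-01-01 and 2100-01-01 (UTC) to cut false positives.
EARLIEST = 946_684_800
LATEST = 4_102_444_800


def find_epoch_timestamps(cookies):
    """Map each cookie name to the UTC datetimes that appear to be embedded in its value."""
    found = {}
    for name, value in cookies.items():
        moments = []
        for digits in EPOCH_PATTERN.findall(value):
            # Millisecond values are truncated to whole seconds.
            seconds = int(digits) // 1000 if len(digits) == 13 else int(digits)
            if EARLIEST <= seconds < LATEST:
                moments.append(datetime.fromtimestamp(seconds, tz=timezone.utc))
        if moments:
            found[name] = moments
    return found
```

Calling `find_epoch_timestamps(cookies)` on the dictionary from the question and comparing the results with the current time gives a rough idea of which cookies have likely expired and need to be fetched fresh.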

So you must pay very close attention to your request headers.
Sometimes I will try a very rare User-Agent. Many sites profile the headers, and they can do things you may never think of, like comparing the SSL handshake, because not all browsers perform it the same way. That is why I will try a UA whose profile they will not know.

If I imitate Firefox, I have to be careful that the site cannot tell the difference between my curl request and an actual browser.
Redirects are very common. On the initial response they will store a cookie, send you a 302 redirect back to themselves, and check for the cookie. They can use JavaScript to do things too.
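
To see whether a redirect-and-cookie check like this is happening, it can help to print every hop of a request. With Python requests, a response exposes the intermediate responses in `response.history`, and each response carries the cookies set at that hop in `.cookies`. The helper below is a small sketch (the name `describe_redirect_chain` is invented here); it only reads those attributes, so it works on any response returned by requests.

```python
def describe_redirect_chain(response):
    """Return one line per hop: status code, URL, and the names of cookies set there."""
    lines = []
    # response.history holds the redirect responses in order; the final one comes last.
    for hop in [*response.history, response]:
        cookie_names = sorted(cookie.name for cookie in hop.cookies)
        lines.append(f"{hop.status_code} {hop.url} cookies={cookie_names}")
    return lines


# Usage with requests (not executed here; requests follows GET redirects by default):
#   import requests
#   response = requests.get("https://sports.betway.be/", timeout=10)
#   print("\n".join(describe_redirect_chain(response)))
```

A 302 hop that sets a cookie, followed by a request to the same host, would match the pattern described above. A challenge that runs in JavaScript will not show up this way, because requests does not execute scripts.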

Many times you must accept their cookies and return them in your request headers.

In general, if I can turn off my browser's JavaScript and still get to the data I want, I know I can scrape the site.

The site you are trying to scrape cannot be navigated without JavaScript, so you need to go directly to the page that has what you want. You may first need to visit the index page to get the cookies that may be needed to enter the page you targeted.
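
In Python requests, this "visit the page first, then reuse its cookies" flow can be sketched with a `requests.Session`, which stores cookies from each response and sends them back on later requests. The two URLs come from the question; the function names `build_session` and `fetch_events` and the choice of headers are assumptions made for illustration. There is no guarantee that this gets past the 403, since the site may also rely on JavaScript challenges or the handshake profiling mentioned above.

```python
import requests

# Both URLs are taken from the question; everything else here is illustrative.
PAGE_URL = "https://sports.betway.be/nl/sports/grp/soccer/belgium/first-division-a"
API_URL = "https://sports.betway.be/api/Events/v2/GetEvents"


def build_session(user_agent):
    """Create a session that keeps cookies and sends browser-like default headers."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": user_agent,
        "Accept-Language": "nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7",
    })
    return session


def fetch_events(session, payload, timeout=10.0):
    """GET the page first so the server can set cookies, then POST to the API."""
    # Step 1: the plain page load. Cookies set by the server are stored on the session.
    page = session.get(PAGE_URL, timeout=timeout)
    page.raise_for_status()

    # Step 2: the API call. The session sends back the cookies collected in step 1.
    response = session.post(
        API_URL,
        json=payload,
        headers={"Origin": "https://sports.betway.be", "Referer": PAGE_URL},
        timeout=timeout,
    )
    response.raise_for_status()  # raises requests.HTTPError if the server answers 403
    return response.json()
```

A call would look like `fetch_events(build_session(my_user_agent), json_data)`, where `my_user_agent` is the User-Agent string of a real browser and `json_data` is the payload from the question. Compared with pasting cookies copied from the browser, the session always works with freshly issued cookies.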



Sometimes I get lucky and find all the data I want buried as a JSON object in the page's JavaScript.
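
When that happens, the object can often be pulled out of the page source without running any JavaScript. The sketch below uses only the standard library: it looks for a marker string and lets `json.JSONDecoder.raw_decode` parse the first object that follows it. The function name `extract_json_after` and the marker `window.__DATA__` in the usage comment are hypothetical; the real variable name has to be found by reading the page source. It only works when the embedded object is strict JSON (quoted keys, no trailing commas).

```python
import json


def extract_json_after(marker, page_source):
    """Decode the first JSON object that follows `marker`, or return None."""
    position = page_source.find(marker)
    if position == -1:
        return None
    start = page_source.find("{", position + len(marker))
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value starting at `start` and ignores what follows.
        data, _end = json.JSONDecoder().raw_decode(page_source, start)
    except json.JSONDecodeError:
        return None
    return data


# Usage (the marker is a made-up example):
#   data = extract_json_after("window.__DATA__", html_text)
```

Letting the JSON decoder find the end of the object avoids the usual problems with regular expressions, such as braces or semicolons inside string values.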
