There is no need to scrape the HTML or to drive the page with a high-level Selenium setup.
Instead, simulate the underlying XHR request(s) that the page sends to the server; they return the JSON data that is used to fill up the table on the page.
Here's an example using requests:
import requests

url = 'http://stats.nba.com/stats/playergamelog'

# Query parameters as the page sends them in its XHR request
params = {
    'Season': '2013-14',
    'SeasonType': 'Regular Season',
    'LeagueID': '00',
    'PlayerID': '2544',
    'pageNo': '1',
    'rowsPerPage': '100'
}

response = requests.post(url, data=params)
print(response.json())
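One caveat: stats.nba.com has over time become picky about non-browser traffic, so if the request above hangs or gets rejected, sending browser-like headers usually helps. The exact header values below are illustrative assumptions, not an official requirement:

```python
# Browser-like headers for stats.nba.com; the values are illustrative
# assumptions, not an official requirement.
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/124.0 Safari/537.36'),
    'Referer': 'http://stats.nba.com/',
}

# Pass them along with the request:
# response = requests.post(url, data=params, headers=headers)
```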
response.json() is the JSON structure containing the player game logs:
{'parameters': {'LeagueID': '00',
                'PlayerID': 2544,
                'Season': '2013-14',
                'SeasonType': 'Regular Season'},
 'resource': 'playergamelog',
 'resultSets': [{'headers': ['SEASON_ID', 'Player_ID', 'Game_ID', 'GAME_DATE',
                             'MATCHUP', 'WL', 'MIN', 'FGM', 'FGA', 'FG_PCT',
                             'FG3M', 'FG3A', 'FG3_PCT', 'FTM', 'FTA', 'FT_PCT',
                             'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV',
                             'PF', 'PTS', 'PLUS_MINUS', 'VIDEO_AVAILABLE'],
                 'name': 'PlayerGameLog',
                 'rowSet': [['22013', 2544, '0021301192', 'APR 12, 2014',
                             'MIA @ ATL', 'L', 37, 10, 22, 0.455, 3, 7, 0.429,
                             4, 8, 0.5, 3, 5, 8, 5, 0, 1, 3, 2, 27, -13, 1],
                            ['22013', 2544, '0021301180', 'APR 11, 2014',
                             'MIA vs. IND', 'W', 35, 11, 20, 0.55, 2, 4, 0.5,
                             12, 13, 0.923, 1, 5, 6, 1, 1, 1, 2, 1, 36, 13, 1],
                            ['22013', 2544, '0021301167', 'APR 09, 2014',
                             'MIA @ MEM', 'L', 41, 14, 23, 0.609, 3, 5, 0.6,
                             6, 7, 0.857, 1, 5, 6, 5, 2, 0, 5, 1, 37, -8, 1],
                            ...
}
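Since each result set pairs a headers list with a rowSet of value lists, it is convenient to zip them together into one dict per game. A minimal sketch; the data dict below is a trimmed sample of the output above, and in real use you would take it from response.json():

```python
# Trimmed sample of the response shown above (real responses carry the
# full header list); in practice: data = response.json()
data = {
    'resultSets': [{
        'name': 'PlayerGameLog',
        'headers': ['SEASON_ID', 'Player_ID', 'GAME_DATE', 'MATCHUP', 'PTS'],
        'rowSet': [
            ['22013', 2544, 'APR 12, 2014', 'MIA @ ATL', 27],
            ['22013', 2544, 'APR 11, 2014', 'MIA vs. IND', 36],
        ],
    }],
}

# Turn every row into a {column_name: value} dict
result_set = data['resultSets'][0]
games = [dict(zip(result_set['headers'], row)) for row in result_set['rowSet']]

for game in games:
    print(game['GAME_DATE'], game['MATCHUP'], game['PTS'])
```

This prints one line per game, e.g. "APR 12, 2014 MIA @ ATL 27".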
An alternative solution would be to use an NBA API client; see several options here: