2

I'm trying to log in to a job site, scrape it, and send myself a notification whenever certain keywords are found. I think I have correctly traced the XPath for the value of the field "login[iovation]", but I cannot extract the value. Here is what I have done so far to log in:

import requests
from lxml import html

login_url = 'https://www.upwork.com/ab/account-security/login'
# send the same User-Agent and Referer with every request
headers = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5;Windows NT)",
    "Referer": login_url,
}
session_requests = requests.session()

# get the CSRF token and the iovation value from the login form
result = session_requests.get(login_url, headers=headers)
tree = html.fromstring(result.text)
auth_token = list(set(tree.xpath('//*[@name="login[_token]"]/@value')))
auth_iovation = list(set(tree.xpath('//*[@name="login[iovation]"]/@value')))

# create payload (xpath() returns a list, so take the first match if there is one)
payload = {
    "login[username]": "[email protected]",
    "login[password]": "pa$$w0rD",
    "login[_token]": auth_token[0] if auth_token else "",
    "login[iovation]": auth_iovation[0] if auth_iovation else "",
    "login[redir]": "/home",
}

# perform login
scrapeurl = 'https://www.upwork.com/ab/find-work/'  # page to scrape once logged in
result = session_requests.post(login_url, data=payload, headers=headers)
# test the result
print(result.text)

Here is a screenshot of the form data sent when I log in successfully in the browser (image not available).

  • Run the browser without JavaScript and check whether you still see these values. Maybe they are added by JavaScript, and requests doesn't run JavaScript. Commented Nov 13, 2016 at 5:15
  • Thanks furas, when I disabled JavaScript the value of the login[iovation] field disappeared. Commented Nov 13, 2016 at 5:20
  • Can you log in without JavaScript? On this page I saw a file with an interesting name, account-security-ui-combined.js. Maybe it can help you find the code that sets this value. Otherwise you will have to use Selenium to control a browser that runs JavaScript. Commented Nov 13, 2016 at 5:23
  • Nope. I'll try to log in using BeautifulSoup (not sure) or, as you suggested, Selenium. Commented Nov 13, 2016 at 5:30
  • If the page doesn't work in the browser (doesn't log in) without JavaScript, then you could use Selenium because it can be easier. BS needs more work: you have to analyze the JavaScript before you can write a useful Python script. Commented Nov 13, 2016 at 14:56
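
The check suggested in the comments (is the value present before any JavaScript runs?) can also be done from Python by looking only at the HTML the server sends. The following is a minimal sketch using the standard library's html.parser; the sample markup, the InputCollector class and the static_form_fields helper are made up for illustration and are not taken from the real login page.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for every <input> tag in static HTML."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # A missing or empty value attribute is stored as "".
            self.fields[name] = attributes.get("value") or ""


def static_form_fields(page_source):
    """Return the input fields found in the raw page source (no JavaScript run)."""
    parser = InputCollector()
    parser.feed(page_source)
    return parser.fields


# Hypothetical markup standing in for what the server sends before scripts run.
sample = """
<form>
  <input type="hidden" name="login[_token]" value="abc123">
  <input type="hidden" name="login[iovation]" value="">
</form>
"""

fields = static_form_fields(sample)
empty = [name for name, value in fields.items() if not value]
print("fields with no static value:", empty)
```

If a field shows up in this list for the real page source (for example result.text from the question), its value is most likely filled in later by a script, which requests will never execute.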

3 Answers

1

This is because Upwork uses a service called iOvation (https://www.iovation.com/) to reduce fraud. iOvation computes a digital fingerprint of your device/browser, which is sent via the login[iovation] parameter.

If you look at the JavaScript files loaded on the site, you will find two scripts loaded from the iesnare.com domain. This domain, like many others, is owned by iOvation and serves third-party JavaScript that identifies your device/browser.

I think that if you copy the string from a successful login and send it from your Python code, along with all the HTTP headers exactly as the browser sent them (including the User-Agent), you should be okay.
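
A rough sketch of that idea, assuming the fingerprint string has been copied by hand from the browser's network tab after a successful login. The helper names build_login_request and submit_login and all placeholder values are hypothetical, and whether the site accepts a replayed fingerprint is not something this sketch can confirm.

```python
def build_login_request(username, password, csrf_token, iovation_blob,
                        login_url, user_agent):
    """Return (payload, headers) for the login POST."""
    payload = {
        "login[username]": username,
        "login[password]": password,
        "login[_token]": csrf_token,
        # Fingerprint string copied by hand from a successful browser login.
        "login[iovation]": iovation_blob,
        "login[redir]": "/home",
    }
    # Use the same User-Agent as the browser the fingerprint was captured in.
    headers = {"User-Agent": user_agent, "Referer": login_url}
    return payload, headers


def submit_login(session, login_url, payload, headers):
    """POST the form with a requests-style session (anything with .post())."""
    return session.post(login_url, data=payload, headers=headers)


# Placeholder values only; nothing is sent over the network here.
login_url = "https://www.upwork.com/ab/account-security/login"
payload, headers = build_login_request(
    username="user@example.com",
    password="secret",
    csrf_token="TOKEN-FROM-LOGIN-PAGE",
    iovation_blob="STRING-COPIED-FROM-BROWSER",
    login_url=login_url,
    user_agent="USER-AGENT-COPIED-FROM-BROWSER",
)
print(sorted(payload))
```

With the requests library, submit_login(requests.Session(), login_url, payload, headers) would send the form. The CSRF token still has to come from a fresh GET of the login page made with the same session.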

0

Are you sure that the request is returning a 2XX status code?

When I run this code, result = session_requests.get(login_url) returns a 403 status code for me, which means the request for login_url itself is being rejected.
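
One way to catch this early is to check the status code before parsing anything. This is only a sketch: fetch_login_page is a hypothetical helper, and it accepts any requests-style session whose get() returns an object with status_code and text attributes.

```python
def fetch_login_page(session, url, headers=None):
    """GET the page and fail loudly unless the server answers with 2XX."""
    response = session.get(url, headers=headers)
    if not 200 <= response.status_code < 300:
        # A 403 here means the request was rejected before any login attempt.
        raise RuntimeError(
            "GET %s returned status %d" % (url, response.status_code)
        )
    return response.text
```

With the requests library, calling response.raise_for_status() on the response achieves a similar effect by raising requests.HTTPError for 4XX and 5XX responses.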

0

They have an official API now, so there is no need for scraping; just register for API keys.
