I have a web scraping project where one of the pages holds all the content I need as JSON inside a <script> tag.

Here's an example of that <script> tag:

<script>
  window.postData = {}
  window.postData["content"] = [json content]
</script>

I've used HtmlAgilityPack to get to the particular <script> tag, but I'm not sure how to grab just the JSON content from it. I can parse the JSON with JSON.NET or another library, so I'm not worried about that part; I'm stuck on isolating the JSON itself. Is there a JavaScript parsing library I can use for this, or is there another way to accomplish it?

Any help would be greatly appreciated!

1 Answer

Check out Jint, a JavaScript interpreter for .NET:

var postDataJSON = new Engine()
    .Execute("window.postData = {}; window.postData['content'] = [json content]")
    .GetValue("window.postData");
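
A slightly fuller sketch of the same idea, in case it helps. It assumes a Jint version where Execute, GetValue and AsString behave as used here (the API has shifted between releases, so check against the version you install). The scriptText value is a made-up stand-in for whatever node.InnerText returns, and extractedJson is just a name picked for illustration. Since Jint has no browser globals, window is defined before the page script runs, and the value is serialized inside the engine so that it comes back as a plain JSON string.

```csharp
using System;
using Jint;

class Program
{
    static void Main()
    {
        // Hypothetical stand-in for the <script> node's text (node.InnerText).
        string scriptText = @"
            window.postData = {}
            window.postData[""content""] = [{ ""id"": 1, ""title"": ""hello"" }]";

        var engine = new Engine();

        // Jint is not a browser, so window must exist before the page script runs.
        engine.Execute("var window = {};");
        engine.Execute(scriptText);

        // Serialize inside the engine, then read the result back as a .NET string.
        engine.Execute("var extractedJson = JSON.stringify(window.postData['content']);");
        string json = engine.GetValue("extractedJson").AsString();

        Console.WriteLine(json);
    }
}
```

The resulting string can then be handed to JSON.NET as usual. Changing the key inside JSON.stringify picks out a different entry of window.postData.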

7 Comments

My guess is that this would crash, since window will be undefined. Prepending window = {}; to the string should solve that, though. Jint is a cool project, but it's only a JS interpreter; it won't get far with scripts written for a browser that rely on the window object or the DOM. Personally, I would probably have solved this with a regex instead.
@kavun Is this expecting me to know what the [json content] is? What if I don't know the content, only that it's there? I guess I could just take the text of the <script> node with string nodeText = node.InnerText; and then call .Execute(nodeText).GetValue("window.postData");?
@Karl-JohanSjögren I'm not opposed to using a regex, but I'm not that good with them.
@kavun Thank you very much. I'm going to give that a try and see how it goes. Thanks!
@kavun If I had a script with both window.postData["content"] = [json content] and window.postData["content-2"] = [json content], could I look at only one of the items, or would I be stuck with the two combined? Or would I need to post another question for that?
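
For the regex route mentioned in the comments, here is a minimal sketch using only System.Text.RegularExpressions. PostDataExtractor and ExtractJson are hypothetical names, and the pattern assumes each assignment sits on its own line with the whole JSON value on that same line, as in the question's example. It will not cope with JSON that spans several lines or with two assignments on one line. Passing the key lets you pull out "content" or "content-2" individually.

```csharp
using System;
using System.Text.RegularExpressions;

static class PostDataExtractor
{
    // Returns the raw text assigned to window.postData["<key>"], or null if no match.
    // Assumption: the assignment and its entire value are on a single line.
    public static string ExtractJson(string scriptText, string key)
    {
        var pattern =
            @"window\.postData\[\s*[""']" + Regex.Escape(key) + @"[""']\s*\]" + // window.postData["key"]
            @"\s*=\s*(?<json>.+?)\s*;?\s*$";                                    // = value, optional ';'

        // Multiline makes $ match at the end of each line, not just the end of input.
        var match = Regex.Match(scriptText, pattern, RegexOptions.Multiline);
        return match.Success ? match.Groups["json"].Value : null;
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical stand-in for node.InnerText.
        string scriptText =
            "window.postData = {}\n" +
            "window.postData[\"content\"] = [{\"id\": 1}]\n" +
            "window.postData[\"content-2\"] = [{\"id\": 2}]\n";

        Console.WriteLine(PostDataExtractor.ExtractJson(scriptText, "content"));
        Console.WriteLine(PostDataExtractor.ExtractJson(scriptText, "content-2"));
    }
}
```

Because the pattern requires the closing quote right after the key, asking for "content" does not accidentally match the "content-2" line.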