EDIT on 13-11-2022 (DD-MM-YYYY) to clarify things a bit:

I, a human, simply want to read the text contents of a website that happens to be protected by Cloudflare. Yes, I know that such protection is useful for preventing spam bots from doing any harm. BUT I AM A HUMAN who does not even seem to be given the chance to prove my humanity. Reading a website with my text browser is all I want; saving some information, like humans could do, would be even better.

I do not see anything bad or even illegal in simply reading the text contents of a website, like a civilized human. Isn't that the reason why websites offer information in the first place?

Hello Stack Exchange community!

After some hours of research and trying different things while coding, I now think my best bet is to ask some Linux and programming pros like the ones I am going to find on here.

So, my task is actually very simple. I want to execute a script (e.g. a batch script) that visits a certain website and saves the HTML output to a text file.

The problem with the website: it is protected by Cloudflare, so JavaScript is needed, which isn't supported by lynx.
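For reference, the plain version of the task (no JavaScript execution) can be sketched as follows. This is only a baseline under stated assumptions: Python is used here purely for brevity, and the function name `save_page` and the `User-Agent` string are hypothetical placeholders. It does not solve the JavaScript challenge; against a protected page it would most likely save only the challenge or error page.

```python
import urllib.error
import urllib.request
from pathlib import Path


def save_page(url: str, out_path: str, timeout: float = 30.0) -> int:
    """Fetch `url` without running any JavaScript and write the raw body to `out_path`.

    Returns the number of bytes written. `save_page` is a hypothetical helper
    name used for illustration only.
    """
    # The User-Agent value is a placeholder, not a recommendation.
    request = urllib.request.Request(url, headers={"User-Agent": "save-page-sketch/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        # Error responses (for example 403 or 503) still carry a body;
        # keep it so the returned page can be inspected.
        body = err.read()
    Path(out_path).write_bytes(body)
    return len(body)
```

Calling `save_page("https://example.com/", "page.html")` (placeholder URL) would write whatever the server returns to `page.html`, which makes it easy to check whether the real content or a challenge page came back.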

So, I want to develop a simple solution that uses either Java or Linux tools (e.g. a batch script). It has to be as lightweight as possible, and that's where my headache seems to start.

I came across a list on GitHub that aims to summarize all headless (text) browsers in various programming languages. Most of them, sadly, require around 20 dependencies, which is, in my humble opinion, neither appropriate nor feasible.

Also, throughout my research on Stack Overflow I encountered rather similar problems, like this one: "Couldn't download an URL using curl or wget but it works in browser"

So, there seems to be a solution that uses curl and passes some parameters, which are then used to overcome the JavaScript/Cloudflare obstacles.

But I am afraid I have not been able to get this code to run properly.

This question also seems to summarize my problem really well, but sadly its answers are not useful to me: "Command line tool to use JS-enabled browser to save web page"

Could someone please give me a little tip on where to look next?

Important for my little project: as lightweight as possible, no human user interaction required!

Thank you very much, dear community, for helping me in any way possible! My best regards to you - I am looking forward to hearing from any of you pros :-)

  • Cloudflare's JavaScript challenge is not meant to be bypassed by "bots", scripts, headless browsers, etc. It is a form of verification to ensure that the user is human. And it is a form of security protection, the bypassing of which would be considered illegal (I believe), so I am not sure whether this question is ok to ask.
    – Fanatique
    Commented Nov 9, 2022 at 12:41
  • Thanks for your answer! To avoid any misunderstandings here, I am clarifying things: I, a human, want to access a webpage via my text browser (because graphics, formatting, scripts, etc. are USELESS for me), but it won't let me. So I am looking for a way to access this page with the least overhead possible. Using a graphical browser would basically mean my approach failed completely. That's why I am asking if someone has achieved this or can, at least, think of a way to accomplish this (rather simple) task.
    – Orca37
    Commented Nov 13, 2022 at 13:02
  • On top of that, saving some information while browsing the page with a text browser would be nice. I have no intention of outsmarting a protection to do any harm! First of all, I just want to access a page and read its TEXT contents.
    – Orca37
    Commented Nov 13, 2022 at 13:09
  • You seem to be confused about the definition of a bot. You say "Important about my little project: ... no human user interaction required". That is exactly what defines a bot. So if you are hitting technology designed to stop bots, then they are specifically trying to stop people from doing what you are trying to do. As such, most developers, and some lawyers, will view attempts to bypass those restrictions as hacking. That's because you are trying to trick their server into doing something it's been specifically designed not to do.
    Commented Nov 13, 2022 at 13:54
  • I understand you may be doing this for legitimate reasons. If those restrictions are somehow impacting your rights, say through disability discrimination, then perhaps you have a legal route to force them to help you. But unfortunately, technical routes are likely to be few, if any.
    Commented Nov 13, 2022 at 14:00