EDIT on 13-11-2022 (DD-MM-YYYY) to clarify things a bit:
I, a human, simply want to read the text contents of a website that happens to be protected by Cloudflare. Yes, I know that such protection is useful for preventing spam bots from doing harm. BUT I AM A HUMAN who does not even seem to be given the chance to prove my humanity. Reading the website with my text browser is all I want; saving some of the information, as any human could, would be even better.
I do not see anything bad, let alone illegal, in simply reading the text contents of a website like a civilized human. Isn't that why websites offer information in the first place?
Hello Stack Exchange community!
After some hours of research and trying different things while coding, I now think my best bet is to ask the Linux and programming pros I am going to find here.
So, my task is actually very simple. I want to run a script (e.g. a batch script) that visits a certain website and saves the HTML output to a text file.
The problem with the website: it is protected by Cloudflare, which requires JavaScript, and lynx does not support JavaScript.
So, I want to develop a simple solution that uses either Java or Linux tools (e.g. a batch script). It has to be as lightweight as possible, and that's where my headache starts.
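For comparison, the dependency-free "fetch and save" part of the task can be sketched with nothing but the JDK, assuming Java 11+ (the class name `PageSaver` and its two command-line arguments are illustrative, not from any existing tool). It uses the built-in `java.net.http.HttpClient`. Note that it does not execute JavaScript, so for a page that needs JavaScript it will save, or reject as an error, whatever the server returns instead of the rendered content; it only covers the lightweight baseline, not the JavaScript requirement.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Minimal sketch (assumes Java 11+): download a page with the JDK's built-in
// HttpClient and write the raw HTML to a text file. No external dependencies.
// NOTE: no JavaScript is executed; the file holds exactly what the server sent.
final class PageSaver {

    // Fetches `url` (following normal redirects) and returns the body as a String.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Report non-2xx responses instead of silently saving an error page.
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    // Writes the HTML to `target` as UTF-8, creating or overwriting the file.
    static void save(String html, Path target) throws IOException {
        Files.writeString(target, html, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java PageSaver.java <url> <output-file>");
            System.exit(1);
        }
        save(fetch(args[0]), Path.of(args[1]));
    }
}
```

Usage (single-file launch, no compile step needed on Java 11+): `java PageSaver.java https://example.com page.html`. Running it from cron or a shell script needs no user interaction, but whether the saved file contains the real content depends entirely on whether the site serves it without JavaScript.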
I found a list on GitHub that aims to summarize all headless (text) browsers in various programming languages. Sadly, most of them require around 20 dependencies, which, in my humble opinion, is neither appropriate nor feasible.
Also, during my research on Stack Overflow I came across rather similar problems, for example this one: Couldn't download an URL using curl or wget but it works in browser
There seems to be a solution there that calls curl with some additional parameters, which are then used to get past the JavaScript/Cloudflare obstacles.
But I'm afraid I can't get that code to run properly.
This question also seems to summarize my problem really well, but sadly none of its answers are useful to me: Command line tool to use JS-enabled browser to save web page
Could someone please give me a tip on where to look next?
Important requirements for my little project: as lightweight as possible, and no human user interaction required!
Thank you very much, dear community, for any help you can give! Best regards, and I am looking forward to hearing from any of you pros :-)