Revisions to HTML cleaner in JavaScript - Code Review Stack Exchange

replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 12:40

First of all, you can't parse HTML with regex. Regex is the wrong tool for the job. Your code assumes that the input HTML is well-formed; given broken HTML, it doesn't stand a chance. I suggest you use a parser instead. Writing one isn't trivial, so you might want to look for an existing one.
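To make that concrete, here is a minimal sketch with two regex-based helpers, stripTagsWithRegex and stripCommentsWithRegex. Both are hypothetical: they are similar in spirit to your code but not copied from it. Even on valid HTML they misfire. A regex can't tell whether a > sits inside a quoted attribute value, or whether <!-- appears inside script text (where it is not a comment). A real HTML parser tracks that context.

```javascript
// Hypothetical regex-based cleaners (illustrative, not the reviewed code).
// They treat HTML as flat text, so they can't tell markup from content.
function stripTagsWithRegex(html) {
  // Remove anything that looks like <...>
  return html.replace(/<[^>]+>/g, '');
}

function stripCommentsWithRegex(html) {
  // Remove anything that looks like <!-- ... -->
  return html.replace(/<!--[\s\S]*?-->/g, '');
}

// A '>' inside a quoted attribute ends the "tag" early, so markup leaks into the output
console.log(stripTagsWithRegex('<div title="a > b">text</div>'));
// → ' b">text'

// Comment-like text inside a script string gets "removed", silently changing the code
console.log(stripCommentsWithRegex('<script>var s = "<!-- x -->";</script>'));
// → <script>var s = "";</script>
```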

Also, I saw you use $ and thought it was jQuery until I noticed that your code defines it. Avoid naming it $; give the function a descriptive name instead, so people don't mistake it for jQuery.

Speaking of jQuery, you could in theory use it to strip out HTML: let the browser parse the markup into a DOM, then use jQuery's handy DOM functions:

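// Get the HTML
var html = $('#code').val();

// Parse it into a separate, inert document. The browser's parser repairs broken
// markup and keeps <head>, and nothing in it (scripts, <img onerror>) runs.
var doc = new DOMParser().parseFromString(html, 'text/html');
var DOM = $(doc);

// Collect selectors to remove (head, style, script, form and comments are
// presumably your checkbox elements, as in your code)
var toRemove = [];
if (head.checked) toRemove.push('head');
if (style.checked) toRemove.push('style');
if (script.checked) toRemove.push('script');
if (form.checked) toRemove.push('form');

// .find() searches at every depth; .remove(selector) would only filter the top-level set
if (toRemove.length) DOM.find(toRemove.join(',')).remove();

// Comments aren't elements, so selectors can't match them: filter by node type
if (comments.checked) {
  DOM.find('*').addBack().contents().filter(function () {
    return this.nodeType === Node.COMMENT_NODE;
  }).remove();
}

// Serialize the whole document (.html() would return only the first element's contents)
var cleanHTML = doc.documentElement.outerHTML;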

This is untested, so I can't say how well it would perform.
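Whichever parser you end up using, the cleaning step itself is a recursive walk over the tree the parser produces. Because the walk recurses, it reaches nested elements and comments, not just top-level ones. Below is a minimal sketch. It assumes a hypothetical plain-object node shape (type, tag, children, value) rather than any particular library's output. The names cleanTree and toHTML are illustrative only.

```javascript
// Hypothetical node shape (not any particular library's API):
//   { type: 'element', tag: 'div', children: [...] }
//   { type: 'text', value: '...' }
//   { type: 'comment', value: '...' }

// Return a copy of `node` without elements whose tag is in `tagsToRemove`
// and, if `removeComments` is true, without comment nodes, at any depth.
function cleanTree(node, tagsToRemove, removeComments) {
  if (node.type !== 'element') return node;
  var children = [];
  node.children.forEach(function (child) {
    if (child.type === 'element' && tagsToRemove.indexOf(child.tag) !== -1) return;
    if (child.type === 'comment' && removeComments) return;
    children.push(cleanTree(child, tagsToRemove, removeComments));
  });
  return { type: 'element', tag: node.tag, children: children };
}

// Serialize back to markup (attributes, void elements and escaping omitted for brevity)
function toHTML(node) {
  if (node.type === 'text') return node.value;
  if (node.type === 'comment') return '<!--' + node.value + '-->';
  return '<' + node.tag + '>' + node.children.map(toHTML).join('') + '</' + node.tag + '>';
}

var tree = {
  type: 'element', tag: 'div', children: [
    { type: 'comment', value: ' note ' },
    { type: 'element', tag: 'p', children: [
      { type: 'text', value: 'Hi' },
      { type: 'element', tag: 'script', children: [{ type: 'text', value: 'alert(1)' }] }
    ] }
  ]
};

console.log(toHTML(cleanTree(tree, ['script', 'style'], true)));
// → <div><p>Hi</p></div>
```

The nested script inside the p is removed because the walk visits every level. Passing false as the last argument keeps the comment.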

added 640 characters in body

edited May 14, 2014 at 8:12

Joseph

answered May 14, 2014 at 7:59

Joseph
