First of all, you can't parse HTML with regex. Regex is simply the wrong tool for the job. Your code assumes that the input HTML is well-formed; given broken HTML, it won't stand a chance. I suggest you use a parser instead. Writing one isn't trivial, so you might want to look for an existing one.
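
To illustrate the point, here is a minimal sketch of how a regex-based approach trips over markup that browsers happily accept. The function `stripScriptsWithRegex` and the sample strings are made up for this example; they are not the code from the question.

```javascript
// Hypothetical regex-based stripper (an assumption for illustration,
// not the asker's actual code).
function stripScriptsWithRegex(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
}

// Tidy markup: the pattern matches and the script is removed.
var tidy = '<p>Hi</p><script>alert(1)</script>';
var tidyResult = stripScriptsWithRegex(tidy);

// Sloppy but browser-valid markup: note the space in "</script >".
var sloppy = '<p>Hi</p><script>alert(1)</script ><p>Bye</p>';
var sloppyResult = stripScriptsWithRegex(sloppy);

console.log(tidyResult);              // <p>Hi</p>
console.log(sloppyResult === sloppy); // true: nothing was stripped
```

A browser treats `</script >` as a valid closing tag, so the second input still contains a live script after "cleaning". You could keep patching the pattern, but a real parser already handles these cases.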

Now, I see you use $, and I thought it was jQuery until I saw your code define it. Avoid using $ for your own helper; give the function a descriptive name instead. That way, you don't mislead people into thinking it is jQuery when it isn't.
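
For example, if your $ is just a shortcut for looking up an element by id (I'm guessing here, so treat the names as hypothetical), something like this reads more honestly:

```javascript
// Hypothetical replacement for a home-grown `$` helper.
// `root` is an assumed optional parameter; it defaults to the page's document.
function findElementById(id, root) {
  return (root || document).getElementById(id);
}
```

A call such as `findElementById('code')` says exactly what it does, and nobody will go looking for jQuery methods on the result.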


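Now, speaking of jQuery, you could theoretically use it to strip out HTML: let the browser's DOMParser parse the markup, then use jQuery's handy DOM functions on the result:

// Get the HTML and parse it into a separate, inert document
// (scripts in it do not run, and <head> is preserved)
var html = $('#code').val();
var doc = new DOMParser().parseFromString(html, 'text/html');
var DOM = $(doc);

// Collect selectors to remove
var toRemove = [];
if(head.checked) toRemove.push('head');
if(style.checked) toRemove.push('style');
if(script.checked) toRemove.push('script');
if(form.checked) toRemove.push('form');

// .remove(selector) only filters the matched set itself,
// so find the matching descendants first and then remove them
if(toRemove.length) DOM.find(toRemove.join(',')).remove();

// Comments are nodes with nodeType 8; .contents() returns them, .find() does not
if(comments.checked) {
  DOM.find('*').contents().filter(function() {
    return this.nodeType === 8;
  }).remove();
}

// Serialize the whole document, not just the first element's inner HTML
var cleanHTML = doc.documentElement.outerHTML;

Just a theory; I haven't tested how it would really perform. Note that the doctype and anything outside the html element are not part of the output.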

Joseph