HTML tags - ignore nested

onez

Member
Joined
Mar 10, 2006
Messages
10
Let's say you have HTML tables nested inside each other, but you want to match the whole outer table (ignoring the inner children's end tags). How do you do this?

This doesn't work:
<table(.)*?(<table(.)*?</table>)*(.)*?</table>

Example
sdfsdfsd<table> lskdjfsd<table>dsklfj</table>sldkfj</table>asdfsd
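
To see where the attempt goes wrong, here is a small sketch that runs the same pattern against the example string. It assumes Python's built-in re module, which the post does not name, so other regex flavors may differ in detail. The lazy quantifiers let the match end at the first </table> it reaches, which is the inner one.

```python
import re

# The pattern from the post, unchanged.
pattern = re.compile(r"<table(.)*?(<table(.)*?</table>)*(.)*?</table>")

text = "sdfsdfsd<table> lskdjfsd<table>dsklfj</table>sldkfj</table>asdfsd"

# The optional inner-table group matches zero times, so the last lazy
# (.)*? simply runs until the first closing tag it can find.
attempt = pattern.search(text).group(0)
print(attempt)  # <table> lskdjfsd<table>dsklfj</table>
```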
 
Your example shows the nested tags as being on the same line. If you set your options so that ^ matches the beginning of a line, then this should work:

^[^<]*(<.*>).*
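
A quick sketch of how that pattern behaves, assuming Python's re module with re.MULTILINE standing in for the "^ matches the beginning of a line" option (the variable names are only for illustration). It does capture the outer table when everything sits on one line, but each line is matched separately once the markup spans several lines.

```python
import re

line_pattern = re.compile(r"^[^<]*(<.*>).*", re.MULTILINE)

# Single-line input: the greedy .* runs from the first "<" to the last ">".
text = "sdfsdfsd<table> lskdjfsd<table>dsklfj</table>sldkfj</table>asdfsd"
single_line = line_pattern.search(text).group(1)

# Multi-line input: the dot does not cross newlines, so every line
# produces its own fragment instead of one whole table.
multi = "<table>\n<table>x</table>\n</table>"
multi_line_hits = [m.group(1) for m in line_pattern.finditer(multi)]

print(single_line)
print(multi_line_hits)
```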

Good Luck!
 
Solution

I think this is the solution. I modified it from an expression that handled nested parentheses.

<table>(?>[^<>]+|<table>(?<DEPTH>)|</table>(?<-DEPTH>))*(?(DEPTH)(?!))</table>
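
The (?<DEPTH>) and (?<-DEPTH>) constructs are .NET balancing groups, which Python's built-in re module does not support. As a rough sketch of the same depth-counting idea outside .NET, the code below keeps the counter in ordinary code instead of inside the regex. The names TAG and outermost_tables are made up for this illustration. Unlike the pattern as posted, it also tolerates attributes on the table tag and other tags between the tables, which is an assumption about what real pages need.

```python
import re

# Matches opening and closing table tags; group 1 is "/" for a closing tag.
TAG = re.compile(r"<(/?)table\b[^>]*>", re.IGNORECASE)

def outermost_tables(html):
    """Return each top-level <table>...</table> span, nested tables included."""
    results = []
    depth = 0
    start = None
    for tag in TAG.finditer(html):
        if tag.group(1) == "":        # opening tag: go one level deeper
            if depth == 0:
                start = tag.start()   # remember where the outer table begins
            depth += 1
        elif depth > 0:               # closing tag with a matching opener
            depth -= 1
            if depth == 0:            # back at the top level: table complete
                results.append(html[start:tag.end()])
    return results

example = "sdfsdfsd<table> lskdjfsd<table>dsklfj</table>sldkfj</table>asdfsd"
print(outermost_tables(example))
```

Stray closing tags with no opener are skipped, and an outer table that is never closed is simply not reported.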

Come on man, why would I be searching for HTML tables on one line?
 
It seems like (<table>)(.*)(</table>) should work if you set the dot to match newlines. Since the regex is greedy, it should then grab everything up to the last </table>. I added the parentheses in case you only need to work with the content inside the outer tags.
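
A short sketch of the greedy approach, assuming Python's re module with re.DOTALL as the "dot matches newlines" option. It handles the nested example, but because the greedy .* runs to the last </table> in the input, two separate tables on the same page come back as one match.

```python
import re

greedy = re.compile(r"(<table>)(.*)(</table>)", re.DOTALL)

# Nested tables spread over several lines: the whole outer table is matched.
nested = "sdfsdfsd<table> lskdjfsd\n<table>dsklfj</table>sldkfj\n</table>asdfsd"
m = greedy.search(nested)
whole = m.group(0)   # outer table including its tags
inner = m.group(2)   # only the content between the outer tags

# Two sibling tables: the match overshoots and swallows the text between them.
two_tables = "<table>a</table> text between <table>b</table>"
overshoot = greedy.search(two_tables).group(0)

print(whole)
print(overshoot)
```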

As an aside, have you ever been to www.RegExBuddy.com? It's a fabulous tool for $30 US, and it is exceedingly rare for email support to take more than 24 hours.
 
Get the page's HTML and then use an XML parser instead of regular expressions. I've done it before in PHP and it worked pretty well :P
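
The post used PHP and an XML parser. As a rough equivalent, here is a sketch with Python's standard html.parser module, which is a swapped-in choice and not what the poster used. The class OuterTableFinder and the function outer_tables_parsed are hypothetical names for this example. The parser reports where each tag starts, and the code slices the original text from the outer opening tag to the end of its matching closing tag.

```python
from html.parser import HTMLParser

class OuterTableFinder(HTMLParser):
    """Collect the source text of every top-level <table> element."""

    def __init__(self, html):
        super().__init__()
        self.html = html
        # Absolute index at which each line starts, so the (line, column)
        # pair from getpos() can be turned into a plain string offset.
        self.line_starts = [0]
        for i, ch in enumerate(html):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.depth = 0
        self.start = None
        self.tables = []

    def _index(self):
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth == 0:
                self.start = self._index()   # start of the outer table
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The slice ends just past the ">" of this closing tag.
                end = self.html.index(">", self._index()) + 1
                self.tables.append(self.html[self.start:end])

def outer_tables_parsed(html):
    finder = OuterTableFinder(html)
    finder.feed(html)
    finder.close()
    return finder.tables

page = "a\n<table>\n<tr><td><table>x</table></td></tr>\n</table>\nb"
print(outer_tables_parsed(page))
```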
 