Tag Content Extractor


    This assumes the content can't contain '<'. But what if the content contains math, such as <tag>3 <= x</tag>? Then group 2 won't pick it up.

    I tried to solve it with that in mind, but got stuck trying to insert the "/" into an arbitrary number of opening tags: a backreference such as "\1" can only repeat the captured text exactly, so we can't insert characters into it.

    <([^<>]+)>(.*)<\/\1> works fine for a single pair of tags, but what if we have two or more valid opening tags:
    <h1><h2>x<=3</h1></h2>
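
    A quick way to see what that pattern does with both inputs is to run it. This is only a sketch in Python (the thread doesn't name a language, so that choice is an assumption), and first_match is a hypothetical helper name:

```python
import re

# The pattern from the comment above. "\/" is written as "/" here because
# Python does not need the slash escaped.
PATTERN = re.compile(r"<([^<>]+)>(.*)</\1>")

def first_match(line):
    """Return (tag, content) for the first match, or None if nothing matches."""
    m = PATTERN.search(line)
    return (m.group(1), m.group(2)) if m else None

print(first_match("<tag>3 <= x</tag>"))       # ('tag', '3 <= x')
print(first_match("<h1><h2>x<=3</h1></h2>"))  # ('h1', '<h2>x<=3')
```

    The single-tag case keeps the '<' in the content, but in the two-tag case the second opening tag ends up inside group 2 and the trailing </h2> is left unmatched.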

    If we extend group 1 to include < and > and quantify it so that it captures the whole run of opening tags, how do we insert the "/" into EACH tag when we reference the group again for the closing tags?
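
    One possible workaround, since a backreference can't add characters, is to do the insertion outside the regex: match the run of opening tags, build the closing run in code, then look for it. The sketch below is in Python (an assumption, the thread doesn't name a language); extract_content and OPENING_RUN are hypothetical names, and it assumes the opening tags sit at the start of the line and are closed in the same order, as in the example above.

```python
import re

# One or more opening tags at the start of the line, e.g. "<h1><h2>".
OPENING_RUN = re.compile(r"(?:<[^<>/]+>)+")

def extract_content(line):
    """Return the text between a run of opening tags and the same run
    closed in the same order, or None if the line does not have that shape."""
    m = OPENING_RUN.match(line)
    if m is None:
        return None
    opening = m.group(0)
    # Build the closing run in code: "<h1><h2>" becomes "</h1></h2>".
    closing = opening.replace("<", "</")
    rest = line[m.end():]
    end = rest.find(closing)
    # end == -1: closing run not found; end == 0: empty content.
    return rest[:end] if end > 0 else None

print(extract_content("<h1><h2>x<=3</h1></h2>"))  # x<=3
print(extract_content("<tag>3 <= x</tag>"))       # 3 <= x
```

    Note that this returns None for properly nested input such as <h1><h2>x<=3</h2></h1>, because the closing run is built in the same order as the opening run. Handling that case would mean reversing the list of tag names before building the closing run.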