import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/* Solution assumes we can't have the symbol "<" as text between tags */
public class Solution {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        int testCases = Integer.parseInt(scan.nextLine());
        while (testCases-- > 0) {
            String line = scan.nextLine();

            boolean matchFound = false;
            Pattern r = Pattern.compile("<(.+)>([^<]+)</\\1>");
            Matcher m = r.matcher(line);

            while (m.find()) {
                System.out.println(m.group(2));
                matchFound = true;
            }
            if (!matchFound) {
                System.out.println("None");
            }
        }
    }
}
Let me try to explain the regular expression:
<(.+)>
matches an HTML start tag. The parentheses capture the text between the angle brackets (the tag name) into Group #1.
([^<]+)
matches all the text between the HTML start and end tags. We place a special restriction on this text: it can't contain the "<" symbol. The characters matched inside the parentheses are saved into Group #2.
</\\1>
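matches the HTML end tag that corresponds to the preceding start tag. The \1 is a backreference: it matches exactly the text captured by Group #1. (The backslash is doubled as \\1 because it appears inside a Java string literal.)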
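To see the three parts of the pattern working together, here is a minimal sketch that applies the same regular expression to a few sample lines. The class name `TagExtractDemo`, the helper `extract`, and the input strings are my own choices for illustration and are not part of the solution above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractDemo {
    // Same pattern as in the solution: start tag, text without "<", matching end tag.
    private static final Pattern TAG = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: collects Group #2 of every match found in the line.
    static List<String> extract(String line) {
        List<String> contents = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        // Simple case: one well-formed tag pair.
        System.out.println(extract("<h1>Nayeem loves counseling</h1>"));
        // Nested and sequential tags: only the innermost text of each pair is captured.
        System.out.println(extract("<h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par>"));
        // The backreference is case-sensitive, so "Amee" and "amee" do not match.
        System.out.println(extract("<Amee>safat codes like a ninja</amee>"));
    }
}
```

The first line yields one match, the second yields two ("Sanjay has no watch" and "So wait for a while"), and the third yields an empty list, which is the case where the solution prints "None".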
Tag Content Extractor
Java solution - passes 100% of test cases
From my HackerRank solutions.