Tag Content Extractor
RodneyShag + 0 comments Java solution - passes 100% of test cases
From my HackerRank solutions.
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/* Solution assumes we can't have the symbol "<" as text between tags */
public class Solution {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        int testCases = Integer.parseInt(scan.nextLine());
        while (testCases-- > 0) {
            String line = scan.nextLine();
            boolean matchFound = false;
            Pattern r = Pattern.compile("<(.+)>([^<]+)</\\1>");
            Matcher m = r.matcher(line);
            while (m.find()) {
                System.out.println(m.group(2));
                matchFound = true;
            }
            if (!matchFound) {
                System.out.println("None");
            }
        }
    }
}
Let me try to explain the regular expression:
<(.+)>
matches HTML start tags. The parentheses save the contents inside the brackets into Group #1.
([^<]+)
matches all the text between the HTML start and end tags. We place a special restriction on this text: it can't contain the "<" symbol, and because of the "+" it must be at least one character long. The characters matched inside the parentheses are saved into Group #2.
</\\1>
matches the HTML end tag that corresponds to the preceding start tag. The backreference \1 (written \\1 inside a Java string literal) matches exactly the text that was captured in Group #1.
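To make the roles of the two groups concrete, here is a minimal sketch that runs the same pattern on a few sample lines. The class name TagGroupsDemo, the helper extract, and the sample inputs are made up for illustration; only the regular expression comes from the solution above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagGroupsDemo {
    // Same pattern as in the solution above. In a Java string literal,
    // "\\1" is the regex backreference \1 (the text captured by Group #1).
    private static final Pattern TAG = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: collects Group #2 (the tag content) of every match.
    static List<String> extract(String line) {
        List<String> contents = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        // Start and end tag names agree, so Group #2 is reported.
        System.out.println(extract("<h1>Hello World</h1>"));   // [Hello World]
        // The backreference fails: "h2" is not the text captured as "h1".
        System.out.println(extract("<h1>Mismatched</h2>"));    // []
        // Only the innermost pair has content without "<".
        System.out.println(extract("<h1><a>nested</a></h1>")); // [nested]
    }
}
```

The nested case shows why excluding "<" from Group #2 matters: the outer h1 pair cannot match because its content starts with another tag, so only the inner pair is reported.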
sunrav8586 + 0 comments
int count = 0;
Pattern r = Pattern.compile("<(.+?)>([^<>]+)</\\1>");
Matcher m = r.matcher(line);
while (m.find()) {
    if (m.group(2).length() != 0) {
        System.out.println(m.group(2));
        count++;
    }
}
if (count == 0)
    System.out.println("None");
mschonaker + 0 comments Is a goal of this site to promote good programming practices? It should: http://stackoverflow.com/a/1732454/368544
fnhckr + 0 comments I don't like this challenge. It has at least three (!) unstated constraints on the input:
- the content cannot be an empty string
- the tag name cannot be an empty string
- tags themselves cannot be content in all cases
To make clear what I mean, let me show you what the outputs should look like according to the challenge rules.
Empty content:
input:
<a></a>
output:
So an empty line instead of None.
Empty tag name:
input:
<>abc</>
output:
abc
So abc instead of None.
Tags as content:
input:
<a>...</a>...</a>
output:
...
...</a>...
So two lines instead of one. In the first line, the first </a> is interpreted as the closing tag. In the second line, the first </a> is interpreted as part of the content and the second as the closing tag.
Please repair this challenge.
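For comparison, here is a small sketch of what a regex-based solution prints for these three inputs. It assumes the pattern from RodneyShag's solution in this thread; the class name EdgeCaseDemo and the helper extractOrNone are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EdgeCaseDemo {
    // Assumption: the pattern from RodneyShag's solution in this thread.
    private static final Pattern TAG = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: returns the lines that solution would print.
    static List<String> extractOrNone(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            out.add(m.group(2));
        }
        if (out.isEmpty()) {
            out.add("None");
        }
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = { "<a></a>", "<>abc</>", "<a>...</a>...</a>" };
        for (String input : inputs) {
            System.out.println(input + " -> " + extractOrNone(input));
        }
        // <a></a> -> [None]            ("+" in Group #2 rejects empty content)
        // <>abc</> -> [None]           ("+" in Group #1 rejects an empty tag name)
        // <a>...</a>...</a> -> [...]   (content may not contain "<", so one line only)
    }
}
```

So the pattern builds in exactly the three constraints listed above: non-empty content, non-empty tag name, and no tags inside the content.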
rohit_ntil + 0 comments This is my solution. It clears all test cases.
String pattern = "\\<(.+)\\>([^\\<\\>]+)\\<\\/\\1\\>";
int count = 0;
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(line);
while (m.find()) {
    System.out.println(m.group(2));
    count++;
}
if (count == 0) {
    System.out.println("None");
}