As the problem statement says: you will fail the challenge if you modify anything other than the three locations that the comments direct you to complete.
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {
    public static void main(String[] args) {
        String regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());

        while (numSentences-- > 0) {
            String input = in.nextLine();
            Matcher m = p.matcher(input);

            // Check for subsequences of input that match the compiled pattern
            while (m.find()) {
                input = input.replaceAll(m.group(), m.group(1));
            }

            // Prints the modified sentence.
            System.out.println(input);
        }

        in.close();
    }
}
Regex
I used this regular expression: "\b(\w+)(?:\W+\1\b)+"
When using this regular expression in Java, we have to "escape" the backslash characters with additional backslashes (as done in the code above).
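To make the escaping concrete, here is a small sketch (the class name EscapeDemo is just for illustration) showing that the doubled backslashes exist only in the Java source. The string handed to the regex engine is the single-backslash expression quoted above.

```java
final class EscapeDemo {
    // In Java source, every backslash in the regex is written twice.
    static final String REGEX = "\\b(\\w+)(?:\\W+\\1\\b)+";

    public static void main(String[] args) {
        // The actual string contents use single backslashes.
        System.out.println(REGEX);          // prints \b(\w+)(?:\W+\1\b)+
        System.out.println(REGEX.length()); // prints 19
    }
}
```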
\w ----> A word character: [a-zA-Z_0-9]
\W ----> A non-word character: [^\w]
\b ----> A word boundary
\1 ----> Matches whatever was matched in the 1st group of parentheses, which in this case is the (\w+)
+ ----> Match whatever it's placed after 1 or more times
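The \b boundaries are needed for special cases such as "Bob and Andy": without the trailing \b, "and" followed by the "And" at the start of "Andy" would count as a repeat (the match is case-insensitive). Another special case is "My thesis is great": without the leading \b, the "is" at the end of "thesis" followed by the word "is" would count as a repeat.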
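The two special cases can be checked with a short sketch. The variant without \b is hypothetical and exists only for comparison; the class and method names are mine, not part of the challenge.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryDemo {
    // The pattern from this post.
    static final Pattern WITH_BOUNDARIES =
            Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);
    // Hypothetical variant with both \b removed, for comparison only.
    static final Pattern WITHOUT_BOUNDARIES =
            Pattern.compile("(\\w+)(?:\\W+\\1)+", Pattern.CASE_INSENSITIVE);

    // Returns the first match of the pattern in the text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Bob and Andy", "My thesis is great"}) {
            System.out.println(s
                    + " -> with \\b: " + firstMatch(WITH_BOUNDARIES, s)
                    + ", without \\b: " + firstMatch(WITHOUT_BOUNDARIES, s));
        }
    }
}
```

With the boundaries in place neither sentence produces a match. Without them, the first sentence yields "and And" and the second yields "is is", which are exactly the false positives described above.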
Groups
input=input.replaceAll(m.group(),m.group(1))
The line of code above replaces the entire match with the first group in the match.
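One thing to be aware of: String.replaceAll treats its first argument as a regular expression, so a match containing punctuation such as "what? what" gets reinterpreted as a pattern and may not replace what you expect. Outside the constraints of the challenge (which only lets you edit three locations), a sketch of an alternative is to let the Matcher do the replacement, using "$1" to refer to group 1. The class and method names below are illustrative assumptions.

```java
import java.util.regex.Pattern;

final class ReplaceDemo {
    static final Pattern DUPLICATES =
            Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Collapses each run of repeated words to its first occurrence.
    // "$1" in the replacement string refers to capturing group 1.
    static String removeDuplicates(String sentence) {
        return DUPLICATES.matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("Goodbye bye bye world world world")); // Goodbye bye world
        System.out.println(removeDuplicates("Hello hello Ab aB"));                 // Hello Ab
    }
}
```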
m.group() is the entire match
m.group(i) is the text captured by the ith capturing group. So m.group(1) is whatever matched inside the 1st set of parentheses, which here is (\w+)
The ?: makes the second set of parentheses a "non-capturing group" (meaning it gets no group number, so you can't call group(i) to retrieve it), for slightly faster performance.
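Here is a small sketch of what the group calls return for one of the sample sentences. The helper name describeFirstMatch is made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    static final Pattern DUPLICATES =
            Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Describes the first match as "whole match|group 1|number of capturing groups".
    static String describeFirstMatch(String text) {
        Matcher m = DUPLICATES.matcher(text);
        if (!m.find()) {
            return "no match";
        }
        return m.group() + "|" + m.group(1) + "|" + m.groupCount();
    }

    public static void main(String[] args) {
        // prints bye bye|bye|1
        System.out.println(describeFirstMatch("Goodbye bye bye world world world"));
    }
}
```

The group count is 1 because the (?:...) group is not numbered; only (\w+) is available through group(1).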
Hope this helps.
10/20/18 - Looks like the problem statement changed a bit, and digits should no longer be in the regular expression. User @4godspeed has an updated solution that may work.
Java Regex 2 - Duplicate Words
Java solution - passes 100% of test cases
Regular Expression Reference
I also found this Regex Matcher Tutorial helpful.
From my HackerRank Java solutions.