Java Regex 2 - Duplicate Words Discussions | Java

Java solution - passes 100% of test cases

As the problem statement says, you will fail the challenge if you modify anything other than the three locations that the comments direct you to complete.

Regular Expression Reference

I also found this Regex Matcher Tutorial helpful.

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {

    public static void main(String[] args) {

        String regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());
        
        while (numSentences-- > 0) {
            String input = in.nextLine();
            
            Matcher m = p.matcher(input);
            
            // Check for subsequences of input that match the compiled pattern
            while (m.find()) {
                input = input.replaceAll(m.group(), m.group(1));
            }
            
            // Prints the modified sentence.
            System.out.println(input);
        }
        
        in.close();
    }
}

Regex

I used this regular expression: "\b(\w+)(?:\W+\1\b)+"

When writing this regular expression as a Java string literal, we have to "escape" each backslash with an additional backslash (as done in the code above).

\w ----> A word character: [a-zA-Z_0-9]
\W ----> A non-word character: [^\w]
\b ----> A word boundary
\1 ----> Matches whatever was matched in the 1st group of parentheses, which in this case is the (\w+)
+ ----> Match whatever it's placed after 1 or more times

The \b boundaries are needed for special cases. In "Bob and Andy", the trailing \b stops "and" from matching the start of "Andy" (the match is case-insensitive). In "My thesis is great", the leading \b stops the "is" at the end of "thesis" from pairing with the "is" that follows it.
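To see what the boundaries buy us, here is a small sketch that runs both special cases through the pattern with and without \b. The class name BoundaryDemo and the helper collapse are made up for this illustration, and the helper uses Matcher.replaceAll("$1") rather than the challenge's template, so treat it as a demo and not as a submission.

```java
import java.util.regex.Pattern;

class BoundaryDemo {

    // Hypothetical helper (not part of the challenge template):
    // replaces every match of the given regex with its first capturing group.
    static String collapse(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .replaceAll("$1");
    }

    public static void main(String[] args) {
        String withBoundaries = "\\b(\\w+)(?:\\W+\\1\\b)+";
        String withoutBoundaries = "(\\w+)(?:\\W+\\1)+";

        // With \b, neither sentence is touched.
        System.out.println(collapse(withBoundaries, "Bob and Andy"));          // Bob and Andy
        System.out.println(collapse(withBoundaries, "My thesis is great"));    // My thesis is great

        // Without \b, "and" pairs with the start of "Andy",
        // and the "is" inside "thesis" pairs with the next "is".
        System.out.println(collapse(withoutBoundaries, "Bob and Andy"));       // Bob andy
        System.out.println(collapse(withoutBoundaries, "My thesis is great")); // My thesis great
    }
}
```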

Groups

input = input.replaceAll(m.group(), m.group(1))

The line of code above replaces the entire match with the first group in the match.
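One caveat with that line: String.replaceAll treats its first argument as a regular expression and its second as a replacement template, so a match containing punctuation such as "(" could be misread as regex syntax. The sketch below is one possible way to guard against that while still only filling in the two replaceAll arguments. It wraps them in Pattern.quote and Matcher.quoteReplacement. The class name QuotedReplaceDemo and the method dedupe are made up for this example, and I have not checked this variant against the challenge's test cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedReplaceDemo {

    // Hypothetical wrapper around the loop from the solution above.
    static String dedupe(String input) {
        Pattern p = Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);

        while (m.find()) {
            // Pattern.quote makes the matched text a literal pattern;
            // Matcher.quoteReplacement makes the replacement a literal string.
            input = input.replaceAll(Pattern.quote(m.group()),
                                     Matcher.quoteReplacement(m.group(1)));
        }
        return input;
    }

    public static void main(String[] args) {
        System.out.println(dedupe("I love Love to To tO code")); // I love to code
        System.out.println(dedupe("hello (hello world"));        // hello world
    }
}
```

In the second example the match is "hello (hello", which would not be a valid regular expression on its own because of the unclosed parenthesis.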

m.group() is the entire match
m.group(i) is the text captured by the ith capturing group. So m.group(1) is whatever the 1st set of parentheses, (\w+), matched

The ?: makes the second set of parentheses a "non-capturing group": it is not numbered, so you can't retrieve it with m.group(n). This can give slightly faster performance.
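A quick way to check what each group holds is to print them for the first match. This is only a sketch: the class name GroupDemo and the method describe are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {

    // Hypothetical helper: reports the groups of the first match, if any.
    static String describe(String input) {
        Pattern p = Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        // groupCount() counts capturing groups only, so the (?: ... ) group is not included.
        return "group()=" + m.group()
             + ", group(1)=" + m.group(1)
             + ", groupCount=" + m.groupCount();
    }

    public static void main(String[] args) {
        // prints: group()=love Love, group(1)=love, groupCount=1
        System.out.println(describe("I love Love to To tO code"));
    }
}
```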

Hope this helps.

10/20/18 - Looks like the problem statement changed a bit, and digits should no longer be in the regular expression. User @4godspeed has an updated solution that may work.

From my HackerRank Java solutions.
