Java Regex 2 - Duplicate Words


    I had to wrestle a lot with this one, and the solutions using non-capturing groups didn't make sense to me, so I copied the demo text to https://regex101.com/ and went step by step.

    Here is the breakdown:

    1) we need to match a repeated word, so we begin by capturing one word:

    "(\w+)"

    2) we need the word to repeat; (\1) alone won't match because there is a space between the repeats, so we add \W in front of it:

    "(\w+)(\W\1)"

    3) we have multiple repeats so add + to the second group:

    "(\w+)(\W\1)+"

    4) we are close, but we need to delimit the beginning so the end of "Goodbye" doesn't count as a repeat of the word after it; \b works inside or outside of the first group:

    "\b(\w+)(\W\1)+" or "(\b\w+)(\W\1)+"

    5) we need to delimit the end so the start of "inthe" doesn't count as a repeat of the word before it; again, \b works either inside or outside of group 2:

    "(\b\w+)(\W\1\b)+" or "\b(\w+)(\W\1)+\b"

    6) just to fix the final line, we make the match case insensitive (the i flag on regex101, or Pattern.CASE_INSENSITIVE in Java).
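
    To check the steps outside regex101, here is a minimal Java sketch. The class name DuplicateWords and the helpers dedupe and finds are made up for illustration, and the test strings are short examples rather than the full demo text. Note that every backslash in the regex has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class DuplicateWords {
    // Pattern from step 5, compiled case-insensitively as in step 6.
    static final Pattern REPEATS =
            Pattern.compile("\\b(\\w+)(\\W\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Replace each run of repeated words with its first occurrence (group 1).
    static String dedupe(String line) {
        Matcher m = REPEATS.matcher(line);
        return m.replaceAll("$1");
    }

    // True if the regex finds a match anywhere in the text (case sensitive).
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Step 3, no boundaries: "bye" inside "Goodbye" pairs with the next "bye".
        System.out.println(finds("(\\w+)(\\W\\1)+", "Goodbye bye"));       // prints true
        // Step 4, leading \b: that false match goes away.
        System.out.println(finds("\\b(\\w+)(\\W\\1)+", "Goodbye bye"));    // prints false
        // Step 4 still lets "in" pair with the start of "inthe".
        System.out.println(finds("\\b(\\w+)(\\W\\1)+", "in inthe"));       // prints true
        // Step 5, trailing \b: fixed.
        System.out.println(finds("\\b(\\w+)(\\W\\1\\b)+", "in inthe"));    // prints false
        // Step 6, case-insensitive replace.
        System.out.println(dedupe("Hello hello Ab aB"));                   // prints Hello Ab
    }
}
```

    Replacing with "$1" keeps the first occurrence as it was written, so "Hello hello" becomes "Hello" and not "hello".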

    There are probably better ways to do it, but I think this solution is easy to understand.

    I had images linked, but the site thinks they are spam, so go copy the text and paste the steps one by one.