We use cookies to ensure you have the best browsing experience on our website. Please read our cookie policy for more information about how we use cookies.
Validating Roman Numerals
Validating Roman Numerals
Sort by
recency
|
208 Discussions
|
Please Login in order to post a comment
I went with a solution based off what makes an invalid roman numeral. The numerals M, C, X, and I may be consecutive up to three times or not present at all. If they appear four or more times consecutively, the numeral is invalid. e,g, IIII is an invalid numeral. The numerals D, L, and V may not appear consecutively or not appear at all. If they appear consecutively at all, the numeral is invalid. e.g. DD is an invalid numeral.
My regex was
(?!.*([MCXI])\1{3,})(?!.*([DLV])\2)[IVXLCMD]*
.?!
is a negative lookahead: "if the following pattern does not exist, then match". The pattern is.*([MCXI])\1{3,}
: if at any point in the string (.*
) we see M, C, X, or I ([MCXI]
) consecutively more than four times (\1{3,}
), we don't match. If we don't see this pattern, we do match. The same logic is applied for D, L, and V. What follows after simply matches any amount of characters that use roman numerals.This works because HackerRank failed to create a test case such as
IIIXVI
, which is an incorrect representation ofXIX
.The regex looks a bit hard, but in fact, if you understood the first pattern, you can copy and paste until "I".
The hard thing of this exercice is to implement the "substraction roman number" (IX, CD, XL etc...).
Firstly, we know that at the start, it's possible to have 0 to 3 "M" (for 3000+) So : M{0,3}
Then, this is the most important :
Below "M" we can have : "CM" (=900) or "CD" (=400) or "D" (=500), or "C" (0 to 3)(=100-300)
SO we have to handle these possibilities with "|" (or) :
(CM|CD|D?C{0,3})
The "pattern" is : (biggest_substraction-number|lowest_substraction_number|biggest_digit?lowest|digit{0,3})
It's mean : We can have "CM" or "CD" or "D" else we have 0 to 3 times "C".
It's maybe hard to follow, but I just want to precise that the logic can be repeated after because the pattern is same.
So for number between 99 to 10 we can have : "XC" (=90) or "XL" (=40) or "L" (=50) or "X" (0 to 3)(=10-30)
Who is the bigger substraction number here ? This is "XC", so : (XC|??|???{0,3})
Who is the lowest substraction number here ? (Obviously, the one remains) This is XL, so : (XC|XL|???{0,3})
Who is the biggest roman digit here (between L and X) ? This is "L", so : (XC|XL|L??{0,3})
Finally, who is the lowest roman digit here ? (Obviously, the one remains) This is "X", so : (XC|XL|L?X{0,3})
That's it !
Now you have : M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3}))
And you have to repeat this step for IX, IV, V and I.
I hope I not confused a lot of people. I have understand the exercise after understand this logic lmaooo, so I wanted to share with other people in case it's useful for some of you.
Here the code :
regex_pattern = r"M{0,3}(C?[MD]|D?C{0,3})(X?[LC]|L?X{0,3})(I?[VX]|V?I{1,3})$"`
I didn't like this challenge. I wasn't sure how to approach this at first, then decided I should do it by base-10 number place values. I started with the ones and built up from there. After I thought the regex was about 80% complete, I submitted it to the HackerRank tester. It found a couple of problems, which took a few minutes more to fix.
The regex is not exhaustive, but it passed all the HackerRank tests. I see others have posted more concise ones here, but are they exhaustive? Clearly, I have more to learn about regexes.