We use cookies to ensure you have the best browsing experience on our website. Please read our cookie policy for more information about how we use cookies.
Short version: Yes, my method has collisions. So does the product of unique primes method. And that's okay, and even expected.
Any hashing method comes with the risk of collisions, with the exception of certain special cases where there are only a small number of predetermined possible unique instances. Most hash map implementations have a way to handle these collisions gracefully, with techniques like chaining. The trick is to make it unlikely that two unique objects have the same code, because it's impossible to ensure that unique objects have unique codes. For example, if you have [2^32 + 1] objects, it is guaranteed that you will have at least one collision, because you are mapping your objects onto the 32 bit integers, which has a domain of cardinality 2^32.
Let's look at the product of unique prime numbers method you proposed. (I'm assuming that you meant 'a'->2, 'b'->3, 'c'->5, 'd'->7, 'e'->11, etc.)
The empty string "" has hash code 1, as a base case.
The string "a" has hash code 2, because 1*2=2.
The string "aa" has hash code 4, because 2*2=4.
The string "aaa" has hash code 8, because 4*2=8.
...
The string of 30 'a's has hash code 1073741824, because 536870912*2=1073741824.
The string of 31 'a's has hash code -2147483648, because
1073741824*2=-2147483648.
The string of 32 'a's has hash code 0, because -2147483648*2=0
The string of 33 'a's has hash code 0, because 0*2=0
...
While the products of unique prime numbers method does yield unique numbers when working with the natural numbers, when working with 32 bit integers, (which form a commutative ring) overflows happen and you can have two unique sets of characters with the same hash code.
Addendum: I just noticed that you mentioned "because of the way that you modify the hashcode." Java's method of hashing strings (as lists of characters, not as sets of characters, like I did with my modification) has some good examples of simple hash collisions. For example, "Aa" and "BB" both have hash code 2112.
Cookie support is required to access HackerRank
Seems like cookies are disabled on this browser, please enable them to open this website
Sherlock and Anagrams
You are viewing a single comment's thread. Return to all comments →
Short version: Yes, my method has collisions. So does the product of unique primes method. And that's okay, and even expected.
Any hashing method comes with the risk of collisions, with the exception of certain special cases where there are only a small number of predetermined possible unique instances. Most hash map implementations have a way to handle these collisions gracefully, with techniques like chaining. The trick is to make it unlikely that two unique objects have the same code, because it's impossible to ensure that unique objects have unique codes. For example, if you have [2^32 + 1] objects, it is guaranteed that you will have at least one collision, because you are mapping your objects onto the 32 bit integers, which has a domain of cardinality 2^32.
Let's look at the product of unique prime numbers method you proposed. (I'm assuming that you meant 'a'->2, 'b'->3, 'c'->5, 'd'->7, 'e'->11, etc.) The empty string "" has hash code 1, as a base case. The string "a" has hash code 2, because 1*2=2. The string "aa" has hash code 4, because 2*2=4. The string "aaa" has hash code 8, because 4*2=8. ... The string of 30 'a's has hash code 1073741824, because 536870912*2=1073741824. The string of 31 'a's has hash code -2147483648, because 1073741824*2=-2147483648. The string of 32 'a's has hash code 0, because -2147483648*2=0 The string of 33 'a's has hash code 0, because 0*2=0 ...
While the products of unique prime numbers method does yield unique numbers when working with the natural numbers, when working with 32 bit integers, (which form a commutative ring) overflows happen and you can have two unique sets of characters with the same hash code.
Addendum: I just noticed that you mentioned "because of the way that you modify the hashcode." Java's method of hashing strings (as lists of characters, not as sets of characters, like I did with my modification) has some good examples of simple hash collisions. For example, "Aa" and "BB" both have hash code 2112.