Sherlock and Anagrams

  • + 4 comments

    Hi everyone,

    I spent a while on this problem but arrived at an efficient solution with Python. It takes at most 0.06 seconds for the longer test cases (#4 and #5). I believe its O(N^2). It’s based on the Rabin-Karp algorithm (http://www.wikiwand.com/en/Rabin–Karp_algorithm) for string searching.

    My solution can be found here (link) if it links properly.

    The idea is that you have a hash function that produces the same value for two strings if and only if they are anagrams. So for a given substring length, you compute the hash function for each possible substring, and use a dictionary (actually a defaultdict from Collections) to track how many times you have seen each anagram. Then for each value greater than 1, you add (x choose 2) to the number of pairs.

    You do this procedure for every possible substring length.

    There was an important optimization though. I found the first 26 primes (you could just list them, but I wrote a sieve), and then mapped each character in the string to the corresponding prime. So ‘abcba’ becomes [2, 3, 5, 3, 2]. Then to compute the hash function, you multiply the corresponding primes. (Critically, there won’t ever be collisions this way.) So Hash(‘abc’) -> 2*3*5 = 30. But to compute the hash value of the next string, ‘bcb’, you don’t need to compute this from scratch. Just divide by hash(‘a’) to remove ‘a’ and multiply by hash(‘b’) to add ‘b’. I think this procedure is described on the Rabin-Karp page, and is called a “Rolling Hash“. This is needed for an O(N^2) runtime. There are probably much better hash functions though; this one results in HUGE numbers, like 10^163 sometimes for the larger inputs.