Tries: Contacts

    My first solution was identical and it didn't pass testcase 12:

    from collections import defaultdict
    from sys import stdin
    
    # clean and concise implementation that will fail on testcase 12
    # due to python wasting too much memory on deep chains
    # of characters.
    
    class Node(object):
        def __init__(self):
            self.children = defaultdict(Node)
            self.num_usage = 0
    
    class Trie(object):
        def __init__(self):
            self.root = Node()
            
        def add(self, word):
            node = self.root
            for char in word:
                node = node.children[char]
                node.num_usage += 1
        
        def find(self, partial):
            node = self.root
            for char in partial:
                if char not in node.children:
                    return 0
                node = node.children[char]
            return node.num_usage
    
    n = int(raw_input().strip())
    trie = Trie()
    for a0 in xrange(n):
        op, contact = stdin.readline().strip().split()
        if   op == 'add':
            trie.add(contact)
        elif op == 'find':
            print trie.find(contact)
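
    To get a feel for why the children container per node dominates, here is a quick check of CPython's per-object sizes (a sketch; the exact byte counts vary by interpreter version, so none are hard-coded):

```python
import sys

# Rough idea of the per-node cost in the naive trie above: every Node
# owns its own children container before it stores a single child.
# Exact sizes vary across CPython versions, so no hard numbers here.
empty_dict = sys.getsizeof({})
node_payload = sys.getsizeof({}) + sys.getsizeof(0)  # children dict + usage counter

print("empty children dict:", empty_dict, "bytes")
print("minimal node payload:", node_payload, "bytes")
```

    With ~10^5 words of length up to 21, that per-node overhead multiplies out to the tens of megabytes that trip the grader.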
    

    The problem here is that Python allocates a lot of memory for the children container of each node (in your case it is a list, in my case a defaultdict, etc.). That is why the grader throws an error on that testcase: it exceeds its memory limit.

    The way out for Python solutions is to notice that we add a lot of very long (len == 21), rare (each occurs only once) words, while the partials we search for never exceed 5 characters (this isn't told in the tutorial; I found it empirically by looking into the testcases). It reflects the normal situation with words: the deeper we go into the trie, the lower the frequency of each node. So we are wasting a lot of memory on very deep chains in our trie.

    One possible solution is to store in the node the entire remaining suffix of such a long word, as long as that suffix occurs only once. Only if we see the suffix again do we split it into the usual trie structure. On the example from the task it works like this:

    • We add 'hack'. Our trie becomes: root -> 'h' -> 'ack'
    • We add 'hackerrank'. Our trie becomes: root -> 'h' -> 'a' -> 'c' -> 'k' -> 'errank'
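
    The saving can be counted directly by modeling the two layouts with plain nested dicts (an illustration only, not part of the solution; `count_allocations` is a hypothetical helper):

```python
# Modeling the two layouts for ['hack', 'hackerrank'] with plain nested
# dicts, just to count container allocations. A string leaf stands for a
# stored whole suffix; a dict key stands for one chain node.
naive = {'h': {'a': {'c': {'k': {'e': {'r': {'r': {'a': {'n': {'k': {}}}}}}}}}}}
compressed_after_hack = {'h': 'ack'}
compressed_after_both = {'h': {'a': {'c': {'k': 'errank'}}}}

def count_allocations(t):
    # each dict key is one chain node; a suffix string is a single leaf
    if isinstance(t, str):
        return 1
    return sum(1 + count_allocations(child) for child in t.values())

print(count_allocations(naive))                  # 10: one node per character
print(count_allocations(compressed_after_both))  # 5: 'h','a','c','k' + 'errank'
```

    The longer and rarer the words, the bigger the gap between the two counts.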

    This scheme saves memory by keeping the whole suffix (like 'errank') in one node instead of building a deep chain where every level would have to allocate its own children container. Admittedly it is a crude fix for this task, because the optimization doesn't change the asymptotic memory consumption in O(n) terms. Personally, I think the grader's memory limit should be raised slightly for this task.
    In case you wonder, here is the optimized solution:

    from collections import defaultdict
    from sys import stdin
    
    # the whole split-suffix machinery is only here because
    # rare, very long words blow up memory consumption otherwise
    
    class Node(object):
        def __init__(self):
            self.val = ''
            self.children = defaultdict(Node)
            self.num_usage = 0
        
        @property
        def is_whole_suffix(self):
            return self.num_usage == 1   
        
        @property
        def children_suffix(self):
            assert self.is_whole_suffix         # meaningful only for whole suffixes
            return self.val[1:]                 
        
        def split_suffix(self):
            assert self.is_whole_suffix         # meaningful only for whole suffixes
            child = self.children[self.val[1]]
            child.val = self.val[1:]
            child.num_usage += 1        
            
        def add(self, char):
            if self.is_whole_suffix:            # preserve memory by concatenating
                self.val += char                # characters onto the suffix if
                return self                     # the word occurs only once
            
            child = self.children[char]
            if child.is_whole_suffix and len(child.val) > 1:
                child.split_suffix()
    
            child.val = char
            child.num_usage += 1
            return child
        
        def get(self, char):
            assert self.has(char)               # because defaultdict is used
            return self.children[char]
        
        def has(self, char):
            return char in self.children
        
        def has_same_suffix(self, rest_partial):
            assert self.is_whole_suffix         # meaningful only for whole suffixes
            return self.children_suffix[:len(rest_partial)] == rest_partial
        
    class Trie(object):
        def __init__(self):
            self.root = Node()
            
        def add(self, word):
            node = self.root
            for char in word:
                node = node.add(char)
        
        def find(self, partial):
            node = self.root
            for i, char in enumerate(partial):
                if node.is_whole_suffix:
                    rest_partial = partial[i:]
                    return int(node.has_same_suffix(rest_partial))
                if not node.has(char):
                    return 0
                node = node.get(char)
            return node.num_usage
    
    n = int(raw_input().strip())
    trie = Trie()
    for a0 in xrange(n):
        op, contact = stdin.readline().strip().split()
        if   op == 'add':
            trie.add(contact)
        elif op == 'find':
            print trie.find(contact)
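
    For anyone on Python 3: the solution ports with only mechanical changes (input() for raw_input(), print() as a function, range for xrange). A condensed sketch of the same compressed-suffix trie, exercised on the sample from the task:

```python
from collections import defaultdict

# Condensed Python 3 port of the compressed-suffix trie above; the
# identifiers mirror the original solution, only the I/O idioms differ.

class Node:
    def __init__(self):
        self.val = ''
        self.children = defaultdict(Node)
        self.num_usage = 0

    @property
    def is_whole_suffix(self):
        return self.num_usage == 1

    def split_suffix(self):
        # turn the stored suffix back into a regular child node
        child = self.children[self.val[1]]
        child.val = self.val[1:]
        child.num_usage += 1

    def add(self, char):
        if self.is_whole_suffix:            # word seen once: just append
            self.val += char                # the char to the stored suffix
            return self
        child = self.children[char]
        if child.is_whole_suffix and len(child.val) > 1:
            child.split_suffix()
        child.val = char
        child.num_usage += 1
        return child

class Trie:
    def __init__(self):
        self.root = Node()

    def add(self, word):
        node = self.root
        for char in word:
            node = node.add(char)

    def find(self, partial):
        node = self.root
        for i, char in enumerate(partial):
            if node.is_whole_suffix:        # compare against stored suffix
                rest = partial[i:]
                return int(node.val[1:][:len(rest)] == rest)
            if char not in node.children:
                return 0
            node = node.children[char]
        return node.num_usage

trie = Trie()
trie.add('hack')
trie.add('hackerrank')
print(trie.find('hac'))   # 2
print(trie.find('hak'))   # 0
```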