Tries: Contacts

    My first solution was identical and it didn't pass testcase 12:

    from collections import defaultdict
    from sys import stdin
    
    # clean and concise implementation that will fail on testcase 12
    # due to python wasting too much memory on deep chains
    # of characters.
    
    class Node(object):
        def __init__(self):
            self.children = defaultdict(Node)
            self.num_usage = 0
    
    class Trie(object):
        def __init__(self):
            self.root = Node()
            
        def add(self, word):
            node = self.root
            for char in word:
                node = node.children[char]
                node.num_usage += 1
        
        def find(self, partial):
            node = self.root
            for char in partial:
                if char not in node.children:
                    return 0
                node = node.children[char]
            return node.num_usage
    
    n = int(raw_input().strip())
    trie = Trie()
    for a0 in xrange(n):
        op, contact = stdin.readline().strip().split()
        if   op == 'add':
            trie.add(contact)
        elif op == 'find':
            print trie.find(contact)
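
    To get a feel for why the children container per node dominates, here is a quick check of CPython's per-object sizes (a sketch; the exact byte counts vary by interpreter version, so none are hard-coded):

```python
import sys

# Rough idea of the per-node cost in the naive trie above: every Node
# owns its own children container before it stores a single child.
# Exact sizes vary across CPython versions, so no hard numbers here.
empty_dict = sys.getsizeof({})
node_payload = sys.getsizeof({}) + sys.getsizeof(0)  # children dict + usage counter

print("empty children dict:", empty_dict, "bytes")
print("minimal node payload:", node_payload, "bytes")
```

    With ~10^5 words of length up to 21, that per-node overhead multiplies out to the tens of megabytes that trip the grader.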
    

    The problem here is that Python allocates a lot of memory for the children container of each node (in your case it is a list, in my case a defaultdict, etc.). That is why the grader throws an error on that testcase: it exceeds its memory limit.

    The way out for Python solutions is to notice that we add a lot of very long (len == 21), rare (each occurs only once) words, while the partials we search for never exceed 5 characters (this isn't told in the tutorial; I found it empirically by looking into the testcases). It reflects the normal situation with words: the deeper we go into the trie, the lower the frequency of each node. So we are wasting a lot of memory on very deep chains in our trie.

    One possible solution is to store in the node the entire remaining suffix of such a long word, as long as that suffix occurs only once. Only if we see the suffix again do we split it into the usual trie structure. On the example from the task it works like this:

    • We add 'hack'. Our trie becomes: root -> 'h' -> 'ack'
    • We add 'hackerrank'. Our trie becomes: root -> 'h' -> 'a' -> 'c' -> 'k' -> 'errank'
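
    The saving can be counted directly by modeling the two layouts with plain nested dicts (an illustration only, not part of the solution; `count_allocations` is a hypothetical helper):

```python
# Modeling the two layouts for ['hack', 'hackerrank'] with plain nested
# dicts, just to count container allocations. A string leaf stands for a
# stored whole suffix; a dict key stands for one chain node.
naive = {'h': {'a': {'c': {'k': {'e': {'r': {'r': {'a': {'n': {'k': {}}}}}}}}}}}
compressed_after_hack = {'h': 'ack'}
compressed_after_both = {'h': {'a': {'c': {'k': 'errank'}}}}

def count_allocations(t):
    # each dict key is one chain node; a suffix string is a single leaf
    if isinstance(t, str):
        return 1
    return sum(1 + count_allocations(child) for child in t.values())

print(count_allocations(naive))                  # 10: one node per character
print(count_allocations(compressed_after_both))  # 5: 'h','a','c','k' + 'errank'
```

    The longer and rarer the words, the bigger the gap between the two counts.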

    This scheme saves memory by keeping the whole suffix (like 'errank') in one node instead of building a deep chain where every level would have to allocate its own children container. Admittedly it is a crude fix for this task, because the optimization doesn't change the asymptotic memory consumption in O(n) terms. Personally, I think the grader's memory limit should be raised slightly for this task.
    In case you wonder, here is the optimized solution:

    from collections import defaultdict
    from sys import stdin
    
    # the whole split-suffix machinery is only here because
    # rare, very long words blow up memory consumption otherwise
    
    class Node(object):
        def __init__(self):
            self.val = ''
            self.children = defaultdict(Node)
            self.num_usage = 0
        
        @property
        def is_whole_suffix(self):
            return self.num_usage == 1   
        
        @property
        def children_suffix(self):
            assert self.is_whole_suffix         # meaningful only for whole suffixes
            return self.val[1:]                 
        
        def split_suffix(self):
            assert self.is_whole_suffix         # meaningful only for whole suffixes
            child = self.children[self.val[1]]
            child.val = self.val[1:]
            child.num_usage += 1        
            
        def add(self, char):
            if self.is_whole_suffix:            # preserve memory by concatenating
                self.val += char                # characters onto the suffix if
                return self                     # the word occurs only once
            
            child = self.children[char]
            if child.is_whole_suffix and len(child.val) > 1:
                child.split_suffix()
    
            child.val = char
            child.num_usage += 1
            return child
        
        def get(self, char):
            assert self.has(char)               # because defaultdict is used
            return self.children[char]
        
        def has(self, char):
            return char in self.children
        
        def has_same_suffix(self, rest_partial):
            assert self.is_whole_suffix         # meaningful only for whole suffixes
            return self.children_suffix[:len(rest_partial)] == rest_partial
        
    class Trie(object):
        def __init__(self):
            self.root = Node()
            
        def add(self, word):
            node = self.root
            for char in word:
                node = node.add(char)
        
        def find(self, partial):
            node = self.root
            for i, char in enumerate(partial):
                if node.is_whole_suffix:
                    rest_partial = partial[i:]
                    return int(node.has_same_suffix(rest_partial))
                if not node.has(char):
                    return 0
                node = node.get(char)
            return node.num_usage
    
    n = int(raw_input().strip())
    trie = Trie()
    for a0 in xrange(n):
        op, contact = stdin.readline().strip().split()
        if   op == 'add':
            trie.add(contact)
        elif op == 'find':
            print trie.find(contact)
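
    For anyone on Python 3: the solution ports with only mechanical changes (input() for raw_input(), print() as a function, range for xrange). A condensed sketch of the same compressed-suffix trie, exercised on the sample from the task:

```python
from collections import defaultdict

# Condensed Python 3 port of the compressed-suffix trie above; the
# identifiers mirror the original solution, only the I/O idioms differ.

class Node:
    def __init__(self):
        self.val = ''
        self.children = defaultdict(Node)
        self.num_usage = 0

    @property
    def is_whole_suffix(self):
        return self.num_usage == 1

    def split_suffix(self):
        # turn the stored suffix back into a regular child node
        child = self.children[self.val[1]]
        child.val = self.val[1:]
        child.num_usage += 1

    def add(self, char):
        if self.is_whole_suffix:            # word seen once: just append
            self.val += char                # the char to the stored suffix
            return self
        child = self.children[char]
        if child.is_whole_suffix and len(child.val) > 1:
            child.split_suffix()
        child.val = char
        child.num_usage += 1
        return child

class Trie:
    def __init__(self):
        self.root = Node()

    def add(self, word):
        node = self.root
        for char in word:
            node = node.add(char)

    def find(self, partial):
        node = self.root
        for i, char in enumerate(partial):
            if node.is_whole_suffix:        # compare against stored suffix
                rest = partial[i:]
                return int(node.val[1:][:len(rest)] == rest)
            if char not in node.children:
                return 0
            node = node.children[char]
        return node.num_usage

trie = Trie()
trie.add('hack')
trie.add('hackerrank')
print(trie.find('hac'))   # 2
print(trie.find('hak'))   # 0
```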