Trie key/value store implementation comparing with HashMap

Question

I have implemented a Trie-based concurrent key/value store using hashcode similar to HashMap.
It is something like, if your hashcode is 50 (110010) then create a TRIE with array size of 4 (two binary bits), the first value is 10 which is 2,10- [][][X][] --> 00- [X][][][] --> 11- [][][][X] will be represented.
HERE The first array third element will point to second array; and second array first element will point to third array, finally your entry(key/value pair) will be placed in the third array fourth element. This will be a unique path to access my KeyValuePair through hashcode.

This is on the intention that this Trie won't require rehashing & Masking(shrinking hash to fit within fixed array, which is cause for collision). I am considering this as an alternative to ConcurrentHashMap in Java.

import java.util.concurrent.atomic.AtomicReferenceArray;

public class TrieMap {
    public static int SIZEOFEDGE = 4; 
    public static int OSIZE = 5000;
}

abstract class Node {
    public Node getLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
    public Node createLink(int hash, int level, String key, String val) {
        throw new UnsupportedOperationException();
    }
    public Node removeLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
}

class Vertex extends Node {
    String key;
    volatile String val;
    volatile Vertex next;
    
    public Vertex(String key, String val) {
        this.key = key;
        this.val = val;
    }
    
    @Override
    public boolean equals(Object obj) {
        Vertex v = (Vertex) obj;
        return this.key.equals(v.key);
    }
    
    @Override
    public int hashCode() {
        return key.hashCode();
    }
    
    @Override
    public String toString() {
        return key +"@"+key.hashCode();
    }
}


class Edge extends Node {
    volatile AtomicReferenceArray<Node> array; //This is needed to ensure array elements are volatile
    
    public Edge(int size) {
        array = new AtomicReferenceArray<Node>(8);
    }
    
    
    @Override
    public Node getLink(String key, int hash, int level){
        int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
        Node returnVal = array.get(index);
        for(;;) {
            if(returnVal == null) {
                return null;
            }
            else if((returnVal instanceof Vertex)) {
                Vertex node = (Vertex) returnVal;
                for(;node != null; node = node.next) {
                    if(node.key.equals(key)) {  
                        return node; 
                    }
                } 
                return null;
            } else { //instanceof Edge
                level = level + 1;
                index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
                Edge e = (Edge) returnVal;
                returnVal = e.array.get(index);
            }
        }
    }
    
    @Override
    public Node createLink(int hash, int level, String key, String val) { //Remove size
        for(;;) { //Repeat the work on the current node, since some other thread modified this node
            int index =  Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node nodeAtIndex = array.get(index);
            if ( nodeAtIndex == null) {  
                Vertex newV = new Vertex(key, val);
                boolean result = array.compareAndSet(index, null, newV);
                if(result == Boolean.TRUE) {
                    return newV;
                }
                //continue; since new node is inserted by other thread, hence repeat it.
            } 
            else if(nodeAtIndex instanceof Vertex) {
                Vertex vrtexAtIndex = (Vertex) nodeAtIndex;
                int newIndex = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, vrtexAtIndex.hashCode(), level+1);
                int newIndex1 = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level+1);
                Edge edge = new Edge(Base10ToBaseX.Base.BASE8.getLevelZeroMask()+1);
                if(newIndex != newIndex1) {
                    Vertex newV = new Vertex(key, val);
                    edge.array.set(newIndex, vrtexAtIndex);
                    edge.array.set(newIndex1, newV);
                    boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
                    if(result == Boolean.TRUE) {
                        return newV;
                    }
                   //continue; since vrtexAtIndex may be removed or changed to Edge already.
                } else if(vrtexAtIndex.key.hashCode() == hash) {//vrtex.hash == hash) {       HERE newIndex == newIndex1
                    synchronized (vrtexAtIndex) {   
                        boolean result = array.compareAndSet(index, vrtexAtIndex, vrtexAtIndex); //Double check this vertex is not removed.
                        if(result == Boolean.TRUE) {
                            Vertex prevV = vrtexAtIndex;
                            for(;vrtexAtIndex != null; vrtexAtIndex = vrtexAtIndex.next) {
                                prevV = vrtexAtIndex; // prevV is used to handle when vrtexAtIndex reached NULL
                                if(vrtexAtIndex.key.equals(key)){
                                    vrtexAtIndex.val = val;
                                    return vrtexAtIndex;
                                }
                            } 
                            Vertex newV = new Vertex(key, val);
                            prevV.next = newV; // Within SYNCHRONIZATION since prevV.next may be added with some other.
                            return newV;
                        }
                        //Continue; vrtexAtIndex got changed
                    }
                } else {   //HERE newIndex == newIndex1  BUT vrtex.hash != hash
                    edge.array.set(newIndex, vrtexAtIndex);
                    boolean result = array.compareAndSet(index, vrtexAtIndex, edge); //REPLACE vertex to edge
                    if(result == Boolean.TRUE) {
                        return edge.createLink(hash, (level + 1), key, val);
                    }
                }
            }               
            else {  //instanceof Edge
                return nodeAtIndex.createLink(hash, (level + 1), key, val);
            }
        }
    }
    
    
    @Override
    public Node removeLink(String key, int hash, int level){
        for(;;) {
            int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node returnVal = array.get(index);
            if(returnVal == null) {
                return null;
            }
            else if((returnVal instanceof Vertex)) {
                synchronized (returnVal) {
                    Vertex node = (Vertex) returnVal;
                    if(node.next == null) {
                        if(node.key.equals(key)) {
                            boolean result = array.compareAndSet(index, node, null); 
                            if(result == Boolean.TRUE) {
                                return node;
                            }
                            continue; //Vertex may be changed to Edge
                        }
                        return null;  //Nothing found; This is not the same vertex we are looking for. Here hashcode is same but key is different. 
                    } else {
                        if(node.key.equals(key)) { //Removing the first node in the link
                            boolean result = array.compareAndSet(index, node, node.next);
                            if(result == Boolean.TRUE) {
                                return node;
                            }
                            continue; //Vertex(node) may be changed to Edge, so try again.
                        }
                        Vertex prevV = node; // prevV is used to handle when vrtexAtIndex is found and to be removed from its previous
                        node = node.next;
                        for(;node != null; prevV = node, node = node.next) {
                            if(node.key.equals(key)) {
                                prevV.next = node.next; //Removing other than first node in the link
                                return node; 
                            }
                        } 
                        return null;  //Nothing found in the linked list.
                    }
                }
            } else { //instanceof Edge
                return returnVal.removeLink(key, hash, (level + 1));
            }
        }
    }
    
}



class Base10ToBaseX {
    public static enum Base {
        /**
         * Integer is represented in 32 bit in 32 bit machine.
         * There we can split this integer no of bits into multiples of 1,2,4,8,16 bits
         */
        BASE2(1,1,32), BASE4(3,2,16), BASE8(7,3,11)/* OCTAL*/, /*BASE10(3,2),*/ 
        BASE16(15, 4, 8){       
            public String getFormattedValue(int val){
                switch(val) {
                case 10:
                    return "A";
                case 11:
                    return "B";
                case 12:
                    return "C";
                case 13:
                    return "D";
                case 14:
                    return "E";
                case 15:
                    return "F";
                default:
                    return "" + val;
                }
                
            }
        }, /*BASE32(31,5,1),*/ BASE256(255, 8, 4), /*BASE512(511,9),*/ Base65536(65535, 16, 2);
        
        private int LEVEL_0_MASK;
        private int LEVEL_1_ROTATION;
        private int MAX_ROTATION;
        
        Base(int levelZeroMask, int levelOneRotation, int maxPossibleRotation) {
            this.LEVEL_0_MASK = levelZeroMask;
            this.LEVEL_1_ROTATION = levelOneRotation;
            this.MAX_ROTATION = maxPossibleRotation;
        }
        
        int getLevelZeroMask(){
            return LEVEL_0_MASK;
        }
        int getLevelOneRotation(){
            return LEVEL_1_ROTATION;
        }
        int getMaxRotation(){
            return MAX_ROTATION;
        }
        String getFormattedValue(int val){
            return "" + val;
        }
    }
    
    public static int getBaseXValueOnAtLevel(Base base, int on, int level) {
        if(level > base.getMaxRotation() || level < 1) {
            return 0; //INVALID Input
        }
        int rotation = base.getLevelOneRotation();
        int mask = base.getLevelZeroMask();

        if(level > 1) {
            rotation = (level-1) * rotation;
            mask = mask << rotation;
        } else {
            rotation = 0;
        }
        return (on & mask) >>> rotation;
    }
}

But performance-wise, Trie took a little more milliseconds compare to ConcurrentHashMap. I could't understand why? as I thought it would be better than ConcurrentHashMap. Memory-wise, they both look similar.

Could you please tell me what is needed to improve on my code to perform better than ConcurrentHashMap, or why ConcurrentHashMap is performing better than the Trie implementation?

You can test this code using these:

Update: Improved code change

https://github.com/skanagavelu/trieConcurrentHashMap/blob/master/src/main/java/trie/TrieMap.java

HashMap is not synchronized so the performance difference could be due to that. How you measure performance can also impact the results... — assylias
– assylias, Commented Dec 2, 2015 at 15:32
@assylias I was comparing the performance with ConcurrentHasmap not with HashMap, since my code also implemented in concurrent fashion. It is my mistake on not mentioned it. i will do that now. — Kanagavelu Sugumar
– Kanagavelu Sugumar, Commented Dec 3, 2015 at 4:54
I have updated the above code for iterative approach as well in git. — Kanagavelu Sugumar
– Kanagavelu Sugumar, Commented Dec 4, 2015 at 9:51
@KanagaveluSugumar I have a question: why it should perform better? Your implementation uses synchronized which also block concurrent reads. Instead ConcurrentHashMap supports concurrent reads in each segment and it should have much better performance. I think you can do better than that (for highly mixed read/write batches) using a non blocking collection. — Adriano Repetti
– Adriano Repetti, Commented Dec 4, 2015 at 16:47

TheCoffeeCup · Accepted Answer · 2015-12-08 22:30:07Z

This abstract class:

abstract class Node {
    public Node getLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
    public Node createLink(int hash, int level, String key, String val) {
        throw new UnsupportedOperationException();
    }
    public Node removeLink(String key, int hash, int level){
        throw new UnsupportedOperationException();
    }
}

What's the point of an abstract class if it acts like an interface?

interface Node {

    public Node getLink(String key, int hash, int level);
    public Node createLink(int hash, int level, String key, String val);
    public Node removeLink(String key, int hash, int level);

}

Noting your comment:

What's the point of an abstract class if it acts like an interface? This will avoid adding UnsupportedOperationException method on each subclass which don't implements. In this case Vertex no need to implement it to say i am not supported.

Well, it is often bad practice to do what you are doing there, and that is why List, and other interfaces are interfaces. I have many List implementation that don't implement all the methods (there are at least 20), and I use UnsupportedOperationException there. If a method is unsupported, do it down the hierarchy; don't do it at the base class.

Which means instead of:

class Vertex extends Node {

and:

class Edge extends Node {

You have:

class Vertex implements Node {

and:

class Edge implements Node {

You consistently have:

} else {

and

}
  else

Choose one, and stick to it. The first option is recommended, as it follows standard Java conventions.

else if((returnVal instanceof Vertex)) {

Double parentheses are not necessary, and actually clutter code:

else if (returnVal instanceof Vertex) {

String key;
volatile String val;
volatile Vertex next;

This in its current state is bad practice. A get/set is better than default-level variables:

private String key;
private volatile String val;
private volatile Vertex next;

// ...

public String getKey() {
    return key;
}

public void setKey(String key) {
    this.key = key;
}

// same thing with other variables

Same thing here:

volatile AtomicReferenceArray<Node> array; //This is needed to ensure array elements are volatile

To:

private volatile AtomicReferenceArray<Node> array; //This is needed to ensure array elements are volatile

// ...

public AtomicReferenceArray<Node> getArray() {
    return array;
}

public void setArray(AtomicReferenceArray<Node> array) {
    this.array = array;
}

Here:

}



class Base10ToBaseX {

No need to have so much space between classes:

}

class Base10ToBaseX {

plusOne! Thank you! Nice review. I will take care of all these suggestions. Also is it possible to comment functionally? — Kanagavelu Sugumar
– Kanagavelu Sugumar, Commented Dec 8, 2015 at 12:52
What's the point of an abstract class if it acts like an interface? This will avoid adding UnsupportedOperationException method on each subclass which don't implements. In this case Vertex no need to implement it to say i am not supported. — Kanagavelu Sugumar
– Kanagavelu Sugumar, Commented Dec 8, 2015 at 13:00
@KanagaveluSugumar Noted, and have added some notes to my answer. — TheCoffeeCup
– TheCoffeeCup, Commented Dec 8, 2015 at 22:31

Community · Accepted Answer · 2017-05-23 12:40:57Z

The reason is huge memory is needed and created.

I have followed below improvements.

1) Recursive methodology is changed to Iterative.
2) Variable Size of TRIE is introduced to reduce NULL objects in the TOP hierarchy.
example) Bottom layer with 256 arraysize, then next one will be array size of 8, then next level will be 4 array size. So when TRIE grows, its breadth shrinked.
3) Limitation on depth of the TRIE, since few of the hashcodes having common binary bits, which leads grow in height, and most of the elements fit at the lower level. So made limitation on height of the TRIE. This may cause collision; but i am not reducing it unreasonably

With all this changes i can see some improvement around 10-15 milliseconds better than Java map for 1 lakh objects, but there is a trade off on memory. TRIE takes more memory comparatively. Here Object retrieval is scary fast since the distribution(less collision of objects) is wider here. However Memory consumption is very bad.

This memory is concerning in TWO ways,

1) Run time memory is high compare to Hashmap.
2) Memory creation (using new) is delaying the performance (CPU).

May be i missed one more improvement on getting the better hash. something like below https://stackoverflow.com/questions/2414117/explanation-of-hashmaphashint-method

Stack Exchange Network

Trie key/value store implementation comparing with HashMap

2 Answers 2

You must log in to answer this question.

Hot Network Questions

Trie key/value store implementation comparing with HashMap

2 Answers 2

You must log in to answer this question.

Related

Hot Network Questions