Question regarding performance of HashTable of LinkedList Nodes
I implemented a HashTable whose bucket count is set at construction time; it is simply an array of linked lists sized at runtime.
The issue is that a table with a small number of buckets, where the linked list must be traversed (chains can reach roughly 5K nodes deep), is outperforming a HashTable whose bucket count is three orders of magnitude larger.
int SMALL_BUCKET_SIZE = 10;
int BIG_BUCKET_SIZE = 10000;
HashTable<String, Integer> smallHashTable = new HashTable<>(SMALL_BUCKET_SIZE);
HashTable<String, Integer> bigHashtTable = new HashTable<>(BIG_BUCKET_SIZE);
I would expect the larger HashTable to give O(1) search, while the smaller table, with its higher collision rate, should take more time traversing the linked nodes. Yet my numbers below show the smaller table outperforming the wider one.
Fetch SmallTable: 0.000007
Fetch BigTable: 0.000018
So I decided to loop my HashTable.get a thousand times to account for JIT and JVM optimization. Now I'm starting to see numbers that seem to confirm what I'd expect.
Fetch SmallTable: 0.0000013630
Fetch BigTable: 0.0000002560
My question is about the soundness of my logic, as well as the other moving parts here. I've pasted my test below, along with links to the implementation of the HashTable and the underlying Node structure.
I'm looking for feedback from folks with depth/experience here on the variables that factor into this, such as key length, hash collision rates, bucket density, etc.
HashTableTest.java
@Test
public void canInitializeHashTableWithBucketsForPerformance() throws InterruptedException {
    double smallTableTime, bigTableTime;
    int SMALL_BUCKET_SIZE = 10;
    int BIG_BUCKET_SIZE = 10000;

    HashTable<String, Integer> smallHashTable = new HashTable<>(SMALL_BUCKET_SIZE);
    HashTable<String, Integer> bigHashtTable = new HashTable<>(BIG_BUCKET_SIZE);

    List<String> strings = generateRandomStringKeys(1000);

    strings.forEach(string -> bigHashtTable.put(string, 10));
    strings.forEach(string -> smallHashTable.put(string, 10));

    Consumer<String> bigHashGet = bigHashtTable::get;
    Consumer<String> smallHashGet = smallHashTable::get;

    String theString = strings.get(strings.size() - 1);

    smallTableTime = getElapsedTimeFactoringOutJavaOptimization(theString, smallHashGet);
    bigTableTime = getElapsedTimeFactoringOutJavaOptimization(theString, bigHashGet);

    System.out.println(String.format("Fetch SmallTable: %.10f", smallTableTime));
    System.out.println(String.format("Fetch BigTable: %.10f", bigTableTime));

    assertTrue(smallTableTime > bigTableTime);
}

public double getElapsedTimeFactoringOutJavaOptimization(String s, Consumer<String> aMethod) {
    long start = 0, end = 0;

    for (int i = 0; i < 1000; i++) {
        start = System.nanoTime();
        aMethod.accept(s);
        end = System.nanoTime();
    }

    return (end - start) / 1_000_000_000D;
}

public List<String> generateRandomStringKeys(int numOfRandomKeys) {
    List<String> keys = new ArrayList<>();

    for (int i = 0; i < numOfRandomKeys; i++) {
        byte[] array = new byte[10];
        new Random().nextBytes(array);
        keys.add(new String(array, Charset.forName("UTF-8")));
    }

    return keys;
}
The test can be found here - Github - HashTableTest.java
The implementation can be found here as well - Github - HashTable.java
java data-structures hashtable
asked Nov 8 at 23:36 by mcnichol (589)
Benchmarking an operation like this is going to be harder than just running it 1000 times. (1000 times is nowhere near enough, for starters.)
– Louis Wasserman, Nov 8 at 23:40

You should also average the speed of multiple rounds. Move the System.nanoTime() calls outside of the loop.
– flakes, Nov 8 at 23:43

@flakes Notice that I'm basically dropping all of the System.nanoTime values on the floor except for the last one. Moving them outside the loop would capture the entire time the loop ran, not a single final iteration.
– mcnichol, Nov 8 at 23:59

@LouisWasserman - I increased the loop. It seems to peak out around 40k - 100k iterations. Curious if you typically see higher than that.
– mcnichol, Nov 9 at 0:02

@mcnichol: please define what you mean by "peak out".
– Louis Wasserman, Nov 9 at 0:03
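For concreteness, the change flakes is describing would look roughly like the sketch below: time the whole loop once and divide by the iteration count, rather than keeping only the nanoTime delta of the final iteration. The method name and iteration count are placeholders, and it assumes the same Consumer<String> setup as the test above.

public double getAverageElapsedTimeSeconds(String s, Consumer<String> aMethod) {
    final int iterations = 1_000_000; // well beyond 1000, per the comments above

    // Time the entire loop in one measurement instead of a single iteration
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        aMethod.accept(s);
    }
    long end = System.nanoTime();

    // Average seconds per call
    return ((end - start) / (double) iterations) / 1_000_000_000D;
}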
1 Answer (accepted)
There are many things wrong here, but a handful include:

- Running this operation 1000 times and taking the difference of nanoTime for each of them does not make your benchmark valid. Seriously, use JMH. Or at least run it, like, ten million times.
- Your hash table doesn't actually work any differently for different-sized tables. You use table[getHash(key) % RADIX], which basically means however big the table is, you only use 10 buckets in it and pretend the rest don't exist.
- System.identityHashCode is not a useful hash function, especially on strings, especially when you're hoping to actually find an element that's in there...or not.
- While you're at it, you're not using Node.next as a field, and might as well get rid of it.

answered Nov 8 at 23:44 by Louis Wasserman
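For the first bullet, a minimal JMH benchmark for this lookup could look like the sketch below. It assumes the HashTable class from the question (and that get returns the stored value); the benchmark class name, the @Param values, and the key-generation details are illustrative.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class HashTableBenchmark {

    @Param({"10", "10000"})
    int buckets;

    HashTable<String, Integer> table;
    String key;

    @Setup
    public void setUp() {
        table = new HashTable<>(buckets);

        // Populate with 1000 random 10-byte string keys, as in the original test
        Random random = new Random(42);
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            byte[] bytes = new byte[10];
            random.nextBytes(bytes);
            keys.add(new String(bytes, StandardCharsets.UTF_8));
        }
        keys.forEach(k -> table.put(k, 10));

        key = keys.get(keys.size() - 1);
    }

    @Benchmark
    public Integer get() {
        // Returning the value keeps the JIT from eliminating the lookup
        return table.get(key);
    }
}

JMH takes care of warmup, dead-code elimination, and per-iteration timing, which is the bookkeeping the hand-rolled loop is trying to approximate.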
Point 1: I use only the last value from the final loop. Point 2: Apologies, you were faster on the draw than my push to Github. It was changed to Node nodeTree = table[getHash(key) % table.length]; Point 3: Would love some clarity on this. For System.identityHashCode I read that this is what Object.hashCode actually calls underneath. Trying to read up more on this, purely for academic reasons. Point 4: It's possible this is related to the same push/timing comment I made earlier. I'm using Node.next for recursively iterating through instead of my previous loop for searching.
– mcnichol, Nov 8 at 23:56

1: Using only the last value from the final loop makes it actually worse. 3: System.identityHashCode is only used if the objects do not define a sensible hash function of their own. String has a sensible hash code.
– Louis Wasserman, Nov 8 at 23:57

1: Interested in how it is worse. My reasoning was that warming up values didn't matter and that I could take the final one. Are you saying averaging all values would be a more accurate representation than the final value? 3: I had made the key generic and plan on overriding it because of the prime 31 being leveraged; I'm looking to test performance with larger primes, in reference to a comment made in another thread. Have you played in this space already?
– mcnichol, Nov 9 at 0:13

1. The granularity of System.nanoTime isn't necessarily fine enough to give useful measurements for a single iteration; better to time multiple iterations in one go. Averaging all values would almost certainly be better. 3. If you want to make the key generic, then call the hashCode method of whatever the key is -- meaning key.hashCode().
– Louis Wasserman, Nov 9 at 0:20

I'll mark this one as the answer. Would love any additional feedback if you have any more; I know you mentioned there were many things wrong in this one. The main things I'm walking away with are running JMH instead of looping and averaging, and dropping System.identityHashCode for the object's own hashCode. I've updated it with your feedback, thanks.
– mcnichol, Nov 9 at 1:03
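Putting points 2 and 3 together, the lookup the discussion steers toward indexes on the key's own hashCode() and the real table length. A sketch only, assuming a Node with key, value, and next fields and a table field of type Node<K, V>[]; the names here are illustrative, not the actual implementation.

// Bucket index from the key's own hashCode() and the actual table length.
// Math.floorMod avoids a negative index when hashCode() is negative.
private int bucketIndexFor(K key) {
    return Math.floorMod(key.hashCode(), table.length);
}

public V get(K key) {
    Node<K, V> node = table[bucketIndexFor(key)];

    // Walk the chain in this bucket until the key matches or the chain ends
    while (node != null) {
        if (node.key.equals(key)) {
            return node.value;
        }
        node = node.next;
    }
    return null;
}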