Question regarding performance of HashTable of LinkedList Nodes

I implemented a HashTable whose number of buckets is set when the class is initialized: essentially an array of linked lists sized at runtime.

The issue is that a table with a small number of buckets, where the linked list must be traversed (chains can reach approximately 5K nodes deep), is outperforming a table with three orders of magnitude more buckets.

int SMALL_BUCKET_SIZE = 10;
int BIG_BUCKET_SIZE = 10000;

HashTable<String, Integer> smallHashTable = new HashTable<>(SMALL_BUCKET_SIZE);
HashTable<String, Integer> bigHashtTable = new HashTable<>(BIG_BUCKET_SIZE);

I would expect search in the larger HashTable to be O(1), while the smaller table, with its higher collision rate, should take longer because of the traversal of the linked nodes. Yet my numbers below show the smaller table outperforming the wider one.

Fetch SmallTable: 0.000007
Fetch BigTable: 0.000018

So I decided to loop my HashTable.get a thousand times to account for JIT and JVM optimization. Now I'm starting to see numbers that seem to confirm what I'd expect.

Fetch SmallTable: 0.0000013630
Fetch BigTable: 0.0000002560



My question is about the soundness of my logic as well as the additional moving parts here. I've pasted my test below, alongside a link to the implementation of the HashTable and the underlying Node structure.

I'm looking for feedback from folks with experience here regarding the variables that factor into this, such as key length, hash collision rates, bucket density, and so on.

HashTableTest.java



// Requires: org.junit.Test, static org.junit.Assert.assertTrue, java.util.*,
// java.util.function.Consumer, java.nio.charset.Charset
@Test
public void canInitializeHashTableWithBucketsForPerformance() throws InterruptedException {
    double smallTableTime, bigTableTime;
    int SMALL_BUCKET_SIZE = 10;
    int BIG_BUCKET_SIZE = 10000;

    HashTable<String, Integer> smallHashTable = new HashTable<>(SMALL_BUCKET_SIZE);
    HashTable<String, Integer> bigHashtTable = new HashTable<>(BIG_BUCKET_SIZE);
    List<String> strings = generateRandomStringKeys(1000);

    strings.forEach(string -> bigHashtTable.put(string, 10));
    strings.forEach(string -> smallHashTable.put(string, 10));

    Consumer<String> bigHashGet = bigHashtTable::get;
    Consumer<String> smallHashGet = smallHashTable::get;

    String theString = strings.get(strings.size() - 1);

    smallTableTime = getElapsedTimeFactoringOutJavaOptimization(theString, smallHashGet);
    bigTableTime = getElapsedTimeFactoringOutJavaOptimization(theString, bigHashGet);

    System.out.println(String.format("Fetch SmallTable: %.10f", smallTableTime));
    System.out.println(String.format("Fetch BigTable: %.10f", bigTableTime));

    assertTrue(smallTableTime > bigTableTime);
}

// Earlier iterations act as warm-up; only the nanoTime pair from the final
// iteration survives the loop and is returned, converted to seconds.
public double getElapsedTimeFactoringOutJavaOptimization(String s, Consumer<String> aMethod) {
    long start = 0, end = 0;

    for (int i = 0; i < 1000; i++) {
        start = System.nanoTime();
        aMethod.accept(s);
        end = System.nanoTime();
    }

    return (end - start) / 1_000_000_000D;
}

public List<String> generateRandomStringKeys(int numOfRandomKeys) {
    List<String> keys = new ArrayList<>();

    for (int i = 0; i < numOfRandomKeys; i++) {
        byte[] array = new byte[10];
        new Random().nextBytes(array);
        keys.add(new String(array, Charset.forName("UTF-8")));
    }

    return keys;
}


The test can be found here - GitHub - HashTableTest.java



The implementation can be found here as well - GitHub - HashTable.java

java data-structures hashtable

asked Nov 8 at 23:36 by mcnichol

  • Benchmarking an operation like this is going to be harder than just running it 1000 times. (1000 times is nowhere near enough, for starters.)
    – Louis Wasserman
    Nov 8 at 23:40

  • You should also average the speed of multiple rounds. Move the System.nanoTime() calls outside of the loop
    – flakes
    Nov 8 at 23:43

  • @flakes Notice that I'm basically dropping all of the System.nanoTimes on the floor except for the last one. Moving them outside the loop would capture the entire time the loop ran, not a single final iteration.
    – mcnichol
    Nov 8 at 23:59

  • @LouisWasserman - I increased the loop. It seems to peek out around 40k - 100k iterations. Curious if you typically see higher than that.
    – mcnichol
    Nov 9 at 0:02

  • @mcnichol: please define what you mean by "peek out".
    – Louis Wasserman
    Nov 9 at 0:03
1 Answer

There are many things wrong here, but a handful include:




  • Running this operation 1000 times and taking the difference of nanoTime for each of them does not make your benchmark valid. Seriously, use JMH. Or at least run it, like, ten million times. (A minimal JMH sketch follows this list.)

  • Your hash table doesn't actually work any differently for different-sized tables. You use table[getHash(key) % RADIX], which basically means however big the table is, you only use 10 buckets in it and pretend the rest don't exist. (A corrected lookup sketch also follows this list.)


  • System.identityHashCode is not a useful hash function, especially on strings, especially when you're hoping to actually find an element that's in there...or not.

  • While you're at it, you're not using Node.next as a field, and might as well get rid of it.
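
For the first bullet, a minimal JMH sketch along these lines could replace the hand-rolled timing loop. It assumes the HashTable class from the linked repository (only the constructor and put/get are used); the benchmark class and key names are illustrative, not from the original post.

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class HashTableBenchmark {

    // JMH runs the benchmark once per parameter value, so both table sizes are compared.
    @Param({"10", "10000"})
    public int buckets;

    public HashTable<String, Integer> table;
    public String key;

    @Setup
    public void setUp() {
        table = new HashTable<>(buckets);
        // 1000 entries, as in the question, so the 10-bucket table builds long chains.
        for (int i = 0; i < 1000; i++) {
            table.put("key-" + i, i);
        }
        key = "key-999";
    }

    @Benchmark
    public Integer get() {
        // Returning the value keeps the JIT from eliminating the call as dead code.
        return table.get(key);
    }
}

Built from the standard JMH Maven archetype and run with java -jar target/benchmarks.jar, JMH handles warm-up iterations, forking, and averaging rather than relying on a single pair of nanoTime readings.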

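For the second and third bullets, here is a self-contained sketch, not the author's code and not the exact structure of the linked repository, of a chained-bucket table where the index comes from the key's own hashCode() and the actual table length, so every bucket is used.

// Node and ChainedTable are illustrative names, not taken from the linked repository.
class Node<K, V> {
    final K key;
    V value;
    Node<K, V> next;

    Node(K key, V value, Node<K, V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

class ChainedTable<K, V> {
    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    ChainedTable(int buckets) {
        table = (Node<K, V>[]) new Node[buckets];
    }

    // Use the key's own hashCode, mask off the sign bit so the index is non-negative,
    // and spread entries across however many buckets the table actually has.
    private int indexFor(K key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    public void put(K key, V value) {
        int index = indexFor(key);
        for (Node<K, V> node = table[index]; node != null; node = node.next) {
            if (node.key.equals(key)) {
                node.value = value;  // key already present: overwrite
                return;
            }
        }
        table[index] = new Node<>(key, value, table[index]);  // prepend a new node to the chain
    }

    public V get(K key) {
        for (Node<K, V> node = table[indexFor(key)]; node != null; node = node.next) {
            if (node.key.equals(key)) {
                return node.value;
            }
        }
        return null;  // key not present
    }
}

With 10,000 buckets and only 1,000 keys, most chains end up empty or one node long, which is where the O(1) expectation for the bigger table comes from.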





answered Nov 8 at 23:44 by Louis Wasserman (accepted)

  • Point 1: I use only the last value from the final loop. Point 2: Apologies, you were faster on the draw than my push to GitHub. It was changed to Node nodeTree = table[getHash(key) % table.length]; Point 3: Would love some clarity on this. For System.identityHashCode I read that this was what Object.hash actually calls underneath. Trying to read up more on this, purely for academic reasons. Point 4: It's possible this is related to the same push/timing comment I made earlier. I'm using Node.next for recursively iterating through instead of my previous loop for searching.
    – mcnichol
    Nov 8 at 23:56

  • 1: Using only the last value from the final loop makes it actually worse. 3: System.identityHashCode is only used if the objects do not define a sensible hash function of their own. String has a sensible hash code.
    – Louis Wasserman
    Nov 8 at 23:57

  • 1: Interested in how it is worse. My reasoning was that warming up values didn't matter and that I could take the final one. Are you saying averaging all values would be a more accurate representation than the final value? 3: I had made the key generic and plan on overriding it, since prime 31 is leveraged and I'm looking to test performance with larger primes, in reference to a comment made in another thread. Have you played in this space already?
    – mcnichol
    Nov 9 at 0:13

  • 1. The granularity of System.nanoTime isn't necessarily fine enough to give useful measurements for a single iteration; better to time multiple iterations in one go. Averaging all values would almost certainly be better. 3. If you want to make the key generic, then call the hashCode method of whatever the key is -- meaning key.hashCode().
    – Louis Wasserman
    Nov 9 at 0:20

  • I'll mark this one as the answer. Would love any additional feedback if you have any more. I know you mentioned there were many things wrong in this one. The main things I'm walking away with are running JMH instead of looping with the average, and dropping System.identityHashCode for the object's hash. I've updated it with your feedback, thanks.
    – mcnichol
    Nov 9 at 1:03
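
A footnote on the timing discussion in the comments above: a sketch of the suggested approach, timing a large block of iterations with a single pair of nanoTime readings and reporting the average, reusing the Consumer-based shape of the question's helper. The warm-up and iteration counts are arbitrary assumptions.

import java.util.function.Consumer;

public class AverageTimer {

    // Times `iterations` calls with one pair of nanoTime readings and
    // returns the average seconds per call.
    public static double averageSecondsPerCall(String key, Consumer<String> aMethod, int iterations) {
        // Warm-up pass so the JIT has a chance to compile the hot path first.
        for (int i = 0; i < 100_000; i++) {
            aMethod.accept(key);
        }

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            aMethod.accept(key);
        }
        long end = System.nanoTime();

        return ((end - start) / (double) iterations) / 1_000_000_000D;
    }
}

Called as averageSecondsPerCall(theString, smallHashTable::get, 1_000_000), it sidesteps both the nanoTime granularity problem and the last-iteration-only measurement, though JMH remains the more trustworthy option.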