Question regarding performance of HashTable of LinkedList Nodes
I implemented a HashTable whose bucket count is set at construction time; it is simply an array of linked lists sized at runtime.
The issue is that a table with a small number of buckets, where the linked list must be traversed (chains can reach roughly 5K nodes deep), is outperforming a HashTable whose bucket count is three orders of magnitude larger.
int SMALL_BUCKET_SIZE = 10;
int BIG_BUCKET_SIZE = 10000;
HashTable<String, Integer> smallHashTable = new HashTable<>(SMALL_BUCKET_SIZE);
HashTable<String, Integer> bigHashtTable = new HashTable<>(BIG_BUCKET_SIZE);
I would expect the larger HashTable to give O(1) search, while the smaller table, with its higher collision rate, should take more time traversing the linked nodes. Yet my numbers below show the smaller table outperforming the wider one.
Fetch SmallTable: 0.000007
Fetch BigTable: 0.000018
So I decided to loop my HashTable.get a thousand times to account for JIT and JVM optimization. Now I'm starting to see numbers that seem to confirm what I'd expect.
Fetch SmallTable: 0.0000013630
Fetch BigTable: 0.0000002560
My question is about the soundness of my logic, as well as the other moving parts here. I've pasted my test below, along with links to the implementation of the HashTable and the underlying Node structure.
I'm looking for feedback from folks with depth/experience here on the variables that factor into this, such as key length, hash collision rates, bucket density, etc.
HashTableTest.java
@Test
public void canInitializeHashTableWithBucketsForPerformance() throws InterruptedException {
    double smallTableTime, bigTableTime;
    int SMALL_BUCKET_SIZE = 10;
    int BIG_BUCKET_SIZE = 10000;

    HashTable<String, Integer> smallHashTable = new HashTable<>(SMALL_BUCKET_SIZE);
    HashTable<String, Integer> bigHashtTable = new HashTable<>(BIG_BUCKET_SIZE);

    List<String> strings = generateRandomStringKeys(1000);

    strings.forEach(string -> bigHashtTable.put(string, 10));
    strings.forEach(string -> smallHashTable.put(string, 10));

    Consumer<String> bigHashGet = bigHashtTable::get;
    Consumer<String> smallHashGet = smallHashTable::get;

    String theString = strings.get(strings.size() - 1);

    smallTableTime = getElapsedTimeFactoringOutJavaOptimization(theString, smallHashGet);
    bigTableTime = getElapsedTimeFactoringOutJavaOptimization(theString, bigHashGet);

    System.out.println(String.format("Fetch SmallTable: %.10f", smallTableTime));
    System.out.println(String.format("Fetch BigTable: %.10f", bigTableTime));

    assertTrue(smallTableTime > bigTableTime);
}

public double getElapsedTimeFactoringOutJavaOptimization(String s, Consumer<String> aMethod) {
    long start = 0, end = 0;

    for (int i = 0; i < 1000; i++) {
        start = System.nanoTime();
        aMethod.accept(s);
        end = System.nanoTime();
    }

    return (end - start) / 1_000_000_000D;
}

public List<String> generateRandomStringKeys(int numOfRandomKeys) {
    List<String> keys = new ArrayList<>();

    for (int i = 0; i < numOfRandomKeys; i++) {
        byte[] array = new byte[10];
        new Random().nextBytes(array);
        keys.add(new String(array, Charset.forName("UTF-8")));
    }

    return keys;
}
The test can be found here - Github - HashTableTest.java
The implementation can be found here as well - Github - HashTable.java
java data-structures hashtable
asked Nov 8 at 23:36 by mcnichol (589)
Benchmarking an operation like this is going to be harder than just running it 1000 times. (1000 times is nowhere near enough, for starters.)
– Louis Wasserman, Nov 8 at 23:40

You should also average the speed of multiple rounds. Move the System.nanoTime() calls outside of the loop.
– flakes, Nov 8 at 23:43

@flakes Notice that I'm basically dropping all of the System.nanoTime values on the floor except for the last one. Moving them outside the loop would capture the entire time the loop ran, not a single final iteration.
– mcnichol, Nov 8 at 23:59

@LouisWasserman - I increased the loop. It seems to peak out around 40k - 100k iterations. Curious if you typically see higher than that.
– mcnichol, Nov 9 at 0:02

@mcnichol: please define what you mean by "peak out".
– Louis Wasserman, Nov 9 at 0:03
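For concreteness, the change flakes is describing would look roughly like the sketch below: time the whole loop once and divide by the iteration count, rather than keeping only the nanoTime delta of the final iteration. The method name and iteration count are placeholders, and it assumes the same Consumer<String> setup as the test above.

public double getAverageElapsedTimeSeconds(String s, Consumer<String> aMethod) {
    final int iterations = 1_000_000; // well beyond 1000, per the comments above

    // Time the entire loop in one measurement instead of a single iteration
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        aMethod.accept(s);
    }
    long end = System.nanoTime();

    // Average seconds per call
    return ((end - start) / (double) iterations) / 1_000_000_000D;
}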
1 Answer (accepted)
There are many things wrong here, but a handful include:

- Running this operation 1000 times and taking the difference of nanoTime for each of them does not make your benchmark valid. Seriously, use JMH. Or at least run it, like, ten million times.
- Your hash table doesn't actually work any differently for different-sized tables. You use table[getHash(key) % RADIX], which basically means however big the table is, you only use 10 buckets in it and pretend the rest don't exist.
- System.identityHashCode is not a useful hash function, especially on strings, especially when you're hoping to actually find an element that's in there...or not.
- While you're at it, you're not using Node.next as a field, and might as well get rid of it.

answered Nov 8 at 23:44 by Louis Wasserman
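For the first bullet, a minimal JMH benchmark for this lookup could look like the sketch below. It assumes the HashTable class from the question (and that get returns the stored value); the benchmark class name, the @Param values, and the key-generation details are illustrative.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class HashTableBenchmark {

    @Param({"10", "10000"})
    int buckets;

    HashTable<String, Integer> table;
    String key;

    @Setup
    public void setUp() {
        table = new HashTable<>(buckets);

        // Populate with 1000 random 10-byte string keys, as in the original test
        Random random = new Random(42);
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            byte[] bytes = new byte[10];
            random.nextBytes(bytes);
            keys.add(new String(bytes, StandardCharsets.UTF_8));
        }
        keys.forEach(k -> table.put(k, 10));

        key = keys.get(keys.size() - 1);
    }

    @Benchmark
    public Integer get() {
        // Returning the value keeps the JIT from eliminating the lookup
        return table.get(key);
    }
}

JMH takes care of warmup, dead-code elimination, and per-iteration timing, which is the bookkeeping the hand-rolled loop is trying to approximate.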
Point 1: I use only the last value from the final loop. Point 2: Apologies, you were faster on the draw than my push to Github. It was changed to Node nodeTree = table[getHash(key) % table.length]; Point 3: Would love some clarity on this. For System.identityHashCode I read that this is what Object.hashCode actually calls underneath. Trying to read up more on this, purely for academic reasons. Point 4: It's possible this is related to the same push/timing comment I made earlier. I'm using Node.next for recursively iterating through instead of my previous loop for searching.
– mcnichol, Nov 8 at 23:56

1: Using only the last value from the final loop makes it actually worse. 3: System.identityHashCode is only used if the objects do not define a sensible hash function of their own. String has a sensible hash code.
– Louis Wasserman, Nov 8 at 23:57

1: Interested in how it is worse. My reasoning was that warming up values didn't matter and that I could take the final one. Are you saying averaging all values would be a more accurate representation than the final value? 3: I had made the key generic and plan on overriding it because of the prime 31 being leveraged; I'm looking to test performance with larger primes, in reference to a comment made in another thread. Have you played in this space already?
– mcnichol, Nov 9 at 0:13

1. The granularity of System.nanoTime isn't necessarily fine enough to give useful measurements for a single iteration; better to time multiple iterations in one go. Averaging all values would almost certainly be better. 3. If you want to make the key generic, then call the hashCode method of whatever the key is -- meaning key.hashCode().
– Louis Wasserman, Nov 9 at 0:20

I'll mark this one as the answer. Would love any additional feedback if you have any more; I know you mentioned there were many things wrong in this one. The main things I'm walking away with are running JMH instead of looping and averaging, and dropping System.identityHashCode for the object's own hashCode. I've updated it with your feedback, thanks.
– mcnichol, Nov 9 at 1:03
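Putting points 2 and 3 together, the lookup the discussion steers toward indexes on the key's own hashCode() and the real table length. A sketch only, assuming a Node with key, value, and next fields and a table field of type Node<K, V>[]; the names here are illustrative, not the actual implementation.

// Bucket index from the key's own hashCode() and the actual table length.
// Math.floorMod avoids a negative index when hashCode() is negative.
private int bucketIndexFor(K key) {
    return Math.floorMod(key.hashCode(), table.length);
}

public V get(K key) {
    Node<K, V> node = table[bucketIndexFor(key)];

    // Walk the chain in this bucket until the key matches or the chain ends
    while (node != null) {
        if (node.key.equals(key)) {
            return node.value;
        }
        node = node.next;
    }
    return null;
}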