Why does hash() method return short Hash value with int in Python?
When I call hash() in Python 3, it returns a long integer for a string but only a short one for an int.
Is this how it is supposed to work? If so, won't such short hash values for ints cause collisions?
for i in range(5):
    print(hash(i))
print(hash("abc"))
The Result:
0
1
2
3
4
4714025963994714141
python hash types
edited Nov 7 at 17:13
Charles Duffy
asked Nov 7 at 16:58
Poream3387
4
What hash does is implementation-dependent; don't make any assumptions about what it returns.
– chepner
Nov 7 at 17:00
2
Collisions are inevitable; larger tables reduce collisions, but waste more space.
– chepner
Nov 7 at 17:03
3
Just to clarify: hash is not a cryptographic hash. If you are interested in those, use hashlib. The built-in hash is just about unique identifiers.
– hiro protagonist
Nov 7 at 17:03
2
The purpose of this value is to distribute keys into dictionary buckets -- it's not intended to be used for purposes that require longer output or stronger guarantees; given its primary use case, the main design goal is speed (since every lookup requires calculating the hash for the key).
– Charles Duffy
Nov 7 at 17:04
1
BTW, code formatting should be used for, well, code. "a long-length integer" isn't code, it's English prose; likewise for "short hash value". If you want to emphasize prose, italics are usually the right choice. See "Highlighting technical words?" on Meta Stack Exchange.
– Charles Duffy
Nov 7 at 17:10
3 Answers
4 votes, accepted
In CPython, the default Python implementation, the built-in hash is computed as follows:
For numeric types, the hash of a number x is based on the reduction
of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that
hash(x) == hash(y) whenever x and y are numerically equal, even if
x and y have different types.
_PyHASH_BITS is 61 on 64-bit systems and 31 on 32-bit systems (defined in pyhash.h in the CPython source).
So on a 64-bit system, the built-in hash of a non-negative int behaves like this function:
def int_hash(number):
    # Holds for non-negative ints only; negative ints keep their sign in CPython.
    return number % (2 ** 61 - 1)
That's why small ints hash to themselves, while, for example, hash(2305843009213693950) returns 2305843009213693950 and hash(2305843009213693951) returns 0.
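As a rough check of the rule above, the following sketch compares hash() against the modular reduction, reading the modulus from sys.hash_info instead of hard-coding it. It assumes CPython; the helper name expected_int_hash is made up for illustration, and the sign handling and the -1 special case are CPython details that go beyond what the answer states.

```python
import sys
from fractions import Fraction

P = sys.hash_info.modulus  # 2**61 - 1 on 64-bit CPython, 2**31 - 1 on 32-bit


def expected_int_hash(n):
    """Hypothetical helper: mimic CPython's int hash for illustration."""
    # Reduce the magnitude modulo P, then restore the sign.
    h = abs(n) % P
    if n < 0:
        h = -h
    # CPython reserves -1 as an error code, so it is replaced by -2.
    return -2 if h == -1 else h


for n in (0, 1, 4, P - 1, P, P + 1, -1, -5, 10**30):
    assert hash(n) == expected_int_hash(n)

# Numerically equal values hash the same, whatever their type.
assert hash(1) == hash(1.0) == hash(Fraction(1, 1)) == hash(True)

# Collisions exist (0 and P both hash to 0), yet a dict keeps the keys apart
# because it falls back on == when hashes match.
d = {0: "zero", P: "modulus"}
assert hash(0) == hash(P) and len(d) == 2
```

This also speaks to the collision worry in the question: equal hashes only send two keys to the same bucket, and the dict still tells them apart by comparing the keys themselves.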
edited Nov 7 at 17:19
answered Nov 7 at 17:12
ingvar
4 votes
The only purpose of the hash function is to produce an integer value that can be used to insert an object into a dict. The only thing hash guarantees is that if a == b, then hash(a) == hash(b). For a user-defined class Foo, it is the user's responsibility to ensure that Foo.__eq__ and Foo.__hash__ enforce this guarantee.
Anything else is implementation-dependent, and you shouldn't read anything into the value of hash(x) for any value x. Specifically, hash(a) == hash(b) is allowed for a != b, and hash(x) == x is not required for any particular x.
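To illustrate the guarantee for a user-defined class, here is a minimal sketch. The class Point and its fields are hypothetical; the idiom shown, hashing a tuple of exactly the fields that __eq__ compares, is one common way to keep the two methods consistent, not the only one.

```python
class Point:
    """Hypothetical value class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so a == b implies
        # hash(a) == hash(b).
        return hash((self.x, self.y))


a, b = Point(1, 2), Point(1, 2)
assert a == b and hash(a) == hash(b)

# Equal objects act as the same dict key.
lookup = {a: "first"}
lookup[b] = "second"
assert len(lookup) == 1 and lookup[a] == "second"
```

Note that in Python 3 a class that defines __eq__ without __hash__ becomes unhashable, so both methods need to be written together.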
answered Nov 7 at 17:12
chepner
0 votes
You should use the hashlib module, for example with SHA-256:
>>> import hashlib
>>> m = hashlib.sha256()
>>> m.update(b'abc')
>>> m.hexdigest()
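A sketch of how this might look when a stable, fixed-length value is needed. The helper name stable_digest and the choice of SHA-256 are assumptions for illustration. Unlike the built-in hash of a str, which by default changes between interpreter runs because of hash randomization, the digest is the same on every run and platform.

```python
import hashlib


def stable_digest(text):
    """Hypothetical helper: hex SHA-256 digest of a string's UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = stable_digest("abc")
print(digest)       # always 64 hex characters for SHA-256
print(hash("abc"))  # varies between runs unless PYTHONHASHSEED is fixed
```

This is far slower than the built-in hash and is not what dict uses, so it only makes sense when the value has to be persisted or compared across processes.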
answered Nov 7 at 17:15
Sdrf1445