Why does hash() return a short hash value for an int in Python?











When I call the built-in hash() function in Python 3, I noticed that it returns a long integer for a string, but not for an int.



Is it supposed to work this way? If so, won't such short hash values for ints cause collisions?



for i in range(5):
    print(hash(i))

print(hash("abc"))


The Result:



0
1
2
3
4
4714025963994714141















Tags: python, hash, types














asked Nov 7 at 16:58 by Poream3387, edited Nov 7 at 17:13 by Charles Duffy








  • 4




    What hash does is implementation-dependent; don't make any assumptions about what it returns.
    – chepner
    Nov 7 at 17:00






  • 2




    Collisions are inevitable; larger tables reduce collisions, but waste more space.
    – chepner
    Nov 7 at 17:03






  • 3




    Just to clarify: hash is not a cryptographic hash. If you are interested in those, use hashlib. The built-in hash is only meant for fast lookups in dicts and sets; its values are not unique identifiers.
    – hiro protagonist
    Nov 7 at 17:03








  • 2




    The purpose of this value is to distribute keys into dictionary buckets -- it's not intended to be used for purposes that require longer output or stronger guarantees; given its primary use case, the main design goal is speed (since every lookup requires calculating the hash for the key).
    – Charles Duffy
    Nov 7 at 17:04








  • 1




    BTW, code formatting should be used for, well, code. "A long-length integer" isn't code, it's English prose; likewise for "short hash value". If you want to emphasize prose, italics are usually the right choice. See "Highlighting technical words?" on Meta Stack Exchange.
    – Charles Duffy
    Nov 7 at 17:10

















3 Answers
Accepted answer (score 4)






In CPython, the default Python interpreter implementation, the built-in hash for numbers is described in the source like this:




            For numeric types, the hash of a number x is based on the reduction
            of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that
            hash(x) == hash(y) whenever x and y are numerically equal, even if
            x and y have different types
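As a small illustration of the quoted rule (a sketch only; the particular values below are chosen for this example and do not come from the answer), numerically equal values of different numeric types share one hash, so they also land on the same dict key:

```python
from decimal import Decimal
from fractions import Fraction

# Values of different numeric types that all compare equal to 1.
equal_values = [1, 1.0, True, Fraction(1, 1), Decimal(1), complex(1, 0)]

# Because they are equal, they must all produce the same hash.
distinct_hashes = {hash(v) for v in equal_values}
print(len(distinct_hashes))

# Consequence: they behave as the same dictionary key.
table = {1: "stored under int 1"}
print(table[1.0], "|", table[Fraction(1, 1)])
```

This is the reason the numeric hash cannot simply be "the bits of the object": an int and a float with the same value have to agree.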




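_PyHASH_BITS is 61 on 64-bit systems and 31 on 32-bit systems (it is defined in CPython's pyhash.h header).



So on a 64-bit system, the built-in hash for a non-negative int behaves like this function:



def int_hash(number):
    # Valid for number >= 0; CPython handles negative ints separately (for example, hash(-1) is -2).
    return number % (2 ** 61 - 1)


That's why small ints hash to themselves, while hash(2305843009213693950) returns 2305843009213693950 and hash(2305843009213693951) returns 0: 2305843009213693951 is exactly 2**61 - 1.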
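The behaviour can be checked without hard-coding 61 bits. The following is a sketch that assumes CPython; int_hash_sketch is a hypothetical re-implementation written for this example, not CPython's actual code, and it reads the modulus from sys.hash_info instead of assuming a 64-bit build:

```python
import sys

# The modulus CPython reports for numeric hashes (2**61 - 1 on typical 64-bit builds).
P = sys.hash_info.modulus

def int_hash_sketch(n):
    """Hypothetical re-implementation of CPython's int hash, for illustration only."""
    h = abs(n) % P                 # reduce the magnitude modulo P
    if n < 0:
        h = -h                     # the sign is carried over to the result
    return -2 if h == -1 else h    # -1 is reserved as an error marker in the C API

samples = [0, 1, 4, P - 1, P, P + 1, 2**100, -1, -5, -P]
matches = all(int_hash_sketch(n) == hash(n) for n in samples)
print(matches)

# 0 and P collide (both hash to 0), yet a dict keeps them apart
# because it also compares the keys themselves.
table = {0: "zero", P: "modulus"}
print(table[0], table[P])
```

The last two lines address the collision worry from the question: equal hashes only mean the keys are candidates for the same slot, and the dict still distinguishes them with ==.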






answered Nov 7 at 17:12 by ingvar, edited Nov 7 at 17:19
























Answer (score 4)









The only purpose of the hash function is to produce an integer value that can be used to place an object in a hash table such as a dict or a set. The only thing hash guarantees is that if a == b, then hash(a) == hash(b). For a user-defined class Foo, it is the user's responsibility to ensure that Foo.__eq__ and Foo.__hash__ uphold this guarantee.
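A minimal sketch of what that responsibility looks like in practice, assuming a class whose identity is fully described by two attributes (the fields name and size are made up for this example):

```python
class Foo:
    """Hypothetical value class; `name` and `size` are illustrative fields."""

    def __init__(self, name, size):
        self.name = name
        self.size = size

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return (self.name, self.size) == (other.name, other.size)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares, so equal objects hash equally.
        return hash((self.name, self.size))


a = Foo("x", 1)
b = Foo("x", 1)
print(a == b, hash(a) == hash(b))

# Because the guarantee holds, an equal object finds the same dict entry.
lookup = {a: "found"}
print(lookup[b])
```

Hashing a tuple of the compared fields is a common way to keep the two methods in sync; if __eq__ later compares another attribute, __hash__ should be updated with it.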



Anything else is implementation-dependent, and you shouldn't read anything into the value of hash(x) for any x. Specifically, hash(a) == hash(b) is allowed when a != b, and hash(x) == x is not required for any particular x.
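One way to see why the concrete values should not be relied on is to compare separate interpreter runs. This is a sketch that assumes CPython 3.7 or later, where string hashes are salted per process and the salt can be pinned with the PYTHONHASHSEED environment variable; the helper name is made up for this example:

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(expression, seed):
    """Evaluate hash(<expression>) in a new Python process with a fixed hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expression}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

# Same seed: the string hash is reproducible across processes.
same_seed = hash_in_fresh_interpreter("'abc'", 1) == hash_in_fresh_interpreter("'abc'", 1)
# Different seeds: the string hash is expected to differ.
other_seed = hash_in_fresh_interpreter("'abc'", 1) == hash_in_fresh_interpreter("'abc'", 2)
# Small ints are not salted in CPython, so the seed does not matter for them.
int_stable = hash_in_fresh_interpreter("4", 1) == hash_in_fresh_interpreter("4", 2) == 4
print(same_seed, other_seed, int_stable)
```

So the long number printed for "abc" in the question is specific to that one run, while the int results happen to be stable in CPython without being promised by the language.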






answered Nov 7 at 17:12 by chepner






















Answer (score 0)









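If you need a cryptographic digest rather than a dict-lookup hash, use the hashlib module, for example with SHA-256:



>>> import hashlib
>>> m = hashlib.sha256()
>>> m.update(b'abc')
>>> m.hexdigest()
'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'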
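A minimal sketch of how that could be wrapped up for ints and strings alike, assuming SHA-256 is an acceptable algorithm and that hashing the repr() of a value is good enough for the use case (the helper stable_digest is hypothetical, not part of hashlib):

```python
import hashlib

def stable_digest(value):
    """Return a 64-character hex digest of repr(value) using SHA-256."""
    data = repr(value).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

for value in (0, 1, 4, "abc"):
    print(value, stable_digest(value))
```

Unlike the built-in hash, every result here has the same length regardless of the input type, at the cost of being much slower, which is why it is not what dict uses.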





answered Nov 7 at 17:15 by Sdrf1445






























                                 
