TensorFlow - Sparse embedding lookup that remains sparse























I'm implementing a text classifier with a CNN similar to Kim 2014 in TensorFlow. TensorFlow provides tf.nn.embedding_lookup_sparse, which allows you to provide the word IDs as a sparse tensor. This is nice, especially for enabling variable-length sequences. However, this function requires a "combination" step after the lookup, such as "mean" or "sum". This coerces the result back into dense tensor space. I don't want to do any combination; I want to keep my vectors in the sparse representation so I can do other convolutions afterwards. Is this possible in TF?

EDIT: I want to avoid padding the input prior to the embedding lookup. This is because TensorFlow's embedding lookup generates vectors for the pad value, and it's a kludge to try to mask it with zeros (see here).
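For concreteness, the "mean" combiner computes something like the following (a NumPy sketch of the semantics only; the embedding table and IDs are made up):

```python
import numpy as np

# Toy embedding table: vocabulary of 5 words, embedding dimension 3.
embeddings = np.arange(15, dtype=np.float32).reshape(5, 3)

# Two variable-length sequences of word IDs (what the SparseTensor encodes).
sequences = [[1, 3], [4]]

# What embedding_lookup_sparse(..., combiner="mean") effectively computes:
# one dense vector per sequence, averaging away the per-word axis.
combined = np.stack([embeddings[ids].mean(axis=0) for ids in sequences])

print(combined.shape)  # (2, 3): one row per sequence, not per word
```

The per-word axis (and with it the variable sequence length) disappears in the combine step, which is exactly the behavior the question wants to avoid.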

































































































































      python tensorflow conv-neural-network text-classification














      edited Nov 7 at 20:37

























      asked Nov 7 at 20:22









      Andy Carlson

























1 Answer






























I think there are two points of confusion in the question. First, the combiner operation happens across the set of embedding IDs in each row of the sparse indices input sp_ids. So if sp_ids has a shape of N x 1, you are "combining" just one embedding vector for each row of sp_ids, which simply retrieves that embedding vector (which I think is what you are saying you want).
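The shape argument can be sketched in NumPy (toy embedding table and IDs, purely illustrative):

```python
import numpy as np

# Toy embedding table: 10 words, embedding dimension 4.
embeddings = np.arange(40, dtype=np.float32).reshape(10, 4)

# sp_ids with shape N x 1: each row holds exactly one ID.
ids = np.array([[7], [0], [3]])

# Taking the mean (or sum) over a single vector just returns that vector,
# so the combiner is a no-op for N x 1 inputs.
looked_up = np.stack([embeddings[row].mean(axis=0) for row in ids])

assert np.allclose(looked_up, embeddings[[7, 0, 3]])
```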



Second, the return value is the embedding vector for each row of input. The embedding vector itself is a dense vector, by the very definition of what an embedding is and what the TensorFlow embedding operations calculate. So this return result will always be dense, and that is what you want. A sparse matrix representation would be horribly inefficient, since the matrix truly will be dense (full of dense embeddings), regardless of whether any 'combiner' operation happens.



The research paper you linked does not seem to use any special methodology that would produce sparse embedding vectors, so I don't see a reason here for expecting or desiring sparse outputs.



Maybe I am incorrect; can you provide more details about why you expect the embedding vectors themselves to be sparse? That would be a highly unusual situation.



























• No confusion, just an inability on my part to communicate effectively what I want through words alone. Indeed, the embedding lookup takes a (sparse) sequence of word IDs and produces one (dense) embedded vector per sequence. What I would like instead is to produce one embedded vector per ID, i.e. to get multiple embedded vectors per sequence. I want the sparse representation not for efficiency, but to enable variable-length sequences of embedded vectors. As in the paper, these are then convolved and max-pooled. Essentially, the "combination" I want is a 1D conv and pool instead of a "mean".
  – Andy Carlson
  Nov 7 at 21:18
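What this comment asks for, one embedding vector per ID with variable-length rows, can be expressed as a flat lookup plus row-split bookkeeping; this is essentially the layout that tf.RaggedTensor adopts in later TensorFlow versions. A NumPy sketch with made-up data:

```python
import numpy as np

# Toy embedding table: 6 words, embedding dimension 3.
embeddings = np.arange(18, dtype=np.float32).reshape(6, 3)

# Flat IDs of all sequences plus row splits, as a ragged layout stores them.
flat_ids = np.array([4, 1, 1, 0, 5])
row_splits = np.array([0, 3, 3, 5])  # rows of length 3, 0, and 2

flat_vectors = embeddings[flat_ids]  # one dense vector per ID
rows = [flat_vectors[row_splits[i]:row_splits[i + 1]]
        for i in range(len(row_splits) - 1)]

print([r.shape for r in rows])  # [(3, 3), (0, 3), (2, 3)]
```

Each row keeps its own length and one dense vector per word, so a per-row 1D convolution and pooling can follow without any padding or combiner.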










• Why not just call the function multiple times, where for each call sp_ids is an N x 1 sparse tensor of the IDs? Essentially, change one big K x <variable length> tensor used for sp_ids into K distinct <variable length> x 1 tensors, which serve as the lookup IDs for the tensor returned by each call.
  – ely
  Nov 7 at 22:04
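This per-sequence approach can be mimicked outside TensorFlow as follows (illustrative data; each call handles one variable-length sequence, so no combiner collapses the sequence axis and no padding is needed):

```python
import numpy as np

# Toy embedding table: 8 words, embedding dimension 4.
embeddings = np.arange(32, dtype=np.float32).reshape(8, 4)

# K variable-length sequences instead of one padded K x max_len tensor.
sequences = [[1, 5, 2], [0], [7, 7]]

# One lookup per sequence: the result keeps one embedding vector per ID,
# ready for a per-sequence 1D convolution and pooling.
per_sequence = [embeddings[np.asarray(seq)] for seq in sequences]

print([m.shape for m in per_sequence])  # [(3, 4), (1, 4), (2, 4)]
```

The trade-off is one lookup op per sequence rather than one batched op, which may matter for large batches.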











          answered Nov 7 at 20:56









          ely
