Tensorflow - Sparse embedding lookup that remains sparse
I'm implementing a text classifier with a CNN, similar to Kim (2014), in TensorFlow. TensorFlow provides tf.nn.embedding_lookup_sparse, which lets you provide the word IDs as a sparse tensor. This is nice, especially for supporting variable-length sequences. However, this function requires a "combiner" step after the lookup, such as "mean" or "sum", which coerces the result back into dense tensor space. I don't want to do any combination; I want to keep my vectors in the sparse representation so I can run other convolutions afterwards. Is this possible in TF?
EDIT: I want to avoid padding the input prior to the embedding lookup, because TensorFlow's embedding lookup generates vectors for the pad value, and it's a kludge to mask them out with zeros afterwards (see here).
python tensorflow conv-neural-network text-classification
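To illustrate the padding kludge the question wants to avoid, here is a minimal NumPy sketch (the embedding table, pad ID, and shapes are made-up stand-ins, not TensorFlow's actual ops):

```python
import numpy as np

# Toy stand-ins: a 10-word vocabulary with 4-dim embeddings; the pad
# ID and shapes are hypothetical, chosen only for illustration.
embeddings = np.random.default_rng(0).normal(size=(10, 4))
PAD = 0  # hypothetical pad ID

# Padding forces every sequence up to the max length...
padded = np.array([[3, 1, 4], [2, 7, PAD]])
vecs = embeddings[padded]              # shape (2, 3, 4), fully dense

# ...and each pad slot picks up a real embedding row (the row for ID 0),
# so it has to be masked out by hand afterwards: the kludge in question.
mask = (padded != PAD)[..., None]
masked_vecs = vecs * mask
```

Note that if the pad ID doubles as a real vocabulary ID, the mask also zeros genuine tokens, which is part of why this approach is fragile.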
add a comment |
up vote
0
down vote
favorite
I'm implementing a text classifier with a CNN similar to Kim 2014 with Tensorflow. Tensorflow provides tf.nn.embedding_lookup_sparse
, which allows you to provide the word IDs as a sparse tensor. This is nice, especially for enabling variable length sequences. However, this function requires a "combination" step after the lookup, such as "mean" or "sum". This coerces it back to the dense tensor space. I don't want to do any combination. I want to keep my vectors in the sparse representation, so I can do other convolutions afterwards. Is this possible in TF?
EDIT: I want to avoid padding the input prior to the embedding lookup. This is because Tensorflow's embedding lookup generates vectors for the pad value, and its a kludge trying to mask it with zeros (see here).
python tensorflow conv-neural-network text-classification
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I'm implementing a text classifier with a CNN similar to Kim 2014 with Tensorflow. Tensorflow provides tf.nn.embedding_lookup_sparse
, which allows you to provide the word IDs as a sparse tensor. This is nice, especially for enabling variable length sequences. However, this function requires a "combination" step after the lookup, such as "mean" or "sum". This coerces it back to the dense tensor space. I don't want to do any combination. I want to keep my vectors in the sparse representation, so I can do other convolutions afterwards. Is this possible in TF?
EDIT: I want to avoid padding the input prior to the embedding lookup. This is because Tensorflow's embedding lookup generates vectors for the pad value, and its a kludge trying to mask it with zeros (see here).
python tensorflow conv-neural-network text-classification
I'm implementing a text classifier with a CNN similar to Kim 2014 with Tensorflow. Tensorflow provides tf.nn.embedding_lookup_sparse
, which allows you to provide the word IDs as a sparse tensor. This is nice, especially for enabling variable length sequences. However, this function requires a "combination" step after the lookup, such as "mean" or "sum". This coerces it back to the dense tensor space. I don't want to do any combination. I want to keep my vectors in the sparse representation, so I can do other convolutions afterwards. Is this possible in TF?
EDIT: I want to avoid padding the input prior to the embedding lookup. This is because Tensorflow's embedding lookup generates vectors for the pad value, and its a kludge trying to mask it with zeros (see here).
python tensorflow conv-neural-network text-classification
python tensorflow conv-neural-network text-classification
edited Nov 7 at 20:37
asked Nov 7 at 20:22
Andy Carlson
1 Answer
I think there are two points of confusion in the question. First, the combiner operation happens across the set of embedding IDs in each row of the sparse indices input sp_ids. So if sp_ids has shape N x 1, you are "combining" just one embedding vector per row of sp_ids, which simply retrieves that embedding vector (which I think is what you are saying you want).
Second, the return value is the embedding vector for each row of input. The embedding vector itself is a dense vector, by the very definition of what an embedding is and what the TensorFlow embedding operations compute. So the result will always be dense, and that's what you want: a sparse matrix representation would be horribly inefficient, since the matrix truly is dense (full of dense embeddings), regardless of whether any combiner operation happens.
The research paper you linked does not seem to use any special methodology that would produce sparse embedding vectors, so I don't see a reason here to expect or desire sparse outputs.
Maybe I am mistaken; can you provide more detail about why you expect the embedding vectors themselves to be sparse? That would be a highly unusual situation.
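The combiner behavior described above can be sketched in plain NumPy (a toy table standing in for the TF embedding params; not TensorFlow's actual implementation):

```python
import numpy as np

# A small deterministic embedding table standing in for the TF params.
embeddings = np.arange(20, dtype=float).reshape(5, 4)

# Several IDs in one row of sp_ids: the "mean" combiner reduces them
# all to a single dense vector for that row.
combined = embeddings[[0, 2, 4]].mean(axis=0)

# Exactly one ID per row (N x 1 sp_ids): the "mean" of a single vector
# is the vector itself, i.e. the combiner degenerates to a plain lookup.
single = embeddings[[3]].mean(axis=0)
```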
No confusion; just an inability on my part to effectively communicate what I want through words alone. Indeed, the embedding lookup takes a (sparse) sequence of word IDs and produces one (dense) embedded vector per sequence. What I would like instead is one embedded vector per ID, i.e. multiple embedded vectors per sequence. I want the sparse representation not for efficiency, but to support variable-length sequences of embedded vectors. As in the paper, these are then convolved and max-pooled. Essentially, the "combination" I want is a 1-D conv and pool instead of a "mean".
– Andy Carlson
Nov 7 at 21:18
Why not just call the function multiple times, where for each call sp_ids is an Nx1 sparse tensor of the IDs? Essentially, change the one big Kx<variable length> tensor used for sp_ids into K distinct <variable length>x1 tensors, which serve as the lookup IDs for the tensors returned by each call.
– ely
Nov 7 at 22:04
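The splitting suggested in this comment can be sketched as follows (NumPy fancy indexing standing in for the per-sequence lookup call; batch contents are made up):

```python
import numpy as np

# Hypothetical embedding table: 8 words, 3-dim embeddings.
embeddings = np.random.default_rng(1).normal(size=(8, 3))

# One big K x <variable length> batch of ID sequences...
batch = [[5, 1], [2, 6, 0]]

# ...handled as K separate lookups, one per sequence, so each result
# keeps one embedding row per ID and its own natural length.
per_sequence = [embeddings[np.asarray(ids)] for ids in batch]
```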
answered Nov 7 at 20:56
ely