Confusion about input shape for Keras Embedding layer




















I'm trying to use the Keras embedding layer to create my own CBoW implementation to see how it works.



Each output is a one-hot vector, with length equal to my vocabulary size, for the context word I'm trying to predict. Each input consists of the X words nearby that context word, each represented by its one-hot encoded vector.



So for example if my sentence is:




"I ran over the fence to find my dog"




using window size 2, I could generate the following input/output:



[[over, the, to, find], fence] where 'fence' is my context word, 'over', 'the', 'to', 'find' are my nearby words with window 2 (2 in front, 2 in back).



Using a sample vocab size of 500 and 100 training samples, after one-hot encoding my inputs and outputs, they would have the following dimensions:



y.shape -> (100,500)
X.shape -> (100,4,500)


That is, I have 100 outputs, each represented by a 500-sized vector, and 100 inputs, each represented by a sequence of four 500-sized vectors.



I have a simple model defined as:



# Imports assumed for standalone Keras 2.x
from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
import keras.backend as K

model = Sequential()
model.add(Embedding(input_dim=vocabulary_size, output_dim=embedding_size, input_length=2 * window_size))
# Average the context-word embeddings at the hidden layer
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(embedding_size,)))
model.add(Dense(vocabulary_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')


However, when I try to fit my model, I get a dimension error:



model.fit(X, y, batch_size=10, epochs=2, verbose=1)
ValueError: Error when checking input: expected embedding_6_input to have 2 dimensions, but got array with shape (100, 4, 500)


Now, I can only assume I'm using the embedding layer wrongly. I've read both this CrossValidated Question and the Keras documentation.



I'm still not sure exactly how the inputs of this embedding layer work. I'm fairly certain my input_dim and output_dim are correct, which leaves input_length. According to the CrossValidated question, my input_length is the length of my sequence. According to the Keras documentation, my input should have shape (batch_size, input_length).



If each input is 4 words, each represented by a one-hot vector of size vocab_size, how do I feed it to the model?










  • If the answer resolved your issue, kindly accept it by clicking on the checkmark next to the answer to mark it as "answered" - see What should I do when someone answers my question?

    – today
    Dec 6 '18 at 12:58


















python machine-learning keras word2vec word-embedding






edited Nov 28 '18 at 12:16 by today

asked Nov 25 '18 at 7:22 by Kevin













1 Answer
The problem is that you are thinking about the embedding layer in the wrong way. An Embedding layer is just a trainable look-up table: you give it an integer, which is the index of a word in the vocabulary, and it returns the word vector (i.e. word embedding) stored at that index. Therefore, its input must be the indices of the words in a sentence, not their one-hot vectors.
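
To make the look-up-table idea concrete, here is a minimal NumPy sketch. It is not how Keras implements the layer internally; the random `embedding_matrix` merely stands in for the layer's trainable weights, and `embedding_size = 8` is an arbitrary value chosen for readability. It shows that selecting rows by index gives the same result as multiplying one-hot vectors by the weight matrix, which is why the one-hot step is unnecessary.

```python
import numpy as np

vocabulary_size = 500   # from the question
embedding_size = 8      # hypothetical, kept small for readability

rng = np.random.default_rng(0)
# Stand-in for the layer's trainable weights: one row per vocabulary word.
embedding_matrix = rng.normal(size=(vocabulary_size, embedding_size))

# Example word indices for "over", "the", "to", "find".
context_indices = np.array([43, 6, 9, 33])

# Look-up: pick one row per index -> shape (4, embedding_size).
looked_up = embedding_matrix[context_indices]

# The same vectors obtained the long way, via one-hot rows and a matrix product.
one_hot = np.eye(vocabulary_size)[context_indices]   # shape (4, 500)
via_matmul = one_hot @ embedding_matrix

print(looked_up.shape)                      # (4, 8)
print(np.allclose(looked_up, via_matmul))   # True
```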



As an example, if the indices of the words "over", "the", "to" and "find" are 43, 6, 9 and 33 respectively, then the input of the Embedding layer would be an array of those indices, i.e. [43, 6, 9, 33]. Therefore, the training data must have a shape of (num_samples, num_words_in_a_sentence); in your case, that is (100, 4). In other words, you don't need to one-hot encode the words for the input data. You can also use word indices as the labels if you use sparse_categorical_crossentropy as the loss function instead, in which case y would have shape (100,).
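
Putting this together, the following is a rough sketch of how the model from the question could be fed integer indices. Several things here are assumptions rather than part of the original post: it targets TensorFlow 2.x with its bundled Keras, the data is random placeholder data, `embedding_size = 50` is an arbitrary choice, `GlobalAveragePooling1D` is swapped in for the `Lambda` mean over axis 1, and the sequence length is declared with `keras.Input` rather than `input_length`. If the one-hot arrays already exist, `argmax` over the last axis recovers the indices.

```python
import numpy as np
from tensorflow import keras

vocabulary_size = 500   # from the question
window_size = 2         # from the question
num_samples = 100       # from the question
embedding_size = 50     # hypothetical value

# Placeholder data standing in for the one-hot arrays described in the question.
rng = np.random.default_rng(0)
X_onehot = np.eye(vocabulary_size)[
    rng.integers(0, vocabulary_size, size=(num_samples, 2 * window_size))
]                                                     # shape (100, 4, 500)
y_onehot = np.eye(vocabulary_size)[
    rng.integers(0, vocabulary_size, size=num_samples)
]                                                     # shape (100, 500)

# Recover integer word indices: the position of the 1 in each one-hot vector.
X = X_onehot.argmax(axis=-1).astype("int32")   # shape (100, 4)
y = y_onehot.argmax(axis=-1).astype("int32")   # shape (100,)

model = keras.Sequential([
    keras.Input(shape=(2 * window_size,), dtype="int32"),
    keras.layers.Embedding(input_dim=vocabulary_size, output_dim=embedding_size),
    # Average the context-word embeddings (mean over the sequence axis).
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(vocabulary_size, activation="softmax"),
])
# Sparse loss: labels are integer indices, not one-hot vectors.
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=10, epochs=2, verbose=0)

probs = model.predict(X[:3], verbose=0)
print(X.shape, y.shape, probs.shape)   # (100, 4) (100,) (3, 500)
```

To keep the one-hot labels instead, y_onehot can be passed as the target with categorical_crossentropy as the loss; only the input needs to be integer indices.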






edited Nov 28 '18 at 12:19

answered Nov 28 '18 at 12:11 by today































