How to get a unique-value matrix corresponding to another matrix?












I am a Python beginner working on an NLP project.



This is my code:



doc1 = "I am a dog and I like biscut"
doc2 = "I am a cat"
doc3 = "I like to drink milk"
doc4 = "I am a bird and I fly to the sky"
doc5 = "I am an elephant and I want to sleep"
docs = [doc1.split(' '), doc2.split(' '), doc3.split(' ')]
docs2 = [doc4.split(' '), doc5.split(' ')]
docs_all = (doc1.split(' ') + doc2.split(' ') + doc3.split(' ') +
            doc4.split(' ') + doc5.split(' '))


Enumerating the set of docs_all gives:



print(list(enumerate(set(docs_all))))
[(0, 'a'), (1, 'the'), (2, 'drink'), (3, 'elephant'), (4, 'dog'), (5,
'biscut'), (6, 'cat'), (7, 'bird'), (8, 'an'), (9, 'milk'), (10, 'want'),
(11, 'am'), (12, 'I'), (13, 'and'), (14, 'to'), (15, 'sky'), (16, 'sleep'),
(17, 'like'), (18, 'fly')]
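A side note on the indices above: the iteration order of a Python set of strings is not guaranteed to be the same from one run to the next, so the numbers printed here may not be reproducible. A minimal sketch of one way to get a stable ordering, assuming first-seen order is acceptable (the names docs_text, vocab, and word_to_index are illustrative and not part of the original code):

```python
# Same five documents as in the question.
docs_text = [
    "I am a dog and I like biscut",
    "I am a cat",
    "I like to drink milk",
    "I am a bird and I fly to the sky",
    "I am an elephant and I want to sleep",
]
docs_all = [word for text in docs_text for word in text.split(' ')]

# dict.fromkeys keeps only the first occurrence of each key, in insertion
# order (guaranteed for dict since Python 3.7), so the result is reproducible.
vocab = list(dict.fromkeys(docs_all))

# Word -> column index lookup, built once instead of scanning the list.
word_to_index = {word: i for i, word in enumerate(vocab)}

print(list(enumerate(vocab)))
```

With this ordering the indices differ from the ones printed above, so any values keyed by index (such as setdocs) would have to be built against the same vocabulary list.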


The reference values for docs and docs2 are:



setdocs = [(0, 0.44), (1, 0.14), (2, 0.22), (3, 0.113), (4, 0.44), (5, 
0.15), (6, 0.96), (7, 0.77), (8, 0.28), (9, 0.39), (10, 0.111)]
setdocs2 = [(0, 0.55), (1, 0.13), (2, 0.52), (3, 0.33), (4, 0.114),
(5,0.995),(6, 0.16), (7, 0.97), (8, 0.118), (9, 0.14), (10, 0.88), (11,
0.166), (12, 0.85)]


The first value in each tuple is the index of a word in docs or docs2, taken from:



refdocs2 = list(enumerate(set(doc4.split(' ') + doc5.split(' '))))
refdocs = list(enumerate(set(doc1.split(' ') + doc2.split(' ') +
doc3.split(' '))))
print(refdocs)
print(refdocs2)
[(0, 'a'), (1, 'drink'), (2, 'dog'), (3, 'biscut'), (4, 'cat'), (5, 'milk'),
(6, 'am'), (7, 'I'), (8, 'and'), (9, 'to'), (10, 'like')]
[(0, 'a'), (1, 'the'), (2, 'elephant'), (3, 'bird'), (4, 'an'), (5, 'want'),
(6, 'sleep'), (7, 'am'), (8, 'I'), (9, 'and'), (10, 'to'), (11, 'sky'), (12,
'fly')]


I want to get matrices like these:



finaldocs = [[0.44, 0, 0, 0, 0.22, 0.113, 0, 0, 0, 0, 0, 0.96, 0.77, 0.28, 
0, 0, 0, 0.111, 0],
[0.44, 0, 0, 0, 0, 0, 0.44, 0, 0, 0, 0, 0.96, 0.77, 0, 0, 0, 0,
0, 0],
[0, 0, 0.14, 0, 0, 0, 0, 0, 0, 0.15, 0, 0, 0.77, 0, 0.39,
0, 0, 0.111, 0]]
finaldocs2 = [[0.55, 0.13, 0, 0, 0, 0, 0, 0.33, 0, 0, 0.97, 0.118, 0.14,
0.88, 0.166, 0, 0, 0.85],[0, 0.13, 0, 0.52, 0, 0, 0, 0, 0.114,
0.995, 0.97, 0.118, 0.14, 0.88, 0, 0.16, 0, 0]]


The second value in each tuple of setdocs and setdocs2 is the value I want to extract.
finaldocs[0] to finaldocs[2] correspond to doc1 to doc3. Each row holds the second value of the matching tuple in setdocs, placed at the word's position in list(enumerate(set(docs_all))).



For example, the words of doc2 = "I am a cat" are at positions 0, 6, 11, and 12 in list(enumerate(set(docs_all))). The same words "I", "am", "a", and "cat" are at positions 0, 4, 6, and 7 in refdocs, so the second values of those tuples in setdocs are used to create finaldocs[1]:



[0.44, 0, 0, 0, 0, 0, 0.44, 0, 0, 0, 0, 0.96, 0.77, 0, 0, 0, 0, 0, 0]


My attempt:



dll = [np.arange(19), np.arange(19), np.arange(19)]
for i in dll:
    for ii in i:
        for m in list(enumerate(set(docs_all))):
            for mm, nn in m:
                for t in refdocs:
                    for tt, ll in t:
                        for p in setdocs:
                            for pp, oo in p:
                                if nn in ll:
                                    i.replace(i, oo)


It fails.

How do I get finaldocs by coding in Python?
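The mapping described above can be sketched with two dictionary lookups: one from each word to its value (via the word's index in refdocs), and one from each word to its column in the full vocabulary. This is only a sketch under stated assumptions: the word orderings are hard-coded from the printed output because set order is not stable between runs, and build_rows, vocab_all, and vocab_docs are hypothetical names introduced for illustration.

```python
# Vocabulary orderings copied from the printed output in the question.
# Hard-coding them is an assumption of this sketch: set() order can vary.
vocab_all = ['a', 'the', 'drink', 'elephant', 'dog', 'biscut', 'cat', 'bird',
             'an', 'milk', 'want', 'am', 'I', 'and', 'to', 'sky', 'sleep',
             'like', 'fly']
vocab_docs = ['a', 'drink', 'dog', 'biscut', 'cat', 'milk', 'am', 'I', 'and',
              'to', 'like']
setdocs = [(0, 0.44), (1, 0.14), (2, 0.22), (3, 0.113), (4, 0.44), (5, 0.15),
           (6, 0.96), (7, 0.77), (8, 0.28), (9, 0.39), (10, 0.111)]

docs = ["I am a dog and I like biscut".split(' '),
        "I am a cat".split(' '),
        "I like to drink milk".split(' ')]


def build_rows(docs, vocab_local, weights, vocab_global):
    """Hypothetical helper: one row per document, one column per global word."""
    # word -> value, found through the word's index in the local vocabulary
    word_weight = {vocab_local[idx]: value for idx, value in weights}
    # word -> column position in the global vocabulary
    column = {word: j for j, word in enumerate(vocab_global)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab_global)  # words absent from the document stay 0
        for word in doc:
            row[column[word]] = word_weight[word]
        rows.append(row)
    return rows


finaldocs = build_rows(docs, vocab_docs, setdocs, vocab_all)
print(finaldocs[1])
# [0.44, 0, 0, 0, 0, 0, 0.44, 0, 0, 0, 0, 0.96, 0.77, 0, 0, 0, 0, 0, 0]
```

The same helper could presumably be called with the docs2 word list and setdocs2 to build finaldocs2. If a NumPy array is needed, np.array(finaldocs) converts the nested list.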










Tags: python-3.x numpy matrix replace nlp

asked Nov 21 '18 at 5:16 by wayne64001
edited Nov 21 '18 at 5:48 by wayne64001























