Mapping a dataframe (with NA) to an n by n adjacency matrix (as a data.frame object)











up vote
0
down vote

favorite












I have a three-column dataframe object recording the bilateral trade data between 161 countries, the data are of dyadic format containing 19687 rows, three columns (reporter (rid), partner (pid), and their bilateral trade flow (TradeValue) in a given year). rid or pid takes a value from 1 to 161, and a country is assigned the same rid and pid. For any given pair of (rid, pid) in which rid =/= pid, TradeValue(rid, pid) = TradeValue(pid, rid).



The data (run in R) look like this:






#load the data from dropbox folder
library(foreign)
example_data <- read.csv("https://www.dropbox.com/s/hf0ga22tdjlvdvr/example_data.csv?dl=1")

head(example_data, n = 10)
rid pid TradeValue
1 2 3 500
2 2 7 2328
3 2 8 2233465
4 2 9 81470
5 2 12 572893
6 2 17 488374
7 2 19 3314932
8 2 23 20323
9 2 25 10
10 2 29 9026220





The data were sourced from UN Comtrade database, each rid is paired with multiple pid to get their bilateral trade data, but as can be seen, not every pid has a numeric id value because I only assigned a rid or pid to a country if a list of relevant economic indicators of that country are available, which is why there are NA in the data despite TradeValue exists between that country and the reporting country (rid). The same applies when a country become a "reporter," in that situation, that country did not report any TradeValue with partners, and its id number is absent from the rid column. (Hence, you can see rid column begins with 2, because country 1 (i.e., Afghanistan) did not report any bilateral trade data with partners). A quick check with summary statistics helps confirm this






length(unique(example_data$rid))
[1] 139
# only 139 countries reported bilateral trade statistics with partners
length(unique(example_data$pid))
[1] 162
# that extra pid is NA (161 + NA = 162)





Since most countries report bilateral trade data with partners and for those who don't, they tend to be small economies. Hence, I want to preserve the complete list of 161 countries and transform this example_data dataframe into a 161 x 161 adjacency matrix in which




  1. for those countries that are absent from the rid column (e.g., rid == 1), create each of them a row and set the entire row (in the 161 x 161 matrix) to 0.

  2. for those countries (pid) that do not share TradeValue entries with a particular rid, set those cells to 0.


For example, suppose in a 5 x 5 adjacency matrix, country 1 did not report any trade statistics with partners, the other four reported their bilateral trade statistics with other (except country 1). The original dataframe is like






rid	pid	TradeValue
2 3 223
2 4 13
2 5 9
3 2 223
3 4 57
3 5 28
4 2 13
4 3 57
4 5 82
5 2 9
5 3 28
5 4 82





from which I want to convert it to a 5 x 5 adjacency matrix (of data.frame format), the desired output should look like this






	V1	V2	V3	V4	V5
1 0 0 0 0 0
2 0 0 223 13 9
3 0 223 0 57 28
4 0 13 57 0 82
5 0 9 28 82 0





And using the same method on the example_data to create a 161 x 161 adjacency matrix. However, after a couple trial and error with reshape and other methods, I still could not get around with such conversion, not even beyond the first step.



It will be really appreciated if anyone could enlighten me on this?










share|improve this question




























    up vote
    0
    down vote

    favorite












    I have a three-column dataframe object recording the bilateral trade data between 161 countries, the data are of dyadic format containing 19687 rows, three columns (reporter (rid), partner (pid), and their bilateral trade flow (TradeValue) in a given year). rid or pid takes a value from 1 to 161, and a country is assigned the same rid and pid. For any given pair of (rid, pid) in which rid =/= pid, TradeValue(rid, pid) = TradeValue(pid, rid).



    The data (run in R) look like this:






    #load the data from dropbox folder
    library(foreign)
    example_data <- read.csv("https://www.dropbox.com/s/hf0ga22tdjlvdvr/example_data.csv?dl=1")

    head(example_data, n = 10)
    rid pid TradeValue
    1 2 3 500
    2 2 7 2328
    3 2 8 2233465
    4 2 9 81470
    5 2 12 572893
    6 2 17 488374
    7 2 19 3314932
    8 2 23 20323
    9 2 25 10
    10 2 29 9026220





    The data were sourced from UN Comtrade database, each rid is paired with multiple pid to get their bilateral trade data, but as can be seen, not every pid has a numeric id value because I only assigned a rid or pid to a country if a list of relevant economic indicators of that country are available, which is why there are NA in the data despite TradeValue exists between that country and the reporting country (rid). The same applies when a country become a "reporter," in that situation, that country did not report any TradeValue with partners, and its id number is absent from the rid column. (Hence, you can see rid column begins with 2, because country 1 (i.e., Afghanistan) did not report any bilateral trade data with partners). A quick check with summary statistics helps confirm this






    length(unique(example_data$rid))
    [1] 139
    # only 139 countries reported bilateral trade statistics with partners
    length(unique(example_data$pid))
    [1] 162
    # that extra pid is NA (161 + NA = 162)





    Since most countries report bilateral trade data with partners and for those who don't, they tend to be small economies. Hence, I want to preserve the complete list of 161 countries and transform this example_data dataframe into a 161 x 161 adjacency matrix in which




    1. for those countries that are absent from the rid column (e.g., rid == 1), create each of them a row and set the entire row (in the 161 x 161 matrix) to 0.

    2. for those countries (pid) that do not share TradeValue entries with a particular rid, set those cells to 0.


    For example, suppose in a 5 x 5 adjacency matrix, country 1 did not report any trade statistics with partners, the other four reported their bilateral trade statistics with other (except country 1). The original dataframe is like






    rid	pid	TradeValue
    2 3 223
    2 4 13
    2 5 9
    3 2 223
    3 4 57
    3 5 28
    4 2 13
    4 3 57
    4 5 82
    5 2 9
    5 3 28
    5 4 82





    from which I want to convert it to a 5 x 5 adjacency matrix (of data.frame format), the desired output should look like this






    	V1	V2	V3	V4	V5
    1 0 0 0 0 0
    2 0 0 223 13 9
    3 0 223 0 57 28
    4 0 13 57 0 82
    5 0 9 28 82 0





    And using the same method on the example_data to create a 161 x 161 adjacency matrix. However, after a couple trial and error with reshape and other methods, I still could not get around with such conversion, not even beyond the first step.



    It will be really appreciated if anyone could enlighten me on this?










    share|improve this question


























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      I have a three-column dataframe object recording the bilateral trade data between 161 countries, the data are of dyadic format containing 19687 rows, three columns (reporter (rid), partner (pid), and their bilateral trade flow (TradeValue) in a given year). rid or pid takes a value from 1 to 161, and a country is assigned the same rid and pid. For any given pair of (rid, pid) in which rid =/= pid, TradeValue(rid, pid) = TradeValue(pid, rid).



      The data (run in R) look like this:






      #load the data from dropbox folder
      library(foreign)
      example_data <- read.csv("https://www.dropbox.com/s/hf0ga22tdjlvdvr/example_data.csv?dl=1")

      head(example_data, n = 10)
      rid pid TradeValue
      1 2 3 500
      2 2 7 2328
      3 2 8 2233465
      4 2 9 81470
      5 2 12 572893
      6 2 17 488374
      7 2 19 3314932
      8 2 23 20323
      9 2 25 10
      10 2 29 9026220





      The data were sourced from UN Comtrade database, each rid is paired with multiple pid to get their bilateral trade data, but as can be seen, not every pid has a numeric id value because I only assigned a rid or pid to a country if a list of relevant economic indicators of that country are available, which is why there are NA in the data despite TradeValue exists between that country and the reporting country (rid). The same applies when a country become a "reporter," in that situation, that country did not report any TradeValue with partners, and its id number is absent from the rid column. (Hence, you can see rid column begins with 2, because country 1 (i.e., Afghanistan) did not report any bilateral trade data with partners). A quick check with summary statistics helps confirm this






      length(unique(example_data$rid))
      [1] 139
      # only 139 countries reported bilateral trade statistics with partners
      length(unique(example_data$pid))
      [1] 162
      # that extra pid is NA (161 + NA = 162)





      Since most countries report bilateral trade data with partners and for those who don't, they tend to be small economies. Hence, I want to preserve the complete list of 161 countries and transform this example_data dataframe into a 161 x 161 adjacency matrix in which




      1. for those countries that are absent from the rid column (e.g., rid == 1), create each of them a row and set the entire row (in the 161 x 161 matrix) to 0.

      2. for those countries (pid) that do not share TradeValue entries with a particular rid, set those cells to 0.


      For example, suppose in a 5 x 5 adjacency matrix, country 1 did not report any trade statistics with partners, the other four reported their bilateral trade statistics with other (except country 1). The original dataframe is like






      rid	pid	TradeValue
      2 3 223
      2 4 13
      2 5 9
      3 2 223
      3 4 57
      3 5 28
      4 2 13
      4 3 57
      4 5 82
      5 2 9
      5 3 28
      5 4 82





      from which I want to convert it to a 5 x 5 adjacency matrix (of data.frame format), the desired output should look like this






      	V1	V2	V3	V4	V5
      1 0 0 0 0 0
      2 0 0 223 13 9
      3 0 223 0 57 28
      4 0 13 57 0 82
      5 0 9 28 82 0





      And using the same method on the example_data to create a 161 x 161 adjacency matrix. However, after a couple trial and error with reshape and other methods, I still could not get around with such conversion, not even beyond the first step.



      It will be really appreciated if anyone could enlighten me on this?










      share|improve this question















      I have a three-column dataframe object recording the bilateral trade data between 161 countries, the data are of dyadic format containing 19687 rows, three columns (reporter (rid), partner (pid), and their bilateral trade flow (TradeValue) in a given year). rid or pid takes a value from 1 to 161, and a country is assigned the same rid and pid. For any given pair of (rid, pid) in which rid =/= pid, TradeValue(rid, pid) = TradeValue(pid, rid).



      The data (run in R) look like this:






      #load the data from dropbox folder
      library(foreign)
      example_data <- read.csv("https://www.dropbox.com/s/hf0ga22tdjlvdvr/example_data.csv?dl=1")

      head(example_data, n = 10)
      rid pid TradeValue
      1 2 3 500
      2 2 7 2328
      3 2 8 2233465
      4 2 9 81470
      5 2 12 572893
      6 2 17 488374
      7 2 19 3314932
      8 2 23 20323
      9 2 25 10
      10 2 29 9026220





      The data were sourced from UN Comtrade database, each rid is paired with multiple pid to get their bilateral trade data, but as can be seen, not every pid has a numeric id value because I only assigned a rid or pid to a country if a list of relevant economic indicators of that country are available, which is why there are NA in the data despite TradeValue exists between that country and the reporting country (rid). The same applies when a country become a "reporter," in that situation, that country did not report any TradeValue with partners, and its id number is absent from the rid column. (Hence, you can see rid column begins with 2, because country 1 (i.e., Afghanistan) did not report any bilateral trade data with partners). A quick check with summary statistics helps confirm this






      length(unique(example_data$rid))
      [1] 139
      # only 139 countries reported bilateral trade statistics with partners
      length(unique(example_data$pid))
      [1] 162
      # that extra pid is NA (161 + NA = 162)





      Since most countries report bilateral trade data with partners and for those who don't, they tend to be small economies. Hence, I want to preserve the complete list of 161 countries and transform this example_data dataframe into a 161 x 161 adjacency matrix in which




      1. for those countries that are absent from the rid column (e.g., rid == 1), create each of them a row and set the entire row (in the 161 x 161 matrix) to 0.

      2. for those countries (pid) that do not share TradeValue entries with a particular rid, set those cells to 0.


      For example, suppose in a 5 x 5 adjacency matrix, country 1 did not report any trade statistics with partners, the other four reported their bilateral trade statistics with other (except country 1). The original dataframe is like






      rid	pid	TradeValue
      2 3 223
      2 4 13
      2 5 9
      3 2 223
      3 4 57
      3 5 28
      4 2 13
      4 3 57
      4 5 82
      5 2 9
      5 3 28
      5 4 82





      from which I want to convert it to a 5 x 5 adjacency matrix (of data.frame format), the desired output should look like this






      	V1	V2	V3	V4	V5
      1 0 0 0 0 0
      2 0 0 223 13 9
      3 0 223 0 57 28
      4 0 13 57 0 82
      5 0 9 28 82 0





      And using the same method on the example_data to create a 161 x 161 adjacency matrix. However, after a couple trial and error with reshape and other methods, I still could not get around with such conversion, not even beyond the first step.



      It will be really appreciated if anyone could enlighten me on this?






      #load the data from dropbox folder
      library(foreign)
      example_data <- read.csv("https://www.dropbox.com/s/hf0ga22tdjlvdvr/example_data.csv?dl=1")

      head(example_data, n = 10)
      rid pid TradeValue
      1 2 3 500
      2 2 7 2328
      3 2 8 2233465
      4 2 9 81470
      5 2 12 572893
      6 2 17 488374
      7 2 19 3314932
      8 2 23 20323
      9 2 25 10
      10 2 29 9026220





      #load the data from dropbox folder
      library(foreign)
      example_data <- read.csv("https://www.dropbox.com/s/hf0ga22tdjlvdvr/example_data.csv?dl=1")

      head(example_data, n = 10)
      rid pid TradeValue
      1 2 3 500
      2 2 7 2328
      3 2 8 2233465
      4 2 9 81470
      5 2 12 572893
      6 2 17 488374
      7 2 19 3314932
      8 2 23 20323
      9 2 25 10
      10 2 29 9026220





      length(unique(example_data$rid))
      [1] 139
      # only 139 countries reported bilateral trade statistics with partners
      length(unique(example_data$pid))
      [1] 162
      # that extra pid is NA (161 + NA = 162)





      length(unique(example_data$rid))
      [1] 139
      # only 139 countries reported bilateral trade statistics with partners
      length(unique(example_data$pid))
      [1] 162
      # that extra pid is NA (161 + NA = 162)





      rid	pid	TradeValue
      2 3 223
      2 4 13
      2 5 9
      3 2 223
      3 4 57
      3 5 28
      4 2 13
      4 3 57
      4 5 82
      5 2 9
      5 3 28
      5 4 82





      rid	pid	TradeValue
      2 3 223
      2 4 13
      2 5 9
      3 2 223
      3 4 57
      3 5 28
      4 2 13
      4 3 57
      4 5 82
      5 2 9
      5 3 28
      5 4 82





      	V1	V2	V3	V4	V5
      1 0 0 0 0 0
      2 0 0 223 13 9
      3 0 223 0 57 28
      4 0 13 57 0 82
      5 0 9 28 82 0





      	V1	V2	V3	V4	V5
      1 0 0 0 0 0
      2 0 0 223 13 9
      3 0 223 0 57 28
      4 0 13 57 0 82
      5 0 9 28 82 0






      r dataframe reshape adjacency-matrix






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 8 at 11:32

























      asked Nov 7 at 14:17









      Chris T.

      347316




      347316
























          1 Answer
          1






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          I cannot read the dropbox file but have tried to work off of your 5-country example dataframe -



          country_num = 5

          # check countries missing in rid and pid
          rid_miss = setdiff(1:country_num, example_data$rid)
          pid_miss = ifelse(length(setdiff(1:country_num, example_data$pid) == 0),
          1, setdiff(1:country_num, example_data$pid))

          # create dummy dataframe with missing rid and pid
          add_data = as.data.frame(do.call(cbind, list(rid_miss, pid_miss, NA)))
          colnames(add_data) = colnames(example_data)

          # add dummy dataframe to original
          example_data = rbind(example_data, add_data)

          # the dcast now takes missing rid and pid into account
          mat = dcast(example_data, rid ~ pid, value.var = "TradeValue")

          # can remove first column without setting colnames but this is more failproof
          rownames(mat) = mat[, 1]
          mat = as.matrix(mat[, -1])

          # fill in upper triangular matrix with missing values of lower triangular matrix
          # and vice-versa since TradeValue(rid, pid) = TradeValue(pid, rid)
          mat[is.na(mat)] = t(mat)[is.na(mat)]

          # change NAs to 0 according to preference - would keep as NA to differentiate
          # from actual zeros
          mat[is.na(mat)] = 0


          Does this help?






          share|improve this answer























          • Just a quick touch base. Thanks so much for your response, I will look at your code after get back from work and Dropbox folder link has been updated.
            – Chris T.
            Nov 7 at 16:46










          • I applied your data to my data, but it didn't work, because add_data is of different length than example_data, the former does not have the third column (TradeValue).
            – Chris T.
            Nov 8 at 2:15










          • It's because in my example_data, the pid contains complete list of all country id, if you check that with pid_miss = setdiff(1:country_num, example_data$pid), length(pid_miss) is 0. Putting pid_miss into list(rid_miss, pid_miss, NA) will create an add_data object of only 2 columns, which is not of the same length as example_data.
            – Chris T.
            Nov 8 at 2:32










          • I know it is a big dataset but can you dput it? I am still unable to access the dropbox file, and I need it to understand the data better. Your 5 x 5 example had 1 missing from both rid and pid and so I assumed that was your dataset was like.
            – phil_t
            Nov 8 at 13:46










          • Thanks again for your response, but the dropbox link should work by now. The data contain 15602 rows, so it's hard to dput here.
            – Chris T.
            Nov 8 at 13:49













          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














           

          draft saved


          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53191265%2fmapping-a-dataframe-with-na-to-an-n-by-n-adjacency-matrix-as-a-data-frame-obj%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          1
          down vote



          accepted










          I cannot read the dropbox file but have tried to work off of your 5-country example dataframe -



          country_num = 5

          # check countries missing in rid and pid
          rid_miss = setdiff(1:country_num, example_data$rid)
          pid_miss = ifelse(length(setdiff(1:country_num, example_data$pid) == 0),
          1, setdiff(1:country_num, example_data$pid))

          # create dummy dataframe with missing rid and pid
          add_data = as.data.frame(do.call(cbind, list(rid_miss, pid_miss, NA)))
          colnames(add_data) = colnames(example_data)

          # add dummy dataframe to original
          example_data = rbind(example_data, add_data)

          # the dcast now takes missing rid and pid into account
          mat = dcast(example_data, rid ~ pid, value.var = "TradeValue")

          # can remove first column without setting colnames but this is more failproof
          rownames(mat) = mat[, 1]
          mat = as.matrix(mat[, -1])

          # fill in upper triangular matrix with missing values of lower triangular matrix
          # and vice-versa since TradeValue(rid, pid) = TradeValue(pid, rid)
          mat[is.na(mat)] = t(mat)[is.na(mat)]

          # change NAs to 0 according to preference - would keep as NA to differentiate
          # from actual zeros
          mat[is.na(mat)] = 0


          Does this help?






          share|improve this answer























          • Just a quick touch base. Thanks so much for your response, I will look at your code after get back from work and Dropbox folder link has been updated.
            – Chris T.
            Nov 7 at 16:46










          • I applied your data to my data, but it didn't work, because add_data is of different length than example_data, the former does not have the third column (TradeValue).
            – Chris T.
            Nov 8 at 2:15










          • It's because in my example_data, the pid contains complete list of all country id, if you check that with pid_miss = setdiff(1:country_num, example_data$pid), length(pid_miss) is 0. Putting pid_miss into list(rid_miss, pid_miss, NA) will create an add_data object of only 2 columns, which is not of the same length as example_data.
            – Chris T.
            Nov 8 at 2:32










          • I know it is a big dataset but can you dput it? I am still unable to access the dropbox file, and I need it to understand the data better. Your 5 x 5 example had 1 missing from both rid and pid and so I assumed that was your dataset was like.
            – phil_t
            Nov 8 at 13:46










          • Thanks again for your response, but the dropbox link should work by now. The data contain 15602 rows, so it's hard to dput here.
            – Chris T.
            Nov 8 at 13:49

















          up vote
          1
          down vote



          accepted










          I cannot read the dropbox file but have tried to work off of your 5-country example dataframe -



          country_num = 5

          # check countries missing in rid and pid
          rid_miss = setdiff(1:country_num, example_data$rid)
          pid_miss = ifelse(length(setdiff(1:country_num, example_data$pid) == 0),
          1, setdiff(1:country_num, example_data$pid))

          # create dummy dataframe with missing rid and pid
          add_data = as.data.frame(do.call(cbind, list(rid_miss, pid_miss, NA)))
          colnames(add_data) = colnames(example_data)

          # add dummy dataframe to original
          example_data = rbind(example_data, add_data)

          # the dcast now takes missing rid and pid into account
          mat = dcast(example_data, rid ~ pid, value.var = "TradeValue")

          # can remove first column without setting colnames but this is more failproof
          rownames(mat) = mat[, 1]
          mat = as.matrix(mat[, -1])

          # fill in upper triangular matrix with missing values of lower triangular matrix
          # and vice-versa since TradeValue(rid, pid) = TradeValue(pid, rid)
          mat[is.na(mat)] = t(mat)[is.na(mat)]

          # change NAs to 0 according to preference - would keep as NA to differentiate
          # from actual zeros
          mat[is.na(mat)] = 0


          Does this help?






          share|improve this answer























          • Just a quick touch base. Thanks so much for your response, I will look at your code after get back from work and Dropbox folder link has been updated.
            – Chris T.
            Nov 7 at 16:46










          • I applied your data to my data, but it didn't work, because add_data is of different length than example_data, the former does not have the third column (TradeValue).
            – Chris T.
            Nov 8 at 2:15










          • It's because in my example_data, the pid contains complete list of all country id, if you check that with pid_miss = setdiff(1:country_num, example_data$pid), length(pid_miss) is 0. Putting pid_miss into list(rid_miss, pid_miss, NA) will create an add_data object of only 2 columns, which is not of the same length as example_data.
            – Chris T.
            Nov 8 at 2:32










          • I know it is a big dataset but can you dput it? I am still unable to access the dropbox file, and I need it to understand the data better. Your 5 x 5 example had 1 missing from both rid and pid and so I assumed that was your dataset was like.
            – phil_t
            Nov 8 at 13:46










          • Thanks again for your response, but the dropbox link should work by now. The data contain 15602 rows, so it's hard to dput here.
            – Chris T.
            Nov 8 at 13:49















          up vote
          1
          down vote



          accepted







          up vote
          1
          down vote



          accepted






          I cannot read the dropbox file but have tried to work off of your 5-country example dataframe -



          country_num = 5

          # check countries missing in rid and pid
          rid_miss = setdiff(1:country_num, example_data$rid)
          pid_miss = ifelse(length(setdiff(1:country_num, example_data$pid) == 0),
          1, setdiff(1:country_num, example_data$pid))

          # create dummy dataframe with missing rid and pid
          add_data = as.data.frame(do.call(cbind, list(rid_miss, pid_miss, NA)))
          colnames(add_data) = colnames(example_data)

          # add dummy dataframe to original
          example_data = rbind(example_data, add_data)

          # the dcast now takes missing rid and pid into account
          mat = dcast(example_data, rid ~ pid, value.var = "TradeValue")

          # can remove first column without setting colnames but this is more failproof
          rownames(mat) = mat[, 1]
          mat = as.matrix(mat[, -1])

          # fill in upper triangular matrix with missing values of lower triangular matrix
          # and vice-versa since TradeValue(rid, pid) = TradeValue(pid, rid)
          mat[is.na(mat)] = t(mat)[is.na(mat)]

          # change NAs to 0 according to preference - would keep as NA to differentiate
          # from actual zeros
          mat[is.na(mat)] = 0


          Does this help?






          share|improve this answer














          I cannot read the dropbox file but have tried to work off of your 5-country example dataframe -



          country_num = 5

          # check countries missing in rid and pid
          rid_miss = setdiff(1:country_num, example_data$rid)
          pid_miss = ifelse(length(setdiff(1:country_num, example_data$pid) == 0),
          1, setdiff(1:country_num, example_data$pid))

          # create dummy dataframe with missing rid and pid
          add_data = as.data.frame(do.call(cbind, list(rid_miss, pid_miss, NA)))
          colnames(add_data) = colnames(example_data)

          # add dummy dataframe to original
          example_data = rbind(example_data, add_data)

          # the dcast now takes missing rid and pid into account
          mat = dcast(example_data, rid ~ pid, value.var = "TradeValue")

          # can remove first column without setting colnames but this is more failproof
          rownames(mat) = mat[, 1]
          mat = as.matrix(mat[, -1])

          # fill in upper triangular matrix with missing values of lower triangular matrix
          # and vice-versa since TradeValue(rid, pid) = TradeValue(pid, rid)
          mat[is.na(mat)] = t(mat)[is.na(mat)]

          # change NAs to 0 according to preference - would keep as NA to differentiate
          # from actual zeros
          mat[is.na(mat)] = 0


          Does this help?







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 8 at 14:43

























          answered Nov 7 at 15:41









          phil_t

          412210




          412210












          • Just a quick touch base. Thanks so much for your response, I will look at your code after get back from work and Dropbox folder link has been updated.
            – Chris T.
            Nov 7 at 16:46










          • I applied your data to my data, but it didn't work, because add_data is of different length than example_data, the former does not have the third column (TradeValue).
            – Chris T.
            Nov 8 at 2:15










          • It's because in my example_data, the pid contains complete list of all country id, if you check that with pid_miss = setdiff(1:country_num, example_data$pid), length(pid_miss) is 0. Putting pid_miss into list(rid_miss, pid_miss, NA) will create an add_data object of only 2 columns, which is not of the same length as example_data.
            – Chris T.
            Nov 8 at 2:32










          • I know it is a big dataset but can you dput it? I am still unable to access the dropbox file, and I need it to understand the data better. Your 5 x 5 example had 1 missing from both rid and pid and so I assumed that was your dataset was like.
            – phil_t
            Nov 8 at 13:46










          • Thanks again for your response, but the dropbox link should work by now. The data contain 15602 rows, so it's hard to dput here.
            – Chris T.
            Nov 8 at 13:49




















          • Just a quick touch base. Thanks so much for your response, I will look at your code after get back from work and Dropbox folder link has been updated.
            – Chris T.
            Nov 7 at 16:46










          • I applied your data to my data, but it didn't work, because add_data is of different length than example_data, the former does not have the third column (TradeValue).
            – Chris T.
            Nov 8 at 2:15










          • It's because in my example_data, the pid contains complete list of all country id, if you check that with pid_miss = setdiff(1:country_num, example_data$pid), length(pid_miss) is 0. Putting pid_miss into list(rid_miss, pid_miss, NA) will create an add_data object of only 2 columns, which is not of the same length as example_data.
            – Chris T.
            Nov 8 at 2:32










          • I know it is a big dataset but can you dput it? I am still unable to access the dropbox file, and I need it to understand the data better. Your 5 x 5 example had 1 missing from both rid and pid and so I assumed that was your dataset was like.
            – phil_t
            Nov 8 at 13:46










          • Thanks again for your response, but the dropbox link should work by now. The data contain 15602 rows, so it's hard to dput here.
            – Chris T.
            Nov 8 at 13:49


















          Just a quick touch base. Thanks so much for your response, I will look at your code after get back from work and Dropbox folder link has been updated.
          – Chris T.
          Nov 7 at 16:46




          Just a quick touch base. Thanks so much for your response, I will look at your code after get back from work and Dropbox folder link has been updated.
          – Chris T.
          Nov 7 at 16:46












          I applied your data to my data, but it didn't work, because add_data is of different length than example_data, the former does not have the third column (TradeValue).
          – Chris T.
          Nov 8 at 2:15




          I applied your data to my data, but it didn't work, because add_data is of different length than example_data, the former does not have the third column (TradeValue).
          – Chris T.
          Nov 8 at 2:15












          It's because in my example_data, the pid contains complete list of all country id, if you check that with pid_miss = setdiff(1:country_num, example_data$pid), length(pid_miss) is 0. Putting pid_miss into list(rid_miss, pid_miss, NA) will create an add_data object of only 2 columns, which is not of the same length as example_data.
          – Chris T.
          Nov 8 at 2:32




          It's because in my example_data, the pid contains complete list of all country id, if you check that with pid_miss = setdiff(1:country_num, example_data$pid), length(pid_miss) is 0. Putting pid_miss into list(rid_miss, pid_miss, NA) will create an add_data object of only 2 columns, which is not of the same length as example_data.
          – Chris T.
          Nov 8 at 2:32












          I know it is a big dataset but can you dput it? I am still unable to access the dropbox file, and I need it to understand the data better. Your 5 x 5 example had 1 missing from both rid and pid and so I assumed that was your dataset was like.
          – phil_t
          Nov 8 at 13:46




          I know it is a big dataset but can you dput it? I am still unable to access the dropbox file, and I need it to understand the data better. Your 5 x 5 example had 1 missing from both rid and pid and so I assumed that was your dataset was like.
          – phil_t
          Nov 8 at 13:46












          Thanks again for your response, but the dropbox link should work by now. The data contain 15602 rows, so it's hard to dput here.
          – Chris T.
          Nov 8 at 13:49






          Thanks again for your response, but the dropbox link should work by now. The data contain 15602 rows, so it's hard to dput here.
          – Chris T.
          Nov 8 at 13:49




















           

          draft saved


          draft discarded



















































           


          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53191265%2fmapping-a-dataframe-with-na-to-an-n-by-n-adjacency-matrix-as-a-data-frame-obj%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          這個網誌中的熱門文章

          Academy of Television Arts & Sciences

          L'Équipe

          1995 France bombings