ggalluvial: How do I plot an alluvial diagram when I have a dataframe with links and nodes?



























I have a data frame with timepoints (a, b and c), labels (l1, l2, l3) and frequencies distributed over the timepoints and labels.
I want to create a Sankey diagram with the ggalluvial package in R.
Here's my code:



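library(tidyverse)
library(forcats)
library(ggalluvial)
library(magrittr)

# Draw an alluvial plot; `name` is the unquoted column mapped to y
plotAlluvial <- function(.df, name = freq) {
  y_name <- enquo(name)
  ggplot(.df,
         aes(
           x = tp,
           stratum = lbl,
           alluvium = id,
           label = lbl,
           fill = lbl,
           y = !!y_name
         )
  ) +
    geom_stratum() +
    geom_flow(stat = "flow", color = "darkgray") +
    geom_text(stat = "stratum") +
    scale_fill_brewer(type = "qual", palette = "Set2")
}

# Link frequencies: x1 for a -> b, x2 for b -> c (each sums to 25)
x1 <- c(6, 0, 0, 5, 5, 4, 2, 0, 3)
x2 <- c(5, 5, 3, 0, 0, 5, 0, 7, 0)
# One row per link: source (tp1, lbl1), target (tp2, lbl2), frequency
df <- tibble(tp1 = rep(c('a', 'b'), each = 9),
             lbl1 = rep(c('l1', 'l2', 'l3'), 2, each = 3),
             tp2 = rep(c('b', 'c'), each = 9),
             lbl2 = rep(c('l1', 'l2', 'l3'), 6),
             freq = c(x1, x2)
)

# Reshape to long form: one row per link end, with the link id as alluvium
df2 <- df %>%
  mutate(id = row_number()) %>%
  unite(un1, c(tp1, lbl1)) %>%
  unite(un2, c(tp2, lbl2)) %>%
  tidyr::gather(key, value, -c(freq, id)) %>%
  separate('value', c('tp', 'lbl'))
# Drop the source-side rows at timepoint b
df2.left <- df2 %>%
  dplyr::filter(!(key == 'un1' & tp == 'b'))
# Drop the target-side rows at timepoint b
df2.right <- df2 %>%
  dplyr::filter(!(key == 'un2' & tp == 'b'))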


I can plot the left side and plot the right side of the diagram I want:



plotAlluvial(df2.left)
plotAlluvial(df2.right)


[Image: left-side plot] [Image: right-side plot]



But if I try to plot the left and right side at the same time I get this plot:



plotAlluvial(df2)


[Image: plot produced by plotAlluvial(df2)]



With the code above, the stratum at timepoint b is too tall. It should be the same height as the other two strata, i.e. 25.
What am I doing wrong? How can I create a diagram that combines the first two plots?
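
One way to see where the extra height comes from is to total the frequencies per timepoint after stacking both ends of every link. The following is only a sketch in base R that rebuilds the same link table; the names `links`, `long` and `totals` are illustrative and do not appear in the code above. Timepoint b receives the frequencies of both the a-to-b links and the b-to-c links, so its total is doubled.

```r
# Rebuild the link table from the question (base R only)
x1 <- c(6, 0, 0, 5, 5, 4, 2, 0, 3)  # a -> b frequencies
x2 <- c(5, 5, 3, 0, 0, 5, 0, 7, 0)  # b -> c frequencies
links <- data.frame(
  tp1  = rep(c("a", "b"), each = 9),
  lbl1 = rep(c("l1", "l2", "l3"), times = 2, each = 3),
  tp2  = rep(c("b", "c"), each = 9),
  lbl2 = rep(c("l1", "l2", "l3"), times = 6),
  freq = c(x1, x2)
)

# Stack the source and target ends of each link, which is what the
# gather() step in the question effectively does
long <- rbind(
  data.frame(tp = links$tp1, lbl = links$lbl1, freq = links$freq),
  data.frame(tp = links$tp2, lbl = links$lbl2, freq = links$freq)
)

# Total frequency per timepoint: a = 25, b = 50, c = 25
totals <- tapply(long$freq, long$tp, sum)
print(totals)
```

Timepoint b appears as the target of the first nine links and as the source of the last nine, which is why its stratum is twice as tall as the strata at a and c.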



EDIT:



Following a comment, I added a variable that expresses each frequency as a proportion of its timepoint total. The stratum at b now has the correct height, but the incoming and outgoing flows still occupy only 50% of each condition at timepoint b.



df2 %<>% group_by(tp) %>% mutate(prop = freq / sum(freq)) %>%
ungroup()
plotAlluvial(df2,prop)


[Image: plot produced by plotAlluvial(df2, prop)]






























  • Could you please provide the desired diagram? Unfortunately, I cannot understand what is wrong with your output.

    – atsyplenkov
    Nov 21 '18 at 16:38











  • Sorry, I can't. But I want the timepoint b stratum to be as high as the other two strata, so its total freq should also be 25.

    – Robert
    Nov 21 '18 at 19:38






  • Do you basically want to be able to use position = "fill" (as documented here) with the ggalluvial geoms, and plot percentages rather than values? This isn't possible at present; you'd need to transform the data before calling ggplot() to achieve this.

    – Cory Brunson
    Nov 22 '18 at 22:02











  • Thanks Cory! It looks indeed like the thing I want. Do you have any idea how I should transform the data above so I can make the diagram I want? I have changed my question and added more pictures to better explain what I want.

    – Robert
    Nov 23 '18 at 8:32











  • @Robert, on closer inspection, it seems like you might be mis-encoding your data. Are you trying to track the distribution of 25 subjects over three labels along three time points? In that case, here's an artificial data frame of the correct (long form) structure: tidyr::crossing(sub = 1:25, tp = letters[1:3]) %>% mutate(lbl = sample(paste0("l", 1:3), 25 * 3, replace = TRUE)).

    – Cory Brunson
    Nov 23 '18 at 18:51
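
The long-form structure suggested in the comment above could be plotted roughly as follows. This is a sketch with randomly generated labels, not the question's data. The data frame name `subjects`, the unit-weight column `n` and the seed are assumptions added for illustration, and `expand.grid()` stands in for `tidyr::crossing()`.

```r
library(ggplot2)
library(ggalluvial)

set.seed(1)  # arbitrary seed so the random labels are reproducible

# One row per subject per timepoint (long form), as in the comment
subjects <- expand.grid(sub = 1:25, tp = letters[1:3],
                        stringsAsFactors = FALSE)
subjects$lbl <- sample(paste0("l", 1:3), nrow(subjects), replace = TRUE)
subjects$n <- 1  # hypothetical unit weight: each subject counts once

# The subject id is the alluvium, so every timepoint has the same total
p <- ggplot(subjects,
            aes(x = tp, stratum = lbl, alluvium = sub, y = n, fill = lbl)) +
  geom_flow(color = "darkgray") +
  geom_stratum() +
  scale_fill_brewer(type = "qual", palette = "Set2")

# Each timepoint holds exactly 25 subjects
print(table(subjects$tp))
```

Because every subject contributes one row per timepoint, all three strata have height 25 and every flow into b continues out of b. Note that link frequencies alone (a to b, b to c) do not determine which subjects follow which path, so the question's data would need per-subject trajectories to be encoded this way.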























r ggplot2 sankey-diagram














edited Nov 26 '18 at 9:21







Robert

















asked Nov 21 '18 at 13:40









Robert




















