Weighted violinplot


I have a pandas DataFrame of generator plant capacity (MW) by fuel type. I want to show the estimated distribution of plant capacity in two different ways: by plant (easy) and by MW (harder). Here's an example:

# import libraries
import pandas as pd
import numpy as np
import seaborn as sns

# create and seed a RandomState object (to make the numbers below repeatable)
rnd = np.random.RandomState(7)

# generate fake data for each fuel type and collect the pieces in a list
frames = []
for myfuel in ['Biomass','Coal','Hydro','Natural Gas','Oil','Solar','Wind','Other']:
    mymean = rnd.uniform(low=2.8,high=3.2)
    mysigma = rnd.uniform(low=0.6,high=1.0)
    frames.append(
                  pd.DataFrame({'Fuel': myfuel,
                                'MW': rnd.lognormal(mean=mymean,sigma=mysigma,size=1000)
                               })
                 )

# combine the pieces into one dataframe
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat(frames, ignore_index=True)

# make violinplot
sns.violinplot(x = 'Fuel',
               y = 'MW',
               data=df,
               inner=None,
               scale='area',
               cut=0,
               linewidth=0.5
              )

And here's the plot this code makes, showing the estimated distribution of plant size (in MW) with every plant counted equally:

un-weighted violinplot

Without more context, this violinplot is deceptive. Because it is not weighted, the thin tail at the top of each category hides the fact that the relatively few plants in that tail hold a lot of (maybe even most of) the MW of capacity. So I want a second plot showing the distribution by MW: basically a weighted version of this first violinplot.
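To put a rough number on how much capacity sits in that tail, here is a minimal sketch for a single fuel type. The lognormal parameters (mean 3.0, sigma 0.8) are assumptions chosen as the midpoints of the ranges used in the example above, and the 10% cutoff is arbitrary.

```python
import numpy as np

# Assumed parameters: midpoints of the ranges used in the example above
rnd = np.random.RandomState(7)
mw = rnd.lognormal(mean=3.0, sigma=0.8, size=1000)

# Sort plants by size and take the largest 10% of them
mw_sorted = np.sort(mw)
n_top = len(mw_sorted) // 10
top_mw = mw_sorted[-n_top:]

# Fraction of total capacity held by those plants
share_top10 = top_mw.sum() / mw.sum()
print(f"Largest 10% of plants hold {share_top10:.0%} of the MW")
```

The by-plant violin gives those plants only 10% of its area, while a by-MW violin would give them `share_top10` of it.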

Has anyone figured out an elegant way to make such a "weighted" violinplot, or does anyone have an idea about the most elegant way to do it?

I figured I could loop through each row of my plant-level dataframe and decompose the plant data into MW-level data in a new dataframe. For instance, a row in the plant-level dataframe for a 350 MW plant would become 3500 rows in the new dataframe, each representing 100 kW of capacity. (I think I have to go down to at least 100 kW resolution, because some of these plants are pretty small, in the 100 kW range.) That new dataframe would be enormous, but I could then make a violinplot of the decomposed data. That seems slightly brute force. Any better ideas?
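The decomposition described above does not strictly need a Python-level loop. As a minimal sketch, the row repetition can be done in one step with `Index.repeat`. The three-plant table and the names `plants`, `counts`, and `blocks` are hypothetical and only for illustration.

```python
import pandas as pd

# Hypothetical three-plant table standing in for the plant-level dataframe
plants = pd.DataFrame({'Fuel': ['Coal', 'Coal', 'Wind'],
                       'MW': [350.0, 0.1, 2.5]})

norm = 0.1  # block size in MW (0.1 MW = 100 kW), as in the text above

# Number of norm-sized blocks in each plant
counts = (plants['MW'] / norm).round().astype(int)

# Repeat each row's index label counts[i] times, then select those rows
blocks = plants.loc[plants.index.repeat(counts.to_numpy())].reset_index(drop=True)

print(len(blocks))
print(blocks['Fuel'].value_counts())
```

This still builds the large dataframe, so memory use is the same as in the loop version. Only the row-by-row appending is avoided.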

Update:

I implemented the brute force method described above. Here's what it looks like if anyone is interested. This is not the "answer" to this question, because I still would be interested if anyone knows a more elegant/simple/efficient way to do this. So please chime in if you know of such a way. Otherwise, I hope this brute force approach might be helpful to someone in the future.

So that it's easy to see that the weighted violinplot makes sense, I replaced the random data with the integers 0 through 10 for every fuel type. With this data, the violinplot of df should look fairly uniform, and the violinplot of the weighted data (dfw) should get steadily wider towards the top of the violins. That's exactly what happens (see the image of the violinplots below).

# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# generate fake data for each fuel type and collect the pieces in a list
frames = []
for myfuel in ['Biomass','Coal','Hydro','Natural Gas','Oil','Solar','Wind','Other']:
    frames.append(
                  pd.DataFrame({'Fuel': myfuel,
                                # To make it easy to see that the violinplot of dfw (below)
                                # makes sense, here we'll just use a simple range list from
                                # 0 to 10
                                'MW': np.array(range(11))
                               })
                 )

# combine the pieces (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat(frames, ignore_index=True)

# I have to recast the data type here to avoid an error when using violinplot below
df.MW = df.MW.astype(float)

# Define the MW size by which to normalize all of the units
# Careful: too big -> loss of fidelity in data for small plants
#          too small -> dfw will need to store an enormous amount of data
norm = 0.1 # this is in MW, so 0.1 MW = 100 kW

# loop through rows of df and build the pieces of the weighted dataframe
wframes = []
for index, row in df.iterrows():

    # calculate the number of norm MW there are within the MW of each plant
    mynum = int(round(row['MW']/norm))

    # a 0 MW plant contributes no rows
    if mynum == 0:
        continue

    # add mynum rows, each with Fuel = row['Fuel'] and MW = row['MW']
    wframes.append(
                   pd.DataFrame({'Fuel': row['Fuel'],
                                 'MW': np.array([row['MW']]*mynum,dtype='float')
                                })
                  )

# combine the pieces into dfw
dfw = pd.concat(wframes, ignore_index=True)

# since dfw will be huge, use "category" for Fuel to limit dfw size
# (keep the fuels in the same order as in df so both plots line up)
dfw['Fuel'] = pd.Categorical(dfw['Fuel'], categories=df['Fuel'].unique())

# Set up figure and axes
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, sharey='row')

# make violinplot of the un-weighted data
sns.violinplot(x = 'Fuel',
               y = 'MW',
               data=df,
               inner=None,
               scale='area',
               cut=0,
               linewidth=0.5,
               ax = ax1
              )

# make violinplot of the weighted data
sns.violinplot(x = 'Fuel',
               y = 'MW',
               data=dfw,
               inner=None,
               scale='area',
               cut=0,
               linewidth=0.5,
               ax = ax2
              )

# loop through the set of tick labels for both axes
# set tick label size and rotation
for item in (ax1.get_xticklabels() + ax2.get_xticklabels()):
    item.set_fontsize(8)
    item.set_rotation(30)
    item.set_horizontalalignment('right')

plt.show()

un-weighted and weighted violinplots
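The expected shape (flat when un-weighted, widening towards the top when weighted) can also be checked without building dfw at all. This is a minimal sketch, assuming SciPy 1.2 or later is available, where `scipy.stats.gaussian_kde` accepts a `weights` argument. It is not what seaborn does internally, and the default bandwidth will differ from the one used for the replicated rows, so the curves will not match the plots above exactly.

```python
import numpy as np
from scipy.stats import gaussian_kde

# One fuel type from the update: plant sizes 1..10 MW
# (the 0 MW plant is dropped because it carries zero weight)
mw = np.arange(1, 11, dtype=float)
grid = np.linspace(1, 10, 50)

# Un-weighted density: every plant counts once
unweighted = gaussian_kde(mw)(grid)

# Weighted density: every plant counts in proportion to its MW
weighted = gaussian_kde(mw, weights=mw)(grid)

print("un-weighted, bottom vs top:", unweighted[0], unweighted[-1])
print("weighted,    bottom vs top:", weighted[0], weighted[-1])
```

The un-weighted density is symmetric between the bottom and the top of the range, while the weighted density is several times larger at the top. A curve like `weighted` could then be drawn as one violin with matplotlib's `Axes.fill_betweenx`, mirrored around the category position.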

edited Nov 23 '18 at 20:27

asked Nov 23 '18 at 17:23

Emily Beth


python pandas seaborn violin-plot
