Mapping a new column to a DataFrame by rows from another DataFrame


I have a Pandas DataFrame stations with index as id:

    id    station     lat     lng
    1     Boston      45.343  -45.333
    2     New York    56.444  -35.690

I have another DataFrame df1 that has the following:

    duration   date       station   gender
    NaN        20181118   NaN       M
    9          20181009   2.0       F
    8          20170605   1.0       F

I want to add to df1 so that it looks like the following DataFrame:

    duration   date       station   gender  lat     lng
    NaN        20181118   NaN       M       NaN     NaN
    9          20181009   New York  F       56.444  -35.690
    8          20170605   Boston    F       45.343  -45.333
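For anyone who wants to reproduce this, the two frames can be rebuilt roughly as follows. This is only a sketch of the sample data above: the dtypes are an assumption (in particular, station in df1 is taken to be float because it contains a NaN), and the index name id comes from the description of stations.

```python
import numpy as np
import pandas as pd

# Lookup table, indexed by station id (values copied from the sample above).
stations = pd.DataFrame(
    {
        "station": ["Boston", "New York"],
        "lat": [45.343, 56.444],
        "lng": [-45.333, -35.690],
    },
    index=pd.Index([1, 2], name="id"),
)

# Trip records; station holds ids as floats because one value is missing.
df1 = pd.DataFrame(
    {
        "duration": [np.nan, 9, 8],
        "date": [20181118, 20181009, 20170605],
        "station": [np.nan, 2.0, 1.0],
        "gender": ["M", "F", "F"],
    }
)
```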

I tried doing this iteratively, looking up each station in stations one row at a time as in the following example, but I have about 2 million rows and it took a very long time.

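    import numpy as np

    stat_list = []
    lng_list = []
    lat_list = []

    # Look up each station id in stations, one row at a time
    for stat in df1["station"]:
        if not np.isnan(stat):
            # stations is indexed by id, so look up by label rather than position
            ref = stations.loc[int(stat)]
            stat_list.append(ref["station"])
            lng_list.append(ref["lng"])
            lat_list.append(ref["lat"])
        else:
            # No station recorded for this row
            stat_list.append(np.nan)
            lng_list.append(np.nan)
            lat_list.append(np.nan)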

Is there a faster way to do this?

asked Nov 25 '18 at 9:02

Swopnil Shrestha

python pandas performance numpy dataframe


1 Answer

This looks like it would be best solved with a merge, which should be significantly faster than looping over the rows:

    df1.merge(stations, left_on="station", right_index=True, how="left")
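To see what the merge returns, here is a minimal sketch run against the question's sample data. The frame construction is my own reconstruction (dtypes assumed, with the station ids in df1 stored as floats), not something taken from the asker's real data.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (dtypes are assumed).
stations = pd.DataFrame(
    {
        "station": ["Boston", "New York"],
        "lat": [45.343, 56.444],
        "lng": [-45.333, -35.690],
    },
    index=pd.Index([1, 2], name="id"),
)
df1 = pd.DataFrame(
    {
        "duration": [np.nan, 9, 8],
        "date": [20181118, 20181009, 20170605],
        "station": [np.nan, 2.0, 1.0],
        "gender": ["M", "F", "F"],
    }
)

# Left join: every row of df1 is kept, and rows whose station id is NaN
# get NaN in the columns looked up from stations.
merged = df1.merge(stations, left_on="station", right_index=True, how="left")

# Inspect which columns came out of the join.
print(merged.columns.tolist())
print(merged)
```

Because how="left" keeps every row of df1 in its original order, the result has the same length as df1.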

The plain merge leaves you with two columns, station_x and station_y. If you only want the station column with the string names, you can do:

    # An empty right-hand suffix keeps the name "station" for the string column
    df_merged = df1.merge(stations, left_on="station", right_index=True, how="left", suffixes=("_x", ""))
    # Drop the numeric id column; unlike columns.difference, drop keeps the column order
    df_final = df_merged.drop(columns="station_x")

(Or just rename one of the two station columns before you merge.)
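The rename route could look something like the sketch below. The name station_id is a hypothetical choice of mine, and the sample frames are rebuilt from the question with assumed dtypes; the last step just reorders the columns to match the layout the question asks for.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (dtypes are assumed).
stations = pd.DataFrame(
    {
        "station": ["Boston", "New York"],
        "lat": [45.343, 56.444],
        "lng": [-45.333, -35.690],
    },
    index=pd.Index([1, 2], name="id"),
)
df1 = pd.DataFrame(
    {
        "duration": [np.nan, 9, 8],
        "date": [20181118, 20181009, 20170605],
        "station": [np.nan, 2.0, 1.0],
        "gender": ["M", "F", "F"],
    }
)

# Rename the numeric id column first so the two "station" columns never clash.
# "station_id" is a hypothetical name chosen for this example.
left = df1.rename(columns={"station": "station_id"})

result = (
    left.merge(stations, left_on="station_id", right_index=True, how="left")
    .drop(columns="station_id")  # the id is no longer needed after the lookup
)

# Reorder to match the layout requested in the question.
result = result[["duration", "date", "station", "gender", "lat", "lng"]]
print(result)
```

With no overlapping column names, no suffixes are needed at all, which makes the result a little easier to reason about.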

edited Nov 25 '18 at 9:20

answered Nov 25 '18 at 9:11

Sven Harris

