How to properly use feature transformation functions in Sparklyr

Suppose I want to use ft_max_abs_scaler on every column of a dataset. This is what's in the documentation:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

features <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")

iris_tbl <- iris_tbl %>%
  ft_vector_assembler(input_col = features,
                      output_col = "features_temp") %>%
  ft_max_abs_scaler(input_col = "features_temp",
                    output_col = "features")

Note that ft_vector_assembler creates a new column, features_temp, and ft_max_abs_scaler creates another new column, features. Now suppose I want to break the vector down into individual columns. I have to do this:

iris_tbl <- iris_tbl %>% sdf_separate_column("features", into = features)
# results in an error because the new column names cannot be the same as existing ones

Since there is no good way to delete columns, I wonder if there is a better way to do feature transformations with sparklyr without having to keep creating new columns.

edited Nov 7 at 15:45 by desertnaut
asked Nov 7 at 15:36 by yughred

Sigh... The thing is, sdf_separate_column is pretty much the definition of a very bad idea. While it looks great on toy examples, it just doesn't take into account the specifics of the underlying system, it doesn't scale, and if you want to integrate the result with other o.a.s.ml tools, it is completely useless. Also, you can drop columns (with transmute(...) or select(-to_drop), for example).
– user6910411, Nov 7 at 17:23
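
To make the comment's suggestion concrete, here is a minimal sketch of dropping the intermediate column with select(). It assumes sparklyr 0.8 or later (where the assembler argument is spelled input_cols) and a local Spark installation. The "_scaled" suffix and the names scaled_features, scaled_tbl and scaled_cols_tbl are hypothetical, chosen only so the new columns do not clash with the originals. The last step keeps sdf_separate_column purely for illustration; per the comment, it is probably best avoided on anything but small data.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

features <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")

# Hypothetical names for the scaled columns, chosen so they do not clash
# with the original feature columns.
scaled_features <- paste0(features, "_scaled")

scaled_tbl <- iris_tbl %>%
  ft_vector_assembler(input_cols = features,
                      output_col = "features_temp") %>%
  ft_max_abs_scaler(input_col = "features_temp",
                    output_col = "features") %>%
  # Drop the intermediate vector column, as the comment suggests.
  select(-features_temp)

# Only if individual columns are really needed: split the scaled vector
# under the new names, then drop the vector column as well.
scaled_cols_tbl <- scaled_tbl %>%
  sdf_separate_column("features", into = scaled_features) %>%
  select(-features)
```

If the scaled vector is going straight into another Spark ML stage, the first pipeline (scaled_tbl) is likely all that is needed, since those stages consume the vector column directly.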

r apache-spark sparklyr