How to properly use feature transformation functions in Sparklyr
Suppose I want to use `ft_max_abs_scaler` on every column of a dataset. This is what's in the documentation:
```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)
features <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")

iris_tbl <- iris_tbl %>%
  # Pack the four numeric columns into a single vector column
  ft_vector_assembler(input_col = features,
                      output_col = "features_temp") %>%
  # Rescale each vector element by its maximum absolute value
  ft_max_abs_scaler(input_col = "features_temp",
                    output_col = "features")
```
Note that `ft_vector_assembler` creates a new column, `features_temp`, and `ft_max_abs_scaler` creates another new column, `features`. Now suppose I want to break the vector down into individual columns. I have to do this:
```r
# Results in an error: the output names clash with the existing column names
iris_tbl <- iris_tbl %>% sdf_separate_column("features", into = features)
```
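A minimal sketch of one way around the clash reported above, assuming a local Spark installation: giving `sdf_separate_column()` output names that do not already exist in the table avoids the conflict. The `_scaled` suffix and the `scaled_names` variable are hypothetical choices, not something the sparklyr documentation prescribes, and this approach still adds new columns rather than replacing the old ones.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)
features <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")

# Hypothetical naming scheme: append "_scaled" so no output name collides
scaled_names <- paste0(features, "_scaled")

iris_tbl <- iris_tbl %>%
  ft_vector_assembler(input_cols = features, output_col = "features_temp") %>%
  ft_max_abs_scaler(input_col = "features_temp", output_col = "features") %>%
  # Split the scaled vector into four new, uniquely named numeric columns
  sdf_separate_column("features", into = scaled_names)

# Capture the column names before closing the connection
result_cols <- colnames(iris_tbl)
spark_disconnect(sc)
```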
Since there is no good way to delete columns, I wonder if there is a better way to do feature transformations with sparklyr without constantly creating new columns.
r apache-spark sparklyr
edited Nov 7 at 15:45
desertnaut
asked Nov 7 at 15:36
yughred
Sigh... The thing is, `sdf_separate_column` is kind of the definition of a very bad idea. While it looks great on toy examples, it just doesn't take into account the specifics of the underlying system, it doesn't scale, and if you want to integrate it with other `o.a.s.ml` (`org.apache.spark.ml`) tools, it is completely useless. Also, you can drop columns (with `transmute(...)` or `select(-to_drop)`, for example).
– user6910411
Nov 7 at 17:23
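Following the comment's suggestion, a minimal sketch (assuming a local Spark installation and a dplyr version recent enough to provide `all_of()`) that drops the raw and intermediate columns with `select(-...)` before splitting, so the original column names can be reused. The R vector is called `feature_cols` here, a hypothetical name chosen so it cannot be confused with the `features` column inside `select()`.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Hypothetical variable name for the raw numeric columns
feature_cols <- c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width")

scaled_tbl <- iris_tbl %>%
  # Pack the raw columns into one vector column, then scale it
  ft_vector_assembler(input_cols = feature_cols,
                      output_col = "features_temp") %>%
  ft_max_abs_scaler(input_col = "features_temp",
                    output_col = "features") %>%
  # Drop the raw columns and the intermediate vector so the names are free
  select(-all_of(feature_cols), -features_temp) %>%
  # Split the scaled vector back into columns that reuse the original names
  sdf_separate_column("features", into = feature_cols) %>%
  # Finally drop the packed vector column itself
  select(-features)

# Capture the column names before closing the connection
result_cols <- colnames(scaled_tbl)
spark_disconnect(sc)
```

Dropping columns this way only changes the projection of a lazy query, so nothing is copied; the commenter's caveat about `sdf_separate_column` not scaling still applies to the splitting step itself.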