Saved LinearRegression model fails on new data whose feature vectors have a different size
I am using Azure, and the Spark version is 2.1.1.2.6.2.3-1.
I saved my model with the following function:
    from pyspark.ml.regression import LinearRegression

    def fit_LR(training, testing, adl_root_path, location, modelName):
        training.cache()
        lr = LinearRegression(featuresCol='features', labelCol='ZZ_TIME',
                              solver="auto", maxIter=100)
        lr_model = lr.fit(training)
        testing.cache()
        lr_outpath = adl_root_path + "Model/Sprint6Results/RUN/" + str(location) + str(modelName)
        # save() returns None, so its result does not need to be assigned
        lr_model.write().overwrite().save(lr_outpath)
        return lr_model
Then I reloaded the model to score new data:
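    from pyspark.ml.regression import LinearRegressionModel

    saved_model_path = adl_root_path + "Model/Sprint6Results/RUN/" + str(location) + str(modelName)
    reloaded_model = LinearRegressionModel.load(saved_model_path)
    testing.cache()
    # numFeatures is a property in PySpark, not a method
    reloaded_model.numFeatures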
The feature vectors generated from the historical (training) data have size 1545.
The features for the new data are generated the same way, from the same raw columns, using only StringIndexer and OneHotEncoder, but the resulting vectors have size 1361.
The main difference I see is that the new data contain fewer distinct values in the categorical columns than the historical data, so the one-hot encoding produces shorter vectors.
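To illustrate why the widths differ, here is a toy pure-Python sketch (not Spark code; `fit_vocabulary` and `one_hot` are hypothetical names). It assumes the encoder learns its category list from whatever data it is fitted on, which is what happens when StringIndexer is refit on each batch. Unlike Spark's OneHotEncoder it keeps every category (no `dropLast`), and ties are broken alphabetically here.

```python
def fit_vocabulary(values):
    """Assign an index to each distinct value, most frequent first (ties alphabetical)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(ordered)}


def one_hot(value, vocabulary):
    """Encode one value as a 0/1 list whose length is the vocabulary size."""
    vec = [0.0] * len(vocabulary)
    if value in vocabulary:
        vec[vocabulary[value]] = 1.0
    return vec


train_vocab = fit_vocabulary(["red", "green", "blue", "green"])  # historical data
batch_vocab = fit_vocabulary(["green", "green"])                 # refit on a small batch

print(len(one_hot("green", train_vocab)))  # 3
print(len(one_hot("green", batch_vocab)))  # 1
```

The same value gets a different vector length depending on which data the vocabulary was fitted on; reusing `train_vocab` for every batch keeps the width fixed.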
Is there a way to make the vectors the same size?
I am going to score the model in separate batches, while the model itself is refit once a week.
Is there a solution to this issue?
The error I get is this:
Caused by java.lang.IllegalArgumentException: requirement failed: BLAS.dot(x:Vector, y:Vector) was given Vectors with non-matching sizes: x-size = 1361 y-size = 1545
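The error comes from the prediction step: a linear model computes `intercept + dot(features, coefficients)`, and the dot product requires both vectors to have the same length. Here the model holds 1545 coefficients (`y-size`) while the new feature vectors have 1361 entries (`x-size`). A toy pure-Python version of that check (the `predict` helper is hypothetical, not Spark's implementation):

```python
def predict(coefficients, intercept, features):
    """Linear prediction with the same size requirement that BLAS.dot enforces."""
    if len(coefficients) != len(features):
        raise ValueError(
            f"non-matching sizes: x-size = {len(features)} y-size = {len(coefficients)}")
    return intercept + sum(c * x for c, x in zip(coefficients, features))


print(predict([2.0, 3.0], 1.0, [1.0, 1.0]))  # 6.0

try:
    predict([0.1] * 1545, 0.0, [1.0] * 1361)
except ValueError as err:
    print(err)  # non-matching sizes: x-size = 1361 y-size = 1545
```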
pyspark linear-regression apache-spark-2.1.1
asked Nov 7 at 21:52 by E B