Feature extraction from images in S3 using a Spark driver is giving an error
I have a PySpark application that downloads image files from S3 and extracts features from them using Keras.
Here is the entire flow:
1. Download the images from S3:

    s3_files_rdd = sc.binaryFiles(s3_path)  ## [('s3n://..', bytearray)]
2. Convert the bytes inside the RDD to image objects:

    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg
    from io import BytesIO

    def convert_binary_to_image_obj(obj):
        # Decode the raw JPEG bytes in memory, without writing to disk
        img = mpimg.imread(BytesIO(obj), 'jpg')
        return img

    images_rdd = s3_files_rdd.map(lambda x: (x[0], convert_binary_to_image_obj(x[1])))
3. Pass images_rdd to another function to extract features using the Keras VGG16 model:

    import numpy as np
    from keras.applications.vgg16 import VGG16, preprocess_input
    from keras.models import Model
    from keras.preprocessing import image

    def initVGG16():
        model = VGG16(weights='imagenet', include_top=True)
        return Model(inputs=model.input, outputs=model.get_layer("fc2").output)

    def extract_features(img):
        img_data = image.img_to_array(img)
        img_data = np.expand_dims(img_data, axis=0)
        img_data = preprocess_input(img_data)
        vgg16_feature = initVGG16().predict(img_data)[0]
        return vgg16_feature

    features_rdd = images_rdd.map(lambda x: (x[0], extract_features(x[1])))
But when I run the application, it gives the error message below:
ValueError: Error when checking input: expected input_1 to have shape (224, 224, 3) but got array with shape (300, 200, 3)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:330)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:470)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:453)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:284)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
I know the error is in the extract_features function: it expects the image to have shape (224, 224, 3), which is not the case right now, because I am not saving the image to local disk. I convert the downloaded bytes directly to an image object with matplotlib.
How do I resolve this issue? What I basically want is to download the image from S3 and then resize it in memory, the way image.load_img(image_path, target_size=(224, 224)) works, and then pass this image object to my extract_features function.
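For reference, the in-memory decode-and-resize described above can be sketched with Pillow instead of matplotlib (this is only an illustrative approach; bytes_to_resized_array is a hypothetical helper name, not part of the code above):

```python
from io import BytesIO

import numpy as np
from PIL import Image

def bytes_to_resized_array(raw_bytes, target_size=(224, 224)):
    # Decode the raw JPEG bytes entirely in memory (no local disk),
    # force 3 channels, and resize to the input shape VGG16 expects.
    img = Image.open(BytesIO(raw_bytes)).convert('RGB')
    img = img.resize(target_size)
    return np.asarray(img, dtype=np.float32)
```

Such a helper could replace convert_binary_to_image_obj in the map step, so every array handed to extract_features already has shape (224, 224, 3).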
keras pyspark deep-learning feature-extraction
asked Nov 4 at 9:52
dks551