Memory keeps rising when training an FFNN in PyTorch
I have a feed-forward neural network which classifies the MNIST data set.
For some reason the memory usage keeps approaching 99% no matter how big the batch size is.
Nothing is growing in size: every dynamic variable is overwritten after the first epoch, yet even after epoch 70 the memory keeps rising.
I'm running it on 8 GB of RAM, a 2.8 GHz Intel i5 (7th gen) quad core, Ubuntu 18.04.
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

# xtrain, ytrain, xval, yval, net, layers, num_features, criterion and
# accuracy are defined elsewhere in the script.

batch_size = 50    # Number of x's we pass through the net at each iteration
num_epochs = 100   # Number of times the entire training data is through the net

n_train = len(xtrain)
n_batch_train = n_train // batch_size
n_val = len(xval)
n_batch_val = n_val // batch_size

# loss/acc
train_acc, train_loss = [], []
val_acc, val_loss = [], []
test_acc, test_loss = [], []

# Get parameters from the net
par = []
for i in range(len(layers) - 1):
    par = par + list(net.L[i].parameters())

# Optimizer
optimizer = optim.Adam(par, lr=0.001)

# Interval of x
get_slice = lambda i, size: range(i * size, (i + 1) * size)

for e in range(num_epochs):
    curr_loss = 0
    net.train()
    for i in range(n_batch_train):
        x_interval = get_slice(i, batch_size)   # unused; slze below is identical
        slze = get_slice(i, batch_size)
        # Batchnorm
        bn = nn.BatchNorm1d(num_features=num_features)
        x_batch = bn(Variable(torch.from_numpy(xtrain[slze])))
        out = net(x_batch).double()
        target_batch = Variable(torch.from_numpy(ytrain[slze]).double())
        L = criterion(out, target_batch)
        # Update gradients
        optimizer.zero_grad()
        L.backward()
        optimizer.step()
        # Store training accuracy and loss
        train_acc.append(accuracy(target_batch, out).data)
        train_loss.append(L.data.numpy())

    #### Validate ####
    net.eval()
    for j in range(n_batch_val):
        slze = get_slice(j, batch_size)
        val_batch = Variable(torch.from_numpy(xval[slze]))
        val_out = net(bn(val_batch)).double()
        target_batch = Variable(torch.from_numpy(yval[slze]).double())
        # Store val acc and loss
        val_acc.append(accuracy(target_batch, val_out).data)
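For reference, here is a minimal sketch of how the per-batch bookkeeping could be restructured so that the stored values are plain Python floats rather than tensors that may still reference the autograd graph, and so that the BatchNorm1d layer is built once instead of once per batch. It reuses the same externally defined net, criterion, accuracy, optimizer and data arrays from the snippet above, and is an illustration under those assumptions, not the confirmed fix:

import torch
import torch.nn as nn

# Sketch only: net, criterion, accuracy, optimizer, get_slice, xtrain, ytrain,
# num_features, num_epochs, n_batch_train and batch_size are assumed to be
# defined exactly as in the snippet above.

bn = nn.BatchNorm1d(num_features=num_features)   # build once, not per batch

train_acc, train_loss = [], []
for e in range(num_epochs):
    net.train()
    for i in range(n_batch_train):
        slze = get_slice(i, batch_size)
        x_batch = bn(torch.from_numpy(xtrain[slze]).float())
        out = net(x_batch).double()
        target_batch = torch.from_numpy(ytrain[slze]).double()

        loss = criterion(out, target_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # .item()/float() return plain Python numbers, so nothing kept in
        # these lists holds a reference to this batch's computation graph
        train_acc.append(float(accuracy(target_batch, out)))
        train_loss.append(loss.item())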
python memory neural-network pytorch
asked Nov 8 at 14:12 by Jakob, edited Nov 8 at 14:48
Can you please look over your code formatting again? There seems to be an indentation error for one of your for-loops. – dennlinger, Nov 8 at 14:32

Done! I think I might've located the error: train_acc, train_loss = [], [] was not reset after each epoch. – Jakob, Nov 8 at 14:48

But it has little to no impact on the memory. – Jakob, Nov 8 at 14:58

Is train_acc.append(accuracy(target_batch, out).data) computing the single batch accuracy, or are you calculating the values for every data point (and appending the full list)? If the latter is the case, this could indeed be a problem, unless you now reset train_acc. Also, what size of data set are we talking about? 1000 images? 1 million? – dennlinger, Nov 9 at 8:30
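On the point raised in the last comment: if accuracy reduces each batch to a single number, the lists only grow by a few floats per batch, which is negligible; the real risk is appending tensors that keep their graphs alive, or appending per-example tensors. A hypothetical batch_accuracy helper that returns one Python float per batch, together with a validation loop wrapped in torch.no_grad() so no graph is built at all, could look like the sketch below. It assumes the labels are integer class indices (adjust if the targets are one-hot) and that bn, net, get_slice, xval, yval, n_batch_val, batch_size and val_acc are defined as above:

import torch

def batch_accuracy(target, out):
    # Hypothetical helper: reduces one batch to a single Python float.
    # Assumes `out` holds class scores and `target` holds integer labels.
    preds = out.argmax(dim=1)
    return (preds == target.long()).float().mean().item()

net.eval()
bn.eval()                               # use running stats, not batch stats
with torch.no_grad():                   # no autograd graph is built here
    for j in range(n_batch_val):
        slze = get_slice(j, batch_size)
        val_batch = bn(torch.from_numpy(xval[slze]).float())
        val_out = net(val_batch).double()
        target_batch = torch.from_numpy(yval[slze]).double()
        val_acc.append(batch_accuracy(target_batch, val_out))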