Memory keeps rising when training an FFNN in PyTorch

I have a feed-forward neural network which classifies the MNIST data set.
For some reason the memory usage keeps climbing towards 99% no matter how big the batch size is.
Nothing in my code grows in size - every dynamic variable is overwritten after the first epoch - but even after epoch 70 the memory keeps rising.

I'm running this on 8 GB of RAM with a 2.8 GHz Intel i5 (7th gen) quad core under Ubuntu 18.04.
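
One quick way to verify that claim is to log the process's peak memory once per epoch. This check is my own suggestion (standard-library resource module, Linux semantics for ru_maxrss), not part of the original script:

import resource

# Suggested diagnostic (not in the original code): print the peak resident set
# size once per epoch. On Linux, ru_maxrss is reported in kilobytes.
def log_peak_memory(epoch):
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"epoch {epoch}: peak RSS {peak_kb / 1024:.1f} MB")

Calling log_peak_memory(e) at the end of each epoch shows whether the peak keeps climbing as training goes on.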



# The net itself, criterion, accuracy, layers, num_features and the numpy arrays
# xtrain, ytrain, xval, yval are defined earlier in the script.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

batch_size = 50    # Number of x's we pass through the net at each iteration
num_epochs = 100   # Number of times the entire training data is passed through the net

n_train = len(xtrain)
n_batch_train = n_train // batch_size
n_val = len(xval)
n_batch_val = n_val // batch_size

# Loss/accuracy history
train_acc, train_loss = [], []
val_acc, val_loss = [], []
test_acc, test_loss = [], []

# Get parameters from the net
par = []
for i in range(len(layers) - 1):
    par = par + list(net.L[i].parameters())

# Optimizer
optimizer = optim.Adam(par, lr=0.001)

# Indices of batch i
get_slice = lambda i, size: range(i * size, (i + 1) * size)

for e in range(num_epochs):
    curr_loss = 0
    net.train()
    for i in range(n_batch_train):
        x_interval = get_slice(i, batch_size)
        slze = get_slice(i, batch_size)

        # Batchnorm
        bn = nn.BatchNorm1d(num_features=num_features)
        x_batch = bn(Variable(torch.from_numpy(xtrain[slze])))

        out = net(x_batch).double()
        target_batch = Variable(torch.from_numpy(ytrain[slze]).double())
        L = criterion(out, target_batch)

        # Update gradients
        optimizer.zero_grad()
        L.backward()
        optimizer.step()

        # Store training accuracy and loss
        train_acc.append(accuracy(target_batch, out).data)
        train_loss.append(L.data.numpy())

    #### Validate ####
    net.eval()
    for j in range(n_batch_val):
        slze = get_slice(j, batch_size)
        val_batch = Variable(torch.from_numpy(xval[slze]))
        val_out = net(bn(val_batch)).double()
        target_batch = Variable(torch.from_numpy(yval[slze]).double())

        # Store val acc and loss
        val_acc.append(accuracy(target_batch, val_out).data)
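
For what it's worth, here is a hedged sketch of how the loop could be restructured to rule out the usual suspects: storing plain Python floats instead of tensors in the history lists, running validation under torch.no_grad(), and creating the BatchNorm layer once instead of on every iteration. This is a guess at contributing factors, not a confirmed diagnosis; it reuses the question's externally defined net, criterion, accuracy and data arrays, and it assumes accuracy() returns a single scalar tensor:

# Sketch only: same names as above; Variable is no longer needed on PyTorch 0.4+.
bn = nn.BatchNorm1d(num_features=num_features)   # create once, not per batch

for e in range(num_epochs):
    net.train()
    for i in range(n_batch_train):
        slze = get_slice(i, batch_size)
        x_batch = bn(torch.from_numpy(xtrain[slze]))
        out = net(x_batch).double()
        target_batch = torch.from_numpy(ytrain[slze]).double()
        L = criterion(out, target_batch)

        optimizer.zero_grad()
        L.backward()
        optimizer.step()

        # .item() stores plain Python floats, so the history lists stay tiny
        # (assumes accuracy() returns a 0-dim tensor)
        train_acc.append(accuracy(target_batch, out).item())
        train_loss.append(L.item())

    net.eval()
    with torch.no_grad():   # no autograd graph is built for the validation batches
        for j in range(n_batch_val):
            slze = get_slice(j, batch_size)
            val_out = net(bn(torch.from_numpy(xval[slze]))).double()
            target_batch = torch.from_numpy(yval[slze]).double()
            val_acc.append(accuracy(target_batch, val_out).item())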
python memory neural-network pytorch

asked Nov 8 at 14:12
edited Nov 8 at 14:48
Jakob 488

  • Can you please look over your code formatting again? There seems to be an indentation error for one of your for-loops.
    – dennlinger
    Nov 8 at 14:32

  • Done! I think I might've located the error: the train_acc, train_loss = [], [] lists were not reset after each epoch.
    – Jakob
    Nov 8 at 14:48

  • But it has little to no impact on the memory.
    – Jakob
    Nov 8 at 14:58

  • Is train_acc.append(accuracy(target_batch, out).data) computing the single batch accuracy, or are you calculating the values for every data point (and appending the full list)? If the latter is the case, this could indeed be a problem, unless you now reset train_acc. Also, what size of data set are we talking about? 1000 images? 1 million?
    – dennlinger
    Nov 9 at 8:30
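
The accuracy() function itself is not shown in the question. For reference, a batch-level version that returns one scalar per call (the shape dennlinger is asking about) might look like the sketch below; the one-hot target layout is an assumption on my part:

import torch

# Hypothetical batch-level accuracy: one 0-dim tensor per call, so each
# train_acc.append(...) adds a single value rather than a whole batch of values.
def accuracy(target, output):
    preds = output.argmax(dim=1)     # predicted class per sample
    labels = target.argmax(dim=1)    # true class, assuming one-hot targets
    return (preds == labels).float().mean()

# Example with a batch of 50 samples and 10 classes:
out = torch.randn(50, 10)
tgt = torch.nn.functional.one_hot(torch.randint(0, 10, (50,)), 10).double()
print(accuracy(tgt, out).item())     # a single float in [0, 1]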