Memory keeps rising when training an FFNN in PyTorch
I have a feed-forward neural network which classifies the MNIST data set.
For some reason the memory usage keeps approaching 99% no matter how big the batch size is.
Nothing is growing in size: every dynamic variable is overwritten after the first epoch, yet even after epoch 70 the memory keeps rising.
I'm running it on 8 GB of RAM, a 2.8 GHz Intel i5 (7th gen) quad core, Ubuntu 18.04.
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

# xtrain, ytrain, xval, yval, net, layers, num_features, criterion and
# accuracy are defined elsewhere in the script.

batch_size = 50    # Number of x's we pass through the net at each iteration
num_epochs = 100   # Number of times the entire training data is through the net

n_train = len(xtrain)
n_batch_train = n_train // batch_size
n_val = len(xval)
n_batch_val = n_val // batch_size

# loss/acc
train_acc, train_loss = [], []
val_acc, val_loss = [], []
test_acc, test_loss = [], []

# Get parameters from the net
par = []
for i in range(len(layers) - 1):
    par = par + list(net.L[i].parameters())

# Optimizer
optimizer = optim.Adam(par, lr=0.001)

# Interval of x
get_slice = lambda i, size: range(i * size, (i + 1) * size)

for e in range(num_epochs):
    curr_loss = 0
    net.train()
    for i in range(n_batch_train):
        x_interval = get_slice(i, batch_size)   # unused; slze below is identical
        slze = get_slice(i, batch_size)
        # Batchnorm
        bn = nn.BatchNorm1d(num_features=num_features)
        x_batch = bn(Variable(torch.from_numpy(xtrain[slze])))
        out = net(x_batch).double()
        target_batch = Variable(torch.from_numpy(ytrain[slze]).double())
        L = criterion(out, target_batch)
        # Update gradients
        optimizer.zero_grad()
        L.backward()
        optimizer.step()
        # Store training accuracy and loss
        train_acc.append(accuracy(target_batch, out).data)
        train_loss.append(L.data.numpy())

    #### Validate ####
    net.eval()
    for j in range(n_batch_val):
        slze = get_slice(j, batch_size)
        val_batch = Variable(torch.from_numpy(xval[slze]))
        val_out = net(bn(val_batch)).double()
        target_batch = Variable(torch.from_numpy(yval[slze]).double())
        # Store val acc and loss
        val_acc.append(accuracy(target_batch, val_out).data)
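For reference, here is a minimal sketch of how the per-batch bookkeeping could be restructured so that the stored values are plain Python floats rather than tensors that may still reference the autograd graph, and so that the BatchNorm1d layer is built once instead of once per batch. It reuses the same externally defined net, criterion, accuracy, optimizer and data arrays from the snippet above, and is an illustration under those assumptions, not the confirmed fix:

import torch
import torch.nn as nn

# Sketch only: net, criterion, accuracy, optimizer, get_slice, xtrain, ytrain,
# num_features, num_epochs, n_batch_train and batch_size are assumed to be
# defined exactly as in the snippet above.

bn = nn.BatchNorm1d(num_features=num_features)   # build once, not per batch

train_acc, train_loss = [], []
for e in range(num_epochs):
    net.train()
    for i in range(n_batch_train):
        slze = get_slice(i, batch_size)
        x_batch = bn(torch.from_numpy(xtrain[slze]).float())
        out = net(x_batch).double()
        target_batch = torch.from_numpy(ytrain[slze]).double()

        loss = criterion(out, target_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # .item()/float() return plain Python numbers, so nothing kept in
        # these lists holds a reference to this batch's computation graph
        train_acc.append(float(accuracy(target_batch, out)))
        train_loss.append(loss.item())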
python memory neural-network pytorch
asked Nov 8 at 14:12 by Jakob, edited Nov 8 at 14:48
Can you please look over your code formatting again? There seems to be an indentation error for one of your for-loops. – dennlinger, Nov 8 at 14:32

Done! I think I might've located the error: train_acc, train_loss = [], [] was not reset after each epoch. – Jakob, Nov 8 at 14:48

But it has little to no impact on the memory. – Jakob, Nov 8 at 14:58

Is train_acc.append(accuracy(target_batch, out).data) computing the single batch accuracy, or are you calculating the values for every data point (and appending the full list)? If the latter is the case, this could indeed be a problem, unless you now reset train_acc. Also, what size of data set are we talking about? 1000 images? 1 million? – dennlinger, Nov 9 at 8:30
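On the point raised in the last comment: if accuracy reduces each batch to a single number, the lists only grow by a few floats per batch, which is negligible; the real risk is appending tensors that keep their graphs alive, or appending per-example tensors. A hypothetical batch_accuracy helper that returns one Python float per batch, together with a validation loop wrapped in torch.no_grad() so no graph is built at all, could look like the sketch below. It assumes the labels are integer class indices (adjust if the targets are one-hot) and that bn, net, get_slice, xval, yval, n_batch_val, batch_size and val_acc are defined as above:

import torch

def batch_accuracy(target, out):
    # Hypothetical helper: reduces one batch to a single Python float.
    # Assumes `out` holds class scores and `target` holds integer labels.
    preds = out.argmax(dim=1)
    return (preds == target.long()).float().mean().item()

net.eval()
bn.eval()                               # use running stats, not batch stats
with torch.no_grad():                   # no autograd graph is built here
    for j in range(n_batch_val):
        slze = get_slice(j, batch_size)
        val_batch = bn(torch.from_numpy(xval[slze]).float())
        val_out = net(val_batch).double()
        target_batch = torch.from_numpy(yval[slze]).double()
        val_acc.append(batch_accuracy(target_batch, val_out))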