Skip to content

Data loading error while running DMM (dev branch) #788

@sameerkhurana10

Description

@sameerkhurana10

hi,

pyro=0.1.2
pytorch=0.4.0a0+5eefe87

Running DMM on the dev branch I got the below logs:

aconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/poutine/trace.py:17: UserWarning: Encountered NAN log_pdf at site 'z_89'
  warnings.warn("Encountered NAN log_pdf at site '{}'".format(name))
/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/poutine/trace.py:17: UserWarning: Encountered NAN log_pdf at site 'z_90'
  warnings.warn("Encountered NAN log_pdf at site '{}'".format(name))
/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/poutine/trace.py:17: UserWarning: Encountered NAN log_pdf at site 'z_97'
  warnings.warn("Encountered NAN log_pdf at site '{}'".format(name))
/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/poutine/trace.py:17: UserWarning: Encountered NAN log_pdf at site 'z_99'
  warnings.warn("Encountered NAN log_pdf at site '{}'".format(name))
/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/poutine/trace.py:17: UserWarning: Encountered NAN log_pdf at site 'z_91'
  warnings.warn("Encountered NAN log_pdf at site '{}'".format(name))
/data/sls/u/sameerk/anacond......

Then

[training epoch 0001]  nan 				(dt = 8.369 sec)
[training epoch 0002]  nan 				(dt = 8.041 sec)
[training epoch 0003]  nan 				(dt = 8.240 sec)
[training epoch 0004]  nan 				(dt = 8.217 sec)
[training epoch 0005]  nan 				(dt = 8.588 sec)
[training epoch 0006]  nan 				(dt = 8.512 sec)
[training epoch 0007]  nan 				(dt = 8.644 sec)
[training epoch 0008]  nan 				(dt = 8.434 sec)
[training epoch 0009]  nan 				(dt = 9.041 sec)
[training epoch 0010]  nan 				(dt = 8.911 sec)

I stopped the program and Ran it again, did not face the same issue:

Namespace(annealing_epochs=1000, beta1=0.96, beta2=0.999, checkpoint_freq=0, clip_norm=20.0, cuda=False, iaf_dim=100, learning_rate=0.0004, load_model='', load_opt='', log='dmm.log', lr_decay=0.99996, mini_batch_size=20, minimum_annealing_factor=0.1, num_epochs=5000, num_iafs=0, rnn_dropout_rate=0.1, save_model='', save_opt='', weight_decay=0.6)
N_train_data: 229     avg. training seq. length: 60.29    N_mini_batches: 12
[training epoch 0000]  61.9873 				(dt = 9.196 sec)
[training epoch 0001]  51.8227 				(dt = 8.735 sec)
[training epoch 0002]  25.2403 				(dt = 9.289 sec)
[training epoch 0003]  16.1608 				(dt = 8.802 sec)
[training epoch 0004]  14.1899 				(dt = 9.282 sec)

PS: I was getting the following error when running DMM:

Traceback (most recent call last):
  File "dmm.py", line 430, in <module>
    main(args)
  File "dmm.py", line 390, in main
    epoch_nll += process_minibatch(epoch, which_mini_batch, shuffled_indices)
  File "dmm.py", line 350, in process_minibatch
    mini_batch_seq_lengths, annealing_factor)
  File "/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/infer/svi.py", line 97, in step
    loss = self.loss_and_grads(self.model, self.guide, *args, **kwargs)
  File "/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/infer/trace_elbo.py", line 133, in loss_and_grads
    for weight, model_trace, guide_trace, log_r in self._get_traces(model, guide, *args, **kwargs):
  File "/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/infer/trace_elbo.py", line 74, in _get_traces
    guide_trace = poutine.trace(guide).get_trace(*args, **kwargs)
  File "/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/poutine/trace_poutine.py", line 250, in get_trace
    self(*args, **kwargs)
  File "/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/poutine/trace_poutine.py", line 238, in __call__
    ret = super(TracePoutine, self).__call__(*args, **kwargs)
  File "/data/sls/u/sameerk/anaconda3/envs/nova-cpu-bedge/lib/python3.5/site-packages/pyro_ppl-0.1.2-py3.5.egg/pyro/poutine/poutine.py", line 147, in __call__
    return self.fn(*args, **kwargs)
  File "dmm.py", line 226, in guide
    rnn_output = poly.pad_and_reverse(rnn_output, mini_batch_seq_lengths)
  File "/data/sls/u/sameerk/repos/pyro/examples/dmm/polyphonic_data_loader.py", line 88, in pad_and_reverse
    reversed_output = reverse_sequences_torch(rnn_output, seq_lengths)
  File "/data/sls/u/sameerk/repos/pyro/examples/dmm/polyphonic_data_loader.py", line 78, in reverse_sequences_torch
    else Variable(torch.LongTensor(time_slice))
RuntimeError: tried to construct a tensor from a int sequence, but found an item of type numpy.int64 at index (0)

I fixed it by adding the line:

time_slice = time_slice.tolist() in polyphonic_data_loader.py

Any comments on this behaviour?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions