Custom dataset op encounters refcount error
I'm trying to implement a custom dataset op so that developing extensions to tf.data does not require recompiling the whole TensorFlow codebase. As a start, I'm implementing an identity dataset op that does nothing but pass through the data from its input.
After compiling it according to the custom op tutorial, the dataset successfully outputs data, but it raises an error on destruction. The error message is:
% python3 identity_dataset_op.py
2020-05-29 19:09:04.290680: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-29 19:09:04.304126: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fd634565fb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-29 19:09:04.304140: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
tf.Tensor(1, shape=(), dtype=int32)
2020-05-29 19:09:04.308089: F /usr/local/lib/python3.7/site-packages/tensorflow_core/include/tensorflow/core/lib/core/refcount.h:90] Check failed: ref_.load() == 0 (1 vs. 0)
zsh: abort python3 identity_dataset_op.py
The problem is that the refcount has not reached 0 when the program enters the destructor, which causes the destructor of RefCounted (the base class of all Dataset classes) to abort.
However, I believe the destructor of a dataset should only be called once its refcount reaches 0. I wonder whether this error is connected to some functionality that a custom op cannot use. It would also be really helpful if you could tell me whether it is possible to make a custom dataset op at all.
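To make the invariant behind the failed check concrete, here is a minimal sketch of a reference-counted base class whose destructor verifies that the count is 0. It is not TensorFlow's RefCounted: SketchRefCounted, SketchDataset, and DestroyedOnlyOnLastUnref are hypothetical names made up for illustration, and the only thing taken from the log above is the shape of the check (ref_.load() == 0).

```cpp
#include <atomic>
#include <cassert>

// Simplified stand-in for a reference-counted base class (hypothetical; this
// is NOT TensorFlow's RefCounted, it only mirrors the invariant in the log).
class SketchRefCounted {
 public:
  SketchRefCounted() : ref_(1) {}  // The creator holds the first reference.

  // Adds one owner.
  void Ref() const { ref_.fetch_add(1, std::memory_order_relaxed); }

  // Drops one owner. Deletes the object and returns true if it was the last.
  bool Unref() const {
    if (ref_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      delete this;
      return true;
    }
    return false;
  }

 protected:
  // Protected so callers must release through Unref() instead of delete.
  // The assert plays the role of the "ref_.load() == 0" check
  // (it is compiled out when NDEBUG is defined).
  virtual ~SketchRefCounted() { assert(ref_.load() == 0); }

 private:
  mutable std::atomic<int> ref_;
};

// Hypothetical dataset-like subclass that records when it is destroyed.
class SketchDataset : public SketchRefCounted {
 public:
  explicit SketchDataset(bool* destroyed) : destroyed_(destroyed) {}

 protected:
  ~SketchDataset() override { *destroyed_ = true; }

 private:
  bool* destroyed_;
};

// Two owners share one dataset. Returns true only if the object survives the
// first Unref() and is destroyed by the second (last) one.
bool DestroyedOnlyOnLastUnref() {
  bool destroyed = false;
  auto* dataset = new SketchDataset(&destroyed);  // count == 1
  dataset->Ref();                                 // count == 2
  bool deleted_by_first = dataset->Unref();       // count == 1, still alive
  bool alive_after_first = !destroyed;
  bool deleted_by_second = dataset->Unref();      // count == 0, deleted
  return !deleted_by_first && alive_after_first && deleted_by_second &&
         destroyed;
}
```

In this sketch the destructor only ever runs with a count of 0, because the sole path to it is the last Unref(). A "1 vs. 0" failure would mean the destructor was reached some other way while one reference was still recorded, for example through a direct delete. The sketch does not show which of these happened in the actual op.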
BTW, during debugging, I noticed that the MakeDataset function is called twice. Is there any reason for this?
Thank you so much for your time on this issue.
The code for the op is in this zip file. custom_dataset_op.zip
To run the test, simply compile the .cc file to identity_dataset_op.zip and run
Answer (aaudiber)
I looked through the code and nothing stood out that would cause the refcount issue. When you build with the TensorFlow codebase, are you building from the latest source or from the 1.15 branch? It's possible that there have been changes to reference counting since 1.15.