Guangda Lai aaroey @google We must know. We will know.

tensorflow/tensorrt 354

TensorFlow/TensorRT integration

tensorflow/ngraph-bridge 91

TensorFlow-nGraph bridge

aaroey/docs 1

TensorFlow documentation

aaroey/tensorflow 1

Computation using data flow graphs for scalable machine learning

chsigg/training 1

Reference implementations of MLPerf benchmarks

aaroey/aaroey-lib 0

My personal library

aaroey/benchmarks 0

Benchmark code

aaroey/Chariot 0

A cross-platform open-source reimplementation of the Age of Empires (1997) engine

aaroey/jekyll-pseudocode-b 0

A simple and trivial pseudocode formatter for Jekyll

aaroey/mcsema 0

Framework for lifting x86, amd64, and aarch64 program binaries to LLVM bitcode

issue comment tensorflow/tensorrt

Image Classification example with TensorRT 7 and TF 2.1 running into problems

@tfeher @bixia1 could you help to take a look?

mankeyboy

comment created time in 15 hours

push event aaroey/aaroey-lib

Guangda Lai

commit sha e313826950492abf9307ac0f31f9b69f47223ff9

Add more utilities, fix various snippets and configs

view details

push time in 2 days

started google/tcmalloc

started time in 8 days

issue comment tensorflow/tensorflow

Allow TrtGraphConverterV2 to accept Frozen Graph input as well as saved_model

@mankeyboy frozen graphs are not supported in TF 2.0. Could you instead build a SavedModel from your frozen graph and then use the converter?
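For reference, here is a minimal sketch of that suggestion (not code from this thread): wrap the frozen GraphDef in a SavedModel using the TF1 compat APIs, then run it through TrtGraphConverterV2. The file paths and the tensor names 'input:0'/'output:0' are placeholders for your own model.

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen GraphDef (placeholder path).
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile('frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Wrap it in a SavedModel with explicit input/output signatures.
graph = tf.Graph()
with graph.as_default():
    tf.compat.v1.import_graph_def(graph_def, name='')
    with tf.compat.v1.Session(graph=graph) as sess:
        tf.compat.v1.saved_model.simple_save(
            sess, 'saved_model_dir',
            inputs={'input': graph.get_tensor_by_name('input:0')},
            outputs={'output': graph.get_tensor_by_name('output:0')})

# Convert the SavedModel with the TF2 converter.
converter = trt.TrtGraphConverterV2(input_saved_model_dir='saved_model_dir')
converter.convert()
converter.save('trt_saved_model_dir')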

mankeyboy

comment created time in 8 days


issue comment taichi-dev/taichi

Not able to reproduce sparse convolution performance

Thanks for the update! Will try it out once it's ready.

aaroey

comment created time in 9 days

issue comment tensorflow/tensorrt

Object Detection example with TRT7 and TF2.1 issues

@tfeher could you help to take a look at this? Also @bixia1

mankeyboy

comment created time in 9 days


issue comment tensorflow/tensorflow

TensorRT plugins

@soldierofhell do you want to support plugins for custom ops, or plugins for a TF subgraph?

@sanjoy In order to support the second case, I think we'll need a way for the user to specify the subgraph (e.g. inputs + outputs), which is tricky since TF2 doesn't retain the node names during function instantiation.

soldierofhell

comment created time in 12 days

issue comment taichi-dev/taichi

Not able to reproduce sparse convolution performance

Thanks for the update. Looking forward to the new backend!

aaroey

comment created time in 19 days

issue comment tensorflow/tensorflow

TensorRT Segmentation Fault During Conversion

@bixia1 @sanjoy could you help to investigate this?

arielbenitah

comment created time in 19 days

issue closed tensorflow/models

Tensorflow-TensorRT "Engine buffer is full"


System information

  • What is the top-level directory of the model you are using: object_detection
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): 1.12
  • Bazel version (if compiling from source): n/a
  • CUDA/cuDNN version: 10/7
  • GPU model and memory: Jetson Xavier 16GB shared w/ RAM
  • Exact command to reproduce: sess.run()

Describe the problem

I'm trying to convert the frozen weights from faster_rcnn_resnet50_coco to TensorRT and I'm getting the following error when I call session.run():

2018-12-06 12:21:53.405304: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=100
2018-12-06 12:21:53.405458: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_1

Is this a bug? If not, what is the cause of this error?

Source code / logs

Here is a minimal example of the code:

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=4000000000,
    precision_mode='FP16',
    minimum_segment_size=50
)

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')
scores, boxes, classes, num_detections = tf_sess.run([tf_scores, tf_boxes, tf_classes, tf_num_detections], feed_dict={
    tf_input: image_resized[None, ...]
})

I'm following this guide in an NVIDIA repo if you want to see the complete code. Supposedly someone else got the same faster-rcnn working so it should work.

closed time in 23 days

atyshka

issue comment taichi-dev/taichi

Not able to reproduce sparse convolution performance

@yuanming-hu any update on this?

aaroey

comment created time in 25 days

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 Status Converter::RenameAndMarkOutputTensors(
 }
 
 Status Converter::BuildCudaEngine(
-    TrtUniquePtrType<nvinfer1::ICudaEngine>* engine, int max_batch_size,
-    size_t max_workspace_size_bytes, nvinfer1::IGpuAllocator* allocator,
-    TRTInt8Calibrator* calibrator) {
+    TrtUniquePtrType<nvinfer1::ICudaEngine>* engine,
+    int max_batch_size, size_t max_workspace_size_bytes,
+    nvinfer1::IGpuAllocator* allocator, TRTInt8Calibrator* calibrator,
+    TrtShapeOptimizationProfile& profiles) {

Use pointer instead.

pooyadavoodi

comment created time in 25 days

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 class TRTEngineResourceOpsTest : public OpsTestBase {
     network->markOutput(*output);
 
     // Build the engine
+    TrtUniquePtrType<nvinfer1::IBuilderConfig> builder_config(
+          builder->createBuilderConfig());
+
     builder->setMaxBatchSize(1);
-    builder->setMaxWorkspaceSize(1 << 10);
+    builder_config->setMaxWorkspaceSize(1 << 10);
+
+    if (dynamic) {
+      // Create three profiles
+      for (int i=1; i<=3; i++) {
+        auto* profile = builder->createOptimizationProfile();
+        nvinfer1::Dims dim;
+        dim.nbDims = 1;
+        dim.d[0] = i;
+        profile->setDimensions(inName, nvinfer1::OptProfileSelector::kMIN, dim);
+        profile->setDimensions(inName, nvinfer1::OptProfileSelector::kOPT, dim);
+        profile->setDimensions(inName, nvinfer1::OptProfileSelector::kMAX, dim);
+        int idx = builder_config->addOptimizationProfile(profile);
+        EXPECT_NE(-1, idx);
+      }
+    }
     TrtUniquePtrType<nvinfer1::ICudaEngine> engine(
-        builder->buildCudaEngine(*network));
+        builder->buildEngineWithConfig(*network, *builder_config));
+
     EXPECT_NE(nullptr, engine);
     return engine;
   }
+  Logger logger;

logger_

pooyadavoodi

comment created time in 25 days

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 class TRTEngineResourceOpsTest : public OpsTestBase {
     inputs_.clear();
   }
 
-  TrtUniquePtrType<nvinfer1::ICudaEngine> CreateTRTEngine() {
-    Logger logger;
+  TrtUniquePtrType<nvinfer1::ICudaEngine> CreateTRTEngine(bool dynamic=false) {
     TrtUniquePtrType<nvinfer1::IBuilder> builder(
         nvinfer1::createInferBuilder(logger));
-    TrtUniquePtrType<nvinfer1::INetworkDefinition> network(
-        builder->createNetwork());
-
+    TrtUniquePtrType<nvinfer1::INetworkDefinition> network;
+    if (!dynamic) {
+      network = TrtUniquePtrType<nvinfer1::INetworkDefinition>(
+          builder->createNetwork());
+    } else {
+      network = TrtUniquePtrType<nvinfer1::INetworkDefinition>(
+          builder->createNetworkV2( 1U << static_cast<int>(
+              nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
+    }
     // Add the input.
     nvinfer1::Dims dims;
     dims.nbDims = 1;
-    dims.d[0] = 1;
+    dims.d[0] = dynamic ? -1 : 1;
+    const char *inName = "input";

in_name

pooyadavoodi

comment created time in 25 days

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 def build(self, input_fn):
     Args:
       input_fn: a generator function that yields input data as a list or tuple,
         which will be used to execute the converted signature to generate TRT
-        engines.
-        Example: `def input_fn(): yield input1, input2, input3`
+        engines. Example:
+        `def input_fn():
+             # Let's assume a network with 2 input tensors. We generate 3 sets
+             # of dummy input data:
+             input_shapes = [[(1, 16), (2, 16)], # 1st input list
+                             [(2, 32), (4, 32)], # 2nd list of two tensors
+                             [(4, 32), (8, 32)]] # 3rd input list
+             for shapes in input_shapes:
+                 # return a list of input tensors
+                 yield [np.zeros(x).astype(np.float32) for x in shapes]`
     """
+    def _rebuild_func():
+      # Rebuild function from graph_def.
+      reset_converted_func = wrap_function.function_from_graph_def(
+          self._converted_graph_def,
+          [tensor.name for tensor in self._converted_func.inputs],
+          [tensor.name for tensor in self._converted_func.outputs])
+      reset_converted_func.graph.structured_outputs = nest.pack_sequence_as(
+          self._converted_func.graph.structured_outputs,
+          reset_converted_func.graph.structured_outputs)
+      self._converted_func = reset_converted_func
+
+    def _set_profile_generation_mode(value, node):
+      node.attr["_profile_generation_mode"].b = value
+
+    if self._need_trt_profiles:
+      # Enable profile generation.
+      self._for_each_trt_node(self._converted_graph_def,
+                              partial(_set_profile_generation_mode, True))
+      _rebuild_func()

As discussed offline, please make save() report an error if the user set _need_trt_profiles to True but didn't call build().

pooyadavoodi

comment created time in a month

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 Status CreateTRTNode(const ConversionParams& params,
     // Create static engine for fp32/fp16 mode.
     TrtUniquePtrType<nvinfer1::ICudaEngine> engine;
     // TODO(sami): What happens if 1st dim is not batch?
+    // We pass an empty profiles object to ConvertGraphDefToEngine.
+    TrtShapeOptimizationProfile profile;

Move this inside ConvertGraphDefToEngine()?

pooyadavoodi

comment created time in 25 days

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 Status ConvertGraphDefToEngine(
     nvinfer1::ILogger* trt_logger, nvinfer1::IGpuAllocator* allocator,
     TRTInt8Calibrator* calibrator,
     TrtUniquePtrType<nvinfer1::ICudaEngine>* engine, bool use_calibration,
-    const bool use_implicit_batch, bool* convert_successfully) {
+    const bool use_implicit_batch, bool* convert_successfully,
+    TrtShapeOptimizationProfile& profiles) {

Use pointer instead.

pooyadavoodi

comment created time in 25 days

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 TRTEngineOp::TRTEngineOp(OpKernelConstruction* context)
             << ", thus setting _use_implicit_batch=true";
     use_implicit_batch_ = true;
   }
+  status = context->GetAttr("_profile_generation_mode", &profile_generation_mode_);
+  if (status.code() == tensorflow::error::NOT_FOUND) {
+    VLOG(2) << "Not found _profile_generation_mode in " << context->device()->name()
+            << ", thus setting _profile_generation_mode=false";
+    profile_generation_mode_ = false;
+  }
+  if (use_implicit_batch_) {

nit: merge this if with the one below.

pooyadavoodi

comment created time in 25 days

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 TEST_F(TRTEngineResourceOpsTest, Basic) {
   resource->Unref();
 }
 
+TEST_F(TRTEngineResourceOpsTest, Profiles) {

This test is similar to the one above, can we merge them?

pooyadavoodi

comment created time in 25 days

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 def build(self, input_fn):
     Args:
       input_fn: a generator function that yields input data as a list or tuple,
         which will be used to execute the converted signature to generate TRT
-        engines.
-        Example: `def input_fn(): yield input1, input2, input3`
+        engines. Example:
+        `def input_fn():
+             # Let's assume a network with 2 input tensors. We generate 3 sets
+             # of dummy input data:
+             input_shapes = [[(1, 16), (2, 16)], # 1st input list
+                             [(2, 32), (4, 32)], # 2nd list of two tensors
+                             [(4, 32), (8, 32)]] # 3rd input list
+             for shapes in input_shapes:
+                 # return a list of input tensors
+                 yield [np.zeros(x).astype(np.float32) for x in shapes]`
     """
+    def _rebuild_func():
+      # Rebuild function from graph_def.
+      reset_converted_func = wrap_function.function_from_graph_def(
+          self._converted_graph_def,
+          [tensor.name for tensor in self._converted_func.inputs],
+          [tensor.name for tensor in self._converted_func.outputs])
+      reset_converted_func.graph.structured_outputs = nest.pack_sequence_as(
+          self._converted_func.graph.structured_outputs,
+          reset_converted_func.graph.structured_outputs)
+      self._converted_func = reset_converted_func
+
+    def _set_profile_generation_mode(value, node):
+      node.attr["_profile_generation_mode"].b = value
+
+    if self._need_trt_profiles:
+      # Enable profile generation.
+      self._for_each_trt_node(self._converted_graph_def,
+                              partial(_set_profile_generation_mode, True))
+      _rebuild_func()

Please document why we need to rebuild at the very beginning.

pooyadavoodi

comment created time in a month

Pull request review comment tensorflow/tensorflow

Enable explicit batch mode with optimization profiles

 def build(self, input_fn):
     Args:
       input_fn: a generator function that yields input data as a list or tuple,
         which will be used to execute the converted signature to generate TRT
-        engines.
-        Example: `def input_fn(): yield input1, input2, input3`
+        engines. Example:
+        `def input_fn():
+             # Let's assume a network with 2 input tensors. We generate 3 sets
+             # of dummy input data:
+             input_shapes = [[(1, 16), (2, 16)], # 1st input list
+                             [(2, 32), (4, 32)], # 2nd list of two tensors
+                             [(4, 32), (8, 32)]] # 3rd input list
+             for shapes in input_shapes:
+                 # return a list of input tensors
+                 yield [np.zeros(x).astype(np.float32) for x in shapes]`
     """
+    def _rebuild_func():
+      # Rebuild function from graph_def.
+      reset_converted_func = wrap_function.function_from_graph_def(
+          self._converted_graph_def,
+          [tensor.name for tensor in self._converted_func.inputs],
+          [tensor.name for tensor in self._converted_func.outputs])
+      reset_converted_func.graph.structured_outputs = nest.pack_sequence_as(
+          self._converted_func.graph.structured_outputs,
+          reset_converted_func.graph.structured_outputs)
+      self._converted_func = reset_converted_func
+
+    def _set_profile_generation_mode(value, node):
+      node.attr["_profile_generation_mode"].b = value
+
+    if self._need_trt_profiles:
+      # Enable profile generation.
+      self._for_each_trt_node(self._converted_graph_def,
+                              partial(_set_profile_generation_mode, True))
+      _rebuild_func()
+
+    # Run inference:
+    #   Builds TRT engines if self._need_trt_profiles is False.
+    #   Builds TRT optimization profiles if self._need_trt_profiles is True.
+    inputs = []
     for inp in input_fn():
+      inputs.append(inp)

This will probably cause OOM. Can we keep only one of them? Also, why is one input enough? Please comment.

pooyadavoodi

comment created time in a month

issue comment tensorflow/tensorflow

TF-TRT batchSize > 0 && batchSize <= MAX_BATCH_SIZE

@bixia1 could you help to take a look at the repro above? Thanks.

sgambient

comment created time in a month

issue comment tensorflow/tensorrt

UnavailableError: Can't provision more than one single cluster at a time

@sanjoy @bixia1 could you help to investigate this?

leo-XUKANG

comment created time in a month

Pull request review comment tensorflow/docs

Update GPU install Ubuntu 16.04

 complicates installation of the NVIDIA driver and is beyond the scope of these i
 # Install development and runtime libraries (~4GB)
 <code class="devsite-terminal">sudo apt-get install --no-install-recommends \
-    cuda-10-0 \
+    cuda-10-1 \
     libcudnn7=7.6.4.38-1+cuda10.1  \
     libcudnn7-dev=7.6.4.38-1+cuda10.1
 </code>
 
 # Install TensorRT. Requires that libcudnn7 is installed above.
-<code class="devsite-terminal">sudo apt-get install -y --no-install-recommends libnvinfer5=6.0.1-1+cuda10.1 \
-    libnvinfer-dev=6.0.1-1+cuda10.1
+<code class="devsite-terminal">sudo apt-get install -y --no-install-recommends \
+    libnvinfer5=6.0.1-1+cuda10.1 \

Should be libnvinfer6=xxx.

lamberta

comment created time in a month

started sympy/sympy

started time in a month

issue comment taichi-dev/taichi

Not able to reproduce sparse convolution performance

Hi @yuanming-hu,

Thanks for the quick response. I tried that but am still not able to get things working. Here is what I did:

python3 -m pip install astpretty astor pytest opencv-python pybind11==2.2.4
wget https://raw.githubusercontent.com/yuanming-hu/taichi/legacy/install.py
python3 install.py
ti install https://github.com/yuanming-hu/taichi_lang
source ~/.bashrc
export PYTHONPATH=$TAICHI_REPO_DIR/projects/taichi_lang/python:$PYTHONPATH
ti test  # The tests passed.

However, when I tried ti cnn opt=[t/f] cache_l1=[t/f], I still got the same "cnn not found" error.

I realized that install.py will check out the legacy branch, so I tried to manually check out #dc162e11 and run python3 install.py again, but then I got:

Warning: module [lang_core] loading failed: /home/taichi/build/libtaichi_lang_core.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):          
  File "<string>", line 1, in <module>      
  File "/home/taichi/python/taichi/__init__.py", line 13, in <module>
    from taichi.lang import *               
  File "/home/taichi/python/taichi/lang/__init__.py", line 1, in <module>
    from .impl import *                                                                               
  File "/home/taichi/python/taichi/lang/impl.py", line 2, in <module>
    from .core import taichi_lang_core                                                                
  File "/home/taichi/python/taichi/lang/core.py", line 12, in <module>
    import taichi_lang_core                                                                           
ImportError: No module named 'taichi_lang_core'                                                       
  Error: installation failed. 

I tried the installation script from head and from dc162e11; the outcome was the same.

I understand the difficulty of making a legacy installation work, so I'm happy to wait. Do you have an estimate of when it'll be ready again with the new backend?

Thanks.

aaroey

comment created time in a month

issue opened taichi-dev/taichi

Not able to reproduce sparse convolution performance

Describe the bug

I was not able to reproduce the numbers mentioned in the Taichi paper, section 6.4 (3D Convolutional Neural Networks).

Screenshots

$ pip install taichi-nightly-cuda-10-0    
$ ti cnn opt=[t/f] cache_l1=[t/f]
[Release mode]
[T 01/09/20 15:17:10.485] [logging.cpp:Logger@68] Taichi core started. Thread ID = 134274
[Taichi version 0.3.20, cuda 10.0, commit 1c85d8e1]
...                                                    
[E 01/09/20 15:17:10.496] [task.h:create@29] Implementation [task::cnn] not found!

To Reproduce

The commands to reproduce are listed above. I also realized that sparse computation is not available at head at the moment, so I also tried to install the legacy version by following this doc, but got a similar problem.

created time in a month

started traveller59/spconv

started time in a month

pull request comment tensorflow/tensorflow

Fix saved_model_cli tensorrt conversion

@wdirons could you help to fix the sanity errors:

FAIL: Found 4 non-whitelisted pylint errors:
tensorflow/python/tools/saved_model_cli.py:768: [C0330(bad-continuation), ] Wrong hanging indentation (remove 2 spaces).

tensorflow/python/tools/saved_model_cli.py:769: [C0330(bad-continuation), ] Wrong hanging indentation (remove 2 spaces).

tensorflow/python/tools/saved_model_cli.py:770: [C0330(bad-continuation), ] Wrong hanging indentation (remove 2 spaces).

tensorflow/python/tools/saved_model_cli.py:772: [C0301(line-too-long), ] Line too long (85/80)

wdirons

comment created time in a month

issue comment tensorflow/tensorflow

Failed to build TF2.0 with TensorRT: undefined symbol: _ZN15stream_executor14StreamExecutor18EnablePeerAccessToEPS0_

From the discussion this seems unrelated to TensorRT. Adding sanjoy@ to follow up.

jiapei100

comment created time in a month

fork aaroey/taichi

The Taichi programming language

http://taichi.graphics

fork in a month

push event aaroey/tensorrt

Guangda Lai

commit sha 4db3e4f0db491547d0554107705c740d858800e5

backup configs

view details

push time in a month

create branch aaroey/tensorrt

branch : r2.0-experimental

created branch time in a month

push event aaroey/serving

Guangda Lai

commit sha 1cfe08f9570d388d476b58a9efbdda46820ed87a

backup bazel setups

view details

Guangda Lai

commit sha 0aec64fdccb1fb6853fa2035e6982efa49876534

Merge branch 'master' of https://github.com/aaroey/serving

view details

push time in a month

started readthedocs/sphinx_rtd_theme

started time in a month

issue comment tensorflow/tensorflow

INT8 calibration error using TrtGraphConverterV2 in TensorFlow2.0

After making the above change, the code snippet mentioned by @ay27 works well for me.

ay27

comment created time in 2 months

issue comment tensorflow/tensorflow

INT8 calibration error using TrtGraphConverterV2 in TensorFlow2.0

@ay27 could you try letting the input_fn yield a tuple? e.g. yield tf.random.normal((1, 224, 224, 3)),
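To illustrate, a minimal sketch (not the exact code from this issue) of an input_fn whose yield is a tuple, used as the calibration input for INT8 conversion with TrtGraphConverterV2; 'saved_model_dir' and the input shape are placeholders.

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

def calibration_input_fn():
    for _ in range(10):
        # The trailing comma makes each yielded value a tuple of input tensors.
        yield tf.random.normal((1, 224, 224, 3)),

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.INT8, use_calibration=True)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='saved_model_dir', conversion_params=params)
converter.convert(calibration_input_fn=calibration_input_fn)
converter.save('trt_saved_model_dir')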

ay27

comment created time in 2 months

started yuanming-hu/difftaichi

started time in 2 months

pull request comment tensorflow/tensorflow

Fix saved_model_cli tensorrt conversion

Also @sanjoy @pooyadavoodi

wdirons

comment created time in 2 months

started LingDong-/wenyan-lang

started time in 2 months

started 996icu/996.ICU

started time in 2 months

started chinese-poetry/chinese-poetry

started time in 2 months

issue comment tensorflow/tensorrt

UnavailableError: Can't provision more than one single cluster at a time

@pooyadavoodi have you encountered a similar issue before? Also @bixia1

leo-XUKANG

comment created time in 2 months

issue closed tensorflow/tensorflow

tensorflow does not detect 2nd GPU

Hi,

I am trying to use 2 GPUs, but TensorFlow does not recognise the 2nd one. The 2nd GPU is working fine (in a Windows environment).

Using the TensorFlow example tf.keras.utils.multi_gpu_model: https://www.tensorflow.org/api_docs/python/tf/keras/utils/multi_gpu_model

I see the following error

ValueError: To call multi_gpu_model with gpus=2, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/gpu:0']. Try reducing gpus.

I can switch between the two graphics cards with CUDA_VISIBLE_DEVICES, which sets one of the GPUs as GPU 0; however, both cannot be used together. Device 0 and device 1 are each recognised, but only one GPU is recognised at a time.

The main GPU is an RTX 2070 (8GB) and the 2nd GPU is a GTX 1050 (2GB). Before submitting I spent some time searching for a solution and tried whatever I could find on the internet. Drivers are up to date, and the 64-bit and latest versions of the software are installed. I don't see any issue besides the 2nd GPU not appearing. The code works fine on the first GPU, and both GPUs have > 3.5 compute capability.

closed time in 2 months

cyrus2018

issue comment tensorflow/tensorrt

OP_REQUIRES failed at partitioned_ops, but tensorRT model can be loaded

@luvwinnie could you try with tensorflow/serving:latest-gpu?

luvwinnie

comment created time in 2 months

issue closed tensorflow/tensorflow

The TF function for the TRT segment could not be empty

-- System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): YES
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): when I convert the pb to TensorRT, I use tf-nightly-gpu-1.15 from pip. When I run inference, I use tf-1.14.0 built from source.
  • TensorFlow version (use command below): tf-1.14.0
  • Python version: 3.5
  • Bazel version (if compiling from source): 0.24.1
  • GCC/Compiler version (if compiling from source): 5.4
  • CUDA/cuDNN version: cuda 10 cudnn 17
  • GPU model and memory: 1060Ti

Describe the current problem

Hi, I successfully used trt_convert to convert my pb model to a TensorRT plan and I want to use C++ to run inference on my model, so I used Bazel to compile TF together with TensorRT. With my code, the pb model can run successfully. Using the same code, the TensorRT model can load successfully, but when the session runs, TF gives me the following error:

2019-09-06 06:56:37.455586: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: The TF function for the TRT segment could not be empty [[{{node fa_layer4_c0/TRTEngineOp_108}}]]

When I convert the pb to TensorRT, the following is shown:

2019-09-06 06:25:59.150320: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Tensor DataType is determined at build time for tensors not marked as input or output.
2019-09-06 06:25:59.183609: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:831] TensorRT node fa_layer4/TRTEngineOp_107 added for segment 107 consisting of 3 nodes succeeded.
2019-09-06 06:25:59.189007: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:831] TensorRT node fa_layer4_c0/TRTEngineOp_108 added for segment 108 consisting of 2 nodes succeeded.
2019-09-06 06:25:59.189378: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Tensor DataType is determined at build time for tensors not marked as input or output.
2019-09-06 06:25:59.189403: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:41] DefaultLogger Network must have at least one output
2019-09-06 06:25:59.189426: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:834] TensorRT node fa_layer4_c0/conv0/bn/cond_1/TRTEngineOp_109 added for segment 109 consisting of 4 nodes failed: Internal: Failed to build TensorRT engine. Fallback to TF...

Additionally, I can run inference with the TensorRT model from Python code (Python TF installed from pip). Could anyone provide some ideas on how to solve this?

closed time in 2 months

double344931987

issue comment tensorflow/tensorflow

The TF function for the TRT segment could not be empty

Thanks @superhg2012.

@double344931987 I'm closing this, feel free to reopen with a repro (model+script) if you're still experiencing this issue. Thanks.

double344931987

comment created time in 2 months

issue comment tensorflow/tensorflow

TensorRT Segmentation Fault During Conversion

@arielbenitah sorry about the trouble. It seems you're using efficientnet, is it possible for you to share the model for investigation purposes?

arielbenitah

comment created time in 2 months

issue comment tensorflow/tensorflow

Tensorflow failed to find 'TRTEngineOp' when building pip-package and libtensorflow_cc.so

Hi @qcraftai, I can see a dependency chain based on:

bazel query 'somepath(tensorflow:libtensorflow_cc.so, tensorflow/compiler/tf2tensorrt:trt_conversion)'

When you run bazel build on tensorflow:libtensorflow_cc.so, can you try adding --config=tensorrt? Or make sure TensorRT is configured during the build by checking that environment variables like TF_NEED_TENSORRT and TENSORRT_INSTALL_PATH are set?

qcraftai

comment created time in 2 months

issue comment tensorflow/tensorflow

Library Conversion: TensorRT

@sayakpaul could you share the crash log?

dynamicwebpaige

comment created time in 2 months

issue comment tensorflow/tensorflow

tf-trt using error

Hi @andyqian2015, may I know if your trtfilepath is a TF-TRT converted model? If so, could you describe in more detail how you run the TF-TRT conversion? Also, is it possible to share the model file? Thanks.

Also @pooyadavoodi @sanjoy

andyqian2015

comment created time in 2 months

pull request comment tensorflow/tensorflow

Set TRT network name to builder configs

@pooyadavoodi could you please help to resolve the conflict, and make sure all the tests pass at head with this PR? Thanks.

pooyadavoodi

comment created time in 2 months

issue comment tensorflow/tensorflow

Can not convert a TF2 saved model to a TensorRT engine and save it.

Hi @pooyadavoodi this seems to be similar to the issue you described before, did you get that resolved?

roborocklsm

comment created time in 2 months

issue comment tensorflow/tensorflow

tf.range + for x,y in dataset issue

Hi @jsimsa could you help to take a look? Thanks

SSSxCCC

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

Add additional padding layer to deconv converter to fix output_shape

 Status ConvertConv2DHelper(OpConverterParams* params, int group,
     conv_layer = layer;
   }
   nvinfer1::ITensor* output_tensor = conv_layer->getOutput(0);
-
+  // Add an extra padding for Deconv because TRT doesn't accept the
+  // argument output_shape and thus the TRT output shape could be wrong
+  // in case of strides>1.
+  if (is_conv2d_backprop_input) {
+    auto tf_output_shape = backprop_output_size.GetTrtDims();
+    nvinfer1::Dims trt_output_shape = output_tensor->getDimensions();
+    // What determines the padding size is the difference between the given
+    // input_sizes (tf_output_shape) and TRT computed size.
+    const int height_diff = tf_output_shape.d[h_index - 1] - trt_output_shape.d[1];

It seems the output is always NCHW? Then should it be d[2] instead?

pooyadavoodi

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

Add additional padding layer to deconv converter to fix output_shape

 Status ConvertConv2DHelper(OpConverterParams* params, int group,
     conv_layer = layer;
   }
   nvinfer1::ITensor* output_tensor = conv_layer->getOutput(0);
-
+  // Add an extra padding for Deconv because TRT doesn't accept the
+  // argument output_shape and thus the TRT output shape could be wrong
+  // in case of strides>1.
+  if (is_conv2d_backprop_input) {
+    auto tf_output_shape = backprop_output_size.GetTrtDims();
+    nvinfer1::Dims trt_output_shape = output_tensor->getDimensions();
+    // What determines the padding size is the difference between the given
+    // input_sizes (tf_output_shape) and TRT computed size.
+    const int height_diff = tf_output_shape.d[h_index - 1] - trt_output_shape.d[1];

Accordingly this should be tf_output_shape[h_index]? Similar below.

pooyadavoodi

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

Add additional padding layer to deconv converter to fix output_shape

 Status ConvertConv2DHelper(OpConverterParams* params, int group,
     conv_layer = layer;
   }
   nvinfer1::ITensor* output_tensor = conv_layer->getOutput(0);
-
+  // Add an extra padding for Deconv because TRT doesn't accept the
+  // argument output_shape and thus the TRT output shape could be wrong
+  // in case of strides>1.
+  if (is_conv2d_backprop_input) {
+    auto tf_output_shape = backprop_output_size.GetTrtDims();

I think this should be backprop_output_size.weights().GetValues(), but I'm fixing it. Just FYI.

pooyadavoodi

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

Add additional padding layer to deconv converter to fix output_shape

 Status ConvertConv2DHelper(OpConverterParams* params, int group,
     conv_layer = layer;
   }
   nvinfer1::ITensor* output_tensor = conv_layer->getOutput(0);
-
+  // Add an extra padding for Deconv because TRT doesn't accept the
+  // argument output_shape and thus the TRT output shape could be wrong
+  // in case of strides>1.
+  if (is_conv2d_backprop_input) {
+    auto tf_output_shape = backprop_output_size.GetTrtDims();
+    nvinfer1::Dims trt_output_shape = output_tensor->getDimensions();
+    // What determines the padding size is the difference between the given
+    // input_sizes (tf_output_shape) and TRT computed size.
+    const int height_diff = tf_output_shape.d[h_index - 1] - trt_output_shape.d[1];

Do we need to consider NHWC vs NCHW to determine whether to use trt_output_shape.d[1] or d[2]? Same below.

pooyadavoodi

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

Set TRT network name to builder configs

 Status Converter::BuildCudaEngine(
   TF_RETURN_IF_ERROR(TrtPrecisionModeToName(
       precision_mode_, &precision_mode_str));
   string trt_version_str = GetLoadedTensorRTVersion();
-  string trt_network_name =
-      "TF" + string(TF_VERSION_STRING) + "-" +
-      "TRT" + trt_version_str + "-" +
-      "Precision-" + precision_mode_str + "-" +
-      "Calibration-" + std::to_string(use_calibration_) + "-" +
-      "Max-Batch-Size-" + std::to_string(max_batch_size) + "-" +
-      "Max-Workspace-Size-" + std::to_string(max_workspace_size_bytes);
+  string trt_network_name = StrCat(
+      "TF:", string(TF_VERSION_STRING), ", ",

nit: please remove the calls to string() and to_string(); there is no need to construct strings explicitly when passing arguments to StrCat.

pooyadavoodi

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

Set TRT network name to builder configs

 string GetLinkedTensorRTVersion() {
   minor = 0;
   patch = 0;
 #endif
-  string trt_version = std::to_string(major) + "." +
-                       std::to_string(minor) + "." +
-                       std::to_string(patch);
+  string trt_version = absl::StrCat(std::to_string(major), ".",

Same here and below

pooyadavoodi

comment created time in 2 months

issue comment tensorflow/tensorflow

Sessions that are closed and reset and all inputs and outputs are out of scope, do not release GPU memory.

@samhodge sorry for the late reply. As mentioned by @ymodak above, there is currently no way to reset the memory allocator without exiting the program. There is a test-only method that can help to reset the memory owned by the process, but it is not exposed.
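A common workaround, sketched below (not from this thread): run the TF work in a child process so that the GPU memory is returned to the driver when that process exits. run_inference and its arguments are hypothetical placeholders.

import multiprocessing as mp

def run_inference(saved_model_dir, data):
    # Import TF inside the child so CUDA/cuDNN are initialized there.
    import tensorflow as tf
    # ... load the model from saved_model_dir and run it on data ...
    return None

if __name__ == '__main__':
    with mp.get_context('spawn').Pool(1) as pool:
        result = pool.apply(run_inference, ('saved_model_dir', None))
    # When the worker process exits, the GPU memory it allocated is released.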

samhodge

comment created time in 2 months

issue comment tensorflow/tensorflow

Multiple sessions with per_process_gpu_memory_fraction

I'm closing this, please see #8136 for more info.

sealedtx

comment created time in 2 months

issue closed tensorflow/tensorflow

Multiple sessions with per_process_gpu_memory_fraction

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04
  • TensorFlow installed from (source or binary): 1.14.0
  • TensorFlow version (use command below): v1.14.0-rc1-22-gaf24dc91b5 1.14.0
  • Python version: 3.6.8
  • CUDA/cuDNN version: 10.0 / 7.4.2.24-1
  • GPU model and memory: GTX 1070ti, 8119Mb

Describe the current behavior

Start multiple sessions with gpu_options and different per_process_gpu_memory_fraction values, but TensorFlow reports the same amount of memory for both sessions:

...
tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5a67330 executing computations on platform CUDA. Devices:
tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1070 Ti, Compute Capability 6.1
tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5a645c0 executing computations on platform Host. Devices:
tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1070 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)

... init second ...

tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)

Describe the expected behavior

I expect each session to allocate the amount of memory specified in config.gpu_options

...
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 500 MB memory)
...
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 300 MB memory)

Code to reproduce the issue

def init_first(self):
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(self.MODEL_PATH[0], 'rb') as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name='import')

    input = graph.get_tensor_by_name('import/image_tensor:0')
    output = [
        graph.get_tensor_by_name('import/boxes:0'),
        graph.get_tensor_by_name('import/scores:0'),
    ]

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.per_process_gpu_memory_fraction = 0.061
    sess = tf.compat.v1.Session(graph=graph, config=config)
    return sess, input, output

def init_second(self):
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(self.MODEL_PATH[1], 'rb') as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def)

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.per_process_gpu_memory_fraction = 0.37
    sess = tf.compat.v1.Session(graph=graph, config=config)

    output = ('import/softmax_2/softmax:0',
              'import/conv6-2/conv6-2:0',
              'import/conv6-3/conv6-3:0')
    input = 'import/Placeholder_2:0'

    return sess, input, output

first = init_first()
second = init_second()

closed time in 2 months

sealedtx

issue comment tensorflow/tensorflow

TF-TRT slower than optimized saved model

@jtressle I think what @leo-XUKANG suggested could be possible. I'm closing this; please feel free to reopen and provide the model and script to reproduce the problem if it still exists.

jtressle

comment created time in 2 months

issue closed tensorflow/tensorflow

TF-TRT slower than optimized saved model

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, I have a network that does 2D convolutions + batch normalization on an image.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below): TF 1.14
  • Python version: 3.6.8
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): 7.4
  • CUDA/cuDNN version: 10.0
  • GPU model and memory: T4, 12GB

Describe the current behavior

I'm trying to optimize a custom model comprised of 2D convolutions and batch normalizations done on an image. The entire network has fixed dimensions. I'm using the nightly docker TF image to perform TF-TRT.

I've tried to create a TRT model using both of the following functions:

def create_trt_saved_model(saved_model_dir, output_saved_model_dir, precision, batch_size=1):
	''' convert saved model to TRT saved model'''

	converter = trt.TrtGraphConverter(
		input_saved_model_dir = str(saved_model_dir),
		max_batch_size = batch_size,
		precision_mode = precision )
	converter.convert()
	converter.save(output_saved_model_dir = str(output_saved_model_dir))

and

def create_trt_frozen_graph(graph_def, output_nodes, precision, 
	output_graph_path = None, workspace_size=2<<10, batch_size=1):
	''' convert frozen_graph to a TRT frozen graph'''
	
	converter = trt.TrtGraphConverter(
		input_graph_def = graph_def,
		nodes_blacklist = output_nodes,
		max_batch_size = batch_size,
		max_workspace_size_bytes = workspace_size<<20,
		precision_mode = precision)

	trt_graph_def = converter.convert()

	if not (output_graph_path is None):
		write_graph_to_file(trt_graph_def, output_graph_path)

	return trt_graph_def

In both cases, the TF-TRT model is about 35X slower (20ms vs 700ms inference). The results are the same regardless if I use the graph_def from memory, or load the TF-TRT saved model.

Here is the respective TRT output:

2019-08-30 04:48:35.865582: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 4 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2019-08-30 04:48:35.953878: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:633] Number of TensorRT candidate segments: 1
2019-08-30 04:48:35.969432: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.5
2019-08-30 04:48:35.969838: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.5
2019-08-30 04:55:04.225714: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 944 nodes succeeded.
2019-08-30 04:55:04.352053: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-08-30 04:55:04.402986: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: tf_graph
2019-08-30 04:55:04.403043: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 818 nodes (-817), 914 edges (-949), time = 62.326ms.
2019-08-30 04:55:04.403049: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: Graph size after: 950 nodes (132), 1046 edges (132), time = 50.486ms.
2019-08-30 04:55:04.403054: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 946 nodes (-4), 1042 edges (-4), time = 45.175ms.
2019-08-30 04:55:04.403059: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 3 nodes (-943), 2 edges (-1040), time = 388396.844ms.
2019-08-30 04:55:04.403063: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 3 nodes (0), 2 edges (0), time = 2.443ms.
2019-08-30 04:55:04.403067: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_0_native_segment
2019-08-30 04:55:04.403072: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 946 nodes (0), 1042 edges (0), time = 26.303ms.
2019-08-30 04:55:04.403076: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: Graph size after: 946 nodes (0), 1042 edges (0), time = 30.005ms.
2019-08-30 04:55:04.403092: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 946 nodes (0), 1042 edges (0), time = 26.387ms.
2019-08-30 04:55:04.403097: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 946 nodes (0), 1042 edges (0), time = 3.387ms.
2019-08-30 04:55:04.403103: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 946 nodes (0), 1042 edges (0), time = 26.708ms.
2019-08-30 04:55:04.727419: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at 

Since the TRT model only has 3 nodes, one of which is the TRT Engine node, does it make sense to convert this via UFF? Would that get me a speed improvement?

Or, is there a bug in the latest version of TF docker?

Thanks!

Describe the expected behavior

Code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

closed time in 2 months

jtressle

issue closed tensorflow/models

tftrt14.0 plugin compilation issue

After I change directory to "xxxxxx/tensorflow/contrib/tensorrt/custom_plugin_examples", I use the command "bazel build _inc_op.so" to compile _inc_op.so. It reports the following error:

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" ERROR: /home/luoyang/software/tensorflow-master/tensorflow/contrib/tensorrt/custom_plugin_examples/BUILD:35:1: Linking of rule '//tensorflow/contrib/tensorrt/custom_plugin_examples:_inc_op.so' failed (Exit 1) bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaGetDevice': (.text+0x4bdc0): multiple definition ofcudaGetDevice' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaGetDevice+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaStreamAddCallback': (.text+0x50760): multiple definition ofcudaStreamAddCallback' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaStreamAddCallback+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaMemcpyArrayToArray': (.text+0x3afd0): multiple definition ofcudaMemcpyArrayToArray' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaMemcpyArrayToArray+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaDeviceReset': (.text+0x39c40): multiple definition ofcudaDeviceReset' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaDeviceReset+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaGraphicsSubResourceGetMappedArray': (.text+0x423d0): multiple definition ofcudaGraphicsSubResourceGetMappedArray' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaGraphicsSubResourceGetMappedArray+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaGetSurfaceObjectResourceDesc': (.text+0x3ff00): multiple definition ofcudaGetSurfaceObjectResourceDesc' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaGetSurfaceObjectResourceDesc+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaMemRangeGetAttributes': (.text+0x43200): multiple definition ofcudaMemRangeGetAttributes' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaMemRangeGetAttributes+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaMemcpy2DFromArray': (.text+0x3b250): multiple definition ofcudaMemcpy2DFromArray' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaMemcpy2DFromArray+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaSetDoubleForHost': (.text+0x48c40): multiple definition ofcudaSetDoubleForHost' 
bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaSetDoubleForHost+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function cudaDestroyTextureObject': (.text+0x40960): multiple definition ofcudaDestroyTextureObject' bazel-out/k8-opt/bin/tensorflow/stream_executor/cuda/libcudart_stub.pic.a(cudart_stub.pic.o):cudart_stub.cc:(.text.cudaDestroyTextureObject+0x0): first defined here bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcudart_static.a(libcudart_static.a.o): In function `cudaHostGetFlags': """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The configuration of the computer is: Tesla P4, CUDA 10.0, driver 410.0, cuDNN 7.3.1, TensorRT 5.02.

closed time in 2 months

tilaba

issue comment tensorflow/models

tftrt14.0 plugin compilation issue

Sorry about the trouble, but TF contrib is gone and the TF-TRT plugin is not well supported yet. I'm closing this; feel free to reopen if there are any further questions.

tilaba

comment created time in 2 months

pull request comment tensorflow/tensorflow

Change TrtConversionParams to class from NamedTuple

@pooyadavoodi this breaks tensorflow/tools/api/tests:api_compatibility_test, could you take a look?

pooyadavoodi

comment created time in 2 months

issue comment tensorflow/tensorflow

Invalid result on some GPUs, probably einsum

I can reproduce the problem with TF 2.0. @sanjoy, could you help to take a look?

hstuk

comment created time in 2 months

issue closed tensorflow/tensorflow

Tensorflow per_process_gpu_memory_fraction used more memory than specified


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch linux 5.1.12
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): Arch Linux repository
  • TensorFlow version (use command below): 1.14.0-rc1
  • Python version: 3.7.3
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 10.1.168
  • GPU model and memory: Quadro M2200, 4043 MB


Describe the current behavior

TensorFlow allocates more memory than specified. When running multiple processes sharing the same GPU, this can cause one process to hit an out-of-memory exception. For example, I specified it to use no more than 50% of GPU memory, but it actually allocates ~52% of memory, as in the screenshot.

[screenshot]

Describe the expected behavior

I would expect it to allocate no more than 50% of memory. In my case, that would be <=2021.5 MB.

Code to reproduce the issue

import tensorflow as tf

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)

with tf.compat.v1.Session(
        config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    while True:
        sess.run(c)


closed time in 2 months

zli117

issue commenttensorflow/tensorflow

Tensorflow per_process_gpu_memory_fraction used more memory than specified

Hi @zli117, I think this is expected. per_process_gpu_memory_fraction specifies the fraction of GPU memory that TF's allocator will use for the graph's input/output tensors and for temporary buffers holding intermediate results. It doesn't include the memory needed to initialize CUDA/cuDNN and other GPU libraries.
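To make that concrete, here is a minimal sketch of one way to leave headroom for the non-TF-managed context when picking the fraction. This is my own heuristic, not an official formula, and the ~300 MB context overhead below is an assumed placeholder, not a measured value:

import tensorflow as tf

TOTAL_MB = 4043      # total GPU memory reported for the Quadro M2200
CONTEXT_MB = 300     # assumed CUDA/cuDNN context overhead (placeholder value)
TARGET_MB = 2021     # hard cap the whole process should stay under

# Give the TF allocator only what is left after the non-TF-managed context.
fraction = max(TARGET_MB - CONTEXT_MB, 0) / float(TOTAL_MB)

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
with tf.compat.v1.Session(
    config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
  pass  # run the graph as usual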

I'm closing this, feel free to reopen if there are further questions.

zli117

comment created time in 2 months

issue commenttensorflow/tensorflow

TF-TRT batchSize > 0 && batchSize <= MAX_BATCH_SIZE

Also @sanjoy

sgambient

comment created time in 2 months

issue commenttensorflow/tensorflow

TF-TRT batchSize > 0 && batchSize <= MAX_BATCH_SIZE

@sgambient Based on the log, it seems the model gets an input with 0 elements; could you help confirm that?

Also @pooyadavoodi

sgambient

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add additional padding layer to deconv converter to fix output_shape

 Status ConvertConv2DHelper(OpConverterParams* params, int group,
     layer->setDilation(dilation);
     conv_layer = layer;
   }
-  nvinfer1::ITensor* output_tensor = conv_layer->getOutput(0);
-
+  nvinfer1::ITensor* conv_output_tensor = conv_layer->getOutput(0);
+  nvinfer1::ITensor* output_tensor;
+  // Add an extra padding for Deconv because TRT doesn't accept the
+  // argument output_shape and thus the TRT output shape could be wrong
+  // in case of strides>1.
+  if (is_conv2d_backprop_input) {
+    auto tf_output_shape = backprop_output_size.GetTrtDims();
+    nvinfer1::Dims trt_output_shape = conv_output_tensor->getDimensions();
+    const int heightDiff = tf_output_shape.d[h_index - 1] - trt_output_shape.d[1];
+    const int widthDiff = tf_output_shape.d[w_index - 1] - trt_output_shape.d[2];
+    nvinfer1::DimsHW pre_padding(0, 0);
+    nvinfer1::DimsHW post_padding(heightDiff, widthDiff);

What happens if height_diff/width_diff is 0? Please add a comment. Also, is it possible for them to be negative?
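To make the edge cases concrete, here is a minimal Python sketch of the behavior I would expect (a hypothetical helper with illustrative names, not the actual TF-TRT code): a zero diff just makes the extra padding layer a no-op, while a negative diff cannot be fixed by padding and would have to be rejected.

def deconv_post_padding(tf_out_h, tf_out_w, trt_out_h, trt_out_w):
  """Post-padding needed so the TRT deconv output matches TF's output_shape."""
  height_diff = tf_out_h - trt_out_h
  width_diff = tf_out_w - trt_out_w
  if height_diff < 0 or width_diff < 0:
    # TRT already produced a larger output than TF expects; padding can only
    # grow the tensor, so this case must be rejected (or handled by cropping).
    raise ValueError("TRT deconv output is larger than the requested "
                     "output_shape: diffs=(%d, %d)" % (height_diff, width_diff))
  # (0, 0) is harmless: the extra padding layer becomes a no-op.
  return (height_diff, width_diff)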

pooyadavoodi

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add additional padding layer to deconv converter to fix output_shape

 Status ConvertConv2DHelper(OpConverterParams* params, int group,
     layer->setDilation(dilation);
     conv_layer = layer;
   }
-  nvinfer1::ITensor* output_tensor = conv_layer->getOutput(0);
-
+  nvinfer1::ITensor* conv_output_tensor = conv_layer->getOutput(0);
+  nvinfer1::ITensor* output_tensor;
+  // Add an extra padding for Deconv because TRT doesn't accept the
+  // argument output_shape and thus the TRT output shape could be wrong
+  // in case of strides>1.
+  if (is_conv2d_backprop_input) {
+    auto tf_output_shape = backprop_output_size.GetTrtDims();
+    nvinfer1::Dims trt_output_shape = conv_output_tensor->getDimensions();
+    const int heightDiff = tf_output_shape.d[h_index - 1] - trt_output_shape.d[1];

Please use the names height_diff and width_diff, here and below.

pooyadavoodi

comment created time in 2 months

pull request commenttensorflow/tensorflow

Change TrtConversionParams to class from NamedTuple

@pooyadavoodi would you please fix the ubuntu sanity errors? Thanks.

pooyadavoodi

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Set TRT network name to builder configs

 bool TRTEngineOp::ExecuteTrtEngine(OpKernelContext* ctx,
                                    EngineContext* engine_context) {
   VLOG(1) << "Executing TRT engine: " << name();
   auto& cuda_engine = engine_context->cuda_engine;
+
+  if (VLOG_IS_ON(2)) {
+    VLOG(2) << "  Network name: " << cuda_engine->getName();

I think simple string comparisons shouldn't affect inference time that much. But it's also fine to make this a debug-only feature. We can always turn it into a strict check later.

pooyadavoodi

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Set TRT network name to builder configs

 Status TrtPrecisionModeFromName(const string& name, TrtPrecisionMode* mode) {
   return Status::OK();
 }
+string GetLinkedTensorRTVersion() {
+  int major, minor, patch;
+#if GOOGLE_CUDA && GOOGLE_TENSORRT
+  major = NV_TENSORRT_MAJOR;
+  minor = NV_TENSORRT_MINOR;
+  patch = NV_TENSORRT_PATCH;
+#else
+  major = 0;
+  minor = 0;
+  patch = 0;
+#endif
+  string trt_version = std::to_string(major) + "." +

StrCat here and below as well?

pooyadavoodi

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Set TRT network name to builder configs

 Status Converter::BuildCudaEngine(
     }
   }
+  string precision_mode_str;
+  TF_RETURN_IF_ERROR(TrtPrecisionModeToName(
+      precision_mode_, &precision_mode_str));
+  string trt_version_str = GetLoadedTensorRTVersion();
+  string trt_network_name =
+      "TF" + string(TF_VERSION_STRING) + "-" +

How about using StrCat?

pooyadavoodi

comment created time in 2 months

pull request commenttensorflow/tensorflow

Enable preventing engine build at runtime

Hi @pooyadavoodi, could you help resolve the conflicts? Thanks.

pooyadavoodi

comment created time in 2 months

issue commenttensorflow/tensorflow

Merging two RT graphs throws an error

Hi @anshkumar, merging two TF-TRT converted graphs into one is not currently supported. The main reason is that both graphs will have TRTEngineOps with the same name. Please let me know why this is useful and provide more info about your use case. Thanks.

Also @sanjoy
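For reference, a quick way to see the collision is to compare the TRTEngineOp node names of the two converted GraphDefs (a sketch; adapt it to however your graphs are loaded):

def trt_engine_op_names(graph_def):
  """Returns the names of all TRTEngineOp nodes in a GraphDef."""
  return {node.name for node in graph_def.node if node.op == "TRTEngineOp"}

# Any non-empty intersection means the merged graph would contain two nodes
# with the same name, which is what the import step rejects.
# common = trt_engine_op_names(graph_def_a) & trt_engine_op_names(graph_def_b)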

anshkumar

comment created time in 2 months

issue commenttensorflow/tensorflow

C++17 features used even though C++11 standard explicitly given

@coderforlife sorry for the late response. Please feel free to file a PR and I can review that.

coderforlife

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add implicit batch experimental

 def get_tensorrt_rewriter_config(conversion_params, is_v2=False):
   _check_conversion_params(conversion_params, is_v2=is_v2)
 
   rewriter_config_with_trt = rewriter_config_pb2.RewriterConfig()
-  if conversion_params.rewriter_config_template is None:
+
+  if not disable_non_trt_optimizers:

Yeah, CopyFrom will overwrite that, so we need to do it at the end of the method.

pooyadavoodi

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add implicit batch experimental

 def get_tensorrt_rewriter_config(conversion_params, is_v2=False):
   _check_conversion_params(conversion_params, is_v2=is_v2)
 
   rewriter_config_with_trt = rewriter_config_pb2.RewriterConfig()
-  if conversion_params.rewriter_config_template is None:
+
+  if not disable_non_trt_optimizers:

If the template is not None, do we still want to add these? How about something like the following (a fuller sketch is below):

if rewriter_config_template is None:
  ...extend(["constfold", ...])
  ...add trt optimizer
if disable_non_trt_optimizers:
  ...disable them
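A minimal sketch of that structure, using rewriter_config_pb2 fields I believe exist today; the parameter plumbing is elided, so treat it as illustrative rather than the final implementation:

from tensorflow.core.protobuf import rewriter_config_pb2

def get_tensorrt_rewriter_config(conversion_params, is_v2=False,
                                 disable_non_trt_optimizers=False):
  rewriter_config = rewriter_config_pb2.RewriterConfig()
  if conversion_params.rewriter_config_template is None:
    # Default pipeline: constant folding, layout, then the TRT optimizer.
    rewriter_config.optimizers.extend(["constfold", "layout", "constfold"])
    custom_op = rewriter_config.custom_optimizers.add()
    custom_op.name = "TensorRTOptimizer"
    # ... populate custom_op.parameter_map from conversion_params ...
  else:
    rewriter_config.CopyFrom(conversion_params.rewriter_config_template)
  if disable_non_trt_optimizers:
    # Done last so it also overrides whatever the template copied in.
    off = rewriter_config_pb2.RewriterConfig.OFF
    rewriter_config.arithmetic_optimization = off
    rewriter_config.remapping = off
    rewriter_config.dependency_optimization = off
  return rewriter_config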
pooyadavoodi

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Set TRT network name to builder configs

 bool TRTEngineOp::ExecuteTrtEngine(OpKernelContext* ctx,
                                    EngineContext* engine_context) {
   VLOG(1) << "Executing TRT engine: " << name();
   auto& cuda_engine = engine_context->cuda_engine;
+
+  if (VLOG_IS_ON(2)) {
+    VLOG(2) << "  Network name: " << cuda_engine->getName();

Can we make sure that the TRTEngineOp's setting matches the network name?

pooyadavoodi

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Set TRT network name to builder configs

 limitations under the License.
 #include "absl/strings/str_cat.h"
 #include "absl/strings/string_view.h"
 #include "tensorflow/compiler/tf2tensorrt/convert/utils.h"
+#include "tensorflow/compiler/tf2tensorrt/utils/py_utils.h"

Please avoid depending on this if possible. It is for Python only, it depends on TF's dso_loader, and I remember it has caused some problems before (though I forget exactly how).

Instead, please add new methods Get{Linked,Loaded}TensorRTVersion, and maybe let them return a string to make them more convenient. We already duplicate this code at line ~1189.

Thanks.

pooyadavoodi

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Ensure sorted=true in top_k converter

 Status ConvertTopK(OpConverterParams* params) {
       CheckInputsWeights(*params, {{"input", false}, {"k", true}}));
   TF_RETURN_IF_ERROR(
       AllowDataTypes(*params, {DataType::DT_FLOAT, DataType::DT_HALF}));
+  TFAttrs attrs(node_def);
+  const bool sorted = attrs.get<bool>("sorted");
+  if (!sorted) {
+    return errors::InvalidArgument("Only sorted=True is supported, at",

Thanks, but is that part of the API semantics? I could not find anything at https://www.tensorflow.org/api_docs/python/tf/math/top_k saying so. If it's not an API guarantee, does that mean any order is fine?

For example, did you see any model break because this wasn't fixed?
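For context, a quick check one could run (assuming TF 2.x eager execution):

import tensorflow as tf

x = tf.constant([1.0, 5.0, 3.0, 4.0, 2.0])
values_sorted, _ = tf.math.top_k(x, k=3, sorted=True)   # always [5.0, 4.0, 3.0]
values_unsorted, _ = tf.math.top_k(x, k=3, sorted=False)
# The docs don't pin down an order when sorted=False, so a descending order
# would still appear to be a valid result here.
print(values_sorted.numpy(), values_unsorted.numpy())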

pooyadavoodi

comment created time in 2 months

push eventpooyadavoodi/tensorflow

Guangda Lai

commit sha 8a03f977c2b8233452a4567518e9d303a50ae1d6

Revert the change to gen_tftrt_model.py

view details

push time in 2 months

push eventpooyadavoodi/tensorflow

amoitra

commit sha e07dfe6a7091db80bef5c94d10eaf6360d7224de

Set xla_gpu_use_cudnn_batchnorm to true

view details

namrata-ibm

commit sha a816b97a61ad152b64f7550351379ccae860d841

Updating metrics_export_meta_graph.pb for fixing //tensorflow/python:framework_meta_graph_test on big endian

view details

amoitra

commit sha b64e97b4448b86bafc8dc72d6a5db64e15dfe30e

Always expand batchnorm inference

view details

amoitra

commit sha 400bba1be9dfb1518b7a0748041d279532b7f7c0

Add comments

view details

Pooya Davoodi

commit sha a26b56085598bed7afe7cfef43fd1e2547b3831d

Add script to generate TF-TRT model to be used for testing

view details

Pooya Davoodi

commit sha f1129b373d05c9dc9d11943807c6a361a6ab3092

Add tftrt_2.0_saved_model This model can be loaded and executed in tests to ensure backwards compatibility across TF versions.

view details

Pooya Davoodi

commit sha a55e9c2ae95e9d899383d4d712719e6285f3ad55

Add backward compatibility test

view details

Pooya Davoodi

commit sha ebde7cce583c0be72c887b18ac17a1bc828ec95e

Move data of quantization_mnist_test to testdata/mnist

view details

Agoniii

commit sha 6a74e16e94f05375b5b220cc046e3e0b1d5d2055

add label for xlaop

view details

Johan Euphrosine

commit sha 75057371b960dc9e468ea1977031c37e729aeeba

lite/microfrontend: fix FilterbankState unsigned type missmatch FilterbankState work is uint64_t*, casting a signed type prevent the libraries to compile w/ the esp32 arduino core toolchain.

view details

Agoniii

commit sha 90bf975aeccb36c9a816849e3d696730bf50088c

using std::string

view details

Lukas Geiger

commit sha 147de48ad973a6a05e8113af815988014652caf2

Return new instance of AutoCastVariable after assignment Assignments and sparse updates will now return a new instance of `AutoCastVariable` wrapping the [`_UnreadVariable`](https://github.com/tensorflow/tensorflow/blob/2692ea8ec1953e42952597adb5b5099181a679b2/tensorflow/python/ops/resource_variable_ops.py#L1806) returned from the assignment op.

view details

Fei Hu

commit sha be36fd93ef817281115eac37c37a800e8b182001

switch BucketBySequenceLengthTest to use combinations

view details

Fei Hu

commit sha c5c6f9686f1be93a54d344e5f2f864f1855e1bed

Switch CopyToDeviceTest to use TF combination

view details

Fei Hu

commit sha 7fc9eb7bfee7dd62000fd39c44a566444503a93a

Switch CounterTest to use TF combinations

view details

Fei Hu

commit sha 1246058569a00f9bc2d479eeb94dc5f3e4c708c4

Switch CsvDatasetTest to use TF combinations

view details

Fei Hu

commit sha b9350874eb8bbf4f0576234b1ecdc997d65bea1c

Switch DenseToSparseBatchTest to use TF combinations

view details

Fei Hu

commit sha 12a6cc569963354dfa4a5d10af291a9e6fcc3b06

Switch DirectedInterleaveDatasetTest to use TF combinations

view details

Fei Hu

commit sha b7be69fcce909f9aef82dc975025d14eb374aa45

Switch GetSingleElementTest to use TF combinations

view details

Fei Hu

commit sha b48761c689d575c92995e15dd78990cb291badf3

Switch GroupByReducerTest to use TF combinations

view details

push time in 2 months

startedKhronosGroup/SPIRV-Cross

started time in 2 months

Pull request review commenttensorflow/tensorflow

Ensure sorted=true in top_k converter

 Status ConvertTopK(OpConverterParams* params) {
       CheckInputsWeights(*params, {{"input", false}, {"k", true}}));
   TF_RETURN_IF_ERROR(
       AllowDataTypes(*params, {DataType::DT_FLOAT, DataType::DT_HALF}));
+  TFAttrs attrs(node_def);
+  const bool sorted = attrs.get<bool>("sorted");
+  if (!sorted) {
+    return errors::InvalidArgument("Only sorted=True is supported, at",

Do you know what the semantics are when sorted is false? The TF API doc doesn't say anything about it; does that mean the result can be in any order? If so, sorted order would also be a valid order for sorted=false?

pooyadavoodi

comment created time in 2 months

push eventaaroey/tensorflow

Guangda Lai

commit sha b1ef8311ae80a7af07bde8124c460074479ff976

test

view details

push time in 2 months

push eventpooyadavoodi/tensorflow

Akshay Modi

commit sha 958974242816d551fa695ddeaf4e37fe78537ef6

Copy resource device correctly in TFE_TensorHandleCopyToDevice PiperOrigin-RevId: 281992454 Change-Id: I223f05688b2c873dcffa02a743fa3302bda80f16

view details

Jiri Simsa

commit sha 67ba5e2cee95ae92bf1ebf32463a9a30040e43ba

[tf.data] Cancellation improvements for asynchronous tf.data C++ kernels. PiperOrigin-RevId: 281993507 Change-Id: I3ba12fb26719356b4e24dab26c43ea104b921f92

view details

Jared Duke

commit sha 3d0958eff212020f290106cd5b6d3ab582623cf6

Update Lite readmes to reference `--config=android_arm` for building PiperOrigin-RevId: 281994306 Change-Id: I8a46c54d177ab74307ea3ab7d7ccad212e249913

view details

Feng Liu

commit sha e3c40d174a97c43fdb74984187014d3431a54736

Add tf.uint8 to the input/output types lists of tfl.strided_slice and tfl.depth_to_space Additionally, since tf.uint8 and tf.quint8 are mapped to the same tflite type, they are considered to be the same as element type. The op validation is updated, so the zip test can pass now. PiperOrigin-RevId: 281996170 Change-Id: I78eedb3568d19c6c6f2fdfbd02b088b73e17dbdb

view details

Jose Baiocchi

commit sha 714611b668497ac1ee402db204dc0f2f8be69360

Remove "platform:tracing" from "lib_internal" deps PiperOrigin-RevId: 281996549 Change-Id: I35ca7514e9aa9e8628d4f7467e4a68172fb04454

view details

Mihai Maruseac

commit sha a36c12b7c70e750ef4bb8828855c3675e9519dc9

Add hash table resource implementation into TFLite PiperOrigin-RevId: 281997222 Change-Id: I8680e454fd4de99d4a2d51ef651df97d52253f0b

view details

Davide Libenzi

commit sha 44d09fca5a853b2c543ed6c9c7f3ebb755b2de43

Reduce Shape object creations (and conversions from proto) in XlaBuilder. PiperOrigin-RevId: 281997626 Change-Id: I4fa6ce89ca42dd4a2ec451f10c2ebd9224b33867

view details

Lucy Fox

commit sha 366c6e04f57230fc554d5c7d2691ac79d305ba00

Small formatting fix in Tutorial Ch2. PiperOrigin-RevId: 281998069 Change-Id: I1cf342f204299b9fae4a73a059507e4e15cce00a

view details

Brian Zhao

commit sha f635dd19e4892f88f8b37cba8c5c604b1dd446f7

Automated g4 rollback of changelist 281835274. PiperOrigin-RevId: 281998143 Change-Id: I27c047173e3fb6dc480e03037777acf86b0b1a64

view details

Yanan Cao

commit sha 2f889d7b84128a57452138c48b9df8b9465e4b33

Support resource types in CastCompatible checks. PiperOrigin-RevId: 281999124 Change-Id: Ib3a9749114e8e5c5463c25e9f6618e4d811d1449

view details

Sean Silva

commit sha f44a805fc5b6f7d3408e9912e8d3e704df6e7dde

tf_saved_model: Disallow duplicate bound inputs. This is a useful invariant because (together with a local check that resource Value's are not passed twice in the same argument list to e.g. a called function) guarantees that resource variables don't alias in a module with tf_saved_model semantics. PiperOrigin-RevId: 282003375 Change-Id: I7ba0dbda9a6ee3c734b4503fc7f68b09b505a758

view details

Mihai Maruseac

commit sha 0332dbab7a3555c8e0f19c960afca2f7b3c6ff60

Recursively create and delete directories under POSIX modular filesystem. We also provide tests to make sure all API requirements are satisfied. Just a small sized part of work for modular filesystem plugins. For more details, consult the RFC at https://github.com/tensorflow/community/blob/master/rfcs/20190506-filesystem-plugin-modular-tensorflow.md PiperOrigin-RevId: 282004206 Change-Id: I5256fe6fabd6ac85844437833c51b27f7cf92d81

view details

Jacob Burnim

commit sha 34c7bed9f6f88b3599cbc69df8c06fd210374edb

Rollback: Avoid FindFunctionDef lookup PiperOrigin-RevId: 282004381 Change-Id: If0caa639bd059b5ced8c77b3d49e5e92aa565efe

view details

Yanan Cao

commit sha 4021a5a86e9ce03ce21df0bf6388a1ee179c665f

Run constant folding after shape inference during tf->xla legalization to maximize chance of lowering ops that have compile-time constant operand requirement. PiperOrigin-RevId: 282005507 Change-Id: I811780560268a69c8f065783cad1b091b0b2e92c

view details

Brian Zhao

commit sha 05cdd2e0ed370ed778443133b8cf06a67fc4d851

Update LLVM version, since MLIR's change https://github.com/tensorflow/tensorflow/commit/f1d30f5e8d30096951f8e2066ce74813c5519dfe breaks the build. PiperOrigin-RevId: 282007263 Change-Id: Ibc5c20139443bc52c08e81658f19445fc3979519

view details

Thomas O'Malley

commit sha f4fb3007edcf8206fe75965f0d6cc18b3d343893

Fix lazy load of training v1 in SavedModel load. PiperOrigin-RevId: 282008023 Change-Id: I66d8e0d2987c0eaef48273d2ac345c309cd80329

view details

A. Unique TensorFlower

commit sha 4b642cefe8001aa2ad8706130eece641ef1528be

Add more canonicalizations for SubViewOp. Depending on which of the offsets, sizes, or strides are constant, the subview op can be canonicalized in different ways. Add such canonicalizations, which generalize the existing approach of canonicalizing subview op only if all of offsets, sizes and shapes are constants. PiperOrigin-RevId: 282010703 Change-Id: I9d46e37d9484d34c5e2605e4351c196addb856cc

view details

Amit Patankar

commit sha 6cae11a063393fd93a2421ac3236c123de38d84e

Updated the RBE image hashes to upgrade the estimator version. PiperOrigin-RevId: 282011479 Change-Id: I2d7b2312a14be29c03b8a3b7477da20ef675d042

view details

Denis Khalikov

commit sha a2009504968511cab3445fb92bed6a2d10f10e5b

[spirv] Add a canonicalizer for `spirv::LogicalNotOp`. Add a canonicalizer for `spirv::LogicalNotOp`. Converts: * spv.LogicalNot(spv.IEqual(...)) -> spv.INotEqual(...) * spv.LogicalNot(spv.INotEqual(...)) -> spv.IEqual(...) * spv.LogicalNot(spv.LogicalEqual(...)) -> spv.LogicalNotEqual(...) * spv.LogicalNot(spv.LogicalNotEqual(...)) -> spv.LogicalEqual(...) Also moved the test for spv.IMul to arithemtic tests. Closes #256 COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/256 from denis0x0D:sandbox/canon_logical_not 76ab5787b2c777f948c8978db061d99e76453d44 PiperOrigin-RevId: 282012356 Change-Id: I60413fae31379a55a90093b23b810309810f3725

view details

A. Unique TensorFlower

commit sha 9200d7d738e5cdaf4629a00a3191b0df171e1d47

Allow tensor-like objects in _GetNdArray Otherwise, tensor-like objects that are not instances of tf.Tensor (e.g. tf.Variable) can't be used in array assertions. PiperOrigin-RevId: 282017077 Change-Id: I6c6c250b238644a7872884ff3e0a2322443d7bb8

view details

push time in 2 months

push eventpooyadavoodi/tensorflow

Guangda Lai

commit sha 168b722a126455d18a64aa2e0e80a8eb3325bbdf

Add no_pip tag to avoid pip_smoke_test depending on newly added savedmodel.

view details

push time in 2 months

Pull request review commenttensorflow/tensorflow

TF-TRT Backward compatibility test

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# =============================================================================
+"""Saves a SavedModel after TensorRT conversion.
+   The saved model is loaded and executed by tests to ensure backward
+   compatibility across TF versions.
+   The script may not work in TF1.x.
+
+   Instructions on how to use this script:
+   - Execute the script as follows:
+       python gen_tftrt_model
+   - Rename tftrt_saved_model to what makes sense for your test.
+   - Delete directory tf_saved_model unless you want to use it.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf

I think adding a build rule would make our lives easier: we would just need to run bazel run to get the model. I don't have a strong opinion on that, though. But please at least replace import tensorflow with direct module imports.
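For example, the direct-import style would look roughly like this (a sketch; the exact modules depend on what the script ends up needing):

from tensorflow.python.compiler.tensorrt import trt_convert
from tensorflow.python.framework import dtypes
from tensorflow.python.ops import array_ops
from tensorflow.python.saved_model import save

# ... build and save the model, e.g.:
# converter = trt_convert.TrtGraphConverterV2(input_saved_model_dir=input_dir)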

pooyadavoodi

comment created time in 3 months

Pull request review commenttensorflow/tensorflow

TF-TRT Backward compatibility test

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# =============================================================================
+"""Saves a SavedModel after TensorRT conversion.
+   The saved model is loaded and executed by tests to ensure backward
+   compatibility across TF versions.
+   The script may not work in TF1.x.
+
+   Instructions on how to use this script:
+   - Execute the script as follows:
+       python gen_tftrt_model
+   - Rename tftrt_saved_model to what makes sense for your test.
+   - Delete directory tf_saved_model unless you want to use it.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf

There is an error when trying to submit this PR: 'import tensorflow' statements are not allowed inside TF's code base... Please import modules directly.

Also, could you help add a build rule for this file?

pooyadavoodi

comment created time in 3 months

more