Trevor Morris trevor-m @aws Palo Alto, California SageMaker Neo team at AWS. Former member of the TensorFlow team at NVIDIA. Interested in Computer Graphics and Deep Learning.

tensorflow/tensorrt 354

TensorFlow/TensorRT integration

trevor-m/deep-gbuffers 29

Implementation of "Fast Global Illumination Approximations on Deep G-Buffers" (Mara et al., 2016) using C++, OpenGL, and GLSL

trevor-m/tensorflow-SRGAN 24

Tensorflow implementation of "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network" (Ledig et al. 2017)

trevor-m/tensorflow-bicubic-downsample 14

tf.image.resize_images has aliasing when downsampling and does not have gradients for bicubic mode. This implementation fixes those problems.

trevor-m/cuda-pathtrace 4

A realtime photorealistic pathtracer implemented in CUDA, using a convolutional neural network denoising algorithm

trevor-m/raytracer 3

A multithreaded Whitted ray tracer (C++) which supports reflection, refraction, shadows, interpolated textures and normals, color and intersection shaders, as well as Monte Carlo anti-aliasing, depth-of-field, and BSSRDFs

trevor-m/DeepLearningExamples 2

Deep Learning Examples

trevor-m/mips-processor 2

A dual-issue superscalar pipelined MIPS architecture which includes a cache, a branch-target buffer and a multiplication coprocessor. Completed in ECE154B in Spring 2016 with my partner Tristan Seroff.

trevor-m/reyes-renderer 2

A REYES-style micropolygon renderer written in C++ which implements a subset of the RenderMan specification.

trevor-m/deep-tor-detection 1

Using deep learning to distinguish between Tor and non-Tor traffic

Pull request review comment neo-ai/neo-ai-dlr

[Tensorflow] Add DLR_TFConfig

 typedef struct DLR_TFTensorDesc {
 int CreateDLRModelFromTensorflow(DLRModelHandle* handle, const char* model_path,
                                  const DLR_TFTensorDesc* inputs, int input_size,
                                  const char* outputs[], int output_size,
-                                 const int threads);
+                                 const DLR_TFConfig tf_config);

const DLR_TFConfig tf_config -> const DLR_TFConfig& tf_config
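As a rough illustration of why this is suggested (a standalone C++ sketch; the struct and its fields are made up and are not the real DLR_TFConfig), passing a struct by const reference avoids copying it at every call while still preventing the callee from modifying the caller's object:

  #include <cstdio>

  // Hypothetical stand-in for a config struct like DLR_TFConfig (fields are illustrative).
  struct ConfigLike {
    int intra_op_parallelism_threads;
    int inter_op_parallelism_threads;
    double gpu_memory_fraction;
  };

  // Pass by value: the whole struct is copied at every call site.
  void UseByValue(const ConfigLike cfg) {
    std::printf("threads=%d\n", cfg.intra_op_parallelism_threads);
  }

  // Pass by const reference: no copy, and the callee still cannot modify the argument.
  void UseByConstRef(const ConfigLike& cfg) {
    std::printf("threads=%d\n", cfg.intra_op_parallelism_threads);
  }

  int main() {
    ConfigLike cfg{4, 2, 0.5};
    UseByValue(cfg);
    UseByConstRef(cfg);
    return 0;
  }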

apivovarov

comment created time in 6 days

Pull request review comment neo-ai/neo-ai-dlr

[Tensorflow] Add DLR_TFConfig

 class TensorflowModel : public DLRModel {
       const std::string& model_path, const DLContext& ctx,
       const std::vector<std::string>& inputs,
       const std::vector<std::vector<int64_t>>& input_shapes,
-      const std::vector<std::string>& outputs, const int threads);
+      const std::vector<std::string>& outputs, const DLR_TFConfig tf_config);

const DLR_TFConfig tf_config -> const DLR_TFConfig& tf_config

apivovarov

comment created time in 6 days

PR opened neo-ai/tvm

Use add_definitions instead of set_source_file_properties

set_source_file_properties appears to remove existing definitions set by cmake files. In this case, using it to define TVM_GRAPH_RUNTIME_TENSORRT removed the DMLC_ENABLE_RTTI=0 definition set earlier by cmake. This caused DMLC to compile incorrectly when USE_TENSORRT is enabled.

Fix by using add_definitions to set -DTVM_GRAPH_RUNTIME_TENSORRT.

+1 -7

0 comment

1 changed file

pr created time in 6 days

create branch trevor-m/tvm

branch : trevmorr-fix-compile

created branch time in 6 days

pull request comment neo-ai/neo-ai-dlr

[Tensorflow] Add DLR_TFConfig

Have you tried setting only allow_growth=True and not setting per_process_gpu_memory_fraction?

apivovarov

comment created time in 6 days

Pull request review comment apache/incubator-tvm

[Fix] Fix get_valid_count flaky test for cuda

 from .. import tag
 
-def get_valid_counts_pre(data, flag, idx, score_threshold, id_index, score_index):
-    """Low level IR to Prepare get valid count of bounding boxes
-    given a score threshold. Also moves valid boxes to the
+def cuda_atomicAdd_rule(op):
+    if op.dtype == "float32":
+        return tvm.call_pure_extern("float32", "atomicAdd", op.args[0], op.args[1])
+    elif op.dtype == "float64":
+        return tvm.call_pure_extern("float64", "atomicAdd", op.args[0], op.args[1])
+    elif op.dtype == "int32":
+        return tvm.call_pure_extern("int32", "atomicAdd", op.args[0], op.args[1])
+    else:
+        raise RuntimeError("only support int32, float32 and float64")
+
+
+tvm.target.intrin.register_intrin_rule(
+    "cuda", "atomicAdd", cuda_atomicAdd_rule, override=True)
+
+
+def atomicAdd(x, y):
+    return tvm.call_pure_intrin(y.dtype, "atomicAdd", x, y)
+
+
+def get_valid_counts_ir(data, valid_count, Flag, score_threshold, id_index, score_index):

Change variable/function names to fit conventions: Flag -> flag, atomicAdd -> atomic_add, etc.

Laurawly

comment created time in 7 days

push event neo-ai/neo-ai-dlr

Olivier Cahagne (AWS)

commit sha 1145f886b4056034c1c7892495816a38026641ac

Update install.rst (#144)

view details

push time in 8 days

PR merged neo-ai/neo-ai-dlr

Update install.rst - missing closing bracket messing up the URI

Thanks for contributing to DLR! By submitting this pull request, you confirm that your contribution is made under the terms of the Apache 2.0 license.

Please refer to our guideline for useful information and tips.

+1 -1

0 comment

1 changed file

wolruf

pr closed time in 8 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha db9f65d52a05b68be16272be2fe35828cff728d0

Fix GPU inference container (#145)
* Add TensorRT lib dir to LD_LIBRARY_PATH
* Apply changes from PR #129 to gpu
* Update Dockerfile.gpu

view details

push time in 8 days

PR merged neo-ai/neo-ai-dlr

Fix GPU inference container
  1. Add TensorRT lib dir to LD_LIBRARY_PATH
  2. Apply https://github.com/neo-ai/neo-ai-dlr/pull/129 to GPU container as well
+3 -2

0 comment

1 changed file

trevor-m

pr closed time in 8 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 9fc47e566df83df05c71de7939b908b8ae183da5

Update Dockerfile.gpu

view details

push time in 8 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha d2340788d609e56f0cf717dfaccb939bff989df5

Apply changes from PR #129 to gpu

view details

push time in 8 days

PR opened neo-ai/neo-ai-dlr

Add TensorRT lib dir to LD_LIBRARY_PATH

Thanks for contributing to DLR! By submitting this pull request, you confirm that your contribution is made under the terms of the Apache 2.0 license.

Please refer to our guideline for useful information and tips.

+1 -0

0 comment

1 changed file

pr created time in 8 days

create branch neo-ai/neo-ai-dlr

branch : trevmorr-trt-lib-path

created branch time in 8 days

push event trevor-m/tvm

Trevor Morris

commit sha 0ea238f2fa2f833c4285b34065a7f9a659b8b90a

Formatting, skip inference during calibration, build engine on last calib input

view details

push time in 11 days

push event trevor-m/tvm

Trevor Morris

commit sha a506e1c70c2aa98ae50194b2be6265d6b3321d16

Free calib buffers. Print cache:

view details

push time in 12 days

create branch trevor-m/tvm

branch : trevmorr-calibrate-int8

created branch time in 12 days

delete branch neo-ai/neo-ai-dlr

delete branch : trevor-m-patch-1

delete time in 12 days

delete branch neo-ai/neo-ai-dlr

delete branch : trevmorr-trt-7

delete time in 12 days

PR closed neo-ai/neo-ai-dlr

Try to fix install & test CI failure

CI is failing to install TF:

Running setup.py install for absl-py: started

[2020-02-11T21:31:04.172Z]     Running setup.py install for absl-py: finished with status 'error'

[2020-02-11T21:31:04.172Z]     Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vbi86_qq/absl-py/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-2rbcu6iz-record/install-record.txt --single-version-externally-managed --compile:

[2020-02-11T21:31:04.172Z]     Traceback (most recent call last):

[2020-02-11T21:31:04.172Z]       File "<string>", line 1, in <module>

[2020-02-11T21:31:04.172Z]       File "/usr/local/lib/python3.6/dist-packages/setuptools/__init__.py", line 18, in <module>

[2020-02-11T21:31:04.172Z]     ModuleNotFoundError: No module named 'setuptools.extension'

[2020-02-11T21:31:04.172Z]     

[2020-02-11T21:31:04.172Z]     ----------------------------------------

[2020-02-11T21:31:04.172Z]   Rolling back uninstall of absl-py

[2020-02-11T21:31:05.005Z] Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vbi86_qq/absl-py/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-2rbcu6iz-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-vbi86_qq/absl-py/

script returned exit code 1
+2 -0

1 comment

1 changed file

trevor-m

pr closed time in 12 days

pull request comment neo-ai/neo-ai-dlr

Try to fix install & test CI failure

CI seems to have fixed itself again

trevor-m

comment created time in 12 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 384ac56b212d5a6d9fa344c0c2815c8f8bbeb115

Use TRT 7 (#137)

view details

push time in 12 days

PR merged neo-ai/neo-ai-dlr

Use TRT 7 for inference containers (enhancement)

Update dockerfile, README, to use TRT7 for inference containers.

+7 -7

3 comments

3 changed files

trevor-m

pr closed time in 12 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha abf1f072789e5356fc3b6ba830b97a9cce7fcde2

Update Jenkinsfile

view details

push time in 13 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 75c51c232995c9eb7f267c7599843519506ab74e

Update Jenkinsfile

view details

push time in 13 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 55aeea5d79d9f868e3f992fb60ee63eb5b8e18e3

Use TRT 7

view details

push time in 13 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha e6c02906a68f003181c326ce4612b2cb152ebfbe

Use TRT 7

view details

push time in 13 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha de700f4ddf1d0294c49fde0d1e3332c2e5a5846e

Use TRT 7

view details

push time in 13 days

PR opened neo-ai/neo-ai-dlr

Try to fix install & test CI failure

Thanks for contributing to DLR! By submitting this pull request, you confirm that your contribution is made under the terms of the Apache 2.0 license.

Please refer to our guideline for useful information and tips.

+2 -0

0 comment

1 changed file

pr created time in 13 days

create branch neo-ai/neo-ai-dlr

branch : trevor-m-patch-1

created branch time in 13 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 1387e796c1192ace43ed064e147217f228b65db5

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Philip Hyunsu Cho

commit sha 2895a6747f48dd7a420f0a0ac227236272af246b

Revamp inference containers to use latest MMS; Produce 400 response for ClientError (#129)

view details

Trevor Morris

commit sha 0b39471ae1034041fda8ebb79fbaee12aa92df61

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 7d3fa7844c7bcfcec614bc6b2bd58a3b537e266b

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 87da775a717b8cd48610eaf28468f8ed6520bc41

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 50bd4029650a1e27ec9e8fde962f9472168d31a4

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 64aec162820d3955ad43e47fad8b6408cac17b5e

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 9ef2e8b4a17f8e7c9ed5eb742ac8952ec519c2ea

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 6eba3d5740b0081e4b3c7c3ae0adfc199e9ab3f7

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 86e75ba4b5a4226a1e6409c0fd0c1587fe005cab

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 7087be35bf59a66ea247dc0288cd9bd91b3fea34

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha f2d6ab374f761bc616daaa641ff4690ec429378a

Use TRT 7

view details

push time in 14 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha c8a2b16e79397c9d0817a0569b0224eb8a0571ad

Use TRT 7

view details

push time in 15 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha fc2b011ea81e410fa0b09698b4ab97cb23c7c2c8

Use TRT 7

view details

push time in 15 days

push event neo-ai/neo-ai-dlr

Trevor Morris

commit sha 51dbde4522794801cbcb455a6dd83fcd6cd16dd9

Use TRT 7

view details

push time in 15 days

push event neo-ai/neo-ai-dlr

Philip Hyunsu Cho

commit sha 2895a6747f48dd7a420f0a0ac227236272af246b

Revamp inference containers to use latest MMS; Produce 400 response for ClientError (#129)

view details

push time in 15 days

PR merged neo-ai/neo-ai-dlr

Revamp inference containers to use latest MMS; Produce 400 response for ClientError
  • Upgrade MMS to latest version (1.1.0)
  • Re-write handler code to work with the latest MMS
  • Filter exceptions that start with ClientError and generate a 400 HTTP response. This is only made possible by upgrading MMS.
  • Proper support for batching requests

@ashishgupta023

+233 -182

0 comment

6 changed files

hcho3

pr closed time in 15 days

pull request comment neo-ai/neo-ai-dlr

Use TRT 7 for inference containers

@hcho3 Could you please review?

trevor-m

comment created time in 18 days

PR closed neo-ai/neo-ai-dlr

Use TRT 5.1.5

Update readmes, dockerfile, and Jenkins to use TRT 5.1.5

+7 -7

2 comments

3 changed files

trevor-m

pr closed time in 18 days

pull request comment neo-ai/neo-ai-dlr

Use TRT 5.1.5

Closing in favor of https://github.com/neo-ai/neo-ai-dlr/pull/137

trevor-m

comment created time in 18 days

PR opened neo-ai/neo-ai-dlr

Use TRT 7 for inference containers

Update dockerfile, README, to use TRT7 for inference containers.

+7 -7

0 comment

3 changed files

pr created time in 18 days

create branch neo-ai/neo-ai-dlr

branch : trevmorr-trt-7

created branch time in 18 days

Pull request review comment neo-ai/neo-ai-dlr

[TF C API] Autodetect inputs and outputs

 TF_Output TensorflowModel::ParseTensorName(const std::string& t_name) {
   return oper_out;
 }
 
+void TensorflowModel::DetectInputs() {
+  size_t pos = 0;
+  TF_Operation* op;
+  while ((op = TF_GraphNextOperation(graph_, &pos)) != nullptr) {
+    const std::string op_type = TF_OperationOpType(op);
+    const int n_in = TF_OperationNumInputs(op);
+    const int n_out = TF_OperationNumOutputs(op);
+    const std::string op_name = TF_OperationName(op);
+    if (op_type == "Placeholder" && n_in == 0 && n_out == 1) {
+      input_names_.push_back(op_name + ":0");
+    }
+  }
+  num_inputs_ = input_names_.size();
+}
+
+void TensorflowModel::DetectOutputs() {
+  size_t pos = 0;
+  TF_Operation* op;
+  // while loop
+  while ((op = TF_GraphNextOperation(graph_, &pos)) != nullptr) {
+    const std::string op_type = TF_OperationOpType(op);
+    const int n_out = TF_OperationNumOutputs(op);
+    const int n_cout = TF_OperationNumControlOutputs(op);
+    const std::string op_name = TF_OperationName(op);
+    if (op_type != "Const" && op_type != "Assign" && op_type != "NoOp" &&
+        op_type != "Placeholder" && n_cout == 0) {
+      int n_cons = 0;

Since there are already a few n_* variables going around here, it may be helpful to change n_cons to n_consumers to be more clear.

apivovarov

comment created time in 21 days

PR opened neo-ai/neo-ai-dlr

Update to stable branch

Thanks for contributing to DLR! By submitting this pull request, you confirm that your contribution is made under the terms of the Apache 2.0 license.

Please refer to our guideline for useful information and tips.

+2 -2

0 comment

2 changed files

pr created time in 21 days

create branch trevor-m/neo-ai-dlr

branch : trevmorr-stable

created branch time in 21 days

push event neo-ai/tvm

Trevor Morris

commit sha 747579657922308803a0dde078781475d5833a4c

[Relay/TRT] Support clip for TRT 4 using relu + eltwise (#83)
* Support clip for TRT 4 using relu + eltwise
* Re-enable consistency check
* Invoke convertlayout properly

view details

push time in 21 days

PR merged neo-ai/tvm

[Relay/TRT] Support clip for TRT 4 using relu + eltwise

Allow clip op for TRT4 using other ops.

I also snuck in two small fixes:

  • Had accidentally disabled the consistency check for integration tests. Those are enabled again. (test_tensorrt.py)
  • Call ConvertLayout properly. (tensorrt.py)
+65 -6

1 comment

5 changed files

trevor-m

pr closed time in 21 days

push event trevor-m/tvm

Trevor Morris

commit sha 85b5b1cfb0555227841dcb555b781da181cc67a9

Support clip for TRT 4 using relu + eltwise

view details

Trevor Morris

commit sha 972c63c5f1184d4164c61f752e14a2e115538c7d

Re-enable consistency check

view details

Trevor Morris

commit sha a8ee1af97e093bff484cdbf2579626361797dfab

Invoke convertlayout properly

view details

push time in 21 days

push event trevor-m/tvm

Trevor Morris

commit sha e9917756003580640b22980e0d18c73387975a6d

Support clip for TRT 4 using relu + eltwise

view details

push time in 21 days

pull request comment neo-ai/tvm

[Relay/TRT] Support clip for TRT 4 using relu + eltwise

@anijain2305

trevor-m

comment created time in 21 days

PR opened neo-ai/tvm

[Relay/TRT] Support clip for TRT 4 using relu + eltwise

Allow clip op for TRT4 using other ops.

+58 -1

0 comment

3 changed files

pr created time in 21 days

push event trevor-m/tvm

Trevor Morris

commit sha f6c96703c0e7ecef022b9b048827164cce830181

Support clip for TRT 4 using relu + eltwise

view details

push time in 21 days

push event trevor-m/tvm

Trevor Morris

commit sha 3a5805db14e17401489b5746a7f5f790eb62769d

Support clip for TRT 4 using relu + eltwise

view details

push time in 21 days

create branch trevor-m/tvm

branch : trevmorr-clip-trt4

created branch time in 21 days

create branch trevor-m/tvm

branch : trevmorr-trt-support-nhwc-biasadd

created branch time in 22 days

push event neo-ai/tvm

Trevor Morris

commit sha 3188a6829baf4d3fcbb3c795cc4a412e75d966c0

Fix bug with LegalizeLayoutTranform which added duplicate ops (#81)

view details

push time in a month

PR merged neo-ai/tvm

Fix bug with LegalizeLayoutTranform which added duplicate ops

LegalizeLayoutTransform visitor was not replacing layout_transform with transpose properly.

+5 -6

1 comment

1 changed file

trevor-m

pr closed time in a month

Pull request review comment neo-ai/tvm

Fix bug with LegalizeLayoutTranform which added duplicate ops

 def EnableTrt(mod, params=None, trt_version=None):
     mod = relay.transform.RemoveUnusedFunctions()(mod)
     mod = relay.transform.InferType()(mod)
     mod = relay.transform.ConvertLayout('NCHW')(mod)
+    print(mod['main'])

Oops, fixed.

trevor-m

comment created time in a month

push event trevor-m/tvm

Trevor Morris

commit sha 994a73fcc7edc9c95cf32437026f2e96dfc04d43

Fix bug with LegalizeLayoutTranform which added duplicate ops

view details

push time in a month

pull request comment neo-ai/tvm

Fix bug with LegalizeLayoutTranform which added duplicate ops

@anijain2305

trevor-m

comment created time in a month

PR opened neo-ai/tvm

Fix bug with LegalizeLayoutTranform which added duplicate ops

LegalizeLayoutTransform visitor was not replacing layout_transform with transpose properly.

+6 -6

0 comment

1 changed file

pr created time in a month

push event trevor-m/tvm

Trevor Morris

commit sha 9bfc36d0d1436198353245fb74628e17cd2aa9d4

Fix bug with LegalizeLayoutTranform which added duplicate ops

view details

push time in a month

create branch trevor-m/tvm

branch : trevmorr-fix-trt-legalize-layout-transform

created branch time in a month

Pull request review comment neo-ai/neo-ai-dlr

Ask input shape in TF C API

 void TensorflowModel::LoadFrozenModel(const char* pb_file) {
   TF_DeleteBuffer(graph_def);
 }
 ...
+void TensorflowModel::PrepInputs() {
+  for (int i = 0; i < num_inputs_; i++) {
+    const std::string t_name = input_names_[i];
+    const TF_Output oper_out = ParseTensorName(t_name);
+    const std::vector<int64_t> shape = input_shapes_[i];

Make shape a reference so it doesn't have to be copied

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Ask input shape in TF C API

 void TensorflowModel::LoadFrozenModel(const char* pb_file) {
   TF_DeleteBuffer(graph_def);
 }
 ...
-    // Set fixed batch size if batch size is dynamic
-    if (dims[0] == -1) {
-      dims[0] = batch_size > 0 ? batch_size : 1;
-      TF_GraphSetTensorShape(graph_, oper_out, dims, n_dim, status_);
-      if (TF_GetCode(status_) != TF_OK) {
-        LOG(FATAL) << "ERROR: TF_GraphSetTensorShape failed "
-                   << TF_Message(status_);
-        return;  // unreachable
-      }
-    }
+
     size_t num_elements = 1;
     for (int z = 0; z < n_dim; z++) {
-      if (dims[z] == -1) {

Should we keep this check in case user gives a bad input shape containing -1?
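A minimal sketch of keeping such a check (hypothetical helper, not the actual DLR code): reject a user-supplied shape that still contains -1 before it is used to size tensors.

  #include <cstdint>
  #include <cstdio>
  #include <vector>

  // Returns false if the user-provided shape still contains a dynamic (-1) dimension.
  bool IsFullyDefined(const std::vector<int64_t>& shape) {
    for (int64_t d : shape) {
      if (d < 0) return false;
    }
    return true;
  }

  int main() {
    const std::vector<int64_t> good = {1, 224, 224, 3};
    const std::vector<int64_t> bad = {-1, 224, 224, 3};
    std::printf("good=%d bad=%d\n", IsFullyDefined(good), IsFullyDefined(bad));
    return 0;
  }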

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Ask input shape in TF C API

 TensorflowModel::TensorflowModel(const std::string& model_path,
   }
   TF_DeleteSessionOptions(opts);
   LOG(INFO) << "Tensorflow Session was created";
+
+  // Run inference to allocate output Tensors and calculate output shapes.
+  for (int i = 0; i < num_inputs_; i++) {
+    TF_Tensor* tensor = input_tensors_[i];
+    int64_t num_elements = TF_TensorElementCount(tensor);
+    float* in_t_data = (float*)TF_TensorData(tensor);
+    for (int i = 0; i < num_elements; i++) {

Let's use std::fill here instead of the loop.
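A small self-contained sketch of the std::fill alternative (buffer size and values are illustrative): it replaces the element-by-element loop over the tensor data with a single call that states the intent.

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  int main() {
    std::vector<float> buffer(8, -1.0f);
    // Equivalent to: for (int i = 0; i < num_elements; i++) data[i] = 0.0f;
    std::fill(buffer.begin(), buffer.end(), 0.0f);
    // The same call works on a raw pointer plus element count, as with TF_TensorData:
    float* data = buffer.data();
    std::fill(data, data + buffer.size(), 1.0f);
    std::printf("%f %f\n", buffer.front(), buffer.back());
    return 0;
  }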

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Ask input shape in TF C API

 void TensorflowModel::LoadFrozenModel(const char* pb_file) {
   TF_DeleteBuffer(graph_def);
 }
 ...
+void TensorflowModel::PrepInputs() {
+  for (int i = 0; i < num_inputs_; i++) {
+    std::string t_name = input_names_[i];
+    TF_Output oper_out = ParseTensorName(t_name);
+    std::vector<int64_t> shape = input_shapes_[i];

Make shape a const reference.

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Add TF C API adapter

+#include "dlr_tensorflow/dlr_tensorflow.h"
 ...
+    int64_t dims[n_dim];
+    TF_GraphGetTensorShape(graph_, oper_out, dims, n_dim, status_);
+    if (TF_GetCode(status_) != TF_OK) {
+      LOG(FATAL) << "ERROR: TF_GraphGetTensorShape failed " << TF_Message(status_);
+      return; // unreachable
+    }
+    if (dims[0] == -1) {
+      dims[0] = 1;
+    }
+    size_t num_pixels = dims[0];

Let's use num_elements instead of num_pixels since we aren't always using images.

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Add TF C API adapter

+#include "dlr_tensorflow/dlr_tensorflow.h"
 ...
+    TF_Operation* op = TF_GraphOperationByName(graph_, op_name.c_str());
+    if (op == NULL) {
+      LOG(FATAL) << "ERROR: inputOp TF_GraphOperationByName failed for operation " << op_name;
+      return; // unreachable
+    }
+    TF_Output oper_out = {op, op_out_id};
+    int n_dim = TF_GraphGetTensorNumDims(graph_, oper_out, status_);

const int

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Add TF C API adapter

+#include "dlr_tensorflow/dlr_tensorflow.h"
 ...
+    int64_t dims[n_dim];
+    TF_GraphGetTensorShape(graph_, oper_out, dims, n_dim, status_);
+    if (TF_GetCode(status_) != TF_OK) {
+      LOG(FATAL) << "ERROR: TF_GraphGetTensorShape failed " << TF_Message(status_);
+      return; // unreachable
+    }
+    if (dims[0] == -1) {
+      dims[0] = 1;
+    }
+    size_t num_pixels = dims[0];

n_dim can be 0 when the tensor is a scalar. So num_elements should be initialized to 1 and the following loop should start at dim 0.

Scalars: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/c_api_test.cc#L440-L447
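A standalone sketch of the suggested counting logic (not the DLR source): starting the product at 1 and looping from dim 0 handles scalars, where n_dim is 0, as well as higher-rank tensors.

  #include <cstdint>
  #include <cstdio>
  #include <vector>

  // Element count for a tensor shape; an empty shape (a scalar) yields 1.
  int64_t NumElements(const std::vector<int64_t>& dims) {
    int64_t num_elements = 1;                   // a scalar has exactly one element
    for (size_t z = 0; z < dims.size(); z++) {  // loop starts at dim 0, not dim 1
      num_elements *= dims[z];
    }
    return num_elements;
  }

  int main() {
    std::printf("scalar: %lld\n", (long long)NumElements({}));
    std::printf("1x224x224x3: %lld\n", (long long)NumElements({1, 224, 224, 3}));
    return 0;
  }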

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Add TF C API adapter

+#include "dlr_tensorflow/dlr_tensorflow.h"
 ...
+  // Using assignment operator to copy one vector to other
+  input_names_ = inputs;
+  output_names_ = outputs;
+  num_inputs_ = inputs.size();
+  num_outputs_ = outputs.size();
+
+  GenTensorSpec(true);

To make this more clear

GenTensorSpec(/*is_input=*/true);
GenTensorSpec(/*is_input=*/false);
apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Add TF C API adapter

+#include "dlr_tensorflow/dlr_tensorflow.h"
 ...
+    int64_t dims[n_dim];
+    TF_GraphGetTensorShape(graph_, oper_out, dims, n_dim, status_);
+    if (TF_GetCode(status_) != TF_OK) {
+      LOG(FATAL) << "ERROR: TF_GraphGetTensorShape failed " << TF_Message(status_);
+      return; // unreachable
+    }
+    if (dims[0] == -1) {

This will override the batch size to only allow batch size = 1 when the model has a dynamic batch dimension. If we want to allow dynamic shapes to work correctly, we need to defer TF_AllocateTensor until SetInput is called.

If we don't care about dynamic shapes and always want to override the batch dimension to 1 in those cases, we should change the dimension in the graph using TF_GraphSetTensorShape. This should allow TF to further optimize the graph by using static shapes. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/c_api.h#L199-L219
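A hedged sketch of the second option (the helper name and batch_size parameter are mine; it only uses TF C API calls that already appear in this diff). graph, output, and status are assumed to come from the surrounding model-loading code.

  #include <vector>
  #include "tensorflow/c/c_api.h"

  // Pin a dynamic batch dimension (-1) to a fixed size in the graph itself so that
  // TF can optimize with static shapes, instead of deferring TF_AllocateTensor.
  void PinBatchDim(TF_Graph* graph, TF_Output output, int64_t batch_size, TF_Status* status) {
    const int n_dim = TF_GraphGetTensorNumDims(graph, output, status);
    if (TF_GetCode(status) != TF_OK || n_dim <= 0) return;
    std::vector<int64_t> dims(n_dim);
    TF_GraphGetTensorShape(graph, output, dims.data(), n_dim, status);
    if (TF_GetCode(status) != TF_OK) return;
    if (dims[0] == -1) {
      dims[0] = batch_size;
      TF_GraphSetTensorShape(graph, output, dims.data(), n_dim, status);
    }
  }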

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Add TF C API adapter

 int CreateDLRModelFromTFLite(DLRModelHandle *handle,
 }
 #endif // DLR_TFLITE
 
+#ifdef DLR_TENSORFLOW
+/*! \brief Translate c args from ctypes to std types for DLRModelFromTensorflow ctor.
+ */
+int CreateDLRModelFromTensorflow(DLRModelHandle *handle,
+                   const char *model_path,

Alignment is off here. Use clang-format --style=google -i src/dlr.cc

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Add TF C API adapter

+#include "dlr_tensorflow/dlr_tensorflow.h"
 ...
+// Constructor
+TensorflowModel::TensorflowModel(const std::string& model_path,
+                                 const DLContext& ctx,
+                                 const std::vector<std::string>& inputs,
+                                 const std::vector<std::string>& outputs,
+                                 const int threads
+                                ): DLRModel(ctx, DLRBackend::kTENSORFLOW) {

Use clang-format

apivovarov

comment created time in a month

Pull request review comment neo-ai/neo-ai-dlr

Add TF C API adapter

+#include "dlr_tensorflow/dlr_tensorflow.h"
 ...
+void TensorflowModel::GenTensorSpec(bool isInput) {

isInput -> is_input

apivovarov

comment created time in a month

Pull request review commentneo-ai/neo-ai-dlr

Add TF C API adapter

+#include "dlr_tensorflow/dlr_tensorflow.h"+#include <cstring>+#include <fstream>+#include <numeric>+#include <regex>++using namespace dlr;++std::string dlr::GetTensorflowFile(const std::string& dirname) {+  // Support the case where user provides full path to .pb file.+  if (EndsWith(dirname, ".pb")) {+    return dirname;+  }+  // Scan Dir to find .pb file and check that only one .pb file is provided.+  std::string pb_file;+  std::vector<std::string> paths_vec;+  ListDir(dirname, paths_vec);+  for (auto filename : paths_vec) {+    std::string basename = GetBasename(filename);+    if (EndsWith(filename, ".pb")) {+      if (pb_file.empty()) {+        pb_file = filename;+      } else {+        LOG(FATAL) << "Multiple .pb files under the folder: " << dirname;+      }+    }+  }+  if (pb_file.empty()) {+    LOG(FATAL) << "No Tensorflow frozen model file found under folder: " << dirname;+  }+  return pb_file;+}++void dlr::FreeBuffer(void* data, size_t length) {+  free(data);+}++TF_Buffer* dlr::ReadTFFile(const char* file) {+  FILE *f = fopen(file, "rb");+  fseek(f, 0L, SEEK_END);+  long fsize = ftell(f);+  fseek(f, 0L, SEEK_SET);  //same as rewind(f);++  void* data = malloc(fsize);+  if (fread(data, fsize, 1, f) != 1) {+    printf("File read error....\n");+  }+  fclose(f);++  TF_Buffer* buf = TF_NewBuffer();+  buf->data = data;+  buf->length = fsize;+  buf->data_deallocator = FreeBuffer;+  return buf;+}++void TensorflowModel::LoadFrozenModel(const char* pb_file) {+  const char* op_prefix = "";+  TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();+  TF_ImportGraphDefOptionsSetPrefix(opts, op_prefix);++  TF_Buffer* graph_def = ReadTFFile(pb_file);+  TF_GraphImportGraphDef(graph_, graph_def, opts, status_);+  if (TF_GetCode(status_) != TF_OK) {+    LOG(FATAL) << "ERROR: Unable to import graph " << TF_Message(status_);+    return; // unreachable+  }+  LOG(INFO) << "Successfully imported graph";+  TF_DeleteImportGraphDefOptions(opts);+  TF_DeleteBuffer(graph_def);+}++void TensorflowModel::GenTensorSpec(bool isInput) {+  std::vector<std::string> tensor_names = isInput ? input_names_ : output_names_;+  std::regex r("^(.+):(\\d+)$");+  for (std::string t_name : tensor_names) {+    std::smatch match;+    std::string op_name;+    int op_out_id;+    if (std::regex_search(t_name, match, r)) {+      op_name = match.str(1);+      op_out_id = std::stoi(match.str(2));+    } else {+      LOG(FATAL) << "ERROR: failed to parse tensor name " << t_name;+      return; // unreachable+    }++    TF_Operation* op = TF_GraphOperationByName(graph_, op_name.c_str());+    if (op == NULL) {+      LOG(FATAL) << "ERROR: inputOp TF_GraphOperationByName failed for operation " << op_name;+      return; // unreachable+    }+    TF_Output oper_out = {op, op_out_id};+    int n_dim = TF_GraphGetTensorNumDims(graph_, oper_out, status_);+    if (TF_GetCode(status_) != TF_OK) {+      LOG(FATAL) << "ERROR: TF_GraphGetTensorNumDims failed " << TF_Message(status_);+      return; // unreachable+    }+    int64_t dims[n_dim];+    TF_GraphGetTensorShape(graph_, oper_out, dims, n_dim, status_);+    if (TF_GetCode(status_) != TF_OK) {+      LOG(FATAL) << "ERROR: TF_GraphGetTensorShape failed " << TF_Message(status_);+      return; // unreachable+    }+    if (dims[0] == -1) {+      dims[0] = 1;+    }+    size_t num_pixels = dims[0];

Should we use int64_t instead of size_t since size_t could be 32 bits?

apivovarov

comment created time in a month

Pull request review commentneo-ai/neo-ai-dlr

Add TF C API adapter

+#ifndef DLR_TENSORFLOW_H_+#define DLR_TENSORFLOW_H_++#include "tensorflow/c/c_api.h"+#include "dlr_common.h"++namespace dlr {++/*! \brief Get the paths of the Tensorflow model files.+ */+std::string GetTensorflowFile(const std::string& dirname);++/*! \brief free_buffer function used to cleanup memory after TF model is built.+ */+void FreeBuffer(void* data, size_t length);++/*! \brief read tensorflow model file.+ */+TF_Buffer* ReadTFFile(const char* file);++/*! \brief class TensorflowModel+ */+class TensorflowModel: public DLRModel {+ private:+  TF_Status* status_;+  TF_Graph* graph_;+  TF_Session* sess_;+  // input_names_ are declared in base class+  std::vector<std::string> output_names_;+  std::vector<TF_Output> inputs_;+  std::vector<TF_Output> outputs_;+  std::vector<TF_Tensor*> input_tensors_;+  std::vector<TF_Tensor*> output_tensors_;+  void LoadFrozenModel(const char* pb_file);+  void GenTensorSpec(bool isInput);+  int GetInputId(const char* name);+ public:+  /*! \brief Load model files from given folder path.+   */+  explicit TensorflowModel(const std::string& model_path,+                           const DLContext& ctx,+                           const std::vector<std::string>& inputs,+                           const std::vector<std::string>& outputs,+                           const int threads);+  ~TensorflowModel();++  virtual const char* GetInputName(int index) const override;+  virtual const char* GetWeightName(int index) const override;+  virtual std::vector<std::string> GetWeightNames() const override;+  virtual void GetInput(const char* name, float* input) override;+  virtual void SetInput(const char* name, const int64_t* shape, float* input, int dim) override;+  virtual void Run() override;+  virtual void GetOutput(int index, float* out) override;+  virtual void GetOutputShape(int index, int64_t* shape) const override;+  virtual void GetOutputSizeDim(int index, int64_t* size, int* dim) override;+  virtual const char* GetBackend() const override;+  virtual void SetNumThreads(int threads) override;+  virtual void UseCPUAffinity(bool use) override;+};++} // namespace dlr+++#endif  // DLR_TENSORFLOW_H_

Add newline at EOF

apivovarov

comment created time in a month

Pull request review commentneo-ai/neo-ai-dlr

Add TF C API adapter

+#ifndef DLR_TENSORFLOW_H_+#define DLR_TENSORFLOW_H_++#include "tensorflow/c/c_api.h"+#include "dlr_common.h"++namespace dlr {++/*! \brief Get the paths of the Tensorflow model files.+ */+std::string GetTensorflowFile(const std::string& dirname);++/*! \brief free_buffer function used to cleanup memory after TF model is built.+ */+void FreeBuffer(void* data, size_t length);++/*! \brief read tensorflow model file.+ */+TF_Buffer* ReadTFFile(const char* file);++/*! \brief class TensorflowModel+ */+class TensorflowModel: public DLRModel {

Add a space: class TensorflowModel : public DLRModel. Use clang-format on all modified files.

apivovarov

comment created time in a month

push eventtrevor-m/neo-ai-dlr

Trevor Morris

commit sha 6cc9ab1f46de9607a6822cafd8d64eee5dd22df0

Update tvm to latest

view details

push time in a month

push eventneo-ai/tvm

Trevor Morris

commit sha 98b8ca474fadc2962f5ba4871dfc86a6cd5a8bdd

Use int for endch to fix portability issues regarding signed/unsigned char (#75)

view details

push time in a month

PR merged neo-ai/tvm

[cherry-pick] Fix Base64OutStream portability issue

We will need this for Relay/TRT on ARM devices https://github.com/apache/incubator-tvm/pull/4668

+1 -1

0 comment

1 changed file

trevor-m

pr closed time in a month

push eventneo-ai/tvm

Trevor Morris

commit sha ea78f1d053dcf85dd7d56d85b513e83dce9948cb

Relay/TRT Integration (whole graph only) (#54) * Add tensorrt backend. Fix merge Fix merge and clean up logs Add BiasAdd, Concat, padding ceil mode, and clean up code Fix formatting and remove unused headers uncomment models Fix bug with variable input, clean up Don't split batch norm Move TRT execution to TrtExecutor Clean up Clean up Add paritioning Implement graph_runtime execution for Relay/TRT Fix bug in extern op Fix compilation Add EnableTrt pass to perform same modification as previous wholegraphannotator Renable NNVM TRT Remove SimplifyBatchnorm, add rules for converting ops Fix format, remove unused tests Enable multiple outputs Fix multiple outputs Fix activation lookup Fix no newline at eof Add license header. Add consistency test to models Add method to check TRT used. Improve comments Fix lint Add util to check TRT version Add if guards around TRT5.1 APIs Add env var for workspace size, fix logger fix build Add TRT versioning to EnableTrt pass Fix build error in DLR Fix compile for DLR Update dmlc-core, fix copyright header, undo change to includes Remove unused headers Fix IsTrtCompatible visitor and move op list to constructor Add dropout to compatible ops for CheckTrtCompatible only. Add not compatible test Add squeeze, transpose, reshape, pad, and reduce ops. Add transpose on weights workaround Fix formatting. Add unit tests Support transpose on weights for conv2d and dense. Support asymmetric padding. Temp fix for 1D inputs. Add units tests for all ops. Support StridedSlice, AdaptivePooling approximation, Pytorch addmm fixer pass Support (2,3,0,1) tranpose on weights Allow stride to be incomplete. Support ConstantNode -> kWeight Fix pass serialized graph by value in runtime. Allow inclusive count for strided pool Comments, disable failign test Fix CI lint Removed unused variables from TrtBuilder. Add more comments Fix build for TRT4 Add GetTrtVersion(), Move convert map to function, remove uneeded include, make batch_size_, logger_ TrtBuilder members, check output existence Use shared_ptr for converters. Add check for num outputs and inputs Support image.resize Make GetOpConverters return a shared_ptr Clarify count inclusive padding weirdness Use external codegen/runtime Move to src/runtime/contrib/tensorrt. Add Save and Load methods for tensorrt module. Rename some classes Require format to be tensorrt so that loader knows how to load FoldConstants Destroy engine and context after use. Store TRT weights from op converters. Formatting Always apply ConvertLayout to NCHW Clean up Add ASF header Change ObjectRef -> NodeRef Fix lint Fix pylint Fix bug with scalar weights Making TRT cmake more informative Make tensorrt tests dependent on whether trt codegen is enabled Add serialization test. * Refactor EnableTRT checkers * Fix const weight detection * remove tensorrt_module.h, add test for multiple outputs. Use normal GetShape. Remove GetType. Add flag for additional model testing Undo add comments to prevent conflicts * Separate TRT from relay. Add docstrings and more comments. Move all passes to python. Remove double lookup for Execute Formatting Fix lint Fix pylint Rename codegen_tensorrt. Check registry get. Add comments Make trt codegen off by default. * disable for ci * TRT codegen can be turned on independently * Fix tests * Fix build without runtime * Enable AvgPool approximation * Remove change to cmake config * Move passes to PreprocessForTrt. Use op.name. Rename LegalizeLayoutTransform. * Add newlin to EOF. Remove else. 
Reserve space for vectors * Remove AdaptivePool2D commentted out code. Add comment for transposed weight workaround * Rename IsCompatibleFn * Use ++i instead of i++ * Improve incompatible messages, use string::empty, small improvements * Use constructor to fill func_params * Remove std::move * Use opt level 3, add helper to check whether to run test, improve load_params * Replace TransposeRSCKtoCKRS/KCRS with TransposeWeights4D * Clean up VisitExpr(CallNode) for args

view details

push time in a month

PR merged neo-ai/tvm

Relay/TRT Integration (whole graph only)

This PR adds a version of the Relay/TRT integration that only works when the entire model can be converted to TRT. It is enabled with the EnableTrt pass. If any op in the model cannot be converted to TRT, EnableTrt returns the original module unmodified.

How to use

  1. Build TVM with cmake flag USE_TENSORRT=ON or USE_TENSORRT=/path/to/TensorRT. USE_CUDA should be enabled as well.

  2. Convert the model to use TensorRT. This step determines whether every node in the graph can be converted to TensorRT and, if so, marks the graph to use TensorRT and applies some TensorRT-specific optimization passes.

mod = relay.transform.EnableTrt(mod, params)
  3. Check if TRT was enabled. If not, it means some op in the graph is not supported by the TensorRT conversion. EnableTrt will output which particular ops are not supported and why.
assert mod['main'].attrs and mod['main'].attrs.Compiler == 'tensorrt'
  4. Finish compilation.
with relay.build_config(opt_level=2, disabled_pass={"SimplifyInference"}):
  graph, lib, params = relay.build(mod, "cuda", params=params)
  5. (Optional) Serialize/deserialize the compiled model. The model will be serialized to three files: compiled.json, compiled.params, and compiled.tensorrt.
# Serialize
with open('compiled.json', 'w') as f_graph_json:
  f_graph_json.write(graph)
with open('compiled.params', 'wb') as f_params:
  f_params.write(relay.save_param_dict(params))
lib.save('compiled.tensorrt')

# Deserialize
with open('compiled.json', 'r') as f_graph_json:
  graph = f_graph_json.read()
with open('compiled.params', 'rb') as f_params:
  params = tvm.relay.load_param_dict(f_params.read())
lib = tvm.module.load("compiled.tensorrt")
  6. Run inference. The first invocation will trigger creation of the TensorRT engine. This could take up to a few minutes.
# Create graph runtime
mod = graph_runtime.create(graph, lib, ctx=tvm.gpu(0))
mod.set_input(**params)

i_data = np.random.uniform(0, 1, input_shape).astype(dtype)
# Build TensorRT engine
mod.run(data=i_data)

# Run inference
mod.run(data=i_data)
res = mod.get_output(0)

The tests in tests/python/relay/test_tensorrt.py provide some deeper examples of how to use this feature.

The NNVM/TRT integration is still present.

+3425 -4

9 comments

13 changed files

trevor-m

pr closed time in a month

push eventneo-ai/tvm

Alex Wong

commit sha 1ce36ec7de4f9b87697f328e4031f4af1345dec2

Add a PyTorch to Relay parser (#63)

view details

push time in a month

PR merged neo-ai/tvm

Add a PyTorch to Relay parser


Support PyTorch natively in TVM by providing a Relay parser. Adapted from https://github.com/neo-ai/tvm/pull/23/; it currently supports only traced models (no control flow) and probably only image classification models (the provided test validates against the torchvision implementations).

Like other frontends, grab the Relay module and parameters to build via: mod, params = relay.frontend.from_pytorch(trace, input_shapes)
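For illustration, a minimal end-to-end sketch of how this might be used (the dict form of input_shapes, the "input0" key, and the choice of torchvision model are assumptions for this sketch, not something the PR prescribes):

import numpy as np
import torch
import torchvision
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Trace an image classification model (only traced models are supported).
input_shape = (1, 3, 224, 224)
model = torchvision.models.resnet18(pretrained=True).eval()
trace = torch.jit.trace(model, torch.randn(input_shape))

# Parse the trace into a Relay module. The input name "input0" is assumed;
# use whatever name your trace actually exposes.
mod, params = relay.frontend.from_pytorch(trace, {"input0": input_shape})

# Build for CPU and run with the graph runtime.
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, "llvm", params=params)
rt_mod = graph_runtime.create(graph, lib, ctx=tvm.cpu(0))
rt_mod.set_input(**params)
rt_mod.run(input0=np.random.uniform(size=input_shape).astype("float32"))
top1 = np.argmax(rt_mod.get_output(0).asnumpy())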

Tested against torchvision models in the included test_forward.py. Will write a discussion post in discuss.tvm.ai to see if we want this upstream.

+1811 -0

13 comments

6 changed files

alexwong

pr closed time in a month

Pull request review commentneo-ai/tvm

Relay/TRT Integration (whole graph only)

+# Licensed to the Apache Software Foundation (ASF) under one+# or more contributor license agreements.  See the NOTICE file+# distributed with this work for additional information+# regarding copyright ownership.  The ASF licenses this file+# to you under the Apache License, Version 2.0 (the+# "License"); you may not use this file except in compliance+# with the License.  You may obtain a copy of the License at+#+#   http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing,+# software distributed under the License is distributed on an+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY+# KIND, either express or implied.  See the License for the+# specific language governing permissions and limitations+# under the License.+# pylint: disable=invalid-name,arguments-differ,no-else-return,unused-argument,missing-docstring+"""+Relay TensorRT codegen.+"""+import tvm+from tvm import relay+from tvm.relay.expr import Call, Constant++from . import _transform+from .expr_functor import ExprMutator++def _bind_params(func, params):+    """+    Bind the params to the expression as constants.+    """+    name_dict = {}+    for arg in func.params:+        name = arg.name_hint+        if name in name_dict:+            name_dict[name] = None+        else:+            name_dict[name] = arg+    bind_dict = {}+    for k, v in params.items():+        if k not in name_dict:+            continue+        arg = name_dict[k]+        if arg is None:+            raise ValueError("Multiple args in the function have name %s" % k)+        bind_dict[arg] = relay.expr.const(v)+    return relay.expr.bind(func, bind_dict)++class LegalizeLayoutTranform(ExprMutator):+    """+    Legalize Relay layout transforms to transpose ops to simplify TensorRT conversion.

I think it's better to leverage Relay's pass infrastructure to convert the layout_transform op into the more standard transpose op. This way we only need to write one TrtOpConverter, for transpose. If we didn't perform this legalization, I would need to write an additional TrtOpConverter for layout_transform which would be nearly identical to the one for transpose.

This feature of Relay is very useful. For example, TensorRT recently announced that it won't support INT8 for matmul/fully-connected layers and recommends using a 1x1 convolution instead (https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#optimize-layer). So in the future, I plan to add a similar pass to convert all matmul/dense layers into convolutions to take advantage of this. At that point I won't need a converter for dense anymore, since everything would go through conv.
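As a rough sketch of the legalization idea (the class name, the handled layouts, and the op-matching details below are illustrative assumptions, not necessarily what this PR implements):

from tvm import relay
from tvm.relay.expr_functor import ExprMutator

class LegalizeLayoutTransform(ExprMutator):
    """Rewrite layout_transform calls into equivalent transpose calls."""

    def visit_call(self, call):
        # Visit arguments first so nested layout_transforms are also rewritten.
        new_call = super().visit_call(call)
        if isinstance(new_call.op, relay.op.Op) and new_call.op.name == "layout_transform":
            src = str(new_call.attrs.src_layout)
            dst = str(new_call.attrs.dst_layout)
            if (src, dst) == ("NCHW", "NHWC"):
                return relay.transpose(new_call.args[0], axes=[0, 2, 3, 1])
            if (src, dst) == ("NHWC", "NCHW"):
                return relay.transpose(new_call.args[0], axes=[0, 3, 1, 2])
        return new_call

# Usage: new_func = LegalizeLayoutTransform().visit(func)

With a pass like this in place, only a transpose converter is needed on the TensorRT side.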

trevor-m

comment created time in a month

push eventtrevor-m/tvm

Trevor Morris

commit sha 05014e0387dd91dc38c54b72e69a761ebf7a6166

Clean up VisitExpr(CallNode) for args

view details

push time in a month

Pull request review commentneo-ai/tvm

Relay/TRT Integration (whole graph only)

+/* * Licensed to the Apache Software Foundation (ASF) under one+ * or more contributor license agreements.  See the NOTICE file+ * distributed with this work for additional information+ * regarding copyright ownership.  The ASF licenses this file+ * to you under the Apache License, Version 2.0 (the+ * "License"); you may not use this file except in compliance+ * with the License.  You may obtain a copy of the License at+ *+ *   http://www.apache.org/licenses/LICENSE-2.0+ *+ * Unless required by applicable law or agreed to in writing,+ * software distributed under the License is distributed on an+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY+ * KIND, either express or implied.  See the License for the+ * specific language governing permissions and limitations+ * under the License.+ */++/*!+* \file runtime/contrib/tensorrt/tensorrt_builder.cc+* \brief Contains TensorRTBuilder class which can be used to convert a relay+* program into a TRT engine which can be used for inference.+*/++#include <memory>+#include <string>++#include "../../../relay/backend/contrib/tensorrt/common_utils.h"+#include "tensorrt_builder.h"+#include "tensorrt_logger.h"+#include "tensorrt_ops.h"+#include "utils.h"++namespace tvm {+namespace relay {+namespace contrib {++const std::shared_ptr<+    std::unordered_map<std::string, std::shared_ptr<TrtOpConverter>>>+GetOpConverters() {+  static auto map = std::make_shared<+      std::unordered_map<std::string, std::shared_ptr<TrtOpConverter>>>();+  if (!map->empty()) return map;+  map->emplace("nn.relu", std::make_shared<ActivationOpConverter>());+  map->emplace("sigmoid", std::make_shared<ActivationOpConverter>());+  map->emplace("tanh", std::make_shared<ActivationOpConverter>());+  map->emplace("nn.batch_norm", std::make_shared<BatchNormOpConverter>());+  map->emplace("nn.softmax", std::make_shared<SoftmaxOpConverter>());+  map->emplace("nn.conv2d", std::make_shared<Conv2DOpConverter>());+  map->emplace("nn.dense", std::make_shared<DenseOpConverter>());+  map->emplace("nn.bias_add", std::make_shared<BiasAddOpConverter>());+  map->emplace("add", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("subtract", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("multiply", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("divide", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("power", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("nn.max_pool2d", std::make_shared<PoolingOpConverter>());+  map->emplace("nn.avg_pool2d", std::make_shared<PoolingOpConverter>());+  map->emplace("nn.global_max_pool2d",+               std::make_shared<GlobalPoolingOpConverter>());+  map->emplace("nn.global_avg_pool2d",+               std::make_shared<GlobalPoolingOpConverter>());+  map->emplace("exp", std::make_shared<UnaryOpConverter>());+  map->emplace("log", std::make_shared<UnaryOpConverter>());+  map->emplace("sqrt", std::make_shared<UnaryOpConverter>());+  map->emplace("abs", std::make_shared<UnaryOpConverter>());+  map->emplace("negative", std::make_shared<UnaryOpConverter>());+  map->emplace("nn.batch_flatten", std::make_shared<BatchFlattenOpConverter>());+  map->emplace("expand_dims", std::make_shared<ExpandDimsOpConverter>());+  map->emplace("squeeze", std::make_shared<SqueezeOpConverter>());+  map->emplace("concatenate", std::make_shared<ConcatOpConverter>());+  map->emplace("nn.conv2d_transpose",+               std::make_shared<Conv2DTransposeOpConverter>());+  map->emplace("transpose", 
std::make_shared<TransposeOpConverter>());+  map->emplace("reshape", std::make_shared<ReshapeOpConverter>());+  map->emplace("nn.pad", std::make_shared<PadOpConverter>());+  map->emplace("sum", std::make_shared<ReduceOpConverter>());+  map->emplace("prod", std::make_shared<ReduceOpConverter>());+  map->emplace("max", std::make_shared<ReduceOpConverter>());+  map->emplace("min", std::make_shared<ReduceOpConverter>());+  map->emplace("mean", std::make_shared<ReduceOpConverter>());+  map->emplace("contrib.adaptive_max_pool2d",+               std::make_shared<AdaptivePoolingOpConverter>());+  map->emplace("contrib.adaptive_avg_pool2d",+               std::make_shared<AdaptivePoolingOpConverter>());+#if TRT_VERSION_GE(5, 1, 5)+  map->emplace("clip", std::make_shared<ActivationOpConverter>());+  map->emplace("nn.leaky_relu", std::make_shared<ActivationOpConverter>());+  map->emplace("sin", std::make_shared<UnaryOpConverter>());+  map->emplace("cos", std::make_shared<UnaryOpConverter>());+  map->emplace("atan", std::make_shared<UnaryOpConverter>());+  map->emplace("ceil", std::make_shared<UnaryOpConverter>());+  map->emplace("floor", std::make_shared<UnaryOpConverter>());+  map->emplace("strided_slice", std::make_shared<StridedSliceOpConverter>());+#endif+#if TRT_VERSION_GE(6, 0, 1)+  map->emplace("image.resize", std::make_shared<ResizeOpConverter>());+#endif+  return map;+}++TensorRTBuilder::TensorRTBuilder(const std::vector<DLTensor*>& args)+    : execution_args_(args) {+  // Create TRT builder and network.+  static runtime::TensorRTLogger logger;+  builder_ = nvinfer1::createInferBuilder(logger);+  batch_size_ = args[0]->shape[0];+  builder_->setMaxBatchSize(batch_size_);+  const size_t workspace_size =+      dmlc::GetEnv("TVM_TENSORRT_MAX_WORKSPACE_SIZE", size_t(1) << 31);+  builder_->setMaxWorkspaceSize(workspace_size);+  const bool use_fp16 = dmlc::GetEnv("TVM_TENSORRT_USE_FP16", false);+  builder_->setFp16Mode(use_fp16);+  network_ = builder_->createNetwork();+}++runtime::TrtEngineAndContext TensorRTBuilder::BuildEngine(const Expr& expr) {+  // Process graph and create INetworkDefinition.+  VisitExpr(expr);+  // Mark outputs.+  auto it = node_output_map_.find(expr.operator->());+  CHECK(it != node_output_map_.end()) << "Output was not found.";+  auto network_outputs = it->second;+  std::vector<std::string> network_output_names;+  for (size_t i = 0; i < network_outputs.size(); ++i) {+    CHECK(network_outputs[i].type == kTensor);+    auto out_tensor = network_outputs[i].tensor;+    std::string output_name = "tensorrt_output" + std::to_string(i);+    out_tensor->setName(output_name.c_str());+    network_output_names.push_back(output_name);+    network_->markOutput(*out_tensor);+    DLOG(INFO) << "Added TRT network output: " << out_tensor->getName()+               << " -> " << output_name;+  }+  nvinfer1::ICudaEngine* engine = builder_->buildCudaEngine(*network_);+  CHECK_EQ(engine->getNbBindings(),+           network_input_map_.size() + network_outputs.size());+  CleanUp();+  nvinfer1::IExecutionContext* context = engine->createExecutionContext();+  return {engine, context, network_input_map_, network_output_names};+}++nvinfer1::Weights TensorRTBuilder::GetDLTensorAsWeights(+    DLTensor* dptr, DLDeviceType src_device) {+  CHECK_EQ(dptr->ctx.device_type, src_device);+  CHECK_EQ(static_cast<int>(dptr->dtype.code), kDLFloat);+  const size_t weight_bytes = runtime::GetDataSize(*dptr);+  nvinfer1::Weights weight{nvinfer1::DataType::kFLOAT, nullptr, 0};+  size_t count = 1;+  for (tvm_index_t i = 
0; i < dptr->ndim; ++i) {+    count *= dptr->shape[i];+  }+  CHECK_EQ(count * 4, weight_bytes);+  weight.count = count;+  weight.values = new float[count];+  CHECK_EQ(+      TVMArrayCopyToBytes(dptr, const_cast<void*>(weight.values), weight_bytes),+      0)+      << TVMGetLastError();+  trt_weights_.push_back(weight);+  return weight;+}++nvinfer1::Weights TensorRTBuilder::GetNdArrayAsWeights(+    const runtime::NDArray& array, DLDeviceType src_device) {+  DLTensor* dptr = const_cast<DLTensor*>(array.operator->());+  return GetDLTensorAsWeights(dptr, src_device);+}++void TensorRTBuilder::GetInputAsWeights(const VarNode* node) {+  const int var_node_idx = TrackVarNode(node);+  nvinfer1::Weights weight =+      GetDLTensorAsWeights(execution_args_[var_node_idx], kDLGPU);+  node_output_map_[node] = {TrtOpInput(weight, GetShape(node->checked_type()))};+}++void TensorRTBuilder::GetConstantAsWeights(const ConstantNode* node) {+  auto weight = GetNdArrayAsWeights(node->data, kDLCPU);+  auto shape_long = node->data.Shape();+  std::vector<int> shape(shape_long.begin(), shape_long.end());+  node_output_map_[node] = {TrtOpInput(weight, shape)};+}++void TensorRTBuilder::GetInputAsTransposedWeights(const CallNode* transpose,+                                                  const VarNode* node) {+  GetInputAsWeights(node);+  CHECK_EQ(node_output_map_[node].size(), 1);+  const nvinfer1::Weights& original_weight = node_output_map_[node][0].weight;+  const auto& original_shape = node_output_map_[node][0].weight_shape;+  float* values = new float[original_weight.count];+  // Get order and new shape.+  const auto* attrs = transpose->attrs.as<TransposeAttrs>();+  std::vector<int> order(attrs->axes.size(), 0);+  std::vector<int> new_shape(attrs->axes.size(), 0);+  for (size_t i = 0; i < attrs->axes.size(); ++i) {+    const int axis = attrs->axes[i].as<IntImm>()->value;+    order[i] = axis;+    new_shape[i] = original_shape[axis];+  }+  // Perform transpose.+  if (order.size() == 4 && order[0] == 3 && order[1] == 2 && order[2] == 0 &&+      order[3] == 1) {+    TransposeRSCKtoKCRS(original_shape,+                        static_cast<const float*>(original_weight.values),+                        values);+  } else if (order.size() == 4 && order[0] == 2 && order[1] == 3 &&+             order[2] == 0 && order[3] == 1) {+    TransposeRSCKtoCKRS(original_shape,+                        static_cast<const float*>(original_weight.values),+                        values);+  } else if (order.size() == 2 && order[0] == 1 && order[1] == 0) {+    TransposeCKtoKC(original_shape,+                    static_cast<const float*>(original_weight.values), values);+  } else {+    LOG(FATAL) << "Constant transpose " << DebugString(order)+               << " is not supported.";+  }+  // Map as output of transpose op.+  nvinfer1::Weights transposed_weight{nvinfer1::DataType::kFLOAT, values,+                                      original_weight.count};+  trt_weights_.push_back(transposed_weight);+  node_output_map_[transpose] = {TrtOpInput(transposed_weight, new_shape)};+}++void TensorRTBuilder::VisitExpr_(const TupleGetItemNode* op) {+  if (const auto* tuple = op->tuple.as<TupleNode>()) {+    Expr item = tuple->fields[op->index];+    VisitExpr(item);+    node_output_map_[op] = node_output_map_[item.operator->()];+  } else {+    VisitExpr(op->tuple);+    // Index into tensor outputs from expr.+    node_output_map_[op] = {+        node_output_map_[op->tuple.operator->()][op->index]};+  }+}++void TensorRTBuilder::VisitExpr_(const TupleNode* 
op) {+  std::vector<TrtOpInput> outputs;+  for (auto item : op->fields) {+    VisitExpr(item);+    auto item_outputs = node_output_map_[item.operator->()];+    outputs.reserve(outputs.size() + item_outputs.size());+    outputs.insert(outputs.end(), item_outputs.begin(), item_outputs.end());+  }+  node_output_map_[op] = outputs;+}++void TensorRTBuilder::VisitExpr_(const VarNode* node) {+  const int id = TrackVarNode(node);++  const std::string& tensor_name = node->name_hint();+  auto shape = GetShape(node->checked_type());+  // Remove batch dim+  if (shape.size() > 1) shape.erase(shape.begin());+  DLOG(INFO) << "Added TRT network input: " << node->name_hint() << " "+             << DebugString(shape);+  nvinfer1::Dims dims = VectorToTrtDims(shape);+  auto type_node = node->checked_type().as<TensorTypeNode>();+  CHECK(type_node != nullptr &&+        runtime::TypeMatch(type_node->dtype, kDLFloat, 32))+      << "Only FP32 inputs are supported.";+  auto input =+      network_->addInput(tensor_name.c_str(), nvinfer1::DataType::kFLOAT, dims);+  network_input_map_[id] = tensor_name;+  node_output_map_[node] = {TrtOpInput(input)};+}++void TensorRTBuilder::VisitExpr_(const ConstantNode* node) {+  nvinfer1::Weights weight = GetNdArrayAsWeights(node->data, kDLCPU);+  auto shape = node->data.Shape();+  // Remove batch dim.+  if (shape.size() > 1 && shape[0] == 1) shape.erase(shape.begin());+  nvinfer1::Dims dims = VectorToTrtDims(shape);+  auto const_layer = network_->addConstant(dims, weight);+  CHECK(const_layer != nullptr);+  node_output_map_[node] = {TrtOpInput(const_layer->getOutput(0))};+}++void TensorRTBuilder::VisitExpr_(const CallNode* call) {+  AddTrtLayerParams params(network_, call, &trt_weights_);+  // Look up converter.+  auto it = GetOpConverters()->find(params.op_name);+  CHECK(it != GetOpConverters()->end())+      << "Unsupported operator conversion to TRT, op name: " << params.op_name;+  const auto converter = it->second;++  // Ensure that nodes are processed in topological order by visiting their+  // inputs first.+  for (size_t i = 0; i < call->args.size(); ++i) {+    // Handle special case where input must be constant array on CPU.+    if (!converter->variable_input_count &&+        converter->input_types[i] == kWeight) {+      // Input must be a constant weight+      if (auto* var = call->args[i].as<VarNode>()) {+        GetInputAsWeights(var);+      } else if (auto* node = call->args[i].as<ConstantNode>()) {+        GetConstantAsWeights(node);+      } else {+        // Temporary workaround for transposed weights. Once partitioning is+        // available, the transpose will be computed by tvm and the result will+        // be a var input.+        if (auto* transpose = call->args[i].as<CallNode>()) {+          if (transpose->op.as<OpNode>()->name == "transpose") {+            if (auto* weights = transpose->args[0].as<VarNode>()) {+              GetInputAsTransposedWeights(transpose, weights);+            } else {+              LOG(FATAL) << "TRT requires a constant input here.";+            }+          } else {+            LOG(FATAL) << "TRT requires a constant input here.";+          }+        } else {+          LOG(FATAL) << "TRT requires a constant input here.";+        }

Thanks, I was wondering how to clean that up.

trevor-m

comment created time in a month

push eventtrevor-m/tvm

Trevor Morris

commit sha 2f5278c2527cb20f8421a866db78acd69bf00b9a

Replace TransposeRSCKtoCKRS/KCRS with TransposeWeights4D

view details

push time in a month

Pull request review commentneo-ai/tvm

Relay/TRT Integration (whole graph only)

+/* * Licensed to the Apache Software Foundation (ASF) under one+ * or more contributor license agreements.  See the NOTICE file+ * distributed with this work for additional information+ * regarding copyright ownership.  The ASF licenses this file+ * to you under the Apache License, Version 2.0 (the+ * "License"); you may not use this file except in compliance+ * with the License.  You may obtain a copy of the License at+ *+ *   http://www.apache.org/licenses/LICENSE-2.0+ *+ * Unless required by applicable law or agreed to in writing,+ * software distributed under the License is distributed on an+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY+ * KIND, either express or implied.  See the License for the+ * specific language governing permissions and limitations+ * under the License.+ */++/*!+* \file runtime/contrib/tensorrt/tensorrt_builder.cc+* \brief Contains TensorRTBuilder class which can be used to convert a relay+* program into a TRT engine which can be used for inference.+*/++#include <memory>+#include <string>++#include "../../../relay/backend/contrib/tensorrt/common_utils.h"+#include "tensorrt_builder.h"+#include "tensorrt_logger.h"+#include "tensorrt_ops.h"+#include "utils.h"++namespace tvm {+namespace relay {+namespace contrib {++const std::shared_ptr<+    std::unordered_map<std::string, std::shared_ptr<TrtOpConverter>>>+GetOpConverters() {+  static auto map = std::make_shared<+      std::unordered_map<std::string, std::shared_ptr<TrtOpConverter>>>();+  if (!map->empty()) return map;+  map->emplace("nn.relu", std::make_shared<ActivationOpConverter>());+  map->emplace("sigmoid", std::make_shared<ActivationOpConverter>());+  map->emplace("tanh", std::make_shared<ActivationOpConverter>());+  map->emplace("nn.batch_norm", std::make_shared<BatchNormOpConverter>());+  map->emplace("nn.softmax", std::make_shared<SoftmaxOpConverter>());+  map->emplace("nn.conv2d", std::make_shared<Conv2DOpConverter>());+  map->emplace("nn.dense", std::make_shared<DenseOpConverter>());+  map->emplace("nn.bias_add", std::make_shared<BiasAddOpConverter>());+  map->emplace("add", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("subtract", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("multiply", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("divide", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("power", std::make_shared<ElementWiseBinaryOpConverter>());+  map->emplace("nn.max_pool2d", std::make_shared<PoolingOpConverter>());+  map->emplace("nn.avg_pool2d", std::make_shared<PoolingOpConverter>());+  map->emplace("nn.global_max_pool2d",+               std::make_shared<GlobalPoolingOpConverter>());+  map->emplace("nn.global_avg_pool2d",+               std::make_shared<GlobalPoolingOpConverter>());+  map->emplace("exp", std::make_shared<UnaryOpConverter>());+  map->emplace("log", std::make_shared<UnaryOpConverter>());+  map->emplace("sqrt", std::make_shared<UnaryOpConverter>());+  map->emplace("abs", std::make_shared<UnaryOpConverter>());+  map->emplace("negative", std::make_shared<UnaryOpConverter>());+  map->emplace("nn.batch_flatten", std::make_shared<BatchFlattenOpConverter>());+  map->emplace("expand_dims", std::make_shared<ExpandDimsOpConverter>());+  map->emplace("squeeze", std::make_shared<SqueezeOpConverter>());+  map->emplace("concatenate", std::make_shared<ConcatOpConverter>());+  map->emplace("nn.conv2d_transpose",+               std::make_shared<Conv2DTransposeOpConverter>());+  map->emplace("transpose", 
std::make_shared<TransposeOpConverter>());+  map->emplace("reshape", std::make_shared<ReshapeOpConverter>());+  map->emplace("nn.pad", std::make_shared<PadOpConverter>());+  map->emplace("sum", std::make_shared<ReduceOpConverter>());+  map->emplace("prod", std::make_shared<ReduceOpConverter>());+  map->emplace("max", std::make_shared<ReduceOpConverter>());+  map->emplace("min", std::make_shared<ReduceOpConverter>());+  map->emplace("mean", std::make_shared<ReduceOpConverter>());+  map->emplace("contrib.adaptive_max_pool2d",+               std::make_shared<AdaptivePoolingOpConverter>());+  map->emplace("contrib.adaptive_avg_pool2d",+               std::make_shared<AdaptivePoolingOpConverter>());+#if TRT_VERSION_GE(5, 1, 5)+  map->emplace("clip", std::make_shared<ActivationOpConverter>());+  map->emplace("nn.leaky_relu", std::make_shared<ActivationOpConverter>());+  map->emplace("sin", std::make_shared<UnaryOpConverter>());+  map->emplace("cos", std::make_shared<UnaryOpConverter>());+  map->emplace("atan", std::make_shared<UnaryOpConverter>());+  map->emplace("ceil", std::make_shared<UnaryOpConverter>());+  map->emplace("floor", std::make_shared<UnaryOpConverter>());+  map->emplace("strided_slice", std::make_shared<StridedSliceOpConverter>());+#endif+#if TRT_VERSION_GE(6, 0, 1)+  map->emplace("image.resize", std::make_shared<ResizeOpConverter>());+#endif+  return map;+}++TensorRTBuilder::TensorRTBuilder(const std::vector<DLTensor*>& args)+    : execution_args_(args) {+  // Create TRT builder and network.+  static runtime::TensorRTLogger logger;+  builder_ = nvinfer1::createInferBuilder(logger);+  batch_size_ = args[0]->shape[0];+  builder_->setMaxBatchSize(batch_size_);+  const size_t workspace_size =+      dmlc::GetEnv("TVM_TENSORRT_MAX_WORKSPACE_SIZE", size_t(1) << 31);+  builder_->setMaxWorkspaceSize(workspace_size);+  const bool use_fp16 = dmlc::GetEnv("TVM_TENSORRT_USE_FP16", false);+  builder_->setFp16Mode(use_fp16);+  network_ = builder_->createNetwork();+}++runtime::TrtEngineAndContext TensorRTBuilder::BuildEngine(const Expr& expr) {+  // Process graph and create INetworkDefinition.+  VisitExpr(expr);+  // Mark outputs.+  auto it = node_output_map_.find(expr.operator->());+  CHECK(it != node_output_map_.end()) << "Output was not found.";+  auto network_outputs = it->second;+  std::vector<std::string> network_output_names;+  for (size_t i = 0; i < network_outputs.size(); ++i) {+    CHECK(network_outputs[i].type == kTensor);+    auto out_tensor = network_outputs[i].tensor;+    std::string output_name = "tensorrt_output" + std::to_string(i);+    out_tensor->setName(output_name.c_str());+    network_output_names.push_back(output_name);+    network_->markOutput(*out_tensor);+    DLOG(INFO) << "Added TRT network output: " << out_tensor->getName()+               << " -> " << output_name;+  }+  nvinfer1::ICudaEngine* engine = builder_->buildCudaEngine(*network_);+  CHECK_EQ(engine->getNbBindings(),+           network_input_map_.size() + network_outputs.size());+  CleanUp();+  nvinfer1::IExecutionContext* context = engine->createExecutionContext();+  return {engine, context, network_input_map_, network_output_names};+}++nvinfer1::Weights TensorRTBuilder::GetDLTensorAsWeights(+    DLTensor* dptr, DLDeviceType src_device) {+  CHECK_EQ(dptr->ctx.device_type, src_device);+  CHECK_EQ(static_cast<int>(dptr->dtype.code), kDLFloat);+  const size_t weight_bytes = runtime::GetDataSize(*dptr);+  nvinfer1::Weights weight{nvinfer1::DataType::kFLOAT, nullptr, 0};+  size_t count = 1;+  for (tvm_index_t i = 
0; i < dptr->ndim; ++i) {+    count *= dptr->shape[i];+  }+  CHECK_EQ(count * 4, weight_bytes);+  weight.count = count;+  weight.values = new float[count];+  CHECK_EQ(+      TVMArrayCopyToBytes(dptr, const_cast<void*>(weight.values), weight_bytes),+      0)+      << TVMGetLastError();+  trt_weights_.push_back(weight);+  return weight;+}++nvinfer1::Weights TensorRTBuilder::GetNdArrayAsWeights(+    const runtime::NDArray& array, DLDeviceType src_device) {+  DLTensor* dptr = const_cast<DLTensor*>(array.operator->());+  return GetDLTensorAsWeights(dptr, src_device);+}++void TensorRTBuilder::GetInputAsWeights(const VarNode* node) {+  const int var_node_idx = TrackVarNode(node);+  nvinfer1::Weights weight =+      GetDLTensorAsWeights(execution_args_[var_node_idx], kDLGPU);+  node_output_map_[node] = {TrtOpInput(weight, GetShape(node->checked_type()))};+}++void TensorRTBuilder::GetConstantAsWeights(const ConstantNode* node) {+  auto weight = GetNdArrayAsWeights(node->data, kDLCPU);+  auto shape_long = node->data.Shape();+  std::vector<int> shape(shape_long.begin(), shape_long.end());+  node_output_map_[node] = {TrtOpInput(weight, shape)};+}++void TensorRTBuilder::GetInputAsTransposedWeights(const CallNode* transpose,+                                                  const VarNode* node) {+  GetInputAsWeights(node);+  CHECK_EQ(node_output_map_[node].size(), 1);+  const nvinfer1::Weights& original_weight = node_output_map_[node][0].weight;+  const auto& original_shape = node_output_map_[node][0].weight_shape;+  float* values = new float[original_weight.count];+  // Get order and new shape.+  const auto* attrs = transpose->attrs.as<TransposeAttrs>();+  std::vector<int> order(attrs->axes.size(), 0);+  std::vector<int> new_shape(attrs->axes.size(), 0);+  for (size_t i = 0; i < attrs->axes.size(); ++i) {+    const int axis = attrs->axes[i].as<IntImm>()->value;+    order[i] = axis;+    new_shape[i] = original_shape[axis];+  }+  // Perform transpose.+  if (order.size() == 4 && order[0] == 3 && order[1] == 2 && order[2] == 0 &&+      order[3] == 1) {+    TransposeRSCKtoKCRS(original_shape,+                        static_cast<const float*>(original_weight.values),+                        values);+  } else if (order.size() == 4 && order[0] == 2 && order[1] == 3 &&+             order[2] == 0 && order[3] == 1) {+    TransposeRSCKtoCKRS(original_shape,+                        static_cast<const float*>(original_weight.values),+                        values);+  } else if (order.size() == 2 && order[0] == 1 && order[1] == 0) {+    TransposeCKtoKC(original_shape,+                    static_cast<const float*>(original_weight.values), values);+  } else {+    LOG(FATAL) << "Constant transpose " << DebugString(order)+               << " is not supported.";+  }+  // Map as output of transpose op.+  nvinfer1::Weights transposed_weight{nvinfer1::DataType::kFLOAT, values,+                                      original_weight.count};+  trt_weights_.push_back(transposed_weight);+  node_output_map_[transpose] = {TrtOpInput(transposed_weight, new_shape)};+}++void TensorRTBuilder::VisitExpr_(const TupleGetItemNode* op) {+  if (const auto* tuple = op->tuple.as<TupleNode>()) {+    Expr item = tuple->fields[op->index];+    VisitExpr(item);+    node_output_map_[op] = node_output_map_[item.operator->()];+  } else {+    VisitExpr(op->tuple);+    // Index into tensor outputs from expr.+    node_output_map_[op] = {+        node_output_map_[op->tuple.operator->()][op->index]};+  }+}++void TensorRTBuilder::VisitExpr_(const TupleNode* 
op) {+  std::vector<TrtOpInput> outputs;+  for (auto item : op->fields) {+    VisitExpr(item);+    auto item_outputs = node_output_map_[item.operator->()];+    outputs.reserve(outputs.size() + item_outputs.size());+    outputs.insert(outputs.end(), item_outputs.begin(), item_outputs.end());+  }+  node_output_map_[op] = outputs;+}++void TensorRTBuilder::VisitExpr_(const VarNode* node) {+  const int id = TrackVarNode(node);++  const std::string& tensor_name = node->name_hint();+  auto shape = GetShape(node->checked_type());+  // Remove batch dim+  if (shape.size() > 1) shape.erase(shape.begin());+  DLOG(INFO) << "Added TRT network input: " << node->name_hint() << " "+             << DebugString(shape);+  nvinfer1::Dims dims = VectorToTrtDims(shape);+  auto type_node = node->checked_type().as<TensorTypeNode>();+  CHECK(type_node != nullptr &&+        runtime::TypeMatch(type_node->dtype, kDLFloat, 32))+      << "Only FP32 inputs are supported.";+  auto input =+      network_->addInput(tensor_name.c_str(), nvinfer1::DataType::kFLOAT, dims);+  network_input_map_[id] = tensor_name;+  node_output_map_[node] = {TrtOpInput(input)};+}++void TensorRTBuilder::VisitExpr_(const ConstantNode* node) {+  nvinfer1::Weights weight = GetNdArrayAsWeights(node->data, kDLCPU);+  auto shape = node->data.Shape();+  // Remove batch dim.+  if (shape.size() > 1 && shape[0] == 1) shape.erase(shape.begin());+  nvinfer1::Dims dims = VectorToTrtDims(shape);+  auto const_layer = network_->addConstant(dims, weight);+  CHECK(const_layer != nullptr);+  node_output_map_[node] = {TrtOpInput(const_layer->getOutput(0))};+}++void TensorRTBuilder::VisitExpr_(const CallNode* call) {+  AddTrtLayerParams params(network_, call, &trt_weights_);+  // Look up converter.+  auto it = GetOpConverters()->find(params.op_name);+  CHECK(it != GetOpConverters()->end())+      << "Unsupported operator conversion to TRT, op name: " << params.op_name;+  const auto converter = it->second;++  // Ensure that nodes are processed in topological order by visiting their+  // inputs first.+  for (size_t i = 0; i < call->args.size(); ++i) {+    // Handle special case where input must be constant array on CPU.+    if (!converter->variable_input_count &&+        converter->input_types[i] == kWeight) {+      // Input must be a constant weight+      if (auto* var = call->args[i].as<VarNode>()) {+        GetInputAsWeights(var);+      } else if (auto* node = call->args[i].as<ConstantNode>()) {+        GetConstantAsWeights(node);+      } else {+        // Temporary workaround for transposed weights. 
Once partitioning is+        // available, the transpose will be computed by tvm and the result will+        // be a var input.+        if (auto* transpose = call->args[i].as<CallNode>()) {+          if (transpose->op.as<OpNode>()->name == "transpose") {+            if (auto* weights = transpose->args[0].as<VarNode>()) {+              GetInputAsTransposedWeights(transpose, weights);+            } else {+              LOG(FATAL) << "TRT requires a constant input here.";+            }+          } else {+            LOG(FATAL) << "TRT requires a constant input here.";+          }+        } else {+          LOG(FATAL) << "TRT requires a constant input here.";+        }+      }+    } else {+      VisitExpr(call->args[i]);+    }+  }++  // Get inputs.+  for (size_t i = 0; i < call->args.size(); ++i) {+    auto it = node_output_map_.find(call->args[i].operator->());+    CHECK(it != node_output_map_.end()) << "Input was not found.";+    for (auto out : it->second) {+      params.inputs.push_back(out);+    }+  }+  if (!converter->variable_input_count) {+    CHECK_EQ(converter->input_types.size(), params.inputs.size())+        << "Op expected a different number of inputs.";+  }++  // Convert op to TRT.+  converter->Convert(&params);++  // Get outputs.+  node_output_map_[call] = {};+  std::vector<TrtOpInput> outputs;+  for (auto out : params.outputs) {+    node_output_map_[call].push_back(TrtOpInput(out));+  }+}++int TensorRTBuilder::TrackVarNode(const VarNode* node) {+  // TODO(trevmorr): make more robust+  const int trim_length = std::string("tensorrt_input").length();+  int var_node_idx =+      std::stoi(node->name_hint().substr(trim_length, std::string::npos));+  return var_node_idx;+}++void TensorRTBuilder::CleanUp() {+  network_->destroy();+  builder_->destroy();+  for (auto weight : trt_weights_) {+    if (weight.type == nvinfer1::DataType::kFLOAT) {+      delete[] static_cast<const float*>(weight.values);+    } else {+      delete[] static_cast<const uint16_t*>(weight.values);+    }+  }+}++void TransposeRSCKtoKCRS(const std::vector<int>& original_shape,+                         const float* input_values, float* output_values) {+  const int r = original_shape[0];+  const int s = original_shape[1];+  const int c = original_shape[2];+  const int k = original_shape[3];+  for (int x = 0; x < k; x++) {+    for (int y = 0; y < c; y++) {+      for (int z = 0; z < r; z++) {+        for (int w = 0; w < s; w++) {+          const int input_index = (x) + (y * k) + (z * s * c * k) + (w * c * k);+          const int output_index =+              (x * c * r * s) + (y * r * s) + (z * s) + (w);+          output_values[output_index] = input_values[input_index];+        }+      }+    }+  }+}++void TransposeRSCKtoCKRS(const std::vector<int>& original_shape,+                         const float* input_values, float* output_values) {+  const int r = original_shape[0];+  const int s = original_shape[1];+  const int c = original_shape[2];+  const int k = original_shape[3];+  for (int x = 0; x < k; x++) {+    for (int y = 0; y < c; y++) {+      for (int z = 0; z < r; z++) {+        for (int w = 0; w < s; w++) {+          const int input_index = (x) + (y * k) + (z * s * c * k) + (w * c * k);+          const int output_index =+              (y * k * r * s) + (x * r * s) + (z * s) + (w);

Thanks, I created TransposeWeights4D which takes input and output strides as arguments.
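A rough Python sketch of that idea follows (the real helper is C++; the name, argument order, and stride convention here are assumptions made for illustration only):

def transpose_weights_4d(shape, input_strides, output_strides, values):
    # shape gives the extent of each of the four loop axes; the two stride
    # tuples give, per axis, the element distance in the source and
    # destination buffers. Each specialized transpose (e.g. RSCK -> KCRS or
    # RSCK -> CKRS) then reduces to picking the right pair of stride tuples.
    out = [0.0] * (shape[0] * shape[1] * shape[2] * shape[3])
    for x in range(shape[0]):
        for y in range(shape[1]):
            for z in range(shape[2]):
                for w in range(shape[3]):
                    src = (x * input_strides[0] + y * input_strides[1]
                           + z * input_strides[2] + w * input_strides[3])
                    dst = (x * output_strides[0] + y * output_strides[1]
                           + z * output_strides[2] + w * output_strides[3])
                    out[dst] = values[src]
    return out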

trevor-m

comment created time in a month

push eventtrevor-m/tvm

Trevor Morris

commit sha 0895ff457c03e12d84e711f8a49c415de0268f89

Use opt level 3, add helper to check whether to run test, improve load_params

view details

push time in a month
