gaurav1086 · New York · https://www.linkedin.com/in/gaurav-s-7699a116/ · C/C++, Python, Perl, Golang, Machine Learning

gaurav1086/dl-algo-trader 53

Training an Agent to make automated trading decisions in a simulated stochastic market environment using Reinforcement Learning or Deep Q-Learning

gaurav1086/bitcoin 0

Bitcoin Core integration/staging tree

gaurav1086/ceres-solver 0

A large scale non-linear optimization library

gaurav1086/cpython 0

The Python programming language

gaurav1086/faiss 0

A library for efficient similarity search and clustering of dense vectors.

gaurav1086/fbthrift 0

Facebook's branch of Apache Thrift, including a new C++ server.

gaurav1086/hhvm 0

A virtual machine for executing programs written in Hack.

gaurav1086/htop 0

htop is an interactive text-mode process viewer for Unix systems. It aims to be a better 'top'.

gaurav1086/incubator-mxnet 0

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

gaurav1086/incubator-tvm 0

Open deep learning compiler stack for cpu, gpu and specialized accelerators

PR closed tensorflow/tensorflow

Reviewers
[lite] Remove possible redundant/nullptr add_op (awaiting review; labels: cla: yes, comp:lite, size:XS)

op_producing_add_input can still be null while add_op is not null, because the multiple 'continue' statements (error cases) within the for loop can end an iteration before the assignment happens.
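The failure mode described above can be sketched in a few lines. This is a hypothetical reduction (the struct, field names, and `FindProducer` are illustrative, not TF Lite's actual types): one pointer gets assigned early in a loop iteration, while an error-case `continue` skips the assignment of the other, so the two can disagree about nullness at loop exit.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for an op graph node (not a real TF Lite type).
struct Op { int id; bool valid; };

// Mirrors the pattern from the PR description: *add_op_out can be set on an
// iteration that then hits 'continue', leaving the returned producer null.
const Op* FindProducer(const std::vector<Op>& ops, const Op** add_op_out) {
    const Op* producer = nullptr;
    for (const auto& op : ops) {
        if (op.id == 1) *add_op_out = &op;  // add_op found first...
        if (!op.valid) continue;            // ...but an error case skips
        if (op.id == 2) producer = &op;     // this assignment entirely
    }
    return producer;  // may be nullptr even though *add_op_out is not
}
```

Hence the extra null check on the producer pointer, independent of the add_op check.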

+2 -4

0 comment

1 changed file

gaurav1086

pr closed time in a day

PR closed hishamhm/htop

Reviewers
kshandle nullptr check

Check kshandle before initializing: ksvar = kshandle->ks_data

+5 -4

0 comment

1 changed file

gaurav1086

pr closed time in a day

push event gaurav1086/incubator-mxnet

Gaurav Singh

commit sha bb3e7b1242fc292664cb81684d9b7bbc096b4f59

Fix compilation errors

view details

push time in 2 days

PR opened apache/incubator-mxnet

[SquareSumRspGradKernel] Add div by zero check for num_cols

The public interface of SquareSumRspGradKernel is not safe: calling SquareSumRspGradKernel::Map() with parameter num_cols equal to 0 leads to a division by zero.
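The guard this PR adds can be sketched as follows. The function name and index arithmetic are illustrative only (the real kernel's signature and logic differ); the point is that integer division or modulo by zero is undefined behavior in C++ and typically traps with SIGFPE, so the divisor must be validated at the public boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hedged sketch of the added check: reject num_cols == 0 before any
// division, since idx % 0 is undefined behavior (usually a SIGFPE crash).
inline int64_t SafeColIndex(int64_t flat_index, int64_t num_cols) {
    assert(num_cols != 0 && "num_cols must be non-zero");  // the added guard
    return flat_index % num_cols;
}
```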

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • [ ] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [ ] Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [ ] Feature1, tests, (and when applicable, API doc)
  • [ ] Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here
+4 -0

0 comment

1 changed file

pr created time in 3 days

create branch gaurav1086/incubator-mxnet

branch : divZero_sigabrt

created branch time in 3 days

PR opened apache/incubator-tvm-vta

[xilinx] correct assert statement

Correct typo in the assert statement

+1 -1

0 comment

1 changed file

pr created time in 3 days

create branch gaurav1086/incubator-tvm-vta

branch : xilinx_correct_assert

created branch time in 3 days

PR opened apache/incubator-tvm

[xilinx]: correct assert statement

Correct typo in the assert statement

Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.

+1 -1

0 comment

1 changed file

pr created time in 3 days

create branch gaurav1086/incubator-tvm

branch : xilinx_correct_expr

created branch time in 3 days

PR opened apache/incubator-tvm

[Dataflow]: nullptr check

Check r != nullptr before dereferencing: r->source.size()

Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.

+1 -1

0 comment

1 changed file

pr created time in 3 days

create branch gaurav1086/incubator-tvm

branch : schedule_dataflow_nullptr

created branch time in 3 days

fork gaurav1086/incubator-tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

https://tvm.apache.org/

fork in 3 days

PR opened apache/incubator-mxnet

Remove duplicate condition

Description

Remove duplicate condition.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • [ ] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [ ] Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [ ] Feature1, tests, (and when applicable, API doc)
  • [ ] Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here
+1 -1

0 comment

1 changed file

pr created time in 4 days

create branch gaurav1086/incubator-mxnet

branch : dup_condition

created branch time in 4 days

PR opened apache/incubator-mxnet

param.axes check : redundant condition

Description

Logical condition optimization: '!A || (A && B)' is equivalent to '!A || B'.
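The equivalence behind this simplification can be verified exhaustively over the two boolean inputs; this small sketch (names `original`/`simplified` are mine, not from the PR) does exactly that. If A is false, both sides are true via the left disjunct; if A is true, both sides reduce to B.

```cpp
// Exhaustive demonstration that !A || (A && B) == !A || B for all inputs.
inline bool original(bool a, bool b)   { return !a || (a && b); }
inline bool simplified(bool a, bool b) { return !a || b; }
```

Compilers will often fold this themselves, but the simplified form is what the source should say.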

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [ ] Changes are complete (i.e. I finished coding on this PR)
  • [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here
+1 -1

0 comment

1 changed file

pr created time in 4 days

create branch gaurav1086/incubator-mxnet

branch : logical_cond_optim

created branch time in 4 days

fork gaurav1086/incubator-mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

https://mxnet.apache.org

fork in 4 days

PR opened google/or-tools

Catch exception by reference

<!-- Thank you for submitting a PR!

Please make sure you are targeting the master branch instead of stable and that all contributors have signed the Contributor License Agreement.

This simply gives us permission to use and redistribute your contributions as part of the project. Head over to https://cla.developers.google.com/ to see your current agreements on file or to sign a new one.

This project follows https://opensource.google.com/conduct/

Thanks! -->
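The rationale for catching by reference can be shown with a minimal example (the `Base`/`Derived` types here are illustrative, not or-tools code): catching by value copy-constructs the handler's object from the thrown one, slicing a derived exception down to the base class and losing its overridden behavior, while catching by const reference preserves the dynamic type.

```cpp
#include <string>

// Illustrative exception hierarchy (not from or-tools).
struct Base {
    virtual std::string msg() const { return "base"; }
    virtual ~Base() = default;
};
struct Derived : Base {
    std::string msg() const override { return "derived"; }
};

// Catching by value slices Derived down to Base.
std::string CatchByValue() {
    try { throw Derived{}; }
    catch (Base e) { return e.msg(); }  // copy-constructs a Base: "base"
    return "";
}

// Catching by const reference keeps the dynamic type.
std::string CatchByRef() {
    try { throw Derived{}; }
    catch (const Base& e) { return e.msg(); }  // virtual dispatch: "derived"
    return "";
}
```

Catching by reference also avoids a copy, which is why guidelines such as C++ Core Guidelines E.15 recommend it.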

+486249 -82946

0 comment

3256 changed files

pr created time in 16 days

create branch gaurav1086/or-tools

branch : exception_by_reference

created branch time in 16 days

PR opened google/or-tools

Check (demon != nullptr) before accessing demon->

<!-- Thank you for submitting a PR!

Please make sure you are targeting the master branch instead of stable and that all contributors have signed the Contributor License Agreement.

This simply gives us permission to use and redistribute your contributions as part of the project. Head over to https://cla.developers.google.com/ to see your current agreements on file or to sign a new one.

This project follows https://opensource.google.com/conduct/

Thanks! -->

+2 -2

0 comment

1 changed file

pr created time in 16 days

create branch gaurav1086/or-tools

branch : demon_nullptr_check

created branch time in 16 days

started 6-Billionaires/trading-gym

started time in a month

pull request comment tensorflow/tensorflow

[core] Added null check for output buffer

@gbaned fixed the build errors. Thank you.

gaurav1086

comment created time in a month

push event gaurav1086/tensorflow

Gaurav Singh

commit sha d6e79390a5ab7ce86833becddfcc7009840118d3

Fix compilation errors

view details

push time in a month

push event gaurav1086/pytorch

Edward Z. Yang

commit sha 883b18ea704a3f1e3474800c5ae0b9adae047a2f

Delete build_variables.bzl following configerator change. Signed-off-by: Edward Z. Yang <ezyang@fb.com>

view details

Vitaly Fedyunin

commit sha 05fb160048b71c1b8b00d2083a08618318158c1a

Revert D19964089: [pytorch][PR] Allow vectorized gpu loop to have different argument types Test Plan: revert-hammer Differential Revision: D19964089 Original commit changeset: a1e8e62d1ebc fbshipit-source-id: fee9423d5924714f0e92eea712cde2d2163b3cf0

view details

peter

commit sha ffe327f7d9c0bf3f6d2fc64fd4dfb8b2c2013be8

Revert "Disable flaky test TestCppExtensionAOT.test_cuda_extension in… (#33404) Summary: … Windows CI (https://github.com/pytorch/pytorch/issues/33282)" This reverts commit 5b922918d023126ad1f468c68577c9b599ad202d. Fixes https://github.com/pytorch/pytorch/issues/33270. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33404 Differential Revision: D19972594 Pulled By: ezyang fbshipit-source-id: c8f67536fd6e4b7135171d621ad671b1b2a21fd4

view details

Edward Yang

commit sha 196fda5a79f92f5716d401fe4b4c76c49d35ae92

Remove special case codegen for tril_indices/triu_indices. (#33305) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33305 The current TensorOptions code is written to exactly extract out TensorOptions based on exact struct match, including default arguments. That meant that tril_indices/triu_indices which had a different default argument didn't match, and thus needed a special case. I resolve this special case by instead replacing the explicit long default argument with a None default argument, and then adjusting the actual implementations to select the correct dtype when none was specified. I think the general rule I'm following here is that it is always acceptable to replace an explicit default argument, with a None argument (assuming the backend will compute it appropriately); the documentation gets modestly worse, but everything that was previously expressible continues to be expressible. Maybe later we should switch the default argument back to long, but for now the simplification in code is worth it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19975411 Pulled By: ezyang fbshipit-source-id: 996598759bed9e8d54fe61e19354ad038ed0e852

view details

Edward Yang

commit sha a9e4448dffa08b19290e00ce217962ce476e4584

Update documentation on why _cudnn_init_dropout_state looks the way it is. (#33347) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33347 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19975410 Pulled By: ezyang fbshipit-source-id: eb729870c2d279d7d9ca43c92e514fe38dedb06d

view details

Yanli Zhao

commit sha 01e1de8220759af2a35200ff4897eab02cb8c6b5

allow remote torchscript call to itself (#32990) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32990 right now remote torchscript call can not call to itself, this diff is to support this in the same way as how is supported when calling remote python call to itself ghstack-source-id: 98599082 Test Plan: unit test Differential Revision: D19731910 fbshipit-source-id: 6495db68c3eaa58812aa0c5c1e72e8b6057dc5c4

view details

Jithun Nair

commit sha 718c538ff939126a889d24d5818cdf1fd27b73ca

Add ability to enable/disable MIOpen at runtime (#33118) Summary: 1. Set `torch._C.has_cudnn` to `True` for ROCm 2. Make MIOpen invocations respect value of `cudnn_enabled` or `at::globalContext().userEnabledCuDNN()` 3. `torch/backends/cudnn/__init__.py`: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff) Pull Request resolved: https://github.com/pytorch/pytorch/pull/33118 Differential Revision: D19977719 Pulled By: bddppq fbshipit-source-id: 64d4dd1d78afcf96201360d85b8be5950f96dfad

view details

Vitaly Fedyunin

commit sha 3233033a17afe00693c0d909adbcf0a1ce580853

Revert D19975410: Update documentation on why _cudnn_init_dropout_state looks the way it is. Test Plan: revert-hammer Differential Revision: D19975410 Original commit changeset: eb729870c2d2 fbshipit-source-id: 4d4cc8ae78ad18751c126b93d82932ac2732f1b5

view details

Edgar Andrés Margffoy Tuay

commit sha cdf381c967c7c93aa60941b0d35c33e1f8ec3121

Fix LambdaLR scheduler side effects (#32848) Summary: Fixes https://github.com/pytorch/pytorch/issues/32756 Pull Request resolved: https://github.com/pytorch/pytorch/pull/32848 Differential Revision: D19859736 Pulled By: vincentqb fbshipit-source-id: 43b3cbb2b6bed208c75aad37aebc2a8a9565fe0d

view details

Nikolay Novik

commit sha d19a50bf277344dfddec491249005464df7eae44

Add missing weight_decay parameter validation for Adam and AdamW (#33126) Summary: Adam and AdamW are missing parameter validation for weight_decay. Other optimisers have this check present. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126 Differential Revision: D19860366 Pulled By: vincentqb fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc

view details

Vitaly Fedyunin

commit sha 687a7e4a2566861c53c8fb53a80b198465168b38

Revert D19975411: Remove special case codegen for tril_indices/triu_indices. Test Plan: revert-hammer Differential Revision: D19975411 Original commit changeset: 996598759bed fbshipit-source-id: 6bdb4b8f903e13815fc146e6f3260e5bb04c1045

view details

Edward Yang

commit sha 71225ecc8ce802d44929087ee659b97469af55d5

Revert D20006312: Revert D19975410: Update documentation on why _cudnn_init_dropout_state looks the way it is. Test Plan: revert-hammer Differential Revision: D20006312 Original commit changeset: 4d4cc8ae78ad fbshipit-source-id: 4bd4b9d1331dc97f5b83e0df491be5fd0a11214a

view details

vishwakftw

commit sha 1a2574734222193c344dffb7904def10db18b874

Check for consistent devices in at::where (#33432) Summary: Changelog: - Add a check to ensure that all inputs to `where` lie on the same device Pull Request resolved: https://github.com/pytorch/pytorch/pull/33432 Test Plan: - Added test_where_invalid_device Fixes https://github.com/pytorch/pytorch/issues/33422 Differential Revision: D19981115 Pulled By: VitalyFedyunin fbshipit-source-id: 745896927edb53f61f3dd48ba9e1e6cd10d35434

view details

Nick Korovaiko

commit sha 36d724c9634eb20a85d1bc546885c59055931d57

run peephole to do profile-based optimizations (#33337) Summary: We need to run a peephole before constant propagation in the profiling pipeline, so we fold `prim::shape` for inputs with complete tensor types. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33337 Differential Revision: D19905624 Pulled By: Krovatkin fbshipit-source-id: 80fff067941556053847ddc7afe0fd1c7a89a3ba

view details

anjali411

commit sha 13e4ee7883106c412d3efe17fbbe8087a6d4c5e2

Added tensor.is_complex(), is_complex and dtype.is_complex py binding, tensor printing, and dixed the scalar type returned for complex float (#33268) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33268 Test Plan: Imported from OSS Differential Revision: D19907698 Pulled By: anjali411 fbshipit-source-id: c3ce2e99fc09da91a90a8fb94e5525a00bb23703

view details

anjali411

commit sha e5cf7afd0a8cc47ab9ce044b94c264619a22db62

torch.tensor can infer complex dtype now (#33361) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33361 Test Plan: Imported from OSS Differential Revision: D19943477 Pulled By: anjali411 fbshipit-source-id: ff6d7d2a6fdb6c58390f33bdd8be2f3fa182518b

view details

Igor Sugak

commit sha 108fc78395781e90d7a70909cfba2e710064ff77

[caffe2] fix invalid % escape in inline assembly strings (#33554) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33554 NVCC/GCC accepts the existing syntax, but not Clang which requires a proper escape. Here `%laneid` is one of the many registers that CUDA's pseudo-asm provides [1]. And using the extra `%` doesn't change the semantics, as PTX expects `%laneid` value after it's processed by the asm tool. 1. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html Test Plan: ```lang=bash buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow buck build mode/opt //fblearner/flow/projects/dper:workflow Reviewed By: bddppq Differential Revision: D20003621 fbshipit-source-id: 8e550e55a3455925e7bd92c6df3e504b5d38c2dc

view details

Martin Yuan

commit sha 5782758b547ad4ea8d03090fd91208a811525528

Add instructions and operators for new bytecode format of PyText model (#33555) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33555 A quick fix for the PyText model (in internal production) on the new bytecode format. Test Plan: Imported from OSS Differential Revision: D20008266 Pulled By: iseeyuan fbshipit-source-id: 1916bd0bf41093898713c567c7f6fa546b9ea440

view details

Igor Sugak

commit sha 23846d5a3821451f4471e7e8f6a5ccc8b985ab4b

[caffe2] use Clang identification macro in various places (#33574) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33574 Sprinkle with Clang identification macro places that otherwise would cause build errors when Clang is used to drive the CUDA compilation. Note: `__clang__` is defined when either Clang is used as host compiler by NVCC or when Clang drives the compilation. `__CUDA__` is defined only for the latter case. Test Plan: ```lang=bash buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow buck build mode/opt //fblearner/flow/projects/dper:workflow ``` Reviewed By: BIT-silence Differential Revision: D20007440 fbshipit-source-id: 53caa70695b99461a3910d41dc71a9f6d0728a75

view details

Peter Bell

commit sha c882425c2449a4fb67a74375abddc1456fdc84b9

Add 64-bit indexing support to THC index reductions (#33405) Summary: Fixes https://github.com/pytorch/pytorch/issues/32863, (together with https://github.com/pytorch/pytorch/issues/33310 for the `TensorIterator` reductions) This adds 64-bit indexed kernels for `THC_reduceDimIndex` and uses `THCTensor_canUse32BitIndexMath` to switch between the two at runtime. I have a test for this locally but haven't included it here because `max` is much slower than `argmax`. To the point where the test takes several minutes to call max on just one `2**32` element tensor. That seems excessive, even for a slow test but I can push it if preferred. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33405 Differential Revision: D20010769 Pulled By: ezyang fbshipit-source-id: a8a86f662598d5fade4d90448436418422c699a3

view details

push time in a month

PR opened pytorch/pytorch

[caffe2][core] Redundant file ptr repositioning

The file is already opened in append mode, so there is no need to reposition with fseek to the end of the file before writing.
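The reason the fseek is redundant: per the C standard, a stream opened with mode "a" forces every write to the current end of file regardless of the current position, so an explicit `fseek(f, 0, SEEK_END)` before writing changes nothing. A minimal sketch (the file path is illustrative):

```cpp
#include <cstdio>

// Append one byte to 'path' with no fseek; "a" mode already positions
// every write at end of file. Returns the position after the write.
long AppendAndTell(const char* path) {
    std::FILE* f = std::fopen(path, "a");
    if (!f) return -1;
    std::fputs("x", f);            // lands at EOF without any fseek
    long pos = std::ftell(f);      // position after the appended byte
    std::fclose(f);
    return pos;
}
```

Calling this twice on a fresh file shows the writes stacking at the end each time, with no repositioning code anywhere.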

+0 -1

0 comment

1 changed file

pr created time in a month

create branch gaurav1086/pytorch

branch : redundant_fileptr_repositioning

created branch time in a month

push event gaurav1086/tensorflow

frreiss

commit sha b05a3c1b3db9b82dfe86cd0d6db7b91ff89bd928

Improve API documentation for WindowDataset op

view details

nikochiko

commit sha 4c9ee36f03d9b01b4d8598905aa26bbf81b380b4

Update array_ops.py Update documentation, formatting and fix typos for `tf.broadcast_dynamic_shape`, `tf.broadcast_static_shape`, `tf.boolean_mask`

view details

nikochiko

commit sha 2a6efd2e668f8418bdf1c60e8218791559724dc4

Update docstrings Updated docstrings for `tf.convert_to_tensor` and `tf.edit_distance`. `tf.convert_to_tensor`: Put example in "For example:" section and switch to carets from backticks. `tf.edit_distance`: Updated documentatoin, fixed example.

view details

nikochiko

commit sha fdadd0e5e524df6488cd763c4ab7595d469ed1ef

Update save.py Fix https://github.com/tensorflow/tensorflow/issues/34348 . Notes: - Documentation needs to be changed (in multiple places) after final changes in code. - Changed code for deciding whether to save file as h5 or tf. - Removed the unncessary _HDF5_EXTENSIONS list. Will have to make sure it wasn't used elsewhere. - Added 4 new ValueError raises.

view details

nikochiko

commit sha b33be57b2b02b1abc159edc44155b46f0bf26cad

Revert "Update docstrings" This reverts commit 2a6efd2e668f8418bdf1c60e8218791559724dc4.

view details

nikochiko

commit sha e81b7ea8d85bbedf9a0d2d00557400987975373f

Revert "Update array_ops.py" This reverts commit 4c9ee36f03d9b01b4d8598905aa26bbf81b380b4.

view details

nikochiko

commit sha 9c83a0e9a205a062d7c19a7fba175729c66ab13c

Added new function process_save_format - Added new function `validate_save_format` as requested by @k-w-w inside `network.py`. - Using `validate_save_format` for validating save_format in `save.save_model` and `network.save_weights` Although, the a few updates will have to be made in `save_weights` because - `validate_save_format` is designed to work with path as well as h5py.File objects. This works with `save.save_model` but not with `network.save_weights` which accepts only String as the path. - Does it make sense to add functionality to save_weights to save it to a h5py.File object?

view details

Kaustubh Maske Patil

commit sha f20a32355324fa131f4c69e87edecc1ed5365ecb

Merge branch 'master' into fix-save-model

view details

frreiss

commit sha 68c034cde3b887943e30a644f618369745b04e56

Update python API docs per review comments

view details

frreiss

commit sha 13d03c3caa2a45d9d60fc4faf5b97a3eb199ba80

Merge branch 'master' of https://github.com/tensorflow/tensorflow into issue-data-window-doc

view details

frreiss

commit sha 6254f1e4347b3a76c61eedb66df32df210dda690

Merge branch 'master' of https://github.com/tensorflow/tensorflow into issue-data-window-doc

view details

frreiss

commit sha 84bc9b4a8126cd42a2e479ff990fc242af1ea61e

Address review comments

view details

Rahul Huilgol

commit sha c1e45341f61344dd5463426d38d430a58c45114a

Allow an option to set CA file and CA Path to AWS SDK

view details

Hans Gaiser

commit sha 0d31c0bee8a1e06c7b4fa977ce2bc6ce347aa96f

Use _get_distribution_strategy only when it is available.

view details

Kaustubh Maske Patil

commit sha 0f7b5e410f414464ec3e08ab1995c75d378af6cc

Update save.py

view details

Kaustubh Maske Patil

commit sha b641f6953f72c8c298614ea521981f4dc86ab446

Update network.py

view details

nikochiko

commit sha 25ec563a11639c583ac38ef626d598f9ee87208b

Fix sanity

view details

Deven Desai

commit sha f8ba03dfd976278a605e53ac741210fbab14c7ae

[ROCm] Fix the ROCm CSB breakage - 200110 PR # (which enabled MIOpen Immediate Mode API for convolutions) was merged yesterday, but it causes the following test to fail ``` //tensorflow/python/keras:convolutional_test_gpu ``` The cause of the failure is because the above test calls the convolution API with cudnn_use_autotune set to false. With the new MIOpen Immediate Mode API, we need to have an explicit call to GetMIOpenConvolveAlgorithms, when convolution kernel is called for the first time for a given conv_paramereters set. That call gets skipped if cudnn_use_autotune is set to false, and hence the test failure. This fix essentially disables the cudnn_use_autotune functionality for convolution kernel calls in the ROCm flow.

view details

nikochiko

commit sha b2875d86f0f30fed4b3b947d01471d37503bcb16

Add tests

view details

nikochiko

commit sha 616154eb62ad1ab2f89c5906253edab2bc141e2d

Fix typo

view details

push time in a month

pull request comment tensorflow/tensorflow

[core] Added null check for output buffer

@mihaimaruseac @gbaned , rebased with the latest master

gaurav1086

comment created time in a month

push event gaurav1086/tensorflow

Bruce Fontaine

commit sha a7f1d52b0396acc53e2ba27fe5499f614884d871

Split distribute/custom_training_loop_test into three parts as it is timing out on our kokoro TPU continuous tests. PiperOrigin-RevId: 294765478 Change-Id: I574ca0433ade67673e1b5ea731db94e40e28ae5f

view details

Andrew Audibert

commit sha 0c1ca5c674314854cc4a4ba32c636cafc2b4e3aa

Add API to control parallel interleave prefetching. PiperOrigin-RevId: 294766661 Change-Id: I8061629522d19d408cd8b7a1981836a4ee958110

view details

Marat Dukhan

commit sha 52d18f5e67485621e8821516c82912af27eddbda

Update XNNPACK dependency PiperOrigin-RevId: 294767643 Change-Id: I3deaf10d1a1b51ee2d72f18604dcd926f6225e72

view details

A. Unique TensorFlower

commit sha d96c1e107fdf1faffe38a2c72d328fd6778a5716

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 294768124 Change-Id: Ic35f039f13824815d34605206b39e282c25060a5

view details

A. Unique TensorFlower

commit sha dcc5a469beb583fbfb5d005af6bc350779d6d549

Add pfor converters for: Fill StatelessMultinomial StatelessRandomBinomial StatelessRandomGammaV2 StatelessRandomNormal StatelessRandomPoisson StatelessRandomUniform StatelessRandomUniformInt StatelessTruncatedNormal UnsortedSegmentMax UnsortedSegmentMin UnsortedSegmentProd PiperOrigin-RevId: 294771643 Change-Id: Iba575475be98cc1bfb7e740f752275ab72b18ee4

view details

Bixia Zheng

commit sha 057e8630bcf90dba8d8919d12d77d08e9103a141

[TF:MLIR] Enhance promote-resources-to-arguments pass to handle resource accesses inside control flow. Use resource lifting to functionalize control flow statements. Change the pass ordering in ConvertMLIRToXlaComputation to perform control flow legalization after promoting resources to arguments. PiperOrigin-RevId: 294772980 Change-Id: I8e4b89d7c4c090fd473e579baf0424188ec26e59

view details

Akshay Modi

commit sha a94839d7910dc5dfa4b6cde20445b1d5a2fa53b8

Make "share_cluster_devices_in_session" non experimental. PiperOrigin-RevId: 294773227 Change-Id: Ib71d36f09e7e0fde3b5afe6710b9011d39445ede

view details

A. Unique TensorFlower

commit sha a90fa384cc031905420daa81a553bda6db4cc7bd

[TF2:XLA] Lower/UpperBound ops for tf.searchsorted. PiperOrigin-RevId: 294774880 Change-Id: If6137584ba86507912d0581d611eff01bb327ebd

view details

Gunhan Gulsoy

commit sha f04915d2c83bc708c6b1ed33f8f7dfc391e0d2dd

Implement GetTempFilename for windows. Instead of an ad-hoc tempo filename creation logic in gcs_filesystem, use the one provided in platform/path. PiperOrigin-RevId: 294778438 Change-Id: Ib4dfb32c76bda697f9ccde12d9fdadb42a3e6e3e

view details

A. Unique TensorFlower

commit sha 52281ba252094fc201d2dbcb49c9c1fa9d17ad03

Introduce a memory leak detection utility. PiperOrigin-RevId: 294780979 Change-Id: I27b18224dbb49535beaa7ce81906f5686cebb7ef

view details

Jiho Choi

commit sha 76e77cf61c5e3fab34f1ff119f5fe4fa77be590d

Change DerivedXLineBuilder to accept the vector of event metadata and maintain dependency with other lines. PiperOrigin-RevId: 294781084 Change-Id: Ied1b11a4cdbb33a0b16282b867174e2048fd6904

view details

TensorFlower Gardener

commit sha d74394747a2e253f9e42ed9c5e0e44f628fd200d

Merge of b75a6222b82bb556f63f7a5a04cab45212ed30c6 PiperOrigin-RevId: 294781398 Change-Id: I3cc915dd058a9b1414a7885794e4b95522ea910c

view details

Yanhui Liang

commit sha 13655728cda68ce4d8eefb92124b3b2191991dce

Fix the timeout of nasnet test. PiperOrigin-RevId: 294782893 Change-Id: Iffc97ba7ecb2072fd6d42ba7d9923952b157452d

view details

George Karpenkov

commit sha 4f2afc07b26748f80a0de768fac81c7816410b44

[XLA/GPU] Adapt tree reduction pass to the new column reduction algorithm Use the fact that we now can reduce up to 4096 items deterministically without atomics in a single kernel launch. PiperOrigin-RevId: 294783585 Change-Id: Ie941c5adc990d130104f9cf924e97859695ce0eb

view details

Rick Chao

commit sha fc36231b872e10793163cb62eef907dfcf0e7cff

Fix multi_worker_callback_tf2_test test target by only running it with CPU. The test is not using GPU anyway. PiperOrigin-RevId: 294784694 Change-Id: I9e9d2f8db05160799ef64b43f0cc8b1a927637e0

view details

Andrew Audibert

commit sha 47940211fdf68f9422f93a0c0c08382d03bdd438

Enable op-level dataset determinism configuration for ParallelMap Users can control determinism at a per-op level by specifying `deterministic` when calling map(). The `deterministic` argument takes higher priority than the `experimental_deterministic` dataset option. PiperOrigin-RevId: 294786773 Change-Id: If89f87dbe2adb51aad79791aa3f18072132e74c6

view details

A. Unique TensorFlower

commit sha 2522a14a11f20a49cc473c4587fb3bef55403be5

Update ops-related pbtxt files. PiperOrigin-RevId: 294789924 Change-Id: I26834db79554b0d7de02db7f669b784bed5e711f

view details

A. Unique TensorFlower

commit sha 10a29d7a5029207c739ba51502b2a00e8067b01b

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 294790443 Change-Id: Idf262e9b1e54f3e1b9cb7c8a55df86866f56f5d6

view details

A. Unique TensorFlower

commit sha ee6c34b5a9d3e743c0f9f00fa2a6b18555ee2981

Automated rollback of commit a90fa384cc031905420daa81a553bda6db4cc7bd PiperOrigin-RevId: 294792062 Change-Id: I56c1915922822e2ddf2fd445fafd1e6590acba04

view details

Timon Van Overveldt

commit sha 596090262a846253f1c0a66fe81119de52141078

Revert to fnmatch-based FileSystem::Match on mobile, avoids APK size increase from RE2 dep. PiperOrigin-RevId: 294796880 Change-Id: I7f43104cbf4e261187204d91c41cfadb0098d5ed

view details

push time in a month

push event gaurav1086/tensorflow

Peter Hawkins

commit sha bceb1d7854f31a3fb4147bd3f2fa8266012771f2

Make gtl/int_type.h types hashable in ABSL containers. (Note ABSL hash functions are templated, and so there's no cost if you don't use them.) PiperOrigin-RevId: 294764409 Change-Id: I19bafd4a19eb6b623a48ce5d136dd5bf6c20e56b

view details

Bruce Fontaine

commit sha a7f1d52b0396acc53e2ba27fe5499f614884d871

Split distribute/custom_training_loop_test into three parts as it is timing out on our kokoro TPU continuous tests. PiperOrigin-RevId: 294765478 Change-Id: I574ca0433ade67673e1b5ea731db94e40e28ae5f

view details

Andrew Audibert

commit sha 0c1ca5c674314854cc4a4ba32c636cafc2b4e3aa

Add API to control parallel interleave prefetching. PiperOrigin-RevId: 294766661 Change-Id: I8061629522d19d408cd8b7a1981836a4ee958110

view details

Marat Dukhan

commit sha 52d18f5e67485621e8821516c82912af27eddbda

Update XNNPACK dependency PiperOrigin-RevId: 294767643 Change-Id: I3deaf10d1a1b51ee2d72f18604dcd926f6225e72

view details

A. Unique TensorFlower

commit sha d96c1e107fdf1faffe38a2c72d328fd6778a5716

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 294768124 Change-Id: Ic35f039f13824815d34605206b39e282c25060a5

view details

A. Unique TensorFlower

commit sha dcc5a469beb583fbfb5d005af6bc350779d6d549

Add pfor converters for: Fill StatelessMultinomial StatelessRandomBinomial StatelessRandomGammaV2 StatelessRandomNormal StatelessRandomPoisson StatelessRandomUniform StatelessRandomUniformInt StatelessTruncatedNormal UnsortedSegmentMax UnsortedSegmentMin UnsortedSegmentProd PiperOrigin-RevId: 294771643 Change-Id: Iba575475be98cc1bfb7e740f752275ab72b18ee4

view details

Bixia Zheng

commit sha 057e8630bcf90dba8d8919d12d77d08e9103a141

[TF:MLIR] Enhance promote-resources-to-arguments pass to handle resource accesses inside control flow. Use resource lifting to functionalize control flow statements. Change the pass ordering in ConvertMLIRToXlaComputation to perform control flow legalization after promoting resources to arguments. PiperOrigin-RevId: 294772980 Change-Id: I8e4b89d7c4c090fd473e579baf0424188ec26e59

view details

Akshay Modi

commit sha a94839d7910dc5dfa4b6cde20445b1d5a2fa53b8

Make "share_cluster_devices_in_session" non experimental. PiperOrigin-RevId: 294773227 Change-Id: Ib71d36f09e7e0fde3b5afe6710b9011d39445ede

view details

A. Unique TensorFlower

commit sha a90fa384cc031905420daa81a553bda6db4cc7bd

[TF2:XLA] Lower/UpperBound ops for tf.searchsorted. PiperOrigin-RevId: 294774880 Change-Id: If6137584ba86507912d0581d611eff01bb327ebd

view details

Gunhan Gulsoy

commit sha f04915d2c83bc708c6b1ed33f8f7dfc391e0d2dd

Implement GetTempFilename for Windows. Instead of ad-hoc temp filename creation logic in gcs_filesystem, use the one provided in platform/path. PiperOrigin-RevId: 294778438 Change-Id: Ib4dfb32c76bda697f9ccde12d9fdadb42a3e6e3e

view details
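The same principle applies in Python: lean on the platform's temp-filename helper rather than hand-rolled name generation. A minimal stdlib sketch, unrelated to the TensorFlow code itself:

```python
import os
import tempfile

# mkstemp creates the file atomically and returns an open descriptor,
# avoiding the race inherent in generating a name and opening it later.
fd, path = tempfile.mkstemp(suffix=".tmp")
try:
    with os.fdopen(fd, "w") as f:
        f.write("scratch data")
    print(os.path.exists(path))  # True
finally:
    os.remove(path)
```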

A. Unique TensorFlower

commit sha 52281ba252094fc201d2dbcb49c9c1fa9d17ad03

Introduce a memory leak detection utility. PiperOrigin-RevId: 294780979 Change-Id: I27b18224dbb49535beaa7ce81906f5686cebb7ef

view details

Jiho Choi

commit sha 76e77cf61c5e3fab34f1ff119f5fe4fa77be590d

Change DerivedXLineBuilder to accept the vector of event metadata and maintain dependency with other lines. PiperOrigin-RevId: 294781084 Change-Id: Ied1b11a4cdbb33a0b16282b867174e2048fd6904

view details

TensorFlower Gardener

commit sha d74394747a2e253f9e42ed9c5e0e44f628fd200d

Merge of b75a6222b82bb556f63f7a5a04cab45212ed30c6 PiperOrigin-RevId: 294781398 Change-Id: I3cc915dd058a9b1414a7885794e4b95522ea910c

view details

Yanhui Liang

commit sha 13655728cda68ce4d8eefb92124b3b2191991dce

Fix the timeout of nasnet test. PiperOrigin-RevId: 294782893 Change-Id: Iffc97ba7ecb2072fd6d42ba7d9923952b157452d

view details

George Karpenkov

commit sha 4f2afc07b26748f80a0de768fac81c7816410b44

[XLA/GPU] Adapt tree reduction pass to the new column reduction algorithm Use the fact that we now can reduce up to 4096 items deterministically without atomics in a single kernel launch. PiperOrigin-RevId: 294783585 Change-Id: Ie941c5adc990d130104f9cf924e97859695ce0eb

view details

Rick Chao

commit sha fc36231b872e10793163cb62eef907dfcf0e7cff

Fix multi_worker_callback_tf2_test test target by only running it with CPU. The test is not using GPU anyway. PiperOrigin-RevId: 294784694 Change-Id: I9e9d2f8db05160799ef64b43f0cc8b1a927637e0

view details

Andrew Audibert

commit sha 47940211fdf68f9422f93a0c0c08382d03bdd438

Enable op-level dataset determinism configuration for ParallelMap Users can control determinism at a per-op level by specifying `deterministic` when calling map(). The `deterministic` argument takes higher priority than the `experimental_deterministic` dataset option. PiperOrigin-RevId: 294786773 Change-Id: If89f87dbe2adb51aad79791aa3f18072132e74c6

view details
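The entry above gives the per-op `deterministic` argument priority over the dataset-level `experimental_deterministic` option. A minimal pure-Python sketch of that precedence rule (the function name is illustrative, not tf.data's internal API):

```python
def resolve_determinism(op_deterministic, global_deterministic=True):
    """Effective determinism setting for one dataset op.

    op_deterministic: True, False, or None, where None means "not
    specified", mirroring the default of deferring to the dataset option.
    global_deterministic: the dataset-level option, used as a fallback.
    """
    if op_deterministic is not None:
        return op_deterministic  # the per-op argument wins
    return global_deterministic

# The per-op argument overrides the global option when given:
print(resolve_determinism(False, global_deterministic=True))  # False
# When the op does not specify, the dataset option applies:
print(resolve_determinism(None, global_deterministic=True))   # True
```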

A. Unique TensorFlower

commit sha 2522a14a11f20a49cc473c4587fb3bef55403be5

Update ops-related pbtxt files. PiperOrigin-RevId: 294789924 Change-Id: I26834db79554b0d7de02db7f669b784bed5e711f

view details

A. Unique TensorFlower

commit sha 10a29d7a5029207c739ba51502b2a00e8067b01b

Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 294790443 Change-Id: Idf262e9b1e54f3e1b9cb7c8a55df86866f56f5d6

view details

A. Unique TensorFlower

commit sha ee6c34b5a9d3e743c0f9f00fa2a6b18555ee2981

Automated rollback of commit a90fa384cc031905420daa81a553bda6db4cc7bd PiperOrigin-RevId: 294792062 Change-Id: I56c1915922822e2ddf2fd445fafd1e6590acba04

view details

push time in a month

pull request comment tensorflow/tensorflow

[lite] fix ConvBuffer1x1 check

Can you review this as well? Thank you.

gaurav1086

comment created time in a month

pull request comment tensorflow/tensorflow

[core] Added null check for output buffer

Hi, is there anything needed from my side here? Please advise. Thanks.

gaurav1086

comment created time in a month

push event gaurav1086/pytorch

Mikhail Zolotukhin

commit sha dde2ff46084202f646632d32aece341c911ff269

[Fuser] Add a knob for disabling/enabling CUDA fuser. (#33395) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33395 By default the GPU fuser stays enabled, but this function allows to manually disable it. It will be useful for working on other implementations of fuser. Test Plan: Imported from OSS Differential Revision: D19926911 Pulled By: ZolotukhinM fbshipit-source-id: 7ea9d1dd7821453d640f81c487b63e1d585123c4

view details

Vasil Khalidov

commit sha cfb4862673303a2725d815c0338435f6b6c699fb

[pytorch] correct input size check for GroupNorm (#33008) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33008 Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed. **Details**: The current implementation disallows GroupNorm to be applied to tensors of shape e.g. `(1, C, 1, 1)` to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros. However, in GroupNorm the statistics are calculated across channels. So in case where one has an input tensor of shape `(1, 256, 1, 1)` for `GroupNorm(32, 256)`, the statistics will be computed over 8 elements and thus be meaningful. One use case is [Atrous Spatial Pyramid Pooling (ASPPPooling)](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L50), where GroupNorm could be used in place of BatchNorm [here](https://github.com/pytorch/vision/blob/791c172a337d98012018f98ffde93b1020ba3ed5/torchvision/models/segmentation/deeplabv3.py#L55). However, now this is prohibited and results in failures. Proposed solution consists in correcting the computation of the number of elements over which statistics are computed. The number of elements per group is taken into account in the batch size. Test Plan: check that existing tests pass Reviewed By: fmassa Differential Revision: D19723407 fbshipit-source-id: c85c244c832e6592e9aedb279d0acc867eef8f0c

view details
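The GroupNorm fix above hinges on a counting argument: statistics are computed per (sample, group), over `C // num_groups` channels times the spatial elements, so a `(1, 256, 1, 1)` input with 32 groups normalizes over 8 elements, not 1. A pure-Python sketch of that count (not PyTorch's actual implementation):

```python
def groupnorm_stat_elems(channels, spatial_elems, num_groups):
    """Number of elements each GroupNorm statistic is computed over,
    per sample: (channels // num_groups) channels times the spatial
    size. The corrected size check multiplies this into the batch
    size, so (1, 256, 1, 1) with 32 groups is a valid input."""
    assert channels % num_groups == 0, "channels must divide evenly into groups"
    return (channels // num_groups) * spatial_elems

# The ASPP-style use case from the commit message: 8 elements per statistic.
print(groupnorm_stat_elems(256, 1, 32))  # 8
```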

Richard Zou

commit sha 28c5213a97eddc9d742a0ba1087981a8d6010654

Add mechanism to pass a number of workers to cpp extensions (#33346) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33346 Fixes #33091 This PR lets users control the number of workers that cpp extensions uses through the environment variable `MAX_JOBS`. If the environment variable is a non-negative integer we use that many threads; otherwise, ninja falls back to the default. I chose to use the name `MAX_JOBS` because we use it in PyTorch already to control the number of workers PyTorch builds with. There is a risk that users of cpp extensions already have `MAX_JOBS` set but we are hoping that that risk is small and/or it means semantically the same thing. Test Plan: - tested locally Differential Revision: D19911645 Pulled By: zou3519 fbshipit-source-id: d20ed42de4f845499ed38f1a1c73e9ccb620f780

view details
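The parsing rule described above — a non-negative integer in `MAX_JOBS` sets the worker count, anything else falls back to ninja's default — can be sketched in a few lines (the helper name is hypothetical, not PyTorch's actual function):

```python
import os

def max_jobs_from_env(env=os.environ):
    """Return a worker count from MAX_JOBS, or None to let ninja decide."""
    raw = env.get("MAX_JOBS")
    if raw is not None and raw.isdigit():  # accepts non-negative integers only
        return int(raw)
    return None

print(max_jobs_from_env({"MAX_JOBS": "8"}))   # 8
print(max_jobs_from_env({"MAX_JOBS": "-1"}))  # None (not a non-negative int)
print(max_jobs_from_env({}))                  # None (unset)
```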

Assaf Shocher

commit sha 2c99ea86540da8dad0d2123e01d17148052f0fef

Dirac init compatibility with group convolutions (#32825) Summary: Initializing weights of group-conv with init.dirac_, and applying, previously resulted in an output that makes no sense: ``` x = torch.randn([1, 3, 3, 3]) print('input:\n', x) conv_layer = torch.nn.Conv2d(3, 3, 3, padding=1, groups=3, bias=False) torch.nn.init.dirac_(conv_layer.weight.data) print('\noutput (before this PR):\n',conv_layer(x)) input: tensor([[[[ 0.5369, -1.1428, 0.1031], [ 0.4638, -0.0854, -0.6553], [ 0.8321, -2.5926, -0.3214]], [[-0.2289, -0.0895, 0.4407], [ 1.2309, -1.2096, -1.5216], [-0.1798, 1.1694, 0.3469]], [[ 0.1905, 0.8095, 0.5490], [-0.4525, -0.4284, -0.1141], [ 1.1857, -0.9246, -0.5119]]]]) output (before this PR): tensor([[[[ 0.5369, -1.1428, 0.1031], [ 0.4638, -0.0854, -0.6553], [ 0.8321, -2.5926, -0.3214]], [[ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]], [[ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]]]], grad_fn=<MkldnnConvolutionBackward>) ```` This PR allows introducing groups to the initialization: ``` torch.nn.init.dirac_(conv_layer.weight.data, groups=3) print('output (after this PR):\n', conv_layer(x)) output (after this PR): tensor([[[[ 0.5369, -1.1428, 0.1031], [ 0.4638, -0.0854, -0.6553], [ 0.8321, -2.5926, -0.3214]], [[-0.2289, -0.0895, 0.4407], [ 1.2309, -1.2096, -1.5216], [-0.1798, 1.1694, 0.3469]], [[ 0.1905, 0.8095, 0.5490], [-0.4525, -0.4284, -0.1141], [ 1.1857, -0.9246, -0.5119]]]], grad_fn=<MkldnnConvolutionBackward>) ``` When out_channels is different than input_channels, it does the natural thing which is applying identity in each group separately: ``` x = torch.randn([1, 2, 3, 3]) print('input:\n', x) conv_layer = torch.nn.Conv2d(2, 4, 3, padding=1, groups=2, bias=False) torch.nn.init.dirac_(conv_layer.weight.data, groups=2) print('\noutput:\n', conv_layer(x)) input: tensor([[[[ 1.2205, -0.6608, 0.8640], [-0.5464, 1.1288, 1.4726], [-0.6693, 0.4000, -1.7613]], [[-0.8760, -0.8814, 
-0.4705], [ 0.6283, -0.5943, 0.6873], [-0.6852, 1.4723, 0.3325]]]]) output: tensor([[[[ 1.2205, -0.6608, 0.8640], [-0.5464, 1.1288, 1.4726], [-0.6693, 0.4000, -1.7613]], [[ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]], [[-0.8760, -0.8814, -0.4705], [ 0.6283, -0.5943, 0.6873], [-0.6852, 1.4723, 0.3325]], [[ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]]]], grad_fn=<MkldnnConvolutionBackward>) ``` Argument 'groups' defaults to 1 so it is backward compatible. Tests are modified to include cases of with groups>1 but also contain groups=1 cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32825 Differential Revision: D19859926 Pulled By: vincentqb fbshipit-source-id: 9dfdd24471ff14d79c442dfd28c1891aff812fdf

view details
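The grouped Dirac behavior above — each group gets its own identity mapping, with surplus output channels left zero — can be sketched with plain nested lists (a hedged illustration of the semantics, not `torch.nn.init.dirac_` itself):

```python
def dirac_groups(out_channels, in_channels, k, groups=1):
    """Build a (out_channels, in_channels // groups, k, k) weight where,
    within each group, output j copies input j through the kernel center
    and any extra output channels stay zero-filled."""
    assert out_channels % groups == 0 and in_channels % groups == 0
    in_pg, out_pg = in_channels // groups, out_channels // groups
    w = [[[[0.0] * k for _ in range(k)] for _ in range(in_pg)]
         for _ in range(out_channels)]
    for g in range(groups):
        for j in range(min(out_pg, in_pg)):
            w[g * out_pg + j][j][k // 2][k // 2] = 1.0
    return w

# Mirrors the Conv2d(2, 4, 3, groups=2) example above:
w = dirac_groups(4, 2, 3, groups=2)
print(w[0][0][1][1], w[2][0][1][1])  # 1.0 1.0 (identity outputs per group)
print(w[1][0][1][1], w[3][0][1][1])  # 0.0 0.0 (surplus outputs are zero)
```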

Jeong Ukjae

commit sha 879cf0b15a54c7848ae710e3d0ec62c4a9d7d3dd

fix typing bug of LambdaLR.__init__ (#33271) Summary: ## problem ```python class LambdaLR(_LRScheduler): """Sets the learning rate of each parameter group to the initial lr times a given function. When last_epoch=-1, sets initial lr as lr. Args: optimizer (Optimizer): Wrapped optimizer. lr_lambda (function or list): A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups. last_epoch (int): The index of last epoch. Default: -1. Example: >>> # Assuming optimizer has two groups. >>> lambda1 = lambda epoch: epoch // 30 >>> lambda2 = lambda epoch: 0.95 ** epoch >>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2]) >>> for epoch in range(100): >>> train(...) >>> validate(...) >>> scheduler.step() """ ``` `LambdaLR` takes a lambda that returns a float and takes a int, or a list of such lambdas. ## related issue Resolve https://github.com/pytorch/pytorch/issues/32645 Pull Request resolved: https://github.com/pytorch/pytorch/pull/33271 Differential Revision: D19878665 Pulled By: vincentqb fbshipit-source-id: 50b16caea13de5a3cbd187e688369f33500499d0

view details
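The typing fix above reflects that `lr_lambda` is either a single `Callable[[int], float]` or one such callable per param group. A standalone sketch of the normalization LambdaLR performs (hypothetical helper, not torch's code):

```python
from typing import Callable, List, Sequence, Union

LrLambda = Callable[[int], float]

def normalize_lr_lambdas(lr_lambda: Union[LrLambda, Sequence[LrLambda]],
                         num_groups: int) -> List[LrLambda]:
    """One lambda per param group: broadcast a single callable, or
    validate the length of a provided sequence."""
    if not isinstance(lr_lambda, (list, tuple)):
        return [lr_lambda] * num_groups
    if len(lr_lambda) != num_groups:
        raise ValueError(
            f"expected {num_groups} lr_lambdas, got {len(lr_lambda)}")
    return list(lr_lambda)

# A single decay schedule broadcast across two param groups:
lambdas = normalize_lr_lambdas(lambda epoch: 0.95 ** epoch, num_groups=2)
print(len(lambdas), lambdas[0](0))  # 2 1.0
```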

Gregory Chanan

commit sha f938b3b4e06ebad6b1c5c208be77885819024a7b

Remove TH binding of set_(Tensor). (#33358) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33358 We just translate this code to ATen. Test Plan: Imported from OSS Differential Revision: D19911114 Pulled By: gchanan fbshipit-source-id: 2279e63bb7006f7253620417937e3ce9301e0cdb

view details

svcscm

commit sha 4468a7b7b3170c4543d465bcfc3f3dd49bbb268e

Updating submodules Summary: GitHub commits: https://github.com/hhvm/hsl/commit/efc34423b6b2870412f51da847571f586581e881 https://github.com/facebook/fbthrift/commit/75bb4596547b2f9daf33b8d6649a434e50109f59 https://github.com/facebook/litho/commit/fc1945c2e032dbdbac6ec1db02d782aa168a9b7b https://github.com/facebookresearch/pytorch-biggraph/commit/332a31a1450b87e0aea5cc0906aea6f728d0d450 https://github.com/pytorch/fbgemm/commit/2b6eef4dc93c5f5cb9e41eb3779cbe21fcc89e1d Test Plan: n/a Reviewed By: 2d2d2d2d2d fbshipit-source-id: d105b9aa5001c53f884f007406684b73809a7680

view details

Michael Suo

commit sha abbf6e7f53f9c70b3d87e3cf5b1f54fcb2e32d25

fix clang-tidy lint (#33448) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33448 Test Plan: Imported from OSS Differential Revision: D19952962 Pulled By: suo fbshipit-source-id: db04bf74f6156edd1bd0716b12f6ca911c84a6bf

view details

peter

commit sha 4c8064c9e14d7118296b5c378d80fde63a62aeb7

Fix avx-512 detection logic for jit fuser with MSVC 2019 (#33403) Summary: Fixes https://github.com/pytorch/pytorch/issues/33401. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33403 Differential Revision: D19949812 Pulled By: soumith fbshipit-source-id: 00dc3c99b5ba1c13394d5d38bcb148720434b0a3

view details

Peter Bell

commit sha 44af8ee6cd3073ef2c6c6e8a61cfdde5dff9b345

Add pybind11 exception translator (#30588) Summary: Closes https://github.com/pytorch/pytorch/issues/30027 The idea here is that you can bind a function with `pybind11` in a single line and without modifying the function: ```cpp m.def("foo", foo, py::call_guard<torch::PyWarningHandler>()); ``` Where warnings are handled by the [`call_guard`](https://pybind11.readthedocs.io/en/stable/advanced/functions.html#call-guard) and exceptions are handled by the `pybind11` exception translator. To do this, I have added support for handling C++ exceptions in `torch::PyWarningHandler`'s destructor without setting the python error state before hand. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30588 Differential Revision: D19905626 Pulled By: albanD fbshipit-source-id: 90c0a5e298b123cc0c8ab9c52c91be4e96ea47c6

view details

Michael Ranieri

commit sha 1af30451e54aeac20ac6fd9f1e8b22f07a8e55d8

sync srcs between fbcode and ovrsource targets (#33368) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33368 reorganizing files that describe sources to ensure the same list is used for both fbcode and ovrsource targets. (BUCK vs TARGETS) Test Plan: CI green Reviewed By: malfet Differential Revision: D19803036 fbshipit-source-id: 69c1fa10877c3f0c0e9c1517784949c3c9939710

view details

Owen Anderson

commit sha 1d743e31543cca850665fb7926e6a748664168c7

Add guard elimination support for aten::unsqueeze. (#33371) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33371 Differential Revision: D19920041 Pulled By: resistor fbshipit-source-id: 906af47676dba014c31eef069a4753207f2efc60

view details

anjali411

commit sha 016d73bd74d93f418910c6a4368bca7552bfda7e

remove Complex CPU/CUDA backend enum keys (#33267) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33267 Test Plan: Imported from OSS Differential Revision: D19907696 Pulled By: anjali411 fbshipit-source-id: 78cc55344313387c4b05bb003688915cee64e3be

view details

anjali411

commit sha da015c77a1e24ff156ee626fb0d5f3fbe419003d

Cummax and Cummin doc update and performance benchmark (#32537) Summary: [CPU] Benchmark results for cummax, cummin: In [1]: import torch In [2]: x=torch.randn(5,6,7).cuda() In [3]: %timeit x.cummax(0) 134 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [4]: %timeit x.max(0) 114 µs ± 560 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [5]: %timeit x.cummax(1) 134 µs ± 760 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [6]: %timeit x.max(1) 118 µs ± 514 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [7]: %timeit x.cumsum(0) 97.1 µs ± 6.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [8]: %timeit x.cumprod(0) 83.6 µs ± 689 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [9]: %timeit x.cumprod(1) 86.3 µs ± 528 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [10]: y=torch.randn(5,6,7) In [11]: %timeit y.cummax(0) 148 µs ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [12]: %timeit y.max(0) 111 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [13]: %timeit y.cumsum(0) 54.8 µs ± 311 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [14]: %timeit y.cumprod(0) 56.2 µs ± 836 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Pull Request resolved: https://github.com/pytorch/pytorch/pull/32537 Differential Revision: D19951171 Pulled By: anjali411 fbshipit-source-id: cf972c550189473e9ce62e24ac7dd34b9373fef9

view details
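For reference alongside the benchmarks above, the 1-D semantics of a running maximum are simple; a pure-Python sketch (torch's `cummax` additionally returns argmax indices and works per-dimension, which this deliberately omits):

```python
def cummax(xs):
    """Running maximum: out[i] == max(xs[:i + 1])."""
    out, best = [], None
    for x in xs:
        best = x if best is None else max(best, x)
        out.append(best)
    return out

print(cummax([3, 1, 4, 1, 5, 9, 2]))  # [3, 3, 4, 4, 5, 9, 9]
```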

Zachary DeVito

commit sha c59e35b1477ce7a09780dc678bb7362a6aee69f7

interpreter handling for varargs to remove need for looking at Node (#32791) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32791 When a registered operator has varags (ends with ... in its schema), the interpreter now appends the number of arguments to the top of the stack before invoking the operator. This allows the removal of more uses of Node* in the interpreter. This PR also then cleans up the constructors for Operator to make it more likely someone chooses the correct one. After making these ops: ``` USES NODE: prim::TupleUnpack(...) -> (...) USES NODE: prim::TupleSlice(...) -> (...) USES NODE: prim::TupleConstruct(...) -> (...) USES NODE: prim::ListUnpack(...) -> (...) USES NODE: prim::ListConstruct(...) -> (...) USES NODE: prim::DictConstruct(...) -> (...) USES NODE: prim::Constant() -> (...) USES NODE: prim::isinstance(...) -> (...) USES NODE: prim::CreateObject(...) -> (...) USES NODE: prim::fork(...) -> (...) USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack ``` Into interpreter primitives, we can remove all but two constructors for operators: one that is (schema_string, operation), and one that is (symbol, op_creator) for the remaining weird primitives. Test Plan: Imported from OSS Differential Revision: D19673158 Pulled By: zdevito fbshipit-source-id: 95442a001538a6f53c1db4a210f8557ef118de66

view details

Zachary DeVito

commit sha 83c347ff4ae96f75fdd11c26df322cc12a434a8d

Remove prim::Constant op (#32804) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32804 Constants are interpreter primitives so the op was not actually used. This cleans up some of the logic around it. This also fixes constant prop such that failures to look up an op do not silently stop constant propagation. Instead, only errors inside the op implementation itself will do this. Test Plan: Imported from OSS Differential Revision: D19673156 Pulled By: zdevito fbshipit-source-id: 7beee59a6a67a6c2f8261d86bd505280fefa999e

view details

Zachary DeVito

commit sha 7f2c25b6fa56cda041f8e254cd57ca016df16bc9

Move special ops into interpreter (#32889) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32889 Common primitive ops that have special inputs make it very hard to serialize the bytecode for mobile because information about how the op behaves is hidden in the Node*. This changes how we handle the following ops so that they are encoded as their own interpreter bytecodes. ``` USES NODE: prim::TupleUnpack(...) -> (...) USES NODE: prim::TupleSlice(...) -> (...) USES NODE: prim::TupleConstruct(...) -> (...) USES NODE: prim::ListUnpack(...) -> (...) USES NODE: prim::ListConstruct(...) -> (...) USES NODE: prim::DictConstruct(...) -> (...) USES NODE: prim::Constant() -> (...) USES NODE: prim::isinstance(...) -> (...) USES NODE: prim::CreateObject(...) -> (...) USES NODE: prim::fork(...) -> (...) USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack ``` This leaves a state where the _only_ remaining Node*-consuming builtins are things that are only introduced during JIT optimization and will not appear in mobile code. Serialization of bytecode can now be made to directly write the CodeImpl object without modification. Test Plan: Imported from OSS Differential Revision: D19673157 Pulled By: zdevito fbshipit-source-id: 7b8c633d38a4c783b250fbdb222705e71a83ad26

view details

Zachary DeVito

commit sha f1b73799d58c6c3b3247a90bf9cd8f423dcb1015

Clean up isinstance flags (#33265) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33265 This removes the need for isinstance to keep trace of list and tuple separately by introducing AnyListType and AnyTupleType into the JIT type system to be the common supertype of any lists or tuples. This allows us to remove the weird flags from the interpreter for the isinstance operator. Test Plan: Imported from OSS Differential Revision: D19883933 Pulled By: zdevito fbshipit-source-id: f998041b42d8b4554c5b99f4d95d1d42553c4d81

view details

Ashkan Aliabadi

commit sha 43e015f4b1c6f688ebf0f1bc573f51e53ecef896

Bug fix in dynamic quantization kernels + better test coverage. (#33320) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33320 Reviewed By: supriyar Differential Revision: D19893911 Pulled By: AshkanAliabadi fbshipit-source-id: e79dd06af333c6629e3412315550814da28d9c24

view details

Xingdong Zuo

commit sha feaa622fc658dd76bbc26fecaf614d4056bbac23

[Update transforms.py]Add `TanhTransform` (#19785) Summary: Resolves https://github.com/pytorch/pytorch/issues/33195 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19785 Differential Revision: D19642395 Pulled By: ezyang fbshipit-source-id: 73c386fb89cd195201757b5fa47d6c01914a1f8f

view details
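A tanh transform of the kind added above needs three pieces: a forward map, its inverse, and a log-determinant of the Jacobian for density computation. A math-only sketch of those formulas (not the `torch.distributions.transforms` code, which uses numerically hardened variants):

```python
import math

def tanh_forward(x):
    return math.tanh(x)

def tanh_inverse(y):
    # atanh(y) = 0.5 * log((1 + y) / (1 - y)), valid for -1 < y < 1
    return 0.5 * math.log((1 + y) / (1 - y))

def tanh_log_abs_det_jacobian(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, so log|dy/dx| = log(1 - tanh(x)^2)
    return math.log(1 - math.tanh(x) ** 2)

x = 0.7
print(abs(tanh_inverse(tanh_forward(x)) - x) < 1e-12)  # True (round-trip)
```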

push time in a month

pull request comment facebook/rocksdb

[tools] simplify user_access_only expression

@riversand963, it's done. Thank you.

gaurav1086

comment created time in a month

push event gaurav1086/rocksdb

Adam Retter

commit sha 7242dae7fe5818eed1cc655c4444d54173c5ca8c

Improve RocksJava Comparator (#6252) Summary: This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy. **NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into. Previously when implementing a comparator in Java the developer had a choice of subclassing either `DirectComparator` or `Comparator` which would use direct and non-direct byte-buffers resepectively (via `DirectSlice` and `Slice`). In this redesign there we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`. In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`. --- [JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java. With these changes, when reusing buffers and guarding access to them via mutexes the slowdown is approximately the same. However, these changes offer a new facility to not reuse mutextes, which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425). These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical. --- These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x. 
``` ComparatorBenchmarks.put native_bytewise thrpt 25 124483.795 ± 2032.443 ops/s ComparatorBenchmarks.put native_reverse_bytewise thrpt 25 114414.536 ± 3486.156 ops/s ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 17228.250 ± 1288.546 ops/s ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 16035.865 ± 1248.099 ops/s ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_thread-local thrpt 25 21571.500 ± 871.521 ops/s ComparatorBenchmarks.put java_bytewise_direct_reused-64_adaptive-mutex thrpt 25 23613.773 ± 8465.660 ops/s ComparatorBenchmarks.put java_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 16768.172 ± 5618.489 ops/s ComparatorBenchmarks.put java_bytewise_direct_reused-64_thread-local thrpt 25 23921.164 ± 8734.742 ops/s ComparatorBenchmarks.put java_bytewise_non-direct_no-reuse thrpt 25 17899.684 ± 839.679 ops/s ComparatorBenchmarks.put java_bytewise_direct_no-reuse thrpt 25 22148.316 ± 1215.527 ops/s ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 11311.126 ± 820.602 ops/s ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 11421.311 ± 807.210 ops/s ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_thread-local thrpt 25 11554.005 ± 960.556 ops/s ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_adaptive-mutex thrpt 25 22960.523 ± 1673.421 ops/s ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 18293.317 ± 1434.601 ops/s ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_thread-local thrpt 25 24479.361 ± 2157.306 ops/s ComparatorBenchmarks.put java_reverse_bytewise_non-direct_no-reuse thrpt 25 7942.286 ± 626.170 ops/s ComparatorBenchmarks.put java_reverse_bytewise_direct_no-reuse thrpt 25 11781.955 ± 1019.843 ops/s ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252 Differential 
Revision: D19331064 Pulled By: pdillinger fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738

view details

Huisheng Liu

commit sha eb4d6af5ae6268033407cf2eb5e9a56b57bc4ceb

Error handler test fix (#6266) Summary: MultiDBCompactionError fails when it verifies the number of files on level 0 and level 1 without waiting for compaction to finish. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6266 Differential Revision: D19701639 Pulled By: riversand963 fbshipit-source-id: e96d511bcde705075f073e0b550cebcd2ecfccdc

view details
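The test fix above is a classic pattern: wait for background work (here, compaction) to finish before asserting on its results instead of checking immediately. A generic poll-until helper sketch in Python, analogous in spirit to `TEST_WaitForCompact()`:

```python
import time

def wait_for(predicate, timeout=5.0, interval=0.01):
    """Poll until predicate() is truthy or the timeout elapses.
    Returns True on success, False on timeout; useful for guarding
    assertions against a race with background work."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

calls = {"n": 0}
def ready():
    calls["n"] += 1
    return calls["n"] >= 3  # becomes true on the third poll

print(wait_for(ready))  # True
```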

sdong

commit sha 36c504be17d9e7c81567ba0732ef81632d3d2c74

Avoid create directory for every column families (#6358) Summary: A relatively recent regression causes for every CF, create and open directory is called for the DB directory, unless CF has a private directory. This doesn't scale well with large number of column families. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6358 Test Plan: Run all existing tests and see it pass. strace with db_bench --num_column_families and observe it doesn't open directory for number of column families. Differential Revision: D19675141 fbshipit-source-id: da01d9216f1dae3f03d4064fbd88ce71245bd9be

view details

sdong

commit sha f195d8d5231339242118937797d1b38e080cbcf7

Use ReadFileToString() to get content from IDENTITY file (#6365) Summary: Right now when reading the IDENTITY file, we use very similar logic to ReadFileToString(), except with an extra file size check that may be expensive in some file systems. There is no reason to duplicate the logic. Use ReadFileToString() instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6365 Test Plan: Run all existing tests. Differential Revision: D19709399 fbshipit-source-id: 3bac31f3b2471f98a0d2694278b41e9cd34040fe

view details

anand76

commit sha 7330ec0ff18451c731f46c8b8e414864f80732d5

Fix a test failure in error_handler_test (#6367) Summary: Fix an intermittent failure in DBErrorHandlingTest.CompactionManifestWriteError due to a race between background error recovery and the main test thread calling TEST_WaitForCompact(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6367 Test Plan: Run the test using gtest_parallel Differential Revision: D19713802 Pulled By: anand1976 fbshipit-source-id: 29e35dc26e0984fe8334c083e059f4fa1f335d68

view details

Mike Kolupaev

commit sha 637e64b9ac798d88097102d922b7284c6deaad54

Add an option to prevent DB::Open() from querying sizes of all sst files (#6353) Summary: When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may as well corrupt contents as well as sizes, and rocksdb doesn't check contents on open. If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of local filesystem, it can be *really* slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for parent directory to check that all the expected sst files are at least present, but doesn't check their sizes. We can't just disable paranoid_checks instead because paranoid_checks do a few other important things: make the DB read-only on write errors, print error messages on read errors, etc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353 Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble. Differential Revision: D19656425 Pulled By: al13n321 fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82

view details
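The option described above replaces a per-file `GetFileSize()` on open with a single directory listing that only checks presence. A hedged sketch of the two consistency-check modes (names and return values are hypothetical, not RocksDB's API):

```python
def check_consistency(expected_sizes, dir_listing, get_size=None):
    """expected_sizes: {filename: size} from the manifest.
    With get_size, verify each file's size (the old paranoid path,
    one filesystem call per file); without it, only verify that every
    expected file appears in one GetChildren()-style listing."""
    present = set(dir_listing)
    for name, size in expected_sizes.items():
        if name not in present:
            return f"missing file: {name}"
        if get_size is not None and get_size(name) != size:
            return f"size mismatch: {name}"
    return "ok"

manifest = {"000001.sst": 4096, "000002.sst": 8192}
print(check_consistency(manifest, ["000001.sst", "000002.sst", "CURRENT"]))  # ok
```

As the commit message argues, the presence-only mode trades a weaker check for one listing call instead of thousands of stat calls, which matters most when Env is backed by remote storage.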

sdong

commit sha 69c8614815b9d378d85447080b8ba4ee83152e6f

Avoid to get manifest file size when recovering from it. (#6369) Summary: Right now RocksDB gets manifest file size before recovering from it. The information is available in LogReader. Use it instead to prevent one file system call. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6369 Test Plan: Run all existing tests Differential Revision: D19714872 fbshipit-source-id: 0144be324d403c99e3da875ea2feccc8f64e883d

view details

sdong

commit sha 3a073234da663709fcb7a479ec88ce7476c48e3a

Consolidate ReadFileToString() (#6366) Summary: It's a minor refactoring. We have two ReadFileToString() but they are very similar. Make the one with Env argument calls the one with FS argument instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6366 Test Plan: Run all existing tests Differential Revision: D19712332 fbshipit-source-id: 5ae6fabf6355938690d95cda52afd1f39e0a7823

view details

Mike Kolupaev

commit sha 1ed7d9b1b5521c9774f59a68f0d4f5db0c469d6e

Avoid lots of calls to Env::GetFileSize() in SstFileManagerImpl when opening DB (#6363) Summary: Before this PR it calls GetFileSize() once for each sst file in the DB. This can take a long time if there are be tens of thousands of sst files (e.g. in thousands of column families), and even longer if Env is talking to some remote service rather than local filesystem. This PR makes DB::Open() use sst file sizes that are already known from manifest (typically almost all files in the DB) and only call GetFileSize() for non-sst or obsolete files. Note that GetFileSize() is also called and checked against manifest in CheckConsistency(), so the calls in SstFileManagerImpl were completely redundant. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6363 Test Plan: deployed to a test cluster, looked at a dump of Env calls (from a custom instrumented Env) - no more thousands of GetFileSize()s. Differential Revision: D19702509 Pulled By: al13n321 fbshipit-source-id: 99f8110620cb2e9d0c092dfcdbb11f3af4ff8b73

view details

sdong

commit sha 24c9dce8254a7f87d10f280e801f4a6e7ab7310d

Remove include math.h (#6373) Summary: We see some odd errors complaining about math.h. However, it doesn't seem to need to be included. Remove the include of math.h. Just removing it from db_bench doesn't seem to break anything, and replacing sqrt with std::sqrt seems to work for histogram.cc Pull Request resolved: https://github.com/facebook/rocksdb/pull/6373 Test Plan: Watch Travis and appveyor to run. Differential Revision: D19730068 fbshipit-source-id: d3ad41defcdd9f51c2da1a3673fb258f5dfacf47

view details

Cheng Chang

commit sha f5f79f01a2bd94b35ece9f0cf9ecd209888b457c

Be able to read compatible leveldb sst files (#6370) Summary: In `DBSSTTest.SSTsWithLdbSuffixHandling`, some sst files are renamed to ldb files; the original intention of the test is to verify that the ldb files can be loaded along with the sst files. The original test checks this by `ASSERT_NE("NOT_FOUND", Get(Key(k)))`, but the problem is that `Get(Key(k))` returns an IO error (path not found) instead of NOT_FOUND, so the success of ASSERT_NE does not mean the key can be retrieved. This PR updates the test to make sure Get(Key(k)) returns the original value. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6370 Test Plan: make db_sst_test && ./db_sst_test Differential Revision: D19726278 Pulled By: cheng-chang fbshipit-source-id: 993127f56457b315e669af4eeb92d6f956b7a4b7

view details

atul

commit sha c6f75516b7a13a0de4d197e0157b3191d28a4c18

Fixing the documentation of the function (#4803) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6354 Differential Revision: D19725459 Pulled By: riversand963 fbshipit-source-id: fded24576251bfa4b289399f0909f1fe43426e28

view details

Yanqin Jin

commit sha f361cedf0656d55381a0cae2722e934198569f4e

Atomic flush rollback once on failure (#6385) Summary: Before this fix, atomic flush codepath may hit an assertion failure on a specific failure case. If all flush jobs within an atomic flush succeed (they do not write to MANIFEST), but batch writing version edits to MANIFEST fails, then `cfd->imm()->RollbackMemTableFlush()` will be called twice, and the second invocation hits assertion failure `assert(m->flush_in_progress_)` since the first invocation resets the variable `flush_in_progress_` to false already. Test plan (dev server): ``` ./db_flush_test --gtest_filter=DBAtomicFlushTest/DBAtomicFlushTest.RollbackAfterFailToInstallResults make check ``` Both must succeed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6385 Differential Revision: D19782943 Pulled By: riversand963 fbshipit-source-id: 84e1592625e729d1b70fdc8479959387a74cb121
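The double-rollback bug above comes down to a non-idempotent cleanup path: `RollbackMemTableFlush()` asserts `flush_in_progress_` and so must run at most once per memtable. A minimal Python sketch of the guarded, run-at-most-once rollback (all names here are illustrative, not the actual RocksDB API):

```python
class MemTable:
    """Toy stand-in for a memtable's flush state (illustrative only)."""
    def __init__(self):
        self.flush_in_progress = True  # a flush has been scheduled

    def rollback_flush(self):
        # Mirrors the assertion in the real code: calling this twice
        # would trip the assert, so callers must guard against it.
        assert self.flush_in_progress, "rollback called twice"
        self.flush_in_progress = False


def install_flush_results(memtables, write_manifest):
    """Batch-write version edits; on failure, roll back each memtable
    at most once (the guard makes the rollback idempotent)."""
    try:
        write_manifest()
        return True
    except IOError:
        for m in memtables:
            if m.flush_in_progress:  # skip memtables already rolled back
                m.rollback_flush()
        return False
```

Calling `install_flush_results` a second time after a failed MANIFEST write is now a no-op for already-rolled-back memtables instead of an assertion failure.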

view details

Cheng Chang

commit sha 0a74e1b958f735fd2d79e2efa6e81f3c62c029c2

Add status checks during DB::Open (#6380) Summary: Several statuses were not checked during DB::Open. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6380 Test Plan: make check Differential Revision: D19780237 Pulled By: cheng-chang fbshipit-source-id: c8d189d20344bd1607890dd1449345bda2ef96b9

view details

Cheng Chang

commit sha 107a7ca9301f5b0958d17f2247f2511960acc970

Remove inappropriate comments (#6371) Summary: The comments are for iterators, not Cleanable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6371 Test Plan: no need Differential Revision: D19727527 Pulled By: cheng-chang fbshipit-source-id: c74aeffa27ea0ce15a36ff6f9694826712cd1c70

view details

Levi Tamasi

commit sha 1b4be4cac9e803a66e88130ae2ff2141b8e2d633

BlobDB: ignore trivially moved files when updating the SST<->blob file mapping (#6381) Summary: BlobDB keeps track of the mapping between SSTs and blob files using the `OnFlushCompleted` and `OnCompactionCompleted` callbacks of the `EventListener` interface: upon receiving a flush notification, a link is added between the newly flushed SST and the corresponding blob file; for compactions, links are removed for the inputs and added for the outputs. The earlier code performed this link deletion and addition even for trivially moved files; the new code walks through the two lists together (in a fashion that's similar to merge sort) and skips such files. This should mitigate https://github.com/facebook/rocksdb/issues/6338, wherein an assertion is triggered with the earlier code when a compaction notification for a trivial move precedes the flush notification for the moved SST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6381 Test Plan: make check Differential Revision: D19773729 Pulled By: ltamasi fbshipit-source-id: ae0f273ded061110dd9334e8fb99b0d7786650b0
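The "merge sort"-style walk described above can be sketched as follows: given the sorted lists of compaction input and output file numbers, step through both together and skip any file present in both lists, i.e. a trivially moved SST whose SST-to-blob link should be left alone. A simplified sketch under the assumption that both lists are sorted; the function name and list representation are illustrative:

```python
def changed_links(inputs, outputs):
    """Return (to_unlink, to_link) for a compaction, skipping files
    that appear in both sorted lists (trivially moved SSTs)."""
    to_unlink, to_link = [], []
    i = j = 0
    while i < len(inputs) and j < len(outputs):
        if inputs[i] == outputs[j]:        # trivially moved: keep its link
            i += 1
            j += 1
        elif inputs[i] < outputs[j]:       # genuine input: remove link
            to_unlink.append(inputs[i])
            i += 1
        else:                              # genuine output: add link
            to_link.append(outputs[j])
            j += 1
    to_unlink.extend(inputs[i:])           # remaining inputs
    to_link.extend(outputs[j:])            # remaining outputs
    return to_unlink, to_link
```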

view details

Cheng Chang

commit sha 5f478b9f752414b5ec8155dc91f50c3331d296c3

Remove outdated comment (#6379) Summary: Since the logic for handling IDENTITY file is now inside `NewDB`, the comment above `NewDB` is no longer relevant. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6379 Test Plan: not needed Differential Revision: D19795440 Pulled By: cheng-chang fbshipit-source-id: 0b1cca87ac6d92474701c46aa4c8d4d708bfa19b

view details

Levi Tamasi

commit sha 752c87af78381b775c93f3d878f2d1f6b4098e14

Clean up VersionEdit a bit (#6383) Summary: This is a bunch of small improvements to `VersionEdit`. Namely, the patch * Makes the names and order of variables, methods, and code chunks related to the various information elements more consistent, and adds missing getters for the sake of completeness. * Initializes previously uninitialized stack variables. * Marks all getters const to improve const correctness. * Adds in-class initializers and removes the default ctor that would create an object with uninitialized built-in fields and call `Clear` afterwards. * Adds a new type alias for new files and changes the existing `typedef` for deleted files into a type alias as well. * Makes the helper method `DecodeNewFile4From` private. * Switches from long-winded iterator syntax to range based loops in a couple of places. * Fixes a couple of assignments where an integer 0 was assigned to boolean members. * Fixes a getter which used to return a `const std::string` instead of the intended `const std::string&`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6383 Test Plan: make check Differential Revision: D19780537 Pulled By: ltamasi fbshipit-source-id: b0b4f09fee0ec0e7c7b7a6d76bfe5346e91824d0

view details

Chad Austin

commit sha 25fbdc5a319945e7fdfcea1c0ad3b4a9a7324cd5

Fix Buck build on macOS (#6378) Summary: liburing is a Linux-specific dependency, so make sure it's configured in the Linux-only Buck rules. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6378 Test Plan: ``` ~/fbcode $ cp internal_repo_rocksdb/repo/TARGETS rocksdb/src ~/fbcode $ buck build mode/mac eden ``` Reviewed By: chadaustin Differential Revision: D19760039 Pulled By: riversand963 fbshipit-source-id: 2abfce81c8b17965ef76012262cd117708e0294f

view details

Cheng Chang

commit sha b42fa1497fc321b712c6b3147cfdb36bebffe4f8

Support move semantics for PinnableSlice (#6374) Summary: It's logically correct for PinnableSlice to support move semantics to transfer ownership of the pinned memory region. This PR adds both move constructor and move assignment to PinnableSlice. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6374 Test Plan: A set of unit tests for the move semantics are added in slice_test. So `make slice_test && ./slice_test`. Differential Revision: D19739254 Pulled By: cheng-chang fbshipit-source-id: f898bd811bb05b2d87384ec58b645e9915e8e0b1

view details

push time in a month

pull request commentfacebook/rocksdb

[tools] simplify user_access_only expression

@riversand963 np. sure, will do. thanks.

gaurav1086

comment created time in a month

push eventgaurav1086/tensorflow

frreiss

commit sha b6c5a9fb8e12e42681f4317f72ce52872653497c

Add soft-link to pylintrc to project root

view details

ANSHUMAN TRIPATHY

commit sha 7375586f215385d1c29a6a9538ae658f5ae7b936

Contradicting comments removed

view details

ANSHUMAN TRIPATHY

commit sha 447c1420b8c2a7d9baf19248c97880b8ff832e0f

[1] Review comments handled

view details

Koan-Sin Tan

commit sha e305ac4b75a9523bf047fdaef75159f13bd04b86

[tflite] add int8 input/output to label_image More and more models, such as MobilenetV3's EdgeTPU ones, are using post-training full integer quantization. With this patch, I can get reasonable results. ./label_image_int8 -m mobilenet_edgetpu_224_1.0_int8.tflite Loaded model mobilenet_edgetpu_224_1.0_int8.tflite resolved reporter INFO: Initialized TensorFlow Lite runtime. invoked average time: 15.363 ms 0.867188: 653 military uniform 0.0390625: 835 suit 0.015625: 458 bow tie 0.0078125: 907 Windsor tie 0.00390625: 716 pickelhaube

view details

Koan-Sin Tan

commit sha b698e34e97ba49aa2d562a42804476ab5a024ab0

clean up

view details

Koan-Sin Tan

commit sha ec72fed7066f44d09172b3cfa358a299b7e5ec12

address review comments 1. add explicit cast back 2. change int to TfLiteType

view details

Koan-Sin Tan

commit sha ae2e9865a1ddfc782e9a41d89b59d4a7783c3f30

[tflite] bump SPLIT op ver from 1 to 3 in NNAPI delegate I need SPLIT op version 3. Since it's supported by TFLite and NNAPI 1.2, it should be safe to bump the op version so that I can delegate SPLIT ops to accelerators.

view details

Duncan Riach

commit sha ed955df9438a4e13f33a439338c92cbc029a713d

Change bias_op tests to work in eager mode (as well as graph mode)

view details

Duncan Riach

commit sha c227f00a33de7ed10ade9bfb0ddce2833110a0e6

Fix Ubuntu Sanity CI fail due to pylint error

view details

Anuj Rawat

commit sha 5670f0f29c50dc2427f8cb4386aaeab6094083f1

Fixing test that fails on AVX512 The operation categorical_crossentropy requires taking log as an intermediate step. Due to the rank (2) and shape (3, 3) of the tensors used in this example, on AVX2 and older builds, the log operation uses plog, Eigen's packet log method, whereas on the AVX512 build, the log operation is not vectorized and ends up using std::log. Due to the precision mismatch between std::log and Eigen's plog, the results do not match exactly. The loss values come out to be equal to [0.10536055 0.8046685 0.06187541], instead of [0.10536055 0.8046684 0.06187541]. This is an expected mismatch and should not fail the test. The absolutely correct way to test would be to compare hex values and make sure that the results are within the expected range of the ULP error. An easier fix would be to reduce the precision of the test to account for such mismatches between the implementation of operators in the underlying math libraries. We are taking the second approach and will compare results after rounding to 5 decimal places.
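The relaxed comparison described above, which treats last-digit differences between `plog` and `std::log` as equal, can also be expressed as a relative-tolerance check. A small self-contained sketch using the loss values quoted in the commit message (`losses_match` is an illustrative helper, not the actual test code):

```python
import math

def losses_match(expected, actual, rel_tol=1e-5):
    """Compare per-element losses to about five significant digits,
    absorbing last-ULP differences between vectorized and scalar log."""
    return all(math.isclose(e, a, rel_tol=rel_tol)
               for e, a in zip(expected, actual))

avx2   = [0.10536055, 0.8046685, 0.06187541]  # Eigen plog path
avx512 = [0.10536055, 0.8046684, 0.06187541]  # std::log path
```

Comparing with `rel_tol=1e-5` is equivalent in spirit to rounding to roughly five significant digits before comparing.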

view details

Arvind Sundararajan

commit sha d28af41cf90fb85c91e09cddbb08b7ad43bf30d9

Handle indexed slice empty shapes in IndexedSlices gradients correctly.

view details

Lamar

commit sha 71dd20a99530f22c86a987088484db8f4f227e52

fixed statically sized arrays declared with variable length: using a const int or int for the size of an array implies a variable-length array (ill-formed, https://en.cppreference.com/w/cpp/language/ub), so static array lengths should be constexpr or a macro constant

view details

Koan-Sin Tan

commit sha cdf28e8bd05bdf317ebee1970230b27dcb269b49

Merge branch 'master' into label_image_int8

view details

Koan-Sin Tan

commit sha 3d09de22db5ab3088c3997349d3658e8ae732b43

avoid problem reported by Google's internal test

view details

Duncan Riach

commit sha 4ea10c4bcc1ca3d98e34c6742220c2c8fe9df946

Fix Ubuntu Sanity CI fail due to pylint error

view details

Koan-Sin Tan

commit sha ce899395b4ee2c21c6c20fa4d98cbb8228c67de3

include "tensorflow/lite/c/common.h" in get_top_n_impl.h: move the include of "tensorflow/lite/c/common.h" from label_image.cc to get_top_n_impl.h

view details

Eugene Kuznetsov

commit sha af54994072bda083229fd11cb2b1d58e2cd38ab0

Implementing GpuManagedAllocator for ROCm Enabling several common runtime unit tests for ROCm

view details

Koan-Sin Tan

commit sha e20497cf3b0a97ee1e243537544e8e024b6d813d

add `//tensorflow/lite/c:common` to label_image_test

view details

Eugene Kuznetsov

commit sha ae0e325a9fd53f2981bc569a2e3f8699c72a2ddc

Fixing ROCm LSTM and GRU v2 test

view details

Koan-Sin Tan

commit sha d768d147870b202559878c610c366a0ac536a748

[tflite] enable INT8 for Java binding some models created by full-integer post training quantization, e.g., the mobilenet v3 edgetpu one [1], have INT8 input and output tensors. [1] https://storage.cloud.google.com/mobilenet_edgetpu/checkpoints/mobilenet_edgetpu_224_1.0.tgz

view details

push time in 2 months

push eventgaurav1086/pytorch

Wanchao Liang

commit sha 9ae4d38a2115e600781263750a5133a3bbbd1010

[rpc] Switch RRef to be managed by intrusive_ptr (#33189) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33189 Add RRefInterface to Aten/Core, which will later be used by IValue. Switch the whole rpc code base to use intrusive_ptr instead of shared_ptr, so that we can add it to IValue. Actually adding it to IValue and JIT will be in the next PR. Test Plan: Imported from OSS Differential Revision: D19871241 Pulled By: wanchaol fbshipit-source-id: d7e1fd04b46320e0f26c18591b49c92ad30a4032

view details

Wanchao Liang

commit sha b2c58964327fb98abf941b78845392ad32338a90

[jit] Add RRef to IValue and JIT type system (#32992) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32992 This PR adds RRef to IValue and the JIT type system. - The RRefInterface abstract class inherits from intrusive_ptr_target, which lets the RRef class be held in an IValue as an intrusive_ptr - Add RRefType as a JIT type; it's a container type similar to the future type. Test Plan: Imported from OSS Differential Revision: D19871242 Pulled By: wanchaol fbshipit-source-id: cb80ca32605096f9a42ef147109fb368a7c1d4d3

view details

Wanchao Liang

commit sha 93179b1c1ca48d1749fbbff17e7345c565e93909

[jit] Initial use RRef in TorchScript (#33190) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33190 This enable the initial RRef type to be used inside TorchScript, user could pass a python RRef into a torchscript function and call to_here inside. Specifically, this PR: - Add RRef schema type parsing - Add python interop for RRef in Python and into JIT - register to_here op in register_distributed_ops More support for RRef in TorchScript will be added in future PRs Test Plan: Imported from OSS Differential Revision: D19871244 Pulled By: wanchaol fbshipit-source-id: 7eca6c491a84666b261c70806254b705603bd663

view details

Lu Fang

commit sha e5c7b7b8b523cab3ef6697aab0595bbeb7ece76c

Automatic update of fbcode/onnx to 04a29addfd5b912812addb8dea5f8763fbfaad01 (#33328) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33328 Previous import was 8b3f7e2e7a0f2aba0e629e23d89f07c7fc0e6a5e Included changes: - **[04a29add](https://github.com/onnx/onnx/commit/04a29add)**: Use // instead of # (#2598) <Lu Fang> - **[f8e140a9](https://github.com/onnx/onnx/commit/f8e140a9)**: Kezhan/function update (#2596) <Ke Zhang> - **[6185faae](https://github.com/onnx/onnx/commit/6185faae)**: fix the attribute types section in IR.md (#2590) <Ke Zhang> - **[f254647a](https://github.com/onnx/onnx/commit/f254647a)**: Allow Constant operator to promote scalar and list to tensors. (#2592) <Jeremy Cochoy> - **[f12ec799](https://github.com/onnx/onnx/commit/f12ec799)**: Add NegativeLogLikelihood(NllLoss) op (#2551) <liqunfu> Test Plan: ci Reviewed By: hl475 Differential Revision: D19897554 fbshipit-source-id: d8efb5c5ac8f9d71727de33c67af681ed8ec8123

view details

Lu Fang

commit sha 642bd51043dd92086fa6a16e1c901fae41a861c3

[ONNX] Skip problematic ONNX test to unblock CI (#33323) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33323 skip the tests until it is fixed Test Plan: ci Reviewed By: hl475 Differential Revision: D19894675 fbshipit-source-id: 1cfc153577bf021171f4412115d84719beae7a91

view details

Jongsoo Park

commit sha 92fbf7cf97fb09af2b2a627bb35cd928eaf734f6

[caffe2] use JIT'ed fp16 SLS (#32432) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32432 Use JIT'ed fp16 SLS in D19477209 from Caffe2 operators Test Plan: CI Reviewed By: jianyuh Differential Revision: D19477208 fbshipit-source-id: ef2ccba10f5f4c475166141bf09c266dedb92d38

view details

songyouwei

commit sha e5218e3e12096780f809efb6aced72509a1b54d2

Add missing error messages for container modules (#29991) Summary: Container `Module`s, including `ModuleList`, `ParameterList` and `ParameterDict`, should not be called like a regular `Module`. This PR adds error messages for these special modules. Pull Request resolved: https://github.com/pytorch/pytorch/pull/29991 Differential Revision: D19698535 Pulled By: ezyang fbshipit-source-id: fe156a0bbb033041086734b38f8c6fde034829bf

view details

Xiang Gao

commit sha 602aec325d2e4baa9dbf5c74f4dc205dd7c10031

Kill old cuda support (#33302) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33302 Differential Revision: D19899586 Pulled By: ezyang fbshipit-source-id: 11293475795b4bfee9a65133bb6718649e220787

view details

Francis Charette Migneault

commit sha 0150f40ddea0aa65336e55e4ac549d400c8e180f

don't force msvc /Ox flag which can conflict with /RTC1 in debug config (#33164) Summary: Relates to https://github.com/pytorch/pytorch/issues/33132 This fix doesn't add the full multi-configuration support described in https://github.com/pytorch/pytorch/issues/33132 but at least avoids the error presented in the issue when `CMAKE_BUILD_TYPE=Debug` is used with MSVC. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33164 Differential Revision: D19899727 Pulled By: ezyang fbshipit-source-id: 28a364d920c4a3fb577c6b484ccd69a133fbcf5d

view details

Yinghai Lu

commit sha ecd3c252b4da3056797f8a505c9ebe8d68db55c4

Support all length one SLS op lowering: C2 part (#33332) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33332 We check the input shape of lengths and indices of SLS and add an attribute if they are the same. Test Plan: ``` buck test glow/fb/test/numerics:test_operator_onnxifinnpi -- test_slws_fused_8bit_rowwise_length1_graph ``` Reviewed By: ipiszy Differential Revision: D19874903 fbshipit-source-id: 06b643b5351d0ba19ba209b5a5b599fbb38b1dfc

view details

Ahmad Salim Al-Sibahi

commit sha b1583ceb1e1798ce2221e1182ed24b869c0a3e92

Second try on Von Mises: Make it JIT compatible (#33177) Summary: Follow up from https://github.com/pytorch/pytorch/issues/17168 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/33177 Differential Revision: D19899550 Pulled By: ezyang fbshipit-source-id: fbcdd9bc91438164bcb2b1cbc314c765520754e1

view details

George Guanheng Zhang

commit sha ff5f38f53b13714a267d4f535e76271609c9c662

Revert D19858239: [pytorch][PR] Refactor and add VS 14.16 and 2019 CI for Windows Test Plan: revert-hammer Differential Revision: D19858239 Original commit changeset: f068d8505886 fbshipit-source-id: b117e44d5552e157747920d8098ce3b86a29c6bf

view details

George Guanheng Zhang

commit sha 0c98939b7bd2d8065a1cbdeb7567069929b71b4c

Revert D19899550: [pytorch][PR] Second try on Von Mises: Make it JIT compatible Test Plan: revert-hammer Differential Revision: D19899550 Original commit changeset: fbcdd9bc9143 fbshipit-source-id: c8a675a8b53f884acd0e6c57bc7aa15faf83d5d6

view details

Jeremy Lilley

commit sha 1b2d2ba50474cccc3bbef3dba7950c3e53d80a54

[PyTorch] Fix write-after-free (TSAN) in GraphTask::set_error() (#33156) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33156 When dist_autograd_spawn_thrift's 'test_backward_node_failure_python_udf' test is run, it was encountering a TSAN error related to holding the mutex while the underlying data structure was being deallocated. In this change, we simply take a shared_ptr<> reference to the future and call set_exception() without the lock held, to avoid deallocating underneath the lock. ghstack-source-id: 98303434 Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/rpc:dist_autograd_spawn_thrift -- 'test_backward_node_failure_python_udf \(test_dist_autograd_spawn\.DistAutogradTestWithSpawn\)' Differential Revision: D19821362 fbshipit-source-id: 82f735e33f8e608552418ae71592400fa3621e40
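The pattern in this fix, copy a strong reference to the shared object while the lock is held, then complete it after releasing the lock, can be sketched in a few lines of Python. The class and method names mirror the description above but are illustrative only:

```python
import threading

class GraphTask:
    """Sketch of the fixed locking discipline: the future is completed
    with no lock held, so the task can be freed without racing the
    unlock (names are illustrative, not the PyTorch C++ API)."""
    def __init__(self, future):
        self._mutex = threading.Lock()
        self._future = future

    def set_error(self, exc):
        with self._mutex:
            fut = self._future   # take a strong reference under the lock
        fut.set_exception(exc)   # complete it after the lock is released
```

The key point is that `set_exception` may trigger callbacks that drop the last reference to the task; doing that outside the critical section avoids destroying the object while its own mutex is still held.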

view details

Hong Xu

commit sha 7dde91b0aef2130da5aa5f8df98bfb8a38a91baf

Vectorize elu and its backward function on CPU (#32986) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32986 Benchmark: (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz) ```python import timeit for op in ('ELU',): print('Forward') for dtype in ('torch.double', 'torch.float'): for n, t in [(10_000, 100000), (100_000, 10000)]: print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}') print(timeit.timeit('m(a)', setup=f'import torch; m = torch.nn.{op}(); a = torch.linspace(-1, 1, {n}, dtype={dtype})', number=t)) print('Backward') for dtype in ('torch.double', 'torch.float'): for n, t in [(20_000, 100000), (200_000, 10000)]: print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}') print(timeit.timeit('y.backward(retain_graph=True)', setup=f'import torch; m = torch.nn.{op}(); a = torch.linspace(-1, 1, {n}, requires_grad=True, dtype={dtype}); x = m(a); y = x.sum()', number=t)) ``` Before: ``` Forward torch.nn.ELU()(a), numel() == 10000 for 100000 times, dtype=torch.double 5.292799739996553 torch.nn.ELU()(a), numel() == 100000 for 10000 times, dtype=torch.double 4.828570917001343 torch.nn.ELU()(a), numel() == 10000 for 100000 times, dtype=torch.float 3.1359513780043926 torch.nn.ELU()(a), numel() == 100000 for 10000 times, dtype=torch.float 2.7030876770004397 Backward torch.nn.ELU()(a), numel() == 20000 for 100000 times, dtype=torch.double 4.568238995998399 torch.nn.ELU()(a), numel() == 200000 for 10000 times, dtype=torch.double 1.8908141480060294 torch.nn.ELU()(a), numel() == 20000 for 100000 times, dtype=torch.float 3.8652471189998323 torch.nn.ELU()(a), numel() == 200000 for 10000 times, dtype=torch.float 1.13068484600808 ``` After: ``` Forward torch.nn.ELU()(a), numel() == 10000 for 100000 times, dtype=torch.double 2.1265591429983033 torch.nn.ELU()(a), numel() == 100000 for 10000 times, dtype=torch.double 1.6708065870043356 torch.nn.ELU()(a), numel() == 10000 for 100000 times, dtype=torch.float 1.1806934149935842 torch.nn.ELU()(a), numel() == 100000 for 10000 times, dtype=torch.float 0.77735430400935 Backward torch.nn.ELU()(a), numel() == 20000 for 100000 times, dtype=torch.double 4.494567882007686 torch.nn.ELU()(a), numel() == 200000 for 10000 times, dtype=torch.double 2.007220732004498 torch.nn.ELU()(a), numel() == 20000 for 100000 times, dtype=torch.float 3.615133151994087 torch.nn.ELU()(a), numel() == 200000 for 10000 times, dtype=torch.float 1.105554559995653 ``` Test Plan: Imported from OSS Differential Revision: D19794595 Pulled By: VitalyFedyunin fbshipit-source-id: c319ec04676ced22179b8b34789ac8bf6428deab

view details

Nicki Skafte

commit sha 4bef344210b6edbcb2214862fb90c1a83b573a06

Implementation of mixture distributions (#22742) Summary: Addressing issue https://github.com/pytorch/pytorch/issues/18125 This implements a mixture distribution where all components are from the same distribution family. Right now the implementation supports the ```mean, variance, sample, log_prob``` methods. cc: fritzo and neerajprad - [x] add import and `__all__` string in `torch/distributions/__init__.py` - [x] register docs in docs/source/distributions.rst ### Tests (all tests live in tests/distributions.py) - [x] add an `Example(MixtureSameFamily, [...])` to the `EXAMPLES` list, populating `[...]` with three examples: one with `Normal`, one with `Categorical`, and one with `MultivariateNormal` (to exercise `FloatTensor`, `LongTensor`, and nontrivial `event_dim`) - [x] add a `test_mixture_same_family_shape()` to `TestDistributions`. It would be good to test this with both `Normal` and `MultivariateNormal` - [x] add a `test_mixture_same_family_log_prob()` to `TestDistributions`. - [x] add a `test_mixture_same_family_sample()` to `TestDistributions`. - [x] add a `test_mixture_same_family_shape()` to `TestDistributionShapes` ### Triaged for follow-up PR? - support batch shape - implement `.expand()` - implement `kl_divergence()` in torch/distributions/kl.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/22742 Differential Revision: D19899726 Pulled By: ezyang fbshipit-source-id: 9c816e83a2ef104fe3ea3117c95680b51c7a2fa4
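For scalar Normal components, the quantity `MixtureSameFamily.log_prob` returns is the log-sum-exp of component log-densities weighted by the mixing distribution. A dependency-free sketch of that computation (the helper below is illustrative, not the PyTorch implementation):

```python
import math

def mixture_log_prob(x, weights, means, stds):
    """log p(x) for a mixture of Normals:
    logsumexp_k( log w_k + log N(x; mu_k, sigma_k) )."""
    terms = []
    for w, mu, sd in zip(weights, means, stds):
        # log-density of a univariate Normal at x
        log_norm = (-0.5 * ((x - mu) / sd) ** 2
                    - math.log(sd * math.sqrt(2 * math.pi)))
        terms.append(math.log(w) + log_norm)
    m = max(terms)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))
```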

view details

xiaobing.zhang

commit sha b276ddda380eab969a78cbefe5db8011ba922bb2

remove THC dist code which is never used (#33283) Summary: Remove THC dist code which is never used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33283 Differential Revision: D19905361 Pulled By: gchanan fbshipit-source-id: 367fd31e2209d36b30af31511554fdbdd67c98e4

view details

Edward Yang

commit sha ae53f8dd252385480c45305ff15ef5e7aa5ca9e2

Revert D19859905: [pytorch][PR] Gradient scaling API Test Plan: revert-hammer Differential Revision: D19859905 Original commit changeset: bb8ae6966214 fbshipit-source-id: 28f1c93e8a00e3a4bbe8cc981499b15468f0b970

view details

Omkar Salpekar

commit sha 92b67c03e45048b37961db1a00975f85c2da5e9b

[RPC Reliability] Implemented retries for RPCs with exponential backoff (#32602) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32602 This adds functionality for retrying RPCs that are sent with the function `sendWithRetries()`. It adds RPCs that will potentially need to be retried to a sorted map that contains the timeout at which to retry the RPC and associated metadata. A separate thread iteratively removes the earliest retry-able RPC from the map, sleeps until the corresponding time point, retries the RPC, and adds it to the map again with a future timeout. GitHub Issue: https://github.com/pytorch/pytorch/issues/32124 Per the first 3 milestones, the following will be addressed in future PRs: * enabling RPC Retries for RRef internal messages Differential Revision: D19560159 fbshipit-source-id: 40cd86f9a25dc24367624d279a3b9720b20824cf
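The sleep-then-retry loop with an exponentially growing delay described above can be sketched as follows (the name `send_with_retries` and the exception type are illustrative, not the actual C++ API):

```python
import time

def send_with_retries(send, max_retries=5, base_delay=0.01, backoff=2.0):
    """Retry a failing send with exponential backoff:
    wait base_delay * backoff**attempt between attempts,
    then re-raise once max_retries is exhausted."""
    attempt = 0
    while True:
        try:
            return send()
        except ConnectionError:
            if attempt >= max_retries:
                raise
            time.sleep(base_delay * (backoff ** attempt))
            attempt += 1
```

The real implementation keeps pending retries in a map sorted by retry time and services them from a dedicated thread rather than sleeping inline, but the delay schedule is the same idea.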

view details

Guanheng Zhang

commit sha 82456410917ecbe1012ce2cdbf7166d7acbff773

Re-activate binary_macos_libtorch_2_7_cpu_build and binary_macos_li… (#33321) Summary: Re-send the PR as Intel has restored the relevant packages. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33321 Differential Revision: D19894221 Pulled By: zhangguanheng66 fbshipit-source-id: bc19dcfa5b17ff047f9ae09ebd8eadfb01f7ed68

view details

push time in 2 months

PR closed numpy/numpy

MAINT: Use expm1(x) instead of exp(x) - 1 03 - Maintenance component: numpy.random

>>> np.exp(1e-99) - 1
0.0

>>> np.expm1(1e-99)
1e-99
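The same effect is visible with the stdlib `math` module, which exposes the equivalent C library functions: for |x| far below machine epsilon, `exp(x)` rounds to exactly 1.0, so subtracting 1 loses all significance, while `expm1` computes exp(x) - 1 directly and preserves the leading term:

```python
import math

x = 1e-99
naive = math.exp(x) - 1   # exp(x) rounds to 1.0: catastrophic cancellation
stable = math.expm1(x)    # first-order term x survives intact
```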


+5 -1

14 comments

2 changed files

gaurav1086

pr closed time in 2 months

push eventgaurav1086/pytorch

Elias Ellison

commit sha ca33aeba093d99a8dc34b67a670c82fafd16f70c

[JIT] Add Exit Transform / Convert To SSA to docs Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24114 Differential Revision: D19780828 Pulled By: eellison fbshipit-source-id: d481ad886b2ad6349a1646672e507336d45759fb

view details

BowenBao

commit sha 432858c9601be87facc9d4aac4c209c583893948

[ONNX] Fix exporting copy_ with index as tensor input (#32801) Summary: Supporting the below case. Previously index for copy_ was only considered as constant integer, where as it could be a tensor input as well. ```python class InPlaceIndexedAssignment(torch.nn.Module): def forward(self, data, index, new_data): data[index] = new_data return data data = torch.zeros(3, 4) index = torch.tensor(1) new_data = torch.arange(4).to(torch.float32) torch.onnx.export(InPlaceIndexedAssignment(), (data, index, new_data), 'inplace_assign.onnx', opset_version=11) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/32801 Reviewed By: hl475 Differential Revision: D19731666 Pulled By: houseroad fbshipit-source-id: 08703fdccd817f901282e19847e259d93929e702

view details

Brian Stark

commit sha afa8cbf8c21a080f43acfdf3cac6a07a4c606841

Modified randNLike for scripting (#32830) Summary: The randNLike function had required args which were not being used. The method signature was modified to give them default values so that no error is thrown when scripting does not provide these unused arguments. Additionally, the const checker was modified to handle prim::Constant as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32830 Reviewed By: hl475 Differential Revision: D19731715 Pulled By: houseroad fbshipit-source-id: a3cacb3977eecb88b122e0ceb654fdbf1c8286c1

view details

svcscm

commit sha 10db323b75dd22e611315407360f2b830d79abd9

Updating submodules Summary: GitHub commits: https://github.com/facebook/fb303/commit/4121390031f20c7f800aa07e833ed453ce684736 https://github.com/facebook/fbthrift/commit/fdd24faa6c3461cc74e5ce73f453fb01a727c537 https://github.com/facebook/fbzmq/commit/94471e632be926d7bad91a15f621e7a6322a093e https://github.com/facebook/folly/commit/0a24425afd8a2e6e0a80532b33cdbca3c0669b74 https://github.com/facebook/proxygen/commit/8b79c69b6c15327e2278cd8521c969fea3bc26cc https://github.com/facebook/wangle/commit/99f391782610d9120dd56405e657822893d920a9 https://github.com/facebookincubator/fizz/commit/3853cef0ba8fe1b9a02e305936aa632b5a7d55c9 https://github.com/facebookincubator/katran/commit/5db0cb90fc9c0730368cb8d5403b88e41c71dfd5 https://github.com/facebookincubator/mvfst/commit/714edbb20f55801ddab654cf0d1933d01071a01d https://github.com/pytorch/fbgemm/commit/880ade142005368d62de356d0910788e36de7c5c Test Plan: n/a Reviewed By: yns88 fbshipit-source-id: a63558a8df40c936d8959287f815835502b6cbd9

view details

Jeremy Lilley

commit sha f0d7bd41b9637ed138dc90e8bbceafbe1547bc6a

[jit] Minor: avoid recalculating some keys for map accesses in pickler. (#33060) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33060 Noticed this when tracking down a partially-related SIGSEGV. If inserting a non-present key into a memoized map, don't re-calculate it twice (probably safer that way anyway). ghstack-source-id: 97904485 Test Plan: buck test mode/dev-nosan caffe2/test/... Differential Revision: D19778008 fbshipit-source-id: 95b1d708c034a54b96a22ccbdffb24f72d08dffd

view details

Pritam Damania

commit sha 05d18ffaf5ccc5a19245afafe3998fc3731b570d

Distributed Autograd: Allow multiple backward passes to accumulate gradients. (#32506) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32506 In this PR, we've introduced a `retain_graph` parameter to distributed autograd similar to `torch.autograd.backward`. In terms of design, this parameter is sent over RPC to all nodes and is used to create the GraphTask on the local nodes. This enables us to run `dist_autograd.backward()` multiple times in the same context. The use case currently for this is to benchmark only the backward pass for distributed autograd. We'd like to measure the QPS for the backward pass and as a result, running a single forward pass and multiple backward passes in a loop is one way to benchmark backward pass performance. ghstack-source-id: 97868900 Test Plan: waitforbuildbot Differential Revision: D19521288 fbshipit-source-id: 7ad8521059fd400d7b5a6ab77ce56e1927ced90a

view details

Michael Ranieri

commit sha e025f393f6078fe8ccdb5e43e575d9f074bc4e53

windows template specialization bug (#33076) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33076 attempt at fixing https://github.com/pytorch/pytorch/issues/30886 Test Plan: circleCI with `call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=14.16` passes Differential Revision: D19784550 fbshipit-source-id: 9fb42c3854d1d00d96cd7179bef9dd1aa2972ea6

view details

Lu Fang

commit sha 674dca0831874314bf34b94f58f3a311708df34a

Automatic update of fbcode/onnx to 8b3f7e2e7a0f2aba0e629e23d89f07c7fc0e6a5e (#33075) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33075 Previous import was 65020daafa9183c769938b4512ce543fd5740f8f Included changes: - **[8b3f7e2e](https://github.com/onnx/onnx/commit/8b3f7e2e)**: Update Dropout and BatchNorm to be Training Friendly (#2568) <Lara Haidar> - **[61f0bbc5](https://github.com/onnx/onnx/commit/61f0bbc5)**: Fix a bug in ScatterND shape inference (#2577) <Bowen Bao> - **[05bce9cf](https://github.com/onnx/onnx/commit/05bce9cf)**: add utility function to make reference attribute whose name is not the same as the attribute it refers. (#2583) <Ke Zhang> - **[71181c83](https://github.com/onnx/onnx/commit/71181c83)**: Clarify spec for constant of shape with dim_n = 0 (#2567) <Negin Raoof> - **[eadba733](https://github.com/onnx/onnx/commit/eadba733)**: Update sigs.md with link to calendar page (#2579) <Prasanth Pulavarthi> - **[08562f8e](https://github.com/onnx/onnx/commit/08562f8e)**: Update working-groups.md (#2580) <Prasanth Pulavarthi> - **[0e718913](https://github.com/onnx/onnx/commit/0e718913)**: Fix Slice op's shape inference logic (#2526) <Hariharan Seshadri> - **[12111410](https://github.com/onnx/onnx/commit/12111410)**: Add missing spaces to Random*Like doc (#2572) <Takeshi Watanabe> - **[7e6e61d6](https://github.com/onnx/onnx/commit/7e6e61d6)**: Contributing: fix typos (#2571) <Maher Jendoubi> - **[bbd604ef](https://github.com/onnx/onnx/commit/bbd604ef)**: Add Einsum op (#2504) <Negin Raoof> - **[fd3ab73a](https://github.com/onnx/onnx/commit/fd3ab73a)**: Clarify split supports zero length splits (#2544) <Negin Raoof> - **[6dd73774](https://github.com/onnx/onnx/commit/6dd73774)**: Fix circleci build and drop unsupported Windows builds (#2565) <Wei-Sheng Chin> - **[b3d201a2](https://github.com/onnx/onnx/commit/b3d201a2)**: Fix the formula of intermediate zero calculation for DynamicQuantizeLinear (#2556) <Yufeng Li> - 
**[3613eb25](https://github.com/onnx/onnx/commit/3613eb25)**: Add wording to clarify. (#2555) <Dwayne Robinson> - **[dfa4384c](https://github.com/onnx/onnx/commit/dfa4384c)**: Fix shape inference for Split with split attribute (#2328) <Shinichiro Hamaji> - **[684fc1bc](https://github.com/onnx/onnx/commit/684fc1bc)**: Keep symbolic dims in Concat with a single input (#2418) <Shinichiro Hamaji> Test Plan: ci Reviewed By: hl475 Differential Revision: D19784487 fbshipit-source-id: 421cdc3394faeff0168853f4ff065fc599ca3967

view details

Alban Desmaison

commit sha 3b2f267ad8f326e9e05ac73b2b1c7cb626e5c3ab

add to codeowner to get better inbox notification for PR Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33087 Differential Revision: D19790389 Pulled By: albanD fbshipit-source-id: 360ee1fc47a9b0b8d8ddbe47b77f2cbffaead9c8

view details

Negin Raoof

commit sha d678093907c0f946f74f0f38e4e20a024cb52585

[ONNX] Extend op registration to next opsets (#32943) Summary: Currently, custom ops are registered for a specific opset version. For example, all torchvision custom ops are registered for opset 11, and cannot be exported into higher opset versions. This PR extends op registration to higher opset versions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32943 Reviewed By: hl475 Differential Revision: D19739406 Pulled By: houseroad fbshipit-source-id: dd8b616de3a69a529d135fdd02608a17a8e421bc

view details

Hongyu Cai

commit sha de27f4261d3742c124f4a9721ae98c3e87f6c640

[jit] remove redundant variables from JIT TestCase Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29091 Differential Revision: D19746083 Pulled By: suo fbshipit-source-id: 76fd71740fe7a3f52da361d96a7b694ec208de24

view details

Kimish Patel

commit sha e7b42209eb4f7cd5db9206e4761fc8867c58b1b8

Added sparkspot model. Summary: Lite interpreter does not have softplus and sub ops for this model. Test Plan: buck run fbsource//xplat/aibench:run_bench -- -b ../xplat/aibench/specifications/models/pytorch/mobile_migration/sparkspot.json --platform android --framework pytorch --remote --devices SM-G960U-8.0.0-26 https://our.intern.facebook.com/intern/aibench/details/890521439770638 buck run fbsource//xplat/aibench:run_bench -- -b ../xplat/aibench/specifications/models/pytorch/mobile_migration/sparkspot.json --platform android/arm64 --framework pytorch --remote --devices SM-G960U-8.0.0-26 https://our.intern.facebook.com/intern/aibench/details/485779747361527 For Caffe2: buck run fbsource//xplat/aibench:run_bench -- -b ../xplat/aibench/specifications/models/caffe2/mobile_migration/sparkspot.json --platform android --framework caffe2 --remote --devices SM-G950U-7.0-24 https://our.intern.facebook.com/intern/aibench/details/177482569133423 Reviewed By: ljk53, iseeyuan Differential Revision: D19757721 fbshipit-source-id: cdd4b39d072925fc8de17184f2c90918de6245ba

view details

Hong Xu

commit sha a9583c1f7595fd8bc7479a7ea3b9a0a717ba679e

Vectorize softplus and its backward function on CPU (#32944) Summary: The benchmarking shows a huge performance gain (2-7x faster). Also note that I removed Half support because it isn't generally supported on CPU. Benchmark: (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz) ```python import timeit for op in ('Softplus',): print('Forward') for dtype in ('torch.double', 'torch.float'): for n, t in [(10_000, 10000), (100_000, 1000)]: print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}') print(timeit.timeit('m(a)', setup=f'import torch; m = torch.nn.{op}(); a = torch.randn({n}, dtype={dtype})', number=t)) print('Backward') for dtype in ('torch.double', 'torch.float'): for n, t in [(10_000, 40000), (100_000, 4000)]: print(f'torch.nn.{op}()(a), numel() == {n} for {t} times, dtype={dtype}') print(timeit.timeit('y.backward(retain_graph=True)', setup=f'import torch; m = torch.nn.{op}(); a = torch.randn({n}, dtype={dtype}, requires_grad=True); x = m(a); y = x.sum()', number=t)) ``` Before: ``` Forward torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.double 3.73130346799735 torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.double 3.6790116359916283 torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.float 2.7477027159911813 torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.float 2.7382752639969112 Backward torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.double 7.037510035006562 torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.double 5.855093962003593 torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.float 3.413616877005552 torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.float 2.5485514330066508 ``` After: ``` Forward torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.double 0.9465823079954134 torch.nn.Softplus()(a), numel() == 100000 for 
1000 times, dtype=torch.double 0.8799468770012027 torch.nn.Softplus()(a), numel() == 10000 for 10000 times, dtype=torch.float 0.39715987400268205 torch.nn.Softplus()(a), numel() == 100000 for 1000 times, dtype=torch.float 0.3563060039887205 Backward torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.double 2.400547721001203 torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.double 1.4740848699875642 torch.nn.Softplus()(a), numel() == 10000 for 40000 times, dtype=torch.float 1.6684603010071442 torch.nn.Softplus()(a), numel() == 100000 for 4000 times, dtype=torch.float 0.6815649690106511 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/32944 Differential Revision: D19725407 Pulled By: VitalyFedyunin fbshipit-source-id: 7430de838df731bd17617eff63f10107d5ad6b8b
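For context on the op being vectorized: softplus(x) = log(1 + exp(x)), which overflows if computed naively for large x. A minimal scalar sketch of one common numerically stable formulation (my own illustration, not the vectorized CPU kernel from this commit):

```python
import math

def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    if x > 0:
        # exp(-x) <= 1 here, so log1p(exp(-x)) cannot overflow,
        # and the identity log(1 + exp(x)) = x + log(1 + exp(-x)) holds.
        return x + math.log1p(math.exp(-x))
    # exp(x) <= 1 here, so log1p keeps precision for small exp(x).
    return math.log1p(math.exp(x))

print(softplus(0.0))     # log(2) ~= 0.6931471805599453
print(softplus(1000.0))  # 1000.0; the naive form would overflow
```

The branch at x > 0 is what makes the function safe across the whole real line.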

view details

Lara

commit sha 868db903ae4582db7d8e5fa9a6db0d43b6e77599

ONNX support for torch.take (#33061) Summary: Adding ONNX export support for torch.take() Pull Request resolved: https://github.com/pytorch/pytorch/pull/33061 Reviewed By: hl475 Differential Revision: D19782651 Pulled By: houseroad fbshipit-source-id: 0168fb941e166acda4ca607165248b8e0b260ace

view details

Nikolay Korovaiko

commit sha c6fa6d82aebc1dcd7561ea29ec5d41c5a211bae1

move Decompose before profiling to prevent clearing shape info Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33100 Differential Revision: D19793346 Pulled By: Krovatkin fbshipit-source-id: fdc5927f4970eabbb5a8f62a499d5b79117af2a9

view details

Kiuk Chung

commit sha 7314f1c2818978fb28b420710ffe1f3b1c0b95a1

[torch/multiprocessing] Update documentation indicating that start_method is ignored for mp.spawn() (#33070) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33070 `start_method` parameter is intentionally ignored for `mp.spawn()`. Document this fact and point the user to `start_processes` if they want to use a different `start_method`. Test Plan: Warning message looks like: ``` main.py:8: UserWarning: This method only supports start_method=spawn (got: fork). To use a different start_method use: torch.multiprocessing.start_process(...) warnings.warn(msg) ``` Reviewed By: ailzhang Differential Revision: D19780235 fbshipit-source-id: 4599cd18c3ba6cc401810efe4f390290ffa8023b

view details

Brian Stark

commit sha 17d4ef9e9e9b2c59ae1473a509d085c4d397bee3

Support using scalar tensor for split (#32493) Summary: split requires an int input, however in tracing operators such as size(axis) return a tensor, which is different behavior than when not tracing. As such need to modify split to handle these cases. Fixes https://github.com/pytorch/pytorch/issues/27551 Pull Request resolved: https://github.com/pytorch/pytorch/pull/32493 Reviewed By: hl475 Differential Revision: D19538254 Pulled By: houseroad fbshipit-source-id: c8623009de5926aa38685e08121f4b48604bd8c0

view details

Raghuraman Krishnamoorthi

commit sha 0e29e9e0f6c3f8299e592525c78530fbbb2da932

Re-enable internal test runs Summary: Fix internal error message due to old version of hypothesis test_suite = self.load_tests() File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/__fb_test_main__.py", line 678, in load_tests suite = loader.load_all() File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/__fb_test_main__.py", line 467, in load_all __import__(module_name, level=0) File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/test_quantization.py", line 45, in <module> hu.assert_deadline_disabled() File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/torch/testing/_internal/hypothesis_utils.py", line 322, in assert_deadline_disabled assert settings().deadline is None File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/hypothesis/_settings.py", line 127, in __getattr__ raise AttributeError('settings has no attribute %s' % (name,)) AttributeError: settings has no attribute deadline Test Plan: buck test mode/dev //caffe2/test:quantization -- --run-disabled runs successfully Differential Revision: D19795232 fbshipit-source-id: ef1d8be20b4be30e1cfad4cd5019c4779a5f4568

view details

Negin Raoof

commit sha 6249d7302b7277864ed0ade93f58d88ee0cd3aa8

[ONNX] Fix export for avg_pool with default stride (#33017) Summary: If using nn.functional avg_pool, stride is an optional arg. If not provided, it is set to kernel_size. This PR fixes the export of avg_pool with default stride. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33017 Reviewed By: hl475 Differential Revision: D19759604 Pulled By: houseroad fbshipit-source-id: b0352db6fbaf427f4cff9ba8a942efdeb39b6f02

view details

Summer Deng

commit sha e2f12885140c36c1d5bf82de6eb47797856fdacd

Add utils to inspect fp16/int8 packed weights (#32979) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32979 Since we use prepacked weights in the Fp16 FCs and future Int8 FCs in production Ads models, we provide the python utils to inspect the unpacked format of the weights for debugging purpose. The main interfaces are the following: ``` from deeplearning.numeric_suite.toolkit import packed_weights_inspector # inspect fp16 packed weights unpacked_fp16_weights = packed_weights_inspector.extract_fp16_fc_packed_weights(fp16_weight_blob_name) # inspect int8 packed weights unpacked_int8_weights, qparams = packed_weights_inspector.extract_int8_fc_packed_weights(int8_weight_blob_name) ``` Test Plan: ``` buck test mode/opt deeplearning/numeric_suite/toolkit/test:packed_weights_inspector_test ``` Reviewed By: amylittleyang Differential Revision: D19724474 fbshipit-source-id: e937672b3722e61bc44c2587aab2288a86aece9a

view details

push time in 2 months

pull request commentmongodb/mongo

SERVER-46011 rename_collection: reduce uuid expression

@Schubes you are welcome. Has this been merged already ?

gaurav1086

comment created time in 2 months

PR closed python/cpython

Reviewers
Redundant expression CLA not signed awaiting core review

<!-- Thanks for your contribution! Please read this comment in its entirety. It's quite important.

Pull Request title

It should be in the following format:

bpo-NNNN: Summary of the changes made

Where: bpo-NNNN refers to the issue number in the https://bugs.python.org.

Most PRs will require an issue number. Trivial changes, like fixing a typo, do not need an issue.

Backport Pull Request title

If this is a backport PR (PR made against branches other than master), please ensure that the PR title is in the following format:

[X.Y] <title from the original PR> (GH-NNNN)

Where: [X.Y] is the branch name, e.g. [3.6].

GH-NNNN refers to the PR number from master.

-->

+2 -2

5 comments

2 changed files

gaurav1086

pr closed time in 2 months

PR closed pytorch/pytorch

Reviewers
Use expm1(x) for precision module: cpp open source triaged

For small magnitude values of x, expm1(x) may be more accurate than exp(x) - 1

`np.exp(1e-99) - 1` returns `0.0`

`np.expm1(1e-99)` returns `1e-99`
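The precision difference can be reproduced with the standard library alone (a minimal sketch; `math.expm1` mirrors `np.expm1` for scalars):

```python
import math

x = 1e-99

# exp(1e-99) rounds to exactly 1.0 in double precision,
# so subtracting 1 destroys the small result entirely.
naive = math.exp(x) - 1

# expm1 computes exp(x) - 1 directly, preserving small x.
accurate = math.expm1(x)

print(naive)     # 0.0
print(accurate)  # 1e-99
```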

+9 -9

2 comments

2 changed files

gaurav1086

pr closed time in 2 months

PR closed pytorch/pytorch

Reviewers
[caffe2] Use log1p for precision caffe2 open source triaged

For small magnitude values of x, log1p(x) may be more accurate than log(1 + x)

`np.log1p(1e-99)` returns `1e-99`

`np.log(1 + 1e-99)` returns `0.0`
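The same effect is visible with the standard library (a minimal sketch; `math.log1p` mirrors `np.log1p` for scalars):

```python
import math

x = 1e-99

# 1 + 1e-99 rounds to exactly 1.0 in double precision,
# so the naive form loses the small value before the log is taken.
naive = math.log(1 + x)

# log1p computes log(1 + x) without forming 1 + x first.
accurate = math.log1p(x)

print(naive)     # 0.0
print(accurate)  # 1e-99
```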

+7 -7

2 comments

3 changed files

gaurav1086

pr closed time in 2 months

PR closed tensorflow/tensorflow

Reviewers
tensor_jni memory cleanup awaiting review cla: yes size:XS

Delete dims[] before return.

+4 -1

0 comment

1 changed file

gaurav1086

pr closed time in 2 months

pull request commenttensorflow/tensorflow

[lite] pass array_names by const ref

@jdduke done. Thank you.

gaurav1086

comment created time in 2 months

push eventgaurav1086/tensorflow

ga

commit sha 67a91228c51e0f5599c7989fece53bdcdd591a42

Code review changes

view details

push time in 2 months

Pull request review commenttensorflow/tensorflow

[lite] pass array_names by const ref

```diff
 bool MinMaxApproximatelyEqual(const MinMax& minmax1, const MinMax& minmax2) {
 // If multiple of these arrays have MinMax, then these are required
 // to agree with each other.
 bool PropagateMinMaxAmongArrays(Model* model,
-                                const std::vector<string> array_names) {
+                                const std::vector<string> &array_names) {
```

@jdduke, thank you for the review. However, I do not understand the point. Do you mean replacing `std::vector<string>` with `std::vector`?

For the vector template class, you have to specify the element type everywhere (unless you are using auto).

The following sample code in C++ works:

```cpp
#include <iostream>
#include <vector>

using namespace std;

void pass_by_ref(const vector<string> &v) {
    // do nothing
    return;
}

int main() {
    vector<string> v = {"hello", "world"};
    pass_by_ref(v);
    return 0;
}
```

If I change the function signature to `void pass_by_ref(const vector &v)`, I get a compilation error.

Thoughts ?

gaurav1086

comment created time in 2 months

push eventgaurav1086/tensorflow

TensorFlower Gardener

commit sha b83cc0ac8eb83c85b78778bbe8a0c96f323d747e

Merge pull request #22231 from MichaelKonobeev:sparse-xent-op-hessian PiperOrigin-RevId: 260802377

view details

MichaelKonobeev

commit sha ea809e3ad7c0d8a1fc1170dec6c782c7feac299b

Implement IsZero in eager mode

view details

Trent Lo

commit sha b33c788a2f479a4753f49b566f08079692c75af2

Implement horizontal fusion. - It reduces kernel launch overhead and increases launch dims by horizontally fusing independent computations.

view details

Trent Lo

commit sha cd68827e01d454937399bafcdb1eb4b9a116678a

Minor cleanup for horizontal fusion.

view details

Trent Lo

commit sha cb9ab8bee96530c9973d5e295b53d936cbf8ef72

Polishing coding style and comments.

view details

Trent Lo

commit sha 1876f2acc02dee840b3a8b6ab59f950b5a3bbf4f

Factor out lambdas in HorizontalFusionImpl.

view details

Trent Lo

commit sha 86bd5bf3e75cb5d14d24194a2d1e2d8f60753b03

Comment polishing.

view details

Trent Lo

commit sha 474e79985f722afa57d12447fb2f4dc30e890d06

Add some more unittests for horizontal fusion. In addition, we record the execution time of the tests here, showing the optimization effects of horizontal fusion, measured by --xla_hlo_profile. The accumulated kernel execution time in GradientDescentOptimizerLike is reduced from 2.39ms to 311us; the execution time in RMSProp is reduced from 980us to 112us. Before horizontal fusion: 2019-12-10 22:05:45.215015: I tensorflow/compiler/xla/service/executable.cc:208] Execution profile for GradientDescentOptimizerLike: (2.39 ms @ f_nom) 2019-12-10 22:05:48.877372: I tensorflow/compiler/xla/service/executable.cc:208] Execution profile for RMSPropLike: (980 us @ f_nom) After horizontal fusion: 2019-12-10 22:05:03.831600: I tensorflow/compiler/xla/service/executable.cc:208] Execution profile for GradientDescentOptimizerLike: (311 us @ f_nom) 2019-12-10 22:05:13.513901: I tensorflow/compiler/xla/service/executable.cc:208] Execution profile for RMSPropLike: (112 us @ f_nom)

view details

Trent Lo

commit sha a629a452bff5b7f7f2688086483d7eb8d3d02420

Polishing comments and coding styles.

view details

Stephan Uphoff

commit sha 7813cb00f35d6fc6d8ad8421021c1535f3e8c029

lite/micro: Add feature buffer to micro_speech example. This fixes #35117 Accumulate feature slices in separate buffer. The input tensor is not suitable for keeping state across inference as it has limited lifetime and the buffer space may be reused.

view details

Trent Lo

commit sha 7abde726e4706df2fa83c2ec3c89ef9fb5c99228

Polish coding styles and comments based on review feedback. In addition, use hlo_matcher to verify resultant DAGs instead of LLVM filecheck.

view details

Trent Lo

commit sha 5f5aa78f86a43d073663cc0f96acb3926d621e42

Merge branch 'upstream_master_dec19' into horizontal_fusion_github

view details

MichaelKonobeev

commit sha 6fe6391ea937a3c20308b3986f7232967e6f0268

Unconditionally tag zero tensors

view details

MichaelKonobeev

commit sha b187faf53c68ff9b0c711b246116fb81660ad4c7

Remove expired forward compatibility check

view details

MichaelKonobeev

commit sha cb9ce8a40c41d35725900f0f0e12a934e28ba837

Merge branch 'master' into sparse-xent-op-hessian

view details

Eugene Kuznetsov

commit sha 968a674ecb6db34e5d2e09068a8d9ca5ca4e3e24

Enable //tensorflow/python:stateful_random_ops_test

view details

Eugene Kuznetsov

commit sha f7b28191777b6ae86c0dbdab7a74b8370e53eaa8

Fix for //tensorflow/python:stateful_random_ops_test: Pack arguments of UpdateVariableAndFill_Philox into a struct

view details

Eugene Kuznetsov

commit sha eee5851777b842945b12937600b005a58aae0f2c

Fix for //tensorflow/python:stateful_random_ops_test: Move the thread counter into the global namespace

view details

Trent Lo

commit sha 47ba0995d9838e5f9aa634abc59f4569c4a37375

Fix a buildifier format issue.

view details

RichardXiao13

commit sha f8a15ce2b6f48523effe2dd42e7844ea7ef1d97a

Add usage example to math.poly_val

view details

push time in 2 months

pull request commenttensorflow/tensorflow

[core] Added null check for output buffer

If we just return false here, it may cause issues in the logic that follows, right?

@Leslie-Fang, thanks a lot for the review. The function CompressInternal() returns false in most of the error cases, so I believe the caller should handle this return the same way it handles the other errors. Thoughts ?

gaurav1086

comment created time in 2 months

startedgaurav1086/dl-algo-trader

started time in 2 months

push eventgaurav1086/pytorch

Jeremy Lilley

commit sha b894dc06de3e0750d9db8bd20b92429f6d873fa1

[Pytorch] Propagate errors in clearAndWaitForOutstandingRpcsAsync. (#32952) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32952 When the Async() version of clearAndWaitForOutstandingRpcs() was written, we didn't yet have the generic Future<T> class, and hadn't worked out our error model fully. This change fixes that method to properly propagate the first encountered error to the future, using a bool+CAS. ghstack-source-id: 97665749 Test Plan: existing test coverage, buck test mode/dev-nosan caffe2/test/... Differential Revision: D19710337 fbshipit-source-id: 66ce5593a94a16ea624930dbb9409917ef5cfd5d

view details

Kimish Patel

commit sha 820410b5057e751baada48e306c1938352f5a32c

Added upsample_nearest2d op for lite interpreter. (#32913) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32913 This enables mobile detection and tracking models. Test Plan: buck test caffe2/test/cpp/jit:jit -- JitTest.LiteInterpreterUpsampleNearest2d Reviewed By: iseeyuan Differential Revision: D19664502 fbshipit-source-id: 1c7270dcf394aba7b510c5aa80552c58a5038f24

view details

Gregory Chanan

commit sha ec2c974bd51b9677f9925eafc25cd15add8883c6

Simplify some TH codegen by moving code out of the switch and killing dead code. (#32888) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32888 This kills ~1500 lines of generated code by doing the following: 1) Stop binding _th_clone, which isn't used anymore. 2) Move allocation code out of the switch, because it doesn't need to be there, example: Now: ``` auto dispatch_scalar_type = infer_scalar_type(self); auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(c10::Storage(scalarTypeToTypeMeta(dispatch_scalar_type), 0, allocator(), true),DispatchKey::CPUTensorId).release(); auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_)); switch (dispatch_scalar_type) { case ScalarType::Bool: { ... case ScalarType::Byte: { ... ``` Before: ``` auto dispatch_scalar_type = infer_scalar_type(self); switch(dispatch_scalar_type) { case ScalarType::Bool: { auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(caffe2::TypeMeta::Make<bool>(), 0, allocator(), true),DispatchKey::CPUTensorId).release(); auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_)); case ScalarType::Byte: { auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(caffe2::TypeMeta::Make<byte>(), 0, allocator(), true),DispatchKey::CPUTensorId).release(); auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_)); ``` Note there's one extra lookup from ScalarType -> TypeMeta, but that can go away once we are able to put everything in a dispatch macro. 3) Prepare for more moves out of the switch by using dispatch_scalar_type where we would have used an explicit ScalarType::Name More moves are currently blocked by "real" types needing to map scalar_type -> C++ type. Dispatch macros can solve that, but I'll need to wrap the actual TH calls in templates so the entire thing can be done via dispatch. 
4) Kill some codegen that isn't used anymore: ALLOC_WRAP, is_actual_return_long. Test Plan: Imported from OSS Differential Revision: D19672613 Pulled By: gchanan fbshipit-source-id: 753f480842d11757e10182e43b471bd3abaa5446

view details

Jie

commit sha 9e7c47644fb80dd80aace1ee8b6b046b8cdca54f

[NHWC CUDNN CONV]Update cudnn convolution memory_format behavior (#32482) Summary: 1. Allows both the memory_format of weight & input to dictate the output memory_format. 2. Provides utility function to recursively convert memory_format of Conv2d and ConvTranspose2d layers. This allows easy model conversion and ensures that lost memory_format through incompatible layers could be restored at Convolution-like layer, where significant performance boost is expected on later generation CUDA devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32482 Differential Revision: D19647903 Pulled By: VitalyFedyunin fbshipit-source-id: 62c96ff6208ff5e84fae1f55b63af9a010ad199a

view details

James Reed

commit sha 341fb6d11dec4321f8b910760903d5a81651bd96

Make caffe2/caffe2/python/models/seq2seq python3 compatible Test Plan: watiforsadcastle Reviewed By: dzhulgakov Differential Revision: D19698403 fbshipit-source-id: 36b73e07e598c848abbe368e522484da9ba4c78f

view details

Mike Ruberry

commit sha aa3c8717392071f78110d1a4144f05f3bfd549c8

Adds TestViewOps, updates documentation (#32512) Summary: Understanding which ops return views and which return tensors with new storage is a common user issue, and an issue for developers connecting accelerators to PyTorch, too. This generic test suite verifies that ops which should return views do (and a few ops that shouldn't don't). The documentation has also been updated for .t(), permute(), unfold(), and select() to clarify they return views. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32512 Differential Revision: D19659454 Pulled By: mruberry fbshipit-source-id: b4334be9b698253a979e1bb8746fdb3ca24aa4e3

view details

Jiakai Liu

commit sha e922826dda69c08acfe35ffc7c1d591c1eac0d7b

[pytorch] simplify lazy initialization of DefaultCPUGenerator singleton (#32897) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32897 Moving the default static instance into the method to achieve the same purpose. ghstack-source-id: 97570792 Test Plan: - CI Reviewed By: dreiss Differential Revision: D19674566 fbshipit-source-id: 27f54da66dd7667c34905eddaac6579e64aa1118

view details

peter

commit sha d3fa68eeec893bfb809bc18169d79f5d397feb79

Fix for MKL detection script on Windows (#32970) Summary: Fixes https://github.com/pytorch/pytorch/issues/32914. 1. Use `DEFINED ENV{MKLProductDir}` instead of `$ENV{MKLProductDir}` 2. Cache `INTEL_COMPILER_DIR` and `INTEL_MKL_DIR` Pull Request resolved: https://github.com/pytorch/pytorch/pull/32970 Differential Revision: D19727677 Pulled By: soumith fbshipit-source-id: 065c6bee35a2295f1c478df1460cad7668b25af5

view details

svcscm

commit sha e999095594dab901478a85aa91a45b5cd8dda214

Updating submodules Summary: GitHub commits: https://github.com/facebook/fbthrift/commit/8f3d7019bb09b9f1fa2db86242d0018c4921327a https://github.com/facebook/mcrouter/commit/a5df50cf5cd8d28120d49233c29902d7b23267a9 https://github.com/facebook/proxygen/commit/b896a52075fa1f4b30d1d64fd55a219bc20a11e6 https://github.com/facebook/rocksdb/commit/3a073234da663709fcb7a479ec88ce7476c48e3a https://github.com/facebook/wangle/commit/7c05bee0551a02b1a8417399bbc12b751f831b8b https://github.com/facebookincubator/mvfst/commit/90f0aa96653dc3e9cbb9b48c99857dc0addc7ba9 https://github.com/pytorch/fbgemm/commit/5cdd1abbb99a6d01354c6409340ad0822775be8b Test Plan: n/a Reviewed By: yns88 fbshipit-source-id: 70dd062814f68bda77e119bb9deaefbf71c551e6

view details

nihui

commit sha b69c685c4a9f4d44427ebd1b4b45bbd7859e1430

try to find cudnn header in /usr/include/cuda (#31755) Summary: With fedora negativo17 repo, the cudnn headers are installed in /usr/include/cuda directory, along side with other cuda libraries. Pull Request resolved: https://github.com/pytorch/pytorch/pull/31755 Differential Revision: D19697262 Pulled By: ezyang fbshipit-source-id: be80d3467ffb90fd677d551f4403aea65a2ef5b3

view details

Shinichiro Hamaji

commit sha 67706187fbf2e911a53c9e604a1eaa15c54aca3c

Fix a broken link in contribution_guide.rst Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30814 Differential Revision: D19697403 Pulled By: ezyang fbshipit-source-id: b01fd0e189b3bc7ccaa197c9c64e12fee70a6310

view details

peng

commit sha 18d1896ba014c8a79ed60ab40dcd7be0f96ae431

Fix confusing "does not have GPU support" warning message (#30721) Summary: Many people who use caffe2 are confused about "does not have GPU support" warning message. https://github.com/facebookresearch/video-nonlocal-net/issues/6 facebookarchive/caffe2#346 facebookarchive/caffe2#1634 facebookarchive/caffe2#197 Many none GPU reasons can cause this warning message. It is better to give the error info. ![image](https://user-images.githubusercontent.com/13826327/70129721-41175e00-16ba-11ea-85df-a4b1a1690149.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/30721 Differential Revision: D19697413 Pulled By: ezyang fbshipit-source-id: bd24b7c814e7e677352068b9e9f77a68de080159

view details

Shinichiro Hamaji

commit sha 478356aeec571c6f43671b2b0949a3787c6a37f7

Fix broken links in governance.rst Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30815 Differential Revision: D19697401 Pulled By: ezyang fbshipit-source-id: d7e1a1b54039624f471b6cfb568428feb73060f4

view details

Brian W. Hart

commit sha ea968f5cc363f69c89a84b113a83e65453f95eb0

fix possible pandas import error during tensorboard tests (#29650) Summary: TensorBoard tests using SummaryWriter() may fail with a pandas import complaint if TensorFlow packages are installed in the same python environment as PyTorch: Traceback (most recent call last): File "test_tensorboard.py", line 212, in test_writer with self.createSummaryWriter() as writer: File "test_tensorboard.py", line 64, in createSummaryWriter return SummaryWriter(temp_dir) ... File "[...]/site-packages/pandas/core/arrays/categorical.py", line 52, in <module> import pandas.core.algorithms as algorithms AttributeError: module 'pandas' has no attribute 'core' The exact failure may depend on the pandas version. We've also seen: File "[...]/site-packages/pandas/core/arrays/categorical.py", line 9, in <module> import pandas.compat as compat AttributeError: module 'pandas' has no attribute 'compat' The module import chain leading to the failure is tensorboard imports tensorflow imports tensorflow_estimator imports pandas. pandas includes a submodule named 'bottleneck', whose name collides with the PyTorch 'test/bottleneck/' subdirectory. So IF tensorboard, tensorflow, tensorflow_estimator, and pandas are installed in the python environment AND IF testing is run from within PyTorch's 'test/' directory (or maybe just with 'test/' in PYTHONPATH, etc.), then TensorBoard tests using SummaryWriter() will fail. Rename the 'bottleneck/' directory slightly to avoid the name collision. Pull Request resolved: https://github.com/pytorch/pytorch/pull/29650 Differential Revision: D19698638 Pulled By: ezyang fbshipit-source-id: cb59342ed407cb37aefc833d67f768a8809129ac

view details

hello@nicklashansen.com

commit sha d3a0bdd06b0a265903d94570d6c0b9004883ddd0

proofreading (#29797) Summary: two instances of if -> it in torch.nn.modules.batchnorm.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/29797 Differential Revision: D19698613 Pulled By: ezyang fbshipit-source-id: 7312b2333f227113e904dfa91db90d00e525affb

view details

cyy

commit sha 27e1fecabd1941b70eaa54b65d921452b613de69

let user specify CUDA_HOST_COMPILER Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32904 Differential Revision: D19729047 Pulled By: ezyang fbshipit-source-id: c233e3924f71a025c51d25a7e3a8d728dac8730a

view details

Ashkan Aliabadi

commit sha b0d5ce3848abf89e612a872c60c810b9340426b4

Revert D19710990: [pytorch][PR] properly update _flat_weights in RNN modules Test Plan: revert-hammer Differential Revision: D19710990 Original commit changeset: c978c7519464 fbshipit-source-id: 8710bc2f4f1d01d9c93d038b59caf1e6859375dd

view details

Ralf Gommers

commit sha 6305e4a88ff721713fcd266561e1da35cee20b4e

Add warning and example for seeding to DistributedSampler (#32951)

Summary: Closes gh-31771. Also note that the `epoch` attribute is *only* used as a manual seed in each iteration (so it could easily be changed/renamed). Seeding consecutive iterations with `[0, 1, 2, ...]` is low-entropy; however, in practice it probably doesn't matter when using the sampler in combination with a dataloader (because there won't be enough data nor epochs to run into statistical issues due to low-entropy seeding). So leaving that as is.

Rendered docstring: https://user-images.githubusercontent.com/98330/73701250-35134100-46e9-11ea-97b8-3baeb60fcb37.png

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32951
Differential Revision: D19729333
Pulled By: ezyang
fbshipit-source-id: 3ddf90a3828b8bbae88aa2195a5d0b7d8ee1b066
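The seeding scheme the commit documents can be sketched with the stdlib alone: every replica that derives its shuffle from the same (base seed, epoch) pair sees the identical permutation, which is the property DistributedSampler relies on. `epoch_permutation` below is a made-up helper for illustration, not the torch API:

```python
import random

def epoch_permutation(num_samples, base_seed, epoch):
    """Deterministic per-epoch shuffle: callers with the same
    (base_seed, epoch) derive the identical permutation."""
    # Seeding with base_seed + epoch is the low-entropy [0, 1, 2, ...]
    # pattern the commit message discusses.
    g = random.Random(base_seed + epoch)
    indices = list(range(num_samples))
    g.shuffle(indices)
    return indices

# Two "replicas" at the same epoch agree on the order.
replica_a = epoch_permutation(10, 42, epoch=3)
replica_b = epoch_permutation(10, 42, epoch=3)
print(replica_a == replica_b)
```

Forgetting to advance the epoch (torch's `set_epoch`) means every epoch replays the same permutation, which is exactly the pitfall the added warning targets.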

view details

aviloria

commit sha 4f5908d5d7c7aa0aff9106cd8066e57db3b0a652

Remove unneded TORCH_API (#32015) Summary: It was causing a build error when compiling on MINGW64 Pull Request resolved: https://github.com/pytorch/pytorch/pull/32015 Differential Revision: D19697296 Pulled By: ezyang fbshipit-source-id: 71e58783c48f8e99755c091b2027d59740dfca47

view details

Ehsan Azar

commit sha 58e8d5588acc2ab0840380c3a89848292a73af3e

[ONNX] Export bitwise_not for bool (logical_not) (#28439) Summary: Fixes https://github.com/pytorch/pytorch/issues/25805 (for bool tensors as in the issue) Pull Request resolved: https://github.com/pytorch/pytorch/pull/28439 Differential Revision: D19700156 Pulled By: ezyang fbshipit-source-id: 0706ada6a8d259dce381ba2d009f226e14c3c14f

view details

push time in 2 months

pull request comment intel/mkl-dnn

[jit] redundant non-negative unroll_y check

Thank you.

gaurav1086

comment created time in 2 months

pull request comment mongodb/mongo

SERVER-46011 rename_collection: reduce uuid expression

@carlchampain, thanks for the review. Signed.

gaurav1086

comment created time in 2 months

Pull request review comment intel/mkl-dnn

[jit] redundant non-negative unroll_y check

    void jit_avx_gemv_t_f32_kern::innerloop(int unroll_m, int unroll_n) {

    // Outer loop.
    void jit_avx_gemv_t_f32_kern::outerloop(
            int unroll_x, int unroll_y, Label *&cur_outerloop_label) {
    -    if ((unroll_x > M_UNROLL_) || (unroll_y > N_UNROLL_) || (unroll_y < 0)

@rsdubtso , thanks for the review and the clarification. Changes made.

gaurav1086

comment created time in 2 months

push event gaurav1086/mkl-dnn

Gaurav Singh

commit sha b6fd714e3e1f31e082a33ee6c010980fb3f9f394

Added check for unroll_x < 0

view details

push time in 2 months

Pull request review comment numpy/numpy

MAINT: Use expm1(x) instead of exp(x) - 1

     def test_default_is_pcg64(self):
         # a deprecation cycle to move to a different function.
         assert_(isinstance(self.rg.bit_generator, PCG64))

    +    def test_expm1(self):
    +        np.random.default_rng(12345)
    +        assert_(np.random.default_rng(1e99) > 0.0))
    +

Thanks. Removed.

gaurav1086

comment created time in 2 months
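The motivation behind the expm1 change reviewed above: for small x, computing exp(x) - 1 first rounds exp(x) near 1.0 and then cancels, amplifying the rounding error, while expm1(x) is computed without forming exp(x) and stays accurate. A quick sketch with the stdlib:

```python
import math

x = 1e-12
naive = math.exp(x) - 1.0   # exp(x) rounds near 1.0, then subtraction cancels
accurate = math.expm1(x)    # dedicated routine, no cancellation

# True value is x + x**2/2 + ..., so for x = 1e-12 the answer is x
# to about 12 significant digits; compare relative errors against x.
rel_err_naive = abs(naive / x - 1.0)
rel_err_expm1 = abs(accurate / x - 1.0)
print(rel_err_naive, rel_err_expm1)
```

The naive form loses roughly four significant digits here; the same reasoning applies to the log1p/expm1 pairs used throughout numpy's distribution code.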

push event gaurav1086/numpy

Gaurav Singh

commit sha 72d5553a19f4b12bb535a95215a360a87ab5e0fb

Remove extra paranthesis

view details

push time in 2 months

PR opened intel/mkl-dnn

[jit] redundant non-negative unroll_y check

Removed duplicate check for non-negative unroll_y

Description

Please include a summary of the change. Please also include relevant motivation and context. See contribution guidelines for more details. If the change fixes an issue not documented in the project's Github issue tracker, please document all steps necessary to reproduce it.

Fixes # (github issue)

Checklist

All Submissions

  • [ ] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally?
  • [ ] Have you formatted the code using clang-format?

New features

  • [ ] Have you added relevant tests?
  • [ ] Have you provided motivation for adding a new feature?

Bug fixes

  • [ ] Have you added relevant regression tests?
  • [ ] Have you included information on how to reproduce the issue (either in a github issue or in this PR)?
+1 -2

0 comment

1 changed file

pr created time in 2 months

create branch gaurav1086/mkl-dnn

branch : jit_remove_extra_or_expr

created branch time in 2 months

fork gaurav1086/mkl-dnn

Deep Neural Network Library (DNNL)

https://01.org/dnnl

fork in 2 months

started quantlib/QuantLib

started time in 2 months

PR opened pytorch/pytorch

[caffe2] simplify relative error expr

simplify relative error expr

+1 -2

0 comment

1 changed file

pr created time in 2 months

create branch gaurav1086/pytorch

branch : caffe2_simply_relative_err_expr

created branch time in 2 months

pull request comment pytorch/pytorch

[aten] fix vector memory leak

Cool, thank you. Kindly let me know if there are any other changes needed in this fix.

gaurav1086

comment created time in 2 months

PR closed facebook/fbthrift

[java] Remove dup index check CLA Signed

Save one extra comparison

+1 -1

0 comment

1 changed file

gaurav1086

pr closed time in 2 months

push event gaurav1086/numpy

Gaurav Singh

commit sha 33c9318c8c52d1106ce685532a49b582881573b9

Added tests for np.random.default_rng()

view details

push time in 2 months

push event gaurav1086/pytorch

Mingzhe Li

commit sha cccf5e7011f689c8dea499a1ab881900152e2575

Resolve rendezvous race condition Summary: When running the ctr_mbl_feed, we've encountered hang issue related to the rendezvous handshake based on zeus. It was mitigated by this diff https://our.intern.facebook.com/intern/diff/D19167151/. This diff resolves the race condition by adding a reference to the rendezvous handler. Test Plan: x7340282797 Reviewed By: yifuwang Differential Revision: D19627293 fbshipit-source-id: 560af289db8ef6cf8d6f101f95ec27d5a361fd04

view details

Sampath Mummadi

commit sha 8ead65a94647cae21984f791f8c81ed3f1259fd2

[PyTorch][TorchScript] Add support for join on List of strings in TorchScript Summary: Add support for join on List of strings in TorchScript. Test Plan: (pytorch) smummadi@smummadi-mbp pytorch % python test/test_jit_string.py Fail to import hypothesis in common_utils, tests are not derandomized . ---------------------------------------------------------------------- Ran 1 test in 1.090s OK Differential Revision: D19611800 fbshipit-source-id: cef66356abc14dfd100a806d25dd1a8bc9af0a11

view details

Brian Stark

commit sha 55c382e62bb364b8cd1c631961cc312794d16a02

Fixed access to element in size tensor for scripting (#32652) Summary: when using scripting, there was an error in attempting to access a specific element from within the size tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32652 Reviewed By: hl475 Differential Revision: D19610726 Pulled By: houseroad fbshipit-source-id: bca49927bbe71dbe7e7d7edf301908fe79e089b5

view details

Supriya Rao

commit sha c2d736cefb426a213c43a3356f1780f71b458c94

Add support for Dynamic LSTM quantization on Mobile (#32757) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32757 This PR updates the main quantize_dynamic API to use QNNPACK backend for mobile Test Plan: python test/test_quantization.py PostTrainingDynamicQuantTest.test_quantized_rnn Imported from OSS Differential Revision: D19632220 fbshipit-source-id: b4c51485c281d088524101b97c84dd806438b597

view details

Jeremy Lilley

commit sha 821b6aa769645c8190703b7d8e2cc9f36597853a

[pytorch] Minor: avoid acquiring GIL twice in PyRRef::localValue() (#32785) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32785 Add PythonRpcHandler::handleExceptionWithGIL() so that in PyRRef::localValue(), we don't need to release the GIL and re-acquire the following line. ghstack-source-id: 97418465 Test Plan: existing test coverage Differential Revision: D19626195 fbshipit-source-id: db694d04b078811f819626789e1e86f1b35adb5b

view details

Basil Hosmer

commit sha fb159b5236917c113745e55ae38501aed176c06d

Some work on eager op binding codegen (gen_python_functions.py) (#29986) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29986 Previously in addition to generating a python binding for each op, we would generate an almost-trivial helper for each overload. This PR eliminates the helpers, simplifying codegen logic a bit and reducing the source-level indirection by a step. Perf should be unchanged. codegen diff: https://github.com/bhosmer/scratch/commit/1f2f07fb605e782cf7fdfb7d5eb33050eb65a6b4 Note: in the interests of keeping the diff contained, there's only some light cleanup here beyond what's necessary for the codegen changes. Plan is to do some more substantial refactoring in followup PRs that leave generated code unchanged. Test Plan: Imported from OSS Differential Revision: D18567980 Pulled By: bhosmer fbshipit-source-id: eb9a81babb4489abd470842757af45580d4c9906

view details

Basil Hosmer

commit sha affd598c1fc781919b88aa2efc4a2c1f2d2c96a7

Fix/simplify alias annotation handling in op codegen. (#32574) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32574 Previously, we ignored alias annotations when deriving argument mutability and instead recognized particular signature patterns (in-place, out variant) and assigned mutability accordingly. Op signatures that didn't fit these patterns would error (e.g. see #30526, which this fixes). No change in the generated binding code. Code changes: 1. in function_wrapper.py, fix the mutability derivation logic used when creating an argument's c++ type property. Note that we temporarily need to trap a special case and apply the old logic, see code comment for details. 2. in gen_jit_dispatch.py, update logic that assumed only one mutable Tensor argument per declaration. Happily this mostly was accomplished by bypassing some now-redundant signature regeneration machinery. Another special case here requires that we keep the old machinery around temporarily. Test Plan: Imported from OSS Differential Revision: D19564875 Pulled By: bhosmer fbshipit-source-id: 5637a9672923676d408c9586f3420bcc0028471a

view details

Shihao Xu

commit sha b0923acb29fc7ba322ce119830a8fc9d23a38dfe

Reduce RPC branches for Python/BuiltinOp/TorchScript (#32689) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32689 As described in https://github.com/pytorch/pytorch/issues/32565 ghstack-source-id: 97440343 Test Plan: ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_functions_not_supported buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_functions_not_supported ``` ``` buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call ``` Differential Revision: D5721814 fbshipit-source-id: 9079e81764be1e7c7b85dd72a18c76f3ecfd2547

view details

Edward Yang

commit sha 68742789857a7860ee0af928b3d0f553d06ee4a8

Revert D19611800: [PyTorch][TorchScript] Add support for join on List of strings in TorchScript Test Plan: revert-hammer Differential Revision: D19611800 Original commit changeset: cef66356abc1 fbshipit-source-id: 41af9e0de83b1fb808b17255ec905e137909457d

view details

Pavel Belevich

commit sha 85bd3e5bdbc586de25aef49aeceb233b641b760d

Removing @expectedFailureXLA from test_nll_loss_empty_tensor_reduction_mean (#32701) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32701 Because it's disabled in XLA(https://github.com/pytorch/xla/pull/1563) Discussed in https://github.com/pytorch/xla/issues/1539 Test Plan: Imported from OSS Differential Revision: D19633349 Pulled By: pbelevich fbshipit-source-id: b9a81c976a96b325356ff210ff838dfcd5352db7

view details

albanD

commit sha fa65859270d9ebb73b24a74780366fdac4bdf614

Re-enable non-deterministic autograd tests Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32793 Test Plan: Imported from OSS Differential Revision: D19634632 Pulled By: albanD fbshipit-source-id: 9dda29536c2ed4afb81ecbea471ba615241bbac2

view details

James Reed

commit sha cc35c876cbc38380230afc56432593158568eb5f

Fix backcompat for linear_relu_dynamic_fp16 (#32803) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32803 Stack from [ghstack](https://github.com/ezyang/ghstack): * **#32803 Fix backcompat for linear_relu_dynamic_fp16** Test Plan: Imported from OSS Reviewed By: jamesr66a Differential Revision: D19642281 Pulled By: albanD fbshipit-source-id: 3b6ae4dd81bf8a70dd81ccbb02fffd7653bbd08c

view details

peter

commit sha 9bab617b3e6c7e80b74cf9b091cf5150e3571064

Make python version a parameterizable option for Windows CI. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32823 Differential Revision: D19642347 Pulled By: ezyang fbshipit-source-id: a4d461aa29a06bb7f5e5d359a2df2c90e9a4fd41

view details

Mike Ruberry

commit sha 413c0f6c2930c96f52899069fcef94b7e1dd8be4

Fixes moving after weight norm application (#32563) Summary: This PR updates how RNNs handle their "flat weights." In particular, it allows for only some flat weights to be "materialized" when apply is called, and it updates the flattening behavior to only apply if all flat weights are (1) materialized, (2) share a dtype and (3) are acceptable to cuDNN. One test is modified and another created to test these changes. One practical effect of this change is that weight norm can be successfully applied to a module BEFORE that module is moved to an accelerator. Previously doing so would throw an error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32563 Differential Revision: D19602725 Pulled By: mruberry fbshipit-source-id: d8f9441d17815c8c9ba15b256d4be36f784a3cf9

view details

Shen Li

commit sha a40a19ccabde0981c021a94ee81cd1f3ceb9f97a

Remove GIL from RRefContext (#32807) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32807 After this commit, RRefContext no longer depends on pybind. Test Plan: Imported from OSS Differential Revision: D19636316 Pulled By: mrshenli fbshipit-source-id: 88faa101c32e9019e979ae8e5da6706e49842726

view details

Edward Yang

commit sha 3d0a470d89e1ca44850b2deadf592be538cf3be2

Rename DispatchKey::UndefinedTensorId to Undefined (#32728) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32728 It doesn't have much to do with tensors anymore. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19628093 Pulled By: ezyang fbshipit-source-id: 4d57111cdf44ba347bec8a32bb5b4b47a83c1eaf

view details

Edward Yang

commit sha 5ddd2cd92b6dab4493a140c41af0823e815f5b41

Make DispatchKeyGuards accept DispatchKey::Undefined (#32729) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32729 When working on the vmap prototype I noticed that this was helpful as it lets me easily initialize a no-op guard, if I need to do it at constructor time (which I usually do, because the guards don't have move constructors). Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19628092 Pulled By: ezyang fbshipit-source-id: d6259a3f70d287cdac2e4a5f3984e2880f19bdc2

view details

Edward Yang

commit sha 690d41f24ec59ed52b81214bebf6d76613bb7ec2

Centralize addition of "always on" dispatch keys. (#32734) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32734 VariableTensorId is the only key with this treatment today, but BackendSelect and CompoundOp are coming soon. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D19628091 Pulled By: ezyang fbshipit-source-id: 250753f90528fa282af7a18d8d2f7736382754bd

view details

Yinghai Lu

commit sha 94ddc2c462bd7c79426057d67b03d4a397db0dd1

Resubmit more code fakefp16 mapping unification (#32798) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32798 ATT Test Plan: unittests Reviewed By: amylittleyang Differential Revision: D19632251 fbshipit-source-id: 670004050d67415bb24392f3520afa32b64ce740

view details

Gaurav Singh

commit sha 765904f1b97bbdda1f4070ba28d7f43e25de16b2

[torch] fd error check Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32797 Differential Revision: D19642262 Pulled By: mrshenli fbshipit-source-id: 1720812166dd583dca6d72cb7e24b65ec013a62b

view details

push time in 2 months

PR closed pytorch/pytorch

[toDLPack] return a unique ptr to avoid memory leak open source

No need to return a shared/raw pointer since the owner (src) already has its own copy. Return a unique_ptr to avoid a memory leak.

+9 -7

0 comment

4 changed files

gaurav1086

pr closed time in 2 months

PR opened pytorch/pytorch

[toDLPack] return a unique ptr to avoid memory leak

No need to return a shared/raw pointer since the owner (src) already has its own copy. Return a unique_ptr to avoid a memory leak.

+9 -7

0 comment

4 changed files

pr created time in 2 months

create branch gaurav1086/pytorch

branch : toDLPack_memory_leak

created branch time in 2 months

PR opened facebook/fbthrift

[java] Remove dup index check

Save one extra comparison

+1 -1

0 comment

1 changed file

pr created time in 2 months

create branch gaurav1086/fbthrift

branch : remove_dup_index_check

created branch time in 2 months

fork gaurav1086/fbthrift

Facebook's branch of Apache Thrift, including a new C++ server.

fork in 2 months

PR closed apple/swift

Remove duplicate expression

<!-- What's in this pull request? --> Replace this paragraph with a description of your changes and rationale. Provide links to external references/discussions if appropriate.

<!-- If this pull request resolves any bugs in the Swift bug tracker, provide a link: --> Resolves SR-NNNN.

<!-- Before merging this pull request, you must run the Swift continuous integration tests. For information about triggering CI builds via @swift-ci, see: https://github.com/apple/swift/blob/master/docs/ContinuousIntegration.md#swift-ci

Thank you for your contribution to Swift! -->

+0 -1

2 comments

1 changed file

gaurav1086

pr closed time in 2 months

pull request comment pytorch/pytorch

[aten] fix vector memory leak

@agolynski, looks like the build failure is unrelated to my change.

gaurav1086

comment created time in 2 months

pull request comment apache/thrift

Propagate exception instead of rethrowing

@allengeorge, thanks for the review. This is a performance fix; there is no Jira ticket. Thanks.

gaurav1086

comment created time in 2 months

pull request comment numpy/numpy

MAINT: Use expm1(x) instead of exp(x) - 1

@seberg, I changed random_pareto() in distribution.c. The legacy change was separate, and I reverted it.

gaurav1086

comment created time in 2 months

PR opened apache/thrift

Propagate exception instead of rethrowing

<!-- Explain the changes in the pull request below: --> Propagate the exception instead of rethrowing it: in the catch block, use throw; instead of throw e;

<!-- We recommend you review the checklist/tips before submitting a pull request. -->

  • [ ] Did you create an Apache Jira ticket? (not required for trivial changes)
  • [ ] If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • [ ] Did you squash your changes to a single commit? (not required, but preferred)
  • [ ] Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • [ ] If your change does not involve any code, add [skip ci] at the end of your pull request to free up build resources.

<!-- The Contributing Guide at: https://github.com/apache/thrift/blob/master/CONTRIBUTING.md has more details and tips for committing properly. -->

+2 -2

0 comment

1 changed file

pr created time in 2 months
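The thrift fix above is C++-specific (throw; rethrows the original exception object, while throw e; copies it and can slice it to the caught static type). Python has an analogous idiom: a bare raise inside an except block propagates the original exception with its traceback untouched, while raise e adds an extra re-raise frame to the traceback. A sketch of the Python analogue (function names here are made up for illustration):

```python
import traceback

def reraise_by_name():
    try:
        raise ValueError("boom")
    except ValueError as e:
        raise e   # re-raising by name adds a traceback entry at this line

def propagate():
    try:
        raise ValueError("boom")
    except ValueError:
        raise     # bare raise: the original traceback passes through untouched

def tb_depth(fn):
    # Count the traceback entries the caller observes.
    try:
        fn()
    except ValueError as exc:
        return len(traceback.extract_tb(exc.__traceback__))

print(tb_depth(reraise_by_name))  # one entry deeper than the bare-raise version
print(tb_depth(propagate))
```

In both languages the short form also avoids naming the exception at all, which is why "propagate instead of rethrow" is the usual guidance.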

create branch gaurav1086/thrift

branch : threadfactory_propagate_exception

created branch time in 2 months

create branch gaurav1086/ceres-solver

branch : LevenbergMarquardtStrategy_simply_expr

created branch time in 2 months

PR opened ceres-solver/ceres-solver

[Covariance]: Check column access beyond row size

Check the column bounds by row size before accessing the column vector

+1 -1

0 comment

1 changed file

pr created time in 2 months

create branch gaurav1086/ceres-solver

branch : access_cols_beyond_row_size

created branch time in 2 months

fork gaurav1086/ceres-solver

A large scale non-linear optimization library

http://ceres-solver.org/

fork in 2 months

PR opened mongodb/mongo

rename_collection: reduce uuid expression

uuid expression reduction

+1 -1

0 comment

1 changed file

pr created time in 2 months

create branch gaurav1086/mongo

branch : rename_collection_simplify_uuid_expr

created branch time in 2 months

fork gaurav1086/mongo

The MongoDB Database

https://www.mongodb.com/

fork in 2 months

PR opened apache/thrift

[compiler] catch exception by ref

<!-- Explain the changes in the pull request below: --> The standard practice for exceptions in C++ is throw by value, catch by reference.

<!-- We recommend you review the checklist/tips before submitting a pull request. -->

  • [ ] Did you create an Apache Jira ticket? (not required for trivial changes)
  • [ ] If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • [ ] Did you squash your changes to a single commit? (not required, but preferred)
  • [ ] Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • [ ] If your change does not involve any code, add [skip ci] at the end of your pull request to free up build resources.

<!-- The Contributing Guide at: https://github.com/apache/thrift/blob/master/CONTRIBUTING.md has more details and tips for committing properly. -->

+3 -3

0 comment

1 changed file

pr created time in 2 months

create branch gaurav1086/thrift

branch : compiler_exception_catch_by_ref

created branch time in 2 months

fork gaurav1086/thrift

Apache Thrift

fork in 2 months

PR closed facebook/hhvm

Reviewers
Assert tmpbuf not null before accessing tmpbuf[0] CLA Signed
+1 -1

0 comment

1 changed file

gaurav1086

pr closed time in 2 months

create branch gaurav1086/rocksdb

branch : user_access_only_simply_expression

created branch time in 2 months

fork gaurav1086/rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.

http://rocksdb.org

fork in 2 months

PR opened pocoproject/poco

[FTPClient] Optimize expression
+1 -1

0 comment

1 changed file

pr created time in 2 months

create branch gaurav1086/poco

branch : FTPClient_simplify_expression

created branch time in 2 months
