profile
Ralf Gommers rgommers Quansight Netherlands http://www.linkedin.com/in/ralfgommers NumPy, SciPy, PyWavelets maintainer. Building open source communities at Quansight Labs. He/him.

pytorch/vision 7295

Datasets, Transforms and Models specific to Computer Vision

nigma/pywt 160

We're moving. Please visit https://github.com/PyWavelets

numfocus/gsod 15

NumFOCUS participation information for Google Season Of Docs

cournape/numscons 14

Using scons within distutils

antonior92/ip-nonlinear-solver 13

A trust-region interior-point method for general nonlinear programming problems (GSoC 2017).

Quansight-Labs/rnumpy 12

An experiment in trying to define a core and cleaned-up NumPy API: RNumPy

BIDS-numpy/numpy-paper 10

Draft of NumPy paper

gpanterov/statsmodels 2

Statsmodels: statistical modeling and econometrics in Python

numfocus/summit2019-scheduling 2

2019 Summit sessions & unconference ideas

astrofrog/oldest-supported-numpy 1

Meta-package providing the oldest supported Numpy for a given Python version and platform

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

The new resolver might make it impossible, as things stand today.

hmm, okay. Just as a data point, it sometimes happens that conda can't solve some complex set of packages, and then the escape hatch is "remove a problematic package, and install that package with pip". Some escape hatch is necessary sometimes. Otherwise I expect to see people go do the "manually unpack the wheel into site-packages" thing.

rgommers

comment created time in 20 hours

issue comment data-apis/consortium-feedback

[RFC] Adopt DLPack as cross-language C ABI stable data structure for array exchange

Thanks @honnibal, that's very useful detail.

Currently we can't communicate with TensorFlow without copying data via the CPU, which I find quite unbelievable.

https://www.tensorflow.org/api_docs/python/tf/experimental/dlpack just landed it seems? The TensorFlow devs participating in this Consortium also seem to be in favour of standardizing on DLPack.

Tried with a few NumPy devs too, they will need some more docs and context. The main pain point is probably complex number support. With CuPy, JAX, TensorFlow and PyTorch all supporting or in the middle of implementing such support, it seems essential to at least have agreement on it being added to DLPack in the future. Only MXNet doesn't have it as far as I can tell (it has "complex (for FFT)" on its 2.0 roadmap though).
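
A minimal sketch of the zero-copy exchange, assuming the experimental TensorFlow API linked above and PyTorch's torch.utils.dlpack:

    import tensorflow as tf
    import torch.utils.dlpack

    # export a TF tensor as a DLPack capsule, then wrap it in PyTorch without a copy
    capsule = tf.experimental.dlpack.to_dlpack(tf.ones((2, 3)))
    t = torch.utils.dlpack.from_dlpack(capsule)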

tqchen

comment created time in a day

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

Ah that helps, thanks for making it concrete @pganssle.

I don't think (1) and (2) can easily happen: this whole stack has standardized on dropping old versions of NumPy (and Python) on a fixed schedule (see NEP 29). The upper bound will be 1-1.5 years in the future rather than "the latest version released today", and should therefore be safe for maintained packages that do at least one release a year. The same will be true for other packages - the version range of supported dependencies is typically at least 3 years' worth of releases.

Also good to keep in mind: this is not the wild west of, e.g., web packages with a gazillion dependencies. numpy has 0 dependencies, scipy 1, scikit-learn 2, astropy 2, pandas 3, etc.

The reverse does not really exist (and would be much harder to get right), so if your end users are experiencing a bug that is only fixed by updating something beyond your upper limit, they are more or less out of luck.

Today that's easy, just pip install -U whatever dependency needs updating. I imagine the new resolver may make that harder, but there must be some way to still force it even after the legacy resolver finally goes away years from now.

rgommers

comment created time in a day

pull request comment Quansight-Labs/quansight-labs-site

Create a new ibis post about the backends.

@rpekrul I unblocked this PR by fixing the build, but there are still comments for @tonyfast to address.

tonyfast

comment created time in a day

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

Nice timing with the tzlocal announcement:) "Major breaking change coming .... If your application doesn't break, you are probably using tzlocal wrong anyway." - so anything without a pin will break, without even a release with a deprecation warning in between.

I think for now you will do more harm than good by placing artificial upper bounds on the software versions

Two pretty knowledgeable people have said that now, so I really would like to understand how true that is. If the lower layers of the SciPy stack (say numpy, scipy, matplotlib, dask, pandas, sklearn, skimage, statsmodels + the next 1-2 layers of domain-specific packages like astropy) did this, would pip really not be able to handle that? These are correct constraints, so the one big reason to not add them would be that the pip/poetry/conda/.... resolvers just can't deal with them even though there are many valid solutions, or solve times become so large that it gets too painful.

rgommers

comment created time in 2 days

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

You don't know that future versions will break you, and in fact they probably won't break you.

The first part is right - we can't know. The second part depends on the API surface of the dependency you're using, and a host of other things. On average though, the accumulated result of deprecations, bug fixes and unintentionally introduced new bugs makes it very likely that a NumPy version released 2-3 years from now will break something in today's SciPy (or Pandas, or Matplotlib, or <insert large package>).

rgommers

comment created time in 2 days

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

I think you should use pip-compile to generate a constraints.txt file and include it in your sdists. Users who are having trouble installing or using numpy can use pip install -c constraints.txt <numpy> to use a known-good configuration.

Yet another thing I've never seen:) Let me look into that.
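
For readers unfamiliar with pip-tools, a sketch of the suggested workflow (requirements.in and requirements.txt are the pip-tools conventions):

    # requirements.in lists the direct dependencies, loosely bounded
    pip-compile requirements.in            # writes a fully pinned requirements.txt
    pip install -c requirements.txt scipy  # use the pins as a known-good constraint set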

rgommers

comment created time in 2 days

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

It might be possible to either work out a deal with setuptools to give a bit more notice between merge and release or to add a release-blocking integration test that builds the latest sdist of numpy, since I (and I'm sure the other setuptools maintainers) don't want to break numpy.

I don't think we need a "deal". In case of real issues, setuptools maintainers have been willing to put in a temporary patch. The situation now, though, is that major backwards-incompatible changes are planned for the distutils integration, and that makes total sense. So there should just be a staggered upgrade: setuptools makes changes, numpy.distutils adds fixes after a while, then bumps the pin.

There's just zero chance that if we release a version of NumPy or SciPy now that doesn't have an upper bound, it will work 2-3 years from now. Which is what I'd like to achieve.

rgommers

comment created time in 2 days

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

I think the "proper" solution is to talk more with upstream setuptools and collaborate with them. Of course, this will need resources and effort directed toward the packaging side of things BUT given the direction we're likely headed with distutils I think it's super important to actually start collaborating to make sure the upcoming changes don't break numpy.distutils's usecases.

We are talking. But breakage is definitely unavoidable. That's fine, setuptools needs to do a big migration, and then NumPy can adapt.

rgommers

comment created time in 2 days

issue comment pytorch/pytorch

torch.utils Cannot find reference 'utils' in '__init__.pyi | __init__.pyi'

Then you have to do import torch.utils and you will get hints for what's in that namespace.

paantya

comment created time in 2 days

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

but I do think that the version ranges of wheel and setuptools should allow for the latest patch/bugfix releases, by using setuptools<51.0.0, wheel <1.0.0 for example.

wheel I trust, it's quite stable so the choice here matters less - but I think it's still not guaranteed to work, so it's probably still better to pin (we've never had a bug in wheel prevent anyone from installing from an sdist AFAIK).

setuptools is making breaking changes regularly, also in bugfix releases, and the interaction with numpy.distutils is quite bad (e.g. numpy itself already pins to <49.2.0). I really don't see the downside in pinning setuptools - we should only let users build with versions that are known to work, and if there's a real bug we didn't test for (e.g. the AIX build is broken) and a newer setuptools has a fix, the workaround is to download the sdist and edit pyproject.toml.
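
A sketch of what such pins could look like in pyproject.toml; the version numbers here are illustrative, not the ones SciPy actually chose:

    [build-system]
    requires = [
        "wheel<0.36.0",
        "setuptools<=51.0.0",
        "pybind11<=2.6.0",
    ]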

rgommers

comment created time in 2 days

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

https://github.com/jazzband/pip-tools is basically the best thing around if you want to have a set of dependencies (requirements.in) converted into a lockfile (requirements.txt).

Thanks @pradyunsg, very helpful.

I do get the impression we're talking about solutions for different problems here:

Lock files are (I think) for developers consuming a package, and putting their own code in production.

Putting bounds on install_requires is for package maintainers releasing a package, it basically encodes what's also in the release notes (e.g., "this release works with Python 3.6-3.8 and NumPy xxxx"). If maintainers don't do that and users try to install some version a few years after release, it won't work - they will get the wrong (too recent) version of a dependency, and then have to somehow figure out what version was the last one to work. And when that happens with multiple dependencies, it quickly becomes impossible to install an old version.

rgommers

comment created time in 2 days

issue comment scipy/oldest-supported-numpy

also publish sdist

Excellent, thanks @astrofrog

tacaswell

comment created time in 3 days

pull request comment scipy/scipy

WIP: REL: put upper bounds on versions of dependencies

@pganssle suggested on Twitter to use a lock file instead of an upper bound on the version of a dependency. I'll need to look into that. One obvious concern is that a lock file is not standardized, there's pip, poetry, pipenv, conda, flit, and a host of other (including not-yet-written) package management tools.

Using a lock file isn't mentioned in the pip docs AFAICT. And pipenv and poetry have different lock files (see, e.g., https://pipenv.pypa.io/en/latest/basics/#pipenv-lock). It's unclear to me how this would help, and if it even works with pip at all. It also seems to be at version 0.0.2 (https://pypi.org/project/pipfile/) right now, which screams "don't use me yet".

The pip docs do talk about repeatability, and in the first example leave figuring out the version of a dependency that is needed up to the user, which of course makes very little sense. To be clear, what we're trying to achieve is a user typing pip install scipy and having it work (right after the release, and years later), not some development workflow.

As far as I can tell, one should indeed use a range of versions for install_requires, and set the limits of that range so that (a) things are actually going to work, also 2-5 years later, and (b) the range is not so narrow that a new release of a dependency low down in the stack (like numpy or scipy) is immediately incompatible with everything above it, forcing everyone to cut a new release of their downstream package.
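
A sketch of such a range for a hypothetical downstream package; the bounds are illustrative, picked roughly per the NEP 29 window:

    from setuptools import setup

    setup(
        name="somepkg",  # hypothetical package
        version="1.0.0",
        install_requires=[
            # lower bound: oldest release known to work;
            # upper bound: ~1.5 years of future releases
            "numpy>=1.16.5,<1.23.0",
            "scipy>=1.4.0,<1.8.0",
        ],
    )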

rgommers

comment created time in 3 days

pull request comment scipy/scipy

REL: put upper bounds on versions of dependencies

A few comments:

  • Some of these changes are right for release branches, but not for master. We need to be able to test unreleased Python/NumPy/... versions in CI, and devs will want to do this too locally sometimes. So instead we need a procedure to apply changes only after creating a release branch.
  • @pganssle suggested on Twitter to use a lock file instead of an upper bound on the version of a dependency. I'll need to look into that. One obvious concern is that a lock file is not standardized, there's pip, poetry, pipenv, conda, flit, and a host of other (including not-yet-written) package management tools.
  • Note that this PR is for SciPy only, after it's reviewed I'll generalize it into advice for other packages in the NumPy/SciPy stack. That should also include a bit more explanation, and things like oldest-supported-numpy.

I'll add WIP, will ping when it's ready for review. Good ideas already very welcome of course - this is a hairy topic.

rgommers

comment created time in 4 days

issue closed scipy/scipy

Build requirements are exact version pins?!

Hello, it appears that pyproject.toml has this in the build requirements:

    "numpy==1.14.5; python_version=='3.6' and platform_system!='AIX'",
    "numpy==1.14.5; python_version=='3.7' and platform_system!='AIX'",
    "numpy==1.17.3; python_version>='3.8' and platform_system!='AIX'",
    "numpy==1.16.0; python_version=='3.6' and platform_system=='AIX'",
    "numpy==1.16.0; python_version=='3.7' and platform_system=='AIX'",
    "numpy==1.17.3; python_version>='3.8' and platform_system=='AIX'",

However this will yield horrible build times if you have a different numpy version installed. Perhaps what was meant there is a minimum version constraint (e.g. >=)?

closed time in 4 days

ionelmc

issue comment scipy/scipy

Build requirements are exact version pins?!

The install requirements do not have any upper constraint for numpy, and in some cases it's not even present in that list. One of these is wrong. Either the build reqs should have min constraints, or the install requirements have upper/lower constraints (if ABI compatibility is really a problem).

Yes, the install requirements need a significant update including upper bounds, I'm just doing that in gh-12862.

Build-time requirements in pyproject.toml are correct.

However this will yield horrible build times if you have a different numpy version installed.

We're in the world of isolated builds now, installed numpy versions will not be used for a pip install scipy or pip install . build.

There's nothing left to do here other than finish up gh-12862, so I'll close this issue. If you have more questions or comments, feel free to continue the conversation of course.
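
For completeness, the escape hatch if you do want the build to use an already-installed numpy is to disable build isolation, in which case the pyproject.toml pins are ignored and you provide the build dependencies yourself. A sketch:

    # run from an unpacked scipy sdist or git checkout; numpy, Cython and
    # pybind11 must already be installed in the environment
    pip install . --no-build-isolation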

ionelmc

comment created time in 4 days

push event rgommers/scipy

Tyler Reddy

commit sha a78b344bc7e4c5d4f7374aadc33796ece1eacf41

MAINT: simplify directed_hausdorff
* simplify the `directed_hausdorff()` implementation by removing a variable, two assignment operations, and a net reduction in comparison operations by 1
* tests appear to continue passing after these simplifications; because we handle more corner cases and also the indices on top of the distance, we are still slightly more complex than the published pseudo-code algorithm cited in the docstring


Gregory Lee

commit sha a58327c7dfb87e5b88144b1b4830594ad811d7b4

BUG: avoid boolean or integer addition errors in ndimage.measurements.median
closes gh-12836


Tyler Reddy

commit sha 27ac968691b3aa677571d8e950cffd8404dfac58

Merge pull request #12846 from rgommers/bump-min-numpy
MAINT: update minimum NumPy version to 1.16.5


Matt Haberland

commit sha 08b0b8d870eddd1cbe4d94037dcc76fcc1af0271

ENH: Adds quadratic_assignment with two methods (#12775)
Co-authored-by: Ali Saad-Eldin <ali.saadeldin11@gmail.com>


Gregory Lee

commit sha fa4228a56a0d19c5ff9ea4157c84029fcd3d952a

pep8 fix


Gregory Lee

commit sha 8bd9db8f82212b03e9e3d14f7ed45159a26d857a

change np.bool to bool to avoid warning


Peter Larsen

commit sha 8c98e840a6dfd1b05ca762d3f555596d9f0afde0

ENH: disjoint set data structure (#12600)
* initial commit of disjoint set
* corrected docs
* input validation
* better example docs
* removed input validation
* added disjoint set to sparse.csgraph
* reordered doc sections
* removed license string
* added DisjointSet to toctree
* added binary tree test
* better testing
* better docstring
* docstring improvements
* renamed `nc` to `n_components`
* explain range of element indices in docstring
* renamed `n` to `n_elements`
* renamed elements -> nodes; removed initialization parameter
* pure python implementation
* fixed pure python implementation
* added tests for non-contiguous node indices
* removed disjoint set
* typo in docstring
* fixed example in documentation
* implement `find` using path-halving
* add comment about path halving; remove redundant line
* removed tie-breaking logic
* test parametrization
* disjoint set now takes arbitrary immutable objects
* replaced `find` method with `__getitem__`
* use dict lookup to deal with numpy comparisons
* remove guarantee about tie-breaking
* updated example in docstring
* renamed "immutable object" to "hashable object"
* removed unused `shuffle` parameter
* removed ticks around True and False
* renamed `dis` to `disjoint_set` in docstring example
* renamed `find` to `__getitem__`
* renamed `union` to `merge`
* renamed `a` and `b` to `x` and `y`
* added `connected` method
* added `__iter__` method
* typo
* added tie-breaking back in using insertion order
* added docstring description of node insertion
* added benchmark
* restrict nodes in `__iter__` to those present at time of instantiation
* added `__contains__` method
* removed implicit creation of nodes
* renamed nodes->elements and components->subsets for consistency
* update "Methods" entry in docstring
* removed `__iter__` from methods docstring
* added test for __len__
* consistent naming
* fixed timing issue
* more benchmark cases
* split `test_contains` in two
* moved DisjointSet to _lib module
* removed DisjointSet from sparse.csgraph docs
* moved DisjointSet into cluster.hierarchy; added toctree entry
* comment for _indices variable
* nitpicks
* reverted
* removed `See Also` from docs
* corrected import in docstring
* typo
* fixed bug in `add` method
* added test for reverse linear union sequence
* fixed docstring
* corrected comment about use of OrderedDict
* pep8


Kristian Eschenburg

commit sha b8c8e0f050eb4b7cebd70375ea2d6b6aac9c1534

check if column in spearman rho is entirely NaN or Inf (#12460)
make spearman rho handle NaN appropriately
* check if either column in spearman rho is entirely NaN or Inf
* TST: stats: test spearmanr nan_policy='omit' fix
* TST: stats: fixups per @rlucas7 comments in gh-12460
* Update scipy/stats/tests/test_stats.py
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>


Tyler Reddy

commit sha dda87ae6cea67284d26f427c0dbc1f74a36cd17f

Merge pull request #12845 from grlee77/ndimage_measurements_int_fix
BUG: avoid boolean or integer addition error in ndimage.measurements.median


Simon Segerblom Rex

commit sha 0cb7ab309e291e431fc919c1fa8a6809aefd93fe

BUG: Use uint16 for cost in NI_WatershedElement (#12842)
Co-authored-by: Simon Segerblom Rex <simonsm@axis.com>


Seth Troisi

commit sha 914523af3bc03fe7bf61f621363fca27e97ca1d6

MAINT: Replace np.max with np.maximum (#12698)
fixes #12696


Tyler Reddy

commit sha 445147c2326c88f98a600652d40553560d5efb1e

Merge pull request #12822 from tylerjereddy/directed_hausdorff_simplify
MAINT: simplify directed_hausdorff


Mike Taves

commit sha 4c268986664ce3ed83ec47f1c103890a580f0458

DOC: use :doi: and :arxiv: directives for references


Seth Troisi

commit sha 8e30f7797bd1ee442f4f1a25172e4402521c1e16

Merge pull request #12858 from mwtoews/doi
DOC: use :doi: and :arxiv: directives for references


Ralf Gommers

commit sha 11be8879a53aafc8feb5f02c7e3dd68ae8f6bc68

BLD: pin setuptools and wheel to latest released versions as upper bound
Rationale: this is known to work, and we expect future releases of these packages to break the build.


Ralf Gommers

commit sha d52a90e052002760e2b2f96336d2f79fdf20b781

BLD: add an upper limit to supported Python versions.
Normally we'd want 3.8 here, but in this case we know that the Python 3.9 release will come before the next SciPy release.


Ralf Gommers

commit sha 446050522608dd332f9fbd1bdee904a96d18353e

REL: put upper bound on NumPy version range we support
Rationale: future releases of NumPy are likely to break an already released version of SciPy, so really we should put an upper bound. In this case one more than the already released NumPy versions (we're at 1.19.2 now, and support all 1.20.x) to avoid the next NumPy release making SciPy un-installable.


Ralf Gommers

commit sha 6b285592e503fdcfb385085ed9e87cc68b0ea8cd

BLD: also pin pybind11


Ralf Gommers

commit sha 71ef72ef5697e30e6bd00cf68290fcbe2085b745

DOC: update INSTALL.rst.txt for versions of dependencies


Ralf Gommers

commit sha b792d3b0294503527f370fab2e4e87c521400d86

MAINT: update numpy version in __init__.py


push time in 4 days

issue comment pypa/pip

PEP 518 build requirements cannot be overriden by user

Stepping back from the specific request of overriding build dependencies, the problem presented in the top post can be avoided by adding additional logic to how build dependencies are chosen. When a package specifies numpy (for example) as a build dependency, pip can choose freely any version of numpy. Right now it chooses the latest simply because it’s the default logic. But we can instead condition the logic to prefer matching the run-time environment if possible instead, which would keep the spirit of build isolation, while at the same time solve the build/run-time ABI mismatch problem.

+1 this is a healthy idea in general, and I don't see serious downsides.

Note that for numpy specifically, we try to teach people good habits, and there's a package oldest-supported-numpy that people can depend on in pyproject.toml. But many people new to shipping a package on PyPI won't be aware of that.
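
What that habit looks like in a downstream package's pyproject.toml, sketched for illustration:

    [build-system]
    requires = [
        "setuptools",
        "wheel",
        "oldest-supported-numpy",  # resolves to the oldest numpy wheel for each Python/platform
    ]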

ghost

comment created time in 4 days

pull request comment conda-forge/scipy-feedstock

Try cross compiling

@isuruf yes, happy to review/accept that. There's regular questions about cross-compiling. Distutils isn't really built for that and NumPy devs don't really have a need, so no one has spent a lot of effort on it. But if there's something like this we can do to make your life easier, then that sounds like a good idea.

isuruf

comment created time in 4 days

PR opened scipy/scipy

REL: put upper bounds on versions of dependencies
Labels: Build issues, Official binaries

This is a set of changes related to how we deal with build-time and run-time dependencies:

  1. put upper bound on NumPy version range we support

    Rationale: future releases of NumPy are likely to break an already released version of SciPy, so really we should put an upper bound. In this case one more than the already released NumPy versions (we're at 1.19.2 now, and support all 1.20.x) to avoid the next NumPy release making SciPy un-installable.

    To demonstrate the issue, try pip install scipy==1.1.0 --no-binary :all:. This will fail because scipy 1.1.0 only has a >= specifier for numpy, so it'll try to build with the latest numpy, and that will have some backwards-incompatible changes (the number of those, i.e. deprecated and then removed features, will grow over time).

  2. add an upper limit to supported Python versions.

    Normally we'd want 3.8 here, but in this case we know that the Python 3.9 release will come before the next SciPy release.

  3. pin setuptools and wheel to latest released versions as upper bound

    Rationale: this is known to work, and we expect future releases of these packages to break the build.

Cc @melissawm

+10 -10

0 comments

3 changed files

pr created time in 5 days

create branch rgommers/scipy

branch : dependency-ranges

created branch time in 5 days

issue opened numpy/numpy

update PyPI classifiers to not install with unsupported Python versions

Right now if you do pip-3.9 install numpy, it will try to build NumPy for a Python version it does not support yet (as stated in the release notes, and in numpy/__init__.py). We should see if we can fix the PyPI classifiers for this.
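
A sketch of the fix; python_requires is the metadata field pip actually checks before installing, and the upper bound shown is illustrative:

    from setuptools import setup

    setup(
        name="numpy",  # illustrative snippet, not the real setup.py
        python_requires=">=3.6,<3.9",  # makes pip-3.9 refuse this release
    )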

created time in 5 days

pull request comment scipy/scipy.org

DOC: citing.rst: add "The NumPy Array: A Structure for Efficient Numerical Computation" (2020)

Thanks @westurner, help with updating citation info is much appreciated. Yes, this should have the full author list. The ideal outcome here is to put the citation info into the same format as the Scipy Nature Methods paper, and remove the CISE paper and the NumPy book that are now listed.

westurner

comment created time in 5 days

push event numpy/numpy.org

Ralf Gommers

commit sha 05b9201f13e73c0db32f7212afbbb7dfa5985752

New translations cricket-analytics.md (Portuguese, Brazilian)


push time in 5 days

push event numpy/numpy.org

Ralf Gommers

commit sha a043d611a93c19f8e3f86f6c3c49ebb75d746a40

New translations cricket-analytics.md (Portuguese, Brazilian)


push time in 5 days

pull request comment pytorch/rfcs

Adding fill value property to PyTorch sparse tensors

Also, as mentioned, I'd like to get a better understanding for how we treat sparse tensors without a fill value. Should we consider them masked tensors and let functions have their own interpretations of that mask?

It may be good to answer this by prioritizing a good rewrite of the docs (https://github.com/pytorch/pytorch/issues/44635) that answers this, rather than go into detail on this PR. That has to be done anyway (for 1.7.0), and doing it first will make this RFC easier to fit into the picture.

pearu

comment created time in 5 days

pull request comment numpy/numpy

NEP: Regenerate table in NEP 29 (add numpy 1.18 and 1.19 to list)

Let's get this in, follow-up for * very welcome.

Carreau

comment created time in 6 days

push event numpy/numpy

Matthias Bussonnier

commit sha c6853c7c27bb8352ab498848439d4fee9eb79a33

NEP: Regenerate table in NEP 29 (add numpy 1.18 and 1.19 to list) (#17337)
* Regenerate table for NEP 29. Numpy 1.18 and 1.19 have been released, which add two new recommended stop dates for numpy support in 2021 and 2022.
* Infer next version as max-known-minor-version+1. Do not print the first 4 entries to be dropped, as they are pre-NEP-29.

view details

push time in 6 days

PR merged numpy/numpy

NEP: Regenerate table in NEP 29 (add numpy 1.18 and 1.19 to list)
Labels: 03 - Maintenance, component: NEP

Numpy 1.18 and 1.19 have been released, which add two new recommended stop dates for numpy support in 2021 and 2022.

+17 -7

2 comments

1 changed file

Carreau

pr closed time in 6 days

pull request comment numpy/numpy

NEP: Regenerate table in NEP 29 (add numpy 1.18 and 1.19 to list)

I'm thinking of adding a * to the version numbers that have not been released yet, like:

That sounds like a good idea to me.

Carreau

comment created time in 6 days

pull request comment Quansight-Labs/quansight-labs-site

Add a blog post on the design of versioned-hdf5

Post 2 was published, so aiming for next Tue/Wed for this one?

asmeurer

comment created time in 6 days

push event Quansight-Labs/quansight-labs-site

Melissa Weber Mendonça

commit sha ee425a01d33c78b2737f1ee2283e3cacd848b0a8

Added post about performance of the Versioned HDF5 library. (#118)

view details

push time in 6 days

PR merged Quansight-Labs/quansight-labs-site

Added post about performance of the Versioned HDF5 library.
Labels: content

I think this still needs a review, especially the images (sizing and position).

+134 -0

13 comments

16 changed files

melissawm

pr closed time in 6 days


pull request comment scipy/scipy.org

Fix BibTeX items

Thanks, LGTM now, merged.

mwtoews

comment created time in 6 days

push event scipy/scipy.org

Mike Taves

commit sha e8cdc3aa5132fdaf32ba7a870bb983288ae53634

Fix BibTeX items


Ralf Gommers

commit sha f95beae94c4cba4ffb5e3d66bec6c5a4f7ef28c8

Merge pull request #367 from mwtoews/bib
Fix BibTeX items


push time in 6 days

PR merged scipy/scipy.org

Fix BibTeX items

This PR does a few things:

  • More visually appealing line wrapping
  • Fix Vand erPlas -> VanderPlas and (I think?) {Jarrod Millman}, K. -> Millman, K. Jarrod
  • More common BibTeX style (e.g. don't need {/} for each surname, don't need ~ between first name initials)
  • Remove https://doi.org/ from doi entry
+21 -21

0 comments

1 changed file

mwtoews

pr closed time in 6 days

Pull request review comment numpy/numpy

NEP: Regenerate table in NEP 29 (add numpy 1.18 and 1.19 to list)

     Jun 23, 2020 3.7+   1.15+
     Jul 23, 2020 3.7+   1.16+
     Jan 13, 2021 3.7+   1.17+
     Jul 26, 2021 3.7+   1.18+
    -Dec 26, 2021 3.8+   1.18+
    +Dec 22, 2021 3.7+   1.19+
    +Dec 26, 2021 3.8+   1.19+
    +Jun 21, 2022 3.8+   1.18+

This and the next line should be 1.20+ right?

Carreau

comment created time in 6 days


pull request comment numpy/numpy

BLD: enabled negation of library choices in NPY_*_ORDER

Okay, in it goes:) Thanks @zerothi!

zerothi

comment created time in 6 days

push event numpy/numpy

Nick R. Papior

commit sha 233c63a56974de22b846ac989cef1fabe45e7296

BLD: enabled negation of library choices in NPY_*_ORDER (#17219)
When users build for a particular order it may be beneficial to disallow certain libraries. In particular a user may not care about which accelerated BLAS library is used, so long as the NetLIB or ATLAS library isn't used. This is now possible with NPY_BLAS_ORDER='^blas,atlas' or NPY_BLAS_ORDER='!blas,atlas'.
Since we may envision more BLAS/LAPACK libraries to the pool, this will provide greater flexibility as they enter. A new (local) method is added in system_info.py which removes duplicate code and allows for easier usage across libraries.


push time in 6 days

PR merged numpy/numpy

BLD: enabled negation of library choices in NPY_*_ORDER
Labels: 01 - Enhancement, component: numpy.distutils

When users build for a particular order it may be beneficial to disallow certain libraries.

In particular a user may not care about which accelerated BLAS library is used, so long as the NetLIB library isn't used.

This is now possible with:

NPY_BLAS_ORDER='^blas'

or

NPY_BLAS_ORDER='!blas'

Since we may envision more BLAS/LAPACK libraries to the pool, this will provide greater flexibility as they enter.

A new (local) method is added in system_info.py which removes duplicate code and allows for easier usage across libraries.


+145 -34

10 comments

4 changed files

zerothi

pr closed time in 6 days

Pull request review comment scipy/scipy.org

Fix BibTeX items

     Here's an example of a BibTeX entry: ::

          @ARTICLE{2020SciPy-NMeth,
    -           author = {{Virtanen}, Pauli and {Gommers}, Ralf and {Oliphant},
    -             Travis E. and {Haberland}, Matt and {Reddy}, Tyler and
    -             {Cournapeau}, David and {Burovski}, Evgeni and {Peterson}, Pearu
    -             and {Weckesser}, Warren and {Bright}, Jonathan and {van der Walt},
    -             St{\'e}fan J.  and {Brett}, Matthew and {Wilson}, Joshua and
    -             {Jarrod Millman}, K.  and {Mayorov}, Nikolay and {Nelson}, Andrew
    -             R.~J. and {Jones}, Eric and {Kern}, Robert and {Larson}, Eric and
    -             {Carey}, CJ and {Polat}, {\.I}lhan and {Feng}, Yu and {Moore},
    -             Eric W. and {Vand erPlas}, Jake and {Laxalde}, Denis and
    -             {Perktold}, Josef and {Cimrman}, Robert and {Henriksen}, Ian and
    -             {Quintero}, E.~A. and {Harris}, Charles R and {Archibald}, Anne M.
    -             and {Ribeiro}, Ant{\^o}nio H. and {Pedregosa}, Fabian and
    -             {van Mulbregt}, Paul and {SciPy 1.0 Contributors}},
    -            title = "{{SciPy} 1.0: Fundamental Algorithms for Scientific
    -                      Computing in Python}",
    -          journal = {Nature Methods},
    -          year = {2020},
    -          volume={17},
    -          pages={261--272},
    -          adsurl = {https://rdcu.be/b08Wh},
    -          doi = {https://doi.org/10.1038/s41592-019-0686-2},
    +      author  = {Virtanen, Pauli and Gommers, Ralf and Oliphant, Travis E. and
    +                Haberland, Matt and Reddy, Tyler and Cournapeau, David and
    +                Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and
    +                Bright, Jonathan and {van der Walt}, St{\'e}fan J. and
    +                Brett, Matthew and Wilson, Joshua and Millman, K. Jarrod and
    +                Mayorov, Nikolay and Nelson, Andrew R. J. and Jones, Eric and
    +                Kern, Robert and Larson, Eric and Carey, C J and
    +                Polat, {\.I}lhan and Feng, Yu and Moore, Eric W. and
    +                VanderPlas, Jake and Laxalde, Denis and Perktold, Josef and

Shouldn't VanderPlas keep {} around it to not lose the capital P?

mwtoews

comment created time in 6 days


pull request comment numpy/numpy.org

make it clear this is a news item on the news page

Thanks @mattip. I'm happy with either the old or the new here. @InessaPawson and @joelachance, are you fine with making this change?

old version:

[screenshot]

version with this PR:

[screenshot]

mattip

comment created time in 7 days

issue comment numpy/numpy

add a canonical way to determine if dtype is integer, floating point or complex

Deprecating issubsctype sounds like a good idea to me - but then maybe all the other sctype ones as well?

And hide some of the other weird stuff like issubclass_ towards the bottom of the page with some disclaimer, if we don't want to deprecate it?

rgommers

comment created time in 7 days

pull request comment numpy/numpy

DOC: improve `issubdtype` and scalar type docs

The improved issubdtype docstring with examples looks much better, thanks @eric-wieser

eric-wieser

comment created time in 7 days

Pull request review comment numpy/numpy

DOC: improve `issubdtype` and scalar type docs

     def issubdtype(arg1, arg2):
         See Also
         --------
         issubsctype, issubclass_
    -    numpy.core.numerictypes : Overview of numpy type hierarchy.
    +    arrays.scalars : Overview of numpy type hierarchy.

         Examples
         --------
    -    >>> np.issubdtype('S1', np.string_)
    +    `issubdtype` can be used to check the type of arrays:
    +
    +    >>> ints = np.array([1, 2, 3], dtype=np.int32)
    +    >>> np.issubdtype(ints.dtype, np.integer)
    +    True
    +    >>> np.issubdtype(ints.dtype, np.floating)
    +    False
    +
    +    >>> floats = np.array([1, 2, 3], dtype=np.float32)
    +    >>> np.issubdtype(floats.dtype, np.integer)
    +    False
    +    >>> np.issubdtype(floats.dtype, np.floating)
         True
    +
    +    Similar types of different sizes are not subdtypes of each other:
    +
         >>> np.issubdtype(np.float64, np.float32)
         False
    +    >>> np.issubdtype(np.float32, np.float64)
    +    False
    +
    +    but both are subtypes of `floating`:
    +
    +    >>> np.issubdtype(np.float64, np.floating)
    +    True
    +    >>> np.issubdtype(np.float32, np.floating)
    +    True
    +
    +    For convinience, dtype-like objects are allowed too:

minor: typo here

eric-wieser

comment created time in 7 days


Pull request review comment numpy/numpy

DOC: improve `issubdtype` and scalar type docs

     def issubdtype(arg1, arg2):
         See Also
         --------
         issubsctype, issubclass_
    -    numpy.core.numerictypes : Overview of numpy type hierarchy.
    +    arrays.scalars : Overview of numpy type hierarchy.

This links to code objects. numpy.core.numerictypes is a thing, arrays.scalars is not. You'll need :ref: I think.

eric-wieser

comment created time in 7 days


issue comment numpy/numpy

Increased output when running f2py (possibly due to SIMD detection code)

Fortran: I don't think so. It's used much less than C/C++/Cython for new code, and it's hard enough to maintain f2py and Fortran compiler support as is.

Cython: not sure, what would that take? Doesn't seem like a priority anyway, let's focus on regular C code and performance-critical parts like ufunc loops first.

melissawm

comment created time in 7 days

issue comment numpy/numpy

add a canonical way to determine if dtype is integer, floating point or complex

Oh yes, that doesn't help, more weird and badly documented functions:(

In [13]: z = np.arange(3, dtype=np.complex64)                              

In [14]: np.iscomplex(z)                                                   
Out[14]: array([False, False, False])

In [15]: np.iscomplexobj(z)                                                
Out[15]: True

In [16]: np.iscomplexobj(z.dtype)   # iscomplexobj docstring: "Check for a complex type ...."                        
Out[16]: False
rgommers

comment created time in 7 days

issue comment numpy/numpy

add a canonical way to determine if dtype is integer, floating point or complex

In my mind, we already have canonical spellings for these and they are

There's a number of issues with that:

  • It's not documented as far as I can tell, I'd expect it in https://numpy.org/devdocs/reference/arrays.dtypes.html or on https://numpy.org/devdocs/reference/routines.dtype.html.
  • If I check NumPy and SciPy both methods I mentioned are used
  • If one reads the docstring for np.floating, all it says is "Abstract base class of all floating-point scalar types". So someone familiar with Python would probably try to use isinstance first.
  • The issubdtype docstring says "Returns True if first argument is a typecode lower/equal in type hierarchy." which won't make sense to many users, and it's not even clear why one would prefer that over issubsctype given the one-line descriptions.
  • Users that want to know "is this an array of floats or integers?" shouldn't have to understand the details of NumPy's dtype hierarchy.

I'm not too interested in arguing about the semantics of "canonical", so let me just say: there is no good way of doing this currently - should we add is_floating, is_complex and is_integer or something similarly named as a sane way of doing this?
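
For concreteness, a minimal sketch of what such helpers could look like; the names are the proposal, not an existing NumPy API:

    import numpy as np

    def is_floating(x):
        # proposed helper (name not final); True for all float dtypes
        return np.issubdtype(np.asarray(x).dtype, np.floating)

    def is_integer(x):
        return np.issubdtype(np.asarray(x).dtype, np.integer)

    def is_complex(x):
        return np.issubdtype(np.asarray(x).dtype, np.complexfloating)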

rgommers

comment created time in 7 days

issue opened numpy/numpy

add a canonical way to determine if dtype is integer, floating point or complex

There is currently no good way (AFAIK) to figure out if the dtype of an array is integer, floating point or complex. Right now one of these is the most common probably:

x.dtype.kind in np.typecodes["AllFloat"]
np.issubdtype(x.dtype, np.floating)

Both are pretty awful.

A naive way to write code in the absence of something like is_floating_point/is_integer/is_complex would be:

x.dtype in (np.float16, np.float32, np.float64)

The trouble is that we have extended precision dtypes, and only one of float96 or float128 will actually exist (the other one will raise an AttributeError, also annoying and a frequent source of bugs).
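
The usual defensive workaround, sketched here for illustration, is a hasattr dance:

    import numpy as np

    # keep only the float dtypes that actually exist on this platform
    float_dtypes = tuple(
        getattr(np, name)
        for name in ("float16", "float32", "float64", "float96", "float128")
        if hasattr(np, name)
    )
    np.ones(3).dtype in float_dtypes  # True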

Adding a set of functions is_floating_point/is_integer/is_complex (whether with or without an underscore in the name, or naming it floating or floating_point) seems like a good idea to me.

In other libraries: TensorFlow doesn't seem to have any API for this, PyTorch has is_floating_point and is_complex.

Thoughts?

created time in 7 days

issue opened pytorch/pytorch

make torch.all and torch.any support all dtypes

🚀 Feature

Motivation

Current torch.all/torch.any and Tensor.all, Tensor.any only support bool/uint8. This is inconvenient when trying to write numpy-compatible code like

def somefunc(x):
    # x can be a numpy.ndarray or torch.Tensor instance
    if x.all():
        ...

It is also inconsistent with torch.logical_and et al., which work with all integer and float dtypes just fine.

Note that torch.all/any aren't documented as public API because they don't support all dtypes (see https://github.com/pytorch/pytorch/issues/7539#issuecomment-388659871), that'd be nice to fix too.

Pitch

  1. Add support for all dtypes to torch.any/all.
  2. Document torch.all and torch.any as public API at https://pytorch.org/docs/master/torch.html#comparison-ops

Alternatives

The builtins all and any do work:

>>> t = torch.tensor([1, 2], dtype=torch.float64)                          
>>> all(t)                                                                 
True
>>> torch.all(t)                                                           
Traceback (most recent call last):
  File "<ipython-input-78-1c35bb87b542>", line 1, in <module>
    torch.all(t)
RuntimeError: all only supports torch.uint8 and torch.bool dtypes

however, they are orders of magnitude slower:

>>> t = torch.ones(10000, dtype=torch.bool)                                
>>> %timeit all(t)                                                         
10.9 ms ± 54.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit torch.all(t)                                                   
8.74 µs ± 57.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Casting to bool first before using torch.any/all is the other alternative, but that's pretty annoying.
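
Spelled out, that cast workaround is (a sketch):

    import torch

    t = torch.tensor([1, 2], dtype=torch.float64)
    torch.all(t.to(torch.bool))  # tensor(True), at torch.all speed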

created time in 7 days


Pull request review comment pytorch/builder

Mutex

     package:
       name: cpuonly
       version: 1.0
    -build:
    -  track_features:
    -      - cpuonly
    -  noarch: generic
    +requirements:
    +  run_constrained:
    +    - pytorch-proc * cpu
    +
    +outputs:
    +  # A meta-package to select CPU or GPU build for faiss.

This comment is a little confusing - why specifically for faiss? I assume it's for pytorch and possibly other packages too?

scopatz

comment created time in 7 days


issue comment numpy/numpy

DOC: NumPy distutils documentation is focused on SciPy

This is a fairly hopeless case. numpy.distutils is going to change significantly in the coming 1-2 years (because of distutils being merged into setuptools), so I would not spend any effort on touching the docs for it now.

melissawm

comment created time in 7 days

Pull request review comment pytorch/rfcs

PyTorch Sparse Tensors: fill-value property

[quoted diff: the draft RFC "PyTorch Sparse Tensors: `fill_value` property" by Pearu Peterson - abstract, motivation, and proposal points 0-12 covering terminology, a `fill_value` keyword argument for sparse tensor constructors, the default fill value of zero, fill values of hybrid tensors, element-wise functions, matmul with nonzero fill values, `to_sparse(fill_value=...)`, and autograd. The passage the comment below responds to ends with:]

    Sparse tensors with defined fill value have intrinsic constraints
    between all the unspecified tensor elements (these are always
    equal) that must be taken into account when implementing Autograd
    backward methods for functions that receive sparse tensors as
    inputs.

    Sparse tensors with indefinite fill value don't have the intrinsic
    constraints as discussed above.

Does autograd even make sense in that case? I suspect not, in which case just deleting this sentence would be good.
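For the defined-fill-value case the constraint is easy to make concrete with a tiny dense emulation (made-up values, not the RFC's API): all unspecified elements share the single fill value, so its gradient accumulates one contribution per unspecified position.

```python
import torch

# A 4-element "sparse" tensor emulated densely: index 1 is specified,
# the other three positions share the fill value 2.0.
v = torch.tensor(5.0, requires_grad=True)     # the one specified value
fill = torch.tensor(2.0, requires_grad=True)  # the shared fill value
dense = fill.expand(4).clone()
dense[1] = v
dense.exp().sum().backward()

print(v.grad)     # exp(5.) -- a single contribution
print(fill.grad)  # 3 * exp(2.) -- one contribution per unspecified element
```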

pearu

comment created time in 7 days

Pull request review commentpytorch/rfcs

PyTorch Sparse Tensors: fill-value property

[diff context, excerpt]

### Future extensions and existing issues

Introducing the fill value feature according to this proposal does not
require addressing the following extensions and issues. These are
given here as suggestions to clean up the PyTorch sparse tensor
support in general.

13. For the Graph domain, the indefinite fill value can be specified as a
    tensor with zero dimension(s) that satisfies all the relations
    listed above except point 5. Invalidation of the point 5 will
    provide a consistent way to differentiate between defined and
    indefinite fill values.

14. The introduction of a nonzero fill value feature encourages a
    revisit of the existing PyTorch tensor API.

    1. The acronym NNZ means the "Number of nonzeros" in a sparse
       tensor. In PyTorch, this acronym is used in several places:

       - in the implementations of sparse tensor and related
         functionality,
       - in the `repr` output of COO sparse tensor, see
         [here](https://pytorch.org/docs/master/generated/torch.sparse_coo_tensor.html#torch.sparse_coo_tensor),
       - as a private method `_nnz()`, see
         [here](https://pytorch.org/docs/master/sparse.html?highlight=nnz#torch.sparse.FloatTensor._nnz),
       - as a optional keyword argument `check_sparse_nnz` in
         [torch.autograd.gradcheck](https://pytorch.org/docs/master/autograd.html#numerical-gradient-checking).

       The acronym NNZ is misused in PyTorch:

       - `nnz` holds the value of the "Number of Specified Elements"
         (NSE) in a sparse tensor
       - `nnz` is not always equal to the number of zeros in the
         sparse tensor, for instance, the `values` of the sparse
         tensor in COO format may contain zero values that are not
         accounted in `nnz`

       With the introduction of nonzero fill values, the misuse of
       acronym NNZ will get worse because with nonzero fill value the
       sparse tensor may have no zero elements, e.g. `torch.full((10,
       10), 1.0, layout=torch.sparse_coo)` for which `nnz` would be
       `0`.

       Recommendation: stop the misuse of NNZ acronym via

       - replace the usage of "NNZ" with "NSE",
       - deprecate the use of `_nnz()` in favor of `_nse()`,
       - remove `_nnz()` starting from PyTorch 2.0.

       Alternative: Do nothing.  This is the (undocumented) approach
       taken in Wolfram Language where [one can use "NonzeroValues" to
       determine the number of specified
       elements](https://community.wolfram.com/groups/-/m/t/1168496)
       even when the fill value specified is nonzero.

    2. The `torch` namespace functions `arange`, `range`, `linspace`,
       and `logspace` have `layout` argument that is not needed.

       Currently, PyTorch defines three layouts: `strided`,
       `sparce_coo`, and `_mkldnn`. Because the mentioned functions

Typo in `sparce_coo`, should be `sparse_coo`.
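On the NNZ point quoted above, a two-line demonstration of the miscount (made-up values; `_nnz()` is the existing private method):

```python
import torch

# An explicitly stored zero still counts towards "nnz", because nnz
# really counts specified elements (NSE), not nonzeros:
s = torch.sparse_coo_tensor([[0, 1]], [0.0, 3.0], (4,))
print(s._nnz())  # 2, although only one element is nonzero
```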

pearu

comment created time in 7 days

Pull request review commentpytorch/rfcs

PyTorch Sparse Tensors: fill-value property

[diff context, excerpt]

8.  If `A` is a sparse tensor and `f` is any calculus function that is
    applied to a tensor element-wise, then:

    ```python
    f(A) == torch.sparse_coo_tensor(A.indices(), f(A.values()), fill_value=f(A._fill_value()))
    ```

    Note that if `A` would be using COO storage format then this
    relation holds only if `A` is coalesced (`A.values()` would throw
    an exception otherwise).

Unrelated to the proposal, but it reminds me: this is pretty annoying behavior. Calling `.coalesce()` when needed would be much more user-friendly. For example, this makes little sense:

```python
In [13]: s
Out[13]:
tensor(indices=tensor([[0, 3]]),
       values=tensor([[1, 2],
                      [3, 4]]),
       size=(4, 2), nnz=2, layout=torch.sparse_coo)

In [14]: s.values()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-671cfa0cc2a0> in <module>
----> 1 s.values()

RuntimeError: Cannot get values on an uncoalesced tensor, please call .coalesce() first
```
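For comparison, the explicit workaround (which `.values()` could arguably just perform internally) returns the values without complaint, continuing the session above:

```python
In [15]: s.coalesce().values()
Out[15]:
tensor([[1, 2],
        [3, 4]])
```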
pearu

comment created time in 7 days

Pull request review commentpytorch/rfcs

PyTorch Sparse Tensors: fill-value property

[diff context, excerpt]

9.  The fill value of an element-wise n-ary operation on sparse
    tensors with different fill values is equal to the result of
    applying the operation to the fill values of the sparse
    tensors.

    For instance (`*` represents unspecified element),

    ```
    A = [[1, *], [3, 4]]                  # A fill value is 2
    B = [[*, 6], [*, 8]]                  # B fill value is 7
    A + B = [[1 + 3, 2 + 4], [*, 6 + 8]]  # A + B fill value is 2 + 7 = 9
    ```

Many numbers in `[[1 + 3, 2 + 4], [*, 6 + 8]]` seem wrong. For example, for the first element: `A` has value 1 and the corresponding element of `B` is unspecified, so the second summand should be `B`'s fill value 7 rather than 3.
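If I read the proposed semantics correctly, an element of the result is specified wherever it is specified in at least one operand, so the example would presumably become:

```
A = [[1, *], [3, 4]]                      # A fill value is 2
B = [[*, 6], [*, 8]]                      # B fill value is 7
A + B = [[1 + 7, 2 + 6], [3 + 7, 4 + 8]]  # A + B fill value is 2 + 7 = 9
```

(every element of `A + B` ends up specified here, since each position is specified in at least one of the operands).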

pearu

comment created time in 7 days

Pull request review commentpytorch/rfcs

PyTorch Sparse Tensors: fill-value property

[diff context, excerpt]

    2. Update the related functions to handle nonzero fill values of
       input sparse tensors correctly.

       For instance, consider a matrix multiplication of two sparse
       tensors `A` and `B` with fill values `a` and `b`, respectively,
       then the `matmul` operation can be expanded as follows:

       ```python
       matmul(A, B) = matmul(A - fA + fA, B - fB + fB)
                    = matmul(A - fA, B - fB) + fA * matmul(ones_like(A), B) + fB * matmul(A, ones_like(B))
       ```

       where the first term can be computed using existing matmul for
       sparse tensors with zero fill value, and the last two terms can
       be replaced with a computation of a single row or column of the
       corresponding matrix products that has reduced computational
       complexity.

It would be good to add a note to this point that it may not make sense to actually spend effort on this for all functions; I can imagine that it could be nontrivial for some linalg functions.
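Also, the expansion itself may need a second look. A quick sanity check with dense stand-ins and made-up shapes (a sketch, not the proposed sparse implementation) suggests the two correction terms each contain the constant `fA * fB` block, so it gets counted twice:

```python
import torch

# Dense stand-ins for sparse A and B with scalar fill values fA and fB;
# A0 and B0 play the role of "A - fA" and "B - fB".
torch.manual_seed(0)
A0, B0 = torch.randn(3, 4), torch.randn(4, 5)
fA, fB = 2.0, 7.0
A, B = A0 + fA, B0 + fB

# Expansion as written in the proposal:
expansion = A0 @ B0 + fA * (torch.ones_like(A) @ B) + fB * (A @ torch.ones_like(B))
# Subtracting the doubly-counted constant block once restores the identity:
corrected = expansion - fA * fB * (torch.ones_like(A) @ torch.ones_like(B))

print(torch.allclose(expansion, A @ B))  # False
print(torch.allclose(corrected, A @ B))  # True
```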

pearu

comment created time in 7 days

Pull request review commentpytorch/rfcs

PyTorch Sparse Tensors: fill-value property

[Review hunk quoting the RFC's Abstract, Motivation, and Proposal sections; it ends at the line the comment below refers to:]

6.  The fill value specified that is specified in the `fill_value`

minor: typo here, delete the first "specified"

pearu

comment created time in 7 days

Pull request review commentpytorch/rfcs

PyTorch Sparse Tensors: fill-value property

[Review hunk repeating the quote above and continuing with point 6 of the proposal:]

6.  The fill value specified that is specified in the `fill_value`
    argument of the sparse tensor constructors may have a different
    (smaller) shape than the fill value of the hybrid tensor (as
    defined by point 5).

    The specified fill value (after converting it to a `torch.Tensor`
    instance) can be acquired via the `_fill_value()` method.

    For example, the fill value of a (1+1)-D hybrid tensor can be
    specified as a scalar:

    ```python
    A = torch.sparse_coo_tensor(indices=[[0, 3]],
                                values=[[.11, .12], [.31, .32]],
                                size=(4, 2),
                                fill_value=1.2)
    A.fill_value() -> torch.tensor([1.2, 1.2])
    A._fill_value() -> torch.tensor(1.2)
    ```

    The output of `fill_value()` is computed as

    ```python
    A._fill_value().resize(A.values().shape[1:])
    ```

    Storing the specified fill value instead of the fill value of the

This point is a little unclear. The "specified fill value" is `fill_value=1.2` in the example above, but what is "the fill value of the hybrid tensor", and why does storing 1.2 reduce memory consumption?

Or is this related to Hameer's point about the fill value being broadcastable? If so, I understand - but the text is cryptic.
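If broadcasting is indeed the intent, here is a minimal PyTorch illustration of the memory argument (my reading of the proposal, not code from the RFC): the specified scalar is stored as-is, and expanding it to the dense-part shape only creates a view.

```python
# expand() returns a view, so the "full" fill value costs no extra storage.
import torch

spec = torch.tensor(1.2)       # what the user passed as fill_value
full = spec.expand(2)          # dense-part shape of the (1+1)-D example, (2,)

print(full)                    # tensor([1.2000, 1.2000])
print(full.storage().size())   # 1 -> still backed by a single element
```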

pearu

comment created time in 7 days

PullRequestReviewEvent
PullRequestReviewEvent

issue commentregro/conda-metachannel

metachannel is not accessible or is invalid

Would it make sense to officially end-of-life conda-metachannel?

This package was featured quite prominently in Anaconda's blogs on performance,

  • https://www.anaconda.com/blog/understanding-and-improving-condas-performance
  • https://www.anaconda.com/blog/how-we-made-conda-faster-4-7

and is mentioned in the conda-forge FAQ under "Installing and updating takes a long time, what can I do?".

But the package has gone unmaintained (the last commit was 14 months ago), and the metachannel service has been down for months.

I'd suggest this is doing users a disservice. Conda still has a lot of performance problems, so users will keep finding those blogs and that FAQ entry. I'd rather they see "try using Mamba instead of Conda" as the FAQ answer.

allefeld

comment created time in 8 days

pull request commentscipy/scipy

DEP: remove setup_requires

A couple of comments left, and there's a merge conflict as well now. @kousu could you update this? I'd like to get it merged.

kousu

comment created time in 8 days

PR closed numpy/numpy

WIP: CI: try building with setuptools 50.0 31 - Third-party binaries

Reverts gh-16993

See also https://github.com/scipy/scipy/pull/12798

+6 -6

11 comments

5 changed files

rgommers

pr closed time in 8 days

pull request commentnumpy/numpy

WIP: CI: try building with setuptools 50.0

This is no longer needed; we're staying with the 49.x pin. So closing this PR.

rgommers

comment created time in 8 days

delete branch rgommers/dask

delete branch : add-arrayfuncs-to-docs

delete time in 8 days

pull request commentnumpy/numpy.org

deploy via gh-pages

@mattip, thanks for asking - yes there is. Copying over the exact deploy method we use for the numpy devdocs to this repo, so we can deploy to the numpy.github.com repo.

mattip

comment created time in 8 days

PR opened dask/dask

Add a number of missing ufuncs to dask.array docs

I noticed `power` was missing (which also caused at least one broken cross-link within the docs), and while adding it I noticed that a lot more functions from `ufunc.py` were missing. It may still not be 100% complete - verifying that would require adding a CI test that the documented functions match what's in the namespace (e.g. with `refguide_check.py`) - but it's a lot closer than before.
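Such a CI check could be fairly small; a hedged sketch of the idea (the file path and regex are illustrative assumptions, not dask's actual test):

```python
# Compare the names documented in the array API page against the public
# names in the dask.array namespace, and report anything undocumented.
import re
import dask.array as da

def documented_names(rst_path="docs/source/array-api.rst"):
    # autosummary entries look like "   dask.array.power"
    with open(rst_path) as f:
        text = f.read()
    return {m.group(1) for m in re.finditer(r"dask\.array\.(\w+)", text)}

public_names = {name for name in dir(da) if not name.startswith("_")}
missing = sorted(public_names - documented_names())
assert not missing, f"undocumented functions: {missing}"
```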

+19 -0

0 comment

1 changed file

pr created time in 8 days

push eventrgommers/dask

James Bourbeau

commit sha 5ad732dcec47bae2821898eb64787538df37d6e2

Fix from_sequence typo in delayed best practices (#5045)

view details

tshatrov

commit sha b26771d90f49f3b08821bf89e4a38b338914a487

Fix cumulative functions on tables with more than 1 partition (#5034) Fixes #5024

view details

asmith26

commit sha 41acaa6db7371004e6e08933c8ae554cedfcab94

TST: check dtypes unchanged when using DataFrame.assign

* Add test to check dtypes unchanged when using assign (https://github.com/dask/dask/issues/3907)

view details

Tom Augspurger

commit sha 140a27b57dfe0c3a5a694e04c3c0cd07412d56eb

CI: Environment creation overhaul (#5038) Clean up environment creation

view details

David Brochart

commit sha 4815850923e1e6eaa7466d8279053ba877971f45

Changes vizualize to visualize (#5061)

view details

Sean McKenna

commit sha 188930f24ce317ffba643ab669a22136226cb98e

Point to latest K8s setup article in JupyterHub docs (#5065)

view details

GALI PREM SAGAR

commit sha ddcfd80c11a2b4b9be022947ae980dc42556a90d

Remove hard dependency on pandas in get_dummies (#5057) This enables support with pandas-like libraries, like cudf.
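A hedged sketch of the pattern this commit describes (simplified; dask's actual code differs): resolve the DataFrame library from the data itself instead of importing pandas directly.

```python
# Look up the module (pandas, cudf, ...) that produced the collection's
# meta object, so library-specific functions can be dispatched to it.
import importlib

def library_of(data):
    """Return the module that ``data._meta`` comes from."""
    package = type(data._meta).__module__.split(".")[0]
    return importlib.import_module(package)

# get_dummies can then call library_of(df).get_dummies on each partition
# instead of hard-coding pandas.get_dummies.
```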

view details

Ralf Gommers

commit sha 2f55cf432ab4334e7ed6b03e62dc169e4208b33e

Change __array_function__ implementation for backwards compatibility (#5043)

* Add regression tests for #5031 (__array_function__ min/amin alias)
* Deal with numpy aliases and numpy functions not in dask. See #5031 for discussion. Note that decorating all functions is not yet complete; this is enough to show the tests in `test_numpy_compat.py` passing.
* Clean up the `@implements` decorator; addresses review comment on gh-5043
* Better handling of args, kwargs for __array_function__; addresses review comment on gh-5043. This is a lot less code, and it handles the case of, e.g., a dask array of cupy arrays correctly (this didn't work before, so it isn't needed for backwards compat, but it's nice to make it work).
* Add @implements to all functions in reductions.py and routines.py
* Add @implements to support __array_function__ for fft functions
* Add @implements to support __array_function__ for linalg functions
* Add @implements to dask/array/core functions
* Change the __array_function__ test from a check for TypeError to a check for a warning
* Fix some issues with array_ufunc tests; isreal/iscomplex/real/imag aren't ufuncs
* Add argmin/argmax/nanargmin/nanargmax to the __array_function__ dict; can't do it in chunk.py, as that would result in a circular dependency
* Add __array_function__ handling for functions in creation.py
* Use the @implements decorator as a function to clean up some __array_function__ registration
* Make the warning message in __array_function__ more explicit, change to FutureWarning
* Fix last test failure - FutureWarning only happens if NEP18 is enabled
* Update __array_function__ to only use the dict mapping for aliases and name mismatches
* Change the warning message for __array_function__ to use map_blocks.
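For context, a minimal sketch of the NEP-18 `@implements` registration pattern this commit refers to (names simplified; dask's real registry lives in `dask/array`):

```python
import numpy as np

HANDLED_FUNCTIONS = {}

def implements(*numpy_functions):
    """Register an implementation for one or more numpy functions."""
    def decorator(func):
        for np_func in numpy_functions:
            HANDLED_FUNCTIONS[np_func] = func
        return func
    return decorator

class DuckArray:
    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.amin, np.min)   # one registration covers numpy aliases
def amin(a, axis=None, keepdims=False):
    ...                        # duck-array implementation goes here
```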

view details

Matthew Rocklin

commit sha 36db9e9a6de8ab5173375a54097a4341f6a440bf

Add recompute= keyword to svd_compressed for lower-memory use (#5041) The svd_compressed algorithm needs to use the input array a few times. Often this array is too large to fit into memory, so it is better to recompute it. This PR adds a keyword, ``recompute=``, which optionally triggers this behavior. It is turned off by default. We also fix up the docstring a bit and one of the keys in the high level graph. This also introduces a dask.distributed-agnostic wait function that waits on persisted data if we're using the distributed scheduler, but does nothing otherwise. With this change we're able to run arbitrarily large svd computations in low memory.
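A hedged sketch of what such a scheduler-agnostic wait could look like (the actual helper in dask may be named and structured differently):

```python
def _wait_if_distributed(x):
    # Wait on persisted data when a distributed client is active;
    # with the local schedulers there is nothing to wait on.
    try:
        from distributed import wait
        from distributed.client import default_client
        default_client()          # raises ValueError if no client is active
    except (ImportError, ValueError):
        return
    wait(x)
```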

view details

Willi Rath

commit sha ce253bad0c222f054a53c56356504dc443704071

Ensure scalar arrays are not rendered to SVG (#5058)

* Add test for scalar array SVG rendering
* Ensure empty chunks are not drawn
* Move test to html repr
* Add error test for scalars and comment test
* Clarify and test for svg-rendering exceptions
* Test for scalar as well
* Fix typo

view details

Peter Andreas Entschev

commit sha bc82231f07c6c48b131a19003eacbd29e28b7912

Fix compute_meta recursion in blockwise (#5048)

view details

James Bourbeau

commit sha ea59c670c5d2b96bd63a51ac795d79faf6449741

Fixes upstream dev CI build installation (#5072)

view details

James Bourbeau

commit sha 381457a945031510184f4ef61affb3dc6b8d6139

Removes index keyword from pandas to_parquet call (#5075)

view details

James Bourbeau

commit sha ed33fbe6ec47e361d1f6f45b84acfe0a98e511ca

bump version to 2.1.0

view details

Brett Naul

commit sha e0d14f7a9a39b01c6615f394970d6ed268e7f2d2

Fix pd.MultiIndex size estimate (#5066)

view details

Matthew Rocklin

commit sha 9de17fc850dfba4b1f27743a6edcc2463c463e2c

Use da.from_array(..., asarray=False) if input follows NEP-18 (#5074)

* Use da.from_array(..., asarray=False) if the input implements __array_function__. Previously we called `np.asarray` on each chunk in `da.from_array`. This can be troublesome if we have arrays like `cupy` or `sparse` that are capable of performing computations on their own. Now we look for an `__array_function__` method on the input as a signal that it can handle numpy-like functions. If it has this method then we don't call `np.asarray`.
* List True/False/None options explicitly
* Clean up test_unregistered_func
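The detection logic described here is small; a hedged sketch (simplified relative to `dask.array.core.from_array`):

```python
def _asarray_default(x, asarray=None):
    # NEP-18 arrays (cupy, sparse, ...) can execute numpy-like functions
    # themselves, so their chunks should not be coerced with np.asarray.
    if asarray is None:
        asarray = not hasattr(x, "__array_function__")
    return asarray
```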

view details

James Bourbeau

commit sha 3eb2dc89c9a70dcb828e86d043e23b751ab0d0fd

Adds NumFOCUS badge to the README (#5086)

view details

Jim Crist

commit sha c792364d1528c9c688a66c4d116e5f3b49776bb8

Update developer docs [ci skip] (#5093) Unfortunately `psutil` doesn't package wheels, and is a requirement for `distributed` (which is pulled in automatically). We add it to the develop instructions, and remove `mock`, which is no longer needed for python 3.

view details

Natalya Rapstine

commit sha 395a72cc9206b91a83a36ec77c2698c608d5c878

Fixing issue #2982 (#5099)

* fix doc issue 2982
* fix doc issue 2982
* Update dask/dataframe/core.py Co-Authored-By: James Bourbeau <jrbourbeau@users.noreply.github.com>
* Update dask/dataframe/core.py Co-Authored-By: James Bourbeau <jrbourbeau@users.noreply.github.com>

view details

GALI PREM SAGAR

commit sha 4b2dbd4469423cf313ec4e9fc6967b606493a3b1

[REVIEW] Generalizing has_known_categories (#5090)

* make get_dummies compatible with cudf
* picking the package name from the data._meta attribute
* removing length check as it won't be necessary
* generalizing has_known_categories
* change from cat to categories as both Categorical Indices have this attribute in common

view details

push time in 8 days

create barnchrgommers/dask

branch : add-arrayfuncs-to-docs

created branch time in 8 days

push eventnumpy/numpy.org

Ralf Gommers

commit sha 14e66c790c7cc1e8dde56e62d25cbb1f0bd93260

New translations news.md (Japanese)

view details

push time in 8 days

push eventnumpy/numpy.org

Ralf Gommers

commit sha d6af00e0741a796a879bb05d43e384d651f3ecbf

New translations news.md (Spanish)

view details

push time in 8 days

push eventnumpy/numpy.org

Ralf Gommers

commit sha 64faba67c9ae94f0c0499851d4166fa54f8adf29

New translations news.md (Chinese Simplified)

view details

push time in 8 days

push eventnumpy/numpy.org

Ralf Gommers

commit sha 6e222e53105ba0ed198d6cab623216b413fe8ea1

New translations news.md (Portuguese, Brazilian)

view details

push time in 8 days

push eventnumpy/numpy.org

Ralf Gommers

commit sha 71540114b3d6714f6f3da6275adc8a56ef96b604

New translations news.md (Korean)

view details

push time in 8 days

pull request commentnumpy/numpy.org

deploy via gh-pages

This is really hacky, but until the above is done, use:

```
hugo
cd public
cp -R . ../../tmp/numpy.github.com/
git commit -a
git push origin master
```
mattip

comment created time in 8 days

push eventnumpy/numpy.github.com

Ralf Gommers

commit sha 25420151ade08230278a61e9bf72098da8232381

Deploy latest master with Python 3.9 and NumPy 1.19.2 news

view details

push time in 8 days

pull request commentnumpy/numpy.org

NEWS: add two news items

In it goes!

mattip

comment created time in 8 days

push eventnumpy/numpy.org

Matti Picus

commit sha e2fc8d5cab511bc8d4d0587f769a47b7494d6155

NEWS: add two news items (#358)

view details

push time in 8 days

PR merged numpy/numpy.org

NEWS: add two news items content

Add the latest 1.19.2 release and "prepare for python 3.9" news items

+25 -1

2 comments

1 changed file

mattip

pr closed time in 8 days

push eventmattip/numpy.org

Ralf Gommers

commit sha 5d393e1a0683afba2594a916387e1b696938c4f0

Fix conda-forge name

view details

push time in 8 days

PullRequestReviewEvent

pull request commentnumpy/numpy.org

NEWS: add two news items

use --only-binary=numpy or --only-binary=:all: to prevent pip from trying to build from source.

This hints at a bug in our PyPI classifiers, I think. Not much we can do about it though, because we can't change the classifiers for already-released versions.

mattip

comment created time in 8 days

issue commentnumpy/numpy.org

Announce numpy releases here as well

Are you interested in the rc releases or only the stable ones?

Only in the stable releases I'd say. RCs are more a mailing list thing.

mattip

comment created time in 8 days

pull request commentscipy/scipy

ENH: add solver for minimum weight full bipartite matching

Ah okay. If one scipy submodule already depends on the other one, it's clear where to add tests. If not, then scipy/_lib/tests/ is probably the best place.

fuglede

comment created time in 9 days

pull request commentnumpy/numpy

DOC: add new glossary terms

Just discussed in the docs meeting, merged as is since it's in good shape. @mattip will open up a follow-up PR for finalizing the glossary content.

Thanks @mattip and @bjnath for sorting out this whole glossary thing!

mattip

comment created time in 9 days

push eventnumpy/numpy

Matti Picus

commit sha 4cdd3606160de923fb4054cf93f4ea02a356def0

DOC: add new glossary terms (#17263)

* DOC: add new glossary terms
* DOC: link to python Ellipsis
* DOC: fixes from review
* DOC: fixes from review
* DOC: remove glossary items that belong to python

view details

push time in 9 days

PR merged numpy/numpy

DOC: add new glossary terms 04 - Documentation component: Documentation

This PR contains only the parts of gh-16996 that add new glossary terms. Adding a checklist so we can follow which ones have been reviewed and accepted. Edit this comment with ~...~ to mark those that were rejected. Hopefully this list reflects the PR faithfully.

  • [ ] ('n',)
  • [ ] -1
  • [ ] ...
  • [ ] :
  • [ ] <
  • [ ] >
  • [ ] advanced indexing
  • [ ] array scalar
  • [ ] axis
  • [ ] base
  • [ ] copy
  • [ ] dimension
  • [ ] dtype
  • [ ] fancy indexing
  • [ ] object array
  • [ ] ravel
  • [ ] stride
  • [ ] structured array
+262 -20

8 comments

2 changed files

mattip

pr closed time in 9 days
