If you are wondering where the data of this site comes from, please visit https://api.github.com/users/changhiskhan/events. GitMemory does not store any data, but only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

changhiskhan/poseidon 63

Python CLI for Digital Ocean API v2

changhiskhan/classes 6

notebooks for classes

changhiskhan/d3 2

A JavaScript visualization library for HTML and SVG.

changhiskhan/numpy 2

Numpy main repository

changhiskhan/dotfiles 1

shell stuff

changhiskhan/ipython 1

Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

changhiskhan/MarkdownPresenter 1

For when you're giving a presentation in half an hour, and you haven't got the time to open up keynote...

changhiskhan/notebooks 1

Misc Jupyter notebooks

changhiskhan/nvd3 1

Reusable charts and chart components for d3.js.

issue opened pandas-dev/pandas

BUG: MonthEnd(), April 31, 2020 is not a date

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

import pandas as pd
from pandas.tseries.offsets import MonthEnd

pd.Timestamp('2020-05-01') + MonthEnd(0)
>>> Timestamp('2020-05-31 00:00:00')

Problem description

There are not 31 days in April... Or am I missing an April Fools joke?

I have tried this code above in multiple environments, however I keep getting April 31, 2020.

There is no error, I just think this is a bug and wanted to report it.

Expected Output

Timestamp('2020-05-30 00:00:00')

Output of pd.show_versions()

<details>

INSTALLED VERSIONS

commit : db08276bc116c438d3fdee492026f8223584c477
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 20.3.0
Version : Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.1.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 53.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
</details>

created time in 7 minutes
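
For readers cross-checking the report above: the sample starts from 2020-05-01, so the result falls in May, which has 31 days, and 2020-05-31 is a valid date. Below is a minimal stdlib sketch, assuming `MonthEnd(0)` rolls a date forward to the last day of its own month. `roll_to_month_end` is a hypothetical helper written for this illustration, not a pandas function.

```python
import calendar
from datetime import date


def roll_to_month_end(d: date) -> date:
    """Return the last day of d's month (d itself if it is already that day)."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=last_day)


print(roll_to_month_end(date(2020, 5, 1)))  # 2020-05-31
print(roll_to_month_end(date(2020, 4, 1)))  # 2020-04-30
```

Under that assumption, a start date in April would be needed to land on April 30.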

push event pandas-dev/pandas

Marco Gorelli

commit sha 9da3ca23850332e52d19250abbdcb7c482dc5508

enable branch coverage (#40142)

push time in an hour

pull request comment pandas-dev/pandas

CI, TST enable branch coverage

thanks @MarcoGorelli

MarcoGorelli

comment created time in an hour

PR merged pandas-dev/pandas

CI, TST enable branch coverage
Labels: CI
  • [x] Ensure all linting tests pass, see here for how to run them

As suggested on Gitter - this could help identify more untested parts of the codebase and hence uncover bugs

+1 -1

3 comments

1 changed file

MarcoGorelli

pr closed time in an hour

Pull request review comment pandas-dev/pandas

DOC: 1.2.3 release date

 .. _whatsnew_123:
-What's new in 1.2.3 (March ??, 2021)
+What's new in 1.2.3 (March 01, 2021)

-> 2

simonjayhawkins

comment created time in an hour

push event pandas-dev/pandas

jbrockmendel

commit sha cbbaf20b34feab95fecba2daeb329947a703173d

TYP: to_arrays, BUG: from_records empty dtypes (#40121)

push time in an hour

PR merged pandas-dev/pandas

TYP: to_arrays, BUG: from_records empty dtypes
Labels: Dtypes, Reshaping
  1. The argument annotation for columns in to_arrays isn't quite right. This is fixed by adding an ensure_index inside DataFrame.__init__.
  2. Call ensure_index in the empty case in to_arrays so we can tighten from Tuple[Any, Any] to Tuple[Any, Index].
  3. In the structured array case in to_arrays, return a list of empty ndarrays instead of a list of empty lists, so we can further tighten to Tuple[List[ArrayLike], Index].
  4. This breaks test_from_records_empty_with_nonempty_fields_gh3682, at which point we decide that the new behavior is better, so we are calling it a bugfix.
+21 -13

2 comments

5 changed files

jbrockmendel

pr closed time in an hour

push event pandas-dev/pandas

jbrockmendel

commit sha 80b3e8d35b493a557996bbe85dd8640c4e15a0d4

REF: share recarray constructor code (#40129)

push time in an hour

PR merged pandas-dev/pandas

REF: share recarray constructor code
Labels: Constructors, Refactor
  • [ ] closes #xxxx
  • [ ] tests added / passed
  • [ ] Ensure all linting tests pass, see here for how to run them
  • [ ] whatsnew entry
+32 -19

1 comment

3 changed files

jbrockmendel

pr closed time in an hour

Pull request review comment pandas-dev/pandas

REF: move Block.astype implementation to array_algos

+import inspect
+
+import numpy as np
+
+from pandas._typing import (
+    ArrayLike,
+    DtypeObj,
+)
+
+from pandas.core.dtypes.cast import (
+    astype_dt64_to_dt64tz,
+    astype_nansafe,
+)
+from pandas.core.dtypes.common import (
+    is_datetime64_dtype,
+    is_datetime64tz_dtype,
+    is_dtype_equal,
+    pandas_dtype,
+)
+from pandas.core.dtypes.dtypes import ExtensionDtype
+
+from pandas.core.arrays import ExtensionArray
+
+
+def astype_array(values: ArrayLike, dtype: DtypeObj, copy: bool = False):
+    """
+    Cast array to the new dtype.
+
+    Parameters
+    ----------
+    values : ndarray or ExtensionArray
+    dtype : dtype object
+    copy : bool, default False
+        copy if indicated
+
+    Returns
+    -------
+    ndarray or ExtensionArray
+    """
+    if (
+        values.dtype.kind in ["m", "M"]
+        and dtype.kind in ["i", "u"]
+        and isinstance(dtype, np.dtype)
+        and dtype.itemsize != 8
+    ):
+        # TODO(2.0) remove special case once deprecation on DTA/TDA is enforced
+        msg = rf"cannot astype a datetimelike from [{values.dtype}] to [{dtype}]"
+        raise TypeError(msg)
+
+    if is_datetime64tz_dtype(dtype) and is_datetime64_dtype(values.dtype):
+        return astype_dt64_to_dt64tz(values, dtype, copy, via_utc=True)
+
+    if is_dtype_equal(values.dtype, dtype):
+        if copy:
+            return values.copy()
+        return values
+
+    if isinstance(values, ExtensionArray):
+        values = values.astype(dtype, copy=copy)
+
+    else:
+        values = astype_nansafe(values, dtype, copy=copy)
+
+    # now in ObjectBlock._maybe_coerce_values(cls, values):
+    if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):
+        values = np.array(values, dtype=object)
+
+    return values
+
+
+def astype_array_safe(values, dtype, copy: bool = False, errors: str = "raise"):

if we move the pandas_dtype call up into the caller we can do it once instead of (block|array)-wise

Then also the dtype validation needs to be moved up (requiring some more changes to avoid duplication of that part). At the moment, it's a rather straightforward cut and paste from blocks.py to cast.py, so I would maybe prefer to keep it that way for this PR.

jorisvandenbossche

comment created time in an hour
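
The exchange above is about running argument validation once in the caller instead of once per block or array. Below is a minimal sketch of that idea in plain Python, reusing the two checks and error messages that this PR moves out of `Block.astype`. The names `validate_astype_args`, `astype_all` and `ExtensionDtypeStandIn` are hypothetical and exist only for this illustration; they are not pandas APIs, and handling of `errors="ignore"` is left out.

```python
import inspect

ERRORS_LEGAL_VALUES = ("raise", "ignore")


class ExtensionDtypeStandIn:
    """Hypothetical stand-in for pandas' ExtensionDtype base class."""


def validate_astype_args(dtype, errors: str = "raise") -> None:
    """Run the per-call checks once, before any per-array work starts."""
    if errors not in ERRORS_LEGAL_VALUES:
        raise ValueError(
            "Expected value of kwarg 'errors' to be one of "
            f"{list(ERRORS_LEGAL_VALUES)}. Supplied value is '{errors}'"
        )
    if inspect.isclass(dtype) and issubclass(dtype, ExtensionDtypeStandIn):
        raise TypeError(
            f"Expected an instance of {dtype.__name__}, "
            "but got the class instead. Try instantiating 'dtype'."
        )


def astype_all(columns, dtype, errors: str = "raise"):
    """Hypothetical caller: validate once, then convert every column."""
    validate_astype_args(dtype, errors)  # hoisted out of the per-column loop
    return [[dtype(value) for value in column] for column in columns]


print(astype_all([["1", "2"], ["3"]], int))  # [[1, 2], [3]]
```

The trade-off raised in the thread still applies: hoisting the checks means the per-array function can no longer assume it validates its own inputs.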

Pull request review comment pandas-dev/pandas

WIP: ENH: Support ZoneInfo timezones

     timezones,
     tzconversion,
 )
+from pandas.compat import PY39
 
 from pandas import (
     Timestamp,
     date_range,
 )
 import pandas._testing as tm
 
+if PY39:
+    from zoneinfo import ZoneInfo

there's also a backport i think

AlexKirko

comment created time in an hour
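
The backport mentioned in the comment above could be picked up with a fallback import rather than a version flag. This is a minimal sketch, assuming the third-party `backports.zoneinfo` package is what the reviewer has in mind; `HAS_ZONEINFO` is a hypothetical flag name, not something defined in pandas.

```python
import sys

try:
    from zoneinfo import ZoneInfo  # standard library on Python 3.9+
except ImportError:
    try:
        # Assumed third-party backport for older interpreters
        from backports.zoneinfo import ZoneInfo
    except ImportError:
        ZoneInfo = None  # neither implementation is available

# Hypothetical flag that tests could use to skip ZoneInfo-specific cases
HAS_ZONEINFO = ZoneInfo is not None

print(sys.version_info[:2], HAS_ZONEINFO)
```

Compared with the `PY39` check in the diff, this form also enables the ZoneInfo code path on older interpreters where the backport happens to be installed.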

pull request comment pydata/pandas-datareader

Fix AV, see issue #741

Codecov Report

Merging #856 (617ea59) into master (90f155a) will not change coverage. The diff coverage is 0.00%.

@@           Coverage Diff           @@
##           master     #856   +/-   ##
=======================================
  Coverage   64.03%   64.03%           
=======================================
  Files          65       65           
  Lines        2906     2906           
  Branches      311      311           
=======================================
  Hits         1861     1861           
  Misses        969      969           
  Partials       76       76           
Impacted Files                         Coverage Δ
pandas_datareader/av/time_series.py    40.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 90f155a...617ea59.

Lambdac0re

comment created time in an hour
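
The figures in the coverage report above can be checked by hand. A short sketch follows, assuming the reported percentage is hits divided by lines and truncated (not rounded) to two decimals, which would be consistent with the 64.03% shown; the variable names are chosen for this illustration only.

```python
import math

# Figures copied from the coverage report above
hits, misses, partials = 1861, 969, 76
lines = 2906

# Every line is counted as exactly one of hit, miss, or partial
assert hits + misses + partials == lines

ratio = hits / lines
truncated = math.floor(ratio * 10_000) / 100  # cut off after two decimals
rounded = round(ratio * 100, 2)               # conventional rounding

print(truncated)  # 64.03
print(rounded)    # 64.04
```

Because the PR changes one line (+1 -1) in a file whose changed line is not covered, hits and lines stay the same and the overall figure does not move.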

issue comment pandas-dev/pandas

CI: failing numpy_dev 3.8 build

agree. see https://github.com/pandas-dev/pandas/pull/40143#issuecomment-787999262. will start release tomorrow regardless.

jreback

comment created time in 2 hours

pull request comment pandas-dev/pandas

TYP: to_arrays, BUG: from_records empty dtypes

whatsnew added + greenish

jbrockmendel

comment created time in 2 hours

pull request comment pandas-dev/pandas

PERF: extract_array -> _values

Should we then also use consistently pd_array in tests and remove the line

Another option would be to define it as pd_array in core.construction and import it into the pd namespace as array.

jbrockmendel

comment created time in 2 hours

Pull request review comment pandas-dev/pandas

REF: move Block.astype implementation to array_algos

+def astype_array(values: ArrayLike, dtype: DtypeObj, copy: bool = False):

Moved to just below astype_nansafe

jorisvandenbossche

comment created time in 2 hours

pull request comment pandas-dev/pandas

PERF: extract_array -> _values

Should we then also use consistently pd_array in tests and remove the line

https://github.com/pandas-dev/pandas/blob/95a86a9884e6e0a32aeffff345e39434bbb722e5/scripts/check_for_inconsistent_pandas_namespace.py#L32

? cc @jorisvandenbossche as you've commented on pd_array previously

jbrockmendel

comment created time in 2 hours

PR opened pandas-dev/pandas

PERF: extract_array -> _values

Found that grepping for uses of pd.array is a PITA, so went through and changed them all (inside core) to pd_array. @MarcoGorelli would it be feasible to make a code check for this?

+20 -19

0 comments

7 changed files

pr created time in 2 hours
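
The PR description above asks whether a code check for `pd.array` would be feasible. Below is a minimal sketch of one way such a check could look, using only the standard-library `ast` module. The function name `find_pd_array_uses` and the sample source are hypothetical, and the sketch assumes pandas is always imported under the alias `pd`.

```python
import ast
from typing import List


def find_pd_array_uses(source: str) -> List[int]:
    """Return the line numbers where the attribute access `pd.array` appears."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and node.attr == "array"
            and isinstance(node.value, ast.Name)
            and node.value.id == "pd"
        ):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = (
    "import pandas as pd\n"
    "x = pd.array([1, 2])\n"   # flagged
    "y = pd_array([3])\n"      # not flagged: plain name, not pd.array
    "z = df.array\n"           # not flagged: attribute of df, not pd
)

print(find_pd_array_uses(SAMPLE))  # [2]
```

Matching on the syntax tree instead of grepping avoids false positives such as `df.array` or `pd_array`, which is the difficulty the description mentions.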

issue comment pandas-dev/pandas

CI: failing numpy_dev 3.8 build

@simonjayhawkins since this is only about numpy master (and they don't plan to release any time soon, they just had a release), I don't think this should be blocking 1.2.3 (a fix would still be useful for 1.2.x in general, of course)

jreback

comment created time in 2 hours

PR opened pydata/pandas-datareader

Fix AV, see issue #741
  • [x] closes #741
  • [ ] tests added / passed
  • [ ] passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [ ] passes black --check pandas_datareader
  • [ ] added entry to docs/source/whatsnew/vLATEST.txt

Can't do the tests etc right now, just doing a quick inline bugfix, sorry about that.

+1 -1

0 comments

1 changed file

pr created time in 3 hours

issue comment pandas-dev/pandas

CI: failing numpy_dev 3.8 build

cc @rhshadrach

jreback

comment created time in 3 hours

pull request comment pandas-dev/pandas

ENH: Implement rounding for floating dtype array #38844

@jreback two tests failed in code that I did not write/modify? Is this normal?

benoit9126

comment created time in 4 hours

pull request comment pandas-dev/pandas

BUG: raise on RangeIndex.array

I am more concerned with extract_array than with .array, and extract_array definitely shouldn't be allocating an entirely new array.

Yes, as mentioned above, IMO the two don't necessarily need to do the same. I certainly follow that extract_array shouldn't allocate a new array.

(I actually don't like the .array property in most cases, pretty much always find ._values more useful)

The .array is public API, and being able to use array-like values (e.g. to circumvent alignment) reliably is nice. For this use case, having RangeIndex raise seems more annoying than useful? (of course, you still have the memory allocation you might not expect)

I suppose we could implement a RangeArray extension array?

I'd like to see this. Among other things it would allow set_index(rangecol).reset_index() to round-trip nicely

Let's open a separate issue to discuss this if people would like to see this. (I am personally not convinced that we should add it: if we have it, it needs to be a proper dtype and allowed in columns, and IMO it is not needed to support a "range" dtype)

jbrockmendel

comment created time in 4 hours
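
The allocation concern in the thread above can be illustrated by analogy with Python's built-in `range`, which stores only start, stop and step. This is a sketch of the general lazy-versus-materialized trade-off, not of pandas internals; it assumes RangeIndex is similarly compact until its values are requested as an array.

```python
import sys

n = 1_000_000

lazy = range(n)            # stores start, stop, step only
materialized = list(lazy)  # allocates one slot per element

# The lazy object stays small no matter how long the range is
print(sys.getsizeof(lazy) < sys.getsizeof(materialized))  # True

# Element access works on the lazy object without materializing it
print(lazy[123_456])  # 123456
```

Turning the lazy object into a full array is the step the participants want `extract_array` to avoid.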

Pull request review comment pandas-dev/pandas

REF: move Block.astype implementation to array_algos

+def astype_array(values: ArrayLike, dtype: DtypeObj, copy: bool = False):

yah lets put it next to astype_nansafe

jorisvandenbossche

comment created time in 4 hours

Pull request review comment pandas-dev/pandas

REF: move Block.astype implementation to array_algos

+def astype_array_safe(values, dtype, copy: bool = False, errors: str = "raise"):

make sense. if we move the pandas_dtype call up into the caller we can do it once instead of (block|array)-wise

jorisvandenbossche

comment created time in 4 hours

Pull request review comment pandas-dev/pandas

REF: move Block.astype implementation to array_algos

 def astype(self, dtype, copy: bool = False, errors: str = "raise"):
         -------
         Block
         """
-        errors_legal_values = ("raise", "ignore")
-
-        if errors not in errors_legal_values:
-            invalid_arg = (
-                "Expected value of kwarg 'errors' to be one of "
-                f"{list(errors_legal_values)}. Supplied value is '{errors}'"
-            )
-            raise ValueError(invalid_arg)
-
-        if inspect.isclass(dtype) and issubclass(dtype, ExtensionDtype):
-            msg = (
-                f"Expected an instance of {dtype.__name__}, "
-                "but got the class instead. Try instantiating 'dtype'."
-            )
-            raise TypeError(msg)
-
-        dtype = pandas_dtype(dtype)
+        values = self.values
+        if values.dtype.kind in ["m", "M"]:
+            values = self.array_values()

could move this into astype_array_safe and use ensure_wrapped_if_datetimelike; would make it robust to AM/BM (though i think both AM and BM now have PRs to make the arrays EAs to begin with)

jorisvandenbossche

comment created time in 4 hours

pull request comment pandas-dev/pandas

BUG: raise on RangeIndex.array

I suppose we could implement a RangeArray extension array?

I'd like to see this. Among other things it would allow set_index(rangecol).reset_index() to round-trip nicely

The only strong opinion i have on this is that RangeIndex behavior should match MultiIndex behavior.

@jbrockmendel Can you explain why you want this? A practical reason? Or conceptually since both are not directly backed by a single array? (on which I would say: that's true, but RangeIndex still "represents" a single array, while MultiIndex does not. So I think there are also reasons to give them different behaviour)

(At least until there is a RangeArray) RangeIndex doesn't represent a single array. As you mentioned above, it has .values, but if that is the standard, then we should do the same for MultiIndex.

(I actually don't like the .array property in most cases, pretty much always find ._values more useful)

I am more concerned with extract_array than with .array, and extract_array definitely shouldn't be allocating an entirely new array.

jbrockmendel

comment created time in 4 hours

Pull request review comment pandas-dev/pandas

REF: move Block.astype implementation to array_algos

+def astype_array_safe(values, dtype, copy: bool = False, errors: str = "raise"):

mypy isn't smart enough for that (we also didn't annotate dtype in the code from where I copied this). But will add values: ArrayLike

jorisvandenbossche

comment created time in 4 hours

Pull request review comment pandas-dev/pandas

REF: move Block.astype implementation to array_algos

+def astype_array(values: ArrayLike, dtype: DtypeObj, copy: bool = False):

See my remark in the top post, for me either is fine

jorisvandenbossche

comment created time in 4 hours

issue comment pandas-dev/pandas

BUG: Roundtrip with openpyxl and datetime precision

Related issue on openpyxl: https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1630

rhshadrach

comment created time in 4 hours