Huon Wilson (huonw)
Sydney, Australia
http://huonw.github.io/
@data61. Formerly: Swift frontend @apple, core team @rust-lang.

Gankra/collect-rs 67

Miscellaneous Collections

huonw/brainfuck_macros 56

A brainfuck procedural macro for Rust

brendanzab/algebra 54

Abstract algebra for Rust (still very much a WIP!)

huonw/2048-4D 19

A small clone of 1024 (https://play.google.com/store/apps/details?id=com.veewo.a1024)

brendanzab/rusp 14

A minimal scripting and data language for Rust.

huonw/alias 10

alias offers some basic ways to mutate data while aliased.

huonw/cfor 8

A C-style for loop macro

huonw/char-iter 3

An iterator over a linear range of characters.

pull request comment magit/forge

Add forge-insert-requested-pullreqs, to see review requests easily

Awesome; thank you!

huonw

comment created time in 21 minutes

delete branch huonw/forge

delete branch : feature-insert-requested-prs

delete time in 24 minutes

pull request comment buildkite-plugins/docker-compose-buildkite-plugin

Check service names in cache-from against the services being built

Hi @jayco, I updated this per your suggestion. Could you take a second look? Thanks!

huonw

comment created time in 39 minutes

push event stellargraph/stellargraph

Huon Wilson

commit sha 4a9c6fd402808a2c24814710f77062f3162e0318

docs, edge types, train/test/valid sizes

view details

push time in 2 hours

issue comment stellargraph/stellargraph

Error handling could be improved when node IDs appear in edges but not in nodes (new StellarGraph)

Work-arounds when only the edges are explicitly specified:

Single edge type (i.e. edges is a DataFrame like in the issue):

import pandas as pd
from stellargraph import StellarGraph

# the node IDs are everything that appears as a source or a target
sources_and_targets = pd.concat([edges.source, edges.target])
nodes = pd.DataFrame(index=pd.unique(sources_and_targets))

g = StellarGraph(nodes, edges)

Multiple edge types (edges is a dict of DataFrames):

# example data:
edges = {
    "a": pd.DataFrame({"source": [1], "target": [2]}), 
    "b": pd.DataFrame({"source": [2], "target": [3]}, index=[1])
}

# flatten the source and target columns of every edge-type DataFrame
sources_and_targets = pd.concat(
    [series for df in edges.values() for series in [df.source, df.target]]
)
nodes = pd.DataFrame(index=pd.unique(sources_and_targets))

g = StellarGraph(nodes, edges)

Both of these assume there's only one node type.

kjun9

comment created time in 16 hours

create branch stellargraph/stellargraph

branch : feature/knowledge-graph-datasets

created branch time in 16 hours

push event stellargraph/stellargraph

Huon Wilson

commit sha 6b604aa3bc6657468fe5589d54fe230e5e5d1b6a

Add datasets.Cora.load to package up parsing and StellarGraph-ing

view details

Huon Wilson

commit sha 13bd0d9462f20397420e8babebd476ee831fa0ac

formatting

view details

Huon Wilson

commit sha b181d7883573d1a33380dedb09f65bf8af29d8c0

Merge remote-tracking branch 'origin/develop' into feature/812-load-cora

view details

Huon Wilson

commit sha 67b954c3d0283fdcb2c56d33b4f04c9d24d052a8

Fix directed GS

view details

Huon Wilson

commit sha 7e27797d838ebb2f7b82c23811bae3ef7af7ccd2

Fix cora GS

view details

Huon Wilson

commit sha 4b6f6b5a146fef79f6bf8cca1f6b821b88393eaf

Add test for Cora.load

view details

Huon Wilson

commit sha 51901868bd479cf04a22b5d6417e7b3f06bdff63

black

view details

Huon Wilson

commit sha 409206280fb9ec4105173f2ab3078480ffaaaf93

Revert

view details

Huon Wilson

commit sha 8719b2094859d5ab7bb31916b8262570780ea8d2

do unsupervised graphsage cora

view details

Huon Wilson

commit sha e57b59196686505855fec29b617262b531533676

gat

view details

Huon Wilson

commit sha 7d5f2e592e89808734a64f6bc365cab366c75f6e

be undirected by default, and add docs

view details

Huon Wilson

commit sha 59522eef8e3fc7819cac936fe1c0692adb3e8a08

Merge remote-tracking branch 'origin/develop' into feature/812-load-cora

view details

Huon Wilson

commit sha a369ce9f904f5042143d48d1d08746b114168e7e

rerun notebooks

view details

Huon Wilson

commit sha 9977dbbbffa3d6928189ffddc768d3cee484f741

Add changelog sections for post-0.10 development (#965)

view details

kevin

commit sha 26e6591469ca7a30d4b3d54c57166d37ce4ce00a

Mention branch protection in release procedure (#962) Part of #961

view details

Huon Wilson

commit sha 70fe4ae5ed44e50d8457a983a58ce578b259ad84

Mention StellarDiGraph

view details

Huon Wilson

commit sha 865a27c20967d67694b6b8f104cafa5b4a0f5f84

Use `if ...: raise ...` for input validation, not `assert` (#966) `assert`s in Python can disappear, if `-O` is specified. This means that they should be used for optional internal consistency checks, rather than input validation that must happen. Code Climate flags asserts for this reason, including this instance (https://codeclimate.com/github/stellargraph/stellargraph/stellargraph/layer/graph_attention.py/source#issue-340917ec283449fd7ae831f96b47c067) of input validation. Related to #954.

view details

Huon Wilson

commit sha 242fd7bebf7e2cbcdfefeae288cc1e2db7d80c3d

Disable some SonarPython checks on Code Climate (#968) This disables three SonarPython checks to reduce noise and make it more likely that the things it flags are useful/relevant. The checks disabled are: - number of parameters in a function: we generally allow many parameters, because we often have many with defaults - cognitive complexity: these are, arguably, not actually a useful measure (see also #595) - FIXME comments: we have `# FIXME(#someissue)` comments recording future work by connecting to our issue tracker This silences 39 issues (out of 145 total, and 72 from SonarPython specifically) on Code Climate. See: #954

view details

Huon Wilson

commit sha a66c7ffab4b69ac6c6e6088a9a6c1b2fde396b17

Disable the method-lines length check on Code Climate (#967) For the moment, we don't care about long methods, so reducing the noise from this check by turning it off is better than continuing to manually ignore it. This silences 2 issues on Code Climate. See: #954

view details

kevin

commit sha 176a50103257e7c0d8282ded6aae60c333eecc65

Clear session after reproducibility test (#974) Creating tensorflow models iteratively in reproducibility tests, ends up hogging a lot of memory. This explicitly clears the tensorflow session after each of these tests. Part of #971

view details

push time in 17 hours

delete branch stellargraph/stellargraph

delete branch : bugfix/931-sparse-efficiency

delete time in 17 hours

push event stellargraph/stellargraph

Huon Wilson

commit sha cb39c2b8555f71b02be1bef5a13e778bf2b361f5

Optimise `setdiag(0)` in StellarGraph.to_adjacency_matrix on undirected graphs (#932) On a CSR matrix (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html), `setdiag` will change the sparsity structure, meaning it has to introduce new elements into the optimised `numpy` arrays that it stores under the hood (which is `O(edges)` in our case). This is true even when setting the diagonal to `0` which theoretically only needs to clear the elements that exist, not add new ones (https://github.com/scipy/scipy/issues/11600). Per that issue, it's faster to manually find the zero elements in the diagonal, and clear only them to avoid changing the sparsity structure, which is `O(nodes)`. This improves the included benchmark on undirected graphs with 1000 nodes and 5000 edges significantly (15×), and will likely have a large improvement on larger graphs. (Times in microseconds.) | type | code | min (s) | mean (s) | stddev (s) | |------------|---------|--------:|---------:|-----------:| | directed | this PR | 484 | 565 | 88 | | directed | develop | 485 | 573 | 92 | | undirected | this PR | 937 | 1060 | 155 | | undirected | develop | 12900 | 15100 | 1240 | (The directed code path doesn't change and those observations are within noise of each other.) This also switches to `transpose` rather than constructing a whole new separate matrix for the backwards links, because that's also faster (min time: 1110us -> 937us). See: #931

view details

push time in 17 hours

PR merged stellargraph/stellargraph

Optimise `setdiag(0)` in StellarGraph.to_adjacency_matrix on undirected graphs

On a CSR matrix, setdiag will change the sparsity structure, meaning it has to introduce new elements into the optimised numpy arrays that it stores under the hood (which is O(edges) in our case). This is true even when setting the diagonal to 0 which theoretically only needs to clear the elements that exist, not add new ones (https://github.com/scipy/scipy/issues/11600). Per that issue, it's faster to manually find the zero elements in the diagonal, and clear only them to avoid changing the sparsity structure, which is O(nodes).

This improves the included benchmark on undirected graphs with 1000 nodes and 5000 edges significantly (15×), and will likely have a large improvement on larger graphs. (Times in microseconds.)

type        code     min (µs)  mean (µs)  stddev (µs)
directed    this PR       484        565          88
directed    develop       485        573          92
undirected  this PR       937       1060         155
undirected  develop     12900      15100        1240

(The directed code path doesn't change and those observations are within noise of each other.)

This also switches to transpose rather than constructing a whole new separate matrix for the backwards links, because that's also faster (min time: 1110us -> 937us).

See: #931
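
(For reference, a minimal sketch of the "clear only the existing diagonal entries" approach described above; this illustrates the technique rather than quoting the PR's code:)

from scipy import sparse

def clear_diagonal_in_place(x):
    # find the diagonal entries that are actually non-zero: O(nodes)...
    nonzero, = x.diagonal().nonzero()
    # ...and assign zero only at those positions; every index already exists
    # in the sparsity structure, so nothing needs to be restructured
    x[nonzero, nonzero] = 0

x = sparse.random(1000, 1000, density=0.005, format="csr", random_state=0)
clear_diagonal_in_place(x)
assert x.diagonal().sum() == 0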

+17 -2

2 comments

3 changed files

huonw

pr closed time in 17 hours

push event stellargraph/stellargraph

Huon Wilson

commit sha 6b604aa3bc6657468fe5589d54fe230e5e5d1b6a

Add datasets.Cora.load to package up parsing and StellarGraph-ing

view details

Huon Wilson

commit sha 13bd0d9462f20397420e8babebd476ee831fa0ac

formatting

view details

Huon Wilson

commit sha b181d7883573d1a33380dedb09f65bf8af29d8c0

Merge remote-tracking branch 'origin/develop' into feature/812-load-cora

view details

Huon Wilson

commit sha 67b954c3d0283fdcb2c56d33b4f04c9d24d052a8

Fix directed GS

view details

Huon Wilson

commit sha 7e27797d838ebb2f7b82c23811bae3ef7af7ccd2

Fix cora GS

view details

Huon Wilson

commit sha 4b6f6b5a146fef79f6bf8cca1f6b821b88393eaf

Add test for Cora.load

view details

Huon Wilson

commit sha 51901868bd479cf04a22b5d6417e7b3f06bdff63

black

view details

Huon Wilson

commit sha 409206280fb9ec4105173f2ab3078480ffaaaf93

Revert

view details

Huon Wilson

commit sha 8719b2094859d5ab7bb31916b8262570780ea8d2

do unsupervised graphsage cora

view details

Huon Wilson

commit sha e57b59196686505855fec29b617262b531533676

gat

view details

Huon Wilson

commit sha 7d5f2e592e89808734a64f6bc365cab366c75f6e

be undirected by default, and add docs

view details

Huon Wilson

commit sha 4ffcf7e32806007f199258f7c8625480fb661508

Move modules from `stellargraph.utils` to top-level ones (#938) This moves most of the code within `stellargraph.utils` to dedicated top-level modules, because `utils` is an overly generic name. The old paths still work, but have runtime warnings to aid the migration. This is done by overriding `__getattr__` for the `stellargraph.utils` module (unfortunately this cannot use [PEP 562](https://www.python.org/dev/peps/pep-0562/) because that is Python 3.7+, and we support 3.6), and replacing each of the submodule files with, essentially: ```python warnings.warn(...) from path.to.new_module import * ``` (Unfortunately, I think these have to be files with the same name, so these new small files are stopping git from detecting the moves: all of the large new files are exact moves, with no other changes.) These two forms are both needed because there's two possibilities for how users might be using this: 1. `import stellargraph.utils` followed by `utils.foo`: calls `__getattr__` if it's not otherwise found 2. a direct import of a submodule `import stellargraph.utils.foo`: I believe this just does a file system traversal and opens/executes the given file (i.e. `stellargraph/utils/foo.py` or `stellargraph/utils/foo/__init__.py`) to create the submodule directly (rather than getting attributes of parent modules), and so needs the handling in each file individually Full list of changes: | old | new | |-----------------------------------------------|----------------------------------------------------------------------| | stellargraph.utils.calibration | stellargraph.calibration | | stellargraph.utils.IsotonicCalibration | stellargraph.calibration.IsotonicCalibration | | stellargraph.utils.TemperatureCalibration | stellargraph.calibration.TemperatureCalibration | | stellargraph.utils.expected_calibration_error | stellargraph.calibration.expected_calibration_error | | stellargraph.utils.plot_reliability_diagram | stellargraph.calibration.plot_reliability_diagram | | stellargraph.utils.ensemble | stellargraph.ensemble | | stellargraph.utils.BaggingEnsemble | stellargraph.ensemble.BaggingEnsemble | | stellargraph.utils.Ensemble | stellargraph.ensemble.Ensemble | | stellargraph.utils.saliency_maps | stellargraph.interpretability.saliency_maps | | stellargraph.utils.integrated_gradients | stellargraph.interpretability.saliency_maps.integrated_gradients | | stellargraph.utils.integrated_gradients_gat | stellargraph.interpretability.saliency_maps.integrated_gradients_gat | | stellargraph.utils.saliency_gat | stellargraph.interpretability.saliency_maps.saliency_gat | | stellargraph.utils.GradientSaliencyGAT | stellargraph.interpretability.GradientSaliencyGAT | | stellargraph.utils.IntegratedGradients | stellargraph.interpretability.IntegratedGradients | | stellargraph.utils.IntegratedGradientsGAT | stellargraph.interpretability.IntegratedGradientsGAT | Computed by adding the following to `stellargraph/utils/__init__.py`: ``` for name, (new_module_name, new_value) in sorted( _MAPPING.items(), key=lambda x: ("stellargraph." + x[1][0] + "zzz", x[0]) if x[1][0] is not None else (x[1][1].__name__, "") ): if isinstance(new_value, types.ModuleType): new_location = new_value.__name__ else: new_location = f"stellargraph.{new_module_name}.{new_value.__name__}" print(f"| `stellargraph.utils.{name}` | `{new_location}` |") raise ValueError() ``` See: #909

view details

Huon Wilson

commit sha b090f64937900e51aac7bfd737d34a51aa443b17

Use https for github links in the changelog, not http (#949) These were mistakenly written as 'http' links in #930.

view details

Kieran Ricardo

commit sha 52c2b5ce1b4592336108e36b703d4acf1f71b4ea

bumped version (#952)

view details

Huon Wilson

commit sha ba2a696bb3c8db4f87738d40a323a1cfb7c483f5

Use scores for computing ROC curve, not labels, in YELP example (#950) The ROC AUC reported by the example goes from ~0.9 to ~0.99 with this change. See: #428

view details

Huon Wilson

commit sha 3cf4b99d59cf9dc21c1722ace9c03aae69a7c003

Add MovieLens.load to parse and encode the movielens dataset (#947) This adds a `load` function that returns: - a StellarGraph containing the users (with IDs `u_...`), and movies (IDs `m_...`) as well as "rating" edges, where the features in the users nodes have been encoded and normalised - a pandas DataFrame containing the edges (as in `user_id` and `movie_id`) and their rating label to use for training/testing Example first few rows of each file for reference: `u.data`: ``` 196 242 3 881250949 186 302 3 891717742 22 377 1 878887116 ``` `u.item`: ``` 1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0 2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0 3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0 ``` `u.user`: ``` 1|24|M|technician|85711 2|53|F|other|94043 3|23|M|writer|32067 ``` See: #812

view details

kieranricardo

commit sha e13075bd6eccc0d01fedd489f02a895cb2acc6d6

Release 0.10.0

view details

kieranricardo

commit sha 3b4413ab928a776e443bbc51e6f7603d63b72134

Bump version

view details

kieranricardo

commit sha 25f66b8c588cddd4683ad5f48d7f832af91a5249

Merge remote-tracking branch 'origin/develop' into develop

view details

Huon Wilson

commit sha 59522eef8e3fc7819cac936fe1c0692adb3e8a08

Merge remote-tracking branch 'origin/develop' into feature/812-load-cora

view details

push time in 17 hours

Pull request review comment stellargraph/stellargraph

Add MUTAG to datasets

 class AIFB(
     source="https://figshare.com/articles/AIFB_DataSet/745364",
 ):
     pass
+
+
+class MUTAG(
+    DatasetLoader,
+    name="MUTAG",
+    directory_name="mutag",
+    url="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip",
+    url_archive_format="zip",
+    expected_files=[
+        "MUTAG_A.txt",
+        "MUTAG_graph_indicator.txt",
+        "MUTAG_node_labels.txt",
+        "MUTAG_edge_labels.txt",
+        "MUTAG_graph_labels.txt",
+        "README.txt",
+    ],
+    description="Each graph represents a chemical compound and graph labels represent 'their mutagenic effect on a specific gram negative bacterium.'"
+    "The dataset includes 188 graphs with 18 nodes and 20 edges on average for each graph. Graph nodes have 7 labels and each graph is labelled as belonging to 1 of 2 classes.",
+    source="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/",
+):
+    def load(self):
+        """
+        Load this dataset into a list of StellarGraph objects with corresponding labels, downloading it if required.
+
+        Note: Edges in MUTAG are labelled as one of 4 values: aromatic, single, double, and triple indicated by integers
+        0, 1, 2, 3 respectively. The edge labels are included in the  :class:`StellarGraph` objects as edge weights in
+        integer representation.
+
+        Returns:
+            A tuple that is a list of :class:`StellarGraph` objects and a Pandas Series of labels one for each graph.
+        """
+        self.download()
+        return self._load_from_location(location=self.data_directory)
+
+    def _load_from_txt_file(
+        self, location, filename, names=None, dtype=None, index_increment=None
+    ):
+        df = pd.read_csv(
+            os.path.join(location, filename),
+            header=None,
+            index_col=False,
+            dtype=dtype,
+            names=names,
+        )
+        if index_increment:
+            df.index = df.index + index_increment
+        return df
+
+    def _load_from_location(self, location):
+
+        if not os.path.isdir(location):
+            raise NotADirectoryError(
+                "The location {} is not a directory.".format(location)
+            )
+
+        df_graph = self._load_from_txt_file(
+            location=location, filename="MUTAG_A.txt", names=["source", "target"]
+        )
+        df_edge_labels = self._load_from_txt_file(
+            location=location,
+            filename="MUTAG_edge_labels.txt",
+            names=["weight"],
+            dtype=int,
+        )
+        df_graph = pd.concat([df_graph, df_edge_labels], axis=1)  # add edge weights
+        df_graph_ids = self._load_from_txt_file(
+            location=location,
+            filename="MUTAG_graph_indicator.txt",
+            names=["graph_id"],
+            index_increment=1,
+        )
+        df_graph_labels = self._load_from_txt_file(
+            location=location,
+            filename="MUTAG_graph_labels.txt",
+            dtype="category",
+            names=["label"],
+            index_increment=1,
+        )  # binary labels {-1, 1}
+        df_node_labels = self._load_from_txt_file(
+            location=location,
+            filename="MUTAG_node_labels.txt",
+            dtype="category",
+            index_increment=1,
+        )
+
+        # Let us one-hot encode the node labels because these are used as node features
+        # in graph classification tasks.
+        df_node_labels = pd.get_dummies(df_node_labels)
+
+        graphs = []
+        for graph_id in np.unique(df_graph_ids):

Right... that's a rather unhelpful response. "It gives the right answer" is only a small part of why one might do something a particular way: not all ways are equal in terms of performance, clarity, or the many other metrics one might choose.

I ran the following in a Jupyter notebook with and without the change above:

from stellargraph import datasets
mutag = datasets.MUTAG()
mutag.download()
%timeit -n 1 mutag.load()

Loop: 1.03 s ± 47.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Group-by: 878 ms ± 9.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In this case, it doesn't matter that much, because the time to construct the StellarGraph is the dominant factor, but:

  • I think there's a lot we can do to optimise StellarGraph's construction time, and thus reduce the overhead, making other factors more important (like the manual loop vs. the group-by sketched below)
  • MUTAG is the smallest of the datasets your survey document mentions, so I suspect reducing the O(number of graphs * number of nodes) cost of this loop will matter even more for the larger ones

Overall, I think it's fine to merge as is, but if we do so, it's something we'll have to keep in mind for future work.
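
(For concreteness, the group-by shape being compared above is roughly the following sketch; it reuses the frame names from the diff (df_graph_ids, df_node_labels, df_graph) and assumes every graph has at least one edge, so treat it as an illustration rather than the final code.)

# assign each edge to a graph via its source node, then split nodes and edges
# with one group-by each, instead of filtering per graph inside a loop
edge_graph_ids = df_graph_ids.loc[df_graph["source"], "graph_id"].values
node_groups = df_node_labels.groupby(df_graph_ids["graph_id"])
edge_groups = df_graph.groupby(edge_graph_ids)

graphs = [
    StellarGraph(nodes=nodes, edges=edge_groups.get_group(graph_id))
    for graph_id, nodes in node_groups
]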

PantelisElinas

comment created time in 18 hours

Pull request review comment stellargraph/stellargraph

Add MUTAG to datasets

 class AIFB(
     source="https://figshare.com/articles/AIFB_DataSet/745364",
 ):
     pass
+
+
+class MUTAG(
+    DatasetLoader,
+    name="MUTAG",
+    directory_name="MUTAG",
+    url="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip",
+    url_archive_format="zip",
+    expected_files=[
+        "MUTAG_A.txt",
+        "MUTAG_graph_indicator.txt",
+        "MUTAG_node_labels.txt",
+        "MUTAG_edge_labels.txt",
+        "MUTAG_graph_labels.txt",
+        "README.txt",
+    ],
+    description="Each graph represents a chemical compound and graph labels represent 'their mutagenic effect on a specific gram negative bacterium.'"
+    "The dataset includes 188 graphs with 18 nodes and 20 edges on average for each graph. Graph nodes have 7 labels and each graph is labelled as belonging to 1 of 2 classes.",
+    source="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/",

I think the more human-readable page is a slightly better source page, rather than the listing of the zip files.

    source="https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets",
PantelisElinas

comment created time in 17 hours

Pull request review comment stellargraph/stellargraph

Add MUTAG to datasets

 class AIFB(
     source="https://figshare.com/articles/AIFB_DataSet/745364",
 ):
     pass
+
+
+class MUTAG(
+    DatasetLoader,
+    name="MUTAG",
+    directory_name="mutag",
+    url="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip",
+    url_archive_format="zip",
+    expected_files=[
+        "MUTAG_A.txt",
+        "MUTAG_graph_indicator.txt",
+        "MUTAG_node_labels.txt",
+        "MUTAG_edge_labels.txt",
+        "MUTAG_graph_labels.txt",
+        "README.txt",
+    ],
+    description="Each graph represents a chemical compound and graph labels represent 'their mutagenic effect on a specific gram negative bacterium.'"
+    "The dataset includes 188 graphs with 18 nodes and 20 edges on average for each graph. Graph nodes have 7 labels and each graph is labelled as belonging to 1 of 2 classes.",
+    source="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/",
+):
+    def load(self):
+        """
+        Load this dataset into a list of StellarGraph objects with corresponding labels, downloading it if required.
+
+        Note: Edges in MUTAG are labelled as one of 4 values: aromatic, single, double, and triple indicated by integers
+        0, 1, 2, 3 respectively. The edge labels are included in the  :class:`StellarGraph` objects as edge weights in
+        integer representation.
+
+        Returns:
+            A tuple that is a list of :class:`StellarGraph` objects and a Pandas Series of labels one for each graph.
+        """
+        self.download()
+        return self._load_from_location(location=self.data_directory)
+
+    def _load_from_txt_file(
+        self, location, filename, names=None, dtype=None, index_increment=None
+    ):
+        df = pd.read_csv(
+            os.path.join(location, filename),
+            header=None,
+            index_col=False,
+            dtype=dtype,
+            names=names,
+        )
+        if index_increment:
+            df.index = df.index + index_increment
+        return df
+
+    def _load_from_location(self, location):
+
+        if not os.path.isdir(location):
+            raise NotADirectoryError(
+                "The location {} is not a directory.".format(location)
+            )
+
+        df_graph = self._load_from_txt_file(
+            location=location, filename="MUTAG_A.txt", names=["source", "target"]
+        )
+        df_edge_labels = self._load_from_txt_file(
+            location=location,
+            filename="MUTAG_edge_labels.txt",
+            names=["weight"],
+            dtype=int,
+        )
+        df_graph = pd.concat([df_graph, df_edge_labels], axis=1)  # add edge weights
+        df_graph_ids = self._load_from_txt_file(
+            location=location,
+            filename="MUTAG_graph_indicator.txt",
+            names=["graph_id"],
+            index_increment=1,

Tricky. I think it'd be great to have a comment noting that it exists to keep the correspondence between the rows representing individual nodes and the node IDs used in edges.
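
Something like this, perhaps (hypothetical wording for such a comment):

# the MUTAG files are implicitly indexed by 1-based node ID, so shift the
# default 0-based index to keep row i aligned with node ID i in MUTAG_A.txt
df.index = df.index + index_increment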

PantelisElinas

comment created time in 18 hours

create branch stellargraph/stellargraph

branch : bugfix/create_graph_schema-docs

created branch time in 18 hours

push event stellargraph/stellargraph

Huon Wilson

commit sha 6b604aa3bc6657468fe5589d54fe230e5e5d1b6a

Add datasets.Cora.load to package up parsing and StellarGraph-ing

view details

Huon Wilson

commit sha 13bd0d9462f20397420e8babebd476ee831fa0ac

formatting

view details

Huon Wilson

commit sha b181d7883573d1a33380dedb09f65bf8af29d8c0

Merge remote-tracking branch 'origin/develop' into feature/812-load-cora

view details

Huon Wilson

commit sha 67b954c3d0283fdcb2c56d33b4f04c9d24d052a8

Fix directed GS

view details

Huon Wilson

commit sha 7e27797d838ebb2f7b82c23811bae3ef7af7ccd2

Fix cora GS

view details

Huon Wilson

commit sha 4b6f6b5a146fef79f6bf8cca1f6b821b88393eaf

Add test for Cora.load

view details

Huon Wilson

commit sha 51901868bd479cf04a22b5d6417e7b3f06bdff63

black

view details

Huon Wilson

commit sha 409206280fb9ec4105173f2ab3078480ffaaaf93

Revert

view details

Huon Wilson

commit sha 8719b2094859d5ab7bb31916b8262570780ea8d2

do unsupervised graphsage cora

view details

Huon Wilson

commit sha e57b59196686505855fec29b617262b531533676

gat

view details

Huon Wilson

commit sha 7d5f2e592e89808734a64f6bc365cab366c75f6e

be undirected by default, and add docs

view details

Huon Wilson

commit sha 59522eef8e3fc7819cac936fe1c0692adb3e8a08

Merge remote-tracking branch 'origin/develop' into feature/812-load-cora

view details

TheDen

commit sha a18963c8aa8f2ae3066f22a95f7bac44a716ef6f

update docker plugin to v3.5.0

view details

Huon Wilson

commit sha a369ce9f904f5042143d48d1d08746b114168e7e

rerun notebooks

view details

Denis Khoshaba

commit sha 679eae688687c4b868d204ef55420713e412e5a0

update docker-compose plugin to v3.2.0 (#963)

view details

Denis Khoshaba

commit sha 21bd6b81feac4d5d76dc551113ef87404e7cbd80

update docker plugin to v3.5.0 (#964)

view details

Huon Wilson

commit sha 9977dbbbffa3d6928189ffddc768d3cee484f741

Add changelog sections for post-0.10 development (#965)

view details

kevin

commit sha 26e6591469ca7a30d4b3d54c57166d37ce4ce00a

Mention branch protection in release procedure (#962) Part of #961

view details

Huon Wilson

commit sha 70fe4ae5ed44e50d8457a983a58ce578b259ad84

Mention StellarDiGraph

view details

Huon Wilson

commit sha 865a27c20967d67694b6b8f104cafa5b4a0f5f84

Use `if ...: raise ...` for input validation, not `assert` (#966) `assert`s in Python can disappear, if `-O` is specified. This means that they should be used for optional internal consistency checks, rather than input validation that must happen. Code Climate flags asserts for this reason, including this instance (https://codeclimate.com/github/stellargraph/stellargraph/stellargraph/layer/graph_attention.py/source#issue-340917ec283449fd7ae831f96b47c067) of input validation. Related to #954.

view details

push time in 18 hours

pull request comment stellargraph/stellargraph

Implement DistMult

> The paper uses a custom loss function which we don't have

Worth noting 👍

> This seems to be equivalent to the proposed changes in #882; but if anything this difference will improve results.

This sentence is a bit unclear to me. Just to make sure I'm on the same page: the approach that is currently in the library is likely to improve results over the one in the paper/#882?

huonw

comment created time in 18 hours

push event stellargraph/stellargraph

Huon Wilson

commit sha 710bcd2e3cb75791f5957210208d1f22989c88f8

Add fit to test, remove embeddings_constraint

view details

push time in 18 hours

Pull request review comment stellargraph/stellargraph

Implement DistMult

     def build(self):
         x_out = self(x_inp)
 
         return x_inp, x_out
+
+
+class DistMultScore(Layer):
+    """
+    DistMult scoring Keras layer.
+
+    Original Paper: Embedding Entities and Relations for Learning and Inference in Knowledge
+    Bases. Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, Li Deng. ICLR 2015
+
+    This combines subject, relation and object embeddings into a score of the likelihood of the
+    link.
+    """
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+    def build(self, input_shape):
+        self.built = True
+
+    def call(self, inputs):
+        e1, r, e2 = inputs
+        # y_(e_1)^T M_r y_(e_2), where M_r = diag(w_r) is a diagonal matrix
+        score = tf.reduce_sum(e1 * r * e2, axis=2)
+        return score
+
+
+@experimental(reason="results from the reference paper have not been reproduced yet")
+class DistMult:
+    """
+    Embedding layers and a DistMult scoring layers that implement the DistMult knowledge graph
+    embedding algorithm as in https://arxiv.org/pdf/1412.6575.pdf
+
+    Args:
+        generator (KGTripleGenerator): A generator of triples to feed into the model.
+
+        k (int): the dimension of the embedding (that is, a vector in R^k is learnt for each node
+            and each link type)
+
+        embedding_initializer (str or func, optional): The initialiser to use for the embeddings.
+
+        embedding_regularizer (str or func, optional): The regularizer to use for the embeddings.
+    """
+
+    def __init__(
+        self, generator, k, embedding_initializer=None, embedding_regularizer=None,

I'd personally prefer not to add complexity here (vs. waiting and doing the "right" thing with embeddings_constraint), especially because AmpliGraph reports that their best hyperparameters for DistMult are without normalisation: https://docs.ampligraph.org/en/1.2.0/experiments.html (normalize_ent_emb: false in each DistMult row).
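
(For context, the embeddings_constraint idea under discussion would look something like this sketch, using the standard Keras UnitNorm constraint with illustrative sizes; as a later comment in this thread notes, it currently fails at runtime for embeddings trained with sparse gradients:)

from tensorflow.keras import constraints, layers

num_nodes, k = 1000, 5  # illustrative sizes

# constrain each embedding vector (each row of the weight matrix) to unit L2 norm
node_embeddings = layers.Embedding(
    input_dim=num_nodes,
    output_dim=k,
    embeddings_constraint=constraints.UnitNorm(axis=1),
)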

huonw

comment created time in 18 hours

Pull request review comment stellargraph/stellargraph

Remove deprecations included in 0.10 release

 class StellarGraph:
             Deprecated, use :meth:`from_networkx`.
         edge_type_name:
             Deprecated, use :meth:`from_networkx`.

I had carefully left these ones here despite removing the adjacent edge_weight_label because these arguments are not properly deprecated: they don't trigger a runtime warning yet. I think it's important that we use a runtime warning to prompt people to move because:

  • users with existing code won't re-read the docs, and
  • they may not realise that a breaking change listed in the CHANGELOG affects their code (as in, they may not remember all the details about their code)

These arguments don't trigger a runtime warning yet because "FIXME(#717): this should have a deprecation warning, once the tests and examples have stopped using it" on line 253/241 below.
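
(A hypothetical sketch of the kind of runtime warning meant here, with illustrative names and message:)

import warnings

def _warn_deprecated_argument(edge_type_name):
    # warn at runtime so existing code notices the migration, rather than
    # relying on users re-reading the docs or the CHANGELOG
    if edge_type_name is not None:
        warnings.warn(
            "the 'edge_type_name' argument is deprecated, use StellarGraph.from_networkx",
            DeprecationWarning,
            stacklevel=2,
        )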

huonw

comment created time in 18 hours

commit comment

delete branch stellargraph/stellargraph

delete branch : feature/812-load-cora

delete time in 18 hours

push event stellargraph/stellargraph

Huon Wilson

commit sha 6b604aa3bc6657468fe5589d54fe230e5e5d1b6a

Add datasets.Cora.load to package up parsing and StellarGraph-ing

view details

Huon Wilson

commit sha 13bd0d9462f20397420e8babebd476ee831fa0ac

formatting

view details

Huon Wilson

commit sha b181d7883573d1a33380dedb09f65bf8af29d8c0

Merge remote-tracking branch 'origin/develop' into feature/812-load-cora

view details

Huon Wilson

commit sha 67b954c3d0283fdcb2c56d33b4f04c9d24d052a8

Fix directed GS

view details

Huon Wilson

commit sha 7e27797d838ebb2f7b82c23811bae3ef7af7ccd2

Fix cora GS

view details

Huon Wilson

commit sha 4b6f6b5a146fef79f6bf8cca1f6b821b88393eaf

Add test for Cora.load

view details

Huon Wilson

commit sha 51901868bd479cf04a22b5d6417e7b3f06bdff63

black

view details

Huon Wilson

commit sha 409206280fb9ec4105173f2ab3078480ffaaaf93

Revert

view details

Huon Wilson

commit sha 8719b2094859d5ab7bb31916b8262570780ea8d2

do unsupervised graphsage cora

view details

Huon Wilson

commit sha e57b59196686505855fec29b617262b531533676

gat

view details

Huon Wilson

commit sha 7d5f2e592e89808734a64f6bc365cab366c75f6e

be undirected by default, and add docs

view details

Huon Wilson

commit sha 59522eef8e3fc7819cac936fe1c0692adb3e8a08

Merge remote-tracking branch 'origin/develop' into feature/812-load-cora

view details

Huon Wilson

commit sha a369ce9f904f5042143d48d1d08746b114168e7e

rerun notebooks

view details

Huon Wilson

commit sha 70fe4ae5ed44e50d8457a983a58ce578b259ad84

Mention StellarDiGraph

view details

Huon Wilson

commit sha e65804ca4352b876fb74e5962df6ccc3abdf8109

Add datasets.Cora.load to package up parsing and StellarGraph-ing (#913) * Add datasets.Cora.load to package up parsing and StellarGraph-ing * formatting * Fix directed GS * Fix cora GS * Add test for Cora.load * black * Revert * do unsupervised graphsage cora * gat * be undirected by default, and add docs * rerun notebooks * Mention StellarDiGraph

view details

push time in 18 hours

PR merged stellargraph/stellargraph

Add datasets.Cora.load to package up parsing and StellarGraph-ing

This adds a load() function to the Cora dataset loader that returns a StellarGraph with the nodes (and features) and edges, and a pandas.Series with the node labels. This is rolled out to several of the notebooks that use the Cora dataset, but not all, because some of them do more than just create the graph, and so are a harder migration.

Part of: #812
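
(A sketch of the resulting usage; the return values are as described above, the exact names are illustrative:)

from stellargraph import datasets

dataset = datasets.Cora()
G, node_labels = dataset.load()  # a StellarGraph, and a pandas.Series of node labels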

+790 -1184

3 comments

8 changed files

huonw

pr closed time in 18 hours

push event stellargraph/stellargraph

Huon Wilson

commit sha ede537ac98c2c8b9738a53b456f0c21aee1fc5c6

black

view details

push time in 19 hours

push event stellargraph/stellargraph

Tim Pitman

commit sha a813bd308c62a004eed868d0963963c61b5d7776

Restore demo script for link prediction via random walks (#895) (#935)

view details

Kieran Ricardo

commit sha 55402cf2d6dbf17d5f61bd286b06a81efeca9898

moved the GraphWave "args" doc into the main class docstring (#944) * moved the GraphWave "args" doc into the main class docstring * fix formatting issue

view details

Kieran Ricardo

commit sha 6880c6d0f5c0c188ab63b601c5c7f3d5e8cec6ed

Directed graphsage link generator (#879) * added support for link prediction with directed graphsage * added demo for directed graphsage * updated demo + formatting * DirectedGraphSageLinkGenerator unit tests * typo fixes * typo fixes * typo fixes * edge_splitter bugfix * review suggestions * review suggestions * black formatting * added Edmonds algorithm for directed minimum spanning tree * Update CHANGELOG.md Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/data/edge_splitter.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * - made more variables private in `DirectedGraphSAGELinkGenerator` - updated stellargraph creation in demo - refactored tests * black formatting * merged develop * fixed edge_splitter to handle directed edges * reverted `DirectedGraphSAGELinkGenerator` `in_sample` and `out_samples` to public for use with `DirectedGraphSAGE` * edited demo handle predictions with directed edges * update DirectedGaphSAGELinkGenerator to use SeededPerBatch * updated parallelism * removed changes to `edge_splitter` and removed demo (will be implemented and tested in separate PR) * decreased parallelism Co-authored-by: Huon Wilson <wilson.huon@gmail.com>

view details

Huon Wilson

commit sha 7647b8751dc965d7b25cabe896768d8a43e9bd0f

Use path: . instead of url/sha256 to define conda package (#941) Using `path: .` makes it easy to build a package from the current state of the source tree, without having to first upload to PyPI, and then pull out the URL and sha256 hash of the uploaded package. As a test, `conda build .` is successful with this change. This does rely on the tree being the appropriate version to be published, but this should be true for someone following the release procedure. See: #742

view details

Kieran Ricardo

commit sha 7f9f45d9c1f4d5f4e327d89fb88710a500639bad

Feature/graphwave chebyshev (#920) * minimal implementation of graphwave * added support for multi-scale wavelets * added in option to select nodes in GraphWaveGenerator.flow() * added docstrings and black formatting * added GraphWave demo. * - fixed laplacian calculation - refactored embedding calculation - added `scales="auto"` option * added GraphWave into api.txt, added experimental tag to GraphWave, and added GraphWave info into the readme * connected issue to experimental tag * docstring typo fix * added copyright header * formatted demo notebook * review suggestions * review suggestions * black formatting * demo formatting * change GraphWave generator to use StellarGraph node indexing * increased parallelism to 33 * removed unnecessary data copying and other optimizations * removed unnecessary data copying and other optimizations * black formatting * added GraphWave auto eigenvalue search * removed option for auto-calculating scales * added documentation about removal of automatic scale calculation * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * review suggestions * review suggestions * demo formatting * add `min_delta` user paramater to GraphWave * tweak to ensure no eigenvalue is missed * black formatting * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * added the option for using Chebyshev polynomials in GraphWave * review suggestions: - explanation of -1e-3 parameter - default values for `scales` parameters + explanation of how changing `scales` will affect the embeddings - filter out nan eigen values and vectors - demo improvements * demo formatting * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * demo tweaks + note about differences with paper * refactored _chebyshev * removed eigs method + added a unit test for `_chebyshev` * made chebyshev more numerically stable * added unit tests + fixes to GraphWave to pass tests * black formatting * removed experimental tags + edited changelog * rename demo notebook * edit doc string * update demo * added the option to cache the GraphWave embeddings in storage * black formatting * copyright headers * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update tests/mapper/test_graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update tests/mapper/test_graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * nested graphwave in changelog * replace `deg` with `degree` * caste arrays in test to np.float32 * caste coefficient arrays to float32 immediately * replaced `np.isclose(..).all()` with `np.array_equals` and `np.allclose()` * removed equality with bools * removed unecessary scalar multiplication * 
added issue ref no to changelog * updated docstring * updated docstring * updated demo * created integer validation function + refactored graphwave tests * refactored tests to not use boradcasting * updated graphwave tests * refactored `_chebyshev` to be more readable * referenced paper in `_chebyshev` * removed `cache_filename` param + blacking * updated params in demo * fixed docstring and edited `require_integer_in_range` error msg * improved documentation * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update CHANGELOG.md Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * improved `require_integer_in_range` * black formatting Co-authored-by: Huon Wilson <wilson.huon@gmail.com>

view details

kevin

commit sha d80f6825fec22ca9005cec9ac08a565c70ce3d4b

Update release procedure (#943) Major update to release procedure here is the change in the order of steps - PyPi and Conda publishing should now be done before pushing to master. This is so that any problems with the publishing steps can be resolved before packaging the release on GitHub. This also adds some missing items from previous experience with the release procedure. Some of the wording and formatting also updated in the document for clarity. Part of #940

view details

Huon Wilson

commit sha dc71b882e59a0a26b1e9e5945b13f18673af4dac

Add CHANGELOG entries for new work since 0.9 (#930) This is created by going over the list of changes since 0.9 and added changelog entries for the interesting ones. It also adds a table with a basic comparison of the memory use and construction time of the new `StellarGraph` object.

view details

Huon Wilson

commit sha 4ffcf7e32806007f199258f7c8625480fb661508

Move modules from `stellargraph.utils` to top-level ones (#938) This moves most of the code within `stellargraph.utils` to dedicated top-level modules, because `utils` is an overly generic name. The old paths still work, but have runtime warnings to aid the migration. This is done by overriding `__getattr__` for the `stellargraph.utils` module (unfortunately this cannot use [PEP 562](https://www.python.org/dev/peps/pep-0562/) because that is Python 3.7+, and we support 3.6), and replacing each of the submodule files with, essentially: ```python warnings.warn(...) from path.to.new_module import * ``` (Unfortunately, I think these have to be files with the same name, so these new small files are stopping git from detecting the moves: all of the large new files are exact moves, with no other changes.) These two forms are both needed because there's two possibilities for how users might be using this: 1. `import stellargraph.utils` followed by `utils.foo`: calls `__getattr__` if it's not otherwise found 2. a direct import of a submodule `import stellargraph.utils.foo`: I believe this just does a file system traversal and opens/executes the given file (i.e. `stellargraph/utils/foo.py` or `stellargraph/utils/foo/__init__.py`) to create the submodule directly (rather than getting attributes of parent modules), and so needs the handling in each file individually Full list of changes: | old | new | |-----------------------------------------------|----------------------------------------------------------------------| | stellargraph.utils.calibration | stellargraph.calibration | | stellargraph.utils.IsotonicCalibration | stellargraph.calibration.IsotonicCalibration | | stellargraph.utils.TemperatureCalibration | stellargraph.calibration.TemperatureCalibration | | stellargraph.utils.expected_calibration_error | stellargraph.calibration.expected_calibration_error | | stellargraph.utils.plot_reliability_diagram | stellargraph.calibration.plot_reliability_diagram | | stellargraph.utils.ensemble | stellargraph.ensemble | | stellargraph.utils.BaggingEnsemble | stellargraph.ensemble.BaggingEnsemble | | stellargraph.utils.Ensemble | stellargraph.ensemble.Ensemble | | stellargraph.utils.saliency_maps | stellargraph.interpretability.saliency_maps | | stellargraph.utils.integrated_gradients | stellargraph.interpretability.saliency_maps.integrated_gradients | | stellargraph.utils.integrated_gradients_gat | stellargraph.interpretability.saliency_maps.integrated_gradients_gat | | stellargraph.utils.saliency_gat | stellargraph.interpretability.saliency_maps.saliency_gat | | stellargraph.utils.GradientSaliencyGAT | stellargraph.interpretability.GradientSaliencyGAT | | stellargraph.utils.IntegratedGradients | stellargraph.interpretability.IntegratedGradients | | stellargraph.utils.IntegratedGradientsGAT | stellargraph.interpretability.IntegratedGradientsGAT | Computed by adding the following to `stellargraph/utils/__init__.py`: ``` for name, (new_module_name, new_value) in sorted( _MAPPING.items(), key=lambda x: ("stellargraph." + x[1][0] + "zzz", x[0]) if x[1][0] is not None else (x[1][1].__name__, "") ): if isinstance(new_value, types.ModuleType): new_location = new_value.__name__ else: new_location = f"stellargraph.{new_module_name}.{new_value.__name__}" print(f"| `stellargraph.utils.{name}` | `{new_location}` |") raise ValueError() ``` See: #909

view details

Huon Wilson

commit sha b090f64937900e51aac7bfd737d34a51aa443b17

Use https for github links in the changelog, not http (#949) These were mistakenly written as 'http' links in #930.

view details

Kieran Ricardo

commit sha 52c2b5ce1b4592336108e36b703d4acf1f71b4ea

bumped version (#952)

view details

Huon Wilson

commit sha ba2a696bb3c8db4f87738d40a323a1cfb7c483f5

Use scores for computing ROC curve, not labels, in YELP example (#950) The ROC AUC reported by the example goes from ~0.9 to ~0.99 with this change. See: #428

view details

Huon Wilson

commit sha 3cf4b99d59cf9dc21c1722ace9c03aae69a7c003

Add MovieLens.load to parse and encode the movielens dataset (#947) This adds a `load` function that returns: - a StellarGraph containing the users (with IDs `u_...`), and movies (IDs `m_...`) as well as "rating" edges, where the features in the users nodes have been encoded and normalised - a pandas DataFrame containing the edges (as in `user_id` and `movie_id`) and their rating label to use for training/testing Example first few rows of each file for reference: `u.data`: ``` 196 242 3 881250949 186 302 3 891717742 22 377 1 878887116 ``` `u.item`: ``` 1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0 2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0 3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0 ``` `u.user`: ``` 1|24|M|technician|85711 2|53|F|other|94043 3|23|M|writer|32067 ``` See: #812

view details

kieranricardo

commit sha e13075bd6eccc0d01fedd489f02a895cb2acc6d6

Release 0.10.0

view details

kieranricardo

commit sha 3b4413ab928a776e443bbc51e6f7603d63b72134

Bump version

view details

kieranricardo

commit sha 25f66b8c588cddd4683ad5f48d7f832af91a5249

Merge remote-tracking branch 'origin/develop' into develop

view details

TheDen

commit sha a18963c8aa8f2ae3066f22a95f7bac44a716ef6f

update docker plugin to v3.5.0

view details

Denis Khoshaba

commit sha 679eae688687c4b868d204ef55420713e412e5a0

update docker-compose plugin to v3.2.0 (#963)

view details

Denis Khoshaba

commit sha 21bd6b81feac4d5d76dc551113ef87404e7cbd80

update docker plugin to v3.5.0 (#964)

view details

Huon Wilson

commit sha 9977dbbbffa3d6928189ffddc768d3cee484f741

Add changelog sections for post-0.10 development (#965)

view details

kevin

commit sha 26e6591469ca7a30d4b3d54c57166d37ce4ce00a

Mention branch protection in release procedure (#962) Part of #961

view details

push time in 19 hours

pull request comment scipy/scipy

ENH: Optimise _cs_matrix._set_many when new entries are all zero

(I wrote this before I saw https://github.com/scipy/scipy/issues/11600#issuecomment-591768286 , which probably changes whether this is a reasonable approach.)

huonw

comment created time in 19 hours

PR opened scipy/scipy

ENH: Optimise _cs_matrix._set_many when new entries are all zero

Reference issue

Closes: gh-11600

What does this implement/fix?

This adds a special case for doing a scatter-set (arrayXarray) on compressed matrices where there are many zeros, and, in particular, where the only new values outside the existing sparsity structure are zero.

Changing the sparsity structure of a csr_matrix or csc_matrix is expensive, so it's worth checking whether that's actually required before doing anything. This accelerates functionality like m.setdiag(0) and m[i, j] = 0 significantly, because it never has to change sparsity.

For the latter case, the Getset.track_fancy_setitem benchmark is extended to also measure setting some indices to zero, in addition to setting them to random values. On csr and csc matrices, with a different sparsity structure, setting to zero is 3.7-67× faster than random.

The example in #11600 (setdiag(0) on a 10000x10000 random CSR matrix) goes from 147ms to ~5ms; essentially the same as the "manual" one described there.
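
(An illustration of the calls this fast path covers; the timings are in the linked issue, this just shows the shape of the affected operations:)

from scipy import sparse

m = sparse.random(10000, 10000, format="csr", random_state=0)

# previously both of these restructured the matrix; with this change they
# notice that only zeros would be inserted and leave the structure alone
m.setdiag(0)
m[0, 0] = 0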

+29 -11

0 comments

3 changed files

pr created time in 19 hours

push event huonw/scipy

Huon Wilson

commit sha a6e67df319690db2907e8930f4428cae4b5e11b3

ENH: Optimise _cs_matrix._set_many when new entries are all zero Changing the sparsity structure of a csr_matrix or csc_matrix is expensive, and so it's a nice idea to check whether that's actually required, before doing anything. This accelerates functionality like `m.setdiag(0)` and `m[i, j] = 0` significantly, because it never has to change sparsity. For the latter case, the `Getset.track_fancy_setitem` benchmark is extended to also measure setting some indices to zero, in addition to setting it to random values. On csr and csc matrices, with a different sparsity structure, setting to zero is 3.7-67× faster than random. The example in #11600 (`setdiag(0)` on a 10000x10000 random CSR matrix) goes from 147ms to ~5ms; essentially the same as the "manual" one described there.

view details

push time in 19 hours

create branch huonw/scipy

branch : special-case-zero-many

created branch time in 20 hours

issue opened scipy/scipy

sparse.csr_matrix.setdiag(0) is slow due to changing sparsity structure

I'm doing some graph manipulations using adjacency matrices, usually stored as scipy.sparse.csr_matrix instances. Some of this involves handling self-loops, which are the diagonal elements. One common operation is removing those loops, i.e. m.setdiag(0).

That operation is very slow (direct in the code below), because it unnecessarily changes the sparsity structure, even though the "new" elements it inserts during that change are just zeros anyway. It's so slow that even converting to COO and then back to CSR is faster (via_coo). Even better than that is using the fact that the new values are zero to only zero out existing structure (manual).

Subtracting out the baseline copying overhead, direct is ~4× slower than via_coo, and 60× slower than manual; with size = 10000.

The same likely applies to arr[x, y] = 0 too.

Reproducing code example:

import numpy as np
from scipy import sparse
import timeit

def run(f, n=5):
    t = timeit.timeit(f, number=n)/n
    print(f"{f.__name__} = {t * 1000:.2f}ms")

size = 10000
csr = sparse.random(size, size, random_state=0).tocsr()

def baseline_just_copy():
    # every example copies, to avoid successive runs from interfering
    # with each other, so measure that overhead
    x = csr.copy()
    
def direct():
    x = csr.copy()

    x.setdiag(0)
    
def via_coo():
    x = csr.copy()

    m = x.tocoo()
    m.setdiag(0)
    m.tocsr()
    
def manual():
    x = csr.copy()

    # manually zero only the non-zero diagonal elements
    nonzero, = x.diagonal().nonzero()
    x[nonzero, nonzero] = 0

run(baseline_just_copy)
run(direct)
run(via_coo)
run(manual)

Output

baseline_just_copy = 2.28ms
direct = 146.91ms
via_coo = 41.54ms
manual = 4.72ms

Error message:

N/A

Scipy/Numpy/Python version information:

1.4.1 1.17.4 sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0)

created time in 20 hours

fork huonw/scipy

Scipy library main repository

https://scipy.org/scipylib/

fork in 20 hours

push event stellargraph/stellargraph

Huon Wilson

commit sha 7562dc6d525f22f983d5e761a065e8e67c6619ed

Fix

view details

push time in a day

delete branch stellargraph/stellargraph

delete branch : bugfix/954-code-climate-method-length

delete time in a day

push event stellargraph/stellargraph

Huon Wilson

commit sha a66c7ffab4b69ac6c6e6088a9a6c1b2fde396b17

Disable the method-lines length check on Code Climate (#967) For the moment, we don't care about long methods, so reducing the noise from this check by turning it off is better than continuing to manually ignore it. This silences 2 issues on Code Climate. See: #954

view details

push time in a day

PR merged stellargraph/stellargraph

Disable the method-lines length check on Code Climate

For the moment, we don't care about long methods, so reducing the noise from this check by turning it off is better than continuing to manually ignore it.

This silences 2 issues on Code Climate.

See: #954

+3 -0

3 comments

1 changed file

huonw

pr closed time in a day

delete branch stellargraph/stellargraph

delete branch : bugfix/954-disable-complexity-checks

delete time in a day

push event stellargraph/stellargraph

Huon Wilson

commit sha 242fd7bebf7e2cbcdfefeae288cc1e2db7d80c3d

Disable some SonarPython checks on Code Climate (#968) This disables three SonarPython checks to reduce noise and make it more likely that the things it flags are useful/relevant. The checks disabled are: - number of parameters in a function: we generally allow many parameters, because we often have many with defaults - cognitive complexity: these are, arguably, not actually a useful measure (see also #595) - FIXME comments: we have `# FIXME(#someissue)` comments recording future work by connecting to our issue tracker This silences 39 issues (out of 145 total, and 72 from SonarPython specifically) on Code Climate. See: #954

view details

push time in a day

PR merged stellargraph/stellargraph

Disable some SonarPython checks on Code Climate

This disables three SonarPython checks to reduce noise and make it more likely that the things it flags are useful/relevant. The checks disabled are:

  • number of parameters in a function: we generally allow many parameters, because we often have many with defaults
  • cognitive complexity: these are, arguably, not actually a useful measure (see also #595)
  • FIXME comments: we have # FIXME(#someissue) comments recording future work by connecting to our issue tracker

This silences 39 issues (out of 145 total, and 72 from SonarPython specifically) on Code Climate.

See: #954
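
(The shape of such a change in .codeclimate.yml is roughly as follows; the rule IDs are the usual SonarPython ones for these three checks, quoted as an illustration rather than copied from the actual diff:)

plugins:
  sonar-python:
    enabled: true
    checks:
      python:S107:   # number of parameters in a function
        enabled: false
      python:S3776:  # cognitive complexity
        enabled: false
      python:S1134:  # FIXME comments
        enabled: false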

+10 -0

2 comments

1 changed file

huonw

pr closed time in a day

delete branch stellargraph/stellargraph

delete branch : bugfix/no-assert-for-input-validation

delete time in a day

push event stellargraph/stellargraph

Huon Wilson

commit sha 865a27c20967d67694b6b8f104cafa5b4a0f5f84

Use `if ...: raise ...` for input validation, not `assert` (#966) `assert`s in Python can disappear, if `-O` is specified. This means that they should be used for optional internal consistency checks, rather than input validation that must happen. Code Climate flags asserts for this reason, including this instance (https://codeclimate.com/github/stellargraph/stellargraph/stellargraph/layer/graph_attention.py/source#issue-340917ec283449fd7ae831f96b47c067) of input validation. Related to #954.

view details

push time in a day

PR merged stellargraph/stellargraph

Use `if ...: raise ...` for input validation, not `assert`

asserts in Python can disappear if -O is specified. This means that they should be used for optional internal consistency checks, rather than for input validation that must always happen.

Code Climate flags asserts for this reason, including this instance of input validation.

Related to #954.
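
(The distinction in practice, as an illustrative example rather than the actual diff:)

# before: silently skipped entirely when running under `python -O`
assert len(shapes) == 2, "expected two input shapes"

# after: always executes, so invalid input is always reported
if len(shapes) != 2:
    raise ValueError(f"expected two input shapes, found {len(shapes)}")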

+3 -3

3 comments

1 changed file

huonw

pr closed time in a day

push eventstellargraph/stellargraph

Huon Wilson

commit sha 70fe4ae5ed44e50d8457a983a58ce578b259ad84

Mention StellarDiGraph

view details

push time in a day

push eventstellargraph/stellargraph

Huon Wilson

commit sha 96d6bcba26a6b61a0c401f7da761b269d70589d2

remove BlogCatalog3._load_from_location

view details

push time in a day

delete branch stellargraph/stellargraph

delete branch : feature/new-changelog

delete time in a day

push eventstellargraph/stellargraph

Huon Wilson

commit sha 9977dbbbffa3d6928189ffddc768d3cee484f741

Add changelog sections for post-0.10 development (#965)

view details

push time in a day

Pull request review commentstellargraph/stellargraph

Implement DistMult

 def build(self):         x_out = self(x_inp)          return x_inp, x_out+++class DistMultScore(Layer):+    """+    DistMult scoring Keras layer.++    Original Paper: Embedding Entities and Relations for Learning and Inference in Knowledge+    Bases. Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, Li Deng. ICLR 2015++    This combines subject, relation and object embeddings into a score of the likelihood of the+    link.+    """++    def __init__(self, *args, **kwargs):+        super().__init__(*args, **kwargs)++    def build(self, input_shape):+        self.built = True++    def call(self, inputs):+        e1, r, e2 = inputs+        # y_(e_1)^T M_r y_(e_2), where M_r = diag(w_r) is a diagonal matrix+        score = tf.reduce_sum(e1 * r * e2, axis=2)+        return score+++@experimental(reason="results from the reference paper have not been reproduced yet")+class DistMult:+    """+    Embedding layers and a DistMult scoring layers that implement the DistMult knowledge graph+    embedding algorithm as in https://arxiv.org/pdf/1412.6575.pdf++    Args:+        generator (KGTripleGenerator): A generator of triples to feed into the model.++        k (int): the dimension of the embedding (that is, a vector in R^k is learnt for each node+            and each link type)++        embedding_initializer (str or func, optional): The initialiser to use for the embeddings.++        embedding_regularizer (str or func, optional): The regularizer to use for the embeddings.+    """++    def __init__(+        self, generator, k, embedding_initializer=None, embedding_regularizer=None,

Ah, unfortunately this doesn't work due to https://github.com/tensorflow/tensorflow/issues/33755: it hits RuntimeError: Cannot use a constraint function on a sparse variable.
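
For context, a minimal sketch of the underlying TensorFlow limitation (the model here is made up; the point is just that gradients for an Embedding table arrive as IndexedSlices, so any constraint on that variable triggers this error with the TF version in the trace below):

import numpy as np
import tensorflow as tf

inp = tf.keras.Input(shape=(1,), dtype="int32")
emb = tf.keras.layers.Embedding(
    10, 4, embeddings_constraint=tf.keras.constraints.UnitNorm()
)(inp)
out = tf.keras.layers.Dense(1)(tf.keras.layers.Flatten()(emb))
model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")

# the embedding's gradients are sparse IndexedSlices, and the optimizer
# refuses to apply a constraint to a sparse update
model.fit(np.arange(10)[:, None], np.zeros(10))  # RuntimeError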

<details><summary>Full trace</summary>

_____________________________________________________________________________________________________________________________________________________________________________ test_dismult _____________________________________________________________________________________________________________________________________________________________________________

knowledge_graph = <stellargraph.core.graph.StellarDiGraph object at 0x107c67cc0>

    def test_dismult(knowledge_graph):
        # this test creates a random untrained model and predicts every possible edge in the graph, and
        # compares that to a direct implementation of the scoring method in the paper
        gen = KGTripleGenerator(knowledge_graph, 3)
    
        # use a random initializer with a large range, so that any differences are obvious
        init = initializers.RandomUniform(-1, 1)
        x_inp, x_out = DistMult(gen, 5, embeddings_initializer=init).build()
    
        model = Model(x_inp, x_out)
    
        model.compile(loss=losses.BinaryCrossentropy(from_logits=True))
    
        every_edge = itertools.product(
            knowledge_graph.nodes(),
            knowledge_graph._edges.types.pandas_index,
            knowledge_graph.nodes(),
        )
        df = triple_df(*every_edge)
    
        # check the model can be trained on a few (uneven) batches
        model.fit(
            gen.flow(df.iloc[:7], negative_samples=2),
>           validation_data=gen.flow(df.iloc[7:14], negative_samples=3),
        )

tests/layer/test_knowledge_graph.py:113: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py:728: in fit
    use_multiprocessing=use_multiprocessing)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py:324: in fit
    total_epochs=epochs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py:123: in run_one_epoch
    batch_outs = execution_function(iterator)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py:86: in execution_function
    distributed_function(input_fn))
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py:457: in __call__
    result = self._call(*args, **kwds)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py:503: in _call
    self._initialize(args, kwds, add_initializers_to=initializer_map)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py:408: in _initialize
    *args, **kwds))
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py:1848: in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py:2150: in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py:2041: in _create_graph_function
    capture_by_value=self._capture_by_value),
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py:915: in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py:358: in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py:73: in distributed_function
    per_replica_function, args=(model, x, y, sample_weights))
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:760: in experimental_run_v2
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:1787: in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:2132: in _call_for_each_replica
    return fn(*args, **kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py:292: in wrapper
    return func(*args, **kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py:264: in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_eager.py:311: in train_on_batch
    output_loss_metrics=output_loss_metrics))
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_eager.py:272: in _process_single_batch
    model.optimizer.apply_gradients(zip(grads, trainable_weights))
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:441: in apply_gradients
    kwargs={"name": name})
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:1917: in merge_call
    return self._merge_call(merge_fn, args, kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:1924: in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:485: in _distributed_apply
    var, apply_grad_to_update_var, args=(grad,), group=False))
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:1530: in update
    return self._update(var, fn, args, kwargs, group)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:2142: in _update
    return self._update_non_slot(var, fn, (var,) + tuple(args), kwargs, group)
../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py:2148: in _update_non_slot
    result = fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

var = <tf.Variable 'DISTMULT_NODE/embeddings:0' shape=(4, 5) dtype=float32, numpy=
array([[ 0.00609207,  0.94447637,  0.8121...223864 , -0.5335603 ],
       [ 0.00211883, -0.8434782 , -0.61788964,  0.14921594,  0.03648853]],
      dtype=float32)>, grad = <tensorflow.python.framework.indexed_slices.IndexedSlices object at 0x14afaa208>

    def apply_grad_to_update_var(var, grad):
      """Apply gradient to variable."""
      if isinstance(var, ops.Tensor):
        raise NotImplementedError("Trying to update a Tensor ", var)
    
      apply_kwargs = {}
      if isinstance(grad, ops.IndexedSlices):
        if var.constraint is not None:
          raise RuntimeError(
>             "Cannot use a constraint function on a sparse variable.")
E         RuntimeError: Cannot use a constraint function on a sparse variable.

../../../.pyenv/versions/3.6.9/envs/sg3/lib/python3.6/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:459: RuntimeError

</details>

huonw

comment created time in a day

fork huonw/tensorflow

An Open Source Machine Learning Framework for Everyone

https://tensorflow.org

fork in a day

PR opened stellargraph/stellargraph

Rename name/label -> attr in to_networkx too

This is doing the same change as #893, but for the StellarGraph -> NetworkX conversion direction.
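
A hypothetical before/after of the rename (the exact keyword names here are assumptions based on the PR title and #893, not copied from the merged API):

# old-style keyword arguments
g_nx = graph.to_networkx(node_type_name="label", edge_type_name="label")

# new-style, consistent with from_networkx after #893
g_nx = graph.to_networkx(node_type_attr="label", edge_type_attr="label")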

+70 -16

0 comment

4 changed files

pr created time in a day

push eventstellargraph/stellargraph

Huon Wilson

commit sha d9b5a721c42cb83d2b6bc64c8dcecbed853369e1

fix newly broken test

view details

push time in a day

create branchstellargraph/stellargraph

branch : feature/to_networkx-attr

created branch time in a day

push eventstellargraph/stellargraph

Huon Wilson

commit sha fb5e524e7fed6cb5449a5aa38e582fd11b71d96f

update test for new name

view details

push time in a day

push eventhuonw/github-review

Huon Wilson

commit sha d6f6a5404b93451440a933dc073a604ee7fe7227

Update documentation for switch to fetching all comments by default The switch happened in #37, but the docs weren't updated.

view details

push time in a day

PR opened charignon/github-review

Update documentation for switch to fetching all comments by default

The switch happened in #37, but the docs weren't updated.

+1 -2

0 comment

1 changed file

pr created time in a day

create branchhuonw/github-review

branch : docs-for-comments

created branch time in a day

fork huonw/github-review

Github code reviews with Emacs.

fork in a day

PR opened magit/forge

Add forge-insert-requested-pullreqs, to see review requests easily

This allows seeing a list of PRs for which someone has requested your review in a status buffer (e.g. by adding it to magit-status-sections-hook). It builds on #241/#244.

It can, for instance, be used with https://github.com/charignon/github-review to review directly from the magit status buffer.

I haven't updated the docs/forge.texi file.

+29 -1

0 comment

2 changed files

pr created time in a day

push eventhuonw/forge

Huon Wilson

commit sha 9ec6040a62c9068ca922f8ece0a566d3705f464e

Add forge-insert-requested-pullreqs, to see review requests easily This allows seeing a list of PRs for which someone has requested your review in a status buffer (e.g. by adding it to `magit-status-sections-hook`). It builds on #241/#244. It can, for instance, be used with https://github.com/charignon/github-review to review directly from the magit status buffer.

view details

push time in a day

create branchhuonw/forge

branch : feature-insert-requested-prs

created branch time in a day

Pull request review commentnumpy/numpy

BUG: fix unique(arr, axis=n) when `0 in arr.shape`

 def test_unique_axis(self):         result = np.array([[-0.0, 0.0]])         assert_array_equal(unique(data, axis=0), result, msg) +    def test_unique_axis_zeros(self):+        # issue 15559+        single_zero = np.empty(shape=(2, 0), dtype=np.int8)+        uniq, idx, inv, cnt = unique(single_zero, axis=0, return_index=True,+                                     return_inverse=True, return_counts=True)++        assert_equal(uniq.dtype, single_zero.dtype)+        msg = "Unique with shape=(2, 0) and axis=0 failed"+        assert_array_equal(uniq, np.empty(shape=(0, 0)), msg)

Ok, thanks for the info.

huonw

comment created time in a day

Pull request review commentnumpy/numpy

BUG: fix unique(arr, axis=n) when `0 in arr.shape`

 def unique(ar, return_index=False, return_inverse=False,         ret = _unique1d(ar, return_index, return_inverse, return_counts)         return _unpack_tuple(ret) +    if ar.size == 0:+        # if there's no elements (at least one axis is 0), the reshaping and+        # viewing doesn't work, but there's also no data, so those+        # manipulations aren't needed to get right answer.++        axis_len = ar.shape[axis]

Ah, nice catch; putting it after the moveaxis makes this a little more awkward (because it has to then also do np.moveaxis back to the right place), but not too bad.
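
Roughly, a sketch of what the post-moveaxis version looks like (the helper name is made up; ar and axis as in the diff above):

import numpy as np

def _unique_empty_axis(ar, axis):
    ar_moved = np.moveaxis(ar, axis, 0)
    if ar_moved.shape[0] == 0:
        # no elements along the axis: move the (empty) result back so the
        # zero-length axis ends up where the caller expects it
        return np.moveaxis(ar_moved, 0, axis)
    ...  # normal unique-along-axis handling continues here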

huonw

comment created time in a day

Pull request review commentnumpy/numpy

BUG: fix unique(arr, axis=n) when `0 in arr.shape`

 def test_unique_axis(self):         result = np.array([[-0.0, 0.0]])         assert_array_equal(unique(data, axis=0), result, msg) +    def test_unique_axis_zeros(self):+        # issue 15559+        single_zero = np.empty(shape=(2, 0), dtype=np.int8)+        uniq, idx, inv, cnt = unique(single_zero, axis=0, return_index=True,+                                     return_inverse=True, return_counts=True)++        # there's 1 element of shape (0,) along axis 0+        assert_equal(uniq.dtype, single_zero.dtype)+        msg = "Unique with shape=(2, 0) and axis=0 failed"+        assert_array_equal(uniq, np.empty(shape=(1, 0)), msg)

Ah, ok 👍

huonw

comment created time in a day

Pull request review commentnumpy/numpy

BUG: fix unique(arr, axis=n) when `0 in arr.shape`

 def unique(ar, return_index=False, return_inverse=False,         ret = _unique1d(ar, return_index, return_inverse, return_counts)         return _unpack_tuple(ret) +    if ar.size == 0:+        # if there's no elements (at least one axis is 0), the reshaping and+        # viewing doesn't work, but there's also no data, so those+        # manipulations aren't needed to get right answer.++        axis_len = ar.shape[axis]+        if axis_len == 0:+            # there's no elements along this axis, so there's no unique+            # elements:+            num_empties = return_index + return_inverse + return_counts+            output = (ar,) + (np.array([], dtype=np.intp),) * num_empties

I think the dtype and shape may be different between these values and ar.
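
Concretely, the extra outputs are deliberately different from ar (a sketch; the helper name is made up):

import numpy as np

def _empty_outputs(ar, num_empties):
    # the unique values keep `ar`'s dtype and shape, while index, inverse
    # and counts are always empty 1-D np.intp arrays
    return (ar,) + (np.array([], dtype=np.intp),) * num_empties

uniq, idx = _empty_outputs(np.empty((0, 3), dtype=np.int8), 1)
assert uniq.dtype == np.int8 and idx.dtype == np.intp  # deliberately differ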

huonw

comment created time in a day

push eventhuonw/numpy

Huon Wilson

commit sha c3abc3b6f91f678a52d145d736ef29a2a12faf3a

Review feedback: AxisError, test neatness, np.zeros

view details

push time in a day

push eventstellargraph/stellargraph

Huon Wilson

commit sha 4d6117497f80cdf0549c3d3c84bbc984daec929f

remove from StellarDiGraph

view details

push time in a day

Pull request review commentstellargraph/stellargraph

Make graphsage link prediction reproducible

 def test_nai(petersen_graph, shuffle):             petersen_graph, targets, [2, 2], tf.optimizers.Adam(1e-3), shuffle=shuffle         )     )+++# FIXME (#970): This test fails intermittently with shuffle=True+@pytest.mark.parametrize("shuffle", [False])+def test_link_prediction(petersen_graph, shuffle):+    num_examples = 4

Hm, num_examples = 4 here with batch_size = 4 means that there'll only ever be one batch? And thus this test isn't actually testing reproducibility with respect to batch orders?
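
For reference, the arithmetic behind the concern (numbers from the quoted test):

import math

num_examples, batch_size = 4, 4
# a single batch, so shuffling the batch order is a no-op; the test would
# need num_examples > batch_size to exercise reproducibility across batches
assert math.ceil(num_examples / batch_size) == 1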

kjun9

comment created time in a day

issue openedpandas-dev/pandas

`isin(index)` is 2-6x slower than `isin(...)` with other data types including list, and orders of magnitude slower than using its hash table

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
import timeit

# data set up:
data_size = 1234
query_size = 10000

data_ndarray = np.random.randint(100000, size=data_size)
data_series = pd.Series(data_ndarray)

list_ = list(range(query_size))
range_index = pd.Index(range(query_size))
array_index = pd.Index(list_)
series = array_index.to_series()
ndarray = array_index.to_numpy()

print(f"{data_series.dtype=}, {range_index.dtype=}, {array_index.dtype=}")

N = 1000
def run(name, f):
    return name, timeit.timeit(f, number=N) / N


df = pd.DataFrame(
    [
        run("list", lambda: data_series.isin(list_)),
        run("range index", lambda: data_series.isin(range_index)),
        run("array index", lambda: data_series.isin(array_index)),
        run("series", lambda: data_series.isin(series)),
        run("ndarray", lambda: data_series.isin(ndarray)),
        # variations on using indices
        run("array index.to_numpy()", lambda: data_series.isin(array_index.to_numpy())),
        run("range index.__contains__", lambda: data_series.apply(range_index.__contains__)),
        run("array index.__contains__", lambda: data_series.apply(array_index.__contains__)),
        run("array index.get_indexer", lambda: array_index.get_indexer(data_series.values) != -1),
        # poke into the internals
        run("array index._engine.mapping.__contains__", lambda: data_series.apply(array_index._engine.mapping.__contains__)),
        run("array index._engine.get_indexer", lambda: array_index._engine.get_indexer(data_series.values) != -1),
        # numpy for comparison
        run("np.isin", lambda: np.isin(data_ndarray, ndarray)),
    ],
    columns=["name", "time"]
).set_index("name")

# double check that the get_indexer version works correctly
assert (data_series.isin(array_index) == (array_index._engine.get_indexer(data_series.values) != -1)).all()

print(df.sort_values("time"))

Output:

data_series.dtype=dtype('int64'), range_index.dtype=dtype('int64'), array_index.dtype=dtype('int64')
                                              time
name                                              
array index._engine.get_indexer           0.000016
array index.get_indexer                   0.000048
array index.to_numpy()                    0.000199
ndarray                                   0.000211
series                                    0.000224
np.isin                                   0.000227
array index._engine.mapping.__contains__  0.000292
array index.__contains__                  0.000586
list                                      0.000624
range index.__contains__                  0.000755
range index                               0.001444
array index                               0.001457

Problem description

The exact ratios and numbers depend very much on the data, especially because the get_indexer forms are O(data_size) (with O(1) indexing into the index's hash table), while the other forms likely depend on query_size, at least to pay the cost of converting the data.
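
In the meantime, a user-side workaround consistent with the timings above (reusing the names from the snippet; speedups will vary with data):

# ~7x faster than data_series.isin(array_index) in the run above
mask = data_series.isin(array_index.to_numpy())

# fastest measured: reuse the index's hash table directly
mask = array_index.get_indexer(data_series.values) != -1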

Expected Output

I'd expect:

  • using a pandas type to always be faster than a Python type like list,
  • it to not be much slower than even a simple variant like .isin(index.to_numpy()),
  • it to use the hash table if it exists, given how much faster that is

Output of pd.show_versions()

<details>

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.1.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.6.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_AU.UTF-8
LOCALE           : en_AU.UTF-8

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.12.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

</details>

created time in a day

push eventstellargraph/stellargraph

Huon Wilson

commit sha 976b5e5a1df65e34f53910eb0fdb81633d0137b7

Fix

view details

push time in 2 days

create branchstellargraph/stellargraph

branch : feature/959-no-deprecations

created branch time in 2 days

push eventstellargraph/stellargraph

Huon Wilson

commit sha 0b0201f107be7577331dd695bd2b89116cee530c

yamllint

view details

push time in 2 days

push eventstellargraph/stellargraph

Huon Wilson

commit sha 14f72d43bd1b83def6d37b274d8e206b25339104

spell the configuration option correctly

view details

push time in 2 days

PR opened stellargraph/stellargraph

Disable some sonar-python checks on Code Climate

This disables three sonar-python checks to reduce noise and make it more likely that the things it flags are useful/relevant. The checks disabled are:

  • number of parameters in a function: we generally allow many parameters, because we often have many with defaults
  • cognitive complexity: this is, arguably, not actually a useful measure (see also #595)
  • FIXME comments: we have # FIXME(#someissue) comments recording future work by connecting to our issue tracker

See: #954

+11 -0

0 comment

1 changed file

pr created time in 2 days

PR opened stellargraph/stellargraph

Disable the method-length check on Code Climate

For the moment, we don't care about long methods, so reducing the noise from this check by turning it off is better than continuing to manually ignore it.

See: #954

+3 -0

0 comment

1 changed file

pr created time in 2 days

PR opened stellargraph/stellargraph

Use `if ...: raise ...` for input validation, not `assert`

asserts in Python can disappear if -O is specified. This means that they should be used for optional internal consistency checks, rather than for input validation.

Code Climate flags asserts for this reason, including this instance.

Related to #954.

+3 -3

0 comment

1 changed file

pr created time in 2 days

push eventstellargraph/stellargraph

Huon Wilson

commit sha 6e3230fd288a0dcfd321503f5e2951b2ff4b11f5

Disable individual sonar-python checks

view details

push time in 2 days

create branchstellargraph/stellargraph

branch : feature/new-changelog

created branch time in 2 days

pull request commentstellargraph/stellargraph

Make graphsage link prediction reproducible

Hm, that's suspicious, but yeah, let's merge if we can't reproduce easily now.

kjun9

comment created time in 2 days

push eventstellargraph/stellargraph

Huon Wilson

commit sha a369ce9f904f5042143d48d1d08746b114168e7e

rerun notebooks

view details

push time in 2 days

Pull request review commentstellargraph/stellargraph

Add MUTAG to datasets

 def test_blogcatalog3_deprecated_load() -> None:         load_dataset_BlogCatalog3(dataset.data_directory)  +def test_mutag_load() -> None:+    graphs, labels = MUTAG().load()++    n_graphs = 188++    assert len(graphs) == n_graphs+    assert len(labels) == n_graphs  # one label per graph++    # get a list with the number of nodes in each graph+    n_nodes = [g.number_of_nodes() for g in graphs]++    # calculate average and max number of nodes across all graphs+    n_avg_nodes = np.mean(n_nodes)+    max_nodes = np.max(n_nodes)++    # average number of nodes should be 17.93085... or approximately 18.+    assert round(n_avg_nodes) == 18

Maybe instead of looking at the averages, we could look at something exact:

    assert sum(n_nodes) == 3371

And similarly, verify that the edge selection is working appropriately, because it's slightly subtle:

n_edges = [g.number_of_edges() for g in graphs]
assert sum(n_edges) == 7442
PantelisElinas

comment created time in 2 days

Pull request review commentstellargraph/stellargraph

Add MUTAG to datasets

 def test_blogcatalog3_deprecated_load() -> None:         load_dataset_BlogCatalog3(dataset.data_directory)  +def test_mutag_load() -> None:+    graphs, labels = MUTAG().load()++    n_graphs = 188++    assert len(graphs) == n_graphs+    assert len(labels) == n_graphs  # one label per graph++    # get a list with the number of nodes in each graph

FWIW, I'm unsure of the value of comments like this, because they just rephrase what the code is saying. It's definitely valuable if the code is subtle, but most of the code here is straightforward. Explaining what the (exact) average is definitely useful, but (effectively) explaining that np.mean calculates an average doesn't seem quite as useful.

PantelisElinas

comment created time in 2 days

Pull request review commentstellargraph/stellargraph

Add MUTAG to datasets

 def test_blogcatalog3_deprecated_load() -> None:         load_dataset_BlogCatalog3(dataset.data_directory)  +def test_mutag_load() -> None:+    graphs, labels = MUTAG().load()++    n_graphs = 188++    assert len(graphs) == n_graphs+    assert len(labels) == n_graphs  # one label per graph++    # get a list with the number of nodes in each graph+    n_nodes = [g.number_of_nodes() for g in graphs]++    # calculate average and max number of nodes across all graphs+    n_avg_nodes = np.mean(n_nodes)+    max_nodes = np.max(n_nodes)++    # average number of nodes should be 17.93085... or approximately 18.+    assert round(n_avg_nodes) == 18+    # maximum number of nodes should be 28+    assert max_nodes == 28++    # There are two labels -1 and 1+    assert len(np.unique(labels)) == 2

I think we could tweak this to be more specific, and thus make the comment above unnecessary:

    assert set(labels) == {-1, 1}
PantelisElinas

comment created time in 2 days

Pull request review commentstellargraph/stellargraph

Add MUTAG to datasets

 class AIFB(     source="https://figshare.com/articles/AIFB_DataSet/745364", ):     pass+++class MUTAG(+    DatasetLoader,+    name="MUTAG",+    directory_name="mutag",+    url="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip",+    url_archive_format="zip",+    expected_files=[+        "MUTAG_A.txt",+        "MUTAG_graph_indicator.txt",+        "MUTAG_node_labels.txt",+        "MUTAG_edge_labels.txt",+        "MUTAG_graph_labels.txt",+        "README.txt",+    ],+    description="Each graph represents a chemical compound and graph labels represent 'their mutagenic effect on a specific gram negative bacterium.'"+    "The dataset includes 188 graphs with 18 nodes and 20 edges on average for each graph. Graph nodes have 7 labels and each graph is labelled as belonging to 1 of 2 classes.",+    source="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/",+):+    def load(self):+        """+        Load this dataset into a list of StellarGraph objects with corresponding labels, downloading it if required.++        Note: Edges in MUTAG are labelled as one of 4 values: aromatic, single, double, and triple indicated by integers+        0, 1, 2, 3 respectively. The edge labels are included in the  :class:`StellarGraph` objects as edge weights in+        integer representation.++        Returns:+            A tuple that is a list of :class:`StellarGraph` objects and a Pandas Series of labels one for each graph.+        """+        self.download()+        return self._load_from_location(location=self.data_directory)++    def _load_from_txt_file(+        self, location, filename, names=None, dtype=None, index_increment=None+    ):+        df = pd.read_csv(+            os.path.join(location, filename),+            header=None,+            index_col=False,+            dtype=dtype,+            names=names,+        )+        if index_increment:+            df.index = df.index + index_increment+        return df++    def _load_from_location(self, location):++        if not os.path.isdir(location):+            raise NotADirectoryError(+                "The location {} is not a directory.".format(location)+            )++        df_graph = self._load_from_txt_file(+            location=location, filename="MUTAG_A.txt", names=["source", "target"]+        )+        df_edge_labels = self._load_from_txt_file(+            location=location,+            filename="MUTAG_edge_labels.txt",+            names=["weight"],+            dtype=int,+        )+        df_graph = pd.concat([df_graph, df_edge_labels], axis=1)  # add edge weights+        df_graph_ids = self._load_from_txt_file(+            location=location,+            filename="MUTAG_graph_indicator.txt",+            names=["graph_id"],+            index_increment=1,+        )+        df_graph_labels = self._load_from_txt_file(+            location=location,+            filename="MUTAG_graph_labels.txt",+            dtype="category",+            names=["label"],+            index_increment=1,+        )  # binary labels {-1, 1}+        df_node_labels = self._load_from_txt_file(+            location=location,+            filename="MUTAG_node_labels.txt",+            dtype="category",+            index_increment=1,+        )++        # Let us one-hot encode the node labels because these are used as node features+        # in graph classification tasks.+        df_node_labels = pd.get_dummies(df_node_labels)++        graphs = []+        for graph_id in np.unique(df_graph_ids):+            # find the subgraph with 
nodes that correspond to graph_id+            node_ids = list(+                df_graph_ids.loc[df_graph_ids["graph_id"] == graph_id].index+            )++            df_subgraph = df_graph[+                df_graph["source"].isin(node_ids) | df_graph["target"].isin(node_ids)

It probably doesn't matter, but strictly, this should be "and", not "or", because otherwise a subgraph could end up with dangling edges pointing to unknown nodes (i.e. not in node_ids):

                df_graph["source"].isin(node_ids) & df_graph["target"].isin(node_ids)

However, I assume that the dataset is designed so that there are no such edges, and the source and target of an edge always have the same graph ID; as in, the graph ID of the source uniquely determines which graph it is in. As such, this could be simplified to:

                df_graph["source"].isin(node_ids)

(This suggestion applies to graph_for_nodes in the group-by comment above.)

PantelisElinas

comment created time in 2 days

Pull request review commentstellargraph/stellargraph

Add MUTAG to datasets

 class AIFB(     source="https://figshare.com/articles/AIFB_DataSet/745364", ):     pass+++class MUTAG(+    DatasetLoader,+    name="MUTAG",+    directory_name="mutag",+    url="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip",+    url_archive_format="zip",+    expected_files=[+        "MUTAG_A.txt",+        "MUTAG_graph_indicator.txt",+        "MUTAG_node_labels.txt",+        "MUTAG_edge_labels.txt",+        "MUTAG_graph_labels.txt",+        "README.txt",+    ],+    description="Each graph represents a chemical compound and graph labels represent 'their mutagenic effect on a specific gram negative bacterium.'"+    "The dataset includes 188 graphs with 18 nodes and 20 edges on average for each graph. Graph nodes have 7 labels and each graph is labelled as belonging to 1 of 2 classes.",+    source="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/",+):+    def load(self):+        """+        Load this dataset into a list of StellarGraph objects with corresponding labels, downloading it if required.++        Note: Edges in MUTAG are labelled as one of 4 values: aromatic, single, double, and triple indicated by integers+        0, 1, 2, 3 respectively. The edge labels are included in the  :class:`StellarGraph` objects as edge weights in+        integer representation.++        Returns:+            A tuple that is a list of :class:`StellarGraph` objects and a Pandas Series of labels one for each graph.+        """+        self.download()+        return self._load_from_location(location=self.data_directory)++    def _load_from_txt_file(+        self, location, filename, names=None, dtype=None, index_increment=None+    ):+        df = pd.read_csv(+            os.path.join(location, filename),+            header=None,+            index_col=False,+            dtype=dtype,+            names=names,+        )+        if index_increment:+            df.index = df.index + index_increment+        return df++    def _load_from_location(self, location):++        if not os.path.isdir(location):+            raise NotADirectoryError(+                "The location {} is not a directory.".format(location)+            )++        df_graph = self._load_from_txt_file(+            location=location, filename="MUTAG_A.txt", names=["source", "target"]+        )+        df_edge_labels = self._load_from_txt_file(+            location=location,+            filename="MUTAG_edge_labels.txt",+            names=["weight"],+            dtype=int,+        )+        df_graph = pd.concat([df_graph, df_edge_labels], axis=1)  # add edge weights+        df_graph_ids = self._load_from_txt_file(+            location=location,+            filename="MUTAG_graph_indicator.txt",+            names=["graph_id"],+            index_increment=1,+        )+        df_graph_labels = self._load_from_txt_file(+            location=location,+            filename="MUTAG_graph_labels.txt",+            dtype="category",+            names=["label"],+            index_increment=1,+        )  # binary labels {-1, 1}+        df_node_labels = self._load_from_txt_file(+            location=location,+            filename="MUTAG_node_labels.txt",+            dtype="category",+            index_increment=1,+        )++        # Let us one-hot encode the node labels because these are used as node features+        # in graph classification tasks.+        df_node_labels = pd.get_dummies(df_node_labels)++        graphs = []+        for graph_id in np.unique(df_graph_ids):

I think this can be expressed more directly as a group-by operation using df_graph_ids["graph_id"] as the grouping key (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#iterating-through-groups), which saves us from having to do the manual comparisons to find the appropriate nodes in each set, as well as saving us from having to do the separate df_node_labels indexing:

def graph_for_nodes(nodes):
    edges = df_graph[df_graph["source"].isin(nodes.index) | df_graph["target"].isin(nodes.index)]
    return StellarGraph(nodes, edges)

groups = df_node_labels.groupby(df_graph_ids["graph_id"])
graphs = [graph_for_nodes(nodes) for _, nodes in groups]
PantelisElinas

comment created time in 2 days

Pull request review commentstellargraph/stellargraph

Add MUTAG to datasets

 class AIFB(     source="https://figshare.com/articles/AIFB_DataSet/745364", ):     pass+++class MUTAG(+    DatasetLoader,+    name="MUTAG",+    directory_name="mutag",+    url="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip",+    url_archive_format="zip",+    expected_files=[+        "MUTAG_A.txt",+        "MUTAG_graph_indicator.txt",+        "MUTAG_node_labels.txt",+        "MUTAG_edge_labels.txt",+        "MUTAG_graph_labels.txt",+        "README.txt",+    ],+    description="Each graph represents a chemical compound and graph labels represent 'their mutagenic effect on a specific gram negative bacterium.'"+    "The dataset includes 188 graphs with 18 nodes and 20 edges on average for each graph. Graph nodes have 7 labels and each graph is labelled as belonging to 1 of 2 classes.",+    source="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/",+):+    def load(self):+        """+        Load this dataset into a list of StellarGraph objects with corresponding labels, downloading it if required.++        Note: Edges in MUTAG are labelled as one of 4 values: aromatic, single, double, and triple indicated by integers+        0, 1, 2, 3 respectively. The edge labels are included in the  :class:`StellarGraph` objects as edge weights in+        integer representation.++        Returns:+            A tuple that is a list of :class:`StellarGraph` objects and a Pandas Series of labels one for each graph.+        """+        self.download()+        return self._load_from_location(location=self.data_directory)++    def _load_from_txt_file(+        self, location, filename, names=None, dtype=None, index_increment=None+    ):+        df = pd.read_csv(+            os.path.join(location, filename),+            header=None,+            index_col=False,+            dtype=dtype,+            names=names,+        )+        if index_increment:+            df.index = df.index + index_increment+        return df++    def _load_from_location(self, location):++        if not os.path.isdir(location):+            raise NotADirectoryError(+                "The location {} is not a directory.".format(location)+            )++        df_graph = self._load_from_txt_file(+            location=location, filename="MUTAG_A.txt", names=["source", "target"]+        )+        df_edge_labels = self._load_from_txt_file(+            location=location,+            filename="MUTAG_edge_labels.txt",+            names=["weight"],+            dtype=int,+        )+        df_graph = pd.concat([df_graph, df_edge_labels], axis=1)  # add edge weights+        df_graph_ids = self._load_from_txt_file(+            location=location,+            filename="MUTAG_graph_indicator.txt",+            names=["graph_id"],+            index_increment=1,

Why add one to the IDs?

PantelisElinas

comment created time in 2 days

Pull request review commentstellargraph/stellargraph

Add MUTAG to datasets

 class AIFB(     source="https://figshare.com/articles/AIFB_DataSet/745364", ):     pass+++class MUTAG(+    DatasetLoader,+    name="MUTAG",+    directory_name="mutag",

I think the zip contains MUTAG as the folder name, i.e., uppercase, not lowercase, which is causing CI to fail:

    directory_name="MUTAG",
PantelisElinas

comment created time in 2 days

Pull request review commentstellargraph/stellargraph

Add MUTAG to datasets

 class AIFB(     source="https://figshare.com/articles/AIFB_DataSet/745364", ):     pass+++class MUTAG(+    DatasetLoader,+    name="MUTAG",+    directory_name="mutag",+    url="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip",+    url_archive_format="zip",+    expected_files=[+        "MUTAG_A.txt",+        "MUTAG_graph_indicator.txt",+        "MUTAG_node_labels.txt",+        "MUTAG_edge_labels.txt",+        "MUTAG_graph_labels.txt",+        "README.txt",+    ],+    description="Each graph represents a chemical compound and graph labels represent 'their mutagenic effect on a specific gram negative bacterium.'"+    "The dataset includes 188 graphs with 18 nodes and 20 edges on average for each graph. Graph nodes have 7 labels and each graph is labelled as belonging to 1 of 2 classes.",+    source="https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/",+):+    def load(self):+        """+        Load this dataset into a list of StellarGraph objects with corresponding labels, downloading it if required.++        Note: Edges in MUTAG are labelled as one of 4 values: aromatic, single, double, and triple indicated by integers+        0, 1, 2, 3 respectively. The edge labels are included in the  :class:`StellarGraph` objects as edge weights in+        integer representation.++        Returns:+            A tuple that is a list of :class:`StellarGraph` objects and a Pandas Series of labels one for each graph.+        """+        self.download()+        return self._load_from_location(location=self.data_directory)++    def _load_from_txt_file(+        self, location, filename, names=None, dtype=None, index_increment=None+    ):+        df = pd.read_csv(+            os.path.join(location, filename),+            header=None,+            index_col=False,+            dtype=dtype,+            names=names,+        )+        if index_increment:+            df.index = df.index + index_increment+        return df++    def _load_from_location(self, location):

BlogCatalog3 only took the approach of having a separate _load_from_location function because it was a good way to maintain compatibility with load_dataset_BlogCatalog3 without duplication. In this case, I think it would be clearer to not have a separate location parameter, and just use self.data_directory or, better, self._resolve_path directly:

For instance, this _load_from_location function could be folded into load:

def load(self):
    """..."""
    self.download()
    
    df_graph = self._load_from_txt_file(...)
    ...

And, _load_from_txt_file could become:

def _load_from_txt_file(self, filename, ...):
    pd.read_csv(self._resolve_path(filename), ...)
    ...
PantelisElinas

comment created time in 2 days

push eventstellargraph/stellargraph

Tim Pitman

commit sha a813bd308c62a004eed868d0963963c61b5d7776

Restore demo script for link prediction via random walks (#895) (#935)

view details

Kieran Ricardo

commit sha 55402cf2d6dbf17d5f61bd286b06a81efeca9898

moved the GraphWave "args" doc into the main class docstring (#944) * moved the GraphWave "args" doc into the main class docstring * fix formatting issue

view details

Kieran Ricardo

commit sha 6880c6d0f5c0c188ab63b601c5c7f3d5e8cec6ed

Directed graphsage link generator (#879) * added support for link prediction with directed graphsage * added demo for directed graphsage * updated demo + formatting * DirectedGraphSageLinkGenerator unit tests * typo fixes * typo fixes * typo fixes * edge_splitter bugfix * review suggestions * review suggestions * black formatting * added Edmonds algorithm for directed minimum spanning tree * Update CHANGELOG.md Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/data/edge_splitter.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * - made more variables private in `DirectedGraphSAGELinkGenerator` - updated stellargraph creation in demo - refactored tests * black formatting * merged develop * fixed edge_splitter to handle directed edges * reverted `DirectedGraphSAGELinkGenerator` `in_sample` and `out_samples` to public for use with `DirectedGraphSAGE` * edited demo handle predictions with directed edges * update DirectedGaphSAGELinkGenerator to use SeededPerBatch * updated parallelism * removed changes to `edge_splitter` and removed demo (will be implemented and tested in separate PR) * decreased parallelism Co-authored-by: Huon Wilson <wilson.huon@gmail.com>

view details

Huon Wilson

commit sha 7647b8751dc965d7b25cabe896768d8a43e9bd0f

Use path: . instead of url/sha256 to define conda package (#941) Using `path: .` makes it easy to build a package from the current state of the source tree, without having to first upload to PyPI, and then pull out the URL and sha256 hash of the uploaded package. As a test, `conda build .` is successful with this change. This does rely on the tree being the appropriate version to be published, but this should be true for someone following the release procedure. See: #742

view details

Kieran Ricardo

commit sha 7f9f45d9c1f4d5f4e327d89fb88710a500639bad

Feature/graphwave chebyshev (#920) * minimal implementation of graphwave * added support for multi-scale wavelets * added in option to select nodes in GraphWaveGenerator.flow() * added docstrings and black formatting * added GraphWave demo. * - fixed laplacian calculation - refactored embedding calculation - added `scales="auto"` option * added GraphWave into api.txt, added experimental tag to GraphWave, and added GraphWave info into the readme * connected issue to experimental tag * docstring typo fix * added copyright header * formatted demo notebook * review suggestions * review suggestions * black formatting * demo formatting * change GraphWave generator to use StellarGraph node indexing * increased parallelism to 33 * removed unnecessary data copying and other optimizations * removed unnecessary data copying and other optimizations * black formatting * added GraphWave auto eigenvalue search * removed option for auto-calculating scales * added documentation about removal of automatic scale calculation * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * review suggestions * review suggestions * demo formatting * add `min_delta` user paramater to GraphWave * tweak to ensure no eigenvalue is missed * black formatting * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * added the option for using Chebyshev polynomials in GraphWave * review suggestions: - explanation of -1e-3 parameter - default values for `scales` parameters + explanation of how changing `scales` will affect the embeddings - filter out nan eigen values and vectors - demo improvements * demo formatting * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * demo tweaks + note about differences with paper * refactored _chebyshev * removed eigs method + added a unit test for `_chebyshev` * made chebyshev more numerically stable * added unit tests + fixes to GraphWave to pass tests * black formatting * removed experimental tags + edited changelog * rename demo notebook * edit doc string * update demo * added the option to cache the GraphWave embeddings in storage * black formatting * copyright headers * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update tests/mapper/test_graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update tests/mapper/test_graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * nested graphwave in changelog * replace `deg` with `degree` * caste arrays in test to np.float32 * caste coefficient arrays to float32 immediately * replaced `np.isclose(..).all()` with `np.array_equals` and `np.allclose()` * removed equality with bools * removed unecessary scalar multiplication * 
added issue ref no to changelog * updated docstring * updated docstring * updated demo * created integer validation function + refactored graphwave tests * refactored tests to not use boradcasting * updated graphwave tests * refactored `_chebyshev` to be more readable * referenced paper in `_chebyshev` * removed `cache_filename` param + blacking * updated params in demo * fixed docstring and edited `require_integer_in_range` error msg * improved documentation * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update CHANGELOG.md Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * improved `require_integer_in_range` * black formatting Co-authored-by: Huon Wilson <wilson.huon@gmail.com>

view details

kevin

commit sha d80f6825fec22ca9005cec9ac08a565c70ce3d4b

Update release procedure (#943) Major update to release procedure here is the change in the order of steps - PyPi and Conda publishing should now be done before pushing to master. This is so that any problems with the publishing steps can be resolved before packaging the release on GitHub. This also adds some missing items from previous experience with the release procedure. Some of the wording and formatting also updated in the document for clarity. Part of #940

view details

Huon Wilson

commit sha dc71b882e59a0a26b1e9e5945b13f18673af4dac

Add CHANGELOG entries for new work since 0.9 (#930) This is created by going over the list of changes since 0.9 and added changelog entries for the interesting ones. It also adds a table with a basic comparison of the memory use and construction time of the new `StellarGraph` object.

view details

Huon Wilson

commit sha 4ffcf7e32806007f199258f7c8625480fb661508

Move modules from `stellargraph.utils` to top-level ones (#938) This moves most of the code within `stellargraph.utils` to dedicated top-level modules, because `utils` is an overly generic name. The old paths still work, but have runtime warnings to aid the migration. This is done by overriding `__getattr__` for the `stellargraph.utils` module (unfortunately this cannot use [PEP 562](https://www.python.org/dev/peps/pep-0562/) because that is Python 3.7+, and we support 3.6), and replacing each of the submodule files with, essentially: ```python warnings.warn(...) from path.to.new_module import * ``` (Unfortunately, I think these have to be files with the same name, so these new small files are stopping git from detecting the moves: all of the large new files are exact moves, with no other changes.) These two forms are both needed because there's two possibilities for how users might be using this: 1. `import stellargraph.utils` followed by `utils.foo`: calls `__getattr__` if it's not otherwise found 2. a direct import of a submodule `import stellargraph.utils.foo`: I believe this just does a file system traversal and opens/executes the given file (i.e. `stellargraph/utils/foo.py` or `stellargraph/utils/foo/__init__.py`) to create the submodule directly (rather than getting attributes of parent modules), and so needs the handling in each file individually Full list of changes: | old | new | |-----------------------------------------------|----------------------------------------------------------------------| | stellargraph.utils.calibration | stellargraph.calibration | | stellargraph.utils.IsotonicCalibration | stellargraph.calibration.IsotonicCalibration | | stellargraph.utils.TemperatureCalibration | stellargraph.calibration.TemperatureCalibration | | stellargraph.utils.expected_calibration_error | stellargraph.calibration.expected_calibration_error | | stellargraph.utils.plot_reliability_diagram | stellargraph.calibration.plot_reliability_diagram | | stellargraph.utils.ensemble | stellargraph.ensemble | | stellargraph.utils.BaggingEnsemble | stellargraph.ensemble.BaggingEnsemble | | stellargraph.utils.Ensemble | stellargraph.ensemble.Ensemble | | stellargraph.utils.saliency_maps | stellargraph.interpretability.saliency_maps | | stellargraph.utils.integrated_gradients | stellargraph.interpretability.saliency_maps.integrated_gradients | | stellargraph.utils.integrated_gradients_gat | stellargraph.interpretability.saliency_maps.integrated_gradients_gat | | stellargraph.utils.saliency_gat | stellargraph.interpretability.saliency_maps.saliency_gat | | stellargraph.utils.GradientSaliencyGAT | stellargraph.interpretability.GradientSaliencyGAT | | stellargraph.utils.IntegratedGradients | stellargraph.interpretability.IntegratedGradients | | stellargraph.utils.IntegratedGradientsGAT | stellargraph.interpretability.IntegratedGradientsGAT | Computed by adding the following to `stellargraph/utils/__init__.py`: ``` for name, (new_module_name, new_value) in sorted( _MAPPING.items(), key=lambda x: ("stellargraph." + x[1][0] + "zzz", x[0]) if x[1][0] is not None else (x[1][1].__name__, "") ): if isinstance(new_value, types.ModuleType): new_location = new_value.__name__ else: new_location = f"stellargraph.{new_module_name}.{new_value.__name__}" print(f"| `stellargraph.utils.{name}` | `{new_location}` |") raise ValueError() ``` See: #909

view details

Huon Wilson

commit sha b090f64937900e51aac7bfd737d34a51aa443b17

Use https for github links in the changelog, not http (#949) These were mistakenly written as 'http' links in #930.

view details

Kieran Ricardo

commit sha 52c2b5ce1b4592336108e36b703d4acf1f71b4ea

bumped version (#952)

view details

Huon Wilson

commit sha ba2a696bb3c8db4f87738d40a323a1cfb7c483f5

Use scores for computing ROC curve, not labels, in YELP example (#950) The ROC AUC reported by the example goes from ~0.9 to ~0.99 with this change. See: #428

view details

Huon Wilson

commit sha 3cf4b99d59cf9dc21c1722ace9c03aae69a7c003

Add MovieLens.load to parse and encode the movielens dataset (#947) This adds a `load` function that returns: - a StellarGraph containing the users (with IDs `u_...`), and movies (IDs `m_...`) as well as "rating" edges, where the features in the users nodes have been encoded and normalised - a pandas DataFrame containing the edges (as in `user_id` and `movie_id`) and their rating label to use for training/testing Example first few rows of each file for reference: `u.data`: ``` 196 242 3 881250949 186 302 3 891717742 22 377 1 878887116 ``` `u.item`: ``` 1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0 2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0 3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0 ``` `u.user`: ``` 1|24|M|technician|85711 2|53|F|other|94043 3|23|M|writer|32067 ``` See: #812

view details

kieranricardo

commit sha e13075bd6eccc0d01fedd489f02a895cb2acc6d6

Release 0.10.0

view details

kieranricardo

commit sha 3b4413ab928a776e443bbc51e6f7603d63b72134

Bump version

view details

kieranricardo

commit sha 25f66b8c588cddd4683ad5f48d7f832af91a5249

Merge remote-tracking branch 'origin/develop' into develop

view details

Huon Wilson

commit sha 1458135c33f235867a85ff4a58c0d966c7545740

Merge remote-tracking branch 'origin/develop' into feature/755-distmult

view details

Huon Wilson

commit sha 64d2ee049caaffa138f6803d1a8492f6a907eec5

Fixes

view details

push time in 2 days

Pull request review commentstellargraph/stellargraph

Implement DistMult

 def build(self):         x_out = self(x_inp)          return x_inp, x_out+++class DistMultScore(Layer):+    """+    DistMult scoring Keras layer.++    Original Paper: Embedding Entities and Relations for Learning and Inference in Knowledge+    Bases. Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, Li Deng. ICLR 2015++    This combines subject, relation and object embeddings into a score of the likelihood of the+    link.+    """++    def __init__(self, *args, **kwargs):+        super().__init__(*args, **kwargs)++    def build(self, input_shape):+        self.built = True++    def call(self, inputs):+        e1, r, e2 = inputs+        # y_(e_1)^T M_r y_(e_2), where M_r = diag(w_r) is a diagonal matrix+        score = tf.reduce_sum(e1 * r * e2, axis=2)+        return score+++@experimental(reason="results from the reference paper have not been reproduced yet")+class DistMult:+    """+    Embedding layers and a DistMult scoring layers that implement the DistMult knowledge graph+    embedding algorithm as in https://arxiv.org/pdf/1412.6575.pdf++    Args:+        generator (KGTripleGenerator): A generator of triples to feed into the model.++        k (int): the dimension of the embedding (that is, a vector in R^k is learnt for each node+            and each link type)++        embedding_initializer (str or func, optional): The initialiser to use for the embeddings.++        embedding_regularizer (str or func, optional): The regularizer to use for the embeddings.+    """++    def __init__(+        self, generator, k, embedding_initializer=None, embedding_regularizer=None,+    ):+        if not isinstance(generator, KGTripleGenerator):+            raise TypeError(+                f"generator: expected KGTripleGenerator, found {type(generator).__name__}"+            )++        graph = generator.G+        self.num_nodes = graph.number_of_nodes()+        self.num_edge_types = len(graph._edges.types)+        self.k = k

Much better.

huonw

comment created time in 2 days
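For reference, a minimal NumPy sketch (toy, made-up embedding values) of the product-and-sum that `DistMultScore.call` computes, checked against the explicit bilinear form with a diagonal relation matrix:

```python
import numpy as np

# Toy subject, relation and object embeddings with dimension k = 4.
e1 = np.array([[0.1, -0.2, 0.3, 0.0]])
r = np.array([[1.0, 0.5, -1.0, 2.0]])
e2 = np.array([[0.2, 0.1, 0.0, -0.3]])

# The DistMult score e1^T diag(r) e2 reduces to an elementwise product then a sum.
score = (e1 * r * e2).sum(axis=-1)

# Same result via the explicit diagonal matrix M_r = diag(r).
assert np.allclose(score, e1 @ np.diag(r[0]) @ e2.T)
```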

Pull request review comment stellargraph/stellargraph

Implement DistMult

```python
    def __init__(
        self, generator, k, embedding_initializer=None, embedding_regularizer=None,
    ):
```

I've gone with uniform to match TF's default.

huonw

comment created time in 2 days
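For context on matching TF's default: `tf.keras.layers.Embedding` defaults its `embeddings_initializer` to `"uniform"`, so falling back to that string is equivalent to the explicit form below (the dimensions are made-up values):

```python
import tensorflow as tf

# Equivalent to leaving embeddings_initializer at its Keras default.
embedding = tf.keras.layers.Embedding(
    input_dim=100,  # e.g. number of nodes; made-up value
    output_dim=32,  # embedding dimension k; made-up value
    embeddings_initializer="uniform",
)
```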

Pull request review comment stellargraph/stellargraph

Implement DistMult

```python
    def __init__(
        self, generator, k, embedding_initializer=None, embedding_regularizer=None,
    ):
```

Oh, I forgot to update this after doing it for ComplEx. Thanks.

huonw

comment created time in 2 days

Pull request review comment stellargraph/stellargraph

Implement DistMult

```python
    def __init__(
        self, generator, k, embedding_initializer=None, embedding_regularizer=None,
    ):
```

Ah, yeah, nice catch. 👍

huonw

comment created time in 2 days

push event stellargraph/stellargraph

Huon Wilson

commit sha 2eb67110a63a7238ccdd9a0ce0ab74d0b6e3a258

Add buildkite annotation when notebook parallelism setting is wrong (#916) If the number of notebooks doesn't match the `parallelism` setting used for testing notebooks, all of those steps will fail, and one needs to go into them to deduce what the problem was, and then interpret the message. This PR (a) rewrites the message to be a bit clearer, including mentioning the common case of needing to merge with develop, and (b) adds it as an annotation (https://buildkite.com/docs/agent/v3/cli-annotate) at the top of the build for convenience.

view details

Huon Wilson

commit sha a01bc2a266df545604967a6dfee9f4dd55af8dfd

Provide convenient link for seeing CI notebooks on nbviewer (#915)

This parses the `buildkite-agent artifact upload` output (https://buildkite.com/docs/agent/v3/cli-artifact#uploading-artifacts) to pull out the artifact UUID, and then constructs the appropriate https://nbviewer.jupyter.org/ URL for seeing the notebook rendered. If the notebook failed, this information is also added as an annotation (https://buildkite.com/docs/agent/v3/cli-annotate) at the top of the build, with clickable links, so one doesn't need to dig through the logs of each individual step.

For example, https://buildkite.com/stellar/stellargraph-public/builds/1631 has two annotations, one of which looks like:

> Notebook cluster-gcn-node-classification.ipynb had an error: [failed job][1], [rendered notebook][2]

[1]: https://buildkite.com/stellar/stellargraph-public/builds/1631#4f6791c9-3bd3-4b37-a8a3-afcd654c0b96
[2]: https://nbviewer.jupyter.org/urls/buildkite.com/organizations/stellar/pipelines/stellargraph-public/builds/1631/jobs/4f6791c9-3bd3-4b37-a8a3-afcd654c0b96/artifacts/61446afa-8bd2-43c2-bf5c-9d5f88a9191b

(An alternative implementation strategy would be to read the artifact ID using the artifacts API (https://buildkite.com/docs/apis/rest-api/artifacts), however having a token and JSON parsing code just for this seems somewhat like complexity we don't need.)

See: #914

view details
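For illustration, the URL construction described above can be sketched as a small helper; the function name and parameters are hypothetical, not part of the PR:

```python
def nbviewer_url(org, pipeline, build, job_id, artifact_id):
    # Mirrors the URL shape in the example annotation above.
    return (
        "https://nbviewer.jupyter.org/urls/buildkite.com/organizations/"
        f"{org}/pipelines/{pipeline}/builds/{build}/jobs/{job_id}/artifacts/{artifact_id}"
    )

print(
    nbviewer_url(
        "stellar",
        "stellargraph-public",
        1631,
        "4f6791c9-3bd3-4b37-a8a3-afcd654c0b96",
        "61446afa-8bd2-43c2-bf5c-9d5f88a9191b",
    )
)
```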

Huon Wilson

commit sha 0b5b7746ac5a15a9a4e3f9d82f9969ebb4529754

Add a 'plot_history' function for viewing metrics across epochs (#902) This factors out the `plot_history` function that has been, essentially, copy-and-pasted into every notebook into a function in the library. It places it in `stellargraph.utils.plot_history` (in `utils` because there's not an obvious better place at the moment: related to #909) The plot allows for multiple histories (for ensemble models) and validation history (optional). It puts all of the plots in a column, sharing the x-axis (epoch number). This doesn't rerun all of the notebooks that were updated: - `rgcn-aifb-node-classification-example.ipynb` fails: #911 - `hateful-twitters.ipynb` fails: #912 - `directed-graphsage-on-cora-neo4j-example.ipynb` and `undirected-graphsage-on-cora-neo4j-example.ipynb`: I haven't got Neo4j running locally (relates to #849 and #904) See: #898

view details
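A minimal usage sketch (the Keras model here is a toy, purely to produce a `History` object; `plot_history`'s exact options may differ):

```python
import numpy as np
import tensorflow as tf
from stellargraph.utils import plot_history

# Train a trivial model on random data, just to obtain a History object.
x = np.random.rand(32, 4)
y = np.random.randint(0, 2, size=(32, 1))
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["acc"])
history = model.fit(x, y, epochs=5, validation_split=0.25, verbose=0)

# One subplot per metric (plus its val_ counterpart), sharing the epoch axis.
plot_history(history)
```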

Huon Wilson

commit sha 20b299e86a3fb4512edaa743f02ca311d13bf88d

Use SeededPerBatch in KGTripleSequence (#924) Now that #844 is merged, `KGTripleSequence` can use `SeededPerBatch` instead of manually reproducing the appropriate random seed separation.

view details
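The idea behind per-batch seeding, as a standalone sketch; this is illustrative NumPy, not the actual `SeededPerBatch` implementation:

```python
import numpy as np

# Deriving an RNG from (global seed, batch number) means __getitem__(batch_num)
# draws the same random values no matter which worker requests the batch, or
# in what order batches are requested.
def rng_for_batch(global_seed, batch_num):
    return np.random.default_rng([global_seed, batch_num])

first = rng_for_batch(2020, batch_num=3).integers(0, 100, size=5)
again = rng_for_batch(2020, batch_num=3).integers(0, 100, size=5)
assert (first == again).all()  # reproducible per batch
```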

Huon Wilson

commit sha 76df74542ccc6efa78f7796c75129acfd00affcb

Sort notebooks before deciding which to run in each step on CI (#918)

I think #917 is caused by different steps getting different orders of the `$NOTEBOOKS` array, due to differing orders from `find`. This would mean that some steps might be selecting a notebook that was run by a different step and skipping over the notebook they were meant to be running.

For example, consider build 1643 (https://buildkite.com/stellar/stellargraph-public/builds/1643), which is the same code as `develop`, but prints out the array of notebooks in each notebook step to allow for comparisons.

- in notebook step 0 https://buildkite.com/stellar/stellargraph-public/builds/1643#b3bbd052-b8d9-4574-9f7a-bfc93dcb4304/162-277 the list of notebooks goes:
  - ...
  - `gcn-cora-node-classification-example.ipynb`
  - **`directed-graphsage-on-cora-example.ipynb`**
  - `graphsage-cora-node-classification-example.ipynb`
  - `graphsage-pubmed-inductive-node-classification-example.ipynb`
  - `stellargraph-node2vec-node-classification.ipynb`
  - ...
- in notebook step 3 https://buildkite.com/stellar/stellargraph-public/builds/1643#a0a520f4-aa28-47fb-b224-ea8deeb0b2a9/162-277 it goes:
  - ...
  - `gcn-cora-node-classification-example.ipynb`
  - `graphsage-cora-node-classification-example.ipynb`
  - `graphsage-pubmed-inductive-node-classification-example.ipynb`
  - **`directed-graphsage-on-cora-example.ipynb`**
  - `stellargraph-node2vec-node-classification.ipynb`
  - ...

That is, the position of `directed-graphsage-on-cora-example.ipynb` in the `NOTEBOOKS` array has changed. In that particular build, steps 6, 22 and 25 have the same order as step 3, step 12 has `load-cora-into-neo4j.ipynb` move, step 30 has `node-link-importance-demo-gcn.ipynb` move and step 33 has both `calibration-pubmed-link-prediction.ipynb` and `ensemble-link-prediction-example.ipynb` move. It seems that build got lucky, and those moves didn't affect which notebooks were run (that is, the mini script in #917 didn't find any differences), however they definitely could have done so.

If we sort by the file path, we'll always get a consistent order in the array, so the `f=${NOTEBOOKS[$INDEX]}` selection will always give (a) the same result for same `$INDEX` values, and (b) different notebooks for different `$INDEX` values.

Empirical verification: I did 10 builds with sorting on CI (1651, 1655, 1656, 1657, 1658, 1659, 1660, 1661, 1662, 1663), and checked them all with the mini script in #917; all of them built exactly the notebooks they should build. Specifically, the following printed no diff output, and thus all of the input pairs (list of expected notebooks, and list of artifacts) are identical.

```bash
t=<your buildkite token>
bad_notebooks='attacks_clustering_analysis.ipynb\|hateful-twitters-interpretability.ipynb\|hateful-twitters.ipynb\|stellargraph-attri2vec-DBLP.ipynb\|node-link-importance-demo-gat.ipynb\|node-link-importance-demo-gcn.ipynb\|node-link-importance-demo-gcn-sparse.ipynb\|rgcn-aifb-node-classification-example.ipynb\|directed-graphsage-on-cora-neo4j-example.ipynb\|undirected-graphsage-on-cora-neo4j-example.ipynb\|load-cora-into-neo4j.ipynb\|stellargraph-metapath2vec.ipynb'
for b in 1651 1655 1656 1657 1658 1659 1660 1661 1662 1663; do
  diff -U100 \
    <(find . -name '*.ipynb' '!' -path '*.ipynb_checkpoints*' | xargs basename | grep -v "$bad_notebooks" | sort) \
    <(curl -H "Authorization: Bearer $t" "https://api.buildkite.com/v2/organizations/stellar/pipelines/stellargraph-public/builds/$b/artifacts" | jq -r '.[].filename' | grep ipynb | sort)
done
```

See: #917

view details

Tim Pitman

commit sha a813bd308c62a004eed868d0963963c61b5d7776

Restore demo script for link prediction via random walks (#895) (#935)

view details

Kieran Ricardo

commit sha 55402cf2d6dbf17d5f61bd286b06a81efeca9898

moved the GraphWave "args" doc into the main class docstring (#944) * moved the GraphWave "args" doc into the main class docstring * fix formatting issue

view details

Kieran Ricardo

commit sha 6880c6d0f5c0c188ab63b601c5c7f3d5e8cec6ed

Directed graphsage link generator (#879) * added support for link prediction with directed graphsage * added demo for directed graphsage * updated demo + formatting * DirectedGraphSageLinkGenerator unit tests * typo fixes * typo fixes * typo fixes * edge_splitter bugfix * review suggestions * review suggestions * black formatting * added Edmonds algorithm for directed minimum spanning tree * Update CHANGELOG.md Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/data/edge_splitter.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * - made more variables private in `DirectedGraphSAGELinkGenerator` - updated stellargraph creation in demo - refactored tests * black formatting * merged develop * fixed edge_splitter to handle directed edges * reverted `DirectedGraphSAGELinkGenerator` `in_sample` and `out_samples` to public for use with `DirectedGraphSAGE` * edited demo handle predictions with directed edges * update DirectedGaphSAGELinkGenerator to use SeededPerBatch * updated parallelism * removed changes to `edge_splitter` and removed demo (will be implemented and tested in separate PR) * decreased parallelism Co-authored-by: Huon Wilson <wilson.huon@gmail.com>

view details
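A hedged usage sketch of the new generator, assuming it follows the existing link-generator pattern; `G`, `edge_ids` and `edge_labels` are placeholders, and the per-hop sample sizes are made up:

```python
from stellargraph.mapper import DirectedGraphSAGELinkGenerator

# G: a directed StellarGraph; edge_ids/edge_labels: from a train/test split.
generator = DirectedGraphSAGELinkGenerator(
    G, batch_size=50, in_samples=[5, 2], out_samples=[5, 2]
)
train_flow = generator.flow(edge_ids, edge_labels, shuffle=True)
```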

Huon Wilson

commit sha 7647b8751dc965d7b25cabe896768d8a43e9bd0f

Use path: . instead of url/sha256 to define conda package (#941) Using `path: .` makes it easy to build a package from the current state of the source tree, without having to first upload to PyPI, and then pull out the URL and sha256 hash of the uploaded package. As a test, `conda build .` is successful with this change. This does rely on the tree being the appropriate version to be published, but this should be true for someone following the release procedure. See: #742

view details

Kieran Ricardo

commit sha 7f9f45d9c1f4d5f4e327d89fb88710a500639bad

Feature/graphwave chebyshev (#920) * minimal implementation of graphwave * added support for multi-scale wavelets * added in option to select nodes in GraphWaveGenerator.flow() * added docstrings and black formatting * added GraphWave demo. * - fixed laplacian calculation - refactored embedding calculation - added `scales="auto"` option * added GraphWave into api.txt, added experimental tag to GraphWave, and added GraphWave info into the readme * connected issue to experimental tag * docstring typo fix * added copyright header * formatted demo notebook * review suggestions * review suggestions * black formatting * demo formatting * change GraphWave generator to use StellarGraph node indexing * increased parallelism to 33 * removed unnecessary data copying and other optimizations * removed unnecessary data copying and other optimizations * black formatting * added GraphWave auto eigenvalue search * removed option for auto-calculating scales * added documentation about removal of automatic scale calculation * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * review suggestions * review suggestions * demo formatting * add `min_delta` user paramater to GraphWave * tweak to ensure no eigenvalue is missed * black formatting * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * added the option for using Chebyshev polynomials in GraphWave * review suggestions: - explanation of -1e-3 parameter - default values for `scales` parameters + explanation of how changing `scales` will affect the embeddings - filter out nan eigen values and vectors - demo improvements * demo formatting * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update demos/embeddings/barbell-graphwave.ipynb Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * demo tweaks + note about differences with paper * refactored _chebyshev * removed eigs method + added a unit test for `_chebyshev` * made chebyshev more numerically stable * added unit tests + fixes to GraphWave to pass tests * black formatting * removed experimental tags + edited changelog * rename demo notebook * edit doc string * update demo * added the option to cache the GraphWave embeddings in storage * black formatting * copyright headers * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update tests/mapper/test_graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update tests/mapper/test_graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * nested graphwave in changelog * replace `deg` with `degree` * caste arrays in test to np.float32 * caste coefficient arrays to float32 immediately * replaced `np.isclose(..).all()` with `np.array_equals` and `np.allclose()` * removed equality with bools * removed unecessary scalar multiplication * added issue ref no to changelog * updated docstring * updated docstring * updated demo * created integer validation function + refactored graphwave tests * refactored tests to not use boradcasting * updated graphwave tests * refactored `_chebyshev` to be more readable * referenced paper in `_chebyshev` * removed `cache_filename` param + blacking * updated params in demo * fixed docstring and edited `require_integer_in_range` error msg * improved documentation * Update stellargraph/mapper/graphwave_generator.py Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * Update CHANGELOG.md Co-Authored-By: Huon Wilson <Huon.Wilson@data61.csiro.au> * improved `require_integer_in_range` * black formatting Co-authored-by: Huon Wilson <wilson.huon@gmail.com>

view details
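For reference, the Chebyshev recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x) that polynomial filter approximations like `_chebyshev` build on, as a standalone NumPy sketch (the matrix is a made-up stand-in for a rescaled graph Laplacian):

```python
import numpy as np

def chebyshev_terms(L, degree):
    # T_0 = I, T_1 = L, T_{k+1} = 2 L T_k - T_{k-1}, applied to a matrix L.
    n = L.shape[0]
    terms = [np.eye(n), L.copy()]
    for _ in range(2, degree + 1):
        terms.append(2 * L @ terms[-1] - terms[-2])
    return terms

L = np.array([[0.0, -0.5], [-0.5, 0.0]])  # toy symmetric matrix
print(len(chebyshev_terms(L, degree=10)))  # 11 terms: T_0 ... T_10
```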

kevin

commit sha d80f6825fec22ca9005cec9ac08a565c70ce3d4b

Update release procedure (#943) The major update to the release procedure here is the change in the order of steps: PyPI and Conda publishing should now be done before pushing to master. This is so that any problems with the publishing steps can be resolved before packaging the release on GitHub. This also adds some missing items from previous experience with the release procedure. Some of the wording and formatting was also updated in the document for clarity. Part of #940

view details

Huon Wilson

commit sha dc71b882e59a0a26b1e9e5945b13f18673af4dac

Add CHANGELOG entries for new work since 0.9 (#930) This was created by going over the list of changes since 0.9 and adding changelog entries for the interesting ones. It also adds a table with a basic comparison of the memory use and construction time of the new `StellarGraph` object.

view details

Huon Wilson

commit sha 4ffcf7e32806007f199258f7c8625480fb661508

Move modules from `stellargraph.utils` to top-level ones (#938)

This moves most of the code within `stellargraph.utils` to dedicated top-level modules, because `utils` is an overly generic name. The old paths still work, but have runtime warnings to aid the migration.

This is done by overriding `__getattr__` for the `stellargraph.utils` module (unfortunately this cannot use [PEP 562](https://www.python.org/dev/peps/pep-0562/) because that is Python 3.7+, and we support 3.6), and replacing each of the submodule files with, essentially:

```python
warnings.warn(...)
from path.to.new_module import *
```

(Unfortunately, I think these have to be files with the same name, so these new small files are stopping git from detecting the moves: all of the large new files are exact moves, with no other changes.)

These two forms are both needed because there's two possibilities for how users might be using this:

1. `import stellargraph.utils` followed by `utils.foo`: calls `__getattr__` if it's not otherwise found
2. a direct import of a submodule `import stellargraph.utils.foo`: I believe this just does a file system traversal and opens/executes the given file (i.e. `stellargraph/utils/foo.py` or `stellargraph/utils/foo/__init__.py`) to create the submodule directly (rather than getting attributes of parent modules), and so needs the handling in each file individually

Full list of changes:

| old | new |
|-----|-----|
| stellargraph.utils.calibration | stellargraph.calibration |
| stellargraph.utils.IsotonicCalibration | stellargraph.calibration.IsotonicCalibration |
| stellargraph.utils.TemperatureCalibration | stellargraph.calibration.TemperatureCalibration |
| stellargraph.utils.expected_calibration_error | stellargraph.calibration.expected_calibration_error |
| stellargraph.utils.plot_reliability_diagram | stellargraph.calibration.plot_reliability_diagram |
| stellargraph.utils.ensemble | stellargraph.ensemble |
| stellargraph.utils.BaggingEnsemble | stellargraph.ensemble.BaggingEnsemble |
| stellargraph.utils.Ensemble | stellargraph.ensemble.Ensemble |
| stellargraph.utils.saliency_maps | stellargraph.interpretability.saliency_maps |
| stellargraph.utils.integrated_gradients | stellargraph.interpretability.saliency_maps.integrated_gradients |
| stellargraph.utils.integrated_gradients_gat | stellargraph.interpretability.saliency_maps.integrated_gradients_gat |
| stellargraph.utils.saliency_gat | stellargraph.interpretability.saliency_maps.saliency_gat |
| stellargraph.utils.GradientSaliencyGAT | stellargraph.interpretability.GradientSaliencyGAT |
| stellargraph.utils.IntegratedGradients | stellargraph.interpretability.IntegratedGradients |
| stellargraph.utils.IntegratedGradientsGAT | stellargraph.interpretability.IntegratedGradientsGAT |

Computed by adding the following to `stellargraph/utils/__init__.py`:

```
for name, (new_module_name, new_value) in sorted(
    _MAPPING.items(),
    key=lambda x: ("stellargraph." + x[1][0] + "zzz", x[0])
    if x[1][0] is not None
    else (x[1][1].__name__, ""),
):
    if isinstance(new_value, types.ModuleType):
        new_location = new_value.__name__
    else:
        new_location = f"stellargraph.{new_module_name}.{new_value.__name__}"
    print(f"| `stellargraph.utils.{name}` | `{new_location}` |")

raise ValueError()
```

See: #909

view details
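Filling out the shim pattern from the commit message, one of those small files might look like the following; the exact warning class and message are assumptions, not copied from the PR:

```python
# stellargraph/utils/calibration.py (illustrative shim)
import warnings

warnings.warn(
    "stellargraph.utils.calibration has moved to stellargraph.calibration; "
    "the old import path is deprecated",
    DeprecationWarning,
    stacklevel=2,
)

from stellargraph.calibration import *  # re-export from the new location
```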

Huon Wilson

commit sha b090f64937900e51aac7bfd737d34a51aa443b17

Use https for github links in the changelog, not http (#949) These were mistakenly written as 'http' links in #930.

view details

Kieran Ricardo

commit sha 52c2b5ce1b4592336108e36b703d4acf1f71b4ea

bumped version (#952)

view details

Huon Wilson

commit sha ba2a696bb3c8db4f87738d40a323a1cfb7c483f5

Use scores for computing ROC curve, not labels, in YELP example (#950) The ROC AUC reported by the example goes from ~0.9 to ~0.99 with this change. See: #428

view details
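The effect is easy to reproduce with scikit-learn (made-up numbers): thresholding scores into labels before computing the ROC curve throws away the ranking information the curve is built from.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.3, 0.6, 0.4, 0.9])  # continuous model scores
y_label = (y_score > 0.5).astype(int)     # [0, 1, 0, 1] after thresholding

print(roc_auc_score(y_true, y_score))  # 0.75, using the full ranking
print(roc_auc_score(y_true, y_label))  # 0.5, ranking information lost
```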

Huon Wilson

commit sha 3cf4b99d59cf9dc21c1722ace9c03aae69a7c003

Add MovieLens.load to parse and encode the movielens dataset (#947)

This adds a `load` function that returns:

- a StellarGraph containing the users (with IDs `u_...`) and movies (IDs `m_...`) as well as "rating" edges, where the features in the user nodes have been encoded and normalised
- a pandas DataFrame containing the edges (as `user_id` and `movie_id`) and their rating label to use for training/testing

Example first few rows of each file for reference:

`u.data`:

```
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
```

`u.item`:

```
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
```

`u.user`:

```
1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
```

See: #812

view details

kieranricardo

commit sha e13075bd6eccc0d01fedd489f02a895cb2acc6d6

Release 0.10.0

view details

kieranricardo

commit sha 3b4413ab928a776e443bbc51e6f7603d63b72134

Bump version

view details

kieranricardo

commit sha 25f66b8c588cddd4683ad5f48d7f832af91a5249

Merge remote-tracking branch 'origin/develop' into develop

view details

push time in 2 days
