Robert Coup (rcoup) · @koordinates · Edinburgh, Scotland · https://koordinates.com
@koordinates founder, open source contributor, map geek, sailing enthusiast. Formerly of Auckland, New Zealand.

linz/linz_basemaps 5

Code, imagery, and data needed to recreate NZ Terrain basemaps found on LINZ Data Service (LDS)

rcoup/carto 1

hyperspeed CSS-like map styling

koordinates/chargebee-python 0

Python wrapper for the ChargeBee API

koordinates/libgit2 0

A cross-platform, linkable library implementation of Git that you can use in your application.

koordinates/pygit2 0

Python bindings for libgit2

koordinates/wal2json 0

PostgreSQL Logical Replication JSON output plugin

linz/lds-bde-loader 0

Manages loading Landonline data updates into LINZ Data Service

rcoup/auditwheel 0

Auditing and relabeling cross-distribution Linux wheels.

rcoup/baseimage-docker 0

A minimal Ubuntu base image modified for Docker-friendliness

rcoup/bottle 0

bottle.py is a fast and simple micro-framework for python web-applications.

Pull request review comment: koordinates/sno

Shard CI test on windows into 10 - green windows ✅

 test: $(app) venv\.test.installed
 
 ci-test:
 	set CI=true
-	pytest \
-		--verbose \
-		-p no:sugar \
-		--cov-report term \
-		--cov-report html:test-results\coverage/ \
-		--junit-xml=test-results\junit.xml \
-		--benchmark-enable \
-		-p no:xdist
+	FOR /L %%I IN (0, 1, 9) DO \
+		pytest \
+			--num-shards 10 \

is all good :)
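For context, a rough Python equivalent of that batch FOR loop (this assumes the pytest-shard style --num-shards/--shard-id options; the rest of the PR's command line isn't shown above):

    import subprocess

    # Launch the 10 shards one after another; CI would normally fan these out across jobs.
    for shard_id in range(10):
        subprocess.run(
            ["pytest", "--num-shards", "10", "--shard-id", str(shard_id),
             f"--junit-xml=test-results/junit-{shard_id}.xml"],
            check=True,
        )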

olsen232

comment created time in 11 hours

PullRequestReviewEvent

Pull request review comment: koordinates/sno

Shard CI test on windows into 10 - green windows ✅

 test: $(app) venv\.test.installed
 
 ci-test:
 	set CI=true
-	pytest \
-		--verbose \
-		-p no:sugar \
-		--cov-report term \
-		--cov-report html:test-results\coverage/ \
-		--junit-xml=test-results\junit.xml \
-		--benchmark-enable \
-		-p no:xdist
+	FOR /L %%I IN (0, 1, 9) DO \
+		pytest \
+			--num-shards 10 \

If it's using hash() then I think it's only guaranteed to be deterministic within the same interpreter instance? Even with PYTHONHASHSEED set (which only affects str/bytes hashes), hashes of most other objects are derived from their memory addresses.

Something based on a hashlib algorithm would be more useful if it's possible, since it'd be repeatable across CI runs.
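For illustration, a minimal sketch of hashlib-based sharding (shard_for and the test IDs are made up for the example, not sno or pytest APIs):

    import hashlib

    def shard_for(test_id, num_shards):
        """Deterministically map a test ID to a shard - stable across processes and CI runs."""
        digest = hashlib.sha1(test_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % num_shards

    # e.g. keep only the tests that land in shard 3 of 10
    test_ids = ["tests/test_init.py::test_basic", "tests/test_clone.py::test_bare"]
    shard_3 = [t for t in test_ids if shard_for(t, 10) == 3]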

@craigds Didn't we go through this with dandelion/something?

olsen232

comment created time in a day

PullRequestReviewEvent

Create branch: koordinates/sno

branch : ci-no-brew-cleanup

created branch time in a day

Pull request review comment: koordinates/sno

Shard CI test on windows into 10 - green windows ✅

 test: $(app) venv\.test.installed
 
 ci-test:
 	set CI=true
-	pytest \
-		--verbose \
-		-p no:sugar \
-		--cov-report term \
-		--cov-report html:test-results\coverage/ \
-		--junit-xml=test-results\junit.xml \
-		--benchmark-enable \
-		-p no:xdist
+	FOR /L %%I IN (0, 1, 9) DO \
+		pytest \
+			--num-shards 10 \

are the shards deterministic?

olsen232

comment created time in a day

PullRequestReviewEvent

Pull request comment: koordinates/sno

Tidy up sno repo directories

Will review more this morning, but we should check all the files/dirs are properly set to hidden on Windows (which is a file attribute).

need to check this one as well
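For reference, a minimal Windows-only sketch of checking and setting the hidden attribute via ctypes (ensure_hidden is a hypothetical helper, not part of sno):

    import ctypes
    from ctypes import wintypes

    FILE_ATTRIBUTE_HIDDEN = 0x02
    INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetFileAttributesW.restype = wintypes.DWORD
    kernel32.SetFileAttributesW.restype = wintypes.BOOL

    def ensure_hidden(path):
        """Set the Windows 'hidden' attribute on a file or directory (no-op if already hidden)."""
        attrs = kernel32.GetFileAttributesW(str(path))
        if attrs == INVALID_FILE_ATTRIBUTES:
            raise ctypes.WinError(ctypes.get_last_error())
        if not attrs & FILE_ATTRIBUTE_HIDDEN:
            if not kernel32.SetFileAttributesW(str(path), attrs | FILE_ATTRIBUTE_HIDDEN):
                raise ctypes.WinError(ctypes.get_last_error())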

olsen232

comment created time in 2 days

PullRequestReviewEvent

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+class SnoRepo(pygit2.Repository):
     """
+    A valid pygit2.Repository, since all sno repos are also git repos - but with some added functionality.
+    Ensures the git directory structure is one of the two supported by sno - "old + bare" or "new + tidy".
+    Prevents worktree-related git commands from working by using a "locked git index".
+    Helps set up sno specific config, and adds support for pathlib Paths.
     """
+
+    def __init__(self, root_path):
+        if isinstance(root_path, Path):
+            root_path = str(root_path.resolve())
+
+        try:
+            super().__init__(root_path)
+        except pygit2.GitError:
+            raise NotFound("Not an existing sno repository", exit_code=NO_REPOSITORY)
+
+        self.gitdir_path = Path(self.path).resolve()
+        assert self.is_old_bare_repo() or self.is_new_tidy_repo()

same AssertionError issue

olsen232

comment created time in 2 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

 def get_directory_from_url(url):
     type=click.Path(exists=False, file_okay=False, writable=True),
     required=False,
 )
-def clone(ctx, bare, wc_path, wc_version, do_progress, depth, url, directory):
+def clone(ctx, bare, wc_path, do_progress, depth, url, directory):
     """ Clone a repository into a new directory """
 
     repo_path = Path(directory or get_directory_from_url(url))
+    assert not (repo_path / ".git").exists()

same AssertionError thing

olsen232

comment created time in 2 days

PullRequestReviewEvent

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+@contextlib.contextmanager
+def unlocked_git_index(pygit_repo):

Not sure about every time, feels like that'd mess up concurrent calls to sno?

Whatever we can come up with that is reasonably sane + safe :)

olsen232

comment created time in 2 days

PullRequestReviewEvent

Pull request comment: koordinates/sno

Tidy up sno repo directories

Allow upgrade of sno repos to "new+tidy":

Not done.

Need some sort of solution here that doesn't involve re-cloning.

sno upgrade almost works for this already, except it is specifically not allowed to upgrade a v2 repo to a v2 repo. Otherwise it would already work. However, it would be much slower than is necessary, and it would not work in-place.

All we really need is a way of calling SnoRepo.convert_git_repo_to_sno_repo(), right?

I could special-case sno upgrade so it can do in-place v2 "old+bare" -> v2 "new+tidy" repos.

I think this is my preference, it can always be removed again later. sno upgrade worktree or some other incantation?

I could even decide that v2 + "new+tidy" is actually v3 - seems mostly unnecessary though, and like it might be more annoying than helpful to make that distinction.

I agree, I'm not that keen on this option.

olsen232

comment created time in 2 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+    # Move .git to .sno but add a reference at .git so git can find it.
+    dot_git_path = repo_root_path / ".git"
+    assert dot_git_path.is_dir()

assert statements are a bit of a danger in production Python code, since python -O compiles them away; it's better to check the condition and raise an AssertionError explicitly.
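For illustration only, the style of explicit check being suggested in place of the assert (require_git_dir is a hypothetical helper, not the PR's code):

    from pathlib import Path

    def require_git_dir(repo_root_path):
        """Unlike an assert, this check still runs under `python -O`."""
        dot_git_path = Path(repo_root_path) / ".git"
        if not dot_git_path.is_dir():
            raise RuntimeError(f"Expected a .git directory at {dot_git_path}")
        return dot_git_path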

olsen232

comment created time in 3 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

+import contextlib
+import struct
+from pathlib import Path
+
+import pygit2
+
+from .repository_version import write_repo_version_config
+from . import repo_files
+
+
+# Utilities for maintaining file structure of a sno repo - not the versioned controlled blobs,
+# but the .git file, .sno directory, .sno/index file. pygit2 mostly takes care of this for us,
+# but since we differ slightly from a standard git repository, there are a few extra things to take care of.
+
+
+def _append_checksum(data):
+    return data + pygit2.hash(data).raw
+
+
+INDEX_VERSION = 2
+EMPTY_GIT_INDEX = struct.pack(">4sII", b"DIRC", INDEX_VERSION, 0)

add a comment about what this is for?

olsen232

comment created time in 3 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+def is_sno_repo(pygit_repo):
+    if pygit_repo is not None:
+        # Older sno repos were just bare repos, newer sno repos are non-bare git repos that use ".sno" instead of ".git"

We can still have bare sno repos though? (ie: for server-side repo hosting)

But that behaviour is set through git config using the sno.workingcopy.* values? (ie: do i have a working copy)
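For context, an illustrative sketch of those config keys being set via pygit2 (the key names - core.bare, sno.repository.version, sno.workingcopy.path - come from the PR; the repo path and values here are made up):

    import pygit2

    repo = pygit2.Repository("path/to/repo")
    repo.config["core.bare"] = False                      # newer repos use the standard git flag
    repo.config["sno.repository.version"] = "2"           # dataset format version
    repo.config["sno.workingcopy.path"] = "example.gpkg"  # presence of this implies a working copy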

olsen232

comment created time in 3 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+def _remove_locked_sno_extension(index_path):
+    index_contents = index_path.read_bytes()
+    index_contents = index_contents[:-20]  # Trim off checksum
+    if index_contents[-8:] == LOCKED_SNO_EXTENSION:

Should we be logging a warning if it's not how we expect?
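One way the suggested warning could look, reusing helpers from the diff (the else branch is the hypothetical addition):

    import logging
    import struct

    import pygit2

    L = logging.getLogger("sno.sno_repo")
    LOCKED_SNO_EXTENSION = struct.pack(">4sI", b".sno", 0)

    def _append_checksum(data):
        return data + pygit2.hash(data).raw

    def _remove_locked_sno_extension(index_path):
        index_contents = index_path.read_bytes()[:-20]  # Trim off the trailing checksum
        if index_contents[-8:] == LOCKED_SNO_EXTENSION:
            index_path.write_bytes(_append_checksum(index_contents[:-8]))
        else:
            # Hypothetical: make the unexpected layout visible instead of silently doing nothing.
            L.warning("%s does not end with the locked .sno extension", index_path)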

olsen232

comment created time in 3 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+def _remove_locked_sno_extension(index_path):
+    index_contents = index_path.read_bytes()

indexes can potentially be GiB-sized, right? 🤨

olsen232

comment created time in 3 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+# Extension name does not start with A-Z => is a required extension.
+LOCKED_SNO_EXTENSION = struct.pack(">4sI", b".sno", 0)
+
+# An empty index file, but exteneded with a required ".sno" extension in the extensions section of the index binary
# An empty index file, but extended with a required ".sno" extension in the extensions section of the index binary
olsen232

comment created time in 3 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+@contextlib.contextmanager
+def unlocked_git_index(pygit_repo):

Would prefer some more specific tests around locking and unlocking behaviours

olsen232

comment created time in 3 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+def _remove_locked_sno_extension(index_path):
+    index_contents = index_path.read_bytes()

Oh, it's always empty. Should we check it hasn't become GiB-sized before we read it all into memory?
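A sketch of the kind of size guard being discussed (MAX_LOCKED_INDEX_SIZE and read_locked_index are hypothetical, not sno code):

    from pathlib import Path

    # The locked index sno writes is only a few dozen bytes, so anything large is suspicious.
    MAX_LOCKED_INDEX_SIZE = 4096

    def read_locked_index(index_path: Path) -> bytes:
        size = index_path.stat().st_size
        if size > MAX_LOCKED_INDEX_SIZE:
            raise RuntimeError(f"{index_path} is {size} bytes; expected a small locked index")
        return index_path.read_bytes()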

olsen232

comment created time in 3 days

Pull request review comment: koordinates/sno

Tidy up sno repo directories

(diff excerpt from the hunk being commented on)

+    repo = pygit2.init_repository(str(repo_root_path.resolve()))
+    repo = convert_git_repo_to_sno_repo(repo_root_path)
+    write_repo_version_config(repo, repo_version)
+    assert is_sno_repo(repo)
+    return repo
+
+
+def convert_git_repo_to_sno_repo(repo_root_path):

again, think we should make sure we have a clear recovery off-ramp for this aborting part-way through.

eg: no index or no reflog config would be annoying and non-obvious, even though .sno/ is there and it'd otherwise seem to "work"

Maybe just a case of re-ordering the steps?

olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

+import contextlib+import struct+from pathlib import Path++import pygit2++from .repository_version import write_repo_version_config+from . import repo_files+++# Utilities for maintaining file structure of a sno repo - not the versioned controlled blobs,+# but the .git file, .sno directory, .sno/index file. pygit2 mostly takes care of this for us,+# but since we differ slightly from a standard git repository, there are a few extra things to take care of.+++def _append_checksum(data):+    return data + pygit2.hash(data).raw+++INDEX_VERSION = 2+EMPTY_GIT_INDEX = struct.pack(">4sII", b"DIRC", INDEX_VERSION, 0)

ah, maybe move all these definitions under the explanation at :26?

olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

 def checkout(ctx, branch, force, discard_changes, refish):     reset_wc_if_needed(repo, commit, discard_changes=discard_changes)      repo.set_head(head_ref)-    repo.reset(commit.oid, pygit2.GIT_RESET_SOFT)

Why aren't these needed any more?

olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

+import contextlib+import struct+from pathlib import Path++import pygit2++from .repository_version import write_repo_version_config+from . import repo_files+++# Utilities for maintaining file structure of a sno repo - not the versioned controlled blobs,+# but the .git file, .sno directory, .sno/index file. pygit2 mostly takes care of this for us,+# but since we differ slightly from a standard git repository, there are a few extra things to take care of.+++def _append_checksum(data):+    return data + pygit2.hash(data).raw+++INDEX_VERSION = 2+EMPTY_GIT_INDEX = struct.pack(">4sII", b"DIRC", INDEX_VERSION, 0)++# Extension name does not start with A-Z => is a required extension.+LOCKED_SNO_EXTENSION = struct.pack(">4sI", b".sno", 0)++# An empty index file, but exteneded with a required ".sno" extension in the extensions section of the index binary

Are the extension identifiers typically .-prefixed?

olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

+import contextlib+import struct+from pathlib import Path++import pygit2++from .repository_version import write_repo_version_config+from . import repo_files+++# Utilities for maintaining file structure of a sno repo - not the versioned controlled blobs,+# but the .git file, .sno directory, .sno/index file. pygit2 mostly takes care of this for us,+# but since we differ slightly from a standard git repository, there are a few extra things to take care of.+++def _append_checksum(data):+    return data + pygit2.hash(data).raw+++INDEX_VERSION = 2+EMPTY_GIT_INDEX = struct.pack(">4sII", b"DIRC", INDEX_VERSION, 0)++# Extension name does not start with A-Z => is a required extension.+LOCKED_SNO_EXTENSION = struct.pack(">4sI", b".sno", 0)++# An empty index file, but exteneded with a required ".sno" extension in the extensions section of the index binary+# format. (Not the file extension - the filename is simply "index", it has no file extension.)+# Causes all git commands that would involve the index or working copy to fail with "unsupported extension: .sno" -+# in that sense it is "locked" to git. Various techniques can be used to unlock it if certain git functionality is+# needed - eg marking the repository as bare so it is ignored, or removing the unsupported extension.++# See https://git-scm.com/docs/index-format+LOCKED_EMPTY_GIT_INDEX = _append_checksum(EMPTY_GIT_INDEX + LOCKED_SNO_EXTENSION)+++def init_repository(repo_root_path, repo_version):+    """+    Initialise a new sno repo. A sno repo is basically a git repo, except -+    - git internals are stored in .sno instead of .git+      (.git is a file that contains a reference to .sno, this is allowed by git)+    - datasets are stored in /.sno-dataset/ trees according to a particular dataset format version - see DATASETS_v2.md+      but, this only matters once we start committing data. When initialising a repo, these are not yet present.+    - there is a blob called sno.repository.version that contains the dataset format version number - but, this is only+      written when data is first imported. When initialising a repo, this is not yet present.+    - there is property in the repo config called sno.repository.version that contains the dataset format version+      number, which is used until the sno.repository.version blob is written.+    - the .sno/index file has been extended to stop git messing things up - see LOCKED_EMPTY_GIT_INDEX.++    There can also be properties in the repo config relating to the working copy - named sno.workingcopy.* - but+    these are not written by this function, it is up to the caller to configure these as needed.+    See WorkingCopy.write_config()+    """++    repo = pygit2.init_repository(str(repo_root_path.resolve()))+    repo = convert_git_repo_to_sno_repo(repo_root_path)+    write_repo_version_config(repo, repo_version)+    assert is_sno_repo(repo)+    return repo+++def convert_git_repo_to_sno_repo(repo_root_path):+    """Given a newly created non-bare git repo, with the following structure:+    repo_root_path/  # also worktree path+        .git/+            HEAD+            ...      # objects, refs, etc+    Converts it to a sno repo with the following structure:+    repo_root_path/  # also worktree path, as before+        .git         # contains "gitdir: .sno"+        .sno/+            HEAD+            index    # Locked using .sno extension+            ...      
# etc+    """++    # Note: pygit2 can read the .sno directory okay, but it won't create it, so we do it ourselves using Path.rename()+    # Note: We could also call `git init --separate-git-dir=.sno` to crete it but it doesn't work very well - it+    # converts all paths to be absolute, which would mean sno repos wouldn't be renameable.++    # Move .git to .sno but add a reference at .git so git can find it.+    dot_git_path = repo_root_path / ".git"+    assert dot_git_path.is_dir()+    dot_sno_path = repo_root_path / ".sno"+    assert not dot_sno_path.exists()++    dot_git_path.rename(dot_sno_path)+    dot_git_path.write_text("gitdir: .sno\n", encoding="utf-8")++    repo = pygit2.Repository(str(repo_root_path.resolve()))+    assert is_sno_repo(repo)++    # Disable git commands that would mess things up:+    repo_files.write_repo_file(repo, repo_files.INDEX, LOCKED_EMPTY_GIT_INDEX)++    # Force writing to reflogs.+    repo.config["core.logAllRefUpdates"] = "always"+    return repo+++def is_sno_repo(pygit_repo):+    if pygit_repo is not None:+        # Older sno repos were just bare repos, newer sno repos are non-bare git repos that use ".sno" instead of ".git"+        return pygit_repo.is_bare or Path(pygit_repo.path).stem == ".sno"+    return False+++def get_sno_workdir(pygit_repo):+    # Older sno repos are bare and use repo.path as the "workdir" - that is, where working copies are written -+    # Newer ones have an actual workdir.+    return Path(pygit_repo.path) if pygit_repo.is_bare else Path(pygit_repo.workdir)+++@contextlib.contextmanager+def unlocked_git_index(pygit_repo):

we should probably start to resurrect fsck again to allow us to do things like "re-lock" an index if this process crashes hard and the index doesn't get reset.
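Something like this maybe, reusing the helpers above (rough sketch of a future `sno fsck` step, not part of this PR):

def ensure_index_locked(repo):
    index_path = repo_files.repo_file_path(repo, repo_files.INDEX)
    # crude check: the required ".sno" extension marker should be present in the index bytes
    if not index_path.exists() or b".sno" not in index_path.read_bytes():
        repo_files.write_repo_file(repo, repo_files.INDEX, LOCKED_EMPTY_GIT_INDEX)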

olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

+import contextlib+import struct+from pathlib import Path++import pygit2++from .repository_version import write_repo_version_config+from . import repo_files+++# Utilities for maintaining file structure of a sno repo - not the versioned controlled blobs,+# but the .git file, .sno directory, .sno/index file. pygit2 mostly takes care of this for us,+# but since we differ slightly from a standard git repository, there are a few extra things to take care of.+++def _append_checksum(data):+    return data + pygit2.hash(data).raw+++INDEX_VERSION = 2+EMPTY_GIT_INDEX = struct.pack(">4sII", b"DIRC", INDEX_VERSION, 0)++# Extension name does not start with A-Z => is a required extension.+LOCKED_SNO_EXTENSION = struct.pack(">4sI", b".sno", 0)++# An empty index file, but exteneded with a required ".sno" extension in the extensions section of the index binary+# format. (Not the file extension - the filename is simply "index", it has no file extension.)+# Causes all git commands that would involve the index or working copy to fail with "unsupported extension: .sno" -+# in that sense it is "locked" to git. Various techniques can be used to unlock it if certain git functionality is+# needed - eg marking the repository as bare so it is ignored, or removing the unsupported extension.++# See https://git-scm.com/docs/index-format+LOCKED_EMPTY_GIT_INDEX = _append_checksum(EMPTY_GIT_INDEX + LOCKED_SNO_EXTENSION)+++def init_repository(repo_root_path, repo_version):+    """+    Initialise a new sno repo. A sno repo is basically a git repo, except -+    - git internals are stored in .sno instead of .git+      (.git is a file that contains a reference to .sno, this is allowed by git)+    - datasets are stored in /.sno-dataset/ trees according to a particular dataset format version - see DATASETS_v2.md+      but, this only matters once we start committing data. When initialising a repo, these are not yet present.+    - there is a blob called sno.repository.version that contains the dataset format version number - but, this is only+      written when data is first imported. When initialising a repo, this is not yet present.+    - there is property in the repo config called sno.repository.version that contains the dataset format version+      number, which is used until the sno.repository.version blob is written.+    - the .sno/index file has been extended to stop git messing things up - see LOCKED_EMPTY_GIT_INDEX.++    There can also be properties in the repo config relating to the working copy - named sno.workingcopy.* - but+    these are not written by this function, it is up to the caller to configure these as needed.+    See WorkingCopy.write_config()+    """++    repo = pygit2.init_repository(str(repo_root_path.resolve()))+    repo = convert_git_repo_to_sno_repo(repo_root_path)+    write_repo_version_config(repo, repo_version)+    assert is_sno_repo(repo)+    return repo+++def convert_git_repo_to_sno_repo(repo_root_path):+    """Given a newly created non-bare git repo, with the following structure:+    repo_root_path/  # also worktree path+        .git/+            HEAD+            ...      # objects, refs, etc+    Converts it to a sno repo with the following structure:+    repo_root_path/  # also worktree path, as before+        .git         # contains "gitdir: .sno"+        .sno/+            HEAD+            index    # Locked using .sno extension+            ...      
# etc+    """++    # Note: pygit2 can read the .sno directory okay, but it won't create it, so we do it ourselves using Path.rename()+    # Note: We could also call `git init --separate-git-dir=.sno` to crete it but it doesn't work very well - it+    # converts all paths to be absolute, which would mean sno repos wouldn't be renameable.++    # Move .git to .sno but add a reference at .git so git can find it.+    dot_git_path = repo_root_path / ".git"+    assert dot_git_path.is_dir()+    dot_sno_path = repo_root_path / ".sno"+    assert not dot_sno_path.exists()++    dot_git_path.rename(dot_sno_path)+    dot_git_path.write_text("gitdir: .sno\n", encoding="utf-8")++    repo = pygit2.Repository(str(repo_root_path.resolve()))+    assert is_sno_repo(repo)++    # Disable git commands that would mess things up:+    repo_files.write_repo_file(repo, repo_files.INDEX, LOCKED_EMPTY_GIT_INDEX)++    # Force writing to reflogs.+    repo.config["core.logAllRefUpdates"] = "always"+    return repo+++def is_sno_repo(pygit_repo):+    if pygit_repo is not None:+        # Older sno repos were just bare repos, newer sno repos are non-bare git repos that use ".sno" instead of ".git"
        # Older sno repos were just bare git repos, newer sno repos are non-bare git repos that use ".sno" instead of ".git"
olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

+import contextlib+import struct+from pathlib import Path++import pygit2++from .repository_version import write_repo_version_config+from . import repo_files+++# Utilities for maintaining file structure of a sno repo - not the versioned controlled blobs,+# but the .git file, .sno directory, .sno/index file. pygit2 mostly takes care of this for us,+# but since we differ slightly from a standard git repository, there are a few extra things to take care of.+++def _append_checksum(data):+    return data + pygit2.hash(data).raw+++INDEX_VERSION = 2+EMPTY_GIT_INDEX = struct.pack(">4sII", b"DIRC", INDEX_VERSION, 0)++# Extension name does not start with A-Z => is a required extension.+LOCKED_SNO_EXTENSION = struct.pack(">4sI", b".sno", 0)++# An empty index file, but exteneded with a required ".sno" extension in the extensions section of the index binary+# format. (Not the file extension - the filename is simply "index", it has no file extension.)+# Causes all git commands that would involve the index or working copy to fail with "unsupported extension: .sno" -

This is elegant, nice find 😄 Does it interact the same with command-line git as with libgit2, or just the former?

olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

 def clone(ctx, bare, wc_path, wc_version, do_progress, depth, url, directory):     except subprocess.CalledProcessError as e:         sys.exit(translate_subprocess_exit_code(e.returncode)) -    repo = pygit2.Repository(str(repo_path.resolve()))+    repo = sno_repo.convert_git_repo_to_sno_repo(repo_path)

if these clone steps fail/abort part-way through, what happens?

  1. does it act like it's working but it doesn't? (👎)
  2. does it not work, you need to rm and start again? (fine)
  3. something else?
olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

+import contextlib
+import struct
+from pathlib import Path
+
+import pygit2
+
+from .repository_version import write_repo_version_config
+from . import repo_files
+
+
+# Utilities for maintaining file structure of a sno repo - not the versioned controlled blobs,

Should be in a """ docstring really

olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

 def del_repo_cfg(key):             del_repo_cfg("sno.workingcopy.path")             return -        path = path or f"{Path(repo.path).resolve().stem}.gpkg"+        path = path or cls.default_path(repo)         repo_cfg["sno.workingcopy.path"] = str(path)         del_repo_cfg("sno.workingcopy.bare") +    @classmethod+    def default_path(cls, repo):+        workdir = sno_repo.get_sno_workdir(repo)

docstring/example?

olsen232

comment created time in 3 days

Pull request review commentkoordinates/sno

Tidy up sno repo directories

 def clone(ctx, bare, wc_path, wc_version, do_progress, depth, url, directory):     except subprocess.CalledProcessError as e:         sys.exit(translate_subprocess_exit_code(e.returncode)) -    repo = pygit2.Repository(str(repo_path.resolve()))+    repo = sno_repo.convert_git_repo_to_sno_repo(repo_path) 

this is quite a long method, can we have some comments about the various steps that are happening?

olsen232

comment created time in 3 days

PullRequestReviewEvent

pull request commentkoordinates/sno

Tidy up sno repo directories

Will review more this morning, but we should check all the files/dirs are properly set to hidden on Windows (which is a file attribute).

Might require solving cross-platform pip-compile for installing pywin32 though 🐇🕳
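(For setting the attribute itself, ctypes might be enough and avoids the pywin32 rabbit hole - rough sketch, untested:)

import ctypes

FILE_ATTRIBUTE_HIDDEN = 0x02

def set_hidden(path):
    # Windows only: add the hidden attribute without clobbering the existing ones
    kernel32 = ctypes.windll.kernel32
    attrs = kernel32.GetFileAttributesW(str(path))
    if attrs == -1:   # INVALID_FILE_ATTRIBUTES
        raise ctypes.WinError()
    if not kernel32.SetFileAttributesW(str(path), attrs | FILE_ATTRIBUTE_HIDDEN):
        raise ctypes.WinError()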

olsen232

comment created time in 3 days

issue commentlensapp/lens

Cannot connect to k8s behind VPN on Mac OS

Dunno about a "solution" since it's still broken out of the box.

I guess Lens could bundle the Homebrew kubectl binary on macOS instead of the upstream one (until upstream gets fixed)

ITD27M01

comment created time in 9 days

issue commentlensapp/lens

Cannot connect to k8s behind VPN on Mac OS

With Lens 3.6.0, I can work around it by setting the Path to Kubectl binary to /usr/local/bin/kubectl (i.e. the Homebrew one) in Lens...Preferences

ITD27M01

comment created time in 10 days

pull request commentkoordinates/sno

Reflogs

@OrangeOranges GTG, but lets get CI green before merging.

https://github.com/koordinates/sno/pull/253 should sort that, then we can rebase this one and it should be green

OrangeOranges

comment created time in 14 days

pull request commentkoordinates/sno

Fix duplicate builds for koordinates-internal PRs

this still doesn't seem right (see #251). I wrote https://github.com/koordinates/sno/pull/253 last week which never got PRd for some reason :) Maybe with them together it's all good?

craigds

comment created time in 14 days

PR opened koordinates/sno

ci: fix identification of CI runs on/from forks or fork PRs

In CI, add to version step outputs as .is_fork, and use consistently

ref: https://github.community/t/how-to-detect-a-pull-request-from-a-fork/18363/3

Related links:

https://github.com/koordinates/sno/pull/251 & other PRs from forks https://github.com/koordinates/sno/pull/244

Checklist:

  • [X] Have you reviewed your own change?
  • [ ] Have you included test(s)?
  • [ ] Have you updated the changelog?
+19 -8

0 comment

1 changed file

pr created time in 14 days

PullRequestReviewEvent

Pull request review commentkoordinates/sno

Reflogs

 def test_branches(         r = cli_runner.invoke(["checkout", "HEAD~1"])         assert r.exit_code == 0, r -        assert text_branches(cli_runner) == ["* (no branch)", "  master"]+        assert text_branches(cli_runner) == ["* (HEAD detached at 63a9492)", "  master"]

so this changes because git reads it back from the reflog? 🤯 👍

OrangeOranges

comment created time in 14 days

PullRequestReviewEvent

issue commentlibgit2/pygit2

pip install pygit2 on MacOS does not provide GIT_FEATURE_SSH

Not building libgit2 with libssh2 was a deliberate decision at the time: libssh2 is deprecated in RedHat 7 and removed in 8 (similarly CentOS etc), and libgit2 doesn't have libssh support yet

See the discussion in https://github.com/libgit2/pygit2/issues/994#issuecomment-617049941

jsjohnst

comment created time in 15 days

Pull request review commentkoordinates/sno

Reflogs

 def init(     write_repo_version_config(repo, repo_version)     WorkingCopy.write_config(repo, wc_path, bare) +    # Enable writing to reflogs+    repo.config["core.logAllRefUpdates"] = "always"

so this automatically persists/writes into the repo .git/config? #til

OrangeOranges

comment created time in 15 days

PullRequestReviewEvent

issue commentAGWA/git-crypt

please add subcommand to determine lock state

Small improvement on @fauust's suggestion:

git ls-tree -r --name-only -z HEAD some/path/ | xargs -0 grep -qsPa "\x00GITCRYPT"
  • HEAD is the git ref-ish to look at (can use branch names, commit IDs, etc too)
  • some/path/ is a path to look under — can omit to search the entire repo, or repeat to search multiple paths
  • handles files with spaces/etc by using nulls between paths
  • return is 0 if locked and non-zero if unlocked.
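And a rough Python equivalent if you want it from a script (untested sketch, run from the repo root):

import subprocess
from pathlib import Path

def gitcrypt_is_locked(ref="HEAD", *paths):
    # list the tracked files at `ref`, then look for the git-crypt magic header in each
    out = subprocess.run(
        ["git", "ls-tree", "-r", "--name-only", "-z", ref, *paths],
        check=True, capture_output=True,
    ).stdout
    for name in filter(None, out.split(b"\x00")):
        try:
            if b"\x00GITCRYPT" in Path(name.decode()).read_bytes():
                return True
        except FileNotFoundError:
            pass   # like grep -s: ignore files missing from the working copy
    return False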
smemsh

comment created time in 15 days

issue commentkoordinates/sno

Patches which create or delete datasets are not yet supported

Should (2) & (4) become new issues?

olsen232

comment created time in 16 days

issue commentkoordinates/sno

commit/apply creates loose objects

Packing is relatively slow, we don't want to do it with tiny commits (which in normal editing use cases is most of them), so feels like maybe we want to:

  • Run git gc --auto after each commit/apply operation
  • Tweak the gc.auto configuration default

gc.auto When there are approximately more than this many loose objects in the repository, git gc --auto will pack them. Some Porcelain commands use this command to perform a light-weight garbage collection from time to time. The default value is 6700.
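e.g. something along these lines after each commit/apply (rough sketch - the helper name and threshold are made up, not sno code):

import subprocess

def maybe_gc(repo):
    # lower the auto-pack threshold from git's 6700 default, then let `git gc --auto`
    # decide whether there are enough loose objects to bother packing
    repo.config["gc.auto"] = "500"
    subprocess.run(["git", "-C", str(repo.workdir or repo.path), "gc", "--auto"], check=True)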

craigds

comment created time in 16 days

Pull request review commentkoordinates/sno

Support patches that create or delete datasets

 class RichTreeBuilder:     Conflicts are not detected.     """ -    def __init__(self, repo, initial_root_tree):+    def __init__(self, repo, initial_root_tree, auto_include_version_blobs=True):         """         The repo and an initial root tree which will be updated.         All paths are specified relative to this tree - the root tree at a particular commit is a good choice.

document what auto_include_version_blobs does, since it's not obvious?

olsen232

comment created time in 16 days

Pull request review commentkoordinates/sno

Support patches that create or delete datasets

 from .timestamps import tz_offset_to_minutes  +def get_head_tree(repo):+    """Returns the tree at the current repo HEAD."""+    if repo.is_empty:+        return None+    try:+        return repo.head.peel(pygit2.Tree)+    except pygit2.GitError:+        # This happens when the repo is not empty, but the current HEAD has no commits.+        return None+++def get_head_commit(repo):+    """Returns the commit at the current repo HEAD."""

Can the docstring be expanded with the comments about when/why None is returned?

olsen232

comment created time in 16 days

Pull request review commentkoordinates/sno

Support patches that create or delete datasets

 def apply_meta_diff(self, meta_diff, tree_builder):         if not meta_diff:             return -        conflicts = False+        # Applying diffs works even if there is no tree yet created for the dataset,+        # as is the case when the dataset is first being created right now.+        meta_tree = self.meta_tree if self.tree is not None else () -        meta_tree = self.meta_tree+        conflicts = False

would prefer to use has_/is_/can_/etc as a prefix for boolean variable names

olsen232

comment created time in 16 days

Pull request review commentkoordinates/sno

Support patches that create or delete datasets

 from .timestamps import tz_offset_to_minutes  +def get_head_tree(repo):+    """Returns the tree at the current repo HEAD."""

Can the docstring be expanded with the comments about when/why None is returned?

olsen232

comment created time in 16 days

PullRequestReviewEvent

issue openedlensapp/lens

fit-to-window button stays focused

Describe the bug

Clicking "Fit to window" or "Exit full size mode" buttons in the UI sets the focus to the button control. So if you hit a key on the keyboard it toggles again.

To Reproduce

  1. Click the Fit to Window button in the terminal
  2. Press Enter
  3. Watch it toggle again

Expected behavior

I think the focus should go back to the active terminal when the button is clicked.

Environment:

  • Lens Version: 3.5.3
  • OS: macOS
  • Installation method (e.g. snap or AppImage in Linux): DMG

created time in 17 days

Pull request review commentkoordinates/sno

Add a RichTreeBuilder, rewrite diff-apply code to use it.

 def diff(self, other, ds_filter=UNFILTERED, reverse=False):                 # GIT_DELTA_UNTRACKED                 raise NotImplementedError(f"Delta status: {d.status_char()}") -        # TODO - detect renames by comparing blob ID--        ds_diff["feature"] = feature_diff-        return ds_diff--    def diff_meta(self, other, reverse=False):-        """-        Generates a diff from self -> other, but only for meta items.-        If reverse is true, generates a diff from other -> self.-        """-        if reverse:-            old, new = other, self-        else:-            old, new = self, other--        meta_old = dict(old.meta_items()) if old else {}-        meta_new = dict(new.meta_items()) if new else {}-        return DeltaDiff.diff_dicts(meta_old, meta_new)+        return result -    def _recursive_build_feature_tree(-        self, *, repo, orig_tree, deltas_dict, encode_kwargs, pk_field, geom_column_name-    ):-        """-        Recursively builds a new tree object by applying deltas to the orig_tree.-        """-        conflicts = False-        if orig_tree is None:-            builder = repo.TreeBuilder()-        else:-            builder = repo.TreeBuilder(orig_tree)-        for k, v in deltas_dict.items():-            if isinstance(v, dict):-                orig_subtree = None-                if orig_tree is not None:-                    try:-                        orig_subtree = orig_tree / k-                    except KeyError:-                        pass--                new_subtree, new_conflicts = self._recursive_build_feature_tree(-                    repo=repo,-                    orig_tree=orig_subtree,-                    deltas_dict=v,-                    encode_kwargs=encode_kwargs,-                    pk_field=pk_field,-                    geom_column_name=geom_column_name,-                )-                conflicts = conflicts or new_conflicts-                if new_subtree != orig_subtree:-                    if orig_tree is not None and k in orig_tree:-                        builder.remove(k)-                    builder.insert(k, new_subtree, pygit2.GIT_FILEMODE_TREE)-            else:-                # actually apply blob deltas-                delta = v-                if delta.type == "delete":-                    if (not orig_tree) or k not in orig_tree:-                        conflicts = True-                        pk = delta.old.value[pk_field]-                        click.echo(-                            f"{self.path}: Trying to delete nonexistent feature: {pk}"-                        )-                        continue-                    builder.remove(k)--                elif delta.type == "insert":-                    feature_path, feature_data = self.encode_feature(-                        delta.new.value, **encode_kwargs-                    )-                    if orig_tree and k in orig_tree:-                        conflicts = True-                        pk = delta.new.value[pk_field]-                        click.echo(-                            f"{self.path}: Trying to create feature that already exists: {pk}"-                        )-                        continue-                    blob_id = repo.create_blob(feature_data)-                    builder.insert(k, blob_id, pygit2.GIT_FILEMODE_BLOB)--                elif delta.type == "update":-                    pk = self.decode_path_to_1pk(k)-                    old_pk = delta.old.value[pk_field]-                    if pk == old_pk:-                        # k refers to the 
*old* object-                        # (it *may* also be the new object if this isn't a rename)-                        if (not orig_tree) or k not in orig_tree:-                            conflicts = True-                            click.echo(-                                f"{self.path}: Trying to update nonexistent feature: {old_pk}"-                            )-                            continue-                        actual_existing_feature = self.get_feature(old_pk)-                        old_feature = delta.old.value-                        if geom_column_name:-                            # FIXME: actually compare the geometries here.-                            # Turns out this is quite hard - geometries are hard to compare sanely.-                            # Even if we add hacks to ignore endianness, WKB seems to vary a bit,-                            # and ogr_geometry.Equal(other) can return false for seemingly-identical geometries...-                            actual_existing_feature.pop(geom_column_name)-                            old_feature = old_feature.copy()-                            old_feature.pop(geom_column_name)-                        if actual_existing_feature != old_feature:-                            conflicts = True-                            click.echo(-                                f"{self.path}: Trying to update already-changed feature: {old_pk}"-                            )-                            continue-                        builder.remove(k)--                    new_pk = delta.new.value[pk_field]-                    if pk == new_pk:-                        # k refers to the *old* object-                        # (it *may* also be the old object if this isn't a rename)-                        new_feature_path, new_feature_data = self.encode_feature(-                            delta.new.value, **encode_kwargs-                        )-                        blob_id = repo.create_blob(new_feature_data)-                        builder.insert(k, blob_id, pygit2.GIT_FILEMODE_BLOB)-        return builder.write(), conflicts--    def write_to_new_tree(self, dataset_diff, repo, *, orig_tree):+    def apply_diff(self, dataset_diff, tree_builder):         """         Given a diff that only affects this dataset, write it to the given treebuilder.-        Blobs will be created in the repo, and referenced in the returned tree, but+        Blobs will be created in the repo, and referenced in the resulting tree, but         no commit is created - this is the responsibility of the caller.         """         # TODO - support multiple primary keys.-        # TODO - support writing new schemas-        pk_field = self.primary_key+        with tree_builder.cd(self.path):+            meta_diff = dataset_diff.get("meta")+            schema = None+            if meta_diff:+                self.apply_meta_diff(meta_diff, tree_builder)++                if "schema.json" in meta_diff and meta_diff["schema.json"].new_value:+                    schema = Schema.from_column_dicts(+                        meta_diff["schema.json"].new_value+                    )++            feature_diff = dataset_diff.get("feature")+            if feature_diff:+                self.apply_feature_diff(feature_diff, tree_builder, schema=schema)++    def apply_meta_diff(self, meta_diff, tree_builder):+        """Applies a meta diff. 
Not supported until Datasets V2"""+        if not meta_diff:+            return++        raise NotYetImplemented(+            f"Meta changes are not supported for version {self.version}"+        )++    def apply_feature_diff(self, feature_diff, tree_builder, *, schema=None):+        """Applies a feature diff."""++        if not feature_diff:+            return -        conflicts = False         encode_kwargs = {}+        if schema is not None:+            encode_kwargs = {"schema": schema} -        if "meta" in dataset_diff and self.version < 2:-            raise NotYetImplemented(-                f"Meta changes are not supported for version {self.version}"-            )+        geom_column_name = self.geom_column_name -        meta_path = self.full_path(self.META_PATH)-        meta_tree = orig_tree / meta_path-        for delta in dataset_diff.get("meta", {}).values():-            name = delta.key+        conflicts = False+        for delta in feature_diff.values():+            old_key = delta.old_key+            new_key = delta.new_key+            old_path = (+                self.encode_1pk_to_path(old_key, relative=True) if old_key else None+            )+            new_path = (+                self.encode_1pk_to_path(new_key, relative=True) if new_key else None+            ) -            if delta.type == "delete":-                if name not in meta_tree:-                    conflicts = True-                    click.echo(-                        f"{self.path}: Trying to delete nonexistent meta item: {name}"-                    )-                    continue-                meta_tree = git_util.replace_subtree(repo, meta_tree, name, None)+            # Conflict detection+            if delta.type == "delete" and old_path not in self.tree:+                conflicts = True+                click.echo(+                    f"{self.path}: Trying to delete nonexistent feature: {old_key}"+                )+                continue -            elif delta.type == "insert":-                if name in meta_tree:-                    conflicts = True-                    click.echo(-                        f"{self.path}: Trying to create meta item that already exists: {name}"-                    )-                    continue-                if name.endswith(".json"):-                    blob_id = repo.create_blob(json_pack(delta.new.value))-                else:-                    blob_id = repo.create_blob(ensure_bytes(delta.new.value))-                meta_tree = git_util.replace_subtree(repo, meta_tree, name, blob_id)--            elif delta.type == "update":-                if name not in meta_tree:-                    conflicts = True-                    click.echo(-                        f"{self.path}: Trying to update nonexistent meta item: {name}"-                    )-                    continue+            if delta.type == "insert" and new_path in self.tree:+                conflicts = True+                click.echo(+                    f"{self.path}: Trying to create feature that already exists: {new_key}"+                )+                continue -                if self.get_meta_item(name) != delta.old.value:-                    conflicts = True-                    click.echo(-                        f"{self.path}: Trying to update already-changed meta item: {name}"-                    )-                    continue-                meta_tree = git_util.replace_subtree(repo, meta_tree, name, None)-                if name == "schema.json":-                    old_schema = 
Schema.from_column_dicts(delta.old.value)-                    new_schema = Schema.from_column_dicts(delta.new.value)-                    if not old_schema.is_pk_compatible(new_schema):-                        raise NotYetImplemented(-                            "Schema changes that involve primary key changes are not yet supported"-                        )--                    encode_kwargs = {"schema": new_schema}-                    to_write = []-                    key, value = self.encode_schema(new_schema)-                    to_write.append((key[len(meta_path) :].lstrip('/'), value))-                    key, value = self.encode_legend(new_schema.legend)-                    to_write.append((key[len(meta_path) :].lstrip('/'), value))-                elif name.endswith(".json"):-                    to_write = [(name, json_pack(delta.new.value))]-                else:-                    to_write = [(name, ensure_bytes(delta.new.value))]-                for path, data in to_write:-                    blob_id = repo.create_blob(data)-                    meta_tree = git_util.replace_subtree(repo, meta_tree, path, blob_id)+            if delta.type == "update" and old_path not in self.tree:+                conflicts = True+                click.echo(+                    f"{self.path}: Trying to update nonexistent feature: {old_key}"+                )+                continue -        orig_tree = git_util.replace_subtree(repo, orig_tree, meta_path, meta_tree)+            if delta.type == "update" and not self._features_equal(+                self.get_feature(old_key), delta.old_value, geom_column_name+            ):+                conflicts = True+                click.echo(+                    f"{self.path}: Trying to update already-changed feature: {old_key}"+                )+                continue -        geom_column_name = self.geom_column_name-        deltas_by_directory = {}-        for delta in dataset_diff.get("feature", {}).values():-            if delta.type == "delete":-                pks = {delta.old.value[pk_field]}-            elif delta.type == "insert":-                pks = {delta.new.value[pk_field]}-            elif delta.type == "update":-                pks = {delta.old.value[pk_field], delta.new.value[pk_field]}--            for pk in pks:-                feature_path = self.encode_1pk_to_path(pk)-                pieces = feature_path.rsplit('/')-                dir_ = deltas_by_directory-                for piece in pieces[:-1]:-                    dir_ = dir_.setdefault(piece, {})-                dir_[pieces[-1]] = delta--        new_tree, conflicts = self._recursive_build_feature_tree(-            orig_tree=orig_tree,-            repo=repo,-            deltas_dict=deltas_by_directory,-            encode_kwargs=encode_kwargs,-            pk_field=pk_field,-            geom_column_name=geom_column_name,-        )+            # Actually write the feature diff:+            if old_path and old_path != new_path:+                tree_builder.remove(old_path)+            if delta.new_value:+                path, data = self.encode_feature(+                    delta.new.value, relative=True, **encode_kwargs+                )+                tree_builder.insert(path, data)          if conflicts:             raise InvalidOperation(                 "Patch does not apply", exit_code=PATCH_DOES_NOT_APPLY,             )-        return new_tree++    def _features_equal(self, lhs, rhs, geom_column_name):+        # FIXME: actually compare the geometries here.+        # Turns 
out this is quite hard - geometries are hard to compare sanely.+        # Even if we add hacks to ignore endianness, WKB seems to vary a bit,

So in terms of our geometries, our WKB encoding should (now) be deterministic. But, there is also:

  • ST_OrderingEquals() — this is OGRGeometry.Equals(other) (and OGRGeometry.Equal(other) is a legacy API to the same behaviour).
  • ST_Equals() — afaict this isn't implemented in GDAL, but is in the underlying GEOS library

Would one of them do what you want?
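e.g. a quick check with the ordering-sensitive one (runnable sketch):

from osgeo import ogr

g1 = ogr.CreateGeometryFromWkt("POINT (1 2)")
g2 = ogr.CreateGeometryFromWkt("POINT (1.0 2.0)")
print(g1.Equals(g2))   # True - same geometry even though the WKT strings differ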

olsen232

comment created time in 17 days

Pull request review commentkoordinates/sno

Add a RichTreeBuilder, rewrite diff-apply code to use it.

+import contextlib++import pygit2++EMPTY_TREE_ID = '4b825dc642cb6eb9a060e54bf8d69288fbee4904'+++class RichTreeBuilder:+    """+    Like a pygit2.TreeBuilder, but more powerful - the client can buffer any number of writes to any paths+    whereas a pygit2.TreeBuilder only lets you modify one tree at a time.+    Also a bit like a pygit2.Index, but much more performant since it uses dicts instead of sorted vectors.+    Conflicts are not detected.+    """++    def __init__(self, repo, initial_root_tree):+        """+        The repo and an initial root tree which will be updated.+        All paths are specified relative to this tree - the root tree at a particular commit is a good choice.+        """+        self.repo = repo+        self.root_tree = initial_root_tree++        self.root_dict = {}+        self.cur_path = []+        self.path_stack = []++    def _resolve_path(self, path):+        """+        Resolve the given a path relative to the current path.+        The given path can be a string like "a/b/c" or a list like ["a", "b", "c"].+        """+        if isinstance(path, str):

is it worth checking that there's no \ in the path, in case os.path.join()/etc happens somewhere in the calling code?
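e.g. (sketch):

if isinstance(path, str):
    if "\\" in path:
        raise RuntimeError("RichTreeBuilder paths must use '/' separators")
    if path.startswith("/"):
        raise RuntimeError("RichTreeBuilder.cd() does not support absolute paths")
    path = path.strip("/").split("/")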

olsen232

comment created time in 17 days

Pull request review commentkoordinates/sno

Add a RichTreeBuilder, rewrite diff-apply code to use it.

+import contextlib++import pygit2++EMPTY_TREE_ID = '4b825dc642cb6eb9a060e54bf8d69288fbee4904'+++class RichTreeBuilder:+    """+    Like a pygit2.TreeBuilder, but more powerful - the client can buffer any number of writes to any paths+    whereas a pygit2.TreeBuilder only lets you modify one tree at a time.+    Also a bit like a pygit2.Index, but much more performant since it uses dicts instead of sorted vectors.+    Conflicts are not detected.+    """++    def __init__(self, repo, initial_root_tree):+        """+        The repo and an initial root tree which will be updated.+        All paths are specified relative to this tree - the root tree at a particular commit is a good choice.+        """+        self.repo = repo+        self.root_tree = initial_root_tree++        self.root_dict = {}+        self.cur_path = []+        self.path_stack = []++    def _resolve_path(self, path):+        """+        Resolve the given a path relative to the current path.+        The given path can be a string like "a/b/c" or a list like ["a", "b", "c"].+        """+        if isinstance(path, str):+            if path.startswith("/"):+                raise RuntimeError(+                    "RichTreeBuilder.cd() does not support absolute paths"+                )+            path = path.strip("/").split("/")+        return self.cur_path + path++    @contextlib.contextmanager+    def cd(self, path):+        """+        Change the current directory used to resolve paths by the given relative path.+        Returns a context manager - when the context manager is closed, the original current directory is restored.++        Example:+        >>> with rich_tree_builder.cd("a/b/c/.sno-dataset"):+        >>>    # Make edits inside a/b/c/.sno-dataset:+        >>>    rich_tree_builder.remove("meta/title")+        >>> # Context manager closes, current path is reset to the default.+        """+        path = self._resolve_path(path)+        self._pushd()

feels like _pushd() and _popd() could be inlined here

olsen232

comment created time in 17 days

Pull request review commentkoordinates/sno

Add a RichTreeBuilder, rewrite diff-apply code to use it.

+import contextlib
+
+import pygit2
+
+EMPTY_TREE_ID = '4b825dc642cb6eb9a060e54bf8d69288fbee4904'

Is this a constant in libgit2? And/or should it be exposed as a constant in pygit2?

olsen232

comment created time in 17 days

Pull request review commentkoordinates/sno

Add a RichTreeBuilder, rewrite diff-apply code to use it.

+import contextlib++import pygit2++EMPTY_TREE_ID = '4b825dc642cb6eb9a060e54bf8d69288fbee4904'+++class RichTreeBuilder:+    """+    Like a pygit2.TreeBuilder, but more powerful - the client can buffer any number of writes to any paths+    whereas a pygit2.TreeBuilder only lets you modify one tree at a time.+    Also a bit like a pygit2.Index, but much more performant since it uses dicts instead of sorted vectors.+    Conflicts are not detected.+    """++    def __init__(self, repo, initial_root_tree):+        """+        The repo and an initial root tree which will be updated.+        All paths are specified relative to this tree - the root tree at a particular commit is a good choice.+        """+        self.repo = repo+        self.root_tree = initial_root_tree++        self.root_dict = {}+        self.cur_path = []+        self.path_stack = []++    def _resolve_path(self, path):+        """+        Resolve the given a path relative to the current path.+        The given path can be a string like "a/b/c" or a list like ["a", "b", "c"].+        """+        if isinstance(path, str):+            if path.startswith("/"):+                raise RuntimeError(+                    "RichTreeBuilder.cd() does not support absolute paths"+                )+            path = path.strip("/").split("/")+        return self.cur_path + path++    @contextlib.contextmanager+    def cd(self, path):

small naming bikeshed, but chdir() would match stdlib

olsen232

comment created time in 17 days

PullRequestReviewEvent

pull request commenterrbotio/errbot

fix: merging configs via --storage-merge cli

Tested ok 👍

sijis

comment created time in 17 days

Pull request review commentkoordinates/sno

Fix duplicate builds for koordinates-internal PRs

 jobs:             os: macos-latest             package-formats: pkg             services: {}-    if: "startsWith(github.ref, 'refs/tags/v') || github.ref == 'refs/heads/master' || !contains(github.event.head_commit.message, '[ci only windows]')"+    # We want to run on external PRs, but not on our own internal PRs as they'll be run+    # by the push to the branch.+    # https://github.community/t/duplicate-checks-on-push-and-pull-request-simultaneous-event/18012/7
    # https://github.community/t/duplicate-checks-on-push-and-pull-request-simultaneous-event/18012/7
    # Skip Linux/macOS builds with `[ci only windows]` unless it's master or a release tag.
craigds

comment created time in 17 days

Pull request review commentkoordinates/sno

Fix duplicate builds for koordinates-internal PRs

 jobs:   windows:     name: Windows     runs-on: windows-2016-    if: "startsWith(github.ref, 'refs/tags/v') || github.ref == 'refs/heads/master' || !contains(github.event.head_commit.message, '[ci only posix]')"+    # We want to run on external PRs, but not on our own internal PRs as they'll be run+    # by the push to the branch.+    # https://github.community/t/duplicate-checks-on-push-and-pull-request-simultaneous-event/18012/7
    # https://github.community/t/duplicate-checks-on-push-and-pull-request-simultaneous-event/18012/7
    # Skip Windows builds with `[ci only posix]` unless it's master or a release tag.
craigds

comment created time in 17 days

PullRequestReviewEvent

pull request commentkoordinates/sno

Fix duplicate builds for koordinates-internal PRs

I replaced the existing if: blocks which support [ci only windows], basically because that seems niche enough to not bother and i didn't want a really overcomplicated if statement there

I've used [ci only windows|posix] a lot while working on packaging and CI, would prefer to keep it there.

craigds

comment created time in 17 days

issue commenterrbotio/errbot

`--storage-merge` doesn't merge

@sijis LGTM 👍

Thanks!

rcoup

comment created time in 18 days

delete branch rcoup/libspatialindex

delete branch : rtd-doxygen

delete time in 21 days

PR opened libspatialindex/libspatialindex

Add doxygen docs to ReadTheDocs builds

Fixes #215

  • Builds Doxygen when make html is run in the ReadTheDocs environment.
  • Installs at /doxygen/ in the built HTML
  • Makes the Class Documentation link on the homepage relative

Tested locally via READTHEDOCS=True make html, not sure what more I can do than that.

+32 -3

0 comment

3 changed files

pr created time in 21 days

create branch rcoup/libspatialindex

branch : rtd-doxygen

created branch time in 21 days

fork rcoup/libspatialindex

C++ implementation of R*-tree, an MVR-tree and a TPR-tree with C API

https://libspatialindex.org

fork in 21 days

pull request commentkoordinates/sno

Better support for custom CRS

@rouault is FindMatches() preferred over AutoIdentifyEPSG() in the general case now?

Context without having to read back a mile: trying to get user-supplied WKT to the matching PROJCS/GEOGCS EPSG code.

olsen232

comment created time in 21 days

pull request commentkoordinates/sno

Better support for custom CRS

I'm a bit confused by this though - seems like AutoIdentifyEPSG is something you've used a lot in dandelion and it should work?

@olsen232 I think we have a pile of heuristics, of which AutoIdentifyEPSG() is a part, but it's all pretty old and predates newer PROJ/FindMatches()/etc.

olsen232

comment created time in 21 days

pull request commentkoordinates/sno

Better support for custom CRS

Here's the various docs:

AutoIdentifyEPSG():

Set EPSG authority info if possible.

This method inspects a WKT definition, and adds EPSG authority nodes where an aspect of the coordinate system can be easily and safely corresponded with an EPSG identifier. In practice, this method will evolve over time. In theory it can add authority nodes for any object (i.e. spheroid, datum, GEOGCS, units, and PROJCS) that could have an authority node. Mostly this is useful to inserting appropriate PROJCS codes for common formulations (like UTM n WGS84).

...

Since GDAL 2.3, the FindMatches() method can also be used for improved matching by researching the EPSG catalog.

FindMatches():

Try to identify a match between the passed SRS and a related SRS in a catalog.

Matching may be partial, or may fail. Returned entries will be sorted by decreasing match confidence (first entry has the highest match confidence).

The exact way matching is done may change in future versions. Starting with GDAL 3.0, it relies on PROJ' proj_identify() function.

proj_identify():

Identify the CRS with reference CRSs.

The candidate CRSs are either hard-coded, or looked in the database when it is available.

Note that the implementation uses a set of heuristics to have a good compromise of successful identifications over execution time. It might miss legitimate matches in some circumstances.

The method returns a list of matching reference CRS, and the percentage (0-100) of confidence in the match. The list is sorted by decreasing confidence.

100% means that the name of the reference entry perfectly matches the CRS name, and both are equivalent. In which case a single result is returned. Note: in the case of a GeographicCRS whose axis order is implicit in the input definition (for example ESRI WKT), then axis order is ignored for the purpose of identification. That is the CRS built from GEOGCS[“GCS_WGS_1984”,DATUM[“D_WGS_1984”,SPHEROID[“WGS_1984”,6378137.0,298.257223563]], PRIMEM[“Greenwich”,0.0],UNIT[“Degree”,0.0174532925199433]] will be identified to EPSG:4326, but will not pass a isEquivalentTo(EPSG_4326, util::IComparable::Criterion::EQUIVALENT) test, but rather isEquivalentTo(EPSG_4326, util::IComparable::Criterion::EQUIVALENT_EXCEPT_AXIS_ORDER_GEOGCRS)

90% means that CRS are equivalent, but the names are not exactly the same. 70% means that CRS are equivalent, but the names are not equivalent. 25% means that the CRS are not equivalent, but there is some similarity in the names.
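For reference, calling it from Python looks roughly like this (sketch - I think the bindings return (SpatialReference, confidence) tuples, worth double-checking):

from osgeo import osr

srs = osr.SpatialReference()
srs.ImportFromWkt(wkt)          # wkt = the user-supplied definition

matches = srs.FindMatches()     # [(SpatialReference, confidence), ...], best first
if matches and matches[0][1] >= 70:   # 70%+ = "CRS are equivalent" per the docs above
    best = matches[0][0]
    crs_id = f"{best.GetAuthorityName(None)}:{best.GetAuthorityCode(None)}"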

olsen232

comment created time in 21 days

issue commentlibspatialindex/libspatialindex

Doxygen docs are missing

Ah, I figured it had worked at some point in the past...

Caveat: I know next-to-nothing about sphinx or doxygen, and not much more about RTD.

  • https://stackoverflow.com/a/41199722/2662 suggests you can tell Sphinx/RTD to call doxygen during the build then add the directory as an extra html path
  • https://breathe.readthedocs.io/en/latest/readthedocs.html has basically the same thing but also shows the env var to check if the sphinx build is running on RTD so it wouldn't affect local builds.
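i.e. roughly, in docs/conf.py (untested sketch, paths assumed):

import os
import subprocess

if os.environ.get("READTHEDOCS") == "True":
    # RTD won't run doxygen for us, so run it as part of the sphinx build...
    subprocess.check_call(["doxygen", "Doxyfile"], cwd="..")
    # ...and ship the output alongside the sphinx HTML. html_extra_path copies the
    # *contents* of each listed dir into the output root, so point it at a dir
    # that contains a doxygen/ folder.
    html_extra_path = ["../build"]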

Or if you want to cross-reference between sphinx & doxygen, looks like Breathe is the way to go, but that's clearly a bigger task.

rcoup

comment created time in 22 days

push eventrcoup/sno

rcoup

commit sha 0db9262462e11835d60043a561d0e9dd6cada7c1

ci: fix identification of CI runs on/from forks or fork PRs Add to version outputs as `.is_fork`, and use consistently ref: https://github.community/t/how-to-detect-a-pull-request-from-a-fork/18363/3

view details

push time in 22 days

push eventrcoup/sno

rcoup

commit sha 7e6b95bb8f98cac34a6f5734b94efa6c1d171106

ci: fix identification of PRs from forks Add to version outputs as `.is_fork_pr`, and use consistently ref: https://github.community/t/how-to-detect-a-pull-request-from-a-fork/18363/3

view details

push time in 22 days

pull request commentkoordinates/sno

Better support for custom CRS

[there] are two broad approaches we can take:

  1. prefer to preserve WKT
  2. prefer to normalise WKT

Yeah, seems until we clearly decide this it'll be hard to get anything correct.

My reckons, opinions @hamishcampbell @craigds?

  • EPSG codes are canonical and immutable, so there's no ambiguity in the definition of EPSG:2193. Feels to me like we should normalise these.
  • Some WKT (ESRI) maps directly onto EPSG definitions, but excludes the codes (because 🤷 ) and/or other defaults. Feels like we should normalise these too.
  • You can define anything you want in Custom WKT. Feels like we shouldn't normalise these in principle

For Custom WKT, do we do a content-hash? Even with normalising whitespace, field ordering and other things could easily affect it unless there's a parser. SpatialReference.ExportToWkt() does support some options including extensions, does that turn out to be the best way to format it anyway?
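e.g. round-trip it through GDAL so everyone hashes the same normalised text (sketch):

import hashlib

from osgeo import osr

def custom_crs_id(wkt):
    srs = osr.SpatialReference()
    srs.ImportFromWkt(wkt)
    normalised = srs.ExportToWkt()    # one parser/writer for everyone
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return f"CUSTOM:{digest[:8]}"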

olsen232

comment created time in 22 days

issue commentlibspatialindex/libspatialindex

Doxygen docs are missing

The files seem to be present at https://github.com/libspatialindex/libspatialindex.github.com/tree/master/doxygen so maybe something with the ReadTheDocs configuration?

rcoup

comment created time in 22 days

issue openedlibspatialindex/libspatialindex

Doxygen docs are missing

https://libspatialindex.org/en/latest/ links to https://libspatialindex.org/en/latest/doxygen/ for the class documentation, which is returning a 404.

created time in 22 days

create branch rcoup/sno

branch : fork-pr-actions-fix

created branch time in 22 days

fork rcoup/sno

Distributed version-control for geospatial and tabular data

https://sno.earth

fork in 22 days

pull request commentkoordinates/sno

Pretty wkt output

macOS build failure appears to be a code-signing thing; I guess it doesn't work on third-party PRs. Can ignore that

Should skip rather than failing, I'll get it fixed on master

OrangeOranges

comment created time in 22 days

pull request commentkoordinates/sno

Better support for custom CRS

Sure. I'm kinda thinking we preserve the original WKT and if we can identify the authority code, just use it. As long as we reject bad WKT that can't be parsed at all. That way if there's weird extensions in there then they're at least preserved.

Wondering for a fallback identifier about hashing the proj4 string (ExportToProj4() or projinfo -o PROJ) rather than just creating a random number, then taking the first few bits of it? AFAICT it only needs to really be unique to a repository anyway?

Proj string for EPSG:2193 is +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs +type=crs

import hashlib

# `crs` here is an osgeo.osr.SpatialReference parsed from the dataset's WKT
auth_name = crs.GetAuthorityName(None) or "CUSTOM"
auth_code = crs.GetAuthorityCode(None)

if not auth_code:
    # no authority code: derive a stable fallback code by hashing the PROJ string
    crs_proj = crs.ExportToProj4()
    crs_hash = hashlib.sha256(crs_proj.encode('ascii'))
    # first 3 bytes of the digest as an int (0..16777215)
    crs_hash_code = int.from_bytes(crs_hash.digest()[:3], byteorder='little')
    auth_code = 100000 + crs_hash_code

crs_id = f"{auth_name}:{auth_code}"
olsen232

comment created time in 23 days

push eventkoordinates/sno

Robert Coup

commit sha a83bc63de55b19c4309e87fd0e9772c80ce2448f

win: fix missing PROJ data in builds - PROJ data files don't appear to be picked up by PyInstaller's osgeo/gdal hook, so include them explicitly. - add a dist check script for Windows and run it during CI. - report PROJ version via `sno --version`

view details

rcoup

commit sha 00023d5ecb211b4d30bccee9bf2b8ef895afc9f3

e2e: use alternate CRS for diff to exercise proj

view details

Robert Coup

commit sha f0013179b85b7e142debcbce47a23c64dd609921

Merge pull request #235 from koordinates/rc-proj-data-fix-win Windows: Fix missing PROJ data in builds

view details

push time in 23 days

delete branch koordinates/sno

delete branch : rc-proj-data-fix-win

delete time in 23 days

PR merged koordinates/sno

Windows: Fix missing PROJ data in builds

Description

  • PROJ data files don't appear to be picked up by PyInstaller's osgeo/gdal hook, so include them explicitly.
  • add a distcheck PowerShell script for Windows and run it during CI. Currently checks for GDAL & PROJ data files only. Rough corollary of the existing distcheck.sh script.
  • report PROJ version via sno --version

Related links:

Similar to #42 but for Windows.

Checklist:

  • [x] Have you reviewed your own change?
  • [x] Have you included test(s)?
  • [x] Have you updated the changelog?
+45 -10

0 comment

7 changed files

rcoup

pr closed time in 23 days

push eventkoordinates/sno

rcoup

commit sha d97e5505e9e30f156544a0c978204808397c7e00

ci: enable for pull requests

view details

push time in 23 days

pull request commentkoordinates/sno

Better support for custom CRS

this can change the CRS definition.

Trying to understand what the downside of this is? GDAL expands/updates/fleshes-out assumed/default bits, but the underlying maths will work the same.

olsen232

comment created time in 23 days

Pull request review commentkoordinates/sno

Better support for custom CRS

 def _gpkg_srs_id(v2_obj):  def wkt_to_gpkg_spatial_ref_sys(wkt):     """Given a WKT crs definition, generate a gpkg_spatial_ref_sys meta item."""-    # TODO: Better support for custom WKT. https://github.com/koordinates/sno/issues/148     spatial_ref = SpatialReference(wkt)-    spatial_ref.AutoIdentifyEPSG()

I think maybe call AutoIdentifyEPSG() and use the auth name+code for the identifier if it returns one. Then keep the original WKT?

olsen232

comment created time in 23 days


push eventkoordinates/sno

rcoup

commit sha 00023d5ecb211b4d30bccee9bf2b8ef895afc9f3

e2e: use alternate CRS for diff to exercise proj

view details

push time in 23 days

push eventkoordinates/sno

OrangeOranges

commit sha 61a0bdde7ff53f7aabcd31b313d48140d91b7586

Moved row inserts in WorkingCopyGPKG into a separate function.

view details

OrangeOranges

commit sha 83ade5378e3b783ca476a10c822c2edd3351a684

Replaced SQL inserts with sql_insert_dict function

view details

OrangeOranges

commit sha 09121880dcfd0b913954b03897d70017d6737731

sql_insert_dict function now returns cursor

view details

Craig de Stigter

commit sha f96f048200eb045d23e8e28e393e4e0caa364e9a

Merge pull request #232 from OrangeOranges/tidy_inserts

Tidy inserts

view details

Andrew Olsen

commit sha 4cfa382beb97952cf685259132e478b72bc82b68

Support -m / --message for `sno merge --continue`

view details

Andrew Olsen

commit sha b991297e7c2cc00646e6803b171771e7e22c10ee

Merge pull request #236 from koordinates/merge-dash-m

Support -m / --message for `sno merge --continue`

view details

Craig de Stigter

commit sha b7ba9130d6a0327682d0a6340e3de7bde895ad21

Handle author/committer env vars in commit

view details

Andrew Olsen

commit sha 145eecd0b18282be116ae6dc582975299769472f

Better error messages for unsupported patches

view details

Craig de Stigter

commit sha 7b8abfb2321165f6743eb5bb14656975530df5b1

apply/import: Handle author details from env

view details

Craig de Stigter

commit sha de6513d23d9fd85cbe859e10bbb56e081667b651

Merge pull request #237 from koordinates/user-from-env

User from env

view details

Craig de Stigter

commit sha 5a936c24f5c298890fb7e8e733a94b5a16c9921b

Fix win tests (uninstall pytest-xdist in CI)

view details

Robert Coup

commit sha a83bc63de55b19c4309e87fd0e9772c80ce2448f

win: fix missing PROJ data in builds

- PROJ data files don't appear to be picked up by PyInstaller's osgeo/gdal hook, so include them explicitly.
- add a dist check script for Windows and run it during CI.
- report PROJ version via `sno --version`

view details

push time in 23 days


Pull request review commentkoordinates/sno

Better support for custom CRS

 def _gpkg_srs_id(v2_obj):


 def wkt_to_gpkg_spatial_ref_sys(wkt):
     """Given a WKT crs definition, generate a gpkg_spatial_ref_sys meta item."""
-    # TODO: Better support for custom WKT. https://github.com/koordinates/sno/issues/148
     spatial_ref = SpatialReference(wkt)
-    spatial_ref.AutoIdentifyEPSG()

It's pretty common to have WKT without authority codes, but otherwise identical to an EPSG definition. This is particularly common with classic ESRI tools. If the parameters exactly match EPSG:1234 then we should use that for its CRS identifier, and that's what AutoIdentifyEPSG tries to do.

eg. WKT2 for NZTM (EPSG:2193):

    BASEGEOGCRS["NZGD2000",
        DATUM["New Zealand Geodetic Datum 2000",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4167]],
    CONVERSION["New Zealand Transverse Mercator 2000",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",173,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",1600000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",10000000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["unknown"],
        AREA["New Zealand - onshore"],
        BBOX[-47.33,166.37,-34.1,178.63]],
    ID["EPSG",2193]]

ESRI WKT for the same CRS:

PROJCS["NZGD_2000_New_Zealand_Transverse_Mercator",GEOGCS["GCS_NZGD_2000",DATUM["D_NZGD_2000",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",1600000.0],PARAMETER["False_Northing",10000000.0],PARAMETER["Central_Meridian",173.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]
olsen232

comment created time in 23 days
