saracen/fastzip 197

Fastzip is an opinionated Zip archiver and extractor with a focus on speed.

saracen/kubeql 56

Kubeql is a SQL-like query language for Kubernetes.

saracen/lfscache 51

LFS Cache is a caching Git LFS proxy.

saracen/matcher 50

Matcher is a fast path matcher/globber supporting globstar/doublestar.

saracen/go7z 43

A native Go 7z archive reader.

saracen/bitcoin-all-key-generator 37

directory.io without the latency

saracen/navigator 18

Navigator is an easy-to-use Helm Chart Repository.

saracen/magnification 14

Eulerian Video Magnification

saracen/fpf 4

Golang Form Population Filter

issue comment golang/go

crypto/x509: http.Response.TLS.VerifiedChains behavior changed in go1.9

I recently revisited this problem and there have been some changes since it was last discussed.

curl not treating intermediate certificates in the trust store as trust anchors is only true when the SSL backend is OpenSSL.

libcurl/7.68.0 (released Jan 2020) rectified this inconsistency using OpenSSL's X509_V_FLAG_PARTIAL_CHAIN flag: https://github.com/curl/curl/commit/94f1f771586913addf5c68f9219e176036c50115.

The change in go1.9 made the behaviour similar to the majority of SSL backends (though OpenSSL has yet to make X509_V_FLAG_PARTIAL_CHAIN the default: https://github.com/openssl/openssl/issues/7871).

git requires only libcurl/7.19.4 or newer, so it will take a while before git built with OpenSSL behaves consistently in this respect. However, Debian/Ubuntu use GnuTLS (and are therefore unaffected), and Alpine has shipped a libcurl with this fix since v3.11 (released 2019), so it's likely less of an issue today than it previously was.

It would be nice if an option passed to x509.Certificate.Verify could disable partial chains so that it could be used to return a chain whose root is a self-signed trust anchor and not a leaf or intermediate.
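In the meantime, a caller can at least detect when this has happened. A minimal sketch (my own helper, not part of crypto/x509) that checks whether a chain returned by x509.Certificate.Verify terminates in a self-signed trust anchor rather than a leaf or intermediate:

import (
	"bytes"
	"crypto/x509"
)

// rootIsSelfSigned reports whether the last certificate in a verified
// chain is a self-signed trust anchor, as opposed to an intermediate
// accepted via partial-chain verification.
func rootIsSelfSigned(chain []*x509.Certificate) bool {
	if len(chain) == 0 {
		return false
	}
	root := chain[len(chain)-1]
	// Self-signed: issuer and subject match, and the certificate
	// verifies under its own public key.
	return bytes.Equal(root.RawIssuer, root.RawSubject) &&
		root.CheckSignatureFrom(root) == nil
}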

nolith

comment created time in a month

pull request comment saracen/fastzip

Add support for Go on z/OS.

No, it is not possible to cross-compile. You need real native z/OS or z/OS emulated with zD&T (https://www.ibm.com/docs/fr/zdt/13.3.0)

@jbyibm Do I need a license for this? It'll be difficult to make changes to this library in the future that might affect z/OS if it's not supported by the standard Go toolchain and there's no way to check that it at least compiles.

Is there something like a dockerized version of zD&T available for open-source CI that can be used without a license? Thank you.

jbyibm

comment created time in a month

Pull request review comment saracen/fastzip

Add support for Go on z/OS.

+// +build zos
+package fastzip
+
+import (
+    "os"
+    "time"
+)
+
+func lchmod(name string, mode os.FileMode) error {
+    if mode&os.ModeSymlink != 0 {
+        // it is a symlink, do not follow
+        return nil
+    }
+    err := os.Chmod(name, mode)
+    if err != nil {
+        return &os.PathError{Op: "lchmod", Path: name, Err: err}
+    }
+    return nil
+}

This can be simplified to:

func lchmod(name string, mode os.FileMode) error {
	if mode&os.ModeSymlink != 0 {
		return nil
	}

	return os.Chmod(name, mode)
}

os.Chmod already returns an *os.PathError for us.

jbyibm

comment created time in a month

Pull request review comment saracen/fastzip

Add support for Go on z/OS.

+// +build zos
+package fastzip
+
+import (
+    "os"
+    "time"
+)
+
+func lchmod(name string, mode os.FileMode) error {
+    if mode&os.ModeSymlink != 0 {
+        // it is a symlink, do not follow
+        return nil
+    }
+    err := os.Chmod(name, mode)
+    if err != nil {
+        return &os.PathError{Op: "lchmod", Path: name, Err: err}
+    }
+    return nil
+}
+
+func lchtimes(name string, mode os.FileMode, atime, mtime time.Time) error {
+    err := lchmod(name, mode)
+    if err != nil {
+        return err
+    }
+    err = os.Chtimes(name, atime, mtime)
+    if err != nil {
+        return &os.PathError{Op: "lchtimes", Path: name, Err: err}
+    }
+    return nil

I'm assuming that lutimes is supported by z/OS, but is it available as part of golang.org/x/sys/unix yet?

I wonder if we can support it by calling the syscall directly. Are there any plans for zos support to be added to golang.org/x/sys/unix?
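For comparison, on platforms where golang.org/x/sys/unix already exposes lutimes (e.g. Linux), the wrapper is roughly this (a sketch, assuming unix.Lutimes is available for the target):

import (
	"time"

	"golang.org/x/sys/unix"
)

func lchtimes(name string, atime, mtime time.Time) error {
	// Lutimes sets timestamps without following symlinks.
	tv := []unix.Timeval{
		unix.NsecToTimeval(atime.UnixNano()),
		unix.NsecToTimeval(mtime.UnixNano()),
	}
	return unix.Lutimes(name, tv)
}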

jbyibm

comment created time in a month

pull request comment BurntSushi/toml

Allow lossless integer to float conversions

@arp242

I think the float32 boundary check could be removed. I'd expect loss here, and unlike integers, the TOML spec makes no mention of floats being represented losslessly.

The error check for int -> float I'd keep, but only because the spec says:

If an integer cannot be represented losslessly, an error must be thrown.

In practice, you could probably remove both checks and just cast to float with loss in both cases and nobody would complain 👍 I don't mind which route we take.
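For context, the boundary check under discussion amounts to something like this (a sketch, not the PR's exact code):

import "math"

// fitsInFloat32 reports whether a float64 value is within float32's
// range; anything outside it overflows to ±Inf on conversion.
func fitsInFloat32(f float64) bool {
	return f >= -math.MaxFloat32 && f <= math.MaxFloat32
}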

saracen

comment created time in 2 months

pull request comment BurntSushi/toml

Allow lossless integer to float conversions

It would be helpful to add a little bit of detail on what problem this solves. As in, what is currently hard or impossible to do and how this solves it. Unless I missed it, that's not really mentioned in the issue you linked.

type Config struct {
    IdleScaleFactor float64 
}

If a user inputs IdleScaleFactor = 1 they receive the error: toml: cannot load TOML value of type int64 into a Go float.

IdleScaleFactor = 1.0 will however work.

One possible solution to this would be to define an IntOrFloat struct with an implemented UnmarshalTOML, as sketched below. This approach does work, but for us it's slightly more involved because we have other unmarshalers and would need to implement the same solution for each of them too.
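For illustration, such a type might look like this (assuming this library's Unmarshaler interface; the name IntOrFloat and the error text are mine):

import "fmt"

// IntOrFloat accepts both TOML integers and TOML floats.
type IntOrFloat float64

// UnmarshalTOML satisfies toml.Unmarshaler; the decoder hands us an
// int64 for TOML integers and a float64 for TOML floats.
func (f *IntOrFloat) UnmarshalTOML(v interface{}) error {
	switch n := v.(type) {
	case int64:
		*f = IntOrFloat(n)
	case float64:
		*f = IntOrFloat(n)
	default:
		return fmt.Errorf("expected int or float, got %T", v)
	}
	return nil
}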

Our users are typically of a technical background, so it is likely they'll be able to figure out what the problem is from the error message, but it's a poor experience nevertheless.

The arguments in https://github.com/BurntSushi/toml/issues/60 are the same: For their users, it's unintuitive.

So the problem this solves is a user experience issue for anybody populating config and for developers it lifts the burden from them needing to implement a special type.

The argument against supporting this was that an int to float conversion can lose information, and that it was natural in Go for TOML integers to map to Go integer types and TOML floats to map to Go float types.

I don't think this is really a "TOML specification issue" as such by the way; it's really up to the implementation to decide what can be accepted for which values depending on what makes sense for the specific language, environment, etc.

I think this slightly touches on both a specification issue and what makes sense for the language.

In Go's standard library and other libraries, it's not uncommon for an integer to be converted to a float upon being unmarshalled:

  • The encoding/json package can do it: https://play.golang.org/p/_Ajc2z_Npyz
  • The encoding/xml library can do it: https://play.golang.org/p/K2FmPiAicA9
  • The flag package can do it: https://play.golang.org/p/P6rXPM_v2bW

Where this becomes a spec issue is that those other decoders convert the provided integer and can lose information, but the TOML specification makes it clear that an error should be thrown if an integer is provided and information would be lost.

For other conversions, this library already adheres to that rule. A really large number would not fit within a uint8 and an error is thrown. This PR does just that with integer to float conversions, ensuring that if information would be lost, an error is thrown.
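A minimal sketch of what such a losslessness check can look like (illustrative, not necessarily the PR's exact code): convert, guard the one value whose round trip overflows, and compare.

// fitsInFloat64 reports whether an int64 survives a round trip
// through float64 unchanged.
func fitsInFloat64(n int64) bool {
	const twoTo63 = 1 << 63 // float64(n) rounding up to 2^63 would overflow int64
	f := float64(n)
	if f >= twoTo63 {
		return false
	}
	return int64(f) == n
}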

I'm feeling pretty sick at the moment so give me a week or so. Feel free to ping me if I forget after this.

I'm sorry to hear that, I hope you feel better soon! There's no rush. I'll catch up with you once you're better. Thank you!

saracen

comment created time in 2 months

issue comment BurntSushi/toml

Cannot decode integer value to float64 type.

I've opened https://github.com/BurntSushi/toml/pull/325 which I believe addresses the concern here that integers cannot be represented losslessly as floats. Some integers can be represented losslessly, and I think it makes sense to allow those, which is in keeping with the specification.

@BurntSushi I'd appreciate it if you could take a look and let me know if you're convinced by my argument. If not, then I apologise for the noise. We stumbled across this problem in GitLab-Runner recently and this seemed easier than going the route of supporting a special IntOrFloat type.

bmatsuo

comment created time in 2 months

PR opened BurntSushi/toml

Allow lossless integer to float conversions

According to the specification, 64-bit signed integers should be accepted providing they can be represented losslessly.

At the moment, integers to be converted to a float immediately return an error. This change permits integers, and only returns an error if they would overflow the float type they're being converted to, since otherwise they can be represented losslessly.

In addition, this change returns an error if a float is provided but cannot fit within the float type specified by the struct.

This relates to issue https://github.com/BurntSushi/toml/issues/60, which I believe was incorrectly closed:

BurntSushi: This appears to be working as intended. In TOML, floats are indicated by the presence of a decimal point and at least one trailing 0.

Whilst this statement is true, it concerns the input of a float. I believe an argument can be made that the input here concerns an integer that just so happens to be unmarshalled into a float. It can be represented losslessly; only integers that cannot be should return an error.

+78 -0

0 comment

2 changed files

pr created time in 2 months

create branch saracen/toml

branch : int-float-bounds

created branch time in 2 months

fork saracen/toml

TOML parser for Golang with reflection.

fork in 2 months

fork saracen/toml

Tom's Obvious, Minimal Language

https://toml.io

fork in 2 months

issue comment saracen/fastzip

Decompressed file content is messed up

That does take away some of the benefits of using fastzip though.

In order to perform parallel zip compression, we compress several files to individual files on disk and then read back the compressed content of each to add to the archive serially, as sketched below. You're the only one who has reported a problem with this technique.
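In outline, the technique looks something like this (a rough illustrative sketch, not fastzip's actual code; error handling in the parallel stage is elided):

import (
	"io"
	"os"
	"sync"
)

// compressToTemp stands in for compressing one file's contents into a
// temporary file; this stage runs concurrently.
func compressToTemp(path string) (*os.File, error) {
	tmp, err := os.CreateTemp("", "zip-*")
	// ... deflate the contents of path into tmp, then rewind ...
	return tmp, err
}

func archive(paths []string, out io.Writer) error {
	tmps := make([]*os.File, len(paths))
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		go func(i int, p string) { // parallel stage
			defer wg.Done()
			tmps[i], _ = compressToTemp(p)
		}(i, p)
	}
	wg.Wait()
	// Serial stage: read each compressed file back and stitch it
	// into the final archive in order.
	for _, tmp := range tmps {
		if _, err := io.Copy(out, tmp); err != nil {
			return err
		}
		tmp.Close()
		os.Remove(tmp.Name())
	}
	return nil
}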

Can you tell me more about your OS/disk etc? If it occurs on the cloud VMs, maybe I can replicate it.

assafavital

comment created time in 2 months

issue comment saracen/fastzip

Decompressed file content is messed up

@assafavital

Does this occur if you pass the option WithArchiverConcurrency(1) too?

I'd have expected there to be a checksum error if what you're decompressing isn't correct.
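For reference, the option is passed when constructing the archiver, along these lines (usage sketch):

f, _ := os.Create("archive.zip")
defer f.Close()

// Force a single compression worker to rule the concurrency path in or out.
a, err := fastzip.NewArchiver(f, "/path/to/dir", fastzip.WithArchiverConcurrency(1))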

assafavital

comment created time in 2 months

issue comment saracen/fastzip

Decompressed file content is messed up

Hi @assafavital

Can you check if v0.1.5 of fastzip also has this same problem? It might be a regression with the latest versions.

Are you able to put together a zip file (using the zip utility, to avoid the issues mentioned here) whose contents, when compressed with fastzip, cause the problem?

Thank you for the report and help diagnosing this!

assafavital

comment created time in 3 months

pull request comment klauspost/compress

zip: Add fs.FS support for Go 1.16+

@klauspost That does appear to work. I figured the same thing would happen as io/fs was only introduced in go1.16. Apologies for the noise!

klauspost

comment created time in 3 months

pull request comment klauspost/compress

zip: Add fs.FS support for Go 1.16+

% go version 
go version go1.13.8 darwin/amd64
% go mod tidy
...
	github.com/klauspost/compress/zip imports
	io/fs: malformed module path "io/fs": missing dot in first path element

It looks like go mod tidy doesn't take build tags into consideration. This is a problem for us, as we run go mod tidy in CI to ensure consistency.

@klauspost I don't expect there's anything that can be done about this, but figured I'd leave this here for anybody else that stumbles across this problem.

klauspost

comment created time in 3 months

issue comment golang/go

archive/zip: Performance regression with reading ZIP files with many entries

I maintain the existing behavior in that I only overwrite the CRC32

The existing behaviour didn't overwrite, but instead compared the data descriptor's CRC32 to the central directory's entry, and returned an error if they didn't match. I've opened an issue for this: https://github.com/golang/go/issues/49089.

As the change evolved, it looks like much was reverted to match the previous implementation. It doesn't look like there's now any need to parse the uncompressed/compressed sizes from the data descriptor, nor to keep the CRC32 value and create a data descriptor object for each file entry.

It looks like fully reverting to the previous data descriptor behaviour wouldn't be a breaking change after all, so maybe that's something we should now explore?

The diff below works for me, and still passes Bad-CRC32-in-data-descriptor. It also corrects https://github.com/golang/go/issues/49089.

diff --git a/src/archive/zip/reader.go b/src/archive/zip/reader.go
index c91a8d0..bb71f45 100644
--- a/src/archive/zip/reader.go
+++ b/src/archive/zip/reader.go
@@ -125,7 +125,6 @@ func (z *Reader) init(r io.ReaderAt, size int64) error {
 		if err != nil {
 			return err
 		}
-		f.readDataDescriptor()
 		z.File = append(z.File, f)
 	}
 	if uint16(len(z.File)) != uint16(end.directoryRecords) { // only compare 16 bits here
@@ -186,10 +185,15 @@ func (f *File) Open() (io.ReadCloser, error) {
 		return nil, ErrAlgorithm
 	}
 	var rc io.ReadCloser = dcomp(r)
+	var desr io.Reader
+	if f.hasDataDescriptor() {
+		desr = io.NewSectionReader(f.zipr, f.headerOffset+bodyOffset+size, dataDescriptorLen)
+	}
 	rc = &checksumReader{
 		rc:   rc,
 		hash: crc32.NewIEEE(),
 		f:    f,
+		desr: desr,
 	}
 	return rc, nil
 }
@@ -205,49 +209,13 @@ func (f *File) OpenRaw() (io.Reader, error) {
 	return r, nil
 }
 
-func (f *File) readDataDescriptor() {
-	if !f.hasDataDescriptor() {
-		return
-	}
-
-	bodyOffset, err := f.findBodyOffset()
-	if err != nil {
-		f.descErr = err
-		return
-	}
-
-	// In section 4.3.9.2 of the spec: "However ZIP64 format MAY be used
-	// regardless of the size of a file.  When extracting, if the zip64
-	// extended information extra field is present for the file the
-	// compressed and uncompressed sizes will be 8 byte values."
-	//
-	// Historically, this package has used the compressed and uncompressed
-	// sizes from the central directory to determine if the package is
-	// zip64.
-	//
-	// For this case we allow either the extra field or sizes to determine
-	// the data descriptor length.
-	zip64 := f.zip64 || f.isZip64()
-	n := int64(dataDescriptorLen)
-	if zip64 {
-		n = dataDescriptor64Len
-	}
-	size := int64(f.CompressedSize64)
-	r := io.NewSectionReader(f.zipr, f.headerOffset+bodyOffset+size, n)
-	dd, err := readDataDescriptor(r, zip64)
-	if err != nil {
-		f.descErr = err
-		return
-	}
-	f.CRC32 = dd.crc32
-}
-
 type checksumReader struct {
 	rc    io.ReadCloser
 	hash  hash.Hash32
 	nread uint64 // number of bytes read so far
 	f     *File
-	err   error // sticky error
+	desr  io.Reader // if non-nil, where to read the data descriptor
+	err   error     // sticky error
 }
 
 func (r *checksumReader) Stat() (fs.FileInfo, error) {
@@ -268,12 +236,12 @@ func (r *checksumReader) Read(b []byte) (n int, err error) {
 		if r.nread != r.f.UncompressedSize64 {
 			return 0, io.ErrUnexpectedEOF
 		}
-		if r.f.hasDataDescriptor() {
-			if r.f.descErr != nil {
-				if r.f.descErr == io.EOF {
+		if r.desr != nil {
+			if err1 := readDataDescriptor(r.desr, r.f); err1 != nil {
+				if err1 == io.EOF {
 					err = io.ErrUnexpectedEOF
 				} else {
-					err = r.f.descErr
+					err = err1
 				}
 			} else if r.hash.Sum32() != r.f.CRC32 {
 				err = ErrChecksum
@@ -485,10 +453,8 @@ parseExtras:
 	return nil
 }
 
-func readDataDescriptor(r io.Reader, zip64 bool) (*dataDescriptor, error) {
-	// Create enough space for the largest possible size
-	var buf [dataDescriptor64Len]byte
-
+func readDataDescriptor(r io.Reader, f *File) error {
+	var buf [dataDescriptorLen]byte
 	// The spec says: "Although not originally assigned a
 	// signature, the value 0x08074b50 has commonly been adopted
 	// as a signature value for the data descriptor record.
@@ -497,9 +463,10 @@ func readDataDescriptor(r io.Reader, zip64 bool) (*dataDescriptor, error) {
 	// descriptors and should account for either case when reading
 	// ZIP files to ensure compatibility."
 	//
-	// First read just those 4 bytes to see if the signature exists.
+	// dataDescriptorLen includes the size of the signature but
+	// first read just those 4 bytes to see if it exists.
 	if _, err := io.ReadFull(r, buf[:4]); err != nil {
-		return nil, err
+		return err
 	}
 	off := 0
 	maybeSig := readBuf(buf[:4])
@@ -508,28 +475,21 @@ func readDataDescriptor(r io.Reader, zip64 bool) (*dataDescriptor, error) {
 		// bytes.
 		off += 4
 	}
-
-	end := dataDescriptorLen - 4
-	if zip64 {
-		end = dataDescriptor64Len - 4
+	if _, err := io.ReadFull(r, buf[off:12]); err != nil {
+		return err
 	}
-	if _, err := io.ReadFull(r, buf[off:end]); err != nil {
-		return nil, err
+	b := readBuf(buf[:12])
+	if b.uint32() != f.CRC32 {
+		return ErrChecksum
 	}
-	b := readBuf(buf[:end])
 
-	out := &dataDescriptor{
-		crc32: b.uint32(),
-	}
+	// The two sizes that follow here can be either 32 bits or 64 bits
+	// but the spec is not very clear on this and different
+	// interpretations has been made causing incompatibilities. We
+	// already have the sizes from the central directory so we can
+	// just ignore these.
 
-	if zip64 {
-		out.compressedSize = b.uint64()
-		out.uncompressedSize = b.uint64()
-	} else {
-		out.compressedSize = uint64(b.uint32())
-		out.uncompressedSize = uint64(b.uint32())
-	}
-	return out, nil
+	return nil
 }
 
 func readDirectoryEnd(r io.ReaderAt, size int64) (dir *directoryEnd, err error) {

stanhu

comment created time in 3 months

issue opened golang/go

archive/zip: CRC32 data descriptor and central directory entry need no longer match in go1.17

What version of Go are you using (go version)?

$ go version
go version go1.16.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env

What did you do?

Before 1.17, if a zip file's central directory entry's CRC32 value was different to its data descriptor entry's, ErrChecksum would be returned.

I've written a test exercising this behaviour. It fails with 1.17, but succeeds with 1.16: https://play.golang.org/p/103JOqe-uKp

What did you expect to see?

I don't think this change in behaviour was intentional; both CRC32 entries should still be checked against each other.

What did you see instead?

If a data descriptor is present, only its CRC32 field is taken into consideration.

created time in 3 months

issue comment golang/go

archive/zip: Performance regression with reading ZIP files with many entries

If I'm reading this change correctly, the only benefit to loading the data descriptors in init was that the CRC32 field was populated for each file entry and available immediately after opening the archive.

I was not reading it correctly.

The CRC32 field is available, but is from the central directory.

In the changes introduced in https://github.com/golang/go/commit/ddb648fdf6c21e7e56a2252df3e3913a212ca4ab, we started overwriting this value with the CRC32 from the data descriptor. Previously, the checksum check was made against the data descriptor CRC32, but the field in the header kept the central directory value. I don't think it makes sense for this value to be overwritten.

stanhu

comment created time in 3 months

issue comment golang/go

archive/zip: Performance regression with reading ZIP files with many entries

@tinti Thank you for https://golang.org/cl/353715

I just tested the patch and unfortunately it doesn't entirely resolve the issue for how we use this at GitLab.

We convert reads to HTTP range requests, and we still generate more than we did previously. I believe the reason is that reading the data descriptor used to occur after reading a file entry (https://go-review.googlesource.com/c/go/+/312310/14/src/archive/zip/reader.go#b224), which coalesced into one continuous read (the data descriptor immediately follows the file data). Now that the data descriptor is read upon opening the file, we generate additional range requests.
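To make the access pattern concrete, the kind of adapter involved looks roughly like this (an illustrative sketch, not GitLab's actual code): each ReadAt becomes one HTTP range request, so an extra read per file entry means an extra request per file entry.

import (
	"fmt"
	"io"
	"net/http"
)

// httpReaderAt is an io.ReaderAt that fetches bytes with HTTP range
// requests. Handling of non-206 responses is elided for brevity.
type httpReaderAt struct {
	client *http.Client
	url    string
}

func (r *httpReaderAt) ReadAt(p []byte, off int64) (int, error) {
	req, err := http.NewRequest("GET", r.url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1))
	resp, err := r.client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return io.ReadFull(resp.Body, p)
}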

I wonder if the newer functionality introduced in 1.17 would continue to work as expected if we moved the data descriptor read back to occurring after the file entry read?

stanhu

comment created time in 3 months

issue comment golang/go

archive/zip: Performance regression with reading ZIP files with many entries

Change https://golang.org/cl/353715 mentions this issue: archive/zip: lazy load file data descriptor

If I'm reading this change correctly, the only benefit to loading the data descriptors in init was that the CRC32 field was populated for each file entry and available immediately after opening the archive.

@rsc Removing this could be a breaking change if anybody has started to rely on this field since 1.17, but it feels unlikely that anybody is relying on it (at least while it's a fairly young change). Are you able to comment on whether there's even a possibility of fixing this now?

stanhu

comment created time in 3 months
