Martin Möhrmann martisch @Google

martisch/eprover_dev 1

personal development copy of eprover.org source code

martisch/nanite 0

Development of a non-clausal connection calculus prover written in go

issue commentgolang/go

runtime: consider adding 24-byte size class

Still working on the heapbitmap code. Likely ready in March, just not the first week the tree opens.

dvyukov

comment created time in 21 hours

issue commentgolang/go

cmd/compile: add support for compiling without the MMX requirement on GOARCH=386

As far as I remember, the fundamental reason why MMX is needed is that otherwise there is no fast or easy way to atomically write 64 bits using only non-MMX instructions that are part of x86.

One can go through floating-point loads and stores, but that requires making sure the floating-point control word is set to 64-bit mantissas. I'm not sure Go enforces or requires that.

One can use lock cmpxchg8b, but that's not guaranteed to exist as part of the instruction set, so one would need to check for it. It also has catastrophic performance implications, being slower by 100x in some cases.

In the era of P5 and P6 Pentiums the guarantees for aligned/unaligned 64-bit load/store atomicity also changed. P5 only supported atomicity on naturally aligned (64-bit for 64-bit int) loads and stores. That, however, may not be an issue, since I think all callers are already required to use 64-bit alignment for runtime atomics.
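
As a rough illustration of the alignment requirement, here is a minimal sketch. It uses the public sync/atomic package as a stand-in for the runtime's internal atomics, and the counters type and storeAndLoad helper are made-up names. It relies on the sync/atomic documentation's guarantee that the first word of an allocated struct is 64-bit aligned.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counters is a hypothetical type for illustration. The 64-bit field comes
// first so that it is 64-bit aligned even on 32-bit platforms such as 386.
type counters struct {
	total uint64
	flag  uint32
}

// storeAndLoad writes v atomically and then reads it back atomically.
func storeAndLoad(c *counters, v uint64) uint64 {
	atomic.StoreUint64(&c.total, v)
	return atomic.LoadUint64(&c.total)
}

func main() {
	c := new(counters) // allocated, so the first word is 64-bit aligned
	fmt.Println(storeAndLoad(c, 1<<40))
}
```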

More practical problems: how would we test that we don't regress by introducing an MMX instruction somewhere? We would need SMP builders that run in non-MMX mode to catch regressions. Maybe QEMU can do this.

If it's not implemented as an option, it will likely slow down other x86 systems that have MMX. Is the maintenance burden of supporting an extra option (running builders, code maintenance) worth the potential usage? Maybe gcc-go is enough?

b-

comment created time in 15 days

issue commentgolang/go

cmd/internal/obj/x86: pad jumps to avoid Intel erratum

I agree that the effects you describe can happen and cause performance regressions. I think on the other side there are also effects that happen just because of the padding. A jump that was previously aligned may now be misaligned. Code that previously fit into x cache lines now takes x+1 cache lines. So there are performance regressions that happen purely due to the padding. The tradeoff is therefore between adding the padding (along with the added options and code) and accepting regressions in other places, or not padding and accepting the regressions you describe.

If we are going to do it purely to stabilize performance, then it should not be an option but the default, as that is what I expect the majority of Go users to run. Otherwise, if users turn the option on and off to see the performance effect, they will also measure the effects of code without padded jumps now being aligned differently. However, turning on the option for all CPUs also applies padding to CPUs that don't need it and changes their performance (for better or worse).

rsc

comment created time in 16 days

issue commentgolang/go

hash/maphash: Document whether used hash algorithm is stable across future versions

I agree this should be called out in the package's documentation.

As this is using the runtime hash which has different implementations (non-AES vs AES) it can differ even using the same binary depending on cpu feature detection. It seems https://golang.org/cl/205069 dropped this part of the documentation which outlines that hashes are not even guaranteed to be equal between processes using the same binary:

// Two Hash instances with the same seed in different processes are
// not guaranteed to behave identically, even if the processes share
// the same binary.

@randall77 @rsc

nightlyone

comment created time in 16 days

issue commentgolang/go

cmd/compile: it is not possible to prevent FMA with complex values

I think that's a runtime setting. Maybe @martisch can help.

As @randall77 pointed out, the FMA setting doesn't exist for arm64 as we assume it's always present: https://github.com/golang/go/blob/03ef105daeff4fef1fd66dbffb8e17d1f779b9ea/src/internal/cpu/cpu_arm64.go#L44

There is no condition for arm64 in the compiler currently to not generate FMA or make it conditional on the GODEBUG setting at runtime: https://github.com/golang/go/blob/a50c3ffbd47e3dcfc1b5bd2a2d19d55731481eaa/src/cmd/compile/internal/gc/ssa.go#L3574

kortschak

comment created time in 17 days

issue commentgolang/go

strings: 10-30% speed regression in Contains from 1.13 to tip

While at it, can you please test whether it's just a matter of general jump alignment by setting funcAlign to 32 instead of 16, compiling tip and rebenchmarking: https://github.com/golang/go/blob/50bd1c4d4eb4fac8ddeb5f063c099daccfb71b26/src/cmd/link/internal/amd64/l.go#L36

zikaeroh

comment created time in 17 days

issue commentgolang/go

cmd/internal/obj/x86: pad jumps to avoid Intel erratum

It’s important to note that the software patch doesn’t just reduce the impact of the microcode update on performance. It also reduces the impact of the microcode update on consistency of performance, from one build to another. Without a software mitigation, it could be difficult to rely on benchmark data for comparative purposes.

As far as I understand, cache alignment effects do not only happen for jumps, and aligning functions on 32-byte boundaries instead of 16-byte boundaries alone would solve this without requiring padded jumps. Just aligning functions on 32 bytes generally sounds like a good idea and is likely to have positive effects even on chips not affected by the erratum.

rsc

comment created time in 22 days

issue commentgolang/go

fmt: %#g behavior not consistent

I see now that "#" is supposed to alter the g to not remove trailing zeros later in the documentation.

"do not remove trailing zeros for %g and %G;"

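A small sketch of the difference the '#' flag makes for %g. The helper names formatG and formatSharpG are made up for illustration.

```go
package main

import "fmt"

// formatG formats v with at most three significant digits using %g.
func formatG(v float64) string { return fmt.Sprintf("%.3g", v) }

// formatSharpG does the same with the '#' flag, which keeps trailing zeros.
func formatSharpG(v float64) string { return fmt.Sprintf("%#.3g", v) }

func main() {
	fmt.Println(formatG(1.0))      // 1
	fmt.Println(formatSharpG(1.0)) // 1.00
}
```
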
musiphil

comment created time in a month

issue commentgolang/go

fmt: %#g behavior not consistent

This seems consistent with the documentation (https://godoc.org/fmt) of the g verb and precision:

"For floating-point values, width sets the minimum width of the field and precision sets the number of places after the decimal, if appropriate, except that for %g/%G precision sets the maximum number of significant digits (trailing zeros are removed)."

musiphil

comment created time in a month

issue commentgolang/go

math: FMA is slower than non-FMA calculation

As investigated above, the slowdown is caused by dynamically checking whether the FMA CPU capability is present on every iteration.

math.FMA is still useful even if slower, as it has more precision than doing the computation with explicit temporary results.

What could be improved when FMA operations are executed in a loop is hoisting the CPU feature check out of the loop and creating two specialized loops if the loop body is small. I would suggest we create a new generic CPU feature detection issue for that and close this issue.
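
The hoisting idea can be sketched by hand at the source level. This only illustrates the shape of the transformation and is not what the compiler does today. The hasFMA variable is a hypothetical stand-in for the runtime's CPU feature flag, and both functions assume y is at least as long as x.

```go
package main

import (
	"fmt"
	"math"
)

// hasFMA is a hypothetical stand-in for a CPU feature flag that would be
// set once at program startup.
var hasFMA = true

// dotChecked tests the flag on every iteration.
func dotChecked(x, y []float64) float64 {
	var sum float64
	for i := range x {
		if hasFMA {
			sum = math.FMA(x[i], y[i], sum)
		} else {
			sum = float64(x[i]*y[i]) + sum
		}
	}
	return sum
}

// dotHoisted tests the flag once and then runs one of two specialized loops.
func dotHoisted(x, y []float64) float64 {
	var sum float64
	if hasFMA {
		for i := range x {
			sum = math.FMA(x[i], y[i], sum)
		}
		return sum
	}
	for i := range x {
		sum = float64(x[i]*y[i]) + sum
	}
	return sum
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{4, 5, 6}
	fmt.Println(dotChecked(x, y), dotHoisted(x, y)) // 32 32
}
```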

mattn

comment created time in 2 months

issue commentgolang/go

math: FMA is slower than non-FMA calculation

To verify that this is the runtime dispatch overhead to determine if the cpu supports FMA I changed the compiler in cmd/compile/internal/gc/ssa.go to not add any checks.

name    time/op
FMA     0.58ns ± 2%
NonFMA  0.56ns ± 4%

As noted even if equally fast FMA has the advantage of not rounding the intermediate step.
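
A minimal sketch of the rounding difference, with inputs picked so that the intermediate product cannot be represented exactly in float64. The helper names unfused and fused are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// unfused rounds the product to float64 before adding. The explicit
// conversion prevents the compiler from fusing the two operations.
func unfused(x, y, z float64) float64 { return float64(x*y) + z }

// fused computes x*y+z with a single rounding at the end.
func fused(x, y, z float64) float64 { return math.FMA(x, y, z) }

func main() {
	x := 1 + math.Ldexp(1, -30) // 1 + 2^-30
	y := 1 - math.Ldexp(1, -30) // 1 - 2^-30
	// The exact product is 1 - 2^-60, which rounds to 1 in float64.
	fmt.Println(unfused(x, y, -1)) // 0
	fmt.Println(fused(x, y, -1))   // -2^-60, the exact result
}
```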

As long as the built Go binary needs to support both FMA-capable and non-FMA-capable CPUs, there will be some overhead. Ideally that could be moved outside the loop, but we do not have that currently. For the latter, I thought we already had a general bug to move the checks.

mattn

comment created time in 2 months

issue commentgolang/go

math: FMA is slower than non-FMA calculation

VFMADD231SD has a 5 cycle latency on Haswell and 2 can be executed in parallel. Same for MULSD. The added check and jump as well as other factors can make the FMA indeed slower. This seems to be 1 or 2 cycles here. This might be WAI due to a slight overhead for runtime dispatch within the loop.

mattn

comment created time in 2 months

issue commentgolang/go

math: FMA is slower than non-FMA calculation

It looks like turning off FMA support has no effect on the numbers, which is weird if the program uses FMA instructions.

Disassembly for me shows:

main_test.go:12	0x4fabae		f20f110d3ab91600	MOVSD_XMM X1, _/usr/local/google/home/moehrmann/test_test.vd(SB)	
  main_test.go:25	0x4fabb6		48ffc1			INCQ CX									
  main_test.go:25	0x4fabb9		48398810010000		CMPQ CX, 0x110(AX)							
  main_test.go:25	0x4fabc0		7e66			JLE 0x4fac28								
  main_test.go:26	0x4fabc2		90			NOPL									
  main_test.go:12	0x4fabc3		f20f100515b91600	MOVSD_XMM _/usr/local/google/home/moehrmann/test_test.vb(SB), X0	
  main_test.go:12	0x4fabcb		f20f100d15b91600	MOVSD_XMM _/usr/local/google/home/moehrmann/test_test.vc(SB), X1	
  main_test.go:12	0x4fabd3		f20f1015fdb81600	MOVSD_XMM _/usr/local/google/home/moehrmann/test_test.va(SB), X2	
  main_test.go:12	0x4fabdb		803d95b8160000		CMPB $0x0, runtime.x86HasFMA(SB)					
  main_test.go:12	0x4fabe2		7407			JE 0x4fabeb								
  main_test.go:12	0x4fabe4		c4e2e9b9c8ebc348	MOVL $0x48c3ebc8, CX							
  main_test.go:25	0x4fabec		894c2420		MOVL CX, 0x20(SP)							
  main_test.go:12	0x4fabf0		f20f111424		MOVSD_XMM X2, 0(SP)							
  main_test.go:12	0x4fabf5		f20f1005e3b81600	MOVSD_XMM _/usr/local/google/home/moehrmann/test_test.vb(SB), X0	
  main_test.go:12	0x4fabfd		f20f11442408		MOVSD_XMM X0, 0x8(SP)							
  main_test.go:12	0x4fac03		f20f1005ddb81600	MOVSD_XMM _/usr/local/google/home/moehrmann/test_test.vc(SB), X0	
  main_test.go:12	0x4fac0b		f20f11442410		MOVSD_XMM X0, 0x10(SP)							
  main_test.go:12	0x4fac11		e82a28f8ff		CALL math.FMA(SB)							
  main_test.go:12	0x4fac16		f20f104c2418		MOVSD_XMM 0x18(SP), X1							
  main_test.go:25	0x4fac1c		488b442438		MOVQ 0x38(SP), AX							
  main_test.go:25	0x4fac21		488b4c2420		MOVQ 0x20(SP), CX							
  main_test.go:12	0x4fac26		eb86			JMP 0x4fabae								
  main_test.go:25	0x4fac28		488b6c2428		MOVQ 0x28(SP), BP							
  main_test.go:25	0x4fac2d		4883c430		ADDQ $0x30, SP								
  main_test.go:25	0x4fac31		c3			RET				

Which looks wrong since it's missing the VFMADD231SD instruction.

Using GOSSAFUNC=BenchmarkFMA go test -c however I see the VFMADD231SD instruction:

00035 (+12) MOVSD "".vb(SB), X0
00036 (12) MOVSD "".vc(SB), X1
00037 (12) MOVSD "".va(SB), X2
00038 (12) CMPB runtime.x86HasFMA(SB), $0
00039 (12) JEQ 42
00040 (12) VFMADD231SD X0, X2, X1
00041 (12) JMP 26
00042 (25) PCDATA $0, $0
00043 (25) MOVQ CX, "".i-8(SP)
00044 (12) MOVSD X2, (SP)
00045 (12) MOVSD "".vb(SB), X0
00046 (12) MOVSD X0, 8(SP)
00047 (12) MOVSD "".vc(SB), X0
00048 (12) MOVSD X0, 16(SP)
00049 (12) CALL math.FMA(SB)
00050 (12) MOVSD 24(SP), X1
00051 (25) PCDATA $0, $1
00052 (25) MOVQ "".b(SP), AX
00053 (25) MOVQ "".i-8(SP), CX
00054 (12) JMP 26
00055 (25) PCDATA $0, $-1
00056 (25) PCDATA $1, $-1
00057 (25) RET
00058 (?) END
mattn

comment created time in 2 months

issue commentgolang/go

math: FMA is slower than non-FMA calculation

Which CPU is used for the benchmark?

Please benchmark with GODEBUG=cpu.fma=off and see if this changes anything. If FMA instructions are used I would expect a change in the FMA benchmark numbers.

Please also write the benchmark as a go benchmark: https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go

Then execute the test with -count=20 and store the results in a file. Use a quiet machine with e.g. no browser or videos running.

Afterwards use https://godoc.org/golang.org/x/tools/cmd/benchcmp to produce an "average" over the runs.

This will give better information on how consistent the results are between runs.
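
For reference, a minimal sketch of the benchmark shape being asked for. The variable names and the runOnce helper are made up for illustration. In a real project BenchmarkFMA would live in a file ending in _test.go and be run by go test. Here it is driven through testing.Benchmark so the snippet is self-contained.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// Package-level variables keep the compiler from constant-folding the inputs
// or discarding the result.
var (
	va, vb, vc = 1.5, 2.5, 3.5
	sink       float64
)

// BenchmarkFMA has the signature that go test -bench expects.
func BenchmarkFMA(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = math.FMA(va, vb, vc)
	}
}

// runOnce runs the benchmark outside of go test and returns the number of
// iterations the testing package settled on.
func runOnce() int {
	return testing.Benchmark(BenchmarkFMA).N
}

func main() {
	fmt.Println("iterations:", runOnce())
}
```

With the function in a _test.go file, something like GODEBUG=cpu.fma=off go test -bench=FMA -count=20 would produce the repeated runs to compare against a run without the GODEBUG setting.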

mattn

comment created time in 2 months

issue closedgolang/go

Member of anonymity Rega hawlery

<!-- Please answer these questions before submitting your issue. Thanks! -->

What version of Go are you using (go version)?

<pre> $ go version

</pre>

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

<details><summary><code>go env</code> Output</summary><br><pre> $ go env

</pre></details>

What did you do?

<!-- If possible, provide a recipe for reproducing the error. A complete runnable program is good. A link on play.golang.org is best. -->

What did you expect to see?

What did you see instead?

closed time in 3 months

Anonymbus4

issue commentgolang/go

cmd/internal/obj/x86: pad jumps to avoid Intel erratum

Phoronix has some benchmarks for GCC https://www.phoronix.com/scan.php?page=article&item=intel-jcc-microcode

It seems like adding padding can also regress performance in benchmarks for affected CPUs with new microcode.

rsc

comment created time in 3 months

issue commentgolang/go

cmd/internal/obj/x86: pad jumps to avoid Intel erratum

Some more thoughts on the topic:

Go 1.14: This issue is currently marked as go1.14; however, I do not think this change should be made at this point in the cycle, as there is a workaround (microcode update) that is needed for other programs to work correctly on the affected CPUs at any rate, and previous Go versions, unless backported, will have the same unpredictable behavior on affected CPUs without microcode updates.

Padding and binary size: Go gc, apart from function alignment, has AFAIK not really started to exploit the potential of padding for performance improvements. If e.g. a 2% binary size budget increase is to be used, it may be better to apply it towards generic loop alignment to speed up execution on a much larger selection of amd64 CPUs.

Architecture specific padding: I think we need to discuss some general thresholds/criteria for accepting the tradeoffs of binary size vs performance. If this change is to be accepted as the default, would we also consider adding NOPs to pad instruction streams to increase performance on Atom architectures?

Adding GOAMD64 options: We have largely avoided adding tuning flags for amd64 architectures. I can see other options potentially having a much larger effect on amd64 optimization than padding (e.g. assume SSE4, AVX together with additional compiler changes to use these instructions much more broadly). The difference from mere padding changes is of course that these options would not allow the resulting binary to run on old hardware. Depending on which instructions are assumed as a given, however, very few CPUs that are actually in common use might be affected. Further options that might have better effects but do not exclude older CPUs may be instruction scheduling and cost of operations.

Maintenance of new compiler options: The more options are added, the more buildbots, benchmarks and tests we will need to make sure that compiler changes do not regress correctness or performance. This also adds one more option by which Go programs can differ when analyzing bug reports.

Pseudo assembler doing more magic: A problem that may arise is that if the Go assembler also starts inserting NOPs into assembly that developers have written, this might interfere with careful alignments they wanted to achieve by choosing a specific chain of instructions.

rsc

comment created time in 3 months

issue commentgolang/go

proposal: cmd/go: add GOMIPS64=r2 for mips64r2 code generation

I think we need clear demonstration that using more advanced instruction could lead to non-trivial performance gain for general Go programs before even considering bumping the minimal requirement.

I think the same (making sure the advantage is big enough to warrant the change) can be said for adding a new GOMIPS64 option. A new option will mean more compiler complexity, which makes maintaining and changing the compiler harder. E.g. anyone making a generic rule change needs to understand and test one more configuration. This thereby also has an impact on optimization efforts for other architectures. It also raises the question of whether there is capacity to add a new buildbot to make sure neither the r2 nor the non-r2 port regresses.

mengzhuo

comment created time in 3 months

issue commentgolang/go

cmd/internal/obj/x86: pad jumps to avoid Intel erratum

@knweiss your text also aligns with my understanding: if the affected CPU has updated microcode, then padding is a performance improvement only for the affected, microcode-updated CPU, while adding the padding lowers performance (decode bandwidth, icache usage) for all other CPUs (unaffected or unpatched).

This looks like a performance tradeoff decision (as e.g. many operating systems load updated microcode automatically even if the bios is not updated) between affected and unaffected CPUs to me.

rsc

comment created time in 3 months

issue commentgolang/go

strings: 10-30% speed regression in Contains from 1.13 to tip

What CPU does your dev machine run, and at what CL are you building tip?

Any CL can potentially cause the benchmark code to align differently and then benchmark the effects of branch alignment. I have seen in the past that it can matter a lot where the benchmark loop is placed in the file which can cause different alignment. With all the side channel attacks on caching/branch prediction there can even be benchmark differences due to different microcode versions of the same cpu.

zikaeroh

comment created time in 3 months

issue commentgolang/go

proposal: runtime/pprof: add new WithLabels* function that requires fewer allocations

It would still be good to find a way to avoid exposing the map[string]string. I agree.

Since we are in the freeze for go1.14, I have 6 months to spend some time figuring out an interface (and a potential generic compiler optimization if needed) that does not cause an additional allocation and just assumes a string type for key and value but no other implementation details.

martisch

comment created time in 3 months

issue commentgolang/go

proposal: runtime/pprof: add new WithLabels* function that requires fewer allocations

Running new benchmarks; will add them here soon.

martisch

comment created time in 3 months

issue commentgolang/go

proposal: cmd/go: adding GOMIPS64REV for MIPS64r*

I agree with cherrymui@ in that it is unclear (at least to me) whether the added complexity is worth the benefit.

Will there be builders for the new combinations if we do not have any?

It would also be nice to know what the performance benefits are vs not doing this or doing runtime detection of the features (if that is possible).

If pre r2 is not widely used an alternative might be to just make r2 the new minimal requirement.

mengzhuo

comment created time in 4 months

issue commentgolang/go

fmt does not format byte slices nicely

The current behavior looks to me to be as documented. Note that []byte often formats the way a string would be formatted. That is why []int behaves differently. We cannot change it without breaking backwards compatibility. Since there is a way to format slcByte the way you want, I am closing the issue.

If the behavior should be changed please open a new issue with a concrete proposal for Go 2 how the behavior of fmt should be changed and how the gained advantage is worth breaking the backwards compatibility.

guysoffer

comment created time in 4 months

issue closedgolang/go

fmt does not format byte slices nicely

<!-- Please answer these questions before submitting your issue. Thanks! -->

What version of Go are you using (go version)?

<pre> go version go1.12.1 windows/amd64 </pre>

Does this issue reproduce with the latest release?

reproduces on go playground

What operating system and processor architecture are you using (go env)?

<details><summary><code>go env</code> Output</summary><br><pre> C:\Users\gs082r>go env set GOARCH=amd64 set GOBIN= set GOCACHE=C:\Users\gs082r\AppData\Local\go-build set GOEXE=.exe set GOFLAGS= set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOOS=windows set GOPATH=C:\Users\gs082r\go set GOPROXY= set GORACE= set GOROOT=C:\Go set GOTMPDIR= set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64 set GCCGO=gccgo set CC=gcc set CXX=g++ set CGO_ENABLED=1 set GOMOD= set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\gs0 82r\AppData\Local\Temp\go-build177268958=/tmp/go-build -gno-record-gcc-switches

</pre></details>

What did you do?

I'm trying to print a slice of bytes as an array of hex values.

What did you expect to see?

[0x0a 0x14 0x1e 0x28 0x32 0x3c 0x46 0x50 0x5a 0x64 0x65]

What did you see instead?

0x0a14

Please see playground link: https://play.golang.org/p/gQIYJV6ogx4

The first line is ok for a slice of ints.. the %.2v is a sort of workaround, but not what I actually need.

dump from playground:

package main

import (
	"fmt"
)

func main() {
	slcInt8 := [...]int8{10,20,30,40,50,60,70,80,90,100,101}
	fmt.Printf("%#.2x\n", slcInt8)	
	slcByte := [...]byte{10,20,30,40,50,60,70,80,90,100,101}
	fmt.Printf("%#.2x\n", slcByte)	
	slcUint8 := [...]byte{10,20,30,40,50,60,70,80,90,100,101}
	fmt.Printf("%#.2x\n", slcUint8)	
	
	fmt.Printf("%#x\n", slcInt8)	
	fmt.Printf("%#x\n", slcByte)	
	fmt.Printf("%#x\n", slcUint8)	
	
	fmt.Printf("%#.2v\n", slcInt8)	
	fmt.Printf("%#.2v\n", slcByte)	
	fmt.Printf("%#.2v\n", slcUint8)		
}

output:

[0x0a 0x14 0x1e 0x28 0x32 0x3c 0x46 0x50 0x5a 0x64 0x65]
0x0a14
0x0a14
[0xa 0x14 0x1e 0x28 0x32 0x3c 0x46 0x50 0x5a 0x64 0x65]
0x0a141e28323c46505a6465
0x0a141e28323c46505a6465
[11]int8{10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 101}
[11]uint8{0x0a, 0x14, 0x1e, 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x65}
[11]uint8{0x0a, 0x14, 0x1e, 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x65}

closed time in 4 months

guysoffer

issue commentgolang/go

fmt does not format byte slices nicely

fmt.Printf("%# x\n", slcByte) will output:

0x0a 0x14 0x1e 0x28 0x32 0x3c 0x46 0x50 0x5a 0x64 0x65
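
A short sketch contrasting the three cases discussed in this issue. The helper names compact, spaced and perElement are made up for illustration.

```go
package main

import "fmt"

// compact formats a byte slice the way a string would be formatted.
func compact(b []byte) string { return fmt.Sprintf("%#x", b) }

// spaced uses the space flag to separate the bytes.
func spaced(b []byte) string { return fmt.Sprintf("%# x", b) }

// perElement shows that other integer slices are formatted element by element.
func perElement(n []int) string { return fmt.Sprintf("%#x", n) }

func main() {
	b := []byte{10, 20, 30}
	n := []int{10, 20, 30}
	fmt.Println(compact(b))    // 0x0a141e
	fmt.Println(spaced(b))     // 0x0a 0x14 0x1e
	fmt.Println(perElement(n)) // [0xa 0x14 0x1e]
}
```
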
guysoffer

comment created time in 4 months

issue commentgolang/go

fmt: converts numeric type to string before rendering characters as hex

I think point 1 is documented at https://golang.org/pkg/fmt/:

If the format (which is implicitly %v for Println etc.) is valid for a string (%s %q %v %x %X), the following two rules apply: .... If an operand implements method String() string, that method will be invoked to convert the object to a string, which will then be formatted as required by the verb (if any).
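
A minimal sketch of the documented rule in action. The level type and the helper names are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// level is a hypothetical numeric type that also implements fmt.Stringer.
type level int

func (l level) String() string { return "hi" }

// hexViaString uses %x, which is valid for strings, so String is called first.
func hexViaString(l level) string { return fmt.Sprintf("%x", l) }

// decimal uses %d, which is not valid for strings, so String is not called.
func decimal(l level) string { return fmt.Sprintf("%d", l) }

// hexOfValue converts to int first, which bypasses the String method.
func hexOfValue(l level) string { return fmt.Sprintf("%x", int(l)) }

func main() {
	fmt.Println(hexViaString(255)) // 6869, the hex encoding of "hi"
	fmt.Println(decimal(255))      // 255
	fmt.Println(hexOfValue(255))   // ff
}
```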

If wording can be improved/clarified you can send a CL to do so. I do not think this would need an extra proposal. Details can be discussed on the CL.

Filing bugs/feature proposals for 2. and 3. sounds good to me.

wti

comment created time in 4 months

issue commentgolang/go

proposal: runtime/pprof: add new WithLabels* function that requires fewer allocations

I have not gotten around to profiling or testing that yet. Putting better naming aside a possible approach is:

type LabelSet struct {
  list []label // never used
  mapper LabelMapper
}

func LabelsFromMapper(lm LabelMapper) LabelSet {
  return LabelSet{mapper: lm}
}

func Labels(args ...string) LabelSet {
  return LabelSet{mapper: listMapper(args)}
}

func (l listMapper) Len() int { return len(l) }
func (l listMapper) Map(f func(key, value string)) {
  for i := 0; i+1 < len(l); i += 2 {
    f(l[i], l[i+1])
  }
}

to keep LabelSet struct backwards compatible. The inner workings of WithLabels can then be replaced with that of WithLabelsFromMapper from https://golang.org/cl/188499 using the mapper field of LabelSet.
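
To make the outline above compile on its own, here is a self-contained sketch. The LabelMapper interface shape and the label type are assumptions, since the actual definitions in https://golang.org/cl/188499 may differ, and collect is a hypothetical helper that stands in for what WithLabels would do internally.

```go
package main

import "fmt"

// label is an assumed shape for the existing, unexported label type.
type label struct{ key, value string }

// LabelMapper is an assumed interface shape; the real one may differ.
type LabelMapper interface {
	Len() int
	Map(f func(key, value string))
}

// listMapper adapts a flat key/value argument list to LabelMapper.
type listMapper []string

func (l listMapper) Len() int { return len(l) }
func (l listMapper) Map(f func(key, value string)) {
	for i := 0; i+1 < len(l); i += 2 {
		f(l[i], l[i+1])
	}
}

// LabelSet keeps the old field for compatibility and adds the mapper.
type LabelSet struct {
	list   []label // never used
	mapper LabelMapper
}

func LabelsFromMapper(lm LabelMapper) LabelSet { return LabelSet{mapper: lm} }

func Labels(args ...string) LabelSet { return LabelSet{mapper: listMapper(args)} }

// collect is a hypothetical helper that copies all labels into a fresh map.
func collect(ls LabelSet) map[string]string {
	m := make(map[string]string)
	ls.mapper.Map(func(k, v string) { m[k] = v })
	return m
}

func main() {
	fmt.Println(collect(Labels("handler", "index", "method", "GET")))
}
```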

martisch

comment created time in 4 months

issue closedgolang/go

cmd/compile: prefer AND instead of SHR+SHL

In cl/19485 we added a generic SSA rule to replace an AND with specific constants by two SHIFT instructions.

While this optimization does avoid a load of the constant to be ANDed into an extra register and has a shorter encoding on amd64 it does use two data dependent instructions. There was already some discussion on the CL after accidental early submission that micro benchmarks do not show using two shifts to be faster. Some seem to show it can be slower e.g. x &= 1 << 63. I benchmarked math.Copysign and it showed the same throughput using either variant. LLVM and GCC do not seem to replace AND with SHIFTs on amd64 either and some platforms like arm64 in the go gc compiler even reverse this rewrite.

https://go.googlesource.com/go/+/06f709a04acc6d5ba0ba181129e9ee93ed20f311/src/cmd/compile/internal/ssa/gen/ARM64.rules#1866

There has also been some unwanted interaction with other optimizations rules e.g. #32781.

The initial added rules from cl/19485 operated on And64 and And32. The And32 case was already removed in cl/20410 .

Removing the And64 case too can make binaries slightly larger, but in common cases where there is no register pressure it should be as fast as or faster than two SHIFTs on modern amd64 CPUs, and it should create less interference with other SSA rules, which then need not consider the additional case of an AND mask having been rewritten to SHIFTs.

I intend to send CLs for evaluation and submission to remove the generic rule to rewrite AND into a pair of SHIFTs for go1.14 and follow up with some CLs that avoid regressing on interaction with other rules that are based on optimizing SHIFTs instead of ANDs. For example the AND instruction in (u64>>32)&0xFFFFFFFF should still be optimized away by SSA.
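
For readers less familiar with the rewrite, a small sketch of the equivalences involved. The function names are made up for illustration, and which instructions the compiler actually emits for each depends on the Go version and target.

```go
package main

import "fmt"

// signBitAnd isolates the top bit with a single AND.
func signBitAnd(x uint64) uint64 { return x & (1 << 63) }

// signBitShifts isolates the top bit with two dependent shifts.
func signBitShifts(x uint64) uint64 { return (x >> 63) << 63 }

// highWordMasked applies a mask that is redundant after the shift, because
// the upper 32 bits of x>>32 are already zero.
func highWordMasked(x uint64) uint64 { return (x >> 32) & 0xFFFFFFFF }

// highWord is the same computation without the mask.
func highWord(x uint64) uint64 { return x >> 32 }

func main() {
	x := uint64(0xDEADBEEFCAFEF00D)
	fmt.Println(signBitAnd(x) == signBitShifts(x)) // true
	fmt.Println(highWordMasked(x) == highWord(x))  // true
}
```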

This issue is to document the related CLs and discuss this (de)optimization and whether any go gc supported 64bit platforms should keep rewriting some ANDs to two SHIFTs.

/cc @klauspost @cherrymui @khr @dr2chase @laboger @mundaym

closed time in 4 months

martisch

issue commentgolang/go

cmd/compile: prefer AND instead of SHR+SHL

The feature is in at the 3rd try and AFAIK has not caused any new breakages since then.

There are some cleanups that can be done, e.g. adding codegen tests so that the old shift-based rules are exercised and tested, or just removing the old rules. However, I likely won't get to that and to testing them across the different architectures before go1.14 due to other feature CLs. Therefore closing this issue, as the desired code generation itself is in for go1.14.

martisch

comment created time in 4 months

issue commentgolang/go

cmd/compile: skip slicebytetostring when argument doesn't escape

Small fixed-length slices that do not escape are allocated on the stack. I think this has been the case for a while. Having a variable-length byte slice and then a string conversion, as far as I know, still causes two allocations if the resulting string escapes, even if there are no other references to the byte slice.

bradfitz

comment created time in 5 months

issue commentgolang/go

internal/reflectlite: Implements erroneously reports that map types do not implement interfaces

@cuonglm got to it before I could finish writing tests locally and verify this fixes issues seen. Can we also add tests to https://golang.org/cl/197559 to prevent this from happening again?

proglottis

comment created time in 5 months

issue commentgolang/go

internal/reflectlite: Implements erroneously reports that map types do not implement interfaces

One thing that I noticed while looking at reflectlite is that we forgot to update it (like we did for reflect) in https://golang.org/cl//191198, which likely causes mismatches between the runtime and reflectlite.

proglottis

comment created time in 5 months

issue commentgolang/go

cmd/compile: inline functions that are called only once

Do you have some concrete examples where this is useful?

I assume called once does mean only referenced once with a call? Whether a function is called once might only be found out at runtime.

There are possible bad interactions of always inlining that come to my mind:

  • it affects the midstack inlining budget of the function it is inlined into
  • it's a function with a lot of code and a slow path that was explicitly split out by the programmer to separate cold from hot code
  • it increases register pressure and makes more work for the compiler to figure out what is needed when
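
As a hypothetical example of the second point, here is a sketch in which the slow path is deliberately kept out of line even though it has a single call site. The function names get and outOfRange are made up for illustration.

```go
package main

import "fmt"

// get is the hot path: a bounds check and a load.
func get(buf []byte, i int) (byte, error) {
	if i < 0 || i >= len(buf) {
		return 0, outOfRange(i, len(buf))
	}
	return buf[i], nil
}

// outOfRange is the cold path. It is called from exactly one place, but the
// directive below keeps it out of line so the hot path stays small.
//
//go:noinline
func outOfRange(i, n int) error {
	return fmt.Errorf("index %d out of range [0, %d)", i, n)
}

func main() {
	v, err := get([]byte{7, 8, 9}, 1)
	fmt.Println(v, err)
	_, err = get([]byte{7, 8, 9}, 5)
	fmt.Println(err)
}
```
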
marigonzes

comment created time in 5 months

issue commentgolang/go

x/build: add "slowbots" support

Related:

  • https://github.com/golang/go/issues/29239 x/build: trybots should include all platforms that can contribute release-blockers
bradfitz

comment created time in 5 months

issue commentgolang/go

cmd/compile: internal compiler error: checkIfRange on ppc64le builders

@laboger. I do not currently have a ppc64le machine setup to test or debug this. Could you please have a look if there is an easy fix forward?

As this only happens in race mode and on ppc64le (not ppc64), it seems to be a special/local problem. Maybe since cl/194297 can introduce the use of an extra register to load a large constant where it previously did not, this increased the register pressure above a threshold when compiling checkIfRange in race mode. If that is the case, could registers be spilled to resolve the problem? It seems other changes could run into this problem too if it is not resolved generally.

https://github.com/golang/go/blob/7ed973b4d9dab38347f34e87febf3c8659160ce6/src/net/http/fs.go#L447

ALTree

comment created time in 5 months

issue commentgolang/go

strings: strings.Builder with sync.Pool case memory leak

Closing as this looks to be resolved and working as intended.

szyhf

comment created time in 5 months

issue closedgolang/go

strings: strings.Builder with sync.Pool case memory leak

<!-- Please answer these questions before submitting your issue. Thanks! -->

What version of Go are you using (go version)?

Tried both 1.13.0 and 1.12.9.

<pre> $ go version

</pre>

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

In docker image alpine:3.8, build cmd is GOOS=linux GOARCH=amd64 go build on macOS 10.14.6

<details><summary><code>go env</code> Output</summary><br><pre> $ go env

</pre></details>

What did you do?

<!-- If possible, provide a recipe for reproducing the error. A complete runnable program is good. A link on play.golang.org is best. -->

Using a sync.Pool to reuse strings.Builder.

https://play.golang.org/p/TkWbxGu4PRJ

I use strings.Builder to build SQL and execute the SQL string every time, but I found that memory increases slowly. I then called pprof/heap and found that strings.(*Builder).WriteString has a lot of memory in use that never seems to be released.
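
A minimal sketch of the pattern described, not the reporter's actual code (that is in the playground link above). The names builderPool and buildQuery are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// builderPool hands out reusable *strings.Builder values.
var builderPool = sync.Pool{
	New: func() interface{} { return new(strings.Builder) },
}

// buildQuery is a hypothetical helper that builds a query string using a
// pooled builder.
func buildQuery(table string) string {
	b := builderPool.Get().(*strings.Builder)
	b.Reset() // start from an empty builder before writing
	b.WriteString("SELECT * FROM ")
	b.WriteString(table)
	s := b.String()
	builderPool.Put(b)
	return s
}

func main() {
	fmt.Println(buildQuery("users"))
	fmt.Println(buildQuery("orders"))
}
```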

What did you expect to see?

The memory not increase unexpectedly

What did you see instead?

The memory not increase unexpectedly

Really thanks for help!

closed time in 5 months

szyhf

issue commentgolang/go

fmt: priority of error and Stringer when handling non-simple types

I think this is very similar to the previously discussed but closed https://github.com/golang/go/issues/25707.

I think this is working as documented currently (https://golang.org/pkg/fmt/) and we would not be able to change it without breaking backwards compatibility.

4. If an operand implements the error interface, the Error method will be invoked to convert the object to a string, which will then be formatted as required by the verb (if any).

5. If an operand implements method String() string, that method will be invoked to convert the object to a string, which will then be formatted as required by the verb (if any).
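
A minimal sketch of the documented order. The type both and the helper describe are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// both is a hypothetical type implementing error and fmt.Stringer.
type both struct{}

func (both) Error() string  { return "from Error" }
func (both) String() string { return "from String" }

// describe formats v with the default verb.
func describe(v interface{}) string { return fmt.Sprint(v) }

func main() {
	fmt.Println(describe(both{})) // from Error
}
```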

If it's agreed to be working as documented for fmt, do you want to rewrite your report as a change proposal focusing on why the new way would be better and worth potentially breaking existing code that would then need to be rewritten?

mfridman

comment created time in 5 months

issue openedgolang/go

js/wasm: codegen tests broken with commit 9c384cc

After 9c384cc, running test$ ../bin/go run run.go -all_codegen -v codegen with go tip fails for me locally on two different computers.

The build dashboard does not show a breakage: Are codegen tests run on js/wasm builders?

cc @agnivade @bradfitz

created time in 5 months

issue commentgolang/go

x/build: run full codegen testsuite (-all_codegen) on amd64 trybots

I think we can add 9c384cc (@agnivade) to the list of CLs breaking codegen tests and not being caught by trybots. Although the builder did not fail, `test$ ./bin/go run run.go -all_codegen -v codegen` fails for me locally.

rasky

comment created time in 5 months

issue commentgolang/go

proposal: runtime/pprof: add new WithLabels* function that requires fewer allocations

ping to experts. Would be nice if this could be resolved before go1.14 enters the freeze period. /cc @hyangah @pjweinbgo @matloob

martisch

comment created time in 5 months

issue commentgolang/go

fmt: converts numeric type to string before rendering characters as hex

That fmt calls the String method first for a type implementing Stringer is working as intended and documented. This has also come up and been discussed in other reports, e.g. https://github.com/golang/go/issues/21535.
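To illustrate the behavior in question, here is a minimal sketch with a hypothetical numeric type `level` that implements Stringer. With %x, fmt hex-encodes the bytes of the String result, not the underlying number; converting to a plain int bypasses the method.

```go
package main

import "fmt"

// level is a hypothetical numeric type with a String method.
type level int

func (level) String() string { return "ten" }

// hexViaStringer formats the value itself; fmt calls String first and
// then hex-encodes the resulting bytes.
func hexViaStringer() string { return fmt.Sprintf("%x", level(10)) }

// hexOfInt converts to int first, so no String method is involved.
func hexOfInt() string { return fmt.Sprintf("%x", int(level(10))) }

func main() {
	fmt.Println(hexViaStringer()) // prints "74656e" (hex of "ten")
	fmt.Println(hexOfInt())       // prints "a"
}
```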

wti

comment created time in 5 months

issue commentgolang/go

x/sys/cpu: respect CPU feature overrides specified in GODEBUG

I was thinking about initialization function problems with different major versions. I guess there is no problem because runtime only needs to initialize the major version that it itself uses.

Lekensteyn

comment created time in 5 months

issue commentgolang/go

x/sys/cpu: respect CPU feature overrides specified in GODEBUG

Would there be a problem later with the outlined procedure when user code or std lib needs to initialize two different major versions of x/sys/cpu?

Lekensteyn

comment created time in 5 months

issue commentgolang/go

proposal: runtime/pprof: add new WithLabels* function that requires fewer allocations

They still get copied into the context in some form, though, so you've cut the allocations by at most 50%, not 100%.

I do not think the proposal claims anywhere to reduce allocations to zero. At least one allocation is needed when keeping the internal map, as the tags need to be copied in and owned by the runtime so they can no longer be altered from the outside.

Is there a simpler or cleaner API? Is Len really necessary?

Len allows pre-sizing the internally created map so that the initial allocation usually holds all items from the parent and child context, as another optimisation. Profiling shows some map growth in this function, which seems to account for ~30% of the time in WithLabels.

for _, label := range labels.list {
	childLabels[label.key] = label.value
}
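
The pre-sizing idea can be sketched roughly as follows. This is not the runtime/pprof implementation; `label` and `mergeLabels` are hypothetical names, and the only point is the capacity hint passed to make.

```go
package main

import "fmt"

// label is a hypothetical key/value pair.
type label struct{ key, value string }

// mergeLabels copies the parent labels and the new labels into one map.
// The capacity hint covers both inputs, so the copy loops should not
// trigger map growth.
func mergeLabels(parent map[string]string, list []label) map[string]string {
	child := make(map[string]string, len(parent)+len(list))
	for k, v := range parent {
		child[k] = v
	}
	for _, l := range list {
		child[l.key] = l.value
	}
	return child
}

func main() {
	parent := map[string]string{"service": "api"}
	child := mergeLabels(parent, []label{{"handler", "login"}})
	fmt.Println(len(child), child["service"], child["handler"])
}
```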
martisch

comment created time in 5 months

issue commentgolang/go

cmd/compile: detect and optimize slice insertion idiom append(sa, append(sb, sc...)...)

Careful with newcap := len(s1) + len(s2) + len(s3) + ...: this can overflow and wrap around, making the resulting length too small. The adds need to be done one by one, with overflow checked each time. If a problem is encountered along the way, the appends that were valid up to that point still need to be applied so there is no difference in semantics.
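
A rough sketch of adding lengths one at a time with an overflow check on each step. This is ordinary Go, not compiler code; `totalLen` and `maxInt` are names chosen for the example, and the inputs are assumed to be non-negative, as slice lengths are.

```go
package main

import "fmt"

// maxInt is the largest value of type int on the current platform.
const maxInt = int(^uint(0) >> 1)

// totalLen adds non-negative lengths one by one. It stops at the first
// addition that would overflow and reports the sum accumulated so far
// together with false.
func totalLen(lens ...int) (int, bool) {
	total := 0
	for _, n := range lens {
		if n > maxInt-total {
			return total, false // adding n would wrap around
		}
		total += n
	}
	return total, true
}

func main() {
	fmt.Println(totalLen(1, 2, 3))
	fmt.Println(totalLen(maxInt, 1))
}
```

Returning the partial sum mirrors the point above: the lengths that were still valid before the overflow are known, so the corresponding appends can still be performed.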

go101

comment created time in 5 months

issue commentgolang/go

x/sys/cpu: respect CPU feature overrides specified in GODEBUG

We've tried to stop adding new packages to std, preferring everything to be public where it can have its own release cadence. (e.g. fix bugs today, don't wait 6 months.... https://golang.org/doc/faq#x_in_std)

I am aware of that, but I think that point loses some of its appeal when we hide the coupling with runtime internals under the hood with build tags and unsafe. We have also lately exposed other std lib/runtime functions in new std lib packages that are coupled to the runtime, like bytes/hash.

Lekensteyn

comment created time in 5 months

issue commentgolang/go

x/sys/cpu: respect CPU feature overrides specified in GODEBUG

Other complications for why runtime does not use x/sys/cpu:

  • some constructs like maps can not be used anywhere as those parts of the runtime are not initialized yet
  • internal/cpu supports more architectures/platforms as it has access to non-public runtime internals, like auxv information to derive hwcap bits, which x/sys/cpu cannot use directly unless we expose an API or update the unsafe coupling with every related runtime change.

Let me ask the other way around: why can't we just make internal/cpu public as package cpu in the standard lib?

Lekensteyn

comment created time in 5 months

issue commentgolang/go

cmd/compile: prefer AND instead of SHR+SHL

The first attempt broke codegen tests in arm64 (bfxil) and s390x (abs and copysign). https://build.golang.org/log/a47a1c98bac2aa509d570d6bf583609857b1be5a https://build.golang.org/log/6456da8a84cc82e7407981287f211caaab41c7eb

Will make sure to run codegen tests for all platforms for the new version of the CL.

For a new version of the CL I added abs and copysign rules that detect the new AND variant; I am still working on the arm64 bitfield adjustment.
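
For context, the equivalence behind preferring AND over a shift pair can be sketched in plain Go. The function names are made up, and the shift amount 3 is just an example; this shows the arithmetic identity, not the SSA rules from the CL.

```go
package main

import "fmt"

// clearLowShift clears the low 3 bits with a right shift followed by
// a left shift.
func clearLowShift(x uint64) uint64 { return x >> 3 << 3 }

// clearLowAnd clears the same bits with a single AND NOT of the mask.
func clearLowAnd(x uint64) uint64 { return x &^ 7 }

func main() {
	for _, x := range []uint64{0, 7, 8, 0xFF, ^uint64(0)} {
		fmt.Printf("%#x %#x %#x\n", x, clearLowShift(x), clearLowAnd(x))
	}
}
```

Architecture-specific rules that pattern-match on the shift pair, such as the abs and copysign detection mentioned above, no longer fire once the generic rewrite produces the AND form, which is why they need to be updated alongside it.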

martisch

comment created time in 5 months

issue closedgolang/go

compile: s390x copysign and abs codegen tests broken

CL https://go-review.googlesource.com/c/go/+/191780 preferring AND over shifts broke the s390x codegen math tests.

I assume the copysign and abs detection introduced in CL https://go-review.googlesource.com/c/go/+/73950/ does not apply anymore.

The ssa optimization rules for s390x will need to be adjusted to detect the new patterns.

cc @mundaym @randall77

closed time in 5 months

martisch

issue commentgolang/go

compile: s390x copysign and abs codegen tests broken

Rolling back in https://go-review.googlesource.com/c/go/+/193850 as it also broke arm64. Will follow up on #33826.

martisch

comment created time in 5 months

issue openedgolang/go

compile: s390x copysign and abs codegen tests broken

CL https://go-review.googlesource.com/c/go/+/191780 preferring AND over shifts broke the s390x codegen math tests.

I assume the copysign and abs detection introduced in CL https://go-review.googlesource.com/c/go/+/73950/ does not apply anymore.

The ssa optimization rules for s390x will need to be adjusted to detect the new patterns.

cc @mundaym @randall77

created time in 5 months

issue commentgolang/go

x/sys/cpu: respect CPU feature overrides specified in GODEBUG

I think we should port over all CPU feature masking options that internal/cpu supports: both individual feature settings and masking of all features. This will also help packages imported into the standard library that use x/sys/cpu (see #32102) stay aligned with the internal/cpu uses.
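
As a rough illustration of what such options look like, the sketch below parses GODEBUG-style entries of the form cpu.<feature>=on|off, with cpu.all=off standing for masking all features. `parseCPUOptions` is a hypothetical helper written for this example; it is not the parsing code of internal/cpu or x/sys/cpu.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCPUOptions extracts "cpu.<feature>=on|off" entries from a
// GODEBUG-style comma-separated string. Entries without the "cpu."
// prefix or with any other value are ignored.
func parseCPUOptions(godebug string) map[string]bool {
	opts := make(map[string]bool)
	for _, field := range strings.Split(godebug, ",") {
		if !strings.HasPrefix(field, "cpu.") {
			continue
		}
		kv := strings.SplitN(strings.TrimPrefix(field, "cpu."), "=", 2)
		if len(kv) != 2 {
			continue
		}
		switch kv[1] {
		case "on":
			opts[kv[0]] = true
		case "off":
			opts[kv[0]] = false
		}
	}
	return opts
}

func main() {
	opts := parseCPUOptions("gctrace=1,cpu.avx2=off,cpu.sse41=on")
	fmt.Println(opts)
}
```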

Lekensteyn

comment created time in 5 months
