Ask questionsruntime: memory corruption on Linux 5.2+

We've had several reports of memory corruption on Linux 5.3.x (or later) kernels from people running tip since asynchronous preemption was committed. This is a super-bug to track these issues. I suspect they all have one root cause.

Typically these are "runtime error: invalid memory address or nil pointer dereference" or "runtime: unexpected return pc" or "segmentation violation" panics. They can also appear as self-detected data corruption.

If you encounter a crash that could be random memory corruption, are running Linux 5.3.x or later, and are running a recent tip Go (after commit 62e53b79227dafc6afcd92240c89acb8c0e1dd56), please file a new issue and add a comment here. If you can reproduce it, please try setting "GODEBUG=asyncpreemptoff=1" in your environment and seeing if you can still reproduce it.

Duplicate issues (I'll edit this comment to keep this up-to-date):

runtime: corrupt binary export data seen after signal preemption CL (#35326): Corruption in file version header observed by vet. Medium reproducible. Strong leads.

cmd/compile: panic during early copyelim crash (#35658): Invalid memory address in cmd/compile/internal/ssa.copyelim. Not reproducible. Nothing obvious in stack trace. Haven't dug into assembly.

runtime: SIGSEGV in mapassign_fast64 during cmd/vet (#35689): Invalid memory address in runtime.mapassign_fast64 in vet. Stack trace includes random pointers. Some assembly decoding work.

runtime: unexpected return pc for runtime.(*mheap).alloc (#35328): Unexpected return pc. Stack trace includes random pointers. Not reproducible.

cmd/dist: I/O error: read src/xxx.go: is a directory (#35776): Random misbehavior. Not reproducible.

runtime: "fatal error: mSpanList.insertBack" in mallocgc (#35771): Bad mspan next pointer (random and unaligned). Not reproducible.

cmd/compile: invalid memory address or nil pointer dereference in gc.convlit1 (#35621): Invalid memory address in cmd/compile/internal/gc.convlit1. Evidence of memory corruption, though no obvious random pointers. Not reproducible.

cmd/go: unexpected signal during runtime execution (#35783): Corruption in file version header observed by vet. Not reproducible.

runtime: unexpected return pc for runtime.systemstack_switch (#35592): Unexpected return pc. Stack trace includes random pointers. Not reproducible.

cmd/compile: random compile error running tests (#35760): Compiler data corruption. Not reproducible.


Answer questions dr2chase

Prefer touching the signal stack because it reduces the need to explain things to users, though I hope we're talking about a small number of people anyway (latency-sensitive Go users on latest-N-greatest Linux for the next few months). And, also, it will be trivial to remove if/when we get around to doing that.


Related questions

cmd/link: segmentation fault during mach-o linking
cmd/go: cannot find module providing package error stops `go get` processing hot 3
vendor/ undefined: errors.Frame ... hot 2
cmd/vet: potential false positive in the "suspect or" check hot 2
cmd/go: needs a better error than "missing dot in first path element" when GOROOT is set incorrectly hot 2
x/xerrors: fails to compile on tip hot 1
cmd/go: `go clean <package>` downloads modules hot 1
cmd/cgo error: runtime: unknown pc 0x7fff5c805b86 hot 1
runtime: crash with "invalid pc-encoded table" hot 1
cmd/link: showing many ld warnings of "building for macOS, but linking in object file" hot 1
runtime: go program crach, it seems fall into infinite loop hot 1
cmd/go: major version without preceding tag must be v0, not v1 - breaks build of hot 1
gollvm: Using External Go Packages with gollvm hot 1
runtime: macOS Sierra builders spinning hot 1
cmd/go: Problem using go modules hot 1
Github User Rank List