The Go programming language
Go to file
Russ Cox 468cf82780 liblink: fix incorrect hash collision in lookup
linklookup uses hash(name, v) as the hash table index but then
only compares name to find a symbol to return.
If hash(name, v1) == hash(name, v2) for v1 != v2, the lookup
for v2 will return the symbol with v1.

The input routines assume that each symbol is found only once,
and then each symbol is added to a linked list, with the list header
in the symbol. Adding a symbol to such a list multiple times
short-circuits the list the second time it is added, causing symbols
to be dropped.

The liblink rewrite introduced an elegant, if inefficient, handling
of duplicated symbols by creating a dummy symbol to read the
duplicate into. The dummy symbols are named .dup with
sequential version numbers. With many .dup symbols, eventually
there will be a conflict, causing a duplicate list add, causing elided
symbols, causing a crash when calling one of the elided symbols.

The bug is old (2011) but could not have manifested until the
liblink rewrite introduced this heavily duplicated symbol .dup.
(See History section below.)

1. Correct the lookup function.

2. Since we want all the .dup symbols to be different, there's no
point in inserting them into the table. Call linknewsym directly,
avoiding the lookup function entirely.

3. Since nothing can refer to the .dup symbols, do not bother
adding them to the list of functions (textp) at all.

4. In lieu of a unit test, introduce additional consistency checks to
detect adding a symbol to a list multiple times. This would have
caught the short-circuit more directly, and it will detect a variety
of double-use bugs, including the one arising from the bad lookup.

Fixes #7749.

History

On April 9, 2011, I submitted CL 4383047, making ld 25% faster.
Much of the focus was on the hash table lookup function, and
one of the changes was to remove the s->version == v comparison [1].

I don't know if this was a simple editing error or if I reasoned that
same name but different v would yield a different hash slot and
so the name test alone sufficed. It is tempting to claim the former,
but it was probably the latter.

Because the hash is an iterated multiply+add, the version ends up
adding v*3ⁿ to the hash, where n is the length of the name.
A collision would need x*3ⁿ ≡ y*3ⁿ (mod 2²⁴ mod 100003),
or equivalently x*3ⁿ ≡ x*3ⁿ + (y-x)*3ⁿ (mod 2²⁴ mod 100003),
so collisions will actually be periodic: versions x and y collide
when d = y-x satisfies d*3ⁿ ≡ 0 (mod 2²⁴ mod 100003).
Since we allocate version numbers sequentially, this is actually
about the best case one could imagine: the collision rate is
much lower than if the hash were more random.
http://play.golang.org/p/TScD41c_hA computes the collision
period for various name lengths.

The most common symbol in the new linker is .dup, and for n=4
the period is maximized: the 100004th symbol is the first collision.
Unfortunately, there are programs with more duplicated symbols
than that.

In Go 1.2 and before, duplicate symbols were handled without
creating a dummy symbol, so this particular case for generating
many duplicate symbols could not happen. Go does not use
versioned symbols. Only C does; each input file gives a different
version to its static declarations. There just aren't enough C files
for this to come up in that context.

So the bug is old but the realization of the bug is new.

[1] https://golang.org/cl/4383047/diff/5001/src/cmd/ld/lib.c

LGTM=minux.ma, iant, dave
R=golang-codereviews, minux.ma, bradfitz, iant, dave
CC=golang-codereviews, r
https://golang.org/cl/87910047
2014-04-16 11:53:14 -04:00
api api: update next.txt 2014-04-01 13:14:45 -04:00
doc doc/debugging_with_gdb: use -w to strip debug info. 2014-04-16 01:19:26 -04:00
include liblink: fix incorrect hash collision in lookup 2014-04-16 11:53:14 -04:00
lib codereview: remove unused upload_options.revision 2014-02-24 10:11:37 -05:00
misc misc/emacs: ignore backquote in comment or string 2014-04-09 12:28:27 -04:00
src liblink: fix incorrect hash collision in lookup 2014-04-16 11:53:14 -04:00
test undo CL 66510044 / 6c0339d94123 2014-04-14 09:48:11 -04:00
.hgignore lib9: enable on Plan 9 2014-02-13 20:06:41 +01:00
.hgtags tag go1.2.1 2014-03-03 13:22:13 +11:00
AUTHORS A+C: Billie Harold Cleek (individual CLA) 2014-04-16 13:39:51 +10:00
CONTRIBUTORS A+C: Billie Harold Cleek (individual CLA) 2014-04-16 13:39:51 +10:00
favicon.ico godoc: update favicon 2012-10-11 17:02:36 +11:00
LICENSE doc: update licensing text one more time 2012-03-27 15:09:13 +11:00
PATENTS LICENSE: separate, change PATENTS text 2010-12-06 16:31:59 -05:00
README README: Fix installation instructions 2013-11-20 13:47:37 -08:00
robots.txt godoc: serve robots.txt raw 2011-02-19 05:46:20 +11:00

This is the source code repository for the Go programming language.  

For documentation about how to install and use Go,
visit http://golang.org/ or load doc/install-source.html
in your web browser.

After installing Go, you can view a nicely formatted
doc/install-source.html by running godoc --http=:6060
and then visiting http://localhost:6060/doc/install/source.

Unless otherwise noted, the Go source files are distributed
under the BSD-style license found in the LICENSE file.

--

Binary Distribution Notes

If you have just untarred a binary Go distribution, you need to set
the environment variable $GOROOT to the full path of the go
directory (the one containing this README).  You can omit the
variable if you unpack it into /usr/local/go, or if you rebuild
from sources by running all.bash (see doc/install.html).
You should also add the Go binary directory $GOROOT/bin
to your shell's path.

For example, if you extracted the tar file into $HOME/go, you might
put the following in your .profile:

    export GOROOT=$HOME/go
    export PATH=$PATH:$GOROOT/bin

See doc/install.html for more details.