GoLang字符串比较(二)

2020/09/19 golang 共 7940 字,约 23 分钟

pic

1. 写在前面

微信公众号:[double12gzh]

关注容器技术、关注Kubernetes。问题或建议,请公众号留言。

在上一篇文章中,我们介绍了GoLang中字符串不同的比较方法,同时也使用一种比较简单粗暴的方法来一起了下不同方法的执行时间。

在本文中,我们还是会针对前面提到的不同的比较方法,这次,我们使用benchmark的方式来看一下,不同的比较方法的效率。

2. Benchmark测试

2.1 准备测试代码

hello/utils/str/compare.go

package str

import (
	"main/utils/reader"
	"strings"
)

// BasicCompare 区分大小写。使用==进行比较
func BasicCompare(src, dest string) bool {
	if len(src) != len(dest) {
		return false
	}

	if src == dest {
		return true
	} else {
		return false
	}
}

// CompareCompare 区分大小写。使用strings包中的Compare进行比较
func CompareCompare(src, dest string) bool {
	if len(src) != len(dest) {
		return false
	}

	if strings.Compare(src, dest) == 0 {
		return true
	} else {
		return false
	}
}

// CaseInSensitiveBasicCompare 不区分大小写。使用==进行比较
func CaseInSensitiveBasicCompare(src, dest string) bool {
	if len(src) != len(dest) {
		return false
	}

	if strings.ToLower(src) == strings.ToLower(dest) {
		return true
	} else {
		return false
	}
}

// CaseInSensitiveCompareCompare 不区分大小写。使用Compare进行比较
func CaseInSensitiveCompareCompare(src, dest string) bool {
	if len(src) != len(dest) {
		return false
	}

	if strings.Compare(strings.ToLower(src), strings.ToLower(dest)) == 0 {
		return true
	} else {
		return false
	}
}

// CaseInSensitiveEqualFoldCompare 不区分大小写。使用EqualFold进行比较
func CaseInSensitiveEqualFoldCompare(a string, b string) bool {
	if len(a) != len(b) {
		return false
	}

	if strings.EqualFold(a, b) {
		return true
	} else {
		return false
	}
}

hello/utils/str/compare_test.go

package str

import (
	"testing"
)

func BenchmarkBasicCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		BasicCompare("you are dog sun, sb wu", "you are dog sun, sb wU")
	}
}

func BenchmarkCompareCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		CompareCompare("you are dog sun, sb wu", "you are dog sun, sb wU")
	}
}

func BenchmarkCaseInSensitiveBasicCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		CaseInSensitiveBasicCompare("you are dog sun, sb wu", "you are dog sun, sb wU")
	}
}

func BenchmarkCaseInSensitiveCompareCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		CaseInSensitiveCompareCompare("you are dog sun, sb wu", "you are dog sun, sb wU")
	}
}

func BenchmarkCaseInSensitiveEqualFoldCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		CaseInSensitiveEqualFoldCompare("you are dog sun, sb wu", "you are dog sun, sb wU")
	}
}

2.2 执行benchmark测试

jeffrey@DESKTOP-41NV243 MINGW64 ~/Desktop/hello/utils/str
$ go test -bench=.
goos: windows
goarch: amd64
pkg: main/utils/str
BenchmarkBasicCompare-12                        379284132                3.13 ns/op
BenchmarkCompareCompare-12                      190776010                6.01 ns/op
BenchmarkCaseInSensitiveBasicCompare-12         12153535               105 ns/op
BenchmarkCaseInSensitiveCompareCompare-12       11950690               100 ns/op
BenchmarkCaseInSensitiveEqualFoldCompare-12     33408315                35.6 ns/op
PASS
ok      main/utils/str  7.224s

2.3 结论

根据上面的压测结果,我们可以很容易的得出以下两点结论:

  • 区大小写的情况,使用==比使用strings.Compare效率更高
  • 不区分大小写的情况,使用EqualFold效率更高

3. 其它测试

通过使用上一节产生的数据样本,我们进行本节的测试,来看一下不同的比较的方法的执行效率。很明显,这样测试时,得出的执行时间肯定是会受到文件读取时间的影响。

在本节点的测试中,其思路如下:

  • 打开文件
  • 一行一行获取文本并进行比较

3.1 前提条件

  • 有一个含有200000行的样本数据
  • 目标字符串与样本最后一行的内容完全一样

3.2 优点

  • 更加切合生产场景
  • 可以使用更多的字符串实现不同场景的比较

3.3 测试代码

hello/utils/str/compare.go

// ReadAndBasicCompare 区分大小写。读取文件然后使用==进行比较
func ReadAndBasicCompare(fileName, dest string) bool {
	return reader.ReadFileAndCompare(fileName, dest, BasicCompare)
}

// ReadAndCompareCompare 区分大小写。读取文件然后使用strings.Compare进行比较
func ReadAndCompareCompare(fileName, dest string) bool {
	return reader.ReadFileAndCompare(fileName, dest, CompareCompare)
}

// ReadAndCaseInSensitiveBasicCompare 不区分大小写。读取文件然后使用==进行比较
func ReadAndCaseInSensitiveBasicCompare(fileName, dest string) bool {
	return reader.ReadFileAndCompare(fileName, dest, CaseInSensitiveBasicCompare)
}

// ReadAndCaseInSensitiveCompareCompare 不区分大小写。读取文件然后使用strings.Compare进行比较
func ReadAndCaseInSensitiveCompareCompare(fileName, dest string) bool {
	return reader.ReadFileAndCompare(fileName, dest, CaseInSensitiveCompareCompare)
}

// ReadAndCaseInSensitiveEqualFoldCompare 不区分大小写。读取文件然后使用EqualFold进行比较
func ReadAndCaseInSensitiveEqualFoldCompare(fileName, dest string) bool {
	return reader.ReadFileAndCompare(fileName, dest, CaseInSensitiveEqualFoldCompare)
}

hello/utils/reader/fileReader.go

func ReadFileAndCompare(fileName string, dest string, ComPareFunc func(a, b string) bool) bool {
	f, err := os.Open(fileName)

	if err != nil {
		panic(err)
	}

	scan := bufio.NewScanner(f)
	scan.Split(bufio.ScanLines)

	result := false

	for scan.Scan() {
		cmptString := scan.Text()
		if ComPareFunc(cmptString, dest) {
			result = true
		} else {
			result = false
		}
	}

	f.Close()

	return result
}

hello/utils/str/compare_test.go

func BenchmarkReadAndBasicCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ReadAndBasicCompare("../../mock_data.txt", "79rIt3yb8zp5EMRw")
	}
}

func BenchmarkReadAndCompareCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ReadAndCompareCompare("../../mock_data.txt", "79rIt3yb8zp5EMRw")
	}
}

func BenchmarkReadAndCaseInSensitiveBasicCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ReadAndCaseInSensitiveBasicCompare("../../mock_data.txt", "79rIt3yb8zp5EMRw")
	}
}

func BenchmarkReadAndCaseInSensitiveCompareCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ReadAndCaseInSensitiveCompareCompare("../../mock_data.txt", "79rIt3yb8zp5EMRw")
	}
}

func BenchmarkReadAndCaseInSensitiveEqualFoldCompare(b *testing.B) {
	for n := 0; n < b.N; n++ {
		ReadAndCaseInSensitiveEqualFoldCompare("../../mock_data.txt", "79rIt3yb8zp5EMRw")
	}
}

3.4 执行测试

jeffrey@hacker ~/Desktop/hello/utils/str go test -bench=.

goos: windows
goarch: amd64
pkg: main/utils/str
...
BenchmarkReadAndBasicCompare
BenchmarkReadAndBasicCompare-12                       	     128	   9153796 ns/op
BenchmarkReadAndCompareCompare
BenchmarkReadAndCompareCompare-12                     	     100	  10240239 ns/op
BenchmarkReadAndCaseInSensitiveBasicCompare
BenchmarkReadAndCaseInSensitiveBasicCompare-12        	      27	  41919633 ns/op
BenchmarkReadAndCaseInSensitiveCompareCompare
BenchmarkReadAndCaseInSensitiveCompareCompare-12      	      28	  43576511 ns/op
BenchmarkReadAndCaseInSensitiveEqualFoldCompare
BenchmarkReadAndCaseInSensitiveEqualFoldCompare-12    	     124	   9639406 ns/op
PASS

Process finished with exit code 0

3.5 结论

  • 区大小写的情况,使用==比使用strings.Compare效率更高
  • 不区分大小写的情况,使用EqualFold效率更高

4. 添加长度判断会提高效率吗

思路: 增加长度判断

在前面的所有代码中,我们一开始就是针对字符串进行完全比较,其实,在比较之前,我们可以先获取一下字符串的长度,如果长度不相等,那么这两个字符串肯定是不相等的。例如:

// CaseInSensitiveEqualFoldCompare 不区分大小写。使用EqualFold进行比较
func CaseInSensitiveEqualFoldCompare(a string, b string) bool {
	if len(a) != len(b) {
		return false
	}

	if strings.EqualFold(a, b) {
		return true
	} else {
		return false
	}
}

修改代码,加入长度判断,然后执行benchmark测试得到的结果如下:

goos: windows
goarch: amd64
pkg: main/utils/str
BenchmarkBasicCompare
BenchmarkBasicCompare-12                              	382882960	         3.14 ns/op
BenchmarkCompareCompare
BenchmarkCompareCompare-12                            	197837797	         6.19 ns/op
BenchmarkCaseInSensitiveBasicCompare
BenchmarkCaseInSensitiveBasicCompare-12               	11463978	       104 ns/op
BenchmarkCaseInSensitiveCompareCompare
BenchmarkCaseInSensitiveCompareCompare-12             	11909770	       101 ns/op
BenchmarkCaseInSensitiveEqualFoldCompare
BenchmarkCaseInSensitiveEqualFoldCompare-12           	32394259	        35.3 ns/op
BenchmarkReadAndBasicCompare
BenchmarkReadAndBasicCompare-12                       	     128	   9199173 ns/op
BenchmarkReadAndCompareCompare
BenchmarkReadAndCompareCompare-12                     	     100	  10155251 ns/op
BenchmarkReadAndCaseInSensitiveBasicCompare
BenchmarkReadAndCaseInSensitiveBasicCompare-12        	      27	  42963244 ns/op
BenchmarkReadAndCaseInSensitiveCompareCompare
BenchmarkReadAndCaseInSensitiveCompareCompare-12      	      27	  44257270 ns/op
BenchmarkReadAndCaseInSensitiveEqualFoldCompare
BenchmarkReadAndCaseInSensitiveEqualFoldCompare-12    	     122	   9650323 ns/op
PASS

不加长度判断时的benchmark结果:

goos: windows
goarch: amd64
pkg: main/utils/str
BenchmarkBasicCompare
BenchmarkBasicCompare-12                              	370923894	         3.11 ns/op
BenchmarkCompareCompare
BenchmarkCompareCompare-12                            	198499410	         6.12 ns/op
BenchmarkCaseInSensitiveBasicCompare
BenchmarkCaseInSensitiveBasicCompare-12               	12226632	       108 ns/op
BenchmarkCaseInSensitiveCompareCompare
BenchmarkCaseInSensitiveCompareCompare-12             	11799502	       101 ns/op
BenchmarkCaseInSensitiveEqualFoldCompare
BenchmarkCaseInSensitiveEqualFoldCompare-12           	33342316	        35.5 ns/op
BenchmarkReadAndBasicCompare
BenchmarkReadAndBasicCompare-12                       	     128	   9153796 ns/op
BenchmarkReadAndCompareCompare
BenchmarkReadAndCompareCompare-12                     	     100	  10240239 ns/op
BenchmarkReadAndCaseInSensitiveBasicCompare
BenchmarkReadAndCaseInSensitiveBasicCompare-12        	      27	  41919633 ns/op
BenchmarkReadAndCaseInSensitiveCompareCompare
BenchmarkReadAndCaseInSensitiveCompareCompare-12      	      28	  43576511 ns/op
BenchmarkReadAndCaseInSensitiveEqualFoldCompare
BenchmarkReadAndCaseInSensitiveEqualFoldCompare-12    	     124	   9639406 ns/op
PASS

结论: 增加长度判断并没有提高效率,反而还略有下降

5. 总结

  • 区大小写的情况,使用==效率更高
  • 不区分大小写的情况,使用EqualFold效率更高

Search

    Table of Contents