pprof

by 夏泽民 Mar 25, 2020

Go语言项目中的性能优化主要有以下几个方面：

CPU profile：报告程序的 CPU 使用情况，按照一定频率去采集应用程序在 CPU 和寄存器上面的数据
Memory Profile（Heap Profile）：报告程序的内存使用情况
Block Profiling：报告 goroutines 不在运行状态的情况，可以用来分析和查找死锁等性能瓶颈
Goroutine Profiling：报告 goroutines 的使用情况，有哪些 goroutine，它们的调用关系是怎样的

二、采集性能数据#
Go语言内置了获取程序的运行数据的工具，包括以下两个标准库：

runtime/pprof：采集工具型应用运行数据进行分析
net/http/pprof：采集服务型应用运行时数据进行分析
pprof开启后，每隔一段时间（10ms）就会收集下当前的堆栈信息，获取格格函数占用的CPU以及内存资源；最后通过对这些采样数据进行分析，形成一个性能分析报告。

注意，我们只应该在性能测试的时候才在代码中引入pprof。

三、工具型应用#
如果你的应用程序是运行一段时间就结束退出类型。那么最好的办法是在应用退出的时候把 profiling 的报告保存到文件中，进行分析。对于这种情况，可以使用runtime/pprof库。首先在代码中导入runtime/pprof工具：

import runtime/pprof

3.1 CPU性能分析#
开启CPU性能分析：

Copy
pprof.StartCPUProfile(w io.Writer)
停止CPU性能分析：

Copy
pprof.StopCPUProfile()
应用执行结束后，就会生成一个文件，保存了我们的 CPU profiling 数据。得到采样数据之后，使用go tool pprof工具进行CPU性能分析。

3.2 内存性能优化#
记录程序的堆栈信息

Copy
pprof.WriteHeapProfile(w io.Writer)
得到采样数据之后，使用go tool pprof工具进行内存性能分析。

go tool pprof默认是使用-inuse_space进行统计，还可以使用-inuse-objects查看分配对象的数量。

四、服务型应用#
如果你的应用程序是一直运行的，比如 web 应用，那么可以使用net/http/pprof库，它能够在提供 HTTP 服务进行分析。

如果使用了默认的http.DefaultServeMux（通常是代码直接使用 http.ListenAndServe(“0.0.0.0:8000”, nil)），只需要在你的web server端代码中按如下方式导入net/http/pprof

import net/http/pprof

如果你使用自定义的 Mux，则需要手动注册一些路由规则：

Copy
r.HandleFunc(/debug/pprof/, pprof.Index)
r.HandleFunc(/debug/pprof/cmdline, pprof.Cmdline)
r.HandleFunc(/debug/pprof/profile, pprof.Profile)
r.HandleFunc(/debug/pprof/symbol, pprof.Symbol)
r.HandleFunc(/debug/pprof/trace, pprof.Trace)
如果你使用的是gin框架，那么推荐使用”github.com/DeanThompson/ginpprof”。

这个路径下还有几个子页面：

/debug/pprof/profile：访问这个链接会自动进行 CPU profiling，持续 30s，并生成一个文件供下载
/debug/pprof/heap： Memory Profiling 的路径，访问这个链接会得到一个内存 Profiling 结果的文件
/debug/pprof/block：block Profiling 的路径
/debug/pprof/goroutines：运行的 goroutines 列表，以及调用关系
五、go tool pprof命令#
不管是工具型应用还是服务型应用，我们使用相应的pprof库获取数据之后，下一步的都要对这些数据进行分析，我们可以使用go tool pprof命令行工具。

go tool pprof最简单的使用方式为:

Copy
go tool pprof [binary] [source]
其中：

binary 是应用的二进制文件，用来解析各种符号；
source 表示 profile 数据的来源，可以是本地的文件，也可以是 http 地址。
注意事项：获取的 Profiling 数据是动态的，要想获得有效的数据，请保证应用处于较大的负载（比如正在生成中运行的服务，或者通过其他工具模拟访问压力）。否则如果应用处于空闲状态，得到的结果可能没有任何意义。

六、具体示例#
首先我们来写一段有问题的代码：
// runtime_pprof/main.go
package main

import (
“flag”
“fmt”
“os”
“runtime/pprof”
“time”
)

// 一段有问题的代码
func logicCode() {
var c chan int
for {
select {
case v := <-c:
fmt.Printf(“recv from chan, value:%v\n”, v)
default:

}
}
}

func main() {
var isCPUPprof bool
var isMemPprof bool

flag.BoolVar(&isCPUPprof, "cpu", false, "turn cpu pprof on")

flag.BoolVar(&isMemPprof, "mem", false, "turn mem pprof on")

flag.Parse()

if isCPUPprof {

	file, err := os.Create("./cpu.pprof")

	if err != nil {

		fmt.Printf("create cpu pprof failed, err:%v\n", err)

		return

	}

	pprof.StartCPUProfile(file)

	defer pprof.StopCPUProfile()

}

for i := 0; i < 8; i++ {

	go logicCode()

}

time.Sleep(20 * time.Second)

if isMemPprof {

	file, err := os.Create("./mem.pprof")

	if err != nil {

		fmt.Printf("create mem pprof failed, err:%v\n", err)

		return

	}

	pprof.WriteHeapProfile(file)

	file.Close()

} }

通过flag我们可以在命令行控制是否开启CPU和Mem的性能分析。将上面的代码保存并编译成runtime_pprof可执行文件，执行时加上-cpu命令行参数如下：
./runtime_pprof -cpu
等待30秒后会在当前目录下生成一个cpu.pprof文件。

6.1 命令行交互界面#
我们使用go工具链里的pprof来分析一下。
go tool pprof cpu.pprof
执行上面的代码会进入交互界面如下：
i$ go tool pprof cpu.pprof
Type: cpu
Time: Mar 25, 2020 at 11:22am (CST)
Duration: 20.22s, Total samples = 50.60s (250.29%)
Entering interactive mode (type “help” for commands, “o” for options)
(pprof)

我们可以在交互界面输入top3来查看程序中占用CPU前3位的函数：
(pprof) top3
Showing nodes accounting for 48.90s, 96.64% of 50.60s total
Dropped 2 nodes (cum <= 0.25s)
Showing top 3 nodes out of 4
flat flat% sum% cum cum%
19.53s 38.60% 38.60% 37.78s 74.66% runtime.selectnbrecv
16.56s 32.73% 71.32% 17.10s 33.79% runtime.chanrecv
12.81s 25.32% 96.64% 50.59s 100% main.logicCode
(pprof)

其中：

flat：当前函数占用CPU的耗时
flat：:当前函数占用CPU的耗时百分比
sun%：函数占用CPU的耗时累计百分比
cum：当前函数加上调用当前函数的函数占用CPU的总耗时
cum%：当前函数加上调用当前函数的函数占用CPU的总耗时百分比
最后一列：函数名称
在大多数的情况下，我们可以通过分析这五列得出一个应用程序的运行情况，并对程序进行优化。

我们还可以使用list 函数名命令查看具体的函数分析，例如执行list logicCode查看我们编写的函数的详细分析。
(pprof) list main.logicCode
Total: 50.60s
ROUTINE ======================== main.logicCode in /Users/didi/GitBook/xiazm/Import/gotool/src/pprof/exp1/main.go
12.81s 50.59s (flat, cum) 100% of Total
. . 12:// 一段有问题的代码
. . 13:func logicCode() {
. . 14: var c chan int
. . 15: for {
. . 16: select {
12.81s 50.59s 17: case v := <-c:
. . 18: fmt.Printf(“recv from chan, value:%v\n”, v)
. . 19:default:
. . 20:
. . 21:}
. . 22:}
(pprof)

通过分析发现大部分CPU资源被17行占用，我们分析出select语句中的default没有内容会导致上面的case v:=<-c:一直执行。我们在default分支添加一行time.Sleep(time.Second)即可。

6.2 图形化#
或者可以直接输入web，通过svg图的方式查看程序中详细的CPU占用情况。想要查看图形化的界面首先需要安装graphviz图形化工具。

Mac：
brew install graphviz
Windows: 下载graphviz 将graphviz安装目录下的bin文件夹添加到Path环境变量中。在终端输入dot -version查看是否安装成功。

关于图形的说明：每个框代表一个函数，理论上框的越大表示占用的CPU资源越多。方框之间的线条代表函数之间的调用关系。线条上的数字表示函数调用的次数。方框中的第一行数字表示当前函数占用CPU的百分比，第二行数字表示当前函数累计占用CPU的百分比。

这样程序运行的时候的cpu信息就会记录到XXX.prof中了。

下一步就可以使用这个prof信息做出性能分析图了（需要安装graphviz）。

使用go tool pprof (应用程序) （应用程序的prof文件）

进入到pprof，使用web命令就会在/tmp下生成svg文件，svg文件是可以在浏览器下看的。

七、go-torch和火焰图#
火焰图（Flame Graph）是 Bredan Gregg 创建的一种性能分析图表，因为它的样子近似 🔥而得名。上面的 profiling 结果也转换成火焰图，如果对火焰图比较了解可以手动来操作，不过这里我们要介绍一个工具：go-torch。这是 uber 开源的一个工具，可以直接读取 golang profiling 数据，并生成一个火焰图的 svg 文件。

7.1 安装go-touch#
Copy
go get -v github.com/uber/go-torch
火焰图 svg 文件可以通过浏览器打开，它对于调用图的最优点是它是动态的：可以通过点击每个方块来 zoom in 分析它上面的内容。

火焰图的调用顺序从下到上，每个方块代表一个函数，它上面一层表示这个函数会调用哪些函数，方块的大小代表了占用 CPU 使用的长短。火焰图的配色并没有特殊的意义，默认的红、黄配色是为了更像火焰而已。

go-torch 工具的使用非常简单，没有任何参数的话，它会尝试从http://localhost:8080/debug/pprof/profile获取 profiling 数据。它有三个常用的参数可以调整：

-u –url：要访问的 URL，这里只是主机和端口部分
-s –suffix：pprof profile 的路径，默认为 /debug/pprof/profile
–seconds：要执行 profiling 的时间长度，默认为 30s
7.2 安装 FlameGraph#
要生成火焰图，需要事先安装 FlameGraph工具，这个工具的安装很简单（需要perl环境支持），只要把对应的可执行文件加入到环境变量中即可。

下载安装perl：https://www.perl.org/get.html
下载FlameGraph：git clone https://github.com/brendangregg/FlameGraph.git
将FlameGraph目录加入到操作系统的环境变量中。
Windows平台的同学，需要把go-torch/render/flamegraph.go文件中的GenerateFlameGraph按如下方式修改，然后在go-torch目录下执行go install即可。
// GenerateFlameGraph runs the flamegraph script to generate a flame graph SVG. func GenerateFlameGraph(graphInput []byte, args …string) ([]byte, error) {
flameGraph := findInPath(flameGraphScripts)
if flameGraph == “” {
return nil, errNoPerlScript
}
if runtime.GOOS ==”windows” {
return runScript(“perl”, append([]string{flameGraph}, args…), graphInput)
}
return runScript(flameGraph, args, graphInput)
}
7.3 压测工具wrk#
推荐使用https://github.com/wg/wrk 或 https://github.com/adjust/go-wrk

7.4 使用go-torch#
使用wrk进行压测:go-wrk -n 50000 http://127.0.0.1:8080/book/list 在上面压测进行的同时，打开另一个终端执行go-torch -u http://127.0.0.1:8080 -t 30，30秒之后终端会初夏如下提示：Writing svg to torch.svg

然后我们使用浏览器打开torch.svg就能看到如下火焰图了。pprof3.png火焰图的y轴表示cpu调用方法的先后，x轴表示在每个采样调用时间内，方法所占的时间百分比，越宽代表占据cpu时间越多。通过火焰图我们就可以更清楚的找出耗时长的函数调用，然后不断的修正代码，重新采样，不断优化。

八、pprof与性能测试结合#
go test命令有两个参数和 pprof 相关，它们分别指定生成的 CPU 和 Memory profiling 保存的文件：

-cpuprofile：cpu profiling 数据要保存的文件地址
-memprofile：memory profiling 数据要报文的文件地址
我们还可以选择将pprof与性能测试相结合，比如：

比如下面执行测试的同时，也会执行 CPU profiling，并把结果保存在 cpu.prof 文件中：

go test -bench . -cpuprofile=cpu.prof
比如下面执行测试的同时，也会执行 Mem profiling，并把结果保存在 cpu.prof 文件中：

go test -bench . -memprofile=./mem.prof
需要注意的是，Profiling 一般和性能测试一起使用，这个原因在前文也提到过，只有应用在负载高的情况下 Profiling 才有意义。

使用gin框架编写一个接口，使用go-wrk进行压测，使用性能调优工具采集数据绘制出调用图和火焰图。

https://github.com/pkg/profile
golang自带的prof包是runtime/pprof，这个是低级别的，需要你手动做一些设置等等周边工作，不利于我们快速上手，利用pprof帮助我们解决实际的问题。这里推荐davecheney封装的pprof，它可以1行代码，让你用上pprof，专心解决自己的代码问题，下载：

go get github.com/pkg/profile
第2步：安装graphviz
pprof生成的prof文件时二进制的，需要把这个二进制的文件转换为我们人类可读的，graphviz可以帮助我们把二进制的prof文件转换为图像。Mac安装：

brew install graphviz
其他系统安装参考这里Graphviz Download。

第3步：修改你的main函数
只需要为hi.go增加这一行，defer profile.Start().Stop()，程序运行时，默认就会记录cpu数据:

package main

import (
“fmt”
“github.com/pkg/profile”
)

func main() {
defer profile.Start().Stop()

sl := makeSlice()

fmt.Printf("sum = %d\n", sumSlice(sl)) }

func makeSlice() []int {
sl := make([]int, 10000000)
for idx := range sl {
sl[idx] = idx
}
return sl
}

func sumSlice(sl []int) int {
sum := 0
for _, x := range sl {
sum += x
}
return sum
}
第4步：编译运行你的函数
编译和执行hi.go。

go build hi.go
./hi
应当看到类似的结果，它输出了生成的cpu.pprof的路径：

2018/11/07 19:47:21 profile: cpu profiling enabled, /var/folders/5g/rz16gqtx3nsdfs7k8sb80jth0000gn/T/profile046201825/cpu.pprof
sum = 49999995000000
2018/11/07 19:47:21 profile: cpu profiling disabled, /var/folders/5g/rz16gqtx3nsdfs7k8sb80jth0000gn/T/profile046201825/cpu.pprof
第5步：可视化prof
可视化有多种方式，可以转换为text、pdf、svg等等。text命令是

go tool pprof –text /path/to/yourbinary /var/path/to/cpu.pprof
结果是：

go tool pprof -text ./hi /var/folders/5g/rz16gqtx3nsdfs7k8sb80jth0000gn/T/profile046201825/cpu.pprof
File: hi
Type: cpu
Time: Nov 7, 2018 at 7:47pm (CST)
Duration: 202.18ms, Total samples = 50ms (24.73%)
Showing nodes accounting for 50ms, 100% of 50ms total
flat flat% sum% cum cum%
40ms 80.00% 80.00% 40ms 80.00% main.makeSlice /Users/shitaibin/go/src/github.com/shitaibin/awesome/hi.go
10ms 20.00% 100% 10ms 20.00% main.sumSlice /Users/shitaibin/go/src/github.com/shitaibin/awesome/hi.go
0 0% 100% 50ms 100% main.main /Users/shitaibin/go/src/github.com/shitaibin/awesome/hi.go
0 0% 100% 50ms 100% runtime.main /usr/local/go/src/runtime/proc.go
还有pdf这种效果更好：

go tool pprof –pdf /path/to/yourbinary /var/path/to/cpu.pprof > cpu.pdf
例子：

go tool pprof -pdf ./hi /var/folders/5g/rz16gqtx3nsdfs7k8sb80jth0000gn/T/profile046201825/cpu.pprof > cpu.pdf
效果：

cpu pprof
5步已经结束，你已经学会使用cpu pprof了吗？

轻松获取内存ppfo
如果你掌握了cpu pprof，mem pprof轻而易举就能拿下，只需要改1行代码：

defer profile.Start(profile.MemProfile).Stop()
效果：

go tool pprof -pdf ./hi /var/folders/5g/rz16gqtx3nsdfs7k8sb80jth0000gn/T/profile986580758/mem.pprof > mem.pdf
mem pprof

$ go tool pprof –pdf ./mem.pprof ./
./: read ./: is a directory
Fetched 1 source profiles out of 2
Generating report in profile001.pdf

$ go tool pprof –pdf ./m ./mem.pprof > cpu.pdf

$ go tool pprof –text ./mem.pprof ./
./: read ./: is a directory
Fetched 1 source profiles out of 2
Type: cpu
Time: Mar 25, 2020 at 12:03pm (CST)
Duration: 205.19ms, Total samples = 90ms (43.86%)
Showing nodes accounting for 90ms, 100% of 90ms total
flat flat% sum% cum cum%
40ms 44.44% 44.44% 40ms 44.44% main.makeSlice
40ms 44.44% 88.89% 40ms 44.44% runtime.usleep
10ms 11.11% 100% 10ms 11.11% main.sumSlice
0 0% 100% 50ms 55.56% main.main
0 0% 100% 40ms 44.44% runtime.gcBgMarkWorker.func2
0 0% 100% 40ms 44.44% runtime.gcDrain
0 0% 100% 50ms 55.56% runtime.main
0 0% 100% 40ms 44.44% runtime.markroot
0 0% 100% 40ms 44.44% runtime.markroot.func1
0 0% 100% 40ms 44.44% runtime.mstart
0 0% 100% 40ms 44.44% runtime.osyield
0 0% 100% 40ms 44.44% runtime.scang
0 0% 100% 40ms 44.44% runtime.systemstack

go tool pprof -svg ./m ./mem.pprof > cpu.svg

一、概述
go的pprof工具可以用来监测进程的运行数据，用于监控程序的性能，对内存使用和CPU使用的情况统信息进行分析。

官方提供了两个包：runtime/pprof和net/http/pprof，前者用于普通代码的性能分析，后者用于web服务器的性能分析。

官方文档：

https://golang.org/pkg/runtime/pprof/

https://golang.org/pkg/net/http/pprof/#Index

https://github.com/google/pprof/blob/master/doc/pprof.md

二、runtime/pprof的使用
该包提供了一系列用于调试信息的方法，可以很方便的对堆栈进行调试。

通常用得多得是以下几个：

StartCPUProfile：开始监控cpu。
StopCPUProfile：停止监控cpu，使用StartCPUProfile后一定要调用该函数停止监控。
WriteHeapProfile：把堆中的内存分配信息写入分析文件中。

go tool pprof cpu.prof
Entering interactive mode (type “help” for commands)
(pprof) help

Commands:
cmd [n] [–cum] [focus_regex]* [-ignore_regex]*
Produce a text report with the top n entries.
Include samples matching focus_regex, and exclude ignore_regex.
Add –cum to sort using cumulative data.
Available commands:
callgrind Outputs a graph in callgrind format
disasm Output annotated assembly for functions matching regexp or address
dot Outputs a graph in DOT format
eog Visualize graph through eog
evince Visualize graph through evince
gif Outputs a graph image in GIF format
gv Visualize graph through gv
list Output annotated source for functions matching regexp
pdf Outputs a graph in PDF format
peek Output callers/callees of functions matching regexp
png Outputs a graph image in PNG format
proto Outputs the profile in compressed protobuf format
ps Outputs a graph in PS format
raw Outputs a text representation of the raw profile
svg Outputs a graph in SVG format
tags Outputs all tags in the profile
text Outputs top entries in text form
top Outputs top entries in text form
tree Outputs a text rendering of call graph
web Visualize graph through web browser
weblist Output annotated source in HTML for functions matching regexp or address
peek func_regex
Display callers and callees of functions matching func_regex.

1.top
命令格式：top [n]，查看排名前n个数据，默认为10。
(pprof) top
8490ms of 8510ms total (99.76%)
Dropped 13 nodes (cum <= 42.55ms)
Showing top 10 nodes out of 15 (cum >= 110ms)
flat flat% sum% cum cum%
6780ms 79.67% 79.67% 8510ms 100% main.main
670ms 7.87% 87.54% 1250ms 14.69% math/rand.(Rand).Int31n
350ms 4.11% 91.66% 1600ms 18.80% math/rand.(Rand).Intn
260ms 3.06% 94.71% 580ms 6.82% math/rand.(Rand).Int31
190ms 2.23% 96.94% 190ms 2.23% math/rand.(rngSource).Int63
130ms 1.53% 98.47% 320ms 3.76% math/rand.(Rand).Int63
110ms 1.29% 99.76% 110ms 1.29% runtime.memclrNoHeapPointers
0 0% 99.76% 8510ms 100% runtime.goexit
0 0% 99.76% 110ms 1.29% runtime.heapBits.initSpan
0 0% 99.76% 110ms 1.29% runtime.largeAlloc
2.tree
命令格式：tree [n]，以树状图形式显示，默认显示10个。
(pprof) tree 5
8250ms of 8510ms total (96.94%)
Dropped 13 nodes (cum <= 42.55ms)
Showing top 5 nodes out of 15 (cum >= 190ms)
———————————————————-+————-
flat flat% sum% cum cum% calls calls% + context
———————————————————-+————-
6780ms 79.67% 79.67% 8510ms 100% | main.main
1600ms 100% | math/rand.(Rand).Intn
———————————————————-+————-
1250ms 100% | math/rand.(Rand).Intn
670ms 7.87% 87.54% 1250ms 14.69% | math/rand.(Rand).Int31n
580ms 100% | math/rand.(Rand).Int31
———————————————————-+————-
1600ms 100% | main.main
350ms 4.11% 91.66% 1600ms 18.80% | math/rand.(Rand).Intn
1250ms 100% | math/rand.(Rand).Int31n
———————————————————-+————-
580ms 100% | math/rand.(Rand).Int31n
260ms 3.06% 94.71% 580ms 6.82% | math/rand.(Rand).Int31
190ms 100% | math/rand.(rngSource).Int63
———————————————————-+————-
190ms 100% | math/rand.(Rand).Int31
190ms 2.23% 96.94% 190ms 2.23% | math/rand.(rngSource).Int63
———————————————————-+————-
3.web
以web形式查看，在web服务的时候经常被用到，需要安装gv工具，官方网页：http://www.graphviz.org/。

linux用户使用yum install graphviz安装即可，当然，纯命令行界面是不能查看的。

windows用户下载msi包安装后需要把安装目录下的bin目录添加到环境变量才行。

如果没有安装gv工具，使用会报错：

Cannot find dot, have you installed Graphviz?

exec: “firefox”: executable file not found in $PATH

4.其他
其他的都是以不同形式展现出来，大同小异，以后有时间再测试。

四、web服务器监测
在web服务器中监测只需要在import部分加上监测包即可：
import(
_ “net/http/pprof”
)
当服务开启后，在当前服务环境的http://ip:port/debug/pprof页面可以看到当前的系统信息：

点击查看具体的信息：

通常可以对服务器在一段时间内进行数据采样，然后分析服务器的耗时和性能:
go tool pprof http://:/debug/pprof/profile
使用该命令后会对服务进行30s的采样，这段时间内可以尽量多使用web服务，生成多一些统计数据。

go tool pprof http://127.0.0.1:8080/debug/pprof/profile
Fetching profile from http://127.0.0.1:8080/debug/pprof/profile
Please wait… (30s)
Saved profile in \pprof\pprof.127.0.0.1.samples.cpu.001.pb.gz
Entering interactive mode (type “help” for commands)
(pprof) top
3870ms of 4800ms total (80.62%)
Dropped 37 nodes (cum <= 24ms)
Showing top 10 nodes out of 66 (cum >= 110ms)
flat flat% sum% cum cum%
1230ms 25.62% 25.62% 1300ms 27.08% runtime.mapaccess1_faststr
860ms 17.92% 43.54% 860ms 17.92% runtime.memclrNoHeapPointers
810ms 16.88% 60.42% 1010ms 21.04% runtime.scanobject
190ms 3.96% 64.38% 190ms 3.96% runtime.heapBitsForObject
160ms 3.33% 67.71% 190ms 3.96% strconv.ParseInt
140ms 2.92% 70.62% 1720ms 35.83% business_sets/haoxingdai_qiangdan/server/handler.makeOrder4Replace
140ms 2.92% 73.54% 1990ms 41.46% runtime.mallocgc
120ms 2.50% 76.04% 120ms 2.50% runtime.heapBitsSetType
110ms 2.29% 78.33% 1680ms 35.00% runtime.mapassign
110ms 2.29% 80.62% 110ms 2.29% runtime.memhash
使用web命令后会生成采样时间内每个系统调用的耗时分析，可以用来分析web服务的响应时间都用在哪了

Go 程序的性能优化及 pprof 的使用

程序的性能优化无非就是对程序占用资源的优化。对于服务器而言，最重要的两项资源莫过于 CPU 和内存。性能优化，就是在对于不影响程序数据处理能力的情况下，我们通常要求程序的 CPU 的内存占用尽量低。反过来说，也就是当程序 CPU 和内存占用不变的情况下，尽量地提高程序的数据处理能力或者说是吞吐量。

Go 的原生工具链中提供了非常多丰富的工具供开发者使用，其中包括 pprof。

对于 pprof 的使用要分成下面两部分来说。

Web 程序使用 pprof

先写一个简单的 Web 服务程序。程序在 9876 端口上接收请求。
package main

import (
“bytes”
“io/ioutil”
“log”
“math/rand”
“net/http”

_ “net/http/pprof”
)

func main() {
http.HandleFunc(“/test”, handler)
log.Fatal(http.ListenAndServe(“:9876”, nil))
}

func handler(w http.ResponseWriter, r *http.Request) {
err := r.ParseForm()
if nil != err {
w.Write([]byte(err.Error()))
return
}
doSomeThingOne(10000)
buff := genSomeBytes()
b, err := ioutil.ReadAll(buff)
if nil != err {
w.Write([]byte(err.Error()))
return
}
w.Write(b)
}

func doSomeThingOne(times int) {
for i := 0; i < times; i++ {
for j := 0; j < times; j++ {

}   } }

func genSomeBytes() *bytes.Buffer {
var buff bytes.Buffer
for i := 1; i < 20000; i++ {
buff.Write([]byte{‘0’ + byte(rand.Intn(10))})
}
return &buff
}
可以看到我们只是简单地引入了 net/http/pprof ，并未显示地使用。

启动程序。

我们用 wrk 来简单地模拟请求。
brew install wrk
wrk -c 400 -t 8 -d 3m http://localhost:9876/test
-c, –connections 跟服务器建立并保持的TCP连接数量
-d, --duration 压测时间
-t, --threads 使用多少个线程进行压测

wrk -c 400 -t 8 -d 3m http://localhost:9876/test

这时我们打开 http://localhost:9876/debug/pprof，会显示如下页面：

用户可以点击相应的链接浏览内容。不过这不是我们重点讲述的，而且这些内容看起来并不直观。

我们打开链接 http://localhost:9876/debug/pprof/profile 稍后片刻，可以下载到文件 profile。

使用 Go 自带的 pprof 工具打开。go tool pprof test profile。（proof 后跟的 test 为程序编译的可执行文件）

输入 top 命令得到：

可以看到 cpu 占用前 10 的函数，我们可以对此分析进行优化。

只是这样可能还不是很直观。

我们输入命令 web（需要事先安装 graphviz，macOS 下可以 brew install graphviz），会在浏览器中打开界面如下：

可以看到 main.doSomeThingOne 占用了 92.46% 的 CPU 时间，需要对其进行优化。

Web 形式的 CPU 时间图对于优化已经完全够用，这边再介绍一下火焰图的生成。macOS 推荐使用 go-torch 工具。使用方法和 go tool pprof 相似。

go-torch test profile 会生成 torch.svg 文件。可以用浏览器打开
刚才只是讲了 CPU 的占用分析文件的生成查看，其实内存快照的生成相似。http://localhost:9876/debug/pprof/heap，会下载得到 heap.gz 文件。
我们同样可以使用 go tool pprof test heap.gz，然后输入 top 或 web 命令查看相关内容。

Category golang