Node.js under a Microscope: CPU FlameGraph and FlameScope


Last week, I had an opportunity to talk about profiling Node.js applications on production at an internal guild meeting at work. Here is a written version of it plus some additional information.

Background

At work, I have been working on Node.js microservices that fetch data from API servers and render HTML with React. We monitor response times at the load balancers, in application metrics, and with distributed tracing via OpenTracing. One of the microservices had a weird gap between its own 99th-percentile response time and that of its dependencies. It was spending an extra 500 milliseconds somewhere, but I didn't know why.

My first suspect was the network. It is a place full of uncertainty. After learning and trying various commands and metrics, I took a tcpdump and checked the packets one by one, with my own eyes and with a script. There were none of the significant delays I had expected. So I had to stop blaming the network, or anyone else.

CPU Profiling with linux perf Command

Because the weird latency was happening in the application itself, I wanted to know what was going on inside it. There are mainly two ways to achieve this: profiling and tracing. Profiling records samples, while tracing records everything. I wanted to do this in production, so profiling was naturally a good fit because of its smaller overhead.

For Node.js, there are mainly two different tools. One is the V8 profiler, and the other is Linux perf. The V8 profiler uses the profiler provided by V8. It covers all JavaScript execution and V8 native functions, and it works on non-Linux operating systems, so it can be pretty handy if you work on a non-Linux machine. On the other hand, Linux perf can profile almost anything, including the Linux kernel, libuv, and every process on your OS, with minimal overhead. However, as the name suggests, it works only on Linux. According to the Node CPU Profiling Roadmap, the V8 profiler is the one officially supported by the V8 team, but Linux perf will keep working for a while. In the end, I picked Linux perf because of its low performance overhead and minimal intervention in the application.
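For reference, one way to use the V8 profiler is to drive it in-process through Node's built-in inspector module. Here is a minimal sketch that captures a 30-second CPU profile; the duration and output file name are my own choices:

    // Capture a CPU profile with the V8 profiler via Node's built-in inspector module.
    const inspector = require('inspector');
    const fs = require('fs');

    const session = new inspector.Session();
    session.connect();

    session.post('Profiler.enable', () => {
      session.post('Profiler.start', () => {
        // Let the application run for 30 seconds, then write a .cpuprofile
        // file that Chrome DevTools can load.
        setTimeout(() => {
          session.post('Profiler.stop', (err, data) => {
            if (!err) fs.writeFileSync('app.cpuprofile', JSON.stringify(data.profile));
            session.disconnect();
          });
        }, 30 * 1000);
      });
    });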

Linux perf record records stack traces into a binary file, called perf.data by default. The binary file contains only addresses and the file names of functions. perf script converts the stack traces into a human-readable text file, resolving function names from program binaries and symbol map files.

    # Install dependencies for `perf` command
    sudo apt-get install linux-tools-common linux-tools-$(uname -r)

    # Test `perf` command
    sudo perf top

    # Record stack traces 99 times per second for 30 seconds
    sudo perf record -F 99 -p ${pid} -g -- sleep 30s

    # Generate human readable stack traces
    sudo perf script > stacks.${pid}.out

Now we have human-readable stack traces, but it's still hard to browse thousands of them and extract insights. How can we analyze them efficiently?

CPU Flame Graph

The CPU Flame Graph by Brendan Gregg is a great way of visualizing stack traces. It aggregates all the stack traces into one chart: frequently executed functions are shown as wider frames, and rarely executed functions as narrower ones.


[Figure: A CPU Flame Graph from a sample application]

I found some insights about the application in production with the CPU Flame Graph, for example that JSON.parse() showed up prominently (more on that in Example 1 below).

There are a few tools, such as FlameGraph and 0x, for generating CPU Flame Graphs from Linux perf stack traces. In the end, however, I didn't need them, because FlameScope, which I'll explain next, can generate CPU Flame Graphs too.

FlameScope

FlameScope by Netflix is another great tool for visualizing stack traces, as a time series. It renders a heatmap from the stack traces. Each cell represents a short slice of time (20 ms, at 50 cells per second), and its color represents how many times the process was on-CPU during that slice. It visualizes the activity patterns of your application.


[Figure: a FlameScope heatmap. Image from Netflix/flamescope]

If you select a time range on the heatmap, FlameScope shows you a CPU Flame Graph for just that range. This lets you examine what happened, and when, in detail.

To use FlameScope, check out the repository and run its Python server. Then put the stack trace files from perf script into the examples directory and open http://localhost:5000 .

I found a couple of exciting insights about the application in production using this tool.

Example 1: Heavy Tasks in the Master Process

The application used the cluster module to utilize multiple CPU cores. FlameScope showed that the master process was idle most of the time, but occasionally it kept using the CPU for 1.5 seconds straight! The flame graph for those ranges showed that the culprit was metrics aggregation.

The master process aggregated application metrics from the worker processes and served them to metrics collectors a few times per minute. When the collectors asked for data, the master process calculated percentiles of the response times and prepared a JSON response. The percentile calculation was taking a long time because the application had a lot of metrics buckets, and the library we used was deep-copying objects with JSON.stringify() and JSON.parse()!
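To illustrate, the expensive pattern looked roughly like this. This is a simplified sketch, not the actual library code; the bucket layout and helper names are made up:

    // Simplified sketch of the expensive pattern; not the actual library code.
    function deepCopy(obj) {
      // Deep copy via a full JSON round-trip: every bucket is serialized
      // and reparsed on every metrics collection.
      return JSON.parse(JSON.stringify(obj));
    }

    // Nearest-rank percentile over a sorted array (illustrative helper).
    function percentile(sorted, p) {
      const idx = Math.ceil((p / 100) * sorted.length) - 1;
      return sorted[Math.max(0, Math.min(sorted.length - 1, idx))];
    }

    // With thousands of buckets, the deep copy dominates the CPU time.
    const buckets = { '/api/foo': [12, 34, 56, 78, 90], '/api/bar': [5, 8, 13, 21, 34] };
    const snapshot = deepCopy(buckets); // this is what shows up wide in the flame graph
    for (const route of Object.keys(snapshot)) {
      const sorted = snapshot[route].slice().sort((a, b) => a - b);
      console.log(route, 'p99 =', percentile(sorted, 99));
    }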

Example 2: Frequent Garbage Collections

FlameScope showed that the worker processes were not overloaded most of the time, but they had a few hundred milliseconds of CPU-busy time roughly every 10 seconds. It was caused by mark-sweep and mark-compact garbage collections.

The application had an in-memory fallback cache for API calls, used only when the API calls and their retries failed. Even when the API had problems, the cache hit rate was very low because of the number of permutations; in other words, it was almost never used. It cached large API responses for a while and threw them away after they expired. It looked innocent at first glance, but it was a problem for V8's generational garbage collector.

The API responses were always promoted to the old generation space, causing frequent slow GCs; GC of the old generation is much slower than GC of the young generation. After removing the fallback cache, the application's 99th-percentile response time improved by hundreds of milliseconds!
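A simplified sketch of the cache's shape shows why it stressed the garbage collector; the names and TTL here are illustrative, not the real code:

    // Simplified sketch of the fallback cache; names and TTL are illustrative.
    const TTL_MS = 5 * 60 * 1000;
    const fallbackCache = new Map();

    function cacheFallback(key, response) {
      // Each large response is kept alive for minutes, far longer than the
      // interval between young-generation scavenges, so V8 promotes it to
      // the old generation, where reclaiming it needs the slower
      // mark-sweep / mark-compact collections.
      fallbackCache.set(key, { response, expiresAt: Date.now() + TTL_MS });
    }

    function getFallback(key) {
      const entry = fallbackCache.get(key);
      if (!entry || entry.expiresAt < Date.now()) {
        fallbackCache.delete(key); // thrown away only after the promotion already happened
        return undefined;
      }
      return entry.response;
    }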

Node.js Gotchas

perf script collects symbols for function addresses from program binaries. For Node.js we need something extra, because functions are compiled just in time. As far as I know, there are two ways to record the symbols:

1. Run your Node.js process with the --perf-basic-prof-only-functions option. It generates a log file at /tmp/perf-${pid}.map . The file keeps growing; how fast depends on your application, but it was a few megabytes per day for an application at work. Another problem is that functions in V8 keep moving around, so the addresses in /tmp/perf-${pid}.map get outdated. I wrote a script to fix the issue.

2. Use mmarchini/node-linux-perf . It generates the same /tmp/perf-${pid}.map as --perf-basic-prof-only-functions does, but on demand. Because it always generates the file freshly, it doesn't contain outdated symbols. It seems to be the way to go (see the sketch below), but I haven't tried it in production yet.
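Wiring up the second option might look roughly like this. Note that I haven't used it in production, and I'm assuming the start()/stop() API that the module's README describes:

    // Sketch of on-demand symbol generation with mmarchini/node-linux-perf.
    // Assumption: the module exposes start() as its README describes.
    const linuxPerf = require('linux-perf');

    // Write a fresh /tmp/perf-${pid}.map on demand, e.g. on a signal,
    // right before running `perf record` against this process.
    process.on('SIGUSR2', () => {
      linuxPerf.start();
    });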

In addition to the above, there are a few more Node.js options you can use to improve your stack traces, though I haven't tried them in production because the stack traces were already good enough for me:

- --no-turbo-inlining turns off function inlining, an optimization done by V8. Because function inlining fuses multiple functions into one, it can make stack traces harder to understand. Turning it off produces more named frames.

- --interpreted-frames-native-stack fixes Builtin:InterpreterEntryTrampoline frames in stack traces. It is available from Node.js 10.4.0. Check out "Interpreted Frames" in Updates from the Diagnostics Summit for more details.

Docker Gotchas

It gets a bit tricky when you run your application in containers. There are two ways to use Linux perf with Docker:

1. Run perf record and perf script in the same Docker container as your application.
2. Run perf record and perf script in the host OS.

I eventually chose option 2. I tried option 1 first, but gave up because I was using Alpine Linux as the base image and it was hard to get Linux perf working on it.

To run perf record in the host OS, we first need to figure out the pid of the application process as seen from the host.

    $ ps ax | grep -n 'node --perf'
    21574 pts/0    Sl+    2:53 node --perf-basic-prof-only-functions src/index.js
    30481 pts/3    S+     0:00 grep --color=auto node --perf

    # or
    $ pgrep -f 'node --perf'
    21574

    $ sudo perf record -F 99 -p 21574 -g -- sleep 30s

perf script collects symbols from binaries and symbol files to produce human-readable function names. It needs to be able to read the binaries whose functions were recorded by perf record, as well as the /tmp/perf-${pid}.map files that the applications generate. However, perf script in the host OS cannot read them under the same file names that the container sees. (It seems this is no longer the case with the latest Linux kernel, because its perf command is container-aware. But it was the case for me because I was not using the latest kernel.)

I learned how to overcome this from Making FlameGraphs with Containerized Java: just copy the necessary files from the container to the host.

    # Horrible hack! Which binaries you need depends on your setup.
    # `perf script` tells you what it wants if anything is missing.
    sudo docker cp mycontainer:/usr/bin/node /usr/bin/node
    sudo docker cp mycontainer:/lib/ld-musl-x86_64.so.1 /lib/ld-musl-x86_64.so.1
    sudo docker cp mycontainer:/usr/lib/libstdc++.so.6.0.22 /usr/lib/libstdc++.so.6.0.22

To copy the symbol map files, we also need to find the pid inside the container. We can find it in /proc/${host_pid}/status .

    $ cat /proc/21574/status | grep NSpid
    NSpid:  21574   6

    $ sudo docker cp mycontainer:/tmp/perf-6.map /tmp/perf-21574.map

Now everything is ready, and we can run perf script as usual.

    sudo perf script > stacks.${pid}.out

I set up a sample project for profiling a Node.js application on Docker. It was nice to practice profiling a bit before doing it in production!

Conclusion

Linux perf provides great observability into Node.js applications in production. Tools like CPU Flame Graph and FlameScope helped me identify performance bottlenecks.

There are some gotchas when profiling Node.js applications on Docker with Linux perf. It took me some time to figure out how to do it, because Node.js and Linux evolve day by day and I couldn't find many up-to-date resources online. I hope this post is helpful!
