
Fixing a Tough Memory Leak in Python

[Category: Development (Python) | Posted by 店小二04 | Date: 2018 | Author: 红领巾]

A few of our power users reported that long-running backtests would sometimes run out of memory. These power users are the people who often find new trading strategies, so we wanted to work with them to improve the performance of our backtesting tools. Over the past couple of weeks, our senior engineer found that the problem wasn’t in our code, but in one of the popular Python libraries that we use.

We found the problem in numpy and numba. The leak was ultimately caused by how we were using these libraries. We made the correction and, as you can see from the following chart, it really improved the memory utilization of our trade simulator.

The following is the write-up by our senior engineer so that others can learn from our engineering efforts.



Always iterate to completion with Numpy

Finding Python memory leaks using LD_PRELOAD and libunwind

You have a Python process that is consuming memory for an unintended and unknown reason (“leaking”) and you want to investigate. The process may import extension modules that have memory management issues of their own, outside the Python interpreter. You have access to a Linux environment for debugging.

We will:

1. Identify the C call stack where memory is allocated but not freed in a timely manner.
2. LD_PRELOAD a custom library to raise a signal when this stack is encountered.
3. Handle the signal in Python to dump a Python call stack.

Identify problematic C call stacks

Since we will be using libunwind in step 2, and to avoid writing some code, we will start with memleax, which is based on libunwind.

This utility attaches to a running process and produces a report of C call stacks where the allocation is not matched by a deallocation for a given period of time. If the analysis is still running and a matching deallocation happens eventually (even after the configured interval), the associated call stack is pruned from the interesting results.
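
The bookkeeping described above can be pictured with a small Python model. This is only a conceptual sketch, not how memleax is actually implemented: the class name `ExpiryTracker`, its methods, the explicit `now` timestamps, and the two-frame stacks are all hypothetical, chosen to show how an allocation becomes interesting after an interval and is pruned again if a late free arrives.

```python
class ExpiryTracker:
    """Toy model of 'allocation not freed within an interval' reporting."""

    def __init__(self, expire_seconds):
        self.expire_seconds = expire_seconds
        self.live = {}  # address -> (allocation time, call stack)

    def on_alloc(self, address, stack, now):
        # Remember when and where this block was allocated.
        self.live[address] = (now, tuple(stack))

    def on_free(self, address):
        # A matching free, however late, removes the block from consideration.
        self.live.pop(address, None)

    def report(self, now):
        # Count still-live blocks older than the interval, grouped by stack.
        counts = {}
        for allocated_at, stack in self.live.values():
            if now - allocated_at >= self.expire_seconds:
                counts[stack] = counts.get(stack, 0) + 1
        return counts


tracker = ExpiryTracker(expire_seconds=10)
# Illustrative stacks only; real reports contain full C call stacks.
tracker.on_alloc(0x1000, ["npy_alloc_cache", "array_boolean_subscript"], now=0)
tracker.on_alloc(0x2000, ["PyObject_Malloc"], now=0)
tracker.on_free(0x2000)                 # freed promptly, never reported
suspects = tracker.report(now=15)       # 0x1000 is now older than the interval
tracker.on_free(0x1000)                 # a late free prunes the stack again
after_late_free = tracker.report(now=20)
print(suspects, after_late_free)
```

Grouping by call stack rather than by address is what makes such a report useful: one leaking code path shows up as a single entry with a growing count.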

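In our case, the call stack of interest had npy_alloc_cache as the direct caller of malloc and array_boolean_subscript three frames further up.

Next, we LD_PRELOAD a custom library that raises a signal when this stack is encountered. It wraps malloc(3) and free(3), logs allocations made from that stack, and raises SIGUSR1: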

// ...
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <dlfcn.h>   // dlsym, RTLD_NEXT (g++ defines _GNU_SOURCE by default)
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <vector>
// ...
std::set<void *> s; // keep track of allocations that have not been freed
std::mutex mut; // protect s
extern "C" {
// LD_PRELOAD will cause the process to call this instead of malloc(3)
void *malloc(size_t size)
{
    // on first call, get a function pointer for malloc(3)
    static void *(*real_malloc)(size_t) = NULL;
    if(!real_malloc)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    assert(real_malloc);
    // call malloc(3)
    void *retval = real_malloc(size);
    static __thread int dont_recurse = 0; // init to zero on first call
    if(dont_recurse)
        return retval;
    dont_recurse=1; // if anything below calls malloc, skip analysis
    // on first call, create cache for symbol at each address
    static thread_local std::map<unw_word_t, std::string> *m = NULL;
    if(!m)
        m = new std::map<unw_word_t, std::string>();
    // collect stack symbols, updating cache as needed
    unw_cursor_t cursor;
    unw_context_t context;
    unw_getcontext(&context);
    unw_init_local(&cursor, &context);
    std::vector<std::string *> trace;
    while (unw_step(&cursor) > 0)
    {
        unw_word_t pc;
        unw_get_reg(&cursor, UNW_REG_IP, &pc);
        if (pc == 0)
            break;
        std::string &str = (*m)[pc];
        if(str=="") // build cache
        {
            unw_word_t offset;
            // started as C, feel free to use std::string/ostringstream
            char sym[1024], line[1024];
            // snprintf keeps a long symbol name from overflowing line[]
            int n = snprintf(line, sizeof(line), "0x%lx:", (unsigned long)pc);
            if (!unw_get_proc_name(&cursor, sym, sizeof(sym), &offset))
                snprintf(line + n, sizeof(line) - n, " (%s+0x%lx)\n",
                         sym, (unsigned long)offset);
            else
                snprintf(line + n, sizeof(line) - n, " -- no symbol\n");
            str = line;
        }
        trace.push_back(&str);
    }
    // look for our particular stack context
    // - log it
    // - save for free()
    // - raise signal
    if(trace.size() >= 4)
        if(strstr(trace[0]->c_str(), "npy_alloc_cache"))
            if(strstr(trace[3]->c_str(), "array_boolean_subscript"))
            {
                fprintf(stderr, "malloc @ %p for %zu\n", retval, size);
                std::lock_guard<std::mutex> g(mut);
                s.insert(retval);
                raise(SIGUSR1);
            }
    dont_recurse=0;
    return retval;
}
// report matching free() calls
void free(void *ptr)
{
    static void (*real_free)(void *) = NULL;
    if(!real_free)
        real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    assert(real_free);
    real_free(ptr);
    static __thread int dont_recurse = 0;
    if(dont_recurse)
        return;
    dont_recurse=1;
    mut.lock(); // in case lock_guard would call malloc/free?
    if(s.find(ptr) != s.end())
    {
        fprintf(stderr, "free @ %p\n", ptr);
        s.erase(ptr); // b/c addr will get reused
    }
    mut.unlock(); // before the following line, unlike with lock_guard
    dont_recurse=0;
}
} // extern "C"

Compile with something like:

g++ -std=c++11 -g -Wall -fPIC -shared -o stack_signal.so \
stack_signal.cpp -ldl -lpthread -lunwind

Run the Python script:

LD_PRELOAD=./stack_signal.so python my.py # or with -m pdb

You should see output like the following:

malloc @ 0x55c2db93e360 for 36020
malloc @ 0x55c2d9691b30 for 84100
free @ 0x55c2db93e360
free @ 0x55c2d9691b30

In Python, handle the signal by printing the (Python!) stack

import signal
import traceback
def debug_signal_handler(signum, frame):
    # print the Python call stack that was active when the signal arrived
    traceback.print_stack(frame)
signal.signal(signal.SIGUSR1, debug_signal_handler)
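
To see the handler mechanism in isolation, without the preloaded library, the following sketch sends SIGUSR1 to its own process and collects the formatted stack instead of printing it. The names `captured` and `allocate_something` are hypothetical, and `os.kill` merely stands in for the `raise(SIGUSR1)` that the C library would perform. It assumes a POSIX system and that the code runs in the main thread, since Python only allows `signal.signal` to be called there.

```python
import os
import signal
import traceback

captured = []  # formatted stacks collected by the handler


def debug_signal_handler(signum, frame):
    # 'frame' is the Python frame that was executing when the handler ran.
    captured.append("".join(traceback.format_stack(frame)))


def allocate_something():
    # Stand-in for the Python line whose C code would call raise(SIGUSR1).
    os.kill(os.getpid(), signal.SIGUSR1)
    # Python runs signal handlers between bytecodes; this short loop gives
    # the interpreter the chance to do so while still inside this function.
    for _ in range(100):
        pass


signal.signal(signal.SIGUSR1, debug_signal_handler)
allocate_something()
print(captured[0])
```

The captured stack ends in `allocate_something`, which is the property the technique relies on: the Python-level handler runs shortly after the C-level signal, so the reported frame is at or very near the line that triggered the allocation.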

Now your “malloc @” log lines should be followed by stacks in your Python code. In this case it identified the following as the caller of “array_boolean_subscript” in cases where “malloc @” log lines don’t have corresponding “free @” log lines:

... = chunk[to_keep] # a section of a numpy array identified by the boolean array

…now that we know where the problem starts
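
As a small illustration of why the boolean subscript shows up under malloc at all: indexing a numpy array with a boolean array produces a new array with its own data buffer, whereas a basic slice is only a view. The array contents below are made up; only the names `chunk` and `to_keep` come from the line above.

```python
import numpy as np

chunk = np.arange(10, dtype=np.float64)
to_keep = chunk % 2 == 0          # boolean array, same length as chunk

kept = chunk[to_keep]             # boolean-mask indexing: allocates a new array
view = chunk[::2]                 # basic slicing: a view, no new data buffer

print(np.shares_memory(chunk, kept))   # False
print(np.shares_memory(chunk, view))   # True
```

Each such result holds its buffer until the last reference to it goes away, so anything that keeps the result alive longer than expected also keeps the allocation alive.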

It turned out that the numpy array resulting from the above operation was being passed to a numba generator compiled in “nopython” mode. This generator was not being iterated to completion, which caused the leak. As per the numba documentation, generators must be compiled with forceobj=True in order for the generator finalizer to handle this case. In our case it made sense to add code to ensure that the generator is always iterated to completion, and we retained the “nopython” compilation.
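
One way to guarantee iteration to completion is to route every consumer through a helper that never breaks out of the loop early. The sketch below is not our production code: it uses a plain Python generator as a stand-in for the numba-compiled one, and both `rows` and `take_then_drain` are hypothetical names.

```python
def rows(chunk, log):
    # Stand-in for the compiled generator; 'log' records when it finishes.
    for value in chunk:
        yield value
    log.append("exhausted")


def take_then_drain(gen, n):
    """Return the first n items, then keep iterating until gen is exhausted."""
    taken = []
    for item in gen:
        if len(taken) < n:
            taken.append(item)
        # Items past the first n are discarded, but the loop does not break,
        # so the generator always runs to completion.
    return taken


log = []
first_two = take_then_drain(rows([10, 20, 30, 40], log), 2)
print(first_two, log)
```

The trade-off is that the remaining items are still computed and then thrown away, which is only acceptable when the tail of the generator is cheap relative to the cost of the leak.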

Hopefully, you now have an additional tool at your disposal when confronted with perplexing Python memory leaks! Although this methodology could not point the finger directly at numba in our case, it certainly helped to know the line of Python where the problem originated.
