While I was working with the R programming language, coming from a Java/C++ background, I always found it somewhat slow. Then I discovered the "Rcpp" package, which allows me to write C++ code and call it from R. It improved my program speed by a factor of 100-1000x. Most of my coding was done in C++ while the interface remained in R. Since moving to Python, I have found that using libraries like "NumPy" greatly improves the performance of code compared to doing the calculations manually in a for-loop. But I always wanted the raw power of C/C++ in my Python code.

What held me back was the ease of coding in Python. I was more focused on solving the problem than on gaining a 10-20 ms speed boost by optimizing the code (assuming the algorithm is already optimized). But once I started doing production deployment, I realized that many such pieces of code are part of a pipeline, so the delays add up.

Cython is the 'C equivalent' of Python. Every Python program is a valid Cython program (but the reverse is not true).

Python is an interpreted, dynamically typed language: we do not need to specify data types, and they are resolved at run-time. Cython is a compiled language like C and C++. It is statically typed: we specify the data types in the code itself, so no additional cost is incurred at run-time. The best part of Cython is that instead of writing an entire module in C or C++ and then importing it, Cython requires only minimal changes to existing Python code while delivering a large boost in speed. We can also import code written entirely in C or C++ through Cython, similar to Rcpp. Loops are inherently slow in an interpreted language, which is mitigated by moving entire loop-heavy blocks of code to C or C++. Without getting into the internals of the Cython language, I will demonstrate how to write Python code in Cython and then analyze and compare the run-times.
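As a small illustration of why interpreted loops are slow, the snippet below (my own sketch, not from the original article) times a plain Python loop against the equivalent NumPy expression; the NumPy version runs the loop in compiled C:

```python
import time
import numpy as np

def py_sum_of_squares(n):
    # plain Python loop: every iteration pays interpreter and boxing overhead
    total = 0
    for i in range(n):
        total += i * i
    return total

def np_sum_of_squares(n):
    # NumPy pushes the loop down into compiled C code
    a = np.arange(n, dtype=np.int64)
    return int((a * a).sum())

n = 1_000_000
t0 = time.perf_counter(); py_res = py_sum_of_squares(n); py_t = time.perf_counter() - t0
t0 = time.perf_counter(); np_res = np_sum_of_squares(n); np_t = time.perf_counter() - t0
assert py_res == np_res
print(f"python loop: {py_t:.3f}s, numpy: {np_t:.3f}s")
```

On typical machines the NumPy version is one to two orders of magnitude faster, which is the same effect Cython exploits.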

The first example I am going to show is generating primes up to some N using the Sieve of Eratosthenes. The sieve algorithm works by eliminating all multiples of a prime number in each iteration, i.e. first remove all multiples of 2, then remove multiples of 3, 5, 7, and so on.

Observe that after we are done eliminating all multiples of a prime p_i, the next number to consider is the next prime p_{i+1}, because all composite numbers between p_i and p_{i+1} have already been eliminated by p_0, p_1, ..., p_i. Also, the smallest remaining multiple of p_i is (p_i)^2, because all multiples of p_i less than (p_i)^2 have already been eliminated.
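The claim about (p_i)^2 is easy to verify directly; the helper below (hypothetical, for illustration only) checks that every multiple of a prime below its square has a smaller factor and was therefore crossed out in an earlier iteration:

```python
# For a prime p, every multiple of p below p*p has a factor
# smaller than p, so the sieve has already removed it.
def has_smaller_factor(m, p):
    return any(m % q == 0 for q in range(2, p))

p = 7
multiples_below_square = list(range(2 * p, p * p, p))  # 14, 21, 28, 35, 42
assert all(has_smaller_factor(m, p) for m in multiples_below_square)
print(f"all multiples of {p} below {p * p} were already eliminated")
```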

Create a file named "sieve_python.py" and add the following Python code to it:

```python
import numpy as np

def sieve(n):
    arr = np.empty(n+1, dtype=np.int32)
    arr.fill(1)
    arr[0], arr[1] = 0, 0
    sqrt_n = int(np.sqrt(n))
    for i in range(2, sqrt_n+1):
        if arr[i] == 1:
            j = i**2
            while j <= n:
                arr[j] = 0
                j += i
    return np.nonzero(arr)
```

Note that the above code is not fully optimized, because we are not actually 'eliminating' the composites but just setting their entries to 0. Alternatively, we could use a doubly linked list, which lets us delete the nodes corresponding to composites in O(1) time. But for comparison purposes, let's use this version.
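As a quick sanity check of the function, here is a usage sketch for a small N (the definition is repeated so the snippet is self-contained):

```python
import numpy as np

def sieve(n):
    arr = np.empty(n + 1, dtype=np.int32)
    arr.fill(1)
    arr[0], arr[1] = 0, 0
    for i in range(2, int(np.sqrt(n)) + 1):
        if arr[i] == 1:
            j = i * i
            while j <= n:
                arr[j] = 0
                j += i
    return np.nonzero(arr)

# np.nonzero returns a tuple of index arrays; the surviving
# indices are exactly the primes up to n
print(sieve(30)[0].tolist())  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```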

We need to benchmark the time taken with the plain Python version. On a Jupyter or IPython notebook, run the following code:

```python
from sieve_python import sieve
%timeit sieve(5000000)
```

We get the following output:

```
1 loop, best of 3: 3.15 s per loop
```

Thus, the plain Python version takes around 3.15 s to obtain all prime numbers less than N = 5 million.

Let's create the Cython version for the same code. Create a file named 'sieve_cython.pyx' in the same folder.

```python
import numpy as np

def sieve(int n):
    np_arr = np.empty(n+1, dtype=np.int32)
    np_arr.fill(1)
    cdef int[:] arr = np_arr
    cdef int sqrt_n, i, j
    arr[0], arr[1] = 0, 0
    sqrt_n = int(np.sqrt(n))
    for i in range(2, sqrt_n+1):
        if arr[i] == 1:
            j = i**2
            while j <= n:
                arr[j] = 0
                j += i
    return np.nonzero(arr)
```

The line 'cdef int[:] arr = np_arr' creates a typed memory view of the NumPy array. This is something special to Cython: both NumPy arrays and C arrays can be cast into Cython memory views. The 'dtype' of the NumPy array is set to 'np.int32' because a C 'int' is 32 bits; 'np.int64' is a 64-bit integer, corresponding to C 'long' on most 64-bit platforms.
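The width difference between the two dtypes is easy to verify from plain Python (a quick check of my own, not from the original article):

```python
import numpy as np

# a C 'int' is 4 bytes on common platforms, matching np.int32
print(np.dtype(np.int32).itemsize)  # 4 bytes per element
print(np.dtype(np.int64).itemsize)  # 8 bytes per element
```

Using the matching dtype matters: if the array dtype and the memory-view declaration disagree, Cython raises a ValueError at run-time.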

Since Cython is a compiled language, we need to build and compile the above Cython code. Create a "setup.py" file in the same location:

```python
from distutils.core import setup
from Cython.Build import cythonize

setup(name="sieve_cython", ext_modules=cythonize('sieve_cython.pyx'))
```

Run the following command on the command line to build the Cython file:

```
python setup.py build_ext --inplace
```

This will generate a C file "sieve_cython.c", which is the C translation of the Cython code, and a "build" folder containing the compiled extension module.

Now we can run the Cython version of the sieve method in the same way as a Python function:

```python
from sieve_cython import sieve
%timeit sieve(5000000)
```

With the Cython code for the sieve algorithm we obtain the following timing:

```
10 loops, best of 3: 62.6 ms per loop
```

Just 62.6 ms as compared to 3.15 s with the naive Python version, an improvement of around 50x. The changes to the original Python code are minimal: we only added data types to the variables.
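The speedup figure follows directly from the two timings:

```python
py_time = 3.15      # seconds per loop, plain Python version
cy_time = 62.6e-3   # seconds per loop, Cython version
print(f"speedup: {py_time / cy_time:.1f}x")  # speedup: 50.3x
```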

We can check which parts of the Cython code use pure Python and which parts use C by running the following command:

```
cython sieve_cython.pyx -a
```

This will generate an .html file as follows:

The lines highlighted in bright yellow use pure Python, whereas lines without any yellow background compile to pure C. The brighter the yellow background, the more 'Pythonic' the line is.

This gives us a hint about where to focus in order to improve the performance of our code. But we notice that most of the yellow bands surround NumPy method calls, which are already highly optimized, so it may not be possible to improve the Cython code much further along those lines.

Without giving up hope, we try to use mostly pure C syntax:

```python
from libc.stdlib cimport malloc, free
from libc.math cimport sqrt

def sieve(int n):
    cdef int *arr = <int *> malloc((n+1) * sizeof(int))
    cdef int sqrt_n, i, j
    for i in range(2, n+1):
        arr[i] = 1
    sqrt_n = <int> sqrt(n)
    for i in range(2, sqrt_n+1):
        if arr[i] == 1:
            j = i**2
            while j <= n:
                arr[j] = 0
                j += i
    primes = [i for i in range(2, n+1) if arr[i] == 1]
    free(arr)
    return primes
```

Here the array is allocated with C's malloc and the square root is computed by C's sqrt, so the hot loops no longer touch any Python objects; the memory must then be released explicitly with free.

Title: Speeding up with Cython

Source: https://www.codesec.net/view/628048.html
