PHP and threads. Just this three-word sentence, and we could write a book. As usual, we won't, but we will give some information and details about the subject.
Let's start with a confusion many people fall into when it comes to this subject. PHP is not a threaded language. PHP doesn't use threads at its heart, and PHP does not natively allow userland code to use threads as a parallelism mechanism, in any way.
So PHP is very far from other technologies such as Java, for example. In Java, the language runtime itself is heavily threaded, and it also allows its users to make use of threads in their own programs. Not PHP. And there are reasons for this.
PHP's heart is not threaded, mainly for simplicity. When you read the next chapter, you'll learn that threads are not "a magical technology that allows any program to run faster". Sounds like a sales pitch, doesn't it? We are not salespeople but technical people, so we know what we are talking about. So PHP's engine does not use threads at the moment. It could in the future, but using threads introduces many, many new difficulties in programming, for a result that may not be what you expect. The main difficulty is cross-platform thread programming. The second one is shared resources and lock management, and the third one is that not every program can be turned into a threaded one. PHP's design was born mainly around the year 2000. At that time, thread programming was not that widespread or mature, and the engineers behind PHP (mainly Zend) decided to create a fully monolithic engine with no threads (also, they did not have the resources to ship a stable cross-platform threaded engine).
The second point is that PHP userland code can't use threads, because that is not how PHP expects your code to run. PHP is a fire-and-forget language: you should treat your request as fast as possible, and release PHP so that it can treat the next request. PHP has been designed as a glue language: you don't compute complex tasks that could require the usage of threads; instead, you access fast-and-ready resources, glue everything together, and send the result back to the user. With PHP you do things, and whatever could take "more time than usual" should not be done in PHP. That's why we use queue-based systems to defer heavy tasks and run them asynchronously (Gearman, AMQP, ActiveMQ, etc.). It's the Unix way of seeing things: "develop small self-contained tools and link them together". So PHP is not designed to allow massive parallelism, but other specialized technologies are. Use the right tool for the right problem.
Let's quickly refresh our memory about threads. Remember that we won't go into much detail; you can find everything you ever wanted to know about threads, in depth, in books or on the Web.
Threads are light units of work that reside in processes. Note the ownership: a process can spawn threads, and a thread must be part of one process (and just one). The process is the base unit of work under an Operating System (OS). Processes are heavy units of work. On multi-CPU machines (that is, today's machines), several CPUs run in parallel and each takes some of the load. If two processes A and B are ready to be scheduled, and two CPUs (or two CPU cores) are ready to take some load, then A and B can be scheduled at the same time. The machine then effectively computes several things in one single unit of time (time frame); we call that "parallelism".
[Figures: a process; a thread; processes and threads all together]
A and B were previously processes: fully independent workloads. Threads are not processes. Threads are units of execution that live in a process. That is, a process can decide to cut its job into several smaller tasks that can run concurrently. For example, processes A and B could each spawn threads: A1, A2 and B1, B2. If the machine hosts several CPUs (8 CPUs for example), then A1, A2, B1 and B2 could run in the same time frame.
Threads are a way to cut a process's job into several small jobs that can run in parallel (in the same time frame). Threads are run almost the same way processes are: they own a state that the Kernel scheduler uses to manage them.
Threads are lighter than processes: they only need a stack and some registers, whereas a process needs many more things (a new virtual memory space from the Kernel, a heap, some signal information, some file descriptor information, some lock information, etc.).
Process memory is hardware-managed by the Kernel and the MMU, whereas thread memory is software-managed by the programmer and the threading library used.
What you should remember is that threads are lighter than processes. If used well, they are cheaper to create and to switch between than processes, since the OS Kernel has much less state to set up and manage for a thread than for a process.
As we've seen, threads have their own stack; that is, when they access variables declared in a function, they own their own copy of such data.
But we can't say the same about the process heap: it is shared across threads, and so are global variables and file descriptors. This is an advantage, or a drawback. If you only read from global memory, you need to read at the right moment (after thread X and before thread Y, for example). If you write to it, you need to make sure several threads don't try to write to the same memory area at the same time: they would corrupt that area and leave the memory in an unpredictable state. This is what we call a race condition, and it is the main challenge behind thread programming.
For such concurrent accesses to happen safely, you need to incorporate some programming techniques into your code, such as reentrancy or synchronization routines. Reentrancy avoids relying on shared state in the first place, whereas synchronization masters concurrent access to it in a predictable way.
Processes don't share any memory with each other by default: the OS isolates them. Threads, however, share a large part of their process's memory.
With a big part of the memory shared, there is a need to synchronize access to it. Technical tools such as semaphores or mutexes (the most common ones) are used for that. They are based on a "lock" concept: if the resource is locked and a thread tries to access it, the thread will (by default) block, waiting for the shared resource to become available. And this is why using threads doesn't automatically mean your program will run faster. If you don't divide the tasks efficiently, and if you don't manage the shared memory locking efficiently, you'll end up with a program that takes more time to run than it would as a single process with no threads, just because your threads keep waiting for each other (and I'm not even talking about deadlocks, starvation, etc.).
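To make the lock idea concrete, here is a minimal sketch using the pthread library, assuming a POSIX system. The names run_counter_demo, worker, NUM_THREADS and INCREMENTS are made up for this illustration. Several threads increment one shared counter, and the mutex guarantees that only one of them is inside the critical section at a time.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4       /* hypothetical values, chosen for the demo */
#define INCREMENTS  100000

/* Shared state: it lives in the process, so every thread sees it. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: bump the shared counter INCREMENTS times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);   /* blocks while another thread holds the lock */
        counter++;                           /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts NUM_THREADS workers, waits for them, and returns the final
 * counter value, or -1 if a thread could not be created or joined. */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;
    int failed;

    counter = 0;
    for (; started < NUM_THREADS; started++) {
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
    }
    failed = (started != NUM_THREADS);
    for (int i = 0; i < started; i++) {
        if (pthread_join(threads[i], NULL) != 0)
            failed = 1;
    }
    return failed ? -1 : counter;
}
```

Without the lock/unlock pair, the increments from different threads could overlap and the final value would typically be lower than NUM_THREADS * INCREMENTS. With it, the result is predictable, at the cost of threads waiting for each other on every increment, which is exactly the slowdown described above.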
Thread programming is truly complex if you are not used to it. You'll need many, many hours of practice, and tons of WTF moments, to gain experience working with threads. Forget one little detail and your whole program will blow up in your face. Debugging threads is harder than debugging a thread-free program, as real use cases involve hundreds or thousands of threads running in a process. You get lost in your own mind, and you quickly sink deep in the pool.
Thread programming is hard. Good thread programming, and good parallel computing, is really a challenge.
As sharing memory that way is not always what we want, Thread Local Storage (TLS) appeared. TLS is mainly a concept of "globals owned by threads": memory areas that represent global state, but private to each thread. To implement TLS, on thread creation, one must allocate some process heap memory, ask the thread library for a key, and associate that key with the storage. Every further access will use the key to reach the thread-specific storage. A destructor is needed at the end of the thread's life as well.
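The key/storage/destructor steps can be sketched with the POSIX thread-specific data API (pthread_key_create(), pthread_setspecific(), pthread_getspecific()). This is only an illustration under the assumption of a POSIX system. The names my_counter, tls_worker, tls_job and run_tls_demo are hypothetical, and error handling is kept to a minimum.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t tls_key;
static pthread_once_t tls_key_once = PTHREAD_ONCE_INIT;

/* Destructor: run by the thread library at thread exit for a non-NULL value. */
static void tls_destroy(void *storage) { free(storage); }

/* Ask the thread library for a key, once for the whole process.
 * (The return code is ignored to keep the sketch short.) */
static void tls_make_key(void) { (void)pthread_key_create(&tls_key, tls_destroy); }

/* Returns the calling thread's private counter, allocating it on first use. */
static int *my_counter(void)
{
    pthread_once(&tls_key_once, tls_make_key);
    int *slot = pthread_getspecific(tls_key);
    if (slot == NULL) {
        slot = calloc(1, sizeof *slot);               /* heap memory, but private by convention */
        if (slot != NULL && pthread_setspecific(tls_key, slot) != 0) {
            free(slot);
            slot = NULL;
        }
    }
    return slot;
}

struct tls_job { int rounds; int seen; };

/* Each thread increments its own counter and reports what it saw. */
static void *tls_worker(void *arg)
{
    struct tls_job *job = arg;
    for (int i = 0; i < job->rounds; i++) {
        int *slot = my_counter();
        if (slot == NULL)
            return NULL;
        (*slot)++;
    }
    int *slot = my_counter();
    job->seen = (slot != NULL) ? *slot : -1;          /* copy out before the destructor frees it */
    return NULL;
}

/* Runs two threads with different workloads; returns 0 on success. */
int run_tls_demo(int *seen_a, int *seen_b)
{
    struct tls_job a = { 3, 0 }, b = { 5, 0 };
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, tls_worker, &a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, tls_worker, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    *seen_a = a.seen;
    *seen_b = b.seen;
    return 0;
}
```

Both threads go through the same key, yet each one only ever sees its own counter, and no lock is needed because nothing is shared.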
An application is said to be "thread safe" when it fully masters every global resource access in a 100% predictable way. If not, random things start to happen and the game is over.
As you may have guessed, threads require some help from the OS Kernel. Threads appeared in operating systems back in the mid-nineties, so quite a long time ago: they are mature and have been managed by OS Kernels for a long time.
But some cross-platform issues still exist, especially between the Windows and Unix worlds. Both have adopted different threading models and different thread libraries.
Programming with threads while supporting several platforms is still a challenge nowadays.
Under Linux, to create either a thread or a process, the Kernel system call is clone(). But that system call is extremely complex, so as usual some C code has emerged around the syscalls to ease day-to-day thread programming. Thread operations are not yet managed by the libc (the C11 standard has started such a move), but by external libraries. Nowadays, under Unix flavors, pthread is used (though other libraries exist). Pthread stands for "POSIX threads", a POSIX standardization of thread usage and behavior dating back to 1995. Hence, if you want to use threads in your program, you'll need to link it with libpthread, that is, pass the -lpthread switch (or the -pthread option) to GCC. libpthread is a library like any other: it is written in C, is open source, and has its own version control and management.
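As a rough idea of what the pthread API looks like in practice, here is a small sketch, assuming a POSIX system. The function names square and square_in_thread are invented for the example. It creates one thread, hands it an argument, and collects its return value when joining.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread body: receives its input through the void * argument and
 * hands its result back through the return value. */
static void *square(void *arg)
{
    intptr_t n = (intptr_t)arg;
    return (void *)(n * n);
}

/* Computes n * n in a separate thread; returns -1 if the thread
 * could not be created or joined. */
long square_in_thread(long n)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, square, (void *)(intptr_t)n) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)   /* waits for the thread to finish */
        return -1;
    return (long)(intptr_t)result;
}
```

Called from a main() and built with something like gcc -pthread file.c, this is about as small as a threaded program gets. Note that pthread functions return an error number instead of setting errno.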
So nowadays we mainly use the pthread library to program threads under Unix flavors. Not going into details again, pthread allows concurrency, but parallelism depends on the OS and the machine.
Concurrency is multiple threads taking turns on the same CPU, interleaved in no particular order. Parallelism is multiple threads running at the same time on different CPUs.
[Figures: concurrency; parallelism]
So, where does PHP fit in there? Let's start with some reminders. PHP is not a threaded language: its engine and its code don't manage threads to parallelize its own internal work. PHP doesn't offer threads to users: you can't use threads natively with the PHP language. Joe Watkins, a PHP core developer, created a nice extension that adds threads to userland: ext/pthreads. It is a nice project, but I personally wouldn't use PHP for such tasks: it's not the right language for that. I'd go with C or Java, for example.
So, what about threads and PHP, what's the point?
How PHP treats requests
It is all about how PHP handles HTTP requests. To serve several clients at the same time, a web server needs some concurrency (or some parallelism). You can't make everyone wait while you are answering just one client, right?
Thus, what servers usually do is use multiple processes, or multiple threads, to answer clients.
Historically, under Unix, the process model is used, simply because processes are the foundation of Unix: when Unix was born, processes were born with it, along with the ability to create new ones (fork()), terminate them (exit()) and synchronize with them (wait(), waitpid()). In such environments, multiple PHP instances serve multiple client requests, but each one runs in its own process.
If you remember the introductory chapters, in such a case there is nothing to do in PHP's code: processes are fully isolated from each other, and process A treating request A about client A's data will not be able to communicate with (read from or write to) process B treating request B about client B. And this is what we want.
Such models include php-fpm, and Apache with mpm_prefork. In the vast majority of cases, you use one of these two architectures.
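The isolation the process model gives for free can be seen in a few lines of C. The following is only a sketch, assuming a POSIX system, with a made-up function name (fork_isolation_demo) and a made-up variable standing in for "client data". It is not how php-fpm or Apache are implemented.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that overwrites its own copy of a variable, then
 * returns the value the parent still sees (or -1 on error). */
int fork_isolation_demo(void)
{
    int request_data = 1;                /* stands for "data about client A" */
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child process: this write only touches the child's copy. */
        request_data = 42;
        _exit(request_data == 42 ? 0 : 1);
    }
    /* Parent process: wait for the child, then look at our own copy. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return request_data;                 /* unchanged by the child's write */
}
```

After fork(), the child works on its own copy of the memory, so whatever it does to request_data never reaches the parent. Had the two been threads of one process, the write would have been visible to both.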
Things get more complicated under Windows, or under Unixes where your server uses threads.
Windows is a great operating system (true). It has just one drawback: its source code is closed. But many technical resources about its internals can be found on the Web or in books. Microsoft engineers share a lot of knowledge about how Windows works at its heart.
Microsoft Windows has taken a different path from Unixes when it comes to concurrency and parallelism. Windows relies heavily on threads. In fact, creating a process under Windows is such a heavy task that you usually avoid it. Under Windows you use threads, everywhere, every time. Windows threads are an order of magnitude more powerful than Linux ones; yes, they are.
So when you run PHP under Windows, the web server (whatever it is: IIS, Apache, FooBarBaz) will treat different clients in threads, and not in processes. That means that in such an environment, PHP runs in a thread, and in that case PHP must be extra careful about thread specifics: it must be thread safe.
PHP must be thread safe; that is, it must master a concurrency it hasn't created itself, but lives in and with. As you may have guessed, that means PHP has to find a way to protect its accesses to global variables, and there are many of them in PHP's heart.
The layer that is responsible for such protection is called Zend Thread Safety, aka ZTS.
Please note that the same is true under Unix if you happen to use threads to parallelize the treatment of client requests, but that is a very, very uncommon situation, as under Unix we are used to relying on classical processes for such a task. Also, if you happen to use a PHP extension that requires thread safety to be activated, such as ext/pthreads, you will need a thread-safe PHP.
Zend Thread Safety internal details
OK, here we go. ZTS is activated using the --enable-maintainer-zts configure switch (renamed --enable-zts in PHP 8.0). As said before, you usually don't need this switch, unless you run PHP under Windows, or you run PHP with an extension that needs the engine to be thread safe (like ext/pthreads for example).
There are several ways to check whether ZTS is enabled. Use the CLI and php -v, which tells you NTS (Not Thread Safe) or ZTS (Zend Thread Safe).
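You can also ask phpinfo(), which reports a "Thread Safety" line (enabled or disabled).
In your code, you can read the value of the PHP_ZTS constant (1 when thread safety is enabled, 0 otherwise).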
All of PHP's heart is thread safe when compiled with ZTS. What may not be thread safe are the extensions you activated. Official PHP extensions (distributed with PHP) are all thread safe, but for third-party ones, who knows? You will see in a moment that mastering thread safety in PHP extensions requires using a special programming API, and, as always with threads, one miss and you risk having your whole server blow up in your face.
Remember that with threads, if you call non-reentrant functions (there are many in libc) or if you access a true global variable blindly, you are going to generate weird behaviors in all the sibling threads. Translated to the PHP use case: if you mess up with threads in one of your extensions, you are going to impact every client being served in every other thread of the web server! This is an absolutely dramatic situation, as one client could corrupt every other client's data.
When designing PHP extensions, extreme care and very good knowledge of thread programming are necessary. If not, when running in a threaded environment, you're gonna break the whole web server.
Use and design reentrant functions
When designing a PHP extension, use reentrant functions. Reentrant functions are functions that don't rely on any global state to work. This is simplified: the true definition is that a reentrant function is one that can be called again while a previous call to it has not finished yet. Think about functions that can run in parallel in two or more threads. It then becomes obvious that if such functions use global state, they are not reentrant (but they could lock their global state, and thus be thread safe too ;-)). Many traditional libc functions are not reentrant, because they were designed at a time when threads simply did not exist yet. So some libcs (especially glibc) publish reentrant equivalents, as functions suffixed with _r(). Also, the C11 standard makes a lot of room for threads, and its optional Annex K adds variants with an _s() suffix (localtime_s() for example).
For example: strtok() => strtok_r(); strerror() => strerror_r(); readdir() => readdir_r(); etc.
PHP itself provides some of them, mainly for cross-platform purposes. Have a look at main/reentrancy.c.
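To see what the _r() variants change, here is a small sketch around strtok_r(), assuming a POSIX libc. The function count_pairs and its input format are invented for the example. strtok() keeps its position in a hidden static variable, so two tokenizations cannot be in progress at once. strtok_r() takes that position as an explicit argument instead.

```c
#define _POSIX_C_SOURCE 200809L   /* make sure strtok_r() is declared */
#include <string.h>

/* Counts the complete key=value pairs in a string such as "a=1;b=2".
 * The input buffer is modified in place, as with any strtok-style call. */
int count_pairs(char *input)
{
    int pairs = 0;
    char *outer_state;   /* position of the ';' tokenizer, owned by the caller */
    char *inner_state;   /* position of the '=' tokenizer, owned by the caller */

    for (char *pair = strtok_r(input, ";", &outer_state);
         pair != NULL;
         pair = strtok_r(NULL, ";", &outer_state)) {
        /* A second tokenization running while the first one is still in
         * progress: this is what the single hidden state of strtok()
         * cannot support. */
        char *key = strtok_r(pair, "=", &inner_state);
        char *value = strtok_r(NULL, "=", &inner_state);
        if (key != NULL && value != NULL)
            pairs++;
    }
    return pairs;
}
```

Since all the state lives in the caller's variables, two threads can each run count_pairs() on their own buffer without stepping on each other.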
Also, if you happen to write your own C functions, think about reentrancy. If you can pass your function everything it needs as arguments (on the stack or through registers), and if that function doesn't use any global or static variables or call any non-reentrant function, then it is reentrant.
Don't link against non-thread safe libraries
Still obvious: remember that thread programming is about the whole process memory image being shared, and that image includes any linked libraries.
If your extension links against a library known not to be thread safe, you will have to develop your own thread-safety tricks to protect access to the global state inside that library. This is really common in C and thread programming, but it is easy to miss.
ZTS usage and details