I put the essence of Python coroutine Pa was clean!

This article is a large amount of information, from IO multiplexing, to use the generator, then async, await realization of the principle behind it, in simple terms, the analysis was very thorough, very hardcore!

Two days for personal reasons because it did not touch a long time to write a point of Python, which involves "coroutine" program, the last time out, it is Web framework tornado unique feature, now we have async, await keyword support . Thought about its implementation, reviews the evolution of these years, feel a little bit mean.

They are single-threaded, why the original code with the low efficiency of the async, await add some asynchronous library becomes higher efficiency?

They are single-threaded, why the original code with the low efficiency of the async, await add some asynchronous library becomes higher efficiency?

If you do Python-based network or Web development, this question has puzzled, this article attempts to give an answer.

Before beginning 0x00

Firstly, Not take you browse the source codeAnd then tell you the control to achieve the original code Python standard. Instead, we will set out from the real problems, think of solutions to the problem, step by step evolution path experience solutions, and most importantly, hoping to gain knowledge in a systematic upgrade process.

This article only provides an independent direction of thinking, does not follow the historical and current actual implementation details.

Secondly, you need to read this article familiar with Python, at least understand the concept generator generator of Python.

0x01 IO multiplexing

This is the key to performance. But we are here only to explain the concept, its implementation details is not the point, which we understand Python coroutine enough, as already know enough about, advances to 0x02.

First, you want to know all the network service program is a huge loop, your business logic can be called at some point in this cycle:

defhandler (request):

WHILETRUE:

# Get a new request

request=accept

# To get users to write business logic function according to the route map

Handler=GET_HANDLER (Request)

Handler (Request)

Imagine your Web service of a handler, after receiving a request requires a response to the results of API calls.

For the most traditional network applications, your API requests issued to wait for a response after this time the program stops running, even new requests have to get in after the end of the response. If you rely on an API request packet loss seriously, especially in response to slow it? That will be very low throughput applications.

Many traditional Web server using multi-threading technology to solve this problem: the run handler is placed on other threads, each dealing with a request, this does not affect the new thread blocks request to enter. This problem can be solved to some extent, but for larger systems concurrent, multithreaded scheduling will bring significant performance overhead.

IO multiplexing can be done to solve the problem without the use of threads, it is provided by the operating system kernel functions, we can say specifically for this type of scenario for us. Simply put, your program encounters network IO, tells the operating system to help you staring at, while the operating system gives you a way to make you can feel free to get what IO operation has been completed. like this:

# # 操作 系统 复 复 示 示 例 代代

# Register the ID and type of IO operations to the operating system IO

IO_REGISTER (IO_ID, IO_TYPE)

# Get completed IO operations

Events=IO_GET_FINISHED

For (IO_ID, IO_TYPE) INEvents:

IFIO_TYPE==Read:

Data=read_data (IO_ID)

Elifio_Type==Write:

Write_data (IO_ID, DATA)

Gring the IO multiplex logic into our server, probably like this:

Call_backs={}

Defhandler (REQ):

# do jobs here

DefCall_back (Result):

# Use the returned Result to complete the remaining work …

Call_backs [IO_ID]=CALL_BACK

# New cycle

WHILETRUE:

# Get the completed IO event

IFIO_TYPE==Read: # read

Data=read (IO_ID)

Call_back=call_backs [io_id]

Call_back (data)

Else:

# Other types of IO event processing

PASS

# Get a new request

Handler (Request)

Our Handler has returned immediately for the IO operation. At the same time, each iteration will perform a callback over the completed IO, the network request no longer blocks the entire server.

The pseudo code above is only for understanding, and the details are more complicated. Moreover, it is also possible to connect the new request to the IO event from the operating system to the monitor port.

If we split the cycle part with a call_backs dictionary to a separate module, we can get an EventLoop, which is the iOLOOP provided in the Python Standard Library Asynci.

0x02 with generator to eliminate Callback

He focuses on the Handler function written in our business, after having independent iOLOOP, it now becomes like this:

# 业 业 代 代 … …

# Need an API request

Print (Result)

ask_LOOP.GET_EVENT_LOOP.IO_CALL (API, CALL_BACK)

Here, performance problems have been resolved: We no longer need multi-threads to constantly accept new requests in the source, and don’t have to rely on the API response.

But we have also introduced a new problem. The original business logic code is now demolished. The code before requesting the API is still normal. The code after the request API can only be written in the callback function.

Here our business logic has only one API call. If there are multiple APIs, plus the call to Redis or MySQL (their essential is also a network request), the entire logic will be split, this is a burden on business development .

For some languages ??with anonymous functions (right is Java), it may also trigger a so-called "turning hell".

Next, we find way to solve this problem.

We can easily think that if the function can be suspended after running to the network IO operation, it will wake up at the breakpoint after completion.

If you are familiar with Python’s "Builder", you should find that it happens to have this feature:

Defexample:

Value=yield2

Print ("Get", Value)

ReturnValue

g=esample

# 启 启 生器, we will get 2

Got=G.send (NONE)

Print (got) # 2

TRY:

# Anti-start will display "get 4", which is our incoming value

Got=g.send (got * 2)

ExceptStopItemization ASE:

# Builder runs, will print (4), E.Value is the value of generator return

Print (E.Value)

There is Yield keyword in the function, and the call function will get a generator, and a key method for generator can interact with the generator.

G.send (none) runs the generator code until you encounter Yield, and return to the object, that is, 2, the generator code is stopped here until we perform G.send (got * 2) again, The 2 * 2 is also 4 to assign the value Value in front of Yield, and then continue to run the generator code.

Yield is like a door, you can send a thing from here, you can also take another thing.

If Send makes the generator to run until the next yield is over, the Send call will trigger a special exception STOPITERATION, which comes with a property Value, which is the value of the generator Return.

If we convert our Handler to a generator with Yield keyword, run it to The specific content of IO operationReturns, put the IO result back and restore the generator to run, then solve the problem of uncomfortable business code:

# 业 业 代 代 … …

# Need to execute an API request, directly put the IO request information yield

Result=yieldio_info

# Use the result returned by the API to complete the remaining work

Print (Result)

# This function is registered in iOLOOP, used to call back when there is a new request

Defon_Request (request):

Handler=GET_HANDLER (Request)

g=Handler (Request)

# 首 首 启 获得 获得 i 获得

IO_INFO=G.send (none)

g.send (Result)

ask_LOOP.GET_EVENT_LOOP.IO_CALL (IO_INFO, CALL_BACK)

The above example, the Handler code written by the user will not be dispersed into the callback, and the ON_Request function interacts with Callback and IOLOOP, but it will be implemented in the web framework, which is not visible to the user.

The above code is enough to give us inspiration of Callback destroyed with the builder, but there are two points:

  1. Only a network IO is initiated in business logic, but it is often more

  2. Business logic does not call other asynchronous functions (helping), but in practice, we tend to call other levels.

Let’s take a more complex example:

Among them, Request executes real IO, FUNC1, FUNC2 is only called. Obviously our code can only be written:

Deffunc1:

Ret=yieldfunc2 (re)

returnret

Deffunc2 (DATA):

ReturnResult

DEFREQUEST (URL):

# This simulation returns an IO operation, contains all information about the IO operation, where the string is simplified

Result=yield "IOJOB OF% S"% URL

ReturnResult

For Request, we expose the IO operation to the framework through Yield.

for Func1 and func2, calling request, clearly add Yield keywords Otherwise, the request call returns a generator and will not be paused and continue to perform subsequent logic obviously errors.

This is basically that we don’t write asynchronous code in the Tornado framework without Yield from, Aysnc, AWAIT.

To run the entire calling stack, the approximate process is as follows:

  1. Call FUNC1 to get the generator

  2. Call Send (None) Start it gets the result of request ("http://test.com/foo"), or generator object

  3. Send (none) Starts the generator generated by the request, gets the IO operation, registered by the frame to IOLOOP and specify a callback

  4. The Request Builder wakes up in the callback function after IO, and the generator will go to the return statement end

  5. Capture an exception to get the return value of the Request Builder, wake up the last layer of FUNC1, and get a FUNC2 generator

  6. Continue to execute …

Call FUNC1 to get the generator

Call Send (None) Start it gets the result of request ("http://test.com/foo"), or generator object

Send (none) Starts the generator generated by the request, gets the IO operation, registered by the frame to IOLOOP and specify a callback

The Request Builder wakes up in the callback function after IO, and the generator will go to the return statement end

Capture an exception to get the return value of the Request Builder, wake up the last layer of FUNC1, and get a FUNC2 generator

Continue to execute …

Friends who are familiar with the algorithm and data structure encounter such a traversal logic that will be returned, can be recursively used, because the recursive use generator can not do it, we can use the stack, in fact, this is the word "call stack" origin.

With the stack, we can Connect all generators connected in series in the entire call chain to a generatorFor its constant Send, you can continue to get all IO operation information and drive the call chain advancement, and the implementation method is as follows:

  1. The first generator is in the stack

  2. Call the Send, if you get the generator, you will enter the next round iteration

  3. I encountered IO to ask Yield, let the frame sign up to iOLOOP

  4. After the IO operation is completed, the cache result is forth, enter the next round iteration, the purpose is to restore the upper function using IO results.

  5. If a generator is running, you also need to restore the upper function to the upper function.

The first generator is in the stack

Call the Send, if you get the generator, you will enter the next round iteration

I encountered IO to ask Yield, let the frame sign up to iOLOOP

After the IO operation is completed, the cache result is forth, enter the next round iteration, the purpose is to restore the upper function using IO results.

If a generator is running, you also need to restore the upper function to the upper function.

If implemented, the code is not long but the amount of information is relatively large.

It turns the entire call chain to a generator, calling the send, to complete the IO in the chain, complete these IO, continue to push the logic execution in the calling chain until the overall logic ends:

DEFWrapper (GEN):

# The first layer calls the stack

Stack=stack

Stack.push (gen)

# Start a layer-by-layer call

WHILETRUE:

# Get the top elements of the stack

item=stack.peak

Result=none

IFisgenerator (item):

TRY:

# Try to get the next layer call and get it in the stack

Child=item.send (Result)

Stack.push (child)

# Result Restore to NONE

Result=none

# After entering the stack, enter the next loop directly, continue to explore down

Continue

# If you have an end, you will temporarily save the Result, the next step to make yourself out.

Result=E.Value

Else: # o o operation

# # I 操作 操作, Yield, Io will be woken up and temporarily saved after IO

Result=yieldItem

# 走 到 here, this layer has been completed, out of the stack, the next iteration will be a layer of calling chain

Stack.pop

# 没有有 上, the entire call chain is completed, return

Ifstack.empty:

Print ("finished")

ReturnResult

This may be the most complicated part. If you look hard, it is actually as long as you understand that for the call chain in the example above, it can achieve the effect as follows:

W=Wrapper (Func1)

# Will get "IOJOB of http://test.com/foo"

W.send (none)

# 上 上 ojob foo completed the result "bar" incompart, continue to run, get "IOJOB OF http://test.com/bar"

W.send ("bar")

# 上 上 i i b 完成 完成 传 传 传 传 入 入 入 入 入 入 入 入 入 入 入 入 入

W.send ("BARZ")

With this part, the frame will be added to the matching code:

# Maintain a list of ready lists, store all completed IO events, format is (Wrapper, Result)

Ready=

# After using the wrapper package, you can process IO by Send.

g=wrapper (func1)

# Take the start state directly as the result of NONE

Ready.Append ((g, none))

# Let the iOLOOP perform this function each cycle to handle the ready for IO

Ready.Append ((g, result))

# Traversing all already generators, advance it down

Forg, Result InselF.Ready:

# Use the Result to wake the builder and get the next IO operation

IO_JOB=G.send (Result)

# After the IO operation is complete, add the generator to the ready list, wait for the next round of processing.

ask_LOOP.GET_EVENT_LOOP.IO_CALL (

IO_JOB, LambdareSult: Ready.Append ((g, result)

Here, the core idea is to maintain a list of ready-to-read, and IOLOOP is overwhelmed, and the generator that promotes the ready state is run down, and the new IO operation is registered. After the IO is completed, the ready, after several rounds of Ioloop iteration A Handler will eventually be executed.

At this point, we use the generator to write to write business logic to run normally.

0x04 Improved Scalability

If you can read it here, Python’s scope is basically clear.

We have already achieved one Miniature sweeping frameworkThe realization details of the standard library look great here, but the specific ideas are consistent.

Our equilation framework has a restriction, we can only use IO to operate asynchronously, although in the world of network programming and web programming, the block is basically only IO operation, but there are some exceptions, such as I want the current operation Sleep for a few seconds. The use of time.sleep will make the entire thread to block, requiring special implementation. For example, some CPU-intensive operations can be asynchronously through multi-threaded asynchronous, so that another thread notification event has been completed and followed.

Therefore, it is best to decouple an open space with the network, so that the network IO is only one of the scenes, improves the scalability.

The Python official solution is to let the user hand to block the block code. As for the IOLOOP to register IO event or open a thread completely by yourself, and provide a standard "placeholder" FUTURE, indicating that his results wait for the future Yes, some prototypes are as follows:

ClassFuture:

# Set the result

Defset_Result (Result): Pass

# 获取 结果 结果

Defresult: Pass

# 表示 表示 This Future object is not already set up.

Defdone: Pass

# Set the callback function that should be executed when he is set, you can set multiple

Defadd_done_callback (Callback): Pass

Our slight change can support Future, making the scalability stronger. Network request functions for user code:

# 现在 r es es,, 生 生 器 器 器 器 器 器 器 器

#future is understood as a placeholder

Fut=future

Defcallback (Result):

# Assign the placeholder when the network IO completed the callback

Fut.set_Result (Result)

ask_LOOP.GET_EVENT_LOOP.IO_CALL (URL, CALLBACK)

Now, Request is no longer a generator, but directly returns Future.

And for the function of processing the ready list in the frame:

DEFCALLBACK (FUT):

#future is set to be placed in the ready list

Ready.Append ((g, fut.result))

Fut=g.send (Result)

Fut.add_done_callback (callback)

0x05 development and change

Many years ago, when using Tornado, probably only one Yield keyword is available, the sweeper wants to achieve, that is, this idea, even Yield keywords and return keywords can not appear in a function, you want to run after the generator Returns a value, you need to manually Raise an exception, although the effect is the same as Return now, but it is still very awkward, not elegant.

Later, there was Yield from expression. What can it do?

It is popular, it is done what the generator Wrapper is doing the above: Calling link through the stack, it is the syntax of the Wrapper logic.

With it, the same example you can write:

Deffunc1:

# Note Yield from

Ret=yieldFromRequest ("http://test.com/foo")

# Note Yield from

Ret=yieldfromfunc2 (re)

returnret

Deffunc2 (DATA):

# Note Yield from

Result=yieldfromRequest ("http://test.com/"+data)

ReturnResult

# 同 上 上 实 实 实 实 实 r

Then you no longer need the brainless Wrapper function:

g=func1

# Return the first request for Future

g.send (none)

# Continue to run, automatically enter FUNC2 and get the FUTURE inside it

G.send ("bar")

# Continue to run, complete the residual logic of the call chain, throw the stopiteration exception

G.send ("BARZ")

Yield from the entire call chain directly, it is already great, but it is used asynchronous programming or otherwise, and other languages ??have special-top Async, the AWAIT keyword, until the later version puts these content With dedicated Async, AWAIT keyword packaging, it is a more elegant look today.

0x06 summary and comparison

Overall, Python’s native and trip is achieved from two aspects:

  1. Based on IO multiplexing technology, the entire application is non-blocking on IO, achieving high efficiency

  2. Change the dispersed Callback code through the generator into synchronous code, reducing business writing difficulties

Based on IO multiplexing technology, the entire application is non-blocking on IO, achieving high efficiency

Change the dispersed Callback code through the generator into synchronous code, reducing business writing difficulties

There is a language of the object of the generator. Its IO fight is achieved, the evolution of the Java fight is basically exactly, the keyword is the same, and the Future class is the same than the promise.

However, it is different for this, which is different from this sweeping with the degree of GO-named GO, and does not explicitly based on the generator.

If the class ratio, you can implement the geventime of Python, which is the runtime, and Patch off the system calls to access your own Runtime, you come to the scheduling sweeper, gevent is focused on the network, based on network IO scheduling, relatively simple, The GO achieves perfect multi-core support, more complex and perfect, and creates a new CHANNEL new programming paradigm.

Author: Mao bean peanut

Getting Started: The Most Complete Zero-Basic Python Problem | Zero-Based 8 Months Python | Battle Project | Learning Python is this shortcut

Dry goods: crawling Douban short comment, movie "later we" | 38 years old NBA best player analysis | From Wanzhong to Word! Tang Dynasty 3 disappointment | Laughing to see the new Eti Dragon Slay Dollar | Running Question King | Make a massive Miss in Python Sketch | Disc, I am so fire, I use machine to learn to be a mini recommended system movie

Fun: Poultry Game | Nine Mao Hai | Beautiful Flower | Two-Article Python "Everyday Cool" game!

AI: Robot that will be poetry | Give the picture color | predictive income | Solver, I use the machine to learn to be a mini recommended system movie

Gadget: PDF to Word, easy to get forms and watermarks! | One button saves the HTML page as PDF! Goodbye PDF to extract charges! Use 90 lines of code to create the strongest PDF converter, Word, PPT, Excel, Markdown, HTML one-to-date conversion | Make a staple low-cost ticket prompt! | 60 lines of code made a voice wallpaper switch every day to see a little sister! |

Annual explosion case

  • 1). Lying! PDF to Word Use Python to easily get it!
  • 2) Learn Python is really fragrant! I made a website with 100 lines of code, helping people PS travel pictures, earn a chicken leg to eat
  • 3). The first broadcast over 100 million, hot all net, I analyzed the "Sister Taking Wind and Waves" and found these secrets
  • 4) 80 lines of code! Do a Dream in Python with Python
  • 5). You must master the 20 Python code, short and delicate, useless
  • 6). 30 python hambo skills
  • 7). I summarized 80 "rookie Python selection dry goods.pdf", all dry goods
  • 8). Goodbye Python! I want to learn Go! 2500 word depth analysis!
  • 9). Find a dog welfare! This Python reptile artifact is too cool, automatically download the girl picture