
Client timeout includes connection pool queue wait time #10313

Open
mikev9 opened this issue Jan 8, 2025 · 3 comments

Comments

@mikev9

mikev9 commented Jan 8, 2025

Describe the bug

The issue lies in how aiohttp handles timeouts when using ClientSession and connection pooling. Currently, the timeout starts counting from the moment the coroutine session.get() is called, not from when the connection is acquired from the connection pool. This means that the time spent waiting in the connection pool queue is also included in the timeout calculation, leading to unexpected TimeoutError exceptions even if the actual request could be completed within the specified timeout.

This behavior can cause issues when the connection pool is overloaded with many simultaneous requests, as clients experience timeouts primarily due to delays in acquiring a connection rather than delays in the actual request processing.
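
For context, the queueing comes from the connector's connection limit (aiohttp's TCPConnector defaults to limit=100), so with 1000 concurrent requests most of them wait for a free connection before the request even starts. A minimal sketch that makes this limit explicit (the values are illustrative):

import aiohttp


async def make_session() -> aiohttp.ClientSession:
    # The connector limit caps concurrent connections (100 is also the default);
    # requests beyond the limit wait in the pool queue, and that wait is
    # currently counted against ClientTimeout.total.
    connector = aiohttp.TCPConnector(limit=100)
    timeout = aiohttp.ClientTimeout(total=5)
    return aiohttp.ClientSession('http://127.0.0.1:8080',
                                 connector=connector, timeout=timeout)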

To Reproduce

  • Use the following server.py and client.py to replicate the issue.
  • Run the server first and then execute the client.
  • Observe the output and note the high count of TimeoutError exceptions.

server.py

import asyncio

from aiohttp import web


async def main(_):
    await asyncio.sleep(1)
    return web.Response()


app = web.Application()
app.add_routes([web.get('/', main)])
web.run_app(app)

client.py

import asyncio
import time
import statistics as stat

import aiohttp


async def request(session) -> tuple[bool | None, float]:
    timeout_error = None
    start_time = time.time()

    try:
        async with session.get('/'):
            pass

        timeout_error = False
    except TimeoutError:
        timeout_error = True
    finally:
        end_time = time.time()
        duration = end_time - start_time

    return timeout_error, duration


async def main(n: int=1000):
    timeout = aiohttp.ClientTimeout(total=5)

    async with aiohttp.ClientSession('http://127.0.0.1:8080', timeout=timeout) as session:
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(request(session)) for _ in range(n)]

    return [task.result() for task in tasks]


if __name__ == '__main__':
    results: list[tuple[bool | None, float]] = asyncio.run(main())
    assert len(results) == len([x for x in results if x[0] is not None])
    durations = [x[1] for x in results]
    print(f'Requests count: {len(results)}')
    print(f'Timeout Error count: {len([x for x in results if x[0] is True])}')
    print(f'First request time: {durations[0]}')
    print(f'Last request time: {durations[-1]}')
    print(f'Median request time: {stat.median(durations)}')

Run the server as follows:

python server.py

Then run the client:

python client.py

Despite setting the timeout to 5 seconds, many requests fail with TimeoutError.
The observed behavior indicates that the timeout starts before acquiring a connection from the pool, rather than only including the time for the actual request.

Expected behavior

All requests should complete successfully without a TimeoutError.
The timeout should start counting only after a connection is acquired from the connection pool. This ensures that the specified timeout duration applies exclusively to the actual request-processing time, making the timeout behavior predictable and intuitive.

Logs/tracebacks

$ python client.py

Requests count: 1000
Timeout Error count: 500
First request time: 1.070129156112671
Last request time: 5.747848749160767
Median request time: 5.41119110584259

Python Version

3.13.0

aiohttp Version

3.11.11

multidict Version

6.1.0

propcache Version

0.2.1

yarl Version

1.18.3

OS

Debian GNU/Linux 11 (bullseye)

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@mikev9 mikev9 added the bug label Jan 8, 2025
@Dreamsorcerer
Member

I'm under the impression that this is the intended design of the total timeout, i.e. it is the total time of everything.

@Dreamsorcerer Dreamsorcerer added question StackOverflow and removed bug labels Jan 8, 2025
@mikev9
Author

mikev9 commented Jan 9, 2025

I'm under the impression that this is the intended design of the total timeout, i.e. it is the total time of everything.

  • First of all, this is not reflected in the documentation. Here's what the documentation says about the total parameter:
    The maximal number of seconds for the whole operation including connection establishment, request sending and response reading.
    "Connection establishment, request sending and response reading" - all of this is logical and intuitively clear, and it's what you expect when setting a timeout on a request, but nothing is mentioned about the time spent waiting for a connection from the pool. This gives the impression that it's not "intended design" after all.

  • Secondly, including the time spent waiting in the pool queue for a connection in the timeout is strange logic. When I set a timeout, I want to limit the time the request spends "outside of my code" (network operations and the time the server spends processing the request).
    With the current implementation, the first thing that comes to mind when encountering a TimeoutError is that the problem is on the server, that the server is taking too long to process the request, but that's not the case. This is confusing.
    Also, with the current implementation it's impossible to properly limit request execution time under a high number of concurrent requests without adding extra logic to the user code (as sketched below), whereas this should work "out of the box" and is what you intuitively expect from a feature like client timeout.
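
For illustration, a minimal sketch (not part of the original report) of the kind of extra user-side logic currently required: a semaphore sized to the connector's connection limit keeps the pool queue empty, so ClientTimeout measures mostly the request itself:

import asyncio

import aiohttp


# Hypothetical workaround: cap in-flight requests at the connector limit so a
# connection is free by the time session.get() starts its timeout.
semaphore = asyncio.Semaphore(100)  # match the TCPConnector limit


async def limited_request(session: aiohttp.ClientSession, url: str) -> None:
    async with semaphore:             # queueing happens here, outside the timeout
        async with session.get(url):  # ClientTimeout now covers mostly the request
            pass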

So all of this seems more like a bug than "intended design."

@Dreamsorcerer
Member

It could be clearer, but it does say "for the whole operation". I think it'd probably make sense to change it to behave the way you expected though, as an overall timeout can be easily achieved with an external asyncio.timeout() call from the user.

It may not be as trivial as you think to change the code though, so feel free to take a look and submit a PR if it is easy enough (unless anyone else has any objections).
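
For reference, a minimal sketch of the external overall timeout mentioned above (asyncio.timeout() is available since Python 3.11):

import asyncio

import aiohttp


async def fetch_with_overall_timeout(session: aiohttp.ClientSession, url: str) -> None:
    # An overall deadline covering pool wait, connection, request and response,
    # applied from user code rather than via ClientTimeout.
    async with asyncio.timeout(5):
        async with session.get(url):
            pass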

@Dreamsorcerer Dreamsorcerer added enhancement and removed question StackOverflow labels Jan 9, 2025