-
-
Notifications
You must be signed in to change notification settings - Fork 972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix #1806: N+1 query issue on dashboard index page #1813
base: main
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
dashboard/views.py
Outdated
metrics.extend( | ||
MC.objects.filter(show_on_dashboard=True).select_related("category") | ||
) | ||
|
||
content_types = ContentType.objects.get_for_models(*metrics) | ||
datum_queryset = Datum.objects.none() | ||
for metric, content_type in content_types.items(): | ||
datum_queryset = datum_queryset.union( | ||
Datum.objects.filter( | ||
content_type_id=content_type.id, object_id=metric.id | ||
).order_by("-timestamp")[0:1] | ||
) | ||
|
||
latest_datums = { | ||
(datum.object_id, datum.content_type_id): datum for datum in datum_queryset | ||
} | ||
|
||
data = [] | ||
for metric in metrics: | ||
data.append({"metric": metric, "latest": metric.data.latest()}) | ||
for metric, content_type in content_types.items(): | ||
if latest := latest_datums.get((metric.id, content_type.id)): | ||
data.append({"metric": metric, "latest": latest}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to write this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to find an alternate approach (got pretty close with a Datum.objects.values('content_type', 'object_id').annotate(Max('timestamp'))
but unfortunately that can only get the latest timestamp for each metric, not the associated measurement).
Your original approach using UNION
seems quite good (and it's quite clever).
I've made a suggestion for a slight rewrite that is a bit shorter and also closer to the original view implementation (it does so by pushing some logic onto the Datum
manager). If you like it I can push a commit onto this branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! It's very interesting how you've moved most of the logic to DatumQuerySet
!
The only concern is that latest = latest_by_metric.get((metric.content_type.pk, metric.pk))
could return None. Will the frontend be able to handle rendering of None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the current behavior for a metric with no data is for the view to crash, so it can't be worse than that 😁
(good call though, I'll test it out)
dashboard/tests.py
Outdated
@@ -33,10 +33,12 @@ def setUp(self): | |||
def test_index(self): | |||
for MC in Metric.__subclasses__(): | |||
for metric in MC.objects.filter(show_on_dashboard=True): | |||
metric.data.create(measurement=44) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding extra data to ensure the query will retrieve the latest datum for each metric.
a529fc0
to
2288df6
Compare
@ontowhee It took a few tries, but I think I finally landed on something that I think I like. I moved a lot of the logic onto manager methods so I could simplify the view (and it also simplified the N+1 tests sinceI could test directly against the manager method rather thant involve the view). I managed to switch to a subquery instead of a union which I think simplifies the code, but I don't know if the performance is better/worse/equivalent. Do you know anything about that? I also added an explicit Let me know what you think, if you're OK with it I'll merge all commits on this branch into one, with both of us as co-authors. |
For commit 8fe1e01, the overall performance was 110ms. For commit 2288df6, using the subquery seems to be worse for TracTicketMetric. It took 186ms to query that table, and overall it took 224ms. Will this performance be noticeable by the end user or the server? Django debug toolbar also identifies "4 similar" queries on ContentType due to calling .for_dashboard() in the loop. Will Sentry flag these 4 similar queries as an N+1 issue? |
Thanks for doing the measurements! The difference between 110 and 186ms doesn't seem so bad to me, but I would like to know your opinion too.
|
It doesn't seem too bad to me either. I wasn't sure how similar the local database was to production. I noticed that the results are returned in much less time in production too.
You're right, on first load it shows the warning, but on refresh, the content type queries are cached along with all the other queries. (0 queries!) I think this is good to merge! Thanks so much for refactoring this. I learned so much from your code!! |
I was playing around to understand the queries better. Here's another approach that queries on the Datum model. It uses Not asking for any changes here. I think your approach of querying on the Metric is easier to understand, and it’s already an improvement from the current dashboard! |
fix #1806
The existing query was looping on metrics to grab the latest Datum for each metric, which led to 18 similar queries reported by django-debug-toolbar. There were 45 sql queries overall for the page.
This is WIP. The change here reduces the number of queries down to 1 large statement of unions of each metric's latest datum. There are 9 sql queries overall for the page.
Changes include:
select_related('category')
for the metric query that is used for sorting based ondisplay_position
.