
Adds Variable Batching Proposal #307

Open · wants to merge 10 commits into base: main

Conversation

michaelstaib (Member)

Within the Composite Schema WG we have discussed a new batching format that is primarily intended for subgraphs/source schemas in a federated graph. This new variable batching allows the distributed GraphQL executor to execute the same operation with multiple sets of variables.

graphql/composite-schemas-spec#25

@Shane32 (Contributor)

Shane32 commented Aug 29, 2024

Does the distributed GraphQL executor currently support 'regular' batching requests?

@michaelstaib (Member, Author)

@Shane32 we will specify this as well ... at the moment we call it request batching ... the idea is that a request batch can also consist of variable batches.

@Shane32 (Contributor)

Shane32 commented Aug 29, 2024

So it would support variable batching within request batching then?

@michaelstaib (Member, Author)

michaelstaib commented Aug 29, 2024

Yes, this is the current discussion. There are a lot of constraints we will put in place for the first iteration; we have also explored this in combination with subscriptions. But for this initial appendix we are focusing on variable batching first, as this will be the minimum requirement for the composite schema spec.

@Shane32 (Contributor)

Shane32 commented Aug 29, 2024

Ok. I would suggest that we do not use the jsonl format.

  1. If there is variable batching within request batching, there is no clear way to format such a response. Obviously wrapping a jsonl response in a JSON list produces unparsable JSON due to the missing commas:
```
[
{"data":{"hello":"world-1"}}
{"data":{"hello":"world-2"}}
,
{"data":{"name":"john-1"}}
{"data":{"name":"john-2"}}
]
```

Perhaps it could be specified that when request batching is layered on top of variable batching, the lists are flattened. I'm not sure this is the best approach, but it's feasible.

  2. If this scenario is not supported, implying that the request is request batching OR variable batching, then there is no reason to differentiate the response formats.

  3. Parsing jsonl is not common in JSON environments, at least not the ones I'm familiar with. Attempting to parse such a response within .NET will throw an exception upon encountering the next line in the jsonl response. Similarly, the JSON.parse method of JavaScript does not support jsonl and will throw. In any of these environments, one would have to code another layer to separate the responses before decoding the results, adding code complexity in addition to having a likely-slower implementation. It seems much cleaner and easier to parse the JSON and then iterate through the response list.

Let's use the response format that is commonplace now and supported by various servers and clients alike.

Perhaps as a separate appendix, the jsonl format is described as an optional response format for batching requests (request batching or variable batching). There it can state that if multiple batching approaches are used, the lists are flattened. But I just don't see the benefit of adding another response format.
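If flattening were specified, client-side handling would be simple. A minimal sketch, reusing the payload shapes from the invalid-JSON example above (hypothetical data, not spec text):

```javascript
// Hypothetical: a request batch of two requests, each executed with two
// variable sets. If flattening were specified, the nested per-request
// result lists would collapse into one ordinary JSON array.
const perRequestResults = [
  [{ data: { hello: "world-1" } }, { data: { hello: "world-2" } }],
  [{ data: { name: "john-1" } }, { data: { name: "john-2" } }],
];
const flattened = perRequestResults.flat();
// flattened is a single parseable JSON array of four responses
```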

@michaelstaib (Member, Author)

michaelstaib commented Aug 29, 2024

There actually is: we have specified a requestIndex and a variableIndex in the response structure ... we also need these because @defer and @stream are involved. Every execution could yield a stream of responses. We already have this implemented with reference implementations ... I will get through the spec text this week and then you will see how this plays out.

@michaelstaib (Member, Author)

BTW ... we will also introduce requestIndex and variableIndex because the server should be able to reorder responses rather than making the consumer wait just to return completed results in order.
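To illustrate, here is a hypothetical JSONL payload and the client-side regrouping it enables; the requestIndex/variableIndex fields are the ones described above, while the data values are made up:

```javascript
// Hypothetical JSONL payload: the server returns responses as they
// complete, out of order; requestIndex/variableIndex let the client
// re-associate each response with its originating inputs.
const jsonl = [
  '{"requestIndex":0,"variableIndex":1,"data":{"hello":"world-2"}}',
  '{"requestIndex":1,"variableIndex":0,"data":{"name":"john-1"}}',
  '{"requestIndex":0,"variableIndex":0,"data":{"hello":"world-1"}}',
].join("\n");

// Regroup into results[requestIndex][variableIndex].
function reorder(jsonlText) {
  const results = [];
  for (const line of jsonlText.split("\n")) {
    const res = JSON.parse(line);
    (results[res.requestIndex] ??= [])[res.variableIndex] = res.data;
  }
  return results;
}
```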

@michaelstaib (Member, Author)

@Shane32 I have added a bit more about the response.

michaelstaib self-assigned this Aug 29, 2024
@@ -0,0 +1,170 @@
## B. Appendix: Variable Batching
Member Author


After discussion in the composite schema working group we will have a single appendix about batching, as this allows for better examples that show request batching and variable batching in combination.

Member Author


@benjie shall we also add some examples in here with defer and stream? I ask because defer and stream are not yet handled in the spec.

Member


For now: no; but it would be good to have the text handy anyway.

@benjie (Member) left a comment


We should remove some of the repetition of the core spec in this appendix and try to see it more as "additions" (extra fields, extra rules, changes in behavior) rather than a re-implementation.

spec/Appendix B -- Variable Batching.md (3 resolved review threads, outdated)

A client SHOULD indicate the media types that it supports in responses using the `Accept` HTTP header as specified in [RFC7231](https://datatracker.ietf.org/doc/html/rfc7231).

For **variable batching requests**, the client SHOULD include the media type `application/graphql+jsonl` in the `Accept` header to indicate that it expects a batched response in JSON Lines format.
Member


There are a few issues with this:

+jsonl isn't allowed; see: https://www.rfc-editor.org/rfc/rfc6838.html#section-4.2.8

> The primary guideline for whether a structured type name suffix is
> registrable is that it be described by a readily available
> description, preferably within a document published by an established
> standards-related organization, and for which there's a reference
> that can be used in a Normative References section of an RFC.

application/graphql is incorrect (that's normally used for GraphQL documents, not GraphQL responses) - I think you meant application/graphql-response?

This isn't a response but instead a number of responses.

Suggestion: just use application/jsonl as mentioned by https://jsonlines.org/ - this is sufficient to solve the issues that application/json has w.r.t. status codes/etc.

spec/Appendix B -- Variable Batching.md (2 resolved review threads, outdated)

### Response

When a server receives a well-formed _variable batching request_, it MUST return a well-formed stream of _GraphQL responses_. Each response in the stream corresponds to the result of validating and executing the requested operation with one set of variables. The server's response stream describes the outcome of each operation, including any errors encountered during the execution of the request.
Member


I don't think we should use the word "stream" here. I think "list" is fine; I think the default media type would likely be e.g. application/graphql-response-list+json (a JSON array of GraphQL responses) but application/jsonl would allow for easier early parsing of responses when the client is stream-capable.


A server must comply with [RFC7231](https://datatracker.ietf.org/doc/html/rfc7231).

Each response in the stream follows the standard GraphQL response format, with the addition of a required `variableIndex` field at the top level of each response. The `variableIndex` indicates the index of the variables map from the original request, allowing the client to associate each response with the correct set of variables.
Member


The definition of the application/graphql-response type should be modified such that other keys are allowed iff they are specified as part of a specification or appendix hosted under the graphql.org domain; then this spec can "extend" that type with these keys. May want to go a slightly different direction on this in future.

spec/Appendix B -- Variable Batching.md (resolved review thread, outdated)

#### Response Structure

Each line in the JSON Lines (JSONL) response MUST include the following fields:
Member


The combination of MUST and optional is confusing. I would change this to something more along the lines of "in addition to the regular fields expected on a GraphQL response, each entry in a variable batching response MUST include the variableIndex field..."

@benjie (Member)

benjie commented Jan 9, 2025

> Parsing jsonl is not common in JSON environments, at least not the ones I'm familiar with. Attempting to parse such a response within .NET will throw an exception upon encountering the next line in the jsonl response. Similarly the JSON.parse method of javascript does not support jsonl and will throw. In any of these environments, one would have to code another layer to separate the responses before decoding the results, adding code complexity in addition to having a likely-slower implementation. It seems much cleaner and easier to parse the JSON and then iterate through the response list.

It's not really fair to expect a JSON parser to parse JSONL... it's a different format. However, making JSONL parseable as JSON is very straightforward; here's a JSONL parser for JS:

```javascript
const parseJSONL = (jsonl) => JSON.parse(`[${jsonl.trim().replaceAll('\n', ',')}]`);
```

I'm not sure that you can really say that this significantly "increases the code complexity"? I would expect .NET to also be capable of:

  • trimming whitespace (including newlines) from both ends of a string
  • replacing all instances of a newline in the string with a comma
  • prepending an [ and appending a ]
  • parsing the result as JSON

JSONL is an incredibly straightforward format. Most importantly for this use case, it allows you to process the values as they stream down because you just scan through it looking for the next \n and when you hit it everything up until that point is parseable as JSON - you don't need a streaming parser as you would if you were to use JSON as the data type.
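As a sketch of that scanning approach (illustrative only, not part of the spec), an incremental JSONL reader needs nothing more than a string buffer:

```javascript
// Buffer incoming chunks and emit a parsed response as soon as each
// newline arrives; no streaming JSON parser is required.
function makeJsonlReader(onResponse) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    let nl;
    while ((nl = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (line) onResponse(JSON.parse(line));
    }
  };
}

// Chunk boundaries may fall anywhere, even mid-document:
const seen = [];
const push = makeJsonlReader((obj) => seen.push(obj));
push('{"data":{"hello":"wor');
push('ld-1"}}\n{"data":{"hello":"world-2"}}\n');
// seen now contains two fully parsed responses
```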

All that said, I think we should RECOMMEND that clients and servers implement JSONL, but we should REQUIRE that servers support plain JSON arrays. Which I think is already the case in this text (since SHOULD is equivalent to RECOMMEND according to RFC2119).
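On the client, supporting both formats comes down to a small branch on the received Content-Type. A sketch under those assumptions (the media type name `application/jsonl` follows the suggestion earlier in this thread; nothing here is normative):

```javascript
// Client-side sketch: handle either batched response format based on
// the Content-Type the server returned.
function parseBatchedResponse(contentType, body) {
  if (contentType.includes("jsonl")) {
    // JSONL: one GraphQL response per line
    return body.trim().split("\n").map((line) => JSON.parse(line));
  }
  // Otherwise: a plain JSON array of GraphQL responses
  return JSON.parse(body);
}
```

A client would advertise both formats via `Accept: application/jsonl, application/json` and feed whichever Content-Type comes back into this branch.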

@Shane32 (Contributor)

Shane32 commented Jan 9, 2025

> here's a JSONL parser for JS:
>
> ```javascript
> const parseJSONL = (jsonl) => JSON.parse(`[${jsonl.trim().replaceAll('\n', ',')}]`);
> ```

Fair enough; I didn't think of that.

> I'm not sure that you can really say that this significantly "increases the code complexity"? I would expect .NET to also be capable of:

True, .NET could also perform string manipulation fairly easily. However, the typical .NET JSON parser operates on a UTF8 byte stream so that there are no additional memory allocations beyond the minimum required to deserialize the data. Converting the incoming data to a string, performing a number of string manipulations on it, and then parsing it would considerably slow down the JSON engine. To maintain speed on par with the default implementation, you'd have to write a specialized wrapper that parsed the characters as they were being read, looking for \n and either (a) reading each JSON line as it is streamed, or (b) creating a virtual list of 'sub-streams' for further parsing. Or perhaps a streaming wrapper that changed the characters, similar to the JS conversion above, so the result could be read as JSON (but that would be even more complex).

> JSONL is an incredibly straightforward format. Most importantly for this use case, it allows you to process the values as they stream down because you just scan through it looking for the next \n and when you hit it everything up until that point is parseable as JSON - you don't need a streaming parser as you would if you were to use JSON as the data type.

Probably true. I may have another viewpoint if I was more familiar with the use case (which I am not).
