Question: client window active status in the headless mode? #835

OrKoN · 2024-12-20T13:55:52Z

https://w3c.github.io/webdriver-bidi/#get-the-client-window-info says:

> If client window can receive keyboard input channeled from the operating system, let active be true, otherwise let active be false.

What does this mean for headless mode, where no windows are visible? Should all windows be active=false or active=true? Should browsers in headless mode accept keyboard input from the user, or only allow input via the protocol?

Headless mode is not really standardized, but Firefox and Chrome support it, and it might be relevant for users to get a consistent active status across headless browsers.

cc @sadym-chromium

OrKoN · 2024-12-20T14:08:55Z

A test case:

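```js
import puppeteer from "puppeteer";

// Launch Firefox through Puppeteer (headless by default).
const browser = await puppeteer.launch({
  browser: 'firefox',
});

const page = await browser.newPage();

// A page that logs the keyboard events it receives.
await page.goto('https://domeventviewer.com/key-event-viewer.html');

console.log('type now');

// Send the raw WebDriver BiDi command over Puppeteer's underlying connection
// and print the reported client windows, including their `active` flag.
console.log(JSON.stringify(await browser.connection.send('browser.getClientWindows', {}), null, 2));

// Give the user 5 seconds to type on the physical keyboard.
await new Promise(resolve => setTimeout(resolve, 5000));

// The screenshot shows whether any key events reached the page.
await page.screenshot({
  path: 'keyboard.png',
});

await browser.close();
```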

The code above reports that the window is active, but the window is not able to receive keyboard input from the operating system.

jgraham · 2024-12-20T17:46:41Z

What's the visibility state of these windows?

I don't know how headless usually deals with system focus, especially in the presence of multiple headless windows i.e. whether it assumes that at most one window can have (emulated) focus, or if all of them do. Maybe it differs between implementations.

OrKoN · 2024-12-20T17:53:26Z

> What's the visibility state of these windows?
>
> I don't know how headless usually deals with system focus, especially in the presence of multiple headless windows i.e. whether it assumes that at most one window can have (emulated) focus, or if all of them do. Maybe it differs between implementations.

So we have two implementations in Chrome. In the headless shell, all windows are faked as visible. In headless Chrome (which is based on headful Chrome with all UI hidden), visibility is determined by protocol commands (e.g., creating a new page in the foreground, or activating it via browsingContext.activate, makes the window appear visible to web APIs). In all of those cases, however, the windows do not receive user input events; the only way to send events is via an automation protocol. The Firefox implementation seems closer to headless Chrome.
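A minimal model of the behavior described above may help separate visibility from the `active` flag. The mode names and the `describeHeadlessWindow` and `activePerCurrentSpec` helpers are hypothetical; the model only encodes what this comment states, then applies the quoted spec rule (a window is active only if it can receive keyboard input from the OS).

```javascript
// Hypothetical model of the two Chrome headless implementations described above.
// "headless-shell": every window is reported as visible.
// "headless": visibility follows protocol commands (foreground page creation,
//             browsingContext.activate), modeled here by `protocolVisible`.
// In both modes no window receives input from the OS; only protocol input arrives.
function describeHeadlessWindow(mode, { protocolVisible = false } = {}) {
  switch (mode) {
    case "headless-shell":
      return { visible: true, receivesOsInput: false };
    case "headless":
      return { visible: protocolVisible, receivesOsInput: false };
    default:
      throw new RangeError(`unknown mode: ${mode}`);
  }
}

// The current spec rule: a window is active only if it can receive OS keyboard input.
function activePerCurrentSpec(windowState) {
  return windowState.receivesOsInput;
}

const shellWindow = describeHeadlessWindow("headless-shell");
console.log(shellWindow.visible, activePerCurrentSpec(shellWindow)); // true false
```

Under this model a window can be visible to web APIs and still be inactive according to the quoted rule, which is why the `active: true` reported in the Firefox test case does not match the current spec text.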

whimboo · 2025-01-09T09:11:54Z

@OrKoN yes, that sounds correct. Just to get a clarification for the old headless mode, I've read in this blog post that it will be removed at some point. So do we still have to take that mode into account for this particular issue or can we ignore it?

OrKoN · 2025-01-09T09:21:37Z

> @OrKoN yes, that sounds correct. Just to get a clarification for the old headless mode, I've read in this blog post that it will be removed at some point. So do we still have to take that mode into account for this particular issue or can we ignore it?

It's removed from the Chrome binary, but it will still be available as a standalone browser binary that we intend to support, although we accept that it might not behave according to the spec due to its history. So for this issue, I think we should only consider the new headless mode (headless Chrome).

whimboo · 2025-01-09T09:52:17Z

That’s good to hear. What remains to be discussed is the scope of supported user input. It might be a good idea to adjust the issue topic to reflect this focus.

After further consideration, I believe we should disallow the browser from receiving events directly from the operating system. Such events cannot be reliably targeted to a specific context and could escape the content scope. This would allow external code to interact with the browser UI, which should be prohibited by default to maintain security and isolation between contexts.

Am I missing something where those events would be useful to receive, or is it just about the wording in the BiDi spec?

OrKoN · 2025-01-09T09:56:59Z

> That’s good to hear. What remains to be discussed is the scope of supported user input. It might be a good idea to adjust the issue topic to reflect this focus.
>
> After further consideration, I believe we should disallow the browser from receiving events directly from the operating system. Such events cannot be reliably targeted to a specific context and could escape the content scope. This would allow external code to interact with the browser UI, which should be prohibited by default to maintain security and isolation between contexts.
>
> Am I missing something where those events would be useful to receive, or is it just about the wording in the BiDi spec?

I do not think that is the issue. This issue is about the spec saying that a window is active if it is able to receive keyboard input. In headful mode this is not a problem: the top-most window can usually receive user input. In headless mode, though, none of the windows can receive keyboard input, so headless mode behaves differently from headful mode for the same automation code with respect to the active flag.

jgraham · 2025-01-09T10:01:08Z

To be clear it doesn't actually affect whether you can interact with the window via e.g. actions; you can already send actions to windows that aren't active. It just affects whether we claim the window is active or not.
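To illustrate: a WebDriver BiDi `input.performActions` command is addressed to a browsing context, so nothing in it depends on whether the enclosing client window is reported as active. A minimal sketch that builds such a command for typing a string; `buildTypeCommand` is a hypothetical helper, and the message id, the context id `"ctx-1"` and the input source id `"keyboard"` are placeholders.

```javascript
// Hypothetical helper: builds a WebDriver BiDi input.performActions command
// that types `text` into one browsing context. The command is addressed to a
// context id; the client window's `active` flag plays no part in it.
function buildTypeCommand(id, context, text) {
  const keyActions = [];
  // Iterate by code point so each character becomes one keyDown/keyUp pair.
  for (const ch of text) {
    keyActions.push({ type: "keyDown", value: ch });
    keyActions.push({ type: "keyUp", value: ch });
  }
  return {
    id,
    method: "input.performActions",
    params: {
      context,
      // "keyboard" is an arbitrary input source id chosen for this sketch.
      actions: [{ type: "key", id: "keyboard", actions: keyActions }],
    },
  };
}

// "ctx-1" is a placeholder; a real id would come from e.g. browsingContext.getTree.
const command = buildTypeCommand(1, "ctx-1", "hi");
console.log(command.params.actions[0].actions.length); // 4
```

The same command can be sent whether the window reports `active: true` or `active: false`; the flag only changes what is reported.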

OrKoN · 2025-01-09T10:14:32Z

I think it is also not easily possible to make a headless browser receive keyboard input from the operating system (since there is nothing for the OS to send events to), and I do not think it's desired behavior.

I believe we can do one of the following:

1. Implement this flag according to the current spec text, meaning that in headless mode all windows are reported as inactive.
2. Adjust the spec and use another condition to determine whether a window is active when the browser instance cannot receive events from the OS: for example, the window that was last activated using WebDriver is active, or perhaps all windows are reported as active.
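The options above can be written down as alternative definitions of `active`. This is a hypothetical sketch: the window records (`id`, `receivesOsInput`), the `canReceiveOsInput` flag for the browser instance, and `lastActivatedId` (the window containing the context most recently passed to `browsingContext.activate`) are assumptions made for illustration, and the second option is split into its two suggested variants.

```javascript
// Hypothetical sketch of the candidate definitions of `active`.
// windows: [{ id, receivesOsInput }]
// canReceiveOsInput: whether this browser instance gets OS input at all (false in headless).
// lastActivatedId: window containing the context last passed to browsingContext.activate, or null.

// Option 1: current spec text. In headless mode every window is inactive.
function activeOption1(windows) {
  return new Map(windows.map((w) => [w.id, w.receivesOsInput]));
}

// Option 2, first variant: without OS input, the window last activated via WebDriver is active.
function activeOption2LastActivated(windows, { canReceiveOsInput, lastActivatedId }) {
  if (canReceiveOsInput) return activeOption1(windows);
  return new Map(windows.map((w) => [w.id, w.id === lastActivatedId]));
}

// Option 2, second variant: without OS input, every window is active.
function activeOption2All(windows, { canReceiveOsInput }) {
  if (canReceiveOsInput) return activeOption1(windows);
  return new Map(windows.map((w) => [w.id, true]));
}

const headlessWindows = [
  { id: "w1", receivesOsInput: false },
  { id: "w2", receivesOsInput: false },
];
const headlessState = { canReceiveOsInput: false, lastActivatedId: "w2" };
console.log(Object.fromEntries(activeOption2LastActivated(headlessWindows, headlessState)));
// { w1: false, w2: true }
```

In headless mode, option 1 reports no active window, the "last activated" variant reports at most one, and the "all windows" variant reports every window as active. In headful mode all three fall back to the current rule.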

jgraham · 2025-01-09T11:41:30Z

Yes, I agree that those are the options, and that the current spec requires option 1 (but that this wasn't considered).

I guess the question is what the value / use case of the active flag actually is.

If people are using it to determine which windows the browser considers to be in the foreground (and so have at least one document that is not subject to throttling, etc.), we should probably have a different definition that considers that directly rather than depending on keyboard input. For headless, that might mean all windows are "active". However, since throttling is per traversable, not per OS window, this doesn't obviously seem like a good fit.

If the idea is that calling browsingContext.activate always results in the window containing that browsing context being active then I think that's quite hard to maintain as an invariant in non-headless mode (anything at the OS layer could change which window has OS level focus at any time).

If the idea is that it really is about which window could be receiving real user input, then I think marking all headless windows as not active is totally reasonable. It would also preserve the (I assume) invariant that there's at most 1 active window at a time. Of course you can't really rely on that information for much (because it can change at any time, and you never know if you have a race condition). Support for a feature like "UI that responds to whether the window is currently active" would require something like an event.
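The "at most one active window" invariant can be checked on a single `browser.getClientWindows` result. A minimal sketch, assuming the result shape `{ clientWindows: [...] }` with `active` and `clientWindow` fields on each entry as described in the spec's CDDL; `checkAtMostOneActive` is a hypothetical helper and the sample values are illustrative, not captured output. Because the result is only a snapshot, a passing check says nothing about the state a moment later.

```javascript
// Hypothetical helper: checks that at most one client window in one
// browser.getClientWindows result reports active: true.
// The result is a snapshot; OS-level focus may change right after it is taken.
function checkAtMostOneActive(result) {
  const activeIds = result.clientWindows
    .filter((info) => info.active === true)
    .map((info) => info.clientWindow);
  return { ok: activeIds.length <= 1, activeIds };
}

// Illustrative headless result under the current spec text (option 1): no window is active.
const option1Headless = {
  clientWindows: [
    { active: false, clientWindow: "w1", state: "normal", width: 800, height: 600, x: 0, y: 0 },
    { active: false, clientWindow: "w2", state: "normal", width: 800, height: 600, x: 0, y: 0 },
  ],
};
console.log(checkAtMostOneActive(option1Headless)); // { ok: true, activeIds: [] }
```

If all headless windows were instead reported as active, the same check would fail as soon as two windows exist.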
