Question: client window active status in the headless mode? #835

Open
OrKoN opened this issue Dec 20, 2024 · 10 comments

@OrKoN
Contributor

OrKoN commented Dec 20, 2024

https://w3c.github.io/webdriver-bidi/#get-the-client-window-info says:

If client window can receive keyboard input channeled from the operating system, let active be true, otherwise let active be false.

What does this mean for headless mode, where no windows are visible? Should all windows be active=false or active=true? Should browsers in headless mode accept keyboard input from the user, or only allow input via the protocol?

Headless mode is not really standardized, but Firefox and Chrome both support it, and it might be relevant for users to get a consistent active status across headless browsers.

cc @sadym-chromium

@OrKoN
Contributor Author

OrKoN commented Dec 20, 2024

A test case:

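import puppeteer from "puppeteer";

// Launch Firefox. Puppeteer runs the browser headless by default.
const browser = await puppeteer.launch({
  browser: 'firefox',
});

const page = await browser.newPage();

// Open a page that displays the keyboard events it receives.
await page.goto('https://domeventviewer.com/key-event-viewer.html');

console.log('type now');

// Send the raw BiDi command and print the client window info, including the active flag.
console.log(JSON.stringify(await browser.connection.send('browser.getClientWindows', {}), null, 2));

// Wait five seconds so the user can try typing on the keyboard.
await new Promise(resolve => setTimeout(resolve, 5000));

// Capture the page to check whether any key events arrived.
await page.screenshot({
  path: 'keyboard.png',
});

await browser.close();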

The code above reports a window as active, but that window is not able to receive keyboard input from the operating system.
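For readers unfamiliar with the command, here is a minimal sketch of how the active flag could be read from a browser.getClientWindows result. The sample object follows the browser.ClientWindowInfo shape from the spec, but its values are invented for illustration and are not captured output; activeWindowIds is a hypothetical helper, not part of any library.

```javascript
// Illustrative result shape (browser.ClientWindowInfo entries).
// All values below are made up for this sketch.
const sampleResult = {
  clientWindows: [
    { clientWindow: 'window-1', active: true, state: 'normal', x: 0, y: 0, width: 1280, height: 720 },
    { clientWindow: 'window-2', active: false, state: 'normal', x: 0, y: 0, width: 1280, height: 720 },
  ],
};

// Hypothetical helper: returns the ids of all windows reported as active.
function activeWindowIds(result) {
  return result.clientWindows
    .filter((win) => win.active)
    .map((win) => win.clientWindow);
}

console.log(activeWindowIds(sampleResult));
```

In the headless run described above, the open question is whether this list should be empty.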

@jgraham
Member

jgraham commented Dec 20, 2024

What's the visibility state of these windows?

I don't know how headless usually deals with system focus, especially in the presence of multiple headless windows i.e. whether it assumes that at most one window can have (emulated) focus, or if all of them do. Maybe it differs between implementations.

@OrKoN
Contributor Author

OrKoN commented Dec 20, 2024

> What's the visibility state of these windows?
>
> I don't know how headless usually deals with system focus, especially in the presence of multiple headless windows i.e. whether it assumes that at most one window can have (emulated) focus, or if all of them do. Maybe it differs between implementations.

So we have two implementations in Chrome. In the headless shell, the visibility of all windows is faked as visible. In headless Chrome (which is based on headful Chrome with all UI hidden), the visibility is determined by protocol commands (i.e., creating a new page in the foreground, or activating it via browsingContext.activate, makes the window appear visible to the web APIs). But in all those cases the windows do not receive user input events; the only way to send events is via an automation protocol. The Firefox implementation seems to be more similar to headless Chrome.

@whimboo
Contributor

whimboo commented Jan 9, 2025

@OrKoN yes, that sounds correct. Just to get a clarification for the old headless mode, I've read in this blog post that it will be removed at some point. So do we still have to take that mode into account for this particular issue or can we ignore it?

@OrKoN
Contributor Author

OrKoN commented Jan 9, 2025

> @OrKoN yes, that sounds correct. Just to get a clarification for the old headless mode, I've read in this blog post that it will be removed at some point. So do we still have to take that mode into account for this particular issue or can we ignore it?

It's removed from the chrome binary, but it will still be available as a standalone browser binary that we intend to support, although we accept that it might not behave according to the spec due to its history. So for this issue I think we should only consider the new headless mode (headless Chrome).

@whimboo
Contributor

whimboo commented Jan 9, 2025

That’s good to hear. What remains to be discussed is the scope of supported user input. It might be a good idea to adjust the issue topic to reflect this focus.

After further consideration, I believe we should disallow the browser from receiving events directly from the operating system. Such events cannot be reliably targeted to a specific context and could escape the content scope. This would allow external code to interact with the browser UI, which should be prohibited by default to maintain security and isolation between contexts.

Am I missing a case where those events would be useful to receive, or is this just about the wording in the BiDi spec?

@OrKoN
Contributor Author

OrKoN commented Jan 9, 2025

> That’s good to hear. What remains to be discussed is the scope of supported user input. It might be a good idea to adjust the issue topic to reflect this focus.
>
> After further consideration, I believe we should disallow the browser from receiving events directly from the operating system. Such events cannot be reliably targeted to a specific context and could escape the content scope. This would allow external code to interact with the browser UI, which should be prohibited by default to maintain security and isolation between contexts.
>
> Am I missing a case where those events would be useful to receive, or is this just about the wording in the BiDi spec?

I do not think that this is the issue. This issue is about the spec saying that a window is active if it is able to receive keyboard input. In headful mode this is no problem: the top-most window can usually receive user input. In headless mode, though, none of the windows can receive keyboard input, which makes headless mode behave differently from headful mode for the same automation code with respect to the active flag.

@jgraham
Member

jgraham commented Jan 9, 2025

To be clear it doesn't actually affect whether you can interact with the window via e.g. actions; you can already send actions to windows that aren't active. It just affects whether we claim the window is active or not.
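As a rough illustration of that point, the sketch below builds an input.performActions command that presses and releases one key in a given browsing context. Nothing in the payload refers to the window's active status. The helper name buildKeyPress, the input source id 'keyboard-1', and the context id are placeholders assumed for this example.

```javascript
// Hypothetical helper: builds a WebDriver BiDi input.performActions command
// that presses and releases a single key in the given browsing context.
function buildKeyPress(context, key) {
  return {
    method: 'input.performActions',
    params: {
      context, // browsing context id; the value used below is a placeholder
      actions: [
        {
          type: 'key',
          id: 'keyboard-1', // arbitrary input source id chosen for this sketch
          actions: [
            { type: 'keyDown', value: key },
            { type: 'keyUp', value: key },
          ],
        },
      ],
    },
  };
}

console.log(JSON.stringify(buildKeyPress('example-context-id', 'a'), null, 2));
```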

@OrKoN
Contributor Author

OrKoN commented Jan 9, 2025

I also think it is not easily possible to make a headless browser receive keyboard input from the operating system (since there is nothing for the OS to send events to), and I do not think it's a desired behavior.

I believe we can do one of the following:

  1. Implement this flag according to the current spec text, meaning that in the headless mode all windows are reported as inactive.
  2. Adjust the spec and find another condition to determine whether a window is considered active. For example, if the browser instance is not able to receive events from the OS, the active window could be the one that was last activated using WebDriver, or perhaps all windows should be reported as active.
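To make the difference between these options concrete, here is a small sketch that models each candidate rule as a function. The window fields (osKeyboardFocus, lastActivated) and the browserReceivesOsInput flag are hypothetical names invented for this comparison; none of them come from the spec or from any implementation.

```javascript
// Option 1: current spec text. Active means the OS can route keyboard input here.
function activeOption1(win) {
  return win.osKeyboardFocus;
}

// Option 2, first variant: without OS input, fall back to the window
// that was last activated using WebDriver.
function activeOption2LastActivated(win, browserReceivesOsInput) {
  return browserReceivesOsInput ? win.osKeyboardFocus : win.lastActivated;
}

// Option 2, second variant: without OS input, report every window as active.
function activeOption2AllActive(win, browserReceivesOsInput) {
  return browserReceivesOsInput ? win.osKeyboardFocus : true;
}

// Two headless windows: neither can receive OS keyboard input.
const headlessWindows = [
  { id: 'w1', osKeyboardFocus: false, lastActivated: false },
  { id: 'w2', osKeyboardFocus: false, lastActivated: true },
];

// Evaluate a rule for every headless window (the browser gets no OS input).
const report = (rule) => headlessWindows.map((win) => rule(win, false));

console.log('option 1:', report(activeOption1));                      // false, false
console.log('option 2 (last activated):', report(activeOption2LastActivated)); // false, true
console.log('option 2 (all active):', report(activeOption2AllActive));         // true, true
```

Only the first two rules keep at most one window active in headless mode.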

@jgraham
Member

jgraham commented Jan 9, 2025

Yes, I agree that those are the options, and that the current spec requires option 1 (but that this wasn't considered).

I guess the question is what the value / use case of the active flag actually is.

If people are using it to determine which windows the browser considers to be in the foreground (and so have at least one document that is not subject to throttling etc.), we should probably have a different definition that considers that directly rather than depending on keyboard input. For headless that might mean all windows are "active". However, since throttling is per traversable, not per OS window, this doesn't obviously seem like a good fit.

If the idea is that calling browsingContext.activate always results in the window containing that browsing context being active then I think that's quite hard to maintain as an invariant in non-headless mode (anything at the OS layer could change which window has OS level focus at any time).

If the idea is that it really is about which window could be receiving real user input, then I think marking all headless windows as not active is totally reasonable. It would also preserve the (I assume) invariant that there's at most 1 active window at a time. Of course you can't really rely on that information for much (because it can change at any time, and you never know if you have a race condition). Support for a feature like "UI that responds to whether the window is currently active" would require something like an event.
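The assumed invariant of at most one active window can be checked against a browser.getClientWindows result with a few lines of code. This is only a sketch: hasAtMostOneActive is a hypothetical helper, the sample data is invented, and, as noted above, any such snapshot can be stale by the time it is inspected.

```javascript
// Hypothetical helper: true if no more than one window is reported as active.
function hasAtMostOneActive(clientWindows) {
  const activeCount = clientWindows.filter((win) => win.active).length;
  return activeCount <= 1;
}

// Invented sample: all headless windows reported as inactive (option 1).
const allInactive = [
  { clientWindow: 'window-1', active: false },
  { clientWindow: 'window-2', active: false },
];

// Invented sample: all headless windows reported as active.
const allActive = [
  { clientWindow: 'window-1', active: true },
  { clientWindow: 'window-2', active: true },
];

console.log(hasAtMostOneActive(allInactive)); // true
console.log(hasAtMostOneActive(allActive));   // false
```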
