Question: client window active status in the headless mode? #835
Comments
A test case:
import puppeteer from "puppeteer";

// Launch Firefox through Puppeteer (headless is the default).
const browser = await puppeteer.launch({
  browser: 'firefox',
});
const page = await browser.newPage();
await page.goto('https://domeventviewer.com/key-event-viewer.html');
console.log('type now');
// Ask the browser for its client windows over WebDriver BiDi and print the raw response.
console.log(JSON.stringify(await browser.connection.send('browser.getClientWindows', {}), null, 2));
// Leave five seconds for someone to try typing on the physical keyboard.
await new Promise(resolve => setTimeout(resolve, 5000));
// The screenshot shows whether the page logged any key events.
await page.screenshot({
  path: 'keyboard.png',
});
await browser.close();
The code above reports that a window is active, but the window is not able to receive keyboard input from the operating system. |
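To make the output of the test case easier to compare between browsers, the response can be reduced to the list of windows that report `active: true`. The following is a minimal sketch, not part of Puppeteer or the spec: `summarizeClientWindows` is a hypothetical helper, the sample data and window ids are made up, and it assumes the result has the shape `{ clientWindows: [{ clientWindow, active, ... }] }`, possibly nested under a `result` property depending on the client library.

```javascript
// Hypothetical helper, not part of Puppeteer or the WebDriver BiDi spec.
// Assumes the result shape { clientWindows: [{ clientWindow, active, ... }] },
// optionally nested under a `result` property by the client library.
function summarizeClientWindows(response) {
  const payload = response.result ?? response;
  const windows = payload.clientWindows ?? [];
  return {
    total: windows.length,
    // Ids of the windows the browser claims are active.
    activeIds: windows.filter((w) => w.active).map((w) => w.clientWindow),
  };
}

// Sample data made up for illustration; the ids are placeholders.
const sample = {
  result: {
    clientWindows: [
      { clientWindow: 'window-1', active: true, state: 'normal' },
      { clientWindow: 'window-2', active: false, state: 'normal' },
    ],
  },
};

console.log(summarizeClientWindows(sample));
```

In the test case, the value awaited from `browser.connection.send('browser.getClientWindows', {})` could be passed to this helper instead of the sample object.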
What's the visibility state of these windows? I don't know how headless usually deals with system focus, especially in the presence of multiple headless windows, i.e., whether it assumes that at most one window can have (emulated) focus or that all of them do. Maybe it differs between implementations. |
So we have two implementations in Chrome. In the headless shell, the visibility of all windows is faked as visible. In headless Chrome (which is based on headful Chrome with all UI hidden), the visibility is determined by protocol commands (i.e., creating a new page in the foreground, or activating it via browsingContext.activate, makes the window appear visible to the web APIs). But in all those cases the windows do not receive user input events; the only way to send events is via an automation protocol. The Firefox implementation seems to be more similar to headless Chrome. |
@OrKoN yes, that sounds correct. Just to get a clarification for the old headless mode, I've read in this blog post that it will be removed at some point. So do we still have to take that mode into account for this particular issue or can we ignore it? |
It's removed from the Chrome binary, but it will still be available as a standalone browser binary that we intend to support, although we accept that it might not behave according to the spec due to its history. So for this issue I think we should only consider the new headless mode (headless Chrome). |
That’s good to hear. What remains to be discussed is the scope of supported user input. It might be a good idea to adjust the issue topic to reflect this focus. After further consideration, I believe we should disallow the browser from receiving events directly from the operating system. Such events cannot be reliably targeted to a specific context and could escape the content scope. This would allow external code to interact with the browser UI, which should be prohibited by default to maintain security and isolation between contexts. Am I missing a case where those events would be useful to receive, or is this just about the wording in the BiDi spec? |
I do not think that this is the issue. This issue is about the spec saying that a window is active if it is able to receive keyboard input. In headful mode this is no problem: the top-most window can usually receive user input. In headless mode, though, none of the windows can receive keyboard input, which makes headless mode behave differently from headful mode for the same automation code w.r.t. the active flag. |
To be clear it doesn't actually affect whether you can interact with the window via e.g. actions; you can already send actions to windows that aren't active. It just affects whether we claim the window is active or not. |
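As a rough illustration of the point above, an `input.performActions` command is addressed to a browsing context, and nothing in its parameters refers to the window's `active` flag. The sketch below only builds the command payload; `buildKeyPressCommand`, the `keyboard-1` source id, and the context id are hypothetical names chosen for this example.

```javascript
// Hypothetical helper that builds a WebDriver BiDi `input.performActions`
// command pressing and releasing a single key in the given browsing context.
function buildKeyPressCommand(context, key) {
  return {
    method: 'input.performActions',
    params: {
      context, // the target is a browsing context, not an "active" window
      actions: [
        {
          type: 'key',
          id: 'keyboard-1', // arbitrary input source id chosen for this sketch
          actions: [
            { type: 'keyDown', value: key },
            { type: 'keyUp', value: key },
          ],
        },
      ],
    },
  };
}

// The context id is a placeholder; a real one comes from the browser.
const command = buildKeyPressCommand('context-id-placeholder', 'a');
console.log(JSON.stringify(command, null, 2));
```

With a client such as the one in the test case, `command.method` and `command.params` would be handed to whatever send function the client exposes.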
I think it is also not easily possible to make a headless browser receive keyboard input from the operating system (since there is nothing for the OS to send events to), and I do not think it is desired behavior. I believe we can do one of the following:
|
Yes, I agree that those are the options, and that the current spec requires option 1 (but that this wasn't considered). I guess the question is what the value / use case of the
If people are using it to determine which windows the browser considers to be in the foreground (and so have at least one document that is not subject to throttling etc.), we should probably have a different definition that considers that directly rather than depending on keyboard input. For headless that might mean all windows are "active". However, since throttling is per traversable, not per OS window, this doesn't obviously seem like a good fit.
If the idea is that calling
If the idea is that it really is about which window could be receiving real user input, then I think marking all headless windows as not active is totally reasonable. It would also preserve the (I assume) invariant that there's at most one active window at a time. Of course you can't really rely on that information for much (because it can change at any time, and you never know if you have a race condition). Support for a feature like "UI that responds to whether the window is currently active" would require something like an event. |
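The two readings for headless mode mentioned above (all windows active, or no window active) can be compared against the assumed "at most one active window" invariant with a small sketch. Everything here is hypothetical: the policy names `all-active` and `none-active`, the `hasOsFocus` field, and both helpers are illustrative labels, not spec terms or browser behavior.

```javascript
// Computes the `active` value each window would report.
// `hasOsFocus` is a made-up field standing in for "can receive keyboard
// input from the OS"; the policy names are labels for this sketch only.
function reportActive(windows, { headless, policy }) {
  return windows.map((w) => {
    if (!headless) {
      // Headful: active follows real OS keyboard focus.
      return { ...w, active: w.hasOsFocus };
    }
    // Headless: no window can get OS input, so the policy decides.
    return { ...w, active: policy === 'all-active' };
  });
}

// Checks the (assumed) invariant that at most one window is active.
function hasAtMostOneActive(windows) {
  return windows.filter((w) => w.active).length <= 1;
}

const headlessWindows = [
  { clientWindow: 'window-1', hasOsFocus: false },
  { clientWindow: 'window-2', hasOsFocus: false },
];

const noneActive = reportActive(headlessWindows, { headless: true, policy: 'none-active' });
const allActive = reportActive(headlessWindows, { headless: true, policy: 'all-active' });

console.log(hasAtMostOneActive(noneActive)); // invariant holds
console.log(hasAtMostOneActive(allActive)); // invariant broken with two windows
```

Under these assumptions, only the "no window active" reading keeps the invariant once more than one headless window exists.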
https://w3c.github.io/webdriver-bidi/#get-the-client-window-info says:
What does this mean for headless mode, where no windows are visible? Should all windows report active=false or active=true? Should browsers in headless mode accept keyboard input from the user, or only allow input via the protocol?
Headless mode is not really standardized, but Firefox and Chrome both support it, and it might be relevant for users to get a consistent active status across headless browsers.
cc @sadym-chromium